This is to certify that the thesis entitled Areal Data Reaggregation: A Comparison of Two Methods, presented by Gustave William Rylander, has been accepted towards fulfillment of the requirements for the Master of Arts degree in Geography.

Date: 10/28/86

MSU is an Affirmative Action/Equal Opportunity Institution

AREAL DATA REAGGREGATION: A COMPARISON OF TWO METHODS

By

Gustave William Rylander

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF ARTS

Department of Geography

1986

ABSTRACT

Two methods for reaggregating data between sets of areal units, "source units" and "target units", are compared. With the overlay method, the value for each source unit is allocated among the target units in proportion to its area of overlap with each target unit. The pycnophylactic method first interpolates an intermediate surface from the source units and then aggregates it into the target units. A previous case study with census tracts and planning districts in London, Ontario, revealed no significant differences in their performance. Two factors that may lead to better performance by one method or the other, however, are the size of the units and their delineation criteria. The performance of the methods in estimating populations of the 48 contiguous states from those of 181 economic areas is compared with their performance in the previous study. Despite the fact that pycnophylactic interpolation resulted in a closer approximation to the actual surface, the difference in the performance of the methods was not significant.
A difference was observed, however, between the relative success of the methods in the two case studies.

ACKNOWLEDGEMENTS

I would like to thank my advisor, Judy Olson, and my other committee members, Bruce Pigozzi and John Hunter, for their valuable comments, criticisms, and support during the development of this thesis. My department chairman, Gary Manson, was also very patient and supportive, and I extend my thanks to him. Cheers and the best of luck to three friends and fellow graduate students: Ann Goulette, Charlie Johnston, and Kim Medley. Bruce, Murray, and Matt provided me with a comfortable home and plenty of pizza during my last month at Michigan State. Finally, my parents, Gus and Nancy Rylander, offered their constant support, and I want to express my gratitude and love to them.

TABLE OF CONTENTS

List of Tables
List of Figures
Chapter I. Introduction
Chapter II. Previous Research on Areal Data Reaggregation
   A. Development of Areal Data Reaggregation Methods
      1. The Overlay Method
      2. The Contour Reaggregation Method
      3. The Pycnophylactic Method
   B. Variations on the Methods
   C. Empirical Comparison of the Areal Data Reaggregation Methods
   D. Applications and Adaptations of the Methods
   E. Summary
Chapter III. Research Hypotheses and Objectives
   A. Effect of Size of Source Units on Areal Data Reaggregation
   B. Effect of Source Unit Delineation Criteria on Areal Data Reaggregation
   C. Research Objectives
Chapter IV. Research Methodology
   A. Data Sources and Compilation
   B. Rasterization Methods and the Loss of County Data
   C. Implementation of the Areal Data Reaggregation Methods
Chapter V. Results
   A. Convergence of the Pycnophylactic Interpolation Process
   B. Goodness-of-Fit of the Estimating Surfaces
   C. Target Unit Estimates and Errors
Chapter VI. Summary and Conclusions
Chapter VII. Recommendations for Further Study
Bibliography

LIST OF TABLES

Population Data for Planning Districts
Summary Statistics for Actual and Estimated Values for Planning Districts
Source Unit Delineation Criteria
   A. Census Tracts: London, Ontario
   B. Economic Areas: United States
Population Data for States
Summary Statistics for Actual and Estimated Values for States
Relationship of Success of Reaggregation Methods to Source Units Used

LIST OF FIGURES

Interpolating a Smooth, Pycnophylactic Surface: Iteration 1
Effects of Values of Neighboring Units on Pycnophylactic Surface Configuration
   a. Trough
   b. Slope
   c. Peak
Source Units and Target Units
   a. Source Units: Economic Areas
   b. Target Units: 48 Contiguous States
   c. Source Units and Target Units Superimposed
Idealized Surface Configuration
   a. Within an Economic Area
   b. Within a Census Tract
Convergence of the Pycnophylactic Interpolation Process
   a. Change in Source Unit Population Due to Volume-Adjustment
   b. Change in Estimating Surface Between Successive Smoothings
Error in the Estimating Surfaces
Error in the Estimating Surfaces
   a. Positive Errors: Economic Areas
   b. Negative Errors: Economic Areas
   c. Positive Errors: Pycnophylactic Surface
   d. Negative Errors: Pycnophylactic Surface
Target Unit Population Density
   a. Population Density, 1980: Actual
   b. Population Density, 1980: Overlay Method
   c. Population Density, 1980: Pycnophylactic Method
Target Unit Error
   a. Positive Error: Overlay Method
   b. Negative Error: Overlay Method
   c. Positive Error: Pycnophylactic Method
   d. Negative Error: Pycnophylactic Method

CHAPTER I. INTRODUCTION

Geographers often use data that are associated with areal units. The boundaries of different sets of overlapping units are usually delineated by different institutions for their own purposes and do not always coincide. A result of this non-coincidence is that the sets of data that are available for a region may not be compatible. This is a problem for the researcher who needs to relate the data from different sets of units either statistically or visually on maps. For example, the hypothesis that income is geographically related to political behavior is difficult to test empirically when the boundaries for census tracts do not respect those of political wards. Assuming that a census of one of the sets of units is not feasible, the solution is to estimate the data for one set, the target units, from those of the other set, the source units.
The term areal data reaggregation is used here to refer to the entire family of methods by which values for target units are estimated from source units.

Previous research (Lam, 1980) has shown that two of the available methods of areal data reaggregation, the overlay and pycnophylactic methods, are clearly better than the others. Overlay reaggregates the data directly from the source units into the target units, while the pycnophylactic method first interpolates an intermediate, continuous surface from the source units and then reaggregates it into the target units. The two methods, then, aggregate different estimating surfaces, approximations of the actual surface, into the target units. The estimating surface represents an assumption about the configuration of the actual surface.

In a previous study (Lam, 1980) the overlay and the pycnophylactic methods performed nearly identically, and it is not known whether their relative accuracy and precision are stable when units with different spatial characteristics are used. Information about the actual surface is important to the selection of an appropriate data reaggregation method because the configuration of the actual surface affects the validity of the assumptions implied by the use of the different estimating surfaces.

This study isolates two of the factors that may affect the performance of the methods: the size (or aggregation level) of the areal units and the criteria used to delineate their boundaries. The size of the source units may affect the likelihood that the actual distribution is homogeneous within their boundaries. Larger units may be less likely to contain homogeneous distributions. The delineation criteria often provide more detailed information about the configuration of the actual surface, which strongly affects the accuracy of the estimates. In other words, they suggest the kind of estimating surface that may yield good estimates.
The performance of the two reaggregation methods is compared using the results from an earlier study along with parallel information for a new set of units that is clearly different in these two factors. Although a single comparison cannot greatly improve the confidence with which we choose a method, this study may suggest some characteristics of size and delineation criteria to consider beforehand.

CHAPTER II. PREVIOUS RESEARCH ON AREAL DATA REAGGREGATION

Research on the areal data reaggregation problem has yielded three basic reaggregation methods: overlay, contour reaggregation, and the pycnophylactic method. Systematic, if limited, comparisons of their performance have also been made.

A. Development of the Areal Data Reaggregation Methods

1. The Overlay Method

Markoff and Shapiro (1973) demonstrated and empirically tested the simplest method of areal data linkage, the overlay method. This method reaggregates data directly from the source units into the target units, taking each target unit value as a weighted sum of the source unit values. The weights are the areas of overlap between each target unit and the source units. The contribution of each source unit to a target unit, then, is proportional to its area of overlap.

Different formulas for overlay are necessary depending on whether the source unit data are absolute numbers or percentages. For absolute numbers,

$$\hat{V}_i = \sum_j V_j \frac{A_{ij}}{A_j} \qquad \text{1)}$$

where $\hat{V}_i$ is the estimated value for target $i$, $V_j$ is the actual value for source $j$, $A_{ij}$ is the area of overlap between $i$ and $j$, and $A_j$ is the area of $j$. For percentages,

$$\hat{V}_i = \sum_j V_j \frac{A_{ij}}{A_i} \qquad \text{2)}$$

where $A_i$ is the area of $i$.

Using data for 18th Century French generalites and departements, Markoff and Shapiro estimated the populations of each of these sets of areal units from the other. They made their area measurements with a grid overlay.
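The two overlay formulas (for absolute numbers and for percentages) can be sketched in a few lines of Python. This is a minimal illustration and not Markoff and Shapiro's procedure: the function names, the grid representation (two equally sized grids of unit identifiers), and the sample values are all assumptions made for the example. Overlap areas are measured by counting grid cells, in the spirit of a grid overlay, and each source unit is assumed to be completely covered by target units.

```python
from collections import defaultdict


def overlap_areas(source_grid, target_grid, cell_area=1.0):
    """Measure A[(i, j)], the overlap of target i and source j, by counting cells."""
    areas = defaultdict(float)
    for source_row, target_row in zip(source_grid, target_grid):
        for j, i in zip(source_row, target_row):
            areas[(i, j)] += cell_area
    return dict(areas)


def overlay_absolute(areas, source_values):
    """Absolute numbers: split each source value in proportion to A_ij / A_j."""
    source_area = defaultdict(float)
    for (_, j), a in areas.items():
        source_area[j] += a  # A_j, assuming the targets cover every source unit
    estimates = defaultdict(float)
    for (i, j), a in areas.items():
        estimates[i] += source_values[j] * a / source_area[j]
    return dict(estimates)


def overlay_percentage(areas, source_values):
    """Percentages: area-weighted mean of source values, weights A_ij / A_i."""
    target_area = defaultdict(float)
    for (i, _), a in areas.items():
        target_area[i] += a  # A_i, assuming the sources cover every target unit
    estimates = defaultdict(float)
    for (i, j), a in areas.items():
        estimates[i] += source_values[j] * a / target_area[i]
    return dict(estimates)


# Hypothetical 2 x 4 study area: two source units and two target units.
source_grid = [["s0", "s0", "s1", "s1"],
               ["s0", "s0", "s1", "s1"]]
target_grid = [["t0", "t0", "t0", "t1"],
               ["t0", "t0", "t0", "t1"]]

areas = overlap_areas(source_grid, target_grid)
counts = overlay_absolute(areas, {"s0": 100.0, "s1": 60.0})
rates = overlay_percentage(areas, {"s0": 10.0, "s1": 20.0})
print(counts)  # t0 receives all of s0 and half of s1; t1 receives the other half
print(rates)
```

Because the absolute-number form divides by the source area, the estimates sum to the original source total (160 in this example), which is the sense in which overlay preserves volume.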
Using equation 1), the estimates had a correlation of .53 when estimating data for departements from those of generalites and .96 when the direction was reversed. The higher correlation was obtained when estimating the values for the relatively larger generalites. This improvement is expected, because the departements provide a more detailed representation of the actual population density surface. In general, for any areal data reaggregation method, better estimates will be obtained when data are reaggregated from smaller units to relatively larger units.

Errors in the overlay method occur because of intra-unit deviations from the single density value assigned to each of the source units. In other words, the implicit assumption of the overlay method is spatially homogeneous density within the source units. This has been called the choropleth assumption, after the mapping method commonly used for areal data.

Crackel (1975) presented modifications of the overlay formulas for special cases in which the area of overlap is either less than or greater than the area of the target unit. This can happen when the target unit is not completely covered by source units or when more than one set of source units (overlapping one another) is used to estimate the value of a target unit. He suggested multiplying the weighted sum used in formulas 1 and 2 by the ratio of the target area to the total area of overlap. This correction reduces the estimated value when the target area is smaller than the area of overlap and increases it when the target area is greater, and it should often reduce the severity of errors.

2. The Contour Reaggregation Method

Ford (1976) contributed an alternative to the overlay method. The contour reaggregation (CR) method involves interpolating values for a regular grid from those given for control points located at the centers-of-gravity of the source units and reaggregating the interpolated values into the target units.
The primary difference between CR and overlay, then, is that CR transforms the source units into a smooth, disaggregated surface before aggregating it into the target units. This intermediate step is intended to improve the representation of the actual surface by introducing a degree of spatial autocorrelation into the estimating surface. The value of a spatially-distributed variable at a particular location is usually positively related to nearby values, but the aggregation of the surface into source units hides some of this information. Interpolation can recover some of the lost detail.

Ford's interpolation method was similar to the one used by the SURFACE II program. He fit a second-order trend surface equation to the control points using weighted least squares (weight = $1/d^2$) of the nearest 8 control points. Then the equation was used to predict the values at the nodes of the regular grid. He estimated rent values for 10 postal zones from those of 15 census tracts in Dade County, Florida. Although his sample size was too small to produce a reliable measure of statistical association, he reported a relative root mean squared error (RRMSE) of .255 between the actual and estimated values. RRMSE is a root mean squared error (RMSE) standardized about the mean of the estimated values:

$$\mathrm{RRMSE} = \frac{\mathrm{RMSE}}{\frac{1}{N}\sum_i E_i} = \frac{\sqrt{\frac{1}{N}\sum_i (O_i - E_i)^2}}{\frac{1}{N}\sum_i E_i} \qquad \text{3)}$$

where $E_i$ and $O_i$ are the expected and observed values. Ford suggested that methods yielding an RRMSE of around .10 could be considered fairly accurate.

Although CR may have a theoretical advantage over the overlay method because of its incorporation of spatial autocorrelation into the estimating surface, it also has a critical disadvantage: interpolation from control points generally does not preserve the total value, or volume, of each of the source units. The common interpolation methods often preserve the values at the control points, but these values are inconsequential when areas are the units of interpolation.
If one assumes that the source unit data were collected without error, then CR adds detail (however realistic) to the surface at the expense of accuracy. The next section traces the progression of research that proved that this trade-off is unnecessary.

3. The Pycnophylactic Method

The problem left for subsequent research into areal data reaggregation was to incorporate smoothness into the estimating surface while preserving the volume of the source units. The solution to this problem had its origins in work by Boneva, Kendall, and Stefanoff (1971). They introduced a technique for interpolating yearly birth rates from data given in five-year intervals. Aggregated, temporal data of this sort can be displayed graphically as a histogram, with the total births for a five-year period given by the area of an individual bar. Histogram bars, however, hide some of the serial autocorrelation usually exhibited by data collected over time. The authors' objective, then, was to replace the discrete bars with a continuous curve while preserving the total area under the curve as a whole and within five-year intervals. The curve was generated as a mathematical spline, which is essentially a series of simple curves (or functions) pieced together end-to-end. The resulting curve, called a histospline, adds hypothetical detail to the intervals without changing the accuracy of the original, aggregated data.

Tobler and Lau (1978) adapted the technique for the two-dimensional case, i.e., for bivariate histograms. A bivariate histogram has data associated with rectangular cells instead of intervals on a line. The bars of a univariate histogram are replaced by columns, with the density and total value for each cell represented by its column's height and volume, respectively. Tobler and Lau demonstrated a geographic application of this method using population data compiled by 0.25 square mile grid cells for Ann Arbor, Michigan.
When the surface interpolated from these grid cells is threaded with isolines, an isopleth map is produced. The primary motivation for their research, in fact, was the improvement of isopleth mapping methods, not areal data reaggregation.

Tobler (1979) developed the technique further by adapting it to operate on irregular geographic units. He accomplished this by disaggregating the units into a set of small grid cells. In computer terminology, he used a raster data structure. This is the digital counterpart of the physical grid overlay that Markoff and Shapiro used to make their area measurements. Tobler used the term pycnophylactic (volume-preserving) to refer to the new interpolation method. He also provided a useful analogy to enhance its intuitive appeal. He suggested thinking of the original stepped surface as a clay model. If the pycnophylactic interpolation method were to operate physically, it would sculpt the clay such that the surface would become as smooth as possible without moving any clay between neighboring columns.

Tobler's study, then, was the final step in maintaining source unit accuracy (like overlay) while interpolating realistic detail into the surface (like CR), and the pycnophylactic method has obvious appeal as a method for areal data reaggregation.

Algorithms for pycnophylactic interpolation are described in Tobler (1979) and Lam (1980, 1983). A simple raster-based algorithm, based on the one used by Lam (1980, 1983), is described below because an understanding of the interpolation process is necessary in order to understand the pycnophylactic estimating surface.

The pycnophylactic interpolation algorithm is shown graphically in figure 2.1. First, the grid cell values used to identify the source units are replaced with the corresponding density values. The surface is then smoothed using an arbitrary smoothing operator (e.g., each grid cell is assigned the mean of its four non-diagonal neighbors).
Finally, the total values within the source unit boundaries are recovered by multiplying each grid cell by the ratio of the original source unit value to the new (smoothed) value. This process is repeated an arbitrary number of times. Since the smoothing routine reduces differences in neighboring grid cells, it operates from the edges of the source units (i.e., from the "cliffs" between neighboring units) inward. The monotonic reduction of differences also implies that the process converges; that is, the changes in the surface become smaller with each iteration.

Another characteristic of this interpolation algorithm is that the configuration of the interpolated surface within source units depends entirely on the densities of neighboring units. Figure 2.2 shows how the locations of maxima and minima within a unit are affected by neighboring units. If all of the neighbors have higher densities, a "trough" will be interpolated within the unit (fig. 2.2a). If the neighbors have a combination of higher and lower densities on opposite sides, then the surface will slope from one side of the unit to the other (fig. 2.2b). Finally, a "peak" is interpolated if the neighboring units all have lower densities (fig. 2.2c).

B. Variations on the Methods

The research up to and including Tobler (1979), then, had produced three basic methods for areal data reaggregation: overlay, CR, and pycnophylactic. Variations of all three methods are possible. Overlay can be approximated by using unweighted averages of the overlapping source units or simply the value of the source unit with the largest overlap area. CR can use any of the multitude of methods for interpolating from control points. And the pycnophylactic method can use different smoothing routines and boundary conditions. Overlay uses the source units themselves as the estimating surface, while CR and the pycnophylactic method use interpolated surfaces.
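The raster-based pycnophylactic algorithm described in section A.3 (replace identifiers with densities, smooth, rescale each source unit to its original total, repeat) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used by Tobler or Lam: the function names and sample grids are hypothetical, the smoothing operator is the four-neighbor mean mentioned in the text, cells on the edge of the study area average only the neighbors that exist (one possible boundary condition), and all source unit values are assumed to be positive.

```python
def pycnophylactic(unit_grid, unit_values, iterations=50):
    """Smooth a density grid while preserving every source unit's total."""
    rows, cols = len(unit_grid), len(unit_grid[0])

    # Record which grid cells belong to each source unit.
    cells = {}
    for r in range(rows):
        for c in range(cols):
            cells.setdefault(unit_grid[r][c], []).append((r, c))

    # Step 1: replace unit identifiers with density values (value per cell).
    surface = [[unit_values[unit_grid[r][c]] / len(cells[unit_grid[r][c]])
                for c in range(cols)] for r in range(rows)]

    for _ in range(iterations):
        # Step 2: assign each cell the mean of its non-diagonal neighbors.
        smoothed = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                neighbors = [surface[rr][cc]
                             for rr, cc in ((r - 1, c), (r + 1, c),
                                            (r, c - 1), (r, c + 1))
                             if 0 <= rr < rows and 0 <= cc < cols]
                smoothed[r][c] = sum(neighbors) / len(neighbors)

        # Step 3: multiply each unit's cells by (original total / smoothed total).
        for unit, members in cells.items():
            total = sum(smoothed[r][c] for r, c in members)
            ratio = unit_values[unit] / total
            for r, c in members:
                smoothed[r][c] *= ratio
        surface = smoothed
    return surface


def aggregate(surface, zone_grid):
    """Sum the surface within each zone (source or target units)."""
    totals = {}
    for surface_row, zone_row in zip(surface, zone_grid):
        for value, zone in zip(surface_row, zone_row):
            totals[zone] = totals.get(zone, 0.0) + value
    return totals


# Hypothetical 3 x 4 study area with three source units.
source_grid = [["a", "a", "b", "b"],
               ["a", "a", "b", "b"],
               ["c", "c", "c", "c"]]
source_values = {"a": 40.0, "b": 8.0, "c": 12.0}
surface = pycnophylactic(source_grid, source_values, iterations=20)
print(aggregate(surface, source_grid))  # source totals are recovered
```

Passing a grid of target unit identifiers to `aggregate` instead of `source_grid` gives the pycnophylactic estimates for the target units. Because the rescaling step runs last in every iteration, the source totals are preserved no matter how many iterations are used.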
The estimating surfaces for overlay and pycnophylactic both preserve the total volume within the source units.

Figure 2.1. Interpolating a Smooth, Pycnophylactic Surface: Iteration 1
(Panels: 1. Replace unit identifiers with density values; 3. Adjust unit values)

ITERATION 1
Unit   Area   Initial Pop.   Initial Density   Smoothed Pop.   Ratio (initial pop. / smoothed pop.)
1      6      30             5.0               32.83           0.914
2      13     78             6.0               79.00           0.987
3      15     105            7.0               103.75          1.012
4      6      48             8.0               45.42           1.057

Figure 2.2. Effects of Values of Neighboring Units on Pycnophylactic Surface Configuration: a) Trough; b) Slope; c) Peak

C. Empirical Comparison of Areal Data Reaggregation Methods

Although the performance of overlay and CR had been assessed independently (Markoff and Shapiro, Ford), no comparisons of the three methods were made until 1980. Lam (1980) did not use the same interpolation algorithm for CR as Ford but took the values at grid cell nodes as distance-weighted averages of the control points. The weight she used, $2^{-d/B}$, allows arbitrary specification of the value of B. Higher B values result in more weight allocated to distant points and, thus, a smoother surface. To give CR the benefit of the doubt relative to the volume-preserving methods, Lam chose a value of B that minimized the difference between the CR surface and the actual one (0.15 for her data set). She estimated population data for 21 planning districts from 51 census tracts in London, Ontario, and her results are listed in Table 2.1. The summary statistics in Table 2.2 were derived from these results.

The correlation coefficients from these experiments are indicators of the precision of the estimation method, which is clearly higher for the volume-preserving methods than for the "optimal" CR method. The accuracy of the estimates is indicated by the regression and RMSE statistics. Perfect accuracy would result in a slope of 1, an intercept of 0, and an RMSE of 0.
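The precision and accuracy statistics named above (correlation, regression slope and intercept, RMSE, and RRMSE) can be computed as in the sketch below. The function name and the sample numbers are hypothetical. Two choices are assumptions rather than anything stated in the text: the estimated values are regressed on the actual values, and RRMSE divides RMSE by the mean of the estimated values, following the definition given for Ford's statistic.

```python
import math


def fit_statistics(actual, estimated):
    """Correlation, regression, RMSE, and RRMSE for paired target unit values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_e = sum(estimated) / n

    # RMSE, and RRMSE standardized about the mean of the estimated values.
    rmse = math.sqrt(sum((a - e) ** 2 for a, e in zip(actual, estimated)) / n)
    rrmse = rmse / mean_e

    # Sums of squares and cross-products about the means.
    s_aa = sum((a - mean_a) ** 2 for a in actual)
    s_ee = sum((e - mean_e) ** 2 for e in estimated)
    s_ae = sum((a - mean_a) * (e - mean_e) for a, e in zip(actual, estimated))

    r = s_ae / math.sqrt(s_aa * s_ee)
    slope = s_ae / s_aa  # assumption: estimated values regressed on actual values
    intercept = mean_e - slope * mean_a
    return {"rmse": rmse, "rrmse": rrmse, "r": r,
            "slope": slope, "intercept": intercept}


# Hypothetical target unit values: every estimate is one unit too high.
stats = fit_statistics([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(stats)
```

In this example the estimates are perfectly correlated with the actual values and the slope is 1, yet the intercept and RMSE are both nonzero. That is why the correlation coefficient is read as a measure of precision, while the regression and RMSE statistics are read as measures of accuracy.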
Table 2.2 thus also indicates the generally higher accuracy of the volume-preserving methods.

Finally, Lam observed that the highest errors for the planning districts occurred in suburban areas. This is to be expected because a target unit in a suburban area often overlaps one or more predominantly urban source units. The high density of these source units must be partially allocated to the target unit, resulting in frequent overestimation.

Table 2.1. Population Data for Planning Districts (Source: Lam, 1980)
[Table values are not legible in the scanned source.]

Table 2.2. Summary Statistics for Actual and Estimated Values for Planning Districts (Source: Lam, 1980)

Est. Method              RMSE        R        Slope     Intercept
                         (RRMSE)     (R²)     (S.E.)    (S.E.)
Overlay                  1479.756    .979     0.974+    151.180
                         (.139)      (.940)   (.056)    (681.553)
Pycno.                   1657.200    .960     1.015+    97.621
                         (.162)      (.930)   (.065)    (768.243)
CR (w = 2^(-d/.15))      6199.649    .519     0.494+    5267.386+
                         (.576)      (.269)   (.159)    (2932.199)

+ Significant at 95% confidence level

D. Applications and Adaptations of the Methods

Most other articles addressing the areal data reaggregation problem either directly or indirectly (Goodchild and Lam, 1980; Lam, 1985; Wallin, 1984) were generally applications rather than evaluations of the methods.

Clarke (1984) presented an interesting adaptation of the pycnophylactic method. He generated a pycnophylactic surface exhibiting periodicities, implementing it using two-dimensional Fourier series instead of simple smoothing. A Fourier series is a summation of a set of sine and cosine waves of various wavelengths, amplitudes, and phase angles. One possible application of a periodic surface is to model geographic central places. By changing the number of harmonics in the series, he was able to simulate surfaces ranging from the stepped surface represented by the source units to a smooth surface similar to one interpolated using inverse distance-weighting. Between these extremes, irregular periodicities were generated in the surface. Clarke was able to incorporate all of these diverse hypotheses about the actual surface while maintaining the pycnophylactic property.

E. Summary

The research into the areal data reaggregation problem, then, has yielded three basic methods: overlay, contour reaggregation, and the pycnophylactic method. Markoff and Shapiro confirmed that better estimates will generally be achieved when the source units are small relative to the target units.
In addition, Lam (1980) showed that the volume-preserving methods, overlay and the pycnophylactic method, clearly perform better than those that do not preserve volume. However, no difference in performance between the two volume-preserving methods has been demonstrated. This is surprising considering the difference in their estimating surfaces.

CHAPTER III. RESEARCH HYPOTHESES AND OBJECTIVES

Although Lam (1980) found no significant difference between the overlay and pycnophylactic data reaggregation methods for her study area, there are characteristics of the areal units and the actual distribution that may be useful in discriminating between the performance of the two methods. In particular, the absolute size of the areal units and the criteria used in delineating them could affect the relative performance of the methods. Variations in these two factors can change the validity of the choropleth and smoothness assumptions implied by the use of the overlay and pycnophylactic methods, respectively.

A. Effect of Source Unit Size on Data Reaggregation

It is well established that reaggregation from smaller to larger units will generally produce better estimates using any of the reaggregation methods. However, the absolute size of the source units may also play a role. Units of larger size may have less "opportunity" to capture areas of spatial homogeneity and thereby satisfy the choropleth assumption. This idea was introduced by Coulson (1978) in his paper on the potential for variation in areal units. Potential for variation is an index that Coulson devised for assessing the likelihood, or opportunity, for a single areal unit to contain a homogeneous distribution. It is based on the hypothesis that larger and less compact areal units are less likely to be homogeneous. The empirical evidence in support of this hypothesis is scanty, but the idea has intuitive appeal.
Since the overlay method is based on the choropleth assumption, larger source units may result in poorer performance for this method.

B. Effects of Source Unit Delineation Criteria on Data Reaggregation

The criteria, if any, used to delineate the source units can give clues to the configuration of the actual surface. There are other, preferred methods for obtaining information on actual surface configuration, namely: 1) smaller source units and 2) data for intra-unit features such as the locations and populations of cities. However, this kind of information can be difficult to obtain. Often, verbal descriptions of the delineation criteria are the best information at our disposal about the intra-unit configurations. Our ability to model this configuration for a study area, i.e., to generate an accurate estimating surface, could strongly affect the accuracy of the reaggregated data.

C. Research Objectives

The present study replicates Lam's experimental methodology using units that are significantly different in size and delineation criteria. It uses source units that are much larger than Lam's census tracts and that are delineated in a manner that should favor the pycnophylactic method. In other words, an attempt is made to ascertain whether source unit size and delineation will lead to a significant difference between the two methods.

The performances of the two volume-preserving data reaggregation methods are compared using the 48 contiguous states as target units and 181 economic areas, delineated by the U.S. Bureau of Economic Analysis, as source units (fig. 3.1). The economic areas completely cover the area of the United States.

The sets of delineation criteria for Lam's census tracts and the economic areas are listed in Table 3.1. These criteria give some indication of the configuration of the population density surface within the source units.
They can be used to hypothesize idealized, intra-unit population distributions for each set of units, and a comparison of these idealized distributions with the estimating surfaces used by the data reaggregation methods suggests which method will perform better.

Figure 3.1. Source Units and Target Units: a. Source Units: Economic Areas; b. Target Units: 48 Contiguous States; c. Source Units and Target Units Superimposed

Table 3.1. Source Unit Delineation Criteria

A. Census Tracts in London, Ontario (used by Lam)
   1. Boundaries follow permanent, easily identified features such as roads and streams.
   2. Total population between 2500 and 8000 for each tract, except within central business district.
   3. Socio-economic homogeneity within each tract.
   4. Compact shape.

B. Economic Areas in the United States (used in present study)
   1. Each area disaggregates into counties.
   2. An economic node, usually an SMSA, within each area.
   3. Some areas contain smaller SMSA's as secondary nodes.
   4. Cities with population above 25000 function as nodes in regions where no SMSA's occur.

Figure 3.2. Idealized Surface Configurations: a) Within an Economic Area; b) Within a Census Tract

Of the two sets of criteria, the one for economic areas reveals more about the actual surface configuration. The idea that each area contains a predominant node or central place (criterion B2) surrounded by tributary counties suggests an idealized distribution that is bivariate normal. A cross-section of this distribution is shown in figure 3.2a. This idealized economic area distribution corresponds closely to the "peak" that is interpolated using the pycnophylactic method when a source unit is surrounded by neighbors with lower densities (fig. 2.2c), i.e., when it is a local maximum. Thus, the pycnophylactic method should produce better estimates for states intersecting economic areas that are local maxima and worse estimates when the economic areas are local minima.
The difference in the errors associated with each method will generally be greatest in the locally high-valued economic areas, where the pycnophylactic method performs best. Based on the size and delineation criteria for economic areas, then, it is anticipated that the pycnophylactic method will yield superior estimates.

Although the criteria for census tracts indicate nothing specific about the actual surface configuration, some information can be hypothesized. Permanent linear features (criterion A1) often separate land uses that are associated with different population densities. That is, they may represent discontinuities in the population surface. Socio-economic homogeneity (criterion A3) within a tract may imply some degree of discontinuity in the population surface. For instance, regions of lower socio-economic status are often associated with higher population densities. Finally, if we accept Coulson's hypothesis, compact shape (criterion A4) reduces the potential for variation of the census tracts. Thus, there is some justification for proposing a model for the census tract distributions similar to the stepped surface shown in cross-section in figure 3.2b. This model resembles the estimating surface associated with the overlay method, the source units themselves. The overlay method did, in fact, have a higher percentage of superior estimates for Lam's planning districts (71%).

To summarize, the objective of this research is to determine whether there are significant differences in the performance of the volume-preserving methods of areal data reaggregation. It is argued that the absolute size and delineation criteria for source units can be important discriminating variables, and the units selected for this study are markedly different in these two factors from the units used previously. The use of economic areas, rather than census tracts, as source units may favor the pycnophylactic method.

CHAPTER IV.
RESEARCH METHODOLOGY

The general research strategy has precedent in the earlier studies. A set of small units that aggregate into both the source and target units is used as an approximation of the actual density surface. The data for these small units are aggregated to give "actual" values for the source and target units. The data reaggregation methods are applied to the source units to yield estimated values for the target units. These estimates are then compared with the actual target unit values.

A. Data Sources and Compilation

U.S. counties in 1980 were used as the small units because they aggregate into both economic areas and states. The digitized map data for 3073 counties were obtained from a file distributed by the U.S. Bureau of the Census. The county population data were obtained from the County and City Data Book, 1980 (U.S. Bureau of the Census).

The organization of the data was then transformed from the original polygon, or vector, structure to a raster structure using the ERDAS 400 digitizing system. A grid cell approximately equal in size to Baltimore City, Maryland and Washington, D.C. was selected (167.3 sq. mi.). The original vector data had been digitized from a base map on an Albers Equal Area projection. The rasterization process resulted in the elimination of 433 counties from the file. This loss was probably an artifact of the rasterization algorithm and the file sequence of the digitized county data. A more detailed description of these two factors is provided here to account for the loss of data.

B. Rasterization Methods and the Loss of County Data

The vector-to-raster conversion process often operates sequentially, processing one feature (in this case, a polygon) at a time. In other words, it assigns a cluster of grid cells to each county in the file as it comes to it.
Depending on the criteria used for assignment, small counties processed earlier in the file can be lost because their grid cells are assigned to subsequent counties in the file. Some common assignment criteria are: 1) assign the grid cell to the county with the largest overlap area, 2) assign the grid cell to the county that overlaps the center point of the cell, and 3) assign the grid cell to the last county in the file that intersects it, regardless of the area of overlap. The third criterion is the crudest and would probably result in the loss of more counties than the other two, but any of these methods can result in some loss. Information on the algorithm used by the ERDAS system is proprietary.

The counties in the file were generally organized alphabetically within states that were also organized alphabetically. A consequence of this organization was that the loss of counties was biased toward those with names that come early in the alphabet, because they were processed earlier. Although many counties were lost, no state or economic area was left unrepresented; and because the surface without the lost counties became the surface of reference for all further analyses, the data loss should not affect the conclusions reached about the reaggregation methods.

C. Implementation of the Areal Data Reaggregation Methods

Once the population and map data were consolidated and aggregated into states and economic areas, the two reaggregation methods were applied to the economic areas. With raster data, the overlay method simply required aggregating all of the grid cell values for economic areas into states. The pycnophylactic interpolation process described in figure 2.1 was applied to the economic areas and allowed to iterate twelve times. The changes in individual cell values and in total populations for economic areas were closely monitored during this process. It was found that the rate of change for both had nearly decreased to zero after twelve iterations.
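As a rough illustration of the two procedures just described, the following Python sketch builds the stepped surface that the overlay method aggregates and then runs a simple smooth-and-adjust loop in the spirit of pycnophylactic interpolation. It is not the program used in this study: the function names, the tiny grid, the source totals, and the multiplicative form of the volume adjustment are all assumptions made for illustration, and figure 2.1 remains the authoritative description of the process. The smoothing step uses the mean of the non-diagonal neighbors that lie inside the study area.

```python
# Illustrative sketch only; names, data, and the multiplicative
# volume adjustment are assumptions, not the thesis program.

NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # non-diagonal neighbors


def zone_totals(grid, zones):
    """Sum grid cell values by zone label (None marks cells outside the study area)."""
    totals = {}
    for value_row, zone_row in zip(grid, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is not None:
                totals[zone] = totals.get(zone, 0.0) + value
    return totals


def stepped_surface(zones, source_totals):
    """Spread each source total evenly over its cells (the overlay estimating surface)."""
    counts = zone_totals([[1.0] * len(row) for row in zones], zones)
    return [[source_totals[z] / counts[z] if z is not None else 0.0 for z in row]
            for row in zones]


def smooth(grid, zones):
    """Give each cell the mean of its non-diagonal neighbors inside the study area."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if zones[r][c] is None:
                continue
            vals = [grid[r + dr][c + dc] for dr, dc in NEIGHBORS
                    if 0 <= r + dr < rows and 0 <= c + dc < cols
                    and zones[r + dr][c + dc] is not None]
            if vals:  # a cell with no neighbors keeps its value
                out[r][c] = sum(vals) / len(vals)
    return out


def adjust(grid, zones, source_totals):
    """Rescale each source zone so that its total matches the original value."""
    current = zone_totals(grid, zones)
    return [[v * source_totals[z] / current[z]
             if z is not None and current[z] > 0 else v
             for v, z in zip(value_row, zone_row)]
            for value_row, zone_row in zip(grid, zones)]


def cell_rmse(a, b):
    """Root mean square difference between two surfaces, cell by cell."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return (sum(diffs) / len(diffs)) ** 0.5


def pycnophylactic(zones, source_totals, iterations=12):
    """Alternate smoothing and volume adjustment; return the surface and change history."""
    grid = stepped_surface(zones, source_totals)
    history = []
    for _ in range(iterations):
        new_grid = adjust(smooth(grid, zones), zones, source_totals)
        history.append(cell_rmse(new_grid, grid))  # change since previous iteration
        grid = new_grid
    return grid, history


# Hypothetical example: two source units (0 and 1) and two target units ("W", "E").
source_zones = [[0, 0, 1, 1],
                [0, 0, 1, 1]]
target_zones = [["W", "E", "E", "E"],
                ["W", "E", "E", "E"]]
source_totals = {0: 400.0, 1: 40.0}

overlay_estimates = zone_totals(stepped_surface(source_zones, source_totals), target_zones)
surface, history = pycnophylactic(source_zones, source_totals)
pycno_estimates = zone_totals(surface, target_zones)
```

On this toy grid both sets of target estimates sum to the same total, because each estimating surface preserves the source unit volumes; only the allocation between the target units differs. The `history` list plays the role of the monitored change in cell values.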
Additional iterations probably would not have transformed the surface significantly. The final pycnophylactic surface was then aggregated into states to produce population estimates.

Some of the details of pycnophylactic interpolation are not shown in figure 2.1 and deserve mention. First, the smoothing routine assigned each grid cell the mean value of its non-diagonal neighbors, regardless of how many there were (zero to four). If a cell had no non-diagonal neighbors, then its value remained constant; and if a cell had only one neighbor, then it was assigned the neighbor's value. Second, cells at the boundary of the study area were not affected by their neighbors outside of the study area, which were assigned a constant value of zero. This means that there were "cliffs" at the edges of the study area even after interpolation. This boundary condition is more realistic along the coasts of the United States than along its Canadian and Mexican borders.

Some of the primary deficiencies of the research design were the approximation of the actual surface with county-level data, the loss of counties due to rasterization, the somewhat arbitrary selection of twelve iterations for the pycnophylactic interpolation process, and the unrealistic boundary conditions. It is felt that none of these weaknesses was important enough to affect the results substantially.

CHAPTER V.
RESULTS

The results of these data reaggregation experiments can be divided into three parts. First, since the number of iterations used in the pycnophylactic algorithm directly affects the target unit estimates, information on the convergence of the interpolation process is valuable. Second, the "goodness-of-fit" of the estimating surface (economic areas for overlay, the interpolated surface for the pycnophylactic method) to the actual one may be the most important factor affecting the overall performance of the reaggregation methods and needs examination.
And third, the target unit estimates themselves constitute the primary results. The overall performance of the methods is compared with their performance in Lam's study.

A. Convergence of the Pycnophylactic Interpolation Process

The interpolated surface was reaggregated into the target units after twelve iterations. The decision to stop at this point was arbitrary, but it can be supported by examining the change in the estimating surface during the interpolation process. Two statistics indicative of this change were monitored during this process. These statistics were 1) the RMSE between the values for each grid cell after successive smoothings, and 2) the RMSE between total target unit values before and after adjustment. The following formula was used:

$$\mathrm{RMSE}_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(O_{i,j} - O_{i,j-1}\right)^2}$$

where, for the first RMSE:
   $O_{i,j}$ is the value for grid cell $i$ after iteration $j$ ($j > 1$),
   $O_{i,j-1}$ is the value for grid cell $i$ after iteration $j-1$,
   $n$ is the number of grid cells,
and where, for the second RMSE:
   $O_{i,j}$ is the value for target unit $i$ after adjustment,
   $O_{i,j-1}$ is the value for target unit $i$ before adjustment,
   $n$ is the number of target units.

These RMSE's indicate the overall effectiveness of the smoothing routine and the volume-adjustment routine, respectively. The progression of their values is displayed in figure 5.1. Since the first RMSE was computed over 18060 grid cells, as opposed to 48 states, its value changed more smoothly. The rate of change of both curves had declined almost to zero by twelve iterations, which can be interpreted to mean that the smoothing and adjustment routines had lost most of their effectiveness. Additional iterations, then, probably would not have changed the surface significantly.

B. Goodness-of-Fit of the Estimating Surfaces

The "optimal" estimating surface would minimize the difference between itself and the actual surface, and the trend of this difference during the pycnophylactic interpolation process indicates whether or not the process is improving the surface. RMSE can be used again to summarize this error in the interpolated surface.

Figure 5.2 shows the succession of RMSE values between the interpolated and actual (county) values for all grid cells. Iteration 0 is the pre-interpolated (economic area) surface, and iterations 1 through 12 are the increasingly smooth, pycnophylactic surfaces. The error decreases monotonically, in a manner similar to the convergence of the pycnophylactic process represented in figure 5.1. The pycnophylactic surface after 12 iterations, then, was generally a better approximation than any of the previous ones, including the economic area surface used to derive overlay estimates. This is an important result, because it suggests that the target estimates using the pycnophylactic method should be an improvement over those using overlay. But positive and negative errors in the estimating surface can offset one another when they are aggregated into the target units, so the aggregation can hide the improvement.

The goodness-of-fit can also be viewed spatially using maps of the cell-by-cell difference between the estimated surfaces and the actual one. The maps displayed in figure 5.3 correspond to iterations 0 (5.3a and 5.3b) and 12 (5.3c and 5.3d) in figure 5.2. They show the error in the estimating surfaces used by the overlay and pycnophylactic methods. For visual clarity, the values are classed and separated into their positive and negative components. The two estimating surfaces must be very similar, judging by their similar error distributions. The positive errors are extensive and the negative errors are intensive and concentrated in urban areas. Neither method, then, is able [...]
[Figure 5.1. Convergence of the Pycnophylactic Interpolation Process. a. Change in source unit population due to volume adjustment; b. Change in estimating surface between successive smoothings. Both panels plot RMSE against iteration.]

[Figure 5.2. Error in the Estimating Surface. RMSE (x100) plotted against iteration.]

[Figure 5.3a. Error in the Estimating Surface, Positive Error: Economic Areas]

[Figure 5.3b. Error in the Estimating Surface, Negative Error: Economic Areas]

[Figures 5.3c and 5.3d. Error in the Estimating Surface after 12 iterations; captions illegible in source.]

[Table 5.1 (cont'd.). Entries illegible in source.]

Table 5.2. Summary Statistics for Actual and Estimated Values for States

Est. Method    RMSE         R         Slope      Intercept
               (RRMSE)      (R2)      (S.E.)     (S.E.)
Overlay        773094.8     .985      .946 +     214946.655
               (.182)       (.971)    (.024)     (141770.957)
Pycno.         757856.9     .984      .950 +     201315.690
               (.189)       (.968)    (.025)     (147942.336)

+ Significant at 95% confidence level

[Figure 5.4a. Population Density, 1980: Actual (after rasterization). Classes, in people per grid cell: 0 to 5000, 5000 to 15000, 15000 to 25000, 25000 to 50000, 50000 to 75000, 75000 to 100000, over 100000.]

[Figure 5.4b. Target Unit Population Density, 1980: Overlay Method. Same classes as figure 5.4a.]

[Figure 5.4c. Target Unit Population Density, 1980: Pycnophylactic Method. Same classes as figure 5.4a.]

[Figure 5.5a. Target Unit Error, Positive Error: Overlay Method. Classes, in percent of actual population: 25 to 50, 50 to 100, over 100. Starred boundaries coincide with economic areas.]

[Figure 5.5b. Target Unit Error, Negative Error: Overlay Method. Class, in percent of actual population: -25 to -50.]

[Figure 5.5c. Target Unit Error, Positive Error: Pycnophylactic Method. Classes, in percent of actual population: 25 to 50, 50 to 100, over 100.]

[Figure 5.5d. Target Unit Error, Negative Error: Pycnophylactic Method. Class, in percent of actual population: -25 to -50.]

RMSEs are very similar, but the RRMSEs exceed Ford's arbitrary cut-off value of .10 considerably. Since none of the empirical tests of areal data reaggregation methods have demonstrated accuracy as high as that prescribed by Ford, it may be that a higher cut-off value would be more realistic.

The values for states with highly urbanized neighbors (e.g., Delaware, Indiana, Kentucky, New Hampshire, Vermont, and West Virginia) were often severely overestimated. This result is probably the counterpart, at this smaller geographic scale, of the suburban errors that Lam observed. The most glaring errors occurred in Delaware and New Hampshire. These errors can be largely attributed to the relatively larger size of the economic areas that overlap these states.
The small size of Lam's data set (21) limits the validity of parametric statistical tests that could otherwise be used to compare these results with hers. However, the data do suggest some improvement for the pycnophylactic method relative to overlay. In particular, the pycnophylactic method yielded better estimates for only 29% (6) of Lam's planning districts versus 59% (26) of the states. Although the regression statistics can be used as indicators of the accuracy of the methods, they are only summary statistics. For a particular application, the number of improved estimates that one obtains by using one method or the other is also important.

Given an expectation of equal accuracy for the methods, a chi-square test for a relationship between performance of the reaggregation method and units used was performed. Although the units used cannot be separated from other differences in the two studies that could cause differences in performance, it is felt that the units are clearly the most important methodological difference. Table 5.3 shows the contingency table used in this test. The computed chi-square statistic from these data is 5.29. This value allows us to conclude, at the 95% confidence level, that there is a relationship between the performance of the method and the spatial characteristics of the areal units used in the study.

Finally, a binomial difference of proportions test (Hammond and McCullagh, pp. 154-157), similar to a t-test for difference of means, was conducted to determine whether the observed proportion of successes for the pycnophylactic method (.59) is significantly greater than the proportion that would be expected if the methods performed identically (.50). The difference of proportions test statistic has a standard normal probability distribution and had a computed value of 0.882 for this data set. The hypothesis that the proportion of successes is equal to .50 cannot be rejected at the 95% confidence level.
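As a check on the chi-square test described above, the statistic can be recomputed from the observed counts in Table 5.3. The Python sketch below derives the expected counts from the row and column totals and sums the usual squared deviations. The variable names are illustrative, and the critical value of 3.841 (one degree of freedom, 95% level) is taken from standard chi-square tables rather than from the text.

```python
# Observed counts from Table 5.3: rows are source units, columns are the
# method that gave the better estimate (overlay, pycnophylactic).
observed = {
    "census tracts": (15, 6),
    "economic areas": (18, 26),
}

row_totals = {unit: sum(counts) for unit, counts in observed.items()}
col_totals = [sum(counts[k] for counts in observed.values()) for k in range(2)]
grand_total = sum(row_totals.values())

# Expected count under independence: row total * column total / grand total.
expected = {
    unit: tuple(row_totals[unit] * col_totals[k] / grand_total for k in range(2))
    for unit in observed
}

# Pearson chi-square statistic (no continuity correction).
chi_square = sum(
    (observed[unit][k] - expected[unit][k]) ** 2 / expected[unit][k]
    for unit in observed
    for k in range(2)
)

CRITICAL_95_DF1 = 3.841  # tabled value, assumed here; not stated in the text
significant = chi_square > CRITICAL_95_DF1

print(round(chi_square, 2), significant)
```

The expected counts agree with the parenthesized values in Table 5.3, and the statistic comes to approximately 5.30, which matches the reported 5.29 up to rounding and exceeds the tabled critical value.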
There is a fairly high probability that a proportion as large as .59 occurred completely by chance. This result suggests that there is little difference between the two methods for a given set of data.

Table 5.3. Relationship of Success of Reaggregation Method to Source Units Used

                   Method
Units              Overlay        Pycno.
Census Tract       15 (10.66)     6 (10.34)
Econ. Area         18 (22.34)     26 (21.66)

* Expected value in parentheses

To summarize, the results reinforce many of the findings of previous studies. High errors are more likely to occur in target units estimated from relatively larger source units and in predominantly suburban target units. The overlay and pycnophylactic methods performed nearly identically and yielded fairly accurate and precise estimates. A new finding is that, despite the fact that the pycnophylactic estimating surface was generally a better approximation of the actual one, this improvement was not translated into significantly better target unit estimates. Finally, there is some support for the hypothesis that the size and delineation criteria of the source units can affect the relative accuracy of the two methods. For this individual study, the units selected resulted in superior performance for the pycnophylactic method.

CHAPTER VI.
SUMMARY AND CONCLUSIONS

This study has compared the performance of two of the best available methods for reaggregating areal data into different units. The primary motivation for the work was the finding, in a previous study (Lam, 1980), that there was no difference in the performance of the methods. This was a puzzling result because of the difference in their estimating surfaces, the approximations of the actual surface that are aggregated into the target units. The overlay method aggregates the source units themselves, a stepped estimating surface, and the pycnophylactic method aggregates a smooth estimating surface interpolated from the source units.
A characteristic of both of these methods that improves their performance relative to others is that their estimating surfaces both preserve the total value, the volume, within the source units.

The units selected for this study, economic areas and states, have spatial characteristics that 1) are markedly different from those used previously (census tracts and planning districts), and 2) should result in greater accuracy for the pycnophylactic method. In particular, the units were much larger and the delineation criteria for the economic areas suggested an intra-unit distribution that is approximately bivariate normal, rather than the homogeneous distribution hypothesized for census tracts. The small size and hypothesized homogeneity of the census tracts probably favored the overlay method in Lam's study.

The results, like Lam's, do not strongly favor either method. Although both methods are fairly accurate and precise, there was no significant difference in their estimates. Also, although the pycnophylactic method had a higher proportion of successes than overlay, it was not significantly greater than the proportion that would be expected if the methods performed identically. The estimates were compared only after the pycnophylactic interpolation process had almost completely converged.

When the results were compared with those from Lam's study, a significant difference was found in the relative success of the two methods. There is some support, then, for the hypothesis that the spatial characteristics of the units affect the relative performance of the methods.

The highest errors occurred most frequently when the target units intersected source units that were relatively large or highly urbanized. In order to maintain the volume-preserving constraint for the source units, the negative errors (usually located at the urban centers) in the estimating surface had to be offset by positive errors in the neighboring (suburban) areas. These positive errors were then allocated to the suburban target units.
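The offsetting of errors described above can be seen in a minimal numerical sketch. The five-cell transect, its values, and the assignment of cells to target units are hypothetical and chosen only to make the arithmetic easy to follow; the estimating surface is the uniform (overlay) surface for a single source unit.

```python
# Hypothetical transect across one source unit with an urban peak in the middle.
actual = [10.0, 10.0, 80.0, 10.0, 10.0]

# Overlay estimating surface: the source unit total spread evenly over its cells.
source_total = sum(actual)
estimate = [source_total / len(actual)] * len(actual)

# Cell-by-cell error (estimate minus actual): negative at the peak, positive elsewhere.
cell_errors = [e - a for e, a in zip(estimate, actual)]

# Volume preservation forces the errors to cancel within the source unit.
net_error = sum(cell_errors)

# Two hypothetical target units that split the source unit.
targets = {"suburban": [0, 1], "urban": [2, 3, 4]}
target_errors = {
    name: sum(cell_errors[i] for i in cells) for name, cells in targets.items()
}

print(cell_errors)    # [14.0, 14.0, -56.0, 14.0, 14.0]
print(net_error)      # 0.0
print(target_errors)  # {'suburban': 28.0, 'urban': -28.0}
```

The suburban target unit is overestimated by exactly the amount by which the urban one is underestimated, even though the source unit total is reproduced without error.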
In general, the better the fit of the estimating surface to the actual one, the more accurate the target estimates should be. However, this study has demonstrated that the improved fit may not be translated into better estimates because of the offsetting of positive and negative errors when they are aggregated into the target units. The pycnophylactic surface was a better representation of the actual one, but there were no significant differences in the pycnophylactic and overlay estimates for target units.

The maps of the errors in the estimating surfaces show that the two methods suffer from a common deficiency: they do not capture extremes in the actual surface well. The estimating surfaces for both methods hover about the mean density values of the source units, so they under- or over-estimate locations that deviate from the means.

While the pycnophylactic method is a theoretical improvement over the overlay method, it misrepresents the actual surface in a manner somewhat analogous to the interpolation of a topographic surface using distance-weighted averaging of control points. That is, the interpolated surface is continuous and smooth, or "undulating", while the actual surface is continuous but sometimes "jagged". The presence of sudden extremes may be a characteristic of population (and related) density surfaces, and such extremes may even be largely limited to a range of geographic scales. The overlay and pycnophylactic methods should perform better when the actual density surface has few extremes.

CHAPTER VII.
RECOMMENDATIONS FOR FURTHER STUDY

The aggregation of estimating surfaces into target units is the step in the areal data reaggregation process that distinguishes it from the problem of geographic surface modelling. The objective of surface models is usually to represent some actual surface as accurately as possible. This study has shown that the aggregation step acts as a sort of filter on the errors in the estimating surface.
A better estimating surface does not necessarily produce better target estimates. This problem notwithstanding, accurate surface modelling is still a worthy objective in the study of areal data reaggregation: an accurate estimating surface will probably do more to provide good estimates than any other factor. With this in mind, three avenues for future research on areal data reaggregation are suggested.

First, different variables should be used in future empirical tests. The pycnophylactic method will probably model an undulating surface better than the seemingly jagged population density surface used here.

Second, Clarke's Fourier adaptation of the pycnophylactic method should be applied to economic areas or similar units. The Clarke algorithm is designed specifically to capture periodicity, such as that exhibited by geographic central places. It is this method, more than any of the other available ones, that has the potential to capture local extremes in the surface.

Third, and finally, recent research (Clarke, 1984; Wallin, 1984) is demonstrating that the pycnophylactic concept, interpolation constrained by volume-preservation, is very flexible. Many different assumptions, or types of detail, can be applied to the surface while preserving the source unit values. One of the major reasons that the pycnophylactic method did not convincingly out-perform overlay in this study is that it interpolates a "peak" within source units only in special cases, i.e., when the neighbors are of lower density. Almost all of the economic areas have one predominant "peak", an SMSA, near their centers. Thus, a bivariate normal distribution was proposed here as a model of the population density surface within an economic area. To constrain a pycnophylactic surface to bivariate normality within source units is a worthwhile research task that could significantly improve the pycnophylactic estimates for this data set.

BIBLIOGRAPHY

Boneva, L., D. Kendall and I.
Stefanov (1971), "Spline Transformations: Three New Diagnostic Aids for the Statistical Data-Analyst", Journal of the Royal Statistical Society, Ser. B, vol. 33, no. 1, pp. 1-70.

Clarke, K. C. (1984), "Two-Dimensional Fourier Interpolation for Uniform Area Data", Technical Papers, 50th Annual Meeting of the American Society of Photogrammetry, vol. 2, pp. 835-845.

Coulson, M. R. C. (1978), "Potential for Variation: A Concept for Measuring the Significance of Variations in Size and Shape of Areal Units", Geografiska Annaler, Ser. B, vol. 60, no. 1, pp. 48-64.

Crackel, T. (1975), "The Linkage of Data Describing Overlapping Geographical Units -- A Second Iteration", Historical Methods Newsletter, vol. 8, no. 3, pp. 146-150.

Ford, L. (1976), "Contour Reaggregation: Another Way to Integrate Data", Papers, 13th Annual Conference of the Urban and Regional Systems Association, vol. 11, pp. 528-575.

Goodchild, M. F. and N. S-N. Lam (1980), "Areal Interpolation: A Variant of the Traditional Spatial Problem", Geo-Processing, vol. 1, pp. 297-312.

Hammond, R. and P. McCullagh (1974), Quantitative Techniques in Geography: An Introduction, Clarendon Press: Oxford, 318 pp.

Lam, N. S-N. (1980), "Methods and Problems of Areal Interpolation", Ph.D. Dissertation, University of Western Ontario, 177 pp.

Lam, N. S-N. (1983), "Spatial Interpolation Methods: A Review", The American Cartographer, vol. 10, no. 2.

Lam, N. S-N. (1985), "A Method for Choropleth Inversion", Technical Papers, 45th Annual Meeting of the American Congress on Surveying and Mapping, pp. 365-373.

Markoff, J. and G. Shapiro (1973), "The Linkage of Data Describing Overlapping Geographical Units", Historical Methods Newsletter, vol. 7, no. 1, pp. 34-36.

Tobler, W. (1979), "Smooth, Pycnophylactic Interpolation for Geographical Regions", Journal of the American Statistical Association, vol. 74, no. 367, pp. 519-536.

Tobler, W. and J. Lau (1978), "Isopleth Mapping Using Histosplines", Geographical Analysis, vol.
10, no. 3, pp. 273-279.

United States Bureau of the Census (1983), "County and City Data Book, 1983", U.S. Government Printing Office.

United States Bureau of Economic Analysis (1977), "BEA Economic Areas (revised 1977): Component SMSA's, Counties, and Independent Cities", U.S. Government Printing Office.

Wallin, E. (1984), "Isarithmic Maps and Geographical Disaggregation", Proceedings, International Symposium on Spatial Data Handling, vol. 1, pp. 209-217.