This is to certify that the
thesis entitled
Areal Data Reaggregation:
A Comparison of Two Methods
presented by
Gustave William Rylander
has been accepted towards fulfillment
of the requirements for
Master of Arts degree in Geography
Major professor
Date 10/28/86

 

AREAL DATA REAGGREGATION:
A COMPARISON OF TWO METHODS

BY
Gustave William Rylander

A THESIS

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

MASTER OF ARTS

Department of Geography

1986


ABSTRACT
AREAL DATA REAGGREGATION:
A COMPARISON OF TWO METHODS

BY

Gustave William Rylander

Two methods for reaggregating data between sets of areal
units, "source units" and "target units", are compared.
With the overlay method, the value for each source unit is
allocated among the target units in proportion to its area
of overlap with each target unit. The pycnophylactic method
first interpolates an intermediate surface from the source
units and then aggregates it into the target units. A
previous case study with census tracts and planning
districts in London, Ontario, revealed no significant
differences in their performance. Two factors that may lead
to better performance by one method or the other, however,
are the size of the units and their delineation criteria.
The performance of the methods in estimating populations of
the 48 contiguous states from those of 181 economic areas is
compared with their performance in the previous study.
Despite the fact that pycnophylactic interpolation resulted
in a closer approximation to the actual surface, the
difference in the performance of the methods was not
significant. A difference was observed, however, between the
relative success of the methods in the two case studies.

ACKNOWLEDGEMENTS

I would like to thank my advisor, Judy Olson, and my
other committee members, Bruce Pigozzi and John Hunter, for
their valuable comments, criticisms, and support during the
development of this thesis. My department chairman, Gary
Manson, was also very patient and supportive, and I extend
my thanks to him. Cheers and the best of luck to three
friends and fellow graduate students: Ann Goulette, Charlie
Johnston, and Kim Medley. Bruce, Murray, and Matt provided
me with a comfortable home and plenty of pizza during my
last month at Michigan State. Finally, my parents, Gus and
Nancy Rylander, offered their constant support, and I want
to express my gratitude and love to them.

TABLE OF CONTENTS

List of Tables
List of Figures

Chapter I. Introduction

Chapter II. Previous Research on Areal Data Reaggregation
  A. Development of Areal Data Reaggregation Methods
     1. The Overlay Method
     2. The Contour Reaggregation Method
     3. The Pycnophylactic Method
  B. Variations on the Methods
  C. Empirical Comparison of the Areal Data Reaggregation Methods
  D. Applications and Adaptations of the Methods
  E. Summary

Chapter III. Research Hypotheses and Objectives
  A. Effect of Size of Source Units on Areal Data Reaggregation
  B. Effect of Source Unit Delineation Criteria
     on Areal Data Reaggregation
  C. Research Objectives

Chapter IV. Research Methodology
  A. Data Sources and Compilation
  B. Rasterization Methods and the Loss of County Data
  C. Implementation of the Areal Data Reaggregation Methods

Chapter V. Results
  A. Convergence of the Pycnophylactic Interpolation Process
  B. Goodness-of-Fit of the Estimating Surfaces
  C. Target Unit Estimates and Errors

Chapter VI. Summary and Conclusions

Chapter VII. Recommendations for Further Study

Bibliography

LIST OF TABLES

2.1. Population Data for Planning Districts
2.2. Summary Statistics for Actual and Estimated Values
     for Planning Districts
3.1. Source Unit Delineation Criteria
     A. Census Tracts: London, Ontario
     B. Economic Areas: United States
Population Data for States
Summary Statistics for Actual and Estimated Values
     for States
Relationship of Success of Reaggregation Methods to
     Source Units Used

LIST OF FIGURES

2.1. Interpolating a Smooth, Pycnophylactic Surface:
     Iteration 1
2.2. Effects of Values of Neighboring Units on
     Pycnophylactic Surface Configuration
     a. Trough
     b. Slope
     c. Peak
3.1. Source Units and Target Units
     a. Source Units: Economic Areas
     b. Target Units: 48 Contiguous States
     c. Source Units and Target Units Superimposed
3.2. Idealized Surface Configuration
     a. Within an Economic Area
     b. Within a Census Tract
Convergence of the Pycnophylactic Interpolation Process
     a. Change in Source Unit Population
        Due to Volume-Adjustment
     b. Change in Estimating Surface
        Between Successive Smoothings
Error in the Estimating Surfaces
Error in the Estimating Surfaces
     a. Positive Errors: Economic Areas
     b. Negative Errors: Economic Areas
     c. Positive Errors: Pycnophylactic Surface
     d. Negative Errors: Pycnophylactic Surface
Target Unit Population Density
     a. Population Density, 1980: Actual
     b. Population Density, 1980: Overlay Method
     c. Population Density, 1980: Pycnophylactic Method
Target Unit Error
     a. Positive Error: Overlay Method
     b. Negative Error: Overlay Method
     c. Positive Error: Pycnophylactic Method
     d. Negative Error: Pycnophylactic Method

CHAPTER I. INTRODUCTION

Geographers often use data that are associated with
areal units. The boundaries of different sets of
overlapping units are usually delineated by different
institutions for their own purposes and do not always
coincide. A result of this non-coincidence is that the sets
of data that are available for a region may not be
compatible. This is a problem for the researcher who needs
to relate the data from different sets of units either
statistically or visually on maps. For example, the
hypothesis that income is geographically related to
political behavior is difficult to test empirically when the
boundaries for census tracts do not respect those of
political wards. Assuming that a census of one of the sets
of units is not feasible, the solution is to estimate the
data for one set, the target units, from those of the other
set, the source units. The term areal data reaggregation is
used here to refer to the entire family of methods by which
values for target units are estimated from source units.
Previous research (Lam, 1980) has shown that two of the
available methods of areal data reaggregation, the overlay
and pycnophylactic methods, are clearly better than the
others. Overlay reaggregates the data directly from the
source units into the target units, while the pycnophylactic
method first interpolates an intermediate, continuous
surface from the source units and then reaggregates it into
the target units. The two methods, then, aggregate
different estimating surfaces, approximations of the
actual surface, into the target units. The estimating
surface represents an assumption about the configuration of
the actual surface.

In a previous study (Lam, 1980) the overlay and the
pycnophylactic methods performed nearly identically, and it
is not known whether their relative accuracy and precision
are stable when units with different spatial characteristics
are used. Information about the actual surface is important
to the selection of an appropriate data reaggregation method
because the configuration of the actual surface affects the
validity of the assumptions implied by the use of the
different estimating surfaces.

This study isolates two of the factors that may affect
the performance of the methods, the size (or aggregation
level) of the areal units and the criteria used to
delineate their boundaries. The size of the source units
may affect the likelihood that the actual distribution is
homogeneous within their boundaries. Larger units may be
less likely to contain homogeneous distributions. The
delineation criteria often provide more detailed information
about the configuration of the actual surface, which
strongly affects the accuracy of the estimates. In other
words, they suggest the kind of estimating surface that may
yield good estimates.

The performance of the two reaggregation methods is
compared using the results from an earlier study along with
parallel information for a new set of units that is clearly
different in these two factors. Although a single
comparison cannot greatly improve the confidence with which
we choose a method, this study may suggest some
characteristics of size and delineation criteria to consider
beforehand.

CHAPTER II. PREVIOUS RESEARCH ON AREAL DATA REAGGREGATION

Research on the areal data reaggregation problem has
yielded three basic reaggregation methods: overlay, contour
reaggregation, and the pycnophylactic method. Systematic,
if limited, comparisons of their performance have also been
made.

A. Development of the Areal Data Reaggregation Methods
1. The Overlay Method

Markoff and Shapiro (1973) demonstrated and empirically
tested the simplest method of areal data linkage, the
overlay method. This method reaggregates data directly
from the source units into the target units, taking each
target unit value as a weighted sum of the source unit
values. The weights are the areas of overlap between each
target unit and the source units. The contribution of each
source unit to a target unit, then, is proportional to its
area of overlap. Different formulas for overlay are
necessary depending on whether the source unit data are
absolute numbers or percentages.

For absolute numbers,

$$\hat{V}_i = \sum_j V_j \, \frac{A_{ij}}{A_j} \qquad (1)$$

where $\hat{V}_i$ is the estimated value for target i,
$V_j$ is the actual value for source j,
$A_{ij}$ is the area of overlap between i and j, and
$A_j$ is the area of j.

For percentages,

$$\hat{V}_i = \sum_j V_j \, \frac{A_{ij}}{A_i} \qquad (2)$$

where $A_i$ is the area of i.
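
As a minimal sketch of formulas 1 and 2, the following Python functions compute target unit estimates from a matrix of overlap areas. The function names, the list-of-lists layout of the overlap matrix, and the example figures are illustrative assumptions and are not taken from Markoff and Shapiro.

```python
def overlay_counts(source_values, source_areas, overlap):
    """Formula 1 (absolute numbers).

    overlap[i][j] is the area shared by target i and source j.
    Each source value is split in proportion to overlap / source area.
    """
    return [
        sum(v * a_ij / a_j
            for v, a_j, a_ij in zip(source_values, source_areas, row))
        for row in overlap
    ]


def overlay_percentages(source_values, target_areas, overlap):
    """Formula 2 (percentages).

    Each target value is the overlap-weighted mean of the source values,
    with the target's own area as the divisor.
    """
    return [
        sum(v * a_ij for v, a_ij in zip(source_values, row)) / a_i
        for row, a_i in zip(overlap, target_areas)
    ]


# Hypothetical example: two source units and two target units.
source_pop = [100.0, 300.0]      # absolute numbers per source unit
source_pct = [20.0, 40.0]        # a percentage variable per source unit
source_area = [10.0, 20.0]
target_area = [15.0, 15.0]
overlap = [[10.0, 5.0],          # target 0: all of source 0, part of source 1
           [0.0, 15.0]]          # target 1: the rest of source 1

target_pop = overlay_counts(source_pop, source_area, overlap)
target_pct = overlay_percentages(source_pct, target_area, overlap)
```

In this hypothetical case the targets completely cover the sources, so the estimated absolute numbers sum to the source total.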

Using data for 18th Century French generalites and
departements, Markoff and Shapiro estimated the populations
of each of these sets of areal units from the other. They
made their area measurements with a grid overlay. Using
equation (1), the estimates had a correlation of .53 when
estimating data for departements from those of generalites
and .96 when the direction was reversed. The higher
correlation was obtained when estimating the values for the
relatively larger generalites. This improvement is
expected, because the departements provide a more detailed
representation of the actual population density surface. In
general, for any areal data reaggregation method, better
estimates will be obtained when data are reaggregated from
smaller units to relatively larger units.

Errors in the overlay method occur because of
intra-unit deviations from the single density value
assigned to each of the source units. In other words, the
implicit assumption of the overlay method is spatially
homogeneous density within the source units. This has been
called the choropleth assumption, after the mapping method
commonly used for areal data.

Crackel (1975) presented modifications of the overlay
formulas for special cases in which the area of overlap is
either less than or greater than the area of the target
unit. This can happen when the target unit is not
completely covered by source units or when more than one
set of source units (overlapping one another) is used to
estimate the value of a target unit. He suggested
multiplying the weighted sum used in formulas 1 and 2 by
the ratio of the target area to the total area of overlap.
This correction reduces the estimated value when the target
area is smaller than the area of overlap and increases it
when the target area is greater, and it should often
reduce the severity of errors.
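
The correction just described amounts to a single multiplication. The sketch below is one possible reading of it; the function name and the example areas are hypothetical and are not taken from Crackel.

```python
def adjust_for_coverage(weighted_sum, target_area, overlap_areas):
    """Scale an overlay estimate by target area / total area of overlap.

    overlap_areas lists the overlap between one target unit and every
    source unit that intersects it.
    """
    total_overlap = sum(overlap_areas)
    if total_overlap == 0:
        raise ValueError("target unit is not overlapped by any source unit")
    return weighted_sum * target_area / total_overlap


# Hypothetical target of area 20 that is only partly covered (15 of 20):
# the estimate is increased.
under_covered = adjust_for_coverage(150.0, 20.0, [10.0, 5.0])

# Hypothetical target of area 20 covered twice by overlapping source sets
# (total overlap 40): the estimate is reduced.
double_covered = adjust_for_coverage(300.0, 20.0, [20.0, 20.0])
```

When the source units cover the target exactly once, the ratio is 1 and the estimate is unchanged.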
2. The Contour Reaggregation Method

Ford (1976) contributed an alternative to the overlay
method. The contour reaggregation (CR) method involves
interpolating values for a regular grid from those given
for control points located at the centers-of-gravity of the
source units and reaggregating the interpolated values into
the target units. The primary difference between CR and
overlay, then, is that CR transforms the source units into
a smooth, disaggregated surface before aggregating it into
the target units. This intermediate step is intended to
improve the representation of the actual surface by
introducing a degree of spatial autocorrelation into the
estimating surface. The value of a spatially-distributed
variable at a particular location is usually positively
related to nearby values, but the aggregation of the
surface into source units hides some of this information.
Interpolation can recover some of the lost detail.

Ford's interpolation method was similar to the one used
by the SURFACE II program. He fit a second-order trend
surface equation to the control points using weighted least
squares (weight = 1/d^2) of the nearest 8 control points.
Then the equation was used to predict the values at the
nodes of the regular grid. He estimated rent values for 10
postal zones from those of 15 census tracts in Dade County,
Florida. Although his sample size was too small to produce
a reliable measure of statistical association, he reported
a relative root mean squared error (RRMSE) of .255 between
the actual and estimated values. RRMSE is a root mean
squared error (RMSE) standardized about the mean of the
estimated values:

$$\mathrm{RRMSE} = \frac{\mathrm{RMSE}}{\bar{E}} = \frac{\sqrt{\sum_i (O_i - E_i)^2 / N}}{\sum_i E_i / N} \qquad (3)$$

where $E_i$ and $O_i$ are the expected and observed values.

 

 

Ford suggested that methods yielding an RRMSE of around
.10 could be considered fairly accurate.
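
The RRMSE statistic can be computed in a few lines. This is a minimal sketch that follows the verbal definition above (RMSE divided by the mean of the estimated values); the function names and the sample figures are hypothetical.

```python
import math


def rmse(observed, expected):
    """Root mean squared error between observed and expected values."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, expected)) / n)


def rrmse(observed, expected):
    """RMSE standardized about the mean of the expected (estimated) values."""
    mean_expected = sum(expected) / len(expected)
    return rmse(observed, expected) / mean_expected


# Hypothetical actual and estimated values for three target units.
actual = [100.0, 200.0, 300.0]
estimated = [110.0, 190.0, 300.0]
error = rrmse(actual, estimated)
```

Because the statistic is a ratio to the mean, it allows variables measured in different units, such as rents and populations, to be compared against the same rough threshold.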

Although CR may have a theoretical advantage over the
overlay method because of its incorporation of spatial
autocorrelation into the estimating surface, it also has a
critical disadvantage: interpolation from control points
generally does not preserve the total value, or volume, of
each of the source units. The common interpolation methods
often preserve the values at the control points, but these
values are inconsequential when areas are the units of

interpolation. If one assumes that the source unit data
were collected without error, then CR adds detail (however
realistic) to the surface at the expense of accuracy. The
next section traces the progression of research that proved
that this trade-off is unnecessary.
3. The Pycnophylactic Method

The problem left for subsequent research into areal
data reaggregation was to incorporate smoothness into the
estimating surface while preserving the volume of the
source units.

The solution to this problem had its origins in work by
Boneva, Kendall, and Stefanoff (1971). They introduced a
technique for interpolating yearly birth rates from data
given in five-year intervals. Aggregated, temporal data of
this sort can be displayed graphically as a histogram, with
the total births for a five-year period given by the area
of an individual bar. Histogram bars, however, hide some of
the serial autocorrelation usually exhibited by data
collected over time. The authors' objective, then, was to
replace the discrete bars with a continuous curve while
preserving the total area under the curve as a whole and
within five-year intervals. The curve was generated as a
mathematical spline, which is essentially a series of
simple curves (or functions) pieced together end-to-end.

The resulting curve, called a histospline, adds
hypothetical detail to the intervals without changing the
accuracy of the original, aggregated data.

Tobler and Lau (1978) adapted the technique for the
two-dimensional case, i.e., for bivariate histograms. A
bivariate histogram has data associated with rectangular
cells instead of intervals on a line. The bars of a
univariate histogram are replaced by columns, with the
density and total value for each cell represented by its
column's height and volume, respectively. Tobler and Lau
demonstrated a geographic application of this method using
population data compiled by 0.25 square mile grid cells for
Ann Arbor, Michigan. When the surface interpolated from
these grid cells is threaded with isolines, an isopleth map
is produced. The primary motivation for their research, in
fact, was the improvement of isopleth mapping methods, not
areal data reaggregation.

Tobler (1979) developed the technique further by
adapting it to operate on irregular geographic units. He
accomplished this by disaggregating the units into a set of
small grid cells. In computer terminology, he used a
raster data structure. This is the digital counterpart of
the physical grid overlay that Markoff and Shapiro used to
make their area measurements. Tobler used the term
pycnophylactic (volume-preserving) to refer to the new
interpolation method. He also provided a useful analogy to
enhance its intuitive appeal. He suggested thinking of the
original stepped surface as a clay model. If the
pycnophylactic interpolation method were to operate
physically, it would sculpt the clay such that the surface
would become as smooth as possible without moving any clay
between neighboring columns.

Tobler's study, then, was the final step in maintaining
source unit accuracy (like overlay) while interpolating
realistic detail into the surface (like CR), and the
pycnophylactic method has obvious appeal as a method for
areal data reaggregation.

Algorithms for pycnophylactic interpolation are
described in Tobler (1979) and Lam (1980, 1983). A simple
raster-based algorithm, based on the one used by Lam (1980,
1983), is described below because an understanding of the
interpolation process is necessary in order to understand
the pycnophylactic estimating surface.

The pycnophylactic interpolation algorithm is shown
graphically in figure 2.1. First, the grid cell values
used to identify the source units are replaced with the
corresponding density values. The surface is then smoothed
using an arbitrary smoothing operator (e.g., each grid cell
is assigned the mean of its four non-diagonal neighbors).
Finally, the total values within the source unit boundaries
are recovered by multiplying each grid cell by the ratio of
the original source unit value to the new (smoothed) value.
This process is repeated an arbitrary number of times.

Since the smoothing routine reduces differences in
neighboring grid cells, it operates from the edges of the
source units (i.e., from the "cliffs" between neighboring
units) inward. The monotonic reduction of differences also
implies that the process converges; that is, the changes in
the surface become smaller with each iteration.
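
The three steps described above (assign densities, smooth, adjust volumes) can be sketched in Python as follows. This is an illustrative reading of the raster algorithm, not a transcription of Lam's or Tobler's programs. The function name, the treatment of cells on the edge of the grid (off-grid neighbors repeat the edge cell), the fixed iteration count, and the example grid are all assumptions.

```python
def pycnophylactic(zones, totals, iterations=20):
    """Smooth a stepped density surface while preserving unit totals.

    zones  -- 2-D list of source unit identifiers, one per grid cell
    totals -- dict mapping each identifier to that unit's total value
    """
    rows, cols = len(zones), len(zones[0])

    # Group the grid cells by source unit.
    members = {}
    for r in range(rows):
        for c in range(cols):
            members.setdefault(zones[r][c], []).append((r, c))

    # Step 1: replace unit identifiers with density values.
    surface = [[totals[zones[r][c]] / len(members[zones[r][c]])
                for c in range(cols)] for r in range(rows)]

    def cell(grid, r, c):
        # Assumed boundary condition: off-grid neighbors repeat the edge cell.
        return grid[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    for _ in range(iterations):
        # Step 2: assign each cell the mean of its four non-diagonal neighbors.
        smoothed = [[(cell(surface, r - 1, c) + cell(surface, r + 1, c) +
                      cell(surface, r, c - 1) + cell(surface, r, c + 1)) / 4.0
                     for c in range(cols)] for r in range(rows)]

        # Step 3: rescale each unit so that its original total is recovered.
        for unit, unit_cells in members.items():
            new_total = sum(smoothed[r][c] for r, c in unit_cells)
            ratio = totals[unit] / new_total if new_total else 0.0
            for r, c in unit_cells:
                smoothed[r][c] *= ratio
        surface = smoothed
    return surface


# Hypothetical 4 x 4 grid: unit "A" on the left, unit "B" on the right.
zones = [["A", "A", "B", "B"] for _ in range(4)]
totals = {"A": 8.0, "B": 40.0}
surface = pycnophylactic(zones, totals)
```

In this example the cells of unit A that border the denser unit B end up higher than the cells farther away, while the totals for A and B are unchanged. A convergence test on the change between successive surfaces could replace the fixed iteration count.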

Another characteristic of this interpolation algorithm
is that the configuration of the interpolated surface
within source units depends entirely on the densities of
neighboring units. Figure 2.2 shows how the locations of
maxima and minima within a unit are affected by neighboring
units. If all of the neighbors have higher densities, a
"trough" will be interpolated within the unit (fig. 2.2a).
If the neighbors have a combination of higher and lower
densities on opposite sides then the surface will slope
from one side of the unit to the other (fig. 2.2b).
Finally, a "peak" is interpolated if the neighboring units
all have lower densities (fig. 2.2c).
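
The dependence on neighboring densities can be illustrated with a one-dimensional analogue of the algorithm, in which three units of three cells each lie along a line. This is a sketch under stated assumptions: the smoothing operator here is a three-cell moving average that includes the cell itself (the text notes that the operator is arbitrary), the end cells are repeated beyond the edges, densities are assumed positive, and all names and values are hypothetical.

```python
def smooth_profile(unit_densities, cells_per_unit=3, iterations=10):
    """Interpolate a 1-D row of equal-sized units, preserving unit totals."""
    n = len(unit_densities) * cells_per_unit
    totals = [d * cells_per_unit for d in unit_densities]
    profile = [unit_densities[i // cells_per_unit] for i in range(n)]

    for _ in range(iterations):
        # Assumed smoothing operator: three-cell moving average, with the
        # end cells repeated beyond the edges of the profile.
        smoothed = [(profile[max(i - 1, 0)] + profile[i] +
                     profile[min(i + 1, n - 1)]) / 3.0 for i in range(n)]
        # Volume adjustment: restore each unit's original total
        # (positive densities assumed, so the divisor is never zero).
        for u, total in enumerate(totals):
            lo, hi = u * cells_per_unit, (u + 1) * cells_per_unit
            ratio = total / sum(smoothed[lo:hi])
            for i in range(lo, hi):
                smoothed[i] *= ratio
        profile = smoothed
    return profile


def middle_unit(profile, cells_per_unit=3):
    """Return the cell values of the second (middle) unit."""
    return profile[cells_per_unit:2 * cells_per_unit]


# Middle unit with higher neighbors, mixed neighbors, and lower neighbors.
trough = middle_unit(smooth_profile([5.0, 1.0, 5.0]))
slope = middle_unit(smooth_profile([1.0, 3.0, 5.0]))
peak = middle_unit(smooth_profile([1.0, 5.0, 1.0]))
```

The middle unit's center cell is its lowest value in the first case and its highest in the third, and the values rise steadily across the unit in the second, which mirrors the three configurations of figure 2.2.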

B. Variations on the Methods

The research up to and including Tobler (1979), then,
had produced three basic methods for areal data
reaggregation: overlay, CR, and pycnophylactic. Variations
of all three methods are possible. Overlay can be
approximated by using unweighted averages of the
overlapping source units or simply the value of the source
unit with the largest overlap area. CR can use any of the
multitude of methods for interpolating from control points.
And the pycnophylactic method can use different smoothing
routines and boundary conditions. Overlay uses the source
units themselves as the estimating surface, while CR and
the pycnophylactic method use interpolated surfaces. The
estimating surfaces for overlay and pycnophylactic both
preserve the total volume within the source units.

Figure 2.1. Interpolating a Smooth, Pycnophylactic Surface:
Iteration 1

1. Replace unit identifiers with density values (see table below)
2. Smooth the surface
3. Adjust unit values (see table below)

ITERATION 1

             Initial   Initial   Smoothed   Ratio
Unit   Area  Pop.      Density   Pop.       (initial pop. / smoothed pop.)
1      6     30        5.0       32.83      0.914
2      13    78        6.0       79.00      0.987
3      15    105       7.0       103.75     1.012
4      6     48        8.0       45.42      1.057

Figure 2.2. Effects of Values of Neighboring Units on
Pycnophylactic Surface Configuration
a) Trough   b) Slope   c) Peak

C. Empirical Comparison of Areal Data Reaggregation
Methods

Although the performance of overlay and CR had been
assessed independently (Markoff and Shapiro, Ford), no
comparisons of the three methods were made until 1980. Lam
(1980) did not use the same interpolation algorithm for CR
as Ford but took the values at grid cell nodes as
distance-weighted averages of the control points. The
weight she used, 2^(-d/B), allows arbitrary specification of
the value of B. Higher B values result in more weight
allocated to distant points and, thus, a smoother surface.
To give CR the benefit of the doubt relative to the
volume-preserving methods, Lam chose a value of B that
minimized the difference between the CR surface and the
actual one (0.15 for her data set). She estimated
population data for 21 planning districts from 51 census
tracts in London, Ontario, and her results are listed in
Table 2.1. The summary statistics in Table 2.2 were
derived from these results.
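
The distance-weighted averaging used for the CR surface can be sketched as follows. The weight is taken here to be 2^(-d/B), which is a reading of the garbled expression in the scanned text and should be treated as an assumption, as should the function name and the control point values.

```python
import math


def cr_estimate(x, y, control_points, b):
    """Distance-weighted average of control point values at (x, y).

    control_points -- list of (x, y, value) tuples
    b              -- smoothing parameter B; larger values give distant
                      points more weight and a smoother surface
    Assumed weight: 2 ** (-d / b), with d the distance to the point.
    """
    numerator = 0.0
    denominator = 0.0
    for px, py, value in control_points:
        d = math.hypot(x - px, y - py)
        w = 2.0 ** (-d / b)
        numerator += w * value
        denominator += w
    return numerator / denominator


# Two hypothetical control points one distance unit apart.
points = [(0.0, 0.0, 10.0), (1.0, 0.0, 20.0)]
midpoint = cr_estimate(0.5, 0.0, points, b=0.15)
rough = cr_estimate(0.0, 0.0, points, b=0.15)
smooth = cr_estimate(0.0, 0.0, points, b=5.0)
```

With a small B the estimate at a control point stays close to that point's own value, while a large B pulls it toward the average of all points. Nothing in this procedure constrains the interpolated surface to reproduce the source unit totals, which is the weakness of CR noted earlier.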

The correlation coefficients from these experiments are
indicators of the precision of the estimation method, which
is clearly higher for the volume-preserving methods than
for the "optimal" CR method. The accuracy of the estimates
is indicated by the regression and RMSE statistics.
Perfect accuracy would result in a slope of 1, an intercept
of 0, and an RMSE of 0. Table 2.2 thus also indicates the
generally higher accuracy of the volume-preserving methods.
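
The distinction between precision and accuracy can be made concrete with a short calculation of the four statistics. This is a minimal sketch with hypothetical data; in particular, regressing the actual values on the estimated values (rather than the reverse) is an assumption, since the text does not state which variable is dependent.

```python
import math


def summary_stats(actual, estimated):
    """Correlation, regression slope and intercept, and RMSE.

    Assumption: actual values are regressed on estimated values.
    """
    n = len(actual)
    mean_a = sum(actual) / n
    mean_e = sum(estimated) / n
    sxx = sum((e - mean_e) ** 2 for e in estimated)
    syy = sum((a - mean_a) ** 2 for a in actual)
    sxy = sum((e - mean_e) * (a - mean_a) for a, e in zip(actual, estimated))
    slope = sxy / sxx
    return {
        "r": sxy / math.sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_a - slope * mean_e,
        "rmse": math.sqrt(
            sum((a - e) ** 2 for a, e in zip(actual, estimated)) / n),
    }


actual = [1.0, 2.0, 3.0]
perfect = summary_stats(actual, [1.0, 2.0, 3.0])   # precise and accurate
doubled = summary_stats(actual, [2.0, 4.0, 6.0])   # precise but inaccurate
```

The doubled estimates are perfectly correlated with the actual values, yet their slope and RMSE depart from the ideal values of 1 and 0, which is why the correlation coefficient alone does not indicate accuracy.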

Finally, Lam observed that the highest errors for the
planning districts occurred in suburban areas. This is to
be expected because a target unit in a suburban area often
overlaps one or more predominantly urban source units. The
high density of these source units must be partially
allocated to the target unit, resulting in frequent
overestimation.

Table 2.1. Population Data for Planning Districts
(Source: Lam, 1980)
[Table values are illegible in the scanned copy. Columns, by
planning district: actual values; estimates, errors, and
relative errors for the overlay, pycnophylactic, and CR methods.]

Table 2.2. Summary Statistics for Actual and Estimated Values
for Planning Districts (Source: Lam, 1980)

Est. Method            RMSE       R        Slope      Intercept
                       (RRMSE)    (R^2)    (S.E.)     (S.E.)
Overlay                1479.756   .970     0.974 +    151.180
                       (.139)     (.940)   (.056)     (681.553)
Pycno.                 1657.200   .960     1.015 +    97.621
                       (.162)     (.930)   (.065)     (768.243)
CR (w = 2^(-d/.15))    6199.649   .519     0.494 +    5267.386 +
                       (.576)     (.269)   (.159)     (2932.199)

+ Significant at 95% confidence level

D. Applications and Adaptations of the Methods

Most other articles addressing the areal data
reaggregation problem either directly or indirectly
(Goodchild and Lam, 1980; Lam, 1985; Wallin, 1984) were
generally applications rather than evaluations of the
methods. Clarke (1984) presented an interesting adaptation
of the pycnophylactic method. He generated a pycnophylactic
surface exhibiting periodicities, implementing it using
two-dimensional Fourier series instead of simple smoothing.
A Fourier series is a summation of a set of sine and cosine
waves of various wavelengths, amplitudes, and phase angles.
One possible application of a periodic surface is to model
geographic central places. By changing the number of
harmonics in the series, he was able to simulate surfaces
ranging from the stepped surface represented by the source
units to a smooth surface similar to one interpolated using
inverse distance-weighting. Between these extremes,
irregular periodicities were generated in the surface.
Clarke was able to incorporate all of these diverse
hypotheses about the actual surface while maintaining the
pycnophylactic property.

E. Summary

The research into the areal data reaggregation problem,
then, has yielded three basic methods: overlay, contour
reaggregation, and the pycnophylactic method. Markoff and
Shapiro confirmed that better estimates will generally be
achieved when the source units are small relative to the
target units. In addition, Lam (1980) showed that the
volume-preserving methods, overlay and the pycnophylactic
method, clearly perform better than those that do not
preserve volume. However, no difference in performance
between the two volume-preserving methods has been
demonstrated. This is surprising considering the difference
in their estimating surfaces.

CHAPTER III. RESEARCH HYPOTHESES AND OBJECTIVES

Although Lam (1980) found no significant difference
between the overlay and pycnophylactic data reaggregation
methods for her study area, there are characteristics of
the areal units and the actual distribution that may be
useful in discriminating between the performance of the two
methods. In particular, the absolute size of the areal
units and the criteria used in delineating them could
affect the relative performance of the methods. Variations
in these two factors can change the validity of the
choropleth and smoothness assumptions implied by the use of
the overlay and pycnophylactic methods, respectively.

A. Effect of Source Unit Size on Data Reaggregation

It is well established that reaggregation from smaller
to larger units will generally produce better estimates
using any of the reaggregation methods. However, the
absolute size of the source units may also play a role.
Units of larger size may have less "opportunity" to capture
areas of spatial homogeneity and thereby satisfy the
choropleth assumption. This idea was introduced by Coulson
(1978) in his paper on the potential for variation in areal
units. Potential for variation is an index that Coulson
devised for assessing the likelihood, or opportunity, for a
single areal unit to contain a homogeneous distribution. It
is based on the hypothesis that larger and less compact
areal units are less likely to be homogeneous. The
empirical evidence in support of this hypothesis is
scanty, but the idea has intuitive appeal. Since the
overlay method is based on the choropleth assumption,
larger source units may result in poorer performance for
this method.

B. Effects of Source Unit Delineation Criteria on Data
Reaggregation

The criteria, if any, used to delineate the source
units can give clues to the configuration of the actual
surface. There are other, preferred methods for obtaining
information on actual surface configuration, namely: 1)
smaller source units and 2) data for intra-unit features
such as the locations and populations of cities. However,
this kind of information can be difficult to obtain.
Often, verbal descriptions of the delineation criteria are
the best information at our disposal about the intra-unit
configurations. Our ability to model this configuration
for a study area, i.e., to generate an accurate estimating
surface, could strongly affect the accuracy of the
reaggregated data.

C. Research Objectives

The present study replicates Lam's experimental
methodology using units that are significantly different in
size and delineation criteria. It uses source units that
are much larger than Lam's census tracts and that are
delineated in a manner that should favor the pycnophylactic
method. In other words, an attempt is made to ascertain
whether source unit size and delineation will lead to a
significant difference between the two methods. The
performances of the two volume-preserving data reaggregation
methods are compared using the 48 contiguous states as
target units and 181 economic areas, delineated by the U.S.
Bureau of Economic Analysis, as source units (fig. 3.1).

The economic areas completely cover the area of the
United States. The sets of delineation criteria for Lam's
census tracts and the economic areas are listed in Table
3.1. These criteria give some indication of the
configuration of the population density surface within the
source units. They can be used to hypothesize idealized,
intra-unit population distributions for each set of units,
and a comparison of these idealized distributions with the
estimating surfaces used by the data reaggregation suggests
which method will perform better.

Of the two sets of criteria, the one for economic areas
reveals more about the actual surface configuration. The
idea that each area contains a predominant node or central
place (criterion 82) surrounded by tributary counties
suggest an idealized distribution that is bivariate normal.
A cross-section of this distribution is shown in figure
3.2a. This idealized economic area distribution
corresponds closely to the "peak" that is interpolated
using the pycnophylactic method when a source unit is
surrounded by neighbors with lower densities (fig. 2.2c),

i.e., when it is a local maximum. Thus, the
pycnophylactic method should produce better estimates for
states intersecting economic areas that are local maxima
and worse estimates when the economic areas are local
minima. The difference in the errors associated with each
method will generally be greatest in the locally
high-valued economic areas, where the pycnophylactic method
performs best. Based on the size and delineation criteria
for economic areas, then, it is anticipated that the
pycnophylactic method will yield superior estimates.

Figure 3.1. Source Units and Target Units
a. Source Units: Economic Areas
b. Target Units: 48 Contiguous States
c. Source Units and Target Units Superimposed

Table 3.1. Source Unit Delineation Criteria

A. Census Tracts in London, Ontario (used by Lam)
   1. Boundaries follow permanent, easily identified features such as
      roads and streams.
   2. Total population between 2500 and 8000 for each tract, except
      within central business district.
   3. Socio-economic homogeneity within each tract.
   4. Compact shape.

B. Economic Areas in the United States (used in present study)
   1. Each area disaggregates into counties.
   2. An economic node, usually an SMSA, within each area.
   3. Some areas contain smaller SMSA's as secondary nodes.
   4. Cities with population above 25000 function as nodes in regions
      where no SMSA's occur.

Figure 3.2. Idealized Surface Configurations
a) Within an Economic Area
b) Within a Census Tract

Although the criteria for census tracts indicate

nothing specific about the actual surface configuration,

some information can be hypothesized. Permanent linear
features (criterion A1) often separate land uses that are
associated with different population densities. That is,
they may represent discontinuities in the population
surface. Socio-economic homogeneity (criterion A3) within
a tract may imply some degree of discontinuity in the
population surface. For instance, regions of lower
socio-economic status are often associated with higher
population densities. Finally, if we accept Coulson's
hypothesis, compact shape (criterion A4) reduces the
potential for variation of the census tracts. Thus, there
is some justification for proposing a model for the census
tract distributions similar to the stepped surface shown in
cross-section in figure 3.2b. This model resembles the
estimating surface associated with the overlay method, the
source units themselves. The overlay method did, in fact,
have a higher percentage of superior estimates for Lam's
planning districts (71%).

To summarize, the objective of this research is to
determine whether there are significant differences in the
performance of the volume-preserving methods of areal data
reaggregation. It is argued that the absolute size and
delineation criteria for source units can be important
discriminating variables, and the units selected for this
study are markedly different in these two factors than
units used previously. The use of economic areas, rather
than census tracts, as source units may favor the

pycnophylactic method.

CHAPTER IV. RESEARCH METHODOLOGY

The general research strategy has precedent in the
earlier studies. A set of small units that aggregate into
both the source and target units is used as an
approximation of the actual density surface. The data for
these small units are aggregated to give "actual" values
for the source and target units. The data reaggregation
methods are applied to the source units to yield estimated
values for the target units. These estimates are then

compared with the actual target unit values.

A. Data Sources and Compilation

U.S. counties in 1980 were used as the small units
because they aggregate into both economic areas and states.
The digitized map data for 3073 counties were obtained from
a file distributed by the U.S. Bureau of the Census. The
county population data were obtained from the County and
City Data Book, 1980 (U.S. Bureau of the Census).

The organization of the data was then transformed from
the original polygon, or vector, structure to a raster
structure using the ERDAS 400 digitizing system. A grid
cell approximately equal in size to Baltimore City,
Maryland, and Washington, D.C., was selected (167.3 sq. mi.).
The original vector data had been digitized from a base map
on an Albers Equal Area projection.

The rasterization process resulted in the elimination

of 433 counties from the file. This loss was probably an

artifact of the rasterization algorithm and the file
sequence of the digitized county data. A more detailed
description of these two factors is provided here to

account for the loss of data.

B. Rasterization Methods and the Loss of County Data

The vector-to-raster conversion process often operates
sequentially, processing one feature (in this case, a
polygon) at a time. In other words, it assigns a cluster
of grid cells to each county in the file as it comes to it.
Depending on the criteria used for assignment, small
counties processed earlier in the file can be lost because
their grid cells are assigned to subsequent counties in the
file. Some common assignment criteria are: 1) assign the
grid cell to the county with the largest overlap area, 2)
assign the grid cell to the county that overlaps at the
center point of the cell, and 3) assign the grid cell to
the last county in the file that intersects it, regardless
of the area of overlap. The third criterion is the crudest
and would probably result in the loss of more counties than
the other two, but any of these methods can result in some
loss. Information on the algorithm used by the ERDAS
system is proprietary.
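
The effect of file order on the third criterion can be illustrated with a minimal sketch. It is not the ERDAS algorithm, which is proprietary; it assumes that counties are simplified to axis-aligned rectangles and that a cell simply keeps the ID of the last polygon that touches it. The function names and the two hypothetical counties "A" and "B" are illustrative only.

```python
# Hypothetical sketch of sequential rasterization under criterion 3:
# each grid cell keeps the ID of the LAST polygon in the file that touches it.
# Polygons are simplified to axis-aligned rectangles (x0, y0, x1, y1).

def rasterize_last_wins(rects, n_rows, n_cols, cell=1.0):
    """Return a grid of polygon IDs; later rectangles overwrite earlier ones."""
    grid = [[None] * n_cols for _ in range(n_rows)]
    for poly_id, (x0, y0, x1, y1) in rects:      # file order matters
        for r in range(n_rows):
            for c in range(n_cols):
                cx0, cy0 = c * cell, r * cell      # lower-left corner of the cell
                cx1, cy1 = cx0 + cell, cy0 + cell  # upper-right corner of the cell
                overlaps = x0 < cx1 and x1 > cx0 and y0 < cy1 and y1 > cy0
                if overlaps:
                    grid[r][c] = poly_id           # overwrite any earlier county
    return grid


def surviving_ids(grid):
    """Set of polygon IDs that still own at least one cell."""
    return {v for row in grid for v in row if v is not None}


# A small county "A", processed first, shares its only cell with a larger county "B".
counties = [("A", (0.0, 0.0, 0.4, 0.4)), ("B", (0.4, 0.0, 2.0, 1.0))]
grid = rasterize_last_wins(counties, n_rows=1, n_cols=2)
lost = {cid for cid, _ in counties} - surviving_ids(grid)
print(grid, lost)
```

In this toy case county "A" disappears because the one cell it touches is reassigned to "B", which comes later in the file. Reversing the file order would let "A" keep that cell.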

The counties in the file were generally organized
alphabetically within states that were also organized
alphabetically. A consequence of this organization was
that the loss of counties was biased toward those with

names that are low in the alphabet because they were

processed earlier. Although many counties were lost, none
of the states or economic areas were unrepresented; and
because the surface without the lost counties became the
surface of reference for all further analyses, the data
loss should not affect the conclusions reached about the

reaggregation methods.

C. Implementation of the Areal Data Reaggregation
Methods

Once the population and map data were consolidated and
aggregated into states and economic areas, the two
reaggregation methods were applied to the economic areas.
With raster data, the overlay method simply required
aggregating all of the grid cell values for economic areas
into states. The pycnophylactic interpolation process
described in figure 2.1 was applied to the economic areas
and allowed to iterate twelve times. The changes in
individual cell values and in total populations for
economic areas were closely monitored during this process.
It was found that the rate of change for both had nearly
decreased to zero after twelve iterations. Additional
iterations probably would not have transformed the surface
significantly. The final pycnophylactic surface was then
aggregated into states to produce population estimates.
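
With raster data the overlay step reduces to a pair of sums, as the following minimal sketch suggests. It assumes a flattened grid in which every cell carries one source ID and one target ID, and that each source total is spread evenly over the cells of its source unit; the names and values are hypothetical and do not come from the study data.

```python
from collections import defaultdict


def overlay_estimates(source_id, target_id, source_total):
    """Spread each source total evenly over its cells, then sum cells by target."""
    n_cells = defaultdict(int)
    for s in source_id:
        n_cells[s] += 1                          # number of cells in each source unit
    est = defaultdict(float)
    for s, t in zip(source_id, target_id):
        est[t] += source_total[s] / n_cells[s]   # uniform density within a source unit
    return dict(est)


# Flattened grid of six cells (hypothetical IDs and totals).
source_id = ["E1", "E1", "E1", "E1", "E2", "E2"]
target_id = ["S1", "S1", "S1", "S2", "S2", "S2"]
source_total = {"E1": 400.0, "E2": 100.0}

estimates = overlay_estimates(source_id, target_id, source_total)
print(estimates)
```

Because every cell of a source unit receives an equal share, the allocation is proportional to the area of overlap, and the grand total is preserved.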

Some of the details of pycnophylactic interpolation are
not shown in figure 2.1 and deserve mention. First, the

smoothing routine assigned each grid cell the mean value of

its non-diagonal neighbors, regardless of how many there

were (zero to four). If a cell had no non-diagonal
neighbors, then its value remained constant; and if a cell
had only one neighbor, then it was assigned the neighbor's
value. Second, cells at the boundary of the study area
were not affected by their neighbors outside of the study
area, which were assigned a constant value of zero. This
means that there were "cliffs" at the edges of the study
area even after interpolation. This boundary condition is
more realistic along the coasts of the United States than
along its Canadian and Mexican borders.
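
The smoothing and boundary rules just described can be sketched as follows. This is an illustration under stated assumptions, not the program used in the study: the volume adjustment is assumed here to be a multiplicative rescaling within each source unit (the routine of figure 2.1 may differ, for example by using an additive correction or a non-negativity step), and all function names, zone labels and totals are hypothetical.

```python
def smooth(grid, inside):
    """Replace each inside cell by the mean of its non-diagonal inside neighbors."""
    n_rows, n_cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(n_rows):
        for c in range(n_cols):
            if not inside[r][c]:
                continue
            vals = [grid[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < n_rows and 0 <= cc < n_cols and inside[rr][cc]]
            if vals:                  # with no neighbors the value stays constant
                out[r][c] = sum(vals) / len(vals)
    return out


def zone_sums(grid, zone):
    """Total of the cell values within each source unit."""
    sums = {}
    for r, row in enumerate(grid):
        for c, v in enumerate(row):
            if zone[r][c] is not None:
                sums[zone[r][c]] = sums.get(zone[r][c], 0.0) + v
    return sums


def adjust_volume(grid, zone, totals):
    """Assumed adjustment: rescale each source unit back to its known total."""
    sums = zone_sums(grid, zone)
    out = [row[:] for row in grid]
    for r, row in enumerate(grid):
        for c, v in enumerate(row):
            z = zone[r][c]
            if z is not None and sums[z] > 0:
                out[r][c] = v * totals[z] / sums[z]
    return out


def pycnophylactic(zone, totals, n_iter=12):
    """Start from the stepped source surface, then smooth and adjust n_iter times."""
    inside = [[z is not None for z in row] for row in zone]
    counts = {}
    for row in zone:
        for z in row:
            if z is not None:
                counts[z] = counts.get(z, 0) + 1
    grid = [[totals[z] / counts[z] if z is not None else 0.0 for z in row]
            for row in zone]
    for _ in range(n_iter):
        grid = adjust_volume(smooth(grid, inside), zone, totals)
    return grid


# Two hypothetical source units on a 2 x 4 grid.
zone = [["A", "A", "B", "B"],
        ["A", "A", "B", "B"]]
totals = {"A": 40.0, "B": 360.0}
surface = pycnophylactic(zone, totals, n_iter=12)
print(zone_sums(surface, zone))
```

Cells outside the study area are marked with None in the zone grid and are skipped when neighbors are averaged, which reproduces the "cliff" boundary condition. Whatever the number of iterations, the source unit totals are restored after each adjustment.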

Some of the primary deficiencies of the research design
were the approximation of the actual surface with
county-level data, the loss of counties due to
rasterization, the somewhat arbitrary selection of twelve
iterations for the pycnophylactic interpolation process,
and the unrealistic boundary conditions. It is felt that
none of these weaknesses was important enough to affect the

results substantially.

CHAPTER V. RESULTS

The results of these data reaggregation experiments can
be divided into three parts. First, since the number of
iterations used in the pycnophylactic algorithm directly
affects the target unit estimates, information on the
convergence of the interpolation process is valuable.
Second, the "goodness-of-fit" of the estimating surface
(economic areas for overlay, the interpolated surface for
the pycnophylactic method) to the actual one may be the
most important factor affecting the overall performance of
the reaggregation methods and needs examination. And third,
the target unit estimates themselves constitute the primary
results. The overall performance of the methods is compared

with their performance in Lam's study.

A. Convergence of the Pycnophylactic Interpolation
Process

The interpolated surface was reaggregated into the
target units after twelve iterations. The decision to stop
at this point was arbitrary, but it can be supported by
examining the change in the estimating surface during the
interpolation process. Two statistics indicative of this
change were monitored during this process. These
statistics were 1) the RMSE between the values for each
grid cell after successive smoothings, and 2) the RMSE

between total target unit values before and after

adjustment. The following formula was used:

$$\mathrm{RMSE}_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(O_{i,j} - O_{i,j-1}\right)^{2}}$$

where, for the first RMSE,

$O_{i,j}$ is the value for grid cell $i$ after iteration $j$ ($j > 1$),
$O_{i,j-1}$ is the value for grid cell $i$ after iteration $j-1$,
$n$ is the number of grid cells, and

where, for the second RMSE,

$O_{i,j}$ is the value for target unit $i$ after adjustment,
$O_{i,j-1}$ is the value for target unit $i$ before adjustment,
$n$ is the number of target units.

These RMSE's indicate the overall effectiveness of the
smoothing routine and the volume-adjustment routine,
respectively. The progression of their values is displayed
in figure 5.1.
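
As a minimal sketch, the RMSE between two successive states of a surface (or between unit totals before and after adjustment) can be computed as below. The function name and the four sample values are hypothetical and serve only to show the arithmetic.

```python
import math


def rmse(current, previous):
    """Root mean square difference between two equally long lists of values."""
    n = len(current)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(current, previous)) / n)


# Hypothetical cell values after iterations j-1 and j.
previous = [5.0, 5.0, 45.0, 45.0]
current = [2.0, 8.0, 41.0, 49.0]

value = rmse(current, previous)   # differences -3, 3, -4, 4: mean square 12.5
print(round(value, 4))
```

A value that approaches zero from one iteration to the next indicates that the routine being monitored has little remaining effect on the surface.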

Since the first RMSE was computed over 18060 grid
cells, as opposed to 48 states, its value changed more

smoothly. The rate of change of both curves had declined
almost to zero by twelve iterations, which can be
interpreted to mean that the smoothing and adjustment
routines had lost most of their effectiveness. Additional
iterations, then, probably would not have changed the
surface significantly.

B. Goodness-of—Fit of the Estimating Surfaces

The "optimal" estimating surface would minimize the
difference between itself and the actual surface, and the
trend of this difference during the pycnophylactic
interpolation process indicates whether or not the process

is improving the surface. RMSE can be used again to

summarize this error in the interpolated surface.


Figure 5.2 shows the succession of RMSE values between
the interpolated and actual (county) values for all grid
cells. Iteration 0 is the pre-interpolated (economic area)
surface, and iterations 1 through 12 are the increasingly
smooth, pycnophylactic surfaces. The error monotonically
decreases and is similar to the convergence of the
pycnophylactic process represented in figure 5.1. The
pycnophylactic surface after 12 iterations, then, was
generally a better approximation than any of the previous
ones, including the economic area surface used to derive
overlay estimates. This is an important result, because it
suggests that the target estimates using the pycnophylactic
method should be an improvement over those using overlay.
But positive and negative errors in the estimating surface
can offset one another when they are aggregated into the
target units, so the aggregation can hide the improvement.
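
The offsetting of errors can be seen in a small worked example. The cell values below are hypothetical and were chosen so that the cell-level errors are large while the errors in the target totals vanish; the helper names are illustrative only.

```python
import math


def aggregate(values, target_of):
    """Sum cell values into their target units."""
    totals = {}
    for v, t in zip(values, target_of):
        totals[t] = totals.get(t, 0.0) + v
    return totals


# Hypothetical actual and estimated cell values for two target units.
actual = [10.0, 90.0, 20.0, 30.0]
estimate = [40.0, 60.0, 25.0, 25.0]
target_of = ["S1", "S1", "S2", "S2"]

# Cell-level error: +30, -30, +5, -5.
cell_rmse = math.sqrt(
    sum((e - a) ** 2 for e, a in zip(estimate, actual)) / len(actual)
)

# Target-level error: the positive and negative cell errors cancel.
act_tot = aggregate(actual, target_of)
est_tot = aggregate(estimate, target_of)
target_error = {t: est_tot[t] - act_tot[t] for t in act_tot}

print(round(cell_rmse, 2), target_error)
```

Here the surface is a poor fit cell by cell, yet both target estimates are exact. A surface with a smaller cell-level RMSE would gain nothing at the target level in this case.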
The goodness-of-fit can also be viewed spatially using
maps of the cell-by-cell difference between the estimated
surfaces and the actual one. The maps displayed in figure
5.3 correspond to iterations 0 (5.3a and 5.3b) and 12 (5.3c
and 5.3d) in figure 5.2. They show the error in the
estimating surfaces used by the overlay and pycnophylactic
methods. For visual clarity, the values are classed and

separated into their positive and negative components.

The two estimating surfaces must be very similar judging
by their similar error distributions. The positive errors
are extensive and the negative errors are intensive and
concentrated in urban areas. Neither method, then, is able
to capture the sudden "peaks" in the distribution at the
urban areas. The largest positive errors occur in suburban
areas.

Figure 5.1. Convergence of the Pycnophylactic
Interpolation Process
a. Change in Source Unit Population due to Volume Adjustment
b. Change in Estimating Surface between Successive Smoothings
(horizontal axis: iteration)

Figure 5.2. Error in the Estimating Surface
(vertical axis: RMSE; horizontal axis: iteration)

Figure 5.3a. Error in the Estimating Surface,
Positive Error: Economic Areas

Figure 5.3b. Error in the Estimating Surface,
Negative Error: Economic Areas

Figure 5.3c. Error in the Estimating Surface,
Positive Error: Pycnophylactic Surface

Figure 5.3d. Error in the Estimating Surface,
Negative Error: Pycnophylactic Surface

C. Target Unit Estimates and Errors

Tables 5.1 and 5.2 show the actual and estimated
population data for states and the summary statistics
compiled from them. These data are also shown spatially in
figures 5.4a-5.4c (actual and estimated population
densities) and 5.5a-5.5d (positive and negative errors for
each method). In general, the results appear to reinforce
Lam's findings. There is little difference in the
performance of the volume-preserving methods, and both
yielded fairly accurate and precise estimates.

 

Paired-sample t-tests revealed no significant
difference between the overlay and pycnophylactic estimates
or between each set of estimates and the actual values at
the 95% confidence level. Four states (Arizona,
California, Maine, and Rhode Island) had perfect estimates
using both methods because their boundaries coincided
perfectly with those of the economic areas that overlapped
them. The data for these states were not included in the
summary statistics.

The correlation/regression and RMSE statistics also
indicate good overall performance for both methods. The
correlations show the high precision of the estimates. The
regression slopes are close to 1, but the intercepts, like

Lam's, are not significantly different from zero. The

Table 5.1. Actual and estimated population data for states
(tabulated values illegible in the source copy)

Table 5.2. Summary Statistics for Actual and Estimated Values for
States

Est. Method    RMSE        R        Slope     Intercept
               (RRMSE)     (R2)     (S.E.)    (S.E.)

Overlay        773094.8    .985     .946    + 214946.655
               (.182)      (.971)   (.024)    (141770.957)

Pycno.         757856.9    .984     .950    + 201315.690
               (.189)      (.968)   (.025)    (147942.336)

+ Significant at 95% confidence level

 

 

Figure 5.4a. Population Density, 1980: Actual
(after rasterization; legend: people per grid cell)

Figure 5.4b. Target Unit Population Density, 1980:
Overlay Method

Figure 5.4c. Target Unit Population Density, 1980:
Pycnophylactic Method

Figure 5.5a. Target Unit Error,
Positive Error: Overlay Method

Figure 5.5b. Target Unit Error,
Negative Error: Overlay Method

Figure 5.5c. Target Unit Error,
Positive Error: Pycnophylactic Method

Figure 5.5d. Target Unit Error,
Negative Error: Pycnophylactic Method
(figures 5.5a-5.5d: error as percent of actual population; states
whose boundaries coincide with economic areas are marked)
RMSEs are very similar, but the RRMSEs exceed Ford's
arbitrary cut-off value of .10 considerably. Since none of
the empirical tests of areal data reaggregation methods

have demonstrated accuracy as high as that prescribed by

Ford, it may be that a higher cut-off value would be more
realistic.

The values for states with highly urbanized neighbors
(e.g., Delaware, Indiana, Kentucky, New Hampshire, Vermont,
and West Virginia) were often severely overestimated. This
result is probably the counterpart, at this smaller
geographic scale, of the suburban errors that Lam observed.
The most glaring errors occurred in Delaware and New
Hampshire. These errors can be largely attributed to the
relatively larger size of the economic areas that overlap
these states.

The small size of Lam's data set (21) limits the
validity of parametric statistical tests that could
otherwise be used to compare these results with hers.
However, the data do suggest some improvement for the
pycnophylactic method relative to overlay. In particular,
the pycnophylactic method yielded better estimates for only
29% (6) of Lam's planning districts versus 59% (26) of the states.
Although the regression statistics can be used as
indicators of the accuracy of the methods, they are only
summary statistics. For a particular application, the
number of improved estimates that one obtains by using one
method or the other is also important.

Given an expectation of equal accuracy for the methods,


a chi-square test for a relationship between performance of
the reaggregation method and units used was performed.
Although the units used cannot be separated from other
differences in the two studies that could cause differences
in performance, it is felt that the units are clearly the
most important methodological difference.

Table 5.3 shows the contingency table used in this
test. The computed chi-square statistic from these data is
5.29. This value allows us to conclude, at the 95%
confidence level, that there is a relationship
between the performance of the method and the spatial
characteristics of the areal units used in the study.
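
The chi-square statistic can be recomputed from the observed counts in Table 5.3 with the short sketch below. It assumes the ordinary Pearson statistic without a continuity correction; the function name is illustrative. The result is about 5.30, which agrees with the reported 5.29 up to rounding and exceeds the critical value of 3.84 for one degree of freedom at the .05 level.

```python
def chi_square(table):
    """Pearson chi-square statistic and expected counts for a two-way table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    expected = [[rt * ct / n for ct in col_tot] for rt in row_tot]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    return stat, expected


# Observed counts of superior estimates: rows are source units
# (census tracts, economic areas); columns are methods (overlay, pycno.).
observed = [[15, 6],
            [18, 26]]

stat, expected = chi_square(observed)
print(round(stat, 2), [[round(e, 2) for e in row] for row in expected])
```

The expected counts returned by the function are 10.66, 10.34, 22.34 and 21.66, matching the values shown in parentheses in the table.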

 

Finally, a binomial difference of proportions test
(Hammond and McCullagh, pp. 154-157), similar to a t-test
for difference of means, was conducted to determine whether
the observed proportion of successes for the pycnophylactic
method (.59) is significantly greater than the proportion
that would be expected if the methods performed identically
(.50). The difference of proportions test statistic has a
standard normal probability distribution and had a computed
value of 0.882 for this data set. The hypothesis that the
proportion of successes is equal to .50 cannot be rejected
at the 95% confidence level. There is a fairly high
probability that a proportion as large as .59 occurred
completely by chance. This result suggests that there is
little difference between the two methods for a given set
of data.

Table 5.3. Relationship of Success of Reaggregation Method to
Source Units Used

                       Method
Unit             Overlay        Pycno.
Census Tract     15 (10.66)     6 (10.34)
Econ. Area       18 (22.34)     26 (21.66)

* Expected value in parentheses

To summarize, the results reinforce many of the
findings of previous studies. High errors are more likely
to occur in target units estimated from relatively larger
source units and in predominantly suburban target units.
The overlay and pycnophylactic methods performed nearly
identically and yielded fairly accurate and precise
estimates. A new finding is that, despite the fact that
the pycnophylactic estimating surface was generally a
better approximation of the actual one, this improvement
was not translated into significantly better target unit
estimates. Finally, there is some support for the
hypothesis that the size and delineation criteria of the
source units can affect the relative accuracy of the two
methods. For this individual study, the units selected
resulted in superior performance for the pycnophylactic

method.

CHAPTER VI. SUMMARY AND CONCLUSIONS

This study has compared the performance of two of the
best available methods for reaggregating areal data into
different units.

The primary motivation for the work was the finding, in
a previous study (Lam, 1980), that there was no difference
in the performance of the methods. This was a puzzling
result because of the difference in their estimating
surfaces, the approximations of the actual surface that are
aggregated into the target units. The overlay method
aggregates the source units themselves, a stepped
estimating surface, and the pycnophylactic method
aggregates a smooth estimating surface interpolated from
the source units. A characteristic of both of these
methods that improves their performance relative to others
is that their estimating surfaces both preserve the total
value, the volume, within the source units.

The units selected for this study, economic areas and
states, have spatial characteristics that 1) are markedly
different than those used previously (census tracts and
planning districts), and 2) should result in greater
accuracy for the pycnophylactic method. In particular, the
units were much larger and the delineation criteria for the
economic areas suggested an intra—unit distribution that is
approximately bivariate normal, rather than the homogeneous
distribution hypothesized for census tracts. The small size
and presumed homogeneity of the census tracts probably
favored the overlay method in Lam's study.


The results, like Lam's, do not strongly favor either
method. Although both methods are fairly accurate and
precise, there was no significant difference in their
estimates. Also, although the pycnophylactic method had a
higher proportion of successes than overlay, it was not
significantly greater than the proportion that would be
expected if the methods performed identically. The
estimates were compared only after the pycnophylactic
interpolation process had almost completely converged.

Comparing the results with those from Lam's study, a
significant difference was found in the relative success of
the two methods. There is some support, then, for the
hypothesis that the spatial characteristics of the units
affect the relative performance of the methods.

The highest errors occurred most frequently when the
target units intersected source units that were relatively
large or highly urbanized. In order to maintain the
volume-preserving constraint for the source units, the
negative errors (usually located at the urban centers) in
the estimating surface had to be offset by positive errors
in the neighboring (suburban) areas. These positive errors
were then allocated to the suburban target units.

In general, the better the fit of the estimating
surface to the actual one, the more accurate the target
estimates should be. However, this study has demonstrated
that the improved fit may not be translated into better
estimates because of the offsetting of positive and
negative errors when they are aggregated into the target
units. The pycnophylactic surface was a better
representation of the actual one, but there were no
significant differences in the pycnophylactic and overlay
estimates for target units.

The maps of the errors in the estimating surfaces show
that the two methods suffer from a common deficiency: they
do not capture extremes in the actual surface well. The
estimating surfaces for both methods hover about the mean
density values of the source units, so they under- or
over-estimate locations that deviate from the means. While
the pycnophylactic method is a theoretical improvement over
the overlay method, it misrepresents the actual surface in
a manner somewhat analogous to the interpolation of a
topographic surface using distance-weighted averaging of
control points. That is, the interpolated surface is
continuous and smooth, or "undulating", while the actual
surface is continuous but sometimes "jagged".

The presence of sudden extremes may be a characteristic
of population (and related) density surfaces, and such
extremes may even be largely limited to a range of
geographic scales. The overlay and pycnophylactic methods

should perform better when the actual density surface has

few extremes.

CHAPTER VII. RECOMMENDATIONS FOR FURTHER STUDY

The aggregation of estimating surfaces into target
units is the step in the areal data reaggregation process
that distinguishes it from the problem of geographic
surface modelling. The objective of surface models is
usually to represent some actual surface as accurately as
possible. This study has shown that the aggregation step
acts as a sort of filter on the errors in the estimating
surface. A better estimating surface does not necessarily
produce better target estimates. This problem
notwithstanding, accurate surface modelling is still a
worthy objective in the study of areal data reaggregation:
an accurate estimating surface will probably do more to
provide good estimates than any other factor. With this in
mind, three avenues for future research on areal data
reaggregation are suggested.

First, different variables should be used in future
empirical tests. The pycnophylactic method will probably
model an undulating surface better than the seemingly
jagged population density surface used here.

Second, Clarke's Fourier adaptation of the
pycnophylactic method should be applied to economic areas
or similar units. The Clarke algorithm is designed
specifically to capture periodicity, such as that exhibited
by geographic central places. It is this method, more than

any of the other available ones, that has the potential to

capture local extremes in the surface.


Third, and finally, recent research (Clarke, 1984;
Wallin, 1984) is demonstrating that the pycnophylactic
concept, interpolation constrained by volume-preservation,
is very flexible. Many different assumptions, or types of
detail, can be applied to the surface while preserving the
source unit values. One of the major reasons that the
pycnophylactic method did not convincingly out-perform
overlay in this study is that it interpolates a "peak"
within source units only in special cases, i.e., when a unit's
neighbors are of lower density. Almost all of the economic areas have
one predominant "peak", an SMSA, near their centers. Thus,
a bivariate normal distribution was proposed here as a
model of the population density surface within an economic
area. To constrain a pycnophylactic surface to bivariate
normality within source units is a worthwhile research task
that could significantly improve the pycnophylactic

estimates for this data set.

BIBLIOGRAPHY

Boneva, L., D. Kendall and I. Stefanov (1971), "Spline
Transformations: Three New Diagnostic Aids for the
Statistical Data-Analyst", Journal of the Royal
Statistical Society, Ser. B, vol. 33, no. 1, pp. 1-70.

Clarke, K. C. (1984), "Two-Dimensional Fourier
Interpolation for Uniform Area Data", Technical Papers,
50th Annual Meeting of the American Society of
Photogrammetry, vol. 2, pp. 835-845.

Coulson, M. R. C. (1978), "Potential for Variation: A
Concept for Measuring the Significance of Variations in
Size and Shape of Areal Units", Geografiska Annaler, Ser.
B, vol. 60, no. 1, pp. 48-64.

Crackel, T. (1975), "The Linkage of Data Describing
Overlapping Geographical Units -- A Second Iteration",
Historical Methods Newsletter, vol. 8, no. 3, pp. 146-150.

Ford, L. (1976), "Contour Reaggregation: Another Way to
Integrate Data", Papers, 13th Annual Conference of the
Urban and Regional Systems Association, vol. 11, pp.
528-575.

Goodchild, M. F. and N. S-N. Lam (1980), "Areal
Interpolation: A Variant of the Traditional Spatial
Problem", Geo-Processing, vol. 1, pp. 297-312.

Hammond, R. and P. McCullagh (1974), Quantitative
Techniques in Geography: An Introduction, Clarendon Press:
Oxford, 318 pp.

Lam, N. S-N. (1980), "Methods and Problems of Areal
Interpolation", Ph.D. Dissertation, University of Western
Ontario, 177 pp.

Lam, N. S-N. (1983), "Spatial Interpolation Methods: A
Review", The American Cartographer, vol. 10, no. 2,

Lam, N. S-N. (1985), "A Method for Choropleth
Inversion", Technical Papers, 45th Annual Meeting of the
American Congress on Surveying and Mapping, pp. 365-373.

Markoff, J. and G. Shapiro (1973), "The Linkage of Data
Describing Overlapping Geographical Units", Historical
Methods Newsletter, vol. 7, no. 1, pp. 34-36.

Tobler, W. (1979), "Smooth, Pycnophylactic
Interpolation for Geographical Regions", Journal of the
American Statistical Association, vol. 74, no. 367, pp.
519-536.

Tobler, W. and J. Lau (1978), "Isopleth Mapping Using
Histosplines", Geographical Analysis, vol. 10, no. 3, pp.
273-279.

United States Bureau of the Census (1983), "County and
City Data Book, 1983", U.S. Government Printing Office.

United States Bureau of Economic Analysis (1977), "BEA
Economic Areas (revised 1977): Component SMSA's, Counties,
and Independent Cities", U.S. Government Printing Office.

Wallin, E. (1984), "Isarithmic Maps and Geographical
Disaggregation", Proceedings, International Symposium on
Spatial Data Handling, vol. 1, pp. 209-217.