EXTENDED RULES FOR THE CLASSIFICATION OF DEPENDENT PARAMETERS

By

Kosgallana Liyana Durage Gunawardena

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Department of Statistics and Probability
1988

ABSTRACT

EXTENDED RULES FOR THE CLASSIFICATION OF DEPENDENT PARAMETERS

By Kosgallana Liyana Durage Gunawardena

In this dissertation we consider the classification problem in which the unknown states θ = (θ1, θ2, ..., θn) are dependent. We first review the compound approach to the construction of decision rules and then introduce the extended compound approach as a general approach to the construction of rules with favorable risk behavior. In most empirical Bayes literature, the {θi} are assumed to be i.i.d. The construction of decision rules in the i.i.d. case is very simple. The problem becomes considerably more complex if the {θi} are dependent, since the Bayes risk may depend on n and the class of distributions on {θi} may be of such high dimension as to make accurate estimation impossible. The extended compound decision theory developed by Gilliland and Hannan (1969) is utilized to obtain decision rules in the case when the {θi} are dependent. The one-dimensional case is generalized to two dimensions where the indices are positions in an integer lattice. The two-dimensional case has applications in image segmentation and pattern recognition, where the image is a matrix of parameters θ = {θij} taking values in a finite set Θ. The observed image is a random matrix X having a distribution depending on θ. Here θ may reasonably be assumed to be the realization of a Markov random field in some applications. The extended compound approach to the image case is described with the choice of a neighborhood system on the lattice. Many simulations were run to compare the risk behavior of the extended compound rules with that of other rules. The simulations performed show that the extended rules perform better than the other empirical Bayes rules that have been proposed.

To my daughter Kalpanee

ACKNOWLEDGMENTS

I am truly grateful to my advisor Professor Dennis C. Gilliland for his continuous encouragement and guidance in the preparation of this dissertation. I would also like to thank Professor James Hannan for the introduction to compound decision theory and for teaching me much about mathematical statistics. I would like to thank Professors R. Ericson and H. Salehi for serving on my guidance committee. Also thanks are due to Ms. Loretta Ferguson for typing the manuscript. Finally, I wish to thank the Department of Statistics and Probability for providing the financial support which made my graduate studies in Statistics possible.

TABLE OF CONTENTS

CHAPTER                                                        PAGE
1. Introduction ...................... 1
1.1 A Literature Review ................. 1
1.2 A General Approach ................. 10
1.3 The Image Problem .................. 17
1.4 Summary ...................... 18
2. The Transect Case ................... 20
2.1 Introduction .................... 20
2.2 Classification by Maximum Posterior Probability (MAP) . 22
2.3 Empirical MAP .................... 28
2.4 Classification by Extended Compound Bayes Rules . . . 30
2.5 Comparison of Methods ................ 39
3. The Image Case ..................... 41
3.1 Introduction ....................
41 3.2 Introduction to Image Segmentation ......... 47 3.3 Classification by Empirical Bayes Rules ....... 50 3.4 Classification by Extended Rules .......... 55 3 5 Comparison of Methods ................ 59 4. Simulation ...................... 60 4.1 Introduction .................... 60 4. 2 Simulations on the Transect ............. 62 4. 3 Classification of Images .............. 63 4. 4 Conclusions ..................... 65 APPENDIX ...................... 97 Dynamic Programming Recursion ............ 97 BIBLIOGRAPHY ....................... 99 vi LIST OF TABLES TABLE SDmKIOSO‘IbWMo—D HHHHHHD—‘I—‘l—i ao-acacnw-wwwc Bayes risk and PCC for the SR ........... The PCC for the EMAP rule with n = .50 ...... The PCC for the EMAP rule with n = 200 ...... The PCC for the HEB3 rule with n = 50 ...... The PCC for the HEB3 rule with n = 200 ...... The PCC for the HEBS rule with n = 50 ...... The PCC for the HEB5 rule with n = 200 ...... The PCC for the ER2 rule with n = 50 ...... The PCC for the ER2 rule with n = 200 ....... The PCC for the ER3 rule with n = 50 ...... The PCC for the ER3 rule with n = 200 ....... The performance of the ER2 rule versus the HEB3 rule The PCC with the GREAT LAKES pattern ...... The PCC with the CHECKERBOARD pattern ...... The PCC with the ROAD SIGN pattern ........ The PCC with the (.90,.50,.50,.10) MRF pattern . . . The PCC with the (.01,.10,.01,.90) MRF pattern . . . The PCC with the (.80,.40,.50,.60) MRF pattern vii ..61 ..66 ..67 ..68 ..69 ..70 ..71 ..72 ..73 ..74 ..75 .76 ..77 ..78 ..79 ..80 ..81 .82 LIST OF FIGURES FIGURE 1 Neighborhood systems J; ........ 2 Neighborhood systems .11 and .12 and their associated cliques . . . . 3 GREAT LAKES: The best classification . . 4 GREAT LAKES: The worst classification . 5 CHECKERBOARD: The best classification . 6 CHECKERBOARD: The worst classification 7 ROAD SIGN: The best classification . . . 8 ROAD SIGN: The worst classification . . 9 MRF (.90,.50,.50,.10): The best classification 10 MRF (.90,.50,.50,.10): The worst classification 11 MRF (.01,.10,.01,.90): The best classification 12 MRF (.01,.10,.01,.90): The worst classification 13 MRF (.80,.40,.50,.60): The best classification 14 MRF (.80,.40,.50,.60): The worst classification 15 GREAT LAKES classification by SR and ER4 viii ........ 44 ........ 46 ........ 84 ........ 85 ........ 86 ........ 87 ........ 88 ........ 89 ..... 90 . 91 ..... 92 . 93 ..... 94 ....... 96 CHAPTER 1 INTRODUCTION In this thesis we consider the classification problem in which each of n observations is to be classified as belonging to one of two classes. 1.1 A Literature Review In most of the work relating to the classification problem, the successive classes are assumed to be independent. In many situations, though, it is unreasonable to assume that the successive classes are independent. One of the earliest, if not the first, papers to consider positively correlated classes and that a more sensitive scheme of classification can be obtained by exploiting this correlation is due to Cox (1960). In Cox (1960) a sequence of batches {Bi} is to be classified based on the number of defectives in random samples of predetermined fixed size. The {Xi} are random observations, where Xi is the number of defectives in the sample from batch Bi' The {Xi} are assumed to be independent and have a Poisson distribution with mean mi, given the sequence {mi}, where 111i is the true but unknown batch quality. 
To complete the model, the sequence {mi} is assumed to be a realization of a two—state Markov chain with specified states and known transition matrix. Cox suggests using some other X's in addition to xi in making the decision about the 1th batch. In particular he suggests a one-step back rule based on (Xi_1,Xi) and a one—step forward, one—step back rule based on (xi-l’xi’xi +1). Preston (1971) seems to be the first person to consider the empirical Bayes approach to a classification problem in which the set of parameter values is a realization of a stationary Markov chain. The model considered by Preston is very specialized. The random observation X is assumed to have the following conditional distribution: P(X=0|0=0)=1,P(X=1|0=1)=p,P(X=0|0=1)= l—p where p is known. It is assumed that the unknown classifications {0i} are a realization of a stationary Markov chain with unknown transition matrix and that the observations Xi are independent conditional on {0i}. At stage n, the transition probabilities are estimated based on the partial sequence x = (X1,X2, ..... ,Xn) and substituted into the Bayes rule for the decision about 0n. Devore (1975) considers the Robbins' (1951) component with Markovian {0i}. The Robbins' component has normal conditional distributions N(20 — l, 1) and independent observations conditional on {0i}. In Devore (1975), the {0i} sequence is assumed to be a realization of a stationary two-state Markov chain with P(0=0) = P(0=1) = .5 and transition matrix .5(1 + 1r) .5(1 — 1r) for a or 6 [—1,1]. .5(1 — 1r) .5(1 + 7r) The above model can be considered as a generalization of the model with repetitions of the Robbins' component problem with independent 0i and probabilities P(0=0) = P(0=1) = .5. For this model, the simple compound rule g0 = (wgmg, ..... ,tpg) defined by 1 > 0 «2?: decide 0, = if xi ‘ i= 1,2, ..... ,n 0 < 0 is the minimax classification rule for Q = (01,0 , ..... ,on). Devore studies the prOperties of classification rules for 0i based on linear functions of X.'s within the context of his problem. The two rules J considered are those resulting from Optimization over classes of linear rules: Rule 1 Among all the rules in the class given by: 1 2 k if Xi + aXi_l i = 2, ..... ,n 0 (1.7) decide 0i 0 otherwise . Hill et al. (1984) use (1.5) to estimate p and p11 — p01, substitute the estimates into (1.6), and use (1.7) for classification. Thus, they have developed a set empirical Bayes decision procedure for the Robbins' component and Markov prior. We will refer to procedures with j restricted to {1} in (1.6) as HEB3 and to procedures with j restricted to {1,2} in (1.6) as HEB5. In Chapter 2, Section 1 we derive an exact expression for the induced posterior density on Q , where the end-points are included. We also give an algorithm for the maximization of the joint posterior, and classification rules are derived based on this maximization. The work reviewed thus far concerns models with a certain common underlying structure. The important features of this structure are: 1. There is a sequence of unobservable parameters { 0i} where 0i=0 or 1, i=l,2, ..... . 2. Conditional on {0i}, independent X1,X2, ...... are observed where Xi ~ P 0i, i = 1,2, ..... . The two distributions P0 and P1 are assumed known. 3. The parameter sequence {0i} has a (joint) distribution G. 8 4. There is a decision bi to be made about 0i, specifically, a classification as 0 or 1, i = 1,2, ..... . For each n we let Q= (01,02, ..... ,Qn) and X = (X1,X2, ..... 
,Xn) denote partial sequences, let G denote the distribution of Q and let P Q denote the conditional distribution of 2; given Q. We consider the set compound decision problem and allow the bi to depend on X_, i = 1,2, ..... 11. There are two loss functions that are commonly considered with this structure. They are as Heb=§§ga¢h1 1: and (1.9) Lani) =12 t i] where square brackets denote indicator functions. Ll defines the loss as the average number of misclassifications across the 11 components. L2 defines the loss as 0 or 1 depending upon whether all 11 components are correctly classified or not. At stage n, a Bayes rule with respect to g and L1 is determined by taking bi as 0 or 1 depending upon which maximizes the posterior probability distribution of 0i given _X_, i = 1, 2, ..... ,n. Most of our work concerns comparing rules relative to loss L1 and we refer to the Bayes rules in this case as simply "the Bayes rules." A Bayes rule with respect to g and L2 selects Q to maximize the posterior probability of Q given X. We refer to such a rule as a MAP rule following the common usage of "Maximum a—Posteriori" to describe such a rule. Of course, a Bayes rule has components that are based on the marginal distributions of the posterior of Q given X. There may be an ordering or spatial structure to the index set {1,2, ..... } that suggests certain classes of priors G for certain problems. The one—dimensional case with order of indices representing spatial or time order is called the transect case. We consider the transect case in Chapter 2. Here G might be taken to be a simple product measure G = Glelelx ..... or a Markov chain, depending upon the degree of dependence that is being modeled. In the literature cited previously, i.e. Cox (1960), Preston (1971), Devore (1975) and Hill et al. (1984), the 0i are assumed to be the realization of a stationary Markov chain. In most empirical Bayes literature, the 0i are assumed to be i.i.d. G1 where G1 is unknown. In the i.i.d. Gl case, the Bayes rule and the MAP rule are very simple since a conditional distribution for Q given _X_ is the product of the conditional distributions of Qi given Xi; i = l,2,...,n. In the Markov chain case, the Bayes and MAP rules are sufficiently complicated so as to have motivated researchers to impose simplifying assumptions. Recall, for example, that Cox (1960) takes P0, P1 to be Poisson, G to be a known Markov chain distribution and derives restricted Bayes rules, specifically, 3i restricted to be Bayes with respect to the class of rules that are (xi—1’ Xi) (or (xi-1’ Xi, Xi+1)) measurable. Preston (1971) takes the very special case P0 is degenerate on 0 and P1 is Bernoulli B(l, p) and assumes G to be from the family of stationary Markov distributions. Preston considers Bayes, MAP and restricted Bayes rules. Devore (1975) takes P0 = N(—l, 1), P1 = N(1, 1) and assumes G to be from a special one—parameter family of symmetric stationary Markov distributions. Hill et al. (1984) take P0 = N(p0, a), P1 = N011, a) (without loss of generality N(—1, 1) and N(1, 1)) and assumes G is from the family of stationary 10 Markov distributions. They derive an approximation to the Bayes rule. Preston (1971), Devore (1975) and Hill et al. (1984) give estimators for the transition probabilities defining G and, therefore, consider the empirical Bayes model in a case where the parameter sequence is not i.i.d.. 1.2 A General Approach Consider the decision problem involving Q introduced in last section and suppose that the loss function is L1. 
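Displays (1.8) and (1.9) above did not reproduce cleanly, so the following small sketch restates the two loss functions exactly as described in the surrounding text. It is an illustration added here, written in Python although the simulation programs of this thesis were written in Turbo Pascal.

import numpy as np

def loss_L1(theta, d):
    # (1.8): average number of misclassifications across the n components
    theta, d = np.asarray(theta), np.asarray(d)
    return float(np.mean(theta != d))

def loss_L2(theta, d):
    # (1.9): 0 if every component is classified correctly, 1 otherwise
    theta, d = np.asarray(theta), np.asarray(d)
    return int(np.any(theta != d))

In Chapter 4 the performance measure PCC corresponds to 100(1 - L1).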
Then a decision rule Q is evaluated in terms of average number of errors ( misclassifications ) across the 11 components. We will first review the compound approach to the construction of Q and will then introduce the extended compound approach as a general approach to the construction of rules with favorable risk behavior. There is a long history of work on the two—state compound decision problem including Robbins (1951), Hannan and Robbins (1955), Hannan and Van Ryzin (1965), Van Ryzin (1966), Gilliland, Hannan and Huang (1976), and on the extended version including Gilliland and Hannan (1969), Ballard and Gilliland (1978), Ballard, Gilliland and Hannan (1975), Vardeman (1975), (1980), (1981). Our review will be relatively brief and the arguments will be heuristic. For a (compound) decision rule Q, the compound risk at Q and n is (1.10) MM) = E, L1(M). where L1 is defined in (1.8). If G111 is the empirical distribution of 01,02, ..... ,0n and R1 is the Bayes enve10pe risk function in the component problem, then the modified regret of Q at Q and n is 11 (1.11) 11(1):) = s(1.1)-R1(G,1,). Let y be a given class of distributions G on {0i.} For each 11 let 9 denote the class of all (set) compound decision rules Q with Qi a function of X, i=1,2, ..... ,n. Let B,(Q,Q) denote the Q expectation of B,( Q, Q). Suppose that Q(Q) minimizes R( Q, Q) among all rules Q in a We call Q(Q) a Bayes response and denote the minimum Bayes risk as B(Q). Then any rule Q can be judged by its Bayes risk relative to that of Q(Q), i.e., by the excess (1.12) 9111.1) = 3111.1) - file). Note that this is not simply the Q expectation of (1.11). Any sequence of rules Q such that (1.13) iimn DELI) = 0 for all (3 e y is said to be WM. If y is a singleton set, i.e., G is assumed known, then the only difficulty that arises in the Bayes decision problem is in calculating a Bayes response Q(Q), since it may be a complicated function of Q and X. In the literature reviewed in the last section, Cox (1960) took G to be a known Markov chain. Robbins (1956), (1961) introduced the empirical Bayes decision problem by assuming that G is unknown and using X to estimate Q or Q(Q). ( As introduced by Robbins, only the component observations X1,X2, ..... ,X. l are available for making the decision about 0i, i=1,2, ..... so that his version 12 is sequence in nature.) Robbins proposed (bootstrap) empirical Bayes rules of the type Q(Q) where Q is an estimate of Q and rules of the type where the Bayes response is estimated directly without estimating Q. It is obvious that the class y of priors to which G is assumed to belong must be restricted for the empirical Bayes approach to produce reasonable methods. Robbins chose y to be the class of all symmetric products y = { Glelelx ..... lo, e V1} in which case the Bayes risk enve10pe is simply B(Q) = R1(Gl), independent of n. This is the assumed structure for f in most empirical Bayes literature where much of the effort concerns finding sequence rules Q that are component-wise asymptotically Optimal, such that . ‘ 1 (1.14) 11mi E [ 0i if 0i] = R (GI) for all G1 6 fl, where bi depends on X through X1,X ’Xi' The feasibility of 2, ..... constructing sequence empirical Bayes rules that are asymptotically optimal for this f is quite apparent. Since the observed random variables X1,X2, ..... are i.i.d. according to the G1 mixture PGl of P0 and P1, if G1 is identified by PG , then consistent estimation of G1 is 1 possible. 
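As a concrete and deliberately minimal illustration of this i.i.d. empirical Bayes setting, consider the Robbins component with Xi | θi ~ N(2θi - 1, 1). Since E X = 1 - 2p for p = G1({0}), the sample mean gives an estimate of p that can be plugged into the component Bayes cutoff (1/2) log(p/(1-p)). The sketch below is an added illustration, not a procedure from the thesis; the truncation of the estimate away from 0 and 1 is a practical safeguard of the same kind used for the estimators of Section 2.3.

import numpy as np

def iid_empirical_bayes_classify(x, eps=1e-3):
    # Component-wise empirical Bayes rule for the Robbins component,
    # f0 = N(-1, 1), f1 = N(1, 1), under an i.i.d. prior G1.
    x = np.asarray(x, dtype=float)
    p_hat = 0.5 * (1.0 - x.mean())            # E X = 1 - 2p, so this is unbiased for p
    p_hat = min(max(p_hat, eps), 1.0 - eps)   # keep the cutoff finite (added safeguard)
    cutoff = 0.5 * np.log(p_hat / (1.0 - p_hat))
    return (x >= cutoff).astype(int)          # decide theta_i = 1 iff X_i >= c(p_hat)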
Of course, the restriction to product measures is unreasonable in applications for which the parameters 01,02, ..... are correlated. In the last section, we reviewed examples of empirical Bayes problems where y went beyond the product case. 13 The empirical Bayes problem becomes considerably more complex if the class of distributions contains distributions that are not symmetric product measures. The excess risk (1.12) is the apprOpriate measure of efficiency for a decision rule Q , but problems arise in the construction of bootstrap empirical Bayes rules. For one thing, y may be of such high dimension so as to make accurate estimation of Q or Q(Q) based on X impossible except for very large 11. Moreover, Q( Q) may be such a complicated function of Q and X so as to make estimation difficult and B,(Q) may depend on n. The dimensionality problem is not severe when y is a family of Markov distributions. For the two—state component we are considering, the stationary Markov distributions are indexed by two real parameters. The literature reviewed in the last section concerned this family. The second problem has led investigators to approximate the functional form of a Bayes response Q(Q) and to estimate the approximation. For example, (1.7) is an approximation to the decision rule Qi(Q) in the transect case with stationary Markov prior. Truncating the range of j in ( 1.6) to various neighborhoods of i produces additional approximations. The extended compound decision theory developed by Gilliland and Harman (1969) suggests a general approach for finding empirical Bayes decision rules in our applications. It provides a logical and systematic method for finding approximations to Bayes rules. It solves the dimensionality problem and the approximation problem by defining envelope risk in terms of that resulting by restricting decision rules to a class 9“ C .9 . Basically, the restriction is limited so that the enve10pe risk is close to Bayes risk B,(Q) and, at the same time, the restricted Bayes decision rules are simple enough to be easily estimated. Finally it should be 14 noted that this approach leads to compound decision rules which have controlled risk behavior conditional on Q whereas the competitors that we test have not been shown to enjoy this more robust risk behavior. We will now give some details to illustrate this approach. This will require introduction of the extended envelope theory and I‘k construct of Gilliland and Harman (1969). The parameter vector Q belongs to the parameter set Q = 911 where 9 = {0,1}, in our case. Suppose that associated with each index i E I = {1,2, ..... ,n} is an (ordered) k—tuple of indices, .1; e Ik, where i is the last index in 4*. We let Q‘l‘ denote the restriction of a to 4*, i e 1. An example with k = 3 is provided by 4* = (i-l, 1+1, i) and (z? =(oi_1, ”1+1, 0,). (One can avoid endpoint problems by using the mod-n arithmetic on I to define the Q1: for all i E I or simply by restricting the indices, say to i = 2,3, ..... ,n-l in this example.) The decision problem concerning the last component of a k—tuple with the decision based on independent obsevations based on all k parameters is called 11. Pk construct by Gilliland and Hannan (1969). Thus, we see that associated with n, the compound decision problem Q, and a neighborhood system 1* are n I‘k-decision problems with parameters Qll‘,Q12‘, ..... ,QE. 
With G: denoting the empirical distribution of this sequence and Rk denoting the Bayes envelope risk for the I‘k decision problem, the k-extended modified regret for a decision rule Q is (1.15) Likes) = mas) - R“(G,‘§). which subsumes (1.11) as a special k=1 case. Note that we are suppressing the display of dependence on I * by displaying only the size 15 k of the neighborhoods. As shown by Gilliland and Hannan (1969), for k k > 1, the R are more stringent envelopes than the (unextended) component enve10pe R1 = R, and, since G111 is a marginal of (3:, (1.16) Rk(G:) g R1(G111) for all Q, n. The enve10pe risk Rk(G:) is the envelope to compound risk obtained by minimizing over the class d‘ of compound decision rules Q where Qi is a fixed function of X? i = 1,2, ..... ,n. (The function does not depend on i.) -1, We are interested in the Bayes risk obtained by taking expectation of compound risk with respect to the distribution Q on Q. Suppose that G is a strictly stationary distribution on 01,02, ..... with common marginal Gk on Q‘i‘, i= 1,2, ...... . Theorem 3 of Gilliland and Hannan (1969) shows, in our application, that any compound decision rule Q satisfying (1.17) lim supn Dk(Q,Q) _<_ 0 for all Q also satisfies (1.1s) lim snpn pk(g,k) g o for all Q where (1.19) shes) = sash — R“(G“)- Suppose that the neighborhood structure is such that Rk(Gk) is close to Bayes risk B,(Q) for all C E y, specifically, suppose that (1.20) 11km“) 5 mg) + c for all G, n. 16 If Q is an asymptotic solution to the extended compound problem, i.e., satisfies (1.17), then, by (1.12), (1.18) and (1.20), (1.21) lim supn med) g c for all C e y. Thus, the compound decision rule will be asymptotically c—Bayes. The extended rules we prOpose can be shown to satisfy (1.17) by suitable adaptation Of the sequence extended work Of Ballard, Gilliland and Hannan (1975). Recall from the last section that Preston (1971), Devore (1975) and Hill et al. (1984) consider approximations to the Bayes response Q(Q). They examine the component Qi(Q) as a function Of X and approximate it by a. function Of Xj for j in a neighborhood of i, i E I. In the empirical Bayes situation where G is an unknown stationary Markov distribution, they propose empirical Bayes procedures that estimate G using X and plug these estimates into the component approximations. They give no rationale for the approximations. In contrast, the extended compound approach provides a rationale for determining the neighborhood structure. It should be large enough to control the excess Rk(Gk) - mg) and small enough to allow for efficient estimation Of the Optimal decision procedure in the corresponding restricted class of compound decision rules 9“. We find in our risk calculations in latter chapters that the extended compound rules outperform the competitors selected for comparison. 17 1.3 The Imsge Problem We have called the case where the index set I = {1,2, ..... } the transect case. Besides the transect case, we will consider the two—dimensional case where the indices index positions in an integer lattice, for example, the pixels in an image. We call this the flags case. Here {oij} may reasonably be assumed to be the realization Of a Markov random field in some applications. A brute force calculation Of the MAP estimate for the image Q with n = leN2 requires a search for a maximum across 211 calculated posterior probabilities. 
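To make the combinatorial burden concrete, the toy sketch below scores every configuration directly; the log-prior is passed in as an arbitrary function and the component densities are the N(-1, 1) and N(1, 1) of the Robbins component used elsewhere in this thesis. It is an added illustration, usable only for very small n.

import itertools
import numpy as np
from scipy.stats import norm

def map_by_enumeration(x, log_prior):
    # Brute-force MAP: maximize log G(theta) + sum_i log f_{theta_i}(x_i)
    # over all 2**n binary configurations theta.
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    best, best_score = None, -np.inf
    for theta in itertools.product((0, 1), repeat=n):
        theta = np.array(theta)
        score = log_prior(theta) + norm.logpdf(x, loc=2 * theta - 1, scale=1.0).sum()
        if score > best_score:
            best, best_score = theta, score
    return best

Even a 5 x 5 image already requires scoring 2^25 (about 3.4 x 10^7) configurations, which is why the approximations discussed next are needed.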
There has emerged a considerable literature dealing with the computational issues that arise in the MAP approach to image reconstruction 2 = 65,536 and, therefore, a since an image may easily have n = 256 posterior that is supported by 265’536 points. (LANDSAT 1—3 scanned the earth every 18—19 days recording images on a 3240x2340 lattice. Ripley (1986).) In the typical empirical Bayes approach to the two—dimensional case, the unknown Markov random field distribution has parameters that define the conditional distribution for each 0i j given values of am in a Specified neighborhood. It follows that the posterior probability for 0i j given X depends most heavily upon the Xkl for (k,l) in a neighborhood of (i,j). The result is that approximations to Bayes rules and MAP rules are possible and that estimates Of the parameters based on X provide for the construction Of empirical Bayes procedures for classifying the components of Q. The extended compound approach has the same advantages in the image case as in the transect case. In fact, Swain et al. (1981), Tilton et al. (1982) have used the compound decision theory approach to develOp rules tO classify image data. A difficulty encountered in the implementation Of the 18 rules is the estimation of GE. A "classify—and—count" method is prOposed to estimate Gk with the use Of a training set Of data (a set Of data with 11’ known classifications or a representative sample from the image data). First, the training set of data is classified by using uniform prior probabilities and then from this classification estimates Of G: are Obtained by counting the occurences of Qll‘. We will use the extended compound approach in the image case and compare the rules with the empirical Bayes classification rules suggested by Morris et al. (1985), Morris (1986) on selected images. 1.4 Summary In this Chapter we have reviewed the most relevant literature and have set the stage for later work where the risk behavior Of extended compound decision rules is compared with that Of other rules. In Chapter 2 we develop the details of the transect case. We derive an algorithm for the calculation of the MAP rule. We introduce an empirical MAP rule and describe further the empirical Bayes classification rules of Hill et al. (1984). The extended compound rules are developed and some results of simulations to determine risk behavior for the various rules are summarized. In Chapter 3 we develOp the details Of the image case. We introduce the empirical Bayes classification rules Of Morris et al. (1985), Morris ( 1986). The extended compound rules are deveIOped and some results Of simulations to determine risk behavior for the various rules are summarized. In Chapter 4 we give the results of the many simulations that were run. In the transect case, only Q generated as Markov chains are considered. In the image case both fixed pattern and randomly generated Q are considered. 19 Risks conditional on Q are estimated in every case. In image segmentation literature, the performance Of a classification rule is Often measured by the quantity "per cent classified correctly" (PCC). The measure PCC is related to the loss function Ll by PCC = 100(1 - Ll). However, a higher PCC does not necessarily mean that the image reproduced will be perceived to be better than one with a lower PCC. Swain et al. (1981) suggests measuring the performance of classification rules in two ways, one Of which is PCC. The other measurement is "average by class accuracy" (ACA). 
The ACA is obtained by first computing PCC for each class and then taking their arithmetic average. ACA will show whether the classification rule favors one class more than the others. We will report risk behaviors in terms of PCC. CHAPTER 2 THE TRANSECT CASE 2.1 Introduction In this chapter we consider the classification on a transect. Let 0i 6 O = {0,1}, i = 1,2, ..... . For the empirical Bayes application, it is assumed that parameter sequence {0i} has a stationary Markov chain distribution G as its prior. We derive an empirical MAP rule and extended compound rules for the classification Of Q. The risk performances of these rules are evaluated in comparison with the empirical Bayes rules (1.7) suggested by Hill et al. (1984). Let {Xi} denote the random observations. For our calculations, we take the Robbins' component. Thus, the Xi are conditionally independent given {0i} with Xi ~ P 0i, where P 0i is the normal distribution with mean 20i - 1 and variance 1. We let f0 denote the density Of P0 . The sequence of states {Qi} is assumed to be a realization of a two state stationary Markov chain with initial probability distribution (2.1) P(0=0)=p=1—P(0=1), 0=10 = 01 n {(Poo p01) [(0 ,_1, o) i (1 1)}} For all Q E 911 with (oi—1’ 0i) 6 92 - {(1, 1)} and 01 = 0; i = 2,3, ..... ,n n li—Q 0i1—0i_ f(§1.€) = RHS (2'4) x iI=12(Pool P01) and (2.5) becomes 11 log f(x,Q) = a + 2 bioi i=1 where 1 P 2 a = — 3— log 21 — 2121 (xi + 1) + (n-l) log p00 , b=2x+log—1-— =2x+log[l] 1 1 p00 1 T ’ P 1 1— . P00 and b = 2xn + log p0——1- = 2x + log [1'6]. 11 p00 11 T 28 2.3 Empirical MAP In order to implement the MAP rule (2.8) we need to obtain estimates for the unknown parameters of G. The proposed estimates are based on 2(_. When plugged into (2.8) the result is an empirical MAP rule. . X i’ is an unbiased estimate 1 1 II M: (ii) Let Zi = [sgn Xi 1‘ sgn Xi+l]’ i = l,2,...,n-1 and _ l n—l = a7 _ - b is an unbiased estimate for p(1—b), where a = .5[(1) - (-1)]‘2 1.0731 and b = 2a «1(1) (-1) 2 .2865. ll Proof (i) Since E(Xi|0i) = 20i —1 and E(0i) = l—p the proof of (i) is immediate. (ii) We have Zi = [sgn Xi at sgn Xi +1] Then, and the RHS (2.9) is evaluated by partitioning on (0i, 2+1) as follows. 29 Let RHS(2.9) = 2(pl+p2+p3+p4) where P1 = P(Xi 3 0’ X1+1 < 01 0i = 0’ “1+1 = 0) p3 = P(xi 2 0, xi+1 < 0, 0, = 1, em = 0) and p1, p2, p3, p4 can be evaluated without much difficulty. For example, P1 = P(Xi = P(Xi = (1 - <1’(1))‘1’(1)1511 = 9(-1)<1>(1)6p and similar computations yield p, = mania-0p p3 = («rustic-6) p4 = 2(1)(-1)11—p—p(1—01 . Z 01Xi+1< ””1 = 019m: 0) PM = 01‘1“: 0) Olli = 0)P(Xi+1 < 0|0i+l = 0)6p IV RHS (2.9) = 2p(l-6) [c(1) — <1>(—1)]2 + 2e(1)<1>(—1). zi-2<1>(1)<1>(-1) J = _ 5, E [mums—1))2 p“ ) and this completes the proof. :1 30 Since 0 5 p, 6 5 1, it is natural to truncate U11 and Vn—l at 0 and l and use the modified estimators I Un = Un[0 < Un < 1] + [Unz 1] I Vn—l The unknown parameters p and 6 are estimated by solving the = Vn_1[0 < Vn_1 < 1] + [Va-1 2 1]. equations D = 111’, (2.10) p(1 _ 5) = vii-1' 2.4 Classification by Extended Compound Bayes Rules We begin this section with a brief review of the I‘k construct introduced by Gilliland and Harman (1969) when applied to the finite—state, finite—action compound decision problem. Consider a component decision problem with states 0 E 9 = {1,2, ..... ,m} indexing possible distributions P0 E .9: {P1,P2, ..... ,Pm} where Pi (i=1,2, ..... ,m) are distinct probability measures on (..S .3. The actions a e .1: {1,2, ..... ,n}. 
Let the loss function L(0, a) be such that 0$L(0,a) 1 - 1 1‘ f1(Xk) E pilfi(Xk—l) {0(Xk) .E piofi(Xk—l)' 1—0 < 1—0 34 This is equivalent to 1 Z 1 i§0p10f1(xk-l) decide 0k = if Xk 2 log I 0 <.E0P11fi(Xk-1) which can be written as 1 2 (2.14) decide 0k = if Xk C(p(Xk_1)) 0 < where lE PilfiP‘) p“) = i s m p.. x 1—0 j=0 ‘1 ‘ and C(9) = $103 [L32] - The Bayes enve10pe in the [‘2 decision problem is 2 2 __ x x 11(2) — Ix x1 {X2: x2$c(p(xl))} [f 0( 1)})01 + f1( 1)P11]f1(x2)dx1 dxg f x f x f dx (1 + Ix1 {x2zx2>{:(p(x1))} [0( 1)P00 + 1( 1)P10] 002) 1 x2 35 = 1x1 [f0("1)P01 + f1(x1)P11]°(c(P(xl)) 4W1 “I" le [f0(x1)p00 + f1("1)P10][1 " ¢(C(P(x1)) + 1)]dx1 1 (2.15) = igo Ifi(x1)[p,0{1- (c(p(x1)) + 1)} + pil(c(p(x1))-1)ldx1. For the F3 decision problem, the state space is e3 = {(i, j, k): i, j, k = 0, 1} and we let p3 = {pijkz i, j, k = 0, 1} be a probability measure on 93. In this case (2.12) gives 1 E f101<(§3)P10k 1 A(x)=2 0'3 i=0 k=0 1 1 =f 2 2 f.x f x p. 0(X2). “(=0 1( 1)k( 3) 10k 1: and 1 E0 fillt(3‘-3) Pilk l A(x)=2 1‘3 i=0k- 1 1 = f1("2)iE0 kE0 fi(xl)fk(xk)pilk - 36 The I‘3 Bayes rule versus p3 is given by 1 (2.16) decide 0! = if 0 1 1 2 1 1 f1(x£) iE0 kEopilltin‘t-l)fl<(xt+1) £009) iE0 kEoPiokin‘t—l)flt(xt+1) - < This is equivalent to decide 0! = if 1 kE0 piOkfi(Xl-1)fk(xl+l) >< N [0| 0% “1441(th O 1 0 kE0 pilkfi(Xt-l)fk(xl+l) which can be written as l 2 (2.17) decide 0! = if X l c(p(X[_l, X t +1)) 0 < where 1 1 2 2 pi _ 1: = NEW-g: 213 ‘1]: f()f() p... . X y i=0 j=0 k=0 ‘1‘“ 1‘ lkfi(x)fk(y) 37 It can be shown that the Bayes envelope in the I‘3 decision problem is 19(3):}; I A(-)dx B x1"3[{x2=x2.<.C(p(x1.x3))} 1x3 2 + j A ( ) ] dx dx {x2=x2><=(p(x1,x3))} 053% l 3 1 1 = igo kgo I x1 Ix3 fi("1)fli("3)[P101t{1 — ‘P(C(p(x1, x3)) + 1)} + pilk (c(p(xl, x3)) - 1)] dxldx3 . The estimation of the I‘2 and 1‘3 Bayes rules versus empirics in the compound problem depends on the construction of estimates for G121 and G3. The estimates for these empirical distributions are then plugged into (2.14) and (2.17) in place of p2 and p3 yielding extended compound Bayes rules. We follow Hannan (1957), Robbins (1964), Van Ryzin (1966) and Ballard (1974) in the construction of unbiased estimates. Definition 2.1 h = (110’ hl) is said to be an unbiased estimator of the family .9: {P0, P1} if Eihj(X)=[i=j] i,j=0,1 where Ei denotes expectation with respect to Pi . 38 From h_ one obtains unbiased estimators of 9 k by taking _ . k hik(xk) — hi1(xl)hi2(x2) ..... hik(xk) for all 1k E 9 . Such an estimator is called a product estimator [Ballard (1974)]. The unbiased estimator we use for {P0, P1} is the kernel function used by Robbins (1951): {(x) = (r0007 110‘» where r0(x) = é-(l — x) and rl(x) = £1 + x). These are used to produce estimators for the components of G121 and G3 as follows: ‘2 _ 1 n and . —1 3 _ 1 n (2.19) (3114;in — _2n— E2 ri(Xa_1)rj(Xa)rk(Xa+l) , n 2 2 . These estimators are unbiased and consistent for the components of G34 and G24. The extended compound rule ER2 is formed by substituting (2.18) for pij in (2.13) and letting k = 2,3, ..... n . Similarly, the extended compound rule ER3 is formed by substituting (2.19) for pijk in (2.16) and letting l = 2,3, ..... ,n—l . 39 2.5 Comparison of the Methods Even though we have suggested two types of classification rules, MAP and extended, our main interest is in the performance of the extended rules. Chapter 4 describes in more detail the simulations and the classifications obtained using various rules. 
As an example, we present some of the PCC comparisons for the extended rule ER3 and its competitor HEB3 defined following (1.7) in Chapter 1. Table 12 in Chapter 4 gives more detailed information on the performance of these two rules. In 100 simulations for each of 84 different (p, 6) combinations: A. with p = 50 mmmngnts (i) In 76 (90%) cases the ER3 has a higher PCC than the HEB3. The improvement of ER3 [=PCC of ER3 - PCC of HEB3] has a mean of 3.74 with standard deviation 2.62. (ii) In 8 (10%) cases the HEB3 has a higher PCC than the ER3. The improvement of HEB3 has a mean of .82 with standard deviation .48. B. with n = 299 cgmmnents. (i) In 75 (89%) cases the ER3 has a higher PCC than the HEB3. The improvement of ER3 has a mean of 2.30 with standard deviation 2.25. (ii) In 9 (11%) cases the HEB3 has a higher PCC than the ER3. The improvement of HEB3 has a mean of .33 with standard deviation .24 It was also observed that the ER has a very high improvement over HEB when either p is very small ( < .1) or 6 is very large ( >.9). 40 In implementing the empirical MAP rule (2.8), the unknown parameters (p,6) were estimated by using Proposition 2.1 and the following convention was used in cases where either the estimates were undefined or truncation was necessary. Let u,v denote un , v respectively. n—l (i) If{usO}U{0 .5 , v s 0} then decide all 0i = 0. (iii)If{u21,v,>_1}u{.5 no. The Bayes rule versus a probability measure p on 0 = 0 and l - p on 0 = 1 is 1 2 (4.1) decide 0 = if (1 - p)f1(X) pfO(X) 0 < which is equivalent to 1 2 (4.2) decide 0 = if X c(p) 0 < where _ 1 1 (4-3) c(p) — 2 (#0 + #1) + ”1 - #0 log [1%] The Bayes risk R(p) is given by (4-4) R(p) = (H) ¢(C(p) - #1) + pll - “C(10) - #0)]- We shall call the Bayes rule versus the uniform prior on 9 the simple rule (SR). Then (4.2) and (4.3) gives the simple rule as (4.5) decide 0: (I) if X 2 ago + 141) < and (4.4) become (44) R(-5) = «as, + 41)) . 60 61 Table 1 below gives the Bayes risk and the theoretical value of the PCC when classification is performed with the SR, in the special case when ”0 = “”1 = “I‘- Table l. Bayes risk and PCC for the SR ‘5 Bug risk PCC .25 .4013 59.87 .50 .3085 69.15 .75 .2266 77.34 1.00 .1587 84.13 1.50 .0668 93.32 1.75 .0401 95.99 2.00 .0228 97.92 The PCC of the SR is a reasonable measure against which to compare the performance of other classification rules. In our simulation studies we have taken 11 = 1 so that, if a decision rule is to be of any practical use, then its PCC has to be at least 84.13. The simulation programs were written by Mr. G. Heidari under the supervision of the author. The programs are in Turbo Pascal, Version 87. Standard normal deviates were generated in pairs (2], z2). Using the inverse probability transform for the Raleigh distribution and a uniform (0, 1) random variable, a radius R was generated. Using the uniform (0, 21), an angle (4 was generated and (zl, 22) determined by 21 = R cos 1.) , 22 = R sin 11). 62 4.2 Simulations on the Transect In our simulation studies the .0 sequence was generated as a Markov chain with the parameters (p, 6) specified by (2.1) and (2.2). A total of of 84 different combinations of (p, 6') were considered, with p = 0, .01, .10, .20, .30, .40, .50 6 = 0, .10, .20, .30, .40, .50, .60, .70, .80, .90, .99, 1.00 . The observable Xk(k = 1,2, ..... ,n+1) were generated as Xk = Zk + 20k-l where Z1‘ is i.i.d standard normal. For each (p, b), 100 replications were performed with n = 50 and n = 200. 
The simulated data were classified by the following 5 decision rules. 1. EMAP: Empirical MAP rule (2.8) 2. ER2: r2 Bayes rule 3. ER3: I‘2 Bayes rule 4. HEB3: Empirical Bayes rule of Hill et al. (1.7) with j = 1. 5. HEB5: Empirical Bayes rule of H111 et al. (1.7) which j = 2. Tables 2 - 11 display the mean and the standard error of the PCC for the 5 decision rules. (All tables appear at the end of this Chapter.) The ER3 and HEB3 when used for classifying 0i are (Xi—l’xi’xi +1) measurable. Table 12 compares the performance of ER3 versus HEB3. 63 4.3 Classification of Images Due to the complexity of the simulation of the true image as a MRF only special types of patterns and MRF were considered. The patterns considered were 1. checkerboard size 25 x 25 2. road sign size 25 x 25 3. great lakes size 31 x 44. In the case of MRF simulations, the Markovian dependence of 0ij was restricted to only one side. In particular, we assume (4.7) P(0ij|rest) = P(0. .0| ij—l’a i—lj)' MRF of type (4.7) with additional simplifying approximations were considered by Hansen and Elliott (1982), Haslett (1985). In our simulations the MRF was generated as (4.7) by specifying values for the 4 parameters p1 = Puij = ”Mg—1 = 0’ ‘i-1,j = 0) p2 = PM)“ = "I‘m—1 = 0’ ‘r—1,j = 1) p3 = P(0ij = 0|0.,J._1 = 1, oi—l,j = 0) p4 = Puij = ”lag—1 = 1’ ‘1-1,j = 1) The MRF simulated were of size 31 by 44. In each of the images the lattice was extended to size 27 by 27 or 33 by 46 is an obvious way, in order to provide the border pixels with enough neighbors. The conventions used in obtaining the image was to color the pixel black if 0 = l and white if 0 = 0. As in the transect case, for each pixel (i, j) the observable Xij were generated as where Zij is i.i.d. standard normal. 64 The image segmentation was performed by the following 4 decision rules 1. ER4: Extended rule (3.16) with 4 neighbors 2 SR: Simple rule (4.5) 3. MEB4: MEB rule (3.7) with 4 neighbors 4 MEBS: MEB rule (3.7) with 8 neighbors After obtaining the image by using ER4 three revisions (lRER, 2RER, 3RER) of the extended rule (3.16) were implemented. The method of revision is as follows. At the kth (k 2 1) revision, the data were classified using ER4 with the empirics (3151(f5) estimated from the (k—l)St revision (0th revision is the classification obtained by using ER4) For each image 25 simulations were performed. Morris et al. (1985) reports their findings based on one simulation except for one image where 10 repetitions were done. Tables 13—18 give the PCC for the decision rule used on the 6 images. For each image we have reconstructed (Figures 3—14) the best and the worst (based on the PCC of ER4) classification. Figure 15 shows a reconstruction of the great lakes pattern along with the PCC and the ACC at each stage. 65 4.4 Conclusion The simulations performed with regard to the transect case show that overall the 1‘3 Bayes rule performs better than the existing empirical Bayes methods proposed by Hill et al. (1984). The simulations performed with various images, both deterministic and stochastic, show that again the extended rules perform better than the empirical Bayes rules suggested by Morris et al. (1985), Morris (1986). In a total of 150 simulations with various images, MEB4 had a higher PCC than ER4 only once. Furthermore in 142 simulations the ER4 had a higher PCC than even MEB8. All of the revised extended rule classifications had a higher PCC than both MEB4 and MEB8. 
For some images, great lakes pattern and MRF with (p1, p2, p3, p4) = (.80, .40, .50, .60), the simple rule performed better than both the MEB4 and MEB8 rules in all 50 simulations, whereas for ER4 only 3 simulations had a lower PCC than the simple rule. Even in those 3 instances the revisions took care of that. The revisions of the extended rules had a small improvement on the PCC and no more than 3 revisions were needed to produce stability. The work of Geman and Geman (1984) falls within the empirical Bayes structure in that the hyperparameters are estimated from the marginal distribution of the data. It will be of interest to compare the performance of the extended rules with the method of simulated annealing proposed by Geman and Geman (1984). .10 .20 .30 .40 .70 .80 .99 1.00 Table 2. 2‘8 2‘8 {"3 1‘8 Pg .. as as as 88 as 88 <9 1"?“ 5% The PCC .01 90.42 1.58 91.32 1.35 89.74 1.27 88.98 1.51 89.66 1.35 89.98 1.88 91.18 1.53 93.12 1.21 92.44 1.41 93.66 1.15 90.08 1.63 93.32 1.20 I... O .4 .. 2"‘9° 33 S8 00 «I t—Iga hub 88 oo hit-I 00 as a: 8 oo on H "‘9’ 599 HO! Nth oo l—IH 32 83 3% ea r8 e522 88 88 .20 79.88 .98 78.38 1.31 81.52 1.01 80.12 1.47 77.64 1.50 79.08 1.25 77.70 1.51 73.36 1.92 74.02 1.83 78.22 1.80 88.22 1.57 94.10 1.05 for the EMAP rule 78.78 .65 79.40 .62 80.42 .77 80.40 .91 80.54 .85 78.88 1.06 73.00 1.51 73.62 1.60 72.00 1.49 73.76 1.91 90.64 1.46 95.02 1.12 with n .40 77.68 1.15 79.18 .80 78.72 .78 78.24 .88 79.80 .71 79.30 .83 76.50 1.18 74.02 1.37 70.74 1.53 70.86 1.78 90.06 1.39 95.10 1.01 =50 75.42 2.26 71.74 1.83 75.72 1.22 77.74 79.24 .83 80.24 .10 .30 .40 .50 .60 .70 .80 1.00 Table 3. .01 90.84 1.48 91.33 1.58 90.19 1.43 92.28 1.37 ‘O CO 8 on ‘0 rr #9 r. #9 rr 3% 38 88 $3 83 to co 2"!" 1"!" $8 83 Lg i—ias 1—11— .10 81.82 1.03 81.85 1.09 85.85 .78 81.10 1.47 81.09 1.73 79.34 1.79 77.20 2.21 79.39 1.92 80.40 2.14 80.57 2.18 87.11 1.80 94.01 1.17 .20 80.57 .42 82.50 .48 83.44 .78 85.26 .46 83.62 .70 80.24 1.34 78.17 1.37 72.06 1.86 69.52 1.70 66.22 2.16 81.30 1.99 93.47 1.09 .30 77.69 .41 77.43 .50 80.94 .48 82.97 .43 83.95 .37 82.18 79.91 .76 70.99 1.42 66.23 1.69 61.53 1.69 78.43 1.78 92.86 1.12 .40 77.37 .83 74.07 .85 74.58 .70 79.61 82.50 82.33 .33 81.52 .43 75.53 66.57 1.13 61.56 1.16 72.89 2.20 96.19 .96 The PCC for the EMAP rule with n = 200 .50 67.01 2.95 72.59 1.24 75.67 .89 79.00 81.74 .45 83.17 82.64 .32 79.91 72.10 65.78 70.87 1.98 94.05 1.38 .10 .20 .30 .40 .70 .80 1.00 Table 4. The PCC for the HEB3 rule with n = 50 CO CD p—s t—s NCO p g r. ca: at: 00 MN 01” N we» 1°59 1° 90‘ .8 r3 as r3 .° 3: as as as as to . 3".“ N 83 38 to "‘3" .10 84.60 1.28 86.26 1.33 86.86 1.28 88.48 1.07 88.68 1.08 88.52 1.28 89.24 1.49 90.12 1.46 91.02 1.44 91.30 1.67 92.74 1.81 92.34 1.80 .20 83.38 .79 82.58 .78 85.36 .82 85.30 .80 84.98 .86 88.00 .68 88.12 .74 88.88 .70 91.52 .89 90.14 1.66 91.86 1.81 93.76 1.65 83.28 .61 83.46 .51 83.74 .56 82.80 .63 84.38 .49 85.32 .55 86.12 86.82 88.84 .67 91.68 .93 92.00 1.79 94.18 1.61 .40 88.12 .49 85.70 83.26 82.80 .49 82.22 .57 82.42 83.32 85.54 .58 88.08 .59 90.74 93.52 1.46 93.18 1.60 95.02 .42 91.12 .45 86.98 .51 84.20 .51 82.50 81.70 81.56 .61 84.26 .61 86.34 .57 91.12 92.78 19.8 83' Table 5. p= .00 6: .00 94.48 1.65 .10 94.31 1.39 .20 93.39 1.71 .30 92.64 1.91 .40 94.09 1.66 .50 96.45 1.25 .60 94.80 1.36 .70 94.14 1.50 .80 93.24 1.89 .90 93.63 1.65 .99 90.45 2.25 1.00 92.08 2.04 #3 m r8 “& 8 co "‘. 1‘3!" 
3—8 cc on 014 or ‘D Q 04 88 #3 1‘3 323 at: .8 as .10 88.81 90.75 .37 90.90 .37 91.75 92.59 .27 93.10 .33 94.21 .20 94.89 .22 94.46 .97 95.93 .52 95.12 1.35 94.69 1.40 .20 86.45 .35 87.58 .30 87.64 .25 87.90 .24 88.32 .26 89.12 90.39 .22 91.75 .25 92.74 .23 94.75 .21 95.56 1.24 95.01 1.39 .30 85.53 .28 84.63 84.73 .30 85.57 .25 85.08 .29 85.62 .25 87.53 .24 88.89 .24 90.96 93.68 .21 94.13 1.24 92.87 1.72 .40 88.17 86.27 .26 84.27 82.77 .28 83.82 83.64 84.89 86.31 .28 88.72 .27 92.25 96.21 r3 28 The PCC for the HEB3 rule with n = 200 95.85 .19 91.32 .21 87.23 .23 84.74 .26 84.00 83.30 .25 83.90 84.81 .24 87.82 91.17 95.32 96.43 1.07 70 Table 6. The PCC for the HEB5 rule with n = 50 p= .00 01 10 .20 .30 .40 .50 6:.00 90.90 84.44 78.66 79.84 82.48 97.46 1.89 2.47 1.74 1.13 .65 .31 .10 89.80 86.64 82.04 78.46 81.28 91.44 2.06 2.35 1.57 .90 .55 .43 .20 89.66 85.56 83.14 82.78 81.60 86.40 1.89 2.26 1.55 .92 .62 .55 .30 82.58 85.76 84.36 81.96 81.22 83.28 2.65 2.35 1.42 .81 .65 .54 .40 86.38 87.24 84.62 83.84 82.86 81.22 2.33 2.12 1.32 .90 .53 .57 .50 88.92 92.80 86.78 86.02 83.58 80.92 2.16 1.39 1.40 .81 .62 .58 .60 90.90 88.08 86.20 86.96 85.04 80.16 1.89 2.20 1.87 .87 .74 .64 .70 87.18 86.88 85.82 87.38 85.40 83.44 2.36 2.26 2.10 .98 .84 .58 .80 90.90 88.18 89.40 91.60 88.26 85.82 1.89 2.27 1.69 .91 .79 .59 .90 87.18 86.98 87.10 89.50 90.72 90.72 2.36 2.30 2.29 1.90 1.10 .64 .99 89.18 90.62 89.60 89.08 89.54 89.90 2.12 1.97 2.27 2.09 2.01 1.89 1.00 87.52 88.76 86.84 90.04 90.16 86.42 2.36 2.27 2.31 2.02 1.94 2.48 .10 .20 .30 .40 .60 .70 1.00 Table 7. 88 ‘0 :-'r-' $8 r3 :8 .3 NS 35 83 on 88 CD on NS P8 P9 w 1% “8 88 The PCC for the HEB5 rule with n = 200 F3 #3 93 98 98 9% 98 $8 98 98 as a: as :8 a: as 83 a; $8 3% 8 .10 87.02 .79 87.88 .76 89.14 .49 89.95 .70 90.80 92.60 .32 93.71 .28 94.95 .21 94.21 1.03 95.79 .67 94.25 1.47 90.60 1.96 .20 85.63 .39 86.31 .35 87.01 .27 87.11 .32 87.85 .27 88.35 .26 89.71 .23 91.55 .25 93.29 95.88 .18 95.98 1.19 93.15 1.76 .30 85.55 .27 84.47 84.00 .29 84.79 .28 84.46 .29 85.25 .27 86.90 .27 88.55 .25 90.83 94.43 .21 92.97 1.69 88.14 2.10 .40 88.39 .25 86.06 .26 83.76 .27 82.03 .27 82.82 83.06 .24 84.15 .26 85.64 .28 88.55 .27 93.45 .19 95.58 1.22 93.49 1.78 98.62 .12 91.71 .21 86.84 .25 83.98 .25 83.12 .28 82.56 .27 83.05 84.02 87.54 91.79 .24 95.67 1.37 94.51 1.42 Table 8. = .00 6:00 98.38 .31 .10 98.80 .22 .20 98.78 .25 .30 98.46 .22 .40 98.32 .25 .50 98.76 .19 .60 98.38 .31 .70 98.78 .20 .80 98.38 .31 .90 98.78 .20 99 98.68 .22 1.00 98.48 .24 The PCC for the ER2 .01 97.94 98.14 .24 97.56 .28 97.82 .27 97.82 .26 98.34 .22 98.34 .28 98.36 .23 98.66 .25 98.52 98.50 .21 98.82 .20 .10 91.50 .45 92.00 .40 91.68 .47 91.64 .47 91.94 .49 92.76 .49 93.36 95.00 .38 94.02 .65 96.86 98.44 .28 98.64 .20 .20 88.26 .44 87.16 .48 88.58 87.84 .52 88.08 .48 89.26 .45 89.04 .47 88.74 .55 91.32 .55 93.08 .59 97.12 .40 98.78 .21 rule with n = 50 .30 85.88 .52 85.90 .47 85.56 .51 84.90 .49 85.36 .48 86.82 .52 86.18 .55 86.76 .61 87.80 91.78 .55 96.92 98.82 .21 .40 86.90 .53 85.50 .47 84.42 84.14 83.32 83.42 .51 83.96 .49 85.56 86.30 .57 88.80 .59 97.46 98.78 .21 90.30 87.14 .59 84.86 84.14 .53 83.94 83.02 83.40 84.80 84.96 .59 87.98 .49 96.30 98.68 .24 6: .00 .10 .20 .30 .40 .50 .60 .70 .80 .99 1.00 Table 9. 
p: .00 99.47 .07 99.45 .08 99.46 .07 99.55 .08 99.56 99.60 .07 99.41 .07 99.58 .08 99.59 99.50 .07 99.42 .07 99.54 .07 73 The PCC for the ER2 rule with n .01 98.71 98.50 .11 98.66 98.78 .11 98.32 .13 98.88 .11 98.47 .12 98.62 .14 98.93 .15 98.77 .17 99.26 .14 99.56 .07 .10 92.52 .18 92.38 .17 92.34 .17 92.76 .19 92.81 .20 93.10 .21 93.91 .21 94.09 .25 94.76 .24 95.87 .27 98.26 .27 99.51 .08 .30 87.42 .30 86.18 .25 85.78 .25 85.99 85.63 .30 86.06 .26 87.22 .26 88.16 .25 88.99 .26 90.78 95.59 .36 99.43 .08 .40 87.59 .25 85.90 .23 84.54 83.49 .26 84.41 .25 84.50 .23 85.20 .22 86.22 .26 87.10 .27 89.22 .23 94.52 .41 99.53 .07 200 .50 91.62 .23 88.16 .28 85.80 .29 84.29 .27 84.41 .24 84.02 .23 84.38 .26 84.45 .26 86.58 .24 88.64 .23 93.89 99.45 .07 6:00 .10 .20 .30 .40 .50 .60 .70 .80 1.00 Table 10. p= .00 97.58 .36 98.36 .28 98.16 .33 8.8 8 8 8 8 8 8 a 82 88 82 88 82 38 88 is) oo :9 ..‘q 583 The PCC for the ER3 .01 97.46 .28 97.42 .31 97.02 .33 97.26 .34 97.22 .32 97.86 .25 97.88 .29 97.92 .26 98.16 .26 98.02 97.98 .29 98.12 .31 .10 90.56 91.60 .42 90.62 .55 91.10 91.42 .55 92.66 .48 93.32 .51 94.70 .42 94.16 96.62 98.00 .32 98.20 .25 74 .20 87.78 .48 86.40 .53 87.90 87.48 .53 87.34 .52 89.00 .49 88.80 89.36 .53 92.02 94.00 97.24 98.30 .24 rule with n .30 .40 86.22 88.86 .51 .49 85.64 86.62 .49 .50 84.88 84.42 .56 .50 84.10 83.22 .50 .53 84.96 82.50 .49 .49 86.12 83.10 .51 .50 86.96 84.20 .56 .53 87.52 85.82 .57 .59 88.74 87.12 .56 .58 92.28 90.20 .56 .60 96.62 97.62 .44 .32 98.28 98.22 .25 .25 50 93.52 .46 89.80 86.08 .58 83.88 .52 82.94 82.04 82.46 84.74 86.50 .57 90.22 .49 96.56 .41 98.22 .27 75 Table 11. The PCC for the ER3 rule with n = 200 p= .00 .01 .10 .20 .30 .40 .50 6: .00 99.16 98.35 92.10 89.19 88.71 90.14 95.05 .10 .12 .20 .23 .26 .24 .20 .10 99.23 98.16 92.13 88.72 86.45 87.33 90.99 .11 .15 .19 .23 .23 .23 .25 .20 99.08 98.42 92.09 88.08 85.48 84.96 87.52 .11 .14 .17 .22 .26 .28 .24 .30 99.26 98.55 92.45 88.20 85.82 83.30 85.00 .10 .12 .21 .22 .24 .25 .26 .40 99.26 98.21 92.87 88.72 85.29 84.06 84.52 .09 .14 .20 .27 .30 .27 .24 .50 99.28 98.72 93.52 89.15 86.19 84.14 83.65 .11 .12 .18 .25 .26 .23 .22 .60 99.10 98.25 94.23 90.43 87.99 85.58 84.11 .10 .13 .21 .22 .26 .22 .26 .70 99.27 98.45 94.73 91.71 89.19 87.04 85.30 .09 .12 .23 .27 .25 .27 .26 .80 99.12 98.74 95.68 92.49 90.85 89.02 88.52 .10 .15 .20 .24 .23 .31 .26 .90 99.11 98.66 96.60 94.46 93.10 91.80 91.22 .07 .17 .21 .22 .20 .21 .24 .99 99.10 98.98 98.40 97.23 96.95 96.30 95.99 .10 .13 .19 .26 .27 .30 .24 1.00 99.26 99.21 99.16 99.12 99.14 99.16 99.23 .09 .09 .11 .10 .12 .10 .10 76 Table 12. The performance of the ER2 rule versus the HEB3 rule A. n=50 p=.00 01 10 20 .30 40 50 6:.00 4.64 7.86 5.96 4.40 2.94 0.74 -1.50 .10 3.70 5.48 5.34 3.82 2 18 0.92 -1.32 .20 6.22 5.08 3.76 2.54 1 14 1.16 -0.90 .30 9.10 5.64 2.62 2.18 l 30 0.42 -0.32 .40 5.78 7.70 2.74 2.36 0.58 0.28 0.44 .50 6.72 4.28 4.14 1.00 0.80 0.68 0.34 .60 4.64 5.50 4.08 0.68 0.84 0.88 0.90 .70 8.90 9.38 4.58 0.48 0.70 0.28 0.48 .80 4.64 5.52 3.14 0.50 -0.10 -0.96 0.16 .90 8.90 9.88 5.32 3.86 0.60 -0.54 —0.90 .99 5.68 4.12 5.26 5.38 4.62 4.10 3.78 1.00 6.72 5.70 5.86 4.54 4.10 5.04 7.48 76/84 90% ER3 better: mean = 3.74 std. dev. = 2.62 8/84 10% HEB3 better: mean = .82 std. dev. = .48 B. 
n=200 p= 00 01 10 .20 30 40 50 6:.00 4.68 4.31 3.29 2.74 3.18 1.97 -0.80 .10 4.92 4.97 1.38 1.14 1.82 1.06 -0.33 .20 5.69 7.78 1.19 0.44 0.75 0.69 0.29 .30 6.62 4.07 0.70 0.30 0.25 0.53 0.26 .40 5.17 3.37 0.28 0.40 0.21 0.24 0.52 .50 2.83 2.34 0.42 0.03 0.57 0.50 0.35 .60 4.30 7.18 0.02 0.04 0.46 0.69 0.51 .70 5.13 2.45 -0.16 -0.04 0.30 0.73 0.49 .80 5.88 1.74 1.22 -0.25 —0.11 0.30 0.70 .90 5.48 2.28 0.67 -0.29 —0.58 -0.45 0.05 .99 8.65 2.75 3.28 1.67 2.82 0.09 0.67 1.00 7.18 2.25 4.47 4.11 6.27 2.26 2.80 75/84E89%; ER3 better: mean = 2.30 std. dev. = 2.25 9/84 11% HEB3 better: mean = 0.33 std. dev. = 0.24 lRER 94. 868 94.575 95. 528 95. 015 96.554 95. 381 95. 601 95. 528 96.554 95.968 94. 648 95.968 95. 528 95. 748 95. 894 95. 528 95. 674 94.795 94.721 95. 601 95.088 95. 821 95. 821 95. 674 94.868 95.478 .542 2RER 94.795 94.721 95.528 94.868 96.481 95.455 95.601 95.528 96.408 95.894 94.648 95.821 95.455 95.601 96.114 95.528 95.674 94.941 94. 941 95. 601 95.235 95. 748 95. 601 95. 674 94. 795 95.466 .499 77 3RER 94.795 95.015 95.455 94.868 96.481 95.381 95.601 95.528 96.188 95.894 94.648 95.821 95.455 95.601 96.041 95.601 95.674 94.941 94.941 95.601 95.161 95.748 95.601 95.601 94.868 95.460 .462 SR 83.871 83.798 84.824 83.871 84.897 83.871 83.284 84. 091 85. 557 85.191 85.264 85.557 85.337 84. 971 85.484 84.091 83. 798 83.138 84.531 83.358 84.091 84. 824 84.604 82. 845 84.091 84.370 .800 MEB4 71.408 70.894 69.941 71.408 71.188 70.528 72.067 70.894 71.774 70.455 71.188 70.894 72.727 71.848 71.334 70.528 71.041 69.795 70.455 70.528 70.088 70.015 71.334 69.868 71.848 70.962 .756 Table 13. The PCC with the GREAT LAKES pattern MEBS 73.680 73.837 73.900 73.680 74.194 72.947 73.607 73.314 73.754 72.874 72.874 73.021 74.633 73.387 74.707 73.754 73.240 72.581 72.727 72.287 72.141 73.974 72.067 72.067 74.633 73.337 .784 ER4 90. 560 91. 360 93. 600 89.440 90.720 92.480 90. .720 93.120 91.520 89.600 91. 360 92.960 92. 000 92. 480 91. 360 88. 480 92. 160 93. 280 92.320 92.960 92. 000 93. 440 mean 91.706 s.d. 1.467 lRER 91.360 92.480 93.440 89. 280 92. 160 94.080 91. 360 93.120 92.320 89.760 89. 760 91. 520 91. 840 92. 800 93.760 92. 320 90.400 92.000 93.440 93.280 93.920 93.600 92.640 93.920 92.186 1.441 78 SR 82.720 83. 520 85.760 83.200 84.000 85.440 83.840 84.480 83.040 82.560 84.480 84. 800 84.160 83.520 84.800 85. 440 83. 840 85.920 86. 240 84.960 87.040 84. 800 85.280 85.280 84.531 1.128 Table 14. The PCC with the CHECKERBOARD MEB4 88.000 89.120 92.320 87.680 89. 600 90. 400 90. 880 89.280 90.720 88.000 87.840 88. 960 89. 120 91.040 89.440 91.840 89.760 87.520 90.080 90.080 89.760 91.840 90. 720 90.040 90. 880 89.731 1.412 pattern MEB8 89. 280 90. 240 93. 440 89. 600 90. 880 92. 320 90.880 92.160 89.440 89. 600 89.920 91. 840 90.240 92.800 88.800 90. 560 91.680 91.200 94.240 91.840 91.520 92.000 90.867 1.533 Table 15. The 79 PCC with the 2RER 95.520 97.280 93.600 96.160 95.360 96.960 95.200 94.720 94. 080 95. 040 95.840 95.680 94.560 95.360 95.520 96.640 96.640 96.640 95. 200 95. 040 95.840 95.578 .912 3RER 95.680 97.600 95.360 93. 440 96.160 95. 520 96. 640 95. 200 94.720 94. 080 95.040 96.000 95.520 94.880 94. 560 95. 680 95. 520 96. 640 96. 640 96.320 96.000 95. 200 95. 200 96. 640 95.840 95.603 .908 SR 83.360 87.680 85.760 83.200 84.000 83.040 86. 560 85.120 83. 520 84.640 82.720 83.520 86.720 82.400 83.840 83.840 86.400 86.880 85.280 85.920 84. 640 85. 
280 84.000 88.000 86.560 84.915 1.607 ROAD SIGN pattern MEBS 91.360 95.200 93.440 90.880 90.560 90.560 94.080 92.320 92.800 92.320 90.720 89.280 93.440 90.720 92.160 91.040 93.760 92.320 92. 640 91. 360 92.000 92. 000 91. 360 93. 760 92. 000 92.083 1.359 Table 16. The PCC with the (.90,.50,.50,.10) MRF ER4 89.223 88.416 89.003 88.196 2RER 89.809 88.563 89. 003 88. 710 87. 977 88. 636 87.683 88. 416 88.050 89. 809 87. 683 89.516 89.150 88. 270 87. 977 90.396 88. 343 87.830 88. 563 88. 710 88. 343 90. 029 88.856 88. 710 90. 469 88.780 .814 80 3RER 89.736 88.416 88.930 88.710 88.123 88.563 87.683 88.636 88.123 89.809 87.683 89.443 89.223 88.343 87.977 90.396 88.343 87.903 88.636 88.490 88.416 90.029 88.930 88.563 90.323 88.777 .784 SR 85.191 84.164 83.431 84.897 84.018 83. 871 83.358 83.871 83.651 84.677 84. 897 84.384 84.677 83. 871 83.211 85.924 83. 871 82.258 84. 897 83.504 83.504 84.018 84.384 85. 777 85.337 84.226 .854 MEB4 87.537 86.804 86.510 86.364 86.510 86.657 86.070 87.648 85.777 86.657 86.804 87.170 86.510 86.144 84.971 88.270 85.411 85.264 87.243 86.217 86.437 87.023 87.243 88.636 87.390 86.691 .867 pat tern MEBS 88.270 87.170 86.950 86.657 87.023 87.023 85.997 87.454 85.777 87. 170 87. 097 87. 610 88.050 86. 437 86. 364 88. 416 85. 997 85.264 87.317 86.217 86.730 87.463 87.097 87.830 88.343 87.029 .822 81 Table 17. The PCC with the (.01,.10,.01,.90) MRF pattern ER4 91.569 91.569 92.229 93. 035 92. 595 92.889 92.815 93.768 92. 962 92. 595 92. 595 92.082 92.815 93.475 91.642 93.109 93. 109 93. 475 93. 328 93.182 93.035 93.842 92. 082 92. 522 92. 669 mean 92.760 ad .633 lRER 91.496 93.182 93.035 93.182 93.548 93.548 92.522 93.475 93.182 92.375 92.962 92.522 94.135 93.475 92.155 93.109 92.595 94.282 93.255 92.889 93.548 94.282 92.449 92.742 93.328 93.091 .658 2RER 91.569 92.669 93.182 93.035 93.622 93.622 92.522 93.548 93.475 92.302 92.962 92.669 93.915 93.109 92.009 93.328 92.742 94.355 93.402 92.522 93. 402 94. 355 92. 742 92. 669 93.255 93.079 .666 3RER 91.642 92.962 93.182 93.109 93.622 93.402 92.595 93.255 93.402 92.155 92.962 92.595 94.062 93.255 92.009 93. 182 92.669 94.501 93.548 92.522 93.402 94.208 92.669 92.742 93.109 93.070 .661 SR 82.771 83.871 82.258 82. 918 83. 871 84.091 83.211 84.897 84.457 84.238 84.824 82.331 85.191 83.724 83.138 83. 798 82.478 84. 897 84. 164 83. 798 82.258 85. 997 82.918 84.091 82. 771 83.718 1.002 MEB8 90.616 Table 18. ER4 85. 924 82. 771 86. 217 84.091 85.191 84.824 84. 457 86. 730 85. 117 84. 531 86. 290 85. 411 85. 557 84.311 83. 358 85. 191 84. 531 85. 630 85. 264 85.337 83.724 84.971 86.510 85.484 84.164 mean 85.023 ad .972 2RER 85.117 83.578 86.217 84.018 85.630 84.824 84.604 86.730 84.897 86.437 85.924 85.117 85.924 84.677 82.478 86.290 84.677 85.850 85.850 84.824 83.431 85.191 86.510 85.777 84.164 85.149 1.063 82 3RER 84.897 83. 798 86. 144 83. 944 85. 630 85.044 84.531 86.730 84.824 86.657 85.777 84.824 86.070 85.117 82. 405 86. 364 84.751 85.850 85.704 84. 677 83. 504 85.337 86.584 85.850 84.311 85.173 1.067 SR 84. 311 81. 965 84. 897 84. 018 85. 044 83. 431 85.557 85.484 84.164 85.191 84.897 84.018 85. 557 84. 677 82.551 83. 724 84. 897 83.724 83.944 82.918 84.384 85. 557 84. 311 83.724 84.302 .942 MEB4 83.724 81.158 83.724 82.625 83.284 82.185 83.358 84.604 82. 625 83. 578 84.384 83. 798 84.238 83.358 81.745 83.211 82.478 83. 651 83.065 81. 745 82.038 83. 431 84.164 83. 
In Figures 3-15 the images are arranged as shown below.

      TRUE IMAGE      ER4
      SR              1RER
      MEB4            2RER
      MEB8            3RER

Figure 3.  GREAT LAKES: The best classification
Figure 4.  GREAT LAKES: The worst classification
Figure 5.  CHECKERBOARD: The best classification
Figure 6.  CHECKERBOARD: The worst classification
Figure 7.  ROAD SIGN: The best classification
Figure 8.  ROAD SIGN: The worst classification
Figure 9.  MRF (.90,.50,.50,.10): The best classification
Figure 10. MRF (.90,.50,.50,.10): The worst classification
Figure 11. MRF (.01,.10,.01,.90): The best classification
Figure 12. MRF (.01,.10,.01,.90): The worst classification
Figure 13. MRF (.80,.40,.50,.60): The best classification
Figure 14. MRF (.80,.40,.50,.60): The worst classification
Figure 15. GREAT LAKES classification by SR and ER4, with each panel annotated by its PCC, PCC0, PCC1 and ACA
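The panels of Figure 15 carry an overall percentage of correctly classified sites (PCC) together with PCC0, PCC1 and ACA figures. Reading PCC0 and PCC1 as the percentage correct among sites whose true class is 0 or 1, and ACA as the average of those two class accuracies, these summaries can be computed as in the short sketch below; the function and argument names and the 0/1 coding of the images are illustrative assumptions, not notation from the text.

    import numpy as np

    def pcc_summary(true_img, est_img):
        """Accuracy summaries for a two-class (0/1) classified image.

        Returns the overall percent correctly classified (PCC), the percent
        correct within each true class (PCC0, PCC1), and their average (ACA).
        """
        true_img = np.asarray(true_img)
        est_img = np.asarray(est_img)
        correct = (true_img == est_img)
        pcc = 100.0 * correct.mean()
        pcc0 = 100.0 * correct[true_img == 0].mean()   # accuracy on true-class-0 sites
        pcc1 = 100.0 * correct[true_img == 1].mean()   # accuracy on true-class-1 sites
        aca = 0.5 * (pcc0 + pcc1)
        return {"PCC": pcc, "PCC0": pcc0, "PCC1": pcc1, "ACA": aca}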
APPENDIX

DYNAMIC PROGRAMMING RECURSION

Suppose

(A.1)    A_n(\theta^n) = \sum_{i=1}^{n} a_i \theta_i + \sum_{i=2}^{n} b_i \theta_{i-1} \theta_i

where \theta^n = (\theta_1, \theta_2, \ldots, \theta_n), \theta_i \in \{0,1\}. Let A_1^0 = 0, A_1^1 = a_1. For 1 \le i \le n-1, let

    A_{i+1}^0 = \max(A_i^0, A_i^1)
    A_{i+1}^1 = \max(A_i^0 + a_{i+1},\; A_i^1 + a_{i+1} + b_{i+1}).

Then

    \max_{\theta^n} A_n(\theta^n) = \max(A_n^0, A_n^1).

Proof: The underlying similarity of all dynamic programming processes is the creation of a set of functional equations of a particular type, called recurrence relations. (A.1) can be written as

(A.2)    A_n(\theta^n) = A_{n-1}(\theta^{n-1}) + a_n \theta_n + b_n \theta_{n-1} \theta_n.

From (A.2),

    \max_{\theta^n} A_n(\theta^n)
        = \max\Big( \max_{\theta^{n-1}} A_{n-1}(\theta^{n-1}),\;
                    \max_{\theta^{n-1}} \big[ A_{n-1}(\theta^{n-1}) + a_n + b_n \theta_{n-1} \big] \Big)
        = \max(A_n^0, A_n^1)

where

    A_n^0 = \max_{\theta^{n-1}} A_{n-1}(\theta^{n-1}),
    A_n^1 = \max_{\theta^{n-1}} \big[ A_{n-1}(\theta^{n-1}) + a_n + b_n \theta_{n-1} \big].

Using (A.2) again, we get

    A_n^0 = \max\Big( \max_{\theta^{n-2}} A_{n-2}(\theta^{n-2}),\;
                      \max_{\theta^{n-2}} \big[ A_{n-2}(\theta^{n-2}) + a_{n-1} + b_{n-1} \theta_{n-2} \big] \Big)
          = \max(A_{n-1}^0, A_{n-1}^1)

and

    A_n^1 = \max\Big( \max_{\theta^{n-2}} \big[ A_{n-2}(\theta^{n-2}) + a_n \big],\;
                      \max_{\theta^{n-2}} \big[ A_{n-2}(\theta^{n-2}) + a_{n-1} + b_{n-1} \theta_{n-2} + a_n + b_n \big] \Big)
          = \max(A_{n-1}^0 + a_n,\; A_{n-1}^1 + a_n + b_n).

Proceeding as above, we obtain for 1 \le i \le n-1

    A_{i+1}^0 = \max(A_i^0, A_i^1),
    A_{i+1}^1 = \max(A_i^0 + a_{i+1},\; A_i^1 + a_{i+1} + b_{i+1}).

Letting A_1^0 = 0 and A_1^1 = a_1, we get \max_{\theta^n} A_n(\theta^n) = \max(A_n^0, A_n^1).

Note that A_i^0 = \max_{\{\theta^i :\, \theta_i = 0\}} A_i(\theta^i) and A_i^1 = \max_{\{\theta^i :\, \theta_i = 1\}} A_i(\theta^i).
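The recursion replaces a search over all 2^n configurations theta^n with a single forward pass of length n, and a backward pass recovers a maximizing configuration. The sketch below is one way to render it in code; the function name, the 0-based indexing, and the backtracking step are additions for illustration and are not part of the appendix.

    def maximize_A(a, b):
        """Maximize A_n(theta) = sum_i a[i]*theta[i] + sum_{i>=1} b[i]*theta[i-1]*theta[i]
        over theta in {0,1}^n, using the dynamic programming recursion above.

        `a` and `b` have length n, indices shifted to 0-based, so b[0] is unused
        (the text's b_i runs from i = 2 to n).  Returns (max value, a maximizing theta).
        """
        n = len(a)
        # A0[i], A1[i]: best value of A_{i+1} over configurations with position i set to 0 or 1
        A0, A1 = [0.0] * n, [0.0] * n
        A1[0] = a[0]
        choice0, choice1 = [None] * n, [None] * n   # which previous state achieved each max
        for i in range(n - 1):
            # next position set to 0: the previous position may be anything
            A0[i + 1] = max(A0[i], A1[i])
            choice0[i + 1] = 0 if A0[i] >= A1[i] else 1
            # next position set to 1: add a_{i+2}, plus b_{i+2} if the previous position was 1
            cand0, cand1 = A0[i] + a[i + 1], A1[i] + a[i + 1] + b[i + 1]
            A1[i + 1] = max(cand0, cand1)
            choice1[i + 1] = 0 if cand0 >= cand1 else 1
        best = max(A0[n - 1], A1[n - 1])
        # Backtrack to recover one maximizing configuration.
        theta = [0] * n
        theta[n - 1] = 0 if A0[n - 1] >= A1[n - 1] else 1
        for i in range(n - 1, 0, -1):
            theta[i - 1] = choice0[i] if theta[i] == 0 else choice1[i]
        return best, theta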
BIBLIOGRAPHY

1. Ballard, R. J. (1974). Extended rules for the sequence compound decision problem with m x n component. Ph.D. dissertation, Department of Statistics and Probability, Michigan State University, East Lansing.

2. Ballard, R. J. and Gilliland, D. (1978). On the risk performance of extended sequence compound rules for classification between N(-1,1) and N(1,1). J. Statist. Comput. Simul., 6, 265-280.

3. Ballard, R. J., Gilliland, D. and Hannan, J. (1975). O(N^(-1/2)) convergence to k-extended Bayes risk in the sequence compound decision problem with m x n components. RM-333, Department of Statistics and Probability, Michigan State University, East Lansing.

4. Bertsekas, D. P. (1976). Dynamic Programming and Stochastic Control, Academic Press, New York.

5. Besag, J. E. (1974). Spatial interaction and the statistical analysis of lattice systems. J. Royal Statist. Soc., Series B, 36, 192-236.

6. Besag, J. E. (1986). On the statistical analysis of dirty pictures. J. Royal Statist. Soc., Series B, 48, 259-302.

7. Cooper, L. and Cooper, M. V. (1981). Introduction to Dynamic Programming, Pergamon Press, New York.

8. Cox, D. R. (1960). Serial sampling acceptance schemes derived from Bayes' theorem. Technometrics, 2, 353-360.

9. Derin, H., Elliott, H., Cristi, R. and Geman, D. (1984). Bayes smoothing algorithms for segmentation of binary images modeled by Markov random fields. IEEE Trans. Pattern Anal. Machine Intell., PAMI-6, 707-720.

10. Devore, J. L. (1975). Linear near neighbor classifiers for correlated categories. Commun. Statist., 4, 891-905.

11. Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Machine Intell., PAMI-6, 721-741.

12. Gilliland, D. and Hannan, J. (1969). On an extended compound decision problem. Ann. Math. Statist., 40, 1536-1541.

13. Gilliland, D., Hannan, J. and Huang, J. S. (1976). Asymptotic solutions to the two state component compound decision problem, Bayes versus diffuse priors on proportions. Ann. Statist., 4, 1101-1112.

14. Hannan, J. (1957). Approximation to Bayes risk in repeated play. Contributions to the Theory of Games, 3, 97-139, Ann. Math. Studies, No. 39, Princeton University Press.

15. Hannan, J. and Robbins, H. (1955). Asymptotic solutions of the compound decision problem for two completely specified distributions. Ann. Math. Statist., 26, 37-51.

16. Hannan, J. and Van Ryzin, J. (1965). Rate of convergence in the compound decision problem for two completely specified distributions. Ann. Math. Statist., 36, 1743-1752.

17. Hansen, F. R. and Elliott, H. (1982). Image segmentation using simple Markov random field models. Computer Graphics and Image Processing, 20, 101-132.

18. Haslett, J. (1985). Maximum likelihood discriminant analysis on the plane using a Markovian model of spatial context. Pattern Recognition, 18, 287-296.

19. Hill, J. R., Hinkley, D. V., Kostal, H. and Morris, C. N. (1983). Spatial estimation from remotely sensed data via empirical Bayes models. Proc. of the NASA Symp. on Math. Pattern Recognition and Image Analysis, 115-136.

20. Kinderman, R. and Snell, J. L. (1980). Markov Random Fields and Their Applications, American Mathematical Society, Providence.

21. Morris, C. N. (1986). Binary image classification. Proc. of the 18th Symp. on the Interface, 71-74.

22. Morris, C. N., Hinkley, D. V. and Johnston, W. (1985). Classification in a spatially correlated environment. Proc. of the Third Annual Symp. on Math. Pattern Recognition and Image Analysis; also, CSS Technical Report No. 20, Center for Statistical Sciences, The University of Texas, Austin.

23. Pickard, D. K. (1977). A curious binary lattice process. J. Appl. Prob., 14, 717-731.

24. Preston, P. F. (1971). An empirical Bayes problem with a Markovian parameter. Biometrika, 58, 535-543.

25. Ripley, B. D. (1986). Statistics, images and pattern recognition. Canad. J. Statist., 14, 83-111.

26. Robbins, H. (1951). Asymptotically subminimax solutions of compound statistical decision problems. Proc. Second Berkeley Symp. Math. Statist. Prob., University of California Press, 131-148.

27. Robbins, H. (1964). The empirical Bayes approach to statistical decision problems. Ann. Math. Statist., 35, 1-20.

28. Scharf, L. L. and Elliott, H. (1981). Aspects of dynamic programming in signal and image processing. IEEE Trans. Automat. Contr., AC-26, 1018-1029.

29. Swain, P. H., Vardeman, S. B. and Tilton, J. C. (1981). Contextual classification of multispectral image data. Pattern Recognition, 13, 429-441.

30. Tilton, J. C., Vardeman, S. B. and Swain, P. H. (1982). Estimation of context for statistical classification of multispectral image data. IEEE Trans. on Geoscience and Remote Sensing, GE-20, 445-452.

31. Van Ryzin, J. (1966). The compound decision problem with m x n finite loss matrix. Ann. Math. Statist., 37, 412-424.

32. Vardeman, S. (1975). k-extended procedures in the finite state sequence compound decision problem. RM-337, Department of Statistics and Probability, Michigan State University, East Lansing.

33. Vardeman, S. (1980). Admissible solutions of k-extended finite set and sequence compound decision problems. J. Multivariate Anal., 10, 426-441.

34. Vardeman, S. (1981). On the small N performance of bootstrap and Bayes extended and unextended set compound rules for classification between N(-1,1) and N(1,1). J. Statist. Comput. Simul., 13, 255-271.