EXTENDED RULES FOR THE CLASSIFICATION OF DEPENDENT PARAMETERS

By

Kosgallana Liyana Durage Gunawardena

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Department of Statistics and Probability
1988

ABSTRACT

EXTENDED RULES FOR THE CLASSIFICATION OF DEPENDENT PARAMETERS

By Kosgallana Liyana Durage Gunawardena

In this dissertation we consider the classification problem in which the unknown states θ = (θ1, θ2, ..., θn) are dependent. We first review the compound approach to the construction of decision rules and then introduce the extended compound approach as a general approach to the construction of rules with favorable risk behavior. In most empirical Bayes literature, the {θi} are assumed to be i.i.d. The construction of decision rules in the i.i.d. case is very simple. The problem becomes considerably more complex if the {θi} are dependent, since the Bayes risk may depend on n and the class of distributions on {θi} may be of such high dimension as to make accurate estimation impossible. The extended compound decision theory developed by Gilliland and Hannan (1969) is utilized to obtain decision rules in the case when the {θi} are dependent. The one-dimensional case is generalized to two dimensions where the indices are positions in an integer lattice. The two-dimensional case has applications in image segmentation and pattern recognition, where the image is a matrix of parameters θ = {θij} taking values in a finite set Θ. The observed image is a random matrix X having a distribution depending on θ. Here θ may reasonably be assumed to be the realization of a Markov random field in some applications. The extended compound approach to the image case is described with the choice of a neighborhood system on the lattice. Many simulations were run to compare the risk behavior of the extended compound rules with that of other rules. The simulations performed show that the extended rules perform better than the other empirical Bayes rules that have been proposed.

To my daughter Kalpanee

ACKNOWLEDGMENTS

I am truly grateful to my advisor Professor Dennis C. Gilliland for his continuous encouragement and guidance in the preparation of this dissertation. I would also like to thank Professor James Hannan for the introduction to compound decision theory and for teaching me much about mathematical statistics. I would like to thank Professors R. Ericson and H. Salehi for serving on my guidance committee. Also thanks are due to Ms. Loretta Ferguson for typing the manuscript. Finally, I wish to thank the Department of Statistics and Probability for providing the financial support which made my graduate studies in Statistics possible.

TABLE OF CONTENTS

CHAPTER                                                        PAGE
1. Introduction ...................... 1
1.1 A Literature Review ................. 1
1.2 A General Approach ................. 10
1.3 The Image Problem .................. 17
1.4 Summary ...................... 18
2. The Transect Case ................... 20
2.1 Introduction .................... 20
2.2 Classification by Maximum Posterior Probability (MAP) . 22
2.3 Empirical MAP .................... 28
2.4 Classification by Extended Compound Bayes Rules . . . 30
2.5 Comparison of Methods ................ 39
3. The Image Case ..................... 41
3.1 Introduction ....................
41 3.2 Introduction to Image Segmentation ......... 47 3.3 Classification by Empirical Bayes Rules ....... 50 3.4 Classification by Extended Rules .......... 55 3 5 Comparison of Methods ................ 59 4. Simulation ...................... 60 4.1 Introduction .................... 60 4. 2 Simulations on the Transect ............. 62 4. 3 Classification of Images .............. 63 4. 4 Conclusions ..................... 65 APPENDIX ...................... 97 Dynamic Programming Recursion ............ 97 BIBLIOGRAPHY ....................... 99 vi LIST OF TABLES TABLE SDmKIOSO‘IbWMo—D HHHHHHD—‘I—‘l—i ao-acacnw-wwwc Bayes risk and PCC for the SR ........... The PCC for the EMAP rule with n = .50 ...... The PCC for the EMAP rule with n = 200 ...... The PCC for the HEB3 rule with n = 50 ...... The PCC for the HEB3 rule with n = 200 ...... The PCC for the HEBS rule with n = 50 ...... The PCC for the HEB5 rule with n = 200 ...... The PCC for the ER2 rule with n = 50 ...... The PCC for the ER2 rule with n = 200 ....... The PCC for the ER3 rule with n = 50 ...... The PCC for the ER3 rule with n = 200 ....... The performance of the ER2 rule versus the HEB3 rule The PCC with the GREAT LAKES pattern ...... The PCC with the CHECKERBOARD pattern ...... The PCC with the ROAD SIGN pattern ........ The PCC with the (.90,.50,.50,.10) MRF pattern . . . The PCC with the (.01,.10,.01,.90) MRF pattern . . . The PCC with the (.80,.40,.50,.60) MRF pattern vii ..61 ..66 ..67 ..68 ..69 ..70 ..71 ..72 ..73 ..74 ..75 .76 ..77 ..78 ..79 ..80 ..81 .82 LIST OF FIGURES FIGURE 1 Neighborhood systems J; ........ 2 Neighborhood systems .11 and .12 and their associated cliques . . . . 3 GREAT LAKES: The best classification . . 4 GREAT LAKES: The worst classification . 5 CHECKERBOARD: The best classification . 6 CHECKERBOARD: The worst classification 7 ROAD SIGN: The best classification . . . 8 ROAD SIGN: The worst classification . . 9 MRF (.90,.50,.50,.10): The best classification 10 MRF (.90,.50,.50,.10): The worst classification 11 MRF (.01,.10,.01,.90): The best classification 12 MRF (.01,.10,.01,.90): The worst classification 13 MRF (.80,.40,.50,.60): The best classification 14 MRF (.80,.40,.50,.60): The worst classification 15 GREAT LAKES classification by SR and ER4 viii ........ 44 ........ 46 ........ 84 ........ 85 ........ 86 ........ 87 ........ 88 ........ 89 ..... 90 . 91 ..... 92 . 93 ..... 94 ....... 96 CHAPTER 1 INTRODUCTION In this thesis we consider the classification problem in which each of n observations is to be classified as belonging to one of two classes. 1.1 A Literature Review In most of the work relating to the classification problem, the successive classes are assumed to be independent. In many situations, though, it is unreasonable to assume that the successive classes are independent. One of the earliest, if not the first, papers to consider positively correlated classes and that a more sensitive scheme of classification can be obtained by exploiting this correlation is due to Cox (1960). In Cox (1960) a sequence of batches {Bi} is to be classified based on the number of defectives in random samples of predetermined fixed size. The {Xi} are random observations, where Xi is the number of defectives in the sample from batch Bi' The {Xi} are assumed to be independent and have a Poisson distribution with mean mi, given the sequence {mi}, where 111i is the true but unknown batch quality. 
To complete the model, the sequence {mi} is assumed to be a realization of a two—state Markov chain with specified states and known transition matrix. Cox suggests using some other X's in addition to xi in making the decision about the 1th batch. In particular he suggests a one-step back rule based on (Xi_1,Xi) and a one—step forward, one—step back rule based on (xi-l’xi’xi +1). Preston (1971) seems to be the first person to consider the empirical Bayes approach to a classification problem in which the set of parameter values is a realization of a stationary Markov chain. The model considered by Preston is very specialized. The random observation X is assumed to have the following conditional distribution: P(X=0|0=0)=1,P(X=1|0=1)=p,P(X=0|0=1)= l—p where p is known. It is assumed that the unknown classifications {0i} are a realization of a stationary Markov chain with unknown transition matrix and that the observations Xi are independent conditional on {0i}. At stage n, the transition probabilities are estimated based on the partial sequence x = (X1,X2, ..... ,Xn) and substituted into the Bayes rule for the decision about 0n. Devore (1975) considers the Robbins' (1951) component with Markovian {0i}. The Robbins' component has normal conditional distributions N(20 — l, 1) and independent observations conditional on {0i}. In Devore (1975), the {0i} sequence is assumed to be a realization of a stationary two-state Markov chain with P(0=0) = P(0=1) = .5 and transition matrix .5(1 + 1r) .5(1 — 1r) for a or 6 [—1,1]. .5(1 — 1r) .5(1 + 7r) The above model can be considered as a generalization of the model with repetitions of the Robbins' component problem with independent 0i and probabilities P(0=0) = P(0=1) = .5. For this model, the simple compound rule g0 = (wgmg, ..... ,tpg) defined by 1 > 0 «2?: decide 0, = if xi ‘ i= 1,2, ..... ,n 0 < 0 is the minimax classification rule for Q = (01,0 , ..... ,on). Devore studies the prOperties of classification rules for 0i based on linear functions of X.'s within the context of his problem. The two rules J considered are those resulting from Optimization over classes of linear rules: Rule 1 Among all the rules in the class given by: 1 2 k if Xi + aXi_l i = 2, ..... ,n 0 (1.7) decide 0i 0 otherwise . Hill et al. (1984) use (1.5) to estimate p and p11 — p01, substitute the estimates into (1.6), and use (1.7) for classification. Thus, they have developed a set empirical Bayes decision procedure for the Robbins' component and Markov prior. We will refer to procedures with j restricted to {1} in (1.6) as HEB3 and to procedures with j restricted to {1,2} in (1.6) as HEB5. In Chapter 2, Section 1 we derive an exact expression for the induced posterior density on Q , where the end-points are included. We also give an algorithm for the maximization of the joint posterior, and classification rules are derived based on this maximization. The work reviewed thus far concerns models with a certain common underlying structure. The important features of this structure are: 1. There is a sequence of unobservable parameters { 0i} where 0i=0 or 1, i=l,2, ..... . 2. Conditional on {0i}, independent X1,X2, ...... are observed where Xi ~ P 0i, i = 1,2, ..... . The two distributions P0 and P1 are assumed known. 3. The parameter sequence {0i} has a (joint) distribution G. 8 4. There is a decision bi to be made about 0i, specifically, a classification as 0 or 1, i = 1,2, ..... . For each n we let Q= (01,02, ..... ,Qn) and X = (X1,X2, ..... 
,Xn) denote partial sequences, let G denote the distribution of Q and let P Q denote the conditional distribution of 2; given Q. We consider the set compound decision problem and allow the bi to depend on X_, i = 1,2, ..... 11. There are two loss functions that are commonly considered with this structure. They are as Heb=§§ga¢h1 1: and (1.9) Lani) =12 t i] where square brackets denote indicator functions. Ll defines the loss as the average number of misclassifications across the 11 components. L2 defines the loss as 0 or 1 depending upon whether all 11 components are correctly classified or not. At stage n, a Bayes rule with respect to g and L1 is determined by taking bi as 0 or 1 depending upon which maximizes the posterior probability distribution of 0i given _X_, i = 1, 2, ..... ,n. Most of our work concerns comparing rules relative to loss L1 and we refer to the Bayes rules in this case as simply "the Bayes rules." A Bayes rule with respect to g and L2 selects Q to maximize the posterior probability of Q given X. We refer to such a rule as a MAP rule following the common usage of "Maximum a—Posteriori" to describe such a rule. Of course, a Bayes rule has components that are based on the marginal distributions of the posterior of Q given X. There may be an ordering or spatial structure to the index set {1,2, ..... } that suggests certain classes of priors G for certain problems. The one—dimensional case with order of indices representing spatial or time order is called the transect case. We consider the transect case in Chapter 2. Here G might be taken to be a simple product measure G = Glelelx ..... or a Markov chain, depending upon the degree of dependence that is being modeled. In the literature cited previously, i.e. Cox (1960), Preston (1971), Devore (1975) and Hill et al. (1984), the 0i are assumed to be the realization of a stationary Markov chain. In most empirical Bayes literature, the 0i are assumed to be i.i.d. G1 where G1 is unknown. In the i.i.d. Gl case, the Bayes rule and the MAP rule are very simple since a conditional distribution for Q given _X_ is the product of the conditional distributions of Qi given Xi; i = l,2,...,n. In the Markov chain case, the Bayes and MAP rules are sufficiently complicated so as to have motivated researchers to impose simplifying assumptions. Recall, for example, that Cox (1960) takes P0, P1 to be Poisson, G to be a known Markov chain distribution and derives restricted Bayes rules, specifically, 3i restricted to be Bayes with respect to the class of rules that are (xi—1’ Xi) (or (xi-1’ Xi, Xi+1)) measurable. Preston (1971) takes the very special case P0 is degenerate on 0 and P1 is Bernoulli B(l, p) and assumes G to be from the family of stationary Markov distributions. Preston considers Bayes, MAP and restricted Bayes rules. Devore (1975) takes P0 = N(—l, 1), P1 = N(1, 1) and assumes G to be from a special one—parameter family of symmetric stationary Markov distributions. Hill et al. (1984) take P0 = N(p0, a), P1 = N011, a) (without loss of generality N(—1, 1) and N(1, 1)) and assumes G is from the family of stationary 10 Markov distributions. They derive an approximation to the Bayes rule. Preston (1971), Devore (1975) and Hill et al. (1984) give estimators for the transition probabilities defining G and, therefore, consider the empirical Bayes model in a case where the parameter sequence is not i.i.d.. 1.2 A General Approach Consider the decision problem involving Q introduced in last section and suppose that the loss function is L1. 
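Displays (1.8) and (1.9) above did not reproduce cleanly, so the following small sketch restates the two loss functions exactly as described in the surrounding text. It is an illustration added here, written in Python although the simulation programs of this thesis were written in Turbo Pascal.

import numpy as np

def loss_L1(theta, d):
    # (1.8): average number of misclassifications across the n components
    theta, d = np.asarray(theta), np.asarray(d)
    return float(np.mean(theta != d))

def loss_L2(theta, d):
    # (1.9): 0 if every component is classified correctly, 1 otherwise
    theta, d = np.asarray(theta), np.asarray(d)
    return int(np.any(theta != d))

In Chapter 4 the performance measure PCC corresponds to 100(1 - L1).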
Then a decision rule Q is evaluated in terms of average number of errors ( misclassifications ) across the 11 components. We will first review the compound approach to the construction of Q and will then introduce the extended compound approach as a general approach to the construction of rules with favorable risk behavior. There is a long history of work on the two—state compound decision problem including Robbins (1951), Hannan and Robbins (1955), Hannan and Van Ryzin (1965), Van Ryzin (1966), Gilliland, Hannan and Huang (1976), and on the extended version including Gilliland and Hannan (1969), Ballard and Gilliland (1978), Ballard, Gilliland and Hannan (1975), Vardeman (1975), (1980), (1981). Our review will be relatively brief and the arguments will be heuristic. For a (compound) decision rule Q, the compound risk at Q and n is (1.10) MM) = E, L1(M). where L1 is defined in (1.8). If G111 is the empirical distribution of 01,02, ..... ,0n and R1 is the Bayes enve10pe risk function in the component problem, then the modified regret of Q at Q and n is 11 (1.11) 11(1):) = s(1.1)-R1(G,1,). Let y be a given class of distributions G on {0i.} For each 11 let 9 denote the class of all (set) compound decision rules Q with Qi a function of X, i=1,2, ..... ,n. Let B,(Q,Q) denote the Q expectation of B,( Q, Q). Suppose that Q(Q) minimizes R( Q, Q) among all rules Q in a We call Q(Q) a Bayes response and denote the minimum Bayes risk as B(Q). Then any rule Q can be judged by its Bayes risk relative to that of Q(Q), i.e., by the excess (1.12) 9111.1) = 3111.1) - file). Note that this is not simply the Q expectation of (1.11). Any sequence of rules Q such that (1.13) iimn DELI) = 0 for all (3 e y is said to be WM. If y is a singleton set, i.e., G is assumed known, then the only difficulty that arises in the Bayes decision problem is in calculating a Bayes response Q(Q), since it may be a complicated function of Q and X. In the literature reviewed in the last section, Cox (1960) took G to be a known Markov chain. Robbins (1956), (1961) introduced the empirical Bayes decision problem by assuming that G is unknown and using X to estimate Q or Q(Q). ( As introduced by Robbins, only the component observations X1,X2, ..... ,X. l are available for making the decision about 0i, i=1,2, ..... so that his version 12 is sequence in nature.) Robbins proposed (bootstrap) empirical Bayes rules of the type Q(Q) where Q is an estimate of Q and rules of the type where the Bayes response is estimated directly without estimating Q. It is obvious that the class y of priors to which G is assumed to belong must be restricted for the empirical Bayes approach to produce reasonable methods. Robbins chose y to be the class of all symmetric products y = { Glelelx ..... lo, e V1} in which case the Bayes risk enve10pe is simply B(Q) = R1(Gl), independent of n. This is the assumed structure for f in most empirical Bayes literature where much of the effort concerns finding sequence rules Q that are component-wise asymptotically Optimal, such that . ‘ 1 (1.14) 11mi E [ 0i if 0i] = R (GI) for all G1 6 fl, where bi depends on X through X1,X ’Xi' The feasibility of 2, ..... constructing sequence empirical Bayes rules that are asymptotically optimal for this f is quite apparent. Since the observed random variables X1,X2, ..... are i.i.d. according to the G1 mixture PGl of P0 and P1, if G1 is identified by PG , then consistent estimation of G1 is 1 possible. 
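As a concrete and deliberately minimal illustration of this i.i.d. empirical Bayes setting, consider the Robbins component with Xi | θi ~ N(2θi - 1, 1). Since E X = 1 - 2p for p = G1({0}), the sample mean gives an estimate of p that can be plugged into the component Bayes cutoff (1/2) log(p/(1-p)). The sketch below is an added illustration, not a procedure from the thesis; the truncation of the estimate away from 0 and 1 is a practical safeguard of the same kind used for the estimators of Section 2.3.

import numpy as np

def iid_empirical_bayes_classify(x, eps=1e-3):
    # Component-wise empirical Bayes rule for the Robbins component,
    # f0 = N(-1, 1), f1 = N(1, 1), under an i.i.d. prior G1.
    x = np.asarray(x, dtype=float)
    p_hat = 0.5 * (1.0 - x.mean())            # E X = 1 - 2p, so this is unbiased for p
    p_hat = min(max(p_hat, eps), 1.0 - eps)   # keep the cutoff finite (added safeguard)
    cutoff = 0.5 * np.log(p_hat / (1.0 - p_hat))
    return (x >= cutoff).astype(int)          # decide theta_i = 1 iff X_i >= c(p_hat)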
Of course, the restriction to product measures is unreasonable in applications for which the parameters 01,02, ..... are correlated. In the last section, we reviewed examples of empirical Bayes problems where y went beyond the product case. 13 The empirical Bayes problem becomes considerably more complex if the class of distributions contains distributions that are not symmetric product measures. The excess risk (1.12) is the apprOpriate measure of efficiency for a decision rule Q , but problems arise in the construction of bootstrap empirical Bayes rules. For one thing, y may be of such high dimension so as to make accurate estimation of Q or Q(Q) based on X impossible except for very large 11. Moreover, Q( Q) may be such a complicated function of Q and X so as to make estimation difficult and B,(Q) may depend on n. The dimensionality problem is not severe when y is a family of Markov distributions. For the two—state component we are considering, the stationary Markov distributions are indexed by two real parameters. The literature reviewed in the last section concerned this family. The second problem has led investigators to approximate the functional form of a Bayes response Q(Q) and to estimate the approximation. For example, (1.7) is an approximation to the decision rule Qi(Q) in the transect case with stationary Markov prior. Truncating the range of j in ( 1.6) to various neighborhoods of i produces additional approximations. The extended compound decision theory developed by Gilliland and Harman (1969) suggests a general approach for finding empirical Bayes decision rules in our applications. It provides a logical and systematic method for finding approximations to Bayes rules. It solves the dimensionality problem and the approximation problem by defining envelope risk in terms of that resulting by restricting decision rules to a class 9“ C .9 . Basically, the restriction is limited so that the enve10pe risk is close to Bayes risk B,(Q) and, at the same time, the restricted Bayes decision rules are simple enough to be easily estimated. Finally it should be 14 noted that this approach leads to compound decision rules which have controlled risk behavior conditional on Q whereas the competitors that we test have not been shown to enjoy this more robust risk behavior. We will now give some details to illustrate this approach. This will require introduction of the extended envelope theory and I‘k construct of Gilliland and Harman (1969). The parameter vector Q belongs to the parameter set Q = 911 where 9 = {0,1}, in our case. Suppose that associated with each index i E I = {1,2, ..... ,n} is an (ordered) k—tuple of indices, .1; e Ik, where i is the last index in 4*. We let Q‘l‘ denote the restriction of a to 4*, i e 1. An example with k = 3 is provided by 4* = (i-l, 1+1, i) and (z? =(oi_1, ”1+1, 0,). (One can avoid endpoint problems by using the mod-n arithmetic on I to define the Q1: for all i E I or simply by restricting the indices, say to i = 2,3, ..... ,n-l in this example.) The decision problem concerning the last component of a k—tuple with the decision based on independent obsevations based on all k parameters is called 11. Pk construct by Gilliland and Hannan (1969). Thus, we see that associated with n, the compound decision problem Q, and a neighborhood system 1* are n I‘k-decision problems with parameters Qll‘,Q12‘, ..... ,QE. 
With G: denoting the empirical distribution of this sequence and Rk denoting the Bayes envelope risk for the I‘k decision problem, the k-extended modified regret for a decision rule Q is (1.15) Likes) = mas) - R“(G,‘§). which subsumes (1.11) as a special k=1 case. Note that we are suppressing the display of dependence on I * by displaying only the size 15 k of the neighborhoods. As shown by Gilliland and Hannan (1969), for k k > 1, the R are more stringent envelopes than the (unextended) component enve10pe R1 = R, and, since G111 is a marginal of (3:, (1.16) Rk(G:) g R1(G111) for all Q, n. The enve10pe risk Rk(G:) is the envelope to compound risk obtained by minimizing over the class d‘ of compound decision rules Q where Qi is a fixed function of X? i = 1,2, ..... ,n. (The function does not depend on i.) -1, We are interested in the Bayes risk obtained by taking expectation of compound risk with respect to the distribution Q on Q. Suppose that G is a strictly stationary distribution on 01,02, ..... with common marginal Gk on Q‘i‘, i= 1,2, ...... . Theorem 3 of Gilliland and Hannan (1969) shows, in our application, that any compound decision rule Q satisfying (1.17) lim supn Dk(Q,Q) _<_ 0 for all Q also satisfies (1.1s) lim snpn pk(g,k) g o for all Q where (1.19) shes) = sash — R“(G“)- Suppose that the neighborhood structure is such that Rk(Gk) is close to Bayes risk B,(Q) for all C E y, specifically, suppose that (1.20) 11km“) 5 mg) + c for all G, n. 16 If Q is an asymptotic solution to the extended compound problem, i.e., satisfies (1.17), then, by (1.12), (1.18) and (1.20), (1.21) lim supn med) g c for all C e y. Thus, the compound decision rule will be asymptotically c—Bayes. The extended rules we prOpose can be shown to satisfy (1.17) by suitable adaptation Of the sequence extended work Of Ballard, Gilliland and Hannan (1975). Recall from the last section that Preston (1971), Devore (1975) and Hill et al. (1984) consider approximations to the Bayes response Q(Q). They examine the component Qi(Q) as a function Of X and approximate it by a. function Of Xj for j in a neighborhood of i, i E I. In the empirical Bayes situation where G is an unknown stationary Markov distribution, they propose empirical Bayes procedures that estimate G using X and plug these estimates into the component approximations. They give no rationale for the approximations. In contrast, the extended compound approach provides a rationale for determining the neighborhood structure. It should be large enough to control the excess Rk(Gk) - mg) and small enough to allow for efficient estimation Of the Optimal decision procedure in the corresponding restricted class of compound decision rules 9“. We find in our risk calculations in latter chapters that the extended compound rules outperform the competitors selected for comparison. 17 1.3 The Imsge Problem We have called the case where the index set I = {1,2, ..... } the transect case. Besides the transect case, we will consider the two—dimensional case where the indices index positions in an integer lattice, for example, the pixels in an image. We call this the flags case. Here {oij} may reasonably be assumed to be the realization Of a Markov random field in some applications. A brute force calculation Of the MAP estimate for the image Q with n = leN2 requires a search for a maximum across 211 calculated posterior probabilities. 
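To make the combinatorial burden concrete, the toy sketch below scores every configuration directly; the log-prior is passed in as an arbitrary function and the component densities are the N(-1, 1) and N(1, 1) of the Robbins component used elsewhere in this thesis. It is an added illustration, usable only for very small n.

import itertools
import numpy as np
from scipy.stats import norm

def map_by_enumeration(x, log_prior):
    # Brute-force MAP: maximize log G(theta) + sum_i log f_{theta_i}(x_i)
    # over all 2**n binary configurations theta.
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    best, best_score = None, -np.inf
    for theta in itertools.product((0, 1), repeat=n):
        theta = np.array(theta)
        score = log_prior(theta) + norm.logpdf(x, loc=2 * theta - 1, scale=1.0).sum()
        if score > best_score:
            best, best_score = theta, score
    return best

Even a 5 x 5 image already requires scoring 2^25 (about 3.4 x 10^7) configurations, which is why the approximations discussed next are needed.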
There has emerged a considerable literature dealing with the computational issues that arise in the MAP approach to image reconstruction 2 = 65,536 and, therefore, a since an image may easily have n = 256 posterior that is supported by 265’536 points. (LANDSAT 1—3 scanned the earth every 18—19 days recording images on a 3240x2340 lattice. Ripley (1986).) In the typical empirical Bayes approach to the two—dimensional case, the unknown Markov random field distribution has parameters that define the conditional distribution for each 0i j given values of am in a Specified neighborhood. It follows that the posterior probability for 0i j given X depends most heavily upon the Xkl for (k,l) in a neighborhood of (i,j). The result is that approximations to Bayes rules and MAP rules are possible and that estimates Of the parameters based on X provide for the construction Of empirical Bayes procedures for classifying the components of Q. The extended compound approach has the same advantages in the image case as in the transect case. In fact, Swain et al. (1981), Tilton et al. (1982) have used the compound decision theory approach to develOp rules tO classify image data. A difficulty encountered in the implementation Of the 18 rules is the estimation of GE. A "classify—and—count" method is prOposed to estimate Gk with the use Of a training set Of data (a set Of data with 11’ known classifications or a representative sample from the image data). First, the training set of data is classified by using uniform prior probabilities and then from this classification estimates Of G: are Obtained by counting the occurences of Qll‘. We will use the extended compound approach in the image case and compare the rules with the empirical Bayes classification rules suggested by Morris et al. (1985), Morris (1986) on selected images. 1.4 Summary In this Chapter we have reviewed the most relevant literature and have set the stage for later work where the risk behavior Of extended compound decision rules is compared with that Of other rules. In Chapter 2 we develop the details of the transect case. We derive an algorithm for the calculation of the MAP rule. We introduce an empirical MAP rule and describe further the empirical Bayes classification rules of Hill et al. (1984). The extended compound rules are developed and some results of simulations to determine risk behavior for the various rules are summarized. In Chapter 3 we develOp the details Of the image case. We introduce the empirical Bayes classification rules Of Morris et al. (1985), Morris ( 1986). The extended compound rules are deveIOped and some results Of simulations to determine risk behavior for the various rules are summarized. In Chapter 4 we give the results of the many simulations that were run. In the transect case, only Q generated as Markov chains are considered. In the image case both fixed pattern and randomly generated Q are considered. 19 Risks conditional on Q are estimated in every case. In image segmentation literature, the performance Of a classification rule is Often measured by the quantity "per cent classified correctly" (PCC). The measure PCC is related to the loss function Ll by PCC = 100(1 - Ll). However, a higher PCC does not necessarily mean that the image reproduced will be perceived to be better than one with a lower PCC. Swain et al. (1981) suggests measuring the performance of classification rules in two ways, one Of which is PCC. The other measurement is "average by class accuracy" (ACA). 
The ACA is obtained by first computing PCC for each class and then taking their arithmetic average. ACA will show whether the classification rule favors one class more than the others. We will report risk behaviors in terms of PCC. CHAPTER 2 THE TRANSECT CASE 2.1 Introduction In this chapter we consider the classification on a transect. Let 0i 6 O = {0,1}, i = 1,2, ..... . For the empirical Bayes application, it is assumed that parameter sequence {0i} has a stationary Markov chain distribution G as its prior. We derive an empirical MAP rule and extended compound rules for the classification Of Q. The risk performances of these rules are evaluated in comparison with the empirical Bayes rules (1.7) suggested by Hill et al. (1984). Let {Xi} denote the random observations. For our calculations, we take the Robbins' component. Thus, the Xi are conditionally independent given {0i} with Xi ~ P 0i, where P 0i is the normal distribution with mean 20i - 1 and variance 1. We let f0 denote the density Of P0 . The sequence of states {Qi} is assumed to be a realization of a two state stationary Markov chain with initial probability distribution (2.1) P(0=0)=p=1—P(0=1), 0=10 = 01 n {(Poo p01) [(0 ,_1, o) i (1 1)}} For all Q E 911 with (oi—1’ 0i) 6 92 - {(1, 1)} and 01 = 0; i = 2,3, ..... ,n n li—Q 0i1—0i_ f(§1.€) = RHS (2'4) x iI=12(Pool P01) and (2.5) becomes 11 log f(x,Q) = a + 2 bioi i=1 where 1 P 2 a = — 3— log 21 — 2121 (xi + 1) + (n-l) log p00 , b=2x+log—1-— =2x+log[l] 1 1 p00 1 T ’ P 1 1— . P00 and b = 2xn + log p0——1- = 2x + log [1'6]. 11 p00 11 T 28 2.3 Empirical MAP In order to implement the MAP rule (2.8) we need to obtain estimates for the unknown parameters of G. The proposed estimates are based on 2(_. When plugged into (2.8) the result is an empirical MAP rule. . X i’ is an unbiased estimate 1 1 II M: (ii) Let Zi = [sgn Xi 1‘ sgn Xi+l]’ i = l,2,...,n-1 and _ l n—l = a7 _ - b is an unbiased estimate for p(1—b), where a = .5[(1) - (-1)]‘2 1.0731 and b = 2a «1(1) (-1) 2 .2865. ll Proof (i) Since E(Xi|0i) = 20i —1 and E(0i) = l—p the proof of (i) is immediate. (ii) We have Zi = [sgn Xi at sgn Xi +1] Then, and the RHS (2.9) is evaluated by partitioning on (0i, 2+1) as follows. 29 Let RHS(2.9) = 2(pl+p2+p3+p4) where P1 = P(Xi 3 0’ X1+1 < 01 0i = 0’ “1+1 = 0) p3 = P(xi 2 0, xi+1 < 0, 0, = 1, em = 0) and p1, p2, p3, p4 can be evaluated without much difficulty. For example, P1 = P(Xi = P(Xi = (1 - <1’(1))‘1’(1)1511 = 9(-1)<1>(1)6p and similar computations yield p, = mania-0p p3 = («rustic-6) p4 = 2(1)(-1)11—p—p(1—01 . Z 01Xi+1< ””1 = 019m: 0) PM = 01‘1“: 0) Olli = 0)P(Xi+1 < 0|0i+l = 0)6p IV RHS (2.9) = 2p(l-6) [c(1) — <1>(—1)]2 + 2e(1)<1>(—1). zi-2<1>(1)<1>(-1) J = _ 5, E [mums—1))2 p“ ) and this completes the proof. :1 30 Since 0 5 p, 6 5 1, it is natural to truncate U11 and Vn—l at 0 and l and use the modified estimators I Un = Un[0 < Un < 1] + [Unz 1] I Vn—l The unknown parameters p and 6 are estimated by solving the = Vn_1[0 < Vn_1 < 1] + [Va-1 2 1]. equations D = 111’, (2.10) p(1 _ 5) = vii-1' 2.4 Classification by Extended Compound Bayes Rules We begin this section with a brief review of the I‘k construct introduced by Gilliland and Harman (1969) when applied to the finite—state, finite—action compound decision problem. Consider a component decision problem with states 0 E 9 = {1,2, ..... ,m} indexing possible distributions P0 E .9: {P1,P2, ..... ,Pm} where Pi (i=1,2, ..... ,m) are distinct probability measures on (..S .3. The actions a e .1: {1,2, ..... ,n}. 
Let the loss function L(0, a) be such that 0$L(0,a) 1 - 1 1‘ f1(Xk) E pilfi(Xk—l) {0(Xk) .E piofi(Xk—l)' 1—0 < 1—0 34 This is equivalent to 1 Z 1 i§0p10f1(xk-l) decide 0k = if Xk 2 log I 0 <.E0P11fi(Xk-1) which can be written as 1 2 (2.14) decide 0k = if Xk C(p(Xk_1)) 0 < where lE PilfiP‘) p“) = i s m p.. x 1—0 j=0 ‘1 ‘ and C(9) = $103 [L32] - The Bayes enve10pe in the [‘2 decision problem is 2 2 __ x x 11(2) — Ix x1 {X2: x2$c(p(xl))} [f 0( 1)})01 + f1( 1)P11]f1(x2)dx1 dxg f x f x f dx (1 + Ix1 {x2zx2>{:(p(x1))} [0( 1)P00 + 1( 1)P10] 002) 1 x2 35 = 1x1 [f0("1)P01 + f1(x1)P11]°(c(P(xl)) 4W1 “I" le [f0(x1)p00 + f1("1)P10][1 " ¢(C(P(x1)) + 1)]dx1 1 (2.15) = igo Ifi(x1)[p,0{1- (c(p(x1)) + 1)} + pil(c(p(x1))-1)ldx1. For the F3 decision problem, the state space is e3 = {(i, j, k): i, j, k = 0, 1} and we let p3 = {pijkz i, j, k = 0, 1} be a probability measure on 93. In this case (2.12) gives 1 E f101<(§3)P10k 1 A(x)=2 0'3 i=0 k=0 1 1 =f 2 2 f.x f x p. 0(X2). “(=0 1( 1)k( 3) 10k 1: and 1 E0 fillt(3‘-3) Pilk l A(x)=2 1‘3 i=0k- 1 1 = f1("2)iE0 kE0 fi(xl)fk(xk)pilk - 36 The I‘3 Bayes rule versus p3 is given by 1 (2.16) decide 0! = if 0 1 1 2 1 1 f1(x£) iE0 kEopilltin‘t-l)fl<(xt+1) £009) iE0 kEoPiokin‘t—l)flt(xt+1) - < This is equivalent to decide 0! = if 1 kE0 piOkfi(Xl-1)fk(xl+l) >< N [0| 0% “1441(th O 1 0 kE0 pilkfi(Xt-l)fk(xl+l) which can be written as l 2 (2.17) decide 0! = if X l c(p(X[_l, X t +1)) 0 < where 1 1 2 2 pi _ 1: = NEW-g: 213 ‘1]: f()f() p... . X y i=0 j=0 k=0 ‘1‘“ 1‘ lkfi(x)fk(y) 37 It can be shown that the Bayes envelope in the I‘3 decision problem is 19(3):}; I A(-)dx B x1"3[{x2=x2.<.C(p(x1.x3))} 1x3 2 + j A ( ) ] dx dx {x2=x2><=(p(x1,x3))} 053% l 3 1 1 = igo kgo I x1 Ix3 fi("1)fli("3)[P101t{1 — ‘P(C(p(x1, x3)) + 1)} + pilk (c(p(xl, x3)) - 1)] dxldx3 . The estimation of the I‘2 and 1‘3 Bayes rules versus empirics in the compound problem depends on the construction of estimates for G121 and G3. The estimates for these empirical distributions are then plugged into (2.14) and (2.17) in place of p2 and p3 yielding extended compound Bayes rules. We follow Hannan (1957), Robbins (1964), Van Ryzin (1966) and Ballard (1974) in the construction of unbiased estimates. Definition 2.1 h = (110’ hl) is said to be an unbiased estimator of the family .9: {P0, P1} if Eihj(X)=[i=j] i,j=0,1 where Ei denotes expectation with respect to Pi . 38 From h_ one obtains unbiased estimators of 9 k by taking _ . k hik(xk) — hi1(xl)hi2(x2) ..... hik(xk) for all 1k E 9 . Such an estimator is called a product estimator [Ballard (1974)]. The unbiased estimator we use for {P0, P1} is the kernel function used by Robbins (1951): {(x) = (r0007 110‘» where r0(x) = é-(l — x) and rl(x) = £1 + x). These are used to produce estimators for the components of G121 and G3 as follows: ‘2 _ 1 n and . —1 3 _ 1 n (2.19) (3114;in — _2n— E2 ri(Xa_1)rj(Xa)rk(Xa+l) , n 2 2 . These estimators are unbiased and consistent for the components of G34 and G24. The extended compound rule ER2 is formed by substituting (2.18) for pij in (2.13) and letting k = 2,3, ..... n . Similarly, the extended compound rule ER3 is formed by substituting (2.19) for pijk in (2.16) and letting l = 2,3, ..... ,n—l . 39 2.5 Comparison of the Methods Even though we have suggested two types of classification rules, MAP and extended, our main interest is in the performance of the extended rules. Chapter 4 describes in more detail the simulations and the classifications obtained using various rules. 
As an example, we present some of the PCC comparisons for the extended rule ER3 and its competitor HEB3 defined following (1.7) in Chapter 1. Table 12 in Chapter 4 gives more detailed information on the performance of these two rules. In 100 simulations for each of 84 different (p, 6) combinations: A. with p = 50 mmmngnts (i) In 76 (90%) cases the ER3 has a higher PCC than the HEB3. The improvement of ER3 [=PCC of ER3 - PCC of HEB3] has a mean of 3.74 with standard deviation 2.62. (ii) In 8 (10%) cases the HEB3 has a higher PCC than the ER3. The improvement of HEB3 has a mean of .82 with standard deviation .48. B. with n = 299 cgmmnents. (i) In 75 (89%) cases the ER3 has a higher PCC than the HEB3. The improvement of ER3 has a mean of 2.30 with standard deviation 2.25. (ii) In 9 (11%) cases the HEB3 has a higher PCC than the ER3. The improvement of HEB3 has a mean of .33 with standard deviation .24 It was also observed that the ER has a very high improvement over HEB when either p is very small ( < .1) or 6 is very large ( >.9). 40 In implementing the empirical MAP rule (2.8), the unknown parameters (p,6) were estimated by using Proposition 2.1 and the following convention was used in cases where either the estimates were undefined or truncation was necessary. Let u,v denote un , v respectively. n—l (i) If{usO}U{0 .5 , v s 0} then decide all 0i = 0. (iii)If{u21,v,>_1}u{.5 no. The Bayes rule versus a probability measure p on 0 = 0 and l - p on 0 = 1 is 1 2 (4.1) decide 0 = if (1 - p)f1(X) pfO(X) 0 < which is equivalent to 1 2 (4.2) decide 0 = if X c(p) 0 < where _ 1 1 (4-3) c(p) — 2 (#0 + #1) + ”1 - #0 log [1%] The Bayes risk R(p) is given by (4-4) R(p) = (H) ¢(C(p) - #1) + pll - “C(10) - #0)]- We shall call the Bayes rule versus the uniform prior on 9 the simple rule (SR). Then (4.2) and (4.3) gives the simple rule as (4.5) decide 0: (I) if X 2 ago + 141) < and (4.4) become (44) R(-5) = «as, + 41)) . 60 61 Table 1 below gives the Bayes risk and the theoretical value of the PCC when classification is performed with the SR, in the special case when ”0 = “”1 = “I‘- Table l. Bayes risk and PCC for the SR ‘5 Bug risk PCC .25 .4013 59.87 .50 .3085 69.15 .75 .2266 77.34 1.00 .1587 84.13 1.50 .0668 93.32 1.75 .0401 95.99 2.00 .0228 97.92 The PCC of the SR is a reasonable measure against which to compare the performance of other classification rules. In our simulation studies we have taken 11 = 1 so that, if a decision rule is to be of any practical use, then its PCC has to be at least 84.13. The simulation programs were written by Mr. G. Heidari under the supervision of the author. The programs are in Turbo Pascal, Version 87. Standard normal deviates were generated in pairs (2], z2). Using the inverse probability transform for the Raleigh distribution and a uniform (0, 1) random variable, a radius R was generated. Using the uniform (0, 21), an angle (4 was generated and (zl, 22) determined by 21 = R cos 1.) , 22 = R sin 11). 62 4.2 Simulations on the Transect In our simulation studies the .0 sequence was generated as a Markov chain with the parameters (p, 6) specified by (2.1) and (2.2). A total of of 84 different combinations of (p, 6') were considered, with p = 0, .01, .10, .20, .30, .40, .50 6 = 0, .10, .20, .30, .40, .50, .60, .70, .80, .90, .99, 1.00 . The observable Xk(k = 1,2, ..... ,n+1) were generated as Xk = Zk + 20k-l where Z1‘ is i.i.d standard normal. For each (p, b), 100 replications were performed with n = 50 and n = 200. 
The simulated data were classified by the following 5 decision rules. 1. EMAP: Empirical MAP rule (2.8) 2. ER2: r2 Bayes rule 3. ER3: I‘2 Bayes rule 4. HEB3: Empirical Bayes rule of Hill et al. (1.7) with j = 1. 5. HEB5: Empirical Bayes rule of H111 et al. (1.7) which j = 2. Tables 2 - 11 display the mean and the standard error of the PCC for the 5 decision rules. (All tables appear at the end of this Chapter.) The ER3 and HEB3 when used for classifying 0i are (Xi—l’xi’xi +1) measurable. Table 12 compares the performance of ER3 versus HEB3. 63 4.3 Classification of Images Due to the complexity of the simulation of the true image as a MRF only special types of patterns and MRF were considered. The patterns considered were 1. checkerboard size 25 x 25 2. road sign size 25 x 25 3. great lakes size 31 x 44. In the case of MRF simulations, the Markovian dependence of 0ij was restricted to only one side. In particular, we assume (4.7) P(0ij|rest) = P(0. .0| ij—l’a i—lj)' MRF of type (4.7) with additional simplifying approximations were considered by Hansen and Elliott (1982), Haslett (1985). In our simulations the MRF was generated as (4.7) by specifying values for the 4 parameters p1 = Puij = ”Mg—1 = 0’ ‘i-1,j = 0) p2 = PM)“ = "I‘m—1 = 0’ ‘r—1,j = 1) p3 = P(0ij = 0|0.,J._1 = 1, oi—l,j = 0) p4 = Puij = ”lag—1 = 1’ ‘1-1,j = 1) The MRF simulated were of size 31 by 44. In each of the images the lattice was extended to size 27 by 27 or 33 by 46 is an obvious way, in order to provide the border pixels with enough neighbors. The conventions used in obtaining the image was to color the pixel black if 0 = l and white if 0 = 0. As in the transect case, for each pixel (i, j) the observable Xij were generated as where Zij is i.i.d. standard normal. 64 The image segmentation was performed by the following 4 decision rules 1. ER4: Extended rule (3.16) with 4 neighbors 2 SR: Simple rule (4.5) 3. MEB4: MEB rule (3.7) with 4 neighbors 4 MEBS: MEB rule (3.7) with 8 neighbors After obtaining the image by using ER4 three revisions (lRER, 2RER, 3RER) of the extended rule (3.16) were implemented. The method of revision is as follows. At the kth (k 2 1) revision, the data were classified using ER4 with the empirics (3151(f5) estimated from the (k—l)St revision (0th revision is the classification obtained by using ER4) For each image 25 simulations were performed. Morris et al. (1985) reports their findings based on one simulation except for one image where 10 repetitions were done. Tables 13—18 give the PCC for the decision rule used on the 6 images. For each image we have reconstructed (Figures 3—14) the best and the worst (based on the PCC of ER4) classification. Figure 15 shows a reconstruction of the great lakes pattern along with the PCC and the ACC at each stage. 65 4.4 Conclusion The simulations performed with regard to the transect case show that overall the 1‘3 Bayes rule performs better than the existing empirical Bayes methods proposed by Hill et al. (1984). The simulations performed with various images, both deterministic and stochastic, show that again the extended rules perform better than the empirical Bayes rules suggested by Morris et al. (1985), Morris (1986). In a total of 150 simulations with various images, MEB4 had a higher PCC than ER4 only once. Furthermore in 142 simulations the ER4 had a higher PCC than even MEB8. All of the revised extended rule classifications had a higher PCC than both MEB4 and MEB8. 
For some images, great lakes pattern and MRF with (p1, p2, p3, p4) = (.80, .40, .50, .60), the simple rule performed better than both the MEB4 and MEB8 rules in all 50 simulations, whereas for ER4 only 3 simulations had a lower PCC than the simple rule. Even in those 3 instances the revisions took care of that. The revisions of the extended rules had a small improvement on the PCC and no more than 3 revisions were needed to produce stability. The work of Geman and Geman (1984) falls within the empirical Bayes structure in that the hyperparameters are estimated from the marginal distribution of the data. It will be of interest to compare the performance of the extended rules with the method of simulated annealing proposed by Geman and Geman (1984). .10 .20 .30 .40 .70 .80 .99 1.00 Table 2. 2‘8 2‘8 {"3 1‘8 Pg .. as as as 88 as 88 <9 1"?“ 5% The PCC .01 90.42 1.58 91.32 1.35 89.74 1.27 88.98 1.51 89.66 1.35 89.98 1.88 91.18 1.53 93.12 1.21 92.44 1.41 93.66 1.15 90.08 1.63 93.32 1.20 I... O .4 .. 2"‘9° 33 S8 00 «I t—Iga hub 88 oo hit-I 00 as a: 8 oo on H "‘9’ 599 HO! Nth oo l—IH 32 83 3% ea r8 e522 88 88 .20 79.88 .98 78.38 1.31 81.52 1.01 80.12 1.47 77.64 1.50 79.08 1.25 77.70 1.51 73.36 1.92 74.02 1.83 78.22 1.80 88.22 1.57 94.10 1.05 for the EMAP rule 78.78 .65 79.40 .62 80.42 .77 80.40 .91 80.54 .85 78.88 1.06 73.00 1.51 73.62 1.60 72.00 1.49 73.76 1.91 90.64 1.46 95.02 1.12 with n .40 77.68 1.15 79.18 .80 78.72 .78 78.24 .88 79.80 .71 79.30 .83 76.50 1.18 74.02 1.37 70.74 1.53 70.86 1.78 90.06 1.39 95.10 1.01 =50 75.42 2.26 71.74 1.83 75.72 1.22 77.74 79.24 .83 80.24 .10 .30 .40 .50 .60 .70 .80 1.00 Table 3. .01 90.84 1.48 91.33 1.58 90.19 1.43 92.28 1.37 ‘O CO 8 on ‘0 rr #9 r. #9 rr 3% 38 88 $3 83 to co 2"!" 1"!" $8 83 Lg i—ias 1—11— .10 81.82 1.03 81.85 1.09 85.85 .78 81.10 1.47 81.09 1.73 79.34 1.79 77.20 2.21 79.39 1.92 80.40 2.14 80.57 2.18 87.11 1.80 94.01 1.17 .20 80.57 .42 82.50 .48 83.44 .78 85.26 .46 83.62 .70 80.24 1.34 78.17 1.37 72.06 1.86 69.52 1.70 66.22 2.16 81.30 1.99 93.47 1.09 .30 77.69 .41 77.43 .50 80.94 .48 82.97 .43 83.95 .37 82.18 79.91 .76 70.99 1.42 66.23 1.69 61.53 1.69 78.43 1.78 92.86 1.12 .40 77.37 .83 74.07 .85 74.58 .70 79.61 82.50 82.33 .33 81.52 .43 75.53 66.57 1.13 61.56 1.16 72.89 2.20 96.19 .96 The PCC for the EMAP rule with n = 200 .50 67.01 2.95 72.59 1.24 75.67 .89 79.00 81.74 .45 83.17 82.64 .32 79.91 72.10 65.78 70.87 1.98 94.05 1.38 .10 .20 .30 .40 .70 .80 1.00 Table 4. The PCC for the HEB3 rule with n = 50 CO CD p—s t—s NCO p g r. ca: at: 00 MN 01” N we» 1°59 1° 90‘ .8 r3 as r3 .° 3: as as as as to . 3".“ N 83 38 to "‘3" .10 84.60 1.28 86.26 1.33 86.86 1.28 88.48 1.07 88.68 1.08 88.52 1.28 89.24 1.49 90.12 1.46 91.02 1.44 91.30 1.67 92.74 1.81 92.34 1.80 .20 83.38 .79 82.58 .78 85.36 .82 85.30 .80 84.98 .86 88.00 .68 88.12 .74 88.88 .70 91.52 .89 90.14 1.66 91.86 1.81 93.76 1.65 83.28 .61 83.46 .51 83.74 .56 82.80 .63 84.38 .49 85.32 .55 86.12 86.82 88.84 .67 91.68 .93 92.00 1.79 94.18 1.61 .40 88.12 .49 85.70 83.26 82.80 .49 82.22 .57 82.42 83.32 85.54 .58 88.08 .59 90.74 93.52 1.46 93.18 1.60 95.02 .42 91.12 .45 86.98 .51 84.20 .51 82.50 81.70 81.56 .61 84.26 .61 86.34 .57 91.12 92.78 19.8 83' Table 5. p= .00 6: .00 94.48 1.65 .10 94.31 1.39 .20 93.39 1.71 .30 92.64 1.91 .40 94.09 1.66 .50 96.45 1.25 .60 94.80 1.36 .70 94.14 1.50 .80 93.24 1.89 .90 93.63 1.65 .99 90.45 2.25 1.00 92.08 2.04 #3 m r8 “& 8 co "‘. 1‘3!" 
3—8 cc on 014 or ‘D Q 04 88 #3 1‘3 323 at: .8 as .10 88.81 90.75 .37 90.90 .37 91.75 92.59 .27 93.10 .33 94.21 .20 94.89 .22 94.46 .97 95.93 .52 95.12 1.35 94.69 1.40 .20 86.45 .35 87.58 .30 87.64 .25 87.90 .24 88.32 .26 89.12 90.39 .22 91.75 .25 92.74 .23 94.75 .21 95.56 1.24 95.01 1.39 .30 85.53 .28 84.63 84.73 .30 85.57 .25 85.08 .29 85.62 .25 87.53 .24 88.89 .24 90.96 93.68 .21 94.13 1.24 92.87 1.72 .40 88.17 86.27 .26 84.27 82.77 .28 83.82 83.64 84.89 86.31 .28 88.72 .27 92.25 96.21 r3 28 The PCC for the HEB3 rule with n = 200 95.85 .19 91.32 .21 87.23 .23 84.74 .26 84.00 83.30 .25 83.90 84.81 .24 87.82 91.17 95.32 96.43 1.07 70 Table 6. The PCC for the HEB5 rule with n = 50 p= .00 01 10 .20 .30 .40 .50 6:.00 90.90 84.44 78.66 79.84 82.48 97.46 1.89 2.47 1.74 1.13 .65 .31 .10 89.80 86.64 82.04 78.46 81.28 91.44 2.06 2.35 1.57 .90 .55 .43 .20 89.66 85.56 83.14 82.78 81.60 86.40 1.89 2.26 1.55 .92 .62 .55 .30 82.58 85.76 84.36 81.96 81.22 83.28 2.65 2.35 1.42 .81 .65 .54 .40 86.38 87.24 84.62 83.84 82.86 81.22 2.33 2.12 1.32 .90 .53 .57 .50 88.92 92.80 86.78 86.02 83.58 80.92 2.16 1.39 1.40 .81 .62 .58 .60 90.90 88.08 86.20 86.96 85.04 80.16 1.89 2.20 1.87 .87 .74 .64 .70 87.18 86.88 85.82 87.38 85.40 83.44 2.36 2.26 2.10 .98 .84 .58 .80 90.90 88.18 89.40 91.60 88.26 85.82 1.89 2.27 1.69 .91 .79 .59 .90 87.18 86.98 87.10 89.50 90.72 90.72 2.36 2.30 2.29 1.90 1.10 .64 .99 89.18 90.62 89.60 89.08 89.54 89.90 2.12 1.97 2.27 2.09 2.01 1.89 1.00 87.52 88.76 86.84 90.04 90.16 86.42 2.36 2.27 2.31 2.02 1.94 2.48 .10 .20 .30 .40 .60 .70 1.00 Table 7. 88 ‘0 :-'r-' $8 r3 :8 .3 NS 35 83 on 88 CD on NS P8 P9 w 1% “8 88 The PCC for the HEB5 rule with n = 200 F3 #3 93 98 98 9% 98 $8 98 98 as a: as :8 a: as 83 a; $8 3% 8 .10 87.02 .79 87.88 .76 89.14 .49 89.95 .70 90.80 92.60 .32 93.71 .28 94.95 .21 94.21 1.03 95.79 .67 94.25 1.47 90.60 1.96 .20 85.63 .39 86.31 .35 87.01 .27 87.11 .32 87.85 .27 88.35 .26 89.71 .23 91.55 .25 93.29 95.88 .18 95.98 1.19 93.15 1.76 .30 85.55 .27 84.47 84.00 .29 84.79 .28 84.46 .29 85.25 .27 86.90 .27 88.55 .25 90.83 94.43 .21 92.97 1.69 88.14 2.10 .40 88.39 .25 86.06 .26 83.76 .27 82.03 .27 82.82 83.06 .24 84.15 .26 85.64 .28 88.55 .27 93.45 .19 95.58 1.22 93.49 1.78 98.62 .12 91.71 .21 86.84 .25 83.98 .25 83.12 .28 82.56 .27 83.05 84.02 87.54 91.79 .24 95.67 1.37 94.51 1.42 Table 8. = .00 6:00 98.38 .31 .10 98.80 .22 .20 98.78 .25 .30 98.46 .22 .40 98.32 .25 .50 98.76 .19 .60 98.38 .31 .70 98.78 .20 .80 98.38 .31 .90 98.78 .20 99 98.68 .22 1.00 98.48 .24 The PCC for the ER2 .01 97.94 98.14 .24 97.56 .28 97.82 .27 97.82 .26 98.34 .22 98.34 .28 98.36 .23 98.66 .25 98.52 98.50 .21 98.82 .20 .10 91.50 .45 92.00 .40 91.68 .47 91.64 .47 91.94 .49 92.76 .49 93.36 95.00 .38 94.02 .65 96.86 98.44 .28 98.64 .20 .20 88.26 .44 87.16 .48 88.58 87.84 .52 88.08 .48 89.26 .45 89.04 .47 88.74 .55 91.32 .55 93.08 .59 97.12 .40 98.78 .21 rule with n = 50 .30 85.88 .52 85.90 .47 85.56 .51 84.90 .49 85.36 .48 86.82 .52 86.18 .55 86.76 .61 87.80 91.78 .55 96.92 98.82 .21 .40 86.90 .53 85.50 .47 84.42 84.14 83.32 83.42 .51 83.96 .49 85.56 86.30 .57 88.80 .59 97.46 98.78 .21 90.30 87.14 .59 84.86 84.14 .53 83.94 83.02 83.40 84.80 84.96 .59 87.98 .49 96.30 98.68 .24 6: .00 .10 .20 .30 .40 .50 .60 .70 .80 .99 1.00 Table 9. 
p: .00 99.47 .07 99.45 .08 99.46 .07 99.55 .08 99.56 99.60 .07 99.41 .07 99.58 .08 99.59 99.50 .07 99.42 .07 99.54 .07 73 The PCC for the ER2 rule with n .01 98.71 98.50 .11 98.66 98.78 .11 98.32 .13 98.88 .11 98.47 .12 98.62 .14 98.93 .15 98.77 .17 99.26 .14 99.56 .07 .10 92.52 .18 92.38 .17 92.34 .17 92.76 .19 92.81 .20 93.10 .21 93.91 .21 94.09 .25 94.76 .24 95.87 .27 98.26 .27 99.51 .08 .30 87.42 .30 86.18 .25 85.78 .25 85.99 85.63 .30 86.06 .26 87.22 .26 88.16 .25 88.99 .26 90.78 95.59 .36 99.43 .08 .40 87.59 .25 85.90 .23 84.54 83.49 .26 84.41 .25 84.50 .23 85.20 .22 86.22 .26 87.10 .27 89.22 .23 94.52 .41 99.53 .07 200 .50 91.62 .23 88.16 .28 85.80 .29 84.29 .27 84.41 .24 84.02 .23 84.38 .26 84.45 .26 86.58 .24 88.64 .23 93.89 99.45 .07 6:00 .10 .20 .30 .40 .50 .60 .70 .80 1.00 Table 10. p= .00 97.58 .36 98.36 .28 98.16 .33 8.8 8 8 8 8 8 8 a 82 88 82 88 82 38 88 is) oo :9 ..‘q 583 The PCC for the ER3 .01 97.46 .28 97.42 .31 97.02 .33 97.26 .34 97.22 .32 97.86 .25 97.88 .29 97.92 .26 98.16 .26 98.02 97.98 .29 98.12 .31 .10 90.56 91.60 .42 90.62 .55 91.10 91.42 .55 92.66 .48 93.32 .51 94.70 .42 94.16 96.62 98.00 .32 98.20 .25 74 .20 87.78 .48 86.40 .53 87.90 87.48 .53 87.34 .52 89.00 .49 88.80 89.36 .53 92.02 94.00 97.24 98.30 .24 rule with n .30 .40 86.22 88.86 .51 .49 85.64 86.62 .49 .50 84.88 84.42 .56 .50 84.10 83.22 .50 .53 84.96 82.50 .49 .49 86.12 83.10 .51 .50 86.96 84.20 .56 .53 87.52 85.82 .57 .59 88.74 87.12 .56 .58 92.28 90.20 .56 .60 96.62 97.62 .44 .32 98.28 98.22 .25 .25 50 93.52 .46 89.80 86.08 .58 83.88 .52 82.94 82.04 82.46 84.74 86.50 .57 90.22 .49 96.56 .41 98.22 .27 75 Table 11. The PCC for the ER3 rule with n = 200 p= .00 .01 .10 .20 .30 .40 .50 6: .00 99.16 98.35 92.10 89.19 88.71 90.14 95.05 .10 .12 .20 .23 .26 .24 .20 .10 99.23 98.16 92.13 88.72 86.45 87.33 90.99 .11 .15 .19 .23 .23 .23 .25 .20 99.08 98.42 92.09 88.08 85.48 84.96 87.52 .11 .14 .17 .22 .26 .28 .24 .30 99.26 98.55 92.45 88.20 85.82 83.30 85.00 .10 .12 .21 .22 .24 .25 .26 .40 99.26 98.21 92.87 88.72 85.29 84.06 84.52 .09 .14 .20 .27 .30 .27 .24 .50 99.28 98.72 93.52 89.15 86.19 84.14 83.65 .11 .12 .18 .25 .26 .23 .22 .60 99.10 98.25 94.23 90.43 87.99 85.58 84.11 .10 .13 .21 .22 .26 .22 .26 .70 99.27 98.45 94.73 91.71 89.19 87.04 85.30 .09 .12 .23 .27 .25 .27 .26 .80 99.12 98.74 95.68 92.49 90.85 89.02 88.52 .10 .15 .20 .24 .23 .31 .26 .90 99.11 98.66 96.60 94.46 93.10 91.80 91.22 .07 .17 .21 .22 .20 .21 .24 .99 99.10 98.98 98.40 97.23 96.95 96.30 95.99 .10 .13 .19 .26 .27 .30 .24 1.00 99.26 99.21 99.16 99.12 99.14 99.16 99.23 .09 .09 .11 .10 .12 .10 .10 76 Table 12. The performance of the ER2 rule versus the HEB3 rule A. n=50 p=.00 01 10 20 .30 40 50 6:.00 4.64 7.86 5.96 4.40 2.94 0.74 -1.50 .10 3.70 5.48 5.34 3.82 2 18 0.92 -1.32 .20 6.22 5.08 3.76 2.54 1 14 1.16 -0.90 .30 9.10 5.64 2.62 2.18 l 30 0.42 -0.32 .40 5.78 7.70 2.74 2.36 0.58 0.28 0.44 .50 6.72 4.28 4.14 1.00 0.80 0.68 0.34 .60 4.64 5.50 4.08 0.68 0.84 0.88 0.90 .70 8.90 9.38 4.58 0.48 0.70 0.28 0.48 .80 4.64 5.52 3.14 0.50 -0.10 -0.96 0.16 .90 8.90 9.88 5.32 3.86 0.60 -0.54 —0.90 .99 5.68 4.12 5.26 5.38 4.62 4.10 3.78 1.00 6.72 5.70 5.86 4.54 4.10 5.04 7.48 76/84 90% ER3 better: mean = 3.74 std. dev. = 2.62 8/84 10% HEB3 better: mean = .82 std. dev. = .48 B. 
n=200 p= 00 01 10 .20 30 40 50 6:.00 4.68 4.31 3.29 2.74 3.18 1.97 -0.80 .10 4.92 4.97 1.38 1.14 1.82 1.06 -0.33 .20 5.69 7.78 1.19 0.44 0.75 0.69 0.29 .30 6.62 4.07 0.70 0.30 0.25 0.53 0.26 .40 5.17 3.37 0.28 0.40 0.21 0.24 0.52 .50 2.83 2.34 0.42 0.03 0.57 0.50 0.35 .60 4.30 7.18 0.02 0.04 0.46 0.69 0.51 .70 5.13 2.45 -0.16 -0.04 0.30 0.73 0.49 .80 5.88 1.74 1.22 -0.25 —0.11 0.30 0.70 .90 5.48 2.28 0.67 -0.29 —0.58 -0.45 0.05 .99 8.65 2.75 3.28 1.67 2.82 0.09 0.67 1.00 7.18 2.25 4.47 4.11 6.27 2.26 2.80 75/84E89%; ER3 better: mean = 2.30 std. dev. = 2.25 9/84 11% HEB3 better: mean = 0.33 std. dev. = 0.24 lRER 94. 868 94.575 95. 528 95. 015 96.554 95. 381 95. 601 95. 528 96.554 95.968 94. 648 95.968 95. 528 95. 748 95. 894 95. 528 95. 674 94.795 94.721 95. 601 95.088 95. 821 95. 821 95. 674 94.868 95.478 .542 2RER 94.795 94.721 95.528 94.868 96.481 95.455 95.601 95.528 96.408 95.894 94.648 95.821 95.455 95.601 96.114 95.528 95.674 94.941 94. 941 95. 601 95.235 95. 748 95. 601 95. 674 94. 795 95.466 .499 77 3RER 94.795 95.015 95.455 94.868 96.481 95.381 95.601 95.528 96.188 95.894 94.648 95.821 95.455 95.601 96.041 95.601 95.674 94.941 94.941 95.601 95.161 95.748 95.601 95.601 94.868 95.460 .462 SR 83.871 83.798 84.824 83.871 84.897 83.871 83.284 84. 091 85. 557 85.191 85.264 85.557 85.337 84. 971 85.484 84.091 83. 798 83.138 84.531 83.358 84.091 84. 824 84.604 82. 845 84.091 84.370 .800 MEB4 71.408 70.894 69.941 71.408 71.188 70.528 72.067 70.894 71.774 70.455 71.188 70.894 72.727 71.848 71.334 70.528 71.041 69.795 70.455 70.528 70.088 70.015 71.334 69.868 71.848 70.962 .756 Table 13. The PCC with the GREAT LAKES pattern MEBS 73.680 73.837 73.900 73.680 74.194 72.947 73.607 73.314 73.754 72.874 72.874 73.021 74.633 73.387 74.707 73.754 73.240 72.581 72.727 72.287 72.141 73.974 72.067 72.067 74.633 73.337 .784 ER4 90. 560 91. 360 93. 600 89.440 90.720 92.480 90. .720 93.120 91.520 89.600 91. 360 92.960 92. 000 92. 480 91. 360 88. 480 92. 160 93. 280 92.320 92.960 92. 000 93. 440 mean 91.706 s.d. 1.467 lRER 91.360 92.480 93.440 89. 280 92. 160 94.080 91. 360 93.120 92.320 89.760 89. 760 91. 520 91. 840 92. 800 93.760 92. 320 90.400 92.000 93.440 93.280 93.920 93.600 92.640 93.920 92.186 1.441 78 SR 82.720 83. 520 85.760 83.200 84.000 85.440 83.840 84.480 83.040 82.560 84.480 84. 800 84.160 83.520 84.800 85. 440 83. 840 85.920 86. 240 84.960 87.040 84. 800 85.280 85.280 84.531 1.128 Table 14. The PCC with the CHECKERBOARD MEB4 88.000 89.120 92.320 87.680 89. 600 90. 400 90. 880 89.280 90.720 88.000 87.840 88. 960 89. 120 91.040 89.440 91.840 89.760 87.520 90.080 90.080 89.760 91.840 90. 720 90.040 90. 880 89.731 1.412 pattern MEB8 89. 280 90. 240 93. 440 89. 600 90. 880 92. 320 90.880 92.160 89.440 89. 600 89.920 91. 840 90.240 92.800 88.800 90. 560 91.680 91.200 94.240 91.840 91.520 92.000 90.867 1.533 Table 15. The 79 PCC with the 2RER 95.520 97.280 93.600 96.160 95.360 96.960 95.200 94.720 94. 080 95. 040 95.840 95.680 94.560 95.360 95.520 96.640 96.640 96.640 95. 200 95. 040 95.840 95.578 .912 3RER 95.680 97.600 95.360 93. 440 96.160 95. 520 96. 640 95. 200 94.720 94. 080 95.040 96.000 95.520 94.880 94. 560 95. 680 95. 520 96. 640 96. 640 96.320 96.000 95. 200 95. 200 96. 640 95.840 95.603 .908 SR 83.360 87.680 85.760 83.200 84.000 83.040 86. 560 85.120 83. 520 84.640 82.720 83.520 86.720 82.400 83.840 83.840 86.400 86.880 85.280 85.920 84. 640 85. 
280 84.000 88.000 86.560 84.915 1.607 ROAD SIGN pattern MEBS 91.360 95.200 93.440 90.880 90.560 90.560 94.080 92.320 92.800 92.320 90.720 89.280 93.440 90.720 92.160 91.040 93.760 92.320 92. 640 91. 360 92.000 92. 000 91. 360 93. 760 92. 000 92.083 1.359 Table 16. The PCC with the (.90,.50,.50,.10) MRF ER4 89.223 88.416 89.003 88.196 2RER 89.809 88.563 89. 003 88. 710 87. 977 88. 636 87.683 88. 416 88.050 89. 809 87. 683 89.516 89.150 88. 270 87. 977 90.396 88. 343 87.830 88. 563 88. 710 88. 343 90. 029 88.856 88. 710 90. 469 88.780 .814 80 3RER 89.736 88.416 88.930 88.710 88.123 88.563 87.683 88.636 88.123 89.809 87.683 89.443 89.223 88.343 87.977 90.396 88.343 87.903 88.636 88.490 88.416 90.029 88.930 88.563 90.323 88.777 .784 SR 85.191 84.164 83.431 84.897 84.018 83. 871 83.358 83.871 83.651 84.677 84. 897 84.384 84.677 83. 871 83.211 85.924 83. 871 82.258 84. 897 83.504 83.504 84.018 84.384 85. 777 85.337 84.226 .854 MEB4 87.537 86.804 86.510 86.364 86.510 86.657 86.070 87.648 85.777 86.657 86.804 87.170 86.510 86.144 84.971 88.270 85.411 85.264 87.243 86.217 86.437 87.023 87.243 88.636 87.390 86.691 .867 pat tern MEBS 88.270 87.170 86.950 86.657 87.023 87.023 85.997 87.454 85.777 87. 170 87. 097 87. 610 88.050 86. 437 86. 364 88. 416 85. 997 85.264 87.317 86.217 86.730 87.463 87.097 87.830 88.343 87.029 .822 81 Table 17. The PCC with the (.01,.10,.01,.90) MRF pattern ER4 91.569 91.569 92.229 93. 035 92. 595 92.889 92.815 93.768 92. 962 92. 595 92. 595 92.082 92.815 93.475 91.642 93.109 93. 109 93. 475 93. 328 93.182 93.035 93.842 92. 082 92. 522 92. 669 mean 92.760 ad .633 lRER 91.496 93.182 93.035 93.182 93.548 93.548 92.522 93.475 93.182 92.375 92.962 92.522 94.135 93.475 92.155 93.109 92.595 94.282 93.255 92.889 93.548 94.282 92.449 92.742 93.328 93.091 .658 2RER 91.569 92.669 93.182 93.035 93.622 93.622 92.522 93.548 93.475 92.302 92.962 92.669 93.915 93.109 92.009 93.328 92.742 94.355 93.402 92.522 93. 402 94. 355 92. 742 92. 669 93.255 93.079 .666 3RER 91.642 92.962 93.182 93.109 93.622 93.402 92.595 93.255 93.402 92.155 92.962 92.595 94.062 93.255 92.009 93. 182 92.669 94.501 93.548 92.522 93.402 94.208 92.669 92.742 93.109 93.070 .661 SR 82.771 83.871 82.258 82. 918 83. 871 84.091 83.211 84.897 84.457 84.238 84.824 82.331 85.191 83.724 83.138 83. 798 82.478 84. 897 84. 164 83. 798 82.258 85. 997 82.918 84.091 82. 771 83.718 1.002 MEB8 90.616 Table 18. ER4 85. 924 82. 771 86. 217 84.091 85.191 84.824 84. 457 86. 730 85. 117 84. 531 86. 290 85. 411 85. 557 84.311 83. 358 85. 191 84. 531 85. 630 85. 264 85.337 83.724 84.971 86.510 85.484 84.164 mean 85.023 ad .972 2RER 85.117 83.578 86.217 84.018 85.630 84.824 84.604 86.730 84.897 86.437 85.924 85.117 85.924 84.677 82.478 86.290 84.677 85.850 85.850 84.824 83.431 85.191 86.510 85.777 84.164 85.149 1.063 82 3RER 84.897 83. 798 86. 144 83. 944 85. 630 85.044 84.531 86.730 84.824 86.657 85.777 84.824 86.070 85.117 82. 405 86. 364 84.751 85.850 85.704 84. 677 83. 504 85.337 86.584 85.850 84.311 85.173 1.067 SR 84. 311 81. 965 84. 897 84. 018 85. 044 83. 431 85.557 85.484 84.164 85.191 84.897 84.018 85. 557 84. 677 82.551 83. 724 84. 897 83.724 83.944 82.918 84.384 85. 557 84. 311 83.724 84.302 .942 MEB4 83.724 81.158 83.724 82.625 83.284 82.185 83.358 84.604 82. 625 83. 578 84.384 83. 798 84.238 83.358 81.745 83.211 82.478 83. 651 83.065 81. 745 82.038 83. 431 84.164 83. 
In Figures 3-15 the images are arranged as shown below.

      TRUE IMAGE      ER4
      SR              1RER
      MEB4            2RER
      MEB8            3RER

Figure 3.  GREAT LAKES: The best classification
Figure 4.  GREAT LAKES: The worst classification
Figure 5.  CHECKERBOARD: The best classification
Figure 6.  CHECKERBOARD: The worst classification
Figure 7.  ROAD SIGN: The best classification
Figure 8.  ROAD SIGN: The worst classification
Figure 9.  MRF (.90,.50,.50,.10): The best classification
Figure 10. MRF (.90,.50,.50,.10): The worst classification
Figure 11. MRF (.01,.10,.01,.90): The best classification
Figure 12. MRF (.01,.10,.01,.90): The worst classification
Figure 13. MRF (.80,.40,.50,.60): The best classification
Figure 14. MRF (.80,.40,.50,.60): The worst classification
Figure 15. GREAT LAKES classification by SR and ER4, with each panel annotated by its PCC, PCC0, PCC1 and ACA
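The panels of Figure 15 carry an overall percentage of correctly classified sites (PCC) together with PCC0, PCC1 and ACA figures. Reading PCC0 and PCC1 as the percentage correct among sites whose true class is 0 or 1, and ACA as the average of those two class accuracies, these summaries can be computed as in the short sketch below; the function and argument names and the 0/1 coding of the images are illustrative assumptions, not notation from the text.

    import numpy as np

    def pcc_summary(true_img, est_img):
        """Accuracy summaries for a two-class (0/1) classified image.

        Returns the overall percent correctly classified (PCC), the percent
        correct within each true class (PCC0, PCC1), and their average (ACA).
        """
        true_img = np.asarray(true_img)
        est_img = np.asarray(est_img)
        correct = (true_img == est_img)
        pcc = 100.0 * correct.mean()
        pcc0 = 100.0 * correct[true_img == 0].mean()   # accuracy on true-class-0 sites
        pcc1 = 100.0 * correct[true_img == 1].mean()   # accuracy on true-class-1 sites
        aca = 0.5 * (pcc0 + pcc1)
        return {"PCC": pcc, "PCC0": pcc0, "PCC1": pcc1, "ACA": aca}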
APPENDIX

DYNAMIC PROGRAMMING RECURSION

Suppose

(A.1)    A_n(\theta^n) = \sum_{i=1}^{n} a_i \theta_i + \sum_{i=2}^{n} b_i \theta_{i-1} \theta_i

where \theta^n = (\theta_1, \theta_2, \ldots, \theta_n), \theta_i \in \{0,1\}. Let A_1^0 = 0, A_1^1 = a_1. For 1 \le i \le n-1, let

    A_{i+1}^0 = \max(A_i^0, A_i^1)
    A_{i+1}^1 = \max(A_i^0 + a_{i+1},\; A_i^1 + a_{i+1} + b_{i+1}).

Then

    \max_{\theta^n} A_n(\theta^n) = \max(A_n^0, A_n^1).

Proof: The underlying similarity of all dynamic programming processes is the creation of a set of functional equations of a particular type, called recurrence relations. (A.1) can be written as

(A.2)    A_n(\theta^n) = A_{n-1}(\theta^{n-1}) + a_n \theta_n + b_n \theta_{n-1} \theta_n.

From (A.2),

    \max_{\theta^n} A_n(\theta^n)
        = \max\Big( \max_{\theta^{n-1}} A_{n-1}(\theta^{n-1}),\;
                    \max_{\theta^{n-1}} \big[ A_{n-1}(\theta^{n-1}) + a_n + b_n \theta_{n-1} \big] \Big)
        = \max(A_n^0, A_n^1)

where

    A_n^0 = \max_{\theta^{n-1}} A_{n-1}(\theta^{n-1}),
    A_n^1 = \max_{\theta^{n-1}} \big[ A_{n-1}(\theta^{n-1}) + a_n + b_n \theta_{n-1} \big].

Using (A.2) again, we get

    A_n^0 = \max\Big( \max_{\theta^{n-2}} A_{n-2}(\theta^{n-2}),\;
                      \max_{\theta^{n-2}} \big[ A_{n-2}(\theta^{n-2}) + a_{n-1} + b_{n-1} \theta_{n-2} \big] \Big)
          = \max(A_{n-1}^0, A_{n-1}^1)

and

    A_n^1 = \max\Big( \max_{\theta^{n-2}} \big[ A_{n-2}(\theta^{n-2}) + a_n \big],\;
                      \max_{\theta^{n-2}} \big[ A_{n-2}(\theta^{n-2}) + a_{n-1} + b_{n-1} \theta_{n-2} + a_n + b_n \big] \Big)
          = \max(A_{n-1}^0 + a_n,\; A_{n-1}^1 + a_n + b_n).

Proceeding as above, we obtain for 1 \le i \le n-1

    A_{i+1}^0 = \max(A_i^0, A_i^1),
    A_{i+1}^1 = \max(A_i^0 + a_{i+1},\; A_i^1 + a_{i+1} + b_{i+1}).

Letting A_1^0 = 0 and A_1^1 = a_1, we get \max_{\theta^n} A_n(\theta^n) = \max(A_n^0, A_n^1).

Note that A_i^0 = \max_{\{\theta^i :\, \theta_i = 0\}} A_i(\theta^i) and A_i^1 = \max_{\{\theta^i :\, \theta_i = 1\}} A_i(\theta^i).
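The recursion replaces a search over all 2^n configurations theta^n with a single forward pass of length n, and a backward pass recovers a maximizing configuration. The sketch below is one way to render it in code; the function name, the 0-based indexing, and the backtracking step are additions for illustration and are not part of the appendix.

    def maximize_A(a, b):
        """Maximize A_n(theta) = sum_i a[i]*theta[i] + sum_{i>=1} b[i]*theta[i-1]*theta[i]
        over theta in {0,1}^n, using the dynamic programming recursion above.

        `a` and `b` have length n, indices shifted to 0-based, so b[0] is unused
        (the text's b_i runs from i = 2 to n).  Returns (max value, a maximizing theta).
        """
        n = len(a)
        # A0[i], A1[i]: best value of A_{i+1} over configurations with position i set to 0 or 1
        A0, A1 = [0.0] * n, [0.0] * n
        A1[0] = a[0]
        choice0, choice1 = [None] * n, [None] * n   # which previous state achieved each max
        for i in range(n - 1):
            # next position set to 0: the previous position may be anything
            A0[i + 1] = max(A0[i], A1[i])
            choice0[i + 1] = 0 if A0[i] >= A1[i] else 1
            # next position set to 1: add a_{i+2}, plus b_{i+2} if the previous position was 1
            cand0, cand1 = A0[i] + a[i + 1], A1[i] + a[i + 1] + b[i + 1]
            A1[i + 1] = max(cand0, cand1)
            choice1[i + 1] = 0 if cand0 >= cand1 else 1
        best = max(A0[n - 1], A1[n - 1])
        # Backtrack to recover one maximizing configuration.
        theta = [0] * n
        theta[n - 1] = 0 if A0[n - 1] >= A1[n - 1] else 1
        for i in range(n - 1, 0, -1):
            theta[i - 1] = choice0[i] if theta[i] == 0 else choice1[i]
        return best, theta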
BIBLIOGRAPHY

1. Ballard, R. J. (1974). Extended rules for the sequence compound decision problem with m x n component. Ph.D. dissertation, Department of Statistics and Probability, Michigan State University, East Lansing.

2. Ballard, R. J. and Gilliland, D. (1978). On the risk performance of extended sequence compound rules for classification between N(-1,1) and N(1,1). J. Statist. Comput. Simul., 6, 265-280.

3. Ballard, R. J., Gilliland, D. and Hannan, J. (1975). O(N^(-1/2)) convergence to k-extended Bayes risk in the sequence compound decision problem with m x n components. RM-333, Department of Statistics and Probability, Michigan State University, East Lansing.

4. Bertsekas, D. P. (1976). Dynamic Programming and Stochastic Control, Academic Press, New York.

5. Besag, J. E. (1974). Spatial interaction and the statistical analysis of lattice systems. J. Royal Statist. Soc., Series B, 36, 192-236.

6. Besag, J. E. (1986). On the statistical analysis of dirty pictures. J. Royal Statist. Soc., Series B, 48, 259-302.

7. Cooper, L. and Cooper, M. V. (1981). Introduction to Dynamic Programming, Pergamon Press, New York.

8. Cox, D. R. (1960). Serial sampling acceptance schemes derived from Bayes' theorem. Technometrics, 2, 353-360.

9. Derin, H., Elliott, H., Cristi, R. and Geman, D. (1984). Bayes smoothing algorithms for segmentation of binary images modeled by Markov random fields. IEEE Trans. Pattern Anal. Machine Intell., PAMI-6, 707-720.

10. Devore, J. L. (1975). Linear near neighbor classifiers for correlated categories. Commun. Statist., 4, 891-905.

11. Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Machine Intell., PAMI-6, 721-741.

12. Gilliland, D. and Hannan, J. (1969). On an extended compound decision problem. Ann. Math. Statist., 40, 1536-1541.

13. Gilliland, D., Hannan, J. and Huang, J. S. (1976). Asymptotic solutions to the two state component compound decision problem, Bayes versus diffuse priors on proportions. Ann. Statist., 4, 1101-1112.

14. Hannan, J. (1957). Approximation to Bayes risk in repeated play. Contributions to the Theory of Games, 3, 97-139, Ann. Math. Studies, No. 39, Princeton University Press.

15. Hannan, J. and Robbins, H. (1955). Asymptotic solutions of the compound decision problem for two completely specified distributions. Ann. Math. Statist., 26, 37-51.

16. Hannan, J. and Van Ryzin, J. (1965). Rate of convergence in the compound decision problem for two completely specified distributions. Ann. Math. Statist., 36, 1743-1752.

17. Hansen, F. R. and Elliott, H. (1982). Image segmentation using simple Markov random field models. Computer Graphics and Image Processing, 20, 101-132.

18. Haslett, J. (1985). Maximum likelihood discriminant analysis on the plane using a Markovian model of spatial context. Pattern Recognition, 18, 287-296.

19. Hill, J. R., Hinkley, D. V., Kostal, H. and Morris, C. N. (1983). Spatial estimation from remotely sensed data via empirical Bayes models. Proc. of the NASA Symp. on Math. Pattern Recognition and Image Analysis, 115-136.

20. Kinderman, R. and Snell, J. L. (1980). Markov Random Fields and Their Applications, American Mathematical Society, Providence.

21. Morris, C. N. (1986). Binary image classification. Proc. of the 18th Symp. on the Interface, 71-74.

22. Morris, C. N., Hinkley, D. V. and Johnston, W. (1985). Classification in a spatially correlated environment. Proc. of the Third Annual Symp. on Math. Pattern Recognition and Image Analysis; also, CSS Technical Report No. 20, Center for Statistical Sciences, The University of Texas, Austin.

23. Pickard, D. K. (1977). A curious binary lattice process. J. Appl. Prob., 14, 717-731.

24. Preston, P. F. (1971). An empirical Bayes problem with a Markovian parameter. Biometrika, 58, 535-543.

25. Ripley, B. D. (1986). Statistics, images and pattern recognition. Canad. J. Statist., 14, 83-111.

26. Robbins, H. (1951). Asymptotically subminimax solutions of compound statistical decision problems. Proc. Second Berkeley Symp. Math. Statist. Prob., University of California Press, 131-148.

27. Robbins, H. (1964). The empirical Bayes approach to statistical decision problems. Ann. Math. Statist., 35, 1-20.

28. Scharf, L. L. and Elliott, H. (1981). Aspects of dynamic programming in signal and image processing. IEEE Trans. Automat. Contr., AC-26, 1018-1029.

29. Swain, P. H., Vardeman, S. B. and Tilton, J. C. (1981). Contextual classification of multispectral image data. Pattern Recognition, 13, 429-441.

30. Tilton, J. C., Vardeman, S. B. and Swain, P. H. (1982). Estimation of context for statistical classification of multispectral image data. IEEE Trans. on Geoscience and Remote Sensing, GE-20, 445-452.

31. Van Ryzin, J. (1966). The compound decision problem with m x n finite loss matrix. Ann. Math. Statist., 37, 412-424.

32. Vardeman, S. (1975). k-extended procedures in the finite state sequence compound decision problem. RM-337, Department of Statistics and Probability, Michigan State University, East Lansing.

33. Vardeman, S. (1980). Admissible solutions of k-extended finite set and sequence compound decision problems. J. Multivariate Anal., 10, 426-441.

34. Vardeman, S. (1981). On the small N performance of bootstrap and Bayes extended and unextended set compound rules for classification between N(-1,1) and N(1,1). J. Statist. Comput. Simul., 13, 255-271.