This is to certify that the dissertation entitled "Inference of Array Grammars Under Noise and Distortion" presented by Gautam Biswas has been accepted towards fulfillment of the requirements for the Ph.D. degree in Computer Science. Major professor. Date: December 29, 1982.

INFERENCE OF ARRAY GRAMMARS UNDER NOISE AND DISTORTION

By Gautam Biswas

A DISSERTATION submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY, Department of Computer Science, 1982.

ABSTRACT

INFERENCE OF ARRAY GRAMMARS UNDER NOISE AND DISTORTION

By Gautam Biswas

This thesis research deals with learning the syntactic structure of two dimensional binary patterns that occur in image processing and pattern recognition. The development of a two dimensional inference scheme using a probabilistic array grammar that incorporates noise and distortion models is the primary contribution of this thesis. A two level inference scheme is proposed based on a block structured array grammar. The ideal pattern must be known, which implies knowledge of the nonterminal rewriting rules. The key step is the inference of intermediate grammars from probabilistic samples. Complexity and discrepancy measures are the criteria for selecting a simple and 'best-fitting' grammar. A number of complexity and discrepancy measures for string grammars are reviewed and a new, computationally feasible discrepancy measure that involves both probabilistic and structural factors is defined. The new measure is shown to possess a number of intuitively desirable properties. The discrepancy measure is extended to two dimensions to quantify the suitability of an inferred array language.

The inference scheme uses noise and distortion models to accommodate deviations from the ideal structure among observed patterns. An independent noise model and two intuitive distortion models are analyzed, along with a powerful and comprehensive model based on Markov random fields. The problems of generating patterns and estimating parameters are studied. Robustness tests and parsing experiments demonstrate the versatility of the Markov random field inferential model. Not only is it applicable with a number of noise and distortion models, but it also allows small deviations from the ideal structure, thus providing a flexible inference scheme.

The inference scheme and the computation of the discrepancy measures require large amounts of computer time in the sequential mode of operation. SIMD and pipeline architectures are suggested to speed up computation significantly. Implementation of the proposed scheme with these architectures should open up a number of real time practical applications in industrial vision and robotics.

Dedicated to my father, the late Professor Anil Bhushan Biswas, and my mother, Mrs. Anjali Biswas.

ACKNOWLEDGEMENTS

I am indebted to my thesis advisor, Professor Richard C. Dubes, for his help, guidance and encouragement throughout the course of this thesis work. He has served as an excellent role model and has contributed greatly to the development of my research abilities.
Special thanks go to Professor Raoul LePage, for his patient efforts and many helpful suggestions that contributed greatly to the development of the Markov random field model, and to Professor Anil K. Jain, for his critical review of this thesis work. I thank Professors Lewis Greenberg, Jacob Plotkin and Indranand Sinha for agreeing to serve on my Ph.D. guidance committee. Acknowledgements are also due to Dr. Erdal Panayirci and past and present graduate students of the Pattern Recognition and Image Processing Laboratory - Dr. George Cross, Dr. Steve Smith, Dr. James Coggins, Neal Wyse, Weichung Lin, Ardeshir Goshtashby, Rick Hoffman and Xiabo Li - for their help and suggestions during the course of this work. I must not forget to mention Dr. and Mrs. Subinoy Chakravarty, who have made my stay in E. Lansing a very pleasant one, as well as my wife Sujata, for her (long distance) encouragement through those hectic final months of thesis writing. The financial support provided by NSF grant ECS-8007106 and DER is also gratefully acknowledged.

TABLE OF CONTENTS

LIST OF TABLES ................................................ viii
LIST OF FIGURES .................................................. ix
LIST OF SYMBOLS ................................................... x
CHAPTER 1. INTRODUCTION ........................................... 1
  1.1 Grammatical Inference ....................................... 5
  1.2 Two Dimensional Picture Representation ...................... 8
    1.2.1 Picture Description Languages ........................... 9
    1.2.2 Two Dimensional Grammars ............................... 10
  1.3 Organization and Contributions of this Thesis .............. 12
CHAPTER 2. ARRAY GRAMMARS ........................................ 15
  2.1 Rosenfeld Array Grammars ................................... 15
  2.2 Siromoney Array Grammars ................................... 17
    2.2.1 Formal Definition of Siromoney Array Grammars .......... 19
    2.2.2 Some Useful Properties ................................. 24
    2.2.3 Probabilistic Array Grammars ........................... 26
  2.3 Array Automata ............................................. 27
  2.4 Definition of Type and Block Number ........................ 28
  2.5 Summary .................................................... 29
CHAPTER 3. MODEL DESCRIPTION AND INFERENCE ....................... 31
  3.1 Pattern Description and Learning ........................... 31
  3.2 The Inference Scheme ....................................... 33
  3.3 Summary .................................................... 38
CHAPTER 4. COMPLEXITY AND DISCREPANCY MEASURES ................... 39
  4.1 Complexity Measures ........................................ 40
    4.1.1 Size Complexity Measures ............................... 41
    4.1.2 Information Theoretic Complexity Measures .............. 43
    4.1.3 Choice of Complexity Measure ........................... 44
  4.2 Discrepancy Measures ....................................... 45
    4.2.1 Probabilistic Discrepancy Measures ..................... 47
    4.2.2 Structural Discrepancy Measures ........................ 48
    4.2.3 Choice of Discrepancy Measure .......................... 49
  4.3 New Discrepancy Measure .................................... 50
    4.3.1 Definition ............................................. 50
    4.3.2 Particular Proximity Measures .......................... 53
    4.3.3 Properties of Structural Discrepancy Measure ........... 56
    4.3.4 Computation of the Discrepancy Measure ................. 58
  4.4 Discrepancy Measure for Array Languages .................... 61
  4.5 Summary .................................................... 63
CHAPTER 5.
MODELS FOR NOISE AND DISTORTION ....................... 65
  5.1 The Independence Noise Model ............................... 66
  5.2 Intuitive Distortion Models ................................ 70
    5.2.1 Independence Distortion Model .......................... 72
    5.2.2 Neighborhood Distortion Models ......................... 74
    5.2.3 Applications and Limitations ........................... 80
  5.3 Markov Random Field Distortion Model ....................... 82
    5.3.1 Background ............................................. 82
    5.3.2 The Distortion Model ................................... 85
    5.3.3 Generation of Distorted Patterns ....................... 87
    5.3.4 Estimation of Parameters ............................... 92
  5.4 Summary .................................................... 96
CHAPTER 6. EXPERIMENTAL RESULTS AND COMPUTATION .................. 98
  6.1 The Inference Scheme ....................................... 99
    6.1.1 Implementing the Inference Scheme ..................... 101
    6.1.2 Robustness Tests ...................................... 104
  6.2 Parsing Experiments ....................................... 108
  6.3 Recognition of Rotated Triangles .......................... 110
  6.4 Parallel Implementation Techniques ........................ 114
  6.5 Summary ................................................... 118
CHAPTER 7. SUMMARY, CONCLUSIONS AND FUTURE RESEARCH ............. 119
  7.1 Summary and Conclusions ................................... 119
  7.2 Future Research ........................................... 122
APPENDIX A. NOTATIONS AND DEFINITIONS - ARRAY GRAMMARS .......... 125
APPENDIX B. ARRAY AUTOMATA ...................................... 128
APPENDIX C. MATRIX GRAMMARS ..................................... 131
LIST OF REFERENCES .............................................. 133

LIST OF TABLES

Parameter estimates by the two schemes for the Markov random field model ...... 95
Parameter values used for generating training sets ........................... 100
Estimated Parameter values .................................................... 102
Run Times and Core Requirements ............................................... 106
Robustness Test - using the Two dimensional Discrepancy Measure .............. 107
Acceptance Probabilities - Patterns generated from Array Grammar ............. 109
Acceptance Probabilities - Patterns generated from Physical Process .......... 110
Acceptance Probabilities - Rotated Triangles ................................. 112

LIST OF FIGURES

1. Shearing Effect ............................................... 17
2. Array Grammar - Isosceles Triangle ............................ 22
3. Array Grammar for Numeral '7' ................................. 23
4. Pattern Type and Block Number ................................. 30
5. Flowchart for Inference Scheme ................................ 34
6. Coding Intermediate Languages of '7' into String form ........ 36
7. Triangles generated from Independent Noise Process ........... 68
8. Noisy 7's ..................................................... 69
9. Interior and Boundary Points .................................. 71
10. Triangles generated by Independent Distortion Process ....... 74
11(a). Squared distance to point x .............................. 76
11(b). Orders of Neighbors relative to point x .................. 77
12. Triangles Generated by Neighborhood Process ................. 81
13. Algorithm for generating Templates from MRF distribution with given parameters ... 89
14. Triangles generated by MRF generative model (1) ............. 90
15.
Triangles generated by MRF generative model (2) ... ..... 91 16. Coding Blocks into Strings - 7's ...................... 103 17. Complexity and Discrepancy Measure vs. Tail length .... 105 18. Rotated Triangle ...................................... Ill 19. Noisy Rotated Triangle ................................ 113 Symbol o-em LIST OF SYMBOLS Usage Empty matrix. Row Catenation Column Row or First order parameter for Catenation Column Catenation interior points - Markov random field model First order parameter for boundary points - Markov random field model. Second among Second among Second order parameter for pairwise interaction interior points - Markov random field model. order parameter for pairwise interaction boundary points - Markov random field model. order parameter for pairwise interaction among interior-boundary points - Markov random field model. Weight factor for window function. Parameter vector Markov random field model; A - I,I b,b’ ab,i Specific subset of language L(G). Siromoney Array Grammar Number Number Number of pixels in a block. of interior pixels in block. of boundary pixels in block. General second parameter - Markov random field model. drect dtri Da("') D(.,.) D (...) tot Boundary template Specific subset of language L(G). Number of pixels flipped in a block. Number of internal pixels flipped in a block. Number of boundary pixels flipped in a block. Count vector; C - (si’sb’sii’sbb’sbi> Complexity of string grammar G. Proximity value between string x and language L. Window proximity measure (rectangle or triangle). Minimum proximity measure. Rectangular proximity measure. Triangular proximity measure. Absolute discrepancy measure. Structural discrepancy measure. Total discrepancy measure. Minimum structural discrepancy measure. Rectangular structural discrepancy measure. Triangular structural discrepancy measure. Distance. exponential. Expected value. Probability function. Interacting Potential. Formal grammar. Class of formal grammars. xi IML Intermediate Matrix Language. Kc Number of strings in (cardinality of) set Lc' £(x) length of string x. LA’LB Intermediate languages of array grammar. Lc(x) Window on set L centered at x. LS Specific subset of array language. L(G) Language of grammar G. L(.) Likelihood function. min Minimum. MG Matrix Grammar. MRF Markov random field. n(.) Number of. N Pattern type. ka Pixels in kth order neighborhood of a particular pIxel. p Probability of change pi Probability of change for interior points. pb Probability of change for boundary points. p(x),q(x) Probability of occurrence of string x. p-AG Probabilistic array grammar. P Set of production rules. Pl Nonterminal production rules of array grammar. P2 Intermediate production rules of array grammar. Q Potential function. R Sample covariance matrix. Set of terminal symbols or picture primitives. xii R Finite set of probabilities. RI Probabilities assigned to nonterminal rules. R2 Probabilities assigned to intermediate rules. si Count of interior points that are 1's. sb Count of boundary points that are 1's. sii Count of interior-interior pairs that are both 1's. sbb Count of boundary-boundary pairs that are both l's. sbi Count of interior-boundary pairs that are both l's. S Sample (training) set. S+ POsitive sample set. S- Negative sample set. 3 Start symbol of grammar. T Set of all possible realizations on a binary lattice. T Change template. u(.) Discrete probability distribution - Markov random field. U[.,.] Discrete uniform distribution. V Set of nonterminal symbols. 
VI Set of nonterminal symbols of array grammar. V2 Set of intermediate symbols of array grammar. V1,V2,--,V8 Coded vectors to convert intermediate language to string form. W(.,.) Neighborhood function. W1,W2,--,W8 Coded vectors to convert intermediate language to string form. WLD Weighted Levenshtein Distance for strings. WLD2 Weighted Levenshtein Distance for arrays. xiii x Pixel value (0 or 1). 5 Configuration of pixels on a lattice. z(i,j) Indicator function. Z Normalizing constant for probability distribution - Markov random field. CHAPTER I INTRODUCTION Pattern Recognition techniques are major contributors to the field of machine intelligence and perception. They involve the description and analysis of data, including their categorization into identifiable classes by the extraction of significant features or attributes that are relevant to the problem being studied. The mathematical techniques available to solve pattern recognition problems can be broadly classified into the statistical, or decision-theoretic approach [31,A3], and the structural, or syntactic approach [A1,A6,83]. The first involves measuring characteristic features on the input data and then classifying the resulting patterns by partitioning the feature space according to categories or pattern classes. An example is the use of Fourier descriptors [85,120] to describe the shape of objects Afrom their outer boundaries. The structural approach involves the des- cription of objects, or patterns, in terms of their parts (subpatt- erns), and the connectivity relations among these parts. An example is the use of primitive symbols (simple curves) and a formal grammar to describe different types of chromosomes [62,63]. Decision theoretic methods are based on the use of discriminant functions and are well suited for patterns that can be meaningfully represented in vector form. However, there are applications, such as scene analysis and image segmentation, where the relationships among the various component parts (subunits) are nonnumerical in nature, but can be expressed in terms of discrete mathematical models such as formal grammars [18]. In such situations, the structural approach seems to be more appropriate for understanding and analyzing complex patterns than the statistical approach. Structural and syntactic patt- ern recognition techniques have been applied to a number of application areas including chromosome identification [62,63] character recogni- tion [3], shape analysis [8A], classification of speech [27,57], data compression [5], biomedical waveform analysis [2.12.107], fingerprint identification [75,90], image understanding [108], texture analysis [56,68], defect location [77,87], component recognition and placement [111,112], and testing of integrated circuits [82,121]. This thesis is primarily concerned with the description and analysis of two dimensional patterns using syntactic techniques. This involves two main steps - (i) the selection and extraction of primitives, and (ii) the inference or learning of connectivity relations among primitives in terms of formal grammars. In particular, we infer two dimensional formal grammar models with applications to the generation, description and recognition of two dimensional objects in mind. The patterns of interest here are binary images or pictures, so we choose pixel values 0 and 1 as our two basic picture primitives. Our main emphasis will be on the learning of a formal grammar to describe a class of patterns. 
The study of the primitive selection problem is not covered here. Two dimensional grammar models present many advantages in representing two dimensional patterns [98]. The relationship among the components (subpatterns) of a pattern can be directly represented in terms of two dimensional ”concatenation” relations. A one dimensional scheme may require a number of preprocessing operations to reduce the two dimensional pattern to a one dimensional form. Traditional one dimensional schemes suffer inherent difficulties in modelling objects with disconnected parts or objects with holes. Two dimensional grammars are easily amenable to parallel processing [72,91,101]. Parallel forms simplify mathematical models for pattern represent- ation [72] and significantly reduce parsing time, which can be exploited in developing very efficient classification schemes. After proposing a formal mathematical model for describing two dimensional patterns based on array grammars, we will concentrate on the problems of learning or inferring a formal grammar for a class of patterns or pictures from a finite set of samples. Inference schemes generate a number of 'candidate' grammars from a set of training patterns. Complexity and discrepancy measures are criteria for select- ing the simplest and 'best-fitting' grammar and are therefore studied in some detail. A strong emphasis is placed on the learning of the structures of noisy and distorted patterns. Patterns observed in the sample set are different from ideal structures because of inherent limitations in the recording and digitization processes. Changes in the ideal represent- ations of the pattern that are caused by phenomena independent of the underlying ideal structures are termed 'noise' processes, whereas those that depend on the underlying structure are called 'distortion' processes. A number of noise and distortion models are discussed, in- cluding one based on Markov random fields. We compare the robustness of the inference procedure under various noise models. The grammatical scheme proposed in this thesis has several advantages. Automated industrial inspection and robot applications require well designed and very precise lighting systems to avoid problems like low contrast, shadows and extraneous detail, which increase the complexity of vision algorithms [A7]. The use of a noise or distortion model in conjunction with an inference scheme when the lighting system is inadequate would incorporate some of these problems in the model description and keep the recognition scheme computationally feasible. Also, integrating the noise or distortion process with the learning scheme allows setting up discrimination or classification procedures which distinguish between patterns that have the same ideal structures, but have been subjected to different noise or distortion processes. Array grammars can generate patterns of different sizes, but with the same proportion of length to width. This property can be used to advantage when an imaging device for robot vision does not maintain a fixed distance from the objects of interest. Conventional schemes, such as template matching, require an extra dis- tance dependent scaling operation before recognition. Moreover, the integration of the noise and distortion processes into the array gram- mar model allows small deviations in the orientation of the pattern thus simplifying the registration problem. 
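To make the distinction between noise and distortion concrete, the following minimal sketch (illustrative Python, not the models developed in Chapter 5) contrasts the two kinds of processes on a binary array: an independent noise process flips every pixel with the same probability, while a simple structure-dependent distortion flips a pixel with a probability that depends on whether the pixel is an interior or a boundary point of the ideal pattern. The interior/boundary test and the parameter names used here are illustrative assumptions only.

import numpy as np

def independent_noise(ideal, p, rng):
    """Noise: every pixel is flipped with probability p, independent of the
    underlying structure of the ideal pattern."""
    flips = rng.random(ideal.shape) < p
    return np.where(flips, 1 - ideal, ideal)

def is_boundary(ideal, i, j):
    """A pixel is treated as a boundary point if its 4-neighbourhood in the
    ideal pattern (clipped at the array edge) contains both 0's and 1's."""
    m, n = ideal.shape
    values = {int(ideal[i, j])}
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = i + di, j + dj
        if 0 <= r < m and 0 <= c < n:
            values.add(int(ideal[r, c]))
    return len(values) > 1

def simple_distortion(ideal, p_interior, p_boundary, rng):
    """Distortion: the flip probability depends on the ideal structure,
    here only through an interior/boundary classification of each pixel."""
    out = ideal.copy()
    m, n = ideal.shape
    for i in range(m):
        for j in range(n):
            p = p_boundary if is_boundary(ideal, i, j) else p_interior
            if rng.random() < p:
                out[i, j] = 1 - out[i, j]
    return out

rng = np.random.default_rng(0)
ideal = np.array([[0, 0, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [1, 1, 1, 1, 1]])          # a small triangular pattern
noisy = independent_noise(ideal, 0.05, rng)
distorted = simple_distortion(ideal, 0.02, 0.15, rng)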
Such an integrated model may also help differentiate among two sets of parts, such as two types of bolts, that have similar shapes but different proportions of length to width.

The remainder of this chapter briefly presents the grammatical inference problem and reviews techniques that have been used for two dimensional pattern representation. Lastly, the organization of this thesis is described and its main contributions are summarized.

1.1 Grammatical Inference

Grammatical inference algorithms obtain a formal grammar [28,52,95] from a finite set of training patterns, expressed as strings or arrays of primitives. The finite set of training patterns is generally referred to as the sample set, and consists of two disjoint subsets - the positive sample set, S+, which contains patterns from the language generated by the grammar to be inferred, and the negative sample set, S-, of patterns which do not belong to that language. The grammars {Gi} created by an inference algorithm should necessarily satisfy the conditions that S+ ⊆ L(Gi) and that S- is contained in the complement of L(Gi), where L(Gi) is the language generated by grammar Gi. Our inference schemes utilize only a positive sample set, S+.

The sample set S+ can be probabilistic. Each pattern in the training set has a probability, and patterns with higher probability are observed more frequently [42]. The corresponding inferred probabilistic grammar associates a probability with each of its production rules to reflect the probability of occurrence.

Grammatical inference algorithms in the literature apply only to strings of symbols and are therefore one dimensional in nature. Inference of finite-state string grammars has been attacked analytically [9,42]. Many properties of context-free grammars are undecidable, so inference algorithms for such grammars require heuristic techniques, which are applicable to restricted subsets of that class of grammars. An example is the Crespi-Reghizzi algorithm for operator precedence grammars [24]. Inference algorithms for string grammars have been extended to other constructs, such as trees and webs, with the idea that they naturally represent higher dimensional patterns [14,46,65].

Enumerative and constructive methods of inference have been proposed in the literature. Enumerative techniques usually involve an exhaustive search through a large, specified class of grammars, such as finite-state grammars, for 'candidate' grammars that best describe the samples [53,69,81,118]. Tree searching and state space methods, in conjunction with a cost function, are used to direct the search. Constructive techniques derive grammars from similarities among the syntactic structures in the sample set. Restriction to a particular class of grammars enables one to define specific rules to convert observed syntactic structures into production rules for the grammar. Rules can also be defined for adding, merging, and deleting productions based on some criterion, to obtain a more acceptable grammar. An example of a constructive inference technique is Cook's [22] 'hill climbing' approach to the inference of context free grammars.

Both methods require a criterion, or cost function, to determine the 'best' grammar from a set of candidate grammars. The two properties that are used to define a good grammar are its structural complexity, and its discrepancy or fit to the given sample set. It is trivial to infer a grammar whose language is identical to the given sample set.
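Whether a candidate grammar is consistent with the two sample sets can be checked mechanically once membership in its language can be decided. The sketch below is a hypothetical illustration (the sample strings and the membership predicates are made up, and deciding membership is itself a parsing problem); it implements only the acceptance conditions stated above.

from typing import Callable, Iterable

def is_consistent(accepts: Callable[[str], bool],
                  positives: Iterable[str],
                  negatives: Iterable[str]) -> bool:
    # S+ must lie inside L(G); S- must lie inside the complement of L(G).
    return all(accepts(x) for x in positives) and not any(accepts(x) for x in negatives)

S_plus = ["ab", "aab", "aaab"]     # positive samples (illustrative)
S_minus = ["ba", "bb"]             # negative samples (illustrative)

# Two hypothetical candidate languages, given only through membership tests.
trivial = lambda x: x in set(S_plus)                               # L(G) is exactly S+
a_n_b = lambda x: len(x) >= 2 and x.endswith("b") and set(x[:-1]) == {"a"}   # a^n b, n >= 1

for name, accepts in [("exactly S+", trivial), ("a^n b, n >= 1", a_n_b)]:
    print(name, is_consistent(accepts, S_plus, S_minus))           # both print True

Both candidates pass the test, so the consistency conditions alone cannot select a grammar.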
To be useful, an inference algorithm must infer a grammar which is a natural generalization of the training set, yet has minimal complexity. Syntactic Pattern Recognition schemes are often used to discriminate among classes whose sample strings contain random errors, or when languages are not disjoint. The quality of recognition can be improved by inferring probabilistic or stochastic grammars (p-gram- mars) [A2,llO], where each production rule of the grammar has an asso- ciated probability. These probabilities indicate that some patterns occur more frequently than others. or certain substructures within the pattern are more favoured than other possible ones. The general inference problem for string grammars has not been solved completely. and there are still a number of open questions, both in formal language theory and in the capability of general grammatical inference schemes. However. several techniques exist for inferring limited classes of grammars and we shall be using some of them to develop our two dimensional inference scheme. 1.2 Two Dimensional Picture Representation Regarding a class of two dimensional pictures as a two dimensional language has led to the use of formal linguistics for picture descrip- tion and generation. Pictures are seen as a concatenation of subpic- tures. which are in turn built up of still smaller parts. called the picture primitives. Some models express the relations among picture primitives, either by string concatenation or by algebraic operators. whereas others generalize one dimensional string grammars to two dimen- sional forms. l.2.l Picture Description Languages One of the first attempts to represent two dimensional objects in terms of primitive substructures was the Freeman [38] chain code. Among its drawbacks are extremely long and unwieldy strings for object description, susceptibility to noise, and difficulties encountered with the representation of disconnected curves. Narasimhan [78.79] used a syntactic description model for classes of pictures composed of line like elements, but his method does not seem very easily generalizable to complex pattern descriptions and inference schemes. Shaw [97] extended Freeman's simple left-right string concatenation to four different binary relations among primitives in his Picture Description Language (PDL). resulting in two dimensional properties being expressible in string form. This technique looks quite elegant, but certain simple geometric operations like reflection and rotation are difficult to describe and pictures are often not easily expressed in terms of primitives of the form required by PDL. These methods reduce two dimensional objects to one dimensional string representations which makes it quite clumsy to describe disconnected parts, or configurations with embedded objects. or even l0 objects with holes. These limitations are overcome by structures such as trees [13,A0,A5] and graphs [20.86.92]. While trees can represent high dimensional patterns more naturally than strings. and can provide simpler parsing schemes. it can be shown that there is a direct corres- pondence between tree representations and context free grammars [A6]. Therefore, they have the same difficulties as strings in complex pattern descriptions. and do not seem to present a very natural scheme for general two dimensional structures. Graph or web representations are the most general schemes available for the description of two dimensional patterns. 
but require extremely complicated parsing tech- niques which are presently not well developed [15]. The development of automatic learning and decision making procedures is extremely difficult for general two dimensional schemes. There are. however, proper subsets of web grammars called matrix and array grammars, that have well developed formal structures, are relatively easy to parse. and also seem to be well suited to picture generation and description. This thesis will concentrate on these structures. 1.2.2 Two Dimensional Grammars Kirsh [58] gave the first example of a two dimensional array grammar for generating a class of labelled A5-degree right triangles. His technique is not easily generalizable to other classes of pictures. Yodokawa et. al. [119] attempted to construct an appropriate mathemati- cal model for two dimensional grammars. These authors also demon- Il strated the existence of two dimensional Turing-type acceptor automata. Siromoney et al. [99] proposed a model of a two dimensional matrix grammar (GI’GZ)’ which is essentially a two level string grammar. The horizontal string grammar, G], generates a string of length n from k possible (intermediate) symbols. Each symbol in this string serves as a start symbol for a vertical right-linear (regular) string grammar. Grammar G2 is the union of disjoint grammars which generate string columns of size m. consisting of terminal symbols which are picture primitives. This two step process generates an mxn picture. (Details are presented in Appendix C). Wang [11A] proposed a variation on Siromoney matrix grammars as a three tuple (G],GZ,M), where G is a 1 horizontal grammar defined as before. but G2 is the union of k disjoint grammars which need not be right linear; M - {m],m2, ------ ,mp}. where each mi is a kxl matrix consisting of k production rules, one from each one of the grammars that make up G2. In the matrices generated by these grammar forms, it is not possible to maintain a fixed ratio of length to breadth, since the column expressions are independent of the row expressions. Rosenfeld and Milgram [7A] defined another form of array grammar and acceptor automaton based on a Turing machine with a two dimensional tape. This scheme starts with an array of blank symbols and fills it with picture primitives using production rules which map subarrays into subarrays. The main problem with these grammars is the number of 12 restrictions placed on the production rules to obtain patterns in their desired shape, as explained in Chapter 2. Siromoney et al. [100] described a theoretical array grammar model which is more powerful than the matrix grammar models. since it can generate picture models not representable by matrix grammars. Its rewriting rules use a two step block structured approach, and describe a picture model in a clear step by step procedure. These array grammars are also very natural generalizations of one dimensional string grammars and possess several properties of one dimensional grammars. We have based our structural model on the Siromoney array grammars. Its two step block structured approach has been exploited to describe a simple model that incorporates noisy and distorted patterns. A detailed formal description of array grammars is presented in the next chapter. 1.3 Organization and Contributions of the Thesis Chapter 2 contains both a theoretical review of array grammars and a critical discussion of properties required by the learning procedures to be developed. 
This chapter provides a comprehensive review of grammatical inference literature. and the models that have been developed for the structural description of two dimensional patterns. The concept of a probabilistic array grammar is defined for the first time in Chapter 2. 13 Chapter 3 introduces the picture description model and the main inference scheme. Several problem areas. including the role of com- plexity and discrepancy measures, and the need for probabilistic noise and distortion models, are examined. The main contribution of this chapter is the development of a two dimensional learning scheme. and the integration of noise and distortion models into this scheme. Chapter A describes complexity and discrepancy measures for string grammars. The emphasis is on the properties of these measures, and their significance in the grammatical inference framework. An im- portant contribution of this chapter. is the definition of a new struc- tural discrepancy measure that overcomes some of the limitations of the presently available measures. The concept of structural discrepancy for string languages is also extended to define a discrepancy measure for array languages. Chapter 5 describes the theoretical background and the computational schemes used for three noise and distortion models. The computational aspects of the generation of patterns and the estimation of parameters for the proposed schemes are then studied. The main contribution of this chapter is the development of a general distortion model based on Markov random fields. This is the first application and implementation of nonhomogeneous Markov random fields to image processing. lA Chapter 6 presents the results of several experiments applying the inference scheme in conjunction with the noise and distortion models to two different pattern classes. The two dimensional discrepancy measure is employed to assess the robustness of the inference scheme. Another contribution of this chapter is the discussion of computational matters and potential speedup using parallel schemes. Finally, Chapter 7 presents the conclusions of our study and suggests future research. The original contributions of this thesis can be summarized as - 1) Development and study of a two dimensional model description and inference scheme that incorporates the description of noisy and distortedipatterns. 2) The definition and use of a new discrepancy measure for string grammars that incorporates both probabilistic and structural differen- ces. 3) The extension of the discrepancy measure for string gramars to a measure of suitability for array languages. A) The adaptation of nonhomogeneous Markov random fields in defining a distortion process that is very general, and covers a number of other noise and distortion models. CHAPTER II ARRAY GRAMMARS This chapter establishes the mathematical basis for the inference procedures proposed in this thesis. There are two different structural forms for array grammars, the Rosenfeld array grammars and the Siromoney array grammars. We first discuss the Rosenfeld form of array grammars and its limitations. Then we present a formal description of the Siromoney array grammar (AG), review definitions and notation and briefly describe array automata. Examples and illustrations clarify definitions and notation. 2.1 Rosenfeld Array Grammars A Rosenfeld array grammar [7A.9l] contains production rules that replace one subarray of symbols, with at least one nonterminal symbol, by another subarray of symbols. 
which need not be of the same shape and size. These grammars produce arrays which are accepted by Turing machines with two dimensional tapes, where the primitive symbols are never rewritten. The initial array consists of a start symbol on one location of the tape and blank symbols on all others. The terminal array is made up of pattern primitives and blank symbols. Simpler 15 16 forms of these grammars could be defined by requiring the grammar to be monotonic. i.e., make the number of non blank symbols in the derivation sequences monotonically non decreasing. Rosenfeld [93] defined normal forms for these grammars and showed the equivalence of different types of array grammars. He also showed [91] that some of these grammars have parallel forms that are essentially the same as local digital picture processing functions. Individual rows and columns of the host array in Rosenfeld array grammars must be shrunk or stretched by varying amounts to accommodate different rewriting rules. The implication of replacing one subarray by another of a different size and shape extends beyond the immediate vicinity of the subarray and causes relative displacements among rows and columns, and also overall changes in the outer boundary of the representation (Fig. I). This effect. known as the shearing effect, makes it difficult to generalize properties of one dimensional grammars to two dimensions and causes difficulties in object description. Rosenfeld [91] circumvented this problem by introducing isotonic array grammars. which require the left and right sides of the productions to have the same shape. Cook and Wang [23,116,117] defined normal forms for these grammars and established the Chomsky hierarchy. The shearing effect. and the steps taken to remedy it. make Rosenfeld array grammars rather cumbersome. These array grammars are not natural extensions of one dimensional grammars. The Siromoney and 17 8 Production rule: 5 -> S A applied to result # ##II‘ #B# #5# #SA# ### ### FIG. 1: Shearing Effect Wang matrix grammars discussed in Chapter I, though simple extensions of one dimensional grammars. lack the power to generate pictures of different sizes having the same proportion. For example, one cannot generate objects of size mxn such that m and n have a fixed ratio. The fact that the column grammars are all independent makes it difficult to express any relation between columns. as one goes down the rows.‘ The Siromoney array grammars(AG) overcome the above difficulty and are free from the shearing effect. We base our structural model on the Siromoney array grammar form. 2.2 Siromoney Array Grammars Structural models based on array grammars lend themselves very naturally to the generation of two dimensional patterns. Each cell represents a primitive structure of the pattern, and therefore a 18 terminal symbol of the array grammar. For example, if one considers a class of binary digitized pictures. the simplest set of primitives is {0.1}. An mxn rectangular picture array consists of mn samples from this set. As the number of primitives and the sizes of the arrays to be processed increase. one must resort to primitive selection techniques. We shall not be concerned with this problem. and assume we have two primitives. or terminal symbols for our array grammar. The rewriting rules of Siromoney array grammars generalize the notion of string rewriting rules to array rewriting rules. with arrays of terminals replacing strings of terminals. In place of the conven- tional left to right concatenation. 
Row and column catenation are defined in their place, and these establish the two dimensional structure. Derivations are restricted by the conditions of row and column catenation; i.e., two matrices can be 'row catenated' only if they have the same number of columns, and two matrices can be 'column catenated' only if they have the same number of rows. Rewriting rules can be regular (R), context-free (CF), or context-sensitive (CS). Appendix A clarifies the concept of row and column catenation of arrays, and defines some of the notation used in the formal definition of array grammars [60,100]. Probabilistic array grammars are introduced at the end of this section.

2.2.1 Formal Definition of Siromoney Array Grammars

The Siromoney array grammar AG is a four tuple,

    AG = (V, I, P, S)

where V is the union of V1, a finite set of nonterminal symbols, and V2, a finite set of intermediate symbols; I is a finite set of terminal symbols, with V ∩ I = ∅; S ∈ V1 is the start symbol; and P = P1 ∪ P2 ∪ P3, where P1 is a finite set of nonterminal rules, P2 is a finite set of intermediate rules and P3 is a finite set of terminal rules.

P1 can be described as a finite set of ordered pairs (u, v), written u -> v, where u and v are expressions over V1 ∪ V2 built up by row catenation alone or by column catenation alone. P1 is context-sensitive if there exists (u, v) ∈ P1 with u = u1 S1 v1 and v = u1 a v1, where S1 ∈ V1 and u1, v1 and a are expressions over V1 ∪ V2 under the same catenation. P1 is context-free if, for every (u, v) ∈ P1, u ∈ V1 and v is an expression over V1 ∪ V2 under row or column catenation. Lastly, P1 is regular if, for every (u, v) ∈ P1, u ∈ V1 and v is the row or column catenation of two symbols X and Y, with X ∈ V1 and Y ∈ V2, or vice versa.

P2 is a set of ordered pairs (u, v), such that
An important fact to be noted is that derivations would come to a dead end if conditions for row and column catenation were not satisfied. Examples of array grammars are shown in Figs. 2 and 3. This generation procedure requires that the nonterminal rules generate a set of blocks. catenated horizontally and vertically. This we call the block structure description of the picture. Filling in each block with primitive symbols using the intermediate rules P2 creates rectangular arrays of primitive symbols. The set of arrays generated by an array grammar is called the array language. and the 22 NONTERMINAL = { S I: INTERMEDIATES - {A.B}; PRIMITIVES = {x,.} NDNTERMINAL REWRITING RULES: 3 -> ( A e s ) g B. S -> X INTERMEDIATE LANGUAGES: LA a { (.)n | n>=1 } LB - { Ix)n l n>=2 } A REALIZATION or SIZE A: (I) APPLY NONTERMINAL RULES - S -> I A e S ) G B -> ( A e (( A e S ) A B )) ¢ 8 ->(Ae((Ae(IAes)¢9))¢a))¢a ->(A6((Ae((Ae((AeS)¢B))¢B))¢B))¢B (2) FILL INTO BLOCKS WITH INTERMEDIATE LANGUAGES '- ..L‘ I I7 T X X X X X X . X X X X X X X X X fig; _2_: 5.15.1! Grammar - Isosceles Triangle 23 NONTERMINALS = I 5 .TI; INTERMEDIATES = {A.B}: PRIMITIVES = {x,.} NONTERMINAL RULES. P S -> ( A t T ) e B I. T -> ( A i T ) e B T -> X X X X INTERMEDIATE LANGUAGES: n . . . . X X LA 8 X X X n >= 0 LB = . X X n >= 1 X X X X X . (. . .) n A REALIZATION: (1) BLOCK STRUCTURE FROM NONTERMINAL RULES i—L 'Y T T 7 1' —T T T I T T A T A T A I T 3 -> i —+ ——4 -> A 4 +7 -4 -> A s. 4 B B A B B B B (2) APPLY INTERMEDIATE RULES - X X X X X X X X X X X X X X X X X X X X X X X X X X X X C O C O O O O x x C . . . . X X . . . X X . O O O O x x O O O O O I x x O O O O O O O O x x O O O O O O I C x x O O O O O x x O O O O O O C O x x O C O O O x x O O O O X X . x x O O O O O O I O O O 0 FIG. 3: Array Grammar or Numeral '7' 2A individual arrays are called patterns. In image processing applications these arrays represent digitized images. 2.2.2 Some useful properties We now state some useful properties of array grammars which are stated as theorems and proved in Siromoney et al. [100]. I) The relations between various classes of matrix grammar languages (Siromoney) and array languages (Siromoney) is summarized in the following two dimensional scheme. RMLC (R:R)AL c (R:CF)AL c (R:CS)AL C: C: C: i C: CFMLC (CF:R)ALC (CF:CF)ALC (CF:CS)AL C: C3 (3 <3 CSMLC (cs:R)ALc (CS:CF)AL c (CS:CS)AL For example, all regular matrix languages (RML) are (regular,regular) (RiR) array languages and are also context free matrix languages (CFML). This establishes the Chomsky hierarchy and implies that array grammars are more powerful than their corresponding matrix grammars. Since we are interested in developing inference schemes. we require that models use only regular and context-free grammars, both of which have well developed parsing techniques. Subsequent properties will be 25 stated only for regular and context-free grammars. II) Let {Mn | n >= 1} be an infinite sequence of matrices. (1) For n > 1, if Mn is in one of the following forms- Mn = (x 6 Mn_]) i Y. or Y ¢(x 6 Mn_]). or Y 0 (Mn_ 6 X), or (Mn_ 6 X) D Y, 1 1 where X and Y are matrices from IML's Lx and LY and satisfy row and column catenation requirements. then {Mn} can be generated by regular nonterminal rules. (2) For n >= 1. if or x] e (YI ¢ Mn_] ¢ Y2) 9 x2, where X X2, Y1 and Y are chosen from IML's L L and L , 2 x,’ xz' y, Yz subject to row and column catenation restrictions, and X I, l and X2. or Y1 and Y2 are nonempty then {Mn} can be generated by context-free nonterminal rules. 
Property II illustrates the self embedding structure of (CF:XX)AG's, analogous to the self embedding property of string context-free grammars. These two properties indicate the approach one may have to follow to develop a two dimensional inference scheme for the nonterminal rewriting rules.

It can be shown that all the (CF:XX)AL's are closed under row and column catenation. However, none of the (R:XX)AL's are closed under row or column catenation. All the (R:XX)AL's and the (CF:XX)AL's are closed under transpose, quarter- and half-turns, and reflections about both the base and rightmost verticals, indicating that there are relationships that one may use to parse patterns that are in the different configurations mentioned above, using the array grammar for the original pattern structure.

2.2.3 Probabilistic Array Grammars

A probabilistic array grammar is similar to a probabilistic string grammar [11,39]. The probabilistic array grammar (p-AG) is defined as a 5-tuple

    p-AG = (V, I, P, R, S)

where V, I, P and S are defined as in Section 2.2.1 with P = P1 ∪ P2, and R = R1 ∪ R2; R1 is a finite set of probabilities that are assigned in sequence to the nonterminal production rules in P1, so that there is a probability associated with each production rule; similarly, R2 is a finite set of probabilities that are assigned in sequence to the set of intermediate rules. The sum of the probabilities of all productions with the same left hand side must be 1. Assuming that all nonterminal and intermediate grammars are unambiguous, i.e., they have unique leftmost derivations, the probability function for an array language is defined as

    f(x) = ∏_{k ∈ K} r1_k  ·  ∏_{l ∈ L} r2_l ,   if x ∈ L(AG),

where r1_k signifies probabilities of nonterminal production rules and r2_l probabilities of intermediate production rules; K and L are sets that represent the sequence of nonterminal and intermediate productions, respectively, that were used in deriving the element x. The p-AG is consistent if

    ∑_{x ∈ L(AG)} f(x) = 1.

2.3 Array Automata

Krithivasan and Siromoney [60] have defined array acceptors to accept languages generated by two dimensional array grammars as (AA:BB) array automata for (XX:YY)AG's, where AA and BB can be linear bounded, pushdown, or finite state depending on whether XX and YY are context-sensitive, context-free, or regular, respectively. The automaton illustrates a generation scheme for array patterns and a two step parsing scheme that can be used for recognizing array languages. Step one consists of imposing the appropriate block structure on the pattern being tested, and step two checks if the contents of individual blocks are members of the corresponding intermediate matrix language. This approach is also exploited in the inference scheme described in the next chapter.

A formal definition of an array automaton and its implementation is given in Appendix B. We note that the automaton uses the sequential-parallel approach to processing. The first step is a sequential step by step process to subdivide the picture into blocks, and the second step is a parallel procedure where the automaton can work on each block independently and simultaneously.

2.4 Definition of Type and Block Number

In this section, we define the type of an array pattern, and block numbers, which label samples from particular intermediate languages.

Type: The pattern type is defined to be the number of times nonterminal production rules have to be applied to generate the required block structure for the pattern.
Block number: Block numbers are defined for the blocks that make up an intermediate language. All the blocks from an intermediate language 29 either have the same number of rows. or the same number of columns; therefore. they can be arranged in order of increasing number of rows if the number of columns is fixed, or in order of increasing number of columns, if the number of rows is fixed. Integer labels are assigned in sequence to the blocks of an intermediate language in order of their increasing size determined by their variable dimension. These labels are individually defined for each intermediate language of an array grammar. For example, consider the nonterminal rule S->(A¢S)es Applying the nonterminal rule four times we obtain the block structure shown in Fig. A. For the pattern of size A, the four blocks of intermediate languages A and B have been labelled 1. 2, 3 and A. 2.5 Summary In this chapter we have formally defined array grammars. stated some of their useful properties and also described a type of sequential-parallel array automaton for accepting array languages. The examples suggest that these grammars provide a more powerful generative model than matrix grammars in terms of generating patterns of fixed 3O A 3 2 I <- L + + 4. + + +L A S I A + v+ % A B l A + 4 + B 2 B 3 B A + 41 T Fl . A: Pattern Type and Block Number proportions without the inconvenient shearing effects of Rosenfeld array grammars. The block structure approach to array grammars is exploited in defining a model for pattern description. CHAPTER III MODEL DESCRIPTION AND INFERENCE This chapter develops the concept of two dimensional pattern description and learning based on the array grammar model and describes a scheme for inferring intermediate matrix grammars. Some interesting research questions that arise in the inference scheme are discussed. Techniques for computing block probabilities appears with the noise and distortion models in Chapter 5. 3.1 Pattern Description and Learning The general inference problem can be stated as follows: Given a number of samples of rectangular arrays of various sizes from a class of patterns or images, infer an array grammar that describes the structure of this class of patterns. Section 2.3 suggests that this is a two step process. The first step is the inference of nonterminal rules. and the second is the inference of intermediate rules. Prior information is required to set up the block structure, or the nonterminal rewriting rules, for the class of patterns. This leaves the learning problem to the inference of the intermediate matrix 31 32 grammars. Looking at this in another perspective. we are presented with a set of patterns for which ideal structures are known. The observable representations are distorted from the ideal structure primarily because of inherent limitations of the recording and digitization process. These pictures would not be accepted by an array automaton trained to recognize only the ideal structures of the class. Uncertainty in string grammars has been modelled by probabilistic or stochastic grammars and error correcting parsing techniques. In probabilistic grammars [ll,Al,69]. probabilities are assigned to each production rule and a sample is accepted in a particular class only if it is generated by the grammar representing the class with a high enough probability. Methods have been developed to infer p-grammars from a stochastic sample set. where a probability is given for each pattern in the sample set. 
or assigned according to a probabilistic model [A2,53]. Another approach to handling uncertainty in one dimension has been to apply error correction to noisy strings and transform them to a form that belongs to the class of patterns being described. Three common types of corrupting errors are defined; insertion of extra symbols. deletion of existing symbols. and the change of one symbol to another. These are analogous to the three edit operations adopted to define distance measures between strings, such as the string edit or Weighted Levenshtein Distance (WLD) [6A,80,ll3]. Error correcting parsing 33 techniques [l.AA,67] develop proximity measures between a sample and a language. Acceptance in a particular class involves finding the string of minimum distance using a parsing algorithm (e.g. the Earley parser [33]). Lu and Fu [67] define a stochastic deformation model for patterns by extending a stochastic context-free grammar to a stochastic context-free error induced grammar. Liou [66] developed another type of heuristic inference procedure for regular and context-free grammars. 3.2 The Inference Scheme One objective of this thesis is to incorporate distortions into the array grammar model. so that the automaton derived from the grammar accepts not only the ideal structure but also distored versions of the ideal structure. Section 2.2.3 defines probabilistic array grammars, and this concept is used to infer probabilistic grammars at the inter- mediate level. The proposed inference scheme is outlined in Fig. 5. The first step is to impose a block structure on sample arrays using the nonterminal rewriting rules. which are assumed known. using an array automaton. as described in Sec. 2.3 and Appendix B. When the underlying noise or distortion model is also assumed to be known. Step 2 computes the probabilities of blocks extracted from the sample set. The inference problem is now reduced to the inference of p-grammars for each intermediate language from training sets made up of blocks with 3A + ----------------------------------------------- + I I I 1) Using prior information I I break up pattern into blocks. I I Form sample sets for each IML I I I I ----------------------------------------------- I I I I 2) Using noise/distortion model, assign I I probabilities to each block in sample set I I I I ----------------------------------------------- + I I I For each IML I I I I + ------------------------------------------ + I I I I I 3) Convert matrix samples into I I I string form using vector primitives ! I I I I + ------------------------------------------ + I I - . I I I A) Using p-string samples infer I I I p-grammar for above strings I I I I +----+ ------------------------------------------ + I I I 5) Compute discrepancy measure for I I array grammar I I I + ----------------------- + ----------------------- + associated probabilities. Step 3 codes each block into a string form using the property that all matrices of an intermediate language either have the same number of rows or the same number of columns. For example. the blocks of IML LA’ 35 for the digit '7' in Fig. 6 have three columns whereas IML LB has a fixed number of rows. When the number of columns is fixed. each row is coded as a single symbol. Thus rows represented by the pattern 'xxx' are coded as 'u' and the rows represented by '...' are coded as 'v'. Then 2 n LA = {vu v | n >= 0}. IML LA is thus reduced to a string form. with u and v as primitive symbols. 
The terminology 'vector primitive' for u and v means that they represent a row or column of primitive picture elements. Similarly, each column of an IML with a fixed number of rows is replaced by a coded vector primitive to reduce the elements of the language to a string form. As shown in Fig. 6, LB = { pqrst^n | n >= 1 }. If the number of rows or columns is k, the total number of possible vectors is 2^k.

[Fig. 6: Coding Intermediate Languages of '7' into String form - the rows of LA blocks are coded as u ('xxx') and v ('...'), and the columns of LB blocks are coded as p, q, r, s and t, giving LB = { pqrst^n | n >= 1 }.]

Step 4 applies a standard grammatical inference algorithm to obtain a p-grammar. When a regular intermediate grammar model is used, the k-tail method of Biermann and Feldman [9], along with a maximum likelihood estimation scheme [42], will infer the probabilistic intermediate matrix grammars. The k-tail method produces a number of candidate grammars as the length of the tail, k, increases. The 'best' candidate grammar is selected with a criterion based on the complexity of the grammar and the discrepancy between the language of the grammar and the given sample set in Step 5. These complexity and discrepancy measures are a key step in the inference scheme. Properties of different complexity and discrepancy measures are discussed in the next chapter.

The inference scheme poses a number of interesting research questions. The first involves the selection of mathematical noise and distortion models to describe the deviations in the observed patterns from the ideal pattern structures. Step 3 requires a moderate number of vector primitives, so as to keep the inferred grammar simple. This gives rise to the question of how vector primitives may be optimally defined. Another research question is the establishment of a criterion for the suitability of the inferred array grammar.

A major contribution of this thesis is the development of mathematical noise and distortion models that describe deviations from the ideal pattern structure. These models are used to compute probabilities of individual blocks. A simple random noise model, two independent distortion models, and a distortion model based on Markov random fields are presented in Chapter 5, along with techniques for estimating the parameters of these models and computing the block probabilities. In Chapter 6, we study the robustness of the inference scheme to different noise and distortion models.

Once all the intermediate matrix grammars are inferred, we have an array grammar representation for the pattern class being considered. It is important to establish the suitability of this grammar to the pattern class, or how well it fits the training patterns. A discrepancy measure which compares the 'fit' between two array languages is developed in the next chapter.

3.3 Summary

This chapter describes a model for two dimensional patterns in terms of array grammars and proposes a learning scheme that infers probabilistic intermediate matrix grammars from noisy and distorted patterns, once the block structure description of the pattern class is known. The development of the two dimensional learning scheme and its use in conjunction with probabilistic noise and distortion models is a unique contribution of this thesis.

CHAPTER IV

COMPLEXITY AND DISCREPANCY MEASURES

A large number of grammars exist which generate languages that contain a given finite training set.
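To make this concrete, the following hypothetical example (the sample strings and the dictionary encoding of grammars are illustrative assumptions, continuing the small example used in Section 1.1) shows two string grammars whose languages both contain the same finite training set.

S_plus = ["ab", "aab", "aaab"]    # a finite training set (illustrative)

# Grammar G1 enumerates the sample set exactly: L(G1) = S+.
G1 = {
    "nonterminals": {"S"},
    "terminals": {"a", "b"},
    "productions": [("S", "ab"), ("S", "aab"), ("S", "aaab")],
    "start": "S",
}

# Grammar G2 generalizes the samples: L(G2) = { a^n b | n >= 1 }.
G2 = {
    "nonterminals": {"S"},
    "terminals": {"a", "b"},
    "productions": [("S", "aS"), ("S", "ab")],
    "start": "S",
}

Both languages contain the training set; G2 is structurally simpler but also generates strings, such as "aaaab", that were never observed.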
A cost function is needed to evaluate the 'goodness' of candidate grammars generated by an inference scheme. This function is either intrinsic to the scheme or imposed on its outcome. At one extreme, the language of a candidate grammar could be the finite sample set itself; at the other extreme, the language includes all possible strings. The grammar that generates exactly the finite sample set tends to be more complex in structure than grammars that generate larger languages. It therefore seems natural that the 'goodness' of an inferred grammar be based on two measures: the grammatical complexity, which refers to the structural complexity of the inferred grammar, and the discrepancy, which measures the fit between the language generated by the grammar and the given sample set. On one hand, the inference scheme should pick the structurally simplest grammar as the solution grammar; on the other hand, the language generated by the grammar should closely fit the given sample set and contain only strings that are logically implied by the training samples. The choice of the 'best' grammar will then depend on how much discrepancy one may allow in light of the complexity of the grammar.

Complexity and discrepancy measures have been defined for string grammars, but have not been generalized to other constructs such as tree or array grammars. In this chapter, we review the complexity and discrepancy measures cited in the literature and pick a simple complexity measure that suits our inference scheme. A new discrepancy measure is defined based on dissimilarity measures among strings, and its key properties are presented. The concept of the discrepancy measure for string languages is extended to array languages by defining a measure of fit between two arrays. This measure is then used to define the suitability of an inferred array grammar to the particular class of two dimensional patterns under consideration.

4.1 Complexity Measures

Complexity measures in one dimension are based on the size of a grammar or on information theory. Size complexity measures count the number of nonterminals, terminals and productions of a grammar, while information theoretic measures are based on probabilistic models. In the first two parts of this section we briefly review size and information theoretic complexity measures. In the last part, we compare these measures and select one for our inference scheme, based on its adequacy to effectively order inferred grammars by their complexity and on its computational cost.

4.1.1 Size Complexity Measures

Size complexity for string grammars can be dynamic or static. Dynamic measures [51] deal with the notion of the 'speed' with which a grammar generates its language, and are therefore directly tied to the number of steps an automaton would take to recognize a string. This concept has been used for language parsing in compiler theory. Static measures [10,35] have been more widely used in the grammatical inference framework because they are relatively easy to compute. They are usually based on the number of nonterminals, the number of production rules and/or the sizes of production rules.

Gruska [48], Blum [10] and Wharton [118] defined complexity measures by mapping a class of grammars, G, into the nonnegative integers. Gruska's definition allows intuitively unacceptable mappings, such as constant mappings [118]. Blum's measure, defined on G and a fixed terminal set, I,
is based on two important axioms: (i) there are a finite number of equivalence classes of grammars of any given complexity, and (ii) there is an effective procedure which determines, for any positive integer c, which grammars are of complexity c. For example, a complexity measure C for the class of context free grammars in Chomsky normal form is

C(G) = n(V) + n(P) + n(I),

where n(V) is the number of nonterminals, n(I) is the number of terminal symbols and n(P) is the number of production rules.

A complexity measure valid for one class of grammars may not be valid for a larger class of grammars. The complexity measure defined above cannot apply to the general class of context free grammars, because the finiteness of the complexity classes depends on the fact that the maximum length of the right hand side of a production in Chomsky normal form is two. Wharton [118] remedied this problem by introducing the maximum length of the right hand side of a production into the complexity measure. Feldman [35] also defined a general complexity measure, expressible in terms of an intrinsic complexity, which is EAO (effectively approximately ordered) by the number of symbols required to write the grammar, and a derivational complexity. This complexity function is a computable, unbounded, increasing function of each of its two arguments. Feldman's measure is quite restrictive in the sense that one measure, length, is used as a canonical measure in terms of which all other measures are defined. Blum's definition of complexity is more general, because it does not require a specific canonical measure. All the size complexity measures discussed above are based on intuitively simple concepts and are directly linked to the structural size of grammars.

4.1.2 Information Theoretic Complexity Measures

Information theoretic complexity measures are based on probabilistic grammar models. The term 'information' comes from Information Theory and represents the decrease in uncertainty that comes about with the occurrence of an event in a stochastic process.

Cook [21] defined a complexity measure for grammars as the information required to specify the grammar, or the sum of the complexities of its productions. The complexity of a production is determined by its right hand side, and the complexity of a string is computed in terms of a grammar-grammar. This measure assumes that the elements of productions are statistically independent, which may not be realistic; context free grammars, for example, exhibit similarity among productions. A way around this is to introduce a conditional formulation. Other approaches have been suggested by Horning [53], who derived a complexity measure from Bayesian theory, and Feldman et al. [34], who defined the derivational complexity of the sample set relative to the grammar G. The derivational complexity is closely related to the probability of the string being generated by the grammar: the less probable the string, the higher its derivational complexity. This measure is therefore similar to Horning's, because it maximizes the probability of occurrence of the sample set under the grammar.

Complexity measures have also been defined [19,61] in terms of the generative capacity of grammars, called the entropy or channel capacity of the grammar, which depends directly on the number of words generated by the grammar. Chomsky and Miller [19] formulated the information capacity of regular grammars in a manner similar to the channel capacity concept of Shannon [96].
Kuich [61] developed similar results for context free grammars and defined entropy in terms of the structure generating function for a grammar. All the information theoretic measures relate the complexity of a grammar to the information required to specify it, in terms of the number of production rules and the number of symbols used. Whereas size measures are directly based on counts and are computationally much simpler, these measures use probabilistic concepts to derive information theoretic quantities, resulting in measures that are computationally more complex. The choice of a complexity measure is highly dependent on the problem at hand; it is therefore not possible to define general criteria for a best complexity measure.

4.1.3 Choice of Complexity Measure

This section describes our choice for the complexity measure to be used in conjunction with the inference scheme for intermediate string grammars. The measure is easy to compute, but permits quantification of the differences in complexity among grammars generated by the k-tail inference scheme described in the last chapter. Size complexity measures, such as Blum's and Wharton's, are very easy to compute, as opposed to those based on information theory. Moreover, Cook's and Feldman's measures are tailored towards context free grammars and are quite redundant for regular grammars. Horning's measure requires the specification of the intrinsic probability of a grammar, which is actually the a priori probability of that grammar in the given class of grammars; this parameter is very difficult to define in our application. For these reasons, we choose the size complexity measure defined by Blum. We define the set G (Sec. 4.1.1) as the set of regular grammars generated by the k-tail inference scheme for different tail lengths. A complexity measure for regular grammars requires only the number of production rules. The complexity of the inferred grammars then increases monotonically with increasing tail length.

4.2 Discrepancy Measures

Discrepancy measures for string grammars evaluate the match between the language generated by a grammar and the given training set of patterns. These measures can use the differences in the probability distributions of the elements in the sample set and those generated by the grammar, or they can be based on structural differences between the sample set and the language.

The discrepancy between two probabilistic languages L_1 and L_2 can be made up of three parts corresponding to the subsets L_1 ∩ L_2, L_1 - L_2 and L_2 - L_1. Languages L_1 and L_2 are said to fit closely if the strings in L_1 but not in L_2 are structurally similar to the strings in L_2, and likewise for strings in L_2 which are not present in L_1. Various measures have been proposed to quantify these ideas, such as the Weighted Levenshtein Distance (WLD) [80,113] and the similarity measures of Findler [36]. Liou [66] has defined variations of the WLD and studied their properties. We use the WLD as the measure of dissimilarity among strings.

In the inference framework, language L_2 is the finite sample set S. The discrepancy measure should reflect how closely the language of an inferred grammar fits S, and whether it is a natural generalization of S. Since L(G) ⊇ S, the discrepancy measure should depend on probabilistic differences for strings belonging to S, and on structural differences on the set L(G) - S. The discrepancy, or difference of fit, is a form of proximity measure between two sets of strings. Therefore,
it would be desirable that it satisfy the metric properties, analogous to distance measures in a metric space. Measures of fit should also be bounded and convergent; otherwise the discrepancy might increase without bound for languages L(G) that are infinite generalizations of S. Convergence of the measure also allows a comparison of individual discrepancy values against an upper bound and makes the discrepancy measure computationally feasible. In this section, we review existing probabilistic and structural discrepancy measures and motivate the need for a new discrepancy measure.

4.2.1 Probabilistic Discrepancy Measures

Maryanski [69] defined several discrepancy measures for probabilistic languages based on the absolute difference or the square difference between probabilities. Given two probabilistic languages L_1 and L_2 with word probabilities p(x) and q(x) respectively, he defined six difference measures which are not metrics. An example is the absolute difference measure

D_a(L_1,L_2) = \sum_{x \in L_1} |p(x) - q(x)| + \sum_{y \in L_2} |p(y) - q(y)|.

One difficulty with these measures is that as L_1 increases monotonically in size, the distance between L_1 and L_2 increases monotonically, so a 'best' grammar is not easily selected. Generalizations resulting in infinite languages would be rejected in favour of finite generalizations.

Cook [21] defined a discrepancy measure between a finite training set of samples and a language, which he interpreted as the information lost or gained in going from the string probabilities in the sample set to those in L(G), even though this measure has no basis in information theory. Another problem with this measure is that there is no guarantee that it will converge for infinite languages. Liou [66] defined a procedure for limiting the number of strings to be considered in computing the sum for Cook's discrepancy measure, while trying to keep the information loss as low as possible. This procedure is still computationally expensive, and does not directly reflect structural differences among strings.

4.2.2 Structural Discrepancy Measures

Structural differences between the training set and an inferred language can be evaluated in terms of a distance measure between two strings, such as the string edit or Weighted Levenshtein Distance (WLD) [64,80,113]. A string that belongs to both the sample set and the language of the grammar being evaluated does not contribute to the discrepancy measure. Otherwise, if x ∈ L(G) but x ∉ S, then d(x) = \min_{y \in S} {dist(x,y)}. A general discrepancy measure could be defined as

D[L(G),S] = \sum_{x \in L(G)} w(x) d(x),   --(1)

where {w(x)} is a sequence of weights that corresponds to the strings of the language L(G). This idea was used by Wharton [118] to define metrics on a class of languages over a finite terminal vocabulary, based on a measure of the difference between two strings. He defined a discrete metric, which is useful for exact language identification but not for inference. The set L(G) - S is countably infinite, so evaluation of (1) may not be computationally feasible, and there is no guarantee that the sum converges. Therefore this measure, too, is not very useful for our purposes.

4.2.3 Choice of Discrepancy Measure

Maryanski's discrepancy measures, though computationally simple, ignore structural differences, and Wharton's weight measure ignores probabilistic differences. Cook's measure incorporates both to some extent but has the drawbacks discussed above. Therefore, we have decided to define a new discrepancy measure
which takes into consideration both structural and probabilistic differences.

4.3 New Discrepancy Measure

In this section, we propose a new discrepancy measure for string grammars that possesses a number of desirable properties. We begin with a formal definition of the measure, study its properties in relation to the inference scheme, and also discuss computational matters.

A probabilistic difference measure, such as the absolute difference measure in Section 4.2.1, adequately measures the discrepancy for S. In Section 4.3.1 we define a new discrepancy measure D(L(G),S) for L(G) - S, which is a special case of the general structural discrepancy measure (1) used by Wharton. The overall discrepancy between L(G) and S is defined as a linear combination of these two measures, where the coefficients b_1 and b_2 are positive real numbers which determine the relative importance of each discrepancy in the total discrepancy measure. The choice of b_1 and b_2 depends on the application at hand and will be discussed in greater detail in Chapter 6.

4.3.1 Definition

The structural discrepancy measure, D(L(G),S), is based on structural differences between probabilistic languages. Two languages L_1 and L_2 are defined to be structurally equivalent iff, for all x, x ∈ L_1 <=> x ∈ L_2. The structural difference between L_1 and L_2 is expressed as

D(L_1,L_2) = \sum_{x \in L_1} p(x) d(x,L_2) + \sum_{y \in L_2} q(y) d(y,L_1),   --(3)

where p(x) is the probability of x ∈ L_1 and q(y) is the probability of y ∈ L_2; d(z,L) is some measure of proximity between a string z and a language L. Equation (3) can be rewritten as

D(L_1,L_2) = E[d(X,L_2)] + E[d(Y,L_1)],

where E denotes the expected value implied in (3), and X and Y are random variables representing strings from languages L_1 and L_2, respectively.

The discrepancy measure D(L_1,L_2) is a proximity or dissimilarity measure between the two languages L_1 and L_2. Anderberg [4] provides a thorough review of such measures. They are normally required to satisfy the following properties:

(I) D(L_1,L_2) >= 0, with equality if and only if L_1 and L_2 are structurally equivalent;
(II) D(L_1,L_2) = D(L_2,L_1).

A third condition, the triangle inequality, would give the discrepancy measure metric properties. However, we have not found a convenient and useful way of defining D(L_1,L_2) that satisfies the triangle inequality. Condition (II) is always satisfied irrespective of the way we define d(x,L). To satisfy condition (I), d(x,L) must satisfy the property

d(x,L) = 0, if x ∈ L.   --(4)

From (3) and (4) it follows that

D(L_1,L_2) = \sum_{x \in L_1 - L_2} p(x) d(x,L_2) + \sum_{y \in L_2 - L_1} q(y) d(y,L_1).

In general, we will define d(x,L) in the form

d(x,L) = \sum_{y \in L} f(x, y, WLD(x,y)),

where WLD(x,y) is the Weighted Levenshtein Distance between the two strings x and y. An example is

d(x,L) = \sum_{y \in L} a(x,y) g(WLD(x,y)).   --(5)

Here a(x,y) is a weight factor, called the window function. As L can be an infinite language, the weight factor limits the sum to only a finite number of terms. We make a(x,y) depend on the length of string x, denoted by ℓ(x), to bound the magnitude of d(x,L). Therefore, there exists a constant c for which 0 <= d(x,L) <= c for all x ∈ L_1. Thus, for any arbitrary language L,

\sum_x p(x) d(x,L) <= \sum_x p(x) c <= c,   --(6)
\sum_x p(x) d(x,L) >= 0.   --(7)

Equations (6) and (7) show that (3) is bounded and therefore convergent. In the inference problem, where L_2 is the finite sample set S and S ⊆ L(G), the structural discrepancy reduces to

D(L(G),S) = \sum_{x \in L(G)-S} p(x) d(x,S),   --(8)

where d(x,S) = 0 if x ∈ S. The above is usually an infinite sum, but because of its boundedness and convergence properties,
only a finite number of terms have to be computed to produce D(L(G),S) to a specified accuracy.

4.3.2 Particular Proximity Measures

In this section we develop particular proximity measures d(x,L) that reflect the difference in structure between a string x and the set of strings in the language L. Three different discrepancy measures are proposed below.

The minimum discrepancy measure is defined as

D_min(L_1,L_2) = \sum_{x \in L_1} p(x) d_min(x,L_2) + \sum_{y \in L_2} q(y) d_min(y,L_1),

where

d_min(x,L) = a(x) \min_{y \in L} {WLD(x,y)}   --(9)

and a(x) = 1/ℓ(x). Since WLD(x,y) <= ℓ(x) + ℓ(y), we have

\min_{y \in L} {WLD(x,y)} <= ℓ(x) + b,

where b = \min_{y \in L} {ℓ(y)}, which shows that d_min is bounded and hence that D_min(L_1,L_2) is convergent.

The minimum discrepancy measure can be looked upon as an optimistic estimate of the proximity between two languages. Another estimate is an average distance between a string x and the set L. A reasonable way to define this distance is to compute a weighted average distance between x and all strings from a finite subset of L. Such discrepancy measures are called window discrepancy measures. Finite subsets are identified by placing a window on the set L which assigns weights to all strings whose lengths lie between ℓ(x) - c and ℓ(x) + c, where c >= 0. The general definition of the window discrepancy measures d*(x,L) appears below. Let

L_c(x) = { y ∈ L | ℓ(x) - c <= ℓ(y) <= ℓ(x) + c },   c >= 0.

If x ∈ L, d*(x,L) = 0. If x ∉ L and L_c(x) = ∅, define d*(x,L) = 2 + c/ℓ(x). If x ∉ L and L_c(x) ≠ ∅,

d*(x,L) = \sum_{y \in L} a*(x,y) WLD(x,y),   where a*(x,y) = 0 if y ∉ L_c(x).

Let K_c(x) denote the cardinality of L_c(x). For some fixed c >= 0, the rectangular window function is

a_rect(x,y) = 1 / ( ℓ(x) K_c(x) ) if y ∈ L_c(x), and 0 otherwise.

The corresponding discrepancy is

d_rect(x,L) = 0, if x ∈ L;
d_rect(x,L) = 2 + c/ℓ(x), if L_c(x) = ∅ and x ∉ L;   --(10)
d_rect(x,L) = [1/ℓ(x)] \overline{WLD}(x,c), otherwise,

where \overline{WLD}(x,c) denotes the average distance between x and the strings in L_c(x). Similarly, a triangular window function is defined as

a_tri(x,y) = [1/( ℓ(x) K_c(x) )] (ℓ(y) - ℓ(x) + c)/c,   if ℓ(y) <= ℓ(x) and y ∈ L_c(x);
a_tri(x,y) = [1/( ℓ(x) K_c(x) )] (ℓ(x) - ℓ(y) + c)/c,   if ℓ(y) > ℓ(x) and y ∈ L_c(x);
a_tri(x,y) = 0,   if y ∉ L_c(x).

Strings in L_c(x) with lengths close to ℓ(x) influence the triangular proximity most. The corresponding discrepancy function is defined as

d_tri(x,L) = 0, if x ∈ L;
d_tri(x,L) = 2 + c/ℓ(x), if L_c(x) = ∅ and x ∉ L;   --(11)
d_tri(x,L) = \sum_{y \in L_c(x)} a_tri(x,y) WLD(x,y), otherwise.

From a comparison of the three discrepancy measures it is apparent that

d_min(x,L) <= d_rect(x,L)   and   d_min(x,L) <= d_tri(x,L),

and it therefore follows that

D_min(L_1,L_2) <= D_rect(L_1,L_2)   and   D_min(L_1,L_2) <= D_tri(L_1,L_2).

For a fixed window size c >= 1, we have D_tri(L_1,L_2) <= D_rect(L_1,L_2). This confirms that the minimum discrepancy measure is an optimistic estimate of the proximity, while the triangular and rectangular measures fall into the category of weighted average estimates. The next section shows that all three discrepancy measures behave similarly when used with an inference scheme; the choice of a particular measure depends on the application being considered.

4.3.3 Properties of the Structural Discrepancy Measure

We now look into properties of the structural discrepancy measure D(L(G),S) in the grammatical inference problem. The greater the probability or the smaller the length of a string, the larger is its contribution to the discrepancy measure. This conforms to the intuitive notion [42] that short strings are more important than long ones. Simple examples are used to illustrate some of the desired properties. We assume S ⊆ L(G) and

D(L(G),S) = \sum_{x \in L(G)-S} p(x) d(x,S).
Property: Given a probabilistic language L and two finite sample sets S_1 and S_2 such that S_1 ⊆ S_2 ⊆ L, then D_min(L,S_1) >= D_min(L,S_2).

The proof is simple. The same conclusion cannot be made for D_rect (and D_tri), as demonstrated below.

Example 1: Let L = {abc, bac, cba, acb, bca, cab} with equal probabilities. Let S_1 = {abc, cba} with probabilities (1/2, 1/2) and S_2 = {abc, bac, cba} with probabilities (1/3, 1/3, 1/3). With window size c = 0, D_rect(L,S_1) = 1.2156 and D_rect(L,S_2) = 1.3072. Therefore D_rect(L,S_2) > D_rect(L,S_1), even though S_1 ⊆ S_2.

For a language L and a finite sample set S ⊆ L, we can write the three discrepancy measures as

D_min(L,S) = \sum_{x \in L-S} p(x) [1/ℓ(x)] \min_{y \in S} {WLD(x,y)},   --(12)
D_rect(L,S) = \sum_{x \in L-S} p(x) [1/ℓ(x)] \overline{WLD}(x,c),   --(13)
D_tri(L,S) = \sum_{x \in L-S} p(x) \sum_{y \in S} a_tri(x,y) WLD(x,y),   --(14)

where \overline{WLD}(x,c) is the average WLD between x and the strings of S in the window L_c(x). The contribution of a string x ∈ L to the discrepancy measure is a product of three factors: its probability, the inverse of its length, and its proximity to the set S. The discrepancy measures therefore increase in direct proportion to the probability and the WLD of the strings in L(G) - S, and shorter strings contribute more to the discrepancy measure. The example below illustrates the convergence property of the discrepancy measure for infinite languages.

Example 2: Consider S = {a, a^2, ..., a^n} with probabilities (1/n, 1/n, ..., 1/n). Let G: X -> aX | a with probabilities (p, 1-p), so that L(G) = { a^m | m >= 1 } with probability p(a^m ∈ L(G)) = p^{m-1}(1-p). Then

D_min(L(G),S) = \sum_{j=n+1}^{\infty} (1-p) p^{j-1} (j-n)/j
             = p^n - [n(1-p)/p] [ \log(1/(1-p)) - \sum_{j=1}^{n} p^j/j ].

This shows that as n increases, D_min(L(G),S) decreases, and as n -> ∞, D_min(L(G),S) -> 0.

4.3.4 Computation of the Discrepancy Measure

In this section we discuss the computational complexity of the discrepancy measure in terms of the number of steps required in the computation. This discussion pertains to sequential algorithms for computing the WLD. Parallel algorithms have recently been proposed [17] that reduce the order of the time complexity; they are discussed in detail in Chapter 6.

The sum in D(L(G),S) usually involves an infinite number of terms, because L(G) is generally an infinite language. Following Liou [66], there are two ways in which we can make the computation finite. One way is to use the fact that any grammar generates only a finite number of strings of length <= m, where m is a positive integer. By choosing a subset A_m = { x ∈ L(G) | ℓ(x) <= m }, we can approximate

D(L(G),S) ≈ D(A_m,S) = \sum_{x \in A_m - S} p(x) d(x,S).

One intuitive choice for m is \max_{y \in S} {ℓ(y)}, but then the computed discrepancy measure may fail to verify whether L(G) is a natural generalization of S. A second approach is to use string probabilities to limit the number of strings used in the computation: select a real number q, 0 < q < 1, let B_q = { x ∈ L(G) | p(x) >= q }, and approximate

D(L(G),S) ≈ D(B_q,S) = \sum_{x \in B_q - S} p(x) d(x,S).

Distance computations are based on the WLD [80]. The conventional WLD algorithm has computational complexity of order nm, where n and m are the string lengths. Masek et al. [71] have proposed a faster algorithm of order n·max(1, n/log m), where n >= m. If the sample set S has M strings and we restrict ourselves to N strings from L(G), the worst case computation time for D_min(L(G),S) becomes O(MNmn) (or O(MNn·max(1, n/log m)), n >= m). For rectangular and triangular windows the order of computation is similar, except that for x ∉ S
the actual computation time may be less than that required for D_min(L(G),S).

The computation time for D_min(L(G),S) can be reduced by use of a search algorithm, summarized as follows. Arrange the strings in S in order of their lengths. For each x ∈ L(G) chosen in the computation, first compare x with the strings y ∈ S for which ℓ(y) = ℓ(x). If there exists such a y with WLD(x,y) = 0, exit and pick the next x. Otherwise, find the number a = \min {WLD(x,y)} over the y ∈ S with ℓ(y) = ℓ(x). The search can then be restricted to the set { y ∈ S : ℓ(x) - a <= ℓ(y) <= ℓ(x) + a }. For every new length of strings being searched, the bound a is updated whenever a_new < a_old, and the search space shrinks correspondingly.

All the discrepancy measures defined above are of the same order of complexity; the computation time for the rectangular and triangular window measures depends on the window size. All three measures have identical properties in relation to a string grammar inference scheme. We chose the minimum discrepancy measure as our proximity measure for structural discrepancies, but test runs for the experiments conducted in Chapter 6 showed that the other two measures would have produced almost identical results.

4.4 Discrepancy Measure for Array Languages

The fifth step in our inference scheme, described in Section 3.2, requires a measure of suitability to determine how well the language of the inferred array grammar fits the training set of patterns. We now extend the concept of the structural discrepancy measure for string languages to a structural difference measure between two array languages. This requires the definition of a proximity measure between two arrays. The string edit or WLD distance has been extended to measure the distance between two finite arrays of symbols [76,109] and is called the two dimensional WLD, or WLD2. We use WLD2 to define the discrepancy measure between two array languages. The computational aspects of this measure are investigated, and approximations to the actual measure are then developed to reduce the computational effort in our inference scheme.

The discrepancy measure between an inferred array language L and the training set of array patterns S is defined as

D(L,S) = \sum_{x \in L-S} p(x) d(x,S),   --(15)

where x is an array pattern and p(x) is the probability that x ∈ L; d(x,S) is a proximity measure with the same properties as the proximity measures of Section 4.3.1, and

d(x,S) = \sum_{y \in S} a(x,y) g(WLD2(x,y)),

where a(x,y) is the weight factor, or window function. As in Section 4.3.2, we can define the minimum measure and the two window measures.

The language L, as a generalization of S, is often an infinite language, so we again need techniques to make the computation finite. The algorithm to compute WLD2 between two arrays of sizes I×J and K×L is of order I²JK²L + IJ²KL² [109]; if I = J = K = L = N, this is O(N⁶), which makes the computation formidable even for moderate N. We therefore approximate D(L,S) by computing only a few terms from L - S. Suppose the training set S contains array patterns of types N_1, N_1+1, ..., N_2 (the definition of type is introduced in Section 2.4), with n_{N_1}, n_{N_1+1}, ..., n_{N_2} array patterns of each type, respectively. We define a new set L_S ⊆ L such that L_S contains 2n_{N_1}, 2n_{N_1+1}, ..., 2n_{N_2} patterns of types N_1, N_1+1, ..., N_2, respectively. For each type N_k, L_S contains the 2n_{N_k} patterns of highest probability from L; D(L_S,S) is then used as an approximation for D(L,S).
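For the one dimensional case, the computation behind (12) can be sketched directly; WLD2 follows the same dynamic-programming idea with a much larger state space. The unit edit costs and the example language and probabilities below (taken from Example 2, truncated at length 8) are illustrative assumptions, not part of the inference scheme itself.

    # Sketch of the WLD by dynamic programming and of the minimum discrepancy (12)
    # over a finite subset of L(G).  Unit insertion/deletion/substitution costs are
    # assumed; the general WLD allows arbitrary nonnegative weights.

    def wld(x, y, w_ins=1.0, w_del=1.0, w_sub=1.0):
        m, n = len(x), len(y)
        d = [[0.0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            d[i][0] = i * w_del
        for j in range(1, n + 1):
            d[0][j] = j * w_ins
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0.0 if x[i - 1] == y[j - 1] else w_sub
                d[i][j] = min(d[i - 1][j] + w_del,        # delete a symbol of x
                              d[i][j - 1] + w_ins,        # insert a symbol into x
                              d[i - 1][j - 1] + cost)     # substitute or match
        return d[m][n]

    def d_min(x, S):
        """a(x) * min over y in S of WLD(x, y), with a(x) = 1/len(x); 0 if x is in S."""
        return 0.0 if x in S else min(wld(x, y) for y in S) / len(x)

    def D_min(L_subset, S):
        """Minimum discrepancy over a finite, probability-weighted subset of L(G)."""
        return sum(p * d_min(x, S) for x, p in L_subset)

    # Truncated version of Example 2: S = {a, aa, aaa}, L(G) = {a^m | m >= 1}.
    p = 0.5
    S = {"a", "aa", "aaa"}
    L_subset = [("a" * m, p ** (m - 1) * (1 - p)) for m in range(1, 9)]
    print(D_min(L_subset, S))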
For our inference scheme, we chose the triangular window proximity measure to compute d(x,S), x ∈ L_S; the window is defined as

L_c(x) = { y ∈ S | size(x) - c <= size(y) <= size(x) + c },   with c = 1,

where size(x) denotes the size of array x. This estimate is a rough approximation to the actual discrepancy measure D(L,S), but it estimates the fit well enough to compare array grammars inferred by applying different noise and distortion models to the same training set, without making the computations unduly lengthy. These experiments are explained in detail in Chapter 6. Much better approximations can be obtained by the use of parallel systems and parallel algorithms, which greatly reduce the time complexity of the algorithm; they too are discussed in Chapter 6.

4.5 Summary

This chapter discusses the role of discrepancy and complexity measures in the inference of string grammars. A number of the existing measures are reviewed. A complexity measure is adopted that satisfies Blum's and Wharton's criteria, is computationally very simple, and is quite adequate for regular grammars. A new structural discrepancy measure is defined; the total discrepancy between the language of an inferred grammar and the sample set is defined as a linear combination of the structural discrepancy measure between the two sets and the absolute discrepancy measure between strings common to the two sets. Lastly, a new discrepancy criterion for the fit between two array languages is proposed. Computational aspects of these algorithms are investigated.

CHAPTER V
MODELS FOR NOISE AND DISTORTION

The inference of probabilistic grammars involves estimating probabilities for production rules from probabilities assigned to the samples in the training set. In the string grammar inference problem, probabilities are assigned to strings either by frequency counts [11,42] or from a probabilistic model. A probability model on strings defines conditional probabilities of the type p(x|y), where x, y ∈ T+. An example is the error correcting parsing model described in Section 3.1, where probabilities assigned to each of the three edit operations can be used to compute the probabilities of strings.

Pattern description and learning based on array grammars involves inferring the structure of a given class of patterns from images that may be corrupted by some physical phenomenon. The ideal image is changed by random fluctuations in the digitizing unit or during transmission. This phenomenon, which is independent of the underlying pattern structure, is called a noise process. In other situations, the corrupting phenomenon may depend on the underlying pattern structure. Such effects are observed when photographing or digitizing objects whose intensity differences between light and dark areas are low, causing diffraction effects. The effect becomes more pronounced when the aperture of the camera lens is small, resulting in blurred edges separating the object from the background [88]. Points near the boundaries of objects experience more distortion than internal regions; we term such processes distortion processes.

This chapter investigates some noise and distortion models. We begin with a simple, independent noise model, and then extend it to define two intuitive distortion models and a comprehensive distortion model using Markov random field theory. For each model we study its mathematical formulation, techniques for generating sample patterns based on an ideal structure, and the estimation of its parameters.
The estimated parameters are used to compute individual block probabilities, as outlined in Section 3.2.

5.1 The Independent Noise Model

The independent noise model assumes that the probability of a pixel changing value is independent of all other pixel values in the pattern. This is equivalent to considering an ideal image, a digitized m×n pattern, degraded by independent, additive, stationary noise. The noise process is modelled as a Bernoulli process: the number of changes in a block has a binomial distribution with parameters b, the number of pixels in the block, and p, the probability that a pixel value is switched. The scheme for generating patterns from this noise process is described in Procedure 1 below. We assume that all pattern types (random variable N) occur with equal probability, so N has a discrete uniform distribution over [N_1,N_2]:

P[N = n] = 1/(N_2 - N_1 + 1) if N_1 <= n <= N_2, and 0 otherwise.

Procedure 1:
Input: G, the rewriting rules for the ideal structure; p; N_1; N_2.
Begin:
 (1) Generate n ~ U[N_1,N_2].
 (2) Using the nonterminal rewriting rules, create an ideal pattern structure of size n.
 (3) For each pixel in the pattern,
     (a) generate x ~ U[0,1];
     (b) if x <= p, flip the pixel value; otherwise, retain the pixel value.
 End loop.
End

The result of applying this procedure to a set of isosceles triangles and to the numeral '7' is illustrated in Figures 7 and 8, respectively. The parameter p was set at 0.05.

[Figure 7: Triangles generated from the independent noise model, p = 0.05.]

[Figure 8: Noisy 7's, p = 0.05.]

Given that the patterns in the training set are obtained by imposing the noise model on the ideal pattern structure, the maximum likelihood estimate of the parameter p is

p̂ = (total number of pixels that are flipped) / (total number of pixels in the sample set).

The ideal structure must be known. Our inference scheme requires that the patterns be broken down into blocks, which can then be grouped into sample patterns for the different intermediate languages. Treating each intermediate language separately, we can compute the probability of occurrence of each block from the noise model. The probability of observing a block with b pixels of which c are flipped is

p^c (1 - p)^{b-c}.

5.2 Distortion Models

5.2.1 The Independent Distortion Model

The independent distortion model distinguishes the interior pixels of the ideal structure from its boundary pixels: each interior pixel is flipped independently with probability p_i and each boundary pixel with probability p_b. Choosing p_b > p_i makes boundary changes more likely than changes in the interior and background regions. The scheme to generate distorted patterns using this model (Procedure 2) is similar to that for the independent noise model (a sketch of Procedure 1 appears below).
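The following is a minimal sketch of Procedure 1, not the original implementation; the triangle generator stands in for the nonterminal rewriting rules G and is an assumption made only so the sketch runs end to end.

    # Sketch of Procedure 1: independent noise generation.

    import random

    def ideal_triangle(n):
        """Stand-in ideal structure of type n: an isosceles triangle of 'x' on '.'."""
        return [list("." * (n - 1 - i) + "x" * (2 * i + 1) + "." * (n - 1 - i))
                for i in range(n)]

    def procedure_1(p, N1, N2, make_ideal=ideal_triangle, rng=random):
        n = rng.randint(N1, N2)                  # (1) n ~ U[N1, N2]
        pattern = make_ideal(n)                  # (2) ideal structure of size n
        for row in pattern:                      # (3) independent Bernoulli flips
            for j in range(len(row)):
                if rng.random() <= p:
                    row[j] = "x" if row[j] == "." else "."
        return ["".join(row) for row in pattern]

    for line in procedure_1(p=0.05, N1=7, N2=10):
        print(line)

Procedure 2 differs only in step (3): the flip probability is p_b for boundary pixels of the ideal structure and p_i for all other pixels.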
The pattern type is again assumed to have a U[N_1,N_2] distribution.

Procedure 2:
Input: G, B, p_i, p_b, N_1 and N_2.
Begin:
 (1) Generate n ~ U[N_1,N_2].
 (2) From the rewriting rules, create an ideal structure of size n.
 (3) For each pixel in the pattern,
     (a) generate x ~ U[0,1];
     (b) if it is an interior pixel, flip the pixel value if x <= p_i and retain it otherwise;
         if it is a boundary pixel, flip the pixel value if x <= p_b and retain it otherwise.
 End loop.
End

An example in Figure 10 illustrates this procedure with the class of triangular patterns. The parameters p_i and p_b were set to 0.05 and 0.15, respectively. Estimation of the parameters p_i and p_b from a finite set of training patterns using maximum likelihood again involves counting:

p̂_i = (total number of interior pixels flipped) / (total number of interior pixels),
p̂_b = (total number of boundary pixels flipped) / (total number of boundary pixels).

Once p̂_i and p̂_b are estimated, a probability can be assigned to any block by assuming the patterns are acted upon by two independent binomial processes. The probability that a block of b pixels experiences c_i internal changes and c_b boundary changes is

p_i^{c_i} (1 - p_i)^{b_i - c_i} p_b^{c_b} (1 - p_b)^{b_b - c_b},

where b_i is the total number of internal points in the block and b_b the total number of boundary points in the block; b_i + b_b = b.

[Figure 10: Triangles generated by the independent distortion process, p_i = 0.05, p_b = 0.15.]

5.2.2 Neighborhood Distortion Models

The neighborhood distortion model allows the distortion process at a pixel to be a function of the values of the neighboring pixels in the ideal pattern. The surrounding pixels determine the extent of nonhomogeneity in that region. If a majority of the neighboring pixels have the same value, we assign a smaller probability of change than when the surrounding neighborhood is nonhomogeneous.

We now formally define the concept of neighborhood for a two dimensional array of pixels. The distance between pixels or points in the array assumes discrete values. Figure 11(a) depicts some of the squared distances from a point x in the array. These squared distances form a sequence of numbers 1, 2, 4, 5, ..., called {e(k)}, k = 1, 2, .... Following the usual image processing conventions, pixel (m,n) is a kth order neighbor of pixel (i,j) iff the squared distance between (m,n) and (i,j) is the kth term in {e(k)}. Figure 11(b) shows the ordering of the neighboring pixels around the one marked x.

A neighborhood distortion model of order k requires that the probability of change for each pixel (i,j) depend on all its neighbors up to order k in the ideal pattern. In general, if the number of pixel neighbors of order k or less is c, we define a discrete neighborhood function W(i,j) for each pixel (i,j) that can assume at most 2^c different values. We assume,
for a kth order neighborhood function, that the pattern is surrounded by k rows and columns of 0's at each edge, so that W(i,j) is well defined for all pixels of the pattern. Corresponding to this neighborhood function we have at most 2^c different probability-of-change values, (p_1, p_2, ..., p_{2^c}), as parameters for the model.

For example, consider a first order model in which each pixel has four neighbors. The neighborhood function can have a maximum of 16 different values, and the model can have a maximum of 16 different probability-of-change values as parameters.

[Figure 11(a): Squared distances to the point x.]

[Figure 11(b): Orders of neighbors relative to the point x.]

Simpler functions can be used to describe the type of distortion we desire, such as

W(i,j) = \sum_{(m,n) \in N_k(i,j)} v(m,n),   --(2)

where N_k(i,j) denotes the pixels in the kth order neighborhood of (i,j) and v(m,n) is the value of pixel (m,n) in the ideal pattern. For k = 1, this function can have five values, 0 through 4, so we can define a maximum of five different probabilities of change, p_0 through p_4. As a simple example, we set p_0 = p_4 and p_1 = p_2 = p_3. A value of 0 or 4 for the neighborhood function then indicates a homogeneous region, and the other values represent nonhomogeneous regions. Setting p_1 > p_0 makes distortion more likely along edges than in interior regions.

The value of the neighborhood function for each pixel is determined entirely from the ideal structure of the pattern, which is predetermined. Therefore, the probability of change for any pixel (i,j) is independent of the probability of change for any other pixel (i',j') in the same pattern. We define an indicator function z(i,j) for an observed pattern: z(i,j) = 1 if the pixel value is flipped, and z(i,j) = 0 otherwise. A sketch of the first order neighborhood function and the resulting flip probabilities is given below.
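The first order case of (2), with the probability assignment p_0 = p_4 and p_1 = p_2 = p_3, can be sketched as follows; the 0/1 array representation and the particular probability values (those used for Fig. 12) are assumptions made for illustration.

    # Sketch of the first order neighborhood function W(i, j) and the per-pixel
    # flip probability of the neighborhood distortion model.

    def W(ideal, i, j):
        """Sum of the four first order neighbor values in the ideal pattern (0..4);
        pixels outside the pattern count as 0, as if the pattern were padded with 0's."""
        m, n = len(ideal), len(ideal[0])
        total = 0
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            r, c = i + di, j + dj
            if 0 <= r < m and 0 <= c < n:
                total += ideal[r][c]
        return total

    def flip_probability(ideal, i, j, p_hom=0.05, p_nonhom=0.15):
        """p_W(i,j): small in homogeneous regions (W = 0 or 4), larger near edges."""
        table = {0: p_hom, 1: p_nonhom, 2: p_nonhom, 3: p_nonhom, 4: p_hom}
        return table[W(ideal, i, j)]

    ideal = [[0, 0, 0, 0],
             [0, 1, 1, 0],
             [0, 1, 1, 0],
             [0, 0, 0, 0]]
    print(flip_probability(ideal, 0, 0))   # homogeneous background -> 0.05
    print(flip_probability(ideal, 1, 1))   # near the object boundary -> 0.15

In Procedure 3 below, this table lookup replaces the single comparison against p used in Procedure 1.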
Assuming that N, the type of the pattern, comes from a U[N_1,N_2] distribution, the probability of a block sample belonging to an intermediate language is

[1/(N_2 - N_1 + 1)] \prod_{(i,j) \in block} p_{W(i,j)}^{z(i,j)} (1 - p_{W(i,j)})^{1 - z(i,j)}.

This model is uniquely defined by the neighborhood function W(i,j), the size of the neighborhood k, and the probability-of-change values that are used. The generation procedure for samples from this model is given in Procedure 3.

Procedure 3:
Input: G, W(i,j), {p_i}, N_1, N_2.
Begin:
 (1) Generate n ~ U[N_1,N_2].
 (2) Using the rewriting rules, create an ideal structure of type n.
 (3) For each pixel (i,j),
     (a) compute W(i,j) from the ideal pattern;
     (b) generate x ~ U[0,1];
     (c) if x <= p_{W(i,j)}, flip the pixel value in the observed pattern; otherwise, retain the ideal pixel value.
 End loop.
End

Maximum likelihood estimation of the parameters, given the neighborhood function and the size of the neighborhood, again involves a counting procedure. The pixels in the observed patterns are grouped in terms of their neighborhood function values, and for each distinct functional value we count the number of pixels flipped as well as the total number of pixels in that group. Therefore, for a particular value W(i,j) = x,

p̂_x = (number of x-group pixels flipped) / (total number of x-group pixels).

The complexity of this model depends on the size of the neighborhood and on the neighborhood function W. We chose the W function defined in (2), and used the model for first order neighbors with two different probability-of-change values. The computational complexity is of the same order as in the previous two models, except that we now compute the neighborhood function for each pixel (i,j). The results of applying this model to the triangular patterns are illustrated in Fig. 12.

5.2.3 Applications and Limitations

The two models described above satisfy the requirements for simple distortion processes in that the corrupting phenomenon depends on the underlying pattern structure. The neighborhood model is computationally more complex than the independence model; however, it is richer, in the sense that it can model a variety of distortions. Both models are limited by the fact that they are static in nature: the pixel values in the observed patterns depend only on the ideal pattern structure, but are independent of each other and do not allow the propagation of distortion effects.

[Figure 12: Triangles generated by the neighborhood distortion process, p_0 = p_4 = 0.05; p_1 = p_2 = p_3 = 0.15.]

5.3 Markov Random Field Distortion Model

In this section we define a distortion model in which the probability that a pixel changes value depends on whether neighboring pixel values have also undergone a change. The mathematical formulation of this dependence is expressed as a Markov random field process [7,8,55].
The success of MRF's in modelling texture [25] suggests that they are appropriate for distortion models. Our two dimensional patterns are binary m×n arrays which, in Markov random field terminology, are finite lattices whose sites are labelled 0 or 1. We begin this section with a brief review of the mathematical background of Markov random fields on finite lattices. Techniques for generating distorted samples using this model and for estimating its parameters from finite training sets are also studied.

5.3.1 Background

We follow the notation and definitions of Besag [8] and Isham [55]. The set of all possible realizations on a binary m×n lattice is denoted by T. A particular assignment of 0's and 1's to the lattice, called a configuration, is denoted X ∈ T. Since T is finite, the process can be described by a discrete probability distribution, μ, on the power set of T. Suppose μ(X) > 0 for all X ∈ T. Then there exists a real valued potential function

Q(X) = -log{ μ(X) / μ(φ) },   --(3)

where φ is the configuration with 0's at all sites and Q(φ) = 0. Rewriting (3) we obtain

μ(X) = Z exp{-Q(X)},   --(4)

where Z is a normalizing constant obtained by setting the sum of (4) over all configurations X ∈ T to one. This type of probability distribution is frequently referred to as a Gibbs potential. For finite lattices, the class of probability distributions with Gibbs potentials has been proved to be identical to the class of Markov random field models [8,50,89].

A Markov random field [29] is a joint probability density on the set of finite m×n arrays, subject to the following conditions:
(i) Positivity: p(X) > 0 for all X ∈ T, and
(ii) Markovianity: p(x_{i,j} | all points on the lattice except (i,j)) = p(x_{i,j} | neighbors of (i,j)).

A Markov random field is defined to be of order k if x_{i,j} depends on neighbors up to the kth order. If the pixels are labelled 1 to N, the potential function can be expanded as

Q(X) = \sum_i x_i F_i(x_i) + \sum_{i,j} x_i x_j F_{i,j}(x_i,x_j) + ... + x_1 x_2 ... x_N F_{1,2,...,N}(x_1,x_2,...,x_N),

where the functions {F_{i,j,...,k}} are called interacting potentials, and x_i is the label on the ith pixel. A Markov random field is called an Auto-model if the following two conditions are satisfied: (i) all F functions with more than two subscripts vanish identically; and (ii) the conditional probability distribution at each point belongs to the exponential family. The potential function for a first order Auto-model is called the near neighbor potential function and can be expressed as

Q(X) = \sum_i x_i a_i + \sum_{i,j} x_i x_j b_{i,j},   --(5)

where b_{i,j} = 0 unless x_i and x_j are neighbors. From (5) we obtain

p(x_i = x | X_i) = exp{-x(a_i + \sum_j b_{i,j} x_j)} / [1 + exp{-(a_i + \sum_j b_{i,j} x_j)}],   --(6)

where X_i represents the configuration X minus the ith point. Equations (5) and (6) define a first order random field with N(N-1) parameters, the a_i's and the b_{i,j}'s. A Markov random field is said to be spatially homogeneous if a_i = a, i = 1, 2, ..., N; b_{i,j} = b(1) if x_i and x_j are vertical neighbors; and b_{i,j} = b(2) if x_i and x_j are horizontal neighbors. A spatially homogeneous first order Markov random field can therefore be defined by these three parameters, and is called isotropic if b(1) = b(2).

5.3.2 The Distortion Model

We base our distortion model on the isotropic first order Auto-model. Given an ideal structure of size m×n, we define a change template T as an m×n binary array, generated by the isotropic first order Auto-model, that relates the ideal structure Y = [y_{i,j}] and the observed pattern X = [x_{i,j}] in the following manner:

x_{i,j} = 1 - y_{i,j} if t_{i,j} = 1, and x_{i,j} = y_{i,j} if t_{i,j} = 0.   --(7)
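Equation (7) is simply a pixelwise exclusive-or of the ideal structure with the change template, as the short sketch below shows (the 0/1 list-of-lists representation is an assumption for illustration).

    # Applying a change template T to an ideal structure Y (eq. (7)): x = y XOR t.

    def apply_template(Y, T):
        return [[y ^ t for y, t in zip(y_row, t_row)] for y_row, t_row in zip(Y, T)]

    Y = [[0, 1, 1],
         [0, 1, 1]]
    T = [[0, 0, 1],
         [1, 0, 0]]
    print(apply_template(Y, T))   # -> [[0, 1, 0], [1, 1, 1]]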
A desirable property for our distortion model is that the probability of change on and around borders be greater than the probability of change in interior and background regions. This is achieved by defining a nonhomogeneous Markov random field. The points on the change template are broken up into interior points and border points, as determined by the ideal pattern structure. Our first order nonhomogeneous Markov random field model requires five parameters:
(i) a_i, corresponding to first order interactions for the interior points;
(ii) a_b, corresponding to first order interactions for the boundary points;
(iii) a_{i,i}, corresponding to pairwise interactions of interior points;
(iv) a_{b,b}, corresponding to pairwise interactions of boundary points; and
(v) a_{b,i} = a_{i,b}, corresponding to second order interactions for interior-boundary pairs.

Setting a_b < a_i produces more changes around border regions than in interior regions. Cross [25] showed that, for homogeneous models, the parameters b(1) and b(2) control the clustering or grouping of 1's on a lattice, with negative values indicating attraction and positive values indicating repulsion; setting b(1) and b(2) to 0 produces equally likely configurations. This interpretation is also valid for our nonhomogeneous model: a_{i,i}, a_{b,b} and a_{b,i} control the clustering of interior-interior, boundary-boundary and interior-boundary pixels that are 1's. Following (5), the potential function can be written as

Q_t(X) = s_i a_i + s_b a_b + s_{ii} a_{i,i} + s_{bb} a_{b,b} + s_{bi} a_{b,i},   --(8)

where s_i and s_b are the numbers of interior and boundary points, respectively, that are 1's, and s_{ii}, s_{bb} and s_{bi} are the numbers of interior-interior, boundary-boundary and interior-boundary neighbor pairs, respectively, that are both 1's.

The probability of occurrence of a block, obtained by applying the nonterminal rules of the pattern class, can be written as

[1/(N_2 - N_1 + 1)] \prod_{(i,j) \in block} p(t_{i,j} | X),

where p(t_{i,j} | X) is defined in (6), and t_{i,j} is the pixel value in the change template T. We again assume that the type of a pattern is distributed as U[N_1,N_2].

5.3.3 Generation of Distorted Patterns

The procedure for generating distorted patterns is based on an algorithm for generating texture [25]. The generation procedure is derived by analogy to a discrete, finite state Markov chain whose states are the set of configurations {X}, with the limiting distribution given by μ(X) [49]. The transition probability from a state X to a state Y is governed by the ratio μ(Y)/μ(X). The algorithm is given in Fig. 13 [37,73]. Convergence to the limiting distribution is unaffected by the choice of the initial configuration; only the time taken to reach the final configurations depends on the initial configuration. Step 1 in Fig. 13 therefore requires the choice of a good initial configuration. We generate our initial configuration using the procedure for the independent distortion model described in Sec. 5.2.1. This procedure requires the two probabilities p_b and p_i, whose maximum likelihood estimates are developed below. A likelihood function that defines the ratio of the desired density function to the independent model distribution is

L(T) = exp{-(s_i a_i + s_b a_b + s_{ii} a_{i,i} + s_{bb} a_{b,b} + s_{bi} a_{b,i})} / [ p_b^{s_b} (1 - p_b)^{n_b - s_b} p_i^{s_i} (1 - p_i)^{n_i - s_i} ],   --(9)

where n_b and n_i are the total numbers of boundary and interior points, respectively.
Equation (9) can be broken up into product terms and expressed as

L(T) = [exp(-a_b)(1 - p_b)/p_b]^{s_b} [exp(-a_i)(1 - p_i)/p_i]^{s_i} (1 - p_i)^{-n_i} (1 - p_b)^{-n_b} exp{-(s_{ii} a_{i,i} + s_{bb} a_{b,b} + s_{bi} a_{b,i})}.

The two probabilities p_b and p_i are chosen to make the expressions in the first two brackets equal to 1. Therefore,

p_b = 1 / (1 + exp{a_b}),   p_i = 1 / (1 + exp{a_i}).

We follow the procedure used by Cross [25] to define the parameter STABLE. Given m×n arrays, we define M = mn and consider M attempted switches to constitute one iteration; this count ignores attempted switches between pixels with the same value. Convergence to the desired configurations is indicated when the number of changes per iteration drops to a small percentage of M, or when the estimated parameters closely match the input parameters. These two factors are used to define the variable STABLE in Fig. 13. The algorithm was used to generate distorted sets for the triangle pattern structures, as illustrated in Figs. 14 and 15 for different second order parameters. The time taken to generate templates of types 10-20 was roughly 500-600 seconds per template on a PDP-11/34 computer; the computation time for the independent models was just a fraction of a second.

[Figure 13: Algorithm for generating a template from the MRF distribution with given parameters. (1) Generate an initial template using p_i and p_b. (2) Until STABLE: (3) choose sites (k,l) and (k',l') with different pixel values; (4) compute r = u(new)/u(old) for the proposed switch; (5) if r >= 1, switch (k,l) and (k',l'); otherwise generate z ~ U[0,1] and switch if r > z, else retain the old configuration.]
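A minimal sketch of this exchange algorithm appears below. For clarity it recomputes the full potential Q_t after each proposed swap (an actual implementation would update it incrementally) and replaces the STABLE test with a fixed number of iterations; the boundary/interior labelling is assumed to come from the ideal pattern, and the initial template would be produced with the independent distortion probabilities p_i and p_b, as in Step 1 of Fig. 13.

    # Sketch of the template-generation (exchange) algorithm of Fig. 13.

    import math
    import random

    def potential(T, is_boundary, a_i, a_b, a_ii, a_bb, a_bi):
        """Q_t of eq. (8): first order terms plus right/down neighbor-pair terms."""
        m, n = len(T), len(T[0])
        q = 0.0
        for i in range(m):
            for j in range(n):
                if T[i][j]:
                    q += a_b if is_boundary[i][j] else a_i
                    for di, dj in ((0, 1), (1, 0)):          # count each pair once
                        r, c = i + di, j + dj
                        if r < m and c < n and T[r][c]:
                            if is_boundary[i][j] and is_boundary[r][c]:
                                q += a_bb
                            elif not is_boundary[i][j] and not is_boundary[r][c]:
                                q += a_ii
                            else:
                                q += a_bi
        return q

    def generate_template(T, is_boundary, params, iterations=50, rng=random):
        """Pairwise exchanges preserve the number of 1's placed by the initial template."""
        m, n = len(T), len(T[0])
        sites = [(i, j) for i in range(m) for j in range(n)]
        q_old = potential(T, is_boundary, *params)
        for _ in range(iterations * m * n):
            (i, j), (k, l) = rng.sample(sites, 2)
            if T[i][j] == T[k][l]:
                continue                                     # need two differing sites
            T[i][j], T[k][l] = T[k][l], T[i][j]              # propose the swap
            q_new = potential(T, is_boundary, *params)
            r = math.exp(q_old - q_new)                      # u(new)/u(old) = exp{-(Q_new - Q_old)}
            if r >= 1 or rng.random() < r:
                q_old = q_new                                # accept
            else:
                T[i][j], T[k][l] = T[k][l], T[i][j]          # reject: undo the swap
        return T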
[Figure 14: Triangles created with the Markov random field as the generative model. Parameters <a_i, a_b, a_{i,i}, a_{b,b}, a_{b,i}> = <2.2, 1.5, 0.1, -1.0, -0.8>.]

[Figure 15: Triangles from the Markov random field generative model. Parameters <2.0, 1.2, -0.5, -0.8, 0.5>.]

5.3.4 Estimation of Parameters

This section explains a method for estimating Markov random field parameters from a given set of training patterns. Nonhomogeneous Markov random fields involve many more parameters than homogeneous ones [25]. The first approach simplifies the computational requirements of the estimation procedure by approximating the probability distribution μ(X) of a configuration X:

μ(X) = exp{-Q_t(X)} / \sum_{X' \in T} exp{-Q_t(X')}.   --(10)

For convenience of notation we define the count vector C as

C = (s_i, s_b, s_{ii}, s_{bb}, s_{bi})

and the parameter vector A as

A = (a_i, a_b, a_{i,i}, a_{b,b}, a_{b,i}).

Then exp{-Q_t(X)} = exp{-<C,A>}, where < , > denotes the vector dot product. The denominator of (10) can be expressed as

\sum_{X' \in T} exp{-Q_t(X')} = 2^{mn} E[exp{-<C,A>}]   --(11)

for an m×n lattice, where E is the expectation assuming all configurations are equally likely. We expect the count vector C to converge to a multivariate normal random vector as the pattern size increases. Monte Carlo runs performed to verify this for the triangular pattern structure showed that C does seem to converge in distribution to a multivariate normal distribution with mean vector E[C] and covariance matrix R, where R is the 5×5 matrix of second order moments of the C vector. Mardia's [26,70] skewness and kurtosis tests for multivariate normality were the criteria. The Monte Carlo runs involved picking a template size N and then creating sets of 500 templates at random. The count vector was computed for each of these configurations, and Mardia's test was applied to each set of 500 count vectors. The procedure was repeated 100 times and the numbers of rejections of the null hypothesis by the skewness and kurtosis tests were observed. This test was conducted for pattern types 10, 15, 20, 40 and 50. For N less than 20, the multivariate normal hypothesis was consistently rejected by the skewness test, but there were fewer than five rejections by the kurtosis test at the 5% significance level. For pattern types 40 and 50 the hypothesis was accepted at the 5% significance level by both tests. Therefore, the approximation to the denominator seems adequate, and we write

E[exp{-<C,A>}] ≈ exp{-<E[C],A> + 0.5 A'RA},

which reduces (10) to

μ(X) ∝ exp{-<C - E[C], A> - 0.5 A'RA}.

If R is nonsingular, the maximum likelihood estimate can be written as

Â = R^{-1} (C - E[C]),

where C is the vector of observed counts; E[C] and R depend on the ideal pattern structure.

Table 1 indicates that this estimation procedure worked poorly for the configuration of triangles. One possible explanation is that the parameters chosen generate distorted, atypical configurations and that the degree of nonhomogeneity is very high: there is a high degree of attraction among boundary 1's in the change template, but the distribution of 1's in the interior is quite random. The normal approximation of the probability function is exact at the origin of the parameter space; as one moves further away from the origin, this approximation worsens.
Table 1 indicates that this estimation procedure worked poorly for the configuration of triangles. One possible explanation is that the parameters chosen generate distorted, atypical configurations and that the degree of nonhomogeneity is very high: there is a high degree of attraction among boundary 1's in the change template, but the distribution of 1's in the interior is quite random. The normal approximation of the probability function is exact at the origin of the parameter space, and it worsens as one moves further away from the origin. This was illustrated when the estimation procedure was applied to sets of triangular patterns of type 15 generated by the Markov field model with the parameters listed in Table 1. A second explanation is that the approximation is linear but the function is not, so the parameters are not unique; i.e., several parameter sets produce the same probability values but for different sets of configurations.

Table 1: Parameter estimates of the two schemes for the Markov random field model. Entries are <a_i, a_b, a_ii, a_bb, a_bi>.

   Generative                      Estimated, scheme 1            Estimated, scheme 2
1) 0.0, 0.0, 0.0, 0.0, 0.0        **                             0.0, -0.2, 0.0, 0.1, 0.0
2) 1.0, 1.0, 0.0, 0.0, 0.0        2.5, 2.5, -0.8, -1.1, -0.8     1.2, 1.2, 0.0, 0.1, 0.0
3) 2.0, 1.0, 0.0, 0.0, 0.0        ?.4, 3.6, -2.1, -1.6, -1.6     1.8, 1.0, 0.1, 0.3, 0.1
4) 2.0, 1.0, 0.0, -1.0, 0.0       5.5, 2.9, -2.1, -1.6, -1.6     1.8, 1.3, 0.0, -1.3, 0.1
5) 2.0, 1.2, 0.0, 0.0, -1.5       5.4, 3.3, -2.0, -1.6, -1.8     2.2, 0.9, -0.1, 0.2, -0.2
6) 2.2, 1.5, 0.1, -1.0, -0.8      6.4, 4.4, -2.5, -2.3, -2.0     2.0, 1.2, 0.1, -1.1, 0.7
7) 2.2, 1.5, 0.5, 0.5, 0.5        5.4, 4.4, -2.1, -2.1, -1.7     1.5, 1.3, 0.5, 0.7, 0.5
8) 2.2, 1.5, -0.5, -0.5, -0.5     5.9, 4.4, -2.3, -2.1, -1.9     1.6, 1.5, -0.4, -0.4, -0.4
9) 2.2, 1.5, -0.2, -0.8, 0.1      6.3, 4.7, -2.3, -2.5, -1.9     1.7, 1.4, -0.3, -0.9, -0.05

** could not compute; problems in numerical computation.

The poor results obtained in Table 1 prompted us to use the exact form of the likelihood function to estimate the five parameters. The log likelihood function for the probability distribution can be written as

L = sum over x_i in X of ln p(x_i = x | X)

where p(x_i = x | X) is given in equation (6). The terms in the summation above are not all independent, because of the nature of the Markov random field scheme. Therefore coding schemes [8,25], which break up the lattice into disjoint sets of independent points, are employed. A first order scheme requires at least two codings for estimation purposes, so we break L up into parts L(1) and L(2), corresponding to the two codings. The maximum likelihood estimate requires the solution of five simultaneous nonlinear equations for each coding. This system of equations can be solved numerically by a multivariable extension of Newton's method [54]. The final estimate for A is the average of the estimates obtained from the coding schemes. Although the estimation procedure is quite complex, it does yield accurate estimates for the parameter set A, as illustrated in Table 1, and the estimates should become more accurate as the pattern size increases.
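A structural sketch of the second scheme follows; it is only an outline. A first order neighbourhood is assumed, so that the two codings are the two colours of a checkerboard, and site_log_prob is a placeholder for the logarithm of the conditional probability p(x_i = x | X) of equation (6), which is not reproduced here. A general purpose optimizer stands in for the multivariable Newton iteration of [54].

```python
import numpy as np
from scipy.optimize import minimize

def coding_estimate(X, site_log_prob, A0=np.zeros(5)):
    """Scheme 2 sketch: maximise the coding likelihoods L(1) and L(2) separately,
    then average the two estimates of A = (a_i, a_b, a_ii, a_bb, a_bi)."""
    m, n = X.shape
    estimates = []
    for coding in (0, 1):                     # the two checkerboard codings
        sites = [(r, c) for r in range(m) for c in range(n) if (r + c) % 2 == coding]

        def neg_log_lik(A, sites=sites):
            # L(k): sum over the coding's (conditionally independent) sites
            return -sum(site_log_prob(X, r, c, A) for r, c in sites)

        estimates.append(minimize(neg_log_lik, A0, method="BFGS").x)
    return np.mean(estimates, axis=0)         # final estimate: average of the codings
```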
5.4 Summary

In this chapter we have studied the independent noise model, two independent distortion models and a distortion model based on Markov random fields, and presented their capabilities and restrictions. The Markov random field model is very rich and generalizes the other models: setting the second order parameters to 0 with a_i = a_b creates the independent noise model, while setting the second order parameters to 0 but requiring a_i different from a_b describes the independent distortion model. For each model we state a random experiment for the generation of patterns based on an ideal structure, and the results of applying these experiments to specific configurations are illustrated. Parameter estimation from a finite sample set and the scheme for computing the probabilities of blocks, obtained by applying the nonterminal rewriting rules, are also presented. The estimation procedure for the Markov random field distortion model is quite complex, and we have presented an approximation that simplifies this computation and is applicable in certain cases.

These noise and distortion models are quite general and may be used in other image modelling applications. For example, the nonhomogeneous Markov random field model can be applied to texture analysis and the analysis of plant and crop data [8].

CHAPTER VI
EXPERIMENTAL RESULTS AND COMPUTATION

This chapter presents applications of the array grammar inference scheme. The two structures studied are the triangular patterns and the numeral '7', whose array grammars and ideal structures are presented in Figures 2 and 3 in Chapter 2. The training sets are versions of the ideal structures corrupted by three types of noise and distortion. The first set of experiments uses the two dimensional discrepancy measure defined in Section 4.4 to compare the robustness of the inference algorithm. Robustness properties are further investigated by the parsing experiments of Section 6.2. The second objective is to examine the effect of errors in the ideal configuration: a set of noisy patterns is generated after rotating the ideal structure slightly, and the rate at which the array grammars inferred from the training sets generated by the three corrupting processes accept these samples is determined. The last section suggests the use of parallel techniques to reduce the computation time for the array grammar inference scheme.

6.1 The Inference Scheme

This section applies the inference scheme presented in Section 3.2 to training sets whose patterns are generated by the independent noise, independent distortion and Markov random field distortion models of Chapter 5. The triangular pattern structure, illustrated in Figure 2 (Chapter 2), was used for most of our experiments. The remaining experiments infer a p-array grammar from noisy versions of the figure '7' (Fig. 3, Chapter 2); this illustrates the concept of coding matrix samples into strings using vector primitives.

The parameters used to generate each of the training sets are summarized in Table 2. All the training sets contained 20 patterns. With the independent noise and distortion models the pattern types were chosen uniformly from the range [7,10]. With Markov random field distortion, the training set was made up of 10 triangles of type 9 and 10 triangles of type 10; this ensured a sufficient number of patterns of each size for parameter estimation. The training sets for the '7's were either 8x8 or 11x11 arrays, corresponding to pattern types 2 and 3, corrupted by the independent noise model. The inference of the array grammar for the '7's takes an enormous amount of computer time and memory, so we did not employ the distortion models with it.
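Generation of a single training pattern under the two independent models amounts to flipping pixels independently; a minimal sketch follows. The routine ideal_triangle and the boundary/interior labelling function are assumptions made here, and the parameter values are those of Table 2.

```python
import random

def corrupt_independent(ideal, is_boundary, p=None, p_i=None, p_b=None):
    """Flip each pixel of the ideal pattern independently.
    Independent noise      : pass p (one change probability everywhere).
    Independent distortion : pass p_i (interior) and p_b (boundary)."""
    return [[px ^ (random.random() <
                   (p if p is not None else (p_b if is_boundary(r, c) else p_i)))
             for c, px in enumerate(row)]
            for r, row in enumerate(ideal)]

# Triangle training sets of Table 2 (20 patterns, type N drawn uniformly from 7..10):
# noise_set = [corrupt_independent(ideal_triangle(random.randint(7, 10)),
#                                  is_boundary, p=0.05) for _ in range(20)]
# dist_set  = [corrupt_independent(ideal_triangle(random.randint(7, 10)),
#                                  is_boundary, p_i=0.05, p_b=0.15) for _ in range(20)]
```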
The probability of change parameters p, p_i and p_b were chosen so as to allow a reasonable amount of deviation without completely destroying the underlying pattern structure. The value of p_b was set higher than p_i so as to allow greater distortion in the boundary than in the interior regions. For the Markov random field model the parameter values were selected after a number of test runs. Parameters a_i and a_b in Table 2 introduce reasonable amounts of distortion into the patterns, with more changes likely in the boundary regions, and the values chosen for a_ii, a_bb and a_bi produce reasonable amounts of distortion along the three edges of the triangle and random changes in the interior regions.

Table 2: Parameter values used to generate the training sets (N denotes the pattern type).

   Generative model                     Parameter values                                Number of patterns
I) Pattern structure: triangles
   1) Independent noise                 p = 0.05, N from U[7,10]                        20
   2) Independent distortion            p_i = 0.05, p_b = 0.15, N from U[7,10]          20
   3) Markov random field distortion    A = <2.2, 1.5, 0.1, -1.0, -0.8>, N = 9, 10      20
II) Pattern structure: '7'
   1) Independent noise                 p = 0.05, N from U[2,3]                         20

The first part of this section presents the results obtained at each stage of the inference scheme. The second part studies the robustness of the inference algorithm and demonstrates that the Markov random field model is quite general and covers the other two schemes.

6.1.1 Implementing the Inference Scheme

The first step in the inference scheme, described in Fig. 5 (Chapter 3), is to estimate the parameters of the model from the patterns of the training set, as described in Chapter 5. The noise/distortion process used to generate the training patterns is referred to as the generative model, whereas the model assumed when estimating the parameters for the inference scheme is termed the inferential model. With the independent noise and distortion inferential models, estimation involved counting over the entire pattern set. However, the estimation procedure for the Markov random field model depends on pattern size, so parameters were estimated separately for each size; since the pattern sizes are small, the counts for all the patterns of one size were accumulated together for estimation. Our final estimates average the values obtained over all sizes and the two codings (Section 5.3.4). The estimates for each of the training sets appear in Table 3.

Table 3: Estimated parameter values.

I) Pattern structure: triangles
   1) Independent noise                 p = 0.0468
   2) Independent distortion            p_i = 0.0543, p_b = 0.137
   3) Markov random field distortion    A =
II) Pattern structure: '7'
   1) Independent noise                 p = 0.051

Step 2 in Fig. 5 breaks up the patterns into individual blocks, computes their probabilities, and separates them into their respective intermediate languages. For the '7's the blocks were coded into 'vector primitives' and converted to string form as shown in Fig. 16.

Fig. 16: Coding blocks into strings for the numeral '7'. Intermediate language A uses vector primitives V1-V8 (for example, V1 --> X . . , V2 --> . X X , V3 --> . . . , V4 --> X X X) and intermediate language B uses vector primitives W1-W8; each block is coded, primitive by primitive, into a string such as V3 V4 V2 V3 V3 V3 or W8 W2 W3 W4 W5 W6 W5 W1 W5 W5 W5.

The k-tail method for inferring regular string grammars produces a number of candidate grammars corresponding to different tail lengths. To emphasize structural differences, weight factors b1 = 1.0 and b2 = 10.0 were used to compute the overall discrepancy (Section 4.2.3). The complexity and discrepancy values were plotted as functions of tail length, and the grammar corresponding to the tail length closest to the point of intersection of the two graphs was selected as the 'best' grammar for the intermediate language. Some of these plots are illustrated in Figure 17.
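The selection rule can be stated compactly: pick the tail length nearest the point where the complexity and discrepancy curves cross. In the sketch below each curve is first scaled to [0,1]; this scaling is an assumption made here, since the two measures need not share a scale in Figure 17.

```python
def best_tail_length(tail_lengths, complexity, discrepancy):
    """Return the tail length closest to the crossing point of the
    complexity and discrepancy curves (both indexed like tail_lengths)."""
    def scale(v):
        lo, hi = min(v), max(v)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in v]

    c, d = scale(complexity), scale(discrepancy)
    gaps = [abs(ci - di) for ci, di in zip(c, d)]
    return tail_lengths[gaps.index(min(gaps))]
```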
Fig. 17: Complexity and discrepancy versus tail length, for intermediate languages A and B, under two generative models ((II): MRF distortion).

The inference procedure was implemented as a Spitbol (Snobol) program on the Cyber 170/750 computer; to reduce the total computation time, the discrepancy computation was implemented in a separate Fortran routine. The run time and the amount of core memory required by the inference program are presented in Table 4 for each of the training sets. LA and LB correspond to the two intermediate languages for each array grammar, as illustrated in Figs. 2 and 3 (Chapter 2). The table indicates that the run time and core requirements for the inference algorithm are extremely high, even for the moderate pattern sizes used in our experiments. This was the main reason for limiting the number of experiments conducted.

Table 4: Run times and core requirements.

   Generative model              Run time (secs.)        Max. core used* (K-words)
                                 LA        LB            LA        LB
Pattern structure: triangle
   1) Independent noise          295.1     506.7         72.6      92.5
   2) Independent distortion     361.2     559.9         72.6      72.6
   3) Markov random field        299.4     909.3         72.6      98.5
Pattern structure: '7'
   1) Independent noise          867.3     906.2         76.6      83.2

* Each word on the Cyber is 60 bits long. The maximum core available to a user is 127.5 K-words.

6.1.2 Robustness Tests

The purpose of the robustness test is to assess the effectiveness of the inference procedure when the generative and inferential models are not the same. The two dimensional discrepancy measure assesses the fit of the inferred array language to the training set.

The triangle was used as the ideal structure. Three training sets of 20 patterns each were formed: 10 of the patterns were of type 9 and the other 10 were of type 10. The parameters for the generation process were the same as in Table 2, and the corresponding array grammars were inferred in the usual way. The discrepancy measures are presented in Table 5.

Table 5: Robustness test using the discrepancy measure.

   Generative model      Inferential model     Suitability value
1) Indep. noise          Indep. noise          1.061 E-16
2) Indep. noise          Indep. distort.       2.090 E-16
3) Indep. noise          MRF distort.          0.245 E-16
4) Indep. distort.       Indep. noise          17.62 E-16
5) Indep. distort.       Indep. distort.       1.933 E-16
6) Indep. distort.       MRF distort.          0.661 E-16
7) MRF distort.          Indep. noise          37.91 E-16
8) MRF distort.          Indep. distort.       34.02 E-16
9) MRF distort.          MRF distort.          0.050 E-16

Rows 1, 2 and 3 of Table 5, with the independent noise generative model, show that the Markov random field distortion model subsumes the independent noise model. Rows 4, 5 and 6 show that with the independent distortion generative model, an inferential model of independent noise creates a high discrepancy, whereas a Markov random field inferential model produces an even better fit than the "correct" inferential model. Comparing rows 3 and 6 with the discrepancy values obtained in rows 7, 8 and 9 emphasizes the generality of the Markov random field inferential model, which produces more suitable grammars than the other two inferential models create for the same generative model (rows 1 and 5).
This justifies the use of the Markov random field inferential model.

6.2 Parsing Experiments

The two experiments reported in this section compare acceptance probabilities for triangles generated by the three different noise/distortion models. In the first experiment, the array grammars used for parsing were those inferred in Section 6.1.1 (Table 3, part I), and the patterns parsed were generated from the same array grammars. The second experiment created 15 distorted triangles from the physical processes themselves, five under each of the noise/distortion models. The array grammars used to parse them were inferred as in experiment I, except that intermediate grammars of tail length one were selected; these array grammars are less restrictive than the grammars of experiment I.

The results of experiment I are given in Table 6. Except for pattern number 14, the probability of acceptance for each pattern was highest when the pattern was parsed by the array grammar that generated it. Some of the patterns generated by one of the corruption processes could not be parsed by the array grammars trained on patterns from the other corruption processes, which demonstrates differences among the array grammar models of Section 6.1.1.

Table 6: Acceptance probabilities - patterns generated from array grammars.

   Source   Pattern number   Size   INN           IND           MRF
   INN      1                9      0.158 E-17    0.032 E-17    0.0001 E-17
   INN      2                10     0.199 E-19    0.026 E-19    0.0001 E-19
   INN      3                10     0.203 E-22    R             R
   INN      4                11     0.681 E-23    0.026 E-23    R
   INN      5                11     0.602 E-26    0.049 E-26    R
   IND      6                9      0.104 E-19    0.207 E-19    R
   IND      7                10     0.013 E-27    0.233 E-27    R
   IND      8                10     R             0.424 E-26    0.179 E-26
   IND      9                11     R             0.275 E-29    R
   IND      10               11     0.178 E-31    0.512 E-31    R
   MRF      11               9      R             R             0.629 E-25
   MRF      12               10     R             0.009 E-30    0.716 E-30
   MRF      13               10     R             R             0.747 E-32
   MRF      14               11     0.017 E-31    0.327 E-31    0.004 E-31
   MRF      15               11     R             R             0.353 E-35

INN - independent noise model; IND - independent distortion model; MRF - Markov random field model. R - rejection: the pattern could not be parsed by that grammar.

The results of the second experiment are listed in Table 7. The grammar based on the Markov random field model accepted all 15 patterns, whereas the grammars based on the independent models could parse only two of the patterns generated under the Markov random field model. This illustrates that the Markov random field model generalizes both the other models. Table 7 also shows that the independent distortion model generalizes the independent noise model. In a few cases, a pattern was accepted with higher probability by an inferential model different from the generative model (e.g. patterns 1 and 3), but this is more likely a spurious result caused by the random nature of the generative process.

Table 7: Acceptance probabilities - patterns generated from the physical processes.

   Source   Pattern number   Size   INN           IND           MRF
   INN      1                6      0.051 E-11    0.176 E-11    0.086 E-11
   INN      2                6      0.165 E-13    R             0.294 E-13
   INN      3                11     0.354 E-33    0.530 E-33    0.014 E-33
   INN      4                11     0.508 E-32    R             0.006 E-32
   INN      5                12     0.191 E-30    0.015 E-30    < E-33
   IND      6                6      0.016 E-13    0.193 E-12    0.308 E-12
   IND      7                6      R             0.164 E-13    0.002 E-13
   IND      8                12     0.009 E-38    0.801 E-38    0.074 E-38
   IND      9                12     R             0.591 E-39    0.410 E-39
   IND      10               12     R             R             0.266 E-43
   MRF      11               12     R             R             0.512 E-45
   MRF      12               12     0.0002 E-47   R             0.180 E-47
   MRF      13               12     < E-51        R             0.141 E-47
   MRF      14               12     R             R             0.488 E-46
   MRF      15               12     R             R             0.144 E-52

INN - independent noise model; IND - independent distortion model; MRF - Markov random field model. R - rejection: the pattern could not be parsed by that grammar. < - the acceptance probability was less than the value indicated.
6.3 Recognition of Rotated Triangles

The purpose of this experiment is to examine the effects of small rotations of the ideal configuration on the acceptance rate. A triangle of type 13 was rotated by 10 degrees, as shown in Figure 18. Ten patterns were created by imposing the independent noise model, with a probability of change of 0.05, on the rotated triangle, and these were parsed by the three array grammars of Section 6.1.1; the results are presented in Table 8. Some of the noisy rotated triangles are shown in Figure 19.

Fig. 18: Rotated triangle. The boundary of the original 13x13 triangle and the boundary of the triangle rotated by 10 degrees.

The Markov random field model grammar accepted 9 of the 10 rotated patterns, though the probability of acceptance was quite low (the acceptance value is of the order of E-47 for the ideal triangular structure of size 15). The noise and distortion model grammars failed to recognize these triangles. This shows that an inference scheme based on the Markov random field distortion model can incorporate small variations in the ideal structure into the distortion process.

Table 8: Acceptance probabilities - rotated triangles.

   Source   Pattern number   INN   IND   MRF
   INN      1                R     R     0.132 E-97
   INN      2                R     R     0.408 E-104
   INN      3                R     R     0.482 E-93
   INN      4                R     R     0.262 E-92
   INN      5                R     R     0.725 E-95
   INN      6                R     R     R
   INN      7                R     R     0.388 E-97
   INN      8                R     R     0.434 E-101
   INN      9                R     R     0.275 E-92
   INN      10               R     R     0.723 E-97

INN - independent noise model; IND - independent distortion model; MRF - Markov random field model. R - rejection: the pattern could not be parsed by that grammar.

Fig. 19: Noisy rotated triangles.

6.4 Parallel Computation Techniques

The inference algorithm for regular grammars, the algorithm for computing the discrepancy measure and the computation of the measure of suitability for array languages require large amounts of computer time in the normal sequential mode of operation. However, parallel computation techniques, such as SIMD (Single Instruction, Multiple Data) and pipeline architectures, offer significant speedup. Parallel algorithms have been presented for the k-tail inference scheme for string grammars [16] and for the computation of the WLD between two strings [16,17]; these can be implemented on general purpose pipeline processors [6] or dedicated SIMD systems [17]. The time complexity of the WLD computation for two strings of lengths m and n is reduced from O(mn) to O(m+n), but requires min(m,n)+1 processing elements [17]. We now extend the parallel algorithm for computing the WLD for strings to a parallel algorithm for computing WLD2, the distance between two arrays.
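For reference, the string WLD that both the parallel string scheme and Procedure A below rely on is the familiar O(mn) dynamic program; a sketch with insertion, deletion and substitution costs IC, DC and SC is given here.

```python
def wld(a, b, IC=1.0, DC=1.0, SC=1.0):
    """Weighted Levenshtein distance between two strings (or symbol lists)."""
    m, n = len(a), len(b)
    D = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i * DC                      # delete all of a[:i]
    for j in range(1, n + 1):
        D[0][j] = j * IC                      # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else SC
            D[i][j] = min(D[i - 1][j] + DC,   # delete a[i-1]
                          D[i][j - 1] + IC,   # insert b[j-1]
                          D[i - 1][j - 1] + sub)
    return D[m][n]
```

The parallel versions of [16,17] evaluate all entries along an anti-diagonal i+j = constant simultaneously, which is what reduces the time to O(m+n) with min(m,n)+1 processing elements.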
This algorithm, a parallel implementation of the algorithm from Moore [76], is presented below. The notation WLD(COL-J, COL-L) denotes the string WLD between column J of the IxJ subarray of A and column L of the KxL subarray of B; WLD(ROW-I, ROW-K) has a similar interpretation. SUBS(A_PQ, B_RS) represents the substitution cost of replacing element A_PQ of array A by element B_RS of array B; it is 0 if A_PQ = B_RS, and SC (the substitution cost) otherwise.

Procedure A:

Input: Two arrays, A and B, of sizes PxQ and RxS respectively; SC, the substitution cost; IC, the insertion cost; and DC, the deletion cost.

Begin
(1) Set Dist(0,0,0,0) = 0.
(2) For M = 1 to P+Q+R+S, do in parallel for all I, J, K, L = 0 to M such that I+J+K+L = M, steps (i)-(xvi):
    (i)    e1  = Dist(I,J,K,L-1) + K*IC
    (ii)   e2  = Dist(I,J,K-1,L) + L*IC
    (iii)  e3  = Dist(I,J-1,K,L) + I*DC
    (iv)   e4  = Dist(I-1,J,K,L) + J*DC
    (v)    e5  = Dist(I,J,K-1,L-1) + (K*L-1)*IC
    (vi)   e6  = Dist(I-1,J-1,K,L) + (I*J-1)*DC
    (vii)  e7  = Dist(I,J-1,K,L-1) + WLD(COL-J,COL-L)
    (viii) e8  = Dist(I-1,J,K-1,L) + WLD(ROW-I,ROW-K)
    (ix)   e9  = Dist(I,J-1,K-1,L) + L*IC + I*DC
    (x)    e10 = Dist(I-1,J,K,L-1) + K*IC + J*DC
    (xi)   e11 = Dist(I,J-1,K-1,L-1) + (J-1)*DC + WLD(COL-J,COL-L)
    (xii)  e12 = Dist(I-1,J,K-1,L-1) + (K-1)*IC + WLD(ROW-I,ROW-K)
    (xiii) e13 = Dist(I-1,J-1,K,L-1) + (J-1)*DC + WLD(COL-J,COL-L)
    (xiv)  e14 = Dist(I-1,J-1,K-1,L) + (I-1)*DC + WLD(ROW-I,ROW-K)
    (xv)   e15 = Dist(I-1,J-1,K-1,L-1) + WLD(ROW-I,ROW-K) + WLD(COL-J,COL-L) - SUBS(A_PQ,B_RS)
    (xvi)  Dist(I,J,K,L) = min(e_i : 1 <= i <= 15)
    End loop
End loop
Distance(A,B) = Dist(P,Q,R,S)
End Procedure.

At each step in the WLD2 computation, the string WLDs between strings of lengths I and K, and of lengths J and L, are computed. The WLD2 computation for PxQ and RxS arrays therefore requires P+Q+R+S steps with a time complexity of O((P+Q)+(R+S)) at each step, so the overall time complexity can be expressed as O((P+Q+R+S)^2). For P = Q = R = S = N this reduces to O(N^2), as opposed to O(N^6) for the sequential case.

The general expression for the maximum number of processors required is extremely complicated, but for I = J = K = L = N it can be expressed as

(N+1)(2N^2 + 4N + 3)/3,   for N >= 2    --(1)

For N = 1, seven processors are required. This shows that the number of processors required is of the order of N^3. The WLD2 computation for two 20x20 arrays requires 6181 processors, a rather large number. A more practical scheme might involve using a smaller number of processors, say U, and a pipeline architecture. Ignoring the set-up delay, i.e. the time taken to structure the pipeline for the computation, the number of time steps required for the computation can be expressed as the sum, over i = 1 to I+J+K+L, of ceil(M_i / U), where M_i is the number of processors required at step i of step (2) of the above algorithm. For I = J = K = L = N, M_i can be expressed recursively as

M_i = (i+1)(i+2)(i+3)/6,              1 <= i <= N
    = M_{i-1} + (i+1),                 i = N+1
    = M_{i-1} + (2N-1) - 3(i-N-1),     N+2 <= i <= 2N
    = M_{4N-i},                        i > 2N
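Expression (1) can be checked by brute force if one processing element is assumed per index tuple active at a step of Procedure A, so that M_i is simply the number of (I,J,K,L) with I+J+K+L = i and each index within its range.

```python
from itertools import product

def wavefront_sizes(P, Q, R, S):
    """M_i for Procedure A: number of (I,J,K,L) with I+J+K+L = i,
    0 <= I <= P, 0 <= J <= Q, 0 <= K <= R, 0 <= L <= S."""
    sizes = [0] * (P + Q + R + S + 1)
    for I, J, K, L in product(range(P + 1), range(Q + 1), range(R + 1), range(S + 1)):
        sizes[I + J + K + L] += 1
    return sizes

N = 20
print(max(wavefront_sizes(N, N, N, N)))            # 6181
print((N + 1) * (2 * N * N + 4 * N + 3) // 3)      # expression (1): also 6181
```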
Kleene closure by rows is defined as follows: let M^1 = M, M^2 = M^1 (+) M, ..., M^(n+1) = M^n (+) M, where (+) denotes row catenation.

APPENDIX B
ARRAY AUTOMATA

An array automaton consists of a two dimensional input tape and a two dimensional storage tape. If the input is an mxn array of cells, the storage tape contains a (2m+1)x(2n+1) array of cells. The automaton moves in three stages. In the first, the automaton acts on the storage tape, reading, printing and "pushing" symbols around according to the nonterminal production rules, ultimately subdividing the (2m+1)x(2n+1) storage tape into blocks, each of which corresponds to a matrix of a corresponding IML. In the second stage, the automaton acts as several automata and fills up each symbol (i,j) of the storage tape for which i and j are both even numbers. The blocks can have a fixed number (say k >= 1) of rows (or columns), so the automaton must act on these rows or columns simultaneously. The third stage consists entirely of checking: the automaton compares the input symbol in the (i,j)th cell with the symbol in the (2i,2j)th cell of the storage tape. If the comparison fails at any cell the automaton halts without accepting; otherwise the automaton halts and accepts the input. The set of patterns accepted by array automata is equivalent to the set of arrays generated by array grammars [60]. We conclude this section with a formal definition of a (regular:regular) array automaton {(R:R) AA}.

The (regular:regular) array automaton is a 10 tuple (K, I, Z, d1, d2, d3, q0, Z0, F, C), where

K is a finite set of states;
q0 in K is a distinguished (start) state;
F, a subset of K, is the set of final states;
I is the input alphabet, i.e. the primitive picture symbols;
Z = Z1 U Z2 U Z3, with Z3 = I, is the finite set of storage symbols;
Z0 in Z1 is the initial storage symbol;
d1 is a mapping from {q0} x Z1 into finite subsets of {q0} x {Z1 (+) Z2} or {q0} x {Z2 (+) Z1}, where (+) stands for either row catenation or column catenation;
d2 is a mapping from K x Z2 into finite subsets of K x {(Z2 (+) Z3) U (Z3 (+) Z2)};
d3 is a mapping from K x I x I into K x {DOWN, TOP};
C is a counter that starts from an initial value of 0 and keeps track of the d1 moves of the automaton.

The d1 moves of the automaton correspond to the nonterminal rewriting rules of the corresponding array grammar and create the block structure description of the required size. The d2 moves constitute the second stage and correspond to filling each block independently with the primitive symbols of the picture. The d3 mapping corresponds to the checking phase, where the input picture is compared to the picture on the storage tape; DOWN implies a move downward along a column and TOP implies a change to the top of the next column.

APPENDIX C
MATRIX GRAMMARS

A matrix grammar is a two tuple MG = (G1, G2), where G1 = (V1, I1, P1, S) is a string grammar that can be context-sensitive, context-free or regular:

V1 is a finite set of horizontal nonterminals;
I1 is a finite set of intermediates {S1, S2, ..., Sk};
P1 is a finite set of production rules;
S in V1 is the start symbol;

and G2 is the union, for i = 1 to k, of the vertical grammars G2i, where each G2i = (V2i, I, P2i, Si) is a right linear (regular) grammar:

V2i is a finite set of vertical nonterminals;
P2i is a finite set of right linear production rules;
I is the set of picture primitives; and
V2i and V2j are disjoint if i is not equal to j.

Derivations to create an mxn picture follow a two step procedure. First a string s1 s2 ... sn of symbols from I1 is generated horizontally using G1. Then the corresponding G2i rules are applied simultaneously to every column, m times, to create an mxn matrix. The set of all matrices generated by a matrix grammar is denoted

L(MG) = {mxn arrays [aij], m,n >= 1 | S => s1 s2 ... sn => [aij]}.
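As an illustration of the two step derivation, the following toy matrix grammar (invented here, not taken from the text) draws an L-shaped picture: G1 produces a horizontal string of intermediates, and each intermediate's right linear vertical grammar is then expanded in parallel down its column.

```python
def derive_horizontal(n):
    """G1 (regular): S -> S1 A, A -> S2 A | S2, giving one S1 followed by S2's."""
    return ["S1"] + ["S2"] * (n - 1)

def derive_column(intermediate, m):
    """G2i (right linear): S1 -> X S1 | X;  S2 -> . S2 | X (terminating after m rows)."""
    return ["X"] * m if intermediate == "S1" else ["."] * (m - 1) + ["X"]

def derive_picture(m, n):
    cols = [derive_column(s, m) for s in derive_horizontal(n)]
    return ["".join(col[r] for col in cols) for r in range(m)]

for row in derive_picture(4, 6):
    print(row)
# X.....
# X.....
# X.....
# XXXXXX
```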
LIST OF REFERENCES

[1] A.V. Aho and T.G. Peterson, 'A minimum distance error-correcting parser for context free languages', SIAM Jour. Computing, vol. 4, 1972, pp. 304-312.

[2] J.E. Albus, 'Electrocardiogram interpretation using a stochastic finite state model', in Syntactic Pattern Recognition: Applications (K.S. Fu, ed.), Springer Verlag, Berlin, Heidelberg, 1977, pp. 51-64.

[3] F. Ali and T. Pavlidis, 'Syntactic recognition of handwritten numerals', IEEE Trans. Sys., Man and Cyber., vol. SMC-7, 1977, pp. 537-541.

[4] M.R. Anderberg, Cluster Analysis for Applications, Academic Press, New York, 1973.

[5] A. Apostolico, 'On the efficiency of syntactic feature extraction and approximation schemes for data compression', Proc. 5th Intl. Conf. on Pattern Recognition, Miami Beach, Florida, vol. 2, 1980, pp. 982-984.

[6] J.L. Baer, Computer Systems Architecture, Chapter 10.2, Computer Science Press, Maryland, 1980.

[7] J.E. Besag, 'Nearest neighbor systems and the auto-logistic model for binary data', J. Royal Stat. Soc., Ser. B, vol. 34, 1972, pp. 75-83.

[8] J.E. Besag, 'Spatial interaction and the statistical analysis of lattice systems', J. Royal Stat. Soc., Ser. B, vol. 36, 1974, pp. 192-236.

[9] A.W. Biermann and J.A. Feldman, 'On the synthesis of finite state machines from samples of their behavior', IEEE Trans. Comput., vol. C-21, 1972, pp. 592-597.

[10] M. Blum, 'On the size of machines', Info. and Control, vol. 11, 1967, pp. 257-265.

[11] T.L. Booth and R.A. Thomson, 'Applying probabilistic measures to abstract languages', IEEE Trans. Comput., vol. C-22, 1973, pp. 442-450.

[12] J.R. Bourne, V. Jagannathan, B. Giese and J.W. Ward, 'A software system for syntactic analysis of EEG', Computer Programs in Biomedicine, vol. 11, 1980, pp. 190-200.

[13] W.S. Brainerd, 'Tree-generating regular systems', Info. and Control, vol. 14, 1969, pp. 217-231.

[14] J.M. Brayer and K.S. Fu, 'A note on the k-tail method of tree grammar inference', IEEE Trans. Sys., Man, Cyber., vol. SMC-7, 1977, pp. 293-300.

[15] J.M. Brayer, 'Web automata and parsing web grammars', Report NSF Workshop on Structural and Syntactic Pattern Recog., Saratoga Springs, New York, 1981, pp. 40-41.

[16] Y. Chiang and K.S. Fu, 'Parallel processing of some syntactic pattern recognition algorithms', Ninth Workshop Appl. Imagery, Maryland, 1980, pp. 1-21.

[17] Y. Chiang and K.S. Fu, 'Parallel processing for distance computation in syntactic pattern recognition', IEEE Comput. Soc. Workshop Comput. Architecture for Pattern Analysis and Image Database Management, VA, 1981, pp. 337-344.

[18] N. Chomsky, Syntactic Structures, Mouton and Co., The Hague, Netherlands, 1957.

[19] N. Chomsky and G.A. Miller, 'Finite state languages', Info. and Control, vol. 1, 1958, pp. 91-112.

[20] V. Claus, et al., eds., Graph Grammars and their Applications to Computer Science and Biology, Springer Verlag, Berlin, Heidelberg, 1979.

[21] C.M. Cook, 'A cost function for concept formation', Tech. Rep. TR-212, Computer Science Center, Univ. of Maryland, College Park, 1972.

[22] C.M. Cook, 'Grammatical inference by heuristic search', Tech. Rep. TR-287, Computer Science Center, Univ. of Maryland, College Park, 1974.

[23] C.R. Cook and P.S. Wang, 'A Chomsky hierarchy of isotonic array grammars and languages', Computer Graphics and Image Processing, vol. 8, 1978, pp. 144-152.

[24] S. Crespi Reghizzi, 'The mechanical acquisition of precedence grammars', Tech. Rep. UCLA-ENG-7054, School of Engg. and Appl. Science, Univ. of California, Los Angeles, 1970.

[25] G.R. Cross, 'Markov random fields and texture models', Ph.D. thesis, Michigan State Univ., E. Lansing, 1980.

[26] G.R. Cross, et al., 'Multivariate normality in pattern recognition and clustering', Proc. 6th Intl. Conf. on Pattern Recog., Munich, Germany, 1982, pp. 862-864.
[27] R. DeMori, et al., 'A syntactic procedure for the recognition of glottal pulses in continuous speech', Pattern Recognition, vol. 9, 1977, pp. 181-189.

[28] P.J. Denning, et al., Machines, Languages and Computation, Prentice-Hall, New Jersey, 1978.

[29] R.L. Dobrushin, 'Gibbsian random fields for lattice systems with pairwise interactions', Funcl. Analy. Applic., vol. 2, 1968, pp. 292-301.

[30] T. Dubitzsky, et al., 'Parallel region property computations by active quadtree networks', IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-3, 1981, pp. 626-633.

[31] R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis, John Wiley, New York, 1973.

[32] M.J.B. Duff, 'Parallel processing techniques', in Pattern Recognition - Ideas in Practice (B.G. Batchelor, ed.), Plenum Press, 1978, pp. 145-176.

[33] J. Earley, 'An efficient context free parsing algorithm', CACM, vol. 13, 1970, pp. 94-102.

[34] J.A. Feldman, et al., 'Grammatical complexity and inference', Tech. Rep. CS-125, Computer Science Dept., Stanford Univ., Stanford, 1969.

[35] J.A. Feldman, 'Some decidability results on grammatical inference and complexity', Info. and Control, vol. 20, 1972, pp. 244-262.

[36] N.V. Findler and J.V. Leeuven, 'A family of similarity measures between strings', IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-1, 1979, pp. 116-118.

[37] P.A. Flinn, 'Monte Carlo calculation of phase separation in a 2-dimensional Ising system', J. Stat. Physics, vol. 10, 1974, pp. 89-97.

[38] H. Freeman, 'On the encoding of arbitrary geometric patterns', IRE Trans. on Electronic Comp., vol. EC-10, 1961, pp. 260-268.

[39] K.S. Fu and T. Huang, 'Stochastic grammars and languages', Intl. Jour. Comp. and Info. Sciences, vol. 1, 1972, pp. 135-170.

[40] K.S. Fu and B.K. Bhargava, 'Tree systems for syntactic pattern recognition', IEEE Trans. Computers, vol. C-22, 1977, pp. 1087-1099.

[41] K.S. Fu, Syntactic Methods in Pattern Recognition, Academic Press, New York, 1974.

[42] K.S. Fu and T.L. Booth, 'Grammatical inference: introduction and survey', Part I, IEEE Trans. Sys., Man, Cyber., vol. SMC-5, 1975, pp. 95-111; Part II, IEEE Trans. Sys., Man, Cyber., vol. SMC-5, 1975, pp. 409-423.

[43] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, New York, 1972.

[44] L.W. Fung and K.S. Fu, 'Stochastic syntactic decoding for pattern classification', IEEE Trans. Comput., vol. C-24, 1975, pp. 662-667.

[45] R.C. Gonzalez and M.G. Thomason, 'Tree grammars and their applications to pattern recognition', Tech. Rep. TR-EE/CS-74-20, Electrical Engineering Department, Univ. of Tennessee, Knoxville, 1974.

[46] R.C. Gonzalez and M.G. Thomason, Syntactic Pattern Recognition: An Introduction, Addison-Wesley, Reading, Mass., 1978.

[47] R.C. Gonzalez and R. Saafabakhsh, 'Computer vision techniques for industrial applications and robot control', Computer, vol. 15, 1982, pp. 17-32.

[48] J. Gruska, 'Some classifications of context free languages', Info. and Control, vol. 14, 1969, pp. 152-179.

[49] J.M. Hammersley and D.C. Handscomb, Monte Carlo Methods, Methuen, London, 1964.

[50] J.M. Hammersley and P. Clifford, 'Markov random fields on finite graphs and lattices', unpublished manuscript.

[51] J. Hartmanis and R.E. Stearns, 'On the computational complexity of algorithms', Trans. American Math. Soc., vol. 117, 1965, pp. 285-306.

[52] J.E. Hopcroft and J.D. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, Mass., 1979.
[53] J.J. Horning, 'A study in grammatical inference', Ph.D. thesis, Stanford Univ., Stanford, CA, 1969.

[54] E. Isaacson and H.B. Keller, Analysis of Numerical Methods, John Wiley, New York, 1966.

[55] V. Isham, 'An introduction to spatial point processes and Markov random fields', Intl. Stat. Review, vol. 49, 1981, pp. 21-43.

[56] S.N. Jayaramamurthy, 'Multilevel array grammars for generating texture scenes', Proc. IEEE Conf. on Pattern Recognition and Image Processing, Chicago, Illinois, 1979, pp. 391-398.

[57] R.L. Kashyap and M.C. Mittal, 'Recognition of spoken words and phrases in multitalker environment using syntactic methods', IEEE Trans. Computers, vol. C-27, 1978, pp. 442-451.

[58] R.A. Kirsh, 'Computer interpretation of English text and picture patterns', IRE Trans. Electronic Computers, vol. EC-13, 1964, pp. 363-376.

[59] H.C.M. Kleijn and G. Rozenberg, 'A study in parallel rewriting systems', Info. and Control, vol. 44, 1980, pp. 134-163.

[60] K. Krithivasan and R. Siromoney, 'Array automata and operations on array languages', Intl. Jour. of Computer Math., vol. 4, Sec. A, 1974, pp. 3-30.

[61] W. Kuich, 'On the entropy of context free languages', Info. and Control, vol. 16, 1970, pp. 173-200.

[62] R.S. Ledley, et al., 'FIDAC: Film input to digital automatic computer and associated syntax-directed pattern recognition programming system', in Optical and Electro-Optical Info. Proc. (J.T. Tippet et al., eds.), MIT Press, Cambridge, Mass., 1965, pp. 591-613.

[63] H.C. Lee and K.S. Fu, 'A stochastic syntax analysis procedure and its application to pattern classification', IEEE Trans. Computers, vol. C-21, 1972, pp. 660-666.

[64] V.I. Levenshtein, 'Binary codes capable of correcting deletions, insertions and reversals', Sov. Phys. Dokl., vol. 10, 1966, pp. 707-710.

[65] B. Levine, 'Derivatives of tree sets with applications to grammatical inference', IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-3, 1981, pp. 285-293.

[66] J. Liou, 'Grammatical inference by a constructive method', Ph.D. dissertation, Dept. of Computer Science, Michigan State Univ., E. Lansing, 1977.

[67] S.Y. Lu and K.S. Fu, 'Stochastic error-correcting syntax analysis for recognition of noisy patterns', IEEE Trans. Comput., vol. C-26, 1977, pp. 1268-1276.

[68] S.Y. Lu and K.S. Fu, 'A syntactic approach to texture analysis', Computer Graphics and Image Processing, vol. 7, 1978, pp. 303-330.

[69] F.J. Maryanski, 'Inference of probabilistic grammars', Ph.D. dissertation, Dept. of Elec. Engg. and Computer Science, Univ. of Connecticut, Storrs, 1974.

[70] K.V. Mardia, 'Applications of some measures of multivariate skewness and kurtosis in testing normality and robustness studies', Sankhya: Ind. J. of Stat., vol. 36, Ser. B, pt. 2, 1974, pp. 142-154.

[71] W.J. Masek and M.S. Paterson, 'A faster algorithm computing string edit distances', J. Comput. and Sys. Sciences, vol. 20, 1980, pp. 18-31.

[72] A. Mercer and A. Rosenfeld, 'An array grammar programming system', Comm. ACM, vol. 16, 1973, pp. 299-305.

[73] N. Metropolis, et al., 'Equations of state calculations by fast computing machines', J. Chem. Physics, vol. 21, 1953, pp. 1087-1091.

[74] D.L. Milgram and A. Rosenfeld, 'Array automata and array grammars', Proc. IFIP Congress-71, vol. 1, North Holland, 1971, pp. 69-74.

[75] B. Moayer and K.S. Fu, 'An application of stochastic languages to fingerprint pattern recognition', Pattern Recognition, vol. 8, 1976, pp. 173-179.

[76] R.K. Moore, 'A dynamic programming algorithm for the distances between two finite areas', IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-1, 1979, pp. 86-88.
[77] J.L. Mundy and R.E. Joynson, 'Automatic visual inspection using syntactic analysis', Proc. IEEE Conf. on Pattern Recognition and Image Processing, Troy, New York, 1977, pp. 144-147.

[78] R. Narasimhan, 'Labelling schemata and syntactic description of pictures', Info. and Control, vol. 7, 1964, pp. 151-179.

[79] R. Narasimhan, 'Syntax directed interpretation of classes of pictures', Comm. ACM, vol. 9, 1966, pp. 166-173.

[80] T. Okuda, et al., 'A method for correction of garbled words based on the Levenshtein metric', IEEE Trans. Comput., vol. C-25, 1976, pp. 172-177.

[81] T.W. Pao, 'A solution of the syntactical induction-inference problem for a non-trivial subset of context free languages', Tech. Rep. 69-19, Moore School of Elec. Engg., Univ. of Pennsylvania, Philadelphia, 1969.

[82] L.F. Pau, 'Semiconductor ICs: integrated testing and algorithms for visual inspection', Proc. 5th Intl. Conf. on Pattern Recog., Miami Beach, Florida, vol. 1, 1980, pp. 238-240.

[83] T. Pavlidis, Structural Pattern Recognition, Springer-Verlag, Berlin, Heidelberg, 1977.

[84] T. Pavlidis and F. Ali, 'A hierarchical syntactic shape analyzer', IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-1, 1979, pp. 2-9.

[85] E. Persoon and K.S. Fu, 'Shape description using Fourier descriptors', IEEE Trans. Sys., Man and Cyber., vol. SMC-7, 1977, pp. 170-179.

[86] J.L. Pfaltz, 'Web grammars and picture description', Computer Graphics and Image Processing, vol. 1, 1972, pp. 193-220.

[87] G.M. Porter, et al., 'Automatic visual inspection of blind holes in metal surfaces', Proc. IEEE Conf. on Pattern Recognition and Image Processing, Chicago, Illinois, 1979, pp. 83-86.

[88] W.K. Pratt, Digital Image Processing, John Wiley (Interscience), New York, 1978.

[89] C.J. Preston, 'Generalized Gibbs states and Markov random fields', Adv. Appl. Prob., vol. 5, 1973, pp. 242-261.

[90] K. Rao and K. Balck, 'Type classification of fingerprints: a syntactic approach', IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-2, 1980, pp. 223-231.

[91] A. Rosenfeld, 'Isotonic grammars, parallel grammars and picture grammars', Machine Intelligence, vol. 4, 1971, pp. 281-294.

[92] A. Rosenfeld and D.L. Milgram, 'Web automata and web grammars', Machine Intelligence, vol. 7, 1972, pp. 307-324.

[93] A. Rosenfeld, 'Array grammar normal forms', Info. and Control, vol. 23, 1973, pp. 173-182.

[94] G. Rozenberg and A. Salomaa, The Mathematical Theory of L Systems, Academic Press, 1980.

[95] A. Salomaa, Formal Languages, Academic Press, New York, 1973.

[96] C.E. Shannon, 'A mathematical theory of communication', Bell Sys. Tech. Jour., vol. 27, 1948, pp. 379-423.

[97] A.C. Shaw, 'A formal picture description scheme as a basis for picture processing systems', Info. and Control, vol. 14, 1969, pp. 9-52.

[98] A.C. Shaw, 'Picture graphs, grammars and parsing', in Frontiers of Pattern Recognition (S. Watanabe, ed.), Academic Press, New York, 1972.

[99] G. Siromoney, et al., 'Abstract families of matrices and picture languages', Computer Graphics and Image Processing, vol. 1, 1972, pp. 284-307.

[100] G. Siromoney, et al., 'Picture languages with array rewriting rules', Info. and Control, vol. 22, 1973, pp. 447-470.

[101] R. Siromoney and K. Krithivasan, 'Parallel context free languages', Info. and Control, vol. 24, 1974, pp. 155-162.

[102] R. Siromoney and G. Siromoney, 'Parallel 0L languages', Intl. J. Comput. Math., vol. 5, Sec. A, 1975, pp. 109-123.
[103] A.R. Smith III, 'Cellular automata and formal languages', IEEE Eleventh Ann. Symp. Switch. and Automata Theory, CA, 1970, pp. 216-224.

[104] A.R. Smith III, 'Cellular automata complexity tradeoffs', Info. and Control, vol. 18, 1971, pp. 466-482.

[105] A.R. Smith III, 'Real time language recognition by one dimensional cellular automata', J. Comput. Sys. Sciences, vol. 6, 1971, pp. 233-253.

[106] S.R. Sternberg, 'Parallel architectures for image processing', Intl. IEEE COMPSAC, Chicago, 1979, pp. 1-6.

[107] G.C. Stockman, et al., 'Structural pattern recognition of carotid waves using a general waveform parsing system', Comm. ACM, vol. 19, 1976, pp. 688-695.

[108] G.T. Tang and T.S. Huang, 'A syntactic-semantic approach to image understanding and creation', IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-1, 1979, pp. 135-144.

[109] E. Tanaka and Y. Kikuchi, 'A metric between pictures', Tech. Rep., Dept. of Info. Science, Utsunomiya Univ., Japan, 1981.

[110] R.A. Thompson, 'Determination of probabilistic grammars for functionally specified probabilistic-measure languages', IEEE Trans. Comput., vol. C-23, 1973, pp. 603-614.

[111] H. Tropt, 'Analysis by synthesis search for semantic segmentation applied to workpiece recognition', Proc. 5th Intl. Conf. on Pattern Recog., Miami Beach, Florida, vol. 1, 1980, pp. 241-244.

[112] W.H. Tsai and K.S. Fu, 'A syntactic-statistical approach to recognition of industrial objects', Proc. 5th Intl. Conf. on Pattern Recog., Miami Beach, Florida, vol. 1, 1980, pp. 251-259.

[113] R.M. Wagner and M.J. Fischer, 'The string-to-string correction problem', J. ACM, vol. 21, 1973, pp. 168-173.

[114] P.S. Wang, 'Sequential/parallel matrix array languages', Journal of Cyber., vol. 5, 1975, pp. 19-36.

[115] P.S. Wang, 'Recognition of two dimensional patterns', Tech. Rep. CS-TR-8-16, Dept. of Computer Science, Univ. of Oregon, Eugene, Oregon, 1978.

[116] P.S. Wang, 'Syntactic structures of parallel isometric array patterns', Tech. Rep. CIS-TR-80-1, Dept. of Comp. and Info. Science, Univ. of Oregon, Eugene, Oregon, 1980.

[117] P.S. Wang, 'Some new results on isotonic array grammars', Info. Proc. Letters, vol. 10, 1980, pp. 129-131.

[118] R.M. Wharton, 'Grammatical inference and approximation', Ph.D. thesis, Univ. of Toronto, Toronto, 1973.

[119] E. Yodokawa and N. Honda, 'On two dimensional pattern generation grammars', Systems, Computers and Controls, vol. 1, 1970, pp. 6-13.

[120] G.T. Zahn and R.Z. Roskies, 'Fourier descriptors for plane closed curves', IEEE Trans. Computers, vol. C-21, 1972, pp. 269-281.

[121] B. Zavidovique and G. Stamon, 'An automated process for electronics scheme analysis', Proc. 5th Intl. Conf. on Pattern Recog., Miami Beach, Florida, vol. 1, 1980, pp. 248-250.