This is to certify that the dissertation entitled "Inference of Array Grammars Under Noise and Distortion" presented by Gautam Biswas has been accepted towards fulfillment of the requirements for the Ph.D. degree in Computer Science. Major professor. Date: December 29, 1982.

INFERENCE OF ARRAY GRAMMARS UNDER NOISE AND DISTORTION

By Gautam Biswas

A DISSERTATION submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY, Department of Computer Science, 1982.

ABSTRACT

INFERENCE OF ARRAY GRAMMARS UNDER NOISE AND DISTORTION

By Gautam Biswas

This thesis research deals with learning the syntactic structure of two dimensional binary patterns that occur in image processing and pattern recognition. The development of a two dimensional inference scheme using a probabilistic array grammar that incorporates noise and distortion models is the primary contribution of this thesis. A two level inference scheme is proposed based on a block structured array grammar. The ideal pattern must be known, which implies knowledge of the nonterminal rewriting rules. The key step is the inference of intermediate grammars from probabilistic samples. Complexity and discrepancy measures are the criteria for selecting a simple and 'best-fitting' grammar. A number of complexity and discrepancy measures for string grammars are reviewed and a new, computationally feasible discrepancy measure that involves both probabilistic and structural factors is defined. The new measure is shown to possess a number of intuitively desirable properties. The discrepancy measure is extended to two dimensions to quantify the suitability of an inferred array language.

The inference scheme uses noise and distortion models to accommodate deviations from the ideal structure among observed patterns. An independent noise model and two intuitive distortion models are analyzed, along with a powerful and comprehensive model based on Markov random fields. The problems of generating patterns and estimating parameters are studied. Robustness tests and parsing experiments demonstrate the versatility of the Markov random field inferential model. Not only is it applicable with a number of noise and distortion models, but it also allows small deviations from the ideal structure, thus providing a flexible inference scheme.

The inference scheme and the computation of the discrepancy measures require large amounts of computer time in the sequential mode of operation. SIMD and pipeline architectures are suggested to speed up computation significantly. Implementation of the proposed scheme with these architectures should open up a number of real time practical applications in industrial vision and robotics.

Dedicated to my father, the late Professor Anil Bhushan Biswas, and my mother, Mrs. Anjali Biswas.

ACKNOWLEDGEMENTS

I am indebted to my thesis advisor, Professor Richard C. Dubes, for his help, guidance and encouragement throughout the course of this thesis work. He has served as an excellent role model and has contributed greatly to the development of my research abilities.
Special thanks go to Professor Raoul LePage, for his patient efforts and many helpful suggestions that contributed greatly to the development of the Markov random field model, and to Professor Anil K. Jain, for his critical review of this thesis work. I thank Professors Lewis Greenberg, Jacob Plotkin and Indranand Sinha for agreeing to serve on my Ph.D. guidance committee. Acknowledgements are also due to Dr. Erdal Panayirci and past and present graduate students of the Pattern Recognition and Image Processing Laboratory - Dr. George Cross, Dr. Steve Smith, Dr. James Coggins, Neal Wyse, Weichung Lin, Ardeshir Goshtashby, Rick Hoffman and Xiabo Li - for their help and suggestions during the course of this work. I must not forget to mention Dr. and Mrs. Subinoy Chakravarty, who have made my stay in E. Lansing a very pleasant one, as well as my wife Sujata, for her (long distance) encouragement through those hectic final months of thesis writing. The financial support provided by NSF grant ECS-8007106 and DER is also gratefully acknowledged.

TABLE OF CONTENTS

LIST OF TABLES ................................................ viii
LIST OF FIGURES .................................................. ix
LIST OF SYMBOLS ................................................... x
CHAPTER 1. INTRODUCTION ........................................... 1
  1.1 Grammatical Inference ....................................... 5
  1.2 Two Dimensional Picture Representation ...................... 8
    1.2.1 Picture Description Languages ........................... 9
    1.2.2 Two Dimensional Grammars ............................... 10
  1.3 Organization and Contributions of this Thesis .............. 12
CHAPTER 2. ARRAY GRAMMARS ........................................ 15
  2.1 Rosenfeld Array Grammars ................................... 15
  2.2 Siromoney Array Grammars ................................... 17
    2.2.1 Formal Definition of Siromoney Array Grammars .......... 19
    2.2.2 Some Useful Properties ................................. 24
    2.2.3 Probabilistic Array Grammars ........................... 26
  2.3 Array Automata ............................................. 27
  2.4 Definition of Type and Block Number ........................ 28
  2.5 Summary .................................................... 29
CHAPTER 3. MODEL DESCRIPTION AND INFERENCE ....................... 31
  3.1 Pattern Description and Learning ........................... 31
  3.2 The Inference Scheme ....................................... 33
  3.3 Summary .................................................... 38
CHAPTER 4. COMPLEXITY AND DISCREPANCY MEASURES ................... 39
  4.1 Complexity Measures ........................................ 40
    4.1.1 Size Complexity Measures ............................... 41
    4.1.2 Information Theoretic Complexity Measures .............. 43
    4.1.3 Choice of Complexity Measure ........................... 44
  4.2 Discrepancy Measures ....................................... 45
    4.2.1 Probabilistic Discrepancy Measures ..................... 47
    4.2.2 Structural Discrepancy Measures ........................ 48
    4.2.3 Choice of Discrepancy Measure .......................... 49
  4.3 New Discrepancy Measure .................................... 50
    4.3.1 Definition ............................................. 50
    4.3.2 Particular Proximity Measures .......................... 53
    4.3.3 Properties of Structural Discrepancy Measure ........... 56
    4.3.4 Computation of the Discrepancy Measure ................. 58
  4.4 Discrepancy Measure for Array Languages .................... 61
  4.5 Summary .................................................... 63
CHAPTER 5.
MODELS FOR NOISE AND DISTORTION ....................... 65
  5.1 The Independence Noise Model ............................... 66
  5.2 Intuitive Distortion Models ................................ 70
    5.2.1 Independence Distortion Model .......................... 72
    5.2.2 Neighborhood Distortion Models ......................... 74
    5.2.3 Applications and Limitations ........................... 80
  5.3 Markov Random Field Distortion Model ....................... 82
    5.3.1 Background ............................................. 82
    5.3.2 The Distortion Model ................................... 85
    5.3.3 Generation of Distorted Patterns ....................... 87
    5.3.4 Estimation of Parameters ............................... 92
  5.4 Summary .................................................... 96
CHAPTER 6. EXPERIMENTAL RESULTS AND COMPUTATION .................. 98
  6.1 The Inference Scheme ....................................... 99
    6.1.1 Implementing the Inference Scheme ..................... 101
    6.1.2 Robustness Tests ...................................... 104
  6.2 Parsing Experiments ....................................... 108
  6.3 Recognition of Rotated Triangles .......................... 110
  6.4 Parallel Implementation Techniques ........................ 114
  6.5 Summary ................................................... 118
CHAPTER 7. SUMMARY, CONCLUSIONS AND FUTURE RESEARCH ............. 119
  7.1 Summary and Conclusions ................................... 119
  7.2 Future Research ........................................... 122
APPENDIX A. NOTATIONS AND DEFINITIONS - ARRAY GRAMMARS .......... 125
APPENDIX B. ARRAY AUTOMATA ...................................... 128
APPENDIX C. MATRIX GRAMMARS ..................................... 131
LIST OF REFERENCES .............................................. 133

LIST OF TABLES

Parameter estimates by the two schemes for the Markov random field model ...... 95
Parameter values used for generating training sets ........................... 100
Estimated Parameter values .................................................... 102
Run Times and Core Requirements ............................................... 106
Robustness Test - using the Two dimensional Discrepancy Measure .............. 107
Acceptance Probabilities - Patterns generated from Array Grammar ............. 109
Acceptance Probabilities - Patterns generated from Physical Process .......... 110
Acceptance Probabilities - Rotated Triangles ................................. 112

LIST OF FIGURES

1. Shearing Effect ............................................... 17
2. Array Grammar - Isosceles Triangle ............................ 22
3. Array Grammar for Numeral '7' ................................. 23
4. Pattern Type and Block Number ................................. 30
5. Flowchart for Inference Scheme ................................ 34
6. Coding Intermediate Languages of '7' into String form ........ 36
7. Triangles generated from Independent Noise Process ........... 68
8. Noisy 7's ..................................................... 69
9. Interior and Boundary Points .................................. 71
10. Triangles generated by Independent Distortion Process ....... 74
11(a). Squared distance to point x .............................. 76
11(b). Orders of Neighbors relative to point x .................. 77
12. Triangles Generated by Neighborhood Process ................. 81
13. Algorithm for generating Templates from MRF distribution with given parameters ... 89
14. Triangles generated by MRF generative model (1) ............. 90
15.
Triangles generated by MRF generative model (2) ... ..... 91 16. Coding Blocks into Strings - 7's ...................... 103 17. Complexity and Discrepancy Measure vs. Tail length .... 105 18. Rotated Triangle ...................................... Ill 19. Noisy Rotated Triangle ................................ 113 Symbol o-em LIST OF SYMBOLS Usage Empty matrix. Row Catenation Column Row or First order parameter for Catenation Column Catenation interior points - Markov random field model First order parameter for boundary points - Markov random field model. Second among Second among Second order parameter for pairwise interaction interior points - Markov random field model. order parameter for pairwise interaction boundary points - Markov random field model. order parameter for pairwise interaction among interior-boundary points - Markov random field model. Weight factor for window function. Parameter vector Markov random field model; A - I,I b,b’ ab,i Specific subset of language L(G). Siromoney Array Grammar Number Number Number of pixels in a block. of interior pixels in block. of boundary pixels in block. General second parameter - Markov random field model. drect dtri Da("') D(.,.) D (...) tot Boundary template Specific subset of language L(G). Number of pixels flipped in a block. Number of internal pixels flipped in a block. Number of boundary pixels flipped in a block. Count vector; C - (si’sb’sii’sbb’sbi> Complexity of string grammar G. Proximity value between string x and language L. Window proximity measure (rectangle or triangle). Minimum proximity measure. Rectangular proximity measure. Triangular proximity measure. Absolute discrepancy measure. Structural discrepancy measure. Total discrepancy measure. Minimum structural discrepancy measure. Rectangular structural discrepancy measure. Triangular structural discrepancy measure. Distance. exponential. Expected value. Probability function. Interacting Potential. Formal grammar. Class of formal grammars. xi IML Intermediate Matrix Language. Kc Number of strings in (cardinality of) set Lc' £(x) length of string x. LA’LB Intermediate languages of array grammar. Lc(x) Window on set L centered at x. LS Specific subset of array language. L(G) Language of grammar G. L(.) Likelihood function. min Minimum. MG Matrix Grammar. MRF Markov random field. n(.) Number of. N Pattern type. ka Pixels in kth order neighborhood of a particular pIxel. p Probability of change pi Probability of change for interior points. pb Probability of change for boundary points. p(x),q(x) Probability of occurrence of string x. p-AG Probabilistic array grammar. P Set of production rules. Pl Nonterminal production rules of array grammar. P2 Intermediate production rules of array grammar. Q Potential function. R Sample covariance matrix. Set of terminal symbols or picture primitives. xii R Finite set of probabilities. RI Probabilities assigned to nonterminal rules. R2 Probabilities assigned to intermediate rules. si Count of interior points that are 1's. sb Count of boundary points that are 1's. sii Count of interior-interior pairs that are both 1's. sbb Count of boundary-boundary pairs that are both l's. sbi Count of interior-boundary pairs that are both l's. S Sample (training) set. S+ POsitive sample set. S- Negative sample set. 3 Start symbol of grammar. T Set of all possible realizations on a binary lattice. T Change template. u(.) Discrete probability distribution - Markov random field. U[.,.] Discrete uniform distribution. V Set of nonterminal symbols. 
VI Set of nonterminal symbols of array grammar. V2 Set of intermediate symbols of array grammar. V1,V2,--,V8 Coded vectors to convert intermediate language to string form. W(.,.) Neighborhood function. W1,W2,--,W8 Coded vectors to convert intermediate language to string form. WLD Weighted Levenshtein Distance for strings. WLD2 Weighted Levenshtein Distance for arrays. xiii x Pixel value (0 or 1). 5 Configuration of pixels on a lattice. z(i,j) Indicator function. Z Normalizing constant for probability distribution - Markov random field. CHAPTER I INTRODUCTION Pattern Recognition techniques are major contributors to the field of machine intelligence and perception. They involve the description and analysis of data, including their categorization into identifiable classes by the extraction of significant features or attributes that are relevant to the problem being studied. The mathematical techniques available to solve pattern recognition problems can be broadly classified into the statistical, or decision-theoretic approach [31,A3], and the structural, or syntactic approach [A1,A6,83]. The first involves measuring characteristic features on the input data and then classifying the resulting patterns by partitioning the feature space according to categories or pattern classes. An example is the use of Fourier descriptors [85,120] to describe the shape of objects Afrom their outer boundaries. The structural approach involves the des- cription of objects, or patterns, in terms of their parts (subpatt- erns), and the connectivity relations among these parts. An example is the use of primitive symbols (simple curves) and a formal grammar to describe different types of chromosomes [62,63]. Decision theoretic methods are based on the use of discriminant functions and are well suited for patterns that can be meaningfully represented in vector form. However, there are applications, such as scene analysis and image segmentation, where the relationships among the various component parts (subunits) are nonnumerical in nature, but can be expressed in terms of discrete mathematical models such as formal grammars [18]. In such situations, the structural approach seems to be more appropriate for understanding and analyzing complex patterns than the statistical approach. Structural and syntactic patt- ern recognition techniques have been applied to a number of application areas including chromosome identification [62,63] character recogni- tion [3], shape analysis [8A], classification of speech [27,57], data compression [5], biomedical waveform analysis [2.12.107], fingerprint identification [75,90], image understanding [108], texture analysis [56,68], defect location [77,87], component recognition and placement [111,112], and testing of integrated circuits [82,121]. This thesis is primarily concerned with the description and analysis of two dimensional patterns using syntactic techniques. This involves two main steps - (i) the selection and extraction of primitives, and (ii) the inference or learning of connectivity relations among primitives in terms of formal grammars. In particular, we infer two dimensional formal grammar models with applications to the generation, description and recognition of two dimensional objects in mind. The patterns of interest here are binary images or pictures, so we choose pixel values 0 and 1 as our two basic picture primitives. Our main emphasis will be on the learning of a formal grammar to describe a class of patterns. 
The study of the primitive selection problem is not covered here. Two dimensional grammar models present many advantages in representing two dimensional patterns [98]. The relationship among the components (subpatterns) of a pattern can be directly represented in terms of two dimensional ”concatenation” relations. A one dimensional scheme may require a number of preprocessing operations to reduce the two dimensional pattern to a one dimensional form. Traditional one dimensional schemes suffer inherent difficulties in modelling objects with disconnected parts or objects with holes. Two dimensional grammars are easily amenable to parallel processing [72,91,101]. Parallel forms simplify mathematical models for pattern represent- ation [72] and significantly reduce parsing time, which can be exploited in developing very efficient classification schemes. After proposing a formal mathematical model for describing two dimensional patterns based on array grammars, we will concentrate on the problems of learning or inferring a formal grammar for a class of patterns or pictures from a finite set of samples. Inference schemes generate a number of 'candidate' grammars from a set of training patterns. Complexity and discrepancy measures are criteria for select- ing the simplest and 'best-fitting' grammar and are therefore studied in some detail. A strong emphasis is placed on the learning of the structures of noisy and distorted patterns. Patterns observed in the sample set are different from ideal structures because of inherent limitations in the recording and digitization processes. Changes in the ideal represent- ations of the pattern that are caused by phenomena independent of the underlying ideal structures are termed 'noise' processes, whereas those that depend on the underlying structure are called 'distortion' processes. A number of noise and distortion models are discussed, in- cluding one based on Markov random fields. We compare the robustness of the inference procedure under various noise models. The grammatical scheme proposed in this thesis has several advantages. Automated industrial inspection and robot applications require well designed and very precise lighting systems to avoid problems like low contrast, shadows and extraneous detail, which increase the complexity of vision algorithms [A7]. The use of a noise or distortion model in conjunction with an inference scheme when the lighting system is inadequate would incorporate some of these problems in the model description and keep the recognition scheme computationally feasible. Also, integrating the noise or distortion process with the learning scheme allows setting up discrimination or classification procedures which distinguish between patterns that have the same ideal structures, but have been subjected to different noise or distortion processes. Array grammars can generate patterns of different sizes, but with the same proportion of length to width. This property can be used to advantage when an imaging device for robot vision does not maintain a fixed distance from the objects of interest. Conventional schemes, such as template matching, require an extra dis- tance dependent scaling operation before recognition. Moreover, the integration of the noise and distortion processes into the array gram- mar model allows small deviations in the orientation of the pattern thus simplifying the registration problem. 
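To make the distinction between noise and distortion concrete, the following minimal sketch (illustrative Python, not the models developed in Chapter 5) contrasts the two kinds of processes on a binary array: an independent noise process flips every pixel with the same probability, while a simple structure-dependent distortion flips a pixel with a probability that depends on whether the pixel is an interior or a boundary point of the ideal pattern. The interior/boundary test and the parameter names used here are illustrative assumptions only.

import numpy as np

def independent_noise(ideal, p, rng):
    """Noise: every pixel is flipped with probability p, independent of the
    underlying structure of the ideal pattern."""
    flips = rng.random(ideal.shape) < p
    return np.where(flips, 1 - ideal, ideal)

def is_boundary(ideal, i, j):
    """A pixel is treated as a boundary point if its 4-neighbourhood in the
    ideal pattern (clipped at the array edge) contains both 0's and 1's."""
    m, n = ideal.shape
    values = {int(ideal[i, j])}
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = i + di, j + dj
        if 0 <= r < m and 0 <= c < n:
            values.add(int(ideal[r, c]))
    return len(values) > 1

def simple_distortion(ideal, p_interior, p_boundary, rng):
    """Distortion: the flip probability depends on the ideal structure,
    here only through an interior/boundary classification of each pixel."""
    out = ideal.copy()
    m, n = ideal.shape
    for i in range(m):
        for j in range(n):
            p = p_boundary if is_boundary(ideal, i, j) else p_interior
            if rng.random() < p:
                out[i, j] = 1 - out[i, j]
    return out

rng = np.random.default_rng(0)
ideal = np.array([[0, 0, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [1, 1, 1, 1, 1]])          # a small triangular pattern
noisy = independent_noise(ideal, 0.05, rng)
distorted = simple_distortion(ideal, 0.02, 0.15, rng)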
Such an integrated model may also help differentiate among two sets of parts, such as two types of bolts, that have similar shapes but different proportions of length to width.

The remainder of this chapter briefly presents the grammatical inference problem and reviews techniques that have been used for two dimensional pattern representation. Lastly, the organization of this thesis is described and its main contributions are summarized.

1.1 Grammatical Inference

Grammatical inference algorithms obtain a formal grammar [28,52,95] from a finite set of training patterns, expressed as strings or arrays of primitives. The finite set of training patterns is generally referred to as the sample set, and consists of two disjoint subsets - the positive sample set, S+, which contains patterns from the language generated by the grammar to be inferred, and the negative sample set, S-, of patterns which do not belong to that language. The grammars {Gi} created by an inference algorithm should necessarily satisfy the conditions that S+ ⊆ L(Gi) and that S- is contained in the complement of L(Gi), where L(Gi) is the language generated by grammar Gi. Our inference schemes utilize only a positive sample set, S+.

The sample set S+ can be probabilistic. Each pattern in the training set has a probability, and patterns with higher probability are observed more frequently [42]. The corresponding inferred probabilistic grammar associates a probability with each of its production rules to reflect the probability of occurrence.

Grammatical inference algorithms in the literature apply only to strings of symbols and are therefore one dimensional in nature. Inference of finite-state string grammars has been attacked analytically [9,42]. Many properties of context-free grammars are undecidable, so inference algorithms for such grammars require heuristic techniques, which are applicable to restricted subsets of that class of grammars. An example is the Crespi-Reghizzi algorithm for operator precedence grammars [24]. Inference algorithms for string grammars have been extended to other constructs, such as trees and webs, with the idea that they naturally represent higher dimensional patterns [14,46,65].

Enumerative and constructive methods of inference have been proposed in the literature. Enumerative techniques usually involve an exhaustive search through a large, specified class of grammars, such as finite-state grammars, for 'candidate' grammars that best describe the samples [53,69,81,118]. Tree searching and state space methods, in conjunction with a cost function, are used to direct the search. Constructive techniques derive grammars from similarities among the syntactic structures in the sample set. Restriction to a particular class of grammars enables one to define specific rules to convert observed syntactic structures into production rules for the grammar. Rules can also be defined for adding, merging, and deleting productions based on some criterion, to obtain a more acceptable grammar. An example of a constructive inference technique is Cook's [22] 'hill climbing' approach to the inference of context free grammars.

Both methods require a criterion, or cost function, to determine the 'best' grammar from a set of candidate grammars. The two properties that are used to define a good grammar are its structural complexity, and its discrepancy or fit to the given sample set. It is trivial to infer a grammar whose language is identical to the given sample set.
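Whether a candidate grammar is consistent with the two sample sets can be checked mechanically once membership in its language can be decided. The sketch below is a hypothetical illustration (the sample strings and the membership predicates are made up, and deciding membership is itself a parsing problem); it implements only the acceptance conditions stated above.

from typing import Callable, Iterable

def is_consistent(accepts: Callable[[str], bool],
                  positives: Iterable[str],
                  negatives: Iterable[str]) -> bool:
    # S+ must lie inside L(G); S- must lie inside the complement of L(G).
    return all(accepts(x) for x in positives) and not any(accepts(x) for x in negatives)

S_plus = ["ab", "aab", "aaab"]     # positive samples (illustrative)
S_minus = ["ba", "bb"]             # negative samples (illustrative)

# Two hypothetical candidate languages, given only through membership tests.
trivial = lambda x: x in set(S_plus)                               # L(G) is exactly S+
a_n_b = lambda x: len(x) >= 2 and x.endswith("b") and set(x[:-1]) == {"a"}   # a^n b, n >= 1

for name, accepts in [("exactly S+", trivial), ("a^n b, n >= 1", a_n_b)]:
    print(name, is_consistent(accepts, S_plus, S_minus))           # both print True

Both candidates pass the test, so the consistency conditions alone cannot select a grammar.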
To be useful, an inference algorithm must infer a grammar which is a natural generalization of the training set, yet has minimal complexity. Syntactic Pattern Recognition schemes are often used to discriminate among classes whose sample strings contain random errors, or when languages are not disjoint. The quality of recognition can be improved by inferring probabilistic or stochastic grammars (p-gram- mars) [A2,llO], where each production rule of the grammar has an asso- ciated probability. These probabilities indicate that some patterns occur more frequently than others. or certain substructures within the pattern are more favoured than other possible ones. The general inference problem for string grammars has not been solved completely. and there are still a number of open questions, both in formal language theory and in the capability of general grammatical inference schemes. However. several techniques exist for inferring limited classes of grammars and we shall be using some of them to develop our two dimensional inference scheme. 1.2 Two Dimensional Picture Representation Regarding a class of two dimensional pictures as a two dimensional language has led to the use of formal linguistics for picture descrip- tion and generation. Pictures are seen as a concatenation of subpic- tures. which are in turn built up of still smaller parts. called the picture primitives. Some models express the relations among picture primitives, either by string concatenation or by algebraic operators. whereas others generalize one dimensional string grammars to two dimen- sional forms. l.2.l Picture Description Languages One of the first attempts to represent two dimensional objects in terms of primitive substructures was the Freeman [38] chain code. Among its drawbacks are extremely long and unwieldy strings for object description, susceptibility to noise, and difficulties encountered with the representation of disconnected curves. Narasimhan [78.79] used a syntactic description model for classes of pictures composed of line like elements, but his method does not seem very easily generalizable to complex pattern descriptions and inference schemes. Shaw [97] extended Freeman's simple left-right string concatenation to four different binary relations among primitives in his Picture Description Language (PDL). resulting in two dimensional properties being expressible in string form. This technique looks quite elegant, but certain simple geometric operations like reflection and rotation are difficult to describe and pictures are often not easily expressed in terms of primitives of the form required by PDL. These methods reduce two dimensional objects to one dimensional string representations which makes it quite clumsy to describe disconnected parts, or configurations with embedded objects. or even l0 objects with holes. These limitations are overcome by structures such as trees [13,A0,A5] and graphs [20.86.92]. While trees can represent high dimensional patterns more naturally than strings. and can provide simpler parsing schemes. it can be shown that there is a direct corres- pondence between tree representations and context free grammars [A6]. Therefore, they have the same difficulties as strings in complex pattern descriptions. and do not seem to present a very natural scheme for general two dimensional structures. Graph or web representations are the most general schemes available for the description of two dimensional patterns. 
but require extremely complicated parsing tech- niques which are presently not well developed [15]. The development of automatic learning and decision making procedures is extremely difficult for general two dimensional schemes. There are. however, proper subsets of web grammars called matrix and array grammars, that have well developed formal structures, are relatively easy to parse. and also seem to be well suited to picture generation and description. This thesis will concentrate on these structures. 1.2.2 Two Dimensional Grammars Kirsh [58] gave the first example of a two dimensional array grammar for generating a class of labelled A5-degree right triangles. His technique is not easily generalizable to other classes of pictures. Yodokawa et. al. [119] attempted to construct an appropriate mathemati- cal model for two dimensional grammars. These authors also demon- Il strated the existence of two dimensional Turing-type acceptor automata. Siromoney et al. [99] proposed a model of a two dimensional matrix grammar (GI’GZ)’ which is essentially a two level string grammar. The horizontal string grammar, G], generates a string of length n from k possible (intermediate) symbols. Each symbol in this string serves as a start symbol for a vertical right-linear (regular) string grammar. Grammar G2 is the union of disjoint grammars which generate string columns of size m. consisting of terminal symbols which are picture primitives. This two step process generates an mxn picture. (Details are presented in Appendix C). Wang [11A] proposed a variation on Siromoney matrix grammars as a three tuple (G],GZ,M), where G is a 1 horizontal grammar defined as before. but G2 is the union of k disjoint grammars which need not be right linear; M - {m],m2, ------ ,mp}. where each mi is a kxl matrix consisting of k production rules, one from each one of the grammars that make up G2. In the matrices generated by these grammar forms, it is not possible to maintain a fixed ratio of length to breadth, since the column expressions are independent of the row expressions. Rosenfeld and Milgram [7A] defined another form of array grammar and acceptor automaton based on a Turing machine with a two dimensional tape. This scheme starts with an array of blank symbols and fills it with picture primitives using production rules which map subarrays into subarrays. The main problem with these grammars is the number of 12 restrictions placed on the production rules to obtain patterns in their desired shape, as explained in Chapter 2. Siromoney et al. [100] described a theoretical array grammar model which is more powerful than the matrix grammar models. since it can generate picture models not representable by matrix grammars. Its rewriting rules use a two step block structured approach, and describe a picture model in a clear step by step procedure. These array grammars are also very natural generalizations of one dimensional string grammars and possess several properties of one dimensional grammars. We have based our structural model on the Siromoney array grammars. Its two step block structured approach has been exploited to describe a simple model that incorporates noisy and distorted patterns. A detailed formal description of array grammars is presented in the next chapter. 1.3 Organization and Contributions of the Thesis Chapter 2 contains both a theoretical review of array grammars and a critical discussion of properties required by the learning procedures to be developed. 
This chapter provides a comprehensive review of grammatical inference literature. and the models that have been developed for the structural description of two dimensional patterns. The concept of a probabilistic array grammar is defined for the first time in Chapter 2. 13 Chapter 3 introduces the picture description model and the main inference scheme. Several problem areas. including the role of com- plexity and discrepancy measures, and the need for probabilistic noise and distortion models, are examined. The main contribution of this chapter is the development of a two dimensional learning scheme. and the integration of noise and distortion models into this scheme. Chapter A describes complexity and discrepancy measures for string grammars. The emphasis is on the properties of these measures, and their significance in the grammatical inference framework. An im- portant contribution of this chapter. is the definition of a new struc- tural discrepancy measure that overcomes some of the limitations of the presently available measures. The concept of structural discrepancy for string languages is also extended to define a discrepancy measure for array languages. Chapter 5 describes the theoretical background and the computational schemes used for three noise and distortion models. The computational aspects of the generation of patterns and the estimation of parameters for the proposed schemes are then studied. The main contribution of this chapter is the development of a general distortion model based on Markov random fields. This is the first application and implementation of nonhomogeneous Markov random fields to image processing. lA Chapter 6 presents the results of several experiments applying the inference scheme in conjunction with the noise and distortion models to two different pattern classes. The two dimensional discrepancy measure is employed to assess the robustness of the inference scheme. Another contribution of this chapter is the discussion of computational matters and potential speedup using parallel schemes. Finally, Chapter 7 presents the conclusions of our study and suggests future research. The original contributions of this thesis can be summarized as - 1) Development and study of a two dimensional model description and inference scheme that incorporates the description of noisy and distortedipatterns. 2) The definition and use of a new discrepancy measure for string grammars that incorporates both probabilistic and structural differen- ces. 3) The extension of the discrepancy measure for string gramars to a measure of suitability for array languages. A) The adaptation of nonhomogeneous Markov random fields in defining a distortion process that is very general, and covers a number of other noise and distortion models. CHAPTER II ARRAY GRAMMARS This chapter establishes the mathematical basis for the inference procedures proposed in this thesis. There are two different structural forms for array grammars, the Rosenfeld array grammars and the Siromoney array grammars. We first discuss the Rosenfeld form of array grammars and its limitations. Then we present a formal description of the Siromoney array grammar (AG), review definitions and notation and briefly describe array automata. Examples and illustrations clarify definitions and notation. 2.1 Rosenfeld Array Grammars A Rosenfeld array grammar [7A.9l] contains production rules that replace one subarray of symbols, with at least one nonterminal symbol, by another subarray of symbols. 
which need not be of the same shape and size. These grammars produce arrays which are accepted by Turing machines with two dimensional tapes, where the primitive symbols are never rewritten. The initial array consists of a start symbol on one location of the tape and blank symbols on all others. The terminal array is made up of pattern primitives and blank symbols. Simpler 15 16 forms of these grammars could be defined by requiring the grammar to be monotonic. i.e., make the number of non blank symbols in the derivation sequences monotonically non decreasing. Rosenfeld [93] defined normal forms for these grammars and showed the equivalence of different types of array grammars. He also showed [91] that some of these grammars have parallel forms that are essentially the same as local digital picture processing functions. Individual rows and columns of the host array in Rosenfeld array grammars must be shrunk or stretched by varying amounts to accommodate different rewriting rules. The implication of replacing one subarray by another of a different size and shape extends beyond the immediate vicinity of the subarray and causes relative displacements among rows and columns, and also overall changes in the outer boundary of the representation (Fig. I). This effect. known as the shearing effect, makes it difficult to generalize properties of one dimensional grammars to two dimensions and causes difficulties in object description. Rosenfeld [91] circumvented this problem by introducing isotonic array grammars. which require the left and right sides of the productions to have the same shape. Cook and Wang [23,116,117] defined normal forms for these grammars and established the Chomsky hierarchy. The shearing effect. and the steps taken to remedy it. make Rosenfeld array grammars rather cumbersome. These array grammars are not natural extensions of one dimensional grammars. The Siromoney and 17 8 Production rule: 5 -> S A applied to result # ##II‘ #B# #5# #SA# ### ### FIG. 1: Shearing Effect Wang matrix grammars discussed in Chapter I, though simple extensions of one dimensional grammars. lack the power to generate pictures of different sizes having the same proportion. For example, one cannot generate objects of size mxn such that m and n have a fixed ratio. The fact that the column grammars are all independent makes it difficult to express any relation between columns. as one goes down the rows.‘ The Siromoney array grammars(AG) overcome the above difficulty and are free from the shearing effect. We base our structural model on the Siromoney array grammar form. 2.2 Siromoney Array Grammars Structural models based on array grammars lend themselves very naturally to the generation of two dimensional patterns. Each cell represents a primitive structure of the pattern, and therefore a 18 terminal symbol of the array grammar. For example, if one considers a class of binary digitized pictures. the simplest set of primitives is {0.1}. An mxn rectangular picture array consists of mn samples from this set. As the number of primitives and the sizes of the arrays to be processed increase. one must resort to primitive selection techniques. We shall not be concerned with this problem. and assume we have two primitives. or terminal symbols for our array grammar. The rewriting rules of Siromoney array grammars generalize the notion of string rewriting rules to array rewriting rules. with arrays of terminals replacing strings of terminals. In place of the conven- tional left to right concatenation. 
Row and column catenation are defined in their place, and these establish the two dimensional structure. Derivations are restricted by the conditions of row and column catenation; i.e., two matrices can be 'row catenated' only if they have the same number of columns, and two matrices can be 'column catenated' only if they have the same number of rows. Rewriting rules can be regular (R), context-free (CF), or context-sensitive (CS). Appendix A clarifies the concept of row and column catenation of arrays, and defines some of the notation used in the formal definition of array grammars [60,100]. Probabilistic array grammars are introduced at the end of this section.

2.2.1 Formal Definition of Siromoney Array Grammars

The Siromoney array grammar AG is a four tuple,

    AG = (V, I, P, S)

where V is the union of V1, a finite set of nonterminal symbols, and V2, a finite set of intermediate symbols; I is a finite set of terminal symbols, with V ∩ I = ∅; S ∈ V1 is the start symbol; and P = P1 ∪ P2 ∪ P3, where P1 is a finite set of nonterminal rules, P2 is a finite set of intermediate rules and P3 is a finite set of terminal rules.

P1 can be described as a finite set of ordered pairs (u, v), written u -> v, where u and v are expressions over V1 ∪ V2 built up by row catenation alone or by column catenation alone. P1 is context-sensitive if there exists (u, v) ∈ P1 with u = u1 S1 v1 and v = u1 a v1, where S1 ∈ V1 and u1, v1 and a are expressions over V1 ∪ V2 under the same catenation. P1 is context-free if, for every (u, v) ∈ P1, u ∈ V1 and v is an expression over V1 ∪ V2 under row or column catenation. Lastly, P1 is regular if, for every (u, v) ∈ P1, u ∈ V1 and v is the row or column catenation of two symbols X and Y, with X ∈ V1 and Y ∈ V2, or vice versa.

P2 is a set of ordered pairs (u, v), such that
An important fact to be noted is that derivations would come to a dead end if conditions for row and column catenation were not satisfied. Examples of array grammars are shown in Figs. 2 and 3. This generation procedure requires that the nonterminal rules generate a set of blocks. catenated horizontally and vertically. This we call the block structure description of the picture. Filling in each block with primitive symbols using the intermediate rules P2 creates rectangular arrays of primitive symbols. The set of arrays generated by an array grammar is called the array language. and the 22 NONTERMINAL = { S I: INTERMEDIATES - {A.B}; PRIMITIVES = {x,.} NDNTERMINAL REWRITING RULES: 3 -> ( A e s ) g B. S -> X INTERMEDIATE LANGUAGES: LA a { (.)n | n>=1 } LB - { Ix)n l n>=2 } A REALIZATION or SIZE A: (I) APPLY NONTERMINAL RULES - S -> I A e S ) G B -> ( A e (( A e S ) A B )) ¢ 8 ->(Ae((Ae(IAes)¢9))¢a))¢a ->(A6((Ae((Ae((AeS)¢B))¢B))¢B))¢B (2) FILL INTO BLOCKS WITH INTERMEDIATE LANGUAGES '- ..L‘ I I7 T X X X X X X . X X X X X X X X X fig; _2_: 5.15.1! Grammar - Isosceles Triangle 23 NONTERMINALS = I 5 .TI; INTERMEDIATES = {A.B}: PRIMITIVES = {x,.} NONTERMINAL RULES. P S -> ( A t T ) e B I. T -> ( A i T ) e B T -> X X X X INTERMEDIATE LANGUAGES: n . . . . X X LA 8 X X X n >= 0 LB = . X X n >= 1 X X X X X . (. . .) n A REALIZATION: (1) BLOCK STRUCTURE FROM NONTERMINAL RULES i—L 'Y T T 7 1' —T T T I T T A T A T A I T 3 -> i —+ ——4 -> A 4 +7 -4 -> A s. 4 B B A B B B B (2) APPLY INTERMEDIATE RULES - X X X X X X X X X X X X X X X X X X X X X X X X X X X X C O C O O O O x x C . . . . X X . . . X X . O O O O x x O O O O O I x x O O O O O O O O x x O O O O O O I C x x O O O O O x x O O O O O O C O x x O C O O O x x O O O O X X . x x O O O O O O I O O O 0 FIG. 3: Array Grammar or Numeral '7' 2A individual arrays are called patterns. In image processing applications these arrays represent digitized images. 2.2.2 Some useful properties We now state some useful properties of array grammars which are stated as theorems and proved in Siromoney et al. [100]. I) The relations between various classes of matrix grammar languages (Siromoney) and array languages (Siromoney) is summarized in the following two dimensional scheme. RMLC (R:R)AL c (R:CF)AL c (R:CS)AL C: C: C: i C: CFMLC (CF:R)ALC (CF:CF)ALC (CF:CS)AL C: C3 (3 <3 CSMLC (cs:R)ALc (CS:CF)AL c (CS:CS)AL For example, all regular matrix languages (RML) are (regular,regular) (RiR) array languages and are also context free matrix languages (CFML). This establishes the Chomsky hierarchy and implies that array grammars are more powerful than their corresponding matrix grammars. Since we are interested in developing inference schemes. we require that models use only regular and context-free grammars, both of which have well developed parsing techniques. Subsequent properties will be 25 stated only for regular and context-free grammars. II) Let {Mn | n >= 1} be an infinite sequence of matrices. (1) For n > 1, if Mn is in one of the following forms- Mn = (x 6 Mn_]) i Y. or Y ¢(x 6 Mn_]). or Y 0 (Mn_ 6 X), or (Mn_ 6 X) D Y, 1 1 where X and Y are matrices from IML's Lx and LY and satisfy row and column catenation requirements. then {Mn} can be generated by regular nonterminal rules. (2) For n >= 1. if or x] e (YI ¢ Mn_] ¢ Y2) 9 x2, where X X2, Y1 and Y are chosen from IML's L L and L , 2 x,’ xz' y, Yz subject to row and column catenation restrictions, and X I, l and X2. or Y1 and Y2 are nonempty then {Mn} can be generated by context-free nonterminal rules. 
Property II illustrates the self embedding structure of (CF:XX)AG's, analogous to the self embedding property of string context-free grammars. These two properties indicate the approach one may have to follow to develop a two dimensional inference scheme for the nonterminal rewriting rules.

It can be shown that all the (CF:XX)AL's are closed under row and column catenation. However, none of the (R:XX)AL's are closed under row or column catenation. All the (R:XX)AL's and the (CF:XX)AL's are closed under transpose, quarter- and half-turns, and reflections about both the base and rightmost verticals, indicating that there are relationships that one may use to parse patterns that are in the different configurations mentioned above, using the array grammar for the original pattern structure.

2.2.3 Probabilistic Array Grammars

A probabilistic array grammar is similar to a probabilistic string grammar [11,39]. The probabilistic array grammar (p-AG) is defined as a 5-tuple

    p-AG = (V, I, P, R, S)

where V, I, P and S are defined as in Section 2.2.1 with P = P1 ∪ P2, and R = R1 ∪ R2; R1 is a finite set of probabilities that are assigned in sequence to the nonterminal production rules in P1, so that there is a probability associated with each production rule; similarly, R2 is a finite set of probabilities that are assigned in sequence to the set of intermediate rules. The sum of the probabilities of all productions with the same left hand side must be 1. Assuming that all nonterminal and intermediate grammars are unambiguous, i.e., they have unique leftmost derivations, the probability function for an array language is defined as

    f(x) = ∏_{k ∈ K} r1_k  ·  ∏_{l ∈ L} r2_l ,   if x ∈ L(AG),

where r1_k signifies probabilities of nonterminal production rules and r2_l probabilities of intermediate production rules; K and L are sets that represent the sequence of nonterminal and intermediate productions, respectively, that were used in deriving the element x. The p-AG is consistent if

    ∑_{x ∈ L(AG)} f(x) = 1.

2.3 Array Automata

Krithivasan and Siromoney [60] have defined array acceptors to accept languages generated by two dimensional array grammars as (AA:BB) array automata for (XX:YY)AG's, where AA and BB can be linear bounded, pushdown, or finite state depending on whether XX and YY are context-sensitive, context-free, or regular, respectively. The automaton illustrates a generation scheme for array patterns and a two step parsing scheme that can be used for recognizing array languages. Step one consists of imposing the appropriate block structure on the pattern being tested, and step two checks if the contents of individual blocks are members of the corresponding intermediate matrix language. This approach is also exploited in the inference scheme described in the next chapter.

A formal definition of an array automaton and its implementation is given in Appendix B. We note that the automaton uses the sequential-parallel approach to processing. The first step is a sequential step by step process to subdivide the picture into blocks, and the second step is a parallel procedure where the automaton can work on each block independently and simultaneously.

2.4 Definition of Type and Block Number

In this section, we define the type of an array pattern, and block numbers, which label samples from particular intermediate languages.

Type: The pattern type is defined to be the number of times nonterminal production rules have to be applied to generate the required block structure for the pattern.
Block number: Block numbers are defined for the blocks that make up an intermediate language. All the blocks from an intermediate language 29 either have the same number of rows. or the same number of columns; therefore. they can be arranged in order of increasing number of rows if the number of columns is fixed, or in order of increasing number of columns, if the number of rows is fixed. Integer labels are assigned in sequence to the blocks of an intermediate language in order of their increasing size determined by their variable dimension. These labels are individually defined for each intermediate language of an array grammar. For example, consider the nonterminal rule S->(A¢S)es Applying the nonterminal rule four times we obtain the block structure shown in Fig. A. For the pattern of size A, the four blocks of intermediate languages A and B have been labelled 1. 2, 3 and A. 2.5 Summary In this chapter we have formally defined array grammars. stated some of their useful properties and also described a type of sequential-parallel array automaton for accepting array languages. The examples suggest that these grammars provide a more powerful generative model than matrix grammars in terms of generating patterns of fixed 3O A 3 2 I <- L + + 4. + + +L A S I A + v+ % A B l A + 4 + B 2 B 3 B A + 41 T Fl . A: Pattern Type and Block Number proportions without the inconvenient shearing effects of Rosenfeld array grammars. The block structure approach to array grammars is exploited in defining a model for pattern description. CHAPTER III MODEL DESCRIPTION AND INFERENCE This chapter develops the concept of two dimensional pattern description and learning based on the array grammar model and describes a scheme for inferring intermediate matrix grammars. Some interesting research questions that arise in the inference scheme are discussed. Techniques for computing block probabilities appears with the noise and distortion models in Chapter 5. 3.1 Pattern Description and Learning The general inference problem can be stated as follows: Given a number of samples of rectangular arrays of various sizes from a class of patterns or images, infer an array grammar that describes the structure of this class of patterns. Section 2.3 suggests that this is a two step process. The first step is the inference of nonterminal rules. and the second is the inference of intermediate rules. Prior information is required to set up the block structure, or the nonterminal rewriting rules, for the class of patterns. This leaves the learning problem to the inference of the intermediate matrix 31 32 grammars. Looking at this in another perspective. we are presented with a set of patterns for which ideal structures are known. The observable representations are distorted from the ideal structure primarily because of inherent limitations of the recording and digitization process. These pictures would not be accepted by an array automaton trained to recognize only the ideal structures of the class. Uncertainty in string grammars has been modelled by probabilistic or stochastic grammars and error correcting parsing techniques. In probabilistic grammars [ll,Al,69]. probabilities are assigned to each production rule and a sample is accepted in a particular class only if it is generated by the grammar representing the class with a high enough probability. Methods have been developed to infer p-grammars from a stochastic sample set. where a probability is given for each pattern in the sample set. 
or assigned according to a probabilistic model [A2,53]. Another approach to handling uncertainty in one dimension has been to apply error correction to noisy strings and transform them to a form that belongs to the class of patterns being described. Three common types of corrupting errors are defined; insertion of extra symbols. deletion of existing symbols. and the change of one symbol to another. These are analogous to the three edit operations adopted to define distance measures between strings, such as the string edit or Weighted Levenshtein Distance (WLD) [6A,80,ll3]. Error correcting parsing 33 techniques [l.AA,67] develop proximity measures between a sample and a language. Acceptance in a particular class involves finding the string of minimum distance using a parsing algorithm (e.g. the Earley parser [33]). Lu and Fu [67] define a stochastic deformation model for patterns by extending a stochastic context-free grammar to a stochastic context-free error induced grammar. Liou [66] developed another type of heuristic inference procedure for regular and context-free grammars. 3.2 The Inference Scheme One objective of this thesis is to incorporate distortions into the array grammar model. so that the automaton derived from the grammar accepts not only the ideal structure but also distored versions of the ideal structure. Section 2.2.3 defines probabilistic array grammars, and this concept is used to infer probabilistic grammars at the inter- mediate level. The proposed inference scheme is outlined in Fig. 5. The first step is to impose a block structure on sample arrays using the nonterminal rewriting rules. which are assumed known. using an array automaton. as described in Sec. 2.3 and Appendix B. When the underlying noise or distortion model is also assumed to be known. Step 2 computes the probabilities of blocks extracted from the sample set. The inference problem is now reduced to the inference of p-grammars for each intermediate language from training sets made up of blocks with 3A + ----------------------------------------------- + I I I 1) Using prior information I I break up pattern into blocks. I I Form sample sets for each IML I I I I ----------------------------------------------- I I I I 2) Using noise/distortion model, assign I I probabilities to each block in sample set I I I I ----------------------------------------------- + I I I For each IML I I I I + ------------------------------------------ + I I I I I 3) Convert matrix samples into I I I string form using vector primitives ! I I I I + ------------------------------------------ + I I - . I I I A) Using p-string samples infer I I I p-grammar for above strings I I I I +----+ ------------------------------------------ + I I I 5) Compute discrepancy measure for I I array grammar I I I + ----------------------- + ----------------------- + associated probabilities. Step 3 codes each block into a string form using the property that all matrices of an intermediate language either have the same number of rows or the same number of columns. For example. the blocks of IML LA’ 35 for the digit '7' in Fig. 6 have three columns whereas IML LB has a fixed number of rows. When the number of columns is fixed. each row is coded as a single symbol. Thus rows represented by the pattern 'xxx' are coded as 'u' and the rows represented by '...' are coded as 'v'. Then 2 n LA = {vu v | n >= 0}. IML LA is thus reduced to a string form. with u and v as primitive symbols. 
The terminology 'vector primitive' for u and v means that they represent a row or column of primitive picture elements. Similarly, each column of an IML with a fixed number of rows is replaced by a coded vector primitive to reduce the elements of the language to a string form. As shown in Fig. 6, LB = { pqrst^n | n >= 1 }. If the number of rows or columns is k, the total number of possible vectors is 2^k.

[Fig. 6: Coding Intermediate Languages of '7' into String form - the rows of LA blocks are coded as u ('xxx') and v ('...'), and the columns of LB blocks are coded as p, q, r, s and t, giving LB = { pqrst^n | n >= 1 }.]

Step 4 applies a standard grammatical inference algorithm to obtain a p-grammar. When a regular intermediate grammar model is used, the k-tail method of Biermann and Feldman [9], along with a maximum likelihood estimation scheme [42], will infer the probabilistic intermediate matrix grammars. The k-tail method produces a number of candidate grammars as the length of the tail, k, increases. The 'best' candidate grammar is selected with a criterion based on the complexity of the grammar and the discrepancy between the language of the grammar and the given sample set in Step 5. These complexity and discrepancy measures are a key step in the inference scheme. Properties of different complexity and discrepancy measures are discussed in the next chapter.

The inference scheme poses a number of interesting research questions. The first involves the selection of mathematical noise and distortion models to describe the deviations in the observed patterns from the ideal pattern structures. Step 3 requires a moderate number of vector primitives, so as to keep the inferred grammar simple. This gives rise to the question of how vector primitives may be optimally defined. Another research question is the establishment of a criterion for the suitability of the inferred array grammar.

A major contribution of this thesis is the development of mathematical noise and distortion models that describe deviations from the ideal pattern structure. These models are used to compute probabilities of individual blocks. A simple random noise model, two independent distortion models, and a distortion model based on Markov random fields are presented in Chapter 5, along with techniques for estimating the parameters of these models and computing the block probabilities. In Chapter 6, we study the robustness of the inference scheme to different noise and distortion models.

Once all the intermediate matrix grammars are inferred, we have an array grammar representation for the pattern class being considered. It is important to establish the suitability of this grammar to the pattern class, or how well it fits the training patterns. A discrepancy measure which compares the 'fit' between two array languages is developed in the next chapter.

3.3 Summary

This chapter describes a model for two dimensional patterns in terms of array grammars and proposes a learning scheme that infers probabilistic intermediate matrix grammars from noisy and distorted patterns, once the block structure description of the pattern class is known. The development of the two dimensional learning scheme and its use in conjunction with probabilistic noise and distortion models is a unique contribution of this thesis.

CHAPTER IV

COMPLEXITY AND DISCREPANCY MEASURES

A large number of grammars exist which generate languages that contain a given finite training set.
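To make this concrete, the following hypothetical example (the sample strings and the dictionary encoding of grammars are illustrative assumptions, continuing the small example used in Section 1.1) shows two string grammars whose languages both contain the same finite training set.

S_plus = ["ab", "aab", "aaab"]    # a finite training set (illustrative)

# Grammar G1 enumerates the sample set exactly: L(G1) = S+.
G1 = {
    "nonterminals": {"S"},
    "terminals": {"a", "b"},
    "productions": [("S", "ab"), ("S", "aab"), ("S", "aaab")],
    "start": "S",
}

# Grammar G2 generalizes the samples: L(G2) = { a^n b | n >= 1 }.
G2 = {
    "nonterminals": {"S"},
    "terminals": {"a", "b"},
    "productions": [("S", "aS"), ("S", "ab")],
    "start": "S",
}

Both languages contain the training set; G2 is structurally simpler but also generates strings, such as "aaaab", that were never observed.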
A cost function is needed to evaluate the 'goodness' of candidate grammars generated by an inference scheme. This function is either intrinsic to the scheme or imposed on its outcome. At one extreme, the language of a candidate grammar could be the finite sample set itself; at the other extreme, the language includes all possible strings. The grammar that generates exactly the finite sample set tends to be more complex in structure than grammars that generate larger languages. It therefore seems natural that the 'goodness' of an inferred grammar be based on two measures: the grammatical complexity, which refers to the structural complexity of the inferred grammar, and the discrepancy, which measures the fit between the language generated by the grammar and the given sample set. On one hand, the inference scheme should pick the structurally simplest grammar as the solution grammar; on the other hand, the language generated by the grammar should closely fit the given sample set and contain only strings that are logically implied by the training samples. The choice of the 'best' grammar will then depend on how much discrepancy one may allow in light of the complexity of the grammar.

Complexity and discrepancy measures have been defined for string grammars, but have not been generalized to other constructs such as tree or array grammars. In this chapter, we review the complexity and discrepancy measures cited in the literature and pick a simple complexity measure that suits our inference scheme. A new discrepancy measure is defined based on dissimilarity measures among strings, and its key properties are presented. The concept of the discrepancy measure for string languages is extended to array languages by defining a measure of fit between two arrays. This measure is then used to define the suitability of an inferred array grammar to the particular class of two dimensional patterns under consideration.

4.1 Complexity Measures

Complexity measures in one dimension are based on the size of a grammar or on information theory. Size complexity measures count the number of nonterminals, terminals and productions of a grammar, while information theoretic measures are based on probabilistic models. In the first two parts of this section we briefly review size and information theoretic complexity measures. In the last part, we compare these measures and select one for our inference scheme, based on its adequacy to effectively order inferred grammars by their complexity and on its computational cost.

4.1.1 Size Complexity Measures

Size complexity for string grammars can be dynamic or static. Dynamic measures [51] deal with the notion of the 'speed' with which a grammar generates its language, and are therefore directly tied to the number of steps an automaton would take to recognize a string. This concept has been used for language parsing in compiler theory. Static measures [10,35] have been more widely used in the grammatical inference framework because they are relatively easy to compute. They are usually based on the number of nonterminals, the number of production rules and/or the sizes of production rules.

Gruska [48], Blum [10] and Wharton [118] defined complexity measures by mapping a class of grammars, G, into the nonnegative integers. Gruska's definition allows intuitively unacceptable mappings, such as constant mappings [118]. Blum's measure, defined on G and a fixed terminal set, I,
is based on two important axioms: (i) there are a finite number of equivalence classes of grammars of any given complexity, and (ii) there is an effective procedure which determines, for any positive integer c, which grammars are of complexity c. For example, a complexity measure C for the class of context free grammars in Chomsky normal form is

C(G) = n(V) + n(P) + n(I),

where n(V) is the number of nonterminals, n(I) is the number of terminal symbols and n(P) is the number of production rules.

A complexity measure valid for one class of grammars may not be valid for a larger class of grammars. The complexity measure defined above cannot apply to the general class of context free grammars, because the finiteness of the complexity classes depends on the fact that the maximum length of the right hand side of a production in Chomsky normal form is two. Wharton [118] remedied this problem by introducing the maximum length of the right hand side of a production into the complexity measure. Feldman [35] also defined a general complexity measure, expressible in terms of an intrinsic complexity, which is EAO (effectively approximately ordered) by the number of symbols required to write the grammar, and a derivational complexity. This complexity function is a computable, unbounded, increasing function of each of its two arguments. Feldman's measure is quite restrictive in the sense that one measure, length, is used as a canonical measure in terms of which all other measures are defined. Blum's definition of complexity is more general, because it does not require a specific canonical measure. All the size complexity measures discussed above are based on intuitively simple concepts and are directly linked to the structural size of grammars.

4.1.2 Information Theoretic Complexity Measures

Information theoretic complexity measures are based on probabilistic grammar models. The term 'information' comes from Information Theory and represents the decrease in uncertainty that comes about with the occurrence of an event in a stochastic process.

Cook [21] defined a complexity measure for grammars as the information required to specify the grammar, or the sum of the complexities of its productions. The complexity of a production is determined by its right hand side, and the complexity of a string is computed in terms of a grammar-grammar. This measure assumes that the elements of productions are statistically independent, which may not be realistic; context free grammars, for example, exhibit similarity among productions. A way around this is to introduce a conditional formulation. Other approaches have been suggested by Horning [53], who derived a complexity measure from Bayesian theory, and Feldman et al. [34], who defined the derivational complexity of the sample set relative to the grammar G. The derivational complexity is closely related to the probability of the string being generated by the grammar: the less probable the string, the higher its derivational complexity. This measure is therefore similar to Horning's, because it maximizes the probability of occurrence of the sample set under the grammar.

Complexity measures have also been defined [19,61] in terms of the generative capacity of grammars, called the entropy or channel capacity of the grammar, which depends directly on the number of words generated by the grammar. Chomsky and Miller [19] formulated the information capacity of regular grammars in a manner similar to the channel capacity concept of Shannon [96].
Kuich [61] developed similar results for context free grammars and defined entropy in terms of the structure generating function for a grammar. All the information theoretic measures relate the complexity of a grammar to the information required to specify it, in terms of the number of production rules and the number of symbols used. Whereas size measures are directly based on counts and are computationally much simpler, these measures use probabilistic concepts to derive information theoretic quantities, resulting in measures that are computationally more complex. The choice of a complexity measure is highly dependent on the problem at hand; it is therefore not possible to define general criteria for a best complexity measure.

4.1.3 Choice of Complexity Measure

This section describes our choice for the complexity measure to be used in conjunction with the inference scheme for intermediate string grammars. The measure is easy to compute, but permits quantification of the differences in complexity among grammars generated by the k-tail inference scheme described in the last chapter. Size complexity measures, such as Blum's and Wharton's, are very easy to compute, as opposed to those based on information theory. Moreover, Cook's and Feldman's measures are tailored towards context free grammars and are quite redundant for regular grammars. Horning's measure requires the specification of the intrinsic probability of a grammar, which is actually the a priori probability of that grammar in the given class of grammars; this parameter is very difficult to define in our application. For these reasons, we choose the size complexity measure defined by Blum. We define the set G (Sec. 4.1.1) as the set of regular grammars generated by the k-tail inference scheme for different tail lengths. A complexity measure for regular grammars requires only the number of production rules. The complexity of the inferred grammars then increases monotonically with increasing tail length.

4.2 Discrepancy Measures

Discrepancy measures for string grammars evaluate the match between the language generated by a grammar and the given training set of patterns. These measures can use the differences in the probability distributions of the elements in the sample set and those generated by the grammar, or they can be based on structural differences between the sample set and the language.

The discrepancy between two probabilistic languages L_1 and L_2 can be made up of three parts corresponding to the subsets L_1 ∩ L_2, L_1 - L_2 and L_2 - L_1. Languages L_1 and L_2 are said to fit closely if the strings in L_1 but not in L_2 are structurally similar to the strings in L_2, and likewise for strings in L_2 which are not present in L_1. Various measures have been proposed to quantify these ideas, such as the Weighted Levenshtein Distance (WLD) [80,113] and the similarity measures of Findler [36]. Liou [66] has defined variations of the WLD and studied their properties. We use the WLD as the measure of dissimilarity among strings.

In the inference framework, language L_2 is the finite sample set S. The discrepancy measure should reflect how closely the language of an inferred grammar fits S, and whether it is a natural generalization of S. Since L(G) ⊇ S, the discrepancy measure should depend on probabilistic differences for strings belonging to S, and on structural differences on the set L(G) - S. The discrepancy, or difference of fit, is a form of proximity measure between two sets of strings. Therefore,
it would be desirable that it satisfy the metric properties, analogous to distance measures in a metric space. Measures of fit should also be bounded and convergent; otherwise the discrepancy might increase without bound for languages L(G) that are infinite generalizations of S. Convergence of the measure also allows a comparison of individual discrepancy values against an upper bound and makes the discrepancy measure computationally feasible. In this section, we review existing probabilistic and structural discrepancy measures and motivate the need for a new discrepancy measure.

4.2.1 Probabilistic Discrepancy Measures

Maryanski [69] defined several discrepancy measures for probabilistic languages based on the absolute difference or the square difference between probabilities. Given two probabilistic languages L_1 and L_2 with word probabilities p(x) and q(x) respectively, he defined six difference measures which are not metrics. An example is the absolute difference measure

D_a(L_1,L_2) = \sum_{x \in L_1} |p(x) - q(x)| + \sum_{y \in L_2} |p(y) - q(y)|.

One difficulty with these measures is that as L_1 increases monotonically in size, the distance between L_1 and L_2 increases monotonically, so a 'best' grammar is not easily selected. Generalizations resulting in infinite languages would be rejected in favour of finite generalizations.

Cook [21] defined a discrepancy measure between a finite training set of samples and a language, which he interpreted as the information lost or gained in going from the string probabilities in the sample set to those in L(G), even though this measure has no basis in information theory. Another problem with this measure is that there is no guarantee that it will converge for infinite languages. Liou [66] defined a procedure for limiting the number of strings to be considered in computing the sum for Cook's discrepancy measure, while trying to keep the information loss as low as possible. This procedure is still computationally expensive, and does not directly reflect structural differences among strings.

4.2.2 Structural Discrepancy Measures

Structural differences between the training set and an inferred language can be evaluated in terms of a distance measure between two strings, such as the string edit or Weighted Levenshtein Distance (WLD) [64,80,113]. A string that belongs to both the sample set and the language of the grammar being evaluated does not contribute to the discrepancy measure. Otherwise, if x ∈ L(G) but x ∉ S, then d(x) = \min_{y \in S} {dist(x,y)}. A general discrepancy measure could be defined as

D[L(G),S] = \sum_{x \in L(G)} w(x) d(x),   --(1)

where {w(x)} is a sequence of weights that corresponds to the strings of the language L(G). This idea was used by Wharton [118] to define metrics on a class of languages over a finite terminal vocabulary, based on a measure of the difference between two strings. He defined a discrete metric, which is useful for exact language identification but not for inference. The set L(G) - S is countably infinite, so evaluation of (1) may not be computationally feasible, and there is no guarantee that the sum converges. Therefore this measure, too, is not very useful for our purposes.

4.2.3 Choice of Discrepancy Measure

Maryanski's discrepancy measures, though computationally simple, ignore structural differences, and Wharton's weight measure ignores probabilistic differences. Cook's measure incorporates both to some extent but has the drawbacks discussed above. Therefore, we have decided to define a new discrepancy measure
which takes into consideration both structural and probabilistic differences.

4.3 New Discrepancy Measure

In this section, we propose a new discrepancy measure for string grammars that possesses a number of desirable properties. We begin with a formal definition of the measure, study its properties in relation to the inference scheme, and also discuss computational matters.

A probabilistic difference measure, such as the absolute difference measure in Section 4.2.1, adequately measures the discrepancy for S. In Section 4.3.1 we define a new discrepancy measure D(L(G),S) for L(G) - S, which is a special case of the general structural discrepancy measure (1) used by Wharton. The overall discrepancy between L(G) and S is defined as a linear combination of these two measures, where the coefficients b_1 and b_2 are positive real numbers which determine the relative importance of each discrepancy in the total discrepancy measure. The choice of b_1 and b_2 depends on the application at hand and will be discussed in greater detail in Chapter 6.

4.3.1 Definition

The structural discrepancy measure, D(L(G),S), is based on structural differences between probabilistic languages. Two languages L_1 and L_2 are defined to be structurally equivalent iff, for all x, x ∈ L_1 <=> x ∈ L_2. The structural difference between L_1 and L_2 is expressed as

D(L_1,L_2) = \sum_{x \in L_1} p(x) d(x,L_2) + \sum_{y \in L_2} q(y) d(y,L_1),   --(3)

where p(x) is the probability of x ∈ L_1 and q(y) is the probability of y ∈ L_2; d(z,L) is some measure of proximity between a string z and a language L. Equation (3) can be rewritten as

D(L_1,L_2) = E[d(X,L_2)] + E[d(Y,L_1)],

where E denotes the expected value implied in (3), and X and Y are random variables representing strings from languages L_1 and L_2, respectively.

The discrepancy measure D(L_1,L_2) is a proximity or dissimilarity measure between the two languages L_1 and L_2. Anderberg [4] provides a thorough review of such measures. They are normally required to satisfy the following properties:

(I) D(L_1,L_2) >= 0, with equality if and only if L_1 and L_2 are structurally equivalent;
(II) D(L_1,L_2) = D(L_2,L_1).

A third condition, the triangle inequality, would give the discrepancy measure metric properties. However, we have not found a convenient and useful way of defining D(L_1,L_2) that satisfies the triangle inequality. Condition (II) is always satisfied irrespective of the way we define d(x,L). To satisfy condition (I), d(x,L) must satisfy the property

d(x,L) = 0, if x ∈ L.   --(4)

From (3) and (4) it follows that

D(L_1,L_2) = \sum_{x \in L_1 - L_2} p(x) d(x,L_2) + \sum_{y \in L_2 - L_1} q(y) d(y,L_1).

In general, we will define d(x,L) in the form

d(x,L) = \sum_{y \in L} f(x, y, WLD(x,y)),

where WLD(x,y) is the Weighted Levenshtein Distance between the two strings x and y. An example is

d(x,L) = \sum_{y \in L} a(x,y) g(WLD(x,y)).   --(5)

Here a(x,y) is a weight factor, called the window function. As L can be an infinite language, the weight factor limits the sum to only a finite number of terms. We make a(x,y) depend on the length of string x, denoted by ℓ(x), to bound the magnitude of d(x,L). Therefore, there exists a constant c for which 0 <= d(x,L) <= c for all x ∈ L_1. Thus, for any arbitrary language L,

\sum_x p(x) d(x,L) <= \sum_x p(x) c <= c,   --(6)
\sum_x p(x) d(x,L) >= 0.   --(7)

Equations (6) and (7) show that (3) is bounded and therefore convergent. In the inference problem, where L_2 is the finite sample set S and S ⊆ L(G), the structural discrepancy reduces to

D(L(G),S) = \sum_{x \in L(G)-S} p(x) d(x,S),   --(8)

where d(x,S) = 0 if x ∈ S. The above is usually an infinite sum, but because of its boundedness and convergence properties,
only a finite number of terms have to be computed to produce D(L(G),S) to a specified accuracy.

4.3.2 Particular Proximity Measures

In this section we develop particular proximity measures d(x,L) that reflect the difference in structure between a string x and the set of strings in the language L. Three different discrepancy measures are proposed below.

The minimum discrepancy measure is defined as

D_min(L_1,L_2) = \sum_{x \in L_1} p(x) d_min(x,L_2) + \sum_{y \in L_2} q(y) d_min(y,L_1),

where

d_min(x,L) = a(x) \min_{y \in L} {WLD(x,y)}   --(9)

and a(x) = 1/ℓ(x). Since WLD(x,y) <= ℓ(x) + ℓ(y), we have

\min_{y \in L} {WLD(x,y)} <= ℓ(x) + b,

where b = \min_{y \in L} {ℓ(y)}, which shows that d_min is bounded and hence that D_min(L_1,L_2) is convergent.

The minimum discrepancy measure can be looked upon as an optimistic estimate of the proximity between two languages. Another estimate is an average distance between a string x and the set L. A reasonable way to define this distance is to compute a weighted average distance between x and all strings from a finite subset of L. Such discrepancy measures are called window discrepancy measures. Finite subsets are identified by placing a window on the set L which assigns weights to all strings whose lengths lie between ℓ(x) - c and ℓ(x) + c, where c >= 0. The general definition of the window discrepancy measures d*(x,L) appears below. Let

L_c(x) = { y ∈ L | ℓ(x) - c <= ℓ(y) <= ℓ(x) + c },   c >= 0.

If x ∈ L, d*(x,L) = 0. If x ∉ L and L_c(x) = ∅, define d*(x,L) = 2 + c/ℓ(x). If x ∉ L and L_c(x) ≠ ∅,

d*(x,L) = \sum_{y \in L} a*(x,y) WLD(x,y),   where a*(x,y) = 0 if y ∉ L_c(x).

Let K_c(x) denote the cardinality of L_c(x). For some fixed c >= 0, the rectangular window function is

a_rect(x,y) = 1 / ( ℓ(x) K_c(x) ) if y ∈ L_c(x), and 0 otherwise.

The corresponding discrepancy is

d_rect(x,L) = 0, if x ∈ L;
d_rect(x,L) = 2 + c/ℓ(x), if L_c(x) = ∅ and x ∉ L;   --(10)
d_rect(x,L) = [1/ℓ(x)] \overline{WLD}(x,c), otherwise,

where \overline{WLD}(x,c) denotes the average distance between x and the strings in L_c(x). Similarly, a triangular window function is defined as

a_tri(x,y) = [1/( ℓ(x) K_c(x) )] (ℓ(y) - ℓ(x) + c)/c,   if ℓ(y) <= ℓ(x) and y ∈ L_c(x);
a_tri(x,y) = [1/( ℓ(x) K_c(x) )] (ℓ(x) - ℓ(y) + c)/c,   if ℓ(y) > ℓ(x) and y ∈ L_c(x);
a_tri(x,y) = 0,   if y ∉ L_c(x).

Strings in L_c(x) with lengths close to ℓ(x) influence the triangular proximity most. The corresponding discrepancy function is defined as

d_tri(x,L) = 0, if x ∈ L;
d_tri(x,L) = 2 + c/ℓ(x), if L_c(x) = ∅ and x ∉ L;   --(11)
d_tri(x,L) = \sum_{y \in L_c(x)} a_tri(x,y) WLD(x,y), otherwise.

From a comparison of the three discrepancy measures it is apparent that

d_min(x,L) <= d_rect(x,L)   and   d_min(x,L) <= d_tri(x,L),

and it therefore follows that

D_min(L_1,L_2) <= D_rect(L_1,L_2)   and   D_min(L_1,L_2) <= D_tri(L_1,L_2).

For a fixed window size c >= 1, we have D_tri(L_1,L_2) <= D_rect(L_1,L_2). This confirms that the minimum discrepancy measure is an optimistic estimate of the proximity, while the triangular and rectangular measures fall into the category of weighted average estimates. The next section shows that all three discrepancy measures behave similarly when used with an inference scheme; the choice of a particular measure depends on the application being considered.

4.3.3 Properties of the Structural Discrepancy Measure

We now look into properties of the structural discrepancy measure D(L(G),S) in the grammatical inference problem. The greater the probability or the smaller the length of a string, the larger is its contribution to the discrepancy measure. This conforms to the intuitive notion [42] that short strings are more important than long ones. Simple examples are used to illustrate some of the desired properties. We assume S ⊆ L(G) and

D(L(G),S) = \sum_{x \in L(G)-S} p(x) d(x,S).
Property: Given a probabilistic language L and two finite sample sets S_1 and S_2 such that S_1 ⊆ S_2 ⊆ L, then D_min(L,S_1) >= D_min(L,S_2).

The proof is simple. The same conclusion cannot be made for D_rect (and D_tri), as demonstrated below.

Example 1: Let L = {abc, bac, cba, acb, bca, cab} with equal probabilities. Let S_1 = {abc, cba} with probabilities (1/2, 1/2) and S_2 = {abc, bac, cba} with probabilities (1/3, 1/3, 1/3). With window size c = 0, D_rect(L,S_1) = 1.2156 and D_rect(L,S_2) = 1.3072. Therefore D_rect(L,S_2) > D_rect(L,S_1), even though S_1 ⊆ S_2.

For a language L and a finite sample set S ⊆ L, we can write the three discrepancy measures as

D_min(L,S) = \sum_{x \in L-S} p(x) [1/ℓ(x)] \min_{y \in S} {WLD(x,y)},   --(12)
D_rect(L,S) = \sum_{x \in L-S} p(x) [1/ℓ(x)] \overline{WLD}(x,c),   --(13)
D_tri(L,S) = \sum_{x \in L-S} p(x) \sum_{y \in S} a_tri(x,y) WLD(x,y),   --(14)

where \overline{WLD}(x,c) is the average WLD between x and the strings of S in the window L_c(x). The contribution of a string x ∈ L to the discrepancy measure is a product of three factors: its probability, the inverse of its length, and its proximity to the set S. The discrepancy measures therefore increase in direct proportion to the probability and the WLD of the strings in L(G) - S, and shorter strings contribute more to the discrepancy measure. The example below illustrates the convergence property of the discrepancy measure for infinite languages.

Example 2: Consider S = {a, a^2, ..., a^n} with probabilities (1/n, 1/n, ..., 1/n). Let G: X -> aX | a with probabilities (p, 1-p), so that L(G) = { a^m | m >= 1 } with probability p(a^m ∈ L(G)) = p^{m-1}(1-p). Then

D_min(L(G),S) = \sum_{j=n+1}^{\infty} (1-p) p^{j-1} (j-n)/j
             = p^n - [n(1-p)/p] [ \log(1/(1-p)) - \sum_{j=1}^{n} p^j/j ].

This shows that as n increases, D_min(L(G),S) decreases, and as n -> ∞, D_min(L(G),S) -> 0.

4.3.4 Computation of the Discrepancy Measure

In this section we discuss the computational complexity of the discrepancy measure in terms of the number of steps required in the computation. This discussion pertains to sequential algorithms for computing the WLD. Parallel algorithms have recently been proposed [17] that reduce the order of the time complexity; they are discussed in detail in Chapter 6.

The sum in D(L(G),S) usually involves an infinite number of terms, because L(G) is generally an infinite language. Following Liou [66], there are two ways in which we can make the computation finite. One way is to use the fact that any grammar generates only a finite number of strings of length <= m, where m is a positive integer. By choosing a subset A_m = { x ∈ L(G) | ℓ(x) <= m }, we can approximate

D(L(G),S) ≈ D(A_m,S) = \sum_{x \in A_m - S} p(x) d(x,S).

One intuitive choice for m is \max_{y \in S} {ℓ(y)}, but then the computed discrepancy measure may fail to verify whether L(G) is a natural generalization of S. A second approach is to use string probabilities to limit the number of strings used in the computation: select a real number q, 0 < q < 1, let B_q = { x ∈ L(G) | p(x) >= q }, and approximate

D(L(G),S) ≈ D(B_q,S) = \sum_{x \in B_q - S} p(x) d(x,S).

Distance computations are based on the WLD [80]. The conventional WLD algorithm has computational complexity of order nm, where n and m are the string lengths. Masek et al. [71] have proposed a faster algorithm of order n·max(1, n/log m), where n >= m. If the sample set S has M strings and we restrict ourselves to N strings from L(G), the worst case computation time for D_min(L(G),S) becomes O(MNmn) (or O(MNn·max(1, n/log m)), n >= m). For rectangular and triangular windows the order of computation is similar, except that for x ∉ S
the actual computation time may be less than that required for D_min(L(G),S).

The computation time for D_min(L(G),S) can be reduced by use of a search algorithm, summarized as follows. Arrange the strings in S in order of their lengths. For each x ∈ L(G) chosen in the computation, first compare x with the strings y ∈ S for which ℓ(y) = ℓ(x). If there exists such a y with WLD(x,y) = 0, exit and pick the next x. Otherwise, find the number a = \min {WLD(x,y)} over the y ∈ S with ℓ(y) = ℓ(x). The search can then be restricted to the set { y ∈ S : ℓ(x) - a <= ℓ(y) <= ℓ(x) + a }. For every new length of strings being searched, the bound a is updated whenever a_new < a_old, and the search space shrinks correspondingly.

All the discrepancy measures defined above are of the same order of complexity; the computation time for the rectangular and triangular window measures depends on the window size. All three measures have identical properties in relation to a string grammar inference scheme. We chose the minimum discrepancy measure as our proximity measure for structural discrepancies, but test runs for the experiments conducted in Chapter 6 showed that the other two measures would have produced almost identical results.

4.4 Discrepancy Measure for Array Languages

The fifth step in our inference scheme, described in Section 3.2, requires a measure of suitability to determine how well the language of the inferred array grammar fits the training set of patterns. We now extend the concept of the structural discrepancy measure for string languages to a structural difference measure between two array languages. This requires the definition of a proximity measure between two arrays. The string edit or WLD distance has been extended to measure the distance between two finite arrays of symbols [76,109] and is called the two dimensional WLD, or WLD2. We use WLD2 to define the discrepancy measure between two array languages. The computational aspects of this measure are investigated, and approximations to the actual measure are then developed to reduce the computational effort in our inference scheme.

The discrepancy measure between an inferred array language L and the training set of array patterns S is defined as

D(L,S) = \sum_{x \in L-S} p(x) d(x,S),   --(15)

where x is an array pattern and p(x) is the probability that x ∈ L; d(x,S) is a proximity measure with the same properties as the proximity measures of Section 4.3.1, and

d(x,S) = \sum_{y \in S} a(x,y) g(WLD2(x,y)),

where a(x,y) is the weight factor, or window function. As in Section 4.3.2, we can define the minimum measure and the two window measures.

The language L, as a generalization of S, is often an infinite language, so we again need techniques to make the computation finite. The algorithm to compute WLD2 between two arrays of sizes I×J and K×L is of order I²JK²L + IJ²KL² [109]; if I = J = K = L = N, this is O(N⁶), which makes the computation formidable even for moderate N. We therefore approximate D(L,S) by computing only a few terms from L - S. Suppose the training set S contains array patterns of types N_1, N_1+1, ..., N_2 (the definition of type is introduced in Section 2.4), with n_{N_1}, n_{N_1+1}, ..., n_{N_2} array patterns of each type, respectively. We define a new set L_S ⊆ L such that L_S contains 2n_{N_1}, 2n_{N_1+1}, ..., 2n_{N_2} patterns of types N_1, N_1+1, ..., N_2, respectively. For each type N_k, L_S contains the 2n_{N_k} patterns of highest probability from L; D(L_S,S) is then used as an approximation for D(L,S).
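For the one dimensional case, the computation behind (12) can be sketched directly; WLD2 follows the same dynamic-programming idea with a much larger state space. The unit edit costs and the example language and probabilities below (taken from Example 2, truncated at length 8) are illustrative assumptions, not part of the inference scheme itself.

    # Sketch of the WLD by dynamic programming and of the minimum discrepancy (12)
    # over a finite subset of L(G).  Unit insertion/deletion/substitution costs are
    # assumed; the general WLD allows arbitrary nonnegative weights.

    def wld(x, y, w_ins=1.0, w_del=1.0, w_sub=1.0):
        m, n = len(x), len(y)
        d = [[0.0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            d[i][0] = i * w_del
        for j in range(1, n + 1):
            d[0][j] = j * w_ins
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0.0 if x[i - 1] == y[j - 1] else w_sub
                d[i][j] = min(d[i - 1][j] + w_del,        # delete a symbol of x
                              d[i][j - 1] + w_ins,        # insert a symbol into x
                              d[i - 1][j - 1] + cost)     # substitute or match
        return d[m][n]

    def d_min(x, S):
        """a(x) * min over y in S of WLD(x, y), with a(x) = 1/len(x); 0 if x is in S."""
        return 0.0 if x in S else min(wld(x, y) for y in S) / len(x)

    def D_min(L_subset, S):
        """Minimum discrepancy over a finite, probability-weighted subset of L(G)."""
        return sum(p * d_min(x, S) for x, p in L_subset)

    # Truncated version of Example 2: S = {a, aa, aaa}, L(G) = {a^m | m >= 1}.
    p = 0.5
    S = {"a", "aa", "aaa"}
    L_subset = [("a" * m, p ** (m - 1) * (1 - p)) for m in range(1, 9)]
    print(D_min(L_subset, S))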
For our inference scheme, we chose the triangular window proximity measure to compute d(x,S), x ∈ L_S; the window is defined as

L_c(x) = { y ∈ S | size(x) - c <= size(y) <= size(x) + c },   with c = 1,

where size(x) denotes the size of array x. This estimate is a rough approximation to the actual discrepancy measure D(L,S), but it estimates the fit well enough to compare array grammars inferred by applying different noise and distortion models to the same training set, without making the computations unduly lengthy. These experiments are explained in detail in Chapter 6. Much better approximations can be obtained by the use of parallel systems and parallel algorithms, which greatly reduce the time complexity of the algorithm; they too are discussed in Chapter 6.

4.5 Summary

This chapter discusses the role of discrepancy and complexity measures in the inference of string grammars. A number of the existing measures are reviewed. A complexity measure is adopted that satisfies Blum's and Wharton's criteria, is computationally very simple, and is quite adequate for regular grammars. A new structural discrepancy measure is defined; the total discrepancy between the language of an inferred grammar and the sample set is defined as a linear combination of the structural discrepancy measure between the two sets and the absolute discrepancy measure between strings common to the two sets. Lastly, a new discrepancy criterion for the fit between two array languages is proposed. Computational aspects of these algorithms are investigated.

CHAPTER V
MODELS FOR NOISE AND DISTORTION

The inference of probabilistic grammars involves estimating probabilities for production rules from probabilities assigned to the samples in the training set. In the string grammar inference problem, probabilities are assigned to strings either by frequency counts [11,42] or from a probabilistic model. A probability model on strings defines conditional probabilities of the type p(x|y), where x, y ∈ T+. An example is the error correcting parsing model described in Section 3.1, where probabilities assigned to each of the three edit operations can be used to compute the probabilities of strings.

Pattern description and learning based on array grammars involves inferring the structure of a given class of patterns from images that may be corrupted by some physical phenomenon. The ideal image is changed by random fluctuations in the digitizing unit or during transmission. This phenomenon, which is independent of the underlying pattern structure, is called a noise process. In other situations, the corrupting phenomenon may depend on the underlying pattern structure. Such effects are observed when photographing or digitizing objects whose intensity differences between light and dark areas are low, causing diffraction effects. The effect becomes more pronounced when the aperture of the camera lens is small, resulting in blurred edges separating the object from the background [88]. Points near the boundaries of objects experience more distortion than internal regions; we term such processes distortion processes.

This chapter investigates some noise and distortion models. We begin with a simple, independent noise model, and then extend it to define two intuitive distortion models and a comprehensive distortion model using Markov random field theory. For each model we study its mathematical formulation, techniques for generating sample patterns based on an ideal structure, and the estimation of its parameters.
The estimated parameters are used to compute individual block probabilities, as outlined in Section 3.2.

5.1 The Independent Noise Model

The independent noise model assumes that the probability of a pixel changing value is independent of all other pixel values in the pattern. This is equivalent to considering an ideal image, a digitized m×n pattern, degraded by independent, additive, stationary noise. The noise process is modelled as a Bernoulli process: the number of changes in a block has a binomial distribution with parameters b, the number of pixels in the block, and p, the probability that a pixel value is switched. The scheme for generating patterns from this noise process is described in Procedure 1 below. We assume that all pattern types (random variable N) occur with equal probability, so N has a discrete uniform distribution over [N_1,N_2]:

P[N = n] = 1/(N_2 - N_1 + 1) if N_1 <= n <= N_2, and 0 otherwise.

Procedure 1:
Input: G, the rewriting rules for the ideal structure; p; N_1; N_2.
Begin:
 (1) Generate n ~ U[N_1,N_2].
 (2) Using the nonterminal rewriting rules, create an ideal pattern structure of size n.
 (3) For each pixel in the pattern,
     (a) generate x ~ U[0,1];
     (b) if x <= p, flip the pixel value; otherwise, retain the pixel value.
 End loop.
End

The result of applying this procedure to a set of isosceles triangles and to the numeral '7' is illustrated in Figures 7 and 8, respectively. The parameter p was set at 0.05.

[Figure 7: Triangles generated from the independent noise model, p = 0.05.]

[Figure 8: Noisy 7's, p = 0.05.]

Given that the patterns in the training set are obtained by imposing the noise model on the ideal pattern structure, the maximum likelihood estimate of the parameter p is

p̂ = (total number of pixels that are flipped) / (total number of pixels in the sample set).

The ideal structure must be known. Our inference scheme requires that the patterns be broken down into blocks, which can then be grouped into sample patterns for the different intermediate languages. Treating each intermediate language separately, we can compute the probability of occurrence of each block from the noise model. The probability of observing a block with b pixels of which c are flipped is

p^c (1 - p)^{b-c}.

5.2 Distortion Models

5.2.1 The Independent Distortion Model

The independent distortion model distinguishes the interior pixels of the ideal structure from its boundary pixels: each interior pixel is flipped independently with probability p_i and each boundary pixel with probability p_b. Choosing p_b > p_i makes boundary changes more likely than changes in the interior and background regions. The scheme to generate distorted patterns using this model (Procedure 2) is similar to that for the independent noise model (a sketch of Procedure 1 appears below).
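The following is a minimal sketch of Procedure 1, not the original implementation; the triangle generator stands in for the nonterminal rewriting rules G and is an assumption made only so the sketch runs end to end.

    # Sketch of Procedure 1: independent noise generation.

    import random

    def ideal_triangle(n):
        """Stand-in ideal structure of type n: an isosceles triangle of 'x' on '.'."""
        return [list("." * (n - 1 - i) + "x" * (2 * i + 1) + "." * (n - 1 - i))
                for i in range(n)]

    def procedure_1(p, N1, N2, make_ideal=ideal_triangle, rng=random):
        n = rng.randint(N1, N2)                  # (1) n ~ U[N1, N2]
        pattern = make_ideal(n)                  # (2) ideal structure of size n
        for row in pattern:                      # (3) independent Bernoulli flips
            for j in range(len(row)):
                if rng.random() <= p:
                    row[j] = "x" if row[j] == "." else "."
        return ["".join(row) for row in pattern]

    for line in procedure_1(p=0.05, N1=7, N2=10):
        print(line)

Procedure 2 differs only in step (3): the flip probability is p_b for boundary pixels of the ideal structure and p_i for all other pixels.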
The pattern type is again assumed to have a U[N_1,N_2] distribution.

Procedure 2:
Input: G, B, p_i, p_b, N_1 and N_2.
Begin:
 (1) Generate n ~ U[N_1,N_2].
 (2) From the rewriting rules, create an ideal structure of size n.
 (3) For each pixel in the pattern,
     (a) generate x ~ U[0,1];
     (b) if it is an interior pixel, flip the pixel value if x <= p_i and retain it otherwise;
         if it is a boundary pixel, flip the pixel value if x <= p_b and retain it otherwise.
 End loop.
End

An example in Figure 10 illustrates this procedure with the class of triangular patterns. The parameters p_i and p_b were set to 0.05 and 0.15, respectively. Estimation of the parameters p_i and p_b from a finite set of training patterns using maximum likelihood again involves counting:

p̂_i = (total number of interior pixels flipped) / (total number of interior pixels),
p̂_b = (total number of boundary pixels flipped) / (total number of boundary pixels).

Once p̂_i and p̂_b are estimated, a probability can be assigned to any block by assuming the patterns are acted upon by two independent binomial processes. The probability that a block of b pixels experiences c_i internal changes and c_b boundary changes is

p_i^{c_i} (1 - p_i)^{b_i - c_i} p_b^{c_b} (1 - p_b)^{b_b - c_b},

where b_i is the total number of internal points in the block and b_b the total number of boundary points in the block; b_i + b_b = b.

[Figure 10: Triangles generated by the independent distortion process, p_i = 0.05, p_b = 0.15.]

5.2.2 Neighborhood Distortion Models

The neighborhood distortion model allows the distortion process at a pixel to be a function of the values of the neighboring pixels in the ideal pattern. The surrounding pixels determine the extent of nonhomogeneity in that region. If a majority of the neighboring pixels have the same value, we assign a smaller probability of change than when the surrounding neighborhood is nonhomogeneous.

We now formally define the concept of neighborhood for a two dimensional array of pixels. The distance between pixels or points in the array assumes discrete values. Figure 11(a) depicts some of the squared distances from a point x in the array. These squared distances form a sequence of numbers 1, 2, 4, 5, ..., called {e(k)}, k = 1, 2, .... Following the usual image processing conventions, pixel (m,n) is a kth order neighbor of pixel (i,j) iff the squared distance between (m,n) and (i,j) is the kth term in {e(k)}. Figure 11(b) shows the ordering of the neighboring pixels around the one marked x.

A neighborhood distortion model of order k requires that the probability of change for each pixel (i,j) depend on all its neighbors up to order k in the ideal pattern. In general, if the number of pixel neighbors of order k or less is c, we define a discrete neighborhood function W(i,j) for each pixel (i,j) that can assume at most 2^c different values. We assume,
for a kth order neighborhood function, that the pattern is surrounded by k rows and columns of 0's at each edge, so that W(i,j) is well defined for all pixels of the pattern. Corresponding to this neighborhood function we have at most 2^c different probability-of-change values, (p_1, p_2, ..., p_{2^c}), as parameters for the model.

For example, consider a first order model in which each pixel has four neighbors. The neighborhood function can have a maximum of 16 different values, and the model can have a maximum of 16 different probability-of-change values as parameters.

[Figure 11(a): Squared distances to the point x.]

[Figure 11(b): Orders of neighbors relative to the point x.]

Simpler functions can be used to describe the type of distortion we desire, such as

W(i,j) = \sum_{(m,n) \in N_k(i,j)} v(m,n),   --(2)

where N_k(i,j) denotes the pixels in the kth order neighborhood of (i,j) and v(m,n) is the value of pixel (m,n) in the ideal pattern. For k = 1, this function can have five values, 0 through 4, so we can define a maximum of five different probabilities of change, p_0 through p_4. As a simple example, we set p_0 = p_4 and p_1 = p_2 = p_3. A value of 0 or 4 for the neighborhood function then indicates a homogeneous region, and the other values represent nonhomogeneous regions. Setting p_1 > p_0 makes distortion more likely along edges than in interior regions.

The value of the neighborhood function for each pixel is determined entirely from the ideal structure of the pattern, which is predetermined. Therefore, the probability of change for any pixel (i,j) is independent of the probability of change for any other pixel (i',j') in the same pattern. We define an indicator function z(i,j) for an observed pattern: z(i,j) = 1 if the pixel value is flipped, and z(i,j) = 0 otherwise. A sketch of the first order neighborhood function and the resulting flip probabilities is given below.
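The first order case of (2), with the probability assignment p_0 = p_4 and p_1 = p_2 = p_3, can be sketched as follows; the 0/1 array representation and the particular probability values (those used for Fig. 12) are assumptions made for illustration.

    # Sketch of the first order neighborhood function W(i, j) and the per-pixel
    # flip probability of the neighborhood distortion model.

    def W(ideal, i, j):
        """Sum of the four first order neighbor values in the ideal pattern (0..4);
        pixels outside the pattern count as 0, as if the pattern were padded with 0's."""
        m, n = len(ideal), len(ideal[0])
        total = 0
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            r, c = i + di, j + dj
            if 0 <= r < m and 0 <= c < n:
                total += ideal[r][c]
        return total

    def flip_probability(ideal, i, j, p_hom=0.05, p_nonhom=0.15):
        """p_W(i,j): small in homogeneous regions (W = 0 or 4), larger near edges."""
        table = {0: p_hom, 1: p_nonhom, 2: p_nonhom, 3: p_nonhom, 4: p_hom}
        return table[W(ideal, i, j)]

    ideal = [[0, 0, 0, 0],
             [0, 1, 1, 0],
             [0, 1, 1, 0],
             [0, 0, 0, 0]]
    print(flip_probability(ideal, 0, 0))   # homogeneous background -> 0.05
    print(flip_probability(ideal, 1, 1))   # near the object boundary -> 0.15

In Procedure 3 below, this table lookup replaces the single comparison against p used in Procedure 1.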
Assuming that N, the type of the pattern, comes from a U[N_1,N_2] distribution, the probability of a block sample belonging to an intermediate language is

[1/(N_2 - N_1 + 1)] \prod_{(i,j) \in block} p_{W(i,j)}^{z(i,j)} (1 - p_{W(i,j)})^{1 - z(i,j)}.

This model is uniquely defined by the neighborhood function W(i,j), the size of the neighborhood k, and the probability-of-change values that are used. The generation procedure for samples from this model is given in Procedure 3.

Procedure 3:
Input: G, W(i,j), {p_i}, N_1, N_2.
Begin:
 (1) Generate n ~ U[N_1,N_2].
 (2) Using the rewriting rules, create an ideal structure of type n.
 (3) For each pixel (i,j),
     (a) compute W(i,j) from the ideal pattern;
     (b) generate x ~ U[0,1];
     (c) if x <= p_{W(i,j)}, flip the pixel value in the observed pattern; otherwise, retain the ideal pixel value.
 End loop.
End

Maximum likelihood estimation of the parameters, given the neighborhood function and the size of the neighborhood, again involves a counting procedure. The pixels in the observed patterns are grouped in terms of their neighborhood function values, and for each distinct functional value we count the number of pixels flipped as well as the total number of pixels in that group. Therefore, for a particular value W(i,j) = x,

p̂_x = (number of x-group pixels flipped) / (total number of x-group pixels).

The complexity of this model depends on the size of the neighborhood and on the neighborhood function W. We chose the W function defined in (2), and used the model for first order neighbors with two different probability-of-change values. The computational complexity is of the same order as in the previous two models, except that we now compute the neighborhood function for each pixel (i,j). The results of applying this model to the triangular patterns are illustrated in Fig. 12.

5.2.3 Applications and Limitations

The two models described above satisfy the requirements for simple distortion processes in that the corrupting phenomenon depends on the underlying pattern structure. The neighborhood model is computationally more complex than the independence model; however, it is richer, in the sense that it can model a variety of distortions. Both models are limited by the fact that they are static in nature: the pixel values in the observed patterns depend only on the ideal pattern structure, but are independent of each other and do not allow the propagation of distortion effects.

[Figure 12: Triangles generated by the neighborhood distortion process, p_0 = p_4 = 0.05; p_1 = p_2 = p_3 = 0.15.]

5.3 Markov Random Field Distortion Model

In this section we define a distortion model in which the probability that a pixel changes value depends on whether neighboring pixel values have also undergone a change. The mathematical formulation of this dependence is expressed as a Markov random field process [7,8,55].
The success of MRF's in modelling texture [25] suggests that they are appropriate for distortion models. Our two dimensional patterns are binary m×n arrays which, in Markov random field terminology, are finite lattices whose sites are labelled 0 or 1. We begin this section with a brief review of the mathematical background of Markov random fields on finite lattices. Techniques for generating distorted samples using this model and for estimating its parameters from finite training sets are also studied.

5.3.1 Background

We follow the notation and definitions of Besag [8] and Isham [55]. The set of all possible realizations on a binary m×n lattice is denoted by T. A particular assignment of 0's and 1's to the lattice, called a configuration, is denoted X ∈ T. Since T is finite, the process can be described by a discrete probability distribution, μ, on the power set of T. Suppose μ(X) > 0 for all X ∈ T. Then there exists a real valued potential function

Q(X) = -log{ μ(X) / μ(φ) },   --(3)

where φ is the configuration with 0's at all sites and Q(φ) = 0. Rewriting (3) we obtain

μ(X) = Z exp{-Q(X)},   --(4)

where Z is a normalizing constant obtained by setting the sum of (4) over all configurations X ∈ T to one. This type of probability distribution is frequently referred to as a Gibbs potential. For finite lattices, the class of probability distributions with Gibbs potentials has been proved to be identical to the class of Markov random field models [8,50,89].

A Markov random field [29] is a joint probability density on the set of finite m×n arrays, subject to the following conditions:
(i) Positivity: p(X) > 0 for all X ∈ T, and
(ii) Markovianity: p(x_{i,j} | all points on the lattice except (i,j)) = p(x_{i,j} | neighbors of (i,j)).

A Markov random field is defined to be of order k if x_{i,j} depends on neighbors up to the kth order. If the pixels are labelled 1 to N, the potential function can be expanded as

Q(X) = \sum_i x_i F_i(x_i) + \sum_{i,j} x_i x_j F_{i,j}(x_i,x_j) + ... + x_1 x_2 ... x_N F_{1,2,...,N}(x_1,x_2,...,x_N),

where the functions {F_{i,j,...,k}} are called interacting potentials, and x_i is the label on the ith pixel. A Markov random field is called an Auto-model if the following two conditions are satisfied: (i) all F functions with more than two subscripts vanish identically; and (ii) the conditional probability distribution at each point belongs to the exponential family. The potential function for a first order Auto-model is called the near neighbor potential function and can be expressed as

Q(X) = \sum_i x_i a_i + \sum_{i,j} x_i x_j b_{i,j},   --(5)

where b_{i,j} = 0 unless x_i and x_j are neighbors. From (5) we obtain

p(x_i = x | X_i) = exp{-x(a_i + \sum_j b_{i,j} x_j)} / [1 + exp{-(a_i + \sum_j b_{i,j} x_j)}],   --(6)

where X_i represents the configuration X minus the ith point. Equations (5) and (6) define a first order random field with N(N-1) parameters, the a_i's and the b_{i,j}'s. A Markov random field is said to be spatially homogeneous if a_i = a, i = 1, 2, ..., N; b_{i,j} = b(1) if x_i and x_j are vertical neighbors; and b_{i,j} = b(2) if x_i and x_j are horizontal neighbors. A spatially homogeneous first order Markov random field can therefore be defined by these three parameters, and is called isotropic if b(1) = b(2).

5.3.2 The Distortion Model

We base our distortion model on the isotropic first order Auto-model. Given an ideal structure of size m×n, we define a change template T as an m×n binary array, generated by the isotropic first order Auto-model, that relates the ideal structure Y = [y_{i,j}] and the observed pattern X = [x_{i,j}] in the following manner:

x_{i,j} = 1 - y_{i,j} if t_{i,j} = 1, and x_{i,j} = y_{i,j} if t_{i,j} = 0.   --(7)
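Equation (7) is simply a pixelwise exclusive-or of the ideal structure with the change template, as the short sketch below shows (the 0/1 list-of-lists representation is an assumption for illustration).

    # Applying a change template T to an ideal structure Y (eq. (7)): x = y XOR t.

    def apply_template(Y, T):
        return [[y ^ t for y, t in zip(y_row, t_row)] for y_row, t_row in zip(Y, T)]

    Y = [[0, 1, 1],
         [0, 1, 1]]
    T = [[0, 0, 1],
         [1, 0, 0]]
    print(apply_template(Y, T))   # -> [[0, 1, 0], [1, 1, 1]]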
A desirable property for our distortion model is that the probability of change on and around borders be greater than the probability of change in interior and background regions. This is achieved by defining a nonhomogeneous Markov random field. The points on the change template are broken up into interior points and border points, as determined by the ideal pattern structure. Our first order nonhomogeneous Markov random field model requires five parameters:
(i) a_i, corresponding to first order interactions for the interior points;
(ii) a_b, corresponding to first order interactions for the boundary points;
(iii) a_{i,i}, corresponding to pairwise interactions of interior points;
(iv) a_{b,b}, corresponding to pairwise interactions of boundary points; and
(v) a_{b,i} = a_{i,b}, corresponding to second order interactions for interior-boundary pairs.

Setting a_b < a_i produces more changes around border regions than in interior regions. Cross [25] showed that, for homogeneous models, the parameters b(1) and b(2) control the clustering or grouping of 1's on a lattice, with negative values indicating attraction and positive values indicating repulsion; setting b(1) and b(2) to 0 produces equally likely configurations. This interpretation is also valid for our nonhomogeneous model: a_{i,i}, a_{b,b} and a_{b,i} control the clustering of interior-interior, boundary-boundary and interior-boundary pixels that are 1's. Following (5), the potential function can be written as

Q_t(X) = s_i a_i + s_b a_b + s_{ii} a_{i,i} + s_{bb} a_{b,b} + s_{bi} a_{b,i},   --(8)

where s_i and s_b are the numbers of interior and boundary points, respectively, that are 1's, and s_{ii}, s_{bb} and s_{bi} are the numbers of interior-interior, boundary-boundary and interior-boundary neighbor pairs, respectively, that are both 1's.

The probability of occurrence of a block, obtained by applying the nonterminal rules of the pattern class, can be written as

[1/(N_2 - N_1 + 1)] \prod_{(i,j) \in block} p(t_{i,j} | X),

where p(t_{i,j} | X) is defined in (6), and t_{i,j} is the pixel value in the change template T. We again assume that the type of a pattern is distributed as U[N_1,N_2].

5.3.3 Generation of Distorted Patterns

The procedure for generating distorted patterns is based on an algorithm for generating texture [25]. The generation procedure is derived by analogy to a discrete, finite state Markov chain whose states are the set of configurations {X}, with the limiting distribution given by μ(X) [49]. The transition probability from a state X to a state Y is governed by the ratio μ(Y)/μ(X). The algorithm is given in Fig. 13 [37,73]. Convergence to the limiting distribution is unaffected by the choice of the initial configuration; only the time taken to reach the final configurations depends on the initial configuration. Step 1 in Fig. 13 therefore requires the choice of a good initial configuration. We generate our initial configuration using the procedure for the independent distortion model described in Sec. 5.2.1. This procedure requires the two probabilities p_b and p_i, whose maximum likelihood estimates are developed below. A likelihood function that defines the ratio of the desired density function to the independent model distribution is

L(T) = exp{-(s_i a_i + s_b a_b + s_{ii} a_{i,i} + s_{bb} a_{b,b} + s_{bi} a_{b,i})} / [ p_b^{s_b} (1 - p_b)^{n_b - s_b} p_i^{s_i} (1 - p_i)^{n_i - s_i} ],   --(9)

where n_b and n_i are the total numbers of boundary and interior points, respectively.
Equation (9) can be broken up into product terms and expressed as

L(T) = [exp(-a_b)(1 - p_b)/p_b]^{s_b} [exp(-a_i)(1 - p_i)/p_i]^{s_i} (1 - p_i)^{-n_i} (1 - p_b)^{-n_b} exp{-(s_{ii} a_{i,i} + s_{bb} a_{b,b} + s_{bi} a_{b,i})}.

The two probabilities p_b and p_i are chosen to make the expressions in the first two brackets equal to 1. Therefore,

p_b = 1 / (1 + exp{a_b}),   p_i = 1 / (1 + exp{a_i}).

We follow the procedure used by Cross [25] to define the parameter STABLE. Given m×n arrays, we define M = mn and consider M attempted switches to constitute one iteration; this count ignores attempted switches between pixels with the same value. Convergence to the desired configurations is indicated when the number of changes per iteration drops to a small percentage of M, or when the estimated parameters closely match the input parameters. These two factors are used to define the variable STABLE in Fig. 13. The algorithm was used to generate distorted sets for the triangle pattern structures, as illustrated in Figs. 14 and 15 for different second order parameters. The time taken to generate templates of types 10-20 was roughly 500-600 seconds per template on a PDP-11/34 computer; the computation time for the independent models was just a fraction of a second.

[Figure 13: Algorithm for generating a template from the MRF distribution with given parameters. (1) Generate an initial template using p_i and p_b. (2) Until STABLE: (3) choose sites (k,l) and (k',l') with different pixel values; (4) compute r = u(new)/u(old) for the proposed switch; (5) if r >= 1, switch (k,l) and (k',l'); otherwise generate z ~ U[0,1] and switch if r > z, else retain the old configuration.]
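A minimal sketch of this exchange algorithm appears below. For clarity it recomputes the full potential Q_t after each proposed swap (an actual implementation would update it incrementally) and replaces the STABLE test with a fixed number of iterations; the boundary/interior labelling is assumed to come from the ideal pattern, and the initial template would be produced with the independent distortion probabilities p_i and p_b, as in Step 1 of Fig. 13.

    # Sketch of the template-generation (exchange) algorithm of Fig. 13.

    import math
    import random

    def potential(T, is_boundary, a_i, a_b, a_ii, a_bb, a_bi):
        """Q_t of eq. (8): first order terms plus right/down neighbor-pair terms."""
        m, n = len(T), len(T[0])
        q = 0.0
        for i in range(m):
            for j in range(n):
                if T[i][j]:
                    q += a_b if is_boundary[i][j] else a_i
                    for di, dj in ((0, 1), (1, 0)):          # count each pair once
                        r, c = i + di, j + dj
                        if r < m and c < n and T[r][c]:
                            if is_boundary[i][j] and is_boundary[r][c]:
                                q += a_bb
                            elif not is_boundary[i][j] and not is_boundary[r][c]:
                                q += a_ii
                            else:
                                q += a_bi
        return q

    def generate_template(T, is_boundary, params, iterations=50, rng=random):
        """Pairwise exchanges preserve the number of 1's placed by the initial template."""
        m, n = len(T), len(T[0])
        sites = [(i, j) for i in range(m) for j in range(n)]
        q_old = potential(T, is_boundary, *params)
        for _ in range(iterations * m * n):
            (i, j), (k, l) = rng.sample(sites, 2)
            if T[i][j] == T[k][l]:
                continue                                     # need two differing sites
            T[i][j], T[k][l] = T[k][l], T[i][j]              # propose the swap
            q_new = potential(T, is_boundary, *params)
            r = math.exp(q_old - q_new)                      # u(new)/u(old) = exp{-(Q_new - Q_old)}
            if r >= 1 or rng.random() < r:
                q_old = q_new                                # accept
            else:
                T[i][j], T[k][l] = T[k][l], T[i][j]          # reject: undo the swap
        return T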
[Figure 14: Triangles created with the Markov random field as the generative model. Parameters <a_i, a_b, a_{i,i}, a_{b,b}, a_{b,i}> = <2.2, 1.5, 0.1, -1.0, -0.8>.]

[Figure 15: Triangles from the Markov random field generative model. Parameters <2.0, 1.2, -0.5, -0.8, 0.5>.]

5.3.4 Estimation of Parameters

This section explains a method for estimating Markov random field parameters from a given set of training patterns. Nonhomogeneous Markov random fields involve many more parameters than homogeneous ones [25]. The first approach simplifies the computational requirements of the estimation procedure by approximating the probability distribution μ(X) of a configuration X:

μ(X) = exp{-Q_t(X)} / \sum_{X' \in T} exp{-Q_t(X')}.   --(10)

For convenience of notation we define the count vector C as

C = (s_i, s_b, s_{ii}, s_{bb}, s_{bi})

and the parameter vector A as

A = (a_i, a_b, a_{i,i}, a_{b,b}, a_{b,i}).

Then exp{-Q_t(X)} = exp{-<C,A>}, where < , > denotes the vector dot product. The denominator of (10) can be expressed as

\sum_{X' \in T} exp{-Q_t(X')} = 2^{mn} E[exp{-<C,A>}]   --(11)

for an m×n lattice, where E is the expectation assuming all configurations are equally likely. We expect the count vector C to converge to a multivariate normal random vector as the pattern size increases. Monte Carlo runs performed to verify this for the triangular pattern structure showed that C does seem to converge in distribution to a multivariate normal distribution with mean vector E[C] and covariance matrix R, where R is the 5×5 matrix of second order moments of the C vector. Mardia's [26,70] skewness and kurtosis tests for multivariate normality were the criteria. The Monte Carlo runs involved picking a template size N and then creating sets of 500 templates at random. The count vector was computed for each of these configurations, and Mardia's test was applied to each set of 500 count vectors. The procedure was repeated 100 times and the numbers of rejections of the null hypothesis by the skewness and kurtosis tests were observed. This test was conducted for pattern types 10, 15, 20, 40 and 50. For N less than 20, the multivariate normal hypothesis was consistently rejected by the skewness test, but there were fewer than five rejections by the kurtosis test at the 5% significance level. For pattern types 40 and 50 the hypothesis was accepted at the 5% significance level by both tests. Therefore, the approximation to the denominator seems adequate, and we write

E[exp{-<C,A>}] ≈ exp{-<E[C],A> + 0.5 A'RA},

which reduces (10) to

μ(X) ∝ exp{-<C - E[C], A> - 0.5 A'RA}.

If R is nonsingular, the maximum likelihood estimate can be written as

Â = R^{-1} (C - E[C]),

where C is the vector of observed counts; E[C] and R depend on the ideal pattern structure.

Table 1 indicates that this estimation procedure worked poorly for the configuration of triangles. One possible explanation is that the parameters chosen generate distorted, atypical configurations and that the degree of nonhomogeneity is very high: there is a high degree of attraction among boundary 1's in the change template, but the distribution of 1's in the interior is quite random. The normal approximation of the probability function is exact at the origin of the parameter space; as one moves further away from the origin, this approximation worsens.
Table 1 indicates that this estimation procedure worked poorly for the configuration of triangles. One possible explanation is that the parameters chosen generate distorted, atypical configurations and that the degree of nonhomogeneity is very high: there is a high degree of attraction among boundary 1's in the change template, but the distribution of 1's in the interior is quite random. The normal approximation of the probability function is exact at the origin of the parameter space, and it worsens as one moves further away from the origin. This was illustrated when the estimation procedure was applied to sets of triangular patterns of type 15 generated by the Markov field model with the parameters listed in Table 1. A second explanation is that the approximation is linear but the function is not, so the parameters are not unique; i.e., several parameter sets produce the same probability values but for different sets of configurations.

Table 1: Parameter estimates of the two schemes for the Markov random field model. Entries are <a_i, a_b, a_ii, a_bb, a_bi>.

   Generative                      Estimated, scheme 1            Estimated, scheme 2
1) 0.0, 0.0, 0.0, 0.0, 0.0        **                             0.0, -0.2, 0.0, 0.1, 0.0
2) 1.0, 1.0, 0.0, 0.0, 0.0        2.5, 2.5, -0.8, -1.1, -0.8     1.2, 1.2, 0.0, 0.1, 0.0
3) 2.0, 1.0, 0.0, 0.0, 0.0        ?.4, 3.6, -2.1, -1.6, -1.6     1.8, 1.0, 0.1, 0.3, 0.1
4) 2.0, 1.0, 0.0, -1.0, 0.0       5.5, 2.9, -2.1, -1.6, -1.6     1.8, 1.3, 0.0, -1.3, 0.1
5) 2.0, 1.2, 0.0, 0.0, -1.5       5.4, 3.3, -2.0, -1.6, -1.8     2.2, 0.9, -0.1, 0.2, -0.2
6) 2.2, 1.5, 0.1, -1.0, -0.8      6.4, 4.4, -2.5, -2.3, -2.0     2.0, 1.2, 0.1, -1.1, 0.7
7) 2.2, 1.5, 0.5, 0.5, 0.5        5.4, 4.4, -2.1, -2.1, -1.7     1.5, 1.3, 0.5, 0.7, 0.5
8) 2.2, 1.5, -0.5, -0.5, -0.5     5.9, 4.4, -2.3, -2.1, -1.9     1.6, 1.5, -0.4, -0.4, -0.4
9) 2.2, 1.5, -0.2, -0.8, 0.1      6.3, 4.7, -2.3, -2.5, -1.9     1.7, 1.4, -0.3, -0.9, -0.05

** could not compute; problems in numerical computation.

The poor results obtained in Table 1 prompted us to use the exact form of the likelihood function to estimate the five parameters. The log likelihood function for the probability distribution can be written as

L = sum over x_i in X of ln p(x_i = x | X)

where p(x_i = x | X) is given in equation (6). The terms in the summation above are not all independent, because of the nature of the Markov random field scheme. Therefore coding schemes [8,25], which break up the lattice into disjoint sets of independent points, are employed. A first order scheme requires at least two codings for estimation purposes, so we break L up into parts L(1) and L(2), corresponding to the two codings. The maximum likelihood estimate requires the solution of five simultaneous nonlinear equations for each coding. This system of equations can be solved numerically by a multivariable extension of Newton's method [54]. The final estimate for A is the average of the estimates obtained from the coding schemes. Although the estimation procedure is quite complex, it does yield accurate estimates for the parameter set A, as illustrated in Table 1, and the estimates should become more accurate as the pattern size increases.
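A structural sketch of the second scheme follows; it is only an outline. A first order neighbourhood is assumed, so that the two codings are the two colours of a checkerboard, and site_log_prob is a placeholder for the logarithm of the conditional probability p(x_i = x | X) of equation (6), which is not reproduced here. A general purpose optimizer stands in for the multivariable Newton iteration of [54].

```python
import numpy as np
from scipy.optimize import minimize

def coding_estimate(X, site_log_prob, A0=np.zeros(5)):
    """Scheme 2 sketch: maximise the coding likelihoods L(1) and L(2) separately,
    then average the two estimates of A = (a_i, a_b, a_ii, a_bb, a_bi)."""
    m, n = X.shape
    estimates = []
    for coding in (0, 1):                     # the two checkerboard codings
        sites = [(r, c) for r in range(m) for c in range(n) if (r + c) % 2 == coding]

        def neg_log_lik(A, sites=sites):
            # L(k): sum over the coding's (conditionally independent) sites
            return -sum(site_log_prob(X, r, c, A) for r, c in sites)

        estimates.append(minimize(neg_log_lik, A0, method="BFGS").x)
    return np.mean(estimates, axis=0)         # final estimate: average of the codings
```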
5.4 Summary

In this chapter we have studied the independent noise model, two independent distortion models and a distortion model based on Markov random fields, and presented their capabilities and restrictions. The Markov random field model is very rich and generalizes the other models: setting the second order parameters to 0 with a_i = a_b creates the independent noise model, while setting the second order parameters to 0 but requiring a_i different from a_b describes the independent distortion model. For each model we state a random experiment for the generation of patterns based on an ideal structure, and the results of applying these experiments to specific configurations are illustrated. Parameter estimation from a finite sample set and the scheme for computing the probabilities of blocks, obtained by applying the nonterminal rewriting rules, are also presented. The estimation procedure for the Markov random field distortion model is quite complex, and we have presented an approximation that simplifies this computation and is applicable in certain cases.

These noise and distortion models are quite general and may be used in other image modelling applications. For example, the nonhomogeneous Markov random field model can be applied to texture analysis and the analysis of plant and crop data [8].

CHAPTER VI
EXPERIMENTAL RESULTS AND COMPUTATION

This chapter presents applications of the array grammar inference scheme. The two structures studied are the triangular patterns and the numeral '7', whose array grammars and ideal structures are presented in Figures 2 and 3 in Chapter 2. The training sets are versions of the ideal structures corrupted by three types of noise and distortion. The first set of experiments uses the two dimensional discrepancy measure defined in Section 4.4 to compare the robustness of the inference algorithm. Robustness properties are further investigated by the parsing experiments of Section 6.2. The second objective is to examine the effect of errors in the ideal configuration: a set of noisy patterns is generated after rotating the ideal structure slightly, and the rate at which the array grammars inferred from the training sets generated by the three corrupting processes accept these samples is determined. The last section suggests the use of parallel techniques to reduce the computation time for the array grammar inference scheme.

6.1 The Inference Scheme

This section applies the inference scheme presented in Section 3.2 to training sets whose patterns are generated by the independent noise, independent distortion and Markov random field distortion models of Chapter 5. The triangular pattern structure, illustrated in Figure 2 (Chapter 2), was used for most of our experiments. The remaining experiments infer a p-array grammar from noisy versions of the figure '7' (Fig. 3, Chapter 2); this illustrates the concept of coding matrix samples into strings using vector primitives.

The parameters used to generate each of the training sets are summarized in Table 2. All the training sets contained 20 patterns. With the independent noise and distortion models the pattern types were chosen uniformly from the range [7,10]. With Markov random field distortion, the training set was made up of 10 triangles of type 9 and 10 triangles of type 10; this ensured a sufficient number of patterns of each size for parameter estimation. The training sets for the '7's were either 8x8 or 11x11 arrays, corresponding to pattern types 2 and 3, corrupted by the independent noise model. The inference of the array grammar for the '7's takes an enormous amount of computer time and memory, so we did not employ the distortion models with it.
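Generation of a single training pattern under the two independent models amounts to flipping pixels independently; a minimal sketch follows. The routine ideal_triangle and the boundary/interior labelling function are assumptions made here, and the parameter values are those of Table 2.

```python
import random

def corrupt_independent(ideal, is_boundary, p=None, p_i=None, p_b=None):
    """Flip each pixel of the ideal pattern independently.
    Independent noise      : pass p (one change probability everywhere).
    Independent distortion : pass p_i (interior) and p_b (boundary)."""
    return [[px ^ (random.random() <
                   (p if p is not None else (p_b if is_boundary(r, c) else p_i)))
             for c, px in enumerate(row)]
            for r, row in enumerate(ideal)]

# Triangle training sets of Table 2 (20 patterns, type N drawn uniformly from 7..10):
# noise_set = [corrupt_independent(ideal_triangle(random.randint(7, 10)),
#                                  is_boundary, p=0.05) for _ in range(20)]
# dist_set  = [corrupt_independent(ideal_triangle(random.randint(7, 10)),
#                                  is_boundary, p_i=0.05, p_b=0.15) for _ in range(20)]
```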
The probability of change parameters p, p_i and p_b were chosen so as to allow a reasonable amount of deviation without completely destroying the underlying pattern structure. The value of p_b was set higher than p_i so as to allow greater distortion in the boundary than in the interior regions. For the Markov random field model the parameter values were selected after a number of test runs. Parameters a_i and a_b in Table 2 introduce reasonable amounts of distortion into the patterns, with more changes likely in the boundary regions, and the values chosen for a_ii, a_bb and a_bi produce reasonable amounts of distortion along the three edges of the triangle and random changes in the interior regions.

Table 2: Parameter values used to generate the training sets (N denotes the pattern type).

   Generative model                     Parameter values                                Number of patterns
I) Pattern structure: triangles
   1) Independent noise                 p = 0.05, N from U[7,10]                        20
   2) Independent distortion            p_i = 0.05, p_b = 0.15, N from U[7,10]          20
   3) Markov random field distortion    A = <2.2, 1.5, 0.1, -1.0, -0.8>, N = 9, 10      20
II) Pattern structure: '7'
   1) Independent noise                 p = 0.05, N from U[2,3]                         20

The first part of this section presents the results obtained at each stage of the inference scheme. The second part studies the robustness of the inference algorithm and demonstrates that the Markov random field model is quite general and covers the other two schemes.

6.1.1 Implementing the Inference Scheme

The first step in the inference scheme, described in Fig. 5 (Chapter 3), is to estimate the parameters of the model from the patterns of the training set, as described in Chapter 5. The noise/distortion process used to generate the training patterns is referred to as the generative model, whereas the model assumed when estimating the parameters for the inference scheme is termed the inferential model. With the independent noise and distortion inferential models, estimation involved counting over the entire pattern set. However, the estimation procedure for the Markov random field model depends on pattern size, so parameters were estimated separately for each size; since the pattern sizes are small, the counts for all the patterns of one size were accumulated together for estimation. Our final estimates average the values obtained over all sizes and the two codings (Section 5.3.4). The estimates for each of the training sets appear in Table 3.

Table 3: Estimated parameter values.

I) Pattern structure: triangles
   1) Independent noise                 p = 0.0468
   2) Independent distortion            p_i = 0.0543, p_b = 0.137
   3) Markov random field distortion    A =
II) Pattern structure: '7'
   1) Independent noise                 p = 0.051

Step 2 in Fig. 5 breaks up the patterns into individual blocks, computes their probabilities, and separates them into their respective intermediate languages. For the '7's the blocks were coded into 'vector primitives' and converted to string form as shown in Fig. 16.

Fig. 16: Coding blocks into strings for the numeral '7'. Intermediate language A uses vector primitives V1-V8 (for example, V1 --> X . . , V2 --> . X X , V3 --> . . . , V4 --> X X X) and intermediate language B uses vector primitives W1-W8; each block is coded, primitive by primitive, into a string such as V3 V4 V2 V3 V3 V3 or W8 W2 W3 W4 W5 W6 W5 W1 W5 W5 W5.

The k-tail method for inferring regular string grammars produces a number of candidate grammars corresponding to different tail lengths. To emphasize structural differences, weight factors b1 = 1.0 and b2 = 10.0 were used to compute the overall discrepancy (Section 4.2.3). The complexity and discrepancy values were plotted as functions of tail length, and the grammar corresponding to the tail length closest to the point of intersection of the two graphs was selected as the 'best' grammar for the intermediate language. Some of these plots are illustrated in Figure 17.
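The selection rule can be stated compactly: pick the tail length nearest the point where the complexity and discrepancy curves cross. In the sketch below each curve is first scaled to [0,1]; this scaling is an assumption made here, since the two measures need not share a scale in Figure 17.

```python
def best_tail_length(tail_lengths, complexity, discrepancy):
    """Return the tail length closest to the crossing point of the
    complexity and discrepancy curves (both indexed like tail_lengths)."""
    def scale(v):
        lo, hi = min(v), max(v)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in v]

    c, d = scale(complexity), scale(discrepancy)
    gaps = [abs(ci - di) for ci, di in zip(c, d)]
    return tail_lengths[gaps.index(min(gaps))]
```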
Fig. 17: Complexity and discrepancy versus tail length, for intermediate languages A and B, under two generative models ((II): MRF distortion).

The inference procedure was implemented as a Spitbol (Snobol) program on the Cyber 170/750 computer; to reduce the total computation time, the discrepancy computation was implemented in a separate Fortran routine. The run time and the amount of core memory required by the inference program are presented in Table 4 for each of the training sets. LA and LB correspond to the two intermediate languages for each array grammar, as illustrated in Figs. 2 and 3 (Chapter 2). The table indicates that the run time and core requirements for the inference algorithm are extremely high, even for the moderate pattern sizes used in our experiments. This was the main reason for limiting the number of experiments conducted.

Table 4: Run times and core requirements.

   Generative model              Run time (secs.)        Max. core used* (K-words)
                                 LA        LB            LA        LB
Pattern structure: triangle
   1) Independent noise          295.1     506.7         72.6      92.5
   2) Independent distortion     361.2     559.9         72.6      72.6
   3) Markov random field        299.4     909.3         72.6      98.5
Pattern structure: '7'
   1) Independent noise          867.3     906.2         76.6      83.2

* Each word on the Cyber is 60 bits long. The maximum core available to a user is 127.5 K-words.

6.1.2 Robustness Tests

The purpose of the robustness test is to assess the effectiveness of the inference procedure when the generative and inferential models are not the same. The two dimensional discrepancy measure assesses the fit of the inferred array language to the training set.

The triangle was used as the ideal structure. Three training sets of 20 patterns each were formed: 10 of the patterns were of type 9 and the other 10 were of type 10. The parameters for the generation process were the same as in Table 2, and the corresponding array grammars were inferred in the usual way. The discrepancy measures are presented in Table 5.

Table 5: Robustness test using the discrepancy measure.

   Generative model      Inferential model     Suitability value
1) Indep. noise          Indep. noise          1.061 E-16
2) Indep. noise          Indep. distort.       2.090 E-16
3) Indep. noise          MRF distort.          0.245 E-16
4) Indep. distort.       Indep. noise          17.62 E-16
5) Indep. distort.       Indep. distort.       1.933 E-16
6) Indep. distort.       MRF distort.          0.661 E-16
7) MRF distort.          Indep. noise          37.91 E-16
8) MRF distort.          Indep. distort.       34.02 E-16
9) MRF distort.          MRF distort.          0.050 E-16

Rows 1, 2 and 3 of Table 5, with the independent noise generative model, show that the Markov random field distortion model subsumes the independent noise model. Rows 4, 5 and 6 show that with the independent distortion generative model, an inferential model of independent noise creates a high discrepancy, whereas a Markov random field inferential model produces an even better fit than the "correct" inferential model. Comparing rows 3 and 6 with the discrepancy values obtained in rows 7, 8 and 9 emphasizes the generality of the Markov random field inferential model, which produces more suitable grammars than the other two inferential models create for the same generative model (rows 1 and 5).
This justifies the use of the Markov random field inferential model.

6.2 Parsing Experiments

The two experiments reported in this section compare acceptance probabilities for triangles generated by the three different noise/distortion models. In the first experiment, the array grammars used for parsing were those inferred in Section 6.1.1 (Table 3, part I), and the patterns parsed were generated from the same array grammars. The second experiment created 15 distorted triangles from the physical processes themselves, five under each of the noise/distortion models. The array grammars used to parse them were inferred as in experiment I, except that intermediate grammars of tail length one were selected; these array grammars are less restrictive than the grammars of experiment I.

The results of experiment I are given in Table 6. Except for pattern number 14, the probability of acceptance for each pattern was highest when the pattern was parsed by the array grammar that generated it. Some of the patterns generated by one of the corruption processes could not be parsed by the array grammars trained on patterns from the other corruption processes, which demonstrates differences among the array grammar models of Section 6.1.1.

Table 6: Acceptance probabilities - patterns generated from array grammars.

   Source   Pattern number   Size   INN           IND           MRF
   INN      1                9      0.158 E-17    0.032 E-17    0.0001 E-17
   INN      2                10     0.199 E-19    0.026 E-19    0.0001 E-19
   INN      3                10     0.203 E-22    R             R
   INN      4                11     0.681 E-23    0.026 E-23    R
   INN      5                11     0.602 E-26    0.049 E-26    R
   IND      6                9      0.104 E-19    0.207 E-19    R
   IND      7                10     0.013 E-27    0.233 E-27    R
   IND      8                10     R             0.424 E-26    0.179 E-26
   IND      9                11     R             0.275 E-29    R
   IND      10               11     0.178 E-31    0.512 E-31    R
   MRF      11               9      R             R             0.629 E-25
   MRF      12               10     R             0.009 E-30    0.716 E-30
   MRF      13               10     R             R             0.747 E-32
   MRF      14               11     0.017 E-31    0.327 E-31    0.004 E-31
   MRF      15               11     R             R             0.353 E-35

INN - independent noise model; IND - independent distortion model; MRF - Markov random field model. R - rejection: the pattern could not be parsed by that grammar.

The results of the second experiment are listed in Table 7. The grammar based on the Markov random field model accepted all 15 patterns, whereas the grammars based on the independent models could parse only two of the patterns generated under the Markov random field model. This illustrates that the Markov random field model generalizes both the other models. Table 7 also shows that the independent distortion model generalizes the independent noise model. In a few cases, a pattern was accepted with higher probability by an inferential model different from the generative model (e.g. patterns 1 and 3), but this is more likely a spurious result caused by the random nature of the generative process.

Table 7: Acceptance probabilities - patterns generated from the physical processes.

   Source   Pattern number   Size   INN           IND           MRF
   INN      1                6      0.051 E-11    0.176 E-11    0.086 E-11
   INN      2                6      0.165 E-13    R             0.294 E-13
   INN      3                11     0.354 E-33    0.530 E-33    0.014 E-33
   INN      4                11     0.508 E-32    R             0.006 E-32
   INN      5                12     0.191 E-30    0.015 E-30    < E-33
   IND      6                6      0.016 E-13    0.193 E-12    0.308 E-12
   IND      7                6      R             0.164 E-13    0.002 E-13
   IND      8                12     0.009 E-38    0.801 E-38    0.074 E-38
   IND      9                12     R             0.591 E-39    0.410 E-39
   IND      10               12     R             R             0.266 E-43
   MRF      11               12     R             R             0.512 E-45
   MRF      12               12     0.0002 E-47   R             0.180 E-47
   MRF      13               12     < E-51        R             0.141 E-47
   MRF      14               12     R             R             0.488 E-46
   MRF      15               12     R             R             0.144 E-52

INN - independent noise model; IND - independent distortion model; MRF - Markov random field model. R - rejection: the pattern could not be parsed by that grammar. < - the acceptance probability was less than the value indicated.
6.3 Recognition of Rotated Triangles

The purpose of this experiment is to examine the effects of small rotations of the ideal configuration on the acceptance rate. A triangle of type 13 was rotated by 10 degrees, as shown in Figure 18. Ten patterns were created by imposing the independent noise model, with a probability of change of 0.05, on the rotated triangle, and these were parsed by the three array grammars of Section 6.1.1; the results are presented in Table 8. Some of the noisy rotated triangles are shown in Figure 19.

Fig. 18: Rotated triangle. The boundary of the original 13x13 triangle and the boundary of the triangle rotated by 10 degrees.

The Markov random field model grammar accepted 9 of the 10 rotated patterns, though the probability of acceptance was quite low (the acceptance value is of the order of E-47 for the ideal triangular structure of size 15). The noise and distortion model grammars failed to recognize these triangles. This shows that an inference scheme based on the Markov random field distortion model can incorporate small variations in the ideal structure into the distortion process.

Table 8: Acceptance probabilities - rotated triangles.

   Source   Pattern number   INN   IND   MRF
   INN      1                R     R     0.132 E-97
   INN      2                R     R     0.408 E-104
   INN      3                R     R     0.482 E-93
   INN      4                R     R     0.262 E-92
   INN      5                R     R     0.725 E-95
   INN      6                R     R     R
   INN      7                R     R     0.388 E-97
   INN      8                R     R     0.434 E-101
   INN      9                R     R     0.275 E-92
   INN      10               R     R     0.723 E-97

INN - independent noise model; IND - independent distortion model; MRF - Markov random field model. R - rejection: the pattern could not be parsed by that grammar.

Fig. 19: Noisy rotated triangles.

6.4 Parallel Computation Techniques

The inference algorithm for regular grammars, the algorithm for computing the discrepancy measure and the computation of the measure of suitability for array languages require large amounts of computer time in the normal sequential mode of operation. However, parallel computation techniques, such as SIMD (Single Instruction, Multiple Data) and pipeline architectures, offer significant speedup. Parallel algorithms have been presented for the k-tail inference scheme for string grammars [16] and for the computation of the WLD between two strings [16,17]; these can be implemented on general purpose pipeline processors [6] or dedicated SIMD systems [17]. The time complexity of the WLD computation for two strings of lengths m and n is reduced from O(mn) to O(m+n), but requires min(m,n)+1 processing elements [17]. We now extend the parallel algorithm for computing the WLD for strings to a parallel algorithm for computing WLD2, the distance between two arrays.
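For reference, the string WLD that both the parallel string scheme and Procedure A below rely on is the familiar O(mn) dynamic program; a sketch with insertion, deletion and substitution costs IC, DC and SC is given here.

```python
def wld(a, b, IC=1.0, DC=1.0, SC=1.0):
    """Weighted Levenshtein distance between two strings (or symbol lists)."""
    m, n = len(a), len(b)
    D = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i * DC                      # delete all of a[:i]
    for j in range(1, n + 1):
        D[0][j] = j * IC                      # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else SC
            D[i][j] = min(D[i - 1][j] + DC,   # delete a[i-1]
                          D[i][j - 1] + IC,   # insert b[j-1]
                          D[i - 1][j - 1] + sub)
    return D[m][n]
```

The parallel versions of [16,17] evaluate all entries along an anti-diagonal i+j = constant simultaneously, which is what reduces the time to O(m+n) with min(m,n)+1 processing elements.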
This algorithm, a parallel implementation of the algorithm from Moore [76], is presented below. The notation WLD(COL-J, COL-L) denotes the string WLD between column J of the IxJ subarray of A and column L of the KxL subarray of B; WLD(ROW-I, ROW-K) has a similar interpretation. SUBS(A_PQ, B_RS) represents the substitution cost of replacing element A_PQ of array A by element B_RS of array B; it is 0 if A_PQ = B_RS, and SC (the substitution cost) otherwise.

Procedure A:

Input: Two arrays, A and B, of sizes PxQ and RxS respectively; SC, the substitution cost; IC, the insertion cost; and DC, the deletion cost.

Begin
(1) Set Dist(0,0,0,0) = 0.
(2) For M = 1 to P+Q+R+S, do in parallel for all I, J, K, L = 0 to M such that I+J+K+L = M, steps (i)-(xvi):
    (i)    e1  = Dist(I,J,K,L-1) + K*IC
    (ii)   e2  = Dist(I,J,K-1,L) + L*IC
    (iii)  e3  = Dist(I,J-1,K,L) + I*DC
    (iv)   e4  = Dist(I-1,J,K,L) + J*DC
    (v)    e5  = Dist(I,J,K-1,L-1) + (K*L-1)*IC
    (vi)   e6  = Dist(I-1,J-1,K,L) + (I*J-1)*DC
    (vii)  e7  = Dist(I,J-1,K,L-1) + WLD(COL-J,COL-L)
    (viii) e8  = Dist(I-1,J,K-1,L) + WLD(ROW-I,ROW-K)
    (ix)   e9  = Dist(I,J-1,K-1,L) + L*IC + I*DC
    (x)    e10 = Dist(I-1,J,K,L-1) + K*IC + J*DC
    (xi)   e11 = Dist(I,J-1,K-1,L-1) + (J-1)*DC + WLD(COL-J,COL-L)
    (xii)  e12 = Dist(I-1,J,K-1,L-1) + (K-1)*IC + WLD(ROW-I,ROW-K)
    (xiii) e13 = Dist(I-1,J-1,K,L-1) + (J-1)*DC + WLD(COL-J,COL-L)
    (xiv)  e14 = Dist(I-1,J-1,K-1,L) + (I-1)*DC + WLD(ROW-I,ROW-K)
    (xv)   e15 = Dist(I-1,J-1,K-1,L-1) + WLD(ROW-I,ROW-K) + WLD(COL-J,COL-L) - SUBS(A_PQ,B_RS)
    (xvi)  Dist(I,J,K,L) = min(e_i : 1 <= i <= 15)
    End loop
End loop
Distance(A,B) = Dist(P,Q,R,S)
End Procedure.

At each step in the WLD2 computation, the string WLDs between strings of lengths I and K, and of lengths J and L, are computed. The WLD2 computation for PxQ and RxS arrays therefore requires P+Q+R+S steps with a time complexity of O((P+Q)+(R+S)) at each step, so the overall time complexity can be expressed as O((P+Q+R+S)^2). For P = Q = R = S = N this reduces to O(N^2), as opposed to O(N^6) for the sequential case.

The general expression for the maximum number of processors required is extremely complicated, but for I = J = K = L = N it can be expressed as

(N+1)(2N^2 + 4N + 3)/3,   for N >= 2    --(1)

For N = 1, seven processors are required. This shows that the number of processors required is of the order of N^3. The WLD2 computation for two 20x20 arrays requires 6181 processors, a rather large number. A more practical scheme might involve using a smaller number of processors, say U, and a pipeline architecture. Ignoring the set-up delay, i.e. the time taken to structure the pipeline for the computation, the number of time steps required for the computation can be expressed as the sum, over i = 1 to I+J+K+L, of ceil(M_i / U), where M_i is the number of processors required at step i of step (2) of the above algorithm. For I = J = K = L = N, M_i can be expressed recursively as

M_i = (i+1)(i+2)(i+3)/6,              1 <= i <= N
    = M_{i-1} + (i+1),                 i = N+1
    = M_{i-1} + (2N-1) - 3(i-N-1),     N+2 <= i <= 2N
    = M_{4N-i},                        i > 2N
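Expression (1) can be checked by brute force if one processing element is assumed per index tuple active at a step of Procedure A, so that M_i is simply the number of (I,J,K,L) with I+J+K+L = i and each index within its range.

```python
from itertools import product

def wavefront_sizes(P, Q, R, S):
    """M_i for Procedure A: number of (I,J,K,L) with I+J+K+L = i,
    0 <= I <= P, 0 <= J <= Q, 0 <= K <= R, 0 <= L <= S."""
    sizes = [0] * (P + Q + R + S + 1)
    for I, J, K, L in product(range(P + 1), range(Q + 1), range(R + 1), range(S + 1)):
        sizes[I + J + K + L] += 1
    return sizes

N = 20
print(max(wavefront_sizes(N, N, N, N)))            # 6181
print((N + 1) * (2 * N * N + 4 * N + 3) // 3)      # expression (1): also 6181
```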
Kleene closure by rows is defined as follows: let M^1 = M, M^2 = M^1 (+) M, ..., M^(n+1) = M^n (+) M, where (+) denotes row catenation.

APPENDIX B
ARRAY AUTOMATA

An array automaton consists of a two dimensional input tape and a two dimensional storage tape. If the input is an mxn array of cells, the storage tape contains a (2m+1)x(2n+1) array of cells. The automaton moves in three stages. In the first, the automaton acts on the storage tape, reading, printing and "pushing" symbols around according to the nonterminal production rules, ultimately subdividing the (2m+1)x(2n+1) storage tape into blocks, each of which corresponds to a matrix of a corresponding IML. In the second stage, the automaton acts as several automata and fills up each symbol (i,j) of the storage tape for which i and j are both even numbers. The blocks can have a fixed number (say k >= 1) of rows (or columns), so the automaton must act on these rows or columns simultaneously. The third stage consists entirely of checking: the automaton compares the input symbol in the (i,j)th cell with the symbol in the (2i,2j)th cell of the storage tape. If the comparison fails at any cell the automaton halts without accepting; otherwise the automaton halts and accepts the input. The set of patterns accepted by array automata is equivalent to the set of arrays generated by array grammars [60]. We conclude this section with a formal definition of a (regular:regular) array automaton {(R:R) AA}.

The (regular:regular) array automaton is a 10 tuple (K, I, Z, d1, d2, d3, q0, Z0, F, C), where

K is a finite set of states;
q0 in K is a distinguished (start) state;
F, a subset of K, is the set of final states;
I is the input alphabet, i.e. the primitive picture symbols;
Z = Z1 U Z2 U Z3, with Z3 = I, is the finite set of storage symbols;
Z0 in Z1 is the initial storage symbol;
d1 is a mapping from {q0} x Z1 into finite subsets of {q0} x {Z1 (+) Z2} or {q0} x {Z2 (+) Z1}, where (+) stands for either row catenation or column catenation;
d2 is a mapping from K x Z2 into finite subsets of K x {(Z2 (+) Z3) U (Z3 (+) Z2)};
d3 is a mapping from K x I x I into K x {DOWN, TOP};
C is a counter that starts from an initial value of 0 and keeps track of the d1 moves of the automaton.

The d1 moves of the automaton correspond to the nonterminal rewriting rules of the corresponding array grammar and create the block structure description of the required size. The d2 moves constitute the second stage and correspond to filling each block independently with the primitive symbols of the picture. The d3 mapping corresponds to the checking phase, where the input picture is compared to the picture on the storage tape; DOWN implies a move downward along a column and TOP implies a change to the top of the next column.

APPENDIX C
MATRIX GRAMMARS

A matrix grammar is a two tuple MG = (G1, G2), where G1 = (V1, I1, P1, S) is a string grammar that can be context-sensitive, context-free or regular:

V1 is a finite set of horizontal nonterminals;
I1 is a finite set of intermediates {S1, S2, ..., Sk};
P1 is a finite set of production rules;
S in V1 is the start symbol;

and G2 is the union, for i = 1 to k, of the vertical grammars G2i, where each G2i = (V2i, I, P2i, Si) is a right linear (regular) grammar:

V2i is a finite set of vertical nonterminals;
P2i is a finite set of right linear production rules;
I is the set of picture primitives; and
V2i and V2j are disjoint if i is not equal to j.

Derivations to create an mxn picture follow a two step procedure. First a string s1 s2 ... sn of symbols from I1 is generated horizontally using G1. Then the corresponding G2i rules are applied simultaneously to every column, m times, to create an mxn matrix. The set of all matrices generated by a matrix grammar is denoted

L(MG) = {mxn arrays [aij], m,n >= 1 | S => s1 s2 ... sn => [aij]}.
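As an illustration of the two step derivation, the following toy matrix grammar (invented here, not taken from the text) draws an L-shaped picture: G1 produces a horizontal string of intermediates, and each intermediate's right linear vertical grammar is then expanded in parallel down its column.

```python
def derive_horizontal(n):
    """G1 (regular): S -> S1 A, A -> S2 A | S2, giving one S1 followed by S2's."""
    return ["S1"] + ["S2"] * (n - 1)

def derive_column(intermediate, m):
    """G2i (right linear): S1 -> X S1 | X;  S2 -> . S2 | X (terminating after m rows)."""
    return ["X"] * m if intermediate == "S1" else ["."] * (m - 1) + ["X"]

def derive_picture(m, n):
    cols = [derive_column(s, m) for s in derive_horizontal(n)]
    return ["".join(col[r] for col in cols) for r in range(m)]

for row in derive_picture(4, 6):
    print(row)
# X.....
# X.....
# X.....
# XXXXXX
```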
LIST OF REFERENCES

[1] A.V. Aho and T.G. Peterson, 'A minimum distance error-correcting parser for context free languages', SIAM Jour. Computing, vol. 4, 1972, pp. 304-312.

[2] J.E. Albus, 'Electrocardiogram interpretation using a stochastic finite state model', in Syntactic Pattern Recognition: Applications (K.S. Fu, ed.), Springer Verlag, Berlin, Heidelberg, 1977, pp. 51-64.

[3] F. Ali and T. Pavlidis, 'Syntactic recognition of handwritten numerals', IEEE Trans. Sys., Man and Cyber., vol. SMC-7, 1977, pp. 537-541.

[4] M.R. Anderberg, Cluster Analysis for Applications, Academic Press, New York, 1973.

[5] A. Apostolico, 'On the efficiency of syntactic feature extraction and approximation schemes for data compression', Proc. 5th Intl. Conf. on Pattern Recognition, Miami Beach, Florida, vol. 2, 1980, pp. 982-984.

[6] J.L. Baer, Computer Systems Architecture, Chapter 10.2, Computer Science Press, Maryland, 1980.

[7] J.E. Besag, 'Nearest neighbor systems and the auto-logistic model for binary data', J. Royal Stat. Soc., Ser. B, vol. 34, 1972, pp. 75-83.

[8] J.E. Besag, 'Spatial interaction and the statistical analysis of lattice systems', J. Royal Stat. Soc., Ser. B, vol. 36, 1974, pp. 192-236.

[9] A.W. Biermann and J.A. Feldman, 'On the synthesis of finite state machines from samples of their behavior', IEEE Trans. Comput., vol. C-21, 1972, pp. 592-597.

[10] M. Blum, 'On the size of machines', Info. and Control, vol. 11, 1967, pp. 257-265.

[11] T.L. Booth and R.A. Thomson, 'Applying probabilistic measures to abstract languages', IEEE Trans. Comput., vol. C-22, 1973, pp. 442-450.

[12] J.R. Bourne, V. Jagannathan, B. Giese and J.W. Ward, 'A software system for syntactic analysis of EEG', Computer Programs in Biomedicine, vol. 11, 1980, pp. 190-200.

[13] W.S. Brainerd, 'Tree-generating regular systems', Info. and Control, vol. 14, 1969, pp. 217-231.

[14] J.M. Brayer and K.S. Fu, 'A note on the k-tail method of tree grammar inference', IEEE Trans. Sys., Man, Cyber., vol. SMC-7, 1977, pp. 293-300.

[15] J.M. Brayer, 'Web automata and parsing web grammars', Report NSF Workshop on Structural and Syntactic Pattern Recog., Saratoga Springs, New York, 1981, pp. 40-41.

[16] Y. Chiang and K.S. Fu, 'Parallel processing of some syntactic pattern recognition algorithms', Ninth Workshop Appl. Imagery, Maryland, 1980, pp. 1-21.

[17] Y. Chiang and K.S. Fu, 'Parallel processing for distance computation in syntactic pattern recognition', IEEE Comput. Soc. Workshop Comput. Architecture for Pattern Analysis and Image Database Management, VA, 1981, pp. 337-344.

[18] N. Chomsky, Syntactic Structures, Mouton and Co., The Hague, Netherlands, 1957.

[19] N. Chomsky and G.A. Miller, 'Finite state languages', Info. and Control, vol. 1, 1958, pp. 91-112.

[20] V. Claus, et al., eds., Graph Grammars and their Applications to Computer Science and Biology, Springer Verlag, Berlin, Heidelberg, 1979.

[21] C.M. Cook, 'A cost function for concept formation', Tech. Rep. TR-212, Computer Science Center, Univ. of Maryland, College Park, 1972.

[22] C.M. Cook, 'Grammatical inference by heuristic search', Tech. Rep. TR-287, Computer Science Center, Univ. of Maryland, College Park, 1974.

[23] C.R. Cook and P.S. Wang, 'A Chomsky hierarchy of isotonic array grammars and languages', Computer Graphics and Image Processing, vol. 8, 1978, pp. 144-152.

[24] S. Crespi Reghizzi, 'The mechanical acquisition of precedence grammars', Tech. Rep. UCLA-ENG-7054, School of Engg. and Appl. Science, Univ. of California, Los Angeles, 1970.

[25] G.R. Cross, 'Markov random fields and texture models', Ph.D. thesis, Michigan State Univ., E. Lansing, 1980.

[26] G.R. Cross, et al., 'Multivariate normality in pattern recognition and clustering', Proc. 6th Intl. Conf. on Pattern Recog., Munich, Germany, 1982, pp. 862-864.
[27] R. DeMori, et al., 'A syntactic procedure for the recognition of glottal pulses in continuous speech', Pattern Recognition, vol. 9, 1977, pp. 181-189.

[28] P.J. Denning, et al., Machines, Languages and Computation, Prentice-Hall, New Jersey, 1978.

[29] R.L. Dobrushin, 'Gibbsian random fields for lattice systems with pairwise interactions', Funcl. Analy. Applic., vol. 2, 1968, pp. 292-301.

[30] T. Dubitzsky, et al., 'Parallel region property computations by active quadtree networks', IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-3, 1981, pp. 626-633.

[31] R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis, John Wiley, New York, 1973.

[32] M.J.B. Duff, 'Parallel processing techniques', in Pattern Recognition - Ideas in Practice (B.G. Batchelor, ed.), Plenum Press, 1978, pp. 145-176.

[33] J. Earley, 'An efficient context free parsing algorithm', CACM, vol. 13, 1970, pp. 94-102.

[34] J.A. Feldman, et al., 'Grammatical complexity and inference', Tech. Rep. CS-125, Computer Science Dept., Stanford Univ., Stanford, 1969.

[35] J.A. Feldman, 'Some decidability results on grammatical inference and complexity', Info. and Control, vol. 20, 1972, pp. 244-262.

[36] N.V. Findler and J.V. Leeuven, 'A family of similarity measures between strings', IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-1, 1979, pp. 116-118.

[37] P.A. Flinn, 'Monte Carlo calculation of phase separation in a 2-dimensional Ising system', J. Stat. Physics, vol. 10, 1974, pp. 89-97.

[38] H. Freeman, 'On the encoding of arbitrary geometric patterns', IRE Trans. on Electronic Comp., vol. EC-10, 1961, pp. 260-268.

[39] K.S. Fu and T. Huang, 'Stochastic grammars and languages', Intl. Jour. Comp. and Info. Sciences, vol. 1, 1972, pp. 135-170.

[40] K.S. Fu and B.K. Bhargava, 'Tree systems for syntactic pattern recognition', IEEE Trans. Computers, vol. C-22, 1977, pp. 1087-1099.

[41] K.S. Fu, Syntactic Methods in Pattern Recognition, Academic Press, New York, 1974.

[42] K.S. Fu and T.L. Booth, 'Grammatical inference: introduction and survey', Part I, IEEE Trans. Sys., Man, Cyber., vol. SMC-5, 1975, pp. 95-111; Part II, IEEE Trans. Sys., Man, Cyber., vol. SMC-5, 1975, pp. 409-423.

[43] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, New York, 1972.

[44] L.W. Fung and K.S. Fu, 'Stochastic syntactic decoding for pattern classification', IEEE Trans. Comput., vol. C-24, 1975, pp. 662-667.

[45] R.C. Gonzalez and M.G. Thomason, 'Tree grammars and their applications to pattern recognition', Tech. Rep. TR-EE/CS-74-20, Electrical Engineering Department, Univ. of Tennessee, Knoxville, 1974.

[46] R.C. Gonzalez and M.G. Thomason, Syntactic Pattern Recognition: An Introduction, Addison-Wesley, Reading, Mass., 1978.

[47] R.C. Gonzalez and R. Saafabakhsh, 'Computer vision techniques for industrial applications and robot control', Computer, vol. 15, 1982, pp. 17-32.

[48] J. Gruska, 'Some classifications of context free languages', Info. and Control, vol. 14, 1969, pp. 152-179.

[49] J.M. Hammersley and D.C. Handscomb, Monte Carlo Methods, Methuen, London, 1964.

[50] J.M. Hammersley and P. Clifford, 'Markov random fields on finite graphs and lattices', unpublished manuscript.

[51] J. Hartmanis and R.E. Stearns, 'On the computational complexity of algorithms', Trans. American Math. Soc., vol. 117, 1965, pp. 285-306.

[52] J.E. Hopcroft and J.D. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, Mass., 1979.
[53] J.J. Horning, 'A study in grammatical inference', Ph.D. thesis, Stanford Univ., Stanford, CA, 1969.

[54] E. Isaacson and H.B. Keller, Analysis of Numerical Methods, John Wiley, New York, 1966.

[55] V. Isham, 'An introduction to spatial point processes and Markov random fields', Intl. Stat. Review, vol. 49, 1981, pp. 21-43.

[56] S.N. Jayaramamurthy, 'Multilevel array grammars for generating texture scenes', Proc. IEEE Conf. on Pattern Recognition and Image Processing, Chicago, Illinois, 1979, pp. 391-398.

[57] R.L. Kashyap and M.C. Mittal, 'Recognition of spoken words and phrases in multitalker environment using syntactic methods', IEEE Trans. Computers, vol. C-27, 1978, pp. 442-451.

[58] R.A. Kirsh, 'Computer interpretation of English text and picture patterns', IRE Trans. Electronic Computers, vol. EC-13, 1964, pp. 363-376.

[59] H.C.M. Kleijn and G. Rozenberg, 'A study in parallel rewriting systems', Info. and Control, vol. 44, 1980, pp. 134-163.

[60] K. Krithivasan and R. Siromoney, 'Array automata and operations on array languages', Intl. Jour. of Computer Math., vol. 4, Sec. A, 1974, pp. 3-30.

[61] W. Kuich, 'On the entropy of context free languages', Info. and Control, vol. 16, 1970, pp. 173-200.

[62] R.S. Ledley, et al., 'FIDAC: Film input to digital automatic computer and associated syntax-directed pattern recognition programming system', in Optical and Electro-Optical Info. Proc. (J.T. Tippet et al., eds.), MIT Press, Cambridge, Mass., 1965, pp. 591-613.

[63] H.C. Lee and K.S. Fu, 'A stochastic syntax analysis procedure and its application to pattern classification', IEEE Trans. Computers, vol. C-21, 1972, pp. 660-666.

[64] V.I. Levenshtein, 'Binary codes capable of correcting deletions, insertions and reversals', Sov. Phys. Dokl., vol. 10, 1966, pp. 707-710.

[65] B. Levine, 'Derivatives of tree sets with applications to grammatical inference', IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-3, 1981, pp. 285-293.

[66] J. Liou, 'Grammatical inference by a constructive method', Ph.D. dissertation, Dept. of Computer Science, Michigan State Univ., E. Lansing, 1977.

[67] S.Y. Lu and K.S. Fu, 'Stochastic error-correcting syntax analysis for recognition of noisy patterns', IEEE Trans. Comput., vol. C-26, 1977, pp. 1268-1276.

[68] S.Y. Lu and K.S. Fu, 'A syntactic approach to texture analysis', Computer Graphics and Image Processing, vol. 7, 1978, pp. 303-330.

[69] F.J. Maryanski, 'Inference of probabilistic grammars', Ph.D. dissertation, Dept. of Elec. Engg. and Computer Science, Univ. of Connecticut, Storrs, 1974.

[70] K.V. Mardia, 'Applications of some measures of multivariate skewness and kurtosis in testing normality and robustness studies', Sankhya: Ind. J. of Stat., vol. 36, Ser. B, pt. 2, 1974, pp. 142-154.

[71] W.J. Masek and M.S. Paterson, 'A faster algorithm computing string edit distances', J. Comput. and Sys. Sciences, vol. 20, 1980, pp. 18-31.

[72] A. Mercer and A. Rosenfeld, 'An array grammar programming system', Comm. ACM, vol. 16, 1973, pp. 299-305.

[73] N. Metropolis, et al., 'Equations of state calculations by fast computing machines', J. Chem. Physics, vol. 21, 1953, pp. 1087-1091.

[74] D.L. Milgram and A. Rosenfeld, 'Array automata and array grammars', Proc. IFIP Congress-71, vol. 1, North Holland, 1971, pp. 69-74.

[75] B. Moayer and K.S. Fu, 'An application of stochastic languages to fingerprint pattern recognition', Pattern Recognition, vol. 8, 1976, pp. 173-179.

[76] R.K. Moore, 'A dynamic programming algorithm for the distances between two finite areas', IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-1, 1979, pp. 86-88.
[77] J.L. Mundy and R.E. Joynson, 'Automatic visual inspection using syntactic analysis', Proc. IEEE Conf. on Pattern Recognition and Image Processing, Troy, New York, 1977, pp. 144-147.

[78] R. Narasimhan, 'Labelling schemata and syntactic description of pictures', Info. and Control, vol. 7, 1964, pp. 151-179.

[79] R. Narasimhan, 'Syntax directed interpretation of classes of pictures', Comm. ACM, vol. 9, 1966, pp. 166-173.

[80] T. Okuda, et al., 'A method for correction of garbled words based on the Levenshtein metric', IEEE Trans. Comput., vol. C-25, 1976, pp. 172-177.

[81] T.W. Pao, 'A solution of the syntactical induction-inference problem for a non-trivial subset of context free languages', Tech. Rep. 69-19, Moore School of Elec. Engg., Univ. of Pennsylvania, Philadelphia, 1969.

[82] L.F. Pau, 'Semiconductor ICs: integrated testing and algorithms for visual inspection', Proc. 5th Intl. Conf. on Pattern Recog., Miami Beach, Florida, vol. 1, 1980, pp. 238-240.

[83] T. Pavlidis, Structural Pattern Recognition, Springer-Verlag, Berlin, Heidelberg, 1977.

[84] T. Pavlidis and F. Ali, 'A hierarchical syntactic shape analyzer', IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-1, 1979, pp. 2-9.

[85] E. Persoon and K.S. Fu, 'Shape description using Fourier descriptors', IEEE Trans. Sys., Man and Cyber., vol. SMC-7, 1977, pp. 170-179.

[86] J.L. Pfaltz, 'Web grammars and picture description', Computer Graphics and Image Processing, vol. 1, 1972, pp. 193-220.

[87] G.M. Porter, et al., 'Automatic visual inspection of blind holes in metal surfaces', Proc. IEEE Conf. on Pattern Recognition and Image Processing, Chicago, Illinois, 1979, pp. 83-86.

[88] W.K. Pratt, Digital Image Processing, John Wiley (Interscience), New York, 1978.

[89] C.J. Preston, 'Generalized Gibbs states and Markov random fields', Adv. Appl. Prob., vol. 5, 1973, pp. 242-261.

[90] K. Rao and K. Balck, 'Type classification of fingerprints: a syntactic approach', IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-2, 1980, pp. 223-231.

[91] A. Rosenfeld, 'Isotonic grammars, parallel grammars and picture grammars', Machine Intelligence, vol. 4, 1971, pp. 281-294.

[92] A. Rosenfeld and D.L. Milgram, 'Web automata and web grammars', Machine Intelligence, vol. 7, 1972, pp. 307-324.

[93] A. Rosenfeld, 'Array grammar normal forms', Info. and Control, vol. 23, 1973, pp. 173-182.

[94] G. Rozenberg and A. Salomaa, The Mathematical Theory of L Systems, Academic Press, 1980.

[95] A. Salomaa, Formal Languages, Academic Press, New York, 1973.

[96] C.E. Shannon, 'A mathematical theory of communication', Bell Sys. Tech. Jour., vol. 27, 1948, pp. 379-423.

[97] A.C. Shaw, 'A formal picture description scheme as a basis for picture processing systems', Info. and Control, vol. 14, 1969, pp. 9-52.

[98] A.C. Shaw, 'Picture graphs, grammars and parsing', in Frontiers of Pattern Recognition (S. Watanabe, ed.), Academic Press, New York, 1972.

[99] G. Siromoney, et al., 'Abstract families of matrices and picture languages', Computer Graphics and Image Processing, vol. 1, 1972, pp. 284-307.

[100] G. Siromoney, et al., 'Picture languages with array rewriting rules', Info. and Control, vol. 22, 1973, pp. 447-470.

[101] R. Siromoney and K. Krithivasan, 'Parallel context free languages', Info. and Control, vol. 24, 1974, pp. 155-162.

[102] R. Siromoney and G. Siromoney, 'Parallel 0L languages', Intl. J. Comput. Math., vol. 5, Sec. A, 1975, pp. 109-123.
[103] A.R. Smith III, 'Cellular automata and formal languages', IEEE Eleventh Ann. Symp. Switch. and Automata Theory, CA, 1970, pp. 216-224.

[104] A.R. Smith III, 'Cellular automata complexity tradeoffs', Info. and Control, vol. 18, 1971, pp. 466-482.

[105] A.R. Smith III, 'Real time language recognition by one dimensional cellular automata', J. Comput. Sys. Sciences, vol. 6, 1971, pp. 233-253.

[106] S.R. Sternberg, 'Parallel architectures for image processing', Intl. IEEE COMPSAC, Chicago, 1979, pp. 1-6.

[107] G.C. Stockman, et al., 'Structural pattern recognition of carotid waves using a general waveform parsing system', Comm. ACM, vol. 19, 1976, pp. 688-695.

[108] G.T. Tang and T.S. Huang, 'A syntactic-semantic approach to image understanding and creation', IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-1, 1979, pp. 135-144.

[109] E. Tanaka and Y. Kikuchi, 'A metric between pictures', Tech. Rep., Dept. of Info. Science, Utsunomiya Univ., Japan, 1981.

[110] R.A. Thompson, 'Determination of probabilistic grammars for functionally specified probabilistic-measure languages', IEEE Trans. Comput., vol. C-23, 1973, pp. 603-614.

[111] H. Tropt, 'Analysis by synthesis search for semantic segmentation applied to workpiece recognition', Proc. 5th Intl. Conf. on Pattern Recog., Miami Beach, Florida, vol. 1, 1980, pp. 241-244.

[112] W.H. Tsai and K.S. Fu, 'A syntactic-statistical approach to recognition of industrial objects', Proc. 5th Intl. Conf. on Pattern Recog., Miami Beach, Florida, vol. 1, 1980, pp. 251-259.

[113] R.M. Wagner and M.J. Fischer, 'The string-to-string correction problem', J. ACM, vol. 21, 1973, pp. 168-173.

[114] P.S. Wang, 'Sequential/parallel matrix array languages', Journal of Cyber., vol. 5, 1975, pp. 19-36.

[115] P.S. Wang, 'Recognition of two dimensional patterns', Tech. Rep. CS-TR-8-16, Dept. of Computer Science, Univ. of Oregon, Eugene, Oregon, 1978.

[116] P.S. Wang, 'Syntactic structures of parallel isometric array patterns', Tech. Rep. CIS-TR-80-1, Dept. of Comp. and Info. Science, Univ. of Oregon, Eugene, Oregon, 1980.

[117] P.S. Wang, 'Some new results on isotonic array grammars', Info. Proc. Letters, vol. 10, 1980, pp. 129-131.

[118] R.M. Wharton, 'Grammatical inference and approximation', Ph.D. thesis, Univ. of Toronto, Toronto, 1973.

[119] E. Yodokawa and N. Honda, 'On two dimensional pattern generation grammars', Systems, Computers and Controls, vol. 1, 1970, pp. 6-13.

[120] G.T. Zahn and R.Z. Roskies, 'Fourier descriptors for plane closed curves', IEEE Trans. Computers, vol. C-21, 1972, pp. 269-281.

[121] B. Zavidovique and G. Stamon, 'An automated process for electronics scheme analysis', Proc. 5th Intl. Conf. on Pattern Recog., Miami Beach, Florida, vol. 1, 1980, pp. 248-250.