. _ V ‘."'\.“.§',.’ 3. ‘2'? '.,.‘¢I.r.‘..,.n , H V. H ‘ _ I , ‘ ’ I . .
H‘. . g~ \_\1/I‘(,9M,n. I I . Hg, 5., ...II_ N. ‘ . H I ‘I , ' . _
4 ‘f. \‘|.\«\§I\I‘I é‘l'r‘a'h‘. t‘tuLJ'n‘ju', “HUNT” :. ‘ 'I I. I . A . 'I x ‘ I I I ’ " ‘

LINGUISTICALLY ASSISTED Reooemnon
0F PATTERNS FROM. THEIR MULTIPLE
TRANSLATIONAL ENCODEMENTS

Thesis for the Degree of Ph. D.
MICHIGAN STATE UNIVERSITY
WILLIAM ARTERBURN BURDETTE
1972

LIBRARY

Michigan State
University

 

   
 
  

‘3 mama av “’ i ‘I
- I mm; & SUNS' . a
II BUUK BINDERY we . I
LIBRARY BIhD 'R I -|
'L‘Ks‘nlueeopir. menial I

L A A I

 

ABSTRACT

LINGUISTICALLY ASSISTED RECOGNITION OF PATTERNS
FROM THEIR MULTIPLE TRANSLATIONAL ENCODEMENTS

BY

William Arterburn Burdette

The classical pattern recognition approach to identi-
fying an unknown picture as one of a finite number of
prototypes is based on the extraction of features from a
single, arbitrary, discretized encodement of that unknown
picture. For this approach to be effective, the encode-
ment grid resolution must be fine enough to insure that
distinguishing detail is reflected in the encodement of
each of the prototype pictures -- regardless of the rela-
tive positioning of that picture with respect to the
encodement grid. However, in practical situations where
such a sufficiently fine resolution may not be feasible,
alternative approaches to picture identification are needed.

One such alternative approach investigated in this
thesis characterizes a picture of fixed orientation rela-
tive to an arbitrary encodement resolution in terms of
(1) all the distinct encodement patterns which can result
by encoding that picture in all possible relative trans-
lational positionings with respect to the encodement grid,

and (2) a probability of occurrence for each of these

William Arterburn Burdette

distinct encodement patterns given a random placement of

the encodement grid over the picture. This characterization
is called the snapshot signature of the picture of fixed
orientation relative to the specified encodement reso-
lution, and constitutes a composite representation of the
information contained in all the distinct positional encode-
ments of that picture.

Snapshot signature characterizations are considered
for two-dimensional line drawings. A procedure is pre-
sented for approximating the snapshot signatures of such
pictures whose structural complexity may make the theore-
tical determination of their snapshot signatures difficult.
This procedure is based on a linguistic algorithm which
generates the encodement of a picture in an arbitrarily
specified position relative to a given encodement grid,
using rules resembling the productions of a phrase struc-
ture grammar.

The utility of the snapshot signature characterization
is demonstrated in picture identification at coarse encode-
ment resolutions where the classical pattern recognition
approach fails. Further, it is shown that memory savings
can sometimes be realized by using sequential statistical
picture identification procedures based on snapshot signa-
tures at coarse resolutions, rather than equally reliable

feature extraction techniques at finer resolutions. Hence,

William Arterburn Burdette

snapshot signature characterizations can provide a useful
alternative to increasing encodement resolution for picture

identification.

LINGUISTICALLY ASSISTED RECOGNITION OF PATTERNS
FROM THEIR MULTIPLE TRANSLATIONAL ENCODEMENTS

By

William Arterburn Burdette

A THESIS

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY
Department of Computer Science

1972

TO

MY PARENTS

ii

‘* ACKNOWLEDGMENTS

I wish to express my gratitude to Dr. Carl V. Page
for serving as my guidance committee chairman and thesis
advisor. He willingly devoted a generous portion of both
professional and personal time to discussions concerning
this research, for which I am most appreciative. His
patience, encouragement, and valuable assistance in
directing the course of this research are gratefully
acknowledged.

I also wish to thank Dr. Robert O. Barr, Dr. Julian
Kateley, Jr., Dr. John B. Kreer, and Dr. Morteza A. Rahimi
for serving on my guidance committee and reviewing this
thesis, as well as for the many inspiring classes which
they provided me as a student.

For the many educational opportunities and generous
financial support provided throughout my graduate studies,
I am highly grateful to: the National Science Foundation;
the National Aeronautics and Space Administration; and the
Division of Engineering Research, the Department of
Computer Science, and the Graduate Office at Michigan State
University.

I wish to thank my typist, MiSs Patricia J. Martens,

for her conscientious and dedicated preparation of this

iii

manuscript. Also, thanks go to Dr. Floyd E. LeCureux for
providing the excellent final drawings of the figures in

this thesis. For the personal sacrifices made by both of
these individuals to insure the rapid completion of this

final draft, I am deeply indebted.

Very special thanks go to my wife Marjorie for her
understanding, support, patience, and loyalty throughout
this research endeavor. Her love and encouragement pro-
vided a special source of inspiration and incentive, which
in large measure sustained my efforts throughout my

education.

iv

 

 

TABLE OF CONTENTS

Page

LIST OF FIGURES. . . . . . . . . . . . . . . . . . . vii
Chapter

1. INTRODUCTION. . . . . . . . . . . . . . . . . l

1.1 Literature Survey. . . . . . . . 6

1.2 Contributions of the Thesis. . . . . . . 16

1.3 Organization of the Thesis . . . . . . . 19

2. SCENES AND THEIR GRID ENCODEMENTS . . . . . . 20

2.1 Basic Concepts . . . . . . . . . . . . . 20
2.2 Scenes . . . . . . 21
2.3 Scene Encodements and Snapshots. . . . . 26
2.4 Screens and Grids. . . . . . . . . . . . 31

3. "LINGUISTIC" SNAPSHOT GENERATION. . . . . . . 38

3.1 Description of Translationally—Fixed

Scene Views. . . . . . . 38
3.2 Resolution Effects on Descriptions . . . 42
3.3 Algorithmic Encodement of T- Scene
Views. . . . . . . . . . . . . . . . . . 49
3.3.1 The "Linguistic" Algorithm. . . . 52
3.3.2 Remarks on Notation . . . . . . . 54
3.3.3 Method of the Algorithm:
A Synopsis. . . . . . . . . . . . 56
3.3.4 An Example. . . . . . 62
3.3.5 Method of the Algorithm:
A Detailed Explanation
and Verification. . . . . . . . . 71
3.3.6 Semantic Interpretation: Two
Special Cases . . . . . . . . . . 91
3.3.7 Grid Extension. . . . . . . . . . 97
3.4 Algorithmic Encodement of Snapshots. . . 101

Chapter

4. SCENE VIEW RECOGNITION WITH SNAPSHOT

SIGNATURES.

4.1 Scene View Recognition: General
Considerations . .

h-ﬁ-h
bLNN

4

«ﬁt-b

4.
4.
4.

.4.1

4.4
4.5
4.6

Snapshot Sets.
Snapshot Signatures.
The Utility of Snapshot Signatures

Scene View Recognition: The
Comparison of Snapshot
Signatures.

Scene View Recognition: The

Testing of Hypotheses .
The Likelihood Ratio and
Sequential Scene View
Identification.

Two Examples.

Practical Considerations.
Blurry Images

5. CONCLUSIONS AND RECOMMENDATIONS

S 1 Summary and Conclusions.
5.2 Suggestions for Future Work.

BIBLIOGRAPHY

General References.

vi

Page

110

110
112
128
140

141
143
145
150
162
170
175

175
178

182
185

LIST OF FIGURES

Figure Page

2.1 Examples of scene description modification

to meet the "junction constraint”. . . . . . . 23
2.2 Examples of typical scenes . . . . . . . . . . 25
2.3 Snapshot examples. . . . . . . . . . . . . . . 28
2.4 More snapshot examples . . . . . . . . . . . . 29
2.5 Examples of finite grids . . . . . . . . . . . 37

3.1 A translationally-fixed scene view and its
set notational representation. . . . . . . . . 41

3.2 Flow chart interpretation of the method upon
which the "linguistic" encodement algorithm

is based . . . . . . . . . . . . . . . . . . . 60
3.3 Flow chart detail of block (E) in Figure 3.2 . 61
3.4 T-F scene view for algorithmic encodement

example. . . . . . . . . . . . . . . . . . . . 63
3.5 Encoded result of T-F scene view of Figure

3.4 on the grid of resolution 4. . . . . . . . 64
3.6 Initial component of flow chart for

"linguistic" algorithm application . . . . . . 86
3.7 Terminal component of flow chart for

"linguistic" algorithm application . . . . . . 87
3.8 Examples of extended grids . . . . . . . . . . 100

4.1 Two examples of snapshot sets. . . . . . . . . 119

4.2 A composite scene view and its snapshot
set. 0 O O O I O 0 O O O O O O O O 0 O O O O O 125

4.3 Another composite scene view and part of its
snapshot set . . . . . . . . . . . . . . . . . 127

vii

Figure Page

4.4 Example of two object views with the same
snapshot set, but different snapshot
signatures . . . . . . . . . . . . . . . . . . 129

4.5 Snapshot signature of a 3 x 5% grid unit
rectangle. . . . . . . . . . . . . . . . . . . 135

4.6 The eight subregions of R which define the
probability distribution for the snapshot
signature of the rectangle in Figure 4.5 . . . 138

4.7 Blurry image represented by the snapshot
signature of Figure 4.4. . . . . . . . . . . . 171

viii

Chapter 1

INTRODUCTION

Automatic machine processing of pictorial infor-
mation has become increasingly apparent in recent years.
The capability of the modern digital computer to rapidly
process large amounts of data has been effectively utilized
for the recognition of patterns in a variety of applica-
tions. For example, textural patterns occurring in
satellite cloud imagery have been successfully analyzed
from video data obtained by the TIROS and NIMBUS meteor-
ological satellites [V-l]. Features of the lunar terrain
have also been successfully analyzed using video patterns
obtained from Air Force Lunar Atlases [V-l]. More common
applications of pattern recognition schemes are evident in
optical character recognition done by bank check pro-
cessors, automatic mail sorters (including mail with
handwritten addresses), and document readers [P-l].
Medical applications of pattern recognition have also been
demonstrated, including automatic sleep state classifi-
cation of human subjects from their electroencephalograph
recordings [V-l].

The overall problem in many pattern recognition

situations is that of identifying, or classifying, an

actual picture or scene relative to a finite set of proto-
types, where the identification is based on the infor-
mation included in one or more inexact representations of
that actual scene. These inexact representations are
usually necessary because the machines involved in data
collection or analysis can physically represent or process
only a finite amount of information. Hence, when two-
dimensional scenes are to be automatically processed, they
are often discretely encoded on some finite two-dimensional
grid, with individual grid squares being shaded with
varying degrees of intensities (gray levels) to reflect
the nature of the scene in the region covered by that grid
square ([R-4] and [D-2]). In the case of line drawings,
this shading is generally done with two gray levels,
resulting in binary grid encodements. A grid square is
shaded in such an encodement if any part of the scene

view lies within the borders of that grid square; other-
wise, the grid square is left unshaded. Clearly, the
resolution of the grid (i.e., its relative fineness)
determines the degree of accuracy with which the binary
encodement represents the structural features of the
actual scene view.

In any practical picture processing situation, the
Optimal encodement grid resolution would be the one which
is as coarse as possible to eliminate the processing of
unnecessary detail, yet fine enough to reflect the signi-

ficant structure of a scene so that it can be unambiguously

identified or classified. Of course, the degree of
encodement accuracy needed for unambiguous identification
is highly dependent upon the class of scenes involved, and
also upon the method used to make the identification.

There are two basic methods which have been used for making
this identification in pattern recognition [N-l]: (1) the
statistical "decision-theoretic approach," which classifies
an unknown scene view based on structural features which
can be extracted (measured) from its encodement and then
compared statistically to known values of these features
for a finite set of prototypes; and (2) the linguistic or
"descriptive approach," which analyzes or identifies an
unknown scene view in terms of the structural relationships
which exist between its component parts or the component
parts of known prototypes, based on a "picture language"
which formally describes the structural relationships

which can exist for the scenes under consideration. Com-
binations of these two approaches have also been pr0posed,
and fall under the general classification of "syntax-aided
analysis" (as opposed to "syntax-directed" or "syntax—
controlled" analysis, which is just approach (2)) [N-Z].
However, in all of these approaches only a single, arbi-
trary encodement of the unknown scene view is processed.
Further, it has generally been the case that the encode-
ment resolution was heuristically picked, under the
assumption that the resolution could be made as fine as

necessary for the identification procedure to be used

effectively ([D-2], [H-Z], [H-3], [N-Z], [V-l]). This last
assumption has practical repercussions in cases where grid
resolutions may be constrained by the physical aSpects of
the situation. For instance, an encodement grid might be
realized by a remote sensing T-V camera, which has a maxi-
mum resolution inherently determined by its design. Or
the encodement grid might be realized by photographic film,
whose resolution has a practical limit determined by its
grain size. So it may not always be feasible to make the
grid resolution arbitrarily fine in scene identification
situations, and alternative solutions to this problem must
be considered.

One classically proposed alternative to increasing
the grid resolution is to improve the identification
methods used at a given resolution by finding features
which have a higher discrimination quality between the
prototype classes ([D-Z], [F-3], [L-Z], [T-1]), or by
"sharpening" the image at the current resolution by various
image enhancement techniques ([L-Z], [R-Z], [R-3]).
However, such approaches are still based on the examination
of a single, arbitrary encodement of the unknown scene view
at the encodement resolution, and thus ignore the effect
that the relative positioning of the encodement grid may
have on the fidelity of an encodement. But since the form
of the encodement of a scene can be significantly affected
by the relative positioning of the encodement grid

(eSpecially at coarse resolutions), it seems reasonable to

expect that additional information about the structure of
the unknown scene view could be obtained by examining more
than one of its encodements at a given resolution. Hence,
this thesis focuses on the investigation of how these
multiple encodements of an unknown scene at a given reso-
lution can be utilized in classifying that unknown scene
with respect to a finite number of prototype classes. To
maintain a reasonable level of complexity, this investi-
gation is limited to examining only those multiple
encodements which can result from translations of an
encodement grid over a scene having a fixed rotational
orientation. However, the methods develOped for this
restricted case can be generalized to the cases where
rotational considerations are important.

The approach taken here in the identification of
unknown scenes using their multiple translational encode-
ments at some grid resolution is a "linguistically—
aided" approach [N-Z] -- i.e., both the linguistic and
statistical decision theoretic concepts of scene analysis
are combined. The linguistic techniques are used
to generate approximate encodement characterizations for
each of the prototype scene classes relative to some
encodement resolution whenever these encodement charac-
terizations cannot be conveniently determined theoretically
(as would be the case for scenes having a complex struc-
ture). Such encodement characterizations -- theoretical

or approximate -- are based upon the various structural

encodement forms which can arise, and how likely it is that
each one will occur, when the encodement grid is trans-
lated over the different scenes in the prototype classes.
Now, given these encodement characterizations for the
prototype classes, statistical decision-theoretic techni-
ques can then be applied to identify an unknown scene view
by comparing an approximation of its encodement charac-
terization (based on a random sample of its different
translational encodements) with the "known" encodement
characterizations for the prototype classes. The effec-
tiveness of such a combined approach to scene identifi-
cation based on multiple translational encodements is
demonstrated in this thesis. In particular, examples

are presented to illustrate the capability of this
approach to correctly identify unknown scenes at coarse
encodement resolutions where other pattern recognition

schemes often fail.

1.1 Literature Survey

 

The "classical” approach to the recognition of
patterns by machine divides the process into three basic
parts [D-Z]: (l) "representation" (encoding or sensing),
(2) "feature extraction" (the selection and measurement
of significant attributes), and (3) "classification” (the
identification or decision process). The gray-level
quantization with discretized pictures (already mentioned)

is the predominant "representation" scheme used in the

literature for line drawing and textural analyses. Such
encodement representations have been shown [R-4] to have
significant redundancy (in terms of information content)
in many situations, and "picture compression" (data com-
pression) can be done with more "efficient encoding"
techniques -- such as "block coding," "predictive coding,"
or "run coding" -- which more directly reflect the infor-
mation content in a discretized representation of a scene.
A detailed description of such "picture coding" and
"picture approximation" techniques can be found in [R-3].
The "feature extraction" portion of the classical
pattern recognition approach deals with taking selected
measurements -- called "features" ~- from an encodement.
These "features" should represent significant attributes
which can be used to distinguish between the various
classes of prototype scenes under consideration. Then,
the values of the features extracted from the encodement
of an unknown scene view can be used effectively to
identify or classify that scene. For instance, significant
features in a character recognition situation might be the
number of vertical (or nearly vertical) line segments in
a character, or the relative degree of straightness along
the edge of a character [D-Z]. In any event, this
selection of significant features to be used for classifi-
cation is a difficult problem, and various approaches to
its solution are discussed in [D-Z], [F-3], and [T-l].

The identification of the various "features" of an unknown

scene from its encodement can also be a difficult task,
since a discrete encodement can only reflect degraded
structural characteristics of the actual scene. Further,
in practical situations, the distortion and noise intro-
duced by physical sensing machinery or data transmission
channels can further degrade the quality of encodement
representations which must be analyzed. Many techniques
have been introduced for "sharpening" a picture before
features are extracted, and such techniques come under the
broad classification of "image enhancement". Two excel-
lent surveys on image enhancement techniques related to
future extraction can be found in [L-Z] and [R-4], with
more detailed presentations given in [A-1] and [R-3].
Numerous "decision rules" have been proposed for
implementing the classification stage of the "classical"
approach to pattern recognition. One of the most straight-
forward is known as "template matching" ([F-Z], [F-3]) in
which the entire encodement of an unknown scene is com-
pared with known encodements (templates) of a set of
prototype scenes. The comparison is based upon a "pre-
selected matching or similarity criterion", and the proto-
type scene view whose template most closely matches the
encodement of the unknown scene view is the one with which
the unknown scene is identified. This "template-matching
approach can be interpreted as a special case [of the]

'feature-extraction' approach, where the templates are

stored in terms of feature measurements and a Special
classification criterion (matching) is used for the

classifier."1

Deterministic classification techniques

have also been proposed which are based on "discriminant
functions" [F-3] defined for each prototype scene class.
These discriminant functions operate on a vector of

feature measurements taken from the encodement of an
unknown scene, and the prototype scene class whose discri-
minant function has the largest value is the one with which
the unknown scene is identified. Other classification
techniques which have become increasingly popular in recent
years [N-l] are statistically based. Such techniques
result from formulating the scene recognition problem as

a hypotheses testing problem, and then using standard
statistical hypotheses testing procedures for performing
scene "classification." These standard statistical
hypotheses testing procedures are described in [H-4],
[M-l], [M-4], [W-l], and [W-3], and their application
specifically to pattern recognition situations is covered
in [F-3] and [F—4]. These statistical techniques are
discussed in more detail in Chapter 4.

Several extensions to the three part classical

approach to pattern recognition have been considered.

 

1 K. S. Fu, Sequential Methods in Pattern Reco nition
and Machine Learning (New York: Academic Press, 19 8),
p. 3.

 

10

These relate to the consideration of contexts in an
encodement [D-Z], the "Optimum decision problem" (deter-
mining the "best" classification techniques for a given
situation), and the "adaptation problem" (determining a
classification technique which can learn from, and adjust
to, various types of picture recognition environments)
[T-l]. The "adaptation" considerations border on the area
of artificial intelligence [M-3], and are especially impor-
tant in remote sensing applications (e.g., unmanned space-
craft). Discussions concerning training and learning in
pattern classifiers can be found in [F-3] and [F-4].
Another basic approach to pattern recognition has
evolved over the past ten years, due to the need for more
powerful and more general methods which are applicable to
complex picture recognition situations. This approach
is the "linguistic" one, based on the description of a
picture in terms of its component parts and the relation-
ships which exist between these components [F-6]. Such a
description might either be "generative" (i.e., useful for
generating some representative of the described picture
class) or "interpretive" (i.e., the result of the analysis
of a given scene view) [N-l]. Most of the schemes which
have been devised for formal picture descriptions are
based on the rewriting systems, or grammars, of formal
language theory ([G-Z], [H-5]). This is a consequence of
the fact that the "heirarchical structure" among the sub-

patterns which are components of a larger pattern is often

11

"analogous to the syntactic structure of languages", and
formal language theory provides a convenient description
for such structure (via the grammar) and a method for
analyzing this structure (parsing) [F-6]. Hence, the name
"syntactic" picture processing has been adOpted.

Since formal language theory has been develOped on
the basis of one-dimensional strings of characters, some
of the first attempts at syntactic picture processing
involved the encoding of a Z-dimensional pattern into a
one-dimensional string. One well known procedure for doing
this is Freeman's "chain encoding" method for line pat-
terns, in which

"...a rectangular grid is overlaid on the
two-dimensional pattern and straight line
segments are used to connect the grid points
falling closest to the pattern... Each line
segment is assigned an octal digit according

to its slope. The pattern is then repre-
sented by a (possibly multiply connected)

chain or chains of octal digits."2
Feder [F-l] has used this encoding scheme to form pattern
languages based on equations in two variables (lines,
circles, etc.) and pattern properties (convexity, etc.).
A somewhat more general approach to picture description
(and analysis) was presented by Shaw [S-l] in his "Picture

Description Language." Shaw uses strings to encode

 

2 K. S. Fu and P. H. Swain, "On Syntactic Pattern
Recognition," in Software Engineering, ed. by Julius T.
Tou (New York: Academic Press, 1971), pp. 158-159.

 

12

pictures which can be represented by multiply connected
graphs, but this method has been criticized because it does
not permit any contextual considerations to be considered
in either picture generation or analysis [F-6].

One of the major problems with l-dimensional string
representations of higher dimensional patterns is that
each pair of characters in the string ("features", or
primitives, of the pattern) are related in exactly one
way, according to their relative position in the string.
Hence, only a single relationship can be expressed between
any two primitive features of the pattern. The practical
problems which such a restriction can cause are evident in
[M-Z], where a robot is described which needs to identify
multiple relationships between the primitives of a scene
in order to identify that scene. A method for linguis-
tically considering multiple contextual relationships in
generating scene descriptions is presented in [D-l], in
which encodement patterns in the form of simple geometric
figures (e.g. a triangle) are generated on a grid. During
the generation, symbols are introduced on the grid on the
basis of the symbols which already appear in up to all
eight of its neighboring grid squares. Unfortunately,
there was no discussion regarding how one might use these
two-dimensional descriptions in a parsing scheme.

A "natural" generalization of a string language to
two-dimensions is represented by the "web grammars"

defined by Pfaltz and Rosenfeld [P-4]. The language

13

defined by a web grammar consists of directed graphs with
symbols at their vertices ("webs"), rather than a one-
dimensional string of characters. The web generation rules
can be used to successively imbed subwebs into a larger
web, enabling a complex two-dimensional pattern to be
described. Since the embeddings can be done based on the
consideration of two or more nodes (vertices) at a time,
contextual considerations can be reflected in web grammar
descriptions. Hence, if each directed edge between two
nodes in a web is considered as a primitive of a pattern,
the web grammar can be considered as a generalization of
Shaw's "Picture Description Language" which allows contex-
tual considerations.

Several two- (and higher -) dimensional linguistic
description schemes which are capable of reflecting mul—
tiple (and often arbitrary) relationships between scene
primitives have recently been presented. One scheme uses
the concept of a "relationship matrix” [R-l] to describe
the features, and relationships existing between these
features, of arbitrary n-dimensional scenes (particularly
line drawings). In the two-dimensional case, the features
of the scene (e.g. line segments, points, or even objects)
label both the rows and columns of the 2-dimensional
relationships matrix, and each entry in the matrix is a
vector specifying significant relationships (e.g. angle,
distance, connectivity, etc.) which exist between the two

features which are the labels of that row and column in the

14

matrix. The on-diagonal entries in the relationship
matrix describe the features of the scene themselves

(e.g. their length or dimensions, their orientation, etc.).
Grammars based on the generation and manipulation of arbi-
trary relationship matrices can be used to produce scene
descriptions (in relationship matrix form), and such
grammars have proven useful in identifying an unknown
representative of four prototype classes of handwritten
characters through a "parse" of the unknown character's
structural relationships [R—l]. While the relationship mat-
rix concept can be used as the data base for syntax-directed
generation and analysis of actual scene views, a "picture
processing grammar" has been proposed by Chang [C-l] which
can be used to generate and analyze arbitrary patterns of
shaded grid squares on a two-dimensional encodement grid.
Such patterns can be considered as those which might

result from a discretized representation of an actual

scene view. The grammar uses coordinates assigned to each
square of the encodement grid to generate descriptions of
these patterns based on certain significant groupings of
the shaded grid squares -- e.g., those which form hori-
zontal strings, or "lines”, on the encodement grid, which
can be described by the special symbol "h—line" followed
by parameters which specify both the length of (i.e. the
number of shaded grid squares in) the string and the
coordinates of the leftmost shaded grid shaded grid square

in the string. The construction of picture processing

15

grammars which can generate a given set of shaded grid
square patterns on an encodement grid was considered by
Chang, and such a grammar constructed for describing the
set of patterns which represent the encodements of hand-
written numerals (0 through 9). This grammar was then used
to analyze arbitrary samples of encoded handwritten
numerals, resulting in the correct identification of
approximately 70% of the samples [C-l]. Other methods of
syntactically processing scene encodements have recently
been developed, and an excellent survey of these methods
can be found in [P-2] and [P-3].

One interesting pattern recognition scheme which
is neither statistical nor linguistic was proposed
by Weiman and Rothstein [W-Z]. The scheme is based on
using parallel processing cellular automata to recognize
straight line patterns on an n x n grid. The encodement
of a straight line is given by a string of binary digits
which can be constructed by tracing the line from left to
right on the encodement grid, and writing a "0" for each
grid square crossed through parallel sides and a "l" for
each grid square crossed through perpendicular sides,
omitting the next grid square crossed in this latter case.
Such a code reflects the relative positioning of a line
with respect to the grid squares, and cyclic shifts occur
in the code as the grid is translated over the line to
produce different encodements. The structure of the code

and its translational cyclic shifting pattern can be used

16

to characterize the line on the encodement grid. This
characterization is then used in [W-2] as the basis for
various procedures to recognize straight lines, t0pological
connectedness, and arbitrary polygons from their represen—

tations on an encodement grid.

1.2 Contributions of the Thesis

The overall objective of this thesis is to
investigate how the combined information contained in the
multiple translational encodements which are possible for
a particular scene (under various relative translational
positionings of the encodement grid) at a given resolution
can be effectively utilized in pattern recognition. Such
combined multiple encodement considerations have not been

3

previously dealt with in the literature, so the investi-

gation within this thesis involves a unique approach

 

3 Multiple translational encodements of straight lines
were considered by Weiman and Rothstein [W-Z], where the
primary concern was with representing a two-dimensional
encodement with a l-dimensional code, and then noting the
cyclic changes which occurred in the code as the encodement
grid was translated relative to the straight line.

Although such a code does reflect the basic structural
features of each encodement, only straight lines of
rational slope could be fully characterized on a finite
grid by the cyclic changes in their codes. Even then, the
length of the line was always assumed to be sufficient to
span the entire grid in any translational position.

The characterization of scenes in terms of their
multiple encodements is considered at a much more general
level in this thesis. The two dimensional representation
for each of the encodement patterns which can result by
translating an encodement grid over an (arbitrary) scene
is retained (so that arbitrary relationship information
can be maintained). Further, the relative degree

17

in pattern recognition technology. The utility of this
approach is specifically demonstrated relative to the
scene classification problem at coarse resolutions, where
other pattern recognition schemes often fail.

As a result of this new approach and the linguisti-
cally aided scheme with which it is investigated, several
other contributions are made by this research. These
include:

(1) the definition of some new characterizations of
scenes in terms of all their possible encodement
forms relative to a given resolution, as well as
the formalization of some previously intuitive
concepts (e.g., resolution) of picture proces-

sing;

 

of persistence (reflected as a probability of occurrence

in Chapter 4) with which each distinct encoding pattern
appears as the grid is uniformly translated over the scene
is also an impotrant part of the charcterization of

a scene in terms of its multiple encodements. Such "per—
sistence" (probabilistic) pr0perties were not considered

by Weiman and Rothstein, but these properties form a quite
powerful part of a positional encodement characterization
of a scene relative to some grid resolution. In fact,
these persistence properties not only permit the

rational lepe line characterization constraints to be
relaxed, but also permit finite length line segments of any
size to be fully characterized on a finite encodement grid.
Additionally, many complex scenes can also be characterized
(at least relative to other scenes of interest) with
respect to almost any encodement resolution. The full
detail of this general characterization of a scene in terms
of its multiple translational encodements is presented

in Chapter 4.

and

(2)

(3)

18

the proposal of a new 2-dimensional linguistic
description scheme (denoted "interpretive coordi-
nate grammar") which is used to generate an
encodement of a specified scene (line drawing)
relative to an arbitrarily chosen resolution.
This differs from Chang's "picture processing
grammar" [C-l] -- which generates an arbitrarily
specified set of encodement forms on a grid --
in that the encodements generated by the "inter-
pretive coordinate grammar" are based on an
actual scene, and a specified relative posi-
tioning of that scene, with respect to the
encodement grid. Although the form of the rules
in the "interpretive coordinate grammar" repre-
sent an extension of the (coordinate) form of
the rules in Chang's "picture processing gram-
mar", the rules of the "interpretive coordinate
grammar” -- unlike Chang's rules -- require a
semantic evaluation of the coordinates after each
step in the generation of a scene description.
This is due to the complexity involved in con-
sidering encodements generated from a specified
positioning of an actual scene view relative to
an encodement grid -- a consideration which is
ignored by Chang;

the application of standard sequential statis-

tical decision-theoretic hypothesis testing

19

procedures to scene identification, based on the
multiple translational encodement character-

izations of such scenes.

1.3 Orggnization of the Thesis

Chapter 2 defines the basic terminology which is
used in this pattern recognition investigation, and for-
malizes some of the intuitive concepts of picture proces-
sing. Chapter 3 concentrates on a two-dimensional lin-
guistic description scheme for generating an encodement of
a scene relative to an arbitrary resolution, based on a
standardized representation of that scene in some fixed
orientation and relative translational position with
respect to the encodement grid. Chapter 4 discusses the
theoretical characterization of scenes in terms of their
multiple translational encodements, and presents techni-
ques for approximating these characterizations based on the
linguistic description scheme of Chapter 3. The utility
of these characterizations in scene recognition situations
is then demonstrated within the framework of statistical
hypothesis testing. Chapter 5 concludes the thesis with

a summary of results and suggestions for future work.

Chapter 2

SCENES AND THEIR GRID ENCODEMENTS

2.1 Basic Concepts

 

The general class of pictures, or "scenes," to
be considered throughout this thesis are plane geometric
line drawings. A finite, symmetrical square grid can be
superimposed over a scene, and the scene then encoded into
a two-dimensional grid pattern by shading only those grid
squares which are intersected by lines in the scene. The
entire scene encodement can be accurately described in the
standard terminology of picture processing in [R-4] as a
"quantized digital picture" having both the shaded and
non-shaded grid squares as its binary valued "picture
elements." That portion of the scene encodement consis-
ting only of the pattern of shaded grid squares will be
called a "snapshot" of the scene.

The scene is always considered as being "viewed"
through an aperture called the "screen." This screen will
have fixed size, shape, and orientation characteristics
associated with it for each specific picture processing
application. This will provide a standard reference for
the specification of any grid to be used in encoding the

scenes under consideration.

20

21

2.2 Scenes

 

The concept of a "scene" is formalized as

follows:

Definition 2.1

 

A sgege is a two-dimensional pattern composed of
a finite, but non-zero number of straight line seg-
ments of finite, non-zero length, with any intercon-
nections (points of intersection) between the line
segments existing only at their endpoints.

A primitive scene is a scene composed of exactly

 

one straight line segment.

We will denote as the "junction constraint" that part of
the preceding definition allowing interconnections between
line segments only at their endpoints. The junction con—
straint is introduced merely for the convenience of stan-
dardized scene descriptions (see Chapter 3), and does not
restrict the class of scenes under consideration. This is
so because a scene containing two straight lines which
intersect at a point other than a common endpoint of both
lines can alternatively be described as a scene containing
this intersection point as the common endpoint of several
shorter lines. These shorter lines are formed by dividing
both of the intersecting lines at their common point of
intersection, with no other alteration of the structure of

the scene. With the description of the scene thus

22

modified, the junction constraint is met without changing
the inherent form of the scene in terms of its snapshot
generation characteristics. And since we will be exam-
ining scenes only through their snapshot representations,
we have effectively left the class of scenes under con-
sideration unchanged.

The modification of a scene description to meet the
junction constraint is illustrated in Figure 2.1, where
two alternative scene descriptions are provided for both
the letter "T" and addition operator "+". Both descrip—
tions given for each scene are valid, but only one con-
forms with the junction constraint in each case.

The straight line segments which comprise a scene
will be considered "ideal", i.e., arbitrarily thin. No
additional restrictions -- other than the non-zero but
finite length, and junction constraints of the definition
-- are placed on these line segments at this point,
although restricting their slopes to being rational can be
useful when scenes will be recognized only in fixed orien-
tations (see [W-2]). In addition to the lines themselves,
an important part of scene specification is incorporated
in the relationships existing between lines in the pattern
-- e.g., whether two lines are connected, perpendicular,
parallel, of different lengths, etc. However, it is
important to note that only the lines themselves and their
relative relationships are considered as part of the scene

structure; any areas or regions that might be enclosed or

(a)

Figure 2.1.

23

 

 

 

 

Descriptions of the
letter "T" and addition
operator "+" by two
intersecting straight
lines, L1 and L2, which

(b)

are connected at a point
other than a common end-
point of both lines.

 

Alternate descriptions
of the letter ”T" by 3
lines intersecting at
a common endpoint, and
the addition operator
"+" by 4 lines inter-
secting at a common
endpoint. The respec-
tive descriptions in
(a) have been modi-
fied so that all
interconnections be-
tween lines exist only
at their endpoints.

Examples of scene description modification

to meet the "junction constraint.”

24

otherwise implicitly defined by the line segments are not
considered part of the scene. In other words, we are con-
sidering line drawings, not solid objects or areas.

If we consider a scene as a graph, with the vertices
being the collection of all line segment endpoints and the
edges being the line segments themselves, then we can make
the following definitions for scenes using the terminology

of graph theory in [H-l]:

Definition 2.2

 

An object is any connected component (maximally
connected "subgraph") of a scene.

A simple scene is a scene composed of exactly

 

one object.

A composite scene is a scene containing two or

 

more objects.

Note that a composite scene is disconnected by definition
in a graph theoretic sense.

Some examples of typical scenes are shown in Figure
2.2. Even though the scenes are illustrated in a parti-
cular orientation, it is important to bear in mind that
scenes are structures defined irrespective of any specific
orientation in which they may appear. Note also that only
straight line approximations of any nonlinear geometric

contours are permitted by the definition of scenes.

25

/ \ \/

 

 

 

 

 

 

 

 

(a) Primitive Scene (b) Primitive Scene (c) Simple
(also a simple (also a simple Scene
scene, or object) scene, or object) (object)

(d) Simple Scene (e) Simple Scene (f) Simple Scene
(object) (object) (object)

(g) Simple Scene (h) Composite Scene (1) Composite Scene
(object) (2 objects) (3 objects)

(j) Composite Scene (k) Composite Scene
(4 objects) (4 objects)

Figure 2.2. Examples of typical scenes.

26

2.3 Scene Encodements and Snapshots

 

In order to encode scenes for processing by a
digital computer, or to provide scene encodements which
allow the possibility of grammatical description, it is
convenient to "discretize" the scene in terms of a finite
set of finitely valued scene primitives. The discretizing
method chosen here encodes the scene into a finite number
of shaded and non—shaded grid squares, with each grid
square being a binary valued "primitive" of the scene
encodement. The shaded portion of an encodement is called
a "snapshot," which is formally defined next along with

the construction rules for forming a scene encodement.

Definition 2.3

 

A snapshot of a scene is the entire pattern of
zero or more shaded grid squares which results when
a finite square grid is superimposed over the scene,
and the scene then encoded on the grid according to
the following grid square shading scheme:

(1) any grid square having at least one of

its sides touched (intersected) by one
or more of the line segments in the
scene is shaded;

and (2) no other grid squares except those

specified in (l) are shaded.

27

Notice that this definition implies that it is only the
structural form of the shaded grid square pattern which
defines the snapshot, independent of the relative trans-
lational location of the pattern on the grid. Addition-
ally, notice that the empty snapshot, defined as the
snapshot resulting from a scene encodement having no
shaded grid squares, is a valid entity.

Examples of snapshots for each of six different
scenes are shown in Figures 2.3 and 2.4. Two encoding
peculiarities arise in these examples which may warrant
further explanation. The first is that a line segment of
a scene which lies exactly along a vertical or horizontal
grid line of a superimposed grid will, when encoded,
shade grid squares along its length on both sides of the
grid line. The second is that a line segment of a scene
which passes through an intersection point of the hori-
zontal and vertical grid lines in a superimposed grid will,
when encoded, shade all four grid squares having this grid
intersection point as a common corner. Careful exami-
nation of the encoding scheme will reveal that the grid
shading is being done in accordance with the Specified
rules, with only grid squares having one or more of their
Sides touched (intersected) by a line in the scene being
Shaded in the encodement. This is clear for the first
Case, and is clarified for the second case by noting that

a corner point of a grid square is actually part of two

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 2.3.

28

(a) The empty snapshot of a simple
scene, or object (a right,
isosceles triangle with leg
length of 1/2 grid unit), from
a 2 x 2 grid encodement.

(b) A snapshot of a primitive
scene (a straight line seg-
ment with length of /2 grid
units) from a 4 x 4 grid
encodement.

(c) A snapshot of a primitive
scene (a straight line seg-
ment with length of 6-1/4
grid units) from a 6 x 6
grid encodement.

Snapshot examples.

 

 

 

 

Figure 2.4.

29

(a) A snapshot of a simple scene,
or object (a rectangle of
dimensions 3 x 3-1/2 grid
units), from a 6 x 6 grid
encodement.

(b) A snapshot of a simple scene,
or object (an equilateral
triangle with side length of
4 grid units), from a 6 x 6
grid encodement.

(c) A snapshot of a three object
composite scene from a 12 x 12
grid encodement.

More snapshot examples.

30

sides of that grid square (and also part of two sides of
each of the three other grid squares sharing the same
corner point).

The rationale behind defining the encodement scheme
in a way which permits these peculiarities to arise
becomes apparent by considering a practical implementation
of this scheme. The grid might be physically approximated
by a finite, discrete array of contiguous photoelectric
sensors, each with sufficient sensitivity for detecting
the presence of any portion of an image on any part of its
photocathode.1 Now suppose that the image of a scene com-
posed of non-ideal line segments (i.e., ones with a finite,
non-zero thickness) is projected on this array of photo-
electric sensors. It is easy to imagine that a line seg-
ment whose projected image would theoretically (if it were
arbitrarily "thin") lie on the common border between
adjacent sensors would realistically have sufficient
thickness to be detected by the sensors lying on both
sides of that border. Similarly, a "thick" line segment
whose projected image passes through the common border
point of four adjacent sensors -- and whose ideally thin

image would also pass through this same point -- would in

 

1 The photoelectric sensors are assumed to have
square-bordered areas for their photocathodes. Addi-
tionally, the physical realization discussed here is
referred to as an approximation because the empty snap-
shot cannot be realized.

31

reality be detected by all four of those sensors. Note
that both of these situations occur in practical appli-
cations solely because of the thickness of the line seg-
ments, which permits one location along the length of a
single line segment to physically overlap the photo-
cathodes of two or more sensors simultaneously. Thus,
with grid squares corresponding to sensors, we see that
the apparent peculiarities in our theoretical encodement
scheme for ideally thin line segments anticipate the
extension of the scheme to practical applications invol-

ving scenes composed of finitely thick line segments.

2.4 Screens and Grids

It should be evident from the discussion and
examples of the previous section that whenever a grid is
superimposed over a scene, the resulting encodement is
dependent upon each of the following factors: (1) the
relative translational position between grid and scene;
(2) the relative rotational orientation between grid and
scene; and (3) the relative size of the grid squares with
respect to the size of the scene. The first factor will
be discussed in detail in Chapter 4, where it will become
apparent that most scenes produce more than one snapshot
when encodements are made with a grid of fixed square size
and orientation that is superimposed in different relative
locations with respect to the scene. The last two factors

affecting scene encodements will be considered in the

32

remainder of this section, with the result being the
formalization of the concepts of grids and their speci-
fications only intuitively referred to up to this point.
In many practical picture processing situations it
is realistic to assume that a finite bound exists on the
size of all scenes under consideration. This means that
some single (but not unique) finitely bounded area can be
found whose borders -- in all orientations -- could
entirely enclose any one of the scenes to be analyzed.
In this case, it will be no loss of generality to further
assume that such an area has a square border of fixed,
finite size. Such a square bordered area, once specified
and fixed in an application, will be called the "screen"

for that application.

Definition 2.4

 

A screen is some fixed, finite, square-
bordered two—dimensional area which is specified
for a particular picture processing application,
and whose border -- in every rotational orientation --
could be translationally positioned to entirely
enclose (without contact) each individual scene

under consideration.

33

The single screen thus chosen for a particular application
will serve as the fixed size reference for all scenes and
grids under consideration in that application. The length
of a side of the screen border will be assigned an arbi—
trary reference value of ”1 unit."

The screen can be envisioned as a movable aperture of
fixed size, through which any scene to be encoded and
processed will be viewed. The screen positioning motions
would generally be permitted three degrees of freedom --
two-dimensional translation, and rotation -— but, as noted
in Chapter 1, we are restricting this investigation by
allowing only translational motions when viewing and
encoding a scene. Thus, we will hereafter restrict our
attention in any application to a single screen having one
fixed, but arbitrarily Specified, rotational orientation.
The effect of this restriction is that each scene under
consideration will be viewed in only one specific rota—
tional orientation at a time. But Since scenes are struc-
tures having no definite orientation, the specific rota-
tional orientation of any scene viewed through a screen
of fixed rotational orientation will be arbitrary.
Therefore, if we characterize a "view" of a scene by
specifying a fixed, but arbitrary, rotational orientation
of the scene relative to the screen through which it is
viewed, the particular rotational orientation of that

screen is immaterial. Hence, for notational convenience,

34

we will always assume that any view of a scene is defined
relative to a rotationally fixed screen having its border
lines oriented horizontally and vertically.

Motivated by the preceding discussion, we now make

the following definitions:

Definition 2.5
A standard screen is a screen with a fixed
rotational orientation which makes its border lines

horizontal and vertical.

Definition 2.6
A view of a scene consists of the scene in some

single, arbitrary, rotational orientation defined

relative to a standard screen.

The single standard screen thus chosen in an application
serves as the size and orientation reference for any view
of a scene. Note that each view possible for a scene can
be considered by simply specifying the appropriate fixed
rotational orientation of that scene defined relative to
the standard screen.

With a Single standard screen as the Size and orien-
tation reference, we can now discuss standardized grid
descriptions. The grid structure is formed by using

evenly spaced horizontal and vertical lines to

3S

symmetrically divide the entire area enclosed by the
standard screen into identically-sized square cells. The
square root of the total number of these square cells then
defines the "resolution" of the grid. Any grid thus
formed can be considered as a "retina" inscribed on the
standard screen, with scene views being imposed and
encoded on this retina. The ability of a retina to
reflect the detail in an image imposed Upon it depends on
the Size, number, and spacing of its sensory elements.
Analogously, with grid squares considered as the "sensory
elements," the ability of a grid to reflect detail in the
encodement of a scene view is characterized by the rela-
tive fineness, or "resolution," of that grid.

The concepts of grids and grid specifications are now

made more precise.

Definition 2.7

 

A grid is the symmetrical geometric lattice,
defined by the border lines of the standard screen
and zero or more additional intersecting horizontal
and vertical line segments of length l which lie
between those border lines such that the standard
screen is divided into a number of smaller, identi-

cally-Sized square cells.

36

Definition 2.8

 

The resolution of a grid is that positive,

 

non-zero integer whose algebraic square represents
the total number of identical square cells which the
grid defines on the standard screen.

The dimensions of a grid of resolution "r" are

 

denoted by the expression "r x r".

A finite grid is a grid having a finite

 

resolution.

We note from the above definitions that the standard
screen is considered as the finite grid of resolution "1”.
Some examples of finite grids with different reso-
lutions defined relative to the same standard screen are
Shown in Figure 2.5. Note that all of the grids are of
the same absolute size, although each one has a relative
Cartesian coordinate system uniquely associated with it in
a natural way according to its resolution. The lower
horizontal border line of the grid is taken as the posi-
tive x-axis, and the left vertical border line is taken
as the positive y-axis, with the common intersection point
of these axes at the lower left-hand corner of the grid
serving as the origin of the natural coordinate system.
These relative coordinate systems for grids will be useful
in examining the effects of resolution changes on scene

descriptions, to be considered in Chapter 3.

 

 

 

 

0 1
(a) Grid of resolution "1",
dimensions "1x1".
(The Standard Screen)

 

3

 

 

 

 

 

 

 

(C) Grid of resolution "3",
dimensions "3x3".

 

 

 

 

NAN->010

 

 

 

 

 

 

 

 

 

 

0 l 2 3 4 5 6
(e) Grid of resolution "6",
dimensions "6x6".

37

 

 

 

 

 

 

O 1 2
(b) Grid of resolution "2",
dimensions "2x2".

 

 

 

 

 

 

 

 

 

 

0 l 2 3 4
(d) Grid of resolution "4",
dimensions "4x4".

12
10

45(300

 

0 2
(f) Grid of resolution ”12",
dimensions "12x12".

Figure 2.5. Examples of finite grids.

Chapter 3

"LINGUISTIC" SNAPSHOT GENERATION

3.1 Description of Translationally-
Fixed Seene Views

 

 

Since scenes will be examined via their encode-
ments, it is desirable to develOp an algorithmic procedure
for producing encodements from scene descriptions. This
algorithm can be divided into two parts: (1) the gener-
ation of views of a scene from a single general scene
description, and (2) the generation of encodements for any
view of a scene. The first part relates to the derivation
of specific descriptions of a scene for arbitrary rota-
tional orientations; the second part deals with the
effects of the relative translational positioning between
a view of a scene and the grid being used for encoding
snapshots. This two-part division thus separates the
rotational and translational aspects of the process, per-
mitting the independent study of either one. In accor-
dance with our original objectives, we will confine this
investigation to translational considerations by assuming
that scene views are our starting point, and then focusing

our attention on the view encodement generation process in

38

39

part (2). The first concern here is the development of a
uniform standard for view descriptions.

The definition of "scene" in Chapter 2 implicitly
designates finite length straight line segments as scene
"primitives." Note that a scene can be uniquely charac-
terized by specifying its structure in terms of these
"primitives" (including their lengths), along with the
interconnections (at endpoints), the relative trans-
lational locations, and the relative rotational orien-
tations existing between them. Suppose such a scene
description is constrained by fixing both the relative
rotational orientations and the relative translational
locations of the scene primitives with respect to the
designated standard screen, without alteration of the
scene structure. In this case, the primitives themselves
and the relative relationships -— i.e., interconnections,
orientations, and translational positions -— existing
between them remain unchanged, and the resulting descrip-
tion specifies a translationally-fixed view of the scene.

Recall the elementary fact that a straight line seg-
ment having both its rotational orientation and trans-
lational position fixed can be completely characterized
by the coordinates of its two endpoints specified relative
to a fixed two-dimensional coordinate system. Thus it
becomes apparent that a translationally-fixed [hereafter
denoted as "T-F"] view of a scene can be completely

characterized by representing each of its "primitives"

40

by their endpoint coordinates Specified relative to the
standard screen. The notational representation for the
T—F scene view will be a finite, non-empty set whose
elements are sets of two ordered pairs -- each ordered
pair in a set representing the Cartesian coordinates of a
point on the standard screen, and each set of two ordered

pairs representing the two endpoints of one of the primi-

tives of the T-F view. An example of a T-F view of a
scene is shown in Figure 3.1 - (a), with its corresponding
notational representation in terms of a set of coordinate
pairs given in Figure 3.1 - (b).

Notice that any T-F view of a scene has a unique
representation using this notation, due to the uniqueness
of the description of the scene's structure guaranteed by
the junction constraint in Definition 2.1. Furthermore,
we observe that any finite set consisting of one or more
sets of two ordered pairs describes some unique T-F
scene view if and only if such a set has the following
prOperties:

(a) both coordinates in each ordered pair have

a value between 0 and l, inclusive;

(b) the tWO ordered pairs in any one set are

not identical;
and (c) the line segments, defined by interpreting
each set of ordered pairs as the two

endpoints of the straight line segment

41

 

(.5,.7)

 

(.1,.2) (.9,.2)

 

 

 

0 1

(a) A T-F view of a scene illustrated
on the standard screen.

{{(.l,.2),(.S,.7)},{(.S,.7),(.9,.2)},{(.9,.2),(.l,.2)}}

(b) The set notational representation of
the T-F scene view in (a).

Figure 3.1. A translationally-fixed scene view and its
set notational representation.

42

connecting them, would intersect each

other only at common endpoints if at all.
This is clarified by noting that condition (a) insures
that the T-F scene view appears on the standard screen,
condition (b) guarantees that each primitive represented
by its endpoints has non-zero length (the finite length
requirement is guaranteed by condition (a)), and condition
(c) insures that the junction constraint is met by the
scene whose T-F view is being represented.

The uniqueness prOpertieS noted in the previous
paragraph confirm the utility of the set notational repre-
sentation for T-F scene views. Unless designated other-
wise, we will hereafter assume that any T-F view of a
scene is expressed relative to the coordinate system of
the standard screen using this set notational represen-
tation.

3.2 Resolution Effects
on Descriptions

 

 

In order to consider encodements with grids of
various resolutions, it becomes necessary to investigate
the conversion of a T-F scene view description from the
coordinate system of the standard screen to the coordinate
system of a grid of resolution "r". We begin by re-
examining the coordinate systems of grids in greater
detail.

Recall that the natural coordinate system of a grid

has the lower horizontal grid border as the x-axis, the

43

left vertical grid border as the y-axis, and the lower
left corner of the grid as the origin. Additionally, for
a grid of resolution "r", the lower right corner of the
grid defines the maximum x-coordinate value as "r" while
the upper left corner of the grid defines the maximum
y-coordinate value as "r". Further, each axis is divided
into r equal divisions by r + 1 points (including both
endpoints of the axis) which are consecutively labeled
from the origin with the integers "0" through "r". Each
integer valued point along the x-axis represents the lower
endpoint of one of the r + 1 identical length vertical
line segments of the grid (which include the two vertical
border line segments of the standard screen), while each
integer valued point along the y-axis represents the left
endpoint of one of the r + 1 identical length horizontal
line segments of the grid (which include the two hori-
zontal border line segments of the standard screen). The
horizontal and vertical line segments intersect at

(r + 1)2 points, and divide the standard screen into r2
identical squares. Of course, any point lying along a
vertical grid line has an integer valued x-coordinate,
while any point lying along a horizontal grid line has an
integer valued y-coordinate. Thus, the intersection
points of the horizontal and vertical grid lines are
characterized by having integer valued x and y coordinates

relative to the natural coordinate system of the grid.

44

Definition 3.1

 

A lattice point (or intersection point) of a

 

 

grid is any point defined by the intersections of
the horizontal and vertical line segments (including
the border lines of the standard screen) defining

that grid.

It should be obvious from the preceding discussion that a
grid of resolution r has (r + l)2 lattice points, each one
having integer valued x and y coordinates relative to the
grid.

It is important to bear in mind here that an increase
in grid resolution is obtained not merely by increasing
the number of grid squares which are present, but also by
shrinking the size of individual squares in the grid.

This is due to the fact that all grids under consideration
at any one time are defined relative to the same standard
screen. Thus, every grid coordinate system has the same
origin and the same x and y axes. The only distinction
between the coordinate systems of grids of different
resolution is the relative coordinate labeling along the
x and y axes. So, given a fixed point on the standard
screen being used, each grid imposed on the standard
screen would merely relabel the coordinates of the point
relative to the natural coordinate system of that grid.
In particular, the coordinates of such a fixed point

relative to the grid of resolution "1" (which is the

45

standard screen) can be multiplied by the factor r' to
obtain the coordinates of that same fixed point relative
to the grid of resolution r'. This Simple coordinate
transformation is known as a "dilation", or "stretching",
by r' relative to coordinate system of the standard screen

[L-l], and forms the basis for the following Lemma:

Lemma 3.1

 

A fixed point on the standard screen having
coordinates (a,b) relative to the grid of resolution
"1" will have coordinates (r'a,r'b) relative to the

grid of resolution r'.

Corollary 3.1

 

A fixed point on the standard screen having
coordinates (x,y) relative to the grid of resolution
' I
r will have coordinates (%—x, %—y) relative to the

grid of resolution r'.

2222::
Let the fixed point on the standard screen

having coordinates (x,y) relative to the grid of

resolution r have coordinates (a,b) relative to the

grid of resolution "1".

Then using Lemma 3.1,

(x,y) = (ra.rb) = r(a.b),

46

which implies that

(a,b) = 31,- (x,y) = c; , >14).
Also, the fixed point having coordinates
(a,b) relative to the grid of resolution "1"
will have coordinates (x',y') relative to the
grid of resolution r', determined using
Lemma 3.1 as

(X'.Y') = (r'a,r'b) = r'(a.b),

which implies that

(a,b) = 51:. (x',y') = (%.,I)
So, we have
_ 1 t l — 1
(a,b) — "I?! (X ,Y ) ' ‘1'." (XQY)

which gives

I
A
J.

'1'"!
~<‘
U

(x',y') = §l (x,y) - I.
Q.E.D.

Recall from Section 3.1 that a T-F scene view can be
completely characterized in terms of the coordinates of
the endpoints of each scene primitive specified relative
to the standard screen. By Lemma 3.1 we can multiply each
of these endpoint coordinates by the same factor r to
obtain their corresponding specification relative to the
grid of resolution r, with the aggregate result providing
a description of the same T-F scene view relative to the
grid of resolution r. Because this T-F scene view des-

cription change is accomplished using a direct similarity

47

transformation1

which also preserves the lepe of any
linez, it follows that a change in grid resolution when
viewing a scene does not affect the orientation of the
primitives, the relative positioning among the primitives,
or the relative interconnections which exist between the
primitives. Only the lengths of the primitives and
relative distance measures within the scene view are
changed, all being multiplied by the same factor which

determines the coordinate relabeling in the grid reso-

lution conversion. We thus have the following theorem:

Theorem 3.2

 

Given a T—F view of a scene having the
following notational representation in terms
of a set of coordinate pairs specified relative

to the standard screen:

 

1 Intuitively, a Similarity transformation is a
mapping of the plane into itself which preserves angle
size. A direct similarity transformation is a Similarity
transformation which preserves the sense of every angle.
For a more formal definition and detailed discussion of
similarity transformations, including their structure
preserving prOperties, see [6-1].

2 If the endpoint coordinates of a straight line seg-
ment on the standard screen are (a,b) and (c,d), the line
segment has SIOpe %}2. By Lemma 3.1, the same line seg-
ment on the grid of resolution r will have endpoint
coordinates (ra,rb) and (rc,rd) respectively, with its
rd—rb = d-b
rc-ra c-a'
arbitrary, this shows that a change in grid resolution
does not affect the slope of straight lines.

lepe given by Since the resolution r was

(I)
II
k

{

 

K

48

(a1,b1),(a2,b2)}

Then the set

rS = {:{(ra1,rb1),(ra2,rb2)}

 

(a1,bl) and (a2,b2) are
the standard screen coor-
dinates of the two end-
points of a scene primi-

tive

and r an integer 3 1

{(a1,b1),(a2,b2)}eS}

describes the same T-F scene view at grid reso-

lution

111.09 .

An immediate consequence of this theorem, in light of

Corollary 3.1, is the following:

Corollary 3.2

Given a T-F scene view having the following

notational representation in terms of a set of coor-

dinate pairs specified relative to the grid of

resolution "r":

r

 

{(xl’yl)’(x2’y2)}

 

(x1,y1) and (x2,y2) are
the coordinates on the

grid of resolution r of
the two endpoints of a

scene primitive

49

Then the set

{(x . ).(x . l}
r. r, 1Y1 2Y2

r' r' r' ,
F_ S = {(?—x1,;—y1),(F—x2,¥—y2)} e S and r,r are

integers Z 1

describes the same T-F scene view at grid reso-

lution "r'".

In summary then, shapes and orientations in T-F
scene views are preserved under grid resolution conver-
sions; only effective sizes and distances -- and hence the
detail reflected in any encodement -- are increased or
decreased by a corresponding change in the resolution of
the grid being used for viewing and encoding a scene.

3.3 Algorithmic Encodement of
TZF Scene’Views

 

 

Before presenting the formal algorithm which can
be used to generate grid encodements for T-F scene views,
the notation for describing individual grid squares will
be discussed. Each grid square will be labeled g(x,y),
where x and y are the abscissa and ordinate, respectively,
of the geometric center of that grid square. These coor-
dinates are Specified relative to the natural coordinate
system of whatever resolution grid the grid square is a
part. Note that there are r2 grid squares in a grid of
resolution r, each one Specified by a unique label of the
form g(p-.5, q-.5) where p and q are both integers

between 1 and r inclusive.

50

The encodement algorithm to be presented has the
structural form of a formal grammar, and is applied in a
manner analogous to the derivation of terminal strings
using the productions of a grammar. Making this gramma-
tical comparison more concrete, the algorithm can be con-
sidered as having the set of non-terminal symbols
{S,L,V,H,Gv,Gh}, the set of terminal symbols {g}, the
starting type S, and a set of productions given by groups
(I) through (VI) of the algorithm specification (see
Subsection 3.3.1). Every terminal or non-terminal symbol
is followed by a set of parentheses enclosing one or more
coordinates separated by commas, each of which -- with one
exception -- represents a real value, either explicitly or
through some algebraic expression which may involve
variables. The one exception to the form of these coor-
dinates is the first coordinate of the starting type S,
which is a finite set of pairs of two-dimensional real-
valued coordinates representing a portion of the nota-
tional description of the scene in terms of the endpoints
of each of its primitives. Because of the analogy which
exists between the algorithm and formal grammars, and also
because of the coordinates associated with each terminal
and non-terminal symbol, the algorithm is termed a "coor-
dinate grammar".

One restriction must be observed when the algorithm
is applied: as each step of the algorithm is executed

(i.e., as each "production" is applied), the explicit

51

numerical value of each coordinate must be calculated
before proceeding to the next step. In other words, after
each application of a syntactic "production" of the algor-
ithm, it is necessary to consider the semantics of each
coordinate expression which results before proceeding
further. The application of the algorithm thus bears a
strong resemblance to interpretive processing, and so we
make a further refinement of our terminology by referring
to the algorithm as an "interpretive coordinate grammar".
The rules, or "productions", of the algorithm will
be presented next in Subsection 3.3.1, followed by some
comments regarding notation in Subsection 3.3.2. Sub—
section 3.3.3 contains a brief discussion on the method of
the algorithm, while an example in Subsection 3.3.4 illus-
trates the manner in which the algorithm is applied. A
detailed explanation of how the algorithm works, in which
we conclude that the algorithm does indeed generate the
proper encodements of T-F scene views and nothing more,
is the subject of Subsection 3.3.5. This is followed with
the consideration in Subsection 3.3.6 of two cases which
require special semantic interpretations for the coor-
dinates generated by the algorithm. Subsection 3.3.7
concludes this section by introducing the concept of
extended grids -- larger than those previously defined --
which are needed for the prOper interpretation of some

encodements generated by the algorithm.

52
3.3.1 The "Linguistic" Algorithm

(0) The completely specified starting type:
S {(319131):(C19d1)}:{(a23b2):(Czed2)}:
,r
o o o ,{(ap’bp) , (Cp’dp)}

(I) Rule for changing the specification of each primi-
tive to the encodement resolution "r", and a termi-

nation rule:

(a) S({L1,L2,...,Lk},r) + 8({L1,L2,...,Lk_1},r)

LJL(r.ak,r.bk,r.ck,r-dk)

(b) SULI‘) + I0

(II) Rule defining which vertical and horizontal grid

lines can be crossed by a primitive:

L(a,b,c,d) + V(Tﬁin(a,67 ,Lmax(a,el ,0,a,b,%§§)

UH(rTnin(b,d') ,Lr_nax(b,d_)_, ,o,a,b,£C1—}g)

(III) Rules for generating the coordinates of all inter-
sections of a primitive with vertical grid lines:
(a) V(x1,x2,n,x,y,m) + V(x1-l,x2-l,n+l,x-l,y-l,m)

(b) V(0,x2,n,x,y,m) + V(0,x2-l,n+1,x-l,y-l,m)

LJGv(n,Y'm'X+n)

(IV)

(V)

(VI)

53

(C) V(0,0,n,x,y,m) + GVCDaY'm-X+n)

(d) V(1’0’n’XSY’m) + g

Rules for generating the coordinates of all inter-

sections of a primitive with horizontal grid lines:
(a) H(Y1:Y2:n9x9y,m) + “(VI-1SYZ'19n+19x'ISY‘19m)

(b) H(0’YZ’n9x,y,m) + “(OSYZ'1:n+19X'ISY'19m)

u Gh(x-:-1-.y + n,n)
(C) ”(0,0,H,X,Y:m) + Gh(x-%-y + nsn)

(d) H(1.0,n.X.y.m) + ID

Rule for generating the grid squares which are

shaded due to a point of intersection of a primi-

tive with a vertical grid line:
SUI-.5.ry‘-.5).g(n-.5,Ly_,+.5),

Gv(n,y) +
g(n+.5,FYI-.5),g(n+.5,LyJ+.5)

Rule for generating the grid squares which are

shaded due to a point of intersection of a primi-

tive with a horizontal grid line:
g(rxl-.5,n-.5),g(LxJ+.5,n-.5)

Gh(x,n) +
g(er-.S,n+.S),g(LxJ+.5,n+.5)

54

3.3.2 Remarks on Notation

There are a number of symbols appearing in the
algorithm which were not included in the terminal or non-
terminal alphabets of the previous grammatical analogy
discussion. While a precise formal description of the
algorithm as an interpretive coordinate grammar would
require that all these symbols be included in the speci-
fication of its alphabet, the more intuitive grammatical
analogy notions already introduced will be sufficient for
understanding the structure and application of the algor-
ithm. Thus, in the interest of keeping notational com-
plexity to a minimum, the algorithm will not be formalized
further in terms of its grammatical prOperties. Instead,
the symbols in the algorithm not specified as part of the
alphabet in the grammatical analogy will be described
informally as to their interpretation rather than as
abstract symbols.

First, several delineators are apparent in the
algorithm: left-right bracket pairs, "{ }", used to
enclose the elements of a set; left-right parentheses
pairs, "( )", used to enclose coordinates; and commas,
",", used to separate coordinates and the elements of a
set. Additionally, the empty set "D" appears as the null
symbol of the algorithm, performing a function comparable
to that of the empty string in conventional grammars.

Next, the symbols "p" and "k" are integer variables

representing positive, non-zero, integer values defining

SS

subscript limits. We then have the symbols "ai", ”bi”,
"ci", and "di" -- for i = l, 2,...,p -- as algebraic
variables acting as placeholders for Specific numerical
values. The symbols "Li" -- for i = 1, 2,...,p -- repre-
sent the set {(ai’bi)’(ci’di)}'3 Other algebraic varia-
bles acting as placeholders for specific numerical values
are "a", "b", "c", "d", "r", "m", "n", "x1", "x2", "x",
"y", "yl", and "yz". The symbols "0", ".5", and "l" are
themselves interpreted as Specific numerical values.
Further, there are symbols in the algorithm which
represent ordinary algebraic Operations: "+", "-", ".”,
and "—" represent the conventional notations for addition,
subtraction, multiplication, and division, respectively.
Several algebraic functions are also involved: the symbol
"min" represents the function which has a value equal to
the least numerical value of its arguments; the symbol
"max” represents the function which has a value equal to
the greatest numerical value of its arguments; the nota-
tion ﬂ_J” denotes the "entier" function, whose value is
the greatest integer less than or equal to the numerical
value of the enclosed argument; and the notation "F1"
denotes the least integer greater than or equal to the

numerical value of the enclosed argument.

 

3 All possible symbols "Li" are covered with the

value of "i" ranging from 1 to p instead of l to k,
-because the value of k will never exceed the value of
p in this algorithm.

56

Finally, the set theoretic Operation of set union is
denoted by the conventional symbol "LV'in the algorithm.
It should be emphasized here that the rewriting

symbol "+" is part of the describing language for the
algorithm interpreted as a grammar, and thus does not
comprise any part of the strings produced by the algorithm.
Rather, the symbol "+" gives a structure to the rules of
the algorithm analogous to the structure of the produc-
tions of a grammar, and thus suggests a grammatical
interpretation of the algorithm as a set of rewriting
rules to be used for deriving strings of terminal symbols.
The rewriting symbol "+" thus has its conventional gram-
matical interpretation within the rules of the algorithm,
which is: a string of symbols having the form specified
to the left of the arrow can be replaced by (or "rewrit-
ten" as) the string of the form which appears to the
right of the arrow.
3.3.3 Method of the Algorithm:
A SynOpsiS

The algorithm is designed to generate binary
valued (shaded or non-Shaded) grid encodements of T-F
views of scenes by starting with both a specification of
a T-F scene view relative to the standard screen, and a
specification of the grid resolution r at which the
encodement is to be produced. With such specifications
as the coordinates of the starting type S, the rewriting

rules in groups (I) through (VI) of the algorithm are

57

applied like the productions of a formal grammar -- with
the coordinate values numerically interpreted at each
step -- to produce a set of terminal symbols representing
exactly those grid squares which are Shaded in the encode-
ment at grid resolution r. Thus, the result of a "deri-
vation" using this algorithm is a set
{g(x1.y1).g(xz.Y2).---.g(xq.yq)}
which designates q grid squares g(xi,yi), for i=l,2,...,q,
as those which are shaded in the encodement at grid reso-
lution r. All other grid squares are not shaded in the
encodement. Note that the number "q" of shaded grid
squares in the encodement will be zero in the case of the
empty snapshot represented by the empty set of terminals,
up".

The algorithm is based on the fact that a grid
encodement of a T-F scene view is uniquely determined by
the intersection points of all the primitives with the
horizontal and vertical grid lines. This follows directly
from the fact that all intersection points of primitives
with grid lines reflect all touching points of the scene
primitives with the sides of grid squares, and that the
touching of the side of a grid square by one or more
scene primitives is the only encodement criterion for
shading that grid square. We thus have the following
interpretation of the encodement scheme in terms of the

intersection points of scene primitives with grid lines:

and

(1)

(2)

(3)

58

if an intersection point is a grid lattice
point, all four grid squares sharing the
intersection point at a common corner are
shaded;

if an intersection point lies on a vertical
grid line other than at a lattice point, the
two grid squares -- one to the left, and one
to the right, of the vertical grid line --
which share the intersection point on a
common vertical side are shaded;

if an intersection point lies on a hori-
zontal grid line other than at a lattice
point, the two grid squares -- one above,
and one below, the horizontal grid line -—
which Share the intersection point on a

common horizontal side are Shaded.

This interpretation of the encodement scheme forms the

procedural basis for generating shaded grid squares in

the algorithm.

In generating the intersection points between primi-

tives and grid lines, upon which the encodement of a T-F

view of a scene is based, the algorithm determines the

intersection points separately for each primitive. The

algorithm further subdivides this generation by deter-

mining each primitive's intersections with horizontal

grid lines (if any) separately from it's intersections

with vertical grid lines (if any). In determining the

59

vertical grid line intersection points for a primitive,
the algorithm effectively substitutes the x-value along
each vertical grid line crossed by the primitive into the
straight line equation of that primitive to determine the
corresponding y-coordinate of the intersection point.
Similarly, in determining the horizontal grid line inter-
section points for a primitive, the algorithm effectively
substitutes the y—value of each horizontal grid line
crossed by the primitive into the straight line equation
of that primitive to determine the corresponding x-coor-
dinate of the intersection point. The term "effectively"
is used in both cases to denote that the intersection
point coordinates are determined relative to successively
translated coordinate systems. These translations convert
the vertical grid line intersections of a primitive into
y-axis intercepts, and convert the horizontal grid line
intersections of a primitive into x-axis intercepts.
Additional detail will be provided on these intersection
determinations using coordinate translations when we
examine the interpretation of each rule in the algorithm
in Subsection 3.3.5.

The general method upon which the algorithm is based
is summarized in the flow charts of Figures 3.2 and 3.3.
Notice that the flow chart in Figure 3.3 details the
logic of block (E) of the flow chart in Figure 3.2. The

other letter labels which appear with the various blocks

60

Initialize a cumulative specifi-
cation of all shaded grid squares (A)

 

 

at the empty set, "0".

 

Yes

i

Specify a T-F scene view on

 

 

 

scene primitives on the grid
of resolution r.

the standard screen, and an (B)
encodement resolution r.
Determine the coordinates of

. I both endpoints of one of the (C)

 

 

 
     
     
 

(D)

  

Does this
primitive cross any
vertical grid
lines?

No

  
  
      
 

   
 

Does this .
primitive cross any
horizontal grid
lines?

No

 
   
 

(H)

 
  
 
    
 

Do any
scene primitives
remain to be
encoded?

(E)

Generate all points of inter-
section of the primitive with
vertical grid lines, produce
a list of all grid squares
shaded by these intersec-
tions, and update the cumu-
lative specification of all
shaded grid squares.

J
(a)

Generate all points of inter-
section of the primitive with
horizontal grid lines, pro-
duce a list of all grid
squares shaded by these
intersections, and update the
cumulative specification of
all shadedggrid squares.

l

 

Yes

 

 

 

 

 

 

 

Figure 3.2. Flow chart
interpretation of
the method upon which
the "linguistic"
encodement algorithm
is based.

 

(Enter (E03
1

Determine the leftmost
vertical grid line crossed
by the primitive.

1

Generate the coordinates
of the point of intersec—
tion of the primitive with
the vertical grid line.

 

 

 

 

 

 

 

 

  
  

(L)

  
    
    

Is this
intersection point
a lattice point of
the grid?

(N)

 

Generate a set of 2 termi-
nal symbols having coordi-
nates representing the
center of the 2 grid
squares sharing the inter-
section point on a common
vertical side.

 

 

 

Figure 3.3. Flow chart
detail of block (E)
in Figure 3.2.

61

 

 

(J)
(R)
Consider the next
vertical grid line
to the right.
(K)
(M)

 

Generate a set of 4 ter-
minal symbols having coor-
dinates representing the
center of the 4 grid
squares sharing the inter-
section point on a common
corner.

 

 

 

(P)

 

Add the terminal symbols
to the cumulative specifi-
cation of all shaded grid
squares, eliminating any
duplicates.

 

 

 

  
 
  
   

  
 
  

Does the
primitive cross any
additional vertical
:rid lines?

 

Lent (2)1

 

62

will be used for reference in Subsection 3.3.5, and are

of no consequence here.

3.3.4 An example

To illustrate the application of the algorithm,
the T-F scene view in Figure 3.4 will be encoded on the
grid of resolution "4". The completely specified starting
type (0) for the algorithm is

S({L1,L2,L3,L4},4),
where L1, L2, L3, and L4 are as given in Figure 3.4 - (b).
The rules of the algorithm can now be applied to this
starting type to ultimately produce the prOper encodement
of the T-F scene view illustrated in Figure 3.5. We

proceed as follows:

The completely specified starting type is:

S({L1,L L3,L4},4).

2’

To encode the primitive represented by "L4" (the "slanted"
line segment) on the grid of resolution 4, rules (I) - (a)

and (II) must be applied.

Applying rule (I) - (a) yields:
S({L1,L2,L3},4) u L(%%-4,%%-4,I%-4,1—%-4).

Interpreting the coordinates yields:

8({L1,L2,L3},4) u L(3%-,3%-,l%,l).

63

 

 

 

 

 

 

 

0 1

(a) A T-F scene view on the standard screen.

{L1,L2,L3,L4}
where L1 = {(T%.T%).(%%,T%)}
L2 = {(%%.T%).(%%.%%)}
L3 = {(§.a}%).(%%.1}%)}
L4 = {(%%.%%).(T%.T%)}

(b) The set notational representation of
the T-F scene view in (a).

Figure 3.4. T-F scene view for algorithmic encodement
example.

64

 

 

 

 

 

0 1 2 3 4

(a) Encodement of the T-F scene View of Figure
3.4 on the grid of resolution 4.

g(1.5,.5),g(2.5,.5),g(3.5,.5),g(1.5,1.5),g(2.5,l.5),
g(3.5,l.5),g(2.5,2.5),g(3.5,2.5),g(2.5,3.5),g(3.5,3.5)

(b) Set of terminals denoting the
grid encodement in (a)

Figure 3.5. Encoded result of T-F scene view of Figure 3.4
on the grid of resolution

65

Applying rule (11) yields:

S({L1,L2, L 3} ,4)
1- 37
U V( minal-,1?) , max(3-4-,laZ-_)J ,0, 3%,3%,——— 11 1
1‘7 321
31
32

l 1

U “(5511.31.17 .._r_nax(3%.1) .0.3¢.32.11—————)
1

l
734
Interpreting the coordinates yields:

S({L1,L2,L3},4) UV(2,3,0,3%—,3%—,-1—%)

U H(l,3,0,371r,3%-, 19).
To determine all grid squares Shaded by primitive "L4" due
to its intersections with vertical grid lines, all
occurrences of the symbol "V" are to be removed by appli-
cations of rules in group (III), with all symbols "Gv"
which may result also to be removed by applications of

rule (V).

Applying rule (111) - (a) and interpreting coordinates
yields:
S({L } 4) UV(1,2,1,21 21 1°)
1’ L2’ L3 ’4’ 221—7

UH(l, 3, o ’31'3’2’l7 0.)

Again applying rule (III) - (a) and interpreting coor-

dinates yields:
1 1 10
8({L1,L2,L3},4) UV(0,1,2,1I,17,—7)

UH(1,3,0,3},,3%,£9).

-——)

66

Applying rule (III) - (b) and interpreting coordinates:

5
U Gv(2,l7)

1 1 10
U H(l,3,0,3I-,3-2-,—7).

Applying rule (V) and interpreting coordinates:

. H1 10
g(l.5,l.5),g(l.5,l.5),
U g(2.5,1.5),g(2.5,1.5)
l l 10
UH(19390331932”—7)0
Eliminating duplicate set elements:
1
S({L1,L2,L3},4) UV(0,0,3,I,I, —7)

U {g(l.5,l.5),g(2.5,l.5)}

UH(l, 3, 0 ,3I,3I,10 ).

Applying rule (III) - (c) and interpreting coordinates:
S({L1,L 2, L3}, 4) UGv (3, 37)

U {g(1.5,1.5),g(2.5,1.5)}

UHC1.3.0.3%.3%.-1-3).

Applying rule (V), interpreting coordinates, and elimi-
nating duplicate set elements:
S({L1,L2,L3},4) LJ{g(2.5,3.5),g(3.5,3.5)}
LJ{g(l.5,1.5),g(2. 5,1.S)}

UH(1,3,0,3I,31, 10).

67

Combining sets of terminals with the set union Operator:

S({L1,L2,L3},4) LJ g(1.5,1.5).g(2.s,1.5),
g(2.5,3.5),g(3.5,3.5)

U H(l,3,0,3%—,3%-,l-g).

Next we must determine all grid squares shaded by primi-
tive "L4" due to its intersections with horizontal grid
lines. This is done by removing all occurrences of the
symbol "H" by applications of the rules in group (IV), and
also by removing any resulting symbols "Gh" by applications

of rule (VI).

Applying rule (IV) - (a) and interpreting coordinates:

5({L1,L2,L3}’4) U 8(1-59105):g(2059105)3
g(2.5,3.5),g(3.5,3.5)

U H(0,2,1,2I,2.1I,l°7).

Applying rule (IV) - (b) twice and then applying rule (IV)

- (c), interpreting coordinates after each step, yields:

3({L1.L2.L3}.4) U g(l.5,1.5),g(2.5,1.5)}
g(2.5,3.5),g(3.5,3.5)
IJ Gh(l%,l) LJGh(z§,2) LIGh(2T%,3).
Applying rule (VI) three times, interpreting coordinates,

and eliminating duplicate elements within each set,

yields:

68

5({L1,L2,L3},4) U g(1'591'5)9g(2'591'5)9
g(2.5,3.5),g(3.s,3.5)

LJ{g(1.s,.5),g(1.s,1.5)}
LJ{g(z.s,1.5),g(2.s,z.5)}
LJ{g(2.5,2.5),g(2.5,3.5)}.

Combining sets of terminals with the set union Operators:

8({L1,L2’L3}’4) U g(losa's)sg(1-5’1°5)9g(2°531f5)s
g(2.5,2.5),g(2.5,3.5),g(3.5,3.5)

At this point, the algorithm has produced the complete
encodement of the "slanted" line segment primitive ”L4".
The encodement of each of the remaining three line seg-
ment primitives -- represented by "L1","L2", and "L3" --

can be produced in a similar manner.

The primitive represented by "L3" (the short horizontal
line segment at the tOp) is encoded by the sequential
application of rule (I) - (a), rule (II), rule (III) -
(a) three times, rule (III) — (d), rule (IV) - (a) three
times, and then rule (IV) — (d), in that respective order,
to the string produced thus far. Coordinates are inter—
preted at each step, and the set union Operator is applied
to unite the two sets (both empty) of terminals generated
for the encodement of "L3". The result is:
S({L1,L2},4) LID
g(l.5,.5),g(l.5,l.5),g(2.5,l.5),
g(2.5,2.5),g(2.5,3.5),g(3.5,3.5)

69

Note that the encodement of primitive "L3" is the empty

set, since "L3" does not cross any grid lines at resolution

104'! .

The primitive represented by "L2" (the vertical line seg-
ment on the right) is encoded by the sequential application
of rule (I) - (a), rule (II), rule (111) - (a) three times,
rule (III) - (d), rule (IV) - (a), rule (IV) - (b) twice,
rule (IV) - (c), and then rule (VI) three times, in that
respective order, to the string produced thus far. Coor-
dinates are interpreted at each step, and the set union
Operators are applied to unite the four sets (one of

which is the empty set) of terminals generated for the
encodement of "L2". The result is:

8({L1}.4) U{g(3.5,.5),g(3.5,1.5)}
g(3.5,2.S),g(3.5,3.5)

U93

g(1.5,.5),g(l.5,l.5),g(2.5,l.5),
{g(2.5,2.5),g(2.5,3.5),g(3.5,3.5) .

Note that the entire encodement of primitive "L2" resulted
from its intersections with horizontal grid lines, as it
lies entirely between two consecutive vertical grid lines

at resolution "4".

The primitive represented by "L1" (the horizontal line

segment on the bottom) is encoded by the sequential

70

application of rule (I) - (a), rule (11), rule (III) - (a)
twice, rule (III) - (b), rule (III) - (c), rule (V) twice,
rule (IV) - (a), rule (IV) - (c)4, and then rule (VI), in
that respective order, to the string produced thus far.
Coordinates are interpreted at each step, and the set
union Operators are applied to unite the three sets of
terminals generated for the encodement of "L ". The

1
result is:

S(¢,4) g(1.5,.5),g(1.5,1.5),g(2.5,.5),
LJ g(2.5,l.5),g(3.5,.5),g(3.5,l.5)

g(3.5,.5),g(3.5,1.5),
U{g(3.5,2.5),g(3.5,3.5)} U ‘1

g(1.5,.5),g(1.5,1.5),g(2.5,1.5),
g(2.5,2.5),g(2.5,3.5),g(3.5,3.5) .

At this point, the encodement for each of the four primi-
tives in the T-F scene view is complete. The first coor-
dinate of "S” has become the empty set, signifying that no

more primitives remain to be encoded. Applying rule

 

4 The x-coordinate of the "Gh" symbol generated by

the application of rule (IV) - (c) includes the expression

"%.y", which in this case takes the mathematically unde-
fined form "%.0". This expression is to be interpreted as
having the value "zero", in order to insure that prOper
encodements are produced. This is explained in detail in
Subsection 3.3.6. (Also see Footnote 6 of this Chapter).

71

(I) - (b) to the string generated thus far, and combining
all the sets of terminals with the set union Operators,
terminates the algorithm with the following result:
g(l.S,.5),g(l.5,l.5),g(2.5,.5),g(2.5,l.5),g(2.5,2.5),
g(2.5,3.5),g(3.5,.5),g(3.5,1.5),g(3.5,2.5),g(3.5,3.5).
This set of terminals represents the complete encodement --
as shown in Figure 3.5 -- of the T-F scene view of Figure

3.4 on the grid of resolution "4".

This example has demonstrated the application of the
algorithm in producing the encodement of a Specific T-F
scene view. While this has illustrated the manner in
which the rules can be applied, it remains to be Shown
that the algorithm works in general. Hence, the next
topic to be considered will be the verification of the
validity of the algorithm in the general case.

3.3.5 Method of the Algorithm:

A Detailed EXplanation

and Verification

Through a discussion emphasizing the semantics

associated with the rules of the algorithm, this sub-
section details the structure, application, and inter-
pretation of the rules of the algorithm. In addition to
explaining the method and logic of the algorithm, the goal
is to justify that the structure of the algorithm is such

that the prOper encodement of T-F scene views is assured.

72

Recall the grammatical structure of the algorithm
with rules which are applied like the productions of a
grammar in deriving a terminal string. As is the case
with productions in a grammar, the order of application of
the rules of the algorithm in deriving sets of terminals
is not necessarily unique. However, a unique order of
application for the rules of the algorithm will be used in
this discussion in order to make the logic of the algor-
ithm more understandable. This unique order appears in
the flow charts of Figures 3.2 and 3.3, which represent
the method used by the algorithm in generating encodements.
Bear in mind that the flow chart in Figure 3.2 details the
logic of block (E) of the flow chart in Figure 3.1. Both
of these flow charts will be referenced frequently as we
associate this logical structure with the rules of the
algorithm.

The entity labeled "(0)" in the algorithm represents
the complete specification of the starting type for the
algorithm. More specifically, the starting symbol "S",
chosen to be suggestive of the word "Scene", is followed
by two coordinates. The first coordinate of "S" speci-
fies a T-F view of a scene in terms of its set notational
description. The scene has "p" primitives (p 3 l), with
each primitive represented by its two endpoint coordi-
nates specified relative to the standard screen. The
second coordinate of "S" is a finite, non-zero, positive

integer "r" which specifies the grid resolution at which

73

the T-F scene view described by the first coordinate is to
be encoded. Once the starting type has been completely
specified with its two coordinates -- corresponding to
block (B) of Figure 3.2 -- the rewriting rules in groups
(I) through (VI) of the algorithm can be applied to pro-
duce the complete encodement specification.

A single application of rule (I) - (a) corresponds to
block (C) of the flow chart in Figure 3.2. One of the "p”
primitives "L1" = {(ai’bi)’(ci’di)} is removed from the
first coordinate of the starting type "S" and converted to
its corresponding representation on the grid of resolution
"r" as "L(r-ai,r.bi,r-ci,r-di)". The symbol "L" is chosen
to be representative of the term "line segment primitive",
with its first two coordinates representing the abscissa
and ordinate, respectively, of the other endpoint. One
application of rule (I) - (a) thus corresponds to a
partial application of Theorem 3.2, using a modified
notation for the result of the coordinate transformation.

Rule (II) Of the algorithm is preparatory to asking
the questions posed in blocks (D) and (F) of Figure 3.2.
This rule converts the specification of line segment
primitive "L" into the specification of symbol ”V", which
will be used in generating intersection points of the
primitive with gertical grid lines, and the Specification
of symbol "H", which will be used in generating inter-

section points of the primitive with horizontal grid lines.

74

Recalling that the coordinates of "L" Specify the endpoints

of the primitive on the grid of resolution "r", we note the

following regarding the coordinates of "V":

and

(1)

(2)

(3)

(4)

the first coordinate of "V” represents the
least integer greater than or equal to the
smallest x-coordinate of the primitive's end-
points, and thus specifies the leftmost ver-
tical grid line that the primitive can cross;
the second coordinate of "V" represents the
greatest integer less than or equal to the
largest x-coordinate of the primitive's end-
points, and thus specifies the rightmost
vertical grid line that the primitive can
cross;

the third coordinate "0" initializes a counter
which will keep track of coordinate axes trans-
lations as intersection points on grid lines
are determined;

the fourth and fifth coordinates, "a" and "b”
respectively, represent the coordinates of one
of the endpoints of the primitive, which along
with the sixth coordinate -- which represents
the SIOpe of the primitive -- provide sufficient
information for constructing the "point-lepe"
equation for the straight line segment primi-

tive.

75

In a similar manner, the first coordinate of symbol "H"
specifies the lowest horizontal grid line that the primi-
tive can cross, the second coordinate of "H" specifies the
highest horizontal grid line that the primitive can cross,
and the remaining four coordinates of "H" are identical to
those of "V".

The rules in Group (III) of the algorithm are used in
determining all vertical grid line crossings of the primi-
tive, if any, corresponding to block (D) and a portion of
block (E) in Figure 3.2. Each application of rule (III) -
(a) corresponds to a translation of the origin of the
coordinate system one unit to the right and one unit up
(i.e., one unit along the 45° diagonal of the grid). This
decreases by "l" the values of all the coordinates of "V"
specified relative to the coordinate system, which includes
those values in the first, second, fourth, and fifth
coordinates. The value of the third coordinate of "V",
which represents a counter keeping track of the location
of the translated coordinate system relative to the
natural coordinate system of the grid of resolution "r",
is increased by "l". The last coordinate of "V" repre-
senting the slope of the primitive remains unchanged, as
the SIOpe of a line is invariant under a translation of
coordinates. Rule (III) - (a) is applied consecutively
zero or more times until one of the rules (III) - (b),
(III) - (c), or (III) - (d) is applicable. Rule (III) -

(d) is applicable at this point if and only if both the

76

smallest and largest x~coordinates on the primitive (which
occur exactly at the endpoints of this line segment primi-
tive) are not integers themselves, and both lie strictly
between the same two consecutive integer values. This is
signified by the "1,0" combination in the first two coor-
dinates of "V", respectively, which occurs here if and only
if the leftmost vertical grid line which the primitive SEE
cross (as determined in rule (11)) is to the right of the
rightmost vertical grid line which the primitive ggn_cross
(also as determined in rule (II)). Thus rule (111) - (d)
is applicable only when the primitive does not cross any
vertical grid lines, corresponding to the "no" exit of
block (D) in Figure 3.2. If rule (III) - (d) was not
applicable at this point, this signifies that the "yes"
exit from block (D) in Figure 3.2 would be taken. In this
case, the "0" appearing as the first coordinate of "V"
indicates that the value of the third coordinate of "n"
represents the leftmost vertical grid line on the grid of
resolution ”r" which is intersected by the primitive.
Thus, the applications of rule (III) - (a) which led to
the "0" in the first coordinate of "V” have already accom-
plished block (J) of Figure 3.3. Rule (III) - (c) is now
applicable if and only if the second coordinate of "V",

as well as its first coordinate, has reached a value of
"0", signifying that the y-axis of the translated coor-
dinate system coincides with the rightmost vertical grid

line intersected by the primitive. In this case, an

77

application of rule (III) - (c) generates the symbol "GV"
-- chosen to be suggestive of the terminology "intersection
point on a gertical grid line" -- with two coordinates
specified relative to the natural coordinate system of

the grid of resolution "r". These two coordinates repre-
sent the abscissa and ordinate, respectively, of the inter-
section point of the primitive with the rightmost vertical
grid line it crosses. However, this intersection point
lies on the y-axis of the translated coordinate system,

and thus constitutes the y-intercept of the primitive with
coordinates ((Ly-m.x) relative to the translated coor-
dinate system, where: (l) "x" and "y" as the fourth and
fifth coordinates of "V" represent the abscissa and ordi-
nate, respectively, of an endpoint of the primitive
relative to the translated coordinate system; and (2) "m"
as the Sixth coordinate of "V" represents the slope of the

5 Therefore, adding the translation factor "n"

primitive.
(the third coordinate of "V") to the coordinates of this
y-intercept specified relative to the translated coordinate

system results in the coordinates "(n,y-m-x+n)," which

 

5 If "m" ==n, then the only time it will appear in the
coordinates of the y-intercept -— or in the coordinates of
"CV" -- is when the value of ”x" has become identically

zero. In this case, the expression "m-x" takes the mathe-
matically undefined form "<n.0", which Should here be
interpreted as the value "0". This is explained in detail
in Subsection 3.3.6.

78

then represent the coordinates of "CV" specified relative
to the natural coordinate system of the grid of resolution
"r". The coordinates of "CV" thus represent the point of
intersection of the primitive with the grid line x=n on
the grid of resolution "r". The application of rule (III)
- (c) corresponds to the final pass through block (K) of
Figure 3.3, and also signifies that the "no" exit will be
taken from block (Q) of Figure 3.3 the next time it is
reached since the non—terminal symbol "V" is eliminated.
If rule (III) - (c) is not applicable, rule (III) - (b)
Should be applied until the second coordinate of "V" does
become "0" and rule (III) - (c) can be applied. Rule
(III) - (b) is used to determine all intersection points
of the primitive with each vertical grid line it crosses
except the rightmost one. Each application of rule (III)
(b) performs the Operations of both rules (III) - (a) and
(III) - (c), with minor modifications, as follows:

(1) the symbol "CV" is generated with the appro-
priate coordinates representing the inter-
section point of the primitive with the vertical
grid line x=n (never the rightmost one crossed)
by using the method of rule (III) - (c);

and (2) the coordinate system is translated one addi-
tional unit along the 450 diagonal of the grid
by regenerating "V" on the right side of "+"
with the apprOpriately modified coordinates as

done in rule (111) - (a) -- with the exception

79

that the first coordinate of "V" is maintained
at the value "0" (not further decremented) to
signify that the translated y-axis is to the
right of the leftmost vertical grid line inter-
sected by the primitive.
The regeneration of ”V" with modified coordinates by each
application of rule (III) - (b) allows the intersection
point of the primitive with the next vertical grid line to
the right to be generated. Linking rule (III) - (b) to
the flow chart of Figure 3.3, each application of rule
(III) - (b) -- because of its Operation in (1) above --
corresponds to all but the final pass through block (K) of
Figure 3.3. Additionally, because of the Operation of
regenerating "V" in (2) above, each application of rule
(III) - (b) provides for block (R) in Figure 3.3 after
insuring that the "yes" exit is taken from block (Q) when
it is next reached.

Rule (V) of the algorithm is used to generate the
shaded grid squares which result from each intersection
point of the primitive with a vertical grid line, and thus
corresponds to blocks (L), (M), and (N) in the flow chart
of Figure 3.3. Each application of rule (V) converts the
coordinates "(n,y)" of "CV" -- which represent the point
of intersection of the primitive with a vertical grid
line -- into a set of either four or two specifications of
shaded grid squares, depending respectively upon whether

or not the coordinates "(n,y)" represent a grid lattice

80

point. In either case, though, any grid square shaded as
a result of this intersection point "(n,y)" will be
immediately to the left or right of the vertical grid line
x=n, and thus has the x-coordinate of its center as either
"n-.5" or "n+.5". And Since any intersection point on a
vertical grid line causing a grid square on one side of
the grid line to be shaded also causes the adjacent grid
square at the same vertical level on the opposite side

of the grid line to be shaded, all that remains is the
determination of all possible values for the y-coordinate
of the center of any grid squares shaded as a result of
the intersection point "(n,y)." If the value of y is an
integer, signifying that the intersection point "(n,y)” is
a lattice point, then the y-coordinate of the center of
each grid square shaded by this intersection point is
either "y+.5" or "y-.5". But since "y" is an integer,

LXJ = Ff) =y, and so "LXJ+'5" and "ry1-.s" represent the
two distinct y-coordinates "y+.5" and "y-.5", respectively.
By generating all four possible combinations of these two
distinct x-coordinates and two-distinct y-coordinates as
the x- and y-coordinates, respectively, of the terminal
symbol "g", rule (V) thus specifies the set of four grid
Squares shaded as a result of the lattice point inter-
section of the primitive with a vertical grid line. On
the other hand, if the value of y is not an integer, this
signifies that the intersection point "(n,y)" is not a

lattice point but a point which lies between two horizontal

81

grid lines on a vertical grid line. In this case, the
y-coordinate of the center of each grid square shaded as
a result of the intersection will be located at the single
vertical level which is midway between the same two hori-
zontal grid lines that the intersection point "(n,y)" lies
between. This level is represented by either "Ly,+.5" or
"Fy1-.s" -- both of which have the same value Since "y"
is not an integer -- as the y-coordinate of the center of
a shaded grid square. So here the four coordinate combi-
nations generated for the terminal symbol "g" by rule (V)
results in only two distinct elements in the set, repre-
senting the two grid squares shaded as a result of the
non-lattice point intersection. The duplicate Specifi-
cations generated for each of these shaded grid squares --
due to the fact that "LXJ+'5" = "ry1-.s" -- are automa-
tically eliminated by the interpretation that rule (V)
generates a £33 of terminals. Therefore rule (V)
inherently includes both the "yes" and "no" exits from
block (L) in Figure 3.3 -- as signified, respectively, by
the value of the y-coordinate of the intersection being
an integer or not -- when it is applied, as well as
accomplishing the appropriate Operation in either block
(M) or block (N) of Figure 3.3.

The function of block (P) in Figure 3.3 is automa-
tically accomplished by the set union Operators "LV'which
liave been, and which will be, generated by any applica-

1:ion(s) of rules (I) - (a), (II), and/or (111) - (b), as

82

well as those which will also be generated by any appli-
cation(s) of rule (IV) - (b). The rules of group (IV),
and rule (VI), will now be discussed briefly.

The rules in group (IV) of the algorithm are used in
determining all horizontal grid line intersections of the
primitive, if any, corresponding to block (F) and a portion
of block (G) in Figure 3.2. Each of the rules (IV) - (a),
(IV) - (b), (IV) - (c), and (IV) - (d) is completely
analogous in both structure and application to the corres-
ponding rule in group (III) -- rule (III) - (a), (III) -
(b), (III) - (c), and (III) - (d), respectively. It is
however necessary to bear in mind that the rules in group
(IV) are interpreted with respect to horizontal grid
lines, while the rules in group (III) are interpreted with
respect to vertical grid lines. This means that the con-
cepts of "horizontal," "below," "lowest," "above,"
"highest," "x-intercept," and the like are used when
interpreting the rules in group (IV) in places analogous
to those where the concepts of "vertical," "left of,"
"leftmost," "right of," "rightmost," and "y-intercept,"
respectively, were used in interpreting the rules in group
(III). In particular, note that the symbol "Gh" is sugges-
tive of the terminology "intersection point on a horizon-
tal grid line," and that the coordinates generated for

"Gh" are based on the x-intercept (x-%-y, 0) of the

83

primitive relative to a coordinate system translated "n"
units along the 45° diagonal of the grid of resolution
"r".6
The analogy drawn between the rules in groups (III)
and (IV) of the algorithm is continued with a similar
analogy existing between rules (V) and (VI). Rule (VI) of
the algorithm is used to generate the shaded grid squares
which result from each intersection point of the primitive
with a horizontal grid line. Each application of rule (VI)
converts the coordinates "(x,n)" of "Oh" -- which repre-
sent the point of intersection of the primitive with a
horizontal grid line -- into a set of either four or two
specifications of shaded grid squares, depending respec—
tively upon whether or not the coordinates "(x,n)" repre-
sent a grid lattice point. The shaded grid Squares
generated will be symmetrically positioned above and below
the horizontal grid line on which the intersection point
lies. The coordinates of their centers will be deter—
mined using the method introduced in rule (V), with the
role of "y" in "Gv(n,y)" being assumed by "x" in "Gh(x,n)",

while the role of "n" remains the same except for its

 

6 If "m" = 0, then the only time it will appear in the
coordinates of the x-intercept -- or in the coordinates of
"Oh” -- is when the value of "y" has become identically

zero. In this case, the expression ”%-y" takes the mathe—

l

matically undefined form "U'O’" which should here be inter-

preted as the value "0". This is explained in detail in
Subsection 3.3.6.

84

appearance in rule (VI) in a different coordinate position
than in rule (V).

It should now be apparent that a flow chart, com-
pletely analogous to the one shown in Figure 3.3 for the
application(s) of the rules in groups (III) and (V) of the
algorithm, could be drawn to depict the application(s) of
the rules in groups (IV) and (VI) of the algorithm.
Because this flow chart would detail block (G) of Figure
3.2 in the same way that Figure 3.3 details block (B), it
has been omitted.

At this point we have described the interpretation
for each rule, and a sequence of application(s) for these
rules, in generating the encodement of one primitive in a
T-F scene view on the grid of resolution "r". The encode-
ment of each of the remaining primitives is accomplished
by returning to rule (I) - (a) to initiate the encodement
of the next primitive, and then by again repeating all
applications of the rules from that point on as already
discussed for the first primitive encoded. Note that rule
(I) - (a) will be applicable at the beginning of any such
repetition if and only if the first coordinate of "S" is
non-empty, signifying the "yes" exit from block (H) of
Figure 3.2. Thus the entire rule application process from
rule (I) - (a) on is repeated a total of "p" times
(including the initial time) to encode all the primitives
of a T-F scene view having "p" primitives, after which the

first coordinate of symbol "S" has been reduced to the

85

empty set "D". This signifies that no more primitives
remain to be encoded, and that rule (I) - (b) can now be
applied. This corresponds to the "no" exit from block (H)
of Figure 3.3, which terminates the algorithm.

The entire sequence just presented for applying the
rules of the algorithm is summarized by the flow chart
whose components appear in both Figures 3.6 and 3.7. It
should be reemphasized here that this sequential order of
application for the rules was introduced to enhance the
clarity of the logic of the algorithm, and does not repre—
sent the only order of application which could be used to
generate a set of terminals. In fact, any rule which is
applicable at any stage during a derivation with the
algorithm can be applied at that stage as the next step
of the algorithm. Any derivation so constructed which
converts the starting type "8" (including its prOperly
specified initial coordinates) into a set of terminal
symbols (with coordinates) and nothing else, has produced
the prOper encodement for the specified T-F scene view.
This is guaranteed by the structure of the rules of the
algorithm, which permits a set of terminals (with coor-
dinates) to be the complete result of a derivation only
when each rule in the algorithm has been consistently
applied with its intended interpretation as given earlier.
This fact is obvious when any non-terminal symbol,
including its coordinates, appears during a derivation and

there is exactly one rule of the algorithm which can be

86

Specify starting
type (0)

[Apply rule (I)-(a)J

 

<:g:> (from Figure
3.7)

 

 

[ Apply rule (II) 1

 

 

       
     
 
  
 
  

 

Is
rule (III)-(d)
:oplicable?

I e ..

Apply rule rule (III)-(c)
(V) applicable?

Yes Apply rule
(111)-(d) '

 

 

    

Yes Apply rule
(111)-(6)

 

 

 

    

 

 

 

 

 

 

NO
Apply rule Yes rule (III)-(b) Apply rule
(III)-(b) applicable? (V)
No
’JA 1 1 (to F1 -
I pp y rule (III)-(a) 8

 

ure 3.7)

Figure 3.6. Initial component of flow chart for "lin-
guistic" algorithm application.

87

o (from Figure 3.6)

 

 

 
 

 

rule (IV)-(d)
applicable?

   

Yes Apply rule .
(IV)-(d)

 
 

No

 

g I

[Apply rule

 
 

 

rule (IV)-(c)
applicable?

   

Yes Apply rule
(VI)

(Ivl-(c)

 

 
 

 

No

 
 

Apply rule
(VI)

 

Yes rule (IV)-(b)

applicable?

   

Apply rule
(IV)-(b)

 
 

 

 

 

O

 

 

iApply rule (IV)-(a)]

 

   
 

 

  

rule (1)-(b)

l_App1y rule (1)-(b) applicable?

 

ure 3.6)

Figure 3.7. Terminal component of flow chart for “lin-
guistic" algorithm application.

88

applied to that occurrence of that symbol. But the cases
where more than one rule can be applied to an occurrence
of a symbol in a derivation, and thus where a choice must
be made, need further explanation. These cases are
exhaustively enumerated for the algorithm as follows:
case(l): whenever rule (III) - (b) is applicable,
rule (III) - (a) is also applicable;
case(2): whenever rule (III) - (c) is applicable,
rule (III) - (a) is also applicable;
case(3): whenever rule (III) - (d) is applicable,
rule (III) - (a) is also applicable;
case(4): whenever rule (III) - (c) is applicable,
rule (III) - (b) is also applicable;
case(5): whenever rule (IV) — (b) is applicable,
rule (IV) - (a) is also applicable;
case(6): whenever rule (IV) - (c) is applicable,
rule (IV) - (a) is also applicable;
case(7): whenever rule (IV) - (d) is applicable,
rule (IV) - (a) is also applicable;
and case(8): whenever rule (IV) - (c) is applicable,
rule (IV) - (b) is also applicable.
For the application of the rules of the algorithm to be
consistent with their intended interpretations, the rule
first mentioned in each of the above eight case descrip-
tions should be the rule chosen for application whenever
that respective case arises in a derivation. The conse-

quence of choosing to apply rule (III) - (a) in either

89

case (1) or case (2) would be the generation of the non-
terminal symbol "V" with a first coordinate having a
negative value. The only rule subsequently applicable for
rewriting a "V” with a negative first coordinate would
again be rule (III) - (a), whose application would only
result in another appearance of the symbol "V" with some
negatively valued first coordinate. As this can only
continue, the non-terminal symbol "V" having some nega-
tively valued first coordinate will always be present in
any subsequently derived string. Similarly, the conse-
quence of choosing to apply rule (IV) - (a) in either
case (S) or case (6) would be the generation of the non-
terminal symbol "H", appearing with a negatively valued
first coordinate and persisting with some negatively
valued first coordinate in all subsequently derived
strings. The consequence of applying rule (III) - (a) in
case (3), or rule (III) - (b) in case (4), would be the
generation of the non-terminal symbol "V", appearing with
a negatively valued second coordinate and persisting with
some negatively valued second coordinate in all subse-
quently derived strings. Analogously, we finally have
the consequence of applying rule (IV) - (a) in case (7),
or rule (IV) - (b) in case (8), as the generation of the
non-terminal symbol "H", appearing with a negatively
valued second coordinate and persisting with some nega-
tively valued second coordinate in all subsequently

derived strings. So the conclusion in any of these eight

90

cases is that the choice and application of the "wrong"
rule guarantees the existence of a non-terminal symbol in
any string derived. Thus, the application of a rule of
the algorithm with other than its intended interpretation
at any point in a derivation precludes the possibility
that a set of terminal symbols will be the exclusive result
of that derivation.

As a result of the preceding extensive discussion,

the algorithm has been "proven" in the following sense:

Theorem 3.3

 

Consider a completely specified starting type
consisting of the terminal symbol "S" followed by a
pair of parentheses enclosing two coordinates sepa—
rated by a comma, where the first coordinate repre-
sents a T-F scene view "U" using its set notational
description relative to the standard screen, and
where the second coordinate represents the grid
resolution "r" at which "U" is to be encoded.
Additionally, let "T" denote the unique set of ter-
minal symbols (with coordinates) which represents
the encodement of "U" on the grid of resolution "r",
where each element of "T" consists of the symbol "g"
followed by two coordinates which specify the
abscissa and ordinate of the center of one of the

Shaded grid squares.

91

Then there exists some sequential application of
the rules of the algorithm (in which each rule may
appear -' consecutively or non-consecutively -- zero
or more times) which converts this completely speci-
fied starting type into exactly the set "T" and
nothing else. Further, any set of terminal symbols
(with coordinates) derived as the exclusive result
of applying the algorithm to the completely specified
starting type will be precisely the set "T".

3.3.6 Semantic Interpretation:
Two Special Cases

When the algorithm is applied to encode a T-F
scene view, situations can arise in which a Special inter-
pretation of coordinate values is required to insure that
the algorithm generates the proper encodement. Such a
situation occurs whenever a primitive of a T-F scene view
coincides along its entire length with one of the grid
lines. The interpretation problem results when conven-
tionally undefined mathematical expressions arise in
coordinates due to the "0" or "a," value of the lepe of
the primitive. Since these slope values occur if and only
if a primitive is either a horizontal or vertical line
segment in a T-F scene view, the consideration of all
special coordinate interpretations can be included in two

cases -- the case where a primitive is a vertical line

92

segment, and the case where a primitive is a horizontal
line segment. We first consider the vertical primitive
case.

In the case where a primitive is a vertical line
segment, it is parallel to the vertical grid lines and
hence the coordinates "a" and ”c" of symbol "L" in rule
(II) of the algorithm are equal in value. This implies
that the Sixth coordinate generated by rule (II) for
symbol "V" (and also for symbol "H") will have the form
"962", where the value of "d-b" is non-zero since every
scene primitive must have finite length. Since this sixth
coordinate of "V" (and also of "H") represents the slope
of the primitive, the expression "$62” is interpreted as
"<3". Now a vertical primitive lies either strictly
between two consecutive vertical grid lines, or directly
on a vertical grid line. If the primitive lies strictly
between two consecutive vertical grid lines, then rule
(II) of the algorithm generates the symbol "V" with the
value of its first coordinate exceeding the value of its
second coordinate by "1". This means that rule (III) -
(d) will be applicable after the zero or more applications
of rule (III) - (a) required to reduce the second coor-
dinate of "V" to zero. Hence rules (III) - (b) and (III)
- (c) will never be used when encoding this primitive,
eliminating any concern about the factor "<n" entering

into the coordinate expressions of "CV" which involve the

factor "m" in the numerator. On the other hand, if the

93

primitive lies directly on a vertical grid line, then rule
(II) of the algorithm generates the symbol "V" with its
first, second, and fourth coordinates all having the same
value "a", which represents the x-coordinate value along
the vertical grid line on which the primitive lies. In
this case, rule (III) - (c) becomes applicable after "a"
applications of rule (III) - (a), which has consequentially
reduced the fourth coordinate "x" of "V" to the value zero.
The symbol "CV" is then generated, with the expression
"m-x" having the form "<n-0" in its y-coordinate. By
interpreting this conventionally undefined eXpression as
having the value "0", the coordinates of "CV" will be the
coordinates of one of endpoints of the primitive on the
vertical grid line. This then specifies one point which
is taken as the only intersection point of the primitive
with the vertical grid line which generates shaded grid
squares. The grid Squares shaded as a result of this
point insure the prOper encodement if the primitive lies
strictly between two consecutive horizontal grid lines,
and will merely be redundant if the primitive crosses one
or more horizontal grid lines (i.e., the shaded grid
squares generated due to the intersections of the primi-
tive with one or more horizontal grid lines will include
those already generated as a result of the Single point

of intersection with a vertical grid line as defined above
for the primitive). The only additional place where "m"

is involved in an expression which needs to be interpreted

94

is in the x-coordinate of symbol "Gh". In this case,
however, "m" always appears in the denominator, so the
value of "a>" for "m" causes no problems of interpretation.
In fact, since the value of "y" in the x-coordinate of
"Gh" will always be finite, the expression "%-y" will
always have the value "0" when interpreted with "m" = "<n".
An analogous discussion holds for the case where a
primitive is a horizontal line segment. Here the primitive
is parallel to the horizontal grid lines, and hence the
coordinates "b" and "d" of symbol "L" in rule (II) of the

algorithm are equal in value. This implies that the sixth

coordinate generated by rule (II) for symbol "H" (and also

H H

for symbol "V") will have the form 375

, where the value
of "c-a" is non-zero Since every primitive must have
finite length. Thus the sixth coordinate of "V" (and also
of "H"), which is the slope of the primitive, has the
value "0". Now a horizontal primitive lies either
strictly between two consecutive horizontal grid lines, or
directly on a horizontal grid line. If the primitive lies
strictly between two consecutive horizontal grid lines,
then rule (II) of the algorithm generates the symbol "H"
with the value of its first coordinate exceeding the value
of its second coordinate by "1". This means that rule
(IV) - (d) will be applicable after the zero or more
applications of rule (IV) - (a) required to reduce the

second coordinate of "H" to zero. Hence rules (IV) — (b)

and (IV) - (c) will never be used when encoding this

95

primitive, eliminating any concern about the factor "0"
entering into the coordinate expressions of "Gh" which
involve the factor "m" in the denominator. On the other
hand, if the primitive lies directly on a horizontal grid
line, then rule (11) of the algorithm generates the symbol
"H" with its first, second, and fifth coordinates all
having the same value "b", which represents the y-coor-
dinate value along the horizontal grid line on which the
primitive lies. In this case, rule (IV) - (c) becomes
applicable after "b" applications of rule (IV) - (a),
which has consequentially reduced the fifth coordinate "y"
of "H" to the value zero. The symbol "Gh" is then gener-
ated, with the expression "%.y" taking the form "%-0" in
its x—coordinate. By interpreting this conventionally
undefined eXpreSSion as having the value "0", the coor-
dinates of "Gh" will be the coordinates of one of the
endpoints of the primitive on the horizontal grid line.
This then specifies one point which is taken as the only
intersection point of the primitive with the horizontal
grid line which generates shaded grid squares. The grid
squares shaded as a result of this point insure the pro-
per encodement if the primitive lies strictly between two
consecutive vertical grid lines, and will merely be redun-
dant if the primitive crosses one or more vertical grid
lines. The only additional place where "m" is involved

in an expression which needs to be interpreted is in the

y-coordinate of the symbol "Gv". In this case, however,

96

"m" always appears in the numerator with a finite value of
"x", so that the eXpression ”m-x" has the conventional
interpretation of ”0" when "m" = "0".

In summary, the two cases which require special
semantic interpretations for the values of coordinates
are: (1) when the expression "<n-0" appears in the
y-coordinate of "CV" as a result of encoding a primitive
with infinite lepe lying directly on a vertical grid line;
and (2) when the expression "%-0" appears in the x-coordi-
nate of "Gh" as a result of encoding a primitive with
zero lepe lying directly on a horizontal grid line. Both
of the expressions "a>10" and "%~0" are conventionally
mathematically undefined, but are both to be interpreted
here as representing the numerical value "0" when they
arise during application of the algorithm, which will
insure prOper encodements. These are the only two Special
coordinate value interpretations needed in any derivation
which results exclusively in a set of terminals. If either
of the two mathematically undefined expressions occur for
any other reason than the ones given in (l) and (2) above,
or if any of the coordinates of any occurrence of "CV" or
"Gh" have an interpreted value of "cm", then at least one
rule of the algorithm has previously been applied contrary
to its intended interpretation. Such an occurrence Signi-

fies that the derivation will not have a set of terminals

as its exclusive result.

97

3.3.7 Grid Extension

When the algorithm is used to encode a T-F scene
view which has at least one of its primitives touching a
side border of the standard screen, the encodement
generated by the algorithm will include one or more ter-
minal symbols "g" having coordinates which Specify a point
lying outside the region of the encodement grid superim-
posed on the standard screen. This is due to the fact that
the touching point of a primitive on a border of the stan-
dard screen is also an intersection point of that primitive
with either a horizontal or vertical border line of the
encodement grid. Hence, the algorithm (through rule (V)
or (VI)) generates terminals whose coordinates represent
the centers of shaded grid squares on both sides of that
grid border line, although the grid has not yet been
defined on one of those Sides. In such a situation, then,
the problem is that grids which are confined to the area
of the standard screen, and whose borders must be aligned
with those of the standard screen, are not adequate to
represent the complete encodement of the T-F scene view.

A solution is provided by symmetrically extending the
overall Size of an encodement grid through an increase in
the total number of squares in the grid, while maintaining
the Size of the individual grid squares and the alignment
of the original grid (now symmetrically embedded in the
"extended" grid) with the standard screen. This extension

is accomplished for the grid of arbitrary resolution r --

98

as defined in Definition 2.7 -- by merely appending an
additional grid square (of the same Size as those already
in the grid) along all sides and corners of the grid
border. Note that this puts the borders of the standard
screen strictly within the borders of the extended grid.
Hence, any T-F Scene view appearing on the standard
screen -- even a view which touches the border of the
standard screen -- will have its complete encodement con-
tained entirely within the borders of the extended grid of
any resolution r.

The concept of an "extended grid” is formalized as

follows:

Definition 3.2

 

The extended grid of resolution r is the grid

 

consisting of (r+2)2 identical squares, formed by
extending the grid of resolution r "1" grid unit (at
resolution r) on all four sides (left, right, Up, and
down) and diagonally at all four corners.

The coordinate system of the extended grid of
resolution r is formed by symmetrically extending
both the x- and y-axes of the grid of resolution r by
two grid units -- the x-axis being extended one grid
unit in both the positive and negative x directions,
and the y-axis being extended one grid unit in both
the positive and negative y directions. The origin

of the coordinate system is maintained in its

99

location before the extension, so that the origin
appears at the lattice point located 1 grid unit
right and 1 grid unit up from the lower left corner
of the extended grid. Thus the coordinate axes
labels for the extended grid of resolution r run

from "-1" to "r+l".

Note that this definition insures that the grid and
extended grid of resolution r both have the same size grid
squares, and that the origins of their coordinate systems
coincide. This preserves the definition of resolution for
grids, and insures that all the results obtained in
Section 3.2 -- which characterize the effects of grid
resolution changes on T-F scene view descriptions -- are
directly applicable for extended grids. Hence, it is no
loss of generality to interpret an encodement generated

by the algorithm on the apprOpriate extended grid. And
such an interpretation will insure that the coordinates of
all terminal symbols "g" generated by the algorithm will
be within the coordinate system of the extended grid.

Some examples of finite extended grids with different
resolutions defined relative to the same standard screen
are shown in Figure 3.8. Note that these extended grids,
unlike the corresponding (non-extended) grids discussed in
Section 2.4, are not all of the same absolute size. How-
ever, the grid of resolution r is embedded in the corres-

ponding extended grid of the same resolution, with its

 

 

 

 

00 1

(a) The Standard Screen

(Grid of Resolution 1)

 

 

 

0

 

 

 

 

 

 

-1
-1 0 l 2 3

 

(c) The Extended Grid
of Resolution 2.

100

 

 

 

 

 

 

 

-1
-l 0 1 2

 

(b) The Extended Grid of Reso-
lution 1.

+5
+4
+3
+2
+1

0

’11 o

  

(d) The Extended Grid
of Resolution 4.

Figure 3.8. Examples of extended grids.

101

borders located "1" grid unit in from the correSponding
borders of the extended grid. Further, bear in mind that
the x and y coordinate axes are located "1" grid unit in
from the bottom and left borders, respectively, of the
extended grid. This is reflected by the coordinate
labeling along both the bottom and left borders of the
extended grids in Figure 3.8, which puts the origin of
the coordinate system one grid unit up and right from the
lower left corner.

Because the coordinate systems of both the grid of
resolution r and the extended grid of resolution r are
identical on the standard screen, no confusion should
result if a grid is considered to be an extended grid of
the same resolution whenever necessary or appropriate.
Hence, we will hereafter use the terms "grid" and "extended
grid" interchangeably, unless designated otherwise.

3.4 A1 orithmic Encodement
o SnapShotS

 

 

The "linguistic" algorithm (so named because of
its resemblance in both structure and application to for-
mal grammars) of Section 3.3 can be considered to generate
snapshots -- instead of encodements consisting of shaded
grid squares in specific locations on a grid -- if the
coordinate values of the terminals are interpreted in the
proper manner. Recall from Chapter 2 that although a
snapshot is defined with respect to some specified grid

resolution, it is only the structural form (including the

102

overall orientation) of the entire pattern of shaded grid
squares which defines the snapshot, irrespective of the
relative translational position of that pattern on the
grid. Thus, if a snapshot is completely represented
(relative to some designated grid resolution) by a set of
symbols, each of which has a unique pair of coordinates
Specifying the center of one of the shaded grid Squares in
the snapshot, then the absolute values of these coordi-
nates are of no consequence with regard to their determi-
nation of the location of the pattern on the grid; it is
only the relative positional relationships (location and
orientation) existing between these coordinates which
characterize the snapshot. This implies that any set of
terminals (with coordinates) generated by the algorithm
can be considered to represent a snapshot relative to the
encodement resolution, under the interpretation that the
coordinates of the terminal symbols only have positional
significance relative to each other.

As a consequence of the preceding remarks, we note
that distinct grid encodements (at the same resolution)
which have the same relative positional relationships
existing between their elements represent the same snap-
shot. A characterization of these equivalent encodement
representations for a snapshot will be presented in
Theorem 3.4, using the notation defined next.

Let "T" be a set of terminal symbols (with coordi-

nates) as follows:

103

T={g(x1.y1).g(xz.y2).....g(xq.yq)}.

where q is a non-negative integer. Also, let "f" denote a
coordinate transformation. Then we use the notation

"f(T)” to denote the set

{g(x;.yi).g(X;.y;).-...g(x;.y;)}.

| I
where (xi,yi) = f(xi,yi) for 1=l,2,...,q.

Theorem 3.4

 

Let T1 and T2 be two sets of terminal symbols
(with coordinates), each of which is the exclusive
result of a derivation using the "linguistic" algor-
ithm of Section 3.3 with the same encodement reso-
lution.
Then, T1 and T2 are equivalent encodement repre-
sentations for the same snapshot (at the encodement
resolution) if and only if
f(T1) = T2

for some coordinate transformation "f" of the form
f(X.y) = (x+a.y+b).

where "a" and "b" are integer constants (either of

which may be positive, zero, or negative).

Proof
The proof is based on the fact that the form
specified for coordinate transformation "f" in the

theorem defines "f" as a translation. Therefore, the

104

application of "f" to any set of coordinates "T"
merely redefines the absolute location of all the
points represented by the coordinates in "T". The
result f(T) is a set having the same number of ele-
ments as set "T", and all relative positional
relationships (location and orientation) existing
between the elements of "T" are maintained between
the corresponding elements of f(T).

Now consider the application of f to the encode-
ment T1. The x-coordinate of each terminal symbol
in T1 will be consistently relocated by "a" units,
while the y-coordinate of each terminal symbol in T1
will be consistently relocated by "b" units. Hence
the result f(Tl) is a set consisting of the same
number of terminals as T1, with f defining a 1-1
correspondence between these terminals. Further,
the relative positional location existing between the
coordinates of any two terminals in encodement T1 is
maintained between the translated coordinates of the
corresponding terminals in f(Tl). But the coordi-
nates of the terminals in f(Tl) represent the centers
of grid squares on the encodement grid, Since they
are derived by integer-valued translations -- through
integers "a" and "b" of f -- of the coordinates of the
elements in encodement T1. Thus the set f(Tl) can be
interpreted as representing a set of shaded grid

squares in an encodement, in which case the relative

105

positional relationships existing between any two
shaded grid squares in encodement T1 is the same as
the relative positional relationships which exist
between the two corresponding shaded grid squares in

encodement f(Tl). This means that T and f(Tl) are

1
equivalent encodement representations for the same
snapshot (at the encodement resolution).

Now if f(Tl) = T2, then obviously f(Tl) and T2
are equivalent encodement representations for the
same snapshot, since they are identical. And since
we have already Shown that T1 and f(Tl) are equivalent
encodement representations for the same snapshot, it
follows that T1 and T2 are equivalent encodement
representations of the same snapshot (at the encode-
ment resolution).

Conversely, assume that T1 and T2 are equivalent
encodement representations for the same snapshot
(at the encodement resolution). Then T1 and T2 have

the same number of elements7, and a 1-1 correspondence

must exist between their elements such that the

 

7 This is a necessary condition for two encodements
to be equivalent representations for the same snapshot.
For if the number of elements in encodements T1 and T2

were different, then one would represent a pattern con—
taining more shaded grid squares than the other. It would
then follow that the two shaded grid square patterns
represented could not be the same snapshot.

106

relative positional relationships existing between any
two Shaded grid squares in encodement T1 also exists
between the two corresponding shaded grid squares in
encodement T2. Let "f." denote the 1-1 transformation
from the elements of T1 to the elements of T2 which
realizes this l-l correspondence, so that f'(T1) = T2.
Then f. must be a coordinate transformation which
preserves in f'(T1) the relative positional relation-
ships existing between the elements of T1, implying
that f' is a translation. Further, since f'(T1) = T2
and T2 is an encodement, the coordinates of all the
terminal symbols in f'(T1) must represent the centers
of grid squares on the encodement grid. Similarly,
Since T1 is an encodement, the coordinates of all the
terminal symbols in T1 also represent the centers of
grid squares on the encodement grid. It thus follows
that the translation increments of f. must be integer
values for both the x-coordinate and y-coordinate
relocations. Hence f1 is a translation of the same
form as f, with f'(T1) = T2.
Q.E.D.

The intuitive interpretation of Theorem 3.4 is that
two T-F scene view encodements (on the same resolution
grid) represent the same snapshot whenever the entire pat-
tern of shaded grid squares defined by one of the encode-

ments can be translated to coincide precisely with the

107

entire pattern of shaded grid squares defined by the other
encodement. This suggests the existence of an effective
procedure for determining whetheror not two encodements
represent the same snapshot, and the construction of such

a procedure appears as the proof of the next theorem.

Theorem 3.5

 

There exists an effective procedure for deter-
mining whether or not two T—F scene view encodements
(relative to the same resolution grid) represent the

same snapshot at the encodement resolution.

Proof
Let T1 and T2 represent two encodements relative
to the same resolution grid, where (using the notation

of the "linguistic" algorithm)

T1 {g(x1,19Y1’1):g(x1,zeyl,2)9---ag(x1,p:yl,p)}
and

T

2 ' {g(x2,1.y2,1).g(xz,2.y2,2)..-..g(x2,q.y2,q)h
with p and q non-negative integers. Using Theorem
3.4, it suffices to show that one can effectively
determine whether or not there exists a translation
f having the form defined in Theorem 3.4 for which
f(Tl) = T2. A procedure for accomplishing this is
now described.

First compare the values of p and q, which

represent the number of elements in T1 and T2,

108

respectively. If p f q, then T1 and T2 cannot be
equivalent encodement representations for the same
snapshot (see Footnote 7), and we terminate this
procedure with that conclusion. On the other hand if
p = q, then compute the values of "a" and "b" as
follows:

a = min{x2,1,xz’z,...,x2’q}

- min{x1’1,x1’z,...,x1’q}

and

b = min{y2,1,y2,2,...,y2’q}

- min{y1,1,y1’2,...,y1,q}.

Note that "a" represents the horizontal distance and
direction (relative to T1) that exists between the
leftmost shaded grid squares of T1 and T2, while "b"
represents the vertical distance and direction
(relative to T1) that exists between the lowest
shaded grid squares of T1 and T2. Further, both "a"
and "b" are integers, since each of the x- or y-coor-
dinates of any of the terminals in encodements T1
and T2 represents the middle line between two conse-
cutive grid lines, making the difference between any
two x-coordinates or any two y-coordinates an integer.
So if T1 and T2 are encodements which represent the
same snapshot, then the values of "a" and "b" repre-
sent the integer valued x and y distances, respec-

tively, through which T1 must be translated on the

109

encodement grid to coincide with T2. Thus, we can
define translation "f" by

f(X.y) = (x+a.y+b).
and determine f(Tl). Then if f(Tl) = T2, we conclude
(using Theorem 3.4) that T1 and T2 are equivalent
representations for the same snapshot at the encode-
ment resolution. On the other hand, if f(Tl) # T2,
we conclude (again using Theorem 3.4) that T1 and T2
are not equivalent encodement representations for the
same snapshot. In either case, the procedure is
terminated.

Q.E.D.

Both theorems 3.4 and 3.5 state results which will be
useful in examining the characterization of scene views
through the snapshots they generate. We turn to such an

examination next in Chapter 4.

Chapter 4

SCENE VIEW RECOGNITION WITH

SNAPSHOT SIGNATURES

4.1 Scene View Reco nition:
General Considerations

 

This chapter will deal with the problem of the
recognition of scene views from their encodements. The
problem stems from the fact that the encodement of a scene
view on a finite grid does not necessarily preserve the
full structural detail which characterizes that scene
view. It is apparent that grids of coarse resolution
(lower resolution values) are less capable of faithfully
encoding detail in a scene view than grids of fine reso-
lution (higher resolution values) would be. Hence the
problem becomes one of identifying a scene view from its
approximate representation on an encodement grid of finite
resolution, where the degree of difficulty in making such
an identification is directly related to the relative
coarseness of the grid.

The scene view recognition problem will be examined
in the context of determining which of a finite number of
known candidate scene views is represented by a given

encodement (or encodements). Hence, we assume that a

110

111

finite number of prototype scene views are given, along
with one or more encodements of a scene view which is an
unknown one of the prototypes. The problem then is to
identify which one of the prototype scene views generated
the encodement (or encodements). Practically, an exact
identification may not always be possible, in which case
the goal is to keep the probability of making an identi-
fication error small.

We shall assume that any encodement of an unknown
scene view results from some random positioning of an
encodement grid over that scene view. This is a realistic
assumption in many practical applications. For example,
consider a situation in which a remote sensing "camera"
is arbitrarily positioned (without tilting) to record the
"picture" of some unknown object currently in view, after
which the recorded "picture" is transmitted elsewhere for
processing. In this case, the encodement of the unknown
scene view is analogous to the ”picture" of the unknown
object transmitted from the camera. Further, the reso-
lution of the encodement grid is analogous to the reso-
lution with which the camera records the picture. Finally,
the random positioning of the encodement grid over the
unknown scene view corresponds to the arbitrary posi-
tioning of the camera before it records a picture of the
unknown object.

The consequence of assuming the random positioning of

an encodement grid over a scene view is that the scene

112

view can generate, and thus be associated with, two or more
snapshots. This implies that the characterization of a
scene view in terms of the encodements it can generate at

a specified grid resolution must be based on more than one
snapshot. This will become more evident in the discussion

and examples of the next section.

4.2 Snapshot Sets

It was noted in Chapter 2 that the encodement
which results from a grid placed randomly over a scene
view is dependent upon, among other things, the relative
translational position existing between the grid and scene
view. This became even more apparent in Chapter 3, where
the algorithm for generating the encodement for a scene
view was dependent upon the translationally-fixed ("T-F")
position of that scene view with respect to the encodement
grid. Since we now wish to consider the encodement which
results from a random positioning of a grid over a scene
view, it might appear that the encodements produced by all
possible T-F positionings of the scene view relative to
the encodement grid would need to be examined. Fortunately
however, we are interested only in the distinct snapshots
which are represented by all these encodements. In this
case, due to the symmetry of the encodement grid, any two
T-F scene views which represent integer valued trans-
lations -- both horizontally and vertically -- of each

other on the encodement grid will generate encodements

113

which represent the same snapshot. This observation is
rephrased in Theorem 4.1 in terms of a restricted set of
encodement grid translations which will produce all possi-
ble snapshots of a scene view on that grid. The proof will
utilize the following notation. Let "D" represent the set
notational description of a T-F scene view relative to
some encodement grid, where
{(al,b1),(c1,d1)},{(a2,b2),(c2,d2)},
...,{(ap,bp),(cp,dp)}
with p a positive, non-zero integer. Also let "f" denote
a two-dimensional coordinate transformation. Then the
notation "f(D)" denotes the set
{f(a1,b1),f(c1,d1)},{f(az,b2),f(c2,d2)},
...,{f(ap,bp),f(cp,dp)}

Theorem 4.1

 

All possible snapshots of a scene view on an
(extended) encodement grid are represented by the
encodements which result from all possible trans-
lations of the encodement grid which move the scene
view no more than % grid unit, in both the horizontal
and vertical directions, away from its (arbitrary)

initial T-F position on that encodement grid.

Proof
Let "Dr" denote the set notational description

of some T-F scene view in its initial position

114

relative to an (extended) encodement grid of reso-
lution r. Further, let "M" be a set of coordinate
translations as follows:

M={f I f(x,y)=(x+a,y+b), with a, b e [-.5,.5]},
where

[-.5,.5] = {z I —.5 i z i .5}.

Then the set

V = {f(Dr) | f e M}
denotes all the T-F scene views which result when
the encodement grid is translated such that the scene
view is kept % grid unit or less, in both the hori-
zontal and vertical directions, from its initial T-F
position on that encodement grid.

Now consider some translation of the encodement
grid which moves the scene view more than % grid unit
either horizontally or vertically (or both) away from
its initial position on the grid. Then the resulting
T-F scene view is given by f'(Dr), where f1 is a
coordinate translation of the form

f'cx.y) = (x+a',y+b').
where either a1 t [-.5,.5] or b1 t [-.5,.5], or both.
'Hence f'(Dr) t V. But note that

I I I I
a =a -0=a -L§

+-_51+12 +-.5_1
I I I
=(a ‘13. +-§1)+ 1a +-.§1 .
where the quantity in parentheses denotes some value

I
in [-.5,.5], and where (a +.§, denotes an integer.

115

Similarly,

b'=(b'-m'+._51) + Lh'+._51 .
where the quantity in parentheses denotes some value
in [-.5,.S], and where Ib'+.§J denotes an integer.

Let f1 denote the translation

flcx.y)=(x+a'- e'+.§. .y+b'- m «.51).
and f2 denote the translation

fz(X9Y)=(X+L§'+-§lay+lD'+-§J)-
Then f'(Dr) = f2(f1(Dr)). But f1(Dr) c v, and
f2(f1(Dr)) -- which is f'(Dr) -- denotes the trans-
lation of T-F scene view f1(Dr) some integer number
of grid units in both the horizontal and vertical
directions on the encodement grid. Such a translation
(i.e., f2) guarantees that corresponding points
(under £2) along the primitives in both f1(Dr) and
f'(Dr) will have the same relative positional
location with respect to the pattern formed by the
vertical and horizontal lines of the encodement grid.
In particular, the intersection points with grid
lines of the primitives in fl(Dr) will be mapped
(under f2) exactly onto the intersection points with
grid lines of primitives in f'(Dr), with the corres-
ponding points of intersection of the primitives in
f1(Dr) and f'(Dr) appearing on the same type of grid
line (vertical or horizontal) at the same relative
distance (and in the same relative direction) from

the grid line of Opposite type which is nearest each

116

respective point. Thus the corresponding points of
intersection with grid lines of the scene primitives
in fl(Dr) and f'(Dr) cause precisely the same pattern
of grid squares to be shaded. Further, since f1(Dr)
and f'(Dr) are related by a translation (i.e., f2),
the relative positional relationships existing i
between corresponding points of intersection of the
scene primitives with grid lines will be the same in
both f1(Dr) and f'(Dr). Hence, if T1 and T. repre-
sent the encodements (as a set of terminals with
coordinates which denote the centers of the shaded
grid squares on the encodement grid, as in Chapter 3)
produced by T-F scene views fl(Dr) and f'(Dr),
respectively, then f2(T1) = T'. Hence, by Theorem
3.4, it follows that T1 and T. are equivalent encode-
ment representations for the same snapshot on the
grid of resolution r.

Thus, the theorem has been proved, Since
translation f' was arbitrary.

Q.E.D.

Note that the limited range of translations of the
encodement grid Specified by Theorem 4.1 as being suffi-
cient to generate all the possible snapshots for a scene
view also insures that every such snapshot can be entirely
represented within the borders of the extended grid of the

encodement resolution. This follows because any scene,

117

and hence any scene view, can be contained entirely within
the borders of the standard screen. Letting some T-F
position of a Scene view on the standard screen define the
initial T-F position of that scene view relative to the
(extended) encodement grid will then insure that any point
on any primitive of the scene view is at least 1 grid unit
(at resolution r) away from the border of the extended
grid of arbitrary resolution r. Hence, any translation
of the encodement grid from this initial position which
has its horizontal and vertical components of motion
limited to % grid unit (at the encodement resolution) will
produce only T-F scene views which, at any point, are at
least % grid unit away from any border of the extended
grid of the encodement resolution. Thus, since no border
line of this extended grid is touched, the entire encode-
ment -- and hence the snapshot it represents -- lies
within its borders.

The set of all the distinct snapshots which a scene
view can generate on an encodement grid can be considered
as a collective representation of that scene view relative

to that grid. This suggests the following terminology.

Definition 4.1

 

A snapshot set of a scene view, relative to an

 

encodement resolution r, is the collection of all

snapshots which can result by encoding that scene

118

view (in different relative translational posi-

tionings) on the extended grid of resolution r.

The snapshot set of a scene view relative to any finite
encodement resolution r will itself be finite, since the
maximum number of binary encodements -- each representing a
(not necessarily distinct) snapshot -- which can be formed
by the grid squares of the extended grid of resolution r is
2(r+2)2. Of course, not all of these possible snapshots
will appear in the snapshot set of a particular scene view.
In fact, the number of distinct snapshots in a snapshot set
of a scene view can be relatively small, as illustrated in
Figure 4.1. The shaded grid square patterns labeled (a)
through (g) in Figure 4.1 represent the seven snapshots in
the snapshot set of an isosceles triangle relative to an
encodement grid whose resolution makes the base and
altitude of the triangle 2 grid units and 1 grid unit,
respectively. Similarly, the Shaded grid square patterns
labeled (1) through (IV) in Figure 4.1 represent the four
snapshots in the snapshot set of a square relative to an
encodement grid whose resolution makes the side length of
the square 2 grid units. The appearance of the figure
itself (triangle or square) in each of the snapshots is
intended to illustrate its fixed orientation throughout

the encodement process. Also, the relative translational
positioning existing between the figure and the grid lines

shown in each snapshot illustrates how that snapshot might

119

 

 

ﬁﬁﬁ
mmMﬂmm
mmﬁMmm
ﬁﬁﬁ

(II)

(m ﬁﬁ
%
ﬁﬁﬁ

ﬁﬁﬁ

@ﬁﬁﬁ
ﬁﬁﬁﬂ

(III)

Egg
kmﬁ

(d)

 

were
aﬁm .
see

(IV

Two examples of snapshot sets.

Figure 4.1.

120

be generated in grid encodements of that figure. However,
remember that the figure itself is not part of the snap-
shot -- only the shaded grid squares are.

For scene views having a more complex structure than
those just considered in Figure 4.1, the determination of a
snapshot set -- especially with respect to high encodement
resolutions -- can be a formidable task. However, a prac-
tical method exists for approximating a snapshot set in
such cases. Essentially, the method consists of applying
the "linguistic" algorithm of Chapter 3 a finite number of
times to produce the T-F scene view encodements which
result from randomly chosen relative translational
positionings between the encodement grid and scene view.
The random sample of encodements which results for that
scene view can then be examined using the effective proce-
dure given in the proof of Theorem 3.5 to eliminate any
duplicate specifications of the same snapshot within the
sample. The resulting collection of encodements, inter-
preted as a collection of distinct snapshots, then repre-
sents an approximation for the snapshot set of the scene
view relative to the resolution of the encodement grid.

A method for choosing the random relative trans-
lational positionings between the encodement grid and
scene view in the snapshot set approximation procedure will
now be explained. Let "v" represent the scene view whose
snapshot set is to be approximated, and let "v0" denote

some T-F position of ”v" whose set notational description

121

relative to the standard screen is denoted by "D". Con-
sider the set "V" given by
v = {f(D) I f(x.y)=(x+a,y+b) and a.be[-7%.I%1},

where [-7%,7%] = {Z I -7% < z i 7%}, and r is a positive,

non-zero integer. Each of the elements of V is the set
notational description of a T-F position of ”v" relative
to the standard screen, produced by some translation which
moves "v" away from its position at "v0" no more than 2%
grid units, in both the horizontal and vertical directions,
on the standard screen. But since a distance of 7% grid
units on the standard screen corresponds to a distance of
% grid unit on the (extended) grid of resolution r, each
of the elements of V can be interpreted as the set nota-
tional description of a T-F position of "v" relative to

the standard screen in which "v" has been translated away

from its position at "v " no more than % grid units, in

0
both the horizontal and vertical directions, on the
(extended) grid of resolution r. Hence, it follows from
Theorem 4.1 that all possible snapshots of "v" relative
to resolution r are generated by encoding all the T-F
scene views in V on the extended grid of resolution r.
But V is an infinite set, so it is not practically pos-
sible to individually encode each of the T-F scene views
in V. However, any two randomly and independently chosen

T-F scene views in "V" have the same probability of

encoding any given snapshot of "v". Hence, it seems

122

reasonable to expect that the snapshot set of "v" can be
usefully approximated by the collection of snapshots which
results from encoding all the T-F scene views in some ran-
domly chosen finite subset V' of V, provided that the
number of elements in V' is sufficiently large.1 Note
that a (non-unique) set V' of n elements can be formed
from V by independently choosing each v'eV' as follows:
(1) randomly select two independent real values "a" and
"b" from the interval [-7%,7%], and then (2) determine v'
as f(D), where f is the coordinate translation defined by
f(x,y) = (x+a,y+b). The elements of V' can then be
individually encoded at resolution r by applying the
"linguistic" algorithm of Chapter 3 "n" different times.
It should be apparent that the randomness in the
snapshot set approximation procedure just described can be
incorporated directly within the structure of the "lin-
guistic" algorithm of Chapter 3 by appropriately modifying
its "productions". One such modification will be pre-
sented which keeps the same starting type S, but introduces

an additional non-terminal symbol "8'". Additionally, the

 

1 The number of elements chosen for V' should cer-
tainly depend on the relative complexity of scene view
"v" -- the more complex the structure of "v", the larger
the set V' should be. Further, if the set V' is large
enough, the only snapshots of "v" which are likely to be
omitted by the snapshot set approximation computed from V'
would be those which have a low probability of being the
encodement result from a random translational positioning
of the encodement grid over "v". These occurrence pro-
babilities for snapshots are discussed in Section 4.3.

 

123

symbol "ran” will appear in the coordinates to denote a
randomly chosen real value from the set {zloizil}, where
that value is chosen independently for each occurrence of
the symbol "ran". The symbols "r1" and "r2" are used to
represent two independently generated random real values
between 0 and 1. Now consider the following group of

three rules:

(1)-(er 8({L1.L .Lp}.r) +

2,...

S'({L1,L2,...,Lp},r,ran,ran) ,

(I)-(b)' S'({L1,L2,...,L },r,r1,r2) +

S'({L1L2,...,Lk_1},r,r1,r2)

LJL(r a k+r1- 7,r rb k+r2- 7,r c k+rl— 7,r rd k+r2- -%)
(1)-(C)! S'Cparsrlsrz) + 9

Modify the "linguistic” algorithm of Chapter 3 by replacing
rule (I) - (a) and (I) - (b) with the rules (I) - (a)',

(I) — (b)', and (I) - (c)'. Then for some scene view "v",
let the first coordinate of starting symbol "S” represent
the set notational description for some T-F position "v0"
of "v" on the standard screen. The second coordinate of
starting symbol "S" should be the desired encodement reso-
lution for "v". Then any set of terminals which is the

exclusive result of a derivation using the modified

algorithm represents the snapshot which results from some

124

random translation of "v" within % grid units, both hori-
zontally and vertically, from its initial position "v0" on
the encodement grid. Note that this random translation of
"v" is done independently, but consistently, for each of
its primitives by the coordinates of symbol "L" in rule
(1)-(b)'-

Suppose a Simple or composite scene view is divided
into several component parts, and the snapshot sets for
each of these component parts then determined separately
on the encodement grid of resolution r. Contrary to what
might be expected, the snapshot set of the total scene
view relative to resolution r is not necessarily repre-
sented by the set of all possible combinations which can
be formed by taking one element from each of the snapshot
sets of its component parts. This is illustrated with
Figures 4.1 and 4.2. In Figure 4.1 the snapshot sets
separately determined for each of two object views are
represented -- the 7 snapshots of the triangle define its
snapshot set, while the 4 snapshots of the square define
its snapshot set. This represents 28 possible combinations
of two snapshots (one taken from each snapshot set), all
of which might be expected to appear in some composite
scene view containing both of the objects. However,
Figure 4.2 illustrates the same triangle and square having
a fixed positioning relative to each other, thus forming a
composite scene view of the two objects which has only the

7 (not 28) snapshots Shown in Figure 4.2 in its snapshot

125

Bottoms of square and triangle are
11::Eb at the same vertical level, with the

square to the right of the triangle.
The objects are separated horizontally

by 3 grid units at the encodement
resolution.

Composite Scene View
(with objects from
Figure 4.1)

 

 

 

(e) - (IV) (f) - (IV)

 

(s) - (IV)

Figure 4.2. A composite scene view and its snapshot set.

126

set relative to the encodement grid. The labeling of each
snapshot of the composite scene in Figure 4.2 reflects the
two snapshots from Figure 4.1 -— one from each object -—
which combined to form that snapshot of the composite scene
view. Figure 4.3 illustrates that the snapshot set of a
composite scene is dependent upon the relative positioning
between its component objects. The triangle and square of
Figure 4.1 have a fixed position relative to each other in
Figure 4.3 which is different from their positioning in
Figure 4.2, thus forming a different composite scene view
which has the 8 snapshots Shown in Figure 4.3 as part of
its snapshot set on the encodement grid.2 The comparison
of Figure 4.3 with Figure 4.2 thus reflects the effect that
the relative positioning of the component objects in a
composite scene view can have on the snapshot set -- in
terms of both the number and form of its elements -- of
that composite scene view. The overall conclusion to be
drawn from these illustrations is that the "superposition"
principle doesn't hold in the formation of a snapshot

set -- i.e., the snapshot set of a simple or composite
scene view "v" cannot be constructed by forming all pos-

sible combinations of snapshots from the snapshot sets of

 

The snapshots of the composite scene view in Figure
4.3 which are formed from combinations involving snapshots
(e), (f), and (g) of Figure 4.1 have not been Shown in
Figure 4.3.

127

 

:: Bottom of square 18‘% grid unit below
bottom of triangle, with the square to
the right of the triangle. The objects
C2:p::i::ji:::ef¥i:w are separated horizontally by 3% grid

Figure 4.1) units at the encodement resolution.

 

 

 

 

%% ASSESS
ﬁéﬁhh‘ﬁﬁ §§I§§
ﬂﬁﬁﬁ 8:333

(a) - (IV)
ﬁﬂ‘ﬁ
ﬁmmﬁ ﬁﬁﬁ
(b) - (IV)

,% ﬁgﬁfﬁ
WEE aﬁliﬁ
ﬁﬁﬁ ﬁﬁiﬁ

(C) — (IV)

 

 

 

 

%%%% ﬁg
ﬁﬁﬁﬁ ﬁﬁﬁ 3%-
&wﬁ ﬁﬁﬁﬁ @mﬁ 1.113%

(d) - (II) (d) - (IV)

@

IE
1%

Figure 4.3. Another composite scene View and part
of its snapshot set.

128

the component parts of the scene view considered indivi-
dually. While some subset of all these possible snapshot
combinations will define the snapshot set of "v", exactly
which subset it will be is determined by the intercon-
nection and relative positioning constraints which "v"
imposes on its components. Explicit consideration of
these constraints can be avoided by merely determining the
snapshot set of a simple or composite scene view from

encodements made with the entire scene view intact.

4.3 Snapshot Signatures

 

Note that it is possible for two distinct scene
views to have the same snapshot sets relative to some
encodement resolution. This is illustrated in Figure 4.4,
where the snapshot sets for both a triangle and a left-
right symmetrical trapezoid are shown relative to an
encodement resolution which makes their base lengths 2
grid units, their altitudes 1 grid unit, and the tOp length
of the trapezoid % grid unit. Both of the views of these
objects shown in Figure 4.4 have the same 7 snapshots in
their snapshot sets. Such an example demonstrates that
snapshot sets alone are inadequate to completely charac—
terize scene views relative to some encodement resolution,
since a single snapshot set may be generated by two (or
more) distinct scene views at that resolution.

A more complete characterization of a scene view "v"

in terms of its grid encodements at resolution r is

 

Figure 4.4.

PROBABILITY OF SNAPSHOT OCURRENCE
(with random grid positioning)

'I————'—

(b)

(e)

(d)

(e)

(f)

(s)

 

(a) Snapshot C.G.=(2,l——)

129

3
10

Snapshot C.G.=(2,%)

l 3
Snapshot C.G.=(l§,liz)

Snapshot C.G.-(l%,l)

[01 < a2]

1 3
Snapshot C.G.-(L§;Z)

[81 > 321
Snapshot C.G.=(1T%,T%)
[Y1 < Y2]

7 9
Snapshot C.G.-(li6,T5)

oe_J

ﬁﬁ
ﬁﬁﬁ
ﬁﬁﬁ
ﬁg

0

g

i
i

O

N

 

“3%
ﬁ

\

%
%%
3%

a !

s:
g

e is as s
is es s; a
a as e

533

ﬁ

g

ES

%

£353

332%

Example of two object views with the same
snapshot set, but different snapshot
signatures.

%

130

possible by associating a probability distribution with the
snapshot set S -- where S = {$1,SZ,...,Sn} -- of "v"
relative to resolution r. This probability distribution is
realized by a probability function "p" which assigns to
each snapshot 5158 a value "p(si)" which represents the
probability that Si will be the snapshot encoded as a
result of some random positioning of an encodement grid of
resolution r over the scene view "v”. We will call "p(si)"
as defined above the "probability of occurrence of snap-
shot si" relative to a specified Scene view and encodement

resolution.

Definition 4.2

 

The snapshot signature of scene view "v" relative
to encodement resolution "r" consists of the Snapshot
set of "v" relative to "r" along with an associated
probability distribution which defines the probability
of occurrence (relative to "r") of each snapshot in

the snapshot set of "v".

Examples of two snapshot signatures are illustrated in
Figure 4.4. The snapshot Signature of the triangle rela-
tive to the encodement resolution is given by the snapshot
set {(a),(b),(c),(d),(e),(f),(g)} with the associated
probability distribution: p((a))=p((b))=p((c))=0,
p((d))=a1. p((e))=el. p((f))=rl. and p((g))=61. The snap-

shot signature of the trapezoid relative to the encodement

131

resolution is given by the snapshot set {(a),(b),(c),(d),
(e),(f),(g)} with the associated probability distribution:
p((a))=p((b))=p(CC))=0. p((d))=a2. p((e))=82. p((f))=Y2.
and p((g))=62. The probability of occurrence of each of
the snapshots (a) and (c), for both the triangle and
trapezoid, is 0 because these snapshots are generated only
when the base of the figure lies exactly along a horizontal
grid line. Such a result has probability 0 of occurring
with some random positioning of the encodement grid over
either figure. Similarly, the probability of occurrence
of snapshot (b) for either the triangle or trapezoid is 0
because the snapshot is generated only when both endpoints
of the base of the figure just touch two vertical grid
lines -- again a result which has probability 0 of
occurring with some random positioning of the encodement
grid over either figure. Now snapshot (d) is produced for
either the triangle or the square whenever a horizontal
line of the encodement grid divides the figure into an
upper and lower part, and each part intersects exactly two
vertical grid lines. Such a result occurs with non-zero
probability whenever an encodement grid is randomly posi-
tioned over either figure. Further, notice that if a
horizontal grid line divides both the triangle and trape-
zoid at the same vertical level with respect to their
bases, then the bottom parts of both figures will have the
same maximum horizontal width, while the tOp part of the

triangle will have a maximum horizontal width which is

132

less than the maximum horizontal width of the tOp part of
the trapezoid. Hence, for equivalent horizontal posi-
tionings (with respect to vertical grid lines) of the bases
of both figures, there are less vertical positionings (with
respect to horizontal grid lines) for the triangle than
there are for the trapezoid which produce snapshot (d).
So it follows that the probability of snapshot (d) occur-
ring for some random positioning of the grid with respect
to the figure is less for the triangle than for the trape-
zoid -- i.e., a1 < a2. Similar considerations, based on
the fact that the tOp of the trapezoid is wider than the
tOp of the triangle, result in the conclusions that
81 > 82, Yl < Y2. and 61 < 62. Thus, the snapshot signa-
tures of the triangle and trapezoid relative to the encode-
ment resolution shown in Figure 4.4 are distinct, even
though their snapshot sets are identical at that encodement
resolution. This demonstrates that snapshot signatures --
in contrast with snapshot sets -- represent a more complete
characterization of scene views in terms of their grid
encodements. This snapshot signature characterization of
scene views will prove useful when scene views are to be
recognized from their encodements at coarse resolutions.
The determination of the theoretical probability
distribution for the snapshot signature of a scene View
"v" relative to some encodement resolution "r" will now

be discussed. We first make a definition of terminology.

133

Definition 4.3

 

A unit region relative to resolution r is the
set of all points contained in a square-bordered
region of area 1 on the grid of resolution r.

A standard unit region relative to resolution r

 

is a unit region relative to resolution r which has
its borders parallel to the grid lines (i.e., hori—

zontal and vertical).

Let "p” denote a reference point on scene View "v". Then
each relative translational positioning of v with reSpect
to the encodement grid of resolution r is characterized by
exactly one relative location of p with respect to the
pattern of horizontal and vertical grid lines of the
encodement grid of resolution r. In turn, each such
location of p relative to the pattern of grid lines at
resolution r is characterized by exactly one of the points
contained in an arbitrary standard unit region "R" on the
encodement grid of resolution r. Hence a random posi-
tioning of the encodement grid of resolution r over the
scene view v is equivalent to the random selection of a
location for p from R. Note that the points of R are
associated with the snapshot set "S" of v relative to
resolution r by the function "h" defined as follows: for
posR, h(po)eS such that h(p0) represents the snapshot of
v that would be encoded on the grid of resolution r with

v positioned so that its reference point p is at location

134

p0. Thus S = {h(po)|posR}. Now each snapshot in S can be
characterized in terms of the set of points R1 9 R which
are associated with it under "h" -- i.e., for S={sl,sz,...,
Sn}, Ri = {polpoeR and h(p0) = Si} for 1=l,2,...,n.
Further, each R1 can be interpreted as defining a subregion
(not necessarily connected) of R having an area (inter-
preted relative to the real Cartesian plane) denoted by
”A(Ri)", for i=l,2,...,n. Analogously, let "A(R)" denote
the area of region R, so that A(R)=l (recall R is a stan-
dard unit region). Then the probability that h(po)=si for
a randomly chosen poeR is just the probability that posRi,
which is given by

A(Rl) A(Ri) .
“KTRI = 1 = A(Ri) , for 1=l,2,...,n.

 

But the probability that poeRi is just the probability of
occurrence of snapshot si -- i.e., p(si) -- relative to
scene view v and resolution r. Hence, the snapshot signa-
ture of "v" relative to r is given by the snapshot set

S = {h(po)|poeR} = {sl,sz,...,sn}

with associated probability distribution

p(si) = A(Ri) , for 1=l,2,...,n.

Figure 4.5 illustrates the theoretical snapshot signa-
ture of a rectangle relative to an encodement grid whose
resolution makes the rectangle 3 grid units high and 5%
grid units wide. Its snapshot signature is given by

snapshot set S = {51,52,53,S4,SS,56,S7,58} with associated

 

p(51) = 0

 

Snapshot $3
p(53) = 0

iﬂﬂﬂﬁﬁﬂﬂﬂﬁﬁﬁﬂﬁ
iﬂﬁﬂﬁﬁﬁﬁﬁﬂﬁwﬂ

%
8

Snapshot 55
10(551 = 0

 

Snapshot s7

PCS7) = %

135

 

P($2) = 0

 

Snapshot 54
p(s4) = 0

gﬁﬂﬁﬂ8ﬁﬂﬁﬂﬂﬁwiﬁﬁ
asauwemmnmmmm

%
%

WﬁWHWRWﬂWHRWW
ﬁﬁﬂﬁﬁwﬁﬂﬁﬂﬂﬂ

Snapshot s6
p(56)=0

seesaw WEE
suammmmmemmer

% ﬁ
@

ﬁgﬁﬁﬁﬁ

%

Snapshot $8

p(58) = %

Figure 4.5. Snapshot signature of a 3 x 5% grid unit

rectangle.

136

probability distribution

p(sk)=0 for k=l,2,...,6, and p(s7)=p(s8)=%.

The determination of this probability distribution can be
illustrated by picking some reference point p on the rec-
tangle, and some standard unit region R on the encodement
grid. Let the upper left corner of the rectangle be
reference point p. Also, let R be the standard unit
region defined by one of the squares in the encodement
grid3 -- say the one whose vertical borders are defined by
x=i and x=i+l, and whose horizontal borders are defined by
y=j and y=j+l. Then

R = {(x,y)|i<x:i+l and jﬁy<j+l},

and

W
ll

1 {(i+l,j)} R2 = {(i+%,j)}

3 - {(x,j)|i+%<x<i+l}

70
I

70
I

4 - {(x.i)li<x<i+%}

7U
ll

5 {(i+1.y)|j<y<j+1}

FU
II

6 {(i+%.y)li<y<i+1}

 

3 Only one vertical and one horizontal border line
are considered as part of a standard region R, which
insures that each point in R characterizes a unique rela-
tive positioning with respect to the corresponding grid.
In this case, we choose the lower and right borders of the
grid square to be included in R. Points on the left and
tOp borders of the grid square are then not elements of R.

137

70
l

7 - {(x,y)|i+%<x<i+l and j<y<j+1}

R8 - {(x,y)|i<x<i+% and j<y<j+l}.

Each of these subregions Rk’ k=l,2,...,8 are illustrated in
Figure 4.6 as either a point, line segment, or shaded area,
relative to a dashed square-bordered region -- which
includes the lower and right border lines, but not the
upper and left border lines -- which represents R. With
A(R)=l, note that

A(R1)=A(R2)=A(R3)=A(R4)=A(R5)=A(R6)=O,

while

A(R7)=A(R8)=%.

If the upper left shaded grid square of each snapshot shown
in Figure 4.5 is interpreted as region R, then the rec-
tangle shown in each snapshot sk has its upper left corner
point p in a representative location from Rk’ for
k=l,2,...,8.

This theoretical procedure for determining the pro-
bability distribution associated with the snapshot set of
a scene view can become extremely difficult to apply --
especially at high encodement resolutions -- for scene
views having greater structural complexity than the rec-
tangle of Figure 4.5. The subregion of a standard unit
region which is associated with a particular snapshot may
be odd shaped or even disconnected, and hence quite hard

to identify. Computing the area of such a subregion may

138

 

 

 

j + 1 I "' — — — j + 1 I — — — .—
' I l I
I | l
| I I
l |
j——-—3 j-—~—'
1 i+1 1 1+1
R1 R2
j+1I———- j+1.———-
I I
I . 1 .
' I I
l . I I
1 1+1 i 1+1
R 3 R 4
Jm———- rn-W‘i
I
| I l
I l
.I . I I
J'— "" -' J" "" "
1 1 + 1 1 1 + 1
R 5 R 6
j + 1 "' """'" j + 1 |—_ "" '-
i i+l i i+1
R7 R8

Figure 4.6. The eight subregions of R which define the
probability distribution for the snapshot
signature of the rectangle in Figure 4.5.

139

likewise be difficult. Fortunately, the snapshot set
approximation procedure discussed in Section 4.2 provides
a practical method for approximating the probability
distribution of the snapshot set, as well as the snapshot
set itself, in such cases. Let r, v, v0, D, V, and V' be
as defined for the snapshot set approximation procedure of
Section 4.2. Note that when scene view v is considered in
all its T-F positions defined by set V, any reference point
p on v can assume precisely those locations in some stan-
dard unit region R on the encodement grid of resolution r.
Thus, considering some randomly chosen finite subset V' of
V is equivalent to considering some randomly chosen finite
subset R' of points from R, where V' and R' have the same
number of elements, say n. So it is reasonable to eXpect
that the relative frequency with which a given snapshot
occurs in the collection of encodements of the elements of
V' would provide a useful approximation to the probability
of occurrence of that snapshot for scene view v at encode—
ment resolution r —- provided that n is sufficiently
large. This forms the basis of the following procedure
for approximating the snapshot signature of a scene view
"v” relative to encodement resolution r:
(1) apply the modified ”linguistic" algorithm (the

"linguistic” algorithm of Chapter 3 with rules

(1)-(a) and (1)-(b) replaced by rules (1)-(a)',

(I)-(b)' and (I)-(c)', as described in Section

4.2) n times -- where n is a large, but finite,

140

integer number -- to generate a collection of n
random encodements of "v" at resolution r;

(2) using the effective procedure of Theorem 3.5,
determine the set S={sl,sz,...,sq} of distinct
snapshots which are represented in the n encode-
ments, and the number ni of those n encodements
which represent each of the snapshots Si’ for
i=l,2,...,q;

(3) determine fi’ the relative frequency of occur-
rence of the snapshot si in the n encodements,
by fi = 2; for i=l,2,...,q;

and (4) approximate the snapshot signature of v relative
to r by the snapshot set S={sl,sz,...,sq} with
the associated probability distribution
p(si)=fi, for 1=l,2,...,q.

Again, the larger the value of n, the better this approxi-

mation will be.

4.4 The Utility of Snapshot
Signatures

 

 

As noted in the preceding section, snapshot
signatures have the capability to characterize scene views
at encodement resolutions which are inadequate to produce
structurally different encodements of distinct scene views.
For if an encodement resolution r is such that a set of
two (or more) distinct scene views have the same snapshot
set relative to r, the scene views can still be distin-

guished at resolution r if each scene view associates a

141

unique probability distribution with that snapshot set.
This characterization capability of snapshot signatures
will form the basis for the scene view recognition approach
described in the remainder of this chapter.
4.4.1 Scene View Recognition:

The Comparison of Snap-

shot Signatures

Consider a scene recognition situation in which

there are only a finite number of scene views which can ever
occur. These scene views will be called the prototypes,
and will be denoted by the set P={v1,vz,...,vk}. Further,
for i=l,2,...,k, let vieP have its theoretical snapshot
signature relative to encodement resolution r given by

,s. } relative to

i,l’si,Z"" 1,ti

resolution r with (2) associated probability distribution

(1) its snapshot set S:={s

p§(si,j) for j=l,2,...,ti. Now suppose an unknown scene
view veP is to be recognized based on a random sample
determined from the following random experiment Br: 3
grid of resolution r is superimposed over v in a random
translational position, and v then encoded to produce
snapshot er of v. An outcome er of Er can be a snapshot
in any snapshot set SE for i=l,Z,...,k, since v may be any
one of the elements of P, and hence the sample space Sr

of Er is given by

r_ r r r
3 —s1 USZU...Usk

If the experiment Er is repeated independently n times,

the random sample that is produced will be denoted by the

142

n-dimensional vector

(e§,e§,...,e;) ,
where eieSr denotes the outcome of the i th repetition of
experiment Er, for i=l,2,...,n. From such a random sample
an approximation to the snapshot signature of v relative to
r can be derived, where the approximation consists of snap-
shot set {e§,e§,...,e;}, with each distinct snapshot in
this set assigned a probability of occurrence equal to its
relative frequency of occurrence in the random sample.
Then the problem of identifying v can be considered as the
problem of identifying which of the known (theoretical or
approximated) snapshot signatures (relative to r) of the
scene views in P is the one represented by the approxi-
mate snapshot signature determined from the random sample.
However, since the snapshot signature of v is only an
approximation (it has been determined from a finite random
sample), and also because one or more of the "known" snap-
shot signatures may also be an approximation (see Section
4.3), an exact match between the approximate snapshot
signature of v and one of the "known" snapshot signatures
will not necessarily occur. Hence, the unknown scene
view v must then be identified as that element of P whose
"known” snapshot signature most closely matches the approx-
imate snapshot signature represented by the random sample.
In this case, the particular criteria used to measure the

"closeness" of a match between two snapshot signatures

143

will be a critical factor in determining the degree of
certainty with which an identification of v is made. It
is generally impossible to choose closeness criteria which
insure that v will be identified correctly with 100% cer-
tainty, due to the fact that random samples and approxi—
mations are involved. However, it is possible to minimize
or control the probability of making an identification 5
error using standard statistical decision criteria.
Such criteria can be effectively applied here as
a measure of the degree of closeness between snapshot ,9
signatures (relative to the same resolution r) based upon
their respective probability distributions. These
decision criteria are the ones which arise in statistical
hypothesis testing problems (see [H-4], [M-4], [w-S], or
any standard text on statistics), so the scene view recog-
nition problem will now be reformulated in that context.
4.4.2 Scene View Recognition:
The Testing of Hypo-
theses

Consider the finite set P of prototype scene
views, and a random sample generated by repeated appli-
cation of experiment Er with some unknown scene view veP.
The identification of scene view v is considered in a
hypothesis testing context by making two (or more) con-
jectures concerning the identity of v such that exactly
one of the conjectures (hypotheses) must be true. For

example, if the unknown scene view veP is to be identified

144

precisely, the appropriate multiple hypotheses testing
problem consists of testing k simple hypotheses Hi’ where

Hi: v=vi for i=1,2,...,k.

Or if it is only necessary to determine whether or not
scene view v represents a specific scene view vi eP, then

0 I
the appropriate hypotheses testing problem consists of ; I

testing a simple hypothesis H1 against a composite alter- I

native H2, where

H1: v=vi and H2: v#vi . .;
O O I."

In any case, the relative validity of each hypothesis
(conjecture) is tested using information from the random
sample, and all but one of the hypotheses rejected. The
single hypothesis accepted represents that conjecture about
the identity of v which is most likely to be true, based on
the particular criteria used for testing the relative
validity of the hypotheses. One such criterion which is
commonly used is the likelihood ratio, which we will dis-
cuss relative to k=2 (i.e., P contains just two scene
views). Although this represents a relatively simple
situation, it will suffice to demonstrate both the utility
of snapshot signatures and the use of a standard statis-

tical decision criterion in scene view recognition.

145

4.4.3 The Likelihood Ratio and
Sequential Scene View
Identification
In the case where k=2, the set P of prototype
scene views is {v1,v2}. Thus, any hypotheses testing pro-

blem for determining the identity of v becomes a test

between two simple hypotheses H1 and H2, where

H1: v=v1 and H2: v=v2.

The test is based on a random sam le (er,er,...,er) derived
p l 2 n

from n independent repetitions of experiment Br, and it is
assumed that the snapshot signatures of v1 and v2 relative
to resolution r are known. [Recall the notation defined
in Subsection 4.4.1, where the symbols Si, pi, SE, and p;
represent the theoretical snapshot set and its associated
probability function relative to resolution r for v1 and
v2, respectively, and where Sr=Si LJSE represents the
sample space of Br.] The likelihood ratio Ln can be used

as the testing criterion (see [H-4] or [M-4]), where

Probability of random sample (e§,e§,...,e;)
occurring if H2 is true

 

Probability of random sample (e§,e§,...,e;)
occurring if H1 15 true

p§(e§)-p§(e§)- ... ~p§(e;)

 

p§(e§)-p§(e§)- ... -p§(e;)
n

=T-l- I); (3:)

r r
i=1 P1(ei)

146

and where
r r _ O r r o o
pj(ei) - 0 1f eiréSj for j=1,2 and 1=l,2,...,n.

Intuitively, Ln gives the relative likelihood that H2 is
true as compared to H1, for the given random sample. If
Ln=l, both hypotheses can be considered equally likely to
be true. If 0:Ln<l, H1 can be considered more likely to
be true than Hz, with that likelihood being greater the
closer the value of Ln is to 0. On the other hand, if
l<Ln§ n, H2 can be considered more likely to be true than
H1, with that likelihood being greater for larger values
of Ln' With fixed sample size n, a threshold value c of Ln
is picked such that the test accepts hypothesis H1 as true
if Ln< c and rejects hypothesis H1 (accepts hypothesis H2)
if Ln> c. Note that by making c larger, we can increase
our certainty that the test will not reject H1 when H1 is
true, but we also decrease our certainty that the test
will not accept H1 when H1 is false. Putting this another
way, by making c larger the probability that the test will
reject H1 when H1 is true is made smaller, but the pro-
bability that the test will accept H1 when H1 is false is
made larger. This reflects the two types of errors which
must be considered in tests of hypotheses: (1) a type (I)
error, which is made when the test rejects H1 when H1 is
true; and (2) a type (II) error, which is made when the
test accepts H1 when H1 is false. In conventional nota-

tion [M-4], p(I) is used to denote the probability of a

147

type (I) error occurring and p(II) is used to denote the
probability of a type (11) error occurring, relative to a
specified test. As noted above, it is only possible to
control one of p(I) or p(II) if the likelihood ratio cri-
terion is used relative to a fixed sample size, since a
single value of c must be chosen. Generally, c is chosen
such that p(I) is fixed at a certain acceptable value a
(typically .05 or .01), which automatically assigns some
value 8 to p(II). While it has been shown in the liter-
ature [M-4] that the likelihood ratio test minimizes the
value of B (of p(II)) for a fixed value a (of p(I)) when
a sample of (fixed) size n is being considered, the value
8 (of p(II)) may still be too large to be acceptable if
this scheme were to be used for practical scene view
identification.

However, in many practical applications involving
scene view recognition, it is reasonable to assume that
experiment Er could be performed an arbitrary (but still
finite) number of times on an unknown scene view v. For
instance, if v was an unknown typewritten digit of a zip
code, a mail sorter could reasonably have the capability
to (rapidly) scan v an arbitrary number of times in
attempting to identify the digit it represents. In such
situations where the sample size n can be varied at will,
it is possible to control both p(I) and p(II) at arbitrary
pre-specified levels a and B, respectively. This is done

by determining both a lower threshold c1<l and an upper

148

threshold c2>l for Ln such that the test accepts hypothesis
H1 with p(II) = B if Ln: c1, while the test accepts hypo-
thesis H2 (rejects H1) with p(I) = a if Ln 1 c2. If c1 <

Ln < c2, then the sample size n is not yet large enough
(assuming that the test is performed sequentially with Ll,
L2,L3,...,etc.) for the test to either reject or accept
hypothesis H1 with the required degree of certainty of not
making either a type (I) or type (II) error. This test is
performed in a sequential manner as given by the following
sequence of rules:

(1) set i=1;

(2) make an observation e; from experiment Br;

(3) compute Li’ based on the random sample

Ie§.e§,...,e§);

(4) (a) if Li

IA

c1, accept H1 and terminate the test;

(b) if Li

I V

c2, accept H2 and terminate the test;
(c) if c1 < Li < c2, increase i by 1 and repeat

the test starting with step (2).
This test is known as the "sequential likelihood ratio
test" [m-4], or alternatively as Wald's "sequential pro-
bability ratio test" (denoted "SPRT" in [W-3]). We now
state several well-known prOperties of the above SPRT
without proof (proofs can be found in [H-4], [M-4], [W-l],
and [w-31):

and

(a)

(b)

(C)

149

simple and accurate approximations to the cutoff
values c1 and c2 for Ln’ with specified p(I)
and p(II) = B, are given by

-8 .
CI'It—a and CZ’T’

the sample size n (i.e., the number of tests)
required for the SPRT to terminate will be finite
with probability 1, provided v1 and v2 have dis-
tinct probability distributions over the sample
space Sr = 81 LJSE

the approximate average sample size E(n|vi) for
the SPRT when vi is the actual scene view repre-
sented by v, for i=1,2, is

Pf(vi)-log(c2)+(l-Pf(vi))°log(c1)

 

E( v.) = 4
ml 1 E(z|vi)

where

P fi(v ) is the power of the test given by
the probability that v1 will be

rejected when V1 is the actual scene

view, so
a for i=1
P (v.) = ,
f 1 1'8 for i=2
and
r r
E(zlvi) = P :(51)‘ 108-:——;—'+ p§(s§)°10g—?——?—-+
p1(51) p1(52)
r r
s )
+ 1‘ 5r 1 [Bf—q—
p ( ) 02
q r(Sr)
P1 q

-f-— 1 ‘I".

150

with p§(s§) = o if s? ¢ 5? , j=l,2,...,q.

The average sample size approximation E(n|vi), i=1,2, given
in (c) above breaks down for the cases when E(z|vi)=0 or
when E(z|vi) doesn't exist (i.e., when E(z|vi)=1b»).
However, on the average, the SPRT with p(I)=a and p(II)=B
will require a smaller sample size n for making a decision
than would a non-sequential (fixed-sample size) likelihood
ratio test which realizes the same p(I)=o and p(II)=B.

And by (b), we know that this sample size n will be finite
for the cases of interest. Hence, the SPRT provides a
practical procedure for utilizing snapshot signatures in
scene view recognition. This is illustrated in the two

examples presented next.

4.4.4 Two examples

The expected sample size for the SPRT will be
demonstrated for two examples in which an unknown scene
view ch ={vl,v2} is to be identified by testing the
hypotheses

H1: v=v1 versus H2: v=v2

on the basis of a random sample from experiment Er. In

both examples, the resolution r will be chosen such that

 

151

v1 and v2 have the same snapshot set relative to r, but
distinct probability distributions over that set. Further,
it is assumed that p(I) and p(II) are set at acceptable

levels a and B, respectively.

Example 1

 

Consider some arbitrary encodement resolution
ro Z 6. Let v1 be a rectangle with horizontal side
length 5% grid units and vertical side length 3 grid
units, and v2 be a rectangle with horizontal side
length 5% grid units and vertical side length 3 grid

units -- all lengths being specified relative to re.
The snapshot signature of v1 is illustrated in Figure
4.5, and is given by snapshot set 8:0 with associated
probability distribution pic, where

r
0
S1 = {51’52’53’54’55’56’57’58}

with
r
P1°(Si) = 0 for i=1,2,...,6, and
r r
1
910(57) = P10(58) = 7 °

The snapshot signature of v2 is given by snapshot
r r

set 820 with associated probability distribution p20,

where

820 = 810 = {51’52’53’54’55’56’57’58}

 

152

with

r0
p2 (Si)

0 for i=1,2,...,6,
r r 3

P20(57) = 1 ’ and P20(58) = I °

r r
Then the sample space S 0 of experiment E 0 is given

by
1‘

r r
o _ o o _

S ‘ S1 LJSZ ‘ {sl’52’53’54’55’56’57’58}°

Now, the average size of the sample needed by the

SPRT to identify v if v1 is the actual scene view is

 

 

given by:
a-logc-l-g—ﬁ) + (l-aI-logcéa)
E(IIIV1) = 9
E(z|v1)
where
8 o p:°(s )
E(z|v1) - 2 p1 (s I-Iog—r—l—
= O
J 1 p1 (sj)
= 0 + 0 + 0 + 0 + 0 + 0
1. 1/4 1. 3/4
* z 108177 * '2' 108m
= -.l4384
For a = B = .05,
E( I ) .05 log(19)+.95 log(T%) _ 85000
n V = =
1 -.14334 "11331

5.91 = 6

I
' -..b-n'

 

153

Similarly, the average size of the sample needed by
the SPRT to identify v if v2 is the actual scene view

is given by:

anneal-$3) + Mom???)

 

Emlvz) = E( I ) ’
2 V2
where
8 r pr°(s )
BMW = Z P2°(5j)'1°g‘327—L
i=1 p1°(sj)

0 + 0 + 0 + 0 + 0 +0

+ Inga—z + new

.13082

For a = B = .05,

.95 log(19)+-05 108(T%) .85000

= 7136??

 

E(n|v2)
.13082

6.48 = 7

Thus, for the identification of v with 95% certainty
of being correct (i.e., the probability that v will
be mis-identified is .05), the SPRT will require, on

r

the average, 6 observations of v based on E o if v1

is the actual scene view and 7 observations from
Bro if v2 is the actual scene view.

If it is desired to identify v with a 99% cer-
tainty of being correct, the average sample size

required by the SPRT can be determined, using

a = B = .01, as

154

.01 log(99)+.99 log(§%)

 

 

-4 5032
E(nlv ) = = .3333
1 -.l4384 "1
= 31.4 = 32 »
and
.99 1og(99)+-01 108(gé9 4 5032
E(DIV2) = = °i36§2 '

.13082 5

34.3 = 35 . I
As would be expected, the average sample sizes re- 9
quired by the SPRT are significantly larger if v is
to be identified with a 99% certainty of being cor-
rect than if v is to be identified with a 95% cer-

tainty of being correct.

Example 2

 

A horizontal line "h" of length "a" on the stan-
dard screen will have length "ra" on the (extended)
grid of resolution r (see Theorem 3.2). Then h can
intersect either (Ia, or (rag +1 vertical grid lines
at resolution r, causing either (raj +1 or Ira; +2
consecutive horizontal grid squares, respectively, to
be shaded in its encodement. Further, if h lies
along a horizontal grid line at resolution r, its
encodement will be 2 shaded grid squares high along
its entire length; while if h lies between two hori-
zontal grid lines at resolution r, its encodement

will be exactly 1 shaded grid square high along its

155

entire length. Hence, the snapshot set S; of h at
resolution r consists of the 4 snapshots s; 1, s; 2’
9 3
r r .
sh’3, and sh,4, where.
5h 1 is the solid shaded grid square pattern
9

which is 2 grid units high and [raj +1
grid units wide;

5h 2 is the solid shaded grid square pattern
’ which is 2 grid units high and [EQJ +2
grid units wide;

5h 3 is the solid shaded grid square pattern
’ which is 1 grid unit high and lIaJ +1
grid units wide;

and 5h 4 is the solid shaded grid square pattern

’ wh1ch 1s 1 gr1d un1t h1gh and [IaJ +2

grid units wide.

Now if a grid of resolution r were randomly trans-
lationally positioned over h, (l) the probability
that h would coincide with a horizontal grid line is
"0" while the probability that h would lie between
two horizontal grid lines is "l", and independently
(2) the probability that h would cross tra, grid lines
is "1 + (ra- Lra, )" while the probability that h
would cross Ira, +1 grid lines is "ra- tra, ".
Hence, the probability of occurrence p;(s;,i) of each
snapshot 5;,1 at resolution r, for i=1,2,3,4, is

given by

p§(5;,1) pﬁ(s;,2) = 0 .

r
phcs;,3) = 1 -(ra - Ira; ) .

-1 ..A.

_ 1. u .1._..__'

-_ .J— Jw

156

r r
and s = ra - ra

ph( 11,4) L._J
So, the snapshot signature of h relative to resolution

r is given by snapshot set

I‘ I‘ }

r = r r
S {Sh,l’ Sh,2’ Sh,3’ Sh,4

h
with associated probability distribution pi.

Let v1 and v2 be horizontal lines of length .91
and .98, respectively, on the standard screen. Then
the snapshot signature of v1 relative to resolution r

is given by snapshot set

= {5r

5 1,1’ 51,2’ 51,3, S1,4

r
l

(which is just Si with l substituted for h and

.91 substituted for a)

with associated probability distribution pi where

P1(51,1) = 91(51,2) = 0 ’
p§(s§’3) = 1 - (.91r - t#913, ) ,
I‘ 1‘ _ _
and p1(sl’4) — .91r l;915

Also, the snapshot signature of v2 relative to reso-

lution r is given by snapshot set

r _ r
82 ‘ {52,1’ 52,2, 52,3, 52,4

(which is just sﬁ with 2 substituted for h and

.98 substituted for a)

157

with associated probability distribution p; where

I‘ r l‘ I' _
p2(52’1) p2(52,2) ' O :

p§(s§’3) = l - (.98r - L.__981_“J ) ,

and p§(s§,4) .98r - L.3983,

Now note that L._91£, = (‘983 for r=l,2,...,ll,
and (,919 f {#983, for r112. This implies that v1
and v2 generate the same snapshot sets for r=l,2,..
..,11, but different snapshot sets for r312. We
restrict our attention to values of r between 1 and 11
for the remainder of this example, to insure that the
approximation formulas for the average sample size of
the SPRT are applicable (for r112, E(z|vi) will not
exist for either i=1 or i=2, or both). For rill,

note that SE = S? = {si, 55, 5;, SE}, where

r _ r _ r ._
si - 51,1 - $2,i for 1—1,2,3,4. Hence, the sample

space Sr of experiment Br is given by

Sr = Si LJSE = {SE, SE, 5;, 5:} for rill.

Consider r = 2. Then the sample space S2 of
experiment E2 is given by

2

_ 2 2 2 2

52’ 53'

where
2
piCsi) = pi(s2) = o ,

pi(s§) = 1 - (1.82 - L1.8_2_, ) = 1 - .82 = .18,

158

pi(si) = 1.82 - L1.82, = 82 ,
and
2 2 2 2
p2(51) - 132(52) - O :
2 2 _
P2(S3) - l - (1.96 - L1.9§J ) = 1 _ 96 = .04
2 2 _
p2(s4) - 1.96 - £1.99, = .96
Then with a = B = .05, the average size of the sample

needed by the SPRT to identify v if v1 is the actual

scene view is given by:

.05 103(19) + .95 log(T%)

 

EUIIVI) = 9
E(z|v1)
where
4 pz(sz)
- 2 2 2 '
E(Z|V1) Z P1($J) 108%
i=1 p1 i
= o + 0 + 18 log;%% + 82 10gL§§
= - 14147
Hence,
-.85000
E(an1) = 7711117 2 6
Similarly, for a = B = .05, the average size of the

sample needed by the SPRT to identify v if v2 is the

actual scene view is given by:

.95 log(19) + .05 log(T%)
E(n|v2) = 9
E(z|v2)

 

. . A. o—Q: -' I. . O

159

where
4 p2(sz)
E(zlvz) = P%(5§)’1°g_%_—%—’
j=1 p1(sj)
= o + 0 + .04 log(;%%) + .96 108(ng)
= .09116
Hence,
E(n|v2) = +%%%%% = 9'33 z 10

Thus, for the identification of v at r=2 with 95%
certainty of being correct (i.e., the probability that
v will be mis-identified is .05), the SPRT will
require, on the average, 6 observations of v based

2

on E if v1 is the actual scene view and 10 obser-

vations from B2 if v2 is the actual scene view.
To examine the effect of increased resolution on
the expected sample size, consider r=6. Then the

6

sample Space S6 of eXperiment E is given by

where

p?(s$) = p?(sg) = 0, p?($§) = .54, p?($2) ‘ 46»
and

pg(s$) = p3(sg) = 0. p3(sg) = ~12, 93(52) = '88
Then, with a = B = .05, the average size of the sample

needed by the SPRT to identify v if v1 is the actual

scene view is given by

160

 

-.85000 -.85000
E("'Vl) = "‘—“‘ = 12 88"
B(z|v1) .54 log(;§z) + .46 log(413)
_ -.85000 _ ~
‘7751‘386‘1-66-2
Similarly, with a = B = .05, the average size of the

sample needed by the SPRT to identify v if v2 is the
actual scene view is given by

= .85000 = .85000

ECHIV )
2 E(z|v2) .12 log(f%%) + .88 log(f%%)

_ .85000 _ N
‘W‘2-13-3

Thus, for the identification of v at r=6 with 95%
certainty of being correct (i.e., the probability that
v will be mis-identified is .05), the SPRT will
require, on the average, 2 observations of v based on
E6 if v1 is the actual scene view and 3 observations
from E6 if v2 is the actual scene view.

For one additional examination of increased
resolution effects on the expected sample size, con-

10

sider r=10. Then the sample space S of experiment

E10 is given by

10 10 10 10 10
S = {$1 , $2 , s3 , s4 }

where
1 1 1 10
Pio(510) = pio(520) = 0, p10(s3 ) = .90,

pio(sio) = .10,

161

and
10 10 10 10 1 1
pz (51 ) = pz (52 ) = 0. p20(530) .20.
l
p20(si0) = .80.
Then, with a = B = .05, the average size of the sample

needed by the SPRT to identify v if v1 is the actual

scene view is given by

E(z|v1) .90 log(f%%) + .10 10g(f%%)

 

E(an1) _ -.85000 _ -.85000

= -.85000 = 74 z 1
T171157? '

Similarly, with a = B = .05, the average size of the
sample needed by the SPRT to identify v if v2 is the

actual scene view is given by

 

. .85000
E(nlvz) = ._§._S_O_O_0. = .20 .80
E(z|v2) .20 1°g(790) + .80 1°g(TTU)
_ .85000 _ -
_ 1736771 - .622 - 1

Thus, for the identification of v at r=10 with 95%
certainty of being correct, the SPRT will require, on

10. While

the average, 1 observation of v based on E
this result may seem somewhat surprising, it should
be remembered that E(n|vi) represents an average
value for n, based on n being continuous. In reality
of course, n is discrete and can only assume values

l,2,3,...,etc., and the approximations for E(n|vi)

must be interpreted with this in mind.

162

This example confirms that as the encodement
resolution r is increased, the average number of
samples required by the SPRT to identify v (as either
v1 or v2) decreases. This is intuitively appealing,
since one would expect that distinguishing a length
difference (which is the only difference between v1 ‘}
and v2) would be easier at resolutions for which that 5.
length difference is closer to 1 grid unit than at :
coarser resolutions for which that length difference

is closer to 0 grid units. a’

4.4.5 Practical Considerations

The two examples of the previous section demon-
strate that snapshot signatures can be used practically
for scene view identification in situations where distinct
scene views generate the same snapshot sets relative to
some encodement resolution. Such cases occur when the
resolution of the encodement grid is too coarse to reflect
the structural differences of the distinct scene views,
regardless of the particular relative translational
position of the encodement grid. Hence, features which
are based on the structure reflected in an arbitrary
encodement of an unknown one of these distinct scene
views —- even after smoothing or other image enhancement
techniques -- cannot provide any information which can
help to identify which one of these scene views is repre-

sented. This implies that the "classical" pattern

163

recognition approach based on feature extraction from a
single encodement of an unknown picture will not be pro-
ductive at such coarse resolutions. Hence, the scene view
identification scheme based on snapshot signatures can be
utilized at coarse encodement resolutions where "classical"
feature extraction techniques of identification cannot be
productively applied. This is particularly significant
when physical constraints -- such as the size or number

of photocells which may be available to realize a grid
(see Section 2.3) -— may prevent the encodement grid
resolution from being made fine enough for feature extrac-
tion identification techniques (based on a single encode-
ment of an unknown) to be effective.

Even when the resolution to be used for encoding an
unknown scene view can be made fine enough to allow
feature extraction identification techniques to be used
productively, it may still be advantageous to use a
relatively coarse resolution with an identification
scheme based on snapshot signatures. The choice is based
on memory vs. time considerations, and also upon whether
or not it is possible -- or desirable -- to encode an
unknown scene view more than once. Of course, if an
unknown scene view is only scanned (encoded) once, snap-
shot signature identification techniques can only be based
on the probability of occurrence which each prototype

scene view associates with that single snapshot -- a

.1

.. . La..'.‘__‘~.' I- u-

Inn-.1: .7 5

164

procedure which does not fully utilize the identification
capabilities of snapshot signatures. Hence, we will
restrict our attention to those cases where the unknown
scene view to be identified can be scanned as many times
as desired (with the encodement grid randomly and indepen-
dently translationally positioned for each scan). Note
that each such scan of the unknown scene view at reso-
lution r generates a binary encodement with (r+2)2 bits of
information -— one bit representing each (shaded or non-
shaded) square on the extended grid of resolution r.
Hence, if an identification procedure requires n scans
(encodements) of an unknown scene view at resolution r,
then (r+2)2-n total bits of information must be processed.
For Example 2 of Subsection 4.4.4, we can thus draw the
following conclusions about the average number of bits of
information which are needed to identify an unknown

ve{v1,v2} using the SPRT:

(a) for r = 2, the average number of bits to be

processed is

IZ

(2+2)2-E(n|v1) 16-6 = 96 if v1 is the

actual scene
view,

and is

l2

(2+2)2-E(n|v2) 16-10 = 160 if v2 is the

actual scene
view;

165

(b) for r = 6, the average number of bits to be

processed is

(6+2)2-E(n|v1) = 64-2 = 128 if v1 is the
actual scene
view,

and is
(6+2)2~E(n|v2) 2 64-3 = 192 if v2 is the

actual scene
view;

and (c) for r = 10, the average number of bits to be

processed is
(10+2)2-E(n|vi) z 144 1 = 144 for i=1,2.

Now the first resolution in Example 2 of Subsection 4.4.4
for which v1 and v2 can generate a different snapshot is

r = 12. Hence, this is the first resolution at which a
feature extraction identification scheme based on a single
encodement of an unknown veP can be effective. At r = 12,
v1 can generate snapshots which are 11 and 12 shaded grid
squares wide, while v2 can generate snapshots which are

12 and 13 shaded grid squares wide. Hence, at least a

13 x 13 grid (having grid square size equal to that of the
grid of resolution 12) is needed to represent an arbitrary
encodement of an unknown scene view ve{v1,v2}. In turn,
this implies that at least (l3)2-l = 169 bits of infor-
mation must be processed for the identification of v using
feature extraction techniques. Note, however, that at

r = 12 the snapshots generated by both v1 and v2 which are

- .r .._- ...“...1 .... _

166

12 shaded grid squares in length are identical. And the
probability that such snapshot occurs is .92 (which is
pi2(si2)) if v1 is the actual scene view and .24 (which
is p%2(s§2)) if v2 is the actual scene view. Hence, the
chance at r = 12 that a random encodement of an unknown
ve{v1,v2} will reflect a structural characteristic which
can be used to differentiate between v1 and v2 is fairly
small. So the probability is fairly high that a feature
extraction technique which uses structural characteristics
of the encodement to identify v will make an error. In
fact, it is not until r = 23 —— where v1 can generate only
snapshots which are 21 and 22 grid units wide, and v2 can
generate only snapshots which are 23 and 24 grid units
wide -- that an arbitrary encodement of an unknown
ve{v1,v2} will always reflect the structural differences
between v1 and v2. At r = 23, feature extraction techni-
ques could be used to identify v as either v1 or v2 (with
100% certainty of being correct) by processing the
(23+1)z-1 = 576 bits of information in any arbitrary
encodement of v. In any event, the feature extraction
method of identifying an unknown vc{v1,v2} from a single

arbitrary encodement requires the processing of at least

169 bits of information (at r = 12) to be at all effective,

and even more (for r > 12) if that identification is to be
made with a high degree of certainty. Comparing these

requirements with the average number of bits required to

'1-

1.] III Ila!

167

identify v with 95% certainty using the SPRT with snapshot
signatures at lower resolutions r (r = 2,6, and 10), the
feature extraction method will in many cases require
significantly more bits to be processed to identify v with
the same level of certainty. To make an exact comparison,
it would be necessary to consider the probabilities with
which the scene views v1 and v2 themselves occur, the
degree of certainty with which an unknown scene view is to
be identified, and the specific features and decision rule
used by the feature extraction method. However, it should
be apparent from this discussion that the use of snapshot
signatures with the SPRT offers an effective alternative
to the "classical" feature extraction methods (based on a
single arbitrary encodement of an unknown) of identifi-
cation. Not only can the identification be done at lower
resolutions using snapshot signatures, but there may also
be an overall savings in the average number of bits of
information (i.e., memory) needed for identifying unknown
scene views.

The smaller average number of bits which may be
required by the SPRT to identify an unknown scene view is
not the only savings in memory which can result from using
snapshot signatures. Feature extraction methods require a
relatively fine resolution, say r1, to insure that signi-
ficant structural differences will be reflected in an

arbitrary encodement of any of the scene views under

. ‘ qu-.i_.p'. .4 ‘ ._
I
r

168

consideration. And since the entire encodement must be
available for extracting features, (r1+l)2 bits of infor-
mation must be simultaneously accessible (i.e., stored in
memory). On the other hand, the use of snapshot signa-
tures and the SPRT may permit a relatively coarse reso-
lution, say ro(<r1), to be used for identifying an unknown
one of the scene views under consideration. And since the
SPRT only requires that one encodement at a time be
available, only (ro+2)2 bits of information must be simul-
taneously accessible (i.e., stored in memory). If ro is
significantly smaller than r1, a sizeable savings in
required memory is thus possible using the SPRT at the
coarser resolution. Of course, considerations regarding
memory savings must be weighed against any increase in
time which may be involved in taking a number of encode-
ments (at r0) instead of just a single encodement (at r1).
Although both the examples of the previous section
considered prototype scene views which differed only in
some length measurement, the methods that were illustrated
apply equally as well when the prototype scene views have
other structural differences (e.g., shape, or overall
size). Additionally, the use of snapshot signatures with
the SPRT to identify an unknown scene view does not
require that the prototype scene views have the same snap-
shot set. In fact, the expected sample size needed by the

SPRT to identify an unknown one of prototype scene views

1‘11,

169

having distinct snapshot sets may be quite small (e.g., if
one prototype generates a snapshot with relatively high
probability, while the other prototype cannot generate that
snapshot at all). Finally, the method of using snapshot
signatures with the SPRT to identify an unknown one of two
prototype scene views can be naturally extended to the
case where there are more than two prototypes by using the 1“
”generalized sequential probability ratio test" (denoted
”GSPRT") [F-4] with snapshot signatures. Of course, any 1
other statistical method of testing multiple or composite
hypotheses could also be used for scene view identification
in such cases, as mentioned in Subsections 4.4.1 and 4.4.2.
It should be noted that the use of snapshot signa-
tures with the SPRT (or GSPRT) in scene view identification
might advantageously be combined with other pattern recog-
nition methods in a picture recognition situation. For
instance, a feature extraction method might be used on an
arbitrary encodement of an unknown scene view v at some
given resolution to eliminate part of the prototype scene
views from further consideration. Then additional encode-
ments of the unknown scene view could be made (not neces-
sarily at the same resolution) and snapshot signatures
with the SPRT (or GSPRT) used to identify v as one of the
remaining prototypes. The effectiveness of such a proce-
dure would obviously depend on the resolution constraints,

the degree of certainty with which an identification is to

170

be made, the features and decision rule used in the feature
extraction, and the decrease in the average sample size
that could be expected by using a sequential test on a
reduced set of prototypes in that particular picture

recognition situation.

4.4.6 Blurry Images

An interesting interpretation of the snapshot
signature of a scene view relative to resolution r results
when the snapshots in the snapshot set are superimposed
one over the other to form a single composite represen-
tation of that scene view. In such a superimposition, the
probability of occurrence associated with each of the snap-
shots can be used to represent an intensity level for the
shading associated with its grid squares. Then, any point
(or region) in the single composite representation of the
scene view can be assigned an overall intensity level
which is the sum of the shading intensities of all the

snapshots which cover that point (or region) in the super-

imposition.
Such a single composite representation -- which will
be denoted as a "blurry image" -- is illustrated in

Figure 4.7 for both of the snapshot signatures in Figure
4.4. The snapshots shown in Figure 4.4 have been super-
imposed with respect to their centers of gravity (denoted
"C.G.", and defined relative to the lower left corner of

each snapshot in Figure 4.4), which is represented by the

. t Mu‘~' r. 1'.

171

a] II.- .
Pcaltlll.I.I.I.-.. “.../w

.v.¢ chamﬁm mo ouauwcmwm ponmmmcm on» %n poucomonmou owmeﬁ xuhzam

.n.e eesmwu

 

I I'll l l

172

crossing point of the symbol ”X" appearing in the center

of Figure 4.7. The resolution shown in Figure 4.7 is 20
times finer than the resolution at which the original snap-
shots were shown in Figure 4.4, which avoids the problem
of having a grid square which is only partially covered by
one or more of the snapshots in the superimposition. The
shaded area of Figure 4.7 represents the region of the
blurry image which is covered by all the snapshots (having
non-zero probability of occurrence) of Figure 4.4, and
hence has shading intensity "1" (= a + B + y + 6). The
other regions in Figure 4.7 are labeled with their shading
intensities by a sum of one or more of 0,8,Y, or 6,
depending upon which of the snapshots (d), (e), (f), and/or
(g) (in Figure 4.4), respectively, cover that region in the
superimposition. The subscripts on a,B,Y, and 6 have been
dropped so that Figure 4.7 can represent the blurry image
of either object in Figure 4.4, depending upon which set

of probabilities -- al’Bl’Yl’ and 61 for the triangle, or
a2,82,y2, and 62 for the trapezoid -- is used to define

the shading intensity levels. Since the shading inten-
sities associated with snapshots (a), (b), and (c) of
Figure 4.4 are zero, these snapshots do not affect the
blurry image, and hence are omitted from consideration.
Notice that by superimposing the snapshots with respect

to their centers of gravity, the symmetry of the original

scene view appears to be preserved in the high intensity

173

(shaded) region of the blurry image, with some of its
significant structural form (e.g., wide base, narrow top)
also preserved in this region.

The blurry image can be considered as a photographic
"multiple exposure” which is taken of a scene view relative
to some encodement resolution r. Theoretically, the blurry
image can be considered as the result of taking "snapshots"
of the scene view with the encodement grid (photographic
film with a "coarse grain") in every possible relative
translational position with respect to the scene view, and
then reproducing all these snapshots (positioned so that
their C.G.'s line up) on the same photographic print (with
a fine grain for good resolution) to form a single
"multiple-exposure" image. The relative shading inten-
sities in the various regions of this single image would
then prOportionately reflect the shading levels of the
corresponding regions of the blurry image.

Practically, a blurry image can be approximated by
taking a finite sample of n "snapshots" of the scene view
with the encodement grid in n independent, randomly chosen
relative translational positions with reSpect to the scene
view. These snapshots can then be superimposed with
respect to their C.G.'s to form a single composite image,
with each point in this image assigned a shading intensity
level equal to the relative number of (i.e., the fraction

of) the n snapshots which cover that point. Then the

- '-..¢_“<+\.’.-o.'.r

174

entire scene view recognition problem can be viewed as a
"template matching" problem [D-Z] between the approximate
blurry image of an unknown scene view and the theoretical
blurry images of the prototype scene views. And effec-
tively, this template matching is what the identification
procedure using the SPRT with snapshot signatures accom-
plishes, sequentially increasing the number of snapshots

n in the random sample until the blurry image approximation
for the unknown scene view matched one of the prototype
blurry images (templates) closely enough to make an identi-

fication with the required level of certainty.

Chapter 5

CONCLUSIONS AND RECOMMENDATIONS

5.1 Summary and Conclusions

 

The recognition of scenes from their encodements
is the central problem throughout the fields of pattern
recognition and picture processing. The various techniques
which have been used to approach this problem range from
deterministic and statistically based decision-theoretic
methods to the formal structural analysis methods of syn-
tactic (linguistic) picture processing. All such
approaches, however, are forced to deal with encodement
images of actual scenes, which represent some level of
degradation of the structural features of these scenes.
Such approximate or inexact representations result from
the finite discretization of scenes which is done to
facilitate automatic processing by machines. And physical
constraints which may restrict the resolutions with which
a scene can be encoded may further amplify the distortion
effects of the discretation, particularly when the maximum
possible encodement resolution is still too coarse to
reflect significant fine structural detail in the scenes

which are to be considered.

175

176

Traditionally, pattern recognition methods have dealt
with a single arbitrary encodement of scenes in identifi-
cation situations. However, such an approach is not always
effective, especially when the encodement resolution is
relatively coarse and the structural form of an encodement
highly dependent upon the relative positioning between the
encodement grid and scene. For such situations, an alter-
native approach is suggested in this thesis which is based
on characterizing a scene view (a scene in fixed orien-
tation) relative to any encodement resolution in terms of
all the distinct encodement patterns (individually called
"snapshots", or collectively the "snapshot set") which can
result by encoding that scene view in various (arbitrary)
relative translational positions with reSpect to the encode—
ment grid. Additionally, this characterization associates
with each snapshot in the snapshot set a probability which
reflects the relative likelihood of that snapshot being the
one which would occur if an encodement grid were randomly
positioned over that scene. This entire characterization
of a scene view -- i.e., the snapshot set relative to
resolution r with its associated probability distribution --
is termed the "snapshot signature" of that scene view
relative to resolution r. It is possible to theoretically
determine the snapshot signature of a scene view, and this
determination is illustrated for a simple line drawing.

For more complex scene views in the class of line drawings,

a procedure exists for approximating their snapshot

O
4'
-.__.___- ,

177

signatures which is based on a linguistic description

scheme for generating the encodement of a scene in a fixed,

but arbitrarily specified, rotational and translational

positioning. This linguistic description scheme (denoted

as an "interpretive coordinate grammar") is presented and

discussed in detail in Chapter 3, and could be considered :1

as a generalization of the "picture-processing grammars" ?

of Chang [C-l]. q
The utility of the snapshot signature characterization

of a scene view in a scene identification context is ;j

discussed in Chapter 4 within the framework of statistical

hypothesis testing. Two examples are presented in which

the coarseness of the encodement resolution makes it impos-

sible to distinguish between any of the scene views being

considered on the basis of the structural characteristics

of any single encodement of an unknown one of those scene

views. Yet a sequential pattern recognition method based

on snapshot signatures is capable of identifying these

scene views with high reliability in this situation. The

only assumption that has to be made is that the unknown

scene view can be encoded repeatedly with independent

and randomly selected translational positionings of the

encodement grid. Hence, the advantage of the snapshot

signature characterization in scene view identification at

coarse encodement resolutions becomes evident. Further, it

is shown that a memory savings is possible in some scene

identification situations -- in terms of both the (average)

178

total amount of information which must be processed, and
the amount of information which needs to be available at
any one time -- by using scene identification techniques
based on snapshot signatures at lower resolutions, rather
than equally reliable scene identification techniques based
on a single encodement examination at higher resolutions.
So, there can be definite benefits gained by using
the multiple encodement characterizations of scenes in
scene identification situations. The general methods pre-
sented for line drawings apply equally as well to other
types of scene structures (e.g., textures), although the
linguistic description scheme for generating encodements
would no longer be applicable. It is also important to
note that scene views (scenes in fixed orientation) are
considered instead of scenes, because of the decreased
complexity involved in processing scene views. Such
considerations suffice to demonstrate the practical
advantage of using multiple encodements at a given
resolution. However, the general methods developed for
characterizing scene views by their multiple encodements
could be applied equally as well to characterizing scenes
by their multiple encodements, albeit with a considerable

increase in complexity.

5.2 Suggestions for Future Work

 

The extension of snapshot signature character-

izations to arbitrary scenes (i.e., those without a fixed

179

orientation) would be highly desirable. Rotational con-
siderations based on scene views alone are awkward, at best.
For instance, if an arbitrary scene were to be identified
using the snapshot signature approach based on translational

considerations only, a finite number of fixed rotational

orientations of that scene would have to be used to define

separate views of that scene, with each such scene view

- .. A4"L‘s' .a.. ,

then considered individually in the identification process.
By making the rotational increments which define each such
scene view small enough, it may be feasible to use this #8
scheme to recognize a scene in any arbitrary rotational
orientation. However, a more direct approach is suggested
whereby the snapshot signature characterization of a scene
view is extended to include all the snapshots it can
generate for all possible relative translational and
rotational positionings of an encodement grid over that
scene view. These "rotationally extended" snapshot signa-
tures can then be used to characterize the entire scene,
rather than just a view of that scene, relative to some
encodement resolution. Identification procedures Could

thus be based on the extended characterization of a scene.
The theoretical computation of a "rotationally extended"
snapshot signature will undoubtedly be quite complex
(especially in regard to determining the probability distri-
bution over the snapshot set) for all but the simplest of

scene structures. But for binary encodements of line

180

drawings, the linguistic algorithm of Chapter 3 can be
applied directly to generate the encodement of a scene in
any fixed, but arbitrary, translational and rotational
positioning with respect to the encodement grid. Hence, an
effective procedure for approximating the "rotationally
extended" snapshot signature of a scene can easily be
develOped, based on making repeated encodements of that
scene for various independently and randomly chosen rota-
tional orientations and relative translational positionings
of that scene with respect to the encodement grid.

Another area suggested for future work is the investi-
gation of syntactic picture identification based on a for—
mal linguistic description of the multiple encodement
characterizations of scene views (or scenes) relative to
some resolution. Unfortunately, the linguistic description
scheme presented in Chapter 3 is not readily suited to
conventional grammatical parsing techniques, since many of
its rules ("productions") are really "production schemata",
each of which represents an infinite number of rules (based
on the different possible values of the coordinates). The
determination of the equivalence classes of those scene
views which can generate the same snapshot sets at a given
resolution can be helpful in this respect, as there will be

only a finite number of such equivalence classes at a given

T
m ...“_‘...".—‘

181

resolution.1 A linguistic description scheme could then
be based on this finite number of equivalence classes of
scene views, instead of on an infinite number of scene
views. Of course, such a description scheme would not be
useful for reflecting the probability distribution which
each scene view (or scene) associates with its snapshot
set, as individual scene identities would be lost in the
equivalence classes.

Fu and Huang ([F—5] and [H-6]) have presented a method
for incorporating probabilities of scene view occurrences
directly into the production rules of a linguistic scene
description scheme. The result is that any scene descrip-
tion generated from the linguistic scheme has its prob-
ability of occurrence automatically associated with it.
The extension of this method to a linguistic description
scheme which could produce scene encodements (snapshots)
along with their associated probability of occurrence

suggests an interesting area for further research.

 

1 This is because there are only a finite number of
distinct snapshots possible at a finite encodement reso-
lution, and hence only a finite number of distinct sets
of these snapshots.

, ;
'L_“-
L

0
I

. u ' I
A . «uh MIJ.‘.. I‘ll IUII ills-ll... III. 1

i1 .
. . ...... ..l. 53‘ ..
I . (l h...«.. 1.10:-

B IBLIOGRAPHY

 

.‘J.\ ...

1i. 11..”13: I.‘ 1.

«iii: 5... 0 “Fill.- .1 .. I

BIBLIOGRAPHY

[A-l] Andrews, Harry C. Computer Techniques in Image
Processing. New York: Academic Press, 1970.

 

 

[C-l] Chang, Shi-Kuo. "Analysis and Synthesis of Two-
dimensional Patterns Using Picture-processing
Grammars." Ph.D. Dissertation, University of
California at Berkeley, 1969.

[D-l] Dacey, M. F. "The Syntax of a Triangle and some
other Figures." Pattern Reco nition (Special Issue
on Conceptual Aspects of’Two- imensional Picture
Processing), Vol. 2, No. 1 (Jan., 1970), pp. 11-31.

 

[D-Z] Duda, R. 0. "Elements of Pattern Recognition."
Adaptive, Learning and Pattern Recognition Systems.
Editedby J. M. Mendel ande. S. Fu. New York:
Academic Press, 1970. pp. 3—33.

 

[F-l] Feder, Jerome. "Languages of Encoded Line Patterns."
Information and Control, Vol. 13, No. 3 (Sept.,
1968), pp. 230-244.

 

[F-Z] Fischler, M. A., and Elschlager, R. A. The Repre-
sentation and Matching of Pictorial Structures.
Lockheed Missles and Space Company Information
Sciences Palo Alto Research Laboratory Technical
Report No. LMSC-D243781. Palo Alto, Calif.:
Lockheed Missles and Space Co., Sept. 1971.

 

 

[F-3] Fu, K. S. Sequential Methods in Pattern Recognition
and Machine Learning. New York: Academic Press,
1968.

 

 

[F-4] Fu, K. S. "Statistical Pattern Recognition."
Adaptive, Learning and Pattern Recognition Systems.
Edited'by j. M. Mendel and’K. S. Fu. New York:
Academic Press, 1970. pp. 35-79.

 

[F-S] Fu, K. S. On Syntactic Pattern Recognition and
Stochastic Languages. Purdue UniverSity School of
Electrical Engineering Technical Report No. TR-EE
71-21. Lafayette, Indiana: Purdue University,
Jan., 1971.

 

 

182

v” tut-‘mi'? I.- -.
I-

[F-6]

[G-1]

[G-Z]

[H-1]

[H-2]

[H-3]

[H-4]

[H-5]

[H-6]

[L-1]

[L-ZI

[M-ll

183

Pu, K. S., and Swain, P. H. "On Syntactic Pattern
Recognition.” Software Engineeriyg. Vol. 2 of the
Proceedings of the Third International Symposium on
Computer and Information Sciences (Miami, Fla.,
1969). Edited by Julius T. Tou. 2 vols. New York:
Academic Press, 1971. pp. 155-182.

 

Gans, David. Transformations and Geometries.
New York: Appleton-Century-Crofts, 1969.

 

The Mathematical Theory of
New YOrk: MCGraw-Hill

Ginsburg, Seymour.
Context Free Languages.
Book Co., 1966.

 

 

Harary, Frank. Graph Theory. Reading, Mass.:
Addison-Wesley Publishing Co., 1969.

 

Harlow, Charles; Synder, Arthur; and Lee, Shin Maan.
"Image Analysis and Line Identification." Pro-
ceedings of the UMR-MERVIN J. KELLY Communicifions
Conference. Rolla, Mo.: University of Missouri,
October 5-7, 1970, pp. 6-3-1 thru 6-3-9.

 

 

Harlow, Charles; Caudill, Patrick; Lehr, J.; and
Parkey, R. "On Image Analysis." Proceedings of
the UMR-MERVIN J. KELLY Communications Conference.
Rolla, Mo.: University of Missouri, October 5-7,
1970, pp. 6-4-1 thru 6-4-6.

 

 

Hoel, Paul G.; Port, Sidney C.; and Stone, Charles

 

J. Introduction to Statistical Theory. Boston:
Houghton Mifflin Co., 1971.

H0pcroft, John E., and Ullman, Jeffrey D. Formal
Languages and Their Relation to Automata. Read1ng,

 

Mass.: Addison-Wesley Publishing Co., 1969.

"On Stochastic Context-
1971,

Huang, T., and Fu, K. S.
Free Languages." Information Sciences, Vol. 3,
pp. 201-224.

 

Lang, Serge. Basic Mathematics. Reading, Mass.:
Addison-Wesley Publishing Co., 1971.

 

”Feature Extraction: A Survey."
57, No. 8 (August,

Levine, Martin D.
Proceedings of the IEEE, Vol.
1969), pp. I39l-1407.

 

Meyer, Paul L. Introductory Probability and
Statistical Applications. REading, Mass.:
Wesley PubliShing Co., 1965.

 

Addison-

 

[M-Zl

[M*3]

[M-4]

[N-1]

[N-Zl

[P-1]

[P-2]

[P-3]

[P-4]

[R*1]

[R-Zl

[R-3]

[R-4]

184

Michie, Donald. "The Intelligent Machine." Science
Journal, Vol. 6, No. 10 (October, 1970), pp. 50-54

Minsky, Marvin. "Steps Toward Artificial Intelli-
gence." Computers and Thought. Edited by Edward
A. Feigenbaum and Julian Feldman. New York:
McGraw-Hill Book Co., 1963. pp. 406-450.

 

Mood, Alexander M., and Graybill, Franklin A.

 

Introduction to the Theory of Statistics. 2d ed.
New York: McGraw-Hill Book Co., 1963.
Narasimhan, R. "Picture Languages." Picture

Edited by S. Kaneff. London:
pp. 1-30.

 

Language Machines.
Aca emic Press, 1970.

Narasimhan, R., and Reddy, V. S. N. "A Syntax- I
Aided Recognition Scheme for Handprinted English I
Letters." Pattern Recognition, Vol. 3, No. 4 (Nov., U"
1971), pp. 345-361.

 

Pattern Recognition Society. Pattern Reco nition,
(Special Issue on Optical Character’Recogn1tion),
Vol. 2, No. 3 (Sept. 1970).

 

Pattern Recognition Society. Pattern Recognition,
(Special Issue on Syntactic Pattern Recognition --
Part One), Vol. 3, No. 4 (Nov., 1971), pp. 341-448.

 

Pattern Recognition Society. Pattern Recognition,
(Special Issue on Syntactic Pattern Recognition --
Part Two), Vol. 4, No. 1 (Jan., 1972), pp. 1-144.

 

Pfaltz, John L., and Rosenfeld, Azriel. "Web
Grammars." Proceedings of the International Joint
Conference oniArtifiCial Intelligence. Editedby
Donald E. Walker and’Lewis M. Norton. Washington,
D.C., May 7-9, 1969, pp. 609-619.

 

 

Ribak, Richard. "N-Dimensional Grammar for Pattern
Recognition." Ph.D. Dissertation, The University of
Connecticut, 1971.

Rosenberg, B. "Character Blurring." The Australian
Computer Journal., Vol. 3, No. 1 (February, 1971),
pp. 43-46.

 

 

Rosenfeld, Azriel. Picture Processing by Computer.
New York: Academic Press, 1969. ‘

 

Rosenfeld, Azriel. "Picture Processing by Computer."
Computing Surveys, Vol. 1, No. 3 (Sept. 1969),
pp. 147-1760

 

[5-1]

[T-1]

[V-ll

[W-1]

[W-ZI

[W-3]

185

Shaw, Alan C. "Parsing of Graph-Representable
Pictures." Journal of the Association for Computin
Machinery, Vollil7, No. 3 (July, 1970), pp. 453—4 .

 

 

Tou, J. T. "Feature Extraction in Pattern Recog-
nition." Pattern Recognition, Vol. 1, No. 1 (July,
1968), pp. 3-11.

 

Viglione, S. 8. "Applications of Pattern Recogni-
tion Technology." Adaptive, Learning, and Pattern
Reco nition Systems. Editediby J. M. Mendel and
K. . Fu. New Yofk: Academic Press, 1970.

pp. 115-162.

 

 

Wald, Abraham. Sequential Analysis. New York:
John Wiley and Sons, 1947.

 

Weiman, Carl F. R., and Rothstein, Jerome. Pattern
Recognition bnyetina Like Devices. Ohio State
University Computer and’lnformation Science Research
Center Technical Report No. OSU-CISRC-TR-72-8.
Columbus, Ohio: Ohio State University, July, 1972.

 

Wetherill, G. Barrie. Sequential Methods in
Statistics. London: Methuen and Co., 1966.

 

 

General References

Chang, S. K., and Shelton, G. L. "Two Algorithms
for Multiple View Binary Pattern Reconstruction."
Proceedings of the UMR-MERVIN J. KELLY Communi-
cations Conference. Rolla, Mo.: University of
Mﬁssouri, October 5-7, 1970, pp. 20-2-1 thru
20—2-8.

 

 

Fink, Donald G., and Lutyens, David M. The Physics
of Television. Garden City, New York: AnChor
Books, Doubleday and Co., 1960.

 

Glasford, Glenn M. Fundamentals of Television
Engineering. New York} McGraw-Hill Boék Co., 1955.

 

Gordon, Richard, and Herman, Gabor T. "Reconstruc-
tion of Pictures from Their Projections." Communi-
cations of the ACM, Vol. 14, No. 12 (Dec., 19715,
pp. 759-768.

 

186

IEEE Computer Society. IEEE Transactions on Com-
puters (Special Issue on Two-Dimensional Signal
Processing), Volume C-Zl, No. 7 (July, 1972),

pp. 633-836.

Kaneff, S. (ed.). Picture Language Machines.
London: Academic Press, 1970i

Knuth, Donald E. The Art of Computer Programmimg.
Vol. I: Fundamental Algorithms. Reading, Mass.:
Addison-Wesley Puinshing Co., 1968.

 

Is.

TYL

WCIIIIHIINIHISIIIIIII IIIIIIIIIIIIH | HIIIIIIIIIS
3 1293 03082 2971