LINGUISTICALLY ASSISTED RECOGNITION OF PATTERNS FROM THEIR MULTIPLE TRANSLATIONAL ENCODEMENTS

Thesis for the Degree of Ph. D.
MICHIGAN STATE UNIVERSITY
WILLIAM ARTERBURN BURDETTE
1972

ABSTRACT

LINGUISTICALLY ASSISTED RECOGNITION OF PATTERNS FROM THEIR MULTIPLE TRANSLATIONAL ENCODEMENTS

By William Arterburn Burdette

The classical pattern recognition approach to identifying an unknown picture as one of a finite number of prototypes is based on the extraction of features from a single, arbitrary, discretized encodement of that unknown picture. For this approach to be effective, the encodement grid resolution must be fine enough to insure that distinguishing detail is reflected in the encodement of each of the prototype pictures -- regardless of the relative positioning of that picture with respect to the encodement grid. However, in practical situations where such a sufficiently fine resolution may not be feasible, alternative approaches to picture identification are needed.

One such alternative approach investigated in this thesis characterizes a picture of fixed orientation relative to an arbitrary encodement resolution in terms of (1) all the distinct encodement patterns which can result by encoding that picture in all possible relative translational positionings with respect to the encodement grid, and (2) a probability of occurrence for each of these distinct encodement patterns given a random placement of the encodement grid over the picture. This characterization is called the snapshot signature of the picture of fixed orientation relative to the specified encodement resolution, and constitutes a composite representation of the information contained in all the distinct positional encodements of that picture.

Snapshot signature characterizations are considered for two-dimensional line drawings. A procedure is presented for approximating the snapshot signatures of such pictures whose structural complexity may make the theoretical determination of their snapshot signatures difficult. This procedure is based on a linguistic algorithm which generates the encodement of a picture in an arbitrarily specified position relative to a given encodement grid, using rules resembling the productions of a phrase structure grammar.

The utility of the snapshot signature characterization is demonstrated in picture identification at coarse encodement resolutions where the classical pattern recognition approach fails. Further, it is shown that memory savings can sometimes be realized by using sequential statistical picture identification procedures based on snapshot signatures at coarse resolutions, rather than equally reliable feature extraction techniques at finer resolutions. Hence, snapshot signature characterizations can provide a useful alternative to increasing encodement resolution for picture identification.
LINGUISTICALLY ASSISTED RECOGNITION OF PATTERNS FROM THEIR MULTIPLE TRANSLATIONAL ENCODEMENTS

By
William Arterburn Burdette

A THESIS
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Department of Computer Science
1972

TO MY PARENTS

ACKNOWLEDGMENTS

I wish to express my gratitude to Dr. Carl V. Page for serving as my guidance committee chairman and thesis advisor. He willingly devoted a generous portion of both professional and personal time to discussions concerning this research, for which I am most appreciative. His patience, encouragement, and valuable assistance in directing the course of this research are gratefully acknowledged.

I also wish to thank Dr. Robert O. Barr, Dr. Julian Kateley, Jr., Dr. John B. Kreer, and Dr. Morteza A. Rahimi for serving on my guidance committee and reviewing this thesis, as well as for the many inspiring classes which they provided me as a student.

For the many educational opportunities and generous financial support provided throughout my graduate studies, I am highly grateful to: the National Science Foundation; the National Aeronautics and Space Administration; and the Division of Engineering Research, the Department of Computer Science, and the Graduate Office at Michigan State University.

I wish to thank my typist, Miss Patricia J. Martens, for her conscientious and dedicated preparation of this manuscript. Also, thanks go to Dr. Floyd E. LeCureux for providing the excellent final drawings of the figures in this thesis. For the personal sacrifices made by both of these individuals to insure the rapid completion of this final draft, I am deeply indebted.

Very special thanks go to my wife Marjorie for her understanding, support, patience, and loyalty throughout this research endeavor. Her love and encouragement provided a special source of inspiration and incentive, which in large measure sustained my efforts throughout my education.

TABLE OF CONTENTS

LIST OF FIGURES . . . vii

Chapter
1. INTRODUCTION . . . 1
   1.1 Literature Survey . . . 6
   1.2 Contributions of the Thesis . . . 16
   1.3 Organization of the Thesis . . . 19
2. SCENES AND THEIR GRID ENCODEMENTS . . . 20
   2.1 Basic Concepts . . . 20
   2.2 Scenes . . . 21
   2.3 Scene Encodements and Snapshots . . . 26
   2.4 Screens and Grids . . . 31
3. "LINGUISTIC" SNAPSHOT GENERATION . . . 38
   3.1 Description of Translationally-Fixed Scene Views . . . 38
   3.2 Resolution Effects on Descriptions . . . 42
   3.3 Algorithmic Encodement of T-F Scene Views . . . 49
       3.3.1 The "Linguistic" Algorithm . . . 52
       3.3.2 Remarks on Notation . . . 54
       3.3.3 Method of the Algorithm: A Synopsis . . . 56
       3.3.4 An Example . . . 62
       3.3.5 Method of the Algorithm: A Detailed Explanation and Verification . . . 71
       3.3.6 Semantic Interpretation: Two Special Cases . . . 91
       3.3.7 Grid Extension . . . 97
   3.4 Algorithmic Encodement of Snapshots . . . 101
4. SCENE VIEW RECOGNITION WITH SNAPSHOT SIGNATURES . . . 110
   4.1 Scene View Recognition: General Considerations . . . 110
   Snapshot Sets . . . 112
   Snapshot Signatures . . . 128
   The Utility of Snapshot Signatures . . . 140
   Scene View Recognition: The Comparison of Snapshot Signatures . . . 141
   Scene View Recognition: The Testing of Hypotheses . . . 143
       The Likelihood Ratio and Sequential Scene View Identification . . . 145
       Two Examples . . . 150
       Practical Considerations . . . 162
   Blurry Images . . . 170
5. CONCLUSIONS AND RECOMMENDATIONS . . . 175
   5.1 Summary and Conclusions . . . 175
   5.2 Suggestions for Future Work . . . 178
BIBLIOGRAPHY . . . 182
   General References . . . 185

LIST OF FIGURES

Figure
2.1  Examples of scene description modification to meet the "junction constraint" . . . 23
2.2  Examples of typical scenes . . . 25
2.3  Snapshot examples . . . 28
2.4  More snapshot examples . . . 29
2.5  Examples of finite grids . . . 37
3.1  A translationally-fixed scene view and its set notational representation . . . 41
3.2  Flow chart interpretation of the method upon which the "linguistic" encodement algorithm is based . . . 60
3.3  Flow chart detail of block (E) in Figure 3.2 . . . 61
3.4  T-F scene view for algorithmic encodement example . . . 63
3.5  Encoded result of T-F scene view of Figure 3.4 on the grid of resolution 4 . . . 64
3.6  Initial component of flow chart for "linguistic" algorithm application . . . 86
3.7  Terminal component of flow chart for "linguistic" algorithm application . . . 87
3.8  Examples of extended grids . . . 100
4.1  Two examples of snapshot sets . . . 119
4.2  A composite scene view and its snapshot set . . . 125
4.3  Another composite scene view and part of its snapshot set . . . 127
4.4  Example of two object views with the same snapshot set, but different snapshot signatures . . . 129
4.5  Snapshot signature of a 3 x 5½ grid unit rectangle . . . 135
4.6  The eight subregions of R which define the probability distribution for the snapshot signature of the rectangle in Figure 4.5 . . . 138
4.7  Blurry image represented by the snapshot signature of Figure 4.4 . . . 171

Chapter 1

INTRODUCTION

Automatic machine processing of pictorial information has become increasingly prevalent in recent years. The capability of the modern digital computer to rapidly process large amounts of data has been effectively utilized for the recognition of patterns in a variety of applications. For example, textural patterns occurring in satellite cloud imagery have been successfully analyzed from video data obtained by the TIROS and NIMBUS meteorological satellites [V-1]. Features of the lunar terrain have also been successfully analyzed using video patterns obtained from Air Force Lunar Atlases [V-1]. More common applications of pattern recognition schemes are evident in optical character recognition done by bank check processors, automatic mail sorters (including mail with handwritten addresses), and document readers [P-1]. Medical applications of pattern recognition have also been demonstrated, including automatic sleep state classification of human subjects from their electroencephalograph recordings [V-1].

The overall problem in many pattern recognition situations is that of identifying, or classifying, an actual picture or scene relative to a finite set of prototypes, where the identification is based on the information included in one or more inexact representations of that actual scene.
These inexact representations are usually necessary because the machines involved in data collection or analysis can physically represent or process only a finite amount of information. Hence, when two-dimensional scenes are to be automatically processed, they are often discretely encoded on some finite two-dimensional grid, with individual grid squares being shaded with varying degrees of intensities (gray levels) to reflect the nature of the scene in the region covered by that grid square ([R-4] and [D-2]). In the case of line drawings, this shading is generally done with two gray levels, resulting in binary grid encodements. A grid square is shaded in such an encodement if any part of the scene view lies within the borders of that grid square; otherwise, the grid square is left unshaded. Clearly, the resolution of the grid (i.e., its relative fineness) determines the degree of accuracy with which the binary encodement represents the structural features of the actual scene view.

In any practical picture processing situation, the optimal encodement grid resolution would be the one which is as coarse as possible to eliminate the processing of unnecessary detail, yet fine enough to reflect the significant structure of a scene so that it can be unambiguously identified or classified. Of course, the degree of encodement accuracy needed for unambiguous identification is highly dependent upon the class of scenes involved, and also upon the method used to make the identification. There are two basic methods which have been used for making this identification in pattern recognition [N-1]: (1) the statistical "decision-theoretic approach," which classifies an unknown scene view based on structural features which can be extracted (measured) from its encodement and then compared statistically to known values of these features for a finite set of prototypes; and (2) the linguistic or "descriptive approach," which analyzes or identifies an unknown scene view in terms of the structural relationships which exist between its component parts or the component parts of known prototypes, based on a "picture language" which formally describes the structural relationships which can exist for the scenes under consideration. Combinations of these two approaches have also been proposed, and fall under the general classification of "syntax-aided analysis" (as opposed to "syntax-directed" or "syntax-controlled" analysis, which is just approach (2)) [N-2]. However, in all of these approaches only a single, arbitrary encodement of the unknown scene view is processed. Further, it has generally been the case that the encodement resolution was heuristically picked, under the assumption that the resolution could be made as fine as necessary for the identification procedure to be used effectively ([D-2], [H-2], [H-3], [N-2], [V-1]).

This last assumption has practical repercussions in cases where grid resolutions may be constrained by the physical aspects of the situation. For instance, an encodement grid might be realized by a remote sensing TV camera, which has a maximum resolution inherently determined by its design. Or the encodement grid might be realized by photographic film, whose resolution has a practical limit determined by its grain size. So it may not always be feasible to make the grid resolution arbitrarily fine in scene identification situations, and alternative solutions to this problem must be considered.
One classically proposed alternative to increasing the grid resolution is to improve the identification methods used at a given resolution by finding features which have a higher discrimination quality between the prototype classes ([D-2], [F-3], [L-2], [T-1]), or by "sharpening" the image at the current resolution by various image enhancement techniques ([L-2], [R-2], [R-3]). However, such approaches are still based on the examination of a single, arbitrary encodement of the unknown scene view at the encodement resolution, and thus ignore the effect that the relative positioning of the encodement grid may have on the fidelity of an encodement. But since the form of the encodement of a scene can be significantly affected by the relative positioning of the encodement grid (especially at coarse resolutions), it seems reasonable to expect that additional information about the structure of the unknown scene view could be obtained by examining more than one of its encodements at a given resolution. Hence, this thesis focuses on the investigation of how these multiple encodements of an unknown scene at a given resolution can be utilized in classifying that unknown scene with respect to a finite number of prototype classes. To maintain a reasonable level of complexity, this investigation is limited to examining only those multiple encodements which can result from translations of an encodement grid over a scene having a fixed rotational orientation. However, the methods developed for this restricted case can be generalized to the cases where rotational considerations are important.

The approach taken here in the identification of unknown scenes using their multiple translational encodements at some grid resolution is a "linguistically-aided" approach [N-2] -- i.e., both the linguistic and statistical decision theoretic concepts of scene analysis are combined. The linguistic techniques are used to generate approximate encodement characterizations for each of the prototype scene classes relative to some encodement resolution whenever these encodement characterizations cannot be conveniently determined theoretically (as would be the case for scenes having a complex structure). Such encodement characterizations -- theoretical or approximate -- are based upon the various structural encodement forms which can arise, and how likely it is that each one will occur, when the encodement grid is translated over the different scenes in the prototype classes. Now, given these encodement characterizations for the prototype classes, statistical decision-theoretic techniques can then be applied to identify an unknown scene view by comparing an approximation of its encodement characterization (based on a random sample of its different translational encodements) with the "known" encodement characterizations for the prototype classes. The effectiveness of such a combined approach to scene identification based on multiple translational encodements is demonstrated in this thesis. In particular, examples are presented to illustrate the capability of this approach to correctly identify unknown scenes at coarse encodement resolutions where other pattern recognition schemes often fail.
1.1 Literature Survey

The "classical" approach to the recognition of patterns by machine divides the process into three basic parts [D-2]: (1) "representation" (encoding or sensing), (2) "feature extraction" (the selection and measurement of significant attributes), and (3) "classification" (the identification or decision process).

The gray-level quantization with discretized pictures (already mentioned) is the predominant "representation" scheme used in the literature for line drawing and textural analyses. Such encodement representations have been shown [R-4] to have significant redundancy (in terms of information content) in many situations, and "picture compression" (data compression) can be done with more "efficient encoding" techniques -- such as "block coding," "predictive coding," or "run coding" -- which more directly reflect the information content in a discretized representation of a scene. A detailed description of such "picture coding" and "picture approximation" techniques can be found in [R-3].

The "feature extraction" portion of the classical pattern recognition approach deals with taking selected measurements -- called "features" -- from an encodement. These "features" should represent significant attributes which can be used to distinguish between the various classes of prototype scenes under consideration. Then, the values of the features extracted from the encodement of an unknown scene view can be used effectively to identify or classify that scene. For instance, significant features in a character recognition situation might be the number of vertical (or nearly vertical) line segments in a character, or the relative degree of straightness along the edge of a character [D-2]. In any event, this selection of significant features to be used for classification is a difficult problem, and various approaches to its solution are discussed in [D-2], [F-3], and [T-1].

The identification of the various "features" of an unknown scene from its encodement can also be a difficult task, since a discrete encodement can only reflect degraded structural characteristics of the actual scene. Further, in practical situations, the distortion and noise introduced by physical sensing machinery or data transmission channels can further degrade the quality of encodement representations which must be analyzed. Many techniques have been introduced for "sharpening" a picture before features are extracted, and such techniques come under the broad classification of "image enhancement". Two excellent surveys on image enhancement techniques related to feature extraction can be found in [L-2] and [R-4], with more detailed presentations given in [A-1] and [R-3].

Numerous "decision rules" have been proposed for implementing the classification stage of the "classical" approach to pattern recognition. One of the most straightforward is known as "template matching" ([F-2], [F-3]), in which the entire encodement of an unknown scene is compared with known encodements (templates) of a set of prototype scenes. The comparison is based upon a "preselected matching or similarity criterion", and the prototype scene view whose template most closely matches the encodement of the unknown scene view is the one with which the unknown scene is identified.
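To make the template-matching decision rule concrete, the following Python sketch (not from the thesis) classifies a binary grid encodement by choosing the stored prototype template at the smallest Hamming distance; the particular grids, labels, and the use of Hamming distance as the "preselected matching or similarity criterion" are illustrative assumptions.

```python
def hamming(a, b):
    """Number of grid squares on which two equal-sized binary encodements differ."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

def template_match(unknown, templates):
    """Return the label of the prototype template closest to the unknown encodement."""
    return min(templates, key=lambda label: hamming(unknown, templates[label]))

# Hypothetical 3 x 3 binary encodements (1 = shaded grid square).
templates = {
    "vertical bar":   [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "horizontal bar": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
}
unknown = [[0, 1, 0], [0, 1, 0], [0, 0, 0]]
print(template_match(unknown, templates))   # -> "vertical bar"
```

Other similarity criteria (e.g., normalized correlation) would slot into the same structure; only the distance function changes.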
This "template-matching approach can be interpreted as a special case [of the] 'feature-extraction' approach, where the templates are stored in terms of feature measurements and a special classification criterion (matching) is used for the classifier."1 Deterministic classification techniques have also been proposed which are based on "discriminant functions" [F-3] defined for each prototype scene class. These discriminant functions operate on a vector of feature measurements taken from the encodement of an unknown scene, and the prototype scene class whose discriminant function has the largest value is the one with which the unknown scene is identified. Other classification techniques which have become increasingly popular in recent years [N-1] are statistically based. Such techniques result from formulating the scene recognition problem as a hypotheses testing problem, and then using standard statistical hypotheses testing procedures for performing scene "classification." These standard statistical hypotheses testing procedures are described in [H-4], [M-1], [M-4], [W-1], and [W-3], and their application specifically to pattern recognition situations is covered in [F-3] and [F-4]. These statistical techniques are discussed in more detail in Chapter 4.

1 K. S. Fu, Sequential Methods in Pattern Recognition and Machine Learning (New York: Academic Press, 1968), p. 3.

Several extensions to the three part classical approach to pattern recognition have been considered. These relate to the consideration of contexts in an encodement [D-2], the "optimum decision problem" (determining the "best" classification techniques for a given situation), and the "adaptation problem" (determining a classification technique which can learn from, and adjust to, various types of picture recognition environments) [T-1]. The "adaptation" considerations border on the area of artificial intelligence [M-3], and are especially important in remote sensing applications (e.g., unmanned spacecraft). Discussions concerning training and learning in pattern classifiers can be found in [F-3] and [F-4].

Another basic approach to pattern recognition has evolved over the past ten years, due to the need for more powerful and more general methods which are applicable to complex picture recognition situations. This approach is the "linguistic" one, based on the description of a picture in terms of its component parts and the relationships which exist between these components [F-6]. Such a description might either be "generative" (i.e., useful for generating some representative of the described picture class) or "interpretive" (i.e., the result of the analysis of a given scene view) [N-1]. Most of the schemes which have been devised for formal picture descriptions are based on the rewriting systems, or grammars, of formal language theory ([G-2], [H-5]). This is a consequence of the fact that the "hierarchical structure" among the subpatterns which are components of a larger pattern is often "analogous to the syntactic structure of languages", and formal language theory provides a convenient description for such structure (via the grammar) and a method for analyzing this structure (parsing) [F-6]. Hence, the name "syntactic" picture processing has been adopted. Since formal language theory has been developed on the basis of one-dimensional strings of characters, some of the first attempts at syntactic picture processing involved the encoding of a 2-dimensional pattern into a one-dimensional string.
One well known procedure for doing this is Freeman's "chain encoding" method for line patterns, in which "...a rectangular grid is overlaid on the two-dimensional pattern and straight line segments are used to connect the grid points falling closest to the pattern... Each line segment is assigned an octal digit according to its slope. The pattern is then represented by a (possibly multiply connected) chain or chains of octal digits."2 Feder [F-1] has used this encoding scheme to form pattern languages based on equations in two variables (lines, circles, etc.) and pattern properties (convexity, etc.). A somewhat more general approach to picture description (and analysis) was presented by Shaw [S-1] in his "Picture Description Language." Shaw uses strings to encode pictures which can be represented by multiply connected graphs, but this method has been criticized because it does not permit any contextual considerations to be considered in either picture generation or analysis [F-6].

2 K. S. Fu and P. H. Swain, "On Syntactic Pattern Recognition," in Software Engineering, ed. by Julius T. Tou (New York: Academic Press, 1971), pp. 158-159.

One of the major problems with one-dimensional string representations of higher dimensional patterns is that each pair of characters in the string ("features", or primitives, of the pattern) are related in exactly one way, according to their relative position in the string. Hence, only a single relationship can be expressed between any two primitive features of the pattern. The practical problems which such a restriction can cause are evident in [M-2], where a robot is described which needs to identify multiple relationships between the primitives of a scene in order to identify that scene. A method for linguistically considering multiple contextual relationships in generating scene descriptions is presented in [D-1], in which encodement patterns in the form of simple geometric figures (e.g. a triangle) are generated on a grid. During the generation, symbols are introduced on the grid on the basis of the symbols which already appear in up to all eight of the neighboring grid squares. Unfortunately, there was no discussion regarding how one might use these two-dimensional descriptions in a parsing scheme.

A "natural" generalization of a string language to two dimensions is represented by the "web grammars" defined by Pfaltz and Rosenfeld [P-4]. The language defined by a web grammar consists of directed graphs with symbols at their vertices ("webs"), rather than a one-dimensional string of characters. The web generation rules can be used to successively imbed subwebs into a larger web, enabling a complex two-dimensional pattern to be described. Since the embeddings can be done based on the consideration of two or more nodes (vertices) at a time, contextual considerations can be reflected in web grammar descriptions. Hence, if each directed edge between two nodes in a web is considered as a primitive of a pattern, the web grammar can be considered as a generalization of Shaw's "Picture Description Language" which allows contextual considerations.

Several two- (and higher-) dimensional linguistic description schemes which are capable of reflecting multiple (and often arbitrary) relationships between scene primitives have recently been presented. One scheme uses the concept of a "relationship matrix" [R-1] to describe the features, and relationships existing between these features, of arbitrary n-dimensional scenes (particularly line drawings).
In the two-dimensional case, the features of the scene (e.g. line segments, points, or even objects) label both the rows and columns of the 2-dimensional relationship matrix, and each entry in the matrix is a vector specifying significant relationships (e.g. angle, distance, connectivity, etc.) which exist between the two features which are the labels of that row and column in the matrix. The on-diagonal entries in the relationship matrix describe the features of the scene themselves (e.g. their length or dimensions, their orientation, etc.). Grammars based on the generation and manipulation of arbitrary relationship matrices can be used to produce scene descriptions (in relationship matrix form), and such grammars have proven useful in identifying an unknown representative of four prototype classes of handwritten characters through a "parse" of the unknown character's structural relationships [R-1]. While the relationship matrix concept can be used as the data base for syntax-directed generation and analysis of actual scene views, a "picture processing grammar" has been proposed by Chang [C-1] which can be used to generate and analyze arbitrary patterns of shaded grid squares on a two-dimensional encodement grid. Such patterns can be considered as those which might result from a discretized representation of an actual scene view. The grammar uses coordinates assigned to each square of the encodement grid to generate descriptions of these patterns based on certain significant groupings of the shaded grid squares -- e.g., those which form horizontal strings, or "lines", on the encodement grid, which can be described by the special symbol "h-line" followed by parameters which specify both the length of (i.e. the number of shaded grid squares in) the string and the coordinates of the leftmost shaded grid square in the string. The construction of picture processing grammars which can generate a given set of shaded grid square patterns on an encodement grid was considered by Chang, and such a grammar was constructed for describing the set of patterns which represent the encodements of handwritten numerals (0 through 9). This grammar was then used to analyze arbitrary samples of encoded handwritten numerals, resulting in the correct identification of approximately 70% of the samples [C-1]. Other methods of syntactically processing scene encodements have recently been developed, and an excellent survey of these methods can be found in [P-2] and [P-3].

One interesting pattern recognition scheme which is neither statistical nor linguistic was proposed by Weiman and Rothstein [W-2]. The scheme is based on using parallel processing cellular automata to recognize straight line patterns on an n x n grid. The encodement of a straight line is given by a string of binary digits which can be constructed by tracing the line from left to right on the encodement grid, and writing a "0" for each grid square crossed through parallel sides and a "1" for each grid square crossed through perpendicular sides, omitting the next grid square crossed in this latter case. Such a code reflects the relative positioning of a line with respect to the grid squares, and cyclic shifts occur in the code as the grid is translated over the line to produce different encodements. The structure of the code and its translational cyclic shifting pattern can be used to characterize the line on the encodement grid.
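The following Python sketch is an illustration of the crossing code just described, not code from [W-2]; it is limited, for simplicity, to a line with slope strictly between 0 and 1 that spans the grid from its left border to its right border and never passes exactly through a lattice point. Under those assumptions a column contributes a "1" exactly when the line crosses a horizontal grid line inside that column (a perpendicular exit, with the square entered above it being the omitted one), and a "0" otherwise.

```python
import math

def line_code(intercept, slope, r):
    """0/1 crossing code of the line y = intercept + slope * x on an r x r grid.

    Illustrative assumptions: 0 < slope < 1, the line spans the grid from
    x = 0 to x = r, and it never passes exactly through a lattice point.
    """
    code = ""
    for col in range(r):
        y_left = intercept + slope * col
        y_right = intercept + slope * (col + 1)
        # a change of integer part within the column means a perpendicular exit
        code += "1" if math.floor(y_left) != math.floor(y_right) else "0"
    return code

# Translating the grid vertically relative to the line shifts the pattern of 1s,
# the behavior the text describes as cyclic changes in the code.
for shift in (0.0, 0.1, 0.2, 0.3):
    print(line_code(1.25 + shift, 0.4, 6))
```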
This characterization is then used in [W-2] as the basis for various procedures to recognize straight lines, topological connectedness, and arbitrary polygons from their representations on an encodement grid.

1.2 Contributions of the Thesis

The overall objective of this thesis is to investigate how the combined information contained in the multiple translational encodements which are possible for a particular scene (under various relative translational positionings of the encodement grid) at a given resolution can be effectively utilized in pattern recognition. Such combined multiple encodement considerations have not been previously dealt with in the literature,3 so the investigation within this thesis involves a unique approach in pattern recognition technology. The utility of this approach is specifically demonstrated relative to the scene classification problem at coarse resolutions, where other pattern recognition schemes often fail.

3 Multiple translational encodements of straight lines were considered by Weiman and Rothstein [W-2], where the primary concern was with representing a two-dimensional encodement with a one-dimensional code, and then noting the cyclic changes which occurred in the code as the encodement grid was translated relative to the straight line. Although such a code does reflect the basic structural features of each encodement, only straight lines of rational slope could be fully characterized on a finite grid by the cyclic changes in their codes. Even then, the length of the line was always assumed to be sufficient to span the entire grid in any translational position. The characterization of scenes in terms of their multiple encodements is considered at a much more general level in this thesis. The two-dimensional representation for each of the encodement patterns which can result by translating an encodement grid over an (arbitrary) scene is retained (so that arbitrary relationship information can be maintained). Further, the relative degree of persistence (reflected as a probability of occurrence in Chapter 4) with which each distinct encoding pattern appears as the grid is uniformly translated over the scene is also an important part of the characterization of a scene in terms of its multiple encodements. Such "persistence" (probabilistic) properties were not considered by Weiman and Rothstein, but these properties form a quite powerful part of a positional encodement characterization of a scene relative to some grid resolution. In fact, these persistence properties not only permit the rational slope line characterization constraints to be relaxed, but also permit finite length line segments of any size to be fully characterized on a finite encodement grid. Additionally, many complex scenes can also be characterized (at least relative to other scenes of interest) with respect to almost any encodement resolution. The full detail of this general characterization of a scene in terms of its multiple translational encodements is presented in Chapter 4.

As a result of this new approach and the linguistically aided scheme with which it is investigated, several other contributions are made by this research. These include:

(1) the definition of some new characterizations of scenes in terms of all their possible encodement forms relative to a given resolution, as well as the formalization of some previously intuitive concepts (e.g., resolution) of picture processing;
(2) the proposal of a new 2-dimensional linguistic description scheme (denoted "interpretive coordinate grammar") which is used to generate an encodement of a specified scene (line drawing) relative to an arbitrarily chosen resolution. This differs from Chang's "picture processing grammar" [C-1] -- which generates an arbitrarily specified set of encodement forms on a grid -- in that the encodements generated by the "interpretive coordinate grammar" are based on an actual scene, and a specified relative positioning of that scene, with respect to the encodement grid. Although the form of the rules in the "interpretive coordinate grammar" represents an extension of the (coordinate) form of the rules in Chang's "picture processing grammar", the rules of the "interpretive coordinate grammar" -- unlike Chang's rules -- require a semantic evaluation of the coordinates after each step in the generation of a scene description. This is due to the complexity involved in considering encodements generated from a specified positioning of an actual scene view relative to an encodement grid -- a consideration which is ignored by Chang;

and (3) the application of standard sequential statistical decision-theoretic hypothesis testing procedures to scene identification, based on the multiple translational encodement characterizations of such scenes.

1.3 Organization of the Thesis

Chapter 2 defines the basic terminology which is used in this pattern recognition investigation, and formalizes some of the intuitive concepts of picture processing. Chapter 3 concentrates on a two-dimensional linguistic description scheme for generating an encodement of a scene relative to an arbitrary resolution, based on a standardized representation of that scene in some fixed orientation and relative translational position with respect to the encodement grid. Chapter 4 discusses the theoretical characterization of scenes in terms of their multiple translational encodements, and presents techniques for approximating these characterizations based on the linguistic description scheme of Chapter 3. The utility of these characterizations in scene recognition situations is then demonstrated within the framework of statistical hypothesis testing. Chapter 5 concludes the thesis with a summary of results and suggestions for future work.

Chapter 2

SCENES AND THEIR GRID ENCODEMENTS

2.1 Basic Concepts

The general class of pictures, or "scenes," to be considered throughout this thesis consists of plane geometric line drawings. A finite, symmetrical square grid can be superimposed over a scene, and the scene then encoded into a two-dimensional grid pattern by shading only those grid squares which are intersected by lines in the scene. The entire scene encodement can be accurately described in the standard terminology of picture processing in [R-4] as a "quantized digital picture" having both the shaded and non-shaded grid squares as its binary valued "picture elements." That portion of the scene encodement consisting only of the pattern of shaded grid squares will be called a "snapshot" of the scene.

The scene is always considered as being "viewed" through an aperture called the "screen." This screen will have fixed size, shape, and orientation characteristics associated with it for each specific picture processing application. This will provide a standard reference for the specification of any grid to be used in encoding the scenes under consideration.
2.2 Scenes

The concept of a "scene" is formalized as follows:

Definition 2.1

A scene is a two-dimensional pattern composed of a finite, but non-zero number of straight line segments of finite, non-zero length, with any interconnections (points of intersection) between the line segments existing only at their endpoints. A primitive scene is a scene composed of exactly one straight line segment.

We will denote as the "junction constraint" that part of the preceding definition allowing interconnections between line segments only at their endpoints. The junction constraint is introduced merely for the convenience of standardized scene descriptions (see Chapter 3), and does not restrict the class of scenes under consideration. This is so because a scene containing two straight lines which intersect at a point other than a common endpoint of both lines can alternatively be described as a scene containing this intersection point as the common endpoint of several shorter lines. These shorter lines are formed by dividing both of the intersecting lines at their common point of intersection, with no other alteration of the structure of the scene. With the description of the scene thus modified, the junction constraint is met without changing the inherent form of the scene in terms of its snapshot generation characteristics. And since we will be examining scenes only through their snapshot representations, we have effectively left the class of scenes under consideration unchanged.

The modification of a scene description to meet the junction constraint is illustrated in Figure 2.1, where two alternative scene descriptions are provided for both the letter "T" and addition operator "+". Both descriptions given for each scene are valid, but only one conforms with the junction constraint in each case.

Figure 2.1. Examples of scene description modification to meet the "junction constraint."
(a) Descriptions of the letter "T" and addition operator "+" by two intersecting straight lines, L1 and L2, which are connected at a point other than a common endpoint of both lines.
(b) Alternate descriptions of the letter "T" by 3 lines intersecting at a common endpoint, and the addition operator "+" by 4 lines intersecting at a common endpoint. The respective descriptions in (a) have been modified so that all interconnections between lines exist only at their endpoints.

The straight line segments which comprise a scene will be considered "ideal", i.e., arbitrarily thin. No additional restrictions -- other than the non-zero but finite length, and junction constraints of the definition -- are placed on these line segments at this point, although restricting their slopes to being rational can be useful when scenes will be recognized only in fixed orientations (see [W-2]). In addition to the lines themselves, an important part of scene specification is incorporated in the relationships existing between lines in the pattern -- e.g., whether two lines are connected, perpendicular, parallel, of different lengths, etc. However, it is important to note that only the lines themselves and their relative relationships are considered as part of the scene structure; any areas or regions that might be enclosed or otherwise implicitly defined by the line segments are not considered part of the scene. In other words, we are considering line drawings, not solid objects or areas.
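The subdivision used above to meet the junction constraint can be carried out mechanically. The following Python sketch is an illustration only (it is not a procedure from the thesis); it assumes that segments are given as pairs of endpoint coordinates, that intersections occur at isolated points, and that collinear overlaps do not occur.

```python
from itertools import combinations

def seg_intersection(p, q):
    """Intersection point of closed segments p and q, or None.
    Each segment is ((x1, y1), (x2, y2)); collinear overlap is not handled."""
    (x1, y1), (x2, y2) = p
    (x3, y3), (x4, y4) = q
    d = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if d == 0:                                  # parallel or degenerate
        return None
    t = ((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / d
    u = ((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1)) / d
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None

def enforce_junction_constraint(segments):
    """Split every segment at each intersection point, so that the resulting
    shorter segments meet only at endpoints (the junction constraint)."""
    cuts = {i: set(seg) for i, seg in enumerate(segments)}
    for (i, a), (j, b) in combinations(enumerate(segments), 2):
        pt = seg_intersection(a, b)
        if pt is not None:
            cuts[i].add(pt)
            cuts[j].add(pt)
    pieces = []
    for i, (p1, _) in enumerate(segments):
        # order the cut points along the segment and emit consecutive pieces
        pts = sorted(cuts[i], key=lambda p: (p[0]-p1[0])**2 + (p[1]-p1[1])**2)
        pieces.extend(zip(pts, pts[1:]))
    return pieces

# The "+" of Figure 2.1: two crossing strokes become four shorter segments.
plus = [((0.3, 0.5), (0.7, 0.5)), ((0.5, 0.3), (0.5, 0.7))]
print(enforce_junction_constraint(plus))
```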
If we consider a scene as a graph, with the vertices being the collection of all line segment endpoints and the edges being the line segments themselves, then we can make the following definitions for scenes using the terminology of graph theory in [H-1]:

Definition 2.2

An object is any connected component (maximally connected "subgraph") of a scene. A simple scene is a scene composed of exactly one object. A composite scene is a scene containing two or more objects.

Note that a composite scene is disconnected by definition in a graph theoretic sense.

Some examples of typical scenes are shown in Figure 2.2. Even though the scenes are illustrated in a particular orientation, it is important to bear in mind that scenes are structures defined irrespective of any specific orientation in which they may appear. Note also that only straight line approximations of any nonlinear geometric contours are permitted by the definition of scenes.

Figure 2.2. Examples of typical scenes: (a) primitive scene (also a simple scene, or object); (b) primitive scene (also a simple scene, or object); (c) simple scene (object); (d) simple scene (object); (e) simple scene (object); (f) simple scene (object); (g) simple scene (object); (h) composite scene (2 objects); (i) composite scene (3 objects); (j) composite scene (4 objects); (k) composite scene (4 objects).

2.3 Scene Encodements and Snapshots

In order to encode scenes for processing by a digital computer, or to provide scene encodements which allow the possibility of grammatical description, it is convenient to "discretize" the scene in terms of a finite set of finitely valued scene primitives. The discretizing method chosen here encodes the scene into a finite number of shaded and non-shaded grid squares, with each grid square being a binary valued "primitive" of the scene encodement. The shaded portion of an encodement is called a "snapshot," which is formally defined next along with the construction rules for forming a scene encodement.

Definition 2.3

A snapshot of a scene is the entire pattern of zero or more shaded grid squares which results when a finite square grid is superimposed over the scene, and the scene then encoded on the grid according to the following grid square shading scheme: (1) any grid square having at least one of its sides touched (intersected) by one or more of the line segments in the scene is shaded; and (2) no other grid squares except those specified in (1) are shaded.

Notice that this definition implies that it is only the structural form of the shaded grid square pattern which defines the snapshot, independent of the relative translational location of the pattern on the grid. Additionally, notice that the empty snapshot, defined as the snapshot resulting from a scene encodement having no shaded grid squares, is a valid entity.

Examples of snapshots for each of six different scenes are shown in Figures 2.3 and 2.4. Two encoding peculiarities arise in these examples which may warrant further explanation. The first is that a line segment of a scene which lies exactly along a vertical or horizontal grid line of a superimposed grid will, when encoded, shade grid squares along its length on both sides of the grid line. The second is that a line segment of a scene which passes through an intersection point of the horizontal and vertical grid lines in a superimposed grid will, when encoded, shade all four grid squares having this grid intersection point as a common corner.
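As a concrete illustration of the shading scheme of Definition 2.3 (including the two boundary peculiarities just noted), the following Python sketch shades exactly those grid squares that have at least one side touched by some segment of the scene. It is an illustrative reconstruction rather than the thesis's encodement algorithm, and the grid resolution and segment coordinates (given in grid units) are assumed values chosen for the example.

```python
def _orient(a, b, c):
    """Signed area test: >0 left turn, <0 right turn, 0 collinear."""
    return (b[0]-a[0])*(c[1]-a[1]) - (b[1]-a[1])*(c[0]-a[0])

def _on_segment(a, b, p):
    """True if collinear point p lies on the closed segment ab."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0]) and
            min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))

def segments_touch(s1, s2):
    """True if the closed segments s1 and s2 share at least one point."""
    a, b = s1
    c, d = s2
    o1, o2 = _orient(a, b, c), _orient(a, b, d)
    o3, o4 = _orient(c, d, a), _orient(c, d, b)
    if (o1 != 0 and o2 != 0 and o3 != 0 and o4 != 0 and
            (o1 > 0) != (o2 > 0) and (o3 > 0) != (o4 > 0)):
        return True                              # proper crossing
    # touching endpoints or collinear overlap
    return ((o1 == 0 and _on_segment(a, b, c)) or
            (o2 == 0 and _on_segment(a, b, d)) or
            (o3 == 0 and _on_segment(c, d, a)) or
            (o4 == 0 and _on_segment(c, d, b)))

def snapshot(scene, r):
    """Cells (column, row) of an r x r grid shaded by Definition 2.3:
    those with at least one side touched by a segment of the scene."""
    shaded = set()
    for col in range(r):
        for row in range(r):
            corners = [(col, row), (col+1, row), (col+1, row+1), (col, row+1)]
            sides = list(zip(corners, corners[1:] + corners[:1]))
            if any(segments_touch(seg, side) for seg in scene for side in sides):
                shaded.add((col, row))
    return shaded

# A segment lying along the grid line y = 2 shades cells on both sides of that
# line, while a short segment strictly inside one cell yields the empty snapshot.
print(sorted(snapshot([((0.5, 2.0), (2.5, 2.0))], 4)))
print(snapshot([((0.2, 0.2), (0.6, 0.7))], 4)))
```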
Careful examination of the encoding scheme will reveal that the grid shading is being done in accordance with the specified rules, with only grid squares having one or more of their sides touched (intersected) by a line in the scene being shaded in the encodement. This is clear for the first case, and is clarified for the second case by noting that a corner point of a grid square is actually part of two sides of that grid square (and also part of two sides of each of the three other grid squares sharing the same corner point).

Figure 2.3. Snapshot examples.
(a) The empty snapshot of a simple scene, or object (a right, isosceles triangle with leg length of 1/2 grid unit), from a 2 x 2 grid encodement.
(b) A snapshot of a primitive scene (a straight line segment with length of √2 grid units) from a 4 x 4 grid encodement.
(c) A snapshot of a primitive scene (a straight line segment with length of 6-1/4 grid units) from a 6 x 6 grid encodement.

Figure 2.4. More snapshot examples.
(a) A snapshot of a simple scene, or object (a rectangle of dimensions 3 x 3-1/2 grid units), from a 6 x 6 grid encodement.
(b) A snapshot of a simple scene, or object (an equilateral triangle with side length of 4 grid units), from a 6 x 6 grid encodement.
(c) A snapshot of a three object composite scene from a 12 x 12 grid encodement.

The rationale behind defining the encodement scheme in a way which permits these peculiarities to arise becomes apparent by considering a practical implementation of this scheme. The grid might be physically approximated by a finite, discrete array of contiguous photoelectric sensors, each with sufficient sensitivity for detecting the presence of any portion of an image on any part of its photocathode.1 Now suppose that the image of a scene composed of non-ideal line segments (i.e., ones with a finite, non-zero thickness) is projected on this array of photoelectric sensors. It is easy to imagine that a line segment whose projected image would theoretically (if it were arbitrarily "thin") lie on the common border between adjacent sensors would realistically have sufficient thickness to be detected by the sensors lying on both sides of that border. Similarly, a "thick" line segment whose projected image passes through the common border point of four adjacent sensors -- and whose ideally thin image would also pass through this same point -- would in reality be detected by all four of those sensors. Note that both of these situations occur in practical applications solely because of the thickness of the line segments, which permits one location along the length of a single line segment to physically overlap the photocathodes of two or more sensors simultaneously. Thus, with grid squares corresponding to sensors, we see that the apparent peculiarities in our theoretical encodement scheme for ideally thin line segments anticipate the extension of the scheme to practical applications involving scenes composed of finitely thick line segments.

1 The photoelectric sensors are assumed to have square-bordered areas for their photocathodes. Additionally, the physical realization discussed here is referred to as an approximation because the empty snapshot cannot be realized.

2.4 Screens and Grids

It should be evident from the discussion and examples of the previous section that whenever a grid is superimposed over a scene, the resulting encodement is dependent upon each of the following factors: (1) the relative translational position between grid and scene; (2) the relative rotational orientation between grid and scene; and (3) the relative size of the grid squares with respect to the size of the scene. The first factor will be discussed in detail in Chapter 4, where it will become apparent that most scenes produce more than one snapshot when encodements are made with a grid of fixed square size and orientation that is superimposed in different relative locations with respect to the scene. The last two factors affecting scene encodements will be considered in the remainder of this section, with the result being the formalization of the concepts of grids and their specifications only intuitively referred to up to this point.

In many practical picture processing situations it is realistic to assume that a finite bound exists on the size of all scenes under consideration. This means that some single (but not unique) finitely bounded area can be found whose borders -- in all orientations -- could entirely enclose any one of the scenes to be analyzed. In this case, it will be no loss of generality to further assume that such an area has a square border of fixed, finite size. Such a square bordered area, once specified and fixed in an application, will be called the "screen" for that application.

Definition 2.4

A screen is some fixed, finite, square-bordered two-dimensional area which is specified for a particular picture processing application, and whose border -- in every rotational orientation -- could be translationally positioned to entirely enclose (without contact) each individual scene under consideration.

The single screen thus chosen for a particular application will serve as the fixed size reference for all scenes and grids under consideration in that application. The length of a side of the screen border will be assigned an arbitrary reference value of "1 unit." The screen can be envisioned as a movable aperture of fixed size, through which any scene to be encoded and processed will be viewed. The screen positioning motions would generally be permitted three degrees of freedom -- two-dimensional translation, and rotation -- but, as noted in Chapter 1, we are restricting this investigation by allowing only translational motions when viewing and encoding a scene. Thus, we will hereafter restrict our attention in any application to a single screen having one fixed, but arbitrarily specified, rotational orientation. The effect of this restriction is that each scene under consideration will be viewed in only one specific rotational orientation at a time. But since scenes are structures having no definite orientation, the specific rotational orientation of any scene viewed through a screen of fixed rotational orientation will be arbitrary. Therefore, if we characterize a "view" of a scene by specifying a fixed, but arbitrary, rotational orientation of the scene relative to the screen through which it is viewed, the particular rotational orientation of that screen is immaterial. Hence, for notational convenience, we will always assume that any view of a scene is defined relative to a rotationally fixed screen having its border lines oriented horizontally and vertically.

Motivated by the preceding discussion, we now make the following definitions:

Definition 2.5

A standard screen is a screen with a fixed rotational orientation which makes its border lines horizontal and vertical.

Definition 2.6

A view of a scene consists of the scene in some single, arbitrary, rotational orientation defined relative to a standard screen.

The single standard screen thus chosen in an application serves as the size and orientation reference for any view of a scene. Note that each view possible for a scene can be considered by simply specifying the appropriate fixed rotational orientation of that scene defined relative to the standard screen.

With a single standard screen as the size and orientation reference, we can now discuss standardized grid descriptions. The grid structure is formed by using evenly spaced horizontal and vertical lines to symmetrically divide the entire area enclosed by the standard screen into identically-sized square cells. The square root of the total number of these square cells then defines the "resolution" of the grid. Any grid thus formed can be considered as a "retina" inscribed on the standard screen, with scene views being imposed and encoded on this retina. The ability of a retina to reflect the detail in an image imposed upon it depends on the size, number, and spacing of its sensory elements. Analogously, with grid squares considered as the "sensory elements," the ability of a grid to reflect detail in the encodement of a scene view is characterized by the relative fineness, or "resolution," of that grid. The concepts of grids and grid specifications are now made more precise.

Definition 2.7

A grid is the symmetrical geometric lattice, defined by the border lines of the standard screen and zero or more additional intersecting horizontal and vertical line segments of length 1 which lie between those border lines such that the standard screen is divided into a number of smaller, identically-sized square cells.

Definition 2.8

The resolution of a grid is that positive, non-zero integer whose algebraic square represents the total number of identical square cells which the grid defines on the standard screen. The dimensions of a grid of resolution "r" are denoted by the expression "r x r". A finite grid is a grid having a finite resolution.

We note from the above definitions that the standard screen is considered as the finite grid of resolution "1".

Some examples of finite grids with different resolutions defined relative to the same standard screen are shown in Figure 2.5. Note that all of the grids are of the same absolute size, although each one has a relative Cartesian coordinate system uniquely associated with it in a natural way according to its resolution. The lower horizontal border line of the grid is taken as the positive x-axis, and the left vertical border line is taken as the positive y-axis, with the common intersection point of these axes at the lower left-hand corner of the grid serving as the origin of the natural coordinate system. These relative coordinate systems for grids will be useful in examining the effects of resolution changes on scene descriptions, to be considered in Chapter 3.

Figure 2.5. Examples of finite grids.
(a) Grid of resolution "1", dimensions "1x1" (the standard screen).
(b) Grid of resolution "2", dimensions "2x2".
(c) Grid of resolution "3", dimensions "3x3".
(d) Grid of resolution "4", dimensions "4x4".
(e) Grid of resolution "6", dimensions "6x6".
(f) Grid of resolution "12", dimensions "12x12".
12 10 45(300 0 2 (f) Grid of resolution ”12", dimensions "12x12". Figure 2.5. Examples of finite grids. Chapter 3 "LINGUISTIC" SNAPSHOT GENERATION 3.1 Description of Translationally- Fixed Seene Views Since scenes will be examined via their encode- ments, it is desirable to develOp an algorithmic procedure for producing encodements from scene descriptions. This algorithm can be divided into two parts: (1) the gener- ation of views of a scene from a single general scene description, and (2) the generation of encodements for any view of a scene. The first part relates to the derivation of specific descriptions of a scene for arbitrary rota- tional orientations; the second part deals with the effects of the relative translational positioning between a view of a scene and the grid being used for encoding snapshots. This two-part division thus separates the rotational and translational aspects of the process, per- mitting the independent study of either one. In accor- dance with our original objectives, we will confine this investigation to translational considerations by assuming that scene views are our starting point, and then focusing our attention on the view encodement generation process in 38 39 part (2). The first concern here is the development of a uniform standard for view descriptions. The definition of "scene" in Chapter 2 implicitly designates finite length straight line segments as scene "primitives." Note that a scene can be uniquely charac- terized by specifying its structure in terms of these "primitives" (including their lengths), along with the interconnections (at endpoints), the relative trans- lational locations, and the relative rotational orien- tations existing between them. Suppose such a scene description is constrained by fixing both the relative rotational orientations and the relative translational locations of the scene primitives with respect to the designated standard screen, without alteration of the scene structure. In this case, the primitives themselves and the relative relationships -— i.e., interconnections, orientations, and translational positions -— existing between them remain unchanged, and the resulting descrip- tion specifies a translationally-fixed view of the scene. Recall the elementary fact that a straight line seg- ment having both its rotational orientation and trans- lational position fixed can be completely characterized by the coordinates of its two endpoints specified relative to a fixed two-dimensional coordinate system. Thus it becomes apparent that a translationally-fixed [hereafter denoted as "T-F"] view of a scene can be completely characterized by representing each of its "primitives" 40 by their endpoint coordinates Specified relative to the standard screen. The notational representation for the T—F scene view will be a finite, non-empty set whose elements are sets of two ordered pairs -- each ordered pair in a set representing the Cartesian coordinates of a point on the standard screen, and each set of two ordered pairs representing the two endpoints of one of the primi- tives of the T-F view. An example of a T-F view of a scene is shown in Figure 3.1 - (a), with its corresponding notational representation in terms of a set of coordinate pairs given in Figure 3.1 - (b). Notice that any T-F view of a scene has a unique representation using this notation, due to the uniqueness of the description of the scene's structure guaranteed by the junction constraint in Definition 2.1. 
Furthermore, we observe that any finite set consisting of one or more sets of two ordered pairs describes some unique T-F scene view if and only if such a set has the following properties:

(a) both coordinates in each ordered pair have a value between 0 and 1, inclusive;

(b) the two ordered pairs in any one set are not identical; and

(c) the line segments, defined by interpreting each set of ordered pairs as the two endpoints of the straight line segment connecting them, would intersect each other only at common endpoints if at all.

This is clarified by noting that condition (a) insures that the T-F scene view appears on the standard screen, condition (b) guarantees that each primitive represented by its endpoints has non-zero length (the finite length requirement is guaranteed by condition (a)), and condition (c) insures that the junction constraint is met by the scene whose T-F view is being represented.

[Figure 3.1. A translationally-fixed scene view and its set notational representation: (a) a T-F view of a scene -- a triangle with vertices (.1,.2), (.5,.7), and (.9,.2) -- illustrated on the standard screen; (b) the set notational representation of the T-F scene view in (a): {{(.1,.2),(.5,.7)}, {(.5,.7),(.9,.2)}, {(.9,.2),(.1,.2)}}.]

The uniqueness properties noted in the previous paragraph confirm the utility of the set notational representation for T-F scene views. Unless designated otherwise, we will hereafter assume that any T-F view of a scene is expressed relative to the coordinate system of the standard screen using this set notational representation.

3.2 Resolution Effects on Descriptions

In order to consider encodements with grids of various resolutions, it becomes necessary to investigate the conversion of a T-F scene view description from the coordinate system of the standard screen to the coordinate system of a grid of resolution "r". We begin by re-examining the coordinate systems of grids in greater detail.

Recall that the natural coordinate system of a grid has the lower horizontal grid border as the x-axis, the left vertical grid border as the y-axis, and the lower left corner of the grid as the origin. Additionally, for a grid of resolution "r", the lower right corner of the grid defines the maximum x-coordinate value as "r" while the upper left corner of the grid defines the maximum y-coordinate value as "r". Further, each axis is divided into r equal divisions by r + 1 points (including both endpoints of the axis) which are consecutively labeled from the origin with the integers "0" through "r". Each integer valued point along the x-axis represents the lower endpoint of one of the r + 1 identical length vertical line segments of the grid (which include the two vertical border line segments of the standard screen), while each integer valued point along the y-axis represents the left endpoint of one of the r + 1 identical length horizontal line segments of the grid (which include the two horizontal border line segments of the standard screen). The horizontal and vertical line segments intersect at (r + 1)² points, and divide the standard screen into r² identical squares. Of course, any point lying along a vertical grid line has an integer valued x-coordinate, while any point lying along a horizontal grid line has an integer valued y-coordinate. Thus, the intersection points of the horizontal and vertical grid lines are characterized by having integer valued x and y coordinates relative to the natural coordinate system of the grid.
Definition 3.1
A lattice point (or intersection point) of a grid is any point defined by the intersections of the horizontal and vertical line segments (including the border lines of the standard screen) defining that grid.

It should be obvious from the preceding discussion that a grid of resolution r has (r + 1)² lattice points, each one having integer valued x and y coordinates relative to the grid.

It is important to bear in mind here that an increase in grid resolution is obtained not merely by increasing the number of grid squares which are present, but also by shrinking the size of individual squares in the grid. This is due to the fact that all grids under consideration at any one time are defined relative to the same standard screen. Thus, every grid coordinate system has the same origin and the same x and y axes. The only distinction between the coordinate systems of grids of different resolution is the relative coordinate labeling along the x and y axes. So, given a fixed point on the standard screen being used, each grid imposed on the standard screen would merely relabel the coordinates of the point relative to the natural coordinate system of that grid. In particular, the coordinates of such a fixed point relative to the grid of resolution "1" (which is the standard screen) can be multiplied by the factor r' to obtain the coordinates of that same fixed point relative to the grid of resolution r'. This simple coordinate transformation is known as a "dilation", or "stretching", by r' relative to the coordinate system of the standard screen [L-1], and forms the basis for the following Lemma:

Lemma 3.1
A fixed point on the standard screen having coordinates (a,b) relative to the grid of resolution "1" will have coordinates (r'a, r'b) relative to the grid of resolution r'.

Corollary 3.1
A fixed point on the standard screen having coordinates (x,y) relative to the grid of resolution r will have coordinates ((r'/r)·x, (r'/r)·y) relative to the grid of resolution r'.

Proof:
Let the fixed point on the standard screen having coordinates (x,y) relative to the grid of resolution r have coordinates (a,b) relative to the grid of resolution "1". Then using Lemma 3.1,

(x,y) = (ra, rb) = r(a,b),

which implies that

(a,b) = (1/r)(x,y) = (x/r, y/r).

Also, the fixed point having coordinates (a,b) relative to the grid of resolution "1" will have coordinates (x',y') relative to the grid of resolution r', determined using Lemma 3.1 as

(x',y') = (r'a, r'b) = r'(a,b),

which implies that

(a,b) = (1/r')(x',y') = (x'/r', y'/r').

So, we have

(a,b) = (1/r')(x',y') = (1/r)(x,y),

which gives

(x',y') = (r'/r)(x,y) = ((r'/r)·x, (r'/r)·y).

Q.E.D.

Recall from Section 3.1 that a T-F scene view can be completely characterized in terms of the coordinates of the endpoints of each scene primitive specified relative to the standard screen. By Lemma 3.1 we can multiply each of these endpoint coordinates by the same factor r to obtain their corresponding specification relative to the grid of resolution r, with the aggregate result providing a description of the same T-F scene view relative to the grid of resolution r. Because this T-F scene view description change is accomplished using a direct similarity
Only the lengths of the primitives and relative distance measures within the scene view are changed, all being multiplied by the same factor which determines the coordinate relabeling in the grid reso- lution conversion. We thus have the following theorem: Theorem 3.2 Given a T—F view of a scene having the following notational representation in terms of a set of coordinate pairs specified relative to the standard screen: 1 Intuitively, a Similarity transformation is a mapping of the plane into itself which preserves angle size. A direct similarity transformation is a Similarity transformation which preserves the sense of every angle. For a more formal definition and detailed discussion of similarity transformations, including their structure preserving prOperties, see [6-1]. 2 If the endpoint coordinates of a straight line seg- ment on the standard screen are (a,b) and (c,d), the line segment has SIOpe %}2. By Lemma 3.1, the same line seg- ment on the grid of resolution r will have endpoint coordinates (ra,rb) and (rc,rd) respectively, with its rd—rb = d-b rc-ra c-a' arbitrary, this shows that a change in grid resolution does not affect the slope of straight lines. lepe given by Since the resolution r was (I) II k { K 48 (a1,b1),(a2,b2)} Then the set rS = {:{(ra1,rb1),(ra2,rb2)} (a1,bl) and (a2,b2) are the standard screen coor- dinates of the two end- points of a scene primi- tive and r an integer 3 1 {(a1,b1),(a2,b2)}eS} describes the same T-F scene view at grid reso- lution 111.09 . An immediate consequence of this theorem, in light of Corollary 3.1, is the following: Corollary 3.2 Given a T-F scene view having the following notational representation in terms of a set of coor- dinate pairs specified relative to the grid of resolution "r": r {(xl’yl)’(x2’y2)} (x1,y1) and (x2,y2) are the coordinates on the grid of resolution r of the two endpoints of a scene primitive 49 Then the set {(x . ).(x . l} r. r, 1Y1 2Y2 r' r' r' , F_ S = {(?—x1,;—y1),(F—x2,¥—y2)} e S and r,r are integers Z 1 describes the same T-F scene view at grid reso- lution "r'". In summary then, shapes and orientations in T-F scene views are preserved under grid resolution conver- sions; only effective sizes and distances -- and hence the detail reflected in any encodement -- are increased or decreased by a corresponding change in the resolution of the grid being used for viewing and encoding a scene. 3.3 Algorithmic Encodement of TZF Scene’Views Before presenting the formal algorithm which can be used to generate grid encodements for T-F scene views, the notation for describing individual grid squares will be discussed. Each grid square will be labeled g(x,y), where x and y are the abscissa and ordinate, respectively, of the geometric center of that grid square. These coor- dinates are Specified relative to the natural coordinate system of whatever resolution grid the grid square is a part. Note that there are r2 grid squares in a grid of resolution r, each one Specified by a unique label of the form g(p-.5, q-.5) where p and q are both integers between 1 and r inclusive. 50 The encodement algorithm to be presented has the structural form of a formal grammar, and is applied in a manner analogous to the derivation of terminal strings using the productions of a grammar. 
Making this gramma- tical comparison more concrete, the algorithm can be con- sidered as having the set of non-terminal symbols {S,L,V,H,Gv,Gh}, the set of terminal symbols {g}, the starting type S, and a set of productions given by groups (I) through (VI) of the algorithm specification (see Subsection 3.3.1). Every terminal or non-terminal symbol is followed by a set of parentheses enclosing one or more coordinates separated by commas, each of which -- with one exception -- represents a real value, either explicitly or through some algebraic expression which may involve variables. The one exception to the form of these coor- dinates is the first coordinate of the starting type S, which is a finite set of pairs of two-dimensional real- valued coordinates representing a portion of the nota- tional description of the scene in terms of the endpoints of each of its primitives. Because of the analogy which exists between the algorithm and formal grammars, and also because of the coordinates associated with each terminal and non-terminal symbol, the algorithm is termed a "coor- dinate grammar". One restriction must be observed when the algorithm is applied: as each step of the algorithm is executed (i.e., as each "production" is applied), the explicit 51 numerical value of each coordinate must be calculated before proceeding to the next step. In other words, after each application of a syntactic "production" of the algor- ithm, it is necessary to consider the semantics of each coordinate expression which results before proceeding further. The application of the algorithm thus bears a strong resemblance to interpretive processing, and so we make a further refinement of our terminology by referring to the algorithm as an "interpretive coordinate grammar". The rules, or "productions", of the algorithm will be presented next in Subsection 3.3.1, followed by some comments regarding notation in Subsection 3.3.2. Sub— section 3.3.3 contains a brief discussion on the method of the algorithm, while an example in Subsection 3.3.4 illus- trates the manner in which the algorithm is applied. A detailed explanation of how the algorithm works, in which we conclude that the algorithm does indeed generate the proper encodements of T-F scene views and nothing more, is the subject of Subsection 3.3.5. This is followed with the consideration in Subsection 3.3.6 of two cases which require special semantic interpretations for the coor- dinates generated by the algorithm. Subsection 3.3.7 concludes this section by introducing the concept of extended grids -- larger than those previously defined -- which are needed for the prOper interpretation of some encodements generated by the algorithm. 
3.3.1 The "Linguistic" Algorithm

(0) The completely specified starting type:

S({{(a1,b1),(c1,d1)}, {(a2,b2),(c2,d2)}, ..., {(ap,bp),(cp,dp)}}, r)

(I) Rule for changing the specification of each primitive to the encodement resolution "r", and a termination rule:

(a) S({L1,L2,...,Lk}, r) → S({L1,L2,...,Lk-1}, r) ∪ L(r·ak, r·bk, r·ck, r·dk)

(b) S(∅, r) → ∅

(II) Rule defining which vertical and horizontal grid lines can be crossed by a primitive:

L(a,b,c,d) → V(⌈min(a,c)⌉, ⌊max(a,c)⌋, 0, a, b, (d-b)/(c-a)) ∪ H(⌈min(b,d)⌉, ⌊max(b,d)⌋, 0, a, b, (d-b)/(c-a))

(III) Rules for generating the coordinates of all intersections of a primitive with vertical grid lines:

(a) V(x1, x2, n, x, y, m) → V(x1-1, x2-1, n+1, x-1, y-1, m)

(b) V(0, x2, n, x, y, m) → V(0, x2-1, n+1, x-1, y-1, m) ∪ Gv(n, y - m·x + n)

(c) V(0, 0, n, x, y, m) → Gv(n, y - m·x + n)

(d) V(1, 0, n, x, y, m) → ∅

(IV) Rules for generating the coordinates of all intersections of a primitive with horizontal grid lines:

(a) H(y1, y2, n, x, y, m) → H(y1-1, y2-1, n+1, x-1, y-1, m)

(b) H(0, y2, n, x, y, m) → H(0, y2-1, n+1, x-1, y-1, m) ∪ Gh(x - (1/m)·y + n, n)

(c) H(0, 0, n, x, y, m) → Gh(x - (1/m)·y + n, n)

(d) H(1, 0, n, x, y, m) → ∅

(V) Rule for generating the grid squares which are shaded due to a point of intersection of a primitive with a vertical grid line:

Gv(n, y) → g(n-.5, ⌈y⌉-.5), g(n-.5, ⌊y⌋+.5), g(n+.5, ⌈y⌉-.5), g(n+.5, ⌊y⌋+.5)

(VI) Rule for generating the grid squares which are shaded due to a point of intersection of a primitive with a horizontal grid line:

Gh(x, n) → g(⌈x⌉-.5, n-.5), g(⌊x⌋+.5, n-.5), g(⌈x⌉-.5, n+.5), g(⌊x⌋+.5, n+.5)

3.3.2 Remarks on Notation

There are a number of symbols appearing in the algorithm which were not included in the terminal or non-terminal alphabets of the previous grammatical analogy discussion. While a precise formal description of the algorithm as an interpretive coordinate grammar would require that all these symbols be included in the specification of its alphabet, the more intuitive grammatical analogy notions already introduced will be sufficient for understanding the structure and application of the algorithm. Thus, in the interest of keeping notational complexity to a minimum, the algorithm will not be formalized further in terms of its grammatical properties. Instead, the symbols in the algorithm not specified as part of the alphabet in the grammatical analogy will be described informally as to their interpretation rather than as abstract symbols.

First, several delineators are apparent in the algorithm: left-right bracket pairs, "{ }", used to enclose the elements of a set; left-right parentheses pairs, "( )", used to enclose coordinates; and commas, ",", used to separate coordinates and the elements of a set. Additionally, the empty set "∅" appears as the null symbol of the algorithm, performing a function comparable to that of the empty string in conventional grammars.

Next, the symbols "p" and "k" are integer variables representing positive, non-zero, integer values defining subscript limits. We then have the symbols "ai", "bi", "ci", and "di" -- for i = 1, 2,...,p -- as algebraic variables acting as placeholders for specific numerical values. The symbols "Li" -- for i = 1, 2,...,p -- represent the set {(ai,bi),(ci,di)}.³ Other algebraic variables acting as placeholders for specific numerical values are "a", "b", "c", "d", "r", "m", "n", "x1", "x2", "x", "y", "y1", and "y2". The symbols "0", ".5", and "1" are themselves interpreted as specific numerical values.

³ All possible symbols "Li" are covered with the value of "i" ranging from 1 to p instead of 1 to k, because the value of k will never exceed the value of p in this algorithm.
Further, there are symbols in the algorithm which represent ordinary algebraic operations: "+", "-", "·", and "/" represent the conventional notations for addition, subtraction, multiplication, and division, respectively. Several algebraic functions are also involved: the symbol "min" represents the function which has a value equal to the least numerical value of its arguments; the symbol "max" represents the function which has a value equal to the greatest numerical value of its arguments; the notation "⌊ ⌋" denotes the "entier" function, whose value is the greatest integer less than or equal to the numerical value of the enclosed argument; and the notation "⌈ ⌉" denotes the least integer greater than or equal to the numerical value of the enclosed argument. Finally, the set theoretic operation of set union is denoted by the conventional symbol "∪" in the algorithm.

It should be emphasized here that the rewriting symbol "→" is part of the describing language for the algorithm interpreted as a grammar, and thus does not comprise any part of the strings produced by the algorithm. Rather, the symbol "→" gives a structure to the rules of the algorithm analogous to the structure of the productions of a grammar, and thus suggests a grammatical interpretation of the algorithm as a set of rewriting rules to be used for deriving strings of terminal symbols. The rewriting symbol "→" thus has its conventional grammatical interpretation within the rules of the algorithm, which is: a string of symbols having the form specified to the left of the arrow can be replaced by (or "rewritten" as) the string of the form which appears to the right of the arrow.

3.3.3 Method of the Algorithm: A Synopsis

The algorithm is designed to generate binary valued (shaded or non-shaded) grid encodements of T-F views of scenes by starting with both a specification of a T-F scene view relative to the standard screen, and a specification of the grid resolution r at which the encodement is to be produced. With such specifications as the coordinates of the starting type S, the rewriting rules in groups (I) through (VI) of the algorithm are applied like the productions of a formal grammar -- with the coordinate values numerically interpreted at each step -- to produce a set of terminal symbols representing exactly those grid squares which are shaded in the encodement at grid resolution r. Thus, the result of a "derivation" using this algorithm is a set

{g(x1,y1), g(x2,y2), ..., g(xq,yq)}

which designates q grid squares g(xi,yi), for i = 1,2,...,q, as those which are shaded in the encodement at grid resolution r. All other grid squares are not shaded in the encodement. Note that the number "q" of shaded grid squares in the encodement will be zero in the case of the empty snapshot represented by the empty set of terminals, "∅".

The algorithm is based on the fact that a grid encodement of a T-F scene view is uniquely determined by the intersection points of all the primitives with the horizontal and vertical grid lines. This follows directly from the fact that all intersection points of primitives with grid lines reflect all touching points of the scene primitives with the sides of grid squares, and that the touching of the side of a grid square by one or more scene primitives is the only encodement criterion for shading that grid square. We thus have the following interpretation of the encodement scheme in terms of the intersection points of scene primitives with grid lines:
We thus have the following interpretation of the encodement scheme in terms of the intersection points of scene primitives with grid lines: and (1) (2) (3) 58 if an intersection point is a grid lattice point, all four grid squares sharing the intersection point at a common corner are shaded; if an intersection point lies on a vertical grid line other than at a lattice point, the two grid squares -- one to the left, and one to the right, of the vertical grid line -- which share the intersection point on a common vertical side are shaded; if an intersection point lies on a hori- zontal grid line other than at a lattice point, the two grid squares -- one above, and one below, the horizontal grid line -— which Share the intersection point on a common horizontal side are Shaded. This interpretation of the encodement scheme forms the procedural basis for generating shaded grid squares in the algorithm. In generating the intersection points between primi- tives and grid lines, upon which the encodement of a T-F view of a scene is based, the algorithm determines the intersection points separately for each primitive. The algorithm further subdivides this generation by deter- mining each primitive's intersections with horizontal grid lines (if any) separately from it's intersections with vertical grid lines (if any). In determining the 59 vertical grid line intersection points for a primitive, the algorithm effectively substitutes the x-value along each vertical grid line crossed by the primitive into the straight line equation of that primitive to determine the corresponding y-coordinate of the intersection point. Similarly, in determining the horizontal grid line inter- section points for a primitive, the algorithm effectively substitutes the y—value of each horizontal grid line crossed by the primitive into the straight line equation of that primitive to determine the corresponding x-coor- dinate of the intersection point. The term "effectively" is used in both cases to denote that the intersection point coordinates are determined relative to successively translated coordinate systems. These translations convert the vertical grid line intersections of a primitive into y-axis intercepts, and convert the horizontal grid line intersections of a primitive into x-axis intercepts. Additional detail will be provided on these intersection determinations using coordinate translations when we examine the interpretation of each rule in the algorithm in Subsection 3.3.5. The general method upon which the algorithm is based is summarized in the flow charts of Figures 3.2 and 3.3. Notice that the flow chart in Figure 3.3 details the logic of block (E) of the flow chart in Figure 3.2. The other letter labels which appear with the various blocks 60 Initialize a cumulative specifi- cation of all shaded grid squares (A) at the empty set, "0". Yes i Specify a T-F scene view on scene primitives on the grid of resolution r. the standard screen, and an (B) encodement resolution r. Determine the coordinates of . I both endpoints of one of the (C) (D) Does this primitive cross any vertical grid lines? No Does this . primitive cross any horizontal grid lines? No (H) Do any scene primitives remain to be encoded? (E) Generate all points of inter- section of the primitive with vertical grid lines, produce a list of all grid squares shaded by these intersec- tions, and update the cumu- lative specification of all shaded grid squares. 
[Figure 3.3. Flow chart detail of block (E) in Figure 3.2. Its blocks are:
(J) Determine the leftmost vertical grid line crossed by the primitive.
(K) Generate the coordinates of the point of intersection of the primitive with the vertical grid line.
(L) Is this intersection point a lattice point of the grid?
(M) Generate a set of 4 terminal symbols having coordinates representing the centers of the 4 grid squares sharing the intersection point on a common corner.
(N) Generate a set of 2 terminal symbols having coordinates representing the centers of the 2 grid squares sharing the intersection point on a common vertical side.
(P) Add the terminal symbols to the cumulative specification of all shaded grid squares, eliminating any duplicates.
If the primitive crosses any additional vertical grid lines: (R) Consider the next vertical grid line to the right, and return to (K); otherwise, exit block (E).]

3.3.4 An Example

To illustrate the application of the algorithm, the T-F scene view in Figure 3.4 will be encoded on the grid of resolution "4". The completely specified starting type (0) for the algorithm is S({L1,L2,L3,L4},4), where L1, L2, L3, and L4 are as given in Figure 3.4 - (b). The rules of the algorithm can now be applied to this starting type to ultimately produce the proper encodement of the T-F scene view illustrated in Figure 3.5. We proceed as follows:

The completely specified starting type is:

S({L1,L2,L3,L4},4).

To encode the primitive represented by "L4" (the "slanted" line segment) on the grid of resolution 4, rules (I) - (a) and (II) must be applied. Applying rule (I) - (a) yields:

S({L1,L2,L3},4) ∪ L((13/16)·4, (7/8)·4, (5/16)·4, (1/4)·4).

Interpreting the coordinates yields:

S({L1,L2,L3},4) ∪ L(3 1/4, 3 1/2, 1 1/4, 1).

[Figure 3.4. T-F scene view for algorithmic encodement example: (a) a T-F scene view on the standard screen; (b) the set notational representation {L1,L2,L3,L4} of the T-F scene view in (a), listing the standard-screen endpoint coordinates of the four primitives L1 through L4.]

[Figure 3.5. Encoded result of T-F scene view of Figure 3.4 on the grid of resolution 4: (a) the encodement of the T-F scene view of Figure 3.4 on the grid of resolution 4; (b) the set of terminals denoting the grid encodement in (a): g(1.5,.5), g(2.5,.5), g(3.5,.5), g(1.5,1.5), g(2.5,1.5), g(3.5,1.5), g(2.5,2.5), g(3.5,2.5), g(2.5,3.5), g(3.5,3.5).]

Applying rule (II) yields:

S({L1,L2,L3},4) ∪ V(⌈min(3 1/4, 1 1/4)⌉, ⌊max(3 1/4, 1 1/4)⌋, 0, 3 1/4, 3 1/2, (1 - 3 1/2)/(1 1/4 - 3 1/4)) ∪ H(⌈min(3 1/2, 1)⌉, ⌊max(3 1/2, 1)⌋, 0, 3 1/4, 3 1/2, (1 - 3 1/2)/(1 1/4 - 3 1/4)).

Interpreting the coordinates yields:

S({L1,L2,L3},4) ∪ V(2, 3, 0, 3 1/4, 3 1/2, 5/4) ∪ H(1, 3, 0, 3 1/4, 3 1/2, 5/4).

To determine all grid squares shaded by primitive "L4" due to its intersections with vertical grid lines, all occurrences of the symbol "V" are to be removed by applications of rules in group (III), with all symbols "Gv" which may result also to be removed by applications of rule (V). Applying rule (III) - (a) and interpreting coordinates yields:

S({L1,L2,L3},4) ∪ V(1, 2, 1, 2 1/4, 2 1/2, 5/4) ∪ H(1, 3, 0, 3 1/4, 3 1/2, 5/4).
Again applying rule (III) - (a) and interpreting coordinates yields:

S({L1,L2,L3},4) ∪ V(0, 1, 2, 1 1/4, 1 1/2, 5/4) ∪ H(1, 3, 0, 3 1/4, 3 1/2, 5/4).

Applying rule (III) - (b) and interpreting coordinates:

S({L1,L2,L3},4) ∪ V(0, 0, 3, 1/4, 1/2, 5/4) ∪ Gv(2, 1 15/16) ∪ H(1, 3, 0, 3 1/4, 3 1/2, 5/4).

Applying rule (V) and interpreting coordinates:

S({L1,L2,L3},4) ∪ V(0, 0, 3, 1/4, 1/2, 5/4) ∪ {g(1.5,1.5), g(1.5,1.5), g(2.5,1.5), g(2.5,1.5)} ∪ H(1, 3, 0, 3 1/4, 3 1/2, 5/4).

Eliminating duplicate set elements:

S({L1,L2,L3},4) ∪ V(0, 0, 3, 1/4, 1/2, 5/4) ∪ {g(1.5,1.5), g(2.5,1.5)} ∪ H(1, 3, 0, 3 1/4, 3 1/2, 5/4).

Applying rule (III) - (c) and interpreting coordinates:

S({L1,L2,L3},4) ∪ Gv(3, 3 3/16) ∪ {g(1.5,1.5), g(2.5,1.5)} ∪ H(1, 3, 0, 3 1/4, 3 1/2, 5/4).

Applying rule (V), interpreting coordinates, and eliminating duplicate set elements:

S({L1,L2,L3},4) ∪ {g(2.5,3.5), g(3.5,3.5)} ∪ {g(1.5,1.5), g(2.5,1.5)} ∪ H(1, 3, 0, 3 1/4, 3 1/2, 5/4).

Combining sets of terminals with the set union operator:

S({L1,L2,L3},4) ∪ {g(1.5,1.5), g(2.5,1.5), g(2.5,3.5), g(3.5,3.5)} ∪ H(1, 3, 0, 3 1/4, 3 1/2, 5/4).

Next we must determine all grid squares shaded by primitive "L4" due to its intersections with horizontal grid lines. This is done by removing all occurrences of the symbol "H" by applications of the rules in group (IV), and also by removing any resulting symbols "Gh" by applications of rule (VI). Applying rule (IV) - (a) and interpreting coordinates:

S({L1,L2,L3},4) ∪ {g(1.5,1.5), g(2.5,1.5), g(2.5,3.5), g(3.5,3.5)} ∪ H(0, 2, 1, 2 1/4, 2 1/2, 5/4).

Applying rule (IV) - (b) twice and then applying rule (IV) - (c), interpreting coordinates after each step, yields:

S({L1,L2,L3},4) ∪ {g(1.5,1.5), g(2.5,1.5), g(2.5,3.5), g(3.5,3.5)} ∪ Gh(1 1/4, 1) ∪ Gh(2 1/20, 2) ∪ Gh(2 17/20, 3).

Applying rule (VI) three times, interpreting coordinates, and eliminating duplicate elements within each set, yields:

S({L1,L2,L3},4) ∪ {g(1.5,1.5), g(2.5,1.5), g(2.5,3.5), g(3.5,3.5)} ∪ {g(1.5,.5), g(1.5,1.5)} ∪ {g(2.5,1.5), g(2.5,2.5)} ∪ {g(2.5,2.5), g(2.5,3.5)}.

Combining sets of terminals with the set union operators:

S({L1,L2,L3},4) ∪ {g(1.5,.5), g(1.5,1.5), g(2.5,1.5), g(2.5,2.5), g(2.5,3.5), g(3.5,3.5)}.

At this point, the algorithm has produced the complete encodement of the "slanted" line segment primitive "L4". The encodement of each of the remaining three line segment primitives -- represented by "L1", "L2", and "L3" -- can be produced in a similar manner.

The primitive represented by "L3" (the short horizontal line segment at the top) is encoded by the sequential application of rule (I) - (a), rule (II), rule (III) - (a) three times, rule (III) - (d), rule (IV) - (a) three times, and then rule (IV) - (d), in that respective order, to the string produced thus far. Coordinates are interpreted at each step, and the set union operator is applied to unite the two sets (both empty) of terminals generated for the encodement of "L3". The result is:

S({L1,L2},4) ∪ ∅ ∪ {g(1.5,.5), g(1.5,1.5), g(2.5,1.5), g(2.5,2.5), g(2.5,3.5), g(3.5,3.5)}.

Note that the encodement of primitive "L3" is the empty set, since "L3" does not cross any grid lines at resolution "4".

The primitive represented by "L2" (the vertical line segment on the right) is encoded by the sequential application of rule (I) - (a), rule (II), rule (III) - (a) three times, rule (III) - (d), rule (IV) - (a), rule (IV) - (b) twice, rule (IV) - (c), and then rule (VI) three times, in that respective order, to the string produced thus far. Coordinates are interpreted at each step, and the set union operators are applied to unite the four sets (one of which is the empty set) of terminals generated for the encodement of "L2".
The result is:

S({L1},4) ∪ {g(3.5,.5), g(3.5,1.5), g(3.5,2.5), g(3.5,3.5)} ∪ ∅ ∪ {g(1.5,.5), g(1.5,1.5), g(2.5,1.5), g(2.5,2.5), g(2.5,3.5), g(3.5,3.5)}.

Note that the entire encodement of primitive "L2" resulted from its intersections with horizontal grid lines, as it lies entirely between two consecutive vertical grid lines at resolution "4".

The primitive represented by "L1" (the horizontal line segment on the bottom) is encoded by the sequential application of rule (I) - (a), rule (II), rule (III) - (a) twice, rule (III) - (b), rule (III) - (c), rule (V) twice, rule (IV) - (a), rule (IV) - (c)⁴, and then rule (VI), in that respective order, to the string produced thus far. Coordinates are interpreted at each step, and the set union operators are applied to unite the three sets of terminals generated for the encodement of "L1". The result is:

S(∅,4) ∪ {g(1.5,.5), g(1.5,1.5), g(2.5,.5), g(2.5,1.5), g(3.5,.5), g(3.5,1.5)} ∪ {g(3.5,.5), g(3.5,1.5), g(3.5,2.5), g(3.5,3.5)} ∪ {g(1.5,.5), g(1.5,1.5), g(2.5,1.5), g(2.5,2.5), g(2.5,3.5), g(3.5,3.5)}.

⁴ The x-coordinate of the "Gh" symbol generated by the application of rule (IV) - (c) includes the expression "(1/m)·y", which in this case takes the mathematically undefined form "(1/0)·0". This expression is to be interpreted as having the value "zero", in order to insure that proper encodements are produced. This is explained in detail in Subsection 3.3.6. (Also see Footnote 6 of this Chapter.)

At this point, the encodement for each of the four primitives in the T-F scene view is complete. The first coordinate of "S" has become the empty set, signifying that no more primitives remain to be encoded. Applying rule (I) - (b) to the string generated thus far, and combining all the sets of terminals with the set union operators, terminates the algorithm with the following result:

{g(1.5,.5), g(1.5,1.5), g(2.5,.5), g(2.5,1.5), g(2.5,2.5), g(2.5,3.5), g(3.5,.5), g(3.5,1.5), g(3.5,2.5), g(3.5,3.5)}.

This set of terminals represents the complete encodement -- as shown in Figure 3.5 -- of the T-F scene view of Figure 3.4 on the grid of resolution "4".

This example has demonstrated the application of the algorithm in producing the encodement of a specific T-F scene view. While this has illustrated the manner in which the rules can be applied, it remains to be shown that the algorithm works in general. Hence, the next topic to be considered will be the verification of the validity of the algorithm in the general case.

3.3.5 Method of the Algorithm: A Detailed Explanation and Verification

Through a discussion emphasizing the semantics associated with the rules of the algorithm, this subsection details the structure, application, and interpretation of the rules of the algorithm. In addition to explaining the method and logic of the algorithm, the goal is to justify that the structure of the algorithm is such that the proper encodement of T-F scene views is assured.

Recall the grammatical structure of the algorithm with rules which are applied like the productions of a grammar in deriving a terminal string. As is the case with productions in a grammar, the order of application of the rules of the algorithm in deriving sets of terminals is not necessarily unique. However, a unique order of application for the rules of the algorithm will be used in this discussion in order to make the logic of the algorithm more understandable. This unique order appears in the flow charts of Figures 3.2 and 3.3, which represent the method used by the algorithm in generating encodements.
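The method summarized in these flow charts can also be reproduced directly from the intersection-point interpretation of Subsection 3.3.3, without the rewriting machinery. The sketch below is illustrative only: the helper name encode_view is hypothetical, and a primitive lying exactly along a grid line is simply skipped in the corresponding loop rather than handled through the special interpretations of Subsection 3.3.6. The check at the end uses the slanted primitive L4 of the example, with standard-screen endpoints recovered from its interpreted coordinates L(3 1/4, 3 1/2, 1 1/4, 1) at resolution 4.

```python
from fractions import Fraction
from math import ceil, floor

def encode_view(view, r):
    """Centers of the shaded grid squares for a T-F scene view (a set of
    primitives, each a set of two standard-screen endpoint pairs) encoded
    on the grid of resolution r."""
    shaded = set()
    for primitive in view:
        (a, b), (c, d) = [(r * Fraction(x), r * Fraction(y)) for x, y in primitive]
        # Intersections with the vertical grid lines x = n that the primitive
        # can cross; each intersection shades the 2 or 4 squares touching it.
        for n in range(ceil(min(a, c)), floor(max(a, c)) + 1):
            if a != c:        # a vertical primitive on a grid line: handled in 3.3.6, omitted here
                y = b + (d - b) / (c - a) * (n - a)
                shaded |= {(n - .5, ceil(y) - .5), (n - .5, floor(y) + .5),
                           (n + .5, ceil(y) - .5), (n + .5, floor(y) + .5)}
        # Intersections with the horizontal grid lines y = n.
        for n in range(ceil(min(b, d)), floor(max(b, d)) + 1):
            if b != d:        # a horizontal primitive on a grid line: handled in 3.3.6, omitted here
                x = a + (c - a) / (d - b) * (n - b)
                shaded |= {(ceil(x) - .5, n - .5), (floor(x) + .5, n - .5),
                           (ceil(x) - .5, n + .5), (floor(x) + .5, n + .5)}
    return shaded

# The slanted primitive L4 of the example, encoded at resolution 4, yields
# exactly the six shaded squares derived for it in the text above.
L4 = frozenset({frozenset({(Fraction(13, 16), Fraction(7, 8)),
                           (Fraction(5, 16), Fraction(1, 4))})})
assert encode_view(L4, 4) == {(1.5, .5), (1.5, 1.5), (2.5, 1.5),
                              (2.5, 2.5), (2.5, 3.5), (3.5, 3.5)}
```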
Bear in mind that the flow chart in Figure 3.3 details the logic of block (E) of the flow chart in Figure 3.2. Both of these flow charts will be referenced frequently as we associate this logical structure with the rules of the algorithm.

The entity labeled "(0)" in the algorithm represents the complete specification of the starting type for the algorithm. More specifically, the starting symbol "S", chosen to be suggestive of the word "Scene", is followed by two coordinates. The first coordinate of "S" specifies a T-F view of a scene in terms of its set notational description. The scene has "p" primitives (p ≥ 1), with each primitive represented by its two endpoint coordinates specified relative to the standard screen. The second coordinate of "S" is a finite, non-zero, positive integer "r" which specifies the grid resolution at which the T-F scene view described by the first coordinate is to be encoded. Once the starting type has been completely specified with its two coordinates -- corresponding to block (B) of Figure 3.2 -- the rewriting rules in groups (I) through (VI) of the algorithm can be applied to produce the complete encodement specification.

A single application of rule (I) - (a) corresponds to block (C) of the flow chart in Figure 3.2. One of the "p" primitives "Li" = {(ai,bi),(ci,di)} is removed from the first coordinate of the starting type "S" and converted to its corresponding representation on the grid of resolution "r" as "L(r·ai, r·bi, r·ci, r·di)". The symbol "L" is chosen to be representative of the term "line segment primitive", with its first two coordinates representing the abscissa and ordinate, respectively, of one endpoint of the primitive, and its last two coordinates representing the abscissa and ordinate, respectively, of the other endpoint. One application of rule (I) - (a) thus corresponds to a partial application of Theorem 3.2, using a modified notation for the result of the coordinate transformation.

Rule (II) of the algorithm is preparatory to asking the questions posed in blocks (D) and (F) of Figure 3.2. This rule converts the specification of line segment primitive "L" into the specification of symbol "V", which will be used in generating intersection points of the primitive with vertical grid lines, and the specification of symbol "H", which will be used in generating intersection points of the primitive with horizontal grid lines. Recalling that the coordinates of "L" specify the endpoints of the primitive on the grid of resolution "r", we note the following regarding the coordinates of "V":

(1) the first coordinate of "V" represents the least integer greater than or equal to the smallest x-coordinate of the primitive's endpoints, and thus specifies the leftmost vertical grid line that the primitive can cross;

(2) the second coordinate of "V" represents the greatest integer less than or equal to the largest x-coordinate of the primitive's endpoints, and thus specifies the rightmost vertical grid line that the primitive can cross;

(3) the third coordinate "0" initializes a counter which will keep track of coordinate axes translations as intersection points on grid lines are determined; and

(4) the fourth and fifth coordinates, "a" and "b" respectively, represent the coordinates of one of the endpoints of the primitive, which along with the sixth coordinate -- which represents the slope of the primitive -- provide sufficient information for constructing the "point-slope" equation for the straight line segment primitive.
In a similar manner, the first coordinate of symbol "H" specifies the lowest horizontal grid line that the primitive can cross, the second coordinate of "H" specifies the highest horizontal grid line that the primitive can cross, and the remaining four coordinates of "H" are identical to those of "V".

The rules in Group (III) of the algorithm are used in determining all vertical grid line crossings of the primitive, if any, corresponding to block (D) and a portion of block (E) in Figure 3.2. Each application of rule (III) - (a) corresponds to a translation of the origin of the coordinate system one unit to the right and one unit up (i.e., one unit along the 45° diagonal of the grid). This decreases by "1" the values of all the coordinates of "V" specified relative to the coordinate system, which includes those values in the first, second, fourth, and fifth coordinates. The value of the third coordinate of "V", which represents a counter keeping track of the location of the translated coordinate system relative to the natural coordinate system of the grid of resolution "r", is increased by "1". The last coordinate of "V", representing the slope of the primitive, remains unchanged, as the slope of a line is invariant under a translation of coordinates. Rule (III) - (a) is applied consecutively zero or more times until one of the rules (III) - (b), (III) - (c), or (III) - (d) is applicable.

Rule (III) - (d) is applicable at this point if and only if both the smallest and largest x-coordinates on the primitive (which occur exactly at the endpoints of this line segment primitive) are not integers themselves, and both lie strictly between the same two consecutive integer values. This is signified by the "1,0" combination in the first two coordinates of "V", respectively, which occurs here if and only if the leftmost vertical grid line which the primitive can cross (as determined in rule (II)) is to the right of the rightmost vertical grid line which the primitive can cross (also as determined in rule (II)). Thus rule (III) - (d) is applicable only when the primitive does not cross any vertical grid lines, corresponding to the "no" exit of block (D) in Figure 3.2.

If rule (III) - (d) was not applicable at this point, this signifies that the "yes" exit from block (D) in Figure 3.2 would be taken. In this case, the "0" appearing as the first coordinate of "V" indicates that the value of the third coordinate "n" represents the leftmost vertical grid line on the grid of resolution "r" which is intersected by the primitive. Thus, the applications of rule (III) - (a) which led to the "0" in the first coordinate of "V" have already accomplished block (J) of Figure 3.3. Rule (III) - (c) is now applicable if and only if the second coordinate of "V", as well as its first coordinate, has reached a value of "0", signifying that the y-axis of the translated coordinate system coincides with the rightmost vertical grid line intersected by the primitive. In this case, an application of rule (III) - (c) generates the symbol "Gv" -- chosen to be suggestive of the terminology "intersection point on a vertical grid line" -- with two coordinates specified relative to the natural coordinate system of the grid of resolution "r". These two coordinates represent the abscissa and ordinate, respectively, of the intersection point of the primitive with the rightmost vertical grid line it crosses.
However, this intersection point lies on the y-axis of the translated coordinate system, and thus constitutes the y-intercept of the primitive, with coordinates (0, y - m·x) relative to the translated coordinate system, where: (1) "x" and "y", as the fourth and fifth coordinates of "V", represent the abscissa and ordinate, respectively, of an endpoint of the primitive relative to the translated coordinate system; and (2) "m", as the sixth coordinate of "V", represents the slope of the primitive.⁵ Therefore, adding the translation factor "n" (the third coordinate of "V") to the coordinates of this y-intercept specified relative to the translated coordinate system results in the coordinates "(n, y - m·x + n)", which are exactly the coordinates specified for the symbol "Gv" in rules (III) - (b) and (III) - (c).

⁵ If "m" = ∞, then the only time it will appear in the coordinates of the y-intercept -- or in the coordinates of "Gv" -- is when the value of "x" has become identically zero. In this case, the expression "m·x" takes the mathematically undefined form "∞·0", which is to be interpreted as having the value "zero" (see Subsection 3.3.6).

[Figure 3.6. Initial component of flow chart for "linguistic" algorithm application: after rule (II) is applied, the applicability of rules (III) - (d), (III) - (c), and (III) - (b) is tested in that order (rule (V) being applied following (III) - (c) or (III) - (b)), with rule (III) - (a) applied otherwise.]

[Figure 3.7. Terminal component of flow chart for "linguistic" algorithm application: the applicability of rules (IV) - (d), (IV) - (c), and (IV) - (b) is tested in that order (rule (VI) being applied following (IV) - (c) or (IV) - (b)), with rule (IV) - (a) applied otherwise, and rule (I) - (b) applied when it becomes applicable.]

Whenever only a single rule of the algorithm can be applied to an occurrence of a symbol in a derivation, that rule is applied to that occurrence of that symbol. But the cases where more than one rule can be applied to an occurrence of a symbol in a derivation, and thus where a choice must be made, need further explanation. These cases are exhaustively enumerated for the algorithm as follows:

case (1): whenever rule (III) - (b) is applicable, rule (III) - (a) is also applicable;
case (2): whenever rule (III) - (c) is applicable, rule (III) - (a) is also applicable;
case (3): whenever rule (III) - (d) is applicable, rule (III) - (a) is also applicable;
case (4): whenever rule (III) - (c) is applicable, rule (III) - (b) is also applicable;
case (5): whenever rule (IV) - (b) is applicable, rule (IV) - (a) is also applicable;
case (6): whenever rule (IV) - (c) is applicable, rule (IV) - (a) is also applicable;
case (7): whenever rule (IV) - (d) is applicable, rule (IV) - (a) is also applicable; and
case (8): whenever rule (IV) - (c) is applicable, rule (IV) - (b) is also applicable.

For the application of the rules of the algorithm to be consistent with their intended interpretations, the rule first mentioned in each of the above eight case descriptions should be the rule chosen for application whenever that respective case arises in a derivation. The consequence of choosing to apply rule (III) - (a) in either case (1) or case (2) would be the generation of the non-terminal symbol "V" with a first coordinate having a negative value. The only rule subsequently applicable for rewriting a "V" with a negative first coordinate would again be rule (III) - (a), whose application would only result in another appearance of the symbol "V" with some negatively valued first coordinate.
As this can only continue, the non-terminal symbol "V" having some negatively valued first coordinate will always be present in any subsequently derived string. Similarly, the consequence of choosing to apply rule (IV) - (a) in either case (5) or case (6) would be the generation of the non-terminal symbol "H", appearing with a negatively valued first coordinate and persisting with some negatively valued first coordinate in all subsequently derived strings. The consequence of applying rule (III) - (a) in case (3), or rule (III) - (b) in case (4), would be the generation of the non-terminal symbol "V", appearing with a negatively valued second coordinate and persisting with some negatively valued second coordinate in all subsequently derived strings. Analogously, we finally have the consequence of applying rule (IV) - (a) in case (7), or rule (IV) - (b) in case (8), as the generation of the non-terminal symbol "H", appearing with a negatively valued second coordinate and persisting with some negatively valued second coordinate in all subsequently derived strings. So the conclusion in any of these eight cases is that the choice and application of the "wrong" rule guarantees the existence of a non-terminal symbol in any string derived. Thus, the application of a rule of the algorithm with other than its intended interpretation at any point in a derivation precludes the possibility that a set of terminal symbols will be the exclusive result of that derivation.

As a result of the preceding extensive discussion, the algorithm has been "proven" in the following sense:

Theorem 3.3
Consider a completely specified starting type consisting of the terminal symbol "S" followed by a pair of parentheses enclosing two coordinates separated by a comma, where the first coordinate represents a T-F scene view "U" using its set notational description relative to the standard screen, and where the second coordinate represents the grid resolution "r" at which "U" is to be encoded. Additionally, let "T" denote the unique set of terminal symbols (with coordinates) which represents the encodement of "U" on the grid of resolution "r", where each element of "T" consists of the symbol "g" followed by two coordinates which specify the abscissa and ordinate of the center of one of the shaded grid squares.
Then there exists some sequential application of the rules of the algorithm (in which each rule may appear -- consecutively or non-consecutively -- zero or more times) which converts this completely specified starting type into exactly the set "T" and nothing else. Further, any set of terminal symbols (with coordinates) derived as the exclusive result of applying the algorithm to the completely specified starting type will be precisely the set "T".

3.3.6 Semantic Interpretation: Two Special Cases

When the algorithm is applied to encode a T-F scene view, situations can arise in which a special interpretation of coordinate values is required to insure that the algorithm generates the proper encodement. Such a situation occurs whenever a primitive of a T-F scene view coincides along its entire length with one of the grid lines. The interpretation problem results when conventionally undefined mathematical expressions arise in coordinates due to the "0" or "∞" value of the slope of the primitive.
Since these slope values occur if and only if a primitive is either a horizontal or vertical line segment in a T-F scene view, the consideration of all special coordinate interpretations can be included in two cases -- the case where a primitive is a vertical line segment, and the case where a primitive is a horizontal line segment. We first consider the vertical primitive case.

In the case where a primitive is a vertical line segment, it is parallel to the vertical grid lines and hence the coordinates "a" and "c" of symbol "L" in rule (II) of the algorithm are equal in value. This implies that the sixth coordinate generated by rule (II) for symbol "V" (and also for symbol "H") will have the form "(d-b)/0", where the value of "d-b" is non-zero since every scene primitive must have finite length. Since this sixth coordinate of "V" (and also of "H") represents the slope of the primitive, the expression "(d-b)/0" is interpreted as "∞".

Now a vertical primitive lies either strictly between two consecutive vertical grid lines, or directly on a vertical grid line. If the primitive lies strictly between two consecutive vertical grid lines, then rule (II) of the algorithm generates the symbol "V" with the value of its first coordinate exceeding the value of its second coordinate by "1". This means that rule (III) - (d) will be applicable after the zero or more applications of rule (III) - (a) required to reduce the second coordinate of "V" to zero. Hence rules (III) - (b) and (III) - (c) will never be used when encoding this primitive, eliminating any concern about the factor "∞" in the coordinates generated for "Gv".

Similarly, the value "∞" for "m" causes no problems of interpretation in the coordinates generated for "Gh". In fact, since the value of "y" in the x-coordinate of "Gh" will always be finite, the expression "(1/m)·y" will always have the value "0" when interpreted with "m" = "∞".

The expressions "∞·0" and "(1/0)·0" are conventionally mathematically undefined, but are both to be interpreted here as representing the numerical value "0" when they arise during application of the algorithm, which will insure proper encodements. These are the only two special coordinate value interpretations needed in any derivation which results exclusively in a set of terminals. If either of the two mathematically undefined expressions occur for any other reason than the ones given in (1) and (2) above, or if any of the coordinates of any occurrence of "Gv" or "Gh" have an interpreted value of "∞", then at least one rule of the algorithm has previously been applied contrary to its intended interpretation. Such an occurrence signifies that the derivation will not have a set of terminals as its exclusive result.

3.3.7 Grid Extension

When the algorithm is used to encode a T-F scene view which has at least one of its primitives touching a side border of the standard screen, the encodement generated by the algorithm will include one or more terminal symbols "g" having coordinates which specify a point lying outside the region of the encodement grid superimposed on the standard screen. This is due to the fact that the touching point of a primitive on a border of the standard screen is also an intersection point of that primitive with either a horizontal or vertical border line of the encodement grid. Hence, the algorithm (through rule (V) or (VI)) generates terminals whose coordinates represent the centers of shaded grid squares on both sides of that grid border line, although the grid has not yet been defined on one of those sides.
In such a situation, then, the problem is that grids which are confined to the area of the standard screen, and whose borders must be aligned with those of the standard screen, are not adequate to represent the complete encodement of the T-F scene view. A solution is provided by symmetrically extending the overall size of an encodement grid through an increase in the total number of squares in the grid, while maintaining the size of the individual grid squares and the alignment of the original grid (now symmetrically embedded in the "extended" grid) with the standard screen. This extension is accomplished for the grid of arbitrary resolution r -- as defined in Definition 2.7 -- by merely appending an additional grid square (of the same size as those already in the grid) along all sides and corners of the grid border. Note that this puts the borders of the standard screen strictly within the borders of the extended grid. Hence, any T-F scene view appearing on the standard screen -- even a view which touches the border of the standard screen -- will have its complete encodement contained entirely within the borders of the extended grid of any resolution r. The concept of an "extended grid" is formalized as follows:

Definition 3.2
The extended grid of resolution r is the grid consisting of (r+2)² identical squares, formed by extending the grid of resolution r "1" grid unit (at resolution r) on all four sides (left, right, up, and down) and diagonally at all four corners.

The coordinate system of the extended grid of resolution r is formed by symmetrically extending both the x- and y-axes of the grid of resolution r by two grid units -- the x-axis being extended one grid unit in both the positive and negative x directions, and the y-axis being extended one grid unit in both the positive and negative y directions. The origin of the coordinate system is maintained in its location before the extension, so that the origin appears at the lattice point located 1 grid unit right and 1 grid unit up from the lower left corner of the extended grid. Thus the coordinate axes labels for the extended grid of resolution r run from "-1" to "r+1".

Note that this definition insures that the grid and extended grid of resolution r both have the same size grid squares, and that the origins of their coordinate systems coincide. This preserves the definition of resolution for grids, and insures that all the results obtained in Section 3.2 -- which characterize the effects of grid resolution changes on T-F scene view descriptions -- are directly applicable for extended grids. Hence, it is no loss of generality to interpret an encodement generated by the algorithm on the appropriate extended grid. And such an interpretation will insure that the coordinates of all terminal symbols "g" generated by the algorithm will be within the coordinate system of the extended grid.

Some examples of finite extended grids with different resolutions defined relative to the same standard screen are shown in Figure 3.8. Note that these extended grids, unlike the corresponding (non-extended) grids discussed in Section 2.4, are not all of the same absolute size. However, the grid of resolution r is embedded in the corresponding extended grid of the same resolution, with its borders located "1" grid unit in from the corresponding borders of the extended grid.

[Figure 3.8. Examples of extended grids: (a) the standard screen (grid of resolution 1); (b) the extended grid of resolution 1; (c) the extended grid of resolution 2; (d) the extended grid of resolution 4.]
Further, bear in mind that the x and y coordinate axes are located "1" grid unit in from the bottom and left borders, respectively, of the extended grid. This is reflected by the coordinate labeling along both the bottom and left borders of the extended grids in Figure 3.8, which puts the origin of the coordinate system one grid unit up and right from the lower left corner. Because the coordinate systems of both the grid of resolution r and the extended grid of resolution r are identical on the standard screen, no confusion should result if a grid is considered to be an extended grid of the same resolution whenever necessary or appropriate. Hence, we will hereafter use the terms "grid" and "extended grid" interchangeably, unless designated otherwise.

3.4 Algorithmic Encodement of Snapshots

The "linguistic" algorithm (so named because of its resemblance in both structure and application to formal grammars) of Section 3.3 can be considered to generate snapshots -- instead of encodements consisting of shaded grid squares in specific locations on a grid -- if the coordinate values of the terminals are interpreted in the proper manner. Recall from Chapter 2 that although a snapshot is defined with respect to some specified grid resolution, it is only the structural form (including the overall orientation) of the entire pattern of shaded grid squares which defines the snapshot, irrespective of the relative translational position of that pattern on the grid. Thus, if a snapshot is completely represented (relative to some designated grid resolution) by a set of symbols, each of which has a unique pair of coordinates specifying the center of one of the shaded grid squares in the snapshot, then the absolute values of these coordinates are of no consequence with regard to their determination of the location of the pattern on the grid; it is only the relative positional relationships (location and orientation) existing between these coordinates which characterize the snapshot. This implies that any set of terminals (with coordinates) generated by the algorithm can be considered to represent a snapshot relative to the encodement resolution, under the interpretation that the coordinates of the terminal symbols only have positional significance relative to each other.

As a consequence of the preceding remarks, we note that distinct grid encodements (at the same resolution) which have the same relative positional relationships existing between their elements represent the same snapshot. A characterization of these equivalent encodement representations for a snapshot will be presented in Theorem 3.4, using the notation defined next.

Let "T" be a set of terminal symbols (with coordinates) as follows:

T = {g(x1,y1), g(x2,y2), ..., g(xq,yq)},

where q is a non-negative integer. Also, let "f" denote a coordinate transformation. Then we use the notation "f(T)" to denote the set

{g(x1',y1'), g(x2',y2'), ..., g(xq',yq')},

where (xi',yi') = f(xi,yi) for i = 1,2,...,q.

Theorem 3.4
Let T1 and T2 be two sets of terminal symbols (with coordinates), each of which is the exclusive result of a derivation using the "linguistic" algorithm of Section 3.3 with the same encodement resolution.
Then, T1 and T2 are equivalent encodement representations for the same snapshot (at the encodement resolution) if and only if f(T1) = T2 for some coordinate transformation "f" of the form

f(x,y) = (x + a, y + b),
where "a" and "b" are integer constants (either of which may be positive, zero, or negative). Proof The proof is based on the fact that the form specified for coordinate transformation "f" in the theorem defines "f" as a translation. Therefore, the 104 application of "f" to any set of coordinates "T" merely redefines the absolute location of all the points represented by the coordinates in "T". The result f(T) is a set having the same number of ele- ments as set "T", and all relative positional relationships (location and orientation) existing between the elements of "T" are maintained between the corresponding elements of f(T). Now consider the application of f to the encode- ment T1. The x-coordinate of each terminal symbol in T1 will be consistently relocated by "a" units, while the y-coordinate of each terminal symbol in T1 will be consistently relocated by "b" units. Hence the result f(Tl) is a set consisting of the same number of terminals as T1, with f defining a 1-1 correspondence between these terminals. Further, the relative positional location existing between the coordinates of any two terminals in encodement T1 is maintained between the translated coordinates of the corresponding terminals in f(Tl). But the coordi- nates of the terminals in f(Tl) represent the centers of grid squares on the encodement grid, Since they are derived by integer-valued translations -- through integers "a" and "b" of f -- of the coordinates of the elements in encodement T1. Thus the set f(Tl) can be interpreted as representing a set of shaded grid squares in an encodement, in which case the relative 105 positional relationships existing between any two shaded grid squares in encodement T1 is the same as the relative positional relationships which exist between the two corresponding shaded grid squares in encodement f(Tl). This means that T and f(Tl) are 1 equivalent encodement representations for the same snapshot (at the encodement resolution). Now if f(Tl) = T2, then obviously f(Tl) and T2 are equivalent encodement representations for the same snapshot, since they are identical. And since we have already Shown that T1 and f(Tl) are equivalent encodement representations for the same snapshot, it follows that T1 and T2 are equivalent encodement representations of the same snapshot (at the encode- ment resolution). Conversely, assume that T1 and T2 are equivalent encodement representations for the same snapshot (at the encodement resolution). Then T1 and T2 have the same number of elements7, and a 1-1 correspondence must exist between their elements such that the 7 This is a necessary condition for two encodements to be equivalent representations for the same snapshot. For if the number of elements in encodements T1 and T2 were different, then one would represent a pattern con— taining more shaded grid squares than the other. It would then follow that the two shaded grid square patterns represented could not be the same snapshot. 106 relative positional relationships existing between any two Shaded grid squares in encodement T1 also exists between the two corresponding shaded grid squares in encodement T2. Let "f." denote the 1-1 transformation from the elements of T1 to the elements of T2 which realizes this l-l correspondence, so that f'(T1) = T2. Then f. must be a coordinate transformation which preserves in f'(T1) the relative positional relation- ships existing between the elements of T1, implying that f' is a translation. 
Further, since f'(T1) = T2 and T2 is an encodement, the coordinates of all the terminal symbols in f'(T1) must represent the centers of grid squares on the encodement grid. Similarly, Since T1 is an encodement, the coordinates of all the terminal symbols in T1 also represent the centers of grid squares on the encodement grid. It thus follows that the translation increments of f. must be integer values for both the x-coordinate and y-coordinate relocations. Hence f1 is a translation of the same form as f, with f'(T1) = T2. Q.E.D. The intuitive interpretation of Theorem 3.4 is that two T-F scene view encodements (on the same resolution grid) represent the same snapshot whenever the entire pat- tern of shaded grid squares defined by one of the encode- ments can be translated to coincide precisely with the 107 entire pattern of shaded grid squares defined by the other encodement. This suggests the existence of an effective procedure for determining whetheror not two encodements represent the same snapshot, and the construction of such a procedure appears as the proof of the next theorem. Theorem 3.5 There exists an effective procedure for deter- mining whether or not two T—F scene view encodements (relative to the same resolution grid) represent the same snapshot at the encodement resolution. Proof Let T1 and T2 represent two encodements relative to the same resolution grid, where (using the notation of the "linguistic" algorithm) T1 {g(x1,19Y1’1):g(x1,zeyl,2)9---ag(x1,p:yl,p)} and T 2 ' {g(x2,1.y2,1).g(xz,2.y2,2)..-..g(x2,q.y2,q)h with p and q non-negative integers. Using Theorem 3.4, it suffices to show that one can effectively determine whether or not there exists a translation f having the form defined in Theorem 3.4 for which f(Tl) = T2. A procedure for accomplishing this is now described. First compare the values of p and q, which represent the number of elements in T1 and T2, 108 respectively. If p f q, then T1 and T2 cannot be equivalent encodement representations for the same snapshot (see Footnote 7), and we terminate this procedure with that conclusion. On the other hand if p = q, then compute the values of "a" and "b" as follows: a = min{x2,1,xz’z,...,x2’q} - min{x1’1,x1’z,...,x1’q} and b = min{y2,1,y2,2,...,y2’q} - min{y1,1,y1’2,...,y1,q}. Note that "a" represents the horizontal distance and direction (relative to T1) that exists between the leftmost shaded grid squares of T1 and T2, while "b" represents the vertical distance and direction (relative to T1) that exists between the lowest shaded grid squares of T1 and T2. Further, both "a" and "b" are integers, since each of the x- or y-coor- dinates of any of the terminals in encodements T1 and T2 represents the middle line between two conse- cutive grid lines, making the difference between any two x-coordinates or any two y-coordinates an integer. So if T1 and T2 are encodements which represent the same snapshot, then the values of "a" and "b" repre- sent the integer valued x and y distances, respec- tively, through which T1 must be translated on the 109 encodement grid to coincide with T2. Thus, we can define translation "f" by f(X.y) = (x+a.y+b). and determine f(Tl). Then if f(Tl) = T2, we conclude (using Theorem 3.4) that T1 and T2 are equivalent representations for the same snapshot at the encode- ment resolution. On the other hand, if f(Tl) # T2, we conclude (again using Theorem 3.4) that T1 and T2 are not equivalent encodement representations for the same snapshot. In either case, the procedure is terminated. Q.E.D. 
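The procedure constructed in this proof is straightforward to mechanize. The following sketch is our own illustration (the function name and data layout are assumptions, not notation from the thesis): an encodement is represented as a set of (x, y) coordinates of shaded grid-square centers -- only coordinate differences matter, so plain integers are used -- and two encodements are equivalent exactly when the translation computed from their leftmost and lowest shaded squares carries one onto the other.

```python
# Illustrative sketch of the procedure in the proof of Theorem 3.5 (our own
# code and names).  An encodement is a set of (x, y) coordinates for the
# centers of its shaded grid squares.

def same_snapshot(t1, t2):
    """Return True if encodements t1 and t2 are equivalent representations
    of the same snapshot, i.e., integer translations of one another."""
    if len(t1) != len(t2):      # necessary condition (see Footnote 7)
        return False
    if not t1:                  # two empty encodements trivially agree
        return True
    # Candidate translation "f": align the leftmost and lowest shaded squares.
    a = min(x for x, _ in t2) - min(x for x, _ in t1)
    b = min(y for _, y in t2) - min(y for _, y in t1)
    return {(x + a, y + b) for (x, y) in t1} == set(t2)

# The same 2-square horizontal pattern at two different grid locations:
print(same_snapshot({(0, 0), (1, 0)}, {(3, 5), (4, 5)}))   # True
# A 2-square vertical pattern is a different snapshot:
print(same_snapshot({(0, 0), (1, 0)}, {(3, 5), (3, 6)}))   # False
```

Because the candidate translation is forced by the extreme coordinates, a single comparison suffices; no search over translations is needed.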
Both theorems 3.4 and 3.5 state results which will be useful in examining the characterization of scene views through the snapshots they generate. We turn to such an examination next in Chapter 4. Chapter 4 SCENE VIEW RECOGNITION WITH SNAPSHOT SIGNATURES 4.1 Scene View Reco nition: General Considerations This chapter will deal with the problem of the recognition of scene views from their encodements. The problem stems from the fact that the encodement of a scene view on a finite grid does not necessarily preserve the full structural detail which characterizes that scene view. It is apparent that grids of coarse resolution (lower resolution values) are less capable of faithfully encoding detail in a scene view than grids of fine reso- lution (higher resolution values) would be. Hence the problem becomes one of identifying a scene view from its approximate representation on an encodement grid of finite resolution, where the degree of difficulty in making such an identification is directly related to the relative coarseness of the grid. The scene view recognition problem will be examined in the context of determining which of a finite number of known candidate scene views is represented by a given encodement (or encodements). Hence, we assume that a 110 111 finite number of prototype scene views are given, along with one or more encodements of a scene view which is an unknown one of the prototypes. The problem then is to identify which one of the prototype scene views generated the encodement (or encodements). Practically, an exact identification may not always be possible, in which case the goal is to keep the probability of making an identi- fication error small. We shall assume that any encodement of an unknown scene view results from some random positioning of an encodement grid over that scene view. This is a realistic assumption in many practical applications. For example, consider a situation in which a remote sensing "camera" is arbitrarily positioned (without tilting) to record the "picture" of some unknown object currently in view, after which the recorded "picture" is transmitted elsewhere for processing. In this case, the encodement of the unknown scene view is analogous to the ”picture" of the unknown object transmitted from the camera. Further, the reso- lution of the encodement grid is analogous to the reso- lution with which the camera records the picture. Finally, the random positioning of the encodement grid over the unknown scene view corresponds to the arbitrary posi- tioning of the camera before it records a picture of the unknown object. The consequence of assuming the random positioning of an encodement grid over a scene view is that the scene 112 view can generate, and thus be associated with, two or more snapshots. This implies that the characterization of a scene view in terms of the encodements it can generate at a specified grid resolution must be based on more than one snapshot. This will become more evident in the discussion and examples of the next section. 4.2 Snapshot Sets It was noted in Chapter 2 that the encodement which results from a grid placed randomly over a scene view is dependent upon, among other things, the relative translational position existing between the grid and scene view. This became even more apparent in Chapter 3, where the algorithm for generating the encodement for a scene view was dependent upon the translationally-fixed ("T-F") position of that scene view with respect to the encodement grid. 
Since we now wish to consider the encodement which results from a random positioning of a grid over a scene view, it might appear that the encodements produced by all possible T-F positionings of the scene view relative to the encodement grid would need to be examined. Fortunately, however, we are interested only in the distinct snapshots which are represented by all these encodements. In this case, due to the symmetry of the encodement grid, any two T-F scene views which are integer-valued translations of each other -- both horizontally and vertically -- on the encodement grid will generate encodements which represent the same snapshot. This observation is rephrased in Theorem 4.1 in terms of a restricted set of encodement grid translations which will produce all possible snapshots of a scene view on that grid. The proof will utilize the following notation. Let "D" represent the set notational description of a T-F scene view relative to some encodement grid, where

D = { {(a1,b1),(c1,d1)}, {(a2,b2),(c2,d2)}, ..., {(ap,bp),(cp,dp)} },

with p a positive, non-zero integer. Also let "f" denote a two-dimensional coordinate transformation. Then the notation "f(D)" denotes the set

{ {f(a1,b1),f(c1,d1)}, {f(a2,b2),f(c2,d2)}, ..., {f(ap,bp),f(cp,dp)} }.

Theorem 4.1

All possible snapshots of a scene view on an (extended) encodement grid are represented by the encodements which result from all possible translations of the encodement grid which move the scene view no more than 1/2 grid unit, in both the horizontal and vertical directions, away from its (arbitrary) initial T-F position on that encodement grid.

Proof

Let "Dr" denote the set notational description of some T-F scene view in its initial position relative to an (extended) encodement grid of resolution r. Further, let "M" be a set of coordinate translations as follows:

M = {f | f(x,y) = (x+a, y+b), with a, b ∈ [-.5, .5]},

where [-.5, .5] = {z | -.5 ≤ z ≤ .5}. Then the set

V = {f(Dr) | f ∈ M}

denotes all the T-F scene views which result when the encodement grid is translated such that the scene view is kept 1/2 grid unit or less, in both the horizontal and vertical directions, from its initial T-F position on that encodement grid.

Now consider some translation of the encodement grid which moves the scene view more than 1/2 grid unit either horizontally or vertically (or both) away from its initial position on the grid. Then the resulting T-F scene view is given by f'(Dr), where f' is a coordinate translation of the form

f'(x,y) = (x+a', y+b'),

where either a' ∉ [-.5, .5] or b' ∉ [-.5, .5], or both. Hence f'(Dr) ∉ V. But note that

a' = a' - 0 = a' - ⌊a'+.5⌋ + ⌊a'+.5⌋ = (a' - ⌊a'+.5⌋) + ⌊a'+.5⌋,

where the quantity in parentheses denotes some value in [-.5, .5], and where ⌊a'+.5⌋ denotes an integer. Similarly,

b' = (b' - ⌊b'+.5⌋) + ⌊b'+.5⌋,

where the quantity in parentheses denotes some value in [-.5, .5], and where ⌊b'+.5⌋ denotes an integer. Let f1 denote the translation

f1(x,y) = (x + a' - ⌊a'+.5⌋, y + b' - ⌊b'+.5⌋),

and f2 denote the translation

f2(x,y) = (x + ⌊a'+.5⌋, y + ⌊b'+.5⌋).

Then f'(Dr) = f2(f1(Dr)). But f1(Dr) ∈ V, and f2(f1(Dr)) -- which is f'(Dr) -- denotes the translation of T-F scene view f1(Dr) some integer number of grid units in both the horizontal and vertical directions on the encodement grid.
Such a translation (i.e., f2) guarantees that corresponding points (under £2) along the primitives in both f1(Dr) and f'(Dr) will have the same relative positional location with respect to the pattern formed by the vertical and horizontal lines of the encodement grid. In particular, the intersection points with grid lines of the primitives in fl(Dr) will be mapped (under f2) exactly onto the intersection points with grid lines of primitives in f'(Dr), with the corres- ponding points of intersection of the primitives in f1(Dr) and f'(Dr) appearing on the same type of grid line (vertical or horizontal) at the same relative distance (and in the same relative direction) from the grid line of Opposite type which is nearest each 116 respective point. Thus the corresponding points of intersection with grid lines of the scene primitives in fl(Dr) and f'(Dr) cause precisely the same pattern of grid squares to be shaded. Further, since f1(Dr) and f'(Dr) are related by a translation (i.e., f2), the relative positional relationships existing i between corresponding points of intersection of the scene primitives with grid lines will be the same in both f1(Dr) and f'(Dr). Hence, if T1 and T. repre- sent the encodements (as a set of terminals with coordinates which denote the centers of the shaded grid squares on the encodement grid, as in Chapter 3) produced by T-F scene views fl(Dr) and f'(Dr), respectively, then f2(T1) = T'. Hence, by Theorem 3.4, it follows that T1 and T. are equivalent encode- ment representations for the same snapshot on the grid of resolution r. Thus, the theorem has been proved, Since translation f' was arbitrary. Q.E.D. Note that the limited range of translations of the encodement grid Specified by Theorem 4.1 as being suffi- cient to generate all the possible snapshots for a scene view also insures that every such snapshot can be entirely represented within the borders of the extended grid of the encodement resolution. This follows because any scene, 117 and hence any scene view, can be contained entirely within the borders of the standard screen. Letting some T-F position of a Scene view on the standard screen define the initial T-F position of that scene view relative to the (extended) encodement grid will then insure that any point on any primitive of the scene view is at least 1 grid unit (at resolution r) away from the border of the extended grid of arbitrary resolution r. Hence, any translation of the encodement grid from this initial position which has its horizontal and vertical components of motion limited to % grid unit (at the encodement resolution) will produce only T-F scene views which, at any point, are at least % grid unit away from any border of the extended grid of the encodement resolution. Thus, since no border line of this extended grid is touched, the entire encode- ment -- and hence the snapshot it represents -- lies within its borders. The set of all the distinct snapshots which a scene view can generate on an encodement grid can be considered as a collective representation of that scene view relative to that grid. This suggests the following terminology. Definition 4.1 A snapshot set of a scene view, relative to an encodement resolution r, is the collection of all snapshots which can result by encoding that scene 118 view (in different relative translational posi- tionings) on the extended grid of resolution r. 
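The key step in the proof of Theorem 4.1 -- writing an arbitrary translation amount as a component of magnitude at most 1/2 plus an integer component -- can be checked numerically. The short sketch below is only an illustration of that decomposition; the function name is ours.

```python
# A numeric illustration (ours, not the thesis's) of the decomposition used in
# the proof of Theorem 4.1: any translation amount a' equals
# (a' - floor(a' + .5)) + floor(a' + .5), a part of magnitude at most 1/2
# plus an integer part.
import math

def decompose(shift):
    integer_part = math.floor(shift + 0.5)
    return shift - integer_part, integer_part

for a_prime in (1.3, -0.7, 2.5, 0.49):
    frac, whole = decompose(a_prime)
    assert abs(frac) <= 0.5 and math.isclose(frac + whole, a_prime)
    print(f"{a_prime:5.2f} = {frac:5.2f} + ({whole})")
```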
The snapshot set of a scene view relative to any finite encodement resolution r will itself be finite, since the maximum number of binary encodements -- each representing a (not necessarily distinct) snapshot -- which can be formed by the grid squares of the extended grid of resolution r is 2^((r+2)^2). Of course, not all of these possible snapshots will appear in the snapshot set of a particular scene view. In fact, the number of distinct snapshots in a snapshot set of a scene view can be relatively small, as illustrated in Figure 4.1. The shaded grid square patterns labeled (a) through (g) in Figure 4.1 represent the seven snapshots in the snapshot set of an isosceles triangle relative to an encodement grid whose resolution makes the base and altitude of the triangle 2 grid units and 1 grid unit, respectively. Similarly, the shaded grid square patterns labeled (I) through (IV) in Figure 4.1 represent the four snapshots in the snapshot set of a square relative to an encodement grid whose resolution makes the side length of the square 2 grid units. The appearance of the figure itself (triangle or square) in each of the snapshots is intended to illustrate its fixed orientation throughout the encodement process. Also, the relative translational positioning existing between the figure and the grid lines shown in each snapshot illustrates how that snapshot might be generated in grid encodements of that figure. However, remember that the figure itself is not part of the snapshot -- only the shaded grid squares are.

[Figure 4.1. Two examples of snapshot sets.]

For scene views having a more complex structure than those just considered in Figure 4.1, the determination of a snapshot set -- especially with respect to high encodement resolutions -- can be a formidable task. However, a practical method exists for approximating a snapshot set in such cases. Essentially, the method consists of applying the "linguistic" algorithm of Chapter 3 a finite number of times to produce the T-F scene view encodements which result from randomly chosen relative translational positionings between the encodement grid and scene view. The random sample of encodements which results for that scene view can then be examined using the effective procedure given in the proof of Theorem 3.5 to eliminate any duplicate specifications of the same snapshot within the sample. The resulting collection of encodements, interpreted as a collection of distinct snapshots, then represents an approximation for the snapshot set of the scene view relative to the resolution of the encodement grid.

A method for choosing the random relative translational positionings between the encodement grid and scene view in the snapshot set approximation procedure will now be explained. Let "v" represent the scene view whose snapshot set is to be approximated, and let "v0" denote some T-F position of "v" whose set notational description relative to the standard screen is denoted by "D". Consider the set "V" given by

V = {f(D) | f(x,y) = (x+a, y+b) and a, b ∈ [-1/(2r), 1/(2r)]},

where [-1/(2r), 1/(2r)] = {z | -1/(2r) ≤ z ≤ 1/(2r)}, and r is a positive, non-zero integer.
Each of the elements of V is the set notational description of a T-F position of "v" relative to the standard screen, produced by some translation which moves "v" away from its position at "v0" no more than 1/(2r) grid unit, in both the horizontal and vertical directions, on the standard screen. But since a distance of 1/(2r) grid unit on the standard screen corresponds to a distance of 1/2 grid unit on the (extended) grid of resolution r, each of the elements of V can be interpreted as the set notational description of a T-F position of "v" relative to the standard screen in which "v" has been translated away from its position at "v0" no more than 1/2 grid unit, in both the horizontal and vertical directions, on the (extended) grid of resolution r. Hence, it follows from Theorem 4.1 that all possible snapshots of "v" relative to resolution r are generated by encoding all the T-F scene views in V on the extended grid of resolution r. But V is an infinite set, so it is not practically possible to individually encode each of the T-F scene views in V. However, any two randomly and independently chosen T-F scene views in "V" have the same probability of encoding any given snapshot of "v". Hence, it seems reasonable to expect that the snapshot set of "v" can be usefully approximated by the collection of snapshots which results from encoding all the T-F scene views in some randomly chosen finite subset V' of V, provided that the number of elements in V' is sufficiently large.1

1 The number of elements chosen for V' should certainly depend on the relative complexity of scene view "v" -- the more complex the structure of "v", the larger the set V' should be. Further, if the set V' is large enough, the only snapshots of "v" which are likely to be omitted by the snapshot set approximation computed from V' would be those which have a low probability of being the encodement result from a random translational positioning of the encodement grid over "v". These occurrence probabilities for snapshots are discussed in Section 4.3.

Note that a (non-unique) set V' of n elements can be formed from V by independently choosing each v' ∈ V' as follows: (1) randomly select two independent real values "a" and "b" from the interval [-1/(2r), 1/(2r)], and then (2) determine v' as f(D), where f is the coordinate translation defined by f(x,y) = (x+a, y+b). The elements of V' can then be individually encoded at resolution r by applying the "linguistic" algorithm of Chapter 3 "n" different times.

It should be apparent that the randomness in the snapshot set approximation procedure just described can be incorporated directly within the structure of the "linguistic" algorithm of Chapter 3 by appropriately modifying its "productions". One such modification will be presented which keeps the same starting type S, but introduces an additional non-terminal symbol "S'". Additionally, the symbol "ran" will appear in the coordinates to denote a randomly chosen real value from the set {z | 0 ≤ z ≤ 1}, where that value is chosen independently for each occurrence of the symbol "ran". The symbols "r1" and "r2" are used to represent two independently generated random real values between 0 and 1. Now consider the following group of three rules:

(I)-(a)'

S({L1,L2,...,Lp}, r) → S'({L1,L2,...,Lp}, r, ran, ran)

(I)-(b)'

S'({L1,L2,...,Lk}, r, r1, r2) → S'({L1,L2,...,Lk-1}, r, r1, r2) L(r·ak + r1 - 1/2, r·bk + r2 - 1/2, r·ck + r1 - 1/2, r·dk + r2 - 1/2)

(I)-(c)'
S'(φ, r, r1, r2) → λ (the empty string)

Modify the "linguistic" algorithm of Chapter 3 by replacing rules (I)-(a) and (I)-(b) with the rules (I)-(a)', (I)-(b)', and (I)-(c)'. Then for some scene view "v", let the first coordinate of starting symbol "S" represent the set notational description for some T-F position "v0" of "v" on the standard screen. The second coordinate of starting symbol "S" should be the desired encodement resolution for "v". Then any set of terminals which is the exclusive result of a derivation using the modified algorithm represents the snapshot which results from some random translation of "v" within 1/2 grid unit, both horizontally and vertically, from its initial position "v0" on the encodement grid. Note that this random translation of "v" is done independently, but consistently, for each of its primitives by the coordinates of symbol "L" in rule (I)-(b)'; a short sketch of this coordinate step is given after the discussion of composite scene views below.

Suppose a simple or composite scene view is divided into several component parts, and the snapshot sets for each of these component parts are then determined separately on the encodement grid of resolution r. Contrary to what might be expected, the snapshot set of the total scene view relative to resolution r is not necessarily represented by the set of all possible combinations which can be formed by taking one element from each of the snapshot sets of its component parts. This is illustrated with Figures 4.1 and 4.2. In Figure 4.1 the snapshot sets separately determined for each of two object views are represented -- the 7 snapshots of the triangle define its snapshot set, while the 4 snapshots of the square define its snapshot set. This represents 28 possible combinations of two snapshots (one taken from each snapshot set), all of which might be expected to appear in some composite scene view containing both of the objects. However, Figure 4.2 illustrates the same triangle and square having a fixed positioning relative to each other, thus forming a composite scene view of the two objects which has only the 7 (not 28) snapshots shown in Figure 4.2 in its snapshot set relative to the encodement grid. The labeling of each snapshot of the composite scene in Figure 4.2 reflects the two snapshots from Figure 4.1 -- one from each object -- which combined to form that snapshot of the composite scene view.

[Figure 4.2. A composite scene view and its snapshot set. The bottoms of the square and triangle are at the same vertical level, with the square to the right of the triangle; the objects are separated horizontally by 3 grid units at the encodement resolution.]

Figure 4.3 illustrates that the snapshot set of a composite scene is dependent upon the relative positioning between its component objects. The triangle and square of Figure 4.1 have a fixed position relative to each other in Figure 4.3 which is different from their positioning in Figure 4.2, thus forming a different composite scene view which has the 8 snapshots shown in Figure 4.3 as part of its snapshot set on the encodement grid.2 The comparison of Figure 4.3 with Figure 4.2 thus reflects the effect that the relative positioning of the component objects in a composite scene view can have on the snapshot set -- in terms of both the number and form of its elements -- of that composite scene view.
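As mentioned above, the coordinate bookkeeping of rules (I)-(a)' and (I)-(b)' amounts to drawing one random offset (r1, r2) and applying it, together with the scaling by r, to every primitive of the scene view. The sketch below illustrates just that step under an assumed representation of a scene view as a list of primitives given by their standard-screen endpoints; it does not reproduce the remainder of the "linguistic" algorithm of Chapter 3, which rewrites each "L" symbol into terminal symbols.

```python
import random

def randomized_primitive_coordinates(primitives, r, rng=random):
    """Coordinate step of rules (I)-(a)' and (I)-(b)': draw one random offset
    (r1, r2) and apply it, together with the scaling by r, to every primitive
    of the scene view."""
    r1, r2 = rng.random(), rng.random()        # the two "ran" values of rule (I)-(a)'
    return [(r * a + r1 - 0.5, r * b + r2 - 0.5,
             r * c + r1 - 0.5, r * d + r2 - 0.5)
            for (a, b), (c, d) in primitives]  # arguments of the "L" symbols

# Example: a scene view made of two connected primitives, given by their
# endpoints on the standard screen, to be encoded at resolution 4.
scene = [((0.10, 0.20), (0.60, 0.20)),
         ((0.60, 0.20), (0.60, 0.70))]
print(randomized_primitive_coordinates(scene, r=4))
```

Drawing r1 and r2 once, before the loop, is what makes the random shift consistent across all the primitives of the scene view.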
The overall conclusion to be drawn from these illustrations is that the "superposition" principle doesn't hold in the formation of a snapshot set -- i.e., the snapshot set of a simple or composite scene view "v" cannot be constructed by forming all possible combinations of snapshots from the snapshot sets of the component parts of the scene view considered individually. While some subset of all these possible snapshot combinations will define the snapshot set of "v", exactly which subset it will be is determined by the interconnection and relative positioning constraints which "v" imposes on its components. Explicit consideration of these constraints can be avoided by merely determining the snapshot set of a simple or composite scene view from encodements made with the entire scene view intact.

2 The snapshots of the composite scene view in Figure 4.3 which are formed from combinations involving snapshots (e), (f), and (g) of Figure 4.1 have not been shown in Figure 4.3.

[Figure 4.3. Another composite scene view and part of its snapshot set. The bottom of the square is 1/2 grid unit below the bottom of the triangle, with the square to the right of the triangle; the objects are separated horizontally by 3 1/2 grid units at the encodement resolution.]

4.3 Snapshot Signatures

Note that it is possible for two distinct scene views to have the same snapshot sets relative to some encodement resolution. This is illustrated in Figure 4.4, where the snapshot sets for both a triangle and a left-right symmetrical trapezoid are shown relative to an encodement resolution which makes their base lengths 2 grid units, their altitudes 1 grid unit, and the top length of the trapezoid 1/2 grid unit. Both of the views of these objects shown in Figure 4.4 have the same 7 snapshots in their snapshot sets. Such an example demonstrates that snapshot sets alone are inadequate to completely characterize scene views relative to some encodement resolution, since a single snapshot set may be generated by two (or more) distinct scene views at that resolution.

[Figure 4.4. Example of two object views with the same snapshot set, but different snapshot signatures. For each of the snapshots (a) through (g), the figure tabulates the probability of snapshot occurrence (with random grid positioning) for the triangle and for the trapezoid, together with the center of gravity (C.G.) of each snapshot.]

A more complete characterization of a scene view "v" in terms of its grid encodements at resolution r is possible by associating a probability distribution with the snapshot set S -- where S = {s1,s2,...,sn} -- of "v" relative to resolution r. This probability distribution is realized by a probability function "p" which assigns to each snapshot si ∈ S a value "p(si)" which represents the probability that si will be the snapshot encoded as a result of some random positioning of an encodement grid of resolution r over the scene view "v". We will call "p(si)" as defined above the "probability of occurrence of snapshot si" relative to a specified scene view and encodement resolution.
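The probability of occurrence just defined can be estimated in the same spirit as the snapshot set approximation of Section 4.2: encode the scene view many times under independently chosen random positionings, normalize each encodement as in Theorem 3.5, and tally relative frequencies. The sketch below (our own code, with an assumed representation of snapshots as normalized sets of shaded squares) does this for a case whose encodement is trivial to compute -- a single horizontal segment, which with probability 1 lies strictly between two horizontal grid lines; alignments of probability zero are ignored.

```python
import math, random
from collections import Counter

def snapshot_of_horizontal_segment(length, r, rng):
    """One random encodement of a horizontal segment of the given
    standard-screen length at resolution r, returned as a canonical
    (translation-normalized) frozenset of shaded grid squares."""
    ra = r * length                   # length in grid units
    x0 = rng.random()                 # horizontal offset within one grid square
    # Generic case only: the segment lies strictly between two horizontal grid
    # lines, so exactly one row of squares is shaded.
    width = math.floor(x0 + ra) - math.floor(x0) + 1
    return frozenset((c, 0) for c in range(width))

def estimate_signature(length, r, trials=20000, seed=1):
    rng = random.Random(seed)
    counts = Counter(snapshot_of_horizontal_segment(length, r, rng)
                     for _ in range(trials))
    return {snap: n / trials for snap, n in counts.items()}

# A segment of standard-screen length .91 encoded at resolution 2:
# ra = 1.82, so runs of 2 and 3 shaded squares occur with theoretical
# probabilities 1 - .82 = .18 and .82.
for snap, prob in sorted(estimate_signature(0.91, 2).items(), key=lambda kv: len(kv[0])):
    print(f"run of {len(snap)} shaded squares: estimated probability {prob:.3f}")
```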
Definition 4.2

The snapshot signature of scene view "v" relative to encodement resolution "r" consists of the snapshot set of "v" relative to "r" along with an associated probability distribution which defines the probability of occurrence (relative to "r") of each snapshot in the snapshot set of "v".

Examples of two snapshot signatures are illustrated in Figure 4.4. The snapshot signature of the triangle relative to the encodement resolution is given by the snapshot set {(a),(b),(c),(d),(e),(f),(g)} with the associated probability distribution: p((a)) = p((b)) = p((c)) = 0, p((d)) = α1, p((e)) = β1, p((f)) = γ1, and p((g)) = δ1. The snapshot signature of the trapezoid relative to the encodement resolution is given by the snapshot set {(a),(b),(c),(d),(e),(f),(g)} with the associated probability distribution: p((a)) = p((b)) = p((c)) = 0, p((d)) = α2, p((e)) = β2, p((f)) = γ2, and p((g)) = δ2. The probability of occurrence of each of the snapshots (a) and (c), for both the triangle and trapezoid, is 0 because these snapshots are generated only when the base of the figure lies exactly along a horizontal grid line. Such a result has probability 0 of occurring with some random positioning of the encodement grid over either figure. Similarly, the probability of occurrence of snapshot (b) for either the triangle or trapezoid is 0 because the snapshot is generated only when both endpoints of the base of the figure just touch two vertical grid lines -- again a result which has probability 0 of occurring with some random positioning of the encodement grid over either figure. Now snapshot (d) is produced for either the triangle or the trapezoid whenever a horizontal line of the encodement grid divides the figure into an upper and lower part, and each part intersects exactly two vertical grid lines. Such a result occurs with non-zero probability whenever an encodement grid is randomly positioned over either figure. Further, notice that if a horizontal grid line divides both the triangle and trapezoid at the same vertical level with respect to their bases, then the bottom parts of both figures will have the same maximum horizontal width, while the top part of the triangle will have a maximum horizontal width which is less than the maximum horizontal width of the top part of the trapezoid. Hence, for equivalent horizontal positionings (with respect to vertical grid lines) of the bases of both figures, there are fewer vertical positionings (with respect to horizontal grid lines) for the triangle than there are for the trapezoid which produce snapshot (d). So it follows that the probability of snapshot (d) occurring for some random positioning of the grid with respect to the figure is less for the triangle than for the trapezoid -- i.e., α1 < α2. Similar considerations, based on the fact that the top of the trapezoid is wider than the top of the triangle, result in the conclusions that β1 > β2, γ1 < γ2, and δ1 < δ2. Thus, the snapshot signatures of the triangle and trapezoid relative to the encodement resolution shown in Figure 4.4 are distinct, even though their snapshot sets are identical at that encodement resolution. This demonstrates that snapshot signatures -- in contrast with snapshot sets -- represent a more complete characterization of scene views in terms of their grid encodements. This snapshot signature characterization of scene views will prove useful when scene views are to be recognized from their encodements at coarse resolutions.
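For what follows it is convenient to record a snapshot signature as a table pairing each snapshot in the snapshot set with its probability of occurrence. The sketch below uses hypothetical probability values, chosen only to respect the orderings described for Figure 4.4 (they are not the figure's actual values), and shows two signatures with identical snapshot sets but different distributions.

```python
# Hypothetical snapshot signatures in the style of Figure 4.4 (values are
# illustrative only): identical snapshot sets, different distributions.
triangle_sig  = {"a": 0.0, "b": 0.0, "c": 0.0, "d": 0.20, "e": 0.50, "f": 0.12, "g": 0.18}
trapezoid_sig = {"a": 0.0, "b": 0.0, "c": 0.0, "d": 0.26, "e": 0.38, "f": 0.16, "g": 0.20}

def same_snapshot_set(sig1, sig2):
    """Snapshot sets agree when the two signatures list exactly the same
    snapshots (including any with probability of occurrence 0)."""
    return set(sig1) == set(sig2)

def same_signature(sig1, sig2, tol=1e-9):
    return same_snapshot_set(sig1, sig2) and all(
        abs(sig1[s] - sig2[s]) <= tol for s in sig1)

assert abs(sum(triangle_sig.values()) - 1.0) < 1e-9
assert abs(sum(trapezoid_sig.values()) - 1.0) < 1e-9
print(same_snapshot_set(triangle_sig, trapezoid_sig))  # True  -- same snapshot set
print(same_signature(triangle_sig, trapezoid_sig))     # False -- different signatures
```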
The determination of the theoretical probability distribution for the snapshot signature of a scene view "v" relative to some encodement resolution "r" will now be discussed. We first make a definition of terminology.

Definition 4.3

A unit region relative to resolution r is the set of all points contained in a square-bordered region of area 1 on the grid of resolution r. A standard unit region relative to resolution r is a unit region relative to resolution r which has its borders parallel to the grid lines (i.e., horizontal and vertical).

Let "p" denote a reference point on scene view "v". Then each relative translational positioning of v with respect to the encodement grid of resolution r is characterized by exactly one relative location of p with respect to the pattern of horizontal and vertical grid lines of the encodement grid of resolution r. In turn, each such location of p relative to the pattern of grid lines at resolution r is characterized by exactly one of the points contained in an arbitrary standard unit region "R" on the encodement grid of resolution r. Hence a random positioning of the encodement grid of resolution r over the scene view v is equivalent to the random selection of a location for p from R. Note that the points of R are associated with the snapshot set "S" of v relative to resolution r by the function "h" defined as follows: for p0 ∈ R, h(p0) ∈ S is such that h(p0) represents the snapshot of v that would be encoded on the grid of resolution r with v positioned so that its reference point p is at location p0. Thus S = {h(p0) | p0 ∈ R}. Now each snapshot in S can be characterized in terms of the set of points Ri ⊆ R which are associated with it under "h" -- i.e., for S = {s1,s2,...,sn},

Ri = {p0 | p0 ∈ R and h(p0) = si}, for i = 1,2,...,n.

Further, each Ri can be interpreted as defining a subregion (not necessarily connected) of R having an area (interpreted relative to the real Cartesian plane) denoted by "A(Ri)", for i = 1,2,...,n. Analogously, let "A(R)" denote the area of region R, so that A(R) = 1 (recall R is a standard unit region). Then the probability that h(p0) = si for a randomly chosen p0 ∈ R is just the probability that p0 ∈ Ri, which is given by

A(Ri)/A(R) = A(Ri)/1 = A(Ri), for i = 1,2,...,n.

But the probability that p0 ∈ Ri is just the probability of occurrence of snapshot si -- i.e., p(si) -- relative to scene view v and resolution r. Hence, the snapshot signature of "v" relative to r is given by the snapshot set S = {h(p0) | p0 ∈ R} = {s1,s2,...,sn} with associated probability distribution

p(si) = A(Ri), for i = 1,2,...,n.

Figure 4.5 illustrates the theoretical snapshot signature of a rectangle relative to an encodement grid whose resolution makes the rectangle 3 grid units high and 5 1/2 grid units wide.

[Figure 4.5. Snapshot signature of a 3 x 5 1/2 grid unit rectangle: snapshots s1 through s6 have probability of occurrence 0, while s7 and s8 each have probability of occurrence 1/2.]

Its snapshot signature is given by snapshot set S = {s1,s2,s3,s4,s5,s6,s7,s8} with associated
Let the upper left corner of the rectangle be reference point p. Also, let R be the standard unit region defined by one of the squares in the encodement grid3 -- say the one whose vertical borders are defined by x=i and x=i+l, and whose horizontal borders are defined by y=j and y=j+l. Then R = {(x,y)|i c. Note that by making c larger, we can increase our certainty that the test will not reject H1 when H1 is true, but we also decrease our certainty that the test will not accept H1 when H1 is false. Putting this another way, by making c larger the probability that the test will reject H1 when H1 is true is made smaller, but the pro- bability that the test will accept H1 when H1 is false is made larger. This reflects the two types of errors which must be considered in tests of hypotheses: (1) a type (I) error, which is made when the test rejects H1 when H1 is true; and (2) a type (II) error, which is made when the test accepts H1 when H1 is false. In conventional nota- tion [M-4], p(I) is used to denote the probability of a 147 type (I) error occurring and p(II) is used to denote the probability of a type (11) error occurring, relative to a specified test. As noted above, it is only possible to control one of p(I) or p(II) if the likelihood ratio cri- terion is used relative to a fixed sample size, since a single value of c must be chosen. Generally, c is chosen such that p(I) is fixed at a certain acceptable value a (typically .05 or .01), which automatically assigns some value 8 to p(II). While it has been shown in the liter- ature [M-4] that the likelihood ratio test minimizes the value of B (of p(II)) for a fixed value a (of p(I)) when a sample of (fixed) size n is being considered, the value 8 (of p(II)) may still be too large to be acceptable if this scheme were to be used for practical scene view identification. However, in many practical applications involving scene view recognition, it is reasonable to assume that experiment Er could be performed an arbitrary (but still finite) number of times on an unknown scene view v. For instance, if v was an unknown typewritten digit of a zip code, a mail sorter could reasonably have the capability to (rapidly) scan v an arbitrary number of times in attempting to identify the digit it represents. In such situations where the sample size n can be varied at will, it is possible to control both p(I) and p(II) at arbitrary pre-specified levels a and B, respectively. This is done by determining both a lower threshold c1l for Ln such that the test accepts hypothesis H1 with p(II) = B if Ln: c1, while the test accepts hypo- thesis H2 (rejects H1) with p(I) = a if Ln 1 c2. If c1 < Ln < c2, then the sample size n is not yet large enough (assuming that the test is performed sequentially with Ll, L2,L3,...,etc.) for the test to either reject or accept hypothesis H1 with the required degree of certainty of not making either a type (I) or type (II) error. This test is performed in a sequential manner as given by the following sequence of rules: (1) set i=1; (2) make an observation e; from experiment Br; (3) compute Li’ based on the random sample Ie§.e§,...,e§); (4) (a) if Li IA c1, accept H1 and terminate the test; (b) if Li I V c2, accept H2 and terminate the test; (c) if c1 < Li < c2, increase i by 1 and repeat the test starting with step (2). This test is known as the "sequential likelihood ratio test" [m-4], or alternatively as Wald's "sequential pro- bability ratio test" (denoted "SPRT" in [W-3]). 
We now state several well-known properties of the above SPRT without proof (proofs can be found in [H-4], [M-4], [W-1], and [W-3]):

(a) Simple and accurate approximations to the cutoff values c1 and c2 for Ln, with specified p(I) = α and p(II) = β, are given by

c1 ≈ β/(1-α) and c2 ≈ (1-β)/α.

(b) The sample size n (i.e., the number of tests) required for the SPRT to terminate will be finite with probability 1, provided v1 and v2 have distinct probability distributions over the sample space S^r = S1^r ∪ S2^r.

(c) The approximate average sample size E(n|vi) for the SPRT when vi is the actual scene view represented by v, for i = 1,2, is

E(n|vi) = [ Pf(vi)·log(c2) + (1 - Pf(vi))·log(c1) ] / E(z|vi),

where Pf(vi) is the power of the test, given by the probability that hypothesis H1 will be rejected when vi is the actual scene view, so that

Pf(vi) = α for i = 1, and Pf(vi) = 1 - β for i = 2,

and

E(z|vi) = pi^r(s1^r)·log[p2^r(s1^r)/p1^r(s1^r)] + pi^r(s2^r)·log[p2^r(s2^r)/p1^r(s2^r)] + ... + pi^r(sq^r)·log[p2^r(sq^r)/p1^r(sq^r)],

with pi^r(sj^r) = 0 if sj^r ∉ Si^r, j = 1,2,...,q.

The average sample size approximation E(n|vi), i = 1,2, given in (c) above breaks down for the cases when E(z|vi) = 0 or when E(z|vi) doesn't exist (i.e., when E(z|vi) = ±∞). However, on the average, the SPRT with p(I) = α and p(II) = β will require a smaller sample size n for making a decision than would a non-sequential (fixed-sample-size) likelihood ratio test which realizes the same p(I) = α and p(II) = β. And by (b), we know that this sample size n will be finite for the cases of interest. Hence, the SPRT provides a practical procedure for utilizing snapshot signatures in scene view recognition. This is illustrated in the two examples presented next.

4.4.4 Two examples

The expected sample size for the SPRT will be demonstrated for two examples in which an unknown scene view v ∈ {v1,v2} is to be identified by testing the hypotheses H1: v = v1 versus H2: v = v2 on the basis of a random sample from experiment E^r. In both examples, the resolution r will be chosen such that v1 and v2 have the same snapshot set relative to r, but distinct probability distributions over that set. Further, it is assumed that p(I) and p(II) are set at acceptable levels α and β, respectively.

Example 1

Consider some arbitrary encodement resolution r0 ≥ 6. Let v1 be a rectangle with horizontal side length 5 1/2 grid units and vertical side length 3 grid units, and v2 be a rectangle with horizontal side length 5 3/4 grid units and vertical side length 3 grid units -- all lengths being specified relative to r0. The snapshot signature of v1 is illustrated in Figure 4.5, and is given by snapshot set S1^r0 with associated probability distribution p1^r0, where

S1^r0 = {s1,s2,s3,s4,s5,s6,s7,s8}

with

p1^r0(si) = 0 for i = 1,2,...,6, and p1^r0(s7) = p1^r0(s8) = 1/2.

The snapshot signature of v2 is given by snapshot set S2^r0 with associated probability distribution p2^r0, where

S2^r0 = S1^r0 = {s1,s2,s3,s4,s5,s6,s7,s8}
3/4 * z 108177 * '2' 108m = -.l4384 For a = B = .05, E( I ) .05 log(19)+.95 log(T%) _ 85000 n V = = 1 -.14334 "11331 5.91 = 6 I ' -..b-n' 153 Similarly, the average size of the sample needed by the SPRT to identify v if v2 is the actual scene view is given by: anneal-$3) + Mom???) Emlvz) = E( I ) ’ 2 V2 where 8 r pr°(s ) BMW = Z P2°(5j)'1°g‘327—L i=1 p1°(sj) 0 + 0 + 0 + 0 + 0 +0 + Inga—z + new .13082 For a = B = .05, .95 log(19)+-05 108(T%) .85000 = 7136?? E(n|v2) .13082 6.48 = 7 Thus, for the identification of v with 95% certainty of being correct (i.e., the probability that v will be mis-identified is .05), the SPRT will require, on r the average, 6 observations of v based on E o if v1 is the actual scene view and 7 observations from Bro if v2 is the actual scene view. If it is desired to identify v with a 99% cer- tainty of being correct, the average sample size required by the SPRT can be determined, using a = B = .01, as 154 .01 log(99)+.99 log(§%) -4 5032 E(nlv ) = = .3333 1 -.l4384 "1 = 31.4 = 32 » and .99 1og(99)+-01 108(gé9 4 5032 E(DIV2) = = °i36§2 ' .13082 5 34.3 = 35 . I As would be expected, the average sample sizes re- 9 quired by the SPRT are significantly larger if v is to be identified with a 99% certainty of being cor- rect than if v is to be identified with a 95% cer- tainty of being correct. Example 2 A horizontal line "h" of length "a" on the stan- dard screen will have length "ra" on the (extended) grid of resolution r (see Theorem 3.2). Then h can intersect either (Ia, or (rag +1 vertical grid lines at resolution r, causing either (raj +1 or Ira; +2 consecutive horizontal grid squares, respectively, to be shaded in its encodement. Further, if h lies along a horizontal grid line at resolution r, its encodement will be 2 shaded grid squares high along its entire length; while if h lies between two hori- zontal grid lines at resolution r, its encodement will be exactly 1 shaded grid square high along its 155 entire length. Hence, the snapshot set S; of h at resolution r consists of the 4 snapshots s; 1, s; 2’ 9 3 r r . sh’3, and sh,4, where. 5h 1 is the solid shaded grid square pattern 9 which is 2 grid units high and [raj +1 grid units wide; 5h 2 is the solid shaded grid square pattern ’ which is 2 grid units high and [EQJ +2 grid units wide; 5h 3 is the solid shaded grid square pattern ’ which is 1 grid unit high and lIaJ +1 grid units wide; and 5h 4 is the solid shaded grid square pattern ’ wh1ch 1s 1 gr1d un1t h1gh and [IaJ +2 grid units wide. Now if a grid of resolution r were randomly trans- lationally positioned over h, (l) the probability that h would coincide with a horizontal grid line is "0" while the probability that h would lie between two horizontal grid lines is "l", and independently (2) the probability that h would cross tra, grid lines is "1 + (ra- Lra, )" while the probability that h would cross Ira, +1 grid lines is "ra- tra, ". Hence, the probability of occurrence p;(s;,i) of each snapshot 5;,1 at resolution r, for i=1,2,3,4, is given by p§(5;,1) pfi(s;,2) = 0 . r phcs;,3) = 1 -(ra - Ira; ) . -1 ..A. _ 1. u .1._..__' -_ .J— Jw 156 r r and s = ra - ra ph( 11,4) L._J So, the snapshot signature of h relative to resolution r is given by snapshot set I‘ I‘ } r = r r S {Sh,l’ Sh,2’ Sh,3’ Sh,4 h with associated probability distribution pi. Let v1 and v2 be horizontal lines of length .91 and .98, respectively, on the standard screen. 
Then the snapshot signature of v1 relative to resolution r is given by snapshot set = {5r 5 1,1’ 51,2’ 51,3, S1,4 r l (which is just Si with l substituted for h and .91 substituted for a) with associated probability distribution pi where P1(51,1) = 91(51,2) = 0 ’ p§(s§’3) = 1 - (.91r - t#913, ) , I‘ 1‘ _ _ and p1(sl’4) — .91r l;915 Also, the snapshot signature of v2 relative to reso- lution r is given by snapshot set r _ r 82 ‘ {52,1’ 52,2, 52,3, 52,4 (which is just sfi with 2 substituted for h and .98 substituted for a) 157 with associated probability distribution p; where I‘ r l‘ I' _ p2(52’1) p2(52,2) ' O : p§(s§’3) = l - (.98r - L.__981_“J ) , and p§(s§,4) .98r - L.3983, Now note that L._91£, = (‘983 for r=l,2,...,ll, and (,919 f {#983, for r112. This implies that v1 and v2 generate the same snapshot sets for r=l,2,.. ..,11, but different snapshot sets for r312. We restrict our attention to values of r between 1 and 11 for the remainder of this example, to insure that the approximation formulas for the average sample size of the SPRT are applicable (for r112, E(z|vi) will not exist for either i=1 or i=2, or both). For rill, note that SE = S? = {si, 55, 5;, SE}, where r _ r _ r ._ si - 51,1 - $2,i for 1—1,2,3,4. Hence, the sample space Sr of experiment Br is given by Sr = Si LJSE = {SE, SE, 5;, 5:} for rill. Consider r = 2. Then the sample space S2 of experiment E2 is given by 2 _ 2 2 2 2 52’ 53' where 2 piCsi) = pi(s2) = o , pi(s§) = 1 - (1.82 - L1.8_2_, ) = 1 - .82 = .18, 158 pi(si) = 1.82 - L1.82, = 82 , and 2 2 2 2 p2(51) - 132(52) - O : 2 2 _ P2(S3) - l - (1.96 - L1.9§J ) = 1 _ 96 = .04 2 2 _ p2(s4) - 1.96 - £1.99, = .96 Then with a = B = .05, the average size of the sample needed by the SPRT to identify v if v1 is the actual scene view is given by: .05 103(19) + .95 log(T%) EUIIVI) = 9 E(z|v1) where 4 pz(sz) - 2 2 2 ' E(Z|V1) Z P1($J) 108% i=1 p1 i = o + 0 + 18 log;%% + 82 10gL§§ = - 14147 Hence, -.85000 E(an1) = 7711117 2 6 Similarly, for a = B = .05, the average size of the sample needed by the SPRT to identify v if v2 is the actual scene view is given by: .95 log(19) + .05 log(T%) E(n|v2) = 9 E(z|v2) . . A. o—Q: -' I. . O 159 where 4 p2(sz) E(zlvz) = P%(5§)’1°g_%_—%—’ j=1 p1(sj) = o + 0 + .04 log(;%%) + .96 108(ng) = .09116 Hence, E(n|v2) = +%%%%% = 9'33 z 10 Thus, for the identification of v at r=2 with 95% certainty of being correct (i.e., the probability that v will be mis-identified is .05), the SPRT will require, on the average, 6 observations of v based 2 on E if v1 is the actual scene view and 10 obser- vations from B2 if v2 is the actual scene view. To examine the effect of increased resolution on the expected sample size, consider r=6. Then the 6 sample Space S6 of eXperiment E is given by where p?(s$) = p?(sg) = 0, p?($§) = .54, p?($2) ‘ 46» and pg(s$) = p3(sg) = 0. 
p3(sg) = ~12, 93(52) = '88 Then, with a = B = .05, the average size of the sample needed by the SPRT to identify v if v1 is the actual scene view is given by 160 -.85000 -.85000 E("'Vl) = "‘—“‘ = 12 88" B(z|v1) .54 log(;§z) + .46 log(413) _ -.85000 _ ~ ‘7751‘386‘1-66-2 Similarly, with a = B = .05, the average size of the sample needed by the SPRT to identify v if v2 is the actual scene view is given by = .85000 = .85000 ECHIV ) 2 E(z|v2) .12 log(f%%) + .88 log(f%%) _ .85000 _ N ‘W‘2-13-3 Thus, for the identification of v at r=6 with 95% certainty of being correct (i.e., the probability that v will be mis-identified is .05), the SPRT will require, on the average, 2 observations of v based on E6 if v1 is the actual scene view and 3 observations from E6 if v2 is the actual scene view. For one additional examination of increased resolution effects on the expected sample size, con- 10 sider r=10. Then the sample space S of experiment E10 is given by 10 10 10 10 10 S = {$1 , $2 , s3 , s4 } where 1 1 1 10 Pio(510) = pio(520) = 0, p10(s3 ) = .90, pio(sio) = .10, 161 and 10 10 10 10 1 1 pz (51 ) = pz (52 ) = 0. p20(530) .20. l p20(si0) = .80. Then, with a = B = .05, the average size of the sample needed by the SPRT to identify v if v1 is the actual scene view is given by E(z|v1) .90 log(f%%) + .10 10g(f%%) E(an1) _ -.85000 _ -.85000 = -.85000 = 74 z 1 T171157? ' Similarly, with a = B = .05, the average size of the sample needed by the SPRT to identify v if v2 is the actual scene view is given by . .85000 E(nlvz) = ._§._S_O_O_0. = .20 .80 E(z|v2) .20 1°g(790) + .80 1°g(TTU) _ .85000 _ - _ 1736771 - .622 - 1 Thus, for the identification of v at r=10 with 95% certainty of being correct, the SPRT will require, on 10. While the average, 1 observation of v based on E this result may seem somewhat surprising, it should be remembered that E(n|vi) represents an average value for n, based on n being continuous. In reality of course, n is discrete and can only assume values l,2,3,...,etc., and the approximations for E(n|vi) must be interpreted with this in mind. 162 This example confirms that as the encodement resolution r is increased, the average number of samples required by the SPRT to identify v (as either v1 or v2) decreases. This is intuitively appealing, since one would expect that distinguishing a length difference (which is the only difference between v1 ‘} and v2) would be easier at resolutions for which that 5. length difference is closer to 1 grid unit than at : coarser resolutions for which that length difference is closer to 0 grid units. a’ 4.4.5 Practical Considerations The two examples of the previous section demon- strate that snapshot signatures can be used practically for scene view identification in situations where distinct scene views generate the same snapshot sets relative to some encodement resolution. Such cases occur when the resolution of the encodement grid is too coarse to reflect the structural differences of the distinct scene views, regardless of the particular relative translational position of the encodement grid. Hence, features which are based on the structure reflected in an arbitrary encodement of an unknown one of these distinct scene views —- even after smoothing or other image enhancement techniques -- cannot provide any information which can help to identify which one of these scene views is repre- sented. 
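Before turning to the comparison with feature extraction methods, the per-resolution probabilities and the expected per-observation increments E(z|vi) used in Example 2 above can be reproduced (up to rounding) with a short computation. The snapshot labels and function names below are our own, and the sketch considers only the two snapshots of positive probability.

```python
import math

def line_signature(length, r):
    """Probabilities of occurrence of the two positive-probability snapshots
    of a horizontal line of the given standard-screen length at resolution r:
    a 1-square-high run that is floor(r*length)+1 or floor(r*length)+2
    squares wide."""
    ra = r * length
    frac = ra - math.floor(ra)
    return {"short": 1.0 - frac, "long": frac}

def expected_z(sig_true, sig1, sig2):
    """E(z | v_true): the expected increment of the log-likelihood ratio
    per observation, sum of p_true(s) * log(p2(s)/p1(s))."""
    return sum(p * math.log(sig2[s] / sig1[s])
               for s, p in sig_true.items() if p > 0.0)

for r in (2, 6, 10):
    s1, s2 = line_signature(0.91, r), line_signature(0.98, r)
    print(f"r={r:2d}  v1: {s1['short']:.2f}/{s1['long']:.2f}  "
          f"v2: {s2['short']:.2f}/{s2['long']:.2f}  "
          f"E(z|v1)={expected_z(s1, s1, s2):+.5f}  "
          f"E(z|v2)={expected_z(s2, s1, s2):+.5f}")
```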
This implies that the "classical" pattern 163 recognition approach based on feature extraction from a single encodement of an unknown picture will not be pro- ductive at such coarse resolutions. Hence, the scene view identification scheme based on snapshot signatures can be utilized at coarse encodement resolutions where "classical" feature extraction techniques of identification cannot be productively applied. This is particularly significant when physical constraints -- such as the size or number of photocells which may be available to realize a grid (see Section 2.3) -— may prevent the encodement grid resolution from being made fine enough for feature extrac- tion identification techniques (based on a single encode- ment of an unknown) to be effective. Even when the resolution to be used for encoding an unknown scene view can be made fine enough to allow feature extraction identification techniques to be used productively, it may still be advantageous to use a relatively coarse resolution with an identification scheme based on snapshot signatures. The choice is based on memory vs. time considerations, and also upon whether or not it is possible -- or desirable -- to encode an unknown scene view more than once. Of course, if an unknown scene view is only scanned (encoded) once, snap- shot signature identification techniques can only be based on the probability of occurrence which each prototype scene view associates with that single snapshot -- a .1 .. . La..'.‘__‘~.' I- u- Inn-.1: .7 5 164 procedure which does not fully utilize the identification capabilities of snapshot signatures. Hence, we will restrict our attention to those cases where the unknown scene view to be identified can be scanned as many times as desired (with the encodement grid randomly and indepen- dently translationally positioned for each scan). Note that each such scan of the unknown scene view at reso- lution r generates a binary encodement with (r+2)2 bits of information -— one bit representing each (shaded or non- shaded) square on the extended grid of resolution r. Hence, if an identification procedure requires n scans (encodements) of an unknown scene view at resolution r, then (r+2)2-n total bits of information must be processed. For Example 2 of Subsection 4.4.4, we can thus draw the following conclusions about the average number of bits of information which are needed to identify an unknown ve{v1,v2} using the SPRT: (a) for r = 2, the average number of bits to be processed is IZ (2+2)2-E(n|v1) 16-6 = 96 if v1 is the actual scene view, and is l2 (2+2)2-E(n|v2) 16-10 = 160 if v2 is the actual scene view; 165 (b) for r = 6, the average number of bits to be processed is (6+2)2-E(n|v1) = 64-2 = 128 if v1 is the actual scene view, and is (6+2)2~E(n|v2) 2 64-3 = 192 if v2 is the actual scene view; and (c) for r = 10, the average number of bits to be processed is (10+2)2-E(n|vi) z 144 1 = 144 for i=1,2. Now the first resolution in Example 2 of Subsection 4.4.4 for which v1 and v2 can generate a different snapshot is r = 12. Hence, this is the first resolution at which a feature extraction identification scheme based on a single encodement of an unknown veP can be effective. At r = 12, v1 can generate snapshots which are 11 and 12 shaded grid squares wide, while v2 can generate snapshots which are 12 and 13 shaded grid squares wide. Hence, at least a 13 x 13 grid (having grid square size equal to that of the grid of resolution 12) is needed to represent an arbitrary encodement of an unknown scene view ve{v1,v2}. 
In turn, this implies that at least (l3)2-l = 169 bits of infor- mation must be processed for the identification of v using feature extraction techniques. Note, however, that at r = 12 the snapshots generated by both v1 and v2 which are - .r .._- ...“...1 .... _ 166 12 shaded grid squares in length are identical. And the probability that such snapshot occurs is .92 (which is pi2(si2)) if v1 is the actual scene view and .24 (which is p%2(s§2)) if v2 is the actual scene view. Hence, the chance at r = 12 that a random encodement of an unknown ve{v1,v2} will reflect a structural characteristic which can be used to differentiate between v1 and v2 is fairly small. So the probability is fairly high that a feature extraction technique which uses structural characteristics of the encodement to identify v will make an error. In fact, it is not until r = 23 —— where v1 can generate only snapshots which are 21 and 22 grid units wide, and v2 can generate only snapshots which are 23 and 24 grid units wide -- that an arbitrary encodement of an unknown ve{v1,v2} will always reflect the structural differences between v1 and v2. At r = 23, feature extraction techni- ques could be used to identify v as either v1 or v2 (with 100% certainty of being correct) by processing the (23+1)z-1 = 576 bits of information in any arbitrary encodement of v. In any event, the feature extraction method of identifying an unknown vc{v1,v2} from a single arbitrary encodement requires the processing of at least 169 bits of information (at r = 12) to be at all effective, and even more (for r > 12) if that identification is to be made with a high degree of certainty. Comparing these requirements with the average number of bits required to '1- 1.] III Ila! 167 identify v with 95% certainty using the SPRT with snapshot signatures at lower resolutions r (r = 2,6, and 10), the feature extraction method will in many cases require significantly more bits to be processed to identify v with the same level of certainty. To make an exact comparison, it would be necessary to consider the probabilities with which the scene views v1 and v2 themselves occur, the degree of certainty with which an unknown scene view is to be identified, and the specific features and decision rule used by the feature extraction method. However, it should be apparent from this discussion that the use of snapshot signatures with the SPRT offers an effective alternative to the "classical" feature extraction methods (based on a single arbitrary encodement of an unknown) of identifi- cation. Not only can the identification be done at lower resolutions using snapshot signatures, but there may also be an overall savings in the average number of bits of information (i.e., memory) needed for identifying unknown scene views. The smaller average number of bits which may be required by the SPRT to identify an unknown scene view is not the only savings in memory which can result from using snapshot signatures. Feature extraction methods require a relatively fine resolution, say r1, to insure that signi- ficant structural differences will be reflected in an arbitrary encodement of any of the scene views under . ‘ qu-.i_.p'. .4 ‘ ._ I r 168 consideration. And since the entire encodement must be available for extracting features, (r1+l)2 bits of infor- mation must be simultaneously accessible (i.e., stored in memory). On the other hand, the use of snapshot signa- tures and the SPRT may permit a relatively coarse reso- lution, say ro(