TOWARD THE DETECTION OF LANDSCAPE FEATURES: CLUSTERING 3D POINTS USING SPATIAL AND THEMATIC CHARACTERISTICS By Boleslo Edward Romero A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE Geography 2010 ABSTRACT TOWARD THE DETECTION OF LANDSCAPE FEATURES: CLUSTERING 3D POINTS USING SPATIAL AND THEMATIC CHARACTERISTICS By Boleslo Edward Romero The study of Geography generally concerns phenomena at or near the surface of the earth. High resolutions of 3D quantitative and qualitative data can represent such phenomena as objects or fields. The data can be grouped to reveal representations of contiguous regions of spatial and thematic homogeneity. My thesis is concerned with finding groups of 3D points with similar locations, spatial relationships, and thematic values of spectral reflectance. To accomplish this successfully, I synthesized elements of two geographic theories: point aggregation from cartographic generalization and hierarchical geographic ontology. My experimental design used synthetic 3D point data with spectral values. I employed the multi-dimensional Mean Shift clustering technique from the discipline of Computer Vision, and adapted a 3D range image segmentation accuracy assessment technique. I also contributed new techniques for segmentation quality assessment including two area under the curve indices and the development of new segmentation surface plots. Experimental evaluations included comparisons of the Mean Shift results with K-means clustering results, spatial resolution results, noise evaluation results, and the results of an alternative color configuration. I modified the variable sets to address uneven lighting conditions and employed the experimental methods to grouping real-world terrestrial LiDAR scan data. Though my new spatial relationship variable needs improvement, the methods yielded groups of points representing features in the LiDAR data and provided evidence of the potential for grouping richly attributed 3D points that represent geographic features. Copyright by BOLESLO EDWARD ROMERO 2010 DEDICATION This thesis is dedicated to my mother, my father, my grandparents, and Tammy for their positive influence, steady encouragement, and unconditional love. iv ACKNOWLEDGMENTS Dr. Ashton Shortridge has been an outstanding advisor. I genuinely value his thoughtful review, critiques, and counsel throughout my research. He has been generous with his time, shared many insights, and expressed continued enthusiasm for the practice of scholarship. I also appreciate the contributions of my committee members, Dr. David Lusch and Dr. Bruce Pigozzi. They kindly shared their time and ideas to improve my research. Dr. Tamara Bray offered an opportunity for me to participate in a truly inspiring field research project. Dr. Tim Wawrzyniec, Jed Frechette, and the UNM LiDAR Lab provided detailed guidance and field equipment for the LiDAR case studies. Three other scholars, Dr. Adam Hoover, Dr. Ilan Shimshoni, and Dr. George Stockman, all provided consultation at critical stages. I am grateful for the help and partnership that has been offered by all of the people above and many other friends as well. I express my gratitude to the Michigan State University as a whole and to the Department of Geography faculty and staff for support throughout the course of my thesis. v TABLE OF CONTENTS LIST OF TABLES ......................................................................................................................... ix LIST OF FIGURES ....................................................................................................................... xi KEY TO SYMBOLS OR ABBREVIATIONS ........................................................................... xiii Chapter 1 - INTRODUCTION ........................................................................................................1 1.1 Major Concepts ................................................................................................................1 1.2 Focus of Work..................................................................................................................2 1.3 Contributions....................................................................................................................4 1.4 General Hypotheses .........................................................................................................6 Chapter 2 - CONCEPTUAL FRAMEWORK .................................................................................7 2.1 Overview ..........................................................................................................................7 2.2 Geographic Representation ..............................................................................................7 2.3 Cartographic Generalization ..........................................................................................10 2.4 Geographic Ontology .....................................................................................................11 2.5 Synthesis of Cartographic Generalization and Geographic Ontology ...........................14 2.6 Spatial Characteristics....................................................................................................15 2.6.1 Derived Spatial Variables - Neighborhood Relationships .........................................15 2.7 Thematic Characteristics ................................................................................................18 2.7.1 Derived Thematic Variables - Spectral Color Space Alternatives.............................19 2.8 Classification and Segmentation ....................................................................................19 2.9 Clustering .......................................................................................................................21 2.9.1 Hierarchical ................................................................................................................22 2.9.2 K-means .....................................................................................................................22 2.9.3 ISODATA ..................................................................................................................24 2.9.3.1 Mean Shift ........................................................................................................25 2.10 Accuracy Assessment ....................................................................................................28 2.10.1 Types of Clustering Assessment ................................................................................28 2.10.2 Confusion Matrices - Typical Classification Approach.............................................29 2.10.3 Overlapping Area Matrices - Modified Segmentation Approach ..............................31 2.10.4 Range Data Segmentation - Extended Metrics ..........................................................32 2.10.4.1 Segmentation Performance per Tolerances - Correct Segmentation ................34 2.10.4.2 Instances of Over-segmentation and Under-segmentation ...............................35 2.10.4.3 Final Classification ...........................................................................................36 2.10.4.4 Segmentation Performance Curves ..................................................................37 2.10.4.5 Area Under the Curve Index .............................................................................38 vi Chapter 3 - METHODS .................................................................................................................39 3.1 Methods Overview .........................................................................................................39 3.2 Data Sources ..................................................................................................................40 3.2.1 Synthetic Data Sets ....................................................................................................41 3.2.2 LiDAR Data - Case Studies .......................................................................................43 3.3 Processing ......................................................................................................................44 3.3.1 Spatial Variables - Neighborhood Relationships .......................................................44 3.3.2 Thematic Variables - Spectral Color Space Alternatives ..........................................45 3.3.3 Standardized variables ...............................................................................................45 3.3.4 Variable Sets and List of Experiments.......................................................................46 3.3.4.1 Experiment Set 1 - Mean Shift/K-means Clustering Comparison ...................48 3.3.4.2 Experiment Set 2 - Spatial Resolution Comparison .........................................48 3.3.4.3 Experiment Set 3 - Noise Evaluation ...............................................................49 3.3.4.4 Experiment Set 4 - Alternative Color Configuration........................................49 3.3.4.5 Experiment Set 5 - LiDAR Case Studies..........................................................50 3.3.5 Clustering ...................................................................................................................52 3.4 Accuracy Assessment ....................................................................................................53 3.4.1 Mean Shift Clustering ................................................................................................53 3.4.2 Pruned Mode Labeling Modifications .......................................................................54 3.4.3 Segmentation Performance Classification .................................................................55 3.4.3.1 Modification to the Reference Column Metrics ...............................................55 3.4.3.2 Tolerance Values ..............................................................................................56 3.4.3.3 Correctly Segmented Metrics ...........................................................................56 3.4.3.4 Over-segmented Metrics...................................................................................57 3.4.3.5 Under-segmented Metrics.................................................................................58 3.4.3.6 Final Classification ...........................................................................................59 3.4.4 Segmentation Performance Curves ............................................................................60 3.4.5 Area Under the Curve Index ......................................................................................60 3.4.6 Segmentation Surface Plots .......................................................................................61 3.4.7 Statistical Analyses ....................................................................................................62 3.4.7.1 One-sample t-tests for the Noise Evaluation ....................................................63 3.4.7.2 The ANOVA and Post-ANOVA Tests for All Variable Sets ..........................63 3.4.7.3 The ANOVA and Post-ANOVA Tests for Color Variable Sets ......................64 3.4.7.4 The ANOVA and Post-ANOVA Tests for Variable Type ...............................64 3.4.8 3D Perspective Plots ..................................................................................................65 3.4.9 K-means Clustering Comparison ...............................................................................65 3.4.10 LiDAR Case Studies ..................................................................................................65 Chapter 4 - RESULTS and DISCUSSION ....................................................................................67 4.1 Overview ........................................................................................................................67 4.2 Synthetic Data Set ..........................................................................................................67 4.3 K-means clustering ........................................................................................................68 4.4 Spatial Resolution ..........................................................................................................76 4.5 Noise Evaluation ............................................................................................................83 4.5.1 One-sample t-tests for All Variable Sets at All Levels of Noise ...............................85 4.5.2 Selection of the 0.02 Noise Level for Additional Analyses.......................................87 vii 4.6 Comparison of Multiple Means with Analysis of Variance (ANOVA) tests ................88 4.6.1 The ANOVA and Post-ANOVA Tests for All Variable Sets ....................................89 4.6.2 The ANOVA and Post-ANOVA Tests for Color Variable Sets ................................92 4.6.3 The ANOVA and Post-ANOVA Tests for Variable Types .......................................95 4.7 Alternative Color Configuration ....................................................................................97 4.8 LiDAR Case Studies ....................................................................................................101 4.8.1 Indoor Scan ..............................................................................................................101 4.8.2 Archaeological Site Scan .........................................................................................107 Chapter 5 - CONCLUSIONS.......................................................................................................115 5.1 Overview ......................................................................................................................115 5.2 Two General Hypotheses .............................................................................................115 5.3 Five Sets of Experiments .............................................................................................115 5.4 Contributions................................................................................................................117 5.5 Contributions to Theory ...............................................................................................119 5.6 Future Research ...........................................................................................................121 REFERENCES CITED ................................................................................................................124 viii LIST OF TABLES Table 1. Hypothetical overlapping error matrix showing under-segmentation. ........................... 32 Table 2. Mr values, proportions of points in each cell relative to the reference totals. ................ 33 Table 3. Ms values, proportions of points in each cell relative to the output totals. ..................... 34 Table 4. Cells with proportions greater than or equal to a tolerance of 0.5. ................................. 35 Table 5. Cells with proportions greater than or equal to a tolerance of 0.6. ................................. 35 Table 6. Color assignment for thematic variables of the initial synthetic data set. ...................... 42 Table 7. Data sets. ......................................................................................................................... 44 Table 8. Variable sets, variables and descriptions. ....................................................................... 45 Table 9. Combinations of sets of variables, steps, and sequences. ............................................... 47 Table 10. List of experiments. ...................................................................................................... 47 Table 11. Alternative color configuration values. ........................................................................ 50 Table 12. K-means and Mean Shift AUC rankings. ..................................................................... 72 Table 13. Low resolution Max. AUC rankings............................................................................. 77 Table 14: High resolution Max. AUC and Avg. AUC rankings. .................................................. 79 Table 15. Comparison of various resolutions' Avg. AUC. ........................................................... 83 Table 16. Mean Avg. AUC and P-values for one-tailed, one-sample t-tests. ............................... 86 Table 17. ANOVA results for all variable sets. ............................................................................ 89 Table 18. ANOVA results for color variable sets. ........................................................................ 92 Table 19. Color space value examples favoring clustering of the RGB variable set.................... 94 Table 20. ANOVA results for variable types................................................................................ 95 Table 21. Alternative color Max. AUC and Avg. AUC. .............................................................. 99 ix Table 22. Indoor LiDAR Max. AUC and Avg. AUC values...................................................... 104 Table 23. Archaeological Site LiDAR Max. AUC and Avg. AUC values. ................................ 111 x LIST OF FIGURES Figure 1. General summary of theory, processes, methods. ........................................................... 2 Figure 2. Mean vector relationships of 3D points representing an object. ................................... 17 Figure 3. Segmented satellite image (true color, false color, and classified). .............................. 20 Figure 4. K-means clustering process. .......................................................................................... 23 Figure 5: K-means clustering mismatch between 4 clusters and 3 seeds. .................................... 24 Figure 6. Mean Shift on 2D point set, finding modes. .................................................................. 26 Figure 7. Points in a 3D LUV color space clustered into 6 groups by the Mean Shift. ................ 27 Figure 8. Example segmentation performance curve. .................................................................. 38 Figure 9. Synthetic test cube from the top, front, and left corner. ................................................ 42 Figure 10. Indoor LiDAR data set. ............................................................................................... 51 Figure 11. Archaeological Site LiDAR data set. .......................................................................... 52 Figure 12. Initial noise-free data set. ............................................................................................ 68 Figure 13. K-means and Mean Shift Segmentation Performance Curves. ................................... 70 Figure 14. K-means 3D perspective plots. .................................................................................... 73 Figure 15. Low resolution segmentation performance curves. ..................................................... 77 Figure 16. High resolution segmentation performance curves. .................................................... 78 Figure 17. High resolution segmentation surface plots. ............................................................... 80 Figure 18. High resolution 3D perspective plots. ......................................................................... 81 Figure 19. Examples of adding noise to the spatial data............................................................... 84 Figure 20. Least significant differences results for all variable sets. ............................................ 90 Figure 21. Least significant differences results for color variable sets......................................... 93 xi Figure 22. Least significant differences results for variable types. .............................................. 96 Figure 23. Alternative color configuration segmentation performance curves. ........................... 98 Figure 24. Alternative color configuration 3D perspective plots. ................................................ 99 Figure 25. Indoor LiDAR scan, sampled data, and labeled reference groups. ........................... 102 Figure 26. Indoor LiDAR segmentation performance curve examples. ..................................... 103 Figure 27. Indoor LiDAR segmentation surface plot examples. ................................................ 105 Figure 28. Indoor LiDAR 3D perspective plot examples. .......................................................... 106 Figure 29. Archaeological Site LiDAR scan, sampled data, and labeled reference groups. ...... 109 Figure 30. Archaeological Site LiDAR segmentation performance curve examples. ................ 110 Figure 31. Archaeological Site LiDAR segmentation surface plot examples. ........................... 112 Figure 32. Archaeological Site LiDAR 3D perspective plot examples. ..................................... 113 xii KEY TO SYMBOLS OR ABBREVIATIONS 3D Three-dimensional 2D Two-dimensional AB Point values for the CIE La*b* color space, without Lightness AUC Area under the curve index Avg. AUC Average area under the curve index of Mean Shift bandwidths FAMS Fast Adaptive Mean Shift clustering algorithm GPS Global positioning systems LiDAR Light Detection and Ranging LAB Point values for the CIE La*b* color space LUV Point values for the CIE Luv color space Max. AUC Maximum area under the curve index of Mean Shift bandwidths Mr Correct segmentation metric for reference groups Ms Correct segmentation metric for segmented output groups Mro Over-segmentation metric for reference groups Mso Over-segmentation metric for segmented output groups Mru Under-segmentation metric for reference groups Msu Under-segmentation metric for segmented output groups MV Point values for mean vector variable in the x, y, and z, axes MVx Point values for mean vector variable in the x axis MVy Point values for mean vector variable in the y axis MVz Point values for mean vector variable in the z axis xiii OAM Overlapping Area Matrix RGB Point values for the red, green, and blue color space UV Point values for the CIE Luv color space, without Lightness XYZ Point values for the x, y, and z axes xiv Chapter 1 - INTRODUCTION 1.1 Major Concepts The study of Geography generally concerns phenomena at or near the surface of the earth. With the advent of technologies such as remote sensing, humans have boosted their ability to sample and measure characteristics of this environment. We are now able to sample vast regions of previously inaccessible locations on the earth's surface as well as in the atmosphere and below the surface of land and water bodies. Higher resolution data are also being collected as sensors and computer processing power are improved. Due to these new capabilities, geographical data sets have become very large. With more geographical characteristics being sampled at higher resolutions, it is difficult to ascertain the general geographic phenomena through their data representations. The data are qualitative or quantitative values of some characteristic at particular locations in space and time: a finite sampling of locations representing geographic phenomena. They can be grouped to reveal representations of contiguous regions of spatial and thematic homogeneity, either parts of or whole geographic phenomena. My thesis is concerned with finding groups of data representing geographic phenomena in the environment: specifically, groups of 3D points with similar locations, spatial relationships, and thematic values of spectral reflectance. Major relevant concepts include the representation of geographic features as objects or fields, cartographic generalization, and geographic ontology. I rely on the subfields of GIScience, Cartography, and Remote Sensing from the discipline of Geography; Image Analysis and Computer Vision from the discipline of Computer Science; and Cluster Analysis from information sciences. The main topics I incorporate are representation, sets, generalization, aggregation, classification, segmentation, and clustering. 1 Following this Introduction, I discuss these concepts and topics in my Conceptual Framework section. Implications and decisions relating to my experimental design are also covered in my Conceptual Framework. The Methodology section involved details of my actual experimental procedures as well as two case studies. The Results and Discussion and the Conclusions follow. 1.2 Focus of Work As an introductory preview, I now briefly describe my topics. Later, in the Conceptual Framework section, I will discuss these topics in detail. Figure 1, is a general graph summarizing the theory, processes, methods, and results of my work, excepting the experimental accuracy assessment. The next few paragraphs elaborate on these components. Figure 1. General summary of theory, processes, methods. 2 Representation of geographic phenomena is my main concern. I employ a three-dimensional (3D) point data structure. From the GIScience literature, I recognize these 3D spatial locations are samples of phenomena that could be considered either objects or fields. The points are considered unstructured in the sense that there are no topological connections or relationships initially defined. As 3D points, they could represent locations on surfaces or throughout volumes. Spatial location is an inherent property of the samples. However, I also developed a measure of each point's spatial relationship to its neighbors to identify similar groups of points representing particular orientations or spatial arrangement of the geographic phenomena. The point locations also have thematic attributes associated with them. The thematic attributes could be any quantitatively measured or qualitatively labeled characteristics of some geographic object or field phenomena; I selected spectral reflectance. In proposing groups of data points that represented some geographic phenomena, two important considerations arose. First, the concept of spatial autocorrelation was relevant as characteristics of nearby locations are more likely to be similar than distant locations. This supports the idea that spatial location itself is a primary indicator in finding locally similar points. Contiguous sets of similar points are further defined with respect to their spatial relationships or thematic characteristics. Secondly, local groups of points can, to some degree, be employed to predict the thematic character of unsampled locations within the region they encompass. Keeping focus on what such groups of points represent: geographic phenomena. I am interested in synthesizing elements of two geographic theories. I notice similarities between cartographic 3 generalization and geographic ontology. A particular process of cartographic generalization known as aggregation is used to group points that represent homogeneous regions. Likewise, in geographic ontology, low-level observable points are grouped into sets representing simple objects. This shows conceptually parallel processes, both developing regions of similarity from groups of points. Grouping sets of data by measures of similarity is an initial step in the common process of classification, before the sets are actually labeled as belonging to a class. I am interested in grouping 3D point data into locally homogenous groups representing geographic phenomena. Finding component parts of objects or geographic phenomena with many smaller groups is preferable to finding too few groups which do not adequately separate the geographic phenomena. The component parts can be aggregated later, if necessary. As described in the Methods section, my experimental design used synthetic 3D point data sets with various spectral values as thematic attributes. I employed the Mean Shift clustering method for grouping points based upon their spatial location, relationships, and thematic attributes. Along with a brief comparison to a K-means clustering, I extended an accuracy assessment framework to quantitatively evaluate the clustering performance for synthetic data and two case studies. 1.3 Contributions This is a summarized list of my research contributions. Details of the topics, implications, and experimental decisions are covered in the Conceptual Framework section. The Results, Discussion, and Conclusions sections also add insights related to these contributions. This listing 4 is loosely based on the order of introduction of the topics with earlier topics building a foundation for subsequent topics. Here are my contributions to the discipline of Geography: 1) My research utilized a 3D point data structure. Though relatively uncommon in Geography, this type of data is becoming more prevalent as it offers more flexibility for complex geographic representation than the historically common 2D point or 2D raster data. 2) I developed a measure of spatial neighborhood relationship for unstructured 3D points called the Mean Vector. The name refers to the process of finding the (average) mean of 3D vectors from each point to its local neighbors. 3) For thematic attributes, I evaluated the effectiveness and potential benefits of three common color spaces (RGB, LUV, and LAB) for grouping 3D points. 4) I employed the Mean Shift clustering technique which is rarely used in geographical data analysis. It is a successful segmentation technique commonly used in Computer Science for 2D image analysis and computer vision. 5) I employed a remote sensing segmentation performance accuracy assessment framework developed for 3D range scans. 6) I extended the accuracy assessment to include numerical area under the curve indices. 5 7) I developed new segmentation performance surface plots for comprehensively evaluating three dimensions for my segmentation performance assessment (tolerance, bandwidth, and instances of correct segmentation). 8) Theoretically, I combined elements of two geographical concepts, cartographic generalization and geographic ontology, formalizing a particular process and method of geographic representation. 1.4 General Hypotheses My first general hypothesis is that groups of data representing geographic phenomena in the environment can be identified using spatial location, spatial relationship, and thematic characteristics with the Mean Shift clustering technique. My second general hypothesis is that combinations of up to three variable sets (spatial location, spatial relationship, and thematic attributes) enhances the ability to find groups of data representing components of geographic phenomena, as compared to single variable sets. The next chapter describes the conceptual framework in more detail. Chapter three presents the experimental methods. Chapter four reports the results and includes discussion, and chapter five concludes with some general thoughts. 6 Chapter 2 - CONCEPTUAL FRAMEWORK 2.1 Overview The Conceptual Framework section discusses previous research and implications to the current work. My experimental design decisions and innovations are also included. The chapter begins with a discussion of theoretical viewpoints on geographical representation and data models. Then, particular topics relating to cartographic aggregation and geographic ontology are introduced. From these two fields of study, I draw a parallel between two similar conceptual processes and discuss a theoretical synthesis. I then provide conceptual background on the methods. First, classification and multivariate clustering methods are introduced. Next, the Mean Shift clustering is described. Finally, an established accuracy assessment framework is both presented and extended. This Conceptual Framework introduces and explains portions of the methods, reducing lengthy discussion in the following chapters. 2.2 Geographic Representation The study of geographic features involves at least two components, spatial location and thematic attributes (Goodchild, 1990). Data for analysis are obtained by sampling features in the environment. The data are never as complete or as detailed as the feature being sampled. Therefore, geographic data are always generalized representations of some environmental phenomena with the information often presented in maps. When the data are entered into a computational environment, models are used for representation and prediction of some phenomena in geographic space (Frank, 2000). A review of geographic representation and many types of data models (Pequet, 1988) described a dual identity for maps: the data are represented as either as images or as geometric structures in 7 graphic form. Many versions of raster models, for images, and vector models, for objects, have been employed for analysis in computing environments (Pequet, 1988). Subsequent academic work on the dual representation models has argued that there is a complementary role that the two models offer for representation of features in geographic space (Couclelis, 1992). The work of Couclelis (1992) described the vector data as objects, treating geographic phenomena similar to geometric shapes or table-top items that can be manipulated. Couclelis (1992) addressed the common challenge of region delineation, and noted that the majority of hard boundaries were related to human artifacts such as engineered structures and administrative boundaries. The use of vector structures for geographic objects also provides benefits such as the potential for specification and analysis of spatial and topologically connected relationships. Couclelis (1992) also pointed out that raster data usually represents geographic phenomena as fields. Groupings of clustered pixels that share particular values of some thematic attribute can be identified as features. However, the boundaries of such regions are not as discrete as in the object model. The field model described by Couclelis (1992) also allowed the possibility of different features being found in the same location if different thematic attributes are observed. Geographic features could be represented with either model, though the model should be chosen to best fit the geographic phenomena or properties being studied. A recent article (Goodchild, Yuan, & Cova, 2007) formally synthesized the two data models by first considering the previous discussions (Pequet, 1988; Goodchild & Wang, 1989; Couclelis, 1992) and distinctive perspectives of geographic phenomena. The object-view perspective characterized geographic space as mostly empty and containing objects discretely defined as 8 points, lines, polygons, or volumes in two, three, or 4 dimensions. The field-view perspective characterized geographic space as filled with variable values of some measureable continuous phenomena. The synthesis of these two representational models was accomplished through reference to the nature of the geographic feature of interest. The shared foundation of both models was termed the atomic form of geographic representation (Goodchild, Yuan, & Cova, 2007). The first primitive of this representation, the geo-atom, was defined as the association between a point in space-time and some property, or thematic attribute. In essence, there may exist an infinite number of infinitesimal (vector) point locations associated with some measured value of any (field) property. Geographic features represented by many (sampled) points of similar values for the particular property may be considered either an object or a region within a continuous field. For my thesis, the theory of atomic geography helped to connect the vector model of point data with the regions of similar spatial and thematic properties that they represent. My synthetic data sets, for controlled experiments and accuracy assessment, consisted of points representing simple shapes with very separable color attributes. Points were configured as groups, each having similar values, intended to represent individual objects or components of objects. The experimental methods were then employed with the data sets of my case studies to demonstrate the potential for identifying regions of homogeneity in the environment. These data were sampled with a terrestrial LiDAR sensor and represented real-world scenes having complex spatial and thematic properties. They were discrete samples of a particular point in 3D space and a measureable thematic property (spectral reflectance). The vector point data were entered into a computational environment to identify groups of points representing regions of similar spatial 9 and thematic properties. The theory of atomic geography provided a link between the vector point data model and the 3D features the points represented in a continuous field of measurable spectral reflectance. 2.3 Cartographic Generalization In Geography, paper maps have traditionally provided a generalized representation of geographic phenomena. A relatively recent comprehensive theory of generalization was presented by McMaster and Shea (1992). Beginning with a review of definitions and numerous theoretical frameworks which preceeded their work, McMaster and Shea (1992) ultimately offered a theory of cartographic generalization suitable for computational use in a digital environment. Of the main reasons stated for generalization was the aim of reducing complexity. As such, generalization essentially reduced the amount of application-specific data and detail to retain clarity of a map, especially when the map scale was reduced. Of particular importance for this thesis is the generalization of point objects representing regions of similar spatial and thematic properties. McMaster and Shea (1992) define the term aggregation as the spatial transformation operator which aggregates, or groups, (nearby) points having similar attributes and represents them as polygonal area features. They note a dimensionality transformation from a point representation to an area representation. They also reference Keates' (1973) example of aggregating individual buildings which are near to each other and representing them as built-up areas. This thesis is concerned with aggregating individual (synthesized or sampled) 3D points into groups which represent regions of spatial and thematic similarity. 10 Though this thesis is mainly concerned with representation of geographic features, the methods employ computational processes. Weibel (1997) discusses two views of generalization: representation-oriented and process-oriented. Representation-oriented generalization is concerned with the representation of particular features at different map scales. He mentions that a multi-scale database may be required to provide different sets of information at different scales, and if alterations to the data are made at one scale, they may not be automatically updated at the other scales. The process-oriented view is the focus of Weibel's (1997) work. This is described as being more automatic and flexible as all of the map scales are derived from a single spatial database using generalization algorithms. As detailed in my methods, I employ semi-automatic algorithms that only require certain input parameters: number of output groups for k-means clustering, and number of neighbors and bandwidths for Mean Shift clustering. These belong to the global and bandwidth classes, respectively, of spatial data generalization algorithms described by Weibel (1997). Weibel (1997) mainly contrasts the expert-driven, creative techniques of knowledge-based cartography with the use of generalization algorithms. However, he notes that they can be used together for improved results and that both approaches still require key decisions by the user. Human input guides cartographic generalization with either subjective and intuituve elements or with control of algorithmic processes. He also implies that well-defined objectives improve the result of either approach. 2.4 Geographic Ontology To produce representations of geographic features from point data with 3D coordinates and thematic attributes, certain objectives are required. In other words, a conceptual model, or 11 ontology, for the features of interest must be defined. Ontology, as a philosophical discipline, is concerned with conceptualizations or particular categories for particular views of the world (Guarino, 1998). The concepts are ideas of things in existence depending upon particular perspectives. Another related meaning for the term ontology has become popular due to developments in information systems and computer sciences. Guarino (1998) mentions that this meaning generally refers to domain modeling as implemented by various methodologies. In geographic information sciences, ontology also relates to the specification of a conceptualization and both lines of investigation are pursued: top-level categories from a formal perspective, and a domain-specific ontology from a task-oriented approach (Agarwal, 2005). I followed a particular domain ontology in this research. In practical terms, this can be generally described as knowledge representation, defining features as components of the concept (Schwering, 2008). My ontology related to both the conceptual categories and physical properties of spatial and thematic data representing 3D features. I aimed to find components of objects in a scene that have similar spatial and thematic characteristics. Features in a landscape scene can become more distinctive when additional information is added. Couclelis (2009) recently described 7 levels of feature representation depending upon the amount of semantic information attached to a feature and its components. A description of relevant portions of her hierarchy follows. At the lowest ontological level, only points in space-time (locations) relating to a particular purpose exist. Adding crude, qualitative knowledge at the point locations provides the next higher level of information, as (thematic) observables. Separating the points into classes, based upon the observable qualities, provides additional 12 another level of (generalized) information. Then, spatially connected homogeneous regions are identified as simple objects (real-world entities). Assemblages of simple objects can be associated, providing what she terms as complex objects (this is one level beyond my methods, but included in my discussion.) I follow her hierarchical levels of features and their components. The three types of characteristics which made up my ontological framework were: spatial location, spatial relationship, and thematic attributes. The data employed in this thesis initially had 3D coordinate values for the x, y, and z axes as well as red, green, and blue color values. These basic characteristics were used with additional variables calculated from this data: a measure of spatial relationship and additional color spaces, each described in more detail later. The data values were expected to differentiate separate groups of data points representing different components of the 3D scene. As exemplified in the following three paragraphs, I intended to identify components that: • were in different locations (in 3D space) • had differing spatial relationships to neighboring points (planar or non-planar) • had different thematic attributes (colors) Spatial location can be a very useful characteristic to distinguish different features having similar thematic attributes or local spatial relationship. For example, walls of separate buildings may have a similar color and (a locally planar) spatial relationship, though we generally consider them as components of different buildings based upon their location. The same could be said of similarly shaped and colored groups of vegetation in different locations. 13 Local spatial relationships could provide an additional set of information, separating features having similar colors and being in close proximity. Examples of objects that could be separated based on (locally planar) spatial relationships include a concrete sidewalk that is adjacent to gray boulders or a grassy area that leads to green bushes. Thematic attributes, such as spectral reflectance in the red, green, and blue wavelengths, can be used to separate nearby regions with the same spatial orientation. For example, a study on the percentage of impermeable ground cover in an urban scene may have data representing an asphalt roadway, a sidewalk, and grassy landscaping. All of the features may be relatively flat and level. Color can be used to separate the regions. My aim is to identify features in a scene represented by 3D points by using the three basic characteristics of location, spatial relationship, and spectral values. The resulting aggregated groups of points may represent simple objects that could be subsequently assembled into complex objects, if necessary. The generalized representation and separation of groups with distinctly different characteristics are expected to ease interpretation of complex 3D data and landscape scenes. 2.5 Synthesis of Cartographic Generalization and Geographic Ontology This thesis closely followed two theories: the cartographic generalization of McMaster and Shea (1992), and an ontological hierarchy from Couclelis (2009). These two theories, though they came from separate threads of geographic inquiry, are very parallel in relation to this investigation. Both are related to geographic representation of features in the landscape and both present processes that similarly combine information to better understand those features. The 14 generalization process of aggregation was a transformation that grouped 3D points into representations of areal regions having similar spatial and thematic characteristics (McMaster & Shea, 1992). The ontological hierarchy also developed simple objects from spatially connected, homogeneous space-time points with similar observables (Couclelis, 2009). These two theories eased interpretation of complex data sets by providing a theoretical basis for the development of groups of points that represent recognizable features in the landscape. 2.6 Spatial Characteristics The initial data included 3D spatial location coordinate values in the x, y, and z axes (XYZ). Though spatial location was the lowest-ontological level of information, it provided a measure of proximity between points and group points representing contiguous regions. Ledoux and Gold (2008) argue that processing objects in 3D may have advantages over common raster computations, despite the ease of computing with raster arrays. Spatial data consisting of x, y, and z values can be sampled; they maintain natural discretization; map algebra operations are still available; and there are potential benefits for analysis and visualization stages (Ledoux & Gold, 2008). One of the first steps in using 3D point data is to identify features in the scene for further quantitative analysis and qualitative evaluation. 2.6.1 Derived Spatial Variables - Neighborhood Relationships To augment the available variable sets, I propose a metric for 3D spatial relationships intended for identifying planar regions. It is calculated for each point using the relative locations of neighboring points. With 2D raster grids, the slope, curvature, and roughness could be calculated from adjacent cells and used as neighborhood characteristics. Similarly, if a digital surface model 15 was already created from the 3D data points, the topological structure and connectivity could be used to determine which points were adjacent to calculate similar measures as in the 2D raster case. However, the goal of my thesis is to identify groups of unstructured 3D points where no topological connectivity between the points is defined. By identifying groups first, interesting data can be selected before spending the time and computing power on such processes as surface interpolation. This also allows the method to be more generalizable and applicable to any 3D spatial data, representing surfaces or other phenomena. Therefore, a new measure of spatial neighborhood relationship is considered: the mean vector (MV). This is simply the mean of vectors from each point to its neighbors, as defined in the Methods section. The neighboring points in the MV calculation could be determined in several ways: number of neighbors, distance with an absolute radius, distance with an adjusted radius based upon local density, or even a hybrid method of sampling points within a radius. For simplicity, I chose the number of neighbors method. I decided 8 neighbors would be beneficial as a small set of 8 neighbors helps to boost differentiation between local regions by not averaging across too many points. It would also be computationally efficient and is somewhat comparable to the queen's case of neighbors in a 2D grid. I expect that the MV would help to group 3D points representing planar regions as opposed to edges, corners, or regions of high variability. Figure 2 helps to illustrate the following example. First, I'll describe the situation considering points toward the interior of the planar region, away from edges, on the top side. The vectors to its neighbors would not vary in the Z axis, the X values of the vectors to the points to the left and to the right would offset, and the Y values of the 16 vectors to the points above and below would offset. This results in a mean vector at or very near zero. Most points on each side would have this same near-zero value for the mean vector. This value will help to distinguish planar regions as being different from edges, as described next. Points near an edge, or a change in the surface orientation, on one side may have neighbors representing a different side or orientation. When the coordinates from different sides are used for the mean vector calculation, the non-planar neighbors will result in a mean vector that varies in up to three dimensions. The mean vector metric could be considered beneficial for edgefinding or, as in my aims, for separating points with near-planar spatial relationships from points representing edges, corners, or highly variable regions. Figure 2. Mean vector relationships of 3D points representing an object. (For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this thesis.) 17 2.7 Thematic Characteristics By adding measureable values of thematic attributes, the data have a higher ontological level and more information is available for cartographic generalization processes. Studies that use both spatial and spectral properties for segmentation are of particular importance to this thesis, since this is how I aim is to identify components of objects within a 3D scene. For instance, laser scans provide detailed geometry but still require interpretation for mapping and segmentation of a complex scene is not possible with a single spatial cue (Barnea & Filin, 2008). The previous work by Barnea and Filin shows that segmentation methods using either range or color alone have limitations, but together they have better results. The most promising method they implement, which I will also evaluate, is the use of Mean Shift clustering (Comaniciu & Meer, 2002). I will describe the Mean Shift clustering in the Conceptual Framework section on Clustering. Compared to other clustering methods, it is relatively low-parametric and only requires a bandwidth window of evaluation as described. It can also work with multi-variate data. For their study, Barnea and Filin used range information, surface normals, and color along with an iterative approach to further separate under-segmented clusters. However, to address issues of relatively high LiDAR point density near the scanner, they make a data representation transformation from the 3D object-view (empty space with objects) to a more traditional remote sensing computational model, the 2D field view (a raster grid with all cells filled with a value). I followed their approach in regard to the combination of spatial and thematic information and the use of Mean Shift clustering. However, I maintain 3D object-view representation of the data, having unstructured 3D points in space with additional thematic attributes. 18 2.7.1 Derived Thematic Variables - Spectral Color Space Alternatives The basic synthetic data included a set of red, green, and blue (RGB) values which can be considered a surrogate for any thematic variable. There are alternative color spaces for visible wavelengths which I was interested in evaluating. These were expected to assist in reducing the impact of shadows or environmental illumination on the spectral values. For instance, the the CIE Luv (LUV) and CIE L*a*b* (LAB) color spaces isolate lightness into one variable and have two variables for color. They are also designed to have values that more evenly represent the humanly perceptual differences between colors. Research has deemed CIE L*a*b* as an appropriate color space for raster image segmentation along with the Mean Shift clustering (Paris & Durand, 2007). I chose to evaluate the original RGB along with the LUV and LAB color spaces. For my experiments with synthetic data I made use of the RGB and all three variables in each of the LUV and LAB variable sets. However, the LiDAR case studies had highly variable RGB values, changing material reflectance, and shadows. To address these issues and improve segmentation performance, I made use of supplementary variable sets, UV and AB, removing the (L) lightness variable from the LUV and LAB variable sets. 2.8 Classification and Segmentation Using basic characteristics for classification of landscape features is common. The variable spatial and thematic characteristics can be quantitatively evaluated and converted into a generalized representation of classes. Instead of attempting to deal with an infinite variety of variables and values for an entire region, the sampled data can be reduced to a more 19 understandable set of thematic classes (Foody, 2002). This classification provides another set of semantic information, raising the ontological level up in the Couclelis (1992) hierarchy. Computational environments have greatly aided the processing of such abundant information. From the computer science literature, Haralick and Shapiro (1985) describe processes such as image segmentation which have created many possible avenues for classifications of land cover, clarifying and aiding in our interpretation landscape features. Figure 3 shows an example of image segmentation based on classified spectral values. For this classification, a Landsat 7 ETM+ satellite image is used, showing a region in the vicinity of the Michigan State University campus. The first image displays the region in true color, using bands 1, 2, and 3 (blue, green, and red). The second image displays the region in false color, using bands 3, 4, and 5 (red, near-infrared, and mid-infrared). The third image displays the classification results. Pixels are grouped into 4 classes of land cover using all of the spectral values available in the satellite image. This generalization techniques helps to represent and evaluate landscape features. Figure 3. Segmented satellite image (true color, false color, and classified). 20 In a theoretical sense, classification is often related to both cartographic generalization and geographic domain ontologies, since observable properties are selected and used to represent features in the landscape. In a practical sense, classification often employs clustering, described in more detail in the next section. This can be implemented in either supervised or unsupervised methods. For the supervised case, data are selected by an analyst to provide training signatures for the desired classes. Each of the remaining data are assigned to the most similar class through a clustering process. For unsupervised classification, no classes are initially specified. Clustering is performed on the entire data set and the resulting groups are labeled by the analyst. 2.9 Clustering In this section, I discuss multi-dimensional clustering, starting with the general theory and following with specific clustering methods. I present the Hierarchical, K-means, ISODATA, and the Mean Shift types of clustering to show a lineage of methods related to my implementation of the Mean Shift clustering. To offer a basis of experimental comparison, the well-known and popular K-means method is used to evaluate the performance of the Mean Shift method. In general, clustering can be beneficial to understanding multidimensional data sets. A review of data clustering (Jain, Murty, & Flynn, 1999) states that the aim of data clustering is to determine natural groupings and they explain that this is typically achieved by minimizing within-group variance while maximizing between-group variance. They describe two conceptual approaches to this goal. One approach is agglomerative clustering, adding similar data to a group in sequence. The second approach is divisive clustering, sequentially separating the most differing data apart. Finally, they point out that differing results that can be achieved by clustering. Hard assignment considers the destination cluster for a point to be one single group. On the other 21 hand, fuzzy probability clustering can allow for a point to potentially belong to one of several groups with differing degrees of probability. My experiments follow the concepts of unsupervised, agglomerative clustering with hard assignment. Numerous types of clustering are available; a few are described below since they relate to the Mean Shift clustering approach which was ultimately selected for finding groups of points with similar spatial and thematic characteristics. 2.9.1 Hierarchical The basis of clustering philosophy can be easily conceptualized with a description of hierarchical clustering. Hastie, Tibshirani, and Friedman (2009) explain that hierarchical clustering starts with each and every point in a cluster of its own. Then, the (typically Euclidean) distances in the multi-dimensional variable space from every point to every other point is determined and the two closest points are combined into a cluster. Once they are clustered, the values of the points will be averaged to create a new value for the center of the cluster. This continues iteratively until all of the clusters have been combined into one group. A dendogram is created and used to select a cutoff point for a reasonable number of clusters. 2.9.2 K-means The k-means algorithm minimizes the within cluster variability and maximizes the between cluster variability (MacQueen, 1967). The number of seed clusters must be selected a priori; a non-trivial drawback, since this parameter is often unknown before hand. Xu and Wunsch (2009) provide a description for the iterative K-means process which can be followed in 22 Figure 4, below. The K-means clustering starts with (K) randomly placed seed values. In this example, four initial seed locations are shown with black squares. For each point in the data set, the multi-dimensional distances to the seeds are calculated. All of the points closest to each seed are grouped together into clusters, as shown in the lower images with group boundary lines. For each group, the mean is calculated, creating new cluster centers. Then, for the next iteration, new groups of points are selected as being closest to the cluster centers and new means are calculated. This continues iteratively as the groups minimize the within-cluster variance and maximize the between-cluster variance. The process tends to produce hyperspherical clusters and will have different results with different initial seed values or different initial seed locations. Figure 4. K-means clustering process. To demonstrate the effect of different seed values on K-means clustering, a second iterative sequence is shown in Figure 5. From the previous example, the same set of data points and only 23 three of the four original seed values are used. This represents a mismatch between number of data clusters (four) and number of seed values (three). By selecting a different set of seed values at the onset, the results are considerably different. All of the resulting groups are different than the previous example, but the most notable difference is that the cluster in the top left was effectively split and combined into nearby clusters. This shows that, by selecting the different numbers of desired clusters for the K-means method, the results may not reflect natural groupings in the data. Figure 5: K-means clustering mismatch between 4 clusters and 3 seeds. 2.9.3 ISODATA An example of numerous alternatives related to K-means clustering is ISODATA (Iterative SelfOrganizing Data Analysis Technique) (Ball & Hall, 1967). Though the two algorithms are similar, a distinct difference is that the K-means assumes the number of clusters is specified a 24 priori, while the ISODATA algorithm allows for the adjustment of different numbers of clusters based upon a specified threshold. The original cluster locations are devised by calculating the mean of all variables, expanding the range of cluster origins to one standard deviation, and finally dividing the range per number of cluster origins for starting positions (Tou & Gonzalez, 1974). Though the ISODATA method is not compared here, it shows that the K-means has inspired new ways of finding natural groupings of data. This provides support for methods moving beyond the K-means and is an example of clustering techniques that do not require the number of groups in a scene to be known a priori, like the Mean Shift method discussed next. 2.9.3.1 Mean Shift The Mean Shift clustering (Comaniciu & Meer, 2002) is again similar to K-means. However, the number of output groups is not required. Instead the Mean Shift uses a bandwidth distance, or window of evaluation, for calculating local density. First, windows are placed at either uniform or selected locations, depending upon the implementation. Then, for all of the points within the window, the mean for each variable is determined. A vector from the original location to the mean location is calculated and the window is shifted to the mean, or center of local density. This continues iteratively, calculating new means and shifting the window until a specified convergence threshold is reached and the shift is considered small enough to terminate. It is considered a mode-seeking process. Segmentation of multidimensional data sets with the Mean Shift is achieved by placing many windows for the mean shift process, possibly at data point locations, and grouping data that converged to the same local density values, or destination "modes." 25 Figure 6 illustrates an example of the Mean Shift using the U and V dimensions (from the LUV color space) of one of my case studies (the Indoor LiDAR data, described in the Methods section). The first of the three images shows the two variables spatially mapped with their U and V values (in standard deviation units) and colored with their RGB values for reference to the real world objects they represent. The second and third images are color and grayscale examples of the same Mean Shift calculations. Both images have 5 selected points with a circle showing the first bandwidth window of evaluation. Here, the bandwidth is 0.5 standard deviation units. The iteratively calculated local density means are also shown, demonstrating 5 paths that lead to 5 different cluster centers. For segmentation using the Mean Shift clustering technique, all data points will be evaluated in similar manner. Figure 6. Mean Shift on 2D point set, finding modes. To extend the example to multidimensional (3D) space, Figure 7 shows the three LUV variables spatially mapped to the x, y, and z axes. To aid visual reference of the real objects they represent (in the Indoor LiDAR scan), the data are colored with their associated RGB values in the first image. The second image shows the Mean Shift clustering results. After all of the data points have been evaluated with the Mean Shift clustering, they are colored per their output cluster 26 groups. Points that arrived at the same local density point have the same color. Even though a few points were not found to be associated with clusters when using the 0.5 standard deviation bandwidth, the Mean Shift clustered the majority of points into 6 clusters without the need for specifying the number of clusters. Figure 7. Points in a 3D LUV color space clustered into 6 groups by the Mean Shift. The performance of the Mean Shift has been well documented and extended in the image segmentation literature. For instance, Shimshoni, Georgescu, and Meer (2006) have implemented a version to improve speed by using locality-sensitive hashing to partition the data and reduce the number of iterations needed to find approximate cluster centers. The Mean Shift has also been applied as a means in quantifying vegetation in an agricultural setting by using raster image color values (Zheng, Zhang, & Wang, 2009), in shoreline extraction using geometry derived from aerial LiDAR (Lee, Wu, & Li, 2009), and in object extraction of urban features from aerial LiDAR points (Yao, Hinz, & Stilla, 2009). 27 My thesis bridges these studies by implementing the Mean Shift clustering using both spatial and spectral values. As previously mentioned, the spatial location, spatial relationship, and thematic characteristics relate to low-level ontological components in a landscape. With the Mean Shift, the 3D points can be aggregated into locally similar groups for a quantitative and qualitative evaluation. 2.10 Accuracy Assessment 2.10.1 Types of Clustering Assessment My thesis is concerned with finding groups of 3D points with similar spatial and thematic attributes, possibly representing components of objects. Considering this is multi-dimensional, I have chosen to proceed using clustering aiming to identify groups of similar 3D points. This is essentially a form of unsupervised segmentation of 3D points. Range image classification and segmentation performance assessments offer a means of evaluation. I'll first describe the general types of clustering and segmentation assessments. A survey of methods for evaluating image segmentation processes (Zhang, 1996) shows there are two main categories to consider. Analytic methods are for evaluating the algorithms themselves; the concepts, principles, and characteristics that are used. Empirical methods, on the other hand, evaluate the performance of the processes through the results. This type of evaluation is very common and can be found to have different branches of study as well. One branch of empirical evaluation is concerned with goodness measures, leaning heavily upon human intuition of a visually "good" segmentation along with measures of intra-region uniformity or inter-region contrast. Another branch of empirical evaluation concerns measures of discrepancy. This deals with differences between ground truth data and the output. Mis-segmentation of pixels, groups, 28 positions, or a variety of characteristics may be evaluated. I will generally make use of empirical discrepancy methods in evaluating my experimental results and the level of performance for the variables of interest. One of the most basic evaluations of clustering output is simply the number of output groups closest to a predetermined number of reference groups (Zhang, 1996). A more informative metric is the number of correctly or incorrectly segmented points, usually shown as a percentage. These could be helpful as guidance toward a selection of segmentation parameters to be used for more detailed analysis. 2.10.2 Confusion Matrices - Typical Classification Approach Traditional classification performance assessments establish an error matrix for evaluation (Foody, 2002). In this section I will describe a typical version for classification (having the same number of reference and output classes.) Then, in the next section, I describe a modified matrix that handles my clustering results (having different numbers of reference and output classes.) As described by Foody (2002), in a typical supervised classification the desired classes to be found in an image are defined. Pixels in an image are selected as training sets and classified as a particular class by an analyst. Every other unclassified pixel is then compared to the training sets and assigned to the class with the most similar multispectral values. All of the pixels are typically assigned to one of the defined classes. An important point here is that a square confusion matrix is used to compare the same number of classes in both the reference data and the output. 29 Traditional confusion matrices can be used to calculate accuracy metrics (Foody, 2002). The square confusion matrix has columns and rows for the specified classes and conventionally represent the reference and output labels respectively. In the cells are the number of sampled pixels corresponding to each reference and output class. As Foody (2002) describes, the diagonal sum of this matrix represents the total number of correctly classified pixels in every class and the percentage of correctly classified pixels to the total number of sampled pixels offers the Overall Accuracy of the classification. He also notes that two other metrics are available for each class, the Producer's Accuracy and the User's accuracy. The Producer's accuracy is a percentage measure calculated as the number of correctly classified pixels in a reference column to the total number of pixels in that reference column. This shows how well the classification did to avoid errors of omission, or reference pixels that were labeled as some other class. The User's accuracy is a percentage measure calculated as the number of correctly classified pixels in an output row to the total number of pixels in that output row. This shows how well the classification did to avoid errors of commission, or output classes that contain pixels from other reference classes. Enhancement of the traditional classification accuracy assessment discussed in the literature (Congalton, 1991) compares significant differences in several matrices using kappa and khat metrics. However, with my synthetic data, the number of Mean Shift clustering output groups is not specified and may not equal the number of input reference groups. Not having a square matrix negates the possibility of using the kappa and khat values for determining if a classification is significantly better than another. Nevertheless, the traditional confusion matrix lays a theoretical foundation for evaluating accuracy, though it requires a modification, as described below. 30 2.10.3 Overlapping Area Matrices - Modified Segmentation Approach As the number of object components in a landscape scene is often unknown, it is preferable not to specify the number of output groups in advance. This may result in a (rectangular) asymmetric confusion matrix having different numbers of columns and rows. A symmetric, square matrix, only results if the algorithm happens to produce the same number of output groups as there are reference groups. There were typically 6 reference groups (columns) specified for my synthetic data, though the alternative color configuration had 12 reference groups. The output groups (rows) in my experiments had the potential to range from many groups containing one point each at small bandwidths to one group containing all of the points at larger bandwidths. A modified type of performance matrix, the Overlapping Error Matrix (OAM), was developed for segmentation evaluation (Ortiz & Oliver, 2006). This segmentation accuracy assessment works with unequal numbers of reference and output groups. As described in the Methods section, I had the benefit of a known ideal number of groups in my synthetic data set. The Mean Shift clustering output allows for a flexible number of output groups to be found. Thus, the confusion matrix consisted of six columns representing reference groups and a variable number of rows representing the output groups. If there were fewer than six output groups, the clustering resulted in some level of under-segmentation. If there were more than six output groups, the clustering resulted in some level of over-segmentation. A hypothetical under-segmented example output is shown in Table 1; it has six reference groups (columns) and three output groups (rows). The lower and right margins are the column and row totals, respectively. 31 Ref. Group 1 Ref. Group 2 Ref. Group 3 Ref. Group 4 Ref. Group 5 Ref. Group 6 Total Output Group 1 5 15 45 55 45 44 209 Output Group 2 0 40 15 4 10 9 78 Output Group 3 59 9 4 5 9 11 97 Total 64 64 64 64 64 64 384 Table 1. Hypothetical overlapping error matrix showing under-segmentation. 2.10.4 Range Data Segmentation - Extended Metrics To address 3D spatial point data directly, I followed the literature regarding range image segmentation and implemented the accuracy assessment framework by Hoover, et al. (1996) that extended the measures of an overlapping area matrix. In this assessment framework, several metrics were defined and additional stages were used to determine whether every cell in the matrix was initially labeled as at least one of 5 classifications: correctly segmented, oversegmented, under-segmented, a missing (reference) segment, or a noise (output) segment. The metrics basically related to proportions of intersecting cell values (numbers of points) to their reference (column) and output (row) values. If a matrix cell value was a high proportion in both the reference and output groups, it was generally considered correctly segmented. Ultimately, the cells were used to determine the number of output groups that are associated with each type of classification. Tolerance levels, measures of acceptance or rejection with which the proportions of points were evaluated, helped gauge the degree of segmentation performance. The tolerance levels employed ranged from a majority of points (>50%) to a perfect, correctly-segmented group with all points (100%) in an output group comprising 100% of a reference group. An 32 introductory description of this assessment method follows, including hypothetical examples; a detailed description is presented in the Methods section. First, the reference and segment metric values, Mr and Ms, were generated for each cell by calculating the proportions of points in each cell to the column total for the reference group and to the row total for the output group. This was somewhat similar to the Producer's and User's accuracy in a typical confusion matrix evaluation. Continuing with the under-segmented example OAM, shown below as Table 1, the resulting values from this stage are shown in Error! Reference source not found. and Table 3 with the Mr being proportions of points per reference side (column) and the Ms being proportions of points per output segment (row) respectively. Two cells, shown in bold, illustrate that each pair exceeds the tolerances mentioned in the next section. Ref. Group 1 Output Group 1 Output Group 2 Output Group 3 Total Ref. Group 2 Ref. Group 3 Ref. Group 4 Ref. Group 5 Ref. Group 6 0.08 0.23 0.70 0.86 0.70 0.69 0.00 0.63 0.23 0.06 0.16 0.14 0.92 0.14 0.06 0.08 0.14 0.17 1.00 1.00 1.00 1.00 1.00 1.00 Table 2. Mr values, proportions of points in each cell relative to the reference totals. 33 Ref. Group 1 Output Group 1 Output Group 2 Output Group 3 Ref. Group 2 Ref. Group 3 Ref. Group 4 Ref. Group 5 Ref. Group 6 Total 0.02 0.07 0.22 0.26 0.22 0.21 1.00 0.00 0.51 0.19 0.05 0.13 0.12 1.00 0.61 0.09 0.04 0.05 0.09 0.11 1.00 Table 3. Ms values, proportions of points in each cell relative to the output totals. 2.10.4.1 Segmentation Performance per Tolerances - Correct Segmentation For the second stage, and as a means of determining the level of segmentation performance, the proportion values in every cell in both matrices were evaluated against tolerances increasing from 0.5 to 1.0, in specified steps. By having a cell proportion value relative to the column and row totals that is larger than a tolerance value, this pair of reference and output groups was considered an instance of correct segmentation at this tolerance level. If the tolerance level was 0.5, the number of points in the cell are at least a majority of points in both the reference group and the output group. Of course, 50% of the points for such a cell is a relatively low level of tolerance. So, by increasing the tolerance in specified steps up to 1.0, for 100% of the points, it was possible to reveal which pairs of reference and output groups had a high proportion of shared points. Using the example metrics in Table 1 and Table 3, above, the segmentation performance was evaluated as proportions of points exceeding increasing tolerance levels. Table 4, shows whether two corresponding cell pairs meet the criteria and are considered instances of correct segmentation since the proportions of points in Table 1 and Table 3 are both greater than or 34 equal to the 0.5 tolerance. As shown in Table 4, increasing the tolerance to 0.6 resulted in only one correctly segmented cell pair, shown in Table 5. Ref. Group 1 Output Group 1 Output Group 2 Output Group 3 Ref. Group 2 Ref. Group 3 Ref. Group 4 Ref. Group 5 Ref. Group 6 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 Table 4. Cells with proportions greater than or equal to a tolerance of 0.5. Ref. Group 1 Output Group 1 Output Group 2 Output Group 3 Ref. Group 2 Ref. Group 3 Ref. Group 4 Ref. Group 5 Ref. Group 6 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 Table 5. Cells with proportions greater than or equal to a tolerance of 0.6. 2.10.4.2 Instances of Over-segmentation and Under-segmentation The third stage of evaluation addressed instances of over-segmentation. This generally proceeded with the same theoretical concept of meeting the criteria of proportions per reference column and output row. However, over-segmentation occurs when one reference segment is represented by several output groups. Here, instead of simply one row, the sum of the cells from every row that are participating in the column was determined and a proportion of the column total is again calculated for the metric Mro (over-segmentation metric for reference groups). However, to be 35 sure that each of these rows was participating more in this column than any other column, each participating row cell must have met the tolerance value with regard to the proportion of points compared to the row total. If this additional criteria is met, the cells for all the participating rows were summed, and a proportion relative to the row total was calculated for the metric Mso (oversegmented metric for output segments). Both Mro and Mso must have then met the tolerance for the cell to be classified as an instance of over-segmentation. The fourth stage of evaluation addressed under-segmentation. This is where one output group represented several reference groups. It is evaluated in the same fashion as over-segmentation, although it uses several columns to derive the Mru and Msu metrics (under-segmented metric for reference groups and output segments, respectively). 2.10.4.3 Final Classification In this framework, each cell could be classified as a member of up to three classes (Hoover, et al., 1996). The fifth stage of evaluation simply determined which classification the cell was mostly participating in by averaging both measures for each of the three classifications of correct segmentation (Mr and Ms), over-segmentation (Mro and Mso), and under-segmentation (Mru and Msu). The highest average value of the cell's classifications (relating to the proportions of a cell's participation in each classification) is selected as the final classification for that cell. In circumstances with equal averages of the metrics, preference is given to correct segmentation, then to over-segmentation, then to under-segmentation. A cell not classified as correct segmentation, over-segmentation, or under-segmentation is labeled as a missing reference in the reference column and also labeled as a noise segment in the output row. 36 The five potential final cell classifications ultimately determined how many groups belonged to each classification, based on the intersecting cell values representing proportions of the reference and output groups. Of these five classifications, I was mainly concerned with the number of correctly segmented groups as the single measure of segmentation performance. 2.10.4.4 Segmentation Performance Curves By charting the number of groups for each tolerance value, with the tolerance on the x-axis and the number of correctly segmented groups for each tolerance on the y-axis, it is possible to graphically present the performance of each variable set. The segmentation performance curves of Hoover, et al. (1996) easily allowed for a visual evaluation of the persistence of correctly segmented groups for each variable set as the tolerance increased. An example of a hypothetical segmentation performance curve is shown below in Figure 8, having the number of correctly segmented groups specified for tolerances from 0.5 to 1.0 in steps of 0.1. Also shown at the top, is an "Ideal" value of 6 correctly segmented groups, equivalent to most of my experiments' prelabeled reference groups (only one of my synthetic experiments had 12 groups.) The segmentation performance curve shows that 6 output groups are classified as correctly segmented at the 0.5 and 0.6 tolerance values, meaning 6 groups had a number of points that was at least 60% of a reference group and 60% of an output group. At the 0.7 tolerance level, only 5 groups met the criteria for being correctly segmented (over 70% of the points of both the reference and output group.) Eventually, the segmentation performance curve reveals 2 correctly segmented groups at the 1.0 tolerance level, indicating that 2 groups contained 100% of the points of both the reference group and the output group. 37 Figure 8. Example segmentation performance curve. 2.10.4.5 Area Under the Curve Index To obtain a single numeric performance value as a quantitative metric, I extended the accuracy assessment of Hoover, et al. (1996) by calculating an area under the curve index (AUC) so several results could be compared. This was helpful for summarizing the numerous segmentation performance curves generated for the variable sets in my experiments. The highest AUC ranking indicated the highest performing variable set. 38 Chapter 3 - METHODS 3.1 Methods Overview My thesis concerns the separation of 3D point data into groups with similar spatial and thematic characteristics. To that aim, my methods were concerned with clustering segmentation performance using 3D point data, different variable sets, and multiple perturbations of the data. The specific processes employed are described below. Images in this thesis are presented in color. In the context of segmentation, my thesis generally involved the Mean Shift clustering algorithm. Based upon the computer vision and segmentation literature, I expected the Mean Shift clustering to segment a large group of multidimensional point data satisfactorily. A notable benefit of the Mean Shift method is that a selected bandwidth window of evaluation, which is applied to locations throughout the data space, can yield any number of clustered groups, as opposed to setting the number of groups with prior knowledge. I used the K-means clustering technique, to compare the Mean Shift to a more familiar clustering method. The K-means requires the specification of the number of output groups as a parameter. The number of groups was selected based upon the known number of reference groups. However, it is important to note that the reference groups were selected intuitively, based upon spatial and thematic characteristics. The K-means algorithm has the potential to reveal different types of groups than the intuitively selected reference groups. A qualitative evaluation of the output groups helped to explore this situation, as discussed in the Results and Discussion section. 39 The synthetic data set initially consisted of 3D points with thematic information. From this, additional spatial and thematic variables were derived as explained in the Data Sources and Processing sections below. Several stages of evaluation were used to address issues individually, as described below. Multi-variate data analysis, spatial resolution, sensor accuracy, and more complex attributecoordinate associations are all important components for this research. As an introductory comparison of the clustering methods, the first stage of experimental evaluation considered the parameters and results of the Mean Shift and K-means clustering. The second stage compared the effects of higher and lower spatial resolutions. The third stage of evaluation considered signal error by introducing increasing levels of noise in spatial and attribute variables. The fourth stage used an alternative color configuration. All of the stages up to this point were evaluated with an accuracy assessment framework incorporating various quantitative metrics, graphed results, and 3D perspective plots. After evaluating the experimental results of the synthetic data, I applied the concepts and techniques to two LiDAR case studies. These steps are described in the Processing and Accuracy Assessment sections below as well as in the Results and Discussion section. 3.2 Data Sources Most of the data in this thesis were processed with the open-source software "R" (v. 2.10.0) (R Development Core Team, 2009). I also used a C++ implementation of the Mean Shift clustering algorithm, "Code for adaptive mean shift in high dimensions" (Georgescu, Shimshoni, & Meer, 2003), obtained from the Rutgers University Robust Image Understanding Library. I collected the case study LiDAR data in August of 2009 with an Optech 36D terrestrial LiDAR scanner and parsed those data with the Optech Parser (v.4.3.8.6). 40 3.2.1 Synthetic Data Sets A series of synthetic datasets were generated for the first sets of experiments. Each consisted of 3D vector points regularly spaced along the six faces of a cube, 10 units on a side. Thematic information— color—was assigned to each point. The number and spacing of points, spatial configuration of color on the cube, and degree of additive coordinate and spectral noise varied by dataset and are detailed in the following paragraphs. The use of synthetic data provided known reference values for each observation, which is valuable in any validation effort. The design mimicked a basic real-world application of directing a LiDAR scanner at an object with clearly defined facets and attributes. Finally, the relatively small size of each dataset minimized the computational burden, enabling additional experiments and analysis. The first step in my processing was to create synthetic data sets. The initial synthetic data set represented a cube that had points on each of its six sides in 3D (XYZ) coordinates ranging from 0.0-9.0. Since there is often uncertainty regarding exact boundaries or edges of regions in most measurements of real world phenomena, the synthetic data excluded point representation of edges and corners (values 0.0 and 9.0). This resulted in an 8 x 8 array of 64 points representing each of the six sides, totaling 384 points for the data set. For the accuracy assessment, all of the points were labeled as belonging to one of 6 reference groups representing the sides of the cube (front, back, top, bottom, left, and right). Thematic properties were represented by different levels of red, green, and blue (RGB) colors. The value of visible color reflectance is often collected by remote sensors, though these data were simply a surrogate for any other spectral wavelengths or thematic information that may be 41 included with 3D spatial data. For this experiment, values for color were added to the points. Groups of points representing each side of the cube had a different level of R, G, and B color values in a range from 0.0-1.0. Three sides had full-intensity red, green, or blue values of 1.0, while three sides had half-intensity red, green, or blue values of, 0.5 as shown in Table 6 below. A 3D perspective of the synthetic cube is also shown in Figure 9, below; it is an oblique view from above the top, front, left corner. These initial data are called the "noise-free" data set from here on. Red (R) Green (G) Blue (B) Reference Side 1 - Left 1 (Full) 0 0 Reference Side 2 - Top 0 1 (Full) 0 Reference Side 3 - Front 0 0 1 (Full) Reference Side 4 - Right 0.5 (Half) 0 0 Reference Side 5 - Bottom 0 0.5 (Half) 0 Reference Side 6 - Back 0 0 0.5 (Half) Table 6. Color assignment for thematic variables of the initial synthetic data set. Figure 9. Synthetic test cube from the top, front, and left corner. 42 The accuracy assessment required reference groups against which the segmentation output could be compared. Since I expected that the spatial and thematic characteristics of each side of the cube would provide sufficiently separable groups of points, each set of observations comprising one face of the cube was also considered a reference group. These synthetic groups are experimental surrogates for any groups of points with similar spatial and thematic characteristics in a real-world data set, potentially representing components of an object. The data set now included a reference label (Ref.), spatial attributes (XYZ), and thematic attributes (RGB). I expected that the XYZ coordinates would help to group points by their absolute position. This is a very important determinant of a group of spatial data points, particularly for the aim of representing a locally contiguous region of similar spatial characteristics. The thematic attributes of RGB spectral values are a very common set of characteristics that could also help to differentiate local regions of similarity. 3.2.2 LiDAR Data - Case Studies Beyond my synthetic data experiments, I employed the methods on two terrestrial LiDAR scans collected from observations of real-world objects in order to evaluate its potential for geographic feature segmentation. The first was a scan of simple, constructed shapes and colors represented in an indoor scene (subsequently called "Indoor"). The second was a scan of an archaeological site (subsequently called "Archaeological Site"), which provided rugged shapes and numerous colors of built structures and background landscapes. All three data sets are summarized in Table 7. 43 Data Set Experimental Data (initial and modified versions) Purpose To provide source and reference data for methodology and assessment framework included in this thesis. To apply thesis methodology to simple man-made shapes and Indoor LiDAR Data colors for discussion purposes. To apply thesis methodology to complex structures and Archeological Site LiDAR Data background landscapes for discussion purposes. Table 7. Data sets. 3.3 Processing 3.3.1 Spatial Variables - Neighborhood Relationships The next step in the generation of my synthetic data set was to derive additional spatial information from the existing basic data. The aim here was to represent the spatial relationship between each 3D point and its neighbors. Therefore, a basic measure of spatial neighborhood relationship was conceived and calculated: the mean vector (MV). This is simply the mean of vectors from each point to some of its neighbors. To calculate the mean vector, the eight nearest neighbors to each point were determined. The 3D (XYZ) vector was calculated for each neighbor. Finally, the average values of change in each axis were returned as a single set of 3D distances in the X, Y, and Z directions (MVx, MVy, and MVz). The formula for each MV is as follows: 𝑀𝑉 𝑖 = ∑𝑖 𝑛 𝑛 where i is the coordinate value for each axis (x, y, and z), and n is the number of neighbors (8 in this case). 44 3.3.2 Thematic Variables - Spectral Color Space Alternatives The initial synthetic data included a set of basic RGB values. This is one of numerous color spaces that can be used for analysis. As described in the Conceptual Framework, alternative color representations were considered to address issues of varying lightness in real-world data. From the RGB variable set I chose to derive values sets for both Luv (LUV) and CIE L*a*b* (LAB) color spaces, using the R software "colorspace" library(R Development Core Team, 2009). 3.3.3 Standardized variables The synthetic data set included all of the variables that I intended to use: Ref., X, Y, Z, MVx, MVy, MVz, R, G, B, L, U, V, L*, a*, b*. These variables are shown as "variable sets" in Table 8. Variable sets, variables and descriptions.. Variable Set Ref XYZ MV RGB LUV UV LAB AB Description Reference Side Labels 1-6: Reference Group Left, Front, Bottom, Right, Back, Top X, Y, and Z coordinates X Y Z (For absolute position) Mean Vector X, Y, and Z coordinates MVx MVy MVz (For neighborhood relationship) R G B Red, Green, and Blue color space values L U V CIE LUV color space values U V CIE LUV color space values (No Lightness) L* a* b* CIE L*a*b* color space values a* b* CIE L*a*b* color space values (No Lightness) Variables Table 8. Variable sets, variables and descriptions. The variable sets had various distributions of values and also had different ranges (e.g., 1-9 for XYZ and 0-1 for RGB). To remove the possibility of weighting one variable more than another 45 due to the magnitude of its values, I standardized each variable by transforming data values into their z-scores using the following formula: 𝑧= 𝑥− 𝜇 𝜎 where 𝑧 is the standardized z-score, 𝑥 is the point value, 𝜇 is the variable's mean value, and 𝜎 is the variable's standard deviation. Though the synthetic variables did not have normal distributions of values (real-world values may have a more normal distribution), the results of this process expressed the variability in a standard, unit-less way for the subsequent clustering operations. 3.3.4 Variable Sets and List of Experiments With all of the variables prepared, I was able to enter them as combinations into the clustering processes. The 23 possible combinations of variable sets are shown in Table 9. I used 15 variable set combinations for the synthetic data evaluation and 8 additional variable set combinations for the case studies. 46 Used for Used for Variable 1 Variable 2 Variable 3 Experimental Case Data Sets Studies Yes Yes XYZ Yes Yes MV Yes Yes RGB Yes Yes LUV Yes Yes LAB (No) Yes UV (No) Yes AB Yes Yes XYZ MV Yes Yes XYZ RGB Yes Yes XYZ LUV Yes Yes XYZ LAB (No) Yes XYZ UV (No) Yes XYZ AB Yes Yes MV RGB Yes Yes MV LUV Yes Yes MV LAB (No) Yes MV UV (No) Yes MV AB Yes Yes XYZ MV RGB Yes Yes XYZ MV LUV Yes Yes XYZ MV LAB (No) Yes XYZ MV UV (No) Yes XYZ MV AB Table 9. Combinations of sets of variables, steps, and sequences. I conducted 5 general sets of experiments, as listed in Table 10, below. They are described in the following sections and were compared throughout the assessment. Experiment Set 1 2 3 4 5 Topic Mean Shift/K-means clustering comparison Spatial resolution comparison Noise evaluation Alternative color configuration LiDAR case studies Table 10. List of experiments. 47 3.3.4.1 Experiment Set 1 - Mean Shift/K-means Clustering Comparison The first stage of evaluation for the synthetic data compared the Mean Shift and the K-means clustering outputs using the noise-free data set for both. Because the K-means algorithm requires the number of output groups as an input parameter, this was specified to match the number of (6) reference groups. This was the ideal set of labeled reference groups that could be found by either clustering method. The k-means clustering algorithm, as implemented through the R software "stats" library(R Development Core Team, 2009), is based upon an algorithm presented by Hartigan and Wong (1979). The Euclidean distance measure was used for the k-means clustering. 3.3.4.2 Experiment Set 2 - Spatial Resolution Comparison The data sets for the spatial resolution comparison were prepared similarly to the initial noisefree data set, though they had different point spacing. In comparison to the noise-free data set's spacing of 1 unit, the lower resolution set had a point spacing of three units, while the higher resolution set had a point spacing of 0.2 units. These values were selected as the first evenvalued multiples below and above the initial spacing of one. These point spacings avoided computational rounding errors. More importantly, each data set could be divided perfectly in half, able to meet an accuracy assessment tolerance value of 0.5, or 50% of points on each side. The total number of points in the set in the low resolution set, was 24 with (2 x 2) 4 points per side. The total number of points in the set in the high resolution set, was 11,616 with (44 x 44) 1,936 points per side. These are large differences in numbers of points, but it emphasized the effects of under-sampling and over-sampling during the data collection phase of an investigation. 48 3.3.4.3 Experiment Set 3 - Noise Evaluation This set of experiments consisted of data sets that had an incremental amount of noise added. Random samples of a Gaussian distribution with a mean of zero and an incrementally increasing value of variance from 0.02 to 0.10, in steps of 0.02 units were added to the initial raw (XYZ and RGB) data values. 3.3.4.4 Experiment Set 4 - Alternative Color Configuration The data for Experiment Set 4 was intended to help evaluate the clustering ability with a more complex, alternative color configuration. As described in the Conceptual Framework section regarding Ontology, objects in the landscape may be close to each other and have the same orientation, though they may be considered separate components due to varying color. To address such circumstances, this data set was prepared with separate groups of points that only varied in color (RGB, LUV, or LAB) while being in close (XYZ) proximity and having the similar (MV) neighborhood. Each side had a different color configuration having 2 colors per side, as shown in Table 11. Some colors had the same varying levels of RGB as the original data set, 1.0 or 0.5 for each R, G, or B. Other colors were combinations of R, G, and B, with values of either 1.0 or 0.5. To evaluate whether the clustering would create groups on different sides (across an edge of the cube), the configuration was prepared to have situations where two of the halves on different but adjacent sides had the same color (e.g., Reference Sides 1 and 5, Table 11). Conversely, the same color was also used on non-adjacent halves of sides (e.g., Reference Sides 1 and 3, Table 11). 49 Red (R) Green (G) Reference Side 1 - Left Side, Top Half 1 (Full) 0 Reference Side 2 - Left Side, Bottom Half 0.5 (Half) 0 Reference Side 3 - Right Side, Top Half 1 (Full) 0 Reference Side 4 - Right Side, Bottom Half 0 0 Reference Side 5 - Front Side, Left Half 1 (Full) 0 Reference Side 6 - Front Side, Right Half 0 0 Reference Side 7 - Back Side, Left Half 1 (Full) 1 (Full) Reference Side 8 - Back Side, Right Half 0 0.5 (Half) Reference Side 9 - Bottom Side, Left Half 0 0 Reference Side 10 - Bottom Side, Right Half 0 0 Reference Side 11 - Top Side, Left Half 0.5 (Half) 0 Reference Side 12 - Top Side, Right Half 0 0 Blue (B) 0 0 0 1 (Full) 0 0.5 (Half) 0 0.5 (Half) 1 (Full) 0.5 (Half) 0 0.5 (Half) Table 11. Alternative color configuration values. 3.3.4.5 Experiment Set 5 - LiDAR Case Studies After completing the work with the synthetic data sets, the methods were applied to real-world terrestrial LiDAR data sets. The first real-world data set was the Indoor scan of simple, manmade shapes having very basic and separable colors, though there was certainly more variability in both the spatial and thematic (RGB color) variables as compared to the synthetic data. The scan originally included over three million points and was sub-sampled to a much more computationally efficient set of ~5,000 points by selecting every 512th point in the data set. This resolution still provided plenty of spatial and spectral detail regarding the objects in the approximately 3 x 2 x 1 meter scene. The original 3 million point data set, before sub-sampling, is shown in Figure 10, below. 50 Figure 10. Indoor LiDAR data set. The second real-world data set was the Archaeological Site scan of complex built-up and natural landscape features having considerable variability in both the spatial and thematic (RGB color) variables. The scan included a stone-lined and capped water channel (diagonally from the lowerleft to the upper-right of the scene), bare soil (in the lower-right), and grassy vegetation (on the upper-left). This allows for the evaluation of spatial position and complex orientation, as well as the spectral values for different materials. The scan originally included approximately 0.5 million points and was sub-sampled to a much more computationally efficient set of ~8,000 points by selecting every 64th point in the data set. This resolution still provided plenty of spatial and spectral detail regarding the objects in the approximately 4 x 3 x 1 meter scene. The original half of a million point data set, before sub-sampling, is shown in Figure 11, below. 51 Figure 11. Archaeological Site LiDAR data set. 3.3.5 Clustering For the Mean Shift clustering, I used the Fast Adaptive Mean Shift (FAMS) algorithm, as implemented in the C++ programming language by researchers at Rutgers University (Georgescu, Shimshoni, & Meer, 2003). It is based upon previously established concepts from work by Comaniciu and Meer (2002). Though the FAMS algorithm is intended to address computational performance by sampling the data set, I simply ran the algorithm using all of the data points. Thus, only two parameters were required for this implementation of the algorithm: a bandwidth distance and the number of nearest neighbors to use. I initially selected both 8 and 24 neighbors for the FAMS algorithm. Since there was no difference in the number of output groups for initial stages of several variable sets between these two nearest neighbor selections, I maintained only 8 neighbors for the full evaluation of all of the experiments. The more important Mean Shift parameter was the distance selected for the bandwidth of evaluation. I used Euclidean distances initially ranging from 0.0 to 3.9 standard deviation units, in increments of 0.1. After initial results were evaluated, this was reduced to a range of 0.0 to 2.9 standard 52 deviation units because the larger bandwidths yielded only one under-segmented group of all data points, with no benefit arising from the evaluation of bandwidths higher than 2.9 standard deviation units. Therefore, each of 30 bandwidths was used for clustering each variable set. The results were evaluated, in terms of the number of output groups that most closely matched the number of reference data groups. 3.4 Accuracy Assessment The accuracy assessment of the clustering followed the previously employed concepts and techniques described in the Conceptual Framework (e.g., Hoover, et al., 1996). However, as discussed in detail below, there were some alterations and innovations to address the implementation of the Mean Shift clustering. 3.4.1 Mean Shift Clustering The implemented FAMS algorithm for the Mean Shift method used the previously mentioned parameters (bandwidths and number of neighbors) to cluster the variable sets providing two outputs. One Mean Shift output is a listing of the destination mode for every point, in terms of each variable. Occasionally, the bandwidth of evaluation may not calculate an exact destination mode; it repeatedly bounces between two very close destinations. A small convergence value is used to stop the algorithm from running continuously in such cases. The next step in the algorithm is a routine that "prunes" the modes. Because many points have destinations that are very close to each other but not with exactly the same values, this pruning routine determines points that are very close to each other, within a small tolerance distance, and re-assigns them to a "pruned mode". The routine reduces a larger number of modes (with few 53 points each) to a smaller number of pruned modes with relatively more points. The outcome is the second set of results, the Mean Shift pruned modes. 3.4.2 Pruned Mode Labeling Modifications The second Mean Shift output, showing pruned mode locations, does not specify which particular points are associated with the pruned mode location; it simply returns a count of the points at that location. Furthermore, the total of points at pruned modes did not equal the total number of points in the data set. Some points were not in close enough proximity to be considered a member of any pruned mode. To determine which points were nearest to each pruned mode location, I post-processed the results. I identified a bandwidth tolerance in the 16 FAMS algorithm used for pruning the modes. This was 3000 in 1/(2 ) units of the data set, which was approximately a value of ~0.045 in each variable's standardized units. To post-process the clustering results, I first found the closest pruned mode for every point in the data set and then evaluated if it was within the bandwidth used for pruning the modes. If it was, then the point was labeled as belonging to that particular pruned mode, or output group. However, in comparing the resulting number of groups and points in each output group to the number of pruned modes for each and every experiment, variable set, and bandwidth, I actually found that the numbers of my labeled groups and points within those groups was close to, but not exactly matching the number of groups and points in the pruned modes. Some were a few groups or a few points less. To get closer to the Mean Shift pruned mode output, I expanded the bandwidth for labeling the points to 1.1 units, which matched the numbers of post-processed groups to the number of Mean Shift output groups with only a very small proportion of exceptions between the numbers of points in the corresponding groups. These errors were considered negligible since they were mostly found in very small clustering bandwidths (0.0-0.2 standardized units) which 54 had very few groups, indicating under-segmentation or missing segments. The only notable exceptions were with some bandwidths around 1.3 , usually with the LUV variable set and again involved under-segmentation. 3.4.3 Segmentation Performance Classification With the points finally labeled as belonging to specific pruned mode output groups, I proceeded with an accuracy assessment that was mainly guided by Hoover, et al. (1996). First, I created overlapping area matrices (OAMs) for every experiment, variable set, and bandwidth. Then, the number of points in each cell was used to derive the proportional metrics per column and row used for correct segmentation, Mr and Ms, as well as the corresponding metrics for oversegmentation, Mro and Mso, and the metrics for under-segmentation, Mru and Msu. The labeling convention for the symbols metrics follow this convention: M - indicates a metric r - indicates the metric is in relation to a reference group (OAM column) s - indicates the metric is in relation to an output group, or segment (OAM row) o - indicates the metric is in relation to a measure of over-segmentation u - indicates the metric is in relation to a measure of under-segmentation The metrics are formalized after a mention of a modification to the reference column metrics (Mr, Mro, and Mru) and the tolerance values used. The final 5 classifications of corresponding reference and output groups were derived from all 6 metrics, as described in sections that follow. 3.4.3.1 Modification to the Reference Column Metrics One important modification was made to the calculation of the proportional metrics. The intent of the accuracy assessment described by Hoover, et al. (1996) is to evaluate a data set that had 55 every point labeled; this was not the case for the Mean Shift pruned modes. Some of the points were left out of the Mean Shift pruned modes and the subsequent point labeling. A column of the overlapping area matrix represents all of the points in a reference group. If a subset of points (the set of points belonging to the pruned mode) existed in a column, it did not actually represent a proportion of the reference group's initial number of points. To correct this issue, the column total was set to the original number of points in the input reference group, usually 64 points (or 1936 points for the high resolution data set and 4 points for the low resolution data set). Then, for the Mr, Mro, and Mru metrics, the proportion of points in each cell was calculated as a proportion of the original number of points in the reference groups. 3.4.3.2 Tolerance Values As mentioned in the Conceptual Framework section on Accuracy Assessment, the process included the evaluation of the metrics against the specified tolerance values from 0.5 to 1.0 (a 50 to 100% proportion), in steps of 0.1 (10%). If each cell's metrics met the required criteria, it was temporarily labeled in separate matrices for each of three segmentation classifications (correctly segmented, over-segmented, and under-segmented). 3.4.3.3 Correctly Segmented Metrics An instance of correct segmentation is determined when the reference group and output segment meet two criteria, shown below, with the majority of points defined by the incremental tolerance values: 1) The majority of points in the reference group are labeled as members of a single output segment, calculated with the formula: 𝑀𝑟 = 56 𝑁 𝑟𝑠 𝑁𝑟 where 𝑀𝑟 is the ratio of intersecting points (𝑁 𝑟𝑠 ) to the total number of points in the reference group (𝑁 𝑟). 2) The majority of points in the output segment correspond to a single reference group, calculated with the formula: 𝑀𝑠 = 𝑁 𝑟𝑠 𝑁𝑠 where 𝑀𝑠 is the ratio of intersecting points (𝑁 𝑟𝑠 ) to the total number of points in the output segment (𝑁 𝑠). 3.4.3.4 Over-segmented Metrics Over-segmentation occurs when one reference group is represented by more than one output segment. In this case the number of intersecting points is the sum of intersecting points in all of the participating output segments. The criteria and metrics are similar to the correctly segmented metrics. However, the second criterion has two stages. The over-segmented criteria and metrics are: 1) The majority of points in the reference group are comprised of the majority of points in many output segments (Ss1+Ss2+...), calculated with the formula: 𝑀𝑟𝑜 = 𝑁 𝑟𝑠1 + 𝑁 𝑟𝑠2 + ⋯ 𝑁𝑟 where 𝑁 𝑟𝑠1, 𝑁 𝑟𝑠2, ... are the number of intersecting points in Ss1, Ss2, ... respectively. 2a) There is a majority of points in each and every one of the participating output segments (Ss1+Ss2+...) participating in the corresponding reference group, calculated with the formula: 𝑀𝑠𝑜1 = 𝑁 𝑟𝑠1 , 𝑁 𝑠1 𝑀𝑠𝑜2 = 57 𝑁 𝑟𝑠2 , 𝑁 𝑠2 … where 𝑁 𝑟𝑠1, 𝑁 𝑟𝑠2, ... are the number of intersecting points in Ss1, Ss2, ... respectively and 𝑁 𝑠1, 𝑁 𝑠2, ... are the number of total points in Ss1, Ss2, ... respectively. 2b) The majority of points in all of the participating output segments (Ss1+Ss2+...) 𝑁 𝑟𝑠1 + 𝑁 𝑟𝑠2 + ⋯ 𝑁𝑠 correspond to a single reference group, calculated with the formula: 𝑀𝑠𝑜 = 3.4.3.5 Under-segmented Metrics Under-segmentation occurs when more than one reference group is represented by one output segment. In this case the number of intersecting points is the sum of intersecting points in all of the participating reference groups. The criteria and metrics are similar to the over-segmented metrics. The under-segmented criteria and metrics are: 1) The majority of points in the output segment are comprised of the majority of points in many reference groups (Sr1+Sr2+...), calculated with the formula: 𝑀𝑠𝑢 = 𝑁 𝑟𝑠1 + 𝑁 𝑟𝑠2 + ⋯ 𝑁𝑠 where 𝑁 𝑟𝑠1, 𝑁 𝑟𝑠2, ... are the number of intersecting points in Sr1, Sr2, ... respectively. 2a) There is a majority of points in each and every one of the participating reference groups (Sr1+Sr2+...) participating in the corresponding output segment, calculated with the formula: 𝑀𝑟𝑢1 = 𝑁 𝑟𝑠1 , 𝑁 𝑟1 𝑀𝑟𝑢2 = 58 𝑁 𝑟𝑠2 , 𝑁 𝑟2 … where 𝑁 𝑟𝑠1, 𝑁 𝑟𝑠2, ... are the number of intersecting points in Sr1, Sr2, ... respectively and 𝑁 𝑟1, 𝑁 𝑟2, ... are the number of total points in Sr1, Sr2, ... respectively. 2b) The majority of points in all of the participating reference groups (Sr1+Sr2+...) 𝑁 𝑟𝑠1 + 𝑁 𝑟𝑠2 + ⋯ 𝑁𝑟 correspond to a single output segment, calculated with the formula: 𝑀𝑟𝑢 = 3.4.3.6 Final Classification For the final classification, each set of proportional metrics was first averaged for the correctly 𝑀𝑟 ∗ 𝑀𝑠 2 segmented, over-segmented, and under-segmented classifications as shown below: 𝐶𝑜𝑟𝑟𝑒𝑐𝑡𝑙𝑦 𝑆𝑒𝑔𝑚𝑒𝑛𝑡𝑒𝑑 𝐴𝑣𝑒𝑟𝑎𝑔𝑒 = 𝑂𝑣𝑒𝑟 − 𝑆𝑒𝑔𝑚𝑒𝑛𝑡𝑒𝑑 𝐴𝑣𝑒𝑟𝑎𝑔𝑒 = 𝑈𝑛𝑑𝑒𝑟 − 𝑆𝑒𝑔𝑚𝑒𝑛𝑡𝑒𝑑 𝐴𝑣𝑒𝑟𝑎𝑔𝑒 = 𝑀𝑟𝑜 ∗ 𝑀𝑠𝑜 2 𝑀𝑟𝑢 ∗ 𝑀𝑠𝑢 2 Then, each cell was classified according to the highest value of the averaged metrics. If the averaged values were equal, classification preference was given in the order of correctly segmented first, then over-segmented, then under-segmented. The corresponding sets of reference groups and output segments were considered an instance of either correct segmentation, over-segmentation, or under-segmentation. If any cell was not labeled as either correctly segmented, over-segmented, or under-segmented, it received two classifications: as a missing segment with respect to the reference group, and as a noise segment with respect to the output group. 59 3.4.4 Segmentation Performance Curves As described in the Conceptual Framework, segmentation performance curves were generated. They show the number of instances of each classification, evaluated at the each tolerance level. The number of graphs would have been very large for each of the 5 classifications from 9 experiments, with 15 variable sets, and 30 bandwidths—20,250 graphs. Therefore, I chose to overlay the results of 30 bandwidths in each graph, reducing the number of graphs to 675. Each graph for each classification included the segmentation performance curves for every bandwidth plotted in gray and the highest performing bandwidth plotted in black. Also, I was mainly concerned with instances (paired reference and output groups) classified as correctly segmented. This selection further reduced the number of graphs to 135; 15 correctly segmented graphs for the variable sets of 9 experiments. Examples of these graphs are shown and discussed in the Results and Discussion section. 3.4.5 Area Under the Curve Index While a segmentation performance curve is a useful visualization of experimental results, the information content can be condensed to a metric for quantitative comparison. I extended the previous accuracy assessment of Hoover, et al. (1996) by calculating an additional metric. An area under the curve index (AUC) was calculated from the segmentation performance curves, with a range from a minimum of zero to an ideal index value of one. Since the ideal result of correct segmentation was specified as the number of reference groups, the actual results could be measured against this value. For each bandwidth of each variable set for each of the Mean Shift 𝑇ℎ 1 ∗ 𝑇𝑙 𝑛 experiments, the AUC was calculated with the following formula: 𝐴𝑈𝐶 = 𝐴 ∗ 60 where 𝐴𝑈𝐶 is the area under the curve index for each bandwidth of each variable set of each experiment, 𝐴 is the area under the segmentation performance curve (calculated with the trapezoidal rule), 𝑇ℎ is the high tolerance value, 𝑇𝑙 is the low tolerance value, and 𝑛 is the number of reference groups. The typical values for the formula are shown for 6 reference groups below: 𝐴𝑈𝐶 = 𝐴 ∗ 1.0 1 ∗ 0.5 6 The AUCs of different experiments provide information for a quantitative comparison of the performance of different variables for segmenting observations. I employed two comparative approaches with the new AUC information. The first approach uses the maximum AUC value of all bandwidths for each variable set and experiment (called "Max. AUC" from here forward). The Max. AUC values were ranked to reveal the highest performing bandwidth(s) and variable set(s) for each experiment. The second approach uses the average AUC of all bandwidths for each variable set and experiment (called "Avg. AUC" from here forward). The Avg. AUC values were ranked to reveal the highest performing variable sets (over all bandwidths) for each experiment. The AUC rankings are shown in the Results and Discussion section. 3.4.6 Segmentation Surface Plots Though the segmentation performance curves and the AUC Rankings are helpful aids to understanding the performance of the variable sets, they both failed to provide all available dimensions of performance. The segmentation performance curves lacked a clear view of each particular bandwidth and the AUC Rankings lacked a visual indication as to the character of a variable set's performance over all of the bandwidths. A better way of visualizing the results was necessary. 61 In addition to the other assessments, I produced an innovative representation of the Mean Shift clustering results. It is an extension of the segmentation performance curves for each variable set. I used a common raster surface plot, showing the tolerance values (on the x-axis), the Mean Shift clustering bandwidth (on the y-axis), and the number of correctly segmented groups (as a classified color) for each bandwidth and tolerance, totaling three dimensions. The number of correctly segmented groups is simply represented with color patches ranging from white, for the minimum of zero, to black, for the maximum ideal number of 6 groups (or 12 groups for the alternative color configuration). The segmentation surface plot visualization revealed which variable sets and which bandwidths perform well for each experiment. The Max AUC was readily apparent in many variable sets, bandwidths, and tolerance levels. However, this visualization particularly helped to identify which variable sets performed well over many bandwidths and tolerance levels. It visually demonstrated the characteristics and nuances of the Avg. AUC metric. The segmentation surface plots are shown in the Results and Discussion sections. 3.4.7 Statistical Analyses Even though the segmentation surface plots and the Avg. AUC showed potential for indicating the persistence of the variable sets' performance over all bandwidths, a more rigorous means of determining the variable sets' performance was through statistical analyses. Several statistical tests were conducted to more precisely understand the relative performance of variable sets. The noise evaluations offered an opportunity to generate enough samples for statistical testing. These multiple realizations of data also allowed for the evaluation of particular variable sets, addressing objectives relating to the most influential color spaces and overall characteristics: spatial 62 location, spatial relationship, or thematic attribute. The following paragraphs describe the information used and the statistical tests employed: one-sample t-tests, analysis of variance tests (ANOVA), and Post-ANOVA tests. The output is shown in the Results and Discussion section. 3.4.7.1 One-sample t-tests for the Noise Evaluation To statistically evaluate the resulting effects of increasing levels of noise in the raw data, 19 realizations of each level of noise were generated. The 5 levels of noise were simulated by adding to the noise-free data set individual samples from random distributions with means of zero and a variances ranging from 0.02 to 0.10 units, in increments of 0.02 units. The 19 realizations for each 5 levels of noise were sufficient to run 5 separate one-tailed, one-sample ttests, comparing each level of noise to a single mean value from the noise-free data set. The Mean Shift clustering results from each variable set was tested with 5 levels of noise. The null hypothesis is that the with-noise data sets have a mean that is equal to the noise-free data. The alternative hypothesis is that the with-noise data sets have a mean that is less than the noise-free data. A rejection threshold alpha value of α = 0.05 was employed. The noise evaluation hypotheses were: Ho: the mean of the with-noise results = the noise-free result Ha: the mean of the with-noise results < the noise-free result 3.4.7.2 The ANOVA and Post-ANOVA Tests for All Variable Sets I employed an ANOVA test to compare multiple variable sets and determine if the means are all equal or if at least one of the means is significantly different. On the assumption that some noise is commonly present in most observations and data collection, I used the 19 iterations of data containing the 0.02 level of noise. The Avg. AUC of all variable sets were submitted to the ANOVA test. The hypotheses were: 63 Ho: the mean of variable set 1 = the mean of variable set 2 = ... Ha: not all of the means are equal Since this only offers an indication of whether there is a difference of means, it was necessary to follow-up with a Post-ANOVA test to determine which means were significantly different from each other. Three Post-ANOVA tests were conducted: the Dunnett-Tukey-Kramer (DTK) test, the Tukey Honestly Significant Differences (HSD) test, and the Least Significant Different (LSD) tests. The outcomes are shown in the Results and Discussion section. 3.4.7.3 The ANOVA and Post-ANOVA Tests for Color Variable Sets To determine which color space had the highest performance and the most correctly segmented groups over all the bandwidths, an ANOVA test was run solely on the RGB, LUV, and LAB variable sets. The tests used similar data, information, and procedures as mentioned above for all variable sets: the Avg. AUC of 19 iterations of the 0.02 noise level were evaluated. Again, the three Post-ANOVA tests (DTK, HSD, and LSD) followed for a more detailed evaluation. 3.4.7.4 The ANOVA and Post-ANOVA Tests for Variable Type With the highest performing color space identified, an ANOVA test was conducted to evaluate the three variable types: spatial location (XYZ), spatial relationship (MV), and the thematic attributes (highest performing color). The tests used similar data, information, and procedures as mentioned above for all variable sets: the Avg. AUC of 19 iterations of the 0.02 noise level were evaluated. Again, the three Post-ANOVA tests (DTK, HSD, and LSD) followed for a more detailed evaluation. 64 3.4.8 3D Perspective Plots In addition to the quantitative metrics, it was important to see which particular points were grouped together. For a visual representation of the data, 3D perspective plots were generated from an oblique viewpoint, looking from the top, left, and front perspective. Spheres were used to symbolize each point to aid the perception of three-dimensional depth and understanding of which points were in the foreground and which were in the background. The points were colored based upon their membership in an output group. For most of the plots, if a point was not in a group it was appropriately invisible, though a few plots would show these as white. These plots offered a valuable way to view and interpret the patterns of grouped points. Selected plots are shown in the Results and Discussion section. 3.4.9 K-means Clustering Comparison In contrast to the Mean Shift clustering, the K-means method had no bandwidths, so only one set of results was produced. Segmentation performance curves, AUC rankings, and segmentation surface plots were generated and 3D perspective plots were produced for visual analysis. Examples of the output are in the Results and Discussion section along with a comparison of the parameters of the two clustering methods. 3.4.10 LiDAR Case Studies Building on the understanding of the Mean Shift clustering performance from the synthetic data experiments, my attention turned to understanding a real-world scene. The Indoor and Archaeological Site data sets were used. The same Mean Shift clustering methods were employed. The synthetic experimental results guided attention to particular bandwidths and variable sets. 65 To perform a quantitative analysis with the segmentation performance curves, Max. AUC and Avg. AUC rankings, and segmentation surface plots, it was necessary to select groups of points representing objects or features in the scenes. The Indoor LiDAR scan was labeled as 9 groups: 8 reference groups representing objects and one background group. The 8 reference objects were the following items: the brown box, red toolbox, blue pail, black case, green case, yellow ball, blue cup, and yellow ducks. For the Archaeological Site LiDAR data, the entire scan was labeled as three groups representing features in the landscape: the grass, the dirt, and the stone canal. Results were generated by employing the Mean Shift clustering to all of the initial 15 variable set combinations and 8 additional variable sets for alternative color spaces with the lightness removed. The quantitative accuracy assessment was performed and generated segmentation performance curves, the Max. AUC and Avg. AUC rankings, and segmentation surface plots. Finally, 3D perspective plots were also generated for a qualitative visual assessment and discussion of potential applications for geographic landscape representation. 66 Chapter 4 - RESULTS and DISCUSSION 4.1 Overview The main goal of this thesis is to develop a concept and a means of finding features in a landscape. As there is a potential for abundant 3D data sets augmented with additional attribute information, it is beneficial to aggregate the observed data points into generalized regions of homogeneous characteristics. These regions may represent landscape features or components of more complex objects. Groups of data points having similar spatial locations, spatial relationships, and thematic attributes might be found with clustering techniques such as the Mean Shift method. First, the experiments with synthetic data provided a controlled environment and included pre-selected reference groups for the accuracy assessment of 3D point segmentation. Then, for the case studies, the methods were applied to real world data. Though the case studies also included an investigation into more complex variation, such as spatial and spectral variation (e.g., rugged surfaces and shadows), knowledge gained from the synthetic data analyses guided the interpretation of the case study results. There was an abundance of output from the experimental analysis; I present a condensed assessment of that output in this chapter. 4.2 Synthetic Data Set The first result of my processing was simply the creation of an initial synthetic data set consisting of 3D (XYZ) point data with thematic (RGB) attributes, shown below. 67 Figure 12. Initial noise-free data set. This initial noise-free data set served as the basis for the selection of ontological properties for an intuitive selection of reference groups. The planar regions of each side were assumed to have distinctive spatial locations (XYZ) and relationships (MV) compared to other sides. So, the points representing each side might be clustered together to identify aggregated groups of points representing each planar region. Though the sides were components of the more complex cube object, each side of the cube also had distinctive thematic attributes (RGB, LUV, or LAB), which helped to intuitively identify them as uniquely different features or reference groups. 4.3 K-means clustering Using the noise-free data set, a brief comparison between the well-known K-means and the more recent Mean Shift clustering was done to describe differences in the processes and results. As an input parameter, the K-means requires the number of output groups to be specified. This is not commonly known for many data sets. However, 6 reference groups were labeled in the synthetic data set so 6 groups were specified for the K-means. Instead of specifying the number of output 68 groups, the Mean Shift clustering requires a bandwidth distance of evaluation. As mentioned in the Methods section, segmentation performance curves for 30 Mean Shift bandwidths were overlaid on the graphs. A comparison of selected segmentation performance curves is shown in Figure 13. The black lines show the number of correctly segmented groups found at each tolerance level. The K-means clustering yielded one set of results for each variable set. For comparing to the K-means, the highest performing of the 30 Mean Shift bandwidth results are shown in black (lower performing bandwidths are shown in gray). 69 K-Means Mean Shift Figure 13. K-means and Mean Shift Segmentation Performance Curves. 70 The segmentation performance curves showed that the Mean Shift clustering performs better than the K-means. One of the highest performing K-means variable sets, XYZ-MV-LAB, was matched by the Mean Shift with both meeting the ideal number of 6 reference groups up to a tolerance value of 1.0, or 100%. When the color variable set was switched, giving the XYZ-MVRGB set, the K-means only reached three correctly segmented groups up to a tolerance value of 0.9, dropping down to two groups when evaluated against a tolerance value of 1.0. This means there were three output groups with each having at least 90% of the points of some corresponding reference group, though only two of the groups contained a full 100% of the points in the reference groups. The Mean Shift again had at least one bandwidth that found all of the 6 reference groups perfectly. With only two characteristics included, the Mean Shift was still able to maintain a high performance, while the K-means remained lower. However, when one of the two variable sets was the spatial relationship MV variable set, even the Mean Shift dropped from having 6 correctly segmented groups through the 0.9 tolerance value down to zero correctly segmented groups at the 1.0 tolerance. The K-means had even less success and only had two correctly segmented groups up to the 0.6 tolerance, meaning the two groups had at least 60% of their points in common with some reference groups. These examples show the Mean Shift clustering performing much better than the K-means, even when the K-means was set to find the exact number of reference groups in the noise-free data. Used in computer science, computer vision, and machine learning disciplines, the conventional segmentation performance plots offer some level of quantitative and visual insight into the character of the resulting output segments. With many experiments to compare, however, the 71 first extension of the accuracy assessment was introduced, the area under the curve. Since the Mean Shift included 30 bandwidths for each variable set in each experiment, both the Max. AUC and the Avg. AUC were calculated, as described in the Methods section. The K-means AUC results and the Mean Shift Max. AUC results are shown in Table 12 as a ranked graph with the highest performing variable sets on top. K-means AUC 1.00 1.00 1.00 1.00 1.00 0.50 0.50 0.48 0.47 0.10 0.10 0.03 0.00 0.00 0.00 Mean Shift Max AUC 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.90 0.02 0.00 0.00 0.00 Variable Set RGB LUV LAB XYZ-LAB XYZ-MV-LAB XYZ-RGB XYZ-LUV XYZ-MV-RGB XYZ-MV-LUV XYZ-MV MV-LUV MV-LAB XYZ MV MV-RGB Variable Set XYZ RGB LUV LAB XYZ-RGB XYZ-LUV XYZ-LAB XYZ-MV-RGB XYZ-MV-LUV XYZ-MV-LAB XYZ-MV MV-RGB MV MV-LUV MV-LAB Table 12. K-means and Mean Shift AUC rankings. This table shows that, for many variable sets, some bandwidth in Mean Shift resulted in a higher Max. AUC than the AUC of the K-means method. In fact, with this noise-free data set, the Mean Shift had a perfect Max. AUC value of 1.0 for most of the variable sets and their combinations, meaning that through the tolerance value of 1.0, 100% of the points were correctly segmented into 6 groups matching the reference groups. The only drop off in Mean Shift performance was where the MV variable set was alone or in small combinations. The K-means seemed to do well 72 with some variable set combinations with color included, especially the LAB color space values. Overall, the common K-means clustering results rarely matched the performance of the highest performing Mean Shift bandwidths. Affecting the results was a difference in the way each clustering process identifies clusters. With the K-means, the initial cluster seed locations are random and eventually move to the highest density of nearby points. With the Mean Shift, the bandwidth of evaluation is placed on every point, moving to the local center of density. The Mean Shift may be a more comprehensive search and offers the flexibility of finding any number of modes, or cluster centers. For a visual evaluation, several 3D perspective plots from the K-means are shown in Figure 14; the Mean Shift plots are discussed in later sections. XYZ MV RGB XYZ-RGB Figure 14. K-means 3D perspective plots. 73 These plots offered an opportunity to learn more about the characteristics of the clustering results. Each variable set had particular properties and therefore particular tendencies in the clustering process. For instance, the XYZ variable set was intended to provide a measure of proximity, with similar valued points clustered. The expected output was locally contiguous groups of points. This was viewed as essential to separating different sides of the cube, or similar objects in a landscape separated by some distance. The 3D perspective plot for the XYZ variable set shows an unexpected K-means clustering result. The points from three sides, near corners of the cube, were grouped together. This is a reasonable outcome: points near a corner on one side are closer to points on a second side than they are to other points on the same side. The 3D perspective plot for the MV variable set also revealed a partially unexpected result. The MV variable was intended to provide a value of each point's relationship to the neighboring points in an unstructured 3D space. As described in the Conceptual Framework, by averaging the vector values from a point to its neighbors, the Mean Vector was shown to have different values in a planar region than near an edge or in an area of high variability. The formula and values did follow that presumption, MV values of points representing planar regions had different values than those near edges. It was assumed that the XYZ variable set would help to separate planar regions with different locations or orientations. It was also expected that points near edges would have different values. However, it was not expected to majorly degrade planar regions near the edges. To notable effects arose. First, the values for all planar regions were so similar that they were all clustered together. Second, the number of points representing sides were almost always reduced (and effectively degraded) because the edges and corner points were separated into other 74 clusters. Though this may be helpful for applications interested in finding planar regions and edges, it was not perfectly matched for finding the solely planar regions. The 3D perspective plot for the 6 K-means groups clearly shows the edges as separated into different groups from the sides. Additionally, since only 6 groups were specified, the edges along the back side were not separated from the group of all planar regions. If 7 groups had been specified, the edges may have been separated from the 6 planar regions. Though the MV may be helpful for finding edges, for this thesis the MV generally degraded the output groups by reducing the number of points per side. The 3D perspective plot for the K-means clustering of the RGB variable set shows a perfect segmentation. All 6 reference groups were perfectly segmented based upon the RGB values of 0, 0.5, 1 for red, green, and blue as specified in the Methods section. Though this was very helpful for clustering, it is important to note that this synthetic data structure greatly influenced the outcome. The small number of unique RGB values were very separable compared to the uniform distribution of more XYZ values (0-9) or the highly variable and less separable values of the MV, LUV, and LAB variable sets. Thus, the distribution of the RGB values and the structure of the initial synthetic data set appeared to boost the segmentation results of the variable sets including the RGB values. The XYZ-RGB 3D perspective plot shows strong RGB influence. Recalling from the K-means segmentation performance curve and the AUC value, the XYZ-RGB variable set had three correctly segmented groups throughout all of the tolerance values. The 3D plot shows three sides perfectly grouped (left, front, and right), two sides as one group (top and bottom), and one side 75 split in half (back). The XYZ component may have influenced the splitting of the back side, though it did not create groups near corners as did the individual variable set. The RGB portion held together three sides individually and two sides as a group (the top and bottom were both levels of green, 1.0 and 0.5, respectively.) This begins to show that the synthetic experiments were strongly influenced by the pre-defined structure of the RGB variable set. The K-means has been shown to have lower performance than the Mean Shift using several assessment aids. However, the K-means clustering was beneficial as it provided a baseline comparison of results, as well as insight into the processes and parameters of the two techniques. The visual evaluation helped to further understanding of the segmentation performance curves and the AUC rankings. It also offered an opportunity to discuss some of the key issues involving the synthetic data and XYZ, MV, and RGB variable sets. The remaining results and discussion concern the higher performing Mean Shift clustering method, as it shows relatively higher potential for being able to identify homogeneous regions in the landscape and group observations together by spatial location, spatial relationship, and thematic attributes. 4.4 Spatial Resolution The results for the low-resolution (2 x 2 points per side) and the high resolution (44 x 44 points per side) data sets will now be presented and compared to the noise-free data set. Starting with the low resolution data, the selected segmentation performance curves and Max. AUC rankings, Figure 15 and Table 13, respectively, show that there was a maximum of one correctly segmented group throughout all of the variable sets and Mean Shift bandwidths. The MV again degraded the results, especially considering the 8 nearest neighbors for the calculation used points from 3 sides. This shows that if the MV variable were used to identify edges, the number 76 of neighbors and the relationship between distances to planar and non-planar neighbors is an important consideration. Figure 15. Low resolution segmentation performance curves. Max AUC 0.17 0.17 0.17 0.17 0.17 0.17 0.17 0.17 0.17 0.17 0.17 0.17 0.17 0.00 0.00 Variable set XYZ MV RGB XYZ-MV XYZ-RGB XYZ-LUV XYZ-LAB MV-RGB MV-LUV MV-LAB XYZ-MV-RGB XYZ-MV-LUV XYZ-MV-LAB LUV LAB Table 13. Low resolution Max. AUC rankings. For the high resolution data set, the segmentation performance curves shown in Figure 16 are examples that the variable sets yielded between zero and 6 correctly segmented output groups. Apparently, the Mean Shift bandwidths reached certain distance thresholds that abruptly changed 77 the performance, but were stable through all tolerance values. The performance held steady through all tolerance values for every variable set, except the XYZ variable set. Figure 16. High resolution segmentation performance curves. This evaluation of resolution made use of multiple bandwidths for the Mean Shift clustering. Though the Max. AUC may have been achieved for at least one bandwidth, it was difficult to determine how many of the 30 overlaid bandwidths reached the maximum. For example, the XYZ, XYZ-LAB, and XYZ-RGB variable sets in Figure 16 may have all reached the maximum at some point, but it was more important to know the degree to which each variable set's highest performance persisted through all of the bandwidths evaluated. To address this aim, the Avg. AUC (averaged across all bandwidths) for each variable set was calculated. This quantitative 78 metric provided a number with which to rank a variable sets' overall performance across all of the bandwidths evaluated. In summary, the Max. AUC can show which variable sets had the highest performing bandwidths and the Avg. AUC can show which variable sets performed well over all of the bandwidths. The complementary high resolution Max. AUC and Avg. AUC rankings are shown in Table 14. Max AUC 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Variable Set XYZ RGB LUV LAB XYZ-RGB XYZ-LUV XYZ-LAB MV XYZ-MV MV-RGB MV-LUV MV-LAB XYZ-MV-RGB XYZ-MV-LUV XYZ-MV-LAB Avg AUC 0.30 0.27 0.27 0.20 0.20 0.19 0.19 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Variable Set XYZ-RGB XYZ-LUV XYZ-LAB RGB LUV LAB XYZ MV XYZ-MV MV-RGB MV-LUV MV-LAB XYZ-MV-RGB XYZ-MV-LUV XYZ-MV-LAB Table 14: High resolution Max. AUC and Avg. AUC rankings. Beyond the Max. AUC and Avg. AUC rankings, a second innovation uses the segmentation performance curve information to create new segmentation surface plots which illustrate the performance over every bandwidth. These plots provided an additional means of evaluating the quantitative measures of segmentation performance over all of the bandwidths using a single visualization. As described in the Methods section, the x-axis represents the tolerance values (0.5 to 1.0), the y-axis represents each bandwidth (0.0 to 2.9 standard deviation units), and all locations in the graph contain color patches indicating the number of correctly segmented groups 79 (0 to 6, from white to black, respectively.) Examples are shown Figure 17. Figure 17. High resolution segmentation surface plots. The new segmentation surface plots readily display the number of correctly segmented output groups in shades of gray from zero (white) to 6 (black). The XYZ-RGB variable set, with an Avg. AUC of 0.30, has many bandwidths that have the ideal 6 groups correctly segmented in bandwidths ranging from 0.3 to 1.1 standard deviation units. The XYZ-LAB had an equal Max. AUC of 6 groups, but a slightly lower Avg. AUC of 0.27. This was apparently due to fewer correctly segmented groups (in levels of gray) in the 1.0 and 1.1 bandwidths; lower bandwidths to 0.3 performed as well as the XYZ-RGB. The segmentation surface plot for the XYZ set shows that its performance peaked at 6 groups, but dropped off at varying tolerance levels for some bandwidths. This was a curious effect, that performance would increase, then decrease, and increase again before finally degrading with higher bandwidths. To investigate which points were being grouped together for the high resolution data set, the 3D perspective plots are shown in Figure 18. 80 Bandwidth=0.1 Bandwidth=0.3 Bandwidth=0.5 Bandwidth=0.6 Bandwidth=0.7 Bandwidth=0.9 Bandwidth=1.0 Bandwidth=1.3 Figure 18. High resolution 3D perspective plots. The 3D perspective plots revealed several unexpected results for the XYZ variable set. The 0.1 bandwidth (whose plot is a closer view and not at the same scale as the others) was apparently the lower threshold for generating a corner grouping as previously noted for the XYZ variable set However, only one corner was created; why other corners were not found was unclear. At the 0.3 bandwidth, rectangular regions on each side were grouped, though it was still not enough points to meet the majority criteria for correctly segmented groups. At the 0.5 bandwidth, the circular edges reappeared in the spatial pattern, at least indicating that some points were being clustered at some distance from the center of each side. At bandwidth 0.6, the high resolution XYZ variable set clustered 6 correctly segmented groups in a circular pattern from the center of each side; this persisted through the 0.8 tolerance value as shown in the segmentation surface plot. At a larger bandwidth of 0.7, the circular shape reduced in size and the number of clustered points decreased, with the 6 correctly segmented groups having over 70% of the points in the reference group, a drop from the previous bandwidth's percentage of over 80%. The only 81 explanation I can provide for this is the possibility that the bandwidth is now just large enough to create competition between different modes that the Mean Shift cannot resolve. As the bandwidth grows to 0.9, the circle grows again and maintains 6 correctly segmented groups through a tolerance value of 0.8, or over 80% of the reference points. Eventually, between the 1.0 and 1.2 standard deviation bandwidths, the variable set reaches a perfectly segmented cube; each planar side has 100% of the points included in an output group. This is shown in the both the 3D perspective plot (Figure 18) and the segmentation surface plot (Figure 17). Finally, at the bandwidths of 1.3 and higher, the points are contained in one single group. This undersegmentation is a common result for large bandwidths in all variable sets. This evaluation demonstrates the complementary value of the new segmentation surface plots and the 3D perspective plots. To compare the different resolutions (point spacings of 0.2, 1.0, and 3.0), the visualizations offered great potential for revealing the character of the clustering results. However, to make a final determination of the importance of various levels of resolution, I made use of the Avg. AUC metrics, shown in Table 15. It was a convenient way to quantitatively rank the variable sets and reach conclusions about the different resolutions. The rankings show that the lower resolution was problematic and there were too few points to find locally homogeneous groups. The higher resolution provided numerous points and intriguing results. However, it did not appear necessary to have a very high density of data to correctly segment the points. The moderate spacing of 1.0 units actually had an equal or higher Avg. AUC for most variable sets; it performed much better with the XYZ variable set and even had some success with the MV variable set. 82 Spacing 3.0 Avg AUC Variable set 0.06 XYZ 0.06 MV 0.06 XYZ-RGB 0.06 MV-RGB 0.06 MV-LUV 0.06 MV-LAB 0.06 XYZ-LUV 0.06 XYZ-LAB 0.05 XYZ-MV-RGB 0.05 XYZ-MV-LUV 0.04 XYZ-MV-LAB 0.04 XYZ-MV 0.03 RGB 0.00 LUV 0.00 LAB Spacing 1.0 Avg AUC Variable set 0.31 XYZ-RGB 0.28 XYZ-LUV 0.28 XYZ 0.27 XYZ-LAB 0.20 RGB 0.20 LUV 0.19 LAB 0.15 XYZ-MV-LUV 0.13 XYZ-MV-RGB 0.13 XYZ-MV-LAB 0.03 XYZ-MV 0.00 MV-RGB 0.00 MV 0.00 MV-LUV 0.00 MV-LAB Spacing 0.2 Avg AUC Variable set 0.30 XYZ-RGB 0.27 XYZ-LUV 0.27 XYZ-LAB 0.20 RGB 0.20 LUV 0.19 LAB 0.19 XYZ 0.00 MV 0.00 XYZ-MV 0.00 MV-RGB 0.00 MV-LUV 0.00 MV-LAB 0.00 XYZ-MV-RGB 0.00 XYZ-MV-LUV 0.00 XYZ-MV-LAB Table 15. Comparison of various resolutions' Avg. AUC. 4.5 Noise Evaluation The experiments up to this point only made use of noise-free data sets; equally spaced spatial patterns and perfectly selected attribute information. In practice, any investigation is subject to error. The source of error may be in the experimental design, data collection, processing steps, or analyses. Considering 3D spatial point data, observational equipment (e.g., range scanners, GPS receivers) or complex environmental variability (e.g., leaves on vegetation, varying temperature or moisture) could potentially lead to noisy data sets with some level of uncertainty as to the location and boundaries of landscape features. Before working with real world data, I created 19 sets of data for 5 levels of noise in order to gain a more thorough understanding of the variable sets and Mean Shift clustering results. The synthetic experiments leveraged computational tools, 83 generated data samples with increasing levels of noise, and produced the most reliable results of this thesis through statistical analyses and hypothesis testing. The Methods section provided details of the data set production. As a visual reference, the examples below show in Figure 19 the spatial variability of the noise-free data and sets with 5 levels of increasing noise. The (non-noisy) colors represent the 6 reference groups (the plots are at different scales to show outliers.) Noise=0.00 Noise=0.02 Noise=0.04 Noise=0.06 Noise=0.08 Noise=0.10 Figure 19. Examples of adding noise to the spatial data. Only one noise-free data set was created and 19 data sets of each level of noise were created. The noise, randomly selected from a Gaussian distribution having a mean of zero and variance ranging from 0.02 to 0.10 in increments of 0.02 was added to the spatial (XYZ) and thematic (RGB) values. Next, other derived values (MV, LUV, LAB) were processed and all variable sets were standardized as with the noise-free data set. The 19 sets of noisy data provided a sufficient number of random samples to perform statistical analyses. The accuracy assessment was performed on all the data sets in this experiment to find the number of correctly segmented groups at each bandwidth for each variable set. Then, the Avg. AUC values were calculated as measures of overall performance of each variable set; this value was used for all statistical analyses. 84 4.5.1 One-sample t-tests for All Variable Sets at All Levels of Noise The first aim was to determine what level of noise produced significantly lower results compared to the noise-free data set. A one-tailed, one-sample t-test was performed for each variable set and each level of noise, comparing the mean of the 19 samples of each level of noise to the noise-free value. A rejection threshold alpha value of α = 0.05 was employed. The hypotheses are presented below and the Mean Avg. AUC and p-values are shown in Table 16. Ho: the mean of the 19 noisy Avg. AUC results for each variable set = the noise-free Avg. AUC for each variable set Ha: the mean of the 19 noisy Avg. AUC results for each variable set ≤ the noise-free Avg. AUC for each variable set 85 Noise-free 0.00 XYZ Mean Avg. AUC 0.275 P-value MV Mean Avg. AUC 0.000 P-value RGB Mean Avg. AUC 0.200 P-value LUV Mean Avg. AUC 0.200 P-value LAB Mean Avg. AUC 0.189 P-value XYZ-MV Mean Avg. AUC 0.032 P-value XYZ-RGB Mean Avg. AUC 0.306 P-value XYZ-LUV Mean Avg. AUC 0.283 P-value XYZ-LAB Mean Avg. AUC 0.272 P-value MV-RGB Mean Avg. AUC 0.002 P-value MV-LUV Mean Avg. AUC 0.000 P-value MV-LAB Mean Avg. AUC 0.000 P-value XYZ-MV-RGB Mean Avg. AUC 0.128 P-value XYZ-MV-LUV Mean Avg. AUC 0.145 P-value XYZ-MV-LAB Mean Avg. AUC 0.128 P-value 0.02 0.243 <<0.001 0.000 0.206 0.134 <<0.001 0.122 <<0.001 0.011 <<0.001 0.304 0.159 0.266 <<0.001 0.260 <<0.001 0.008 0.999 0.007 0.999 0.004 0.999 0.168 0.999 0.154 0.999 0.137 0.998 *** *** *** *** *** *** 0.04 0.174 <<0.001 0.000 0.198 0.065 0.083 <<0.001 0.077 <<0.001 0.009 <<0.001 0.295 <<0.001 0.228 <<0.001 0.217 <<0.001 0.031 0.999 0.008 0.999 0.008 0.999 0.164 0.999 0.121 <<0.001 0.115 <<0.001 *** *** *** *** *** *** *** *** *** With-noise 0.06 0.08 0.10 0.101 0.061 0.041 <<0.001 *** <<0.001 *** <<0.001 *** 0.000 0.000 0.000 0.182 0.165 0.153 <<0.001 *** <<0.001 *** <<0.001 *** 0.045 0.030 0.018 <<0.001 *** <<0.001 *** <<0.001 *** 0.043 0.026 0.017 <<0.001 *** <<0.001 *** <<0.001 *** 0.006 0.005 0.005 <<0.001 *** <<0.001 *** <<0.001 *** 0.281 0.264 0.253 <<0.001 *** <<0.001 *** <<0.001 *** 0.171 0.125 0.092 <<0.001 *** <<0.001 *** <<0.001 *** 0.186 0.149 0.128 <<0.001 *** <<0.001 *** <<0.001 *** 0.030 0.019 0.014 0.999 0.999 0.999 0.004 0.002 0.001 0.999 0.999 0.977 0.004 0.002 0.001 0.999 0.999 0.999 0.155 0.138 0.126 0.999 0.999 0.133 0.084 0.050 0.030 <<0.001 *** <<0.001 *** <<0.001 *** 0.093 0.068 0.050 <<0.001 *** <<0.001 *** <<0.001 *** *** indicates significance at α = 0.001 Table 16. Mean Avg. AUC and P-values for one-tailed, one-sample t-tests. Based on the fact that many noise-free variable sets produced perfect segmentations (6 correctlysegmented groups through tolerance levels ranging from 0.5 to 1.0), I expected a small amount of noise would not significantly degrade the performance. However, the p-values from the onesample t-test showed that even a small amount of noise produced significantly lower results than the noise-free data. Because the noisy samples for each variable set and level of noise had small variability, they produced very small margins of error (~ ±0.003). This provided assurance that 86 the true mean value for each variable set and level of noise was most likely within small 95% confidence intervals around the sampled means. There were a few issues with the tests that require explanation. The 19 RGB data sets with 0.02 noise yielded the same exact Avg. AUC value. There was no variation in the samples and the ttest resulted in a null value. In similar fashion, the MV variable set performed poorly in the noise-free data and could not perform any worse with noise. The zero values generated a null output from the t-test. Most of the other combined variable sets which included the MV set actually managed to improve the Avg. AUC slightly for low levels of noise. This was probably due to the noise actually reducing the negative effect of the Mean Vector, allowing the other variables (e.g., XYZ, RGB, etc.) to have a positive effect on the performance until high levels of noise eventually degraded the entire set. Regardless of any slight improvement, most of the ttests with the MV set did not have significantly lower values since they were very low in the noise-free set. Nearly all of the other noisy variable sets performed significantly lower than the noise-free values. The only variable sets that were not significantly lower were the XYZ-RGB at the 0.02 noise level and the RGB at the 0.04 noise level. Both of these variable sets performed very well compared to the noise-free sets and likely benefitted from the strong separability of the RGB values, as previously noted. 4.5.2 Selection of the 0.02 Noise Level for Additional Analyses One of the purposes of evaluating which level of noise was not significantly different from the noise-free level was to provide a means of obtaining many samples. In practice, a two percent 87 level of noise could be considered common, whether due to variability in the environment, or errors in observations or sensors. Though the 0.02 noise level was often significantly lower for many of the variable sets, the 0.02 level of noise was selected as a reasonably low level to generate enough randomly sampled synthetic realizations for further statistical analyses. This was deemed acceptable because the values of the highly performing RGB variable set indicated a performance threshold near the 0.02 level of noise. The RGB values had no variance at the 0.02 level of noise; reducing the level of noise may result in more variable sets with no variance. Increasing the level of noise resulted in more variable sets having significantly lower performance. 4.6 Comparison of Multiple Means with Analysis of Variance (ANOVA) tests The objectives of this thesis were to determine the highest performing variable sets, color spaces, and which type of variable set was most helpful (spatial location, spatial relationship, or thematic attribute.) It was therefore necessary to compare multiple mean values. The One-way ANOVA test is suitable for comparing multiple means with different variances and sample sizes. This test determines if the means are equal or if any one of them is significantly different. A PostANOVA test is necessary to determine which particular mean values are significantly different. The three Post-ANOVA tests I employed were the Dunnett-Tukey-Kramer (DTK) test, the Tukey Honestly Significant Differences (HSD) test, and the Least Significant Differences (LSD). They are all very closely related and generally find the least significant differences for each mean. The DTK and HSD tests result in sets of pairwise comparisons. These two tests had similar, though not exactly the same, results. The differences are attributable to the numerous adjustments available as extensions to the main test. The LSD had similar results regarding the mean, confidence intervals, and the least significant difference values. However, this test simply 88 outputs all of the means, the least significant differences and determines groups of means that are not significantly different. Since the LSD output is more clear than the multitude of pairwise comparisons, I first show the ANOVA results and the LSD results follow. 4.6.1 The ANOVA and Post-ANOVA Tests for All Variable Sets Continuing investigation as to which variable sets have the highest performance, all of the Avg. AUC values from 19 samples (from the 0.02 noise level) of 15 variable sets were used for the ANOVA test. A rejection threshold alpha value of α = 0.05 was employed. The hypotheses are presented below and the results are shown in Table 17. The p-value shows there is a significantly low probability that the means are equal; as such, the null hypothesis is rejected. Ho: the mean of the 19 noisy Avg. AUC results for variable set (1) = the mean of the 19 noisy Avg. AUC results for variable set (2) = ... = the mean of the 19 noisy Avg. AUC results for variable set (15) Ha: not all of the means of the 19 noisy Avg. AUC results for the variables sets were equal Analysis of Variance Table: All Variable Sets Response: Avg. AUC Df Sum Sq Mean Sq F value Pr(>F) Variable_Set 14 3.09301 0.220929 3889.4 < 2.2e-16 *** Residuals 270 0.01534 0.000057 Table 17. ANOVA results for all variable sets. The Least significant differences Post-ANOVA test followed. A rejection threshold alpha value of α = 0.05 was employed. The resulting graph, Figure 20, shows a bar for the mean of all 19 89 samples of each variable set and whiskers for the least significant difference for the mean of each variable set. Groups of means are also labeled with letters. Mean values belonging to the same group are not significantly different from each other. Figure 20. Least significant differences results for all variable sets. Compared to the one-sample, noise-free Avg. AUC ranking, this graph contains a more reliable ranking of variable sets because it was based on the mean and variance of 19 samples of a realistic value of 2% noise (0.02 variance). As noted previously, the margin of error and 95% 90 confidence intervals were sufficiently small, as well. The whiskers represent the least significant difference for each variable set's mean. The letters indicate groups of means that were not significantly different from each other; variable sets with a shared letter were not significantly different. The most obvious pattern in the graph is that the MV and two-set combinations with MV performed very poorly. The performance of the MV improved only when used in a three-variable set combination. However, these are still relatively low compared to the XYZ and color variable sets. The MV evidently degrades segmentation performance. This indicates a conceptual mismatch between the selected reference groups and the implementation of the spatial relationship metric, MV. This metric resulted in different values (and thus different output groups) for points near some edges as compared to other edges, corners, and planar regions. This may be helpful for some applications. Unfortunately, for this experimental framework, removing the edge points from the planar points degraded the percentage of points belonging to groups representing sides. The remaining results for the XYZ and color variable sets also offered an answer for the general hypotheses of this thesis: that multiple variable sets enhance the ability to find groups of data as compared to single variable sets. With the three-variable set combinations degraded by the MV, it was readily apparent that the two-variable set combinations of XYZ and color had significantly higher performance than the individual XYZ and color variable sets. Even the poorly performing MV variable set had boosted performance when combined with two additional variable sets. So, the identification of points representing regions of homogeneity is boosted by the use of 91 additional variable sets. Spatial location and thematic attributes help to increase the number of points correctly segmented as sides, or components, of the more complex 3D cube object. The least significant differences graph of all variable sets provided clear and encouraging results of higher performance with additional variable sets. 4.6.2 The ANOVA and Post-ANOVA Tests for Color Variable Sets To identify the best performing color space, all of the Avg. AUC values from 19 samples (from the 0.02 noise level) of the 3 individual color space variable sets were used for the ANOVA test. A rejection threshold alpha value of α = 0.05 was employed. The hypotheses are presented below and results are shown in Table 18. The p-value shows there is a significantly low probability that the means are equal; as such, the null hypothesis is rejected. Ho: the mean of 19 noisy Avg. AUC results for the RGB variable set = the mean of 19 noisy Avg. AUC results for the LUV variable set = the mean of 19 noisy Avg. AUC results for the LAB variable set Ha: not all of the means of the 19 noisy Avg. AUC results for the RGB, LUV, and LAB variables sets were equal Analysis of Variance Table: Color Variable Sets Response: Avg. AUC Df Sum Sq Mean Sq F value Pr(>F) Variable_Set 2 0.07741 0.038704 817.54 < 2.2e-16 *** Residuals 54 0.00256 0.000047 Table 18. ANOVA results for color variable sets. The least significant differences Post-ANOVA test followed. A rejection threshold alpha value of α = 0.05 was employed. The resulting graph in Figure 21 shows a bar for the mean of all 19 92 samples of each color variable set and whiskers for the least significant difference for each group mean. Groups of means are also labeled with letters. Mean values belonging to the same letter group are not significantly different from each other. Figure 21. Least significant differences results for color variable sets. First, it is important to note that the RGB variable set did not have least significant distances because the 0.02 noise samples for RGB all had the same Avg. AUC value; there was no variance. The 19 RGB samples, therefore, had no margin of error. As the test results show, the RGB variable set had a significantly higher performance than the LUV, and LAB variable sets. 93 This was most likely related to the construction of the synthetic data set. The RGB values were much more separable and able to cluster much better than the LUV or LAB sets. For example, the color space and standardized values for both full intensity red and blue are shown in Table 19. The range and variance for the difference of the standardized RGB, LUV, and LAB values for red and blue are also shown. In the example, the RGB values had a much larger range and variance providing for highly separable clusters compared to the LUV and LAB values. The design of the synthetic data set provided values that helped the RGB variable set cluster more successfully. To overcome this, a wider variety of different colors should have been used. This would also more readily parallel the potential variation in thematic attributes in a real world investigation. Though the basic data used here were meant to be a simplistic representation in a test environment, it brought about useful insights relating to the clustering process, color variable sets, and thematic data values. Variable Set Color Space Values Standardized Values RGB (red) RGB (blue) Difference 1.00 0.00 1.00 0.00 1.00 -1.00 1.96 -0.65 2.62 -0.65 -0.65 0.00 -0.65 1.96 -2.62 5.23 6.84 LUV (red) LUV (blue) Difference 53.24 32.30 20.94 175.01 37.76 -9.40 -130.34 184.42 168.09 0.14 -0.81 0.95 1.58 -0.34 1.93 0.37 -1.56 1.93 0.98 0.32 LAB (red) LAB (blue) Difference 53.24 32.30 20.94 80.09 67.20 79.19 -107.86 0.91 175.06 0.14 -0.81 0.95 0.82 0.81 0.01 0.70 -1.54 2.24 2.22 1.25 0.00 0.00 0.00 Range Variance Table 19. Color space value examples favoring clustering of the RGB variable set. 94 4.6.3 The ANOVA and Post-ANOVA Tests for Variable Types Determining the highest performing variable type is also a research question in this thesis. Spatial location, spatial relationship, and thematic attributes are represented in this test by XYZ, MV, and RGB, respectively. RGB was chosen here because it was the highest performing color space, even though the data were created in a fashion that increased the possibility of RGB clustering successfully. Also, at this point in the evaluation, the MV variable set was expected to reduce the number of correctly segmented groups, though these statistical tests offer a confirmatory test. All of the Avg. AUC values from 19 samples (from the 0.02 noise level) of the 3 selected variable types were used for the ANOVA test. A rejection threshold alpha value of α = 0.05 was employed. The hypotheses are presented below and the results are shown in Table 20. The p-value shows there is a significantly low probability that the means are equal; as such, the null hypothesis is rejected. Ho: the mean of 19 noisy Avg. AUC results for the XYZ variable set = the mean of 19 noisy Avg. AUC results for the MV variable set = the mean of 19 noisy Avg. AUC results for the RGB variable set Ha: not all of the means of the 19 noisy Avg. AUC results for the XYZ, MV, and RGB variables sets were equal Analysis of Variance Table: Variable Types Response: Avg. AUC Df Sum Sq Mean Sq F value Pr(>F) Variable_Set 5 1.9979 0.39958 45.006 < 2.2e-16 *** Residuals 393 3.4892 0.00888 Table 20. ANOVA results for variable types. The Least significant differences Post-ANOVA test followed. A rejection threshold alpha value of α = 0.05 was employed. The resulting graph, Figure 22, shows a bar for the mean of all 19 95 samples of each variable type and whiskers for the least significant difference of the group mean. Groups of means are also labeled with letters. Mean values belonging to the same letter group are not significantly different from each other. Figure 22. Least significant differences results for variable types. After looking at the previous tests of all variable sets and selecting the highest performing color variable set, RGB, this test was aimed at finding the most beneficial type of variable set: spatial location (XYZ), spatial relationship (MV), or thematic attribute (RGB). Already noting the 96 effects of MV, it was easy to deduce that performance was much higher without the MV and much lower with the MV. As for the XYZ and RGB variable sets, a higher mean was found with the XYZ, though it was not significantly higher than with the RGB. This is indicated by both "With XYZ" and "With RGB" belonging to the same group of means, "B". These two collections performed significantly better than without either of the XYZ or RGB sets, which were in a group of lower means, "C". This indicates that using the spatial and thematic variable sets improved segmentation performance in the experiments. The process of grouping points representing sides, or parts of the whole cube, benefitted from having both the XYZ and RGB data included. 4.7 Alternative Color Configuration The alternative color configuration was intended to address the question of identifying regions having similar spatial location and relationship and differing in the values of thematic attributes. The previous noise evaluation did alter colors as well as spatial properties, though it was meant to impose increasing levels of variability in the data. The alternative color configuration used the noise-free spatial values and only altered the colors per side. Instead of one color per each side, different colors were applied to each half of a side. The initial red, green, and blue color combinations were used along with new combinations as noted in the Methods section. Adjacent sides, across edges of the cube, had either the same or different colors to test the clustering ability when encountering edges. The accuracy assessment tested the results for 12 reference groups containing 32 points per group. The segmentation performance curves only showed a few instances when the clustering produced one correctly segmented group which lasted through a tolerance value of 0.7, at the most. Two examples are shown in Figure 23 with the only two resulting patterns. 97 Figure 23. Alternative color configuration segmentation performance curves. The variable set combination of XYZ and each color had the best results as shown on the left. Adding the MV to the XYZ and color variable sets again degraded the performance. Interestingly, none of the color variable sets correctly segmented groups individually. This suggested that when new color combinations were used (beyond solely red, green, or blue), the color variable sets did not have any success. The Max. AUC and Avg. AUC rankings, in Table 21 below, show the level of performance for the variable sets in the alternative color configuration. 98 Max AUC 0.08 0.08 0.08 0.02 0.02 0.02 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12 Reference Groups (Alt. Color) XYZ-RGB XYZ-LUV XYZ-LAB XYZ-MV XYZ-MV-RGB XYZ-MV-LUV XYZ-MV-LAB XYZ MV RGB LUV LAB MV-RGB MV-LUV MV-LAB Avg AUC 0.01 0.01 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12 Reference Groups (Alt. Color) XYZ-RGB XYZ-LUV XYZ-LAB XYZ-MV-RGB XYZ-MV-LUV XYZ-MV-LAB XYZ-MV XYZ MV RGB LUV LAB MV-RGB MV-LUV MV-LAB Table 21. Alternative color Max. AUC and Avg. AUC. The 3D perspective plots offer the best insight into the clustering performance with the alternative color configuration, shown in Figure 24. XYZ-RGB Bandwidth=0.1 XYZ-RGB Bandwidth=0.3 MV-RGB Bandwidth=0.1 RGB Bandwidth=0.1 XYZ Bandwidth=0.5 MV Bandwidth=0.1 Figure 24. Alternative color configuration 3D perspective plots. 99 One of the highest performing variable sets, with only one correctly segmented group, was the XYZ-RGB. At a very small bandwidth of 0.1 standard deviation units, only 18 out of 32 points in the reference group were clustered together. At the 0.3 bandwidth, the group combined with an adjacent group across an edge of the cube. The MV-RGB variable set with a small 0.1 bandwidth grouped portions of several halves together into two groups. The points near edges were excluded, as was also the case for the individual MV variable set. The XYZ variable set managed to group three sides into separate groups and three other sides into one group. However, since the aim here was to find 12 reference groups, two groups per side, this result was not correctly segmented. The influence of thematic information was readily apparent with the RGB variable set. Halves of sides were in separate groups, since they had different colors. However, some halves on different sides were grouped together and not correctly segmented per the assumption of 12 reference groups. The individual XYZ and RGB variable sets exhibited some expected tendencies. The XYZ did group many points that were close to each other and the RGB and other color spaces grouped similar colors together. Though they clustered well per the input data values, they did not find the 12 separate reference groups. This effect supported the idea that features with different spatial or thematic attributes could result in correctly segmented groups with respect to the input data. However, the combinations of spatial and thematic variable sets did not perform as well. One additional observation, however, is that smaller bandwidths appeared to work better in the alternative color configuration. This may be due to the smaller distances between values of both the spatial and thematic attributes as compared to the other experiments with 6 groups. Bandwidths may be sensitive to the range of the variable set values, as well as the number and spacing of modes within each variable set (e.g., more sets of colors would have more modes for the thematic variable). 100 4.8 LiDAR Case Studies The final investigations of this thesis involve data collected from real-world scenes. Terrestrial LiDAR scans were collected in two scenes, an Indoor and an Archaeological Site scene. The details of both are described in the previous Methods section. All of the same 15 variable set combinations were generated and supplemented with 8 new combinations including two new variable sets UV and AB (simply the LUV, and LAB without the lightness component). Since shadows were a concern, using color spaces that ignored lightness was expected to allow clustering features based upon their chromatic properties without the influence of lightness or shadows. As this was an extension from the synthetic data experiments, the same Mean Shift clustering methods, number of neighbors, and bandwidths were employed and the same accuracy assessment was performed. 4.8.1 Indoor Scan The first case study, in a progression from the synthetic data, I initially worked with the Indoor LiDAR scan which contained spatial and thematic (spectral) data collected from simple constructed shapes having contrasting colors. Various objects in the scene had different properties. Some were planar, some were cylindrical, and some were spherical. The objects had different smoothness and colors. One notable challenge was to reduce the effect that shadows and lighting had on the segmentation. Though shadows may be helpful for some applications, it was considered beneficial to remove their effects to improve the correct segmentation of objects in this scene. First, the Indoor LiDAR scan was sampled to boost computational efficiency. Then, the scan was labeled as 9 groups: 8 reference groups representing objects and one background group. The 8 101 reference objects were the brown box, the red toolbox, the blue pail, the black case, the green case, the yellow ball, the blue cup, and the yellow ducks. Figure 25 shows images of the unsampled data, the sampled data in RGB color, and the sampled data colored per their reference groups. Indoor LiDAR (Unsampled) Indoor data (Sampled) Indoor data (Reference groups) Figure 25. Indoor LiDAR scan, sampled data, and labeled reference groups. After the Mean Shift clustering was complete, the accuracy assessment generated segmentation performance curves. Two of these are shown in Figure 26 and are examples of higher performing variable sets. The two images both show that 6 of the 9 reference groups were correctly segmented through the 0.6 tolerance level, meaning that the 6 groups had between 60-70% of their respective reference groups' points. Four output groups had over 80% and three output groups had over 90% of their respective reference groups. 102 Figure 26. Indoor LiDAR segmentation performance curve examples. The Max. AUC and Avg. AUC rankings, shown in Table 22, allow for a performance evaluation of all variable sets in relation to each other. The Max. AUC, relating directly to the largest area under the curve for each variable, shows that the alternative color spaces generally produce the highest performing results, whether they include the lightness variable or not. The XYZ and MV variable set are spread throughout the ranking list, however the XYZ set is evidently associated with some of the highest Max. AUC values. On the other hand, the RGB variable set is included in the lowest of the Max. AUC values. When all of the bandwidths are averaged for the Avg. AUC, a relative measure of performance is generated. The ranking of the Avg. AUC values reinforces the fact that the UV, AB, LAB, and LUV variable sets produce higher levels of correct segmentation overall. The XYZ variable set follows and generally performs better than the MV or RGB variable sets. 103 Max AUC Variable Set 0.47 LAB 0.47 XYZ-LUV 0.47 XYZ-LAB 0.47 XYZ-MV-LUV 0.47 XYZ-MV-LAB 0.42 LUV 0.41 AB 0.39 UV 0.39 XYZ-AB 0.35 MV-LUV 0.33 XYZ-UV 0.33 MV-LAB 0.33 XYZ-MV-AB 0.31 MV-AB 0.31 XYZ-MV-UV 0.29 XYZ-MV-RGB 0.28 XYZ-RGB 0.28 MV-UV 0.27 RGB 0.10 XYZ-MV 0.09 XYZ 0.08 MV-RGB 0.00 MV Avg AUC Variable Set 0.28 UV 0.28 AB 0.27 LAB 0.26 LUV 0.24 XYZ-LUV 0.23 XYZ-LAB 0.21 XYZ-UV 0.20 XYZ-AB 0.15 XYZ-MV-LUV 0.14 XYZ-MV-LAB 0.13 MV-LUV 0.11 MV-LAB 0.11 MV-UV 0.11 XYZ-MV-UV 0.11 XYZ-MV-AB 0.09 MV-AB 0.09 XYZ-RGB 0.07 RGB 0.06 XYZ-MV-RGB 0.03 XYZ 0.03 MV-RGB 0.01 XYZ-MV 0.00 MV Table 22. Indoor LiDAR Max. AUC and Avg. AUC values. The segmentation surface plots allow for a quick evaluation of the performance of all of the variable sets throughout all of their Mean Shift bandwidths. Guided in part by the AUC rankings, a selection of 6 segmentation surface plots are shown in Figure 27. The UV and AB variable sets plots are similar and illustrate their ability to segment throughout a large range of bandwidths. Though performing well overall, their peak performance is not as high as the LAB or the XYZLAB and XYZ-LUV combinations. Though these three sets are again similar and show that the addition of XYZ slightly degrades the performance at small and large bandwidths. This is possibly due to the different real world objects sizes, and therefore different XYZ distances in 104 the data. The lower performing XYZ-RGB variable set is also shown to illustrate the difference between the synthetic data and the real world data. The highly separable distribution of RGB values in the synthetic data readily generated clusters and the variable set had much higher performance compared to the LUV and LAB color spaces. However, in the real world data, the RGB values were less separable. The alternative color spaces evidently had values that were much more separable, allowing for improved clustering results that may be in line with the way different colors in the landscape are perceived by humans. Figure 27. Indoor LiDAR segmentation surface plot examples. The evaluation also involved a subjective visual inspection regarding particular objects represented in the source data set and the properties of their variables. Four Mean Shift clustering results are shown in Figure 28 as 3D perspective plots for discussion purposes. The plots for the MV variable sets were not selected for discussion here because they separated points representing edges from the output groups. 105 XYZ-LUV Bandwidth =0.3 XYZ-LAB Bandwidth =0.3 LAB Bandwidth =0.3 AB Bandwidth =0.3 Figure 28. Indoor LiDAR 3D perspective plot examples. Smaller Mean Shift bandwidths (~ 0.3) segmented the Indoor scene better than the preferred bandwidths in the synthetic data (~ 0.6). This reinforces the notion that the bandwidths may need to be selected to fit the distribution of data values and size of the clusters. In fact, the only three objects that rarely clustered were the small objects with very few points (the yellow ball, the blue cup, and the yellow ducks) as they had an entirely different scale and were relatively undersampled in the scan. The XYZ-LUV results illustrate the effect of the spatial location variables. Larger regions that exhibited similar thematic (color) characteristics were segmented into different groups by the XYZ variable set. This XYZ-LUV set was also sensitive to lightness; regions with different lighting or shading were in separate groups, as seen in the upper right region of the bookshelves. The XYZ-LAB variable set continued to break up similarly colored features, such as the brown wood box at the bottom of the scene. Both of these variable sets created many groups for the objects on the shelves. Over-segmentation could be viewed as preferable in this application since it is easier to combine groups of data than it is to separate 106 groups. The LAB variable set, on the other hand, had less over-segmentation, yet effectively generalized the scene. Some of the features, however, were not found at all (the two yellow ducks in the center of the shelves, and the yellow ball) and some were combined with larger groups (the blue coffee cup toward the upper left.) With this set, the shadow region in the upper right was not found, possibly because the shadow was not as influential as in other regions. Of particular interest, is the black rectangular case on the left side of the book shelf. It was easily segmented as a separate group with the LAB variable set. However, when the lightness variable was removed for the AB variable set, the black case was combined with the white background. Though the AB variable set did well at removing the effect of shadows, the clustering process could not find any difference between black case and white background points for two reasons: lightness was not used for clustering and both of these extremes in lightness (black and white) do not have much difference in color. In practice, it may be helpful to remove the effect of shadows, though one should exercise caution if clustering white and black objects is not desirable. To summarize, the XYZ variable set separated the data into local regions, the MV separated edges from planar regions, and the thematic (color) attributes did the best at segmenting easily recognizable objects in the scene. The inclusion or exclusion of lightness is a potential option for refinement of spectral segmentation of a real world scene. 4.8.2 Archaeological Site Scan The second case study was the Archaeological Site LiDAR scan. The scene, shown in Figure 29 contained nearly flat (not necessarily level) soil in the right foreground, a stone-lined and capped water canal running diagonally from the lower left foreground to the upper right of the scene, and grassy vegetation toward the left background portion. This scan provided rugged features with the stones in the water canal which also had highly variable color values. The grassy 107 vegetation and soil provided relatively flat surfaces, though the blades of grass presented some variation. Intuitively, the spectral values help to quickly differentiate three basic features: a patch of soil, a constructed canal, and a vegetated area. This type of real world data set inspired this line of research since the ability to create aggregated groups from abundant 3D point data, separating distinctly different features in a scene, would be helpful, efficient, and revealing. If the data collected from the three features in this scene were segmented into separate groups, then theoretically the characteristics, variable sets, clustering methods, and parameters may be extended to geographic features of larger extents. First, the Archaeological Site LiDAR scan was sampled for computational efficiency. Then, the scan was labeled as 3 groups: the grass, the stone canal, and the soil. Figure 29 shows images of the unsampled data, the sampled data in RGB color, and the sampled data colored per their reference groups. 108 Archaeological Site LiDAR (Unsampled) Archaeological Site data (Sampled) Archaeological Site data (Reference groups) Figure 29. Archaeological Site LiDAR scan, sampled data, and labeled reference groups. After the Mean Shift clustering was complete, the accuracy assessment produced segmentation performance curves. Four of these are shown in Figure 30 and are examples of higher performing variable sets. Three of the four images show that all three of the reference groups were correctly segmented through the 0.6 tolerance level, meaning that the three output groups had between 6070% of their respective reference groups' points. The XYZ-MV-RGB variable set maintained two groups with above 70% of their respective reference groups. 109 Figure 30. Archaeological Site LiDAR segmentation performance curve examples. The Max. AUC and Avg. AUC rankings, shown in Table 23, show the highest performing variable sets for one bandwidth and across all bandwidths, respectively. Variable sets including RGB had the highest performing bandwidths, probably due to the high separability of values for the green grass. However, the alternative color spaces like AB, UV, and LAB performed well overall. The MV did not perform well, except where it was combined with the XYZ set. The spatial location XYZ variables may not have performed well over all of the tests, but were included in several of the highest performing bandwidths. 110 Max AUC Variable Set 0.50 RGB 0.50 XYZ-RGB 0.50 XYZ-MV-RGB 0.44 LUV 0.44 XYZ-LUV 0.44 XYZ-MV-LUV 0.40 LAB 0.36 XYZ-LAB 0.36 MV-RGB 0.34 UV 0.34 AB 0.34 XYZ-UV 0.34 XYZ-AB 0.34 MV-LUV 0.34 MV-LAB 0.34 MV-UV 0.34 MV-AB 0.34 XYZ-MV-LAB 0.34 XYZ-MV-UV 0.34 XYZ-MV-AB 0.20 XYZ 0.00 MV 0.00 XYZ-MV Avg AUC Variable Set 0.32 AB 0.32 UV 0.32 LAB 0.30 RGB 0.30 LUV 0.24 XYZ-RGB 0.22 XYZ-LAB 0.22 XYZ-AB 0.22 XYZ-LUV 0.20 XYZ-UV 0.14 MV-RGB 0.14 XYZ-MV-LAB 0.14 XYZ-MV-RGB 0.14 XYZ-MV-LUV 0.14 MV-LAB 0.14 MV-LUV 0.14 MV-AB 0.14 MV-UV 0.14 XYZ-MV-AB 0.12 XYZ-MV-UV 0.02 XYZ 0.00 MV 0.00 XYZ-MV Table 23. Archaeological Site LiDAR Max. AUC and Avg. AUC values. The segmentation surface plots reveal more clearly the characteristics of the previous results. The 6 selected images are shown in Figure 31. The top three show that the RGB, the XYZ-RGB, and the XYZ-MV-RGB achieved three groups having over 60% of their respective reference groups. They achieve the highest performance at a few bandwidths. The bottom three images show how persistent some of the variable sets were over all of the bandwidths. The thematic variables showed ability to segment the data as previously in the other case study. The difference here is that the RGB did well, probably due to the highly separable set of data representing green 111 grass. Finally, when the XYZ variables were combined with color variables, the range of effective bandwidths was reduced and the performance for limited bandwidths was increased, Figure 31. Archaeological Site LiDAR segmentation surface plot examples. The 3D perspective plots again illustrate which points were grouped together. In Figure 32, there are 6 images showing the effects of using different spatial location, spatial relationship, and thematic variable sets. The results show segmentation characteristics that were similar to previous experiments, though they offered an opportunity to discuss the methods as applied to landscape features. 112 XYZ-RGB Bandwidth =0.2 XYZ-MV-LAB Bandwidth =0.2 UV Bandwidth =0.3 LUV Bandwidth =0.4 XYZ-LUV Bandwidth =0.4 XYZ-RGB Bandwidth =0.4 Figure 32. Archaeological Site LiDAR 3D perspective plot examples. First, the XYZ-RGB separated different features into small localized patches. With a relatively small bandwidth of 0.2, small groups were created. Since the scene was divided into many groups of nearby points having similar thematic attributes, it could be considered oversegmented. The XYZ-MV-LAB variable set, having a small bandwidth of 0.2, again broke up the scene into small groups, though the MV reduced the number of points included in groups overall. The UV variable set drastically generalized the scene; the majority of points were in two groups. Without the lightness variable, the scene was segmented based upon the two dimensions of color. This was effective at separating the green grass from the brownish soil and rocks. With 113 the lightness included in the LUV variable set, the landscape was generalized into three large groups of points. The soil and rocks were mostly separated since the rocks were generally darker. However, working with only spectral values, the clustering grouped the lighter tones on top of the rocks together with the sandy soil. The XYZ-LUV and XYZ-RGB variable sets demonstrated how local proximity was helpful in separating the tops of rocks from the soil. The XYZ component did separate the grass into a several groups, though there were fewer due to a larger bandwidth of 0.4. Also, upon close inspection of the results, the separate group at the top of the grassy area was actually brownish in the original LiDAR scan indicating a small sliver of soil. This nuance in the data was identified with the aid of the cartographic generalization process implemented with the Mean Shift clustering. Ultimately, both the XYZ-LUV and the XYZ-RGB variable sets generalized the scene enough to separate different features in the landscape. Not too many groups were created and some could be combined for subsequent processing. The XYZRGB variable set appeared to have a slightly more refined output as it completely separated the rocks from the soil. Features were grouped per their spatial and thematic characteristics and the complex landscape was simplified. 114 Chapter 5 - CONCLUSIONS 5.1 Overview To conclude this thesis, I address several previously stated propositions. The main topics are organized in the following sections: two general hypotheses, five sets of experiments, contributions, and future research. 5.2 Two General Hypotheses High level conclusions drawn from my work are related to my two general hypotheses. First, this research supports the idea that groups of data representing geographic phenomena in the environment can be identified using spatial location, spatial relationship, and thematic characteristics with the Mean Shift clustering technique. Second, combinations of variable sets enhance the ability to find such phenomena compared to single variable sets. 5.3 Five Sets of Experiments The five sets of experiments listed in the Methods chapter each provided insights to the process of aggregating thematically attributed 3D data. The knowledge gained from developing controlled experiments with synthetic data guided the successful application of the methods to the real-world terrestrial LiDAR data. Notable results are listed in the following five paragraphs. The first evaluation compared the K-means and Mean Shift clustering methods. It illustrated that benefits to using the Mean Shift clustering are that it is more flexible regarding the number of output groups and had a higher overall performance. This offered initial evidence for the potential success of clustering multivariate geographic data with the Mean Shift technique. 115 The second comparison, regarding spatial resolution, showed that a potentially under-sampled low-resolution data set greatly degraded the grouping of points. However, the high-resolution data set, with considerably more points, did not improve performance compared to the baseline data set. This suggests that only a moderate amount of data (in relation to the bandwidth of evaluation) is required to find groups of points representing features in the scene. The third evaluation studying the effects of noise provided evidence that even a reasonably small amount of noise in the data can significantly degrade the segmentation results. A benefit of the noise evaluation was that it allowed for the generation of multiple synthetic realizations of the data which provided a way to evaluate the performance of different variables through statistical testing. The results showed that combinations of variable sets were found to perform higher than individual variable sets. In comparing the color variable sets in the synthetic data, the RGB had higher performance. However, this was ultimately found to be influenced by the structure of the synthetic data, which had higher separability for the RGB set as compared to the LUV and the LAB sets. The MV spatial relationship variable set was found to perform poorly with regard to the measures of the accuracy assessment, though this variable may provide other benefits for separating planar and non-planar regions. The spatial location (XYZ) and thematic (RGB) variable sets both showed high performance when included in the synthetic data evaluation. The fourth evaluation involved an alternative color configuration in the synthetic data. It resulted in spatially disjoint sets of points with similar colors being grouped together This indicated that the thematic variable set had a strong influence on the clustering results. Again, the 116 structure of the synthetic thematic data was more separable than either the spatial location or spatial relationship. Finally, the real world case studies of the Indoor and Archaeological Site LiDAR data provided the most encouraging results. They demonstrated that the variable sets, clustering techniques, and parameters can have success in grouping points representing simple objects and landscape features. Supplementing the spatial data with the appropriate thematic information allowed for features to be grouped while alleviating the effects of shadows and environmental lighting. Including spatial and thematic data improved the cartographic generalization process and aggregating real-world point data effectively simplified the scene. This ultimately resulted in a higher-level of ontological information about features in the real-world scenes. 5.4 Contributions Relating to my contributions listed in the introduction, I've reached a variety of conclusions. Though the unstructured nature of the data, without topological connectivity or relationships, presented difficulty, the 3D vector point data structure was a good match for the Mean Shift clustering technique. This is especially so considering multiple thematic attributes can be associated with the spatial locations.. The Mean Vector variable, a variable derived from local points in a spatial neighborhood provided insufficient information to fully represent neighborhood relationships. This variable only managed to help group all points representing planar regions together, with edges, corners and areas of high variability each grouped separately. It is noteworthy to mention that, although the Mean Vector degraded results for this accuracy assessment, it also shows potential for other applications. This thesis was concerned with finding points representing homogeneous regions. If, on the other hand, an aim is to find 117 edges, corners, or areas of high variability in contrast to planar regions, then the Mean Vector show potential. It could be used in edge-based segmentation processes, possibly by finding sets of points that could be used to generate vectors representing the edges of homogeneous regions. The evaluation of color spaces provided two main insights. First, this common type of thematic attribute greatly aided the grouping of points and helped separate features in close proximity with similar Mean Vector values. Second, the LUV and LAB variable sets both allowed lightness to be removed and showed success in alleviating the degrading effects of shadows and environmental lighting in the process of segmenting real-world data. However, an important note is that when the segmentation used only the UV and AB color variables for handling shadows, white and black features were indistinguishable as they vary primarily in lightness. Generally, the LUV and LAB color spaces can help to separate colors into separate clusters. However, the RGB variable set effectively segmented the Archaeological Site LiDAR scan since the grass was particularly greener than the rest of the scene. The Mean Shift clustering technique provided a flexible and generally successful method for clustering similar points in a multi-variate data set. This technique, used often in the discipline of Computer Vision, showed potential for segmenting real-world geographic data having three spatial dimensions and thematic attributes. A general guideline regarding the Mean Shift process is that the bandwidth distance of evaluation should be carefully selected to find different scales of features. This will guide the process in finding different sizes of clusters in variable space. If smaller bandwidths are used, smaller clusters may be found as the bandwidth will not bridge a 118 large gap between clusters. If bigger bandwidths are used, the results will eventually show one large, under-segmented, cluster. The segmentation performance accuracy assessment included elements from traditional classification assessments and image analysis disciplines. These elements had previously been applied to range image segmentation research and, as such, were suitable for the terrestrial LiDAR data in this investigation. They could be very beneficial for geographic data analyses in general. My extensions of the previous accuracy assessment framework were key to fully evaluating the numerous variables, methods, and parameters. The area under the curve indices condensed the information from the segmentation performance curves. The Max. AUC values provided a single metric for determining the highest performing Mean Shift bandwidths. A second metric, the Avg. AUC, provided a measure of a variable set's persistence in performance throughout all bandwidths. The Avg. AUC also provided an overall performance measure for the statistical analyses, helping to identify the most effective variable set combinations. The segmentation performance surface plots were complementary to the AUC metrics and particularly helpful for understanding the various levels of performance for each variable set through all of the Mean Shift bandwidths. Finally, the 3D perspective plots, common for visual evaluation, revealed the which points were grouped in relation to the features they represent. 5.5 Contributions to Theory As a contribution to theory, I combined elements of two geographical concepts, cartographic generalization and geographic ontology, formalizing a particular process and method of 119 geographic representation. Two specific theories were discussed, the cartographic generalization of McMaster and Shea (1992), and an ontological hierarchy from Couclelis (2009). These two theories, though they came from separate threads of geographic inquiry, are very parallel in relation to this investigation. Both are related to geographic representation of features in the landscape and both present processes that similarly combine information to better understand those features. The generalization process of aggregation was a transformation that grouped 3D points into representations of areal regions having similar spatial and thematic characteristics (McMaster & Shea, 1992). The ontological hierarchy also developed simple objects from spatially connected, homogeneous points with similar (thematic) observables (Couclelis, 2009). These two theories ease interpretation of complex data sets by similarly developing groups of points that represent recognizable features in the landscape. To identify features in the landscape, a well-defined ontology is necessary. Certain properties and relationships must be understood, clarified, and explicitly modeled with the data. An example from this thesis that could be improved is the interaction between the Mean Vector and the accuracy assessment. While the Mean Vector may have fulfilled the purpose of separating planar regions from non-planar regions and is potentially helpful for other applications, it reduced the number of points per region and reduced the performance with respect to the accuracy assessment. Another example of refining an ontological model in this thesis relates to the relationship between the distribution of spatial or thematic data values, the level of precision, and the Mean Shift bandwidth of evaluations. The bandwidth distance should be selected to guide the 120 clustering process in finding certain sizes of clusters. For instance, small features may be found with small bandwidths. This could apply to either the spatial or thematic data. This also relates to the level of precision of data collection. The ability to find features is dependent upon the number of data points collected. If more data points were used to represent the blue cup, yellow ball, or yellow ducks in the Indoor data set, they would have composed a sufficiently dense cluster in variable space to be grouped by the Mean Shift. The red toolbox and blue pail show that increasing spatial resolution relative to the size of the feature can improve clustering performance. Collecting a sufficient amount of data to identify a feature could be an important component of the domain ontology. The selection of variables that relate to particular characteristics can be guided by their distribution of values. For instance, the Archaeological Site scan benefitted from the distribution of green values, resulting in segmentation of the grassy areas. It didn't have as many colors or shadows as the Indoor scan, which had better segmentation with the alternative color spaces. A specific definition of ontology of characteristics and relationships for segmenting desired features is evidently important in cartographically generalizing thematically attributed 3D data. 5.6 Future Research Considering future directions from this investigation, three main topics that come to mind. First, this thesis extended an evaluation of experimental methods to real world LiDAR data. Applying the methods to a larger extent would allow for the selection of points representing larger geographic features. A mosaic of terrestrial LiDAR scans or a data generated through fusion of aerial LiDAR and photographs could be used. A second topic would be to incorporate spectral reflectance with more variation or in different bandwidths, such as the near infrared or hyperspectral data. The third main topic for future investigation would be to conceptualize and 121 employ a better 3D neighborhood relationship metric than the Mean Vector; it only separated planar regions from other configurations. A more complete metric would identify different planar orientations or possibly address regions of similar curvature. Overall, this thesis demonstrated how diverse techniques and theories from other disciplines can be integrated with those of Geography and in the process developed new approaches for indentifying geographic features represented by richly attributed 3D data. Applications that this research could potentially benefit are varied. Spatial and thematic data representing geographic features in natural or urban settings could be segmented. The methods may help to select and quantify vegetation. Models of under-story vegetation could complement canopy information. Landforms such as dunes, outcrops, and caves could be segmented from high resolution data. Data relating to different constituents of soils, water bodies, or the atmospheric could be mapped in 3D. The potential for studying objects at different scales, such as atomic, biophysical, or astronomic, and the movement of such objects could benefit from 3D feature segmentation. Finally, this study can provide information for basic research in 3D spatial analysis and uncertainty which aid cartographic representation of regions and their boundaries. 122 REFERENCES CITED 123 REFERENCES CITED Agarwal, P. (2005). Ontological considerations in GIScience. International Journal of Geographic Information Science , 19 (5), 501-536. Ball, G. H., & Hall, D. J. (1967). A clustering technique for summarizing multivariate data. Systems Research and Behavioral Science , 12 (2), 153-155. Barnea, S., & Filin, S. (2008). Segmentation Of Terrestrial Laser Scanning Data By Integrating Range And Image Content. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XXXVII, Part B5, pp. 745-750. Beijing. Comaniciu, D., & Meer, P. (2002). Mean Shift: A Robust Approach Toward Feature Space Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence , 34 (5), 603619. Congalton, R. G. (1991). A Review of Assessing the Accuracy of Classifications of Remotely Sensed Data. Remote Sensing of the Environment , 37, 34-46. Couclelis, H. (2009). Ontology, Epistemology, Teleology: Triangulating Geographic Information Science. In Research Trends in Geographic Information Science (pp. 3-15). Berlin: Springer Verlag Lecture Notes in Geoinformation and Cartography. Couclelis, H. (1992). People Manipulate Objects (but Cultivate Fields)-Beyond the RasterVector Debate in GIS. In A. U. Frank, I. Campari, & U. Formentini (Eds.), Theories and Methods in Spatio-temporal Reasoning in Geographic Space (Vol. 639, pp. 65-77). New York: Springer Verlag Lecture Notes in Computer Science. Foody, G. M. (2002). Status of Land Cover Classification Accuracy Assessment. Remote Sensing of Environment , 80, 185-201. Frank, A. U. (2000). Geographic Information Science: New methods and technology. Journal of Geographical Systems , 2, 99-105. Georgescu, B., Shimshoni, I., & Meer, P. (2003). Mean shift based clustering in high dimensions: A texture classification example. Ninth IEEE International Conference on Computer Vision (pp. 456-463). Nice, France: IEEE Computer Society. Goodchild. (1990). Geographical Information Science. Keynote Address. Zurich: Fourth International Symposium on Spatial Data Handling. Goodchild, M. F., & Wang, M. -H. (1989). Modeling error for remotely sensed data input to GIS. Proceedings, AutoCarto 9 (pp. 530-537). Falls Church, VA: ASPRS/ACSM. 124 Goodchild, M. F., Yuan, M., & Cova, T. J. (2007). Towards a general theory of geographic representation in GIS. International Journal of Geographical Information Science , 21 (3), 239-260. Guarino, N. (1998). Formal Ontology and Information Systems. Formal Ontology and Information Systems (FOIS) '98 (pp. 3-15). Trento, Italy: IOS Press. Haralick, R. M., & Shapiro, L. G. (1985). Image Segmentation Techniques. Computer Vision, Graphics, and Image Processing , 29 (1), 100-132. Hartigan, J. A., & Wong, M. A. (1979). Algorithm AS 136: A K-Means Clustering Algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics) , 28 (1), 100-108. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning. New York, NY: Springer. Hoover, A., Jean-Baptiste, G., Jiang, X., Flynn, P. J., Bunke, H., Goldgof, D. B., et al. (1996). An Experimental Comparison of Range Image S eg m e n t at i o n AI go r i t h m s. IEEE Transactions on Pattern Analysis and Machine Intelligence , 18 (7), 673-689. Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data Clustering: A Review. ACM Computing Surveys , 31 (3), 264-323. Keates, J. S. (1973). Cartographic Design and Production. New York: John Wiley & Sons, Inc. Ledoux, H., & Gold, C. M. (2008). Modelling Three-Dimensional Geoscientific Fields with the Voronoi Diagram and its Dual. International Journal of Geographical Information Science , 22 (5), 547-574. Lee, I.-C., Wu, B., & Li, R. (2009). Shoreline Extraction from the Integration of Lidar Point Cloud Data and Aerial Orthophotos using Mean Shift Segmentation. American Society of Photogrammetry and Remote Sensing Annual Conference. Baltimore, Maryland. MacQueen, J. B. (1967). Some Methods for classification and Analysis of Multivariate Observations. Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability (pp. 281–297). Berkeley: University of California Press. McMaster, R. B., & Shea, K. S. (1992). Generalization in Digital Cartography. Washington D.C.: Association of American Geographers. Ortiz, A., & Oliver, G. (2006). On the Use of the Overlapping Area Matrix for Image Segmentation Evaluation: A Survey and New Performance Measures. Pattern Recognition Letters , 27, 1916-1926. Paris, S., & Durand, F. (2007). A Topological Approach to Hierarchical Segmentation using Mean Shift. Proceedings of the IEEE conference on Computer Vision and Pattern Recognition . Minneapolis, Minnesota: IEEE . 125 Pequet, D. (1988). Representations of Geographic Space: Toward a Conceptual Synthesis. Annals of the Association of American Geographers , 78 (3), 375-394. R Development Core Team. (2009). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Schwering, A. (2008). Approaches to Semantic Similarity Measurement for Geo-Spatial Data: A Survey. Transactions in GIS , 12 (1), 5-29. Shimshoni, I., Georgescu, B., & Meer, P. (2006). Adaptive Mean Shift Based Clustering in High Dimensions. In G. Shakhnarovich, T. Darrell, & P. Indyk (Eds.), Nearest-Neighbor Methods in Learning and Vision: Theory and Practice (pp. 203-220). Cambridge, Massachusetts: MIT Press. Tou, J. T., & Gonzalez, R. C. (1974). Pattern Recognition Principles. Reading, Massachusetts: Addison-Wesley. Weibel, R. (1997). Generalizations of Spatial Data: Principles and Selected Algorithms. In M. van Kreveld, J. Nievergelt, T. Roos, & P. Widmayer (Eds.), Algorithmic Foundations of Geographic Information Systems (Vol. 1340, pp. 99-152). New York: Springer Verlag Lecture Notes in Computer Science. Xu, R., & Donald Wunsch, I. (2009). Clustering. Hoboken, New Jersey: John Wiley & Sons, Inc. Yao, W., Hinz, S., & Stilla, U. (2009). Object Extraction Based on 3D-Segmentation of Lidar Data by Combining Mean Shift with Normalized Cuts: Two Examples from Urban Areas. Urban Remote Sensing Joint Event. Shanghai, China: IEEE Geoscience and Remote Sensing Society. Zhang, Y. J. (1996). A Survey on Evaluation Methods for Image Segmentation. Pattern Recognintion , 29 (8), 1335-1346. Zheng, L., Zhang, J., & Wang, Q. (2009). Mean-Shift-Based Color Segmentation of Images. Computers and Electronics in Agriculture , 65, 93-98. 126