ANALYSIS OF GEOBIA ALGORITHMS FOR CONTEXTUAL DETECTION OF DPRK MISSILE TESTING FACILITIES

By

Connor Alec Plensdorf

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Geography—Master of Science

2019

ABSTRACT

ANALYSIS OF GEOBIA ALGORITHMS FOR CONTEXTUAL DETECTION OF DPRK MISSILE TESTING FACILITIES

By

Connor Alec Plensdorf

Remote sensing provides an alternative, otherwise unattainable view for analyzing the earth. Military and intelligence analysts quickly adopted this technology for tactical and strategic applications, and these interpreters now require increasingly immediate, accurate image analysis to support decision-making in today's dynamic military environment. Geographic Object-Based Image Analysis (GEOBIA) provides a means for automated image interpretation modeled after expert interpretation processes. Although the system's flexibility is advantageous for creating comprehensive image classifications, that same flexibility may preclude full automation and replication. The goal of this research was to improve image classification outcomes in the context of missile site detection. Here a GEOBIA workflow was developed that incorporates expert human knowledge for the detection of DPRK missile testing facilities. After conducting the analyses, I determined that the best-fitting parameters among those tested were the rule-based classification for the Sohae testing facility and the random forest classification for Yongbyon, with no conclusive results in favor of either software. The results indicate that expert human knowledge does not necessarily improve classification accuracy for these study sites.

Key Words: GEOBIA, image classification, contextual analysis, situation awareness, DPRK

This thesis is dedicated to the family and friends who have wholeheartedly supported me throughout this project and in past, present, and future endeavors.

ACKNOWLEDGEMENTS

I want to thank my graduate advisor, Dr. Raechel White, for guiding me along this research project. I would also like to thank the remainder of my graduate committee, Dr. Kyle Evered and Dr. Ashton Shortridge, for their support and effort towards this project. I also extend further gratitude to the Nuclear Threat Initiative researchers and the 38 North agency and its associated image interpreters.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1. Introduction
2. Background
   2.1. Remote Sensing for Strategic Operations
   2.2. The DPRK Missile Program
   2.3. The DPRK Missile Development Monitoring
   2.4. Rule-Based Image Object Detection
   2.5. Knowledge Incorporation into GEOBIA Applications
   2.6. Feature Extraction
   2.7. Objectivity
   2.8. Workflow Reusability
   2.9. Contributions of the Present Research
3. Methods
   3.1. Study Sites
   3.2. Data
   3.3. Analysis 1 - Extraction of Expert Interpretation Cues
   3.4. Analysis 2 - Classification Methods Best for Missile Facility Extraction
        3.4.1. Rule-Based Classification in eCognition
        3.4.2. Nearest Neighbor Classification in eCognition
        3.4.3. Random Forest Classification in eCognition
   3.5. Analysis 3 - Comparison of Classification Software
   3.6. Analysis 4 - Comparison of Spatial Resolutions
   3.7. Post-classification Accuracy Assessments
4. Results
   4.1. Knowledge Incorporation Comparison
        4.1.1. Rule-Based Classification without Knowledge in eCognition
        4.1.2. Rule-Based Classification with Knowledge in eCognition
        4.1.3. Nearest Neighbor Classification without Knowledge in eCognition
        4.1.4. Nearest Neighbor Classification with Knowledge in eCognition
        4.1.5. Random Forest Classification without Knowledge in eCognition
        4.1.6. Random Forest Classification with Knowledge in eCognition
   4.2. Software Comparison
        4.2.1. Random Forest Classification in eCognition
        4.2.2. Random Forest Classification in R
   4.3. Spatial Resolution Comparison
        4.3.1. Rule-Based Classification 1m
        4.3.2. Rule-Based Classification 3m
        4.3.3. Nearest Neighbor Classification 1m
        4.3.4. Nearest Neighbor Classification 3m
        4.3.5. Random Forest Classification 1m
        4.3.6. Random Forest Classification 3m
        4.3.7. Overall Results
5. Discussion
   5.1. Limitations
   5.2. Delimitations
   5.3. Analysis 1 - Knowledge Incorporation Comparison
   5.4. Analysis 2 - Software Comparison
   5.5. Analysis 3 - Spatial Resolution Comparison
   5.6. Developments on Present Research
6. Conclusion
APPENDICES
   APPENDIX A: Abbreviations
   APPENDIX B: Sohae Ruleset
   APPENDIX C: Yongbyon Ruleset
   APPENDIX D: Palisades Nuclear Energy Facility Ruleset
REFERENCES

LIST OF TABLES

Table 1: Objectives of the Study
Table 2: Concordance subset for the knowledge base
Table 3: Classifications used for analysis per feature space
Table 4: Features that were used for knowledge incorporation
Table 5: Number of samples per class for each DPRK image
Table 6: Knowledge Incorporation comparison accuracy assessment
Table 7: Knowledge Incorporation Analysis Processing Times
Table 8: Model 1 Accuracy Assessment
Table 9: Model 2 Accuracy Assessment
Table 10: Model 2 variables with greater than 2 significance
Table 11: Model 3 Accuracy Assessment
Table 12: Rule-based, 1m Accuracy Assessment
Table 13: Rule-based, 3m Accuracy Assessment
Table 14: Nearest Neighbor, 1m Accuracy Assessment
Table 15: Nearest Neighbor, 3m Accuracy Assessment
Table 16: Random Forest, 1m Accuracy Assessment
Table 17: Random Forest, 3m Accuracy Assessment
Table 18: Spatial Resolution Analysis Classification Processing Times

LIST OF FIGURES

Figure 1: DPRK Missile Ranges. Source: https://www.dw.com/en/which-us-cities-could-north-koreas-ballistic-missile-hit/a-39881831
Figure 2: Number of expert interpretations from 38 North per missile testing site in this study
Figure 3: Sohae Image. Source: Planet Labs
Figure 4: Yongbyon Image. Source: Planet Labs
Figure 5: Palisades nuclear power plant, 1m resolution (left) and 3m resolution (right). Source: USGS
Figure 6: Sohae samples used for supervised and machine learning classifications
Figure 7: Yongbyon samples used for supervised and machine learning classifications
Figure 8: Nuclear power plant (top) and airfield (bottom). 1m images on left and 3m images on right
Figure 9: Samples used for the spatial resolution comparison classifications
Figure 10: Rule-based no knowledge classification, Sohae
Figure 11: Rule-based no knowledge classification, Yongbyon
Figure 12: Rule-based with knowledge classification, Sohae
Figure 13: Rule-based with knowledge classification, Yongbyon
Figure 14: NN no knowledge classification, Sohae
Figure 15: NN no knowledge classification, Yongbyon
Figure 16: NN with knowledge classification, Sohae
Figure 17: NN with knowledge classification, Yongbyon
Figure 18: RF no knowledge classification, Sohae
Figure 19: RF no knowledge classification, Yongbyon
Figure 20: RF with knowledge classification, Sohae
Figure 21: RF with knowledge classification, Yongbyon
Figure 22: Prediction using Model 1 (z-value of pixel)
Figure 23: Prediction using RF Model 2 (RFsp with z)
Figure 24: Prediction using RF Model 3 (RFsp, no z)
Figure 25: Rule-based classification, 1m
Figure 26: Rule-based classification, 3m
Figure 27: NN classification, 1m
Figure 28: NN classification, 3m
Figure 29: RF classification, 1m
Figure 30: RF classification, 3m

1. Introduction

Despite its benefits of timely and comprehensive visualization of the earth's surface, satellite imagery may not satisfy consumer needs entirely. Military and intelligence consumers often require near real-time analysis for proper tactical or strategic decision-making (Cloud and Clarke 1999). Military image analysts undertake tedious, time-consuming processes of manually analyzing images. As late as 1997, public image analysts manually analyzed imagery to detect and confirm the location of the 1974 Indian nuclear weapons test (Gupta and Pabian 1997). The 1990s declassification of CORONA satellite imagery opened up dialogues between military and public analysts, allowing greater information sharing between the domains (Cloud and Clarke 1999).

The military has used remote sensing technologies for observation and intelligence gathering for over 100 years. During the American Civil War and the 1849 bombardment of Venice, Italy, unmanned hot air balloons conducted aerial reconnaissance for military operations (Watts, Kobziar, and Percival 2009). By World War II, the world's militaries used aerial photography for situation awareness and damage assessment.
Greater advances in military technology and increasing demands, such as the need for increased visibility and range, led to the development of more modern remote sensing technologies (Perkins and Dodge 2009). The Cold War drove developments in remote sensing, particularly satellite imagery, which has since dominated national security applications. The CORONA satellite imagery program, the first US spy satellite imagery program, provided strategic means for decision-making against the Soviet Union throughout the Cold War (Cloud and Clarke 1999).

Building on the satellite imaging technologies initiated in the CORONA program, the military has since conducted several operations concerning developments in imagery gathering. Much of the military use today involves the detection of hazardous entities and other forms of intelligence gathering. For instance, military personnel have used hyperspectral imagery to detect vehicles under vegetation canopies (Shippert n.d.). Satellite imagery is also used to assess military success and progress. One example is the 2007 surge, in which the U.S. military deployed over 30,000 personnel to Baghdad, Iraq; analysts then assessed the surge's effectiveness in swaying the tide of the political and social effort of nation-building and reconstruction. Accordingly, the military used satellite imagery from the Defense Meteorological Satellite Program (DMSP) to detect changes in city lights visible from space before, during, and after the surge. Researchers reasoned that the presence of visible light would likely indicate an increase in infrastructure, whereas darkness, or an absence of visible light, would likely indicate a decrease in infrastructure or its absence altogether (Agnew et al. 2008).
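The DMSP approach just described amounts to differencing brightness composites from before and after an event. The following is a minimal sketch of that idea in R using the terra package; the file names and the difference threshold are hypothetical, not the data or parameters of Agnew et al. (2008).

    # Minimal sketch of night-lights change detection: difference two
    # brightness composites and flag large changes. File names are hypothetical.
    library(terra)

    before <- rast("lights_before_surge.tif")  # hypothetical pre-event composite
    after  <- rast("lights_after_surge.tif")   # hypothetical post-event composite

    change <- after - before  # per-pixel brightness difference

    # Positive changes beyond a threshold suggest added lighting (possible
    # infrastructure gains); strongly negative changes suggest losses.
    threshold <- 5
    gain <- change > threshold
    loss <- change < -threshold

    plot(c(gain, loss), main = c("Brightness gain", "Brightness loss"))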
Details about the methods of image analysis used by the military remain classified. The increasing availability of commercial satellite imagery, as well as commercial initiatives for declassified imagery, such as John Pike's Public Eye, has improved public awareness of military remote sensing. Public Eye, an initiative of Globalsecurity.org, addresses issues of national intelligence and security using declassified CORONA imagery and aerial U-2 imagery (Perkins and Dodge 2009). Widespread use of remote sensing imagery by the media has created an increasing demand for open source imagery by the public. With programs such as Public Eye and DigitalGlobe, a commercial producer of QuickBird imagery, high-resolution satellite imagery is becoming increasingly open and public.

Geographic object-based image analysis (GEOBIA) use has risen due to its capability to improve analytical output in the face of increasing image analysis demands. GEOBIA challenges and builds upon the largely standardized pixel-based methods, integrating computer vision and pattern recognition processes with traditional earth observation workflows (Blaschke et al. 2014). Its emergence likely stems from recent historical events, such as the 1990s changes in U.S. space policy (Hitchings 2003) and the development of more powerful computers and processing technologies (Hay and Castilla 2008). GEOBIA is an image analysis method centering on the patterns created by pixels rather than the pixels themselves, and it began to constitute a potential paradigm shift in the early 2000s (Blaschke et al. 2014). Patterns created by these pixels are designated as image objects, which are items of interest in the image (Blaschke et al. 2014).

By focusing on image objects, GEOBIA mimics the way humans interpret images, instead of focusing on individual pixels often unseen by the naked eye (Hay and Castilla 2008). The underlying issue is that pixels are not features of an image and are thereby subject to internal homogeneity or heterogeneity, in which they capture only parts of features or are located on feature edges, respectively. By grouping via segmentation, pixels of similar spectral value may be logically merged into recognizable features. Hay and Castilla define GEOBIA as a sub-discipline of geographic information science (GIScience) that automates segmentation of images and evaluates the spatial and spectral characteristics of the image objects created; results may be used in a GIS-ready format (Hay and Castilla 2008).

GEOBIA begins by dividing entire images into candidate image objects through segmentation. Segmentation is the process of dividing the image into image objects of spectrally similar pixel groups; classification, on the other hand, is the process of assigning these image objects classes to represent a given analysis. Pre-built algorithms, such as multiresolution segmentation and chessboard segmentation, are used for this segmentation, after which the analyst creates objects from segmented images with trial-and-error rule building (Belgiu, Hofer and Hofmann 2014). During GEOBIA's second phase, the image is classified into user-designated classes based on classification rules. This step permits users to draw from their knowledge of the landscape and subject matter to create threshold-based rules that assign specific classes across the image (Belgiu, Hofer and Hofmann 2014). For instance, buildings could be classified across an image based on the brightness values of the objects representing them. With GEOBIA, users obtain intelligence from the interpretation through a less subjective, less labor- and time-intensive process (Hay and Castilla 2008).
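As a concrete illustration of the threshold-based classification step just described, the sketch below labels image objects in R from a hypothetical table of per-object statistics, such as one exported after segmentation; the column names and threshold values are invented for illustration and are not the rules developed in this thesis.

    # Minimal sketch of a rule-based object classification, assuming
    # segmentation has already produced one row of statistics per image object.
    # Column names and thresholds are hypothetical.
    objects <- data.frame(
      object_id      = 1:5,
      brightness     = c(182, 95, 210, 60, 150),   # mean spectral brightness
      rectangularity = c(0.91, 0.35, 0.88, 0.20, 0.74)
    )

    # Rule: bright, highly rectangular objects are labeled as buildings.
    objects$class <- ifelse(
      objects$brightness > 140 & objects$rectangularity > 0.7,
      "building", "other"
    )
    print(objects)

In eCognition, comparable logic is expressed as threshold or membership conditions over object features within a rule set rather than as table operations.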
Though the military has likely begun utilizing GEOBIA for analysis, detailed documentation of its processes remains classified for security purposes. Military applications of GEOBIA have developed noticeably over time. In the late 1990s, military analysts used simple, timely visual interpretation in the detection of the 1974 Indian nuclear weapons test (Gupta and Pabian 1997). Recently, however, researchers adopted this "new paradigm" of GEOBIA to monitor specific sites of interest, namely for weapons treaty verification (Niemeyer and Nussbaum n.d.). These applications search for activity within facilities to identify suspicious activity suggesting weapons development. Given the recent Joint Comprehensive Plan of Action (JCPOA) in Iran regarding nuclear energy, GEOBIA research has focused on detection of suspicious activities at suspect nuclear sites.

Another country that requires remote sensing-based monitoring of nuclear activity is the Democratic People's Republic of Korea (DPRK). Few unclassified studies, however, aim to use GEOBIA in the DPRK to monitor its known missile testing facilities. A similar GEOBIA approach to treaty verification could be used in the DPRK to observe known missile site activities, which would support the intelligence community (IC) in its monitoring for further developments in the DPRK missile program or for preparations towards another missile launch or test.

While the GEOBIA process inherently simulates human interpretation, it is unlikely that a single human can interpret every individual feature (image attributes such as spectral values, geometric parameters, etc.). A human interpreter is likely to focus on a handful of these features to distinguish a house from a tree, since he or she is cognitively incapable of utilizing the plethora of data available for a given group of pixels. To better mimic human interpretation and improve classification accuracy, GEOBIA may limit its scope exclusively to those properties that the interpreter uses in his or her analysis.

This study aims to discover whether using interpreter knowledge to inform classification improves classification accuracy in the case of military feature identification. To accomplish this, I compare the accuracies of classifying an image with exclusively human-detectable features against classifying with all available features. The study additionally examines images of the Palisades Nuclear Energy Facility in Michigan, US, as a spatial resolution comparison. This study uses the case of the DPRK's missile testing facilities for classification. The DPRK region is chosen in support of national security initiatives and to advance research towards monitoring of weapons of mass destruction (WMD). This study was conducted with the objectives in Table 1.

Objective 1 intends to gain an understanding of the human interpretation process and the visual cues used for the analysis of images of missile test sites. To accomplish this objective, I needed to discover interpreters who write about their interpretations, particularly those who focus on the DPRK and its missile testing. I then extracted the interpretation elements used for the detection of each feature in the text. Further details regarding the process are provided in Section 3. I predicted that the contextual information from interpreters would focus largely on shape and texture elements and features, such as rectangularity and coarseness, translatable into a GEOBIA format. This hypothesis is based on the notion that much of the visual interpretation of images involves the recognition of these interpretation elements, which would accordingly apply to adding context to image analysis.

Table 1: Objectives of the Study.
Objective 1: Obtain human interpreter knowledge.
Objective 2: Determine the features indicative of missile testing.
Objective 3: Segment and classify each of the images using the three classification methods and compare the accuracies with and without utilizing interpreter-used features.
Objective 4: Compare classification accuracies of multiple classification software.
Objective 5: Determine whether improving spatial resolution improves classification accuracies of the three classification algorithms.

Objective 2 aims to pinpoint the objects of focus for detection and for assessment of the methods in Objective 3. Using the same interpretation rules from the content analysis conducted in Objective 1, I extracted the artificial features in the images that the interpreters highlighted as indicative of a missile testing facility or one which supports missile testing. I predicted that the features interpreters focus on would largely highlight different types of single-purpose buildings.

Objective 3 compares two cases. In the first, only image features that were identified in Objective 1 were used in classification. In the second case, a full set of image features was used.
I developed segmentations and classifications for each of the sites used in the analysis. For comparison, this step uses three methods of classification to determine whether any combination of classification method and feature extraction is more accurate than the others. The methods of classification include rule-based, nearest neighbor, and random forest classification. I implemented all three classifications with both the complete feature selection and the interpreter-derived feature extraction achieved in Objective 1. An accuracy assessment was completed for each of the six classifications for comparison in detecting the objects discovered in Objective 2. I predicted that the use of the human interpreter elements would improve detection accuracies across all three classification methods, with the greatest increase in accuracy being in the random forest classification. I predicted that the random forest classifier would perform the best due to previous studies on the topic, discussed in the literature review of this thesis, as well as its inherent approach to the classification: the use of training data and decision trees may better utilize different features for classification than the other methods.

Objective 4 compares two different types of image classification software. In each software package, I conduct a random forest classification on the same study site using the same knowledge-based reduced feature set. The first software used is eCognition, which remains the primary software of this study, and the second is R (Trimble 2019; The R Foundation 2019). Accuracy assessments were completed for each software package and are compared for the classification of buildings in the respective image from Objective 2. I predicted that there would not be a significant difference in accuracies between the two software types with this method of image classification.

Objective 5 conducts an analysis similar to that in Objective 3 but instead compares the classification accuracies of two different spatial resolutions. The comparison applies all three classification algorithms used in Objective 3 to each of the two spatial resolutions of the same image. I did not use the same study site for this comparison as in the previous steps, but I did retain the parameters from the previous steps to isolate strictly the spatial resolution comparison. An accuracy assessment was conducted for each classification at each spatial resolution, from which I could determine whether or not finer resolutions yield more accurate classification results with these algorithms. The accuracy assessments for this objective evaluated the classification of all classes in the scene rather than solely the building class, as in the previous objectives.
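The accuracy assessments used throughout Objectives 3-5 reduce to a confusion matrix comparing predicted and reference labels. The sketch below, in R with invented labels rather than the thesis data, shows how overall accuracy and Cohen's kappa fall out of that matrix.

    # Minimal sketch of a post-classification accuracy assessment.
    # The label vectors are illustrative, not the thesis data.
    reference <- factor(c("building", "building", "road", "vegetation",
                          "building", "road", "vegetation", "road"))
    predicted <- factor(c("building", "road", "road", "vegetation",
                          "building", "road", "building", "road"))

    cm <- table(Predicted = predicted, Reference = reference)  # confusion matrix
    print(cm)

    overall_accuracy <- sum(diag(cm)) / sum(cm)

    # Cohen's kappa: agreement beyond what chance alone would produce.
    expected <- sum(rowSums(cm) * colSums(cm)) / sum(cm)^2
    kappa <- (overall_accuracy - expected) / (1 - expected)

    cat("Overall accuracy:", round(overall_accuracy, 3),
        "| Kappa:", round(kappa, 3), "\n")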
The remainder of this document provides details, literature background, and results of the study. The following section details current and past research in GEOBIA, the incorporation of human cognition into GEOBIA, and the use of GEOBIA in cases concerning the DPRK. Section 3 discusses the data used in the study and the processes used to conduct the project in GEOBIA. Section 4 discusses the accuracy assessments produced in the project and all results obtained from the study. Section 5 discusses the results in the context of military intelligence, their potential implications, and the limitations and delimitations of the project as a whole. Lastly, Section 6 concludes the project discussion, describes its contribution to future research, and suggests how a similar project may improve upon these results. All acronyms and abbreviations used throughout this study may be viewed in Appendix A.

2. Background

This study draws from history, geography, and Geographic Information Science (GISci). To understand the basis of the research, it is necessary to explain the context in which similar research has developed. Since the research maintains a military focus, I will first discuss military remote sensing, particularly for strategic operations. The studies and instances of remote sensing in this context involve both military-led and civilian-led research. Given this study's focus on the DPRK, I describe the history of the nation's high-profile missile program, its present capabilities, and the research completed regarding the monitoring of this specific missile program. These details are necessary for understanding the pressing importance of the DPRK to US politics and for gaining perspective on the nation's motivations and testing capabilities.

Next, I review research concerning the application and development of GEOBIA. This review exhibits the common methods of analysis in prominent GEOBIA studies, which contributed to decisions made in Section 3 of this study. I focus on rule-based classification methods to explain methods and applications common for this type of classification. I also review knowledge incorporation for GEOBIA. To provide further reasoning for the methods, I discuss trends in feature extraction and feature space reduction, objectivity in knowledge-based classifications, and potential workflow reusability. Following a discussion of the past and present research in these relevant areas, I discuss the present research gaps and how this particular study may contribute towards filling them.

2.1. Remote Sensing for Strategic Operations

The US and foreign militaries use GIS and remote sensing for a variety of applications in support of their operations. According to Witmer (2015), military operations were among the first applications of remote sensing in violent conflict settings. Military remote sensing in the US began with the use of the hot air balloon to observe the battlefield in the American Civil War (Witmer 2015). These technologies allow observation of the battlefield from above and provide spatial information for military commanders, permitting these leaders and decision-makers to plan effectively from an aerial point of view (Satyanarayana and Yogendran 2013). Remote sensing technology has become increasingly popular with the US military, largely in the form of unmanned aerial vehicles (UAVs), since they do not put any lives directly at risk for the sake of reconnaissance.

Remote sensing research in support of strategic military operations takes the form of either digital mapping or UAV reconnaissance. Glade (2000) evaluates the use of UAVs in a variety of military applications, including transportation, intelligence and surveillance, attack missions, and combat support missions. For surveillance, the military has used UAV technology for remote sensing reconnaissance since it remains relatively difficult to detect by those being observed. UAVs have also been used to remotely detect chemical and biological weapons autonomously (Glade 2000). These platforms have additionally been preferred by the military due to their ability to broadcast live information for long periods.
Their use also reduces the need to expose military personnel to the fatigue and stress that piloted flights cause.

Remote sensing in recent US conflicts has focused largely on the observation of urban environments, due to the military presence in both Iraq and Afghanistan. While urban environments are a major focus of remote sensing for current military operations, their difference from other environments and their large variability (cities and towns vary in construction techniques around the world) require smaller and less durable remote sensing platforms for operations (Samad, Bay, and Godbole 2007). The use of UAVs for surveillance supports the military's situation awareness (SA) for understanding complex environments, such as the urban battlefield of today's conflicts (Samad, Bay, and Godbole 2007). Although UAVs and remote sensing are used for urban reconnaissance, military leaders also use these surveillance capabilities for detection outside the urban environment. Commanders use this view of the terrain to maneuver troops, materials, or vehicles, and to develop maps of the terrain for optimal resource utilization and decision-making for missions (Satyanarayana and Yogendran 2013). Due to the variety of environments, no broad ruleset exists for either military or civilian use in support of military operations or civilian research (Witmer 2015).

Military operations also use space-borne remote sensing platforms, such as satellites. The search for Osama bin Laden, the mastermind behind the New York World Trade Center attacks in 2001, prompted the combination of Landsat 5 Thematic Mapper imagery and cultural geography to search for terrorist groups in the Zhawar Kili region of Afghanistan. This study resulted in the detection of terrorist posts containing terrorist-led convoys and potentially high-value al-Qaeda targets (Beck 2003). Though military leaders use them in support of planning operations, space-borne platforms are not ideal for real-time military operations, due to revisit times over the same regions and the need for very high resolution (VHR) imagery, which is tougher to attain from space-borne platforms than from a UAV sensor (Witmer 2015). Maathuis (2003) accordingly ruled out the use of satellite imagery to detect individual landmines, due to the need for a much higher spatial resolution (in centimeters) than that achievable by airborne or spaceborne imagery. Due to the inaccessibility of such high-resolution data, that study instead used Landsat imagery to screen entire scenes for the likely presence of minefields over a region (Maathuis 2003).

Military remote sensing also has a variety of targets. Witmer (2015) addresses the various methods and types of remote sensing used for detecting and analyzing the effects of violent conflicts, including wars and genocide. Beck (2003) uses 30m Landsat imagery in combination with GIS and cultural geography to aid intelligence analysts in searching for terrorists hidden in mountains and caves in Afghanistan, while Maathuis (2003) used SPOT XS and Landsat Thematic Mapper imagery to detect minefields in three different regions of Zimbabwe to support civilian or military landmine clearance. All three of these studies use publicly available information in their military-oriented applications.

2.2. The DPRK Missile Program

Some nations maintain the goal of secrecy, revealing as little as possible to the world outside their borders.
Recently, the DPRK has emerged in world news due to this secrecy and its perplexing behavior on the global scale, particularly regarding its missile program. The DPRK poses a major threat to the US, making remote sensing developments concerning its missile testing sites of particular interest.

The DPRK's ambitions for strategic weaponry rose almost immediately after its inception in 1950. Chinese and Soviet powers assisted the DPRK in achieving these aspirations. Between 1968 and 1969, the Union of Soviet Socialist Republics (USSR) provided a sample of its S-2 Sopka missiles to the DPRK for coastal defense (Sachdov 2000). At the same time, China provided similar assistance with its HY-1 naval missiles, themselves a by-product of the USSR's SS-N-2 Styx missiles (Sachdov 2000). Additionally, Egypt provided the DPRK Scud B missiles in the late 1970s or early 1980s. Shortly after the beginning of this weapons trading relationship, in 1972, the DPRK developed a domestic site to develop the Chinese HY-1 naval missiles.

Figure 1: DPRK Missile Ranges. Source: https://www.dw.com/en/which-us-cities-could-north-koreas-ballistic-missile-hit/a-39881831.

By the mid-1970s, the DPRK had made adequate progress toward its weapons ambitions with the assistance of its trade partners. In 1973, the DPRK possessed 24 unguided FROG 5/7 rockets and 6 SS-C-2b missiles (Sachdov 2000). Around this time, the DPRK and China proposed the joint development of a single-stage tactical missile, the DF-61. The project was canceled in 1978 due to the collapse of the main Chinese governmental supporters of the project, though the DPRK continued additional missile development. By 1981, the Korean weapons program had accelerated through cooperation with Egypt in a technological exchange agreement, which provided the DPRK Scud-B technology. The next year, a North-Korean-built, Iranian-financed Scud-B missile was tested, with three more tested in 1984. The DPRK established an official development and testing facility near the capital, Pyongyang, during the mid-1980s, where it maintained an annual production of 50 Scud-B missiles (Sachdov 2000). The missile program and its relative success tightened the DPRK's relationship with Iran: the latter purchased approximately 90-100 Scud-B missiles from the DPRK in 1987, owing to the limited domestic assembly capacity at its Isfahan plant (Sachdov 2000).

The DPRK missile program advanced in the late 1980s and early 1990s, advancing with the development of the Scud C missile in 1987. The first test of the Scud C occurred in 1990, followed by full-scale production of 4-8 missiles per month over the next year. The program reached an elevated level when it developed the No Dong I missile in 1991, which was allegedly capable of reaching all of South Korea. The DPRK revealed this missile to other nations unfriendly to the U.S.: for instance, it attempted to sell it to Libya for $7 million and exhibited it for Pakistani officials in 1992. DPRK weapons developers completed and finalized the No Dong I missile in 1993, testing two of them during missile tests on May 29-30, 1993 (Sachdov 2000).

The success of the Scud missile program in the DPRK inspired its nuclear weapons development. Though US sanctions have stunted the development of the nuclear program, the DPRK has continued to establish nuclear facilities across the country, most of which are located at Yongbyon.
Currently, the Yongbyon facility possesses a 50 MW reactor, with other reactors across the country for testing and development. Sanctions from the U.S. froze nuclear operations in the DPRK after the DPRK's attempted withdrawal from the Nuclear Non-Proliferation Treaty (NPT) in 1993 (Sachdov 2000). The heightened focus on nuclear developments has accelerated the need for the DPRK to develop methods of delivery in the form of missiles. The first nuclear detonation test occurred at the Punggye-Ri underground testing facility on October 9, 2006, using plutonium from the Yongbyon nuclear facility (Chung 2016). Subsequent nuclear tests occurred at Punggye-Ri in 2009, 2013, and 2016, for a total of four tests. These tests culminated in what DPRK officials claimed in 2016 was the possession of a hydrogen bomb (Chung 2016). As late as 2016, the DPRK had progressed towards the development of its KN-11 missile, a submarine-launched ballistic missile (SLBM) that can carry a nuclear warhead (Postol and Schiller 2016). The missile program has made significant gains in capability since the country's withdrawal from the NPT in 1993.

Missile development has also led to an increase in threatening political posturing by the DPRK towards the US. In 2016, in response to condemnations of potential hydrogen bomb testing, the Korean Central News Agency (KCNA) announced that the Iraqi and Libyan regimes of Hussein and Gaddafi, respectively, had succumbed to destruction upon giving up their nuclear ambitions amid pressure from the U.S. and Western nations (Chung 2016). The nuclear program grew largely out of the missile program, so focusing on missile testing is a worthwhile means of countering the nuclear ambitions.

2.3. The DPRK Missile Development Monitoring

Although the DPRK is a national security interest for the US, little public research has addressed methods of observing the DPRK's missile development with remote sensing. According to Shim (2014), the geographic study of remote sensing for the DPRK lacks focus and strength, regardless of the country's abnormality as a "terra incognita sui generis or uncharted land of its own." Satellite imagery has been the focus for obtaining information regarding developments in the DPRK due to the lack of ground accessibility. Satellite imagery surveillance has yielded mixed results, due to classification inaccuracies resulting from image tampering or from objects on the ground being disguised to appear as different objects from space. Accordingly, greater detail is obtained through independent monitoring services that may access the country on foot (Squassoni 2005). In its relatively small public domain, however, monitoring research on the DPRK largely revolves around non-remote-sensing methods, or around topics not concerning its political aims and missile development.

Since the DPRK's withdrawal from the NPT and its denial of International Atomic Energy Agency (IAEA) inspectors from observing its plutonium enrichment facilities, remote sensing methods remain one of the only approaches for monitoring the missile and nuclear development of the country (Pollack 2003). According to Albright and Brannan (2007), the IAEA independently provided in situ monitoring for US intelligence estimates. Moreover, the monitoring focuses on the Yongbyon radiochemical facility, due to its plutonium enrichment and chemical laboratories on site. Additional monitoring of DPRK activities at these sites takes the form of seismic wave analysis. Kim and Richards (2007) calculate the distances implied by seismic wave arrivals at local monitoring sites, compare them to the times and locations of recorded earthquakes, and compare these wave values with those from the nuclear test to locate the test itself as well as the likely origin of the missile. Similarly, Schlittenhardt, Canty, and Grunberg (2010) estimated the seismic activity of the DPRK test site from different monitoring facilities and agencies, identified candidate locations in the low-seismic-yield region given these estimates, and confirmed testing activity using ASTER satellite imagery in a change detection analysis.
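To make the travel-time reasoning concrete, the toy sketch below locates an event by grid search over candidate epicenters, choosing the origin that best reconciles the arrival times observed at several stations; the station coordinates, arrival times, and uniform wave speed are all invented for illustration and greatly simplify real seismological practice.

    # Toy sketch of event location from seismic arrival times.
    # Stations, times, and wave speed are invented; real analyses use
    # layered velocity models and multiple seismic phases.
    stations <- data.frame(
      x = c(0, 100, 50),        # station coordinates (km)
      y = c(0, 0, 80),
      t = c(12.5, 20.1, 18.4)   # observed arrival times (s)
    )
    v <- 8  # assumed uniform wave speed (km/s)

    best <- NULL
    best_err <- Inf
    for (x0 in seq(0, 100, by = 1)) {
      for (y0 in seq(0, 100, by = 1)) {
        d   <- sqrt((stations$x - x0)^2 + (stations$y - y0)^2)
        t0  <- mean(stations$t - d / v)            # implied origin time
        err <- sum((stations$t - (t0 + d / v))^2)  # misfit to observations
        if (err < best_err) { best_err <- err; best <- c(x0, y0, t0) }
      }
    }
    cat("Estimated epicenter (km):", best[1], best[2],
        "| origin time (s):", round(best[3], 2), "\n")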
This latter method differs from the others that use seismic data for detection, as it complements the seismic analysis and verifies the missile tests with remotely sensed imagery. The other studies instead tend to focus on the seismic signal created by the missile's impact and detonation. These studies address the monitoring of the DPRK for the Comprehensive Nuclear-Test-Ban Treaty (CTBT) without remote sensing data.

Several studies use remote sensing in the country to monitor activities at missile test sites. Albright and Brannan (2007) use commercial imagery for monitoring and estimating developments of plutonium stocks at the Yongbyon radiochemical facility, and for monitoring related facilities at the same site. Broad et al. (2005) use satellite images from intelligence agencies and national monitors to detect tunneling activity likely used for missile and nuclear testing. Squassoni (2005) discusses the focus of satellite imagery for monitoring the DPRK by observing specific locations, namely the Yongbyon facility, and the components at these sites, such as the 5MW reactor at Yongbyon.

Some studies integrate remote sensing data with other information resources. Shim (2014) visually interprets nighttime NASA imagery to determine the spatial scarcity and development void in the DPRK in support of US policymakers and national security decisions. Ozeki and Heki (2010) used GIS to calculate a DPRK missile test's entrance into and exit from the ionosphere using channel frequency disturbances at various Japanese GPS station locations. This latter study focuses on the geography of Japan in conjunction with the geography of the missile launched from Musudan-ri, DPRK (Ozeki and Heki 2010). Though these studies focus on identifying different components of the nuclear developments in the DPRK, none extracts the individual features of a missile testing facility. Furthermore, most studies use the Yongbyon facility due to its prominence in nuclear development and chemical enrichment. Minimal research appears to use the missile testing facility at the Sohae Satellite Launching Station, another prominent testing site.

2.4. Rule-Based Image Object Detection

According to Castilla and Hay (2008), rule sets are a comprehensive representation of procedural knowledge. GEOBIA rule sets are used to classify the image based on user knowledge (Castilla and Hay 2008). GEOBIA is carried out through a multi-phase workflow: pre-processing the image, segmenting the image into candidate image objects based on spectral features, and classifying the image objects based on user-defined parameters for each class (Castilla and Hay 2008). Few studies in the reviewed literature stray from this workflow, although some conduct cyclical repetitions of multiple iterative segmentations and classifications (Baatz, Hoffmann and Willhauck 2008).
Castilla and Hay (2008) cite Benz et al. (2004) in noting that these cycles of segmentation and classification are necessary for the incorporation of semantic meaning into image objects. Rule-based methods of image object detection in GEOBIA largely focus on the classification procedure, addressing mainly either land cover classification or urban classification. Dragut and Blaschke (2006) classified the geomorphology of landforms in Germany and Romania by comparing Digital Terrain Models (DTM) to an image segmentation and classification based on a hierarchical classification rule set. Bhaskaran, Paramananda, and Ramnarayan (2010) classified boroughs of New York, US, to compare pixel-based and object-based classification methods. They address the separability of urban features and focus on individual objects in the scene to create classes (Bhaskaran, Paramananda and Ramnarayan 2010). This focus on individual image objects serves as inspiration for the study described here. The authors conclude that GEOBIA significantly increased accuracies in detecting these urban features when compared to pixel-based methods (Bhaskaran, Paramananda and Ramnarayan 2010).

As a major component of rule-based GEOBIA, the feature thresholds and scales in the rules remain critical to the accurate analysis of the image. GEOBIA allows for multi-scale analyses at the pixel, object, and pattern levels. The object scale is the smallest meaningful unit for the analysis and most closely resembles semantically meaningful objects (Ming et al. 2015). Torres-Sánchez, López-Granados, and Peña (2015) investigate different values for the scale parameter, shape, and compactness in segmentation for classifying vegetation, determining that increases in the scale parameter reduce error until an optimal value is achieved, after which the error increases. Through their implementation of multiresolution segmentation, the authors determine that the scale parameter is the most significant of the initial segmentation parameters, since other values, such as shape and compactness, produced minimal impact in comparison (Baatz and Schäpe 2010; Torres-Sánchez, López-Granados and Peña 2015). Multiresolution image segmentation is a bottom-up pairwise merging technique that iteratively merges individual pixels with nearby pixels to produce image objects based on spectral similarity (Torres-Sánchez, López-Granados and Peña 2015).

Supervised methods include both k-nearest neighbor and random forest classification approaches. Supervised classifications in image analysis are those which assign classes to pixels or objects based on the spectral values from samples of each class. A common method for GEOBIA is k-nearest neighbor, which Maxwell et al. (2015) used to classify mine presence and reclamation land in West Virginia, US, in comparison with other methods of classification, such as random forests. In a k-nearest neighbor classification, the algorithm uses the samples to classify neighboring objects based on their similarity to the samples (Weinberger, Blitzer, and Saul 2006). A commonly used machine learning algorithm is the random forest classifier. In machine learning, the processor collects spectral data from a training set (often samples) and, depending on the classifying algorithm, uses the information to train the classifier and classify the image. The random forest method builds a designated number of decision trees, each grown from a random sample of the training data, and assigns each location the class favored by the trees' collective votes (Breiman 2001).
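The sketch below illustrates both supervised approaches in R on a hypothetical table of per-object features; the feature names and simulated data are invented, though the randomForest package does implement Breiman's algorithm, and Objective 4 of this thesis performs its random forest comparison in R.

    # Minimal sketch of k-nearest neighbor and random forest classification
    # on per-object feature tables. Data and feature names are illustrative.
    library(class)          # provides knn()
    library(randomForest)   # Breiman's random forest

    set.seed(42)
    train <- data.frame(
      brightness = runif(60, 50, 250),
      area       = runif(60, 10, 500),
      class      = factor(sample(c("building", "vegetation", "road"),
                                 60, replace = TRUE))
    )
    test <- train[1:10, c("brightness", "area")]  # pretend these are unlabeled

    # k-nearest neighbor: label each object by its k most similar samples.
    knn_pred <- knn(train[, c("brightness", "area")], test,
                    cl = train$class, k = 5)

    # Random forest: many decision trees grown on bootstrap samples of the
    # training data; each object takes the majority vote across trees.
    rf <- randomForest(class ~ brightness + area, data = train, ntree = 500)
    rf_pred <- predict(rf, newdata = test)

    print(data.frame(knn = knn_pred, rf = rf_pred))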
Random forest classification has been used in various applications of land cover mapping, for example, mapping peatland in Ontario, Canada using LiDAR data (Millard and Richardson 2015), urban classifications of LiDAR data (Chen et al. 2014), and determining tree health with IKONOS imagery (Wang et al. 2015).

Rule-based classification studies tend to focus on incorporating knowledge into the image interpretation. Krtalic (2016) uses a system, T-AI DDS, to conduct segmentation and classification for automatic detection of mines and minefields in Croatia. While the authors conclude that automatic mine detection requires more research, their semi-automated method of detection incorporates all data and expert knowledge available in the scene (Krtalic 2016). These methods follow a long history of expert-based image analysis systems. For example, Wharton (1982) proposed a rule-based knowledge integration method, CONAN (contextual analysis), which classified data into image components and conducted a contextual classification based on the mixture of these components.

2.5. Knowledge Incorporation into GEOBIA Applications

With a focus on improving classification automation, studies in GEOBIA classification are exploring the potential of complementing computational approaches with knowledge incorporation.
Written, computer-readable, and reproducible representations of expert knowledge, ontologies are used in GEOBIA as a means of exploiting structural parameters 22 of an image that is traditionally accomplished only by humans for image interpretation (Blaschke et al. 2014). Ontological workflow development is similar to standard rule-based procedures; however, developing workflows with ontologies involves hierarchies of classes for use and establishing a database of knowledge for defining individuals and classes (Gu et al. 2015). Many studies centering on knowledge integration via ontologies use GEOBIA. Gu et al. (2015) studied the development of a universal workflow based on a hybrid machine-learning and semantic model, which they applied to farmland in China. Rather than using traditional GEOBIA software such as eCognition these authors employ web service applications (GeoBrain) to intelligently identify complex artificial features (Trimble 2019; Yue et al. 2013). This study focuses on the detection of weapons of mass destruction (WMD) facilities (Krtalic 2016; Nussbaum, Niemeyer and Canty 2006; Marpu et al. 2008). Belgiu, Hofer, and Hofmann (2014) developed a classification procedure with embedded expert knowledge using GEOBIA. After creating the ontology in Protégé, an open-source ontology building program, in the OWL2 Web Ontology Language, the ontology may be used to classify image objects (Stanford Center for Biomedical Informatics Research 2016; WC3 Web Ontology Working Group 2004; Belgiu, Hofer and Hofmann 2014). Some studies have also examined the possibility of integrating knowledge into other image classification methods. For example, Liedtke et al. (1997) proposed a new program called AIDA that integrated semantic nets in image interpretation processes; however, the program could not successfully detect complex features as they were and could only detect these features. Similar to O’Neil-Dunne, MacFaden, and Pelletier (2011), this study attempts to replicate human interpretation in the creation and display of contextual relations, though this study focuses on 23 analyst queries whereas the former study addresses simultaneous segmentation and classification. 2.6. Feature Extraction GEOBIA is noted for having large computational demands. One means for reducing processing time and computer RAM demands is the use of feature space reduction, the process of reducing the number of image attributes used in classification based on target separability. A typical image object may retain more than 200 features across different scales. The number of object features increases as the spatial resolution becomes finer (Ma, et al. 2015). Limiting the number of features used for classification significantly reduces the amount of time needed to compute the classification, thereby making it effective for image analysts in need of quick results; however, it retains potential to decrease accuracy in classification given its fewer features of consideration (Ma, et al. 2015). Land cover remains a common application for feature extraction methods in the present literature. Yu et al. (2006) classified local vegetation land cover in northern California, US using 52 features derived from a developed statistical classification and regression tree (CART) algorithm. A large number of object features is based on the CART algorithm, an automated approach with a higher number of features in this case than other studies (Yu et al. 2006). On the other hand, Taubenböck et al. 
Land cover remains a common application for feature extraction methods in the present literature. Yu et al. (2006) classified local vegetation land cover in northern California, US using 52 features derived from a statistical classification and regression tree (CART) algorithm. The large number of object features stems from the CART algorithm, an automated approach with a higher number of features in this case than in other studies (Yu et al. 2006). On the other hand, Taubenböck et al. (2010) focus on transferability from one classification to another using a limited number of object features. The study used an object feature hierarchy to classify and extract urban objects in Istanbul and India, which effectively classified individual homes at 85% overall accuracy (Taubenböck et al. 2010).

Feature reduction affects several varieties of classification algorithms, including rule-based and machine learning approaches. According to Pal and Mather (2005), the size of a machine learning training set has a significant effect on the algorithm and must contain specific class descriptions. Feature space reduction for machine learning should aim for training sets containing at least 10-30 times as many training pixels as there are features (Pal and Mather 2005). Ma et al. (2015) analyzed the effects of training set sizes at different scales for rural land cover classification in Deyang, China. Using the gain ratio from Quinlan (1996) to rank all features and a best-first search algorithm, they obtain object feature subsets (Ma et al. 2015). These reduced feature subsets and varying training sizes were used to conduct random forest classifications with minimal computational requirements (Ma et al. 2015).

2.7. Objectivity

A central concern in the integration of human knowledge into remote sensing workflows is objectivity. Objectivity in the detection and classification of image objects is difficult to achieve, as human interpreters, as well as complex objects, are inherently subjective. Krtalic (2016) integrates computer-based segmentations with the self-produced T-AI DDS program to reduce the potential for subjectivity in the results. Furthermore, Gu et al. (2015) attempt to develop an objective GEOBIA workflow by incorporating expert domain knowledge via ontologies and semantic maps. Although GEOBIA classification is biased by its operator, subjectivity can be reduced by using ontologies developed by multiple experts (Gu et al. 2015; Belgiu, Hofer, and Hofmann 2014). Baatz, Hoffmann, and Willhauck (2008) attempt to set aside subjectivity concerns in developing a rule-based GEOBIA. Their approach requires the operator to know the real-world object for which he or she is searching in order to segment and classify the image, which retains some inherent subjectivity. The "spiral method" discussed and applied in the study requires the operator to segment by a particular item in the image, permitting the operator to choose when the segmentation and classification end and altering the scope based on different operator experiences in the domain (Baatz, Hoffmann, and Willhauck 2008). While the method retains inherent subjectivity in the knowledge of the features for segmentation, it attempts to establish a definition based on parameters that are usable by different operators for classification. Moreover, Arvor et al. (2013) argue that in GEOBIA applications, expert knowledge and expert bias ultimately limit the segmentations and classifications, emphasizing the need to reduce subjectivity and increase objectivity in GEOBIA workflows.

2.8. Workflow Reusability

A universal rule set or workflow is unlikely to be achieved using GEOBIA due to the complexity and heterogeneity of the earth's surface. Creating limited, repeatable workflows for specific scenarios may be possible, and ontologies are one means of addressing this issue.
At present, there is a lack of a comprehensive, systematic formalization for class definitions in GEOBIA, leading to subjectivity (Belgiu, Hofer, and Hofmann 2014). These authors further suggest that ontologies could serve to standardize class definitions (Belgiu, Hofer, and Hofmann 2014). Arvor et al. (2013) explore the potential of ontologies and argue that they will permit objectivity across disciplines, emphasizing ontology mapping as a means to that end. This sentiment is shared across other studies (Castilla and Hay 2008). Despite the potential for a universal approach and workflow for GEOBIA applications, few studies follow through and test neutral, reusable workflows. Yue et al. (2013) incorporated thematic semantics to develop a transferable workflow for weapons site detection. Studying farmland in China, Gu et al. (2015) pursued not only a fair, objective workflow but also one that would be universally applicable based on machine learning methods. None of these studies has led to the development of a successfully transferable GEOBIA workflow (Gu et al. 2015).

2.9. Contributions of the Present Research

I have reviewed research studies that have addressed various facets of the GEOBIA workflow. While these studies have compared outcomes from the application of different classification strategies, few address the usefulness of incorporating expert knowledge. A variety of knowledge incorporation methods, such as ontologies and semantic networks, have been used in GEOBIA, and feature space reduction appears to be a more common approach to classifying image objects intelligently. As far as I can tell, however, no study directly compares feature space reduction used as a knowledge-based process against a classification without feature space reduction. Other studies also do not appear to have used expert interpreters' annotations as a guide for feature selection. Thus, this research explores a new avenue for feature space reduction and for improving the integration of expert knowledge into GEOBIA. Although a limited number of knowledge-integration studies have focused on military sites, none has focused on the classification of DPRK missile testing facilities, which is surprising given the program's prominence in modern global affairs. Only a handful of studies apply remote sensing imagery to missile testing sites, and GEOBIA research appears to favor the Islamic Republic of Iran (IRI) and India (Niemeyer, Marpu, and Nussbaum 2008; Gupta and Pabian 1997). Though these studies focus on Comprehensive Test Ban (CTB) treaty verification, they do not address the issue in the DPRK's missile program, at least publicly. Finally, there does not seem to be a systematic approach to automating GEOBIA for commercial intelligence uses. Diamond (2001) identifies the lack of quasi-instantaneous automated change detection in satellite imagery as a major problem for government intelligence. This study addresses the gaps left by the lack of GEOBIA research on the DPRK missile program and provides a direct comparison of knowledge-based and knowledge-devoid automated classification methods.

3. Methods

The goal of this research is to develop a GEOBIA workflow for identifying buildings at missile test sites in the DPRK and to determine whether the inclusion of expert knowledge improves object detection accuracy. I conducted four main analyses to meet the five objectives of this research.

Objective 1: Obtain human interpreter knowledge.
Objective 2: Determine the features indicative of missile testing.

Objective 3: Segment and classify each of the images using the three classification methods and compare the accuracies with and without utilizing interpreter-used features.

Objective 4: Compare classification accuracies across multiple classification software packages.

Objective 5: Determine whether improving spatial resolution improves classification accuracies of the three classification algorithms.

The first analysis extracted the visual cues expert interpreters used for the detection of missile testing facilities from text documents. The second analysis compared the different GEOBIA classification methods to determine which techniques prove more effective in classifying buildings of the missile testing facilities. The third analysis then compared these classification techniques across classification software. Finally, the fourth analysis compared the classification algorithms across different spatial resolutions. Traditional remote sensing accuracy assessment methods are used throughout the study to determine the success of these classifications.

3.1. Study Sites

The study areas of this project include two major missile-testing facilities in the DPRK, as well as the Palisades Nuclear Energy Facility in Michigan, US. Data for the DPRK sites, their locations, and test information were retrieved from the Nuclear Threat Initiative (NTI) website (Nuclear Threat Initiative 2018). The NTI compiled a Microsoft Excel spreadsheet containing missile test information since 2016, with variables including the date of each test, the type of missile tested, the location of the launch, the achievable distance of the missile tested, and the test's success or failure. For this study, I used the test site locations from this dataset. The DPRK monitoring agency, 38 North, publishes web articles containing very high-resolution satellite imagery from DigitalGlobe along with interpretations by former military analysts in the form of image annotations (38 North 2018). I collected all articles and associated images from 38 North's satellite imagery archive through February 2018, totaling roughly 120 articles with approximately 1,500 images. Each image contains between one and ten analyst-written annotations highlighting specific components critical to the analysis of the testing sites, such as a launch tower arm or vehicles present or absent. The articles were then grouped by missile test site location, and the five sites with the most articles were chosen for the study. Though the DPRK's main nuclear testing site, Punggye-ri, had the greatest number of articles, I excluded it as an outlier, as it is the only confirmed nuclear weapons testing facility in the DPRK (Nuclear Threat Initiative 2018). Therefore, I selected the next four most frequently occurring sites to analyze and to focus the study: Sohae, Yongbyon, Sinpo, and Musudan-ri (Tonghae). I selected this subset for several reasons. First, the four sites provide wide spatial coverage horizontally across the country, which would ideally require a more comprehensive rule-set accounting for broader geographic variation in land surface, cover, and uses. Second, the additional sites are seldom mentioned in 38 North, making them less appropriate for this study given their likely infrequent use.
Lastly, the use of four study sites provides a manageable amount of data to analyze and compare with these methods, as opposed to data from each of 23 testing facilities. The four sites yielded 300 total images, for which I created an Excel spreadsheet compiling each image and article's data, including the site focus of the article, the date of the article, the name of the interpreter(s) (recorded as 38 North if no other was indicated), the date of the image, and the annotations in each image. I assigned a specific identification (ID) code to each image and its respective data and qualitatively coded the annotations, which is discussed in detail in Section 3.3. Figure 2 displays the number of articles and interpretations for the four missile testing sites.

Figure 2: Number of expert interpretations from 38 North per missile testing site in this study.

After further assessing the study sites, I chose to focus on only the Sohae and Yongbyon facilities. Imagery available for Sinpo and Musudan-ri did not provide adequately fine resolution to distinguish the individual building facilities. Additionally, these two omitted sites operate quite differently from Sohae and Yongbyon, which are both largely urban compared to the shipyard, Sinpo, and the mountain launching facility, Musudan-ri. I justify the selection of the Sohae and Yongbyon sites for several reasons. First, these sites still adequately represent urban areas in the DPRK, and the workflow is likely reproducible at other similar study sites. Second, these two remaining sites had far more articles devoted to them than the omitted sites. Lastly, focusing on two sites permits far greater depth in the study applications than spreading resources across four sites. I retrieved imagery for both Sohae and Yongbyon from Planet Labs Imagery & Archive (Planet Labs 2019).

3.2. Data

The DPRK imagery used has a 3m spatial resolution with four spectral bands and was acquired in 2016. For the analysis of the different spatial resolutions, the image used was obtained from the US Geological Survey (USGS) Earth Explorer as National Agricultural Imagery Program (NAIP) imagery. This image initially has a 1m resolution, which was degraded to 3m for the comparison. It contains four spectral bands and is 1,936 by 2,526 pixels in size. All of the original images used may be seen in Figures 3-5.

Figure 3: Sohae Image. Source: Planet Labs.

Figure 4: Yongbyon Image. Source: Planet Labs.

Figure 5: Palisades nuclear power plant, 1m resolution (left) and 3m resolution (right). Source: USGS.

3.3. Analysis 1 - Extraction of Expert Interpretation Cues

The objective of this analysis was to extract expert knowledge from image interpretations on the 38 North website and analyze the contents of those interpretations using content analysis. Content analysis is often used for qualitative research; it is a process in which text contents are grouped into classes based on a theme relevant to the study (Hsieh and Shannon 2005). In content analysis, all textual information is classified based on a case-based dictionary of terms (McTavish and Pirro 1990). I used content analysis to codify the expert annotations. After collecting all of the experts' annotations, I codified the annotations hierarchically based on common characteristics.
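To make this coding step concrete, the R sketch below maps raw annotation terms to a class and subclass through a small dictionary. The terms and codes shown are hypothetical examples in the spirit of the data dictionary described here, not the actual codebook.

```r
# Illustrative sketch of hierarchical a priori coding: raw annotation
# terms are looked up in a small dictionary that assigns a class and
# subclass. Terms and codes are hypothetical examples only.
codebook <- data.frame(
  term     = c("train station", "warehouse", "launch tower",
               "truck", "river"),
  class    = c("building", "building", "building",
               "vehicle", "environment"),
  subclass = c("transportation", "storage", "launch support",
               "ground", "water")
)

# Return the class/subclass rows whose term appears in the annotation
code_annotation <- function(text) {
  hits <- codebook[sapply(codebook$term, grepl, x = tolower(text)), ]
  if (nrow(hits) == 0) data.frame(class = "uncoded", subclass = NA) else hits
}

code_annotation("New truck visible near the launch pad")
# term: truck, class: vehicle, subclass: ground
```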
I transformed the annotations manually in Microsoft Excel into single-word codes using descriptive coding, which provides a general descriptive term for a breadth of information (Saldaña 2013), and a priori coding, which employs theoretically derived codes from prior knowledge (Bazeley and Jackson 2013). For example, buildings and train stations received the code "building." I then divided these general codes into more specific categories in a similar a priori, descriptive manner. In the same case as above, train stations received the class "building" and the subclass "transportation." I further extracted the annotations and organized them in an Excel spreadsheet. In the end, I identified five classes: buildings, vehicles, environment, missiles, and changes. Since the interpreters repeated annotations across images with similar characteristics, I omitted any repeats in the organization of these classes for individual annotations. Classification of these text documents resulted in a data dictionary that could be used to support the second objective of this work. The "buildings" class yielded the highest number of annotations and objects after eliminating duplicate mentions. The remaining classes were merged into one of three classes: vegetation (environment), water (environment), and built-up (all unaccounted annotations). After determining which objects were of most interest, I performed a text analysis of the expert-written articles. This analysis aimed to determine the words, and thus the visual cues, expert analysts used in interpreting the images. I employed the online corpus developer, Sketch Engine, to create a corpus and determine these contextual cues. Sketch Engine allows the user to provide a corpus of text and extract data about words, parts of speech, position in a sentence, and so on (Lexical Computing CZ 2019). I first created a corpus including all of the articles from 38 North about the Sohae and Yongbyon study sites. From these articles, I then analyzed the corpus using the image interpretation elements of texture, shape, size, pattern, shadow, location, tone/color, height, and site as search filters. Each search was conducted individually and included related terms. For instance, the analysts did not often write the word "texture" or "color" in their reports; however, they would write "smooth" and "green," which thereby yielded more results than the interpretation terms themselves. These results are similar to the results of the content analysis of historical interpretation documents presented by Bianchetti and MacEachren (2015). After the word frequency analysis was completed on the article corpus, a concordance analysis was carried out on the interpretation elements in the document. The full concordance results consisted of 305 entries, retaining the searched word, its grammatical form in the sentence, the fifteen words before and after the cue, and the document in which it exists. I then reduced the concordance to a subset which included only the searched adjective cue, its frequency across the articles, and its cognitive code relating to the image interpretation elements. Table 2 displays the concordance subset for identifying buildings in images. These most frequently occurring textual cues were then used in the second portion of this study to incorporate expert knowledge as a means of feature space reduction in image analysis.
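The actual frequency and concordance analysis was performed in Sketch Engine, but the base R sketch below illustrates the underlying idea: counting cue-word frequencies and pulling a fixed window of context around each occurrence. The file name and cue list are hypothetical placeholders.

```r
# Minimal concordance sketch in base R (illustrative only; the study
# itself used Sketch Engine). "articles.txt" and the cue list are
# hypothetical placeholders.
text   <- tolower(paste(readLines("articles.txt"), collapse = " "))
tokens <- strsplit(text, "[^a-z]+")[[1]]
cues   <- c("rectangular", "large", "small", "long", "high", "low", "green")

# Word frequency for each visual-cue adjective
freq <- sapply(cues, function(w) sum(tokens == w))
print(sort(freq, decreasing = TRUE))

# Concordance: fifteen tokens of context on either side of each hit
concordance <- function(word, window = 15) {
  hits <- which(tokens == word)
  lapply(hits, function(i) {
    span <- max(1, i - window):min(length(tokens), i + window)
    paste(tokens[span], collapse = " ")
  })
}
head(concordance("rectangular"), 3)
```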
For example, what would be the best features for identifying buildings? The "buildings" class yielded the highest number of annotations and of compressed (duplicate-free) objects.

Adjective      Frequency   Cognitive Code
green               5      Color
high               32      Height
low                26      Height
short              16      Height
rectangular         7      Shape
large             107      Size
small              96      Size
long               14      Size

Table 2: Concordance subset for the knowledge base.

3.4. Analysis 2 - Classification Methods Most Appropriate for Missile Facility Extraction

Next, GEOBIA was carried out to determine which classification method performed best at identifying buildings. Trimble eCognition was used to develop the GEOBIA workflows, resulting in six image classifications for each of the two images. Trimble's eCognition simplified the translation of the concordance results into the image classification. Three of the six classifications included the results of the concordance analysis to reduce the feature space; the other three used the same classification methods without feature space reduction. The three classification methods selected for analysis were rule-based, nearest neighbor, and random forest classification. The rule-based classification requires the development of rules based on thresholds of spectral and geometric information to classify the image objects. These rules require the user to manually determine appropriate thresholds for the features used to classify the image. As such, this method required the most trial-and-error iteration of the three methods used here and was the most time-consuming. The two supervised approaches used in the study, nearest neighbor and random forest, required training data to classify the images. The nearest neighbor classification method uses the feature values from samples and determines, based on the similarity between the training classes and candidate objects, which class is most appropriate for each candidate image object: it evaluates each candidate object's similarity to the sample objects of each class in feature space and assigns the class of the most similar samples. The random forest classification is a common machine learning classification method. Like the nearest neighbor method, the random forest classification requires a training set. This layer of samples is used to train the algorithm to detect the assigned classes based on the value thresholds of the class samples. The algorithm creates a user-designated number of decision trees, trained using a user-defined number of random sample points; each tree votes on the class at a candidate location, and the majority vote across all trees determines the final class. These classification methods were then augmented to incorporate expert knowledge into the classification. Feature space reduction decreases the number of features, or attributes, used for classification. This approach tends to improve computation speed and potentially accuracy, as some features used for classification remain difficult or impossible for an expert interpreter to employ in analysis and may introduce noise into the classification process. Table 3 displays the segmentations and classifications for the computer knowledge (CK) and expert knowledge (EK) feature spaces in the study.
Computer Knowledge     Expert Knowledge
Rule-based             Rule-based
Nearest Neighbor       Nearest Neighbor
Random Forest          Random Forest

Table 3: Classifications used for analysis per feature space.

3.4.1. Rule-Based Classification in eCognition

To initiate the study and develop a relative understanding of the thresholds of values in the data, I began with a rule-based classification. The rules used in the classification are developed based on user-designated thresholds of the spectral values of the image objects. Manually inputting these values provided a better understanding of the values in the data for the later classifications. Before developing the rules for the classification, I first had to segment the image into candidate image objects. I used the same segmentation parameters for both images: a multiresolution segmentation with a scale of 50, a shape of 0.1, and a compactness of 0.5. These values were selected through an iterative process and visual interpretation. Following the segmentation, I merged image objects with similar spectral values by hand using eCognition's manual merge tool, focusing on the image objects representing buildings. After the segmentation and merging, I began to develop the classification rules. Each rule focused on classifying a particular class. I classified the most prominent classes first, progressing to classes with fewer features to classify. I used the following classes for both images: water, vegetation, built-up, and buildings. A background class was created to delineate the void portions of the images, caused by the path of the satellite, from the actual image content. Since all values in the background class were 0, I classified this class first, followed by the four classes in the order listed above. To determine the thresholds, I focused on a select number of features to build the rules. I used eCognition's feature selection window to test different thresholds of particular features through a trial-and-error method of inputting values. For the CK classifications, I did not limit the number of features for analysis. Accordingly, the CK rules accounted for all four spectral band values, geometric values, and shape and positioning values, while the EK classifications considered mainly spectral and geometric values. While I considered these features in the development of rules, I developed rules mainly on the spectral values of the classes, including brightness, near-infrared, red, blue, green, and maximum difference values. For the EK rule-based classification, I limited the feature space used for rule development to only those features found useful in the text analysis; in this second case, the features were limited strictly to the closest equivalent in eCognition's feature space to the word in the concordance. A comparison of the features used in each method can be seen in Table 4.

CK Features                              EK Features
Brightness                               Brightness
Max. Diff.                               Green
Blue                                     NIR
Green                                    Area
NIR                                      Number of Pixels
Red                                      Asymmetry
Area                                     Elliptic Fit
Border Length                            Rectangular Fit
Rel. Border to Image Border              Roundness
Volume
Asymmetry
Rectangular Fit
Roundness
Shape Index
Number of Pixels
Border Index
Compactness
Density
Elliptic Fit
Main Direction
Radius of Smallest Enclosing Ellipse
Radius of Largest Enclosed Ellipse

Table 4: Features that were used for knowledge incorporation.
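As an illustration of how such threshold rules operate on exported image-object features, the R sketch below classifies a small, hypothetical object table. The feature names mirror Table 4, but the threshold values are invented for demonstration; the actual rulesets are those given in Appendices 2a-2b.

```r
# Illustrative sketch of threshold-style rules over image-object
# features; the data frame and threshold values are hypothetical, not
# the ruleset actually used in eCognition.
objects <- data.frame(
  brightness = c(180, 60, 120, 210),
  nir        = c(90, 200, 40, 95),
  area       = c(450, 3000, 900, 380),
  rect_fit   = c(0.91, 0.35, 0.20, 0.88)
)

classify_object <- function(o) {
  if (o$nir > 150)                      return("vegetation")
  if (o$brightness < 70 && o$nir < 60)  return("water")
  if (o$brightness > 150 && o$rect_fit > 0.8 && o$area < 1000)
                                        return("building")
  "built-up"  # default class for unmatched objects
}

objects$class <- sapply(seq_len(nrow(objects)),
                        function(i) classify_object(objects[i, ]))
print(objects)
```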
The rule development process continued until I determined the classification produced the best results. In eCognition, I produced these rules in a hierarchy that separates the segmentation rules from the classification rules and separates the rules for each class. In doing so, the classification process can be reapplied without having to re-segment the image. The classification threshold values varied between the images, but the rule structure was retained within each image. The final rulesets for each of the CK and EK classifications are displayed in Appendices 2a-2b. I repeated these classification iterations with different rule thresholds until the best-appearing classification was produced.

3.4.2. Nearest Neighbor Classification in eCognition

The nearest neighbor classification was completed next. For consistency in the segmentation, the same segmentation parameters were used as those described above. The supervised nearest neighbor approach does not require specified rules; however, it requires a set of samples from which to compute the classes. Using the feature space designated in Table 4 for the without-knowledge classifications, I created samples for each class by visually sampling objects in each image using the sampling brush in eCognition. While I attempted to keep the number of samples the same across both images, the significantly greater number of image objects in the Yongbyon image led to slightly more samples in this image than in the Sohae image. Additionally, the number of samples varied across the classes, since some classes were more easily distinguished. The number of samples per class in each image is seen in Table 5.

Class         Sohae    Yongbyon
Building        25        25
Background       1         1
Water            6        10
Vegetation      31        25
Built-up        20        30

Table 5: Number of samples per class for each DPRK image.

The number of samples used was scaled to the number of objects of the respective class in the image, with a greater emphasis on the building objects and samples. The samples were held consistent between the case that included feature space reduction and the case that did not. The samples for both images are visible in Figures 6-7.

Figure 6: Sohae samples used for supervised and machine learning classifications.

Figure 7: Yongbyon samples used for supervised and machine learning classifications.

For both the knowledge-based nearest neighbor classification and the without-knowledge variant, I used the same segmentation parameters as in the knowledge-based rule-based classification and the same samples for both knowledge levels. The same image objects were accordingly analyzed in the algorithm tests across both feature spaces of the same respective images. The difference between the two tests per image is the feature space used to classify each in the nearest neighbor algorithm: the knowledge-incorporated nearest neighbor classification used the knowledge-based feature space to produce its supervised classifications, whereas the without-knowledge variant used the non-restricted feature space representing no knowledge incorporation. The same two rules and the same classes were used for both the with- and without-knowledge classifications of both the Sohae and the Yongbyon images to maintain consistency between the study sites. This algorithm produced a total of four classifications, as with the other algorithms in this study.
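The nearest-neighbor idea itself can be sketched in a few lines of R using the class package. This is only a conceptual sketch on hypothetical object features; eCognition's own implementation differs in detail.

```r
# Minimal nearest-neighbor sketch on hypothetical exported object
# features; not eCognition's implementation.
library(class)  # provides knn()

# Hypothetical training samples: two features per image object
train_x <- data.frame(brightness = c(180, 60, 40, 150),
                      nir        = c(90, 200, 50, 80))
train_y <- factor(c("building", "vegetation", "water", "built-up"))

# Candidate image objects to classify
cand_x <- data.frame(brightness = c(175, 55), nir = c(85, 190))

# k = 1 assigns each candidate the class of its single nearest sample
pred <- knn(train = train_x, test = cand_x, cl = train_y, k = 1)
print(pred)  # expected: building, vegetation
```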
3.4.3. Random Forest Classification in eCognition

The final classification algorithm was a random forest classification. The segmentation parameters were held constant for the without-knowledge and knowledge-based classification workflows, resulting in the same candidate image objects as in the rule-based and nearest neighbor classifications above. This classification required three rules: the image layer copy, training the classifier, and executing the classification. To train the classifier, I used samples transferred from the nearest neighbor classification. The features used for the training were those in Table 4 for the respective feature spaces. As before, the feature spaces differed between the without-knowledge and knowledge-based processes. Within the trainer, I also designated the parameters for the random forest classification: 16 maximum categories, 200 trees, and 0.2 forest accuracy produced the best-appearing classification. All other parameters in the editor remained at their default settings. Following the establishment of the training parameters, I trained and ran the machine learning classifier. Executing the classifier required the final rule in the ruleset; all parameters for this rule were defaults except the selection of the appropriate training data. I conducted the same steps for the knowledge-based random forest classification, only with the reduced feature space. As with the other classifications, the random forest classifications for both images were rerun until the best-fitting parameters were achieved.

3.5. Analysis 3 - Comparison of Classification Software

Several programs exist to conduct image classifications. While this study primarily uses eCognition, I also produced classifications using the R software package to find the combination of variables that produces the best classifications of the missile testing facilities in the DPRK. This package permits the use of the R computer language to conduct the classifications manually. To compare the software packages, I conducted a random forest classification in R on the Sohae image and compared it to the results obtained from the classifications conducted in the previous analysis. Both eCognition and R have capabilities for executing random forest classifications, though their purposes and processes remain quite different (Trimble 2019; The R Foundation 2019). eCognition is a GEOBIA software developed by Trimble that permits the development of rulesets to automatically classify and analyze remotely-sensed imagery; eCognition Developer is used to develop processes for image analysis, whereas the other packages extend the software for other analysis scenarios. R is a highly flexible, open-source, and extensive statistical program used for a variety of applications. It relies on scripted code to carry out statistical operations in various programming environments, and its wide applicability permits the analysis of remote sensing data and the conducting of classifications of its own. For the comparison of the software techniques, I used the random forest classification algorithm in eCognition and R. I used only the Sohae image for the comparison, with the without-knowledge case, to retain consistency between the two programs. To conduct the classification in eCognition, I used the same steps detailed in Section 3.4.3; all of the parameters detailed in the 3.4.3 discussion of the without-knowledge case apply here.
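As a preview of the R workflow described in the following paragraphs, the minimal sketch below trains a pixel-value random forest with the ranger library, analogous to Model 1 below. The sample data frame and its values are hypothetical.

```r
# Minimal sketch of the pixel-based random forest step in R; the
# training data frame and values are hypothetical placeholders.
library(ranger)

# One row per sample point: a factor Class and the pixel value z
# extracted from the image at that point (hypothetical values)
train_df <- data.frame(
  Class = factor(c("building", "buildup", "vege_mount", "water",
                   "building", "vege_mount")),
  z     = c(210, 160, 90, 40, 205, 85)
)

# Model 1 analogue: pixel value only, 500 trees, mtry = 1
rf <- ranger(Class ~ z, data = train_df, num.trees = 500, mtry = 1)

# Predict classes for new pixel values
valid_df <- data.frame(z = c(200, 95))
pred <- predict(rf, data = valid_df)$predictions
print(pred)
```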
The process of conducting random forest in R does not require workflow development in the same way that eCognition does; rather, it uses command line processing to call on packages to classify the image. To train the random forest classification, I used a reference dataset constructed in ArcMap resembling the samples used in the eCognition classification. Using the R ranger library, I conducted random forest classifications and constructed three separate models. I began by creating a point sample using the reference data and the pixel values from the image. Of the 69 polygons in the reference dataset, 50 were chosen at random for sampling. Within the chosen polygons, three point locations were chosen at a regular sampling interval, removing points within 15 meters of one another, resulting in 139 points. Spatial random forest (RFsp) classifications were executed in R by generating a grid of 29 points across the image, with one raster generated for each point. Cell values then represented the distance from the cells to each point, which enabled the classifier to use relative location for improved predictions. Of the 139 sample points, 111 were used for the training data, and 28 were used as a validation set. With these training and validation sets, I constructed three models with the RFsp classifier. The first model used only the pixel value for the random forest classifier, with 500 trees and mtry = 1, where mtry is the parameter constraining the number of variables considered at each split in the random forest classification. The second model developed and used in R is given by equation (1):

Class = z + layer.1 + layer.2 + ... + layer.29, (1)

where z is the pixel value and each layer.x represents the distance value from the respective grid point. The model identifies the best mtry parameter value; once mtry was defined, the classification was run with 500 trees. The third model did not consider the pixel value, as it was a purely spatial RFsp model, and is given by equation (2):

Class = layer.1 + layer.2 + ... + layer.29. (2)

The mtry value used in this third model was 10, and again, 500 trees were used. The results of each model are presented in Section 4.
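The distance-raster construction behind the RFsp models can be sketched with the raster package as follows. The file name is hypothetical, and sampleRegular yields only approximately the requested number of grid points.

```r
# Sketch of the RFsp idea: one distance raster per grid point, used as
# the layer.x predictors in equations (1) and (2). File name and grid
# size are hypothetical placeholders.
library(raster)

img <- raster("sohae.tif")  # hypothetical image path

# Regular grid of points across the image extent (about 29 points)
pts <- as.data.frame(sampleRegular(img, size = 29, xy = TRUE))[, c("x", "y")]

# One raster per grid point; each cell value is the distance to that point
dist_layers <- stack(lapply(seq_len(nrow(pts)), function(i) {
  distanceFromPoints(img, pts[i, ])
}))
names(dist_layers) <- paste0("layer.", seq_len(nrow(pts)))

# Extracting these layers at the sample points supplies the distance
# predictors for the RFsp random forest models.
```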
3.6. Analysis 4 - Comparison of Spatial Resolutions

After achieving poor results using the Planet Labs imagery, I examined whether improving the spatial resolution of the satellite imagery might improve the classification results; the three-meter resolution made visual interpretation of the objects in the image difficult. Because imagery of the DPRK was not available at resolutions finer than 3 meters, I used proxy imagery for this final analysis. I chose to use NAIP (National Agricultural Imagery Program) imagery with a 1-meter spatial resolution of the Palisades Nuclear Energy facility in Michigan, US as a proxy for the Planet Labs imagery, as it has a finer spatial scale but a similar spectral resolution (United States Department of Agriculture 2015). A 2016 NAIP image of this site was obtained from the US Geological Survey (USGS) Earth Explorer (United States Geological Survey 2019). Insets of the image of the nuclear energy facility are provided in Figure 8 to display the features of focus in more detail. For consistency and a further test of the accuracy between different spatial resolutions, I reduced the resolution of the image in ArcMap to three meters to imitate the spatial resolution of the Planet Labs imagery of the DPRK missile testing facilities.

Figure 8: Nuclear power plant (top) and airfield (bottom); 1m images on left and 3m images on right.

I conducted the same six classifications described above, including the distinction between without-knowledge and knowledge-based feature spaces. I copied the exact rules used in classifying the Sohae image and employed them in the classification of both the one- and three-meter resolution images of the Michigan nuclear energy facility. The segmentation used the same parameters as the Sohae segmentation. The parameters of the rule-based classification required altering, given the spectral differences between the Planet Labs imagery and the NAIP imagery. Rules for the other classification algorithms were not altered, to maintain as much consistency as possible between the images. The classifications of the NAIP images followed the same steps for rule-based, nearest neighbor, and random forest classification. The aim of this classification was to create a full-scene classification as opposed to focusing on the buildings present. Some classes were added, namely the residential and industrial building classes, to distinguish the two types of buildings visible in the image and to determine whether the resolution permitted such analysis. Due to this addition of classes, some new rules were required for the rule-based classification; however, several of the rules remained similar to those copied from the Sohae image classification. The complete rules are found in Appendix 2c. The new image and new classes required new samples for the nearest neighbor and random forest classifications, as the samples from the Sohae imagery would not be representative of the Michigan nuclear facility either in geography or in spectral values. I created samples for each of the new classes: residential buildings, industrial buildings, built-up, vegetation, and water. The background class was excluded, since the image did not contain any spectrally void regions. The numbers of samples are as follows: 25 residential buildings, 15 industrial buildings, 20 built-up, 30 vegetation, and 10 water. The samples were the same for the with- and without-knowledge feature spaces, but they changed between the two resolution images. Due to the different resolutions, the segmentation algorithms produced different results, with far fewer image objects at the three-meter resolution than at the one-meter resolution. As a result, the samples from the one-meter resolution imagery could not be transferred accurately to the three-meter imagery, despite applying a training and test area (TTA) mask for sample creation.
Applying the TTA mask required a degree of overlap between the two images for samples to be successfully transferred. The default overlap of 75% produced no samples in the new image; I reduced this number incrementally to 10%, which still resulted in very few samples. Accordingly, I chose not to use the TTA mask for the sample selection of the three-meter resolution image and instead created samples resembling as closely as possible the samples from the one-meter imagery. The samples are displayed in Figure 9 to show the differences in the samples' sizes, though their distribution and number remain relatively similar. The nearest neighbor and random forest classifications retained the same rules and parameters aside from the different samples between the resolutions. I ran each algorithm until I achieved optimal results. The residential building and industrial building classes were removed from the classifications of the three-meter resolution imagery due to major misclassifications; this issue is discussed in depth in Section 5. As with the other classifications, the results of each of these classifications are provided in Section 4.

Figure 9: Samples used for the spatial resolution comparison classifications.

3.7. Post-classification Accuracy Assessments

The comparison of several classifications with several different factors required more than one type of accuracy assessment, and each type of assessment reflected the nature of the information sought from each classification. Classification accuracy assessment was conducted for each of the classification cases described above (DPRK imagery, NAIP imagery). Accuracy assessment of the DPRK imagery was completed first. The goal of this assessment was object identification accuracy. First, all of the Sohae and Yongbyon image classifications were exported as shapefiles to ArcMap. I created a shapefile for each image of the features identified by the 38 North image interpreters. I then created a random points layer for each image. The attribute table of this random points layer was then joined with each of the six classification results shapefiles. These joins resulted in a file that contained the class results and the expected results based on the expert interpretations. To assess the accuracy of the DPRK image classifications, I manually created a confusion matrix from the attribute tables. For each classification, I visually observed the class of each random point based on the respective unclassified image. This visual result acted as the ground truth for the accuracy assessment, since I had limited access and resources for more detailed ground truthing. I then compared the visual interpretations to each point's respective classification in each of the six classifications per image. I labeled the classified points as either building or non-building, and I included the points created for the buildings of interest to determine how each classification algorithm performed. The building points were compared to their classifications in each algorithm. For each algorithm, out of the total number of image objects, I determined the number of true positives (TP), or correctly classified buildings; false positives (FP), or non-buildings classified as buildings; true negatives (TN), or non-buildings classified as non-buildings; and false negatives (FN), or buildings classified as non-buildings.
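As a concrete illustration, the short R sketch below computes the four accuracy measures from these counts, following the formulas given in equation (3) in the next paragraph; the example counts are those of the Sohae rule-based, no-knowledge classification in Table 6.

```r
# Minimal sketch computing the accuracy measures defined above from
# TP/FP/TN/FN counts, following the document's equation (3).
accuracy_measures <- function(TP, FP, TN, FN) {
  CA   <- (TP + TN) / (TP + TN + FP + FN)  # overall accuracy
  UA   <- TP / (TP + FN)                   # user's accuracy, as defined here
  PA   <- TP / (TP + FP)                   # producer's accuracy, as defined here
  Fval <- (UA * PA) / (UA + PA)            # F value (Radoux et al. 2011)
  round(c(CA = CA, UA = UA, PA = PA, F = Fval), 4)
}

# Example: Sohae rule-based, no-knowledge counts from Table 6
accuracy_measures(TP = 25, FP = 0, TN = 100, FN = 6)
#     CA     UA     PA      F
# 0.9542 0.8065 1.0000 0.4464
```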
With these values for each of the classifications, I calculated the user's (UA) and producer's accuracies (PA), the overall accuracy (CA), and the value of F (Radoux et al. 2011). The equations used to calculate these values are as follows:

CA = (TP + TN) / (TP + TN + FP + FN),
UA = TP / (TP + FN),
PA = TP / (TP + FP),
F = (UA x PA) / (UA + PA). (3)

Next, I developed an accuracy assessment to compare the results of the R and eCognition processes. To compare the different software, I needed to compare the accuracies of both the eCognition random forest classification and that produced in R. I calculated the values for the identification accuracy assessment as detailed above for both the eCognition random forest classification (without knowledge) and the R random forest classification, using the four classes used in the classification and eliminating the background classification altogether. The values determined for the R classification used the validation dataset of 28 points and compared the locations of these points to the classifications from each model created. Contingency tables were created for all models present between the two classifications. The final accuracy assessment compared the impact of spatial resolution on the accuracy of the NAIP classification. Since these classifications focused on classifying the entire scene of each image, rather than extracting building features, I constructed a confusion matrix of each of the classes. The steps to create the table follow a similar method of extracting resulting classes and reference classes. I created 100 random points and linked these points to each of the three classifications for each spatial resolution. This link assigned each random point the class present at its location in each of the classifications. A contingency table comparing the visually observed location of each random point in the image to its respective classification was created for each classification of each of the two images. For validation consistency, I retained the same random points for each of the contingency tables, changing only the classifications.

4. Results

The goal of this research was to determine whether the introduction of human knowledge into the GEOBIA process could improve image classification accuracy in the case of missile test sites. Due to a lack of success in the classification of DPRK images, a secondary analysis was conducted using USGS NAIP imagery as well. This section provides the details of the classification results of the three comparisons in this study. Processing times for all classifications may be seen at the end of Sections 4.1 and 4.3 in Table 7 and Table 18, respectively.

4.1. Knowledge Incorporation Comparison

This section provides the results of the knowledge comparison classifications. I provide results of each of the rule-based, nearest neighbor, and random forest classifications for both the with- and without-knowledge feature spaces. Each subsection provides details for both the Sohae and the Yongbyon images. For each of these classifications, I used the TP, FP, TN, and FN to calculate image object detection accuracy (overall, user's, and producer's accuracy), as discussed in the previous section.

4.1.1. Rule-Based Classification without Knowledge in eCognition

Because the rule-based classifications were developed manually, I accounted for the number of rules needed in lieu of processing time. For the Sohae image, the ruleset required a total of 10 rules.
The resulting classification for the Sohae rule-based classification without knowledge is seen in Figure 10.

Figure 10: Rule-based no knowledge classification, Sohae.

Of the 31 image objects representing buildings in the unclassified image, 25 were classified correctly as buildings, with 6 misclassified. A table of the accuracies and true/false positives/negatives, with the corresponding accuracy calculations, may be seen in Table 6 at the end of Section 4.1. For the Yongbyon study site, I used a total of 8 rules. The resulting classification of the Yongbyon site is seen in Figure 11. Of the 27 building image objects, 13 were classified as buildings and 14 were misclassified as other classes. One image object was incorrectly classified as a building (FP).

Figure 11: Rule-based no knowledge classification, Yongbyon.

4.1.2. Rule-Based Classification with Knowledge in eCognition

The second rule-based classification followed the same methods as the previous rule-based classification, only with a reduced feature space based on knowledge from image interpreters. For the Sohae image, I used 6 rules. Of the 31 image objects, 25 were correctly classified as buildings and 6 misclassified. For the Yongbyon image, 10 of the 27 were correctly classified as buildings, with one false positive. I used 5 rules for this image classification. The respective classification maps are as follows:

Figure 12: Rule-based with knowledge classification, Sohae.

Figure 13: Rule-based with knowledge classification, Yongbyon.

4.1.3. Nearest Neighbor Classification without Knowledge in eCognition

The first nearest neighbor classifications of each site used the no-knowledge feature space. Unlike the previous rule-based classifications, the remaining classifications were automated, so I recorded the time taken to compute each classification. The Sohae image classification with this algorithm took 46.025 minutes and resulted in 25 of the 31 image objects being correctly classified as buildings. For the Yongbyon image, the algorithm took 1:45:32 (hours:minutes:seconds) to complete and resulted in 21 of the 27 image objects being correctly classified as buildings. The classification maps are seen below:

Figure 14: NN no knowledge classification, Sohae.

Figure 15: NN no knowledge classification, Yongbyon.

4.1.4. Nearest Neighbor Classification with Knowledge in eCognition

These two nearest neighbor classifications used the with-knowledge feature space. For the Sohae image, the algorithm took 23.08 minutes and resulted in 24 of 31 correctly classified buildings with 3 false positives. The Yongbyon image took the algorithm 34.06 minutes and resulted in 17 of 27 correctly classified buildings. The classification maps for this algorithm are as follows:

Figure 16: NN with knowledge classification, Sohae.

Figure 17: NN with knowledge classification, Yongbyon.

4.1.5. Random Forest Classification without Knowledge in eCognition

The random forest classifications used two rules. To assess the amount of time the classifications required, I added the processing times of the two rules. The first two classifications used the no-knowledge feature space. For the Sohae image, training the classifier took 0.203 seconds, and applying the classifier took 1.703 seconds. This resulted in 25 of 31 buildings correctly classified. For the Yongbyon image, training the classifier took 0.613 seconds and applying the classifier took 7.851 seconds.
This resulted in 21 of 27 building image objects correctly classified. The classification maps for these classifications are as follows:

Figure 18: RF no knowledge classification, Sohae.

Figure 19: RF no knowledge classification, Yongbyon.

4.1.6. Random Forest Classification with Knowledge in eCognition

The next iteration of random forest classifications used the with-knowledge feature space. For the Sohae image, training the classifier took 0.219 seconds and applying the classifier took 1.032 seconds, resulting in 25 of 31 correctly classified building image objects and 2 false positives. For the Yongbyon image, training the classifier took 0.765 seconds and applying it took 4.891 seconds. This resulted in 17 of 27 correctly classified building image objects and 6 false positives. The classification maps for the with-knowledge random forest classifications, and the accuracy assessment table for all classifications in this first analysis, are as follows:

Figure 20: RF with knowledge classification, Sohae.

Figure 21: RF with knowledge classification, Yongbyon.

Classification   TP  FP   TN  FN    CA      UA      PA      F
Sohae, RB, NK    25   0  100   6  0.9542  0.8065  1       0.4464
Sohae, RB, WK    25   0  100   6  0.9542  0.8065  1       0.4464
Sohae, NN, NK    25   1   99   6  0.9466  0.9615  0.9615  0.4808
Sohae, NN, WK    24   3   97   7  0.9237  0.7742  0.8889  0.4138
Sohae, RF, NK    25   0  100   6  0.9542  0.8065  1       0.4464
Sohae, RF, WK    25   2   98   6  0.9389  0.8065  0.9259  0.431
YB, RB, NK       13   1   99  14  0.8819  0.4815  0.9286  0.3171
YB, RB, WK       10   1   99  17  0.8583  0.3704  0.9091  0.2632
YB, NN, NK       21   8   92   6  0.8898  0.7778  0.7241  0.375
YB, NN, WK       17   3   97  10  0.8976  0.6296  0.85    0.3617
YB, RF, NK       21  13   87   6  0.8504  0.7778  0.6176  0.3443
YB, RF, WK       17   6   94  10  0.874   0.6296  0.7391  0.3399

Table 6: Knowledge incorporation comparison accuracy assessment.

Classification   Processing Time (seconds)
Sohae, NN, NK      2761.5
Sohae, NN, WK      1381.8
Sohae, RF, NK         1.906
Sohae, RF, WK         1.251
YB, NN, NK         6319.2
YB, NN, WK         2043.6
YB, RF, NK            8.464
YB, RF, WK            5.656

Table 7: Knowledge Incorporation Analysis Processing Times.

4.2. Software Comparison

4.2.1. Random Forest Classification in eCognition

For the random forest software comparison, I used the Sohae image classification without feature space reduction. These results do not differ from those found in the knowledge incorporation analysis of this study. Accordingly, the results, including the classification map and the accuracy assessment, may be found in Section 4.1.5 of this study.

4.2.2. Random Forest Classification in R

The random forest classifications in R are divided into three models, each with its own results. Prediction maps for each of the models were created based on their respective results. Model 1, the simple random forest model using pixel values, achieved an overall accuracy of 64% on the validation dataset. The error matrix for Model 1 is as follows:

              Building  Buildup  Vege_mount  Water
Building          3        1         0         0
Buildup           0        5         4         3
Vege_mount        0        1         8         0
Water             0        1         0         2

Table 8: Model 1 Accuracy Assessment

The prediction map of the classification for Model 1 may be seen below:

Figure 22: Prediction using Model 1 (z-value of pixel).

Model 2 achieved an overall accuracy of 93% on the validation dataset. This accuracy was achieved after several iterations of the model to determine the best choice of mtry value; the most appropriate value was 10.
The error matrix for Model 2 is as follows:

              Building  Buildup  Vege_mount  Water
Building          4        0         0         0
Buildup           0       12         0         0
Vege_mount        0        1         7         1
Water             0        0         0         3

Table 9: Model 2 Accuracy Assessment

Accordingly, the prediction map of the classification from Model 2 is seen below:

Figure 23: Prediction using RF Model 2 (RFsp with z).

For this model, I also calculated the variable importance, which is the proportion of the frequency of occurrences of the variable in the classification trees. The z value showed the greatest importance, with distance-based variables maintaining high significance as well. The following table shows the variables with an importance greater than 2.

Variable   Importance
z          17.0727495
layer.21    8.8608239
layer.23    5.8005842
layer.5     5.1880802
layer.22    4.9068297
layer.24    3.9597840
layer.13    3.5027885
layer.27    2.4581336

Table 10: Model 2 variables with importance greater than 2.

Lastly, Model 3 used only the distance variables for the random forest classification. It performed best on the validation dataset, producing an overall accuracy of 94%. The error matrix for Model 3 is as follows:

              Building  Buildup  Vege_mount  Water
Building          4        0         0         0
Buildup           0       12         0         0
Vege_mount        0        0         8         1
Water             0        0         0         3

Table 11: Model 3 Accuracy Assessment

Below is the prediction map of the classification produced with Model 3:

Figure 24: Prediction using RF Model 3 (RFsp, no z).

4.3. Spatial Resolution Comparison

The spatial resolution comparison analysis used the three classification algorithms from the first analysis on NAIP imagery at 1m and 3m resolution. This analysis used only the knowledge-based classification. Each of these classifications classified the entire scene as opposed to extracting the building class. The bulk of the 100 points used for assessing accuracy were either water or vegetation in the reference image, with much lower frequencies of the remaining classes.

4.3.1. Rule-Based Classification 1m

I conducted the first rule-based classification on the 1m version of the facility image. This required a total of 9 rules. The most accurately produced classes were "water" and "vegetation," whereas there was confusion, with the 5 "buildup" points classified as "building" and "vegetation." The higher accuracy in the "vegetation" and "water" classes is likely due to the greater number of samples in these classes relative to the remaining classes. The classification is seen in Figure 25, and its respective contingency table is seen below:

Figure 25: Rule-based classification, 1m

RB_1                       Reference Image
Classification   Water  Building  Buildup  Vegetation  Total
Water              22       0        0         0         22
Buildup             0       0        0         0          0
Building            0       0        1         0          1
Vegetation          0       0        4        73         77
Total              22       0        5        73        100

Table 12: Rule-based, 1m Accuracy Assessment

4.3.2. Rule-Based Classification 3m

The next rule-based classification was conducted on the 3m version of the image, for which I used the same 9 rules as the 1m rule-based classification. This classification achieved high accuracies in classifying water and vegetation but very low accuracy for built-up, which was confused entirely with vegetation. The classification map is seen in Figure 26, and its contingency table is as follows:

Figure 26: Rule-based classification, 3m

RB_3                       Reference Image
Classification   Water  Building  Buildup  Vegetation  Total
Water              22       0        0         0         22
Buildup             0       0        0         0          0
Building            0       0        0         0          0
Vegetation          0       0        8        70         78
Total              22       0        8        70        100

Table 13: Rule-based, 3m Accuracy Assessment
4.3.3. Nearest Neighbor Classification 1m

The nearest neighbor classification that I conducted on the 1m version of the facility image took approximately 45 minutes 2 seconds to complete. The water and vegetation classes achieved the highest accuracies, with the built-up and building classifications experiencing low accuracies, though with at least one correctly classified point in each class. The classification map for the nearest neighbor classification of the 1m image is seen in Figure 27, and its contingency table is as follows:

Figure 27: NN classification, 1m

NN_1                       Reference Image
Classification   Water  Building  Buildup  Vegetation  Total
Water              22       0        0         1         23
Buildup             0       0        1         0          1
Building            0       1        5         3          9
Vegetation          1       3        0        63         67
Total              23       4        6        67        100

Table 14: Nearest Neighbor 1m, Accuracy Assessment

4.3.4. Nearest Neighbor Classification 3m

For the second nearest neighbor classification, on the 3m version of the image, the algorithm took 20 minutes 9 seconds to complete. Vegetation and water again classified all respective points correctly, and all 7 built-up points were incorrectly classified as vegetation. No points retained building classifications. The classification map is in Figure 28, and the contingency table is seen below:

Figure 28: NN classification, 3m

NN_3                       Reference Image
Classification   Water  Building  Buildup  Vegetation  Total
Water              22       0        0         0         22
Buildup             0       0        0         0          0
Building            0       0        0         0          0
Vegetation          0       0        7        71         78
Total              22       0        7        71        100

Table 15: Nearest Neighbor 3m, Accuracy Assessment

4.3.5. Random Forest Classification 1m

The random forest classification of the 1m resolution image took approximately 0.52 seconds to train the classifier with the given samples and an additional 6.03 seconds to apply the classification to the image. The water and building classes retained the highest accuracy, correctly classifying all of their respective points in the reference image. The built-up class experienced an increase in correctly identified objects, with 3 of 6 points correctly classified, while vegetation experienced increased confusion among all other classes, though the majority of its points were correctly classified. The classification map is in Figure 29, and the contingency table for this random forest classification may be seen below:

Figure 29: RF classification, 1m

RF_1                       Reference Image
Classification   Water  Building  Buildup  Vegetation  Total
Water              22       0        0         3         25
Buildup             0       0        3         4          7
Building            0       1        0         6          7
Vegetation          0       0        3        58         61
Total              22       1        6        71        100

Table 16: Random Forest 1m, Accuracy Assessment

4.3.6. Random Forest Classification 3m

For the random forest classification of the 3m resolution image, training the classifier took 0.44 seconds and applying the classifier took 4.56 seconds. Water yielded the highest accuracy, similar to the other classifications, while built-up again yielded the lowest, with all of its points classified as vegetation. Vegetation retained its high accuracy, though with one point of confusion with the water class. The classification map and contingency table for this classification are as follows, followed by a discussion of these results:

Figure 30: RF classification, 3m

RF_3                       Reference Image
Classification   Water  Building  Buildup  Vegetation  Total
Water              22       0        0         1         23
Buildup             0       0        0         0          0
Building            0       0        0         0          0
Vegetation          0       0        7        70         77
Total              22       0        7        71        100

Table 17: Random Forest, 3m Accuracy Assessment

Classification   Processing Time (seconds)
NN, 1m             2702
NN, 3m             1209
RF, 1m                6.55
RF, 3m                5.00

Table 18: Spatial Resolution Analysis Classification Processing Times
4.3.7. Overall Results

Each analysis produced a range of accuracies, but to determine the best algorithm and parameters for this classification task, the algorithm with the best accuracy from each section was selected. The results show that for the first analysis, the most accurate algorithm and feature space combinations for the Sohae image were the rule-based algorithm with either feature set and the no-knowledge random forest classification. For the Yongbyon image, the highest accuracy was found in the random forest without knowledge classification. For the second analysis, Model 3 in R produced a slightly greater overall accuracy than the eCognition random forest classification of the Sohae image. Lastly, for the spatial resolution comparison, the 1m nearest neighbor and rule-based classifications produced the fewest incorrect classifications and thereby the highest accuracy of the classification-resolution combinations.

5. Discussion

The goal of this research was to determine whether expert knowledge could be used to improve classification accuracy for missile test sites through a process of feature reduction. To this end, four analyses were conducted to evaluate how classification method, spatial resolution, software, and the inclusion of expert knowledge affect the identification of buildings. This research was conducted in the face of some limitations and was restricted further by its delimitations.

5.1. Limitations

This study contained several limitations. The most prominent is the restriction to only two main study sites. Additionally, these study sites are in the DPRK, one of the more restricted nations in the world. The location of these study sites proved problematic for the range of imagery available for public use. Accordingly, I was restricted to the 3m Planet Labs imagery, which did not appear to provide adequate resolution for the knowledge incorporation classification comparisons. Additionally, I had no direct means of accessing the sites for potential ground truthing, if necessary. These images are susceptible to tampering as well, which remains a minor yet critical limitation if the research is to be used for intelligence or military applications.

5.2. Delimitations

I delimited this study in several ways, including the restriction of study sites to those in the DPRK for the initial analysis and to strictly the Michigan nuclear facility in the final analysis. The results of this study may accordingly apply only to these specific cases and not to others, despite the reusability between the DPRK and Michigan workflows. Furthermore, I limited the analysis to GEOBIA instead of incorporating pixel-based analysis methods, which may have produced different results for detecting the buildings in these images. Lastly, I limited the software comparison to only eCognition and R and limited this analysis to the application of random forest classification in the respective software. These results could accordingly be software- and algorithm-specific; expanding the options and combinations would likely produce different results.

5.3. Analysis 1 - Knowledge Incorporation Comparison

The results from the knowledge incorporation did not seem to confirm the hypothesis that simpler automated classification of buildings would be more successful than using the complete image feature information.
The feature space reduction did, however, reinforce the trade-off between timeliness and accuracy posed by previous work in the GEOBIA literature. Aside from the classifications themselves, the algorithms took much less time to compute with the knowledge-based feature space. This observation is consistent with Ma et al. (2015), who emphasize decreased computing time with a reduced feature space but also a reduced accuracy. This latter point, however, was not uniformly confirmed by this analysis.

Reducing the number of features to reflect the interpreter's analysis did not seem to affect the classifications for the Sohae image as much as the Yongbyon image. For the Sohae image, the number of true positives remained the same for both the with- and without-knowledge classifications, except for one fewer true positive in the with-knowledge nearest neighbor classification. The features used in the classification of this image already seemed to reflect those best suited to identifying buildings, since the building features (such as green and brightness values) tended to take much greater values in buildings for this image than in the Yongbyon image. The most notable difference in the feature space reduction for this image is the slight increase in false positives, which is likely due to some confusion between spectrally similar buildings and built-up areas in the image. The Yongbyon image, on the other hand, experienced more dramatic changes, consistent with the reduced accuracy found in previous literature (Ma et al. 2015; Yu et al. 2006). All three classification algorithms for the Yongbyon site classified fewer true positives when the feature space was reduced for knowledge incorporation. The number of false positives, however, either stayed the same or decreased, likely due in part to this increase in false negatives.

Based on the classification maps, both sites appeared to have greater misclassifications with the reduced feature space. Fewer features, as noted in Ma et al. (2015), give the algorithms fewer variables to consider in classifications, thereby likely leading to greater misclassification. All images from both sites appear much more speckled; built-up tends to be misclassified as vegetation in the Sohae image due to the spectral similarity of the mountainous terrain. In the Yongbyon image, the notable confusion apparent in the maps is the increase in water features throughout the classification. The image contains some sporadic water features that are spectrally similar to the vegetation in the image, which likely contributed to the sporadic water classifications throughout the vegetation class in the automated supervised algorithms. Despite this water misclassification, the knowledge incorporation refined the building features within the built-up features. The accuracy assessments do not appear to reflect this refinement, but the knowledge incorporation for both the nearest neighbor and random forest classifications of the Yongbyon image appears to better identify the buildings, as opposed to creating a large cluster of building-classified image objects that inevitably contains the buildings of interest.
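The trade-off described above can be sketched in R by training the same classifier on the full feature table and on the interpreter-reduced subset, then comparing out-of-bag error and training time. This is a minimal sketch, and the object table and its column names are hypothetical stand-ins, not the thesis's actual feature names:

```r
# Sketch, assuming "objects" is a data frame of image objects with a factor
# column "class" and numeric feature columns (hypothetical names below).
library(randomForest)

full_features    <- c("mean_blue", "mean_green", "mean_red", "mean_nir",
                      "brightness", "area", "asymmetry", "compactness")
reduced_features <- c("mean_green", "brightness")  # interpreter-derived cues

fit_timed <- function(features, data) {
  f <- reformulate(features, response = "class")
  elapsed <- system.time(model <- randomForest(f, data = data, ntree = 500))["elapsed"]
  list(model = model, seconds = unname(elapsed))
}

full    <- fit_timed(full_features, objects)
reduced <- fit_timed(reduced_features, objects)

# Fewer features should train faster; comparing out-of-bag error rates
# shows whether accuracy drops, as reported by Ma et al. (2015).
c(full = full$seconds, reduced = reduced$seconds)
full$model$err.rate[500, "OOB"]
reduced$model$err.rate[500, "OOB"]
```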
I chose to focus on only the "building" class for this analysis. This choice reflects that found in Marpu et al. (2008), which identified the most appropriate method of classifying an Iranian nuclear facility as creating two binary classes (class of interest and not class of interest) and classifying accordingly, an approach that produced favorable results. These classifications provide an indicator of the presence of buildings rather than identifying specific buildings and their extent, likely due to the image quality. Accordingly, a higher resolution image would provide more accurate results for the extent of the building class. Additionally, Castilla and Hay (2008) emphasize the requirement of multiple iterations of segmentation and classification to incorporate semantic meaning into image objects, and the results of this study appear to support their findings. A simple feature reduction appears to identify the locations of the classes of interest with a single iteration, though multiple iterations would likely have provided better results and more semantically meaningful image objects.

The dramatic difference between the Sohae and the Yongbyon images is likely due to the sheer size difference between the two and the heterogeneity of the Yongbyon image versus the Sohae image. The Sohae image contained 8,019 image objects and the Yongbyon image, 330,722 image objects, contributing to a large difference in image object size and overall image homogeneity. The image objects in the Sohae image, as a result, were more representative of visually interpreted objects and contained relatively similar and more distinguishable feature values. Accordingly, the user's accuracies for all of the Yongbyon classifications, regardless of feature space, were significantly lower than the Sohae user's accuracy for any classification. Additionally, all other measured accuracies were slightly lower in the Yongbyon image than the Sohae image.

5.4. Analysis 2 – Software Comparison
To my knowledge, a comparison between the R and eCognition random forest implementations had not previously been undertaken. The comparison here is further focused by restricting it to random forest classification. As mentioned in the previous discussion section, the results of the classification in eCognition yielded good accuracies with mostly true positives in classifying the buildings. For all three models constructed in R, all building points were correctly classified, though buildings represent a small share of the validation set. Each of the models in R also produced high overall accuracies. Though not the intent of this analysis, the results suggest that increasing the amount of image data used for classification may complicate the classification and lead to misclassifications. This argument reinforces the strength of feature reduction for identifying certain classes in an image.

Of the three models in the R software analysis, Model 2 produced the most accurate map. It generalized a fair portion of the built-up area in the image, similar to the eCognition analysis. This confusion likely lies in the similarity in spectral values of the built-up area and the vegetation-mountain class. eCognition appeared to better represent this built-up class, though with some errors of classification as "vegetation," since it does not create the large, clearly distinguishable groupings of the built-up area that the R classification does. The R analysis classified a large portion of the bottom-right of the image as "building," however, which the eCognition classification better identifies.
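On the R side of this comparison, the classification step reduces to predicting a label for every segmented image object from its feature table. A minimal sketch, again with the hypothetical "objects" table and feature names from above:

```r
# Sketch of an R-side workflow: train on labeled sample objects, then
# assign a class to every segmented image object. "objects" and its
# columns are hypothetical; unlabeled objects carry NA in "class".
library(randomForest)

train_set <- objects[!is.na(objects$class), ]
model     <- randomForest(class ~ mean_green + brightness,
                          data = train_set, ntree = 500)

# Predicted labels for all image objects; joining these back to the object
# polygons yields the classified map compared against eCognition's output.
objects$predicted <- predict(model, newdata = objects)
table(objects$predicted)
```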
5.5. Analysis 3 – Spatial Resolution Comparison
Per the predictions for the spatial resolution comparison of the Palisades Nuclear Facility image, the 1m resolution image provided better classification results across all three classification algorithms using the knowledge-based feature space. As was the issue in Maathuis (2003), where low-resolution imagery could not accurately classify individual landmines, the lower resolution imagery of the Palisades facility reduced accuracy. This observation supports that study's finding that higher resolution imagery – sub-meter resolution, as designated by Maathuis (2003) – is needed to properly identify smaller features in spaceborne imagery.

The lower accuracies achieved in the 3m resolution image correspond to the lower accuracies reflected in the knowledge incorporation comparison analysis of this study. The 3m resolution data did not classify "buildings" or any of the other classes as accurately as the 1m resolution imagery with the given rulesets and classification algorithms. The validation points in the 3m resolution imagery did not represent the building class in any instance, likely because the larger image objects placed buildings within objects of vegetation or built-up classification. Even for the built-up class, the 3m resolution imagery did not correctly classify a single built-up point from the reference image. Rather, these were all classified as vegetation, again likely due to the coarser segmentation merging built-up areas into vegetation image objects of similar spectral value. This may also be due to shadows in the built-up areas being improperly classified as vegetation due to their spatial proximity to the vegetation classes. Shadows being classified as vegetation was not an expected outcome of the hypotheses; this confusion may stem from the darkness of the vegetation in certain parts of the image. Some samples used for vegetation may have captured these dark spectral values, leaving no class more appropriate than "vegetation" for classifying the shadows. Creating a "shadows" class may mitigate this confusion.

Though the 3m resolution imagery did not yield high accuracy, the 1m resolution imagery was not perfect either; it yielded higher, though still not high, accuracy. The image objects in the 1m resolution image merged to represent buildings due largely to their high brightness values in this image. The finer image objects produced allowed an easier merging process that permitted image objects to be merged to correctly represent urban and artificial features. The 1m imagery better defined the different features and, with finer, more definable features, prevented much of the over-segmentation present in the 3m resolution image. Of the three classifications used on the 1m resolution imagery, the rule-based classification remained the only algorithm not to identify building features at any of the random points used for verification. The other algorithms identified buildings in one or more instances at the 1m resolution level but mostly misclassified these points as the vegetation class. This confusion is likely due to the spatial proximity of the classes and the spectral similarity of vegetation and some of the shadows present near buildings.
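For intuition about what the 3m data discards, a coarser image can be simulated from the finer one by block-averaging. The sketch below uses the raster package's aggregate() function; the file name is a hypothetical placeholder, and this illustrates the resolution contrast rather than the thesis's actual preprocessing:

```r
# Sketch: degrading a 1m multiband image to ~3m by averaging 3x3 pixel
# blocks, to illustrate the information loss behind the accuracy gap.
library(raster)

img_1m <- brick("palisades_1m.tif")               # hypothetical 1m, 4-band image
img_3m <- aggregate(img_1m, fact = 3, fun = mean)  # one 3m pixel per 3x3 block

res(img_1m)  # e.g. 1 1
res(img_3m)  # e.g. 3 3
# Small, bright rooftops spanning only a few 1m pixels are averaged into
# mixed 3m pixels, so segmentation can no longer isolate them as objects.
```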
The sample differences between the images may also have contributed to the different classification accuracies. As seen in the Methods section of this study, the samples for the 1m resolution imagery were much smaller and did not account for as much area as the 3m resolution samples. In attempting to transfer the TTA mask created from this 1m sampling, no accurate transfer was possible without large image objects being used as samples in the 3m resolution image. Accordingly, the 3m resolution samples likely accounted for largely different spectral values than the more refined 1m resolution samples. This likely contributed to the error seen in classifying buildings, in particular. The misclassification of the vegetation image objects in the 1m resolution imagery may result from the smaller image objects accounting for a more limited range of values and from the increased number of image objects in general due to the greater heterogeneity of the finer resolution image.

Both resolutions accurately classified non-urban features. The 1m random forest classification produced the lowest accuracy for classifying vegetation, however, with confusion among the water and built-up classes. The confusion may lie in the spectral similarity of shadows, darker water features, and darker vegetation, a problem which may potentially be solved with the addition of a shadows class. Water features were classified correctly at all identified points, except one point in the finer resolution nearest neighbor classification, likely due to over-segmentation or the spectral similarity of nearby image objects to water features.

Even though the 1m resolution image classification produced more favorable results than the 3m resolution image, the algorithms took longer to complete. For the rule-based classifications, I used the same 9 rules to classify both images, while the nearest neighbor classification took almost double the time for the 1m resolution image that it needed for the 3m resolution image. The random forest classification took about one second longer for the 1m resolution image classification. This difference in time reflects the greater amount of information present in the 1m resolution imagery. Following the segmentation, the 1m resolution image yielded a far greater number of image objects to classify than did the 3m resolution image. The larger image objects in the 3m resolution image further reduce the amount of information needed to classify the image, as each accounts for multiple image objects in the 1m resolution image.

Based on the classification maps, the lower resolution imagery appears to provide cleaner classifications than the 1m resolution imagery. This observation may be attributed to the far greater number of image objects in the finer resolution imagery. Additionally, the finer resolution imagery contains an additional "residential" class, which could not be classified in the 3m resolution imagery because spectral confusion led to the entire scene being classified as this class. As seen in the 1m resolution classifications, the "residential" class appears sporadically confused with the "vegetation" and "industrial building" classes. To a degree, these may display accurately, since residential buildings were represented by small image objects throughout the image; however, the similarity between the residential image objects and the surrounding "vegetation," "industrial," and "built-up" classes produced further confusion. The 3m classifications did not produce as much apparent noise as the 1m resolution classifications.
Accordingly, they appear to have identified specific buildings better than the 1m classifications, which produce a fair amount of noise for the "building" and "built-up" classes, again likely due to the spectral similarity between the "built-up" and "building" classes and the increased number of image objects. These 3m classifications, however, did produce unclassified image objects in the final classification. The rule-based 1m resolution classification appears to produce the most realistic classification, including the division of residential buildings, whereas the random forest classification at the same resolution produces a noise-filled classification with several apparent misclassifications.

5.6. Developments on Present Research
This research builds on and solidifies previous research regarding knowledge incorporation into GEOBIA applications, remote sensing observation of the DPRK, and military applications of remote sensing analysis. It extends previous research regarding the DPRK from a geographic perspective (Shim 2014). It additionally details the seldom-analyzed missile testing facility components specifically, which are often overshadowed by missile development and the DPRK missile program itself. Regarding remote sensing for strategic operations, this research contributes by confirming the need for high-resolution imagery to detect small features in images (Maathuis 2003). These analyses further show that remote sensing studies reflecting strategic operations may be conducted, if to a lesser degree, with publicly available data at little to no cost to the analyst.

There is a large body of research on the development of contextual, knowledge-incorporated image classification in the realm of GEOBIA. The research produced in this study, particularly the first analysis, builds on the assertion by MacFaden and O'Neil-Dunne (2015) that different rulesets will be necessary to properly classify different sites with contextual information. The comparison of different classification methods is similar to Belgiu and Dragut (2016), who compare supervised classification methods for classification accuracy. Additionally, the methods in this study may motivate further integration of eCognition with the R software, a pairing that, as far as I am aware, has not been directly compared in previous research.

6. Conclusion
In this research study, I aimed to identify the best approach for incorporating expert knowledge into classifications of DPRK missile testing facilities. The study compared three different perspectives and variables associated with image analysis. Accordingly, I divided the study into three analyses, one for each of these factors: knowledge incorporation based on feature reduction, software for classification, and spatial resolution of the image for classification. Though the specific analyses did not produce outstanding results, they still narrowed the algorithms in the study to a most accurate combination across the three analyses, based on the most accurate parameters from each. In several cases, I conclude that the different classification methods and combinations of analysis results could be used with some success in different scenarios. Based on the results provided above, however, I conclude that the best-performing combination from these analyses is a knowledge-based random forest classification using R and 1-meter spatial resolution.
Different applications of the research results would require different combinations of the best-performing variables. Overall, the intent of the study remained to determine to what extent publicly available data could support intelligence-like analyses of a politically isolated nation critical to government intelligence interests.

Several additional variables could be accounted for to improve and refine these results in future research. The use of strictly 3m resolution data for the DPRK classifications could be improved to sub-meter resolutions for likely better analysis and a more accurate representation of each of the algorithms and their potentials. Since I restricted the study to only DPRK sites, future research could expand this analysis to similarly rogue nations with nuclear and missile programs, namely the IRI, which has been the greater focus of many similar studies. The inaccessibility of such regions appears to be one of the only remaining obstacles to conducting similar studies with entirely public information.

In line with these suggestions, some additional factors could be tested to discover the best-fitting combination of variables for this political application of image classification. A study of different combinations of the results of this study may provide a more diversified analysis of the accuracies of different combinations of data, classification algorithms, knowledge incorporation, and software. For instance, a future study could take the most accurate approaches found in this study and test the three variables together, comparing accuracies across different combinations. Performing this combination again on the DPRK sites used in this study would also provide a more in-depth analysis of the research area and further refine the results of this research.

While this study focused primarily on GEOBIA applications, comparing the results of the different algorithms for building identification between pixel-based methods and GEOBIA methods would be an interesting analysis. The introduction of pixel-based analysis would furthermore introduce new software to the software comparison of this study, such as the ERDAS Imagine and ArcGIS programs, expanding this particular analysis. Additionally, higher and lower resolution images could be used for classification to determine the effects of increasing and decreasing the resolution for each of the classification algorithms. I would also like to compare the highly quantitative methods developed in the GEOBIA literature to highly qualitative methods to determine which approach provides the most accurate classifications. All of these analyses could be applied to the DPRK missile sites to advance the research done in this study in the interest of supporting policymaking and government decision-making.

I hope that the results of this study motivate continued geographic research in the DPRK. Little research has been conducted in the region that is available to the public.
While some may believe it impossible to conduct geographic research of the DPRK without classified government-level imagery, this study shows that such analysis may be conducted with publicly available assets and imagery at little cost to the individual. The DPRK missile program remains a highly significant issue in global affairs, particularly in its threats towards the US and its allies, both historically and recently. With imagery of the program becoming publicly available, it is critical that the public view the program on the ground to truly understand it and the DPRK's capabilities, so as to avoid inappropriate political moves or gestures. Using the ever-advancing fields of remote sensing and GEOBIA and the results of this study, the public may be able to make further developments and refinements toward an accurate representation highlighting the buildings in DPRK missile testing facilities, improving public perception of the program and its developments as they occur, using entirely open-source data.

APPENDICES

APPENDIX A: Abbreviations
- CONAN: Contextual Analysis
- CTBT: Comprehensive Test Ban Treaty
- DPRK: The Democratic People's Republic of Korea (North Korea)
- FMLE: Fuzzy Maximum Likelihood Estimation
- GEOBIA: Geographic Object-Based Image Analysis
- IC: Intelligence Community
- ICBM: Intercontinental Ballistic Missile
- ID: Identification
- IRI: Islamic Republic of Iran
- MAD: Multivariate Alteration Detection
- NPT: Nuclear Nonproliferation Treaty
- NTI: Nuclear Threat Initiative
- OWL: Ontology Web Language
- ROK: The Republic of Korea (South Korea)
- SEaTH: Separability and Threshold
- SLBM: Submarine Launched Ballistic Missile
- US: The United States of America
- USGS: United States Geological Survey
- USSR: Union of Soviet Socialist Republics (Soviet Union)
- WMD: Weapons of Mass Destruction

APPENDIX B: Sohae Ruleset

APPENDIX C: Yongbyon Ruleset

APPENDIX D: Palisades Nuclear Energy Facility Ruleset

REFERENCES

38 North. 2018. 38 North. https://www.38north.org/.

Agnew, John, Thomas W. Gillespie, Jorge Gonzalez, and Brian Min. 2008. "Baghdad Nights: Evaluating the US Military 'Surge' Using Nighttime Light Signatures." Environment and Planning 40: 2285-2295.

Albright, David, and Paul Brannan. 2007. The North Korean Plutonium Stock, February 2007. Institute for Science and International Security.

Arvor, Damien, Laurent Durieux, Samuel Andrés, and Marie-Angélique Laporte. 2013. "Advances in Geographic Object-Based Image Analysis with Ontologies: A Review of Main Contributions and Limitations from a Remote Sensing Perspective." ISPRS Journal of Photogrammetry and Remote Sensing 125-137.

Baatz, M, C Hoffmann, and G Willhauck. 2008. "Progressing from Object-Based to Object-Oriented Image Analysis." In Object-Based Image Analysis: Spatial Concepts for Knowledge-Driven Remote Sensing Applications, by T Blaschke, S Lang and G J Hay, 29-42. Berlin: Springer-Verlag Berlin Heidelberg.

Baatz, M., and A. Schäpe. 2010. Multiresolution Segmentation: An Optimization Approach for High Quality Multi-Scale Image Segmentation. http://www.agit.at/papers/2000/baatz_FP_12.pdf.

Bazeley, Pat, and Kristi Jackson. 2013. Qualitative Data Analysis with NVIVO. 2nd. Los Angeles: SAGE Publications Inc.

Beck, Richard A. 2003. "Remote Sensing and GIS as Counterterrorism Tools in the Afghanistan War: A Case Study of the Zhawar Kili Region." The Professional Geographer 170-179.

Belgiu, Mariana, and Lucian Dragut. 2016.
"Random Forest in Remote Sensing: A Review of Applications and Future Directions." ISPRS Journal of Photogrammetry and Remote Sensing 24-31. Belgiu, Mariana, Barbara Hofer, and Peter Hofmann. 2014. "Coupling Formalized Knowledge Bases with Object-Based Image Analysis." Remote Sensing Letters 530-538. Benz, Ursula C, Peter Hofmann, Gregor Willhauck, Iris Lingenfelder, and Markus Heynen. 2004. "Multi-resolution, object-oriented fuzzy analysis of remote sensing data for GIS-ready information." ISPRS Journal of Photogrammetry and Remote Sensing 239-258. Bhaskaran, Sunil, Shanka Paramananda, and Maria Ramnarayan. 2010. "Per-pixel and Object- Oriented Classification Methods for Mapping Urban Features Using Ikonos Satellite Data." Applied Geography 650-665. 97 Bianchetti, Raechel, and Alan MacEachren. 2015. "Cognitive Themes Emerging from Air Photo Interpretation Texts Published to 1960." ISPRS International Journal of Geo-Information 551-571. Blaschke, Thomas, Geoffrey J Hay, Maggi Kelly, Stefan Lang, Peter Hofmann, Elisabeth Addink, Raul Queiroz Feitosa, et al. 2014. "Geographic Object-Based Image Analysis – Towards a new paradigm." ISPRS Journal of Photogrammetry and Remote Sensing 180-191. Breiman, Leo. 2001. "Random Forests." Machine Learning 5-32. Broad, WJ, D Jehl, DE Sanger, and T Shanker. 2005. "North Korea Nuclear Goals: Case of Mixed Signals." New York Times. Castilla, G, and G J Hay. 2008. "Image Objects and Geographic Objects." In Object-Based Image Analysis: Spatial Concepts for Knowledge-Driven Remote Sensing Applications, by T Blascke, S Lang and G J Hay, 91-110. Berlin: Springer-Verlag Berlin Heidelberg. Chen, W, X. Li, Y. Wang, and G. Liu, S. Chen. 2014. "Forested Landslide Detection Using LIDAR Data and the Random Forest Algorithm: A Case Study of the Three Gorges, China." Remote Sensing of Environment 291-301. Chung, Samman. 2016. "North Korea's Nuclear Threats and Counter-Strategies." The Journal of East Asian Affairs 83-131. Cloud, John G, and Keith C Clarke. 1999. "Through a Shutter Darkly: The Tangled Relationship Between Civilian, Military, and Intelligence Remote Sensing in the Early U.S. Space Program." In Secrecy and Knowledge Production, by Judith Reppy, 36-56. Cornell University Peace Studies Program. Diamond, John M. 2001. "Re-Examining Problems and Prospects in U.S. Imagery Intelligence." International Journal of Intelligence and Counterintelligence 1-24. Dragut, Lucian, and Thomas Blaschke. 2006. "Automated Classification of Landform Elements Using Object-Based Image Analysis." Geomorphology 330-344. Glade, David. 2000. Unmanned Aerial Vehicles: Implications for Military Operations. Unclassified Military Report, Maxwell AFB, AL: Air University Press. Gu, H. Y., H. T. Li, L. Yan, and X. J. Lu. 2015. "A Framework for Geographic Object-Based Image Analysis (GEOBIA) Based on Geographic Ontology." The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 21-23. Gupta, Vipin, and Frank Pabian. 1997. "Investigating the Allegations of Indian Nuclear Test Preparations in the Rajasthen Desert." Science & Global Security 101-188. Hay, G.J., and G. Castilla. 2008. "Geographic Object-Based Image Analysis (GEOBIA): A New Name for a New Discipline." In Object-Based Image Analysis: Spatila Concepts for Knowledge- Driven Remote Sensing Applications, by Th. Blaschke, S. Lang and G.J. (Eds.) Hay, 74-89. Berlin, Heidelberg: Springer. 98 Hitchings, Sean. 2003. "Policy Assessment of the Impacts of Remote-Sensing Technology." Space Policy 119-125. 
Hsieh, Hsiu-Fang, and Sarah E Shannon. 2005. "Three Approaches to Qualitative Content Analysis." Qualitative Health Research 1277-1288.

Johnson, Brian, and Zhixiao Xie. 2013. "Classifying a High Resolution Image of an Urban Area using Super-object Information." ISPRS Journal of Photogrammetry and Remote Sensing 40-49.

Kim, Sung Chull, and Michael D Cohen. 2017. North Korea and Nuclear Weapons: Entering the New Era of Deterrence. Washington DC: Georgetown University Press.

Kim, Won-Young, and Paul G. Richards. 2007. "North Korean Nuclear Test: Seismic Discrimination Low Yield." Eos 158-161.

Kit, Oleksandr, and Matthias Lüdeke. 2013. "Automated Detection of Slum Area Change in Hyderabad, India using Multitemporal Satellite Imagery." ISPRS Journal of Photogrammetry and Remote Sensing 130-137.

Krtalic, A. 2016. "Analysis of the Segmented Features of Indicator of Mine Presence." The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 12-19.

Lexical Computing CZ. 2019. Sketch Engine. Mikulov, CZ.

Liedtke, C.-E, J Buckner, O Grau, S Growe, and R Tonjes. 1997. "AIDA: A System for the Knowledge Based Interpretation of Remote Sensing Data." Third International Airborne Remote Sensing Conference and Exhibition. Copenhagen.

Ma, Lei, Liang Cheng, Manchun Li, Yongxue Liu, and Xiaoxue Ma. 2015. "Training Set Size, Scale, and Features in Geographic Object-Based Image Analysis of Very High Resolution Unmanned Aerial Vehicle Imagery." ISPRS Journal of Photogrammetry and Remote Sensing 14-27.

Maathuis, B.H.P. 2003. "Remote Sensing Based Detection of Minefields." Geocarto International 51-60.

MacFaden, Sean, and Jarlath O'Neil-Dunne. 2015. "A Tool for the Automated Detection of Damaged Transportation Infrastructure." ASPRS Annual Conference. Tampa.

Marpu, P R, I Niemeyer, S Nussbaum, and R Gloaguen. 2008. "A Procedure for Automatic Object-Based Classification." In Object-Based Image Analysis: Spatial Concepts for Knowledge-Driven Remote Sensing Applications, by T Blaschke, S Lang and G J Hay, 169-184. Berlin: Springer-Verlag Berlin Heidelberg.

Maxwell, A.E., T.A. Warner, M.P. Strager, J.F. Conley, and A.L. Sharp. 2015. "Assessing Machine-Learning Algorithms and Image- and Lidar-derived Variables for GEOBIA Classification of Mining and Mine Reclamation." International Journal of Remote Sensing 954-978.

McTavish, Donald G, and Ellen B Pirro. 1990. "Contextual Content Analysis." Quality and Quantity 245-265.

Millard, K., and M. Richardson. 2015. "On the Importance of Training Data Sample Selection in Random Forest Image Classification: A Case Study in Peatland Ecosystem Mapping." Remote Sensing 8489.

Ming, Dongping, Jonathan Li, Junyi Wang, and Min Zhang. 2015. "Scale Parameter Selection by Spatial Statistics for GEOBIA: Using Mean-Shift Based Multi-Scale Segmentation as an Example." ISPRS Journal of Photogrammetry and Remote Sensing 28-41.

Niemeyer, I, P R Marpu, and S Nussbaum. 2008. "Change Detection using Object Features." In Object-Based Image Analysis: Spatial Concepts for Knowledge-Driven Remote Sensing Applications, by T Blaschke, S Lang and G J Hay, 185-202. Berlin: Springer-Verlag Berlin Heidelberg.

Niemeyer, I., and S. Nussbaum. n.d. "Automation of Change Detection Procedures for Nuclear Safeguards-Related Monitoring Purposes." Global Monitoring for Security and Stability Network of Excellence.

Nuclear Threat Initiative. 2018. North Korea. http://www.nti.org/learn/countries/north-korea/facilities/.

Nussbaum, S., I. Niemeyer, and M. J. Canty. 2006.
"SEATH - A New Tool for Automated Feature Extraction in the Context of Object-Based Image Analysis." O'Neil-Dunne, Jarlath P.M., Sean MacFaden, and Keith C Pelletier. 2011. "Incorporating Contextual Information into Object-Based Image Analysis Workflows." ASPRS 2011 Annual Conference. Milwaukee. Ozeki, Masaru, and Kosuke Heki. 2010. "Ionospheric Holes made by Ballistic Missiles from North Korea Detected with a Japanese Dense GPS Array." Journal of Geophysical Research 115. Pal, M., and P.M. Mather. 2005. "Support Vector Machines for Classifciation in Remote Sensing." International Journal of Remote Sensing 1007-1011. Perkins, Chris, and Martin Dodge. 2009. "Satellite Imagery and the Spectacle of Secret Spaces." Geoforum 546-560. Planet Labs. 2019. Planet Labs Imagery & Archive. Mountain View, CA. Pollack, Jonathan D. 2003. The United States, North Korea, and the End of the Agreed Framework. US Naval War College. Postol, Theodore, and Markus Schiller. 2016. "The North Korean Missile Program." Korea Observer 751-805. Radoux, J, P Bogaert, D Fasbender, and P Defourney. 2011. "Thematic Accuracy Assessment of Geographic Object-Based Image Classification." International Journal of Geographic Information Science 895-911. 100 Sachdov, A.V. 2000. "North Korea's Missile Programme: A Matter of Concern." Strategic Analysis 1695-1707. Saldaña, Johnny. 2013. The Coding Manual for Qualitative Researchers. 2nd. Los Angeles: SAGE Publications Inc. Samad, Tariq, John S. Bay, and Datta Godbole. 2007. "Network-Centric Systems for Military Operations in Urban Terrain: The Role of UAVs." Institute of Electrical and Electronics Engineers. Satyanarayana, P, and S. Yogendran. 2013. "Military Applications of GIS." IIC Technologies Private Limited, Hyderabad. Schlittenhardt, J, M Canty, and I Grunberg. 2010. "Satellite Earth Observations Support CTBT Monitoring: A Case Study of the Nuclear Test in North Korea of Oct. 9, 2006 and Comparison with Seismic Results." Pure and Applied Geophyics 601-618. Shim, David. 2014. "Remote Sensing Place: Satellite Images as Visual Spatial Imaginaries." Geoforum 51: 152-160. Shippert, Peg. n.d. Introduction to Hyperspectral Image Analysis. Research Systems, Inc. Squassoni, Sharon A. 2005. North Korea Nuclear Weapons: How Soon an Arsenal? CRS Report for Congress, Congressional Research Service. Stanford Center for Biomedical Informatics Research. 2016. Protégé. Stanford, CA. Taubenböck, H., T. Esch, M. Wurm, A. Roth, and S. Dech. 2010. "Object-Based Feature Extraction Using High Spatial Resolution Satellite Data of Urban Areas." Journal of Spatial Science 117- 132. The R Foundation. 2019. R version 3.5.3. Murray Hill, NJ. Torres-Sánchez, J., F. López-Granados, and J.M. Peña. 2015. "An Automatic Object-Based Method for Optimal Thresholding in UAV Images: Application for Vegetation Detection in Herbaceous Crops." Computers and Electronics in Agriculture 43-52. Trimble. 2019. eCognition Developer 9. Sunnyvale, CA. Tuxen, K., and M. Kelly. 2008. "Multi-Scale Functional Mapping of Tidal Marsh Vegetation Using Object-Based Image Analysis." In Object-Based Image Analysis: Spatial Concepts for Knowledge-Driven Remote Sensing Applications, by Th. Blaschke, S. Lang and G.J. Hay, 415- 442. Berlin Heidelberg: Springer-Verlag. United States Department of Agriculture. 2015. National Agriculture Imagery Program. Salt Lake City, UT. United States Geological Survey. 2019. Earth Explorer. Washington DC. 101 Wang, H., Y. Zhao, R. Pu, and Z. Zhang. 2015. 
"Mapping Robinia Pseudoacacia Forest Health Conditions by Using Combined Spectral, Spatial, and Texural Information Extracted from IKONOS Imagery and Random Forest Classifier." Remote Sensing 9020. Watts, A.C., L.N. Kobziar, and H.F. Percival. 2009. "Unmanned Systems for Wildland Fire Monitoring and Research." 24th Tall Timbers Fire Ecology Conference. Tallahasee. 86-90. WC3 Web Ontology Working Group. 2004. OWL2. Weinberger, Kilian Q, John Blitzer, and Lawrence K Saul. 2006. "Distance Metric Learning for Large Margin." Advances in Neural Information Processing Systems. Wharton, Stephen W. 1982. "A Contextual Classification Method for Recognizing Land Use Patterns in High Resolution Remotely Sensed Data." Pattern Recognition 317-324. Witmer, Frank D.W. 2015. "Remote Sensing of Vioelnt Conflict: Eye from Above." International Journal of Remote Sensing 2326-2352. Yu, Qian, Peng Gong, Nick Clinton, Greg Biging, Maggi Kelly, and Dave Schrikauer. 2006. "Object- Based Detailed Vegetation Classification with Airborne High Spatial Resolution Remote Sensing Imagery." Photogrammetric Engineering & Remote Sensing 799-811. Yue, Peng, Liping Di, Yaxing Wei, and Weiguo Han. 2013. "Intelligent Services for Discovery of ISPRS Journal of Complex Geospatial Features from Remote Sensing Photogrammetry and Remote Sensing 151-164. Imagery." 102