This is to certify that the dissertation entitled

PLANNING AND CONTROL OF MOBILE SURVEILLANCE NETWORKS

presented by

AMIT YOGESH GORADIA

has been accepted towards fulfillment of the requirements for the Ph.D. degree in ELECTRICAL AND COMPUTER ENGINEERING.

Major Professor's Signature

Date

MSU is an Affirmative Action/Equal Opportunity Institution

PLANNING AND CONTROL OF MOBILE SURVEILLANCE NETWORKS

By

Amit Goradia

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Electrical and Computer Engineering

2006

ABSTRACT

Planning and Control of Mobile Surveillance Networks

By Amit Goradia

Pervasive surveillance implies the continuous tracking of multiple targets as they move about the monitored region so that they do not leave the field of view of the sensors observing them and maintain discernable resolution for feature identification. The tasks to be performed by a surveillance system are expressed as the following requirements: (1) Automatically track the identified targets over the region being monitored; (2) Coordinate tasking of multiple sensors to maintain target visibility and execute other requested tasks; (3) Provide concise feedback and video data of a tracked target to multiple operators. Due to the inherent complexity of large networked systems and the diversity of the sensors used, a major challenge for such large scale networked systems is to design an efficient modeling and analysis tool and devise stable control algorithms for accomplishing the surveillance task.

The ability to track multiple targets simultaneously using sensors with motion capability is an important aspect of surveillance systems. Current feature-point-based visual surveillance and tracking techniques generally employed do not provide an adequate framework to express the surveillance task of tracking multiple targets simultaneously using a single sensor. This dissertation presents the method of Hausdorff tracking, which can express the surveillance task succinctly and track multiple targets simultaneously using a single sensor. Hausdorff tracking can readily express the surveillance tasks of maintaining the visibility and adequate resolution of a target as the minimization of an error (shape function) and accomplish the tracking task using feedback directly from the acquired image. Mutational equations are used to represent the motion of the target sets with respect to the motion of the camera. Using the mutational dynamics model, a feedback map to control the sensor motion module for accomplishing the surveillance task can be developed. Further, in order to take advantage of the redundancy in the motion control, an optimal Hausdorff tracking framework is presented.
Despite the limited sensing capability and range of the individual sensors, a surveillance network can track targets over a large region by transferring the target-tracking task from one group of sensors to another based on the motion of the target. Mutational analysis and shape-based control fail to capture the discrete switching nature of the surveillance task of tracking the target using multiple switched sensors. This dissertation presents a mutational hybrid automata (MHA) model for such pervasive surveillance networks that retains the advantages of using mutational equations for modeling the active surveillance task while also being able to model the discrete switching between various sensors. Analysis of example pervasive surveillance scenarios modeled using the MHA model and experimental results that verify the proposed approach are presented.

The active sensors needed to track the target keep changing due to target motion. Hence the concise video feedback provided to the operator to assist decision-making needs to be switched to the sensors currently involved in the tracking task. A task-based metric representing the number of cameras required to track a moving target over a monitored region is presented. This metric (in conjunction with other metrics) can be used to optimally place cameras in a monitored environment. An optimized camera selection algorithm for selecting the minimum number of cameras to track a moving target over a large area is also presented.

A surveillance test-bed has been designed based on these requirements and the algorithms and subsystems described have been implemented on it. Results of the experimental implementation validate the proposed approaches.

To my parents and Prerna...

ACKNOWLEDGEMENTS

I would like to thank my advisor, Dr. Ning Xi, without whom this would not have been possible. I would like to thank Dr. Matt Mutka for providing acute insights during our numerous discussion sessions. I would like to thank all my other guidance committee members, Dr. Hassan Khalil, Dr. Erik Goodman and Dr. Gowei Wei, who have devoted many hours to the committee meetings. Their insightful comments and suggestions have enhanced the technical soundness of this dissertation. I would also like to express my deepest gratitude to the Dean, Dr. Satish Udpa, and Dr. Subir Biswas for their support and advice through the course of my career.

I am grateful to my friends and colleagues from the Robotics and Automation Laboratory. The discussions with Zhiwei Cen, Dr. Yu Sun, Mike (Quan) Shi, Yang Liu, Dr. Yantao Shen, Long Pan, Dr. Jindong Tan, Dr. Imad Elhajj, Dr. Guangyong Li, Uchechukwu Wejinya, and Jiangbo Zhang have substantially contributed to my work and broadened my knowledge. I would like to express my great appreciation to Clayton Haffner, Michael Huntwork, Mat Prokos and Craig Pomeroy for the endless hours of painstaking work they have put in at the Robotics and Automation Laboratory.

My thanks go to my family, especially my fiancee Prerna and my parents. This dissertation would not have been possible without their years of encouragement and continuous support.

TABLE OF CONTENTS

LIST OF FIGURES ix
LIST OF TABLES xii
1 Introduction 1
1.1 Background ................................ 1
1.2 Challenges and Problems in Surveillance Networks ........... 5
1.3 Literature Review .............................
7 1.4 Objectives of this Research ........................ 12 1.5 Major Contributions of this Dissertation ................ 13 1.6 Organization of this Dissertation .................... 15 2 Mathematical Background 17 2.1 Introduction ................................ 17 2.2 Mutational Equations ........................... 19 2.2.1 Evolution of a Tube ........................ 20 2.2.2 Controlled Mutational Equations ................ 21 2.3 Hybrid Dynamic Systems ......................... 22 2.4 Mutational Hybrid Dynamic Systems .................. 24 2.5 Shape Analysis .............................. 25 2.5.1 Shape Functions ......................... 25 2.5.2 Directional Derivative of Shape Functions ........... 26 2.6 Chapter Summary ............................ 27 3 Modeling Mobile Surveillance Networks 28 3.1 Introduction - Example Surveillance Scenario .............. 28 3.2 Controlled Mutational Equation Model for Camera Sensor and Target 33 3.2.1 Perspective Projection Model .................. 33 3.2.2 World Space Model ........................ 34 3.3 Incorporating Non-holonomic Sensor Constraints ............ 37 3.4 Hausdorff Tracking ............................ 39 3.4.1 Image-Based Hausdorff Tracking ................. 41 3.4.2 Cooperative Hausdorff Tracking ................. 46 vi C." C." C." C." .r. K." L' kw! C? C." C." I]! w 3.5 Mutational Hybrid Automata Model for Pervasive Surveillance Networks 50 3.5.1 Mutational Hybrid Automata Model .............. 50 3.5.2 Controller Design for Individual Cameras ............ 54 3.6 Chapter Summary ............................ 55 Analysis of Surveillance Networks 57 4.1 Lyapunov Theorem for Shape Space ......... . .......... 57 4.2 Stability Analysis of Hausdorff Tracking ................ 60 4.3 Analysis of Example Surveillance System using Mutational Hybrid Au- tomata Model ............................... 61 4.3.1 Two-Camera Example for Pervasive Surveillance with Contin- uous Servoing Cameras ...................... 62 4.3.2 Stability of the Two-Camera Example System ......... 70 4.3.3 Two-Camera Example for Pervasive Surveillance with Controlled Switches .............................. 74 4.3.4 Stability Analysis of the Two Camera Example with Controlled Switches .............................. 79 4.4 Chapter Summary ............................ 83 Design of Surveillance Networks 84 5.1 Wide Area Camera Calibration ..................... 85 5.1.1 Unitless Calibration ........................ 85 5.1.2 Method for Assigning Units ................... 88 5.2 Grouping Architectures .......................... 88 5.3 Sensor Allocation for Multi-Sensor Cooperation ............ 91 5.4 Optimal Hausdorff Tracking ....................... 93 5.4.1 Framework for Optimal Hausdorff Tracking Design ...... 94 5.5 Switched Video Feedback ......................... 106 5.5.1 Characteristics and Implementation Details for MJPEG . . . . 110 5.5.2 Characteristics and Implementation Details for H.261 ..... 110 5.5.3 Characteristics and Implementation Details for H.263 ..... 111 5.5.4 Characteristics of H.264 / MPEG4~Part10 ........... 112 5.6 Sensor Node Architecture ........................ 113 5.7 Modeling Switched Video Feedback as a Dynamic System ....... 114 5.7.1 Examples of Video Feedback Algorithms - Best Resolution . . 116 5.7.2 Example Video Feedback Algorithm - Persistent Camera . . . 117 5.7.3 Assessment Metric for Video Switching Based on Camera Lo- cations ............................... 
117 5.8 Optimization for Target Tracking using Switched Sensors ....... 119 5.8.1 Optimal Control and Dynamic Programming .......... 119 5.8.2 Dynamic Programming Algorithm Implementation ...... 121 vii 5.9 Chapter Summary ............................ 121 6 Experimental Implementation and Testing 124 6.1 Experimental Test—bed Setup ...................... 124 6.1.1 Hardware Setup .......................... 126 6.2 Software Setup .............................. 127 6.3 Experimental Results for Wide Area Camera Calibration ....... 128 6.4 Experimental Implementation for Image-Based Hausdorff Tracking . . 131 6.4.1 Tracking a Single Target ..................... 131 6.4.2 Tracking Multiple Targets Simultaneously ........... 133 6.5 Experimental Implementation for Cooperative Hausdorff Tracking . . 136 6.6 Experimental Implementation for Redundancy Resolution using Opti- mal Hausdorff Tracking .......................... 141 6.7 Experimental Results for Multisensor Cooperation ........... 146 6.7.1 Experimental Evaluation for the Sensor Allocation Algorithm 146 6.7.2 Experimental Evaluation of Multiple Target Tracking using Mul- tiple Formation-Constrained Sensors .............. 148 6.7.3 Experimental Evaluation for Multi-Sensor Cooperation using Mutational Hybrid Automata Model .............. 155 6.8 Performance Analysis of the Video Subsystem ............. 162 6.8.1 Performance Analysis for Capture and Processing Time . . . . 164 6.8.2 Comparison for Video Size and Image Quality ......... 165 6.8.3 Performance Analysis for Video Bitrate and Frame Rate . . . 165 6.8.4 Performance Analysis for Server Switching and Camera Handoff 166 6.8.5 Performance Metric Evaluation for Various Surveillance Net- work Tasks ............................ 168 6.9 Chapter Summary ............................ 169 7 Conclusions 170 7.1 Conclusions ................................ 170 7.2 Future Research Directions ........................ 176 Bibliography 181 viii 1.1 3.1 3.2 3.3 3.4 3.5 3.6 4.1 4.2 4.3 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 LIST OF FIGURES Pervasive Surveillance Scenario ..................... 2 3 Camera Pervasive Surveillance Scenario ................ 29 Modeling Pervasive Surveillance Networks ............... 30 Target and coverage set for image-based Hausdorff tracking. ..... 42 Block diagram for image-based Hausdorff tracking. .......... 46 Target and coverage set 2D for cooperative Hausdorff tracking. . . . . 48 Target and coverage set for 3D cooperative Hausdorff tracking. . . . . 48 Example pervasive surveillance scenario with 2 cameras ........ 62 Graphical MHA model for the two-camera scenario .......... 65 Graphical MHA model for the two-camera scenario with controlled switches .................................. 76 The architecture of the optimal task distribution scheme ........ 95 The structure of the task analyzer module ................ 97 The structure of the optimization scheduling module. ......... 99 Class functions for physical programming. ............... 102 Modification of the preference mapping by confidence factor. ..... 104 Switched video architecture implementation .............. 107 H.624 Based Switched Video Streaming Client .............. 109 Architecture of Sensor Node ........................ 113 Generating Graph for the Camera Switching Strategy ......... 122 ix 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9 6.10 6.11 6.12 6.13 6.14 6.15 6.16 6.17 6.18 6.19 Surveillance network implementation over an IP network ........ 
Surveillance test-bed designed along with the cameras and the motion modules for camera motion. ....................... Sony EVI—D30 mounted on Robot Nomad XR-4000 .......... Active Cameras: Sony EVI-D30 and dual head Hitachi KP-D50 Camera Setup: Location and pointing direction of the camera setup. . Image-based Hausdorff tracking using active camera .......... Image-based Hausdorff tracking using active camera: Maintaining ad- equate target resolution ......................... Tracking multiple targets using Image-based Hausdorff tracking Image-based Hausdorff tracking for multiple targets: Estimated posi- tion of targets ............................... Image sequence for tracking multiple targets using Image-based Haus- dorff tracking ................................ Position of robot and target set on x and y axes ............. Motion of target set and robot on x-z plane. The view cone of the robot is shown at a particular instance in time with the position of the target centroid shown as well. ...................... Shape functions for cooperative and image-based Hausdorff tracking . Robots View of target at several points in time to show several condi- tions that the tracking system responds to ................ Target tracking using a Sony EVI-D30 camera mounted on Robot No- mad XR—4000 mobile robot ........................ Results of the tracking task with optimal task distribution using phys- ical programming .............................. The energy consumption for no optimization, and weighted sum. . . The pan angle for no optimization, and weighted sum. ........ Estimated position of targets for multisensor cooperation using sensor allocation algorithm ............................ 125 126 127 130 132 135 136 137 138 139 140 141 143 144 147 6.20 6.21 6.22 6.23 6.24 6.25 6.26 6.27 6.28 6.29 6.30 Experiment setup for multi-target tracking using multiple cameras. . 150 Target tracks for multi-target tracking using multiple cameras ..... 151 Experiment setup for multi-target tracking using multiple cameras. . 152 Experiment setup for multi—target tracking using multiple cameras. . 153 Image Sequences taken while tracking the two targets with two formation- constrained cameras ........................... 154 X-Y Plot for target motion ........................ 155 Current MHA mode, shape function, velocity, pan angle and target visibility for Caml ............................ 157 Current MHA mode, shape function, velocity, pan angle and target visibility for Camg ............................ 158 Image Sequences taken while tracking the target over different modes 159 Current MHA mode, shape function, velocity, pan angle and target visibility for Caml ............................ 160 Current MHA mode, shape function, velocity, pan angle and target visibility for Camg ............................ 161 xi 6.1 6.2 6.3 6.4 6.5 LIST OF TABLES Scenarios and gain values ......................... 164 Capture and Compression Time of the Two Schemes ......... 164 Frame Rate and Bitrate for MJPEG, H.261 and H.263 ........ 165 Server Switching Time for Monitoring and Tracking Tasks ...... 167 Metric for Comparison of Various Schemes ............... 168 xii Ir. 1.] A $01 ("mm Df‘l Wt POHsj 3mm. (‘lffaxj In in» mm: .\'. ”‘19) T, SWIM; Infra}. I AIL“ Opfifay and 5‘ Chapter 1 Introduction 1 . 1 Background A sensor network is a collection of individual nodes with each node having sensing, communication, computation and motion capabilities. 
Distributed wireless sensor networks have recently emerged as an important research area. Sensor networks consist of a variety of different types of sensors in distributed areas. Technological advances in wireless networking and distributed robotics have been leading to in- creasing research on distributed sensing applications using wireless sensor networks. Infrastructure-less surveillance and monitoring are important. applications of such rapidly deployable sensor networks. Networked surveillance systems provide an extended perception and distributed reasoning capability in monitored environments through the use of multiple networked sensors. Various types of sensors with varying sensing modalities such as cameras, infrared detector arrays, laser rangefinders, omnidirectional acoustic sensors etc., can be instantly deployed in hostile environments, inaccessible terrain and disaster relief operations to obtain vital reconnaissance information about the area being surveyed. Locomotion and active sensing on individual nodes can greatly increase the range and sensing capability of the network. Such instantly deployable, wireless, mobile, Unmanned Aerial Vehicles Figure 1.1: Pervasive Surveillance Scenario distributed sensor networks networks find myriad civilian and military applications, such as battle field surveillance, environment monitoring, scene reconstruction, mo- tion detection and tracking, remote sensing and global awareness. There is a growing interest in surveillance applications due to the growing avail- ability of sensors and processors at a reasonable cost. There is also a growing need from the public for improved safety and security in large urban environments and improved usage of resources of public infrastructure. This, in conjunction with the increasing maturity of algorithms and techniques, is making possible the application of this technology in various application sectors such as security, transportation, and the automotive industry. In particular, the problem of remote surveillance of unat— tended environments has received growing attention in recent years, especially in the context of: 0 Safety in transport applications [1], [2], such as monitoring of railway stations [4], [5], underground stations [3,4], airports [5, 6] and airplane routes [7—9], motorways [10], [11], urban and city roads [12,13], maritime environments [14, 15]; 0 Safety or quality control in industrial applications, such as monitoring of nuclear plants [16] or industrial processing cycles [1,2]; 0 Improved security for people’s lives, such as monitoring of indoor or outdoor environments like banks [29], supermarkets [3], car parking areas, waiting rooms [17], buildings [18,19], etc., remote monitoring of the status of a patient [20], re— mote surveillance of human activity [21,22]; d) military applications for surveil- lance of strategic infrastructure [23,24], enemy movements in the battlefield [25,26] and air surveillance [27,28]. Surveillance networks are composed of numerous independent nodes which can be controlled individually. Hence they have high fault tolerance to node failure. However, this redundancy and fault tolerance comes at a price. One needs to design cooperation and coordination schemes in order for the individual nodes to perform tasks together. Phrther, these cooperation and coordination mechanisms should not consume too much power and other resources, as they would render the advantages due to redundancy useless. 
Pervasive surveillance can be defined as the active monitoring of recognized tar- gets as they move through a large monitored area using a network of sensors. An important characteristic of pervasive surveillance systems is their capability to track a target over a large area using multiple sensors. Figure 1.1 depicts a scenario for conducting reconnaissance on a city square. Each individual sensor can view only a part of the monitored area, but collectively the network can monitor the entire region. There are a number of reasons that optimal coverage may not be available for all regions of a large area being monitored, such as: a lack of prior knowledge of the IIE‘I'I IdIir Spa; CISP I mh‘lt and I Th‘fr; <3th- 0 Strain; environment, a paucity of locations for placement of sensors in the environment, or a shortage of sensors to be deployed. Motion capability of the sensors and the use of active sensing can greatly enhance the coverage quality of the network. For example, monitoring of areas with sparse or no coverage or acquiring higher resolution coverage of the targets can be accomplished by changing the active field of view of the fixed sensors or deploying mobile sensors. Further, the area monitored using a limited number of sensors can be increased by using active (mobile) sensors such as UAV’s, robot-mounted cameras or cameras mounted on pan-tilt heads etc. The mobility of these sensors can be leveraged to increase the area monitored by each sensor, which in turn increases the total area monitored. However, the mobility combined with the scalability requirements renders modeling and analysis of pervasive surveillance systems using active sensors a challenging task and brings to light many research issues. Compared with traditional surveillance networks, an IP-network-based surveil- lance system is easy to deploy, maintain, and expand. The ubiquitous nature of IP networks will dramatically reduce the cost to set up a traditionally expensive surveil— lance network. We envision that the possible usage of surveillance networks could span the venues of industry, military and daily lives. The surveillance network must provide, to the human operator, a timely and con- cise view of the relevant activities within the environment being monitored. Providing multiple video feedback streams often causes loss of attention span of the operator and makes it hard to keep track of the various activities over the various cameras. Therefore only video streams from relevant sensors should be presented to the oper- ator on a per activity basis. This involves automatically switching the active video stream presented to the operator. rm 1.2 Challenges and Problems in Surveillance Networks The tasks to be performed by a surveillance system can be expressed as the following requirements: 1. Deploy the sensors in order to monitor the region under surveillance; 2. Identify multiple moving targets based on predefined models; 3. Automatically track the identified targets over the region being monitored; 4. Coordinate tasking of multiple sensors to maintain target visibility and execute other requested tasks; 5. Provide concise feedback and video data of a tracked target to multiple opera- tors. The sensors should be deployed in the monitored region in order to maximize the likelihood of identifying and tracking the targets. In adversarial scenarios, care should be taken in order to conceal the location of the sensors in order to ensure their safety and the integrity of the task. 
A major challenge in the deployment phase is sensor placement in order to maximize the covered area while also ensuring continuous tracking of a target when it traverses the coverage regions of multiple sensors. An important aspect of sensor deployment is calibration. Calibration of the sen- sors enables multiple sensors to coordinate with each other and share information amongst themselves. This cooperation amongst sensors leads to better sensing and is in general a desirable property. Calibration of the sensors can be viewed in many do- mains (feature spaces). The most literal one is calibrating the location of the deployed sensors with respect to each other and a global frame. Other types of calibration can involve calibrating the color spaces of various cameras, which will facilitate the sharing of target models amongst the sensors. on: ist 01’ {II at it [rat-l Hum and f. agent Very . nlOllC T1 the r 0f th. {0 ii.” felt. malt-j In order to perform automated surveillance there are two major subtasks: target detection and target tracking. A target perception and video understanding module is responsible for detecting and classifying the various targets in the active field of view (F CV) of the sensor and performing temporal consolidation of the detected targets over multiple frames of detection. Moving target detection and classification is known to be a difficult research problem and has been the focus of many recent research efforts [29]. The next challenge would be to classify and associate the various detected image blobs to discernable targets and maintain their temporal tracks in order to pervasively track them. Once the targets have been identified, the challenge is to recognize the relationships of the target to the environment and other targets in order to attribute aims and motives to the tracked targets. The target tracking problem involves continuously keeping the target in view of at least one sensor. When the target is escaping the viewing area of the current tracking sensor, the current sensor needs to be moved (relocated) or the tracking task must be transferred to another sensor which can currently view the target. Modeling and design of such cooperative tracking using multiple sensors is a challenge. Active agents further make the modeling and design of such pervasive surveillance systems very difficult. A modeling and design framework for coordinating and controlling the motion of the active cameras to pervasively track targets is necessary. The various methods through which the the human operator communicates with the surveillance network form an integral part of the usability, usefulness and efficacy of the surveillance network. These methods include passing commands and queries to the network indicating the intention of the surveillance task and receiving realtime feedback and results of the queries in progress and alarms for violations of various predefined conditions. Video feedback is provided to the operator in order to apprise the operator of the various tracking and monitoring tasks being currently executed by the sensors. the lIliU and surv dciaj rarui 3r. - 1.3 Nf‘l It. muni .‘ However, due to the large number of sensors involved, the operator can be easily overwhelmed by the amount of video information. Hence, only relevant video feeds should be provided to the operator and, based on the task, the video feed should be switched from sensor to sensor. 
There are various choices for the switched video feedback implementation and they have their advantages and disadvantages based on the scenario they are being applied to. A performance evaluation metric is needed for evaluating the efficacy of a particular scheme when being applied to certain scenarios. Direct human intervention and control of the various cameras can be used for the logging and tracking of targets which are not predetermined and programmed into the network. Hence the human operator needs to move the active cameras and receive real-time video feedback to manually track targets. However for such surveillance systems, direct manual tele—control of the active active camera over the delayed IP network suffers from stability and synchronization problems due to the random, unbounded and asymmetric communication time delay in the network [30, 31]. This can result in reduced performance and even loss of tracking. 1.3 Literature Review Networked surveillance systems have received much attention from the research com- munity due to their many pervasive applications [32], [33]. Previous work from liter- ature is reviewed here. Deployment and Calibration: Based on their mobility and rapid-deployment capability, the operation of MSN’s can be divided into two separate phases, namely deployment and surveillance. Infrastructure-less rapid deployment is an important characteristic of mobile surveillance networks and many research efforts have been dedicated to optimally deploy sensor nodes for increasing their total sensing area [34—36]. A global kinematic representation of a network of connected sensors ’R, = Tat 1!]th and Tilt; deter appri tion ran: next discer Ihem. and B larger Targp 1W‘pl I. I 5])”. l f; . .gr‘ ' I nle. .I‘) inl‘gh.‘ ] I the Far 5]] ”In r b] Urfit {R1,R2, ...,Rn} using only localization information between one-hop neighbors is suggested in [34]. This relationship between the various nodes is key to sharing meaningful information between them. Target Identification and Video Understanding: Beyond the initial deploy- ment phase, in the surveillance phase there are two major subtasks: target detection and target tracking. Figure 5.8 depicts the general architecture of a sensor node. The target perception module is responsible for detecting and classifying the various targets in the active field of view (FOV) of the sensor and performing temporal con- solidation of the detected targets over multiple frames of detection. Moving target detection and classification is known to be a difficult research problem [29]. Many approaches such as active background subtraction [37] [38] and temporal differentia- tion have been suggested for detecting and classifying various types of moving targets ranging from single humans and human groups to vehicles and wildlife [39] [38]. The next problem would be to classify and associate the various detected image blobs to discernable targets and maintain their temporal tracks in order to pervasively track them. Various approaches such as extended Kalman filtering, pheromone routing and Bayesian belief nets have been suggested for maintaining the track of the various targets [40]. Target Tracking with Active Camera: A general surveillance task involves keeping moving targets in the active sensing region of the sensors with a certain pre- specified resolution. 
Research approaches to this problem found in recent literature [39] [37], generally use visual servo control [41] or gaze control [42], which mainly involve feature-point-based tracking and fail to describe the basic task of maintaining the target in the sensor’s active field of view with certain resolution, effectively and succinctly. These approaches result in excessive camera motion that may result in blurring and other undesired effects, which in turn have detrimental effects on the Slll fill; fron its-'1: in a , rt>r]11j In or. 0f [lit to an. the 01 Opera both ( surveillance task. Further, using these approaches it is not possible to describe a multiple target tracking task with randomly moving targets using a single sensor. Sensor Cooperation and Coordination: Multiple groups of nodes are sequen- tially involved in order to pervasively track the target as it traverses through the monitored area which is much larger than the range of an individual sensor. Hence there needs to be a handoff mechanism which essentially “switches” the tracking task from one group of sensors to another based on the movement of the target. Tracking itself can be described as a switching phenomenon since the act of tracking a target in a given monitoring area which is greater than the viewing area of any given sensor requires there to be some form of hand-off, or switch, between a number of sensors. In order to assure that the target visibility will be maintained even when it goes out of the capable field of view of one sensor, the system must switch the tracking task to another sensor that has to “pick up” the target before it leaves the field of view of the original sensor. This discrete switching phenomenon along with the continuous operation of the individual nodes can be modeled as a hybrid system consisting of both continuous and discrete transitions. Hybrid automata models for describing switching systems have been the focus of many recent research efforts [43], [44]. They can effectively capture the discrete switching nature of the surveillance task. However, their continuous part is modeled in a point-based framework using differential equations. This model fails to effectively model the continuous operation of the individual nodes in the surveillance task and suffers from the drawbacks mentioned previously. Network topology The efficient and timely exchange of information between the individual sensing nodes is imperative for cooperation and coordination of multi- ple nodes in order to perform pervasive target tracking. Ad-hoc, multi-hop mecha- nisms are best suited for such mobile infrastructure-less wireless networked systems. IN" to ' sur: by i ore: T ,w (”nyl, in ‘ I [fwd r. Networking and routing mechanisms for distributed wireless sensor networks have attracted a lot of attention in recent literature [45] [46] [47]. These approaches gener- ally assume scalar (low volume) data transfer between the individuals nodes and the sink node (which is just a consumer of the information.) However, surveillance tasks require fast and timely transfer of vector (large volume) data, e.g. video data, to the sink and also within the nodes themselves. This scenario may create an unbalanced networking load on the nodes along the optimal relay path, which can be detrimental to the longevity of the network. Operator Interface Design: Video feedback is an essential component of the surveillance system. 
A single human operator cannot effectively monitor a large area by looking at dozens of monitors showing raw video output. That amount of sensory over-load virtually guarantees that information will be ignored and requires a pro- hibitive amount of transmission bandwidth. In [39] an approach is presented which provides an interactive, graphical user interface (GUI) showing a synthetic View of the environment, upon which the system displays dynamic agents representing peo- ple and vehicles. This approach has the benefit that visualization of scene events is no longer tied to the original resolution and viewpoint of a single video sensor and the operator can therefore infer proper spatial relationships between multiple ob- jects and scene features. Another program called the the Modular Semi-Automated Forces (ModSAF) program provides a 2-D graphical interface similar to the VSAM GUI [48], with the ability to insert computer-generated human and vehicle avatars that provide simulated opponents for training [49]. The ModStealth program gener- ates an immersive, realistic 3-D visualization of texture—mapped scene geometry and computer-generated avatars [50]. Although automatic image analysis and video understanding tools [39] can be used to facilitate identification of targets and activation of alarms or logs for certain 10 I] {)1 iii. Iltfl dcx ('lll’l 59115 Illl_l\'i llllllt 1 (61111 (JYffr to ti. S‘Vlft surveillance tasks, the operator needs the video feedback to make decisions about the tracking task which may not have been pre—programmed or to independently task the network based on the current feedback received from the sensors. The video feedback provided to the operator can be transmitted over either analog or digital channels [51]. The use of analog modulating and transmission techniques for surveillance applica- tions has been reported in [52]. Receiving digital video feedback over an IP-based network from the networked camera sensors has also received much attention with the development of the various video compression and communication protocols [53—57]. Since multiple cameras are deployed to track the identified targets, multiple, con- current feedback video streams maybe required for monitoring the target. These sensors initiating these streams will be changing from time to time as the target moves out of range of the current sensors tracking it. However, providing multiple unnecessary (unrelated to the task) video feedback streams often causes loss of at- tention span of the operator and makes it hard to keep track of the various activities over the cameras. Hence only video streams from relevant sensors should be presented to the Operator on a per activity basis. This is done through automatic or manual switching of the camera streams that are presented to the operator. Literature Review Summary: From the above review on recent literature on mobile surveillance networks, the focus of the major works reviewed revolves around target identification and video understanding. However, most of the approaches cited in literature lack a generalized framework for modeling the surveillance task of main- taining visibility of multiple targets with adequate resolution. These approaches tend to use visual servoing which involves only feature point based target tracking and cannot adequately describe multi—target tracking using a single camera sensor. Multi target tracking is addressed in a very ad-hoc fashion in literature using methods such as sensor slaving and gaze control. 
A generalized modeling and analysis framework for pervasive target tracking using multiple distributed sensors is not available in current literature. Although switched surveillance implementations using multiple distributed sensors have received much attention in recent literature, a comparative analysis of video feedback schemes for various visual surveillance scenarios has not been reported. Further, optimized sensor deployment based on minimizing the number of sensor switches in order to pervasively track a target using multiple sensors over a large area has not been explored. This dissertation tries to address these issues and develop a formal modeling, analysis and design framework for mobile surveillance applications.

1.4 Objectives of this Research

The main objective of this research is to implement a pervasive surveillance network using active camera sensors connected to each other using an IP-based network. The sensors have motion capability and are required to continuously maintain visibility of the identified targets in the surveillance region. The objectives of this research effort are to provide a modeling and design framework for such pervasive surveillance scenarios.

This dissertation proposes a mutational-analysis-based topological framework for modeling and design of mobile surveillance networks, which finds applications in myriad infrastructure-less, rapidly deployable pervasive multi-target surveillance and monitoring scenarios. Using the concept of mutations of domains and shape analysis, the conditions for accomplishing various surveillance tasks, such as continuous target tracking while maintaining appropriate image resolution, can be derived. The design of a surveillance task using image-based Hausdorff tracking, used when the target is in the active field of view of the sensor, is presented. The method of cooperative Hausdorff tracking, used to track targets which are outside the active field of view of the sensor using the observations from other sensors, is presented.

For pervasive surveillance scenarios, which require a hand-off or switching mechanism between a number of sensors, this dissertation proposes the Mutational Hybrid Automata (MHA) model, which combines the discrete switching structure of hybrid automata with the domain-based evolution of mutational systems in order to overcome their individual limitations. It further presents example surveillance tasks modeled using the proposed MHA model and provides an analysis and proof of stability of the example systems.

A switched video feedback system is an important component of the pervasive surveillance network. This dissertation compares different design alternatives for the implementation of the switched video feedback system, such as MJPEG [57], H.261 [53] and H.263 [54] transported over the RTP transport protocol [56]. Their performance under certain scenarios/tasks is measured and the advantages and disadvantages of different alternatives are analyzed. Various performance metrics are proposed to compare the suitability of these schemes based on a given scenario/task. Further, an optimal sensor selection algorithm for selecting the minimum number of sensors to track a moving target is presented.

1.5 Major Contributions of this Dissertation

The major contributions of this dissertation can be classified into various categories.
This section provides a brief statement about the major contributions of this dissertation.

Theoretical Contributions

- A new domain-based modeling framework called Hausdorff tracking for tracking multiple targets using continuous camera motion is developed.
- A mutational hybrid automata model is developed for modeling multi-sensor pervasive target tracking tasks. Using this model, autonomous as well as controlled sensor switching scenarios can be modeled.
- Stability analysis of the proposed Hausdorff tracking scheme using the shape Lyapunov theorem has been presented.
- Target tracking stability of the pervasive surveillance task modeled using the mutational hybrid automata models is analyzed for example systems using a two-step procedure.

Technological Contributions

- A switched video feedback system that enables streaming of relevant video streams to a human operator for multiple clients has been implemented using an H.264 video encoding scheme. The operator clients developed allow the automated monitoring as well as active control of the networked sensors.
- A task/scenario-based performance metric is proposed to analyze the suitability of application of a particular video transport scheme for that task. Various schemes such as MJPEG, H.261, H.263 and H.264 are compared for different scenarios using the proposed metric.

Design Contributions

- An optimality framework for exploiting the redundancy in task accomplishment using Hausdorff tracking is developed. Physical programming has been used to provide a physical meaning to the combinations of the various cost functions.
- The video switching algorithm is modeled as a discrete-time finite-state dynamic system and an assessment metric is proposed to analyze the locations and configurations of the cameras for switched video feedback in order to minimize the number of switches while maximizing the resolution sustained by the tracked target at the tracking camera.

Contributions for Experimental Work

- A visual surveillance test-bed has been implemented for testing and validating the efficacy of the various approaches proposed.
- A mobile robot and articulated robot arm have been integrated as a part of the motion platform for the visual sensors. These can be used in the future for further extension and experiments of this research.

1.6 Organization of this Dissertation

The scope of this dissertation is to investigate a novel modeling and design framework for mobile surveillance networks. The work is divided into three parts, namely modeling, analysis and design. First, in Chapter 2, we present the necessary background on the mathematical tools which will be required to develop and analyze the formal modeling and design framework proposed.

In Chapter 3, the new modeling framework using mutational equations for motion of a single sensor is presented. Using this mutational equation model, a shape-based Hausdorff tracking control scheme to track multiple targets with viewing constraints is developed. Chapter 3 further presents the Mutational Hybrid Automata model developed for modeling multi-sensor cooperative target tracking scenarios.

Chapter 4 presents the shape Lyapunov theorem for analyzing the stability (in the sense of Lyapunov) of the Hausdorff tracking method proposed. It further presents examples of multi-camera surveillance systems modeled using the mutational hybrid automata model.
A procedure for analyzing the stability of the example multi-sensor cooperative tracking systems modeled using the mutational hybrid automata model is also presented.

Chapter 5 presents the work done toward the design of the switched video surveillance system. It presents a brief review on grouping architectures and presents a sensor allocation algorithm which can be used to allocate sensors to targets. A design framework for optimal tracking of multiple targets by exploiting the redundancy of task accomplishment is presented. Further, algorithms for sensor placement in order to minimize sensor switching while tracking moving targets and algorithms for optimized target tracking using switched sensors are also presented.

Experimental validations of the proposed algorithms are presented in Chapter 6. The experimental results for Hausdorff tracking, multi-sensor cooperation using the mutational hybrid automata model and multi-sensor target allocation are presented. Further, results of optimal Hausdorff tracking and multi-target tracking experiments are also presented. Chapter 6 also provides a performance analysis of the various video transmission schemes and proposes a task- and scenario-based metric for evaluating the various video compression and transmission schemes for switched video surveillance tasks. Further, simulation results for optimized target tracking using switched sensors are also presented.

Conclusions and future directions are presented in Chapter 7.

Chapter 2

Mathematical Background

This chapter presents the mathematical preliminaries that will be used in the rest of the dissertation. A definition of a dynamic system is presented. Then mutational equations, which are used to represent domain-based dynamic systems, are introduced. Further, hybrid dynamic systems are introduced and the Mutational Hybrid Automata model is presented. Shape analysis and shape functions are introduced to provide various measures on sets (shapes).

2.1 Introduction

Let X be a collection of variables; then X denotes the valuations of these variables, and x will be used to refer to both the variable and its valuation (the reference should be clear from the context). $2^X$ denotes the power space of X, i.e., the family of all subsets of X, and $\mathcal{K}(W)$ denotes the space of all nonempty compact subsets of a given closed set $W \subset \mathbb{R}^n$. We will use M to denote a general metric space and X will be used to denote a finite-dimensional vector space.

A controlled dynamical system is a system $\Sigma = [X, \Gamma, \mathcal{U}, \phi]$ where,

- $X$ is an arbitrary topological space called the state space of $\Sigma$;
- $\Gamma$ is the time set, which is a transition semigroup with identity;
- $\mathcal{U}$ is a nonempty set called the control-value space of $\Sigma$; and
- the map $\phi : X \times \Gamma \times \mathcal{U} \mapsto X$ is a continuous function satisfying the identity and semigroup properties [58].

The dynamical system can also be denoted by the system $\Sigma = [X, \Gamma, \mathcal{U}, f]$, where the transition function $f$ is the generator of the extended transition function $\phi$ [58].

A discrete-time dynamic system is a dynamic system $\Sigma$ for which $\Gamma = \mathbb{Z}$, where $\mathbb{Z}$ is the set of integers. The discrete-time dynamic system $\Sigma$ is finite dimensional if both $X$ and $\mathcal{U}$ are finite dimensional, and the dimension of the system $\Sigma$ is the dimension of $X$. A system $\Sigma$ is complete if every input is admissible for every state.

A hybrid dynamic system is an indexed collection of dynamic systems with some map defined for jumping among them.
Various models for hybrid systems have been proposed in the literature [43] [44]. Generally, these approaches assume that the continuous dynamics part of the hybrid system is defined by the solutions of differential equations ($\dot{x} = f(x(t))$), where $x(t) \in X \subset \mathbb{R}^n$ and the function $f : X \mapsto \mathbb{R}^n$ is called a vector field on $\mathbb{R}^n$. The resulting dynamical system is given by $\phi(x_0, t) = x(t)$, where $x(t)$ is a solution of the differential equation starting at $x_0$ at $t = 0$.

For the case of the pervasive surveillance problem, the variables to be tracked are not elements of a vector space but geometrical domains themselves, which are elements of the space $\mathcal{K}(W)$ of all non-empty compact subsets of a given closed set $W \subset \mathbb{R}^n$ with distance $d$. Hence we need to enhance the hybrid systems model(s) to include the cases where the continuous state is not an element of a vector space but a geometric domain $K \in \mathcal{K}(W)$. In this case we describe the continuous dynamical system as the solution of a mutational equation defined on a tube:

$$K : \mathbb{R}_+ \mapsto \mathcal{K}(E), \qquad K(\cdot) : t \mapsto K(t). \tag{2.4}$$

The evolution of a tube can be described using the notion of a time derivative of the tube as the perturbation of a set. Associate with any Lipschitz map $\varphi : E \mapsto E$ a map called the transition $\vartheta_\varphi(h, q) := q(h)$, which denotes the value at time $h$ of the solution of the differential equation:

$$\dot{q} = \varphi(q), \qquad q(0) = q_0. \tag{2.5}$$

Extend this concept of a transition to the space $\mathcal{K}(E)$ by introducing the reachable set from set $K$ at time $h$ of $\varphi$ as

$$\vartheta_\varphi(h, K) := \{\vartheta_\varphi(h, q_0)\}_{q_0 \in K}. \tag{2.6}$$

The curve $h \mapsto \vartheta_\varphi(h, K)$ plays the role of the half lines $h \mapsto x + hv$ for defining differential quotients in vector spaces. Using the concept of the reachable set, the time derivative of a tube can be defined as a mutation:

Definition 2.2.1 (Mutation) Let $E \subset \mathbb{R}^n$ and $\varphi : E \mapsto E$ be a Lipschitz map ($\varphi \in \mathrm{Lip}(E, \mathbb{R}^n)$). If for $t \in \mathbb{R}_+$, the tube $K : \mathbb{R}_+ \mapsto \mathcal{K}(E)$ satisfies:

$$\lim_{h \to 0^+} \frac{d\bigl(K(t+h),\, \vartheta_\varphi(h, K(t))\bigr)}{h} = 0, \tag{2.7}$$

then $\varphi$ is a mutation of $K$ at time $t$ and is denoted as:

$$\mathring{K}(t) \ni \varphi. \tag{2.8}$$

It should be noted that $\varphi$ is not a unique representation of the mutational equation, which justifies the use of the notation ($\ni$) [60]. Consider a function $\varphi : \mathbb{R} \times \mathcal{K}(E) \mapsto \mathrm{Lip}(E, \mathbb{R}^n)$ as a map associating a Lipschitz map $\varphi$ to a pair $(t, K)$. Using this map, we can define the mutation of tube $K(t)$ as:

$$\mathring{K}(t) \ni \varphi(t, K(t)), \qquad \forall t \geq 0. \tag{2.9}$$

2.2.2 Controlled Mutational Equations

A controlled mutational equation can be written as:

$$\mathring{K}(t) \ni \varphi(t, K(t), u(t)), \qquad \forall t \geq 0, \tag{2.10}$$

$$u(t) \in U, \tag{2.11}$$

where $\varphi : \mathbb{R}_+ \times \mathcal{K}(E) \times U \mapsto \mathrm{Lip}(E, \mathbb{R}^n)$ is a continuous map associating a Lipschitz map with $(t, K, u)$ and $u(t) \in U$ is the control input. A feedback law can be defined as a map $\mathcal{U} : \mathcal{K}(E) \mapsto U$ associating a control $u$ with a domain $K(t)$ as:

$$u(t) = \mathcal{U}(K(t)). \tag{2.12}$$

Using a controlled mutational equation, we can model the motion of the target and coverage sets due to the motion input $u$ to the camera/robot.

2.3 Hybrid Dynamic Systems

Hybrid systems generally refer to dynamical systems which consist of a mixture of discrete and continuous components. Such systems describe the time evolution of systems with both continuous and discrete parts. Hybrid dynamical systems considered in the literature generally have continuous dynamics modeled using differential equations, which depend on some discrete phenomena like switching and jumping in the continuous and discrete state of the system, respectively. These hybrid phenomena can be modeled using hybrid automata proposed by Lygeros and Sastry [44] [61].
A hybrid automaton describes the interaction and time evolution of both the discrete and continuous parts of a hybrid dynamical system. The hybrid automata framework models the continuous part of a hybrid dynamical system using differential equations. Edge maps and reset maps are used to capture the discrete nature of the system, while guard conditions and domains are used to describe the interaction of the discrete and continuous components.

A hybrid automaton (HA) is a collection

$$H = [Q, X, \Sigma, Init, D, E, G, R] \tag{2.13}$$

with the constituent parts as follows.

- $Q$ is the set of index states or discrete states.
- $X$ is the set of continuous variables.
- $\Sigma = \{\Sigma_q\}_{q \in Q}$ is a collection of controlled dynamical systems, where each $\Sigma_q = [X_q, \Gamma_q, f_q, \mathcal{U}_q]$ is a controlled dynamical system. $X_q$ are the valuations of the continuous state variables $X_q$, $f_q$ are the continuous dynamics and $\mathcal{U}_q$ is the set of continuous controls.
- $Init \subseteq Q \times X$ is a set of initial states.
- $D : Q \mapsto 2^X$ is a domain. Each discrete state has a domain of possible continuous states associated with it. The domain is the set of valuations of the continuous states where the particular associated discrete state is valid.
- $E \subseteq Q \times Q$ is a set of possible edge transitions between the discrete states.
- $G : E \mapsto 2^X$ is a guard condition. Each edge in $E$ is mapped to a manifold in $X$ which triggers a mode switch when the continuous state hits this manifold.
- $R : E \times X \mapsto 2^X$ is a reset map which causes the continuous state to be reset due to the mode switch in the discrete state.

The state of the above hybrid dynamical system can be denoted as $S = \bigcup_{q \in Q} X_q \times \{q\}$. The above model can be used to model the discrete phenomena mentioned earlier. For more details about this model refer to [44].

2.4 Mutational Hybrid Dynamic Systems

Mutational equations can be incorporated into the definition of general dynamical systems as they satisfy the various nontriviality, semigroup and identity properties [58]. Using the concept of transitions of domains introduced earlier and mutational equations, we can write a general dynamic system as:

$$\Sigma = [\mathcal{K}(W), \Gamma, \vartheta_\varphi] \quad \text{or} \tag{2.14}$$

$$\Sigma = [\mathcal{K}(W), \Gamma, \varphi], \tag{2.15}$$

where $\vartheta_\varphi$ is the transition function and $\varphi$ is the generator of that transition function.

Using the above definition of a dynamic system, we can now define a Mutational Hybrid Automaton (MHA) as:

$$H_m = [Q, M, \Sigma, Init, D, E, G, R] \tag{2.16}$$

with the constituent parts as follows.

- $Q$ is the set of index states or discrete states.
- $M$ is a collection of the continuous variables $K_i \in \mathcal{K}(W)$.
- $\Sigma = \{\Sigma_q\}_{q \in Q}$ is a collection of controlled dynamic systems, where each $\Sigma_q = [\mathcal{K}(W)_q, \Gamma_q, \varphi_q, \mathcal{U}_q]$ is a controlled dynamic system. $\mathcal{K}(W)_q$ are the continuous state spaces, $\varphi_q$ can be construed as the continuous dynamics and $\mathcal{U}_q$ is the set of continuous controls.
- $Init \subseteq M \times Q$ is a set of initial states.
- $D : Q \mapsto 2^M$ is a domain which associates a domain of operation with every $q \in Q$.
- $E \subseteq Q \times Q$ is a set of edges along which discrete transitions take place.
- $G : E \mapsto 2^M$ is a collection of guard conditions which enable discrete transitions.
- $R : E \times M \mapsto 2^M$ is a reset map which resets the continuous state when a discrete transition is enabled.

Thus, $S = \bigcup_{q \in Q} M \times \{q\}$ is the hybrid state space of $H_m$. An MHA can be pictured as an automaton as in Figure 4.2. There, each node is a constituent dynamical system with the index state as the name of the node.
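To make the constituent parts of the tuple in (2.16) concrete, the following sketch shows one way an MHA could be held in code. It is only an illustration under stated simplifications: the dissertation does not prescribe any data structure, a compact set in $\mathcal{K}(W)$ is crudely approximated here by a finite sample of points, and every name (MutationalHybridAutomaton, step, and so on) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# A compact set K in K(W) is approximated by a finite frozenset of sample points.
CompactSet = frozenset
Mode = str                      # a discrete state q in Q
Edge = Tuple[Mode, Mode]        # an element of E, a subset of Q x Q

@dataclass
class MutationalHybridAutomaton:
    """Container mirroring the tuple H_m = [Q, M, Sigma, Init, D, E, G, R] of (2.16)."""
    modes: Set[Mode]                                        # Q
    dynamics: Dict[Mode, Callable[[float, CompactSet, object], CompactSet]]
    # Sigma: one map per mode approximating the reachable set
    # K(t + h) ~ theta_phi_q(h, K(t)) for the controlled mutation phi_q.
    init: List[Tuple[CompactSet, Mode]]                     # Init
    domain: Dict[Mode, Callable[[CompactSet], bool]]        # D: is K admissible in q?
    edges: Set[Edge]                                        # E
    guards: Dict[Edge, Callable[[CompactSet], bool]]        # G: enables a switch
    resets: Dict[Edge, Callable[[CompactSet], CompactSet]]  # R

    def step(self, q: Mode, K: CompactSet, u, h: float) -> Tuple[CompactSet, Mode]:
        """Advance the hybrid state (K, q) by a small time step h: evolve the
        continuous domain in mode q, then take an enabled discrete transition."""
        K_next = self.dynamics[q](h, K, u)           # continuous (mutational) evolution
        for e in self.edges:
            if e[0] == q and self.guards[e](K_next):
                return self.resets[e](K_next), e[1]  # reset the domain, switch mode
        return K_next, q                             # otherwise remain in mode q
```

In this reading, the guard of an edge plays the role of the target entering or leaving a camera's field of view, and the reset map hands the tracked set over to the next sensor; the surveillance examples in the following chapters attach those meanings to the abstract components above.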
This MHA definition is a fairly broad and powerful modeling framework which subsumes continuous differential equations, mutational equations, discrete-time difference equations, finite automata, as well as switched controlled systems and non-smooth differential/mutational equations. A more intuitive explanation of the various components will be provided via an example application to pervasive surveillance networks in the following section.

2.5 Shape Analysis

Shape analysis deals with problems where the variables are not vectors of parameters or functions, but the shapes of geometric domains or sets $K$ contained in a subset $E \subset \mathbb{R}^n$. The measure of deformation of a set $K$ can be expressed using shape functions $J : \mathcal{K}(E) \mapsto \mathbb{R}$. The directional derivative of the shape function represents the change in the shape function $J(K)$ due to the deformation of the shape $K$.

2.5.1 Shape Functions

Shape functions are set-defined maps from $\mathcal{K}(E) \mapsto \mathbb{R}$ and provide a "measure" of the deformation of $K \in \mathcal{K}(E)$. For example, we can use a shape function to check the distance of the set $K$ to another set, or whether a reference set is contained within the current set $K$. Shape acceptability and optimality can be studied using shape functions $J$ as optimization cost functions, as is shown in [62]. Various shape functions can be readily developed which can adequately describe the surveillance task of set coverage or a measure of the size of a set, etc.

2.5.2 Directional Derivative of Shape Functions

The directional derivative of the shape function [62] represents the change in the shape function $J(K)$ due to the deformation of the shape $K$. It can be construed as the analog of the directional derivative in vector spaces and provides us with a measure of the change in the task criterion due to motion of the coverage or target set.

Consider a function $J : H \mapsto \mathbb{R}$ where $H$ is a real Hilbert space such as $\mathbb{R}^n$. The Gateaux (directional) derivative of $J$ at the point $q$ in the direction of $v$ is defined as:

$$DJ(q)(v) = \lim_{t \to 0} \frac{J(q + tv) - J(q)}{t}. \tag{2.17}$$

This can also be looked at in a different way. Suppose $q(t)$ is defined as:

$$\dot{q}(t) = v, \qquad q(0) = q. \tag{2.18}$$

This means that if $j(t) = J(q(t))$ then,

$$\frac{dj(t)}{dt} = \dot{j}(t) = \lim_{t \to 0} \frac{J(q(t)) - J(q(0))}{t} \tag{2.19}$$

$$= \lim_{t \to 0} \frac{J(q + tv) - J(q)}{t} = DJ(q)(v). \tag{2.20}$$

Thus, the Gateaux (directional) derivative involves the point $q$ and the direction $v$ at $t = 0$.

Extending this same concept to the case when $q$ is not an element of a Hilbert space but a shape (domain) $K$, and $v$ is replaced by $\varphi$, which is the direction of mutation of the tube $K$, the Eulerian semi-derivative (Gateaux directional derivative) is defined as:

$$\mathring{J}(K)(\varphi) = \lim_{t \to 0} \frac{J(\vartheta_\varphi(t, K)) - J(K)}{t}, \tag{2.21}$$

where $\vartheta_\varphi(t, K)$ is the $t$-reachable tube of $K$ under the mutation $\varphi$. From [60] and [63], the directional derivative of the shape function having the form of

$$J(K) = \int_K \psi(q)\, dq \tag{2.22}$$

can be written as:

$$\mathring{J}(K)(\varphi) = \int_K \mathrm{div}\bigl(\psi(q)\varphi(q)\bigr)\, dq. \tag{2.23}$$

2.6 Chapter Summary

This chapter presents a discussion on various dynamic systems and introduces mutational equations as a tool to model the dynamics of shapes. It also presents the hybrid automata model used to model a combination of discrete and continuous dynamic systems. Hybrid automata models use differential equations to model the continuous system dynamics. This chapter further extends the hybrid automata model to use mutational equations to model the continuous dynamics.
The Mutational Hybrid Automata (MHA) model proposed allows us to model large scale networked pervasive surveillance systems. This chapter also introduces shape analysis used to measure the acceptability of shapes using shape functions. 27 Thi. lam irai: Hifir Illi‘t Inuit (hill of II: 3.1 In a Willi 0f a . One ,- JUN}; Chapter 3 Modeling Mobile Surveillance Networks This chapter presents the modeling of a tracking task using a multiple camera surveil- lance network. which is a mutational analysis based technique christened “Hausdorff tracking” (in honor of the German mathematician Felix Hausdorff who made sig- nificant contributions to set theory, descriptive set theory, measure theory, function theory, and functional analysis), is introduced in order to track multiple targets si- multaneously using a single camera sensor. Further the Mutational Hybrid Automata (MHA) model is introduced for tracking a target as it moves across the viewing region of multiple cameras. 3.1 Introduction - Example Surveillance Scenario In a multiple camera surveillance scenario with active (moving) cameras, the target, which moves in a continuous fashion, can be tracked using either continuous motion of a camera sensor or by discontinuous jumps of the “current” tracking sensor from one camera to the next or a combination of continuous motion and a discontinuous jump. Consider the example surveillance scenario form Figure 3.1(a) which depicts 28 -- Coverage set .- Target set (disjoint) Optical Axis (a) Three—Camera Pervasive Surveillance Scenario Figure 3.1: 3 Camera Pervasive Surveillance Scenario the image planes of three camera sensors and a couple targets. The targets can be tracked using the continuous motion of Camera 1 as is shown in Figure 3.2(a). We see that Camera 1 is moved in a continuous fashion to ensure that the targets are within the active field of view of the sensor. In Figure 3.2(b), the tracking task is transferred to Camera 2 which can already view the targets in its active field of view. Finally, as shown in Figure 3.2(c), the tracking task can be transferred to camera 3 which will first need to acquire the targets by executing continuous motion and then a discrete switch will be executed to transfer the tracking task to camera 3. This chapter first presents the modeling of continuous motion of the camera sen- sor and target (continuous dynamics) using mutational equations. These dynamics 29 (a) Coverage Using Continuous Motion (b) Coverage Using Only Discrete Motion (c) Coverage Using Hybrid Motion Figure 3.2: Modeling Pervasive Surveillance Networks 30 Ff. 1 models can be used for developing a servo mechanism to control the continuous cam- era motion. The continuous input to the camera for motion can be controlled using an error feedback from either the image space analysis or the world space analysis. Image space analysis implies that the error feedback for moving the camera is derived from analysis of the target location in the image plane of the camera, while world space analysis implies that the error input for moving the camera is derived from the world space coordinates of the target (acquired from other sensors that can observe the target) and the current world space location of the camera coverage set. Hence, we develop models for image space servoing and world space servoing separately. The mutational equation model for the target motion in the image space is derived using the perspective projection model and the image jacobian [64]. 
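As a small illustration of the image-space feedback path just described, the sketch below projects target points expressed in the camera frame onto the image plane with the perspective model; the resulting pixel cloud is the image-space target set the servo loop acts on. The focal length and point coordinates are hypothetical.

```python
import numpy as np

def project_points(P_cam: np.ndarray, lam: float) -> np.ndarray:
    """Perspective projection q = (lam / z) * (x, y) for points given in the camera frame."""
    x, y, z = P_cam[:, 0], P_cam[:, 1], P_cam[:, 2]
    return np.stack([lam * x / z, lam * y / z], axis=1)

# Hypothetical target: a small cluster of 3-D points about 4 m in front of the camera.
target_cam = np.array([[0.40, 0.10, 4.0],
                       [0.50, 0.10, 4.1],
                       [0.45, 0.20, 3.9]])
image_target_set = project_points(target_cam, lam=0.008)   # 8 mm focal length
print(image_target_set)   # the image-plane "blob" used as the target set for servoing
```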
The motion of the camera coverage set in the world space model is derived separately for 2-D and 3—D scenarios. In the 2-D scenario, the camera coverage set is taken to be a pie cut out from a circle, while in the 3—D scenario, the camera coverage is modeled as a 3-D pie (cone bounded by a sphere) cut out from a sphere. The motion of these coverage sets due to the motion of the camera is modeled as a controlled mutational equation. Rirther, we relax the implicit assumption of holonomic motion constraints and extend the continuous motion modeling using mutational equations for a sensor with non-holonomic constraints. The Hausdorff tracking method for image space and world space models is intro- duced. This method uses shape functions to derive feedback errors which are used to control the motion input to the camera. Image space Hausdorff tracking and coop- erative Hausdorff trackng are presented as two examples of the Hausdorff tracking method in the image space coordinates and world space coordinates, respectively. This chapter also introduces the Mutational Hybrid Automata (MHA) model for modeling the hybrid camera motion scenario shown in Figure 3.2(c). Using mutational equations, the mutational hybrid automata model can describe the continuous motion 31 of the cameras and target and the discrete switching behaviors when a target leaves or enters the field of view of a camera. Using Hausdorff tracking method developed in this section, there are numerous choices of the input applied to the camera which can accomplish the task. This is because the robot system has a certain amount of redundancy built in for the surveillance task. An example of this redundancy in task execution can be evinced by noting that in order to bring a target, which is currently located at one of the sides of the camera F OV, into the cameras F OV, the camera can execute an angular motion by panning or a lateral motion by moving behind. Both these motions individually or a combination of them can be used to accomplish the surveillance task. Hence we need to select an appropriate input to the camera for task accomplishment. We can select various other tasks that the camera can perform in order to utilize the redundancy offered by the camera motion. For example the energy consumed due to the motion of the sensor should also be minimized while maintaining a safe distance from sensed obstacles (in the case of sensors mounted on mobile robots). These combined tasks can be expressed as a multi-objective optimization problem which tries to minimize the resolution inadequacies and energy consumption subject to the constraint of maintaining the visibility of the target. Various other tasks can also be considered for the multi-objective optimization problem and the complete problem can be represented as an optimal control frame— work for Hausdorff tracking. Once the task criteria are expressed in a multi-objective constrained optimization framework, various methods such weighted sum method ,physical programming, non-linear programming, search space methods, etc. can be used to find the solution to the Optimization problem. 32 Tl llsj ilt )1 hip I111; 101 Spam: 3.2 Controlled Mutational Equation Model for Camera Sen- sor and Target A controlled mutational equation can be written as: K(t) a (,0(t,K(t),u(t)), age (3.1) u(t) E U, (3-2) where, (p : R+ x [C(E) x U H Lip(E, R") is a continuous map associating a Lipschitz map with (t, K, u) and u(t) E U is the control input. 
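A controlled mutational equation of this form can be visualized by discretizing K into a point cloud and transporting every point along the Lipschitz field selected by the control. The toy sketch below (the field, the control values and the step sizes are all hypothetical) approximates the transition of the domain with explicit Euler steps.

```python
import numpy as np

def transition(K: np.ndarray, phi, u, dt: float, steps: int = 10) -> np.ndarray:
    """Approximate the reachable tube: move every point of K along phi(., u) for time dt."""
    h = dt / steps
    for _ in range(steps):
        K = K + h * phi(K, u)          # Euler step of the transition associated with phi
    return K

# Hypothetical field: rigid translation plus rotation of the set, driven by u = (vx, vy, omega).
def phi(K, u):
    vx, vy, om = u
    return np.stack([vx - om * K[:, 1], vy + om * K[:, 0]], axis=1)

K0 = np.array([[1.0, 0.0], [1.2, 0.1], [1.1, -0.1]])   # discretized set K(0)
K1 = transition(K0, phi, u=(0.5, 0.0, 0.2), dt=1.0)    # K after one second of controlled motion
```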
Using a controlled mutational equation, we can model the motion of the target and coverage sets due to the motion input u to the camera/ robot. 3.2.1 Perspective Projection Model The deformation of the target set w.r.t. the motion of the camera can be represented using a mutational equation. The mutational equation can be derived using the optic flow modeling as shown in [41]. Let u be a velocity screw which represents the velocity input to the camera and let q be a vector of image space feature parameters. The image jacobian Jv is a linear transformation that relates the camera velocity screw u to the tangent space q of the image space feature parameter vector q as: q = Jv(c)u (3.3) where, c represents the camera pose such that c = u. Assuming that the projective geometry of the camera is modeled by the per- spective projection model, a point P = [:r,y, z]T, whose coordinates are expressed 33 r—J with respect to the camera coordinate frame, will project onto the image plane with coordinates q = [gag qy]T as: (3.4) where A is the focal length of the camera lens [41]. Using the perspective projection model of the camera, the velocity of a point in the image frame with respective to the motion of the camera frame [41] can be expressed. This is called the image jacobian by [41] and is expressed as: = «pc(q) = Bc(Q) , = Bc(q)uc (3-5) —-:— o a —e A—xe q... s 0 —% 2} 1:33 _-_€1.rq _q$ {q )T are the image space coordinates of the point P whose task Where. q = ((122.93; Space coordinates are (23,31, z)T. A is the focal length of the lens being used and u = [11:2, vy, vz, (.013, wy, wZ]T is the velocity screw of the camera motion and A is the rate of change of the focal length. The mutational equation of the target set can now be written as: (i: wlq) = 99cm) + 92107) A K 3 (0(K) (3.6) 3.2.2 World Space Model We can derive the mutational equation for the sensor coverage set and target set in the world space. Here we consider two cases for a 2-D model and 3-D model 34 Ci; poi for mum. separately and present the mutational model for the 2-D and 3-D cases based on the dimensionality and geometry. 2 Dimensional World Space Model Consider a point q E R2 defined as: qr, _ :r + r cos(p) (3 7) qz 2 -l- rsin(p) where [r, z, 0]T are the coordinates of the robot and r, p are the polar coordinates of point q with respect to the robot coordinates. Using equation 4.15, the coverage set for the focusing constraint can be written as: K : {q l T E (Dmina Dmax)ap E (01min: amaw» (3'8) where Dmm, Dmax are the minimum and maximum focus distances and amin: amax are the minimum and maximum angles of the view lines as shown in Figure 3.5. The motion of each point in the coverage set can be written as: q;- 1 0 —r sin(p) 2 ’11. 4,. 0 1 r cos(p) i=Mm=Bu we where u = [13, 2,9]T is the velocity input to the camera. Using equation 4.17, the mutational equations for the motion of the coverage set can be derived. 35 3 Dimensional World Space Model The coverage set can be specified as the set of points which satisfies the viewing constraints of the sensor and can be defined as the active FOV of the camera after incorporating the depth of field constraints such as the maximum and minimum focus distances. Consider a point q E R3 in the sensor coordinate frame defined as: ' (1x - ’ rcos(0) sin((b) ‘ qy = rcos(q‘9) (3.10) qz rsin(6) sin(c§) 4 where (r, 6, (b) are the polar coordinates of the point q in the sensor frame. 
Using the above equation, the coverage set can be defined as : K = {q I 7' E (DminiDmar)»9 E (aminaamarla d9 E (remap, (3111271)} (3:11) where, (DmimDmax) represent the depth of field and (am-mam”), (fimabflmm) are the view] angles for the sensor. The global coordinates of the target set K can be transformed to the local sensor coordinates using the localization information available. The mutational equation for the motion of the coverage set can be written as: qg; F cos(6)cos(q§) —sin(9)sin(gb) T 4y = 7‘ 811105) 0 u _ (iz . [ sin(9)COS(¢) c08(9) $1105) - (i = <.0(<1)=Bu (3-12) where u = [wig wy]T is the velocity input to the camera. The objective is to cover the 36 In ti. set of targets K with the coverage set K such that K C K. 3.3 Incorporating Non-holonomic Sensor Constraints In section 3.2 and 3.2.2 the mutational model relating the motion of the sensor u to the motion of the target and coverage sets was introduced. However, the implicit assumption made was that the individual components of the vector u are independent of each other i.e., the system has a holonomic structure. Consider the example of a car moving in a horizontal plane. Select the reference point to be the midpoint of the two rear wheels and the reference direction to be the main axis of the car moving forward. This car’s configuration can be defined as a vector (state) (:r,y, 6) E R3. However, these parameters and their derivatives are related by the following relation: —:i: sin6 + ycos6 (3.13) where it and y are the derivatives of a: and y respectively. At any configuration (2:, y, 6) along a trajectory parameterized by time, the triplet (5i), , y, 6) is the velocity of the robot (car). The set of all vectors (2:, , y 6) passing through the point (x, y, 6) is a vector space. If the robot was not constrained by the constraint in equation (3.13), this vector space would be the tangent space of the configuration space at (:r,y, 6) and would have a dimension 3. However, due to the kinematic constraint in equation (3.13), the velocity vectors only span a subspace of the tangent space of dimension 2. This is called a nonholonomic constraint and the system dynamics of the car are said to be nonholonomic. Various sensor motion platforms have a nonholonomic structure with examples like a car which can be steered left and right only when the car has a velocity component in the forward direction. Hence it is required to model such nonholonomic platforms 37 for an accurate description of system dynamics. Another example of a nonholonomic platform would be a sensor mounted on a unicycle-type robotic motion module with a differential drive, which has been quite popular recently since they require only two motors for forward motion and steering. Further, due to the large amount of energy consumed by the motion module, multiple sensors with varying sensing modalities may be mounted on a single motion module. An example is the mobile manipulator depicted in Figure 6.3, which has a PTZ camera mounted on the mobile base and a bullet camera mounted on the robot arm. This section presents an example of a unicycle-type robot with nonholomic motion constraints which can be modeled as: a': = v cos 6,- 3) = 1) sin 6,- 6 = (.0 (3.14) where, v and w are the linear and angular velocity control inputs of the robot. The relationship between u and uh = (v, w)T can be written as: cos6 0 v u = Bhuh = sing O (3.15) w 0 1 Thus, the controlled mutational equation for camera coverage model in equation 3.6 can be written as: WI) = 3(a)u = B ((DBhuh = Bow. 
(3-16) 38 3.4 Hausdorff Tracking The task of a surveillance camera is to continuously keep a moving target or multiple moving targets in its active field of view. The surveillance task of maintaining multi- ple targets in the active field of view (F OV) of the sensor with an adequate resolution can be readily expressed in a set-based framework where the variables taken into con- sideration are not vectors of parameters but shapes (domains) themselves. However, due to lack of vectorial structure on the space, classical differential calculus cannot be used to describe the evolution of such domains. However, mutational equations [60] can be used to describe the dynamics (change in shape) of the sensor FOV and target domains and further be used to calculate feedback mechanisms to accomplish the surveillance task. The surveillance task can be expressed, using shape functions [62], as the min— imization of a Hausdorff distance-based metric or the size of the target, etc. The shape function essentially represents the error between the desired and actual shapes and reducing it to zero will accomplish the task. This section presents the method of Hausdorff tracking using mutational equations for performing the surveillance task. Shape or a geometric domain can be defined as the set K E [C(E), E C IR" where [C (E) represents the space of all nonempty, compact subsets of E. The target and the camera coverage can be readily expressed as shapes. Mutational equations can then be used to express the change (deformation) in the coverage and target sets based on the motion of the sensor. Shape analysis [62] can be used to address problems involving geometric domains or shapes. Shape functions, which are set-defined maps from J (K) : [C(E) H IR, can be used to provide a ”measure” of acceptability and optimality of the shape K. For example we can use a shape function to see if a reference set K is contained within a current set K. In order to accomplish the task defined using shape functions, we need to derive a feedback map LI : [C(E) H U, where u = U (K (t)) is the input to the sensor, which will reduce the shape function 39 (it d 3 IS ; P0: CO( CO I») to zero. The convergence of the shape function can be analyzed using the shape Lyapunov theorem [65]. The convergence to zero of the task function would imply task accomplishment. The task requirements for Hausdorff tracking can be expressed as a positive semi- definite shape function which provides a measure of acceptability of a particular shape (set). For the example shown in Figure 3.2(a), the measure of acceptability of the target image (sets) is whether they are within a certain region of the image space and the measure can be expressed as a shape function J (K) 6 73+. The task of Hausdorff tracking can be defined as finding an input in order to move the camera in order to reduce the shape function to zero. Hence we need to express the evolution (change) in the shape of the target w.r.t. the movement of the camera. This is accomplished using mutational equations, which are the analog of differential equations for shape spaces. Using the mutational equation, we should be able to calculate the directional derivative of the shape function w.r.t. the motion of the camera, and using an inverse dynamics procedure, calculate a motion input to the camera in order to reduce the shape function (error) to zero. 
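On discretized (pixel) sets, a Hausdorff-distance-based error of the kind described here can be evaluated directly: approximate the integral over the target set by a sum of squared point-to-set distances to the coverage set. The sets below are hypothetical; the error grows with the distance of uncovered target points from the coverage set and vanishes when they coincide with covered pixels.

```python
import numpy as np

def point_to_set_dist(p: np.ndarray, K: np.ndarray) -> float:
    """d_K(p) = inf over q in K of ||q - p||, evaluated on a discretized set K."""
    return float(np.min(np.linalg.norm(K - p, axis=1)))

def coverage_error(target: np.ndarray, K: np.ndarray, cell_area: float = 1.0) -> float:
    """Discrete analogue of J = integral over the target set of d_K(p)^2 dp."""
    return cell_area * sum(point_to_set_dist(p, K) ** 2 for p in target)

# Hypothetical pixel sets: coverage rectangle sampled on a grid, target blob off to one side.
xs, ys = np.meshgrid(np.arange(0, 50), np.arange(0, 40))
K = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)    # coverage set
blob = np.array([[60.0, 20.0], [61.0, 21.0], [62.0, 20.0]])     # target outside the coverage set
print(coverage_error(blob, K))   # positive: target not covered, so the camera must move
```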
For the example in Figure 3.1(a), the task requirements for Hausdorff tracking can be mathematically] expressed using a positive definite shape function J (K ) = fl? (13((p) dp where K is the coverage set for the camera and dK(p) = inqu K ”g — pl], is the point to set distance from point p to set K. Note that the shape function is zero only when set K is completely covered by set K, otherwise it is a non-zero positive value. Initially in Figure 3.1(a) the target set K is not covered by the camera coverage set K and hence the shape function J (K > 0. The Hausdorff tracking task is to reduce the shape function to zero by moving the camera as shown in Figure 3.2(a) i.e., minimize the shape function. 40 II p: r. a( lllt Ta] (‘3 3.4.1 Image-Based Hausdorfi Tracking When the target is located in the active FOV of the sensor, the method of image- based Hausdorff tracking can be used to accomplish the surveillance task. The target is detected using image processing algorithms and represented as a collection of pixels (blob) on the image plane. The sensor coverage set is represented as a rectangle cen- tered at the image center and one of the tasks is to maintain the target blob within the sensor coverage set. Surveillance tasks such as set coverage and maintaining ap— propriate image resolution can be readily expressed in a setbased mutational analysis framework using a shape function as the minimization of a Hausdorff distance-based metric or the size of the target, etc. The task can be accomplished by reducing the shape function to zero. The task of visual servoing can also be expressed in a set- based framework [66]. The shape function essentially represents the error between the desired and actual shapes. Target and Coverage Sets for Image-Based Hausdorff Tracking Figure 3.3 depicts the coverage and target sets. The coverage set K is the set of points on the image plane (X, Y), within which we require the targets to be maintained. Generally it is taken as a rectangle centered in the image plane. The target K is defined as the union of the individual image plane blobs (sets of pixels) formed by the multiple (n) targets as: K = U121 Ki. Shape Functions for Image-Based Hausdorff Tracking The surveillance task of maintaining visibility of the target(s) with a certain resolution can be represented using a shape function as: 41 Figure 3.3: Target and coverage set for image-based Hausdorff tracking. J(K) = JFovu‘o + Jame?) + amuse?) (3.17) JFOVUAQ = I}? did?) dq JAm,,,(f{) = max(ff( dq — AREA_MIN, 0) JAMAR) = min(AREA-MAX — In dq, 0) where q is a point on the image set K and AREA_M AX and AREA_M I N denote the maximum and minimum admissible areas of the target set K for maintaining adequate resolution. Note that the shape function J (K ) is zero only when set K is completely covered by set K and when the area of set K is within the limits (AREA-M I N, AREA_M AX ) Otherwise it is a non-zero positive value. 42 Shape Directional Derivative for Image-Based Hausdorff Tracking In order to derive the feedback u to the camera, we need to find the shape directional derivative J (K )((p) of J (K) in the direction of the mutation (0(K) using the shape function and the mutational equation. Assuming a relatively flat object, i.e., the z coordinate of all the points on the target are approximately the same, we can derive an expression for J (K )((,0) by substituting equations 6.1 and 3.6 into 2.23 as: face) = [K waive) +fdiw dq (3.18) where, f (q) can be construed as the aggregated integrand from equation 6.1. 
For the shape function of the type J(K) = [K d§{(q) dq (3.19) Substituting Equations (3.19) and (3.6) into Equation (3.21) we get f(mev‘cu» = Lee—Menage + didiv(B(q>u)) dq = [f((aq—ern-Be) +d§~((q)divB(q)) dq - u (3.20) The above equation can be written as: June) = 2 [K (16311}; 0.1“" -p>..o(p>> dp+ 2 [K divee» dp (321) Here it must be noted that for the existence of the shape directional derivative, the integrand or E C1(IR”,IR). J FOV in Equation (6.1) is continuous. However, the 43 elements J Amin and J Amax in Equation (6.1) do not satisfy this criteria. Hence we modify the shape function in order to satisfy the continuity criteria using smooth approximations of the min and max functions as: “d —Amm2 62 “d ”Amm (MK 10 > + +0., 12 ) (3.22) JAmin : JAmaa: : _ (3:23) 2 V (Amar - IR dp)2 "l' 52 “ (Amos: _ fK dp) 2 where, e is a very small positive number. This approximation physically manifests as adding a very small area (of size 6) to the target set size and hence this approx- imation will hardly affect the validity of the shape function to represent the task of maintaining adequate resolution. Feedback Map u for Image-Based Hausdorff Tracking The problem now is to find a feedback map uc such that the shape function J is reduced to zero. Equation 3.21 can be approximated as: . - 1 . J(K)((0) S Z—tCi(K)U1 + CI'2lKl‘U2 +' C3(K)’uc (3-24) where,uc = [u?u§]T, u1 = [vx,vy,vz,A]T and u2 = [w$,wy,wz]T. zt is an esti- mated minimum bound on the target 2 position. The relation 2t > 2 will guarantee the inequality in 3.24. The matrices C1(K), 02(K) and C3(K) can be defined as: c.(K)=2/ inf B.(q>T(q—p>dp, (3.25) KqGHKIP) 44 02(K)=2/ inf B2(q)T(q—p)dp. (3.26) KqEHK(P) 03(K) = 2 [K B(q)dp. (3.27) where, the matrix B(q) is defined as: B(q) = [31 (q),B2(q)] and its elements are defined as: I "A 0 9x Bi(q) = 2— (3.28) t 0 —A qy QIQ (A2+02) x "TL q B2(q) = EH2 y 9X (3.29) will“ if“ —q. 91“ After combining the relevant terms from equation 3.24, it can now be rewritten A J(K)(T(q — pup. (3.35) K (Enid?) Let C (K )# be the pseudo inverse of C (K) A choice of u can be taken as: 1 # u = —§aC(K) J(K), (3.36) where a is the gain of the control. The gain distribution of the task among the various control channels of u is controlled by assigning the null space vector of the matrix C(K). 49 3.5 Mutational Hybrid Automata Model for Pervasive Surveil- lance Networks Consider a network of N cameras which comprise the pervasive surveillance system. With each camera camn we associate a coverage set Kn E [C(Wn) which signifies the region camn can view actively and Wn represents the total capable FOV of camn. A target to be tracked is represented by a target set K C [C(W), W = Une N Wn. An example of the graphical MHA model for two cameras tracking a target is shown in Figure 4.2 Groups of cameras are sequentially involved in tracking a target as it traverses the monitored region. Within each group, the behavior of each camera is governed by target visibility from a particular camera. This combination of continuous track- ing and switching behavior can be modeled using the Mutational Hybrid Automata (MHA) model presented in section 2.4. This section discusses the various parts of the MHA model for the pervasive surveillance network H p S N and further elaborates on the design of stable controllers for a pervasive surveillance system modeled using the MHA model. 
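Before moving to the hybrid model, here is a numerical sketch of the per-camera feedback law u = -alpha C(K)# J(K). The interaction-matrix blocks follow the standard point-feature image Jacobian from the visual-servoing literature, which Equations (3.28) and (3.29) appear to instantiate; the exact entries, the focal-rate column, the depth bound z_t and the gain are assumptions of this sketch rather than the dissertation's definitive expressions.

```python
import numpy as np

def B1(q, lam, z):
    """Translational and focal-rate columns of the point-feature Jacobian (assumed standard form)."""
    qx, qy = q
    return np.array([[-lam / z, 0.0, qx / z, qx / lam],
                     [0.0, -lam / z, qy / z, qy / lam]])

def B2(q, lam):
    """Rotational columns of the standard point-feature image Jacobian."""
    qx, qy = q
    return np.array([[qx * qy / lam, -(lam**2 + qx**2) / lam,  qy],
                     [(lam**2 + qy**2) / lam, -qx * qy / lam, -qx]])

def C_of_K(target, K, lam, z_t):
    """Discrete analogue of C(K) = 2 * integral over the target set of B(q*)^T (q* - p) dp,
    where q* is the point of the coverage set K closest to p (the projection of p onto K)."""
    C = np.zeros(7)
    for p in target:
        qstar = K[np.argmin(np.linalg.norm(K - p, axis=1))]
        B = np.hstack([B1(qstar, lam, z_t), B2(qstar, lam)])      # 2 x 7
        C += 2.0 * B.T @ (qstar - p)
    return C

def hausdorff_feedback(target, K, lam, z_t, J, alpha=0.5):
    """u = -alpha * C(K)^# J(K): velocity screw (vx, vy, vz, lam_dot, wx, wy, wz)."""
    C = C_of_K(target, K, lam, z_t).reshape(1, -1)
    return (-alpha * np.linalg.pinv(C) * J).ravel()

# Hypothetical usage: target blob on the image plane, coverage rectangle samples, error J.
target = np.array([[0.0020, 0.0010], [0.0021, 0.0012]])
K = np.stack(np.meshgrid(np.linspace(-0.0015, 0.0015, 20),
                         np.linspace(-0.0010, 0.0010, 16)), axis=-1).reshape(-1, 2)
u = hausdorff_feedback(target, K, lam=0.008, z_t=4.0, J=2.5e-6)
```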
3.5.1 Mutational Hybrid Automata Model It is relatively straight forward to write the MHA model for a pervasive surveillance system as: HPSN = (Q,M,Z,Init,D,E,G,R) (3.37) where Q represents the discrete variables, M represents the continuous variables, 2 = {2.1}qu is a collection of the continuous mutational dynamics, Init C_: Q x M is 50 the set of initial values of the variables in Q and M. D is a domain which associates a domain of operation of the continuous states for every discrete state q E Q, E is an edge set along which discrete transition takes place, G is a set of guard conditions which enable the discrete transitions and R is a reset map which resets the continuous states when a discrete transition is enabled. Discrete Variables In the above multi-camera surveillance scenario the target can be tracked by a single camera or a group of cameras at any given time. Fhrther, if the target is in the FOV of a particular camera, it can visually track the target, i.e., perform image-based tracking, while if the target is not visible, the camera can cooperatively track the target based on the location of another camera which can view the target. Hence one discrete (binary) variable bn is associated with each camera to indicate whether the target is visible to that camera, such that bn = 1 if target is visible in camn and bn = 0 otherwise. Another discrete variable on indicates whether a camera (camn) is involved in the particular tracking scenario. Thus the discrete variables collection can be defined as: Q = {{bn, CR}nENi }. Edge Set The edge set E defines which particular transitions are possible. Here we describe general rules to develop E. A handoff between cameras can be initiated when the target is visible to both cameras. An example edge set for a two—camera scenario is developed in section 4.3. 51 Domains The domains of operation Dq for each mode can be developed based on location of the target set K, which will decide which cameras are capable of viewing the target, and based on the continuity of the target motion, will decide which cameras can acquire the target as it moves through the monitored region W. Continuous Variables The collection of continuous variables AI consists of the individual camera coverage sets Kn and the target set K; i.e., M = {{KnlneNa K}. Let K E M represent an element of the valuations of M. Assumption: We can safely make the assumption that the target set K is convex. This assumption does not have significant effect on the model as all non-convex targets can be modeled as their convex cover. This assumption will be invoked to aid the proof of stability later in this section. We further make the assumption that the coverage sets Kn are also convex. This assumption can be substantiated by the fact that the active fields of view of most cameras are convex. Continuous Dynamics The continuous dynamics represent the change (deformation) of the sets in M which can be modeled using mutational equations as: Kn 3 (pn(Kn, un) where an E U is the control input to cam”. Based on the mode of operation of camn, the controller model can be changed, giving rise to different continuous dynamics for each camera in different modes of operation q. The motion of the target can also be represented as a mutational equation as: K 3 (0t(K) which can be estimated from the observations of the target. 52 Guard Condition A guard condition or switching condition G associated with each edge e E E is designed based on the location of the target set. 
Various events can be recognized and processed by the multiple camera surveillance network such as the visibility of a target in a particular target or the global location of the target or camera coverage set reaching some predefined value. For example, when the target moves into the visible region of camn then the bn associated with it changes, which effects the transition along an edge e to a mode where bn = 1. Further, based on the current location of the target (sensed and communicated from other cameras), the variable an associated with camera camn will change in order to involve the camera in the tracking task. Reset Map Since we are dealing with continuous positions of the targets and camera coverage sets, there are no jumps affected in the continuous variables. Hence the reset map R can be defined as R : {(p, q), K} I-—> K, K E M. However, in case of multiple groups of cameras being involved in the surveillance scenario, the coverage sets, which are defined as the union of the individual coverage sets, will jump from one group to another based on the change in the discrete variable c. For example, when switching the camera group from state q to p, the reset map is given as: R(p,q,K,;) t—> K j where Ki = UlEpKlv Kj = UlEqu- Init Set The target can be recognized only if it is in the view of at least one camera. Hence the set of initial conditions is restricted to the set of discrete modes Init : {q, M : 53 Anern = 1} where bn is the target visibility condition for camn. 3.5.2 Controller Design for Individual Cameras Events can be recognized by the controller which can be used to generate the control inputs provided to the various cameras and trigger mode switches. Such events include the recognition of the target by a particular camera and the location of the coverage set of camn hitting certain boundaries within its domain of operation Wn. Also, within each mode q we are provided with the number of cameras actively involved in the surveillance c and the cameras which can view the target on. Camera Active, Target Visible(cn = 1, on = 1) When the target is visible in the image plane of camn (bn = 1), the method of image— based Hausdorff tracking [67] can be used to track the target. In this controller, the control input to the controller is derived based on the shape function z'Jn(i"Kn,i" R) : K(Wn) x [C(W) I—> R+ where the superscript in represents the projection of a set on the image plane z'Wn of camn. The control input an can be calculated as is detailed in [68]. Camera Active, Target Not Visible (on = 1, bn = 0) When the target is not visible to camn but is visible to camm (bm = 1), we can use the method of cooperative Hausdorff tracking proposed in [67]. If the target K ¢ Wn then camn should track the projection HWn(K) of the target on the domain W”. This strategy will ensure that as soon as the target moves into the region Wn it will be encompassed in the visible region of camn. However, in a real tracking scenario the projection flu/”(1%) may not be easily available. But taking into account the fact that bm = 1, which implies that R C K m, 54 we can deduce that HWn(K) C IIWn(Km). Thus, tracking HWn (Km) will also ensure tracking the set IIWnUf) (proof of this proposition not presented due to space limitations but it can be done in a very straightforward manner). We can derive a control input an using cooperative Hausdorff tracking technique in order to track the set HWn ( K m) by minimizing the shape function CJn(Kn, HWn (K m)) : [C(Wn) x [C(Wn) +——+ R+. 
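On discretized sets, the cooperative reference set described above is straightforward to form: project every point of the neighbouring camera's coverage set onto camera n's domain and servo toward the projected set. The sketch below assumes, purely for illustration, that the domain W_n is an axis-aligned rectangle.

```python
import numpy as np

def project_onto_rect(points: np.ndarray, lo: np.ndarray, hi: np.ndarray) -> np.ndarray:
    """Projection onto an axis-aligned rectangular domain W = [lo, hi]: clip each point to W."""
    return np.clip(points, lo, hi)

# Hypothetical data: camera m's coverage set K_m lies mostly outside camera n's domain W_n.
W_lo, W_hi = np.array([0.0, 0.0]), np.array([10.0, 8.0])
K_m = np.array([[12.0, 3.0], [12.5, 3.5], [11.5, 4.0]])
reference = project_onto_rect(K_m, W_lo, W_hi)   # projection of K_m onto W_n

# Camera n now runs cooperative Hausdorff tracking with `reference` as the set to cover,
# so the target is acquired the moment it crosses into W_n.
```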
Camera Inactive(c.,, = 0) When the distance of the target set to the coverage set of camn is large i.e., the target is vary far away from camn, the variable on is set to zero and camn enters into inactive mode. This mode prevents all the cameras from continuously servoing based on small motions of a target which is rather far away and is used to conserve energy of the camera. In this mode, the input to the camera, camn is set to 0 and the camera does not move. The target dynamics are still represented using the mutational equation of the target set. 3.6 Chapter Summary In this chapter, we have introduced the use of controlled mutational equations to model the motion of the sensor coverage and target sets. Using the mutational equa- tion model, we have introduced the method of Hausdorff tracking which can be used to control the motion of the sensors (cameras) to maintain target visibility for moving targets. Specific examples of the Hausdorff tracking in image space and world space have also been presented. Cooperation among various sensors is an important characteristic for pervasive surveillance. We have introduced the mutational hybrid automata (MHA) model for modeling motion of multiple sensors tracking a target. The implementation of this 55 model for a pervasive surveillance scenario can be done in a distributed fashion as the various events occurring, such as target visibility, etc., can be detected and processed in a distributed fashion. 56 Chapter 4 Analysis of Surveillance Networks This chapter presents the stability analysis of the proposed model and controllers. The shape Lyapunov theorem is presented in section 4.1, and can be used to analyze the asymptotic behavior of tubes. The stability analysis of the proposed Hausdorff tracking method using the shape Lyapunov theorem is presented next, followed by the analysis of an example two-camera system modeled using the mutational hybrid automata model proposed earlier. 4.1 Lyapunov Theorem for Shape Space The asymptotic behavior of the measure J (K (t)) of the deformation of the set K can be studied using the shape Lyapunov theorem. The deformation is described as the reachable tube K (t) and is the solution to the mutational equation in equation 3.1. We usually look for assumptions that guarantee the convergence to O of J (K (t)) Definition 4.1.1 (Shape Lyapunov Function) Consider a subset E of R” and a mutational map (,0 defined on the set E as shown in section 1, a shape functional J : [C(E) I——> R+ and a continuous map f : R I——> IR. The functional J is a f-Lyapunov 57 6L5 Tl.- 56 function for the mutational map (,9 if, for any K E Dom(J) we have, deixngwmwnga Jm3=wm) (4D where, w() is a solution of w’ = —f(w). When the solution w(-) converges to 0 as t —+ 00, the shape Lyapunov functional J (1990(t, K )) also converges to 0. Using shape derivatives, the above definition of Lyapunov functions can be stated as the shape Lyapunov theorem. Theorem 4.1.1 Consider a subset E of IR" and a mutational map 99 defined on the set E as shown in section 1, a shape functional J : K(E) I—> IR+ and a continuous map f : IR t—v R. Let the Eulerian semi-derivative of J in the direction go exist and be defined as J(K)( “w H-IH Using the definition of directional shape derivative this yields, Mono) s w'(0> (4.4) Then, observing that w’ (0) = — f (w(0)) = — f (J (K )) we can conclude the result. 
58 Conversely, denote y(t) = J (1999(t, K )) We can now write mo~wn=u/:Efl WicK> By the hypothesis, we obtain t yw—mms—Afuwns 0%) Now, we can write yf1(t) S —f(y(t)) (47) Now consider a solution w(-) of the differential equation u305) = -f(w(t)), w(0) = J(K) (4-8) Using the Gronwall Lemma, for any positive t, we obtain y(t) = J (19920, K )) S 10(0) (4.9) This proves the theorem. [:1 Using the shape Lyapunov theorem we can derive assumptions on the input u, to move the camera/ robot, in order to accomplish the surveillance task. 59 4.2 Stability Analysis of Hausdorff Tracking In this section we analyze the stability of the Hausdorff tracking method using the shape Lyapunov theorem discussed in the previous section. Stability in the sense of Lyapunov for the proposed Hausdorff tracking method would imply the asymptotic convergence to zero of the shape function J (K) This asymptotic convergence to zero of J (K ) will imply that the task criterion for the system is accomplished. For both the image-based and cooperative Hausdorff tracking methods proposed, the shape directional derivative has been reduced to the following form: J°(K)(so) < C(K) - u (4.10) We can now propose the following theorem for stability of the Hausdorff tracking method. Theorem 4.2.1 For a automated tracking system modeled using the mutational equa- tion K 3 w(K) where K is the target set and w(q) = B(q)u with the plant input vector u. The shape ( error) function is defined as J (K) with the shape directional derivative in the direction of the mutation (J (K )(go) ) defined by equation 4.10. The system is stable, i.e., the shape function asymptotically converges to zero for the input u = —aC#(K)J(K), a > 0. Proof. In order to prove the stability of this system, we have to show that it satisfies the shape Lyapunov theorem from section 4.1. That would imply that the shape function J (K) asymptotically converges to zero. The shape Lyapunov theorem can be written as: J°(K)(_ 0, Vi = 0 implies that the target is covered. Based on the procedure for calculating the motion input an to the cameras outlined in 4.3.1, we can show that z'Jn and an exponentially tend to zero, which implies that V,- also exponentially tends to zero. Hence we can say that the continuous evolution in the modes {q1, q2, q3} converges exponentially to the equilibrium state. We further note that the switching conditions G defined in equation 4.20 have a certain hysteresis associated with them due to the finite size of the non-empty target set R and the non-empty overlap region W1 (1 W2. Hence we can prove that Zeno executions are prevented, which completes the proof. [:1 Hybrid System Stability for the Two-Camera Example System without Controlled Switches Based on Definition 4.3.1 of internal stability and lemma 4.3.1, it can be deduced that each state (except qO) is internally stable. We can now define stability for the pervasive surveillance scenario as: 72 Definition 4.3.2 {Stability of a pervasive surveillance system) A pervasive surveil- lance system is said to have the stability property if a target set (R) will be continu- ously covered in the monitored region if it enters the coverage set (Ki) of one of the sensors comprising the network. Now, if we can prove that the set of modes {q1, q2, q3} is invariant, then any hybrid execution (r, q, K) starting in these modes will always remain in these exponentially stable modes, which implies that the target will be pervasively tracked. 
Based on these conditions we can put forward the following theorem for stability of a pervasive surveillance network. Theorem 4.3.1 Consider the graphical MHA model of a pervasive surveillance sys- tem in Figure 4.2. The set of operating modes {q1, (12, q3} is an invariant set. Proof. Consider a hybrid trajectory 5 H = (T,q, K) of the system with its initial conditions within the set {q,KIq E {q1,q2,q3}}. The modes {q1,q2,q3} are expo- nentially stable, which implies that the energy of the system in that mode can be reduced to a very small value in any finite amount of time t. For mode ql this implies that when the target R C K 2 hits the boundary of the domain W1, its projection flu/10?) C HW1(K2) is already covered by K1 as IIW1(K2) C K1. When K is inside the boundary of W2, IIW2 (R) Q R C K1. Hence the target is within the coverage region of K1 and will be recognized by K1 which will generate the event to set bl = 1, which in turn will switch the mode to q3 (assuming W1 0 W2 75 0). Thus as soon as the target moves out of W2 and into W1 it will be covered by K1. A similar argument can be presented for the mode qg and the target hitting the boundary of W2. Now in mode q3, the target is located in the intersection of the domains W1 and W2 (fr c W1 n W2) and is being covered by both K1 and K2 such that fr c K1 and If C K 2 due to the exponential stability of mode q3. Now the target can only move into the regions W1 \ (W1 (1 W2) or W2 \ (Wl 0 W2) and in both regions either K1 or 73 K2 can cover the target, respectively. This implies that the system can switch modes to q1 or q2, both of which possess the exponential stability (in the sense of Lyapunov) property. We have shown that all trajectories originating in ql and q2 can switch only to state q3 and all trajectories originating in q3 can switch only to either q1 or q2. This implies that the set of modes {q1, q2, q3} is invariant, which proves the theorem. El Based on the above theorem, and the fact that all the modes in the invariant set {q1, q2, q3} are exponentially stable (in the sense of Lyapunov), we can conclude that the hybrid system is stable i.e., the target set is continuously covered once it enters the invariant set. 4.3.3 Two-Camera Example for Pervasive Surveillance with Controlled Switches For the two-camera pervasive surveillance example considered in the previous section 4.3, the discrete variables were comprised of Q = (b1, b2, c) = {q} where b,- is a binary value which denotes whether the target is visible to sensor Cam,- and c is a discrete variable which indicates which cameras are involved in the tracking scenario. This representation of the discrete state c is not very intuitive. Hence for a truly distributed implementation, we propose to modify the discrete variable c and separate it into its constituent parts. Each camera will be associated with its own 0,- which will assist in a distributed implementation. For the example system presented earlier, only the modes in which the discrete variable c was set to 2 were considered. This implies that for all possible locations of the target all the cameras will be servoing the target. This scenario will be wasteful for energy consumption and does not take into account the fact that the sensors need not track the target if it is relatively far away from the sensor’s domain. Since the cameras need not be tracking the target continuously and can shut down 74 when the target is not close by, the MHA model with controlled switching will be more energy efficient. 
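The controlled-switching behaviour can be condensed, per camera, into a small decision rule; the activation threshold delta and the hysteresis band epsilon anticipate the guard conditions formalized below. The rule and its numeric defaults are illustrative only.

```python
def select_mode(target_visible: bool, neighbor_sees_target: bool,
                dist_target_to_domain: float, currently_active: bool,
                delta: float = 2.0, eps: float = 0.5) -> str:
    """Per-camera mode logic for the controlled-switching MHA (illustrative sketch)."""
    # Hysteresis: an active camera deactivates only beyond delta + eps,
    # while an inactive camera reactivates once the target is within delta.
    threshold = delta + eps if currently_active else delta
    if dist_target_to_domain > threshold and not target_visible:
        return "inactive"        # c_n = 0: conserve energy, motion input u_n = 0
    if target_visible:
        return "image_based"     # b_n = 1: image-based Hausdorff tracking
    if neighbor_sees_target:
        return "cooperative"     # b_n = 0, b_m = 1: track the projected neighbour coverage set
    return "inactive"
```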
Figure 4.3 shows the MHA model of the two camera system with controlled a controlled switch which is used render a camera into an inactive state. 75 8:336 8:838 55 2.858 eHmeoéBu 2: 8m Ecofi <32 EQEQSO ”m6 853m NK&UM ‘NN:N vRUNN <§U- NNSUM AN3.N¥VNSMN¥ < :5 A$2¥V§m Cw :5 k a w: . / “VAC: 3 gesmw mambo s >\ < \RUN I_ . N N: N 3&5”st a- «CNN 3.3" s .. u MCNN $2.5st Aizkfism Cw Aizvbicm .M CEMSmww thsmk 2.3.8 n s 2.3.: n s aéfigé snflecwéuicw téfigss N ... 33%.? %VA¥V h fiHN¥C- 5} 0(05) = {K E MIR C WiAdW2(f{) > 5} where, 6 is a positive scalar distance whose value is selected by the system designer based on the controller design and stability requirements of the model. The Init set can be defined as Init = {q,Kl Vi=1toN b,- = 1}}. This implies that the Init set encompasses all the states where the target is visible to at least one camera. 77 Edge Set (E) The possible discrete transitions denoted by the edge set E are: E = {(q1, (10), (ql, q2)(q2, q0)(<12, r13). ((12, <11), ((13, (12), ((13, (10). ((14, (10), ((15, (10), ((14, (11), (C11, (14), ((15, (13), ((13, (15)} (4-21) Continuous Dynamics: lV’Iutational Equations (2) The continuous dynamics in states q1, q2 and q3 are the same as in the previous example presented in section 6.7.3. However for the sates q; and q5, the continuous dynamics are modified such that the camera that can View the target will perform image based Hausdorff tracking while the other camera will have zero input. Hence there will be a third case such that on = 0. Camera Inactive, (on = 0) When the camera is inactive, the input the the system is set to zero i.e., un = 0. Guard Conditions: Switching Control (C) The various events recognized by the distributed sensors are: 0 Target visible to camera camn 0 Location of coverage set of all neighboring cameras. Using the location of the neighboring cameras, the current camera can estimate the approximate location of the target as shown in 4.3.1. The guard conditions for the 78 various transitions can be written as: A “(11.92) = {KCM3KCl/Vll C(qsml = {KEMIKCW2} C(q1,q0) = {KEMzR'nKlztlAKnK2=0} C(q3,q0) = {KEMrIi’flKlzzleKflKg-zlb} C(q2,q0) = {KEMzKflKlzleKflngfl} C(q4,q0) = {KEMzR’flKlrfiAKflnglb} C(q5,q0) = {Kelei’nlemARanzw} C(qwn) = {KEMIK0K1=0} C(q2,q3) = {KEMin'flKg-tlll} G(Q3.<15) = {K E M I dW2(Kll > 5+ 6} C(q5,q3) = {K e M : dWZ(K1) 4.) a} 001144) = {K E M I dW1(K2) '> ‘5 +6} G(CI4:(11) = {K C M3d111/1(K2)> (5} where, 6 > 0 is an arbitrarily small positive integer. A hysteresis is inherently as— sociated with the switching conditions for every mode which will prevent infinite switching in a finite time period (zeno execution). 4.3.4 Stability Analysis of the Two Camera Example with Controlled Switches We can apply a two step procedure similar to the one presented in 4.3.2 in order to analyze the stability of the mutational hybrid automata model presented in Figure 4.3. First we prove that each mode in the set q = {q1, q2, q3, q4, q5} is internally stable 79 as defined in Definition 4.3.1. Next we show that the goal set of the mutational hybrid system is invariant which would imply that a trajectory which reaches the goal set will remain in the goal set. Further, by restricting the I nit set to the goal set, we can show that all trajectories of the mutational hybrid system initialized in the I nit set will remain in the goal set hence completing the proof. 
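The guards that involve d_{W2}(K_1) compare a set-to-domain distance against delta and delta + epsilon. On discretized sets, one natural reading (an assumption, since the text does not spell out which set distance is intended) is the smallest point-to-point distance between the two samples:

```python
import numpy as np

def set_to_domain_distance(K: np.ndarray, W: np.ndarray) -> float:
    """d_W(K): smallest distance from any point of the discretized set K to the
    discretized domain W (interpreted here as an inf-inf distance, an assumption)."""
    d = np.linalg.norm(K[:, None, :] - W[None, :, :], axis=2)   # pairwise distances
    return float(d.min())

def guard_q3_to_q5(K1: np.ndarray, W2: np.ndarray, delta: float, eps: float) -> bool:
    """Edge (q3, q5) fires once d_W2(K1) exceeds delta + eps."""
    return set_to_domain_distance(K1, W2) > delta + eps

def guard_q5_to_q3(K1: np.ndarray, W2: np.ndarray, delta: float) -> bool:
    """Edge (q5, q3) fires once d_W2(K1) drops back below delta."""
    return set_to_domain_distance(K1, W2) <= delta
```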
Internal Stability Lemma 4.3.2 Consider the continuous closed loop dynamics of the example two- camera system represented in Figure 4.3 in the operating modes {q1,q2,q3,q4,q5} and the guard conditions 4.22. The operating modes {q1, q2, q3,q4,q5} are internally stable. Proof. For each state in {q1, (12, q3, q4, q5} we can define a candidate Lyapunov func- tion V,- as: CI1=V1 = CJi+iJ2 (121V2 = z"71+CJ2 Q3IV3 = z'J1+z'Jz (14IV4 = z'J2 <15=V5 = iJl where, iJn 2 0 and 6.11; 2 0 are defined as in equation 4.18 and 4.19 respectively. Hence V,- {i = 1,2,3,4,5} are candidate Lyapunov functions, since V,- _>_ O, V,- = 0 implies that the target is covered. Based on the procedure for calculating the motion input an to the cameras outlined in 4.3.1, we can show that z'Jn and an exponentially tend to zero, which implies that V,- also exponentially tends to zero. Hence we can say that the continuous evolution in the modes {q1, (12, q3, q4, q5} converges exponentially to the equilibrium state. 80 We further note that the switching conditions C defined in equation 4.22 have a certain hysteresis associated with them due to the finite size of the non-empty target set R and the non-empty overlap region W1 0 W2. We can also assume that the size of the camera coverage set for any camera is larger than the size of the target which implies that that dW1 (K 2) g dW1(K) when the target set is covered by the coverage set K 2 of camera cam2. We can write a similar condition for the target being covered by camera 1.Hence we can prove that Zeno executions are prevented, which completes the proof. Cl Hybrid System Stability In order to prove that the target will always be visible to at least one camera at all times, i.e., stability of the hybrid system, we need to prove that the set of modes q = {q1,q2, q3, q4,q5} is invariant. This will imply that once the target enters this set of invariant modes, i.e., the target is visible to at least one camera, the hybrid trajectory will never leave the invariant set. Theorem 4.3.2 Consider the graphical MHA model of a pervasive surveillance sys- tem in Figure 4.3. The set of operating modes {q1, q2, q3, q4, q5} is an invariant set. Proof. Consider a hybrid trajectory 5 H = (r, q, K ) of the system with its initial conditions within the set {q, Klq E {q1,q2, (13,Q4, q5}}. The modes {q1, q2,q3,q4,q5} are exponentially stable, which implies that the energy of the system in that mode can be reduced to a very small value in any finite amount of time t. In the mode q4, If C K2. Further as the target moves closer to the domain W1, dW1(K2) —> 6 and as soon as the dW1(K2) < 6, there is a mode switch to mode q1. For mode ql this implies that when the target Ii’ C K2 hits the boundary of the domain W1, its projection HW1(R) C HW1(K2) is already covered by K1 as HW1(K2) C K1. When R is inside the boundary of W2, HW2(F{) g Ff C K1. Hence 81 the target is within the coverage region of K1 and will be recognized by K1 which will generate the event to set bl = 1, which in turn will switch the mode to q2 (assuming W1 0 W2 # 0). Thus as soon as the target moves out of W2 and into W1 it will be covered by K1. Now within the mode ql, consider the target moving away from the domain W1. This will imply that the distance dW1(F() is increasing which would in turn imply that dWl (K 2) is increasing. When dW1(K2) > 6 + e, the system switches to mode q4. At the instant when there is a mode switch from mode q4 to q1, dW1(f{) > 6. 
Assuming that the maximum velocity of the target centroid is vmax it will take the target t1 = 6/vmam time to reach the boundary of domain W1. From the continuous dynamics system, the error CJ2 can be reduced to zero in a finite amount of time t2 S 5a where a is the gain of the Hausdorff tracking control input. Now, if t2 < t1 i.e., 6 > 5avmax, we can show that the Lyapunov function V1 in mode ql will be reduced to zero before the target set intersects with the boundary of domain W1. Hence we have shown that for all mode switches out of mode q1, the Lyapunov function of q1 is reduced to zero. A similar argument can be presented for the mode q3 and the target hitting the boundary of W2 or moving away from the domain W2 such that dW2(K 1) > 5 + 6. Now in mode q2, the target is located in the intersection of the domains W1 and W2 (K C W1 (1 W2) and is being covered by both K1 and K2 such that K C K1 and If C K 2 due to the exponential stability of mode q2. Now the target can only move into the regions W1 \ (W1 0 W2) or W2 \ (W1 0 W2) and in both regions either K1 or K 2 can cover the target, respectively. This implies that the system can switch modes to q1 or q3, both of which possess the exponential stability (in the sense of Lyapunov) property. We have shown that all trajectories originating in q1(q3) can switch only to state q2(q2) or q4(q5), all trajectories orignating in state q4(q5) can switch only to state 82 q1(q3) and all trajectories originating in q2 can switch only to either q1 or q3. This implies that the set of modes {q1, q2, q3, q4, q5} is invariant, which proves the theorem. E] 4.4 Chapter Summary This chapter presents the analysis of the Hausdorff tracking method and the muta- tional hybrid automata model presented in Chapter 3. The Lyapunov theorem for shape spaces is presented and it is used to prove the stability (in the sense of Lya— punov) for the Hausdorff tracking method presented. The stability analysis of an example two-camera pervasive surveillance system is also presented. 83 Chapter 5 Design of Surveillance Networks Collaboration among multiple sensors can generally lead to better sensing perfor- mance and higher fault tolerance to individual node failure. Calibration of the sen- sors aids collaboration by establishing a common frame of reference for the various sensors. Calibration needs to be handled at many levels including location calibration of the sensors, color map calibration, calibration of the target recognition modules etc. Coordinating the tracking of multiple targets using multiple sensors can be ad- dressed in two ways: 1. Use a central monitoring station to integrate the track information from each node [39]. 2. Allow sensor nodes to operate autonomously and exchange track information with one another in a geographical vicinity and provide a handover mechanism when the target transitions from one sensor field of view to another [37]. The first approach simplifies track management and communications but is not scalable to large networks due to the limited communication and processing capability of the central monitoring station. Also, the system is not fault tolerant to the failure of the central monitoring station. The second approach is highly scalable and fault 84 tolerant to the failure of nodes, but involves significant communication overhead for target track management and disambiguation. 
5.1 Wide Area Camera Calibration Calibration of the sensors in a wide area environment is a difficult task without physical access to the environment that the sensors are placed in. Normal techniques of system calibration such as using precise planar grids [70] are not possible in a previously unknown and inaccessible environment. Only the intrinsic properties of the sensors can be calculated and accounted for prior to deployment. Therefore, a robust calibration method that allows for a wide dispersion of sensors to be calibrated must be used. 5.1.1 Unitless Calibration An unsynchronized network of cameras can be calibrated by trackng an identifiable point in motion. This provides a unitless orientation of all sensors that have sensed the point. The sensor network must meet the requirement of viewing area overlap for each camera with at least one other camera. Also, all sensors must be interconnected via overlapping viewing areas of intermediary sensors. Without this interconnectedness the network would be segmented into groups that could not be related spatially. This method is ideal for our proposed environment for the following reasons: 1) with a drop-in-place environment there is no assurance that sufficient identifiable markers will be available for automatic calibration of the type proposed by [71] [72] 2.) A Kalman filtering technique can be used to reduce the error between pairs of sensors and the propagation of error across the network 3.) The world coordinate system can be defined with respect to any camera in the network including the mobile sensors. 85 Initial Calibration It is desirable to have taken the intrinsic characteristics of the cameras into account prior to deployment to reduce the number of actions necessary to calibrate the system. These intrinsic properties are represented in matrix form as an 0 v.0 A: 0 av '00 (5.1) .OOIl and includes the focal length (au, av), image center (uo, v0), and skewness (c) of the image [73]. Focal length is the distance from the lens to the projection plane. The image center is the center of the projected image on the projection plane. This allows a more accurate pose to be generated. Finally, the skewness is the ratio of the pixel width to pixel height and is used for correction of a flawed lens or to normalize wide view images. We will assume that these characteristics are already well known for each sensor in the network. The extrinsic parameters of the cameras can be found using a point-in-motion scheme. By identifying a detectable point in motion it is necessary, for every pair of cameras viewing the point, to calculate both the cameras’ transformation matrices. A camera’s transformation matrix is composed of the rotation matrix and the translation vector for the camera with respect to another. This matrix is represented as: T=[R t] (5.2) The intrinsic and extrinsic matrices are then combined to provide a complete descrip- tion of the mapping of a point in 2D image coordinates to 3D coordinates in the world 86 frame. " T ' ' (I? ’U. y 2 1 _ .. ‘1 One method of wide area calibration of extrinsic parameters is a point in motion calibration technique. This method, proposed by [74], allows for a single identifiable point to be passed throughout the observable environment. When all sensors have detected the point while another sensor has the point in view simultaneously, a rough estimate of the unitless calibration can be generated. 
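A concrete rendering of this mapping: compose the intrinsic matrix A with the extrinsic transformation T = [R t] and recover the depth as the third homogeneous coordinate. The numerical values below are hypothetical.

```python
import numpy as np

def intrinsics(au, av, u0, v0, c=0.0):
    """Intrinsic matrix A with focal lengths (au, av), image centre (u0, v0) and skew c."""
    return np.array([[au, c, u0], [0.0, av, v0], [0.0, 0.0, 1.0]])

def project(A, R, t, X_world):
    """lambda * [u, v, 1]^T = A [R | t] [x, y, z, 1]^T; returns (u, v) and the depth lambda."""
    T = np.hstack([R, t.reshape(3, 1)])          # 3 x 4 extrinsic matrix
    uvw = A @ T @ np.append(X_world, 1.0)        # homogeneous image coordinates
    lam = uvw[2]                                 # depth (perspective scale factor)
    return uvw[:2] / lam, lam

A = intrinsics(au=800.0, av=800.0, u0=320.0, v0=240.0)
R, t = np.eye(3), np.zeros(3)                    # camera placed at the world origin
uv, depth = project(A, R, t, np.array([0.5, -0.2, 4.0]))
print(uv, depth)   # pixel coordinates of the shared calibration point and its depth
```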
This method is better suited for wide area calibration because of the relatively large distances between sensors. In order for a calibration object to be used to generate accurate results, resolution is very important. The most accurate calibration takes place with the object filling most of the screen. For a good calibration to be performed with a calibration object in a wide area, it would need to be quite large to ensure the pose was properly modeled. This is one of the strengths of the point-in-motion technique. The points can be distributed throughout the volume of the coverage set, creating a virtual calibration object. The scalar A is called the depth and accounts for perspective effects [75] and can have any value. This ambiguity is a result of the loss of the z coordinate in relation between the object in space and the object on the image plane. For any point (ui,v,-) in the image plane i, a matrix T,- exists that satisfies F . .. ., (I: “i y /\2 U2 = ATi (5 4) Z 1 - - 1 where A,- is the depth at that point i. 87 The collected data is then processed assuming that times are synchronous for each pair of positions and recorded for the sensors. Processing includes generating an initial rough extrinsic calibration. Since no actual synchronization is used to gather data, there will be time domain discrepancies in every pair of 2D points used to find location and pose. This error in the developing map is reduced later using an extended Kalman filter [76]. 5.1.2 Method for Assigning Units In order to find the scale factor associated with the calibration matrix it is necessary to make some accurate real world measurements. With a drop-in-place system a number of methods can be used to find the scalar A associated with the calibration matrices. In our proposed environment there can be no assurance that a known dimension will be found or that an object with a known dimension can be placed. We have implemented a pre—calibrated dual head camera rig. This rig can be used to accurately assign units to the unitless calibration data that has already been generated. An accurate calibration such as that proposed by Zhang [72] will have been performed and the unit’s characteristics will be considered well known so that minimal error is introduced into the scaling of the sensor map. All sensors in the network of calibrated pairs that now have a shared normalized transformation can have units assigned with only a single shared marker with one other sensor by the pre-calibrated dual sensor rig. 5.2 Grouping Architectures One important issue we need to address for the surveillance network is group forma- tion and management. There is extensive previous research done to address forming and managing ad hoc network groups based on the proximity of the sensor nodes. In 88 most surveillance tasks, spatial proximity of sensor nodes is a vital characteristic of sensor networks. In order to take advantage of the proximity of the nodes to save energy or enable efficient sensing information processing, sensors are often organized into local collaborative groups. The group management approaches differ from each other depending on how the target object is identified and the parameters are detected. In the work of Chen et a1. [77], the target is detected based on the acoustic signal strength and a group management algorithm is proposed based on the Voronoi diagram. Liu et a1. [78], proposed a distributed group management framework for target localization appli- cations. 
In their framework, each node runs a likelihood ratio test (LRT) once the target is detected. The group formation and leader election process is then initialized based on the result of the LRT. In RoamHBA (Roaming Hub Based Architecture) [79], a subnetwork is established among the sensor nodes that are observing the same target. Communication within this subnet is done through a multicast tree, called a roaming hub. The subnetwork is updated by adding or deleting nodes at the edge.

For the visual surveillance network, we propose to use the RoamHBA-based group management approach. One group will be set up for all the nodes in a particular locality. In most cases, it is necessary for the tracking group to elect a leader to coordinate the tracking and surveillance task. Previous research work has addressed the leader election problem in a wireless sensor network. The dynamically elected leader is responsible for coordinating the tracking tasks of each node in the group.

Periodically the group leader runs the coverage optimization algorithm. A timer is set up to invoke the coverage optimization process. The value of the timer is determined based on the capabilities of the sensor nodes and the characteristics of the targets being tracked. For example, if the targets move very fast, the timer should have a very short timeout, and at the same time the requirement on the capabilities of the sensor nodes is increased. The result of the optimization algorithm, that is, the (multiple) target assignments, should be transmitted to the member nodes using the networking transport services provided by the sensor nodes. The actual transport services used can vary based on the network infrastructure on which the sensor network is built.

It is reasonable to assume that the sensor nodes are equipped with geographical-location-aware devices. Extensive research has been done on forwarding packets in a wireless network based on the geographical location of the nodes. Different from traditional wireless routing protocols, in location-based routing each node only needs to know its own position and the position of its one-hop neighbors. This makes location-based routing perform well in a dynamic environment such as a mobile surveillance network. More details about location-based routing can be found in [80].

If the sensor nodes are geographically distributed and connected to the Internet, common multicast protocols can be used for the group leader to disseminate the coverage information to the group members. The group management can be done using protocols like the Internet Group Management Protocol (IGMP), and multicast routing protocols can be used to set up the routes. In the more common case, where the sensor nodes are connected using wireless networking technologies such as the IEEE 802.11 series, the group management techniques presented at the beginning of this section can be used. The dissemination of the control information for sensor tasking can be done using multicasting or broadcasting techniques in wireless networks. For access-point-based wireless networks, multicast among the group can be done with the aid of the access points. For wireless ad hoc networks, much research has been done to solve the multicasting problem. In the most straightforward case, a broadcast-simulated multicast, in which the nodes broadcast within their locality, can serve the goal of a surveillance network.
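The leader's periodic coverage optimization described above can be summarized as a small control loop. The sketch below is only illustrative: the timeout rule, the function names (estimate_target_speed, run_coverage_optimization, multicast_assignments) and the constants are hypothetical placeholders, not part of the dissertation's implementation.

```python
import time

# Hypothetical bounds on the optimization timer (seconds); faster targets
# force more frequent re-optimization, as described in Section 5.2.
MIN_PERIOD, MAX_PERIOD = 0.5, 10.0

def leader_loop(estimate_target_speed, run_coverage_optimization,
                multicast_assignments, speed_scale=5.0):
    """Group-leader loop: adapt the timer to target speed, re-optimize coverage,
    and disseminate the resulting target assignments to the member nodes."""
    while True:
        speed = estimate_target_speed()             # m/s, from the target observations
        # Faster targets -> shorter timeout (clamped to the allowed range).
        period = max(MIN_PERIOD, min(MAX_PERIOD, speed_scale / max(speed, 1e-3)))
        assignments = run_coverage_optimization()   # e.g., the Section 5.3 allocation
        multicast_assignments(assignments)          # transport service of the network
        time.sleep(period)
```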
5.3 Sensor Allocation for Multi-Sensor Cooperation

For successful completion of the surveillance task, it is important to allocate the sensors to the targets that are actively being tracked. Generally, each target should be tracked by the sensor that can view it at an optimal resolution for the best visibility of that target. However, when the system is heavily loaded, that is, when the target density in a particular region with a limited number of sensors is high, one sensor might need to track multiple targets. Even when the system is underloaded, multiple-target tracking by individual sensors is highly desirable, as viewpoint redundancy will lead to better estimates of the target coordinates and also present the viewer a clearer view of all the targets that are being tracked. In this section we propose a heuristic method to determine an optimal sensor allocation strategy.

At each system interval, the leader node computes a cost and visibility weight for the targets being tracked. Assume m sensors with cameras are set up in the system and there are n targets that need to be covered. One camera can cover from 0 up to n targets, so for each camera the number of possible coverage sets is \sum_{k=0}^{n} \binom{n}{k} = 2^n. In order to determine the optimal coverage set for each camera, we set up a weight matrix W for the coverage options. An element of the matrix W is defined as (i \in [1, m] and k \in [0, 2^n - 1]):

w_{ik} = \sum_{s=1}^{C(L(k))} V_{i,L(k)} \cdot \left( R_{s,i,L(k)} \cdot w_v + P_s \cdot w_P \right)   (5.5)

The parameters are explained as follows:

- L(k) is the kth combination of the set [1, n].
- V_{i,L(k)} is a binary value (either 0 or 1) that indicates the ability of sensor node i to cover all the targets in the set L(k).
- R_{s,i,L(k)} is the resolution of target s viewed by sensor node i when all the targets in the set L(k) are covered by sensor node i. The resolution is calculated as follows:

R_{s,i,L(k)} = \lambda \frac{x_s}{z_{s,i}}   (5.6)

where x_s is the width of target s and z_{s,i} is the distance between target s and sensor node i. \lambda \in (\lambda_{min}, \lambda_{max}) is the zoom factor (focal length) of the camera of sensor node i.

- P_s is the preassigned priority of target s in the tracking system.
- w_v and w_P are adjustable parameters that reflect the importance of resolution and priority in the coverage weight.

For each row of the matrix, a larger value indicates a more desirable target assignment. However, it is required that all targets be covered by the sensor nodes. Thus, an optimal solution of the problem can be defined as {(y, L_y) | y \in [1, m] and L_y is a combination of the set [1, n]} which satisfies the condition that \sum_{y} w_{y,L_y} is the largest among all the possible coverage sets for the cameras, subject to all targets being covered. By brute force the complexity of the algorithm is O(2^{mn}). When n and m are large, heuristic algorithms are needed to find the optimal assignment.

In reality, many heuristic approaches can be taken into account when solving the problem. For example, we can prove that under normal target density the coverage matrix will be a sparse matrix, where most of the elements are zero. Further, in most cases we do not need to find the optimal solution; finding a sub-optimal solution reduces the complexity of the algorithm dramatically. For example, after setting a threshold on the acceptable weight, we can take the first solution encountered as long as all the targets are covered.
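As a concrete illustration of the coverage-weight computation in Equation (5.5) and the threshold-based sub-optimal search just mentioned, the following sketch enumerates coverage sets per camera and accepts the first assignment that covers every target with an acceptable weight. It is a simplified, hypothetical rendering: the visibility test, resolution model, attribute names and threshold are placeholders, and a real deployment would use the sparsity heuristics discussed above rather than full enumeration.

```python
from itertools import combinations

def coverage_weight(cam, subset, w_v=1.0, w_p=1.0):
    """Weight w_ik of Equation (5.5) for one camera and one coverage set L(k)."""
    if not all(cam.can_see(t) for t in subset):                 # V_{i,L(k)} = 0
        return 0.0
    total = 0.0
    for t in subset:
        resolution = cam.zoom * t.width / cam.distance_to(t)    # Equation (5.6)
        total += resolution * w_v + t.priority * w_p
    return total

def allocate(cameras, targets, weight_threshold):
    """Greedy, threshold-based sub-optimal allocation: pick, per camera, the best
    coverage set above the threshold, stopping once every target is covered."""
    uncovered = set(targets)
    assignment = {}
    for cam in cameras:
        best_set, best_w = (), 0.0
        # Enumerate the coverage sets of the remaining targets (small n assumed).
        for r in range(len(uncovered) + 1):
            for subset in combinations(uncovered, r):
                w = coverage_weight(cam, subset)
                if w > best_w:
                    best_set, best_w = subset, w
        if best_w >= weight_threshold:
            assignment[cam] = best_set
            uncovered -= set(best_set)
        if not uncovered:
            break
    return assignment, uncovered      # non-empty 'uncovered' => constraint not met
```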
5.4 Optimal Hausdorff Tracking

Given the task specified by Equation (3.31), notice that C(\bar{K}, K) is not a square matrix, which indicates that the system is redundant. Further, the condition from Equation (3.30) indicates that there are numerous choices of the input u which satisfy the stability requirements of the Hausdorff tracking system. This implies that the system has infinitely many solutions for a choice of input u which will accomplish the task. Therefore the designer can choose an algorithm in order to achieve an optimal solution for this system. An appropriate choice of u will minimize the resolution inadequacies and the energy consumed.

A review of the recent literature shows that a commonly used approach to optimal redundancy resolution relies on the use of a generalized or pseudo-inverse transformation of the system, while optimizing a given task criterion constructed from multiple objective functions using a weighted sum method [81], [82]. Optimality can be quantified by defining supplementary task functions which the system needs to accomplish. These functions are called objective functions, and they need to be minimized in order to achieve an optimal solution. The optimization problem can be expressed as: find the target and coverage sets (\bar{K} and K, respectively) and the input u which satisfy

\min_{\bar{K}, K, u} D(\bar{K}, K, u) = \begin{bmatrix} D_1(\bar{K}, K, u) & \cdots & D_N(\bar{K}, K, u) \end{bmatrix}^T,
\quad J(\bar{K}, K) = \int_{\bar{K}} d_K^2(p)\, dp,
\quad \text{subject to: } J(\bar{K}, K) \le R   (5.7)

Here D_i is the objective function for the ith task requirement. The optimal solution is achieved by choosing the system variables [u, p] which yield the optimal value of the criterion Z, where u is the input velocity of the various axes and p is the state of the sensors and targets.

Weighted Sum Method. The weighted sum method is the most popular mapping method for multi-objective optimization problems. In this method, the task criterion is formulated as a weighted sum of the objective functions, where the system designer assigns a relative importance (weight) to each objective function. However, the task criterion obtained by this method lacks physical meaning. Misinterpretation of the theoretical and practical meaning of the weights can make the final solution unsatisfactory. Although there are many methods for choosing weights, a priori selection of weights does not necessarily guarantee the acceptability of the solution [81]. Furthermore, it is impossible to obtain a solution on the non-convex regions of the Pareto frontier [83].

Thus, physical programming [84] is used to generate the task criterion. Using this method, the system designers only need to specify a preference structure for each objective instead of assigning a weight which may be meaningless. This usually has more physical meaning and can better guarantee a satisfactory solution. Using physical programming, the system design can be put into a more flexible and natural framework. Another advantage of physical programming is that it can obtain solutions in the non-convex regions of the Pareto frontier [83].

Preferences Mapping. Using the physical programming approach, objective functions are mapped into a preference space. The preference is a parameter which represents the extent of the designer's satisfaction. The task criterion is set up as the aggregation of the preferences of all the objective functions. Thus, instead of being physically meaningless, the criterion becomes a satisfaction factor of the solution. The designer's expression of the preferences with respect to each objective function can be categorized into four different classes: smaller-is-better (1S), larger-is-better (2S), value-is-better (3S) and range-is-better (4S), as in Figure 5.4.
According to Messac's research [85], the mapping is depicted in Figure 5.4. In this figure, \mu_i is the value of the ith objective function and P_i, a real positive scalar, is the preference of the ith objective function. Each objective belongs to one of these classes. [83]–[85] provide a deeper analysis of physical programming.

According to the degree of desirability, the objective function values are categorized into several regions, as seen in Figure 5.4. Taking the case of Class 1S as an example, the preference ranges are:

Ideal range (\mu_i \le v_{i1})
Desirable range (v_{i1} \le \mu_i \le v_{i2})
Tolerable range (v_{i2} \le \mu_i \le v_{i3})
Undesirable range (v_{i3} \le \mu_i \le v_{i4})
Highly Undesirable range (v_{i4} \le \mu_i \le v_{i5})
Unacceptable range (v_{i5} \le \mu_i)

The parameters v_{i1} to v_{i5} are physically meaningful constants which express the designer's preference associated with the ith objective. In physical programming, the higher the desirability of the objective value, the lower the preference value. Therefore, physical programming problems are always of the minimization type.

The class functions for preference mapping should have several important properties such as nonnegativity, continuity and convexity. Further, the preference value P at range intersections (e.g., Tolerable-Undesirable) is the same for all class types and objectives.

Figure 5.4: Class functions for physical programming (HD – Highly Desirable, D – Desirable, T – Tolerable, U – Undesirable, HU – Highly Undesirable, UA – Unacceptable).

Physical Programming Formulation. Based on the preference mapping of the objectives, physical programming problems can be stated as:

\min_x Z = \sum_{i=1}^{M} P_i(g_i(x))
subject to:
g_i(x) \le v_{i5} (for class 1S)
g_i(x) \ge v_{i5} (for class 2S)   (5.12)
v_{i5L} \le g_i(x) \le v_{i5R} (for classes 3S and 4S)

where x is the vector of system variables, M is the number of objective functions, g_i(x) is the ith objective function, and P_i is the preference class function of g_i.

Physical Programming Analyzer (PP Analyzer). The preference structure (i.e., the class type and v_{i1} to v_{i5}) for each objective function must be specified to set up the global criterion function. However, due to changing task requirements, the initially specified preferences for the objectives may not represent the objectives' desirability accurately at later time instants. Therefore, physical programming should be modified according to the task requirements. A PP analyzer module is built to handle this problem.

The PP analyzer takes the Task Dexterity Indices as inputs and generates a vector \Xi = [\xi_1, \cdots, \xi_M]^T representing the degree of the designer's confidence in the preference structure specified for each objective function. It passes this confidence vector to the physical programming module, which modifies its result according to the confidence parameters. The analyzer is formulated as:

\xi_i = a_i + k_i \times index_i, \quad i = 1, \cdots, M   (5.13)

where a_i is a constant which represents the system designer's confidence in the original preference mapping function and k_i is a constant. The criterion generated by physical programming is then modified as:

Z = \sum_{i=1}^{M} \xi_i \times P_i(g_i(x))   (5.14)

Equations (5.13) and (5.14) show that when index_i increases, the preference value P_i of the ith objective function also increases.
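To illustrate the preference mapping and the confidence scaling of Equations (5.13) and (5.14), the sketch below evaluates a smaller-is-better (class 1S) preference function as a simple piecewise-linear interpolation over the range boundaries v_{i1}...v_{i5}, and combines several objectives into the confidence-weighted criterion. The piecewise-linear form and all numeric values are illustrative assumptions; Messac's physical programming [85] uses smoother class functions with the convexity properties noted above.

```python
import numpy as np

def class_1s_preference(mu, v, p=(0.1, 1.0, 3.0, 7.0, 15.0)):
    """Piecewise-linear stand-in for a class-1S (smaller-is-better) class function.

    v = (v_i1, ..., v_i5) are the Ideal/Desirable/Tolerable/Undesirable/
    Highly-Undesirable boundaries; p gives the preference values assumed at
    those boundaries.  Below v_i1 the preference stays at its lowest value;
    beyond v_i5 an extra slope penalizes the Unacceptable range.
    """
    base = float(np.interp(mu, v, p))
    if mu > v[-1]:
        base += (mu - v[-1]) * 10.0
    return base

def modified_criterion(objective_values, boundaries, confidences):
    """Confidence-weighted criterion of Equation (5.14): Z = sum_i xi_i * P_i(g_i)."""
    return sum(xi * class_1s_preference(mu, v)
               for mu, v, xi in zip(objective_values, boundaries, confidences))

# Hypothetical example: two objectives, the second with a higher task index
# (confidence factor from Equation (5.13)), hence a larger weight in Z.
v1 = (1.0, 2.0, 4.0, 8.0, 16.0)
v2 = (0.5, 1.0, 2.0, 4.0, 8.0)
Z = modified_criterion([3.0, 1.5], [v1, v2], confidences=[1.0, 1.4])
print(Z)
```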
Figure 5.5: Modification of the preference mapping by the confidence factor.

Thus, when the task places a high preference on requirement class i, the objective function D_i is given a higher preference for further minimization. In this way the task requirement condition governs the path of the optimization. From another viewpoint, multiplying a factor with the preference mapping function can be treated as modifying the structure of the mapping. Taking class 1S as an example (Figure 5.5), if the preference value at a range intersection (say, Desirable-Tolerable) is kept unchanged, the desirability ranges of the objective function change with the task index (confidence factor). In this figure, index' > index induces v_2' < v_2, which brings the Tolerable range closer to the Ideal range.

Therefore, combining Equations (5.7), (5.12), and (5.13), and replacing g_i by D_i and x by (u, p), the optimal Hausdorff tracking problem is formulated as:

\min_{u, p} Z = \sum_{i=1}^{M} \xi_i \times P_i(D_i(u, p))   (5.15)

subject to:
-\alpha J = C u
D_i(u, p) \le v_{i5} (for class 1S)
D_i(u, p) \ge v_{i5} (for class 2S)   (5.16)
v_{i5L} \le D_i(u, p) \le v_{i5R} (for classes 3S and 4S)

where \alpha is a positive definite gain and J is the vector of shape functions described for a specific task. The system constraint equation -\alpha J = C u is derived from the Hausdorff tracking task controller derived in [67, 68].

Optimal Solution Calculation. The next step is to find the optimal solution for the optimization problem specified by Equations (5.15) and (5.16). In this formulation the system constraint -\alpha J = C u takes the form of a linear equation. Therefore, we can use the generalized elimination method [86] to eliminate the system-dependent variables: suppose that there exist matrices A_{n \times m} and B_{n \times (n-m)} such that [A \; B] is non-singular. Usually, for the single-target Hausdorff tracking task, n = 6 and m = 1. If matrices A and B satisfy the equations

C A = I, \quad C B = 0,   (5.17)

the system solution is formulated as:

u = -\alpha A J + B \epsilon   (5.18)

where \epsilon parameterizes the null space of the matrix C. The reduced optimization problem is:

\min_{\epsilon} Z = \sum_{i=1}^{M} \xi_i \times P_i(D_i(u, p))   (5.19)

subject to:
D_i(u, p) \le v_{i5} (for class 1S)
D_i(u, p) \ge v_{i5} (for class 2S)   (5.20)
v_{i5L} \le D_i(u, p) \le v_{i5R} (for classes 3S and 4S)

where

u = -\alpha A J + B \epsilon, \quad p = \bar{p} + u / f   (5.21)

\bar{p} is the initial value of the system variables p, f is the control frequency, \epsilon is an arbitrary vector, and A and B are as defined in Equation (5.17). Using an online pattern search algorithm [82], the system can easily obtain the optimal solution.

5.5 Switched Video Feedback

Video feedback is an essential component of the surveillance system. The task of capturing, transmitting and displaying the live video streams from the various sensors to the requesting clients is handled by the video subsystem. Since multiple cameras are deployed to track the identified targets, multiple concurrent feedback video streams may be required for monitoring the target. The sensors initiating these streams will change from time to time as the target moves out of range of the sensors currently tracking it. However, providing multiple unnecessary (unrelated to the task) video feedback streams often causes loss of the operator's attention span and makes it hard to keep track of the various activities over the cameras.

Figure 5.6: Switched video architecture implementation.
Hence only video streams from relevant sensors should be presented to the operator on a per-activity basis. This is done through automatic or manual switching of the camera streams that are presented to the operator. The system is designed to support video feedback to multiple clients over an IP network. The video service is designed to support various types of live video stream feedback from the sensors to the individual clients, such as MJPEG and H.263. Various resolutions such as CIF, QCIF and 4CIF are supported for the live video streams. A switched video client and server architecture has been implemented as shown in Figure 5.6.

The main properties and tenets of the design of the switched video feedback system are:

- The client should be able to switch the video stream between multiple servers in order to track a target as it is handed off between multiple servers.
- The client should also afford the viewing of video streams from multiple servers simultaneously in order to display multiple tracked targets or multiple views of the same target.
- The client should receive selected video feedback based on the commanded task, so as not to overwhelm the operator. This selection of the feedback streams should be done automatically.

In order to transmit real-time video over a network, it is necessary to convert the output of the encoder into a packet sequence, i.e., to packetize it. Traditional TCP and UDP services are not sufficient for real-time applications, and application-oriented protocols such as the Real-time Transport Protocol (RTP) provide an alternate solution for managing the essential tradeoffs between quality and bandwidth. RTP provides end-to-end delivery services such as payload type identification, sequence numbering and time stamping.

The proposed architecture has been implemented on a PC running the Linux OS. The Qt toolkit [87], which is a complete C++ application development framework including a class library and tools for cross-platform development, has been used to develop the client application. A Windows-only client has also been implemented using Microsoft Visual Studio 2005. The current implementation of the user interface is shown in Figure 5.7. We have implemented the MJPEG, H.261, H.263 and H.264 (hardware-based) video compression schemes, which are packetized using the RTP/UDP transport protocol for transferring the surveillance video to the client.

Figure 5.7: H.264-Based Switched Video Streaming Client.

Characteristics and implementation details of the video subsystems implemented are summarized in the following discussion.

5.5.1 Characteristics and Implementation Details for MJPEG

MJPEG stands for Motion-JPEG and is a technique which simply performs a JPEG compression on each video frame before transmission. Unlike the MPEG and H.263 codecs, MJPEG does not support temporal compression, only spatial compression. The main advantages of this approach are that JPEG compression can be implemented relatively easily in hardware and that it supports a wide variety of resolutions. This implies that a wide variety of hardware can be supported in the case of a network with heterogeneous video capture hardware.
Further, it uses no inter-frame compression, which results in low latency in the video transmission system. However, the major disadvantage of MJPEG technology is its inefficient use of bandwidth. Due to the lack of inter-frame (temporal) compression, MJPEG streams require a high bandwidth, of the order of 2 Mbits/s for a 30 fps NTSC-resolution stream. Though MJPEG can be used effectively at lower frame rates and lower resolutions, its use cannot be justified in low-bandwidth applications such as wireless sensor networks.

5.5.2 Characteristics and Implementation Details for H.261

H.261 is a video coding standard published by the International Telecommunications Union (ITU) [53]. It supports CIF (352x288 pixels) and QCIF (176x144 pixels) resolutions. It supports both temporal and spatial compression for reducing the size of the encoded image. The coding algorithm is a hybrid of inter-picture prediction, transform coding, and motion compensation. The data rate of the coding algorithm was designed to be settable between 40 Kbits/s and 2 Mbits/s. The compressed video stream is structured into a hierarchical bitstream consisting of four parts, namely: (1) blocks, which correspond to 8x8 pixels; (2) macro blocks, which correspond to 16x16 pixels of luminance and two 8x8-pixel chrominance components; (3) groups of blocks (GOB), which correspond to 1/12 of a CIF picture or 1/3 of a QCIF picture; (4) the picture layer, which corresponds to one video frame.

The H.261 standard actually only specifies how to decode the video stream. Encoder designs are constrained only such that their output can be properly decoded by any decoder meeting the standard specification [53]. In our implementation, which is based on the Vic software, the H.261 bitstream is encoded such that macro blocks that have changed significantly from the previous frame are updated in the current frame. Also, each macro block is updated at least once every 32 picture frames using a block aging process.

5.5.3 Characteristics and Implementation Details for H.263

H.263 is a video coding standard by the ITU [54]. It was designed for data rates as low as 20 Kbits/s and is based on the ITU H.261 standard. It supports 5 resolutions (CIF, QCIF, sub-QCIF, 4CIF and 16CIF, where CIF is the standard 352x288-pixel resolution). Like H.261, it uses both temporal and spatial compression and also provides advanced coding options (not included in H.261) such as unrestricted motion vectors, advanced prediction and arithmetic coding (instead of variable length coding) for improvement of video quality at the expense of video codec complexity. It allows fixed bit rate coding for transmission over a low-bandwidth network as well as variable bit rate coding for preserving a constant image quality and frame rate for storage and transmission over high-bandwidth networks.

Each picture frame can be coded as either an 'I' (intra), 'P' (predictive), 'B' (bidirectional) or 'PB' frame. 'I' frames are intra-coded reference frames which contain all the information needed to initialize the decoder and display the image. 'P' and 'PB' frames are inter-coded frames which require information from other frames at the decoder in order to decode them. In our implementation of the H.263 codec, which is based on the Vic software, a completely intra-coded 'I' frame is transmitted every 10 seconds.
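A rough back-of-the-envelope comparison makes the bandwidth figures quoted in Sections 5.5.1–5.5.3 concrete. The JPEG bits-per-pixel value below is an illustrative assumption, not a measurement from the test-bed; the H.261/H.263 numbers are simply the design targets mentioned above.

```python
def mjpeg_bitrate(width, height, fps, bits_per_pixel=0.25):
    """Rough MJPEG rate: each frame is an independent JPEG, so the stream rate
    is frame size x frame rate (no temporal compression).  A bits_per_pixel of
    roughly 0.25 corresponds to fairly aggressive JPEG quality (assumed)."""
    return width * height * bits_per_pixel * fps            # bits per second

# Full-rate NTSC-resolution MJPEG vs. the codec design targets quoted above:
print(mjpeg_bitrate(640, 480, 30) / 1e6, "Mbit/s MJPEG")    # about 2.3 Mbit/s
print(mjpeg_bitrate(176, 144, 5) / 1e3, "kbit/s MJPEG")     # QCIF at 5 fps, about 32 kbit/s
print("H.261 design range: 40 kbit/s - 2 Mbit/s")
print("H.263 designed for rates as low as 20 kbit/s")
```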
5.5.4 Characteristics of H.264 / MPEG-4 Part 10

A recent development in video coding is H.264/MPEG-4 Part 10 [55], also named Advanced Video Coding (AVC), which is jointly developed by the ITU and ISO. H.264/MPEG-4 supports video compression (coding) for realtime video delivery over low-bandwidth networks. In contrast to the frame-based coding in the MJPEG, H.261 and H.263+ schemes, the coding in MPEG-4 streams is object based: each scene is composed of objects, which are coded separately. This object-based coding structure can be used effectively for video surveillance systems [88]. However, current software implementations of the coding scheme have been tested to have a best-case capture and compression latency of the order of 500 ms to 1 s, which precludes their use in a realtime video scenario.

We have integrated a commercially available hardware-based H.264-baseline and H.264-enhanced encoder in our switched surveillance implementation, which supports a frame resolution of up to 1080x768 pixels at a frame rate of up to 30 frames/sec. The hardware encoder supports constant bit rate encoding as well as variable bit rate encoding. For a frame rate of 30 fps and an image resolution of 800x600 pixels with variable bit rate encoding and a tracking scenario involving camera motion, the baseline encoder has a bit rate of approximately 500 Kbps. The interface developed is shown in Figure 5.7.

Figure 5.8: Architecture of Sensor Node.

5.6 Sensor Node Architecture

Figure 5.8 depicts the general architecture of a sensor node. The individual sensor nodes maintain information regarding the observations of their neighboring nodes and broadcast (within their locality) their own observations. Based on the combined observations, each node develops a list of targets being actively tracked and the status of its peer nodes, and stores this information in the targets table and the sensor nodes table, respectively. In the targets table, the native as well as observed characteristics of the target objects, observed by the respective sensors, are stored. The targets table also stores information indicating the node that sensed these characteristics. Nodes also store peer information, such as location, active FOV and total capable FOV of the peer.

When the target to be tracked is outside the active FOV of the sensor (but within the capable FOV), the node can still ensure that the target is acquired in the active FOV using target location information available from other sensors in the targets table. This dissertation proposes the method of cooperative Hausdorff tracking for deriving the input u to the robot/camera to bring the target into the active FOV of the sensor. Using the cooperative Hausdorff tracking method the target can be brought into the active FOV of the sensor, and, assuming that the visual characteristics of the target can be recognized, the sensor will switch over to image-based Hausdorff tracking to maintain the target in the active FOV.
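The mode switch just described, between cooperative Hausdorff tracking (driven by peers' location estimates) and image-based Hausdorff tracking (driven by the node's own detections), can be sketched as a small decision routine. This is an illustrative reading of Figure 5.8, with hypothetical helper names and a hysteresis margin that merely stands in for the switching criteria detailed elsewhere in the dissertation.

```python
def select_tracking_mode(node, target_id, margin=0.1):
    """Choose the tracking mode for one sensor node and one target.

    - If the node's own detector sees the target in the image, use
      image-based Hausdorff tracking.
    - Else, if a peer's estimate in the targets table places the target inside
      this node's capable FOV, use cooperative Hausdorff tracking to steer the
      active FOV toward it.
    - Otherwise the node stays idle for this target.
    The 'margin' adds simple hysteresis so the mode does not chatter when the
    target sits on the FOV boundary (assumed mechanism).
    """
    if node.detects_in_image(target_id):
        return "image_based"
    estimate = node.targets_table.get(target_id)       # peer-supplied location
    if estimate is None:
        return "idle"
    if node.capable_fov_contains(estimate.position, shrink=margin):
        return "cooperative"
    return "idle"
```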
The sensed video information is broadcast to the requesting clients using the video subsystem, which consists of a video server that compresses and transmits the captured video using either an MJPEG or H.263 bit stream mounted over the RTP/UDP/IP transport protocol.

5.7 Modeling Switched Video Feedback as a Dynamic System

The configuration of the cameras C can be defined as the collection of all the parameters of the cameras involved. That is:

C = \{C_1, C_2, \ldots, C_N\}   (5.22)

where C_i \in R^p is a vector containing all the intrinsic and extrinsic parameters of the camera. The words "layout" and "placement" will also be used interchangeably with camera configuration, but they reflect the internal parameters of the camera, too.

Consider a monitored region as a domain E \subset R^n under surveillance. A finite number (N) of cameras is distributed in this region and involved in the surveillance task of maintaining visibility of a target as it moves within the monitored region. The monitored region E can be discretized on a finite-dimensional grid G = \{V_i \mid V_i \in R^n, i = 1, \ldots, g\} which consists of a finite number, g, of vertices V_i. Note that the grid G can be generated from the domain E using various approximate cell decomposition methods where the cells have a pre-defined shape and size in order to achieve a certain resolution.

Consider any continuous-time target trajectory in two dimensions k_c(\tau) : \Gamma_c \mapsto R^2. For every \tau \in \Gamma_c, k_c(\tau) provides as an output the location of the target. This continuous-time trajectory can be approximated as a discrete-time finite-dimensional map on the grid as

k(t) : \Gamma \mapsto U_g   (5.23)

where \Gamma = Z and U_g represents the finite-dimensional space of the locations of the vertices V_i of the grid G.

The dynamic sensor switching problem can be modeled as a discrete-time, finite-dimensional dynamic system. The state space of this system is the finite set of camera states defined as: Q = \{q_i \mid q_i = (i, C_i), i \in (1, \ldots, N), C_i \in C\}, where N is the total number of cameras and C is the space of configurations of all cameras. Using the above definitions we can now describe the model of a dynamic sensor switched system as a finite-dimensional discrete-time dynamic system V:

V = [Q, \Gamma, U, \phi], \quad \phi : Q \times \Gamma \times U_g \times U_1 \mapsto Q   (5.24)

where U := U_g \times U_1. Here, u \in U_1 is the vector of control inputs, while k(t) \in U_g, t \in \Gamma, the sequence of grid locations in G that the target visits, is the reference input to the system. The reference inputs evolve according to a predefined function of time t and depend only on the target motion, which may not be known a priori. The control inputs are used to control the trajectory of the system to ensure the target is covered and can also be used for optimization of the various metrics being used. The system V may not be complete, as all the transitions between various cameras may not be enabled at all times, based on the visibility constraints of the various states for the target locations provided.

5.7.1 Examples of Video Feedback Algorithms - Best Resolution

Consider a set of N cameras Q = \{(i, C_i) \mid i = 1, \ldots, N, C_i \in C\}. The sensor selection strategy is dependent on the resolution sustained by the target at each of the cameras, and the camera with the best resolution will be selected to provide feedback to the human operator. This sensor selection strategy takes into account only the best resolution sustained at all the cameras, which implies that the control input u is based solely on the camera configurations {C_i} and the target locations k(t).
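A minimal sketch of this best-resolution selection rule is given below; the formal statement follows in the next paragraph. Under the homogeneous-camera assumption used there, best resolution reduces to the smallest camera-to-target distance among the cameras that can currently see the target; the helper names are hypothetical.

```python
def best_resolution_camera(cameras, target_location):
    """Select the feedback camera: among cameras whose field of view contains
    the target grid location, pick the one closest to the target (homogeneous
    sensors => the closest camera gives the best resolution).
    Returns None if no camera can see the target."""
    visible = [cam for cam in cameras if cam.can_see(target_location)]
    if not visible:
        return None
    return min(visible, key=lambda cam: cam.distance_to(target_location))

# The selection is memoryless: it depends only on the camera configurations and
# the current target location k(t), not on which camera streamed last, so
# switching costs are ignored (they are addressed by the dynamic programming
# formulation later in this chapter).
```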
The strategy does not depend on the current camera tracking the target, i.e., sensor switching costs are not taken into account. In order to implement this strategy in the dynamic systems model, we define a function R : U_g \times \Gamma \mapsto U_1 which maps the locations of the target at the various time instances to the space of the control input variables u, based on the resolution sustained by the target at each camera. Notice that R is not a function of the current state of the dynamic system, i.e., the current camera. This implies that the system does not have memory.

The above algorithm can be implemented as follows. Let q_{best} represent the camera that has the least distance to the target; i.e., under the assumption of homogeneous camera sensors, q_{best} can view the target with the best resolution. For a given target location k(t) and time t, q_{best} can be written as:

q_{best}(t) = C_i(t), \quad i = \arg\min_{1 \le i \le N,\ C_i \text{ visible}} dist(C_i(t), k(t))

\kappa : [0, \tau) \times X \mapsto U   (5.30)

The Bellman function V(s, x) should satisfy, for any s \in [0, \tau] and each x \in X,

V(s, x) \ge \min_{\kappa} B(\tau, \kappa, x, w)   (5.31)

and the optimal control input satisfies the condition

\xi(j+1) = \phi(\xi(j), j, \kappa(j, \xi(j))), \quad j = s, s+1, \ldots, \tau, \quad \xi(s) = x   (5.32)

for each s \in [0, \tau) and each x \in X. The computation effort required to solve this problem will be significantly lower than tabulating all the control input sequences. However, the storage requirements for this procedure will be significant.

5.8.2 Dynamic Programming Algorithm Implementation

In order to minimize the switching cost and maintain adequate resolution of the target, we construct a graph based on the camera locations and the visibility of the targets to the cameras, given the predicted motion of the target (Figure 5.9). The procedure for constructing the graph is shown in Algorithm 1. After the graph is constructed, we use the dynamic programming method, a modified version of the Dijkstra algorithm, to find the optimal switching strategy (Algorithm 2).

In the graph generating stage, we enumerate all the cameras that can see the current grid point with an acceptable resolution. The resolution metric is marked over each camera as a weight on the vertex. However, in order to run the dynamic programming method, it is desirable to have all the weights only on the edges. Thus each camera is represented by two vertices in the graph, with an edge connecting them that has a weight equal to the resolution metric. We also enumerate all the cameras that can see the grid points specified by the prediction vector. Between two prediction grid points we connect the cameras based on their switching metric. If the two cameras are the same, the switching metric is set to 0; otherwise a non-zero switching metric is set on the edge. In Figure 5.9, all non-zero metrics are set to 1 for the sake of simplicity.

Figure 5.9: Generating Graph for the Camera Switching Strategy.
Algorithm 1 [G, W] = GraphGen(pv)
1: for all cam in CameraSet do
2:   if pv(1) is visible by cam then
3:     Add two nodes representing cam to G
4:     Connect the two nodes with the resolution metric of cam in W
5:   end if
6: end for
7: p_prev = pv(1)
8: Delete pv(1) from pv
9: for all p in pv do
10:   for all cam in CameraSet do
11:     if p is visible by cam then
12:       Add two nodes representing cam to G
13:       Connect the two nodes with the resolution metric of cam
14:       Connect the nodes generated by p and p_prev and set the switching cost in W if they are for different cameras.
15:     end if
16:   end for
17: end for

Algorithm 2 p = Dijkstra(G, W, s, t)
1: for all v in V(G) do
2:   d(v) ← +∞ {Initialize the distance vector}
3:   p(v) ← undefined {Initialize the previous node vector}
4: end for
5: d(s) ← 0 {The source has 0 distance to itself}
6: C ← ∅ {Initialize the checked node set}
7: Q ← V(G) {Copy the vertex set into a working set}
8: while Q ≠ ∅ do
9:   u ← ExtractMin(Q) {Extract the vertex with minimum value in d}
10:   C ← C ∪ {u}
11:   if u = t then
12:     return
13:   end if
14:   for all edges (u, v) do
15:     if d(u) + W(u, v) < d(v) then
16:       d(v) ← d(u) + W(u, v)
17:       p(v) ← u
18:     end if
19:   end for
20: end while
21: if t ∉ C then
22:   p ← ∅
23: end if

5.9 Chapter Summary

This chapter presents the design of the various components of a surveillance network. For exploiting the redundancy in motion along the various axes of the sensor, an optimization framework is proposed. This framework allows the accomplishment of various other tasks under the constraints of the main tracking task, such as energy conservation and better sensor location for sensing. The characteristics and implementation details of the various choices for video feedback are discussed, and finally the design of a sensor node is presented. The architecture of the sensor node consists of a camera, a target perception module, a motion control module, a video feedback module and a communications module.

Chapter 6
Experimental Implementation and Testing

Figure 6.1 shows the general architecture of the surveillance system implemented over an IP network. This chapter presents a discussion of implementation issues such as switched video feedback to the human operator/client, and a performance analysis of the experimental results of the proposed algorithms.

6.1 Experimental Test-bed Setup

A visual surveillance test-bed comprising seven video cameras has been set up at the Robotics and Automation Laboratory at Michigan State University. The test-bed is used to demonstrate the integration of multiple active sensors with active target tracking algorithms to perform a coherent pervasive surveillance task of tracking multiple targets as they move across the monitored landscape. The cameras are attached to processing and communication computers, and some are mounted on pan-tilt drives or robots for moving the cameras. The surveillance test-bed developed has the functionality of an end-to-end, multi-camera monitoring system that allows a single or multiple human operator(s) to monitor activities in the region of surveillance.

Figure 6.2: Surveillance test-bed designed along with the cameras and the motion modules for camera motion.

Figure 6.3: Sony EVI-D30 mounted on Robot Nomad XR-4000.

Figure 6.2 shows the test-bed setup.
This section describes the hardware and software architecture of the individual sensors used.

6.1.1 Hardware Setup

The sensor node setup consists of three Sony EVI-D30 active PTZ (pan-tilt-zoom) cameras, one dual-head stereo vision camera consisting of two Hitachi KP-D50 cameras mounted on a Digital Perception DP-250 pan-tilt drive, and one fixed-view KP-D50 camera. One of the Sony EVI-D30 cameras was mounted on a Nomad XR 4000 mobile robot for extended mobility and sensing range. Figures 6.3 and 6.4 show the sensor and motion hardware for the implemented system.

The cameras were connected to Pentium 4 2.4 GHz computers with PCI-based video capture cards attached to them. The capture cards are capable of transcoding, in real time, the input from the cameras to an H.264 stream (on a separate software interface) and transmitting that stream over the Internet to clients that request the video stream. The PTZ modules for the various cameras were controlled through serial port communications. The various computers were connected to each other using wired Ethernet and wireless 802.11g connections over an IP network. The individual sensor nodes were provided with publicly addressable IP addresses and hence could be accessed from the Internet. The human interface clients comprised Pentium 4 laptop computers which could be connected to the surveillance network through a direct wired or wireless connection or through the Internet.

Figure 6.4: Active Cameras: Sony EVI-D30 and dual-head Hitachi KP-D50.

6.2 Software Setup

The sensor outputs are processed on the local computer for target detection and identification. The local computers used for sensing and processing run a Linux operating system. The targets to be tracked using the automated tracking system were multiple humans moving around in the monitored region. For ease of target detection and identification, the human targets wore solid color clothing, and CMVision was used for color analysis and blob detection and merging [91].

The acquired 640x480-pixel images were quantized on a 128x96 grid to reduce the computation load. The identified targets were represented using their bounding boxes, which were quantized on the 128x96 grid. The coverage set is also quantized on the grid, and the distances of the grid points to the target set are pre-computed and stored. The tracking task is carried out at 25 fps, i.e., a new velocity input is computed for the motion module inputs at the rate of 25 Hz. At the same time, the raw camera feed is input to the H.264 transcoder card, which is provided as a separate software device (/dev/dsp0) on the computer. Various clients have been implemented on the Linux and Windows operating systems for receiving the video stream and other pertinent information from the camera servers.

6.3 Experimental Results for Wide Area Camera Calibration

The individual camera sensors involved in the setup of the camera system need to be calibrated for their location and pose information in order to share information amongst each other. The wide area calibration procedure described in Section 5.1 has been used to calibrate the configuration of the cameras. BlueCCal, developed at the Czech Technical University by Tomas Svoboda [92], is an attempt at calibrating many cameras without the necessity of a large calibration object or the handwork of more classical methods.
The system does, however, have necessary conditions for the calibration to yield sufficiently accurate results, which, among others, include a minimum of three cameras in the setup. The only action required as a reference for calibration is the mapping of a one-point object that is easily detectable in the captured images. For our setup, this involved capturing images in a dark setting with a plastic-capped laser pointer acting as the one-point reference in all images. Therefore, the only necessary work was moving the laser pointer around within our viewable domain and capturing images during the course of this motion. These images were later used in the calibration software to locate the relevant points necessary for calibration.

The software itself can be broken down into a number of simple steps: find all points in the images for each sensor, discard all mis-detected points by pair-wise RANSAC analysis, estimate the projective depths, fill in missing points to make the scaled measurement matrix complete, perform the rank-4 factorization to get projective shape and motion, and estimate the parameters of the non-linear distortion.

The results of the calibration are not only the necessary conversion matrices between each of the sensors, but also the distortion models of the system, the location of all captured points in each of the sensors, and a scaled model of the location of each of the sensors. This information proves vital in the immediate testing of the results of the calibration; i.e., if the scaled model displays an erratic location for a given sensor, it is likely the calibration is off w.r.t. that sensor. One can then look at the location of the points in that sensor and check whether that sensor got the necessary coverage of its viewing area to accurately calibrate w.r.t. the rest of the system. This allows a more intuitive check of the accuracy of a given calibration.

An advantage of this calibration tool, besides those stated above, is that the calibration is adaptive: among the images captured, the one-point object, i.e., the laser pointer, does not have to be in all of the images for each sensor. The adaptive style of this network means that the more cameras involved, the better the calibration results, which is beneficial for heavily overlapped multiple-sensor networks. The only necessary condition w.r.t. the number of sensors that can view a given point is that the calibration requires a minimum number of points that can be viewed by at least three cameras at a time; this proves incredibly helpful when working with a disjoint network, where there exists no point that all of the cameras can see. Since only three cameras are required to see a point, as long as the majority of the monitored region is covered by at least three cameras, the system is able to calibrate with reasonable efficiency.

Figure 6.5: Camera Setup: Location and pointing direction of the camera setup (5-camera configuration).

Figure 6.5 shows the results of the calibration carried out on 5 cameras. The cameras were located at different heights and were pointing to the center of the room. The results of this experiment demonstrate the calibration of the surveillance test-bed setup for conducting the remaining experiments outlined. From Figure 6.5 we can see that the camera setup was calibrated with an adequate degree of accuracy.
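The rank-4 factorization step listed above can be illustrated with a small NumPy sketch. Given a complete, rescaled measurement matrix stacking the depth-scaled image points for every camera-point pair, a truncated SVD splits it into projective camera matrices and projective point coordinates. This is only a schematic of the standard factorization idea used by tools such as BlueCCal, assuming the projective depths have already been estimated and the missing entries filled.

```python
import numpy as np

def rank4_factorization(W_s):
    """Factor a (3m x n) rescaled measurement matrix into projective motion
    (the m stacked 3x4 camera matrices) and projective shape (4 x n points).

    W_s[3i:3i+3, j] is assumed to hold lambda_ij * [u_ij, v_ij, 1]^T for
    camera i and point j, with depths estimated and gaps filled beforehand.
    The factorization is only defined up to a 4x4 projective transformation;
    the metric upgrade / unit assignment is a separate step (Section 5.1.2).
    """
    U, S, Vt = np.linalg.svd(W_s, full_matrices=False)
    P = U[:, :4] * S[:4]          # projective motion: stacked camera matrices
    X = Vt[:4, :]                 # projective shape: homogeneous 3D points
    return P, X

# Sanity check with synthetic data: a random rank-4 matrix factors exactly.
rng = np.random.default_rng(0)
W_s = rng.normal(size=(9, 4)) @ rng.normal(size=(4, 20))   # 3 cameras, 20 points
P, X = rank4_factorization(W_s)
print(np.allclose(P @ X, W_s))    # True: the product reproduces W_s
```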
6.4 Experimental Implementation for Image-Based Hausdorff Tracking

The automated tracking task was defined as maintaining the visibility of multiple moving targets in a region with adequate resolution. The shape function used to track the targets is:

J(\bar{K}) = \sum_{i=1}^{N} \left[ J_{FOV}(\bar{K}_i) + J_{AreaMin}(\bar{K}_i) + J_{AreaMax}(\bar{K}_i) \right]   (6.1)

J_{FOV}(\bar{K}_i) = \int_{\bar{K}_i} d_K^2(q)\, dq

J_{AreaMin}(\bar{K}_i) = \max\left( AREA\_MIN_i - \int_{\bar{K}_i} dq,\ 0 \right)

J_{AreaMax}(\bar{K}_i) = \max\left( \int_{\bar{K}_i} dq - AREA\_MAX_i,\ 0 \right)

where N is the number of targets, q is a point on the target set \bar{K}_i, and AREA_MAX_i and AREA_MIN_i denote the maximum and minimum admissible areas of the target set \bar{K}_i for maintaining adequate resolution. Note that the shape function J(\bar{K}) is zero only when the set of targets \cup_{i=1}^{N} \bar{K}_i is completely covered by the sensor coverage set K and when the area of each target set \bar{K}_i is within the limits (AREA_MIN, AREA_MAX) specified for that target. Otherwise J(\bar{K}) is a non-zero positive value. Based on the value of the shape function J(\bar{K}), the velocity input vector u to the camera motion units (PTZ drives or robot) is calculated and applied at the rate of image acquisition, i.e., 25 frames per second (fps).

6.4.1 Tracking a Single Target

The surveillance task is to maintain the target in the active FOV while maintaining a discernable resolution of the target on the image plane. The task is described mathematically using Equation (6.1). The target was a ball that was manually moved around in 3D space for the duration of the experiment. The image was taken as a regular grid of 128x96 pixels evenly spread over the 640x480 original image in order to reduce the computation load. The target set \bar{K} was approximated as occupying a certain number of pixels on this grid. The input u (the translational and angular velocities of the robot and camera system) was derived using Equation (3.31). The target was initially placed where the task criteria were not satisfied; the target was then moved around manually, which generated a disturbance input to the system, and the system immediately tried to reduce the value of the shape function to zero.

Figures 6.6 and 6.7 depict the results of the image space Hausdorff tracking task using an active camera. The shape functions J_FOV(\bar{K}), J_AreaMin(\bar{K}) and J_AreaMax(\bar{K}) and the inputs u are plotted in Figures 6.6(a) and (b), respectively.

Figure 6.6: Image-based Hausdorff tracking using active camera. (a) Shape functions; (b) Active sensor inputs.
These experiments demonstrate Hausdorff tracking using multiple shape functions. We see that the shape functions are are non-zero when the target moves but they are quickly reduced to zero by the motion of the camera. The motion of the target is taken as a disturbance input for the tracking control system shown in Figure 3.4 and it can be seen that the Hausdorff tracking controller can reject disturbance input from motion of the target. 6.4.2 Tracking Multiple Targets Simultaneously The surveillance task is to maintain the multiple targets in the active FOV of the sensor. The targets were two humans moving around and interacting with each other. 133 ,5 1 r r g s .1 0 1 0 20 40 50 60 time (s)30 Figure 6.8: Tracking multiple targets using Image—based Hausdorff tracking The target set K was approximated as occupying a certain number of pixels (in multiple disjoint locations) on this grid. Assumptions on the input u = [wx,wy]T to the camera system were derived using equation 3.31. The targets were initially located such that they were out of the field of view of camera. The targets were detected by the monitor and the sensor was tasked to cover the two targets simultaneously. At, t = 0, the targets are just in the active FOV of the sensor and the task criterion is not satisfied. The camera then moves to reduce the shape function J to zero so the targets are now covered. The targets then randomly move around the room and the camera tries to maintain both targets continuously in the active FOV. Figure 6.8 depicts the J and the input velocities (u 2 [tax wy]T) applied to the camera. Notice the initial large value of the shape function J, which is quickly reduced to zero. Figure 6.9 depicts the position X, Z estimated of the two targets. We see 134 Person 1 Person 2 ............................................................. ........................... 1o 20 30 4o 50 60 7o 80 time (s) Figure 6.9: Image-based Hausdorff tracking for multiple targets: Estimated position of targets that despite the seemingly random motion of the two targets, the camera always tries to keep both of them in the active FOV. Further, the energy efficiency of the proposed method is demonstrated by the relatively infrequent input applied to the camera only when one of the objects escapes the active FOV. Figure 6.10 shows the images taken from the camera at various times during the tracking experiment shown in Figure 6.8. This experiment shows multiple target tracking with a single camera sensor using the Hausdorff tracking method. Tracking multiple targets using a single sensor is advantageous for overloaded networks where the number of targets outnumber the number of cameras available for tracking. The image sequence in Figure 6.10 shows that the targets are moving about randomly and still being maintained in the field of view of the camera sensor at all times, which demonstrates the advantages of 135 time = Ssec time = 13sec time = 208cc time = 425cc time = 465cc Figure 6.10: Image sequence for tracking multiple targets using Image-based Haus- dorff tracking. Hausdorff trackng over vector space servoing. 6.5 Experimental Implementation for Cooperative Hausdorff Tracking The cooperation between nodes to continuously track a target by directly tasking various sensors can be done using cooperative Hausdorff tracking as explained in section 3.4.2. In this experiment, a mobile robot is deployed in order to track a target that cannot be covered by a fixed sensor with adequate resolution. 
6.5 Experimental Implementation for Cooperative Hausdorff Tracking

The cooperation between nodes to continuously track a target by directly tasking various sensors can be done using cooperative Hausdorff tracking, as explained in Section 3.4.2. In this experiment, a mobile robot is deployed in order to track a target that cannot be covered by a fixed sensor with adequate resolution. The robot uses the position information received from other sensors in the network to compute a trajectory that will bring the target set within its coverage set.

Figure 6.12 shows the x-z plot, which is the plane of the floor, of the data for both the target centroid and the robot tracking it. The robot's path shows the initial cooperative tracking taking place. As the robot moved its view cone toward the target, the target also began moving. Re-projection of the target onto the view cone caused a slight change in the robot's velocity that can be seen in the plot. Once the target set enters the set space of the mobile robot's view cone, image-based Hausdorff tracking begins. Now the robot reacts to changes in the target to maintain optimal resolution. This continues for the remainder of the experiment, as can be seen in the plot.

Figure 6.11: Position of robot and target set on the x and z axes.

In Figure 6.11 we can see the data from the individual axes. The robot is quite successful in maintaining the target set within its coverage set, as can be seen from the correlation of the plots for each of the axes.

Figure 6.12: Motion of target set and robot on the x-z plane. The view cone of the robot is shown at a particular instance in time, with the position of the target centroid shown as well.

Figure 6.13: Shape functions for cooperative and image-based Hausdorff tracking.

Figure 6.14: Robot's view of the target at several points in time, showing several conditions that the tracking system responds to.

Finally, in Figure 6.13 we see the plot of the J value for both image-based and cooperative Hausdorff tracking. A demarcation line has been included to clearly mark the switch from cooperative to image-based tracking. This occurs at approximately seven seconds. This agrees with the data from Figure 6.12, since this is the point in time at which the values of the axes begin to synchronize.
Figure 6.14 shows several images gathered during the experiment. The first image in the series shows a profile of the mobile robot and the target being tracked. The remaining images were gathered from the robot's onboard camera. The second image shows the target at the determined optimal distance. This determination is based on the set space of the target instead of a simple distance, as is the case with point-based tracking. The third and fourth images show cases where the target is moving toward, and away from, the robot, which then causes the robot to retreat from, and pursue, the target, respectively. The final image in the series shows the target strafing left, to which the robot responds by moving to re-center the target set.

Figure 6.15: Target tracking using a Sony EVI-D30 camera mounted on the Nomad XR-4000 mobile robot.

This experiment shows that the Hausdorff tracking control can be used adequately for task space servoing as well. Given the task space location of a target set (possibly multiple disjoint targets) and a model of the coverage set of a camera, the target can be brought into view using cooperative Hausdorff tracking. This experiment also demonstrates the stable switching (due to the built-in hysteresis in the switching criteria) from cooperative Hausdorff tracking mode to image-based Hausdorff tracking mode.

6.6 Experimental Implementation for Redundancy Resolution using Optimal Hausdorff Tracking

Based on the discussions in the preceding sections, we can now formulate the task distribution problem for Hausdorff tracking using redundant motion stages. The problem we are considering is the optimal task distribution for a pan-tilt camera mounted on a mobile robot tracking a single target. The experimental setup for the tracking task is depicted in Figure 6.15. For the experimental implementation, we will consider the redundancy in the robot and camera motion along the X traverse direction.
Various optimization schemes were set up, including one with no optimization, involving just a random selection of the null-space vector. The optimization schemes selected for the experiment included single-objective optimization, where either the energy minimization or the zero-pan-angle objective was considered individually. For the multi-objective schemes, both weighted sum and physical programming methods were tested. The results of these tests are tabulated in the following sections.

Figure 6.16 shows the results of the experiments carried out using the physical programming optimization described above. The figure shows the plots of the shape function, the velocities in the X and pan directions, and the objective function values for energy conservation and pan angle zeroing. Notice that the values of all the objective functions are always maintained at least in the tolerable range by sacrificing the performance of the other cost functions currently under consideration.

Figure 6.16: Results of the tracking task with optimal task distribution using physical programming.

The same task was also carried out using a no-optimization scheme with a fixed task distribution between the two redundant axes, and also using a weighted sum scheme. As mentioned in section 5.4.1, a weighted sum method with fixed weights without apparent meaning is used in order to combine the various cost criteria. The cost criterion used is the weighted sum of the two objectives:

$$D = w_1(\theta - \theta_0)^2 + w_2(\|v_x\| + \|\omega_y\|)$$

The two cost functions, i.e., energy and pan-angle maintenance, for the no-optimization method and the weighted sum method are shown in Figures 6.17 and 6.18, respectively.

Figure 6.17: The energy consumption for no optimization and weighted sum.

Figure 6.18: The pan angle for no optimization and weighted sum.

For the no-optimization case, it can be seen that in the absence of any optimization, the linear velocity transferred to the robot is very small and most of the velocity for task execution is allocated to the camera pan. Using the weighted sum method, the various components comprising the task criterion lack physical meaning. Based on this observation we notice that even when the multi-objective optimization is enabled, the pan angle remains very close to the desired value while the energy conservation objective attains unacceptable values.
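The contrast between the two aggregation styles can be illustrated numerically. The sketch below is deliberately simplified: the piecewise penalty is only a stand-in for a physical programming class function, and the weights, tolerance ranges and objective values are invented for the example rather than taken from the experiments.

    def weighted_sum(d1, d2, w1=1.0, w2=0.001):
        # Fixed weights with no physical meaning: a tiny w2 lets the energy
        # objective d2 grow almost unchecked while d1 stays near zero.
        return w1 * d1 + w2 * d2

    def tolerance_penalty(d, desirable, tolerable):
        # Simplified stand-in for a physical-programming class function:
        # cheap inside the desirable range, steeper in the tolerable range,
        # and very steep once the tolerable range is exceeded.
        if d <= desirable:
            return d / desirable
        if d <= tolerable:
            return 1.0 + (d - desirable) / (tolerable - desirable)
        return 2.0 + 10.0 * (d - tolerable) / tolerable

    # Pan-angle objective near its target but energy objective far outside
    # any acceptable range (values made up for illustration):
    d1, d2 = 0.05, 400.0
    print(weighted_sum(d1, d2))                       # about 0.45: masks the violation
    print(tolerance_penalty(d1, 0.1, 1.0) +
          tolerance_penalty(d2, 50.0, 150.0))         # about 19.2: flags the violation

Because the tolerance-aware penalty grows sharply outside the tolerable range, an optimizer driven by it keeps every objective at least tolerable, which is the qualitative behaviour reported above for the physical programming runs.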
It should be noted that the tolerance ranges used in the physical programming approach are not used in the weighted sum approach. Hence, despite the optimization being enabled on the various objectives, the values of the objective functions can attain unacceptable values. The above results show that the physical programming method provided better results in terms of optimizing multiple objectives while simultaneously accomplishing the task. This is because with the weighted sum method, due to the loss of physical meaning of the weights, dynamic weighting may provide an unacceptable solution. For physical programming, however, since the cost criteria are combined using weights which are attributed a physical meaning, physical programming can calculate solutions on the Pareto-optimal frontier.

This experiment further shows that the redundancy in the input expressed in Equation (3.30) can be successfully exploited in order to optimize various cost functions associated with the task. Hence, some criteria, such as resolution maintenance of multiple targets, can indeed be removed from the set of constraints and used in the optimization criteria, providing more flexibility for task specification.

6.7 Experimental Results for Multisensor Cooperation

This section illustrates the experimental implementation and results for the two techniques proposed earlier: 1. the sensor allocation algorithm and 2. the mutational hybrid automata modeling technique.

6.7.1 Experimental Evaluation for the Sensor Allocation Algorithm

Pervasive surveillance using the proposed multi-sensor coordination mechanism was rigorously tested. The task of tracking a target over multiple sensor coverage regions was carried out. The target moved a large distance, spanning the capable viewing regions of multiple sensors. Figure 6.19 depicts the results of the tracking scenario.

Initially, sensor 2 can see the complete target but the target is out of the active FOV of sensor 1, which is depicted by the target visible flag. Based on the solution of the optimization, sensor 1 is tasked to track the object. It uses cooperative Hausdorff surveillance to cover the target; the global coordinates of the target for cooperative Hausdorff tracking are provided by sensor 2. Once it acquires the target, it switches to image-based Hausdorff tracking to ensure target coverage. The target then randomly moves around in the active FOV of both sensors, which keep continuously tracking it. Notice the robust tracking performance of the image-based tracking despite the minor differences in the estimation of the object location. The target then starts to move out of the tracking range of sensor 2, which again triggers the optimization planning problem, which assigns sensor 1 to keep continuously tracking the target.

Figure 6.19: Estimated position of targets for multisensor cooperation using the sensor allocation algorithm.

This experiment demonstrates the cooperation between multiple sensors for tracking a target.
It shows that information from another sensor can be used to cooperatively acquire a target with a camera, and that the locations of the targets can be used to optimally assign the tracking task to the various sensors based on their availability, target priority and the locations of the sensors and targets.

6.7.2 Experimental Evaluation of Multiple Target Tracking using Multiple Formation-Constrained Sensors

Motion is an energy-intensive resource for a sensor node and should be minimized. Multiple sensors (possibly with varying sensing modalities) can be mounted on a single motion drive to reduce the energy consumption. Further, the various sensors can be tasked to track different targets simultaneously. Hence the motion of the sensor node should be optimized in order to achieve the various tasks simultaneously. The optimal Hausdorff tracking framework proposed in section 5.4.1 can be used to calculate the optimal sensor input u.

Consider a scenario where two targets are being tracked by two separate formation-constrained sensors, as shown in Figure 6.20. The task is to maintain the visibility of targets K1 and K2 using the sensors cam1 and cam2, respectively. The two sensors are mounted on a single motion platform, although sensor cam1 can be moved independently of sensor cam2 by using the motion of joint $q_0$ of the PUMA robot arm. The shape functions for this task can be expressed as:

$$J_1 = \int_{K_1} d_{S_1}(p)\,dp \qquad (6.6)$$

$$J_2 = \int_{K_2} d_{S_2}(p)\,dp \qquad (6.7)$$

where $K_i$ is the target set assigned to camera cam$_i$ and $d_{S_i}(\cdot)$ denotes the distance to the coverage set $S_i$ of that camera. The motion inputs to the complete robot system are linear motion along the X axis ($v_x$) and angular motion about the Z axis of the robot base ($\omega_z$). Angular motion of the robot will not affect the coverage task assigned to sensor cam2, while linear motion along the X axis will affect the task assigned to both sensors. Using Equation (3.31), the model for Hausdorff tracking for the two-camera system can be written as:

$$Cu = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & 0 \end{bmatrix}\begin{bmatrix} v_x \\ \omega_z \end{bmatrix} = -\alpha\begin{bmatrix} J_1 \\ J_2 \end{bmatrix} \qquad (6.8)$$

Figure 6.21 depicts the target tracks of the two targets in the world coordinate frame. The two targets move independently of each other and can cross each other, as is shown in Figure 6.24. Figure 6.22 depicts the shape functions $J_1$ and $J_2$ for the two cameras cam1 and cam2, respectively, and the velocity profiles of the two inputs $v_x$ and $\omega_z$ to the mobile base and robot arm are shown in Figure 6.23. The inputs to the two motion axes (linear velocity $v_x$ and angular velocity $\omega_z$) are calculated using Equation (6.8).

The results of this experiment show that formation-constrained sensors can be used to accomplish multiple tracking tasks simultaneously using multiple sensors. This implies that the optimal Hausdorff tracking framework can be leveraged to solve multiple target tracking problems.
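For illustration, the combined model of Equation (6.8) is square in the two inputs and can be solved directly once the entries of C and the shape function values are known. The sketch below uses made-up values for those entries and for the gain; it is not the dissertation's implementation.

    import numpy as np

    def two_camera_inputs(J1, J2, C11, C12, C21, alpha=1.0):
        # Solve Eq. (6.8) for u = [v_x, omega_z]:
        #   [[C11, C12], [C21, 0]] @ u = -alpha * [J1, J2]
        # The zero entry encodes that base rotation omega_z does not affect
        # the task assigned to the base camera (cam2).
        C = np.array([[C11, C12],
                      [C21, 0.0]])
        rhs = -alpha * np.array([J1, J2])
        return np.linalg.solve(C, rhs)

    # Example with placeholder sensitivities and shape-function values:
    v_x, omega_z = two_camera_inputs(J1=1200.0, J2=800.0, C11=0.6, C12=0.9, C21=0.7)
    print(v_x, omega_z)

Because the second row of the matrix has a single nonzero entry, the base velocity is fixed by cam2's task, and the arm rotation then absorbs whatever remains of cam1's task.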
Figure 6.20: Experimental setup for multi-target tracking using multiple cameras.

Figure 6.21: Target tracks for multi-target tracking using multiple cameras.

Figure 6.22: Shape functions $J_1$ and $J_2$ for multi-target tracking using multiple cameras.

Figure 6.23: Base velocity $v_x$ and arm angular velocity $\omega_z$ for multi-target tracking using multiple cameras.

Figure 6.24: Image sequences taken while tracking the two targets with two formation-constrained cameras (at approximately 10, 25, 37 and 55 seconds).

Figure 6.25: X-Y plot of the target motion.

6.7.3 Experimental Evaluation for Multi-Sensor Cooperation using the Mutational Hybrid Automata Model

For this experiment, the location of the target (a human) moving about was captured using a stereo vision system. Note that the location of the target was not used in the feedback control scheme designed; it was used merely to show the random motion of the target and to coordinate it with the various mode changes in the MHA model. Figure 6.25 represents the position, in x-y coordinates, of the centroid of the target, located using the stereo camera. The locations of the cameras cam1 and cam2 are superimposed on this figure.

Pervasive Target Tracking Using Two Cameras with Continuous Camera Servoing

In this experiment, the MHA model presented earlier is implemented. Figures 6.27 and 6.26 show the results of the experiments conducted using this model. Figure 6.27 represents the data calculated by cam1 and Figure 6.26 represents the data calculated by cam2. Figures 6.27 and 6.26 show the value of the shape function J, the pan velocity input to the camera, and the mode q ∈ {q1, q2, q3} (defined in 4.2) that the system is in; subfigure b depicts the current pan position of the camera and the visibility flag of the target for that camera. The visibility flag indicates whether the target is within the FOV of the camera and is set when the target is recognized by the target detection subsystem.

Figure 6.28 depicts an image sequence captured by the two cameras, with subfigures a, b and c representing the three modes q1, q3 and q2, respectively. Within each of Figures 6.28(a), 6.28(b) and 6.28(c), the left and right images correspond to cam1 and cam2, respectively. We notice that the target is in the field of view of at least one camera at any time during the course of the entire experiment.

The results of this experiment show that the mutational hybrid automata framework can be used to systematically design a pervasive surveillance network which can track a target as it moves across a wide region covered in parts by individual cameras. It shows that despite the random motions of the target, the surveillance network can, in fact, perform stable tracking without having to estimate the exact location of the target.

Pervasive Target Tracking Using Two Cameras with Controlled Switch

This section presents the results of the experimental implementation of the mutational hybrid automata model with controlled switch shown in Figure 4.3.
Figure 6.26: Current MHA mode, shape function, velocity, pan angle and target visibility for cam1.

Figure 6.27: Current MHA mode, shape function, velocity, pan angle and target visibility for cam2.

Figure 6.28: Image sequences taken while tracking the target over the different modes.

Figures 6.29 and 6.30 show the results of the experiments conducted. Figure 6.29 represents the data calculated by cam1 and Figure 6.30 represents the data calculated by cam2. Figures 6.29 and 6.30 show the value of the shape function J, the pan velocity input to the camera, and the mode q ∈ {q1, q2, q3, q4, q5} (defined in 4.3) that the system is in; subfigure b depicts the current pan position of the camera and the visibility flag of the target for that camera. The visibility flag indicates whether the target is within the FOV of the camera and is set when the target is recognized by the target detection subsystem.

Figure 6.29: Current MHA mode, shape function, velocity, pan angle and target visibility for cam1.

Figure 6.30: Current MHA mode, shape function, velocity, pan angle and target visibility for cam2.

These experiments show that irrespective of the target's current location in the monitored region, visibility of the target is maintained by at least one camera. The experiments further demonstrate that the controlled switch scenario, in which cameras can be actively turned off, can be used to perform pervasive tracking.

6.8 Performance Analysis of the Video Subsystem

The real-time performance of the implemented video subsystem was measured and analyzed for use in the surveillance network scenario. The parameters which affect the performance of a video transport scheme in the context of a switched surveillance scenario are the time taken to deliver the video stream to the operator, the size and quality of the image, the rate of video frame update (frame rate), and the initialization time for the client to receive one complete image frame from the server.
We compared the performance of H.261, H.263 and MJPEG as the encoding/decoding scheme for the visual surveillance tasks. The various choices for the video subsystem implementation have their advantages and disadvantages based on the scenario they are being applied to. We propose a scenario/task-based metric which can be used to evaluate the suitability of the various schemes based on their characteristics. The parameters for evaluation include:

1. Codec complexity
2. Bitrate of the generated video stream
3. Switched video initialization time

Codec complexity can be represented by the amount of time taken for the capture and compression of an individual image and may be an important factor for implementation on computationally constrained systems such as embedded processors. The bitrate of the generated video stream represents the bandwidth consumed for transporting the video stream to the operator and can be a limiting factor for wireless or power-constrained sensor nodes. Switched video initialization time represents the amount of time taken to display the relevant objects (targets or environment) after a video stream switch has occurred. The proposed metric involving these parameters is:

$$M = \frac{K_{LE}}{T_{LE}} + \frac{K_B}{B} + \frac{K_C}{T_C} \qquad (6.9)$$

where $K_{LE}$, $K_B$ and $K_C$ are gain coefficients chosen on a per-scenario/task basis using a fuzzy rule, $T_{LE}$ and $T_C$ represent the switched video initialization time and the capture-and-compress time, respectively, and $B$ represents the bitrate of the generated stream.

The surveillance tasks can be divided into two categories, namely monitoring and tracking tasks. In the monitoring task, the operator must evaluate the actions of the target, which may require knowledge of the environment with respect to the target, while the tracking task only requires keeping the target in view. Hence, for the tracking task, in order to calculate the real switching time for the video system, we need to account only for the display of a moving target, while for the monitoring task we need to evaluate the switching time for the moving target as well as the static environment.

Based on the characteristics of the task/scenario being evaluated, the values of the gain coefficients are chosen based on a fuzzy rule. For example, if the sensor nodes are not constrained by processing requirements or power consumption (implemented on regular PCs), the codec complexity will not have much effect on the system performance, so we assign a very low value to the gain coefficient $K_C$. Similarly, if the nodes are networked using a wired LAN (100 Mbps Ethernet) infrastructure with large enough bandwidth, then we assign a low value to the gain coefficient $K_B$. In contrast, if the nodes are sharing a low-bandwidth wireless channel, $K_B$ should be assigned a higher value, which signifies the importance of the bandwidth-consumed parameter. The various task/scenario combinations and their respective gain values compared for the video system evaluation are tabulated in Table 6.1.

Table 6.1: Scenarios and gain values

  #   Scenario                             K_LE   K_B    K_C
  1   Wired network, monitoring task       1      0      0
  2   Wired network, tracking task         1      0      0
  3   Wireless network, monitoring task    1      1000   0
  4   Wireless network, tracking task      1      1000   0

In the scenarios implemented, the sensor nodes are implemented on general-purpose PCs running Linux, and hence the capture-and-compress time is not very crucial to the performance comparison of the various schemes.
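The metric of Equation (6.9) can be computed directly from the measured quantities reported in the following subsections. The short sketch below reproduces the wireless tracking row of Table 6.5 from the 25 fps static-camera measurements (switching times from Table 6.4, static bitrates from Table 6.3); the capture-time argument is a dummy value here because $K_C = 0$ in these scenarios.

    def metric(T_LE, B, T_C, K_LE=1.0, K_B=0.0, K_C=0.0):
        # Scenario/task-based metric of Eq. (6.9); a larger M indicates a
        # better fit of the codec to the scenario.
        return K_LE / T_LE + K_B / B + K_C / T_C

    # Wireless tracking scenario (K_LE = 1, K_B = 1000, K_C = 0; Table 6.1),
    # 25 fps, static camera.  T_C is a dummy value since K_C = 0.
    for name, T_LE, B in [("MJPEG", 0.06, 1700.0),
                          ("H.261", 0.05, 60.0),
                          ("H.263", 9.4, 48.0)]:
        print(name, round(metric(T_LE, B, T_C=0.03, K_LE=1.0, K_B=1000.0), 2))
    # Prints roughly 17.25, 36.67 and 20.94, matching the wireless tracking
    # row of Table 6.5 up to rounding.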
The software implementation of the codecs for the tests was a modified version of the Vic software. The video streams were packetized using the RTP transport protocol.

6.8.1 Performance Analysis for Capture and Processing Time

The capture and compression time for one image frame for MJPEG, H.261 and H.263 is shown in Table 6.2. For these experiments the frame rate is set to 25 fps, the bit rate is set to 1.5 Mbps and the image size is set to CIF resolution (352x288).

Table 6.2: Capture and Compression Time of the Three Schemes

  Scheme                  MJPEG      H.261      H.263
  Capture & Compression   12.65 ms   30.09 ms   32.58 ms

We noticed that for H.261 and H.263, when the scene has dramatic changes the compression time tends to increase, while for MJPEG the compression time is quite constant. Since both MJPEG and H.263 are symmetric coding/decoding schemes, the decompression time should be equivalent to the compression time.

6.8.2 Comparison of Video Size and Image Quality

The H.263 standard is very limited in its capability for selection of the frame size, which is limited to 5 standard resolutions, namely CIF, QCIF, SIF, 4CIF and 16CIF. This may be a limitation when integrating multiple types of camera sensors. On the other hand, MJPEG allows almost all resolutions (limited to multiples of 8, of course) and can be used to transmit very high definition images when required. The viewing quality of the transmitted images can be largely regarded as the same because both schemes use the DCT (discrete cosine transform) and quantization for compression. However, it should be noted that with MJPEG the complete image is updated at once, while for H.263 the image is updated in parts, which may cause a visual degradation of the perceived image quality.

6.8.3 Performance Analysis for Video Bitrate and Frame Rate

Due to the inter-frame (temporal) compression, the video bitrate generated per frame for the H.261/H.263 schemes is lower compared to MJPEG. This makes the H.261/H.263 schemes more suitable for video communications over a restricted bandwidth network. The frame rates and bitrates generated by the three schemes for fixed and panning cameras are tabulated in Table 6.3. The quantization 'Q' values [57] used are based on a visual notion of picture clarity and are noted in the table.

Table 6.3: Frame Rate and Bitrate for MJPEG, H.261 and H.263

  Frame rate   MJPEG: Q=30            H.261: Q=10            H.263: Q=10
  (fps)        Bitrate (KBPS)         Bitrate (KBPS)         Bitrate (KBPS)
               panning    static      panning    static      panning    static
  10           668        668         550        25          700        20
  20           1300       1300        1000       45          1000       38
  25           1700       1700        1500       60          1300       48

6.8.4 Performance Analysis for Server Switching and Camera Hand-off

The initialization times of H.261, H.263 and MJPEG were also measured. The initialization time is defined as the time from when the server starts broadcasting to when the client receives the first image frame. In all three cases, the initialization time is around 10 ms. For the H.261 and H.263 schemes, since inter-frame coding is used, there will be a long delay when a client joins a session where a server is already broadcasting to other clients. This effect is termed "late entry" and is caused by the use of inter-frame encoding, where each frame is not independent of the others but must be decoded using some information from other frames. The stream switching delay due to late entry can be large when using these encoding schemes, especially when the scene is relatively static.
The late entry problem does not exist for the MJPEG coding scheme because all the frames transmitted are intra-coded and do not rely on information in previous or subsequent frames. For the H.261 scheme implementation, the server is only required to transmit those macro blocks that have changed through consecutive frames. Thus the client has to wait until all the macro blocks have been transmitted once to be able to see the whole scene. Based on our implementation methodology for the H.261 scheme, a moving target will be displayed almost immediately while the static environment will take a longer time to display after the switch, which implies that the switching time will be low for a tracking task and relatively high for a monitoring task, where the static environment needs to be taken into account.

The H.263 standard allows a completely intra-coded frame, called the 'I' frame, to be transmitted periodically in order to negate the effects of accumulated errors. In our experiments with the H.263 coded stream, carried out at various frame rate settings, the client has to wait for approximately 5-10 seconds to decode and display the first image frame because it can start to decode a stream only after it has received the first 'I' frame.

Table 6.4 tabulates the maximum time taken by the client to display all the relevant macro blocks for the first time when the client joins the broadcast session late, for both the tracking and monitoring tasks. The maximum switching time due to the late entry problem is compared for various frame rates of the video streams with a very large bandwidth.

Table 6.4: Server Switching Time for Monitoring and Tracking Tasks

  Frame rate   Monitoring task             Tracking task
  (fps)        MJPEG    H.261    H.263     MJPEG    H.261     H.263
  1            1 s      3.2 s    9.3 s     1 s      1 s       9.5 s
  10           0.1 s    3.2 s    9.8 s     0.1 s    0.12 s    9.7 s
  25           0.06 s   1.3 s    9.2 s     0.06 s   0.05 s    9.4 s

Note that the times given in Table 6.4 are the measured values for the display to update. We notice that the switching time for the H.261 scheme is significantly lower for the tracking task than for the monitoring task because the macro blocks comprising the moving target will be updated and transmitted almost instantly; however, the static environment blocks will not be updated until they age and expire at the end of their 32-frame update cycle.

It should be noted that for the H.263 coding scheme, all the information needed to initialize the decoder is stored in an intra-coded 'I' reference frame, which is transmitted periodically. One method to combat the late entry problem would be to transmit an 'I' frame every time a new client joins the session, which can be implemented using the RTCP (control commands) part of the RTP protocol. This may lead to higher bandwidth consumption, but may be acceptable in wired surveillance applications.

6.8.5 Performance Metric Evaluation for Various Surveillance Network Tasks

Based on the experimental results reported above, the metric M evaluated for the various scenarios and codec schemes is tabulated in Table 6.5. The video frame rate was held constant at 25 fps, with the image quality (Q) held at 30 for MJPEG and 10 for H.261/H.263. The cameras are assumed to be stationary.
Table 6.5: Metric for Comparison of the Various Schemes

  Scenario                    MJPEG   H.261   H.263
  1: Wired, monitoring        16.67   0.769   0.109
  2: Wired, tracking          16.67   20      0.109
  3: Wireless, monitoring     17.26   17.43   20.94
  4: Wireless, tracking       17.26   36.67   20.94

We notice that the H.261 implementation has the best performance for the wired and wireless tracking scenarios, whereas for the wired monitoring scenario MJPEG has the best performance, and the H.263 implementation achieves the best performance for the wireless monitoring scenario. It must be kept in mind that the performance evaluations of the various schemes are qualified by their implementation methodologies outlined in section 5.5.

A brief discussion of the advantages and disadvantages of the various video subsystems is presented here. The MJPEG system can transmit video frames of various sizes and hence has the advantage of being able to handle heterogeneous hardware for video capture. Further, it has low coding and initialization latency and allows for a direct hardware implementation, which is advantageous for using lower processing power on the sensor node. It does not suffer from the late entry problem during switching. However, it requires a consistently high bandwidth, which may limit its application on wireless or low-bandwidth communication channels.

The H.261 and H.263 encoding schemes consume less bandwidth, allowing for higher frame rates, which is an important characteristic for continuous and responsive real-time monitoring. The drawbacks of these schemes are that they can handle video frames of certain specified sizes only, and that both the initialization time and the switching time are higher than for MJPEG. They also suffer from the late entry problem, which can be detrimental for systems involving video stream switching and nodes linked over unreliable channels. However, the late entry problem can be solved at the expense of transmitting the complete intra-frame-encoded image when requested, using the RTCP protocol.

6.9 Chapter Summary

This chapter presents the experimental implementation details of the various algorithms proposed earlier. The image-based Hausdorff tracking method has been implemented for tracking single and multiple targets simultaneously. The tracking of multiple disjoint targets moving independently is a major advantage of the proposed method, as currently existing image-based tracking techniques, such as visual servoing and gaze control, cannot achieve this. Results for cooperative Hausdorff tracking are also presented. The two multi-sensor cooperation methods proposed are also validated experimentally. A task-based metric is also proposed for evaluating the suitability of the various video feedback schemes.

Chapter 7

Conclusions

This dissertation has presented an in-depth examination of the various aspects of planning and control of mobile surveillance networks, concentrating particularly on the broad areas of modeling, analysis and design of mobile surveillance networks. This chapter discusses the conclusions and gives a view towards future areas of research.

7.1 Conclusions

The non-intrusive nature of cameras makes them an attractive choice of sensor for various surveillance applications.
In recent times, with the decrease in the cost of camera sensors, accompanied by increasing processing power and the availability of wireless networking capabilities in sensors of ever smaller size, large-scale target tracking systems utilizing a large number of cameras are being increasingly deployed for myriad surveillance and tracking tasks. Sensor mobility accompanied by large-scale wireless networking can overcome the limitations due to range, direction and occlusion of visual sensing. However, a large number of sensors with motion capabilities tends to increase the modeling and analysis complexity of such systems.

For successful completion of the surveillance task, it is important to allocate the sensors to the targets that are actively being tracked. Generally, each target should be tracked by the sensor that can view it at an optimal resolution for best visibility of that target. However, when the system is heavily loaded, that is, when the target density in a particular region with a limited number of sensors is high, one sensor might need to track multiple targets. Even when the system is under-loaded, multiple target tracking by individual sensors is highly desirable, as viewpoint redundancy will lead to better estimates of the target coordinates and also present the viewer a clearer view of all the targets that are being tracked.

In order to track a target over a wide area with a distributed surveillance network, the tracking sensors need to be successively switched and the tracking task needs to be handed over from one sensor to the next. This switching, while increasing the modeling and analysis complexity, also introduces the design problem of optimally locating the sensors (i.e., configuring the sensors) in order to maximize the tracking performance. Switching the tracking task from one sensor to another tends to be quite disorienting for a human operator and hence needs to be minimized while tracking a target over a large region, while also maintaining adequate resolution of the target to allow discerning various target features for classification and identification of targets.

This dissertation presents various novel approaches to solve the aforementioned problems associated with mobile surveillance networks. The original contributions of this dissertation can be broadly classified into modeling, analysis and design contributions, which are individually addressed below.

Modeling Contributions
The assumptions on the control inputs to the camera for stability of the control system can be calculated using the shape Lyapunov theorem. Hausdorff tracking provides a significant advantage over classical vector space servoing as it can express the multiple target tracking succinctly and perform the tracking task in an energy- efficient manner. The methods of image-based Hausdorff tracking and cooperative Hausdorff tracking are introduced to perform target tracking using target information either directly sensed from the camera image plane or acquired from other sensors using the communication module, respectively. In order to pervasively track a target over a large area covered by multiple sensors, the mutational hybrid automata (MHA) model is proposed. The MHA model is capable of capturing the inherent hybrid (discrete switching with continuous motion) nature of the pervasive surveillance task - i.e., it accounts for both continuous camera motion and discrete jumps in the location of the tracking camera, which indicates a switch of the tracking camera from one sensor to another. The MHA model for pervasive surveillance can be used to model “hand-off” scenarios as the targets move across the coverage regions of multiple networked sensors. It allows modeling of both autonomous and controlled switches, which provides for a rich framework for hybrid control of the various tracking cameras based on the location of the target and other environment inputs. 172 :E_t Wfifl Analysis Stability properties of the Hausdorff tracking controllers are proved using the shape Lyapunov theorem. Using the shape Lyapunov theorem, it is shown that the shape function (error) converges to zero at least as fast as a stable first-order linear system. Stability for a pervasive tracking scenario implies that when a target is detected by a camera in the network, its visibility will always be ensured while it moves around in the monitored system. Using examples of pervasive surveillance scenarios, a two-step procedure is proposed to show the stability properties of the MHA model for modeling pervasive surveillance tasks. The first step shows that all the modes of the MHA model are internally stable - i.e., continuous evolution within each mode is stable in the sense of Lyapunov and there are no Zeno executions. The next step is to show that a particular set of stable modes in the MHA model forms an invariant set - i.e., if the hybrid execution enters this invariant set, it will remain in this set for all time. Now if we limit the Init set of the MHA model to the invariant set, the execution will never leave that set. This implies that if we can prove that the invariant set is the set of all modes where at least one camera can view the target, then if the target enters the field of view of any one camera, it will be tracked continuously as long as it remains in the monitored region. This procedure is developed for both models with and without controlled switches. Design A wide area multi-camera location and pose calibration scheme is presented that can calibrate the many heterogenous cameras with respect to a global coordinate system in a short time by simply waving a bright light source in the region observed by the cameras. This method can be used even when all cameras do not have an overlapping field of view. The problem of optimally placing a number of cameras in a given region to facilitate a surveillance task is presented. 
Design

A wide-area multi-camera location and pose calibration scheme is presented that can calibrate many heterogeneous cameras with respect to a global coordinate system in a short time by simply waving a bright light source in the region observed by the cameras. This method can be used even when the cameras do not all have an overlapping field of view. The problem of optimally placing a number of cameras in a given region to facilitate a surveillance task is presented. Metrics for evaluating the performance of the tracking task, based on the target resolution and the number of switches needed to pervasively track the target, are presented. Another problem we presented is related to the camera switching strategy given a full or partial trajectory path of a moving target. A dynamic-programming-based approach to generate the switching strategy and optimize both the switching and resolution metrics is presented.

The Hausdorff tracking problem is extended to incorporate an optimal control framework for executing multiple subtasks along with the target visibility task for a surveillance network. This optimization framework allows the surveillance task designer to consider various sub-tasks in the planning phase. Such sub-tasks can include minimizing the energy consumed by the network, which will enhance the longevity of the deployed network sensors. Physical programming is used for finding a solution to the optimization problem in order to lend a tangible, physical interpretation to the subtasks being considered.

The design of a switched video feedback system and a sensor node architecture are also presented. The various alternatives for the switched video feedback, namely MJPEG, H.261, H.263 and H.264, are evaluated, and a task- and scenario-based metric is proposed to analyze their performance. The MJPEG scheme does not suffer from the late entry problem, but its bandwidth requirement is constantly high. The H.261 and H.263 schemes consume low bandwidth for relatively static scenes but suffer from the late entry problem when video switching is required. A switched video feedback system with a multiple-client, multiple-server architecture using the H.264 video encoding scheme is implemented.

Implementation and testing

A surveillance test-bed consisting of seven heterogeneous cameras with motion capabilities has been implemented for validation and testing of the proposed schemes and algorithms. The sensors are calibrated using the proposed wide-area calibration scheme, and various target tracking scenarios are implemented and tested. Experimental results validating the proposed theoretical approaches are presented.

The various theoretical approaches and tools presented in this dissertation will enable the design, analysis and implementation of real-time mobile surveillance systems.

Overall Impacts

Altogether, this dissertation presents a collection of modeling, analysis and design tools for mobile surveillance networks. The impacts of this dissertation on various fields are summarized below.

Impact on the field of surveillance and sensor networks: This dissertation presents significant advances in modeling, analysis and design for the field of surveillance and sensor networks. Modeling multiple target tracking using the Hausdorff tracking method provides a significant advantage for distributed sensor systems, as it can significantly improve sensing accuracy due to multiple detections of the same target with different sensors. The novel stability analysis approach presented for guaranteeing target coverage using distributed sensors is a significant step forward from the current approaches of probabilistic modeling.

Impact on the field of robotics and control: This dissertation presents a major contribution to the field of robotics and control.
Vision-based robot control has suffered from the problem of task specificity, i.e., the tracking task is generally over-specified, and it has lacked a generalized modeling structure for multi-target visibility problems. The method of Hausdorff tracking proposed in this dissertation represents a generalized set-based modeling structure for traditional vision-based servoing. It can in fact be viewed as a superset of visual servoing, as traditional visual servoing tasks can also be readily expressed in this framework. The use of mutational equations and shape analysis for representing shape-based control is a significant step towards extending the rich tool-set of the control literature to domain-based control. Generalizing a vector-based model to a domain-based model opens up many avenues for creative design of control algorithms. Many real-world tasks which previously needed to be parameterized in order to be expressed in a control-theoretic framework can be directly solved using this domain-based approach.

Impact on general science and engineering: This dissertation presents various results on the suitability of various video streaming approaches for surveillance applications. It provides a method to measure the acceptability of various streaming solutions, which will have a significant impact on the implementation of networking and IP-based video transmission.

Impact on society in general: There are many issues up for debate on privacy and the intrusion thereof. However, these concerns should not preclude the pervasive monitoring and surveillance of critical infrastructure. In the current world scenario, exhibiting a little precaution with regard to safety is, in my opinion, not a bad idea. This dissertation provides tools for conducting pervasive surveillance which will have many potential uses with regard to monitoring critical infrastructure such as airports, power plants, etc., and also many applications in our fuller understanding of the world around us, e.g., in environmental monitoring for understanding and preserving natural habitats. However, there are also possible misuses of this technology, such as continuous tracking of people with the express intent of invading their privacy. But all technology, like a double-edged sword, has its advantages and disadvantages; it behooves the people who use technology to be fair, just and moral in its application.

7.2 Future Research Directions

Large-scale surveillance systems have only recently been commercialized, and the architecture and capabilities of these systems are still an active research area, as many problems remain to be solved. This dissertation has addressed several issues in the modeling, analysis and design of mobile surveillance systems. Certainly more can be done to elaborate on the work presented in this dissertation, and even more can be done to address various other issues in mobile surveillance which have not been the topic of this dissertation.

Shape functions for performing visibility- and resolution-based tasks have been developed in this dissertation. However, more general classes of admissible shape functions should be developed in order to provide a rich set of tools to design surveillance tasks.

The mutational hybrid automata model presented uses an edge set in order to describe the discrete switching dynamics of the pervasive target tracking model.
It should be noted that the number of edges and modes required to model the pervasive surveillance task increases exponentially as the number of cameras involved in the surveillance task grows. This somewhat limits the applicability of the proposed approach of using edge sets for modeling the discrete dynamics. In order to alleviate this problem, a language-based automata model can be used, with an alphabet capable of modeling very large-scale camera systems. The proposed MHA model can be used for ensuring the visibility of a single target moving in a region. The possibility of combining multiple such single-target MHA models in order to track multiple targets simultaneously using the same network should be investigated. Optimal control for the proposed mutational hybrid automata model is still an open problem.

The switched video feedback system is implemented using a baseline encoder. A high-profile encoder with object-level support to code the various objects separately would certainly enhance the video feedback system by reducing the bandwidth requirement for the surveillance system. The support of user-defined queries and tasks would certainly enhance the usability of the surveillance system, making it more flexible and adaptable for a real-world surveillance scenario.

There has been relatively little research done on camera deployment and placement issues for tracking applications. The switching-based metric presented in this dissertation serves as a first step towards the final goal of optimal camera placement. A number of avenues can be pursued to carry this work further. A user-friendly camera placement planning system that utilizes the switching metric, amongst others, would provide a very useful tool for optimal camera placement.

AUTHOR'S PUBLICATIONS

1. A. Goradia, M. Huntwork, C. Haffner, N. Xi, and M. Mutka, "Modeling and Control of Pervasive Surveillance Networks using Mutational Hybrid Systems," in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2006.

2. A. Goradia, C. Haffner, N. Xi, and M. Mutka, "Optimality Framework for Hausdorff Tracking using Mutational Dynamics and Physical Programming," submitted to IEEE International Conference on Robotics and Automation, 2007.

3. A. Goradia, Z. Cen, C. Haffner, N. Xi, and M. Mutka, "Sensor Selection Problem using Dynamic Programming," submitted to IEEE International Conference on Robotics and Automation, 2007.

4. A. Goradia, C. Haffner, Z. Cen, M. Mutka, and N. Xi, "Modeling, Analysis and Design of Mobile Surveillance Networks Using Mutational Hybrid Systems," submitted to IEEE Transactions on Robotics.

5. A. Goradia, M. Huntwork, C. Haffner, C. Klochko, N. Xi, and M. Mutka, "Pervasive Surveillance using a Cooperative Mobile Sensor Network," in IEEE International Conference on Robotics and Automation, 2006.

6. A. Goradia, Z. Cen, C. Haffner, Y. Liu, B. H. Song, M. Mutka, and N. Xi, "Pervasive Surveillance Networks: Design, Implementation and Performance Analysis," accepted, International Journal of Distributed Sensor Networks.

7. A. Goradia, Z. Cen, C. Haffner, N. Xi, and M. Mutka, "Design, Implementation and Performance Analysis of Pervasive Surveillance Networks," 19th International FLAIRS Conference, 2006.

8. A. Goradia, N. Xi, Z. Cen, and M. Mutka, "Modeling and Design of Mobile Surveillance Networks using a Mutational Analysis Approach," in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2005.

9. A. Goradia, N. Xi, M. Prokos, Z. Cen, and M. Mutka, "Cooperative Multi-Target Surveillance using a Mutational Analysis Approach," in Proceedings of the 2005 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM 2005), Monterey, California, USA, 2005.
10. A. Goradia, I. Elhajj, and N. Xi, "Internet Based Robots: Applications, Impacts, Challenges and Future Directions," in IEEE Workshop on Advanced Robotics and its Social Impacts (ARSO), 2005.

11. Z. Cen, A. Goradia, Y. Liu, N. Xi, and M. Mutka, "Improving the Operation Efficiency of Internet-Based Teleoperation via Overlay Networks," submitted, IEEE Transactions on Robotics.

12. Z. Cen, A. Goradia, M. Mutka, N. Xi, W.-K. Fung, and Y.-H. Liu, "Improving the Operation Efficiency of Supermedia Enhanced Internet Based Teleoperation via an Overlay Network," in IEEE International Conference on Robotics and Automation, Barcelona, Spain, 2005.

13. Z. Cen, M. Mutka, Y. Liu, A. Goradia, and N. Xi, "QoS Management of Supermedia Enhanced Teleoperation via Overlay Networks," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2005), Edmonton, Alberta, Canada, 2005.

14. A. Goradia, Z. Nichol, Y. Liu, P. Suchyta, M. Prokos, and N. Xi, "SuperMedia Enhanced Internet-Based Real-Time Teleoperation," in IEEE International Conference on Mechatronics and International Conference on Hands-on Intelligent Mechatronics and Automation (ICM-HIMA), 2005.

15. A. Goradia, L. Pan, and N. Xi, "Dynamic Multi-Objective Optimal Task Distribution for Teleoperated Mobile Manipulators," in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2004.

16. J. Tan, A. Goradia, N. Xi, and W. Sheng, "Coordination of Multi-Robot and Human Systems in a Perceptive Reference Frame," International Journal of Vehicle Autonomous Systems, vol. 2, 2004.

17. I. Elhajj, A. Goradia, N. Xi, C. M. Kit, Y.-H. Liu, and T. Fukuda, "Design and Analysis of Internet-Based Tele-Coordinated Multi-Robot Systems," Autonomous Robots, vol. 15, no. 3, 2003.

18. I. Elhajj, A. Goradia, N. Xi, C. M. Kit, Y.-H. Liu, and T. Fukuda, "Tele-Coordinated Control of Multi-Robot Systems via the Internet," in IEEE International Conference on Robotics and Automation, 2003.

19. J. Tan, A. Goradia, N. Xi, and W. Sheng, "Coordination of Human and Mobile Manipulator Formation in a Perceptive Reference Frame," in IEEE International Conference on Robotics and Automation, 2003.

20. I. Elhajj, A. Goradia, N. Xi, W. T. Lo, Y.-H. Liu, and T. Fukuda, "Internet-Based Tele-Manufacturing," in The Seventh International Conference on Automation Technology, Chia-yi, Taiwan, 2003.

21. A. Goradia, J. Tan, and N. Xi, "Hybrid Force/Position Control in Moving Coordinate Frames," International Conference on Control, Automation, Robotics and Computer Vision, Singapore, 2002.

BIBLIOGRAPHY

[1] C. S. Regazzoni, G. Fabri, and G. Vernazza, Advanced Video-Based Surveillance Systems. Norwell, MA: Kluwer, 1998.

[2] G. L. Foresti, P. Mahonen, and C. S. Regazzoni, Multimedia Video-Based Surveillance Systems: Requirements, Issues and Solutions. Norwell, MA: Kluwer, 2000.

[3] M. Bogaert, N. Chelq, P. Cornez, C. S. Regazzoni, A. Teschioni, and M. Thonnat, "The password project," in Proceedings of the International Conference on Image Processing, Chicago, IL, 1996, pp. 675-678.

[4] C. S. Regazzoni and A. Tesei, "Distributed data fusion for real-time crowding estimation," Signal Processing, vol. 53, pp. 47-63, 1996.

[5] E. F. Lyon, "The application of automatic surface lights to improve airport safety," IEEE AES Systems Magazine, pp. 14-20, 1993.
[6] M. Braasch, M. DiBenedetto, S. Braasch, and R. Thomas, "LAAS operations in support of airport surface movement, guidance, control and surveillance: Initial test results," in IEEE Position Location and Navigation Symposium, 2000, pp. 82-89.

[7] H. Wang, T. Kirubarajan, and Y. Bar-Shalom, "Precision large scale air traffic surveillance using IMM/assignment estimators," IEEE Transactions on Aerospace and Electronic Systems, vol. 35, pp. 255-266, January 1999.

[8] B. Sridhar and G. B. Chatterji, "Computationally efficient conflict detection methods for air traffic management," in Proceedings of the American Control Conference, vol. 2, 1997, pp. 1126-1130.

[9] G. Donohue, "Vision on aviation surveillance systems," in Proceedings of the IEEE International Radar Conference, 1995, pp. 1-4.

[10] G. L. Foresti and B. Pani, "Monitoring motorway infrastructures for detection of dangerous events," in Proceedings of the IEEE International Conference on Image Analysis and Processing, Venice, Italy, 1999, pp. 1144-1147.

[11] J. M. Manendez, L. Salgado, E. Rendon, and N. Garcia, "Motorway surveillance through stereo computer vision," in Proceedings of the 33rd IEEE Annual International Carnahan Conference on Security Technology, 1999, pp. 197-202.

[12] D. Koller, K. Daniilidis, and H. Nagel, "Model-based object tracking in monocular image sequences of road traffic scenes," International Journal on Computer Vision, vol. 10, pp. 257-281, 1993.

[13] J. Malik, D. Koller, and J. Weber, "Robust multiple car tracking with occlusion reasoning," in European Conference on Computer Vision, Stockholm, Sweden, 1994, pp. 189-196.

[14] A. N. Ince and E. Topuz, "The design and computer simulation of a maritime surveillance system," in International Conference on Radar, 1997, pp. 653-656.

[15] J. G. Sanderson, M. K. Teal, and T. J. Ellis, "Target identification in complex maritime scenes," in 6th International Conference on Image Processing and its Applications, vol. 2, 1997, pp. 463-467.

[16] C. A. Rodriguez, J. A. Howell, H. O. Menlove, C. M. Brislawn, J. N. Bradley, P. Chare, and T. Gorten, "Nuclear video image processing for nuclear safeguards," in Proceedings of the IEEE 29th Annual International Carnahan Conference on Security Technology, 1995, pp. 355-363.

[17] E. Stringa and C. S. Regazzoni, "Content-based retrieval and real time detection from video sequences acquired by surveillance systems," in IEEE International Conference on Image Processing, vol. 3, Chicago, IL, October 4-7, 1998, pp. 138-142.

[18] C. Lin and R. Nevatia, "Building detection and description from a single intensity image," Computer Vision and Image Understanding, vol. 72, no. 2, pp. 101-121, 1998.

[19] L. Vergara and P. Bernabeu, "Automatic signal detection applied to fire control by infrared digital signal processing," Signal Processing, vol. 80, no. 4, pp. 659-669, 2000.

[20] W. Millesi, M. J. Truppe, F. Watzinger, A. Wagner, and R. Ewers, "Image guided surgery extended by remote stereotactic visualization," in Lecture Notes in Computer Science, vol. 1205. Berlin, Germany: Springer, 1997, pp. 813-821.

[21] I. Haritaoglu, D. Harwood, and L. S. Davis, "W4: Real-time surveillance of people and their activities," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 809-830, August 2000.

[22] N. M. Oliver, B. Rosario, and A. P. Pentland, "A Bayesian computer vision system for modeling human interactions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 831-843, August 2000.
[23] D. A. Pritchard, "System overview and applications of a panoramic imaging perimeter sensor," in Proceedings of the IEEE 29th Annual International Carnahan Conference on Security Technology, 1995, pp. 420-425.

[24] U. Oppelt, "New possibilities for video applications in the security field," in Proceedings of the IEEE 29th Annual International Carnahan Conference on Security Technology, 1995, pp. 426-435.

[25] B. Peters, J. Meehan, D. Miller, and D. Moore, "Sensor link protocol: Linking sensor systems to the digital battlefield," in Proceedings of the IEEE Military Communications Conference, vol. 3, 1998, pp. 919-923.

[26] M. T. Fennell and R. P. Wishner, "Battlefield awareness via synergistic SAR and MTI exploitation," IEEE Aerospace and Electronic Systems Magazine, vol. 13, pp. 39-43, February 1998.

[27] J. S. Draper, S. Perlman, C. K. Chuang, M. Hanson, L. Lillard, B. Hibbeln, and D. Sene, "Tracking and identification of distant missiles by remote sounding," in Proceedings of the IEEE Aerospace Conference, vol. 4, 1999, pp. 333-341.

[28] G. A. Van Sickle, "Aircraft self reports for military air surveillance," in Proceedings of the IEEE Digital Avionics Systems Conference, vol. 2, 1999, p. 28.

[29] K. Toyama, J. Krumm, B. Brumitt, and B. Meyers, "Wallflower: Principles and practice of background maintenance," in International Conference on Computer Vision, 1999, pp. 255-261.

[30] I. Elhajj, N. Xi, W. K. Fung, Y.-H. Liu, Y. Hasegawa, and T. Fukuda, "Supermedia-enhanced internet-based telerobotics," Proceedings of the IEEE, vol. 91, no. 3, pp. 396-421, March 2003.

[31] N. Xi and T. J. Tarn, "Action synchronization and control of internet based telerobotic systems," in IEEE International Conference on Robotics and Automation, 1999.

[32] C. Regazzoni, V. Ramesh, and G. L. Foresti, "Special issue on third generation surveillance systems," Proceedings of the IEEE, vol. 89, Oct. 2001.

[33] R. Collins, A. Lipton, and T. Kanade, "Introduction to the special section on video surveillance," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 745-746, August 2000.

[34] J. Tan, N. Xi, W. Sheng, and J. Xiao, "Modeling multiple robot systems for area coverage and cooperation," in International Conference on Robotics and Automation, 2004.

[35] S. Stoeter, P. Rybski, M. Erickson, M. Gini, D. Hougen, D. Krantz, N. Papanikolopoulos, and M. Wyman, "A robot team for exploration and surveillance: Design and architecture," in Proceedings of the International Conference on Intelligent Autonomous Systems, 2000, pp. 767-774.

[36] J. Cortes, S. Martinez, T. Karatas, and F. Bullo, "Coverage control for mobile sensing networks," in International Conference on Robotics and Automation, 2003.

[37] T. Matsuyama and N. Ukita, "Real-time multitarget tracking by a cooperative distributed vision system," Proceedings of the IEEE, vol. 90, no. 7, pp. 1136-1150, 2002.

[38] T. Boult, R. J. Micheals, X. Gao, and M. Eckmann, "Into the woods: Visual surveillance of noncooperative and camouflaged targets in complex outdoor settings," Proceedings of the IEEE, vol. 89, Oct. 2001.

[39] R. T. Collins, A. J. Lipton, H. Fujiyoshi, and T. Kanade, "Algorithms for cooperative multisensor surveillance," Proceedings of the IEEE, vol. 89, pp. 1456-1477, 2001.

[40] R. Brooks, C. Griffin, and D. Friedlander, "Self-organized distributed sensor network entity tracking," International Journal of High Performance Computing, vol. 16, no. 2, 2002.
[41] S. Hutchinson, G. Hager, and P. Corke, “A tutorial on visual servo control,” IEEE Transactions on Robotics and Automation, vol. 12, no. 5, pp. 651–670, 1996.

[42] C. Brown, “Gaze control with interactions and delays,” IEEE Transactions on Systems, Man and Cybernetics, vol. 20, no. 1, pp. 518–527, 1990.

[43] M. Branicky, V. Borkar, and S. Mitter, “A unified framework for hybrid control: Model and optimal control theory,” IEEE Transactions on Automatic Control, vol. 43, no. 1, pp. 31–45, 1998.

[44] J. Lygeros, K. Johansson, S. Simic, J. Zhang, and S. S. Sastry, “Dynamical properties of hybrid automata,” IEEE Transactions on Automatic Control, vol. 48, no. 1, pp. 2–17, 2003.

[45] C. Intanagonwiwat, R. Govindan, D. Estrin, J. Heidemann, and F. Silva, “Directed diffusion for wireless sensor networking,” IEEE/ACM Transactions on Networking, vol. 11, no. 1, 2003.

[46] D. Braginsky and D. Estrin, “Rumor routing algorithm for sensor networks,” in Proceedings of the 1st ACM International Workshop on Wireless Sensor Networks and Applications, 2002, pp. 22–31.

[47] H. S. Hamza and S. Wu, “RAODV: An efficient routing algorithm for sensor networks,” in Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems, 2004, pp. 297–298.

[48] R. T. Collins, A. J. Lipton, and T. Kanade, “Introduction to the special section on video surveillance,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 745–746, August 2000.

[49] A. J. Courtemanche and A. Ceranowicz, “ModSAF development status,” in Proceedings of the 5th Conference on Computer Generated Forces and Behavioral Representation, Orlando, FL, May 1995, pp. 3–13.

[50] K. Cauble, “The OpenScene ModStealth,” in Proceedings of the Simulation Interoperability Workshop, Paper 97S-SIW-008, 1997.

[51] C. S. Regazzoni, C. Sacchi, and C. Dambra, “Remote cable-based video surveillance applications: The AVS-RIO project,” in Proceedings of ICIAP99, Venice, Italy, September 27–29, 1999, pp. 1214–1215.

[52] P. Rybski, S. Stoeter, M. Gini, D. Hougen, and N. Papanikolopoulos, “Performance of a distributed robotic system using shared communications channels,” IEEE Transactions on Robotics and Automation, vol. 18, no. 5, pp. 713–727, October 2002.

[53] ITU-H.261, “Video codec for audio-visual services at 64–1920 kbit/s,” ITU-T Recommendation H.261, 1993.

[54] ITU-H.263, “Video coding for low bit rate communication,” ITU-T Recommendation H.263, March 1996.

[55] ITU-T Rec. H.264 | ISO/IEC 14496-10, “Advanced video coding,” Final Committee Draft, Document JVT-E022, September 2002.

[56] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, “RTP: A transport protocol for real-time applications,” RFC 1889, July 1997.

[57] L. Berc, W. Fenner, R. Frederick, S. McCanne, and P. Stewart, “RTP payload format for JPEG-compressed video,” RFC 2435, October 1998.

[58] E. Sontag, Mathematical Control Theory. Springer-Verlag, 1990.

[59] J.-P. Aubin, “Mutational equations in metric spaces,” Set-Valued Analysis, vol. 1, pp. 3–46, 1993.

[60] ——, Mutational and Morphological Analysis: Tools for Shape Evolution and Morphogenesis. Birkhäuser, 1999.

[61] R. Alur, T. Henzinger, G. Lafferriere, and G. Pappas, “Discrete abstractions of hybrid systems,” Proceedings of the IEEE, vol. 88, no. 7, pp. 971–984, 2000.

[62] J. Cea, “Problems of shape optimal design,” Optimization of Distributed Parameter Structures, vol. I and II, pp. 1005–1087, 1981.
[63] J. Sokolowski and J.-P. Zolesio, Introduction to Shape Optimization: Shape Sensitivity Analysis, ser. Computational Mathematics. Springer-Verlag, 1991.

[64] A. C. Sanderson, L. Weiss, and C. Neuman, “Dynamic sensor based control of robots with visual feedback,” IEEE Transactions on Robotics and Automation, vol. RA-3, pp. 404–417, October 1987.

[65] L. Doyen, “Shape Lyapunov functions and stabilization of reachable tubes of control problems,” Journal of Mathematical Analysis and Applications, vol. 184, pp. 222–228, 1994.

[66] ——, “Mutational equations for shapes and vision-based control,” Journal of Mathematical Imaging and Vision, vol. 5, no. 2, pp. 99–109, 1995.

[67] A. Goradia, Z. Cen, M. Mutka, and N. Xi, “Modeling and design of mobile surveillance networks using a mutational analysis approach,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2005.

[68] A. Goradia, N. Xi, M. Prokos, Z. Cen, and M. Mutka, “Cooperative multi-target surveillance using a mutational analysis approach,” in Proceedings of the 2005 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM 2005), Monterey, California, USA, 2005.

[69] K. Fregene, D. Kennedy, and D. Wang, “A study of supervisory constraints in a class of coordinated multiagent systems,” in Proceedings of the American Control Conference, 2003, pp. 1044–1049.

[70] Z. Zhang, “A flexible new technique for camera calibration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1330–1334, 2000.

[71] ——, “Motion and structure from two perspective views: From essential parameters to Euclidean motion through the fundamental matrix,” J. Opt. Soc. Am., vol. 14, no. 11, 1997.

[72] R. D. Z. Zhang, O. Faugeras, “An effective technique for calibrating a binocular stereo through projective reconstruction using both a calibration object and the environment,” Videre: Journal of Computer Vision Research, vol. 1, no. 1, 1997.

[73] L. Shapiro and G. Stockman, Computer Vision. Prentice Hall, 2001.

[74] X. Chen, J. Davis, and P. Slusallek, “Wide area camera calibration using virtual calibration objects,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2000.

[75] A. Heyden and K. Astrom, “Minimal conditions on intrinsic parameters for Euclidean reconstruction,” in ACCV ’98: Proceedings of the Third Asian Conference on Computer Vision, Volume II. London, UK: Springer-Verlag, 1997, pp. 169–176.

[76] M. Grewal and A. Andrews, Kalman Filtering: Theory and Practice. Prentice Hall, 1993.

[77] W.-P. Chen, C.-J. Hou, and L. Sha, “Dynamic clustering for acoustic target tracking in wireless sensor networks,” IEEE Transactions on Mobile Computing, Special Issue on Sensor Networks, 2004.

[78] J. Liu, J. Liu, J. Reich, P. Cheung, and F. Zhao, “Distributed group management for track initiation and maintenance in target localization applications,” in Proceedings of the 2nd International Workshop on Information Processing in Sensor Networks (IPSN), 2003.

[79] Q. Fang, J. Liu, L. Guibas, and F. Zhao, “RoamHBA: Maintaining group connectivity in sensor networks,” in The 3rd International Symposium on Information Processing in Sensor Networks (IPSN 2004), 2004, pp. 151–160.

[80] M. Mauve, J. Widmer, and H. Hartenstein, “A survey on position-based routing in mobile ad-hoc networks,” IEEE Network Magazine, vol. 15, no. 6, pp. 30–39, 2001.

[81] R. Timothy and J. Arora, “Survey of multi-objective optimization method for engineering,” http://www.ccad.uiowa.edu/tmarler/rtm/img/survey.pdf, accessed March 2005.
[82] B. S. Gottfried and J. Weisman, Introduction to Optimization Theory. Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1973.

[83] A. Mullur, C. Mattson, and A. Messac, “New decision matrix based approach for concept selection using linear physical programming,” in 44th AIAA/ASME/ASCE/AHS Conference, Paper No. AIAA 2003-4446, Norfolk, VA, April 2003.

[84] A. Messac, “Physical programming: Effective optimization for computational design,” AIAA Journal, vol. 34, no. 1, pp. 149–158, 1996.

[85] ——, “From dubious construction of objective functions to the application of physical programming,” AIAA Journal, vol. 38, no. 1, pp. 155–163, Jan. 2000.

[86] R. Fletcher, Practical Methods of Optimization, Second Edition. Anchor Brendon Ltd., Tiptree, Essex, 1991.

[87] Trolltech Inc., “Qt: A cross platform application development framework,” http://www.trolltech.com/products/qt/index.html, accessed December 2005.

[88] C. Kim and J.-N. Hwang, “Object-based video abstraction for video surveillance systems,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 12, December 2002.

[89] S. M. LaValle and S. A. Hutchinson, “Optimal motion planning for multiple robots having independent goals,” IEEE Transactions on Robotics and Automation, vol. 14, no. 6, pp. 912–925, December 1998.

[90] N. R. Pal and S. K. Pal, “A review on image segmentation techniques,” Pattern Recognition, vol. 26, no. 9, pp. 1277–1294, 1993.

[91] J. Bruce, T. Balch, and M. Veloso, “Fast and inexpensive color image segmentation for interactive robots,” in IROS, 2000.

[92] T. Svoboda, D. Martinec, and T. Pajdla, “A convenient multi-camera self-calibration for virtual environments,” PRESENCE: Teleoperators and Virtual Environments, vol. 14, no. 4, pp. 407–422, August 2005.