Ens-s ,-

"41’,

ﬁg,” ‘

N. 1

" ' o.
9mg" . v
I - 3"“

. 4“;

.o~-
. $243. ‘

., ,.r;.<-;; 2.“:
'H‘g‘w’4“ .

W... ‘_ .
A...“ —. N
'- «at.

3'"...

$va -
"fl."

-. f". I

 

This is to certify that the
dissertation entitled

PLANNING AND CONTROL OF MOBILE SURVEILLANCE
NETWORKS

presented by

AMIT YOGESH GORADIA

has been accepted towards fulﬁllment
of the requirements for the

Ph.D. degree in ELECTRICAL AND
COMPUTER ENGINEERING

 

 

_\ _
, "_.__. t

 

Major Professor’s Signature

0/
(I Bar/t5

Date

MSU is an Afﬁrmative Action/Equal Opportunity Institution

f"-‘4‘

 

LIBRARY
Michigan State
University

 

 

 

PLACE IN RETURN BOX to remove this checkout from your record.
TO AVOID FINES return on or before date due.
MAY BE RECALLED with earlier due date if requested.

 

DATE DUE DATE DUE DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

6/07 p‘lClRC/DaIeDue.indd-p.1

PLANNING AND CONTROL OF
MOBILE SURVEILLANCE NETWORKS

BY

Amit Goradia

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Electrical and Computer Engineering

2006

P.
mow
St'IlNll
The I
In;T I'M
muniti
and P);
[rat-hi;
“Miter 1
Scale m

Slalilt? r

 

The

Capabii

DELSH}

 

 

ABSTRACT

Planning and Control of Mobile Surveillance Networks
By

Amit Goradia

Pervasive surveillance implies the continuous tracking of multiple targets as they
move about the monitored region so that they do not leave the ﬁeld‘of view of the
sensors observing them and maintain discernable resolution for feature identiﬁcation.
The tasks to be performed by a surveillance system are expressed as the follow-
ing requirements:(1) Automatically track the identiﬁed targets over the region being
monitored; (2) Coordinate tasking of multiple sensors to maintain target visibility
and execute other requested tasks;(3) Provide concise feedback and video data of a
tracked target to multiple operators. Due to the inherent complexity of large net—
worked systems and the diversity of the sensors used, a major challenge for such large
scale networked systems is to design an efﬁcient modeling and analysis tool and devise
stable control algorithms for accomplishing the surveillance task.

The ability to track multiple targets simultaneously using sensors with motion
capability is an important aspect of surveillance systems. Current feature-point-
based visual surveillance and tracking techniques generally employed do not provide
an adequate framework to express the surveilla; z-c task of tracking multiple targets
simultaneously using a single sensor. This rlisscrtation presents the method of Haus-
dorff tracking, which can express the surveillance task succinctly and track multiple
targets simultaneously using a single sensor. Hausdorff tracking can readily express
the surveillance tasks of maintaining the visibility and adequate resolution of a target
as the minimization of an error (shape function) and accomplish the tracking task
using feedback directly from the acquired image. Mutational equations are used to

represent the motion of the target sets with respect to the motion of the camera.

Ls

IIIL‘:

Emit
{lung
of III
(haw;
sire .~
for It;
SWIM"!
It»: 1;.
apps).

Tl.
Ht‘lltjt:
Hail,

bd‘t’fii

Using the mutational dynamics model, a feedback map to control the sensor motion
module for accomplishing the surveillance task can be developed. Further, in order
to take advantage of the redundancy in the motion control, an optimal Hausdorff
tracking framework is presented.

Despite the limited sensing capability and range of the individual sensors, a surveil-
lance network can track targets over a large region by transferring the target-tracking
task from one group of sensors to another based on the motion of the target. Muta-
tional analysis and shape-based control fail to capture the discrete switching nature
of the surveillance task of tracking the target using multiple switched sensors. This
dissertation presents a mutational hybrid automata (MHA) model for such perva-
sive surveillance networks that retains the advantages of using mutational equations
for modeling the active surveillance task while also being able to model the discrete
switching between various sensors. Analysis of example pervasive surveillance scenar-
ios modeled using the MHA model and experimental results that verify the proposed
approach are presented.

The active sensors needed to track the target keep changing due to target motion.
Hence the concise video feedback provided to the operator to assist decision-making
needs to be switched to the sensors currently involved in the tracking task. A task-
based metric representing the number of cameras required to track a moving target
over a monitored region is presented. This metric (in conjunction with other metrics)
can be used to optimally place cameras in a monitored environment. An optimized
camera selection algorithm for selecting the minimum number of cameras to track a
moving target over a large area is also presented. A surveillance test—bed has been
designed based on these requirements and the algorithms and subsystems described
have been implemented on it. Results of the experimental implementation validate

the proposed approaches.

To my parents and Prerna...

haw
dun
(‘UIII
litm-
51.13::
I
and l
I am
Drain:
Dr. \
lft‘hm
and h.
IOU H.
Ofpan
.\l:
(IE-5“."

Will}.

 

 

ACKNOWLEDGEMENTS

I would like to thank my advisor, Dr. N ing Xi, without whom this would not
have been possible. I would like to thank Dr. Matt Mutka for providing acute insights
during our numerous discussion sessions. I would like to thank all my other guidance
committee members, Dr. Hassan Khalil, Dr. Erik Goodman and Dr. Gowei Wei, who
have devoted many hours on the committee meetings. Their insightful comments and
suggestions have enhanced the technical soundness of this dissertation.

I would also like to express my deepest gratitude to the Dean Dr. Satish Udpa
and Dr. Subir Biswas for their support and advise through the course of my career.
I am grateful to my friends and colleagues from the Robotics and Automation Lab—
oratory. The discussions with Zhiwei Cen, Dr. Yu Sun, Mike(Quan) Shi, Yang Liu,
Dr. Yantao Shen, Long Pan, Dr. Jindong Tan, Dr. Imad Elhajj, Dr. Guangyong Li,
Uchechukwu Wejinya, and Jiangbo Zhang have substantially contributed to my work
and broadened my knowledge. I would like to express my great appreciation to Clay-
ton Haffner, Michael Huntwork, Mat Prokos and Craig Pomeroy for the endless hours
of painstaking work they have put in at the Robotics and Automation Laboratory.

My thanks go to my family, especially my ﬁancee Prerna and my parents. This
dissertation would not have been possible without their years of encouragement and

continuous support.

.. ...r
.99 3

HIM: 11
Du. 9......

TABLE OF CONTENTS

LIST OF FIGURES ix
LIST OF TABLES xii
1 Introduction 1
1.1 Background ................................ 1
1.2 Challenges and Problems in Surveillance Networks ........... 5
1.3 Literature Review ............................. 7
1.4 Objectives of this Research ........................ 12
1.5 Major Contributions of this Dissertation ................ 13
1.6 Organization of this Dissertation .................... 15

2 Mathematical Background 17
2.1 Introduction ................................ 17
2.2 Mutational Equations ........................... 19
2.2.1 Evolution of a Tube ........................ 20

2.2.2 Controlled Mutational Equations ................ 21

2.3 Hybrid Dynamic Systems ......................... 22
2.4 Mutational Hybrid Dynamic Systems .................. 24
2.5 Shape Analysis .............................. 25
2.5.1 Shape Functions ......................... 25

2.5.2 Directional Derivative of Shape Functions ........... 26

2.6 Chapter Summary ............................ 27

3 Modeling Mobile Surveillance Networks 28
3.1 Introduction - Example Surveillance Scenario .............. 28
3.2 Controlled Mutational Equation Model for Camera Sensor and Target 33
3.2.1 Perspective Projection Model .................. 33

3.2.2 World Space Model ........................ 34

3.3 Incorporating Non-holonomic Sensor Constraints ............ 37
3.4 Hausdorff Tracking ............................ 39
3.4.1 Image-Based Hausdorff Tracking ................. 41

3.4.2 Cooperative Hausdorff Tracking ................. 46

vi

C."

C."

C."

C."
.r.

K."
L'

kw!
C?

C."

C."
I]!

w

 

3.5 Mutational Hybrid Automata Model for Pervasive Surveillance Networks 50

3.5.1 Mutational Hybrid Automata Model .............. 50
3.5.2 Controller Design for Individual Cameras ............ 54
3.6 Chapter Summary ............................ 55
Analysis of Surveillance Networks 57
4.1 Lyapunov Theorem for Shape Space ......... . .......... 57
4.2 Stability Analysis of Hausdorff Tracking ................ 60
4.3 Analysis of Example Surveillance System using Mutational Hybrid Au-
tomata Model ............................... 61
4.3.1 Two-Camera Example for Pervasive Surveillance with Contin-
uous Servoing Cameras ...................... 62
4.3.2 Stability of the Two-Camera Example System ......... 70
4.3.3 Two-Camera Example for Pervasive Surveillance with Controlled
Switches .............................. 74
4.3.4 Stability Analysis of the Two Camera Example with Controlled
Switches .............................. 79
4.4 Chapter Summary ............................ 83
Design of Surveillance Networks 84
5.1 Wide Area Camera Calibration ..................... 85
5.1.1 Unitless Calibration ........................ 85
5.1.2 Method for Assigning Units ................... 88
5.2 Grouping Architectures .......................... 88
5.3 Sensor Allocation for Multi-Sensor Cooperation ............ 91
5.4 Optimal Hausdorff Tracking ....................... 93
5.4.1 Framework for Optimal Hausdorff Tracking Design ...... 94
5.5 Switched Video Feedback ......................... 106
5.5.1 Characteristics and Implementation Details for MJPEG . . . . 110
5.5.2 Characteristics and Implementation Details for H.261 ..... 110
5.5.3 Characteristics and Implementation Details for H.263 ..... 111
5.5.4 Characteristics of H.264 / MPEG4~Part10 ........... 112
5.6 Sensor Node Architecture ........................ 113
5.7 Modeling Switched Video Feedback as a Dynamic System ....... 114
5.7.1 Examples of Video Feedback Algorithms - Best Resolution . . 116
5.7.2 Example Video Feedback Algorithm - Persistent Camera . . . 117
5.7.3 Assessment Metric for Video Switching Based on Camera Lo-
cations ............................... 117
5.8 Optimization for Target Tracking using Switched Sensors ....... 119
5.8.1 Optimal Control and Dynamic Programming .......... 119
5.8.2 Dynamic Programming Algorithm Implementation ...... 121

vii

5.9 Chapter Summary ............................ 121

6 Experimental Implementation and Testing 124

6.1 Experimental Test—bed Setup ...................... 124

6.1.1 Hardware Setup .......................... 126

6.2 Software Setup .............................. 127

6.3 Experimental Results for Wide Area Camera Calibration ....... 128

6.4 Experimental Implementation for Image-Based Hausdorff Tracking . . 131

6.4.1 Tracking a Single Target ..................... 131

6.4.2 Tracking Multiple Targets Simultaneously ........... 133

6.5 Experimental Implementation for Cooperative Hausdorff Tracking . . 136
6.6 Experimental Implementation for Redundancy Resolution using Opti-

mal Hausdorff Tracking .......................... 141

6.7 Experimental Results for Multisensor Cooperation ........... 146

6.7.1 Experimental Evaluation for the Sensor Allocation Algorithm 146
6.7.2 Experimental Evaluation of Multiple Target Tracking using Mul-

tiple Formation-Constrained Sensors .............. 148

6.7.3 Experimental Evaluation for Multi-Sensor Cooperation using
Mutational Hybrid Automata Model .............. 155
6.8 Performance Analysis of the Video Subsystem ............. 162
6.8.1 Performance Analysis for Capture and Processing Time . . . . 164
6.8.2 Comparison for Video Size and Image Quality ......... 165
6.8.3 Performance Analysis for Video Bitrate and Frame Rate . . . 165

6.8.4 Performance Analysis for Server Switching and Camera Handoff 166
6.8.5 Performance Metric Evaluation for Various Surveillance Net-

work Tasks ............................ 168

6.9 Chapter Summary ............................ 169

7 Conclusions 170
7.1 Conclusions ................................ 170
7.2 Future Research Directions ........................ 176
Bibliography 181

viii

1.1
3.1
3.2
3.3
3.4
3.5
3.6
4.1
4.2
4.3
5.1
5.2
5.3
5.4
5.5
5.6
5.7
5.8

5.9

LIST OF FIGURES

Pervasive Surveillance Scenario ..................... 2
3 Camera Pervasive Surveillance Scenario ................ 29
Modeling Pervasive Surveillance Networks ............... 30
Target and coverage set for image-based Hausdorff tracking. ..... 42
Block diagram for image-based Hausdorff tracking. .......... 46
Target and coverage set 2D for cooperative Hausdorff tracking. . . . . 48
Target and coverage set for 3D cooperative Hausdorff tracking. . . . . 48
Example pervasive surveillance scenario with 2 cameras ........ 62
Graphical MHA model for the two-camera scenario .......... 65
Graphical MHA model for the two-camera scenario with controlled

switches .................................. 76
The architecture of the optimal task distribution scheme ........ 95
The structure of the task analyzer module ................ 97
The structure of the optimization scheduling module. ......... 99
Class functions for physical programming. ............... 102
Modiﬁcation of the preference mapping by conﬁdence factor. ..... 104
Switched video architecture implementation .............. 107
H.624 Based Switched Video Streaming Client .............. 109
Architecture of Sensor Node ........................ 113
Generating Graph for the Camera Switching Strategy ......... 122

ix

6.1

6.2

6.3

6.4

6.5

6.6

6.7

6.8

6.9

6.10

6.11

6.12

6.13

6.14

6.15

6.16

6.17

6.18

6.19

Surveillance network implementation over an IP network ........

Surveillance test-bed designed along with the cameras and the motion
modules for camera motion. .......................

Sony EVI—D30 mounted on Robot Nomad XR-4000 ..........
Active Cameras: Sony EVI-D30 and dual head Hitachi KP-D50

Camera Setup: Location and pointing direction of the camera setup. .
Image-based Hausdorff tracking using active camera ..........

Image-based Hausdorff tracking using active camera: Maintaining ad-
equate target resolution .........................

Tracking multiple targets using Image-based Hausdorff tracking

Image-based Hausdorff tracking for multiple targets: Estimated posi-
tion of targets ...............................

Image sequence for tracking multiple targets using Image-based Haus-
dorff tracking ................................

Position of robot and target set on x and y axes .............

Motion of target set and robot on x-z plane. The view cone of the
robot is shown at a particular instance in time with the position of the
target centroid shown as well. ......................

Shape functions for cooperative and image-based Hausdorff tracking .

Robots View of target at several points in time to show several condi-
tions that the tracking system responds to ................

Target tracking using a Sony EVI-D30 camera mounted on Robot No-
mad XR—4000 mobile robot ........................

Results of the tracking task with optimal task distribution using phys-
ical programming ..............................

The energy consumption for no optimization, and weighted sum. . .
The pan angle for no optimization, and weighted sum. ........

Estimated position of targets for multisensor cooperation using sensor
allocation algorithm ............................

125

126

127

130

132

135

136

137

138

139

140

141

143

144

147

6.20

6.21

6.22

6.23

6.24

6.25

6.26

6.27

6.28

6.29

6.30

Experiment setup for multi-target tracking using multiple cameras. . 150

Target tracks for multi-target tracking using multiple cameras ..... 151
Experiment setup for multi-target tracking using multiple cameras. . 152
Experiment setup for multi—target tracking using multiple cameras. . 153

Image Sequences taken while tracking the two targets with two formation-
constrained cameras ........................... 154

X-Y Plot for target motion ........................ 155

Current MHA mode, shape function, velocity, pan angle and target
visibility for Caml ............................ 157

Current MHA mode, shape function, velocity, pan angle and target
visibility for Camg ............................ 158

Image Sequences taken while tracking the target over different modes 159

Current MHA mode, shape function, velocity, pan angle and target
visibility for Caml ............................ 160

Current MHA mode, shape function, velocity, pan angle and target
visibility for Camg ............................ 161

xi

6.1

6.2

6.3

6.4

6.5

LIST OF TABLES

Scenarios and gain values ......................... 164
Capture and Compression Time of the Two Schemes ......... 164
Frame Rate and Bitrate for MJPEG, H.261 and H.263 ........ 165
Server Switching Time for Monitoring and Tracking Tasks ...... 167
Metric for Comparison of Various Schemes ............... 168

xii

 

 

Ir.

1.]

A $01
("mm
Df‘l Wt
POHsj
3mm.
(‘lffaxj
In in»
mm:
.\'.
”‘19) T,
SWIM;

Infra}.

 

I AIL“
Opﬁfay

and 5‘

Chapter 1

Introduction

1 . 1 Background

A sensor network is a collection of individual nodes with each node having sensing,
communication, computation and motion capabilities. Distributed wireless sensor
networks have recently emerged as an important research area. Sensor networks
consist of a variety of different types of sensors in distributed areas. Technological
advances in wireless networking and distributed robotics have been leading to in-
creasing research on distributed sensing applications using wireless sensor networks.
Infrastructure-less surveillance and monitoring are important. applications of such
rapidly deployable sensor networks.

Networked surveillance systems provide an extended perception and distributed
reasoning capability in monitored environments through the use of multiple networked
sensors. Various types of sensors with varying sensing modalities such as cameras,
infrared detector arrays, laser rangeﬁnders, omnidirectional acoustic sensors etc., can
be instantly deployed in hostile environments, inaccessible terrain and disaster relief
operations to obtain vital reconnaissance information about the area being surveyed.
Locomotion and active sensing on individual nodes can greatly increase the range

and sensing capability of the network. Such instantly deployable, wireless, mobile,

Unmanned Aerial Vehicles

 

Figure 1.1: Pervasive Surveillance Scenario

distributed sensor networks networks ﬁnd myriad civilian and military applications,
such as battle ﬁeld surveillance, environment monitoring, scene reconstruction, mo-
tion detection and tracking, remote sensing and global awareness.

There is a growing interest in surveillance applications due to the growing avail-
ability of sensors and processors at a reasonable cost. There is also a growing need
from the public for improved safety and security in large urban environments and
improved usage of resources of public infrastructure. This, in conjunction with the
increasing maturity of algorithms and techniques, is making possible the application
of this technology in various application sectors such as security, transportation, and
the automotive industry. In particular, the problem of remote surveillance of unat—
tended environments has received growing attention in recent years, especially in the

context of:

0 Safety in transport applications [1], [2], such as monitoring of railway stations

[4], [5], underground stations [3,4], airports [5, 6] and airplane routes [7—9],
motorways [10], [11], urban and city roads [12,13], maritime environments [14,

15];

0 Safety or quality control in industrial applications, such as monitoring of nuclear

plants [16] or industrial processing cycles [1,2];

0 Improved security for people’s lives, such as monitoring of indoor or outdoor
environments like banks [29], supermarkets [3], car parking areas, waiting rooms
[17], buildings [18,19], etc., remote monitoring of the status of a patient [20], re—
mote surveillance of human activity [21,22]; d) military applications for surveil-
lance of strategic infrastructure [23,24], enemy movements in the battleﬁeld

[25,26] and air surveillance [27,28].

Surveillance networks are composed of numerous independent nodes which can
be controlled individually. Hence they have high fault tolerance to node failure.
However, this redundancy and fault tolerance comes at a price. One needs to design
cooperation and coordination schemes in order for the individual nodes to perform
tasks together. Phrther, these cooperation and coordination mechanisms should not
consume too much power and other resources, as they would render the advantages
due to redundancy useless.

Pervasive surveillance can be deﬁned as the active monitoring of recognized tar-
gets as they move through a large monitored area using a network of sensors. An
important characteristic of pervasive surveillance systems is their capability to track
a target over a large area using multiple sensors. Figure 1.1 depicts a scenario for
conducting reconnaissance on a city square. Each individual sensor can view only
a part of the monitored area, but collectively the network can monitor the entire
region. There are a number of reasons that optimal coverage may not be available for

all regions of a large area being monitored, such as: a lack of prior knowledge of the

 

 

IIE‘I'I
IdIir

Spa;

CISP I
mh‘lt
and I
Th‘fr;
<3th- 0

Strain;

environment, a paucity of locations for placement of sensors in the environment, or
a shortage of sensors to be deployed. Motion capability of the sensors and the use of
active sensing can greatly enhance the coverage quality of the network. For example,
monitoring of areas with sparse or no coverage or acquiring higher resolution coverage
of the targets can be accomplished by changing the active ﬁeld of view of the ﬁxed
sensors or deploying mobile sensors. Further, the area monitored using a limited
number of sensors can be increased by using active (mobile) sensors such as UAV’s,
robot-mounted cameras or cameras mounted on pan-tilt heads etc. The mobility of
these sensors can be leveraged to increase the area monitored by each sensor, which
in turn increases the total area monitored. However, the mobility combined with
the scalability requirements renders modeling and analysis of pervasive surveillance
systems using active sensors a challenging task and brings to light many research
issues.

Compared with traditional surveillance networks, an IP-network-based surveil-
lance system is easy to deploy, maintain, and expand. The ubiquitous nature of IP
networks will dramatically reduce the cost to set up a traditionally expensive surveil—
lance network. We envision that the possible usage of surveillance networks could
span the venues of industry, military and daily lives.

The surveillance network must provide, to the human operator, a timely and con-
cise view of the relevant activities within the environment being monitored. Providing
multiple video feedback streams often causes loss of attention span of the operator
and makes it hard to keep track of the various activities over the various cameras.
Therefore only video streams from relevant sensors should be presented to the oper-
ator on a per activity basis. This involves automatically switching the active video

stream presented to the operator.

 

 

rm

1.2 Challenges and Problems in Surveillance Networks

The tasks to be performed by a surveillance system can be expressed as the following

requirements:
1. Deploy the sensors in order to monitor the region under surveillance;
2. Identify multiple moving targets based on predeﬁned models;
3. Automatically track the identiﬁed targets over the region being monitored;

4. Coordinate tasking of multiple sensors to maintain target visibility and execute

other requested tasks;

5. Provide concise feedback and video data of a tracked target to multiple opera-

tors.

The sensors should be deployed in the monitored region in order to maximize
the likelihood of identifying and tracking the targets. In adversarial scenarios, care
should be taken in order to conceal the location of the sensors in order to ensure their
safety and the integrity of the task. A major challenge in the deployment phase is
sensor placement in order to maximize the covered area while also ensuring continuous
tracking of a target when it traverses the coverage regions of multiple sensors.

An important aspect of sensor deployment is calibration. Calibration of the sen-
sors enables multiple sensors to coordinate with each other and share information
amongst themselves. This cooperation amongst sensors leads to better sensing and is
in general a desirable property. Calibration of the sensors can be viewed in many do-
mains (feature spaces). The most literal one is calibrating the location of the deployed
sensors with respect to each other and a global frame. Other types of calibration can
involve calibrating the color spaces of various cameras, which will facilitate the sharing

of target models amongst the sensors.

 

on:
ist

01’ {II

at it
[rat-l
Hum
and f.
agent
Very .
nlOllC

T1
the r

0f th.

 

{0 ii.”

   
  

felt.

malt-j

In order to perform automated surveillance there are two major subtasks: target
detection and target tracking. A target perception and video understanding module
is responsible for detecting and classifying the various targets in the active ﬁeld of
view (F CV) of the sensor and performing temporal consolidation of the detected
targets over multiple frames of detection. Moving target detection and classiﬁcation
is known to be a difﬁcult research problem and has been the focus of many recent
research efforts [29]. The next challenge would be to classify and associate the various
detected image blobs to discernable targets and maintain their temporal tracks in
order to pervasively track them. Once the targets have been identiﬁed, the challenge
is to recognize the relationships of the target to the environment and other targets in
order to attribute aims and motives to the tracked targets.

The target tracking problem involves continuously keeping the target in view of
at least one sensor. When the target is escaping the viewing area of the current
tracking sensor, the current sensor needs to be moved (relocated) or the tracking task
must be transferred to another sensor which can currently view the target. Modeling
and design of such cooperative tracking using multiple sensors is a challenge. Active
agents further make the modeling and design of such pervasive surveillance systems
very difficult. A modeling and design framework for coordinating and controlling the
motion of the active cameras to pervasively track targets is necessary.

The various methods through which the the human operator communicates with
the surveillance network form an integral part of the usability, usefulness and efficacy
of the surveillance network. These methods include passing commands and queries
to the network indicating the intention of the surveillance task and receiving realtime
feedback and results of the queries in progress and alarms for violations of various
predeﬁned conditions.

Video feedback is provided to the operator in order to apprise the operator of

the various tracking and monitoring tasks being currently executed by the sensors.

 

the
lIliU
and
surv
dciaj
rarui

3r. -

1.3

Nf‘l It.

muni .‘

 

 

However, due to the large number of sensors involved, the operator can be easily
overwhelmed by the amount of video information. Hence, only relevant video feeds
should be provided to the operator and, based on the task, the video feed should
be switched from sensor to sensor. There are various choices for the switched video
feedback implementation and they have their advantages and disadvantages based on
the scenario they are being applied to. A performance evaluation metric is needed for
evaluating the efﬁcacy of a particular scheme when being applied to certain scenarios.

Direct human intervention and control of the various cameras can be used for
the logging and tracking of targets which are not predetermined and programmed
into the network. Hence the human operator needs to move the active cameras
and receive real-time video feedback to manually track targets. However for such
surveillance systems, direct manual tele—control of the active active camera over the
delayed IP network suffers from stability and synchronization problems due to the
random, unbounded and asymmetric communication time delay in the network [30,

31]. This can result in reduced performance and even loss of tracking.

1.3 Literature Review

Networked surveillance systems have received much attention from the research com-
munity due to their many pervasive applications [32], [33]. Previous work from liter-

ature is reviewed here.

Deployment and Calibration: Based on their mobility and rapid-deployment
capability, the operation of MSN’s can be divided into two separate phases, namely
deployment and surveillance. Infrastructure-less rapid deployment is an important
characteristic of mobile surveillance networks and many research efforts have been
dedicated to optimally deploy sensor nodes for increasing their total sensing area

[34—36]. A global kinematic representation of a network of connected sensors ’R, =

 

Tat
1!]th
and

Tilt;

deter
appri
tion

ran:
next

discer
Ihem.
and B

larger

Targp

1W‘pl I. I

5])”. l f; .

.gr‘ ' I
nle. .I‘)

 

inl‘gh.‘ ]

I
the Far

5]] ”In r

b] Urfit

 

{R1,R2, ...,Rn} using only localization information between one-hop neighbors is
suggested in [34]. This relationship between the various nodes is key to sharing

meaningful information between them.

Target Identiﬁcation and Video Understanding: Beyond the initial deploy-
ment phase, in the surveillance phase there are two major subtasks: target detection
and target tracking. Figure 5.8 depicts the general architecture of a sensor node.
The target perception module is responsible for detecting and classifying the various
targets in the active ﬁeld of view (FOV) of the sensor and performing temporal con-
solidation of the detected targets over multiple frames of detection. Moving target
detection and classiﬁcation is known to be a difﬁcult research problem [29]. Many
approaches such as active background subtraction [37] [38] and temporal differentia-
tion have been suggested for detecting and classifying various types of moving targets
ranging from single humans and human groups to vehicles and wildlife [39] [38]. The
next problem would be to classify and associate the various detected image blobs to
discernable targets and maintain their temporal tracks in order to pervasively track
them. Various approaches such as extended Kalman ﬁltering, pheromone routing
and Bayesian belief nets have been suggested for maintaining the track of the various

targets [40].

Target Tracking with Active Camera: A general surveillance task involves
keeping moving targets in the active sensing region of the sensors with a certain pre-
speciﬁed resolution. Research approaches to this problem found in recent literature
[39] [37], generally use visual servo control [41] or gaze control [42], which mainly
involve feature-point-based tracking and fail to describe the basic task of maintaining
the target in the sensor’s active ﬁeld of view with certain resolution, effectively and
succinctly. These approaches result in excessive camera motion that may result in

blurring and other undesired effects, which in turn have detrimental effects on the

 

Slll

fill;

fron

its-'1:
in a ,
rt>r]11j
In or.
0f [lit
to an.
the 01
Opera

both (

 

 

surveillance task. Further, using these approaches it is not possible to describe a

multiple target tracking task with randomly moving targets using a single sensor.

Sensor Cooperation and Coordination: Multiple groups of nodes are sequen-
tially involved in order to pervasively track the target as it traverses through the
monitored area which is much larger than the range of an individual sensor. Hence
there needs to be a handoff mechanism which essentially “switches” the tracking task
from one group of sensors to another based on the movement of the target. Tracking
itself can be described as a switching phenomenon since the act of tracking a target
in a given monitoring area which is greater than the viewing area of any given sensor
requires there to be some form of hand-off, or switch, between a number of sensors.
In order to assure that the target visibility will be maintained even when it goes out
of the capable ﬁeld of view of one sensor, the system must switch the tracking task
to another sensor that has to “pick up” the target before it leaves the ﬁeld of view of
the original sensor. This discrete switching phenomenon along with the continuous
operation of the individual nodes can be modeled as a hybrid system consisting of
both continuous and discrete transitions.

Hybrid automata models for describing switching systems have been the focus
of many recent research efforts [43], [44]. They can effectively capture the discrete
switching nature of the surveillance task. However, their continuous part is modeled
in a point-based framework using differential equations. This model fails to effectively
model the continuous operation of the individual nodes in the surveillance task and

suffers from the drawbacks mentioned previously.

Network topology The efficient and timely exchange of information between the
individual sensing nodes is imperative for cooperation and coordination of multi-
ple nodes in order to perform pervasive target tracking. Ad-hoc, multi-hop mecha-

nisms are best suited for such mobile infrastructure-less wireless networked systems.

 

IN"

to '

sur:
by i

ore:

T ,w
(”nyl,

in
‘ I

[fwd r.

 

Networking and routing mechanisms for distributed wireless sensor networks have
attracted a lot of attention in recent literature [45] [46] [47]. These approaches gener-
ally assume scalar (low volume) data transfer between the individuals nodes and the
sink node (which is just a consumer of the information.) However, surveillance tasks
require fast and timely transfer of vector (large volume) data, e.g. video data, to the
sink and also within the nodes themselves. This scenario may create an unbalanced
networking load on the nodes along the optimal relay path, which can be detrimental

to the longevity of the network.

Operator Interface Design: Video feedback is an essential component of the
surveillance system. A single human operator cannot effectively monitor a large area
by looking at dozens of monitors showing raw video output. That amount of sensory
over-load virtually guarantees that information will be ignored and requires a pro-
hibitive amount of transmission bandwidth. In [39] an approach is presented which
provides an interactive, graphical user interface (GUI) showing a synthetic View of
the environment, upon which the system displays dynamic agents representing peo-
ple and vehicles. This approach has the beneﬁt that visualization of scene events
is no longer tied to the original resolution and viewpoint of a single video sensor
and the operator can therefore infer proper spatial relationships between multiple ob-
jects and scene features. Another program called the the Modular Semi-Automated
Forces (ModSAF) program provides a 2-D graphical interface similar to the VSAM
GUI [48], with the ability to insert computer-generated human and vehicle avatars
that provide simulated opponents for training [49]. The ModStealth program gener-
ates an immersive, realistic 3-D visualization of texture—mapped scene geometry and
computer-generated avatars [50].

Although automatic image analysis and video understanding tools [39] can be

used to facilitate identiﬁcation of targets and activation of alarms or logs for certain

10

 

 

 

I]

{)1

iii.
Iltfl

dcx

('lll’l
59115
Illl_l\'i
llllllt
1
(61111
(JYffr
to ti.

S‘Vlft

surveillance tasks, the operator needs the video feedback to make decisions about the
tracking task which may not have been pre—programmed or to independently task the
network based on the current feedback received from the sensors. The video feedback
provided to the operator can be transmitted over either analog or digital channels [51].
The use of analog modulating and transmission techniques for surveillance applica-
tions has been reported in [52]. Receiving digital video feedback over an IP-based
network from the networked camera sensors has also received much attention with the
development of the various video compression and communication protocols [53—57].

Since multiple cameras are deployed to track the identiﬁed targets, multiple, con-
current feedback video streams maybe required for monitoring the target. These
sensors initiating these streams will be changing from time to time as the target
moves out of range of the current sensors tracking it. However, providing multiple
unnecessary (unrelated to the task) video feedback streams often causes loss of at-
tention span of the operator and makes it hard to keep track of the various activities
over the cameras. Hence only video streams from relevant sensors should be presented
to the Operator on a per activity basis. This is done through automatic or manual
switching of the camera streams that are presented to the operator.

Literature Review Summary: From the above review on recent literature on
mobile surveillance networks, the focus of the major works reviewed revolves around
target identiﬁcation and video understanding. However, most of the approaches cited
in literature lack a generalized framework for modeling the surveillance task of main-
taining visibility of multiple targets with adequate resolution. These approaches tend
to use visual servoing which involves only feature point based target tracking and
cannot adequately describe multi—target tracking using a single camera sensor. Multi
target tracking is addressed in a very ad-hoc fashion in literature using methods such
as sensor slaving and gaze control. A generalized modeling and analysis framework

for pervasive target tracking using multiple distributed sensors is not available in

11

 

 

5835
0f ti;
Cfffjl’f
56911;}
n
mode]
Ih’ffast
tori“;
the CT,
”ad-ii:

Ufa}.

 

“fuel
I

Haiti}

”‘3 86::

F0]

 

current literature. Although switched surveillance implementations using multiple
distributed sensors have received much attention in recent literature, a comparative
analysis of video feedback schemes for various visual surveillance scenarios has not
been reported. Further, optimized sensor deployment based on minimizing the num-
ber of sensor switches in order to pervasively track a target using multiple sensors
over a large area has not been explored. This dissertation tries to address these issues
and develop a formal modeling, analysis and design framework for mobile surveillance

applications.

1.4 Objectives of this Research

The main objective of this research is to implement a pervasive surveillance network
using active camera sensors connected to each other using an IP-based network. The
sensors have motion capability and are required to continuously maintain visibility
of the identiﬁed targets in the surveillance region. The objectives of this research
effort are to provide a modeling and design framework for such pervasive surveillance
scenarios.

This dissertation proposes a mutational-analysis—based topological framework for
modeling and design of mobile surveillance networks which ﬁnd applications in myriad
infrastructure-less, rapidly deployable pervasive multi-target surveillance and moni-
toring scenarios. Using the concept of mutations of domains and shape analysis,
the conditions for accomplishing various surveillance tasks such as continuous target
tracking while maintaining appropriate image resolution can be derived. The design
of a surveillance task using image-based Hausdorff tracking, used when the target
is in the active ﬁeld of view of the sensor is presented. The method of cooperative
Hausdorff tracking, used to track targets which are outside the active ﬁeld of view of
the sensor using the observations from other sensors, is presented.

For pervasive surveillance scenario which requires a hand-off or switching mecha-

l2

nisrn between a number of sensors, this dissertation proposes the Mutational Hybrid
Automata (MHA) model, that combines the discrete switching structure of hybrid au-
tomata with the domain-based evolution of mutational systems in order to overcome
their individual limitations. It further presents example surveillance tasks modeled
using the proposed MHA model and provides an analysis and proof of stability of the
example systems.

A switched video feedback system is an important component of the pervasive
surveillance network. This dissertation compares different design alternatives for the
implementation of the switched video feedback system, such as MJ PEG [57] H.261 [53]
and H.263 [54] transported over RTP transport protocol [56]. Their performance
under certain scenarios/tasks are measured and the advantages and disadvantages
of different alternatives are analyzed. Various performance metrics are proposed to
compare the suitability of these schemes based on a given scenario/ task. Further an
optimal sensor selection algorithm for selecting minimum number of sensors to track

a moving target is presented.

1.5 Major Contributions of this Dissertation

The major contributions of this dissertation can be classiﬁed into various categories.
This section provides a brief statement about the major contributions of this disser-

tation.

Theoretical Contribution

o A new domain based modeling framework called Hausdorff tracking for tracking

multiple targets using continuous camera motion is developed.

0 A mutational hybrid automata model is developed for modeling multi-sensor
pervasive target tracking tasks. Using this model, autonomous as well as con-

trolled sensor switching scenarios can be modeled.

13

0 Stability analysis of the proposed Hausdorff tracking scheme using the shape

Lyapunov theorem has been presented.

0 Target tracking stability of the pervasive surveillance task modeled using the
mutational hybrid automata models is analyzed for example systems using a

two-step procedure.

Technological Contributions

0 A switched video feedback system that enables streaming of relevant video
streams to a human operator for multiple clients has been implemented us-
ing a H.264 video encoding scheme. The operator clients developed allow the

automated monitoring as well as active control of the networked sensors.

0 A task / scenario based performance metric is proposed to analyze the suitability
of application of a particular video transport scheme for that task. Various

schemes such as MJPEG, H.261, H.263 and H.264 are compared for different

scenarios using the proposed metric.

Design Contributions

0 An optimality framework for exploiting the redundancy in task accomplishment
using Hausdorff tracking is developed. Physical programming has been used to

provide a physical meaning to the combinations of the various cost functions.

a The video switching algorithm is modeled as a discrete-time ﬁnite-state dynamic
system and an assessment metric is proposed to analyze the locations and con-
ﬁgurations of the cameras for switched video feedback in order to minimize the
number of switches while maximizing the resolution sustained by the tracked

target at the tracking camera.

14

 

 

 

 

011

Hibt

 

 

Contributions for Experimental Work

0 A visual surveillance test-bed has been implemented for testing and validating

the efﬁcacy of the various approaches proposed.

0 A mobile robot and articulated robot arm have been integrated as a part of
the motion platform for the visual sensors. These can be used in the future for

further extension and experiments of this research.

1.6 Organization of this Dissertation

The scope of this dissertation is to investigate a novel modeling and design framework
for mobile surveillance networks. The work is divided into three parts - namely mod-
eling, analysis and design. First, in Chapter 2, we present the necessary background
on the mathematical tools which will be required to develop and analyze the formal
modeling and design framework proposed.

In Chapter 3, the new modeling framework using mutational equations for motion
of a single sensor is presented. Using this mutational equation model, a shape-based
Hausdorff tracking control scheme to track multiple targets with viewing constraints
is developed. Chapter 3 further presents the Mutational Hybrid Automata model
developed for modeling multi-sensor cooperative target tracking scenarios.

Chapter 4 presents the shape Lyapunov theorem for analyzing the stability (in the
sense of Lyapunov) of the Hausdorff tracking method proposed. It further presents
examples of multi camera surveillance systems modeled using the mutational hybrid
automata model. A procedure for analyzing the stability of the example multi-sensor
cooperative tracking systems modeled using the mutational hybrid automata model
is also presented.

Chapter 5 presents the work done toward design of the switched video surveil-

lance system. It presents a brief review on grouping architectures and presents a

15

sensor allocation algorithm which can be used to allocate sensors to targets. A design
framework for optimal tracking of multiple targets by exploiting the redundancy of
task accomplishment is presented. Further algorithms for sensor placement in or-
der to minimize sensor switching while tracking moving targets and algorithms for
optimized target tracking using switched sensors are also presented.

Experimental validations of the proposed algorithms are presented in Chapter 6.
The experimental results for Hausdorff tracking, multi-sensor cooperation using the
mutational hybrid automata model and multi—sensor target allocation are presented.
Further results of optimal Hausdorff tracking and multi-target tracking experiments
are also presented. Chapter 6 also provides a performance analysis of the various
video transmission schemes and proposes a task- and scenario— based metric for eval-
uating the various video compression and transmission schemes for switched video
surveillance tasks. Further, simulation results for optimized target tracking using
switched sensors is also presented.

Conclusions and future directions are presented in Chapter 7.

16

 

9qu
Fur
fl 10g

WIT ll

2.1

Let g.
and J
be Cir
SHIN.
(105,.“

lle‘d .

 

Chapter 2

Mathematical Background

This chapter presents the mathematical preliminaries that will be used in the rest of
the dissertation. A deﬁnition of a dynamic systems is presented. Then mutational
equations, which are used to represent domain based dynamic systems are introduced.
Further hybrid dynamic systems are introduced and the Mutational Hybrid Automata
model is presented. Shape analysis and shape functions are introduced to provide

various measures on sets (shapes).

2.1 Introduction

Let X be a collection of variables; then X denotes the valuations of these variables
and a: will be used to refer to both the variable and its valuation (the reference should
be clear from the context). 2X denotes the power space of X i.e., the family of all
subsets of X and [C(X) denotes the space of all nonempty compact subsets of a given
closed set W C R". We will use M to denote a general metric space and X will be
used to denote a ﬁnite-dimensional vector space.

A controlled dynamical system is a system 2 = [X , F ,U , qt] where,

o X is an arbitrary topological space called the state space of E;

17

 

ifl

di:

Dir;
C011

enti

 

 

o I‘ is the time set, which is a transition semigroup with identity;
0 L1 is a nonempty set called the control—value space of E;and

o The map d) : X x I‘ x U H X is a continuous function satisfying the identity

and semigroup properties [58].

The dynamical system can also be denoted by the system 2 = [X , F,L(, f] where
the transition function f is the generator of the extended transition function ¢ [58].
A discrete-time dynamic system is a dynamic system 2 for which I‘ = Z where,
Z is the set of integers. The discrete-time dynamic system, 2 is ﬁnite dimensional
if both X and L! are ﬁnite dimensional and the dimension of the system 2 is the
dimension of X. A system 2 is complete if every input is admissible for every state.

A hybrid dynamic system is an indexed collection of dynamic systems with some
map deﬁned for jumping among them. Various models for hybrid systems have been
proposed in the literature [43] [44]. Generally, these approaches assume that the
continuous dynamics part of the hybrid systems are deﬁned by the solutions of differ-
ential equations (51': = f (:I:(t))) where :r:(t) E X C R" and the function f : X H IR" is
called a vector ﬁeld on IR". The resulting dynamical system is given by ¢(:ro, t) = 23(t)
where :r:(t) is a solution of the differential equation starting at 3:0 at t = 0.

For the case of the pervasive surveillance problem, the variables to be tracked
are not elements of a vector space but geometrical domains themselves which are
elements of the space [C(W) of all non-empty compact subsets of a given closed set
W C R" with distance d. Hence we need to enhance the hybrid systems model(s) to
include the cases where the continuous state is not an element of a vector space but
a geometric domain K E [C (W) In this case we describe the continuous dynamical
system as the solution of a mutational equation K 3 <p(K(t)) [59] which is deﬁned
as a tube. Using this formalism we propose a Mutational Hybrid Automata (MHA)

model which can adequately describe a pervasive surveillance scenario.

18

 

as:

Del
(If I)

out

 

p020"

 

We also deﬁne the projection of a point a: which is an element of a vector space

V on a set A whose elements are also the elements of the vector space V as:

Deﬁnition 2.1.1 ( Projection of a point :r: on a set A: HA(:2:)) The projection HA(:r)
of a point r on set A is deﬁned as the set of points in A having the distance dA(:r)

from point :1: and is written as:
HAUL") = {2 E Alllz — Ill = dA($)} (2-1)
where, dA(:c) is the distance of set A from point :1: deﬁned as:

61.40:) = inf llx — till (22)
yEA

Extending this concept further, we now deﬁne the projection of a set A on set B

Deﬁnition 2.1.2 {Projection of a set B on a set A: HA(B)) The proyection HA(B)
of a set B on set A is deﬁned as the union of the projections of all the points in B

on the set A as:

HA(B) = U Mar) (2.3)
3368

where, Pi A(.r) is the projection of a point :1: on a set A which is the set of all the

points in A having the distance of dA(:r) from point :13.

2.2 Mutational Equations

This section provides a basic introduction to shape analysis using mutational equa-
tions [60] and relevant concepts useful for developing the theoretical foundation for

Hausdorff tracking. The sensor coverage area and the target are readily represented

19

 

tr}.
ma:

50h

 

as sets (domains) in E C R". In order to study the motion of these domains we need
to deﬁne a differential calculus on the space of all compact, non-empty subsets [C(E)
of the closed set E. The mathematical framework of mutational analysis allows us to
extend the concept of differential equations to the metric space [C(E).

For deﬁning mutational equations, we supply the space [C(R”) with a distance
d, for example the Hausdorff distance between domains K1,K2 E R" deﬁned by

due, K2) = supqemn IldK1(q) - dawn. where. em) = 1:12.6an — pll represents

the distance between the point q and domain X.

2.2.1 Evolution of a Tube

Tubes are domains evolving with time and can be deﬁned as a map K () : R+ i—r
[C(E). The deformation (motion) of the coverage and the target sets can be repre-

sented using tubes:

R+ i—-> [C(E)
K(): (2.4)

ti-—> K(t).

The evolution of a tube can be described using the notion of a time derivative of the
tube as the perturbation of a set. Associate with any Lipschitz map (,0 : E i—-> E, a
map called the transition 19¢(h, q) :2 q(h), which denotes the value at time h of the

solution of the differential equation:

(1 = 99((1)

(1(0) = <10-

20

Extend this concept of a transition to the space [C(E) by introducing the reachable

set from set K at time h of (,9 as

19,901.10 == {199201. qo)}qoeK (2-6)

The curve h +——> 19¢(h, K) plays the role of the half lines h i—> :1: + lit) for deﬁning
differential quotients in vector spaces. Using the concept of the reachable set the

time derivative of a tube can be deﬁned as a mutation:

Deﬁnition 2.2.1 (Mutation) Let E C R" and cp : E i—r E be a Lipschitz map
{9.0 E Lip(E,lR”)). Iffort E R+, the tube K : R+ H [C(E) satisﬁes:
d(1((t+ h). 199001. K0»)

1' = 2.
’13ng h 0. ( 7)

 

then, go is a mutation of K at time t and is denoted as:
K 3 «p (2.8)

It should be noted that cp is not a unique representation of the mutational equation,
which justiﬁes the use of the notation (3) [60].
Consider a function (,0 : R x [C (E) i—> Lip(E, IR") as a map associating a Lipschitz

map (,0 to a pair (t, K). Using this map, we can deﬁne the mutation of tube K (t) as:

K(t) a cp(t,K(t)), VtZO (2.9)

2.2.2 Controlled Mutational Equations

A controlled mutational equation can be written as:

K(t) a e(t,K(t),u(t)), VtZO (2.10)

21

u(t) e U. (2.11)

where, (0 : R+ x [C(E) x U H Lip(E, IR") is a continuous map associating a Lipschitz
map with (t, K, u) and u(t) E U is the control input.
A feedback law can be deﬁned as a map Ll : [C(E) H U associating a control u

with a domain K (t) as:

u(t) = U(K(t)) (2.12)

Using a controlled mutational equation, we can model the motion of the target and

coverage sets due to the motion input u to the camera/ robot.

2.3 Hybrid Dynamic Systems

Hybrid systems generally refer to dynamical systems which consist of a mixture of
discrete and continuous components. Such systems describe the time evolution of
systems with both continuous and discrete parts. Hybrid dynamical systems consid-
ered in the literature generally have continuous dynamics modeled using differential
equations which depend on some discrete phenomena like switching and jumping in
the continuous and discrete state of the system respectively. These hybrid phenomena
can be modeled using hybrid automata proposed by Lygeros and Sastry [44] [61]. A
hybrid automaton describes the interaction and time evolution of both the discrete
and continuous parts of a hybrid dynamical system. The hybrid automata frame-
work models the continuous part of a hybrid dynamical system as using differential
equations. Edge maps and reset maps are used to capture the discrete nature of the
system while guard conditions and domains are used to describe the interaction of

the discrete and continuous components.

22

A hybrid automaton (HA) is a collection

H = (Q, X, s, Init, D, E, G, R] (2.13)

with the constituent parts as follows.

Q is the set of index states or discrete states.
X is the set of continuous variables.

2 = {2,1}qu is a collection of controlled dynamical systems, where each Eq =
[Xq, Fq, fq,uq] is a controlled dynamical system. Xq are the valuations of the
continuous state variables X q, fq are the continuous dynamics and Hg is the set

of continuous controls.
I nit C Q X X is a set of initial states.

D : Q H 2X is a domain. Each discrete state has a domain of possible con-
tinuous states associated with it. The domain is the set of valuations of the

continuous states where the particular associated discrete state is valid.
E C Q x Q is a set of possible edge transitions between the discrete states.

G : E H 2X is a guard condition. Each edge in E is mapped to a manifold in

X which triggers a mode switch when the continuous state hits this manifold.

R : E x X H 2X is a reset map which causes the continuous state to be reset

due to the mode switch in the discrete state.

The state of the above hybrid dynamical system can be denoted as S = UqEQXq x

Q. The above model can be used to model the discrete phenomena mentioned earlier.

For more details about this model refer to [44].

23

2.4 Mutational Hybrid Dynamic Systems

Mutational equations can be incorporated into the deﬁnition of general dynamical sys-
tems as they satisfy the various nontrivialitry, semigroup and identity properties [58].
Using the concept of transitions of domains introduced earlier and mutational equa-

tions, we can write a general dynamic system as:

E =[IC(W),I‘,2999] or (2.14)

s = [K(W), r, ,0) (2.15)

where i9.p is the transition function and (,0 is the generator of that transition function.
Using the above deﬁnition of a dynamic system, we can now deﬁne a Mutational

Hybrid Automaton (MHA) as:
Hm = [Q, M,E,Init,D, E,G, R] (2.16)

with the constituent parts as follows.
0 Q is the set of index states or discrete states.

0 M is a collection of the continuous variables K,- 6 IC (W)

E = {2%}qu is a collection of controlled dynamic systems, where each Eq 2
[K(W)q, I‘q, (,0q,qu] is a controlled dynamic system. [C(W)q are the continuous
state spaces, (0., can be construed as the continuous dynamics and Hg is the set

of continuous controls.

Init E M x Q is a set of initial states.

D : Q H 2M is a domain which associates a domain of operation with every

qEQ.

24

 

n)-

o E C_: Q x Q is a set of edges along which discrete transition takes place.
0 G : E H 2M is a collection of guard conditions which enable discrete transitions.

0 R : E x M H 2'M is a reset map which resets the continuous state when a

discrete transition is enabled.

Thus, S = quQ M x {q} is the hybrid state space of Hm.

A MHA can be pictured as an automaton as in Figure 4.2. There, each node is a
constituent dynamical system with the index state as the name of the node.

This deﬁnition is a fairly broad and powerful modeling framework which subsumes
continuous differential equations, mutational equations, discrete time difference equa-
tions, ﬁnite automata as well as switched controlled systems and non-smooth differ-
ential/ mutational equations. A more intuitive explanation of the various components
will be provided via an example application to pervasive surveillance networks in the

following section.

2.5 Shape Analysis

Shape analysis deals with problems where the variables are not vectors of parameters
or functions, but the shapes of geometric domains or sets K contained in a subset
E C R”. The measure of deformation of a set K can be expressed using shape
functions as: J : [C(E) H IR. The directional derivative of the shape function rep-

resents the change in the shape function J (K) due to the deformation of the shape K.

2.5.] Shape Functions

Shape functions are set-deﬁned maps deﬁned from K(E) H IR and provide a “mea-

sure” of the deformation of K E [C(E). For example we can use a shape function

25

 

Ti,

to check the distance of the set K to another set K, or whether the reference set
K is contained within the current set K. Shape acceptability and optimality can be
studied using shape functions J as optimization cost functions as is shown in [62].
Various shape functions can be readily developed which can adequately describe the

surveillance task of set coverage or a measure of the size of a set, etc.

2.5.2 Directional Derivative 0f Shape Functions

The directional derivative of the shape function [62] represents the change in the
shape function J (K) due to the deformation of the shape K. It can be construed
as the analog of the directional derivative in vector spaces and provides us with a
measure of the change in the task criterion due to motion of the coverage or target
set.

Consider a function J : H H R where H is a real Hilbert space such as R". The

Gateaux (directional) derivative of J at the point q in the direction of v is deﬁned as:

J(q+tv)—J(Q)

 

 

 

D ' . = l' 2.1
J ((1)0) gr), t ( 7)
This can also be looked at in a different way. Suppose q(t) is deﬁned as:

40) = 11. (1(0) = q (2-18)
This means that ifj(t) = J(q(t)) then,

dj(t) _ -. _ . J((me - J(CI(0))
dt " J(t) — ill—13(1) t (2'19)
= lint) “‘1 + “’2 — “(1) = DJ(q)(v) (2.20)

Thus, the Gateaux (directional) derivative involves the point q and the direction v at

t=0.

26

for);

 

can

2.6

Tlll.‘
tati»

113-b:

 

nan]
llnu,
1151? j
A 11:.
Slime.

8‘30“ :

 

 

Extending this same concept when q is not an element of a Hilbert space, but a
shape (domain) K and v is replaced by (0, which is the direction of mutation of the
tube K. Thus the Eulerian semi—derivative (Gateaux directional derivative) is deﬁned
as:

MM z 1.... J(v.(t.1<))— J(K)

2.21
t—>0 t ( )

 

where, 1999(t, K) is the t reachable tube of K under the mutation (,0.
From [60] and [63], the directional derivative of the shape function having the

form of

J: [K new. (222)

can be written as:

J°(K)(<p) = / div(f<q><p(q>) dq (2.23)

K

2.6 Chapter Summary

This chapter presents a discussion on various dynamic systems and introduces mu-
tational equations as a tool to model the dynamics of shapes. It also presented the
hybrid automata model used to model a combination of discrete and continuous dy-
namic systems. Hybrid automata models use differential equations to model the con-
tinuous system dynamics. This chapter further extends the hybrid automata model to
use mutational equations to model the continuous dynamics. The Mutational Hybrid
Automata (MHA) model proposed allows us to model large scale networked pervasive
surveillance systems. This chapter also introduces shape analysis used to measure the

acceptability of shapes using shape functions.

27

 

 

Thi.
lam
irai:
Hiﬁr
Illi‘t
Inuit
(hill

of II:

3.1

 

In a
Willi
0f a .
One ,-

JUN};

 

Chapter 3

Modeling Mobile Surveillance

Networks

This chapter presents the modeling of a tracking task using a multiple camera surveil-
lance network. which is a mutational analysis based technique christened “Hausdorff
tracking” (in honor of the German mathematician Felix Hausdorff who made sig-
niﬁcant contributions to set theory, descriptive set theory, measure theory, function
theory, and functional analysis), is introduced in order to track multiple targets si-
multaneously using a single camera sensor. Further the Mutational Hybrid Automata
(MHA) model is introduced for tracking a target as it moves across the viewing region

of multiple cameras.

3.1 Introduction - Example Surveillance Scenario

In a multiple camera surveillance scenario with active (moving) cameras, the target,
which moves in a continuous fashion, can be tracked using either continuous motion
of a camera sensor or by discontinuous jumps of the “current” tracking sensor from
one camera to the next or a combination of continuous motion and a discontinuous

jump. Consider the example surveillance scenario form Figure 3.1(a) which depicts

28

  
 

-- Coverage set

.- Target set (disjoint)

Optical Axis

 

 

 

 

 

 

 

 

 

(a) Three—Camera Pervasive Surveillance Scenario

Figure 3.1: 3 Camera Pervasive Surveillance Scenario

the image planes of three camera sensors and a couple targets.

The targets can be tracked using the continuous motion of Camera 1 as is shown
in Figure 3.2(a). We see that Camera 1 is moved in a continuous fashion to ensure
that the targets are within the active ﬁeld of view of the sensor. In Figure 3.2(b),
the tracking task is transferred to Camera 2 which can already view the targets in
its active ﬁeld of view. Finally, as shown in Figure 3.2(c), the tracking task can
be transferred to camera 3 which will ﬁrst need to acquire the targets by executing
continuous motion and then a discrete switch will be executed to transfer the tracking
task to camera 3.

This chapter ﬁrst presents the modeling of continuous motion of the camera sen-

sor and target (continuous dynamics) using mutational equations. These dynamics

29

 

(a) Coverage Using Continuous Motion

 

(b) Coverage Using Only Discrete Motion

 

(c) Coverage Using Hybrid Motion

Figure 3.2: Modeling Pervasive Surveillance Networks

30

 

Ff.
1

models can be used for developing a servo mechanism to control the continuous cam-
era motion. The continuous input to the camera for motion can be controlled using
an error feedback from either the image space analysis or the world space analysis.
Image space analysis implies that the error feedback for moving the camera is derived
from analysis of the target location in the image plane of the camera, while world
space analysis implies that the error input for moving the camera is derived from the
world space coordinates of the target (acquired from other sensors that can observe
the target) and the current world space location of the camera coverage set. Hence,
we develop models for image space servoing and world space servoing separately.

The mutational equation model for the target motion in the image space is derived
using the perspective projection model and the image jacobian [64]. The motion of
the camera coverage set in the world space model is derived separately for 2-D and
3—D scenarios. In the 2-D scenario, the camera coverage set is taken to be a pie cut
out from a circle, while in the 3—D scenario, the camera coverage is modeled as a
3-D pie (cone bounded by a sphere) cut out from a sphere. The motion of these
coverage sets due to the motion of the camera is modeled as a controlled mutational
equation. Rirther, we relax the implicit assumption of holonomic motion constraints
and extend the continuous motion modeling using mutational equations for a sensor
with non-holonomic constraints.

The Hausdorff tracking method for image space and world space models is intro-
duced. This method uses shape functions to derive feedback errors which are used to
control the motion input to the camera. Image space Hausdorff tracking and coop-
erative Hausdorff trackng are presented as two examples of the Hausdorff tracking
method in the image space coordinates and world space coordinates, respectively.

This chapter also introduces the Mutational Hybrid Automata (MHA) model for
modeling the hybrid camera motion scenario shown in Figure 3.2(c). Using mutational

equations, the mutational hybrid automata model can describe the continuous motion

31

of the cameras and target and the discrete switching behaviors when a target leaves
or enters the ﬁeld of view of a camera.

Using Hausdorff tracking method developed in this section, there are numerous
choices of the input applied to the camera which can accomplish the task. This
is because the robot system has a certain amount of redundancy built in for the
surveillance task. An example of this redundancy in task execution can be evinced
by noting that in order to bring a target, which is currently located at one of the sides
of the camera F OV, into the cameras F OV, the camera can execute an angular motion
by panning or a lateral motion by moving behind. Both these motions individually
or a combination of them can be used to accomplish the surveillance task. Hence we
need to select an appropriate input to the camera for task accomplishment.

We can select various other tasks that the camera can perform in order to utilize
the redundancy offered by the camera motion. For example the energy consumed
due to the motion of the sensor should also be minimized while maintaining a safe
distance from sensed obstacles (in the case of sensors mounted on mobile robots).
These combined tasks can be expressed as a multi-objective optimization problem
which tries to minimize the resolution inadequacies and energy consumption subject
to the constraint of maintaining the visibility of the target.

Various other tasks can also be considered for the multi-objective optimization
problem and the complete problem can be represented as an optimal control frame—
work for Hausdorff tracking. Once the task criteria are expressed in a multi-objective
constrained optimization framework, various methods such weighted sum method
,physical programming, non-linear programming, search space methods, etc. can be

used to ﬁnd the solution to the Optimization problem.

32

 

Tl
llsj
ilt )1

hip
I111;

101

Spam:

 

3.2 Controlled Mutational Equation Model for Camera Sen-

sor and Target

A controlled mutational equation can be written as:

K(t) a (,0(t,K(t),u(t)), age (3.1)

u(t) E U, (3-2)

where, (p : R+ x [C(E) x U H Lip(E, R") is a continuous map associating a Lipschitz
map with (t, K, u) and u(t) E U is the control input.
Using a controlled mutational equation, we can model the motion of the target

and coverage sets due to the motion input u to the camera/ robot.

3.2.1 Perspective Projection Model

The deformation of the target set w.r.t. the motion of the camera can be represented
using a mutational equation. The mutational equation can be derived using the optic
ﬂow modeling as shown in [41]. Let u be a velocity screw which represents the velocity
input to the camera and let q be a vector of image space feature parameters. The
image jacobian Jv is a linear transformation that relates the camera velocity screw u

to the tangent space q of the image space feature parameter vector q as:

q = Jv(c)u (3.3)

where, c represents the camera pose such that c = u.
Assuming that the projective geometry of the camera is modeled by the per-

spective projection model, a point P = [:r,y, z]T, whose coordinates are expressed

33

 

 

 

r—J

 

with respect to the camera coordinate frame, will project onto the image plane with

coordinates q = [gag qy]T as:

(3.4)

where A is the focal length of the camera lens [41]. Using the perspective projection
model of the camera, the velocity of a point in the image frame with respective to the
motion of the camera frame [41] can be expressed. This is called the image jacobian

by [41] and is expressed as:

= «pc(q) = Bc(Q) , = Bc(q)uc (3-5)

—-:— o a —e A—xe q... s

0 —% 2} 1:33 _-_€1.rq _q$ {q

)T are the image space coordinates of the point P whose task

Where. q = ((122.93;
Space coordinates are (23,31, z)T. A is the focal length of the lens being used and
u = [11:2, vy, vz, (.013, wy, wZ]T is the velocity screw of the camera motion and A is

the rate of change of the focal length.

The mutational equation of the target set can now be written as:

(i: wlq) = 99cm) + 92107)

A

K 3 (0(K) (3.6)

3.2.2 World Space Model

We can derive the mutational equation for the sensor coverage set and target set

in the world space. Here we consider two cases for a 2-D model and 3-D model

34

 

 

Ci;

poi

for

mum.

 

 

separately and present the mutational model for the 2-D and 3-D cases based on the

dimensionality and geometry.

2 Dimensional World Space Model

Consider a point q E R2 deﬁned as:

qr, _ :r + r cos(p) (3 7)

qz 2 -l- rsin(p)

where [r, z, 0]T are the coordinates of the robot and r, p are the polar coordinates of
point q with respect to the robot coordinates. Using equation 4.15, the coverage set

for the focusing constraint can be written as:

K : {q l T E (Dmina Dmax)ap E (01min: amaw» (3'8)

where Dmm, Dmax are the minimum and maximum focus distances and amin: amax
are the minimum and maximum angles of the view lines as shown in Figure 3.5. The

motion of each point in the coverage set can be written as:

q;- 1 0 —r sin(p)
2 ’11.
4,. 0 1 r cos(p)
i=Mm=Bu we

where u = [13, 2,9]T is the velocity input to the camera. Using equation 4.17, the

mutational equations for the motion of the coverage set can be derived.

35

3 Dimensional World Space Model

The coverage set can be speciﬁed as the set of points which satisﬁes the viewing
constraints of the sensor and can be deﬁned as the active FOV of the camera after
incorporating the depth of ﬁeld constraints such as the maximum and minimum focus

distances. Consider a point q E R3 in the sensor coordinate frame deﬁned as:

 

 

 

 

' (1x - ’ rcos(0) sin((b) ‘
qy = rcos(q‘9) (3.10)
qz rsin(6) sin(c§) 4

where (r, 6, (b) are the polar coordinates of the point q in the sensor frame. Using the

above equation, the coverage set can be deﬁned as :

K = {q I 7' E (DminiDmar)»9 E (aminaamarla

d9 E (remap, (3111271)} (3:11)

where, (DmimDmax) represent the depth of ﬁeld and (am-mam”), (ﬁmabﬂmm)
are the view] angles for the sensor. The global coordinates of the target set K can
be transformed to the local sensor coordinates using the localization information
available.

The mutational equation for the motion of the coverage set can be written as:

 

 

 

 

qg; F cos(6)cos(q§) —sin(9)sin(gb) T
4y = 7‘ 811105) 0 u
_ (iz . [ sin(9)COS(¢) c08(9) $1105) -
(i = <.0(<1)=Bu (3-12)

where u = [wig wy]T is the velocity input to the camera. The objective is to cover the

36

 

In ti.

set of targets K with the coverage set K such that K C K.

3.3 Incorporating Non-holonomic Sensor Constraints

In section 3.2 and 3.2.2 the mutational model relating the motion of the sensor u
to the motion of the target and coverage sets was introduced. However, the implicit
assumption made was that the individual components of the vector u are independent
of each other i.e., the system has a holonomic structure. Consider the example of a
car moving in a horizontal plane. Select the reference point to be the midpoint of
the two rear wheels and the reference direction to be the main axis of the car moving
forward. This car’s conﬁguration can be deﬁned as a vector (state) (:r,y, 6) E R3.

However, these parameters and their derivatives are related by the following relation:

—:i: sin6 + ycos6 (3.13)

where it and y are the derivatives of a: and y respectively. At any conﬁguration

(2:, y, 6) along a trajectory parameterized by time, the triplet (5i), , y, 6) is the velocity
of the robot (car). The set of all vectors (2:, , y 6) passing through the point (x, y, 6) is
a vector space. If the robot was not constrained by the constraint in equation (3.13),
this vector space would be the tangent space of the conﬁguration space at (:r,y, 6)
and would have a dimension 3. However, due to the kinematic constraint in equation
(3.13), the velocity vectors only span a subspace of the tangent space of dimension
2. This is called a nonholonomic constraint and the system dynamics of the car are
said to be nonholonomic.

Various sensor motion platforms have a nonholonomic structure with examples like

a car which can be steered left and right only when the car has a velocity component

in the forward direction. Hence it is required to model such nonholonomic platforms

37

for an accurate description of system dynamics. Another example of a nonholonomic
platform would be a sensor mounted on a unicycle-type robotic motion module with
a differential drive, which has been quite popular recently since they require only two
motors for forward motion and steering.

Further, due to the large amount of energy consumed by the motion module,
multiple sensors with varying sensing modalities may be mounted on a single motion
module. An example is the mobile manipulator depicted in Figure 6.3, which has a
PTZ camera mounted on the mobile base and a bullet camera mounted on the robot
arm.

This section presents an example of a unicycle-type robot with nonholomic motion

constraints which can be modeled as:

a': = v cos 6,-
3) = 1) sin 6,-
6 = (.0 (3.14)

where, v and w are the linear and angular velocity control inputs of the robot. The

relationship between u and uh = (v, w)T can be written as:
cos6 0
v
u = Bhuh = sing O (3.15)
w
0 1

 

 

Thus, the controlled mutational equation for camera coverage model in equation 3.6

can be written as:

WI) = 3(a)u = B ((DBhuh = Bow. (3-16)

38

3.4 Hausdorff Tracking

The task of a surveillance camera is to continuously keep a moving target or multiple
moving targets in its active ﬁeld of view. The surveillance task of maintaining multi-
ple targets in the active ﬁeld of view (F OV) of the sensor with an adequate resolution
can be readily expressed in a set-based framework where the variables taken into con-
sideration are not vectors of parameters but shapes (domains) themselves. However,
due to lack of vectorial structure on the space, classical differential calculus cannot be
used to describe the evolution of such domains. However, mutational equations [60]
can be used to describe the dynamics (change in shape) of the sensor FOV and target
domains and further be used to calculate feedback mechanisms to accomplish the
surveillance task.

The surveillance task can be expressed, using shape functions [62], as the min—
imization of a Hausdorff distance-based metric or the size of the target, etc. The
shape function essentially represents the error between the desired and actual shapes
and reducing it to zero will accomplish the task. This section presents the method of
Hausdorff tracking using mutational equations for performing the surveillance task.

Shape or a geometric domain can be deﬁned as the set K E [C(E), E C IR" where
[C (E) represents the space of all nonempty, compact subsets of E. The target and the
camera coverage can be readily expressed as shapes. Mutational equations can then
be used to express the change (deformation) in the coverage and target sets based
on the motion of the sensor. Shape analysis [62] can be used to address problems
involving geometric domains or shapes. Shape functions, which are set-deﬁned maps
from J (K) : [C(E) H IR, can be used to provide a ”measure” of acceptability and
optimality of the shape K. For example we can use a shape function to see if a
reference set K is contained within a current set K. In order to accomplish the task
deﬁned using shape functions, we need to derive a feedback map LI : [C(E) H U,

where u = U (K (t)) is the input to the sensor, which will reduce the shape function

39

 

(it

d 3

IS ;
P0:

CO(

CO
I»)

to zero. The convergence of the shape function can be analyzed using the shape
Lyapunov theorem [65]. The convergence to zero of the task function would imply
task accomplishment.

The task requirements for Hausdorff tracking can be expressed as a positive semi-
deﬁnite shape function which provides a measure of acceptability of a particular shape
(set). For the example shown in Figure 3.2(a), the measure of acceptability of the
target image (sets) is whether they are within a certain region of the image space and
the measure can be expressed as a shape function J (K) 6 73+. The task of Hausdorff
tracking can be deﬁned as ﬁnding an input in order to move the camera in order to
reduce the shape function to zero. Hence we need to express the evolution (change)
in the shape of the target w.r.t. the movement of the camera. This is accomplished
using mutational equations, which are the analog of differential equations for shape
spaces. Using the mutational equation, we should be able to calculate the directional
derivative of the shape function w.r.t. the motion of the camera, and using an inverse
dynamics procedure, calculate a motion input to the camera in order to reduce the
shape function (error) to zero.

For the example in Figure 3.1(a), the task requirements for Hausdorff tracking
can be mathematically] expressed using a positive deﬁnite shape function J (K ) =
fl? (13((p) dp where K is the coverage set for the camera and dK(p) = inqu K ”g — pl],
is the point to set distance from point p to set K. Note that the shape function
is zero only when set K is completely covered by set K, otherwise it is a non-zero
positive value. Initially in Figure 3.1(a) the target set K is not covered by the camera
coverage set K and hence the shape function J (K > 0. The Hausdorff tracking task
is to reduce the shape function to zero by moving the camera as shown in Figure

3.2(a) i.e., minimize the shape function.

40

 

II

p:

r.
a(

lllt

Ta]

(‘3

3.4.1 Image-Based Hausdorﬁ Tracking

When the target is located in the active FOV of the sensor, the method of image-
based Hausdorff tracking can be used to accomplish the surveillance task. The target
is detected using image processing algorithms and represented as a collection of pixels
(blob) on the image plane. The sensor coverage set is represented as a rectangle cen-
tered at the image center and one of the tasks is to maintain the target blob within
the sensor coverage set. Surveillance tasks such as set coverage and maintaining ap—
propriate image resolution can be readily expressed in a setbased mutational analysis
framework using a shape function as the minimization of a Hausdorff distance-based
metric or the size of the target, etc. The task can be accomplished by reducing the
shape function to zero. The task of visual servoing can also be expressed in a set-
based framework [66]. The shape function essentially represents the error between

the desired and actual shapes.

Target and Coverage Sets for Image-Based Hausdorff Tracking

Figure 3.3 depicts the coverage and target sets. The coverage set K is the set of points
on the image plane (X, Y), within which we require the targets to be maintained.
Generally it is taken as a rectangle centered in the image plane. The target K is
deﬁned as the union of the individual image plane blobs (sets of pixels) formed by

the multiple (n) targets as: K = U121 Ki.

Shape Functions for Image-Based Hausdorff Tracking

The surveillance task of maintaining visibility of the target(s) with a certain resolution

can be represented using a shape function as:

41

 

 

Figure 3.3: Target and coverage set for image-based Hausdorff tracking.

J(K) = JFovu‘o + Jame?) + amuse?) (3.17)
JFOVUAQ = I}? did?) dq
JAm,,,(f{) = max(ff( dq — AREA_MIN, 0)

JAMAR) = min(AREA-MAX — In dq, 0)

where q is a point on the image set K and AREA_M AX and AREA_M I N denote
the maximum and minimum admissible areas of the target set K for maintaining
adequate resolution.

Note that the shape function J (K ) is zero only when set K is completely covered
by set K and when the area of set K is within the limits (AREA-M I N, AREA_M AX )

Otherwise it is a non-zero positive value.

42

Shape Directional Derivative for Image-Based Hausdorff Tracking

In order to derive the feedback u to the camera, we need to ﬁnd the shape directional
derivative J (K )((p) of J (K) in the direction of the mutation (0(K) using the shape
function and the mutational equation. Assuming a relatively ﬂat object, i.e., the z
coordinate of all the points on the target are approximately the same, we can derive

an expression for J (K )((,0) by substituting equations 6.1 and 3.6 into 2.23 as:

face) = [K waive) +f<q>diw<q> dq (3.18)

where, f (q) can be construed as the aggregated integrand from equation 6.1. For

the shape function of the type

J(K) = [K d§{(q) dq (3.19)

Substituting Equations (3.19) and (3.6) into Equation (3.21) we get

f(mev‘cu» = Lee—Menage
+ di<q>div(B(q>u)) dq
= [f((aq—ern-Be)
+d§~((q)divB(q)) dq - u (3.20)

The above equation can be written as:

June) = 2 [K (16311}; 0.1“" -p>..o(p>> dp+ 2 [K divee» dp (321)

Here it must be noted that for the existence of the shape directional derivative,

the integrand or E C1(IR”,IR). J FOV in Equation (6.1) is continuous. However, the

43

elements J Amin and J Amax in Equation (6.1) do not satisfy this criteria. Hence we
modify the shape function in order to satisfy the continuity criteria using smooth

approximations of the min and max functions as:

 

“d —Amm2 62 “d ”Amm
(MK 10 > + +0., 12 ) (3.22)

 

JAmin :

 

 

JAmaa: : _ (3:23)

2
V (Amar - IR dp)2 "l' 52 “ (Amos: _ fK dp)
2

where, e is a very small positive number. This approximation physically manifests
as adding a very small area (of size 6) to the target set size and hence this approx-
imation will hardly affect the validity of the shape function to represent the task of

maintaining adequate resolution.

Feedback Map u for Image-Based Hausdorff Tracking

The problem now is to ﬁnd a feedback map uc such that the shape function J is

reduced to zero. Equation 3.21 can be approximated as:

. - 1 .
J(K)((0) S Z—tCi(K)U1 + CI'2lKl‘U2 +' C3(K)’uc (3-24)

where,uc = [u?u§]T, u1 = [vx,vy,vz,A]T and u2 = [w$,wy,wz]T. zt is an esti-
mated minimum bound on the target 2 position. The relation 2t > 2 will guarantee

the inequality in 3.24.
The matrices C1(K), 02(K) and C3(K) can be deﬁned as:

c.(K)=2/ inf B.(q>T(q—p>dp, (3.25)
KqGHKIP)

44

02(K)=2/ inf B2(q)T(q—p)dp. (3.26)
KqEHK(P)

03(K) = 2 [K B(q)dp. (3.27)

where, the matrix B(q) is deﬁned as: B(q) = [31 (q),B2(q)] and its elements are
deﬁned as:

I "A 0 9x

Bi(q) = 2— (3.28)
t 0 —A qy
QIQ (A2+02) x
"TL q
B2(q) = EH2 y 9X (3.29)

will“ if“ —q. 91“

After combining the relevant terms from equation 3.24, it can now be rewritten

A

J(K)(<P) < C(K)uc (3-30)

A controller can now be proposed based on equation 3.30 using the notion of a
generalized pseudoinverse of a matrix. The pseudoinverse C# (K) of the matrix C (K)

must satisfy:

uc = aC#(K)J(K) (3.31)

where a is a gain value. It should be noted that the estimate zt of the target
distance only affects the gain of the control and not its validity. Further it is important
to note that the gain distribution between the various redundant control channels

depends on the selection of the null space vector when calculating the generalized

45

 

 

K Shape J

_. Function Controller

 

 

 

 

 

 

 

 

 

K Target 1 Image
Perception ‘ Frames

 

 

 

 

 

 

 

Figure 3.4: Block diagram for image-based Hausdorff tracking.

pseudoinverse C#(K) of matrix C (K )

Figure 3.4 depicts the block diagram representation of the Hausdorff tracking
controller. Once the targets to be tracked are identiﬁed by the target perception
module, the Hausdorff tracking method can be used to track the multiple identiﬁed
targets simultaneously irrespective of the relative motion of the targets with respect

to each other.

3.4.2 Cooperative Hausdorﬁr Tracking

Cooperative Hausdorff tracking is used when a sensor node cannot sense the target
in its current conﬁguration and needs to move in order to bring the target into its
active sensing region. The individual sensor nodes maintain information regarding
the observations of their neighboring nodes and broadcast (within their locality) their
own observations. Based on the combined observations, each node develops a list of
targets being actively tracked and the status of its peer nodes. Once the target is
detected in the active FOV, the sensor switches to image-based Hausdorff tracking to
perform the surveillance task.

For establishing cooperation of multiple sensors, which may not be able to observe

the same object simultaneously, there is a need transform the image blobs and mea-

46

surements into a common coordinate system referenced by all the sensors and task
the sensors according to the movement of the targets expressed in the common frame
of reference. Further, this common representation will be helpful in combining the
object detection hypothesis from multiple camera sensors for sensor fusion. It can also

be used for designing efﬁcient routing mechanisms and algorithms for communication.

Target and Coverage Sets for Cooperative Hausdorff Tracking

The coverage set can be construed as the set of points that satisfy the viewing con-
straints of the camera. It can be deﬁned as the active ﬁeld of view of the camera
after incorporating the depth of ﬁeld constraints such as the maximum and minimum
focus distances. Section 3.2 .2 provides the coverage set model for the 2D and 3D cases
considered and Figures 3.5 and 3.6 depict the 2D and 3D coverage sets, respectively.

For the world space model, the target is a set of points in 2D or 3D based on the
scenario. For ease of representation, the target set is represented as a bounding box
of the target. This simpliﬁcation tends to simplify the structures and computation.
required for tracking the target. Also, it should be noted that if the bounding box of

the target is covered, the target is also covered.

Shape Function for Cooperative Hausdorff Tracking

The objective is to cover the set of targets K with the coverage set K such that

K C K. This condition can be written using a shape function as:

J(K) = [K die) dp (3.32)

Note that when the target set is completely covered by the camera coverage set,
the value of the shape function is zero. Otherwise the shape function takes a positive

non-zero value. Hence the coverage task can be attained by reducing the value of the

47

2 ,
/
/

 

Figure 3.6: Target and coverage set for 3D cooperative Hausdorff tracking.

48

shape function in equation (3.32) to zero.

Shape Directional Derivative for COOperative Hausdorff Tfacking

Using the equation 3.21, the directional derivative of J (K) represented by J (K )((,0)

can be written as:

June) = 2 [K which" — 10). were (3.33)

The above equation can be rewritten as:

. — in T — U I -
J(K)(<.2)—2</Rqenli(p)3(q) (q p)dp. ) (3 34)

where u is the input to the camera and B (q) is deﬁned in equation 4.17.

Feedback Map 21 for Cooperative Hausdorff Tracking

In order to ﬁnd a u, let us denote:

C(K) = - inf B(q>T(q — pup. (3.35)
K (Enid?)
Let C (K )# be the pseudo inverse of C (K)
A choice of u can be taken as:
1 #
u = —§aC(K) J(K), (3.36)

where a is the gain of the control. The gain distribution of the task among the various

control channels of u is controlled by assigning the null space vector of the matrix

C(K).

49

3.5 Mutational Hybrid Automata Model for Pervasive Surveil-

lance Networks

Consider a network of N cameras which comprise the pervasive surveillance system.
With each camera camn we associate a coverage set Kn E [C(Wn) which signiﬁes the
region camn can view actively and Wn represents the total capable FOV of camn. A
target to be tracked is represented by a target set K C [C(W), W = Une N Wn. An
example of the graphical MHA model for two cameras tracking a target is shown in
Figure 4.2

Groups of cameras are sequentially involved in tracking a target as it traverses
the monitored region. Within each group, the behavior of each camera is governed
by target visibility from a particular camera. This combination of continuous track-
ing and switching behavior can be modeled using the Mutational Hybrid Automata
(MHA) model presented in section 2.4. This section discusses the various parts of the
MHA model for the pervasive surveillance network H p S N and further elaborates on

the design of stable controllers for a pervasive surveillance system modeled using the

MHA model.

3.5.1 Mutational Hybrid Automata Model

It is relatively straight forward to write the MHA model for a pervasive surveillance

system as:
HPSN = (Q,M,Z,Init,D,E,G,R) (3.37)

where Q represents the discrete variables, M represents the continuous variables,

2 = {2.1}qu is a collection of the continuous mutational dynamics, Init C_: Q x M is

50

the set of initial values of the variables in Q and M. D is a domain which associates
a domain of operation of the continuous states for every discrete state q E Q, E is
an edge set along which discrete transition takes place, G is a set of guard conditions
which enable the discrete transitions and R is a reset map which resets the continuous

states when a discrete transition is enabled.

Discrete Variables

In the above multi-camera surveillance scenario the target can be tracked by a single
camera or a group of cameras at any given time. Fhrther, if the target is in the FOV
of a particular camera, it can visually track the target, i.e., perform image-based
tracking, while if the target is not visible, the camera can cooperatively track the
target based on the location of another camera which can view the target. Hence
one discrete (binary) variable bn is associated with each camera to indicate whether
the target is visible to that camera, such that bn = 1 if target is visible in camn and
bn = 0 otherwise. Another discrete variable on indicates whether a camera (camn)

is involved in the particular tracking scenario. Thus the discrete variables collection

can be deﬁned as: Q = {{bn, CR}nENi }.

Edge Set

The edge set E deﬁnes which particular transitions are possible. Here we describe
general rules to develop E. A handoff between cameras can be initiated when the
target is visible to both cameras. An example edge set for a two—camera scenario is

developed in section 4.3.

51

Domains

The domains of operation Dq for each mode can be developed based on location of the
target set K, which will decide which cameras are capable of viewing the target, and
based on the continuity of the target motion, will decide which cameras can acquire

the target as it moves through the monitored region W.

Continuous Variables

The collection of continuous variables AI consists of the individual camera coverage
sets Kn and the target set K; i.e., M = {{KnlneNa K}. Let K E M represent an
element of the valuations of M.

Assumption: We can safely make the assumption that the target set K is convex.
This assumption does not have signiﬁcant effect on the model as all non-convex targets
can be modeled as their convex cover. This assumption will be invoked to aid the
proof of stability later in this section. We further make the assumption that the
coverage sets Kn are also convex. This assumption can be substantiated by the fact

that the active ﬁelds of view of most cameras are convex.

Continuous Dynamics

The continuous dynamics represent the change (deformation) of the sets in M which
can be modeled using mutational equations as: Kn 3 (pn(Kn, un) where an E U is
the control input to cam”. Based on the mode of operation of camn, the controller
model can be changed, giving rise to different continuous dynamics for each camera
in different modes of operation q. The motion of the target can also be represented as
a mutational equation as: K 3 (0t(K) which can be estimated from the observations

of the target.

52

Guard Condition

A guard condition or switching condition G associated with each edge e E E is
designed based on the location of the target set. Various events can be recognized
and processed by the multiple camera surveillance network such as the visibility of a
target in a particular target or the global location of the target or camera coverage
set reaching some predeﬁned value.

For example, when the target moves into the visible region of camn then the bn
associated with it changes, which effects the transition along an edge e to a mode
where bn = 1.

Further, based on the current location of the target (sensed and communicated
from other cameras), the variable an associated with camera camn will change in

order to involve the camera in the tracking task.

Reset Map

Since we are dealing with continuous positions of the targets and camera coverage
sets, there are no jumps affected in the continuous variables. Hence the reset map R
can be deﬁned as R : {(p, q), K} I-—> K, K E M. However, in case of multiple groups
of cameras being involved in the surveillance scenario, the coverage sets, which are
deﬁned as the union of the individual coverage sets, will jump from one group to
another based on the change in the discrete variable c. For example, when switching
the camera group from state q to p, the reset map is given as: R(p,q,K,;) t—> K j

where Ki = UlEpKlv Kj = UlEqu-

Init Set

The target can be recognized only if it is in the view of at least one camera. Hence

the set of initial conditions is restricted to the set of discrete modes Init : {q, M :

53

Anern = 1} where bn is the target visibility condition for camn.

3.5.2 Controller Design for Individual Cameras

Events can be recognized by the controller which can be used to generate the control
inputs provided to the various cameras and trigger mode switches. Such events include
the recognition of the target by a particular camera and the location of the coverage
set of camn hitting certain boundaries within its domain of operation Wn. Also,
within each mode q we are provided with the number of cameras actively involved in

the surveillance c and the cameras which can view the target on.

Camera Active, Target Visible(cn = 1, on = 1)

When the target is visible in the image plane of camn (bn = 1), the method of image—
based Hausdorff tracking [67] can be used to track the target. In this controller, the
control input to the controller is derived based on the shape function z'Jn(i"Kn,i" R) :
K(Wn) x [C(W) I—> R+ where the superscript in represents the projection of a set on
the image plane z'Wn of camn. The control input an can be calculated as is detailed

in [68].

Camera Active, Target Not Visible (on = 1, bn = 0)

When the target is not visible to camn but is visible to camm (bm = 1), we can use
the method of cooperative Hausdorff tracking proposed in [67]. If the target K ¢ Wn
then camn should track the projection HWn(K) of the target on the domain W”.
This strategy will ensure that as soon as the target moves into the region Wn it will
be encompassed in the visible region of camn.

However, in a real tracking scenario the projection flu/”(1%) may not be easily

available. But taking into account the fact that bm = 1, which implies that R C K m,

54

we can deduce that HWn(K) C IIWn(Km). Thus, tracking HWn (Km) will also
ensure tracking the set IIWnUf) (proof of this proposition not presented due to space
limitations but it can be done in a very straightforward manner).

We can derive a control input an using cooperative Hausdorff tracking technique in
order to track the set HWn ( K m) by minimizing the shape function CJn(Kn, HWn (K m)) :
[C(Wn) x [C(Wn) +——+ R+.

Camera Inactive(c.,, = 0)

When the distance of the target set to the coverage set of camn is large i.e., the target
is vary far away from camn, the variable on is set to zero and camn enters into inactive
mode. This mode prevents all the cameras from continuously servoing based on small
motions of a target which is rather far away and is used to conserve energy of the
camera. In this mode, the input to the camera, camn is set to 0 and the camera does
not move. The target dynamics are still represented using the mutational equation

of the target set.

3.6 Chapter Summary

In this chapter, we have introduced the use of controlled mutational equations to
model the motion of the sensor coverage and target sets. Using the mutational equa-
tion model, we have introduced the method of Hausdorff tracking which can be used
to control the motion of the sensors (cameras) to maintain target visibility for moving
targets. Speciﬁc examples of the Hausdorff tracking in image space and world space
have also been presented.

Cooperation among various sensors is an important characteristic for pervasive
surveillance. We have introduced the mutational hybrid automata (MHA) model for

modeling motion of multiple sensors tracking a target. The implementation of this

55

model for a pervasive surveillance scenario can be done in a distributed fashion as the
various events occurring, such as target visibility, etc., can be detected and processed

in a distributed fashion.

56

Chapter 4

Analysis of Surveillance Networks

This chapter presents the stability analysis of the proposed model and controllers.
The shape Lyapunov theorem is presented in section 4.1, and can be used to analyze
the asymptotic behavior of tubes. The stability analysis of the proposed Hausdorff
tracking method using the shape Lyapunov theorem is presented next, followed by
the analysis of an example two-camera system modeled using the mutational hybrid

automata model proposed earlier.

4.1 Lyapunov Theorem for Shape Space

The asymptotic behavior of the measure J (K (t)) of the deformation of the set K can
be studied using the shape Lyapunov theorem. The deformation is described as the
reachable tube K (t) and is the solution to the mutational equation in equation 3.1.

We usually look for assumptions that guarantee the convergence to O of J (K (t))

Deﬁnition 4.1.1 (Shape Lyapunov Function) Consider a subset E of R” and a
mutational map (,0 deﬁned on the set E as shown in section 1, a shape functional

J : [C(E) I——> R+ and a continuous map f : R I——> IR. The functional J is a f-Lyapunov

57

 

 

6L5

Tl.-

56

 

 

function for the mutational map (,9 if, for any K E Dom(J) we have,

deixngwmwnga Jm3=wm) (4D

where, w() is a solution of w’ = —f(w).

When the solution w(-) converges to 0 as t —+ 00, the shape Lyapunov functional
J (1990(t, K )) also converges to 0.
Using shape derivatives, the above deﬁnition of Lyapunov functions can be stated

as the shape Lyapunov theorem.

Theorem 4.1.1 Consider a subset E of IR" and a mutational map 99 deﬁned on the
set E as shown in section 1, a shape functional J : K(E) I—> IR+ and a continuous
map f : IR t—v R. Let the Eulerian semi-derivative of J in the direction go exist and be
deﬁned as J(K)(<p). The functional J is an f-Lyapunov function for (,0 if and only
if, for any K E Dom(J), we have

Jump) + mm) g 0. (4.2)

Proof. (from [65])Assume that for any positive t, we have J (19¢(t, K )) g w(t).

Using J(K(0)) = w(O), we can write

%uwyiK»—AK»< mnrwmm> “w

H-IH

Using the deﬁnition of directional shape derivative this yields,

Mono) s w'(0> (4.4)

Then, observing that w’ (0) = — f (w(0)) = — f (J (K )) we can conclude the result.

58

Conversely, denote y(t) = J (1999(t, K )) We can now write

mo~wn=u/:Eﬂ WicK>

By the hypothesis, we obtain
t
yw—mms—Afuwns 0%)
Now, we can write
yf1(t) S —f(y(t)) (47)

Now consider a solution w(-) of the differential equation

u305) = -f(w(t)), w(0) = J(K) (4-8)

Using the Gronwall Lemma, for any positive t, we obtain

y(t) = J (19920, K )) S 10(0) (4.9)

This proves the theorem. [:1
Using the shape Lyapunov theorem we can derive assumptions on the input u, to

move the camera/ robot, in order to accomplish the surveillance task.

59

4.2 Stability Analysis of Hausdorff Tracking

In this section we analyze the stability of the Hausdorff tracking method using the
shape Lyapunov theorem discussed in the previous section.

Stability in the sense of Lyapunov for the proposed Hausdorff tracking method
would imply the asymptotic convergence to zero of the shape function J (K) This
asymptotic convergence to zero of J (K ) will imply that the task criterion for the
system is accomplished.

For both the image-based and cooperative Hausdorff tracking methods proposed,

the shape directional derivative has been reduced to the following form:
J°(K)(so) < C(K) - u (4.10)

We can now propose the following theorem for stability of the Hausdorff tracking

method.

Theorem 4.2.1 For a automated tracking system modeled using the mutational equa-
tion K 3 w(K) where K is the target set and w(q) = B(q)u with the plant input
vector u. The shape ( error) function is deﬁned as J (K) with the shape directional
derivative in the direction of the mutation (J (K )(go) ) deﬁned by equation 4.10. The
system is stable, i.e., the shape function asymptotically converges to zero for the input

u = —aC#(K)J(K), a > 0.

Proof. In order to prove the stability of this system, we have to show that it
satisﬁes the shape Lyapunov theorem from section 4.1. That would imply that the
shape function J (K) asymptotically converges to zero. The shape Lyapunov theorem

can be written as:

J°(K)(<p) + aJ(K) g 0 (4.11)

60

Substituting the values of J (K )(cp) into the above equation we get
C(K) - u + aJ(K) g 0.. (4.12)

Substituting for u we get

—aC(K) - o#(K) . J(K) + (mm) s 0 (4.13)
aJ(K) g aJ(K) (4.14)
which concludes the proof. C]

4.3 Analysis of Example Surveillance System using Muta-

tional Hybrid Automata Model

This section presents the stability analysis for a two~camera example system modeled
using the mutational hybrid automata (MHA) model. The mutational hybrid model
for a two-camera example system along with the various constituent parts is presented,
followed by a stability analysis of the example system.

The two-camera model presented is extended by providing an inactive state for
each camera. This implies that the cameras need not be tracking the target contin-
uously and can shut down (go into an inactive state) or be assigned to other tasks
when a particular target is not close by. Owing to this fact, the proposed model will

be more energy efﬁcient.

61

 

Camera 2

Camera 1 .~ I

Active Fields of View
q

 

 

Capable Fields oz View

 

Figure 4.1: Example pervasive surveillance scenario with 2 cameras

4.3.1 Two-Camera Example for Pervasive Surveillance with Contin-

uous Servoing Cameras

Consider the example as shown in Figure 4.1, which has two cameras with intersecting
domains. With each camera we can associate a coverage set that represents the
current FOV of the camera as represented in section 3.2. The total capable FOV of

the camera carnn can be represented as a closed set Wn. The closed set W can be
modeled as the union of the total capable FOV of the two cameras (W = W1 U W2).
The target (modeled as a domain in W, If E [C(W)) can move around in the set W
with bounded velocity and acceleration. The problem is to ﬁnd a hybrid controller

(continuous control as well as switching conditions) which can pervasively track (cover

using the active FOV) the target as it moves about randomly (maybe adversarially)

in the domain W.
Consider the MHA model for the pervasive surveillance network given in equation

62

3.37. The constituent parts of the model can now be deﬁned with respect to the

two-camera pervasive surveillance system in consideration.

Hybrid State Space (Q and M)

The set of discrete states is modeled as Q 2 {b1, b2, 0} = {q} with its set of valuations
Q = {(0,0,2),(0,1,2),(1,0,2),(1,1,2)} = {q0,q1,q2,q3}. In this example we will
assume that c = 2 which implies that both the cameras are involved in the tracking
task. This assumption will be relaxed later on in the discussion. Each mode q E H,
uses a different controller for the two cameras hence effecting distinct dynamics in
them.

The continuous states for all modes are modeled as the target set R and the
current coverage sets K1 and K2 of cameras caml and camg, respectively. The
coverage sets for the cameras can be deﬁned as the sets of points which satisfy the
viewing constraints of the camera including field of view and depth of ﬁeld constraints
such as the maximum and minimum focus distances. Consider a point q 6 R3 in the

coordinate frame 0n of camn deﬁned as:

 

 

 

 

r ' P 1
kl- 'r cos(6) sin(¢)
kg = r cos(¢) (4-15)
kg 1" sin(6) sin((,b)

where (r, 6, ab) are the polar coordinates of the point k in the camera frame On. Using

the above equation, the coverage set can be deﬁned as :

Kn = {k l T E (Dminvaax)a6 E (aminaamar)v

Cb E (ﬁmaxa #3771270} (4'16)

where, (Dmin, Dmam) represent the depth of ﬁeld and (0mm, am”), (ﬁmax, ﬁmm) are

63

the view angles for the sensor. The collection of continuous variables can be written as
.M = {1%, K1, K2} with the set of valuations of AI deﬁned as M = [C(W') X K(Wl) x
K(Wzl

64

2358 5888.039 23 How Epoch <32 183920 ”me mama

 

 

 

    

      

   

     
 

N _ \ as
_ l. auwfsuwl
\suw N N N )N
A3. v: SmNVoN \SUM
23.5w: CN
M
Stag
:jiuNs
.MCM suN

Nauw

fuw

Amzhkvau m NM

 

suNkcwéufcw

 
      

 
   

  
  
  

 

_._ _ o
A: Mvsm—w ~3.—kVi—$M_¥
M O
‘ k
AWV 5mm” N.—|2 2 t ANchgmM
$2.3”; Tau c: :.o._.:um. /
\ AN:.N¥VNS m NM N _ :2
a an kafvu MCM .N

:5 suNkcwénfcw _ _ _
. Cs. 531M

M
Qismw

c8987 cs

*NUalQo—U. Qv H W

65

Assumption: We can safely make that assumption that the target set R is convex.
This assumption does not have signiﬁcant effect on the model as all non-convex targets
can be modeled as their convex cover. This assumption will be invoked to aid the

proof of stability later in this section.

Domain (D) and Init Set (Init)

The domain B(q) associated with every discrete mode q can be modeled as the union
of the capable FOV Wn of the cameras involved in that particular mode such that
the coverage sets of the individual cameras lie in their respective capable FOV’s
(Kn C W”). We can further restrict the deﬁnition of the domain by using the
condition that the target is a subset of the union of the capable FOV’s of the cameras
which can see the target (Ii C Une.7\7,bn=1l”l”n)v For this example the domains for

each mode can be deﬁned as:

B(QO) = {K C Ml(Kn C Wrnlne{1,2}}
D(q1) = {K e M|(Kn c Wn)n€{1,2} /\ R c W2}
D(Q2) = {K C Ml(Kn C Wnlne{1,2} A K C W1}

13(43)

{K C Ml(Kn C W7l)n€{l,2} A K C Une{1,2}Wn}

In order to track a target, it must be recognized by a camera for which it must be
in the active F OV of a camera. The above condition dictates that the discrete state q
must satisfy Vnern = 1. If this condition is satisﬁed, then the target is visible in at

least one camera. Hence we can deﬁne the Init set as I nit = {q, K Iq E {q1, q2, Q3}}

66

Edge Set (E) and Reset Map (R)

The possible discrete transitions denoted by the edge set E are:

E = { (<11, <10), (ql, q3)(q2, qO)(q2, (13), (<13, ql), ((13, <12), (cm. (10)}

Note that under the assumptions of no occlusions in the camera views, continuity of
target motion, ﬁnite size of the target set and non-empty overlap area between the
two cameras, the transitions (q1,q2) and (q2, q1) do not occur. Phrther once the target
is lost from both the active ﬁelds of View, the target is considered lost and cannot
be recovered again, which implies that go is an invariant mode and does not let any
trajectories escape. Hence in our model we do not take into account the transitions
(q0,q1), (q0,q2), (qO,Q3). However, in a real surveillance scenario, strategies based
on searching for the target around its last known location can be developed and
implemented in order to escape the failure state qo.

The reset relation does not bring about any jumps in the continuous states. Hence

R(-,-,K)=K

Continuous Dynamics: Mutational Equations (2)

The target motion is modeled as a mutation K 3 w(li') which is unknown but can
be predicted as a piecewise linear trajectory using the image sequences of the target
(the derivation of which is out of the scope of this paper). The 3D model for motion
of the coverage sets K1 and K 2 can be derived using the procedure outlined in section
3.2. Mutational equations are used to model the motion of the sets Kn.

Using equation 4.15, the mutational equation for the motion of the coverage set

67

Kn can be written as:
k = Mk) = B u... v k e Kn (4.17)

where an 2 [w$,wy]T is the velocity input to camn. The above equation can be

succinctly written as:
K13 991(K1,U1), K2 3 902(K2,U2)

where ul and W are the inputs to the cameras caml and 031112, respectively, and are
speciﬁed by a feedback map Ll : [C(Wn) H U. The feedback maps (control inputs) u,-

are selected based on the mode of operation of the network (q).

Target Visible(bn = 1) If the target is visible to camn i.e., bn = 1, the control
input an is calculated directly from the sensed image using image-based Hausdorff

tracking proposed in [67]. The shape function (error) can be calculated as:

i _ 2
J", — [Kdenov dp (4.18)

where inK and ink are the projections of the coverage set of camn and the target

on the image plane iWn of camn.

Target Not Visible (bn = 0) When the target is not visible to camn we use the
method of cooperative Hausdorff tracking [67] to track a projection of the target
HWnIi' on the domain Wn, which is the capable ﬁeld of view of camn. It should be
noted that Wn is assumed to be convex.

However, the exact 3D location of the target set if is not known during the
surveillance scenario. Here we point out that the target is visible to camm, i.e., bm =

1. Hence, using proposition 1, we can say that tracking the projection HWn(Km) of

68

the coverage set K m on the capable ﬁeld of view Wn of camn will sufﬁce for tracking
the projection of the target set HWn(K), since K C Km. Now we can derive the
shape function (error) an which will be used as an error input to the controller for

camn for deriving the control input an. The shape function can be written as:

CJ = (12.. , d 4.19
Tl ,/I:[‘an(Km) ("LIX”)(p) p ( )

Continuous Control an Given the shape functions (errors) for the various cameras
in different modes and the mutation equations for the coverage sets, the control input
an can be calculated using the directional derivative of the shape function J (K) in
the direction of the mutation 90K denoted as J (K) The shape Lyapunov theorem
can now be used to study the asymptotic behavior of the measure J ( K) of the set K
and further derive assumptions on the input un which guarantee the convergence of
J (K ) to zero.

Hence by selecting a control input an which ensures that J (K ) g ——aJ we can
prove that the continuous dynamics of each mode {q1, q2, q3} are exponentially stable.

The detailed procedure is derived in [67].

Guard Conditions: Switching Control (G)

The guard condition can be construed as the switching conditions for the hybrid
controller. These conditions have a profound impact on the stability of the system
and should be carefully chosen. For our example system, transitions are enabled
based on the recognition of certain events such as the recognition of the target by
camn in its ﬁeld of view (bn = 1). Assuming no occlusions and deterministic target
recognition, the change in bn can be equated to the target coming into or falling out
of view of the coverage set Kn i.e, bn = 1 4:) K C Kn and bn = 0 4:) K gZ Kn.

Assuming that parts of the target can be recognized when they come into view of the

69

camera, we can write the switching conditions as:

G(ql,qs) = {KEMtKﬂKlaéQl}
G(qz,qs) = {KeszrnKHea}
C(quol = {KEM2KDK1=0AKHK2=0}
001240)={K6M:f(nK1=Q)/\RHK2:0}
C(qaqo) = {KeMzanlztARnK2=a}
C(q3,q1) = {K€M:f{nK1=(Z)}

G(q3, q2) = {K e M : fr 0 K2 = a} (4.20)

A hysteresis is inherently associated with the above switching conditions which will

prevent inﬁnite switching in a ﬁnite time period (Zeno execution).

4.3.2 Stability of the T wo-C’amera Example System

The question that arises at this point is “How can we ensure that the tracked target
does not leave the active F CV of at least one sensor at any time and the network
behaves in a desired manner?” This corresponds to the systems and control theoretic
notion of “stability” - i.e., we need to guarantee that the hybrid ﬂows within the
network do occur as designed, states converge, the system does not “blow up”, etc.

We propose a two—step process for the stability analysis of the pervasive surveil-
lance scenario in section 6.7.3 modeled using the MHA model. First we require that
each permissible mode of the model be internally stable. Internal stability implies
that the continuous flow within a particular mode of the hybrid system converges and
the hybrid state (Q, M) remains bounded away from the forbidden regions while the
system operates in any particular mode.

The next step will be to Show that the goal set of the mutational hybrid system

70

will be invariant. Hence any trajectory that reaches in the goal set of the mutational
hybrid system will always remain in the goal set. Now if we restrict the I nit set to
the goal set, the system trajectory will never leave the goal set and hence the system
will be stable.

The goal set will be comprised of the various stable models of the MHA model. In
order to prove the invariance of the goal set, we would be required to prove that every
hybrid trajectory which exits a particular mode in the goal set, will only traverse to
another mode in the goal set with the continuous variables initialized to permissible
states in the new mode.

This procedure will ensure that the trajectory will never leave the goal set. If the
goal set is taken as the set of modes for which the target is visible to at least one

sensor, we can prove the pervasive tracking of the target.

Internal Stability for the Two-Camera Example System

Given an equilibrium point for each mode of operation of the MHA, we can now
deﬁne the notation of internal stability [69] for each operational mode of the MHA.

Internal stability implies that

Deﬁnition 4.3.1 (Internally stability): A mode q of a mutational hybrid automaton
is said to be internally stable if the local deliberative logic and hybrid control primitives

have been designed such that:

a continuous evolution within each mode is stable in the sense of Lyapunov relative

to an equilibrium point deﬁned for that mode.

0 the mechanism of mode transition is such that there is some dwell time in the

mode q, switched to (i.e., inﬁnite switching does not occur)

We can now put forward the following lemma for the stability in the sense of

Lyapunov for the operating modes {q1, (12, q3} of the system.

71

Lemma 4.3.1 Consider the continuous closed loop dynamics of the example two-
camera system in the operating modes {q1, q2, q3} and the guard conditions 4.20. The

operating modes {q1, q2,q3} are internally stable.

Proof. For each state in {q1, q2, q3} we can deﬁne a candidate Lyapunov function V,-

as:

q1 : V1 26 J1 +2112
92 : V2 =i J1 +6 J2

q3 : V3 =i J1+iJ2

where, 2J", 2 0 and an 2 0 are deﬁned as in equation 4.18 and 4.19 respectively.
Hence V,- {i = 1,2,3} are candidate Lyapunov functions, since V,- _>_ 0, Vi = 0 implies
that the target is covered. Based on the procedure for calculating the motion input
an to the cameras outlined in 4.3.1, we can show that z'Jn and an exponentially tend
to zero, which implies that V,- also exponentially tends to zero. Hence we can say
that the continuous evolution in the modes {q1, q2, q3} converges exponentially to the
equilibrium state.

We further note that the switching conditions G deﬁned in equation 4.20 have a
certain hysteresis associated with them due to the ﬁnite size of the non-empty target
set R and the non-empty overlap region W1 (1 W2. Hence we can prove that Zeno

executions are prevented, which completes the proof. [:1

Hybrid System Stability for the Two-Camera Example System without

Controlled Switches

Based on Deﬁnition 4.3.1 of internal stability and lemma 4.3.1, it can be deduced
that each state (except qO) is internally stable. We can now deﬁne stability for the

pervasive surveillance scenario as:

72

Deﬁnition 4.3.2 {Stability of a pervasive surveillance system) A pervasive surveil-
lance system is said to have the stability property if a target set (R) will be continu-
ously covered in the monitored region if it enters the coverage set (Ki) of one of the

sensors comprising the network.

Now, if we can prove that the set of modes {q1, q2, q3} is invariant, then any hybrid
execution (r, q, K) starting in these modes will always remain in these exponentially
stable modes, which implies that the target will be pervasively tracked. Based on
these conditions we can put forward the following theorem for stability of a pervasive

surveillance network.

Theorem 4.3.1 Consider the graphical MHA model of a pervasive surveillance sys-

tem in Figure 4.2. The set of operating modes {q1, (12, q3} is an invariant set.

Proof. Consider a hybrid trajectory 5 H = (T,q, K) of the system with its initial
conditions within the set {q,KIq E {q1,q2,q3}}. The modes {q1,q2,q3} are expo-
nentially stable, which implies that the energy of the system in that mode can be
reduced to a very small value in any ﬁnite amount of time t. For mode ql this implies
that when the target R C K 2 hits the boundary of the domain W1, its projection
flu/10?) C HW1(K2) is already covered by K1 as IIW1(K2) C K1. When K is inside
the boundary of W2, IIW2 (R) Q R C K1. Hence the target is within the coverage
region of K1 and will be recognized by K1 which will generate the event to set bl = 1,
which in turn will switch the mode to q3 (assuming W1 0 W2 75 0). Thus as soon as
the target moves out of W2 and into W1 it will be covered by K1. A similar argument
can be presented for the mode qg and the target hitting the boundary of W2.

Now in mode q3, the target is located in the intersection of the domains W1 and
W2 (fr c W1 n W2) and is being covered by both K1 and K2 such that fr c K1 and
If C K 2 due to the exponential stability of mode q3. Now the target can only move
into the regions W1 \ (W1 (1 W2) or W2 \ (Wl 0 W2) and in both regions either K1 or

73

K2 can cover the target, respectively. This implies that the system can switch modes
to q1 or q2, both of which possess the exponential stability (in the sense of Lyapunov)
property.

We have shown that all trajectories originating in ql and q2 can switch only to
state q3 and all trajectories originating in q3 can switch only to either q1 or q2. This
implies that the set of modes {q1, q2, q3} is invariant, which proves the theorem. El

Based on the above theorem, and the fact that all the modes in the invariant set
{q1, q2, q3} are exponentially stable (in the sense of Lyapunov), we can conclude that
the hybrid system is stable i.e., the target set is continuously covered once it enters

the invariant set.

4.3.3 Two-Camera Example for Pervasive Surveillance with Controlled

Switches

For the two-camera pervasive surveillance example considered in the previous section
4.3, the discrete variables were comprised of Q = (b1, b2, c) = {q} where b,- is a binary
value which denotes whether the target is visible to sensor Cam,- and c is a discrete
variable which indicates which cameras are involved in the tracking scenario.

This representation of the discrete state c is not very intuitive. Hence for a truly
distributed implementation, we propose to modify the discrete variable c and separate
it into its constituent parts. Each camera will be associated with its own 0,- which
will assist in a distributed implementation.

For the example system presented earlier, only the modes in which the discrete
variable c was set to 2 were considered. This implies that for all possible locations of
the target all the cameras will be servoing the target. This scenario will be wasteful
for energy consumption and does not take into account the fact that the sensors need
not track the target if it is relatively far away from the sensor’s domain.

Since the cameras need not be tracking the target continuously and can shut down

74

when the target is not close by, the MHA model with controlled switching will be
more energy efﬁcient. Figure 4.3 shows the MHA model of the two camera system
with controlled a controlled switch which is used render a camera into an inactive

state.

75

8:336 8:838 55 2.858 eHmeoéBu 2: 8m Ecoﬁ <32 EQEQSO ”m6 853m

 

NK&UM ‘NN:N

 

      
 
        

  
     
 
       

  

 

                  
            

vRUNN <§U- NNSUM
AN3.N¥VNSMN¥ <
:5 A$2¥V§m Cw :5
k a w: .
/ “VAC: 3 gesmw mambo s >\
< \RUN I_ . N N: N
3&5”st a- «CNN 3.3" s .. u MCNN $2.5st
Aizkﬁsm Cw Aizvbicm .M
CEMSmww thsmk
2.3.8 n s 2.3.: n s
aéﬁgé snﬂecwéuicw téﬁgss
N ... 33%.?
%VA¥V h ﬁHN¥C-<QH€~CN~ QHNMCN<QHGNCN~

       

       
 
       

. A c: .e

  

  

( «gut: : <¥U<¥
:15”st go o: anew
Om_VoN Amwuaukv~$MN« A_B._¥v_$m_vw
a...» Eases ass
. .... game . /

... Iv ... l
:33. s sinus E:

 

52.: AXaOJNOV H ow

sn~¥C-<eu .MCM

(

IN I—
ﬁl VNCNN<QI “CM Ambamﬁlbiﬁvuw

76

Discrete State Space (Q)

The discrete states are modeled as Q = {q, b1,c2,b2} = {q} where bn denotes the
visibility of the target from camera camn and an denotes if camn is active or inactive

for that state.

Domain (D) and Init Set (Init)

For the MHA model depicted in Figure 4.3 the domains of operation of the various

modes are modiﬁed as:

B((IO) = {K C MKKn C Wnlne{1,2}}
D(q1) = {K 6 MIX c W2 AdW1(f() < 6}
0(92) = {K E Ml}? C Un€{l,2}Wn}
D(q3) = {K e MIR c w1 /\ dW2(f() < a}
D(q4) = {K e ler c W1 /\ dW1(f{) > 5}

0(05) = {K E MIR C WiAdW2(f{) > 5}

where, 6 is a positive scalar distance whose value is selected by the system designer
based on the controller design and stability requirements of the model.

The Init set can be deﬁned as Init = {q,Kl Vi=1toN b,- = 1}}. This implies
that the Init set encompasses all the states where the target is visible to at least one

camera.

77

Edge Set (E)

The possible discrete transitions denoted by the edge set E are:

E = {(q1, (10), (ql, q2)(q2, q0)(<12, r13). ((12, <11), ((13, (12), ((13, (10).

((14, (10), ((15, (10), ((14, (11), (C11, (14), ((15, (13), ((13, (15)} (4-21)

Continuous Dynamics: lV’Iutational Equations (2)

The continuous dynamics in states q1, q2 and q3 are the same as in the previous
example presented in section 6.7.3. However for the sates q; and q5, the continuous
dynamics are modiﬁed such that the camera that can View the target will perform
image based Hausdorff tracking while the other camera will have zero input. Hence

there will be a third case such that on = 0.

Camera Inactive, (on = 0) When the camera is inactive, the input the the system

is set to zero i.e., un = 0.

Guard Conditions: Switching Control (C)

The various events recognized by the distributed sensors are:
0 Target visible to camera camn
0 Location of coverage set of all neighboring cameras.

Using the location of the neighboring cameras, the current camera can estimate the

approximate location of the target as shown in 4.3.1. The guard conditions for the

78

various transitions can be written as:

A

“(11.92) = {KCM3KCl/Vll

C(qsml = {KEMIKCW2}

C(q1,q0) = {KEMzR'nKlztlAKnK2=0}
C(q3,q0) = {KEMrIi’ﬂKlzzleKﬂKg-zlb}
C(q2,q0) = {KEMzKﬂKlzleKﬂngﬂ}
C(q4,q0) = {KEMzR’ﬂKlrﬁAKﬂnglb}
C(q5,q0) = {Kelei’nlemARanzw}
C(qwn) = {KEMIK0K1=0}

C(q2,q3) = {KEMin'ﬂKg-tlll}

G(Q3.<15) = {K E M I dW2(Kll > 5+ 6}
C(q5,q3) = {K e M : dWZ(K1) 4.) a}

001144) = {K E M I dW1(K2) '> ‘5 +6}

G(CI4:(11) = {K C M3d111/1(K2)> (5}

where, 6 > 0 is an arbitrarily small positive integer. A hysteresis is inherently as—
sociated with the switching conditions for every mode which will prevent inﬁnite

switching in a ﬁnite time period (zeno execution).

4.3.4 Stability Analysis of the Two Camera Example with Controlled

Switches

We can apply a two step procedure similar to the one presented in 4.3.2 in order to
analyze the stability of the mutational hybrid automata model presented in Figure

4.3. First we prove that each mode in the set q = {q1, q2, q3, q4, q5} is internally stable

79

as deﬁned in Deﬁnition 4.3.1. Next we show that the goal set of the mutational hybrid
system is invariant which would imply that a trajectory which reaches the goal set
will remain in the goal set. Further, by restricting the I nit set to the goal set, we can
show that all trajectories of the mutational hybrid system initialized in the I nit set

will remain in the goal set hence completing the proof.

Internal Stability

Lemma 4.3.2 Consider the continuous closed loop dynamics of the example two-
camera system represented in Figure 4.3 in the operating modes {q1,q2,q3,q4,q5}
and the guard conditions 4.22. The operating modes {q1, q2, q3,q4,q5} are internally
stable.

Proof. For each state in {q1, (12, q3, q4, q5} we can deﬁne a candidate Lyapunov func-

tion V,- as:

CI1=V1 = CJi+iJ2
(121V2 = z"71+CJ2
Q3IV3 = z'J1+z'Jz
(14IV4 = z'J2

<15=V5 = iJl

where, iJn 2 0 and 6.11; 2 0 are deﬁned as in equation 4.18 and 4.19 respectively.
Hence V,- {i = 1,2,3,4,5} are candidate Lyapunov functions, since V,- _>_ O, V,- = 0
implies that the target is covered. Based on the procedure for calculating the motion
input an to the cameras outlined in 4.3.1, we can show that z'Jn and an exponentially
tend to zero, which implies that V,- also exponentially tends to zero. Hence we can say
that the continuous evolution in the modes {q1, (12, q3, q4, q5} converges exponentially

to the equilibrium state.

80

We further note that the switching conditions C deﬁned in equation 4.22 have a
certain hysteresis associated with them due to the ﬁnite size of the non-empty target
set R and the non-empty overlap region W1 0 W2. We can also assume that the size
of the camera coverage set for any camera is larger than the size of the target which
implies that that dW1 (K 2) g dW1(K) when the target set is covered by the coverage
set K 2 of camera cam2. We can write a similar condition for the target being covered
by camera 1.Hence we can prove that Zeno executions are prevented, which completes

the proof. Cl

Hybrid System Stability

In order to prove that the target will always be visible to at least one camera at all
times, i.e., stability of the hybrid system, we need to prove that the set of modes
q = {q1,q2, q3, q4,q5} is invariant. This will imply that once the target enters this
set of invariant modes, i.e., the target is visible to at least one camera, the hybrid

trajectory will never leave the invariant set.

Theorem 4.3.2 Consider the graphical MHA model of a pervasive surveillance sys-

tem in Figure 4.3. The set of operating modes {q1, q2, q3, q4, q5} is an invariant set.

Proof. Consider a hybrid trajectory 5 H = (r, q, K ) of the system with its initial
conditions within the set {q, Klq E {q1,q2, (13,Q4, q5}}. The modes {q1, q2,q3,q4,q5}
are exponentially stable, which implies that the energy of the system in that mode
can be reduced to a very small value in any ﬁnite amount of time t.

In the mode q4, If C K2. Further as the target moves closer to the domain W1,
dW1(K2) —> 6 and as soon as the dW1(K2) < 6, there is a mode switch to mode q1.

For mode ql this implies that when the target Ii’ C K2 hits the boundary of
the domain W1, its projection HW1(R) C HW1(K2) is already covered by K1 as
HW1(K2) C K1. When R is inside the boundary of W2, HW2(F{) g Ff C K1. Hence

81

the target is within the coverage region of K1 and will be recognized by K1 which will
generate the event to set bl = 1, which in turn will switch the mode to q2 (assuming
W1 0 W2 # 0). Thus as soon as the target moves out of W2 and into W1 it will be
covered by K1.

Now within the mode ql, consider the target moving away from the domain W1.
This will imply that the distance dW1(F() is increasing which would in turn imply
that dWl (K 2) is increasing. When dW1(K2) > 6 + e, the system switches to mode q4.

At the instant when there is a mode switch from mode q4 to q1, dW1(f{) > 6.
Assuming that the maximum velocity of the target centroid is vmax it will take the
target t1 = 6/vmam time to reach the boundary of domain W1. From the continuous
dynamics system, the error CJ2 can be reduced to zero in a ﬁnite amount of time
t2 S 5a where a is the gain of the Hausdorff tracking control input. Now, if t2 < t1
i.e., 6 > 5avmax, we can show that the Lyapunov function V1 in mode ql will be
reduced to zero before the target set intersects with the boundary of domain W1.
Hence we have shown that for all mode switches out of mode q1, the Lyapunov
function of q1 is reduced to zero.

A similar argument can be presented for the mode q3 and the target hitting the
boundary of W2 or moving away from the domain W2 such that dW2(K 1) > 5 + 6.

Now in mode q2, the target is located in the intersection of the domains W1 and
W2 (K C W1 (1 W2) and is being covered by both K1 and K2 such that K C K1 and
If C K 2 due to the exponential stability of mode q2. Now the target can only move
into the regions W1 \ (W1 0 W2) or W2 \ (W1 0 W2) and in both regions either K1 or
K 2 can cover the target, respectively. This implies that the system can switch modes
to q1 or q3, both of which possess the exponential stability (in the sense of Lyapunov)
property.

We have shown that all trajectories originating in q1(q3) can switch only to state

q2(q2) or q4(q5), all trajectories orignating in state q4(q5) can switch only to state

82

q1(q3) and all trajectories originating in q2 can switch only to either q1 or q3. This
implies that the set of modes {q1, q2, q3, q4, q5} is invariant, which proves the theorem.

E]

4.4 Chapter Summary

This chapter presents the analysis of the Hausdorff tracking method and the muta-
tional hybrid automata model presented in Chapter 3. The Lyapunov theorem for
shape spaces is presented and it is used to prove the stability (in the sense of Lya—
punov) for the Hausdorff tracking method presented. The stability analysis of an

example two-camera pervasive surveillance system is also presented.

83

Chapter 5

Design of Surveillance Networks

Collaboration among multiple sensors can generally lead to better sensing perfor-
mance and higher fault tolerance to individual node failure. Calibration of the sen-
sors aids collaboration by establishing a common frame of reference for the various
sensors. Calibration needs to be handled at many levels including location calibration
of the sensors, color map calibration, calibration of the target recognition modules
etc.

Coordinating the tracking of multiple targets using multiple sensors can be ad-

dressed in two ways:

1. Use a central monitoring station to integrate the track information from each

node [39].

2. Allow sensor nodes to operate autonomously and exchange track information
with one another in a geographical vicinity and provide a handover mechanism

when the target transitions from one sensor ﬁeld of view to another [37].

The ﬁrst approach simpliﬁes track management and communications but is not
scalable to large networks due to the limited communication and processing capability
of the central monitoring station. Also, the system is not fault tolerant to the failure

of the central monitoring station. The second approach is highly scalable and fault

84

tolerant to the failure of nodes, but involves signiﬁcant communication overhead for

target track management and disambiguation.

5.1 Wide Area Camera Calibration

Calibration of the sensors in a wide area environment is a difﬁcult task without
physical access to the environment that the sensors are placed in. Normal techniques
of system calibration such as using precise planar grids [70] are not possible in a
previously unknown and inaccessible environment. Only the intrinsic properties of
the sensors can be calculated and accounted for prior to deployment. Therefore, a
robust calibration method that allows for a wide dispersion of sensors to be calibrated

must be used.

5.1.1 Unitless Calibration

An unsynchronized network of cameras can be calibrated by trackng an identiﬁable
point in motion. This provides a unitless orientation of all sensors that have sensed the
point. The sensor network must meet the requirement of viewing area overlap for each
camera with at least one other camera. Also, all sensors must be interconnected via
overlapping viewing areas of intermediary sensors. Without this interconnectedness
the network would be segmented into groups that could not be related spatially. This
method is ideal for our proposed environment for the following reasons: 1) with a
drop-in-place environment there is no assurance that sufﬁcient identiﬁable markers
will be available for automatic calibration of the type proposed by [71] [72] 2.) A
Kalman ﬁltering technique can be used to reduce the error between pairs of sensors
and the propagation of error across the network 3.) The world coordinate system can

be deﬁned with respect to any camera in the network including the mobile sensors.

85

Initial Calibration

It is desirable to have taken the intrinsic characteristics of the cameras into account
prior to deployment to reduce the number of actions necessary to calibrate the system.

These intrinsic properties are represented in matrix form as

an 0 v.0
A: 0 av '00 (5.1)

.OOIl

 

 

and includes the focal length (au, av), image center (uo, v0), and skewness (c) of the
image [73]. Focal length is the distance from the lens to the projection plane. The
image center is the center of the projected image on the projection plane. This allows
a more accurate pose to be generated. Finally, the skewness is the ratio of the pixel
width to pixel height and is used for correction of a flawed lens or to normalize wide
view images. We will assume that these characteristics are already well known for
each sensor in the network.

The extrinsic parameters of the cameras can be found using a point-in-motion
scheme. By identifying a detectable point in motion it is necessary, for every pair of
cameras viewing the point, to calculate both the cameras’ transformation matrices. A
camera’s transformation matrix is composed of the rotation matrix and the translation

vector for the camera with respect to another. This matrix is represented as:

T=[R t] (5.2)

The intrinsic and extrinsic matrices are then combined to provide a complete descrip-

tion of the mapping of a point in 2D image coordinates to 3D coordinates in the world

86

frame.

 

 

" T
' ' (I?
’U.
y
2
1
_ .. ‘1

 

 

One method of wide area calibration of extrinsic parameters is a point in motion
calibration technique. This method, proposed by [74], allows for a single identiﬁable
point to be passed throughout the observable environment. When all sensors have
detected the point while another sensor has the point in view simultaneously, a rough
estimate of the unitless calibration can be generated. This method is better suited
for wide area calibration because of the relatively large distances between sensors. In
order for a calibration object to be used to generate accurate results, resolution is very
important. The most accurate calibration takes place with the object ﬁlling most of
the screen. For a good calibration to be performed with a calibration object in a wide
area, it would need to be quite large to ensure the pose was properly modeled. This is
one of the strengths of the point-in-motion technique. The points can be distributed
throughout the volume of the coverage set, creating a virtual calibration object.

The scalar A is called the depth and accounts for perspective effects [75] and can
have any value. This ambiguity is a result of the loss of the z coordinate in relation
between the object in space and the object on the image plane. For any point (ui,v,-)

in the image plane i, a matrix T,- exists that satisﬁes

 

 

F .
.. ., (I:
“i
y
/\2 U2 = ATi (5 4)
Z
1
- - 1

 

 

where A,- is the depth at that point i.

87

The collected data is then processed assuming that times are synchronous for
each pair of positions and recorded for the sensors. Processing includes generating an
initial rough extrinsic calibration. Since no actual synchronization is used to gather
data, there will be time domain discrepancies in every pair of 2D points used to
ﬁnd location and pose. This error in the developing map is reduced later using an

extended Kalman ﬁlter [76].

5.1.2 Method for Assigning Units

In order to ﬁnd the scale factor associated with the calibration matrix it is necessary to
make some accurate real world measurements. With a drop-in-place system a number
of methods can be used to ﬁnd the scalar A associated with the calibration matrices.
In our proposed environment there can be no assurance that a known dimension will
be found or that an object with a known dimension can be placed.

We have implemented a pre—calibrated dual head camera rig. This rig can be
used to accurately assign units to the unitless calibration data that has already been
generated. An accurate calibration such as that proposed by Zhang [72] will have
been performed and the unit’s characteristics will be considered well known so that
minimal error is introduced into the scaling of the sensor map. All sensors in the
network of calibrated pairs that now have a shared normalized transformation can
have units assigned with only a single shared marker with one other sensor by the

pre-calibrated dual sensor rig.

5.2 Grouping Architectures

One important issue we need to address for the surveillance network is group forma-
tion and management. There is extensive previous research done to address forming

and managing ad hoc network groups based on the proximity of the sensor nodes. In

88

most surveillance tasks, spatial proximity of sensor nodes is a vital characteristic of
sensor networks. In order to take advantage of the proximity of the nodes to save
energy or enable efﬁcient sensing information processing, sensors are often organized
into local collaborative groups.

The group management approaches differ from each other depending on how the
target object is identiﬁed and the parameters are detected. In the work of Chen
et a1. [77], the target is detected based on the acoustic signal strength and a group
management algorithm is proposed based on the Voronoi diagram. Liu et a1. [78],
proposed a distributed group management framework for target localization appli-
cations. In their framework, each node runs a likelihood ratio test (LRT) once the
target is detected. Later the group formation and leader election process is initialized
based on the result of LRT. In RoamHBA (Roaming Hub Based Architecture, [7 9]), a
subnetwork is established among the sensor nodes that are observing the same target.
Communication within this subnet is done through a multicast tree, called a roaming
hub. The subnetwork is updated by adding or deleting nodes at the edge.

For the visual surveillance network, we propose to use the RoamHBA(Roaming
Hub Based Architecture, [79]), based group management approach. One group will
be set up for all the nodes in a particular locality. In most cases, it is necessary for
the tracking group to elect a leader to coordinate the tracking and surveillance task.
Previous research work addressed the leader election problem in a wireless sensor
network

The dynamically elected leader is responsible for coordinating the tracking tasks of
each node in the group. Periodically the group leader runs the coverage optimization
algorithm. A timer is set up to invoke the process of coverage optimization. The
value of the time is determined based on the capabilities of the sensor node and the
characteristics of the targets being tracked. For example, if the targets move very fast,

the timer should have a very short timeout, and at the same time, the requirement

89

for the capabilities of the sensor nodes are increased. The result of the optimization
algorithm, that is, the (multiple) target assignments, should be transmitted to the
member nodes using the networking transport services provided by the sensor nodes.
The actual transport services being used can vary based on the network infrastructure
on which the sensor network is built up.

It is reasonable to assume that the sensor nodes are equipped with geographical-
location-aware devices. Extensive research has been done about forwarding packets
in a wireless network based on the geographical location of the nodes. Different
from traditional wireless routing protocols, in location-based routing each node only
needs to know its own position and the position of its one-hop neighbors. This
makes location-based routing perform well in a dynamic environment such as a mobile
surveillance network. More details about location-based routing can be found in [80].

If the sensor nodes are geographically distributed and connected to the Internet,
common multicast protocols can be used for the group leader to disseminate the cov-
erage information to the group members. The group management can be done using
protocols like Internet Group Management Protocol (IGMP) and multicast routing
protocols can be used to set up the routes. In more common cases, where the sensor
nodes are connected using wireless networking technologies such as IEEE 802.11 series
technologies, group management techniques presented at the beginning of the section
can be used. The dissemination of the control information for sensor tasking can
be done using the multicasting or broadcasting techniques in wireless networks. For
access-point-based wireless networks, multicast among the group can be done with
the aid of the access points. For wireless ad hoc networks, much research has been
done to solve the multicasting problem. In the most straightforward case, a broadcast
simulated multicast, which involves the nodes boradcasting within their locality, can

serve the goal of a surveillance network.

90

5.3 Sensor Allocation for Multi-Sensor Cooperation

For successful completion of the surveillance task, it is important to allocate the
sensors to the targets that are actively being tracked. Generally, each target should
be tracked by the sensor that can view it at an optimal resolution for best visibility of
that target. However, when the system is heavily loaded, that is, the target density
in a particular region with a limited number of sensors is high, one sensor might
need to track multiple targets. Even when the system is underloaded, multiple target
tracking by individual sensors is highly desirable as viewpoint redundancy will lead
to better estimates of the target coordinates and also present the viewer a clearer
view of all the targets that are being tracked. In this section we propose a heuristic
method to determine an optimal sensor allocation strategy.

At each system interval, the leader node computes a cost and visibility weight
for the targets being tracked. Assume m sensors with cameras were setup in the
system and there are n targets that need to be covered. One camera can cover from
O to up to n targets, so for each camera, the number of possible coverage sets is
2220(2) = 2'". In order to determine the optimal coverage set for each camera, we
set up a weight matrix W for the coverage Options. An element of the matrix W is

deﬁned as (i E [1,m] and k 6 [0,2" — 1]):

C(L(k))
wik: Z Vi,L(lc)'(Rs,i,L(k)'wv+P3'wP) (5-5)
3:1

The following parameters are explained as follows:
0 L(k) is the kth combination of the set [1, n].

o ViJJUC) is a binary value (either 0 or 1) that indicates the ability for the sensor

node to cover all the targets in the set L(k).

o Rs,i‘,L(k) is the resolution of target 3 viewed by sensor node i when all the targets

91

in the set L(k) are covered by sensor node i. The resolution is calculated as
follows:

CC
RS,’l,L(k) = /\—3- (5-6)

zs,i
where x, is the width of the target 3 and 23,,- is the distance between target 3
and sensor node i. A E (Am-n, Am”) is the zoom factor (focal length) of the

camera of sensor node i.
a P5 is the preassigned priority of target 3 in the tracking system.

a my and wp are adjustable parameters to reﬂect the importance of resolution and

priority in the coverage weight.

For each row of the matrix, a larger value will indicate a more desirable target
assignment. However, it is required that all targets should be covered by the sensor
nodes. Thus, an optimal solution of the problem can be deﬁned as {(y,Ly)|y E
[1,n] and Ly is a combination of the set [1,n]} which satisﬁes that 23:1 Ly is the
largest among all the possible coverage set for each camera.

By brute force the complexity of the algorithm is 0(2m"). When n and m are
large, heuristic algorithms are needed to ﬁnd the optimal assignment. In reality,
many heuristic approaches can be taken into account when solving the problem. For
example, we can prove that under normal target density, the coverage matrix will be
a sparse matrix, where most of the elements are zero. Eirther, in most cases, we do
not need to ﬁnd the optimal solution. Finding a sub-optimal solution would reduce
the complexity of the algorithm dramatically. For example, after setting a threshold
of the acceptable weight, we can take the ﬁrst solution encountered as long as all the

targets are covered.

92

5.4 Optimal Hausdorff Tracking

Given the task speciﬁed by equation 3.31, notice that C (K, K) is not a square matrix,
which indicates that the system is redundant. Further the condition from Equation
(3.30) indicates that there are numerous choices of the input u which satisfy the
stability requirements of the Hausdorff tracking system. This implies that the system
has inﬁnite solutions for a choice of input u which will accomplish the task. Therefore
the designer can choose an algorithm in order to achieve an optimal solution for this
system. An appropriate choice for u will minimize the resolution inadequacies and
energy consumed.

A review on recent literature shows that a commonly used approach for the optimal
redundancy resolution relies on the use of generalized or pseudoinverse transformation
to degenerate the system, while optimizing the given task criterion constructed from
multiple objective functions using a weighted sum method [81], [82].

Optimality can be quantiﬁed by deﬁning supplementary task functions which the
system needs to accomplish. These functions are called objective functions which
need to be minimized in order to achieve an optimal solution. The optimization
problem can be expressed as:

Find the target and coverage sets (K and K respectively) and input u which
satisfy:

T

min: D(K,K,u)= D1(K,K,u) DN(K,K,u)
K,K,u

JOCK): :d2 (p)dp
Subject to: fK K (5.7)
J°(K,K)(<p1,c,92) = C(K, K)u g —aJ(K, K)

where, Di(K,K,u), i = 1 to N are the N objective functions, D(K,K,u) 6 RN

is a vector of these objective functions.

93

For the case of visual surveillance, the choice of objective functions can include

(but is not restricted to):
1. Energy consumed, which can be represented by ||u|]2.

2. Resolution constraints, which can be represented as Area_min g If? dp S

Area_max.

Various methods such as weighted sum approach and physical programming can
be used to solve the above optimization problem. It is proposed to use physical
programming for solving the constrained multi-objective optimization problem. The
signiﬁcant advantage of the physical programming approach is it capability of placing

the design into a more ﬂexible and natural framework.

5.4.1 Framework for Optimal Hausdorﬁ Tracking Design

Surveillance systems for target tracking are designed to perform tracking tasks with
different requirements under varying environmental conditions. Task requirements
vary even during the progress of a single task. The task criterion will vary for different
kinds of task requirements. Furthermore, a dynamically varying environment implies
the task criterion should be generated on-line. Thus, the new challenge for optimal
task distribution for the target tracking task is dynamic modiﬁcation of the task
criterion based on task requirements. A new dynamic multi-objective optimal task
distribution algorithm for a target tracking system employing Hausdorff tracking
method proposed in [67] is developed. The algorithm’s architecture is shown in Figure
5.1.

The dynamic optimal task distribution algorithm consists of three modules: task
analyzer, optimization scheduling, and optimal solution calculation. The task ana-
lyzer takes current condition of motion module, target state and working environment

as inputs, analyzes requirements of the current task, and generates a collection of pa-

94

Optimizer

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A Environment
Sensors Task Information Sensors
l Analyzer = 1
A Task A L
Indices tr
Robot . . . R°b°t
Sm, Optimization States
Scheduling
Task Performance
RObOt Criterion RObOt
1 V 1
Optimal l
Optimajlask Solution Optimal Task
Distribution Calculation Distribution

 

 

 

 

 

 

Figure 5.1: The architecture of the optimal task distribution scheme.

rameters called as Task Indices (TI) to represent task requirements. The task analyzer
passes the task requirement information, TI, to the optimization scheduling module.
Based on T1, the optimization scheduling module builds the task criterion function
for the optimal task distribution. The optimal solution calculation algorithm then
ﬁnds an optimal solution for the criterion.

The optimal task distribution system may be located anywhere on the network
and can also be incorporated as a separate subsystem in one of the sensors. The
current environment conditions and state of the target and various sensors are input
to the optimizer over the network. The optimizer calculates the optimal solution as
a task distribution amongst the various sensors and robots involved in the tracking
task. This task distribution is represented as a vector of additional inputs to the

various sensors and is communicated to the sensors over the network.

95

It should be noted that in the presence of random network delay, which will affect
the timely communication of the task distribution vector, the various sensor nodes
can still perform the tracking task assigned to them, albeit in a less optimized fashion.
This is because the optimizer only assigns an optimal task distribution to the various
sensors and not the complete tracking task.

The various parts of the optimal task distribution module are described in the

following sections.

Task Analyzer

Different kinds of tasks have various requirements which depend on many factors
such as the nature of the task, the targets being tracked, the location and state of the
various sensors involved in the tracking task, the obstacles in the working environment
etc.. A complex task may have varying requirements even during execution of the
task. The task analyzer takes as input the various stated of the task including the
location and states of the various sensors, targets and obstacles and provides a task
index vector as an out which represents the requirements of the task in a concise

form.

Task Indicators

For optimal performance, the task criterion function should be dynamically modiﬁed
according to time-varying task requirements. An on—line task analyzer is proposed
to analyze task requirements. Using the information from the sensors and the work-
ing environment as inputs, a collection of task indicators is generated online, which
represent the actual conditions of the tracking task being performed. Using a fuzzy
inference scheme, the task analyzer maps the task conditions represented by task

indicators u, into TI, which represents task requirements. The task indicator vector

96

u 6 RM (set of all task indicators) is deﬁned as:
u={u1,u2,~-,uMquiEU, i=1:M}T. (5.8)
where, M is the number of task indicators.

Task Indices The Task Indices (T 1) vector is used to mathematically describe
the task requirements. These indices are generated by a fuzzy mapping which takes
the task indicators as input variables. Figure 5.2 shows the structure of the Task

Analyzer. Various task requirements have their own physical and functional meanings

Task Analyzer

Indicator Analyzers for:

 

 

equirement Class

 

 

   
  
 

 

 

 

 

 

Task
Robot States I d . In di 0 e s
Sensory Info equirement Class i n 6’“
Operator Cmds. i

 

 

 

equirement Class 2

 

 

C Index 1
equirement Class 1

 

 

 

 

Figure 5.2: The structure of the task analyzer module.

according to which they can be categorized into classes. For example, various classes,
such as maintaining the manipulability of the motion module, singularity avoidance,
conserving energy, maintaining adequate target resolution, etc., can be used.

For each class of task requirement, we generate one Task Index. For all task

97

requirements, a vector of Task Indices is generated:

Index = {Index1,1ndex2,---,IndexK}T,

Al

, ai,j9i,j(”j) (5.9)
Indexi = f,-(u)=‘7:1

 

.M
Z “232'
3:1

where I ndexi is the indicator analyzer for the ith requirement class. am- is the relative
weight of task indicator uj in the task requirement class i, 9231' is the mapping function
of the task indicator uj for the task requirement class i. Equation (5.9) maps the task
indicator value into the extent of task requirement. The relative weights are assigned
based on the physical meaning and importance of a particular task indicator within

a particular task requirement class.

Optimization Scheduling

The TI vector, generated in the task analyzer module, is the input to the optimiza-
tion scheduling module. This module generates the task criterion based on task
requirements represented in the TI using physical programming. The structure of
the optimization scheduling module is shown in Figure 5.3. Taking vector Index as
the input, a task criterion function is generated. This module consists of two parts:
physical programming and PP (Physical Programming) analyzer.

Physical programming, which is a multi-objective optimization approach, maps
the collection of objective functions into a utility function. Since the task requirements
vary according to transformation of tasks and environment conditions, the mapping
model for physical programming should change with the task requirements. Therefore,

the PP analyzer is built to analyze the physical programming mapping model based

98

 

 

4:1 Preference
_. Mapping for
Objective

Fuﬂon 1 Task

 

 

 

 

Performance
Preference Criterion

Mapping for ,
Objective
Function 2

 

 

Index PP Conﬁdence 52

 

 

V

Analyzer Vector

 

 

 

 

 

 

 

6 Preference
n Mapping for
Objective

Function N

 

 

 

 

 

 

 

 

 

 

Figure 5.3: The structure of the optimization scheduling module.

on the TI. A conﬁdence vector 3 6 RN,

E:{C17623”' iCNlT (510)

is generated to represent the degree of the system designer’s satisfaction about the
physical programming mapping model according to the current task’s conditions.

Finally, physical programming mapping is modiﬁed based on 3.

Physical Programming Optimal task distribution for Hausdorff tracking using
multiple robots is a multi-objective optimization problem for optimizing a vector of
objective functions, D, in equation (5.7). Usually, the system designer maps the
collection of multiple objective functions C,- into a utility function Z G R, which we

call the task criterion, using a mapping P : RN ——> R as:

99

D,- is the objective function for the ith task requirement. The optimal solution is
achieved by choosing the system variables [u, p] which calculates the optimal value
for the criterion Z, where u is the input velocity of the various axes and p is the state

of the sensors and targets.

Weighted Sum Method The weighted sum method is the most popular mapping
method for multi-objective optimization problems. In this method, the task criterion
is formulated as a weighted sum of the objective functions, where the system designer
assigns a relative importance (weight) to each objective function. However, the task
criterion obtained by this method lacks physical meaning. Misinterpretation of the
theoretical and practical meaning of the weights can make the ﬁnal solution unsatis-
factory. Although there are many methods for choosing weights, a priori selection of
weights does not necessarily guarantee the acceptability of the solution [81]. Further-
more, it is impossible to obtain a solution on the non-convex regions of the Pareto
frontier [83].

Thus, physical programming [84], is used to generate the task criterion. Using
this method, the system designers only need to specify a preference structure for
each objective instead of assigning a weight which may be meaningless. This usually
has more physical meaning and can better guarantee a satisfactory solution. Using
physical programming, the system design can be put into a more ﬂexible and natural
framework. Another advantage of physical programming is that it can obtain the

solutions in the non-convex regions of the Pareto frontier [83].

Preferences Mapping Using the physical programming approach, objective func-
tions are mapped into a preference space. The preference is a parameter which rep-
resents the extent of the designer’s satisfaction. The task criterion is set up as the
aggregation of the preferences of all the objective functions. Thus, instead of being

physically meaningless, the criterion becomes a satisfaction factor of the solution.

100

The designer’s expression of the preferences with respect to each objective function
can be categorized into four different classes: smaller-is—better (1S), larger-is-better
(2S), value—is-better (BS), range-is-better (48) as in Figure 5.4. According to Messac’s
research [85], the mapping is depicted as Figure 5.4. In this Figure, u,- is the value of
the ith objective function, P7; is preference, a real positive scalar, of the ith objective
function. Each objective belongs to one of these classes. [83—85] provide a more deeper
analysis of physical programming.

According to the degree of the desirability, the objective function values are cat-
egorized into several regions as seen in Figure 54. Taking the case of Class 1S as

example, the preference ranges are:

Ideal range (#2 S U21)
Desirable range (v11 S m S 1’2'2
Tolerable range (”012 S i S 7123
Undesirable range (

Highly Undesirable range (viz, _<_

Unacceptable range (”Dis S i)

The parameters, ”2’1 to vi5, are physically meaningful constants which express the
designer’s preference associated with the ith objective. For physical programming,
higher the desirability of the objective, the lower the value of preference. Therefore,
physical programming problems are always of the minimization type.

The class functions for preference mapping should have several important prop-
erties such as nonnegativity, continuity and convexity. Further, the preference value
P at range intersections (e.g., Tolerable-Undesirable) is the same for all class types

and objectives.

101

HU — Highly Undesirable
HD — Highly Desirable CLASS'l S ‘1’. (ﬂ )

UA — Unacceptable
U - Undesirable
T - Tolerable
D - Desirable

  
 
 
 
 
 

 

  

HD

CLASS-28

131-(#1.)

 

 

CLASS-38

  
 
 

 

 

 

 

 

 

 

 

 

 

 

 

UA HU u IT u HU‘UAﬂ.

l

”i5L ”i4L ”13L ”£21. ”i1 ”i2R ”i3R ”i4R ”i5R

CLASS-4S
PI-(ﬂi)

\ / UA

D HD ,

UA HU u T 1“ ’10 I T u Hub”:
”i5L ”i4L ”:31. ”i2L ”i1L ”i1L”i2R vi3R ”i4R ”i5R

Figure 5.4: Class functions for physical programming.

102

Physical Programming Formulation Based on the preference mapping of the ob-

jectives, physical programming problems can be stated as:

A!
1111112 2 Z P,- (gi(x))
x i=1
subject to:
9100 S 015 (for class IS)
9i(x) 2 112-5 (for class ZS) (5.12)

’Uz'5 L g 9,-(x) g 15:51; (for class BS and 48)

where x is the system variables, M is the number of the objective functions, gi(x) is

the ith objective function, P,- is the preference class function of 92-.

Physical Programming Analyzer (PP Analyzer) The preference structure
(say, class type and 211 to ’05) for each objective function must be speciﬁed to set
up the global criterion function. However, due to changing task requirements, the
initially speciﬁed preferences for the objectives may not represent the objective’s de-
sirability accurately at later time instants. Therefore, physical programming should
be modiﬁed according to the task requirements.

A PP analyzer module is built to handle this problem. The PP analyzer takes
the Task Dexterity Indices as inputs, and generates a vector E = [$1, - -- ,EM]T to
represent the degree of designers’ conﬁdence about the preference structure speciﬁed
for the objective function. It passes this conﬁdence vector to the physical program-
ming module. The latter module will modify its result according to the conﬁdence

parameters. The analyzer is formulated as:

E,- = (ai + k,- x indexi) i: 1, - -- ,]W (5.13)

103

where ai is a constant which represents the system designer’s conﬁdence about the
original preference mapping function, It, is a constant. The criterion generated by
physical programming will be modiﬁed as:

M

Z = 26.- >< 1).-(am) (5.14)

i=1

Equations (5.13) and (5.14) show that when indezri increases, the preference value

 

 

 

 

P. ‘ index; = 0.8
' 5,. = 0.9
index, = 0.2
6,. = 0.6
sz
#‘1 ' D
v2 v2
.11:

 

Figure 5.5: Modiﬁcation of the preference mapping by conﬁdence factor.

of the ith objective function will increase. Thus, when the task has high preference
on the requirement class i, the objective function D,- should have a higher preference
for further minimization. Thus, the task requirement condition governs the path of
optimization.

From another viewpoint, multiplying a factor with the preference mapping func-
tion can be treated as modifying the structure of the mapping. Taking the class 18 as
an example (from Figure 5.5), if the preference value at the range intersection (say 202
Desirable-Tolerable) is kept unchanged, the desirability ranges of the objective func-

tion change with the task index (conﬁdence factor). In this Figure, index’ > index

104

will induce 112' < '02 which makes the Tolerant range closer to the ideal range.
Therefore, combining equations (5.7), (5.12), and (5.13), and replacing g,- by Di,

and x by u, p, the optimal Hausdorff tracking problem is formulated as:

M
2511112 = 2 £1 X Pi (Di(u,P)) (5-15)
'1’ i=1
subject to
—aJ 2 Cu
D- u, S v- for class 18
i( P) 15 ( ) (5.16)
Di(u, p) Z 1),-5 (for class 28)

21,51, 3 D,(u,p) _<_ "oz-53 (for class 3S and 48)

where, a is a positive deﬁnite gain and J is the vector of shape functions described
for a speciﬁc task. The system constraint equation -on = Cu is derived from the

Hausdorff tracking task controller derived in [67,68].

Optimal Solution Calculation

The next step is to ﬁnd the optimal solution for the optimization problem speciﬁed by
equations (5.15) and (5.16. In this formulation the system constraint -aJ = Cu takes
the form of the linear equation. Therefore, we can use the generalized elimination
method [86] to eliminate the system dependent variables: suppose that there exist
matrices Anxm and an (n_m) such that [A B] is non-singular. Usually, for the single
target Hausdorff tracking task, n = 6 and m = 1. If matrices A and B satisfy the

following equations:

CA=I
CB=0,

(5.17)

105

the system solution is formulated as:
u S —aAJ + B6 (5.18)

where, 6 denotes the Null space of the matrix C.

The reduced optimization problem is shown as:

M

minZ = Z 5,- X P,- (Dz-(u,p)) (5.19)
5 i=1

subject to
D,-(u,p) _<_ 212-5 (for class IS)

Di(u,p) 2 222-5 (for class 28) (5'20)
1),-5L S Di(u,p) S "oz-53 (for class 33 and 4S)

where, u g —aAJ + Be
(5.21)

p = 13 + U/ f
13 is the initial value of the system variables p, f is the control frequency, 6 is and
arbitrary vector and A and B are as deﬁned in Equation (5.17). Using an online

pattern search algorithm [82], the system can easily obtain the optimal solution.

5.5 Switched Video Feedback

Video feedback is an essential component of the surveillance system. The task of cap-
turing, transmitting and displaying the live video stream from the various sensors to
the requesting clients is handled by the video subsystem. Since multiple cameras are
deployed to track the identiﬁed targets, multiple, concurrent feedback video streams
may be required for monitoring the target. These sensors initiating these streams
will be changing from time to time as the target moves out of range of the current

sensors tracking it. However, providing multiple unnecessary (unrelated to the task)

106

 

 

  
 
     

Sensor node 1 video server

 

 

 

 

 

 

 

 

 

 

 

 

Client display

 

Figure 5.6: Switched video architecture implementation

video feedback streams often causes loss of attention span of the operator and makes
it hard to keep track of the various activities over the cameras. Hence only video
streams from relevant sensors should be presented to the operator on a per-activity
basis. This is done through automatic or manual switching of the camera streams
that are presented to the operator.

The system is designed to support video feedback to multiple clients over an IP
network. The video service is designed to support various types of live video stream
feedback from the sensors to the individual clients such as MJ PEG and H.263. Various
resolutions such as CIF, QCIF and 4CIF are supported for the live video streams.

A switched video client and server architecture have been implemented as shown

in Figure 5.6.

107

The various properties and tenets of the design of the switched video feedback

system are

0 The client should be able to switch the video stream between multiple servers

in order to track a target as it ’handed-off’ between multiple servers.

0 The client should also afford the viewing of video streams from multiple servers
simultaneously in order to display multiple tracked targets or multiple views of

the same target.

0 The client should receive selected video feedback based on the commanded task
which will not tend to overwhelm the operator. This selection of the feedback

streams should be done automatically.

In order to transmit real—time video over a network, it is necessary to convert the
output of the encoder into a packet sequence - i.e., packetize it. Traditional TCP and
UDP services are not sufficient for real-time applications, and applications-oriented
protocols such as the Real~time T1 :13". sport Protocol RTP provide an alternate solution
for managing the essential tradeofls for quality and bandwidth. RTP provides end-
to-end delivery services such as payload type identiﬁcation, sequence numbering and
time stamping.

The proposed architecture has been implemented on a PC running Linux OS.
The Qt toolkit [87], which is a complete C++ application development framework,
including a class library and tools for cross-platform development, has been used to
develop the client and application. A Windows-only client has also been implemented
using Microsoft Visual Studio 2005. The current implementation of the user interface
is shown in Figure 5.7.

We have implemented the MJPEG, H.261, H.263 and H.264(hardware based)

video compression schemes which are packetized using RTP / U DP transport protocol

108

 

SW l

 

 

 

 

 

 

 

(0‘ mmtdswdmviduamﬂ
(‘ mmwwmmm
Mun

 

 

 

 

 

Nova-awa-
:13; m am.
Em .———J—'-
roam -—-J—
swam —J—
M -.-.——J—

atg-s ].
Mn ] _1 Ma]

_1

_l

WM~-*‘— 'P'“ .144 "ﬁr-1.1.1.!

Gd“; Milan] mal; :l WT— _l
_J

m “' TM ator: record Calm Cum Nun
.___J"’W: |——'Mﬂ——l [—,

Figure 5.7: H.624 Based Switched Video Streaming Client.

109

for transferring the surveillance video to the client. Characteristics and implemen-
tation details of the video subsystems implemented are summarized in the following

discussion.

5. 5.] Characteristics and Implementation Details for MJPEG

MJ PEG stands for Motion-JPEG and is a technique which simply performs a J PEG
compression on each video frame before transmission. Unlike MPEG and H.263
codecs, MJPEG does not support temporal compression but only supports spatial
compression.

The main advantages to using this approach is that J PEG compression can be im-
plemented relatively easily in hardware and it supports a wide variety of resolutions.
This implies that a wide variety of hardware can be supported in case of a network
with heterogenous video capture hardware. Phrther, it uses no inter-frame compres-
sion which results in low latency in the video transmission system. However, the
major disadvantage for using MJPEG technology is its inefﬁcient use of bandwidth.
Due to the lack of inter-frame (temporal) compression, MJ PEG streams require a high
bandwidth of the order of 2 Mbits/s for a 30 fps NTSC resolution stream. Though
at lower frame rates and lower resolutions MJ PEG can be used effectively, its use

cannot be justiﬁed in low bandwidth applications such as wireless sensor networks.

5. 5.2 Characteristics and Implementation Details for H.261

H.261 is a video coding standard published by the International Telecommunications
Union (ITU) [53]. It supports CIF (352x288 pixels) and QCIF (176x144 pixels)
resolutions. It supports both temporal and spatial compression for reducing the size
of the encoded image. The coding algorithm is a hybrid of inter-picture prediction,
transform coding, and motion compensation. The data rate of the coding algorithm

was designed to be able to be set to between 40 Kbits/s and 2 Mbits/s. The com-

110

pressed video stream is structured into a hierarchial bitstream consisting of four parts,
namely: (1) Blocks which correspond to 8 x 8 pixels; (2) Macro blocks which corre-
spond to 16 x 16 pixels of luminance and two 8 x 8 pixel chrominance components;
(3) Group of blocks (GOB) which corresponds to 1 / 12 of CIF picture or 1/3 QCIF
picture; (4) Picture layer which corresponds to one video frame.

The H.261 standard actually only speciﬁes how to decode the video stream. En-
coder designs are constrained only such that their output can be properly decoded by
any decoder meeting the standard spec [53]. In our implementation, based on the Vic
software, the H.261 bitstream is encoded such that macro blocks that have changed
signiﬁcantly from the previous frame are updated in the current frame. Also, each
macro block is updated at least once every 32 picture frames using a block aging

process.

5. 5.3 Characteristics and Implementation Details for H. 263

H.263 is video coding standard by ITU [54]. It was designed for data rates as low
as 20 Kbits/s and is based on the ITU H.261 standard. It supports 5 resolutions
(CIF, QCIF, sub-QCIF, 4CIF and 16CIF, where CIF is standard 352x288 pixels
resolution). Like H.261, it uses both temporal and spatial compression and also
provides for advanced coding options (not included in H.261) such as unrestricted
motion vectors, advanced prediction and arithmetic coding (instead of variable length
coding) for improvement of video quality at the expense of video codec complexity.
It allows for ﬁxed bit rate coding for transmission over a low bandwidth network as ‘
well as variable bit rate coding for preserving a constant image quality and frame rate
for storage and transmission over high bandwidth networks.
Each picture frame can be coded as either an ‘I’(intra), ‘P’(predictive), ‘B’(bidirectional)

or ‘PB’ frame. ‘I’ frames are intra—coded reference frames which contain all the infor-

mation needed to initialize the decoder and display the image. ‘P’ and ‘PB’ frames

111

are inter-coded frames which require information from other frames at the decoder in
order to decode them. In our implementation for the H.263 codec, which is based on

the Vic software, a completely intra—coded ‘1’ frame is transmitted every 10 seconds.

5.5.4 Characteristics of H.264 / MPEG4-Part10

A recent development for video coding is H.264/MPEG4 Part 10 [55], also named Ad-
vanced Video Coding (AVC), which is jointly developed by ITU and ISO. H.264/MPEG-
4 supports video compression (coding) for realtime video delivery over low bandwidth
networks. In contrast to the frame based coding in MJPEG, H.261 and H.263+
schemes, the coding in MPEG4 streams is object based where each scene is composed
of objects, which are coded separately. This object based coding structure can be
effectively used for video surveillance systems [88]. However, current software im-
plementations of the coding scheme have been tested to have best-case capture and
compression latency of the order of 500ms - ls which precludes their use in a realtime
video scenario.

We have integrated a commercially available hardware based H.264—baseline and
H.264-enhanced encoder in our switched surveillance implementation which supports
a frame resolution of up to 1080x768 pixels2 at a frame rate.of up to 30 Frames/ sec.
The hardware encoder supports constant bit rate encoding as well as variable bit rate
encoding. For a frame rate of 30 fps and an image resolution of 800x600 pixels2
with variable bit rate encoding and a tracking scenario involving camera motion, the
baseline encoder has a bit rate of approximate 500 Kbps. The interface developed is

shown in Figure 5.7.

112

 

  
 

  

Sensor

 

 
     

 

 

 

 

 

ﬁat—— Information
lon :’ ........................ , ...........
Module ] I 3 Sensor Nodes Information
Table about other
Target into No— .

 

 

 

 

 
  

 

  

 

 

 

 

 

 

 

Image based ’ Module Wide
’ ‘ Yesl
Trackm, 5 Targe fromother
S , Target Observations Table NM”
ensmg - -
Data (Videorl Cmmm Video Packetize Video
'Qmpressio for transport j

Motion active FOV . I t T etLocation - -
arsgo am 1 mutations
[TI—>1
Figure 5.8: Architecture of Sensor Node.

 

 

 

 

 

 

 

 

5.6 Sensor Node Architecture

Figure 5.8 depicts the general architecture of a sensor node. The individual sensor
nodes maintain information regarding the observations of their neighboring nodes
and broadcast (within their locality) their own observations. Based on the combined
observations, each node develops a list of targets being actively tracked and the status
of its peer nodes and stores this information in the targets table and the sensor nodes
table, respectively. In the targets table, the native as well as observed characteristics
of the target objects, observed by the respective sensors, are stored. The targets table
also stores information indicating the node that sensed these characteristics. Nodes
also store peer information, such as location, active FOV and total capable FOV of
the peer.

When the target to be tracked is outside the active FOV of the sensor (but in
the capable FOV), the node can still ensure that the target is acquired in the active
FOV using target location information available from other sensors in the targets
table. This paper proposes the method of cooperative Hausdorff tracking for deriving
assumptions on the input a to the robot/camera to bring the target into the active
FOV of the sensor. Using the cooperative Hausdorff tracking method the target can be

brought into the active FOV of the sensor, and assuming that the visual characteristics

113

of the target can be recognized, the sensor will switch over to image-based Hausdorff
tracking to maintain the target in the active F OV.

The sensed video information is broadcast to the requesting clients using the video
subsystem, which consists of a video server that compresses and transmits the cap-
tured video using either MJ PEG or H.263 bit stream mounted over the RPT/UDP/IP

transport protocol.

5.7 Modeling Switched Video Feedback as a Dynamic Sys-

tem

The conﬁguration of the cameras C can be defined as the collection of all the param-

eters of the cameras involved. That is:

C={Cl,C2,...,CN} (5.22)

where, C,- 6 RP is a vector containing all the intrinsic and extrinsic parameters of the
camera. The words “layout” and “placement” will also be alternatively used with
camera conﬁguration but they will reﬂect the internal parameters of the camera, too.

Consider a monitored region as a domain E C R" under surveillance. A ﬁnite
number (N) of cameras is distributed in this region and involved in the surveillance
task of maintaining visibility of a target as it moves within the monitored region.
The monitored region E can be discretized on a ﬁnite-dimensional grid 9 = {WW} 6
IR”,i=1,..,g} which consists of a ﬁnite number, 9, of vertices Vi. Note that grid 9’
can be generated from the domain E using various approximate cell decomposition
methods where the cells have a pre—deﬁned shape and size in order to achieve a certain
resolution.

Consider any continuous-time target trajectory in two dimensions kc(r) : 1" C +—->

R2. For every 7' 6 Fc, kc(r) provides as an output the location of the target. This

114

continuous time trajectory can be approximated as a discrete-time ﬁnite-dimensional

map on a grid as

k(t) : r H ug (5.23)

where, F = Z and 119 represents the ﬁnite-dimensional space of the locations of the
vertices V,- of the grid 9.

The dynamic sensor switching problem can be modeled as a discrete—time, ﬁnite-
dimensional dynamic system. The state space of this system is the ﬁnite set of camera
states deﬁned as: Q = {q,-lg,- = (i,C,-),i E (1, ...,N),C,- E C} where N is the total
number of cameras and C is space of conﬁgurations of all cameras.

Using the above deﬁnitions we can now describe the model of a dynamic sensor

switched system as a ﬁnite-dimensional discrete-time dynamic system V:

v =[Q7F1u7¢]

(b:QXPXUgXU1l—>Q (5.24)

where U := Hg x U1. Here, a 6 U1 is the vector of control inputs while k(t) 6 Mg, t E
I‘, the sequence of grid locations in 9 that the target visits, are the reference inputs
to the system. The reference inputs evolve according to a predeﬁned function of time
t and depend only on the target motion, which may not be known a priori. The
control inputs are used to control the trajectory of the system to ensure the target is
covered and can also be used for optimization of the various metrics being used.
The system V may not be complete, as all the transitions between various cameras
may not be enabled at all times, based on the visibility constraints of the various states

for the target locations provided.

115

5. 7.1 Examples of Video Feedback Algorithms - Best Resolution

Consider a set of N cameras Q = {(i, C;)|i = 1, ..., N C,- E C}. The sensor selection
strategy is dependent on the resolution sustained by the target at each of the cameras
and the one with the best resolution will be selected to provide feedback to the human
operator.

This sensor selection strategy takes into account only the best resolution sustained
at all the cameras, which implies that the control input 11. is based solely on the camera
conﬁgurations {C2} and the target locations k(t). The strategy does not depend on
the current camera tracking the target - i.e., sensor switching costs are not taken into
account.

In order to implement this strategy in the dynamic systems model, we deﬁne a
function RN33 : Hg x I‘ +—> U1 which maps the locations of the target at various
time instances to the space of the control input variables it based on the resolution
sustained by the target at each camera. Notice that R is not a function of the current
state of the dynamic system - i.e., the current camera. This implies that the system
does not have memory.

The above algorithm can be implemented as follows. Let qbest represent the cam-
era that has the least distance to the target; i.e., under the assumption of homogenous
camera sensors, Qbe st can view the target with the best resolution. For a given target

location k(t) and time t, Qbest can be written as:

ant) = cm i=rr11tigNdist(Cz-(t), km). visib1e<q<a km) == (525)

Using the deﬁnition of qbest the next sensor to switch to i.e., q(t+ 1) can be written

85: (10 + 1) = Qbest(k(t+1))-

116

5. 7.2 Example Video Feedback Algorithm - Persistent Camera

In the best—resolution—based video feedback algorithm, the current camera tracking
the target was not taken into account in the input to the dynamic system. Switching
cameras in a surveillance network can lead to the disorientation of the human operator
and is also generally associated with a switching time delay. Hence we should minimize
the number of switches.

In order to implement this strategy, deﬁne a function Rper : Q x Hg x F H U1
which maps the current tracking camera and the locations of the target at various
time instances to the space of the control input variables a. This implies that the
state of the system, q(t), is used along with the resolution in order calculate the
control input to the dynamic system.

The above algorithm can be implemented using the deﬁnition of qbest from the

previous section. The sensor selected at the next time step q(t + 1) can be written as:

q(t + 1) =
min(dist(Cbest(t + 1), k(t + 1) + e, dist(q(t), k(t + 1))),
if visible(q(t), k(t + 1)) = 1
Qbest(t + 1)
if visible(q(t), k(t + 1)) = 0

(5.26)

5. 7.3 Assessment Metric for Video Switching Based on Camera Loca-

tions

In sections 5.5 and 5.5, a switched video feedback system for realtime video feedback
to the human operator to assist in the monitoring and decision making task has been
described. However, the constant switching of the video stream can be quite disruptive
for human cognizance of the remote environment and adversely affect decision making.

Hence the system should try to reduce the number of video feed switches required for

117

the same tracking task.

The video feedback switch is based on the capability of the tracking cameras to
visually resolve the target and discern its features - i.e., the resolution of the target
sustained by the cameras. This switching is directly inﬂuenced by the locations of the
cameras in the scene. In the case of manually deployed camera conﬁgurations, the
system designer must take into consideration the adverse effects of video switching and
must decide the camera locations in order to minimize the switches while continuously
tracking the target. In this section we propose a metric as a performance measure of
the video feedback switching algorithm, based on the conﬁguration of the cameras.

The switching metric M 6 72+ maps the conﬁguration space of the video algo-
rithm [89] (which includes the previous feedback camera) and a scalar potential ﬁeld
over the conﬁguration space of the video algorithm to a positive scalar. Given the
conﬁguration of the cameras C, the video algorithm can be thought of as a map—
ping from the combined space of current feedback camera and target location to the
feedback camera space. The scalar potential w(p) E R+,p E E provides a relative
importance to the current target location and can be chosen to bias the importance

of the target locations.

M = [F Wane) 90(1)) df (5.27)

where, f E F C Q x E and p E E. VV 6 [0,1] is the spatial derivative of the
output of the video feedback algorithm and represents at which points in the video
feedback conﬁguration space F the output of the video feedback algorithm changes.
It is akin to edge segmentation in image processing [90]. Integration of these points
over the conﬁguration space represents the number of switching surfaces present in

the complete space E. A lower number of switching surfaces will imply that the

118

feedback camera location does not switch a lot with the free motion of the target.
The term cp(p) is just a scalar potential which reﬂects the importance of the particular
point p E E and can be used to bias the surveillance space E.

The switching metric M in conjunction with other metrics such as target reso-
lution at various locations in E can be computed for a large number of randomly
generated conﬁgurations and a sub-optimal solution can be derived for the placement
of the cameras. Owing to the large conﬁguration space of the deployment algorithm,
various evolutionary computation and optimization schemes can be utilized in order

to minimize the total placement metric.

5.8 Optimization for Target Tracking using Switched Sensors

Given the scenario having a large number of static cameras distributed in an envi-
ronment with signiﬁcant overlap of their viewing regions, the problem is to identify
the minimum number of cameras required to track a given target trajectory.

We propose to use dynamic programming as an optimal control strategy in order
to minimize the total number of cameras required to view the target with adequate

resolution.

5. 8.1 Optimal Control and Dynamic Programming

Consider a ﬁnite dimensional discrete-time dynamic system V and function b : X x
F x U 1—+ R+ that takes on non-negative real values. We can now deﬁne a trajectory

cost function as:
r—l
B(r,o,:r,w) = Z b(i,€(i),w(i)) (5.28)
i=0

where, w is a sequence of inputs and g = cp(:r,w) is a sequence of states of the

dynamical system given the initial state :1: and a sequence of inputs w. The trajectory

119

cost functions is read as “the cost of the trajectory produced by the control sequence
w E Lila”) if we start at state 1:.”

An optimization problem can be stated as: Given a discrete-time ﬁnite dimen-
sional dynamic system V, a trajectory cost function [3 and a pair of times a < r,
and an initial state 2(a) ﬁnd a control sequence to admissible for state 17(0) which
minimizes B.

More generalized problems can be introduced which require the total cost to be a
function of the ﬁnal state or by adding constraints to the ﬁnal state, etc.

In order to ﬁnd a control input sequence no that minimizes the trajectory cost
function 8, we could list all possible control input sequences a) = no, 1104.1, ...uT which
are elements of L! and compute the complete cost of all the trajectories generated by
all w E Illa”). This could indeed entail a prohibitively large computation cost.

Alternately, we could use the dynamic programming method and inductively con-
struct the Bellman function V and the optimal control input law K, backwards in

time i.e, from 7' towards a as:

V : [0,7] X X 1——> R... (5.29)

K : [0,7') x X H U (5.30)
The Bellman function V(s, 1:) should satisfy for any 3 E [0, r], and each a: E X,
V(s,:r) 2 main B(r,o,x,w) (5.31)
and the optimal control input satisﬁes the condition

60' + 1) = ¢(€(J'),J}K(j,€(j))), j = 3,3 + 1, MT

{(3) = :1: (5.32)

120

for each s E [0, r) and each a: E X.
The computation effort required to solve this problem will be signiﬁcantly lower
than tabulating all the control input sequences. However storage requirements for

this procedure will be signiﬁcantly large.

5. 8.2 Dynamic Programming Algorithm Implementation

In order to minimize the switching cost and maintain adequate resolution of the target,
we construct a graph based on the camera location and the visibility of the targets to
the cameras, given the predicted motion of the target, Figure 5.9. The procedure for
constructing the graph is shown in Algorithm 1. After the graph is constructed, we
use the dynamic programming method, a modiﬁed version of the Dijkstra algorithm,
to ﬁnd the optimal switching strategy (Algorithm 2). In the graph generating stage,
we enumerate all the cameras that can see the current grid point with an acceptable
resolution. The resolution metric is marked over each camera as a weight on the
vertex. However, in order to run the dynamic programming method, it is desirable
to have all the weights only on the edges. Thus each camera is represented by two
vertices in the graph, with an edge connecting them that has a weight equaling to
the resolution metric. We also enumerate all the cameras that can see the grid points
speciﬁed by the prediction vector. Between two prediction grid points we connect
the cameras based on their switching metric. If the two cameras are the same, the
switching metric is set to 0, otherwise a non-zero switching metric is set on the edge.

In Figure 5.9, all non-zero metrics are set to 1 for the sake of simplicity.

5.9 Chapter Summary

This chapter presents the design of the various components of a surveillance network.

For exploiting the redundancy in motion along the various axes of the sensor, an

121

f." ' "\ //’”\ ""\\ /“
{ C t Resolution .. l/ i \t O (/ C ‘1, Resolution J C'\l
'=. Metric . / ‘ ’ Metric ’ ‘ I
\\ 1/ \\..._// \x.._fu’/ \\a_l—/
1
/’""“ ’7 ’ ' , ”‘\ ”A
r" \ Resolution , / ’\ Resolution .. / ’\
i 62 l Metric K CZ / ’ CZ Metric W] 62 l
\\\_’_,’ .\\__// 1 \\_/ \ku/z
rx'"\ . /-\\ , ... . ‘ /"' ,/‘\.
(I C \ . Resolution a / C, . _ ‘ / C\ . Reso|ution .. “f C,
K k ) Metric K k K k / Metric \ k
’\._-/ -,_,. ‘\.__/ v/

Figure 5.9: Generating Graph for the Camera Switching Strategy

 

Algorithm 1 [0, W] = Grathen(pv)
1: for all cam in C ameraSet do
2 if pv(l) is visible by cam then
3: Add two nodes representing com to G
4
5

 

Connect the two nodes with the resolution metric of cam in W
end if
6: end for
i Pprev = 7911(1)
8: Delete pv(1) from pv
9: for all p in pv do
10: for all cam in CameraSet do

\1

11: if p is visible by cam then

12: Add two nodes representing cam to C

13: Connect the two nodes with the resolution metric of cam

14: Connecting the nodes generated by p and ppm” and set the switching cost
to W if they are for different cameras.

15: end if

16: end for

17: end for

 

optimization framework is prOposed. This framework allows the accomplishment of
various other tasks under the constraints of the main tracking task, such as energy
conservation and better sensor location for sensing.

The characteristics and implementation details of the various choices for video
feedback are discussed and finally the design of a sensor node is presented. The
architecture of the sensor node consists of a camera, a target perception module, a

motion control module, a video feedback module and a communications module.

122

 

Algorithm 2 p = Dijkstra(G, W, s, t)

 

1: for all 22 in V(G) do

2: d(v) <— +oo {Initialize the distance vector}

3: p(v) <— undeﬁned {Initialize the previous node vector}
4: end for

5: d(s) <— 0 {The source has 0 distance to itself}

6: C (— 0 {Initialize the checked node set}

7: Q <— V(C) {Copy the vertex set into a working set}

8: while Q sé (ll do

9: u +— ExtractMin(Q) {Extract the vertex with minimum value in d}
10: C s— C U {u}

11: if u = t then

12: return

13: end if

14: for all edge (11,12) do

15: if d(u) + W(u,v) < d(v) then

16: d(v) 4— d(u) + W(u, v)

17: p(v) <— u

18: end if

19: end for

20: end while

21: if !t e C then
n p90

23: end if

 

123

Chapter 6

Experimental Implementation and

Testing

Figure 6.1 shows the general architecture of the surveillance system implemented over
an IP network. This chapter presents a discussion of the implementation issues such
as switched video feedback to the human operator/ client and a performance analysis

of the experimental results of the proposed algorithms.

6.1 Experimental Test-bed Setup

A visual surveillance test-bed comprising of seven video cameras has been set up at
the Robotics and Automation Laboratory at Michigan State University. The test-bed
is used to demonstrate the integration of multiple active sensors with active target
tracking algorithms to perform a coherent pervasive surveillance task of tracking
multiple targets as they move across the monitored landscape. The cameras are
attached to processing and communication computers and some are mounted on pan-
tilt drives or robots for moving the cameras. The surveillance test-bed developed
has the functionality of an end-to—end, multi-camera monitoring system that allows a

single or multiple human Operator(s) to monitor activities in the region of surveillance.

124

Networked sensor”

(Wired)

   

   
   

\

Mobile networked
sensor (wireless)

   

. FEVER '
“iii-oi. ‘
5] “Tit—'4‘
Wireless networked
sensor
Wireless networked
sensor

Mobile wireless
operator

 

 

  

camZ \l / cam3
amt /
Network
"' Infrastructure
cam5
/
{'—

 

\ Robot Motion
Platforms

Targets

Figure 6.2: Surveillance test-bed designed along with the cameras and the motion
modules for camera motion.

125

 

Figure 6.3: Sony EVI—D30 mounted on Robot Nomad XR—4000

Figure 6.2 shows the test—bed setup. This section describes the hardware and

software architecture of the individual sensors used.

6.1.1 Hardware Setup

The sensor node setup consists of three Sony EVI—D30 active PTZ (pan-tilt—zoom)
cameras, one dual head stereo vision camera consisting of two Hitachi KPI-D50 cam-
eras mounted on a Digital Perception DP-250 pan-tilt drive and one ﬁxed view KPI
D-50 camera. One of the Sony EVI—D30 cameras was mounted on a Nomad XR 4000
mobile robot for extended mobility and sensing range. Figures 6.3 and 6.4 show the
sensor and motion hardware for the implemented system.

The cameras were connected to Pentium 4 2.4 GHz computers which had PCI-

based video capture hardware cards attached to them. The camera capture cards are

126

 

Active Camera
Sensors

 

Figure 6.4: Active Cameras: Sony EVI-D30 and dual head Hitachi KP-D50

capable of transcoding, in real time, the input from the cameras to an H.264 stream
(on a separate software interface) and transmiting that stream over the Internet to
clients that request the video stream. The PTZ modules for the various cameras
were controlled through serial port communications. The various computers were
connected to each other using wired ethernet and wireless 802.11g connections over
an IP network. The individual sensor nodes were provided with publicly addressable
IP addresses and hence could be accessed from the Internet. The human interface
clients were comprised of Pentium 4 laptop computers which could be connected to
the surveillance network through a direct wired or wireless connection or through the

Internet.

6.2 Software Setup

The sensor outputs are processed on the local computer for target detection and
identiﬁcation. The local computers used for sensing and processing are running a

Linux Operating system. The targets to be tracked using the automated tracking

127

system were multiple humans moving around in the monitored region. For ease
of target detection and identiﬁcation, the human targets were wearing solid color
clothing and CM Vision was used for color analysis and blob detection and merging
[91]. The acquired 640x480~pixe1 images were quantized on a 128x96 grid for reduced
computation load. The identiﬁed targets were represented using their bounding boxes
which were quantized on the 128x96 grid. The coverage set is also quantized on the
grid and the distances of the grid points to the target set are pre—computed and stored.
The tracking task is carried out at 25 fps - i.e., a new velocity input is computed for
the motion module inputs at the rate of 25 Hz.

At the same time, the raw camera feed is input to the H.264 transcoder card,
which is provided as a separate software device (/dev/dsp0) on the computer. There
are various implemented on Linux and Windows operating systems for receiving video

stream and other pertinent information from the camera servers.

6.3 Experimental Results for Wide Area Camera Calibration

The individual camera sensors involved in the setup of the camera system need to
be calibrated for their location and pose information in order to share information
amongst each other. A wide area calibration procedure described in Section 5.1 has
been used to calibrate the conﬁguration of the cameras.

BlueCCal, developed for use at Czech Technical University by Tomas Svoboda [92],
is an attempt at calibrating many cameras without the necessity of a large calibration
object, or the necessary handwork of more classical methods. The system does,
however, have necessary conditions for the calibration to yield sufficiently accurate
results, which, among others, is a minimum of three cameras for the setup.

The only action required as a reference for calibration is the mapping of l-point

objects that is easily detectable in the images captured. For our setup, this involved

128

capturing images in a dark setting with a plastic-capped laser pointer to act as the
1-point reference in all images. Therefore, the only necessary work was moving the
laser pointer around within our viewable domain, and capturing images during the
course of this motion. These images would later be used in the calibration software,
to locate the relevant points necessary for calibration.

The software itself can be broken down into a number of simple steps: ﬁnd all
points in the images for each sensor, discard all mis-detected points by pair-wise
RANSAC analysis, estimate projective depths, ﬁll missing points to make the scaled
measurement matrix complete, perform the rank 4 factorization to get projective
shape and motion, and estimate parameters of non—linear distortion.

Results of the calibration aren’t only just the necessary conversion matrices be-
tween each of the sensors, but also the distortion models of the system, the location
of all captured points in each of the sensors, and a scaled model of the location of
each of the sensors. This information proves vital in the immediate testing of the
results of the calibration; i.e., if the scaled model displays some erratic location of a
given sensor, it is likely the calibration is off w.r.t. that sensor. One can then look at
the location of the points in that sensor and notice if that sensor got the necessary
coverage of its viewing area to accurately calibrate w.r.t. the rest of the system. This
allows for a more intuitive check of the accuracy of a given calibration.

An advantage of this calibration tool, besides those stated above, is that the system
is an adaptive calibration, and among the images captured, the one-point object, -
i.e. the laser pointer - does not have to be in all of the images for each sensor. The
adaptive style of this network means that the more cameras involved, the better the
calibration results. So this proves beneﬁcial for heavily overlapped multiple sensor
networks. The only necessary condition w.r.t. the number of sensors that can view
a given point is that the resultant calibration requires a minimum number of points

that can be viewed by at least three cameras at a time; this proves incredibly helpful

129

5-Camera Conﬁguration

z-axis

 

 

4 4 x—axis

Figure 6.5: Camera Setup: Location and pointing direction of the camera setup.

when working with a disjoint network, where there exists no point that all of the
cameras can see. By only requiring three cameras to see it, as long as the majority
of the monitored region is covered by at least 3 cameras, then the system is able to
calibrate with reasonable efficiency.

Figure 6.5 shows the results of the calibration carried out on 5 cameras. The
cameras were located at different heights and were pointing to the center of the
room.

The results of this experiment demonstrate the calibration of the surveillance test-
bed setup for conducting the remaining experiments outlined. From Figure 6.5 we

can see that the camera setup was calibrated with an adequate degree of accuracy.

130

6.4 Experimental Implementation for Image-Based Hausdorff

Tracking

The automated tracking task was deﬁned as maintaining the visibility of multiple
moving targets in a region with adequate resolution. The shape function used to

track the targets is:

Mo = 2?; Jpovrkr) + await-1+ await.) (6.1)
JFOkai) = lg, dim?) dq

JAmmez) = ma.x(fKz dq — AREAJl/IINi, 0)

JAma$(f{,-) = min(AREA_MAX,- — IR. dq, 0)
2

where, N is the number of targets, q is a point on the target set R and AREA-M AX i
and AREA_M I N,- denote the maximum and minimum admissible areas of the target
set R,- for maintaining adequate resolution. Note that the shape function J (K )
is zero only when the set of targets UiN=1 If; is completely covered by the sen-
sor coverage set K and when the area of each target set R is within the limits
(AREA-M I N, AREA-M AX ) speciﬁed for that target. Otherwise J (R ) is a non-
zero positive value.

Based on the value of the shape function J (K ), the velocity input vector u to the
camera motion units (PTZ drives or robot) is calculated and applied at the rate of

image acquisition - i.e., 25 frames per second (fps).

6.4.1 Tracking a Single Target

The surveillance task is to maintain the target in the active FOV while maintaining

a discernable resolution of the target on the image plane. The task is described

131

 

 

 

 

 

 

 

 

g """"""""" Jr=ov
g . -'-'-JAmmMm
D . ‘ . J
E; ........................ E ...... .:.U§J\ .................. AﬂmMm‘
‘0 : . f f s '
(T) 1' t J E
r in! 3 -A I." I I
40 60 80 100 120
time (s)
(a) Shape functions
150-“; ......... 2 ............ J.H.H.H.E ............ . ............ : ............ VX
100._...{\' ......... ............ ............ ............ ..... .....vz

 

 

 

 

 

 

-100 i i i i i i
O 20 40 , 60 80 100 120
time (s)

(b) Active sensor inputs

Velocity input (mm/s) (mrad/s)

Figure 6.6: Image-based Hausdorff tracking using active camera

mathematically using equation 6.1. The target was a ball that was manually moved
around in 3D space during the duration of the experiment. The image was taken as a
regular grid of 128x96 pixels evenly spread over the 640x480 original image in order
to reduce the computation load. The target set K was approximated as occupying a
certain number of pixels on this grid. Assumptions on the input u = [23, 2, W13, wy, MT
to the robot and camera system were derived using equation 3.31. The target was
initially placed where the task critera were not satisﬁed and then the target moved
around manually, which generated a disturbance input to the system and the system
immediately tried to reduce the value of the shape function to zero.

Figures 6.6 and 6.7 depict the results of the image space Hausdorff tracking task
using an active camera. The shape functions J FOV (If ), J Area Minfk ) and J Area Max

and inputs u = [23,y,w$]T are plotted in 6.6 (a) and (b), respectively. The ﬁgure

132

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

(O FOV
:
.9 J .
'5 AreaMrn
C ......................................................................... :
3 .
LL :
a) _ ......................................................... ;
o- .
to t
_c .
(D ............................................................ f
7
--—W
(I) Y
*5 """" 7»
Q .
E
S
m ........
E
(U
0
-10. """"""" 1 """"""" 1 """ 1 ... 1 ........ i ....... I.U.H,;j
(bl

Figure 6.7: Image-based Hausdorff tracking using active camera: Maintaining ade-
quate target resolution
shows that the system is stable and that the target is maintained in the camera ﬁeld
of view with a desired resolution despite the seemingly random motion of the target.
These experiments demonstrate Hausdorff tracking using multiple shape functions.
We see that the shape functions are are non-zero when the target moves but they
are quickly reduced to zero by the motion of the camera. The motion of the target
is taken as a disturbance input for the tracking control system shown in Figure 3.4
and it can be seen that the Hausdorff tracking controller can reject disturbance input

from motion of the target.

6.4.2 Tracking Multiple Targets Simultaneously

The surveillance task is to maintain the multiple targets in the active FOV of the

sensor. The targets were two humans moving around and interacting with each other.

133

 

 

 

 

 

 

 

 

 

 

 

 

 

 

,5 1 r r g s .1

0 1 0 20 40 50 60

time (s)30
Figure 6.8: Tracking multiple targets using Image—based Hausdorff tracking

The target set K was approximated as occupying a certain number of pixels (in
multiple disjoint locations) on this grid. Assumptions on the input u = [wx,wy]T to
the camera system were derived using equation 3.31. The targets were initially located
such that they were out of the ﬁeld of view of camera. The targets were detected by
the monitor and the sensor was tasked to cover the two targets simultaneously. At,
t = 0, the targets are just in the active FOV of the sensor and the task criterion is
not satisﬁed. The camera then moves to reduce the shape function J to zero so the
targets are now covered. The targets then randomly move around the room and the
camera tries to maintain both targets continuously in the active FOV.

Figure 6.8 depicts the J and the input velocities (u 2 [tax wy]T) applied to the
camera. Notice the initial large value of the shape function J, which is quickly reduced

to zero. Figure 6.9 depicts the position X, Z estimated of the two targets. We see

134

 

 

 

 

 

 

 

  
  

 

 

Person 1
Person 2

 

 

 

 

.............................................................

...........................

 

1o 20 30 4o 50 60 7o 80
time (s)

Figure 6.9: Image-based Hausdorff tracking for multiple targets: Estimated position
of targets
that despite the seemingly random motion of the two targets, the camera always tries
to keep both of them in the active FOV. Further, the energy efficiency of the proposed
method is demonstrated by the relatively infrequent input applied to the camera only
when one of the objects escapes the active FOV. Figure 6.10 shows the images taken
from the camera at various times during the tracking experiment shown in Figure
6.8.

This experiment shows multiple target tracking with a single camera sensor using
the Hausdorff tracking method. Tracking multiple targets using a single sensor is
advantageous for overloaded networks where the number of targets outnumber the
number of cameras available for tracking. The image sequence in Figure 6.10 shows
that the targets are moving about randomly and still being maintained in the ﬁeld

of view of the camera sensor at all times, which demonstrates the advantages of

135

    

time = Ssec time = 13sec time = 208cc

   

     

time = 425cc time = 465cc

Figure 6.10: Image sequence for tracking multiple targets using Image-based Haus-
dorff tracking.

Hausdorff trackng over vector space servoing.

6.5 Experimental Implementation for Cooperative Hausdorff

Tracking

The cooperation between nodes to continuously track a target by directly tasking
various sensors can be done using cooperative Hausdorff tracking as explained in
section 3.4.2.

In this experiment, a mobile robot is deployed in order to track a target that cannot
be covered by a ﬁxed sensor with adequate resolution. The robot uses the position
information received from other sensors in the network to compute a trajectory that
will bring the target set within its coverage set.

Figure 6.12 shows the x-z plot, which is the plane of the floor, of the data for
both the target centroid and the robot tracking it. The robot’s path shows the initial

cooperative tracking taking place. As the robot moved its view cone toward the

136

Plot of cooperative and image-based Hausdorff tracking on the x axis

 

 

 

 

 

 

 

 

 

7000 r r l r r r I I
f § § ’: § § """" ”b“

6000_H..H,H (,H.. . Win a.. .f,..,ﬂ ._ﬂq.q....;....n.m ; ....... -f--ungetcenUOklfj
E 5000
g
x 4000

2000 L I l I l J 1 I

0 10 20 30 40 50 60 70 80 90
time (5)

Plot of cooperative and image-based Hausdorff tracking on the z axis
4000 r r r I r r . 1
: E i 2 § 5 ------- robot
3500- . .. .. _. émm9t_wnmw _

 

 

 

 

 

    

 

 

 

E 3000 _. ......................
s ‘ .
N 2500 _ .......................................
2000 _ .- .................... ..
1 500 l l l l l 1 1 l
1 0 20 30 40 50 60 70 80 90
time (s)

Figure 6.11: Position of robot and target set on x and y axes.

target, the target also began moving. Re-projection of the target onto the view cone
caused a slight change in velocity in the robot that can be seen. Once the target set
enters the set space of the mobile robot’s view cone, image-based Hausdorff tracking
begins. Now the robot reacts to changes in the target to maintain optimal resolution.
This continues for the remainder of the experiment, as can be seen in the plot.

In Figure 6.11 we can see the data from the individual axes. The robot is quite
successful in maintaining the target set within its coverage set, as can be seen by the
correlation of the plots of each of the axes.

Finally, in Figure 6.13 we see the plot of the J value for both image—based and

137

Plot of cooperative and image-based Hausdorff tracking on the x-z plane
5000 r r 1 1 r 1

------ robot
—— target centroid

n ’

 

 

 

 

 

I

4500

Target

1
l

4000

3500

2 (mm)
1
41

I
l

3000

.....

I

2500

l
L

2000

uuuuuuuuuu

 

 

 

1 500 l L I l J l
2000 2500 3000 3500 4000 4500 5000 5500

Figure 6.12: Motion of target set and robot on x—z plane. The view cone of the robot
is shown at a particular instance in time with the position of the target centroid
shown as well.

138

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

1° I F T I '.
8.... .. .4 ................................. —¢
2 :--e ------- ;--—----+ ------- e-—----A-------*
a; 6- ...... - M .. ....... ......... .—Ima9eJ
. i f I . q
s . 2 : s """ °°°9J ,
IL ' r r j - - -coop(|ow)oractive(hlgh)
8. 4,. l ....................... ................... -l
a
.r: I
‘0 I f
2,. I . _ ._
' ; 2
0 R 1M A 1 M
0 10 20 30 40 50 60
time(s)
Velocities on x and z axes in time
I l I I
3 —xvelodty
a ....... _'..f.'..z_.‘.’9'9‘?‘.'¥‘-
2
>.
2 ...... -
(I
x
a .................... ............... «—
.3‘ i
o . r----.--------s-
> ------__‘C ; -
—2oo ‘ ' 1 i
O 40 50 60
time(s)

Figure 6.13: Shape functions for cooperative and image-based Hausdorff tracking

139

 
  

Target at optimal distance. (505)
w k 1d ' I
' ‘ ‘ ' l

   

Proﬁle of robot and target

    

Target dlstance to great. (205) Target at edge of view. (805)

Figure 6.14: Robots View of target at several points in time to show several conditions
that the tracking system responds to.

cooperative Hausdorff tracking. A demarcation line has been included to clearly mark
the switch from cooperative to image-based tracking. This occurs at approximately
seven seconds. This agrees with the data from Figure 6.12 since this is the point
in time that the values of the axes begin to synchronize. Figure 6.5 shows several
images gathered during the experiment. The ﬁrst image in the series shows a proﬁle
of the mobile robot and the target tracked. The remaining images were gathered from
the robot’s onboard camera. The second image shows the target at the determined
optimal distance. This determination is based on the set space of the target instead of
simple-distance as is the case with point-based tracking. The third and fourth images
show cases where the target is moving toward, and away, from the robot, which then
causes the robot to retreat from, and pursue, the target respectively. The ﬁnal image
in the series shows the target straﬁng left, to which the robot responds by moving to

re—center the target set.

140

 

Figure 6.15: Target tracking using a Sony EVI—D30 camera mounted on Robot Nomad
XR—4000 mobile robot

This experiment shows that the Hausdorff tracking control can be used adequately
for task space servoing, as well. Given the task space location of a target set (possibly
multiple disjoint targets) and a model of the coverage set of a camera, the target
can be brought into view using cooperative Hausdorff tracking. This experiment also
demonstrates the stable switching (due to in-built hysteresis in the switching criteria)

from cooperative Hausdorff tracking mode to image-based Hausdorff tracking mode.

6.6 Experimental Implementation for Redundancy Resolu—

tion using Optimal Hausdorff Tracking

Based on the discussions in the preceding sections, we can now formulate the task
distribution problem for Hausdorff tracking using redundant motion stages. The
problem we are considering is the optimal task distribution for a pan-tilt camera
mounted on a mobile robot tracking a single target. The experimental setup for the
tracking task is depicted in Figure 6.15.

For the experimental implementation, we will consider the redundancy in the

robot and camera motion along the X traverse direction. When the target moves

141

along the X direction, either the robot can traverse linearly or the camera can pan.
Hence there is a redundancy in the task execution. The model for the Hausdorff

tracking task using the camera and the robot combination can be written as:

"U2:

[01C2] = —aJ (6.2)

my

where u 2 [ex 212ij is the velocity input vector to the robot and camera combination,
a is a positive task level scalar gain and J is the shape function which indicates the
error in accomplishing the task. The vector C = [Cl C2] describes the distribution of
the task error between the two redundant axes. Equation (6.2) can be derived using
the formula provided in [67].

The task indicators used are the energy consumed by the system and the cur-
rent pan angle of the camera. The current pan angle of the camera affects how
the translation in X by the mobile robot affects the task accomplishment. Hence,
the objective functions considered for the multi-objective optimization schemes also

included energy minimization and pan angle minimization and were represented as:

01 = q8 = (0 - 00)2 (63)

D2 = llvxll + Ilwyll (6-4)

where, 0 is the current pan angle value, 00 is the desired pan angle value considered as
zero pan, Hug,- II and Hwy II are the horizontal X velocity and the angular Y velocity (pan
velocity), respectively. Note that conﬁdence parameters are built into the objective
functions. For example, when the pan angle is close to zero, we would like to reduce
the weight for the objective function D1, which is exactly what happens.

Various optimization schemes were set up, including one with no optimization

involving just a random selection of the null space vector 6. The optimization schemes

142

 

 

 

 

 

 

 

 

 

 

 

 

   

 

 

 

 

 

 

 

 

 

 

10r .. ~.- w - 150
‘ b panvelocity
j '6 Xvelocity
c $100» . -.»
.9 >
*g x
3 E 50
“t .8
g 2
w 5 o
a
_50 l i a" g .
0 10 20 30 40 50
time (s) time (s)
120 150.
Tolerable
100’ .........
80 Tolerable 2 100’
8’
E) 60 F1 . g
a) ...... ...... . .............. 50
4O . 3
'20’“ ‘ """""""" 0 Highly Desirable
o ‘_ . i . 4 . A .
0 10 20 30 40 50 60 0 10 20 30 40 50 60

time (s) time (s)

Figure 6.16: Results of the tracking task with optimal task distribution using physical
programming.
selected for the experiment included single objective optimization where either the
energy minimization or zero pan angle objectives were considered individually. For
the multi-objective schemes, both weighted sum and physical programming methods
were tested. The results of these tests are tabulated in the following sections. Figure
6.16 shows the results of the experiments carried out using physical programming
optimization described above. The figure shows the plots of the shape function, the
velocities in the X and pan directions and the objective function values for energy
conservation and pan angle zeroing.

Notice that the values of all the objective functions are always maintained at
least in the tolerable range by sacriﬁcing the performance of the other cost functions
currently under consideration.

The same task was also carried out using a lie-optimization scheme with a ﬁxed

143

 

45 _ I. Weighted Sum Opt.
II :I _ .. . No Optimization
ll
40‘ II I:
35 II I.
- I _
:: ,I Desrrable
I I
I I
I I
' I
i .., .......................
I'
II
| I
Hfghly Desirable
" |

 

 

time (s)

Figure 6.17: The energy consumption for no optimization, and weighted sum.

task distribution between the two redundant axes and also using a weighted sum
scheme. As mentioned in section 5.4.], a weighted sum method with ﬁxed weights
without apparent meaning is used in order to combine the various cost criteria. The

cost criterion used is:

+ llwzll)

The two cost functions - i.e., energy and pan—angle maintenance for the no-
optimization method and the weighted sum method, are shown in Figures 6.17 and
6.17, respectively.

For the no—optimization case, it can be seen that in the absence of any optimiza-

tion, the linear velocity transferred to the robot is very small and most of the velocity

144

500 '

 

 

 

 

  
 

 

 

'— - - - — Weighted Sum Opt.
- - - No Optimization
400-
Untolerable
300-
I """""" H'i' iii ' 'Uh'cié's'iiéiiié"
_g 200 .................................. 3‘ ............. .g...y. .................
2’ -_
“3 |
S ------------------------------------ I ---------------------------
o.
100 ‘_ _, Tolerable
I',IIIIIII.IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII‘IIIIIIIIIIIIIIIIIIIIISIS
0". ............................................... ". .......................
lllllllllllllllllllllllllllllllllllllllllllllllll ‘Hl‘llllllllillllilltl
!
-1oo~ ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, I, _[9'37‘3'9'3 . ,
Undesirable
_20 I I I L I I
0 10 20 30 40 50 60

time (s)
Figure 6.18: The pan angle for no optimization, and weighted sum.

for task execution is allocated to the camera pan.

Using the weighted sum method, the various components comprising the task
criterion lack physical meaning. Based on this observation we notice that even when
the multi-objective optimization is enabled, the pan angle remains very close to the
desired value while the energy conservation objective attains unacceptable values.
It should be noted that various tolerance ranges, which are used for the physical
programming approach, are not used in the weighted sum approach. Hence, despite
the optimization being enabled on the various objectives, the values of the objective
functions can attain unacceptable values.

The above results show that the physical programming method provided better
results in terms of Optimizing multiple objectives while simultaneously accomplish-
ing the task. This is because with the weighted sum method, due to the loss of

physical meaning of the weights, dynamic weighting may provide an unacceptable

145

solution. However, for physical programming, since the cost criteria are combined
using weights which are attributed a physical meaning, physical programming can
calculate solutions on the pareto—optimal frontier.

This experiment further goes to show that the redundancy in the input expressed
in Equation (3.30) can be successfully exploited in order to optimize various cost func-
tions associated with the task. Hence, some criteria, such as resolution maintenance
of multiple targets. can indeed be removed from the set of constraints and used in

the optimization criteria and can provide more ﬂexibility for task speciﬁcation.

6.7 Experimental Results for Multisensor Cooperation

This section illustrates the experimental implementation and results for the two tech-

niques:
1. Sensor allocation algorithm and

2. Mutational hybrid automata modeling techniques proposed earlier.

6. 7.1 Experimental Evaluation for the Sensor Allocation Algorithm

Pervasive surveillance using the proposed multi-sensor coordination mechanism was
rigorously tested. The task of tracking a target over multiple sensor coverage regions
was carried out. The target moved a large distance, spanning the capable viewing
regions of multiple sensors. Figure 6.19 depicts the results of the tracking scenario.
Initially, sensor 2 can see the complete target but the target is out of the active FOV
of sensor 1, which is depicted by the target visible ﬂag. Based on the solution of the
optimization, sensor 1 is tasked to track the object. It uses cooperative Hausdorff
surveillance to cover the target. The global coordinates of the target for cooperative
Hausdorff tracking are provided by sensor 2. Once it acquires the target, it switches to

image-based Hausdorff tracking to ensure target coverage. The target then randomly

146

 

 
 
 
 

5- ....................... , ........................ , ................................................. ,
—X

4-_Y
—Z

 

 

-—- visible

 

 

 

  

 

 

E
C
.9
#0.;
O
0.
N
>. 3 -
X
'0
OJ
‘5 2 -
.§
E
,_ 1 - .
e l 5
g 0 I : l 1 l l l l I
8 0 1O 20 30 40 50 60 70 80
time (s)
(a)

 

 

 

 

 

 

 

 

 

 

Camera 2- Estimated X, Y, 2 Position (m)

(b)

Figure 6.19: Estimated position of targets for multisensor cooperation using sensor
allocation algorithm

147

moves around in the active F CV of both the sensors, which keep continuously tracking
it. Notice the robust tracking performance of the image-based tracking scenario
despite the minor differences in the estimation of the object location. The target
then starts to move out of the tracking range of sensor 2, which again triggers the
optimization planning problem, which assigns sensor 1 to keep continuously tracking
the target.

This experiment demonstrates the cooperation between multiple sensors for track-
ing a target. It shows that information from another sensor can be used in order to
cooperatively acquire a target by a camera and that the location of the targets can
be used to optimally assign the tracking task to the various sensors based on their

availability, target priority and the locations of the sensors and targets.

6. 7.2 Ezperimental Evaluation of Multiple Target Tracking using Mul-

tiple Formation- Constrained Sensors

Motion is an energy intensive resource for a sensor node and should be minimized.
Multiple sensors (maybe with varying sensing modalities) can be mounted on a single
motion drive to reduce the energy consumption. Further, the various sensors can be
tasked to track different targets simultaneously. Hence the motion of the sensor node
should be optimized in order to achieve the various tasks simultaneously. The optimal
Hausdorff tracking framework proposed in section 5.4.1 can be used to calculate the
optimal sensor input u.

Consider a scenario where two targets are being tracked by two separate formation-
constrained sensors as shown in Figure 6.20. The task is to maintain the visibility of
targets R1 and K2 using the sensors caml and cam2 respectively. The two sensors
are mounted on a single motion platform, although sensor caml can be moved inde-

pendently of sensor camg by using the motion of joint go of the PUMA robot arm.

148

The shape functions for this task can be expressed as:

J1: [Kl dK1(p)dp (6.6)
J2=/1.{2dK2(p)dp (6.7)

The motion inputs to the complete robot system are linear motion along the X
axis (um) and angular motion about the Z axis of the robot base (LUZ). Angular
motion of the robot will not affect the coverage task assigned to sensor camg while
linear motion along the X axis will affect the task assigned to both the sensors. Using
Equation (3.31), the model for Hausdorff tracking for the two—camera system can be
written as:

C C u a 0 J
Cu : 11 12 a: < _ 1 1 (6.8)

/

Figure 6.21 depicts the target tracks of the two targets in the world coordinate
frame. The two targets move independently of each other and can cross each other
as is shown in Figure 6.24.

Figure 6.22 depicts the shape functions .11 and J2 for two cameras caml and camg,
respectively, and the velocity proﬁles of the two inputs v3; and wz to the mobile base
and robot arm are shown in Figure 6.23. The inputs to the two motion axes (linear
velocity v3 and angular velocity wz) are calculated using 6.8.

The results of this experiment show that we can use formation-constrained sensors
in order to accomplish multiple tracking tasks simultaneously using multiple sensors.
This implies that the optimal Hausdorff tracking framework can be leveraged in order

to solve multiple target tracking problems.

149

r g Camera 1
\ . .

r

 

Figure 6.20: Experiment setup for multi-target tracking using multiple cameras.

150

 

Y position (m)

 

 

 

 

-2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5
Xposition (m)

Figure 6.21: Target tracks for multi-target tracking using multiple cameras.

151

Sha Functions
4000 .............. : ........... .7 ......... PET ....... . ..... : .....

 

 

 

3500 .............. .............. ............. ........... ..... ‘

 

3000 .............. g .............. ;.H_i.-.ug ............ g .............. ;.-.H.H_H;
N 1 E f f i 3
2500 ............. E ........... ;{H.H.H.H.i ............ i .............. j.m.x.u.né

2000 ............. é .......... jéli ........... é ...... 4E5}; ...... . ....... §.HH_H.H.i

Shape Function, J1, J

. l- . _ - I
. - _ . _ .
. : _ _ .
1 500 ............. . ........... _ ..... , .............
I ‘ --_I _ - I

   
 

1000 ............. g ......... .{;€§; ........

500

'IIIDIIIII,‘.'

 

 

 

Figure 6.22: Experiment setup for multi-target tracking using multiple cameras.

152

100-...W ............... I .................. : ............ ..: ...............
....... Base velocity: v)( i __E

 

 

 

 

—— Arm angular velocity: cox

 

 

 

 

 

 

Figure 6.23: Experiment setup for multi—target tracking using multiple cameras.

153

'l'I'l'I'I'I'I‘I'I'I'l'l‘l'l'I'I'l'l'l'I'I'l'l'l'l'I'I'I'l'l'l'l‘l'. . . . . . . .

     
  
  

   

Arm Camera — Cami Base Camera - Cam2

(a) time = 10 sec

   

Arm Camera — Carnl Base Camera — Cam2
(b) time = 25 sec

   
    

 

I - r .. ' . ‘
Arm Camera — Caml Base Camera - Cam2

(c) time = 37 sec

 

. ' g _, with
Arm Camera — Cam] Base Camera -— Cam2
(d) time = 55 sec

   

Figure 6.24: Image Sequences taken while tracking the two targets with two
formation-constrained cameras

154

 

 

 

7000

6000

5000

Y (mm)

3000

2000

1000

 

 

 

 

Figure 6.25: X—Y Plot for target motion

6. 7.3 Erperimental Evaluation for Multi-Sensor Cooperation using Mu-

tational Hybrid Automata Model

For the experiment, the location of the target (human) moving about was captured
using a stereo vision system. Note that the location of the target was not used in the
feedback control scheme designed, but was used merely to show the random motion
of the target and coordinate it with the various mode changes in the MHA model.
Figure 6.25 represents the position, in x-y coordinates, of the centroid of the target,
located using the stereo camera. The locations of the cameras caml and camz are

superimposed on this ﬁgure.

155

Pervasive Target Tracking Using Two Cameras with Continuous Cam-

era Servoing

In this experiment, the MHA model presented in section is implemented.

Figures 6.27 and 6.26 show the results of the experiments conducted using this
model. Figure 6.27 represents the data calculated by caml and Figure 6.26 represents
the data calculated by camg. Figures 6.27 and 6.26 show the value of the shape
function J, the pan velocity input to the camera, and the mode q E {q1,q2,q3}
(deﬁned in 4.2) that the system is in; and subﬁgure b depicts the current pan position
of the camera and the visibility flag of the target for that camera. The visibility ﬂag
indicates whether the target is within the FOV of the camera and is set when the
target is recognized by the target detection subsystem.

Figure 6.28 depicts an image sequence captured by the two cameras, with subﬁg-
ures a, b and c representing the three modes q1, q3 and q2 respectively. Within each
Figure 6.28(a), 6.28(b) and 6.28(c), the left and right images correspond to caml and
camg, respectively. We notice that the target is in the ﬁeld of view of at least one
camera at any time during the course of the entire experiment.

The results of this experiment show that the mutational hybrid automata frame—
work can be used to systematically design a pervasive surveillance network which can
be used to track a target as it moves across a wide region covered in parts by individ-
ual cameras. It shows that despite the random motions of the target, the surveillance
network can, in fact, perform stable tracking without having to estimate the exact

location of the target.

Pervasive Target Tracking Using Two Cameras with Controlled Switch

This section presents the results of the experimental implementation of the mutational

hybrid automata model with controlled switch shown in Figure 4.3.

156

 

200 ................
' _Shape FunctionJ g

. q3 .
. : pan velocnty
100 ....................... ”...,... .......... A. Mode

 

 

 

 

 

 

 

 

 

 

J, 6, Mode
0
Li
fr
‘3.

 

 

 

—10 i i i i .1
0 20 4O 60 80 100
time(s)
300 ................ : ...... ........ _ ......

 

Pan Angle (O) :
Target Visibility

 

 

 

 

200 ................ ..............
100 _ .. Visible

OJ .......... .............. 3 ........ i

 

 

 

9, Target Visibility

 

 

Not Visible § § § Not Visible -----------
-1o . i . 1
0 20 6° 80 100

 

time (s)

Figure 6.26: Current MHA mode, shape function, velocity, pan angle and target
visibility for Caml

157

100 ......... : ........ : ........ : ......... 1 ........ 2 ........ , ........ 2 ........ : ....... f ........ :

 

 

 

 

 

 

J, 9, Mode

 

—— Shape Function llLTl if . , . ...
Pan Velocity i ' i
— Mode

 

 

 

 

 

 

 

 

 

 

Visible Not Visible

    

 

"100 -------- 1 ........ g ...... i I :
' ' Pan (9)

-200 ... . . . . . . ....... ...... Visibi'ity

 

 

 

 

 

 

 

9, Target Visibility

 

_ g i g i . i i i . .
300o 10 20 3o 40 50 60 7o 80 90 100
time (s)

Figure 6.27: Current MHA mode, shape function, velocity, pan angle and target
visibility for Cam2

158

rip.

. ’i‘.
— in
'. “R

 

(b) Mode q3, (10 sec)

 

Figure 6.28: Image Sequences taken while tracking the target over different modes

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

300- .......... ShapeFunctionJ ...... .......... ..........
' ‘ Pan Velocity 6) ' '
g ----- *— Current Mode
0 .
2
co“
5"
-100
-200
:§‘
2?:
.2
>
‘65
2’
(U
[.—
a5
_40 i i i L i i i i
O 5 10 15 , 20 25 3O 35 40
time (see)

Figure 6.29: Current MHA mode, shape function, velocity, pan angle and target
visibility for Caml

Figures 6.29 and 6.30 show the results of the experiments conducted. Figure
6.29 represents the data calculated by caml and Figure 6.30 represents the data
calculated by camg. Figures 6.29 and 6.30 show the value of the shape function J,
the pan velocity input to the camera, and the mode q E {q1, q2, q3,q4,q5} (defined
in 4.3) that the system is in; and subﬁgure b depicts the current pan position of
the camera and the visibility flag of the target for that camera. The visibility flag
indicates whether the target is within the FOV of the camera and is set when the
target is recognized by the target detection subsystem.

These experiments show that irrespective of the target’s current location in the

monitored region, visibility of the target is maintained by at least one camera. This

160

 

 

  
 

 

 

 

 

 

400 , ................... 3 .......... E .......... : ...... d . .i. . . ShapeFunction J
q4 q, Q2 g q3 g qs E 3 """" Pan Velocity 6)
g 200 ........ ....... ................. §-----CurrentMode
2 _ _. . . : : .
a5 .
ﬁ- 0 :1}. 3 "-::-:EE"' l.-o- .......
4d. q2 q1 q4
_200 L i I i i l i j
0 5 10 15 20 25 30 35 40

time (sec)

 

.........................

Pan angle (9)

 

 

 

 

.......................

.......................

 

 

9, Target Visibility

Not Visible

_10 ' I I : i i l
15 . 20 25 30 35 40
time (sec)

 

 

 

Figure 6.30: Current MHA mode, shape function, velocity, pan angle and target
visibility for Cam2

161

experiment further demonstrates the controlled switch scenario, where, the cameras

can be actively turned off, can be used to perform pervasive tracking.

6.8 Performance Analysis of the Video Subsystem

The real-time-performance video subsystem implemented was measured and analyzed
for use in the surveillance network scenario. The parameters which affect the perfor-
mance of a video transport scheme in the context of a switched surveillance scenario
are time taken to deliver the video stream to the operator, size and quality of the
image, the rate of video frame update (framerate) and the initialization time for the
client to receive one complete image frame from the server. We compared the perfor-
mance of using H.261, H.263 and MJPEG as the encoding/decoding scheme for the
visual surveillance tasks. The various choices for the video subsystem implementa-
tion have their advantages and disadvantages based on the scenario they are being
applied to. We propose a scenario/ task based metric which can be used to evaluate
the suitability of the various schemes based on their characteristics. The parameters

for evaluation include:
1. Codec complexity
2. Bitrate of generated video stream
3. Switched video initialization time

Codec complexity can be represented by the amount of time taken for the capture and
compression of an individual image and may be an important factor for implemen-
tation on computationally constrained systems such as embedded processors. The
bitrate of the generated video stream represents the bandwidth consumed for trans-
porting the video stream to the operator and can be a limiting factor for wireless

or power constrained sensor nodes. Switched video initialization time represents the

162

amount of time taken to display the relevant objects (targets or environment) after
a video stream switch has occurred. The proposed metric involving these parameters
is:

M=— — — 6.9
TLE+B+TC ( l

where, K L E, K B, K C are gain coefﬁcients chosen on a per-scenario/ task basis based
on a fuzzy rule. TL E and To represent the switched video initialization time and
capture and compress time, respectively, and B represents the bitrate of the generated
stream.

The surveillance tasks can be divided into two categories - namely, monitoring and
tracking tasks. In the monitoring task, the operator must evaluate the actions of the
target, which may require the knowledge of the environment w.r.t the target, while
the tracking task requires to only keep the target in view. Hence, for the tracking
task, in order to calculate the real switching time for the video system, we need to
account only for the display of a moving target, while for the monitoring task, we need
to evaluate the switching time for the moving target as well as the static environment.

Based on the characteristics of the task / scenario being evaluated, the values of the
gain coefﬁcients are chosen based on a fuzzy rule. For example, if the sensor nodes are
not constrained by the processing requirements or power consumption (implemented
on regular PC’s), the codec complexity will not have much effect on the system
performance, so we assign a very low value to the gain coefficient Kc- Similarly, if
the nodes are networked using a wired LAN (100Mbps ethernet) infrastructure having
large enough bandwidth, then we assign a low value to the gain coefﬁcient K B- In
contrast, if the nodes are sharing a low-bandwidth wireless channel, K B should be
assigned a higher value which signiﬁes the importance of the bandwidth-consumed
parameter.

The various task / scenario combinations and their respective gain values we have

compared for the video system evaluation are tabulated in Table 6.1. In the scenarios

163

Table 6.1: Scenarios and gain values

 

 

 

 

 

# Scenario K L E K B K C
1 Wired network, monitoring task 1 0 0
2 Wired network, tracking task 1 0 0
3 Wireless network, monitoring task 1 1000 0
4 Wireless network, tracking task 1 1000 0

 

 

 

 

 

 

 

implemented, the sensor nodes are implemented on general purpose PC’s running
Linux and hence the capture and compress time is not very crucial to the performance
comparison of the various schemes. The software implementation of the codecs for the
tests was a modiﬁed version of the Vic software. The video streams were packetized

using the RTP transport protocol.

6.8.1 Performance Analysis for Capture and Processing Time

The capture time and compression time for one image frame for H.263 and MJPEG
is shown in Table 6.2. The frame rate is set to 25 fps, the bit rate is set to be 1.5 M

and the image size is set to CIF resolution (352x288) for the experiments.

Table 6.2: Capture and Compression Time of the Two Schemes

 

Scheme MJPEG H.261 H.263
Capture 85 Compression 12.65ms 30.09ms 32.58ms

 

 

 

 

 

 

 

We noticed that for H.261 and H.263, when the scene has dramatic changes, the
compression time tends to increase, and in the case of MJPEG the compression time
is quite constant. Since both MJPEG and H.263 are symmetric coding/decoding

schemes, the decompression time should be equivalent to the compression time.

164

6. 8.2 Comparison for Video Size and Image Quality

The H.263 standard is very limited in its capability for selection of the frame size,
which is limited to 5 standard resolutions, namely, CIF, QCIF, SIF, 4CIF and 16CIF.
This may be a limitation when integrating multiple types of camera sensors. On
the other hand MJPEG allows for almost all resolutions (limited to multiples of 8,
of course) and can be used to transmit very high deﬁnition images when required.
The viewing quality of the transmitted images can be largely regarded as the same
because both schemes use DCT (discrete cosine transform) and quantization for the
compression. However, it should be noted that using MJPEG, the complete image
is updated at once, while for H.263, the image is updated in parts and may cause a

visual degradation of the perceived image quality.

6.8.3 Performance Analysis for Video Bitrate and Frame Rate

Due to the inter-frame (temporal) compression implementation, the video bitrate
generated per frame for the H.261/H.263 schemes is lower compared to MJ PEG. This
makes H.261/H.263 schemes more suitable for video communications over a restricted
bandwidth network. The frame rates and bitrates for the two schemes generated for
ﬁxed and panning cameras are tabulated in Table 6.3. The quantization ‘Q’ values [57]

used are based on a visual notion of picture clarity and are noted in the table.

 

 

 

 

Table 6.3: Ffame Rate and Bitrate for MJPEG, H.261 and H.263
Frame MJPEG: Q=30 H.261: Q=10 H.263: Q=10
Rate Bitrate(KBPS) Bitrate(KBPS) Bitrate(KBPS)
(fps) panning static panning static panning static
10 668 668 550 25 700 20
20 1300 1300 1000 45 1000 38
25 1700 1700 1500 60 1300 48

 

 

 

 

 

 

 

 

165

 

6. 8.4 Performance Analysis for Server Switching and Camera Hand-

017

The initialization time of H.261, H.263 and MJPEG are also measured. The initial-
ization time is deﬁned as the time from when the server starts broadcasting to when
the client receives the ﬁrst image frame. In all three cases, the initialization time is
around 10ms.

For H.261 and H.263 schemes, since inter-frame coding is used, there will be long
delay when a client joins a session where a server is already broadcasting to other
clients. The effect is termed “late entry” and is caused due to the use of inter-frame
encoding, where each frame is not independent of the others but must be decoded
using some information from other frames. The stream switching delay due to late
entry can be large when using these encoding schemes, especially when the scene is
relatively static. The late entry problem does not exist for the MJ PEG coding scheme
because all the frames transmitted are intra—coded and do not rely on information in
the previous or subsequent frames.

For the H.261 scheme implementation the server is only required to transmit those
master blocks that have changed through consecutive frames. Thus the client has to
wait until all the master blocks are transmitted once to be able to see the whole
scene. Based on our implementation methodology for the H.261 scheme, a moving
target will be displayed almost immediately while the static environment will take
a longer time to display after the switch, which implies that the switching time will
be low for a tracking task and relatively high for a monitoring task where the static
environment needs to be taken into account.

The H.263 standard allows the transmission of a completely intra—coded frame,
called the ‘1’ frame, to be transmitted periodically in order to negate the effects of
accumulated errors. In our experiments for the H.263 coded stream carried out at

various frame rate settings, the client has to wait for approximately 5—10 seconds to

166

decode and display the ﬁrst image frame because it can start to decode a stream
only after it has received the ﬁrst ‘I’ frame. Table 6.4 tabulates the maximum time
taken by the client to display all the relevant macro blocks for the ﬁrst time when the
client joins the broadcast session late for both the tracking and monitoring tasks. The
maximum switching time due to late entry problem is compared for various frame

rates of the video streams with a very large bandwidth.

Table 6.4: Server Switching Time for Monitoring and Tracking Tasks

 

 

 

 

 

Frame rate Monitoring task Tracking task
(fps) MJPEG H.261 H.263 MJPEG H.261 H.263
1 13 323 9.33 ls ls 9.53
10 0.13 3.23 9.83 0.13 0.123 9.73
25 0.063 1.33 9.23 0.063 0.053 9.43

 

 

 

 

 

 

 

 

 

Note that the times given in Table 6.4 are the measured values for the display to
update. We notice that the switching time for the H.261 scheme is signiﬁcantly lower
for the tracking task than for the monitoring task because the macro blocks comprising
the moving target will be updated and transmitted almost instantly; however, the
static environment blocks will not be updated until they age and expire at the end
of their 32-frame update cycle.

It should be noted that for the H.263 coding scheme, all the information needed to
initialize the decoder is stored in an intra coded ‘1’ reference frame, which is transmit-
ted periodically. One method to combat this late entry problem would be to transmit
an ‘I’ frame every time a new client joins the session, and can be implemented using
the RTCP (control commands) part of the RTP protocol. This may lead to higher

bandwidth consumption, but may be acceptable in wired surveillance applications.

167

6.8.5 Performance Metric Evaluation for Various Surveillance Net-

work Tasks

Based on the experimental results reported above, the metric It! evaluated for the
various scenarios and the codec schemes is tabulated in Table 6.5. The video frame
rate was held constant at 25 fps with the image quality (Q) being held at 30 for
MJ PEG and 10 for H.261/H.263. The cameras are assumed to be stationary.

Table 6.5: Metric for Comparison of Various Schemes

 

 

 

Scenario # MJPEG H.261 H.263
1: Wired, monitoring 16.67 0.769 0.109
2: Wired, tracking 16.67 20 0.109

 

3: Wireless, monitoring 17.26 17.43 20.94
4: Wireless, tracking 17.26 36.67 20.94

 

 

 

 

 

 

 

We notice that the H.261 implementation has best performance for the wired and
wireless tracking scenarios, whereas for the wired monitoring scenario, MJPEG has
the best performance and the H.263 implementation manages better performance for
the wireless monitoring scenario. It must be kept in mind that the performance eval-
uation for the various schemes are qualiﬁed by their implementation methodologies
outlined in section 5.5. A brief discussion regarding the advantages and disadvantages
of using the various video subsystems is presented here.

The MJ PEG system can transmit video frames of various sizes and hence has the
advantage of being able to handle heterogenous hardware for video capture. Further
it has a low coding and initialization latency and allows for a direct hardware im-
plementation, which is advantageous for using lower processing power on the sensor
node. It does not suffer from the late entry problem during switching. However, it
requires a consistently high bandwidth, which may limit its application on wireless
or low bandwidth communication channels.

H.261 and H.263 encoding schemes consume less bandwidth, allowing for higher

168

frame rates, which is an important characteristic for continuous and responsive real-
time monitoring. The drawbacks of these schemes are that they can handle video
frames of certain speciﬁed sizes only and that both initialization time and switching
time are higher than in MJ PEG. They also suffer from the late entry problem, which
can be detrimental for systems involving video stream switching and nodes linked over
unreliable channels. However, the late entry problem can be solved at the expense

of transmitting the complete intra—frame—encoded image when requested using the

RTCP protocol.

6.9 Chapter Summary

This chapter presents the experimental implementation details of the various algo-
rithms proposed earlier. The image-based Hausdorff tracking method has been im—
plemented for tracking a single and multiple targets simultaneously. This tracking of
multiple disjoint targets moving independently is a major advantage of the proposed
method, as the current existing image—based tracking techniques, such as visual ser-
voing and gaze control, cannot achieve this. Results for the cooperative Hausdorff
tracking are also presented. The two multi-sensor cooperation methods proposed are
also validated experimentally. A task-based metric is also proposed for evaluating the

suitability of the various video feedback schemes.

169

Chapter 7

Conclusions

This dissertation has presented an in-depth examination of the various aspects of
planning and control of mobile surveillance networks, concentrating particularly in
the broad areas of modeling, analysis and design of mobile surveillance networks. This

chapter discusses the conclusions and gives a view towards future areas of research.

7.1 Conclusions

The non-intrusive nature of cameras makes them an attractive choice for sensors for
various surveillance applications. In recent times, with the decrease in the cost of cam-
era sensors accompanied with increasing processing power and availability of wireless
networking capabilities in sensors of exceedingly smaller size, large-scale target track-
ing systems utilizing a large number of cameras are being increasingly deployed for
myriad surveillance and tracking tasks. Sensor mobility accompanied with large scale
wireless networking can overcome the limitations due to range, direction and occlu-
sion of visual sensing. However, a large number of sensors with motion capabilities
tends to increase the modeling and analysis complexity of such systems.

For successful completion of the surveillance task, it is important to allocate the

sensors to the targets that are actively being tracked. Generally, each target should

170

be tracked by the sensor that can view it at an optimal resolution for best visibility of
that target. However, when the system is heavily loaded, that is, the target density
in a particular region with a limited number of sensors is high, one sensor might need
to track multiple targets. Even when the system is under-loaded, multiple target
tracking by individual sensors is highly desirable, as viewpoint redundancy will lead
to better estimates of the target coordinates and also present the viewer a clearer
view of all the targets that are being tracked.

In order to track a target over a wide area with a distributed surveillance network,
the tracking sensors need to be successively switched and the tracking task needs to
be handed over from one sensor to the next. This switching, while increasing the
modeling and analysis complexity, also introduces the design problem of optimally
locating the sensors (i.e., conﬁguring the sensors) in order to maximize the tracking
performance. Switching the tracking task from one sensor to another tends to be quite
disorienting for a human operator and hence needs to be minimized while tracking a
target over a large region, while also maintaining adequate resolution of the target to
allow discerning various target features for classiﬁcation and identiﬁcation of targets.

This dissertation presents various novel approaches to solve the aforementioned
problems associated with mobile surveillance networks. The original contributions of
this dissertation can be broadly classiﬁed into modeling, analysis and design contri-

butions, which are individually addressed below.

Modeling Contributions Using the concepts of continuous and discrete motion
of cameras, a generalized modeling framework for conducting pervasive surveillance
has been developed. First, we have developed a novel domain-based modeling ap-
proach, christened “Hausdorff tracking”, for visually tracking multiple targets with
a single sensor while simultaneously accomplishing various other sub-tasks such as

maintaining adequate resolution of the tracked targets. Hausdorff tracking uses a

171

mutational-analysis-based approach to express the motion of the target and sensor
coverage sets with respect to the camera motion input. The various task requirements
can be speciﬁed using shape functions which can be construed as analogs of vector
errors in a vector—based control system.

Using the concept of mutations of domains and shape analysis, the dynamics
(change in shape) of the sensor ﬁeld of view (FOV) and target domains are described
and further feedback control mechanisms have been derived to complete the speciﬁed
task. The assumptions on the control inputs to the camera for stability of the control
system can be calculated using the shape Lyapunov theorem. Hausdorff tracking
provides a signiﬁcant advantage over classical vector space servoing as it can express
the multiple target tracking succinctly and perform the tracking task in an energy-
efficient manner. The methods of image-based Hausdorff tracking and cooperative
Hausdorff tracking are introduced to perform target tracking using target information
either directly sensed from the camera image plane or acquired from other sensors
using the communication module, respectively.

In order to pervasively track a target over a large area covered by multiple sensors,
the mutational hybrid automata (MHA) model is proposed. The MHA model is
capable of capturing the inherent hybrid (discrete switching with continuous motion)
nature of the pervasive surveillance task - i.e., it accounts for both continuous camera
motion and discrete jumps in the location of the tracking camera, which indicates
a switch of the tracking camera from one sensor to another. The MHA model for
pervasive surveillance can be used to model “hand-off” scenarios as the targets move
across the coverage regions of multiple networked sensors. It allows modeling of both
autonomous and controlled switches, which provides for a rich framework for hybrid
control of the various tracking cameras based on the location of the target and other

environment inputs.

172

:E_t Wﬁﬂ

Analysis Stability properties of the Hausdorff tracking controllers are proved using
the shape Lyapunov theorem. Using the shape Lyapunov theorem, it is shown that
the shape function (error) converges to zero at least as fast as a stable ﬁrst-order
linear system. Stability for a pervasive tracking scenario implies that when a target
is detected by a camera in the network, its visibility will always be ensured while
it moves around in the monitored system. Using examples of pervasive surveillance
scenarios, a two-step procedure is proposed to show the stability properties of the
MHA model for modeling pervasive surveillance tasks. The ﬁrst step shows that all
the modes of the MHA model are internally stable - i.e., continuous evolution within
each mode is stable in the sense of Lyapunov and there are no Zeno executions. The
next step is to show that a particular set of stable modes in the MHA model forms an
invariant set - i.e., if the hybrid execution enters this invariant set, it will remain in
this set for all time. Now if we limit the Init set of the MHA model to the invariant
set, the execution will never leave that set. This implies that if we can prove that the
invariant set is the set of all modes where at least one camera can view the target, then
if the target enters the ﬁeld of view of any one camera, it will be tracked continuously
as long as it remains in the monitored region. This procedure is developed for both

models with and without controlled switches.

Design A wide area multi-camera location and pose calibration scheme is presented
that can calibrate the many heterogenous cameras with respect to a global coordinate
system in a short time by simply waving a bright light source in the region observed
by the cameras. This method can be used even when all cameras do not have an
overlapping ﬁeld of view. The problem of optimally placing a number of cameras in
a given region to facilitate a surveillance task is presented. Metrics for evaluating
the performance of the tracking task based on the target resolution and number of

switches to pervasively track the target are presented. Another problem we presented

173

 

is related to camera switching strategy given a full or part of a trajectory path of a
moving target. A dynamic-programming—based approach to generate the switching
strategy and optimize both the switching and resolution metrics is presented.

The Hausdorff tracking problem is extended to incorporate an optimal control
framework for executing multiple subtasks along with the target visibility task for a
surveillance network. This optimization framework allows the surveillance task de-
signer to consider various sub-tasks in the planning phase. Various sub—tasks can
include minimizing the energy consumed by the network, which will enhance the
longevity of the deployed network sensors. Physical programming is used for ﬁnd-
ing a solution to the optimization problem in order to lend a tangible and physical
interpretation to the subtasks being considered.

The design of a switched video feedback system and a sensor node architecture
are also presented. The various alternatives for the switched video feedback namely
MJ PEG, H.261, H.263 and H.264 are evaluated and a task- and scenario-based metric
is proposed to analyze their performance. The MJPEG scheme does not suffer from
the late entry problem, but the bandwidth requirement is constantly high. H.261 and
H.263 schemes consume low bandwidth for relatively static scenes but suffer from
the late entry problem when video switching is required. A switched video feedback
system using a multiple-client, multiple-server architecture using the H.264 video

encoding scheme is implemented.

Implementation and testing A surveillance test-bed consisting of seven heteroge-
nous cameras with motion capabilities for validation and testing of the proposed
schemes and algorithms is implemented. The sensors are calibrated using the wide
area calibration scheme proposed and various target tracking scenarios are imple-
mented and tested. Experimental results validating the proposed theoretical ap-

proaches are presented.

174

The various theoretical approaches and tools presented in this dissertation will
enable the design, analysis and implementation of real-time mobile surveillance sys-

tems.

Overall Impacts All together this dissertation presents a collection of modeling
analysis and design tools for mobile surveillance networks. The impacts of this dis-

sertation to various ﬁelds are summarized below.

Impact on the ﬁeld of surveillance and sensor networks: This dissertation presents
signiﬁcant advances for modeling analysis and design for the ﬁeld of surveillance and
sensor networks. Modeling multiple target tracking using the Hausdorff tracking
method provides a signiﬁcant advantage for distributed sensor systems as it can sig-
niﬁcantly improve sensing accuracy due to multiple detections of the same target with
different sensors. The novel stability analysis approach presented for guaranteing tar-
get coverage using distributed sensors is a signiﬁcant step forward from the current

approaches of probabilistic modeling.

Impact on the ﬁeld of robotics and control: This dissertation presents a major
contribution to the ﬁeld of robotics and control. Vision based robot control has
suffered from the problem of task speciﬁcity i.e., the tracking task is generally over
speciﬁed and also lacked a generalized modeling structure for modeling multi target
visibility problems. The method of Hausdorff tracking proposed in this dissertation
represents a generalized set based modeling structure for modeling traditional vi-
sion based servoing. It can in fact be viewed as the super set of visual servoing as
traditional visual servoing tasks can also be readily expressed in this framework.

The use of mutational equations and shape analysis for representing shape based
control is a signiﬁcant step towards extending the rich tool-set of control literature to

domain based control. Generalizing a vector based model to a domain based model

175

opens up a lot of avenues for creative design of control algorithms. Many real-world
tasks which needed to be parameterized in order to be expressed in a controls theoretic

framework can be directly solved using this domain based approach.

Impact on general science and engineering: This dissertation presents various
results on suitability of various video streaming approaches for surveillance appli-
cations. It provides a method to measure the acceptability of various streaming
solutions which will have a signiﬁcant impact on the implementation of networking

and IP-based video transmission.

Impact on society in general: There are many issues up for debate on privacy
and intrusion of it thereof. However, these concerns should not preclude the pervasive
monitoring and surveillance of critical infrastructure. In this current world scenario,
exhibiting a little precaution with regards to safety is, in my opinion, not a bad
idea. This dissertation provides tools for conducting pervasive surveillance which will
have many potential uses with regards to monitoring critical infrastructure such as
airports power plants etc and also many applications in our fuller understanding of the
world around us i.e., in environmental monitoring for understanding and preserving
natural habitats. However there are also many possible uses for this technology such
as continuous tracking of people with express intent of invasion of their privacy. But
all technology like a double edged sword has it’s advantages and disadvantages. It
is behooved on the people who use technology to be fair, just and moral in the

application of technology.

7.2 Future Research Directions

Large-scale surveillance systems have only recently been commercialized, and the

architecture and capabilities of these systems are still an active research area as many

176

problems still remain to be solved. This dissertation has addressed several issues in
modeling, analysis and design of mobile surveillance systems. Certainly more can be
done to elaborate on the work presented in this dissertation and even more can be
done to address various other issues in mobile surveillance which have not been the
topic of this dissertation.

Shape functions for performing visibility- and resolution-based tasks have been
developed in this dissertation. However, more general classes of admissible shape
functions should be developed in order to provide a rich set of tools to design surveil-
lance tasks.

The mutational hybrid automata model presented uses an edge set in order to
describe the discrete switching dynamics of the pervasive target tracking model. It
should be noted that the number of edges and modes required to model the perva-
sive surveillance task increases exponentially as the number of cameras involved in
the surveillance task grows. This somewhat limits the applicability of the proposed
approach of using edge sets for modeling discrete dynamics. In order to alleviate this
problem, a language-based automata model can be used with an alphabet capable of
modeling very large-scale camera systems.

The proposed MHA model can be used for ensuring the visibility of a single
target moving in a region. The possibility of combining multiple such single-target
MHA models in order to track multiple targets simultaneously using the same network
should be investigated. Optimal control for the proposed mutational hybrid automata
model is still an open problem.

The switched video feedback system is implemented using a baseline encoder. A
high proﬁle encoder with object-level support to code the various objects separately
would certainly enhance the video feedback system by reducing the bandwidth re-
quirement for the surveillance system. The support of user-deﬁned queries and tasks

will certainly enhance the usability of the surveillance system, making it more ﬂexible

177

and adaptable for a real-world surveillance scenario.

There has been relatively little research done on camera deployment and place-
ment issues for tracking applications. The switching-based metric presented in this
dissertation serves as a ﬁrst step towards the ﬁnal goal of optimal camera placement.
A number of avenues can be pursued to carry this work further. A user-friendly
camera placement planning system that utilizes the switching metric amongst others

would provide a very useful tool for optimal camera placement.

178

10.

11.

AUTHOR’S PUBLICATIONS

A. Goradia, M. Huntwork, C. Haffner, N. Xi, and M. Mutka, “Modeling and
Control of Pervasive Surveillance Networks using Mutational Hybrid Systems,”
in IEEE IEEE/RSJ International Conference on Intelligent Robots and Sys-
tems, 2006.

A. Goradia, C. Haffner, N. Xi and M. Mutka, “Optimality Framework for Haus-
dorff Tracking using Mutational Dynamics and Physical Programming,” submit-
ted to IEEE International Conference on Robotics and Automation, 2007.

A. Goradia, Z. Cen, C. Haffner, N. Xi and M. Mutka, “Sensor Selection Problem
using Dynamic Programming,” submitted to IEEE International Conference on
Robotics and Automation, 2007.

A. Goradia, C. Haffner, Z. Cen, M. Mutka and N. Xi, “Modeling, Analysis and
Design of Mobile Surveillance Networks Using Mutational Hybrid Systems,”
Submitted to IEEE Transactions on Robotics.

. A. Goradia, M. Huntwork, C. Haffner, C. Klochko, N. Xi, and M. Mutka, “Per-

vasive Surveillance using a Cooperative Mobile Sensor Network,” in IEEE In—
ternational Conference on Robotics and Automation, 2006.

A. Goradia, Z. Cen, C. Haffner, Y. Liu, B. H. Song, M. Mutka and N. Xi,
“Pervasive Surveillance Networks: Design, Implementation and Performance
Analysis,” Accepted, International Journal of Distributed Sensor Networks.

A. Goradia, Z. Cen, C. Haffner, N. Xi and M. Mutka, “Design, Implementation
and Performance Analysis of Pervasive Surveillance Networks”, 19th Interna-
tional F LAIRS conference, 2006.

A. Goradia, N. Xi, Z. Cen, and M. Mutka, “Modeling and Design of Mobile
Surveillance Networks using a Mutational Analysis Approach,” in IEEE/RSJ
International Conference on Intelligent Robots and Systems, 2005.

A. Goradia, N. Xi, M. Prokos, Z. Gen, and M. Mutka, “Cooperative Multi-
Target Surveillance using a Mutational Analysis Approach,” in Proceedings of
2005 IEEE/ASME International Conference on Advanced Intelligent Mecha-
tronics (AIM 2005), Monterey, California, USA, 2005.

A. Goradia, I. Elhajj, and N. Xi, “Internet Based Robots: Applications, Im-
pacts, Challenges and Future Directions,” in IEEE Workshop on Advanced
Robotics and its Social Impacts (ARSO), 2005.

Z. Cen, A. Goradia, Y. Liu, N. Xi, M. Mutka, “Improving the Operation Effi-
ciency of Internet-Based Teleoperation via Overlay Networks,” Submitted, IEEE
Transactions on Robotics.

179

12

13.

14.

15.

16.

17.

18.

19.

20.

21.

Z. Cen, A. Goradia, M. Mutka, N. Xi, W.-K. Fung, and Y.-h. Liu, “Improving
the Operation Efficiency of Supermedia Enhanced Internet Based Teleoperation
via an Overlay Network,” in IEEE International Conference on Robotics and
Automation, Barcelona, Spain, 2005.

Z. Cen, M. Mutka, Y. Liu, A. Goradia, and N. Xi, “QoS Management of Super-
media Enhanced Teleoperation via Overlay Networks,” in IEEE/RSJ Interna-
tional Conference on Intelligent Robots and Systems (IROS 2005), Edmonton,
Alberta, Canada, 2005.

A. Goradia, Z. Nichol, Y. Liu, P. Suchyta, M. Prokos, and N. Xi, “Super-
Media Enhanced Internet-Based Real—Time Teleoperation,” in IEEE Interna-
tional Conference on Mechatronics and International Conference on Hands-on
Intelligent Mechatronics and Automation (ICM-HIMA), 2005.

A. Goradia, L. Pan and N. Xi, “Dynamic Multi-Objective Optimal Task Dis-
tribution for Teleoperated Mobile Manipulators,” in IEEE/RSJ International
Conference on Intelligent Robots and Systems, 2004.

J. Tan, A. Goradia, N. Xi, and W. Sheng, “Coordination of Multi-Robot and
Human Systems in a Perceptive Reference Frame,” International Journal of
Vehicle Autonomous Systems, vol. 2, 2004.

I. Elhajj, A. Goradia, N. Xi, C. M. Kit, Y.-H. Liu, and T. Fukuda, “Design
and Analysis of Internet-Based Tele—Coordinated Multi-Robot Systems,” Au-
tonomous Robots, vol. 15, no. 3, 2003.

I. Elhajj, A. Goradia, N. Xi, C. M. Kit, Y.-H. Liu, and T. Fukuda, “Tele-
Coordinated Control of Multi-Robot Systems via the Internet.” in IEEE Inter-
national Conference on Robotics and Automation, 2003.

J. Tan, A. Goradia, N. Xi, and W. Sheng, “Coordination of Human and Mobile
Manipulator Formation in a Perceptive Reference Frame.” in IEEE Interna-
tional Conference on Robotics and Automation, 2003.

I. Elhajj, A. Goradia, N. Xi, W. T. Lo, Y.—H. Liu, and T. Fukuda, “Internet-
Based Tele-Manufacturing,” in The Seventh International Conference on Au—
tomation Technology, Chia—yi, Taiwan, 2003.

Amit Goradia, Jindong Tan, Ning Xi, “Hybrid Force/ Position Control in Mov-

ing Coordinate Frames,” International Conference of Control, Automation, Robotics

and Computer Vision, Singapore, 2002.

180

BIBLIOGRAPHY

[1] C. S. Regazzoni, G. Fabri, and G. Vernazza, Advanced Video-Based Surveillance
Systems. Norwell, MA: Kluwer, 1998.

[2] G. L. Foresti, P. Mahonen, and C. S. Regazzoni, Multimedia Video-Based Surveil-
lance Systems: Requirements, Issues and Solutions. Norwell, MA : Kluwer, 2000.

[3] M. Bogaert, N. Chelq, P. Cornez, C. S. Regazzoni, A. Teschioni, and M. Thonnat,
“The password project,” in Proceedings of the International Conference on Image
Processing, Chicago , IL, 1996, p. 675 678.

[4] C. S. Regazzoni and A. Tesei, “Distributed data fusion for real-time crowding
estimation,” Signal Processing, vol. 53, p. 47 63, 1996.

[5] E. F. Lyon, “The application of automatic surface lights to improve airport
safety,” IEEE AES Systems Magazine, p. 14 20, 1993.

[6] M. Braasch, M. DiBenedetto, S. Braasch, and R. Thomas, “Laas operations in
support of airport surface movement, guidance, control and surveillance: Initial
test results,” in IEEE Proceedings on Position Location and Navigation Sympo-

sium, 2000, p. 82 89.

[7] H. Wang, T. Kirubarajan, and Y. Bar-Shalom, “Precision large scale air
trafﬁc surveillance using imm/ assignment estimators,” IEEE Transactions on
Aerospace and Electronic Systems, vol. 35, p. 255 266, January 1999.

[8] B. Sridhar and G. B. Chatterji, “Computationally efficient conﬂict detection
methods for air trafﬁc management,” in Proceedings of the American Control
Conference, vol. 2, 1997, p. 1126 1130.

[9] G. Donohue, “Vision on aviation surveillance systems,” in Proceedings of the
IEEE International Radar Conference, 1995, p. 1 4.

[10] G. L. Foresti and B. Pani, “Monitoring motorway infrastructures for detection
of dangerous events,” in Proceedings of the IEEE International Conference on
Image Analysis and Processing, Venice, Italy, 1999, p. 1144 1147.

[11] J. M. Manendez, L. Salgado, E. Rendon, and N. Garcia, “Motorway surveil-
lance through stereo computer vision,” in Proceedings of the 33rd IEEE Annual
International Carnahan Conference on Security Technology, 1999, p. 197 202.

[12] D. Koller, K. Daniilidis, and H. Nagel, “Model-based object tracking in monoc-
ular image sequences of road trafﬁc scenes,” International Journal on Computer
Vision, vol. 10, p. 257 281, 1993.

[13] J. Malik, D. Koller, and J. Weber, “Robust multiple car tracking with occlusion
reasoning,” in European Conference on Computer Vision, Stockolm, Sweden,
1994, p. 189 196.

181

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

[22]

[23]

[24]

[25]

A. N. Ince and E. Topuz, “The design and computer simulation of a maritime
surveillance system,” in International Conﬁerence on Radar, 1997, p. 653 656.

J. G. Sanderson, M. K. Teal, and T. J. Ellis, “Target identiﬁcation in complex
maritime scenes,” in 6th International Conference on Image Processing and its
Applications, vol. 2, 1997, p. 463 467.

C. A. Rodriguez, J. A. Howell, H. O. Menlove, C. M. Brislawn, J. N. Bradley,
P. Chare, and T. Gorten, “Nuclear video image pro—cessing for nuclear safe-
guards,” in Proceedings of the IEEE 29th Annual International Carnahan Con-
ference on Security Technology, 1995, p. 355 363.

E. Stringa and C. S. Regazzoni, “Content-based retrieval and real time detection
from video sequences acquired by surveillance systems,” in IEEE International
Confnference on Image Processing, vol. 3, Chicago, IL, October 4-7 1998, p. 138
142.

C. Lin and R. Nevatia, “Building detection and description from a single intensity
image,” Computer Vision and Image Understanding, vol. 72, no. 2, p. 101 121,
1998.

L. Vergara and P. Bernabeu, “Automatic signal detection applied to ﬁre control
by infrared digital signal processing,” Signal Processing, vol. 80, no. 4, p. 659
669, 2000.

W. Millesi, M. J. Truppe, F. Watzinger, A. Wagner, and R. Ewers, Lecture Notes
in Computer Science. Berlin, Germany: Springer, 1997, vol. 1205, ch. Image
guided surgery extended by remote stereotactic visualization, p. 813 821.

I. Haritaoglu, D. Harwood, and L. S. Davis, “W4: Real-time surveillance of peo—
ple and their activities,” IEEE Transaction on Pattern Analalysis and Machine
Intelligence, vol. 22, p. 809830, August 2000.

N. M. Oliver, B. Rosario, and A. P. Pentland, “A bayesian computer vision sys-
tem for modeling human interactions,” IEEE Transactions on Pattern Analalysis
and Machine Intelligence, vol. 22, no. 8, p. 831843, August 2000.

D. A. Pritchard, “System overview and applications of a panoramic imaging
perimeter sensor,” in Proceedings of the IEEE 29th Annual International Car-
nahan Conference on Security Technology, 1995, p. 420425.

U. Oppelt, “New possibilities for video applications in the security ﬁeld,” in
Proceedings of the IEEE 29th Annual International Carnahan Conference on
Security Technology, 1995, p. 426435.

B. Peters, J. Meehan, D. Miller, and D. Moore, “Sensor link protocol: Linking
sensor systems to the digital battleﬁeld,” in Proceedings of the IEEE Military
Communications Conference, vol. 3, 1998, p. 919923.

182

[26]

[27]

[28]

[29]

[30]

[31]

[32]

[33]

[34]

[35]

[36]

[37]

M. T. Fennell and R. P. Wishner, “Battleﬁeld awareness via synergistic sar and
mti exploitation,” IEEE Aerospace and Electronic Syststems Magazine, vol. 13,
p. 3943, February 1998.

J. S. Draper, S. Perlman, C. K. Chuang, M. Hanson, L. Lillard, B. Hibbeln, and
D. Sene, “Tracking and identiﬁcation of distant missiles by remote sounding,” in
Proceedings of the IEEE Aerospace Conference, vol. 4, 1999, p. 333341.

G. A. V. Sickle, “Aircraft self reports for military air surveillance,” in Proceedings
of the IEEE Digital Avionics Systems Conference, vol. 2, 1999, p. 28.

K. Toyoma, J. Krumm, B. Brumitt, and B. Meyers, “Wallﬂower: Principles and
practice of background maintainence,” in International Conference on Computer
Vision, 1999, pp. 255—261.

I. Elhajj, N. Xi, W. K. Fung, Y.-H. Liu, Y. Hasegawa, and T. Fukuda,
“Supermedia-enhanced internet-based telerobotics,” Proceedings of the IEEE,
vol. 91, no. 3, pp. 396 —- 42, March 2003.

N. Xi and T. J. Tarn, “Action synchronization and control of internet based teler-
obotic systems,” in IEEE International Conference on Robotics and Automation,
1999.

C. Regazzoni, V. Ramesh, and G. E. Foresti, “Special issue on third generation
surveillance systems,” Proceedings of the. IEEE, vol. 89, Oct. 2001.

R. Collins, A. Lipton, and T. Kanade, “Introduction to the special section on
video surveillance,” Pattern Analysis and Machine Intelligence, IEEE Transac-
tions on, vol. 22, no. 8, pp. 745—746, August 2000.

J. Tan, N. Xi, W. Sheng, and J. Xiao, “Modeling multiple robot systems for
area coverage and cooperation,” in International Cconference on Robotics and
Automation, 2004.

S. Stoeter, P. Rybski, M. Erickson, M. Gini, D. Hougen, D. Krantz, N. Pa-
panikolopoulos, and M. Wyman, “A robot team for exploration and surveillance:
Design and architecture,” in Proceedings of the International Conference on In-
telligent Autonomous Systems, 2000, pp. 767—774.

J. Cortes, S. Martinez, T. Karatas, and F. Bullo, “Coverage control for mobile
sensing networks,” in International Conference on Robotics and Automation,

2003.

T.Matsuyama and N .Ukita, “Real-time multitarget tracking by a cooperative
distributed vision system,” Proceedings of the IEEE, vol. 90, no. 7, pp. 1136—
1150, 2002.

183

[38]

[39]

[40]

[41]

[42]

[43]

[44]

[45]

[46]

[47]

[48]

[49]

[50]

T. Boult, R. J. Micheals, X. Gao, and M. Eckmann, “Into the woods: Visual
surveillance of noncooperative and camouﬂaged targets in complex outdoor set-
tings,” Proceedings of the IEEE, vol. 89, Oct. 2001.

R.T.Collins, A.J.Lipton, H.Fujiyoshi, and T.Kanade, “Algorithms for coopera-
tive multisensor surveillance,” Proceedings of the IEEE, vol. 89, pp. 1456—1477,
2001.

R. Brooks, C. Grifﬁn, and D. Friedlander, “Self-organized distributed sensor
network entity tracking,” International Journal of High Performance Computing,
vol. 16, no. 2, 2002.

S. Hutchinson, G. Hager, and P. Corke, “A tutorial on visual servo control,”
IEEE Transactions on Robotics and Automation, vol. 12, no. 5, pp. 651—670,
1996.

C. Brown, “Gaze control with interactions and delays,” IEEE Transactions on
Systems Man and Cybernetics, vol. 20, no. 1, pp. 518—527, 1990.

M. Branicky, V. Borkar, and S. Mitter, “A uniﬁed framework for hybrid control:
Model and optimal control theory,” IEEE Transactions on Automatic Control,
vol. 43, no. 1, pp. 31-45, 1998.

J. Lygeros, K. Johansson, S. Simic, J. Zhang, and S. S. Sastry, “Dynamical prop-
erties of hybrid automata,” IEEE Transactions on Automatic Control, vol. 48,
no. 1, pp. 2—17, 2003.

C. Intanagonwiwat, R. Govindan, D. Estrin, J. Heidemann, and F. Silva, “Di-
rected Diffusion for Wireless Sensor Networking,” IEEE/A CM Transactions on
Networking, vol. 11, no. 1, 2003.

D. Braginsky and D. Estrin, “Rumor Routing Algorithm for Sensor Networks,”
Proceedings of the 1 st ACM international workshop on Wireless sensor networks
and applications, pp. 22—31, 2002.

H. S. Hamza and S. Wu, “RAODV: an Efﬁcient Routing Algorithm for Sen-
sor Networks,” Proceedings of the 2nd International Conference on Embedded
Networked Sensor Systems, pp. 297—298, 2004.

R. T. Collins, A. J. Lipton, and T. Kanade, “Introduction to the special section
on video surveillance,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 22, p. 745746, August 2000.

A. J. Courtemanche and A. Ceranowicz, “Modsaf development status,” in Pro-
ceeding of the 5th Conference on Computer Generated Forces and Behavioral
Representation, Orlando, FL, May 1995, p. 313.

K. Cauble, “The openscene modstealth,” in Proceedings of the Simulation Inter-
operability Workshop, Paper 97S-SIW—008 1997.

184

[51] C. S. Regazzoni, C. Sacchi, and C. Dambra, “Remote cable-based video surveil-
lance applications: the avs—rio project,” in Proceedings of ICIAP99, Venice ,
Italy, September 2729 1999, p. 12141215.

[52] P. Rybski, S. Stoeter, M. Gini, D. Hougen, and N. Papanikolopoulos, “Perfor-
mance of a distributed robotic system using shared communications channels,”
Robotics and Automation, IEEE Transactions on, vol. 18, no. 5, pp. 713 —- 727,
October 2002.

[53] ITU-H.261, “Video codec for audio—visual services at 64—1920 kbit/s,” I TU-T
Recommendation H.261, 1993.

[54] ITU-H.263, “Video coding for low bit rate communication,” I TU—T Recommen-
dation H.263, March 1996.

[55] I.-T. R. H. . I. 11496-10, “Advanced video coding,” Final Committee Draft,
Document J VT-E022, September 2002.

[56] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, “Rtp: A transport
protocol for real—time applications,” RF 01889, July 1997.

[57] L. Berc, W. Fenner, R. Frederick, S. McCanne, and P. Stewart, “Rtp payload
format for jpeg—compressed video,” RF C2435, October 1998.

[58] E. Sontag, Mathematical Control Theory. Springer-Verlag, 1990.

[59] J .-P. Aubin, “Mutational equations in metric spaces,” Set- Valued Analysis, vol. 1,
pp. 346, 1993.

[60] —, Mutational and Morphological Analysis: Tools for Shape Evolution and
Morphogenesis. Birkhéiuser,1999.

[61] R. Alur, T. Henzinger, G. Lafferriere, and G. Pappas, “Discrete abstractions of
hybrid systems,” Proceedings of the IEEE, vol. 88, no. 7, pp. 971—984, 2000.

[62] J. Cea, “Problems of shape optimal design,” Optimization of Distributed Param-
eter Structures vol. I and II, pp. 1005—1087, 1981.

[63] J. Sokolowski and J .-P. Zolesio, Introduction to Shape Optimization: Shape Sen-
sitivity Analysis, ser. Computational Mathematics. Springer-Verlag, 1991.

[64] A. C. Sanderson, L. Weiss, and C. Neuman, “Dynamic sensor based control of
robots with visual feedback,” IEEE Transactions on Robotics and Automation,
vol. RA-3, pp. 404—417, October 1987.

[65] L. Doyen, “Shape laypunov functions and stabilization of reachable tubes of
control problems,” Journal of Mathematical Analysis and Applications, vol. 184,
pp. 222—228, 1994.

185

[66] —, “Mutational equations for shapes and vision-based control,” Journal of
Mathematical Imaging and Vision, vol. 5, no. 2, pp. 99—109, 1995.

[67] A. Goradia, Z. Cen, M. Mutka, and N. Xi, “Modeling and design of mobile
surveillance networks using a mutational analysis approach,” in IEEE/RSJ In-
ternational Conference on Intelligent Robots and Systems, 2005.

[68] A. Goradia, N. Xi, M. Prokos, Z. Gen, and M. Mutka, “Cooperative Multi-
Target Surveillance Using a Mutational Analysis Approach,” in Proceedings of
2005 IEEE/ASME International Conference on Advanced Intelligent Mechatron-
ics (AIM 2005), Monterey, California, USA, 2005.

[69] K. Fregene, D. Kennedy, and D. Wang, “A study of supervisory constraints in a
class of coordinated multiagent systems,” in Proceedings of the American control
Conference, 2003, pp. 1044-4049.

[70] Z. Zhang, “A ﬂexible new technique for camera calibration,” IEEE Transactions
on Pattern and Machine Intell, vol. 22, no. 11, pp. 1330—1334, 2000.

[71] —, “Motion and structure from two perspective views: from essential param-
eters to euclidean motion through the fundamental matrix,” J. Opt. Soc. Am,
vol. 14, no. 11, 1997.

[72] R. D. Z. Zhang, 0. Faugeras, “An effective technique for calibrating a binocular
stereo through projective reconstruction using both a calibration object and the
environment,” Videre: Journal of Computer Vision Research, vol. 1, no. 1, 1997.

[73] L. Shapiro and G. Stockman, Computer Vision. Prentice Hall, 2001.

[74] X. Chen, J. Davis, and P. Slusallek, “Wide area camera calibration using virtual
calibration objects,” in Computer Vision and Pattern Recognition. Proceedings.
IEEE Conference on, 2000.

[75] A. Heyden and K. Astrom, “Minimal conditions on intrinsic parameters for eu-
clidean reconstruction,” in ACC V ’98: Proceedings of the Third Asian Confer-
ence on Computer Vision- Volume II. London, UK: Springer-Verlag, 1997, pp.
169—176.

[76] M. Grewal and A. Andrews, Kalman Filtering: Theory and Practice. Prentice
Hall, 1993.

[77] W.-P. Chen, C.-J. Hou, and L. Sha, “Dynamic clustering for acoustic target
tracking in wireless sensor networks,” IEEE Transactions on Mobile Computing
Special Issue on Sensor Networks, 2004.

[78] J. Liu, J. Liu, J. Reich, P. Cheung, and F. Zhao, “Distributed Group Manage—
ment for Track Initiation and Maintenance in Target Localization Applications,”

in Proceedings of 2nd International Workshop on Information Processing in Sen—
sor Networks (IPSN), 2003.

186

[79]

[80]

[81]

[82]

[83]

[84]

[85]

[86]

[87]

[88]

[89]

[90]

[91]

[92]

Q. Fang, J. Liu, L. Guibas, and F. Zhao, “RoamHBA: Maintaining Group Con-
nectivity In Sensor Networks,” in The 3rd international symposium on Informa-
tion processing in sensor networks (IPSN2004), 2004, pp. 151—160.

M. Mauve, J. Widmer, and H. Hartenstein, “A Survey on Position—Based Routing
in Mobile Ad—Hoc Networks,” IEEE Network Magazine, vol. 15, no. 6, pp. 30—39,
2001.

R. Timothy and J. Arora, “Survey of multi-objective optimization method for en-
gineering.” http://www. ccad.uiowa.edu/ tmarler/rtm/img/survey.pdf, Accessed
March 2005.

B. S. Gottfried and J. VVeisman, Introduction to Optimization Theory. Prentice-
Hall, Inc., Englewood Cliffs, New Jersey, 1973.

A. Mullur, C. Mattson, , and A. Messac, “New decision matrix based ap-
proach for concept selection using linear physical programming,” in 44th
AIAA/ASME/ASCE/AHS, Paper No. AIAA 20034446, Norfolk, VA, April
2003.

A. Messac, “Physical programming: Effective optimization for computational
design,” AIAA Journal, vol. 34, no. 1, pp. 149—158, 1996.

———, “Horn dubious construction of objective functions to the application of
physical programming,” AIAA Journal, vol. 38, no. 1, pp. 155—163, Jan 2000.

R. Fletcher, Practical Methods of Optimization, Second Edition. Anchor Bren-
don Ltd., Tiptree, Essex, 1991.

Trolltech-Inc, “Qt: A cross platform application development framework,”
http://www.trolltech.com/products/qt/index.html, accessed December 2005.

C. Kim and J .-N . Hwang, “Ob ject-based video abstraction for video surveillance
systems,” IEEE Transactions on Circuits and Systems for Video Technology,
vol. 12, no. 12, December 2002.

S. M. LaValle and S. A. Hutchinson, “Optimal motion planning for multiple
robots having independent goals,” IEEE Transactions on Robotics and Automa—
tion, vol. 14, no. 6, pp. 912—925, December 1998.

R. Pal, Nikhil and S. K. Pal, “A review on image segmentation techniques,”
Pattern Recognition, vol. 26, no. 9, pp. 1277-1294, 1993.

J. Bruce, T. Balch, and M. Veloso, “Fast and inexpensive color image segmen-
tation for interactive robots,” in IROS, 2000.

T. Svoboda, D. Martinec, and T. Pajdla, “A convenient multi-camera self-
calibration for virtual environments,” PRESENCE: Teleoperators and Virtual
Environments, vol. 14, no. 4, pp. 407—422, August 2005.

187

   

I"[[[[gi[[[[[jgl[[1]]