A \ .-.
yaw

79633.
iﬁz‘ésrﬁirf-ﬂ' .-.
“in raw; m $2.}
“Ni”: r. ,
‘7 " 75.
1"

.4.
W}; ‘
131.25! . '

nu}
.~ g
. x
' I
1‘63“. ‘
.6
“St 13‘!

 

‘,‘\' . r
“Au; .
Ruler“

c -.. H

"41%"

TH E8l8

l

,". »‘

10039093!

-O-C-l-O-Q-O-.-.-O-.-O-

This is to certify that the
dissertation entitled

DAV: A HUMANOID PLATFORM AND DEVELOPMENTAL
LEARNING WITH CASE STUDIES

presented by

SHUQING ZENG

has been accepted towards fulfillment
of the requirements for the

PHD degree in COMPUTER SCIENCE

 

 

W ”W Wt

 

Major Professor’s Signature

8/20/04
Date

 

Doctoral Dissertation

MSU is an Affirmative Action/Equal Opportunity Institution

 

. LlBRARY
Michigan State
University

 

 

 

PLACE IN RETURN Box to remove this checkout from your record.
TO AVOID FINES return on or before date due.
MAY BE RECALLED with earlier due date if requested.

 

DATE DUE DATE DUE DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

6/01 cJCIRC/DateDue.p65-p.15

 

DAV: A HUMANOID PLATFORM AND DEVELOPMENTAL
LEARNING WITH CASE STUDIES

By

Shuqz'ng Zeng

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Computer Science and Engineering

2004

ABSTRACT

DAV: A HUMANOID PLATFORM AND DEVELOPMENTAL LEARNING
WITH CASE STUDIES

By
Shuqz'ng Zeng

Motivated by the autonomous development process of humans, we are interested
in building a robot that learns online in real-time by interacting with the environment
through sensors and effectors, and develops a representation of the environment and
tasks autonomously. We call such a robot a developmental robot, which Should sat-
isfy eight challenging requirements: environment openness, high-dimensional sensors,
completeness in using sensory information, online processing and real-time speed,
incremental processing, performing while learning, and handling increasingly more
complex tasks.

To partially satisfy the above-mentioned challenging requirements for a develop—

mental robot, this research contributes:

. Building the body of a humanoid robot, called Dav. Embodiment is the most
overlooked aspect of human intelligence in traditional artiﬁcial intelligence. The
morphology of the robot is preferably of a humanoid form. This human-like

shape may allow humans to interact with the robot more naturally and provide

the similar physical constraints as humans during those interactions. Further-
more, directly Situating the robot in the physical world relieves the needs for a
symbolic intermediary representation. The actual physical world serves for this
representation. Therefore, as a part of this research, we built the Dav robot,
which is the only untethered mobile humanoid currently in the universities of

the United States.

Creating a developmental architecture and a modiﬁed version of incremental
hierarchical discriminant analysis (IHDR) algorithm. This thesis also presents
a theory of developmental mental architecture. Five architecture types, from
the simplest Type-1 (observation-driven Markov decision process) to Type-5
(DOSASE MDP), are introduced. Further, we present the architecture design
of the Dav robot. The framework of the Dav architecture is hand-designed,
but the actual controller is developed, i.e., generated autonomously by the de-
velopmental program through real-time interactions with the real physical en-
vironment. We present the Dav architecture and the major components that

implement the architecture.

Based on the above developed robotic platform and the proposed architecture,

we have designed and implemented a range-based navigation system.

ACKNOWLEDGMENTS

I am indebted to my advisor, Professor John J. Weng, who has been the guiding force
of my professional development and an inexhaustible source of ideas throughout my
Ph.D program at Michigan State University. Without his kind assistance and advice,
this dissertation would not have been completed. I am grateful to Professors Anil K.
Jain, George C. Stockman, Zhengfang Zhou and Ning Xi, for serving on my comittee
and providing critical comments on my research work.

Dav has been an enormous scientiﬁc project which has beneﬁted from the efforts
and ideas of many people. This project simply could never have happened without
Professor John J. Weng. Thank you, John, for offering me the opportunity to build a
system from scratch. Thanks also to the members of the Dav team, past and present:
Dr. Jianda Han, Jingliang Wang and David M. Cherba. Four of us have built exten-
sively on the many previous iterations of hardware and software design. Particularly,
David has been invaluable in making the torso and arms a usable resource, in helping
me testing the embedded distributed system, mechanical system and software system.

Thanks also to the members of the E1 Laboratory, past and present, at Michigan
State University. I beneﬁted from many discussions with my colleagues. Thanks

iv

go to Wey-Shiuan Hwang, Yilu Zhang, Nan Zhang, Xiao Huang, Yi Chen, Ameet
Joshi, Jason Massey, Raja Ganjikunta, Gil Abramovich, Christopher Osborn, Tham
Kengyew, and David Bordoley.

I am greatly indebted to my parents for their continuous encouragement and
support throughout my school years. Thanks also go to my wife, Yanhua Chen, for

her patience and support during the years at Michigan State University.

TABLE OF CONTENTS

LIST OF FIGURES ix
LIST OF TABLES xv
1 Introduction 1
1.1 Background .................................. 1
1.1.1 Learning, Perception and World Models ................. 2
1.1.2 Actions through Action Practice ..................... 2
1.1.3 Perception-guided Actions ......................... 2
1.1.4 World-model Free Approach ........................ 3
1.1.5 Autonomous Mental Development .................... 4
1.2 Outline of the Dissertation .......................... 6
2 The Body Design of the Dav Humanoid Robot 8
2.1 The Body Requirement for a Developmental Robot ............ 8
2.2 Mechanism and Kinematic Analysis ..................... 13
2.2.1 Mobile Base ................................. 14
2.2.2 Torso .................................... 19
2.2.3 Neck and Head ............................... 21
2.2.4 Arm ..................................... 22
2.3 Dynamic Model and Control ......................... 26
2.3.1 Lagrange-Euler Formulation ........................ 28
2.3.2 Analysis Of Manipulator with n—link ................... 29
2.4 The Low-level Control Scheme ........................ 30
2.4.1 Model of Motor ............................... 30
2.4.2 Tracking Algorithm ............................. 32
2.5 Dav’s Mobile Base: Control of System with Nonholonomic Constraints 36
2.5.1 Nonholonomic Constraint ......................... 36
2.5.2 Dynamic Analysis of Dav’s Mobile Base ................. 38
2.5.3 Controller Design .............................. 43
2.5.4 Experimental Evaluation .......................... 46
2.6 The Distributed Control System Design .................. 49
2.6.1 Hardware Design .............................. 49
2.6.2 Software Development ........................... 55
2.7 Experiment .................................. 60
2.8 Summary ................................... 60

vi

3 Dav Architecture Design: A Theory of Developmental Mental Ar-

chitecture 65
3.1 Introduction .................................. 66
3.2 AMD Paradigm ................................ 68
3.3 Observation-driven Markov Processes .................... 70
3.3.1 Type-1: Observation-driven MDP ..................... 71
3.3.2 Type—2: Observation-driven Selective MDP ............... 74
3.3.3 Type-3: Observation-driven Selective Rehearsed MDP ......... 78
3.3.4 Type—4: Observation-driven SASE MDP ................. 79
3.3.5 Type-5: Developmental observation-driven SASE MDP ......... 83
3.3.6 Type-6: Multi-level DOSASE MDP .................... 84
3.4 Dav Architecture ............................... 86
3.4.1 Open view .................................. 87
3.4.2 Recursive view ............................... 87
3.4.3 Architecture view .............................. 88
3.5 Major components .............................. 89
3.5.1 Sensory mapping .............................. 90
3.5.2 Complementary Candid Incremental Principal Component Analysis . . 91
3.5.3 The cognitive mapping: IHDR tree .................... 94
3.5.4 Motor mapping ............................... 101
3.5.5 Innate value system ............................ 102
3.5.6 Sensorimotor algorithm .......................... 103
3.6 Summary ................................... 105
4 Perception-driven Control Architecture 117
4.1 Introduction .................................. 117
4.2 Control Architecture ............................. 118
4.3 Dav’s Range-based Navigation Experiment ................. 122
4.3.1 Wall Following Behaviors ......................... 122
4.4 Summary and Future Work ......................... 126
5 Global Obstacle Avoidance through Real-time Learning 130
5.1 Introduction .................................. 130
5.2 Related work ................................. 131
5.3 Approach ................................... 133
5.3.1 Obstacle avoidance ............................. 133
5.3.2 Attentional mechanism ........................... 135
5.3.3 Path planner ................................ 141
5.3.4 Cost function ................................ 145
5.4 The robotic system and online training procedure ............. 147
5.4.1 Online incremental training ........................ 148
5.5 Experiments and results ........................... 149
5.5.1 Attention mechanism ............................ 149
5.5.2 Obstacle—avoidance on the Dav robot ................... 151
5.5.3 Integrate with the path planning layer .................. 151

vii

5.6 Discussion and Conclusion .......................... 153

6 Conclusions 161
6.1 Concluding Remarks ............................. 161
6.2 Future Work ................................. 162
APPENDICES 166
A IHDR algorithm 166

viii

2.1
2.2
2.3
2.4

2.5

2.6

2.7

2.8
2.9

2.10

2.11

2.12
2.13

2.14

2.15
2.16
2.17

LIST OF FIGURES

Dav: a mobile humanoid robot .........................
Location of degrees of freedom for Dav ....................
The Solidwork’s rendering of Dav’s mobile base .................

The offset steered driving wheel. The steering pivot’s speed can be arbitrary
direction. .................................

The top view of the mobile base. The black small dots are pivot axes along
which the wheels can turn, and the small solid squares denote contact points
between the wheels and the ground. ....................

CAD drawing of Torso. Joint3 and Joint4 are the ﬁrst joint of the left arm and
right arm respectively. The unit is in mm ..................

Coordinate frames of torso. Frames 023431424, ox4y4z4 and 03431424 are end-
effector frames for mounting the left arm, right arm, and neck respectively.

CAD drawing of the neck and head module. The unit is in mm. .......

Coordinate frames of the neck and head. Frames ox5y525 and 0:1:5’y5’25’ are
end-effector frames for the left and right cameras respectively ........

Dav’s left arm. The unit is in mm. The actuator for Joint 1 is housed in the
torso module .................................

Coordinate frames of the left arm. Two spherical mechanisms (shoulder and
arm) are designed. .............................

Block diagram for a DC motor. ........................

A rolling disk on a plane. The disk cannot move along the direction perpendic-
ular to its face at each instant ........................

If q is away from the distribution A, the extra term AT in Eq. (2.58) will pull
it back ....................................

Schematic diagram of control algorithm of Dav’s base. ............
The result of the simulated base experiment. .................
The hardware architecture for Dav has two levels: a main computer system and

low-level servo control modules. ......................

ix

10
13
15

16

17

20

21
23

24

25

27
33

37

45

2.18

2.19
2.20

2.21
2.22

2.23
2.24

3.1
3.2

3.3

3.4
3.5

3.6

The “Brain” of Dav, a quadruple processor desktop PC. The SCSI RAIDO disk
array is not included .............................

The block diagram of PowerPC based controller. ...............

CANPC: a PC104 embedded system running RealTime Linux Operating sys—
tem with four CAN interface (a) PC104 computer board (MZ104+). (b)
PCM3680 Dual-CAN interface board. ...................

The software architecture of Dav’s control system ................

The movement of Dav’s neck and head modules. The ﬁrst, second and third rows
Show the rotations of the neck’s Roll, Pitch, and Yaw joints respectively;
the fourth and ﬁfth rows Show the tilt and pan motions of the eyeballs
respectively. ................................

The movement of Dav’s mobile base. .....................

The movement of the arm’s Joint 1, Joint 3 and Joint 4, from the top row to
the bottom. ................................

The autonomous development paradigm ....................

The Type-1 architecture of a multi-sensor multi—effector agent: Observation-
driven Markov decision process. Each square in the temporal streams de-
notes a smallest admissible mask. The Type-1 architecture takes the entire
image frame without applying any mask. The block marked with L is a
set of context states (prototypes), which are clusters of all observed context
vectors l (t) ..................................

Progressive addition of architecture components for Type-2 to Type-5. Type-
2: adding T and En. Type-3: Adding M and En. Type-4: Adding S.-
and primed sensation. The block marked with D is a delay module, which
introduces a unit-time delay. Type-5: DevelOpmental T, R, M and V.

The Type-6 architecture .............................

An outline view of the Dav architecture. The closed-loop control is realized by
the sensors that sense each effector, as well as a set of rich sensors that sense
the internal and external environments. w denotes the working memory and
l is the long-term memory ..........................

An open view of a sensorimotor system. A circle designates a local context state
determined by the last context L. An inward solid arrow indicates a sensory
input at that time frame (including internal and external sensors) at each
time frame. An outward solid arrow indicates an output at that state. A
dashed arrow indicates an experienced transition between states. The lower
part corresponds to the reality mapping R and the upper part corresponds
to the priming mapping F. The two generally do not have the same number
of states. The priming mapping is used to predict farther future contexts.

51
53

53
58

62
63

64

69

71

76
86

106

107

3.7 A temporal trajectory view of a sensorimotor system. Each cycle represents a
context state at a particular time. A shaded cycle corresponds to a visited
state. A dashed arrow indicates a priming of a context that corresponds
to the cycle that the arrow points to. The priming arrows from a state are
updated according to the real experience. An actually visited state is not
necessarily primed ..............................

3.8 A recursive view of a simpliﬁed sensorimotor system. Not all components are
shown. ...................................

3.9 A block diagram of the architecture of a sensorimotor system. The lower cog-
nitive mapping realizes the reality mapping and the upper one realizes the
priming mapping. GK: gate keeper, an internal effector to actively control
the update of the last context. .......................

3.10 The spatiotemporal organization of areas in the sensory mapping. The ellipses
represent receptive ﬁelds covering both space and time. Areas are organized
in a hierarchical way, and the output of an earlier (low order) neural area
is used as input to the later (higher order) neural area. Along the pathway
of information processing, a neuron in a later area has a larger receptive
ﬁeld than one in an early area. In order to make the correct signal available
at the right time for the right input line, we use a time shifting technique.
Acting as a time delay unit, the shifter moves each signal to the next line
at each time instant. ...........................

3.11 CCIPCA algorithm [119]: compute ﬁrst k dominant eigenvectors,
v1(n), v2(n), . . . ,vk(n), directly from u(n), where n = 1,2,. . ..

3.12 Y—clusters in space 32 and the corresponding X-clusters in space X. Each sample
is indicated by a number which denotes the order of arrival. The ﬁrst and
the second order statistic are updated for each cluster. The ﬁrst statistic
gives the position of the cluster, while the second statistics gives the size,
shape and orientation of each cluster. ...................

3.13 The autonomously deveIOped IHDR tree [114,123]. Each node corresponds to a
local model and covers a certain region of the input space. The higher level
node covers a larger region, and may partition into several smaller regions.
Each node has its own Most Discriminating Feature (MDF) subspace D
which decides its child models’ activation level. ..............

3.14 Regions in the input space marked 2’. j where 2' is the index of the parent region
while 3' is the index of child region. The leaves of the tree represent the ﬁnest
partition of the space. The decision boundary of the region is of quadratic
form. ...................................

3.15 (a) Fits using the nearest-neighbor rule. (b) Fits using locally weighted regres-
sion. (c) Fits using kernel regression. (a) and (b) are adapted from Atkeson
et al. [9]. ..................................

xi

108

108

109

109

110

110

111

111

112

3.16
3.17
3.18

3.19

3.20

4.1

4.2

4.3
4.4

4.5

4.6

4.7
4.8
4.9

5.1

The result Of IHDR on an artiﬁcial data set. .................
(a) The learning curves of the artiﬁcial data set. (b) The execution time . . .

The motor mapping as a reverse application of the sensory mapping, but with
signal reconstruction .............................

An illustration of a simple motor mapping for a single effector. The left is a
gating mechanism and the right is a subsumption mechanism. ......

A hierarchical developmental architecture (Type-6) for SAIL procedure learn-
ing. Each Level Building Element (LBE) corresponds to a sensorimotor
system ....................................

The decision process of an agent. The agent tries to use the last context
l(t) = (1:1(t),aj (t)) to predict the future contexts: primed sensation :rp(t)
and primed action ap(t). .........................

The architecture of the robot controller. Note that symbols a(t), x(t) and
l(t) denote the action, sensation, and context at time t, respectively. The
internal representation R; is realized by IHDR. The world sensors include
range ﬁnder and cameras. Block V denotes the innate value system.

Flow chart of learning procedure: the system learning while performing.

A map of the training site at the second floor of the Engineering Building at
Michigan State University. The test loop (desired trajectory) is indicated
by the thick solid lines. 5 different corners are denoted by bold-faced arabic
numbers. .................................

A subset of training samples. The red arrows denote the labeled heading for
the robot. .................................

The graphical user interface to train the robot online. The left window shows
the current laser map in the robot’s local frame. The trainer may impose an
action via clicking the mouse button in the window. The red arcs illustrate
the possible circular path that the robot may have. The right window shows
the primed range image (retrieved from the IHDR tree). .........

The error histogram of the leave-one-out test. ................
The error histogram of the testing data set collected from the third ﬂoor. . . .

A map of the test site at the third ﬂoor of the Engineering Building at Michigan
State University. The thick solid lines denotes the trajectory along which
the robot traveled. ............................

The overall architecture of the range-based navigation. “Attention” denotes an
attentional module ..............................

xii

115

115

116

120

121
121

124

126

127
128
128

129

135

5.2

5.3

5.4

5.5

5.6
5.7

5.8

5.9

Why attention is needed? The solid small circles denote the robot with a
short line segment at the front indicating orientation. Thick lines mark
the walls along the corridor. T and 1‘ denote the threshold and the mean,
respectively. (a2), (b2) and (c2) are range images taken by the robot at
the three scenarios, respectively. (a3), (b3) and (c3) are the corresponding
images after passing the attentional module. The diagrams in lower two
rows use logarithmic scales for the Y-axis. We can see that the distance
between (33) and (b3) becomes larger while the distance between (b3) and
(c3) becomes smaller. ...........................

The attentional signal generator f and attentional executor g. r(t) and z(t)
denote the input and output, respectively ..................

(a) shows that the pre-attentional map r is approximated by a polygon P. P
is speciﬁed by h end points: p1,p2, ...,ph. The kth end point is denoted
by (lk,ak). (b) shows the post-attentional approximation P’ , after the at-
tentional function 9". Those end-points whose distances are larger than T
are set to F. The half-circle with symbol T shows the area the robot pays
attention to. We can see numerous pre-attentional approximations map to a
single post-attentional approximation P’. It is clear that the points outside
the half-circle of (a) have the freedom to change positions without affecting
the shape of (b). ..............................

The occupancy grid of an environment. Black cells denote to occupied cells,
while others are free ones. Each cell has eight directly connected neighbors:
N, NE, E, SE, S, SW, W and NW, each of which corresponds to an angle:
heading 0. The optimal path from S to G can be encoded by the heading
sequence: (E, E, E, NE, NE, NE, NE, N, N, N) ...............

The result of tests on a simulated environment ................

The local coordinate system and control variables. Once the mouse button is
clicked, the position of the mouse pointer P provides an imposed action. Let
1' denote the radius of the arc. The arc’s length d controls the translation
speed while the radius controls the angular speed ..............

The 16 scenarios are used to train the simulated robot. The small solid circles
denote the robot, and solid dark line represents the walls. The dot line
recorded the online training trajectories ...................

The solid dark lines denote walls and the dot lines show the trajectory. Ob-
stacles of irregular shape are scattered about the corridor. (a) A 10—minute
run by the simulated robot with the attentional module. (b) The result of
the test without attentional selection. Two collisions indicated by arrows
are occurred during the ﬁrst two minutes simulation. ...........

5.10 Dav moved autonomously in a corridor crowded with peOple. ........

xiii

136

138

141

143

155

156

157

5.11 Navigating in an simulated environment with unknown Obstacles. The ini-
tial conﬁguration is chosen to be about (4.8m, 3.0m). The goal conﬁgura-
tions G1, G2, G3 and G4 are chosen to be (37.1m, 25.0111), (29.6m, 24.3m),
(18.6m, 27.6m) and (16.2m, 25.6m), respectively. For clearness, we do not
draw the portion of trajectory very close to the goals ............ 159

5.12 Dav run in the northwest corner of the corridor on the second floor of Engineer-
ing Building at Michigan State University. The red curve (or labeled with
“A” ..................................... 160

xiv

2.1
2.2

2.3

2.4

4.1

5.1
5.2
5.3

LIST OF TABLES

The speciﬁcation and installed sensors on the Dav robot. ...........

D-H geometric parameters. The joint variables are marked with *. The
unit is in millimeter. (a) is the parameters for left arm, (b) is for the
right arm, and (c) is for the neck. 91 and 02 denote the angles of joint
1 and joint 2 respectively. ........................

D-H geometric parameters of neck and head mechanism. The joint vari-
ables are marked with *. The unit is milli-meter. 01, 02, 0;; denote the
angles of the neck’s Roll joint, Pitch joint, and Yaw joint respectively.
Frames 4 and 4’ are the end-effector frames of the left and right eyeballs
respectively. Note that we do not consider two pan joints in head.

D-H geometric parameters of the arm mechanism. The joint variables are
marked with *. The unit is milli-meter. 6,- denotes the angles of joint
2', for z' = 1, ...,7 ...............................

Differences between the DPDCA controller and the traditional controller. . . .

The lower bound of the reduction ratio at several parametric setting . . .
The results of tests with random starting positions. ............

The results of tests with random starting positions. ............

XV

26

26

123

141
150
150

Chapter 1

Introduction

1.1 Background

Industrial robots have been widely and successfully used in controlled industrial set-
tings for a variety of tasks. Great challenges exist for robots to work in general human
environments. These are characterized by unknown, unpredictable and changing ob-
jects, and tasks that cannot be well speciﬁed algorithmically. We call them muddy
tasks. Vision, audition and language are at the forefront among cognitive capabilities
that are required for muddy tasks. Autonomous mental development seems to give a

possible way for machines to handle these muddy tasks [117].

Not until the late 1980’s did embodiment receive attention in the artiﬁcial intelli-
gent (AI) community, when it was popularized by the behavior-based approach [7,18].
Action generations have been fueled by impressive advances in robot construction,

especially in humanoid platforms.

The ﬁrst humanoid robot was “Elektro the Motorman” at the 1939 New York
World Fair. It was a legged humanoid robot that “talked,” moved its limbs, and played
some arithmetic tricks. With the arrival of computers, more impressive humanoid
robots have been constructed to demonstrate their specially designed skills. Wabot—

2 [83] (1980—1984) was designed and built solely to read musical notes and play piano.

1

WABIAN [61] was a humanoid robot capable of walking and dancing. An inspiring
engineering achievement in humanoids is the P2 robot from Honda, which is capable of
walking and climbing stairs [39]. More recent humanoid systems include H6 [69], ETL-
Humanoid [68], and Robonaut [59]. These projects primarily focused on the challenges
of mechatronic design and integration of anthropomorphic bodies and programming

for generating action sequences.

1.1.1 Learning, Perception and World Models

Actions can be programmed in or learned through practice. However, an action is
not very useful if the robot does not have a perception capability, not knowing in
what environmental context to produce an action. Perception is well-known to be
very challenging and typically a world model is predesigned. What are the limita-
tion of such a hand-crafted model? Since these issues are closely related to robot

construction, we discuss these issues in this section.

1.1.2 Actions through Action Practice

Instead of programing action into a robot, another category of efforts aims to train
the robot to produce desired behaviors. With redundant degrees of freedom, it is chal-
lenging to learn behaviors, especially when training time is limited or when training
must be conducted in real time. Such projects include LWPR on a SARCOS robot
by Schaal et al. [97], the simulated humanoid of a 37 degrees of freedom by Mataric
et a1. [11], and the scheduling degrees of freedom by Grupen et al. [26] motivated the

early child motor development.

1.1.3 Perception-guided Actions

In contrast with the above efforts that concentrate on behavior generation with-

out requiring sophisticated perception, a series of efforts deals with perception and

2

perception-guided behaviors. The main challenging perceptual sensing modalities
include vision, audition, and high-dimensional touch with many touch elements (in-
cluding range sensing). No matter how complex a behavior (action sequence) is,
the behavior is useful only when the robot can produce the behavior in the correct
perceptual context and can suppress it in all other contexts.

Studies of perception driven behaviors have had a long history. A well-practiced
approach is that a human programmer deﬁnes features (e.g., edges, colors, tones, etc)
or environmental models for the system. For example, the recent works on H2 from
Honda, Cog [19] and Kismet [14] produced impressive perception driven behaviors.
Some speech features and visual features are used to establish audio-visual association
by an active camera on a robot hand [78].

Programming perceptual capability using human deﬁned features is a quick way to
produce results in a controlled setting. A fundamental limitation is that the resulting .
robot cannot work well in unknown, partially unknown, or changing environments.
Weng & Chen [110] explained the inefﬁciency of human deﬁned features, the insufﬁ-
ciency of hand crafted models, and the difﬁculty in programming their applicability

checker for an unknown environment.

1.1.4 World-model Free Approach

The world-model free approach has been studied somewhat independently in two
research communities: robotics and computer vision. In the robotics community, they
are called behavior-based methods [18] [7]. The emphasis is to concentrate on behavior
generation directly from range sensor readings or directly from images [65] [43].F
The model-free approach in the computer vision community is called appearance-
based methods. The appearance-based method started around 1990 [50] [94] and
it appears to be the most popular method in the computer vision community now.
Appearance-based methods use statistical tools directly on high-dimensional image

vectors, normalized using the mean and variance of the pixels. Thus, an image with m

3

rows and 71. columns is considered as a d-dimensional vector v 6 Rd, where d = mn.
Since. high-dimensional statistical tools are directly applied to the vectors in space
Rd, these type of method can take into account not only correlations between nearby
pixels, but also far-away pixels. The need to process a high-dimensional sensory vector
brings out a sharp difference between behavioral modeling and perceptual modeling:
the effectors of a robot are all known but the sensory Space Rd is extremely complex
and unknown and, therefore, very challenging.

The power of learning directly from entire images has been demonstrated in vision-
guided autonomous navigation. They include ALVINN [74], which uses multilayer
perceptron (MLP) networks; ROBIN [76], which uses radial basis function networks
(RBFN); SHOSLIF [109] and state-based SHOSLIF [23], which use Fisher’s multidi-
mensional discriminant analysis in building a regression tree. As has been shown in
the comparison study of SHOSLIF [23,109], statistical regression methods perform
signiﬁcantly better than traditional non-statistics based networks such as multi-layer
perception and radial base function networks.

Perception-learning-based action generation relieves the human programmer from
the intractable task of programming perception. However, the above learning process
is not fully autonomous in the following sense: (1) The human programmer designs
a part of the task-speciﬁc representation, e.g., features and states. (2) The human
programmer has direct control over internal modules during learning. This mode of
manual development fundamentally limits the scalability of behavioral and perceptual

capabilities.

1.1.5 Autonomous Mental Development

M. Sur et al. [55] ﬁnd that if the optic nerve circuits carrying vision signals are re-
wired into the auditory cortex of a young ferret; the auditory cortex, the brain zone
usually assigned for sound processing, can develop the properties that are observed

only in visual cortex. This reveals a point that a developed mammal brain is a prod-

4

uct of active, autonomous and extensive interactions with the environment around

it. Motivated by this discovery and others, a new paradigm, autonomous mental

development (AMD) proposed by [118]:

o A human designer designs a robot body according to the general ecological

condition in which the robot will work (e.g., onland or underwater).

o A human programmer designs a task—nonspeciﬁc developmental program for the

robot.

0 The robot starts to run the developmental program and develops its mental skills
through real-time, on-line interactions with the environment, which includes

humans.

A robot built in this paradigm is called a developmental robot. Early examples of
such developmental robots include SAIL (short for Self-organizing, Autonomous, In-
cremental Learner) robot [105,130,132] at Michigan State University and the Darwin
V robot [5] at The Neurosciences Institute, San Diego. These robots share the char-
acteristic that system behaviors are generated through online real time interactions

with the environment.

Motivated by the above-mentioned research, this work follows a new engineering
paradigm, autonomous mental development [117], which emphasizes embodiment,
environmental openness, completeness in using sensory information, online processing,
real-time speed, incremental processing, performing while learning, and scale—up to
complex tasks. This dissertation is based on the research work of a mobile robot
project at Michigan State University funded by DARPA (Defense Advanced Research
Project Agency) DRS program. The purpose of the project is two—fold: to provide
a general purpose, ﬂexible, and dextrous robotic platform from engineering aspects

and to understand human autonomous mental development from scientiﬁc aspects.

5

1.2 Outline of the Dissertation

The scope of this dissertation is about building the body Of a humanoid robot and
applying it to the applications of autonomous navigation.

Chapter 2 describes the design of the Dav robot. First, the mechanism and kine—
matic model of the robot are introduced. To facilitate the joint-level control, the
robot’s dynamic analysis is derived. Then, we analyze the robot’s mobile base in
detail since it is related to the later navigation task. The control system is also
presented in this chapter.

Chapter 3 presents a theory of developmental mental architecture. Five architec-
ture types from the simplest Type-1 (observation-driven Markov decision process) to
Type-5 (DOSASE MDP) are introduced. The properties and limitation of the simpler
one are discussed before the introduction of the next more complex ones. Further-
more, we present the architectural design of the Dav robot. The framework of the Dav
architecture is hand-designed, but the actual controller is developed, i.e., generated
autonomously by the developmental program through real-time interactions with the
real physical environment. We present the Dav architecture and the major compo-
nents that realize the architecture. The designed architecture for Dav is the next
generation version from its extensively tested predecessor - the SAIL developmental
robot.

Chapter 4 outlines a novel developmental perception-driven control architecture
(DPDCA), which is adapted from the overall architecture presented in Chapter 3.
Dav’s range-based indoor navigation experiment is used to illustrate the power of
DPDCA.

Chapter 5 introduces a learning-based approach to obstacle-avoidance behavior
with a global planner. The robot operated in a partially known bounded 2-D en-
vironment populated by unknown static or moving obstacles (with slow speeds) of
arbitrary shape. The sensory perception was based on a laser range ﬁnder. Local

re—planning was implemented to account for unknown and dynamic parts of the en-

6

vironment. To greatly reduce the number of training samples needed, attention was
used to learn the obstacle-avoidance behavior. An efﬁcient, real-time implementation
of the approach has been tested, demonstrating smooth obstacle-avoidance behaviors
in a corridor with a crowd of moving students as well as static obstacles.

Chapter 6 summarizes the contributions of the research and concludes the disser-
tation with recommendations for the improvement of the Dav robot and for future

research work might be conducted on the humanoid robot.

Chapter 2

The Body Design of the Dav
Humanoid Robot

In this chapter, we ﬁrst discuss the body requirements of a developmental robot. Sec-
tion 2.2 presents the mechanism and kinematic analysis of the Dav robot whose overall
appearance is shown in Fig. 2.1 and detailed Speciﬁcations are listed in Table 2.1. We
derive the dynamic analysis of Dav’s serially driven manipulator in Section 2.3 and the
mobile base in Section 2.5. The joint-level control for Dav is presented in Section 2.4.

The design of the distributed control system is outlined at the end in Section 2.6.

2.1 The Body Requirement for a Developmental
Robot

The essence of autonomous mental development by machines is the capability of
learning directly, interactively, and incrementally from the environment using on-
board sensors and effectors. A developmental robot should be a real robot that runs
a developmental algorithm and is allowed to learn and practice autonomously in the
real physical world.

The goal of the Dav robot project is to provide a next generation robot platform for

8

Table 2.1: The speciﬁcation and installed sensors on the Dav robot.

 

Speciﬁcation

[ Installed sensors

 

 

 

Mobile humanoid, self-contained

700mm(L) x 700mm(W) x 1800(cm)
body dimensions

242 Kg

43 degrees-of-freedom (DOF). The
mechanisms include an 8-DOF
drive-base, a 2-DOF torso, a 3—DOF
neck, a 5—DOF head, two 7-DOF arms,
and two hands

Main computer includes a quadruple
Pentium Xeon III desktOp with 2 Giga

memory and 100 Giga SCSI hard
drives

11 Motorola MPC555 embedded
processors as low-level Actuator

Control Unit (ACU).

All ACUs are connected to the main
computer via CAN buses Wireless
ethernet, Cisco Airo 340 series PCI
card and access point

2 Hauppauge WinTV PCI cards as
image grabber

A Sound Blaster Live! 5.1 from
Creative Labs

Four 110 AmpH lead acid batteries.
8—hour continuous operation without
recharging

0 Two color micro CCD
cameras, QN42H from
ELMO

0 Two Microphones with
pre-ampliﬁers

0 Laser range scanner
(covering 180 deg at 0.5
deg resolution), PLS,
SICK company

0 Shaft encoder for each DC
motor

0 Two acceleration Sensors
(Analog ADXL202)
mounted on the head to
mimic vesicular organ

0 Current sensor or strain
gage to sense the torque at
each joint

 

 

 

 

Figure 2.1: Dav: a mobile humanoid robot

research on robot mental development. Because Dav is meant for mental development,
its design requirements are not the same as other humanoid robots in the following

perspectives:

o Mobility. There have been a number of studies that show the importance of
autonomous movements in developing the humans’ and animals’ senses of the
world and their normal behaviors. “Kitten carousel” [38] especially shows that
autonomous mobility is necessary for the kitten to develop the cliff avoidance
behavior. Therefore. it appears that autonomous mobility is a necessary condi-
tion for mental development. The implication of mobility and autonomy means
that the robot must be totally self-contained with no tethers. The amount
of power and computational resources for an eight-hour period of actions re—

quire batteries of such capacity that a biped robot is not feasible for current

10

technology and a rolling base with high mobility was chosen.

Manipulators. A developmental platform should be equipped with dexterous
arms and hands. In other words, it should have sufﬁcient degrees of freedom
and range of motion. It is desirable for a developmental humanoid to take a

human shape so that they can use a wide variety of tools designed for humans.

However, the design faces conflicting conditions: power and size. We like to
have a powerful limb for a larger payload. The higher the power, the larger the
motors and other supporting devices. But the size is limited by a typical human
arm size. The electronic boards of each limb must be contained in the same
limb. The most challenging activity of all is to satisfy this self-containment
requirement for Dav’s hands. We expect that this problem will be better solved

if the motors are custom designed.

Sensing. Developmental robots grow world representations autonomously from
a sensorimotor stream. Also the controller is perceptual driven architecture.
Therefore in a general setting, major sensor modalities such as visual, auditory,

touch, and somatosensory should be integrated.

Computational resource. The developmental program must be able to update
its memory for all the sensors within a fraction of a second, e. g., 10ms for sound
and lOOms for vision. The control signals of effectors must also be updated at

a rate of at least 10Hz.

Currently, hardware implementation is not practical for an algorithm that is
expected to change very often throughout the research project. Thus, the de-

velopmental program is to be implemented by software.

The memory of a developmental robot is expected to be very large. Thanks
to the logarithmic time complexity of IHDR [44] [114], it is expected that the

refresh rate of the developmental program will not slow down too much when

11

the memory size increases. However, this is still an open question, since we

need to test a robot with more extensive learning experiences.

Power supply. Developmental robots need extensive training and practice. We
should have enough battery capacity to allow the robot to operate without
charging the batteries frequently. The battery capacity is still an unsolved
problem, as with electrical cars. Currently, with a consideration of cost and
capacity availability, we still think that a reasonable choice is sealed rechargeable

batteries.

Wiring. Wiring is a challenging issue for humanoids, especially for develop-
mental ones. Due to a very large number of sensors and motors in the limbs,
the number of wires required between each joint reaches a few hundred if data

processing and computation are centralized at a single location.

Such a large number of wires cause at least two major problems. First, the
rigidity of the wire bundle interferes with dexterous manipulators. Second, the

wire bundles cannot sustain repetitive bending during extended usage and will

break.

We have adopted a network scheme, namely, a distributed embedded control sys-
tem is designed and implemented. Further mechanical parts (especially joints)
are designed in such a way that the wires can go through the mechanical joints
with minimal bending to increase the reliability of the system and reduce main-

tenance frequency.

Torque command. In biology the muscles are controlled by the action potential
in motor neurons. The larger the action potential, the more muscle ﬁbers are
recruited to contract. Therefore, the action potential can be regarded as a
torque command. Torque commands can be easily implemented by using the

DC motors except in the mobile base, which is a non-redundant dynamic system

12

with non-holonomic constraints. A decoupling controller is proposed to realize

torque commands on the mobile base.

2.2 Mechanism and Kinematic Analysis

The Dav robot is designed as a wheel-based mobile humanoid with 43 degree-of-
freedoms (DOF) to achieve the above-mentioned requirements. Figure 2.1 Shows the
body of the Dav robot. It is contained in a volume of 750(1) x 750 (w) x 1700(h)
mm in default posture. The DOF distribution is shown in Fig. 2.2. The mechanical
body contains over 300 types with a total of over 900 custom-made parts. All of the

mechanical drawings of the custom-made parts were generated by Solidworks.

 

 

 

Figure 2.2: Location of degrees of freedom for Dav

Dav consists of a mobile base, a torso, two arms, a head, and a neck. Each module
is designed in such a way that it is easy to assemble and maintain. In the following,

we present the mechanism and kinematic analysis of these modules in detail.

13

2.2.1 Mobile Base

High mobility is essential for robots to enlarge their scope of sensing, which is essence
for developmental robots having rich sensorimotor experiences. However, the car-like
wheeled robots have nonholonomic constraints that reduce the mobility and make
planning and learning difﬁcult. Omni-directional vehicles are hence developed. Omni-
directional (holonomic) vehicles can move in any direction immediately at robots’ any
conﬁguration and the motion can be decoupled into three independent components
(1:, 3;, ¢). This feature avoids the complicated manueuvering sequences to bring the
vehicle to a side position and, thus, facilitates control and learning.

There are a variety of designs of omni-directional vehicles in the literature. These
vehicles can be categorized into two classes based on their wheel design: a Special
wheel design and a conventional wheel design. Most special wheel design is based
on a concept that the robot actively drives in one direction while allowing passive
motion in the other [8,120]. However, vehicles based on this holonomic wheel design

have the following disadvantages under practical environments:
0 Complicated mechanism
0 Limited load capacity because of a reduced contact area

0 Limited accuracy to estimate the pose of the vehicle by dead-reckoning because

of the difﬁculty to decide the passive motion of wheels

0 Difﬁculties in coping with uneven ﬂoor and other ﬂoor irregularities in cases of

four-wheel driving vehicle without suspension

0 Low clearance from the ﬂoor makes this design hard to deal with the ﬂoor’s

step features (e.g., an exposed thick wire on the carpet).

To overcome those difﬁculties, the other category (conventional wheel design) is

proposed. For example, Wada & Mori [101] gives a design to achieve near omni-

14

directional mobility by using two active casters. Nomadic XR4000 is another example
with four casters [40].

Because of its high load capacity, simple mechanical design, and tolerance to floor
irregularities, conventional wheels with offset are chosen to realize omni-directional
mobility. Shown in Fig. 2.3, Dav’s mobile base has four wheels, and each one is driven

by two DC motors for driving and steering separately.

Motors

    

Figure 2.3: The Solidwork’s rendering of Dav’s mobile base.

Fig. 2.4 shows an offset steered driving wheel (an actuated caster). At this conﬁg-
uration, the speed from the driving motor is along the wheel’s orientation while the
speed from the steering motor is perpendicular to the wheel’s orientation. Thus it is
possible to translate the steering pivot point to an arbitrary direction by controlling
the two motors.

Supported by the steering pivot points, the vehicle frame can be considered as
a plate controlled by four supporting sticks. The plate’s translation and rotation
movements are determined by the sticks’ speeds. Since each stick can move in an
arbitrary direction, the vehicle’s movement is omni—directional. It is worthy noting
that this is an over-actuated system because the four sticks’ motion over-determines

the vehicle plate’s movement. A synchronizing control scheme is hence needed. In

15

the following, we derive the kinematic relation between the base’s joint speeds and

the vehicle speeds. Section 2.3 gives the dynamic analysis and control scheme for the

base.

 

Figure 2.4: The offset steered driving wheel. The steering pivot’s speed can be arbitrary
direction.

For kinematic analysis, we introduce the following symbols that are illustrated in

Fig- 2.5;

v The speed of the vehicle. The components are [um uy,w]T

‘1 The conﬁguration of the vehicle. The components are [$1, p1, ..., $4, p4]T
¢>i Angle of the ith steering joint, 2' = 1, ..., 4

‘di Speed rates of driving wheel

'r a b The radius and offset of the wheel

Consider the ith wheel, i = 1, ...,4. Let p,- and (p, be the wheel’s driving and
Steering angular speeds respectively. Let u,- = (13,-, 3),) denote the steering pivot’s
Speed. From Fig. 2.4, assuming that the ideal rolling condition is satisﬁed, we have

Oi l/b 0 11¢ (2 1)

_ I

p} 0 1/r up,
w
here 21¢, and up, are the wheel’s translational speeds in the wheel’s frame. The

16

 

   

8

 

 

 

 

Figure 2.5: The top view of the mobile base. The black small dots are pivot axes along
which the wheels can turn, and the small solid squares denote contact points between the
wheels and the ground.

transformation matrix from the vehicle’s frame is:

2). Sin,- —cos, 3'3,-
¢. = (I5 45 . (2.2)

Up. COS ¢i sin ¢i 9i

P 11lgging Eq. (2.2) into Eq. (2.1) and we obtain the relationship between the wheel’s

J 0i nt speed and its steering pivot’s speed as:

031' = 5111451/(9 “COS¢i/b it (2.3)

p, cos 451/7" Sin d), / r 3),-

Referring Fig. 2.5, we obtain the following relation between the ﬁrst wheel’s steer-

111g pivot’s speed and the whole vehicle’s speed:

i: u +aw 1 0 —a
1 = I = v. (2.4)

2]] uy-l-aw 0 1 —a

he Similar procedure is applied to the other three wheels and we erte the result In

17

matrix form as:

 

 

 

231 1 0 —a
91 l 0 —a
1'72 0 1 —a
' 1 0 a
y’ = v, (2.5)
233 0 1 a
93 1 0 a
(L4 0 1 a
.3“. L1 0 —a‘

 

where a denotes the offset between the center of the base and the wheels’ pivot points.

Eq. (2.3) is equivalent to

 

 

 

 

 

 

(AI 531

b1 _ _ a

432 T1 0 0 0 5:2

q= {)2 = 0 T2 0 0 '1“. (26)

4’3 0 0 T3 0 1'53

[)3 _0 0 0 T4] :03

454 Si34

364‘ :794‘
Where

_ sinoi/b —cosq§,~/b

i— cos qii/r sin (bi/r ,
for 2‘ =1,...,4.

Applying Eq. (2.5) to Eq. (2.6), we can derive the wheel joints’ speed q from the
vehiCIG cartesian speed, x:

q = Bv, (2.7)

18

where -

psinqbl/b —cos¢1/b —asinq51/b+acos¢1/b
cosqﬁl/r sin¢1/r —acos¢1/r—asin¢1/r
sinqbg/b —-cos¢2/b —-asin¢2/b—~acos¢2/b
cosrzbg/r sinqig/r —acos¢2/r+asin¢2/r
sinqbg/b —cos¢3/b asin¢3/b—acos¢3/b
cosqbg/r sin¢3/r acosqﬁg/r+asin¢3/r

sin (154/1) — cos¢4/b asin¢4/b+ acos¢4/b

 

 

cos¢4/r Sinq§4/r acos¢4/r—asin¢4/r

- d

Therefore, Eq. (2.7) gives the kinematic equation we are looking for here. We can
see Dav’s mobile base is a redundant-actuated system with non-holonomic constraints.

The system equation has 3DOF in velocity Space while 8DOF in conﬁguration space.

2 - 2 .2 Torso

TO imitate the human body, as well as to enable picking up objects from the ﬂoor,
a. 2DOF torso is placed on the top of the mobile base. The rolling joint, the one
ha-\v'ing an vertical axis, can be regulated with motion of the mobile base to keep the
orientation of the torso.

Fig. 2.7 shows the D-H coordinate frame assignment on the torso mechanism.
It is worthy noting that there are three end-effector frames: oz2y2z2, 027331323 and
033 431424, corresponding to the left arm, right arm, and neck respectively. The four
D-

H geometric parameters associated with each link are listed in Table 2.2.2. Those

four geometric parameters are deﬁned as follows:

9 0, is the joint angle from the 33-1 axis to the :13,- about z,_1 axis (using the right

handle rule).

- d1 is the distance from the origin of the (i ~— 1)th coordinate frame to the

intersection of the z,_1 axis with the 1',- axis along the 2.3--1 axis.

19

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

I:Tigllre 2.6: CAD drawing of Torso. Joint3 and Joint4 are the ﬁrst joint of the left arm and
rlght arm respectively. The unit is in mm.

20

o a,- is the offset distance from the intersection of the 2,-_1 axis with x,- axis to the

origin of the ith frame along the 3:, axis (or the shortest distance between the

2.3-1 and z, axes).

0 Oz,- is the offset angle from the 2,11 axis to the z,- axis about the :13; axis (using

the right-hand rule).

 

D 2
z3= ‘03 Z4704 024:2
‘p’
JG, 3 7 th4 I
x x2
l» 20, yl
X0, 217’ 00,01
+
20/

 

 

/// ////

F i.gllre 2.7: Coordinate frames of torso. Frames 0x4y4z4, ox4y4z4 and ox4y4z4 are end-
effector frames for mounting the left arm, right arm, and neck respectively.

2 — 2.3 Neck and Head

AS a sensor platform, the neck and head (see Fig. 2.8) are designed to support active
Vision system. This 8DOF mechanism includes the 3DOF neck, whose rolling joint
with the vertical axis is actuated by an anti-backlash geared motor. The brows and
lips are computer-controlled for expressing emotion. The two cameras can indepen-
dently pan and coupled tilt.

Fig. 2.9 shows the D-H coordinate frame assignment on the torso mechanism.
There are two end-effector frames: 0r4y4z4 and 02:4’y4’z4’, corresponding to left and
right eyeballs respectively. These joints’ axes cross each other orthogonally: axes of
joint 1 and joint 2; axes of joint 4 and joint 5; axes of joint 4’ and joint 5’. Their

Or‘ - . . . .
lglns COInCIde, shown In Fig. 2.9.

21

Table 2.2: D-H geometric parameters. The joint variables are marked with *. The
unit is in millimeter. (a) is the parameters for left arm, (b) is for the right arm, and
(c) is for the neck. 61 and 6;; denote the angles of joint 1 and joint 2 respectively.

nink[ a, [(1,- [ drier]

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

1 0 90° 0 9;
2 320.0 180° -155.0 0;
(a)
[Link [ a,- [ Oi I (I; I 91‘ J
1 0 90° 0 0;
3 320.0 180° 155.0 3
0))
[Link] a,- [ Oi [di [ 0i J
1 0 90° 0 a;
4 283.0 180° 0 6;

 

 

 

 

 

 

 

(C)

The four D-H geometric parameters associated with each link are listed in Ta-

ble 2.2.3.

2 - 2.4 Arm

Dav has human symmetry with two arms. Each arm is a 7DOF anthropomorphic
manipulator shown in Fig. 2.10. The extra joint is added to each arm for increasing
the end effector’s range of motion. Composed of shoulder, elbow, and wrist, the
arm design has been influenced by the human skeleton structure. Most of joints of
arm except Joint 1 which is driven by a set of bevel gears, are actuated by motors
vVitll gear-box either directly or through a simple pulley timing-belt mechanism. This
design simpliﬁed the mechanical transmission system and saves a lot of room for
installing servo units such as servo ampliﬁers, sensors, and control boards. All servo
units are integrated inside the arm itself except those for the shoulder, which are

l
OCated conveniently in the torso.

22

     

R-Pan Joint L—Pan Joint

    

  

m: '
_.

a 2

     
         
    

'l
l!“

  

L—Pan Motor ;
Yaw Joint

L—Pan Motor
Pitch Joint

 

 

I!
.3

     

Roll Motor/Elma Motor

Figure 2.8: CAD drawing of the neck and head module. The unit is in mm.

Joints 1, 2, and 3 together realize a spherical shoulder conﬁguration, shown in
Fig- 2.11, in which the joint axes 20, 21, and 22 intersect at 01. Similarly, the wrist
IIleczhanism, consisting of Joints 5, 6, and 7, adopts the Stanford manipulator (spher-
iCa-l wrist) design. The angle of spherical joints are Euler angles, which simpliﬁed the

1{il‘lematic analysis. D-H parameters are shown in Table 2.2.4.

Once the D-H coordinate system has been established for each robot link, the
forVvard kinematics can be easily derived via homogeneous transformation. Let A, be

t he matrix transforming the i - 1th coordinate frame to the ith frame as follows:

cos 0.- — cos a,- sin 0,- sin a,- a, sin 9,-
sin 0,- cos a, cos 0,- - sin a,- cos 0,- a,- sin 9,-
, = (2.9)
0 sin 0; cos a,- d,-
0 0 0 1

The matrix T3 which speciﬁes the location of the ith coordinate frame with respect to

he lnertla frame (frame 0) IS the chain product of successwe transformation matrices

23

Right Left

 

image plane image plane
25' 25
x3',z4' yS' X3,Z4 3’5
23 y4
4. ’ - - 3, 4
/ 230,),40 y X

y3',x4'

x2

y2 .g.
22
zO,y1
x0,zl D
y0,xl

/77/////

 

 

Figure 2.9: Coordinate frames of the neck and head. Frames ox5y525 and ox5’y5’25’ are
end-effector frames for the left and right cameras respectively.

Of Ak, for k = 0, ...,i, and is written as

T3 = H A, (2.10)
k=0

From the Eq. (2.10), the robot kinematic model is thus expressed by the 4 x 4
homogeneous transformation matrix T = Ta‘, where n is the number of robot moving
link under consideration. The T matrix has combined effects of rotation, translation,
and scaling and can be decomposed into:

R d
T = , (2.11)

0 1

' U here R denotes the rotation matrix of the end-effector frame with respect to the

111ertia frame; (:1 denotes the translation with respect to the inertia frame; 0 = [0,0,0].

Due to its redundant nature, Dav’s inverse kinematic analysis is extremely complex
a. . . . . .
Dd does have a unique solution. Usually, more constramts such as minimum power

C . . . . . .
thUHlptIon and smoothing trajectory are needed, but this model-based Inverse kine-

24

    

Joint]

 

Joint 4

Figure 2.10: Dav’s left arm. The unit is in mm. The actuator for Joint 1 is housed in the
torso module.

25

Table 2.3: D-H geometric parameters of neck and head mechanism. The joint vari-
ables are marked with *. The unit is milli-meter. 61, 62, 9;, denote the angles of the
neck’s Roll joint, Pitch joint, and Yaw joint respectively. Frames 4 and 4’ are the
end-effector frames of the left and right eyeballs respectively. Note that we do not
consider two pan joints in head.

Wink] az- Lei l 612- LU

1 0 90° 0 6;
2 32 —90° 0 a;
3 101 90° 77 0;
4 0 0° —53 a;
4’ 0 0° 53 9;»:

 

 

 

 

 

 

 

 

 

 

 

 

 

Table 2.4: D-H geometric parameters of the arm mechanism. The joint variables are
marked with *. The unit is milli-meter. 0,- denotes the angles of joint 2', for i = 1, ..., 7.

 

[Link] a,- [ a,- [ d,- [ 6,- [
l 0 -90° 0 6f
0 90° 0 0;
0 -90° 309.9 0;
60 0 499.9 0;
30 —90° 0 0g
0 90° 0 6’;
0 —90° 80 I9;

 

 

 

 

 

 

 

 

 

\lCﬁUlvbOOlO

 

 

 

 

 

 

matics method is beyond the scope of this thesis. As studies in references [30,32,48],
this work focuses on using learning-based methods to approximate the mapping from

cartersian end-effector work space to joint space.

2.3 Dynamic Model and Control

In this section, we derive equations describing the time evolution of the robot’s con-
ﬁglues based on torques or forces under a known environment (e. g., repeated tasks on
a manufacturing assembly line with ﬁxed payload). It is worthy noting the known-

e”"”' 011111th assumption is not valid for some robotic: applications. Those cases in-

26

 

 

 

Figure 2.11: Coordinate frames of the left arm. Two Spherical mechanisms (shoulder and
arm) are designed.

clude: the dynamics of the manipulated object changes tremendously during a manip-
ulation task; dynamic equations are not available due to the complexity of the system
and lacking of domain knowledge. However, deriving such equations of motion is still
useful for designing a suitable low-level control law for Dav’s mechanism, and may

serve as a survey of the model-based control method.

The dynamic model of a robot can be obtained from known physical laws such
as the Newtonian mechanics and Lagrangian mechanics. The derivation of robotic
dynamics based on the Lagrange-Euler (L—E) formulation is more systematic; we
hence derive Dav’s dynamic equations based on the L-E formulation. We do not

Present the whole robot, which is cumbersome. In the following, we ﬁrst outline DE
a"ﬁlysﬁs on a general setting. We then apply it to a general n—link robot manipulator.
MObﬂe base’s dynamic analysis and control are relegated to Section 2.5 because the

base is a. redundant-actuated system with non-holonomic constraints and, thus, needs

Sp 90” 3.1 t reatment.

27

2.3.1 Lagrange-Euler Formulation

In the L—E formulation, the system’s dynamic behavior is described in terms of work
and energy using generalized coordinates. All the constraint forces are eliminated in
this method.

Assuming a robot has n DOF S, the L-E formulation is expressed in the following

form

at 6L 8L

32(671.)_ SE = 1. (2.12)

for i = 1, 2, ...,n, where
e L: Lagrangian function and equals to kinetic energy K - potential energy P

K: Total kinetic energy of the robot

P: Total potential energy of the robot

q,: Generalized coordinates of the robot

(j; Generalized velocity, or ﬁrst derivative of q,- with respect to time
o 17,-: Generalized actuating force (torque) applied to the robot at joint i

Now we are ready to derive the dynamic equation for a robot with it links. It is

known the overall kinetic energy of a rigid body is given by

1 1
K = Emv'fvc + éwTIw, (2.13)

Where Vc and w denote respectively, the mass center’s translation velocity vector and
angular velocity vector of the rigid body; m and I are the total mass and inertia

matrix respectively.

28

 

2.3.2 Analysis of Manipulator with n-link

In the D-H representation, the joint variables can be served as generalized variables
of L-E equation. According to the velocity kinematic analysis of [89, Chap. 5], the

ith link’s linear velocity of mass center and angular velocity are written as

vCi = Jvci(q)qi wi = R?(Q)chi(q)q (214)

respectively, where Jun. and chi are J acobian matrix for link i; 1% denotes the rotation
matrix of the ith frame with respect to the inertia frame (frame 0). Let re, be the
mass center of link i in the inertia frame (oroyozo). Since Dav has rotational joints

only, Jvc, and LC, can be calculated by

Jvci
= [J1 .12 .1.- 0 o], (2.15)
chi
Zi_XI‘.—Oi_ Z_XO—0_
whereJi= 1 (6' 1) andJk= k1 (k k1) fork<i;0,-andz,~
z;_1 zk—l

denote the origin and z-axis of the i frame of the D-H representation in the inertia

frame respectively.

Therefore, the total kinetic energy can be expressed by
1 " 1
K = 5 ‘7‘: [m.~J3;J... + J1,R.I.R.TJ...] q = §qTD(q)q (2.16)
i=1

Where 1,- denotes the inertia matrix with respect to the frame 2'.

The Lagrangian function are then written as
1 . .
L=K-V=§&Dmm+vmj nu)

Applying Eq. 2.17 into Eq. (2.12), we obtain the commonly used motion equation in

29

 

matrix form (see [89, Section 6.3] for the detailed derivation):

D(q)<'i + C(q, (1)61 + g((I) = T (2-18)

where C(q,<'1) denotes centrifugal and coriolis terms; g(q) denotes gravity related

terms; 7 denotes the vector of the output torque of the motors.

2.4 The Low-level Control Scheme

We use the proportional derivative (PD) control at the joint-level, where each joint
of the robot is treated as a servo-mechanism. Notice that using joint PD control is
ineffective because it does not take account of the coupling effects from the other robot
links. However, the dynamic analysis of Section 2.3 provides a way to compensate the
nonlinear factors. Hence, this control scheme can achieve better performance than

that of conventional PD control.

2.4.1 Model of Motor

Each joint of the Dav robot is driven by a DC servo motor. This subsection outlines
the DC motor’s model by deriving the transfer function from which control algorithms
will be obtained.

Illustrating in the schematic diagram of Fig. 2.12, we introduce the following

notations:
o V = armature voltage
0 L = armature inductance
O R = armature resistance
’ V; = back EMF voltage

.-

. 20 Q armature current

30

 

0 6m = motor shaft’s displacement (in radians)
e 6,, = joint shaft’s displacement (in radians)
e rm = motor torque

0 TL = load torque

0 K0 2 torque constant of the motor

0 K b = back EMF constant

e Jm = robot’s moment of inertia

e Bm = viscous friction coefﬁcient of the rotor

e n = gear train ratio

The following differential equations outline the behavior of a motor:

rm(t) = Kaia(t)

Vb“) = Kb9m(t)

var) = Ria(t) + 1114;}Q + v,,(t)

7mm — an(t) = Jméma) + Bmém(t)

After applying Laplace transform to those equations, we obtain:

Tmlsl = Ka1a(8)

V148) = Kb9m(3l

V(s) = 1210(3) + SLIa(S) + Vb(s)

rm(s) — an(s) = s2JmOm(s) + sBmOm(s)

(2.19)

(2.20)

Organizing terms in Eq. (2.20), we get the transfer function from the armature voltage

V to the angular displacement of rotor Om (with TL = 0) as

Om(s) _ Ka
V(s) ‘ s[(Ls + R)(Jms + Bm) + 1mm]

 

 

31

(2.21)

 

And the transfer function from the load torque TL to Om is given by (with V = 0)

9(3) _ —n(Ls + R)
71(8) — S[(Ls + R)(Jm3 + Bm) + KbKa] (2.22)

 

 

It is assumed that the “electrical time constant” % is much smaller than the “me—
chanical time constant” 3%, the influence of L/ R is negligible. Therefore Eqs. (2.21)

and (2.22) are simpliﬁed to be

 

 

 

 

9m(8) _ Ka/R
V(s) _ 3(Jms + B... + KaKb/R)’ (2'23)
and
9(3) n
= __ 2.24
rL(s) 3(Jm3 + Bm + KbKa/R)’ ( )
respectively.
By superposition, Eqs. (2.23) and (2.24) represent a second order system
sszOm(s) + 3(Bm + KbKa/R)Om(s) = (Kc/R)V(s) —- an(s). (2.25)

Usually gear trains are mounted between the motor and joint shafts to increase

the motor load capacity. If the gear ratio n E 3:, where 6;, denotes the angular

displacement of the output shaft, then Eq. (2.25) becomes

 

1 1
52FJmeL(S) + Sgg—(Bm + KbKa/R)GL(8) = - 71(8), (226)

Whose block diagram is shown in Fig. 2.12. It is worthy noting that Jm in Eq. (2.26)

represents the effective motor inertia, the sum of the rotor and gear inertias.

2.4.2 Tracking Algorithm

we ﬁrst, reformulate the nonlinear dynamic analysis by combining the dynamic effects

0f the rIlotors. We then derive the PD feedback control law with acceleration and

32

 

 

 

 

 

YE.) 1m Ka +m 7 l ems) 9L(s)

st+ B

 

 

 

 

 

 

 

 

 

 

 

Figure 2.12: Block diagram for a DC motor.

gravity terms, which has been implemented on Dav’s joint controllers.

Recalling the robot equations of motion Eq. (2.18) and the motor’s equation of

motion Eq. (2.26), we reinstate them as

 

2 draws) + Z Cijk(Q)(li(lj + gk(q) = n. (2.27)
i=1 i,j=1
and
1 -- 1 . 1(0ka
DEJmkaLk + g(Bmk + Kkaak/R)6Lk — nkRk 7‘ka (228)

respectively. Comparing with Eq. (2.26), some notations of Eq. (2.28) are added a
scription k for the kth motor and Eq. (2.28) represents ith motor’s dynamics in time

domain by differential equation.

We observe that the kth generalized variable qk is actually represented by the
joint shaft displacement «91,, and torque load rLk = rk. Let 8,, = Bmk + Km. Kak/Rk.
Plugging Eq. (2.28) into Eq. (2.27) yields

1 n n 1 K
__Jm " d- .. '2'" —B' =—1V, 229
”’2: Art; My +1.20:qu (lying ka‘l'gk MR 1: ( )

f01' 1‘? = l,...,n.

In a. matrix form, Eq. (2.29) can be written as

(19(01) + J)<'i + C(q, (1)61 + Br’I + g(Q) = 11. (2-30)

33

where D(q) is the n x n inertia matrix of the mechanical links, and J is a diagonal
matrix with diagonal elements $511. The C (q, ('1) and g(q) are deﬁned as before and
k

input control vector u is speciﬁed by

T
11: age/.1 5221/3 LL , (2.31)
njHl 71'sz 71an

The objective of joint-level tracking is to servo the motor such that the output
shaft’s displacement can track a reference path provided by a path planner. An

independent joint PD control can be written in a vector form as
‘1 = “K1001 ‘— (1d) — Ker“! — 9d), (232)

Where K p and K d are diagonal matrices of proportional and derivative feedback gains
reSlf><=:(:tively; qd and q“ are the desired displacement and speed of the joints respec-

tively, which are given by a path planner in advance.

In [89, Page 217-218], Spong & Vidyasagar show that, in the absense of gravity,
that, is, if g is zero in Eq. (2.31) for all conﬁgurations, the PD control law speciﬁed
in Eq- (2.32) achieves asymptotic tracking of the desired position q“ with q“ = 0.
However, the presence of the gravitational term or nonzero q does not guarantee the
a'SbrtrIIDtotic convergence of Eq. (2.31). A “stable” tracking error from the desired

tr ‘ . - -
a3 eC:tory lS needed to balance the grav1tatIonal torque g.

In the practical implementation, Dav moves at a slow speed: qd as 0; the dynamic
terms such as off-diagonal inertial elements djk, j 75 k, the centripetal and COl'iOIiS
term C(q, ('1), and damping from motor B(q) are usually small. Notice that the
comDUted torque rk is proportional to the gear ratio; thus those dynamic terms are
redueed further for a small nk, which adds to the validity of that the terms djk, j 75 k

are
Ilegligible for control. The gravity effect speciﬁed by g plays a dominant role in
th

e
II‘Dbot dynamic model. In order to compensate for gravity, the following control

34

law is proposed for Dav’s low level control:

11 = Dead + g(CI) - Kp(q — <1") - Kd(<'1 — CI"). (233)

where Deff is a diagonal matrix whose diagonal elements denote the effective inertia

of each link of the robot.

Eq. (2.33) can be written in the component form as

”I: = derriik + 91:01) — kp(Qk — (If) — kd(dk — (fig), (234)

Where de , kp, and kd are effective inertia, proportional gain, and derivative gain of
the Ir: joint respectiVely. Applying Eq. (2.34) into Eq. (2.29), by assuming def; =

dick + Jmk/ni and letting
1
c = d" .. .1... _ .
I: Z quJ + 2 Ctqu q] + "[2: Bka
#k 1,1
denote the disturbances, we Obtain the following closed-loop system equation for the
jOint k:

deffék + Ck = —kp€k — kdék, (2.35)

Let us remark that the control law speciﬁed in Eq. (2.34) is derived under the
cm>tl<iition of low-speed with gear reduction. It should be note that, for high Speed
[not ion, or for motion without gear reduction at the joints, the coupling nonlinearities
have a much larger effect on the performance of the system; treating them merely

ext
er11a] disturbances can cause large tracking errors.

35

2.5 Dav’s Mobile Base: Control of System with
Nonholonomic Constraints

Like other wheeled mobile robots involving rolling contracts between two or more rigid
bodies, Dav’s mobile base is an example of a mechanical system with nonholonomic
constraits. Moreover, Dav’s mobile base is a redundantly actuated mechanism [40]
whose surplus control inputs offer a solution to the holonomic vehicle that can move
to arbitrary direction at any conﬁguration.

In this section, we apply a theoretic framework Similar to [80] to Dav’s mobile

base and derive a feedback linearization control scheme.

2.5.1 Nonholonomic Constraint

If a constraint equation is in form h,(q) = 0, or can be integrated into this form, it
denotes a holonomic constrait. Otherwise, it denotes a nonholonomic constrait.
Holonomic and nonholonomic constraints affect mobility of a vehicle in a com-
pletely different way. For illustration, consider a single Pfafﬁan constraitl in an
n‘body system: aT(q)q = 0. If the constraint is holonomic, then it can be integrated
as h’ (q) = c with 3% = aT(q). The motion of the system is hence conﬁned to lie on a
particular level surface of h, depending on the initial condition through c = h(q0). A
Conrlrllon practice for L—E mechanics to cope with a holonomic constraint is to reduce
a. generalized variable since the total degree of freedom is n — 1. On the other hand,
If the constraint is nonholonomic (i.e, such a It does not exist), then the system can
reach any admissible conﬁguration, although at each conﬁguration the instantaneous
mot i011 (velocity) of the nonholonomic system is restricted to an (n - 1)-dimensional
Space- More precisely, the velocities of the system are conﬁned to lie on the null space

of
the constraint matrix aT(q).

\

(e_ ' I()st wheeled robots have kinematic constraints of this form, which are linear in the velocities
. ’ Dav’s kinematics in Eq. (2.7))

36

A well-known example to illustrate nonholonomy is a conventional wheel, shown in

Fig. 2.13. The generalized coordinates q: (:1:, y, 6) )are deﬁned In Fig. 2.13. The pure

/©/

Figure 2.13: A rolling disk on a plane. The disk cannot move along the direction perpen-
dicular to its face at each instant.

 

 

rolling constraint can be expressed as :1: sin 6—2) cos 6 = 0. The allowable velocities are

confined in the null Space of the constraint matrix aT(q) = (Sin 6, — cos 6,0), which is

I - p1

cos 6 0
null(aT(q)) = span sin6 , 0
0 1

 

 

 

 

HOWever, any conﬁguration q, = (:1: f, y f, 6 f) can be reached using the following strat-

egy;
1 - Roll the wheel until it aims the point (:1: f, yf)
2 - Reach the point (rf, yf)
3 - Rotate the wheel vertically until its orientation is 6 f

Now we consider a mechanical system with n degrees-of-freedom subject to m

c
01131; raints of Pfafﬁan form:
aMQQ=Q nan

f0 -
r 2 t: 1,2, ...,m. Eq. (2.36) can be written in the form

Amm=0, (2W)

where A(q) is an m x n full-rank matrix. Let sl(q),...,sn_m(q) be the basis vector

fields of null(A), the null space of A(q); i.e.,
A(q)s,—(q) = 0, for z' = 1, ..., n — m. (2.38)
Let S(q) = [sl(q), ...,sn_m(q)] and Eq. (2.38) can be written as
A(Q)S (<1) = 0- (2.39)
Let A be the space spanned by these vector ﬁelds

A = span{81(CI), Sn-m(Q)}-

A may or may not be involutivez. If A is not involutive, we deﬁne A" as the smallest
involutive space containing A. Obviously, dim(A) S dim(A*). Campion et al. [20]

Observes three cases:

1 - A is involutive. The m constraints are all holonomic

2 - A is not involutive and dim(A") = n; i.e., A" spans the entire space and all

constraints are hence nonholonomic.

A is not involutive and dim(A*) < 72; both nonholonomic and holonomic con-

straints exist.

2" 5 - 2 Dynamic Analysis of Dav’s Mobile Base

Rec
all the mechanism of Dav robot’s mobile base, as shown in Fig. 2.5. For dy-

113.11): . . . . . . .
1C analysrs, we introduce the followmg symbols 1n addition to those deﬁned in

section 2.2:

2\
For

the concept of involutive space, reader may refer [89, Page 265].

38

x = (mayor/1): The geometric center and orientation of the robot. Note that

x=v

o m: The mass of the whole platform

0 I : The inertia momentum of the whole platform

0 Iw: The inertia momentum of each wheel

1,: The inertia momentum of the moving part of steering mechanism along the

joint axis in each wheel.

The conﬁguration of the platform can be described by 11 generalized coordinates

(n = 1 1). The ﬁrst eight variables describe the angular displacements of the wheels

9; the remaining three are the platform conﬁguration, represented by x. Thus, q,, =
(QT, x7] = [¢1,pl.¢2,pz,¢3,p3,(amaze, ye, 111T-

We assume that the driving wheels roll and there is no slip along the direction
perpendicular to the face of the wheel. Those kinematic constraints have been rep-
resented by Eq. (2.7), which gives eight equations (m = 8). Eq. (2.7) can be written
in Pfafﬁan form as:

A(qmr = 0 (2-40)

A(q) = [18x8 —B] . (2.41)

a.
Dd B is deﬁned in Eq. (2.8).

It is straightforward to verify that S (q), the null space of A(q), can be selected
as

S(q) = [B 13.3] (2.42)

39

It is obvious, because

Since the system’s velocities are conﬁned in the space A spanned by S (q)’s column

vectors, it is possible to deﬁne three velocities v = [111,112,12ng such that

(1., = S(q)V- (2.43)
Note that v in Eq. (2.43) is actually the base’s speed (113, vy, w). This is because

' B
q”: q =Sv= v.

I I

Thus, ('1 = Bv. Inspecting Eq. (2.7), we can see v = (11x, 113/101)-

N Ow we are ready to derive dynamic equations for Dav’s mobile base. Assuming
Dav moves on a level ﬂoor, we neglect the effects of gravity. The Langrange function

IS equal to the total kinetic energy only, which is

”10an

4
1 1 .
=Z(1 d} +pr) +2(i§+yf)+-§I¢2 (2.44)

Applying Langrange-Euler to the above equation, we obtain
Me], = ET — AT(q)/\, (2.45)

w
here 7' is an eight-dimensional vector representing motor’s output torque and A is

th
6 Vector of constraint forces. E denotes input transformation matrix and M denotes

40

an 11 x 11 inertia matrix, which is deﬁned

M: JO 0
0 Mo

J :dia 187IW)18)IW)IS)IIU?IS)IW

0 gil _ l) (2.46)
(m 0 0

Mo: 0 m 0
_0 0 1_

 

 

We notice that there is no actuator acting on variables (r136, yo, (0) directly, thus E is

I
designed to be sxs

03x3

Differentiating Eq. (2.43) and plugging Eq. (2.45) at the left side, we obtain
ET — ATA = MSv + 114512, (2.47)

Where we omit the argument q of variables S and A, for simplicity.

Multiplying Eq. (2.47)’s both side by ST, using the fact that AS = O, we obtain

sTMsv = —STMSv + STET. (2.48)

Q()mbining the kinematic model Eq. (2.43), we obtain the reduced state-space
Irlode1:
q, = Sv

. (2.49)
STMSv = —STMSv + STET

41

Applying Eqs (2.42), (2.46) and E’s deﬁnition to Eq. (2.49), we obtain the sim-

pliﬁed dynamic equation for Dav’s base, in the component form:

q = Bv
x = v (2.50)
(BTJOB + Mow + BTJOBv = BTT

We observed that

3 3

. ab.- . _ ab,-
Bv - g(v, 61; )q _ 201.5an (2.51)

i=1

 

Where b.- denotes the ith column vector of B. The Jacobian matrices %’ z' = 1, 2, 3

can be easily derived from Eq. (2.7), and are written as:

I cos (bl/b 0 O 0 0 0 0 01

— sin 1251/1" 0 O 0 O 0 0 0

0 0 cos (ﬁg/b 0 0 0 O 0

£9513 : 3 g — sinqug/r 3 CO 0 0 0 0 (2.52)

s (123/b 0 0 0

0 0 0 O — sin 453/7' 0 0 0

0 0 0 0 0 0 cos (154/b 0

0 0 0 0 O 0 — sin ¢4/r 0

 

 

42

 

 

 

sin 451/ b 0 0 0 0 0 0 0
cos (251/? O 0 0 0 0 0 0
0 0 sin gig/b 0 0 0 0 O
b 0 0 cos (a r 0 0 0 0 0
%—2 = 2/ (2.53)
‘1 0 0 0 0 sin 43/5 0 0 0
0 0 0 0 cos 1153/1" 0 0 0
0 0 0 0 0 0 sin 424/ b 0
O 0 0 0 O 0 cos (154/1: 0
-—-ac4>1/b — (ism/b 0 o o o o o o-
asdn/r — acnl/r o o o o o o o
0 0 —ac¢2/b+ as¢2/b 0 0 0 0 0
61:3 _ o o asabz/r + ac¢2/r 0 o o o 0
6q - 0 O 0 O acos/b + as¢3/b 0 0 0
0 0 0 0 —as¢3/r + ac¢3/r 0 0 0
0 0 0 0 0 0 acdm/b - as¢4/b 0
. 0 0 0 0 0 0 —as¢4/r - ac¢4/r OJ

 

 

(2.54)
In Eq. (2.54), s and c denote sin(.) and cos(.) respectively, for shortness.

2 - S .3 Controller Design

Alt hough mechanical system with nonholonomic constraints cannot be stabilized at
a. point by a smooth time—invariant feedback rule in general settings [17], we show

e3(3F3erimentally that Dav’s mobile base can track different curves with high perfor-

II“ta-Ifme.

We notice that rank(BT) = 3 and dim(T) = 8, thus the system is sufﬁcient
a.
QtIJated, or redundantly actuated, to be more precisely. However, those surplus

i
I11:)11ts offer a solution to implement the holonomity of the base.

43

For feedback linearization, we use following input transformation,
T = H((BTJOB + M0)u + BTJOBV), (2.55)

where n denotes the input of the “new” system and H is an 8 x 3 matrix satisfying
BTH = I. A choice for H is the pseudoinverse of BT; i.e., H = (87)“. Another

choice of H is the solution of the following minimum norm problem3:
min TTT s.t. F = BTT where F = (BTJOB + Mo)u + BTJOBV. (2.56)

Thus, by referring [81, Page 395], we can see the solution of Eq. (2.56) is T =
B(BTB)'1F and
H = B(BTB)-1. (2.57)

It is worthy noting that the minimum norm solution of H has a nice property: it

distributes the torques in a way that minimizes the actuators’ output torque.

We notice that the minimum norm solution of H does not guarantee that the
killematic Eq. (2.7) be satisﬁed. In fact, due to external disturbance or unmodeled
er r or, the ('1 might not conﬁned the distribution A. We need a mechanism to entrap
the velocities of the system within the distribution A. Motivated by sliding mode

control [87, Chap. 7], we add an extra term to Eq. (2.57) which is servoing on

the residue of Eq. (2.7). More formally, let q = Q,» + q], where qp = Bqu and

q—k = (I — BBl)q (see Fig. 2.14). Thus, Eq. (2.57) becomes

H = B(BTB)‘1 + AT (2.58)

\

D1 F has a physical meaning. F denotes the desired equivalent force exerting on the vehicle
Vii-t form. We treat the vehicle frame as an end—effector and T as the joints’ torque vector. The
F731‘Jal displacement of the joint variable 6q = 36x. The virtual work 611} of the system is 611) =
6x — 7T6q = (FTB — T)6x. 6w = 0 if the platform is in equilibrium (the equivalent forces equal

t.
C) the actual driving force), then T = BTF.

44

where AT 2 —quJ_ and K p is the gain constant. We can easily verify that BTH : I.
Let us remark that, Eq. (2.58) will improve the transient behaviors, especially when
the wheels are not synchronized initially. This is shown in Fig. 2.14. When q does

not lie on A(q), the A?“ will pull the q back to the distribution A.

 

 

 

Figure 2.14: If q is away from the distribution A, the extra term AT in Eq. (2.58) will
pull it back.

Therefore, applying Eq. (2.55) to Eq. (2.50), we obtain the following linearized

S t at. e—space equation

Q=Bv
v=u

'The dynamic system described by Eq. (2.59) is known as the double integrator sys-
tem since it represents 3 uncoupled double integrators with respect to x = (3:1, :132, £133)
or Qoordinates of the base’s center (are, ya, 11)) in Fig. 2.5. This means that each input
21“? > the kth component of u, k = 1, 2, 3, is a function only of wk. That is, ul and 112
CO Iltrol the motion along :1: and y axes respectively, and 113 controls the rotation of
thQ base.

Since u can be designed to control the three simple linear second-order systems,

t . . . . .
hQ obv10us chorce 1S usmg PD control by setting

u = —K,,(x — xd) — Kd(x — it") + it“, (2.60)

45

where xd, 56’ and 2‘1 denote the desired trajectory of the base; KP and K d are the PD

control’s gain matrices with a common choice as

_ - 2 2 2
Kp —' dlaglwl,w2)w3]

(2.61)
K d = diag[2w1, 21.02, 21123]

double integrator system

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

I I
I ..
xd Lr'\ u l (:11 {(3503 + M) u 'C 7 Mobile Base F; _l_ x g

S{ ! +Bflaév} Dynamic System q l s '
‘ l A i

i l

!

L ....................................... _!

Linear Feedback

 

 

 

 

 

 

 

Figure 2.15: Schematic diagram of control algorithm of Dav’s base.

Fig. 2.15 illustrates the control diagram of Dav’s base, which is an inner-
loop/outer-loop control. The computation of the nonlinear control Eq. (2.55) is
performed in an inner loop, with u, q and v as its inputs and 'r as output. The

outer loop in the system is the PD control calculating 11 based on the Eq. (2.60).

2.5.4 Experimental Evaluation

We developed a simulated base experiment to verify the validity of the dynamic model
and the effectiveness of the control scheme presented in the previous section. The
simulated base model is kinematically dynamically similar to the Dav’s mobile base.

The dimensions and the inertia parameters are speciﬁed in the following, according

46

to the previously introduced notations:

7‘ = 0.0889m
a = 0.223m
b = 0.0508m

m = 250kg

1 .—_- 174.5kg-m2
I, = 0.0313kg—rn2
Iw = 0.005kg-m2

The goal of the experiment is to control the base such that the base’s geometric
center coincides on a predeﬁned trajectory. As [86], we formulate the control problem

as a path-following problem instead of trajectory tracking.

Consider a circular trajectory. Let ($0,310) and R be the center and radius of
the circular path respectively. 1) denotes the desired forward translation and v =
W. We assume the base moves along the path while keeping the orientation
unchanged, i.e., «p = 0. This path-following problem can be formulated as controlling

the following system such that the output 2 follows a desired trajectory 2":

x = v
v = u
(2.62)
1110‘)
z = ,
h2(V)
vi + v: . .
where h1(x) = (:rC — $0)2 + (ye — :170)2 and h2(v) = . The Jacobian matrix
(.1)

47

of the output equation of the system speciﬁed in Eq. (2.62) can be written as

211x 211 0
H2 = €72} = y (2.63)
0 0 1

 

Therefore, if we regard the output 2 as the state variables, by using .731 = H15: and

2’2
= H211 to the ﬁrst two equations of Eq. (2.62), then we obtain:
23

3'1 = 2(:1:C — 11:0)u1 + 2(yc — y0)u2 + 2113+ 211:
22 = 211x111 + 2vyu2 (2-64)

23:11:;

where a feedback linear control law might be applied.

The simulated sensors were the encoders placed on each motor. At each sampling
instant (every 10 milli-seconds), the base received the joint angular displacement
vector q and the corresponding derivative q. From Eq. (2.7), the vehicle’s speed

v = Bq. The base’s pose (:cc,yc,1,b) were estimated via deadreckoning.

Let the base’s initial position is (:rc,yc,1/1) = (0.0, 08,90"); initial velocity was
zero; R = 5; v = 1. Therefore, zd can be chosen to be (25,0.25,0)T. Fig. 2.16 shows
the result. Subplot (a) shows the actual and desired path, which are illustrated by,
respectively, solid and dot-dash lines. In order to demonstrate the robustness of
our model respect to the dynamic parameters, we increase the the mass and inertia
momentum (m, I, 1,, Iw) by 100 percent. The result is shown in (b) of Fig. 2.16. We
can see the overshot and small stable deviation from the desired path; but the overall

performance is impressive.

Finally, let us remark that, although the system shows the robustness for the

48

 

 

 

 

 

 

 

   

 

 

fStart Point . r fStart Point 1
Op [0.
1[ .1.
2- , 2.
3. . 3»
4. . 4-
5» , 5»
6. . 6»
7. q 7-
8. - 81- '1
9. . 9.
1o . ... .. 1. . 1 . 10» . . «
6 4 2 0 2 4 6 6 6

 

Figure 2.16: The result of the simulated base experiment.

changing dynamic parameters, it is sensitive to the accuracy of kinematic model
and deadreckoning. If the kinematic model is wrong, the base cannot decouple the
wheel motion correctly and, in an extreme case, the wheel might ﬁght each other. In
addition, it is known that deadreckoning performance corrupts quickly with increasing
traveling distance. Feedback from the other sensor modalities are needed; for example,

laser ranger and GPS might be used for indoor and outdoor navigation respectively.

2.6 The Distributed Control System Design

As a self-contained mobile humanoid robotic system, Dav carries its own power source,

sensors, control system, learning system, and associated hardware.

2.6.1 Hardware Design

The hardware architecture of Dav has two levels: a main computer system and low-

level servo control modules, as shown in Fig. 2.17.

49

 

     

 

         

 

 

 

     

..oEEEn 882 a 2352 8am _b_ A
v. 05am
882 ® _ _ _
“VF—owned Paom 530m.
...,.mmmemeEmﬁw .....

M.....m_mmmx.a:=mm.mwﬂ2x

m0 030,—.

   
         

      

   

3.6-x mo .36;

m0 3302

Neck/Torso
Module

      

0
m0 n8:
30-x 93.»—

93-.. 30.4
2352 as: .

 

Figure 2.17: The hardware architecture for Dav has two levels: a main computer system

and low-level servo control modules.

50

Main Computer System

In order to build the main computer system, it is essential to know the requirement of
the task. The developmental program must process high—dimensional sensor data such
as vision and auditory input in real time. The computation power is hence critical
to achieve this real-time processing requirement. Various solutions were considered
for the high-level computer system to achieve the requirements. The single and dual
processor systems are not sufﬁcient to achieve the required throughput. A choice
for a quadruple-processor motherboard is required. Therefore, desktop motherboards
(ATX) with 4 Pentium III Xeon processors and 6 PCI slots were chosen. Fig. 2.18

shows this main computer system with a LCD monitor.

The main computer system situates in the middle of Fig. 2.17, including an 100G
SCSI RAIDO disk array, two frame-grabbers for left and right cameras, a sound card,
a wireless Ethernet card, and an Ethernet card connecting CAN PC (see Fig. 2.20) for
interaction with low—level modules. Being the “brain” of Dav, this computer system

handles all “cognitive” computation during learning or testing sessions.

 

Figure 2.18: The “Brain” of Dav, a. quadruple processor desktop PC. The SCSI RAIDO
disk array is not included.

51

Sensors

Dav has visual, auditory, tactile, range, encoder, strain gage, and current sensors.
The visual system consists of two micro color CCD camera heads with diameters
of 7mm and focal lengths of 2.2mm. They are installed on the eye platform in
the head. Each camera has a separate control box that is located in the torso,
providing automatic gain control and automatic electronic shutter control. Video
signal sampling is performed by two frame-grabber cards, which communicate with
processors via a PCI interface bus. Two microphones are mounted on each side of
the head. The auditory signals go through pre—ampliﬁers before they are sampled by
a sound card in the main computer. The touch and torque sensors are important for
hand-in-hand teaching, i.e., supervised learning. A laser range scanner is mounted in

the front of the robot to realize range sensing.

Embedded Controller

In the low-servo control level, a distributing modular scheme was adopted. The
low-level actuator control units (ACU) were placed as close to actuators or sensors
as possible, to reduce the wiring. Eleven ACU modules have been designed and
fabricated to satisfy the requirements of each motor groups, and were networked to
CANPC by four CAN buses. This architecture reduces hundreds of signal wires to
only two wires, through the entire body, plus other three wires for power supply (See
Fig. 2.19). Consequently, the system reliability increases.

Each of microprocessor-based servo modules is composed of two connected boards:
servo interface board and an embedded Motorola PowerPC MPC555 board with 448k
internal flash, 24K internal RAM, TPU, PWM, and ADC, as shown in Fig. 2.19 (b).
The signal connector between the two boards serves two purposes: to attach modules
physically together as well as electrically. Those ACU modules, with capability of
interfacing up to four motor/ joints, communicate over a CAN bus with a rate of 1M

bps.

52

CAN serial bus (two twist wires)

 
 

 

     
  

CAN physical t -
processor
(with in-chip

RAM, EEPROM,
TPU, PWM, CAN,
ADC modules)

 

 

 

 

 

 

 

  

 

 

Power Power supplier 12-15V
Management 1——

and processor

supervisor

 

Figure 2.19: The block diagram of PowerPC based controller.

‘1

s new?

mung; sf; =
‘-‘....." H '

   

(a) (b)

Figure 2.20: CANPC: a PC104 embedded system running RealTime Linux operating sys-
tem with four CAN interface (a) PC104 computer board (MZ104+). (b) PCM3680 Dual—
CAN interface board.

53

Distributed Architecture

Fig. 2.17 illustrates the overview of the hardware design. There are head, left arm,
neck-torso, right arm, left hand, base, and right hand modules, from the top to bot-
tom and the left to right. Each motor has a driving ampliﬁer and sensors such as a
quadruple encoder and a torque sensor“. The head module has an ACU: HeadCB,
controlling the Tilt, L—Pan (left pan), R-Pan (right pan), MT (mouth), and EB (eye
brow) joints. The left arm module has two ACUs: L—ChestCB and L—ArmCB. L-
ChestCB controls the left arm’s J2 (Joint 2), J3 (Joint 3) and J4 (joint 4) while
L-ArmCB controls the left arm’s J5 (Joint 5), J6 (joint 6) and J7 (joint 7). Similarly,
the right arm module has R-ChestCB and R-ArmCB ACUs, which control the corre—
sponding to the left arm. The torso module has two ACUs: NeckCB and TorsoCB.
NeckCB controls the neck’s three joints: Neck-R (roll), Neck-P (pitch), Neck-Y (yaw)
while TorsoCB controls the torso’s two joints (torso’s Joints 1 and 2) and the first

joint of the two arms (left arm’s Joint 1 and right arm’s Joint 1).

The base module is the hub of the whole system where the main computer system
is housed. The base module has two ACUs: L—BaseCB and R—BaseCB. L—BaseCB
controls W—l-D (Wheel 1’s driving motor), W—l-S (Wheel 2’s steering motor), W-2-D
(Wheel 2’s driving motor), and W—2-s (Wheel 2’s steering motor) joints. R—BaseCB
controls W—3—D (Wheel 3’s driving motor), W-3-S (Wheel 3’s steering motor), W—4—D
(Wheel 4’s driving motor), and W—4—s (Wheel 2’s steering motor) joints. In addition,
the fabricated power board and 4 sealed lead acid batteries (with capacity of 110AH
@12V for each one) are placed in the base module. The range ﬁnder (PLS, SICK
00.), connected to main computer by a RS232 interface (port 1), was mounted at the

front side of the base for realizing the range sensing.

We notice that those ACUs are not directly connected to the main computer sys-

tem. They are connected via CANPC instead. CANPC has four CAN buses: CanO,

 

4Armature current sensors were placed on motors of all modules except the two arms.

54

Canl, Can2, and Can3. As shown in Fig. 2.17, CanO links TorsoCB, R—ChestCB,
R-ArmCB and R-HandCB; Canl links L—ChestCB, L-ArmCB and L—HandCB; Can2
links Neck-CB and Head-CB; Can3 links L—BaseCB and R-BaseCB.

At the top of Fig 2.17. Two microphones with pre—ampliﬁer are linked to a sound
card plugged into main PC’s PCI slot. Two micro camera heads, ﬁxed in the inside of
the head mechanism, are linked to two digital control boxes housed in the base. The
control boxes’ outputting RGB signals are sent to the frame-grabbers in the main

computer.

2.6.2 Software Development

Dav’s control system involves the development of the large-scale control system that
must operate with a signiﬁcant and high bandwidth. A network of embedded proces-
sors adds to the complexity of software development. Many issues such as modular-
ity, scalability, reusability, internal communication transparent to location, ﬂexibility,
plug—and-play capabilities and open architecture, are needed to be considered.

There are substantial amounts of work addressing those issues and a lot of solutions
are proposed. For example, Ayllu [53] and COLBERT/Saphira [52] are used to control
the ActivMedia Pioneer robots. However, those proposed solutions are designed for a
particular control philosophy. They do not intend to encapsulate and to hide the low-
level device details, in the meanwhile offer enough flexibility for high-level program
if a different control strategy is needed. Player [96], DCA [60,70] and OROCOS [1]
are hence proposed. The latter provides standard software packages (e.g., abstract
device model) to facilitate high-level robot software development.

As in Player [96], Dav’s software development was guided by our desire to con-
currently support many heterogeneous devices and many heterogeneous clients. Each
device operates at some inherent frequency, with wide variation among devices. For
example, the popular SICK PLS laser ranger returns a full scan at approximately

8H2, while the robotic joints can give encoder feedback at almost 1000Hz. If we were

55

to take the naive approach and poll each device in turn at the rate of the slowest de-
vice, we would be discounting the full capabilities of the available resources. In most
cases, we want to obtain data from and send commands to each device at the highest
rate possible in order to fully exploit the hardware and maximize the responsiveness
of the system. Similarly, each client operates at some inherent frequency; while a
simple client written in C++ may be capable of consuming new data at 100Hz, a
graphically intensive client written in X-Window might operate at less than le. It
should be possible for these two clients to be connected to Dav simultaneously and
to execute as fast as they want without interfering with each other.

Moreover, in order to design a software architecture on a highly distributed sensing
and control hardware system as Dav, we need handle data ﬂows between sensors,
processors and actuators effectively in groups and across different CAN buses. Dav’s
designed software architecture, shown in Fig. 2.21, provides a client / server model for
realizing the robot interface. Acting as a client, the high-level control program is
separate from the server, which executes low-level device operations either on ACU
boards or local main computer system, via a uniﬁed UDP socket interface. We have

three major considerations for choosing such a UDP socket based robot interface.

0 Distribution: A client (high-level program) can access to sensors and actuators
from any workstation on the Ethernet network. For example, a clients may con-
nect to multiple ACUs while a ACU accepts connections from multiple clients

if no conﬂict exists.

0 Convenience: Clients can be written in any language as long as they support
socket programming. For example, the user can choose C/C++ for high speed

application, Java for trans-platform capability, and MATLAB for quick proto—

typing.

0 Light-weight device server: We feel the commonly accepted CORBA interface

is too sophisticated to ﬁt robotic application where real-time constraint is pri-

56

mary important, even though a lot of effectors have spent to ensure CORBA’S
performance (e.g., latency, throughput) is reasonable. ACE (Adaptive Commu-
nication Environment) and RPC (Remote Procedure Call) are two communi-
cation packages provides what we required, but they are not suit for embedded
application because of their big footprint. We use a simple and direct approach,
UDP socket interface shown in Fig. 2.21. Acting as the device server for low-
level ACUS, CANPC receives UDP packets and forwards CAN packets to ACUS,
and vice versa. Each UDP packet may contain several CAN packets (up to 64

packets) to improve the throughput.

Dav is implemented in C++ and makes extensive use of the POSIX—compliant
pthread interface 5. Since clients and physical devices may operate at different time-
scales, we assign an asynchronous daemon thread for each physical device. A main
thread listens for new client connections on a well-known UDP port, spawning client-
stub threads on demand to service clients and the devices they request. The overall
system structure of Dav is shown in Fig. 2.21. The center portion of the ﬁgure is
Dav’s software architecture; on the left are the physical devices and on the right
are the clients. Each client has a UDP socket connection to Dav. If the client is
executing on the same host as Dav, then this socket is simply a loopback connection;
otherwise, there is a physical network in between the two. At the other end, Dav
connects to each device by whatever method is appropriate for that device (e.g., RS-
232 for the ranger ﬁnder and CAN for ACUs). Within Dav’s architecture, the threads
communicate through a shared global address space. As indicated in Fig. 2.21, each
device has associated with it a command buffer and a data buffer. These buffers
provide an asynchronous communication channel between the device threads and the
client threads. For example, when a client reader thread receives a new command

for a device, it writes the command into the command buffer for that device. Later,

 

5Although the current implementation of Dav’s software architecture is on LINUX system, it is
straightforward to port on Window system

57

2n
:6on E020

3

ESmoE E620

 

  

nod—8m an:

Client stub thread M

Client stub thread

      
  

Zn 0033

E 830n—

Device thread 1

carom use 726 ..wov coats—e
258% oo_>oQ

  

 

Figure 2.21: The software architecture of Dav’s control system.

58

when the device thread is ready for a new command, it will read the command from
its command buffer and send it on to the device. Similarly, when a device thread
receives new data from its device, it writes the data into its data buffer. Later, when
a client thread is ready to send new data from that device, it reads the data from the
data buffer and passes it on to its client. In this way, the client program threads are
decoupled from the device daemon threads (and thus the clients are decoupled from
the devices). Also, by the nature of threads, the devices are decoupled from each

other, and the clients are decoupled from each other.

Since the bandwidth and latency are primarily important for real-time system, we
do not apply synchronization (e.g., semaphores) on the shared memory in the middle
of Fig. 2.21. Thus, old data may be read when the client thread’s reading frequency
is higher than device thread’s refresh rate. On the other hand, a client may miss data
when its reading frequency is lower than device refresh rate. This is reasobable, since
we treat the lately arrived data useless; it is hence unnecessary to keep a long data

buffer for each device.

Dav’s software architecture currently supports a variety of devices, including the
popular peripherals (e.g., ACUS, GPS, camera and frame—grabber combination, mi-
crophone and sound card, range ﬁnder). Devices with serial (RS-232), CAN or PCI
interface can be easily added to the system by developing a device driver and modi-
fying a conﬁguration ﬁle. In order to provide a uniform abstraction for a variety of
devices, we chose to follow the UNIX model of treating devices as ﬁles. Thus the
familiar ﬁle semantics hold for Dav’s devices. For example, to begin receiving sensor
readings, the client opens the appropriate device with read access; likewise, before

controlling an actuator, the client must open the appropriate device with write access.

By default, client programs receive data at 10Hz, which is roughly the signal
latency via neural circuit from the brain to peripheral neurons. Namely, a client can
expect to receive a data packet containing the current data from all the subscribed

devices in every 100ms. Certainly, by sending all the data at once, Dav’s device

59

server might repeatedly send old data from a device that operates at less than 10Hz.
We designed Dav’s architecture in this way for one reason: simplicity. By always
transmitting the current state for all subscribed devices, regardless of the time-scale
of the device, we facilitate the writing of client programs. As a result, clients are able
to use a simple read loop to receive data from Dav’s device server. It is worthy to

note client programs can issue command to Dav at any time.

2.7 Experiment

All Dav’s joint and the control system have been successfully implemented and tested.
Experiments were conducted to evaluate the performance of the robot.
Fig. 2.22 shows the movements of the neck and head modules. Fig. 2.23 shows

the movements of mobile base. Fig. 2.24 shows the arms’ movement.

2.8 Summary

This chapter presented the mechanical structure of the robot. The kinematic model
that describes the relation between the joint variables and the position of end-effectors
was derived. For the purpose of the lower-level controllers, the dynamic models of
Dav’s mechanism were outlined. The distributed embedded control system and the
main computer system for mental development were sketched.

It is known the kinematic and dynamic models of a humanoid robot are extremely
complex [30]. For example, due to the redundancy DOFs of a humanoid, the inverse
kinematics, the mapping from end—effector coordinates to joint variables can not be
easily written a program to simulate. The learning seems to be a promising alternative
solution. It is known that a robotics model that does not encompass learning will
always require a human programmer for building the movement plans [47], which is

not acceptable for a autonomous robot.

60

In the next chapter, we will introduce a new learning framework for robotics: a
theory of developmental mental architecture and Dav’s implementation of this archi-

tecture.

61

 

Figure 2.22: The movement of Dav’s neck and head modules. The ﬁrst, second and third
rows show the rotations of the neck’s Roll, Pitch, and Yaw joints respectively; the fourth
and ﬁfth rows show the tilt and pan motions of the eyeballs respectively.

62

 

Figure 2.23: The movement of Dav’s mobile base.

63

 

Figure 2.24: The movement of the arm’s Joint 1, Joint 3 and Joint 4, from the top row to
the bottom.

64

Chapter 3

Dav Architecture Design: A
Theory of Developmental Mental

Architecture

The software architecture of a developmental robot is a challenging new research
subject. This chapter presents a theory of developmental mental architecture. Five
architecture types, from the simplest Type-1 (observation-driven Markov decision
process) to Type-5 (DOSASE MDP), are introduced. The properties and limitation
of a simpler one are discussed before the introduction of the next more complex one.
Further, we present the architecture design of the Dav robot. The framework of
the Dav architecture is hand-designed, but the actual controller is developed, i.e.,
generated autonomously by the developmental program through real-time, online
interactions with the real physical environment. We present the Dav architecture
and the major components that realize the architecture. The designed architecture
for Dav is the next generation version from its extensively tested predecessor - the

SAIL developmental robot.

65

3. 1 Introduction

Many different architectures have been proposed in the intelligent robot commu-
nity. Robot perception and perception-based behavior generation have been proved
very difficult, especially in unknown or partially unknown environments. Some work
on robot learning was motivated by human learning and development: from sim-
ple to complex. BAIRN (a Scottish word for “child”) is a symbolic self-modifying
information processing system used as an implementation for a theory of cognitive
development [102]. Drescher [29] utilized the schema, a symbolic tripartite structure
in the form of a “context-action—result,” to model the ideas of child sensory-motor
learning in Piaget’s constructivist theory of cognitive development. Soar [56] and
ACT-R [6] are two well-known symbolic systems with an aim to model cognitive pro-
cesses, although cognitive development was not the goal of these efforts. The model
of Albus [4] takes sensors as input and produces control signals to effectors as output,
thus, allowing a ﬁner numeric representation of sensory features when a speciﬁc task is
given. The Finite State Machine (FSM) or its probability based variant, the Markov
Decision Process (MDP), are two general frameworks that have been used for con-
ducting autonomous learning with a given goal in a symbolic world (e.g., Shen [82])
or reinforcement learning (e.g., Kaelbling et al. [49]). The explanation-based neural
network, called “lifelong learning,” was used by Thrun [92].

Designing a program and its representation in a task-speciﬁc way using a tradi-
tional approach is typically very complex, ad hoc, and labor intensive. The resulting
system tends to be brittle especially in unknown and uncontrolled environments.
Recently an important direction of research aims at reducing or avoiding the human
imposed limitations (e.g., features, models) of the world for better environment adap—
tation in learning. Cresceptron [108] is a system that allows a human teacher to inter-
actively segment natural objects from complex images through which it incrementally
grows a network that performs both recognition and segmentation. SHOSLIF [43],
SARCOS [97], Cog [19], and Kismet [14] are motivated by simulating infant skills via

66

learning through interactions with the environment.

The efforts discussed above were motivated by human learning and cognitive devel-
opment to various degrees. However, these proposed architectures are fundamentally
different from human learning and have not reached autonomous mental development
(AMD) [118]. The SAIL robot [106] and the Darwin V robot [5] are two prototypes
of developmental robot. Darwin V was designed to provide a concrete example of
how the computational weights of neural circuits are determined in a controlled envi-
ronment by the behavioral and environmental interactions of an autonomous device.
The SAIL developmental robot was designed for developing perceptual and behav-
ioral skills through interactions in uncontrolled, complex human environments (see
the eight challenges for AMD in Section 2). There has been a lack of systematic

theory for developmental mental architecture.

This chapter introduces a theory of mental architecture suited for autonomous
development. As an architecture design example, it presents a new architecture for
the Dav robot [37], as shown in Fig. 2.1 the next generation developmental robot
after SAIL. The major differences between SAIL and Dav fall into two categories, the
body and the “mind.” Dav’s body is substantially more sophisticated than SAIL’s
body which only has a single arm. The software architecture of Dav takes advantage
of the rich sensors, especially interaction sensors, available on the Dav body. The Dav
architecture, presented here, is signiﬁcantly more complete and integrated than SAIL
(e.g., the SASE concept) and has not been published before. However, it takes into

account the extensive experience that has been gained from its predecessor SAIL.

Although the overall design of the Dav software architecture has not been tested
yet, most of its key components have been successfully tested on SAIL. Architecture
design for an intelligent robot is one of the most difﬁcult research topics in intelligent
robot research, especially for humanoids. As far as we know, no such highly extensive

and complete developmental architecture has been published.

Section 3.2 outlines the AMD paradigm. A theory of mental architecture is pre-

67

sented in Section 3.3. Sections 3.4 and 3.5 presents the software architecture and its

major components of the Dav robot. Section 3.6 provides concluding remarks.

3.2 AMD Paradigm

Shown in Fig. 3.1, the AMD paradigm has two phases. In the ﬁrst phase, construc-
tion and programming, tasks that the robot will end up learning are unknown to
the programmer. A task-nonspeciﬁc program, called the developmental program, is
written during this stage. The most fundamental difference between a developmental
program and a traditional program is that the tasks that the machine will execute
are unknown to the programmer during the time of programming.

In the second phase, autonomous development, the robot is turned on at time
t = 0, the robot starts to interact with the physical environment in real-time through
continuously sensing and acting. Humans teach the robot to learn tasks from simple
to complex via teacher “arranged experience” called scaffoldingl.

Unlike a traditional non-developmental robot, the following eight challenging re-

quirements are necessary for practical AMD:

1. Environmental openness: Due to task non-speciﬁcity, the developmental robot

must deal with unknown, uncontrolled complex environments.

2. High-dimensional sensors: The developmental robot must directly handle con-
tinuous raw signals from high-dimensional sensors, e.g., vision, audition and

tactile sensors.

 

1Scaﬁ‘oldz'ng is the process of using developed simple capabilities to further develop more complex
capabilities, through further experience (with or without a teacher), without the need of manual
modiﬁcation of the developmental program. Lev Vygotsky [100] proposed the concept of “zone of
proximal development” (PZD) which is a latent learning gap between what a child can do on his
or her own and what can be done with the help of a teacher. Wood, Burner & Ross [121] used
the term “scaffolding” to describe such an instructional support through which the child can extend
or construct current skills to higher levels of competence. Through this process, the scaffolding
(arranged experience) is slowly removed.

68

 

 

 

 

 

 

 

 

 

 

 

 

Ecological Given to Given to Given to

conditions . . .
l Given to
Release . _ _ _ _ .
Turn Training, Training, Training.
‘] on testing testing testing
A Sense, act, sense, act. sense. act
gent =
Construction & Autonomous Time
programming —> <——— development ————>
phase phase

Figure 3.1: The autonomous development paradigm

. Completeness in using sensory information: Due to environment openness and
task non-speciﬁcity, it is not desirable for a developmental program to discard,
at the design phase, sensory information that may be useful for future, unknown
tasks. Certainly it selects related information useful for a particular task, after

its task-speciﬁc representation is autonomously derived after birth.

. Online processing: At each time instant, what the machine will sense next is

affected by what the machine does now.

. Real-time speed: The sensory/ memory refresh rate must be fast enough so that
a physical event (e.g., motion, voice, and vision) can be sampled properly and
processed in real-time (e.g., at about 10Hz). It must handle one-instance learn-
ing: learning from one instance of experience. Time consuming iterations should

be avoided.

. Incremental processing: Acquired skills must be used to assist in the acquisition
of new skills as a form of “scaffolding”. Each new observation data must be

used to update the current representation and discarded after updating.

. Performing while learning: Conventional robots perform after they are built,

however, a developmental robot must perform while it “builds” itself (mentally).

69

8. Scale-up to complex tasks: A developmental robot must be able to perform

increasingly more complex tasks while gaining more experience.

After solving a series of technical problems, SAIL realized all of the above neces-
sary capabilities. However, a realization of the above capabilities has a level. The Dav
developmental architecture aims to signiﬁcantly advance such a level plus following

new concepts and components:

1. The realization of the SASE agent model, which enables the robot to perceive

and autonomously control the voluntary parts of internal “brain” activities.
2. Integration of multiple sensorimotor subsystems for multimodal learning.

3. Integration of the architecture with a value system.

This chapter concentrates on these new concepts and new components in the Dav

software architecture design.

3.3 Observation-driven Markov Processes

The architecture of intelligent agents is a complex issue. In this section, we introduce
a series of architectures, from simple to complex, and the associated properties.

We ﬁrst deﬁne the concept of internal and external environments of an agent.

Deﬁnition 3.3.1 (Environment) The internal environment of an agent is the
brain ( or “the central nervous system”) of the agent. The external environment con-
sists of all the remaining parts of the world, including the agent’s own body ( excluding

the brain).

Having deﬁned the external and internal environments, we deﬁne the sensors and

effectors associated with these concepts.

Deﬁnition 3.3.2 (Internal sensors and effectors) An external sensor 5,. is a

sensor that senses the external environment. An internal sensor S,- is a sensor that

70

Last Primed
context context

p--—--—

S DDDDDDDDD DDDDDDDD
Sen " DDDDDDDD DUDDDDDD
501‘s:

5,, $313 [11:11:15 DDDDDDDD

 

 

 

DECIDE]
DDDDD
DECIDE]
DECIDE]

C1D DUDE DDDDDDDD

J l l l l l L 1 l_1 I l l 1 l l l l ‘

Time
IDDDDDZIDD EDDDDUDDDDD
DDDDDZIDD DDDDDDDDDDD

:l

:1
EJDDD :ll'JL—JC Cl DDDDDDUD
1E] DDDUDC D D DDDUDDUD

 

 

 

 

 

 

Effectors:

 

BE] CH]

 

 

 

 

 

 

 

 

 

 

 

 

. -3

Figure 3.2: The Type-1 architecture of a multi-sensor multi-effector agent: Observation-
driven Markov decision process. Each square in the temporal streams denotes a smallest
admissible mask. The Type-1 architecture takes the entire image frame without applying
any mask. The block marked with L is a set of context states (prototypes), which are
clusters of all observed context vectors l (t).

 

 

 

 

senses the internal environment. An external effector E,3 is an effector that acts on the
external environment. An internal eﬁector E, is an effector that acts on the internal

environment.

3.3.1 Type-1: Observation-driven MDP

An agent has a number of sensors and effectors. Fig. 3.2 illustrates a multi-sensor
multi-effector model of an agent. The agent A(t) operates at equally spaced discrete
time instances t = O, 1, We assume that an image is produced at each time
instance by the sensor, independent of the sensing modality, visual, auditory, touch,
etc. Without loss of generality, we assume that the agent has two external sensors
and two external effectors. Each external sensor Sci; i = 1, 2, senses a random multi-
dimensional sensory frame xe(t) = (xel(t),xe2(t)) at each time instance t and the
sensed signal is fed into the agent. Each external effector Eei, i = 1, 2, receives from

the agent an effector frame ae(t) = (ae-1(t), ae2(t)) at each time instance t.

71

Let x, E X and pt 6 ’P be the observations and outcome covariates (i.e., ran-
dom vectors) at time t, respectively. Note that we change a variable of a vector
to its subscript (e.g., change x(t) to 513,) when it is more convenient to consider the
variable as a discrete index number. Let H, be the random vector of the history:
H, = {xt,xt_1,...,xo,pt_1,...,p0}. At time t, the agent A(t) needs to estimate the
distribution of P(p, | H, = h).

If t is large, H, is too large to be practical and it contains much information that

is not very related to the outcome Ht.

Deﬁnition 3.3.3 (Type-1) The Type-1 mental architecture is a k-th order
Observation-driven Markov Decision Process (MDP) [27, 122] that reduces the H,

to contain only the last k observations:

It = {xii xt—li "') xt-kipt—li "'vpt—k}

as shown in Fig. 3.2. The random observations in 1; across time t = 0, 1, ...,t are the
source from which the agent automatically generates states in the form of clusters l E
C, where 5 consists of all possible observations of the last contexts L = {Ht | 0 S t}.

The predicted consequence p, consists of predicted action at and the predicted value

”in Pt = (at/Ur)-

A major difference between a regular MDP [49, 91] (or HMM [75]) and an
observation-driven MDP are that the states 3, with a regular MDP are hand-designed
by a human programmer but the states with an observation-driven MDP can be auto-
matically generated (developed). With a regular MDP, the programmer must provide
initial estimates for the prior probability distribution P(so), the state transitional
probability P(st | x¢,st_1) and the state observation probability P(xt I 3,). It in
turn, requires that the human programmer establishes a correspondence between the
meanings of the physical events being modeled and the states. Due to the fact that

physical events are not known at the time of programing for the developmental robots,

72

the regular MDP is not suited for the developmental program. In contrast, the obser-
vation driven MDP requires no prior probability and all the probability distribution

P(p, I l, = I) can be estimated incrementally on-the—ﬁy.

Another important difference between the states in the traditional MDP and the
observation driven MDP is the very different nature of the representation. The states
in the former correspond to some objects or events in the world by human hand-
design. The entire set of states is a monolithic representation of the modeled part of
the world. That results in the high brittleness of the agent in dealing with unexpected
events. In contrast, the states in the latter are clusters of observational vectors. They
are not monolithic in the sense that an object in the world can correspond to many
state clusters. Further, each cluster may correspond to an observation of several
objects in the world. Therefore, there is no strict one-to-one correspondence between
a state in the observation-driven MDP and an object of the world. It is the behavior
of the agent (e. g., the same picking apple behavior from different views of the apples)

that shows the discrimination and generalization power of the agent.

In practice, we implement the regressor R using the Incremental Hierarchical
Discriminant Regression (IHDR) [112,114]. Given any observed (last) context I (t),
the regressor R produces multiple consequences (primed context) p1(t), ..., pk (t) with
a. high probability:

{1016), ---,Pk(t)} = B(Wll- (3-1)

Thus, the regressor R is a mapping from the space of the last context [I to the power
set of 'P:
R : c H 27’. (3.2)

R is developed incrementally through the real-time experience. For any t > 0 (after
the birth), it is a total function since it is deﬁned for all elements in [3, but it does not
do well for most elements in L that it has not experienced. It is not an onto function

since its range covers only a very small part of 2P.

73

Therefore, we need a value system that selects desirable contexts from multiple
primed ones. The value system V( t) takes a set of (e. g., k) contexts from the regressor

R and selects a single context:

V(R(l(t))) = V({p1(t),p2(t)i ---,Pk(t)}) = alt) (33)

where 1 g i g k and It varies according to experience. The value function selects
the best consequence 1),-(t) that has the best value v,(t) in p,(t) = (a,(t),v,-(t)). For
example, V({p1(t),p2(t), ...,pk(t)}) = 1),-(t) if i = arg max{v1(t), v2(t), ..., vk(t)}.

The real-time Q-learning algorithm [104] can be used to estimate the value of
each consequence p,(t), i = 1, 2, ..., k, and the agent selects the one (action) with the
highest value.

Therefore, the value system V is a mapping from the power set of P to the space
of P:

v : 2” H P. (3.4)

3.3.2 Type-2: Observation-driven Selective MDP

The Type-1 mental architecture is sensor nonselective in the sense that it is not able
to actively select a subpart of relevant information from the sensory frame (intra-
modal attention) or to attend a particular modality but not the others (inter-modal
attention).

Given a d-dimensional input vector x, the attention can be modeled by an atten-
tion mask m, where m is a d—dimensional vector whose elements are either 0 or 1.
Suppose that the input vector is x = (x1,x2) and the mask is m = (m1,m2). Then
the corresponding attended input vector is x’ = x (8) m = (x1m1,x2m2), where <8)
denotes vector pointwise product. Not all the masks are admissible. For example,
the set of admissible masks consists of circles with different radii p at different center

positions (r0, c0) of the image frame. Then, the attention selection effector has three

74

control parameters: (r0, c0, p).

Without loss of generality in our theoretical discussion, we assume, as shown in
Fig. 3.2, that there are only four addmissible masks for each image frame at time t,
denoted by the upper attention square, the lower attention square, and two trivial
masks: no square is selected and both squares are selected, respectively. Suppose
x = (x1, x2) is an input image frame, where x1 and x2 denote the subimage under the
upper and lower attention square, respectively. With these four admissible masks,
the attended image x’ = x®m has only four cases, (x1, x2), (x1, 0), (0, x2), and (0,0).

For the clarity of the discussion, we will use the notation in the theory of formal
languages and automata. Assume that the subimage in each attention square can
represent a string x E 2*, where 2 is the alphabet and 2* represents the set of all
strings of a ﬁnite length formed from the letters of the alphabet. Each attention square
can contain a long string as long as the image that it models has a sufﬁcient number
of pixels. Therefore, we can write x1(t) = Karl’s body, x2(t) = trees, etc. A similar
notation is applicable to effectors. For example, we can write a1(t) = move-forward,
a2(t) = move-backward, where a1 and a2 are two independently controllable motors

of the effector Eei, i = 1, 2.

Deﬁnition 3.3.4 (Type-2) The Type-2 mental architecture is a Type-1 mental ar-

chitecture, with an addition of an attention selector

TZyXAil—tc,

as shown in Fig. 3. 3, where 37 is the space of all possible pre-attention contexts y =
{l(t) I 0 S t}, A,- is the space of all possible attention selections for T and L is the

space of attention-masked last contexts.

In order to show the properties of different architectures, we deﬁne a concept

called higher.

75

Last Primed
context context

5 DECIDED [31313 BC] DDDDDDDDD
S " ISL—JUDGE] DUE DE] DDDDDUDDD
CDSOTSZ

II]

C]

Se, DUE] DUDE :lElC' DDDDDDDDD
El DECIDE”: DUE DDDDDDDD

 

D

 

 

RENE

 

 

 

 

 

 

 

 

 

 

 

 

 

1 1/1 I L 1 l 1 l l l l 1- d 1 l \l l 1 l l L l l l l >
/ ’---- 1. in".
c DDDDDJDDCDDEUD DDDDDDD
Eﬂt' DDDDDJDDCJDCDD DDDDDDDD
CC OI'SI
a,t DUDE JDDDDDDD
1 DUDE JDDDDDDDD

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 3.3: Progressive addition of architecture components for Type-2 to Type-5. Type-2:
adding T and En. Type-3: Adding M and E12. Type-4: Adding S,- and primed sensation.
The block marked with D is a delay module, which introduces a unit—time delay. Type-5:
Developmental T, R, M and V.

Deﬁnition 3.3.5 (Higher) Given a set D of tasks, we say that a developmental
architecture A2 is higher than another developmental architecture A1, if given the
same teaching environment E, the architecture A2 requires fewer statistically expected

teaching examples than A1, over the environment E and over the tasks in D.

Here the term “higher” is motivated by the concept “higher” animals.

As a convention, we regard environment as a part of a task. A task is more
challenging if the environment is uncontrolled. In the above deﬁnition, since the
architectures are developmental, the developmental program that guides the develop-
ment of the architecture must be task—nonspeciﬁc. Otherwise, one can always deﬁne
an architecture specialized for a particular type of task and this architecture requires
fewer teaching examples than a more general developmental architecture. However,
the former is not able to deal with other tasks that require more general cognitive

capabilities.

76

Theorem 3.3.1 (Existence of higher architecture) There is at least one class
D of tasks in which only a proper subcomponent of sensory input is related to the
tasks and the associated teaching environment E in which the T we? architecture is

higher than the Type-1 architecture.

Proof: Assume a set D of tasks whose goal is to classify sensory information in set
C (e.g., human body). Without loss of generality, assume that at any time, only
one of the attention squares of sensor 881 contains an element (e.g., human body)
in C and the other window does not (e.g., natural background that is free of human
bodies). For Type-2 architecture, the teacher creates such a teaching environment.
First, he teaches the attention selection control of T (e.g., through reinforcement
learning), so that T will open two attention squares one after another through a loop
(to simulate saccades). If the attention-masked input belongs to C, the regressor
R produces the class label as the action output to an external effector (e.g., using
supervised learning). Otherwise, R does not produce any output to the external
effector. Suppose that the attention window does not contain any element in C but
instead contains uncontrolled changing natural settings that never repeat. The Type-
2 architecture performs reasonably well, since at least one of the two ﬁxations enables
it to detect an element in C while disregarding the other square which always presents
a new natural background. In contrast, the Type-1 architecture can only learn from
the monolithic input x(t) = (x1(t), x2(t)). Every vector x(t) is very different from all
others because the natural settings are always new in the environment E, the Type-
1 architecture never observes a learned input x(t) and, thus, almost never performs
correctly. Therefore, for this class of tasks D and this type of environments E, Type-2
is higher than Type—1. C]

It is not true that a higher architecture can learn faster than a lower architecture
in any setting for any tasks. If the environment is such that the entire input vector
x(t) contains only elements in C and nothing of the natural background (which is

very rare in reality), then the attention selection mechanism enabled by the Type-2

77

architecture does not help to reduce the number of training examples.

3.3.3 Type-3: Observation-driven Selective Rehearsed MDP

The Type-2 architecture does not have a motor mapping M. Therefore, it cannot
rehearse an action sequence to evaluate its consequences without actually carrying

out the action sequence.

Deﬁnition 3.3.6 (Type-3) The Type—3 mental architecture is a Type-2 mental ar-

chitecture, with an addition of an action releaser M:

MIPXAHP,

as shown in Fig. 3. 3, where P is the space of all possible predicted consequences, A,-

is the space of all possible attention selection for M.

The action releaser M is a special case of a more general motor mapping which also
generates representation for frequently practiced action sequences (e.g., using the
principal component analysis PCA) so that smooth action sequences can be generated.
The Type-3 architecture enables the agent to hold an action before it is released
to be executed by the environment. It makes it possible for the agent to generate
several actions consecutively but to hold each action without actual execution.
With a traditional MDP with hand-designed states, it is possible to compute
all the possible next states and perform planning. The Q-learning method uses the
estimated action value Q(s,a) of action a at state 3 to select the best action a“ =
maxaeA Q(s,a), from the set A of all the possible actions. This best next action
a‘ maximizes the expected rewards in the future. This kind of approach has two
fundamental problems. First, the agent has a rigid value system. No matter what
value model is used, ﬁnite horizon, time discount model with a small or a large time

discount factor 7, the agent cannot change the way the value is determined [49,91].

78

For example, if a time discount model is used, the agent is short-sighted. It always
prefers multiple small rewards in the near furture to a far away but important reward.
Second, the agent is not able to learn to “think” (predict) using the events that it
learned from experience. For example, fed well and sleep well can be a reasonable
goal for a human infant, but the same value system is not suited for a human adult.

The observation-drive MDP does not suffer from these limitations. However, as
long as the predicted action is released, the effect that it causes to the external world
will result. Such an effect cannot be undone. For example, suppose that the robot
predicts (primes) an action to drop a cup from a hand. As long as it executes this
action, the cup will drop to the concrete ﬂoor and will be broken. Can we design
an architecture that enables the robot to “think” and “plan” a signiﬁcant amount of

time ahead before it releases the action?

3.3.4 Type-4: Observation-driven SASE MDP

Deﬁned in Weng [107] a Self-Aware and Self-Effecting agent not only has external
sensors Se and external effectors E8 that sense and act upon the external environment
(including the robot body), but also has internal sensors S,- and internal effectors E,-
that sense and act upon the internal representation of the brain (not including its
body)

In neuroscience, there is no concept of internal sensors and effectors. The brain
does not need a receptor to sense the signals in the brain, since the brain signals
are already in the desirable form. The concept of internal sensors and effectors is

introduced to facilitate computational understanding of the SASE agent.

Deﬁnition 3.3.7 (Awareness) If an agent A senses the states (presence, absence,
diﬂ'erent forms) of object b through its sensors, we say that the agent A senses the
object b. If the agent is able to recall the association between the sensed various states
of b and their resulting eﬁects in a task domain D sensed by the agent, we say that

the agent is aware of object b in the task domain D.

79

Some explanation is in order here. By deﬁnition, an agent must use its sensors,
the entry point of its sensory architecture (the input of T), to sense an object. For
example, if the state of an object is fed into the architecture, e.g., through the middle
of the regressor R, the motor mapping M or the value system V, the agent does
not sense the object by the deﬁnition because the agent cannot use such information
properly as it does with its sensors. In the above deﬁnition for awareness, we consider
a task domain D because any awareness has a scope. A person who is aware of the
boiling temperature of water in a domain (e.g., in a normal environment) may not
necessarily be aware of the boiling temperature of water in another domain (e.g.,
lower in a low pressure environment).

In the deﬁnition, we require that awareness must associate various states of b
and the corresponding resulting effects. For example, if a human is not aware of the
correspondence between various states of gravity (presence, absence, strong gravity)
and the resulting effects (e.g., to a falling object), he is not aware of the object.

With the above deﬁnition, we are able to deal with the issue of self-awareness and

self-effecting.

Theorem 3.3.2 (Necessary conditions of awareness) Suppose an agent is
aware of its mental activities in a domain. Then the following points must be
true: (1) It senses such activities using its sensors. (2) It feeds the sensed signal
into its perceptual entry point just like that for external sensors. (3) It recalls the
association between the diﬁerent status of the activities and the resulting eﬁ’ects to

the environment.

Proof: Point (1) is true because, according to Deﬁnition 7 for the awareness of an
object, the agent must sense the object using its sensors. Point (2) is true because the
status of the object must be sensed and fed into the entry point for sensors for proper
perception and effect recall. Point (3) is true because the deﬁnition of awareness

requires the recall of such an association. Cl

80

Based on the previous theorem, let us examine the issue of self-awareness. If an
agent runs a Q—learning algorithm (or any algorithm for that matter) but it does not
sense the algorithm using its sensors which are linked to its entry point for sensors,
the agent is not aware of its own algorithm. For the same reason, humans do not
sense the way their primary cortex works and, therefore, normally they are not aware
of their own earlier visual processing. This early processing is subconscious, in the
sense that it does not require a conscious decision. However, the voluntary part of
the mental decision process does require a conscious, willful decision. Therefore, in
the architecture design, the parts that require voluntary decisions must be sensed by

the agent, and the sensed signals must enter through the entry point of the sensors.

Deﬁnition 3.3.8 (Type-4) The Type-4 mental architecture is a Type-3 mental ar-
chitecture, but additionally, the internal voluntary decision is sensed by the internal
sensors S,- and the sensed signals are fed into the entry point of sensors, i.e., the
entry point of the attention selector T. In order to recall the effects of the voluntary
actions, not only is the expected reward value estimated by the value system, but also
the primed context, which includes not only the primed action, but also the primed

sensation.

The architecture illustrated in Fig. 3.3 is a Type-4 architecture. Two voluntary
internal actions are modeled by En for attention selection, and by Ea for action
release. Both internal actions are sensed by the internal (virtual) sensors Sn and
8,2, respectively. The rehersed external action (not released) is sensed by the virtual
internal sensor 5.3. Note that when the action is released, it is sensed as external
action from E81 and Egg. The primed sensation (which predicts the sensation of SCI
and 3.22) is used by the value system in selecting the best action according to, e.g.,
the novelty (suprise) or the nature of the reward (e.g., sweet or bitter).

The regressor R maps each attended context I 6 .C to a set of multiple primed

contexts, from which the value system selects a single primed context p E P. In other

81

words, the composite function of R followed by V gives a mapping: V o R : .C H P.
With a SASE agent, both external context (sensed by Se) and internal context (sensed

by S.) are available in l.

With a consecutive time series t = 1, 2, ..., k, the composite function VOR performs

a series of regressions, represented by a regression sequence:

3 = ((l1,p1), (12,122), (11.1%))

where each regression pair (1,, p,) is an input-output pair of the composite function
V o R, p,- = V o R(l,-), i = 1,2, ..., k. The link between two consecutive regression
pairs can be realized by two paths, the external path and the internal path, denoted
by e and i respectively. If a part of the primed context p,- is executed by an external
effector and the effect of the action on the external world is sensed by an external
sensor Se, the path corresponds to an external path. For example, if a mobile robot
moves ahead, the objects look closer after the motion. If a part of p,- is sensed
by one or multiple internal sensors, the path corresponds to an internal path. For
example, if the primed action is to jump, the following input to the regressor R will
contain a representation of this action through a (virtual) Si, no matter if the action
is executed or not. Symbolically, the reasoning process can be represented by the

following composite reasoning sequence:

61 62 ek-l 3k
S = “11191): . a(l21p2)i ' 1'": . 1(lkapkli . (3'5)
21 z2 Zk—i 2k

where

represents the external and internal paths. Whether the result of external and internal

paths are taken into account by the regressor in the reasoning process depends on the

82

attention selection in T.

Deﬁnition 3.3.9 (External and external reasoning process) There are three
types of reasoning processes, external, internal, and mixed, corresponding to the atten-

tion in which the attention module T attends to external, internal or both, respectively.

We can readily arrive at the following conclusions:

0 Type—1 through Type—3 architectures allow the agent to perform external rea-

soning processes (i.e., practice grounded in the physical world).

s Type—1 through Type-3 architectures do not allow the agent to perform internal
reasoning as deﬁned. For example, Type-3 is not aware of the primed context

because it does not have internal sensors.

0 A Type-4 architecture is able to execute external, internal, and mixed reasoning

processes.

One might conclude at this point that the internal process looks like “thinking.”
However, if the internal representation is hand-designed for a given speciﬁc task, the

internal process is still far from what is called thinking by higher animals and humans.

3.3.5 Type-5: Developmental observation-driven SASE
MDP

In the architecture discussed above, no requirements have been placed in terms of
how the architecture is created. Speciﬁcally, how should the mappings T, R (and L),
M, and V be created?

Deﬁnition 3.3.10 (DOSASE MDP) The developmental observation-drive SASE
MDP (DOSASE MDP) has an architecture Type-4 or higher, that satisﬁes the fol-

lowing requirements:

83

1. During the programming time, the tasks that the agent will learn are unknown

to the programmer.

2. The agent A(t) starts to run at t = 0 under the guidance of its developmental

program Pd. After the birth, the brain of the agent is not accessible to humans.

3. Human teachers can only affect the agent A(t) as a part of its environment
through its sensors and effectors recursively: At any time t = 0,t = 1, ..., its
observation vector at time t is the last context l(t). The output from A(t) at
time t is its selected primed context p(t) E P. A(t — 1) is updated to A(t),
including T, R (and L), M, and V.

In contrast with the traditional MDP, the DOSASE MDP (A(t), Pd) is develop-
mental in the sense that the developing program Pd does not require a given estimate
of the a priori probability distribution P(l) for all I E (I, nor even a given set of
states. Consequently, Pd does not require a given estimate for the state observation
probability P(lt | l¢_.1) nor that for the state observation probability P(xt | It).

When the number of states is very large, it is sufficient practically to keep track
of the states that have a high probability, instead of estimating probability of all the

states, which is too computationally expensive to reach real-time speed.

3.3.6 Type-6: Multi-level DOSASE MDP

The Type-5 architecture has only one sensorimotor level, although each mapping T, R,
and M have multiple levels in their own internal structure. We call it a sensorimotor
level because the pathway from T through R up to M corresponds to a pathway from
sensory input to motor output. The primed context of such a level can be fed into

another sensorimotor level for the following reasons:

1. Abstraction. While a low level is tightly linked to ﬁne time steps, the higher

levels become more “abstract,” in the sense that the higher level clusters of

84

context states are coarser in temporal granularity and grouped more according
to actively attended events. That is self-directed abstraction according to active
attention. For example, when the agent visually tracks a walking person, the

attended part of the person’s images are grouped.

. Self-generated context: Allow voluntarily generated motor actions to serve as
context input to the higher level. Thus the agent is able to “talk to itself.”
This capability also helps abstraction because the variation of self-generated
motor actions has a smaller within-class variation and a lower dimension that

the typical sensory inputs from the external environments.

. Enabling a higher degree of sensory integration: It is not practical to integrate
all the receptors in a human body by a single attention selection module T, oth-
erwise, the attention is too complex. Sensors that are related more closely (e.g.,
touch sensors of the same ﬁnger) are integrated ﬁrst by a low level sensorimotor
system and the output of these sensorimotor systems are fed into the next level

which integrates a larger scale of sensors (e.g., touch sensors of different ﬁngers).

With the above considerations, we introduce the Type-6 architecture.

Deﬁnition 3.3.11 The Type-6 mental architecture is a Type-5 mental architecture

with several levels of sensorimotor systems. Each Type-5 architecture is considered a

sensorimotor system. The primed contexts from the lower-level sensorimotor systems

are fed into the sensory input of the higher-level sensorimotor systems.

Fig. 3.4 illustrates the Type—6 architecture. The input to the attention selector

T9) at level 2 includes the primed context p(t) = (xp(t), ap(t)) from level 1, where

xp(t) and ap(t) are primed sensation and primed action, respectively. One or multiple

sensorimotor systems can feed their primed contexts into a single sensorimotor system

at the next higher level for sensory integration.

It is possible to prove that there is at least one task set and the associate teaching

environment where the Type—6 architecture is higher than the Type—5 architecture.

85

Last Primed
context context

Sol
Sensors:
3:2

Ex: I
Effectors:
5:2

 

Figure 3.4: The Type-6 architecture.

We have systematically introduced six types of architectures. Although the order
in which new capabilities are added is more a design choice than a necessity, the
order used here is motivated by a relatively large payoff in capability with a minimal

addition of the architecture complexity.

3.4 Dav Architecture

The Dav architecture design was guided by the theory of architecture discussed in
the previous section. It’s architecture belongs to Type-6. The basic modules T, R,
and M in the last section have more functions with the Dav architecture. In the
Dav architecture, the attention selection module T discussed above corresponds to
the sensory mapping, the regressor R corresponds to the cognitive mapping and the
action releaser M corresponds to the motor mapping. In the sensorimotor system
of the Dav architecture, two cognitive mappings are used, implemented by the same
engine IHDR. The reality mapping R predicts contexts of near (immediate) future,

while the priming mapping F predicts contexts of far future.

86

As shown in Fig. 3.5, the Dav architecture consists of parallel layers of the senso-
rimotor levels. The lower the level, the less spatial and temporal extent it deals with.
The fewer receptors and effectors a level integrates, the less spatial extent it deals
with. Also the lower the level, typically the shorter the temporal context it copes
with. Thus, the lower the level, the simpler its perceptual and behavioral capabilities

are.

3.4.1 Open view

Fig. 3.6 gives an open view of a sensorimotor system. A sensorimotor system can be
regarded as an observation-driven MDP. However, as we discussed in the last section,
it is not the same as the traditional MDP with hand-designed states.

Each circle in Fig. 3.6 indicates a local context state, corresponding to a single
time frame. At each time frame, the system grabs the current sensory input x, which
is used for updating the last context L. Two mapping engines: R and F are used to
map from the last context L to a set of primed contexts P. In Fig. 3.6, a priming
transition from one context to the next is indicated by a dashed line.

Fig. 3.7 illustrates how the temporal trajectory is generated from the visited con-
text states. What happens in the physical world (including the action of the agent)
and the internal world (including the internal action of the agent) jointly determine
the last context, which in turn determines which context state is visited. The next
context visited provides information for updating the primed contexts from the cur-
rent context state. In Fig. 3.7, such updating is simply denoted as an addition to the

set of primed contexts P.

3.4.2 Recursive view

The above open model of a sensorimotor system is straight forward to visualize the
concept of context state, but it is not suited for showing how the architecture realizes

the system. The recursive model of a sensorimotor system, shown in Fig. 3.8, can

87

better illustrate the computational architecture of a sensorimotor system. As shown
in the ﬁgure, both the reality mapping R and priming mapping F are accomplished
by a mapping engine. Each context state is represented by a vector in the input space
of the mapping engine.

With the recursive model, we can see that the effect of a value system V can be
modeled as a mapping that takes multiple primed contexts from the reality mapping
R or the priming mapping F, and selects a preferred context. Its action part is sent
to all the corresponding effectors. In the past, supervised learning is separated from
reinforcement learning. Reinforcement learning only works on an externally given
scalar reward value. We treat a value system in a more general way. Its innate part
is driven by physical pleasure (e.g., a sweet taste) or pain (e.g., an electric shock). Its '
learned part is responsible for learning values of events from the external world (e.g.,
working hard is good). Thus, the value system uses not only a reward value, but the

primed context as well (including sensations and actions).

3.4.3 Architecture view

A more detailed block diagram of the architecture of a sensorimotor system is shown
in Fig. 3.9. Each internal and external action output feeds back, through a delay
unit, into the next sensory input. This is required by the SASE agent model: the
agent must sense and perceive what it does, internally and externally. The input to
a sensorimotor system, indicated by the two left-most arrows in Fig. 3.9, is its target
for perception and cognition.

A sensory mapping [127] is needed wherever an attention selection effector (local
analysis) or dimension reduction is needed. As shown in Fig. 3.9, a sensory mapping is
used to enable attention selection from the current sensory input vector and another
sensory mapping enables the attention selection of the current input, the previous
context, or both. As mentioned earlier, each sensorimotor system has two cognitive

mappings, the reality mapping R and the priming mapping F. A near future context

88

from R is useful for carrying out skilled procedures. A far future context from F is
needed to predict the future consequence (e.g., 10 frames or more into the future).

The source of input to the priming mapping has two alternatives: the same input
as the reality mapping (as shown in the ﬁgure) or the primed context output from the
reality mapping. The former allows more accurate control of the timing in the primed
context while the latter may enable more efﬁcient “abstraction” of the context in the
priming mapping, since the primed context from the reality mapping has already
taken the output action into account (e.g., the discriminant analysis by the IHDR
tree in section 4.4).

The priming mapping, implemented by a cognitive mapping engine, IDHR needs
a prototype updating queue whose function is to predict far future context using the
Q-learning algorithm [41] in a recursive way. The reality mapping does not need such
a queue because it only predicts the next near future.

The motor mapping of a sensorimotor system generates concise representation for

stereotyped actions, in addition to the function of action release.

3.5 Major components

The architecture in Fig. 3.9 divides the information processing into three major map—
pings: the sensory, cognitive and motor mappings. The sensory mapping takes sen-
sory inputs. It may be followed by another sensory mapping to increase the extent
of coverage of space and time. The sensory mappings are followed by a cognitive
mapping.

The basic difference between the sensory mapping and the cognitive mappings is
that the sensory mapping does not use the information of motor output for its feature
space development, but the cognitive mapping does.

The motor mapping has two functions: (1) generating a higher motor representa-

tion in space and time for motor output space and ( 2) receiving signals from cognitive

89

mappings in terms of a higher motor representation and recovering the detailed con-
trol signal for every motor that it handles.

In this section, we ﬁrst discuss these three major mappings, next introduce an
innate value system, and ﬁnally an algorithm is given to summarize the sensorimotor

architecture.

3.5.1 Sensory mapping

The sensory mapping provides representation for all possible receptive ﬁelds in space
and time and allows attention selection. In the past the role of representation has
been well recognized but the purpose of attention selection has not been adequately
studied.

Fig. 3.10 shows the hierarchical spatiotemporal organization of sensory mapping.

The major functions of the sensory mapping include:

1. Grouping different sources of sensory input (e.g., different pixels in an image or
visual and auditory inputs) in space and time. The term “sensory” should be

understood as including both raw sensory inputs and processed internal signals.

2. For the grouped set of input, maintaining a complete representation for a hier-
archy of all possible (sampled) receptive ﬁelds, for the purpose of attention. By
sampled receptive ﬁelds, we mean a large but ﬁnite number of receptive ﬁelds

at different positions and sizes in space and time.

3. Automatically deriving features in the combined space as new representation.
It cannot assume a predetermined representation and therefore, it does not use

a symbolic representation.

4. Reducing the dimensionality of the new representation, while minimizing the

loss of necessary information.

90

5. Execution of attention selection as an attention effector, controlled by a signal

from other parts of the brain (top down control).

In [127], a simpliﬁed sensory mapping, Staggered Hierarchical Mapping (SHM), is
implemented. SHM uses the Incremental Principal Component Analysis (CCIPCA)
method to automatically develop orientation sensitive and other needed ﬁlters. In
addition, the internal representation generated by SHM for receptive ﬁelds at different
locations and sizes is nearly complete in the sense that it does not lose important

information.

3.5.2 Complementary Candid Incremental Principal Compo-
nent Analysis

Complementary Candid Incremental Principal Component Analysis (CCIPCA) is a
stochastic approximation algorithm to estimate eigenvalue and eigenvector iteratively.
It is the major tool to implement the sensory mapping.

PCA is a well-known technique in data compression and feature extraction. It
gives a linear transform that converts a set of d—dimensional data into a lower-
dimensional space by minimizing the error in the least mean square (LMS) sense.
Therefore, PCA is used to generate new representations from sensory inputs.

A well-known approach to PCA is to solve an eigensystem problem. Given A as the
sample covariance matrix of the data set, one can ﬁnd its eigenvectors and eigenvalues,
sort the eigenvalues in descending order, and construct a k x d matrix T with the
rows being the eigenvectors corresponding to the largest k eigenvalues. The matrix T
is the sought transform [35]. This is the basic idea behind most techniques for PCA,
such as the QR method [36]. However, since this approach requires an estimate of
the covariance matrix, the data set usually needs to be completely available at the
computation time. This is not appropriate for a developmental robot because of two

reasons. First, a developmental robot is an online agent, which senses and responds

91

to the environment continuously. It should not wait until all data is accumulated
before doing the processing. Second, when the dimension of the data is high, both the
computation and storage complexity grow dramatically. For example, in the eigenface
method [50] [94], one of the promising face recognition methods that involves PCA,
a moderate grey level image is of 88 rows and 64 columns, which results in a 5632-
dimensional vector. Since the sample covariance matrix of a data set of d-dimensional
random vectors contains d (d + 1)/ 2 independent numbers, this amounts to 15,862,528
numbers!

[119] proposes an algorithm, called complementary candid incremental PCA
(CCIPCA), to incrementally compute the principal components of sequentially ar-
rived samples without estimating the covariance matrix. Suppose that sample vectors
are acquired sequentially, u(0),u(1), . . .. Each u(n), n = 0,1,..., is a d—dimensional
vector and d can be as large as 5000 or even beyond. The d x d covariance matrix
is neither known nor affordable to be estimated and stored as an intermediate result.

Without loss of generality, [129] assumes that u(n) has a zero mean. The algorithm

of CCIPCA is as follows,

 

 

.nzn‘lv.n_ lunurnﬂﬂi
v.( > n .< 1>+n .< > .< lune—1111’ (3.6)
zine) = ...(n) —u.T(n> “(”l “U” (3.7)

llvilnlll llvi(n)l|’
where, u1(n) = u(n), and v,(n) is the i-th dominant eigenvectors of the samples’
covariance matrix estimated after collecting the n-th sample.

The idea underlying CCIPCA comes from the concept of statistical efficiency [46].
Basically, this deﬁnition says that the most efﬁcient estimate is one that has the
least variance from the real parameter and the variance is bounded by the Cramér-
Rao inequality. For example, the sample mean, 5: = ﬁzz; x,, is the most efficient
estimate for the mean of a Gaussian distribution with known standard deviation

0 [33].

92

For CCIPCA, by substituting itself recursively, Eq (3.6) can be written as,

-n =_1_ n u- 4.1;. THU—1)
v’t( ) it; i(]) i(])|[vi(j_1)ll'

Thus, v,(n) can be viewed as the mean of “samples” w,(j),

1).-(n) = 5: we), (3.8)

where w,(j) = u,( ])u,T( j )ﬂ Although w,( j) is not necessarily drawn from a
Gaussian distribution independently, the estimate v,(n) seems to have a close relation
with the estimate of a sample mean which is the most efﬁcient estimate. [129] have
proved that v,(n) —-> :tA,e,- with probability one when n —> 00, where A,- is the i-

th largest eigenvalue of the covariance matrix of {u(n)} and e,- is the corresponding

eigenvector.

The above analysis prompts a further improvement to CCIPCA. As shown in
Eq. (3.8), all the “samples” ({w, (j )}) are weighted uniformly when v,-(n) is estimated.
Since v,- (j) is far away from its real value at early estimation stage, the so—generated
w,(j) can be regarded as a “sample” with large “noise” when j is small. To help the
convergence of the estimation, it is preferable to give smaller weight on these early
“samples”. A way to implement this idea is to use an amnesic average by changing

Eq. (3.6) into,

v.” :ILZE_110,,_ Lﬂunur ”tin-1)
1( ) 71. t( 1)+ TL 1( l i (n)llvi(n—1)H,

 

(3.9)

where l is called amnesic parameter, typically, ranging from 2 to 4. With l, Eq. (3.9)
automatically adjusts the weights of old estimates and new samples.

In summary, the CCIPCA algorithm is as shown in Fig. 3.11. CCIPCA has been
shown empirically to be a very efﬁcient estimation algorithm compared with SGA

and GHA for high-dimensional data [128].

93

3.5.3 The cognitive mapping: IHDR tree

The R and F mappings in Fig. 3.9 are implemented by two Incremental Hierarchical
Discriminant Regression (IHDR) trees [44]. IHDR is not new for this thesis research.
We ﬁrst survey other related regression methods, then outline the improvements over
the old versions [111,113]. An experimental evaluation is provided at the end.

Learning a function f : X H y in real time for high-dimensional data remains a
nontrivial problem, particularly when the complexity of the function to be estimated
are unknown. By surveying the literature of regression in statistical learning, one can
identify two classes of methods: global model ﬁtting and local model ﬁtting methods.

Global model ﬁtting methods, including Multiple Layer Perceptron (MLP) [l2],
Radial Basis Function (RBF) [22], Lasso regression shrinkage [93] and Support Vector
Machine based Kernel Regression methods (SVMKR) [88], are characterized by ad-
justing a predeﬁned model using a pre—collected training data set via optimizing global
criteria. They are not suited for the real-time learning in high-dimensional spaces de-
spite their theoretic background. First, they require a priori task-speciﬁed knowledge
to select right topological structure and parameters. Their convergent properties are
sensitive to the initialization biases. Second, those methods are designed primarily for
batch learning and are not easy to adapt for incrementally arrived data. Third, their
ﬁxed topological structures eliminate the possibility to learn increasingly complicated
scenes.

Local model ﬁtting methods (e.g., IHDR) use temporal-spatially localized (thus
computationally efﬁcient) models, in the meanwhile, grow the complexity automati-
cally (i.e., the number and organization of local models) to account for the nonlinearity
and the complexity of the problem. They are more suited for incremental real-time
learning, especially for the situation where there is limited knowledge about the scene
and the robot, itself, needs to develop the representation of the scene in a generative
and data driven fashion.

IHDR generates local models to sample the high-dimensional space X x y sparsely

94

based on the presence of data points in a vector quantization (VQ) [51] manner. IHDR
enjoys two nice properties: First, IHDR derives automatically discriminating feature
subspaces in a coarse-to—ﬁne manner from input space X. Discriminating features
are automatically derived discriminating features at the internal nodes of the tree.
The features are most discriminative in the sense that they maximize the trace (or
the determinant) of between-class scatter. In this way input components that are
irrelevant to the mapping’s output are disregarded to achieve better discrimination
and generalization. Second, IHDR organizes its local models in a hierarchical way,
as shown in Fig. 3.14. IHDR’s tree structure recursively excludes many far-away
local models from consideration (e.g., an input face does not search among nonfaces),
thus, the time to retrieve and update the tree for each newly arrived data point x
is 0(log(n)), where n is the size of the tree or the number of local models. This
extremely low time complexity is essential for real-time learning with a very large

memory.

Two kinds of nodes exist in IHDR: internal nodes (e.g., the root node) and leaf
nodes. Each internal node has q children, and each child is associated with a discrim-

inating function:

1.1x) = 3a — c.)TW.-1(x — c.) + gluon/.1), (3.10)

where W,- and c,- denote the distance metric matrix and the x-center of the child
node i, respectively, for i = 1,2, ...,q. Meanwhile, a leaf node, say c, only keeps a
set of prototypes PC = {(x1,y1), (x2,y2), ..., (xnc,y,,c)}. The decision boundaries for
internal nodes are ﬁxed and have quadratic form. A leaf node may develop into an
internal node by spawning (see Step 8 of Procedure 1, Appendix B) once enough

prototypes are received.

The needs of learning the matrices W,, i = 1, 2, ..., q in (3.10) and inverting them

make it impossible to deﬁne W,- (for i = 1, 2, ..., q) directly on the high dimensional

95

space X 2. Given the empirical observation that the true intrinsic dimensionality of
high dimensional data is often very low [77]. As shown in Figs. 3.12 and 3.13, it
is possible to develop most discriminating feature (MDF) subspace D to avoid the
degeneracy and redundancy caused by irrelevant dimensions of the input data (“curse
of dimensionality” [10]).

As shown in Fig. 3.14, IHDR partitions the input space X into regions: R1, ...,
RN and each of them corresponding to a leaf node. In each of those region, a linear
regression model is used to represent the data samples received. Let the x be the
input querying point. Approximating function f is then accomplished by computing
the local linear regression models and by forming a ﬁnal prediction from the weighted

average of the individual predictions

N
y = 2: IR): (x)yk)
k=l

where 51,, denotes the predicted output for the linear regression model covering the
region R), (see Eqs (A5) and (A6) in Appendix B). The weight IRk(x) represents
an indicator function, which denotes the likelihood of the input x belonging to the

region R), and is deﬁned as

I. if X E R];
1R1; (X) =

0 otherwise.
Due to IHDR’s hierarchical organization of those regions, the computational com-
plexity IRk(x) is 0(log N) where N is the number of leaf nodes. Therefore the speed
of computing the weight is fast even the tree has grown to be a huge one.

In summary, the following improvements were suggested by this thesis:

1. Amnesic average is proposed to relieve the problem of initializing bias. At the

 

2The complexity of inverting an d x d matrix is 0(d3). When (1 is big (e.g., 5000), it is infeasible
to compute online in real-time.

96

beginning of the incremental procedure, because the number of samples is small,
the clusters are usually ill-generated. So are the boundaries of the cluster, which
will inﬂuence later partitions of the tree. We use the amnesic average technique

to gradually get rid of the effects of earlier data.

Suppose {x,-,i = 1, 2, . . . , n — 1} are the data entering the system sequentially.
Written in an incremental style, their sample mean 27“” and scatter matrix Flt"),

after receiving the nth sample, are given by,

- 1 — 1
53m) ___ n #(Tllim—i) + + #(n)
n n

 

xn, (3.11)

 

r55) = n ‘ I; M”) nan-1) + 117-@025; — x)(x,, — x)". (3.12)

u(n) is the amnesic parameter which controls the update rate, depending on n

in the following way:

0 ifn_<_n1
u(n) = b(n — n1)/(n2 — m) if n1 < n S n2, (3-13)
b+(n—n2)/d lf'ng <77.

where b, n1, n2 and d are parameters. Let us remark that u(n) is a piece-wisely
linear non-decreasing function. u(n) —> 1/d when 71 becomes big. This means
that new samples still have effects and old samples are “forgotten” gradually

when n ——> oo.

. We use the augmented space Z = X x y for clustering, while [111,113] use 32 for
labeling. This is desirable since it encourages more elaborated representation
using the information of input space. For example, in supervised learning, there
are samples whose x might be dramatically different but their labels y are the
same. In this case, using 31 information for clustering will cause those samples

indistinguishable and, thus, the derived discriminating subspace ’D is singular

97

and the yielding tree is unbalanced. Furthermore, adding X information may
increase the locality of the sgenerated clusters, which facilitates linear regression

model on a local region.

We use the following distance metric for clustering:

dizwzllx—clll +wylly—yill, (314)
0;; 0y

where w: and wy are two positive weights that sum to 1: w; + wy = 1; (I,B
and 0,, denote incrementally the estimated average lengths of x and y vectors,
respectively; and c,- and y, denote, respectively, the x-center and y—center of
the ith cluster of the node c. It is worth noting that the parameters on.n and
0,, should take account of the dimensionalities of X and y. In most of IHDR’s
application, dim(X) >> dim(y) and, hence, the x—part dominates the distance

in Eq. (3.14) if we do not weight them properly.

3. In [111,113], the nearest-neighbor model3 is used in the leaf node. Although the
nearest-neighbor’s error rate is bounded above by twice the Bayes rate (which
is the Optimal error rate for a classiﬁcation problem) [31], Atkeson et al. [9]

point out the nearest-neighbor models might overﬁt linear areas, as shown in

Fig. 3.15 (a).

We propose to use the linear regression in the leaf node instead of the nearest-
neighbor model. Let x denote the query point. Assume IHDR ﬁnds the leaf

node c whose prototype set PC is:

736 ={(X1,y1),(x2,y2),-..,(ch,ync)}

where the output labels are scalar“.

 

3The nearest-neighbor regression simply chooses the closest point and uses its output value.
4For notation simplicity, we only consider the scalar output.

98

As in locally weighted learning (LWR) [9], d(., .) denotes the Euclidean distance
and K (d) denotes a distance kernel function (e.g., K (d) = exp(—%)). We

compute the following diagnal weight matrix:
W = diag[K(d(x1, X), K(d(x2)x)3“°,K(d(x”1clx)l

T
with each one corresponding to a prototype in PC. Let Y = [y1 ync] and

T
X = [xf x56] . We formulate the problem as ﬁnding the parameter ,8

such that the regression output is
.7? = ﬁTX
and the following cost function is minimized

C = (Y — xs)TW(Y — 2(5).

The solution of this weighted least square problem [81, Page 386] is
a = (XTWX)-1XTWY. (3.15)

It is important to note that X TWX is a d X d matrix, where d denotes the
dimension of the vector x. If X is a high dimensional space, then inverting such
a matrix is impossible for a real-time application. However, we can conduct

such regression on the discriminating subspace D (see Appendix A and [123]).

In the following, we experimentally show the augmented IHDR’s feature extraction
and real-time learning capabilities on an artiﬁcial data set. As a ﬁrst test (2d-cross),

we ran IHDR on the noisy training data drawn from a two dimensional function

y = max{exp(—10xf),exp(—50x§), 1.25 exp(—5(xf + x3)} + N(0, 0.12),

99

as shown in Fig. 3.16 (a). This function is a combination of nonlinear and linear
areas and an interesting test of learning and generalization capabilities of a learn-
ing algorithm [99]: learning system with simple model ﬁnd it hard to capture the
nonlinearity well, while more complex models easily overﬁt the linear areas. A sec-
ond test (100d-cross) added 98 irrelevant noise features to the input, each having
a density N (0, 0.0252). We thus obtain a 100—dimension input space. A third test
(200d-cross) added another 100 irrelevant noise features with the density N (0, 0.05)
to the input space of the second test. The learning curves with those data set are
illustrated in Fig. 3.17 (a). In all three cases, the normalized mean square errors
(nMSE) are reported on an independent test set (1681 points on a 41 x 41 grid in
the unit-square in the input space). As the number of training number increasing, all
nMSEs converged to the nice function approximation results, namely, nM S E < 0.03
after 100,000 training data points. Fig. 3.16 (c) shows the learned surface of the third
test after 100,000 training samples presented. For comparison, at the same condition
of (c), Fig. 3.16 (b) shows the learned surface using the nearest-neighbor model in leaf
node. Fig. 3.17 (b) illustrates the execution time of both training and test processes
with respect the number of training samples. It is interesting to note that the execu-
tion time increases linearly w.r.t. the number of training samples. The reason is that
IHDR’s updating and retrieval procedures have the logarithmic complexity and, thus,
the average time of adding a training sample and retrieving a testing sample does not
change much even though the size of tree has grow tremendously. As considering
the third case (200d-cross), execution time on a 200-dimension data set takes only
about 500 seconds for 100,000 sample, in other words, the average time for a sample
is about 5 milliseconds. Therefore, IHDR is fast, which is extremely important for

later real-time robot collision avoidance experiment.

To sum up, the power of feature extraction is due to IHDR’s deriving discrimi-
nating features automatically. The real-time learning performance is achieved by the

logarithmic complexity of IHDR.

100

3.5.4 Motor mapping

A major goal of motor mapping is to generate a concise representation for stereo-
typical motor trajectories so that skilled actions that involve many motors can be
represented by high-level motor primitives in a lower-dimensional space. As shown in
Fig. 3.18, the architecture of the motor mapping is very similar to the spatiotemporal
sensory mapping but it works backwards. Motor mapping receives lower-dimensional
“feature” space signals (motor primitives) to synthesize higher-dimensional raw motor

signals. Motor mapping has many issues similar to those of sensory mapping.

Instead of having attention selection in sensory mapping, motor mapping has a
gating system. The gating system plays two roles. The ﬁrst is the role of gating
by which the motor mapping evaluates whether the intended action has sufﬁcient
action thrust to pass the gate threshold. The second is the role of subsumption [18]

in subsuming the lower—level action by the integrated action, as shown in Fig. 3.19.

The primed action from the cognitive mapping includes the desired control signal
for each effector as well as action thrust. The action thrust indicates how consistently
the action is issued. The higher the action thrust, the more consistent the action
generator is. Therefore, enough thrust must be generated to pass the gate before an

action can be issued to the effector.

Subsumption is needed where inconsistent actions for a single effector are issued
from multiple sources, and each source has a different priority. One typical use is that
the action derived from a higher level (more sensory integration) has a higher priority
over the action derived from a lower level (less sensory integration). The subsumption
belongs to innate internal behaviors. When the robot becomes more mature, the
selection of behaviors from low levels can be overidden by learned (voluntary) internal

behaviors.

If a single motor is considered, motor mapping includes only a gating system for
each of the single motor as well as the subsumption mechanism for integration from

other sensorimotor systems as shown in Fig. 3.19. Through developmental experience,

101

the motors that are highly correlated enable the growth of a new part of motor
mapping, denoted as an attached (top right) block to the basic motor mapping in
Fig. 3.9. The new part of the motor mapping plays the corresponding role of the gating
system, but it is for correlated multi-motor actions. Further, it not only performs the
gating function, but also the reconstruction of higher-dimensional raw motor signals
from a lower-dimensional “feature” representation (higher motor primitives).

In [131], an action gating system is implemented on the SAIL robot to decide

whether an action is actually triggered.

3.5.5 Innate value system

Given the last context I (t), the IHDR tree ﬁnds the best matched context P’ which
is associated with a set of primed contexts {p1, ..., pk}. How does the controller select
the best primed context? A Q value with each primed context is needed. This is
represented by a list Q = {q1, q2, ..., qk}, in parallel with the primed contexts. Thus,

each primed context consists of three components, (xp, (t), ap,(t), q,-), i = 1...k.

Deﬁnition 3.5.1 The innate value system chooses the action that has the best value
q.

The value system receives a list of primed contexts at each time instant t and
occasional environment reward signal r(t), which is from the biased sensors of the
robot (e.g., the aversive sensors such as a “bad button” or appetitive sensors such as
a “good button”). Each element in the Q vector starts from an initial value, e.g., a zero
value. The updating rule that we summarize here is adapted from Q-learning [103],

tested in [41]:

q(l', n) = n _1;#(n)q(l’,n - 1) +

 

and

V(p') = max q,,

195k

102

where q(l’, n) denotes the matched element in Q and has been updated n times, and l’
is the best matched prototype for the current last context l(t). 7 is a positive number
between 0 and 1, and is used for discounting in time. p’ is the primed context found
in the next time frame. To keep the temporal order of the latest retrieved primed
contexts for real—time local propagation, a primed context update queue (see Fig. 3.9)
is needed. The future primed context prototype (including reward) propagates back to
the previous prototypes recursively for each time frame in the context update queue.

Other prototypes (not in the queue) are not affected and thus are not updated.

3.5.6 Sensorimotor algorithm

The following is a summary of a sensorimotor system with an innate value system:

1. Initialize the mental cycle count to 0.

2. Grab the current input from the sensory mapping to form the last context
l(t) = (x(tlialtll-

3. Go through the sensory mapping to reduce the dimensionality of l (t) while
applying the internal actions to the attention selection effector of the sensory
mapping.

4. Retrieve the reality (cognitive) mapping R using l(t) to ﬁnd the best matched

prototype l’ in the matched leaf node of the cognitive mapping. Its output part

is a list of primed near contexts (Ar(t), X,(t)).

5. The innate value system selects the near primed context p,(t) from the lists
(Ar(t)in(t))'

6. Retrieve the priming (cognitive) mapping F from l (t) to produce the primed
far contexts (Af(t), Xf(t)).

7. The innate value system selects the far primed context pf(t) from the lists
(Arlt),X1(t))-

103

10.

11.

12.

13.

14.

15.

. Feed the actions in p,(t) and p ;(t) through the motor mapping. The current

internal action determines whether primed near context or primed far context

receives attention in the motor mapping. The attended primed context is p(t).

Update both the reality and priming mappings using the primed context p(t). If
there is an imposed action from the environment (supervised learning), replace

the corresponding part of ap(t) in p(t) = (ap(t), xp(t)).

Spatially update the primed context lists (A,(t), Xr(t)) and (A {(t), X f(t)) using

the Incremental Vector Quantization technique.

Temporally update primed contexts in the prototype update queue for the prim-

ing mapping F.

Push the current primed context p(t) into the prototype update queue from the

tail and pop out the oldest primed context.
The motor mapping produces internal and external actions.

If the mental cycle time has not been used up, sleep for the remaining time so
that the exact time actually spent by this cycle is equal to the ﬁxed pre-speciﬁed

mental cycle.

Increase mental cycle count by 1 and go to Step 2.

The SAIL robot has successfully tested those major components of the proposed

architecture. For example, a Type-6 architecture (shown in Fig. 3.20) was used in

action-chain learning [131]. The result of the experiment is promising and we plan

to implement the whole architecture on the Dav robot. Other experiments on SAIL

include: (1) vision-guided navigation [115], (2) grounded speech learning [116], (3)

communicative learning [131], (4) novelty and reinforcement learning in the value

system [41], and (5) sensory mapping development [127].

104

3.6 Summary

This chapter has outlined the major differences between a non-AMD robot and an
AMD robot: An AMD robot’s developmental program is task-nonspeciﬁc. Further,
this chapter introduces eight challenging operational requirements of an AMD robot,
which pose a series of challenges for the design of the architecture. The series of
architecture types introduced here demonstrates several architecture limitations of the
current hand-designed MDP and a possible route toward higher mental architectures.
The proposed architectures leave much freedom for different implementations of its
major components, e.g., sensory, cognitive, and motor mappings. Although the Dav
developmental program that is currently being implemented could be explained in
further detail, such details are beyond the scope of this chapter. Dav’s developmental
architecture has yet to be implemented and tested, but its predecessor, SAIL, has
successfully tested the major components of the designed architecture. The presented

architecture will be tested in future studies and the performance will be reported.

105

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

     
 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Visual L» Sensori— _ Sensori— T
input ’ m0tor J ' motor _‘ [2:121:12]
> (W, I) \ (W, 1) Motor
mapping
Auditary,» Sensori— / Sensori— (w. I) To internal E
input motor _ motor I ) effectors > 3
V (W. l) \ (MD A E
l i
-> Sensori- / Sensori— Motor
—_> motor __ motor mapping [
E (w. I) > (w. I) (w, l)
_, Sensori— _ Motor
motor 7 mapping E
(W, 1) > (w, 1) #3
C
b -§
Sensori— ‘ Motor :9
motor 7 mapping
I ‘W' ” e (w. I) J
'Ii'guch Sensori» _ Motor Motor 1
put 1 7 . output
—> motor - mapping __ _ .. ,
(W. l) > (W. l)
E
Touch , Motor 0
input 2 Senson— > Motor output2 P L;
—> motor mapping .5
(WI 1) > (w. I) 1%
Pain & ~ Motor
sweet Senson— > Motor output 3
—> motor mapplng ‘_—>
(w, I) > (w. I) J

 

 

 

 

 

 

 

Figure 3.5: An outline view of the Dav architecture. The closed-loop control is realized by
the sensors that sense each effector, as well as a set of rich sensors that sense the internal
and external environments. w denotes the working memory and l is the long-term memory.

106

 

 

 

 

Val Action
ue
system a.
f \

\

 

Figure 3.6: An open view of a sensorimotor system. A circle designates a local context
state determined by the last context L. An inward solid arrow indicates a sensory input at
that time frame (including internal and external sensors) at each time frame. An outward
solid arrow indicates an output at that state. A dashed arrow indicates an experienced
transition between states. The lower part corresponds to the reality mapping R and the
upper part corresponds to the priming mapping F. The two generally do not have the same
number of states. The priming mapping is used to predict farther future contexts.

107

‘ States (prototypes)

oooooppppooooooo
ooooqqupooooooo
oopppeeeeppooooo
owwwmomm%WOooooo
oddeebobbeeppooo
aobbboooooeeeooo
@pooooooooempooo
obooooooooobbooo

; Time

 

 

Figure 3.7: A temporal trajectory view of a sensorimotor system. Each cycle represents a
context state at a particular time. A shaded cycle corresponds to a visited state. A dashed
arrow indicates a priming of a context that corresponds to the cycle that the arrow points
to. The priming arrows from a state are updated according to the real experience. An
actually visited state is not necessarily primed.

 

  
 
   

 

 

 

 

 

 

F F
ar
pﬁming contexts
- Preferred
mapping I .
$1231; Near Value action

. 1
input Reality contexts I ’ sys em a(t)
x(t) mapping V

 

 

 

R

Figure 3.8: A recursive view of a simpliﬁed sensorimotor system. Not all components are
shown.

108

”1111 p'
F -
Primed contexts

L Effector

   

 
 
   
   
 

Priming
mapping

 

Prototype updating
queue

T

  
   

Sensory S

 

 

 
 

 

 

 

 

 

 

 

 

 

 

 

 

   

 

 

 

input 8
Sensory Sensory Reality __ Motor “‘13:".
u mapping _ y mapping mapping mapping a
ll R M It
Delay < t ‘H I
ll \Release
Prev'ous control
' A ti
context+- c ons 5:33;:

Figure 3.9: A block diagram of the architecture of a sensorimotor system. The lower
cognitive mapping realizes the reality mapping and the upper one realizes the priming

mapping. GK: gate keeper, an internal effector to actively control the update of the last
context.

 

Time

)1
I \
I

I \ I I
I I

1‘ .. _ LI _1
it. It

..--

‘
A

 

 

7“”-3‘

nL | I
--.7c--.;<'--.7c‘_:.;c._->\

‘X. _ ><.~‘-‘

28.3— 8.
-—.k——.>c-_-k-_

 

 

 

 

 

 

 

 

 

 

 

 

 

I
I I II I
I | I I I
‘ — .- a- .—
X“ . -—>
1’ I H’ | Attention
'\ .. _‘ ’.' ‘_ effector '
x. X x —>
I ‘—r | r I I
l I I II I I
I I l I I I I
\ U \l \I \[
\\4‘\t’\lx\”\t
Control
input

Figure 3.10: The spatiotemporal organization of areas in the sensory mapping. The ellipses
represent receptive ﬁelds covering both space and time. Areas are organized in a hierarchical
way, and the output of an earlier (low order) neural area is used as input to the later (higher
order) neural area. Along the pathway of information processing, a neuron in a later area
has a larger receptive ﬁeld than one in an early area. In order to make the correct signal
available at the right time for the right input line, we use a time shifting technique. Acting
as a time delay unit, the shifter moves each signal to the next line at each time instant.

109

 

2. Fort: 1,2,...,k, do,
2.1. If i = n, initialize the ith eigenvector,
“Uzi" “ 1) = Uzi")
2.2. Ifi <= 71,
,(n—l)

Iu(n) = "ii-W" — 1) + LIT—(“4mg“)Thin—I‘mI

 

“MW = “4") ‘ "{("hlz‘kz in ll::(:)ll'
2.3. If 1' > n, initialize the ith eigenvector,
Uzi ) = 0-

 

 

 

Figure 3.11: CCIPCA algorithm [119]: compute ﬁrst k dominant eigenvectors,

v1(n),vg(n), . . . ,vk(n), directly from u(n), where n = 1,2, . . ..
I . a The plane for the II
[’16, d. u . -
I iscnmmatmg

,”6 29/ feature subspace

 

 

 

 

 

X space Y space

Figure 3.12: Y-clusters in space y and the corresponding X-clusters in space A”. Each
sample is indicated by a number which denotes the order of arrival. The ﬁrst and the
second order statistic are updated for each cluster. The ﬁrst statistic gives the position of
the cluster, while the second statistics gives the size, shape and orientation of each cluster.

110

 

 

 

Figure 3.13: The autonomously developed IHDR tree [114,123]. Each node corresponds to
a local model and covers a certain region of the input space. The higher level node covers a
larger region, and may partition into several smaller regions. Each node has its own Most
Discriminating Feature (MDF) subspace D which decides its child models’ activation level.

 

 

 

 

 

 

 

 

 

root
0
11 14 , . 4.4 43
. , ' , ‘42010203 04
4.11'
3.2 . .
21 ’22 31 33. 3,4 00(Le eecbe eeche eeche
' 1.1 1.4 2.1 2.4 3.1 3.44.1 4.4

 

 

Figure 3.14: Regions in the input space marked 1'. j where i is the index of the parent region
while j is the index of child region. The leaves of the tree represent the ﬁnest partition of
the space. The decision boundary of the region is of quadratic form.

111

 

 

 

 

 

 

 

 

 

 

 

5 6 {—0—-

4 I 5

3 4 ___.___i

2 3

1 _.J ° 2 4

o x 1 x
0 2 4 6 I 2 3 4 5 6 7

 

 

 

 

 

 

 

 

 

 

6 x 7 y

5 6

4 5

3 4

2 3

I 2

o - 1 x
y 2 4 6 I 2 3 4 5 6 7

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 3.15: (a) Fits using the nearest-neighbor rule. (b) Fits using locally weighted
regression. (c) Fits using kernel regression. (a) and (b) are adapted from Atkeson et al. [9].

112

The true function

 

(b)

The fitted function: nMSE=0.025

 

Figure 3.16: The result of IHDR on an artiﬁcial data set.

113

 

 

-o- 2dcross

 

 

 

 

 

 

 

 

 

 

 

 

  

#Training data points
(a)
600 a - . - -T a - . . . f - .
-o- 2dcross
‘4- 100dcross
a 5m b *" 200dcross 3‘
.0 I
§
a 400 . .
0 III
.§ ,
E 300 . g .
I
o .
3: ~’
§ -’
x 200
m
100 .

 

 

 

 

#Training data points
(b)

Figure 3.17: (a) The learning curves of the artiﬁcial data set. (b) The execution time

114

New
high-level

Signal flow direction for action space learning

 

Signal flow direction for skilled action synthesis

 

Area3

 

ll

 

 

 

 

0202 i

 

control

Areal

N

r Q
.~.'-.—’
I O

r

 

inputs

Action

 

 

release

I:

 

 

 

 

control

> Time

 

‘
‘\

 

‘
-ID’ \

-..>g
’—~

I l
5c:‘:}<’:-:>I
51.“,

~
_.‘

"I h

I I
.PC'
I

- 4.--
Pi

ﬁt-

K.

x.

...—
-—'
'-
~—‘

”-
w a
-'
an—‘
-'
'-
-'
"k”-
.' I
\
X.___,

A
‘7 ‘7
a"-

’x
7 f I .
:57?
22' I
‘--’

.I ‘l
I." I
:2)?

1;

m
g
z""‘>{"“>
’ J
I
I
I
(r I
)1.

 

 

 

 

Figure 3.18: The motor mapping as a reverse application of the sensory mapping, but with

signal reconstruction.

Action
--——->

Gating

I From high level

 

 

 

 

 

 

To effector
——>

 

 

 

I Gating control Subsumption

Figure 3.19: An illustration of a simple motor mapping for a single effector. The left is a
gating mechanism and the right is a subsumption mechanism.

115

Attention control

  
 
  
  
       

To effectors
selector
<—
Prototype updating queue
Second level LBE
Attention control
sensation
Action
Legend:
I Reﬂexive
primed
Prototype updating queue context
Backpropagated

First level LBE primed COMCXI

Figure 3.20: A hierarchical developmental architecture (Type-6) for SAIL procedure learn-
ing. Each Level Building Element (LBE) corresponds to a sensorimotor system.

116

Chapter 4

Perception-driven Control

Architecture

4. 1 Introduction

From Chapter 2, we can see that conventional robotics relies on human supplied
equations of dynamics and kinematics to plan trajectories. This approach has been
proven to be very useful when those equations and environment representation are
available. However, it is shown that this approach has difﬁculties when the system is
extremely complicated and insufﬁcient analytical knowledge is available. For example,
humanoid robots have so many degrees of freedom that even for simple actions there
are nearly inﬁnite set of joint position sequences that accomplish a single action [98].
Some researchers hence propose to learn actions through practice. Related work in
the area includes LWPR on a SARCOS robot by Schaal et a1. [97], the simulated
model of a 37 degrees of freedom by Mataric et al. [11], and the scheduling degrees

of freedom by Grupen et al. [26] motivated the early child motor development.

On the other hand, the traditional approaches usually require a human supplied
representation of the environment. Robots designed with those approaches have dif-

ﬁculty in adapting to unknown and changing environments, for example, in sensing

117

and controlling a robot’s arm to pick apples in an orchard. Further, the goal can
change at any time (e.g., a larger apple becomes the target before the current one is
picked). Some efforts have been made to program behavior-based humanoid robots
(e.g., Kismet [14], Cog [13,65]). Recently, progress has been made in realizing robots
that can develop their mental skills autonomously through what is called Autonomous
Mental Development (AMD), which we have discussed in Chapter 3.

In this chapter, we present a novel developmental perception-driven control ar-
chitecture (DPDCA), which is adapted from the overall architecture presented in
Chapter 3. The novelty of DPDCA includes: (1) No task (goal) is given at pro-
gramming time. (2) The task-speciﬁc representation is generated autonomously from
sensor-motor space, not in a 3—D world space. (3) The robot learns while performing.
(4) The robot is “alive” while humans interact with it.

In the following, we ﬁrst outline the proposed framework. we then illustrate the

power of DPDCA via Dav’s range-based indoor navigation experiments.

4.2 Control Architecture

Machine perception has been proven difficult. Programming perceptual capability
using human deﬁned features (e.g., edges, colors, tones) provides a way to produce
results in a controlled setting. A resulting fundamental limitation is that the robot
does not work well in unknown, partially unknown, or changing complex environ-
ments. In this section, a Developmental Perception Driven Control Architecture

(DPDCA) will be presented to take up these issues.

Definition 4.2.1 A Sensory vector is a vector that contains raw data from all sen-

sors. A sensory vector is a stochastic process indexed by discrete time t, denoted by
x(t).

The sensing modalities of a developmental robot typically include vision, audition, 3-

D range sensors, joint encoders, tactile sensors and strain gages, plus “brain” internal

118

sensors such as sensors that sense attention.

Definition 4.2.2 An action vector at time t is a vector denoted by a(t), which con-

sists of the control signals sent to all effectors at time t.

Deﬁnition 4.2.3 A context at time t is the sensory information that is needed by
the controller to generate a proper action at that time. With the current sensation
denoted by a sensory vector (including sensing of action) g(t) = (x(t),a(t)), the last
context is deﬁned as l(t) = (g(t),g(t—1),...,g(t— m +1)) at time t. Here In denotes

the number of time frames in the context.

The context so—deﬁned includes only the latest m sensory vectors. It is desirable to

use a small subset in order to reduce the complexity of learning.

Deﬁnition 4.2.4 The control law of the DPDCA is a decision process Rt: g’(t) =
R,(l(t)), illustrated in Figs. 4.1 and 4.2. The primed (recalled) context vector g’(t)
consist of two parts: the primed sensory vector xp(t) and the primed action vector

ap(t). The action vector ap(t) is the output signal vector sent to the controllers.

Rather than programmed manually, the decision process (a function) R, is developed
through time. The function Rt depends on the robot’s sensorimotor experience, and

can be regarded as an internal representation of the environment and its own body.

Deﬁnition 4.2.5 An internal representation Rt is the agent’s representation of its
own body as well as that of the external environment. It is grown from the agent ’3

continuous interaction with the external environment.

The representation R, is an accumulation of the experience up to time t. The R, is
generated based on what we called developmental algorithm, from its own experience
indicated by {g(r)| 7' = O, ..., t}. B, does not necessarily describe the world accurately,

however, it may be improved by new experiences.

119

Externally
occurred

l Internally
: primed
I
. , . .
Last sensation t aned sensation Sensation
E—V—J W.) ’ time axis

11(1) xph)

L.
(D

a l (t) ap (t)
Action: 1 I l I Action

. . . time axis
Last action ' Primed action

Sensation:

 

 

‘ Agent

 

 

 

Figure 4.1: The decision process of an agent. The agent tries to use the last context
I (t) = (x1(t), al(t)) to predict the future contexts: primed sensation xp(t) and primed action

(W)-

R, is not a monolithic representation typical in human designed representation.
Instead, it is high dimensional, numeric, and hierarchical. Each experience g(t) must
be used to update the representation R; which is then used for computing the primed
action ap(t) and the primed sensation xp(t). Afterward, g(t) is discarded. Therefore,
the entire experience series from time 0 to t — 1 is not available at time t. The robot
simply does not have space to store all of them. The representation R, is crucial
in order to keep all the necessary information about what the agent has learned so
far. The following mappings are needed to learn incrementally: (1) Representation

updating from current context I (t) input and the current representation Rt_1(t):
Rt : R(Rt—II l(t)), (4'1)
(2) Control law:
P = 3:00)), (4-2)

where the set P is a list of primed contexts {g£(t) | i = 1, ..., k} as an approximation
of distribution of primed context. Each element of P consists of three elements,

(xp,(t),ap,(t),q,-). The action api(t) of the primed context with highest q, value is

120

World Joint Sensors
Sensors

  

Xﬂ) a(t

g (t) Controller
g(t-1)

Effectors

g(t+In-1)
Context

Reward
Imposed
actions

Figure 4.2: The architecture of the robot controller. Note that symbols a(t), x(t) and l (t)
denote the action, sensation, and context at time t, respectively. The internal representation
R; is realized by IHDR. The world sensors include range ﬁnder and cameras. Block V
denotes the innate value system.

3

Grab current sensory frame ‘—

 

 

 

 

 

 

 

Derive action

 

 

 

 

 

 

 

—--> Update memory

I

 

 

 

Figure 4.3: Flow chart of learning procedure: the system learning while performing.

chosen as the next output control signal a(t), that is

a(t) = apc(t), c = arg “_nliinkqp (4.3)

.....

The “innate” value system V updates q values using an augmented Q-learning algo-
rithm [42], based on handle external rewards r(t).

A detailed illustration of DPDCA architecture is in Figure 4.2. R, is implemented
by an IHDR tree and 7?. is its updating function. IHDR learns incrementally by
building a hierarchy of automatically derived discriminating feature subspaces. Given

each input, it retrieves output in logarithmic time.

121

As shown in Figure 3.1 the robot trainers are an active part of the environment,
they do “robot sitting”, training the robot step—by-step, and imposing reward or
punishment according the robot’s current behavior.

In each learning episode, if the trainer imposes a desired action on an effector,
the robot will comply using its force sensor depending on whether the imposed action
is consistent with what the robot intend to do (see Fig. 4.3). If everything is ﬁne,
proceed with the action and the robot may receive a reward for it. On the other
hand, if the robot is not doing what the trainer wants, it will modify its learned
internal representation Rt so it follows the trainer’s instruction (the imposed action).
This is a uniﬁed learning procedure, that combines both supervised and reinforcement
learnings.

Major differences exist between the DPDCA and a traditional controller. The
DPDCA has following characteristics: It develops a controller (software) by learning
tasks on the ﬂy, from simple tasks (e.g., moving around according to what it “sees”) to
more and more complex tasks (e. g., learning sensing based object manipulation). The
same developmental program runs continuously for learning all the tasks. Different
environments and commands invoke different task execution. Other differences are
summarized in Table 4.1.

To conclude, let us remark that, comparing with control diagram in Fig. 2.15 as
well as Eq. (2.33), the proposed architecture does not need a desired trajectory to be
speciﬁed. In fact, the physical environment itself shapes the robot’s behavior based

on the current task context (if multiple tasks are considered).

4.3 Dav’s Range-based Navigation Experiment

4.3.1 Wall Following Behaviors

Our mobile humanoid robot, Dav, as shown in Fig. 2.1, was used to test our proposed

the DPDCA framework. In this experiment, we trained the robot to navigate along

122

Table 4.1: Differences between the DPDCA controller and the traditional controller.

 

 

 

[ DPDCA controller | Traditional controller ]
e No task (goal) given at e Tasks (goals) are given (e.g.,
programming time. trajectories).
e No need of 3-D Euclidean space 0 Inverse kinematic transform is
(end—effector frame) needed since planned trajectories
representation. are represented in the

. . end-effector frame.
0 When human Interacts wrth a

robot, the robot is “alive” in a 0 Dynamic model of the robotic
sense of learning to associate the body is needed for tracking the
current sensory context and the trajectories.

im osed action. , ,
p 0 When human overrides machine

0 Robots learn their dynamic is dead (in a manual operation
behavior online. mode).

0 “Sensing and learning”. 0 Human programmer supplies the
Generate task-speciﬁc representation.
representation autonomously
“on the ﬂy”.

 

 

 

 

the loop shown in Fig. 4.4, which is the second ﬂoor of Engineering Building at
Michigan State University. The floor plane covers an area of approximately 136 X 116

square meters.

We have three types corners in this test site: ‘L’, ‘T’, ’+’, and ‘2’. Some training
range images around those corners and intersections are shown in Fig. 4.5. Those
corner types are typical for indoor navigation. The training session was conducted
under human control via the graphical user interface (GUI), shown in Fig. 4.6. The
imposed action can be calculated from the mouse pointer’s position (refer Eq. (5.14)
and Fig. 5.7). At each sampling instant, an input range image, previous robot’s
velocities and an imposed action vector (the desired forward translational speed and
angular speed) were fed into the IHDR tree for training if an action was imposed by
the trainer. On the other hand, the laser input and the robot’s velocities were used

to retrieve the action sent to low-level controller if no action was imposed. During

123

 

 

 

 

[r

 

 

 

 

 

J‘LLEIEH J

 

 

 

 

 

 

 

 

 

l'—'l I
L__]

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 4.4: A map of the training site at the second ﬂoor of the Engineering Building at
Michigan State University. The test loop (desired trajectory) is indicated by the thick solid
lines. 5 different corners are denoted by bold-faced arabic numbers.

impose action cycle, it is worth noting that if the retrieved range image was similar
to the current range image (small Euclidean distance), the old stored action value

was replaced with the new imposed one.

First we took Dav for a training run. During the training session, we usually let
the robot run on its own (the current version of controller or the IHDR tree). If the
robot’s action was wrong, we overrode it either with a correct one or gave it a negative
feedback (reward); in the mean while, Dav associated the current laser map with the
imposed action or attributed the negative reward to the current action. This training
procedure resembles how babies learn to walk. When we satisfy with the performance
(e.g., small cross-validation error), Dav can be set free.

Dav learned very quickly during the straight section of the corridor, several inci-
dences of imposing action were sufficient for a reasonable performance. More training
samples were needed for Dav to turn correctly at the corners. After we trained the

robot with 2,215 samples online, Dav successfully ﬁnished the desired trajectory (the

124

test loop in Fig. 4.4). The distribution of the 2215 samples is the following: 215
samples is from the straight sections; the other 2000 samples are from corners, with

each 400 samples from a different corners.

Secondly, to illustrate the performance of the navigator, we partitioned the 2215
samples into six bins, with ﬁrst ﬁve bins corresponding to the ﬁve corners in F ig.4.4
respectively and the sixth bin corresponding to the samples from straight sections.
We performed ﬁve tests. In each of these tests, a bin from a corner was left out for
testing. In Fig. 4.7, we show the average error histogram of the five tests. The x-axis
denotes the Euclidean distance between the true and the predicted outputs, while the
y-axis denotes the number of samples. The range of output (v,w) is v 6 [0,1] and

w 6 [—7r/ 2, 7r/ 2]. The average normalized mean square error is 11.7%.

In the third experiment, we used the data set from the second ﬂoor as the training
samples. In the test, the data set with 4747 samples (manually labeled) from the third
ﬂoor was used. Fig. 4.8 shows the error histogram, in which the x-axis denotes the
Euclidean distance between the true and the predicted outputs. The normalized mean
square error is 8.6%. By inspecting the testing sample with large prediction error, we
found that in most cases, the labels of the testing samples were not consistent with

the training samples.

In order to provide a sense about how stable the navigating behavior is at a
different environment, we designed the fourth experiment. We trained the robot at
the second ﬂoor while testing it at the third ﬂoor. Fig. 4.9 shows how the robot
navigates at the third ﬂoor. The thick line in Fig. 4.9 denotes the trajectory of the
robot. One pass of autonomous navigation in the tested site took about 37 minutes.
The robot was observed to continuously run for 3.5 hours before the onboard batteries
were low. In several such tests conducted so far, the robot all performed well without
hitting the wall and objects on the ﬂoor. During these navigation tests, passers—by
were allowed to pass naturally. The mobile robot was not confused by these passers-

by largerly because the IDHR tree uses automatically derived discriminating features,

125

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 4.5: A subset of training samples. The red arrows denote the labeled heading for
the robot.

which covers the entire scene. In addition, we measured the minimal clearance from
the obstacles in those test sessions. In the worse case the clearance is of about 50cm,

which happens when the robot was in a narrow corridor.

4.4 Summary and Future Work

The framework presented here does not restrict the scene type and, thus, is potentially
applicable to other similar indoor environments. Instead of relying on particular scene
features, the proposed controller uses the raw range image and automatically derived
features (due to IHDR’s feature extracting capability). Since no manually extracted
features (e.g., lines and corners) are used in the system, the controller is less restrictive
and more robust to the noise of sensors and effectors than the manually designed ones.
Finally, let us remark that, although the range—based indoor navigation system is less

challenging than the vision-based one, the result of Dav’s indoor navigation is still

126

 

 

 

 

 

 

 

 

 

Figure 4.6: The graphical user interface to train the robot online. The left window shows
the current laser map in the robot’s local frame. The trainer may impose an action via
clicking the mouse button in the window. The red arcs illustrate the possible circular path
that the robot may have. The right window shows the primed range image (retrieved from
the IHDR tree).
very impressive. It shows the power of DPDCA framework.

The future work includes applying this framework to the robot’s eye-hand coordi-

nation task, which is more challenging because more degrees-of-freedom are used and

the high-dimensional vision is involved.

127

 

350 Y 1 r I

300 ~
250 l
200 .
150 «

 

 

Figure 4.7: The error histogram of the leave-one-out test.

 

 

 

 

3500 . . , -
3000 .
2500 .
2000 .
1500 -
1000 .
500 .
00 (IS—_—5 i 'Cs 5 215

Figure 4.8: The error histogram of the testing data set collected from the third floor.

128

 

 

 

 

l
.I

 

1TH

 

 

 

 

 

 

 

L_J

 

___..JL

Wﬂ

l' T' ILTI [:14 l

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 4.9: A map of the test site at the third ﬂoor of the Engineering Building at Michigan
State University. The thick solid lines denotes the trajectory along which the robot traveled.

129

Chapter 5

Global Obstacle Avoidance

through Real-time Learning

5. 1 Introduction

Reactive obstacle—avoidance behaviors and limited look-ahead searching offer a solu-
tion for a mobile robot to navigate in unknown and dynamic environments. Although
substantial amount of work on robot’s obstacle-avoidance behavior exists, many ap-
proaches are still restrictive. A common restriction is that the mapping between the
sensory context and desired behaviors is too complex to write a program to simulate

accurately.

It seems that real-time learning provides a promising alternative. Rather than
using hand-designed behaviors in an off-line fashion, the learning allows robots to ex-
plore the environments themselves while improving their performance through prac-
tice. During the learning, the robot associates the desired behavior with the current
sensory context and plans ahead within a limited step in real time. This architec-
ture design endows robots with great ﬂexibility to cope with unknown or changing

environments, which are not easily handled by programmed-in behaviors.

130

 

5.2 Related work

The problem of range—based obstacle avoidance has been studied by many researchers.
Various reported methods fall into two categories: path planning approaches and local
reactive approaches. Path planning approaches are conducted off—line in a known
environment. It is shown that path planning for a robot with bounded velocity and
arbitrary many obstacles, is intractable (NP-hard) [21]. In Hwang & Ahuja [25,45], an
artiﬁcial potential ﬁeld is used to ﬁnd a nearly optimal collision-free path in a space.
Such methods can handle long—term path planning but are computationally expensive
for real-time obstacle avoidance in dynamic environment (moving obstacles).

The local reactive approaches are efﬁcient in unknown or partially unknown dy-
namic environments since they reduce the problem’s complexity by computing short-
term actions based on current local sensing. In the vector ﬁeld histogram method
(VFH) [13], a 1-D polar histogram representation is constructed to model the en-
vironment. The curvature-velocity [84] and dynamic window methods (DW) [34]
formulate obstacle avoidance as a constrained optimization problem in a 2-D velocity
space. Obstacles and the robot’s dynamics are considered by restricting the search
space to a set of admissible velocities. Other local approaches include: Nearness dia-
gram (ND) [66] uses high-level description of the environment to generate the motion
signals via ﬁve laws. Ego-Kinematic space method [67] suggests a nonlinear trans-
form to relieve the problem of the non-holonomic constraint of two-wheel vehicles. In
order to deal with local minima, limited look-ahead search strategies are integrated.
For example, the enhanced VFH algorithm (VFH* [95]) uses A* [79] and the global
DW approach [16] uses NF 1 [57].

A major challenge of scene-based behavior generation is the complexity of the
scene. Human expert’s knowledge has been used to design rules that produce behav-
iors using pre—specified features. In Lee & Wu [58], a fuzzy logic control approaches
is used to incorporate human expert knowledge for realizing obstacle avoidance be-

haviors. One difﬁculty of a pure fuzzy approach is to obtain the fuzzy rules and

131

membership functions. The neuro—fuzzy approach [2,62,126] is introduced with the
purpose of generating the fuzzy rules and the membership functions automatically.
Their training processes are usually conducted using a human supplied training data
set (e.g., trajectories) in an off-line fashion. The dimensionality of the input variables

(features) is usually less than 10 to be manageable.

In contrast with the above efforts that concentrate on behavior generation without
requiring sophisticated perception, a series of research deals with perception-guided
behaviors. Studies for perception-guided behaviors have had a long history. Usu-
ally human programmers deﬁne features (e.g., edges, colors, tones, etc.) or environ-
mental models [15,43]. An important direction of research, the appearance-based
method [23,74], aims at reducing or avoiding those human-deﬁned features for better
adaptation of unknown scenes. The need to process high dimensional sensory vector
inputs in appearance-based methods brings out a sharp difference between the be-
havioral modeling and perceptual modeling: the effectors of a robot are known with
the former, but the sensory space is extremely complex and unknown with the latter

and, therefore, very challenging.

In this chapter, we present an approach of developing an obstacle avoidance be-
havior by a mobile humanoid through online real-time incremental learning. This
learned behavior also integrated the result of look-ahead search seamlessly to solve
the “nearsightness” of purely reactive approaches. By limiting look-ahead search to
performed within real time, the path planner of this work is of the same spirit of
the learnng real-time A* (LRTA*) [54]. The major distinctions of the work include:
First, we used the appearance-based approach for range—map learning, rather than
an environment-dependent algorithm (e.g., obstacle segmentation and classiﬁcation)
for obstacle avoidance. Second, a novel data fusion technique was used to integrate
difference modalities, e.g., laser map, robot’s speeds, and planned heading. Third,
we extended the search scope of LRTA* to the range of the robot’s sensing and Di—

jkstra algorithm was used. The new appearance-based learning method was able to

132

distinguish small range map differences that are critical in altering the navigation be-
havior (e.g., passable and not passable sections). In principle, the appearance-based
method is complete in the sense that it is able to learn any complex function that
maps from the range-map space to the behavior space. This also implies that the
number of training samples that are required to approximate the complex function
is very large. To reduce the number of training samples required, we introduced
the attentional selective mechanism [124] which dynamically selected regions in near
approximity for analysis and treated other regions as negligible for the purpose of
local object-avoidance. Further, online training was used so that the trainer could
dynamically choose the training scenarios according to the system’s current strengths
and weakness, further reducing the time and samples of training.

The remainder of this chapter is organized as follows. Section 5.3 presents the
proposed approach. The robotic platform and the real-time online learning procedure
are described in Section 5.4. The results of simulation and real robot experiments are

reported in Section 5.5. Discussions and concluding remarks are given in Section 5.6.

5.3 Approach

The sensitivity to local minima of the reactive approach leads us to divide the problem
into two components: obstacle avoidance and path planner. This combined framework
can generate motions for mobile robots that accomplish their tasks, while securely

navigating in an unknown and dynamic environment (with low speeds).

5.3. 1 Obstacle avoidance

We consider the obstacle avoidance behavior as a decision process, which converts the
current sensing to action, including the range image and robot’s velocities, and the
desired heading. At this level, the robot does not sense and store scene conﬁguration

(e.g., global map of the environment) nor the global robot position. That is, we

133

assume that the current sensing and heading contain all of the information that is
sufﬁcient for robots to derive the next motor control signal. In fact, it uses the real
world as its major representation. The goal of this component is to move safely
according to the scene, meanwhile moving toward the goal.

The range scanner observes r(t) E ’R C R" at time t, where 7?. denotes the space
of all possible range images in a speciﬁc environment. r(t) is a vector of distance,
whose ith component r,- denotes the distance to the nearest obstacle at a speciﬁc
angle. The vector v(t) gives the current velocities of the vehicle at time t, which are
measured by the vehicle’s encoders. Let 6(t) denote the desired heading given by
the path planner. The variables r(t), v(t), and 0(t) are given by difference sensors
or action whose dimension and scale are different. We thus need a normalization

procedure when fusing them together as an integrated context vector.

Deﬁnition 5.3.1 The vector x(t) denotes the system’s context of the environment at

time t, deﬁned as:

r(t) — r v(t) — v 6(t) — 6

w. w,, we

 

x(t) =( )6 X, (5.1)

where w,, wv, and we are positive numbers denoting the scatter measurements’ of the

variates r, v, and 6, respectively, while r, o, and 0 are their corresponding means.

The action vector y(t) E 32 consists of control signals sent to all of the effectors
at time t, where )7 denotes the sample space of action vectors.

The obstacle avoidance behavior can be formulated as a mapping f : X I—+ y, i.e.,
the primed (predicted) action y(t + 1) (the signal sent to the motors) is a function of
x(t):

y(t + 1) = f (X(t))- (5-2)

 

1For simplicity, we assume the covariance matrix (Eu) of a variate u is equal to 021, where I
denotes an identical matrix. Thus, its corresponding scatter is wu = \/ trZu = J50, where n denotes
the dimension of the variate u (see Marida [64] page 13).

134

     
 

9(1)

Path Planner

 

Mobile
Robot

0

 

 

 

 

 

1It)

 

Figure 5.1: The overall architecture of the range-based navigation. “Attention” denotes
an attentional module.

Fig. 5.1 shows the coarse architecture of the navigator. An IHDR tree, which we
discussed in Section 3.5.3, is generated to estimate the control signal y from x. The
current range image r(t), the vehicle’s velocities v(t) and the heading 6(t) from the
path planner are used for deriving the next control signal y(t + 1). An attentional

module is added to extract partial views from a whole view of a scan.

5.3.2 Attentional mechanism

Direct use of an image as a long vector for statistical feature derivation and learning
is called the appearance-based approach in the computer vision community. Usually
the appearance—based approach uses monolithic views where the entire range data (or
visual image) frame is treated as a single entity. However, the importance of signal
components is not uniform. There are cases where appearances of two scenes are
quite similar globally, but different actions are required. Further, similar actions are
needed where the appearance of two scenes look quite different globally. Both cases
indicate that there are critical areas where differences critically determine the action
needed. This necessitates an attentional mechanism to select such critical areas.

For example, in Fig. 5.2, (a1), (b1), and (c1) show three scenarios along the robot’s
path. Range images (a2) and (b2) are quite similar globally judged from the entire

image (except the critical area on the left side). In the context of an appearance-based

135

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

(a1) (b1) (c1)
T A if A TN}; - - ‘ L 1 1.li ‘ J
(a2) , (b2) g KC?) ,
‘ * I
Y '1"
T *4 k - - 1 TNJA - ‘ - k T A - - A k
(33) (b3) (C3)

Figure 5.2: Why attention is needed? The solid small circles denote the robot with a
short line segment at the front indicating orientation. Thick lines mark the walls along the
corridor. T and r denote the threshold and the mean, respectively. (a2), (b2) and (c2) are
range images taken by the robot at the three scenarios, respectively. (33), (b3) and (c3) are
the corresponding images after passing the attentional module. The diagrams in lower two
rows use logarithmic scales for the Y-axis. We can see that the distance between (a3) and
(b3) becomes larger while the distance between (b3) and (03) becomes smaller.

method, this means that the distance (e.g., Euclidean one) between the two is small.
They require different actions: turning right for (al) and going straight or turning left
for (b1). In another case, range images (b2) and (c2) are very different globally, but
their desired actions are similar: going straight or turning left. Thus, it is difﬁcult to
discriminate the three cases correctly by using a distance metric deﬁned on the entire
image. But, if we look at the left regions in (a2), (b2), and (c2) of Fig. 5.2, we can
see that the similarities and differences are clear. Without a capability to attend to

this critical region, the learning system requires signiﬁcantly more training samples

136

when complex scenarios are considered.

In the above three cases, the critical area is the input component where range
readings are very small. This is true, in general, because near obstacles determine
heading more than, and often take precedence over, far-away objects. As we shall see
later, this can be accomplished by an attentional module.

We ﬁrst deﬁne the scalar attentional effector.

Deﬁnition 5.3.2 The operation of the attentional eﬂector a(t) for input r(t) and
output z(t) is deﬁned by:

l(t) = 9(T(t), a(t)) = (53)

where r denotes the sample mean of the raw signal r(t).

For intended application, we would like to have a(t) to behave in the following
way. First, when all input components have large values, the attentional selection
is in its default mode, turning on all components. Second, when there are nearby
objects, the attentional selection activates only nearby objects which are critical for
object avoidance while far-away objects are replaced by their mean readings. This

attentional action can be realized by two programmed functions 9 and f :

a(t) = g(n-(t), a(t)) (5-4)

and

1 if r,- <Tor \7’j rj(t) 2T,
a.(t) = f(r(t)) = (5-5)
0 otherwise,

where T is a threshold, i = 1,2, ...,n, and r(t) = (r1(t),r2(t),...,rn(t)) denotes the
input vector. We write z(t) = (21(t), 22(t), ..., zn(t)) as the output of the attentional
action and a(t) = (a1(t), a2(t), ..., an(t)) as the attention vector. The above function

f will suppress some far-away components (a,(t) = 0) if there are objects closer than

137

 

 

 

 

 

 

 

 

r(t)

 

w);

 

 

 

Figure 5.3: The attentional signal generator f and attentional executor g. r(t) and z(t)
denote the input and output, respectively.

T. If all readings are far—away, we do not want to turn the attention off completely
and, therefore, we leave all attentional effectors on (Vj, a]- (t) = 1). This operation is

illustrated in Fig. 5.3.

In practice, this raw attentional vector a(t) is smoothed by convoluting with a ﬂat

window, as
i+5

1
I — — .
a,(t) _ 11 Z a,(t)],
3:1-5
where [] denotes rounding to the nearest integer. This smoothing serves to eliminate

point-wise noise and to provide a neighborhood inﬂuence to the output attentional

vector.

In Fig. 5.2, the readings of the left part of diagrams (b2) and (c2) are smaller than
T. Thus, only the left part passes through the attentional module without change
whereas other parts are suppressed by being set to the mean, as shown in (b3) and
(03) of Fig. 5.2. This is needed for the robot to pass through tight areas, where a small
change in the width of a gap determines whether the robot can pass. The attentional
mechanism enables the robot to focus on critical areas (i.e., parts with close range)
and, thus, the learned behaviors sensitively depend on the attended parts of the range

map. All range readings are attended when there is no nearby object as shown by

(a2) and (a3) of Fig. 5.2.

In Fig. 5.1, the learner IHDR is a hierarchically organized high—dimensional re-
gression algorithm. In order to develop stable collision-avoidance behaviors, the robot

needs sufﬁcient training samples. Here, we show that the attentional mechanism

138

greatly reduces the number of necessary training samples when there are objects
close to the robot. To quantitatively analyze the attentional mechanism proposed,

we make the following deﬁnition.

Deﬁnition 5.3.3 Consider a scanner operating in an environment, which can be
approximated by piecewise 3—D planes for simplicity. Each range map 1‘ can be ap-
proximated by a polygon with h segments (as in Fig. 5.4 (a)). P = {p1,p2,...,ph}
is the map, where the ith end point p,- is denoted by its polar coordinate (r,,a,-).
Without lost of generality, the angle coordinate are sorted: (11 < 02 < < ah.
P’ = {p3,p’2,...,p;,} is the post-attentional map, where p: = (z,,a,~) whose range 2,-

has been deﬁned earlier.

Remark. The larger h is, the closer the approximation of r in general. In a particular
case, the polygon representation becomes a regular grid when h = n.

We write the post-attentional approximation P’ as the function of P, i.e.,

P' = g‘(P). (5.6)

The attentional mechanism, deﬁned in Eq. (5.6), is not a one-to-one mapping, as
shown in Fig. 5.4. The post-attentional map P’ is the representative for a set of
pre-attentional maps besides P if condition C: Bpj p,- E P A lj < T is satisﬁed. We
denote this set by ’R(P’), i.e. R(P’) E {Plg‘(P) = P’}. The following theorem gives
a lower bound of the average size of R( P’) when there are objects within the distance
T from the robot.

Theorem 1 Let A and rm denote, respectively, the range resolution and maximum
distance of each radial line. If the ith end point p,- ’s radial length r,- is a random
variable with a uniform distribution in the sample space {0,A, 2A, ...,rmA}. Then
the average size of the set R(P’) conditioned on C is:

(1 - plhqh"
1— p" ’

EP'iRip’HC} >

 

139

where q = (rm - T)/A, p :2 (rm —- T)/rm, and Epr{-|C} denotes the expectation on
condition C.

Proof Consider the cases where there are k (1 g k S h) end points located within
the half-circle T (see Fig. 5.4). The number of possible conﬁgurations for the h — k

end points outside the circle, denoted by sk, is:
3,, = qh—k. (5.8)

Because the radial distance of the h — k end points have the freedom to choose values

from the interval [T, rm], which has q discrete values. By deﬁnition:
h
EP'{R(P')IC} = Z skPUch).
k==l

where P(k|C) denotes the conditional probability when I: end points are located

within the half-circle T. We can see
P(le) = 050- Ink/(1 - Phl-

Therefore,

h k _ k
EP'{R(P’)} = th_.____c.<1_pf>
k=l

 

2 22:0 Ctr—*0 - 1))" - q"
1 — ph

(q+(1-p))"-q" > (1 -r)hq”"1
1-19" 1-19" '

In the last step, the inequality, (x + (5)” — x” > nxn‘lii if 0 < 5 << x, is used. [I

 

 

Table 5.3.2 shows the lower bound of the size due to attention at two typical
parametric settings. We see that the reduction is large. Of course the size of remaining

space to learn is also large. The ratio of space reduced over original space is roughly
p.

140

Table 5.1: The lower bound of the reduction ratio at several parametric setting

 

Parameters Reduced size
rm = 50m, A = 0.2m, T = 2m, and h = 10 9.51 x 1020
rm = 10m, A = 0.2m, T = 2m, and h = 15 7.52 x 10‘22

 

 

 

      
 
 

' I

 
 

I
III
III
II’(

I
”Iva,
I a
f

’

  
 

I
O

I , --’

I

’

 

 

 

 

Figure 5.4: (a) shows that the pre—attentional map r is approximated by a polygon P. P is
speciﬁed by h end points: p1, p2, ..., ph. The kth end point is denoted by (lk, ak). (b) shows
the post-attentional approximation P’ , after the attentional function 9". Those end-points
whose distances are larger than T are set to 1". The half-circle with symbol T shows the area
the robot pays attention to. We can see numerous pre-attentional approximations map to a
single post-attentional approximation P’. It is clear that the points outside the half-circle
of (a) have the freedom to change positions without affecting the shape of (b).

5.3.3 Path planner

The path planner gives the desired heading 6 required by obstacle-avoidance, based
on the robot’s estimated pose and the occupancy model of the environment. We
divide the path planning into two steps: First, a global search algorithm is applied
on the occupancy grid model of the terrain. Second, to account for unknown and
dynamic parts of the environment, a partial search algorithm is used within the scope
of the robot’s current sensing.

How to estimate the robot’s pose is not the scope of this chapter, we assume the
robot is given a method for calculating its coordinates in the occupancy grid via its
sensing. For example, GPS receiver gives the robot’s location within a digital map as

the robot navigating outdoor; for indoor environment, Kalman ﬁlter (KF) estimates

141

 

the robot pose based on the positions of the observed beacons and odometry.

More formally, let G = (V, C) be a weighted grid graph. V is a set of cells while
C is a cost function deﬁned on the cells. In the grid graph, each cell 3, except at
the boundary, has eight direct neighbors, denoted by Neighbor(s). The cost of an
cell x E V is given by c(x), representing the likelihood of the cell being occupied by
obstacles. For simplicity’s sake we assume 0 S c(x) S 1 (e.g., zero means the cell is
not occupied while one means the cell is occupied). We deﬁne a collection of paths

through G, from the source 3 to the destination d, as
r(C;s,d) = {(30,31, ...,sN) | sn+1 E Neighbor(sn),so = s,sN = d,n = 0, ...,N — 1}.

The cost of a path is then deﬁned by

N

c(so,sl,...,sN) = 20(371) (5.9)

n=0

and we seek the minimal cost path through the grid graph. It is well known the
path with minimal cost can be found via dynamic programming (DP) by recursively
calculating k(x) and b(x), where k(x) denotes the minimum-cost path from the x to
d cell and b(x) the next cell from r to the goal cell along the minimal cost path. k(x)
and b(x) are deﬁned respectively as:

O x=d
k(x)=

minyeNeighbo,($){k(y) + c(x)} otherwise,

and
0(3)) = a‘rgrninyeNeighbor(x){1"(y) + C(33)}.
The path with minimal cost is then ($3,313 ...,s}',,), where 53 = s and s; = b(s;_1),

142

u 15"

 

 

 

Figure 5.5: The occupancy grid of an environment. Black cells denote to occupied cells,
while others are free ones. Each cell has eight directly connected neighbors: N, NE, E, SE,
S, SW, W and NW, each of which corresponds to an angle: heading 9. The optimal path
from S to G can be encoded by the heading sequence: (E, E, E, NE, NE, NE, NE, N, N,
N).

for n = l, 2, ..., N. The minimal cost from the source to destination is
c(s,d) = c(sa,sI, ...,sjv) = k(s).

Since the cost function c(x) is positive, Dijkstra algorithm [28] can be used to search
the optimal path. If the priority queue is used, the complexity of the algorithm is
0(IVlloslVI) [24]-

Now we can deﬁne the planned heading 0 based on the function b(x), as illustrated
in Fig. 5.5. The heading d(x) of the cell :5 has eight possible directions: N (90°), NE
(45°), E (0°), SE (—45°), S (——90°), SW (—135°), W (180°), and NW (135°). We
deﬁne the head 6(x) as the direction toward the next cell along the optimal path,
e.g., the heading of the cell x in Fig. 5.5 is NE.

The above-mentioned DP procedure computes the heading 0 for the entire occu—
pancy grid. This is motivated by the fact that the same heading can be reused for
every location of the robot if the environment does not change. To account for the
discrepancy of the occupancy grid from the actual environment, re—planning is needed.

For example, when the robot senses an obstacle blocking the planned path, the path

143

is replanned from the robot’s current pose. It is not necessary and not desirable to
recompute the heading for the entire grid again. Instead, the heading is computed
within the scope of robot’s current sensing. In other words, this local-search strategy
limits the look-ahead search within the vicinity of the area enveloping the robot’s
current laser map.

More speciﬁcally, let L(x) denote the cells within the range of the robot’s sensing
from the pose x, which is a subset of the whole occupancy grid S. B(x) denotes the

boundary of L(x) but not belonging to L(x), i.e.,

B(x) = {z I z E S — L(x),3y (z E Neighbor(y))}.

Now we are ready to specify the local search scheme in Algorithm 1. Providing the
cost of a cell is positive, Eqs. (5.10) and (5.11) can be calculated by Dijkstra algorithm
and the complexity is O(|L| log(|L]). By carefully choosing the size of L, Algorithm

1 can be realized on a real-time mobile robot.

 

Algorithm 1 Local search algorithm within L

 

1: if the planned path is obstructed by an obstacle then
2: Let item be the robot’s current pose.
3: For each cell x E L(xcur), set [c(x) to its initial value k0(x) (e.g., non-informative
zero).
. for all each cell x E L(xcur) do
5: Look ahead and value update: Update the heuristic estimate k(x) by:

[C(13) =max{k($). min [C(IIy)+k(y)l}, (5-10)

ye B(zcur)

where c(x, y) denotes the minimal cost of path enveloped within L from the
cell x to y.

6: end for

7: Choose a path to a boundary cell y E B (17cm) such that

y = argminy68(xcur)[c(xcuri y) + [C(31)] (5'11)

8: Return the head 6 along the found path from xcu, to y.
9: end if

 

Not being an exhaustive search strategy as N F1 [57], oscillation re-plannings occur

144

 

 

rarely in badly conditioned obstacle conﬁguration but never endanger the accomplish-
ment of the task. More speciﬁcally, like LRTA* [54], the proposed re—planning scheme
never fails to reach a goal under assumption the the k-values of all cells outside space
L do not overestimate the true values. Our scheme is better than the LRTA* in the
oscillation issue by employing look-ahead within the envelope of current laser map,
while LRTA* has an one-step look-ahead strategy. We obtained reasonable subopti-
mal solutions for almost all cases faced by mobile robots, for example, this strategy
can deal with all kinds of U-shaped obstacles as long as they are detectable within

the robot’s local sensing scope.

5.3.4 Cost function

The cost function of the paths r(G; s, d) in Eq. (5.9) can easily modiﬁed to incorporate
the length and curvature of path. It is desirable to obtain a short and less Wiggly

path. The modiﬁed cost function is then deﬁned as:

N—l N—l N—l
c’(30. ....s~) = a Zea.) +ﬂ2dist<smsn+n +72 I 6.._1+0..+1— 20.. I. (5.12)
n=0 n=0 n=l

where dist(x, y) denotes the Euclidean distance from the cell x to y; (1, ﬂ, and 'y are
weighting parameters controlling the inﬂuence of each factors. The cost deﬁned in
Eq. 5.12 increases with the length and Wiggly nature of a path. This makes the path

search algorithm preferring short and straight (less Wiggly) paths.

In addition, as in NF 1, the cost function Eq. 5.12 produces trajectories grazing
obstacles (e.g., see (b) and (c) of Fig. 5.6). The reason is that there is no mechanism
enforcing the result path maintaining a minimum clearance from the obstacle. We

hence add another term 211:: V(sn) into Eq. 5.12, where

dist(y,x)gT

145

 

 

 

The term V(x) is designed in the same fashion as repulsive ﬁeld. It sums up all
x’s occupied neighbor cells within the distance T. The more occupied cell nearby,
the larger V(x) is. For example, if x has a clearance distance larger than T for all

occupied cells, then V(x) = 0.

Thus, the new cost function is:

N—l
c"(so, ..., sN) = c’(so, ..., SN) + 6 Z V(sn) (5.13)

n=0

Fig. 5.6 shows the result of an evaluation for our proposed planning scheme in a
simulated environment. The size of the grid is 30m by 25m; the resolution is 10cm
and the robot’s sensing scope is not more than 15m. In Fig. 5.6, (a), (b), (c), and (d)
illustrate the planned paths under different cost functions. (a) uses the cost deﬁned in
Eq. ( 5.9) without regularization terms. Because the cost of the free cells is zero, the
total cost of path is zero despite the Wiggly nature of the path. (b) shows the planned
path after adding the Euclidean term (the second term in Eq. (5.12)). The result
demonstrates the straightness of path is improved comparing that of (a), but there
is a noticeable portion of path grazing the wall and some undesirable turns near the
goal. Adding the curvature term (the third term in Eq. (5.12)), both the clearance
and straightness of the path are improved, as shown in (c). However, the clearance
of (c) is not satisﬁable since some portions of path still touch the occupied cells. ((1)
shows the yielding path by introducing the repulsive term deﬁned in Eq. (5.13). The
repulsive force pushes the result path away from the occupied cells, hence the path

in (d) is signiﬁcantly better than those of the other three.

The third row of Fig. 5.6 shows a situation where sensory information indicates
that the original planned path is obstructed by an unmodeled U-shaped obstacle.
The heading is needed to recomputed within the shaded area, indicated in Fig. 5.6
(e). This shaded area is the envelope of the laser range map in the occupancy grid.

After the new path is planned, the robot moves along the new path while checking

146

whether new obstacles block the planned path. (f) illustrates the ﬁnal actual path of

the robot and the updated occupancy grid.

5.4 The robotic system and online training proce-

dure

The tests were performed in a humanoid robot, called Dav [37,125], as shown in
Fig. 2.1. This humanoid is built in the Embodied Intelligence Laboratory at Michigan

State University.

Mounted on the front of Dav, the laser scanner (SICK PLS) has 180° of view
and 05° of resolution. The local vehicle coordinate system and control variables are
depicted in Fig. 5.7. During training, the control variables (r, d) are given interactively
by the position of the mouse pointer P through a graphical user interface. Once the
trainer clicks the mouse button, the following equations are used to compute the
imposed (taught) action y = (vww):

11” = M (5.14)
w = v / r
where K is a positive constant and r denotes the radius of the are through the center
of robot and the mouse pointer. The mouse pointer hence controls the translation

and angular speeds simultaneously, which facilitates the online training procedure.

Dav’s drive-base has four wheels, each driven by two DC motors. Let q denote the
velocity readings of the encoders of four wheels. Suppose v1. and v,, denote the base’s
translation velocities, and w denotes the angular velocity of the base. By assuming

that the wheels do not slip, the kinematics of the base is: [125]:

q = B(v,,v,,w)T, (5.15)

147

where B is an 8 x 3 matrix, decided by the wheels’ conﬁguration (known). The base

velocities (vx, vy,w)T is not directly available to learning. It can be estimated from

the wheels’ speed vector q in a least-square—error sense:
v = (v1, vy,w)T = (BTB)_IBT('1. (5.16)

We use two velocities, (vww), as the control vector y. Thus, the IHDR tree learns

the following mapping incrementally:

y(t + 1) = f(Z(t),V(t))-

During the interactive learning, y is given. Whenever y is not given, IHDR approxi-
mates y while it performs (testing). At the low level, the controller servoes 5 based

on y.

5.4.1 Online incremental training

The learning algorithm is a simpliﬁed version of sensorimotor algorithm presented in

Section 3.5.6 and is outlined as follows:

1. At time frame t, grab a new laser map r(t), the wheels’ velocity (l(t) and heading

d(t). Use Eq. (5.16) to calculate the base’s velocity v(t).

2. Computer a(t) based on r(t) using Eq. (5.5). Apply attention a(t) to given
z(t) using Eq. (5.4). Merge z(t), v(t), and 0(t) into a single vector x(t) using
Eq. (5.1).

3. If the mouse button is clicked (training), Eq. (5.14) is used to calculate the

imposed action y(t), then go to step 4. Otherwise go to step 6.

4. Use input-output pair (x(t), y(t)) to train the IHDR tree by calling Procedure

AddPattern in Appendix B as one incremental step.

148

 

 

5. Send the action y(t) to the controller which gives q(t + 1). Increment t by 1

and go to step 1.

6. Query the IHDR tree by calling the Procedure Retrieval in Appendix B and get
the primed action y(t + 1). Send y(t + 1) to the controller which gives (1(t + 1).

Increment t by 1 and go to step 1.

The online incremental training process does not explicitly have separate training and

testing phases. Whenever y is not given, the robot performs.

5.5 Experiments and results

Two kinds of experiments were designed and conducted. The ﬁrst kind was purely
object-avoidance behavior without modeling the environment. The second one incor-
porated the proposed path planner for the goal-directed navigation. It is interesting
to note the complete knowledge of the environment is not necessary even for the sec-
ond kind. The robot automatically built and updated the occupancy grid based on

sensing.

5.5.1 Attention mechanism

Example 1. In this simulation experiment, it will show the importance of the atten-
tional mechanism for generalization. Two IHDR trees were trained simultaneously:
one used attention and the other used the raw range image directly without attention.

2 as shown in Fig. 5.8.

We interactively trained the simulated robot in 16 scenarios
During this training process, 1971 samples were acquired and each of the sample is

associated with a label given by the trainer online.

 

2There are several training episodes for each scenario. For simplicity, we only show the robot’s
trace of a single episode in Fig 5.8. We do not show the goal pose, which is at right of each scene,
except (2) whose goal pose is at the top.

149

 

 

Table 5.2: The results of tests with random starting positions.

 

 

 

Range Mean error Mean error
(with attention) (without attention)
9 [0, it] 0.05 0.09
v [0, 1.0] 0.005 0.007

 

Table 5.3: The results of tests with random starting positions.

 

With attention Without attention
Rate of success 0.91 0.63

 

 

In order to test the generalization capability of the learning system, we performed
a leave-one—out test for both IHDR trees. The 1971 training samples were divided into
10 bins. We chose 9 bins for training and left one bin for testing. The mean square
error (MSE) of the cross-validation test of two IHDR trees are shown in Table 5.5.1.
Comparing the results, we can see that the mean error was decreased about 40 percent
by introducing attention, which indicates that generalization capability was improved
by adding attention.

The tests were performed in an environment different from the training scenarios.
In Fig. 5.9 (a), with attention, the simulated robot performed successfully a con-
tinuous 10—minute run. The robot’s trajectory is shown by small trailing dot-lines.
Remember that no environmental map was stored across the laser maps and the robot
had no global position sensors. Fig. 5.9 (b) shows that, without attention, the robot
failed twice in a 3—minute test run.

Secondly, we ran a test in which the two learned IHDR trees were performed on
the environment shown in Fig. 5.9 for 100 times. Each time we randomly chose a
start position in the free space. In Table 5.5.1, we report the rate of success for the
two trained IHDR trees. We treated a run to be successful when the robot can run
continually for three minutes without hitting an obstacle. From the Table 3, we can

see that the rate of success increased greatly by introducing attention.

150

 

5.5.2 Obstacle-avoidance on the Dav robot

Example 2. In this experiment, the robot’s only goal is to move safely according to
the scene. Such a navigation system is useful for applications where a human guides
global motion but local motion is autonomous.

The Dav robot has been trained at different object conﬁgurations. IHDR was
used to learn the mapping from sensory context (laser map and the robot’s current
speed) to desired command (i.e., translation vy and angular speeds w) for robot. The
desired command was supplied by online interactively via a graphical user interface
(see Fig. 5.7). Totally 4,655 samples were used for the reliable collision avoidance
behavior.

The Dav robot has been repeatedly tested for the learned range-based collision
avoidance and performance has been very satisfactory. For example, during a visit
by high school students, as shown in Fig. 5.10, Dav successfully navigated in this
dynamic changing environment without collisions with moving students. It is worth

noting the testing scenarios were not the same as the training scenarios.

5.5.3 Integrate with the path planning layer

Example 3. In this simulated example, we show the results of integrating the path
planning component with the learned obstacle-avoidance behavior on a bounded en-
vironment (43 meters by 30 meters). The resolution of the grid is of 10 cm and the
robot’s sensing scope is of 15 meters (for re—planning). Illustrated in Fig. 5.11, all the
obstacles (shaded polygons) were completed unknown in advance to the simulated
robot. Like the attention experiment (Example 1), we trained the robot on the 16
scenarios, shown in Fig. 5.8, but with speciﬁed goal headings 6. About 2000 samples
were labeled online and trained for the low-level obstacle-avoidance behavior.

The performance of the proposed navigator with the path planner was veriﬁed from

the same inital conﬁguration to different goals in this environment. Fig. 5.11 depicts

151

 

 

some difﬁcult cases where the goals are hidden by a U-shaped object or the goal are
located very near the obstacles. The navigator was able to navigate successfully to

the goals in all cases.

Example 4. The integrated navigating system has been implemented and tested on
the Dav robot (see Fig. 2.1). The size of occupancy grid is 37.3 meters by 24.0 meters
with the resolution 0.05 meter. The look-ahead sensing scope is not more than 10
meters. The heading has coarse resolution (i.e., eight angles); we hence convoluted it

with a ﬂat window for smoothing, that is:
10
0’(t) = Z 6(t — 2').
i=1

Since the way of localizing the robot is not the topic of here, we only mention
methods employed in this experiment here. Before navigation, the robot knew the
occupancy grid of the environment and the sensor-detectable landmarks (beacons),
such as the corner points and lines. Assuming the robot knows its initial pose in the
map, the Kalman ﬁlter was used to estimate the robot’s pose by fusing the odometry

and matched landmark information.

Fig. 5.12 shows an execution of the proposed navigator on a corridor corner on
the second ﬂoor of the Engineering Building at Michigan State University. Obstacles
are shown in black in the occupancy grid. Except the walls, all other obstacles were
unknown before navigation. As the robot starts moving, observed new obstacles
were added to the grid. The local search algorithm was employed to recompute the
new heading within the robot’s sensing scope if the obstacle obstructed the planned
path. Fig. 5.12 (a) corresponds to the moment after the robot starts for about a
minute; obstacles were only those directly observed from its current pose. The red
curve (labeled with “A”) depicts the current planned path. The blue curve (labeled
with “B”) shows the robot’s trajectory. Fig. 5.12 (b) illustrates the ﬁnal result of

Dav’s navigation task, including the planned path and the robot’s trajectory. Dav

152

 

 

 

successfully navigated on a number of obstacle conﬁgurations, and there was no case

in which the robot ran into obstacles or the goal was not achieved.

5.6 Discussion and Conclusion

The system failed when moving obstacles were outside the ﬁeld-of-view of the laser
scanner. Since the laser scanner has to be installed at the front of the robot, nearby
objects on the side are not “visible.” This means that the trainer needs to “look
ahead” when providing desired control signals so that the objects are not too close
to the “blind spots.” In addition, the trainer may label the samples such that the
translation speed is proportional to the side clearance. The robot learns traveling at
high speed through a wide corridor with sparse objects while slowing down through a
narrow corridor or areas with dense objects. This shows the online real-time learning

offers more ﬂexibility than that of manually designed behaviors.

Our local search scheme may not be globally optimal if the k-values of the cells
outside the sensing scope (L) are not equal to their true values. However, as argued
by Simon [85], it is relatively rare that real-time applications need strict optimal
solutions and near—optimal ones are often acceptable. The planner’s 0([LI log(|L[))
complexity allows us to search on a fairly large space for such a near-optimal solution.
Further, the planner’s termination at the goal is ensured if the k-values of cells outside

L are not overestimated the true values.

This chapter described a range-based obstacle-avoidance learning system with a
path planner implemented on a mobile robot. The attention selection mechanism
reduces the importance of far-away objects when nearby objects are present. The
power of the learning-based method is to enable the robot to learn very complex
functions between the sensing context and the desired behavior, such a function is
typically so complex that it is not possible to write a program to simulate it accurately.

Indeed, the complex range-perception based human action learned by y = f (z,v, 9)

153

 

 

is too complex to write a program without learning. The success of the learning for
high dimensional input (z, v, 6) is mainly due to the discriminating power of IHDR,
and the real-time speed is due to the logarithmic time complexity of IHDR. The
optimal subspace-based Bayesian generalization enables quasi-optimal interpolation
of behaviors from matched learned samples. The online incremental learning is useful
for the trainer to dynamically select scenarios according to the robot’s weakness (i.e.,
problem areas) in performance. It is true that training needs extra efforts, but it

enables the behaviors to change according to a wide variety of changes in the scenes.

154

 

 

 

 

 

 

 

 

 

_. . xxx.“

 

 

)

f
Figure 5.6: The result of tests on a simulated environment

8

155

 

 

 

Robot

Figure 5.7: The local coordinate system and control variables. Once the mouse button is
clicked, the position of the mouse pointer P provides an imposed action. Let r denote the
radius of the arc. The arc’s length (1 controls the translation speed while the radius controls
the angular speed.

156

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

i
III/”l ‘\\ D / ’’’’’’’
..... '“xaw—G .,--‘\_// \\\m/’”e ..—" D 3
D e
//T\$ D ___,,--.-»—---.-9 / ,,,,, .9.
'/ D \__,_,...-»6 y /’/ y
-- D D ...
s ,. s -.., 9,,./ ON. ,,,_-_.._,
\\../’ -_.. __"_,,-/"" 3:9 _”/ (y
"\“ & //"---F“\ QB //"~\ I/xfx‘s‘
\\\\ ,I a __ "”x’ a I, p , \
x--.» g -_.. g c7 .J
D ./’ D D O
(14) (15) (16)

 

 

(13)
Figure 5.8: The 16 scenarios are used to train the simulated robot. The small solid circles
denote the robot, and solid dark line represents the walls. The dot line recorded the online

training trajectories.

157

 

1"-

 

‘0-..
o
-..-"

-'
1_
\

 

I

.4"
l
I
.
’4

 

Z,

 

- .-._ -,_.—
--.--.-

\
‘5
." ‘
.--"
,-

 

P._

'4
\

 

& ‘~

I
a
5
e
‘5

 

 

 

A

 

 

0
V

‘s...-'

 

 

 

 

 

 

(b)

Figure 5.9: The solid dark lines denote walls and the dot lines show the trajectory. Ob-
stacles of irregular shape are scattered about the corridor. (a) A 10—minute run by the
simulated robot with the attentional module. (b) The result of the test without attentional
selection. Two collisions indicated by arrows are occurred during the ﬁrst two minutes
simulation.

 

 

  

l
I
I
I
I
I I
I
o
l
1

o
o
I
I
I
I
I
l
I I
’l I
I
s , X
, l a
l I f
I ' '
I I l I
I,’ , '0
f” ' l
l I
XI , i
r 1 I l
x I l
I l I
I I l ‘a
I, ' ;' v'
1 ,-
I . a:-
I I: ___:3'
l _u‘ .-
I .— .-
Start' ___________
1.5M

Figure 5.11: Navigating in an simulated environment with unknown obstacles. The initial

conﬁguration is chosen to be about (4.8111, 3.0111). The goal conﬁgurations G1, G2, G3 and
G4 are chosen to be (37.1m, 25.0m), (29.6m, 24.3m), (18.6111, 27.6111) and (16.2m, 25.6m),
respectively. For clearness, we do not draw the portion of trajectory very close to the goals.

 

 

 

159

 

 

 

(b)

Figure 5.12: Dav run in the northwest corner of the corridor on the second ﬂoor of Engi-
neering Building at Michigan State University. The red curve (or labeled with “A”
depicts the actual planned path from the path planning layer, while the blue one (or
label with “B”) shows the actual trajectory in the occupancy grid.

160

Chapter 6

Conclusions

6.1 Concluding Remarks

This dissertation described the development of the Dav robot. This general-purpose
anthropomorphic robot has two goals: to provide a general purpose, ﬂexible, and
dextrous robotic platform from the engineering aspect and to understand human au-
tonomous mental development from the scientiﬁc aspect. Dav consists of a total 43 of
degrees-of-freedom (DOF), including wheel-driven base, torso, arms, hands, neck and
head. The body may support a variety of locomotive and manipulative behaviors. For
perception, Dav is equipped with sensors such as vision (two cameras), auditory (two
microphones), a laser range ﬁnder, tactile sensors, and somatic sensors (e.g., encoder
and strain gauges). Dav is untethered and self-contained with all the computational
resources and power sources onboard. The base’s redundantly actuated mechanism

provides holonomic locomotive capability.

The mechanical structure and kinematics of the robot were analyzed. The D-H
coordination representation of the joints was derived. To facilitate the low-level joint
control, Dav’s dynamic model was derived based upon the Lagrange-Euler formula-
tion. The dynamic model describes the interference between the robotic links and

relation between the input joint torques and the output motion. Based upon the dy-

161

 

namic analysis, we find that the gravitational force dominates when the robot moves
at low speeds. Therefore, the implemented proportional derivative feedback law with
gravitational compensation is asymptotic stable. The mobile base’s dynamic analysis
and control were described. The overall embedded control system and the underlying
software architecture were introduced.

A general-purposed and task—unspeciﬁc architecture was proposed for the Dav
robot. The proposed architecture leaves much freedom for different implementations
of its major components, e.g., sensory, cognitive, and motor mappings. Although the
Dav developmental program that is currently being implemented could be explained
in further detail, such details are beyond the scope of this dissertation.

We also presented a range-based obstacle-avoidance system with a path planner.
The attention selection mechanism reduces the importance of far-away objects when
nearby objects are present. The power of the learning-based method is to enable the
robot to learn a very complex function between the sensing context and the desired
behavior, such a function is typically so complex that it is not possible to write a
program to simulate it accurately. The integration with the path planner makes sure
the navigation system is free of local minima.

Finally, we need to remark that the work reported in this dissertation has not
reached the level of a full-ﬂedged developmental robot but can serve as a starting

point of the on-going research on the Dav robot.

6.2 Future Work

In this section, we ﬁrst recommend some improvements for the current implementa—
tion of the Dav robot; then, we list a few possible future perspectives along the line
of research of the developmental robot.

As a prototype robot, the Dav robot has some mechanical limitations and hence

needs improvement.

162

 

0 Drive base. Although the redundantly actuated four-wheel mechnanism pro-
vides increased contact area, robustness in coping with irregularities of floor,
and high clearance to tolerate the step features of the ﬂoor; it requires a very
complex control scheme to synchronize the wheels. Turning wheels vertically
also consumes lots of power to overcome the friction on a rough surface. In
addition, the current base module occupies too much space since eight motors
are needed. To simplify the base module, the author suggests use of a specially
designed wheel mechanism, which has been available in the market (e.g., the
omni-wheel from Kornylak Corporation). We can see the wheel actuated on
a direction while rolling passively along the perpendicular direction; hence, no
nonholonomic constraint exists. Three of those wheels are sufﬁcient to realize

the holonomic motion; hence the size of base module is reduced.

0 Arm. The current version of arm mechanism is a serially actuated arm with
rigid links. Also, in order to simplify the driving mechanism, DC motors are
directly mounted at a place close to the joints. As a consequence, the motors of
the low arm are a heavy load on the motors of the upper arm. In order to solve
the problem, the author suggests using a more sophisticate 4DOF spherical
shoulder mechanism [90]. This design moves all the shoulder motors into the

torso.

0 Hand. Dav’s hand design has a similar shortcoming as the arm. Placing the
motor close to the ﬁnger joints causes the hand to become intolerably heavy. A
more complex mechanism is needed to move the ﬁnger motors to the wrist mech-
anism. The author suggests using a mechanism similar to NASA’s Robonaut

hand [63].

0 Embedded system. Dav’s distributed embedded system currently uses the pro-
totype board (P8555) from Arman Corporation. The size of the board is a
little big. A customized PowerPC board (MPC555, Motorola Corporation)

163

 

with more ﬂash memory and static RAM memory is needed. For example,
phyCORE—MPC555 from Phytec America, LLC may be a better choice than

the old one.

Research in humanoid robotics has uncovered a variety of new problems and a few
solutions to classical problems in robotics and control theory. There is now a wide
consensus in the community that a humanoid that can duplicate humans’ abilities
will be a “grand challenge” for this millennium.

Autonomous mental development by a robot is an interdisciplinary area. It has
drawn upon work in artiﬁcial intelligence, developmental psychology, ethology, sys-
tems theory, philosophy, and linguistics. This dissertation represents one of the early

stage work in this line of research. This work has raised a few future perspectives:

o Dav’s developmental architecture has yet to be implemented and tested, but
its predecessor, SAIL, has successfully tested some major components of the
designed architecture. The presented architecture will be tested in future studies

and the performance will be reported.

0 Attention mechanism. The ﬂood of information (e.g., video stream) that enters
a system can easily overwhelm the system’s limited resources. To solve the
problem, attention, a selective processing must be included to the system to
let a part of the sensory input (e.g, goal related information) passing through.
In our collision avoidance system, a simple programming-in attention effector
is implemented. However, there is not much work done to develop (or learn-
ing autonomously) an attention system. This attention mechanism is primarily
important in a robot’s outdoor navigation experiment, where the lighting con-
dition changes dramatically. The robot should pay attention to a small part of
the scene (e.g., the road boundaries) in which the contrast is relatively stable

comparing the change of pixel values of the overall image. It is known that

164

 

‘1

attention can help a system to work in untrained scenes, however, it is not
clear how to develop an attention mechanism in an unstructured and highly

inconsistent environment.

Value system. The attention mechanism is an internal effector which is not
observable by outsiders. Hence, supervised learning is not suitable for learning
such an attention system. We believe a sophisticated value system is indispens-
able for development of a useful attention system. The problem becomes how to
design the architecture and to ﬁnd the computational model for the value sys—
tem. In Chapter 3, we presented an architecture for the value system (Type-5 '
DOSASE MDP). However, more detailed work is needed to implement it.

How to integrate high-level symbols? We observed that very young children
(birth to 2 years) [71—73] cannot manipulate symbols. When the children get
older, they develop symbolic representation based on their early sensorimotor
experiences. It is unknown how those sensorimotor grounded symbols are ex-
pressed in the brains. However, from engineering perspectives, we can shortcut
the development process, by plugging some human chosen symbols into the
“brain” of a developmental robot at early stage. This can be justiﬁed since
the human brain is not totally blank at birth. The biological brain is biased
with some innate behaviors (e.g., rooting and sucking). In addition, as Chen
and Weng [23] have shown, the performance of the indoor navigation system is
improved by introducing environment states, which can be regarded as symbols.
In Chapter 5, an occupancy grid was used to resolve the local minimum issue
of reactive behaviors. How to design a representation of the environment but

leaving sufﬁcient freedom for later learning is still a challenging problem.

165

 

 

Appendix A

IHDR algorithm

Procedure 1 AddPattem. Given a labeled sample (x, y), update the IHDR tree
T.

1: Find the best matched leaf node c by calling Procedure 3.
2: Add the training sample (x, y) to c’s prototype set E.

3: Find the closest cluster to the x and y vectors by computing:

m = arg min (wal—x—LEi—U. + wyw), (A.1)
151361 0,, 0,,

where mm and 21),, are two positive weights that sum to 1: w, + wy = 1; oz and 0,,
denote incrementally estimated average lengths of x and y vectors, respectively;
and c,- and y, denote, respectively, x-center and y—center of the ith cluster of the
node c.

4: Let u(n) denote the amnesic function that controls the updating rate, depending

on n, in such a way as:

0 ifngnl,
u(n) : b(n — "ll/(n2 — m) if n1 < n S n2 , (A2)
b+(n—-n2)/d ifn2<n,

166

 

where n denotes the number of visits to the cluster the closest m (see Eq. A.1) in
the node c, and b, n,, n2 and d are parameters. We update the x—center cm and
y-center ym of the cluster m in the node c, but leave other clusters unchanged:
cm(n) = Llflﬂcmm — 1) + 1—+—:(—")x, (A 3)
MW = who — 1) + —Uy '
where cm(n — 1) and ym(n — 1) are the old estimations (before update) of x-center
and y-center of the mth cluster in the node c, respectively, while cm(n) and ym(n)
are the new estimations (after update). Readers should note that the weights w;
and my control the inﬂuence of x and y vectors on the clustering algorithm. For
example, when w; = 0, Eq. (A.3) is equivalent to the y label clustering algorithm
used in Hwang & Weng [44].
: Compute the sample mean, say c, of all the q clusters in the node 0. Let the
current x-centers of the q clusters by {c,|c,- E X,i = 1, 2, ...,q} and the number

of samples in the cluster i be 71,-. Then,

q q
C: E nici/ E n,.
i=1 i=1

: Call the Gram-Schmidt Orthogonalization (GSO) procedure (see Hwang & Weng
[44, Appendix A]) using {Ci — cli = 1,2,...,q} as input. Then calculate the

projection matrix M of subspace ’D as:
M = [b1,b2,...,bq_1[, (A.4)

where b1,b2, ...,bq_1 denote the orthnormal basis vectors derived by the G80

procedure.

167

 

 

7: Update W,,,, the distance matrix of the mth cluster in the node c, as:

be = min{n — 1,ns}

bm = min{max{2(n — q)/q, 0}, n,}

by = 2(n - (1)/r12
we=be/b, wmzbm/b, wg= g/b
b = be + bm + bg

A = M’TM

B = MT(x — cm)(x — cm)TM

Fm(n) = WATrmm — 1)A + 1:93

S = 23:1 ngl‘:(n)/ 23:1 72:

Wm = wepzl + me + ngm(n)

 

where u(n) is the amnesic function deﬁned in Eq. (A.2), p and n, are empirical
deﬁned parameters, M’ is the old estimation of the projection matrix (before
update using Eq. (A.4)), and Fm(n — 1) and I‘m(n) denote, respectively, the old
and new estimations of the covariance matrix of the cluster m.
8: if the size ’Pc is larger than n f, a predeﬁned parameter, then
9: Mark c as an internal node. Create q nodes as c’s children, and reassign each
prototype x,- in ’PC to the child k, based on discriminating functions deﬁned in
Eq. (3.10). This is to compute:

k = arg minlS,Sq(l,-(x,-)).

10: end if

Procedure 2 Retrieval. Given an IHDR tree T and an input vector x, return the

corresponding estimated output 9.

1: By calling Procedure 3, we obtain the the best matched leaf node c.

2: p = parent(c), i.e., p denotes the parent node of c.

 

168

3: Let the M be the projection matrix of subspace D in the node p. Compute the

projection of the prototype set ”PC on subspace D:

’P; = {(ulaY1)7 (112,y2),..., (uncaync)}

where u,- = M T(x, — c) for 1 S i _<_ 72C and c denotes the x-center of the node p.

4: The projection of query input vector x:

u: MT(x—c)

5: if [P’I- — Nq, i. e., the number of prototype equals to predeﬁne constant M, then

6: Compute y by using the nearest-neighbor decision rule in the set:

p0 = {(x1)yl)’ (x2,y2), “'v (xnciync)}

, such that
y ym’ (A.5)
where
m = arglgisgc H x.- - x H-
7: else

Compute y by using locally weighted regression (LWR) [9] on dataset ’Pé. Let

 

 

 

w = diag( ((,/K d( (u1,u )), )u)\/K(d(u2, ..,(d(\/K (un.,u
Y = (Y1’Y2v' :ynclT

X=(uf m? mqu

Z=WX

V=WY

r3=(Zsz4va

169

 

 

where d(.r, y) denotes the Euclidean distance between vectors :1: and y; K(.) is

a kernel function:

and ’27 is a predeﬁned parameter. Thus, the predicted output for the querying
point x is:

W = uTﬁ (Mi)

8: end if

9: Return 9.

Procedure 3 SelectLeaf. Given an IHDR tree T and a sample (x, y), where y is

either given or not given. Output: the best matched leaf node c.

1: c +— the root node of T.

2: for c is an internal node do

3: c +— the mth child of the node c, where m = arg minls,sq(l,(x)) and b(x),
i = 1,2, ...,q are deﬁned in Eq. (3.10).

4: end for

5: Return node c.

170

 

Bibliography

[1] Orocos@kth. In http://cogvis.nada.kth.se/orocos/.

[2] L. Acosta, G. N. Marichal, L. Moreno, J. A. Méndez, and J. J. Rodrigo. Obstacle
avoidance using the human operator experience for a mobile robot. Journal of
Intelligent and Robotic Systems, 27:305—319, 2000.

[3] B. Adams, C. Breazeal, R. A. Brooks, and B. Scassellati. Humanoid robots: A
new kind of tool. IEEE Intelligent System, 15(4):25—31, July/ August, 2000.

[4] J. S. Albus. Outline for a theory of intelligence. IEEE Trans. Systems, Man
and Cybernetics, 21(3):473—509, May/ June 1991.

[5] N. Almassy, G. M. Edelman, and O. Sporns. Behavioral constraints in the
development of neural properties: A cortical model embedded in a real-world
device. Cerebral Cortex, 8(4):346—361, 1998.

[6] J. R. Anderson. Rules of the Mind. Lawrence Erlbaum, Mahwah, New Jersey,
1993.

[7] R. C. Arkin. Behavior-Based Robotics. The MIT Press, Cambridge, MA, 1998.

[8] H. Asama, M. Sato, L. Bogoni, H. Kaetu, A. Matsumoto, and I. Endo. Devel-
opment of an omni-directional mobile robot with 3 def decoupling drive mech-
anism. In Proc. International Conference on Robotics and Automation, pages
1925 — 1930, May 1995.

[9] C. G. Atkeson, A. W. Moore, and S. Schaal. Locally weighted learning. Artiﬁcial
Intelligence Review, 11(1-5):11—73, 1997.

[10] R. E. Bellman. Adaptive Control Processes. Princeton University Press, Prince-
ton, NJ, 1961.

171

 

 

[11] A. Billard and M. Mataric. Learning human arm movements by imitation:
Evaluation of a biologically-inspired connectionist architecture. In Proc. IEEE
International Conference on Humanoid Robots, Cambridge, MA, September 7-
8, 2000.

[12] C. M. Bishop. Neural Networks for Pattern Recognition. Clarendon Press,
Oxford, 1995.

[13] J. Borenstein and Y. Koren. The vector ﬁeld histogram - fast obstacle avoidance
for mobile robots. IEEE Transactions Robotics and Automation, 7 (3)2341—346,
1991.

[14] C. Breazeal. Social constraints on animate vision. In Proc. IEEE International
Conference on Humanoid Robots, Cambridge, MA, June 20-25, 2000.

[15] C. Breazeal, A. Edsinger, P. Fitzpatrick, B. Scassellati, and P. Varchavskaia.
Social constraints on animate vision. IEEE Intelligent Systems and Their Ap-
plications, 15(4):32—37, 2000.

[16] 0. Brock and O. Khatib. High-speed navigation using the global dynamic
window approach. In International Conference on Robotics and Automation
(ICRAQQ), pages 341—346, Detroit, MI, June 1999.

[17] R. Brockett. Asymptotic stability and feedback stabilization. In R. Brockett,
R. Millman, and H. Sussmann, editors, Diﬂer‘ential Geometric Control Theory,
pages 181—191. Birkhauser, Boston, 1983.

[18] R. A. Brooks. A robust layered control system for a mobile robot. IEEE Journal
of Robotics and Automation, 2(1):14—23, March 1986.

[19] R. A. Brooks, C. Breazeal, M. Marjanovic, B. Scassellati, and M. M. William.
The cog project: Building a humanoid robot. In C. L. Nehaniv, editor, Compu-
tation for Metaphors, Analogy and Agents, vol. 1562 of Springer Lecture Notes
in Artiﬁcial Intelligence. Springer-Verlag, New York, NY, 1999.

[20] G. Campion, B. d’Andrea Novel, and G. Bastin. Controllability and state
feedback stabilization of nonholonomic mechanical systems. In C. Canudas

de Wit, editor, Lecture Notes in Control and Information Science. Springer-
Verlag, New York, 1991.

[21] J. Canny and J. Reif. New lower bound techniques for robot motion planning
problems. In 28th IEEE Symposium on Foundation of Comptuer Science, pages
49—60, Los Angeles, California, Oct. 12-14, 1987.

172

 

 

[22] S. Chen, C. F. N. Cowan, and P. M. Grant. Orthogonal least squares learning
for radial basis function networks. IEEE Transactions on Neural Networks,
2(2):321—355, 1991.

[23] S. Chen and J. Weng. State-based SHOSLIF for indoor visual navigation. IEEE
Trans. Neural Networks, 11(6):1300—1314, November 2000.

[24] Boris V. Cherkassky, Andrew V. Goldberg, and Tomasz Radzik. Shortest paths
algorithms: theory and experimental evaluation. Math. Program., 73(2):129—
174, 1996.

[25] J. H. Chuang. Potential-based modeling of tree dimensional workspace for
obstacle avoidance. IEEE Transactions on Robotics and Automation, 14(5):?78—
785, 1998.

[26] J. Coelho, J. Piater, and R. Grupen. Developing haptic and visual perceptual
categories for reading and grasping with a humanoid robot. In Proc. Ist IEEE
International Conference on Humanoid Robots, Cambridge, MA, September 7-
8, 2000.

[27] D. R. Cox. Statistical analysis of time series: Some recent developments. Scand.
J. Statist., 8(2):93—115, 1981.

[28] E. W. Dijkstra. A note on two problems in connection with graphs. Number.
Math, 1:269—271, 1959.

[29] G. L. Drescher. Made- Up Minds. MIT Press, Cambridge MA, 1991.

[30] A. D’Souza, S. Vijayakumar, and S. Schaal. Learning inverse kinematics. In
International Conference on Intelligence in Robotics and Autonomous Systesm
(IROS 2001), volume v.1, pages 298—303, Maui, Hawaii, October 2001.

[31] R. Duda, P. Hart, and David G. Stork. Pattern Classiﬁcation and Scene Anal-
ysis. John Weily & Sons, New York, NY, second edition, 2000.

[32] E. Eoyama, A. Agah, K. F. MacDorman, T. Maeda, and S. Tachi. A modulator
neural network architecture for inverse kinematics model learning. Neurocom-
puting, 38—40:797—805, June 2001.

[33] M. Fisz. Probability Theory and Mathematical Statistics. John Wiley & Sons
Inc., New York, third edition, 1963.

173

 

 

 

[34]

[35]

[36]

[37]

[38]

[391

[40]

[41]

[42]

[43]

[44]

D. Fox, W. Burgard, and S. Thrun. The dynamic window approach to collision
avoidance. IEEE Robotics and Automation Magazine, 4(1):23—33, 1997.

K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press,
New York, NY, second edition, 1990.

G. H. Golub and C. F. van Loan. Matrix Computations. The Johns Hopkins
University Press, Baltimore, MD, 1989.

JD. Han, SQ. Zeng, K.Y. Tham, M. Badgero, and J.Y. Weng. Dav: A hu-
manoid robot platform for autonomous mental development. In Proc. IEEE 2nd
International Conference on Development and Learning (ICDL 2002), pages
73—81, MIT, Cambridge, MA, June 12—15, 2002.

R. Held and A. Hein. Movement-produced stimulation and the development of
visually guided behaviors. Journal of Comparative and Physiological Psychology,
56:872—876, 1963.

K. Hirai, M. Hirose, Y. Haikaw, and T. akenak. The development of honda
humanoid robot. Proc. of IEEE International Conference on Robotics and Au-
tomation, pages 1321-1326, 1998.

R. Holmberg and O. Khatib. Development and control of a holonomic mo—
bile robot for mobile manipulation tasks. International Journal of Robotics
Research, 19(11):1066 - 1074, 2000.

X. Huang and J. Weng. Novelty and reinforcement learning in the value system
of developmental robots. In Proc. Second International Workshop on Epigenetic
Robotics: Modeling Cognitive Development in Robotic Systems (EPIROB’02),
pages 47—55, Edinburgh, Scotland, August 10 - 11, 2002.

X. Huang and J. Weng. Novelty and reinforcement learning in the value system
of developmental robots. In Proc. Second International Workshop on Epigenetic
Robotics: Modeling Cognitive Development in Robotic Systems (EPIROB 2002),
Edinburgh, Scottland, 2002.

W. Hwang and J. Weng. Vision-guided robot manipulator control as learning
and recall using SHOSLIF. In Proc. IEEE Int ’l Conf. on Robotics and Automa-
tion, pages 2862—2867, Albuquerque, NM, April 20-25, 1997.

W. S. Hwang and J. Weng. Hierarchical discriminant regression. IEEE Trans.
Pattern Analysis and Machine Intelligence, 22(11):1277—1293, 11 2000.

174

 

 

 

[45] Y. K. Hwang and N. Ahuja. A potential ﬁeld approach to path planning. IEEE
Transactions on Robotics and Automation, 8(1):23—32, 1992.

[46] Shun ichi Amari. Learning and statistical inference. In Michael A. Arbib, editor,
The handbook of brain theory and neural networks. The MIT Press, Cambridge,
Massechusetts, second edition, 2003.

[47] R. Ivry. Representation issues in motor learning: phenomena and theory. In
H. Heuer and S. W. Keele, editors, Handbook of perception and action, vol-
umn two, motor skill. Academic Press, London, San Diego, New York, Boston,
Sydney, Tokyo, Toronto, 1996.

[48] M. I. Jordan and D. E. Rumelhart. Forward models: supervised learning with
a distal teacher. Cognitive Science, 16:307—354, 1992.

[49] L. P. Kaelbling, M. L. Littman, and A. W. Moore. Reinforcement learning: A
survey. Journal of Artiﬁcial Intelligence Research, 4:237—285, 1996.

[50] M. Kirby and L. Sirovich. Application of the karhunen-loeve procedure for the
characterization of human faces. IEEE Trans. Pattern Analysis and Machine
Intelligence, 12(1):103-108, January 1990.

[51] T. Kohonen. Self-Organizing Maps. Springer-Verlag, Berlin, second edition,
1997 .

[52] K. Konolige. Colbert: A language for reactive control in saphira. In Proceedings
of the German Conf. on Artiﬁcial Intellgence, F‘reiburg, Germany, 1997.

[53] K. Konolige. Ayllu: Distributed port-arbitrated behavior-based control. In
Proceedings of the Intl. Symp. on Distributed Autonomous Robotic Systems
(DARS), Knoxville, Tennessee, October 2000.

[54] RE. Korf. Real-time heuristic search. Artiﬁcial Intelligence, 42(2-3):189—211,
1990.

[55] S. L. Pallas L. von Melchner and M. Sur. Visual behavior mediated by retinal
projections directed to the auditory pathway. Nature, 404:871—876, 2000.

[56] J. E. Laird, A. Newell, and P. S. Rosenbloom. Soar: An architecture for general
intelligence. Artiﬁcial Intelligence, 33:1—64, 1987.

[57] J. Latombe. Robot motion planning. Kluwer Academic Publishers, Boston, MA,
1991.

175

 

[58]

[59]

[60]

[61]

[62]

[631

[64]

[651

[661

[67]

[68]

T. Lee and C. Wu. Fuzzy motion planning of mobile robots in unknown envi-
ronments. Journal of Intelligent and Robotic Systems, 37:177-191, 2003.

L. Li, B. Cox, M. Diftler, S. Shelton, and B. Rogers. Development of a telep-
resence controlled ambidextrous robot for space applications. Proceedings of
the IEEE International Conference on Robotics and Automation, pages 58—63,
1996.

W. Li, H. Christensen, and A. Oreback. An architecture for indoor naviga-
tion. In Proceedings of the IEEE International Conference on Robotics and
Automation, New Orleans, LA, April. 26-30, 2004.

H. Lim, A. Ishii, and A. Takanishi. Motion pattern generation for emotion
expression. In Proc. 2nd International Conference on Humanoid Robots, pages
36—61, Tokyo, Japan, October 8—9, 1999.

Chin-Tang Lin and CS. George Lee. Neural Fuzzy Systems: A Neuro-Fuzzy
Synergism to Intelligent Systems. Prentice-Hall PTR, 1996.

C. Lovchik, H. Aldridge, and M. Diftler. Design of the nasa robonaut hand. In
Proc. of the ASME Dynamic Systems and Control Division, volume DSC—Vol.
67, pages 823—830, 1999.

K. V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate Analysis. Academic
Press, London, New York, 1979.

M. J. Marjanovic, B. Scassellati, and M. M. Williamson. Self-taught visually-
guided pointing for a humanoid robot. In From Animals to Animats: Proc. of
1996 Society of Adaptive Behavior, pages 35—44, Cape Cod, MA, 1996.

J. Minguez and L. Montano. Nearness diagram navigation (nd): A new real time
collision avoidance approach. Proc. of IEEE/RSJ International Conference On
Intelligent Robots and Systems, pages 2094—2100, 2000.

J. Minguez, L. Montano, and J. Santos-Victor. Reactive navigation for non-
holonomic robots using the ego—kinematic space. In International Conference
on Robotics and Automation (ICRA2002), pages 3074—3080, Washington D. C.,
May 11-15, 2002.

A. N agakubo, Y. Kuniyoshi, and G. Cheng. Etl-humanoid - a high-performance
full body system for versatile action. Proc. of IEEE/RSJ International Confer-
ence 0n Intelligent Robots and Systems, pages 1087—1092, 2000.

176

 

[69] K. Nishiwaki, T. Sugihara, S. Kagami, F. Kanehiro, M. Inaba, and H. Enoue.
Design and development of research platform for perception-action integration
in humanoid robot : H6. Proc. of IEEE/RSJ International Conference On
Intelligent Robots and Systems, pages 1559—1564, 2000.

[70] L. Petersson, D. Austin, and H. Christensen. Dca: A distributed control ar-
chitecture for robotics. In International Conference on Intelligence in Robotics
and Autonomous Systesm (IROS 2001), volume v.1, pages 2361—2368, Maui,
Hawaii, October 2001.

[71] J. Piaget. The origins of intelligence in children. International Universities
Press, Madison, NY, 1952.

[72] J. Piaget. The construction of reality in the child. Basic Books, New York, NY,
1954.

[73] J. Piaget. The Psychology of Intelligence. Littleﬁeld & Adams, Totowa, NJ,
1973.

[74] D. A. Pomerleau. Efﬁcient training of artiﬁcial neural networks for autonomous
navigation. Neural Computation, 3(1):88—97, 1991.

[75] L. R. Rabiner. A tutorial on hidden Markov models and selected applications
in speech recognition. Proceedings of IEEE, 77(2):257—286, 1989.

[76] M. Rosenblum and Larry S. Davis. An improved radial basis function net-

work for visual autonomous road following. IEEE Trans. on Neural Networks,
7(5):1111—1120, 1996.

[77] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally
linear embedding. Science, 290:2323—2326, December 22, 2000.

[78] D. Roy, B. Schiele, and A. Pentland. Learning audio—visual associations using
mutual information. In Proc. International Conference on Computer Vision,
Workshop on Integrating Speech and Image Understanding, Corfu, Greece, 1999.

[79] S. Russell and P. Norvig. Artiﬁcial Intelligence: A Modern Approach. Prentice-
Hall, Upper Saddle River, NJ, second edition, 2003.

[80] N. Sarkar, X. Yun, and V. Kumar. Control of mechanical systems with rolling
constraints: Application to dynamic control of mobile robots. The International
Journal of Robotics Research, 13(1):55—69, February 1994.

177

[81] L. L. Scharf. Statistical Signal Processing: Detection, Estimation, and Time
Series Analysis. Addison-Wesley Publishing Company, Reading, Massachusetts,
1991.

[82] Wei-Min Shen. Autonomous Learning from the Environment. Computer Science
Press, New York, 1994.

[83] S. Shigeki and K. Ichiro. Wabot-2: Autonomous robot with dexterous ﬁnger-
arm coordination control in keyboard performance. In Proc. of IEEE Interna-
tional Conference on Robotics and Automation, pages 90—97, 1987.

[84] R. Simmons. The curvature—velocity method for local obstacle avoidance. In
IEEE International Conference on Robotics and Automation ICRA ’96, pages
2275—2282, Minneapolis, April 1996.

[85] HA. Simon. The sciences of the Artiﬁcial. MIT Press, Cambridge, MA, third
edition, 1996.

[86] S. Singh, D. Feng, P. Keller, G. Shaffer, W. Shi, D. Shin, J. West, and B. Wu.
A system for fast navigation of autonomous vehicles. Technical Report CMU-
RI—TR-91-20, The RoboticsCarnegie Institute, Mellon University, Pittsburgh,
PA, 1991.

[87] J. E. Slotine and W. Li. Applied Nonlinear Control. Prentice Hall, Englewood
Cliffs, New Jersey, 1991.

[88] A. J. Smola and B. Scholkopf. A tutorial on support vector regression. Technical
Report NeuroCOLT Technical Report NC-TR-98-030, Royal Holloway College,
University of London, UK, 1998.

[89] M. W. Spong and M. Vidyasagar. Robot Dynamics and Control. John Wiley &
Sons, New York, ﬁrst edition, 1989.

[90] M. Stanisié, J. Wiitala, and J. Feix. A dexterous humanoid shoulder mechanism.
Journal of Robotic Systems, 18(12):737—745, 2001.

[91] R. S. Sutton and A. Barto. Reinforcement Learning. The MIT Press, Cam-
bridge, MA, 1998.

[92] S. Thrun. Explanation-Based Neural Network Learning: A Lifelong Learning
Approach. Kluwer Academic, Boston, MA, 1996.

178

' ".‘Ylu "'- v'

 

[93] R. Tibishirani. Regression shrinkage and selection via the lasso. Journal of the
Royal Statistical Society. Series B (Methodological), 58(1):267-288, 1996.

[94] M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive
Neuroscience, 3(1):71—86, 1991.

[95] Iwan Ulrich and Johann Borenstein. th*: Local obstacle avoidance with look-
ahead veriﬁcation. In International Conference on Robotics and Automation
(ICRA2000), pages 2505—2511, San Francisco, CA, April 2000.

[96] Richard T. Vaughan, Brian Gerkey, and Andrew Howard. On device abstrac-
tions for portable, reusable robot code. In International Conference on Intel-
ligence in Robotics and Autonomous Systesm (IROS 2003), pages 2121—2427,
Las Vegas, Arizona, October 2003.

[97] S. Vijayahumar and S. Schaal. Real time learning in humanoids: A challenge
for scalability of online algorithms. In Proc. IEEE International Conference on
Humanoid Robots, Cambridge, MA, September 7-8, 2000.

 

[98] S. Vijayakumar, A. D’Souza, T. Shibata, J. Conradt, and S. Schaal. Statistical
learning for humanoid robots. Autonomous Robot, 12(1):55—69, 2002.

[99] S. Vijayakumar and S. Schaal. Locally weighted projection regression: An 0(n)
algorithm for incremental real time learning in high dimensional space. In Proc.
of Seventeenth International Conference on Machine Learning (ICML2000),
pages 1079—1086, Stanford, CA, June 19-July 2, 2000.

[100] L. S. Vygotsky. Thought and language. MIT Press, Cambridge, Massachussetts,
1962. trans. E. Hanfmann & G. Vakar.

[101] M. Wada and S. Mori. Holonomic and omnidirectional vehicle with conventional
tires. In Proc. International Conference on Robotics and Automation, pages
3671 - 3676, April 1996.

[102] I. Wallance, D. Klahr, and K. Bluff. A self-modifying production system of cog-
nitive development. In D. Klahr, P. Langley, and R. Neches, editors, Production
System Models of Learning and Development, pages 359—435. MIT Press, Cam-
bridge, MA, 1987.

[103] C. Watkins. Learning from delayed rewards. Technical report, PhD thesis,
King’s College, Cambridge, England, 1989.

179

[104] C. Watkins. Q-learning. Artiﬁcial Intelligence, 8:55—67, 1992.

[105] J. Weng. The living machine initiative. Technical Report CPS 96—60, De-
partment of Computer Science, Michigan State University, East Lansing, MI,
December 1996. A revised version appeared in C. W. Chen and Y. Q. Zhang.

Visual Communication and Image Processing, Marcel Dekker, NY, 431 - 487,
1998.

[106] J. Weng. Learning in image analysis and beyond: Development. In C. W. Chen
and Y. Q. Zhang, editors, Visual Communication and Image Processing, pages
431 — 487. Marcel Dekker, New York, NY, 1998. A revised version from “Living
Machine Initiative,” MSU CPS Tech. Report CPS-96-60, 1996.

[107] J. Weng. A theory for mentally developing robots. In Proc. IEEE 2nd Interna-
tional Conference on Development and Learning (ICDL 2002), pages 131—140,
MIT, Cambridge, MA, June 12-15, 2002.

[108] J. Weng, N. Ahuja, and T. S. Huang. Learning recognition using the Cres-
ceptron. International Journal of Computer Vision, 25(2):109—143, November
1997.

[109] J. Weng and S. Chen. Vision-guided navigation using SHOSLIF. Neural Net-
works, 11:1511—1529, 1998.

[110] J. Weng and S. Chen. Visual learning with navigation as an example. IEEE
Intelligent Systems, 15:63—71, September/ October 2000.

[111] J. Weng and W. Hwang. An incremental learning algorithm with automatically
derived discriminating features. In Proc. Asian Conference on Computer Vision,
Taipei, Taiwan, Jan. 8-9, 2000.

[112] J. Weng and W. Hwang. Online image classiﬁcation using IHDR. International
Journal on Document Analysis and Recognition, 5(2-3):118—125, 2002.

[113] J. Weng and W. Hwang. Online image classiﬁcation using ihdr. International
Journal on Document Analysis and Recognition, 5(2-3):118—125, 2003.

[114] J. Weng and W. S. Hwang. An incremental learning algorithm with automati-
cally derived discriminating features. In Proc. Asian Conference on Computer
Vision, pages 426 — 431, Taipei, Taiwan, January 8-9, 2000.

180

 

[115] J. Weng, W. S. Hwang, Y. Zhang, and C. Evans. Developmental robots: Theory,
method and experimental results. In Proc. 2nd International Conference on
Humanoid Robots, pages 57-64, Tokyo, Japan, October 8-9, 1999. IEEE Press.

[116] J. Weng, W. S. Hwang, Y. Zhang, C. Yang, and R. Smith. Developmental
humanoids: Humanoids that develop skills automatically. In Proc. First IEEE
Conf. on Humanoid Robots, MIT, Cambridge, MA, September 7-8, 2000. IEEE

Press.

[117] J. Weng, J. McClelland, A. Pentland, O. Sporns, I. Stockman, M. Sur, and
E. Thelen. Autonomous mental development by robots and animals. Science,
291(5504):599—600, JAN. 2000.

[118] J. Weng, J. McClelland, A. Pentland, O. Sporns, I. Stockman, M. Sur, and
E. Thelen. Autonomous mental development by robots and animals. Science,
291:599—600, January 26, 2001.

[119] J. Weng, Y. Zhang, and W. Hwang. Candid covariance-free incremental prin-
cipal component analysis. IEEE Trans. Pattern Analysis and Machine Intelli-
gence, 25(8), 2003.

[120] M. West and H. Asada. Design of ball wheel mechanisms for omnidirectional
vehicles with full mobility and invariant kinematics. Journal of Mechanical
Design, 119:153—161, 1997.

[121] D. J. Wood, J. S. Bruner, and G. Ross. The role of tutoring in problem-solving.
Journal of Child Psychology and Psychiatry, pages 89—100, 1976.

[122] S. L. Zeger and B. Qaqish. Markov regression models for time series: A quasi-
likelihood approach. Biometrics, 44:1019—1031, 1988.

[123] S. Zeng and J. Weng. Obstacle avoidance through incremental learning with
attention selection. In Proceedings of the IEEE International Conference on
Robotics and Automation, volume v.1, New Orleans, LA, April. 26-30, 2004.

[124] S. Zeng and J. Weng. Online-learning and attention—based obstacle avoidance
using a range ﬁnder. In Proceedings of the 17th International FLAIRS Confer-
ence, Miami Beach, FL, May 17-19, 2004.

[125] S.Q. Zeng, D. M. Cherba, and J. Weng. Dav developmental humanoid: The
control architecture and body. In IEEE/ASME International Conference on
Advanced Intelligent Mechatronics, pages 974—980, Kobe, Japan, July 2003.

181

 

[126]

[127]

S.Q. Zeng and Y. B. He. Learning and tuning fuzzy logic controllers through
genetic algorithm. In IEEE International Conference on Neural Networks, vol-
ume 3, pages 1632—1637, Orlando, Florida, June 1994.

N. Zhang and J. Weng. A developing sensory mapping for robots. In Proc. IEEE
2nd International Conference on Development and Learning (ICDL 2002),
pages 13-20, MIT, Cambridge, MA, June 12-15, 2002.

[128] Y. Zhang and J. Weng. Complementary candid incremental principal compo-

[129]

[130]

[131]

[132]

nent analysis. Technical Report MSU-CSE—01-24, Department of Computer
Science and Engineering, Michigan State University, East Lansing, MI, August
2001.

Y. Zhang and J. Weng. Convergence analysis of complementary candid in-
cremental principal component analysis. Technical Report MSU-CSE—01-23,
Department of Computer Science and Engineering, Michigan State University,
East Lansing, MI, August 2001.

Y. Zhang and J. Weng. Grounded auditory development by a developmental
robot. In Proc. INNS—IEEE International Joint Conference on Neural Networks,
pages 1059-1064, Washington, DC, July 14-19, 2001.

Y. Zhang and J. Weng. Action chaining by a developmental robot with a
value system. In Proc. IEEE 2nd International Conference on Development
and Learning (ICDL 2002), pages 53—60, MIT, Cambridge, MA, June 12-15,
2002.

Y. Zhang and J. Weng. Chained action learning through real-time interactions.
In Proc. IEEE Int ’l Conf. on Neural Networks, Honolulu, USA, May 2002.

182

 

 

   

1l]]i]]]l][[]][1[[1,1]]

— ~‘-~'.—-I—-_—. (.