DAV: A HUMANOID PLATFORM AND DEVELOPMENTAL LEARNING WITH CASE STUDIES

By

Shuqing Zeng

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Computer Science and Engineering

2004

ABSTRACT

DAV: A HUMANOID PLATFORM AND DEVELOPMENTAL LEARNING WITH CASE STUDIES

By

Shuqing Zeng

Motivated by the autonomous development process of humans, we are interested in building a robot that learns online in real time by interacting with the environment through sensors and effectors, and that develops a representation of the environment and tasks autonomously. We call such a robot a developmental robot. It should satisfy eight challenging requirements: environment openness, high-dimensional sensors, completeness in using sensory information, online processing, real-time speed, incremental processing, performing while learning, and handling increasingly more complex tasks.

To partially satisfy these challenging requirements for a developmental robot, this research contributes:

• Building the body of a humanoid robot, called Dav. Embodiment is the most overlooked aspect of human intelligence in traditional artificial intelligence. The morphology of the robot is preferably of a humanoid form: this human-like shape may allow humans to interact with the robot more naturally, and it provides physical constraints similar to those humans experience during such interactions. Furthermore, directly situating the robot in the physical world relieves the need for a symbolic intermediary representation; the actual physical world serves as this representation. Therefore, as a part of this research, we built the Dav robot, which is currently the only untethered mobile humanoid at a university in the United States.

• Creating a developmental architecture and a modified version of the incremental hierarchical discriminant regression (IHDR) algorithm. This thesis presents a theory of developmental mental architecture. Five architecture types, from the simplest Type-1 (observation-driven Markov decision process) to Type-5 (DOSASE MDP), are introduced. Further, we present the architecture design of the Dav robot. The framework of the Dav architecture is hand-designed, but the actual controller is developed, i.e., generated autonomously by the developmental program through real-time interactions with the real physical environment. We present the Dav architecture and the major components that implement it.

• Designing and implementing a range-based navigation system, based on the developed robotic platform and the proposed architecture.
ACKNOWLEDGMENTS

I am indebted to my advisor, Professor John J. Weng, who has been the guiding force of my professional development and an inexhaustible source of ideas throughout my Ph.D. program at Michigan State University. Without his kind assistance and advice, this dissertation would not have been completed.

I am grateful to Professors Anil K. Jain, George C. Stockman, Zhengfang Zhou and Ning Xi for serving on my committee and providing critical comments on my research work.

Dav has been an enormous scientific project which has benefited from the efforts and ideas of many people. This project simply could never have happened without Professor John J. Weng. Thank you, John, for offering me the opportunity to build a system from scratch. Thanks also to the members of the Dav team, past and present: Dr. Jianda Han, Jingliang Wang and David M. Cherba. The four of us have built extensively on the many previous iterations of hardware and software design. In particular, David has been invaluable in making the torso and arms a usable resource, and in helping me test the embedded distributed system, the mechanical system and the software system.

Thanks also to the members of the EI Laboratory, past and present, at Michigan State University. I benefited from many discussions with my colleagues. Thanks go to Wey-Shiuan Hwang, Yilu Zhang, Nan Zhang, Xiao Huang, Yi Chen, Ameet Joshi, Jason Massey, Raja Ganjikunta, Gil Abramovich, Christopher Osborn, Tham Kengyew, and David Bordoley.

I am greatly indebted to my parents for their continuous encouragement and support throughout my school years. Thanks also go to my wife, Yanhua Chen, for her patience and support during the years at Michigan State University.

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

1 Introduction
  1.1 Background
    1.1.1 Learning, Perception and World Models
    1.1.2 Actions through Action Practice
    1.1.3 Perception-guided Actions
    1.1.4 World-model Free Approach
    1.1.5 Autonomous Mental Development
  1.2 Outline of the Dissertation

2 The Body Design of the Dav Humanoid Robot
  2.1 The Body Requirement for a Developmental Robot
  2.2 Mechanism and Kinematic Analysis
    2.2.1 Mobile Base
    2.2.2 Torso
    2.2.3 Neck and Head
    2.2.4 Arm
  2.3 Dynamic Model and Control
    2.3.1 Lagrange-Euler Formulation
    2.3.2 Analysis of Manipulator with n-link
  2.4 The Low-level Control Scheme
    2.4.1 Model of Motor
    2.4.2 Tracking Algorithm
  2.5 Dav's Mobile Base: Control of System with Nonholonomic Constraints
    2.5.1 Nonholonomic Constraint
    2.5.2 Dynamic Analysis of Dav's Mobile Base
    2.5.3 Controller Design
    2.5.4 Experimental Evaluation
  2.6 The Distributed Control System Design
    2.6.1 Hardware Design
    2.6.2 Software Development
  2.7 Experiment
  2.8 Summary
3 Dav Architecture Design: A Theory of Developmental Mental Architecture
  3.1 Introduction
  3.2 AMD Paradigm
  3.3 Observation-driven Markov Processes
    3.3.1 Type-1: Observation-driven MDP
    3.3.2 Type-2: Observation-driven Selective MDP
    3.3.3 Type-3: Observation-driven Selective Rehearsed MDP
    3.3.4 Type-4: Observation-driven SASE MDP
    3.3.5 Type-5: Developmental Observation-driven SASE MDP
    3.3.6 Type-6: Multi-level DOSASE MDP
  3.4 Dav Architecture
    3.4.1 Open view
    3.4.2 Recursive view
    3.4.3 Architecture view
  3.5 Major Components
    3.5.1 Sensory mapping
    3.5.2 Complementary Candid Incremental Principal Component Analysis
    3.5.3 The cognitive mapping: IHDR tree
    3.5.4 Motor mapping
    3.5.5 Innate value system
    3.5.6 Sensorimotor algorithm
  3.6 Summary

4 Perception-driven Control Architecture
  4.1 Introduction
  4.2 Control Architecture
  4.3 Dav's Range-based Navigation Experiment
    4.3.1 Wall Following Behaviors
  4.4 Summary and Future Work

5 Global Obstacle Avoidance through Real-time Learning
  5.1 Introduction
  5.2 Related Work
  5.3 Approach
    5.3.1 Obstacle avoidance
    5.3.2 Attentional mechanism
    5.3.3 Path planner
    5.3.4 Cost function
  5.4 The Robotic System and Online Training Procedure
    5.4.1 Online incremental training
  5.5 Experiments and Results
    5.5.1 Attention mechanism
    5.5.2 Obstacle avoidance on the Dav robot
    5.5.3 Integration with the path planning layer
  5.6 Discussion and Conclusion

6 Conclusions
  6.1 Concluding Remarks
  6.2 Future Work

APPENDICES
  A IHDR algorithm

LIST OF FIGURES

2.1 Dav: a mobile humanoid robot.
2.2 Location of degrees of freedom for Dav.
2.3 The SolidWorks rendering of Dav's mobile base.
2.4 The offset steered driving wheel. The steering pivot's speed can be in an arbitrary direction.
2.5 The top view of the mobile base. The small black dots are pivot axes about which the wheels can turn, and the small solid squares denote contact points between the wheels and the ground.
2.6 CAD drawing of the torso. Joint 3 and Joint 4 are the first joints of the left arm and the right arm respectively. The unit is mm.
2.7 Coordinate frames of the torso. Frames ox2y2z2, ox3y3z3 and ox4y4z4 are end-effector frames for mounting the left arm, right arm, and neck respectively.
2.8 CAD drawing of the neck and head module. The unit is mm.
2.9 Coordinate frames of the neck and head. Frames ox5y5z5 and ox5'y5'z5' are end-effector frames for the left and right cameras respectively.
2.10 Dav's left arm. The unit is mm. The actuator for Joint 1 is housed in the torso module.
2.11 Coordinate frames of the left arm. Two spherical mechanisms (shoulder and wrist) are designed.
2.12 Block diagram for a DC motor.
2.13 A rolling disk on a plane. The disk cannot move along the direction perpendicular to its face at any instant.
2.14 If q̇ is away from the distribution Δ, the extra term Δτ in Eq. (2.58) will pull it back.
2.15 Schematic diagram of the control algorithm of Dav's base.
2.16 The result of the simulated base experiment.
2.17 The hardware architecture for Dav has two levels: a main computer system and low-level servo control modules.
2.18 The "brain" of Dav, a quadruple-processor desktop PC. The SCSI RAID0 disk array is not included.
2.19 The block diagram of the PowerPC-based controller.
2.20 CANPC: a PC104 embedded system running the RealTime Linux operating system with four CAN interfaces. (a) PC104 computer board (MZ104+). (b) PCM3680 dual-CAN interface board.
2.21 The software architecture of Dav's control system.
2.22 The movement of Dav's neck and head modules. The first, second and third rows show the rotations of the neck's roll, pitch, and yaw joints respectively; the fourth and fifth rows show the tilt and pan motions of the eyeballs respectively.
2.23 The movement of Dav's mobile base.
2.24 The movement of the arm's Joint 1, Joint 3 and Joint 4, from the top row to the bottom.
3.1 The autonomous development paradigm.
3.2 The Type-1 architecture of a multi-sensor multi-effector agent: observation-driven Markov decision process. Each square in the temporal streams denotes a smallest admissible mask. The Type-1 architecture takes the entire image frame without applying any mask. The block marked with L is a set of context states (prototypes), which are clusters of all observed context vectors l(t).
3.3 Progressive addition of architecture components for Type-2 to Type-5. Type-2: adding T and En. Type-3: adding M and En. Type-4: adding S_i and primed sensation. The block marked with D is a delay module, which introduces a unit-time delay. Type-5: developmental T, R, M and V.
3.4 The Type-6 architecture.
3.5 An outline view of the Dav architecture. The closed-loop control is realized by the sensors that sense each effector, as well as a set of rich sensors that sense the internal and external environments. w denotes the working memory and l is the long-term memory.
3.6 An open view of a sensorimotor system. A circle designates a local context state determined by the last context L. An inward solid arrow indicates a sensory input (including internal and external sensors) at each time frame. An outward solid arrow indicates an output at that state. A dashed arrow indicates an experienced transition between states. The lower part corresponds to the reality mapping R and the upper part corresponds to the priming mapping F. The two generally do not have the same number of states. The priming mapping is used to predict farther future contexts.
3.7 A temporal trajectory view of a sensorimotor system. Each circle represents a context state at a particular time. A shaded circle corresponds to a visited state. A dashed arrow indicates a priming of a context that corresponds to the circle that the arrow points to. The priming arrows from a state are updated according to the real experience. An actually visited state is not necessarily primed.
3.8 A recursive view of a simplified sensorimotor system. Not all components are shown.
3.9 A block diagram of the architecture of a sensorimotor system. The lower cognitive mapping realizes the reality mapping and the upper one realizes the priming mapping. GK: gate keeper, an internal effector to actively control the update of the last context.
3.10 The spatiotemporal organization of areas in the sensory mapping. The ellipses represent receptive fields covering both space and time. Areas are organized in a hierarchical way, and the output of an earlier (lower order) neural area is used as input to the later (higher order) neural area. Along the pathway of information processing, a neuron in a later area has a larger receptive field than one in an earlier area. In order to make the correct signal available at the right time on the right input line, we use a time-shifting technique. Acting as a time delay unit, the shifter moves each signal to the next line at each time instant.
3.11 CCIPCA algorithm [119]: compute the first k dominant eigenvectors, v1(n), v2(n), ..., vk(n), directly from u(n), where n = 1, 2, ....
3.12 Y-clusters in space Y and the corresponding X-clusters in space X. Each sample is indicated by a number which denotes the order of arrival. The first- and second-order statistics are updated for each cluster. The first statistic gives the position of the cluster, while the second gives the size, shape and orientation of each cluster.
3.13 The autonomously developed IHDR tree [114,123]. Each node corresponds to a local model and covers a certain region of the input space. A higher-level node covers a larger region, which may be partitioned into several smaller regions. Each node has its own Most Discriminating Feature (MDF) subspace D which decides its child models' activation level.
3.14 Regions in the input space marked i.j, where i is the index of the parent region and j is the index of the child region. The leaves of the tree represent the finest partition of the space. The decision boundary of each region is of quadratic form.
3.15 (a) Fits using the nearest-neighbor rule. (b) Fits using locally weighted regression. (c) Fits using kernel regression. (a) and (b) are adapted from Atkeson et al. [9].
3.16 The result of IHDR on an artificial data set.
3.17 (a) The learning curves of the artificial data set. (b) The execution time.
3.18 The motor mapping as a reverse application of the sensory mapping, but with signal reconstruction.
3.19 An illustration of a simple motor mapping for a single effector. The left is a gating mechanism and the right is a subsumption mechanism.
3.20 A hierarchical developmental architecture (Type-6) for SAIL procedure learning. Each Level Building Element (LBE) corresponds to a sensorimotor system.
4.1 The decision process of an agent. The agent tries to use the last context l(t) = (x(t), a(t)) to predict the future contexts: primed sensation xp(t) and primed action ap(t).
4.2 The architecture of the robot controller. Note that symbols a(t), x(t) and l(t) denote the action, sensation, and context at time t, respectively. The internal representation R is realized by IHDR. The world sensors include the range finder and cameras. Block V denotes the innate value system.
4.3 Flow chart of the learning procedure: the system learns while performing.
4.4 A map of the training site on the second floor of the Engineering Building at Michigan State University. The test loop (desired trajectory) is indicated by the thick solid lines. 5 different corners are denoted by bold-faced Arabic numbers.
4.5 A subset of training samples. The red arrows denote the labeled heading for the robot.
4.6 The graphical user interface used to train the robot online. The left window shows the current laser map in the robot's local frame. The trainer may impose an action by clicking the mouse button in the window. The red arcs illustrate the possible circular paths that the robot may take. The right window shows the primed range image (retrieved from the IHDR tree).
4.7 The error histogram of the leave-one-out test.
4.8 The error histogram of the testing data set collected from the third floor.
4.9 A map of the test site on the third floor of the Engineering Building at Michigan State University. The thick solid lines denote the trajectory along which the robot traveled.
5.1 The overall architecture of the range-based navigation. "Attention" denotes an attentional module.
5.2 Why attention is needed. The solid small circles denote the robot, with a short line segment at the front indicating orientation. Thick lines mark the walls along the corridor. T and r̄ denote the threshold and the mean, respectively. (a2), (b2) and (c2) are range images taken by the robot in the three scenarios, respectively. (a3), (b3) and (c3) are the corresponding images after passing the attentional module. The diagrams in the lower two rows use logarithmic scales for the Y-axis. We can see that the distance between (a3) and (b3) becomes larger while the distance between (b3) and (c3) becomes smaller.
5.3 The attentional signal generator f and the attentional executor g. r(t) and z(t) denote the input and output, respectively.
5.4 (a) shows that the pre-attentional map r is approximated by a polygon P. P is specified by h end points: p1, p2, ..., ph. The kth end point is denoted by (lk, ak). (b) shows the post-attentional approximation P', after the attentional function g. Those end points whose distances are larger than T are set to T. The half-circle with symbol T shows the area the robot pays attention to. We can see that numerous pre-attentional approximations map to a single post-attentional approximation P'. It is clear that the points outside the half-circle of (a) have the freedom to change positions without affecting the shape of (b).
5.5 The occupancy grid of an environment. Black cells denote occupied cells, while the others are free. Each cell has eight directly connected neighbors: N, NE, E, SE, S, SW, W and NW, each of which corresponds to a heading angle θ. The optimal path from S to G can be encoded by the heading sequence: (E, E, E, NE, NE, NE, NE, N, N, N).
5.6 The result of tests in a simulated environment.
5.7 The local coordinate system and control variables. Once the mouse button is clicked, the position of the mouse pointer P provides an imposed action. Let r denote the radius of the arc. The arc's length d controls the translation speed while the radius controls the angular speed.
5.8 The 16 scenarios used to train the simulated robot. The small solid circles denote the robot, and the solid dark lines represent the walls. The dotted lines record the online training trajectories.
5.9 The solid dark lines denote walls and the dotted lines show the trajectory. Obstacles of irregular shape are scattered about the corridor. (a) A 10-minute run by the simulated robot with the attentional module. (b) The result of the test without attentional selection. Two collisions, indicated by arrows, occurred during the first two minutes of simulation.
5.10 Dav moving autonomously in a corridor crowded with people.
5.11 Navigating in a simulated environment with unknown obstacles. The initial configuration is chosen to be about (4.8m, 3.0m). The goal configurations G1, G2, G3 and G4 are chosen to be (37.1m, 25.0m), (29.6m, 24.3m), (18.6m, 27.6m) and (16.2m, 25.6m), respectively. For clarity, we do not draw the portion of the trajectory very close to the goals.
5.12 Dav running in the northwest corner of the corridor on the second floor of the Engineering Building at Michigan State University. The red curve (labeled with "A") ...

LIST OF TABLES

2.1 The specification and installed sensors on the Dav robot.
2.2 D-H geometric parameters. The joint variables are marked with *. The unit is millimeters. (a) gives the parameters for the left arm, (b) for the right arm, and (c) for the neck. θ1 and θ2 denote the angles of joint 1 and joint 2 respectively.
2.3 D-H geometric parameters of the neck and head mechanism. The joint variables are marked with *. The unit is millimeters. θ1, θ2, θ3 denote the angles of the neck's roll, pitch, and yaw joints respectively. Frames 4 and 4' are the end-effector frames of the left and right eyeballs respectively. Note that we do not consider the two pan joints in the head.
2.4 D-H geometric parameters of the arm mechanism. The joint variables are marked with *. The unit is millimeters. θi denotes the angle of joint i, for i = 1, ..., 7.
4.1 Differences between the DPDCA controller and the traditional controller.
5.1 The lower bound of the reduction ratio at several parametric settings.
5.2 The results of tests with random starting positions.
5.3 The results of tests with random starting positions.
Chapter 1

Introduction

1.1 Background

Industrial robots have been widely and successfully used in controlled industrial settings for a variety of tasks. Great challenges remain, however, for robots working in general human environments, which are characterized by unknown, unpredictable and changing objects, and by tasks that cannot be well specified algorithmically. We call these muddy tasks. Vision, audition and language are at the forefront of the cognitive capabilities required for muddy tasks. Autonomous mental development appears to offer a possible way for machines to handle them [117].

Not until the late 1980s did embodiment receive attention in the artificial intelligence (AI) community, when it was popularized by the behavior-based approach [7,18]. Action generation has been fueled by impressive advances in robot construction, especially in humanoid platforms. The first humanoid robot was "Elektro the Motorman" at the 1939 New York World's Fair. It was a legged humanoid robot that "talked," moved its limbs, and played some arithmetic tricks. With the arrival of computers, more impressive humanoid robots have been constructed to demonstrate specially designed skills. Wabot-2 [83] (1980-1984) was designed and built solely to read musical notes and play the piano. WABIAN [61] was a humanoid robot capable of walking and dancing. An inspiring engineering achievement in humanoids is the P2 robot from Honda, which is capable of walking and climbing stairs [39]. More recent humanoid systems include H6 [69], ETL-Humanoid [68], and Robonaut [59]. These projects primarily focused on the challenges of mechatronic design, the integration of anthropomorphic bodies, and programming for generating action sequences.

1.1.1 Learning, Perception and World Models

Actions can be programmed in or learned through practice. However, an action is not very useful if the robot lacks a perception capability and does not know in what environmental context to produce the action. Perception is well known to be very challenging, and typically a world model is predesigned. What are the limitations of such a hand-crafted model? Since these issues are closely related to robot construction, we discuss them in this section.

1.1.2 Actions through Action Practice

Instead of programming actions into a robot, another category of efforts aims to train the robot to produce desired behaviors. With redundant degrees of freedom, it is challenging to learn behaviors, especially when training time is limited or when training must be conducted in real time. Such projects include LWPR on a SARCOS robot by Schaal et al. [97], the simulated humanoid with 37 degrees of freedom by Mataric et al. [11], and the scheduling of degrees of freedom by Grupen et al. [26], motivated by early child motor development.

1.1.3 Perception-guided Actions

In contrast with the above efforts, which concentrate on behavior generation without requiring sophisticated perception, a series of efforts deals with perception and perception-guided behaviors. The main challenging perceptual sensing modalities include vision, audition, and high-dimensional touch with many touch elements (including range sensing). No matter how complex a behavior (action sequence) is, the behavior is useful only when the robot can produce it in the correct perceptual context and suppress it in all other contexts. Studies of perception-driven behaviors have had a long history.
A well-practiced approach is for a human programmer to define features (e.g., edges, colors, tones, etc.) or environmental models for the system. For example, the recent works on P2 from Honda, Cog [19] and Kismet [14] produced impressive perception-driven behaviors. Some speech and visual features have been used to establish audio-visual associations via an active camera on a robot hand [78]. Programming perceptual capability using human-defined features is a quick way to produce results in a controlled setting. A fundamental limitation is that the resulting robot cannot work well in unknown, partially unknown, or changing environments. Weng & Chen [110] explained the inefficiency of human-defined features, the insufficiency of hand-crafted models, and the difficulty of programming their applicability checker for an unknown environment.

1.1.4 World-model Free Approach

The world-model-free approach has been studied somewhat independently in two research communities: robotics and computer vision. In the robotics community, such methods are called behavior-based methods [18] [7]. The emphasis is on generating behaviors directly from range sensor readings or directly from images [65] [43].

The model-free approach in the computer vision community is called the appearance-based method. It started around 1990 [50] [94] and now appears to be the most popular method in the computer vision community. Appearance-based methods apply statistical tools directly to high-dimensional image vectors, normalized using the mean and variance of the pixels. Thus, an image with m rows and n columns is considered a d-dimensional vector v ∈ R^d, where d = mn. Since high-dimensional statistical tools are applied directly to the vectors in the space R^d, these methods can take into account correlations not only between nearby pixels but also between far-away pixels. The need to process a high-dimensional sensory vector brings out a sharp difference between behavioral modeling and perceptual modeling: the effectors of a robot are all known, but the sensory space R^d is extremely complex and unknown and, therefore, very challenging.
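To make the vector representation concrete, the following minimal sketch (added here for illustration; the function name and interface are ours, not from the original text) flattens an m x n image into a d = mn vector and normalizes it by the pixel mean and variance:

```python
import numpy as np

def image_to_appearance_vector(image):
    """Flatten an m x n image into a d = m*n vector with zero mean
    and unit variance, as used by appearance-based methods."""
    v = np.asarray(image, dtype=np.float64).ravel()  # v in R^d, d = m*n
    v -= v.mean()                 # remove mean (brightness normalization)
    sigma = v.std()
    if sigma > 0.0:
        v /= sigma                # unit variance (contrast normalization)
    return v
```

Statistical tools such as PCA or discriminant analysis are then applied directly to these vectors, which is why correlations between both nearby and far-apart pixels are captured.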
The power of learning directly from entire images has been demonstrated in vision-guided autonomous navigation. Examples include ALVINN [74], which uses multilayer perceptron (MLP) networks; ROBIN [76], which uses radial basis function networks (RBFN); and SHOSLIF [109] and state-based SHOSLIF [23], which use Fisher's multidimensional discriminant analysis to build a regression tree. As shown in the comparison study of SHOSLIF [23,109], statistical regression methods perform significantly better than traditional non-statistics-based networks such as multilayer perceptron and radial basis function networks.

Perception-learning-based action generation relieves the human programmer of the intractable task of programming perception. However, the above learning process is not fully autonomous in the following sense: (1) the human programmer designs a part of the task-specific representation, e.g., features and states; (2) the human programmer has direct control over internal modules during learning. This mode of manual development fundamentally limits the scalability of behavioral and perceptual capabilities.

1.1.5 Autonomous Mental Development

M. Sur et al. [55] found that if the optic nerve circuits carrying vision signals are rewired into the auditory cortex of a young ferret, the auditory cortex, the brain zone usually assigned to sound processing, can develop properties that are otherwise observed only in the visual cortex. This reveals that a developed mammalian brain is a product of active, autonomous and extensive interactions with the environment around it. Motivated by this discovery and others, a new paradigm, autonomous mental development (AMD), was proposed by [118]:

• A human designer designs a robot body according to the general ecological conditions in which the robot will work (e.g., on land or underwater).

• A human programmer designs a task-nonspecific developmental program for the robot.

• The robot starts to run the developmental program and develops its mental skills through real-time, online interactions with the environment, which includes humans.

A robot built in this paradigm is called a developmental robot. Early examples of such developmental robots include the SAIL (short for Self-organizing, Autonomous, Incremental Learner) robot [105,130,132] at Michigan State University and the Darwin V robot [5] at The Neurosciences Institute, San Diego. These robots share the characteristic that system behaviors are generated through online, real-time interactions with the environment.

Motivated by the above-mentioned research, this work follows a new engineering paradigm, autonomous mental development [117], which emphasizes embodiment, environmental openness, completeness in using sensory information, online processing, real-time speed, incremental processing, performing while learning, and scale-up to complex tasks.

This dissertation is based on the research work of a mobile robot project at Michigan State University funded by the DARPA (Defense Advanced Research Projects Agency) DRS program. The purpose of the project is two-fold: to provide a general-purpose, flexible, and dexterous robotic platform from the engineering aspect, and to understand human autonomous mental development from the scientific aspect.

1.2 Outline of the Dissertation

The scope of this dissertation is the building of the body of a humanoid robot and its application to autonomous navigation.

Chapter 2 describes the design of the Dav robot. First, the mechanism and kinematic model of the robot are introduced. To facilitate joint-level control, the robot's dynamic analysis is derived. We then analyze the robot's mobile base in detail, since it is related to the later navigation task. The control system is also presented in this chapter.

Chapter 3 presents a theory of developmental mental architecture. Five architecture types, from the simplest Type-1 (observation-driven Markov decision process) to Type-5 (DOSASE MDP), are introduced. The properties and limitations of each simpler type are discussed before the introduction of the next, more complex one. Furthermore, we present the architectural design of the Dav robot. The framework of the Dav architecture is hand-designed, but the actual controller is developed, i.e., generated autonomously by the developmental program through real-time interactions with the real physical environment. We present the Dav architecture and the major components that realize the architecture. The designed architecture for Dav is the next-generation version of its extensively tested predecessor, the SAIL developmental robot.
Chapter 4 outlines a novel developmental perception-driven control architecture (DPDCA), which is adapted from the overall architecture presented in Chapter 3. Dav's range-based indoor navigation experiment is used to illustrate the power of DPDCA.

Chapter 5 introduces a learning-based approach to obstacle-avoidance behavior with a global planner. The robot operates in a partially known, bounded 2-D environment populated by unknown static or slowly moving obstacles of arbitrary shape. The sensory perception is based on a laser range finder. Local re-planning is implemented to account for unknown and dynamic parts of the environment. To greatly reduce the number of training samples needed, attention is used in learning the obstacle-avoidance behavior. An efficient, real-time implementation of the approach has been tested, demonstrating smooth obstacle-avoidance behaviors in a corridor with a crowd of moving students as well as static obstacles.

Chapter 6 summarizes the contributions of the research and concludes the dissertation with recommendations for the improvement of the Dav robot and for future research work that might be conducted on the humanoid robot.

Chapter 2

The Body Design of the Dav Humanoid Robot

In this chapter, we first discuss the body requirements of a developmental robot. Section 2.2 presents the mechanism and kinematic analysis of the Dav robot, whose overall appearance is shown in Fig. 2.1 and whose detailed specifications are listed in Table 2.1. We derive the dynamic analysis of Dav's serially driven manipulators in Section 2.3 and of the mobile base in Section 2.5. The joint-level control for Dav is presented in Section 2.4. The design of the distributed control system is outlined at the end, in Section 2.6.

2.1 The Body Requirement for a Developmental Robot

The essence of autonomous mental development by machines is the capability of learning directly, interactively, and incrementally from the environment using onboard sensors and effectors. A developmental robot should be a real robot that runs a developmental algorithm and is allowed to learn and practice autonomously in the real physical world.

The goal of the Dav robot project is to provide a next-generation robot platform for research on robot mental development.

Table 2.1: The specification and installed sensors on the Dav robot.

Specification:
  - Mobile humanoid, self-contained
  - 700 mm (L) x 700 mm (W) x 1800 mm (H) body dimensions
  - 242 kg
  - 43 degrees of freedom (DOF). The mechanisms include an 8-DOF drive base, a 2-DOF torso, a 3-DOF neck, a 5-DOF head, two 7-DOF arms, and two hands
  - Main computer: a quadruple Pentium III Xeon desktop with 2 GB of memory and 100 GB of SCSI hard drives
  - 11 Motorola MPC555 embedded processors as low-level Actuator Control Units (ACUs); all ACUs are connected to the main computer via CAN buses
  - Wireless Ethernet: Cisco Aironet 340 series PCI card and access point
  - 2 Hauppauge WinTV PCI cards as image grabbers
  - A Sound Blaster Live! 5.1 from Creative Labs
  - Four 110 Ah lead-acid batteries; 8-hour continuous operation without recharging

Installed sensors:
  - Two color micro CCD cameras (ELMO QN42H)
  - Two microphones with pre-amplifiers
  - Laser range scanner covering 180 deg at 0.5 deg resolution (SICK PLS)
  - Shaft encoder for each DC motor
  - Two acceleration sensors (Analog Devices ADXL202) mounted on the head to mimic the vestibular organ
  - Current sensor or strain gauge to sense the torque at each joint

Figure 2.1: Dav: a mobile humanoid robot.
Because Dav is meant for mental development, its design requirements are not the same as other humanoid robots in the following perspectives: o Mobility. There have been a number of studies that show the importance of autonomous movements in developing the humans’ and animals’ senses of the world and their normal behaviors. “Kitten carousel” [38] especially shows that autonomous mobility is necessary for the kitten to develop the cliff avoidance behavior. Therefore. it appears that autonomous mobility is a necessary condi- tion for mental development. The implication of mobility and autonomy means that the robot must be totally self-contained with no tethers. The amount of power and computational resources for an eight-hour period of actions re— quire batteries of such capacity that a biped robot is not feasible for current 10 technology and a rolling base with high mobility was chosen. Manipulators. A developmental platform should be equipped with dexterous arms and hands. In other words, it should have sufficient degrees of freedom and range of motion. It is desirable for a developmental humanoid to take a human shape so that they can use a wide variety of tools designed for humans. However, the design faces conflicting conditions: power and size. We like to have a powerful limb for a larger payload. The higher the power, the larger the motors and other supporting devices. But the size is limited by a typical human arm size. The electronic boards of each limb must be contained in the same limb. The most challenging activity of all is to satisfy this self-containment requirement for Dav’s hands. We expect that this problem will be better solved if the motors are custom designed. Sensing. Developmental robots grow world representations autonomously from a sensorimotor stream. Also the controller is perceptual driven architecture. Therefore in a general setting, major sensor modalities such as visual, auditory, touch, and somatosensory should be integrated. Computational resource. The developmental program must be able to update its memory for all the sensors within a fraction of a second, e. g., 10ms for sound and lOOms for vision. The control signals of effectors must also be updated at a rate of at least 10Hz. Currently, hardware implementation is not practical for an algorithm that is expected to change very often throughout the research project. Thus, the de- velopmental program is to be implemented by software. The memory of a developmental robot is expected to be very large. Thanks to the logarithmic time complexity of IHDR [44] [114], it is expected that the refresh rate of the developmental program will not slow down too much when 11 the memory size increases. However, this is still an open question, since we need to test a robot with more extensive learning experiences. Power supply. Developmental robots need extensive training and practice. We should have enough battery capacity to allow the robot to operate without charging the batteries frequently. The battery capacity is still an unsolved problem, as with electrical cars. Currently, with a consideration of cost and capacity availability, we still think that a reasonable choice is sealed rechargeable batteries. Wiring. Wiring is a challenging issue for humanoids, especially for develop- mental ones. Due to a very large number of sensors and motors in the limbs, the number of wires required between each joint reaches a few hundred if data processing and computation are centralized at a single location. 
Such a large number of wires causes at least two major problems. First, the rigidity of the wire bundle interferes with dexterous manipulation. Second, the wire bundles cannot sustain repetitive bending during extended usage and will break. We have therefore adopted a network scheme: a distributed embedded control system was designed and implemented. Furthermore, the mechanical parts (especially the joints) are designed in such a way that the wires pass through the mechanical joints with minimal bending, to increase the reliability of the system and reduce maintenance frequency.

• Torque command. In biology, muscles are controlled by the action potentials in motor neurons: the larger the action potential, the more muscle fibers are recruited to contract. Therefore, the action potential can be regarded as a torque command. Torque commands can be easily implemented using the DC motors, except in the mobile base, which is a redundantly actuated dynamic system with nonholonomic constraints. A decoupling controller is proposed to realize torque commands on the mobile base.

2.2 Mechanism and Kinematic Analysis

The Dav robot is designed as a wheeled mobile humanoid with 43 degrees of freedom (DOF) to achieve the above-mentioned requirements. Figure 2.1 shows the body of the Dav robot. It is contained in a volume of 750 (l) x 750 (w) x 1700 (h) mm in the default posture. The DOF distribution is shown in Fig. 2.2. The mechanical body contains over 300 types of parts, with a total of over 900 custom-made parts. All of the mechanical drawings of the custom-made parts were generated with SolidWorks.

Figure 2.2: Location of degrees of freedom for Dav.

Dav consists of a mobile base, a torso, two arms, a head, and a neck. Each module is designed in such a way that it is easy to assemble and maintain. In the following, we present the mechanism and kinematic analysis of these modules in detail.

2.2.1 Mobile Base

High mobility is essential for robots to enlarge their scope of sensing, which is in turn essential for developmental robots to gather rich sensorimotor experiences. However, car-like wheeled robots have nonholonomic constraints that reduce mobility and make planning and learning difficult. Omni-directional vehicles have hence been developed. Omni-directional (holonomic) vehicles can move in any direction immediately from any configuration, and the motion can be decoupled into three independent components (x, y, φ). This feature avoids the complicated maneuvering sequences needed to bring the vehicle to a side position and thus facilitates control and learning.

There are a variety of omni-directional vehicle designs in the literature. These vehicles can be categorized into two classes based on their wheel design: special wheel designs and conventional wheel designs. Most special wheel designs are based on the concept that the robot actively drives in one direction while allowing passive motion in the other [8,120]. However, vehicles based on this holonomic wheel design have the following disadvantages in practical environments:

• A complicated mechanism.

• Limited load capacity because of a reduced contact area.

• Limited accuracy in estimating the pose of the vehicle by dead reckoning, because of the difficulty of determining the passive motion of the wheels.

• Difficulty in coping with uneven floors and other floor irregularities in the case of a four-wheel-drive vehicle without suspension.

• Low clearance from the floor, which makes it hard for this design to deal with step features of the floor (e.g., an exposed thick wire on a carpet).

To overcome these difficulties, the other category (conventional wheel design) has been proposed. For example, Wada & Mori [101] give a design that achieves near omni-directional mobility using two active casters.
To overcome those difficulties, the other category (conventional wheel design) is proposed. For example, Wada & Mori [101] gives a design to achieve near omni- 14 directional mobility by using two active casters. Nomadic XR4000 is another example with four casters [40]. Because of its high load capacity, simple mechanical design, and tolerance to floor irregularities, conventional wheels with offset are chosen to realize omni-directional mobility. Shown in Fig. 2.3, Dav’s mobile base has four wheels, and each one is driven by two DC motors for driving and steering separately. Motors Figure 2.3: The Solidwork’s rendering of Dav’s mobile base. Fig. 2.4 shows an offset steered driving wheel (an actuated caster). At this config- uration, the speed from the driving motor is along the wheel’s orientation while the speed from the steering motor is perpendicular to the wheel’s orientation. Thus it is possible to translate the steering pivot point to an arbitrary direction by controlling the two motors. Supported by the steering pivot points, the vehicle frame can be considered as a plate controlled by four supporting sticks. The plate’s translation and rotation movements are determined by the sticks’ speeds. Since each stick can move in an arbitrary direction, the vehicle’s movement is omni—directional. It is worthy noting that this is an over-actuated system because the four sticks’ motion over-determines the vehicle plate’s movement. A synchronizing control scheme is hence needed. In 15 the following, we derive the kinematic relation between the base’s joint speeds and the vehicle speeds. Section 2.3 gives the dynamic analysis and control scheme for the base. Figure 2.4: The offset steered driving wheel. The steering pivot’s speed can be arbitrary direction. For kinematic analysis, we introduce the following symbols that are illustrated in Fig- 2.5; v The speed of the vehicle. The components are [um uy,w]T ‘1 The configuration of the vehicle. The components are [$1, p1, ..., $4, p4]T ¢>i Angle of the ith steering joint, 2' = 1, ..., 4 ‘di Speed rates of driving wheel 'r a b The radius and offset of the wheel Consider the ith wheel, i = 1, ...,4. Let p,- and (p, be the wheel’s driving and Steering angular speeds respectively. Let u,- = (13,-, 3),) denote the steering pivot’s Speed. From Fig. 2.4, assuming that the ideal rolling condition is satisfied, we have Oi l/b 0 11¢ (2 1) _ I p} 0 1/r up, w here 21¢, and up, are the wheel’s translational speeds in the wheel’s frame. The 16 8 Figure 2.5: The top view of the mobile base. The black small dots are pivot axes along which the wheels can turn, and the small solid squares denote contact points between the wheels and the ground. transformation matrix from the vehicle’s frame is: 2). Sin,- —cos, 3'3,- ¢. = (I5 45 . (2.2) Up. COS ¢i sin ¢i 9i P 11lgging Eq. (2.2) into Eq. (2.1) and we obtain the relationship between the wheel’s J 0i nt speed and its steering pivot’s speed as: 031' = 5111451/(9 “COS¢i/b it (2.3) p, cos 451/7" Sin d), / r 3),- Referring Fig. 2.5, we obtain the following relation between the first wheel’s steer- 111g pivot’s speed and the whole vehicle’s speed: i: u +aw 1 0 —a 1 = I = v. (2.4) 2]] uy-l-aw 0 1 —a he Similar procedure is applied to the other three wheels and we erte the result In 17 matrix form as: 231 1 0 —a 91 l 0 —a 1'72 0 1 —a ' 1 0 a y’ = v, (2.5) 233 0 1 a 93 1 0 a (L4 0 1 a .3“. L1 0 —a‘ where a denotes the offset between the center of the base and the wheels’ pivot points. Eq. 
(2.3) is equivalent to (AI 531 b1 _ _ a 432 T1 0 0 0 5:2 q= {)2 = 0 T2 0 0 '1“. (26) 4’3 0 0 T3 0 1'53 [)3 _0 0 0 T4] :03 454 Si34 364‘ :794‘ Where _ sinoi/b —cosq§,~/b i— cos qii/r sin (bi/r , for 2‘ =1,...,4. Applying Eq. (2.5) to Eq. (2.6), we can derive the wheel joints’ speed q from the vehiCIG cartesian speed, x: q = Bv, (2.7) 18 where - psinqbl/b —cos¢1/b —asinq51/b+acos¢1/b cosqfil/r sin¢1/r —acos¢1/r—asin¢1/r sinqbg/b —-cos¢2/b —-asin¢2/b—~acos¢2/b cosrzbg/r sinqig/r —acos¢2/r+asin¢2/r sinqbg/b —cos¢3/b asin¢3/b—acos¢3/b cosqbg/r sin¢3/r acosqfig/r+asin¢3/r sin (154/1) — cos¢4/b asin¢4/b+ acos¢4/b cos¢4/r Sinq§4/r acos¢4/r—asin¢4/r - d Therefore, Eq. (2.7) gives the kinematic equation we are looking for here. We can see Dav’s mobile base is a redundant-actuated system with non-holonomic constraints. The system equation has 3DOF in velocity Space while 8DOF in configuration space. 2 - 2 .2 Torso TO imitate the human body, as well as to enable picking up objects from the floor, a. 2DOF torso is placed on the top of the mobile base. The rolling joint, the one ha-\v'ing an vertical axis, can be regulated with motion of the mobile base to keep the orientation of the torso. Fig. 2.7 shows the D-H coordinate frame assignment on the torso mechanism. It is worthy noting that there are three end-effector frames: oz2y2z2, 027331323 and 033 431424, corresponding to the left arm, right arm, and neck respectively. The four D- H geometric parameters associated with each link are listed in Table 2.2.2. Those four geometric parameters are defined as follows: 9 0, is the joint angle from the 33-1 axis to the :13,- about z,_1 axis (using the right handle rule). - d1 is the distance from the origin of the (i ~— 1)th coordinate frame to the intersection of the z,_1 axis with the 1',- axis along the 2.3--1 axis. 19 I:Tigllre 2.6: CAD drawing of Torso. Joint3 and Joint4 are the first joint of the left arm and rlght arm respectively. The unit is in mm. 20 o a,- is the offset distance from the intersection of the 2,-_1 axis with x,- axis to the origin of the ith frame along the 3:, axis (or the shortest distance between the 2.3-1 and z, axes). 0 Oz,- is the offset angle from the 2,11 axis to the z,- axis about the :13; axis (using the right-hand rule). D 2 z3= ‘03 Z4704 024:2 ‘p’ JG, 3 7 th4 I x x2 l» 20, yl X0, 217’ 00,01 + 20/ /// //// F i.gllre 2.7: Coordinate frames of torso. Frames 0x4y4z4, ox4y4z4 and ox4y4z4 are end- effector frames for mounting the left arm, right arm, and neck respectively. 2 — 2.3 Neck and Head AS a sensor platform, the neck and head (see Fig. 2.8) are designed to support active Vision system. This 8DOF mechanism includes the 3DOF neck, whose rolling joint with the vertical axis is actuated by an anti-backlash geared motor. The brows and lips are computer-controlled for expressing emotion. The two cameras can indepen- dently pan and coupled tilt. Fig. 2.9 shows the D-H coordinate frame assignment on the torso mechanism. There are two end-effector frames: 0r4y4z4 and 02:4’y4’z4’, corresponding to left and right eyeballs respectively. These joints’ axes cross each other orthogonally: axes of joint 1 and joint 2; axes of joint 4 and joint 5; axes of joint 4’ and joint 5’. Their Or‘ - . . . . lglns COInCIde, shown In Fig. 2.9. 21 Table 2.2: D-H geometric parameters. The joint variables are marked with *. The unit is in millimeter. (a) is the parameters for left arm, (b) is for the right arm, and (c) is for the neck. 61 and 6;; denote the angles of joint 1 and joint 2 respectively. 
The four D-H geometric parameters associated with each link of the neck and head mechanism are listed in Table 2.3.

2.2.4 Arm

Dav has human symmetry, with two arms. Each arm is a 7-DOF anthropomorphic manipulator, shown in Fig. 2.10. The extra joint is added to each arm to increase the end-effector's range of motion. Composed of a shoulder, an elbow, and a wrist, the arm design has been influenced by the structure of the human skeleton. Most of the arm's joints, except Joint 1, which is driven by a set of bevel gears, are actuated by gear-boxed motors, either directly or through a simple pulley and timing-belt mechanism. This design simplifies the mechanical transmission system and saves a lot of room for installing servo units such as servo amplifiers, sensors, and control boards. All servo units are integrated inside the arm itself, except those for the shoulder, which are conveniently located in the torso.

Figure 2.10: Dav's left arm. The unit is mm. The actuator for Joint 1 is housed in the torso module.

Joints 1, 2, and 3 together realize a spherical shoulder configuration, shown in Fig. 2.11, in which the joint axes z0, z1, and z2 intersect at o1. Similarly, the wrist mechanism, consisting of Joints 5, 6, and 7, adopts the Stanford manipulator (spherical wrist) design. The angles of the spherical joints are Euler angles, which simplifies the kinematic analysis. The D-H parameters are listed in Table 2.4.

Once the D-H coordinate system has been established for each robot link, the forward kinematics can easily be derived via homogeneous transformations. Let A_i be the matrix transforming the (i-1)th coordinate frame to the ith frame:

\[
A_i =
\begin{bmatrix}
\cos\theta_i & -\cos\alpha_i \sin\theta_i & \sin\alpha_i \sin\theta_i & a_i \cos\theta_i \\
\sin\theta_i & \cos\alpha_i \cos\theta_i & -\sin\alpha_i \cos\theta_i & a_i \sin\theta_i \\
0 & \sin\alpha_i & \cos\alpha_i & d_i \\
0 & 0 & 0 & 1
\end{bmatrix} .
\tag{2.9}
\]

The matrix T_0^i, which specifies the location of the ith coordinate frame with respect to the inertial frame (frame 0), is the chain product of the successive transformation matrices A_k, for k = 0, ..., i:

\[
T_0^i = \prod_{k=0}^{i} A_k .
\tag{2.10}
\]

From Eq. (2.10), the robot kinematic model is thus expressed by the 4 x 4 homogeneous transformation matrix T = T_0^n, where n is the number of moving links under consideration. The matrix T combines the effects of rotation, translation, and scaling and can be decomposed into

\[
T = \begin{bmatrix} R & \mathbf{d} \\ \mathbf{0} & 1 \end{bmatrix}
\tag{2.11}
\]

where R denotes the rotation matrix of the end-effector frame with respect to the inertial frame, d denotes the translation with respect to the inertial frame, and 0 = [0, 0, 0].

Due to its redundant nature, Dav's inverse kinematic analysis is extremely complex and does not have a unique solution. Usually, additional constraints such as minimum power consumption and trajectory smoothness are needed, but such model-based inverse kinematics methods are beyond the scope of this thesis. As studied in references [30,32,48], this work focuses on using learning-based methods to approximate the mapping from the Cartesian end-effector work space to the joint space.
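The forward kinematics of Eqs. (2.9)-(2.10), by contrast, is straightforward to compute. The following is a minimal sketch (ours, not Dav's actual code), assuming the standard D-H convention of Eq. (2.9) with angles in radians:

```python
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Homogeneous transform A_i of Eq. (2.9) from one set of
    D-H parameters (theta, d, a, alpha)."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -ca * st,  sa * st, a * ct],
        [st,  ca * ct, -sa * ct, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def forward_kinematics(dh_rows):
    """Chain product of Eq. (2.10); dh_rows is a list of
    (theta, d, a, alpha) tuples, one per link."""
    T = np.eye(4)
    for row in dh_rows:
        T = T @ dh_transform(*row)
    return T  # T = [R d; 0 1], as in Eq. (2.11)
```

For example, applying forward_kinematics to the seven (θ_i, d_i, a_i, α_i) rows of Table 2.4 would yield the arm end-effector pose for a given set of joint angles.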
Table 2.3: D-H geometric parameters of the neck and head mechanism. The joint variables are marked with *. The unit is millimeters. θ1, θ2, θ3 denote the angles of the neck's roll, pitch, and yaw joints respectively. Frames 4 and 4' are the end-effector frames of the left and right eyeballs respectively. Note that we do not consider the two pan joints in the head.

  Link | a_i | α_i  | d_i | θ_i
  1    | 0   | 90°  | 0   | θ1*
  2    | 32  | -90° | 0   | θ2*
  3    | 101 | 90°  | 77  | θ3*
  4    | 0   | 0°   | -53 | θ4*
  4'   | 0   | 0°   | 53  | θ4'*

Table 2.4: D-H geometric parameters of the arm mechanism. The joint variables are marked with *. The unit is millimeters. θ_i denotes the angle of joint i, for i = 1, ..., 7.

  Link | a_i | α_i  | d_i   | θ_i
  1    | 0   | -90° | 0     | θ1*
  2    | 0   | 90°  | 0     | θ2*
  3    | 0   | -90° | 309.9 | θ3*
  4    | 60  | 0°   | 499.9 | θ4*
  5    | 30  | -90° | 0     | θ5*
  6    | 0   | 90°  | 0     | θ6*
  7    | 0   | -90° | 80    | θ7*

2.3 Dynamic Model and Control

In this section, we derive the equations describing the time evolution of the robot's configuration under applied torques or forces in a known environment (e.g., repeated tasks on a manufacturing assembly line with a fixed payload). It is worth noting that the known-environment assumption is not valid for some robotic applications. Such cases include tasks in which the dynamics of the manipulated object change tremendously during manipulation, and systems for which dynamic equations are not available due to the complexity of the system and a lack of domain knowledge. However, deriving such equations of motion is still useful for designing a suitable low-level control law for Dav's mechanism, and may serve as a survey of the model-based control method.

Figure 2.11: Coordinate frames of the left arm. Two spherical mechanisms (shoulder and wrist) are designed.

The dynamic model of a robot can be obtained from known physical laws such as Newtonian mechanics and Lagrangian mechanics. The derivation of robot dynamics based on the Lagrange-Euler (L-E) formulation is more systematic; we hence derive Dav's dynamic equations from the L-E formulation. We do not present the derivation for the whole robot, which would be cumbersome. In the following, we first outline the L-E analysis in a general setting and then apply it to a general n-link robot manipulator. The mobile base's dynamic analysis and control are relegated to Section 2.5, because the base is a redundantly actuated system with nonholonomic constraints and thus needs special treatment.

2.3.1 Lagrange-Euler Formulation

In the L-E formulation, the system's dynamic behavior is described in terms of work and energy using generalized coordinates, and all the constraint forces are eliminated. Assuming a robot has n DOFs, the L-E formulation is expressed in the following form:

\[
\frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}_i} \right) - \frac{\partial L}{\partial q_i} = \tau_i
\tag{2.12}
\]

for i = 1, 2, ..., n, where

• L: the Lagrangian function, equal to the kinetic energy K minus the potential energy P;

• K: the total kinetic energy of the robot;

• P: the total potential energy of the robot;

• q_i: the generalized coordinates of the robot;

• q̇_i: the generalized velocity, i.e., the first derivative of q_i with respect to time;

• τ_i: the generalized actuating force (torque) applied to the robot at joint i.
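As a concrete check (this example is added for illustration and is not part of the original text), consider the standard one-link case: a point mass m at the end of a massless rod of length l, with generalized coordinate q_1 = θ measured from the downward vertical. Then

\[
K = \tfrac{1}{2} m l^2 \dot{\theta}^2, \qquad
P = -m g l \cos\theta, \qquad
L = K - P = \tfrac{1}{2} m l^2 \dot{\theta}^2 + m g l \cos\theta ,
\]

and Eq. (2.12) yields

\[
\frac{d}{dt}\left( \frac{\partial L}{\partial \dot{\theta}} \right) - \frac{\partial L}{\partial \theta}
= m l^2 \ddot{\theta} + m g l \sin\theta = \tau_1 .
\]

The first term is the inertial torque and the second is the gravitational torque; the n-link derivation below produces exactly these kinds of terms, organized into an inertia matrix and a gravity vector.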
Now we are ready to derive the dynamic equations for a robot with n links. It is known that the overall kinetic energy of a rigid body is given by

K = \frac{1}{2} m \mathbf{v}_c^T \mathbf{v}_c + \frac{1}{2} \boldsymbol{\omega}^T I \boldsymbol{\omega},    (2.13)

where v_c and ω denote, respectively, the translational velocity vector of the mass center and the angular velocity vector of the rigid body; m and I are the total mass and the inertia matrix respectively.

2.3.2 Analysis of an n-link Manipulator

In the D-H representation, the joint variables can serve as the generalized variables of the L-E equation. According to the velocity kinematic analysis of [89, Chap. 5], the ith link's linear velocity of the mass center and its angular velocity are written as

v_{c_i} = J_{v c_i}(q)\,\dot q, \qquad \omega_i = R_i^0(q)^T J_{\omega c_i}(q)\,\dot q,    (2.14)

respectively, where J_{vc_i} and J_{ωc_i} are the Jacobian matrices for link i, and R_i^0 denotes the rotation matrix of the ith frame with respect to the inertial frame (frame 0). Let o_{c_i} be the mass center of link i in the inertial frame (o x0 y0 z0). Since Dav has rotational joints only, J_{vc_i} and J_{ωc_i} can be calculated by

\begin{bmatrix} J_{v c_i} \\ J_{\omega c_i} \end{bmatrix} = \begin{bmatrix} J_1 & J_2 & \cdots & J_i & 0 & \cdots & 0 \end{bmatrix}, \qquad
J_k = \begin{bmatrix} z_{k-1} \times (o_{c_i} - o_{k-1}) \\ z_{k-1} \end{bmatrix}, \quad k \le i.    (2.15)

Substituting Eqs. (2.14) and (2.15) into Eq. (2.13), summing over the links, and applying the L-E equation (2.12) leads, after the standard manipulation detailed in [89], to the joint-space dynamics

D(q)\ddot q + C(q, \dot q)\dot q + B(q)\dot q + g(q) = \tau,    (2.31)

where D(q) is the inertia matrix, C(q, q̇) collects the centripetal and Coriolis terms, B(q) is the motor damping, and g(q) is the gravitational torque. A proportional-derivative (PD) control law takes the form

\tau = -K_p (q - q^d) - K_d (\dot q - \dot q^d),    (2.32)

where q^d and q̇^d are the desired displacement and speed of the joints respectively, which are given by a path planner in advance. In [89, pp. 217-218], Spong & Vidyasagar show that, in the absence of gravity, that is, if g is zero in Eq. (2.31) for all configurations, the PD control law specified in Eq. (2.32) achieves asymptotic tracking of the desired position q^d with q̇^d = 0. However, the presence of the gravitational term or a nonzero q̇^d does not guarantee the asymptotic convergence of Eq. (2.31). A "stable" tracking error from the desired trajectory is needed to balance the gravitational torque g.

In the practical implementation, Dav moves at a slow speed, q̇^d ≈ 0, and the dynamic terms such as the off-diagonal inertia elements d_jk, j ≠ k, the centripetal and Coriolis term C(q, q̇), and the motor damping B(q) are usually small. Notice that the computed torque τ_k is proportional to the gear ratio; those dynamic terms are thus reduced further for a small n_k, which adds to the validity of treating the terms d_jk, j ≠ k, as negligible for control. The gravity effect specified by g plays the dominant role in the robot dynamic model. In order to compensate for gravity, the following control law is proposed for Dav's low-level control:

\tau = D_{\mathrm{eff}}\,\ddot q^d + g(q) - K_p (q - q^d) - K_d (\dot q - \dot q^d),    (2.33)

where D_eff is a diagonal matrix whose diagonal elements denote the effective inertia of each link of the robot. Eq. (2.33) can be written in component form as

\tau_k = d_{\mathrm{eff},k}\,\ddot q_k^d + g_k(q) - k_p (q_k - q_k^d) - k_d (\dot q_k - \dot q_k^d),    (2.34)

where d_eff,k, k_p, and k_d are the effective inertia, proportional gain, and derivative gain of the kth joint respectively. Applying Eq. (2.34) to the joint dynamics of Eq. (2.29), assuming d_eff,k = d_kk + J_mk/n_k^2, and letting c_k denote the disturbance collecting the neglected coupling terms (the off-diagonal inertia, the centripetal and Coriolis contributions, and the damping), we obtain the following closed-loop equation for joint k:

d_{\mathrm{eff},k}\,\ddot e_k + c_k = -k_p e_k - k_d \dot e_k,    (2.35)

where e_k = q_k − q_k^d is the tracking error. Let us remark that the control law specified in Eq. (2.34) is valid under the assumption that the neglected dynamic terms c_k remain small.
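The per-joint law of Eq. (2.34) is simple enough to state directly in code. Below is a minimal Python sketch (illustrative only; `gravity_torque` is a hypothetical stand-in for g(q), which in practice comes from the link masses and geometry).

```python
import numpy as np

def gravity_torque(q):
    """Hypothetical stand-in for g(q); a real model derives from link masses."""
    return 9.81 * 0.5 * np.sin(q)  # toy one-parameter gravity model per joint

def joint_torque(q, qdot, q_d, qdot_d, qddot_d, d_eff, kp, kd):
    """Gravity-compensated PD law of Eq. (2.34), applied element-wise."""
    return (d_eff * qddot_d + gravity_torque(q)
            - kp * (q - q_d) - kd * (qdot - qdot_d))

# One control step for a 7-joint arm at a fast servo rate.
q, qdot = np.zeros(7), np.zeros(7)
tau = joint_torque(q, qdot, q_d=np.full(7, 0.2), qdot_d=np.zeros(7),
                   qddot_d=np.zeros(7), d_eff=np.ones(7), kp=50.0, kd=10.0)
```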
For Dav's mobile base, analyzed in Section 2.5.2, the constraint Jacobian of the nonholonomic constraints Φ(q) = 0 evaluates to

\frac{\partial \Phi}{\partial q} =
\begin{bmatrix}
a c\phi_1/b - a s\phi_1/b & 0 & 0 & 0 \\
a s\phi_1/r - a c\phi_1/r & 0 & 0 & 0 \\
0 & -a c\phi_2/b + a s\phi_2/b & 0 & 0 \\
0 & a s\phi_2/r + a c\phi_2/r & 0 & 0 \\
0 & 0 & a c\phi_3/b + a s\phi_3/b & 0 \\
0 & 0 & -a s\phi_3/r + a c\phi_3/r & 0 \\
0 & 0 & 0 & a c\phi_4/b - a s\phi_4/b \\
0 & 0 & 0 & -a s\phi_4/r - a c\phi_4/r
\end{bmatrix}.    (2.54)

In Eq. (2.54), s and c denote sin(·) and cos(·) respectively, for brevity.

2.5.3 Controller Design

Although a mechanical system with nonholonomic constraints cannot, in general, be stabilized at a point by a smooth time-invariant feedback rule [17], we show experimentally that Dav's mobile base can track different curves with high performance. We notice that rank(Bᵀ) = 3 and dim(τ) = 8; thus the system is sufficiently actuated, or, to be more precise, redundantly actuated. Those surplus inputs offer a means of enforcing the holonomy of the base.

For feedback linearization, we use the following input transformation:

\tau = H\big( (B^T J_0 B + M_0)\,u + B^T J_0 \dot B\,v \big),    (2.55)

where u denotes the input of the "new" system and H is an 8 x 3 matrix satisfying BᵀH = I. One choice for H is the pseudoinverse of Bᵀ, i.e., H = (Bᵀ)⁺. Another choice of H is the solution of the following minimum-norm problem:

\min \tau^T \tau \quad \text{s.t.} \quad F = B^T \tau, \qquad \text{where } F = (B^T J_0 B + M_0)\,u + B^T J_0 \dot B\,v.    (2.56)

(F has a physical meaning: it denotes the desired equivalent force exerted on the vehicle platform. We treat the vehicle frame as an end-effector and τ as the joints' torque vector. The virtual displacement of the joint variables is δq = B δx, so the virtual work of the system is δw = Fᵀδx − τᵀδq = (Fᵀ − τᵀB)δx; δw = 0 if the platform is in equilibrium, i.e., the equivalent force equals the actual driving force, giving F = Bᵀτ.)

By referring to [81, p. 395], we can see that the solution of Eq. (2.56) is

\tau = B (B^T B)^{-1} F, \qquad \text{and hence } H = B (B^T B)^{-1}.    (2.57)

It is worth noting that the minimum-norm solution of H has a nice property: it distributes the torques in a way that minimizes the actuators' output torque.

We notice, however, that the minimum-norm solution of H does not guarantee that the kinematic Eq. (2.7) is satisfied. In fact, due to external disturbances or unmodeled errors, q̇ might not be confined to the distribution Δ. We need a mechanism to keep the velocities of the system within the distribution Δ. Motivated by sliding-mode control [87, Chap. 7], we add to Eq. (2.57) an extra term that servos on the residue of Eq. (2.7). More formally, let q̇ = q̇_∥ + q̇_⊥, where q̇_∥ = B B⁺ q̇ and q̇_⊥ = (I − B B⁺) q̇ (see Fig. 2.14). Thus Eq. (2.57) becomes

\tau = B (B^T B)^{-1} F + \Delta\tau, \qquad \Delta\tau = -K_p\,\dot q_\perp,    (2.58)

where K_p is a gain constant. We can easily verify that Bᵀ Δτ = 0, so BᵀH = I still holds. Let us remark that Eq. (2.58) improves the transient behavior, especially when the wheels are not synchronized initially. This is shown in Fig. 2.14: when q̇ does not lie in Δ(q), the term Δτ pulls q̇ back to the distribution Δ.

Figure 2.14: If q̇ is away from the distribution Δ, the extra term Δτ in Eq. (2.58) will pull it back.

Therefore, applying Eq. (2.55) to Eq. (2.50), we obtain the following linearized state-space equation:

\dot q = B v, \qquad \dot v = u.    (2.59)

The dynamic system described by Eq. (2.59) is known as the double-integrator system, since it represents three uncoupled double integrators with respect to x = (x_1, x_2, x_3), the coordinates (x_c, y_c, ψ) of the base's center in Fig. 2.5. This means that each input u_k, the kth component of u, k = 1, 2, 3, is a function only of x_k. That is, u_1 and u_2 control the motion along the x and y axes respectively, and u_3 controls the rotation of the base.

Since u can be designed to control the three simple linear second-order systems, the obvious choice is PD control, setting

u = -K_p (x - x^d) - K_d (\dot x - \dot x^d) + \ddot x^d,    (2.60)

where x^d, ẋ^d and ẍ^d denote the desired trajectory of the base; K_p and K_d are the PD control's gain matrices, with a common choice being

K_p = \mathrm{diag}[\omega_1^2, \omega_2^2, \omega_3^2], \qquad K_d = \mathrm{diag}[2\omega_1, 2\omega_2, 2\omega_3].    (2.61)
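A quick numerical check of Eqs. (2.59)-(2.61): the Python sketch below (an added illustration; the gains and step count are arbitrary) integrates one axis of the double integrator under the PD law and shows the tracking error decaying as a critically damped second-order system.

```python
# One axis of the double integrator of Eq. (2.59) under the PD law of Eq. (2.60).
w = 2.0                        # natural frequency; Eq. (2.61) gives kp = w^2, kd = 2w
kp, kd = w**2, 2.0 * w         # critically damped gains
dt, x, v = 0.001, 0.0, 0.0     # time step and initial state
x_d, v_d, a_d = 1.0, 0.0, 0.0  # constant set-point as the desired trajectory

for _ in range(10000):         # 10 s of simulated time
    u = -kp * (x - x_d) - kd * (v - v_d) + a_d
    v += u * dt                # v' = u
    x += v * dt                # x' = v
print(abs(x - x_d))            # ~0: the tracking error has decayed
```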
Figure 2.15: Schematic diagram of the control algorithm of Dav's base: the double-integrator mobile-base dynamic system under linear feedback.

Fig. 2.15 illustrates the control diagram of Dav's base, which is an inner-loop/outer-loop control. The computation of the nonlinear control law of Eq. (2.55) is performed in an inner loop, with u, q and v as its inputs and τ as its output. The outer loop in the system is the PD control calculating u according to Eq. (2.60).

2.5.4 Experimental Evaluation

We developed a simulated base experiment to verify the validity of the dynamic model and the effectiveness of the control scheme presented in the previous section. The simulated base model is kinematically and dynamically similar to Dav's mobile base. The dimensions and the inertia parameters are specified as follows, using the previously introduced notation:

r = 0.0889 m, a = 0.223 m, b = 0.0508 m, m = 250 kg, I = 174.5 kg·m², I_s = 0.0313 kg·m², I_w = 0.005 kg·m².

The goal of the experiment is to control the base such that the base's geometric center stays on a predefined trajectory. As in [86], we formulate the control problem as a path-following problem instead of trajectory tracking. Consider a circular trajectory. Let (x_0, y_0) and R be the center and radius of the circular path respectively, and let v = (v_x, v_y) denote the base's translational velocity. We assume the base moves along the path while keeping its orientation unchanged, i.e., ψ = 0. This path-following problem can be formulated as controlling the following system such that the output z follows a desired trajectory z^d:

\dot x = v, \qquad \dot v = u, \qquad z = \begin{bmatrix} h_1(x) \\ h_2(v) \\ \psi \end{bmatrix},    (2.62)

where h_1(x) = (x_c − x_0)² + (y_c − y_0)² and h_2(v) = v_x² + v_y². The Jacobian matrix of the output equation of the system specified in Eq. (2.62) can be written as

H_2 = \begin{bmatrix} 2 v_x & 2 v_y & 0 \\ 0 & 0 & 1 \end{bmatrix}.    (2.63)

Therefore, if we regard the output z as the state variables and differentiate along the first two equations of Eq. (2.62), we obtain:

\ddot z_1 = 2 (x_c - x_0) u_1 + 2 (y_c - y_0) u_2 + 2 v_x^2 + 2 v_y^2, \qquad
\dot z_2 = 2 v_x u_1 + 2 v_y u_2, \qquad
\ddot z_3 = u_3,    (2.64)

to which a feedback linearizing control law can be applied.

The simulated sensors were the encoders placed on each motor. At each sampling instant (every 10 milliseconds), the base received the joint angular displacement vector q and the corresponding derivative q̇. From Eq. (2.7), the vehicle's speed is v = Bq̇. The base's pose (x_c, y_c, ψ) was estimated via dead-reckoning. The base's initial position was (x_c, y_c, ψ) = (0.0, 0.8, 90°); the initial velocity was zero; R = 5 and the desired forward speed gave z^d = (25, 0.25, 0)ᵀ.

Fig. 2.16 shows the result. Subplot (a) shows the actual and desired paths, illustrated by solid and dot-dash lines respectively. In order to demonstrate the robustness of our model with respect to the dynamic parameters, we increased the mass and inertia moments (m, I, I_s, I_w) by 100 percent. The result is shown in Fig. 2.16(b). We can see an overshoot and a small steady-state deviation from the desired path, but the overall performance is impressive.

Figure 2.16: The result of the simulated base experiment.

Finally, let us remark that, although the system shows robustness to changing dynamic parameters, it is sensitive to the accuracy of the kinematic model and of dead-reckoning. If the kinematic model is wrong, the base cannot decouple the wheel motions correctly and, in an extreme case, the wheels might fight each other. In addition, it is known that dead-reckoning performance degrades quickly with increasing travel distance. Feedback from other sensing modalities is needed; for example, a laser ranger and GPS might be used for indoor and outdoor navigation respectively.
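For concreteness, the following Python sketch (an added illustration; `B_of_q` is a hypothetical stand-in for the kinematic matrix B(q) of Eq. (2.7)) shows the dead-reckoning update used in the simulation: encoder rates are mapped to a vehicle velocity v = Bq̇ and Euler-integrated into the pose (x_c, y_c, ψ).

```python
import numpy as np

def B_of_q(q):
    """Hypothetical stand-in for the 3x8 kinematic matrix B(q) of Eq. (2.7)."""
    return np.hstack([np.eye(3), np.zeros((3, 5))])  # placeholder structure only

def dead_reckon(pose, q, qdot, dt):
    """One 10 ms dead-reckoning step: v = B(q) qdot, then integrate the pose."""
    v = B_of_q(q) @ qdot          # (vx, vy, psi_dot)
    return pose + v * dt

pose = np.array([0.0, 0.8, np.deg2rad(90.0)])   # initial (xc, yc, psi)
for _ in range(100):                            # 1 s of encoder samples at 100 Hz
    pose = dead_reckon(pose, q=np.zeros(8), qdot=np.zeros(8), dt=0.01)
```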
2.6 The Distributed Control System Design

As a self-contained mobile humanoid robotic system, Dav carries its own power source, sensors, control system, learning system, and associated hardware.

2.6.1 Hardware Design

The hardware architecture of Dav has two levels: a main computer system and low-level servo control modules, as shown in Fig. 2.17.

Figure 2.17: The hardware architecture for Dav has two levels: a main computer system and low-level servo control modules.

Main Computer System

In order to build the main computer system, it is essential to know the requirements of the task. The developmental program must process high-dimensional sensory data, such as vision and auditory input, in real time. Computational power is hence critical to meeting this real-time processing requirement. Various solutions were considered for the high-level computer system. Single- and dual-processor systems are not sufficient to achieve the required throughput; a quadruple-processor motherboard is required. Therefore, a desktop (ATX) motherboard with four Pentium III Xeon processors and six PCI slots was chosen. Fig. 2.18 shows this main computer system with an LCD monitor.

The main computer system sits in the middle of Fig. 2.17 and includes a 100 GB SCSI RAID0 disk array, two frame-grabbers for the left and right cameras, a sound card, a wireless Ethernet card, and an Ethernet card connecting to CANPC (see Fig. 2.20) for interaction with the low-level modules. Being the "brain" of Dav, this computer system handles all "cognitive" computation during learning and testing sessions.

Figure 2.18: The "brain" of Dav, a quadruple-processor desktop PC. The SCSI RAID0 disk array is not included.

Sensors

Dav has visual, auditory, tactile, range, encoder, strain gage, and current sensors. The visual system consists of two micro color CCD camera heads with diameters of 7 mm and focal lengths of 2.2 mm. They are installed on the eye platform in the head. Each camera has a separate control box, located in the torso, that provides automatic gain control and automatic electronic shutter control. Video signal sampling is performed by two frame-grabber cards, which communicate with the processors via the PCI bus. Two microphones are mounted, one on each side of the head. The auditory signals go through pre-amplifiers before they are sampled by a sound card in the main computer. The touch and torque sensors are important for hand-in-hand teaching, i.e., supervised learning. A laser range scanner is mounted at the front of the robot for range sensing.

Embedded Controller

At the low-level servo control level, a distributed modular scheme was adopted. The low-level actuator control units (ACUs) were placed as close to the actuators and sensors as possible, to reduce wiring.
Eleven ACU modules have been designed and fabricated to satisfy the requirements of each motor group, and they are networked to CANPC by four CAN buses. This architecture reduces hundreds of signal wires to only two wires through the entire body, plus another three wires for the power supply (see Fig. 2.19). Consequently, system reliability increases. Each microprocessor-based servo module is composed of two connected boards: a servo interface board and an embedded Motorola PowerPC MPC555 board with 448 KB of internal flash, 24 KB of internal RAM, and TPU, PWM, and ADC modules, as shown in Fig. 2.19(b). The signal connector between the two boards serves two purposes: to attach the modules together physically as well as electrically. These ACU modules, each capable of interfacing with up to four motors/joints, communicate over a CAN bus at a rate of 1 Mbps.

Figure 2.19: The block diagram of the PowerPC-based controller: the processor (with on-chip RAM, EEPROM, TPU, PWM, CAN, and ADC modules) connects to the CAN serial bus (two twisted wires) through a CAN physical interface, and is supported by a power-management block with a 12-15 V power supply and a processor supervisor.

Figure 2.20: CANPC: a PC104 embedded system running the RealTime Linux operating system with four CAN interfaces. (a) PC104 computer board (MZ104+). (b) PCM3680 dual-CAN interface board.

Distributed Architecture

Fig. 2.17 gives an overview of the hardware design. There are head, left arm, neck-torso, right arm, left hand, base, and right hand modules, from top to bottom and left to right. Each motor has a driving amplifier and sensors such as a quadrature encoder and a torque sensor (armature current sensors were placed on the motors of all modules except the two arms). The head module has one ACU, HeadCB, controlling the Tilt, L-Pan (left pan), R-Pan (right pan), MT (mouth), and EB (eyebrow) joints. The left arm module has two ACUs: L-ChestCB and L-ArmCB. L-ChestCB controls the left arm's J2 (Joint 2), J3 (Joint 3), and J4 (Joint 4), while L-ArmCB controls the left arm's J5 (Joint 5), J6 (Joint 6), and J7 (Joint 7). Similarly, the right arm module has R-ChestCB and R-ArmCB ACUs, which control the corresponding joints of the right arm. The torso module has two ACUs: NeckCB and TorsoCB. NeckCB controls the neck's three joints, Neck-R (roll), Neck-P (pitch), and Neck-Y (yaw), while TorsoCB controls the torso's two joints (torso Joints 1 and 2) and the first joint of each arm (left arm's Joint 1 and right arm's Joint 1).

The base module is the hub of the whole system, where the main computer system is housed. The base module has two ACUs: L-BaseCB and R-BaseCB. L-BaseCB controls the W-1-D (Wheel 1's driving motor), W-1-S (Wheel 1's steering motor), W-2-D (Wheel 2's driving motor), and W-2-S (Wheel 2's steering motor) joints. R-BaseCB controls the W-3-D (Wheel 3's driving motor), W-3-S (Wheel 3's steering motor), W-4-D (Wheel 4's driving motor), and W-4-S (Wheel 4's steering motor) joints. In addition, the fabricated power board and four sealed lead-acid batteries (each with a capacity of 110 Ah at 12 V) are placed in the base module. The range finder (PLS, SICK Co.), connected to the main computer by an RS-232 interface (port 1), is mounted at the front of the base for range sensing.

We notice that the ACUs are not directly connected to the main computer system; they are connected via CANPC instead. CANPC has four CAN buses: Can0, Can1, Can2, and Can3. As shown in Fig.
2.17, Can0 links TorsoCB, R-ChestCB, R-ArmCB and R-HandCB; Can1 links L-ChestCB, L-ArmCB and L-HandCB; Can2 links NeckCB and HeadCB; Can3 links L-BaseCB and R-BaseCB. At the top of Fig. 2.17, two microphones with pre-amplifiers are linked to a sound card plugged into a PCI slot of the main PC. Two micro camera heads, fixed inside the head mechanism, are linked to two digital control boxes housed in the base. The control boxes' RGB output signals are sent to the frame-grabbers in the main computer.

2.6.2 Software Development

Dav's control system involves the development of a large-scale control system that must operate at significant, high bandwidth. A network of embedded processors adds to the complexity of software development. Many issues need to be considered, such as modularity, scalability, reusability, internal communication transparent to location, flexibility, plug-and-play capability, and an open architecture. There is a substantial body of work addressing these issues, and many solutions have been proposed. For example, Ayllu [53] and COLBERT/Saphira [52] are used to control the ActivMedia Pioneer robots. However, those solutions are designed around a particular control philosophy. They do not aim to encapsulate and hide the low-level device details while offering enough flexibility to high-level programs when a different control strategy is needed. Player [96], DCA [60,70] and OROCOS [1] were hence proposed; the latter provides standard software packages (e.g., an abstract device model) to facilitate high-level robot software development.

As in Player [96], Dav's software development was guided by our desire to concurrently support many heterogeneous devices and many heterogeneous clients. Each device operates at some inherent frequency, with wide variation among devices. For example, the popular SICK PLS laser ranger returns a full scan at approximately 8 Hz, while the robotic joints can give encoder feedback at almost 1000 Hz. If we took the naive approach and polled each device in turn at the rate of the slowest device, we would be discounting the full capabilities of the available resources. In most cases, we want to obtain data from, and send commands to, each device at the highest rate possible in order to fully exploit the hardware and maximize the responsiveness of the system. Similarly, each client operates at some inherent frequency: while a simple client written in C++ may be capable of consuming new data at 100 Hz, a graphically intensive client written for X-Window might operate at less than 1 Hz. It should be possible for these two clients to be connected to Dav simultaneously and to execute as fast as they want without interfering with each other. Moreover, in order to design a software architecture on a highly distributed sensing and control hardware system such as Dav, we need to handle the data flows between sensors, processors, and actuators effectively, in groups and across different CAN buses.

Dav's software architecture, shown in Fig. 2.21, provides a client/server model for realizing the robot interface. Acting as a client, the high-level control program is separated from the server, which executes low-level device operations, either on the ACU boards or on the local main computer system, via a unified UDP socket interface. We had three major considerations in choosing a UDP-socket-based robot interface, listed below.
- Distribution: A client (high-level program) can access sensors and actuators from any workstation on the Ethernet network. For example, a client may connect to multiple ACUs, while an ACU accepts connections from multiple clients if no conflict exists.

- Convenience: Clients can be written in any language that supports socket programming. For example, the user can choose C/C++ for high-speed applications, Java for cross-platform capability, or MATLAB for quick prototyping.

- Light-weight device server: We feel that the commonly accepted CORBA interface is too sophisticated for robotic applications where real-time constraints are of primary importance, even though much effort has been spent to ensure that CORBA's performance (e.g., latency, throughput) is reasonable. ACE (Adaptive Communication Environment) and RPC (Remote Procedure Call) are two communication packages that provide what we require, but they are not suited to embedded applications because of their large footprint. We use a simple and direct approach: the UDP socket interface shown in Fig. 2.21. Acting as the device server for the low-level ACUs, CANPC receives UDP packets and forwards CAN packets to the ACUs, and vice versa. Each UDP packet may contain several CAN packets (up to 64) to improve throughput.

Dav is implemented in C++ and makes extensive use of the POSIX-compliant pthread interface. (Although the current implementation of Dav's software architecture is on Linux, it is straightforward to port it to Windows.) Since clients and physical devices may operate at different time scales, we assign an asynchronous daemon thread to each physical device. A main thread listens for new client connections on a well-known UDP port, spawning client-stub threads on demand to service the clients and the devices they request.

The overall system structure of Dav is shown in Fig. 2.21. The center portion of the figure is Dav's software architecture; on the left are the physical devices and on the right are the clients. Each client has a UDP socket connection to Dav. If the client is executing on the same host as Dav, then this socket is simply a loopback connection; otherwise, there is a physical network between the two. At the other end, Dav connects to each device by whatever method is appropriate for that device (e.g., RS-232 for the range finder and CAN for the ACUs).

Figure 2.21: The software architecture of Dav's control system.

Within Dav's architecture, the threads communicate through a shared global address space. As indicated in Fig. 2.21, each device has associated with it a command buffer and a data buffer. These buffers provide an asynchronous communication channel between the device threads and the client threads. For example, when a client reader thread receives a new command for a device, it writes the command into the command buffer for that device. Later, when the device thread is ready for a new command, it reads the command from its command buffer and sends it on to the device. Similarly, when a device thread receives new data from its device, it writes the data into its data buffer; later, when a client thread is ready to send new data from that device, it reads the data from the data buffer and passes it on to its client.
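As a flavor of what a minimal client looks like under this model, the Python sketch below sends a command packet and polls for sensor data over UDP. It is an illustration only: the host, port, and packet layout are hypothetical, not Dav's actual wire format.

```python
import socket

SERVER = ("192.168.0.10", 7000)   # hypothetical address of Dav's device server

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(0.1)              # a slow client simply misses stale packets

# Hypothetical command packet: ASCII device name plus a joint set-point.
sock.sendto(b"L-ArmCB J5 0.25", SERVER)

try:
    data, _ = sock.recvfrom(2048)  # latest bundled sensor readings
    print(data)
except socket.timeout:
    pass  # UDP is lossy by design here; the next read returns fresher data
```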
In this way, the client program threads are decoupled from the device daemon threads (and thus the clients are decoupled from the devices). Also, by the nature of threads, the devices are decoupled from each other, and the clients are decoupled from each other.

Since bandwidth and latency are of primary importance for a real-time system, we do not apply synchronization (e.g., semaphores) to the shared memory in the middle of Fig. 2.21. Thus, old data may be read when a client thread's reading frequency is higher than the device thread's refresh rate. Conversely, a client may miss data when its reading frequency is lower than the device's refresh rate. This is reasonable: since we treat late-arriving data as useless, it is unnecessary to keep a long data buffer for each device.

Dav's software architecture currently supports a variety of devices, including the popular peripherals (e.g., ACUs, GPS, camera and frame-grabber combinations, microphone and sound card, range finder). Devices with a serial (RS-232), CAN, or PCI interface can easily be added to the system by developing a device driver and modifying a configuration file. In order to provide a uniform abstraction over this variety of devices, we chose to follow the UNIX model of treating devices as files. Thus the familiar file semantics hold for Dav's devices. For example, to begin receiving sensor readings, the client opens the appropriate device with read access; likewise, before controlling an actuator, the client must open the appropriate device with write access.

By default, client programs receive data at 10 Hz, which is roughly the signal latency of the neural circuits from the brain to the peripheral neurons. That is, a client can expect to receive a data packet containing the current data from all subscribed devices every 100 ms. Certainly, by sending all the data at once, Dav's device server might repeatedly send old data from a device that operates at less than 10 Hz. We designed Dav's architecture this way for one reason: simplicity. By always transmitting the current state of all subscribed devices, regardless of each device's time scale, we make client programs easy to write. As a result, clients are able to use a simple read loop to receive data from Dav's device server. It is worth noting that client programs can issue commands to Dav at any time.

2.7 Experiment

All of Dav's joints and the control system have been successfully implemented and tested. Experiments were conducted to evaluate the performance of the robot. Fig. 2.22 shows the movements of the neck and head modules; Fig. 2.23 shows the movement of the mobile base; Fig. 2.24 shows the arms' movement.

2.8 Summary

This chapter presented the mechanical structure of the robot. The kinematic model, which describes the relation between the joint variables and the positions of the end-effectors, was derived. For the purposes of the lower-level controllers, the dynamic models of Dav's mechanism were outlined. The distributed embedded control system and the main computer system for mental development were sketched.

It is known that the kinematic and dynamic models of a humanoid robot are extremely complex [30]. For example, due to the redundant DOFs of a humanoid, the inverse kinematics, i.e., the mapping from end-effector coordinates to joint variables, cannot easily be written as a program. Learning appears to be a promising alternative.
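To illustrate what such a learning-based alternative can look like (an added sketch, not the method developed in this thesis; it uses plain nearest-neighbor lookup where the thesis uses IHDR), one can record (end-effector position, joint angles) pairs during motor babbling and approximate the inverse mapping by lookup.

```python
import numpy as np

class NearestNeighborIK:
    """Approximate inverse kinematics from recorded (position, joints) pairs."""
    def __init__(self):
        self.positions, self.joints = [], []

    def record(self, position, joint_angles):
        # Called during motor babbling, with forward kinematics as ground truth.
        self.positions.append(np.asarray(position))
        self.joints.append(np.asarray(joint_angles))

    def solve(self, target):
        # Return the stored joint vector whose position is closest to the target.
        d = [np.linalg.norm(p - target) for p in self.positions]
        return self.joints[int(np.argmin(d))]

ik = NearestNeighborIK()
rng = np.random.default_rng(0)
for _ in range(1000):                     # babbling: random joints -> position
    q = rng.uniform(-1, 1, size=7)
    ik.record(position=np.tanh(q[:3]), joint_angles=q)  # toy forward model
print(ik.solve(np.array([0.1, 0.2, 0.0])))
```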
It is known that a robotic model that does not encompass learning will always require a human programmer to build its movement plans [47], which is not acceptable for an autonomous robot.

In the next chapter, we will introduce a new learning framework for robotics: a theory of developmental mental architecture and Dav's implementation of this architecture.

Figure 2.22: The movement of Dav's neck and head modules. The first, second, and third rows show the rotations of the neck's Roll, Pitch, and Yaw joints respectively; the fourth and fifth rows show the tilt and pan motions of the eyeballs respectively.

Figure 2.23: The movement of Dav's mobile base.

Figure 2.24: The movement of the arm's Joint 1, Joint 3, and Joint 4, from the top row to the bottom.

Chapter 3

Dav Architecture Design: A Theory of Developmental Mental Architecture

The software architecture of a developmental robot is a challenging new research subject. This chapter presents a theory of developmental mental architecture. Five architecture types, from the simplest Type-1 (observation-driven Markov decision process) to Type-5 (DOSASE MDP), are introduced. The properties and limitations of each simpler type are discussed before the introduction of the next, more complex one. Further, we present the architecture design of the Dav robot. The framework of the Dav architecture is hand-designed, but the actual controller is developed, i.e., generated autonomously by the developmental program through real-time, online interactions with the real physical environment. We present the Dav architecture and the major components that realize the architecture. The architecture designed for Dav is the next-generation version of its extensively tested predecessor, the SAIL developmental robot.

3.1 Introduction

Many different architectures have been proposed in the intelligent robot community. Robot perception and perception-based behavior generation have proved very difficult, especially in unknown or partially unknown environments. Some work on robot learning was motivated by human learning and development: from simple to complex. BAIRN (a Scottish word for "child") is a symbolic self-modifying information processing system used as an implementation of a theory of cognitive development [102]. Drescher [29] utilized the schema, a symbolic tripartite structure of the form "context-action-result," to model the ideas of child sensorimotor learning in Piaget's constructivist theory of cognitive development. Soar [56] and ACT-R [6] are two well-known symbolic systems that aim to model cognitive processes, although cognitive development was not the goal of those efforts. The model of Albus [4] takes sensors as input and produces control signals to effectors as output, thus allowing a finer numeric representation of sensory features when a specific task is given. The Finite State Machine (FSM) and its probability-based variant, the Markov Decision Process (MDP), are two general frameworks that have been used for conducting autonomous learning with a given goal in a symbolic world (e.g., Shen [82]) or with reinforcement learning (e.g., Kaelbling et al. [49]). The explanation-based neural network, called "lifelong learning," was used by Thrun [92].

Designing a program and its representation in a task-specific way using a traditional approach is typically very complex, ad hoc, and labor intensive. The resulting system tends to be brittle, especially in unknown and uncontrolled environments.
Recently, an important direction of research aims at reducing or avoiding the human-imposed limitations (e.g., features, models) of the world for better environmental adaptation in learning. Cresceptron [108] is a system that allows a human teacher to interactively segment natural objects from complex images, through which it incrementally grows a network that performs both recognition and segmentation. SHOSLIF [43], SARCOS [97], Cog [19], and Kismet [14] are motivated by simulating infant skills via learning through interactions with the environment.

The efforts discussed above were motivated by human learning and cognitive development to various degrees. However, these proposed architectures are fundamentally different from human learning and have not reached autonomous mental development (AMD) [118]. The SAIL robot [106] and the Darwin V robot [5] are two prototypes of developmental robots. Darwin V was designed to provide a concrete example of how the computational weights of neural circuits are determined, in a controlled environment, by the behavioral and environmental interactions of an autonomous device. The SAIL developmental robot was designed for developing perceptual and behavioral skills through interactions in uncontrolled, complex human environments (see the eight challenges for AMD in Section 3.2).

There has been a lack of systematic theory for developmental mental architecture. This chapter introduces a theory of mental architecture suited for autonomous development. As an architecture design example, it presents a new architecture for the Dav robot [37], shown in Fig. 2.1, the next-generation developmental robot after SAIL. The major differences between SAIL and Dav fall into two categories: the body and the "mind." Dav's body is substantially more sophisticated than SAIL's body, which has only a single arm. The software architecture of Dav takes advantage of the rich sensors, especially interaction sensors, available on the Dav body. The Dav architecture presented here is significantly more complete and integrated than SAIL's (e.g., the SASE concept) and has not been published before. However, it takes into account the extensive experience gained from its predecessor SAIL. Although the overall design of the Dav software architecture has not been tested yet, most of its key components have been successfully tested on SAIL. Architecture design for an intelligent robot is one of the most difficult research topics in intelligent robot research, especially for humanoids. As far as we know, no comparably extensive and complete developmental architecture has been published.

Section 3.2 outlines the AMD paradigm. A theory of mental architecture is presented in Section 3.3. Sections 3.4 and 3.5 present the software architecture of the Dav robot and its major components. Section 3.6 provides concluding remarks.

3.2 AMD Paradigm

As shown in Fig. 3.1, the AMD paradigm has two phases. In the first phase, construction and programming, the tasks that the robot will end up learning are unknown to the programmer. A task-nonspecific program, called the developmental program, is written during this stage. The most fundamental difference between a developmental program and a traditional program is that the tasks the machine will execute are unknown to the programmer at the time of programming.
In the second phase, autonomous development, the robot is turned on at time t = 0 and starts to interact with the physical environment in real time through continuous sensing and acting. Humans teach the robot to learn tasks from simple to complex via teacher-"arranged experience" called scaffolding. (Scaffolding is the process of using developed simple capabilities to further develop more complex capabilities, through further experience, with or without a teacher, without the need for manual modification of the developmental program. Lev Vygotsky [100] proposed the concept of the "zone of proximal development" (ZPD), a latent learning gap between what a child can do on his or her own and what can be done with the help of a teacher. Wood, Bruner & Ross [121] used the term "scaffolding" to describe such instructional support, through which the child can extend or construct current skills to higher levels of competence. Through this process, the scaffolding (arranged experience) is slowly removed.)

Figure 3.1: The autonomous development paradigm. Ecological conditions are given during the construction and programming phase; after the "release" the agent is turned on, and training and testing proceed while the agent senses and acts throughout the autonomous development phase.

Unlike a traditional non-developmental robot, a practical AMD robot must meet the following eight challenging requirements:

1. Environmental openness: Due to task non-specificity, the developmental robot must deal with unknown, uncontrolled, complex environments.

2. High-dimensional sensors: The developmental robot must directly handle continuous raw signals from high-dimensional sensors, e.g., vision, audition, and tactile sensors.

3. Completeness in using sensory information: Due to environmental openness and task non-specificity, it is not desirable for a developmental program to discard, at the design phase, sensory information that may be useful for future, unknown tasks. Certainly, it selects information relevant to a particular task after its task-specific representation is autonomously derived after "birth."

4. Online processing: At each time instant, what the machine will sense next is affected by what the machine does now.

5. Real-time speed: The sensory/memory refresh rate must be fast enough that a physical event (e.g., motion, voice, vision) can be sampled properly and processed in real time (e.g., at about 10 Hz). It must handle one-instance learning: learning from a single instance of experience. Time-consuming iterations should be avoided.

6. Incremental processing: Acquired skills must be used to assist in the acquisition of new skills, as a form of scaffolding. Each new observation must be used to update the current representation and discarded after updating.

7. Performing while learning: Conventional robots perform after they are built; a developmental robot must perform while it "builds" itself (mentally).

8. Scaling up to complex tasks: A developmental robot must be able to perform increasingly complex tasks as it gains experience.

After solving a series of technical problems, SAIL realized all of the above necessary capabilities. However, any realization of these capabilities has a level. The Dav developmental architecture aims to significantly advance that level, and adds the following new concepts and components:
1. The realization of the SASE agent model, which enables the robot to perceive and autonomously control the voluntary parts of its internal "brain" activities.

2. Integration of multiple sensorimotor subsystems for multimodal learning.

3. Integration of the architecture with a value system.

This chapter concentrates on these new concepts and new components of the Dav software architecture design.

3.3 Observation-driven Markov Processes

The architecture of intelligent agents is a complex issue. In this section, we introduce a series of architectures, from simple to complex, and their associated properties. We first define the concepts of the internal and external environments of an agent.

Definition 3.3.1 (Environment) The internal environment of an agent is the brain (or "the central nervous system") of the agent. The external environment consists of all the remaining parts of the world, including the agent's own body (excluding the brain).

Having defined the external and internal environments, we define the sensors and effectors associated with these concepts.

Definition 3.3.2 (Internal sensors and effectors) An external sensor S_e is a sensor that senses the external environment. An internal sensor S_i is a sensor that senses the internal environment. An external effector E_e is an effector that acts on the external environment. An internal effector E_i is an effector that acts on the internal environment.

Figure 3.2: The Type-1 architecture of a multi-sensor multi-effector agent: an observation-driven Markov decision process. Each square in the temporal streams denotes a smallest admissible mask. The Type-1 architecture takes the entire image frame without applying any mask. The block marked L is a set of context states (prototypes), which are clusters of all observed context vectors l(t).

3.3.1 Type-1: Observation-driven MDP

An agent has a number of sensors and effectors. Fig. 3.2 illustrates a multi-sensor multi-effector model of an agent. The agent A(t) operates at equally spaced discrete time instants t = 0, 1, .... We assume that an image is produced at each time instant by each sensor, independent of the sensing modality: visual, auditory, touch, etc. Without loss of generality, we assume that the agent has two external sensors and two external effectors. The external sensors S_e1 and S_e2 sense a random multi-dimensional sensory frame x_e(t) = (x_e1(t), x_e2(t)) at each time instant t, and the sensed signal is fed into the agent. The external effectors E_e1 and E_e2 receive from the agent an effector frame a_e(t) = (a_e1(t), a_e2(t)) at each time instant t.
Definition 3.3.3 (Type-1) The Type-1 mental architecture is a k-th order Observation-driven Markov Decision Process (MDP) [27, 122] that reduces the H, to contain only the last k observations: It = {xii xt—li "') xt-kipt—li "'vpt—k} as shown in Fig. 3.2. The random observations in 1; across time t = 0, 1, ...,t are the source from which the agent automatically generates states in the form of clusters l E C, where 5 consists of all possible observations of the last contexts L = {Ht | 0 S t}. The predicted consequence p, consists of predicted action at and the predicted value ”in Pt = (at/Ur)- A major difference between a regular MDP [49, 91] (or HMM [75]) and an observation-driven MDP are that the states 3, with a regular MDP are hand-designed by a human programmer but the states with an observation-driven MDP can be auto- matically generated (developed). With a regular MDP, the programmer must provide initial estimates for the prior probability distribution P(so), the state transitional probability P(st | x¢,st_1) and the state observation probability P(xt I 3,). It in turn, requires that the human programmer establishes a correspondence between the meanings of the physical events being modeled and the states. Due to the fact that physical events are not known at the time of programing for the developmental robots, 72 the regular MDP is not suited for the developmental program. In contrast, the obser- vation driven MDP requires no prior probability and all the probability distribution P(p, I l, = I) can be estimated incrementally on-the—fiy. Another important difference between the states in the traditional MDP and the observation driven MDP is the very different nature of the representation. The states in the former correspond to some objects or events in the world by human hand- design. The entire set of states is a monolithic representation of the modeled part of the world. That results in the high brittleness of the agent in dealing with unexpected events. In contrast, the states in the latter are clusters of observational vectors. They are not monolithic in the sense that an object in the world can correspond to many state clusters. Further, each cluster may correspond to an observation of several objects in the world. Therefore, there is no strict one-to-one correspondence between a state in the observation-driven MDP and an object of the world. It is the behavior of the agent (e. g., the same picking apple behavior from different views of the apples) that shows the discrimination and generalization power of the agent. In practice, we implement the regressor R using the Incremental Hierarchical Discriminant Regression (IHDR) [112,114]. Given any observed (last) context I (t), the regressor R produces multiple consequences (primed context) p1(t), ..., pk (t) with a. high probability: {1016), ---,Pk(t)} = B(Wll- (3-1) Thus, the regressor R is a mapping from the space of the last context [I to the power set of 'P: R : c H 27’. (3.2) R is developed incrementally through the real-time experience. For any t > 0 (after the birth), it is a total function since it is defined for all elements in [3, but it does not do well for most elements in L that it has not experienced. It is not an onto function since its range covers only a very small part of 2P. 73 Therefore, we need a value system that selects desirable contexts from multiple primed ones. The value system V( t) takes a set of (e. 
Therefore, we need a value system that selects a desirable context from the multiple primed ones. The value system V(t) takes a set of (e.g., k) contexts from the regressor R and selects a single context:

V(R(l(t))) = V(\{p_1(t), p_2(t), ..., p_k(t)\}) = p_i(t),    (3.3)

where 1 ≤ i ≤ k and k varies according to experience. The value function selects the consequence p_i(t) = (a_i(t), v_i(t)) that has the best value v_i(t); for example, V({p_1(t), p_2(t), ..., p_k(t)}) = p_i(t) if i = arg max{v_1(t), v_2(t), ..., v_k(t)}. The real-time Q-learning algorithm [104] can be used to estimate the value of each consequence p_i(t), i = 1, 2, ..., k, and the agent selects the one (action) with the highest value. Therefore, the value system V is a mapping from the power set of P to the space P:

V : 2^{\mathcal{P}} \to \mathcal{P}.    (3.4)

3.3.2 Type-2: Observation-driven Selective MDP

The Type-1 mental architecture is sensor-nonselective, in the sense that it is not able to actively select a relevant subpart of the sensory frame (intra-modal attention) or to attend to a particular modality but not the others (inter-modal attention).

Given a d-dimensional input vector x, attention can be modeled by an attention mask m, where m is a d-dimensional vector whose elements are either 0 or 1. Suppose that the input vector is x = (x_1, x_2) and the mask is m = (m_1, m_2). Then the corresponding attended input vector is x' = x ⊗ m = (x_1 m_1, x_2 m_2), where ⊗ denotes the vector pointwise product. Not all masks are admissible. For example, the set of admissible masks may consist of circles with different radii ρ at different center positions (r_0, c_0) of the image frame; then the attention-selection effector has three control parameters, (r_0, c_0, ρ). Without loss of generality in our theoretical discussion, we assume, as shown in Fig. 3.2, that there are only four admissible masks for each image frame at time t: the upper attention square, the lower attention square, and two trivial masks (no square selected and both squares selected). Suppose x = (x_1, x_2) is an input image frame, where x_1 and x_2 denote the subimages under the upper and lower attention squares respectively. With these four admissible masks, the attended image x' = x ⊗ m has only four cases: (x_1, x_2), (x_1, 0), (0, x_2), and (0, 0).

For clarity of discussion, we will use the notation of the theory of formal languages and automata. Assume that the subimage in each attention square can be represented as a string x ∈ Σ*, where Σ is the alphabet and Σ* is the set of all strings of finite length formed from the letters of the alphabet. Each attention square can contain a long string, as long as the image that it models has a sufficient number of pixels. Therefore, we can write x_1(t) = Karl's body, x_2(t) = trees, etc. A similar notation is applicable to effectors. For example, we can write a_1(t) = move-forward, a_2(t) = move-backward, where a_1 and a_2 are two independently controllable motors of the effectors E_e1 and E_e2.

Definition 3.3.4 (Type-2) The Type-2 mental architecture is a Type-1 mental architecture with the addition of an attention selector T,

T : \mathcal{Y} \times \mathcal{A}_i \to \mathcal{L},

as shown in Fig. 3.3, where Y is the space of all possible pre-attention contexts y = {l(t) | 0 ≤ t}, A_i is the space of all possible attention selections for T, and L is the space of attention-masked last contexts.
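The pointwise-product masking above is one line of array code; the sketch below (an added illustration with a hypothetical mask set) applies the four admissible masks of the running example to an image frame.

```python
import numpy as np

def attend(x, m):
    """Attended input x' = x (pointwise product) m, with a binary mask m."""
    return x * m

frame = np.random.rand(2, 8, 8)            # x = (x1, x2): upper and lower squares
masks = {                                   # the four admissible masks of Fig. 3.2
    "both":  np.ones((2, 1, 1)),
    "upper": np.array([1.0, 0.0]).reshape(2, 1, 1),
    "lower": np.array([0.0, 1.0]).reshape(2, 1, 1),
    "none":  np.zeros((2, 1, 1)),
}
attended = attend(frame, masks["upper"])    # keeps x1, zeroes out x2
```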
In order to show the properties of different architectures, we define a concept called "higher."

Figure 3.3: Progressive addition of architecture components from Type-2 to Type-5. Type-2: adding T and E_i1. Type-3: adding M and E_i2. Type-4: adding S_i and the primed sensation; the block marked D is a delay module, which introduces a unit time delay. Type-5: developmental T, R, M and V.

Definition 3.3.5 (Higher) Given a set D of tasks, we say that a developmental architecture A_2 is higher than another developmental architecture A_1 if, given the same teaching environment E, the architecture A_2 requires fewer statistically expected teaching examples than A_1, over the environment E and over the tasks in D.

Here the term "higher" is motivated by the concept of "higher" animals.

As a convention, we regard the environment as a part of a task. A task is more challenging if the environment is uncontrolled. In the above definition, since the architectures are developmental, the developmental program that guides the development of the architecture must be task-nonspecific. Otherwise, one could always define an architecture specialized for a particular type of task, and this architecture would require fewer teaching examples than a more general developmental architecture. The former, however, is not able to deal with other tasks that require more general cognitive capabilities.

Theorem 3.3.1 (Existence of a higher architecture) There is at least one class D of tasks, in which only a proper subpart of the sensory input is related to the tasks, and an associated teaching environment E, in which the Type-2 architecture is higher than the Type-1 architecture.

Proof: Assume a set D of tasks whose goal is to classify sensory information in a set C (e.g., human bodies). Without loss of generality, assume that at any time only one of the attention squares of sensor S_e1 contains an element of C (e.g., a human body) and the other window does not (e.g., natural background free of human bodies). For the Type-2 architecture, the teacher creates such a teaching environment. First, he teaches the attention-selection control of T (e.g., through reinforcement learning), so that T opens the two attention squares one after another in a loop (to simulate saccades). If the attention-masked input belongs to C, the regressor R produces the class label as the action output to an external effector (e.g., using supervised learning); otherwise, R produces no output to the external effector. Suppose that the unattended window contains no element of C but instead contains uncontrolled, changing natural settings that never repeat. The Type-2 architecture performs reasonably well, since at least one of the two fixations enables it to detect an element of C while disregarding the other square, which always presents a new natural background. In contrast, the Type-1 architecture can learn only from the monolithic input x(t) = (x_1(t), x_2(t)). Every vector x(t) is very different from all the others, because the natural settings of the environment E are always new; the Type-1 architecture almost never observes a previously learned input x(t) and thus almost never performs correctly. Therefore, for this class of tasks D and this type of environment E, Type-2 is higher than Type-1. □
It is not true that a higher architecture can learn faster than a lower architecture in any setting for any task. If the environment is such that the entire input vector x(t) contains only elements of C and nothing of the natural background (which is very rare in reality), then the attention-selection mechanism enabled by the Type-2 architecture does not help to reduce the number of training examples.

3.3.3 Type-3: Observation-driven Selective Rehearsed MDP

The Type-2 architecture does not have a motor mapping M. Therefore, it cannot rehearse an action sequence to evaluate its consequences without actually carrying out the action sequence.

Definition 3.3.6 (Type-3) The Type-3 mental architecture is a Type-2 mental architecture with the addition of an action releaser M,

M : \mathcal{P} \times \mathcal{A}_i \to \mathcal{P},

as shown in Fig. 3.3, where P is the space of all possible predicted consequences and A_i is the space of all possible attention selections for M.

The action releaser M is a special case of a more general motor mapping, which also generates representations of frequently practiced action sequences (e.g., using principal component analysis, PCA) so that smooth action sequences can be generated. The Type-3 architecture enables the agent to hold an action before it is released for execution in the environment. It makes it possible for the agent to generate several actions consecutively while holding each action without actual execution.

With a traditional MDP with hand-designed states, it is possible to compute all the possible next states and perform planning. The Q-learning method uses the estimated action value Q(s, a) of action a at state s to select the best action a* = arg max_{a ∈ A} Q(s, a) from the set A of all possible actions. This best next action a* maximizes the expected rewards in the future. This kind of approach has two fundamental problems. First, the agent has a rigid value system. No matter what value model is used, a finite-horizon model or a time-discounted model with a small or large discount factor γ, the agent cannot change the way the value is determined [49,91]. For example, if a time-discounted model is used, the agent is short-sighted: it always prefers multiple small rewards in the near future to a distant but important reward. Second, the agent is not able to learn to "think" (predict) using the events it has learned from experience. For example, being fed well and sleeping well can be a reasonable goal for a human infant, but the same value system is not suited to a human adult. The observation-driven MDP does not suffer from these limitations.

However, as soon as a predicted action is released, the effect it causes in the external world results, and such an effect cannot be undone. For example, suppose that the robot predicts (primes) an action of dropping a cup from a hand. As soon as it executes this action, the cup drops to the concrete floor and breaks. Can we design an architecture that enables the robot to "think" and "plan" a significant amount of time ahead before it releases an action?
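For reference, the standard tabular Q-learning backup contrasted above looks as follows (a generic sketch, not the real-time variant of [104]); the fixed discount factor gamma is exactly the rigidity criticized in the preceding paragraphs.

```python
from collections import defaultdict
import random

Q = defaultdict(float)            # Q[(state, action)], zero-initialized
actions = ["hold", "release"]
alpha, gamma = 0.1, 0.9           # learning rate and the fixed discount factor

def best_action(s):
    """a* = argmax_a Q(s, a): the rigid value-maximizing choice."""
    return max(actions, key=lambda a: Q[(s, a)])

def q_update(s, a, reward, s_next):
    """One-step backup toward reward + gamma * max_a' Q(s', a')."""
    target = reward + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

s = "cup-in-hand"
a = random.choice(actions)
q_update(s, a, reward=-1.0 if a == "release" else 0.0, s_next="cup-dropped")
```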
The brain does not need a receptor to sense the signals in the brain, since the brain signals are already in the desirable form. The concept of internal sensors and effectors is introduced to facilitate computational understanding of the SASE agent. Definition 3.3.7 (Awareness) If an agent A senses the states (presence, absence, difl'erent forms) of object b through its sensors, we say that the agent A senses the object b. If the agent is able to recall the association between the sensed various states of b and their resulting efiects in a task domain D sensed by the agent, we say that the agent is aware of object b in the task domain D. 79 Some explanation is in order here. By definition, an agent must use its sensors, the entry point of its sensory architecture (the input of T), to sense an object. For example, if the state of an object is fed into the architecture, e.g., through the middle of the regressor R, the motor mapping M or the value system V, the agent does not sense the object by the definition because the agent cannot use such information properly as it does with its sensors. In the above definition for awareness, we consider a task domain D because any awareness has a scope. A person who is aware of the boiling temperature of water in a domain (e.g., in a normal environment) may not necessarily be aware of the boiling temperature of water in another domain (e.g., lower in a low pressure environment). In the definition, we require that awareness must associate various states of b and the corresponding resulting effects. For example, if a human is not aware of the correspondence between various states of gravity (presence, absence, strong gravity) and the resulting effects (e.g., to a falling object), he is not aware of the object. With the above definition, we are able to deal with the issue of self-awareness and self-effecting. Theorem 3.3.2 (Necessary conditions of awareness) Suppose an agent is aware of its mental activities in a domain. Then the following points must be true: (1) It senses such activities using its sensors. (2) It feeds the sensed signal into its perceptual entry point just like that for external sensors. (3) It recalls the association between the difierent status of the activities and the resulting efi’ects to the environment. Proof: Point (1) is true because, according to Definition 7 for the awareness of an object, the agent must sense the object using its sensors. Point (2) is true because the status of the object must be sensed and fed into the entry point for sensors for proper perception and effect recall. Point (3) is true because the definition of awareness requires the recall of such an association. Cl 80 Based on the previous theorem, let us examine the issue of self-awareness. If an agent runs a Q—learning algorithm (or any algorithm for that matter) but it does not sense the algorithm using its sensors which are linked to its entry point for sensors, the agent is not aware of its own algorithm. For the same reason, humans do not sense the way their primary cortex works and, therefore, normally they are not aware of their own earlier visual processing. This early processing is subconscious, in the sense that it does not require a conscious decision. However, the voluntary part of the mental decision process does require a conscious, willful decision. Therefore, in the architecture design, the parts that require voluntary decisions must be sensed by the agent, and the sensed signals must enter through the entry point of the sensors. 
Definition 3.3.8 (Type-4) The Type-4 mental architecture is a Type-3 mental architecture in which, additionally, the internal voluntary decision is sensed by the internal sensors S_i and the sensed signals are fed into the entry point of sensors, i.e., the entry point of the attention selector T. In order to recall the effects of the voluntary actions, not only is the expected reward value estimated by the value system, but so is the primed context, which includes not only the primed action but also the primed sensation.

The architecture illustrated in Fig. 3.3 is a Type-4 architecture. Two voluntary internal actions are modeled: E_{i1} for attention selection and E_{i2} for action release. Both internal actions are sensed by the internal (virtual) sensors S_{i1} and S_{i2}, respectively. The rehearsed external action (not released) is sensed by the virtual internal sensor S_{i3}. Note that when the action is released, it is sensed as an external action from E_{e1} and E_{e2}. The primed sensation (which predicts the sensation of S_{e1} and S_{e2}) is used by the value system in selecting the best action according to, e.g., novelty (surprise) or the nature of the reward (e.g., sweet or bitter).

The regressor R maps each attended context l ∈ L to a set of multiple primed contexts, from which the value system selects a single primed context p ∈ P. In other words, the composite function of R followed by V gives a mapping $V \circ R : L \mapsto P$. With a SASE agent, both the external context (sensed by S_e) and the internal context (sensed by S_i) are available in l. Over a consecutive time series t = 1, 2, ..., k, the composite function V ∘ R performs a series of regressions, represented by a regression sequence

$$s = \big( (l_1, p_1), (l_2, p_2), \ldots, (l_k, p_k) \big),$$

where each regression pair (l_i, p_i) is an input-output pair of the composite function V ∘ R, i.e., p_i = V ∘ R(l_i), i = 1, 2, ..., k.

The link between two consecutive regression pairs can be realized by two paths, the external path and the internal path, denoted by e and i, respectively. If a part of the primed context p_i is executed by an external effector and the effect of the action on the external world is sensed by an external sensor S_e, the link corresponds to an external path. For example, if a mobile robot moves ahead, objects look closer after the motion. If a part of p_i is sensed by one or multiple internal sensors, the link corresponds to an internal path. For example, if the primed action is to jump, the following input to the regressor R will contain a representation of this action through a (virtual) S_i, whether or not the action is executed. Symbolically, the reasoning process can be represented by the following composite reasoning sequence:

$$s = (l_1, p_1) \xrightarrow{e_1/i_1} (l_2, p_2) \xrightarrow{e_2/i_2} \cdots \xrightarrow{e_{k-1}/i_{k-1}} (l_k, p_k), \qquad (3.5)$$

where $e_j/i_j$ over each arrow indicates that the link is realized by the external path, the internal path, or both. Whether the results of the external and internal paths are taken into account by the regressor in the reasoning process depends on the attention selection in T.

Definition 3.3.9 (External and internal reasoning processes) There are three types of reasoning processes, external, internal, and mixed, corresponding to whether the attention module T attends to external input, internal input, or both, respectively.

We can readily arrive at the following conclusions:

• Type-1 through Type-3 architectures allow the agent to perform external reasoning processes (i.e., practice grounded in the physical world).

• Type-1 through Type-3 architectures do not allow the agent to perform internal reasoning as defined.
For example, a Type-3 agent is not aware of the primed context because it does not have internal sensors.

• A Type-4 architecture is able to execute external, internal, and mixed reasoning processes.

One might conclude at this point that the internal process looks like "thinking." However, if the internal representation is hand-designed for a given specific task, the internal process is still far from what is called thinking in higher animals and humans.

3.3.5 Type-5: Developmental observation-driven SASE MDP

In the architectures discussed above, no requirements have been placed on how the architecture is created. Specifically, how should the mappings T, R (and L), M, and V be created?

Definition 3.3.10 (DOSASE MDP) The developmental observation-driven SASE MDP (DOSASE MDP) has an architecture of Type-4 or higher that satisfies the following requirements:

1. During the programming time, the tasks that the agent will learn are unknown to the programmer.

2. The agent A(t) starts to run at t = 0 under the guidance of its developmental program P_d. After the birth, the brain of the agent is not accessible to humans.

3. Human teachers can only affect the agent A(t) as a part of its environment, through its sensors and effectors, recursively: at any time t = 0, 1, ..., its observation vector at time t is the last context l(t); the output from A(t) at time t is its selected primed context p(t) ∈ P; and A(t-1) is updated to A(t), including T, R (and L), M, and V.

In contrast with the traditional MDP, the DOSASE MDP (A(t), P_d) is developmental in the sense that the developmental program P_d does not require a given estimate of the a priori probability distribution P(l) for all l ∈ L, nor even a given set of states. Consequently, P_d does not require a given estimate of the state transition probability P(l_t | l_{t-1}), nor of the state observation probability P(x_t | l_t). When the number of states is very large, it is practically sufficient to keep track of the states that have a high probability, instead of estimating the probabilities of all the states, which is too computationally expensive for real-time speed.

3.3.6 Type-6: Multi-level DOSASE MDP

The Type-5 architecture has only one sensorimotor level, although each of the mappings T, R, and M has multiple levels in its own internal structure. We call it a sensorimotor level because the pathway from T through R up to M corresponds to a pathway from sensory input to motor output. The primed context of such a level can be fed into another sensorimotor level for the following reasons:

1. Abstraction. While a low level is tightly linked to fine time steps, the higher levels become more "abstract," in the sense that the higher-level clusters of context states are coarser in temporal granularity and grouped more according to actively attended events. This is self-directed abstraction according to active attention. For example, when the agent visually tracks a walking person, the attended parts of the person's images are grouped.

2. Self-generated context. Voluntarily generated motor actions are allowed to serve as context input to the higher level. Thus the agent is able to "talk to itself." This capability also helps abstraction, because the variation of self-generated motor actions has a smaller within-class variation and a lower dimension than typical sensory inputs from the external environment.
3. Enabling a higher degree of sensory integration. It is not practical to integrate all the receptors in a human body with a single attention selection module T; otherwise, the attention mechanism becomes too complex. Sensors that are more closely related (e.g., touch sensors of the same finger) are integrated first by a low-level sensorimotor system, and the outputs of these sensorimotor systems are fed into the next level, which integrates a larger set of sensors (e.g., touch sensors of different fingers).

With the above considerations, we introduce the Type-6 architecture.

Definition 3.3.11 The Type-6 mental architecture is a Type-5 mental architecture with several levels of sensorimotor systems. Each Type-5 architecture is considered a sensorimotor system. The primed contexts from the lower-level sensorimotor systems are fed into the sensory inputs of the higher-level sensorimotor systems.

Fig. 3.4 illustrates the Type-6 architecture. The input to the attention selector T^(2) at level 2 includes the primed context p(t) = (x_p(t), a_p(t)) from level 1, where x_p(t) and a_p(t) are the primed sensation and the primed action, respectively. One or multiple sensorimotor systems can feed their primed contexts into a single sensorimotor system at the next higher level for sensory integration. It is possible to prove that there is at least one task set and associated teaching environment for which the Type-6 architecture outperforms the Type-5 architecture.

[Figure 3.4: The Type-6 architecture.]

We have systematically introduced six types of architectures. Although the order in which new capabilities are added is more a design choice than a necessity, the order used here is motivated by a relatively large payoff in capability for a minimal addition of architectural complexity.

3.4 Dav Architecture

The Dav architecture design was guided by the theory of architecture discussed in the previous section. Its architecture belongs to Type-6. The basic modules T, R, and M of the last section have additional functions in the Dav architecture: the attention selection module T discussed above corresponds to the sensory mapping, the regressor R corresponds to the cognitive mapping, and the action releaser M corresponds to the motor mapping. In the sensorimotor system of the Dav architecture, two cognitive mappings are used, implemented by the same engine, IHDR. The reality mapping R predicts contexts of the near (immediate) future, while the priming mapping F predicts contexts of the far future.

As shown in Fig. 3.5, the Dav architecture consists of parallel layers of sensorimotor levels. The lower the level, the smaller the spatial and temporal extent it deals with: the fewer receptors and effectors a level integrates, the smaller its spatial extent, and the lower the level, typically the shorter the temporal context it copes with. Thus, the lower the level, the simpler its perceptual and behavioral capabilities.

3.4.1 Open view

Fig. 3.6 gives an open view of a sensorimotor system. A sensorimotor system can be regarded as an observation-driven MDP. However, as we discussed in the last section, it is not the same as a traditional MDP with hand-designed states. Each circle in Fig. 3.6 indicates a local context state, corresponding to a single time frame. At each time frame, the system grabs the current sensory input x, which is used for updating the last context L.
Two mapping engines, R and F, are used to map from the last context L to a set of primed contexts P. In Fig. 3.6, a priming transition from one context to the next is indicated by a dashed line.

Fig. 3.7 illustrates how the temporal trajectory is generated from the visited context states. What happens in the physical world (including the action of the agent) and the internal world (including the internal action of the agent) jointly determines the last context, which in turn determines which context state is visited. The next context visited provides information for updating the primed contexts of the current context state. In Fig. 3.7, such updating is simply denoted as an addition to the set of primed contexts P.

3.4.2 Recursive view

The above open model of a sensorimotor system is a straightforward way to visualize the concept of context states, but it is not suited for showing how the architecture realizes the system. The recursive model of a sensorimotor system, shown in Fig. 3.8, better illustrates the computational architecture of a sensorimotor system. As shown in the figure, both the reality mapping R and the priming mapping F are accomplished by a mapping engine. Each context state is represented by a vector in the input space of the mapping engine.

With the recursive model, we can see that the effect of a value system V can be modeled as a mapping that takes multiple primed contexts from the reality mapping R or the priming mapping F and selects a preferred context. Its action part is sent to all the corresponding effectors. In the past, supervised learning has been separated from reinforcement learning, and reinforcement learning has worked only on an externally given scalar reward value. We treat the value system in a more general way. Its innate part is driven by physical pleasure (e.g., a sweet taste) or pain (e.g., an electric shock). Its learned part is responsible for learning the values of events from the external world (e.g., that working hard is good). Thus, the value system uses not only a reward value, but the primed context as well (including sensations and actions).

3.4.3 Architecture view

A more detailed block diagram of the architecture of a sensorimotor system is shown in Fig. 3.9. Each internal and external action output feeds back, through a delay unit, into the next sensory input. This is required by the SASE agent model: the agent must sense and perceive what it does, internally and externally. The input to a sensorimotor system, indicated by the two left-most arrows in Fig. 3.9, is its target for perception and cognition. A sensory mapping [127] is needed wherever an attention selection effector (local analysis) or dimension reduction is needed. As shown in Fig. 3.9, one sensory mapping is used to enable attention selection from the current sensory input vector, and another sensory mapping enables attention selection over the current input, the previous context, or both.

As mentioned earlier, each sensorimotor system has two cognitive mappings, the reality mapping R and the priming mapping F. A near-future context from R is useful for carrying out skilled procedures. A far-future context from F is needed to predict future consequences (e.g., 10 frames or more into the future). The source of input to the priming mapping has two alternatives: the same input as the reality mapping (as shown in the figure), or the primed context output from the reality mapping.
The former allows more accurate control of the timing in the primed context, while the latter may enable more efficient "abstraction" of the context in the priming mapping, since the primed context from the reality mapping has already taken the output action into account (e.g., via the discriminant analysis of the IHDR tree in Section 4.4). The priming mapping, implemented by a cognitive mapping engine (IHDR), needs a prototype updating queue whose function is to predict the far-future context using the Q-learning algorithm [41] in a recursive way. The reality mapping does not need such a queue because it only predicts the near future. The motor mapping of a sensorimotor system generates a concise representation for stereotyped actions, in addition to its action-release function.

3.5 Major components

The architecture in Fig. 3.9 divides information processing into three major mappings: the sensory, cognitive, and motor mappings. The sensory mapping takes sensory inputs. It may be followed by another sensory mapping to increase the extent of coverage in space and time. The sensory mappings are followed by a cognitive mapping. The basic difference between the sensory mapping and the cognitive mapping is that the sensory mapping does not use the information of the motor output for its feature-space development, while the cognitive mapping does.

The motor mapping has two functions: (1) generating a higher motor representation in space and time for the motor output space, and (2) receiving signals from cognitive mappings in terms of this higher motor representation and recovering the detailed control signal for every motor that it handles. In this section, we first discuss these three major mappings, next introduce an innate value system, and finally give an algorithm that summarizes the sensorimotor architecture.

3.5.1 Sensory mapping

The sensory mapping provides a representation for all possible receptive fields in space and time and allows attention selection. In the past, the role of representation has been well recognized, but the purpose of attention selection has not been adequately studied. Fig. 3.10 shows the hierarchical spatiotemporal organization of the sensory mapping. The major functions of the sensory mapping include:

1. Grouping different sources of sensory input (e.g., different pixels in an image, or visual and auditory inputs) in space and time. The term "sensory" should be understood as including both raw sensory inputs and processed internal signals.

2. For the grouped set of inputs, maintaining a complete representation for a hierarchy of all possible (sampled) receptive fields, for the purpose of attention. By sampled receptive fields, we mean a large but finite number of receptive fields at different positions and sizes in space and time.

3. Automatically deriving features in the combined space as a new representation. It cannot assume a predetermined representation and, therefore, does not use a symbolic representation.

4. Reducing the dimensionality of the new representation while minimizing the loss of necessary information.

5. Executing attention selection as an attention effector, controlled by signals from other parts of the brain (top-down control).

In [127], a simplified sensory mapping, the Staggered Hierarchical Mapping (SHM), is implemented. SHM uses the Complementary Candid Incremental Principal Component Analysis (CCIPCA) method to automatically develop orientation-sensitive and other needed filters.
In addition, the internal representation generated by SHM for receptive fields at different locations and sizes is nearly complete, in the sense that it does not lose important information.

3.5.2 Complementary Candid Incremental Principal Component Analysis

Complementary Candid Incremental Principal Component Analysis (CCIPCA) is a stochastic approximation algorithm that estimates eigenvalues and eigenvectors iteratively. It is the major tool used to implement the sensory mapping.

PCA is a well-known technique in data compression and feature extraction. It gives a linear transform that converts a set of d-dimensional data into a lower-dimensional space while minimizing the error in the least-mean-square (LMS) sense. Therefore, PCA is used to generate new representations from sensory inputs.

A well-known approach to PCA is to solve an eigensystem problem. Given the sample covariance matrix A of the data set, one can find its eigenvectors and eigenvalues, sort the eigenvalues in descending order, and construct a k × d matrix T whose rows are the eigenvectors corresponding to the largest k eigenvalues. The matrix T is the sought transform [35]. This is the basic idea behind most techniques for PCA, such as the QR method [36]. However, since this approach requires an estimate of the covariance matrix, the data set usually needs to be completely available at computation time. This is not appropriate for a developmental robot, for two reasons. First, a developmental robot is an online agent, which senses and responds to the environment continuously; it should not wait until all data are accumulated before processing. Second, when the dimension of the data is high, both the computation and storage complexity grow dramatically. For example, in the eigenface method [50] [94], one of the promising face recognition methods that involves PCA, a moderate grey-level image of 88 rows and 64 columns results in a 5632-dimensional vector. Since the sample covariance matrix of a data set of d-dimensional random vectors contains d(d+1)/2 independent numbers, this amounts to 15,862,528 numbers!

[119] proposes an algorithm, called complementary candid incremental PCA (CCIPCA), to incrementally compute the principal components of sequentially arriving samples without estimating the covariance matrix. Suppose that sample vectors are acquired sequentially, u(0), u(1), ..., where each u(n), n = 0, 1, ..., is a d-dimensional vector and d can be as large as 5000 or beyond. The d × d covariance matrix is neither known nor affordable to estimate and store as an intermediate result. Without loss of generality, [129] assumes that u(n) has a zero mean. The CCIPCA update is as follows:

$$v_i(n) = \frac{n-1}{n}\, v_i(n-1) + \frac{1}{n}\, u_i(n)\, \frac{u_i(n)^T v_i(n-1)}{\|v_i(n-1)\|}, \qquad (3.6)$$

$$u_{i+1}(n) = u_i(n) - \left( u_i(n)^T \frac{v_i(n)}{\|v_i(n)\|} \right) \frac{v_i(n)}{\|v_i(n)\|}, \qquad (3.7)$$

where u_1(n) = u(n), and v_i(n) is the estimate of the i-th dominant eigenvector of the samples' covariance matrix after the n-th sample has been collected.

The idea underlying CCIPCA comes from the concept of statistical efficiency [46]. Basically, the most efficient estimate is the one that has the least variance from the real parameter, and the variance is bounded from below by the Cramér-Rao inequality. For example, the sample mean, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, is the most efficient estimate of the mean of a Gaussian distribution with known standard deviation σ [33].
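To make the update concrete, the following is a minimal NumPy sketch of one CCIPCA step implementing Eqs. (3.6) and (3.7); the function name, the in-place update of V, and the initialization rule for not-yet-seen components are illustrative choices of ours, not the original notation.

```python
import numpy as np

def ccipca_update(V, u, n):
    """One CCIPCA step: refine the eigenvector estimates V (k x d)
    with the n-th zero-mean sample u (length d), n = 1, 2, ...."""
    k, d = V.shape
    residual = u.astype(float).copy()      # u_1(n) = u(n)
    for i in range(min(k, n)):
        if i + 1 == n:                     # first time this component is reached
            V[i] = residual                # initialize v_i with the residual
        else:
            v = V[i]
            # Eq. (3.6): incremental average of u_i(n) u_i(n)^T v / ||v||
            V[i] = ((n - 1) / n) * v + (1 / n) * residual * (residual @ v) / np.linalg.norm(v)
        # Eq. (3.7): deflate the sample to obtain u_{i+1}(n)
        vn = V[i] / np.linalg.norm(V[i])
        residual = residual - (residual @ vn) * vn
    return V
```

The norm of each row of V converges to the corresponding eigenvalue and the normalized row to the eigenvector; the amnesic variant in Eq. (3.9) below changes only the two averaging coefficients.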
For CCIPCA, by substituting Eq. (3.6) into itself recursively, it can be written as

$$v_i(n) = \frac{1}{n} \sum_{j=1}^{n} u_i(j)\, \frac{u_i(j)^T v_i(j-1)}{\|v_i(j-1)\|}.$$

Thus, v_i(n) can be viewed as the mean of the "samples" w_i(j),

$$v_i(n) = \frac{1}{n} \sum_{j=1}^{n} w_i(j), \qquad (3.8)$$

where $w_i(j) = u_i(j)\, u_i(j)^T v_i(j-1) / \|v_i(j-1)\|$. Although w_i(j) is not necessarily drawn independently from a Gaussian distribution, the estimate v_i(n) has a close relation to the estimate of a sample mean, which is the most efficient estimate. [129] proved that v_i(n) → ±λ_i e_i with probability one as n → ∞, where λ_i is the i-th largest eigenvalue of the covariance matrix of {u(n)} and e_i is the corresponding eigenvector.

The above analysis prompts a further improvement to CCIPCA. As shown in Eq. (3.8), all the "samples" {w_i(j)} are weighted uniformly when v_i(n) is estimated. Since v_i(j) is far from its real value at the early estimation stage, the generated w_i(j) can be regarded as a "sample" with large "noise" when j is small. To help the convergence of the estimation, it is preferable to give smaller weights to these early "samples." A way to implement this idea is to use an amnesic average, changing Eq. (3.6) into

$$v_i(n) = \frac{n-1-l}{n}\, v_i(n-1) + \frac{1+l}{n}\, u_i(n)\, \frac{u_i(n)^T v_i(n-1)}{\|v_i(n-1)\|}, \qquad (3.9)$$

where l is the amnesic parameter, typically ranging from 2 to 4. With l, Eq. (3.9) automatically adjusts the weights of old estimates and new samples. In summary, the CCIPCA algorithm is as shown in Fig. 3.11. CCIPCA has been shown empirically to be a very efficient estimation algorithm for high-dimensional data, compared with SGA and GHA [128].

3.5.3 The cognitive mapping: IHDR tree

The R and F mappings in Fig. 3.9 are implemented by two Incremental Hierarchical Discriminant Regression (IHDR) trees [44]. IHDR is not new to this thesis research. We first survey related regression methods, then outline the improvements over the old versions [111,113]. An experimental evaluation is provided at the end.

Learning a function f : X → Y in real time for high-dimensional data remains a nontrivial problem, particularly when the complexity of the function to be estimated is unknown. Surveying the statistical learning literature on regression, one can identify two classes of methods: global model fitting and local model fitting.

Global model fitting methods, including the Multi-Layer Perceptron (MLP) [12], Radial Basis Functions (RBF) [22], Lasso regression shrinkage [93], and Support Vector Machine based Kernel Regression (SVMKR) [88], are characterized by adjusting a predefined model over a pre-collected training data set by optimizing global criteria. They are not suited to real-time learning in high-dimensional spaces, despite their theoretical background. First, they require a priori task-specific knowledge to select the right topological structure and parameters, and their convergence properties are sensitive to initialization biases. Second, these methods are designed primarily for batch learning and are not easy to adapt to incrementally arriving data. Third, their fixed topological structures eliminate the possibility of learning increasingly complicated scenes.

Local model fitting methods (e.g., IHDR) use spatiotemporally localized (and thus computationally efficient) models, while growing their complexity automatically (i.e., the number and organization of local models) to account for the nonlinearity and the complexity of the problem.
They are better suited for incremental real-time learning, especially in situations where there is limited knowledge about the scene and the robot itself needs to develop the representation of the scene in a generative, data-driven fashion.

IHDR generates local models that sample the high-dimensional space X × Y sparsely, based on the presence of data points, in a vector quantization (VQ) [51] manner. IHDR enjoys two nice properties. First, IHDR automatically derives discriminating feature subspaces in a coarse-to-fine manner from the input space X. Discriminating features are derived automatically at the internal nodes of the tree. The features are most discriminative in the sense that they maximize the trace (or the determinant) of the between-class scatter. In this way, input components that are irrelevant to the mapping's output are disregarded, achieving better discrimination and generalization. Second, IHDR organizes its local models hierarchically, as shown in Fig. 3.14. IHDR's tree structure recursively excludes many far-away local models from consideration (e.g., an input face does not search among non-faces); thus, the time to retrieve and update the tree for each newly arrived data point x is O(log n), where n is the size of the tree, i.e., the number of local models. This extremely low time complexity is essential for real-time learning with a very large memory.

Two kinds of nodes exist in IHDR: internal nodes (e.g., the root node) and leaf nodes. Each internal node has q children, and each child is associated with a discriminating function

$$l_i(x) = \frac{1}{2}(x - c_i)^T W_i^{-1} (x - c_i) + \frac{1}{2}\ln\big(\det W_i\big), \qquad (3.10)$$

where W_i and c_i denote the distance metric matrix and the x-center of child node i, respectively, for i = 1, 2, ..., q. Meanwhile, a leaf node, say c, only keeps a set of prototypes P_c = {(x_1, y_1), (x_2, y_2), ..., (x_{n_c}, y_{n_c})}. The decision boundaries for internal nodes are fixed and have quadratic form. A leaf node may develop into an internal node by spawning (see Step 8 of Procedure 1, Appendix B) once enough prototypes are received.

The need to learn the matrices W_i, i = 1, 2, ..., q in (3.10) and to invert them makes it impossible to define W_i directly on the high-dimensional space X (the complexity of inverting a d × d matrix is O(d^3); when d is big, e.g., 5000, this is infeasible to compute online in real time). However, there is the empirical observation that the true intrinsic dimensionality of high-dimensional data is often very low [77]. As shown in Figs. 3.12 and 3.13, it is therefore possible to develop a most-discriminating-feature (MDF) subspace D to avoid the degeneracy and redundancy caused by irrelevant dimensions of the input data (the "curse of dimensionality" [10]).

As shown in Fig. 3.14, IHDR partitions the input space X into regions R_1, ..., R_N, each corresponding to a leaf node. In each of these regions, a linear regression model is used to represent the data samples received. Let x be the input query point. Approximating the function f is then accomplished by computing the local linear regression models and forming the final prediction as the weighted average of the individual predictions:

$$\hat{y} = \sum_{k=1}^{N} I_{R_k}(x)\, \hat{y}_k,$$

where $\hat{y}_k$ denotes the predicted output of the linear regression model covering region R_k (see Eqs. (A.5) and (A.6) in Appendix B). The weight $I_{R_k}(x)$ is an indicator function, denoting whether the input x belongs to region R_k:

$$I_{R_k}(x) = \begin{cases} 1 & \text{if } x \in R_k, \\ 0 & \text{otherwise.} \end{cases}$$
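As an illustration of how a query descends the tree, the following is a minimal sketch of retrieval using the discriminating function of Eq. (3.10); the node structure (children, center, W) is a hypothetical representation invented here for clarity, not the thesis's actual data layout.

```python
import numpy as np

def discriminating_fn(x, c, W):
    """Eq. (3.10): smaller values mean a better-matching child
    (negative Gaussian log-likelihood up to a constant)."""
    d = x - c
    _, logdet = np.linalg.slogdet(W)
    return 0.5 * d @ np.linalg.solve(W, d) + 0.5 * logdet

def find_leaf(node, x):
    """Descend from the root to a leaf in O(log n) time."""
    while node.children:                         # internal node with q children
        node = min(node.children,
                   key=lambda ch: discriminating_fn(x, ch.center, ch.W))
    return node                                  # leaf holding prototype set P_c
```

In practice the thesis evaluates these functions in each node's low-dimensional MDF subspace D rather than in the raw input space, precisely to avoid inverting a d × d matrix.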
Due to IHDR's hierarchical organization of these regions, the computational complexity of evaluating $I_{R_k}(x)$ is O(log N), where N is the number of leaf nodes. Therefore, the weights can be computed quickly even when the tree has grown very large.

In summary, the following improvements were suggested by this thesis:

1. An amnesic average is proposed to relieve the problem of initialization bias. At the beginning of the incremental procedure, because the number of samples is small, the clusters are usually ill-formed, and so are the boundaries of the clusters, which influences later partitions of the tree. We use the amnesic average technique to gradually discard the effects of earlier data. Suppose {x_i, i = 1, 2, ..., n} are the data entering the system sequentially. Written in incremental form, the sample mean $\bar{x}^{(n)}$ and scatter matrix $\Gamma^{(n)}$, after receiving the n-th sample, are given by

$$\bar{x}^{(n)} = \frac{n - 1 - \mu(n)}{n}\, \bar{x}^{(n-1)} + \frac{1 + \mu(n)}{n}\, x_n, \qquad (3.11)$$

$$\Gamma^{(n)} = \frac{n - 1 - \mu(n)}{n}\, \Gamma^{(n-1)} + \frac{1 + \mu(n)}{n}\, (x_n - \bar{x})(x_n - \bar{x})^T. \qquad (3.12)$$

Here μ(n) is the amnesic parameter which controls the update rate, depending on n in the following way:

$$\mu(n) = \begin{cases} 0 & \text{if } n \le n_1, \\ b\,(n - n_1)/(n_2 - n_1) & \text{if } n_1 < n \le n_2, \\ b + (n - n_2)/d & \text{if } n_2 < n, \end{cases} \qquad (3.13)$$

where b, n_1, n_2, and d are parameters. Let us remark that μ(n) is a piecewise-linear, non-decreasing function, and the weight of a new sample, (1 + μ(n))/n, approaches 1/d when n becomes large. This means that new samples still have an effect while old samples are "forgotten" gradually as n → ∞.

2. We use the augmented space Z = X × Y for clustering, while [111,113] use Y alone for labeling. This is desirable since it encourages a more elaborate representation using the information of the input space. For example, in supervised learning there are samples whose x parts may be dramatically different while their labels y are the same. In this case, using the y information alone for clustering renders those samples indistinguishable; the derived discriminating subspace D is then singular and the resulting tree is unbalanced. Furthermore, adding the X information may increase the locality of the generated clusters, which facilitates the linear regression model in a local region. We use the following distance metric for clustering:

$$d_i = w_x\, \frac{\|x - c_i\|}{\sigma_x} + w_y\, \frac{\|y - y_i\|}{\sigma_y}, \qquad (3.14)$$

where w_x and w_y are two positive weights that sum to one, w_x + w_y = 1; σ_x and σ_y denote the incrementally estimated average lengths of the x and y vectors, respectively; and c_i and y_i denote, respectively, the x-center and y-center of the i-th cluster of node c. It is worth noting that the parameters σ_x and σ_y should take account of the dimensionalities of X and Y. In most of IHDR's applications, dim(X) >> dim(Y) and, hence, the x-part would dominate the distance in Eq. (3.14) if the two parts were not weighted properly.

3. In [111,113], the nearest-neighbor model (which simply chooses the closest prototype and uses its output value) is used in the leaf node. Although the nearest-neighbor error rate is bounded above by twice the Bayes rate (the optimal error rate for a classification problem) [31], Atkeson et al. [9] point out that nearest-neighbor models may overfit linear areas, as shown in Fig. 3.15(a). We propose instead to use linear regression in the leaf node. Let x denote the query point, and assume IHDR finds the leaf node c whose prototype set P_c is

$$P_c = \{(x_1, y_1), (x_2, y_2), \ldots, (x_{n_c}, y_{n_c})\},$$

where, for notational simplicity, the output labels are taken to be scalar.
As in locally weighted learning (LWR) [9], let d(·,·) denote the Euclidean distance and K(d) a distance kernel function (e.g., $K(d) = \exp(-d^2/\sigma^2)$). We compute the diagonal weight matrix

$$W = \mathrm{diag}\big[ K(d(x_1, x)),\, K(d(x_2, x)),\, \ldots,\, K(d(x_{n_c}, x)) \big],$$

with each entry corresponding to a prototype in P_c. Let $Y = [y_1, \ldots, y_{n_c}]^T$ and $X = [x_1, \ldots, x_{n_c}]^T$. We formulate the problem as finding the parameter β such that the regression output is $\hat{y} = \beta^T x$ and the following cost function is minimized:

$$C = (Y - X\beta)^T W (Y - X\beta).$$

The solution of this weighted least-squares problem [81, p. 386] is

$$\beta = (X^T W X)^{-1} X^T W Y. \qquad (3.15)$$

It is important to note that $X^T W X$ is a d × d matrix, where d denotes the dimension of the vector x. If X is a high-dimensional space, then inverting such a matrix is impossible for a real-time application. However, we can conduct the regression in the discriminating subspace D (see Appendix A and [123]).

In the following, we experimentally show the augmented IHDR's feature extraction and real-time learning capabilities on an artificial data set. As a first test (2d-cross), we ran IHDR on noisy training data drawn from the two-dimensional function

$$y = \max\big\{ e^{-10 x_1^2},\; e^{-50 x_2^2},\; 1.25\, e^{-5 (x_1^2 + x_2^2)} \big\} + N(0,\, 0.1^2),$$

as shown in Fig. 3.16(a). This function is a combination of nonlinear and linear areas and an interesting test of the learning and generalization capabilities of a learning algorithm [99]: learning systems with simple models find it hard to capture the nonlinearity well, while more complex models easily overfit the linear areas. A second test (100d-cross) added 98 irrelevant noise features to the input, each with density N(0, 0.025^2), yielding a 100-dimensional input space. A third test (200d-cross) added another 100 irrelevant noise features with density N(0, 0.05) to the input space of the second test. The learning curves for these data sets are illustrated in Fig. 3.17(a). In all three cases, the normalized mean square errors (nMSE) are reported on an independent test set (1681 points on a 41 × 41 grid in the unit square of the input space). As the number of training samples increased, all nMSEs converged to good function-approximation results, namely nMSE < 0.03 after 100,000 training data points. Fig. 3.16(c) shows the learned surface of the third test after 100,000 training samples were presented. For comparison, under the same conditions as (c), Fig. 3.16(b) shows the learned surface using the nearest-neighbor model in the leaf nodes.

Fig. 3.17(b) illustrates the execution time of both the training and test processes with respect to the number of training samples. It is interesting to note that the execution time increases linearly with the number of training samples. The reason is that IHDR's updating and retrieval procedures have logarithmic complexity and, thus, the average time for adding a training sample or retrieving a test sample does not change much even though the tree has grown tremendously. Consider the third case (200d-cross): the execution time on the 200-dimensional data set is only about 500 seconds for 100,000 samples; in other words, the average time per sample is about 5 milliseconds. Therefore, IHDR is fast, which is extremely important for the real-time robot collision-avoidance experiments presented later.

To sum up, the power of feature extraction is due to IHDR's automatic derivation of discriminating features, and the real-time learning performance is achieved by the logarithmic complexity of IHDR.
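To make the leaf-node prediction concrete, here is a minimal sketch of the weighted least-squares solution of Eq. (3.15); the Gaussian kernel bandwidth and the small ridge term added for numerical stability are our own illustrative choices, not part of the thesis's formulation.

```python
import numpy as np

def lwr_predict(X, Y, x_query, sigma=1.0, ridge=1e-8):
    """Locally weighted linear regression at a leaf node.
    X: (n_c, d) prototype inputs; Y: (n_c,) scalar outputs."""
    d = np.linalg.norm(X - x_query, axis=1)       # distances d(x_j, x)
    w = np.exp(-(d / sigma) ** 2)                 # kernel weights K(d)
    XtW = X.T * w                                 # X^T W without forming diag(w)
    # Eq. (3.15): beta = (X^T W X)^{-1} X^T W Y, ridge-regularized
    beta = np.linalg.solve(XtW @ X + ridge * np.eye(X.shape[1]), XtW @ Y)
    return beta @ x_query
```

In the thesis this regression is performed in the low-dimensional discriminating subspace D, so the matrix to invert stays small regardless of the raw input dimension.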
3.5.4 Motor mapping

A major goal of the motor mapping is to generate a concise representation for stereotypical motor trajectories, so that skilled actions involving many motors can be represented by high-level motor primitives in a lower-dimensional space. As shown in Fig. 3.18, the architecture of the motor mapping is very similar to that of the spatiotemporal sensory mapping, but it works backwards: the motor mapping receives lower-dimensional "feature"-space signals (motor primitives) and synthesizes higher-dimensional raw motor signals.

The motor mapping has many issues in common with the sensory mapping. Instead of the attention selection of the sensory mapping, the motor mapping has a gating system. The gating system plays two roles. The first is gating proper, by which the motor mapping evaluates whether an intended action has sufficient action thrust to pass the gate threshold. The second is subsumption [18], which subsumes a lower-level action under the integrated action, as shown in Fig. 3.19.

The primed action from the cognitive mapping includes the desired control signal for each effector as well as an action thrust. The action thrust indicates how consistently the action is issued: the higher the action thrust, the more consistent the action generator is. Therefore, enough thrust must be generated to pass the gate before an action can be issued to the effector. Subsumption is needed where inconsistent actions for a single effector are issued from multiple sources, each with a different priority. One typical use is that an action derived from a higher level (more sensory integration) has a higher priority than an action derived from a lower level (less sensory integration). Subsumption belongs to the innate internal behaviors. When the robot becomes more mature, the selection of behaviors from low levels can be overridden by learned (voluntary) internal behaviors.

If a single motor is considered, the motor mapping includes only a gating system for that motor, together with the subsumption mechanism for integration from other sensorimotor systems, as shown in Fig. 3.19. Through developmental experience, motors that are highly correlated enable the growth of a new part of the motor mapping, denoted as an attached (top right) block of the basic motor mapping in Fig. 3.9. This new part plays the corresponding role of the gating system, but for correlated multi-motor actions. Furthermore, it performs not only the gating function, but also the reconstruction of higher-dimensional raw motor signals from a lower-dimensional "feature" representation (higher motor primitives). In [131], an action gating system was implemented on the SAIL robot to decide whether an action is actually triggered.

3.5.5 Innate value system

Given the last context l(t), the IHDR tree finds the best matched prototype l', which is associated with a set of primed contexts {p_1, ..., p_k}. How does the controller select the best primed context? A Q value is needed for each primed context. This is represented by a list Q = {q_1, q_2, ..., q_k}, in parallel with the primed contexts. Thus, each primed context consists of three components, (x_{p_i}(t), a_{p_i}(t), q_i), i = 1, ..., k.

Definition 3.5.1 The innate value system chooses the action that has the best value q.
The value system receives a list of primed contexts at each time instant t and an occasional environmental reward signal r(t), which comes from the biased sensors of the robot (e.g., aversive sensors such as a "bad button" or appetitive sensors such as a "good button"). Each element of the Q vector starts from an initial value, e.g., zero. The updating rule that we summarize here is adapted from Q-learning [103] and tested in [41]:

$$q(l', n) = \frac{n - 1 - \mu(n)}{n}\, q(l', n-1) + \frac{1 + \mu(n)}{n}\,\big( r(t) + \gamma\, V(p') \big), \qquad V(p') = \max_{1 \le i \le k} q_i,$$

where q(l', n) denotes the matched element of Q that has been updated n times, and l' is the best matched prototype for the current last context l(t). γ is a positive number between 0 and 1 used for discounting in time, and p' is the primed context found in the next time frame. To keep the temporal order of the latest retrieved primed contexts for real-time local propagation, a primed-context update queue (see Fig. 3.9) is needed. The future primed context prototype (including its reward) propagates back to the previous prototypes recursively for each time frame in the context update queue. Other prototypes (not in the queue) are not affected and thus are not updated.

3.5.6 Sensorimotor algorithm

The following is a summary of a sensorimotor system with an innate value system:

1. Initialize the mental cycle count to 0.

2. Grab the current input from the sensory mapping to form the last context l(t) = (x(t), a(t)).

3. Go through the sensory mapping to reduce the dimensionality of l(t) while applying the internal actions to the attention selection effector of the sensory mapping.

4. Retrieve the reality (cognitive) mapping R using l(t) to find the best matched prototype l' in the matched leaf node of the cognitive mapping. Its output part is a list of primed near contexts (A_r(t), X_r(t)).

5. The innate value system selects the near primed context p_r(t) from the list (A_r(t), X_r(t)).

6. Retrieve the priming (cognitive) mapping F from l(t) to produce the primed far contexts (A_f(t), X_f(t)).

7. The innate value system selects the far primed context p_f(t) from the list (A_f(t), X_f(t)).

8. Feed the actions in p_r(t) and p_f(t) through the motor mapping. The current internal action determines whether the primed near context or the primed far context receives attention in the motor mapping. The attended primed context is p(t).

9. Update both the reality and priming mappings using the primed context p(t).

10. If there is an imposed action from the environment (supervised learning), replace the corresponding part of a_p(t) in p(t) = (a_p(t), x_p(t)).

11. Spatially update the primed context lists (A_r(t), X_r(t)) and (A_f(t), X_f(t)) using the incremental vector quantization technique.

12. Temporally update the primed contexts in the prototype update queue for the priming mapping F. Push the current primed context p(t) into the prototype update queue from the tail and pop out the oldest primed context.

13. The motor mapping produces internal and external actions.

14. If the mental cycle time has not been used up, sleep for the remaining time so that the time actually spent by this cycle equals the fixed, pre-specified mental cycle.

15. Increase the mental cycle count by 1 and go to Step 2.

The SAIL robot has successfully tested these major components of the proposed architecture. For example, a Type-6 architecture (shown in Fig. 3.20) was used in action-chain learning [131]. The result of that experiment is promising, and we plan to implement the whole architecture on the Dav robot.
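The control flow of the algorithm can be summarized by the skeletal loop below; every mapping is a placeholder callable and all names are hypothetical, since the thesis specifies the steps, not an implementation.

```python
import time

def run_sensorimotor(system, cycle_sec=0.1):
    """Skeletal mental cycle (Steps 1-15); `system` bundles the sensory
    mapping, reality/priming mappings R and F, value system, and motor
    mapping as placeholder callables."""
    count = 0                                     # Step 1
    while True:
        start = time.time()
        l = system.grab_last_context()            # Step 2
        l = system.sensory_mapping(l)             # Step 3
        near = system.reality_mapping(l)          # Step 4: (A_r, X_r)
        p_r = system.value_system.select(near)    # Step 5
        far = system.priming_mapping(l)           # Step 6: (A_f, X_f)
        p_f = system.value_system.select(far)     # Step 7
        p = system.motor_mapping.attend(p_r, p_f) # Step 8
        system.update_mappings(p)                 # Steps 9-12 (imposed actions,
                                                  # VQ and queue updates)
        system.motor_mapping.release(p)           # Step 13
        time.sleep(max(0, cycle_sec - (time.time() - start)))  # Step 14
        count += 1                                # Step 15
```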
Other experiments on SAIL include: (1) vision-guided navigation [115], (2) grounded speech learning [116], (3) communicative learning [131], (4) novelty and reinforcement learning in the value system [41], and (5) sensory mapping development [127].

3.6 Summary

This chapter has outlined the major difference between a non-AMD robot and an AMD robot: an AMD robot's developmental program is task-nonspecific. Further, this chapter introduced eight challenging operational requirements of an AMD robot, which pose a series of challenges for the design of the architecture. The series of architecture types introduced here demonstrates several architectural limitations of current hand-designed MDPs and a possible route toward higher mental architectures. The proposed architectures leave much freedom for different implementations of their major components, e.g., the sensory, cognitive, and motor mappings. Although the Dav developmental program that is currently being implemented could be explained in further detail, such details are beyond the scope of this chapter. Dav's developmental architecture has yet to be implemented and tested, but its predecessor, SAIL, has successfully tested the major components of the designed architecture. The presented architecture will be tested in future studies and the performance will be reported.

[Figure 3.5: An outline view of the Dav architecture. The closed-loop control is realized by the sensors that sense each effector, as well as a set of rich sensors that sense the internal and external environments. w denotes the working memory and l the long-term memory.]

[Figure 3.6: An open view of a sensorimotor system. A circle designates a local context state determined by the last context L. An inward solid arrow indicates a sensory input at that time frame (including internal and external sensors). An outward solid arrow indicates an output at that state. A dashed arrow indicates an experienced transition between states. The lower part corresponds to the reality mapping R and the upper part corresponds to the priming mapping F. The two generally do not have the same number of states. The priming mapping is used to predict farther future contexts.]

[Figure 3.7: A temporal trajectory view of a sensorimotor system. Each circle represents a context state at a particular time. A shaded circle corresponds to a visited state. A dashed arrow indicates the priming of the context corresponding to the circle that the arrow points to. The priming arrows from a state are updated according to real experience. An actually visited state is not necessarily primed.]
[Figure 3.8: A recursive view of a simplified sensorimotor system. Not all components are shown.]

[Figure 3.9: A block diagram of the architecture of a sensorimotor system. The lower cognitive mapping realizes the reality mapping and the upper one realizes the priming mapping. GK: gate keeper, an internal effector that actively controls the update of the last context.]

[Figure 3.10: The spatiotemporal organization of areas in the sensory mapping. The ellipses represent receptive fields covering both space and time. Areas are organized in a hierarchical way, and the output of an earlier (lower-order) neural area is used as input to a later (higher-order) neural area. Along the pathway of information processing, a neuron in a later area has a larger receptive field than one in an earlier area. In order to make the correct signal available at the right time on the right input line, we use a time-shifting technique: acting as a time delay unit, the shifter moves each signal to the next line at each time instant.]

[Figure 3.11: The CCIPCA algorithm [119]: compute the first k dominant eigenvectors, v_1(n), v_2(n), ..., v_k(n), directly from u(n), n = 1, 2, .... For each sample and each i = 1, 2, ..., k: if i = n, initialize the i-th eigenvector as v_i(n) = u_i(n); if i < n, update v_i(n) by the amnesic rule of Eq. (3.9) and deflate the sample by Eq. (3.7); if i > n, set v_i(n) = 0 (not yet initialized).]

[Figure 3.12: Y-clusters in space Y and the corresponding X-clusters in space X. Each sample is indicated by a number denoting its order of arrival. The first- and second-order statistics are updated for each cluster. The first-order statistic gives the position of the cluster, while the second-order statistics give the size, shape, and orientation of each cluster.]

[Figure 3.13: The autonomously developed IHDR tree [114,123]. Each node corresponds to a local model and covers a certain region of the input space. A higher-level node covers a larger region, which may be partitioned into several smaller regions. Each node has its own most-discriminating-feature (MDF) subspace D, which determines its child models' activation levels.]

[Figure 3.14: Regions in the input space marked i.j, where i is the index of the parent region and j is the index of the child region. The leaves of the tree represent the finest partition of the space. The decision boundaries of the regions are of quadratic form.]
[Figure 3.15: (a) Fits using the nearest-neighbor rule. (b) Fits using locally weighted regression. (c) Fits using kernel regression. (a) and (b) are adapted from Atkeson et al. [9].]

[Figure 3.16: The result of IHDR on an artificial data set. Panels: the true function; the fitted function (nMSE = 0.025).]

[Figure 3.17: (a) The learning curves on the artificial data sets (2d-cross, 100d-cross, 200d-cross). (b) The execution time.]

[Figure 3.18: The motor mapping as a reverse application of the sensory mapping, but with signal reconstruction.]

[Figure 3.19: An illustration of a simple motor mapping for a single effector. The left part is a gating mechanism and the right part is a subsumption mechanism.]

[Figure 3.20: A hierarchical developmental architecture (Type-6) for SAIL procedure learning. Each Level Building Element (LBE) corresponds to a sensorimotor system.]

Chapter 4

Perception-driven Control Architecture

4.1 Introduction

From Chapter 2, we can see that conventional robotics relies on human-supplied equations of dynamics and kinematics to plan trajectories. This approach has proven very useful when those equations and an environment representation are available. However, it runs into difficulties when the system is extremely complicated and insufficient analytical knowledge is available. For example, humanoid robots have so many degrees of freedom that, even for simple actions, there is a nearly infinite set of joint position sequences that accomplish a single action [98]. Some researchers hence propose to learn actions through practice. Related work in this area includes LWPR on a SARCOS robot by Schaal et al. [97], the simulated 37-degree-of-freedom model by Mataric et al. [11], and the scheduling of degrees of freedom by Grupen et al. [26], motivated by early child motor development.

On the other hand, traditional approaches usually require a human-supplied representation of the environment. Robots designed with these approaches have difficulty adapting to unknown and changing environments, for example, in sensing and controlling a robot's arm to pick apples in an orchard. Further, the goal can change at any time (e.g., a larger apple becomes the target before the current one is picked). Some efforts have been made to program behavior-based humanoid robots (e.g., Kismet [14], Cog [13,65]). Recently, progress has been made in realizing robots that can develop their mental skills autonomously through what is called Autonomous Mental Development (AMD), which we discussed in Chapter 3.
In this chapter, we present a novel developmental perception-driven control architecture (DPDCA), which is adapted from the overall architecture presented in Chapter 3. The novelty of the DPDCA includes: (1) no task (goal) is given at programming time; (2) the task-specific representation is generated autonomously from the sensorimotor space, not in a 3-D world space; (3) the robot learns while performing; and (4) the robot is "alive" while humans interact with it. In the following, we first outline the proposed framework. We then illustrate the power of the DPDCA via Dav's range-based indoor navigation experiments.

4.2 Control Architecture

Machine perception has proven difficult. Programming perceptual capability using human-defined features (e.g., edges, colors, tones) provides a way to produce results in a controlled setting. A resulting fundamental limitation is that the robot does not work well in unknown, partially unknown, or changing complex environments. In this section, a Developmental Perception-Driven Control Architecture (DPDCA) is presented to take up these issues.

Definition 4.2.1 A sensory vector is a vector that contains the raw data from all sensors. A sensory vector is a stochastic process indexed by discrete time t, denoted by x(t).

The sensing modalities of a developmental robot typically include vision, audition, 3-D range sensors, joint encoders, tactile sensors, and strain gauges, plus "brain" internal sensors such as sensors that sense attention.

Definition 4.2.2 An action vector at time t is a vector denoted by a(t), which consists of the control signals sent to all effectors at time t.

Definition 4.2.3 A context at time t is the sensory information that is needed by the controller to generate a proper action at that time. With the current sensation denoted by a sensory vector (including sensing of actions) g(t) = (x(t), a(t)), the last context at time t is defined as l(t) = (g(t), g(t-1), ..., g(t-m+1)). Here m denotes the number of time frames in the context.

The context so defined includes only the latest m sensory vectors. It is desirable to use a small m in order to reduce the complexity of learning.
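As an illustration, the last context l(t) can be maintained as a simple sliding window over the most recent m sensory vectors; the class below is a hypothetical sketch, not code from the thesis.

```python
from collections import deque
import numpy as np

class LastContext:
    """Sliding window holding the last m sensory vectors g(t) = (x(t), a(t))."""
    def __init__(self, m):
        self.frames = deque(maxlen=m)     # oldest frames fall off automatically

    def update(self, x_t, a_t):
        self.frames.append(np.concatenate([x_t, a_t]))

    def vector(self):
        # l(t) = (g(t), g(t-1), ..., g(t-m+1)), flattened for the regressor
        return np.concatenate(list(self.frames)[::-1])
```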
[Figure 4.1: The decision process of an agent. The agent uses the last context l(t) = (x_l(t), a_l(t)) to predict the future context: the primed sensation x_p(t) and the primed action a_p(t).]

Definition 4.2.4 The control law of the DPDCA is a decision process R_t: g'(t) = R_t(l(t)), illustrated in Figs. 4.1 and 4.2. The primed (recalled) context vector g'(t) consists of two parts: the primed sensory vector x_p(t) and the primed action vector a_p(t). The action vector a_p(t) is the output signal vector sent to the controllers.

Rather than being programmed manually, the decision process (a function) R_t is developed over time. The function R_t depends on the robot's sensorimotor experience and can be regarded as an internal representation of the environment and of the robot's own body.

Definition 4.2.5 An internal representation R_t is the agent's representation of its own body as well as of the external environment. It is grown from the agent's continuous interaction with the external environment.

The representation R_t is an accumulation of the experience up to time t. R_t is generated, based on what we call a developmental algorithm, from the agent's own experience {g(τ) | τ = 0, ..., t}. R_t does not necessarily describe the world accurately; however, it may be improved by new experiences. R_t is not the monolithic representation typical of human-designed representations. Instead, it is high-dimensional, numeric, and hierarchical. Each experience g(t) must be used to update the representation R_t, which is then used for computing the primed action a_p(t) and the primed sensation x_p(t). Afterward, g(t) is discarded. Therefore, the entire experience series from time 0 to t-1 is not available at time t; the robot simply does not have space to store all of it. The representation R_t is thus crucial for keeping all the necessary information about what the agent has learned so far. The following mappings are needed to learn incrementally. (1) Representation updating from the current context input l(t) and the current representation R_{t-1}:

$$R_t = \mathcal{R}(R_{t-1}, l(t)). \qquad (4.1)$$

(2) Control law:

$$P = R_t(l(t)), \qquad (4.2)$$

where the set P is a list of primed contexts {g'_i(t) | i = 1, ..., k}, serving as an approximation of the distribution of primed contexts. Each element of P consists of three components, (x_{p_i}(t), a_{p_i}(t), q_i). The action of the primed context with the highest q value is chosen as the next output control signal a(t), that is,

$$a(t) = a_{p_c}(t), \qquad c = \arg\max_{1 \le i \le k} q_i. \qquad (4.3)$$

The "innate" value system V updates the q values using an augmented Q-learning algorithm [42], based on external rewards r(t). A detailed illustration of the DPDCA architecture is given in Figure 4.2. R_t is implemented by an IHDR tree, and ℛ is its updating function. IHDR learns incrementally by building a hierarchy of automatically derived discriminating feature subspaces. Given each input, it retrieves the output in logarithmic time.

[Figure 4.2: The architecture of the robot controller. The symbols a(t), x(t), and l(t) denote the action, sensation, and context at time t, respectively. The internal representation R_t is realized by IHDR. The world sensors include a range finder and cameras. Block V denotes the innate value system.]

[Figure 4.3: Flow chart of the learning procedure (the system learns while performing): grab the current sensory frame, derive an action, update memory.]

As shown in Figure 3.1, the robot trainers are an active part of the environment: they do "robot sitting," training the robot step by step and imposing reward or punishment according to the robot's current behavior. In each learning episode, if the trainer imposes a desired action on an effector, the robot complies, using its force sensors, depending on whether the imposed action is consistent with what the robot intends to do (see Fig. 4.3). If everything is fine, the action proceeds and the robot may receive a reward for it. On the other hand, if the robot is not doing what the trainer wants, it modifies its learned internal representation R_t so that it follows the trainer's instruction (the imposed action). This is a unified learning procedure that combines both supervised and reinforcement learning.

Major differences exist between the DPDCA and a traditional controller. The DPDCA has the following characteristics. It develops a controller (software) by learning tasks on the fly, from simple tasks (e.g., moving around according to what it "sees") to more and more complex tasks (e.g., learning sensing-based object manipulation).
The same developmental program runs continuously for learning all the tasks. Different environments and commands invoke different task executions. Other differences are summarized in Table 4.1. To conclude, let us remark that, compared with the control diagram in Fig. 2.15 as well as Eq. (2.33), the proposed architecture does not need a desired trajectory to be specified. In fact, the physical environment itself shapes the robot's behavior based on the current task context (if multiple tasks are considered).

Table 4.1: Differences between the DPDCA controller and the traditional controller.

DPDCA controller:
- No task (goal) is given at programming time.
- No 3-D Euclidean space (end-effector frame) representation is needed.
- When a human interacts with the robot, the robot is "alive" in the sense of learning to associate the current sensory context with the imposed action.
- The robot learns its dynamic behavior online.
- "Sensing and learning": the robot generates the task-specific representation autonomously, "on the fly."

Traditional controller:
- Tasks (goals) are given (e.g., trajectories).
- An inverse kinematic transform is needed, since planned trajectories are represented in the end-effector frame.
- A dynamic model of the robotic body is needed for tracking the trajectories.
- When a human overrides, the machine is dead (in a manual operation mode).
- The human programmer supplies the task-specific representation.

4.3 Dav's Range-based Navigation Experiment

4.3.1 Wall Following Behaviors

Our mobile humanoid robot, Dav, as shown in Fig. 2.1, was used to test the proposed DPDCA framework. In this experiment, we trained the robot to navigate along the loop shown in Fig. 4.4, which is on the second floor of the Engineering Building at Michigan State University. The floor plan covers an area of approximately 136 m × 116 m. We have four types of corners in this test site: 'L', 'T', '+', and 'Z'. Some training range images around those corners and intersections are shown in Fig. 4.5. Those corner types are typical for indoor navigation.

The training session was conducted under human control via the graphical user interface (GUI) shown in Fig. 4.6. The imposed action can be calculated from the mouse pointer's position (refer to Eq. (5.14) and Fig. 5.7). At each sampling instant, if an action was imposed by the trainer, an input range image, the robot's previous velocities, and the imposed action vector (the desired forward translational speed and angular speed) were fed into the IHDR tree for training. Otherwise, the laser input and the robot's velocities were used to retrieve the action sent to the low-level controller.

Figure 4.4: A map of the training site on the second floor of the Engineering Building at Michigan State University. The test loop (desired trajectory) is indicated by the thick solid lines. Five different corners are denoted by boldface Arabic numbers.

During the imposed-action cycle, it is worth noting that if the retrieved range image was similar to the current range image (small Euclidean distance), the old stored action value was replaced with the newly imposed one. First we took Dav for a training run. During the training session, we usually let the robot run on its own (the current version of the controller, i.e., the IHDR tree).
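The train-or-retrieve protocol just described might look as follows in code. The IHDR interface names (`train`, `retrieve_nearest`, `replace`) and the similarity threshold are assumptions for illustration, not the Dav implementation.

```python
import numpy as np

def navigation_cycle(ihdr, laser_scan, velocities, imposed_action=None,
                     similarity_threshold=0.1):
    """One sampling instant of the online training/testing protocol.

    Input vector: range image plus the robot's previous velocities.
    Output vector: (v, w) = forward translational and angular speed.
    `ihdr.retrieve_nearest(x)` is assumed to return the closest stored
    (context, action) pair; the replace-on-similarity rule follows the
    imposed-action cycle described above.
    """
    x = np.concatenate([laser_scan, velocities])

    if imposed_action is not None:
        # Trainer clicked in the GUI: learn the (context -> action) pair.
        stored_x, stored_a = ihdr.retrieve_nearest(x)
        if np.linalg.norm(stored_x - x) < similarity_threshold:
            # Similar stored range image: overwrite the old action label.
            ihdr.replace(stored_x, imposed_action)
        else:
            ihdr.train(x, imposed_action)
        return imposed_action

    # No imposed action: retrieve the learned action and send it on.
    _, action = ihdr.retrieve_nearest(x)
    return action  # (v, w), passed to the low-level controller
```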
If the robot's action was wrong, we either overrode it with a correct action or gave it negative feedback (a negative reward); meanwhile, Dav associated the current laser map with the imposed action, or attributed the negative reward to the current action. This training procedure resembles how babies learn to walk. When we were satisfied with the performance (e.g., a small cross-validation error), Dav could be set free.

Dav learned very quickly along the straight sections of the corridor; several instances of imposed actions were sufficient for a reasonable performance. More training samples were needed for Dav to turn correctly at the corners. After we trained the robot with 2,215 samples online, Dav successfully finished the desired trajectory (the test loop in Fig. 4.4). The distribution of the 2,215 samples is the following: 215 samples are from the straight sections; the other 2,000 samples are from corners, with 400 samples from each of the five corners.

Secondly, to illustrate the performance of the navigator, we partitioned the 2,215 samples into six bins, with the first five bins corresponding to the five corners in Fig. 4.4 and the sixth bin corresponding to the samples from the straight sections. We performed five tests. In each of these tests, one bin from a corner was left out for testing. In Fig. 4.7, we show the average error histogram of the five tests. The x-axis denotes the Euclidean distance between the true and the predicted outputs, while the y-axis denotes the number of samples. The range of the output $(v, \omega)$ is $v \in [0, 1]$ and $\omega \in [-\pi/2, \pi/2]$. The average normalized mean square error is 11.7%.

In the third experiment, we used the data set from the second floor as the training samples. In the test, a data set with 4,747 samples (manually labeled) from the third floor was used. Fig. 4.8 shows the error histogram, in which the x-axis denotes the Euclidean distance between the true and the predicted outputs. The normalized mean square error is 8.6%. By inspecting the testing samples with large prediction errors, we found that in most cases the labels of those testing samples were not consistent with the training samples.

In order to provide a sense of how stable the navigating behavior is in a different environment, we designed the fourth experiment. We trained the robot on the second floor while testing it on the third floor. Fig. 4.9 shows how the robot navigates on the third floor. The thick line in Fig. 4.9 denotes the trajectory of the robot. One pass of autonomous navigation in the tested site took about 37 minutes. The robot was observed to run continuously for 3.5 hours before the onboard batteries ran low. In the several such tests conducted so far, the robot always performed well, without hitting the walls or the objects on the floor. During these navigation tests, passers-by were allowed to pass naturally. The mobile robot was not confused by these passers-by, largely because the IHDR tree uses automatically derived discriminating features, which cover the entire scene. In addition, we measured the minimal clearance from the obstacles in those test sessions. In the worst case the clearance was about 50 cm, which happened when the robot was in a narrow corridor.

Figure 4.5: A subset of training samples. The red arrows denote the labeled heading for the robot.
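For reference, a normalized mean square error over the two-dimensional output $(v, \omega)$ can be computed as in the sketch below. The exact normalization behind the reported 11.7% and 8.6% figures is not spelled out in the text, so the per-component range scaling used here is an assumption.

```python
import numpy as np

def normalized_mse(y_true, y_pred):
    """Normalized mean square error over outputs (v, w).

    y_true, y_pred: arrays of shape (n_samples, 2) holding (v, w),
    with v in [0, 1] and w in [-pi/2, pi/2]. Errors are normalized by
    each output's range so the two components are comparable.
    (Assumed normalization; the text does not give the formula.)
    """
    ranges = np.array([1.0, np.pi])           # widths of [0,1] and [-pi/2, pi/2]
    err = (y_true - y_pred) / ranges          # per-component normalized error
    return np.mean(np.sum(err ** 2, axis=1))  # mean squared normalized error

# Example: two predictions versus ground truth
y_true = np.array([[0.5, 0.0], [0.8, 0.3]])
y_pred = np.array([[0.45, 0.05], [0.7, 0.2]])
print(f"normalized MSE: {normalized_mse(y_true, y_pred):.1%}")
```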
4.4 Summary and Future Work

The framework presented here does not restrict the scene type and, thus, is potentially applicable to other similar indoor environments. Instead of relying on particular scene features, the proposed controller uses the raw range image and automatically derived features (owing to IHDR's feature-extracting capability). Since no manually extracted features (e.g., lines and corners) are used in the system, the controller is less restrictive and more robust to the noise of sensors and effectors than manually designed controllers. Finally, let us remark that, although the range-based indoor navigation system is less challenging than a vision-based one, the result of Dav's indoor navigation is still very impressive. It shows the power of the DPDCA framework.

The future work includes applying this framework to the robot's eye-hand coordination task, which is more challenging because more degrees of freedom are used and high-dimensional vision is involved.

Figure 4.6: The graphical user interface used to train the robot online. The left window shows the current laser map in the robot's local frame. The trainer may impose an action by clicking the mouse button in the window. The red arcs illustrate the possible circular paths the robot may take. The right window shows the primed range image (retrieved from the IHDR tree).

Figure 4.7: The error histogram of the leave-one-out test.

Figure 4.8: The error histogram of the testing data set collected from the third floor.

Figure 4.9: A map of the test site on the third floor of the Engineering Building at Michigan State University. The thick solid lines denote the trajectory along which the robot traveled.

Chapter 5

Global Obstacle Avoidance through Real-time Learning

5.1 Introduction

Reactive obstacle-avoidance behaviors and limited look-ahead searching offer a solution for a mobile robot to navigate in unknown and dynamic environments. Although a substantial amount of work on robots' obstacle-avoidance behavior exists, many approaches are still restrictive. A common restriction is that the mapping between the sensory context and the desired behaviors is too complex to simulate accurately with a hand-written program.

It seems that real-time learning provides a promising alternative. Rather than using hand-designed behaviors in an off-line fashion, learning allows robots to explore the environments themselves while improving their performance through practice. During the learning, the robot associates the desired behavior with the current sensory context and plans ahead within a limited number of steps in real time. This architectural design endows robots with great flexibility to cope with unknown or changing environments, which are not easily handled by programmed-in behaviors.

5.2 Related Work

The problem of range-based obstacle avoidance has been studied by many researchers. Various reported methods fall into two categories: path planning approaches and local reactive approaches. Path planning approaches are conducted off-line in a known environment. It has been shown that path planning for a robot with bounded velocity and arbitrarily many obstacles is intractable (NP-hard) [21]. In Hwang & Ahuja [25, 45], an artificial potential field is used to find a nearly optimal collision-free path in a space. Such methods can handle long-term path planning but are computationally expensive for real-time obstacle avoidance in dynamic environments (moving obstacles).
The local reactive approaches are efficient in unknown or partially unknown dynamic environments, since they reduce the problem's complexity by computing short-term actions based on current local sensing. In the vector field histogram (VFH) method [13], a 1-D polar histogram representation is constructed to model the environment. The curvature-velocity method [84] and the dynamic window (DW) method [34] formulate obstacle avoidance as a constrained optimization problem in a 2-D velocity space. Obstacles and the robot's dynamics are considered by restricting the search space to a set of admissible velocities. Other local approaches include the following: The nearness diagram (ND) method [66] uses a high-level description of the environment to generate the motion signals via five laws. The ego-kinematic space method [67] suggests a nonlinear transform to relieve the problem of the non-holonomic constraint of two-wheel vehicles. In order to deal with local minima, limited look-ahead search strategies have been integrated: for example, the enhanced VFH algorithm (VFH* [95]) uses A* [79], and the global DW approach [16] uses NF1 [57].

A major challenge of scene-based behavior generation is the complexity of the scene. Human experts' knowledge has been used to design rules that produce behaviors using pre-specified features. In Lee & Wu [58], a fuzzy logic control approach is used to incorporate human expert knowledge for realizing obstacle-avoidance behaviors. One difficulty of a pure fuzzy approach is obtaining the fuzzy rules and membership functions. The neuro-fuzzy approach [2, 62, 126] was introduced with the purpose of generating the fuzzy rules and the membership functions automatically. Its training processes are usually conducted using a human-supplied training data set (e.g., trajectories) in an off-line fashion. The dimensionality of the input variables (features) is usually kept below 10 to be manageable.

In contrast with the above efforts, which concentrate on behavior generation without requiring sophisticated perception, another line of research deals with perception-guided behaviors. Studies of perception-guided behaviors have had a long history. Usually human programmers define features (e.g., edges, colors, tones, etc.) or environmental models [15, 43]. An important direction of research, the appearance-based method [23, 74], aims at reducing or avoiding those human-defined features for better adaptation to unknown scenes. The need to process high-dimensional sensory vector inputs in appearance-based methods brings out a sharp difference between behavioral modeling and perceptual modeling: the effectors of a robot are known with the former, but the sensory space is extremely complex and unknown with the latter and, therefore, very challenging.

In this chapter, we present an approach to developing an obstacle-avoidance behavior by a mobile humanoid through online, real-time, incremental learning. The learned behavior also integrates the result of look-ahead search seamlessly, to solve the "nearsightedness" of purely reactive approaches. By limiting the look-ahead search to be performed within real time, the path planner of this work is in the same spirit as the learning real-time A* (LRTA*) algorithm [54]. The major distinctions of the work include the following: First, we used the appearance-based approach for range-map learning, rather than an environment-dependent algorithm (e.g., obstacle segmentation and classification), for obstacle avoidance.
Second, a novel data fusion technique was used to integrate different modalities, e.g., the laser map, the robot's speeds, and the planned heading. Third, we extended the search scope of LRTA* to the range of the robot's sensing, and Dijkstra's algorithm was used. The new appearance-based learning method was able to distinguish small range-map differences that are critical in altering the navigation behavior (e.g., passable and non-passable sections). In principle, the appearance-based method is complete in the sense that it is able to learn any complex function that maps from the range-map space to the behavior space. This also implies that the number of training samples required to approximate the complex function is very large. To reduce the number of training samples required, we introduced an attentional selection mechanism [124], which dynamically selected regions in near proximity for analysis and treated other regions as negligible for the purpose of local obstacle avoidance. Further, online training was used so that the trainer could dynamically choose the training scenarios according to the system's current strengths and weaknesses, further reducing the time and samples of training.

The remainder of this chapter is organized as follows. Section 5.3 presents the proposed approach. The robotic platform and the real-time online learning procedure are described in Section 5.4. The results of simulation and real robot experiments are reported in Section 5.5. Discussions and concluding remarks are given in Section 5.6.

5.3 Approach

The sensitivity of the reactive approach to local minima leads us to divide the problem into two components: obstacle avoidance and a path planner. This combined framework can generate motions for mobile robots that accomplish their tasks while securely navigating in an unknown and dynamic environment (at low speeds).

5.3.1 Obstacle avoidance

We consider the obstacle-avoidance behavior as a decision process, which converts the current sensing, including the range image, the robot's velocities, and the desired heading, into an action. At this level, the robot does not sense and store the scene configuration (e.g., a global map of the environment), nor the global robot position. That is, we assume that the current sensing and heading contain all of the information that is sufficient for the robot to derive the next motor control signal. In effect, it uses the real world as its major representation. The goal of this component is to move safely according to the scene, meanwhile moving toward the goal.

The range scanner observes $r(t) \in \mathcal{R} \subset \mathbb{R}^n$ at time $t$, where $\mathcal{R}$ denotes the space of all possible range images in a specific environment. $r(t)$ is a vector of distances, whose $i$th component $r_i$ denotes the distance to the nearest obstacle at a specific angle. The vector $v(t)$ gives the current velocities of the vehicle at time $t$, which are measured by the vehicle's encoders. Let $\theta(t)$ denote the desired heading given by the path planner. The variables $r(t)$, $v(t)$, and $\theta(t)$ are given by different sensors or actions, whose dimensions and scales are different. We thus need a normalization procedure when fusing them together into an integrated context vector.

Definition 5.3.1 The vector $x(t)$ denotes the system's context of the environment at time $t$, defined as:

$$x(t) = \left( \frac{r(t) - \bar{r}}{w_r},\ \frac{v(t) - \bar{v}}{w_v},\ \frac{\theta(t) - \bar{\theta}}{w_\theta} \right) \in \mathcal{X}, \qquad (5.1)$$

where $w_r$, $w_v$, and $w_\theta$ are positive numbers denoting the scatter measurements¹ of the variates $r$, $v$, and $\theta$, respectively, while $\bar{r}$, $\bar{v}$, and $\bar{\theta}$ are their corresponding means.
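A minimal sketch of the fusion of Eq. (5.1), assuming the means and scatters of each modality have been estimated beforehand (e.g., from a calibration run); the dictionary layout and function names are illustrative assumptions.

```python
import numpy as np

def fuse_context(r, v, theta, stats):
    """Builds the normalized context vector x(t) of Eq. (5.1).

    `stats` holds per-modality means and scatters, e.g.
    stats = {'r': (r_mean, w_r), 'v': (v_mean, w_v),
             'theta': (th_mean, w_th)}.
    Each modality is centered and scaled so that range readings,
    velocities, and the planned heading are comparable in magnitude.
    """
    parts = []
    for value, key in ((r, 'r'), (v, 'v'), (np.atleast_1d(theta), 'theta')):
        mean, scatter = stats[key]
        parts.append((value - mean) / scatter)
    return np.concatenate(parts)

def scatter(samples):
    """Scatter under the isotropic assumption Sigma = sigma^2 I
    (see the footnote): w_u = sqrt(tr Sigma) = sqrt(n) * sigma."""
    sigma = samples.std()                     # pooled per-component sigma
    return np.sqrt(samples.shape[1]) * sigma
```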
The action vector $y(t) \in \mathcal{Y}$ consists of the control signals sent to all of the effectors at time $t$, where $\mathcal{Y}$ denotes the sample space of action vectors. The obstacle-avoidance behavior can be formulated as a mapping $f: \mathcal{X} \mapsto \mathcal{Y}$; i.e., the primed (predicted) action $y(t+1)$ (the signal sent to the motors) is a function of $x(t)$:

$$y(t+1) = f(x(t)). \qquad (5.2)$$

¹For simplicity, we assume the covariance matrix $\Sigma_u$ of a variate $u$ is equal to $\sigma^2 I$, where $I$ denotes an identity matrix. Thus, its corresponding scatter is $w_u = \sqrt{\mathrm{tr}\,\Sigma_u} = \sqrt{n}\,\sigma$, where $n$ denotes the dimension of the variate $u$ (see Mardia [64], page 13).

Figure 5.1: The overall architecture of the range-based navigation. "Attention" denotes an attentional module.

Fig. 5.1 shows the coarse architecture of the navigator. An IHDR tree, which we discussed in Section 3.5.3, is generated to estimate the control signal $y$ from $x$. The current range image $r(t)$, the vehicle's velocities $v(t)$, and the heading $\theta(t)$ from the path planner are used for deriving the next control signal $y(t+1)$. An attentional module is added to extract partial views from the whole view of a scan.

5.3.2 Attentional mechanism

Direct use of an image as a long vector for statistical feature derivation and learning is called the appearance-based approach in the computer vision community. Usually the appearance-based approach uses monolithic views, where the entire range data (or visual image) frame is treated as a single entity. However, the importance of the signal components is not uniform. There are cases where the appearances of two scenes are quite similar globally, but different actions are required. Further, there are cases where similar actions are needed although the appearances of the two scenes look quite different globally. Both cases indicate that there are critical areas whose differences critically determine the action needed. This necessitates an attentional mechanism to select such critical areas.

For example, in Fig. 5.2, (a1), (b1), and (c1) show three scenarios along the robot's path. Range images (a2) and (b2) are quite similar globally, judged from the entire image (except for the critical area on the left side). In the context of an appearance-based method, this means that the distance (e.g., the Euclidean one) between the two is small. Yet they require different actions: turning right for (a1), and going straight or turning left for (b1). In another case, range images (b2) and (c2) are very different globally, but their desired actions are similar: going straight or turning left. Thus, it is difficult to discriminate the three cases correctly by using a distance metric defined on the entire image. But if we look at the left regions in (a2), (b2), and (c2) of Fig. 5.2, we can see that the similarities and differences are clear.

Figure 5.2: Why is attention needed? The solid small circles denote the robot, with a short line segment at the front indicating orientation. Thick lines mark the walls along the corridor. $T$ and $\bar{r}$ denote the threshold and the mean, respectively. (a2), (b2), and (c2) are range images taken by the robot in the three scenarios, respectively. (a3), (b3), and (c3) are the corresponding images after passing through the attentional module. The diagrams in the lower two rows use logarithmic scales for the Y-axis. We can see that the distance between (a3) and (b3) becomes larger, while the distance between (b3) and (c3) becomes smaller.
Without a capability to attend to this critical region, the learning system requires significantly more training samples when complex scenarios are considered. In the above three cases, the critical area is the input component where the range readings are very small. This is true in general, because near obstacles determine the heading more than, and often take precedence over, far-away objects. As we shall see later, this can be accomplished by an attentional module. We first define the scalar attentional effector.

Definition 5.3.2 The operation of the attentional effector $a(t)$ for input $r(t)$ and output $z(t)$ is defined by:

$$z(t) = g(r(t), a(t)), \qquad (5.3)$$

where $\bar{r}$ denotes the sample mean of the raw signal $r(t)$.

For the intended application, we would like to have $a(t)$ behave in the following way. First, when all input components have large values, the attentional selection is in its default mode, turning on all components. Second, when there are nearby objects, the attentional selection activates only the nearby objects, which are critical for obstacle avoidance, while far-away objects are replaced by their mean readings. This attentional action can be realized by two programmed functions $g$ and $f$:

$$z_i(t) = g(r_i(t), a_i(t)) = \begin{cases} r_i(t) & \text{if } a_i(t) = 1, \\ \bar{r} & \text{otherwise}, \end{cases} \qquad (5.4)$$

and

$$a_i(t) = f(r_i(t)) = \begin{cases} 1 & \text{if } r_i(t) \le T, \\ 0 & \text{otherwise}, \end{cases} \qquad (5.5)$$

where $T$ denotes the attention threshold. Attention reduces the effective size of the input space that must be learned. Approximating the pre-attentional range map $r$ by a polygon $P$ with $h$ end points (see Fig. 5.4), the expected number $R(P')$ of pre-attentional maps that collapse onto a single post-attentional map $P'$ satisfies

$$E_{P'}\{R(P')\} > \frac{(1-p)\, h\, q^{h-1}}{1 - p^h},$$

where $q = (r_m - T)/\Delta$, $p = (r_m - T)/r_m$ (with $r_m$ the maximum range reading and $\Delta$ the range resolution), and $E_{P'}\{\cdot \mid C\}$ denotes the expectation on the condition $C$ that at least one end point lies within the half-circle of radius $T$.

Proof. Consider the cases where there are $k$ ($1 \le k \le h$) end points located within the half-circle $T$ (see Fig. 5.4). The number of possible configurations for the $h - k$ end points outside the circle, denoted by $s_k$, is

$$s_k = q^{h-k}, \qquad (5.8)$$

because the radial distances of the $h - k$ end points have the freedom to choose values from the interval $[T, r_m]$, which has $q$ discrete values. By definition,

$$E_{P'}\{R(P') \mid C\} = \sum_{k=1}^{h} s_k\, P(k \mid C),$$

where $P(k \mid C)$ denotes the conditional probability that $k$ end points are located within the half-circle $T$. We can see that $P(k \mid C) = C_h^k (1-p)^k / (1 - p^h)$. Therefore,

$$E_{P'}\{R(P')\} = \frac{\sum_{k=1}^{h} q^{h-k}\, C_h^k\, (1-p)^k}{1 - p^h} = \frac{\sum_{k=0}^{h} C_h^k (1-p)^k q^{h-k} - q^h}{1 - p^h} = \frac{(q + (1-p))^h - q^h}{1 - p^h} > \frac{(1-p)\, h\, q^{h-1}}{1 - p^h}.$$

In the last step, the inequality $(x + \delta)^n - x^n > n x^{n-1} \delta$, for $0 < \delta \ll x$, is used. □

Table 5.1 shows the lower bound of the size reduction due to attention at two typical parametric settings. We see that the reduction is large. Of course, the size of the remaining space to learn is also large. The ratio of the space reduced over the original space is roughly $p$.

Table 5.1: The lower bound of the reduction ratio at two parametric settings.

  Parameters                                                      Reduced size
  $r_m = 50$ m, $\Delta = 0.2$ m, $T = 2$ m, and $h = 10$         $9.51 \times 10^{20}$
  $r_m = 10$ m, $\Delta = 0.2$ m, $T = 2$ m, and $h = 15$         $7.52 \times 10^{22}$

Figure 5.4: (a) shows that the pre-attentional map $r$ is approximated by a polygon $P$. $P$ is specified by $h$ end points: $p_1, p_2, \ldots, p_h$. The $k$th end point is denoted by $(l_k, \alpha_k)$. (b) shows the post-attentional approximation $P'$, after the attentional function $g$. Those end points whose distances are larger than $T$ are set to $\bar{r}$. The half-circle with symbol $T$ shows the area the robot pays attention to. We can see that numerous pre-attentional approximations map to a single post-attentional approximation $P'$. It is clear that the points outside the half-circle of (a) have the freedom to change positions without affecting the shape of (b).
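The attentional effector of Eqs. (5.4) and (5.5) reduces to a few lines of code: components nearer than the threshold $T$ pass through, while far components are flattened to the sample mean. A minimal sketch:

```python
import numpy as np

def attend(r, T):
    """Attentional module of Eqs. (5.4)-(5.5).

    r: raw range image (1-D array of distances).
    T: attention threshold (e.g., 2 m in Table 5.1).
    Returns the post-attentional image z and the mask a: near readings
    (r_i <= T) are kept, far readings are replaced by the sample mean
    of r, so many globally different scans collapse to one attended view.
    """
    r = np.asarray(r, dtype=float)
    a = (r <= T).astype(int)           # Eq. (5.5): attention mask
    z = np.where(a == 1, r, r.mean())  # Eq. (5.4): far readings -> mean
    return z, a

# Example: a wall close on the left, open space elsewhere
scan = np.array([0.8, 1.2, 6.0, 7.5, 8.0])
z, mask = attend(scan, T=2.0)  # near readings kept, the rest set to the mean
```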
5.3.3 Path planner

The path planner gives the desired heading $\theta$ required by the obstacle-avoidance component, based on the robot's estimated pose and an occupancy model of the environment. We divide the path planning into two steps: First, a global search algorithm is applied to the occupancy grid model of the terrain. Second, to account for unknown and dynamic parts of the environment, a partial search algorithm is used within the scope of the robot's current sensing.

How to estimate the robot's pose is beyond the scope of this chapter; we assume the robot is given a method for calculating its coordinates in the occupancy grid via its sensing. For example, a GPS receiver gives the robot's location within a digital map when the robot navigates outdoors; for indoor environments, a Kalman filter (KF) estimates the robot's pose based on the positions of the observed beacons and odometry.

More formally, let $G = (V, c)$ be a weighted grid graph, where $V$ is a set of cells and $c$ is a cost function defined on the cells. In the grid graph, each cell $s$, except at the boundary, has eight direct neighbors, denoted by $\mathrm{Neighbor}(s)$. The cost of a cell $x \in V$ is given by $c(x)$, representing the likelihood of the cell being occupied by obstacles. For simplicity's sake we assume $0 \le c(x) \le 1$ (e.g., zero means the cell is not occupied, while one means the cell is occupied). We define a collection of paths through $G$, from the source $s$ to the destination $d$, as

$$\Gamma(G; s, d) = \{(s_0, s_1, \ldots, s_N) \mid s_{n+1} \in \mathrm{Neighbor}(s_n),\ s_0 = s,\ s_N = d,\ n = 0, \ldots, N-1\}.$$

The cost of a path is then defined by

$$c(s_0, s_1, \ldots, s_N) = \sum_{n=0}^{N} c(s_n), \qquad (5.9)$$

and we seek the minimal-cost path through the grid graph. It is well known that the path with minimal cost can be found via dynamic programming (DP) by recursively calculating $k(x)$ and $b(x)$, where $k(x)$ denotes the cost of the minimum-cost path from cell $x$ to $d$, and $b(x)$ the next cell from $x$ toward the goal cell along the minimal-cost path. $k(x)$ and $b(x)$ are defined respectively as

$$k(x) = \begin{cases} 0 & x = d, \\ \min_{y \in \mathrm{Neighbor}(x)} \{k(y) + c(x)\} & \text{otherwise}, \end{cases}$$

and

$$b(x) = \arg\min_{y \in \mathrm{Neighbor}(x)} \{k(y) + c(x)\}.$$

The path with minimal cost is then $(s_0^*, s_1^*, \ldots, s_N^*)$, where $s_0^* = s$ and $s_n^* = b(s_{n-1}^*)$ for $n = 1, 2, \ldots, N$. The minimal cost from the source to the destination is $c(s, d) = c(s_0^*, s_1^*, \ldots, s_N^*) = k(s)$. Since the cost function $c(x)$ is positive, Dijkstra's algorithm [28] can be used to search for the optimal path. If a priority queue is used, the complexity of the algorithm is $O(|V| \log |V|)$ [24].

Figure 5.5: The occupancy grid of an environment. Black cells denote occupied cells, while the others are free. Each cell has eight directly connected neighbors, N, NE, E, SE, S, SW, and NW, each of which corresponds to a heading angle $\theta$. The optimal path from S to G can be encoded by the heading sequence (E, E, E, NE, NE, NE, NE, N, N, N).

Now we can define the planned heading $\theta$ based on the function $b(x)$, as illustrated in Fig. 5.5. The heading $\theta(x)$ of the cell $x$ has eight possible directions: N (90°), NE (45°), E (0°), SE (-45°), S (-90°), SW (-135°), W (180°), and NW (135°). We define the heading $\theta(x)$ as the direction toward the next cell along the optimal path; e.g., the heading of the cell $x$ in Fig. 5.5 is NE.

The above-mentioned DP procedure computes the heading $\theta$ for the entire occupancy grid. This is motivated by the fact that the same heading can be reused for every location of the robot if the environment does not change. To account for discrepancies between the occupancy grid and the actual environment, re-planning is needed. For example, when the robot senses an obstacle blocking the planned path, the path is replanned from the robot's current pose.
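The DP recurrences for $k(x)$ and $b(x)$ can be computed with Dijkstra's algorithm expanding outward from the goal, as in the following sketch; tie-breaking and boundary conventions are simplified assumptions.

```python
import heapq

def plan_headings(cost, goal):
    """Dijkstra's algorithm on an 8-connected occupancy grid.

    Computes k(x), the minimal path cost from each cell x to `goal`,
    and b(x), the next cell from x toward the goal, matching the DP
    recurrences above. `cost` is a 2-D list of positive cell costs
    c(x); `goal` is a (row, col) tuple.
    """
    rows, cols = len(cost), len(cost[0])
    k = {goal: 0.0}
    b = {}
    heap = [(0.0, goal)]  # expand outward from the goal cell d
    while heap:
        kx, x = heapq.heappop(heap)
        if kx > k.get(x, float('inf')):
            continue  # stale queue entry
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == dc == 0:
                    continue
                y = (x[0] + dr, x[1] + dc)
                if not (0 <= y[0] < rows and 0 <= y[1] < cols):
                    continue
                # relaxation: k(y) = k(x) + c(y), i.e., pay the cost of
                # the cell being entered, as in the recurrence for k(x)
                ky = kx + cost[y[0]][y[1]]
                if ky < k.get(y, float('inf')):
                    k[y] = ky
                    b[y] = x  # next cell from y toward the goal
                    heapq.heappush(heap, (ky, y))
    return k, b
```

The heading $\theta(x)$ then follows from the offset between $x$ and $b(x)$, quantized to the eight compass directions of Fig. 5.5.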
It is neither necessary nor desirable to recompute the heading for the entire grid. Instead, the heading is computed within the scope of the robot's current sensing. In other words, this local-search strategy limits the look-ahead search to the vicinity of the area enveloping the robot's current laser map. More specifically, let $L(x)$ denote the cells within the range of the robot's sensing from the pose $x$, which is a subset of the whole occupancy grid $S$. $B(x)$ denotes the boundary of $L(x)$, not belonging to $L(x)$; i.e., $B(x) = \{z \mid z \in S - L(x),\ \exists y \in L(x)\ (z \in \mathrm{Neighbor}(y))\}$. Now we are ready to specify the local search scheme in Algorithm 1. Provided the cost of a cell is positive, Eqs. (5.10) and (5.11) can be calculated by Dijkstra's algorithm, and the complexity is $O(|L| \log |L|)$. By carefully choosing the size of $L$, Algorithm 1 can be realized on a real-time mobile robot.

Algorithm 1 Local search algorithm within $L$
1: if the planned path is obstructed by an obstacle then
2:   Let $x_{cur}$ be the robot's current pose.
3:   For each cell $x \in L(x_{cur})$, set $k(x)$ to its initial value $k_0(x)$ (e.g., a non-informative zero).
4:   for each cell $x \in L(x_{cur})$ do
5:     Look ahead and update values: update the heuristic estimate $k(x)$ by
         $$k(x) = \max\Big\{k(x),\ \min_{y \in B(x_{cur})} [c(x, y) + k(y)]\Big\}, \qquad (5.10)$$
       where $c(x, y)$ denotes the minimal cost of a path enveloped within $L$ from the cell $x$ to $y$.
6:   end for
7:   Choose a path to a boundary cell $y \in B(x_{cur})$ such that
         $$y = \arg\min_{y \in B(x_{cur})} [c(x_{cur}, y) + k(y)]. \qquad (5.11)$$
8:   Return the heading $\theta$ along the found path from $x_{cur}$ to $y$.
9: end if

Although it is not an exhaustive search strategy like NF1 [57], oscillating re-plannings occur rarely, only in badly conditioned obstacle configurations, and they never endanger the accomplishment of the task. More specifically, like LRTA* [54], the proposed re-planning scheme never fails to reach a goal, under the assumption that the $k$-values of all cells outside the space $L$ do not overestimate the true values. Our scheme is better than LRTA* on the oscillation issue, by employing look-ahead within the envelope of the current laser map, whereas LRTA* has a one-step look-ahead strategy. We obtained reasonable suboptimal solutions for almost all cases faced by mobile robots; for example, this strategy can deal with all kinds of U-shaped obstacles, as long as they are detectable within the robot's local sensing scope.

5.3.4 Cost function

The cost function of the paths $\Gamma(G; s, d)$ in Eq. (5.9) can easily be modified to incorporate the length and curvature of the path. It is desirable to obtain a short and less wiggly path. The modified cost function is then defined as:

$$c'(s_0, \ldots, s_N) = \alpha \sum_{n=0}^{N-1} c(s_n) + \beta \sum_{n=0}^{N-1} \mathrm{dist}$$