SYNTHESIZING REALISTIC ANIMATED HUMAN MOTION USING MULTIPLE NATURAL SPACES By Reza Ferrydiansyah A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Computer Science 2011 ABSTRACT SYNTHESIZING REALISTIC ANIMATED HUMAN MOTION USING MULTIPLE NATURAL SPACES By Reza Ferrydiansyah When animating virtual humans, it is important that the movements created are realistic as well as that they meet various constraints. One way to create motion, given a starting pose, is to first find an ending pose that meets the specified constraints. Then a motion that translates from the starting space to the ending space is computed. Traditional inverse kinematics method are able to find poses that meets constraints, however these poses are not always natural. Linear interpolation between a starting pose and ending pose can be used to create motion. Once again however, the interpolation method does not always create motion that is natural. This thesis proposes the creation of a natural space. The natural space is a hyperdimensional space in which every point in this space describes a natural pose. Motion can be created by traversing over the points in this space. The natural space is created by reducing the dimensionality of motion capture data using Principal Component Analysis (PCA). Points in the reduced space retain the characteristic of the original data. Multiple natural spaces are created on different segment of the human skeleton. This thesis describes a method to generate new constrained natural poses that are natural. The poses synthesized are more natural than traditional inverse kinematics, and single space PCA. Motion is created through a space consisting of pose configurations and angular speed. A method to generate realistic looking motion based on this space is presented in this thesis. Keywords: Principal Component Analysis, Inverse Kinematics, Computer Graphics, Human Animation, Naturalness, Skeletal Model Dedicated to my beautiful wife Andriana and my energetic son Rivian iv Acknowledgements It is finished. All praises to Allah, God of the Universe, through his blessings, I was able to finish this dissertation and obtain my PhD at this fine school. Throughout this long journey I have encountered so many people who have made this work possible. First and most importantly, my wife and soul mate, Andriana, who is my inspiration and who has kept me focused on finishing this work. My parents have always given me support and encouragement, even if they do not understand anything about computers, and without them I would not be where I am today. Of course, this work would be impossible to finish without my advisor Dr Charles Owen, who taught me what a PhD dissertation is and isn’t. Dr Owen’s advice was priceless through the whole process of finding a topic, doing the experimentation, and writing the thesis. I have also received generous guidance from my committee. Dr Yiying Tong helped me find the ‘Aha’ mathematical formula moment. Dr Frank Biocca and Dr Joyce Chai’s gave me a different set of very interesting viewpoints that made for some very interesting ideas. I would also like to thak the CSE 101 instructors: Dr Jo Smith, Kevin Ohl, Helen Keefe, Judy Eberlein, Erik Eid, and Vaughn Anderson of whose course I was a big part of during my PhD. Throughout my time here, I have been blessed with meeting other students who share the same enthusiasm for learning. I will always be thankful of Metlab’s Lisa Rebetsnitch, Annette Lettsome, Fan Xiao, Kayra Hopkins, Jerry Pinero, Zubin Abraham, and Sarah Coburn for their part in dissecting my work, often repeatedly. A special shout also for my fellow CSE 101 crew who I spent so much time with: John Hettinger, Jacob v Brown, Alex Peer, Jignesh Patel, and Mayur Mudigonda for their friendship and conversations regarding all things. It is also important for me to also acknowledge the people that have shaped my path towards where I am know. Dr Yudho gave me my first programming in Basic book in Melbourne. Dr Windy Gambetta and Dr Bambang Parmanto conspired to send me to the United States to pursue my studies. Special thanks to Dr Wayne Dyksen who championed me to the admission committee here at MSU. Finally I’d like to thank the Elhedon Indonesian community including Putri Arum Jati, Perdinan, Mujiburrohman, Dwi Agus Yuliantoro, Yudi Wicaksono, Ririn Seamount, Irfan Prasetya, Saraswati Haruming, Tiara Ahmad, Ainur Rosyid, and Arinanda Mamahit among others who have kept me sane during my PhD years. Is it finished? No, a Phd is just the beginning. Reza Ferrydiansyah vi Table of contents List of Tables ......................................................................................................... ix List of Figures ......................................................................................................... x 1 Introduction..................................................................................................... 1 1.1 The problem of natural movement.......................................................... 2 1.2 Current Methods for Natural Animation................................................. 5 1.3 Proposed Solution ................................................................................... 7 1.3.1 Thesis Contribution............................................................................. 9 1.4 Document Structure .............................................................................. 10 2 Pose and Motion Calculation ........................................................................ 11 2.1 Kinematics Method............................................................................... 12 2.2 Dynamics Methods ............................................................................... 15 2.3 Multiple Constraints.............................................................................. 17 2.4 Data driven animation........................................................................... 18 2.4.1 Motion blending................................................................................ 20 2.4.2 Motion Synthesis .............................................................................. 22 3 Natural Spaces .............................................................................................. 24 3.1 Creating Natural Spaces........................................................................ 24 3.2 Principal Component Analysis ............................................................. 27 3.2.1 Use of PCA in human animation ...................................................... 29 3.2.2 Creating a Pose Space from Motion Capture Data ........................... 30 3.2.3 Characteristic of Synthetic Data in Natural Space............................ 32 3.3 Multiple Natural Spaces........................................................................ 36 3.3.1 Effector Space................................................................................... 37 4 Poses ............................................................................................................. 39 4.1 Lookup strategy .................................................................................... 39 4.1.1 Results............................................................................................... 41 4.2 Inverse Kinematics in Pose Space ........................................................ 45 4.2.1 Conquering joint limits ..................................................................... 49 4.2.2 Naturalness........................................................................................ 50 4.2.3 Number of Dimensions ..................................................................... 52 4.3 Multiple PCA Space Inverse Kinematics ............................................. 54 4.3.1 Estimated Effectors Space ................................................................ 55 4.3.2 Matching overlapping angles ............................................................ 56 4.3.3 Weighting Overlapping Angles ........................................................ 57 4.4 Results................................................................................................... 58 4.4.1 Accuracy ........................................................................................... 58 4.4.2 Naturalness........................................................................................ 60 vii 5 Motion........................................................................................................... 67 5.1 Characteristic of natural motion............................................................ 67 5.2 Utilizing a Motion Space ...................................................................... 69 5.2.1 The Motion Space ............................................................................. 71 5.2.2 Calculating Motion ........................................................................... 72 5.2.3 Characteristic of Motion Speed ........................................................ 76 5.2.4 Speed Trajectory Matching............................................................... 79 5.2.5 Multiple Spaces................................................................................. 83 5.3 Results................................................................................................... 83 5.3.1 Motion Reconstruction...................................................................... 84 5.3.2 Pose Based Motion Synthesis ........................................................... 88 5.3.3 Speed Based Motion Synthesis......................................................... 89 6 Conclusion and future work.......................................................................... 95 6.1 Conclusion ............................................................................................ 95 6.2 Future Work .......................................................................................... 97 Appendix A. Multiple Space PCA Results ......................................................... 101 References........................................................................................................... 108 viii List of Tables Table 3-1 List of bones for each segment ......................................................................... 37 Table 4-1 Accuracy comparison of various pose generation algorithm ........................... 42 Table 4-2. The effect of different number of dimensions to naturalness and accuracy.... 52 Table 4-3. Accuracy comparison of all algorithms........................................................... 59 Table 4-4. Average log likelihood of all algorithms......................................................... 60 Table 4-5. Average log likelihood comparison of the Inverse Jacobian and Multi PCA algorithm ................................................................................................................... 64 Table 4-6. Average log likelihood comparison of the Single PCA and Single PCA + algorithm ................................................................................................................... 64 Table 4-7. Average log likelihood comparison of the Single PCA and Multi PCA algorithm ................................................................................................................... 65 Table 4-8. Average log likelihood comparison of the Single PCA+ and Multi PCA algorithm ................................................................................................................... 65 Table 5-1 Number of dimensions per M-space segment .................................................. 83 Table 5-2 Euclidian distance for the right hand segment motions.................................... 84 Table 5-3 Euclidian distance for the right foot segment motions..................................... 85 ix List of Figures Figure 1-1. An example of a skeleton model for human animation. The skeleton model is used based on [16]. ..................................................................................................... 3 Figure 3-1 Weight vs. Horsepower data and the PCA axis. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation. ....................................................................................... 27 Figure 3-2 Transformed data to PCA space...................................................................... 28 Figure 3-3 PCA data reduced to just one dimension ........................................................ 29 Figure 3-4 The coordinates of the right hand from both the original data (black) and the natural pose samples (grey). The top-left picture is the cutaway x-y coordinate view, the top-right, x-z and the bottom z-y......................................................................... 33 Figure 3-5 Naturalness of random poses generated by the P-space, random (with distribution), random, and from the motion capture data ......................................... 35 Figure 3-6 Division of the skeleton as shown in Figure 1-1 into multiple segments ....... 36 Figure 3-7 Location of effectors for effector space. The coordinates of joints that are colored in red are stored used to create a lower dimensional space. ........................ 38 Figure 4-1 Mean angle difference of poses created by CPG (both Jacobian and CCD) and iterative Jacobian method.......................................................................................... 43 Figure 4-2. Each frame shows the result of running the CPG algorithm for each of 6 poses. The pictures on the left side are the front view of the pose; the pictures of the right side are is from an angled view from the right side of the animated humans. The leftmost pose in each frame was created using the NSPA algorithm followed by the CCD algorithm, the middle pose was calculated using CCD, and the rightmost pose was calculated using the iterative Jacobian method. ........................................ 44 Figure 4-3. Graph of number of dimensions vs accuracy and naturalness ....................... 53 Figure 4-4 . Graph of accuracy of all algorithms.............................................................. 60 Figure 4-5. Average log likelihood of all algorithms graph ............................................. 62 Figure 4-6. Unnaturalness is caused by extreme leg movement in single space PCA. The figures were created by (from left to right) regular Jacobian, single space PCA, single space PCA +, multiple space PCA ................................................................. 63 x Figure 4-7. Unnaturalness is caused by extreme leg movement in single space PCA. The figures were created by (from left to right) regular Jacobian, single space PCA, single space PCA +, and multiple space PCA .......................................................... 64 Figure 4-8. Graph of pairwise comparison between algorithms....................................... 65 Figure 4-9 Sample result poses for 2 constraints (hands) problems ................................. 66 Figure 5-1 The speed and angles of natural motion and a linear interpolated motion...... 68 Figure 5-2 Motion vector representation used in this thesis ............................................. 72 Figure 5-3 Total speed, angles, and angular velocity of motion calculated using M-space. ................................................................................................................................... 76 Figure 5-4 Mean error of speed reconstruction on original samples ................................ 77 Figure 5-5 Mean speed trajectory of running motion and walking motion ...................... 78 Figure 5-6 Total speed, angles, and angular velocity of motion calculated using Speed Trajectory Matching.................................................................................................. 82 Figure 5-7 Comparison of walking motion....................................................................... 86 Figure 5-8 Speed and angle comparison of right hand and right foot .............................. 87 Figure 5-9 Creating new pointing motions using natural spaces ...................................... 89 Figure 5-10 Effect of a different speed trajectory to motion ............................................ 90 Figure 5-11 Speed and angle per frame after fitting with new trajectory......................... 91 Figure 5-12 Speed trajectories for walking and running motion ...................................... 92 Figure 5-13 Walking and running motions generated from prototype speed trajectories. The top figures show the original motion. The figures on the second row shows generated walking motion, while the figures on the third row shows generated running motion.......................................................................................................... 94 Figure 6-1 Results of running the pose generation algorithms on right hand constraints only ......................................................................................................................... 102 Figure A-2 Results of running the pose generation algorithms on right hand and left hand constraints ............................................................................................................... 103 Figure A-3 Results of running the pose generation algorithms on right foot constraint only ......................................................................................................................... 104 xi Figure A-4 Results of running the pose generation algorithms on right and left foot constraints ............................................................................................................... 105 Figure A-5 Results of running the pose generation algorithms on hands and foot constraints ............................................................................................................... 106 xii 1 Introduction Object animation in computer applications is virtually a requirement in current computer technology. With the rise in ease of access to computer graphics technology, animation is now used in a wide range of applications as an interface element. Animation describes the temporal manipulation of elements in a computer graphics system. Animations describe scenarios, events, and other information allowing users to quickly understand what is happening. Animation allows users to easily manipulate scenarios as the application allows, allowing better understanding and insight in the various scenarios and the relation of objects in those scenarios. One of the most important fields of endeavor in computer graphics and animation is the creation of apparently realistic animations The creation of animated virtual humans can be thought of a sub-problem within this field. The terms animated human or virtual human in this paper refer to human characters that has been rendered by the system on a display or to use as an element of a virtual/augmented reality system. An autonomous character that can make its own decision based on an algorithm (albeit in a limited way) is called an agent. An agent that is presented as a virtual human is referred to as an embodied agent. Creating realistic human animation requires realistic modeling and rendering of objects, modeling of the object’s physical characteristics, creation of realistic object behaviors, and object interaction with the user and other objects in an environment [1]. Virtual embodied agents are graphical renderings that represent humans in virtual and augmented environments. Humans respond to animated humans whether controlled by a real person, or by an algorithm [2]. Virtual humans have been used as characters in animated movies [3], actors in interactive story systems [4-6], controllable agents in computer games, tutors in educational software [7-9], and presentation agents [10]. They can also appear as a guide to a 1 person in an unfamiliar land, serve as a trainer who demonstrates and oversees, or provide the interactive component of a future user interface. These virtual humans can also be controlled by humans or act in a predefined sequence of actions. Due to our everyday interaction with other humans, we are used to seeing and noticing natural movement everyday. When embodied agents do not act naturally even only slightly, the user’s focus will be distracted by the unnaturalness of these virtual humans [11-13]. A prime example of this is in the movie Polar Express, where critics said that the movie was a good movie story wise, but they were bothered by the unnaturalness of the eyes [3]. Unfortunately this was one of the bad points of the movie which was oft repeated. 1.1 The problem of natural movement There are two main reasons why the animation of natural movement is difficult. First, the human body consists of a set of parts which are joined together. Each part moves according to a specified set of degree of freedoms. Animation for a virtual human body is usually achieved through the use of a skeleton model [14], [15]. The skeletal model consists of various joints. Each joint has up to 3 degrees of freedoms. The joint allows the bone to rotate in the x, y, and z axis. The rotations are adequate to simulate any human pose. Chapter 2 discusses the formulas used to calculate position based on these bone rotations in more detail. Figure 1-1 shows the skeletal model used for this thesis. This skeletal model is based on the skeletal model for the ASF specification [16]. The skeletal model consists of 30 joints and 56 degrees of freedom. 2 Figure 1-1. An example of a skeleton model for human animation. The skeleton model is used based on [16]. Because there are multiple joints and bones, the human body is flexible enough that it is able to achieve the same task with multiple poses. There are, for example, different poses for a person to retrieve something that is on top of a table. There are many possible different combinations of angles for each pose. Many of them will be deemed unrealistic to the trained human eye. This flexibility is great for humans, but a headache for animators. Out of all these possible poses, an animator must choose a ‘best’ one. This usually means that the animator must choose the most natural pose or the pose that sets the human up for the next movement sequence. 3 What does it mean to have a natural pose? In this paper, naturalness is defined simply as the pose selected by most people given the same constraints. The constraints are positions and orientation of body parts, positions of obstacles, and external physics constraints (such as gravity or motion). Whether a motion or even a pose is deemed to be natural or not will be observed by different criteria. Naturalness may be graded based on energy, meaning it will be based on the notion that the human will take as small, or least costly, a motion as possible to achieve the constraint, or to hold that constraint for a long time. Naturalness may also be based on the starting pose. If we start with an awkward starting pose, the natural thing to do to get to a constraint may simply continue with the awkward pose even though it is inefficient in the long run. Another very important factor is the environment. For example the position of target objects (where the animated human wants to touch or avoid) is obviously a primary factor in the resulting pose or animation. The positions of other (non-target) objects are also important because animated humans in most cases are assumed to follow the same physical rules that real humans adhere to. Therefore it is very important that human body parts do not, for example, go through any of the available objects. Because of interaction requirement, sometimes a virtual human must face a certain direction. Humans naturally point their eyes at what they are attending to or attempt to maintain eye contact with the user. Eye contact can also be used as cue for the users [17]. For example if the application wants to induce the user to pay attention to a particular object, the virtual human may be made to look at the object. 4 Finally naturalness may also depend on emotions and mood. People with different emotions tend to perform different types of actions [18]. A naturally disgusted motion will be different than an angry motion. A successful virtual human agent must take all of these factors into account when performing a particular motion. 1.2 Current Methods for Natural Animation In applications where the virtual agent has no need to think, plan or interact with the environment, or in other words have no autonomy, an artist determines the action of the virtual human at each time step. There is no internal agent representation of the current state or of the actions performed by the agent. Everything in the world is fully controlled by the world designer. This makes this class of agent relatively easy to create. However, it can be quite tedious to create animations frame by frame. . Tomlinson [19] refers to agents having non-autonomous behavior as linear agents because this method is not suited to characters that must interact with users. Consequently this method is of more interest to those creating motion pictures or other static animations [20], [21]. Tomlinson describes the various differences between linear and autonomous agents as well as differences in the applications and usage of these agents. Another way to create linear animations is through the use of scripts. Scripts are code in a human-readable language that will be translated by the animation engine into movements. Script languages are usually created by the animation engine developer and used by the artist to create the animation. Some example of systems based on scripting language are the Improv system [22], and STEP [23]. Similarly the Jack architecture uses a natural language system to describe motion and intention [24]. 5 The main difference between scripts and manually created animation is that scripts are internally known by the agent. Scripts can also make it easier to create new animations as it is quite easy to copy movement from one part to another (or from one agent to another), and simple programming constructs such as loops, sequential presentation, and blending of actions can be easily created. Because an artist creates individual frames, it is the responsibility of the artist to create the most natural animation. As artists, they may have the knowledge and the experience to do this, possibly to even do this well. In many cases the results are human animations that are very natural. Of course the drawback to this method is that it will take a significant amount of time for the artist to create each frame and a significant amount of talent on the artist’s part. Work has been done to create animations that are fully computer generated. Given a set of constraints, an algorithm calculates the correct angles depending on the starting position and other factors. An algorithm that calculates a set of angles based on one or more target position is classed into the inverse kinematics method [25]. The forward kinematics method calculates the position of each point on the body based on the angles of the joints In most cases of human animation, inverse kinematics methods are quite difficult to calculate algebraically. Therefore inverse kinematics solutions are often solved iteratively. The number of iterations may be quite high and therefore an application may take some time until a solution converges. Simple inverse kinematics solution algorithms such as the iterative Jacobian, do not necessary consider naturalness as a goal. The main objectives of these algorithms are simply to get a correct set of angles that satisfy the constraints. 6 In animation where the virtual humans walk, run, or jump, dynamics and spacetime calculation is often used [26-28]. The dynamics and spacetime calculation are based on physical forces that apply to each body part. Appendages are influenced by gravity, inertia, and other forces. By calculating these forces, a more natural movement is generated. Sampled movement, or motion capture, captures movement data directly from real people. These people are usually given a suit with reflectors to wear [29], [30]. A set of cameras capture their every movement. It is impossible to capture all possible motions that a human can perform, therefore new motion data must be generated by manipulation of the existing data. Manipulation of existing data can be as simple as splicing the motion capture data and joining them together to create different motions (motion blending). The motion capture data may also be transformed to fit different characters (motion retargeting). Finally new motion data may be generated which are similar to the pre-existing motion data. The difficulty is, of course, how to generate data that encompasses various positions from the available data and still make them realistic. Current techniques are often the result of integration of techniques from multiple categories. For example, motion can be created via kinematics techniques, then enhanced with dynamics techniques. Another common example is to use captured motion data and use dynamics techniques to create a new set of movements that both adhere to physical laws and are based on motions of a real representative human. 1.3 Proposed Solution The problem of animation generation can be stated as follow: A starting position angle that can be represented by the set of angles P0, and a set of constraints C0..k which are the 7 constraints of all positions in the movement are supplied as input. Some of the constraints may be hard constraints that must be met, while others may be a set of targets which the motion must try its best to fulfill. The objective of the algorithm is to find a motion that allows a smooth natural movement between P0 and Pt and satisfies all constraints at time t. In general, constraints can be positions of various body part, the angles of each degree of freedoms, orientation of a particular bone, or the position of other objects that cannot collide with any body part. In this thesis however, the focus is on Cartesian coordinates of the body part or effectors relative to the root bone. The problem with any animation generation is that there are many possible solutions that meet the constraints. An algorithm must find the correct pose from all possible solution, which may not be natural. This may entail rejecting unnatural movement, or changing angles to make it more natural. However if all possible animated human poses in the search space are natural poses, there would be no need to attempt to naturalize the poses. The focus will simply be on satisfying all constraints. All poses can be placed in a hyper-dimensional configuration space, where each dimension represents the value of one joint angle. By pruning all points in the space that represents unnatural poses, it may be possible to use the resulting space to find natural motions. However, it is difficult to define the limits of the area of these unnatural regions. The next problem posed is limiting the movement generation algorithm so that it does not wander into those spaces once the area is found. The solution proposed in this thesis is to create a hyper-dimensional sub-space where all movement inside this space is natural (instead of defining pockets of unnatural poses). The idea 8 is to first determine relationships or correlations between the various angles of the bones in a natural movement. This correlation data is found by observation of natural captured data. Principal Component Analysis (PCA) is a commonly applied method that captures the correlations between values in a vector. PCA can also create a reduced dimensional space with the strongest correlation between the angles in the natural poses [31]. This PCA reduced space represents the natural space because points in that space can be transformed into an original space which represents poses having the same variance as the motion capture poses, and therefore consists of natural poses. Our algorithm performs the search on the natural space and each point on the natural space is transformed back to the correct angles. As PCA is a global statistical method, it summarizes data globally. This method tends to merge various data together, and as a result, some of the motion details get lost. The method described here uses multiple PCA spaces, where each space corresponds to a segment of the skeleton (segments can be overlapping). Each space finds the variance of one segment. This enables better control of each individual segment, and a better chance for finding natural poses that meet the constraints. Natural space will be created from motion capture data. Creating a natural space that encompasses a wide range of movement will require a large amount of data. However due to the number of available data, as well as time and financial resources, the data collected will probably not be large enough. Multiple natural spaces will have to be created from data segregated by motion type or body parts. The motion generator will need to choose the correct spaces and join data from multiple natural spaces. 1.3.1 Thesis Contribution This thesis: 9 1. Demonstrates that it is possible to synthesize new user-constrained poses and motion from motion capture data having the same naturalness characteristics as the original data. 2. Describes an algorithm that finds a pose by searching in natural space that meets a user specified positional targets for multiple effectors. 3. Shows that, by segmenting the body into multiple overlapping spaces, it is possible to find poses that are more natural than traditional inverse kinematics and regular single space PCA, while also more reliably achieving specified constraints. 4. Describes a method to measure naturalness of poses based on the probability distribution function of the motion capture data. 5. Gives details on how to create motion by traversing a natural motion/phase space. This method creates motion that are quite similar to natural motion compared to straight interpolation 1.4 Document Structure The details of kinematics and dynamics calculation techniques as well as data based techniques are discussed in Chapter 2. Chapter 3 discusses various methods of using a statistical summary of motion capture data, including the PCA to create a natural space. Chapter 4 describes the algorithm and results of using multiple natural spaces in creating poses. Chapter 5 shows the results of the motion generating algorithm. Finally a conclusion, possible applications and possible future work is given in Chapter 6. 10 2 Pose and Motion Calculation The act of animation is simply rendering multiple frames one after the other at such a speed that the viewer perceives it as a continuous motion. To create animation, a set of key frames is generated between one pose and another. The computer interpolates the positions between key frames, adding additional frames to make a smooth transition between poses for the viewer. In the simplest case, an artist creates all the necessary key frames leaving the computer to fill in the transition frames. If the key frames are spaced fast enough, the resulting interpolated frames will have a smooth and natural characteristic to them. The problem that this thesis addresses is when key frames are not created by any artist. Constraints such as the position or orientation of various body parts and physical characteristics are given for the whole animation sequence, or for the final pose only. In this case, the key frames must be generated automatically by the software. Two ways this can be achieved are to find an end pose then create key frames that lead up to that pose or to generate multiple pose paths from a starting position and choose one path which transistions the animated human to that pose. The algorithm to solve this problem is linked with the human model used. The skeleton model which is a hierarchical model of human geometry and physical motion is often used, and is the one that will be utilized for this thesis. The skeleton is a set of rigid bodies which move relative to each other in a hierarchical structure and move the overlaid flesh with them. Most models of bipeds in computer graphics systems utilize an internal, invisible armature, the skeleton of the object, meant to accomplish the same functionality. The skeleton does not need to conform to an actual human skeleton, either in size, shape, or relative motions, although some 11 do. The movement of the animated character is specified by the location (position, rotation) of the skeletal bones. The skin (which is usually a triangle mesh) is correlated with one or more bones and will move according to the movement of the bones. Work on skeletal representation began in the 1970’s [14], [15]. Since then, the skeletal model has been the primary method for human animation. Skeletons have also been used as the basis for captured motion data [32]. There are many standards and recommendation in creating skeleton, each standard allowing various bone properties. In most cases, bones are a 3 dimensional object having an origin and a length. The origin of a bone can be determined from either the end position of its parent bone [16], an offset from the end position of its parent bone, or an offset from the center coordinate[33]. Most bones have a rotation value in the x, y, z axis which allows them to move into various position. Some standards allow scaling in one or more axis or the specification of limits. The terms joint and bone are used interchangeably, since it is only the joint location that is utilized. The bone itself is not assumed to have a specific geometry. In general, the length is provided mainly for display and user interface purposes. 2.1 Kinematics Method Kinematics method uses angles of bones to perform movement. Forward kinematics is the calculation of a position of a particular point in the skeleton given various angles of all the bone in the skeleton. Forward kinematics calculation is a chain of rotations (representing angles of joints in the correct axis) and translations (which represent bone length). A simple set of matrix multiplications composes the rotation matrices and translation matrices. The rotation matrices relative to the x, y, and z axis are given in Equations (2-1), (2-2), and (2-3) respectively. 12 (2-1) 0 0 0 1 0 cos α sin α 0  Mrx =  0 − sin α cos α 0   0 0 1 0 (2-2) cos α 0 − sin α 0  0 1 0 0  Mry =   sin α 0 cos α 0   0 0 1  0 (2-3)  cos α sin α 0 0 − sin α cos α 0 0  Mrz =   0 0 1 0   0 0 1  0 A translation (t, u, v) in the x, y, z coordinates can also be represented as a matrix: 1 0 Mt =  0  0 (2-4) 0 t 0 u  1 v  0 1 Given the coordinates of the root bone as (xr, yr, zr), the end position (xp, yp, zp) can be 0 1 0 0 calculated by multiplying the root bone with the rotation of every bone, by the translation (or direction and length) of every bone Equation (2-5). (2-5)  x p   xr  y  y   p  =  r  × Mrx (α x1 ) × Mr y (α y1 ) × Mrz (α z1 ) × Mt1 × Mrx (α x 2 ) × Mry (α y 2 )  z p   zr      1  1 A common problem in animation is to find a pose that will satisfy a constraint that a particular point on a specified bone is at a coordinate T. This problem is called the inverse kinematics problem, and it is an inverse of the forward kinematics formula. Unfortunately this calculation is very difficult to solve algebraically. There have been works on solving systems with limited (6) degrees of freedom [34]. However an animated human skeleton typically consists of more than 20 degrees of freedom. 13 To solve inverse kinematics problems, iterative methods are often utilized. One basic iterative method used is the Iterative Jacobian method [25]. The coordinate of the effector X is a function of the various joint angles (Equation (2-6)). The derivative of X can be calculated using the Jacobian given in Equation (2-7). The Jacobian calculates the partial differential for each value for each axis (x, y, or z) over the partial differential for a particular angle Equation (2-8). X = f (α ) dX = J (α )dα ∂f J (α ) = i ∂α j (2-6) (2-7) (2-8) By inverting the Jacobian the angle movement can be calculated. dα = J −1 (α )dX The new angles can be calculated as: (2-9) α t +1 = α t + dα (2-10) This step is repeated until the ending solution is close enough to the solution. This iteration can actually take a while depending on the size of dX. A large values for dX will result in a more imprecise angle difference. A small value for dX means that it will be slower to converge to a solution. Other iterative methods calculate the difference of angles per degree of freedom instead of for the whole skeleton. One of these methods is the CCD or Cyclic Coordinate Descent method [35]. In CCD, the angle between a joint and the effector and between the joint and the target is calculated. The difference of that is the rotation angle needed for that degree of freedom. The CCD iterates from the joint closest to the effector to the root of the skeleton. The problem with these two inverse kinematics problems is that there is no notion of naturalism. All bones move equally with the main goal of finding a solution, any solution. Weighted inverse kinematics proposed by Meredith [36] allows different motion which depends on the weight given to different part of the bodies. Ideally, such a scheme would create a more 14 natural movement, however it is not clear how to determine the weights for each of the bones to create the most natural movement. Creating weights for each bone does not solve the problem of the speed or the number of iterations it takes to find a solution. 2.2 Dynamics Methods In dynamics-based models, the movements of animated objects are based on the physical laws that they are subject to, such as momentum, gravity, friction, and acceleration. The model must also take into account the properties of each object such as the mass and the shape of the object. A set of formula is created that, based on the physical laws and the object’s properties, determines where each object is at each time frame. Various models also allow objects to link with other objects and thus constrain both object movement [37], [38]. The problem of calculating the position and state of each object given forces that are known is called the forward dynamics problem. The solution to the inverse dynamics problem determines the forces or velocity needed for an object to move from one position to another or to stay in motion with respect to its current position, mass and shape, as well as existing external forces [37]. Given a set of constraints, inverse dynamics can be used to calculate the motion from a starting pose to a pose that meets the constraint. A combination of dynamics and kinematics are often used [39], [40]. The inverse kinematics element of the algorithm determines the position of the joints at various points. The dynamics element calculates the speed and the trajectory of each bone segment. The Newton-Euler formulation calculates linear acceleration and angular acceleration and is often used to calculate the desired speed of movement [28], [41]. The Newton formulation 15 states that, in linear systems, force equals mass times acceleration. In the Euler formulation of angles, the moment (or sometimes torque) is the inertia multiplied by the angular acceleration. Newton equation: F = ma (2-11) Euler equation (2-12) M = Iω Other methods include using the potential and kinematics energy for the Lagrange formulation. Witkin and Kass [42] proposed a spacetime computation which uses Newton’s law and takes into account the starting and ending position of the object, the starting and ending time of the movement, as well as any additional constraints to calculate the trajectory and the forces needed to perform a motion. Ko and Badler used inverse dynamics to animate human locomotion [43]. Based on the location of the objects at a time t (determined by a script), they determined the positional and angular acceleration of all links in the system using the Newton-Euler dynamics method. The kinematics system adapts to the dynamics calculation result. A set of dynamic equations for walking motion was created in [26]. A numerical integrator approximated the forces and the torques needed for the motion of both the upper and lower body. A very complex and detailed model of the body has been created by Lee et al [44]. The model includes a complete skeletal/bone model, muscles, and skin. Although such a model may be more precise, the sheer size and complexity may make it unfeasible to build. For specialized programs, it may be better to create dynamic models of the body part that is needed, such as a hand dynamic model for grasping [45]. Motions created by dynamics calculation are much more natural than those created by the pure kinematics calculations. The motions depend on the physics models used. The more 16 complete the physics model, the more physically correct it is, however it may take longer to find a solution to such problems. 2.3 Multiple Constraints Whether solving for inverse dynamics or a combination of inverse kinematics and inverse dynamics, many problems in animation requires the animation to meet several constraints. These can be pose constraints, dynamic constraints, time constraints, and/or mechanical constraints [28], [46], [47]. Pose constraints determine where the position of a particular bone or bone segment is at a particular time. Mechanical constraints determine various physics laws at work on the body. Time constraints refer to the time that an agent must be in a particular position. Finally, dynamic constraints ensure that physics laws such as Newton’s second law are met at all times. Other constraints that are also commonly used in the calculation of movement, such as constraints for collision detection [40], energy [48], balance, and comfort [43]. In some cases not all constraints can be met at once. There might be conflicts that prohibit all constraints to be met. To solve this problem a priority scheme or a weighting scheme is used [47]. The solver will try to solve the constraints with the highest priority first, or tries to iterate towards finding solutions having the highest weights. There are often other soft constraints, which do not have to be satisfied, but in which the algorithm tries its best to achieve. These soft constraints are sometimes represented as an objective function, the idea being that the algorithm must minimize (or maximize) the value of these objective function. The objective function is used to find the correct answer in the event of multiple answers (for example, taking into account the minimization of energy). 17 Constraints can be formulated as functions. Usually constraints are formulated as a function that results in 0 when met, that is Ci(Xj)=0. The objective function is useful when the problem is under constrained which will allow us to have multiple solutions to the problem. When the constraints and the objective functions are all linear, a linear programming algorithm such as the simplex algorithm can be used. In most cases however a non linear algorithm must be used in order to solve these problems. The steps in solving a non-linear constrained optimization problem is to simply find repeatedly an estimate solution based on the gradient of the objective function with regard to the solution. The algorithm used is a in a class of algorithms called Sequential Quadratic Programming methods. Due to space constraints this document does not go into details on the various solution algorithms and instead refer readers to [49]. 2.4 Data driven animation A lot of work has examined using captured human-motion in so-called data-driven computer graphics. An instrumented human is asked to perform movements which are then captured by a computer system. The advantage of using captured human motion data is that there is no need for an animator to input the animation data manually. Furthermore, the result of captured human motion data is natural and human-like because it is a sampling of real human motion. In a common setting for capturing data, a subject is placed in a studio, wearing a special (usually plain and dark) suit. The subject will have special markers at different places to track the location of various bones. Cameras (or perhaps only one camera) located at various angles record the subject as he or she proceeds with various motions. 18 Each recorded frame is automatically stored in the database. The computer locates the coordinates of each reflector. The main reason that the suit is plain is to reduce interference with the reflector. Once the location of every reflector is found, the angles of various joints can be calculated [29], [30]. The captured data consists of the parameters of the movement for each frame, namely the position and orientation of each skeleton segment. A single capture sequence can have many angles and joints. ASF formatted data, for example, has 29 joints with up to 3 degrees of freedom for each joint. The captured data will typically be stored at a frame rate of 15-30 frames per second. When playing back a captured motion sequence, the motion player simply sets the angles of the joints according to the frame data. Once the pose is drawn, the player waits a set time (depending on the number of frames per second there are), clears the image and draws the next frame in the same way. The amount of data accumulated this way is quite large. To limit the size of the files or the database, subjects typically perform a single short motion. This motion can be as simple as walking, running, jumping, or it may be more complex such as dribbling a basketball [50]. By grouping sequences, users of the motion data can choose the data that is useful easily. The users do not have to select frames within the motion capture database. Instead users select the motion needed and use all of it. There is also some work on automatically segmenting long motion capture sequnences based on similar actions [51]. One problem with this method is that the range of possible human motion is quite large. Even a simple basic movement may have many variations. For example, a person can jump while facing to the left and also jump while facing to the right. There are infinite variations to a 19 basic movement, and as such, it is impossible capture all of it using human actors. New motions must be created from existing captured data sequence. 2.4.1 Motion blending One way to create a multitude of animation sequences is creating a weighted combination of frames from different captured sequences, a method known as motion blending. In Kovar [52], Gleicher [53], and Arikan and Forsyth [54], animation sequences are created by creating connection graphs for frames that are similar. Captured human motion is transferred to a skeletal format. Each clip is broken into a small set of frames and each frame set is connected to other frame sets on different clips based on a similarity function. A simple approach to finding similar frames is by calculating the difference between the angles for all bones (perhaps include velocity) [55]. Kovar [52] opposes this idea because this idea does not take into account the importance of joints, differences caused by translation or rotation on the root bone, nor velocities and acceleration of the motion. Instead they find the similarity between point-clouds created over k contiguous frames. The similarity is the total distance of each point in one point cloud with a corresponding point in the other point cloud (which comes from a different motion). Pairs of frames between two motions which have very low difference are joined together to create graph edges. Linear blending simply switches motion when the two clips are very similar. An angular method blending proposed by Shum joins motions based on the angular momentum trajectory [56]. Blending can be performed in the middle of an existing motion, instead of waiting for it to end, creating a smoother transition between clips. Heck et al. used a splicing method, which joins upper body from one clip with the lower body from another clip [57]. To join the two segments together, they perform time and spatial alignment, as well as a posture transfer which matches 20 the location of the shoulders and hips for each clip. Mutlu blends frames together to create new walking motion on a different path based on the position of the foot plants [58]. An animation sequence is generated by taking a sequence of frames from a clip and then adding another frame sequence from another clip continuously until the character is in the desired position. The number of graph nodes and edges can be very large, therefore an efficient algorithm is needed to perform the search. This can be done by using various graph search methods to find the best motion between the starting pose and the ending pose. Lee [55] uses reinforcement learning to find the best path between all various starting and target poses. Reinforcement learning uses rewards and discounted rewards to determine which path to take in each graph node. Once the best action for each graph node is already met, it is a trivial to create animation that encompasses various nodes. This concept of path finding for motion has been developed further to create dynamic motion controllers. Dynamic motion controllers allow the creation of motions based on states, where each state consists of the configuration, speed, torque, and environmental factors [59], [60]. Each state includes the current pose, motion, and even the environments. The controller chooses the next state based on the current state, dynamic constraints, and the environments. Use of a controller allows some measure of variability in the motions created even in unpredictable environments [61]. Interpolation is performed on the newly connected frames to create a smooth transition. Interpolation creates k number of frames where each frame is a pose between the two existing frames. Linear interpolation, the most basic interpolation method, uses the starting angle and the ending angle (of each joint) to create an equal size interval which is used to calculate the angle at 21 every frame (Equation (2-13)). A better method is to first convert the Euler angles to quaternion and perform a linear interpolation on the quaternion [62]. (2-13)  αk −α0   k   A spline interpolation can also be used to connect two frames together. The difference of αt = α0 + t ×  angles between two frames in spline interpolation is not fixed. The difference of angles between frames takes speed of motions into account. However, spline interpolation typically requires more than 2 sample points in order to get the best interpolation fit. Motion capture data has also be used for basis of change. Yamane [63] and Pan [64] uses RRT, a randomized tree search to find possible paths between the starting pose and the ending pose. The points found represent collision free poses. Motions between the poses are created based on motion capture data, by finding motion clips that are meets the constraints. 2.4.2 Motion Synthesis This method of blending motion is adequate when the range of movement in an application is limited. A prime example of this is sport games. In sport games (such as American football, or soccer) the main interaction is between a human and a ball. By capturing various interaction between person and ball, some which are basic, and a few specialized movements, various combination can be created that will allow natural movement over a whole game. However if the range of movement is larger, or the terrain is unpredictable, motion blending will not do the trick. There is a need to create new motion based on existing captured data, a method sometimes referred to as motion warping. Abe et al. [32] used a base frame of animation data captured from actors to create a family of similar frames. Each generated frame has each character in the same position as the 22 base frame but translated (in the x or z axis) or rotated (using the y axis). Each new animation preserves the physics characteristics of the base animation. New motion can be created by warping or changing existing motion capture data to meet different constraints. Van Basten uses a greedy search to find feet motion that is similar to a new set of stepping motion [65]. Warping is then performed on the resulting steps. Different body types have different correlation between joints [66] and thus different movements caused by body type. A lot of work on motion warping has focused on adapting motion from the captured data to other characters that may have different builds, shapes and characteristics, a process known as motion retargeting. Different body types have different correlation between joints and thus different movements caused by body type. Meredith and Maddock [36] change the motion capture data using weighted inverse kinematics. By changing the weights of various joints, they are able to create personalized movement for different types of people. The weighting of various parts was also performed by Popovic [28] to create different movement based on personality. 23 3 Natural Spaces Multi dimensional data points such as the joint angles for motion capture data frames can be placed in an n-dimensional space. Each point represents one pose, and each value in a dimension represents the angle of the degree of freedom. This space is commonly known as the configuration space. A theoretical natural space N-space can be thought of as a sub-space in the configuration space where human poses are natural. The dimension of N-space can be smaller or equal to the dimension of the configuration space. Search for poses and motions that meet constraints using N-space only returns poses and motions that are natural. Therefore there is no need to make the results natural. One way to create N-space is to have it consist only all the points that are found from the motion capture data. Unfortunately, this severely limits movements, as poses that were not found on the original motion capture data will not be used even though it may be natural. Another way to see this problem is by viewing this problem as a classification and learning problem. Given a set of learned data (motion capture data), classify any points into either natural or not natural. In other words, divide the space into N-space and N’-space. The next chapter discusses various ways to create N-space from the learning of motion capture data. 3.1 Creating Natural Spaces The animation generation algorithm will start with a point in N-space and find a new point which meets all the criteria. This is done by testing various points in that N-space. In order to do this faster the created N-space must have the following characteristics: 24 1. There is an easy way of determining whether a position is in N-space or in N’-space. Every time a candidate point is found, the point must first be tested on whether it is in Nspace or not. If the new point is not in N-space, this point is discarded. 2. A reasonable estimate can be made of how close a new position in N-space is to satisfying the various constraints of the animation based on the information provided by the space and also the resulting pose of other known positions. This allows the search algorithm to intelligently choose points that will take it reasonably closer to the solution point. One idea to create N-space is to use model boundaries in the configuration space separating natural space from unnatural space. Voronoi cells or Parzen Windows [31] allow the creation of such borders. However in order to create these boundaries, both negative and positive samples (that is natural and unnatural poses) had to be available. Motion capture data only provides positive samples of human motion. Another problem with using this method is the complexity of the algorithm to create the borders as well as to detect the borders is O(n log n) where n is the number of data points [67]. To create N-space, various methods were considered. One popular classification method is the use of clustering. However the data obtained is not a good match for this method. There are 2 groups for this classification, a pose is either in N-space or in N’-space. Once again there is a need for negative or unnatural samples which are not available. Therefore any classification method that requires negative examples such as clustering can not be considered. Statistical modeling allows the creation of models that describes the motion capture data. Synthesis of new motions is possible using these models. One of the earliest works of statistical modeling on motion capture data is done by Pullen and Bregler [68]. They used wavelet 25 decomposition and a Gaussian kernel to model a Wallaby’s motion. Realistic motions was be created by calculating the conditional probability of each frames. Stylistic Hidden Markov Models have also been created from motion capture data. New motions were created by doing random walk on the resulting HMM state machines [69]. A multivariate distribution function can also be generated from the motion capture data [31]. This distribution function can then be used to determine the likelihood that a pose is a natural pose. N-space can be defined as any point that has a natural pose likelihood score of over P. The distribution function allows calculation of new points which have similar or higher likelihood score. However the distribution function by itself will not help find points that are closer to a solution. The search function needs information about the relation of points in the Nspace to the solution to determine the best motion path. Component analysis methods are able to summarize the relationships between various dimensions of the original data. Typically component analysis methods such as Principal Component Analysis, Independent Component Analysis, and Nonlinear Component Analysis allow the transformation from one space to another space that can better describe the data [31]. Component analysis methods are also typically used for creating a lower dimensional space of the data. Other lower dimensional methods such as MDS and Isomaps have also been used to model motion data [70]. The problem with MDS and Isomaps is that they typically need a large amount of space as they need to store the distance between all possible frames. This thesis uses Principal Component Analysis (PCA) as the main method for creating natural poses and motions. PCA allows a straightforward summary and generalization of the motion capture data. The number of PCA dimensions can be reduced to limit the search space, 26 which is compatible with the aim of creating a natural search subspace. Synthesizing poses and motions from PCA space does not need for significant amount of space and computation. 3.2 Principal Component Analysis PCA is a statistical technique that uses the variance of data to allow data points to be transformed into points in a reduced dimension space. An extended primer on PCA is given in Joliffe [71]. The points in the reduced dimension space retain characteristics of the original data set that contribute to its variance [31], [46]. Due to the smaller number of dimensions, data is easier to analyze. To calculate the PCA of a data, first find the mean and the variance for each dimension. The covariance matrix C can then be calculated from the mean and variance. Given the covariance matrix C, a set of eigenvalues λ are calculated according to Equation (3-1). Once the eigenvalues are calculated, the eigenvectors e are calculated according to Equation (3-2). C − λi I = 0 (3-1) Cei = λi ei (3-2) Figure 3-1 Weight vs. Horsepower data and the PCA axis. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation. 27 The set of eigenvalues is combined together to create the transform matrix T with a dimension of m x n where m is the number of dimensions in the original data and n is the number of dimensions in the reduced space. T is used to transform any data point in the original space into a point in the reduced dimension space. The value of any dimension in the reduced dimension space is actually a weighted combination of values from the original space. To clarify the concepts here a chart was created to show the weight vs. horsepower data for various cars (Figure 3-1). This data was part of the Auto MPG data set from the UCI machine learning repository [72]. The lines in the figure show the scaled eigenvectors which form an orthogonal axis. Figure 3-2 shows the result of transforming the original data with the PCA transform matrix created from the eigenvectors. Figure 3-2 Transformed data to PCA space If the dimension is reduced to one dimension, all the data will be on a single line (Figure 3-3). This reduced dimension space limits the data that can be synthesized. New data when transformed back to the original space will be on the main axis line of the PCA. 28 Figure 3-3 PCA data reduced to just one dimension The transformation matrix to calculate the position of a point in the reduced dimension space is not a square matrix (because the number of dimensions is different). Although no inverse matrix is available for this non-square matrix, a pseudo inverse matrix can be created using the Singular Value Decomposition method [73]. Any point in the reduced dimension space can be inverted back to the original space using the pseudo inverse of the matrix. These points will have the same covariance between points as any of the original motion capture data that is in the original space. The next few sections show that synthesized points in the natural space (within some boundary) is natural. Therefore the reduced dimension space created by PCA is a good candidate for N-space. 3.2.1 Use of PCA in human animation In PCA, the eigenvectors can be ordered by importance. The first k set of eigenvectors can be used to represent a certain percentage of the variance of the data. This property has been used to create compression algorithms on human motions using PCA [74], [75]. The quality of the motion reconstruction depends on the number of dimensions used in such a scheme. 29 PCA is often used to summarize motion capture data and then to generate new poses. Li shows that PCA can be used to create general unconstrained motions [76]. Some, like Tilmanne, uses PCA to create walking motions that are based on different emotion expressions [77]. Johnson’s PhD dissertation focuses on using PCA on motion capture data based on quaternion [78]. The aim of his dissertation was similar to this work, which is to use the PCA space as an “expressive” sub space in order to synthesize poses and motions. His work focuses on using a single PCA space for the whole body, while this work focuses on multi PCA spaces. In his work, he also mentioned an inverse kinematics solution using the sub space, but never actually implemented. Most of the work on PCA as a means of representing motion has been based on motion categorized into specific tasks [79], [80]. This thesis described a method that takes general motion data and performs PCA on body parts segments to better control the motion to meet the user’s constraints. The naturalness of the resulting PCA generated motion is often just visually inspected. In Chapter 4.4.2, naturalness is shown by statistical comparison of the various motions. 3.2.2 Creating a Pose Space from Motion Capture Data This chapter describes the creation of a natural pose space (P-Space) from motion capture data. The term pose space is a specialized version of a natural space (N-Space) that only contains information about pose configurations. Data used for this method is motion capture data which primarily deals with angles for each degree of freedom. Other information can also be added or deduced from the existing data. Additional data may include goals, emotions, and speed of movement. Adding information typically increases the number of dimensions, therefore the data used for learning need to also be increased. 30 To generate the P-space, frames are read from motion capture data. Each frame is considered to be a singular natural pose. A vector X=(x1, x2, …, xn) is created from the joint rotation angles for each pose. To normalize the data, the rotation and translation angles of the root bone are ignored. The covariance matrix C is calculated, and the PCA transformation matrix T is created based on the eigenvectors calculated with Equation (3-2). Each pose X can be transformed to a corresponding point in N-Space α=( α1, α2, …, αn) using Equation (3-3). α =T×X (3-3) This point is then transformed to an actual pose by using the inverse of the PCA transformation matrix. The original transformation matrix T is orthonormal, and therefore the inverse of this matrix is simply its transpose. The dimension of the pose space is less than the original dimension. Therefore, only the first n columns of the inverse of the transformation matrix are used. A vector X in the original space can be created from a point in P-space by using Equation (3-4). (3-4) X = T T ×α Multiple points in the original space may map to the same point in the reduced space. Inverting the point in the reduced space returns a point with the least mean squared-error to all possible points. It is not uncommon that the degrees of freedom may slightly exceed the specified DOF limits. If any of the angles of X is outside the bounds of the joint (AMini...AMaxi) the angle is adjusted according to the limit of the angles (Equation (3-5)). That is if Xi > AMaxi then Xi= AMaxi and if Xi< AMini then Xi= AMini. The angle error is the difference between the limits and the calculated joint angles. 31  xi < AMini  AErri =  xi > AMaxi  otherwise  AMini − X i (3-5) X i − AMaxi 0 3.2.3 Characteristic of Synthetic Data in Natural Space One important focus of our work is whether the P-space can be utilized to create new natural poses. First, a P-space is created for the right hand motion. The bones involved consists of right hand, right wrist, right humerus, right clavicle, thorax, upper back, lower back, and root. For this data, 98% of the arm variability can be described by only 7 dimensions. A transform matrix T is created from the eigenvectors that transforms the 18 angle vector to a 7 dimensional space. A sample point α consists of 7 values α=(α1, α2, …, α7), each corresponding to one vector element. The sample poses used for learning the PCA are transformed into their respective P-space coordinates. The minimum and maximum value of each dimensions are retained from these sampled poses. For each dimensions, a set of interval ranging from the minimum value αmin to the maximum value αmax is created. A set of new poses are created by selecting points at an interval in P-space. For each dimension, 4 evenly spaced points between the minimum value and the maximum value is chosen. Points contain a combination of values from each interval. 32 X vs Y X vs Z 20 20 10 10 0 0 -10 -10 -20 -20 -20 -10 0 10 20 -20 -10 0 10 20 Y vs Z 30 20 10 0 -10 -20 -20 -10 0 10 20 Figure 3-4 The coordinates of the right hand from both the original data (black) and the natural pose samples (grey). The top-left picture is the cutaway x-y coordinate view, the top-right, x-z and the bottom z-y The created points in P-space correspond to new poses that were not in the original sample. These points are also quite different than the data learned. Figure 3-4 shows the right hand effector coordinates from both the original sample poses as well as the new starting poses. The grey dots represent the right hand coordinates of the new poses, while the black dots 33 represent the original sample poses. It is apparent that the synthesized poses have a larger range of reach than the original ones. The naturalness score for each unconstrained point was calculated by calculating the similarity to an existing pose in the database. Comparison was done on a bone by bone basis. Given two poses, one a randomly selected sample point, and a pose from the MOCAP database, the quaternion dot product between each bone angles was computed. The dot product can be used to measure the angle needed to rotate from one angle to another. As the angle becomes more similar, the rotation needed decreases, and the dot product approaches a value of 1. Equation (3-6) shows the similarity measure between two poses, each having k bones, δ = ∑ (1 − (α1k ⋅ α 2k )) (3-6) k We compare this method with two random methods for generating poses. The first method simply selects a set of random angles (within the DOF limits) to generate a pose. The second method creates random numbers using the distribution of the angles in the training data. The method is also compared to actual natural pose taken from the MOCAP database. A random pose is taken from the MOCAP database, the similarity distance is calculated over all poses not in the same motion. The reason for this is that poses in the same motion tend to be close together (especially a pose which comes before or after the reference pose in a motion). 500 poses were generated for each method. The 90 th percentile of the distance is th calculated. The reason only the 90 percentile of the distance is calculated is that in some poses, the difference of the most different bones significantly dwarfs the values of the other differences. The data was resampled (bootstrapping) 1000 times. The mean angle in radians of the 90 34 th percentile for each degree of freedom is shown in Figure 3-5. ANOVA was used to calculate that the means are different at a .95 significance. Angle Difference 45 40 35 Angle 30 25 20 15 10 5 0 wrist radius humerus clavicle thorax upperback lowerback Joint Natural Motion (leave 1 out) Random (Natural Distribution) N-Space Random Figure 3-5 Naturalness of random poses generated by the P-space, random (with distribution), random, and from the motion capture data From the graph above, it is apparent that poses in pose space are more natural compared to the completely random algorithm. Algorithms that are using the P-space have a potential of creating more natural poses than algorithms that use regular space. 35 3.3 Multiple Natural Spaces Another factor of concern is the creation of an N-space that is too limited or too specific. The number of data and motion may not be enough to generate an N-space capable of all natural movements. The correlation between body parts that are not near each other may be small, and therefore. Furthermore, the work of Grudzinski concludes that PCA is often better for small sets of joints [81]. Therefore it is a good idea to create multiple N-spaces (one for different parts of the body) based on different parts of the data and then joining them in the animation generation phase. Using a single PCA for the whole body also lead to over generalization of the data. This over generalization reduces the number of possible poses inside the space. Figure 3-6 Division of the skeleton as shown in Figure 1-1 into multiple segments 36 In order to alleviate these problems, there is a need to create multiple natural spaces each one corresponding to one particular group of body parts. This gains importance due to the fact that there are multiple constraints for each target pose or motion. The algorithm proposed will combine these various natural spaces in generating the correct path to the pose. There have been various attempts to divide the human body into parts in order to simplify kinematics and dynamics calculation [57], [64], [82], [83]. The segments created in previous work are free of overlaps. As such, there is a loss of information between the various segments. In order not to lose all the relation between body segments, an overlapping segment scheme is used. In this scheme, some of the joints (such as the lower back, upper back, and thorax) are used in multiple spaces. The upper body part (thorax and back) is important to both the left hand and right hand, and therefore it is impossible to be placed in only one space. Our method divides the body into 5 subsets which have overlapping joints (Figure 3-6, Table 3-1). Table 3-1 List of bones for each segment Region Right Hand to root Left Hand to root Right Foot to root Left Foot to root Head Bones Right hand, right wrist, right radius, right humerus, right clavicle, thorax, upper back, lower back Left hand, left wrist, left radius, left humerus, left clavicle, thorax, upper back, lower back Right foot, right tibia, right femur, right hip joint Left foot, left tibia, left femur, left hip joint Head, upper Neck, lower Neck, thorax, upper back, lower back Number of Dimensions 18 18 6 6 18 3.3.1 Effector Space A problem with segmenting the body into multiple parts as shown above is that the correlation between bones in different part is lost. Another, more general problem is that, in many inverse kinematics problems, the constraint only applies to some body parts while 37 coordinates of other body parts are not specified. When the right hand is constrained to be placed at a desk for example, where should the other parts of the body lay? To solve this problem, a space for the coordinates of various body parts is created. This method was also used by Ishigaki et al to compare similarity of a user pose to an example motion for control of avatars [84]. The coordinates of the hands, feet, elbow, neck and head are stored as shown in Figure 3-7. PCA transforms the data into a lower dimensional space. This new space is called the effector space and acts as a guide for positioning the whole body. Figure 3-7 Location of effectors for effector space. The coordinates of joints that are colored in red are stored used to create a lower dimensional space. 38 4 Poses Poses are the basic building block of motion. One way of thinking about motions is a continuous set of poses, and therefore poses serve as a good starting point for animation. By calculating a pose that meets a certain constraint, one can create motion by interpolating from starting pose to end pose. The constraints of a pose can be coordinates/position of the end of various bones, orientation of bones, or specific areas where the bones must not go through (collision detection). The natural poses that are of interest are those poses that meet the constraints, as well as being as close as possible to the starting pose. The algorithms described here generate poses from pose spaces as described in the previous chapter. In order to generate natural poses, two strategies are used: search for closest natural pose and inverse kinematics in the pose spaces. 4.1 Lookup strategy A search on motion capture data can be performed in order to find poses meeting certain constraints. In order to do this reliably, the motion data must be complete, which will likely mean it must be very large. Instead of finding exact matches, it is also possible to find poses that are near to meeting the constraints and then performing minor modifications on the resulting pose. In order to efficiently perform the search, poses from motion capture data are placed into search optimized data structures such as the oct-tree or Rapidly-expanding Random Tree (RRT) [85]. RRT has been often been used in a planning stage, to find multiple poses (from starting to ending pose) that meet the constraints set out by the environment [63], [64]. 39 The algorithm described here do not use the actual motion capture data. Instead this method first generates a set of poses that is deemed to be representative of the whole possible set of natural poses, referred to as unconstrained poses. Unconstrained poses are simply poses created by selecting points in the pose space. This point is then transformed to an actual pose by using the inverse of the PCA transformation matrix. The motion capture data utilizes skeletons consist of 29 joints and 59 degrees of freedom. In this work we limit the method to finding poses for the right hand. As we are only concerned about arm pose location, we only utilize the angles from bones that connect the root bone to the right hand bone. The right hand bone acts as the end effector. There are 9 bones and 18 degrees of freedom between the right hand bone and the root bone. In our experiments, the goal is to place the end effector (hand) at a particular, specified, position. The starting pose of the virtual human is all the same. Bones from the hand to the root bone (lower back) are considered; other bones are ignored. The algorithm proposed is called the Constrained Pose Generator (CPG) algorithm. Sample points were predetermined by choosing points on a grid in P-space. For this data, 98% of the arm data variability can be described by only 7 dimensions. A P-Space was created using 7 dimensions. For each of the seven dimensions, points are sampled on the grid, starting from the minimal value to the maximum value. The lower dimension of PCA captures more variability than the higher dimensions; therefore the lower dimensions were sampled at a higher rate. The total number of points used as a seed for this method is 53000. These points are stored in database, indexed by the end effector position to facilitate fast searching. 40 The similarity calculation in Equation (4-1) was performed on a more complete motion database. The motion database contained over 3 million poses. For each sampled pose the similarity score is the minimum value calculated using the above formula. Given a starting pose P0, our method seeks to determine a set of DOFs that place the end effector (the right hand) at a desired target position T(x, y, and z). We find a sample point Ri having the end effector position S in the pose space which is the best match for that pose. The criteria to find Ri is based on the distance of the end effector with the addition of a weighted naturalness score δ. 2 (4-1) arg min i Ti − S i + wδ Once the algorithm determines a candidate pose that is nearest to achieving the desired constraint, either Coordinate Cyclical Descent (CCD) [35] or the iterative Jacobian [25] is used to refine the pose so as to accurately meet the constraints. 4.1.1 Results To test the result of this algorithm, 1000 random single constraint problems (on the position of the right hand) was generated to determine accuracy of algorithm as well as naturalness of results. The algorithms tested are the CPG using CCD, CPG using the Jacobian, and Iterative Jacobian algorithm. Table 4-1 shows how accurate the various algorithms were at finding a solution. Out of the 1000 constraints given, 104 constraints were never found by any of the algorithm. This could mean that the constraints were out of reach range of the virtual human. The two scores in Table 4-1 show the accuracy for all constraints, and accuracy for only the reachable constraints (with a confidence interval of 95%). Based on this table it is clear that in terms of accuracy in finding the 41 correct pose for a given constraint, usage of P-space is an improvement to using iterative Jacobian. In some of the poses, the inverse Jacobian algorithm creates poses where the body is twisted and awkward. There are of course exceptions. One of the main problems with the Jacobian is that all joint angles are changed, even though in natural human motion not all joint angles change to achieve a pose. Table 4-1 Accuracy comparison of various pose generation algorithm Method Accuracy CPG (CCD) CPG (Jacobian) Jacobian 0.7280 ± 0.0276 0.8090 ± 0.0244 0.6030 ± 0.0303 Accuracy for Reachable Constraints 0.8125 ± 0.0256 0.9029 ± 0.0194 0.6730 ± 0.0307 To compare the naturalness of each algorithm, the mean angle difference between each bone in the generated pose and the closest natural pose is calculated. The 90 distance is calculated. The reason only the 90 th th percentile of the percentile of the distance is calculated is that in some poses, the difference of the most different bones significantly dwarfs the values of the other differences. Therefore we take the biggest 10% difference out of the data. The data was resampled (bootstrapping) 1000 times, and ANOVA was used on the data. From ANOVA we find that the mean were different with 95% confidence. The mean angle in radians of the 90 percentile for each degree of freedom is shown in Figure 4-1. 42 th 90th Percentile Mean Angle Difference CPG (CCD) 0.6 CPG(Jacobian) Angle (Radians) 0.5 Iterative Jacobian 0.4 0.3 0.2 0.1 0 wrist radius humerus clavicle thorax upperback lowerback Bone Figure 4-1 Mean angle difference of poses created by CPG (both Jacobian and CCD) and iterative Jacobian method Based on this graph, we can see that the CPG algorithm created more natural (as shown by a smaller difference to the sampled motion. Because the CPG algorithm moves the starting point, the Jacobian method is able to find poses which are more natural. The resulting poses for the different algorithm on 6 different targets are shown in Figure 4-2. One of the major disadvantages of this method is that only the generated pose is natural. The modifications made using CCD and Jacobian do not necessarily keep the pose in a natural state. Another problem is that the generated P-Space pose is only based on the distance to meeting the constraints and not on the starting pose. The starting P-Space pose may actually be very far from the starting pose, and thus the ending pose is not always an optimal solution. 43 Figure 4-2. Each frame shows the result of running the CPG algorithm for each of 6 poses. The pictures on the left side are the front view of the pose; the pictures of the right side are is from an angled view from the right side of the animated humans. The leftmost pose in each frame was 44 created using the NSPA algorithm followed by the CCD algorithm, the middle pose was calculated using CCD, and the rightmost pose was calculated using the iterative Jacobian method. 4.2 Inverse Kinematics in Pose Space In order to address these problems, a method utilizing iteration within the N-Space is used. Iteration through N-Space will also take into account the starting pose. The method proposed here uses the inverse Jacobian in N-space to find pose that meets the constraint. In the previous chapter, CCD was shown to better find natural position from unconstrained N-space poses. However the Jacobian method was preferred for the inverse kinematics because of two things: 1. It is difficult to do multiple constraints with CCD. With Jacobian you just need to add the new degree of freedom to the Jacobian. 2. Segmented needs synchronization between two N-spaces. It is also difficult to do this with CCD. In CCD you change one DOF/dimension at a time. With multiple spaces we may have to change a degree, and then have it changed by the synchronization process. The first step is to calculate the Jacobian. The Jacobian determines changes in angle effector coordinate(s). Instead of the actual angle however, P-space is used. Therefore the Jacobian calculates the changes in P-space dimension to the effector coordinates. The Jacobian depends on the model used for the skeleton. Our model used a skeleton in which the forward kinematics formula is a series of rotation matrix (R(x)) and translation matrix (L). This corresponds to the bone having a rotation around the x, y, and z axis, followed by a translation (Equation (4-2)). P is a vector (Px, Py, Pz) containing the coordinates of the vector. 45 P = ∏ Rx ( xi1 ) R y ( xi 2 ) Rz ( xi 3 ) L (4-2) i However because we are using forward kinematics in N-space, this formula changes th slightly. To find the value for the n degree of freedom, Equation (4-3) is used. The angle for the n th degree of freedom is simply the n th T row of T multiplied by the current point in P-space (Equation (4-4)). (4-3) T X n = Tn × α (4-4) T T T X n = (Tn,1 × α1 ) + (Tn,2 × α 2 ) + L + (Tn,k × α k ) The forward kinematics in N-space can then be calculated as P = ∏ R x (TiT α ) R y (TiT α ) R z (TiT α ) L 1 2 3 (4-5) i th T The nth value of X is a weighted sum of α according to the n row of T . The derivation th th of the nth angle of X with regards to the i value of α is therefore the i weight of the value, or th th T simply the i value of the n row from T (Equation (4-6)). (4-6) dX n T = Tn,i dα i d ( ab) = a ' b + ab' (4-7) th Using the product rule for derivation (Equation (4-7)), the Jacobian for the i dimension th in P-space and the j degree of freedom can be calculated. To simplify the derivative equation, let Ψ represent each rotation and translation equation in the forward kinematics calculation of P (Equation (4-8)) such that P is simply the product of all Ψ (4-9). For all rotation factors of P, the derivation of Ψ with regards to α is shown in Equation (4-10). The translation factors of P are constant and therefore the derivations of such factors are 0. 46  R (TiT α )  Ψi =   Li  P = ∏ Ψi (4-8) dΨi ' = TiTj R ' (TiT α ) , dα j (4-10) (4-9) i Based on the above equation and the product rule, the partial derivative of the i th th coordinate in the pose P with regards to the j value of α is: n (4-11)  n −1  dPi = ∑  (∏ Ψk ) Ψn ' ( ∏ Ψk )    dα j n  k =1 k =n +1  The iterative Jacobian method of solving inverse kinematics is to iteratively calculate ∆X that moves the point in P-space to another point that is closer to meeting the target. At each step of the iteration, ∆P, the vector between the current coordinates Pt and the target constraints Pc is calculated (Equation (4-12)). (4-12) ∆P = Pc − Pt The Jacobian J in P-space is similar to the Jacobian in normal space. The difference is that each element is a partial derivative of P with regards to changes in α. This is shown in Equation (4-13) dP  dP 1 1 dα1 dα 2   dP2 dP2 dα1 dα 2 J =  M  dPm dPm  dα 1 dα 2   The aim is to find ∆ α, which is the (4-13)  dα n   dP2 dα n   O M  dPm  L dα n   changes to α that moves the skeleton to the target L dP 1 pose. ∆ α can be calculated by using Equation (4-14) J∆α = ∆P (4-14) 47 The Jacobian is not a square matrix, and therefore not invertible. There is no guarantee that the problem is under-constrained or over constrained. To solve the problem, we find ∆α that minimizes the distance to the following equation. 2 (4-15) J∆α − ∆P Furthermore, the angles of the joints should move in small interval. A large ∆α may cause the problem to overshoot, or become unnatural. A damping factor is introduced in Equation (4-16). This damping factor tries to minimize the distance of ∆α. T T ∆α (4-16) 2 A simple inverse Kinematics solution for N-Space can therefore be calculated by finding the minimum of the following equation J∆α − ∆P 2 + T T ∆α 2 (4-17) If this equation is calculated using Euclidian distance, the minimum distance is equivalent to finding the minimum value Equation (4-18). Because T is orthonormal, T*T T results in the identity matrix ( J∆α − ∆P)T ( J∆α − ∆P) + ∆α T TT T ∆α (4-18) = ( J∆α )T J∆α − ( J∆α )T ∆P − ∆P T J∆α + ∆P T ∆P + ∆α T ∆α = ∆α T J T J∆α − ∆α T J T ∆P − ∆P T J∆α + ∆P T ∆P + ∆α T ∆α The minimal value of ∆α can be calculated by deriving the formula above. The minimal value is found when the derivation of the formula is equals to 0. This results in Equation (4-21) that can be used to find ∆α through either the inverse of the left matrix or using factorization such as LU/QR algorithm [86], and therefore solve the inverse kinematics in P-Space. This method is also commonly known as the damped least square method. df d∆α = J T J∆α + ( J T J )T ∆α − J T ∆P − (∆P T J ) T + 2∆α = 2 J T J∆α − 2 J T ∆P + 2∆α 48 (4-19) df d∆α (4-20) =0 2 J T J∆α − 2 J T ∆P + 2∆α = 0 J T J∆α + ∆α = J T ∆P (4-21) ( J T J + I )∆α = J T ∆P 4.2.1 Conquering joint limits One of the problems of this method is that it does not take into account the joint limits. Problems occur when the closest path to a target is via poses that have angles outside the limits. To counter this problem, a perturbation factor that checks for limit breaking angles, and forces these angles to move to the other direction (usually to the middle of the joint limits). If a single iteration contains more than one joint outside the limits, not all are perturbed at the same time. The idea is to perturb one joint and allow other joints to adjust to the perturbed angle and change accordingly. th When the i joint is outside the limit, this method checks whether perturbation should be th performed. Perturbation on the i joint is performed when the number of steps, and i meets the condition in Equation (4-22). When the modulo of these two numbers to a constant k is equal, then perturbation starts. Once a perturbation starts, it will remain active for w iterations (w is a window size constant). mod(n _ step, k ) == mod(i, k ) (4-22) Equation (4-23) describes this perturbation factor. ∆m is the target joint angles, calculated by the difference between the mid-point of the joint limits and the current joint angle. This vector only contains joints currently subject to perturbation. The M factor is a dp x dα (P 49 rows times α columns) transformation matrix that simply deletes all unnecessary factors in the PCA space if the factor does not effect the angle being perturbed. MT T ∆α = ∆m (4-23) To calculate the change in angles/point in P-space, each step of the iteration tries to find a change to α that brings the body closer to the constraints, and at the same time make sure that the intersecting bones are consistent. To solve this, we find a solution for ∆α that minimizes the following equation: 2 λ1 J∆α − ∆P + λ2 T T ∆α 2 + λ3 MT T ∆α − ∆m 2 (4-24) The equation consists of a weighted sum of three distances. The first distance is based on the difference between the current body pose to the target constraint. The second term is the damping factor, which simply measures the distance between the current pose and the ending pose based on ∆α. A damping factor limits the size of ∆α, making it change only in small incremental steps. The third factor is the perturbation factor described above. The weights λ is set beforehand to calculate the importance of each factor. Similar to the simple inverse kinematics equation, the minimum value of this equation can be calculated by finding the derivative and setting it to 0. The solution for ∆α can be calculated by solving the following equation: (λ1 J T J + λ2 I + λ3TM T MT T )∆α = J T ∆P + TM T ∆m (4-25) 4.2.2 Naturalness An important part of this research is determining the naturalness of the resulting poses. The most notable method of calculating naturalness from motion capture data has been performed by Ren et al. [82]. They measured naturalness using various methods (Mixture of 50 Gaussians, Hidden Markov Models, Naïve Bayes, and Switching Linear Dynamic System) and compared the results to human scored naturalness. Due to time constraints, this thesis does not implement those methods. In this thesis, naturalness is measured by calculating the Gaussian probability density function based on a testing sample. The testing sample consists of 96000 poses taken from the motion capture data. The testing model is assumed to have a Gaussian multivariate distribution with a mean and covariance N(µ,Σ). The naturalness score for an algorithm is calculated by determining the likelihood that a set of poses were generated by the sample distribution given in Equation (4-26). L( N ( µ , Σ) | X ) = f ( X | N ( µ , Σ)) = ∏ f (xi | N ( µ , Σ)) (4-26) i Where f(x| N(µ,Σ)) is simply the Gaussian multivariate probability distribution function, with k being the number of dimensions (Equation (4-27)). f ( xi | N ( µ , Σ)) = 1 πk Σ 1 − ( xi − µ )'Σ −1 ( xi − µ ) e 2 (4-27) Due to the large size of k, it is common to calculate the average log likelihood in order to compare the various algorithms (Equation (4-28) and Equation (4-29)). ln(L( N (µ , Σ) | X )) = ∑ ln( f ( xi | N ( µ , Σ))) (4-28) i l= 1 ∑ ln( f ( xi | N (µ , Σ))) n i (4-29) The average log likelihood of each algorithm corresponds to the degree of naturalness of the algorithm. An algorithm with a higher average log likelihood score is deemed to be more natural than an algorithm with a lower score. Matlab was used to calculate the log probability distribution function and log likelihood. With Matlab, the minimum positive number is approximately 1x10-300, any value lower than that is considered to be 0. As ln(0) is undefined, a 51 constant value of -750 was used as the result of ln(0) in order to avoid working with undefined number. 4.2.3 Number of Dimensions For PCA methods the number of dimensions used has a significant effect on the resulting poses. To find out the effect of the number of dimensions on accuracy and naturalness, we run the single-space PCA algorithm on 100 randomly generated constraints for one constraint problem (the right hand). The original data has 56 dimensions, and testing was performed on 1, 7, 14, 21, 28, 35, 42, 49 and 55 dimensions. The results of the accuracy and naturalness is shown in Figure 4-3 and Table 4-2. In general, an increase of the number of dimensions increases the accuracy. Accuracy is the percentage of poses that meets all constraints. As the number of dimension increase, there is more freedom in the motion allowing the algorithm to find poses that meet the constraints. Naturalness on the other hand, peaks when the number of dimensions is approximately 50% of the original dimension. Allowing more freedom of movement is detrimental to naturalness, as it allows poses that are not natural to be used. Table 4-2. The effect of different number of dimensions to naturalness and accuracy Number of Dimensions 1 7 14 21 28 35 42 49 55 Average Log Likelihood -750 -738.8863 -529.2594 -387.8207 -326.3475 -502.4741 -502.2705 -678.8402 -673.3091 52 Accuracy 0 0.71 0.9 0.94 0.94 0.97 0.99 1 0.99 For the algorithm comparison it is important that the algorithms have a high degree of accuracy and naturalness. We use a threshold value of 95% for the accuracy with as high as possible score for naturalness. This translates to 30 dimensions (from 56 original dimensions) for the single PCA. For the multiple-space PCA, 12 dimensions were used for the right hand, left hand and head space, and 5 dimensions for the right and left foot space. The same number of PCA dimensions is used for all constraint groups. Figure 4-3. Graph of number of dimensions vs accuracy and naturalness 53 4.3 Multiple PCA Space Inverse Kinematics It is now a matter of solving a multi-constrained inverse kinematics problem using multiple spaces. There are two ways of solving multi-constrained inverse kinematics problem, using priority [51], [52], or using weights [53]. In this proposed algorithm, a dynamic weighting system is used. In essence, the importance of the estimated effector space constraints decreases at every step while the importance of the external constraints remains the same. This allows the algorithm to satisfy the external constraints in cases where both constraints cannot be satisfied at the same time. For each kinematics step the current target position ∆P is a matrix that consists of the external constraints Px and the effector space constraints Pe. Section 4.3.1 describes how to calculate both the external constraints and the effector space constraints. The external constraints are capped with a distance of md (Equation (4-30)) while the effector constraints are capped with a variable distance that depends on md and the step number n (Equation (4-31)). If the resulting calculated distance for the effectors space is lower than a constant kd then that constraint is ignored. This reduces the number of constraints that the calculation must meet and helps in finding the solution faster. This also has the effect that as the number of steps increases, the inverse kinematics solution prefers to stay at its current position rather than trying to move to the exact position of the effectors constraint d tx ( P) = min( d te ( P) = min( md 2 ,1) P md nP 2 ,1) 54 (4-30) (4-31) d tx ( Ptx ) Ptx   ∆P =  ∀ Pte ,dte ( Pte )>kd d te ( Pte ) Pte    (4-32) 4.3.1 Estimated Effectors Space The effectors space described in chapter 3.3.1 is implemented and utilized in this method. If P is a vector of coordinates of the various body parts, TE is a PCA transformation matrix to a lower dimension based on learning the coordinates of the 8 effectors positions from the motion capture data. β is the resulting point in the reduced dimensional space according to Equation (4-33). The reduced space is notated as CC-space (constraint coordinate space) β = TE P (4-33) There are two groups of constraints, the external constraints Cx which are specified by the user or by the actions, and the new set of constraints, called the estimated constraints Ce which constraint the rest of the body parts. The estimated constraint Ce, is calculated based on the current position as well as the target constraint. Given that β0 is a point in CC-space corresponding to the current pose, ∆β is the difference in CC-space that corresponds to a point that meets the external constraints Cx. Γ is a matrix that finds the external constraint elements from a pose. To calculate ∆β, the following equation is minimized: T ΓTE ∆β − (ΓPt +1 − ΓPt ) 2 + ∆β 2 (4-34) Equation (4-34) is minimized by solving the following equation: T (TE Γ T ΓTE + I )∆β = TE Γ T (ΓPt +1 − ΓPt ) 55 (4-35) The target position Pt, consisting of both the external and the estimated constraints is calculated using Equation (4-36). This value is used in the calculation for finding ∆α such as in Equation (4-25). T TE (∆β + β 0 ) = Pt (4-36) 4.3.2 Matching overlapping angles The biggest problem with creating segments that overlap is that some of the angles are in multiple segments. When calculating the inverse kinematics solution for each segment, angles that are in multiple segments must result in the same value. The first attempt to solve this involves creating additional factors to control the angle th difference. One angle may appear in two segments k and l. This angle appears as the i angle of th th th the k segment and the j angle of the l segment (Equation (4-37)). xi, k = x j , l (4-37) The angles from the above equation can be calculated from the points in P-space through: T TiTk α k = T j , l α l , T TiTk α k − T j , lα l = 0 , [Ti, k (4-38) α  − T j,l  k  = 0 α l  ] For multiple angles, a matrix can be created for all angles that must be matched. A new matrix Γ combines the subtraction formula for multiple angles, one formula per row. The above equation is equivalent to: α  Γ k  = 0 α l  (4-39) 56 The above equation can be added as a factor to the Equation (4-24). However, experiments show that this formulation decreases the accuracy of the solution. This significantly adds to the number of constraints in the system. The iterations using this formula often end up at a local minimum that does not sufficiently address the main constraints. 4.3.3 Weighting Overlapping Angles A second attempt to solve the overlapping problem is through weighting after the calculation. Once the change in α is calculated, the overlapping angles is consolidated. Consolidation is done through calculating the weighted sum of all angle spaces. Equation (4-40) th th describes the calculation for the k joint angle, based on the weight of the i space (wi), and the th th k angle calculated by the i space. ∑ wiθi, k θk = i (4-40) ∑ wi i th The weights used are based on whether the i space corresponds to an active constraint or an effector space constraint. If the space affects an active constraint, it is given a priority weight of 1.0 or a non-priority weight 0.8. Only one space can have a priority weight of 1.0 at one time, therefore the space that has the priority weighting is changed every n steps (we use 10 steps). Passive constraints or effector space constraints is given a weight of 0.4. All the constants here were found using empirical methods. This weighting scheme gives more importance to the active constraint spaces and allows the solution to be found much quicker. 57 4.4 Results Comparisons were performed on a different number of constraints. There are five groups of constraints that are used. All of the constraints were the Cartesian coordinates of the various body parts. For each group, 100 randomly generated set of constraints were created. The groups are: 1. Right hand constraint only 2. Right foot constraint only 3. Right and left hand 4. Right and left foot 5. Right hand, left hand, right foot, and left foot 4.4.1 Accuracy The first measure for the algorithm is the accuracy. The accuracy is simply the number of poses found that meets all constraints. The algorithm must find a pose that meet all constraints in 250 iterations. Table 4-3 and Figure 4-4 shows the accuracy of the various algorithms in finding poses that meet the constraints. An increase in the number of constraints tends to lower the accuracy of the various algorithms. This may be because it is impossible to achieve a pose that meets all constraints. Another explanation is that some of the algorithms simply iterates on a local minima during the Jacobian iteration. When the iteration count is higher than the maximum allowed, the application stops the algorithm and marks it as unable to fulfill the constraints. 58 Table 4-3. Accuracy comparison of all algorithms R. Hand Constraints R. Foot Constraints R + L hand constraints R + L foot constraints R+L Hand + R+L Foot constraints 0.32 0.37 0.68 0.79 0.61 Single PCA 0.95 0.96 0.96 0.63 0.91 Single PCA+ 0.91 0.98 0.28 0.82 0.14 Regular Jacobian 0.98 1.00 0.69 1.00 0.65 Multiple space PCA The foot constraints problem causes a lot of difficulty. A look at the right foot constraints problem shows that the single PCA algorithm is unable to find the goal because in trying to find the shortest path to a goal, typically tries to bend the knee outward (past the bounds of the angles). The Multiple space PCA method has a very high accuracy compared to the other methods. The only exception to this is for the right and left hand constraint group where the Multiple spaces PCA does worse than the Single PCA method. A possible reason for this is that the right and left hand constraints cause a conflict in the overlapping joints of the spaces. As such, the algorithm finds a pose where the error is minimized, which is to say the total distance from the end effector position to the constraints is the smallest. However this pose does not actually meet any of the constraints. 59 Figure 4-4 . Graph of accuracy of all algorithms 4.4.2 Naturalness Table 4-4. Average log likelihood of all algorithms R. Hand Constraints Regular -712.4789 Jacobian -339.3964 Single PCA Single PCA+ -315.1096 Multiple -264.8083 space PCA The naturalness of each R. Foot Constraints R + L hand constraints R + L foot constraints R+L Hand + R+L Foot constraints -141.356 -169.534 -234.453 -721.299 -577.728 -445.176 -219.605 -298.049 -363.788 -696.894 -580.832 -474.114 -133.141 -305.194 -139.147 -348.928 pose created by the algorithm is measured by Equation (4-29). To analyze this data further, bootstrapping was used to find the mean and the variance. 1000 60 pose groups were created for each algorithm. Each pose group consists of n poses, where n is the number of correct poses found by the algorithm. The poses in the pose group is found by randomly sampling with substitution from the correct poses. We calculate the average log likelihood for each group and then calculate the mean and standard deviation (Table 4-4 and Figure 4-5). The regular Jacobian method on average results in less natural poses for the hand constraints and the hands and feet constraints. However the poses for the feet constraints are more natural than the resulting poses from the single PCA algorithms. For the feet constraint groups we find that the regular Jacobian method does not move the upper body at all. Because the upper body remains in the starting pose (which is a natural pose), and given that the number of non moving joints is much higher, the probability of this pose being a natural pose is also higher. 61 Figure 4-5. Average log likelihood of all algorithms graph For all constraint groups, we can see that the multiple space PCA performs better than the single PCA or even the single PCA with effector space and perturbation. One reason why single space PCA creates unnatural poses is that all joints are correlated in this algorithm, therefore moving one part a certain angle, also moves another part that is deemed to be related to it. Moving the arms is correlated to movement in the legs, and the more the arm joint moves, the more the leg joints move. Figure 4-6 shows how the legs in figures created using the single space PCA method is unnatural due to the fact that they are moved in relation to the hands movement. In these figures, only the position of the right hand is a hard constraint. 62 Figure 4-6. Unnaturalness is caused by extreme leg movement in single space PCA. The figures were created by (from left to right) regular Jacobian, single space PCA, single space PCA +, multiple space PCA The second reason for unnatural poses is the proclivity to bend to a target when the natural thing is to turn the body towards a target. Bending is preferred because it is usually the closer solution in the pose space. However, as we can see in Figure 4-7, bending the body to a particular target sometimes results in unnatural poses. Because the correct poses created by each algorithm differ, the naturalness score may be skewed. To compare the algorithms more fairly, pair wise T-test were performed on pairs of algorithms. The list of tests consisted of the single space (Single PCA) versus single space + effector space + perturbation algorithm (Single PCA+), single PCA vs. multiple spaces, single PCA + vs. multiple space, and Jacobian vs. multiple spaces. Only figures that meet constraints for both algorithms were included in this calculation. The paired log likelihood score for these algorithms are shown in Table 4-5, Table 4-6, Table 4-7, Table 4-8, and Figure 4-8. Again, bootstrapping was used on the paired data in order to retrieve the best estimate for mean and standard deviation. T-test results suggest that the difference between all of the pairings are significant (p<0.01) 63 Figure 4-7. Unnaturalness is caused by extreme leg movement in single space PCA. The figures were created by (from left to right) regular Jacobian, single space PCA, single space PCA +, and multiple space PCA Table 4-5. Average log likelihood comparison of the Inverse Jacobian and Multi PCA algorithm R. Hand Constraints R. Foot Constraints R + L hand constraints R + L foot constraints R+L Hand + R+L Foot constraints -711.811 -719.07 -689.783 -717.499 -750.00 Jacobian -253.033 -269.747 -304.883 -273.159 -307.586 Multi PCA As expected, the regular Jacobian creates poses which are the most unnatural. For poses having the same constraint, the multiple-space PCA algorithm creates poses that are significantly more natural. Table 4-6. Average log likelihood comparison of the Single PCA and Single PCA + algorithm R. Hand Constraints R. Foot Constraints R + L hand constraints R + L foot constraints R+L Hand + R+L Foot constraints -323.0806 -340.911 -352.788 -369.725 -406.215 Single PCA -303.793 -333.184 -334.931 -388.236 Single PCA+ -296.2438 The single space PCA and single space PCA+ algorithm’s average likelihood do not differ as dramatically as other pairings. Adding the perturbation factor and a set of effector space constraints to the single space PCA algorithm helps control the location of all the nonconstrained body parts, and thus the single space PCA+ algorithm creates more natural poses. 64 Table 4-7. Average log likelihood comparison of the Single PCA and Multi PCA algorithm R. Hand Constraints R. Foot Constraints R + L hand constraints R + L foot constraints R+L Hand + R+L Foot constraints -330.6858 -339.53 -373.038 -369.51 -347.944 Single PCA -256.6454 -265.001 -287.241 -279.14 -260.00 Multi PCA Table 4-8. Average log likelihood comparison of the Single PCA+ and Multi PCA algorithm R. Hand Constraints R. Foot Constraints R + L hand constraints R + L foot constraints -325.293 -266.892 -334.842 -280.256 -332.925 -269.738 Single PCA + -307.2701 -256.5242 Multi PCA Figure 4-8. Graph of pairwise comparison between algorithms 65 R+L Hand + R+L Foot constraints -321.276 -263.291 Table 4-7 and Table 4-8 shows that the multiple space PCA method creates poses that are more natural than both the single space PCA and single space PCA+ algorithms. A graph of these values is shown in Figure 4-8. Figure 4-9 shows a small sample of results of the algorithms for 2 constraint problems (hands). For each pose group the figures show the resulting pose calculated by the regular Jacobian, single space PCA, single space PCA+, and multiple space PCA method. A more detailed set of samples for different constraint groups can be seen in Appendix A. Figure 4-9 Sample result poses for 2 constraints (hands) problems 66 5 Motion Motion is a set of frames presented sequentially a such a rate of speed that the illusion of smooth movement is created. To create motion automatically, an algorithm creates a number of frames containing the poses, from the starting pose to an ending pose that meets the target. The algorithm described here finds a motion in motion space (M-space) to move a virtual human from the starting pose to the ending pose that meets constraints. The M-space is a specialized version of a natural space that contains data regarding the phase, which is the pose configuration and angular speed for each frame. The ending pose must be predefined through other algorithms (e.g. the P-space inverse kinematics algorithm described in Chapter 4.3). The M-space algorithm uses the damped least squares method [87], [88] to find a set of frames that allows a natural movement between the starting pose and the ending pose. 5.1 Characteristic of natural motion The simplest method for creating motion between P0 and Pt is simple linear interpolation. Linear interpolation assumes the velocity at each time frame is the same, and that the path from P0 to Pn is a straight path. To perform this type of interpolation, the first thing to do is calculate the interpolation step ∆P, which is simply the vector from P0 to Pn divided by the number of frames (Equation (5-1)). Each step of the motion can be calculated by using the pose for the previous frame and adding the interpolation step (Equation (5-2)). This step is repeated until the final pose is reached. P −P ∆P = n 0 n (5-1) 67 Pt +1 = Pt + ∆P (5-2) Figure 5-1 The speed and angles of natural motion and a linear interpolated motion However, observation of captured motion data reveals that natural motion do not look like the results of linear interpolation. Natural motions do not have a constant speed for all angles. The angles for each joint do not always move with the same speed, some angles may start slow and speed up in the middle before slowing down again near the end. Joint angles sometimes deviates from the straight path between the start and end angles. An example of the speed of joint angles in a natural motion created by the right hand segment, with a length of 32 frames is shown in Figure 5-1. The left part of the figure shows the 68 natural speed, the actual angles in degrees for each degree of freedom, and the speed of each degree of freedom. The right part shows the same attributes created by linear interpolation between the starting pose and the ending pose. The same non linear motion characteristics have also been observed in faces deformation [89]. 5.2 Utilizing a Motion Space Chapter 2 discusses the various techniques used to create motion. This thesis focuses on how to use existing motion capture data to generate motions. There are two approaches to creating human motion assuming a constrained target. The first is to generate a motion that finds a pose at the end that meets the constraints. The other method finds a set of key frames that meet the constraints first, and then interpolate the rest of the motion. The second approach is the approach used in this thesis. The creation of synthesized motions from motion data, have been proposed by various authors. Abe et al. creates a limited amount of poses which have the same physical requirements using transformation (by rotation or translation) of motion capture data [32]. Grochow et al [90] compute poses via kinematics based on data learnt from captured motion. They calculate the likelihood of a particular pose using a probabilistic model. New poses are synthesized using an optimization algorithm in which the objective function depends on the learnt poses. Similarly, in Yamane [63], inverse kinematics is calculated using a constrained optimization algorithm. Captured data is stored and used as soft constraints. Motion is created by the smoothing of various results of IK computations over multiple positions. The algorithm proposed by Pan [64] uses RRT to plan out the key frames necessary. For each set of key frames, a motion is calculated using interpolation, and depending on how natural it is (and whether it is in a constrained environment or not), it is replaced by existing motion 69 capture data. They first find a motion capture data segment that resembles the motion they have, and then uses that. This still requires a large database of motion capture data to be able to create good results. PCA have been used on whole motions (normalized skeletal configuration across multiple frames) [79], [80]. The created PCA space is then used to create new motions with different constraints. In order to do this, only similar motions, or motions which performs one task can be summarized using the PCA, for example golf motions. Instead of a using each motion as one data point, the algorithm described here uses the phase (a combination of pose configuration and angular velocity) to create a reduced motion space (M-space). Given the starting pose P0, the ending pose Pn, and the number of frames n, the proposed method attempts to find a natural motion (through generation of the interleaving set of poses) that moves the human from the start pose to the end pose. The poses are found by finding the points in a lower dimension motion space that meets motion constraints. The algorithm focuses on finding the shortest path between the starting pose and ending pose. Realistic physics may sometimes be overlooked. For example, if position A was a standing position and position B was a kicking position (where the leg meets the ball), it is unlikely that the algorithm will try to move the leg back to gain momentum for kicking. Instead the end results will probably be a straight path between the starting positions to the kick position. This can be remedied by having a constraint that the motion includes a backward movement to create momentum. 70 5.2.1 The Motion Space To create a motion space, information about the current angle as well as the angular velocity for each DOF is used. To calculate the angular velocity ωt, the average difference between the previous frame and the next frame is used in order to smooth out the result (Equation (5-3)). ωt = Pt +1 − Pt −1 2 (5-3) The vector H, represents a phase of motion contains both the current angle configuration as well as the angle velocity at each frame (Equation (5-4)). The numbers of dimensions for H is double that of the original configuration space. PCA is used to create a lower dimension space (the motion space or M-space). The number of data used for creating the PCA space was increased to 45000 samples. H t = [Pt , ωt ] (5-4) N is the point in M-space corresponding to H. T is the transformation matrix from the original phase space to the M-space that transforms H to N (Equation (5-5)). T is orthornormal, and therefore the transpose of T transforms from N to H (Equation (5-6)). N t = TH t (5-5) H t = T T Nt (5-6) T T Each point in N-space corresponds to a pose and a velocity. T p and T v are transformation matrices that transforms from the point in M-space to the pose (Equation (5-7)) and velocity (Equation (5-8)) respectively. T P = TP N (5-7) ω = TvT N (5-8) 71 5.2.2 Calculating Motion Motion is calculated in the M-space. N0 and Nn is the point in M-space corresponding of the phases H0 and Hn. We attempt to find the motion M which is a contiguous set of points in Mspace corresponding to a set of poses in the original space. Figure 5-2 illustrates the M vector that includes the data for every frame. M = [N0 N1 N2… Nn] Figure 5-2 Motion vector representation used in this thesis To calculate M using N-Space, these constraints are used: 1. N0 = [P0 ω 0]. The first element of M is a PCA transformation of P0 and the initial speed 2. Nn = [Pn ω n]. The n th frame element of M is a PCA transformation of Pn and the final angular speed 3. Pt+1 = Pt + k ω t. The poses in the motion must be contiguous. The momentum added to each pose must match the following pose (possibly scaled by a constant k). 72 4. ω0 and ω n-1 are minimized. The assumption here is that the motion starts and end on a still pose. Therefore the angular velocity at the first frame and at the last frame is constrained to be as small as possible. In order to find the best motion between a starting pose P0 and an ending pose Pn, as well as meet all those constraints, the constraints are transformed to equations that can be minimized. Constraint number 1 and 2 corresponds to the minimization Equation (5-9) and Equation (5-10). This equation attempts to minimize the difference between the actual start and ending pose and the corresponding poses from the starting and ending points in M-space. Constraint number 3 corresponds to Equation (5-11). This equation tries to force the next pose in the motion to be as close as possible to the current pose plus the angular velocity. Constraint number four corresponds to Equations (5-12) and (5-13). These two equations attempt to find a point in Nspace with the smallest possible angular velocity. Finally a damping factor is introduced in Equation (5-14). This damping factor ensures that each pose remains as close as possible to the subsequent pose. T P0 − TP N 0 2 T Pn − TP N n 2 (5-9) T T ∀t =1..n −1 TP N t +1 − (TP N t T Tv N 0 (5-10) 2 T + Tv N t ) (5-11) 2 TvT N n −1 (5-12) 2 (5-13) ∀t =1..n −1 N t +1 − N t 2 73 (5-14) To get the individual M-space point corresponding to a single frame within a motion, the L matrix can be used. The L matrix’s value depends on the frame number. N k = Lk M (5-15) Therefore the Equations (5-9)-(5-14) can be written as T P0 − TP L0 M 2 T Pn − TP Ln M 2 T ∀t =1..n −1 TP Lt +1M (5-16) (5-17) T − (TP Lt M T Tv L0 M TvT Ln −1M 2 T + Tv Lt M ) (5-18) 2 (5-19) 2 (5-20) ∀t =1..n −1 Lt +1M − Lt M 2 (5-21) The motion solution M is the motion that minimizes all those factors according to Equation (5-22). Each factor in this equation is weighted by the constant lambda that denotes the importance of each part. The first four factors have the same weight because they are similar in purpose (matching the pose or the velocity). Furthermore having one weight reduces the number of parameters that must be controlled. T λ1 P0 − TP L0 M 2 T + λ1 Tv L0 M 2 T + λ1 Pn − TP Ln M 2 2 T + λ1 Tv Ln −1M 2 n −1 T T T + λ2 ∑ TP Lt +1M − (TP Lt M + Tv Lt M ) t =1 n −1 2 + λ3 ∑ Lt +1M − Lt M t =1 74 (5-22) To calculate the value of M that minimizes the above equation, the derivation for each factor with respect to M (df/dM) is calculated as shown in Equations (5-23)-(5-28) T 2( LT TPTP L0 ) M − 2( LT TP P0 ) 0 0 (5-23) T 2( LT TPTP Ln ) M − 2( LT TP Pn ) n n (5-24) T T T T T 2 ∑ (TP Lt +1 − TP Lt + TvT Lt )T (TP Lt +1 − TP Lt + Tv Lt ) M (5-25) 2( LT Tv TvT L0 ) M 0 (5-26) T 2( LT −1TvTv Ln −1 ) M n (5-27) 2 ∑ ( Lt +1 − Lt )T ( Lt +1 − Lt ) M (5-28) n −1 t =1 n −1 t =1 Finding the weighted sum of all the derivation of Equation (5-22) and setting it to 0, gives a formula that allows a matrix factorization to find M:   λ ( LT T T T L ) + λ ( LT T T T L ) 1 n P P n  1 0 P P 0    + λ ( LT T T T L ) + λ ( LT T T T L ) 1 0 v v 0 1 n −1 v v n −1   n −1   T T T T T T T  + λ2 ∑ (TP Lt +1 − TP Lt + Tv Lt ) (TP Lt +1 − TP Lt + Tv Lt )  M   t =1   n −1   + λ3 ∑ ( Lt +1 − Lt )T ( Lt +1 − Lt )   t =1   (5-29) = λ1LT TP P0 + λ1LT TP Pn 0 n The result of using this calculation to generate motion is shown in Figure 5-3. The original motion characteristic is shown on the left. The starting pose and ending pose of the original motion is used as input to the algorithm described in this chapter. The resulting motion shows variability in the total speed and angular speed. 75 Figure 5-3 Total speed, angles, and angular velocity of motion calculated using M-space. 5.2.3 Characteristic of Motion Speed The algorithm described so far minimizes the total speed at the beginning and the end. It does not constraint the speed of motion in the body of the motion. Different motions have different speed trajectories, and different motions can be created by changing the speed of the angles. An example of a speed trajectory based on a function is a quadratic function speed trajectory, which peaks at the middle of the motion. 76 Figure 5-4 Mean error of speed reconstruction on original samples Twelve walking motions and twelve running motions were selected from the motion capture database, each with different subjects. The motion of the right foot during the performance of a single step is used for speed analysis. The speed for each frame was calculated by taking the difference between subsequent frames. Every motion has different distances traveled and different number of time frames. Therefore the motion must be normalized. A set of regularly spaced key points, including the first and the last speed were selected. The speed at each key-point was then calculated. From this method 24 normalized speed trajectories from the motion samples were created. The prototypical speed for a motion is the average of those normalized speed trajectories. 77 To reconstruct the speed for a motion of n frames, the interval centers were matched to the frame index. A cubic spline algorithm [91] was used to create a curve that would best fit all the key points. This method was used to reconstruct the speeds of the original samples. The mean of the error of the reconstructed speed with respect to the original sample speed for key points between 3 and 50 was calculated and is shown in Figure 5-4. The figure shows that the error rate does not significantly decrease after 13 key points. Figure 5-5 Mean speed trajectory of running motion and walking motion Figure 5-5 shows the normalized mean speed trajectory for running and walking motions in 13 key points. The blue lines represent the running motion, while the red lines represent the walking motion. The thick lines represent the mean value or the prototype speed. As expected, different motions require different speed characteristics, and therefore this motion trajectory can be used to generate different motions. Hotelling’s T squared method was used to calculate the 78 statistical significance of the multivariate differences of means [92]. This test shows that the means of the two types of motions differ significantly (p<0.01, F13,10=17.2947) 5.2.4 Speed Trajectory Matching Given a prototype speed, the next task is to use that speed to create a motion. The total speed can be calculated from ωx as shown in (5-30). To match the velocity to a set of speed constraint S = {s0, s1… sn} the following constraint could be used. Ωx = 2 ∑ ω x,i (5-30) i 2 Ω x − sx (5-31) To simplify this problem however, we use the squared total velocity in (5-32) and minimize 2 Θ x = Ω 2 = ∑ ω x, i x (5-32) i 2 Θ x − sx 2 (5-33) Finding the derivation of this formula and then using it to calculate the motion with Equation (5-33) turns out to be complex. The reason for the difficulty comes from the w variable being raised to the power of 4. Instead of trying to derive this formula, we calculate the Jacobian with respect to each individual ω Equation (5-34). Based on Equation (5-30), it can be inferred that each partial derivative is equal to the value 2ω. dΘ  dΘ x dΘ x J = x L dω x,1 dω x , 2 dω x, n    = 2ω x,1 2ω x,2 L 2ω x, n [ ] 79 (5-34) In order to match the speed we can add the constraints shown in Equation (5-35) and Equation (5-36). The term ∆ω represents the moment velocity. Based on this moment velocity we can find the moment for motion ∆M ∀t =1..n J∆ωt − (vt2 − Θt ) 2 ∀t =1..n JTvT Lt ∆M − (vt2 − Θt ) (5-35) 2 (5-36) The derivation of this with respect to ∆M is: 2( LT Tv J T JTvT Lt )∆M − 2( LT Tv J T (vt2 − Θt )) t t (5-37) The solution to M can be found through iteration. After every iteration, ∆M is added to M as shown in Equation (5-38). In order to use this iteration, Equation (5-22) must be changed to use both M and ∆M. M t +1 = M t + ∆M (5-38) To find ∆M for each iteration, Equation (5-39) must be minimized. Equation (5-39) is similar to the original M-space equation shown in Equation (5-22), with an extra speed trajectory matching factor added. By calculating the derivation of Equation (5-39) and setting it to 0, the most optimal value of ∆M can be calculated using Equation (5-40). 80 2 T T λ1 X 0 − TP L0 ( M + ∆M ) + λ1 X n − TP Ln ( M + ∆M ) + λ1 TvT L0 ( M + ∆M ) 2 + λ1 TvT Ln −1 ( M + ∆M ) T T + λ2 ∑ (TP Lt +1 − TP Lt + TvT Lt )( M + ∆M ) 2 2 2 t + λ3 ∑ Lt +1 ( M + ∆M ) − Lt ( M + ∆M ) (5-39) 2 t + λ4 ∑ JTvT Lt ∆M − (vt2 − Θt ) 2 t         T T T T  λ1 ( L0 TPTP L0 ) + λ1 ( Ln TPTP Ln )    + λ1 ( LT TvTvT L0 ) + λ1 ( LT −1TvTvT Ln −1 )   0 n   n −1 T T T + λ (T T L − T T L + TvT Lt )T (TP Lt +1 − TP Lt + Tv Lt ) ∆M  2 ∑ P t +1 P t  t =1   n −1   T  + λ3 ∑ ( Lt +1 − Lt ) ( Lt +1 − Lt )    t =1   n −1   + λ4 ∑ ( LT Tv J T JTvT Lt ) − ( LT Tv J T (vt2 − Θt ) t t   t =1   = λ1LT TP X 0 − λ1LT TP M + λ1LT TP X n − λ1LT TP M 0 0 n n − λ1LT Tv M − λ1LT −1Tv M 0 n n −1 T T T − λ2 ∑ (TP Lt +1 − TP Lt + Tv Lt )T M t =1 n −1 − λ3 ∑ ( Lt +1 − Lt )T M + λ4 LT Tv J T (vt2 − Θt ) t t =1 81 (5-40) The result of using this calculation to generate motion is shown in Figure 5-6. The original motion characteristic is shown on the left. The starting pose and ending pose of the original motion is used as input to the algorithm described in this chapter. The right data shows the motion trying to match quadratic function speed. The resulting motion shows variability in the total speed and angular speed. Figure 5-6 Total speed, angles, and angular velocity of motion calculated using Speed Trajectory Matching 82 5.2.5 Multiple Spaces The body space is divided into 5 segments as described in Chapter 3.3. To calculate the motion for the whole body, each segment is calculated separately. The angle of joints that are in multiple segments is simply the average of the angle values for that joint in all segments. Equation (5-41) describes the calculation for the k th joint angle, which is an average of the angles in all spaces that contains the k joint angle. ∑θ i , k θk = (5-41) i n 5.3 Results Two types of evaluation are performed on this algorithm. The first evaluation is aimed at measuring the similarity between a motion generated by this algorithm and a motion taken from the motion capture data. To do this, the angle difference at each frame is calculated. A visual observation of the motions is also performed. The second evaluation attempts to asses new motions synthesized with the M-space. Visual observation of the created motion is used in assessing the resulting motion. Table 5-1 Number of dimensions per M-space segment Body Segment Right hand Left hand Head Right Foot Left Foot Number of Dimensions 18 18 18 8 8 83 The body space is divided into 5 segments. By using the minimum distance between actual motion and motion created using M-space, the number of dimensions used are shown in Table 5-1. 5.3.1 Motion Reconstruction The first evaluation compares a motion created in M-space and one existing in the motion capture data. Four clips were selected from either the right hand or the right foot segments. The clips chosen had a speed trajectory that was similar to a quadratic function. The clips start off slow, peak near the middle, and slow down again near the end. The start pose and the end pose was given as input to the various algorithms tested. The results of three different motion generating algorithms were compared. The first algorithm is a simple linear interpolation between the first pose and the end pose. The second motion algorithm uses the M-space algorithm described in chapter 5.2.2. The third algorithm is the speed trajectory matching algorithm of chapter 5.2.4. The trajectory of the speed was a quadratic function with a peak of 5 degrees/frame. Table 5-2 Euclidian distance for the right hand segment motions Clip Clip 1 Number of Frames 41 Clip 2 32 Clip 3 13 Clip 4 47 Total 133 Interpolation M-Space 13.9794 (7.4394) 10.2899 (6.0120) 14.3558 (9.0049) 32.4070 (15.5398) 19.6405 (14.4471) 13.9380 (7.9521) 12.3045 (7.2574) 14.6976 (8.6656) 33.0236 (16.9043) 20.3638 (15.0579) 84 Speed Trajectory 14.2215 (8.6094) 9.0544 (5.3202) 13.7084 (7.9130) 31.0832 (14.9078) 18.8868 (14.0731) The total Euclidian distance between the joint angles was calculated for each frame. The farther the distance of the natural motion and the generated motion, the less natural the motion is deemed to be. The mean and standard deviation (in brackets) for each algorithm is shown in Table 5-2. Based on the data found here, all three algorithms create motions which are similar in terms of naturalness. Paired T-tests shows that the differences were not statistically significant (t score were -0.3997 for methods 1 and 2, 0.4310 for methods 1 and 3, and 0.8265 for methods 2 and 3). Similarly, Table 5-3 shows the mean and standard deviation of distances for the right foot. It is clear here that the speed trajectory algorithm performs better for foot motion than the other algorithms. Paired T-tests shows that the differences are statistically significant for p<0.005 (t score were –8.0657 for methods 1 and 2, 3.9803 for methods 1 and 3, and 10.4810 for methods 2 and 3). Table 5-3 Euclidian distance for the right foot segment motions Clip Clip 1 Number of Frames 37 Clip 2 31 Clip 3 21 Clip 4 41 Total 130 Interpolation M-Space 15.7634 (7.3665) 14.1270 (9.0573) 8.7946 (4.5610) 13.0720 (9.7789) 13.7276 (8.7391) 26.1878 (14.8549) 31.2707 (18.2234) 44.1516 (30.6322) 25.6661 (19.2689) 28.7152 (19.5668) Speed Trajectory 8.7737 (4.5666) 11.3426 (7.7485) 4.6237 (2.8625) 11.0984 (7.0715) 9.9472 (6.6033) A second evaluation was performed on this data. A walking motion was taken from the motion capture database. The original walking motion was of 60 frame length. To generate the motion, the first and the last frame of the walking motion were given to the interpolation and the 85 M-space algorithms. The speed matching algorithm with a quadratic function approximation was used. The maximum sum of angle speed per frame was 3 degrees for the arms and left leg segments, 2 for the head segment, and 5 for the right leg segment. Figure 5-7 compares various walking motion created by regular interpolation and using motion space. The original walking motion is shown in the top row. The second row shows the result of linear interpolation. The third row shows the result from the M-space with speed matching. Figure 5-7 Comparison of walking motion The problem with linear interpolation can be seen in the right leg segment of the motion. In a normal walking motion shown in the first row, a person bends his knees to lift the leg forward. This is not seen in the motion created by interpolation. The motion drags the feet from the back to the front. The M-space motion created on the other hand shows some knee bending and as a result a small leg lift occurred, albeit not as high as the lift in the normal walking motion. 86 Figure 5-8 Speed and angle comparison of right hand and right foot 87 Figure 5-8 shows the speed and angles for both the right hand (top) and the right foot (bottom) segments. The left graphs show the original speed and angle per frames. The right graphs show the speed and angle per frames of the motion created by the speed trajectory matching algorithm. The motion generated by the speed trajectory matching algorithm follows a path that in most joints mimics the path in the original motion. 5.3.2 Pose Based Motion Synthesis In most applications of human animation, there is a need to synthesize new motions based on existing data. The method described in this thesis allows different ways of creating new motion to meet an application’s need. Motions are generated based on the start and end poses. By changing the end pose, new natural motions must be generated. The top figures of Figure 5-9 shows an animated human pointing upwards. New motions are generated by changing the direction that the human points to. Two new motions were generated, one pointing to the front and one pointing to the side. The first pose for each motion was taken from the original motion capture data. The multi space pose generator described in Chapter 4.3 was used to generate the end poses. The speed trajectory matching algorithm with a quadratic function motion trajectory with a peak of 5 was used for creating both motions. The algorithm was able to create a novel motion that meets both speed and position constraints. 88 Figure 5-9 Creating new pointing motions using natural spaces 5.3.3 Speed Based Motion Synthesis An alternate way of generating new motions is to change the speed of various parts of the body. However one must be very careful that the motion remains natural, even when the speed changes. This method allows a user to set the total speed for a segment. This allows the user to control which segment needs to change, while at the same time not worry about changing individual joint angles and how they relate to one another. 89 Figure 5-10 Effect of a different speed trajectory to motion The animated figure in Figure 5-10 is performing traffic control actions (waving cars through an intersection). The figures on top show the original motion. The speed of the right hand segment was changed to have a peak of 8. However the starting and ending poses remain the same. The figures on the bottom show the resulting motion after the speed was changed. Due to the change of speed, the character had to move the hands in a curved trajectory in order to meet both positional and speed constraints. As a result of this change of speed, the end motion is significantly different from the original motion. Figure 5-11 shows the speed and angle per frame of the right hand segment. The left graphs show the original motion capture data while the right graphs show the data after the change of speed. 90 Figure 5-11 Speed and angle per frame after fitting with new trajectory A prototype speed trajectory (as shown in chapter 5.2.3) can also be used to generate motions. A walking motion was taken from the motion capture database. The original walking motion was of 86 frame length. A 60 frame motion was created to speed up calculation. To generate the motion, the first and the last frame of the walking motion were given to the interpolation and the M-space algorithms. Instead of matching the speed at all frames, speed were matched every 2 frames. In essence, this allows the algorithm to choose a more natural 91 speed for some frames and not make unnecessary movements just to meet the pre arranged speed constraints. Figure 5-12 Speed trajectories for walking and running motion Figure 5-12 shows the speed and angle per frames generated by this algorithm for two prototype speed trajectories, a walking and a running motion that starts and ends with the same pose. Figure 5-13 shows the results of synthesizing a motion having those speed trajectories. The top images show the original motion taken from the motion capture data. The middle images 92 show the synthesized walking motion while the bottom images show the synthesized running motion. 93 Figure 5-13 Walking and running motions generated from prototype speed trajectories. The top figures show the original motion. The figures on the second row shows generated walking motion, while the figures on the third row shows generated running motion 94 6 Conclusion and future work 6.1 Conclusion This thesis describes a set of algorithms that creates natural spaces, subspaces of all possible poses and motions, and how to utilize these subspaces to create more natural poses and movement. The skeletal model is divided into multiple overlapping segments. PCA is performed for the motion capture data in each segment. The resulting space is called the natural space. Inverse kinematics on the PCA space is used to create constrained poses. Furthermore to retain the relation between the various segments, an effector space is created consisting of the coordinates of 8 end effectors / bones. Naturalness was measured by calculating as log likelihood of a set of poses created by an algorithm given a model of naturalness. The mean and variance of the motion capture data was calculated and used as the parameter for a normal distribution. This normal distribution of data is used as the model for comparing naturalness of poses. The likelihood is calculated by calculating the probability density function (pdf) of poses to the model. Resulting poses are most natural when the number of PCA dimensions is around 50% of the original number of dimensions. A lower number of dimensions severely limits the poses that the algorithm is able to use, therefore, it is not likely to find one that meets the constraints. A high number of dimensions on the other hand give too much freedom on the resulting bone angles. The relationships between the bone angles are lost, resulting in unnatural poses. 95 The algorithm proposed here creates poses that are more natural compared to traditional iterative Jacobian and the single PCA space method. As predicted the traditional iterative Jacobian performs the worst, as it focuses on finding solutions without taking naturalness into account. The single PCA solution creates relations over body parts that may not really exist, and are only there due to insufficient learning data. By breaking the space into multiple segments, this algorithm is better place the bones into a pose that fits the constraints, while at the same time keeping it natural. To create motion, a motion space was created from the phase space. The phase space includes the angle of the bones as well as the speed of each angle. Similar to the pose space, the skeletal model is also segmented into the same space. Given a starting pose and an ending pose, the algorithm described in this thesis creates the inner poses in order to create a motion. A single vector represents each phase of the motion from the starting pose to the ending pose. The motion generated algorithm searches through the motion space to fill out the vector with motion phases that meets the constraint. Traditional linear interpolation assumes changes in joint angles are always constant. The motion capture data however shows that this is not always true. With linear interpolation the angle of a joint at any time t is always a linear product of t and the angle speed. The speed of joint angles in natural motion changes constantly and cannot be easily replicated by linear interpolation. The first algorithm proposed finds a solution by minimizing the beginning and ending speed. This algorithm is able to create motions that have variance in its speed, similar to natural motion. The second algorithm proposed goes even further by allowing a 96 user to control the speed of the motion at each step. Therefore the user can select a speed curve (such as a simple quadratic curve) which the algorithm tries to match. This algorithm is also able to create motions with variable speed for each joint, creating a more natural motion. The work described here shows that the use of multiple reduced dimension space is a feasible alternative in generating poses and motions. By utilizing the statistical properties of motion capture data, these methods generate more natural looking poses and motions compared to traditional methods. 6.2 Future Work There are various directions where this work can be expanded. Some of the areas for expansion include a more thorough study of data, optimization, methods merging, and evaluations. Chai and Hodgins dismissed the use of global PCA in their work because it overgeneralizes the data [93]. One reason for this is that they use global PCA instead of the multiple space method described here. Even the multiple space method can be broken down further. Some have used techniques that work on motion capture data for specific movement [79]. Preliminary studies by this author, shows that categorizing motion and using different category results for the motion capture data results in different motions. Further study can be performed to utilize fully the specific properties of each motion. Different emotions elicit different types of movement [21]. PCA has been used to generate different walking movement based on emotions [77]. This work can be expanded by using emotion based motion capture and eliciting motions and poses that is typical of an emotion. 97 Quaternion representation of joints has some advantages over Euler angles [78]. Various PCA algorithms on motion have been used using this quaternion representation [78], [81]. A comparative study for multi space PCA using quaternion as compared to the current Euler angle is needed. No optimization or algorithmic analysis is performed on the current algorithm. Therefore it is impossible to know whether the algorithm here is suitable for real time systems. The complexity of this algorithm comes from finding the inverse of matrices. As some of the matrices are sparse, it may be possible to speed up the inverse calculation. The algorithm described here can be easily incorporated into other programs. The pose generating mechanism can be used in the RRT search algorithm proposed in Yamane [63] and Pan [64]. The motion generating algorithm does not necessarily need to take as input the results from the pose generating algorithm. All it needs is the skeletal configuration for the starting pose and the ending pose. The assumption used in this work is that motion is transferable between subjects and body types. One of the findings of Pronost et al, is that different body will have different correlation between joints, and thus different type of movement [66]. Future research can weigh in on how to use the body information to transfer or change the natural space. In this thesis, naturalness is measured objectively through the use of a probability distribution function. It would be interesting to see whether the poses and motions created here are natural according to direct observation by humans. To test the naturalness of the motion created, subjects may try to determine [94] which two motions were generated by a real human, and which were generated by this algorithm. If the subjects were not able to 98 discern which is which, it can be concluded that the algorithm proposed creates animation of natural motion. 99 APPENDICES Appendix A. Multiple Space PCA Results The following figures show selected comparison of running the iterative Jacobian, Single PCA, Single PCA +, and Multi PCA algorithms as described in Chapter 4.3 on various constraints. For all figures, the skeleton shows the results of running the respective algorithms in order from left to right. In all of the following figures, the Multi PCA found a solution pose that meets the constraints set. The other algorithms may also possibly have found a solution pose to the constraints. 101 Figure 6-1 Results of running the pose generation algorithms on right hand constraints only 102 Figure A-2 Results of running the pose generation algorithms on right hand and left hand constraints 103 Figure A-3 Results of running the pose generation algorithms on right foot constraint only 104 Figure A-4 Results of running the pose generation algorithms on right and left foot constraints 105 Figure A-5 Results of running the pose generation algorithms on hands and foot constraints 106 REFERENCES References [1] Nadia Magnenat-Thalmann and Daniel Thalmann, “An Overview of Virtual Humans,” in Handbook of Virtual Humans, John Wiley & Sons, 2004. [2] K. L. Nowak and F. Biocca, “The effect of the agency and anthropomorphism of users’ sense of telepresence, copresence, and social presence in virtual environments,” Presence: Teleoperators and Virtual Environments, vol. 12, p. 481– 494, Oct. 2003. [3] P. Goldstein, “Why is the $170-million ‘Polar Express’ getting derailed?,” Los Angeles Times, 2004. [4] F. Charles, M. Cavazza, S. J. Mead, O. Martin, A. Nandi, and X. Marichal, “Compelling experiences in mixed reality interactive storytelling,” in Proceedings of the 2004 ACM SIGCHI International Conference on Advances in computer entertainment technology, Singapore: ACM Press, 2004, pp. 32-40. [5] M. Mateas and A. Stern, “Towards Integrating Plot and Character for Interactive Drama,” Socially Intelligent Agents: The Human in the Loop. Papers from the 2000 AAAI Fall Symposium, pp. 113-118, 2000. [6] J. Gratch, “Émile: Marshalling passions in training and education,” in Proceedings of the fourth international conference on Autonomous agents, Barcelona, Spain: ACM Press, 2000, pp. 325-332. [7] J. Rickel and W. L. Johnson, “Integrating pedagogical capabilities in a virtual environment agent,” in Proceedings of the first international conference on Autonomous agents, Marina del Rey, California, United States: ACM Press, 1997, pp. 30-38. [8] J. Lester and B. Stone, “Increasing Believability in Animated Pedagogical Agents,” in The First International Conference on Autonomous Agents, Marina del Rey, California, 1997, pp. 16-21. [9] E. Andre, T. Rist, and J. Muller, “Integrating reactive and scripted behaviors in a life-like presentation agent,” in Proceedings of the second international conference on Autonomous agents, Minneapolis, Minnesota, United States: ACM Press, 1998, pp. 261-268. 108 [10] J. Cassell et al., “Embodiment in conversational interfaces: Rea,” in Proceedings of the SIGCHI conference on Human factors in computing systems: the CHI is the limit, Pittsburgh, Pennsylvania, United States: ACM Press, 1999, pp. 520-527. [11] R. McDonnell, F. Newell, and C. O’Sullivan, “Smooth movers: perceptually guided human motion simulation,” in Proceedings of the 2007 ACM SIGGRAPH/Eurographics symposium on Computer animation, Aire-la-Ville, Switzerland, Switzerland, 2007, p. 259–269. [12] P. S. A. Reitsma and N. S. Pollard, “Perceptual metrics for character animation: sensitivity to errors in ballistic motion,” ACM Trans. Graph., vol. 22, no. 3, pp. 537542, 2003. [13] A. Hertzmann, C. O’Sullivan, and K. Perlin, “Realistic human body movement for emotional expressiveness,” in ACM SIGGRAPH 2009 Courses, New York, NY, USA, 2009, p. 20:1–20:27. [14] N. Burtnyk and M. Wein, “Interactive skeleton techniques for enhancing motion dynamics in key frame animation,” Commun. ACM, vol. 19, no. 10, pp. 564-569, 1976. [15] E. Catmull, “A system for computer generated movies,” in Proceedings of the ACM annual conference - Volume 1, Boston, Massachusetts, United States: ACM Press, 1972, pp. 422-431. [16] ASF/AMC, “Acclaim ASF/AMC,” 1999. [Online]. Available: http://www.cs.wisc.edu/graphics/Courses/cs-838-1999/Jeff/ASF-AMC.html. [17] M. Garau, M. Slater, S. Bee, and M. A. Sasse, “The impact of eye gaze on communication using humanoid avatars,” in Proceedings of the SIGCHI conference on Human factors in computing systems, Seattle, Washington, United States: ACM Press, 2001, pp. 309-316. [18] P. Gebhard, “ALMA: a layered model of affect,” in Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems, The Netherlands: ACM Press, 2005, pp. 29-36. [19] B. Tomlinson, “From linear to interactive animation: how autonomous characters change the process and product of animating,” Comput. Entertain., vol. 3, no. 1, pp. 5-5, 2005. [20] J. Lasseter, “Tricks to animating characters with a computer,” SIGGRAPH Comput. Graph., vol. 35, no. 2, pp. 45-47, 2001. 109 [21] T. Porter and G. Susman, “On site: creating lifelike characters in Pixar movies,” Commun. ACM, vol. 43, no. 1, p. 25, 2000. [22] K. Perlin and A. Goldberg, “Improv: a system for scripting interactive actors in virtual worlds,” in Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, ACM Press, 1996, pp. 205-216. [23] Z. Huang, A. Eliens, and C. Visser, “Implementation of a scripting language for VRML/X3D-based embodied agents,” in Proceeding of the eighth international conference on 3D Web technology, Saint Malo, France: ACM Press, 2003, pp. 91100. [24] N. I. Badler, M. S. Palmer, and R. Bindiganavale, “Animation control for real-time virtual humans,” Commun. ACM, vol. 42, no. 8, pp. 64-73, 1999. [25] A. Watt and M. Watt, Advanced Animation and Rendering Techniques Theory and Practice. New York: Addison-Wesley, 1992. [26] A. Bruderlin and T. W. Calvert, “Goal-directed, dynamic animation of human walking,” in Proceedings of the 16th annual conference on Computer graphics and interactive techniques, ACM Press, 1989, pp. 233-242. [27] C. K. Liu and Z. Popovic;, “Synthesis of complex dynamic character motion from simple animations,” in Proceedings of the 29th annual conference on Computer graphics and interactive techniques, San Antonio, Texas: ACM Press, 2002, pp. 408-416. [28] Z. Popovic, “Controlling physics in realistic character animation,” Commun. ACM, vol. 43, no. 7, pp. 50-58, 2000. [29] M. S. Geroch, “Motion capture for the rest of us,” J. Comput. Small Coll., vol. 19, no. 3, pp. 157-164, 2004. [30] B. Bodenheimer, C. Rose, S. Rosenthal, and J. Pella, “The Process of Motion Capture: Dealing with the Data,” Computer Animation and Simulation ’97, Eurographics Animation Workshop, no. 1997, 18. 1997. [31] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification. Wiley-Interscience, 2000. [32] Y. Abe, C. K. Liu, and Z. Popovic;, “Momentum-based parameterization of dynamic character motion,” in Proceedings of the 2004 ACM 110 SIGGRAPH/Eurographics symposium on Computer animation, Grenoble, France: Eurographics Association, 2004, pp. 173-182. [33] H-Anim, “H-ANIM specification for a standard humanoid,” 2007. [Online]. Available: http://h-anim.org/Specifications/H-Anim1.1/. [34] R. P. Paul, B. Shimano, and G. E. Mayer, “Kinematic Control Equations for Simple Manipulators,” IEEE Transactions on Systems, Man, and Cybernetics, 1981. [35] M. Fedor, “Application of inverse kinematics for skeleton manipulation in realtime,” in Proceedings of the 19th spring conference on Computer graphics, Budmerice, Slovakia: ACM Press, 2003, pp. 203-212. [36] M. Meredith and S. Maddock, “Adapting motion capture data using weighted realtime inverse kinematics,” Comput. Entertain., vol. 3, no. 1, pp. 5-5, 2005. [37] R. Barzel and A. H. Barr, “A modeling system based on dynamic constraints,” in Proceedings of the 15th annual conference on Computer graphics and interactive techniques, ACM Press, 1988, pp. 179-188. [38] P. M. Isaacs and M. F. Cohen, “Controlling dynamic simulation with kinematic constraints,” in Proceedings of the 14th annual conference on Computer graphics and interactive techniques, ACM Press, 1987, pp. 215-224. [39] J. K. Hodgins, W. L. Wooten, D. C. Brogan, and J. F. O’Brien, “Animating human athletics,” in Proceedings of the 22nd annual conference on Computer graphics and interactive techniques, ACM Press, 1995, pp. 71-78. [40] J. K. Hahn, “Realistic animation of rigid bodies,” SIGGRAPH Comput. Graph., vol. 22, no. 4, pp. 299-308, 1988. [41] H. Zhiyong, N. M. Thalmann, and D. Thalmann, “Interactive human motion control using a closed-form of direct and inverse dynamics,” in The Second Pacific Conference on Computer Graphics and Applications, Pacific Graphics ’94., Beijing, China., 1994. [42] A. Witkin and M. Kass, “Spacetime constraints,” in Proceedings of the 15th annual conference on Computer graphics and interactive techniques, ACM Press, 1988, pp. 159-168. [43] H. Ko and N. I. Badler, “Animating human locomotion with inverse dynamics,” IEEE Computer Graphics and Animation, vol. 16, no. 2, pp. 50-59, Mar. 1999. 111 [44] S.-H. Lee, E. Sifakis, and D. Terzopoulos, “Comprehensive biomechanical modeling and simulation of the upper body,” ACM Transactions on Graphics (TOG), vol. 28, p. 99:1–99:17, Sep. 2009. [45] C. K. Liu, “Dextrous manipulation from a grasping pose,” in ACM Transactions on Graphics (TOG), New York, NY, USA, 2009, p. 59:1–59:6. [46] A. Safonova, J. K. Hodgins, and N. S. Pollard, “Synthesizing physically realistic human motion in low-dimensional, behavior-specific spaces,” ACM Trans. Graph., vol. 23, no. 3, pp. 514-521, 2004. [47] P. Baerlocher and R. Boulic, “An Inverse Kinematic Architecture Enforcing an Arbitrary Number of Strict Priority Levels,” The Visual Computer, vol. 20, p. 402– 417, 2004. [48] A. Witkin, K. Fleischer, and A. Barr, “Energy constraints on parameterized models,” in Proceedings of the 14th annual conference on Computer graphics and interactive techniques, ACM Press, 1987, pp. 225-232. [49] A. R. Conn and N. I. M. Gould, “Large-scale nonlinear constrained optimization: a current survey,” in Algorithms for continuous optimization: the state of the art, E. Spedicato, Ed. Dordrecht, The Netherlands: Kluwer Academic Publishers, 1994, pp. 287-332. [50] CMU, “CMU Graphics Lab Motion Capture Database,” 2008. . [51] J. Barbic, A. Safonova, J.-Y. Pan, C. Faloutsos, J. K. Hodgins, and N. S. Pollard, “Segmenting motion capture data into distinct behaviors,” in Proceedings of Graphics Interface 2004, London, Ontario, Canada: Canadian Human-Computer Communications Society, 2004, pp. 185-194. [52] L. Kovar and M. Gleicher, “Flexible automatic motion blending with registration curves,” in Proceedings of the 2003 ACM SIGGRAPH/Eurographics symposium on Computer animation, San Diego, California: Eurographics Association, 2003, pp. 214-224. [53] M. Gleicher, H. J. Shin, L. Kovar, and A. Jepsen, “Snap-together motion: assembling run-time animations,” in Proceedings of the 2003 symposium on Interactive 3D graphics, Monterey, California: ACM Press, 2003, pp. 181-188. [54] O. Arikan and D. A. Forsyth, “Interactive motion generation from examples,” in Proceedings of the 29th annual conference on Computer graphics and interactive techniques, San Antonio, Texas: ACM Press, 2002, pp. 483-490. 112 [55] J. Lee and K. H. Lee, “Precomputing avatar behavior from human motion data,” in Proceedings of the 2004 ACM SIGGRAPH/Eurographics symposium on Computer animation, Grenoble, France: ACM Press, 2004, pp. 79-87. [56] H. P. H. Shum, T. Komura, and P. Yadav, “Angular momentum guided motion concatenation,” Computer Animation and Virtual Worlds, vol. 20, no. 2-3, pp. 385394, Jun. 2009. [57] R. Heck, L. Kovar, and M. Gleicher, “Splicing Upper-Body Actions with Locomotion,” Computer Graphics Forum, vol. 25, no. 3, pp. 459-466, Sep. 2006. [58] S. Mutlu, “Motion Enriching Using Humanoide Captured Motions,” Master’s Thesis, Polytecnic University of Catalunya, 2010. [59] M. da Silva, F. Durand, and J. Popović, “Linear Bellman combination for control of character animation,” in ACM Transactions on Graphics (TOG), New York, NY, USA, 2009, p. 82:1–82:10. [60] S. J. Lee and Z. Popović, “Learning behavior styles with inverse reinforcement learning,” in ACM Transactions on Graphics (TOG), New York, NY, USA, 2010, vol. 29, p. 122:1–122:7. [61] J. M. Wang, D. J. Fleet, and A. Hertzmann, “Optimizing walking controllers for uncertain inputs and environments,” in ACM Transactions on Graphics (TOG), New York, NY, USA, 2010, p. 73:1–73:8. [62] K. Shoemake, “Animating rotation with quaternion curves,” in Proceedings of the 12th annual conference on Computer graphics and interactive techniques, ACM, 1985, pp. 245-254. [63] K. Yamane, J. J. Kuffner, and J. K. Hodgins, “Synthesizing animations of human manipulation tasks,” ACM Trans. Graph., vol. 23, no. 3, pp. 532-539, 2004. [64] J. Pan, L. Zhang, M. C. Lin, and D. Manocha, “A hybrid approach for simulating human motion in constrained environments,” Comput. Animat. Virtual Worlds, vol. 21, p. 137–149, May. 2010. [65] B. J. H. van Basten, P. W. A. M. Peeters, and A. Egges, “The step space: examplebased footprint-driven motion synthesis,” Computer Animation and Virtual Worlds, vol. 21, p. 433–441, May. 2010. 113 [66] N. Pronost, A. Sandholm, and D. Thalmann, “Correlative joint definition for motion analysis and animation,” Computer Animation and Virtual Worlds, vol. 21, p. 183–192, May. 2010. [67] F. Aurenhammer, “Voronoi diagrams—a survey of a fundamental geometric data structure,” ACM Computing Surveys (CSUR), vol. 23, p. 345–405, Sep. 1991. [68] K. Pullen and C. Bregler, “Animating by Multi-Level Sampling,” in Proceedings of the Computer Animation, Washington, DC, USA, 2000, p. 36–. [69] M. Brand and A. Hertzmann, “Style machines,” Proceedings of the 27th annual conference on Computer graphics and interactive techniques, p. 183–192, 2000. [70] H. J. Shin and J. Lee, “Motion synthesis and editing in low-dimensional spaces: Research Articles,” Comput. Animat. Virtual Worlds, vol. 17, p. 219–227, Jul. 2006. [71] I. T. Jolliffe, Principal Component Analysis, 2nd ed. Springer, 2002. [72] A. Frank and A. Asuncion, UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences, 2010. [73] K. T. Ge, “Solving Inverse Kinematics Constraint Problems for Highly Articulated Models,” University of Waterloo, 2000. [74] Q. Gu, J. Peng, and Z. Deng, “Compression of Human Motion Capture Data Using Motion Pattern Indexing,” Computer Graphics Forum, vol. 28, no. 1, pp. 1-12, Mar. 2009. [75] O. Arikan, “Compression of motion capture databases,” in ACM SIGGRAPH 2006 Papers, Boston, Massachusetts: ACM, 2006, pp. 890-897. [76] Z. Li, Y. Deng, and H. Li, “Generating Different Realistic Humanoid Motion,” in Advances in Artificial Reality and Tele-Existence, vol. 4282, Springer Berlin / Heidelberg, 2006, pp. 77-84. [77] J. Tilmanne and T. Dutoit, “Expressive gait synthesis using PCA and Gaussian modeling,” Proceedings of the Third international conference on Motion in games, p. 363–374, 2010. [78] M. P. Johnson, Exploiting Quaternions to Support Expressive Interactive Character Motion. Massachusetts Institute of Technology, 2003. 114 [79] J. Min, Y.-L. Chen, and J. Chai, “Interactive generation of human animation with deformable motion models,” ACM Transactions on Graphics (TOG), vol. 29, p. 9:1–9:12, Dec. 2009. [80] S. R. Carvalho, R. Boulic, and D. Thalmann, “Interactive low-dimensional human motion synthesis by combining motion models and PIK,” Comput. Animat. Virtual Worlds, vol. 18, no. 4-5, pp. 493-503, 2007. [81] T. Grudzinski, “Exploiting Quaternion PCA in Virtual Character Motion Analysis,” Proceedings of the International Conference on Computer Vision and Graphics: Revised Papers, p. 420–429, 2009. [82] L. Ren, A. Patrick, A. A. Efros, J. K. Hodgins, and J. M. Rehg, “A data-driven approach to quantifying natural human motion,” in International Conference on Computer Graphics and Interactive Techniques, Los Angeles, California, 2005. [83] F. Multon, R. Kulpa, L. Hoyet, and T. Komura, “Interactive animation of virtual humans based on motion capture data,” Computer Animation and Virtual Worlds, vol. 20, p. 491–500, Sep. 2009. [84] S. Ishigaki, T. White, V. B. Zordan, and C. K. Liu, “Performance-based control interface for character animation,” in ACM Transactions on Graphics (TOG), New York, NY, USA, 2009, vol. 28, p. 61:1–61:8. [85] S. M. LaValle, Planning Algorithm. Cambridge University Press, 2006. [86] David C. Lay, Linear Algebra and Its Applications, 2nd ed. Addison Wesley, 2000. [87] S. R. Buss, “Introduction to Inverse Kinematics with Jacobian Transpose, Pseudoinverse and Damped Least Squares Methods,” 2004. [Online]. Available: http://groups.csail.mit.edu/drl/journal_club/papers/033005/buss-2004.pdf. [88] S. Buss and J.-S. Kim, “Selectively Damped Least Squares for Inverse Kinematics,” Journal of Graphics, GPU, & Game Tools, vol. 10, no. 3, pp. 37-49, Jan. 2005. [89] D. Cosker, S. Paddock, D. Marshall, P. L. Rosin, and S. Rushton, “Toward Perceptually Realistic Talking Heads: Models, Methods and McGurk,” ACM Transactions on Applied Perception, vol. 2, no. 3, pp. 270-285, Jul. 2007. [90] K. Grochow, S. L. Martin, A. Hertzmann, and Z. Popovic, “Style-based inverse kinematics,” in ACM SIGGRAPH 2004 Papers, Los Angeles, California: ACM, 2004, pp. 522-531. 115 [91] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes in C book set: Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. Cambridge University Press, 1992. [92] Harold Hotelling, “Multivariate Quality Control,” in Techniques of Statistical Analysis, New York: McGraw-Hill, 1947, pp. 111-184. [93] J. Chai and J. K. Hodgins, “Performance Animation from Low-dimensional Control Signals,” ACM TRANSACTIONS ON GRAPHICS, vol. 24, p. 686--696, 2005. [94] S. E. M. Jansen and H. Welbergen, “Methodologies for the User Evaluation of the Motion of Virtual Humans,” in Intelligent Virtual Agents, vol. 5773, Z. Ruttkay, M. Kipp, A. Nijholt, and H. H. Vilhjálmsson, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009, pp. 125-131. 116