CHANGING MOVEMENT PATTERNS USING REINFORCEMENT LEARNING

By

Tzu-Hsiang Lin

A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Kinesiology—Doctor of Philosophy

2020

CHANGING MOVEMENT PATTERNS USING REINFORCEMENT LEARNING

ABSTRACT

By Tzu-Hsiang Lin

Humans interact with the world by generating movements, which makes it important to understand the process of motor learning. There are two aspects of motor learning: (1) an improvement in task performance (e.g., learning to throw farther), and (2) a change in the movement pattern (e.g., learning to throw with improved coordination or technique even if there is no change in task performance). Most studies on motor learning focus on the first aspect, task performance; however, the second aspect, the movement pattern, is also important and ubiquitous in daily life - for example, we learn a better movement pattern to carry heavy objects to prevent injuries, and patients re-learn to perform movements in rehabilitation settings. In this dissertation, I designed a learning protocol that provided reinforcement feedback to guide participants to learn alternative movement patterns to perform the same task. Reinforcement feedback provides participants with a signal to start exploring different movement patterns but does not provide direct information about the desired movement pattern. Therefore, the key question of this dissertation was how to schedule the reinforcement feedback to shift participants toward an alternative movement pattern in tasks requiring coordination of multiple body segments. In experiment one, I tested how providing ‘online’ reinforcement feedback (i.e., feedback provided during the movement) could shift participants toward alternative movement patterns. In experiment two, I tested how providing ‘terminal’ reinforcement feedback (i.e., feedback provided at the end of the movement) could shift participants toward alternative movement patterns, and whether an adaptive method that adjusts reinforcement based on prior performance had better learning outcomes. In summary, I found: (1) reinforcement feedback can be used to change movement patterns in tasks requiring coordination of multiple body segments, although it is less successful when compared to its use in simpler tasks, (2) online reinforcement feedback resulted in quick changes toward the desired movement pattern, and the amount of practice was the primary factor that determined retention, and (3) terminal reinforcement feedback resulted in less change toward the desired movement pattern, and an adaptive algorithm was needed to achieve better learning outcomes. These results contribute to the fields of motor learning and computational motor neuroscience by clarifying how the central nervous system uses feedback to change movement patterns, and can be applied in skill acquisition and motor rehabilitation to help people learn motor skills.

TABLE OF CONTENTS

LIST OF FIGURES .......................................... vii
CHAPTER 1 INTRODUCTION .......................................... 1
Focus of dissertation .......................................... 3
CHAPTER 2 LITERATURE REVIEW .......................................... 6
Learning in a redundant motor system .......................................... 6
Data-driven approaches to describe movement patterns .......................................... 7
Task-driven approaches to describe movement patterns .......................................... 8
Learning with external feedback .......................................... 9
Error-based learning .......................................... 10
The paradigm .......................................... 10
The mechanism .......................................... 11
Experimental results .......................................... 12
Optimization .......................................... 12
Summary .......................................... 13
Reinforcement learning .......................................... 14
The paradigm .......................................... 14
The mechanism .......................................... 15
Exploration vs. noise .......................................... 15
Summary .......................................... 16
Use-dependent learning .......................................... 17
The paradigm .......................................... 17
The mechanism .......................................... 17
Summary .......................................... 18
How to learn alternative movement patterns .......................................... 18
Summary .......................................... 20
CHAPTER 3 LEARNING ALTERNATIVE MOVEMENT PATTERNS USING REINFORCEMENT FEEDBACK IN A REACHING TASK .......................................... 21
Abstract .......................................... 21
Introduction .......................................... 22
Methods- Experiment 1 .......................................... 24
Participants .......................................... 24
Apparatus .......................................... 25
Task .......................................... 25
Procedure .......................................... 26
Providing Reinforcement feedback .......................................... 26
Groups and Reinforcement schedules .......................................... 28
Data Analysis .......................................... 29
Statistical Analysis .......................................... 30
Results .......................................... 31
Trunk-hand distance - Far targets .......................................... 31
Trunk-hand distance - Near targets .......................................... 32
Path Length - Far targets .......................................... 34
Path Length - Near targets .......................................... 35
Discussion of Experiment 1 and rationale for Experiment 2 .......................................... 36
Methods – Experiment 2 .......................................... 37
Statistical Analysis .......................................... 37
Results .......................................... 37
Trunk-hand distance - Far targets .......................................... 37
Trunk-hand distance - Near targets .......................................... 39
Discussion of Experiment 2 .......................................... 40
General Discussion .......................................... 40
CHAPTER 4 SHAPING REINFORCEMENT FEEDBACK TO INDUCE CHANGES IN MOVEMENT PATTERNS IN A THROWING TASK .......................................... 45
Abstract .......................................... 45
Introduction .......................................... 46
Methods: experiment 1 .......................................... 49
Participants .......................................... 49
Apparatus .......................................... 49
Task .......................................... 50
Score feedback .......................................... 50
Providing reinforcement feedback .......................................... 51
Procedures .......................................... 52
Groups .......................................... 52
Data analysis .......................................... 53
Statistical analysis .......................................... 55
Results: experiment 1 .......................................... 55
Score .......................................... 56
Punishment rate .......................................... 57
Trunk velocity .......................................... 57
Hand velocity .......................................... 58
Task space variability .......................................... 59
Null space variability .......................................... 60
Summary of experiment 1 .......................................... 61
Rationale for Experiment 2 .......................................... 61
Methods- experiment 2 .......................................... 61
Participants .......................................... 62
Apparatus and task .......................................... 62
Grouping and shaping methods .......................................... 63
Data analysis .......................................... 64
Results: experiment 2 .......................................... 65
Threshold .......................................... 65
Score .......................................... 65
Punishment rate .......................................... 66
Trunk velocity .......................................... 67
Hand velocity .......................................... 68
Null space variability .......................................... 69
Summary of experiment 2 .......................................... 70
General Discussion .......................................... 70
Reinforcement in multi-DOF tasks .......................................... 71
Shaping schedules .......................................... 72
Abrupt vs. Gradual .......................................... 72
Adaptive schedules .......................................... 73
CHAPTER 5 GENERAL DISCUSSION .......................................... 75
Overall scope .......................................... 75
Contributions of the dissertation .......................................... 76
Online feedback vs delayed feedback .......................................... 77
Shaping reward/punishment during reinforcement .......................................... 78
Limitation and future direction .......................................... 79
Conclusion .......................................... 80
REFERENCES .......................................... 82

LIST OF FIGURES

Figure 3.1. Schematic of experimental setup .......................................... 26
Figure 3.2. Reinforcement feedback and experimental protocol .......................................... 28
Figure 3.3. Variation of thresholds and actual trunk-hand distances .......................................... 33
Figure 3.4. Mean trunk-hand distance in far and near targets .......................................... 34
Figure 3.5. Schematic of path length .......................................... 35
Figure 3.6. Variation of thresholds and actual trunk-hand distances .......................................... 38
Figure 3.7. Group mean trunk-hand distance .......................................... 39
Figure 4.1. Experimental setup .......................................... 49
Figure 4.2. Mechanism of providing reinforcement feedback .......................................... 51
Figure 4.3. Design of shaping methods .......................................... 53
Figure 4.4. Definition of task space and null space .......................................... 54
Figure 4.5. Change in mean score .......................................... 56
Figure 4.6. Change in mean punishment rate .......................................... 57
Figure 4.7. Change in standardized trunk velocity and hand velocity .......................................... 58
Figure 4.8. Change in task and null space variability .......................................... 60
Figure 4.9. Schematic of scoring with and without reinforcement feedback .......................................... 63
Figure 4.10. Design of reinforcement schedules .......................................... 64
Figure 4.11. Change in trunk velocity threshold .......................................... 65
Figure 4.12. Change in mean score .......................................... 66
Figure 4.13. Change in punishment rate .......................................... 67
Figure 4.14. Change in standardized trunk velocity and hand velocity .......................................... 68
Figure 4.15. Change in task and null space variability .......................................... 69

CHAPTER 1 INTRODUCTION

Consider the situation of a beginner learning to play tennis: not only does the learner have to focus on the eventual task outcome (say, hitting the ball over the net), but in order to do so, the learner has to learn to coordinate joints and limbs to hit the ball – i.e., learn a new movement pattern. This problem of learning a movement pattern is further complicated by the fact that there are many possible movement patterns that hit the ball successfully - e.g., hitting the ball with an overarm or underarm movement pattern. This is the issue of motor redundancy, where the motor system has many motor solutions to perform the same task (Bernstein, 1967). However, even though multiple solutions may be available to perform the task equally well in terms of the task outcome, some solutions may be preferred to others because they have other advantages (such as efficiency or injury prevention). In these cases, an external agent (e.g., a coach) may need to shift participants from using one solution to another. The central question that this dissertation addresses is how to ‘shift’ participants from one movement pattern to another in complex motor tasks. Understanding how to best structure practice to learn such new movement patterns is an important issue not only for skill acquisition, but is also a central part of movement rehabilitation in neurological disorders like stroke.

Providing augmented feedback is a well-studied approach to guide the learner to specific movement patterns. Augmented feedback refers to feedback about the movement that is not intrinsic to the individual, and is usually provided by an external agent (coach, therapist, etc.). The learner uses this feedback as a ‘learning signal’ to modify movement patterns on future trials. Although previous approaches have distinguished types of feedback based on the content of the information (i.e., knowledge of results vs. knowledge of performance), a more recent distinction is based on the learning process itself – i.e., how behavior changes after providing different types of feedback (Wolpert et al., 2011).
Under this category, there are two primary types of feedback that guide the learner through different learning mechanisms: (1) error feedback, and (2) reinforcement feedback.

Error feedback measures the difference between the performance and the target, and thus indicates both the magnitude and the direction of the error. For example, when a tennis player undershoots the target (and can see where the ball lands), the player can use this information to hit harder toward the direction of the target on the next trial. Since the magnitude and direction of the error are known, the learner modifies movement patterns to decrease the error trial-by-trial. Error-based learning is useful when the error is non-zero, because it directs the learner to one of the many movement patterns that will bring the error to zero. However, there are two limitations of error-based learning – (i) once the error is zero, there is no learning signal to modify movement patterns further, even though learning may be required to modify movement patterns in tasks with redundancy, and (ii) implementing error feedback in terms of movement patterns involving many body segments can result in high-dimensional feedback that is difficult for the learner to process. For example, when a learner performing a tennis serve receives a vector of errors for the shoulder, elbow, and wrist, it is difficult to use this information to make adjustments on all three joints at the same time.

An alternative to error feedback is reinforcement feedback. Reinforcement feedback evaluates the performance and effectively provides a ‘good’ or ‘bad’ signal. In other words, it provides coarse-grained information that either signals the learner to retain the movement pattern (if the signal is good) or change the movement pattern (if the signal is bad). Unlike the systematic changes observed in error-based learning, where errors can be gradually reduced to zero, reinforcement learning is characterized by exploration because the learner does not know the sign or the magnitude of the error with great precision. However, an advantage is that the reinforcement signal is low-dimensional (making it easy for the learner to use), and this process of exploration can be used to modify movement patterns even after the task error has become zero.

Exploration helps to learn alternative movement patterns, but here again, the challenge with tasks with multiple degrees of freedom (DOF) is that the learning process can be inefficient when the learner explores along irrelevant or incorrect dimensions. Because the reinforcement is low-dimensional, it does not directly ‘guide’ the exploration toward the desired solution. Going back to the tennis serve example, telling a participant ‘bad serve’ does not provide sufficient information on how to change the shoulder, elbow, and wrist motions. One solution to this problem is ‘shaping’ (Ferster & Skinner, 1957) – i.e., gradually manipulating feedback based on the learner’s behavior, so that the learner is more likely to receive ‘good’ reinforcement feedback when exploring in the right direction. However, how these shaping schedules should be used in multi-DOF tasks to shift participants from one pattern to another is not well understood. To study this question, I used a reinforcement-based algorithm with different shaping methods to guide participants to explore different solutions in an efficient way.
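To make the idea of shaping concrete, the following is a minimal sketch (in MATLAB, the software used for the experiments in this dissertation) of binary reinforcement under an abrupt and a gradual threshold schedule. All names and values here are illustrative assumptions, not the experimental code.

    % Binary reinforcement under two hypothetical shaping schedules.
    nTrials = 300;
    target  = 10;                              % desired value of the shaped feature
    abrupt  = target * ones(1, nTrials);       % full criterion from the first trial
    gradual = linspace(0, target, nTrials);    % criterion ramps up with practice
    feature = 4 + 2*randn(1, nTrials);         % stand-in for the learner's behavior
    goodAbrupt  = feature >= abrupt;           % 'good' feedback only at full criterion
    goodGradual = feature >= gradual;          % small early steps are also rewarded

Under the gradual schedule, small steps in the right direction are reinforced early in practice, which is the essence of shaping.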
Focus of dissertation

In this dissertation, I investigated two main questions: (1) can reinforcement feedback be used to change movement patterns involving multiple degrees of freedom (DOF), and (2) how can different shaping methods help to change movement patterns?

I designed a virtual throwing task based on trunk-hand coordination to study the change in movement patterns. Participants were asked to learn the coordination of trunk and hand kinematics to meet the task goal. The redundancy in the task was based on the fact that many different movement patterns could be used to achieve the task, which allowed us to examine how reinforcement could help participants shift from one movement pattern to another. In this case, participants typically did not use trunk motion at the start of performing this task, and we applied reinforcement feedback so that a ‘punishment’ (i.e., a low score) was applied when the trunk movement was below a certain ‘threshold’. I manipulated this threshold during practice using different methods to examine how participants changed their movement pattern. Participants initially performed a ‘pre-test’ with no reinforcement, then practiced with reinforcement feedback for a training period, and then performed a ‘post-test’ with no reinforcement. This allowed us to examine the change in movement pattern due to the reinforcement feedback, and the retention of this movement pattern after it was removed.

In experiment one, I examined the effect of learning alternative movement patterns with online reinforcement feedback (i.e., feedback given during the movement). Participants were grouped into an ‘abrupt’ group, where the threshold changed abruptly, and a ‘gradual’ group, where the threshold changed gradually. Results showed that both groups learned alternative movement patterns, although the abrupt group retained this movement pattern even after the reinforcement was removed.

In experiment two, I examined the effect of learning new movement patterns with different shaping methods in a discrete task (i.e., where feedback was given only at the end of the movement). A key question was whether ‘adaptive’ shaping methods (i.e., where the threshold was modified according to the participants’ performance) would yield greater changes in the movement pattern relative to open-loop shaping methods (abrupt/gradual). Results showed that (i) reinforcement feedback in discrete tasks created smaller changes in the movement pattern (compared to online reinforcement feedback), and (ii) adaptive shaping methods, which resulted in a moderate rate of reinforcement, were more effective at creating changes in movement pattern relative to open-loop methods.

The first contribution of this dissertation is to study reinforcement learning in the context of redundant tasks that have multiple solutions to achieve the task goal. Our results show that although reinforcement learning is quite successful when it is provided during the movement, it is considerably less successful at eliciting changes in movement patterns in discrete tasks, when it is provided at the end of the movement. The second contribution is to understand how shaping methods can be used to make reinforcement learning in discrete tasks effective. Adaptive shaping methods, which are based on the participants’ performance, were found to be more effective at creating changes in movement patterns compared to open-loop methods.
Overall, these results point to the need for further work in understanding reinforcement learning in real-world tasks and how it can be applied to motor learning in skill acquisition or rehabilitation.

CHAPTER 2 LITERATURE REVIEW

The focus of the dissertation is to investigate how the change in movement patterns can be guided by using reinforcement feedback. In this context, I will use a theoretical framework from computational motor neuroscience (Wolpert et al., 2001). This framework does not focus on where motor learning happens in the central nervous system, but rather on what kind of computation is implemented throughout learning (Wolpert & Flanagan, 2016). In a typical learning paradigm, the learner receives ‘learning signals’ and generates new movement patterns to perform the task (Jordan & Rumelhart, 1992). The computation happens between receiving the learning signal and the onset of the next movement. Learning signals are the input to the nervous system and the movement patterns are the output (Franklin & Wolpert, 2011). In this review section, I discuss (i) the challenge of learning in a redundant system, and (ii) the different types of learning signals in motor learning in the context of learning new movement coordination patterns.

Learning in a redundant motor system

Humans have the ability to use different movement coordination patterns to perform the same task. For example, one can reach for a target with the elbow flexed or the elbow extended. Both movement patterns are equally ‘good’ from the viewpoint of achieving the task goal. This many-to-one mapping, which arises because there are more degrees of freedom (DOF) in the motor system than are constrained by the task, was framed as the challenge of redundancy in the motor system (Bernstein, 1967). Motor redundancy can be evaluated at different levels of the motor system (e.g., motor neurons, muscle groups, kinetics, and kinematics); here, I focus on the level of kinematics. For example, to reach a target, only the position of the hand is constrained by the task directly. Other DOF such as the elbow and shoulder are not directly constrained by the task, and therefore give rise to different ways of performing the task. This flexibility is especially useful for dealing with uncertain environments; for example, we can reach for an object even when there is an obstacle in the way.

Understanding the relation between motor learning and motor redundancy is critical because movement patterns can be variable while still achieving the same task outcome (Guigon et al., 2007; Latash, 2012; Singh et al., 2016). So, while learning results in changes in the task outcome, how do participants settle on a movement pattern or ‘shift’ to an alternative movement pattern when several movement patterns lead to the same task outcome? Before describing this process of learning in terms of movement patterns, I briefly review methods that quantify movement patterns in the context of redundancy.

Data-driven approaches to describe movement patterns

Human movements often have many DOFs, but the important variance usually lies in a low-dimensional space. There are many dimensionality reduction techniques to find this lower-dimensional space. One widely used technique is principal component analysis (PCA); the goal of PCA is to find the linear transformation that captures the variance of the data with fewer dimensions (represented by the first few principal components). The principal components that have high explained variance can be seen as the important dimensions that drive the movement (Gløersen et al., 2018; Witte et al., 2010). PCA is advantageous because it is easy to implement and has been successfully used to extract low-dimensional data in a variety of contexts.
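As a minimal illustration of this approach, the sketch below applies PCA to a hypothetical matrix of kinematic data; the random data matrix and the 90% variance criterion are assumptions made only for the example.

    % PCA sketch on hypothetical kinematic data (rows: samples, columns: DOFs).
    X  = randn(200, 8);                        % placeholder for recorded joint data
    Xc = X - mean(X, 1);                       % center each degree of freedom
    [~, S, V] = svd(Xc, 'econ');               % columns of V: principal components
    varExplained = diag(S).^2 ./ sum(diag(S).^2);
    k = find(cumsum(varExplained) >= 0.90, 1); % dimensions for 90% of the variance
    scores = Xc * V(:, 1:k);                   % movement re-expressed in k dimensions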
However, not all data have linear patterns; in particular, human movements are considered nonlinear because of anatomical constraints.

Another category of dimensionality reduction techniques is nonlinear methods, which consider the nonlinearity in human movement patterns. Nonlinear methods have a better biological interpretation than linear methods because they assume that the dynamics of human movement lie on a nonlinear manifold (Ficuciello et al., 2018; Jenkins & Matarić, 2004; Safonova et al., 2004; Wang & Suter, 2007). Nonlinear methods group dimensions based on the balance between local similarity and global similarity. For example, in throwing a ball, the elbow and shoulder are measured as local structure whereas the whole body is global; the analysis would group the shoulder and the elbow together, achieving the goal of reducing dimensions. There are two advantages of using nonlinear techniques: (1) it makes sense to apply nonlinear techniques to a nonlinear motor system, and (2) they make it possible to discover hidden dynamics in the data. However, a more powerful technique means that the resulting dimensions are harder to interpret and that the computation is more complicated. Unless linear methods cannot approximate the data well, it is not advisable to start with nonlinear methods.

Task-driven approaches to describe movement patterns

The above-mentioned dimensionality reduction approaches describe the relation between different degrees of freedom, but do not address how this relation affects task performance. Therefore, a second class of approaches measures movement patterns in terms of how they achieve the task goal. For example, consider a reaching movement: although there are multiple joints in motion (shoulder/elbow/wrist), the important point from the task goal is whether the hand gets to the target or not. The uncontrolled manifold hypothesis (Scholz & Schöner, 1999) adopted this concept and divided the task into two independent spaces: a “task space”, containing the dimensions that affect task performance, and a “null space”, the dimensions along which there is no change in task performance (Latash et al., 2002; Scholz et al., 2002). For example, in the reaching context, motion in the ‘task space’ would lead to changes in hand position, whereas motion in the ‘null space’ would lead to no changes in hand position. Similar decompositions have also been adopted by other approaches (Cusumano & Cesari, 2006; Müller & Sternad, 2004).

From a motor learning standpoint, the task and null spaces play very different roles. Because variations along the task space cause changes in the outcome, learning should result in a reduction of task space variability to achieve stable performance. However, there is no such constraint on the null space as, by definition, it does not influence task performance in any way. On the one hand, null space variability may decrease with learning as a way of being more consistent and finding an ‘optimal’ solution. On the other hand, null space variability could also increase with learning because it provides flexibility in performing the task with different movement patterns. For example, if some solutions carry a higher risk of injury due to extreme body postures, the null space provides a way to ‘shift’ the coordination pattern and move to a better one without affecting task performance.
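For a linear approximation of the task, this decomposition can be sketched as follows; the 2x4 matrix J standing in for a task Jacobian (four DOFs mapped to a two-dimensional task outcome) is assumed purely for illustration.

    % Task space / null space split of trial-to-trial variability for a linear
    % task mapping, in the spirit of the uncontrolled manifold analysis.
    J  = randn(2, 4);                          % hypothetical DOF-to-outcome mapping
    N  = null(J);                              % orthonormal basis of the null space
    T  = orth(J');                             % basis of the task-relevant space
    dq = randn(4, 100);                        % DOF deviations across 100 trials
    dq = dq - mean(dq, 2);                     % remove the mean configuration
    nullVar = mean(sum((N' * dq).^2, 1)) / size(N, 2); % variance per null dimension
    taskVar = mean(sum((T' * dq).^2, 1)) / size(T, 2); % variance per task dimension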
Addressing this kind of motor learning, which does not focus on the change in performance but rather on the change in movement patterns, is the main focus of the dissertation. Because this process is complex and usually takes a long time (e.g., professional tennis players changing their stroke technique), understanding the computational aspects of motor learning can help in designing a paradigm to change movement patterns (Neilson, 1993; Schaal & Schweighofer, 2005).

Learning with external feedback

This review section is based on the concept that the nervous system receives external feedback about the task as input, and processes this information to update the movement coordination pattern (Cusumano & Cesari, 2006; Müller & Sternad, 2004). The purpose is to understand what types of input signals guide motor learning and how humans learn with different types of feedback. There are three forms of learning paradigms that provide different types of feedback and lead to different learning mechanisms (Wolpert et al., 2011): (1) error-based learning; (2) reinforcement learning; (3) use-dependent learning. First, I introduce the paradigm of each learning form and describe its mechanism in detail. Second, I discuss the question of learning alternative movement patterns under each mechanism. Third, I discuss future directions in motor learning research. I separate the definitions of the learning paradigm and the learning mechanism because of the psychological aspects of human learning (Jordan & Rumelhart, 1992): the learning paradigm is defined by the structure of the feedback, and the learning mechanism is related to how the learner responds to the feedback.

Error-based learning

The paradigm

In the error-based learning paradigm, the environment provides a signed feedback signal to the learner. The learner corrects the movement based on the magnitude and the sign of the feedback: the magnitude shows how much to correct and the sign shows which direction to correct. For example, in a dart-throwing task, the position of the dart shows the direction and the distance to the bullseye; this error information guides the movement coordination pattern on the next trial. The error-based learning paradigm has been studied using adaptation tasks, e.g., reaching in force fields (Bhushan & Shadmehr, 1999), visuomotor rotation (Krakauer et al., 2000), and prism goggle adaptation (Martin et al., 1996). The experimenters introduced perturbations to create errors and observed how participants corrected them.

The mechanism

Providing feedback with the direction and magnitude in the task space triggers an error-based learning mechanism. After receiving the feedback, the learner compares the predicted outcome to the feedback to calculate the error. After several trials of practice, the learner is able to associate the movement patterns with the gradient of the error, i.e., how the error changes after modifying the movement patterns. The learner then modifies movement patterns based on the direction of the gradient to minimize the error trial-by-trial until the error approaches zero (Wolpert et al., 2011). The predicted outcome is generated by the forward internal model (Jordan & Rumelhart, 1992; Wolpert et al., 1995). A good analogy of this process is fitting the parameters of a model with a least-squares algorithm: each new trial is a new data point, and the learner runs the algorithm to update the parameters for the next prediction. Error-based learning is a type of model-based learning (Haith & Krakauer, 2013).
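A toy version of this analogy, using the standard state-space form x(n+1) = A*x(n) + B*e(n) from the adaptation literature, is sketched below; the perturbation size and the rate parameters are assumed values chosen only for illustration.

    % Trial-by-trial error-based learner, analogous to incremental model fitting.
    nTrials = 60;
    perturbation = 45;                         % e.g., degrees of imposed rotation
    A = 0.99;  B = 0.2;                        % assumed retention and learning rates
    x = zeros(1, nTrials);                     % learner's internal compensation
    for n = 1:nTrials-1
        e = perturbation - x(n);               % prediction error on trial n
        x(n+1) = A*x(n) + B*e;                 % retain, then correct a fraction of e
    end

The sequence x rises exponentially toward the perturbation, reproducing the familiar adaptation curve.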
The error-based mechanism is effective but computationally heavy due to the process of finding the gradient and minimizing the error. Error-based learning is not easy to implement when learning a complex movement – e.g., imagine receiving errors for the hip, knee, ankle, and toe when practicing a gymnastics move; the learner may not be able to effectively use this information to update the next movement. Therefore, addressing how error-based learning can be used in complex tasks remains an important issue in computational motor learning.

Experimental results

Error-based learning mechanisms are typically studied using motor adaptation tasks. In this type of task, participants experience perturbations while performing a well-developed motor skill. The goal is to see how participants modify movement coordination to overcome the perturbation. For example, in a visuomotor rotation task (Tseng et al., 2007), participants moved a cursor from a home position to a target position under a visual rotation, i.e., the visual feedback deviated from the actual hand trajectory. To successfully reach under +45 degrees of rotation, the participants needed to aim (the predicted outcome from the forward model) at -45 degrees. The error was calculated between the aiming angle and the visual feedback, and participants adapted the aiming angle from 0 to -45 degrees trial-by-trial. Using a state-space model to describe the trial-by-trial change in aiming angle, the results showed that the adaptation was driven by the prediction error, not the target error. Similar results were shown in other studies (Criscimagna-Hemminger et al., 2010); the error-based mechanism is robust across non-redundant tasks.

Optimization

A question not yet addressed in error-based learning is how humans modify the movement coordination pattern based on the gradient of the error. Studies have shown that motor learning can be seen as an optimization process (Körding & Wolpert, 2004; Selinger et al., 2015), but whether humans follow specific algorithms is still unclear. One possible optimization algorithm in human learning is gradient descent, a heuristic algorithm for finding the movement pattern that changes the performance the most. For example, to learn a tennis forehand stroke, the learner tries different ways to hit the ball over many trials. Suppose the learner then finds that modifying the elbow angle brings the greatest change in the speed of the ball; with this information, the learner focuses on modifying the elbow angle to get the best performance. This concept was shown in a lab experiment in which participants modified the effector with artificial noise in a redundant task (Wolpert et al., 2011). The artificial noise created a large change in the performance; therefore, the participants tried to minimize the errors from this effector. Gradient descent provides a good framework to describe how humans associate the error with the movement pattern. However, searching through all the possible movement patterns is not feasible when there are too many possible solutions. A modified version of gradient descent seems more probable: stochastic gradient descent (SGD). Instead of searching through the entire variable space, SGD only samples a portion of the variables. The learning curve of SGD looks similar to the human motor learning curve: both learning curves show different variability structures in different stages of learning. Motor learning and optimization algorithms share similar research interests in memory, variability, learning rate, and learning steps. It is possible to use ideas from SGD research to show how humans optimize errors during motor learning (Körding & Wolpert, 2004).
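The sketch below caricatures this stochastic-gradient-style search over a single movement parameter; the cost function (which would be unknown to the learner), the probe size, and the learning rate are all invented for the example.

    % Noisy downhill search over one movement parameter (e.g., an elbow angle).
    cost  = @(theta) (theta - 30).^2;          % hidden cost; best value at 30
    theta = 0;  eta = 0.05;  probe = 1;        % start point, learning rate, probe
    for n = 1:200
        d = sign(randn);                       % sample one exploration direction
        g = (cost(theta + probe*d) - cost(theta)) / (probe*d); % noisy gradient
        theta = theta - eta*g;                 % step downhill on the noisy estimate
    end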
Summary

Error-based learning is a simple and effective mechanism for learning new motor skills. It has the advantage of faster learning and the disadvantage of making it harder to reach alternative coordination patterns. It has been shown that humans use this mechanism in many motor adaptation tasks. However, the effectiveness of error-based learning in higher-dimensional tasks is still not very clear; providing errors is problematic when there are too many dimensions that need to be corrected.

Reinforcement learning

The paradigm

The concept of the reinforcement learning paradigm stems from research on operant conditioning in behavioral psychology. The theory focuses on the responses after the organism receives a stimulus from the environment, with two types of stimuli associated with the responses - reinforcement and punishment (Skinner, 1938). Reinforcement is any stimulus that strengthens the responses and punishment is any stimulus that weakens the responses. The organism learns by either maximizing future reinforcement or minimizing future punishment. Later on, researchers proposed different schedules and different types of reinforcement feedback to consolidate behavior (Reynolds, 1961). To make the concept approachable, reward and punishment learning is widely used to explain operant conditioning.

With the development of machine learning and artificial intelligence, computer scientists designed reinforcement learning algorithms based on this concept of interaction between the organism and the environment (Sutton & Barto, 2017). The idea of reinforcement learning is to map the states of the environment to actions that maximize predicted rewards. To construct a reinforcement learning paradigm, the environment provides unsigned feedback to the learner, often an overall evaluation of the performance. The feedback can be as simple as success or failure, or as complex as numbers from abstract math functions (Wolpert et al., 2001). Studies have also provided graded feedback like monetary rewards to study how the brain learns through reward prediction (Galea et al., 2015). The reinforcement learning paradigm is a natural way to learn in the real world: receiving feedback of success or failure reinforces the behavior while the learner explores possible actions.

The mechanism

In contrast to error-based learning, the learner does not receive signed feedback about the task. Therefore, the learner gets no direction for modifying movement coordination from binary feedback. A successful reinforcement learning mechanism includes two aspects - exploration and exploitation. Exploration searches the solution space and exploitation reproduces the coordination once the system has found a good solution. The long-term goal is to maximize the possibility of getting good feedback. One challenge is the ratio between exploitation and exploration. Exploitation ensures that the learner maintains current performance but does not help to improve it. Exploration provides the chance to improve performance, but the learner may not explore in the correct dimension, resulting in worse performance. As a result, reinforcement learning is often slow because of the uncertainty in the exploration, especially when the learner fails to explore the correct motor solutions. Since the forward model is not directly involved in this process, the reinforcement learning mechanism is a type of model-free learning (Haith & Krakauer, 2013).
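A cartoon of this trade-off, with an arbitrary payoff function and exploration size, is sketched below: the learner repeats its best-known action (exploitation) while adding variation around it (exploration).

    % Toy model-free learner balancing exploitation and exploration.
    action = 0;  best = 0;  bestReward = -Inf;
    for n = 1:500
        reward = -(action - 5)^2 + randn;      % unknown payoff, peaking at action = 5
        if reward > bestReward                 % remember the best action found so far
            best = action;  bestReward = reward;
        end
        action = best + 0.5*randn;             % explore around the best-known action
    end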
Exploration vs. noise

Traditionally, movement variability was seen as ‘noise’, and therefore considered detrimental to motor learning. However, because variability can also be due to exploration, there is evidence that variability has a positive role in motor learning (Dhawale et al., 2017; Herzfeld & Shadmehr, 2014; Murillo et al., 2017; Newell & Corcos, 1993; Wu et al., 2014). In this view, the variability is task-related, arising from exploration rather than from noise in the motor system. When the learner actively explores different movement patterns, the variability will be high in both task space and joint space. However, these two sources of variability are hard to separate experimentally. One solution to separating exploration from noise is to assume that participants are rational in the experiment, and only explore when they receive unsatisfactory outcomes. The variability after ‘good’ trials can then be defined as noise, and the learning process can be modeled with a Kalman filter (Therrien et al., 2016). However, this assumption of a ‘rational’ learner becomes more difficult when participants are engaged in high-dimensional tasks, where the distinction between exploration and noise becomes less obvious. Moreover, traditional linearized metrics like variance become more difficult to interpret in higher dimensions and also ignore the temporal component of the variation (Stergiou & Decker, 2011). Therefore, it is critical to study high-dimensional tasks to fully understand the relationship between variability and exploration.

Summary

Reinforcement learning is slower to reach the solution space, but it has the potential to find alternative movement coordination patterns, depending on the amount of exploration. Reinforcement learning provides a flexible paradigm in which experimenters can reduce the information in a high-dimensional joint space to binary feedback, and as a result it provides a tool to study the learning of complex movements.

Use-dependent learning

The paradigm

Use-dependent learning is a type of unsupervised learning in which no external feedback is provided to the learner. The learner relies on internal feedback and experience to learn the way to solve the task.

The mechanism
Use-dependent learning is the phenomenon that humans learn by repeating the same movement even though no external feedback is provided (Krakauer & Mazzoni, 2011; Wolpert et al., 2011). This was shown in a reaching task in which participants reduced variability by practicing the same movement many times, which caused a biased angle shift when reaching to neighboring targets (Schaal et al., 2003). Moreover, since repeating the same task is what learners do to learn a new motor skill, the use-dependent learning mechanism can happen alongside error-based learning (Diedrichsen et al., 2010). Use-dependent learning does not actively help to learn alternative movement coordination patterns. Instead, it hinders people from moving to another movement coordination pattern once a pattern is well developed.

This mechanism can be seen as a form of unsupervised learning (Jordan & Rumelhart, 1992; Todorov & Ghahramani, 2003). When the learner does not receive an external learning signal, all the available information for learning is based on the sensory input. The learner builds a probability model from the sensory input and then selects the most probable pattern as the next coordination. When one specific coordination is reproduced more than the others, it acquires a higher probability, and the learner then has more chances to reproduce that coordination again. This explains why participants repeat the movement coordination pattern from previous trials.

Summary

Use-dependent learning is not a mechanism independent of error-based or reinforcement-based learning; studies show that it happens alongside error-based or reinforcement-based learning (Todorov & Ghahramani, 2003). For example, when the learner experiences a series of trials with zero error, the error-based mechanism predicts little coordination change, because zero error means there is nothing to improve. The use-dependent mechanism then takes over the learning process, and the learner tends to stick to the coordination of the previous trial. Similarly, in reinforcement learning, after a series of good feedback, the learner tends to keep the movement pattern that has the largest possibility of receiving good feedback. Use-dependent learning is thus an important mechanism to consider, but it is often ignored when discussing the learning process.

How to learn alternative movement patterns

This dissertation focuses on how to use a different movement pattern to solve the same task. All three of the aforementioned learning paradigms can potentially guide learners to change movement patterns; however, some may be more feasible than others for certain tasks. Understanding the computational differences between the three paradigms helps in designing a proper learning protocol.

Error-based learning can help the learner approach the solution space, but it makes it hard to move to alternative coordination patterns (Wolpert et al., 2011). This is because all of the possible movement patterns in the solution space have similarly low error. The error gradient directs learners to the solution space, but not within the solution space, so the mechanism for moving to alternative movement patterns is weak. An alternative is to provide the learning signal directly in the joint space instead of the task space. In skill acquisition research, this type of learning is called observational learning: the learner observes a ‘good’ movement pattern and tries to imitate the pattern. Similarly, in robotics research, imitation learning is widely applied to teach motor functions to robots (Schaal et al., 2003). Imitation learning provides the learning signal at the coordination level; therefore, the learner can compare the predicted outcome to the imitation signal to construct an error-based learning paradigm. Imitation learning is faster than other learning algorithms in robotics. If motor learning researchers borrow this concept to design learning signals in joint space, it would become possible to study how the brain handles error feedback in a higher-dimensional space.

Reinforcement learning, on the other hand, places a strong emphasis on exploration. The learner explores actively to maximize future reward (or minimize future punishment). But exploration might be inefficient when the learner has to explore the whole solution space. One way to improve efficiency is to implicitly inform the nervous system which end-effector is important. For example, researchers added artificial noise to one of the end-effectors; consequently, participants modified coordination by minimizing the noise in the noisy end-effector (Mehler et al., 2017; Thorp et al., 2017). In another study, the experimenters designed closed-loop reinforcement feedback in which the current feedback was defined by previous feedback (Therrien et al., 2016). This design ensures that participants keep exploring in the same direction, which supports learning performance. It is possible to reinforce a specific coordination pattern when the feedback is provided at the level of the joint space.
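One simple way such closed-loop feedback could be realized is sketched below, where a trial is reinforced only if it beats the median of the learner’s own recent trials; the window length and the simulated performance series are assumptions for illustration, not the design of the cited study.

    % Closed-loop reinforcement: the criterion tracks the learner's own history.
    window = 10;
    perf   = cumsum(0.1 + 0.5*randn(1, 200));  % stand-in for slowly improving output
    reward = false(size(perf));
    for n = window+1:numel(perf)
        criterion = median(perf(n-window:n-1));% criterion set by recent performance
        reward(n) = perf(n) > criterion;       % reinforce only further improvement
    end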
The idea of using error-based learning to move to the solution space and then using reinforcement learning to change the movement pattern has been proposed as a hypothesis (Wolpert et al., 2011) – however, there is no empirical evidence to show that the combination is effective. Still, of the three learning paradigms, reinforcement learning is the most likely to guide the learner to change movement patterns because of its active exploration. The learner sees the whole movement pattern as one action; the goal is to find the action that brings the highest future rewards. Although there is no empirical evidence that the brain uses this algorithm to learn new movement patterns, reinforcement learning has had great success in learning complex tasks in machine learning. Therefore, the main focus of my work is on reinforcement learning.

Summary

Studying the learning of new movement patterns is a complex procedure with two main steps. First, there is a need to select a method to reduce the dimensions of the motor system and make it easier to understand the change in patterns during learning; these dimensionality reduction methods can be linear or nonlinear. Second, there is a need to understand how different learning mechanisms guide the learner toward new movement patterns. Based on these learning mechanisms, the focus of this dissertation is to use reinforcement learning to alter movement patterns.

CHAPTER 3 LEARNING ALTERNATIVE MOVEMENT PATTERNS USING REINFORCEMENT FEEDBACK IN A REACHING TASK

Abstract

One of the characteristic features of the human motor system is redundancy – i.e., the ability to achieve a given task outcome using multiple movement patterns. However, once participants settle on using a specific movement pattern, the process of learning to use a new alternative movement pattern to perform the same task is still poorly understood. Here, using two experiments, we examined this process of how participants shift from one movement pattern to another under different reinforcement schedules. Participants performed a virtual reaching task, where they moved a cursor to different targets positioned on the screen. Our goal was to make participants use a movement pattern with greater trunk motion, and to this end, we provided reinforcement by making the cursor disappear if the trunk motion during the reach did not cross a specified threshold value. In Experiment 1, we compared two reinforcement schedules in two groups of participants – an abrupt group, where the threshold was introduced immediately at the beginning of practice, and a gradual group, where the threshold was introduced gradually with practice. Results showed that both abrupt and gradual groups were effective in shifting their movement patterns to involve greater trunk motion, but the abrupt group showed greater retention when the reinforcement was removed. In Experiment 2, we examined the basis of this advantage in the abrupt group by using two additional control groups.
Results showed that the advantage of the abrupt group was because of a greater number of practice trials with the desired movement pattern. Overall, these results show that reinforcement can be successfully used to shift movement patterns, which has potential in the rehabilitation of movement disorders.

Introduction

Motor redundancy – the ability to perform a given motor task using different movement patterns – is a hallmark of the human motor system. Given the large number of degrees of freedom (DOF) in the body, how the nervous system learns to harness this redundancy to produce goal-directed movement continues to be a central question in motor learning (Bernstein, 1967). This feature of motor redundancy (also see ‘motor abundance’, Latash, 2012) not only provides flexibility in performing everyday movements, but also underlies the phenomenon of ‘compensatory movements’ that are often observed after neurological injury such as stroke (Cirstea & Levin, 2000; Levin et al., 2009), where an alternative movement pattern is used to compensate for a specific movement deficit – e.g., using trunk motion during reaching to compensate for inadequate elbow extension. Therefore, from both theoretical and applied perspectives, it is critical to understand how participants learn to use motor redundancy.

However, in spite of the extensive focus on redundancy at several levels of the motor system in the literature (Dingwell et al., 2010; Latash, 2012; Müller & Sternad, 2004; Todorov & Jordan, 2002), there has been surprisingly little attention to understanding how participants learn a new alternative movement pattern to perform the same task by reorganizing the DOFs. On the one hand, there is evidence of significant reorganization of DOFs during motor learning of novel tasks (Konczak et al., 2009; Newell, 1986; Vereijken et al., 1992) – however, these studies typically do not address whether the new movement patterns that emerge with learning are ‘alternatives’ to the initial movement pattern (i.e., whether they could be used to produce the same task outcome). On the other hand, several studies have shown that participants can quickly change to alternative movement patterns to maintain the same task outcome (Diedrichsen, 2007; Martin et al., 2011) – however, these studies have typically employed already well-learned tasks such as reaching and force production. Therefore, in order to bridge this gap, we need to understand the acquisition of a new movement pattern which is an ‘alternative’ to a pre-existing movement pattern (i.e., both can produce the same task outcome), and relatedly, how these changes in movement patterns can be elicited using augmented feedback.

Here, we explored the use of reinforcement feedback as a tool, and tested different reinforcement schedules to alter movement patterns with multiple degrees of freedom. Reinforcement feedback, often summarized as ‘reward and punishment’ (Sutton & Barto, 2017; Wolpert et al., 2001), provides scalar feedback about the movement without providing precise error information. The goal of the participants is therefore simply to act in a way that maximizes the reward (or minimizes the punishment). It is important to note that the term ‘reinforcement’ used in this context is related to, but somewhat distinct from, that used in the classic psychology literature (Ferster & Skinner, 1957) in that it can refer to both reward- and punishment-like feedback.
The rationale behind using reinforcement learning is that it is particularly well suited to the learning of novel movement patterns from both theoretical and practical viewpoints. From a theoretical viewpoint, it is different from error-dependent learning (Wolpert et al., 2011) in that there is no indication of the error magnitude, and the learner therefore requires exploration to find the optimal solution. From a practical viewpoint, reinforcement feedback is often much simpler to provide in multi-DOF tasks because it is a simple scalar, whereas error-dependent feedback would have to be multidimensional, since feedback would have to be provided on both the magnitude and direction of the error. However, it is unclear how reinforcement feedback is best utilized to elicit changes to coordination, so we examined how the reinforcement schedule (abrupt vs. gradual) influenced the learning of a new movement pattern. Although several studies on motor adaptation (based on error-dependent learning) show greater retention (as measured by the 'after-effects' of adaptation) with a gradual schedule (Kagerer et al., 1997; Shadmehr et al., 2010), we aimed to examine this relation in the reinforcement learning of a novel movement pattern.

In this study, we examined trunk-arm coordination during reaching. This is a system with redundancy because the position of the hand in space is influenced both by the configuration of the arm and by the configuration of the trunk. Although several previous studies have examined how participants exploit the redundancy in multiple degrees of freedom during reaching, we focused on making participants shift from a 'typical' reaching motion to a 'compensatory' reaching motion. Typical reaching motions in unimpaired individuals involve little to no trunk motion for targets within arm length, whereas 'compensatory' movement patterns (similar to those seen after stroke) involve reaching with greater trunk motion (Cirstea & Levin, 2000). We hypothesized that (i) reinforcement feedback can shift movement patterns during reaching, and (ii) participants would show greater retention of the compensatory movement pattern in the gradual group compared to the abrupt group.

Methods – Experiment 1

Participants

Twenty-four college students (mean age ± SD: 21 ± 1 years, 16 female, 4 left-handed) with no history of neurological or musculoskeletal injury participated in the experiment for extra course credit. Participants provided informed consent and all procedures were approved by the Institutional Review Board at Michigan State University.

Apparatus

Participants sat in front of a desk facing a 50" (127 cm) television screen (Figure 3.1A). A motion capture system was used to record kinematics at a sampling rate of 120 Hz (Motion Analysis Corporation, Santa Rosa, CA). Ten retro-reflective markers were attached to the body – forehead, sternum, and bilaterally at the shoulder, elbow, wrist, and hand (third metacarpophalangeal joint). An additional eleventh marker was placed on the left side of the chest to distinguish the left and right sides of the body.

Task

The task was a virtual reaching task in which participants moved their right hand to move a cursor on a screen to specific targets. A MATLAB program received the (x, y, z) coordinates from the motion capture system, and we mapped the (x, y) coordinates (corresponding to the horizontal plane) of the right hand to a cursor on the screen.
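To make this mapping concrete, a minimal MATLAB sketch of the marker-to-cursor transformation is shown below. This is an illustration only, not the original experiment code: the function name, the home position, and the gain are hypothetical placeholders.

```matlab
% Minimal sketch of the hand-marker-to-cursor mapping (illustrative only;
% the home position and gain are arbitrary values, not the study's).
function cursorXY = handToCursor(markerXYZ, homeXY, gain)
    handXY   = markerXYZ(1:2);           % keep only the horizontal-plane (x, y) coordinates
    cursorXY = gain * (handXY - homeXY); % center on the home position and scale to screen units
end
```

For example, handToCursor([412 355 108], [400 350], 2) would place the cursor at screen coordinates (24, 10).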
Six virtual targets were placed in three different directions (0°, +45°, -45°) and at two distances (near targets: 160 mm; far targets: 320 mm) from the starting position (Figure 3.1B). Targets were fixed at the same distances for all participants (i.e., they were not scaled to participant height or arm length). Participants were instructed to move the cursor from the home position to the target as fast as possible. Each trial ended when the cursor entered the specified target (and stayed inside the target for 500 ms). To keep the participants motivated throughout the experiment, we provided additional feedback – if they reached the target within a criterion movement time (below 800 ms), the target was highlighted in yellow and they heard a rewarding tone; otherwise, the target was highlighted in red.

Figure 3.1. Schematic of experimental setup (A) Participants sat in front of a TV screen and their kinematics were recorded using a motion capture system. (B) Participants performed "virtual reaching" movements where they had to move the cursor on the screen to one of six different targets shown (three near and three far). The cursor position was controlled by the x-y coordinates of the marker on the right hand.

Procedure

There were a total of 7 blocks in the experiment – a pre-test, 5 blocks of training, and a post-test (Figure 3.2C). Participants performed 60 trials in each block (10 trials to each target), for a total of 420 trials in the entire experiment. In the pre-test and post-test blocks, there was no reinforcement feedback, which meant that participants could perform the reaching task with any movement pattern. In the training blocks, we introduced reinforcement feedback to constrain the participants to use a compensatory movement pattern (see the Groups and reinforcement schedules section below for how this was implemented). Each trial consisted of a single outward reach from the home target to one of the six peripheral targets. The order of target presentation was randomized with the constraint that all six targets had to be presented before a target could repeat. Each experiment lasted about 40 minutes.

Providing reinforcement feedback

To make participants learn an alternative movement pattern – i.e., to use more trunk motion during the reach – we provided binary reinforcement feedback by manipulating the vision of the cursor. First, we defined the movement to be compensatory by comparing the trunk-hand distance (i.e., the distance between the sternum and hand markers measured along the horizontal plane) to a specified threshold value (Figure 3.2A). Using this threshold, we manipulated the visual feedback such that the cursor was visible as long as the trunk-hand distance was smaller than the threshold (i.e., greater trunk motion), but the cursor disappeared when the trunk-hand distance was larger than the threshold (Figure 3.2B). Depending on the trunk-hand distance, the cursor could appear and disappear multiple times during a single trial, and a trial could be completed only if the cursor was visible when it was inside the target. The instructions given to the participants were as follows: during the pre-test, participants were only given the instructions related to the task itself (i.e., moving the cursor to the targets as fast as possible).
During the reinforcement blocks, participants were told that the cursor might occasionally disappear, and when this happened, they were simply instructed to explore different ways of moving their body until the cursor reappeared on screen. It is important to note that the instruction to 'explore' was meant only to prevent participants from stopping when the cursor disappeared – no explicit instruction was given about moving the trunk. Prior to the start of the post-test, participants were not explicitly told that the reinforcement feedback had been removed. The disappearance of the cursor served as reinforcement feedback because participants effectively knew when they made an "error" (the error in this context was defined as not using sufficient trunk motion during the reach), but did not have any precise information about how large this error was, or how much more trunk motion was needed to get the cursor back. Participants could also not ignore this feedback because they could not complete the trial without visual feedback of the cursor.

Figure 3.2. Reinforcement feedback and experimental protocol (A) Reinforcement feedback was provided based on the trunk-hand distance to elicit a movement pattern with greater trunk motion. (B) The cursor on the screen was visible as long as the trunk-hand distance was smaller than a specified threshold, but became invisible when the trunk-hand distance exceeded the threshold. (C) Participants performed 60 trials in each block, for a total of 7 blocks. The pre-test and post-test blocks had no reinforcement feedback; blocks 1-5 (B1-B5) had reinforcement feedback.

Groups and reinforcement schedules

We tested two types of reinforcement feedback schedules – abrupt and gradual – by assigning participants to one of two groups (n = 12/group). In the abrupt group, the threshold was immediately reduced from 600 to 360 mm in block 1, and raised back up to 600 mm in the post-test (Figure 3.3A). In contrast, in the gradual group, the threshold was gradually reduced over a set of 60 trials (a decrease of 48 mm every 12 trials) so that the threshold reached 360 mm only in block 2 (Figure 3.3B). The threshold was also raised back up gradually starting in block 5, so that it was back to 600 mm by the end of block 5. The rationale for the specific values of 600 mm and 360 mm was as follows: the 600 mm threshold was essentially large enough to be a "no threshold" condition – i.e., the trunk-hand distance did not cross this threshold during typical reaching movements, and participants could reach all targets using their typical movement pattern. However, when the threshold was set to 360 mm, the feedback depended on whether participants reached for the far or near targets. For the three far targets, typical reaching movements crossed this threshold (i.e., the cursor would disappear), and therefore participants were required to reorganize their movements to these targets. For the three near targets, however, the threshold was still large enough that participants could use their typical reaching movements, and therefore no reorganization was required. It is important to note that because we used fixed target distances, the thresholds mentioned above were also fixed (i.e., not body-scaled), but they were set so that most participants would have to reorganize their movement pattern when the threshold was set to 360 mm.
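The reinforcement rule and the two threshold schedules can be summarized in a short MATLAB sketch. This is a minimal illustration built from the values given in the text (60 trials per block, 48 mm steps every 12 trials), not the original experiment code.

```matlab
% Binary reinforcement rule: the cursor is visible only while the
% trunk-hand distance is below the current threshold (distances in mm).
cursorVisible = @(trunkHandDist, threshold) trunkHandDist < threshold;

% Abrupt schedule: full 360 mm threshold for all 300 training trials (B1-B5).
abruptThreshold = 360 * ones(1, 300);

% Gradual schedule: 600 -> 360 mm in 48 mm steps every 12 trials (block 1),
% full threshold in blocks 2-4, then 360 -> 600 mm over block 5.
rampDown = repelem(600 - 48*(1:5), 12);   % 60 trials ending at 360 mm
rampUp   = repelem(360 + 48*(1:5), 12);   % 60 trials ending back at 600 mm
gradualThreshold = [rampDown, 360*ones(1, 180), rampUp];
```

Under this sketch, the gradual group spends 180 trials at the full 360 mm threshold versus 300 trials for the abrupt group, which is the difference Experiment 2 was designed to probe.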
Data Analysis

Trunk-hand distance

To examine changes in trunk-arm coordination, we computed the distance between the sternum and the right hand at the instant the participants reached the target. This was the same variable that was used to control the reinforcement feedback. Since the hand had to travel the same distance, a longer trunk-hand distance meant less trunk movement, and a shorter distance meant more trunk movement.

Path length

To examine the temporal coordination between the trunk and the hand (i.e., whether the trunk and the hand moved simultaneously or sequentially), we used the path length. First, we plotted the hand displacement (projected onto the direction of the target) against the sternum displacement (also projected onto the direction of the target). The path length was then normalized by dividing the actual path length by the shortest length between the start and end of the reach. Higher values indicated greater exploration, whereas lower values (closer to 1) represented a coordinated strategy with simultaneous motion of the trunk and hand.

Statistical Analysis

Trunk-hand distance

Because our threshold required compensatory movements to the far targets, but not the near targets, we separated the analyses of the far and near targets. For each of these, we were specifically interested in three different comparisons. First, to examine if reinforcement feedback had an effect on coordination (i.e., the manipulation check), we compared the pre-test block and the first 'full threshold' block where the threshold was set at 360 mm (i.e., block 1 for the abrupt group, and block 2 for the gradual group). Second, to compare how participants adapted to the reinforcement feedback, we compared the first full threshold block (block 1 for abrupt, block 2 for gradual) to the last full threshold block (block 5 for abrupt, block 4 for gradual). Third, to examine retention of the pattern after the threshold was removed, we compared the pre-test block to the post-test block. The analyses used a block (2) x group (2) repeated-measures ANOVA with block as the within-subjects factor and group as the between-subjects factor. The significance level was set at .05.

Path length

Because the path length was a measure of exploration (i.e., how participants reorganized the motion of their DOFs), we were primarily interested in the phase when the reinforcement feedback was on. We compared the path length between the first full threshold block and the last full threshold block to show how participants developed strategies to perform the task. We ran a block (2) x group (2) repeated-measures ANOVA with block as the within-subjects factor and group as the between-subjects factor. The significance level was set at .05.

Results

Trunk-hand distance – Far targets

The trunk-hand distance for the far targets is shown for the abrupt (Figure 3.3C) and the gradual group (Figure 3.3D). The average change between the groups is shown in Figure 3.4A.

Effect of reinforcement feedback. There was a significant decrease in the trunk-hand distance from the pre-test to the first full threshold block (main effect of block: F(1,22) = 278.09, p < .001). In addition, the abrupt group showed a lower trunk-hand distance than the gradual group (main effect of group: F(1,22) = 5.17, p = .033). There was no significant Block x Group interaction (F(1,22) = 0.92, p = .347).

Adaptation to reinforcement.
There was a significant decrease in the trunk-hand distance from the first full threshold block to the last full threshold block (main effect of block: F(1,22) = 9.62, p = .005). In addition, the abrupt group showed a lower trunk-hand distance (main effect of group: F(1,22) = 7.65, p = .011). There was no significant Block x Group interaction (F(1,22) = 4.17, p = .053).

Post-test retention. There was a significant decrease in the trunk-hand distance from the pre-test block to the post-test block (main effect of block: F(1,22) = 44.40, p < .001). In addition, the abrupt group showed a lower trunk-hand distance (main effect of group: F(1,22) = 8.62, p = .007), which was qualified by a significant Block x Group interaction (F(1,22) = 6.59, p = .018). Post hoc analysis showed that there was no difference between the groups in the pre-test (p = .467), but the abrupt group had a smaller trunk-hand distance in the post-test (p = .023).

Trunk-hand distance – Near targets

The trunk-hand distance for the near targets is shown for the abrupt (Figure 3.3E) and the gradual group (Figure 3.3F). The average change between the groups is shown in Figure 3.4B. For the near targets, there was no 'requirement' to change movement coordination, since the targets were close enough that the trunk-hand distance would be under the 360 mm threshold. This is also seen in Figure 3.4C, which shows the proportion of time that the cursor was visible during the trial – there was an initial drop in the proportion for far targets in block 1 (indicating that the trunk-hand distance had exceeded the threshold), whereas the cursor was almost always visible for the near targets.

Effect of reinforcement feedback. There was a significant decrease in the trunk-hand distance from the pre-test to the first full threshold block (main effect of block: F(1,22) = 51.04, p < .001). There was no main effect of group (F(1,22) = 4.02, p = .058), and no Block x Group interaction (F(1,22) = 0.15, p = .702).

Adaptation to reinforcement. There was no main effect of block (F(1,22) = 0.12, p = .735), no main effect of group (F(1,22) = 2.54, p = .125), and no Group x Block interaction (F(1,22) = 0.82, p = .374).

Post-test retention. There was a significant decrease in the trunk-hand distance from the pre-test block to the post-test block (main effect of block: F(1,22) = 18.16, p < .001). In addition, the abrupt group showed a smaller trunk-hand distance (main effect of group: F(1,22) = 5.73, p = .026). There was no Block x Group interaction (F(1,22) = 1.38, p = .252).

Figure 3.3. Variation of thresholds and actual trunk-hand distances (A) Threshold of the abrupt group. (B) Threshold of the gradual group. (C) Abrupt group at far targets. (D) Gradual group at far targets. (E) Abrupt group at near targets. (F) Gradual group at near targets. In panels (C-F), each line represents a single participant in that group. There was a marked decrease in the trunk-hand distance when the reinforcement feedback was provided (B1-B5).

Figure 3.4. Mean trunk-hand distance in far and near targets (A) Mean trunk-hand distance of all participants in the abrupt and gradual groups when reaching to far targets. (B) Mean trunk-hand distance of all participants in the abrupt and gradual groups when reaching to near targets. (C)(D) Mean proportion of time that the cursor was visible when reaching to far and near targets, respectively. Error bars represent 1 SEM (between-participant). The gradual group showed poorer retention of the new movement pattern compared to the abrupt group in the post-test.
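Before turning to the path-length results, the normalized path-length measure defined in the Data Analysis section can be illustrated with a short MATLAB sketch. The function and variable names here are ours, not from the original analysis code; the only assumption is that the hand and sternum displacements have already been projected onto the target direction, as described above.

```matlab
% Normalized path length in the hand-trunk plane: values near 1 indicate
% simultaneous (straight-line) motion of hand and trunk, larger values
% indicate more exploratory, sequential movement.
function npl = normalizedPathLength(handProj, trunkProj)
    % handProj, trunkProj: N x 1 displacements projected onto the target direction
    pts   = [handProj(:), trunkProj(:)];
    steps = diff(pts, 1, 1);                          % successive displacements
    actualLength   = sum(sqrt(sum(steps.^2, 2)));     % traversed path length
    shortestLength = norm(pts(end, :) - pts(1, :));   % straight line from start to end
    npl = actualLength / shortestLength;
end
```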
Path Length – Far targets

A sample trial from a single participant in each of four blocks of learning is shown in Figures 3.5A-D. The path length for the far targets is shown in Figure 3.5E.

Adaptation to reinforcement. There was a significant decrease in path length from the first full threshold block to the last full threshold block (main effect of block: F(1,22) = 17.71, p < .001). There was no main effect of group (F(1,22) = 0.09, p = .771), but there was a significant interaction between group and block (F(1,22) = 4.87, p = .038). Post hoc tests showed that the groups did not differ in the first full threshold block (p = .200), but the abrupt group had a smaller path length in the last full threshold block (p = .015).

Path Length – Near targets

The path length for the near targets is shown in Figure 3.5F.

Adaptation to reinforcement. There was no main effect of block (F(1,22) = 2.13, p = .159), no main effect of group (F(1,22) = 0.04, p = .845), and no interaction effect (F(1,22) = 0.01, p = .937).

Figure 3.5. Schematic of path length (A) Pre-test block – the participant showed a 'typical' reaching motion with very little trunk movement. (B) First block with full threshold – the participant showed an exploratory strategy with an increase in trunk movement. (C) Last block with full threshold – the participant showed lower exploration and simultaneous movement of hand and trunk. (D) Post-test block – without the reinforcement feedback, the participant still reached the target with more trunk movement compared to the pre-test block. (E) Normalized path length at the far targets for the abrupt and gradual groups. (F) Normalized path length at the near targets for the abrupt and gradual groups. Error bars in E and F represent one SEM (between-participant). Both the abrupt and gradual groups show initial increases in path length during reinforcement feedback (indicating greater exploration), which decrease with further practice (indicating lesser exploration and simultaneous movement of the hand and trunk).

Discussion of Experiment 1 and rationale for Experiment 2

The results of Experiment 1 showed that reinforcement feedback helped to shift coordination in the reaching movement, and that the abrupt group had greater retention of trunk movement during the post-test. This was seen both in the trunk-hand distance in the post-test (both far and near targets) and in the smaller exploration index in the last full threshold block (indicating a more coordinated strategy). We examined two hypotheses that could potentially explain the greater retention of the new movement pattern in the abrupt group in this task. First, we hypothesized that the abrupt group had greater retention because it had more full threshold trials with reinforcement feedback compared to the gradual group (300 trials compared to 180 trials). Second, we hypothesized that the gradual 'ramp up' phase (trials 300-360) of the gradual group resulted in participants becoming aware that the threshold had been removed prior to the post-test, which could have caused them to revert to the typical reaching motion faster. We added two new groups in Experiment 2 to test these hypotheses – (a) a 'short abrupt' group that had the same number of full threshold trials as the gradual group (i.e., 180 trials), and (b) a 'gradual with abrupt return' group where the threshold was introduced gradually, but removed abruptly (Figure 3.6B).

Methods – Experiment 2

All procedures were identical to those described in Experiment 1.
We recruited 24 additional healthy college students (mean age ± SD: 21 ± 1 years, 14 female, 1 left-handed). None of the participants in Experiment 2 were part of Experiment 1. Participants were randomly assigned to one of two groups (n = 12/group): a short abrupt group (Figure 3.6A) or a gradual with abrupt return group (Figure 3.6B).

Statistical Analysis

We compared each of the two new groups to the gradual group separately to examine differences in trunk-hand distance and path length. A group (2) x block (2) mixed-design ANOVA was run for each of the two comparisons (short abrupt vs. gradual, and gradual with abrupt return vs. gradual). Because the two groups in Experiment 2 were designed to examine hypotheses related to the greater retention of the abrupt group in the post-test, we were only interested in comparing the groups at the post-test (i.e., post-test retention).

Results

Trunk-hand distance – Far targets

The trunk-hand distance for the far targets is shown for the short abrupt (Figure 3.6C) and the gradual with abrupt return group (Figure 3.6D). The average change between the groups is shown in Figure 3.7A.

Post-test retention (Gradual vs. Short abrupt). There was a significant decrease in the trunk-hand distance from the pre-test block to the post-test block (main effect of block: F(1,22) = 15.26, p < .001). However, there was no main effect of group (F(1,22) = 0.02, p = .877), and no Block x Group interaction (F(1,22) = 0.01, p = .934).

Post-test retention (Gradual vs. Gradual with abrupt return). There was a significant reduction in the trunk-hand distance from the pre-test block to the post-test block (main effect of block: F(1,22) = 16.38, p < .001). There was no main effect of group (F(1,22) = 0.56, p = .462), and no Block x Group interaction (F(1,22) = 0.63, p = .437).

Figure 3.6. Variation of thresholds and actual trunk-hand distances (A) Threshold of the short abrupt group. (B) Threshold of the gradual with abrupt return group. (C) Short abrupt group at far targets. (D) Gradual with abrupt return group at far targets. (E) Short abrupt group at near targets. (F) Gradual with abrupt return group at near targets. In panels (C-F), each line represents a single participant in that group. Similar to Experiment 1, both groups show decreases in trunk-hand distance when reinforcement feedback is provided.

Figure 3.7. Group mean trunk-hand distance (A) Mean trunk-hand distance of all participants in the gradual group (from Experiment 1), the short abrupt group, and the gradual with abrupt return group (from Experiment 2) when reaching to far targets. (B) Mean trunk-hand distance of all participants in the gradual group (from Experiment 1), the short abrupt group, and the gradual with abrupt return group (from Experiment 2) when reaching to near targets. Error bars represent 1 SEM (between-participant). There was no significant difference between either the gradual and the short abrupt group, or the gradual and the gradual with abrupt return group in the post-test.

Trunk-hand distance – Near targets

The trunk-hand distance for the near targets is shown for the short abrupt (Figure 3.6E) and the gradual with abrupt return group (Figure 3.6F). The average change between the groups is shown in Figure 3.7B.

Post-test retention (Gradual vs. Short abrupt). There was a significant decrease in the trunk-hand distance from the pre-test block to the post-test block (main effect of block: F(1,22) = 9.25, p = .005).
However, there was no main effect of group (F(1,22) = 0.04, p = .847), and no Block x Group interaction (F(1,22) = 0.11, p = .739).

Post-test retention (Gradual vs. Gradual with abrupt return). There was a significant reduction in the trunk-hand distance from the pre-test block to the post-test block (main effect of block: F(1,22) = 7.56, p = .011). However, there was no main effect of group (F(1,22) = 1.80, p = .192), and no Block x Group interaction (F(1,22) = 0.08, p = .781).

Discussion of Experiment 2

The results of Experiment 2 showed that (i) the short abrupt group did not show greater retention compared to the gradual group, and (ii) the gradual with abrupt return group did not show greater retention compared to the gradual group. These results support the idea that, rather than the way the threshold was introduced, it was the number of trials practiced with reinforcement feedback of the desired movement pattern (which was determined by the number of full threshold trials in the current paradigm) that determined the retention of the new movement pattern.

General Discussion

We examined the use of reinforcement to shift participants from using one coordination solution to another in a redundant task. Reinforcement in our task was provided by removing vision of the cursor if there was insufficient trunk motion during the reach, as defined by a threshold value. We examined two schedules of introducing the threshold – either abruptly or gradually – to examine if there were any differences in retention of the new movement pattern after the threshold was removed. Regarding our first hypothesis, the results of Experiment 1 showed that both gradual and abrupt schedules of reinforcement were successful in creating changes in coordination from pre-test to post-test. Although the post-test showed signs of a return toward the pre-test movement pattern, the coordination was still significantly different from the pre-test movement pattern even after 60 trials of reaching. Regarding our second hypothesis, we found that, contrary to our expectation from adaptation studies, the abrupt group showed greater retention of the movement pattern compared to the gradual group. This was seen not only in terms of greater trunk motion (as indicated by the smaller trunk-hand distance), but also in a more simultaneous, coordinated movement between the trunk and the hand (as indicated by the path length). Experiment 2 further showed that the greater retention of the abrupt group was likely due to the greater number of trials practiced at full threshold, and not due to any specific differences between introducing the threshold in an abrupt or gradual manner. It is important to note that even though we statistically compared two groups from Experiment 2 to a group from Experiment 1 (i.e., they were not prospectively randomized), we are confident in the interpretation of these tests because the participants in these groups were highly similar both from a demographic standpoint and in their performance on the pre-test. These results support the idea of using reinforcement feedback to elicit changes in coordination. Previous studies on reinforcement learning have shown differences in motor adaptation tasks (Izawa & Shadmehr, 2011; Nikooyan & Ahmed, 2015). However, as mentioned in the Introduction, these paradigms generally do not deal with the issue of learning a novel movement pattern requiring the reorganization of degrees of freedom.
In contrast to the smooth, exponential changes with practice that are typical of error-driven learning, reinforcement learning resulted in exploration where participants had to 'break out' of the typical movement pattern and employ search strategies to discover a new solution. Once this solution was learned, participants still refined it by decreasing the path length to produce a more coordinated movement between the trunk and the hand. Our results, along with other recent studies (Mehler et al., 2017; Thorp et al., 2017), show the potential of reinforcement feedback in understanding learning and coordination in systems with motor redundancy, which could also be extended to the rehabilitation of compensatory movements in movement disorders (Michaelsen et al., 2006; Ranganathan et al., 2017). When comparing abrupt and gradual schedules, we found results that run counter to the majority of the literature on motor adaptation (Huang & Shadmehr, 2009; Kagerer et al., 1997; Klassen et al., 2005). In our case, the abrupt schedule produced greater retention after the feedback was removed compared to the gradual schedule. Closer examination of this result in Experiment 2 suggested that the abrupt group had better performance not because of any conscious awareness of the feedback schedule, but because they had greater amounts of specific practice with the required movement pattern (which was determined by the number of full threshold trials). There are two critical differences between the previous studies and the current one. First, the advantage of the gradual group in adaptation studies is attributed to the size of the error signal (Criscimagna-Hemminger et al., 2010). It is hypothesized that smaller errors affect the credit assignment problem, resulting in attributing more error to our body than to the environment (Berniker & Kording, 2008). In our study, however, we used reinforcement feedback where participants only received binary feedback with no information about the magnitude of the error. This difference between the experimental paradigms could explain why the gradual group did not show any advantages, and why the number of trials at full threshold instead became the more critical factor. Second, the abrupt group showed a large amount of exploration for a short period of time very early in learning, which was followed by a period of stabilization of the new movement pattern (as indicated by the decrease in path length). In contrast, the gradual group (where the threshold was changed continuously) had to explore by a smaller amount, but over a longer period. This prolonged exploration meant that they had less time to stabilize the new movement pattern, which potentially affected retention. Consistent with this hypothesis, when the number of trials was reduced in the abrupt condition in Experiment 2 (for the short abrupt group), there was poorer retention. Further experiments are warranted to fully examine this hypothesis of how gradual vs. abrupt schedules affect the stabilization of movement patterns in reinforcement learning. Finally, we also investigated the generalization of the new movement pattern by examining the movements to the near targets. As mentioned earlier, the near targets were positioned close enough that the reinforcement feedback had no influence when reaching to these targets (i.e., the cursor never disappeared when reaching to these targets). Therefore, there was no requirement for participants to change their movement pattern when reaching to these targets.
Yet, when practicing these near targets in combination with the far targets, participants changed their coordination even for the near targets by using greater trunk motion. These results clearly show that participants were "reusing" the same movement pattern for all targets, even though this is likely less efficient from a metabolic energy standpoint (given that the mass of the trunk is much larger than that of the arm). These results support previous studies which found that, in tasks requiring the learning of novel movement patterns, the computational cost of changing a solution may be more critical than the metabolic cost in determining the movement pattern (Ganesh et al., 2010; Rosenbaum & Jorgensen, 1992). In summary, we found that reinforcement feedback was capable of causing a change in the movement pattern used to perform a redundant task. Although participants moved back closer to the original movement pattern once the feedback was removed (indicating that the effects seen here were more reflective of short-term adaptation than long-term learning), we were able to identify clear changes that persisted over many trials and across different schedules of training. Understanding how the nervous system organizes and reorganizes movement coordination still remains a significant challenge in motor learning research, and the current results highlight the potential of using reinforcement feedback for changing movement patterns in both healthy and neurologically impaired populations.

CHAPTER 4 SHAPING REINFORCEMENT FEEDBACK TO INDUCE CHANGES IN MOVEMENT PATTERNS IN A THROWING TASK

Abstract

Reinforcement learning has been used to facilitate motor learning, but its applicability in multiple degree of freedom (DOF) tasks with redundancy is not fully understood. A critical issue that arises in such tasks is how to use reinforcement to guide exploration in a high-dimensional space toward the desired movement patterns. Here, using two experiments, we examined the use of reinforcement feedback and different shaping techniques to change movement patterns in a multi-DOF task with redundancy. Seventy college-aged participants performed a virtual throwing task where the goal was to throw a ball toward a target, and we introduced redundancy by making the ball velocity a linear combination of the trunk and hand velocities. The goal of the participants was to use reinforcement feedback, provided as a score after each trial, to shift their movement pattern toward a higher trunk velocity. We used different shaping techniques to manipulate the threshold of trunk velocity below which participants received 'punishment' (i.e., a bad score) and examined the change in trunk velocity with practice. In experiment 1, we compared three shaping techniques (abrupt, gradual, and adaptive) that manipulated the threshold in different ways. In experiment 2, we compared four adaptive shaping techniques to further examine which characteristics of adaptive feedback were most effective. We found that: (i) reinforcement feedback in a multi-DOF task was less successful in changing movement patterns compared to previous studies using single-DOF tasks, and (ii) adaptive shaping techniques that incorporated the participant's current level of performance were more successful in changing movement patterns. These results highlight the potential of adaptive shaping techniques for learning complex motor skills.
Introduction

Motor redundancy (or alternatively motor abundance (Latash, 2012)), which arises due to the large number of degrees of freedom (DOF) in the human body, allows humans to perform most motor tasks with multiple movement patterns (Bernstein, 1967). This redundancy is present at several levels of analysis (trajectories, joints, muscles, etc.), and understanding the organization of these DOFs with practice has been one of the central questions in motor learning (Martin et al., 2011; Müller & Sternad, 2004; Newell, 1986; Newell et al., 2003; Newell & Vaillancourt, 2001; J. P. Scholz & Schöner, 1999; Vereijken et al., 1992; Yang & Scholz, 2005). The phenomenon of motor redundancy raises an important issue from a motor learning standpoint. Specifically, although most studies of motor learning focus on a change in task performance (e.g., increasing speed or decreasing variability), the fact that movement patterns and task outcomes do not have a one-to-one relation means that motor learning could involve changes in the movement pattern without creating associated changes in the movement outcome (Latash et al., 2002). This aspect of motor learning is especially critical in situations where the movement pattern (rather than task performance) is the primary target of the intervention. For example, a golfer may want to alter their swing to reduce the risk of injury even if it does not result in any benefits in terms of their golf score. Similarly, in rehabilitation, stroke survivors are often guided away from the use of atypical compensatory movement patterns because these may impact long-term rehabilitation (Levin et al., 2009). Thus, the question of how to effectively create such changes in movement patterns in complex tasks with multi-DOF movements is still poorly understood. One of the challenges in using standard 'prescriptive' motor learning techniques (such as feedback and instructions) to create changes in multi-DOF movements is the problem of which information to specify. Because complex tasks involve the coordination of several DOFs, it is a challenge to get the learner to understand how to change each DOF simultaneously. For example, even a simple reaching motion involves motion at 7 DOFs (3 at the shoulder, 2 at the wrist, 1 at the elbow, and 1 at the forearm). Therefore, trying to provide precise feedback or instructions on how to change each of these 7 DOFs after a given trial is likely not effective because it creates an 'information overload' for the learner (Wulf & Weigelt, 1997). Other techniques, such as visual demonstrations, help participants get a general idea of the intended movement pattern, but again, correcting specific errors at each DOF is difficult with such approaches. A potential solution to this problem is reinforcement feedback. Reinforcement feedback is a signal that evaluates the whole performance and then provides a simplified signal (either binary or scalar) to the learner (Wolpert et al., 2001). Because the feedback is not provided at the level of each DOF, participants typically engage in 'exploration' to learn alternative movement patterns (Dhawale et al., 2017). By exploring different movement patterns, participants can eventually discover the desired movement pattern that yields good feedback. Several studies of motor learning have shown that reinforcement feedback can be used to effectively modify movement patterns (Chen et al., 2018; Gläscher et al., 2010; Izawa & Shadmehr, 2011; Therrien et al., 2016).
However, a key limitation of these studies is that they used simple tasks with limited or no redundancy. In redundant tasks with multiple DOFs, the exploration process is more complex because of the need to explore a higher-dimensional space. Moreover, because these tasks require coordination of multiple degrees of freedom, there may be a greater possibility of 'preferred' coordination patterns that are more resistant to change (Schöner & Kelso, 1988). Therefore, there is a need to examine 'adaptive shaping techniques' that can be used to guide exploration more effectively. Prior work has shown that adaptive reinforcement can be successfully used to modify movement patterns in low-dimensional tasks (Therrien et al., 2016). Here, we examine the use of different shaping techniques (Skinner, 1938) to change movement patterns in multi-DOF tasks. In this study, we performed two experiments in which we used reinforcement feedback to guide participants to change to an alternative movement pattern in a multi-DOF task. The task was a virtual throwing task requiring coordination of the trunk and hand velocities. Typically, participants perform this task primarily with the hand (i.e., large hand velocity and small trunk velocity) – so our goal was to examine if we could get participants to increase their trunk velocity when throwing the ball by providing reinforcement feedback based on a threshold for the trunk velocity. We then compared the change in movement patterns between different shaping techniques that modified how this threshold changed across practice. In experiment 1, we compared three shaping techniques (abrupt, gradual, and adaptive). While the abrupt and gradual groups changed the threshold in an 'open-loop' manner (i.e., in the same way regardless of the participant's performance), the adaptive schedule changed the threshold based on the participant's prior performance (Therrien et al., 2016). In experiment 2, we compared four adaptive shaping techniques to further examine which characteristics of an adaptive schedule were most effective in facilitating a greater change of movement patterns.

Methods: experiment 1

Participants

Thirty healthy college students (age range: 18-21 years, 20 females) with no upper body injuries were recruited to participate in the experiment for extra course credit. All participants provided informed written consent and all procedures were approved by the Institutional Review Board at Michigan State University.

Apparatus

Participants sat comfortably with both arms resting on the desk, facing a 50" (127 cm) television screen. The virtual throwing task was implemented with a 120 Hz motion capture system (Motion Analysis Corporation, Santa Rosa, CA) and a MATLAB program. Participants wore 11 retro-reflective markers (forehead, sternum, right chest, and bilaterally at the shoulder, elbow, wrist, and the third metacarpophalangeal joint) (Figure 4.1).

Figure 4.1. Experimental setup. Participants wore 11 retro-reflective markers. The ball on the screen was controlled by the right hand marker. The task was to glide the right hand through the line and maximize the score. There was no visual feedback on the screen except the score.

Task

Participants were instructed to glide their right hand on the desk to throw a virtual ball to a target.
The MATLAB program received the (x, y, z) coordinates of the retro-reflective markers from the motion capture system and mapped the (x, y) coordinates (corresponding to the plane of the desk) of the right metacarpophalangeal marker to a ball on the screen. Participants controlled the ball with the right hand and glided the right hand through a line on the screen. When the right hand passed this line, the ball was 'released' and the program computed the instantaneous y-direction velocity (moving forward toward the screen) of the right metacarpophalangeal marker (to approximate hand velocity) and of the sternum marker (to approximate trunk velocity). The ball velocity was then calculated based on the following equation:

$$v_{ball} = 4 \cdot v_{trunk} + v_{hand}$$

The goal for the participants was to throw the ball with a velocity of 1100 mm/s. This task is redundant because the same ball velocity can be achieved with different combinations of hand and trunk velocities. The trunk velocity was scaled up by a factor of 4, based on pilot testing, to make the variation in trunk velocity comparable to the variation in hand velocity.

Score feedback

Participants did not see the target displayed on the screen; instead, the only feedback they received after each throw was a score that reflected their performance. The score was calculated based on the following equation (Figure 4.2A):

$$\mathrm{score} = 100 - \frac{90}{1 + 5e^{-0.004x}}$$

where x is the absolute error in the ball velocity (i.e., the absolute difference between the actual ball velocity and the target velocity of 1100 mm/s). The maximum score was 100 and the minimum was 10. Participants were not informed about how the score was computed. They were only instructed to try different movement patterns to maximize the score.

Figure 4.2. Mechanism of providing reinforcement feedback (A) Each blue point represents a combination of trunk and hand velocity. The Euclidean distance between the point and the solution manifold was mapped through the score function to calculate the score. The minimum score is 0, and the maximum score is 100. Different points could have the same score. (B) Reinforcement feedback was provided by adding a punishment zone to the state space. Participants received a score of 0 in the punishment zone even if they were on the solution manifold. The size of the punishment zone was set by the trunk velocity threshold.

Providing reinforcement feedback

The goal was to examine how reinforcement feedback could be used to shift participants toward using a higher trunk velocity when throwing the ball. The desired trunk velocity was set at 100 mm/s based on pilot testing – this value was high enough that participants could perceive the change as a different coordination pattern, but not so high as to be uncomfortable. Reinforcement was provided based on a trunk velocity threshold (Figure 4.2B), and how this threshold changed with practice differed based on the group to which participants were assigned. When the trunk velocity was larger than the threshold, the score was calculated from the equation above. When the trunk velocity was smaller than the threshold, the score was zero. It is important to note that because the score also reflected performance on the task (i.e., how close the throwing velocity was to 1100 mm/s), we used the '0-score' to signal to participants that they had done something incorrectly (the minimum score otherwise was 10). The reinforcement feedback acted as punishment – i.e., we expected participants to increase their trunk velocity to avoid the 0 scores (Figure 4.2B).
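Putting these pieces together, a minimal MATLAB sketch of the per-trial feedback computation might look as follows. This is an illustration, not the original program: the function name is ours, and the score equation uses the reconstruction given above (the exact constants in the published score function could not be fully recovered from the source).

```matlab
% Score for one throw, given raw trunk and hand velocities (mm/s) and the
% current trunk velocity threshold; sketch only, not the original program.
function s = throwScore(vTrunk, vHand, threshold)
    vBall = 4*vTrunk + vHand;               % redundant combination of velocities
    x = abs(vBall - 1100);                  % absolute error from the target velocity
    s = 100 - 90 / (1 + 5*exp(-0.004*x));   % graded score, decreasing with error
    if vTrunk < threshold                   % punishment zone: too little trunk motion
        s = 0;
    end
end
```

For example, with the threshold at 100 mm/s, throwScore(50, 900, 100) returns 0 regardless of the throw's accuracy, whereas throwScore(120, 620, 100) is scored purely on how close the ball velocity is to 1100 mm/s.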
Procedures

Each participant performed 8 blocks of 30 throws, for a total of 240 trials. Participants started with a pre-test block without reinforcement feedback (30 trials). Subsequently, they practiced 6 training blocks with reinforcement feedback (6 x 30 = 180 trials). Lastly, they performed a post-test block without reinforcement feedback (30 trials). In the pre-test and post-test, participants received normal score feedback. In the training blocks, we added one additional constraint to the score function to construct the reinforcement feedback. Participants were not informed about the onset and offset of the reinforcement feedback.

Groups

Reinforcement feedback was introduced in the training blocks with three shaping methods that varied in how the trunk velocity threshold was set across practice: abrupt, gradual, and adaptive. In the abrupt group, the threshold was increased to 100 mm/s in the first training block and remained constant until the last training block. In the gradual group, the threshold started from 0 mm/s in the first training block and increased linearly to 100 mm/s by the last training block. In the adaptive group, the threshold was calculated from the participant's past performance: we took the average trunk velocity of the previous 6 trials to set the current threshold. Furthermore, to avoid overshooting the desired trunk velocity, the program capped the threshold at 100 mm/s whenever the average trunk velocity of the previous 6 trials exceeded 100 mm/s (Figure 4.3).

Figure 4.3. Design of shaping methods. The abrupt group had a single large change in the threshold after the pre-test, after which the threshold was fixed for the rest of training. The gradual group had a positive linear change in the threshold trial-by-trial during training. The rate of change of the threshold in the adaptive group was set by the average of the participant's previous 6 trials; each participant in the adaptive group therefore had their own unique schedule depending on their performance.

Data analysis

Score

The score feedback at the end of each trial was our measure of task performance, with higher scores indicating better task performance (i.e., the ball velocity was closer to 1100 mm/s). In the pre-test and post-test, there was no reinforcement feedback (the maximum score was 100 and the minimum was 10). In the training blocks (B1 to B6), the score was 0 on trials where participants received punishment; on trials without punishment, the maximum score was 100 and the minimum was 10.

Punishment rate per block

The punishment rate in a block was the number of trials in which the participant was punished (i.e., received a score of zero because the trunk velocity was under the threshold) divided by the total number of trials in that block. A reduction in the punishment rate indicated that participants increased their trunk velocity above the threshold. Because the pre-test and post-test did not have reinforcement, the punishment rate was computed only for the training blocks.

Trunk and hand velocity

The trunk and hand velocities were measured at the instant the ball was released. Because the trunk velocity was multiplied by four to compute the ball velocity, we report this value as the "standardized trunk velocity". All trunk velocities in the results section are standardized trunk velocities.

Task and null space variability

The exploration during the task was captured using movement variability.
In a redundant system, movement variability can be separated into task space variability and null space variability (Latash et al., 2002; Mussa-Ivaldi et al., 2011; Ranganathan et al., 2014; Scholz & Schöner, 1999). The task space is the direction orthogonal to the solution space, where variability is detrimental to task performance. On the other hand, the null space is along the direction of the solution space, where variability does not affect task performance (Figure 4.4).

Figure 4.4. Definition of task space and null space. The task space (purple arrow) is orthogonal to the solution manifold. The null space (green arrow) is parallel to the solution manifold. To calculate the variability, the data were projected onto each space, and the variance of the projected points was computed in each space.

Statistical analysis

We had two main research questions of interest: (i) in the training phase, how did reinforcement feedback affect movement in the different groups, and (ii) in the test phase, was the effect of reinforcement feedback present after the feedback was removed? To address these questions, we separated our analysis into the training and test phases. We used R (3.5.1) to run RM-ANOVAs for all the dependent variables, and the alpha level was set at .05.

Training phase

First, we compared the change between the first block with reinforcement feedback (B1) and the last block with reinforcement feedback (B6) to investigate how participants reacted to the reinforcement feedback. We ran a block (2) x group (3) RM-ANOVA with block as the within-subject factor and group as the between-subject factor.

Test phase

Second, we compared the change between the pre-test and post-test to investigate the overall effect of practicing with the different shaping methods. We ran a block (2) x group (3) RM-ANOVA with block as the within-subject factor and group as the between-subject factor.

Results: experiment 1

To screen for outliers at the trial level, we used the Mahalanobis distance (De Maesschalck et al., 2000). Approximately 3% of trials were removed based on this criterion before running the statistical tests. At the participant level, we also screened for high trunk velocities in the pre-test using a boxplot (because the goal of the study was to examine an increase in trunk velocity). One participant in the abrupt group and one participant in the adaptive group were removed based on this criterion.

Score

Training phase

There was a significant main effect of group (F(2,25) = 8.35, p = .002) and a significant group x block interaction (F(2,25) = 7.03, p = .004). The main effect of block was not significant (F(1,25) = 4.0, p = .06). Analysis of the group x block interaction revealed that the gradual group decreased their scores from B1 to B6; the change in the other two groups was not significant (Figure 4.5).

Test phase

There was no significant main effect of block (F(1,25) = .28, p = .76), no significant main effect of group (F(2, 25) = 0.14, p = .71), and no significant group x block interaction (F(2, 25) = 1.70, p = .20) (Figure 4.5).

Figure 4.5. Change in mean score. The score dropped dramatically after reinforcement feedback was introduced because participants received punishment based on their trunk velocity. There was no reinforcement in the post-test, so the average score increased back to the level of the pre-test. Error bars indicate 1 SE (between-participants).
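As an aside, the task/null space decomposition defined above can be illustrated with a short MATLAB sketch. The trial data below are made up for illustration; the two projection directions follow from the ball-velocity equation, since the solution manifold is the set of velocity pairs satisfying 4*vTrunk + vHand = 1100.

```matlab
% Sketch of the task/null space variability decomposition (made-up data).
% For v_ball = 4*v_trunk + v_hand, the task-space direction is the gradient
% [4 1], and the null-space direction lies along the manifold, [1 -4].
data = [90 740; 110 660; 100 700; 95 720];    % hypothetical [vTrunk vHand] trials (mm/s)

taskDir = [4 1]  / norm([4 1]);               % orthogonal to the solution manifold
nullDir = [1 -4] / norm([1 -4]);              % parallel to the solution manifold

centered = data - mean(data, 1);              % remove the block mean
taskVar  = var(centered * taskDir');          % variability that changes the ball velocity
nullVar  = var(centered * nullDir');          % variability that leaves the ball velocity unchanged
```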
Punishment rate

Training phase

There was a significant main effect of block (F(1,25) = 10.27, p = .004), a significant main effect of group (F(2,25) = 25.07, p < .001), and a significant group x block interaction (F(2, 25) = 17.37, p < .001). The analysis of the interaction revealed that in block B1, all three groups were significantly different from each other (with the abrupt group having the highest punishment rate, and the gradual group the lowest). In block B6, the punishment rate in the abrupt group was higher than in the adaptive group (Figure 4.6).

Figure 4.6. Change in mean punishment rate. The abrupt group had a constantly high punishment rate throughout the experiment. The gradual group did not receive punishment at the beginning and then gradually received more punishment toward the end of the experiment. The adaptive group had a constant, moderate punishment rate throughout the experiment.

Trunk velocity

Training phase

There was a significant main effect of block (F(1, 25) = 10.58, p = .003), with higher trunk velocities in B6 compared to B1. There was no significant main effect of group (F(2, 25), p = .44) or group x block interaction (F(2, 25) = 0.73, p = .49) (Figure 4.7A).

Test phase

There was a significant main effect of block (F(1, 25) = 23.47, p < .001), with higher trunk velocity in the post-test compared to the pre-test. There was no significant main effect of group (F(2, 25) = 1, p = .38). There was a significant group x block interaction (F(2, 25) = 5.80, p = .009). Trunk velocities increased from pre- to post-test, but the interaction revealed that in the pre-test there were no significant differences between groups, whereas in the post-test the adaptive group had higher trunk velocities compared to the abrupt and gradual groups (Figure 4.7A).

Figure 4.7. Change in standardized trunk velocity and hand velocity (A) Comparing pre-test and post-test, standardized trunk velocity tended to increase in all three groups; the adaptive group had the largest increase in the post-test. (B) Hand velocity did not change much between pre-test and post-test.

Hand velocity

Training phase

There was no significant main effect of group (F(2, 25) = 1.74, p = .20), no significant main effect of block (F(1, 25) = 0.87, p = .36), and no significant group x block interaction (F(2, 25) = 1.48, p = .25) (Figure 4.7B).

Test phase

There was a significant main effect of block (F(1, 25) = 4.55, p = .04), with a decrease in hand velocity in the post-test relative to the pre-test. There was no significant main effect of group (F(2, 25) = 1.96, p = .16) and no significant group x block interaction (F(2, 25) = 0.02, p = .98) (Figure 4.7B).

Task space variability

Training phase

There was a significant main effect of block (F(1, 25) = 7.32, p = .01); the task space variability increased in B6 compared to B1. There was no significant main effect of group (F(2, 25) = 1.68, p = .21) and no significant group x block interaction (F(2, 25) = 0.78, p = .47) (Figure 4.8A).

Test phase

There was a significant main effect of block (F(1, 25) = 12.95, p = .001); the task space variability increased in the post-test compared to the pre-test. There was no significant main effect of group (F(2, 25) = 2.02, p = .15). There was a significant group x block interaction (F(2, 25) = 4.98, p = .002). The interaction showed that there was no difference in the pre-test, but in the post-test the adaptive group had higher task space variability than the other two groups (Figure 4.8A).

Figure 4.8.
Change in task and null space variability (A) The task space variability of the abrupt and gradual groups remained similar throughout the experiment, whereas task space variability increased in the adaptive group. (B) Null space variability decreased in all three groups.

Null space variability

Training phase

There was no significant main effect of group (F(2, 25) = 1.39, p = .27), no significant main effect of block (F(1, 25) = 2.80, p = .11), and no significant interaction (F(2, 25) = 1.51, p = .24) (Figure 4.8B).

Test phase

There was a significant main effect of block (F(1, 25) = 7.32, p = .01); null space variability was lower in the post-test compared to the pre-test. There was no significant main effect of group (F(2, 25) = 1.68, p = .21) and no significant group x block interaction (F(2, 25) = 0.78, p = .47) (Figure 4.8B).

Summary of experiment 1

We investigated how to use reinforcement feedback with shaping to guide participants to change their movement pattern in a redundant task. Participants performed a virtual throwing task involving the coordination of the hand and trunk, where reinforcement was provided to increase trunk velocity using different schedules (abrupt, gradual, and adaptive). The main results showed that (i) exploration in all schedules was somewhat suboptimal, as reflected by the high punishment rates at the end of learning (upwards of 50%), and (ii) although all groups increased their trunk velocity with practice, the adaptive group had the greatest change in trunk velocity from the pre- to the post-test.

Rationale for Experiment 2

Our results showed that the adaptive group had the greatest change in trunk velocity from pre- to post-test. Our adaptive algorithm was based on a simple average of the previous 6 trials – so we examined whether modifying the parameters of the adaptive algorithm would have further benefits. We considered two parameters: memory (i.e., how many trials the adaptive algorithm acts over) and momentum (whether the adaptive algorithm uses the trend in those trials). We anticipated that this would provide further understanding of why the adaptive algorithm was more successful.

Methods – experiment 2

Experiment 2 had four groups and shared the same procedure and data analysis as experiment 1. None of the participants in Experiment 2 had participated in Experiment 1. Given the relatively poor response to reinforcement in Experiment 1, we made two modifications to the experimental procedure in Experiment 2: the score feedback and the instructions to the participants. The rationale for modifying the score feedback was as follows: in experiment 1, the lowest score available without punishment was 10, while trials that received punishment got a score of zero. Given that punishment rates did not decrease with practice, we thought that this difference of "10 points" might have been too small for participants to perceive as punishment. So in Experiment 2, we increased this difference to 200 points (with a maximum score of 1000 points). The rationale for changing the instructions was to examine whether the exploration in Experiment 1 was suboptimal because participants did not know which body segments contributed to the task performance. As a result, we instructed the participants to focus on movements of the right hand and the trunk to examine if this would improve exploration.

Participants

Forty healthy college students (age range: 18-24 years, 28 females, 4 left-handed) were recruited to participate in the experiment for extra course credit.
Left-handed participants also performed the task with their right hand. Because the focus of the task was only on generating a given velocity at release (and not on trajectory or steady-state control), we did not expect handedness to play a major role. All participants signed the consent form, and the consent process was approved by the Institutional Review Board at Michigan State University.

Apparatus and task

The experimental setting of Experiment 2 was identical to Experiment 1, with only the score feedback modified: the score decreased monotonically with x, the absolute difference between the ball velocity and the target velocity (1100 mm/s), from a maximum of 1000 at x = 0 to a minimum of 200 (Figure 4.9A). During the training blocks, the reinforcement feedback was provided as a punishment score of zero. When the trunk velocity was smaller than the threshold, the participants received a zero. When the trunk velocity was larger than the threshold, the score was calculated by the scoring function, where the maximum is 1000 and the minimum is 200. This made the zero score a punishment indicating that something needed to be corrected.

Figure 4.9. Schematic of scoring with and without reinforcement feedback (A) The score is 1000 when on the solution manifold (the purple line). Off the manifold, the score was calculated from the distance between the point and the manifold; the closer to the line, the higher the score. (B) Adding a punishment zone to the state space. The punishment zone was defined by the trunk velocity. When the trunk velocity was smaller than the threshold, the score would be 0 regardless of the distance to the solution manifold.

Grouping and shaping methods

We manipulated two parameters, memory and momentum, in a crossed fashion, resulting in 4 groups. The memory parameter was changed by controlling the number of trials involved in the moving average. The momentum parameter was changed by using (or ignoring) the trend of how the velocities changed in that time window. This resulted in the following groups: 1) adaptive, 2) long-adaptive, 3) momentum, and 4) long-momentum. The adaptive group was the same as in the previous study, where a moving average of the six prior trials was used to calculate the threshold of the next trial. In the long-adaptive group, we increased the window size to ten prior trials. In both groups, the threshold was computed using a moving average, meaning that no trends in the data were used. In the momentum group, we aimed to increase the speed of adjusting the threshold by using a linear regression over the prior six trials to predict the threshold on the next trial. Finally, in the long-momentum group, we used the same strategy as the momentum group but used a linear regression that included the previous 10 trials to predict the next threshold (Figure 4.10).

Figure 4.10. Design of reinforcement schedules (A) In the adaptive group in Experiment 1, the threshold was the average of the previous six trials. (B) In the long-adaptive group, the window size (memory) of the average was increased to 10 trials, probing how participants use more delayed information to adjust motor behavior. (C) In the momentum group, a regression function was used to predict the threshold, preserving the trend and creating a smooth change. (D) The long-momentum group manipulated both window size and momentum.

Data analysis

We kept the same dependent variables and ran the same statistical analyses as in Experiment 1. Additionally, we showed how the threshold changed under the different shaping methods; a sketch of the four threshold-update rules is given below.
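To make the four threshold-update rules concrete, the following is a minimal sketch in Python. The function and variable names are ours for illustration, and the actual implementation may have differed; only the window/momentum logic is taken from the text above.

    import numpy as np

    def next_threshold(trunk_velocities, window=6, momentum=False):
        """Compute the punishment threshold for the next trial.

        window:   how many prior trials enter the estimate (memory).
        momentum: if False, use a moving average of the window;
                  if True, fit a linear regression over the window
                  and extrapolate one trial ahead (uses the trend).
        """
        recent = np.asarray(trunk_velocities[-window:], dtype=float)
        if not momentum:
            return recent.mean()                  # adaptive / long-adaptive
        trials = np.arange(len(recent))
        slope, intercept = np.polyfit(trials, recent, deg=1)
        return slope * len(recent) + intercept    # momentum / long-momentum

    # The four groups differ only in their (window, momentum) pair:
    GROUPS = {
        "adaptive":      dict(window=6,  momentum=False),
        "long-adaptive": dict(window=10, momentum=False),
        "momentum":      dict(window=6,  momentum=True),
        "long-momentum": dict(window=10, momentum=True),
    }

    # Hypothetical trunk velocities (mm/s) from one participant's last 10 trials.
    history = [120, 135, 128, 140, 150, 145, 155, 160, 152, 158]
    for name, params in GROUPS.items():
        print(name, round(next_threshold(history, **params), 1))

Because the groups differ only in the (window, momentum) pair, the crossed design isolates the contribution of each parameter to how aggressively the threshold tracks the participant.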
Results: experiment 2

Following the same procedure as in Experiment 1, we removed the participants who had high trunk velocity in the pre-test; one participant was removed from each group. We then used the Mahalanobis distance to remove outliers before running the statistical tests (De Maesschalck, Jouan-Rimbaud, & Massart, 2000). Around 5% of trials were removed.

Threshold

Training phase

All four groups increased the threshold in the training phase. There was a significant main effect by block (F(1, 32) = 11.68, p = .002), indicating that the threshold increased from B1 to B6. There was no significant main effect by group (F(3, 32) = 1.87, p = .15) and no significant group x block interaction effect (F(3, 22) = 0.36, p = .78) (Figure 4.11).

Figure 4.11. Change in trunk velocity threshold. All groups had a higher threshold at the last training block, but there was no significant group difference.

Score

Training phase

There was no significant main effect by block (F(1, 32) = 2.61, p = .12), no significant main effect by group (F(3, 32) = 0.41, p = .75), and no significant group x block interaction effect (F(3, 32) = 0.61, p = .61) (Figure 4.12).

Test phase

There was no significant main effect by block (F(1, 32) = 0.69, p = .41), no significant main effect by group (F(3, 32) = 0.18, p = .91), and no significant group x block interaction effect (F(3, 32) = 0.48, p = .70) (Figure 4.12).

Figure 4.12. Change in mean score. In the pre-test and post-test, there was no difference between the four groups.

Punishment rate

Training phase

There was no significant main effect by block (F(1, 32) = 1.50, p = .23), no significant main effect by group (F(3, 32) = 1.90, p = .15), and no significant group x block interaction (F(1, 32) = 0.69, p = .41) (Figure 4.13).

Figure 4.13. Change in punishment rate. The adaptive group (blue) had the highest overall punishment rate. The long-adaptive group (red) had the lowest overall punishment rate. The momentum group (black) and long-momentum group (brown) were in between, with the long-momentum group having the higher average.

Trunk velocity

Training phase

There was no significant main effect by block (F(1, 32) = 2.50, p = .07), no significant main effect by group (F(3, 32) = 2.08, p = .12), and no significant group x block interaction (F(3, 32) = 0.72, p = .67) (Figure 4.14A).

Test phase

There was a significant main effect by block (F(1, 32) = 18.74, p < .001); trunk velocity was higher in the post-test. There was no significant main effect by group (F(3, 32) = 0.70, p = .56) and no significant group x block interaction effect (F(3, 32) = 0.91, p = .45) (Figure 4.14A).

Figure 4.14. Change in standardized trunk velocity and hand velocity (A) Mean standardized trunk velocity over trials. After the pre-test, all four groups increased their trunk velocity. (B) Mean hand velocity over trials. There was no trend in hand velocity.

Hand velocity

Training phase

There was no significant main effect by block (F(1, 32) = 0.41, p = .53), no significant main effect by group (F(3, 32) = 0.37, p = .78), and no significant group x block interaction (F(3, 32) = 0.37, p = .78) (Figure 4.14B).

Test phase

There was no significant main effect by block (F(1, 32) = 4.74, p = .04), no significant main effect by group (F(3, 32) = 1.28, p = .30), and no significant group x block interaction (F(3, 32) = 0.21, p = .89) (Figure 4.14B).
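For concreteness, the following is a minimal sketch of the Mahalanobis-distance screening applied before these analyses. The distance cutoff and the two example variables are hypothetical choices for illustration; the exact criterion used in the analysis may have differed.

    import numpy as np

    def mahalanobis_outliers(data, cutoff=3.0):
        """Flag trials whose Mahalanobis distance from the sample mean
        exceeds a cutoff.

        data: (n_trials, n_variables) array, e.g., one column per
        kinematic variable. The cutoff of 3.0 is an illustrative choice.
        """
        data = np.asarray(data, dtype=float)
        mean = data.mean(axis=0)
        cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
        diffs = data - mean
        # Squared Mahalanobis distance for each row: d_i = x_i' S^-1 x_i
        d = np.sqrt(np.einsum('ij,jk,ik->i', diffs, cov_inv, diffs))
        return d > cutoff

    # Example: screen hypothetical (trunk velocity, hand velocity) pairs.
    rng = np.random.default_rng(0)
    trials = rng.normal([150.0, 900.0], [20.0, 60.0], size=(100, 2))
    trials[5] = [260.0, 1400.0]   # an exaggerated exploratory trial
    print(np.where(mahalanobis_outliers(trials))[0])

Unlike per-variable z-scores, the Mahalanobis distance accounts for the covariance between trunk and hand velocity, so a trial is flagged only when it is unusual relative to the joint distribution.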
Task space variability

Training phase

There was no significant main effect by block (F(1, 32) = 1.55, p = .22), no significant main effect by group (F(3, 32) = 2.23, p = .10), and no significant group x block interaction (F(3, 32) = 0.64, p = .60) (Figure 4.15A).

Test phase

There was a significant main effect by block (F(3, 32) = 19.43, p < .001); task space variability was higher in the post-test compared to the pre-test. There was no significant main effect by group (F(3, 32) = 0.41, p = .74) and no significant group x block interaction (F(3, 32) = 1.23, p = .31) (Figure 4.15A).

Figure 4.15. Change in task and null space variability (A) Task space variability in each block. Task space variability increased after the pre-test. An inverted-U shape indicated that participants explored less in the later phase of training. (B) Null space variability in each block. Null space variability did not change much over learning.

Null space variability

Training phase

There was a significant main effect by block (F(1, 32) = 5.66, p = .02); null space variability was lower in B6 compared to B1. There was no significant main effect by group (F(1, 32) = 0.88, p = .60) and no significant group x block interaction (F(3, 32) = 1.44, p = .25) (Figure 4.15B).

Test phase

There was a significant main effect by block (F(1, 32) = 5.51, p = .03); null space variability was lower in the post-test compared to the pre-test. There was no significant main effect by group (F(1, 32) = 0.75, p = .53) and no significant group x block interaction (F(3, 32) = 1.12, p = .44) (Figure 4.15B).

Summary of experiment 2

In Experiment 2, we tested three variations of adaptive shaping, aiming for a greater change of movement pattern. First, we modified the memory factor of the algorithm to see how participants reacted when more previous trials were included in the moving average (long-adaptive group). Second, we used a linear regression to calculate the thresholds to see how exploration proceeded with a wider search range (momentum group). Third, we modified both memory and momentum together to see whether the two effects added up (long-momentum group). The main findings were that (i) all groups showed increased trunk velocity in the post-test, and (ii) although there were no statistical differences between the groups, the long-adaptive group seemed to perform the best relative to the adaptive group.

General Discussion

The overall goal of this study was to examine the use of reinforcement feedback to change movement patterns in a complex motor skill with multiple degrees of freedom. We found two main results across both experiments – (i) reinforcement feedback in multi-DOF tasks was not as effective as previously reported in adaptation experiments with no redundancy, and (ii) an adaptive schedule outperformed open-loop schedules, with evidence that a more conservative strategy for setting the threshold performed the best.

Reinforcement in multi-DOF tasks

Our first question addressed the issue of whether reinforcement learning would be successful in altering movement patterns in multi-DOF tasks. We found that reinforcement learning (across all the groups used here with different shaping schedules) was only partly successful, with punishment rates ranging from 30-80%. In prior studies with motor adaptation tasks, reinforcement learning showed reward rates of 100% (i.e.
equivalent to a punishment rate of 0% in our case), indicating that participants were able to clearly use reinforcement feedback to modify their movement pattern to the desired level. However, as mentioned before, a critical limitation of these tasks is that they mostly used single-DOF reaching tasks where the only exploration was along the angle of the reach.

The key challenge of reinforcement learning in multi-DOF tasks is exploration. When the exploration space is extremely small (as in single-DOF tasks), reinforcement can easily guide the learner toward the solution, although there is some evidence that even in these tasks, a small proportion of individuals fail to adapt (Chen et al., 2018). However, in the current task, with just one extra DOF, the two-dimensional DOF space was large enough that exploration was extremely difficult for participants. Moreover, even though the task was two-dimensional, the exploration space could have been even higher dimensional. Experimental observations showed many participants exploring other body segments (such as the head or the other hand), indicating that participants were exploring a much higher dimensional space than what was specified in the task. Under these conditions, reinforcement learning, even with adaptive shaping, may not be sufficient to shift participants toward the desired movement pattern. Combining reinforcement feedback with other methods (such as demonstrations or attentional cues) may be necessary to improve exploration in high dimensional spaces.

Shaping schedules

To increase the efficiency of exploration in high dimensional spaces, shaping has been used in reinforcement learning (Ferster & Skinner, 1957). Shaping refers to providing additional rewards for making progress toward the desired solution and has been shown to be effective in animal experiments and also in studies in artificial intelligence (Knox & Stone, 2009). Here we examined different shaping schedules in the context of altering movement patterns.

Abrupt vs. Gradual

In Experiment 1, we first compared two open-loop schedules - abrupt and gradual. The abrupt schedule, where the threshold is instantaneously raised to the desired level, is equivalent to no shaping (since there is no change in feedback until the desired movement pattern is reached), whereas the gradual group uses shaping (since small incremental changes in trunk velocity receive changes in feedback). Interestingly, although these two methods have had different effects on learning in a number of adaptation studies (Kagerer et al., 1997; Ludolph et al., 2017; Milner et al., 2018), we found that the abrupt and gradual groups were not distinguishable in terms of the change of trunk velocity. However, the punishment rate suggested that they had two different routes to learning – the abrupt group, as expected, had high punishment rates throughout practice, indicating that exploration was not successful in determining the correct solution. The gradual group, in contrast, started off with low punishment rates, but this rate kept increasing as the threshold increased, indicating a lack of ability to adapt to the feedback. One potential reason is that because the threshold in the gradual group increases in an open-loop manner, once participants initially failed to adapt to the changing threshold, subsequent changes in threshold were even further away, making it almost similar to the abrupt condition.
These results suggest that open-loop shaping schedules are suboptimal in multi-DOF tasks because when exploration is slow and inefficient, even small changes in the threshold over trials can quickly become difficult to overcome.

Adaptive schedules

To overcome this limitation of open-loop schedules, we also tested an adaptive group, where the threshold was based on the participants' performance. The punishment rate in this group was maintained around 50% throughout practice and was associated with the greatest increase in trunk velocity. These results support prior studies using adaptive schedules in motor adaptation (Therrien et al., 2018; Verstynen & Sabes, 2011) and suggest that the adaptive schedule created a condition where the exploration was at least moderately successful even in a multi-DOF task.

In Experiment 2, we further examined the adaptive schedule by manipulating two other factors – memory and momentum. Both factors essentially varied the aggressiveness with which the threshold changed over practice. Increasing memory to include more past information made the threshold estimate more conservative, whereas increasing the momentum to extrapolate the trend across the past information made the estimate more aggressive. We found that the best learning outcomes were associated with the conservative threshold estimate, as seen by the lower punishment rate and the higher trunk velocity in the post-test. These results suggest that a more conservative estimate gave participants some time to explore and settle into a new movement pattern.

The success of the more conservative method is likely tied to the fact that, in our task, participants had to move away from their 'preferred' coordination pattern, which involved mostly hand motion. Again, unlike prior adaptation studies where the only change required is the angle of reach (which does not have strong preferred directions), the task in the current experiment resembles real-life contexts (such as changing a golf swing or rehabilitating a movement pattern) where there is a strong preference for an existing coordination pattern. These results highlight the importance of considering the dynamical systems view of coordination, in which coordination patterns do not exist on a 'blank slate' but have different stability properties (Schoner & Kelso, 1988; Sternad, 1998). Overall, our results suggest that in such cases, reinforcement feedback with conservative adaptive schedules is most likely to result in better learning outcomes.

In summary, we found that reinforcement feedback can be used to change movement patterns in a multi-DOF task. Adaptive schedules that modified reinforcement feedback based on participant performance had the greatest chance of modifying coordination patterns, especially when they were slow to create change. These results highlight the potential of reinforcement feedback in multi-DOF tasks and suggest that future work on adaptive schedules is needed to accelerate learning while still allowing participants to explore the space of possible solutions.

CHAPTER 5 GENERAL DISCUSSION

Overall scope

The overall aim of this dissertation was to investigate how to guide participants to use alternative movement patterns to perform the same task using reinforcement learning. Motor learning studies have mainly focused on the change of task performance, with little focus on the change of movement patterns.
However, studying the change of movement patterns is critical in contexts where different movement patterns can be used, but there may be advantages to using certain specific movement patterns. For example, in movement rehabilitation, individuals with movement disabilities learn the correct way to perform a task to prevent further injuries. The process of learning different movement patterns is long and challenging because humans tend to use habitual movement patterns regardless of whether the pattern is optimal (De Rugy et al., 2012).

To approach this question, I used reinforcement feedback to guide participants to learn alternative movement patterns. Reinforcement learning is the theoretical framework that connects the three experiments in this dissertation. In motor adaptation research, reinforcement learning has been used as a form of reward and punishment learning: the learners changed their movement patterns either to pursue rewards or to avoid punishments. Given that prior work on reinforcement has used tasks that only require small manipulations of well-learned movement patterns (e.g., changing the direction of a reach), an important contribution of the current work is to examine how these reinforcement paradigms generalize to tasks involving the coordination of multiple DOFs.

For all three experiments, I used trunk-hand coordination as the multi-DOF task. Trunk-hand coordination is ubiquitous in our daily life - for example, when we reach or when we throw. The participants controlled the kinematics of the trunk and the hand to perform the task. With two degrees of freedom in the system, there are many combinations of trunk and hand kinematics that solve the task. This setup provides the redundancy needed to study learning different movement patterns to perform the same task. Moreover, unlike bimanual movements, where there are strong tendencies toward symmetry, using the trunk and the hand provided an opportunity to explore a larger range of coordination patterns.

Contributions of the dissertation

The first contribution of this dissertation is to investigate the reinforcement learning protocol in multi-DOF tasks. Given that human movement requires coordination of a large number of DOFs (such as joints and muscles), I extended methods that have been used mainly in adaptation studies (where there is no change in the underlying coordination) to tasks requiring coordination of multiple DOFs. Increasing the number of DOFs is both theoretically relevant (as it creates possibilities for multiple movement patterns to perform the task) and practically relevant (as it relates to common motor learning contexts like learning a golf swing or learning to reach after a stroke). I provided a series of experiments to approach the problem of learning multi-DOF tasks with a focus on changing movement patterns.

The second contribution is to apply the concepts of reinforcement learning and shaping to learning alternative movement patterns. The concept of shaping (or adapting reinforcement feedback based on individual performance) is widely used in many contexts of motor learning. For example, a coach adjusts the difficulty of the task based on the performance level of the athlete. Despite a general acceptance that such adaptive learning is a good strategy, how exactly shaping improves learning is not clear. My studies provide a systematic perspective on how changing the parameters of the shaping algorithm affects learning outcomes.
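To make the effect of schedule choice concrete, the following toy simulation contrasts the three schedule types from Experiment 1. The learner model and all parameters are hypothetical and not fitted to our data; the sketch only illustrates why an open-loop threshold can run away from a slow learner while an adaptive threshold cannot.

    import numpy as np

    def simulate(schedule, n_trials=300, start=100.0, target=250.0,
                 noise=15.0, beta=0.05, seed=0):
        """Toy reinforcement learner (all parameters hypothetical).

        The learner keeps a mean trunk velocity, emits a noisy action each
        trial, and shifts its mean toward any action that earned reward.
        Punished actions produce no directed update, mimicking reinforcement
        feedback that carries no error signal.
        """
        rng = np.random.default_rng(seed)
        mean, history, punished = start, [], 0
        for t in range(n_trials):
            if schedule == "abrupt":
                threshold = target
            elif schedule == "gradual":
                threshold = start + (target - start) * t / n_trials
            else:  # adaptive: moving average of the last 6 actions, capped
                threshold = min(np.mean(history[-6:]), target) if history else start
            action = mean + rng.normal(0.0, noise)
            if action >= threshold:
                mean += beta * (action - mean)   # keep what worked
            else:
                punished += 1                    # no directed information
            history.append(action)
        return mean, punished / n_trials

    for s in ("abrupt", "gradual", "adaptive"):
        m, p = simulate(s)
        print(f"{s:8s} final mean {m:6.1f}  punishment rate {p:.2f}")

Under these assumptions, the abrupt threshold is almost never reached and learning stalls, the gradual threshold outruns the learner so punishment climbs over trials, and the adaptive threshold keeps the punishment rate near 50% while the mean drifts upward – the same qualitative pattern observed in the experiments.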
Online feedback vs delayed feedback

One issue that I investigated in this dissertation concerns the timing of the feedback and how it affects learning of new coordination patterns. In experiment one, the reinforcement feedback was provided online during reaching: participants were punished instantaneously if the trunk movement was smaller than the threshold. The experimental design showed that reinforcement feedback successfully changed the movement patterns. However, such online feedback is limited in two ways – (i) it can only be used in slow positioning tasks where participants can react to the feedback during the movement, and (ii) it creates a confound in understanding the role of feedback because different participants receive different amounts of feedback (depending on how long they take to complete the movement and how many errors they make).

To better quantify the reinforcement feedback, in experiment two I used a discrete task in which participants received feedback once after each trial. This allows the paradigm to be used with fast movements and also allows for controlling the total amount of feedback during training. With this design, I could calculate the rate of receiving feedback to show how the reinforcement rate related to exploration. Providing offline feedback solved the problem of comparing across participants but increased the difficulty of the task. Because the feedback was given after the whole trial, the participants did not have much opportunity to map the feedback to the movement pattern. In particular, the participants did not know that the feedback was generated only by trunk velocity, so they needed more trials to understand the feedback.

Moreover, I used the reinforcement rate to discuss the exploration-exploitation tradeoff. The exploration-exploitation tradeoff is a fundamental problem to solve during learning. Providing online feedback worked better with the abrupt change of the threshold: participants explored extensively until they found a good solution, and after this point, they exploited what they had learned to maintain good performance. On the contrary, providing offline feedback worked better when the change in the threshold was adaptive to the performance level of the learner, such that the exploration-exploitation ratio was maintained at a certain level throughout learning. The results showed that it is important to set the task at an appropriate difficulty to maintain a good balance of exploration and exploitation during learning.

In sum, providing online feedback was a feasible protocol to change movement patterns, but it was limited in showing how the amount of feedback guided exploration. Providing offline feedback allowed a fair comparison between participants, making it clearer how exploration guided learning.

Shaping reward/punishment during reinforcement

Shaping is a common method to make sure that participants do not lose motivation and keep exploring; the term comes from the early research on conditioning (Ferster & Skinner, 1957). The concept has also been used in machine learning to help the agent explore. There are two different ways to design shaping methods: 1) manipulating the task difficulty, and 2) manipulating the way feedback is generated. In the first type, the difficulty of the task changes during learning - for example, the algorithm lowers the difficulty of the task when participants have trouble receiving the reward.
In visuomotor rotation tasks, rewarding participants when they moved in the correct direction is an example of changing the actual task difficulty (Therrien et al., 2018). In the second type, participants receive different feedback based on their behavior (Coltman et al., 2019) - for example, providing graded reinforcement feedback so that participants receive different degrees of reward. The two types may seem similar at the behavioral level, but their interpretations could be very different in terms of how the central nervous system learns the task. The main difference between the two types is whether the brain learns the cost function of the task (Körding & Wolpert, 2004). If the task difficulty is changing, the brain needs to learn a different cost function trial by trial; this mechanism leads to model-free reinforcement learning (Haith & Krakauer, 2013). In the second type, the task difficulty remains the same, so the brain can actually learn the cost function; this is a form of model-based reinforcement learning. My experiments followed the second type, where the actual task difficulty did not change; instead, I manipulated the protocol for providing feedback. Although this theoretical framework explains the behavior well, there is still no empirical evidence showing the difference between these two types of shaping, especially in multi-DOF tasks. My experiments are a step toward answering this question.

These results also raise many other interesting questions. There are many hyperparameters to vary to investigate how humans react to different shaping methods. For example, linear regression and moving averages are both parametric methods to estimate the state of the participants; non-parametric models such as Gaussian processes (Nguyen-Tuong et al., 2009) could be used to fit the data. Changing the fitting model not only may yield different results in terms of the outcome but also provides a window into the learning mechanism.

Limitations and future directions

One limitation of this dissertation is the lack of methods to show how variability plays a role in reinforcement learning in multi-DOF tasks. Reinforcement learning is an open-ended paradigm in which the learners are encouraged to explore. More exploration often leads to higher variance at both the performance and movement pattern levels. This type of variability is essential for learning new movement patterns. However, exploration causes outliers in the data, and conventional linear methods are not robust when estimating variance in the presence of outliers. Running outlier detection is not always an option, since these outliers can carry important information. This problem becomes more serious when tasks have more DOFs, because the learner needs to explore more to find the solution. Finding a method that provides consistent estimation of the variance across different types of exploration patterns is critical to describing this type of learning. One proposed solution for future studies is to use probabilistic modeling to model the distribution of the whole landscape and then estimate the expected value and variance. The "outliers" are then weighted by their probability; that is, higher probability means the learner followed a pattern. This gives us more flexibility to accommodate the sparsity of the data and returns a better estimate of the variability. After obtaining a reliable estimate of variability, the next challenge is to separate motor noise and exploration from variability.
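Before turning to that challenge, the following is a minimal sketch of the probability-weighting idea, using a leave-one-out kernel density estimate as one hypothetical choice of probabilistic model; the bandwidth rule and all names are illustrative.

    import numpy as np

    def weighted_variability(samples, bandwidth=None):
        """Probability-weighted spread estimate (a sketch, not the method used here).

        Each trial is weighted by its density under a Gaussian kernel density
        estimate, so sparse exploratory excursions contribute less than trials
        that follow a repeated pattern.
        """
        x = np.asarray(samples, dtype=float)
        n = len(x)
        if bandwidth is None:
            bandwidth = 1.06 * x.std() * n ** (-1 / 5)      # Silverman's rule
        # Leave-one-out kernel density at each sample point.
        diffs = (x[:, None] - x[None, :]) / bandwidth
        dens = np.exp(-0.5 * diffs ** 2).sum(axis=1) - 1.0  # drop self-term
        w = dens / dens.sum()
        mean = np.sum(w * x)
        return np.sqrt(np.sum(w * (x - mean) ** 2))

    # Example: 95 trials around a preferred pattern plus 5 exploratory trials.
    rng = np.random.default_rng(2)
    trials = np.concatenate([rng.normal(150, 10, 95), rng.normal(300, 5, 5)])
    print("plain SD:", round(trials.std(), 1),
          "weighted SD:", round(weighted_variability(trials), 1))

Because the exploratory excursions sit in a low-density region, they receive low weights, and the weighted estimate stays close to the spread of the repeated pattern without discarding any trials.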
In a single-DOF task, this could be done by assuming that participants did not change their strategy when they did not receive reinforcement feedback; under this assumption, all of that variability was motor noise, and the changes after receiving reinforcement feedback were exploration. With this assumption, a stochastic method such as a particle filter (Therrien et al., 2016) was applied to separate exploration and noise. However, this assumption could be problematic in multi-DOF tasks because exploration can happen along all the dimensions, and it is difficult to quantify exploration when multiple dimensions are involved. The future direction of this line of research will be to develop methods that capture "exploration" in multi-DOF tasks so that the link between reinforcement learning and exploration can be further established in motor learning.

Conclusion

Motor learning has mostly been studied from the perspective of changes in performance and has seldom focused on changes in movement patterns. Studying the change of movement patterns provides additional information to describe how motor learning is implemented in the human nervous system. I used reinforcement learning as the theoretical framework to investigate how to guide participants toward alternative movement patterns. In summary, this dissertation showed three main findings: 1) A reinforcement learning protocol can be applied to guide participants to use alternative movement patterns in a multi-DOF task. 2) Shaping outperformed non-shaping methods in shifting movement patterns within the reinforcement learning protocol. 3) Changing the parameters of shaping changed how well the alternative movement patterns were learned.

REFERENCES

Berniker, M., & Kording, K. (2008). Estimating the sources of motor errors for adaptation and generalization. Nature Neuroscience, 11(12), 1454–1461. https://doi.org/10.1038/nn.2229

Bernstein, N. A. (1967). The Co-ordination and regulation of movements. Pergamon Press Ltd.

Bhushan, N., & Shadmehr, R. (1999). Computational nature of human adaptive control during learning of reaching movements in force fields. Biological Cybernetics, 81, 39–60.

Chen, X., Holland, P., & Galea, J. M. (2018). The effects of reward and punishment on motor skill learning. Current Opinion in Behavioral Sciences, 20, 83–88. https://doi.org/10.1016/j.cobeha.2017.11.011

Cirstea, M. C., & Levin, M. F. (2000). Compensatory strategies for reaching in stroke. Brain, 123(5), 940–953. https://doi.org/10.1093/brain/123.5.940

Coltman, S. K., Cashaback, J. G. A., & Gribble, P. L. (2019). Both fast and slow learning processes contribute to savings following sensorimotor adaptation. Journal of Neurophysiology, 121(4), 1575–1583. https://doi.org/10.1152/jn.00794.2018

Criscimagna-Hemminger, S. E., Bastian, A. J., & Shadmehr, R. (2010). Size of Error Affects Cerebellar Contributions to Motor Learning. Journal of Neurophysiology, 103(4), 2275–2284. https://doi.org/10.1152/jn.00822.2009

Cusumano, J. P., & Cesari, P. (2006). Body-goal Variability Mapping in an Aiming Task. Biological Cybernetics, 94(5), 367–379. https://doi.org/10.1007/s00422-006-0052-1

De Maesschalck, R., Jouan-Rimbaud, D., & Massart, D. L. (2000). The Mahalanobis distance. Chemometrics and Intelligent Laboratory Systems, 50(1), 1–18. https://doi.org/10.1016/S0169-7439(99)00047-7

De Rugy, A., Loeb, G. E., & Carroll, T. J. (2012). Muscle Coordination Is Habitual Rather than Optimal. Journal of Neuroscience, 32(21), 7384–7391. https://doi.org/10.1523/JNEUROSCI.5792-11.2012
Dhawale, A. K., Smith, M. A., & Ölveczky, B. P. (2017). The Role of Variability in Motor Learning. Annual Review of Neuroscience, 40(1), 479–498. https://doi.org/10.1146/annurev-neuro-072116-031548

Diedrichsen, J. (2007). Optimal Task-Dependent Changes of Bimanual Feedback Control and Adaptation. Current Biology, 17(19), 1675–1679. https://doi.org/10.1016/j.cub.2007.08.051

Diedrichsen, J., Shadmehr, R., & Ivry, R. B. (2010). The coordination of movement: Optimal feedback control and beyond. Trends in Cognitive Sciences, 14(1), 31–39. https://doi.org/10.1016/j.tics.2009.11.004

Dingwell, J. B., John, J., & Cusumano, J. P. (2010). Do Humans Optimally Exploit Redundancy to Control Step Variability in Walking? PLoS Computational Biology, 6(7), e1000856. https://doi.org/10.1371/journal.pcbi.1000856

Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement. Appleton-Century-Crofts. https://doi.org/10.1037/10627-000

Ficuciello, F., Falco, P., & Calinon, S. (2018). A Brief Survey on the Role of Dimensionality Reduction in Manipulation Learning and Control. IEEE Robotics and Automation Letters, 3(3), 2608–2615. https://doi.org/10.1109/LRA.2018.2818933

Franklin, D. W., & Wolpert, D. M. (2011). Computational Mechanisms of Sensorimotor Control. Neuron, 72(3), 425–442. https://doi.org/10.1016/j.neuron.2011.10.006

Galea, J. M., Mallia, E., Rothwell, J., & Diedrichsen, J. (2015). The dissociable effects of punishment and reward on motor learning. Nature Neuroscience, 18(4), 597–602. https://doi.org/10.1038/nn.3956

Ganesh, G., Haruno, M., Kawato, M., & Burdet, E. (2010). Motor Memory and Local Minimization of Error and Effort, Not Global Optimization, Determine Motor Behavior. Journal of Neurophysiology, 104(1), 382–390. https://doi.org/10.1152/jn.01058.2009

Gläscher, J., Daw, N., Dayan, P., & O'Doherty, J. P. (2010). States versus Rewards: Dissociable Neural Prediction Error Signals Underlying Model-Based and Model-Free Reinforcement Learning. Neuron, 66(4), 585–595. https://doi.org/10.1016/j.neuron.2010.04.016

Gløersen, Ø., Myklebust, H., Hallén, J., & Federolf, P. (2018). Technique analysis in elite athletes using principal component analysis. Journal of Sports Sciences, 36(2), 229–237. https://doi.org/10.1080/02640414.2017.1298826

Guigon, E., Baraduc, P., & Desmurget, M. (2007). Computational Motor Control: Redundancy and Invariance. Journal of Neurophysiology, 97(1), 331–347. https://doi.org/10.1152/jn.00290.2006

Haith, A. M., & Krakauer, J. W. (2013). Model-Based and Model-Free Mechanisms of Human Motor Learning. In M. J. Richardson, M. A. Riley, & K. Shockley (Eds.), Progress in Motor Control (Vol. 782, pp. 1–21). Springer New York. https://doi.org/10.1007/978-1-4614-5465-6_1

Herzfeld, D. J., & Shadmehr, R. (2014). Motor variability is not noise, but grist for the learning mill. Nature Neuroscience, 17(2), 149–150. https://doi.org/10.1038/nn.3633

Huang, V. S., & Shadmehr, R. (2009). Persistence of Motor Memories Reflects Statistics of the Learning Event. Journal of Neurophysiology, 102(2), 931–940. https://doi.org/10.1152/jn.00237.2009

Izawa, J., & Shadmehr, R. (2011). Learning from Sensory and Reward Prediction Errors during Motor Adaptation. PLoS Computational Biology, 7(3), e1002012. https://doi.org/10.1371/journal.pcbi.1002012

Jenkins, O. C., & Matarić, M. J. (2004). A spatio-temporal extension to Isomap nonlinear dimension reduction. Twenty-First International Conference on Machine Learning - ICML '04, 56. https://doi.org/10.1145/1015330.1015357
Jordan, M. I., & Rumelhart, D. E. (1992). Forward Models: Supervised Learning with a Distal Teacher. Cognitive Science, 16(3), 307–354. https://doi.org/10.1207/s15516709cog1603_1

Kagerer, F. A., Contreras-Vidal, J. L., & Stelmach, G. E. (1997). Adaptation to gradual as compared with sudden visuo-motor distortions. Experimental Brain Research, 115(3), 557–561.

Klassen, J., Tong, C., & Flanagan, J. R. (2005). Learning and recall of incremental kinematic and dynamic sensorimotor transformations. Experimental Brain Research, 164(2), 250–259. https://doi.org/10.1007/s00221-005-2247-4

Knox, W. B., & Stone, P. (2009). Interactively shaping agents via human reinforcement: The TAMER framework. Proceedings of the Fifth International Conference on Knowledge Capture - K-CAP '09, 9. https://doi.org/10.1145/1597735.1597738

Konczak, J., vander Velden, H., & Jaeger, L. (2009). Learning to Play the Violin: Motor Control by Freezing, Not Freeing Degrees of Freedom. Journal of Motor Behavior, 41(3), 243–252. https://doi.org/10.3200/JMBR.41.3.243-252

Körding, K. P., & Wolpert, D. M. (2004). The loss function of sensorimotor learning. Proceedings of the National Academy of Sciences of the United States of America, 101(26), 9839–9842.

Krakauer, J. W., & Mazzoni, P. (2011). Human sensorimotor learning: Adaptation, skill, and beyond. Current Opinion in Neurobiology, 21(4), 636–644. https://doi.org/10.1016/j.conb.2011.06.012

Krakauer, J. W., Pine, Z. M., Ghilardi, M.-F., & Ghez, C. (2000). Learning of Visuomotor Transformations for Vectorial Planning of Reaching Trajectories. Journal of Neuroscience, 20(23), 8916–8924.

Latash, M. L. (2012). The bliss (not the problem) of motor abundance (not redundancy). Experimental Brain Research, 217(1), 1–5. https://doi.org/10.1007/s00221-012-3000-4

Latash, M. L., Scholz, J. P., & Schöner, G. (2002). Motor control strategies revealed in the structure of motor variability. Exercise and Sport Sciences Reviews, 30(1), 26–31.

Levin, M. F., Kleim, J. A., & Wolf, S. L. (2009). What Do Motor "Recovery" and "Compensation" Mean in Patients Following Stroke? Neurorehabilitation and Neural Repair, 23(4), 313–319. https://doi.org/10.1177/1545968308328727

Ludolph, N., Giese, M. A., & Ilg, W. (2017). Interacting Learning Processes during Skill Acquisition: Learning to control with gradually changing system dynamics. Scientific Reports, 7(1). https://doi.org/10.1038/s41598-017-13510-0

Martin, J. R., Zatsiorsky, V. M., & Latash, M. L. (2011). Multi-finger interaction during involuntary and voluntary single finger force changes. Experimental Brain Research, 208(3), 423–435. https://doi.org/10.1007/s00221-010-2492-z

Martin, T. A., Keating, J. G., Goodkin, H. P., Bastian, A. J., & Thach, W. T. (1996). Throwing while looking through prisms: I. Focal olivocerebellar lesions impair adaptation. Brain, 119(4), 1183–1198. https://doi.org/10.1093/brain/119.4.1183

Mehler, D. M. A., Reichenbach, A., Klein, J., & Diedrichsen, J. (2017). Minimizing endpoint variability through reinforcement learning during reaching movements involving shoulder, elbow and wrist. PLOS ONE, 12(7), e0180803. https://doi.org/10.1371/journal.pone.0180803

Michaelsen, S. M., Dannenbaum, R., & Levin, M. F. (2006). Task-Specific Training With Trunk Restraint on Arm Recovery in Stroke: Randomized Control Trial. Stroke, 37(1), 186–192. https://doi.org/10.1161/01.STR.0000196940.20446.c9
Milner, T. E., Firouzimehr, Z., Babadi, S., & Ostry, D. J. (2018). Different adaptation rates to abrupt and gradual changes in environmental dynamics. Experimental Brain Research, 236(11), 2923–2933. https://doi.org/10.1007/s00221-018-5348-6

Müller, H., & Sternad, D. (2004). Decomposition of Variability in the Execution of Goal-Oriented Tasks: Three Components of Skill Improvement. Journal of Experimental Psychology: Human Perception and Performance, 30(1), 212–233. https://doi.org/10.1037/0096-1523.30.1.212

Murillo, D. B., Sánchez, C. C., Moreside, J., Vera-García, F. J., & Moreno, F. J. (2017). Can the structure of motor variability predict learning rate? Journal of Experimental Psychology: Human Perception and Performance, 43(3), 596–607. https://doi.org/10.1037/xhp0000303

Mussa-Ivaldi, F. A., Casadio, M., Danziger, Z. C., Mosier, K. M., & Scheidt, R. A. (2011). Sensory motor remapping of space in human–machine interfaces. In Progress in Brain Research (Vol. 191, pp. 45–64). Elsevier. https://doi.org/10.1016/B978-0-444-53752-2.00014-X

Neilson, P. D. (1993). The problem of redundancy in movement control: The adaptive model theory approach. Psychological Research, 55(2), 99–106. https://doi.org/10.1007/BF00419640

Newell, K. M. (1986). Constraints on the development of coordination. Motor Development in Children: Aspects of Coordination and Control.

Newell, K. M., Broderick, M. P., Deutsch, K. M., & Slifkin, A. B. (2003). Task goals and change in dynamical degrees of freedom with motor learning. Journal of Experimental Psychology: Human Perception and Performance, 29(2), 379–387. https://doi.org/10.1037/0096-1523.29.2.379

Newell, K. M., & Corcos, D. M. (1993). Variability and Motor Control. Human Kinetics Publishers.

Newell, K. M., & Vaillancourt, D. E. (2001). Dimensional change in motor learning. Human Movement Science, 20(4–5), 695–715. https://doi.org/10.1016/S0167-9457(01)00073-2

Nguyen-Tuong, D., Seeger, M., & Peters, J. (2009). Model Learning with Local Gaussian Process Regression. Advanced Robotics, 23(15), 2015–2034. https://doi.org/10.1163/016918609X12529286896877

Nikooyan, A. A., & Ahmed, A. A. (2015). Reward feedback accelerates motor learning. Journal of Neurophysiology, 113(2), 633–646. https://doi.org/10.1152/jn.00032.2014

Ranganathan, R., Wieser, J., Mosier, K. M., Mussa-Ivaldi, F. A., & Scheidt, R. A. (2014). Learning Redundant Motor Tasks with and without Overlapping Dimensions: Facilitation and Interference Effects. Journal of Neuroscience, 34(24), 8289–8299. https://doi.org/10.1523/JNEUROSCI.4455-13.2014

Ranganathan, R., Wang, R., Gebara, R., & Biswas, S. (2017). Detecting Compensatory Trunk Movements in Stroke Survivors using a Wearable System. Proceedings of the 2017 Workshop on Wearable Systems and Applications - WearSys '17, 29–32. https://doi.org/10.1145/3089351.3089353

Reynolds, G. S. (1961). Relativity of response rate and reinforcement frequency in a multiple schedule. Journal of the Experimental Analysis of Behavior, 4(2), 179–184.

Rosenbaum, D. A., & Jorgensen, M. J. (1992). Planning macroscopic aspects of manual control. Human Movement Science, 11(1–2), 61–69. https://doi.org/10.1016/0167-9457(92)90050-L

Safonova, A., Hodgins, J. K., & Pollard, N. S. (2004). Synthesizing physically realistic human motion in low-dimensional, behavior-specific spaces. ACM Transactions on Graphics (ToG), 23(3), 514–521.

Schaal, S., Ijspeert, A., & Billard, A. (2003). Computational approaches to motor learning by imitation. Philosophical Transactions of the Royal Society B: Biological Sciences, 358(1431), 537–547. https://doi.org/10.1098/rstb.2002.1258
Schaal, S., & Schweighofer, N. (2005). Computational motor control in humans and robots. Current Opinion in Neurobiology, 15(6), 675–682. https://doi.org/10.1016/j.conb.2005.10.009

Scholz, J. P., & Schöner, G. (1999). The uncontrolled manifold concept: Identifying control variables for a functional task. Experimental Brain Research, 126(3), 289–306. https://doi.org/10.1007/s002210050738

Scholz, J. P., Danion, F., Latash, M. L., & Schöner, G. (2002). Understanding finger coordination through analysis of the structure of force variability. Biological Cybernetics, 86(1), 29–39.

Schoner, G., & Kelso, J. (1988). Dynamic pattern generation in behavioral and neural systems. Science, 239(4847), 1513–1520. https://doi.org/10.1126/science.3281253

Selinger, J. C., O'Connor, S. M., Wong, J. D., & Donelan, J. M. (2015). Humans Can Continuously Optimize Energetic Cost during Walking. Current Biology, 25(18), 2452–2456. https://doi.org/10.1016/j.cub.2015.08.016

Shadmehr, R., Smith, M. A., & Krakauer, J. W. (2010). Error Correction, Sensory Prediction, and Adaptation in Motor Control. Annual Review of Neuroscience, 33(1), 89–108. https://doi.org/10.1146/annurev-neuro-060909-153135

Singh, P., Jana, S., Ghosal, A., & Murthy, A. (2016). Exploration of joint redundancy but not task space variability facilitates supervised motor learning. Proceedings of the National Academy of Sciences, 113(50), 14414–14419. https://doi.org/10.1073/pnas.1613383113

Skinner, B. F. (1938). The Behavior of Organisms. Appleton-Century-Crofts, New York.

Stergiou, N., & Decker, L. M. (2011). Human movement variability, nonlinear dynamics, and pathology: Is there a connection? Human Movement Science, 30(5), 869–888. https://doi.org/10.1016/j.humov.2011.06.002

Sternad, D. (1998). Hot Topics in Motor Control and Learning: A Dynamic Systems Perspective to Perception and Action. Research Quarterly for Exercise and Sport, 69(4), 319–325. https://doi.org/10.1080/02701367.1998.10607705

Sutton, R. S., & Barto, A. G. (2017). Reinforcement Learning: An Introduction (2nd ed.). The MIT Press.

Therrien, A. S., Wolpert, D. M., & Bastian, A. J. (2016). Effective reinforcement learning following cerebellar damage requires a balance between exploration and motor noise. Brain, 139(1), 101–114. https://doi.org/10.1093/brain/awv329

Therrien, A. S., Wolpert, D. M., & Bastian, A. J. (2018). Increasing Motor Noise Impairs Reinforcement Learning in Healthy Individuals. Eneuro, 5(3), ENEURO.0050-18.2018. https://doi.org/10.1523/ENEURO.0050-18.2018

Thorp, E. B., Kording, K. P., & Mussa-Ivaldi, F. A. (2017). Using noise to shape motor learning. Journal of Neurophysiology, 117(2), 728–737. https://doi.org/10.1152/jn.00493.2016

Todorov, E., & Ghahramani, Z. (2003). Unsupervised learning of sensory-motor primitives. Proceedings of the 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (IEEE Cat. No.03CH37439), 1750–1753. https://doi.org/10.1109/IEMBS.2003.1279744

Todorov, E., & Jordan, M. I. (2002). Optimal feedback control as a theory of motor coordination. Nature Neuroscience, 5(11), 1226–1235. https://doi.org/10.1038/nn963

Tseng, Y., Diedrichsen, J., Krakauer, J. W., Shadmehr, R., & Bastian, A. J. (2007). Sensory Prediction Errors Drive Cerebellum-Dependent Adaptation of Reaching. Journal of Neurophysiology, 98(1), 54–62. https://doi.org/10.1152/jn.00266.2007
Vereijken, B., van Emmerik, R. E. A., Whiting, H. T. A., & Newell, K. M. (1992). Free(z)ing Degrees of Freedom in Skill Acquisition. Journal of Motor Behavior, 24(1), 133–142. https://doi.org/10.1080/00222895.1992.9941608

Verstynen, T., & Sabes, P. N. (2011). How Each Movement Changes the Next: An Experimental and Theoretical Study of Fast Adaptive Priors in Reaching. Journal of Neuroscience, 31(27), 10050–10059. https://doi.org/10.1523/JNEUROSCI.6525-10.2011

Wang, L., & Suter, D. (2007). Learning and Matching of Dynamic Shape Manifolds for Human Action Recognition. IEEE Transactions on Image Processing, 16(6), 1646–1661. https://doi.org/10.1109/TIP.2007.896661

Witte, K., Ganter, N., Baumgart, C., & Peham, C. (2010). Applying a principal component analysis to movement coordination in sport. Mathematical and Computer Modelling of Dynamical Systems, 16(5), 477–488. https://doi.org/10.1080/13873954.2010.507079

Wolpert, D. M., Diedrichsen, J., & Flanagan, J. R. (2011). Principles of sensorimotor learning. Nature Reviews Neuroscience, 12(12), 739. https://doi.org/10.1038/nrn3112

Wolpert, D. M., & Flanagan, J. R. (2016). Computations underlying sensorimotor learning. Current Opinion in Neurobiology, 37, 7–11. https://doi.org/10.1016/j.conb.2015.12.003

Wolpert, D. M., Ghahramani, Z., & Flanagan, J. R. (2001). Perspectives and problems in motor learning. Trends in Cognitive Sciences, 5(11), 487–494. https://doi.org/10.1016/S1364-6613(00)01773-3

Wolpert, D. M., Ghahramani, Z., & Jordan, M. (1995). An internal model for sensorimotor integration. Science, 269(5232), 1880–1882. https://doi.org/10.1126/science.7569931

Wu, H. G., Miyamoto, Y. R., Castro, L. N. G., Ölveczky, B. P., & Smith, M. A. (2014). Temporal structure of motor variability is dynamically regulated and predicts motor learning ability. Nature Neuroscience, 17(2), 312–321. https://doi.org/10.1038/nn.3616

Wulf, G., & Weigelt, C. (1997). Instructions about Physical Principles in Learning a Complex Motor Skill: To Tell or Not to Tell…. Research Quarterly for Exercise and Sport, 68(4), 362–367. https://doi.org/10.1080/02701367.1997.10608018

Yang, J., & Scholz, J. P. (2005). Learning a throwing task is associated with differential changes in the use of motor abundance. Experimental Brain Research, 163(2), 137–158. https://doi.org/10.1007/s00221-004-2149-x