RESILIENT AND SAFE CONTROL OF CYBER-PHYSICAL SYSTEMS UNDER UNCERTAINTIES AND ADVERSARIES

By

Aquib Mustafa

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Mechanical Engineering – Doctor of Philosophy

2020

ABSTRACT

RESILIENT AND SAFE CONTROL OF CYBER-PHYSICAL SYSTEMS UNDER UNCERTAINTIES AND ADVERSARIES

By Aquib Mustafa

The recent growth of cyber-physical systems with a wide range of applications such as smart grids, healthcare, search and rescue, and traffic monitoring, to name a few, brings new challenges to control systems due to the presence of significant uncertainties and undesired signals (i.e., disturbances and cyber-physical attacks). Thus, it is of vital importance to design resilient and safe control approaches that can adapt to the situation and mitigate adversaries to ensure an acceptable level of functionality and autonomy despite uncertainties and cyber-physical attacks.

This dissertation begins with the analysis of adversaries and the design of resilient distributed control mechanisms for multi-agent cyber-physical systems with guaranteed performance and consensus under mild assumptions. More specifically, the adverse effects of cyber-physical attacks are first analyzed for the synchronization of multi-agent cyber-physical systems. Then, information-theoretic detection and mitigation methods are presented by equipping agents with self-belief about the trustworthiness of their own information and trust about their neighbors. The effectiveness of the developed approach is then certified by applying it to distributed frequency and voltage synchronization of AC microgrids under data manipulation attacks. In the next step, to relax some connectivity assumptions on the network for the resilient control design, a distributed adaptive attack compensator is developed by estimating the normal expected behavior of agents. The adaptive attack compensator is augmented with the controller, and it is shown that the proposed controller achieves resilient synchronization in the presence of attacks on sensors and actuators. Moreover, this approach recovers compromised agents under actuator attacks and avoids propagation of attacks on sensors without discarding information from the compromised agents.

Then, the problem of secure state estimation for distributed sensor networks is considered. More specifically, the adverse effects of cyber-physical attacks on distributed sensor networks are analyzed, and an attack mitigation mechanism for the event-triggered distributed Kalman filter is presented. It is shown that although event-triggered mechanisms are highly desirable, an attacker can leverage the event-triggered mechanism to cause triggering misbehaviors which significantly harm the network connectivity and performance. Entropy estimation-based attack detection and mitigation mechanisms are then designed.

Finally, a safe reinforcement learning framework for autonomous control systems under constraints is developed. Reinforcement learning agents with pre-specified reward functions cannot provide guaranteed safety across the variety of circumstances that an uncertain system might encounter. To guarantee performance while assuring the satisfaction of safety constraints across a variety of circumstances, an assured autonomous control framework is designed by empowering reinforcement learning algorithms with meta-cognitive learning capabilities.
More specifically, the reward function parameters of the reinforcement learning agent are adapted in a meta-cognitive decision-making layer to assure the feasibility of the reinforcement learning agent.

Copyright by
AQUIB MUSTAFA
2020

ACKNOWLEDGEMENTS

I would like to express sincere thanks to my advisor Prof. Hamidreza Modares for his guidance, constant encouragement, and impeccable support during my doctoral research. I have been fortunate to learn the art of research under his exceptional tutelage. Apart from research, I have learned various things from him in my life during this journey.

I am thankful to my committee members, Prof. Ranjan Mukherjee, Prof. George Zhu, and Prof. Zhaojian Li, for their help and insightful comments. I thank all my collaborators for all their help throughout my doctoral work. I thank Prof. Ali Bidram for his guidance and immense support for Chapter 3 of this dissertation. I thank Dr. Majid Mazouchi for all his help and discussions in collaborative projects.

I believe the outcome of my years-long work was, by and large, a product of the excellent lab ambiance and support. I sincerely thank all my lab mates and friends. I am highly thankful to my close friends from Kanpur, Aligarh, Missouri, Colorado and of course Michigan for their help and support throughout this journey.

Finally, I would like to thank all my family members for their unconditional support and encouragement during this journey. Especially, I would like to thank my parents and elder brothers for their immense support and sacrifice. I shall remain ever indebted to them.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

CHAPTER 1 INTRODUCTION
  1.1 Motivation
  1.2 Literature Synopsis
    1.2.1 Resilient Control Design for Distributed Multi-Agent Systems
    1.2.2 Secure Distributed State Estimation
    1.2.3 Safe Reinforcement Learning
  1.3 Contributions and outline
  1.4 Publications resulted from this work

CHAPTER 2 RESILIENT SYNCHRONIZATION OF DISTRIBUTED MULTI-AGENT SYSTEMS UNDER ATTACKS
  2.1 Introduction
  2.2 Preliminaries
  2.3 Overview of Consensus in DMASs
  2.4 Attack Modelling and Analysis for DMASs
    2.4.1 Attack Modelling
    2.4.2 Attack Analysis
    2.4.3 Extension of Analysis Results to the Case of Noisy Communication
  2.5 An Attack Detection Mechanism
    2.5.1 Attack detection for IMP-based attacks
    2.5.2 Attack detection for non-IMP-based attacks
  2.6 An Attack Mitigation Mechanism
    2.6.1 Self-belief of agents about their outgoing information
    2.6.2 Trust of agents about their incoming information
    2.6.3 The mitigation mechanism using trust and self-belief values
  2.7 Simulation Results
    2.7.1 IMP-based attacks
    2.7.2 Non-IMP-based attacks
  2.8 Conclusion
  2.9 Appendix

CHAPTER 3 DETECTION AND MITIGATION OF DATA MANIPULATION ATTACKS IN AC MICROGRIDS
  3.1 Introduction
  3.2 Preliminaries
  3.3 Conventional Distributed Secondary Control
  3.4 Attack Modeling and Detection Mechanism
    3.4.1 Attack Modeling
    3.4.2 Attack Detection Mechanism
  3.5 Resilient Distributed Control Mechanism
    3.5.1 Belief of DERs About Their Own Observed Frequency
    3.5.2 Belief of DERs About Their Neighbor's Observed Frequency
    3.5.3 The Mitigation Mechanism Using Self and External-belief Values
  3.6 Case Studies
    3.6.1 Case A: Simulation results for IEEE 34-bus feeder
    3.6.2 Case B: Simulation results for an Islanded Microgrid with 20 DERs
    3.6.3 Case C: Experimental verification of proposed techniques using a hardware-in-the-loop testing setup
    3.6.4 Conclusion

CHAPTER 4 ATTACK ANALYSIS AND RESILIENT CONTROL DESIGN FOR DISCRETE-TIME DISTRIBUTED MULTI-AGENT SYSTEMS
  4.1 Introduction
  4.2 Preliminaries
    4.2.1 Graph Theory
    4.2.2 Standard Distributed Consensus in MAS
  4.3 Attack Analysis for Discrete-time DMAS
  4.4 Resilient Distributed Control Protocol for Attacks on Sensor and Actuator: An Adaptive Approach
  4.5 Simulation Results
  4.6 Conclusion

CHAPTER 5 SECURE EVENT-TRIGGERED DISTRIBUTED KALMAN FILTERS FOR STATE ESTIMATION OVER WIRELESS SENSOR NETWORKS
  5.1 Introduction
  5.2 Preliminaries
    5.2.1 Process Dynamics and Sensor Models
    5.2.2 Overview of Event-triggered Distributed Kalman Filter
    5.2.3 Attack Modeling
  5.3 Effect of Attack on Triggering Mechanism
    5.3.1 Non-triggering Misbehavior
    5.3.2 Continuous-triggering Misbehavior
  5.4 Attack Detection
  5.5 Secure Distributed Estimation Mechanism
    5.5.1 Confidence of sensor nodes
    5.5.2 Trust of sensor nodes about their incoming information
    5.5.3 Attack mitigation mechanism using confidence and trust of sensors
  5.6 Simulation Results
  5.7 Conclusion

CHAPTER 6 ASSURED LEARNING-ENABLED AUTONOMY: A METACOGNITIVE REINFORCEMENT LEARNING FRAMEWORK
  6.1 Introduction
  6.2 Preliminaries
    6.2.1 Notations
    6.2.2 Signal Temporal Logic
    6.2.3 Gaussian process
  6.3 Problem Statement and Motivation
  6.4 Metacognitive Control Architecture
    6.4.1 Metacognitive Layer Monitoring and Control
      6.4.1.1 Metacognitive Monitoring
      6.4.1.2 Metacognitive Control
  6.5 Low-Level RL-Based Control Architecture
  6.6 Simulation Results
  6.7 Conclusion

CHAPTER 7 CONCLUSION AND FUTURE WORK

APPENDIX
BIBLIOGRAPHY

LIST OF TABLES

Table 6.1: Vehicle Parameters

LIST OF FIGURES

Figure 2.1: Schematic representation of the proposed resilient approach for DMASs.
Figure 2.2: Communication topology.
Figure 2.3: The state of agents when Agent 1 is under an IMP-based attack.
Figure 2.4: Agent 5 is under IMP-based attack: the state of agents.
Figure 2.5: Agent 5 is under IMP-based attack: the local neighborhood tracking error of agents.
Figure 2.6: Divergence of the state of agents when Agent 5 is under an IMP-based attack.
Figure 2.7: The state of agents using the proposed attack detection and mitigation approach for an IMP-based attack.
Figure 2.8: The state of agents when Agent 5 is under a non-IMP-based attack.
Figure 2.9: Divergence of the state of agents when Agent 5 is under a non-IMP-based attack.
Figure 2.10: The state of agents after attack detection and mitigation for a non-IMP-based attack.
Figure 3.1: The flowchart of the proposed attack detection and mitigation approach.
Figure 3.2: Single-line diagram of the microgrid test system in Case A.
Figure 3.3: Communication graph of the microgrid test system in Case A.
Figure 3.4: Case A: Effect of attack on DSFC: (a) frequency; (b) active power ratio.
Figure 3.5: Case A: Relative entropy based on frequency of DERs.
Figure 3.6: Case A: Resilient DSFC: (a) frequency; (b) active power ratio.
Figure 3.7: Case A: Resilient DSFC: (a) relative entropy; (b) self-beliefs of DERs.
Figure 3.8: Effect of periodic attack on DSFC with 0.05 s duration: (a) frequency; (b) active power ratio.
Figure 3.9: Effect of periodic attack on DSFC with 0.05 s duration: (a) relative entropy; (b) self-beliefs of DERs.
Figure 3.10: Effect of periodic attack on DSFC with 0.5 s duration: (a) frequency; (b) active power ratio.
Figure 3.11: Effect of periodic attack on DSFC with 0.5 s duration: (a) relative entropy; (b) self-beliefs of DERs.
Figure 3.12: Effect of attack on DER 2 in DSVC: (a) voltage (V); (b) reactive power ratio.
Figure 3.13: Case A: Relative entropy based on voltage of DERs.
Figure 3.14: Case A: Resilient DSVC: (a) voltage (V); (b) reactive power ratio.
Figure 3.15: Case A: Resilient DSVC: (a) relative entropy; (b) self-belief of DERs.
Figure 3.16: Microgrid testbed with 20 DERs.
Figure 3.17: Communication graph of the microgrid testbed in Case B.
Figure 3.18: Case B: Effect of attack on DSFC: (a) frequency; (b) active power ratio.
Figure 3.19: Case B: Relative entropy based on frequency of DERs.
Figure 3.20: Case B: Resilient DSFC: (a) frequency; (b) active power ratio.
Figure 3.21: Case B: Resilient DSFC: (a) relative entropy; (b) self-belief of DERs.
Figure 3.22: Microgrid test system for HIL testing.
Figure 3.23: HIL setup.
Figure 3.24: Case C: Effect of attack on DSFC: (a) frequency; (b) active power ratio.
Figure 3.25: Case C: Relative entropy based on frequency of DERs.
Figure 3.26: Case C: Resilient DSFC: (a) frequency; (b) active power ratio.
Figure 4.1: Graph topology.
Figure 4.2: The agents' depth trajectories under the influence of the attacks on AUVs 2 and 3, without the adaptive compensator.
Figure 4.3: The agents' depth trajectories under the influence of the attacks on AUVs 2 and 3, with the adaptive compensator.
Figure 5.1: Effect of non-triggering misbehavior on sensor nodes {5,6}: the graph G is clustered into two isolated graphs G1 and G2.
Figure 5.2: Communication topology.
Figure 5.3: Sensor network without any attack: (a) state estimation errors; (b) transmit function for sensor 2.
Figure 5.4: Sensor node 2 under continuous-triggering misbehavior: (a) state estimation errors; (b) transmit function for sensor 2.
Figure 5.5: Sensor node 2 under non-triggering misbehavior: (a) state estimation errors; (b) transmit function for sensor 2.
Figure 5.6: Sensor node 2 under attack: (a) estimated KL divergence; (b) confidence of sensors.
Figure 5.7: State estimation errors under attack on sensor 2 using the proposed resilient state estimator.
Figure 6.1: Proposed metacognitive control scheme. S: the system to be controlled; K: low-level RL controller; C: high-level metacognitive layer scheme.
Figure 6.2: Lane-changing scenario for steering control of an autonomous vehicle.
Figure 6.3: Lane changing with a fixed value of the hyperparameter and no change in dynamics.
Figure 6.4: Constraint violation during lane changing with a fixed value of the hyperparameter and a change in dynamics.
Figure 6.5: Predicted fitness corresponding to the vehicle trajectory under normal operation.
Figure 6.6: Predicted fitness corresponding to the vehicle trajectory under constraint violation.
Figure 6.7: Overall fitness value under desired STL constraint violation for the vehicle trajectory.
Figure 6.8: Vehicle trajectory with hyperparameter adaptation based on Algorithm 1 for the lane-changing scenario.
Figure 6.9: Predicted fitness corresponding to the vehicle trajectory with hyperparameter adaptation based on Algorithm 1.
Figure 6.10: Overall fitness value under the desired STL constraint for the adapted vehicle trajectory.

CHAPTER 1 INTRODUCTION

This chapter presents the motivation, literature synopsis, and contributions of this dissertation.

1.1 Motivation

A cyber-physical system (CPS) refers to a class of engineering systems that integrates the cyber aspects of computation and communication with physical entities. Based on their control objectives, CPSs can be categorized into two classes, namely distributed multi-agent systems (DMASs) and networked control systems (NCSs). The control objective in a DMAS is to achieve a coordinated or synchronized motion or behavior through the exchange of local information among agents [1]-[4]. On the other hand, the control objective in an NCS, for which the feedback loops are closed through a communication network, is to regulate the system's output to a desired value or trajectory [86]. Despite their numerous applications in a variety of disciplines, DMASs and NCSs are cyber-physical systems that bring new challenges to control systems due to the presence of significant uncertainties and undesired signals (i.e., disturbances and cyber-physical attacks).
Thus, it is of vital importance to design resilient and safe control approaches that can adapt to the situation and mitigate adversaries to ensure an acceptable level of functionality and autonomy despite uncertainties and cyber-physical attacks.

The first part of this dissertation focuses on attack analysis and resilient designs for DMASs. In the case of synchronization of DMASs, the coordination objective is to guarantee that all agents reach agreement on a common value or trajectory of interest. DMASs are cyber-physical systems that incorporate communication as a cyber component to facilitate the exchange of information among agents. This, however, makes them vulnerable to a variety of attacks. In contrast to other undesirable inputs, such as disturbances and noises, attacks are intentionally planned to maximize the damage to the network. Therefore, it is important to analyze the adverse effects of attacks on performance and then design resilient DMASs that can mitigate attacks and guarantee an acceptable level of functionality despite attacks. To address this problem of interest, Chapters 2, 3, and 4 of this dissertation focus on attack analysis and resilient designs for DMASs.

Next, to perform monitoring and to successfully design a controller for systems where state measurements are not available, one needs to perform state estimation over wireless sensor networks (WSNs). WSNs are a class of multi-agent CPSs for which a set of sensors is spatially distributed to monitor and estimate a variable of interest (e.g., the location of a moving target, the state of a large-scale system, etc.), and they have various applications such as surveillance and monitoring, target tracking, and active health monitoring [103]. In centralized WSNs, all sensors broadcast their measurements to a center at which the information is fused to estimate the state [104]. These approaches, however, are communication demanding and prone to a single point of failure. To estimate the state with a reduced communication burden, a distributed Kalman filter (DKF) is presented in [105]-[111], in which sensors exchange their information only with their neighbors, not with all agents in the network or a central agent. Cost constraints on sensor nodes in a WSN result in corresponding constraints on resources such as energy and communication bandwidth. Sensors in a WSN usually carry limited, irreplaceable energy resources, and lifetime adequacy is a significant restriction of almost all WSNs. Therefore, it is important to design event-triggered DKFs to reduce the communication burden, which consequently improves energy efficiency. To this end, several energy-efficient event-triggered distributed state estimation approaches are presented for which sensor nodes intermittently exchange information [112]-[115]. Moreover, the importance of the event-triggered state estimation problem is also reported for several practical applications such as smart grids and robotics [116]-[119]. Although event-triggered distributed state estimation is resource-efficient, it provides an opportunity for an attacker to harm the network performance and its connectivity by corrupting the information that is exchanged among sensors, as well as to mislead the event-triggered mechanism. Thus, it is of vital importance to design a resilient event-triggered distributed state estimation approach that can perform accurate state estimation despite attacks.
To address this problem, Chapter 5 of this dissertation first analyzes the adverse effects of attacks and then presents a secure state estimator for distributed sensor networks.

Finally, the safe control design problem for autonomous CPSs under uncertainties is addressed. More specifically, a safe reinforcement learning (RL) framework for autonomous control systems under constraints is developed. RL is a goal-oriented learning approach, inspired by biological systems, and is concerned with designing agents that can take actions in an environment to maximize some notion of cumulative reward [138]-[141]. Despite the tremendous success of RL in a variety of applications, including robotics [142], control [143], and human-computer interaction [144], existing results are categorized as weak artificial intelligence (AI) [145]. That is, current RL practice has mainly been used to achieve pre-specified goals in structured environments by handcrafting a cost or reward function whose minimization guarantees reaching the goal. Strong AI, on the other hand, holds the promise of designing agents that can learn to achieve goals across multiple circumstances by generalizing to unforeseen and novel situations. As the designer cannot foresee all the circumstances that the agent might encounter, pre-specifying and handcrafting the reward function cannot guarantee reaching goals in an uncertain and non-stationary environment. Reward shaping [146]-[147] has been presented in the literature with the primary goal of speeding up learning without changing the outcome of the solution. Intrinsically motivated RL [148] has also been presented so that agents learn to shape their reward function for a better trade-off between exploration and exploitation, or to learn faster in applications where the external environmental reward is sparse. Thus, in Chapter 6 of this dissertation, to guarantee performance while assuring the satisfaction of safety constraints across a variety of circumstances, an assured autonomous control framework is presented by empowering RL algorithms with meta-cognitive learning capabilities.

1.2 Literature Synopsis

In this section, we review the literature in areas relevant to this dissertation. We organize the literature according to the broad topics of interest in this dissertation.

1.2.1 Resilient Control Design for Distributed Multi-Agent Systems

There has been a surge of interest in developing attack detection/identification and mitigation approaches for cyber-physical systems [5]-[25], including DMASs [16]-[25]. In [5]-[6], conditions under which an attacker can remain unnoticed are presented, followed by detection and identification mechanisms for attacks on sensors and actuators of cyber-physical systems. Resilient state estimation and control algorithms for cyber-physical systems under attacks are reported in [7]-[8]. A passivity-based attack mitigation mechanism is proposed in [9]. Teixeira et al. in [10] categorized attacks based on the attacker's knowledge, disclosure, and disruption resources, and characterized their impact using the concept of safe sets for cyber-physical systems. Various game-theoretic resilient state estimation approaches are presented in [11]-[14]. Recently, in [15], the authors presented proactive and reactive defense mechanisms to mitigate attacks on sensors and actuators.
Although elegant, these aforementioned results do not apply to DMASs, for which the overall objective is to synchronize the agents' states to some value of interest. For DMASs, attack detection and mitigation algorithms are presented in [16]-[18]. Mean square subsequence based resilient distributed control protocols for consensus of DMASs are presented in [19]-[21]. In these approaches, agents discard neighbors' information based on the discrepancy between their neighbors' values and their own values. Moreover, the maximum number of agents under attack is assumed known, and a network connectivity assumption is made based on it. Adaptive local resilient control protocols are designed in [22] to directly mitigate attacks, without identifying them, using an observer-based approach. An attacker-defender game framework is presented in [23] on networks with unknown graph topology, in which the defender injects control inputs to reach a consensus while attenuating the attack signal from compromised agents. Similarly, a controllability-Gramian-based game-theoretic approach is presented for resilient distributed consensus in [24]. A comprehensive survey on the security of cyber-physical systems is presented in [25], categorizing the reported results for DMASs into three classes, called prevention, resilience, and detection-isolation. Despite tremendous and welcome progress, most of the mentioned mitigation approaches for DMASs use the discrepancy among agents and their neighbors to detect and mitigate the effect of an attack. However, as shown in Chapters 2 and 4, a stealthy attack can make all agents unstable simultaneously and thus misguide existing mitigation approaches. Moreover, this discrepancy could be caused by a legitimate change in the state of an agent, and rejecting this useful information can decrease the speed of convergence to the desired consensus and harm the connectivity of the network.

Several remarkable results are presented for resilient control designs in important applications such as robotics and power systems. In [87]-[88], the authors presented resilient algorithms for flocking and active target tracking applications in robotics, respectively. The work presented in [87] ensures resilient consensus if the network connectivity is greater than a resilience threshold, under the assumption that a compromised agent can share wrong information but its actuator always works properly. The network connectivity constraints on the graph topology are relaxed in [89] by including trusted nodes. Similarly, the bulk of the research on the cybersecurity of power systems focuses mainly on attack detection techniques [66]-[73]. Different techniques, including adaptive cumulative sum using Markov-chain analysis [66], Kalman filters [67], a graphical method [68], a model-based scheme [69], a matrix separation technique [70], a Chi-square detector with a cosine similarity matching approach [71], and a nonlinear internal observer [72], are introduced for attack detection in power systems with a centralized control structure. The proposed attack detection filter in [16] and the systematic detection and localization strategy in [73] tackle attack detection in distributed control systems. In [74], signal temporal logic has been utilized for attack detection in a distributed control system. Attack mitigation has also recently been considered in power systems.
In [75], sensor fault detection and mitigation schemes are proposed to mitigate the impacts of cyber-attacks in DC power systems with a centralized control structure. Reference [76] proposes a trust/confidence-based approach for cyber-attack mitigation in the distributed control system of DC microgrids. In [78], a two-fold strategy is proposed to mitigate the impacts of FDI attacks on the control system of the shipboard power system. In [79], a trust/confidence-based control protocol is proposed to mitigate the impact of attacks on the distributed secondary control of AC microgrids. This approach, however, only considers the secondary frequency control and does not address the attack mitigation of secondary voltage control. In Chapter 3, we present FDI-attack detection and mitigation approaches for distributed secondary control of microgrids that are not limited to any specific type of attack, with only mild restrictions on network connectivity.

1.2.2 Secure Distributed State Estimation

In recent years, secure state estimation of CPSs has received significant attention, and remarkable results have been reported for the mitigation of cyber-physical attacks, including denial-of-service attacks [10], [120], false data injection attacks [5]-[7], [121]-[122], and bias injection attacks [36], [123]. For the time-triggered distributed scenario, several secure state estimation approaches are presented in [124]-[131]. Specifically, in [124]-[132], the authors presented a distributed estimator that allows agents to perform parameter estimation in the presence of attack by discarding information from the adversarial agents. A Byzantine-resilient distributed estimator with deterministic process dynamics is discussed in [126]. The same authors then solved the resilient distributed estimation problem with communication losses and intermittent measurements in [127]. Attack analysis and detection for distributed Kalman filters are discussed in [128]. Resilient state estimation subject to denial-of-service attacks for power system and robotics applications is presented in [129]-[131]. Although elegant, these aforementioned results for time-triggered resilient state estimation do not apply to event-triggered distributed state estimation problems. To solve this problem, we analyze the effect of adversaries and design secure event-triggered distributed Kalman filters for state estimation over wireless sensor networks in Chapter 5.

1.2.3 Safe Reinforcement Learning

In the control community, several RL-based feedback controllers have been presented for the control of uncertain dynamical systems [149]-[153]. In these traditional RL-based controllers, the reinforcement signal feedback is derived through a fixed quadratic objective function [154]-[155]. A fixed reward or objective function, however, cannot guarantee achieving desired specifications across all circumstances. To express rich specifications rather than quadratic objectives, temporal logic, as an expressive language close to human language, has been widely used. Temporal logic is well suited for specifying goals and introducing domain knowledge into the RL problem [156]-[159]. RL with temporal logic specifications has also been used recently [160]-[163]. However, defining the rewards solely based on temporal logic specifications and ignoring numerical rewards can result in sparse feedback in control systems and cannot include other performance objectives such as energy and time minimization.
Moreover, the system pursues several objectives, and as the circumstance changes, the system's needs and priorities also change, requiring the reward signal to be adapted to encode these needs and priorities for the context. It is therefore desired to design a controller that provides good enough performance across a variety of circumstances while assuring that its safety-related temporal logic specifications are satisfied. To this end, Chapter 6 of this dissertation takes a step towards strong AI for feedback control design by presenting a notion of an adaptive reward function, introducing a metacognitive layer that decides on what reward function to optimize depending on the circumstance. More specifically, a metacognitive assured RL framework is presented to learn control solutions with good performance while satisfying desired specifications.

1.3 Contributions and outline

In this section, we outline the organization of the chapters in this dissertation and provide the contributions of each chapter. The key contributions of the dissertation are listed as follows.

• Attack Analysis and Resilient Designs for Multi-agent CPSs

Chapter 2: In this chapter, we first address the adverse effects of attacks on distributed synchronization of multi-agent systems by providing conditions under which an attacker can destabilize the underlying network, as well as another set of conditions under which the local neighborhood tracking errors of intact agents converge to zero. Based on this analysis, we propose a Kullback-Leibler divergence-based criterion in view of which each agent detects its neighbors' misbehavior and, consequently, forms a self-belief about the trustworthiness of the information it receives. Agents continuously update their self-beliefs and communicate them with their neighbors to inform them of the significance of their outgoing information. Moreover, if the self-belief of an agent is low, it forms trust in its neighbors. Agents incorporate their neighbors' self-beliefs and their own trust values in their control protocols to slow down and mitigate attacks. We show that using the proposed resilient approach, an agent discards the information it receives from a neighbor only if its neighbor is compromised, and not solely based on the discrepancy among neighbors' information, which might be caused by legitimate changes rather than attacks. The proposed approach is guaranteed to work under mild connectivity assumptions.

Chapter 3: This chapter presents a resilient control framework for distributed frequency and voltage control of AC microgrids under data manipulation attacks. In order for each distributed energy resource (DER) to detect any misbehavior of its neighboring DERs, an attack detection mechanism is first presented using a Kullback-Leibler divergence-based criterion. An attack mitigation technique is then proposed that utilizes the calculated KL divergence factors to determine trust values indicating the trustworthiness of the received information. Moreover, DERs continuously generate a self-belief factor and communicate it with their neighbors to inform them of the validity level of their own outgoing information. DERs incorporate their neighbors' self-beliefs and their own trust values in their control protocols to slow down and mitigate attacks. It is shown that the proposed cybersecure control effectively distinguishes data manipulation attacks from legitimate events.
The performance of the proposed resilient frequency and voltage control techniques is verified through simulation of microgrid test systems and a hardware-in-the-loop (HIL) setup using Opal-RT as a real-time digital simulator.

Chapter 4: This chapter analyzes the adverse effects of cyber-physical attacks on discrete-time distributed multi-agent systems and proposes a mitigation approach for attacks on sensors and actuators. First, we show how an attack on a single node snowballs into a network-wide attack and can even destabilize the entire system. Next, to overcome the adversarial effects of attacks on sensors and actuators, a distributed adaptive attack compensator is designed by estimating the normal expected behavior of agents. The adaptive attack compensator is augmented with the controller, and it is shown that the proposed controller achieves secure consensus in the presence of attacks on sensors and actuators. No restrictive assumption on the number of agents under adversarial input is made. Moreover, the approach recovers compromised agents under actuator attacks and avoids propagation of attacks on sensors without discarding information from the compromised agents. Finally, numerical simulations validate the effectiveness of the presented theoretical contributions on a network of Sentry autonomous underwater vehicles.

Chapter 5: In this chapter, we analyze the adverse effects of cyber-physical attacks as well as mitigate their impacts on the event-triggered distributed Kalman filter (DKF). We first show that although event-triggered mechanisms are highly desirable, an attacker can leverage the event-triggered mechanism to cause non-triggering misbehavior which significantly harms the network connectivity and its collective observability. We also show that an attacker can mislead the event-triggered mechanism to achieve continuous-triggering misbehavior which not only drains the communication resources but also harms the network's performance. An information-theoretic approach is presented next to detect attacks on both sensors and communication channels. In contrast to existing results, a restrictive Gaussian assumption on the attack signal's probability distribution is not required. To mitigate attacks, a meta-Bayesian approach is presented that incorporates the outcome of the attack detection mechanism to perform second-order inference. The proposed second-order inference forms confidence and trust values about the truthfulness or legitimacy of sensors' own estimates and those of their neighbors, respectively. Each sensor communicates its confidence to its neighbors. Sensors then incorporate the confidence they receive from their neighbors and the trust they formed about their neighbors into their posterior update laws to successfully discard corrupted information. Finally, simulation results validate the effectiveness of the presented resilient event-triggered DKF.

• Safe Reinforcement Learning for Autonomous Systems: A Metacognitive Framework

Chapter 6: This chapter presents a safe reinforcement learning framework for autonomous control systems under constraints. RL agents with pre-specified reward functions cannot provide guaranteed safety across the variety of circumstances that an uncertain system might encounter. To guarantee performance while assuring satisfaction of safety constraints across a variety of circumstances, an assured autonomous control framework is presented by empowering RL algorithms with metacognitive learning capabilities.
More specifically, adapting the reward function parameters of the RL agent is performed in a metacognitive decision-making layer to assure the feasibility of the RL agent, that is, to assure that the policy learned by the RL agent satisfies safety constraints specified by signal temporal logic while achieving as much performance as possible. The metacognitive layer monitors any possible future safety violation under the actions of the RL agent and employs a higher-layer Bayesian RL algorithm to proactively adapt the reward function for the lower-layer RL agent. To minimize the higher-layer Bayesian RL intervention, a fitness function is leveraged by the metacognitive layer as a metric to evaluate the success of the lower-layer RL agent in satisfying safety and liveness specifications, and the higher-layer Bayesian RL intervenes only if there is a risk of lower-layer RL failure. Finally, a simulation example is provided to validate the effectiveness of the proposed approach.

1.4 Publications resulted from this work

Journal Articles:

1. A. Mustafa, H. Modares and R. Moghadam, "Resilient Synchronization of Distributed Multi-agent Systems under Attacks", Automatica, vol. 115, 2020.
2. A. Mustafa and H. Modares, "Attack Analysis and Resilient Control Design for Discrete-time Distributed Multi-agent Systems", IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 369-376, 2020.
3. A. Mustafa, B. Poudel, A. Bidram and H. Modares, "Detection and Mitigation of Data Manipulation Attacks in AC Microgrids", IEEE Transactions on Smart Grid, vol. 11, no. 3, pp. 2588-2603, 2020.
4. B. Poudel, A. Mustafa, A. Bidram and H. Modares, "Detection and Mitigation of Cyber-threats in the DC Microgrid Distributed Control System", International Journal of Electrical Power and Energy Systems, vol. 120, 2020.
5. A. Mustafa, M. Mazouchi and H. Modares, "Secure Event-Triggered Distributed Kalman Filters for State Estimation", IEEE Transactions on Systems, Man and Cybernetics: Systems. (Under review)
6. A. Mustafa, M. Mazouchi, H. Modares and S.P. Nageshrao, "Assured Learning-enabled Autonomy: A Metacognitive Reinforcement Learning Framework", IEEE Transactions on Neural Networks and Learning Systems. (Under review)

Conferences:

1. A. Mustafa and H. Modares, "Attack Analysis for Discrete-time Distributed Multi-Agent Systems", 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 230-237, 2019.
2. A. Mustafa and H. Modares, "Analysis and Detection of Cyber-physical Attacks in Distributed Sensor Networks", 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 973-980, 2018.

CHAPTER 2 RESILIENT SYNCHRONIZATION OF DISTRIBUTED MULTI-AGENT SYSTEMS UNDER ATTACKS

2.1 Introduction

In this chapter, we present attack analysis, detection, and mitigation mechanisms for distributed multi-agent systems (DMASs). First, the adverse effects of cyber-physical attacks on the synchronization of DMASs are described, supported with analysis. Specifically, conditions under which an attack can destabilize the entire network are provided. Moreover, conditions are provided under which the local neighborhood tracking error of intact agents becomes zero while the agents are far from synchronization. The results of this analysis enable us to design detection mechanisms for sophisticated and threatening attacks.
Then, two attack detectors are designed based on Kullback-Leibler (KL) divergence metrics: one to detect attacks that make the local neighborhood tracking error of intact agents zero, and one for attacks for which this error cannot be zero. These detectors are then combined to detect a variety of deception attacks. Finally, to mitigate attacks, self-belief (i.e., the belief in the trustworthiness of an agent's own information) and trust (i.e., the belief in the trustworthiness of neighbors' information) metrics are introduced based on the results from the detection mechanism. A weighted local neighborhood tracking error is introduced in which each agent incorporates its trust in its neighbors as well as the self-beliefs of its neighbors.

2.2 Preliminaries

A directed graph (digraph) G consists of a pair (V, E) in which V = {v_1, ..., v_N} is a set of nodes and E ⊆ V × V is a set of edges. We denote the directed link (edge) from v_j to v_i by the ordered pair (v_j, v_i). The adjacency matrix is defined as A_d = [a_ij], with a_ij > 0 if (v_j, v_i) ∈ E, and a_ij = 0 otherwise. The nodes in the set N_i = {v_j : (v_j, v_i) ∈ E} are said to be neighbors of node v_i. The in-degree of v_i is the number of edges having v_i as a head. The out-degree of a node v_i is the number of edges having v_i as a tail. If the in-degree equals the out-degree for all nodes v_i ∈ V, the graph is said to be balanced. The graph Laplacian matrix is defined as L = D − A_d, where D = diag(d_i) is the in-degree matrix, with d_i = \sum_{j ∈ N_i} a_ij as the weighted in-degree of node v_i. A node is called a root node if it can reach all other nodes of the digraph G through a directed path. A leader is a root node with no incoming link. A (directed) tree is a connected digraph where every node except one, called the root, has in-degree equal to one. A spanning tree of a digraph is a directed tree formed by graph edges which connects all the nodes of the graph.

Throughout the chapter, we denote the set of integers by Z. The set of integers greater than or equal to some integer q ∈ Z is denoted by Z_{≥q}. The cardinality of a set S is denoted by |S|. λ(A) and tr(A) denote, respectively, the eigenvalues and trace of the matrix A. Furthermore, λ_min(A) represents the minimum eigenvalue of the matrix A. The Kronecker product of matrices A and B is denoted by A ⊗ B, and diag(A_1, ..., A_n) represents a block diagonal matrix with matrices A_i, ∀i ∈ N, as its diagonal entries. 1_N is the N-vector of ones and I_N is the N × N identity matrix. ||A|| denotes the Euclidean norm of A. span(a_1, ..., a_n) represents the set of all linear combinations of the vectors a_1, ..., a_n. A Gaussian distribution with mean µ and covariance Σ is denoted by N(µ, Σ). Moreover, FN(µ̄, σ̄²) represents a univariate folded Gaussian distribution with µ̄ and σ̄² as mean and variance, respectively [26]. E[.] denotes the expectation operator. The term "statistical properties" is used for error sequences in this chapter to denote their mean and variance. A system is called stable (i.e., Hurwitz) if all its eigenvalues have negative real parts, and unstable if it has eigenvalues with positive real parts or a repeated pair of eigenvalues on the imaginary axis. In this chapter, the term "destabilize" is used when the attacker makes the system unstable.

Assumption 1. The communication graph G is directed and has a spanning tree.

Note that having a spanning tree is the minimum requirement to guarantee synchronization over a directed graph [27].
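The graph objects defined above are easy to compute numerically. The following minimal sketch (Python/NumPy, not part of the original development) builds the adjacency matrix, in-degree matrix, and Laplacian L = D − A_d for a hypothetical 5-node directed cycle, and checks the spanning-tree condition of Assumption 1 through the multiplicity of the zero eigenvalue of L (the property formalized in Lemma 1 below):

```python
import numpy as np

# Hypothetical 5-node digraph: Ad[i, j] = a_ij > 0 iff edge (v_j, v_i) exists,
# i.e., node i receives information from node j.
Ad = np.array([[0, 0, 0, 0, 1],
               [1, 0, 0, 0, 0],
               [0, 1, 0, 0, 0],
               [0, 0, 1, 0, 0],
               [0, 0, 0, 1, 0]], dtype=float)

D = np.diag(Ad.sum(axis=1))   # in-degree matrix D = diag(d_i), d_i = sum_j a_ij
L = D - Ad                    # graph Laplacian L = D - A_d

eig = np.linalg.eigvals(L)
# Lemma 1: zero is a simple eigenvalue of L iff G has a spanning tree,
# and all nonzero eigenvalues of L have positive real parts.
print("eigenvalues of L:", np.round(eig, 3))
print("spanning tree (simple zero eigenvalue):", np.sum(np.abs(eig) < 1e-9) == 1)
```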
A square matrix A ∈ Rn×n is called a singular (non-singular) M-matrix, if all its off-diagonal elements are non-positive and all its eigenvalues have non- (cid:3) negative (positive) real parts. Lemma 1 [27]-[28]. The graph Laplacian matrix L of a directed graph G has at least one zero eigenvalue, and all its nonzero eigenvalues have positive real parts. Zero is a simple eigenvalue of L, if and only if Assumption 1 is satisfied. 2.3 Overview of Consensus in DMASs In this section, we provide an overview of the consensus problem for leaderless DMAS. Consider a group of N homogeneous agents with linear identical dynamics described by ∀ i ∈ N , ˙xi(t) = Axi(t) + Bui(t) (2.1) where xi ∈ Rn and ui ∈ Rm denote, respectively, the state and the control input of agent i. The matrices A ∈ Rn×n and B ∈ Rn×m are, respectively, the drift dynamics and the input matrix. Problem 1. Design local control protocols ui for all agents ∀i ∈ N in (2.1) such that all agents reach consensus or synchronization on some common value or trajectory of interest, i.e., t→∞||xj(t) − xi(t)|| = 0 ∀ i, j ∈ N . lim (2.2) Assumption 2. The system dynamics matrix A in (2.1) is assumed to be marginally stable with all eigenvalues on the imaginary axis [29]. Define the local neighborhood tracking error for the agent i as N(cid:88) ηi(t) = aij(xj(t) − xi(t)), with aij as the (i, j)-th entry of the graph adjacency matrix Ad. j=1 Consider the distributed control protocol for each agent i as [27]-[28] ui(t) = cKηi(t) ∀ i ∈ N , 15 (2.3) (2.4) where c and K ∈ Rm×n denote, respectively, the scalar coupling gain and the feedback control gain matrix. Several approaches are presented to design c and K locally to solve Problem 1 [27]- [30]. To this end, the gains c and K are designed such that A − cλiBK is Hurwitz for all i = 2, . . . , N [27]-[30], with λi as the ith eigenvalue of the graph Laplacian matrix L. In the subsequent sections, we assume that c and K are designed locally by each agent and without using a central agent appropriately to solve problem 1 in the absence of attack. We then analyze the effect of attacks and propose mitigation approaches. Remark 1. Note that the presented results subsume the leader-follower synchronization problem and the average consensus as special cases. For the leader-follower case, the leader is only root node, whereas for the average consensus case, the graph is assumed to be balanced (cid:3) and A = 0 and B = Im. 2.4 Attack Modelling and Analysis for DMASs In this section, attacks on agents are modelled and a complete attack analysis is provided. 2.4.1 Attack Modelling In this subsection, attacks on DMASs are modelled. Attacks on actuators of agent i can be modelled as i = ui + βiud uc i , (2.5) i and uc i denote, respectively, the nominal value of the control protocol for agent where ui, ud i in (2.1), the disrupted signal directly injected into actuators of agent i, and the corrupted control protocol of agent i. If agent i is under actuator attack, then βi = 1, otherwise βi = 0. Similarly, one can model attacks on sensors of agent i as i = xi + αixd xc i , (2.6) 16 i and xc where xi, xd i denote, respectively, the nominal value of the state of agent i in (2.3), the disrupted signal directly injected into sensors of agent i, and the corrupted state of agent i. If agent i is under sensor attack, then αi = 1, otherwise αi = 0. 
Using the corrupted state (2.6) in the controller (2.3)-(2.4), with the corrupted control input (2.5) in (2.1), the system dynamics under attack becomes

\dot{x}_i = A x_i + B u_i + B f_i, ∀i ∈ N,   (2.7)

where f_i denotes the overall attack affecting agent i, which can be written as

f_i = β_i u_i^d + cK ( \sum_{j ∈ N_i} a_ij (α_j x_j^d − α_i x_i^d) ),   (2.8)

with u_i^d and x_i^d as attacks directly on the actuators and sensors of agent i, respectively, and x_j^d as the disruption in the received state of the j-th neighbor due to an attack signal injected either into its sensors or actuators or into the incoming communication link from agent j to agent i.

The following definition categorizes all attacks into two categories. The first type of attack exploits the knowledge of the system dynamics A and uses it in the design of the attack signal. That is, for the first type of attack, f_i in (2.8) satisfies

\dot{f}_i = Ψ f_i,   (2.9)

where Ψ ∈ R^{m×m} depends on the knowledge of the system dynamics A, as discussed in Definition 2. On the other hand, for the second type of attack, the attacker has no knowledge of the system dynamics A; this covers all other attacks that are not in the form of (2.9). Define

E_Ψ = {λ_1(Ψ), ..., λ_m(Ψ)},  E_A = {λ_1(A), ..., λ_n(A)},   (2.10)

where λ_i(Ψ), ∀i = 1, ..., m, and λ_i(A), ∀i = 1, ..., n, are, respectively, the eigenvalues of the attack signal generator dynamics matrix Ψ and of the system dynamics matrix A.
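The containment E_Ψ ⊆ E_A, which Definition 2 below uses to classify attacks, can be checked numerically. A minimal hedged sketch, with a hypothetical helper `is_imp_based` and illustrative matrices:

```python
import numpy as np

# Hypothetical helper: checks the modal containment E_Psi ⊆ E_A from (2.10).
def is_imp_based(Psi, A, tol=1e-8):
    e_psi = np.linalg.eigvals(np.atleast_2d(Psi))
    e_a = np.linalg.eigvals(np.atleast_2d(A))
    # every attacker mode must coincide (up to tol) with some system mode
    return all(np.abs(e_a - lam).min() < tol for lam in e_psi)

A = np.array([[0.0, 1.0], [-1.0, 0.0]])        # marginally stable, modes +/- 1j
Psi_imp = A.copy()                             # E_Psi = {+/- 1j} ⊆ E_A: IMP-based
Psi_non = np.array([[0.0, 2.0], [-2.0, 0.0]])  # modes +/- 2j not in E_A: non-IMP
print(is_imp_based(Psi_imp, A), is_imp_based(Psi_non, A))   # True False
```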
That is, in the absence of attack, uc(t) = u(t) goes to zero (i.e., uc(t) → 0), and the global dynamics of agents become ˙xss(t) = (IN ⊗ A) xss(t), (2.14) where xss = lim Definition 5 (Steady State and Reaching Consensus). We say that agents with the t→∞x(t) is called the global steady state of agents. dynamics given by (2.7) and the global dynamics given by (2.12) reach a steady state if (2.14) is satisfied, i.e., if uc(t) → 0 in (2.13). In the absence of attack, if agents reach a (cid:3) steady state, then, agents reach consensus, i.e., t→∞||xj(t) − xi(t)|| = 0 ∀ i, j ∈ N . lim In the presence of attack, whether agents reach a steady state or not, i.e., Remark 3. whether uc(t) → 0 or uc(t) (cid:54)→ 0, plays an important role in the attack analysis and mitigation to follow. Reaching a steady state is necessary for agents to achieve consensus based on Definition 5. However, we show that even if agents reach a steady state, they do not achieve (cid:3) consensus if the system is under attack. 2.4.2 Attack Analysis In this subsection, a graph theoretic-based approach is utilized to analyze the effect of attacks on DMASs. To this end, the following notation and lemmas are used. Let the graph Laplacian matrix L be partitioned as [27]  Lr×r L = 0r×nr Lnr×r Lnr×nr  , where r and nr in (2.15) denote, respectively, the number of root nodes and non-root nodes. Moreover, Lr×r and Lnr×nr are, respectively, the sub-graph matrices corresponding to the 19 (2.15) sub-graphs of root nodes and non-root nodes. The result of the following lemma is used in the proof of Theorem 1 to show that the local neighborhood tracking error goes to zero even in the presence of attack. Lemma 2. Consider the partitioned graph Laplacian matrix (2.15). Then, Lr×r is a singular M-matrix and Lnr×nr is a non-singular M-matrix. Proof. We first prove that the subgraph of root nodes is strongly connected. According to the definition of a root node, there always exists a directed path from a root node to all other nodes of the graph G, including other root nodes. Therefore, in the graph G, there always exists a path from each root node to all other root nodes. We now show that removing non-root nodes from the graph G does not affect the connectivity of the subgraph comprised of only root nodes. In the graph G, if a non-root node is not an incoming neighbor of a root node, then its removal does not harm the connectivity of the subgraph of the root nodes. Suppose that removing a non-root node affects the connectivity of the subgraph of root nodes. This requires the non-root node to be an incoming neighbor of a root node. However, this makes the removed node a root node, as it can now access all other nodes through the root node it is connected to. Hence, this argument shows that the subgraph of root nodes is always strongly connected. Then, based on Lemma 1, Lr×r has zero as one of its eigenvalues, which implies that Lr×r is a singular M-matrix according to Definition 1. On the other hand, from (2.15), since L is a lower triangular matrix, the eigenvalues of L are the union of the eigenvalues of Lr×r and Lnr×nr. Moreover, as stated in Lemma 1, L has a simple zero eigenvalue and, as shown above, zero is the eigenvalue of Lr×r. Therefore, all eigenvalues of Lnr×nr have positive real parts only, and thus based on Definition 1, Lnr×nr is a non-singular M-matrix. In the following Lemmas 3-4 and Theorem 1, we now provide the conditions under which the agents can reach a steady state. Lemma 3. 
In the following Lemmas 3-4 and Theorem 1, we provide the conditions under which the agents can reach a steady state.

Lemma 3. Consider the global dynamics of the DMAS (2.12) under attack. Let the attack signal f(t) be a non-IMP-based attack with f(t) ≠ 0. Then, agents cannot reach a steady state, i.e., u_c(t) ↛ 0.

Proof. We prove this result by contradiction. Assume that the attack signal f(t) is a non-IMP-based attack, i.e., E_Ψ ⊄ E_A, but u_c(t) → 0 in (2.12), which implies ẋ_i → A x_i for all i ∈ N. Using the modal decomposition, one has

x_i(t) → Σ_{j=1}^{n} (r_j x_i(0)) e^{λ_j(A) t} m_j,   (2.16)

where r_j and m_j denote, respectively, the left and right eigenvectors associated with the eigenvalue λ_j(A). On the other hand, based on (2.13), u_c(t) → 0 implies f(t) → (cL ⊗ K) x(t), or equivalently

f_i(t) → −cK Σ_{j∈N_i} a_ij (x_j(t) − x_i(t))  ∀ i ∈ N.   (2.17)

As shown in (2.16), the right-hand side of (2.17) is generated by the natural modes of the system dynamics, whereas the left-hand side is generated by the natural modes of the attack-signal generator dynamics in (2.9). Since, by assumption, E_Ψ ⊄ E_A, the attacker's natural modes differ from those of the system dynamics. Therefore, (2.17) cannot be satisfied, which contradicts the assumption. This completes the proof. □

Equation (2.17) in Lemma 3 also shows that for non-IMP-based attacks, the local neighborhood tracking error is nonzero for a compromised agent. The following results show that under an IMP-based attack, either the agents' states diverge, or they reach a steady state while their local neighborhood tracking errors converge to zero despite the attack. The following lemma is needed in Theorem 1, which gives conditions under which agents reach a steady state under an IMP-based attack. Theorem 2 then shows under what conditions an IMP-based attack makes the entire network of agents unstable. Define

S_A(t) = [e^{λ_{A_1} t}, …, e^{λ_{A_n} t}],  S_Ψ(t) = [e^{λ_{Ψ_1} t}, …, e^{λ_{Ψ_m} t}],   (2.18)

where e^{λ_{A_i} t}, i = 1, …, n and e^{λ_{Ψ_i} t}, i = 1, …, m are, respectively, the natural modes of the agent dynamics A in (2.1) and of the attacker dynamics Ψ in (2.9).

Lemma 4. Consider the global dynamics of the DMAS (2.12) under attack on non-root nodes. Then, for an IMP-based attack, agents reach a steady state, i.e., u_c(t) → 0.

Proof. According to (2.14), in steady state one has ẋ_ss(t) → (I_N ⊗ A) x_ss(t) since u_c(t) → 0. This implies that x_ss(t) ∈ span(S_A), where S_A is defined in (2.18). On the other hand, if agents reach a steady state, then based on (2.13) one has

(cL ⊗ K) x_ss(t) = f(t).   (2.19)

Define the global steady-state vector x_ss(t) = [x̄_rs^T, x̄_nrs^T]^T, where x̄_rs and x̄_nrs are, respectively, the global steady states of the root nodes and the non-root nodes. Since the attack is only on non-root nodes, f(t) can be written as f(t) = [0_r, f̄_nr^T]^T, where f̄_nr = [f_{r+1}^T, …, f_N^T]^T represents the attack vector on the non-root nodes. Then, using (2.15) and (2.19), one has

(c L_{r×r} ⊗ K) x̄_rs = 0,
(c L_{nr×r} ⊗ K) x̄_rs + (c L_{nr×nr} ⊗ K) x̄_nrs = f̄_nr.   (2.20)

As stated in Lemma 2, L_{r×r} is a singular M-matrix with zero as an eigenvalue and 1_r as its corresponding right eigenvector; thus, the solution to the first equation of (2.20) becomes x̄_rs = c_1 1_r for some positive scalar c_1. Using x̄_rs = c_1 1_r in the second equation of (2.20), the global steady state of the non-root nodes becomes

x̄_nrs = (c L_{nr×nr} ⊗ K)^{−1} [ −(c L_{nr×r} ⊗ K) c_1 1_r + f̄_nr ].   (2.21)

Equation (2.21) shows that the steady states of the non-root nodes are affected by the attack signal f(t).
If E_Ψ ⊄ E_A, this results in x̄_nrs ∈ span(S_A, S_Ψ), where S_A and S_Ψ are defined in (2.18), which contradicts x_ss(t) ∈ span(S_A). Therefore, the condition E_Ψ ⊆ E_A is necessary to conclude that for any f = [0_r, f̄_nr^T]^T there exists a steady-state solution x_ss(t), i.e., that u_c(t) → 0 holds. This completes the proof. □

The following theorem provides necessary and sufficient conditions for IMP-based attacks to make agents reach a steady state, i.e., u_c(t) → 0 in (2.13).

Theorem 1. Consider the global dynamics of the DMAS (2.12) with the control protocol (2.13), where the attack signal f(t) is generated by an IMP-based attack. Then, agents reach a steady state, i.e., u_c(t) → 0, if and only if the attack signals satisfy

Σ_{k=1}^{N} p_k f_k = 0,   (2.22)

where the p_k are the nonzero elements of the left eigenvector of the graph Laplacian matrix L associated with its zero eigenvalue.

Proof. It was shown in Lemma 4 that for an IMP-based attack on non-root nodes, agents reach a steady state, i.e., u_c(t) → 0. Therefore, whether agents reach a steady state or not depends solely upon the attacks on root nodes. Let f(t) = [f̄_r, f̄_nr], where f̄_r = [f_1^T, …, f_r^T]^T represents the vector of attacks on the root nodes. We first prove the necessary condition for root nodes. If u_c(t) → 0, then, using (2.15) and (2.19), there exists a nonzero vector x̄_rs for the root nodes such that

(c L_{r×r} ⊗ K) x̄_rs = f̄_r,   (2.23)

where x̄_rs can be considered the global steady state of the root nodes. Moreover, based on Lemma 3, (2.23) does not hold if E_Ψ ⊄ E_A, which implies that (2.23) is true only for E_Ψ ⊆ E_A. As stated in Lemma 2, L_{r×r} corresponds to the strongly connected subgraph of root nodes and is therefore a singular M-matrix. Let w̄^T = [p_1, …, p_r] be the left eigenvector associated with the zero eigenvalue of L_{r×r}. Pre-multiplying both sides of (2.23) by w̄^T and using the fact that w̄^T L_{r×r} = 0 yields

w̄^T (c L_{r×r} ⊗ K) x̄_rs = w̄^T f̄_r = 0.   (2.24)

This states that IMP-based attacks on root nodes have to satisfy Σ_{k=1}^{N} p_k f_k = 0 to ensure that agents reach a steady state, i.e., u_c(t) → 0. Note that p_k = 0 for k = r+1, …, N, i.e., the elements of the left eigenvector of the graph Laplacian matrix L corresponding to its zero eigenvalue are zero for the non-root nodes [27]-[28]. This proves the necessity part.

Now, we prove the sufficiency part by contradiction for root nodes. Assume agents reach a steady state, i.e., u_c(t) → 0, but Σ_{k=1}^{N} p_k f_k ≠ 0. Reaching a steady state implies that there exists a nonzero vector x̄_rs such that (2.23) holds. Using (2.24) and Σ_{k=1}^{N} p_k f_k ≠ 0, one can conclude that w̄^T (c L_{r×r} ⊗ K) x̄_rs ≠ 0. This can happen only when L_{r×r} does not have a zero eigenvalue, which violates the fact, from Lemma 2, that L_{r×r} corresponds to a strongly connected graph and is a singular M-matrix. Therefore, w̄^T (c L_{r×r} ⊗ K) x̄_rs = 0, which results in Σ_{k=1}^{N} p_k f_k = 0 and contradicts the assumption made. This completes the proof. □
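Condition (2.22) is easy to test once the left eigenvector of L for the zero eigenvalue is available. A minimal sketch, reusing the hypothetical four-agent digraph from above with hypothetical scalar attack magnitudes:

```python
import numpy as np

adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 0, 0],
                      [1, 0, 0, 0],
                      [0, 1, 1, 0]], dtype=float)
L = np.diag(adjacency.sum(axis=1)) - adjacency

# Left eigenvector p of L for the zero eigenvalue (p^T L = 0); its entries
# vanish on the non-root nodes, as noted in the proof of Theorem 1.
eigvals, left_vecs = np.linalg.eig(L.T)
p = np.real(left_vecs[:, np.argmin(np.abs(eigvals))])
p = p / p.sum()

f = np.array([2.0, -2.0, 0.0, 5.0])   # hypothetical attack magnitudes
print(p, np.isclose(p @ f, 0.0))      # True: (2.22) holds, steady state exists
```

Note that the attack on the non-root agent (the last entry of f) does not enter the condition, consistent with Lemma 4.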
The following theorem provides conditions under which an IMP-based attack makes the network unstable.

Theorem 2. Consider the global dynamics of the DMAS (2.12) with the control protocol (2.13) under an IMP-based attack. If (2.22) is not satisfied and E_Ψ ∩ E_A ≠ ∅, then the dynamics of the agents become unstable.

Proof. Since the condition in (2.22) is not satisfied, based on Theorem 1, u_c(t) ↛ 0 even under an IMP-based attack. Thus, the attack signal f(t) does not vanish over time and eventually acts as an input to the system in (2.12). Assume that there exists at least one common marginal eigenvalue between the system dynamics matrix A in (2.1) and the attacker dynamics matrix Ψ in (2.9), i.e., E_Ψ ∩ E_A ≠ ∅. Then, the multiplicity of at least one marginally stable pole becomes greater than one. Therefore, the attacker destabilizes the state of the agent in (2.12). Moreover, since (2.22) is not satisfied, the attack is on root nodes, and since root nodes have a path to all other nodes in the network, the states of all agents become unstable. This completes the proof. □

Theorem 3. Consider the global dynamics of the DMAS (2.12) under attack f(t). Then, the local neighborhood tracking error (2.3) converges to zero for all intact agents if u_c(t) → 0. Moreover, intact agents that are reachable from the compromised agents do not converge to the desired consensus trajectory.

Proof. In the presence of attacks, the global dynamics of the DMAS (2.12) with (2.13) can be written as

ẋ(t) = (I_N ⊗ A) x(t) + (I_N ⊗ B)((−cL ⊗ K) x(t) + f(t)),   (2.25)

where x(t) = [x_1^T(t), …, x_N^T(t)]^T is the global vector of the states of agents and f(t) = [f_1^T(t), …, f_N^T(t)]^T denotes the global vector of attacks. As shown in (2.14), if u_c(t) → 0, agents reach a steady state. That is,

cK η_i → −f_i  ∀ i ∈ N,   (2.26)

where η_i denotes the local neighborhood tracking error of agent i defined in (2.3). For an intact agent, by definition one has f_i = 0, and thus (2.26) implies that the local neighborhood tracking error (2.3) converges to zero.

Now, we show that intact agents that are reachable from the compromised agent do not synchronize to the desired consensus behavior. To this end, let agent j be under attack and assume, for contradiction, that all intact agents synchronize, i.e., x_k = x_i ∀ i, k ∈ N − {j}. Consider an intact agent i that is an immediate neighbor of the compromised agent j. Then, using (2.13), if u_c(t) → 0, for the intact agent i (i.e., f_i = 0) one has

Σ_{k∈N_i−{j}} a_ik (x_k − x_i) + a_ij (x_j − x_i) → 0,   (2.27)

where the x_k denote the states of the intact neighbors of agent i. On the other hand, (2.7) shows that the state of the compromised agent j, i.e., x_j, deviates from the desired consensus value by an amount proportional to f_j. Therefore, (2.27) results in deviating the state of the immediate neighbor of the compromised agent j from the desired consensus behavior, which contradicts the assumption. Consequently, intact agents that have a path to the compromised agent do not reach consensus, even though their local neighborhood tracking errors are zero. This completes the proof. □

Remark 4. An attacker can exploit the security of the network by eavesdropping and monitoring the transmitted data to identify at least one of the marginal eigenvalues of the agent dynamics, e.g., by identifying the system dynamics using data-based approaches. Eavesdropping-based design of attack signals is also discussed in [25]. □

2.4.3 Extension of Analysis Results to the Case of Noisy Communication

Up to now, the presented analysis has assumed that the communication is noise-free. We now briefly discuss what changes when communication noise is present, and propose attack detection and mitigation in the presence of communication noise.
In the presence of Gaussian-distributed communication noise, the local neighborhood tracking error in (2.3) becomes

η̄_i = η_i + ω_i,   (2.28)

where ω_i ∼ N(0, Σ_{ω_i}) denotes the aggregate Gaussian noise affecting the incoming information to agent i, given by

ω_i = Σ_{j∈N_i} a_ij ω_ij,   (2.29)

where ω_ij denotes the noise in the measurement received from agent j at agent i. In this situation, the DMAS consensus problem defined in Problem 1 changes to the mean-square consensus problem [37]. In the presence of Gaussian noise, based on (2.28), the control protocol in (2.4)-(2.3) becomes [37]

u_i(t) = cK a(t) ( Σ_{j=1}^{N} a_ij (x_j(t) − x_i(t)) + ω_i )  ∀ i ∈ N,   (2.30)

with a(t) a time-dependent consensus gain and ω_i ∼ N(0, Σ_{ω_i}) defined in (2.29). Based on mean-square consensus, one has

lim_{t→∞} E[u_i(t)] → 0  ∀ i ∈ N,   (2.31)

and thus, based on (2.1), the steady states of agents converge to a consensus trajectory in the mean-square sense, and the global form of (2.14) becomes

ẋ^m_ss = (I_N ⊗ A) x^m_ss,   (2.32)

where x^m_ss = lim_{t→∞} E[x(t)] denotes the global steady state of agents in the mean-square sense. Then, following the same procedure as in Lemmas 3-4 and Theorems 1-3, one can show that an IMP-based attack does not change the statistical properties of the local neighborhood tracking error, while a non-IMP-based attack does. Moreover, the local neighborhood tracking error converges to zero in mean for an IMP-based attack, but does not converge to zero in mean for a non-IMP-based attack.

Remark 5. In general, the noise associated with electronic circuits at the receiver end falls under the category of thermal noise and is statistically modeled as Gaussian [38]. Therefore, it is standard to assume that ω_ij in (2.29) is Gaussian [37]. However, if this assumption is violated, the same attack detection mechanism can still be developed using the divergence estimation approach presented in [39]-[40], which applies to all distributions. □

2.5 An Attack Detection Mechanism

In this section, Kullback-Leibler (KL) divergence-based attack detection and mitigation approaches are developed for both IMP-based and non-IMP-based attacks.

Definition 6 (Kullback-Leibler divergence) [41]-[42]. The Kullback-Leibler divergence between two probability densities P_X and P_Z of a random variable θ ∈ Θ is defined as

D_KL(X||Z) = ∫_{θ∈Θ} P_X(θ) log( P_X(θ) / P_Z(θ) ) dθ.   (2.33)

In the following subsections, the KL divergence is used to detect IMP-based and non-IMP-based attacks on DMASs.

2.5.1 Attack detection for IMP-based attacks

In this subsection, an attack detector is designed to identify IMP-based attacks. To this end, two error sequences τ_i and φ_i are defined, based on only locally exchanged information, for agent i as

τ_i = || Σ_{j∈N_i} a_ij d_ij ||,   (2.34)

and

φ_i = Σ_{j∈N_i} || a_ij d_ij ||,   (2.35)

where the measured discrepancy d_ij between agent i's state and its neighbor j's state under attack is

d_ij = x^c_j − x^c_i + ω_ij  ∀ j ∈ N_i,   (2.36)

where ω_ij ∼ N(0, Σ_{ω_ij}) denotes the Gaussian incoming communication noise from agent j to agent i. Moreover, x^c_i is the measured state of agent i under attack and x^c_j is the possibly corrupted information agent i receives from its j-th neighbor. If agent i is not compromised, then x^c_i = x_i; similarly, if agent j is not compromised, then x^c_j = x_j. In fact, (2.34) is the norm of the summation of the measured discrepancies between agent i and all its neighbors, and (2.35) is the summation of the norms of those measured discrepancies.
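Both sequences are computed from locally available quantities only. A minimal sketch, assuming hypothetical two-dimensional states and unit edge weights:

```python
import numpy as np

def error_sequences(x_i, neighbor_states, weights, noise_std=0.1, rng=None):
    """tau_i in (2.34) and phi_i in (2.35), built from the measured
    discrepancies d_ij of (2.36)."""
    rng = rng or np.random.default_rng()
    d = [x_j - x_i + rng.normal(0.0, noise_std, size=x_i.shape)
         for x_j in neighbor_states]
    weighted = [a * d_ij for a, d_ij in zip(weights, d)]
    tau = np.linalg.norm(sum(weighted))               # norm of the sum
    phi = sum(np.linalg.norm(w) for w in weighted)    # sum of the norms
    return tau, phi

tau, phi = error_sequences(np.array([1.0, 0.0]),
                           [np.array([1.1, 0.2]), np.array([0.9, -0.1])],
                           weights=[1.0, 1.0])
print(tau, phi)   # near consensus both stay small; attacks drive them apart
```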
In the absence of attack, these two signals show the same behavior in the sense that their means converge to zero.

Remark 6. In the presence of an IMP-based attack and in the absence of noise, based on Theorem 3, τ_i goes to zero for intact agents, but φ_i does not converge to zero, as its convergence would imply consensus, which cannot happen under attack according to Theorem 3. On the other hand, for an IMP-based attack in the presence of noise, based on Theorem 3, τ_i converges to zero in mean because the local neighborhood tracking error converges to zero in mean for all agents. In contrast, the mean of φ_i depends not only on the mean of the noise signal but also on that of the attack signal. Therefore, the behavior of these two signals diverges significantly in the presence of attacks, which can be captured by a KL divergence-based detection mechanism. □

Note that one can measure τ_i and φ_i based on the exchanged information among agents, which might be corrupted by the attack signal. Existing KL divergence methods are, nevertheless, developed for Gaussian signals. Here, while the communication noise is assumed to be Gaussian, the error sequences (2.34) and (2.35) are norms of variables with Gaussian distributions and thus have univariate folded Gaussian distributions [26], φ_i ∼ FN(µ_{1i}, σ²_{1i}) and τ_i ∼ FN(µ_{2i}, σ²_{2i}). That is,

P_{φ_i}(q_i; µ_{1i}, σ_{1i}) = (1/(√(2π)|σ_{1i}|)) e^{−(q_i−µ_{1i})²/2σ²_{1i}} + (1/(√(2π)|σ_{1i}|)) e^{−(q_i+µ_{1i})²/2σ²_{1i}},
P_{τ_i}(q_i; µ_{2i}, σ_{2i}) = (1/(√(2π)|σ_{2i}|)) e^{−(q_i−µ_{2i})²/2σ²_{2i}} + (1/(√(2π)|σ_{2i}|)) e^{−(q_i+µ_{2i})²/2σ²_{2i}},   (2.37)

where µ_{1i} and σ²_{1i} are the mean and variance of the error sequence φ_i, and µ_{2i} and σ²_{2i} are the mean and variance of the error sequence τ_i. Using (2.33), the KL divergence in terms of the local error sequences φ_i and τ_i can be defined as

D_KL(φ_i||τ_i) = ∫ P_{φ_i}(q_i) log( P_{φ_i}(q_i) / P_{τ_i}(q_i) ) dq_i = E_{P_{φ_i}}[ log( P_{φ_i}(q_i) / P_{τ_i}(q_i) ) ],   (2.38)

where E_{P_{φ_i}}[·] represents the expectation with respect to the distribution of the first sequence [26]. A KL divergence formula for folded Gaussian distributions is developed in the following lemma.

Lemma 5. Consider the error sequences τ_i and φ_i in (2.34)-(2.35) with the folded Gaussian distributions P_{φ_i} and P_{τ_i} in (2.37). Then, the KL divergence between the error sequences, D_KL(φ_i||τ_i), becomes

D_KL(φ_i||τ_i) ≈ (1/2) log( σ²_{2i}/σ²_{1i} ) − 1/2 + (1/2)( σ⁻²_{2i} σ²_{1i} ) + (1/2) σ⁻²_{2i} ( µ_{2i} − µ_{1i} )² + 1 + (1/2) e^{4µ²_{1i}/σ²_{1i}} ( 1 − e^{8µ²_{1i}/σ²_{1i}} ) − e^{−µ²_{1i}/2σ²_{1i}} [ e^{ρ²_1/2σ²_{1i}} + e^{ρ²_2/2σ²_{1i}} − (1/2)( e^{ρ²_3/2σ²_{1i}} + e^{ρ²_4/2σ²_{1i}} ) ],   (2.39)

with ρ_1 = µ_{1i} − 2µ_{2i} σ²_{1i} σ⁻²_{2i}, ρ_2 = µ_{1i} + 2µ_{2i} σ²_{1i} σ⁻²_{2i}, ρ_3 = µ_{1i} − 4µ_{2i} σ²_{1i} σ⁻²_{2i}, and ρ_4 = µ_{1i} + 4µ_{2i} σ²_{1i} σ⁻²_{2i}.

Proof. See Appendix A.

Note that in (2.39), τ_i and φ_i are error sequences and the divergence between their distributions depends on their means and variances. One can calculate the values of these error sequences from (2.34) and (2.35) at each time instant, and then determine the means and variances of their distributions from the previous m samples. Therefore, these statistical parameters need not be known explicitly. In the following theorem, we show that the effect of IMP-based attacks can be captured using the KL divergence defined in (2.39).
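The closed form (2.39) can be sanity-checked against a direct Monte Carlo estimate of (2.38), since folded-Gaussian samples are simply absolute values of Gaussian samples. A minimal, self-contained sketch:

```python
import numpy as np

def folded_pdf(q, mu, sigma):
    """Folded Gaussian density FN(mu, sigma^2) from (2.37)."""
    c = 1.0 / (np.sqrt(2.0 * np.pi) * abs(sigma))
    return c * (np.exp(-(q - mu) ** 2 / (2 * sigma ** 2))
                + np.exp(-(q + mu) ** 2 / (2 * sigma ** 2)))

def kl_folded_mc(mu1, s1, mu2, s2, n=200_000, seed=0):
    """Monte Carlo estimate of D_KL(phi||tau) in (2.38): sample from the
    folded Gaussian of phi and average the log-density ratio."""
    q = np.abs(np.random.default_rng(seed).normal(mu1, s1, n))
    return np.mean(np.log(folded_pdf(q, mu1, s1) / folded_pdf(q, mu2, s2)))

print(kl_folded_mc(0.5, 1.0, 0.5, 1.0))   # ~0: identical distributions
print(kl_folded_mc(3.0, 1.0, 0.5, 1.0))   # clearly positive under a mean shift
```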
Theorem 4. Consider the DMAS (2.1) along with the controller (2.13), under IMP-based attacks. Assume that the communication noise sequences are i.i.d. Then, for an intact agent i reachable from the compromised agent,

(1/T) ∫_k^{k+T−1} D_KL(φ_i||τ_i) dκ > γ_i,   (2.40)

where φ_i and τ_i are defined in (2.34) and (2.35), respectively, and T and γ_i represent the window size and the predesigned threshold parameter.

Proof. According to Theorem 3, the local neighborhood tracking error goes to zero for intact agents in the presence of an IMP-based attack when there is no communication noise. In the presence of communication noise with Gaussian distribution, i.e., ω_ij ∼ N(0, Σ_{ω_ij}), and an IMP-based attack, the expected value of the local neighborhood tracking error for an intact agent i becomes

E[η_i] = E[ Σ_{j∈N_i} a_ij d_ij ] → 0,   (2.41)

where the measured discrepancy d_ij is defined in (2.36). Using (2.41), one can write (2.34) as

τ_i = || Σ_{j∈N_i} a_ij d_ij || ∼ FN(0, ῡ²_{ω_i}),   (2.42)

which represents a folded Gaussian distribution with mean zero and variance ῡ²_{ω_i}. Note that the mean and variance of the distribution P_{τ_i} in (2.37) become µ_{2i} = 0 and σ²_{2i} = ῡ²_{ω_i}.

Since the noise signals are independent and identically distributed, from (2.35) one can infer that the folded Gaussian distribution P_{φ_i} in (2.37) has the statistical properties

φ_i ∼ FN( µ_{f^d_i}, ῡ²_{ω_i} + υ̂²_{ω_i} + ῡ²_{f^d_i} ),   (2.43)

where µ_{f^d_i} and ῡ²_{ω_i} + υ̂²_{ω_i} + ῡ²_{f^d_i} represent the overall mean and variance, respectively, due to the communication noise and the overall deviation from the desired behavior in intact neighbors reachable from the compromised agent.

In the absence of attack, the statistical properties of the sequences τ_i and φ_i become FN(0, ῡ²_{ω_i}) and FN(0, ῡ²_{ω_i} + υ̂²_{ω_i}), respectively, and the corresponding KL divergence in (2.39) becomes

D^{wa}_KL(φ_i||τ_i) ≈ (1/2)( log( (ῡ²_{ω_i} + υ̂²_{ω_i}) / ῡ²_{ω_i} ) + ῡ⁻²_{ω_i} υ̂²_{ω_i} ),   (2.44)

where υ̂²_{ω_i} represents the additional variance in the sequence φ_i, which depends on the communication noise. Note that τ_i in (2.34) is the norm of the summation of the measured discrepancies of agent i and all its neighbors, whereas φ_i in (2.35) is the summation of the norms of those measured discrepancies. Even in the absence of attack, they follow folded Gaussian distributions with zero means but different variances, due to the application of the norm to the measured discrepancies.

Now, in the presence of IMP-based attacks, using the derived form of the KL divergence for folded Gaussian distributions from Lemma 5, one can simplify (2.39) using (2.42)-(2.43) as

D_KL(φ_i||τ_i) ≈ (1/2) log( (ῡ²_{ω_i} + υ̂²_{ω_i} + ῡ²_{f^d_i}) / ῡ²_{ω_i} ) + (1/2) ῡ⁻²_{ω_i}( υ̂²_{ω_i} + ῡ²_{f^d_i} ) + (1/2) ῡ⁻²_{ω_i} µ²_{f^d_i} + (1/2) e^{4µ²_{f^d_i}/(ῡ²_{ω_i}+υ̂²_{ω_i}+ῡ²_{f^d_i})} ( 1 − e^{8µ²_{f^d_i}/(ῡ²_{ω_i}+υ̂²_{ω_i}+ῡ²_{f^d_i})} ).   (2.45)

Then, one can design the threshold parameter γ_i such that

(1/T) ∫_k^{k+T−1} D_KL(φ_i||τ_i) dκ > γ_i,   (2.46)

where T denotes the sliding window size. This completes the proof. □

Based on Theorem 4, one can use the following conditions for attack detection [31]:

(1/T) ∫_k^{k+T−1} D_KL(φ_i||τ_i) dκ < γ_i : H_0,
(1/T) ∫_k^{k+T−1} D_KL(φ_i||τ_i) dκ > γ_i : H_1,   (2.47)

where γ_i denotes the designed detection threshold, the null hypothesis H_0 represents the intact mode, and H_1 denotes the compromised mode of an agent.
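The decision rule (2.47) amounts to thresholding a moving average of the divergence. A minimal sketch with a hypothetical divergence trace:

```python
import numpy as np

def sliding_window_test(kl_series, window, threshold):
    """Hypothesis test (2.47): flag H1 when the windowed mean of the KL
    divergence exceeds the design threshold gamma_i."""
    flags = np.zeros(len(kl_series), dtype=bool)
    for k in range(window, len(kl_series) + 1):
        flags[k - 1] = np.mean(kl_series[k - window:k]) > threshold
    return flags

rng = np.random.default_rng(1)
kl = np.concatenate([np.abs(rng.normal(0.05, 0.02, 100)),   # intact phase
                     np.linspace(0.5, 5.0, 100)])           # attack phase
print(np.argmax(sliding_window_test(kl, window=20, threshold=0.3)))
# first detection index, shortly after the attack onset at sample 100
```

The window both delays and robustifies the decision: short transients are averaged out, so legitimate changes are not flagged.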
Remark 7. Note that several existing results on fault/attack detection employ log-likelihood or generalized likelihood ratio-based test statistics [31]-[34]. Similarly, to obtain (2.47), we consider the log-likelihood ratio-based test statistic

Λ(q_i) = log( P_{φ_i}(q_i) / P_{τ_i}(q_i) ),   (2.48)

where the observations q_i are drawn from the probability distribution functions P_{φ_i} and P_{τ_i} defined in (2.37). Based on (2.38), the expectation of Λ(q_i) with respect to the probability distribution function P_{φ_i} becomes

E_{P_{φ_i}}[Λ(q_i)] = E_{P_{φ_i}}[ log( P_{φ_i}(q_i) / P_{τ_i}(q_i) ) ] = D_KL(φ_i||τ_i).   (2.49)

In the absence of attack, i.e., when the intact-mode hypothesis H_0 is true, based on Remark 6 and Theorem 4, D_KL(φ_i||τ_i) in (2.49) is a small positive value, because P_{φ_i} and P_{τ_i} both have zero mean. In the presence of attack, i.e., when the compromised-mode hypothesis H_1 is true, based on Remark 6 and Theorem 4, D_KL(φ_i||τ_i) in (2.49) becomes large, because P_{φ_i} and P_{τ_i} then have different means and variances. Therefore, based on the log-likelihood ratio test statistic, D_KL(φ_i||τ_i) is used in (2.47) for attack detection over a sliding window. The sliding window is employed to avoid false detections that may occur due to legitimate changes, i.e., the transient behavior of agents in the DMAS. The designed threshold γ_i and the sliding window size T in (2.47) are predefined parameters. The threshold γ_i is typically designed based on knowledge of the bound on the communication noise ω_ij in (2.36). This design choice, even though it ensures that the compromised mode H_1 in (2.47) will not be activated in the absence of attacks, can make the threshold conservative. Selection of the predefined parameters for adversary detection and stealthiness based on system knowledge is reported in the literature [35]-[36]. □

2.5.2 Attack detection for non-IMP-based attacks

This subsection presents the design of a KL-based attack detector for non-IMP-based attacks. It was shown in Theorem 3 that the local neighborhood tracking error goes to zero when agents are under IMP-based attacks. Therefore, for the case of non-IMP-based attacks, one can identify these attacks through changes in the statistical properties of the local neighborhood tracking error. In the absence of attack, since Gaussian noise ω_i ∼ N(0, Σ_{ω_i}) is considered in the communication link, the local neighborhood tracking error η_i in (2.28) has the statistical properties

η_i ∼ N(0, Σ_{ω_i}),   (2.50)

which represents the nominal behavior of the system. In the presence of attacks, using (2.28), the local neighborhood tracking error η^a_i can be written as

η^a_i = Σ_{j∈N_i} a_ij d_ij,   (2.51)

where the measured discrepancy under attack d_ij is defined in (2.36). From (2.51), one has

η^a_i ∼ N( µ_{f_i}, Σ_{f_i} + Σ_{ω_i} ),   (2.52)

where µ_{f_i} and Σ_{f_i} are, respectively, the mean and covariance of the overall deviation due to the corrupted states under attack as given in (2.36). Now, since both η^a_i and η_i have Gaussian distributions, the KL divergence D_KL(η^a_i||η_i) can be written as [42]

D_KL(η^a_i||η_i) = (1/2)( log( |Σ_{η_i}| / |Σ_{η^a_i}| ) − n + tr( Σ⁻¹_{η_i} Σ_{η^a_i} ) + (µ_{η_i} − µ_{η^a_i})^T Σ⁻¹_{η_i} (µ_{η_i} − µ_{η^a_i}) ),   (2.53)

where µ_{η_i} and Σ_{η_i} denote the mean and covariance of η_i, µ_{η^a_i} and Σ_{η^a_i} denote the mean and covariance of η^a_i, and n denotes the dimension of the error sequence.
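Formula (2.53) is the standard closed form for the divergence between two multivariate Gaussians and is direct to implement:

```python
import numpy as np

def kl_gaussian(mu0, cov0, mu1, cov1):
    """Closed-form D_KL(N(mu0, cov0) || N(mu1, cov1)), as in (2.53)."""
    n = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.log(np.linalg.det(cov1) / np.linalg.det(cov0)) - n
                  + np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff)

cov_noise = 0.1 * np.eye(2)          # nominal tracking error (2.50)
print(kl_gaussian(np.zeros(2), cov_noise, np.zeros(2), cov_noise))   # 0.0
print(kl_gaussian(np.array([1.0, 0.5]), 0.3 * np.eye(2),             # (2.52)
                  np.zeros(2), cov_noise))                           # large
```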
Define the average of the KL divergence over a window T as

D̄_i = (1/T) ∫_k^{k+T−1} D_KL(η^a_i||η_i) dκ.   (2.54)

The following theorem shows that the effect of non-IMP-based attacks can be detected using the KL divergence between the two error sequences η^a_i and η_i.

Theorem 5. Consider the DMAS (2.1) along with the controller (2.13). Then,
1. in the absence of attack, D̄_i defined in (2.54) tends to zero;
2. in the presence of a non-IMP-based attack, D̄_i defined in (2.54) is greater than a predefined threshold γ_i.

Proof. In the absence of attacks, the statistical properties of the sequences η_i and η^a_i are the same, as in (2.50). Therefore, the KL divergence D_KL(η^a_i||η_i) in (2.53) becomes zero, which makes D̄_i in (2.54) zero. This completes the proof of part 1.

To prove part 2, using (2.50)-(2.52) in (2.53) and the fact that tr( Σ⁻¹_{ω_i} (Σ_{f_i} + Σ_{ω_i}) ) − n = tr( Σ⁻¹_{ω_i} Σ_{f_i} ), one can write the KL divergence between η^a_i and η_i as

D_KL(η^a_i||η_i) = (1/2)( log( |Σ_{ω_i}| / |Σ_{f_i} + Σ_{ω_i}| ) + tr( Σ⁻¹_{ω_i} Σ_{f_i} ) + µ^T_{f_i} Σ⁻¹_{ω_i} µ_{f_i} ).   (2.55)

Then, using (2.54), one has

D̄_i = (1/T) ∫_k^{k+T−1} (1/2)( log( |Σ_{ω_i}| / |Σ_{f_i} + Σ_{ω_i}| ) + tr( Σ⁻¹_{ω_i} Σ_{f_i} ) + µ^T_{f_i} Σ⁻¹_{ω_i} µ_{f_i} ) dκ > γ_i,   (2.56)

where T and γ_i denote the sliding window size and the predefined design threshold, respectively. This completes the proof. □

Based on Theorem 5, one can use the following conditions for attack detection:

D̄_i < γ_i : H_0,  D̄_i > γ_i : H_1,   (2.57)

where γ_i denotes the designed detection threshold, H_0 represents the intact mode of the system, and H_1 denotes the compromised mode.

2.6 An Attack Mitigation Mechanism

In this section, both IMP-based and non-IMP-based attacks are mitigated using the detection mechanisms presented in the previous section.

2.6.1 Self-belief of agents about their outgoing information

To determine the level of trustworthiness of each agent about its own information, a self-belief value is presented. Using D_KL(φ_i||τ_i) and D_KL(η^a_i||η_i) from Theorems 4 and 5, we define c^1_i(t) as the self-belief of agent i calculated under IMP-based attacks and c^2_i(t) as its self-belief under non-IMP-based attacks:

c^j_i(t) = κ_j ∫_0^t e^{κ_j(τ−t)} χ^j_i(τ) dτ,  j = 1, 2,   (2.58)

where 0 ≤ c^j_i(t) ≤ 1, with

χ^1_i(t) = Δ_i / ( Δ_i + D_KL(φ_i||τ_i) ),   (2.59)
χ^2_i(t) = Δ_i / ( Δ_i + D_KL(η^a_i||η_i) ).   (2.60)

Moreover, Δ_i represents a threshold to account for channel fading and other uncertainties, and κ_j > 0 denotes the discount factor. Note that χ^j_i(t), j = 1, 2 in (2.58) depends on the divergence between error sequences, which are functions of time; consequently, χ^j_i(t) is also time-dependent. However, to maintain consistency with the detection part, we have avoided time-indexing the divergence terms D_KL(φ_i||τ_i) and D_KL(η^a_i||η_i). Based on the Leibniz integral rule [43], equation (2.58) can be implemented by the following differential equation

ċ^j_i(t) + κ_j c^j_i(t) = κ_j χ^j_i(t),  j = 1, 2.

According to Theorem 4 (respectively, Theorem 5), for an IMP-based (non-IMP-based) attack, the divergence term D_KL(φ_i||τ_i) (respectively, D_KL(η^a_i||η_i)) increases, which makes χ^1_i(t) (χ^2_i(t)) approach zero and, consequently, makes the value of c^1_i(t) (c^2_i(t)) close to zero. On the other hand, without an attack, the divergence terms tend to zero, making χ^1_i(t) (χ^2_i(t)) approach 1 and, consequently, c^1_i(t) (c^2_i(t)) close to 1. The larger the value of c^j_i(t), the more confident the agent is about the trustworthiness of its broadcasted information.
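The filter (2.58) is a first-order low-pass of χ^j_i and can be run with a simple Euler step. A minimal sketch with hypothetical parameters:

```python
def update_self_belief(c, kl_value, delta, kappa, dt):
    """One Euler step of c_dot + kappa*c = kappa*chi, with chi from
    (2.59)-(2.60): chi is near 1 when the divergence is small."""
    chi = delta / (delta + kl_value)
    return c + dt * kappa * (chi - c)

c, dt, kappa, delta = 1.0, 0.01, 0.8, 0.5
for step in range(1000):                 # 10 s of simulated time
    kl = 0.01 if step < 500 else 20.0    # attack begins at t = 5 s
    c = update_self_belief(c, kl, delta, kappa, dt)
print(round(c, 3))   # near zero: the agent distrusts its own information
```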
Then, using c^j_i(t), j = 1, 2 defined in (2.58), the self-belief of agent i is defined as

ξ_i(t) = min{ c^1_i(t), c^2_i(t) }.   (2.61)

If an agent i is under direct attack or receives corrupted information from its neighbors, then its self-belief tends to zero. In such a situation, it transmits this low self-belief value to its neighbors so that they put less weight on the information they receive from it, which prevents attack propagation in the distributed network.

2.6.2 Trust of agents about their incoming information

The trust value represents the level of confidence of an agent in its neighbors' information. If the self-belief value of an agent is low, it forms beliefs about its neighbors (either intact or compromised) and updates its trust values, which depend on the beliefs about each of its neighbors, using only local information. Therefore, agents identify their compromised neighbors and discard their information. Using the KL divergence between the exchanged information of agent i and its neighbors, one can define η_ij(t) as

η_ij(t) = κ_3 ∫_0^t e^{κ_3(τ−t)} L_ij(τ) dτ,   (2.62)

where 0 ≤ η_ij(t) ≤ 1, with

L_ij(t) = 1 − Λ_1 / ( Λ_1 + e^{Λ_2 D_KL(x_j||m_i)} )  ∀ j ∈ N_i,   (2.63)

with m_i = Σ_{j∈N_i} x_j; Λ_1, Λ_2 > 0 represent thresholds to account for channel fading and other uncertainties, and κ_3 > 0 denotes the discount factor. For a compromised neighbor, the KL divergence D_KL(x_j||m_i) tends to zero, which makes L_ij(t) close to zero and, consequently, the value of η_ij(t) close to zero. On the other hand, if the incoming neighbor is not compromised, then D_KL(x_j||m_i) increases and makes η_ij(t) approach 1. Based on the Leibniz integral rule, equation (2.62) can be implemented using the following differential equation

η̇_ij(t) + κ_3 η_ij(t) = κ_3 L_ij(t).

Now, we define the trust value of an agent on its neighbors as

Ω_ij(t) = max( ξ_i(t), η_ij(t) ),   (2.64)

with 0 ≤ Ω_ij(t) ≤ 1. In the absence of attacks, the states of agents converge to the consensus trajectory and the self-belief ξ_i(t) stays close to one, which results in Ω_ij(t) being 1 ∀ j ∈ N_i. In the presence of attacks, the η_ij(t) corresponding to the compromised agents tend to zero.

Figure 2.1: Schematic representation of the proposed resilient approach for DMASs.

2.6.3 The mitigation mechanism using trust and self-belief values

In this subsection, the trust and self-belief values are utilized to design the mitigation algorithm. To achieve resiliency, both self-belief and trust values are incorporated into the information exchanged among agents, as shown in Fig. 2.1. Consequently, the resilient form of the local neighborhood tracking error (2.28) is

η̃_i = Σ_{j∈N_i} Ω_ij(t) ξ_j(t) a_ij ( x_j − x_i ) + ω_i,   (2.65)

where Ω_ij(t) and ξ_j(t) denote, respectively, the trust value in (2.64) and the self-belief of the neighboring agents in (2.61). Based on the controller (2.4) with the resilient local neighborhood tracking error (2.65), the resilient control protocol can be written as

ũ_i = cK η̃_i,  ∀ i ∈ N.   (2.66)
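A minimal sketch of the trust-weighted error (2.65) and the resilient input (2.66), with hypothetical states and belief values:

```python
import numpy as np

def resilient_tracking_error(x_i, neighbors, a, trust, neighbor_beliefs):
    """(2.65): each edge is scaled by the trust Omega_ij and the neighbor's
    transmitted self-belief xi_j, attenuating corrupted links."""
    err = np.zeros_like(x_i)
    for x_j, a_ij, w_ij, xi_j in zip(neighbors, a, trust, neighbor_beliefs):
        err += w_ij * xi_j * a_ij * (x_j - x_i)
    return err

# Neighbor 0 is intact; neighbor 1 broadcasts a large corrupted state but
# carries near-zero trust and self-belief.
eta = resilient_tracking_error(np.array([1.0, 0.0]),
                               [np.array([1.2, 0.1]), np.array([60.0, -40.0])],
                               a=[1.0, 1.0], trust=[1.0, 0.02],
                               neighbor_beliefs=[1.0, 0.05])
print(eta)   # dominated by the intact neighbor; u_i = c*K @ eta as in (2.66)
```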
According to (2.65), the topology of the graph changes over time due to the incorporation of the trust and self-belief values of agents; we therefore denote the time-varying graph as G(t) = (V, E(t)), with E(t) ⊆ V × V representing the set of time-varying edges.

Now, based on the following definitions and lemma, we formally present Theorem 6 to show that the trust- and self-belief-based resilient control protocol (2.66) solves Problem 1 for all intact agents, i.e., all agents in N_Int = N \ N_Comp, as defined in Definition 3, achieve the final desired consensus regardless of attacks on the distributed network.

Definition 7 (r-reachable set) [20]. Given a directed graph G and a nonempty subset V_s ⊂ V, the set V_s is r-reachable if there exists a node i ∈ V_s such that |N_i \ V_s| ≥ r, where r ∈ Z_{≥0}. □

Definition 8 (r-robust graph) [20]. A directed graph G is called an r-robust graph with r ∈ Z_{≥0} if for every pair of nonempty, disjoint subsets of V, at least one of the subsets is r-reachable. □

Assumption 3. If at most q neighbors of each intact agent are under attack, at least q + 1 neighbors of each intact agent are intact [15].

Lemma 6 [20]. Consider an r-robust time-varying directed graph G(t). Then, the graph has a directed spanning tree if and only if G(t) is 1-robust.

The following theorem shows that the proposed resilient controller (2.66) guarantees synchronization despite attacks.

Theorem 6. Consider the DMAS (2.1) under attack with the proposed resilient control protocol ũ_i in (2.66). Let the time-varying graph G(t) be such that at each time instant t, Assumption 1 and Assumption 3 are satisfied. Then, lim_{t→∞} ||x_j(t) − x_i(t)|| = 0 ∀ i, j ∈ N_Int.

Proof. The DMAS (2.1) with the proposed resilient control protocol ũ_i in (2.66), in the absence of noise, can be written as

ẋ_i = A x_i + cBK Σ_{j∈N_i} ā_ij(t) ( x_j − x_i ),   (2.67)

with ā_ij(t) = Ω_ij(t) ξ_j(t) a_ij, where Ω_ij(t) and ξ_j(t) represent, respectively, the trust value in (2.64) and the self-belief of the neighboring agents in (2.61). The global form of the resilient system dynamics (2.67) becomes

ẋ = ( I_N ⊗ A − cL(t) ⊗ BK ) x,   (2.68)

where L(t) denotes the time-varying graph Laplacian matrix of the directed graph G(t). Based on Assumption 3, even if q neighbors of an intact agent are attacked and collude to send the same corrupted value to misguide it, there still exist q + 1 intact neighbors that communicate values different from the compromised ones. Moreover, since at least q + 1 of the intact agent's neighbors are intact, it can update its trust values to remove the compromised neighbors. Furthermore, since the time-varying graph G(t) resulting from isolating the compromised agents is 1-robust, based on Definition 8 and Lemma 6, the network remains connected through the intact agents. Therefore, there exists a spanning tree in the graph associated with all intact agents N_Int. Hence, as shown in [44], the solutions of the DMAS (2.68) reach consensus on the desired behavior if the time-varying graph G(t) jointly contains a spanning tree as the network evolves with time. This results in lim_{t→∞} ||x_j(t) − x_i(t)|| = 0 ∀ i, j ∈ N_Int asymptotically. This completes the proof. □

Remark 8. The proposed approach discards a compromised agent only when an attack is detected, in contrast to most existing methods, which rely solely on the discrepancy among agents. Note that a discrepancy can be the result of a legitimate change in the state of one agent.
Moreover, at the beginning of synchronization there can be a large discrepancy between agents' states that should not cause information to be discarded. □

2.7 Simulation Results

Consider a group of 5 homogeneous agents with the dynamics

ẋ_k = A x_k + B u_k,  k = 1, …, 5,   (2.69)

where A = [0 −1; 1 0] and B = [1, 1]^T. The communication graph is shown in Fig. 2.2. We assume zero-mean Gaussian communication noise with distribution N(0, 0.1).

Figure 2.2: Communication topology.

2.7.1 IMP-based attacks

Since the eigenvalues of A in (2.69) are λ_{1,2} = ±i, based on Definition 2 the attack signal f = 20 sin(t) is an IMP-based attack. Let this attack signal be injected into Agent 1 (a root node) at time t = 20 s. The results are shown in Fig. 2.3: the compromised agent destabilizes the entire network, consistent with Theorem 2. Fig. 2.4 shows that the same IMP-based attack on Agent 5 (a non-root node) cannot destabilize the entire network. However, Agent 4, which is the only agent reachable from Agent 5, does not synchronize to the desired consensus trajectory. Moreover, Fig. 2.5 shows that the local neighborhood tracking error converges to zero for all agents except the compromised Agent 5. These results are in line with Theorem 3. Fig. 2.6 shows that, under the IMP-based attack on Agent 5, the KL divergence grows for the corrupted agent, which follows Theorem 4. The effect of the attack is then rejected using the belief-based detection and mitigation approach of Theorems 4 and 6: Fig. 2.7 shows that the reachable agents follow the desired consensus trajectory even in the presence of the attack.

Figure 2.3: The state of agents when Agent 1 is under an IMP-based attack.
Figure 2.4: Agent 5 is under an IMP-based attack: the state of agents.
Figure 2.5: Agent 5 is under an IMP-based attack: the local neighborhood tracking error of agents.
Figure 2.6: Divergence for the state of agents when Agent 5 is under an IMP-based attack.
Figure 2.7: The state of agents using the proposed attack detection and mitigation approach for an IMP-based attack.
Figure 2.8: The state of agents when Agent 5 is under a non-IMP-based attack.

2.7.2 Non-IMP-based attacks

The attack signal is assumed to be f = 10 + 5 sin(2t). The effect of this attack on Agent 5 (a non-root node) is shown in Fig. 2.8: the non-IMP-based attack on Agent 5 only affects the reachable Agent 4. Fig. 2.9 then shows that the KL divergence grows for the compromised agent under the non-IMP-based attack on Agent 5. Fig. 2.10 shows that the effect of the attack is removed for the intact Agent 4 using the belief-based detection and mitigation approaches presented in Theorems 5 and 6.

Figure 2.9: Divergence for the state of agents when Agent 5 is under a non-IMP-based attack.
Figure 2.10: The state of agents after attack detection and mitigation for a non-IMP-based attack.
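The nominal part of this example is straightforward to reproduce numerically. A minimal sketch: since Fig. 2.2 and the gain K are not reproduced here, a hypothetical chain digraph and a hypothetical stabilizing gain are assumed, with the IMP-based attack f = 20 sin t injected into the root agent at t = 20 s:

```python
import numpy as np

A = np.array([[0.0, -1.0], [1.0, 0.0]])   # eigenvalues +/- i, as in (2.69)
B = np.array([1.0, 1.0])
K = np.array([0.5, 0.5])                  # hypothetical stabilizing gain
c, N, dt, T = 1.0, 5, 1e-3, 40.0

adj = np.zeros((N, N))
for k in range(1, N):
    adj[k, k - 1] = 1.0                   # chain: agent k listens to k-1

x = np.random.default_rng(2).normal(size=(N, 2))
for step in range(int(T / dt)):
    t = step * dt
    f = 20.0 * np.sin(t) if t >= 20.0 else 0.0   # IMP-based attack signal
    x_new = x.copy()
    for i in range(N):
        eta_i = sum(adj[i, j] * (x[j] - x[i]) for j in range(N))
        u_i = c * K @ eta_i + (f if i == 0 else 0.0)   # attack on the root
        x_new[i] = x[i] + dt * (A @ x[i] + B * u_i)
    x = x_new
print(np.linalg.norm(x, axis=1))   # norms grow without bound: the resonant
                                   # attack on the root propagates, Theorem 2
```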
2.8 Conclusion

A resilient control framework has been introduced for DMASs. First, the effects of IMP-based and non-IMP-based attacks on DMASs have been analyzed using a graph-theoretic approach. Then, a KL divergence-based criterion, using only the observed local information of agents, has been employed to detect attacks. Each agent detects its neighbors' misbehavior, consequently forming a self-belief about the correctness of its own information, and continuously updates this self-belief and communicates it to its neighbors to inform them about the significance of its outgoing information. Additionally, if the self-belief value of an agent is low, it forms beliefs about the type of its neighbors (intact or compromised) and, consequently, updates its trust in its neighbors. Finally, agents incorporate their neighbors' self-beliefs and their own trust values into their control protocols to slow down and mitigate attacks.

2.9 Appendix

Proof of Lemma 5. Using (2.38), the KL divergence between the error sequences φ_i and τ_i can be written as

D_KL(φ_i||τ_i) = E_1[ log P_{φ_i} − log P_{τ_i} ],   (2.70)

where the probability density functions P_{φ_i} and P_{τ_i} are defined in (2.37). Using (2.37), (2.70) and the logarithm property log(a + b) = log(a) + log(1 + b/a), one has

D_KL(φ_i||τ_i) = E_1[ log( (1/(√(2π)|σ_{1i}|)) e^{−(q_i−µ_{1i})²/2σ²_{1i}} ) − log( (1/(√(2π)|σ_{2i}|)) e^{−(q_i−µ_{2i})²/2σ²_{2i}} ) ]
  + E_1[ log( 1 + e^{−2q_iµ_{1i}/σ²_{1i}} ) − log( 1 + e^{−2q_iµ_{2i}/σ²_{2i}} ) ]
  = T_1 + T_2.   (2.71)

The first term T_1 in (2.71) is the KL divergence between two normal Gaussian distributions, which is given in [42] as

T_1 = (1/2) log( σ²_{2i}/σ²_{1i} ) − 1/2 + (1/2)( σ⁻²_{2i} σ²_{1i} ) + (1/2) σ⁻²_{2i} ( µ_{2i} − µ_{1i} )².   (2.72)

The second term T_2 in (2.71), using the power series expansion log(1 + a) = Σ_{n≥0} (−1)^n a^{n+1}/(n+1) and ignoring the higher-order terms, can be approximated as

T_2 ≈ E_1[ e^{−2q_iµ_{1i}/σ²_{1i}} − (1/2) e^{−4q_iµ_{1i}/σ²_{1i}} ] − E_1[ e^{−2q_iµ_{2i}/σ²_{2i}} − (1/2) e^{−4q_iµ_{2i}/σ²_{2i}} ],   (2.73)

which can be expressed as

T_2 ≈ ∫_{−∞}^{∞} P_{φ_i} e^{−2q_iµ_{1i}/σ²_{1i}} dq_i − (1/2) ∫_{−∞}^{∞} P_{φ_i} e^{−4q_iµ_{1i}/σ²_{1i}} dq_i − ∫_{−∞}^{∞} P_{φ_i} e^{−2q_iµ_{2i}/σ²_{2i}} dq_i + (1/2) ∫_{−∞}^{∞} P_{φ_i} e^{−4q_iµ_{2i}/σ²_{2i}} dq_i.   (2.74)

Now, the first term of T_2 can be written as

∫_{−∞}^{∞} P_{φ_i} e^{−2q_iµ_{1i}/σ²_{1i}} dq_i = ∫_{−∞}^{∞} (1/(√(2π)|σ_{1i}|)) e^{−(q_i+µ_{1i})²/2σ²_{1i}} dq_i + ∫_{−∞}^{∞} (1/(√(2π)|σ_{1i}|)) e^{−[(q_i+µ_{1i})²+4q_iµ_{1i}]/2σ²_{1i}} dq_i.   (2.75)

Using the fact that a density integrates to 1, (2.75) becomes

∫_{−∞}^{∞} P_{φ_i} e^{−2q_iµ_{1i}/σ²_{1i}} dq_i = 1 + e^{4µ²_{1i}/σ²_{1i}}.   (2.76)

Similarly, the second term of T_2 can be written as

−(1/2) ∫_{−∞}^{∞} P_{φ_i} e^{−4q_iµ_{1i}/σ²_{1i}} dq_i = −(1/(2√(2π)|σ_{1i}|)) ∫_{−∞}^{∞} ( e^{−[(q_i+3µ_{1i})²−8µ²_{1i}]/2σ²_{1i}} + e^{−[(q_i+5µ_{1i})²−24µ²_{1i}]/2σ²_{1i}} ) dq_i,   (2.77)

which yields

−(1/2) ∫_{−∞}^{∞} P_{φ_i} e^{−4q_iµ_{1i}/σ²_{1i}} dq_i = −(1/2)( e^{4µ²_{1i}/σ²_{1i}} + e^{12µ²_{1i}/σ²_{1i}} ).   (2.78)

The third term of T_2 is

−∫_{−∞}^{∞} P_{φ_i} e^{−2q_iµ_{2i}/σ²_{2i}} dq_i = −(1/(√(2π)|σ_{1i}|)) ∫_{−∞}^{∞} ( e^{−(q_i−µ_{1i})²/2σ²_{1i}} e^{−2q_iµ_{2i}/σ²_{2i}} + e^{−(q_i+µ_{1i})²/2σ²_{1i}} e^{−2q_iµ_{2i}/σ²_{2i}} ) dq_i,   (2.79)
which can be written in the form

−∫_{−∞}^{∞} P_{φ_i} e^{−2q_iµ_{2i}/σ²_{2i}} dq_i = −( e^{−(µ²_{1i}−ρ²_1)/2σ²_{1i}} ∫_{−∞}^{∞} (1/(√(2π)|σ_{1i}|)) e^{−(q_i−ρ_1)²/2σ²_{1i}} dq_i + e^{−(µ²_{1i}−ρ²_2)/2σ²_{1i}} ∫_{−∞}^{∞} (1/(√(2π)|σ_{1i}|)) e^{−(q_i−ρ_2)²/2σ²_{1i}} dq_i ),   (2.80)

where ρ_1 = µ_{1i} − 2µ_{2i}σ²_{1i}σ⁻²_{2i} and ρ_2 = µ_{1i} + 2µ_{2i}σ²_{1i}σ⁻²_{2i}, which becomes

−∫_{−∞}^{∞} P_{φ_i} e^{−2q_iµ_{2i}/σ²_{2i}} dq_i = −( e^{−(µ²_{1i}−ρ²_1)/2σ²_{1i}} + e^{−(µ²_{1i}−ρ²_2)/2σ²_{1i}} ).   (2.81)

Similarly, the last term of T_2 can be simplified as

(1/2) ∫_{−∞}^{∞} P_{φ_i} e^{−4q_iµ_{2i}/σ²_{2i}} dq_i = (1/2)( e^{−(µ²_{1i}−ρ²_3)/2σ²_{1i}} + e^{−(µ²_{1i}−ρ²_4)/2σ²_{1i}} ),   (2.82)

where ρ_3 = µ_{1i} − 4µ_{2i}σ²_{1i}σ⁻²_{2i} and ρ_4 = µ_{1i} + 4µ_{2i}σ²_{1i}σ⁻²_{2i}. Adding (2.76), (2.78), (2.81) and (2.82), T_2 can be written as

T_2 ≈ 1 + (1/2) e^{4µ²_{1i}/σ²_{1i}} ( 1 − e^{8µ²_{1i}/σ²_{1i}} ) − e^{−µ²_{1i}/2σ²_{1i}} [ e^{ρ²_1/2σ²_{1i}} + e^{ρ²_2/2σ²_{1i}} − (1/2)( e^{ρ²_3/2σ²_{1i}} + e^{ρ²_4/2σ²_{1i}} ) ].   (2.83)

Now, using (2.72)-(2.73) and (2.83), one obtains (2.39). This completes the proof. □

CHAPTER 3
DETECTION AND MITIGATION OF DATA MANIPULATION ATTACKS IN AC MICROGRIDS

3.1 Introduction

In the previous chapter, we analyzed the adverse effects of attacks and designed a resilient distributed control mechanism for DMASs with guaranteed performance and consensus under mild assumptions. This chapter validates the effectiveness of the approach developed in Chapter 2 by applying it to distributed frequency and voltage synchronization of AC microgrids under data manipulation attacks. The attack detection mechanism deploys Kullback-Leibler (KL) divergence to measure the discrepancy between the Gaussian distributions of the actual and expected local frequency/active power and voltage/reactive power neighborhood tracking errors. To mitigate the negative impact of an attack, a self-belief value, as an indication of the probability of the presence of attacks on the neighbors of an agent, is presented for each distributed energy resource (DER) by utilizing the KL-based detectors. The self-belief value is a measure of the trustworthiness of the DER's own outgoing information and is transmitted to the neighboring DERs. Moreover, the trustworthiness of the incoming information from neighboring DERs is estimated using a trust factor. Trust for individual DERs is developed based on the relative entropy between a DER's own information and its neighbors' information on the communication graph. The attack mitigation algorithm utilizes the self-belief and trust values to modify the distributed control protocols. Finally, the performance of the proposed resilient frequency and voltage control techniques is verified through simulation of a microgrid test system and a hardware-in-the-loop (HIL) setup using Opal-RT as a real-time digital simulator.

3.2 Preliminaries

The communication network of a microgrid can be modeled by a graph. DERs are considered the nodes of the communication graph, and the communication links are considered the edges. A graph is usually expressed as G = (V, E, A) with a nonempty finite set of N nodes V = {v_1, v_2, …, v_N}, a set of edges or arcs E ⊂ V × V, and the associated adjacency matrix A = [a_ij] ∈ R^{N×N}, where a_ij is the weight of edge (v_j, v_i), with a_ij > 0 if (v_j, v_i) ∈ E and a_ij = 0 otherwise. The set of neighbors of node i is denoted N_i = {j | (v_j, v_i) ∈ E}. The in-degree matrix is defined as D = diag{d_i} ∈ R^{N×N} with d_i = Σ_{j∈N_i} a_ij. The Laplacian matrix is defined as L = D − A [80].

Assumption 1. The communication graph G has a spanning tree.
3.3 Conventional Distributed Secondary Control

In the microgrid hierarchical control structure, the primary control level maintains the voltage and frequency stability of the microgrid, while the secondary control level restores the microgrid voltage and frequency to their nominal values. DERs are integrated into the rest of the microgrid through voltage source inverters (VSIs). Depending on the control objectives, DERs can be of two main types, namely grid-forming and grid-following. Grid-forming DERs utilize a voltage-controlled VSI (VCVSI) and have the capability of dictating the microgrid frequency and voltage. On the other hand, grid-following DERs utilize a current-controlled VSI (CCVSI) and follow the microgrid frequency and voltage while supplying a specified amount of active and reactive power based on external set points [54].

The primary control is locally implemented at grid-forming DERs by the droop technique. This technique prescribes a relation between the frequency ω_i and the active power, and between the voltage magnitude v_{o,mag i} and the reactive power. The frequency and voltage droop characteristics are

ω_i = ω_{ni} − m_{Pi} P_i,
v_{o,mag i} = V_{ni} − n_{Qi} Q_i,   (3.1)

where ω_{ni} and V_{ni} are the primary frequency and voltage control references, and m_{Pi} and n_{Qi} are the active and reactive power droop coefficients, respectively. Conventionally, the active power droop coefficients are selected proportionally to the apparent power ratings of the DERs, whereas the reactive power droop coefficients are selected proportionally to the maximum reactive power, which is calculated using a minimum allowable power factor and the apparent power rating of the DER [46]. The apparent power rating is related to the thermal rating of the DER equipment (e.g., power electronic switches).

The objective of distributed secondary control is to mitigate the microgrid frequency and voltage deviations from their nominal values caused by the primary control. Distributed secondary control utilizes distributed control protocols implemented on individual DERs that communicate with each other through a distributed communication network and share their local information with neighboring DERs.

Problem 1: The distributed secondary control chooses ω_{ni} and V_{ni} in (3.1) such that the operating frequency and terminal voltage magnitude of each DER synchronize to the reference frequency and voltage, ω_ref and v_ref, i.e.,

lim_{t→∞} ||ω_i(t) − ω_ref|| = 0,
lim_{t→∞} ||v_{o,mag i}(t) − v_ref|| = 0  ∀ i ∈ N.   (3.2)

Moreover, the secondary control should guarantee the allocation of the active and reactive power of DERs based on the droop coefficients [51]-[55] as

m_{Pi} P_i = m_{Pj} P_j,   (3.3)
n_{Qi} Q_i = n_{Qj} Q_j,   (3.4)

where the droop coefficients are chosen according to P_max i/Q_max i and P_max j/Q_max j, the active and reactive power ratings of the i-th and j-th DERs, respectively.
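The droop relation (3.1) is a static characteristic and is trivial to evaluate; a minimal sketch with hypothetical coefficients and loading:

```python
import math

def droop(w_n, v_n, m_p, n_q, p, q):
    """Primary droop (3.1): frequency/voltage sag proportional to the
    active power p and reactive power q supplied by the DER."""
    return w_n - m_p * p, v_n - n_q * q

# 60 Hz reference in rad/s; coefficients and operating point are hypothetical.
omega, v = droop(w_n=2 * math.pi * 60, v_n=380.0,
                 m_p=9.4e-5, n_q=1.3e-3, p=12e3, q=8e3)
print(omega / (2 * math.pi), v)   # loaded operating point droops below nominal
```

This droop is precisely the deviation that the distributed secondary control described next must remove by adjusting ω_ni and V_ni.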
The secondary control of a microgrid comprising N DERs is described as a synchronization problem for the following first-order multi-agent system that adjusts the primary control inputs:

ω̇_{ni} = v_{ωi},  V̇_{ni} = v_{vi},  i = 1, …, N,   (3.5)

where v_{ωi} and v_{vi} are the distributed secondary frequency and voltage control (DSFC and DSVC) protocols, which are chosen based on the local information of each DER and its neighbors' information and can be written as [54]

v_{ωi} = −c_ω δ_{ωi},   (3.6)
v_{vi} = −c_v δ_{vi},   (3.7)

where c_ω and c_v are the control gains, and δ_{ωi} and δ_{vi} are the local frequency and voltage neighborhood tracking errors given by

δ_{ωi} = Σ_{j∈N_i} a_ij ( ω_i − ω_j ) + g_i ( ω_i − ω_ref ) + Σ_{j∈N_i} a_ij ( m_{Pi} P_i − m_{Pj} P_j ),   (3.8)

δ_{vi} = Σ_{j∈N_i} a_ij ( v_{o,mag i} − v_{o,mag j} ) + g_i ( v_{o,mag i} − v_ref ) + Σ_{j∈N_i} a_ij ( n_{Qi} Q_i − n_{Qj} Q_j ).   (3.9)

The pinning gain g_i is assumed nonzero for only one DER.

Remark 1. Note that there always exists low-level communication noise in the network of DERs. Therefore, in the presence of communication noise, one can write the auxiliary controls v_{ωi} and v_{vi} of the i-th DER in (3.6) and (3.7) as

ζ_{ωi} = v_{ωi} + η_{ωi},  ζ_{vi} = v_{vi} + η_{vi},   (3.10)

where η_{ωi} ∼ N(0, Σ_{ωi}) and η_{vi} ∼ N(0, Σ_{vi}) denote, respectively, the aggregate Gaussian noise affecting the incoming neighbors' frequency and voltage information at the i-th DER. In general, the noise associated with electronic devices at the receiver end falls under the category of thermal noise and is statistically modeled as Gaussian; we therefore assume the communication noise to be Gaussian, which is a standard assumption in the literature [56]. In noisy scenarios, the synchronization problem for the microgrid frequency and voltage defined in Problem 1 changes to the mean-square synchronization problem

lim_{t→∞} E||ω_i(t) − ω_ref(t)||² = 0,
lim_{t→∞} E||v_{o,mag i}(t) − v_ref(t)||² = 0  ∀ i ∈ N.   (3.11)

3.4 Attack Modeling and Detection Mechanism

This section presents the attack modeling and detection mechanism for the distributed secondary control of the microgrid.

Definition 1 (Compromised DER). A DER that is directly under attack is called a compromised DER.

Definition 2 (Intact DER). A DER that is not compromised, i.e., not under direct attack, is called an intact DER.

3.4.1 Attack Modeling

For a direct attack on the controller, one can model the DER's frequency as

ω^cr_i = ω_i + γ_i ω^a_i,   (3.12)

with ω^a_i the attacker's input injected into the controller of the i-th DER and ω^cr_i the corrupted DER frequency, where the scalar γ_i equals 1 in the presence of an attack. Similarly, for an attack on the communication channel between two DERs, one can model the received corrupted frequency signal from the j-th DER as

ω^cr_j = ω_j + γ_j ω^a_j,   (3.13)

where ω^a_j represents the attacker's input injected into the communication channel between the two DERs and ω^cr_j denotes the corrupted frequency of neighbor j received at the i-th DER, with the scalar γ_j equal to 1 in the presence of an attack.

Remark 2. This subsection discusses the attack model in terms of the DER's frequency, which affects the auxiliary control v_{ωi} in (3.6). The rest of the chapter likewise considers frequency-based attacks and presents the corresponding attack detection and mitigation mechanisms. Without loss of generality, the same approach holds for attack modeling, detection, and mitigation for voltage-based attacks.
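The tracking error (3.8) uses only the DER's own measurements and those of its graph neighbors. A minimal sketch (hypothetical three-DER data, DER 0 pinned to the reference):

```python
import numpy as np

def frequency_tracking_error(i, omega, p, m_p, adj, g, omega_ref):
    """delta_omega_i in (3.8): frequency disagreement + pinning term +
    active-power-sharing mismatch over incoming neighbors."""
    n = len(omega)
    freq = sum(adj[i, j] * (omega[i] - omega[j]) for j in range(n))
    pin = g[i] * (omega[i] - omega_ref)
    power = sum(adj[i, j] * (m_p[i] * p[i] - m_p[j] * p[j]) for j in range(n))
    return freq + pin + power

adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
delta = frequency_tracking_error(0, omega=[376.80, 376.90, 377.00],
                                 p=[10e3, 12e3, 9e3], m_p=[1e-4] * 3,
                                 adj=adj, g=[1.0, 0.0, 0.0],
                                 omega_ref=2 * np.pi * 60)
print(delta)   # the DSFC (3.6) then applies v_omega_0 = -c_omega * delta
```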
Remark 3. The attack models in (3.12)-(3.13) represent frequency manipulation attacks on controllers. Due to the extensive deployment of communication and control technologies and the presence of intelligent electronic devices (IEDs), the microgrid control system is highly vulnerable to cyber-attacks. An attack tree for FDI threat analysis is provided to illustrate the attack path. As seen, an FDI attack can tamper with either the sensors (e.g., phasor measurement units (PMUs)) or the actuators (control and decision-making units). Such attacks can be launched by injecting counterfeit attack signals into the sensors of DER measurement units, by directly injecting a disturbance into the control units, or even by hijacking the entire controller. More specifically, FDI attacks on DERs can endanger microgrid voltage and frequency stability, slow down the DER control system responses, or overload DERs. Existing firewall/intrusion detection systems (IDSs) monitor and analyze the information flow in the network and detect whether there is a considerable change in it. However, no single IDS is able to detect all the different attack types [81]. Moreover, the effectiveness of IDSs highly depends on their parameters: if the IDS parameters are not fine-tuned, the possibility of missing attacks increases [81]. Furthermore, IDSs do not block corrupted information and cannot mitigate attacks. Therefore, it is of vital importance to design a resilient control protocol for microgrids that can mitigate attacks and ensure an acceptable level of functionality for the microgrid despite attacks.

3.4.2 Attack Detection Mechanism

This subsection presents a relative entropy-based attack detection approach for the distributed secondary control of the microgrid. More specifically, the KL divergence, a non-negative measure of the relative entropy between two probability distributions, is employed to measure the discrepancy between them.

Definition 3 (KL divergence) [41], [82]. Let X and Z be two random sequences with probability density functions P_X and P_Z, respectively. The KL divergence between P_X and P_Z in continuous time is defined as

D_KL(X||Z) = ∫ P_X(θ) log( P_X(θ) / P_Z(θ) ) dθ.   (3.14)

If the sequences X and Z are Gaussian distributed, then the KL divergence in (3.14) can be simplified in terms of the means and covariances of the sequences as [41]

D_KL(X||Z) = (1/2)( log( |Σ_Z| / |Σ_X| ) − n + tr( Σ⁻¹_Z Σ_X ) + (µ_Z − µ_X)^T Σ⁻¹_Z (µ_Z − µ_X) ),   (3.15)

where µ_X and Σ_X denote the mean and covariance of sequence X, and µ_Z and Σ_Z denote the mean and covariance of sequence Z. Moreover, n denotes the dimension of the sequences.

For the design of an attack detector, we first rewrite the frequency auxiliary control ζ_{ωi} in (3.10) with its statistical properties and then present an attack detection mechanism based on the KL divergence measure for the distributed secondary control of AC microgrids. We show that, in the presence of an attack, one can identify different sophisticated attacks based on the change in the statistical properties of the auxiliary control variables.
In the absence of attack, since we consider Gaussian noise in the communication channel, the auxiliary control ζ_{ωi} in (3.10) can be written as

ζ_{ωi} = −c_ω δ_{ωi} + η_{ωi},   (3.16)

where η_{ωi} denotes the aggregate Gaussian noise affecting the incoming neighbors' information, given by

η_{ωi} = Σ_{j∈N_i} a_ij η_{ωij} ∼ N(0, Σ_{ωi}).   (3.17)

Due to the presence of noise, the statistical properties of the auxiliary control ζ_{ωi} in (3.10) become ζ_{ωi} ∼ N(0, Σ_{ωi}), which represents the nominal behavior of the DSFC.

In the presence of attacks, using (3.10), the auxiliary control ζ^a_{ωi} becomes

ζ^a_{ωi} = −c_ω δ^cr_{ωi} + η_{ωi},   (3.18)

with the corrupted local neighborhood tracking error

δ^cr_{ωi} = δ_{ωi} + f_i,   (3.19)

where

f_i = ( Σ_{j∈N_i} a_ij + g_i ) ω^a_i − Σ_{j∈N_i} a_ij ω^a_j   (3.20)

denotes the overall deviation in the local neighborhood tracking error due to the attacks on the controllers/communication channels in the network. Note that, in the presence of attacks, one observes the corrupted frequency of the DERs and, based on this corrupted frequency, the corrupted auxiliary control ζ^a_{ωi}. The overall attacker's input f_i is neither measurable nor required to be known. The statistical properties of the corrupted control protocol change due to the effect of attacks. Now, from (3.18)-(3.20), one has

ζ^a_{ωi} ∼ N( µ_{f_i}, Σ_{f_i} + Σ_{ωi} ),   (3.21)

where µ_{f_i} and Σ_{f_i} are the mean and covariance of the injected overall attack signal f_i, respectively. Since both ζ^a_{ωi} and ζ_{ωi} have Gaussian distributions, according to (3.15) the KL divergence D_KL(ζ^a_{ωi}||ζ_{ωi}) between the control sequences becomes

D_KL(ζ^a_{ωi}||ζ_{ωi}) = (1/2)( log( |Σ_{ζωi}| / |Σ_{ζ^a_{ωi}}| ) − 1 + tr( Σ⁻¹_{ζωi} Σ_{ζ^a_{ωi}} ) + (µ_{ζωi} − µ_{ζ^a_{ωi}})^T Σ⁻¹_{ζωi} (µ_{ζωi} − µ_{ζ^a_{ωi}}) ),   (3.22)

where µ_{ζωi} and Σ_{ζωi} denote the mean and covariance of ζ_{ωi}, and µ_{ζ^a_{ωi}} and Σ_{ζ^a_{ωi}} denote the mean and covariance of ζ^a_{ωi}. We define the average KL divergence over a window T as

Ω_i = (1/T) ∫_k^{k+T−1} D_KL(ζ^a_{ωi}||ζ_{ωi}) dτ   (3.23)

to detect the change due to the adversarial input. The following theorem shows that the effect of attacks on the distributed secondary control of the microgrid can be detected based on the discrepancy between the control sequences ζ^a_{ωi} and ζ_{ωi}.

Theorem 1. Consider the distributed auxiliary control ζ_{ωi} in (3.16) under attacks. Then,
a) Ω_i defined in (3.23) becomes zero if there is no attack on the DERs;
b) Ω_i defined in (3.23) is greater than a design threshold γ_i if the microgrid secondary control is under attack.

Proof. In the absence of attacks, the statistical properties of the sequences ζ^a_{ωi} and ζ_{ωi} in (3.18) and (3.16) are the same, because µ_{f_i} and Σ_{f_i} become zero as f_i = 0. Therefore, the KL divergence D_KL(ζ^a_{ωi}||ζ_{ωi}) in (3.22) becomes zero based on (3.15), which yields Ω_i = 0 in (3.23). This completes the proof of part (a).
For the proof of part (b), using (3.18)-(3.21) in (3.22), the KL divergence between ζ^a_{ωi} and ζ_{ωi} becomes

D_KL(ζ^a_{ωi}||ζ_{ωi}) = (1/2)( log( |Σ_{ωi}| / |Σ_{f_i} + Σ_{ωi}| ) + tr( Σ⁻¹_{ωi} Σ_{f_i} ) + µ^T_{f_i} Σ⁻¹_{ωi} µ_{f_i} ).   (3.24)

Then, using (3.23), one has

Ω_i = (1/T) ∫_k^{k+T−1} (1/2)( log( |Σ_{ωi}| / |Σ_{f_i} + Σ_{ωi}| ) + tr( Σ⁻¹_{ωi} Σ_{f_i} ) + µ^T_{f_i} Σ⁻¹_{ωi} µ_{f_i} ) dτ > γ_i,   (3.25)

where T and γ_i denote the sliding window size and the predefined positive design threshold, respectively. This completes the proof. □

Based on Theorem 1, the effect of attacks on the distributed secondary control of microgrids can be detected using the predefined design threshold γ_i. The attack detection in (3.25) uses the idea of averaging over a fixed-length moving window to avoid false detection. If there is a short-period anomaly rather than an attack (such as a disturbance or packet dropout), it vanishes within a few time steps, and such anomalies are not detected as attacks.

3.5 Resilient Distributed Control Mechanism

This section presents a resilient distributed control mechanism for the distributed secondary control of microgrids, based on the attack detection algorithm proposed in the previous section.

Figure 3.1: The flowchart of the proposed attack detection and mitigation approach.

To this end, we first introduce the notions of self-belief and external-belief of DERs about the trustworthiness of their own information and of their neighbors' information, respectively. Then the presented beliefs are incorporated into the distributed secondary control protocols.

3.5.1 Belief of DERs About Their Own Observed Frequency

To measure the level of trustworthiness of each DER about its own observed frequency, which depends on the proximity to the source of the attack in the network, a self-belief is presented. In the presence of an adversary, a DER reduces its level of trustworthiness about its own observed frequency and transmits its self-belief to its immediate neighbors, which prevents the propagation of the attack in the microgrid.

Using D_KL(ζ^a_{ωi}||ζ_{ωi}) from Theorem 1, the self-belief of the i-th DER about its own observed frequency is defined as

I^Bel_i(t) = κ_1 ∫_0^t e^{κ_1(τ−t)} ψ_i(τ) dτ,   (3.26)

where 0 ≤ I^Bel_i(t) ≤ 1, with

ψ_i(t) = Δ_1 / ( Δ_1 + D_KL(ζ^a_{ωi}||ζ_{ωi}) ),   (3.27)

where Δ_1 represents a threshold to account for channel fading and other uncertainties and 0 < κ_1 < 1 denotes the discount factor. Equation (3.26) can be implemented by the differential equation

İ^Bel_i(t) + κ_1 I^Bel_i(t) = κ_1 ψ_i(t).   (3.28)

Based on Theorem 1, in the presence of attacks D_KL(ζ^a_{ωi}||ζ_{ωi}) ≫ Δ_1, which makes ψ_i(t) close to zero and, consequently, the value of I^Bel_i(t) close to zero. On the other hand, based on Theorem 1, in the absence of attack D_KL(ζ^a_{ωi}||ζ_{ωi}) tends to zero, which makes ψ_i(t) close to one and, consequently, I^Bel_i(t) close to one.

If a DER is under direct attack, its self-belief tends to zero according to (3.26). The DER transmits its self-belief value to the neighboring DERs. Using the received self-belief values, the neighboring DERs ignore the information received from the attacked DER, which prevents attack propagation. Note that the discount factor in (3.26) weighs the importance of current information relative to past information.
3.5.2 Belief of DERs About Their Neighbors' Observed Frequency

To evaluate the level of confidence of a DER in its neighbors' observed frequencies, we introduce the notion of external-belief, or trust. If the self-belief value of a DER is low, it forms beliefs on its neighboring DERs' information (either intact or compromised) and updates its external-belief, which depends on the belief about each of its neighbors, using only local information. Therefore, the DERs can identify a compromised neighbor and discard its information in their control protocols. In the worst-case scenario, a compromised DER always transmits a self-belief value of 1 to its neighbors to deceive them; based on the external-belief, a DER can still identify the corrupted neighbors and discard their information. Using the KL divergence between the exchanged information of the i-th DER and its neighbors, one can define Υ_ij(t) as

Υ_ij(t) = κ_2 ∫_0^t e^{κ_2(τ−t)} χ_ij(τ) dτ,   (3.29)

where 0 ≤ Υ_ij(t) ≤ 1 and

χ_ij(t) = ∆_2 / (∆_2 + D_KL(ω_i || m_i)),   ∀j ∈ N_i,   (3.30)

with m_i = (1/|N_i|) Σ_{j∈N_i} ω_j; ∆_2 > 0 represents the threshold accounting for channel fading and other uncertainties, and 0 < κ_2 < 1 denotes the discount factor. For a neighboring DER under direct attack, the KL divergence D_KL(ω_i || m_i) becomes large, which makes χ_ij(t) close to zero. Consequently, the value of Υ_ij(t) becomes close to zero. On the other hand, if the incoming information from the neighboring DER is intact, then D_KL(ω_i || m_i) becomes close to zero, which makes χ_ij(t) close to one. Equation (3.29) can be implemented using the following differential equation

Υ̇_ij(t) + κ_2 Υ_ij(t) = κ_2 χ_ij(t).   (3.31)

Now, we define the external-belief value of a DER about its neighbors as

E^Bel_ij(t) = min(I^Bel_j(t), Υ_ij(t)),   (3.32)

with 0 ≤ E^Bel_ij(t) ≤ 1, where I^Bel_j(t) is the self-belief value received from neighbor j. Note also that the discount factors in (3.26) and (3.29) determine how much we value the current experience relative to past experiences. They also guarantee that if the attack is not persistent and disappears after a while, or if a short-period anomaly rather than an attack (such as a disturbance or packet dropout) occurs, the belief will be recovered, as it mainly depends on the current circumstances.

3.5.3 The Mitigation Mechanism Using Self and External-belief Values

This subsection presents a resilient, i.e., cyber-secure, auxiliary control protocol for the secondary control of the microgrid. We employ the entropy-based self and external-belief values in the mitigation algorithm (see Fig. 3.1). More specifically, both the self and external-belief values in (3.26) and (3.32) are incorporated into the frequency-based auxiliary control in (3.10), and the resilient form is presented as

ζ_ωi = −c_ω ( Σ_{j∈N_i} α_ij(t)(ω_i − ω_j) + g_i(ω_i − ω_ref) + Σ_{j∈N_i} α_ij(t)(m_Pi P_i − m_Pj P_j) ) + η_ωi,   (3.33)

where

α_ij(t) = a_ij I^Bel_i(t) E^Bel_ij(t)   (3.34)

incorporates the self and external-beliefs discussed in the previous subsections.
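A sketch of how the belief values reshape the coupling weights in (3.33)-(3.34) is given below; the noise term η_ωi is omitted for clarity, and all names, gains, and data are illustrative assumptions. Each weight a_ij is scaled by the self-belief and the external belief, so links to distrusted neighbors are smoothly pruned rather than hard-switched.

```python
import numpy as np

def resilient_weights(A_adj, I_bel, E_bel):
    """Belief-weighted adjacency (3.34): alpha_ij = a_ij * I_bel_i * E_bel_ij."""
    N = A_adj.shape[0]
    return A_adj * np.outer(I_bel, np.ones(N)) * E_bel

def resilient_dsfc(omega, P_ratio, A_adj, g, omega_ref, I_bel, E_bel, c_omega=40.0):
    """Resilient frequency auxiliary control (3.33), noise term omitted.
    omega: DER frequencies; P_ratio: active power ratios m_Pi * P_i."""
    alpha = resilient_weights(A_adj, I_bel, E_bel)
    zeta = np.zeros_like(omega)
    for i in range(len(omega)):
        zeta[i] = -c_omega * (alpha[i] @ (omega[i] - omega)          # frequency term
                              + g[i] * (omega[i] - omega_ref)        # pinning term
                              + alpha[i] @ (P_ratio[i] - P_ratio))   # power-sharing term
    return zeta
```

When a neighbor's external belief drops toward zero, its coupling weight α_ij fades out continuously, so the intact DERs keep using the rest of the network without any abrupt topology switch.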
The following theorem solves Problem 1 using the proposed resilient auxiliary control protocol in (3.33) for intact DERs in the presence of attacks.

Assumption 2 (m-local connectivity). If at most m neighbors of each intact DER are under attack, at least m + 1 neighbors of each intact DER are intact [82].

Remark 4. Assumption 2 is a common assumption in the distributed control literature [16], [82]. This assumption provides a minimum requirement for any distributed system to ensure consensus in the presence of attacks.

Theorem 2. Consider the resilient DSFC in (3.33). Let Assumptions 1 and 2 be satisfied. Then, the frequency of the intact DERs synchronizes to the desired nominal frequency in the mean-square sense, despite the m compromised DERs.

Proof. The resilient frequency-based secondary control in (3.33) can be rewritten as

ζ_ωi = −c_ω ( Σ_{j∈N_i} α_ij(t)(ω_i − ω_j) + g_i(ω_i − ω_ref) + Σ_{j∈N_i} α_ij(t)(m_Pi P_i − m_Pj P_j) ) + η_ωi,   (3.35)

where the weight α_ij(t) defined in (3.34) combines the self-belief of agent i and its external belief about agent j. The global form of (3.35) becomes

ζ_ω = −c_ω ( (L(t) + G)(ω − ω̄_ref) + L(t) P̄ ) + η_ω,   (3.36)

where ω = [ω_1, ..., ω_N]^T, ω̄_ref = 1_N ⊗ ω_ref, η_ω = [η_ω1, ..., η_ωN]^T, ζ_ω = [ζ_ω1, ..., ζ_ωN]^T, and P̄ = [m_P1 P_1, ..., m_PN P_N]^T. Moreover, L(t) ∈ R^{N×N} and G ∈ R^{N×N} denote the graph Laplacian matrix and the diagonal gain matrix, with diagonal entries equal to the pinning gains g_i, respectively.

According to Assumption 2, the total number of compromised agents is less than half of the network connectivity, i.e., the connectivity is at least 2m + 1. Therefore, even if m neighbors of an intact DER are attacked and collude to transmit the same value to mislead the intact DER, there still exist m + 1 intact neighbors that transmit the actual values, which differ from the compromised ones. Moreover, since m + 1 of the intact DER's neighbors are intact, it can update its external belief and isolate the compromised neighbors. As shown in [20], the graph obtained after isolating the compromised DERs remains connected over the intact DERs. Therefore, there exists a spanning tree in the graph associated with all intact DERs. On the other hand, it is shown in [83]-[84] that distributed agents reach mean-square consensus in the presence of Gaussian noise if the graph contains a spanning tree. Thus, under the resilient DSFC in (3.33), the intact DERs synchronize to the nominal frequency, i.e., the leader's state. This completes the proof. □

Remark 5. Note that even in the presence of replay attacks, where the attacker replicates all the statistical characteristics of previous control signals for the DER, the intact DERs lose their trust in the compromised DERs due to the divergence term in the calculation of the external-belief in (3.30), and reject the corrupted information in their control protocols.

Remark 6. Although not considered in this chapter, the proposed cyber-secure distributed secondary control can be effectively integrated into event-triggered distributed controls (e.g., [64]) to increase the resilience of the control system with respect to both FDI and DoS attacks.

Figure 3.2: Single line diagram of the microgrid test system in Case A.

Figure 3.3: Communication graph of the microgrid test system in Case A.

3.6 Case Studies

3.6.1 Case A: Simulation results for IEEE 34-bus feeder

The microgrid test system is illustrated in Fig. 3.2. The IEEE 34-bus test feeder is utilized as the backbone of the microgrid, with six DERs integrated at different locations. This microgrid system is simulated in MATLAB/Simulink. The specification of the lines is provided in [85]. A balanced feeder model, obtained by averaging the line parameters, is utilized in the test system. Tables I and II summarize the specifications of the loads and DERs, respectively.
The nominal frequency and line-to-line voltage are set to 60 Hz and 24.9 kV, respectively. The DERs are connected to the feeder through six Y-Y, 480 V/24.9 kV, 400 kVA transformers with a series impedance of 0.03 + j0.12 pu. The communication graph of the distributed secondary control system is depicted in Fig. 3.3. Only DER 1 knows the frequency and voltage reference values, with the pinning gain g_1 = 1. The control gains c_ω and c_v in (3.6) and (3.7) are set to 40. We assume zero-mean Gaussian communication noise with distribution N(0, 0.01). Two different cases are considered to evaluate the presented results for attack detection and mitigation in the distributed secondary control of microgrids: Case A.1 analyzes the results for the DSFC, and Case A.2 presents the results for the DSVC, in the presence of attacks in the microgrid.

Case A.1.1 (effect of attack on the conventional DSFC): In this case, we consider an attack on DER 6 based on (3.12). At t = 0, the microgrid is islanded from the main grid. From t = 0 to t = 0.6 s only the primary control is applied. The primary control takes action to provide frequency stability in the islanded microgrid. However, the primary control only maintains the frequency in stable ranges and cannot maintain the frequency at exactly 60 Hz. Then, the secondary distributed frequency control is applied at t = 0.6 s to restore the microgrid frequency to 60 Hz. However, the attacker hijacks the DSFC of DER 6 and replaces the actual frequency with 60.2 Hz. Fig. 3.4 shows that the conventional DSFC protocol leads to the loss of the desired consensus. The frequency of each DER deviates from the desired frequency of 60 Hz and shows oscillatory behavior. In the presence of the attack, the behavior of the compromised DER 6 is directly affected by the attack signal, and its corrupted frequency is observed by the reachable intact DERs, which are affected by it and also show oscillatory behaviors, as shown in Fig. 3.4. Fig. 3.5 clearly shows that the relative entropy of the compromised and reachable DERs diverges and goes beyond the predefined design threshold (assumed to be γ_i = 5 ∀i) in the presence of the attack. The relative entropy of the compromised DER is much higher than that of the intact DERs, and the designed detector can easily detect the effect of the attack.

Figure 3.4: Case A: Effect of attack on DSFC: (a) frequency; (b) active power ratio.

Figure 3.5: Case A: Relative entropy based on frequency of DERs.

Figure 3.6: Case A: Resilient DSFC: (a) frequency; (b) active power ratio.

Figure 3.7: Case A: Resilient DSFC: (a) relative entropy; (b) self-beliefs of DERs.

Case A.1.2 (attack detection and mitigation): Similar to Case A.1.1, at t = 0, the microgrid is islanded from the main grid. From t = 0 to t = 0.6 s only the primary control is applied. Then, the secondary distributed frequency control is applied at t = 0.6 s to restore the microgrid frequency to 60 Hz. However, the attacker hijacks the DSFC of DER 6 and replaces the actual frequency with 60.2 Hz at t = 0.6 s, and the designed attack detection and mitigation mechanism is applied at t = 0.7 s. As shown in Fig. 3.6, the frequency of the intact DERs is restored to 60 Hz after applying the attack mitigation mechanism at t = 0.7 s. The active power of all DERs is also retrieved as in the intact mode.
After applying the resilient DSFC in (3.33), the intact DERs discard the frequency value received from the corrupted DER, and the mean and variance of their local frequency neighborhood tracking error distributions remain close to the normal case. Therefore, based on (3.22), the relative entropy for the intact DERs remains close to zero, but it keeps growing for the compromised DER 6 due to the deviation of the mean and variance of the corrupted frequency signal from the nominal one, as shown in Fig. 3.7. According to (3.26)-(3.27), the self-belief of a DER depends on its relative entropy, and one can see in Fig. 3.7 that the self-belief becomes one for all DERs except the compromised DER 6, which indicates that all the DERs are confident about their frequencies, except for the compromised one. The self-belief of a DER measures the level of trustworthiness of its observed frequency; it is updated at each iteration and used recursively in the resilient DSFC (3.33) for mitigation of the attack. Based on the presented resilient DSFC, the intact DERs do not incorporate the corrupted frequency from DER 6 and achieve the desired synchronization, as shown in Fig. 3.6.

Case A.1.3 (attack detection and mitigation for periodic adversaries): In this subsection, the effectiveness of the presented attack detection and mitigation algorithm is validated for periodic attacks. The secondary distributed frequency control is applied at t = 0 s, which synchronizes the frequency of the microgrid to 60 Hz. Then, the attacker hijacks the DSFC of DER 6 and replaces the actual frequency with 60.2 Hz at t = 1.2 s and t = 2.2 s. In the following, the simulation results are provided for two different attack durations.

First, it is assumed that when the attack is applied at t = 1.2 s and t = 2.2 s, it is only effective for 0.05 s. Fig. 3.8 shows the DER frequencies and active power ratios. As seen in Fig. 3.8, due to the short duration of the attack, its impact is minimal; the DER frequencies only slightly deviate from 60 Hz. Based on (3.22), the relative entropy for the intact DERs remains close to zero, but it keeps growing for the compromised DER 6 while the attack is effective, due to the deviation of the mean and variance of the corrupted frequency signal from the nominal one, as shown in Fig. 3.9. According to (3.26)-(3.27), the self-belief of a DER depends on its relative entropy; one can see in Fig. 3.9 that the self-belief values during the attack period are one for all DERs except the compromised DER 6, which indicates that all the DERs are confident about their exchanged frequencies, except for the compromised one. As expected, during the intervals in which the attacker turns off its attack signal, the DER frequencies and active power ratios are restored to their intact values before the attack is applied.

Figure 3.8: Effect of periodic attack on DSFC with 0.05 s duration: (a) frequency; (b) active power ratio.

Figure 3.9: Effect of periodic attack on DSFC with 0.05 s duration: (a) relative entropy; (b) self-beliefs of DERs.

In the second simulation scenario, it is assumed that when the attack is applied at t = 1 s and t = 3 s, it is effective for 0.5 s. Fig. 3.10 shows the DER frequencies and active power ratios. As seen in Fig. 3.10, after the attack is applied, the DER frequencies deviate from 60 Hz and the active power ratios experience noticeable oscillations. The relative entropy for the intact DERs remains close to zero, but it keeps growing for the compromised DER 6 during the attack intervals, as shown in Fig. 3.11(a).
As seen in Fig. 3.11(b), the self-belief during the attack periods becomes one for all DERs except the compromised DER 6. The attack mitigation scheme restores the DER frequencies to 60 Hz and the active power ratios to a common value. During the intervals in which the attacker turns off its attack signal, the DER frequencies and active power ratios are restored to their intact values before the attack is applied.

Figure 3.10: Effect of periodic attack on DSFC with 0.5 s duration: (a) frequency; (b) active power ratio.

Figure 3.11: Effect of periodic attack on DSFC with 0.5 s duration: (a) relative entropy; (b) self-beliefs of DERs.

Figure 3.12: Effect of attack on DER 2 in DSVC: (a) voltage (V); (b) reactive power ratio.

Case A.2.1 (effect of attack on the conventional DSVC): In this case, we consider an attack on DER 6 based on (3.12). From t = 0 to t = 0.65 s only the primary control is applied, and then the attacker hijacks the DSVC of DER 6 and replaces the actual voltage with 482 V at t = 0.65 s. In the presence of the attack, the conventional DSVC leads to the loss of the desired consensus, as shown in Fig. 3.12(a) and Fig. 3.12(b). The voltage and reactive power ratio of each DER deviate from the desired consensus and show an oscillatory response. The corrupted voltage magnitude of DER 6 is directly observed by the reachable intact DERs, which are affected by it. The reachable neighboring DERs also show oscillatory behavior in their operating voltage and reactive power, as shown in Fig. 3.12(a). This makes the relative entropy of the compromised and reachable DERs diverge and go beyond the predefined design threshold of γ_i = 5 ∀i in the presence of the attack, as shown in Fig. 3.13.

Figure 3.13: Case A: Relative entropy based on voltage of DERs.

Figure 3.14: Case A: Resilient DSVC: (a) voltage (V); (b) reactive power ratio.

Case A.2.2 (attack detection and mitigation on DSVC): In this case, we consider an attack on DER 6. From t = 0 to t = 0.65 s only the primary control is applied, and then the attacker hijacks the DSVC of DER 6 and replaces the actual voltage with 482 V at t = 0.65 s. As shown in Fig. 3.14(a) and Fig. 3.14(b), the voltages of all DERs except the hijacked one synchronize to 480 V after applying the mitigation mechanism at t = 0.7 s. The reactive power of the DERs is also shared based on their ratings. Fig. 3.15(a) shows that the relative entropy for the intact DERs remains close to zero, but it keeps growing for the compromised DER 6 due to the deviation of the corrupted voltage from the nominal one; consequently, as shown in Fig. 3.15(b), the self-belief becomes one for all DERs except the compromised DER 6.

Figure 3.15: Case A: Resilient DSVC: (a) relative entropy; (b) self-beliefs of DERs.

3.6.2 Case B: Simulation results for an Islanded Microgrid with 20 DERs

Case B verifies the validity of the proposed control techniques on a 60 Hz, 480 V microgrid test system with 20 DERs. The single-line diagram of this microgrid test system is illustrated in Fig. 3.16. This test system is simulated in MATLAB/Simulink. The specifications of the DERs are listed in Table III. Line and load specifications are shown in Table IV. The communication network graph is depicted in Fig. 3.17. The frequency reference value is shared with DER 1, with the pinning gain g_1 = 1. ω_ref is set to 2π × 60 rad/s. The control gain c_ω is set to 40. We assume zero-mean Gaussian communication noise with distribution N(0, 0.01).
This system is used to validate the proposed attack detection and mitigation schemes considering the DSFC.

Case B.1 (effect of attack on the conventional DSFC): We consider an attack on DER 20. At t = 0, the microgrid is islanded from the main grid. From t = 0 to t = 0.7 s only the primary control is applied. The primary control takes action to provide frequency stability in the islanded microgrid. However, the primary control only maintains the frequency in stable ranges and cannot maintain the frequency at exactly 60 Hz. Then, the secondary distributed frequency control is applied at t = 0.7 s to restore the microgrid frequency to 60 Hz. However, the attacker hijacks the DSFC of DER 20 and replaces the actual frequency with 60.2 Hz. Fig. 3.18 shows that the conventional DSFC protocol leads to the loss of the desired consensus. The frequency of each DER deviates from the desired frequency of 60 Hz and shows oscillatory behavior. In the presence of the attack, the behavior of the compromised DER 20 is directly affected by the attack signal, and its corrupted frequency is shared with the neighboring DERs. This causes an oscillatory behavior in the neighboring DERs, as shown in Fig. 3.18(a). Fig. 3.19 shows that the relative entropy of the compromised and neighboring DERs diverges, due to the deviation of their behavior from the nominal one, and goes beyond the predefined design threshold, which is assumed to be γ_i = 5 ∀i.

Figure 3.16: Microgrid testbed with 20 DERs.

Figure 3.17: Communication graph of the microgrid testbed in Case B.

Figure 3.18: Case B: Effect of attack on DSFC: (a) frequency; (b) active power ratio.

Figure 3.19: Case B: Relative entropy based on frequency of DERs.

Figure 3.20: Case B: Resilient DSFC: (a) frequency; (b) active power ratio.

Figure 3.21: Case B: Resilient DSFC: (a) relative entropy; (b) self-beliefs of DERs.

Case B.2 (attack detection and mitigation): Similar to Case B.1, at t = 0, the microgrid is islanded from the main grid. From t = 0 to t = 0.7 s only the primary control is applied. Then, the secondary distributed frequency control is applied at t = 0.7 s. The attacker hijacks the DSFC of DER 20 and replaces the actual frequency with 60.2 Hz at t = 0.7 s, and the designed attack detection and mitigation mechanism is applied at t = 0.75 s. As shown in Fig. 3.20, the frequency of the intact DERs is restored to 60 Hz after applying the attack mitigation mechanism at t = 0.75 s. The active power of all DERs is also retrieved as in the intact mode. After applying the resilient DSFC, the intact DERs discard the locally observed frequency of the corrupted DER. Therefore, the relative entropy for the intact DERs remains close to zero, but it keeps growing for the compromised DER 20, as shown in Fig. 3.21(a). Fig. 3.21(b) shows that the self-belief becomes one for all DERs except the compromised DER 20.

3.6.3 Case C: Experimental verification of proposed techniques using a hardware-in-the-loop testing setup

To experimentally validate the performance of the proposed attack detection and mitigation techniques, a hardware-in-the-loop (HIL) laboratory testbed is developed using Opal-RT as a real-time digital simulator together with Raspberry Pi modules. A microgrid testbed including four DERs is simulated in Opal-RT. The specifications of the DERs, loads, and lines are summarized in Table V. It is assumed that the DERs communicate with each other through the communication graph in Fig. 3.22. The nominal operating voltage and frequency of the microgrid test system are 480 V and 60 Hz, respectively.
The frequency reference value is shared with DER 1, with the pinning gain g_1 = 1. ω_ref is set to 2π × 60 rad/s. The control gain c_ω is set to 40. We assume zero-mean Gaussian communication noise with distribution N(0, 0.01). As seen in Fig. 3.22, four Raspberry Pi modules are utilized in the HIL testing. Each Raspberry Pi module hosts the cyber-secure DSFC protocol for one DER. The Raspberry Pi modules communicate with each other through a distributed communication network. The HIL setup, including Opal-RT, the Raspberry Pi modules, a Gigabit Ethernet switch, and a host computer, is shown in Fig. 3.23. The microgrid electric circuit, including DERs, loads, lines, and primary controllers, is modelled in RT-LAB. The DER local measurements, including the voltage, frequency, and active/reactive power measurements, are sent to the corresponding Raspberry Pi module through the User Datagram Protocol (UDP). Each Raspberry Pi module runs three processes in parallel: receiving real-time DER measurements and sending secondary control references to the DERs, communicating with the neighboring DERs' Raspberry Pi modules, and running the secondary control protocol together with the attack detection and mitigation techniques.

Case C.1 (effect of attack on the conventional DSFC): We consider an attack on DER 2. At t = 0, the microgrid is islanded from the main grid. From t = 0 to t = 30 s only the primary control is applied. Then, the secondary distributed frequency control is applied at t = 30 s to restore the microgrid frequency to 60 Hz. However, the attacker hijacks the DSFC of DER 2 and replaces the actual frequency with 66 Hz. Fig. 3.24(a) and Fig. 3.24(b) show that the conventional DSFC protocol leads to the loss of the desired consensus. The frequency of each DER deviates from the desired frequency of 60 Hz and shows oscillatory behavior. In the presence of the attack, the behavior of the compromised DER 2 is directly affected by the attack signal, and its corrupted frequency is shared with the neighboring DERs. This causes an oscillatory behavior in the neighboring DERs. Fig. 3.25 shows that the relative entropy of the compromised and neighboring DERs diverges due to the deviation of their behavior from the nominal one.

Figure 3.22: Microgrid test system for HIL testing.

Figure 3.23: HIL setup.

Figure 3.24: Case C: Effect of attack on DSFC: (a) frequency; (b) active power ratio.

Figure 3.25: Case C: Relative entropy based on frequency of DERs.

Figure 3.26: Case C: Resilient DSFC: (a) frequency; (b) active power ratio.

Case C.2 (attack detection and mitigation): At t = 0, the microgrid is islanded from the main grid. From t = 0 to t = 30 s only the primary control is applied. Then, the secondary distributed frequency control is applied at t = 30 s. The attacker hijacks the DSFC of DER 2 and replaces the actual frequency with 66 Hz at t = 30 s, and the designed attack detection and mitigation mechanism is applied at the same time. As shown in Fig. 3.26(a) and Fig. 3.26(b), the frequency of the intact DERs is restored to 60 Hz after applying the attack mitigation mechanism. The active power of all DERs is also retrieved as in the intact mode.

3.6.4 Conclusion

This chapter addresses the effects of data manipulation attacks on the distributed secondary frequency and voltage control of AC microgrids. An information-theoretic approach is employed for the design of the detection and mitigation mechanism.
Each DER detects the misbehavior of its neighbors on the distributed communication network and, consequently, calculates a belief reflecting the trustworthiness of the received information. It is shown that, using the proposed cyber-secure approach, a DER can distinguish data manipulation attacks from legitimate events and only discards the information received from a neighbor if that neighbor is compromised. The proposed approach is guaranteed to work under mild communication graph connectivity requirements.

CHAPTER 4

ATTACK ANALYSIS AND RESILIENT CONTROL DESIGN FOR DISCRETE-TIME DISTRIBUTED MULTI-AGENT SYSTEMS

4.1 Introduction

In this chapter, we relax some of the network connectivity assumptions made for the resilient control designs in Chapters 2 and 3, and present a distributed adaptive resilient mechanism to mitigate attacks on sensors and actuators. We first describe, supported by analysis, the adverse effects of cyber-physical attacks on the consensus of the DMAS. Specifically, we show how an attack on a compromised agent can propagate and even destabilize the entire network. Conditions under which the network becomes unstable are provided. We also show that the local neighborhood tracking error of agents becomes zero under specific types of attacks while the agents are far from synchronization. Therefore, existing robust control approaches, such as H∞ designs that aim at minimizing the local neighborhood tracking error, can no longer mitigate these types of attacks. Then, based on the results of the attack analysis, to mitigate the effect of attacks on sensors and actuators, an observer-like anomaly detector is first designed, which provides the expected normal behavior of the agents when there is no attack. An adaptive attack compensator is then designed and augmented with the controller to mitigate attacks without discarding information from compromised or unattacked neighbors. We show that the consensus error is uniformly bounded under the proposed controller in the presence of attacks, and that the bound can be made arbitrarily small.

4.2 Preliminaries

4.2.1 Graph Theory

A directed graph G consists of a pair (V, E), in which the set of nodes and the set of edges are represented by V = {v_1, ..., v_N} and E ⊂ V × V, respectively. The adjacency matrix is defined as A = [a_ij], with a_ij > 0 if (v_j, v_i) ∈ E. The set of nodes v_j with edges incoming to node v_i is called the set of neighbors of node v_i, namely N_i = {v_j : (v_j, v_i) ∈ E}. The graph Laplacian matrix is defined as L = H − A, where H = diag(h_i) is the in-degree matrix, with h_i = Σ_{j∈N_i} a_ij as the weighted in-degree of node i. A node is called a root node if it can reach all other nodes of the graph G through directed paths. A directed tree is an acyclic digraph with a root node, such that any other node of the digraph can be reached by one and only one directed path starting at the root node. A graph is said to have a spanning tree if a subset of its edges forms a directed tree. Throughout this chapter, λ(.) represents the eigenvalues of a matrix, (.)^adj refers to the adjugate of a matrix, and ker(.) denotes the null space. Furthermore, λ_max(.) and λ_min(.) represent the maximum and minimum eigenvalues of a matrix, respectively, and diag(.) denotes a diagonal matrix. If A is an m × n matrix, with a_ij being its entry in the i-th row and j-th column, and B is a p × q matrix, then the Kronecker product A ⊗ B is the mp × nq block matrix given by

A ⊗ B = [ a_11 B  ...  a_1n B ;  ...  ;  a_m1 B  ...  a_mn B ].
Assumption 1. The directed graph G has a spanning tree.

This assumption is a minimum requirement on the graph to guarantee consensus, even in the absence of attacks [91], [27].

4.2.2 Standard Distributed Consensus in MAS

Consider the DMAS with N agents having identical system dynamics represented by

x_i(k + 1) = A x_i(k) + B u_i(k),   ∀i = 1, ..., N,   (4.1)

where x_i(k) ∈ R^n and u_i(k) ∈ R^m are the state and control input of agent i, respectively. A and B are the system and input matrices, respectively, and (A, B) is assumed to be stabilizable. Define the local neighborhood tracking error for agent i as

ε_i(k) = (1 + h_i)^{-1} Σ_{j∈N_i} a_ij (x_j(k) − x_i(k)).   (4.2)

Consider the distributed control law for node i as [91]

u_i(k) = cK ε_i(k),   ∀i = 1, ..., N,   (4.3)

where c is a positive coupling gain and K ∈ R^{m×n} is a control gain, designed to guarantee that the agents reach consensus, i.e., x_i(k) → x_j(k) ∀i, j. Define the global state vector as x(k) = [x_1^T(k), x_2^T(k), ..., x_N^T(k)]^T ∈ R^{nN}. Then, by substituting the controller u_i(k) from (4.3) into the system dynamics (4.1), the global dynamics of the DMAS can be expressed as [27]

x(k + 1) = [I_N ⊗ A − c(I + H)^{-1} L ⊗ BK] x(k).   (4.4)

The solution of (4.4) is then given by

x(k) = [I_N ⊗ A − c L̂ ⊗ BK]^k x(0) ≜ A_c^k x(0),   (4.5)

where L̂ is the normalized graph Laplacian matrix, defined as [91]

L̂ = (I + H)^{-1} L.   (4.6)

Let the eigenvalues of L̂ be λ_i, ∀i = 1, ..., N. Then λ_1 = 0, and λ_i lies inside the unit circle centered at 1 + j0 for i = 2, ..., N [27].

Lemma 1 [27]. Let R ⊂ V be the set of root nodes and r = [p_11, ..., p_1N]^T be the left eigenvector of L̂ for λ_1 = 0. Then, p_1i > 0 if i ∈ R and p_1i = 0 if i ∉ R.

Theorem 1 [91], [27]. Let the feedback gain K be designed such that A − cλ_i BK is Schur stable for i = 2, ..., N. Then, the DMAS reaches consensus and the final consensus value can be written as

x(k) → (r^T ⊗ A^k) [(x_1(0))^T, ..., (x_N(0))^T]^T.   (4.7)
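To make the nominal protocol concrete, the sketch below simulates (4.1)-(4.6) for scalar agent states (A = 1, B = K = 1), forming the normalized Laplacian L̂ = (I + H)^{-1}L and iterating x(k+1) = (I − cL̂)x(k). The graph, gain, and initial conditions are illustrative assumptions.

```python
import numpy as np

# Adjacency matrix of a graph with a spanning tree (Assumption 1):
# a_ij > 0 means agent i receives information from agent j.
A_adj = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
H = np.diag(A_adj.sum(axis=1))                 # in-degree matrix
L = H - A_adj                                  # graph Laplacian
L_hat = np.linalg.inv(np.eye(4) + H) @ L       # normalized Laplacian (4.6)

c = 0.9                                        # coupling gain
x = np.array([1.0, -2.0, 4.0, 0.5])            # initial agent states
for _ in range(200):
    eps = -L_hat @ x                           # local tracking errors (4.2)
    x = x + c * eps                            # scalar case of (4.1) with (4.3)
print(np.round(x, 4))                          # all entries agree: consensus reached
```

The final common value is the weighted average of the initial states under the left eigenvector r, in line with (4.7).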
4.3 Attack Analysis for Discrete-time DMAS

In this section, we model false-data injection attacks on sensors and actuators, and analyze their adverse effects on DMASs. Attacks on the actuators of agent i can be modeled as

u^c_i(k) = u_i(k) + γ_i u^a_i(k),   (4.8)

where u_i is the control law given in (4.3), u^a_i represents the attacker's signal injected into the actuators of agent i, u^c_i is the distorted control law applied to (4.1), and the scalar γ_i is 1 when there is an attack on the actuators of agent i and 0 otherwise.

Attacks on the sensors of agent i can be modeled as

x^c_i(k) = x_i(k) + δ_i x^a_i(k),   (4.9)

where x_i represents the state of agent i, x^a_i is the attacker's signal injected into the sensors of agent i, x^c_i is the distorted state, and the scalar δ_i is 1 when there is an attack on the sensors of agent i and 0 otherwise.

Based on the distributed control law (4.3), and using (4.8) and (4.9) in (4.1), one can express the DMAS dynamics for agent i as

x_i(k + 1) = A x_i(k) + B u_i(k) + B f_i(k),   ∀i = 1, ..., N,   (4.10)

where f_i(k) represents the overall attack signal injected into agent i, given by

f_i(k) = c(1 + h_i)^{-1} K Σ_{j=1}^N a_ij (δ_j x^a_j(k) − δ_i x^a_i(k)) + γ_i u^a_i(k).   (4.11)

Definition 1. In a graph, agent i is reachable from agent j if there exist [v_1, v_2, ..., v_l] ∈ V such that a_{j v_1} a_{v_1 v_2} ... a_{v_l i} ≠ 0 for some l ≥ 0, i.e., there is a directed path of length l + 1 from node j to node i.

Definition 2 (Compromised and unattacked agent). We call an agent that is directly under attack a compromised agent. An agent is called unattacked if it is not compromised. We denote the set of unattacked agents by N_Int, i.e., N_Int = N \ N_Comp, where N_Comp denotes the set of compromised agents.

In control systems, the internal model principle (IMP) states that the controller must incorporate a model of the dynamics that generate the signals the control system is supposed to track. We show in Theorem 2 that the attacker can also leverage the IMP and incorporate some eigenvalues of the consensus dynamics into its attack design to destabilize the entire network.

Definition 3 (IMP-based and non-IMP-based attacks). Let the attacker design its attack signal f_i(k) on the sensors and/or actuators of the compromised agent i as

f_i(k + 1) = W f_i(k),   (4.12)

with W ∈ R^{m×m} as the attacker's dynamics. Define

Λ_W = [λ_W1, ..., λ_Wm],   Λ_A = [λ_A1, ..., λ_An],   (4.13)

as the sets of eigenvalues of W and of the system dynamics matrix A, respectively. Then, if Λ_W ⊆ Λ_A, the attack signal is called an IMP-based attack. Otherwise, if Λ_W ⊄ Λ_A or the attacker has no dynamics (e.g., a random signal), it is called a non-IMP-based attack.

Remark 1. Note that, in this chapter, attack signals are not restricted to IMP-based attacks designed based on the dynamics (4.12). Attacks are categorized into the two classes of Definition 3 based on their effects on the network stability, and it will be shown that IMP-based attacks can destabilize the network while non-IMP-based attacks cannot. Non-IMP-based attacks cover a broad range of attacks. The resilient approach presented in Section 4.4 works for both IMP-based and non-IMP-based attacks. □

Remark 2. Note that if the attack is only on the actuators of agent i, i.e., f_i(k) = u^a_i(k), then the attacker can design u^a_i(k) = W u^a_i(k − 1) as an IMP-based attack signal that follows the dynamics in (4.12). If the attack is only on the sensors of agent i, then f_i(k) = −c(1 + h_i)^{-1} K x^a_i(k), and the attacker can design x^a_i(k) = W x^a_i(k − 1) to follow the dynamics in (4.12). Note that the scalar coefficient cannot change the common modes of the dynamics. Hence, (4.12) can model attacks on both sensors and actuators. □

We assume that the system matrix A in (4.1) is marginally stable, with eigenvalues on the unit circle centered at the origin. This is a standard assumption in the literature on consensus and synchronization problems [29]. Define

S(k) = Σ_{j=1}^N p_1j f_j(k),   (4.14)

where p_1j represents the element of the left eigenvector corresponding to the zero eigenvalue of L̂. S(k) in (4.14) is used in Theorem 2 to analyze the effects of attacks on root nodes and non-root nodes. Note that S(k) = Σ_{j=1}^N p_1j f_j(k) represents the product of the left eigenvector corresponding to the zero eigenvalue of the graph Laplacian matrix L̂ and the attack vector f(k). Based on Lemma 1, the elements of the left eigenvector corresponding to non-root nodes are zero. Thus, for an attack on a non-root node, S(k) = 0 regardless of the attack. On the other hand, the elements of the left eigenvector corresponding to root nodes are nonzero, and thus S(k) ≠ 0 if there is an attack on a root node.
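The distinction drawn in Definition 3 is easy to reproduce numerically: an IMP-based attack embeds a marginally stable eigenvalue of A in W, so the attack signal persists, while a non-IMP-based signal generated by strictly stable dynamics decays. Below is a minimal sketch; the rotation matrix and initial condition are illustrative assumptions.

```python
import numpy as np

theta = 0.1
# A marginally stable oscillatory mode with eigenvalues exp(+/- j*theta),
# assumed shared with the agent dynamics A for the IMP-based case.
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])

def attack_sequence(W, f0, steps):
    """Generate the attack signal f(k+1) = W f(k) as in (4.12)."""
    f, out = np.array(f0, dtype=float), []
    for _ in range(steps):
        out.append(f.copy())
        f = W @ f
    return np.array(out)

f_imp = attack_sequence(rot, [1.0, 0.0], 500)        # IMP-based: Lambda_W in Lambda_A
f_non = attack_sequence(0.7 * rot, [1.0, 0.0], 500)  # non-IMP: eigenvalues inside unit circle
print(np.abs(f_imp[-1]).max())   # magnitude persists (unit-circle eigenvalues)
print(np.abs(f_non[-1]).max())   # magnitude has decayed essentially to zero
```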
Theorem 2. Consider the DMAS (4.10) under the attack f_i(k) with the control protocol (4.3). Let f_i(k) be designed as in (4.12). Then,

1. An IMP-based attack destabilizes the entire network if S(k) ≠ 0, i.e., if the attack is on a root node.

2. Any non-IMP-based attack, or an IMP-based attack with S(k) = 0, deviates the agents from the desired consensus behavior but does not cause instability, if the agents are reachable from the compromised one.

Proof. The transfer function of the DMAS (4.1), from u_i(z) to x_i(z) in the Z-domain, can be written as

G(z) = x_i(z)/u_i(z) = (zI − A)^{-1} B.   (4.15)

Using (4.3), the global control law under the influence of the attack can be expressed as

u(z) = −(c L̂ ⊗ K) x(z) + f(z),   (4.16)

with u(z) = [u_1^T, ..., u_N^T]^T, x(z) = [x_1^T, ..., x_N^T]^T, and f(z) = [f_1^T, ..., f_N^T]^T. Using (4.15) and (4.16), the system state in global form can be written as

x(z) = (I_N ⊗ G(z)) u(z) = (I_N ⊗ G(z)) (−(c L̂ ⊗ K) x(z) + f(z)),   (4.17)

where G(z) ∈ R^{n×m}. Let M be a non-singular matrix such that L̂ = M Λ M^{-1}, with Λ the Jordan canonical form of L̂. The left and right eigenvectors of L̂ corresponding to the zero eigenvalue are r and 1_N, respectively [27]. Define M = [1_N  M_1] and M^{-1} = [r  M_2^T]^T, where M_1 ∈ R^{N×(N−1)} and M_2 ∈ R^{(N−1)×N}. Then, using (4.17) with L̂ = M Λ M^{-1}, one has

(M ⊗ I_n) [I_{nN} + cΛ ⊗ G(z)K] (M^{-1} ⊗ I_n) x(z) = (I_N ⊗ G(z)) f(z).   (4.18)

Defining the state transformation x̂(z) = (M^{-1} ⊗ I_n) x(z) and premultiplying (4.18) by (M^{-1} ⊗ I_n) gives

x̂(z) = [I_{nN} + cΛ ⊗ G(z)K]^{-1} (M^{-1} ⊗ G(z)) f(z).   (4.19)

Assume for simplicity that all the Jordan blocks are simple, i.e., M^{-1} = [p_ij] and M = [m_ij], where p_ij and m_ij represent the elements of the matrices M^{-1} and M. Then, for agent i, using (4.19) and the fact that the first eigenvalue of L̂ is zero with corresponding right eigenvector 1_N, i.e., m_i1 = 1, one has

x_i(z) = G(z) Σ_{j=1}^N p_1j f_j(z) + Σ_{h=2}^N m_ih [I_n + cλ_h G(z)K]^{-1} G(z) Σ_{j=1}^N p_hj f_j(z).   (4.20)

We now show that [I_n + cKG(z)λ_h]^{-1} is Schur, so that to check the stability of the agents under attacks one only needs to analyze the first term of (4.20). To this end, since (A − cλ_h BK), ∀h = 2, ..., N, is Schur, if we show that the roots of the characteristic polynomial of (A − cλ_h BK) are identical to the poles of [I_n + cKG(z)λ_h]^{-1}, then [I_n + cKG(z)λ_h]^{-1} is also Schur. Using (4.15), one has

det(zI_n − (A − cλ_h BK)) = det(zI_n − A + cλ_h BK) = det(zI_n − A) det(I_n + cλ_h (zI_n − A)^{-1} BK) = det(zI_n − A) [det(zI_n − A) + cλ_h (zI_n − A)^{adj} BK] / det(zI_n − A).   (4.21)

Hence, the roots of the characteristic polynomial of (A − cλ_h BK) are identical to the poles of [I_n + cKG(z)λ_h]^{-1}, and thus [I_n + cKG(z)λ_h]^{-1} is Schur.

To analyze the boundedness of the first term in (4.20), note that, according to Lemma 1, Σ_{j=1}^N p_1j f_j(k) in (4.20), which is identical to S(k) in (4.14), is zero for an attack on non-root nodes and nonzero if the attack is launched on a root node.

Consider now an IMP-based attack on a root node. Then, using the transfer function (4.15) and the attack signal defined in (4.12), one can write (4.20) as

x_i(z) = Σ_{j=1}^N p_1j [ (zI_n − A)^{adj} B (zI_n − W)^{adj} f_i(0) ] / [ (z² + λ²_{A_l})² ∏_{i=1, i≠l}^n (z² + λ²_{A_i}) ∏_i (z² + λ²_{W_i}) ] + Σ_{h=2}^N m_ih [1 + cKG(z)λ_h]^{-1} G(z) Σ_{j=1}^N p_hj f_j(z),   (4.22)

with λ_{A_l} the marginal eigenvalue of the system dynamics A, i.e., λ_{A_l} lies on the unit circle centered at the origin. Since the first term of (4.22) shows that the pole λ_{A_l} lies on the unit circle centered at the origin and has multiplicity greater than one due to the IMP-based attack, the system states tend to infinity as k → ∞. Therefore, attacks on root nodes destabilize the entire network, in the sense that the states of all agents go to infinity as time tends to infinity. This completes the proof of part 1.

To prove part 2, note that, based on Lemma 1, if the attack is on a non-root node, then S(k) = Σ_{j=1}^N p_1j f_j(k) is zero.
Therefore, the first term in (4.20) vanishes, and x_i(z) can be expressed as

x_i(z) = Σ_{h=2}^N m_ih [1 + cKG(z)λ_h]^{-1} G(z) Σ_{j=1}^N p_hj f_j(z).   (4.23)

According to (4.21), [I_n + cKG(z)λ_h]^{-1} is Schur stable. Therefore, based on (4.23), and since agent i is unattacked itself, which implies G(z) is also Schur, the system states are bounded, even in the presence of attacks. However, although agents that are reachable from a compromised agent show stable behavior, their states deviate from the desired consensus value, based on the result of Theorem 1 in [94]. This completes the proof. □

Disturbance attenuation approaches focus on minimizing the effects of disturbances on the local neighborhood tracking error [92]. More specifically, the H∞ approach for the DMAS (4.1) in the presence of a disturbance w_i(k) designs a distributed control protocol as in (4.3) such that the desired consensus (4.7) is achieved when w_i(k) = 0 and the bounded L2-gain condition

Σ_{k=0}^∞ ε^T(k) M̄ ε(k) ≤ γ² Σ_{k=0}^∞ w^T(k) N̄ w(k)   (4.24)

is fulfilled for any disturbance w_i(k) ∈ L2[0, ∞), where γ > 0 is the attenuation constant, and M̄ and N̄ are positive definite weight matrices.

Lemma 2. Consider the normalized graph Laplacian matrix L̂ defined in (4.6). Then, [L̂^T L̂ − 2L̂] is negative semidefinite.

Proof. Let λ_k be an eigenvalue of the normalized graph Laplacian matrix L̂. The eigenvalues of [L̂^T L̂ − 2L̂] for an undirected graph can be written as

λ[L̂^T L̂ − 2L̂] = λ_k² − 2λ_k = (λ_k − 1)² − 1,   ∀k = 1, ..., N.   (4.25)

Since all eigenvalues of L̂ lie inside the unit circle centered at 1 + j0, except λ_1 = 0 [27], (λ_k − 1)² − 1 is less than or equal to zero for k = 1, ..., N. This shows that [L̂^T L̂ − 2L̂] is negative semidefinite. □

In the following theorem, for the sake of simplicity, we consider single-integrator dynamics in global form, given by

x(k + 1) = x(k) + u(k).   (4.26)

Under the influence of the attack, one can write the control input u(k) in (4.26) as

u(k) = −L̂ x(k) + f(k).   (4.27)

Theorem 3. Consider the DMAS with single-integrator dynamics (4.26). Assume that the system is under a constant attack signal f(k). Then, ε_i(k) → 0, ∀i ∈ N_Int, while the agents do not reach the desired consensus.

Proof. Consider the Lyapunov function for the discrete-time DMAS

V(x(k), f(k)) = (−L̂x(k) + f(k))^T (−L̂x(k) + f(k)).   (4.28)

For the system (4.26) under the constant attack signal f(k + 1) = f(k) with the control input (4.27), one has

ΔV(x(k), f(k)) = (−L̂[x(k) − L̂x(k) + f(k)] + f(k))^T (−L̂[x(k) − L̂x(k) + f(k)] + f(k)) − (−L̂x(k) + f(k))^T (−L̂x(k) + f(k)).   (4.29)

After simplifying (4.29) and using Lemma 2, one has

ΔV(x(k), f(k)) = (−L̂x(k) + f(k))^T [L̂^T L̂ − 2L̂] (−L̂x(k) + f(k)) ≤ 0.   (4.30)

Then, using LaSalle's invariance principle [93], the trajectories (x(k), f(k)) converge to a set that satisfies ΔV(x(k), f(k)) = 0. Based on (4.30), this yields

(−L̂x(k) + f(k)) ∈ ker(L̂^T L̂ − 2L̂)   (4.31)

or

(−L̂x(k) + f(k)) = 0.   (4.32)

From (4.31), one has (−L̂x(k) + f(k)) = c̄ 1_N. Accordingly, the single-integrator dynamics become x_i(k + 1) = x_i(k) + c̄, which destabilizes the system. Therefore, x_i(k) → ∞ as k → ∞, ∀i = 1, ..., N, while the local neighborhood tracking error goes to zero for all agents. Note that, based on Theorem 2, (4.31) is the possible case when the attack is on a root node. On the other hand, for an attack on a non-root node, from (4.32), one has (−L̂x(k) + f(k)) = 0. Since f_i(k) = 0 for an unattacked agent i, the local neighborhood tracking error of the unattacked agents converges to zero, even in the presence of the attack.

We now show that the unattacked agents do not reach the desired consensus, despite the local neighborhood tracking error being zero. From (4.32), one has L̂x(k) = f(k), which can be written for agent i as

(1 + h_i)^{-1} Σ_{j=1}^N a_ij (x_j(k) − x_i(k)) = f_i(k).   (4.33)

For a compromised agent i, since f_i(k) ≠ 0, one has x_i(k) ≠ x_j(k) for some i, j. Now assume that agent i is unattacked, i.e., f_i(k) = 0. Then, based on (4.33), one has

(1 + h_i)^{-1} Σ_{j=1}^N a_ij (x_j(k) − x_i(k)) = 0.   (4.34)

Consider the unattacked agent i as an immediate neighbor of the compromised agent i_c. Assume by contradiction that only the compromised agent fails to reach the desired consensus while all the unattacked agents reach it. Using (4.34), one can write

(1 + h_i)^{-1} ( Σ_{j∈N_i} a_ij (x_j − x_i) + a_{i i_c} (x_{i_c} − x_i) ) = 0.   (4.35)

If the unattacked agents reached consensus, then x_i(k) = x_j(k), ∀j ∈ N_i. However, (4.35) cannot be satisfied with x_i(k) = x_j(k), ∀j ∈ N_i, because x_{i_c}(k) ≠ x_i(k), which contradicts the assumption. Therefore, the unattacked agent i deviates from the desired consensus value. The same argument shows that all agents reachable from the compromised agent deviate from the desired consensus value. This completes the proof. □
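Theorem 3 can be checked numerically for the single-integrator network (4.26)-(4.27). In the sketch below (the chain graph and the attack value are illustrative assumptions), a constant attack is injected at a non-root node: the local neighborhood tracking errors of the unattacked agents converge to zero, while the states settle away from a common value, so consensus is lost exactly as the theorem predicts.

```python
import numpy as np

A_adj = np.array([[0, 0, 0, 0],      # node 0: root (receives nothing)
                  [1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0]], dtype=float)
H = np.diag(A_adj.sum(axis=1))
L_hat = np.linalg.inv(np.eye(4) + H) @ (H - A_adj)

f = np.array([0.0, 0.0, 0.5, 0.0])   # constant attack on non-root node 2
x = np.array([0.0, 1.0, 2.0, 3.0])
for _ in range(2000):
    x = x + (-L_hat @ x) + f         # (4.26) with the corrupted input (4.27)

eps = -L_hat @ x                     # local neighborhood tracking errors
print(np.round(eps, 6))              # zero for all unattacked agents (Theorem 3)
print(np.round(x, 3))                # states differ: the desired consensus is lost
```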
Corollary 1. Let the attacker design its attack signal using the IMP-based approach described in Theorem 2. Then, it bypasses the H∞ control protocol.

Proof. In the absence of attacks, minimizing the local neighborhood tracking error results in minimizing the consensus error. Therefore, the H∞ control in (4.24) is used to attenuate the effect of the adversarial input on the local neighborhood tracking error. However, according to Theorem 3, in the presence of an IMP-based attack the local neighborhood tracking error goes to zero while the agents do not reach consensus. This completes the proof. □

4.4 Resilient Distributed Control Protocol for Attacks on Sensors and Actuators: An Adaptive Approach

In this section, the expected normal behavior of each agent is predicted using an observer-like predictor (called here the expected state predictor), which employs the agent's dynamics to predict its expected normal state at each time step. This expected state predictor does not use any actual state measurements; instead, it calculates the expected normal state of the agent based on the evolution rule of its dynamics, taking into account the local information it receives from its neighbors. Then, a distributed adaptive compensator is designed using the predicted behavior of the agents to compensate for any discrepancy between the actual state and its predicted normal one.

Denote the estimated state of agent i by x̂_i(k). The distributed expected state predictor is designed as

x̂_i(k + 1) = A x̂_i(k) + cBK(1 + h_i)^{-1} Σ_{j=1}^N a_ij (x̂_j − x̂_i),   (4.36)

where the gain K and the coupling coefficient c are designed to ensure that A_c in (4.5) is Schur. The global expected state predictor vector for (4.36) can be written as x̂(k) = [x̂_1^T(k), x̂_2^T(k), ..., x̂_N^T(k)]^T ∈ R^{nN}.
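Because the predictor (4.36) runs only on the model and the neighbors' predictor states, never on measured states, an attack cannot enter it. A minimal per-step sketch follows; the function signature is an illustrative assumption.

```python
import numpy as np

def predictor_step(x_hat, A, B, K, A_adj, c):
    """One step of the expected state predictor (4.36) for all agents.
    x_hat: (N, n) array of predictor states; no measured states are used."""
    N = x_hat.shape[0]
    h = A_adj.sum(axis=1)                                       # weighted in-degrees
    x_next = np.empty_like(x_hat)
    for i in range(N):
        eps_hat = A_adj[i] @ (x_hat - x_hat[i]) / (1.0 + h[i])  # predictor error (4.38)
        x_next[i] = A @ x_hat[i] + c * B @ (K @ eps_hat)        # (4.36)
    return x_next
```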
Lemma 3. Consider the N expected state predictors given in (4.36). Let the feedback gain K and the coupling coefficient c be designed to ensure that A_c in (4.5) is Schur. Then, the expected state predictor state x̂(k) converges to the desired consensus value.

Proof. The designed expected state predictor in (4.36) can be expressed as

x̂_i(k + 1) = A x̂_i(k) + B û_i(k),   (4.37)

where û_i(k) = cK ε̂_i(k), with the local neighborhood tracking error

ε̂_i(k) = (1 + h_i)^{-1} Σ_{j=1}^N a_ij (x̂_j − x̂_i).   (4.38)

One can write the global expected state predictor dynamics as x̂(k + 1) = A_c x̂(k), which yields x̂(k) = A_c^k x̂(0) ∈ R^{nN}. Since A − cλ_i BK is Schur stable, with λ_i the eigenvalues of the normalized graph Laplacian matrix L̂ for i = 2, ..., N and λ_1 = 0, the expected state predictor states achieve the desired consensus value. □

Remark 3. Note that a broad class of DMASs includes the leader-follower and the containment control problems (i.e., DMASs with multiple leaders), for which Lemma 3 is valid even if x̂_i(0) ≠ x_i(0). This is because the reference trajectory to be followed by the agents is determined only by the leaders, which are assumed to be trusted through the use of more advanced sensors and a larger investment in security. The system (4.36) acts as a reference model for the agents, and if x̂_i(0) ≠ x_i(0), then even for an unattacked DMAS, d_i in (4.42) will be nonzero until the difference between the initial conditions vanishes. The agents converge to the desired behavior irrespective of the initial values. □

We now design a distributed resilient control protocol as

u_{i,r}(k) = u_i(k) + u_{i,comp}(k),   (4.39)

where u_i(k) represents the standard control protocol defined in (4.3) and u_{i,comp}(k) represents the distributed adaptive compensator term responsible for rejecting the adversarial input. Consider the feedback gain K in the control protocol (4.3) given by

K = (R_1 + B^T P_1 B)^{-1} B^T P_1 A = R̄_1^{-1} B^T P_1 A,   (4.40)

where R_1 is a positive definite design matrix and P_1 is the solution of

A^T P_1 A − P_1 − A^T P_1 B (R_1 + B^T P_1 B)^{-1} B^T P_1 A = −Q_1,   (4.41)

with a positive definite matrix Q_1. The designed distributed control protocol is given by

u_{i,r}(k) = cK ε̄_i(k) − d_i(k),   (4.42)

where d_i(k) is the estimated response of the adaptive compensator and K is the gain given by (4.40) and (4.41). The local neighborhood tracking error ε̄_i(k) in (4.42) is given by

ε̄_i(k) = (1 + h_i)^{-1} Σ_{j=1}^N a_ij (x^c_j(k) − x^c_i(k)).   (4.43)

The update law for the distributed adaptive compensator is designed as

d_i(k + 1) = θ cK (ε̂_i(k) − ε̄_i(k)) + θ d_i(k),   (4.44)

where θ > 0 is a design parameter, and ε̄_i(k) and ε̂_i(k) are defined in (4.43) and (4.38), respectively.

According to Lemma 3, the expected state predictor converges to the desired consensus value. Therefore, consensus of the DMAS can be achieved by showing the convergence of the agent state x_i(k) to the predicted state x̂_i(k). Define the consensus error

x̃(k) = x(k) − x̂(k).   (4.45)

In the following theorem, we show that the consensus error remains bounded under the proposed resilient adaptive controller.
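One step of the resilient protocol (4.42)-(4.44) for a single agent can be sketched as follows: the compensator d_i is driven by the gap between the predictor tracking error (4.38) and the possibly corrupted measured tracking error (4.43). The signature is illustrative, and θ must be chosen to satisfy the bound derived in the theorem below.

```python
import numpy as np

def resilient_control_step(eps_bar_i, eps_hat_i, d_i, K, c, theta):
    """Resilient control (4.42) and compensator update (4.44) for agent i.
    eps_bar_i: measured (possibly corrupted) tracking error (4.43)
    eps_hat_i: expected-state-predictor tracking error (4.38)"""
    u_i = c * (K @ eps_bar_i) - d_i                                   # (4.42)
    d_next = theta * c * (K @ (eps_hat_i - eps_bar_i)) + theta * d_i  # (4.44)
    return u_i, d_next
```

When there is no attack and the initial conditions match, ε̄_i tracks ε̂_i and the compensator stays near zero; any persistent discrepancy injected by an attacker is absorbed into d_i and subtracted from the control.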
Theorem 4. Consider the DMAS (4.10) under attacks on sensors and actuators. Let the control protocol be given by (4.42)-(4.44). Then, the consensus error in (4.45) is bounded, i.e., ‖x̃(k)‖ ≤ b_0 for some bound b_0, and this bound can be made arbitrarily small, despite the attack.

Proof. According to Lemma 3, the expected state predictor converges to the desired consensus value. Therefore, consensus of the discrete-time DMAS can be achieved by showing the convergence of the agent state x_i(k) to the predicted state x̂_i(k). With (4.10) and (4.37), one can write x̃(k + 1) as

x̃(k + 1) = (I_N ⊗ A − c L̂ ⊗ BK) x̃(k) − (I_N ⊗ B) d̃(k),   (4.46)

where

d̃(k) = d(k) − f(k)   (4.47)

denotes the attack rejection error, with d(k) = [d_1^T(k), d_2^T(k), ..., d_N^T(k)]^T ∈ R^{mN} the global adaptive compensator vector, and the dynamics of the attack f(k) defined in (4.12). Using (4.44), the global dynamics of the adaptive compensator can be written as

d(k + 1) = θc (L̂ ⊗ R̄_1^{-1} B^T P_1 A) x̃(k) + θ d̃(k) + θ f̄(k),   (4.48)

where R̄_1 = R_1 + B^T P_1 B and f̄(k) = 2f(k) − (γ ⊗ I_N) u^a. Note that f̄(k) = f(k) only if the actuators of an agent are compromised. Define Q_2 = Q_2^T > 0 as Q_2 = cR_2(I + H)^{-1} L = cR_2 L̂, with some positive definite matrix R_2. Let the real part of the minimum nonzero eigenvalue of the normalized graph Laplacian matrix L̂ be λ_m. Define the Lyapunov candidate function

V(k) = x̃^T(k)(Q_2 ⊗ P_1) x̃(k) + θ^{-2} d̃^T(k)(R_2 ⊗ R̄_1) d̃(k).   (4.49)

The difference of the Lyapunov candidate function can be written as

ΔV(k) = V(k + 1) − V(k) = [x̃^T(k + 1)(Q_2 ⊗ P_1) x̃(k + 1) − x̃^T(k)(Q_2 ⊗ P_1) x̃(k)] (part 1) + [θ^{-2} d̃^T(k + 1)(R_2 ⊗ R̄_1) d̃(k + 1) − θ^{-2} d̃^T(k)(R_2 ⊗ R̄_1) d̃(k)] (part 2).   (4.50)

Using (4.46), part 1 of (4.50) can be expressed as

part 1 = x̃^T(k)(Q_2 ⊗ A^T P_1 A − 2c Q_2 L̂ ⊗ A^T P_1 BK + c² L̂^T Q_2 L̂ ⊗ (BK)^T P_1 BK − Q_2 ⊗ P_1) x̃(k) − 2 x̃^T(k)[Q_2 ⊗ A^T P_1 B − c L̂^T Q_2 ⊗ (BK)^T P_1 B] d̃(k) + d̃^T(k)(Q_2 ⊗ B^T P_1 B) d̃(k).   (4.51)

Using Young's inequality, one can further simplify and bound (4.51) as

part 1 ≤ −x̃^T(k)(Q_2 ⊗ Q_1) x̃(k) − x̃^T(k)((−Q_2 + 2c Q_2 L̂) ⊗ A^T P_1 BK) x̃(k) + 2c² λ_min(L̂^T L̂) λ_min(T Q_1^{-1}) x̃^T(k)(Q_2 ⊗ Q_1) x̃(k) − 2 x̃^T(k)(Q_2 ⊗ A^T P_1 B) d̃(k) + 2 d̃^T(k)(Q_2 ⊗ B^T P_1 B) d̃(k),   (4.52)

where T = K^T B^T P_1 B K. We now consider part 2 of (4.50),

part 2 = θ^{-2} d̃^T(k + 1)(R_2 ⊗ R̄_1) d̃(k + 1) − θ^{-2} d̃^T(k)(R_2 ⊗ R̄_1) d̃(k),   (4.53)

where R̄_1 = (R_1 + B^T P_1 B) is a positive definite matrix. Using (4.47), one can express (4.53) as

part 2 = (1/θ²) [d^T(k + 1)(R_2 ⊗ R̄_1) d(k + 1) − 2 d^T(k + 1)(R_2 ⊗ R̄_1) f(k + 1) + f^T(k + 1)(R_2 ⊗ R̄_1) f(k + 1) − d̃^T(k)(R_2 ⊗ R̄_1) d̃(k)].   (4.54)

Using the dynamics of the distributed adaptive compensator (4.48) in (4.54), one has

part 2 = x̃^T(k)(c L̂^T Q_2 ⊗ K^T B^T P_1 A) x̃(k) + 2 d̃^T(k)(Q_2 ⊗ B^T P_1 A) x̃(k) + 2 [f̄(k) − θ^{-1} f(k + 1)]^T (Q_2 ⊗ B^T P_1 A) x̃(k) + (1 − θ^{-2}) d̃^T(k)(R_2 ⊗ R̄_1) d̃(k) + [f̄(k) − θ^{-1} f(k + 1)]^T (R_2 ⊗ R̄_1) d̃(k) + [f̄(k) − θ^{-1} f(k + 1)]^T (R_2 ⊗ R̄_1) [f̄(k) − θ^{-1} f(k + 1)].   (4.55)

Using Young's inequality, one can simplify (4.55) as

part 2 ≤ (3/2) x̃^T(k)(c Q_2 L̂ ⊗ A^T P_1 BK) x̃(k) + 2 d̃^T(k)(Q_2 ⊗ B^T P_1 A) x̃(k) + (2 − θ^{-2}) d̃^T(k)(R_2 ⊗ R̄_1) d̃(k) + 4 [f̄(k) − θ^{-1} ψ(k) f(k)]^T (R_2 ⊗ R̄_1) [f̄(k) − θ^{-1} ψ(k) f(k)],   (4.56)

where ψ(k) captures how the value of the attack signal changes at the next time instant. If the attack signal is constant, i.e., f(k + 1) = f(k), then ψ(k) = 1. One can infer that ψ(k) is always bounded, i.e., |ψ(k)| < ζ ∀k. Combining (4.52) and (4.56), and simplifying further, one has

ΔV ≤ −x̃^T(k)(Q_2 ⊗ Q_1) x̃(k) − x̃^T(k)((−Q_2 + (1/2) c Q_2 L̂) ⊗ A^T P_1 BK) x̃(k) + 2c² λ_min(L̂^T L̂) λ_min(T Q_1^{-1}) x̃^T(k)(Q_2 ⊗ Q_1) x̃(k) − (θ^{-2} − 2 − 2λ_min(c L̂ B^T P_1 B R̄_1^{-1})) d̃^T(k)(R_2 ⊗ R̄_1) d̃(k) + 4 [f̄(k) − θ^{-1} ζ f(k)]^T (R_2 ⊗ R̄_1) [f̄(k) − θ^{-1} ζ f(k)].   (4.57)
One can show that ΔV ≤ 0 if the coupling coefficient satisfies

(1/λ_m) √(2 λ_min(T Q_1^{-1})) < c < 2/λ_m   and   ‖d̃(k)‖ > 4 ‖f̄(k) − θ^{-1} ζ f(k)‖ / (θ^{-2} − 2 − 2 λ_min(c L̂ B^T P_1 B R̄_1^{-1})).   (4.58)

The design parameter θ can be chosen such that θ < √( 1 / (2 + 2 λ_min(c L̂ B^T P_1 B R̄_1^{-1})) ), and then one can ensure the bound in (4.58). This shows that the consensus error is bounded, i.e., ‖x̃(k)‖ ≤ b_0 for some bound b_0. Therefore, the actual agent state x(k) achieves the desired consensus behavior with a bounded error that can be made arbitrarily small by an appropriate selection of the design parameter θ. This completes the proof. □

Remark 4. The coupling coefficient c needs to lie in a certain range, which depends on λ_m and λ_min(T Q_1^{-1}). This condition is standard in the DMAS literature [91]. On the other hand, the condition for the bound on d̃(k) in (4.58) depends on the design parameter θ, and one can select this parameter to satisfy (4.58), which ensures ΔV ≤ 0. Thus, the bound on the consensus error can be made arbitrarily small through the selection of the design parameter θ. Moreover, this bound is conservative, and, as shown in the simulation results, the consensus error almost goes to zero. □

Remark 5. Compromised agents under a sensor attack might not be recovered completely, which results in a nonzero bound on the consensus error defined in (4.45). The proposed distributed adaptive law compensates for the difference between the incoming neighboring sensor measurements x^c_i(k) and the desired state x̂_i(k); note that x^c_i(k) ≠ x_i(k) in the case of a sensor attack. Under an actuator attack, x^c_i(k) = x_i(k), and the bound on the consensus error can be made arbitrarily small. □

4.5 Simulation Results

We consider a leader-follower network of autonomous underwater vehicles (AUVs) for the evaluation of the presented results. The communication network in Fig. 4.1 considers Sentry AUVs, manufactured by the Woods Hole Oceanographic Institution [95], as agents. The linearized model of the Sentry has 6 DOF, but it is generally decomposed into four non-interacting subsystems: the speed subsystem (u), the roll subsystem (φ), the steering subsystem (ν, r, ψ), and the diving subsystem (ω, q, z, θ). Here, we focus on the diving subsystem of the Sentry AUV for desired depth maneuvering in the leader-follower network. The diving subsystem of the Sentry AUV follows the dynamics in (4.1), where

A = [ 0.65  0.54  0.0   0.0019 ;  0.21  1.48  0.0  0.83 ;  0.84  1.0  0.11  1.21 ;  0.0  0.01  0.99  0.99 ]   and   B = [ 0.08  0.13 ;  −0.13  0.20 ;  0.02  0.09 ;  −0.07  0.09 ],   (4.59)

with x_i(k) = [ω_i(k), q_i(k), z_i(k), θ_i(k)]^T and u_i(k) = [δ^b_i(k), δ^s_i(k)]^T, where δ^b_i(k) and δ^s_i(k) represent the bow and stern plane deflections, and ω_i(k), q_i(k), z_i(k), θ_i(k) represent the heave speed, pitch rate, depth, and pitch, respectively.

Figure 4.1: Graph topology.

In the communication graph, agent 0 represents an active non-autonomous leader, which aims to follow a desired sinusoidal depth trajectory, and agents 1 to 5 designate the followers. The leader has the control input u_0(k) = K_0 x_0(k) + r(k), where K_0 is a state feedback gain, x_0 denotes the leader state, and r(k) represents the desired sinusoidal trajectory. The state feedback gain is K_0 = [−0.18  −2.25  0.13  −0.21 ;  1.56  5.39  0.49  1.59].
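The diving-subsystem matrices in (4.59) can be entered directly to reproduce this setup. In the sketch below, the entries are transcribed row-major from (4.59) (this placement is an assumption forced by the extraction), and the open-loop spectrum of A is inspected.

```python
import numpy as np

# Sentry AUV diving subsystem (4.59); entries transcribed in reading order.
A = np.array([[0.65, 0.54, 0.0,  0.0019],
              [0.21, 1.48, 0.0,  0.83  ],
              [0.84, 1.0,  0.11, 1.21  ],
              [0.0,  0.01, 0.99, 0.99  ]])
B = np.array([[ 0.08, 0.13],
              [-0.13, 0.20],
              [ 0.02, 0.09],
              [-0.07, 0.09]])
print(np.abs(np.linalg.eigvals(A)))  # magnitudes of the open-loop eigenvalues
```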
Since the leader input is nonzero, a slightly different discrete-time control protocol from the one proposed in this chapter is used, in which the leader exchanges its input signal u_0 with its neighbors and the agents reach consensus by exchanging states and the leader's input. This, however, does not change our attack analysis and mitigation.

Now, the effect of multiple attacks on the network is analyzed. We consider attacks on the actuators of Agent 2 and Agent 3 with attack signals u^a_2(k) = [30, 30]^T and u^a_3(k) = [20 sin(k), 20 sin(k)]^T, respectively, at t = 40 s. Fig. 4.2 shows that agents reachable from the compromised Agents 2 and 3 deviate from the desired behavior. This verifies the results of Theorem 2. Fig. 4.3 illustrates the response of the system under the influence of multiple attacks using the proposed controller, with Q_1 and R_1 set to identity matrices in (4.40) and (4.41), respectively. The system states achieve the desired consensus behavior, even in the presence of the attacks. This result demonstrates the effectiveness of the proposed resilient controller of Theorem 4 against multiple attacks. Note that the result in Fig. 4.3 also shows that this approach is not limited to a particular attack model; the attack signal can be constant or time-varying. Compared to existing work such as [18], the presented approach brings the compromised agents back into the network. However, approaches such as [18] also work for attacks on the communication network, while the presented approach is limited to attacks on sensors and actuators.

Figure 4.2: The agents' depth trajectories under the influence of the attacks on AUVs 2 and 3, without the adaptive compensator.

Figure 4.3: The agents' depth trajectories under the influence of the attacks on AUVs 2 and 3, with the adaptive compensator.

4.6 Conclusion

This chapter analyzes the effects of attacks on leaderless DMASs and designs an adaptive resilient distributed control protocol for attack mitigation. It is shown how an IMP-based attack on a root node can destabilize the entire network. To overcome the effect of attacks on sensors and actuators, a resilient controller is developed based on an expected state predictor. The presented controller shows that attacks on sensors and actuators can be mitigated without compromising the connectivity of the network, while achieving the desired consensus. Although not considered in this chapter, attacks on the communication links can be handled by integrating our approach with the existing resilient methods presented in [10]-[11]. The presented approach also works for leader-follower problems in which the leaders are assumed to be trusted.

CHAPTER 5

SECURE EVENT-TRIGGERED DISTRIBUTED KALMAN FILTERS FOR STATE ESTIMATION OVER WIRELESS SENSOR NETWORKS

5.1 Introduction

Motivated by the results on resilient designs presented in the previous chapters, we consider the problem of secure state estimation for distributed sensor networks. This chapter analyzes the adverse effects of attacks and designs a resilient event-triggered distributed state estimation approach that can perform accurate state estimation despite attacks. More specifically, we first show that the attacker can cause non-triggering misbehavior, so that the compromised sensors do not broadcast any information to their neighbors.
This can significantly harm the network connectivity and its collective observability, which is a necessary condition for solving the distributed state estimation problem. We then show that an attacker can achieve continuous-triggering misbehavior, which drains the communication resources and degrades performance. To detect adversarial intrusions, a Kullback-Leibler (KL) divergence based detector is presented; the divergence is estimated via a k-nearest neighbors approach to obviate the restrictive Gaussian assumption on the probability density function of the attack signal. Finally, based on the attack detection results, a meta-Bayesian approach is employed to mitigate attacks on the event-triggered DKF: it performs second-order inference to form confidence and trust about the truthfulness or legitimacy of the outcome of its own first-order inference (i.e., the posterior belief about the state estimate) and those of its neighbors, respectively. Each sensor communicates its confidence to its neighbors and incorporates the trust about its neighbors into its posterior update law, putting less weight on untrusted data and thus successfully discarding corrupted information.

5.2 Preliminaries

The data communication among sensors in a WSN is captured by an undirected graph G, consisting of a pair (V, E), where V = {1, 2, . . . , N} is the set of nodes (sensors) and E ⊂ V × V is the set of edges. An edge from node j to node i, represented by (j, i), implies that node j can broadcast information to node i. Moreover, N_i = {j : (j, i) ∈ E} is the set of neighbors of node i on the graph G. An induced subgraph G_w is obtained by removing a set of nodes W ⊂ V from the original graph G; it is represented by the node set V\W and contains the edges of E with both endpoints in V\W.

Throughout this chapter, R and N represent the sets of real numbers and natural numbers, respectively. Aᵀ denotes the transpose of a matrix A. tr(A) and max(a_i) represent the trace of a matrix A and the maximum value in a set, respectively. C(S) represents the cardinality of a set S. σ_max(A), λ_max(A), and I_n represent the maximum singular value of A, the maximum eigenvalue of A, and the identity matrix of dimension n, respectively. U(a, b) with a < b denotes a uniform distribution on the interval (a, b). p_X(x) denotes the probability density of the random variable or vector x, with X taking values in the finite set {0, . . . , p}. When a random variable X is distributed normally with mean ν and variance σ², we use the notation X ∼ N(ν, σ²). E[X] and Σ_X = E[(X − E[X])(X − E[X])ᵀ] denote, respectively, the expectation and the covariance of X. Finally, E[·|·] represents the conditional expectation.

5.2.1 Process Dynamics and Sensor Models

Consider a process that evolves according to

  x(k + 1) = A x(k) + w(k),   (5.1)

where A denotes the process dynamics matrix, and x(k) ∈ Rⁿ and w(k) are, respectively, the process state and the process noise at time k. The process noise w(k) is assumed to be independent and identically distributed (i.i.d.) with a Gaussian distribution, and x₀ ∼ N(x̂₀, P₀) represents the initial process state with mean x̂₀ and covariance P₀. The goal is to estimate the state x(k) of the process (5.1) in a distributed fashion using N sensor nodes that communicate through the graph G, with sensing models

  y_i(k) = C_i x(k) + v_i(k), ∀i = 1, . . . , N,   (5.2)

where y_i(k) ∈ Rᵖ represents the measurement data, v_i(k) is the i.i.d. Gaussian measurement noise, and C_i is the observation matrix of sensor i.
Assumption 1. The process noise w(k), the measurement noise v_i(k), and the initial state x₀ are uncorrelated random vector sequences.

Assumption 2. The sequences w(k) and v_i(k) are zero-mean Gaussian noises with

  E[w(k) w(h)ᵀ] = μ_kh Q   and   E[v_i(k) v_i(h)ᵀ] = μ_kh R_i,

with μ_kh = 0 if k ≠ h and μ_kh = 1 otherwise. Moreover, Q ≥ 0 and R_i > 0 denote the noise covariance matrices of the process and measurement noise, respectively, and both are finite.

Definition 1 (Collectively observable) [106]. We call the plant dynamics (5.1) and the measurement equation (5.2) collectively observable if the pair (A, C_S) is observable, where C_S is the stacked column vector of C_j, ∀j ∈ S, with S ⊆ V and C(S) > N/2.

Assumption 3. The plant dynamics (5.1) and the measurement equation (5.2) are collectively observable, but not necessarily locally observable, i.e., (A, C_i), ∀i ∈ V, is not necessarily observable.

Assumptions 1 and 2 are standard in Kalman filtering. Assumption 3 states that the state of the target in (5.1) need not be observable from the measurements of any single sensor, i.e., the pairs (A, C_i) need not be observable (see, for instance, [106] and [132]). It also provides the collective observability condition necessary for the estimation problem to be solvable. Also note that under Assumption 2, i.e., finite process and measurement covariances, the stochastic observability rank condition coincides with deterministic observability [Theorem 1, 43]. Therefore, the deterministic observability rank condition holds irrespective of the process and measurement noise.

5.2.2 Overview of Event-triggered Distributed Kalman Filter

This subsection presents an overview of the event-triggered DKF for estimating the process state x(k) in (5.1) from a collection of noisy measurements y_i(k) in (5.2). Let the prior and posterior estimates of the target state x(k) for sensor node i at time k be denoted by x_i(k|k−1) and x_i(k|k), respectively. In the centralized Kalman filter, a recursive rule based on Bayesian inference is employed to compute the posterior estimate x_i(k|k) from the prior estimate x_i(k|k−1) and the new measurement y_i(k). When the next measurement arrives, the previous posterior estimate is used as the new prior, and the same recursive estimation rule proceeds. In the event-triggered DKF, the recursion rule for computing the posterior incorporates not only the sensor's own prior and observations, but also its neighbors' predictive state estimates. Sensor i communicates its prior state estimate to its neighbors only if the norm of the error between the actual output and the predicted output exceeds a threshold after a new observation arrives. That is, the exchange of data with neighbors is governed by the event-triggered mechanism

  ‖y_i(k) − C_i x̃_i(k−1)‖ < α,   (5.3)

where α denotes a predefined event-triggering threshold. Moreover, x̃_i(k) denotes the predictive state estimate of sensor i and follows the update law

  x̃_i(k) = ζ_i(k) x_i(k|k−1) + (1 − ζ_i(k)) A x̃_i(k−1), ∀i ∈ V,   (5.4)

with ζ_i(k) ∈ {0, 1} as the transmit function. Note that the predictive state estimate update (5.4) depends on the value of the transmit function ζ_i(k), which is either zero or one depending on the triggering condition (5.3). When ζ_i(k) = 1, the prior and predictive state estimates coincide, i.e., x̃_i(k) = x_i(k|k−1).
When ζ_i(k) = 0, however, the predictive state estimate depends on its own previous state estimate, i.e., x̃_i(k) = A x̃_i(k−1). Incorporating (5.4), the following recursion rule is used to update the posterior state estimate in the event-triggered DKF [112], [114] for sensor i:

  x_i(k|k) = x_i(k|k−1) + K_i(k)(y_i(k) − C_i x_i(k|k−1)) + γ_i Σ_{j∈N_i} (x̃_j(k) − x̃_i(k)),   (5.5)

where

  x_i(k|k−1) = A x_i(k−1|k−1)   (5.6)

is the prior update. Moreover, the second and third terms in (5.5) denote, respectively, the innovation part (i.e., the estimation error based on sensor i's new observation and its prior prediction) and the consensus part (i.e., the deviation of the sensor's state estimate from its neighbors' state estimates). We call this recursion rule the Bayesian first-order inference on the posterior, which provides the belief over the value of the state.

Moreover, K_i(k) and γ_i in (5.5) denote, respectively, the Kalman gain and the coupling coefficient. The Kalman gain K_i(k) in (5.5) depends on the estimation error covariance matrices associated with the prior x_i(k|k−1) and the posterior x_i(k|k) of sensor i. Define the prior and posterior estimation error covariances as

  P_i(k|k−1) = E[(x(k) − x_i(k|k−1))(x(k) − x_i(k|k−1))ᵀ],
  P_i(k|k) = E[(x(k) − x_i(k|k))(x(k) − x_i(k|k))ᵀ],   (5.7)

which are simplified as [112], [114]

  P_i(k|k) = M_i(k) P_i(k|k−1) M_i(k)ᵀ + K_i(k) R_i K_i(k)ᵀ   (5.8)

and

  P_i(k|k−1) = A P_i(k−1|k−1) Aᵀ + Q,   (5.9)

with M_i(k) = I_n − K_i(k) C_i. The Kalman gain K_i(k) is then designed to minimize the estimation covariance and is given by [112], [114]

  K_i(k) = P_i(k|k−1) C_iᵀ (R_i(k) + C_i P_i(k|k−1) C_iᵀ)⁻¹.   (5.10)

Let the innovation sequence r_i(k) for node i be defined as

  r_i(k) = y_i(k) − C_i x_i(k|k−1),   (5.11)

where r_i(k) ∼ N(0, Ω_i(k)) with

  Ω_i(k) = E[r_i(k) r_i(k)ᵀ] = C_i P_i(k|k−1) C_iᵀ + R_i(k).   (5.12)

Note that, for notational simplicity, we henceforth denote the prior and posterior state estimates as x_i(k|k−1) ≜ x̄_i(k) and x_i(k|k) ≜ x̂_i(k), respectively; the prior and posterior covariances are denoted by P_i(k|k−1) ≜ P̄_i(k) and P_i(k|k) ≜ P̂_i(k). The event-triggered DKF algorithm becomes

Time updates:
  x̄_i(k+1) = A x̂_i(k),   (a)
  P̄_i(k+1) = A P̂_i(k) Aᵀ + Q.   (b)   (5.13)

Measurement updates:
  x̂_i(k) = x̄_i(k) + K_i(k)(y_i(k) − C_i x̄_i(k)) + γ_i Σ_{j∈N_i} (x̃_j(k) − x̃_i(k)),   (a)
  x̃_i(k) = ζ_i(k) x̄_i(k) + (1 − ζ_i(k)) A x̃_i(k−1),   (b)
  K_i(k) = P̄_i(k) C_iᵀ (R_i(k) + C_i P̄_i(k) C_iᵀ)⁻¹,   (c)
  P̂_i(k) = M_i(k) P̄_i(k) M_i(k)ᵀ + K_i(k) R_i(k) K_i(k)ᵀ.   (d)   (5.14)

Remark 1. Based on the result presented in [17, Th. 1], the event-triggered DKF (5.13)-(5.14) ensures that the estimation error x̂_i(k) − x(k) is exponentially bounded in the mean-square sense, ∀i ∈ V.

Remark 2. The consensus gain γ_i in (5.5) is designed such that the stability of the event-triggered DKF (5.13)-(5.14) is guaranteed. Specifically, as shown in [Theorem 2, 19], if

  γ_i = 2(I − K_i C_i) Γ_i⁻¹ / (λ_max(L) λ_max(Γ⁻¹)),

where L denotes the Laplacian matrix associated with the graph G and Γ = diag{Γ₁, . . . , Γ_N} with Γ_i = (I − K_i C_i)ᵀ Aᵀ (P̄_i)⁺ A (I − K_i C_i), ∀i ∈ {1, . . . , N}, then the stability of the event-triggered DKF (5.13)-(5.14) is guaranteed. However, the design of the event-triggered DKF itself is not the concern of this chapter; this chapter mainly analyzes the adverse effects of cyber-physical attacks on the event-triggered DKF and proposes an information-theoretic attack detection and mitigation mechanism. Note also that the presented attack analysis and mitigation can be extended to other event-triggered methods, such as [113] and [115].
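To make the recursion concrete, the following Python sketch simulates the process (5.1) and sensors (5.2) and runs the event-triggered DKF (5.13)-(5.14) on a small network. It is a minimal sketch: the process and sensor matrices follow the simulation study of Section 5.6, while the ring graph, horizon, and coupling coefficient gamma are illustrative assumptions rather than values from the text.

import numpy as np

rng = np.random.default_rng(0)
N, n, p = 4, 2, 2
A = np.array([[np.cos(np.pi/200), -np.sin(np.pi/200)],
              [np.sin(np.pi/200),  np.cos(np.pi/200)]])
C = [np.array([[5.0, 0.0], [0.0, 2.0]]) for _ in range(N)]
Q, R = np.eye(n), [np.eye(p) for _ in range(N)]
neighbors = {i: [(i - 1) % N, (i + 1) % N] for i in range(N)}  # assumed ring graph
alpha, gamma = 1.8, 0.05                                       # threshold; assumed coupling

x = np.array([0.5, 0.0])
xbar = [np.zeros(n) for _ in range(N)]   # prior estimates  \bar x_i
xhat = [np.zeros(n) for _ in range(N)]   # posterior estimates \hat x_i
xtil = [np.zeros(n) for _ in range(N)]   # predictive estimates \tilde x_i
Pbar = [np.eye(n) for _ in range(N)]

for k in range(300):
    x = A @ x + rng.multivariate_normal(np.zeros(n), Q)       # process (5.1)
    y = [C[i] @ x + rng.multivariate_normal(np.zeros(p), R[i]) for i in range(N)]
    for i in range(N):                                        # triggering (5.3)-(5.4)
        zeta = float(np.linalg.norm(y[i] - C[i] @ xtil[i]) >= alpha)
        xtil[i] = zeta * xbar[i] + (1 - zeta) * (A @ xtil[i])
    for i in range(N):                                        # measurement updates (5.14)
        K = Pbar[i] @ C[i].T @ np.linalg.inv(R[i] + C[i] @ Pbar[i] @ C[i].T)
        consensus = sum(xtil[j] - xtil[i] for j in neighbors[i])
        xhat[i] = xbar[i] + K @ (y[i] - C[i] @ xbar[i]) + gamma * consensus
        M = np.eye(n) - K @ C[i]
        Phat = M @ Pbar[i] @ M.T + K @ R[i] @ K.T
        xbar[i] = A @ xhat[i]                                 # time updates (5.13)
        Pbar[i] = A @ Phat @ A.T + Q

print("estimation errors:", [float(np.linalg.norm(xhat[i] - x)) for i in range(N)])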
5.2.3 Attack Modeling

In this subsection, we model the effects of attacks on the event-triggered DKF. An attacker can design a false data injection attack to affect the triggering mechanism in (5.3) and consequently compromise the system behavior.

Definition 2 (Compromised and intact sensor node). We call a sensor node that is directly under attack a compromised sensor node. A sensor node is called intact if it is not compromised. Throughout the chapter, V_c and V\V_c denote, respectively, the sets of compromised and intact sensor nodes.

Consider the sensing model (5.2) for sensor node i under attack,

  y^a_i(k) = y_i(k) + f_i(k) = C_i x(k) + v_i(k) + f_i(k),   (5.15)

where y_i(k) and y^a_i(k) are, respectively, sensor i's actual and corrupted measurements and f_i(k) ∈ Rᵖ represents the adversarial input on sensor node i. For a compromised sensor node i, let p′ ⊆ p be the subset of measurements disrupted by the attacker. Let the false data injection attack f̄_j(k) on the communication link be given by

  x̄^a_j(k) = x̄_j(k) + f̄_j(k), ∀j ∈ N_i.   (5.16)

Using (5.15)-(5.16), in the presence of an attack on sensor node i and/or its neighbors, the state estimate equations (5.13)-(5.14) become

  x̂^a_i(k) = x̄^a_i(k) + K^a_i(k)(y_i(k) − C_i x̄^a_i(k)) + γ_i Σ_{j∈N_i} (x̃^a_j(k) − x̃^a_i(k)) + f^a_i(k),
  x̄^a_i(k+1) = A x̂^a_i(k),
  x̃^a_i(k) = ζ_i(k) x̄^a_i(k) + (1 − ζ_i(k)) A x̃^a_i(k−1),   (5.17)

where

  f^a_i(k) = K^a_i(k) f_i(k) + γ_i Σ_{j∈N_i} f̃_j(k),   (5.18)

with

  f̃_j(k) = ζ_j(k) f̄_j(k) + (1 − ζ_j(k)) f̃_j(k−1).   (5.19)

The Kalman gain K^a_i(k) in the presence of attack is given by

  K^a_i(k) = P̄^a_i(k) C_iᵀ (R_i(k) + C_i P̄^a_i(k) C_iᵀ)⁻¹.   (5.20)

The first part of (5.18) represents the direct attack on sensor node i and the second part denotes the aggregate effect of adversarial inputs on the neighboring sensors j ∈ N_i. Moreover, x̂^a_i(k), x̄^a_i(k), and x̃^a_i(k) denote, respectively, the corrupted posterior, prior, and predictive state estimates. The Kalman gain K^a_i(k) depends on the corrupted prior state estimation error covariance

  P̄^a_i(k+1) = A P̂^a_i(k) Aᵀ + Q,   (5.21)

where the evolution of the corrupted posterior state estimation error covariance P̂^a_i(k) is given in the following theorem.

Theorem 1. Consider the process dynamics (5.1) with the compromised sensor model (5.15). Let the state estimation equation be given by (5.17) in the presence of attacks modeled by f^a_i(k) in (5.18). Then, the corrupted posterior state estimation error covariance is

  P̂^a_i(k) = M^a_i(k) P̄^a_i(k) M^a_i(k)ᵀ + K^a_i(k)[R_i(k) + Σ^f_i(k)] K^a_i(k)ᵀ − 2K^a_i(k) Ξ^f(k)
             + 2γ_i Σ_{j∈N_i} (P̆^a_{i,j}(k) − P̆^a_i(k)) + γ_i² Σ_{j∈N_i} (P̃^a_j(k) − 2P̃^a_{i,j}(k) + P̃^a_i(k)),   (5.22)

where Σ^f_i(k) and Ξ^f(k) denote covariance matrices that depend on the attacker's input, and M^a_i(k) = I_n − K^a_i(k) C_i, with K^a_i(k) the Kalman gain and P̄^a_i(k) the prior state estimation error covariance updated according to (5.20) and (5.21), respectively. Moreover, P̃^a_{i,j}(k) and P̆^a_{i,j}(k) are cross-correlated estimation error covariances updated according to (6)-(8).

Proof. See Appendix A.

Note that the corrupted state estimation error covariance recursion P̂^a_i(k) in (5.22) depends on the attacker's input distribution. Since the state estimation depends on the compromised estimation error covariance P̂^a_i(k), the attacker can design its attack signal to blow up the estimates of the desired process state and damage the system performance.
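A minimal sketch of the attack model: the functions below implement the measurement attack (5.15) and the communication-link attack (5.16) as additive injections. The particular attack signals shown (a constant bias and a sinusoidal link injection) are hypothetical choices by the adversary, used only for illustration.

import numpy as np

def attacked_measurement(y, f):       # (5.15): y_i^a(k) = y_i(k) + f_i(k)
    return y + f

def attacked_link(xbar_j, f_bar):     # (5.16): xbar_j^a(k) = xbar_j(k) + fbar_j(k)
    return xbar_j + f_bar

k = 40
y_a = attacked_measurement(np.array([1.0, -0.5]), np.array([30.0, 30.0]))
xbar_a = attacked_link(np.array([0.2, 0.1]), 20 * np.sin(k) * np.ones(2))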
5.3 Effect of Attack on Triggering Mechanism

This section presents the effects of cyber-physical attacks on the event-triggered DKF. We show that although event-triggered approaches are energy efficient, they are prone to triggering misbehaviors that can harm the network connectivity and observability and drain its limited resources.

5.3.1 Non-triggering Misbehavior

In this subsection, we show how an attacker can manipulate the sensor measurement to mislead the event-triggered mechanism and damage network connectivity and collective observability by causing non-triggering misbehavior, defined as follows.

Definition 3 (Non-triggering Misbehavior). The attacker designs an attack strategy such that a compromised sensor node does not transmit any information to its neighbors by misleading the triggering mechanism in (5.3), even if the actual performance deviates from the desired one.

The following theorem shows how a false data injection attack, preceded by an eavesdropping attack, can manipulate the sensor reading so that the event-triggered mechanism (5.3) is never violated, while the actual performance can be far from the desired one. To this end, we first define the vertex cut of a graph.

Definition 4 (Vertex cut). A set of nodes C ⊂ V is a vertex cut of a graph G if removing the nodes in C results in disconnected graph clusters.

Theorem 2. Consider the process dynamics (5.1) with N sensor nodes (5.2) communicating over the graph G. Let sensor i be under the false data injection attack

  y^a_i(k) = y_i(k) + θ^a_i(k) 1_p, ∀k ≥ L + 1,   (5.23)

where y_i(k) is the actual sensor measurement at time k and L denotes the last triggering time instant. Moreover, θ^a_i(k) ∼ U(a(k), b(k)) is a scalar uniformly distributed random variable on the interval (a(k), b(k)), with

  a(k) = −ϕ + ‖C_i x̃_i(k−1)‖ − ‖y_i(k)‖,
  b(k) = ϕ + ‖C_i x̃_i(k−1)‖ − ‖y_i(k)‖,   (5.24)

where x̃_i(k) is the predictive state estimate and ϕ < α is an arbitrary scalar less than the triggering threshold α. Then,

1. the triggering condition (5.3) is never violated for sensor node i, i.e., it shows non-triggering misbehavior;
2. the original graph G is clustered into several subgraphs if all sensors in a vertex cut are under the attack (5.23).

Proof. Taking norms on both sides of (5.23), the corrupted sensor measurement satisfies

  ‖y^a_i(k)‖ = ‖y_i(k) + θ^a_i(k) 1_p‖.   (5.25)

Using the triangle inequality in (5.25) yields

  ‖y_i(k)‖ − ‖θ^a_i(k) 1_p‖ ≤ ‖y^a_i(k)‖ ≤ ‖y_i(k)‖ + ‖θ^a_i(k) 1_p‖.   (5.26)
which yields This implies that the condition (cid:13)(cid:13)ya i (k) − Ci ˜xi(k − 1)(cid:13)(cid:13) ≤ ϕ < α, (5.27) (5.28) (5.29) always holds true. Therefore, under (5.23)-(5.24), the corrupted sensor node i shows non-triggering misbehavior, which proves part 1. We now prove part 2. Let An ⊆ Vc be the set of sensor nodes showing non-triggering misbe- havior. Then, based on the presented result in part 1, under the attack signal (5.23), sensor nodes in the set An are misled by the attacker and consequently do not transmit any information to their neighbors which make them to act as sink nodes. Since the set of sensor nodes An is assumed to be a vertex cut. Then, the non-triggering misbehavior of sensor nodes in An prevents information flow from one portion of the graph G to another portion of the graph G and thus clusters the original graph G into subgraphs. This completes the proof. Remark 3. Note that to design the presented strategic false data injection attack signal given in (5.23) an attacker needs to eavesdrop the actual sensor measurement yi(k) and the last transmitted prior state estimate ¯xi(L) through the communication channel. The attacker then determines the predictive state estimate ˜xi(k) using the dynamics in (5.5) at each time instant k ≥ L + 1 to achieve non-triggering misbehavior for the sensor node i. We provide Example 1 for further illustration of the results of Theorem 2. Example 1. Consider a graph topology for a distributed sensor network given in fig. 5.1. Let the vertex cut An = {5, 6} be under the presented false data injection attack in Theorem 2 and show non-triggering misbehavior. Then, the sensor nodes in An = {5, 6} do not transmit any information to their neighbors under the designed false data injection attack. Moreover, the sensor nodes in An = {5, 6} act as sink nodes and prevent information flow from subgraph G1 to subgraph G2 which clusters the graph G into two non-interacting subgraphs G1 and G2 as shown in Fig. 5.1. This 104 Figure 5.1: Effect of non-triggering misbehavior on sensor nodes {5,6} cluster the graph G in the two isolated graphs G1 and G2. example shows that the attacker can compromise the vertex cut An of the original graph G such that it shows non-triggering misbehavior and harm the network connectivity or cluster the graph into various non-interacting subgraphs. We now analyze the effect of non-triggering misbehavior on the collective observability of the sensor network. To do so the following definitions are needed. Definition 5 (Potential Set). A set of nodes P ⊂V is said to be a potential set of the graph G if the pair (A, CV\P ) is not collectively observable. Definition 6 (Minimal Potential Set). A set of nodes Pm ⊂ V is said to be a minimal potential set if Pm is a potential set and no subset of Pm is a potential set. Remark 4. Note that if the attacker knows the graph structure and the local pair(A, Ci), ∀i ∈ V. Then, the attacker can identify the minimum potential set of sensor nodes Pm in the graph G and achieves non-triggering misbehavior for Pm. Thus, the set of sensor nodes Pm does not exchange any information with its neighbors and becomes isolated in the graph G. 105 Corollary 1. Let the set of sensors that shows non-triggering misbehavior be the minimal potential set Sn. Then, the network is no longer collectively observable and the process state reconstruction from the distributed sensor measurements is impossible. Proof. 
We now analyze the effect of non-triggering misbehavior on the collective observability of the sensor network. To do so, the following definitions are needed.

Definition 5 (Potential Set). A set of nodes P ⊂ V is said to be a potential set of the graph G if the pair (A, C_{V\P}) is not collectively observable.

Definition 6 (Minimal Potential Set). A set of nodes P_m ⊂ V is said to be a minimal potential set if P_m is a potential set and no subset of P_m is a potential set.

Remark 4. Note that if the attacker knows the graph structure and the local pairs (A, C_i), ∀i ∈ V, then it can identify a minimal potential set of sensor nodes P_m in the graph G and achieve non-triggering misbehavior for P_m. Thus, the set P_m does not exchange any information with its neighbors and becomes isolated in the graph G.

Corollary 1. Let the set of sensors showing non-triggering misbehavior be a minimal potential set S_n. Then, the network is no longer collectively observable, and reconstruction of the process state from the distributed sensor measurements is impossible.

Proof. According to the statement of the corollary, S_n is a minimal potential set of the graph G and shows non-triggering misbehavior. The sensor nodes in S_n therefore do not transmit any information to their neighbors and act as sink nodes, i.e., they only absorb information. Hence, the exchange of information happens only among the remaining sensor nodes in the graph G\S_n. After excluding the minimal potential set S_n, the pair (A, C_{G\S_n}) becomes unobservable by Definitions 5 and 6, which makes state reconstruction impossible. This completes the proof. □

5.3.2 Continuous-triggering Misbehavior

In this subsection, we discuss how an attacker can compromise the actual sensor measurement to mislead the event-triggered mechanism and achieve continuous-triggering misbehavior, resulting in a time-driven DKF that not only drains the communication resources but also continuously propagates the adverse effect of the attack through the network.

Definition 7 (Continuous-triggering Misbehavior). Let the attacker design an attack strategy that deceives the triggering mechanism in (5.3) at every time instant. This turns the event-driven DKF into a time-driven DKF that continuously exchanges corrupted information among sensor nodes. We call this continuous-triggering misbehavior.

We now show how a replay attack, preceded by an eavesdropping attack, can manipulate the sensor reading to cause continuous violation of the event-triggered mechanism (5.3).

Theorem 3. Consider the process dynamics (5.1) with N sensor nodes (5.2) communicating over the graph G. Let sensor node i in (5.2) be under the replay attack

  y^a_i(k) = C_i x̄_i(k−1) + υ_i(k), ∀k ≥ l + 1,   (5.30)

where x̄_i(k−1) represents the last transmitted prior state estimate, υ_i(k) denotes a disruption signal, and l denotes the last triggering time instant at which an intact prior state estimate was transmitted. Then, sensor node i shows continuous-triggering misbehavior if the attacker selects ‖υ_i(k)‖ > α.

Proof. To mislead a sensor into continuous-triggering misbehavior, the attacker needs to design the attack signal such that the event-triggered condition (5.3) is constantly violated, i.e., ‖y^a_i(k) − C_i x̃_i(k−1)‖ ≥ α at all times. The attacker can eavesdrop on the last transmitted prior state estimate x̄_i(k−1) and design the strategic attack signal (5.30). Then, one has

  y^a_i(k) − C_i x̃_i(k−1) = C_i x̄_i(k−1) + υ_i(k) − C_i[ζ_i(k−1) x̄_i(k−1) + (1 − ζ_i(k−1)) A x̄_i(k−2)]
                          = (1 − ζ_i(k−1)) C_i [x̄_i(k−1) − A x̄_i(k−2)] + υ_i(k).   (5.31)

Taking the norm of both sides of (5.31) yields

  ‖y^a_i(k) − C_i x̃_i(k−1)‖ = ‖(1 − ζ_i(k−1)) C_i [x̄_i(k−1) − A x̄_i(k−2)] + υ_i(k)‖.   (5.32)

Since ζ_i(l) = 1 for k = l + 1,

  ‖y^a_i(l+1) − C_i x̃_i(l)‖ = ‖υ_i(l+1)‖.   (5.33)

If the attacker selects υ_i(l+1) such that ‖υ_i(l+1)‖ > α, then the attack signal (5.30) ensures triggering at time instant k = l + 1. Then, by a similar argument applied to (5.32), for all k ≥ l + 1,

  ‖y^a_i(k) − C_i x̃_i(k−1)‖ = ‖υ_i(k)‖ > α,   (5.34)

which ensures continuous-triggering misbehavior. This completes the proof. □
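A corresponding sketch of the replay attack (5.30): choosing ‖υ_i(k)‖ > α makes the residual in (5.3) equal ‖υ_i(k)‖ at every step, forcing a transmission each time. The numerical values are illustrative.

import numpy as np

def replay_attack(C, xbar_last, upsilon):
    # (5.30): y_i^a(k) = C_i xbar_i(k-1) + upsilon_i(k)
    return C @ xbar_last + upsilon

alpha = 1.8
upsilon = 2.5 * np.ones(1)            # chosen so that ||upsilon|| > alpha
y_a = replay_attack(np.array([[5.0, 0.0]]), np.array([2.0, 0.3]), upsilon)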
To achieve continuous-triggering misbehavior, the attacker needs to eavesdrop on the prior state estimate x̄_i(k−1) at each triggering instant and select υ_i(k) large enough that ‖υ_i(k)‖ > α always holds.

Note that continuous-triggering misbehavior can completely ruin the advantage of event-triggered mechanisms and turn them into time-driven mechanisms. This significantly increases the communication burden. Since nodes in WSNs are usually powered by batteries with limited energy, the attacker can drain the sensors' limited resources by designing the above attack signals to achieve continuous-triggering misbehavior, consequently rendering them non-operational in the network, along with deteriorating the network's performance.

Note that although we classified attacks into non-triggering and continuous-triggering misbehavior to analyze how the attacker can leverage the event-triggered mechanism, the following analysis, detection, and mitigation approaches are not restricted to either class of attacks.

5.4 Attack Detection

In this section, we present an entropy estimation-based attack detection approach for the event-triggered DKF. The KL divergence is a non-negative measure of the relative entropy between two probability distributions, defined as follows.

Definition 8 (KL Divergence) [36]. Let X and Z be two random variables with probability density functions P_X and P_Z, respectively. The KL divergence between P_X and P_Z is defined as

  D_KL(P_X‖P_Z) = ∫_{θ∈Θ} P_X(θ) log(P_X(θ)/P_Z(θ)) dθ,   (5.35)

with the following properties [41]:

1. D_KL(P_X‖P_Z) ≥ 0;
2. D_KL(P_X‖P_Z) = 0 if and only if P_X = P_Z;
3. D_KL(P_X‖P_Z) ≠ D_KL(P_Z‖P_X).

In the existing resilience literature, entropy-based anomaly detectors need to know the probability density functions of the sequences, i.e., P_X and P_Z, to determine the relative entropy. In most cases, authors assume that the probability density function of the corrupted innovation sequence remains Gaussian (see [36] and [135], for instance). Since the attacker's input signal is unknown, it is restrictive to assume that the probability density function of the corrupted sequence remains Gaussian. To relax this restrictive assumption, we estimate the relative entropy between two random sequences X and Z using a k-nearest neighbor (k-NN) based divergence estimator [40].

Let {X₁, . . . , X_{n₁}} and {Z₁, . . . , Z_{n₂}} be i.i.d. samples drawn independently from P_X and P_Z, respectively, with X_j, Z_j ∈ Rᵐ. Let d^X_k(i) be the Euclidean distance between X_i and its k-NN in {X_l}_{l≠i}. The k-NN of a sample s in {s₁, . . . , s_n} is s_{i(k)}, where i(1), . . . , i(n) are such that

  ‖s − s_{i(1)}‖ ≤ ‖s − s_{i(2)}‖ ≤ . . . ≤ ‖s − s_{i(n)}‖.

More specifically, the Euclidean distance d^X_k(i) is given by [136]

  d^X_k(i) = min_{j=1,...,n₁, j∉{i, j₁,...,j_{k−1}}} ‖X_i − X_j‖.   (5.36)

The k-NN based relative entropy estimator is given by [40]

  D̂_KL(P_X‖P_Z) = (m/n₁) Σ_{i=1}^{n₁} log( d^Z_k(i) / d^X_k(i) ) + log( n₂/(n₁ − 1) ).   (5.37)
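The estimator (5.36)-(5.37) can be implemented in a few lines. The following Python sketch uses brute-force distance computation, adequate for the short windows used here, and checks the estimator on two Gaussian sample sets; the sample sizes are illustrative.

import numpy as np

def knn_kl_divergence(X, Z, k=1):
    # k-NN estimator of D_KL(P_X || P_Z) per (5.37); X: (n1, m), Z: (n2, m)
    n1, m = X.shape
    n2 = Z.shape[0]
    dXX = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dXX, np.inf)
    dX = np.sort(dXX, axis=1)[:, k - 1]        # d_k^X(i), within-sample distances (5.36)
    dXZ = np.linalg.norm(X[:, None, :] - Z[None, :, :], axis=2)
    dZ = np.sort(dXZ, axis=1)[:, k - 1]        # d_k^Z(i), cross-sample distances
    return (m / n1) * np.sum(np.log(dZ / dX)) + np.log(n2 / (n1 - 1))

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))
Z_far = rng.normal(2.0, 1.0, size=(200, 2))
Z_same = rng.normal(0.0, 1.0, size=(200, 2))
print(knn_kl_divergence(X, Z_far))    # clearly positive
print(knn_kl_divergence(X, Z_same))   # near log(200/199), i.e., close to zero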
The innovation sequences represent the deviation of the actual output of the system from the estimated one. It is known that innovation sequences approach a steady state quickly, and it is thus reasonable to design innovation-based anomaly detectors to capture system abnormality [36]. Using the innovation sequence of each sensor and the innovation sequences it estimates for its neighbors, we present an innovation-based divergence estimator and design detectors that capture the effect of attacks on the event-triggered DKF.

Based on the innovation expression (5.11), in the presence of attack, one can write the compromised innovation r^a_i(k) for sensor node i, with disrupted measurement y^a_i(k) in (5.15) and state estimate x̄^a_i(k) based on (5.17), as

  r^a_i(k) = y^a_i(k) − C_i x̄^a_i(k).   (5.38)

Let {r^a_i(l), . . . , r^a_i(l−1+w)} and {r_i(l), . . . , r_i(l−1+w)} be i.i.d. p-dimensional samples of the corrupted and nominal innovation sequences, with probability density functions P_{r^a_i} and P_{r_i}, respectively. The nominal innovation sequence follows r_i(k) defined in (5.11). Using the k-NN based relative entropy estimator (5.37), one has [40]

  D̂_KL(P_{r^a_i}‖P_{r_i}) = (p/w) Σ_{j=1}^{w} log( d^{r_i}_k(j) / d^{r^a_i}_k(j) ) + log( w/(w−1) ), ∀i ∈ V.   (5.39)

Define the average of the estimated KL divergence over a time window T as

  Φ_i(k) = (1/T) Σ_{l=k−T+1}^{k} D̂_KL(P_{r^a_i}‖P_{r_i}), ∀i ∈ V.   (5.40)

The following theorem shows that the effect of attacks on the sensors can be captured using (5.40).

Theorem 4. Consider the distributed sensor network (5.1)-(5.2) under sensor attack. Then,

1. in the absence of attack, Φ_i(k) = log(w/(w−1)), ∀k;
2. in the presence of attack, Φ_i(k) > δ, ∀k > l_a, where δ and l_a denote, respectively, a predefined threshold and the time instant at which the attack happens.

Proof. In the absence of attack, the samples of the innovation sequences {r^a_i(l), . . . , r^a_i(l−1+w)} and {r_i(l), . . . , r_i(l−1+w)} are similar. Then, the Euclidean distances satisfy d^{r^a_i}_k(j) = d^{r_i}_k(j), ∀j ∈ {1, . . . , w}, and one has

  D̂_KL(P_{r^a_i}‖P_{r_i}) = log( w/(w−1) ), ∀i ∈ V.   (5.41)

Based on (5.41), one has

  Φ_i(k) = (1/T) Σ_{l=k−T+1}^{k} log( w/(w−1) ) = log( w/(w−1) ) < δ, ∀i ∈ V,   (5.42)

where log(w/(w−1)) in (5.42) depends on the sample size of the innovation sequence, and log(w/(w−1)) ≤ 0.1 for all w ≥ 10. Therefore, the predefined threshold δ can be selected with δ > 0.1 such that the condition in (5.42) is always satisfied. This completes the proof of part 1.

In the presence of attack, the samples of the innovation sequences {r^a_i(l), . . . , r^a_i(l−1+w)} and {r_i(l), . . . , r_i(l−1+w)} differ, i.e., d^{r^a_i}_k(j) ≠ d^{r_i}_k(j), ∀j ∈ {1, . . . , w}. More specifically, d^{r_i}_k(j) > d^{r^a_i}_k(j), ∀j ∈ {1, . . . , w}, due to the change in the corrupted innovation sequence. Therefore, based on (5.39), the estimated relative entropy between the sequences becomes

  D̂_KL(P_{r^a_i}‖P_{r_i}) = (p/w) Σ_{j=1}^{w} log( 1 + Δ^{r_i}_k(j) / d^{r^a_i}_k(j) ) + log( w/(w−1) ), ∀i ∈ V,   (5.43)

with Δ^{r_i}_k(j) the change in Euclidean distance due to the corrupted innovation sequence. Based on (5.43), one has

  D̂_KL(P_{r^a_i}‖P_{r_i}) = (p/w) Σ_{j=1}^{w} log( 1 + Δ^{r_i}_k(j) / d^{r^a_i}_k(j) ) + log( w/(w−1) ) ≫ log( w/(w−1) ).   (5.44)

Thus, one has

  Φ_i(k) = (1/T) Σ_{l=k−T+1}^{k} D̂_KL(P_{r^a_i}‖P_{r_i}) > δ, ∀i ∈ V,   (5.45)

where T and δ denote the sliding window size and the predefined design threshold, respectively. This completes the proof. □
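A sketch of the resulting detector: the statistic Φ_i(k) of (5.40) is a sliding-window average of the k-NN KL estimates, compared against a threshold δ > 0.1 as suggested by Theorem 4; the hypothesis test itself is stated next, in (5.46). It builds on knn_kl_divergence from the earlier sketch, and the window length and threshold shown are design choices, not values from the text.

import numpy as np

def phi_statistic(kl_estimates, T=20):
    # Phi_i(k) in (5.40): average of the last T KL estimates
    return float(np.mean(kl_estimates[-T:]))

def is_attacked(kl_estimates, delta=0.3, T=20):
    # threshold test of (5.46): True -> H1 (compromised), False -> H0 (intact)
    return phi_statistic(kl_estimates, T) > delta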
Based on Theorem 4, one can use the following condition for attack detection:

  Φ_i(k) < δ : H₀,
  Φ_i(k) > δ : H₁,   (5.46)

where δ denotes the designed detection threshold, the null hypothesis H₀ represents the intact mode of a sensor node, and H₁ denotes the compromised mode.

Remark 5. Note that in the absence of an attack, the innovation sequence has a known zero-mean Gaussian distribution due to the measurement noise. Based on prior system knowledge, one can always take the nominal innovation sequence to be zero-mean Gaussian with a predefined covariance. The bound on the predefined covariance can be determined during normal operation of the event-triggered DKF. This assumption of knowledge of the nominal innovation sequence for attack detection is standard in the existing literature (see [135], for instance). The threshold δ in (5.46) is a predefined parameter chosen appropriately for detection of the attack signal. Moreover, selecting the detection threshold based on expert knowledge is standard in the existing literature; for example, several results on adversary detection and stealthiness consider similar thresholds [36], [124].

Algorithm 1 Detecting attacks on sensors.
1: Initialize with a time window T and detection threshold δ.
2: procedure ∀i = 1, . . . , N
3: Use samples of the innovation sequences {r^a_i(l), . . . , r^a_i(l−1+w)} and {r_i(l), . . . , r_i(l−1+w)} based on (5.38) and (5.11), ∀l ∈ k.
4: Estimate D̂_KL(P_{r^a_i}‖P_{r_i}) using (5.44).
5: Compute Φ_i(k) as in (5.45) and use the condition (5.46) to detect attacks on sensors.
6: end procedure

Based on the results presented in Theorem 4 and Algorithm 1, one can capture attacks on both sensors and communication links, but one cannot identify the specific compromised communication link as modeled in (5.16). To detect the source of attacks, we present an estimated entropy-based detector that captures the effect of attacks on a specific communication channel. More specifically, the relative entropy between the innovation sequences estimated for the neighbors at a particular sensor node and the nominal innovation sequence of that sensor node is estimated using (5.37).

Define the estimated innovation sequence ζ^a_{i,j}(k) for a neighbor j under attacks on the communication channel, as seen from sensor node i, as

  ζ^a_{i,j}(k) = y_i(k) − C_j x̃^a_j(k),   (5.47)

where x̃^a_j(k) is the corrupted communicated state estimate of neighbor j at sensor node i at the last triggering instant.

Let {ζ^a_{i,j}(l), . . . , ζ^a_{i,j}(l−1+w)} be i.i.d. p-dimensional samples of the neighbor's estimated innovation at sensor node i, with probability density function P_{ζ^a_{i,j}}. Using the k-NN based relative entropy estimator (5.37), one has

  D̂_KL(P_{ζ^a_{i,j}}‖P_{r_i}) = (p/w) Σ_{j=1}^{w} log( d^{r_i}_k(j) / d^{ζ^a_{i,j}}_k(j) ) + log( w/(w−1) ), ∀i ∈ V, j ∈ N_i.   (5.48)

Note that in the presence of attacks on the communication channels, the neighbor's actual innovation differs from the neighbor's estimated innovation at sensor i. In the absence of attack, the mean values of all the sensors' state estimates converge to the mean of the desired process state at steady state, and therefore the innovation sequences r_i and ζ^a_{i,j} have the same zero-mean Gaussian distribution. In the presence of attack, however, as shown in Theorem 5 and Algorithm 2, their distributions diverge.

Define the average of the KL divergence over a time window T as

  Ψ_{i,j}(k) = (1/T) Σ_{l=k−T+1}^{k} D̂_KL(P_{ζ^a_{i,j}}‖P_{r_i}), ∀i ∈ V, j ∈ N_i.   (5.49)
Theorem 5. Consider the distributed sensor network (5.1)-(5.2) under the attack (5.16) on the communication links. Then, in the presence of an attack, Ψ_{i,j}(k) > δ, ∀k, where δ denotes a predefined threshold.

Proof. The result follows an argument similar to that in the proof of part 2 of Theorem 4. □

Algorithm 2 Detecting attacks on a specific communication link.
1: Initialize with a time window T and detection threshold δ.
2: procedure ∀i = 1, . . . , N
3: For each sensor node j ∈ N_i, use samples of the innovation sequences {ζ^a_{i,j}(l), . . . , ζ^a_{i,j}(l−1+w)} and {r_i(l), . . . , r_i(l−1+w)} based on (5.47) and (5.11), ∀l ∈ k.
4: Estimate D̂_KL(P_{ζ^a_{i,j}}‖P_{r_i}) using (5.48).
5: Compute Ψ_{i,j}(k) as in (5.49) and use the same argument as in (5.46) to detect attacks on the specific communication link.
6: end procedure

5.5 Secure Distributed Estimation Mechanism

This section presents a meta-Bayesian approach for secure event-triggered DKF, which incorporates the outcome of the attack detection mechanism to perform second-order inference and consequently form beliefs over beliefs. That is, the second-order inference forms confidence and trust about the truthfulness or legitimacy of the sensor's own state estimate (i.e., the posterior belief of the first-order Bayesian inference) and of its neighbors' state estimates, respectively. Each sensor communicates its confidence to its neighbors. Sensors then incorporate their neighbors' confidence and their own trust about their neighbors into their posterior update laws to successfully discard corrupted information.

5.5.1 Confidence of sensor nodes

The second-order inference forms a confidence value for each sensor node that determines the level of trustworthiness of the sensor in its own measurement and state estimate (i.e., the posterior belief of the first-order Bayesian inference). If a sensor node is compromised, the presented attack detector detects the adversary; the node then reduces its level of trustworthiness in its own understanding of the environment and communicates this to its neighbors to inform them of the significance of its outgoing information, thus slowing down the attack propagation.

To determine the confidence of sensor node i, based on the divergence D̂_KL(P_{r^a_i}‖P_{r_i}) from Theorem 4, we first define

  χ_i(k) = Υ₁ / ( Υ₁ + D̂_KL(P_{r^a_i}‖P_{r_i}) ),   (5.50)

where 0 < Υ₁ < 1 represents a predefined threshold that accounts for channel fading and other uncertainties. In the following lemma, we formally present the results on the confidence of sensor node i.
This completes the On the other hand, based on Theorem 4, in the absence of attacks, ˆDKL(Pra i proof of part 2. Note that the expression for the confidence of sensor node i in (5.51) can be implemented using the following difference equation βi(k + 1) = βi(k) + κ1χi(k). (5.52) Note also that the discount factor in (5.51) determines how much we value the current experience with regards to past experiences. It also guarantees that if the attack is not persistent and disappears after a while, or if a short-period adversary rather than attack (such as packet dropout) causes, the belief will be recovered, as it mainly depends on the current circumstances. 5.5.2 Trust of sensor nodes about their incoming information Similar to the previous subsection, the second-order inference forms trust of sensor nodes to represent their level of trust on their neighboring sensor’s state estimates. Trust decides the usefulness of the neighboring information in the state estimation of sensor node i. 114 k−1(cid:88) The trust of the sensor node i on its neighboring sensor j can be determined based on the divergence ˆDKL(Pζa i,j ||Pri) in (5.47) from Theorem 5, from which we define θi,j(k) = Λ1 Λ1 + ˆDKL(Pζa i,j , ||Pri) (5.53) where 0 < Λ1 < 1 represents a predefined threshold to account for the channel fading and other uncertainties. Then, in the following lemma, we formally present the results for the trust of the sensor node i on its neighboring sensor j. Lemma 2. Let σi,j(k) be the trust of the sensor node i on its neighboring sensor j which is updated using σi,j(k) = (κ2)k−l+1θi,j(l), (5.54) where θi,j(k) is defined in (5.53), and 0 < κ2 < 1 is a discount factor. Then, σi,j(k) ∈ (0, 1] and l=0 1. σi,j(k) → 0, ∀j ∈ Vc ∩ Ni; 2. σi,j(k) → 1, ∀j ∈ V\Vc ∩ Ni. Proof. The result follows a similar argument as given in the proof of Lemma 1. Note that the trust of sensor node i in (5.54) can be implemented using the following difference equation σi,j(k + 1) = σi,j(k) + κ2θi,j(k). (5.55) Using the presented idea of trust, one can identify the attacks on the communication channel and discard the contribution of compromised information for the state estimation. 5.5.3 Attack mitigation mechanism using confidence and trust of sensors This subsection incorporates the confidence and trust of sensors to design a resilient event-triggered DKF. To this end, using the presented confidence βi(k) in (5.51) and trust σi,j(k) in (5.54), we design the resilient form of the event-triggered DKF as ˆxi(k) = ¯xi(k) + Ki(k)(βi(k)yi(k) + (1 − βi(k))Cimi(k) − Ci ¯xi(k)) (5.56) (cid:80) j∈Ni σi,j(k)βj(k)(˜xj(k) − ˜xi(k)), where the weighted neighbor’s state estimate mi(k) is defined as +γi 115 mi(k) = 1|Ni| (cid:80) j∈Ni ∀k (cid:107)εi(k)(cid:107) < τ, σi,j(k)βj(k)˜xj(k) ≈ x(k) + εi(k), (5.57) where εi(k) denotes the deviation between the weighted neighbor’s state estimate mi(k) and the actual process state x(k). Note that in (5.57) the weighted state estimate depends on the trust values σi,j(k) and the confidence values βj(k), ∀j ∈ Ni. Since the weighted state estimate depends only on the information from intact neighbors, then one has (cid:107)εi(k)(cid:107) < τ for some τ > 0, ∀k. For the sake of mathematical representation, we approximate the weighted state estimate mi(k) in terms of the actual process state x(k), i.e., mi(k) ≈ x(k) + εi(k). 
We call this a meta-Bayesian inference that integrates the first-order inference (state estimates) with second-order estimates or belief (trust and confidence on the trustworthiness of state estimate beliefs). Define the prior and predictive state estimation errors as ¯ηi(k) = x(k) − ¯xi(k) ˜ηi(k) = x(k) − ˜xi(k), Using the threshold in triggering mechanism (5.3), one has (cid:107)˜ηi(k)(cid:107) − (cid:107)x(k + 1) − x(k) + vi(k + 1)(cid:107) ≤ α/(cid:107)Ci(cid:107) , (cid:107)˜ηi(k)(cid:107) ≤ α/(cid:107)Ci(cid:107) + B, where B denotes the bound on (cid:107)x(k + 1) − x(k) + vi(k + 1)(cid:107) . Other notations used in the following theorem are given by ¯η(k) = [¯η1(k), . . . , ¯ηN (k)], M (k) = diag[M1(k), . . . , MN (k)] Υ = diag[γ1, . . . , γN ], Υm = (cid:107)max{γi}(cid:107) , ∀i ∈ V, ¯β = (IN − diag(βi)), E(k) = [ε1(k), . . . , εN (k)], ˜η(k) = [˜η1(k), . . . , ˜ηN (k)]. Assumption 4. At least (C(Ni)/2) + 1 neighbors of the sensor node i are intact. (5.58) (5.59) (5.60) Assumption 4 is similar to the assumption found in the secure estimation and control literature [7], [125]. Necessary and sufficient condition for any centralized or distributed estimator to resiliently estimate actual state is that the number of attacked sensors is less than half of all sensors. Theorem 6. Consider the resilient event triggered DKF (5.56) with the triggering mechanism (5.3). Let the time-varying graph be G(k) such that at each time instant k, Assumptions 3 and 4 are satisfied. Then, 116 1. The following uniform bound holds on state estimation error in (5.58), despite attacks k−1(cid:88) (cid:107)¯η(k)(cid:107) ≤ (Ao)k (cid:107)¯η(0)(cid:107) + (Ao)k−m−1Bo, where m=0 Ao = σmax((IN ⊗ A)M (k)), Bo = σmax(A)σmax(L(k))Υm +(σmax(A) + σmax(Ao))(cid:13)(cid:13) ¯β(cid:13)(cid:13)√ (cid:112)N (α/(cid:107)Ci(cid:107) + B) N τ, with L(k) denotes the confidence and trust dependent time-varying graph Laplacian matrix, and bound τ defined in (5.57); 2. The uniform bound on the state estimation error (5.61) becomes k→∞(cid:107)¯η(k)(cid:107) ≤ AoBo 1 − Ao lim . Moreover, other notations used in (5.62) are defined in (5.60). Proof. Using the presented resilient estimator (5.56), one has (5.61) (5.62) (5.63) (5.64) (5.65) (5.66) Substituting (5.57) into (5.64) and using (5.58), the state estimation error dynamics becomes ¯xi(k + 1) = Aˆxi(k) (cid:80) j∈Ni = A(¯xi(k) + Ki(k)(βi(k)yi(k) + (1 − βi(k))Cimi(k) σi,j(k)βj(k)(˜xj(k) − ˜xi(k))), −Ci ¯xi(k)) + γi (cid:80) j∈Ni aij(k)(˜ηj(k) − ˜ηi(k)) ¯ηi(k + 1) = AMi(k)¯ηi(k) + Aγi −AKi(k)(1 − βi(k))Ciεi(k), where aij(k) = σi,j(k)βj(k) and Mi(k) = I − Ki(k)Ci. Using (5.65) and notations defined in (5.60), the global form of error dynamics becomes ¯η(k + 1) = (IN ⊗ A)M (k)¯η(k) − (Υ ⊗ A)L(k)˜η(k) −( ¯β ⊗ A)(InN − M (k))E(k)). Note that Assumption 4 implies that the total number of the compromised sensors is less than half of the total number of sensors in the network. That is, if q neighbors of an intact sensor node are attacked and collude to send the same value to mislead it, there still exists q + 1 intact neighbors that communicate values different from the compromised ones. Moreover, since at least 117 half of the intact sensor’s neighbors are intact, it can update its beliefs to discard the compromised neighbor’s state estimates. Furthermore, since the time-varying graph G(k) resulting from isolating the compromised sensors, based on Assumptions 3 and 4, the entire network is still collectively observable. 
Define the prior and predictive state estimation errors as

  η̄_i(k) = x(k) − x̄_i(k),  η̃_i(k) = x(k) − x̃_i(k).   (5.58)

Using the threshold in the triggering mechanism (5.3), one has

  ‖η̃_i(k)‖ − ‖x(k+1) − x(k) + v_i(k+1)‖ ≤ α/‖C_i‖,  i.e.,  ‖η̃_i(k)‖ ≤ α/‖C_i‖ + B,   (5.59)

where B denotes the bound on ‖x(k+1) − x(k) + v_i(k+1)‖. Other notations used in the following theorem are

  η̄(k) = [η̄₁(k), . . . , η̄_N(k)],  M(k) = diag[M₁(k), . . . , M_N(k)],
  Υ = diag[γ₁, . . . , γ_N],  Υ_m = ‖max{γ_i}‖, ∀i ∈ V,
  β̄ = I_N − diag(β_i),  E(k) = [ε₁(k), . . . , ε_N(k)],  η̃(k) = [η̃₁(k), . . . , η̃_N(k)].   (5.60)

Assumption 4. At least (C(N_i)/2) + 1 neighbors of sensor node i are intact.

Assumption 4 is similar to assumptions found in the secure estimation and control literature [7], [125]. A necessary and sufficient condition for any centralized or distributed estimator to resiliently estimate the actual state is that the number of attacked sensors be less than half of all sensors.

Theorem 6. Consider the resilient event-triggered DKF (5.56) with the triggering mechanism (5.3). Let the time-varying graph be G(k), such that at each time instant k Assumptions 3 and 4 are satisfied. Then,

1. the following uniform bound holds on the state estimation error (5.58), despite attacks:

  ‖η̄(k)‖ ≤ (A_o)ᵏ ‖η̄(0)‖ + Σ_{m=0}^{k−1} (A_o)^{k−m−1} B_o,   (5.61)

where

  A_o = σ_max((I_N ⊗ A) M(k)),
  B_o = σ_max(A) σ_max(L(k)) Υ_m √N (α/‖C_i‖ + B) + (σ_max(A) + σ_max(A_o)) ‖β̄‖ √N τ,   (5.62)

with L(k) the confidence- and trust-dependent time-varying graph Laplacian matrix and τ the bound defined in (5.57);

2. the uniform bound on the state estimation error (5.61) becomes, asymptotically,

  lim_{k→∞} ‖η̄(k)‖ ≤ A_o B_o / (1 − A_o).   (5.63)

Moreover, the other notations used in (5.62) are defined in (5.60).

Proof. Using the presented resilient estimator (5.56), one has

  x̄_i(k+1) = A x̂_i(k)
            = A( x̄_i(k) + K_i(k)( β_i(k) y_i(k) + (1 − β_i(k)) C_i m_i(k) − C_i x̄_i(k) )
            + γ_i Σ_{j∈N_i} σ_{i,j}(k) β_j(k) ( x̃_j(k) − x̃_i(k) ) ).   (5.64)

Substituting (5.57) into (5.64) and using (5.58), the state estimation error dynamics become

  η̄_i(k+1) = A M_i(k) η̄_i(k) + A γ_i Σ_{j∈N_i} a_{ij}(k) ( η̃_j(k) − η̃_i(k) ) − A K_i(k)(1 − β_i(k)) C_i ε_i(k),   (5.65)

where a_{ij}(k) = σ_{i,j}(k) β_j(k) and M_i(k) = I − K_i(k) C_i. Using (5.65) and the notation defined in (5.60), the global form of the error dynamics becomes

  η̄(k+1) = (I_N ⊗ A) M(k) η̄(k) − (Υ ⊗ A) L(k) η̃(k) − (β̄ ⊗ A)(I_{nN} − M(k)) E(k).   (5.66)

Note that Assumption 4 implies that the total number of compromised sensors is less than half of the total number of sensors in the network. That is, if q neighbors of an intact sensor node are attacked and collude to send the same value to mislead it, there still exist q + 1 intact neighbors that communicate values different from the compromised ones. Moreover, since at least half of an intact sensor's neighbors are intact, it can update its beliefs to discard the compromised neighbors' state estimates. Furthermore, for the time-varying graph G(k) that results from isolating the compromised sensors, by Assumptions 3 and 4 the entire network remains collectively observable. Using the trust and confidence of neighboring sensors, the incoming information from the compromised communication channels is discarded. Now, taking the norm of both sides of (5.66) and using the triangle inequality, one has

  ‖η̄(k+1)‖ ≤ ‖(I_N ⊗ A) M(k) η̄(k)‖ + ‖(Υ ⊗ A) L(k) η̃(k)‖ + ‖(β̄ ⊗ A)(I_{nN} − M(k)) E(k)‖.   (5.67)

Using (5.57), (5.67) can be rewritten as

  ‖η̄(k+1)‖ ≤ A_o ‖η̄(k)‖ + σ_max(L(k)) ‖(Υ ⊗ A) η̃(k)‖ + ‖((β̄ ⊗ A) − (β̄ ⊗ I_n)(I_N ⊗ A) M(k)) E(k)‖.   (5.68)

After some manipulations, one has

  ‖η̄(k+1)‖ ≤ A_o ‖η̄(k)‖ + σ_max(A) σ_max(L(k)) Υ_m ‖η̃(k)‖ + (σ_max(A) + σ_max(A_o)) ‖β̄‖ √N τ,   (5.69)

with Υ_m defined in (5.60). Then, using (5.59), one can write (5.69) as

  ‖η̄(k+1)‖ ≤ A_o ‖η̄(k)‖ + σ_max(A) σ_max(L(k)) Υ_m √N (α/‖C_i‖ + B) + (σ_max(A) + σ_max(A_o)) ‖β̄‖ √N τ.   (5.70)

Solving (5.70), one has

  ‖η̄(k)‖ ≤ (A_o)ᵏ ‖η̄(0)‖ + Σ_{m=0}^{k−1} (A_o)^{k−m−1} B_o,   (5.71)

where A_o and B_o are given in (5.62). This completes the proof of part 1. Based on Assumption 3, the distributed sensor network is always collectively observable. Thus, based on the result provided in [137], one can conclude that A_o is always Schur, and the upper bound on the state estimation error becomes (5.63). This completes the proof. □

Based on the attack detection approach presented in Algorithms 1 and 2, one can detect the attacker's misbehavior and estimate the actual state using the result presented in Theorem 6 and Algorithm 3.

Algorithm 3 Secure Distributed Estimation Mechanism (SDEM).
1: Start with initial innovation sequences and design parameters Υ₁ and Λ₁.
2: procedure ∀i = 1, . . . , N
3: Use samples of the innovation sequences {r^a_i(l), . . . , r^a_i(l−1+w)} and {r_i(l), . . . , r_i(l−1+w)} based on (5.38) and (5.11), ∀l ∈ k.
4: Estimate D̂_KL(P_{r^a_i}‖P_{r_i}) using (5.44).
5: Based on (5.50)-(5.51), compute the confidence β_i(k) as

  β_i(k) = Σ_{l=0}^{k−1} (κ₁)^{k−l+1} Υ₁ / ( Υ₁ + D̂_KL(P_{r^a_i}‖P_{r_i}) ).   (5.72)

6: For each sensor node j ∈ N_i, use samples of the innovation sequences {ζ^a_{i,j}(l), . . . , ζ^a_{i,j}(l−1+w)} and {r_i(l), . . . , r_i(l−1+w)} based on (5.47) and (5.11), ∀l ∈ k.
7: Estimate D̂_KL(P_{ζ^a_{i,j}}‖P_{r_i}) using (5.48).
8: Using (5.53)-(5.54), compute the trust σ_{i,j}(k) as

  σ_{i,j}(k) = Σ_{l=0}^{k−1} (κ₂)^{k−l+1} Λ₁ / ( Λ₁ + D̂_KL(P_{ζ^a_{i,j}}‖P_{r_i}) ).   (5.73)

9: Using the sensor measurement y_i(k), the confidence β_i(k), the trust in the neighbors σ_{i,j}(k), and the neighbors' state estimates x̃_j(k), ∀j ∈ N_i, update the resilient state estimator (5.56).
10: end procedure

5.6 Simulation Results

In this section, we present simulation results that demonstrate the efficacy of the presented attack detection and mitigation mechanism. The sensor network is assumed to have the undirected graph topology given in Fig. 5.2, with the objective of following the desired process dynamics. Consider the process dynamics (5.1) generating the target trajectory,

  x(k+1) = [ cos(π/200)  −sin(π/200)
             sin(π/200)   cos(π/200) ] x(k) + w(k),   (5.74)

with the observation matrices C_i in (5.2), the noise covariances, and the initial state given by

  C_i = [5 0; 0 2],  Q = I₂,  R_i = I₂,  x₀ = (0.5, 0).   (5.75)

Figure 5.2: Communication topology.

Figure 5.3: Sensor network without any attack. (a) State estimation errors. (b) Transmit function for sensor 2.
For the intact sensor network, the state estimates of the sensors converge to the desired process state in the mean-square sense, and the state estimation error goes to zero for each sensor node, as shown in Fig. 5.3(a). The event generation based on the event-triggering mechanism (5.3) with triggering threshold α = 1.8 is shown in Fig. 5.3(b).

Then, we consider that sensor 2 of the network is compromised by the adversarial input δ₂(k) = 2 + 10 sin(100k) after 20 seconds. Fig. 5.4(a) shows the attacker's effect on sensor 2: the compromised sensor and the other sensors in the network deviate from the desired target state, resulting in nonzero estimation errors driven by the attacker's input.

Figure 5.4: Sensor node 2 under continuous-triggering misbehavior. (a) State estimation errors. (b) Transmit function for sensor 2.

Furthermore, the event generation based on the event-triggering mechanism (5.3) in the presence of the attack is shown in Fig. 5.4(b); after injection of the attack on sensor 2, the event-triggered system becomes time-triggered and shows continuous-triggering misbehavior. This result follows the analysis presented for continuous-triggering misbehavior. In Fig. 5.5, we show the results for non-triggering misbehavior of sensor node 2, which also follow the presented analysis.

Figure 5.5: Sensor node 2 under non-triggering misbehavior. (a) State estimation errors. (b) Transmit function for sensor 2.

Next, we detect the effect of the attack on the sensor using the presented attack detection mechanism. Fig. 5.6(a) shows the result for the estimated KL divergence based attack detection mechanism and illustrates that, after injection of the attack signal, the estimated KL divergence starts increasing for the compromised sensor node as well as for the sensor nodes that have a path from the compromised sensor. One can always design a threshold to detect the effect of the attack in the sensor network and then isolate the corrupted sensor in the WSN to avoid propagation of the attack. The estimated divergence for the compromised sensor, i.e., sensor 2, grows after the attack injection at k = 20, which follows the result presented in Theorem 4. The confidence of each sensor is evaluated based on Lemma 1 with discount factor κ₁ = 0.5 and uncertainty threshold Υ₁ = 0.5. Fig. 5.6(b) shows the confidence of the sensors in the presence of the considered attack, which is close to one for healthy sensors and tends to zero for the compromised one.

Figure 5.6: Sensor node 2 under attack. (a) Estimated KL divergence. (b) Confidence of sensors.

Then, the belief-based proposed resilient estimator is implemented, and Fig. 5.7 shows the result of state estimation using the resilient estimator (5.56). After the injection of the attack, within a few seconds the sensors reach consensus on the state estimates, i.e., the state estimates of the sensors converge to the actual position of the target. The result in Fig. 5.7 follows Theorem 6.

Figure 5.7: State estimation errors under attack on sensor 2 using the proposed resilient state estimator.

5.7 Conclusion

In this chapter, we first analyzed the adverse effects of cyber-physical attacks on the event-triggered distributed Kalman filter (DKF).
We showed that an attacker can adversely affect the performance of the DKF. We also showed that the event-triggered mechanism in the DKF can be leveraged by the attacker to cause non-triggering misbehavior that significantly harms the network connectivity and its collective observability. Then, to detect adversarial intrusions in the DKF, we relaxed the restrictive Gaussian assumption on the probability density functions of the attack signals and estimated the Kullback-Leibler (KL) divergence via a k-nearest neighbors approach. Finally, to mitigate attacks, a meta-Bayesian approach was presented that incorporates the outcome of the attack detection mechanism to perform second-order inference and consequently form beliefs over beliefs, i.e., the confidence and trust of a sensor. Each sensor communicates its confidence to its neighbors. Sensors then incorporate their neighbors' confidence and their own trust about their neighbors into their posterior update laws to successfully discard corrupted sensor information. The simulation results illustrate the performance of the presented resilient event-triggered DKF.

CHAPTER 6
ASSURED LEARNING-ENABLED AUTONOMY: A METACOGNITIVE REINFORCEMENT LEARNING FRAMEWORK

6.1 Introduction

This chapter presents a safe reinforcement learning (RL) framework for autonomous control systems under constraints. RL agents with pre-specified reward functions cannot provide guaranteed safety across the variety of circumstances that an uncertain system might encounter. To guarantee performance while assuring satisfaction of safety constraints across a variety of circumstances, an assured autonomous control framework is designed by empowering RL algorithms with meta-cognitive learning capabilities. We first discuss that RL agents with pre-specified reward functions cannot guarantee satisfaction of the desired specifications and performance across all circumstances that an uncertain system might encounter: the system either violates safety specifications or achieves no optimality and liveness specifications. To overcome this issue, a metacognitive decision-making layer is augmented to the RL agent to learn which reward functions to choose so as to satisfy the desired specifications and achieve good enough performance across a variety of circumstances. More specifically, a fitness function is defined in the metacognitive layer that indicates how safely the system would react in the future for a given reward function; in case of a drop in the fitness function, a Bayesian RL algorithm proactively adapts the reward function parameters to maximize the system's assuredness (i.e., satisfaction of the desired STL safety and liveness specifications) and guarantee performance. Off-policy RL algorithms are proposed to find the optimal policy corresponding to each hyperparameter by reusing the data collected from the system. The proposed approach separates learning the reward function that satisfies the specifications from learning the control policy that maximizes the reward, and thus allows us to evaluate as many hyperparameters as required using reused data collected from the system dynamics.

6.2 Preliminaries

6.2.1 Notations

Throughout the chapter, R and N represent the sets of real numbers and natural numbers, respectively. Rⁿ denotes the n-dimensional Euclidean space. The superscript (·)ᵀ denotes transposition. I denotes the identity matrix of proper dimension.
[K]_j denotes the j-th element of the vector K, and [K]_{ij} denotes the (i, j)-th entry of the matrix K. diag(A) denotes a diagonal matrix in which all off-diagonal entries are zero, i.e., [A]_{ij} = 0, ∀i ≠ j. Tr(A) stands for the trace of the matrix A. When a random variable ε_i is distributed normally with mean m and variance w², we use the notation ε_i ∼ N(m, w²). ⊗ denotes the Kronecker product, and vec(A) denotes the mn-vector constructed by stacking the columns of the matrix A ∈ Rⁿˣᵐ on top of one another.

Definition 1. The weighted Kullback-Leibler (KL) divergence between distributions P_X and P_Z is defined as [164]

  D^h_KL(X‖Z) = ∫ P_X(θ) log( P_X(θ)/P_Z(θ) )^{h(x)} dθ,   (6.1)

where h(x) is a non-negative real-valued weighting function. Note that the weighting function is defined to weight more heavily promising regions of the state space. Note also that D_KL(P_X‖P_Z) ≥ 0, and D_KL(P_X‖P_Z) = 0 if and only if P_X = P_Z.

6.2.2 Signal Temporal Logic

Temporal logics can be used to specify rich time-dependent constraints for control systems in a wide variety of applications. Signal temporal logic (STL) is a category of temporal logic that allows the specification of temporal properties of real-valued signals. STL is a predicate logic defined over continuous-time signals [165]-[168]. Let x(t) be a continuous-time signal. A predicate σ is evaluated as True (⊤) or False (⊥) according to a corresponding predicate function z^σ(x): Rⁿ → R as

  σ = ⊤ if z^σ(x) > 0,  σ = ⊥ if z^σ(x) ≤ 0.   (6.2)

The predicate function z^σ(x) is a linear or nonlinear combination of the elements of the signal x, and the predicate σ belongs to a set of predicates P_σ = [σ₁, σ₂, . . . , σ_N] with N ∈ N the number of predicates. Predicates can be recursively combined using the Boolean logic operators negation (¬), disjunction (∨), and conjunction (∧), as well as the temporal operators eventually (♦), globally or always (□), and until (U), to form increasingly complex formulas ϕ (also referred to as task specifications):

  ϕ := ⊤ | σ | ¬σ | ϕ₁ ∧ ϕ₂ | ϕ₁ ∨ ϕ₂ | ♦_[a,b] ϕ | □_[a,b] ϕ | ϕ₁ U_[a,b] ϕ₂.

For each predicate σ_i, i = 1, . . . , N, a predicate function z^{σ_i}(x(t)) is defined as in (6.2). The time bounds of the until operator ϕ U_[a,b] μ are given as a, b ∈ [0, ∞) with a < b. The commonly used temporal operators eventually and always follow from ♦_[a,b] ϕ = ⊤ U_[a,b] ϕ and □_[a,b] ϕ = ¬♦_[a,b] ¬ϕ, respectively. For example, the temporal formula ♦_[3,6] ϕ is satisfied when the STL formula ϕ becomes True within the time interval of 3 to 6 seconds. A signal x(t) is said to satisfy an STL expression at time t according to the following qualitative semantics [164]-[167]:

  (x, t) ⊨ σ            ⇔ z^σ(x(t)) > 0
  (x, t) ⊨ ¬σ           ⇔ ¬((x, t) ⊨ σ)
  (x, t) ⊨ ϕ ∧ μ        ⇔ (x, t) ⊨ ϕ and (x, t) ⊨ μ
  (x, t) ⊨ ϕ ∨ μ        ⇔ (x, t) ⊨ ϕ or (x, t) ⊨ μ
  (x, t) ⊨ ϕ U_[a,b] μ  ⇔ ∃ t₁ ∈ [t+a, t+b] s.t. (x, t₁) ⊨ μ and ∀ t₂ ∈ [t, t₁], (x, t₂) ⊨ ϕ
  (x, t) ⊨ ♦_[a,b] ϕ    ⇔ ∃ t₁ ∈ [t+a, t+b] s.t. (x, t₁) ⊨ ϕ
  (x, t) ⊨ □_[a,b] ϕ    ⇔ ∀ t₁ ∈ [t+a, t+b], (x, t₁) ⊨ ϕ   (6.3)

The symbol ⊨ denotes satisfaction of an STL formula. The time interval [a, b] differentiates STL from general temporal logic and defines the quantitative timing within which the temporal formula must be achieved. Apart from syntax and qualitative semantics, STL is also equipped with various robustness measures that quantify the extent to which a temporal constraint is satisfied.
Given STL formulas ϕ and µ, the spatial robustness is defined as [167]

ρ^σ(x, t) = z^σ(x(t))
ρ^{¬σ}(x, t) = −ρ^σ(x, t)
ρ^{ϕ∧µ}(x, t) = min(ρ^ϕ(x, t), ρ^µ(x, t))
ρ^{ϕ∨µ}(x, t) = max(ρ^ϕ(x, t), ρ^µ(x, t))
ρ^{ϕ U_{[a,b]} µ}(x, t) = max_{t_1∈[t+a, t+b]} ( min( ρ^µ(x, t_1), min_{t_2∈[t, t_1]} ρ^ϕ(x, t_2) ) )
ρ^{◇_{[a,b]}ϕ}(x, t) = max_{t_1∈[t+a, t+b]} ρ^ϕ(x, t_1)
ρ^{□_{[a,b]}ϕ}(x, t) = min_{t_1∈[t+a, t+b]} ρ^ϕ(x, t_1)    (6.4)

This robustness measure determines how well a given signal x(t) satisfies a specification. The spatial robustness defines a real-valued function ρ^σ(x, t) which is positive if and only if (x, t) ⊨ σ, that is, ρ^σ(x, t) > 0 ⇔ (x, t) ⊨ σ. Let a trajectory τ[0, T] be defined by the signal x(t) throughout its evolution from time 0 to T. A trajectory then satisfies the specification if and only if ρ^σ(x, t) > 0 ∀t ∈ [0, t_f], where t_f is the end time of the STL horizon. The robustness degree is the bound on the perturbation that the signal can tolerate without changing the truth value of the specification.

6.2.3 Gaussian process

A Gaussian process (GP) can be viewed as a distribution over functions, in the sense that a draw from a GP is a function. GPs have been widely used as a nonparametric regression method, where the goal is to find an approximation of a nonlinear map f : X → R from a state x to the function value f(x). The function values f(x) are treated as random variables, so that any finite number of them have a joint Gaussian distribution. When a process f follows a Gaussian process model, then

f(.) ∼ GP(m_0(.), k_0(., .))    (6.5)

where m_0(.) is the mean function and k_0(., .) is the real-valued positive definite covariance kernel function [169]. In GP inference, the posterior mean and covariance of a function value f(x) at an arbitrary state x can be obtained by conditioning the GP distribution of f on a set of past measurements. Let X_n = [x_1, ..., x_n] be a set of discrete state measurements, providing the set of inducing inputs. For each measurement x_i, there is an observed output y_i = f(x_i) + ε_i, where ε_i ∼ N(0, w²). The stacked outputs give y = [y_1, ..., y_n]^T. The posterior distribution at a query point x is also a Gaussian distribution and is given by [169]

m_n(x) = m_0(x) + K(x, X_n)^T (K_n + I w²)^{−1} (y − m_0(X_n))
k_n(x, x′) = k_0(x, x′) − K(x, X_n)^T (K_n + I w²)^{−1} K(x′, X_n)    (6.6)

where the vector K(x, X_n) = [k_0(x, x_1), ..., k_0(x, x_n)] contains the covariance between the new data, x, and the states in X_n, and [K_n]_{ij} = k_0(x_i, x_j), ∀i, j ∈ {1, ..., n}.
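As a concrete illustration of the posterior update (6.6), the following sketch performs GP regression at a query point with a squared-exponential kernel, a zero prior mean, and noise variance w². These modeling choices and all names are assumptions made for the example, not specifics from this chapter.

```python
import numpy as np

def k0(a, b, ell=1.0):
    # squared-exponential kernel (an assumed choice of covariance function)
    return np.exp(-0.5 * np.sum((a - b) ** 2) / ell**2)

def gp_posterior(x, Xn, y, w2=1e-2):
    # posterior mean/variance at query x per (6.6), with m0 = 0 assumed
    n = Xn.shape[0]
    Kn = np.array([[k0(Xn[i], Xn[j]) for j in range(n)] for i in range(n)])
    kx = np.array([k0(x, Xn[i]) for i in range(n)])      # K(x, Xn)
    G = np.linalg.solve(Kn + w2 * np.eye(n), np.eye(n))  # (Kn + I w^2)^{-1}
    mean = kx @ G @ y
    var = k0(x, x) - kx @ G @ kx
    return mean, var

Xn = np.linspace(-2, 2, 15).reshape(-1, 1)               # inducing inputs
y = np.sin(Xn).ravel() + 0.05 * np.random.randn(15)      # noisy observations
print(gp_posterior(np.array([0.5]), Xn, y))
```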
6.3 Problem Statement and Motivation

In this section, the problem of optimal control of systems subject to desired specifications is formulated. We then discuss that optimizing a single reward or performance function cannot work for all circumstances, and that it is essential to adapt the reward function to the context to provide a good enough performance and assure the safety and liveness of the system. Consider the non-linear continuous-time system given by

ẋ(t) = f(x(t)) + g(x(t))u(t)    (6.7)

where x ∈ X and u ∈ U denote the admissible sets of states and inputs, respectively. We assume that f(0) = 0, that f(x(t)) and g(x(t)) are locally Lipschitz functions on a set Ω ⊆ R^n that contains the origin, and that the system is stabilizable on Ω. The control objective is to design the control signal u for the system (6.7) to 1) make the system achieve desired behaviors (e.g., track the desired trajectory x_d(t) with a good transient response) while 2) guaranteeing safety specifications specified by STL, i.e., guaranteeing (x, t) ⊨ σ, where σ belongs to a set of predicates P_σ = [σ_1, σ_2, . . . , σ_N] with N as the number of the constraints. To achieve these goals, one can use an objective function whose minimization subject to (x, t) ⊨ σ provides an optimal and safe control solution aligned with the intention of the designer. That is, the following optimal control formulation can be used to achieve an optimal performance (encoded in the reward function r) while guaranteeing the STL specifications.

Problem 1 (Safety-Certified Optimal Control). Given the system (6.7), find a control policy u that solves the following safe optimal control problem:

min J(x(t), u(t), x_d(t)) = ∫_t^∞ e^{−γ(τ−t)} r(x(τ), u(τ), x_d(τ)) dτ
s.t. (x, τ) ⊨ σ, ∀τ ≥ t    (6.8)

where γ is a positive discount factor,

r(x(t), u(t), x_d(t)) = Σ_j q_j r_j(x(t), u(t), x_d(t))    (6.9)

is the overall reward function with r_j(x(t), u(t), x_d(t)) as the cost for the j-th sub-goal of the system and q_j as its weight, and x_d(t) is an external signal (e.g., a reference trajectory).

The optimization framework in Problem 1, if efficiently solved, works well for systems operating in structured environments in which the system is supposed to perform a single task, the priorities across sub-goals (i.e., q_j in (6.9)) do not change over time, and reference trajectories need not be adjusted. However, first, Problem 1 is hard to solve as it considers both optimality and safety in one framework. Second, even if efficiently solved, for complex systems such as self-driving cars, for which the system might encounter numerous circumstances, a fixed reward function cannot capture the complexity of the semantics of complex tasks across all circumstances. As the circumstance changes, previously rewarding maneuvers might no longer be safely achievable, and thus the feasibility of the solution to Problem 1 might be jeopardized.

Definition 2 (Feasible Control Policy). Consider the system (6.7) with specifications σ. A control policy u(t) = µ(x) is said to be a feasible solution to Problem 1 if

1. µ(x) stabilizes the system (6.7) on Ω.
2. There exists a safe set S ⊆ X such that for every x_0 ∈ S, x_t(x_0, µ) ⊨ σ ∀t, where x_t(x_0, µ) is the state trajectory at time t ≥ 0 generated by (6.7) with the initial condition x_0 and the policy u = µ(x).
3. J(x, u, x_d) < ∞ for all x ∈ Ω.

The feasibility of Problem 1 can be jeopardized as the context changes, unless the reward weights and/or the reference trajectory are adapted to the context. For example, consider the case where a vehicle is safely performing some maneuvers with a desired velocity under a normal road condition. If, however, the friction of the road changes and the vehicle does not adapt its aspiration towards its desired reference trajectory when solving Problem 1, it must either violate its safety specifications or its performance function will become unbounded, as the system's state cannot follow the desired speed without violating safety. Since the performance will be unbounded for any safe policy, the vehicle might only wander around and not reach any goal, providing a very poor performance.
This highlights the importance of proposing a metacognitive framework that adapts to the context.

Remark 1. One might argue that the original weights or desired reference trajectory in Problem 1 can be appropriately designed in a context-dependent fashion to ensure satisfaction of the desired specifications across a variety of circumstances. However, during the design stage, it is generally not possible to foresee the circumstances that will cause violation of the desired specifications and to come up with a context-dependent reward function. This is generally due to modeling errors, unknown changes in the environment, and operator intervention.

Solving Problem 1 for systems with uncertain dynamics is hard. While RL algorithms can solve optimal control problems for systems with uncertain dynamics, they typically do so without taking into account safety constraints. To deal with this challenge, in this chapter, we use two layers of control to solve Problem 1 and guarantee its feasibility. In the lower layer, an RL algorithm is used to find an optimal controller that minimizes the performance (6.8) without considering the safety constraints. The metacognitive layer then monitors the safety constraints and their level of satisfaction to proactively make meta-decisions about what reward function to optimize to guarantee the feasibility of Problem 1 as the context changes. To guarantee satisfaction of the desired specifications with maximum assuredness, the metacognitive layer must be added on top of the lower-layer optimal control design to decide about priorities over sub-goals as well as the adaptation of the desired reference trajectory. The metacognitive layer monitors system-level operation and provides corrective action by optimizing a fitness function that guarantees the system's liveness and safety, and thus ensures maximum assuredness across different circumstances.

6.4 Metacognitive Control Architecture

To find an optimal solution while always guaranteeing satisfaction of the desired specifications with maximum assuredness, as shown in Fig. 6.1, a metacognitive RL algorithm is presented, and it consists of:

• A low-level RL-based controller K for the system S that minimizes the performance (6.8) without considering the safety constraints.

• A high-level metacognitive controller C that adapts the reward function for the low-level controller K to guarantee the feasibility of Problem 1 and to maximize assuredness.

We aim at synthesizing the controller C for the system (6.7) such that the closed-loop system achieves the desired objective, defined in terms of minimization of a low-level cost function J(x(t), u(t), x_d(t)) in (6.8), while guaranteeing the system's liveness and safety, i.e., (x, t) ⊨ σ, in the metacognitive layer. The separation of the RL control design, which optimizes the performance, from the metacognitive design, which maximizes assuredness by optimizing a fitness function, significantly simplifies solving Problem 1 and allows us to present data-based techniques for solving it. Let θ_1 ∈ R^{d_1}, θ_2 ∈ R^{d_2}, and θ_3 ∈ R^{d_3} be vectors of parameters in the matrices Q(θ_1), R(θ_2), and x_d(θ_3). Let θ := [θ_1^T, θ_2^T, θ_3^T]^T and let λ̄ be defined as the set of all admissible hyperparameters θ. Note that we assume that the set of all admissible parameters θ ∈ λ̄ is predefined by the designer based on some prior knowledge. With a slight abuse of notation, we write Qθ, Rθ, and rθ instead of Q(θ_1), R(θ_2), and x_d(θ_3) in what follows.
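Before moving on, the following small sketch illustrates how a hyperparameter vector θ = [θ_1^T, θ_2^T, θ_3^T]^T can be unpacked into Qθ, Rθ, and rθ. The dimensions (n = 4 states, m = 1 input, loosely matching the steering example of Section 6.6) and all names are assumptions made for illustration.

```python
import numpy as np

n, m = 4, 1  # assumed state and input dimensions

def unpack(theta):
    # theta = [theta1; theta2; theta3] -> (Q(theta1), R(theta2), x_d(theta3))
    theta1, theta2, theta3 = theta[:n], theta[n:n + m], theta[n + m:]
    Q = np.diag(theta1)   # diagonal design weight on the tracking error
    R = np.diag(theta2)   # diagonal design weight on the control effort
    r = theta3            # desired set-point r_theta
    return Q, R, r

Q, R, r = unpack(np.array([10.0, 10.0, 10.0, 10.0, 2.0, 1.0, 0.0, 0.0, 0.0]))
```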
6.4.1 Metacognitive Layer Monitoring and Control

The focus of this subsection is on designing a metacognitive layer that guarantees the feasibility of Problem 1, regardless of the type of the RL-based controller (e.g., policy gradient, policy iteration, etc.) used in the lower layer. While the proposed approach is not limited to any specific type of performance, in the sequel, we consider an optimal set-point tracking problem with STL safety specifications. That is, we consider the following parameterized reward function (6.9) in terms of the hyperparameter vector θ:

r(x(τ), u(τ), r(τ), θ) = (x(τ) − rθ)^T Qθ (x(τ) − rθ) + u^T(τ) Rθ u(τ)    (6.10)

where Qθ and Rθ are parametrized design weight matrices, which are assumed diagonal, and rθ is the desired set-point. The hyperparameter vector can then be defined as the stack of all parameters of the design weight matrices and the desired set-point. The performance function in Problem 1 then becomes

J(x, u, r) = ∫_t^∞ e^{−γ(τ−t)} [(x(τ) − rθ)^T Qθ (x(τ) − rθ) + u^T(τ) Rθ u(τ)] dτ    (6.11)

Figure 6.1: Proposed metacognitive control scheme. S: the system to be controlled; K: the low-level RL controller; C: the high-level metacognitive layer scheme.

Assumption 1. Let u*_θ(x) be the optimal solution to Problem 1 for a set of hyperparameters θ in the performance function. At each circumstance, there exists a θ such that its corresponding optimal control policy u*_θ(x) is a feasible solution to Problem 1.

Remark 2. Note that the lower layer receives the hyperparameter vector θ from the metacognitive layer, and any RL algorithm for an unknown system can be used to minimize the accumulated reward function for the specified θ and determine the optimal control policy in the lower layer. While the presented framework considers a model-free RL-based control approach in the lower layer, it can be applied to any performance-driven control architecture such as [170]-[171].

The metacognitive layer monitors the functionality of the system-level operations and performs corrective actions for the lower-level controller when required. That is, the hyperparameters (i.e., the design weights and set-point) are adjusted in the metacognitive layer to guarantee the feasibility of Problem 1 with maximum assuredness as the context changes. To monitor the functionality of the lower layer, i.e., to check if it performs as intended, the accumulated robustness degree of the temporal logic specifications is used to define a fitness function as a metric for measuring the system's safety (constraint satisfaction) and liveness (goal reaching and reference tracking). If the fitness function drops, which is an indication that the feasibility of the solution to Problem 1 is about to be violated, the metacognitive layer adapts the hyperparameters of the reward function to guarantee feasibility. Since the feasibility of Problem 1 is only guaranteed if the safety STL specifications are satisfied and the performance remains bounded, we use the degree of robustness of the safety STL specifications as well as the degree of robustness of the goal- or reference-reaching STL specifications (liveness specifications that assure performance boundedness), and define a fitness function as a metric to proactively monitor the feasibility of Problem 1 and react before the STL specifications are violated.

6.4.1.1 Metacognitive Monitoring

While N safety specifications are considered as constraints in Problem 1, we define the (N+1)-th specification as the liveness (set-point tracking) of the system via the following STL formula.
σ_{N+1} = ◇_{[0,t_s]} ¬(‖x(t) − rθ‖ > ε)    (6.12)

with ε as the envelope on the tracking error and t_s as the expected settling time.

Lemma 1. Let u*_θ(x) be the optimal control policy found by solving Problem 1 with the performance (6.11). If u*_θ(x) is feasible, then x_t(x_0, u*_θ) ⊨ σ_{N+1} ∀t.

Proof. For a given θ, under the assumption that there exists a stabilizing control policy, it is shown in [170] that the performance is bounded for the optimal controller u*_θ(x). On the other hand, based on Barbalat's lemma [172], a uniformly continuous real function whose integral up to infinity exists and is bounded vanishes at infinity. Therefore, the performance is bounded for u*_θ(x) if it makes ‖x(t) − rθ‖ become very small after the settling time t_s. This completes the proof.

Remark 3. Incorporating goal-reaching specifications can help adapt the reward function to avoid performing unintended functionalities in many applications. A classic example for which this could have helped resolve the problem is OpenAI's demo (https://openai.com/blog/faulty-reward-functions/), in which an RL agent in a boat racing game kept going in circles while repeatedly hitting the same reward targets to gain a high score without having to finish the course.

The STL specification (x, t) ⊨ σ_{N+1} essentially states that the trajectory tracking error should eventually become less than ε after the expected settling time. Otherwise, the set-point r is aimed too high to be achieved safely and must be adjusted. The settling time can be obtained from the knowledge that we have of the control expectation from the lower layer, and can be conservative. We now extend the set of predicates from P = [σ_1, ..., σ_N], with N as the number of the constraints, to P = [σ_1, ..., σ_N, σ_{N+1}] to include the liveness STL predicate. The monitor then predicts whether (x, t) ⊨ σ will be satisfied all the time, so as to make proactive meta-decisions accordingly. Let the stack of predicate functions for the safety and liveness specifications σ_i ∈ P_σ be

z^σ(x) = [z^{σ_1}(x(t)), z^{σ_2}(x(t)), . . . , z^{σ_{N+1}}(x(t))]^T    (6.13)

Based on (6.12), the predicate function for liveness becomes

z^{σ_{N+1}}(x(t)) = ε − ‖x(t) − rθ‖    (6.14)

Using z^σ(x), a fitness function is now designed to monitor and estimate the accumulated level of satisfaction of the desired STL specifications (i.e., the safety-value function) in the metacognitive layer. If the fitness function drops, which is an indication that either the safety constraints are about to be violated in the future or the liveness of the system will not be guaranteed, i.e., the feasibility of the solution to Problem 1 is about to be violated, then the metacognitive layer proactively adapts the hyperparameters of the reward function to guarantee feasibility.

Definition 3. Consider a specific hyperparameter vector θ and let u = µ(x) be a feasible solution to Problem 1 with the performance (6.11). The set S ⊆ X is called a viable set (safe and live) of µ(x) if for every x_0 ∈ S, x_t(x_0, µ) ⊨ σ ∀t, and is defined as

S_{µ,θ}(x) = {x_0 : x_t(x_0, µ) ⊨ σ, ∀t ≥ 0}    (6.15)

where σ belongs to the set of predicates P = [σ_1, ..., σ_N, σ_{N+1}] with the predicate functions given in (6.13). Note that the dependence of S_{µ,θ}(x) on θ is because the specification (6.14) depends on θ.
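As an illustration of the stacked predicate functions (6.13)-(6.14), the sketch below evaluates a single safety predicate together with the liveness predicate ε − ‖x − rθ‖. The lane-offset predicate, the numbers, and all names are assumptions borrowed loosely from the example of Section 6.6, not the dissertation's code.

```python
import numpy as np

eps, r_theta = 0.2, np.array([3.0, 0.0, 0.0, 0.0])  # assumed envelope/set-point

def z_sigma(x):
    z_safety = 1.0 - abs(x[0] - r_theta[0])       # z_sigma_1: |x1 - r| < 1
    z_live = eps - np.linalg.norm(x - r_theta)    # z_sigma_{N+1}, eq. (6.14)
    return np.array([z_safety, z_live])

print(z_sigma(np.array([2.9, 0.0, 0.0, 0.0])))    # both positive => satisfied
```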
Lemma 2. Consider a specific hyperparameter vector θ. If the set S_{µ,θ}(x) is empty for all control policies µ ∈ U, then there exists no feasible solution to Problem 1 for the hyperparameter vector θ.

Proof. The set of predicates P includes all safety constraints with predicates σ_1, ..., σ_N as well as the liveness condition with predicate σ_{N+1} defined in (6.12). If the set S_{µ,θ}(x) is empty, then there is no control policy µ(x) that simultaneously satisfies all safety constraints and makes the performance bounded (based on Lemma 1). Therefore, based on Definition 2, there is no feasible solution to Problem 1. This completes the proof.

To monitor the feasibility of Problem 1 for the current hyperparameter vector θ, we now define a fitness function based on the quantitative semantics of the STL specifications under an optimal control policy found by minimizing (6.11) for the given θ. Let u*_θ(x) be the optimal control policy found by minimizing the performance function (6.11) for a given θ and applied to the system. It will be shown in Section 6.5 how to use off-policy learning to find optimal solutions for many hyperparameters while only a behavior policy is applied to the system to collect data. Based on (6.4), once u*_θ(x) is applied to the system, to monitor the degree of robustness of the specifications under u*_θ(x), one can write the conjunction over the predicate functions P = [σ_1, ..., σ_N, σ_{N+1}] as

ξ_θ(x, t, σ) = ∧_{i=1}^{N+1} ρ^{σ_i}(x, t) = min_{i∈[1,...,N+1]} (ρ^{σ_1}(x, t), ..., ρ^{σ_{N+1}}(x, t))    (6.16)

where ρ^{σ_i}(x, t) = z^{σ_i}(x(t)) and x(t) = x_t(x_0, u*_θ(x)) is the state trajectory at time t ≥ 0 generated by (6.7) with the initial condition x_0 and the policy u*_θ(x). In order to avoid non-smooth analysis, a smooth under-approximation of ξ_θ(x, t, σ) is provided in the following lemma.

Lemma 3. Consider a conjunction of N + 1 predicate functions given in (6.13) and their overall robustness ξ_θ(x, t, σ) defined in (6.16). Then,

ξ^a_θ(x, t, σ) ≜ −ln( Σ_{i=1}^{N+1} e^{−ρ^{σ_i}(x,t)} ) ≤ ξ_θ(x, t, σ)    (6.17)

Proof. See [168].

Lemma 4. The sign of the function ξ^a_θ(x, t, σ) is the same as the sign of the function ξ_θ(x, t, σ).

Proof. It is immediate from (6.17) that ξ_θ(x, t, σ) < 0 results in ξ^a_θ(x, t, σ) < 0. On the other hand, if ξ_θ(x, t, σ) > 0, then based on (6.16), ρ^{σ_i}(x, t) > 0 for all i ∈ [1, ..., N+1], and thus −ln(e^{−ρ^{σ_i}(x,t)}) > 0 for i = 1, ..., N+1, which completes the proof.

Now, a fitness function for a hyperparameter vector θ, as a metric for measuring the system's safety and liveness in terms of the overall robustness, is defined as

f_θ(x(t)) = ∫_t^∞ e^{−a(τ−t)} [(1 − l) log(ξ^a_θ(x, τ, σ)) + (1 + l) log(1 + 1/ξ^a_θ(x, τ, σ))] dτ    (6.18)

where l = sgn(ξ^a_θ(x, τ, σ)). The first term is a barrier function that makes the fitness infinite if the degree of robustness becomes negative (i.e., the STL specifications are violated). On the other hand, the lower the fitness, the better the robustness of the safety and liveness specifications. This is because the inverse of the degree of robustness is used in the fitness function. Note that for a nonempty set S_{µ,θ}(x) in (6.15), the fitness function in (6.18) becomes

f_θ(x(t)) = ∫_t^∞ e^{−a(τ−t)} [2 log(1 + 1/ξ^a_θ(x, τ, σ))] dτ    (6.19)
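The smooth under-approximation (6.17) and the barrier structure of (6.18)-(6.19) can be sketched as follows; the explicit guard on the sign of ξ^a mirrors the role of l, and all names are illustrative assumptions.

```python
import numpy as np

def xi_smooth(rho):
    # xi_a = -ln(sum_i exp(-rho_i)) <= min_i rho_i, eq. (6.17)
    return -np.log(np.sum(np.exp(-np.asarray(rho))))

def meta_reward(rho):
    # Integrand of the fitness (6.18): a barrier when robustness is violated,
    # and 2*log(1 + 1/xi_a) on the safe branch, matching (6.19).
    xi = xi_smooth(rho)
    if xi <= 0:
        return np.inf
    return 2.0 * np.log(1.0 + 1.0 / xi)

print(xi_smooth([2.0, 3.0]), meta_reward([2.0, 3.0]))
```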
Theorem 1. There exists a control policy µ(x) for which S_{µ,θ}(x) is nonempty if and only if the fitness function in (6.18) is bounded over some set.

Proof. If the set S_{µ,θ}(x) is empty for all µ(x) ∈ U, then, for any initial condition x_0 and any µ(x) ∈ U, one has x_t(x_0, µ) ⊭ σ, and consequently ξ_θ(x, t, σ) < 0 ⇒ ξ^a_θ(x, t, σ) < 0 for some time t. This makes the fitness function (6.18) unbounded because of its first term. On the other hand, if the set S_{µ,θ}(x) is nonempty, then for some control policy µ(x) ∈ U and any initial condition x_0 ∈ S_{µ,θ}(x), x_t(x_0, µ) ⊨ σ ∀t ≥ 0. Thus ξ_θ(x, t, σ) > 0 ∀t ≥ 0, and based on Lemma 4, ξ^a_θ(x, t, σ) > 0 ∀t ≥ 0. Let ε_0 = min_{t≥0} {ξ^a_θ(x, t, σ)}. Then,

f_θ(x_0) ≤ 2 ln(1 + 1/ε_0) ∫_t^∞ e^{−a(τ−t)} dτ ≤ (2/a) ln(1 + 1/ε_0) < ∞, ∀x_0 ∈ S_{µ,θ}(x)    (6.20)

This completes the proof.

We now present an online data-based approach to learn the fitness function as a function of the state. Define the meta reward as

r_m(x, t) = (1 − l) log(ξ^a_θ(x, t, σ)) + (1 + l) log(1 + 1/ξ^a_θ(x, t, σ))    (6.21)

The fitness function corresponding to u*_θ(x) at one specific state can now be interpreted as the accumulated meta rewards the system receives starting from that state when u*_θ(x) is applied to the system, and thus it can be interpreted as a safety-related value for that state. To calculate the fitness function for all states in a set of interest, one could run the system from all those states to collect trajectories and then calculate the fitness function. This, however, is not practical and not data efficient. To obviate this issue, the fitness function in (6.18) is written as

f_θ(x(t)) = ∫_t^{t+T} e^{−a(τ−t)} r_m(x, τ) dτ + f_θ(x(t + T))    (6.22)

where f_θ(x(t + T)) is the fitness value sampled at time t + T. This equation resembles the Bellman equation in RL, and its importance is that it allows us to express the fitness values of states in terms of the fitness values of other sampled states. This opens the door for value-function-approximation-like approaches for calculating the fitness value of each state. That is, since the fitness values of consecutive samples are related through (6.22), a parametrized form of the fitness function, or a nonparametric form of it, can be used to learn the fitness function for all states of interest using only a single trajectory of the system. This will allow fast proactive decision making in the upper layer and will prevent the system from reaching an irreversible crisis, for which no action can keep the system in its safety envelope in the future. To this end, a data-based approach is used to assess the fitness online in real time without using a model of the system. Once the fitness is learned, a monitor will detect changes in the fitness and consequently in the situation.

We first consider learning the fitness function. Since the form of the fitness function is not known, a Gaussian process (GP) is employed to estimate the function f_θ(x(t)). In analogy to GP regression for RL [173], a GP prior is first imposed over the fitness function, i.e., f_θ(x) ∼ GP(m_0(x), k_0(x, x′)) with mean m_0(x) and covariance k_0(x, x′). The covariance form can be chosen to reflect prior knowledge concerning the similarity of the states' fitness in the domain of interest. To employ GP regression, based on (6.22), the temporal difference (TD) error for the fitness function is written as

f_θ(x(t)) − f_θ(x(t + T)) = ∫_t^{t+T} e^{−a(τ−t)} r_m(x, τ) dτ    (6.23)

To learn the fitness function using a GP, the sequence of samples of the fitness function corresponding to the trajectory x_1, ..., x_L is used to present the following generative model:

R(x_{t+T}) = f_θ(x(t)) − f_θ(x(t + T)) + δ(t)    (6.24)

where

R(x_i) = ∫_{t_{i−1}}^{t_i} e^{−a(τ−t_{i−1})} r_m(x, τ) dτ

and δ(t) ∼ N(0, w²) denotes zero-mean Gaussian noise indicating the uncertainty on the fitness function.
Note that (6.24) can be considered as a latent variable model in which the fitness function plays the role of the latent or hidden variable, while the meta reward plays the role of the observable output variable. As a Bayesian method, GP regression computes a predictive posterior over the latent values by conditioning on the observed meta rewards. Let X^θ_L = [x_1, ..., x_L], with x_t = x_t(x_0, u*_θ(x)), be the trajectory collected after u*_θ(x) (the optimal control input found by the lower-layer RL for the hyperparameter θ) is applied to the system. An algorithm for the derivation of u*_θ(t) for a new hyperparameter θ based on the recorded history data will be given and discussed in detail in Section 6.5. For this finite-state trajectory of length L, define the vectors

R^θ_L = [R(x_1), . . . , R(x_L)]^T
f^θ_L = [f_θ(x_1), . . . , f_θ(x_L)]^T
δ̄_L = [δ(1), . . . , δ(L)]^T    (6.25)

and the covariance vector and matrices

K(x, X^θ_L) = [k_0(x_1, x), . . . , k_0(x_L, x)]^T
K^θ_L = [K(x_1, X^θ_L), . . . , K(x_L, X^θ_L)]
Σ_L = diag(w², . . . , w²)    (6.26)

Based on (6.25)-(6.26), one has

[f^θ_L; δ̄_L] ∼ N( [m̄_0; 0], [K^θ_L, 0; 0, Σ_L] )    (6.27)

where m̄_0 = [m_0(x_1), . . . , m_0(x_L)]^T. Using (6.24), for a finite-state trajectory of length L, one has

R^θ_{L−1} = H_L f^θ_L + δ̄_{L−1}    (6.28)

where

H_L = [1 −1 0 . . . 0; 0 1 −1 . . . 0; . . . ; 0 0 . . . 1 −1] ∈ R^{(L−1)×L}    (6.29)

Based on standard results on jointly Gaussian random variables, one has

[R^θ_{L−1}; f_θ(x)] ∼ N( [H_L m̄_0; m_0(x)], [H_L K^θ_L H_L^T + Σ_{L−1}, H_L K(x, X^θ_L); K(x, X^θ_L)^T H_L^T, k_0(x, x′)] )    (6.30)

Using (6.6), the posterior distribution of the fitness function f_θ(x(t)) at state x(t), conditioned on the observed integral meta-reward values R^θ_{t−1}, is given by

(f_θ(x(t)) | R^θ_{t−1}) ∼ N(ν^θ_t(x), p^θ_t(x))    (6.31)

where

ν^θ_t(x) = m_0(x) + K(x, X^θ_t)^T α^θ_t
p^θ_t(x) = k_0(x, x) − K(x, X^θ_t)^T C^θ_t K(x, X^θ_t)    (6.32)

with

α^θ_t = H_t^T (H_t K^θ_t H_t^T + Σ_{t−1})^{−1} R^θ_{t−1}
C^θ_t = H_t^T (H_t K^θ_t H_t^T + Σ_{t−1})^{−1} H_t    (6.33)

Based on (6.22), the following difference error is used as a surprise signal to detect changes in the fitness:

SP(t) = ν_t(x) − ν_{t+T}(x) − R(x_t)    (6.34)

where ν_t(x) is the mean of the GP and SP(t) is the surprise signal at time t. Note that after the GP learns the fitness, the surprise signal will be small, as the GP is learned to assure that (6.22) is satisfied. However, once the robustness degree changes due to a change in the situation, the surprise signal will increase, and if the average of the surprise signal over a horizon is bigger than a threshold, a new fitness function will be learned and metacognitive decisions will be made (as explained in the next section) to improve the fitness if it is below some desired state-dependent threshold. That is, the monitor performs an evaluation of the surprise signal in a moving-horizon fashion and indicates a change if

∫_t^{t+∆} SP(τ) dτ ≥ β    (6.35)

for some threshold β. The metacognitive layer does not adapt the hyperparameters all the time; it adapts them only when two conditions are satisfied: 1) an event indicating a change is triggered, i.e., (6.35) is satisfied, and 2) the newly learned fitness is below a threshold, i.e., it does not indicate future safety and liveness.
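A minimal sketch of the change monitor (6.34)-(6.35) is given below: the surprise signal compares the learned GP fitness means of consecutive samples against the integrated meta reward, and a moving-horizon integral triggers on a change. The stand-in GP mean ν and all names are assumptions for illustration.

```python
import numpy as np

def surprise(nu, x_t, x_tT, R_t):
    # SP(t) = nu_t(x(t)) - nu_{t+T}(x(t+T)) - R(x_t), eq. (6.34)
    return nu(x_t) - nu(x_tT) - R_t

def change_detected(sp_samples, dt, beta):
    # Trigger when the horizon integral of SP exceeds beta, eq. (6.35)
    return float(np.sum(sp_samples) * dt) >= beta

nu = lambda x: 0.1 * x**2                        # stand-in learned GP mean
sp = [surprise(nu, 1.0, 1.1, 0.0) for _ in range(10)]
print(change_detected(sp, dt=0.1, beta=0.01))
```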
To monitor the second requirement, i.e., to detect a threat that requires adapting the hyperparameters, inspired by [173], a KL divergence metric is defined to measure the similarity between the GP learned for the current fitness and a base GP. The base GP can be obtained based on the knowledge of the constraints and STL specifications so as to assure the minimum safety of the system. Note that constructing the base GP only requires knowledge of the STL specifications and is independent of the system dynamics, the situation, and the control objectives. A library of safe GPs can also be constructed as base GPs, and previously learned GPs for other circumstances can be added to it. If the fitness remains close to any of the GPs in this library, this indicates that the system's safety is still not in danger. If not, it is highly likely that the system's safety is in danger of being violated in the near future.

Since the covariance function K corresponds to a (possibly infinite-dimensional) feature space, the Gaussian process can be viewed as a Gaussian distribution in the feature space. To show this, let φ(x) be the feature-space representation of the covariance function, so that K(x, x) = φ(x)^T φ(x). Then, we use the notation

f_θ(x) = GP_K(α^θ_t, C^θ_t)    (6.36)

to denote the GP with the corresponding covariance function K and parameters α^θ_t and C^θ_t. We define the GP for the base fitness as

f_b(x) = GP_K(α^b_t, C^b_t)    (6.37)

Lemma 5 [174]. Let the base GP and the GP for the hyperparameters θ share the same inducing inputs, i.e., X^b_L = X^θ_L = [x_1, ..., x_L], and the same covariance function K. Let K(X^θ_L, X^b_L) = Q^{−1}. Then, the KL divergence between the two dynamic GPs f_θ(x) and f_b(x) is given by

D_{KL}(f_θ(x) ‖ f_b(x)) = D_{KL}(GP_K(α^θ_t, C^θ_t) ‖ GP_K(α^b_t, C^b_t)) = (α^θ_t − α^b_t)^T V (α^θ_t − α^b_t) + W    (6.38)

where V = (Q + C^θ_t)^{−1} and W = Tr[(Q + C^b_t)V − I] − log det[(Q + C^b_t)V].

Remark 4. Since the base fitness function is learned offline based on the minimum acceptable degree of robustness, one can select many inducing points for the base fitness function and retain only a subset of them in the expressions of the posterior mean and kernel functions, so as to increase the similarity of the inducing inputs for both GPs. To use the KL divergence metric (6.38), one can use the fact that K(x, x) = φ(x)^T φ(x), and so K(x, x_1) = K(x, x_2)K(x_1, x_2), to shift the inducing points of the base fitness function to those of the learned GP.

Let f_b(x) = [f_{b_1}(x), ..., f_{b_M}(x)] be the stack of base GPs. After a change, if the condition min_i D_{KL}(f_θ(x) ‖ f_{b_i}(x)) ≤ ϵ holds, with f_θ(x) as the fitness before the change, then this indicates that the system is still safe despite the change. Therefore, the monitor triggers an event that requires adaptation of the hyperparameters if the following STL condition is violated:

ϕ = □( ( ∫_t^{t+∆} SP(τ) dτ > β ) ∧ ( min_i D_{KL}(f_θ(x) ‖ f_{b_i}(x)) > ϵ ) )    (6.39)

where the first conjunct is denoted by ϕ_1 and the second by ϕ_2.
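The divergence (6.38) is straightforward to evaluate once both GPs share inducing inputs. The following sketch does so, with log det used for the log term; all inputs are fabricated for illustration (a sketch under the stated assumptions, not library code).

```python
import numpy as np

def kl_between_gps(alpha, C, alpha_b, C_b, Q):
    # D_KL(GP_K(alpha, C) || GP_K(alpha_b, C_b)) per (6.38)
    V = np.linalg.inv(Q + C)                   # V = (Q + C_theta)^{-1}
    d = alpha - alpha_b
    M = (Q + C_b) @ V
    W = np.trace(M - np.eye(M.shape[0])) - np.log(np.linalg.det(M))
    return float(d @ V @ d + W)

L = 3
Q = np.eye(L); C = 0.1 * np.eye(L); C_b = 0.2 * np.eye(L)
print(kl_between_gps(np.ones(L), C, np.zeros(L), C_b, Q))
```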
6.4.1.2 Metacognitive Control

In this section, after the STL specification (6.39) is violated, a metacognitive controller is presented to find a new set of hyperparameters that guarantees that the fitness is improved, i.e., min_i D_{KL}(f_θ(x) ‖ f_{b_i}(x)) ≤ ϵ, with the minimum sacrifice on the performance. That is, it is desired to assure safety while achieving a performance as close as possible to that of θ*, the optimal hyperparameter vector found to optimize the performance prior to the change. The metacognitive layer then performs the following optimization:

min ‖θ − θ*‖  s.t.  min_i D_{KL}(f_θ(x) ‖ f_{b_i}(x)) ≤ ϵ    (6.40)

To solve this optimization, one can define the survival score function in the metacognitive layer as

H(θ) = 1/‖θ − θ*‖ + log(1 + ϵ − min_i D_{KL}(f_θ(x) ‖ f_{b_i}(x)))    (6.41)

In this chapter, safe Bayesian optimization (SBO) [175] is used to find the optimal hyperparameters that optimize the survival score function in (6.41). The SBO algorithm, provided in Algorithm 4, guarantees safety by only evaluating hyperparameters that achieve a safe score threshold with high probability. This threshold is chosen as a value below which we do not want the system fitness to fall, which would be an indication of a great risk of violation of the specifications. SBO is a sample-efficient optimization algorithm that requires only a few evaluations of the survival score function to find the optimal hyperparameters. While the safe set of hyperparameters is not known in the beginning, it is estimated after each function evaluation of SBO. In fact, at each iteration, SBO tries not only to find the global maximum within the currently known safe set (exploitation), but also to increase the set of hyperparameters that are known to be safe (exploration), as described in Algorithm 4. More specifically, SBO builds a surrogate model P that maps the hyperparameters to the survival score H(θ) in the metacognitive layer, expressed as

P : D′ → R    (6.42)

where D′ denotes the bounded domain of hyperparameters. Note that the latent function P in (6.42) is unknown, as the dynamics are not known. P can be sampled by running the system and evaluating the survival score function H(θ) from the recorded data. These samples of P are generally uncertain because of noisy data; moreover, the score function is typically nonconvex, and no gradients are easily available. A GP is therefore used to perform non-parametric regression for the latent function P [169], and SBO is then used to optimize the survival score and determine the optimum values of the hyperparameters. For the non-parametric regression, we define a prior mean function µ_0(θ), which encodes prior knowledge about the survival score function P(θ), and a covariance function k(θ, θ′), which defines the covariance of any two function values P(θ) and P(θ′) and is used to model the uncertainty about the mean estimates. One can predict the survival score of the system corresponding to the hyperparameter θ by calculating its mean µ_k(θ) and covariance σ_k²(θ) over a set of k observations {θ_{1:k}, P_{1:k}}, i.e., P(θ) ∼ N(µ_k(θ), σ_k²(θ)). Based on the predicted score function, the lower and upper bounds of the confidence interval at iteration k are given as

m̄_k(θ) = µ_{k−1}(θ) − β_k σ_{k−1}(θ)
M_k(θ) = µ_{k−1}(θ) + β_k σ_{k−1}(θ)    (6.43)

where β_k > 0 denotes a scalar factor that defines the desired confidence interval. Based on (6.43), the safe set of all hyperparameters θ that lead to survival score values above the threshold P_min is given by

S_k ← {θ ∈ D′ | m̄_k(θ) ≥ P_min}    (6.44)

Then, a set of potential maximizers is defined as [175]

T_k ← {θ ∈ S_k | M_k(θ) ≥ max_{θ′} m̄_k(θ′)}    (6.45)

which contains all the safe hyperparameters for which the upper confidence bound M_k(θ) is above the best safe lower bound.
In order to define a set of potential expanders, which quantifies whether a new set of hyperparameters can be classified as safe after a new observation, an optimistic characteristic function for expanders is given as

g_k(θ) = |{θ′ ∈ D′ \ S_k | m̄_{k,(θ,M_k(θ))}(θ′) ≥ P_min}|    (6.46)

where m̄_{k,(θ,M_k(θ))} is the lower bound of the GP based on the prior data and a data point (θ, M_k(θ)) with a noiseless measurement of the upper confidence bound. Based on (6.46), one can determine how many previously unsafe points can be classified as safe according to (6.44), assuming that M_k(θ) is measured while evaluating P(θ). The characteristic function is positive if the new data point has a non-negligible chance of expanding the safe set. Therefore, the set of possible expanders is given as

G_k ← {θ ∈ S_k | g_k(θ) > 0}    (6.47)

Then, a new set of hyperparameters is selected for evaluation on the real system by selecting the hyperparameters about which we are most uncertain from the union of the sets G_k and T_k, i.e., at iteration k the score function is evaluated at

θ_k ← argmax_{θ∈{G_k∪T_k}} (M_k(θ) − m̄_k(θ))    (6.48)

The evaluation approach in (6.48) works well for expanding the safe set [175], with a trade-off between exploration and exploitation. For exploration, the most uncertain parameter locations are usually on the boundary of the safe set, which results in efficient exploration. An estimate of the best currently known set of hyperparameters is obtained from argmax_{θ∈S_k} m̄_k(θ), which corresponds to the point that achieves the best lower bound on the survival score.

Algorithm 4 SBO for Metacognitive Control.
1: procedure
2: Initialize GP with (θ_0, P(θ_0))
3: for k = 1, . . . do
4:   S_k ← {θ ∈ D′ | m̄_k(θ) ≥ P_min}, where m̄_k(θ) = µ_{k−1}(θ) − β_k σ_{k−1}(θ)
5:   T_k ← {θ ∈ S_k | M_k(θ) ≥ max_{θ′} m̄_k(θ′)}, where M_k(θ) = µ_{k−1}(θ) + β_k σ_{k−1}(θ)
6:   G_k ← {θ ∈ S_k | g_k(θ) > 0}, where g_k(θ) = |{θ′ ∈ D′ \ S_k | m̄_{k,(θ,M_k(θ))}(θ′) ≥ P_min}|
7:   θ_k ← argmax_{θ∈{G_k∪T_k}} (M_k(θ) − m̄_k(θ))
8:   Obtain measurement P(θ_k)
9:   Update GP with (θ_k, P(θ_k))
10: end for
11: end procedure

Remark 5. It is shown in [176]-[177] that, given a persistently exciting input, a single rich measured trajectory can be used to characterize the entire set of system trajectories. That is, having a single rich trajectory of an unknown system, the trajectory for a given sequence of inputs and an initial condition can be constructed without even applying it to the system. This can be leveraged to learn the fitness function for a given control policy selected by the Bayesian optimization algorithm without actually applying that policy to the system. More specifically, after a change, to evaluate the fitness of hyperparameters, a rich trajectory of the system is first collected and then used to reconstruct the trajectory of the system, with enough length, for any hyperparameter under evaluation. This trajectory data can then be used to learn the GP for the hyperparameters without even applying the corresponding controller. This allows us to evaluate even unsafe policies without applying them to the system. Hence, for each set of hyperparameters, one can compute the fitness function from measured data without knowledge of the closed-loop system's dynamics and, consequently, find the optimal hyperparameters that optimize the survival score function in (6.41). This resembles off-policy learning in RL.
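The following sketch distills one iteration of Algorithm 4 over a finite grid of candidate hyperparameters, given the GP posterior mean and standard deviation of the survival score. For brevity it omits the expander set G_k of (6.46)-(6.47), which would require re-conditioning the GP on a hypothetical observation; all names are assumptions.

```python
import numpy as np

def sbo_step(mu, sigma, P_min, beta=2.0):
    m_lo = mu - beta * sigma                  # lower bound m_k, eq. (6.43)
    M_hi = mu + beta * sigma                  # upper bound M_k, eq. (6.43)
    safe = m_lo >= P_min                      # safe set S_k, eq. (6.44)
    if not safe.any():
        raise RuntimeError("empty safe set: no hyperparameter certified safe")
    best_lo = m_lo[safe].max()
    T_k = safe & (M_hi >= best_lo)            # potential maximizers, eq. (6.45)
    width = np.where(T_k, M_hi - m_lo, -np.inf)
    k_next = int(np.argmax(width))            # most uncertain candidate, (6.48)
    k_best = int(np.argmax(np.where(safe, m_lo, -np.inf)))  # best known point
    return k_next, k_best

mu = np.array([0.8, 1.0, 0.6, 0.2]); sigma = np.array([0.1, 0.2, 0.3, 0.4])
print(sbo_step(mu, sigma, P_min=0.3))
```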
Remark 6. Note that SBO is a derivative-free optimization algorithm, which is useful since a closed-form expression of P as a function of the hyperparameters θ is not available. It also allows us to tune the hyperparameters with as few evaluations of P as possible, which is crucial since each evaluation can be costly and time-consuming, as it requires a closed-loop experiment.

6.5 Low-Level RL-Based Control Architecture

In this section, a computational data-driven algorithm is first developed to find the new optimal control policy u*_θ(t) for a new hyperparameter vector θ based on recorded history data. It is then shown that the proposed algorithm converges to the optimal control policy u*_θ(t) for all admissible hyperparameters θ. Following the same reasoning as in [171], we are ready to give the following computational data-driven algorithm (Algorithm 5) for finding the new optimal control policy u*_θ(t) for a new hyperparameter θ based on the recorded history data.

Remark 7. Note that Algorithm 5 does not rely on the dynamics of the system. Note also that, inspired by the off-policy algorithm in [171], Algorithm 5 has two separate phases. In the first phase, i.e., Step 2, a fixed initial exploratory control policy u is applied, and the system information is recorded over the time interval [t_0, t_l]. In the second phase, i.e., Steps 3-7, without requiring any knowledge of the system dynamics, the information collected in the first phase is repeatedly used to find a sequence of updated policies converging to u*_θ. The following theorem shows that Algorithm 5 converges to the optimal control policy u*_θ(t) for all admissible hyperparameters θ.

Algorithm 5 Low-level Off-policy RL-based Control.
1: procedure
2: Perform an experiment on the time interval [t_0, t_l] by applying a fixed stabilizing control policy u(t) + e to the system, where e is the exploration noise, and record {X_θ(t)} and the corresponding {u(t)} at N ≥ l_1 + m × l_2 different sampling instants in the time interval [t_0, t_l], where X_θ(t) = [e_d(t)^T, r_θ^T]^T with e_d(t) := x(t) − r_θ.
3: For a new parameter vector θ ∈ λ̄, construct φ_1(X_θ) ∈ R^{l_1} and Φ(X_θ) ∈ R^{l_2} as suitable basis function vectors.
4: Compute Ξ_k(θ) and Θ_k(θ) for the new set of parameters θ based on the recorded history data as follows:

Ξ_k(θ) = −I_{xx}(θ) [vec(Q̄_θ); vec(R_θ)]    (6.49)

Θ_k(θ) = [δ_{xx}, −2 Ī_{xx}(θ)(I_n ⊗ Ŵ̄_k^T R_θ) − 2 I_{xu}(θ)(I_n ⊗ R_θ)]    (6.50)

where Q̄_θ := diag(Q_θ, 0) and

δ_{xx} = [e^{−γT}(φ_1(X_θ(t_1)) − φ_1(X_θ(t_0))), e^{−γT}(φ_1(X_θ(t_2)) − φ_1(X_θ(t_1))), . . . , e^{−γT}(φ_1(X_θ(t_l)) − φ_1(X_θ(t_{l−1})))]^T    (6.51)

Ī_{xx}(θ) = [∫_{t_0}^{t_1} e^{−γ(τ−t)}(Φ(X_θ) ⊗ Φ(X_θ)) dτ, ∫_{t_1}^{t_2} e^{−γ(τ−t)}(Φ(X_θ) ⊗ Φ(X_θ)) dτ, . . . , ∫_{t_{l−1}}^{t_l} e^{−γ(τ−t)}(Φ(X_θ) ⊗ Φ(X_θ)) dτ]^T    (6.52)
I_{xx}(θ) = [ [∫_{t_0}^{t_1} e^{−γ(τ−t)}(X_θ ⊗ X_θ) dτ, . . . , ∫_{t_{l−1}}^{t_l} e^{−γ(τ−t)}(X_θ ⊗ X_θ) dτ]^T, [∫_{t_0}^{t_1} e^{−γ(τ−t)}(Ŵ̄_k^T Φ(X_θ) ⊗ Ŵ̄_k^T Φ(X_θ)) dτ, . . . , ∫_{t_{l−1}}^{t_l} e^{−γ(τ−t)}(Ŵ̄_k^T Φ(X_θ) ⊗ Ŵ̄_k^T Φ(X_θ)) dτ]^T ]    (6.53)

I_{xu}(θ) = [∫_{t_0}^{t_1} e^{−γ(τ−t)}(Φ(X_θ) ⊗ u) dτ, ∫_{t_1}^{t_2} e^{−γ(τ−t)}(Φ(X_θ) ⊗ u) dτ, . . . , ∫_{t_{l−1}}^{t_l} e^{−γ(τ−t)}(Φ(X_θ) ⊗ u) dτ]^T    (6.54)

5: Solve for Ŵ^V_k ∈ R^{l_1} and Ŵ̄_{k+1} ∈ R^{l_2×m} from

[Ŵ^V_k; vec(Ŵ̄^T_{k+1})] = (Θ_k^T(θ) Θ_k(θ))^{−1} Θ_k^T(θ) Ξ_k(θ)    (6.55)

and update the value function and control policy as follows:

V̂^θ_k(X_θ) := (Ŵ^V_k)^T φ_1(X_θ)    (6.56)

u^{k+1}_θ(t) := Ŵ̄^T_{k+1} Φ(X_θ)    (6.57)

6: Let k ← k + 1, and go to Step 5 until ‖Ŵ^V_k − Ŵ^V_{k−1}‖ ≤ ε for k ≥ 1, where the constant ε > 0 is a predefined small threshold.
7: Use u*_θ(t) := u^{k+1}_θ(t) and V*_θ(X_θ) := V̂^θ_k(X_θ) as the approximated optimal control policy and its approximated optimal value function corresponding to the new set of parameters θ, respectively.
8: end procedure

Theorem 2 (Convergence of Algorithm 5). Let the new hyperparameter vector θ be admissible. Using the fixed stabilizing control policy u(t), when N ≥ l_1 + m × l_2, u^{k+1}_θ(t) obtained from solving (6.55) in the off-policy Algorithm 5 converges to the optimal control policy u*_θ(t), ∀θ ∈ λ̄.

Proof. ∀θ ∈ λ̄, set Q_θ, R_θ, and r_θ. Using (6.49)-(6.54), it follows from (6.55) that Ŵ^V_k ∈ R^{l_1} and Ŵ̄_{k+1} ∈ R^{l_2×m} satisfy the following Bellman equation:

e^{−γδt} (Ŵ^V_k)^T (φ_1(X_θ(t + δt)) − φ_1(X_θ(t))) = −∫_t^{t+δt} e^{−γ(τ−t)} ( X_θ^T Q̄_θ X_θ + (u^k)^T R_θ u^k ) dτ + ∫_t^{t+δt} e^{−γ(τ−t)} ( −2(Ŵ̄^T_{k+1} Φ(X_θ))^T R_θ (u − u^k) ) dτ    (6.58)

Now, let W^V ∈ R^{l_1} and W̄ ∈ R^{l_2×m} be such that u^k = W̄^T Φ(X_θ) and

[W^V; vec(W̄^T)] = (Θ_k^T(θ) Θ_k(θ))^{−1} Θ_k^T(θ) Ξ_k(θ)    (6.59)

Then, one immediately has W^V = Ŵ^V_k and vec(W̄) = vec(Ŵ̄^T_{k+1}). If the condition given in Step 2 is satisfied, i.e., the information is collected at N ≥ l_1 + m × l_2 points, then [Ŵ^V_k, vec(Ŵ̄^T_{k+1})] has N independent elements, and therefore the solutions of the least-squares (LS) problems in (6.55) and (6.59) are equal and unique. That is, Ŵ^V_k = W^V and Ŵ̄_{k+1} = W̄. This completes the proof.

6.6 Simulation Results

The presented algorithm is validated for the steering control of an autonomous vehicle in a lane-changing scenario. Consider the following vehicle dynamics for steering control [178]:

ẋ(t) = A x(t) + B u(t)    (6.60)

with

A = [0, v_T, v_T, 0;
     0, 0, 0, 1;
     0, 0, −(k_f + k_r)/(m_T v_T), −1 − (a k_f − b k_r)/(m_T v_T²);
     0, 0, −(a k_f − b k_r)/I_T, −(a² k_f + b² k_r)/(I_T v_T)],
B = [0; 0; k_f/(m_T v_T); a k_f/I_T]    (6.61)

where x(t) = [x_1, x_2, x_3, x_4]^T = [y, ψ, α, ψ̇]^T. The state variables are the lateral position of the vehicle y, the yaw angle ψ, the slip angle α, and the rate of change of the yaw angle ψ̇. Moreover, δ represents the steering angle and acts as the control input. v_T denotes the longitudinal speed of the vehicle, m_T is the total mass of the vehicle, and I_T is its moment of inertia with respect to the center of mass. k_f and k_r denote the stiffness parameters of the front and rear tires, respectively. Finally, a and b denote the distances of the front and rear tires to the center of mass. The values of the vehicle parameters used in the simulation are provided in Table 6.1.

Table 6.1: Vehicle Parameters

Parameter   Value           Parameter   Value
m_T         1300 kg         k_f         91000 N/rad
I_T         10000 m²·kg     k_r         91000 N/rad
v_T         16 m/s          β           2
a           1.6154 m        b           1.8846 m

For the validation of the presented algorithm, a lane-changing scenario on a two-lane highway is considered, as shown in Fig. 6.2. In this simulation, the following STL constraint is considered: the vehicle state is subject to a desired specification on the offset from the center line, i.e., ϕ = □(|x_1 − r| < 1)

Figure 6.2: Lane changing scenario for steering control of autonomous vehicle.

Figure 6.3: Lane changing with fixed value of hyperparameter without any change in dynamics.
with r as the center-lane trajectory (acting as the set-point value). In the simulation, the set-point value is selected as r = 1 for t < 4 s, and r = 3 otherwise. Fig. 6.3 shows the result for lane changing with fixed values of the hyperparameters and without any change in the system dynamics. The control policy, i.e., the steering angle in (6.60), is evaluated based on the off-policy RL algorithm in [171] with the hyperparameter values Q = diag(10, 10, 10, 10) and R = 2. Then, in Fig. 6.4, we consider a change in the system dynamics after t = 4 s, after which the dynamics become

ẋ(t) = (A + ∆A) x(t) + (B + ∆B) u(t)    (6.62)

with

∆A = [0, ∆v_T, ∆v_T, 0;
      0, 0, 0, 0;
      0, 0, −(k_f + k_r)/(m_T ∆v_T), −(a k_f − b k_r)/(m_T ∆v_T²);
      0, 0, 0, −(a² k_f + b² k_r)/(I_T ∆v_T)],
∆B = [0; 0; k_f/(m_T ∆v_T); 0]    (6.63)

The vehicle parameter values for (6.63) are provided in Table 6.1. The control input is evaluated based on the off-policy RL algorithm in [171] with the fixed hyperparameter values Q = diag(10, 10, 10, 10) and R = 2. The result in Fig. 6.4 shows that the vehicle starts wavering and goes out of the lane. That is, the vehicle violates the desired specification ϕ = □(|x_1 − r| < 1) after the change in the dynamics. Now, in order to implement the presented algorithm, the fitness function in (6.22) is first learned as a GP based on the temporal difference in (6.23). Note that the fitness function is learned offline and is then implemented for online monitoring and control in the metacognitive layer. Figs. 6.5 and 6.6 show the predicted fitness function, based on the learned GP, for the vehicle trajectories in Figs. 6.3 and 6.4, respectively. Based on the results in Figs. 6.6 and 6.7, one can see how the fitness value grows due to the operation of the system close to or beyond the desired STL specification. The fitness value is used for metacognitive monitoring and intermittent evaluation of the metacognitive control layer.

Figure 6.4: Constraint violation during lane changing with fixed value of hyperparameter and change in dynamics.

Figure 6.5: Predicted fitness corresponding to vehicle trajectory under normal operation.

Figure 6.6: Predicted fitness corresponding to vehicle trajectory under constraint violation.

Figure 6.7: Overall fitness value under desired STL constraint violation for the vehicle trajectory.

Based on the metacognitive monitor in (6.39), Algorithm 4 is evaluated using the survival score function in (6.41) to determine the optimum hyperparameters that ensure the desired STL specification, i.e., ϕ = □(|x_1 − r| < 1). Fig. 6.8 shows the vehicle trajectory with hyperparameter adaptation based on Algorithm 4 after the change in dynamics. The new optimum hyperparameter values are found to be Q = diag(96.11, 1.2, 1, 1.5) and R = 1. Also, Fig. 6.9 shows how the predicted fitness value converges close to zero after the hyperparameter adaptation, and the overall fitness value becomes constant, as shown in Fig. 6.10.

Figure 6.8: Vehicle trajectory with hyperparameter adaptation based on Algorithm 4 for lane changing scenario.

Figure 6.9: Predicted fitness corresponding to vehicle trajectory with hyperparameter adaptation based on Algorithm 4.
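For reference, the nominal model (6.60)-(6.61) with the Table 6.1 values can be assembled as below. The single-track structure shown is the standard linear steering model that (6.61) instantiates, and the code is an illustrative sketch rather than the simulation code used here.

```python
import numpy as np

mT, IT, vT = 1300.0, 10000.0, 16.0   # mass, inertia, speed (Table 6.1)
kf = kr = 91000.0                    # front/rear tire stiffness (Table 6.1)
a, b = 1.6154, 1.8846                # axle distances to center of mass

def steering_model(v):
    # Linear single-track steering dynamics with states [y, psi, alpha, psi_dot]
    A = np.array([
        [0.0, v,   v,                       0.0],
        [0.0, 0.0, 0.0,                     1.0],
        [0.0, 0.0, -(kf + kr) / (mT * v),  -1.0 - (a * kf - b * kr) / (mT * v**2)],
        [0.0, 0.0, -(a * kf - b * kr) / IT, -(a**2 * kf + b**2 * kr) / (IT * v)],
    ])
    B = np.array([[0.0], [0.0], [kf / (mT * v)], [a * kf / IT]])
    return A, B

A, B = steering_model(vT)            # nominal dynamics per (6.61)
```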
The presented algorithm is employed to learn control solutions with good enough performance while satisfying the desired specifications and properties expressed in terms of STL. As shown in Figs. 6.8 and 6.9, the hyperparameters are adapted based on the metacognitive layer, and the lane-changing problem for the autonomous vehicle is solved without violating any constraint.

Figure 6.10: Overall fitness value under desired STL constraint for the adapted vehicle trajectory.

6.7 Conclusion

In this chapter, an assured metacognitive RL-based autonomous control framework is presented to learn control solutions with good enough performance while satisfying the desired specifications and properties expressed in terms of STL. We discussed that pre-specified reward functions cannot guarantee the satisfaction of the desired specified constraints and properties across all circumstances that an uncertain system might encounter; that is, the system either violates safety specifications or achieves no optimality and liveness specifications. To overcome this issue, and to learn what reward functions to choose to satisfy desired specifications and to achieve a good enough performance across a variety of circumstances, a metacognitive decision-making layer is presented in augmentation with the performance-driven layer. More specifically, an adaptive reward function is presented in terms of its gains and an adaptive reference trajectory (the hyperparameters), and these hyperparameters are determined based on metacognitive monitoring and control to assure the satisfaction of the desired STL safety and liveness specifications. The proposed approach separates learning the reward function that satisfies the specifications from learning the control policy that maximizes the reward, and thus allows us to evaluate as many hyperparameters as required using reused data collected from the system dynamics.

CHAPTER 7 CONCLUSION AND FUTURE WORK

This dissertation analyzed the adverse effects of attacks and designed resilient distributed control mechanisms for multi-agent cyber-physical systems with guaranteed performance and consensus under mild assumptions. The effectiveness of the developed approach is certified by applying it to distributed frequency and voltage synchronization of AC microgrids under data manipulation attacks. Then, the adverse effects of cyber-physical attacks on distributed sensor networks are analyzed, and an attack mitigation mechanism for the event-triggered distributed Kalman filter is presented. It is shown that although event-triggered mechanisms are highly desirable, an attacker can leverage the event-triggered mechanism to cause triggering misbehaviors that significantly harm the network connectivity and performance. Then, entropy estimation-based attack detection and mitigation mechanisms are designed. Finally, a safe reinforcement learning framework for autonomous control systems under constraints is developed. Reinforcement learning agents with pre-specified reward functions cannot provide guaranteed safety across the variety of circumstances that an uncertain system might encounter. To guarantee performance while assuring the satisfaction of safety constraints across a variety of circumstances, an assured autonomous control framework is designed by empowering RL algorithms with meta-cognitive learning capabilities.
The following are some directions for future work.

• A possible direction for future work is to extend the results of the resilient control designs to synchronization of DMASs with heterogeneous nonlinear dynamics. Since nonlinear systems can exhibit finite-time escape behavior, a problem of interest is to find the conditions under which the attacker can make the trajectories of agents unbounded in finite time, and to obtain detection and mitigation mechanisms to counteract such attacks fast and thus avoid instability.

• Another possible direction is to extend the presented results to the containment control problem, for which there exists more than one leader or exo-system with different dynamics under network uncertainties.

• Extension to a meta-cognitive resilient design, i.e., a combination of high-level rules with autonomy, can be the next level of resiliency, which allows the system to learn and adapt from adversarial situations and acquire a level of resiliency against the unforeseen using some prior knowledge.

• A further possible direction for future work is to extend the results of safe reinforcement learning to a safe learning-based control framework with conflicting constraints.

APPENDIX

A.1 Proof of Theorem 1 in Chapter 5

Note that, for notational simplicity, in the following proof we keep the sensor index i but ignore the time index k. Without the time index, we represent the prior at time k + 1 as x̄^a_i(k + 1) ≜ (x̄^a_i)^+ and follow the same convention for the other variables. Using the process dynamics in (5.1) and the corrupted prior state estimate in (5.17), one has

(η̄^a_i)^+ = x^+ − (x̄^a_i)^+ = A(x − x̂^a_i) + w    (1)

where the compromised posterior state estimate x̂^a_i(k) follows the dynamics (5.17). Similarly, using (5.17), the corrupted posterior state estimation error becomes

η^a_i = x − x̂^a_i = x − x̄^a_i − K^a_i(y_i − C x̄^a_i) − γ_i Σ_{j∈N_i} (x̃^a_j − x̃^a_i) − K^a_i f_i    (2)

Then, one can write (1)-(2) as

(η̄^a_i)^+ = A η^a_i + w    (3)

η^a_i = (I_n − K^a_i C_i) η̄^a_i − K^a_i v_i + u^a_i ≜ M^a_i η̄^a_i − K^a_i v_i + u^a_i    (4)

where u^a_i = γ_i Σ_{j∈N_i} (η̃^a_j − η̃^a_i) − K^a_i f_i. Based on (5.4), we define the predictive state estimation error under attack as

(η̃^a_i)^+ = x^+ − (x̃^a_i)^+ = ζ^+_i (η̄^a_i)^+ + (1 − ζ^+_i)(A η̃^a_i + w)    (5)

Using (3), the corrupted covariance of the prior state estimation error becomes

(P̄^a_i)^+ = E[(η̄^a_i)^+ ((η̄^a_i)^+)^T] = E[(A η^a_i + w)(A η^a_i + w)^T] = A P̂^a_i A^T + Q    (6)

Using the corrupted predictive state estimation error (η̃^a_i)^+ in (5) with (P̄^a_{i,j})^+ = A P̂^a_{i,j} A^T + Q, one can write the cross-correlated predictive state estimation error covariance (P̃^a_{i,j})^+ as

(P̃^a_{i,j})^+ = E[(η̃^a_i)^+ ((η̃^a_j)^+)^T] = ζ^+_i ζ^+_j (P̄^a_{i,j})^+ + ζ^+_i (1 − ζ^+_j)(P̀^a_{i,j})^+ + (1 − ζ^+_i) ζ^+_j (P̆^a_{i,j})^+ + (1 − ζ^+_i)(1 − ζ^+_j)(A P̃^a_{i,j} A^T + Q)    (7)

where P̀^a_{i,j} and P̆^a_{i,j} are the cross-correlated estimation error covariances whose updates are given in (8)-(9). The cross-correlated estimation error covariance (P̀^a_{i,j})^+ in (7) is given by

(P̀^a_{i,j})^+ = E[(η̃^a_i)^+ ((η̄^a_j)^+)^T] = ζ^+_i (P̄^a_{i,j})^+ + (1 − ζ^+_i)[ A( γ_i Σ_{r∈N_i} (P̃^a_{i,r} − P̃^a_{i,j}) + P̀^a_{i,j} (M^a_i)^T ) A^T + Q ]    (8)

where P̃^a_{i,j} and P̆^a_{i,j} denote the cross-correlated estimation error covariances that evolve according to (7) and (9).
Similarly, (P̆^a_{i,j})^+ is updated based on the expression given by

(P̆^a_{i,j})^+ = E[(η̄^a_i)^+ ((η̃^a_j)^+)^T] = E[(η̄^a_i)^+ ( ζ^+_j (η̄^a_j)^+ + (1 − ζ^+_j)(A η̃^a_j + w) )^T] = ζ^+_j (P̄^a_{i,j})^+ + (1 − ζ^+_j)[ A( M^a_i P̆^a_{i,j} + γ_i Σ_{s∈N_i} (P̃^a_{s,j} − P̃^a_{i,j}) ) A^T + Q ]    (9)

Now, using (2)-(5), one can write the covariance of the posterior estimation error P̂^a_i as

P̂^a_i = E[M_i η̄^a_i (M_i η̄^a_i)^T] + E[K^a_i v_i (K^a_i v_i)^T] − 2E[K^a_i v_i (M_i η̄^a_i)^T] − 2E[K^a_i v_i (u^a_i)^T] + E[u^a_i (u^a_i)^T] + 2E[M_i η̄^a_i (u^a_i)^T]    (10)

Using (6) and the measurement noise covariance, the first two terms of (10) become

E[M_i η̄^a_i (M_i η̄^a_i)^T] = M_i P̄^a_i M_i^T    (11)

E[K^a_i v_i (K^a_i v_i)^T] = K^a_i R_i (K^a_i)^T    (12)

According to Assumption 1, the measurement noise v_i is i.i.d. and uncorrelated with the state estimation errors; therefore, the third and fourth terms in (10) become zero. Now, using u^a_i in (4) and Assumption 1, the last two terms in (10) can be simplified as

E[u^a_i (u^a_i)^T] = γ_i² E[ (Σ_{j∈N_i}(η̃^a_j − η̃^a_i)) (Σ_{j∈N_i}(η̃^a_j − η̃^a_i))^T ] + K^a_i E[f_i f_i^T](K^a_i)^T − 2 γ_i K^a_i E[f_i (Σ_{j∈N_i}(η̃^a_j − η̃^a_i))^T] = γ_i² Σ_{j∈N_i} (P̃^a_j − 2 P̃^a_{i,j} + P̃^a_i) + K^a_i Σ^f_i (K^a_i)^T − 2 γ_i K^a_i E[f_i Σ_{j∈N_i}(η̃^a_j − η̃^a_i)^T]    (13)

and

2E[u^a_i (M_i η̄^a_i)^T] = 2 γ_i Σ_{j∈N_i} (P̀^a_{i,j} − P̀^a_i)(M^a_i)^T − 2 K^a_i E[f_i (η̄^a_i)^T] M_i^T

where the cross-correlated term P̀^a_{i,j} is updated according to (8). Using (10)-(13), the posterior state estimation error covariance P̂^a_i under attack is given by

P̂^a_i = M^a_i P̄^a_i (M^a_i)^T + K^a_i [R_i + Σ^f_i](K^a_i)^T + 2 γ_i Σ_{j∈N_i} (P̀^a_{i,j} − P̀^a_i)(M^a_i)^T + γ_i² Σ_{j∈N_i} (P̃^a_j − 2 P̃^a_{i,j} + P̃^a_i) − 2 K^a_i Ξ^f    (14)

with Ξ^f = γ_i E[f_i Σ_{j∈N_i}(η̃^a_j − η̃^a_i)^T] + E[f_i (η̄^a_i)^T](M^a_i)^T. This completes the proof.

BIBLIOGRAPHY

[1] R. Olfati-Saber, J. A. Fax, and R. Murray, "Consensus and cooperation in networked multi-agent systems," Proceedings of the IEEE, vol. 95, no. 1, pp. 215-233, 2007.

[2] F. Bullo, J. Cortés, and S. Martinez, Distributed Control of Robotic Networks: A Mathematical Approach to Motion Coordination Algorithms, vol. 27, Princeton University Press, 2009.

[3] A. Khanafer, and T. Başar, "Robust distributed averaging: When are potential-theoretic strategies optimal?," IEEE Transactions on Automatic Control, vol. 61, no. 7, pp. 1767-1779, 2016.

[4] Q. Zhu, and T. Başar, "Game-theoretic methods for robustness, security, and resilience of cyber-physical control systems: games-in-games principle for optimal cross-layer resilient control systems," IEEE Control Systems Magazine, vol. 35, no. 1, pp. 46-65, 2015.

[5] F. Pasqualetti, F. Dorfler, and F. Bullo, "Attack detection and identification in cyber-physical systems," IEEE Transactions on Automatic Control, vol. 58, no. 11, pp. 2715-2729, 2013.

[6] F. Pasqualetti, F. Dorfler, and F. Bullo, "Control-theoretic methods for cyber-physical security: Geometric principles for optimal cross-layer resilient control systems," IEEE Control Systems Magazine, vol. 35, no. 1, pp. 110-127, 2015.

[7] H. Fawzi, P. Tabuada, and S. Diggavi, "Secure estimation and control for cyber-physical systems under adversarial attacks," IEEE Transactions on Automatic Control, vol. 59, no. 6, pp. 1454-1467, 2014.

[8] Y. Shoukry, and P.
Tabuada, "Event-triggered state observers for sparse sensor noise/attacks," IEEE Transactions on Automatic Control, vol. 61, no. 8, pp. 2079-2091, 2016. [9] Y. Yan, P. Antsaklis, and V. Gupta, "A resilient design for cyber physical systems under attack," In American Control Conference (ACC), pp. 4418-4423, 2017. [10] A. Teixeira, I. Shames, H. Sandberg, and K.H. Johansson, "A secure control framework for resource-limited adversaries," Automatica, vol. 51, pp. 135-148, 2015. [11] E. Akyol, T. Başar, and C. Langbort, "Signaling games in networked cyber-physical systems with strategic elements," In 56th IEEE Conference on Decision and Control (CDC), pp. 4576- 4581, 2017. [12] M. O. Sayin, and T. Başar, "Secure sensor design for cyber-physical systems against advanced persistent threats," In International Conference on Decision and Game Theory for Security, pp. 91-111, 2017. [13] A. Kanellopoulos, and K.G. Vamvoudakis, "Non-equilibrium dynamic games and cy- ber–physical security: A cognitive hierarchy approach," Systems and Control Letters, vol. 125, pp. 59-66. 2019 [14] K. G. Vamvoudakis, J. P. Hespanha, B. Sinopoli, and Y. Mo, "Detection in adversarial envi- ronments," IEEE Transactions on Automatic Control, vol. 59, no. 12, pp. 3209-3223, 2014. 162 [15] A. Kanellopoulos, and K.G. Vamvoudakis, "A moving target defense control systems," IEEE Transactions on Automatic Control, frame- doi: work for 10.1109/TAC.2019.2915746. cyber-physical [16] F. Pasqualetti, A. Bicchi, and F. Bullo, "Consensus computation in unreliable networks: A system theoretic approach," IEEE Transactions on Automatic Control, vol. 57, no. 1, pp. 90- 104, 2012. [17] S.Weerakkody, X. Liu, S.H. Son, and B. Sinopoli, "A graph-theoretic characterization of perfect attackability for secure design of distributed control systems," IEEE Transactions on Control of Network Systems, vol. 4, no. 1, pp. 60-70, 2017. [18] S. Sundaram, and C. Hadjicostis, "Distributed function calculation via linear iterative strategies in the presence of malicious agents," IEEE Transactions on Automatic Control, vol. 56, no. 7, pp. 1495-1508, 2011. [19] S. M. Dibaji, H. Ishii, and R. Tempo, "Resilient randomized quantized consensus," IEEE Transactions on Automatic Control, vol. 63, no. 8, pp. 2508-2522, 2018. [20] H. J. LeBlanc, H. Zhang, and X. Koutsoukos, and S. Sundaram, "Resilient asymptotic consen- sus in robust networks," IEEE Journal on Selected Areas in Communications, vol. 31, no. 4, pp. 766-781, 2013. [21] H. J. LeBlanc, and X. Koutsoukos, "Resilient first-order consensus and weakly stable, higher order synchronization of continuous-time networked multiagent systems," IEEE Transactions on Control of Network Systems, vol. 5, no. 3, pp. 1219-1231, 2018. [22] X. Jin, W. Haddad, and T. Yucelen, "An adaptive control architecture for mitigating sensor and actuator attacks in cyber-physical systems," IEEE Transactions on Automatic Control, vol. 62, no. 11, pp. 6058-6064, 2017. [23] K. G. Vamvoudakis, and J. P. Hespanha, "Game-theory-based consensus learning of double- integrator agents in the presence of worst-case adversaries," Journal of Optimization Theory and Applications, vol. 177, no. 1, pp. 222-253, 2018. [24] M. Pirani, E. Nekouei, S. M. Dibaji, H. Sandberg, and K. H. Johansson, "Design of attackt re- silient consensus dynamics: A game-theoretic approach," In 18th European Control Conference (ECC), pp. 2227-2232, 2019. [25] S. M. Dibaji, M. Pirani, D.B. Flamholz, A.M. Annaswamy, K. H. Johansson, and A. 
Chakrabortty, "A systems and control perspective of CPS security," 2019. [26] S. Kotz, and N. Johnson, Process capability indices. Chapman and Hall/CRC, 1993. [27] Z. Li, and Z. Duan, Cooperative control of multi-agent systems: a consensus region approach. CRC Press, 2014. [28] F. L. Lewis, H. Zhang, K. Hengster-Movric, and A. Das, Cooperative control of multi-agent systems: optimal and adaptive design approaches. Springer Science and Business Media, 2013. [29] Y. Su, and J. Huang, "Stability of a class of linear switching systems with applications to two consensus problems," IEEE Transactions on Automatic Control, vol. 57, no. 6, pp. 1420-1430, 2012. 163 [30] H. Zhang, F. L. Lewis, and A. Das, "Optimal design for synchronization of cooperative systems: state feedback, observer and output feedback," IEEE Transactions on Automatic Control, vol. 56, no. 8, pp. 1948-1952, 2011. [31] E. Daniel, E. Frisk, and M. Krysander, "A method for quantitative fault diagnosability analysis of stochastic linear descriptor models," Automatica, vol. 49, no. 6, pp. 1591-1600, 2013. [32] K. Michel, and J. Hao, "Distributed sensor fault detection and isolation over network," IFAC Proceedings Volumes, vol. 47, no. 3, pp. 11458-11463, 2014. [33] D. Kazakos, and P. Kazakos, Detection and Estimation. Computer Science Press, 1990. [34] M. Wax, and T. Kailath, "Detection of signals by information theoretic criteria," IEEE Trans- actions on Acoustics, Speech, and Signal Processing, vol. 33, no. 2, pp. 387-392, 1985. [35] Y. Mo, and B. Sinopoli, "Secure control against replay attacks," In 47th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 911-918, 2009. [36] Z. Guo, K. H. Johansson, and L. Shi, "Worst-case stealthy innovation-based linear attack on remote state estimation," Automatica, vol. 89, pp. 117-124, 2018. [37] T. Li, and J. F. Zhang, "Mean square average-consensus under measurement noises and fixed topologies: Necessary and sufficient conditions," Automatica, vol. 45, no. 8, pp. 1929-1936, 2009. [38] J. G. Proakis, Digital Communications, New York, NY, USA:McGraw-Hill, 1995. [39] P. Fernando, Kullback-Leibler divergence estimation of continuous distributions. IEEE inter- national symposium on information theory, pp. 1666-1670, 2008. [40] Q. Wang, S. Kulkarni, and S. Verdu, "Divergence estimation for multidimensional densities via k-nearest-neighbor Distances," IEEE Transactions on Information Theory, vol. 55, no. 5, pp. 2392-2405, 2009. [41] M. Basseville, and I. V. Nikiforov, Detection of abrupt changes: theory and application, vol. 104, Englewood Cliffs: Prentice Hall, 1993. [42] T. M. Cover, and J. A. Thomas, Elements of information theory. John Wiley and Sons, 2012. [43] M. H. Protter, and C. B. Morrey, Differentiation under the Integral Sign. New York: Springer, 1985. [44] W. Ren, and R. W. Beard, "Consensus seeking in multiagent systems under dynamically chang- ing interaction topologies," IEEE Transactions on automatic control, vol. 50, no. 5, pp. 655-661, 2005. [45] D. T. Ton, and M. A. Smith, “The U.S. Department of Energy’s Microgrid Initiative", Elseiveir, The Electricity Journal, vol. 25, pp. 84-94, Oct. 2012. [46] A. Bidram, and A. Davoudi, “Hierarchical structure of microgrids control system," IEEE Trans. Smart Grid, vol. 3, pp. 1963-1976, Dec. 2012. 164 [47] Z. Li, C. Zang, P. Zeng, H. Yu, and S. Li, “Fully distributed hierarchical control of parallel grid-supporting inverters in islanded AC microgrids," IEEE Trans. Ind. Informat., vol. 14, no. 2, pp. 
679-690, Feb. 2018. [48] J. Schiffer, T. Seel, J. Raisch, and T. Sezi, “Voltage stability and reactive power sharing in inverter-based microgrids with consensus based distributed voltage control," IEEE Trans. Con- trol Syst. Technol., vol. 24, no. 1, pp. 96-109, Jan. 2016. [49] M. Yazdanian and A. Mehrizi-Sani, “Distributed control techniques in microgrids," IEEE Trans. Smart Grid, vol. 5, no. 6, pp. 2901-2909, Nov. 2014. [50] A. Bidram, A. Davoudi, F. L. Lewis, and J. M. Guerrero, “Distributed cooperative control of microgrids using feedback linearization," IEEE Trans. Power Syst., vol. 28, no. 3, pp. 3462- 3470, Aug. 2013. [51] A. Bidram, F. L. Lewis, and A. Davoudi, “Distributed control systems for small-scale power networks: Using multiagent cooperative control theory," IEEE Control Systems Magazine, vol. 34, no. 6, pp. 56-77, Nov. 2014. [52] J. Duan, C. Wang, H. Xu, W. Liu, J. C. Peng, and H. Jiang, “Distributed control of inverter- interfaced microgrids with bounded transient line currents," IEEE Trans. Ind. Informat., vol. 14, no. 5, pp. 2052-2061, May 2018. [53] San Diego Gas and Electric Company, “Smart grid architecutre demonstrations program – EPIC-1, Project 1 report," Electric Power Investment Charge (EPIC), Dec. 2017. [54] A. Bidram, A. Davoudi, and F. L. Lewis, “A Multiobjective distributed control framework for islanded AC microgrids," IEEE Trans. Ind. Informat., vol. 10, no. 3, pp. 1785-1798, May 2014. [55] A. Bidram, A. Davoudi, F. L. Lewis, and Z. Qu, “Secondary control of microgrids based on distributed cooperative control of multi-agent systems," IET Generation, Transmission, Dis- tribution, vol. 7, no. 8, pp. 822-831, Aug. 2013. [56] N. M. Dehkordi, H. R. Baghaee, N. Sadati and J. M. Guerrero, “Distributed noise-resilient secondary voltage and frequency control for islanded microgrids," IEEE Trans. Smart Grid, vol. 10, no. 4, pp. 3780-3790, July 2019. [57] J. Duan, C. Wang, H. Xu, W. Liu, Y. Xu, J. C. Peng, and H. Jiang, “Distributed control of inverter-interfaced microgrids based on consensus algorithm with improved transient perfor- mance," IEEE Trans. Smart Grid, vol. 10, no. 2, pp. 1303-1312, Mar. 2019. [58] D. Jin, Z. Li, C. Hannon, C. Chen, J. Wang, M. Shahidehpour, and C. W. Lee, “Toward a cyber resilient and secure microgrid using software-defined networking," IEEE Trans. Smart Grid, vol. 8, no. 5, pp. 2494-2504, Sept. 2017. [59] X. Liu and Z. Li, "False data attacks against AC state estimation with incomplete network information," IEEE Trans. Smart Grid, vol. 8, no. 5, pp. 2239-2248, Sept. 2017. [60] O. Kosut, L. Jia, R. J. Thomas, and L. Tong, "Limiting false data attacks on power system state estimation," in Proc. 44th Annu. Conf. Inf. Sci. Syst. (CISS), 2010, pp. 1-6. 165 [61] Y. Liu, P. Ning, and M. K. Reiter, "False data injection attacks against state estimation in electric power grids," ACM Trans. Inf. Syst. Security, vol. 14, no. 1, pp. 1-33, May 2011. [62] O. Kosut, L. Jia, R. J. Thomas, and L. Tong, “Malicious data attacks on the smart grid," IEEE Trans. Smart Grid, vol. 2, no. 4, pp. 645-658, Dec. 2011. [63] M. Chlela, D. Mascarella, G. Joos, and M. Kassouf, “Fallback control for isochronous energy storage systems in autonomous microgrids under denial-of-service cyber-attacks," IEEE Trans. Smart Grid, vol. 9, no. 5, pp. 4702-4711, Sept. 2018. [64] W. Meng, X. Wang and S. Liu, “Distributed load sharing of an inverter-based microgrid with reduced communication," IEEE Trans. Smart Grid, vol. 9, no. 2, pp. 1354-1364, Mar. 2018. [65] B. Schafer, D. 
Witthaut, M. Timme, and V. Latora, “Dynamically induced cascading failures in power grids" Nature Communications, vol. 9, Article Number 1975, pp. 1-13, 2018. [66] Y. Huang, J. Tang, Y. Cheng, H. Li, K. A. Campbell, and Z. Han, “Real-time detection of false data injection in smart grid networks: An adaptive cusum method and analysis," IEEE Syst. J., vol. 10, no. 2, pp. 532-543, June 2016. [67] K. Manandhar, X. Cao, F. Hu, and Y. Liu, “Detection of faults and attacks including false data injection attack in smart grid using kalman filter," IEEE Trans. Control Netw. Syst., vol. 1, no. 4, pp. 370-379, Dec. 2014. [68] S. Bi and Y. J. Zhang, “Graphical methods for defense against false-data injection attacks on power system state estimation," IEEE Trans. Smart Grid, vol. 5, no. 3, pp. 1216-1227, May 2014. [69] Y. Mo, R. Chabukswar, and B. Sinopoli, “Detecting integrity attacks on scada systems," IEEE Trans. Control Syst. Technol., vol. 22, no. 4, pp. 1396-1407, July 2014. [70] L. Liu, M. Esmalifalak, Q. Ding, V. A. Emesih, and Z. Han, “Detecting false data injection attacks on power grid by sparse optimization,” IEEE Trans. Smart Grid, vol. 5, no. 2, pp. 612-621, Mar. 2014. [71] D. B. Rawat, and C. Bajracharya, “Detection of false data injection attacks in smart grid communication systems," IEEE Signal Process. Lett., vol. 22, no. 10, pp. 1652–1656, Oct. 2015. [72] X. Wang, X. Luo, Y. Zhang, and X. Guan, “Detection and isolation of false data injection attacks in smart grids via nonlinear internal observer," IEEE Trans. Smart Grid, vol. 6, no. 4, pp. 6498–6512, Aug. 2019. [73] L. Y. Lu, H. J. Liu, and H. Zhu, “Distributed secondary control for isolated microgrids under malicious attacks," in Proc. North American Power Symposium (NAPS), Denver, CO, USA, 2016, pp. 1-6. [74] O. A. Beg, T. T. Johnson, and A. Davoudi, “Detection of false-data injection attacks in cyber- physical DC microgrids," IEEE Trans. Ind. Informat., vol. 13, no. 5, pp. 2693-2703, Oct. 2017. [75] S. Saha, T. K. Roy, M. A. Mahmud, M. E. Haque, S. N. Islam, “Sensor fault and cyber attack resilient operation of DC microgrids," Int. J. Electr. Power Energy Syst., vol. 99, pp. 540-554, 2018. 166 [76] O. A. Beg, T. T. Johnson, and A. Davoudi, “Detection of false-data injection attacks in cyber- physical DC microgrids," IEEE Trans. Ind. Informat., vol. 13, no. 5, pp. 2693-2703, Oct. 2017. [77] S. Abhinav, H. Modares, F. L. Lewis, and A. Davoudi, “Resilient cooperative control of DC microgrids," IEEE Trans. Smart Grid, vol. 10, no. 1, pp. 1083-1085, Jan. 2019. [78] T. R. B. Kushal, K. Lai, and M. S. Illindala, “Risk-based mitigation of load curtailment cyber- attack using intelligent agents in a shipboard power system," IEEE Trans. Smart Grid, vol. 10, no. 5, pp. 4741-4750, Sept. 2019. [79] S. Abhinav, H. Modares, F. L. Lewis, F. Ferrese, and A. Davoudi, “Synchrony in networked microgrids under attacks,” IEEE Trans. Smart Grid, vol. 9, no. 6, pp. 6731-6741, Nov. 2018. [80] Z. Qu, Cooperative control of dynamical systems: Applications to autonomous vehicles. New York: Springer-Verlag, 2009. [81] M. Zhou, Y. Wang, A. K. Srivastava, Y. Wu, and P. Banerjee, “Ensemble-based algorithm for synchrophasor data anomaly detection,” IEEE Trans. Smart Grid, vol. 10, no. 3, pp. 2979-2988, May 2019. [82] S. Kullback, and R. A. Leibler, “On information and sufficiency”, The annals of mathematical statistics, vol. 22, no. 1, pp.79-86, 1951. [83] F. Sun, Z. H. Guan, L. Ding, and Y.W. 
Wang, “Mean square average-consensus for multi-agent systems with measurement noise and time delay,” International Journal of System and Science, vol. 44, no. 6, pp. 995-1005, 2013. [84] T. Li, and J. F. Zhang, “Consensus conditions of multi-agent systems with time-varying topolo- gies and stochastic communication noises,” IEEE Trans. Autom. Control, vol. 55, no. 9, pp. 2043-2057, 2010. [85] N. Mwakabuta and A. Sekar, “Comparative study of the IEEE 34 node test feeder under practical simplifications," in Proc. 39th North American Power Symposium, 2007, pp. 484-491. [86] N Cameron, and J Cortés, "Team-triggered coordination for real-time control of networked cyber-physical systems," IEEE Transactions on Automatic Control, vol. 61, no. 1, pp. 34-47, 2016. [87] K. Saulnier, D. Saldana, A. Prorok, G. Pappas and V. Kumar, "Resilient flocking for mobile robot teams," IEEE Robotics and Automation Letters, vol. 2, no. 2, pp. 1039-1046, 2017. [88] L. Zhou, V. Tzoumas, G. Pappas, and P. Tokekar, "Resilient Active Target Tracking With Multiple Robots," IEEE Robotics and Automation Letters, vol. 4, no. 1, pp. 129-136, 2019. [89] W. Abbas, V. Yevgeniy, and X. Koutsoukos, "Resilient consensus protocol in the presence of trusted nodes," in Proc. of 7th Int. Sym. on Resilient Control Sys., pp. 1-7, 2014. [90] J. Usevitch and D. Panagou, "Resilient leader-follower consensus to arbitrary reference values," in Proc. of American Control Conference., pp. 1292-1298, 2018. [91] F. Lewis, H. Zhang, K. Hengster-Movric, and A. Das, Cooperative Control of Multi-Agent Systems: Optimal and Adaptive Design Approaches, Communications and Control Engineering, Springer, London, 2013. 167 [92] Q. Jiao, H. Modares, F. L. Lewis, S. Xu, and L. Xie, "Distributed gain output-feedback control of homogeneous and heterogeneous systems," Automatica, vol. 71, pp. 361-368, 2016. [93] A. Isidori, Nonlinear control systems, Springer Science and Business Media, 2013. [94] R. Moghadam and H. Modares, "An internal model principle for the attacker in distributed control systems," In Proceedings of IEEE Conference on Decision and Control, pp. 6604-6609, 2017. [95] M. V. Jakuba, "Modeling and control of an autonomous underwater vehicle with combined foil/thruster actuators," M.S. thesis, MIT Woods Hole Oceanographic Inst., USA, 2003. [96] K. D. Kim, and P. R. Kumar, "Cyber-physical systems: A perspective at the centennial", Proceedings of the IEEE, vol. 100, pp. 1287-1308, 2012. [97] J. Lee, B. Bagheri, and H. Kao, "A cyber-physical systems architecture for industry 4.0-based manufacturing systems", Manufacturing Letters, vol. 3, pp. 18-23, 2015. [98] J. Fink, A. Ribeiro, and V. Kumar, "Robust control for mobility and wireless communication in cyber-physical systems with application to robot teams", Proceedings of the IEEE, vol. 100, no. 1, pp. 164-178, 2012. [99] S. Sridhar, A. Hahn, and M. Govindarasu, "Cyber-physical system security for the electric power grid", Proceedings of the IEEE, vol. 100, no. 1, pp. 210-224, 2012. [100] J. J. Blum, A. Eskandarian, and L. J. Hoffman, "Challenges of intervehicle ad hoc networks", IEEE Transactions on Intelligent Transportation Systems, vol. 5, no. 4, pp. 347-351, 2004. [101] J. P. Farwell, and R. Rohozinski, "Stuxnet and the future of cyber war", Survival, vol. 53, no. 1, pp. 23-40, 2011. [102] J. Slay, and M. Miller, "Lessons learned from the Maroochy water breach", Critical Infras- tructure Protection, vol. 253, pp. 73-82, 2007. [103] I. Akyildiz, W. Su, Y. Sankarasubramniam, and E. 
Cayirci, "A survey on sensor networks", IEEE Communications Magazine, vol. 40, no. 8, pp. 102-114, 2002. [104] B. D. O. Anderson, and J. B. Moore, Optimal Filtering, Courier corporation, 2012. [105] D. P. Spanos, R. Olfati-Saber, and R. M. Murray, "Approximate distributed Kalman filter- ing in sensor networks with quantifiable performance", Proceedings of the 4th International Symposium on Information Processing in Sensor Networks, pp. 133-139, 2005. [106] R. Olfati-Saber, "Distributed Kalman filtering for sensor networks", Proceedings of the 46th IEEE Conference on Decision and Control, pp. 5492-5498, 2007. [107] R. Olfati-Saber, "Kalman-Consensus Filter : Optimality, stability, and performance", Pro- ceedings of the 48th IEEE Conference on Decision and Control, pp. 7036-7042, 2009. [108] S. Das and J. M. F. Moura, "Distributed Kalman filtering with dynamic observations consen- sus", IEEE Transactions on Signal Processing, vol. 63, no. 17, pp. 4458-4473, 2015. 168 [109] G. Wei, W. Li, D. Ding, and Y. Liu, "Stability Analysis of Covariance Intersection-Based Kalman Consensus Filtering for Time-Varying Systems", IEEE Transactions on Systems, Man, and Cybernetics: Systems, doi: 10.1109/TSMC.2018.2855741. [110] S. Das and J. M. F. Moura, "Consensus + innovations distributed Kalman filter with optimized gains", IEEE Transactions on Signal Processing, vol. 65, no. 2, pp. 467-481, 2017. [111] U. A. Khan and J. M. F. Moura, "Distributing the Kalman Filter for Large-Scale Systems", IEEE Transactions on Signal Processing, vol. 56, no. 10, pp. 4919-4935, 2008. [112] W. Li, Y. Jia, and J. Du, "Event-triggered Kalman consensus filter over sensor networks", IET Control Theory and Applications, vol. 10, no. 1, pp. 103-110, 2016. [113] Q. Liu, Z. Wang, X. He, and D. H. Zhou, "Event-Based Recursive Distributed Filtering Over Wireless Sensor Networks", IEEE Transactions on Automatic Control, vol. 60, no. 9, pp. 2470- 2475, 2015. [114] X. Meng, and T. Chen, "Optimality and stability of event triggered consensus state estimation for wireless sensor networks", Proceedings of the 48th American Control Conference, pp. 3565- 3570, 2014. [115] G. Battistelli, L. Chisci, and D. Selvi, "A distributed Kalman filter with event-triggered communication and guaranteed stability", Automatica, vol. 93, pp. 75-82, 2018. [116] R.C. Francy, A.M. Farid, and K. Youcef-Toumi, "Event triggered state estimation techniques for power systems with integrated variable energy resources", ISA transactions, vol. 56, pp. 165-172, 2015. [117] S. Li et al., "Event-Trigger Heterogeneous Nonlinear Filter for Wide-Area Measurement Sys- tems in Power Grid", IEEE Transactions on Smart Grid, vol. 10, no. 3, pp. 2752-2764, 2019. [118] N. Sadeghzadeh Nokhodberiz, H. Nemati, and A. Montazeri, "Event-Triggered Based State Estimation for Autonomous Operation of an Aerial Robotic Vehicle", IFAC-Papers On Line, 2019. [119] M. Ouimet, D. Iglesias, N. Ahmed, and S. Martínez, "Cooperative Robot Localization Using Event-Triggered Estimation", Journal of Aerospace Information Systems, vol. 15, no. 7, pp. 427-449, 2018. [120] A. Gupta, C. Langbort, and T. Basar, "Optimal control in the presence of an intelligent jam- mer with limited actions", Proceedings of the 49th IEEE Conference on Decision and Control, pp. 1096-110, 2010. [121] L. Yu, X. Sun, and T. Sui, "False-Data Injection Attack in Electricity Generation System Subject to Actuator Saturation: Analysis and Design," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 49, no. 8, pp. 
1712-1719, 2019. p4r30 Y. Mo, E. Garone, A. Casavola, and B. Sinopoli, "False data injection attacks against state estimation in wireless sensor networks", Proceedings of the 49th IEEE Conference on Decision and Control, pp. 5967-5972. 2010. 169 [122] F. Miao, Q. Zhu, M. Pajic, and G. J. Pappas, "Coding schemes for securing cyber-physical systems against stealthy data injection attacks", IEEE Transactions on Control of Network Systems, vol. 4, no. 1, pp 106-117, 2017. [123] C.Z. Bai, V. Gupta, and F. Pasqualetti, "On Kalman Filtering with Compromised Sensors: Attack Stealthiness and Performance Bounds", IEEE Transactions on Automatic Control, vol. 62, no. 12, pp. 6641-6648, 2017. [124] Y. Chen, S. Kar, and J. M. F. Moura, "Resilient Distributed Estimation Through Adversary Detection", IEEE Transactions on Signal Processing, vol. 66, no. 9, pp. 2455-2469, 2018. [125] Y. Chen, S. Kar, and J. M. F. Moura, "Resilient distributed estimation: sensor attacks", IEEE Transactions on Automatic Control, 2019. [126] A. Mitra, and S. Sundaram, "Byzantine-resilient distributed observers for LTI systems", Au- tomatica, vol. 108, 2019. [127] A. Mitra, J. Richards, S. Bagchi, and S. Sundaram, "Resilient distributed state estimation with mobile agents: overcoming Byzantine adversaries, communication losses, and intermittent measurements", Autonomous Robots, 2018. [128] A. Mustafa, and H. Modares, "Analysis and detection of cyber-physical attacks in distributed sensor networks", Proceedings of the 56th Allerton Conference on Communication, Control, and Computing, pp. 973-980, 2018. [129] W. Chen, D. Ding, H. Dong, and G. Wei, "Distributed Resilient Filtering for Power Systems Subject to Denial-of-Service Attacks", IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 49, no. 8, pp. 1688-1697, 2019. [130] D. Du, X. Li, W. Li, R. Chen, M. Fei, and L. Wu, "ADMM-Based Distributed State Estimation of Smart Grid Under Data Deception and Denial of Service Attacks", IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 49, no. 8, pp. 1698-1711, 2019. [131] B. Chen, D. W. C. Ho, W. Zhang and L. Yu, "Distributed Dimensionality Reduction Fusion Estimation for Cyber-Physical Systems Under DoS Attacks", IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 49, no. 2, pp. 455-468, 2019. [132] P. Millan, L. Orihuela, C. Vivas, F. Rubio, D. Dimarogonas, and K. H. Johansson, "Sensor network-based robust distributed control and estimation", Control Engineering Practice, vol. 21, no. 9, pp. 1238-1249, 2013. [133] A. R. Liu, and R. R. Bitmead, "Stochastic observability in network state estimation and control", Automatica, vol. 47, no. 1, pp. 65-78, 2011. [134] S. Trimpe and R. D’Andrea, "Event-based state estimation with variance-based triggering", IEEE Transactions on Automatic Control, vol. 59, no. 12, pp. 3266-3281, Dec. 2014. [135] S. Weerakkody, B. Sinopoli, S. Kar, and A. Datta, "Information flow for security in control systems", Proceedings of the 55th IEEE Conference on Decision and Control, pp. 5065-5072, 2016. 170 [136] M. N. Goria, N. N. Leonenko, V. V. Mergel, and P. L. N. Inverardi, "A new class of random vector entropy estimators and its applications in testing statistical hypotheses", Journal of Nonparametric Statistics,, vol. 17, no. 3, pp. 277-297, 2005. [137] J. Su, B. Li, and W. Chen, "On existence, optimality and asymptotic stability of the Kalman filter with partially observed inputs", Automatica, vol. 53, pp. 149-154, 2015. [138] R. S. Sutton, and A. G. 
Barto, Reinforcement Learning—An Introduction. Cambridge, MA: MIT Press, 1998. [139] D. P. Bertsekas, and J. N. Tsitsiklis,Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996. [140] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality. New York: Wiley-Interscience, 2007. [141] P. J. Werbos, "A menu of designs for reinforcement learning over time", in Neural Networks for Control, W. T. Miller, R. S. Sutton, and P. J. Werbos, Eds. Cambridge, MA: MIT Press, pp. 67–95, 1991. [142] J. Kober, J. A. Bagnell, and J. Peters, "Reinforcement learning in robotics: A survey", The International Journal of Robotics Research,, vol. 32, no. 11, pp. 1238-1274, 2013. [143] R. S. Sutton, A. G. Barto and R. J. Williams, "Reinforcement learning is direct adaptive optimal control", in IEEE Control Systems Magazine, vol. 12, no. 2, pp. 19-22, 1992. [144] R. Chavarriaga, P.W. Ferrez, and J. D. Millan, "To err is human: Learning from error po- tentials in brain-computer interfaces", in Advances in Cognitive Neurodynamics, Springer, pp. 777-782, 2007. [145] H. Lu, Y. Li, M. Chen, H. Kim, and S. Serikawa, "Brain intelligence: go beyond artificial intelligence", Mobile Networks and Applications, vol. 23, no. 2, pp. 368-375, 2018. [146] A.Y. Ng, D. Harada, and S. Russell, "Policy invariance under reward transformations: Theory and application to reward shaping", in proceedings of the 16th international conference on Machine learning, pp. 278-287, 1999. [147] G. Konidaris, and A. Barto, "Autonomous shaping: Knowledge transfer in reinforcement learning", in proceedings of the 23rd international conference on Machine learning, pp. 489- 496, 2006. [148] N. Chentanez, A. Barto, and S. Singh, "Intrinsically motivated reinforcement learning", in proceedings of Advances in neural information processing systems, pp. 1281-1288, 2005. [149] B. Kiumarsi, K. G. Vamvoudakis, H. Modares and F. L. Lewis, "Optimal and Autonomous Control Using Reinforcement Learning: A Survey", in IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 6, pp. 2042-2062, 2018. [150] K. Doya, "Reinforcement learning in continuous-time and space", Neural Computation, vol. 12, pp. 219–245, 2000. [151] R. S. Sutton, A. G. Barto and R. J. Williams, "Reinforcement learning is direct adaptive optimal control", in IEEE Control Systems Magazine, vol. 12, no. 2, pp. 19-22, 1992. 171 [152] M. Ohnishi, W. Li, N. Gennaro, and M. Egerstedt, "Barrier-certified adaptive reinforcement learning with applications to brushbot navigation", IEEE Transactions on Robotics, vol. 35, no. 5, pp. 1186-1205, 2019. [153] R. Cheng, G. Orosz, R. M. Murray, and J. W. Burdick, "End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks", In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 3387-3395, 2019. [154] F. L. Lewis, and D. Liu, Reinforcement learning and approximate dynamic programming for feedback control, John Wiley & Sons, vol. 17, 2013. [155] J. Si, A. G. Barto, W. B. Powell, and D. C. Wunsch, Handbook of Learning and Approximate Dynamic Programming, IEEE Press and John Wiley & Sons, 2004. [156] J. Liu, N. Ozay, U. Topcu and R. M. Murray, "Synthesis of Reactive Switching Protocols in IEEE Transactions on Automatic Control,vol. 58, From Temporal Logic Specifications", no. 7, pp. 1771-1785, 2013. [157] I. Papusha, J. Fu, U. Topcu and R. M. 
Murray, "Automata theory meets approximate dynamic programming: Optimal control with temporal logic constraints", in proceedings of IEEE 55th Conference on Decision and Control (CDC), pp. 434-440, 2016. [158] Y. Zhou, D. Maity and J. S. Baras, "Timed automata approach for motion planning using metric interval temporal logic", in proceedings of European Control Conference (ECC), pp. 690-695, 2016. [159] S. Saha and A. A. Julius, "Task and Motion Planning for Manipulator Arms With Metric Temporal Logic Specifications", in IEEE Robotics and Automation Letters, vol. 3, no. 1, pp. 379-386, 2018. [160] X. Li, C. Vasile and C. Belta, "Reinforcement learning with temporal logic rewards", in proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3834-3839, 2017. [161] D. Sadigh, E. S. Kim, S. Coogan, S. S. Sastry and S. A. Seshia, "A learning based approach to control synthesis of Markov decision processes for linear temporal logic specifications", in proceedings of 53rd IEEE Conference on Decision and Control, pp. 1091-1096, 2014. [162] M. Wen, R. Ehlers and U. Topcu, "Correct-by-synthesis reinforcement learning with temporal logic constraints", in proceedings of 54th IEEE Conference on Decision and Control, pp. 4983- 4990, 2015. [163] X. Li, Y. Ma and C. Belta, "A Policy Search Method For Temporal Logic Specified Reinforce- ment Learning Tasks", in proceedings of American Control Conference (ACC), pp. 240-245, 2018. [164] M. McIntire, D. Ratner, and S. Ermon, "Sparse Gaussian processes for Bayesian optimiza- tion", in proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI’16), AUAI Press, USA, pp. 517–526, 2016. 172 [165] A. Donze, and O. Maler, "Robust satisfaction of temporal logic over real-valued signals", in proceedings of in Int. Conf. on Formal Modeling and Analysis of Timed Systems, pp. 92-106, 2010. [166] G.E. Fainekos, and G. J. Pappas, "Robustness of temporal logic specifications for continuous- time signals", Theoretical Computer Science, vol. 410, no. 42, pp. 4262-4291, 2009. [167] L. Lindemann, and D. V. Dimarogonas, "Robust control for signal temporal logic specifications using discrete average space robustness", Automatica, vol. 101, pp. 377-387, 2019. [168] L. Lindemann, and D. V. Dimarogonas, "Control Barrier Functions for Signal Temporal Logic Tasks", in IEEE Control Systems Letters, vol. 3, no. 1, pp. 96-101, 2019. [169] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, The MIT Press, 2006. [170] F. L. Lewis, D. Vrabie, and V. Syrmos, Optimal Control, 3rd ed. New York: Wiley, 2012. [171] Y. Jiang, and Z. P. Jiang, "Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics", Automatica, vol. 48, no. 10, pp. 2699–2704, 2012. [172] H. K. Khalil, Nonlinear Systems, 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 2002. [173] Y. Engel, S. Mannor, and R. Meir, "Bayes meets Bellman: The Gaussian process approach to temporal difference learning", in Proceedings of the 20th International Conference on Machine Learning (ICML), pp. 154-161, 2003. [174] M. McIntire, D. Ratner, and S. Ermon, "Sparse Gaussian processes for Bayesian optimiza- tion", in Proceedings of conference on Uncertainty in Artificial Intelligence (UAI), pp. 154-161, 2016. [175] F. Berkenkamp, A.P. Schoellig, and A. Krause, "Safe controller optimization for quadro- tors with Gaussian processes", in IEEE International Conference on Robotics and Automation (ICRA), pp. 491-496, 2016. 
[176] J. Berberich, and F. Allgöwer, "A trajectory-based framework for data-driven system analysis and control", arXiv preprint arXiv:1903.10723, 2019. [177] J. C. Willems, P. Rapisarda, I. Markovsky, and B. De Moor, "A note on persistency of excitation", Systems & Control Letters, vol. 54, pp. 325–329, 2005. [178] A. M. de Souza, D. Meneghetti, M. Ackermann, and A. de Toledo Fleury, "Vehicle Dynamics- Lateral: Open Source Simulation Package for MATLAB", SAE Technical Paper, 2016. 173