RESILIENT AND SAFE CONTROL OF CYBER-PHYSICAL SYSTEMS UNDER UNCERTAINTIES AND ADVERSARIES

By

Aquib Mustafa

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Mechanical Engineering – Doctor of Philosophy

2020

ABSTRACT

RESILIENT AND SAFE CONTROL OF CYBER-PHYSICAL SYSTEMS UNDER UNCERTAINTIES AND ADVERSARIES

By Aquib Mustafa

The recent growth of cyber-physical systems with a wide range of applications such as smart grids, healthcare, search and rescue, and traffic monitoring, to name a few, brings new challenges to control systems due to the presence of significant uncertainties and undesired signals (i.e., disturbances and cyber-physical attacks). Thus, it is of vital importance to design resilient and safe control approaches that can adapt to the situation and mitigate adversaries to ensure an acceptable level of functionality and autonomy despite uncertainties and cyber-physical attacks.

This dissertation begins with the analysis of adversaries and the design of resilient distributed control mechanisms for multi-agent cyber-physical systems with guaranteed performance and consensus under mild assumptions. More specifically, the adverse effects of cyber-physical attacks are first analyzed for the synchronization of multi-agent cyber-physical systems. Then, information-theoretic detection and mitigation methods are presented by equipping agents with self-belief about the trustworthiness of their own information and trust about their neighbors. The effectiveness of the developed approach is then certified by applying it to distributed frequency and voltage synchronization of AC microgrids under data manipulation attacks. In the next step, to relax some connectivity assumptions on the network for the resilient control design, a distributed adaptive attack compensator is developed by estimating the normal expected behavior of agents. The adaptive attack compensator is augmented with the controller, and it is shown that the proposed controller achieves resilient synchronization in the presence of attacks on sensors and actuators. Moreover, this approach recovers compromised agents under actuator attacks and avoids propagation of attacks on sensors without discarding information from the compromised agents.

Then, the problem of secure state estimation for distributed sensor networks is considered. More specifically, the adverse effects of cyber-physical attacks on distributed sensor networks are analyzed, and an attack mitigation mechanism for the event-triggered distributed Kalman filter is presented. It is shown that although event-triggered mechanisms are highly desirable, an attacker can leverage the event-triggered mechanism to cause triggering misbehaviors which significantly harm the network connectivity and performance. Entropy estimation-based attack detection and mitigation mechanisms are then designed.

Finally, a safe reinforcement learning framework for autonomous control systems under constraints is developed. Reinforcement learning agents with pre-specified reward functions cannot provide guaranteed safety across the variety of circumstances that an uncertain system might encounter. To guarantee performance while assuring the satisfaction of safety constraints across a variety of circumstances, an assured autonomous control framework is designed by empowering reinforcement learning algorithms with meta-cognitive learning capabilities.
More specifically, the reward function parameters of the reinforcement learning agent are adapted in a meta-cognitive decision-making layer to assure the feasibility of the reinforcement learning agent.

Copyright by
AQUIB MUSTAFA
2020

ACKNOWLEDGEMENTS

I would like to express sincere thanks to my advisor Prof. Hamidreza Modares for his guidance, constant encouragement, and impeccable support during my doctoral research. I have been fortunate to learn the art of research under his exceptional tutelage. Apart from research, I have learned various things from him in my life during this journey.

I am thankful to my committee members, Prof. Ranjan Mukherjee, Prof. George Zhu, and Prof. Zhaojian Li, for their help and insightful comments. I thank all my collaborators for all their help throughout my doctoral work. I thank Prof. Ali Bidram for his guidance and immense support for Chapter 3 of this dissertation. I thank Dr. Majid Mazouchi for all his help and discussions in collaborative projects.

I believe the outcome of my years-long work was, by and large, a product of the excellent lab ambiance and support. I sincerely thank all my lab mates and friends. I am highly thankful to my close friends from Kanpur, Aligarh, Missouri, Colorado and of course Michigan for their help and support throughout this journey.

Finally, I would like to thank all my family members for their unconditional support and encouragement during this journey. Especially, I would like to thank my parents and elder brothers for their immense support and sacrifice. I shall remain ever indebted to them.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

CHAPTER 1 INTRODUCTION
  1.1 Motivation
  1.2 Literature Synopsis
    1.2.1 Resilient Control Design for Distributed Multi-Agent Systems
    1.2.2 Secure Distributed State Estimation
    1.2.3 Safe Reinforcement Learning
  1.3 Contributions and outline
  1.4 Publications resulted from this work

CHAPTER 2 RESILIENT SYNCHRONIZATION OF DISTRIBUTED MULTI-AGENT SYSTEMS UNDER ATTACKS
  2.1 Introduction
  2.2 Preliminaries
  2.3 Overview of Consensus in DMASs
  2.4 Attack Modelling and Analysis for DMASs
    2.4.1 Attack Modelling
    2.4.2 Attack Analysis
    2.4.3 Extension of Analysis Results to the Case of Noisy Communication
  2.5 An Attack Detection Mechanism
    2.5.1 Attack detection for IMP-based attacks
    2.5.2 Attack detection for non-IMP-based attacks
  2.6 An Attack Mitigation Mechanism
    2.6.1 Self-belief of agents about their outgoing information
    2.6.2 Trust of agents about their incoming information
    2.6.3 The mitigation mechanism using trust and self-belief values
  2.7 Simulation Results
    2.7.1 IMP-based attacks
    2.7.2 Non-IMP-based attacks
  2.8 Conclusion
  2.9 Appendix

CHAPTER 3 DETECTION AND MITIGATION OF DATA MANIPULATION ATTACKS IN AC MICROGRIDS
  3.1 Introduction
  3.2 Preliminaries
  3.3 Conventional Distributed Secondary Control
  3.4 Attack Modeling and Detection Mechanism
    3.4.1 Attack Modeling
    3.4.2 Attack Detection Mechanism
  3.5 Resilient Distributed Control Mechanism
    3.5.1 Belief of DERs About Their Own Observed Frequency
    3.5.2 Belief of DERs About Their Neighbor's Observed Frequency
    3.5.3 The Mitigation Mechanism Using Self and External-belief Values
  3.6 Case Studies
    3.6.1 Case A: Simulation results for IEEE 34-bus feeder
    3.6.2 Case B: Simulation results for an Islanded Microgrid with 20 DERs
    3.6.3 Case C: Experimental verification of proposed techniques using a hardware-in-the-loop testing setup
    3.6.4 Conclusion

CHAPTER 4 ATTACK ANALYSIS AND RESILIENT CONTROL DESIGN FOR DISCRETE-TIME DISTRIBUTED MULTI-AGENT SYSTEMS
  4.1 Introduction
  4.2 Preliminaries
    4.2.1 Graph Theory
    4.2.2 Standard Distributed Consensus in MAS
  4.3 Attack Analysis for Discrete-time DMAS
  4.4 Resilient Distributed Control Protocol for Attacks on Sensor and Actuator: An Adaptive Approach
  4.5 Simulation Results
  4.6 Conclusion

CHAPTER 5 SECURE EVENT-TRIGGERED DISTRIBUTED KALMAN FILTERS FOR STATE ESTIMATION OVER WIRELESS SENSOR NETWORKS
  5.1 Introduction
  5.2 Preliminaries
    5.2.1 Process Dynamics and Sensor Models
    5.2.2 Overview of Event-triggered Distributed Kalman Filter
    5.2.3 Attack Modeling
  5.3 Effect of Attack on Triggering Mechanism
    5.3.1 Non-triggering Misbehavior
    5.3.2 Continuous-triggering Misbehavior
  5.4 Attack Detection
  5.5 Secure Distributed Estimation Mechanism
    5.5.1 Confidence of sensor nodes
    5.5.2 Trust of sensor nodes about their incoming information
    5.5.3 Attack mitigation mechanism using confidence and trust of sensors
  5.6 Simulation Results
  5.7 Conclusion

CHAPTER 6 ASSURED LEARNING-ENABLED AUTONOMY: A METACOGNITIVE REINFORCEMENT LEARNING FRAMEWORK
  6.1 Introduction
  6.2 Preliminaries
    6.2.1 Notations
    6.2.2 Signal Temporal Logic
    6.2.3 Gaussian process
  6.3 Problem Statement and Motivation
  6.4 Metacognitive Control Architecture
    6.4.1 Metacognitive Layer Monitoring and Control
      6.4.1.1 Metacognitive Monitoring
      6.4.1.2 Metacognitive Control
  6.5 Low-Level RL-Based Control Architecture
  6.6 Simulation Results
  6.7 Conclusion

CHAPTER 7 CONCLUSION AND FUTURE WORK

APPENDIX
BIBLIOGRAPHY

LIST OF TABLES

Table 6.1: Vehicle Parameters

LIST OF FIGURES

Figure 2.1: Schematic representation of the proposed resilient approach for DMASs.
Figure 2.2: Communication topology.
Figure 2.3: The state of agents when Agent 1 is under an IMP-based attack.
Figure 2.4: Agent 5 is under IMP-based attack: the state of agents.
Figure 2.5: Agent 5 is under IMP-based attack: the local neighborhood tracking error of agents.
Figure 2.6: Divergence of the state of agents when Agent 5 is under an IMP-based attack.
Figure 2.7: The state of agents using the proposed attack detection and mitigation approach for an IMP-based attack.
Figure 2.8: The state of agents when Agent 5 is under a non-IMP-based attack.
Figure 2.9: Divergence of the state of agents when Agent 5 is under a non-IMP-based attack.
Figure 2.10: The state of agents after attack detection and mitigation for a non-IMP-based attack.
Figure 3.1: The flowchart of the proposed attack detection and mitigation approach.
Figure 3.2: Single-line diagram of the microgrid test system in Case A.
Figure 3.3: Communication graph of the microgrid test system in Case A.
Figure 3.4: Case A: Effect of attack on DSFC: (a) frequency; (b) active power ratio.
Figure 3.5: Case A: Relative entropy based on frequency of DERs.
Figure 3.6: Case A: Resilient DSFC: (a) frequency; (b) active power ratio.
Figure 3.7: Case A: Resilient DSFC: (a) relative entropy; (b) self-beliefs of DERs.
Figure 3.8: Effect of periodic attack on DSFC with 0.05 s duration: (a) frequency; (b) active power ratio.
Figure 3.9: Effect of periodic attack on DSFC with 0.05 s duration: (a) relative entropy; (b) self-beliefs of DERs.
Figure 3.10: Effect of periodic attack on DSFC with 0.5 s duration: (a) frequency; (b) active power ratio.
Figure 3.11: Effect of periodic attack on DSFC with 0.5 s duration: (a) relative entropy; (b) self-beliefs of DERs.
Figure 3.12: Effect of attack on DER 2 in DSVC: (a) voltage (V); (b) reactive power ratio.
Figure 3.13: Case A: Relative entropy based on voltage of DERs.
Figure 3.14: Case A: Resilient DSVC: (a) voltage (V); (b) reactive power ratio.
Figure 3.15: Case A: Resilient DSVC: (a) relative entropy; (b) self-belief of DERs.
Figure 3.16: Microgrid testbed with 20 DERs.
Figure 3.17: Communication graph of the microgrid testbed in Case B.
Figure 3.18: Case B: Effect of attack on DSFC: (a) frequency; (b) active power ratio.
Figure 3.19: Case B: Relative entropy based on frequency of DERs.
Figure 3.20: Case B: Resilient DSFC: (a) frequency; (b) active power ratio.
Figure 3.21: Case B: Resilient DSFC: (a) relative entropy; (b) self-belief of DERs.
Figure 3.22: Microgrid test system for HIL testing.
Figure 3.23: HIL setup.
Figure 3.24: Case C: Effect of attack on DSFC: (a) frequency; (b) active power ratio.
Figure 3.25: Case C: Relative entropy based on frequency of DERs.
Figure 3.26: Case C: Resilient DSFC: (a) frequency; (b) active power ratio.
Figure 4.1: Graph topology.
Figure 4.2: The agents' depth trajectories under the influence of the attacks on AUVs 2 and 3, without the adaptive compensator.
Figure 4.3: The agents' depth trajectories under the influence of the attacks on AUVs 2 and 3, with the adaptive compensator.
Figure 5.1: Effect of non-triggering misbehavior on sensor nodes {5,6}: the graph G is clustered into two isolated graphs G1 and G2.
Figure 5.2: Communication topology.
Figure 5.3: Sensor network without any attack: (a) state estimation errors; (b) transmit function for sensor 2.
Figure 5.4: Sensor node 2 under continuous-triggering misbehavior: (a) state estimation errors; (b) transmit function for sensor 2.
Figure 5.5: Sensor node 2 under non-triggering misbehavior: (a) state estimation errors; (b) transmit function for sensor 2.
Figure 5.6: Sensor node 2 under attack: (a) estimated KL divergence; (b) confidence of sensors.
Figure 5.7: State estimation errors under attack on sensor 2 using the proposed resilient state estimator.
Figure 6.1: Proposed metacognitive control scheme. S: the system to be controlled; K: low-level RL controller; C: high-level metacognitive layer scheme.
Figure 6.2: Lane-changing scenario for steering control of an autonomous vehicle.
Figure 6.3: Lane changing with a fixed value of the hyperparameter and no change in dynamics.
Figure 6.4: Constraint violation during lane changing with a fixed value of the hyperparameter and a change in dynamics.
Figure 6.5: Predicted fitness corresponding to the vehicle trajectory under normal operation.
Figure 6.6: Predicted fitness corresponding to the vehicle trajectory under constraint violation.
Figure 6.7: Overall fitness value under desired STL constraint violation for the vehicle trajectory.
Figure 6.8: Vehicle trajectory with hyperparameter adaptation based on Algorithm 1 for the lane-changing scenario.
Figure 6.9: Predicted fitness corresponding to the vehicle trajectory with hyperparameter adaptation based on Algorithm 1.
Figure 6.10: Overall fitness value under the desired STL constraint for the adapted vehicle trajectory.

CHAPTER 1 INTRODUCTION

This chapter presents the motivation, literature synopsis, and contributions of this dissertation.

1.1 Motivation

A cyber-physical system (CPS) refers to a class of engineering systems that integrates the cyber aspects of computation and communication with physical entities. Based on their control objectives, CPSs can be categorized into two classes, namely distributed multi-agent systems (DMASs) and networked control systems (NCSs). The control objective in a DMAS is to achieve a coordinated or synchronized motion or behavior through the exchange of local information among agents [1]-[4]. On the other hand, the control objective in an NCS, for which the feedback loops are closed through a communication network, is to regulate the system's output to a desired value or trajectory [86]. Despite their numerous applications in a variety of disciplines, DMASs and NCSs are cyber-physical systems that bring new challenges to control systems due to the presence of significant uncertainties and undesired signals (i.e., disturbances and cyber-physical attacks).
Thus, it is of vital importance to design resilient and safe control approaches that can adapt to the situation and mitigate adversaries to ensure an acceptable level of functionality and autonomy despite uncertainties and cyber-physical attacks.

The first part of this dissertation focuses on attack analysis and resilient designs for DMASs. In the case of synchronization of DMASs, the coordination objective is to guarantee that all agents reach agreement on a common value or trajectory of interest. DMASs are cyber-physical systems that incorporate communication as a cyber component to facilitate the exchange of information among agents. This, however, makes them vulnerable to a variety of attacks. In contrast to other undesirable inputs, such as disturbances and noises, attacks are intentionally planned to maximize the damage to the network. Therefore, it is important to analyze the adverse effects of attacks on performance and then design resilient DMASs that can mitigate attacks and guarantee an acceptable level of functionality despite attacks. To address this problem of interest, Chapters 2, 3, and 4 of this dissertation focus on attack analysis and resilient designs for DMASs.

Next, to perform monitoring and to successfully design a controller for systems where state measurements are not available, one needs to perform state estimation over wireless sensor networks (WSNs). WSNs are a class of multi-agent CPSs for which a set of sensors is spatially distributed to monitor and estimate a variable of interest (e.g., the location of a moving target, the state of a large-scale system, etc.), and they have various applications such as surveillance and monitoring, target tracking, and active health monitoring [103]. In centralized WSNs, all sensors broadcast their measurements to a center at which the information is fused to estimate the state [104]. These approaches, however, are communication demanding and prone to a single point of failure. To estimate the state with a reduced communication burden, a distributed Kalman filter (DKF) is presented in [105]-[111], in which sensors exchange their information only with their neighbors, not with all agents in the network or a central agent. Cost constraints on sensor nodes in a WSN result in corresponding constraints on resources such as energy and communication bandwidth. Sensors in a WSN usually carry limited, irreplaceable energy resources, and lifetime adequacy is a significant restriction of almost all WSNs. Therefore, it is important to design event-triggered DKFs to reduce the communication burden, which consequently improves energy efficiency. To this end, several energy-efficient event-triggered distributed state estimation approaches are presented for which sensor nodes intermittently exchange information [112]-[115]. Moreover, the importance of the event-triggered state estimation problem is also reported for several practical applications such as smart grids and robotics [116]-[119]. Although event-triggered distributed state estimation is resource-efficient, it provides an opportunity for an attacker to harm the network performance and its connectivity by corrupting the information that is exchanged among sensors, as well as to mislead the event-triggered mechanism. Thus, it is of vital importance to design a resilient event-triggered distributed state estimation approach that can perform accurate state estimation despite attacks.
To address this problem, Chapter 5 of this dissertation first analyzes the adverse effects of attacks and then presents a secure state estimator for distributed sensor networks.

Finally, the safe control design problem for autonomous CPSs under uncertainties is addressed. More specifically, a safe reinforcement learning (RL) framework for autonomous control systems under constraints is developed. RL is a goal-oriented learning approach, inspired by biological systems, and is concerned with designing agents that can take actions in an environment to maximize some notion of cumulative reward [138]-[141]. Despite the tremendous success of RL in a variety of applications, including robotics [142], control [143], and human-computer interaction [144], existing results are categorized as weak artificial intelligence (AI) [145]. That is, current RL practice has mainly been used to achieve pre-specified goals in structured environments by handcrafting a cost or reward function whose minimization guarantees reaching the goal. Strong AI, on the other hand, holds the promise of designing agents that can learn to achieve goals across multiple circumstances by generalizing to unforeseen and novel situations. As the designer cannot foresee all the circumstances that the agent might encounter, pre-specifying and handcrafting the reward function cannot guarantee reaching goals in an uncertain and non-stationary environment. Reward shaping [146]-[147] has been presented in the literature with the primary goal of speeding up learning without changing the outcome of the solution. Intrinsically motivated RL [148] has also been presented so that agents learn to shape their reward function for a better trade-off between exploration and exploitation, or to learn faster in applications where the external environmental reward is sparse. Thus, in Chapter 6 of this dissertation, to guarantee performance while assuring the satisfaction of safety constraints across a variety of circumstances, an assured autonomous control framework is presented by empowering RL algorithms with meta-cognitive learning capabilities.

1.2 Literature Synopsis

In this section, we review the literature in areas relevant to this dissertation. We organize the literature according to the broad topics of interest in this dissertation.

1.2.1 Resilient Control Design for Distributed Multi-Agent Systems

There has been a surge of interest in developing attack detection/identification and mitigation approaches for cyber-physical systems [5]-[25], including DMASs [16]-[25]. In [5]-[6], conditions under which an attacker can remain unnoticed are presented, followed by detection and identification mechanisms for attacks on sensors and actuators of cyber-physical systems. Resilient state estimation and control algorithms for cyber-physical systems under attacks are reported in [7]-[8]. A passivity-based attack mitigation mechanism is proposed in [9]. Teixeira et al. in [10] categorized attacks based on the attacker's knowledge, disclosure, and disruption resources, and characterized their impact using the concept of safe sets for cyber-physical systems. Various game-theoretic resilient state estimation approaches are presented in [11]-[14]. Recently, in [15], the authors presented proactive and reactive defense mechanisms to mitigate attacks on sensors and actuators.
Although elegant, these aforementioned results do not apply to DMASs, for which the overall objective is to synchronize the agents' states to some value of interest. For DMASs, attack detection and mitigation algorithms are presented in [16]-[18]. Mean square subsequence based resilient distributed control protocols for consensus of DMASs are presented in [19]-[21]. In these approaches, agents discard neighbors' information based on the discrepancy between their neighbors' values and their own values. Moreover, the maximum number of agents under attack is assumed known, and a network connectivity assumption is made based on it. Adaptive local resilient control protocols are designed in [22] to directly mitigate attacks, without identifying them, using an observer-based approach. An attacker-defender game framework is presented in [23] on networks with unknown graph topology, in which the defender injects control inputs to reach a consensus while attenuating the attack signal from compromised agents. Similarly, a controllability-Gramian-based game-theoretic approach is presented for resilient distributed consensus in [24]. A comprehensive survey on the security of cyber-physical systems is presented in [25], categorizing the reported results for DMASs into three classes, called prevention, resilience, and detection-isolation. Despite tremendous and welcome progress, most of the mentioned mitigation approaches for DMASs use the discrepancy among agents and their neighbors to detect and mitigate the effect of an attack. However, as shown in Chapters 2 and 4, a stealthy attack can make all agents unstable simultaneously and thus misguide existing mitigation approaches. Moreover, this discrepancy could be caused by a legitimate change in the state of an agent, and rejecting this useful information can decrease the speed of convergence to the desired consensus and harm the connectivity of the network.

Several remarkable results are presented for resilient control designs in important applications such as robotics and power systems. In [87]-[88], the authors presented resilient algorithms for flocking and active target tracking applications in robotics, respectively. The work presented in [87] ensures resilient consensus if the network connectivity is greater than a resilience threshold, under the assumption that a compromised agent can share wrong information but its actuator always works properly. The network connectivity constraints on the graph topology are relaxed in [89] by including trusted nodes. Similarly, the bulk of the research on the cybersecurity of power systems focuses mainly on attack detection techniques [66]-[73]. Different techniques, including adaptive cumulative sum using Markov-chain analysis [66], Kalman filters [67], a graphical method [68], a model-based scheme [69], a matrix separation technique [70], a Chi-square detector with a cosine similarity matching approach [71], and a nonlinear internal observer [72], are introduced for attack detection in power systems with a centralized control structure. The proposed attack detection filter in [16] and the systematic detection and localization strategy in [73] tackle attack detection in distributed control systems. In [74], signal temporal logic has been utilized for attack detection in a distributed control system. Attack mitigation has also recently been considered in power systems.
In [75], sensor fault detection and mitigation schemes are proposed to mitigate the impacts of cyber-attacks in DC power systems with a centralized control structure. Reference [76] proposes a trust/confidence-based approach for cyber-attack mitigation in the distributed control system of DC microgrids. In [78], a two-fold strategy is proposed to mitigate the impacts of FDI attacks on the control system of the shipboard power system. In [79], a trust/confidence-based control protocol is proposed to mitigate the impact of attacks on the distributed secondary control of AC microgrids. This approach, however, only considers the secondary frequency control and does not address the attack mitigation of secondary voltage control. In Chapter 3, we present FDI-attack detection and mitigation approaches for distributed secondary control of microgrids that are not limited to any specific type of attack, with only mild restrictions on network connectivity.

1.2.2 Secure Distributed State Estimation

In recent years, secure state estimation of CPSs has received significant attention, and remarkable results have been reported for the mitigation of cyber-physical attacks, including denial-of-service attacks [10], [120], false data injection attacks [5]-[7], [121]-[122], and bias injection attacks [36], [123]. For the time-triggered distributed scenario, several secure state estimation approaches are presented in [124]-[131]. Specifically, in [124]-[132], the authors presented a distributed estimator that allows agents to perform parameter estimation in the presence of attack by discarding information from the adversarial agents. A Byzantine-resilient distributed estimator with deterministic process dynamics is discussed in [126]. The same authors then solved the resilient distributed estimation problem with communication losses and intermittent measurements in [127]. Attack analysis and detection for distributed Kalman filters are discussed in [128]. Resilient state estimation subject to denial-of-service attacks for power system and robotics applications is presented in [129]-[131]. Although elegant, these aforementioned results for time-triggered resilient state estimation do not apply to event-triggered distributed state estimation problems. To solve this problem, we analyze the effect of adversaries and design secure event-triggered distributed Kalman filters for state estimation over wireless sensor networks in Chapter 5.

1.2.3 Safe Reinforcement Learning

In the control community, several RL-based feedback controllers have been presented for the control of uncertain dynamical systems [149]-[153]. In these traditional RL-based controllers, the reinforcement signal feedback is derived through a fixed quadratic objective function [154]-[155]. A fixed reward or objective function, however, cannot guarantee achieving desired specifications across all circumstances. To express rich specifications rather than quadratic objectives, temporal logic, as an expressive language close to human language, has been widely used. Temporal logic is well suited for specifying goals and introducing domain knowledge into the RL problem [156]-[159]. RL with temporal logic specifications has also been used recently [160]-[163]. However, defining the rewards solely based on temporal logic specifications and ignoring numerical rewards can result in sparse feedback in control systems and cannot include other performance objectives such as energy and time minimization.
Moreover, the system pursues several objectives, and as the circumstance changes, the system's needs and priorities also change, requiring the reward signal to be adapted to encode these needs and priorities for the context. It is therefore desired to design a controller that provides good enough performance across a variety of circumstances while assuring that its safety-related temporal logic specifications are satisfied. To this end, Chapter 6 of this dissertation takes a step towards strong AI for feedback control design by presenting a notion of an adaptive reward function, introducing a metacognitive layer that decides on what reward function to optimize depending on the circumstance. More specifically, a metacognitive assured RL framework is presented to learn control solutions with good performance while satisfying desired specifications.

1.3 Contributions and outline

In this section, we outline the organization of the chapters in this dissertation and provide the contributions of each chapter. The key contributions of the dissertation are listed as follows.

• Attack Analysis and Resilient Designs for Multi-agent CPSs

Chapter 2: In this chapter, we first address the adverse effects of attacks on distributed synchronization of multi-agent systems by providing conditions under which an attacker can destabilize the underlying network, as well as another set of conditions under which the local neighborhood tracking errors of intact agents converge to zero. Based on this analysis, we propose a Kullback-Leibler divergence-based criterion in view of which each agent detects its neighbors' misbehavior and, consequently, forms a self-belief about the trustworthiness of the information it receives. Agents continuously update their self-beliefs and communicate them with their neighbors to inform them of the significance of their outgoing information. Moreover, if the self-belief of an agent is low, it forms trust in its neighbors. Agents incorporate their neighbors' self-beliefs and their own trust values in their control protocols to slow down and mitigate attacks. We show that using the proposed resilient approach, an agent discards the information it receives from a neighbor only if its neighbor is compromised, and not solely based on the discrepancy among neighbors' information, which might be caused by legitimate changes rather than attacks. The proposed approach is guaranteed to work under mild connectivity assumptions.

Chapter 3: This chapter presents a resilient control framework for distributed frequency and voltage control of AC microgrids under data manipulation attacks. In order for each distributed energy resource (DER) to detect any misbehavior of its neighboring DERs, an attack detection mechanism is first presented using a Kullback-Leibler divergence-based criterion. An attack mitigation technique is then proposed that utilizes the calculated KL divergence factors to determine trust values indicating the trustworthiness of the received information. Moreover, DERs continuously generate a self-belief factor and communicate it with their neighbors to inform them of the validity level of their own outgoing information. DERs incorporate their neighbors' self-beliefs and their own trust values in their control protocols to slow down and mitigate attacks. It is shown that the proposed cybersecure control effectively distinguishes data manipulation attacks from legitimate events.
The performance of the proposed resilient frequency and voltage control techniques is verified through simulation of microgrid test systems and a hardware-in-the-loop (HIL) setup using Opal-RT as a real-time digital simulator.

Chapter 4: This chapter analyzes the adverse effects of cyber-physical attacks on discrete-time distributed multi-agent systems and proposes a mitigation approach for attacks on sensors and actuators. First, we show how an attack on a single node snowballs into a network-wide attack and can even destabilize the entire system. Next, to overcome the adversarial effects of attacks on sensors and actuators, a distributed adaptive attack compensator is designed by estimating the normal expected behavior of agents. The adaptive attack compensator is augmented with the controller, and it is shown that the proposed controller achieves secure consensus in the presence of attacks on sensors and actuators. No restrictive assumption on the number of agents under adversarial input is made. Moreover, the approach recovers compromised agents under actuator attacks and avoids propagation of attacks on sensors without discarding information from the compromised agents. Finally, numerical simulations validate the effectiveness of the presented theoretical contributions on a network of Sentry autonomous underwater vehicles.

Chapter 5: In this chapter, we analyze the adverse effects of cyber-physical attacks as well as mitigate their impacts on the event-triggered distributed Kalman filter (DKF). We first show that although event-triggered mechanisms are highly desirable, an attacker can leverage the event-triggered mechanism to cause non-triggering misbehavior which significantly harms the network connectivity and its collective observability. We also show that an attacker can mislead the event-triggered mechanism to achieve continuous-triggering misbehavior which not only drains the communication resources but also harms the network's performance. An information-theoretic approach is presented next to detect attacks on both sensors and communication channels. In contrast to existing results, a restrictive Gaussian assumption on the attack signal's probability distribution is not required. To mitigate attacks, a meta-Bayesian approach is presented that incorporates the outcome of the attack detection mechanism to perform second-order inference. The proposed second-order inference forms confidence and trust values about the truthfulness or legitimacy of sensors' own estimates and those of their neighbors, respectively. Each sensor communicates its confidence to its neighbors. Sensors then incorporate the confidence they receive from their neighbors and the trust they formed about their neighbors into their posterior update laws to successfully discard corrupted information. Finally, simulation results validate the effectiveness of the presented resilient event-triggered DKF.

• Safe Reinforcement Learning for Autonomous Systems: A Metacognitive Framework

Chapter 6: This chapter presents a safe reinforcement learning framework for autonomous control systems under constraints. RL agents with pre-specified reward functions cannot provide guaranteed safety across the variety of circumstances that an uncertain system might encounter. To guarantee performance while assuring satisfaction of safety constraints across a variety of circumstances, an assured autonomous control framework is presented by empowering RL algorithms with metacognitive learning capabilities.
More specifically, adapting the reward function parameters of the RL agent is performed in a metacognitive decision-making layer to assure the feasibility of the RL agent, that is, to assure that the policy learned by the RL agent satisfies safety constraints specified by signal temporal logic while achieving as much performance as possible. The metacognitive layer monitors any possible future safety violation under the actions of the RL agent and employs a higher-layer Bayesian RL algorithm to proactively adapt the reward function for the lower-layer RL agent. To minimize the higher-layer Bayesian RL intervention, a fitness function is leveraged by the metacognitive layer as a metric to evaluate the success of the lower-layer RL agent in satisfying safety and liveness specifications, and the higher-layer Bayesian RL intervenes only if there is a risk of lower-layer RL failure. Finally, a simulation example is provided to validate the effectiveness of the proposed approach.

1.4 Publications resulted from this work

Journal Articles:

1. A. Mustafa, H. Modares and R. Moghadam, "Resilient Synchronization of Distributed Multi-agent Systems under Attacks", Automatica, vol. 115, 2020.
2. A. Mustafa and H. Modares, "Attack Analysis and Resilient Control Design for Discrete-time Distributed Multi-agent Systems", IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 369-376, 2020.
3. A. Mustafa, B. Poudel, A. Bidram and H. Modares, "Detection and Mitigation of Data Manipulation Attacks in AC Microgrids", IEEE Transactions on Smart Grid, vol. 11, no. 3, pp. 2588-2603, 2020.
4. B. Poudel, A. Mustafa, A. Bidram and H. Modares, "Detection and Mitigation of Cyber-threats in the DC Microgrid Distributed Control System", International Journal of Electrical Power and Energy Systems, vol. 120, 2020.
5. A. Mustafa, M. Mazouchi and H. Modares, "Secure Event-Triggered Distributed Kalman Filters for State Estimation", IEEE Transactions on Systems, Man and Cybernetics: Systems. (Under review)
6. A. Mustafa, M. Mazouchi, H. Modares and S.P. Nageshrao, "Assured Learning-enabled Autonomy: A Metacognitive Reinforcement Learning Framework", IEEE Transactions on Neural Networks and Learning Systems. (Under review)

Conferences:

1. A. Mustafa and H. Modares, "Attack Analysis for Discrete-time Distributed Multi-Agent Systems", 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 230-237, 2019.
2. A. Mustafa and H. Modares, "Analysis and Detection of Cyber-physical Attacks in Distributed Sensor Networks", 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 973-980, 2018.

CHAPTER 2 RESILIENT SYNCHRONIZATION OF DISTRIBUTED MULTI-AGENT SYSTEMS UNDER ATTACKS

2.1 Introduction

In this chapter, we present attack analysis, detection, and mitigation mechanisms for distributed multi-agent systems (DMASs). First, the adverse effects of cyber-physical attacks on the synchronization of DMASs are described, supported with analysis. Specifically, conditions under which an attack can destabilize the entire network are provided. Moreover, conditions are provided under which the local neighborhood tracking error of intact agents becomes zero while the agents are far from synchronization. The results of this analysis enable us to design detection mechanisms for sophisticated and threatening attacks.
Then, two attack detectors are designed based on Kullback-Leibler (KL) divergence metrics: one to detect attacks that make the local neighborhood tracking error of intact agents zero, and one for attacks for which this error cannot be zero. These detectors are then combined to detect a variety of deception attacks. Finally, to mitigate attacks, self-belief (i.e., the belief in the trustworthiness of an agent's own information) and trust (i.e., the belief in the trustworthiness of neighbors' information) metrics are introduced based on the results from the detection mechanism. A weighted local neighborhood tracking error is introduced in which each agent incorporates its trust in its neighbors as well as the self-beliefs of its neighbors.

2.2 Preliminaries

A directed graph (digraph) G consists of a pair (V, E) in which V = {v_1, ..., v_N} is a set of nodes and E ⊆ V × V is a set of edges. We denote the directed link (edge) from v_j to v_i by the ordered pair (v_j, v_i). The adjacency matrix is defined as A_d = [a_ij], with a_ij > 0 if (v_j, v_i) ∈ E, and a_ij = 0 otherwise. The nodes in the set N_i = {v_j : (v_j, v_i) ∈ E} are said to be neighbors of node v_i. The in-degree of v_i is the number of edges having v_i as a head. The out-degree of a node v_i is the number of edges having v_i as a tail. If the in-degree equals the out-degree for all nodes v_i ∈ V, the graph is said to be balanced. The graph Laplacian matrix is defined as L = D − A_d, where D = diag(d_i) is the in-degree matrix, with d_i = \sum_{j ∈ N_i} a_ij as the weighted in-degree of node v_i. A node is called a root node if it can reach all other nodes of the digraph G through a directed path. A leader is a root node with no incoming link. A (directed) tree is a connected digraph where every node except one, called the root, has in-degree equal to one. A spanning tree of a digraph is a directed tree formed by graph edges which connects all the nodes of the graph.

Throughout the chapter, we denote the set of integers by Z. The set of integers greater than or equal to some integer q ∈ Z is denoted by Z_{≥q}. The cardinality of a set S is denoted by |S|. λ(A) and tr(A) denote, respectively, the eigenvalues and trace of the matrix A. Furthermore, λ_min(A) represents the minimum eigenvalue of the matrix A. The Kronecker product of matrices A and B is denoted by A ⊗ B, and diag(A_1, ..., A_n) represents a block diagonal matrix with matrices A_i, ∀i ∈ N, as its diagonal entries. 1_N is the N-vector of ones and I_N is the N × N identity matrix. ||A|| denotes the Euclidean norm of A. span(a_1, ..., a_n) represents the set of all linear combinations of the vectors a_1, ..., a_n. A Gaussian distribution with mean µ and covariance Σ is denoted by N(µ, Σ). Moreover, FN(µ̄, σ̄²) represents a univariate folded Gaussian distribution with µ̄ and σ̄² as mean and variance, respectively [26]. E[.] denotes the expectation operator. The term "statistical properties" is used for error sequences in this chapter to denote their mean and variance. A system is called stable (i.e., Hurwitz) if all its eigenvalues have negative real parts, and unstable if it has eigenvalues with positive real parts or a repeated pair of eigenvalues on the imaginary axis. In this chapter, the term "destabilize" is used when the attacker makes the system unstable.

Assumption 1. The communication graph G is directed and has a spanning tree.

Note that having a spanning tree is the minimum requirement to guarantee synchronization over a directed graph [27].
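The graph objects defined above are easy to compute numerically. The following minimal sketch (Python/NumPy, not part of the original development) builds the adjacency matrix, in-degree matrix, and Laplacian L = D − A_d for a hypothetical 5-node directed cycle, and checks the spanning-tree condition of Assumption 1 through the multiplicity of the zero eigenvalue of L (the property formalized in Lemma 1 below):

```python
import numpy as np

# Hypothetical 5-node digraph: Ad[i, j] = a_ij > 0 iff edge (v_j, v_i) exists,
# i.e., node i receives information from node j.
Ad = np.array([[0, 0, 0, 0, 1],
               [1, 0, 0, 0, 0],
               [0, 1, 0, 0, 0],
               [0, 0, 1, 0, 0],
               [0, 0, 0, 1, 0]], dtype=float)

D = np.diag(Ad.sum(axis=1))   # in-degree matrix D = diag(d_i), d_i = sum_j a_ij
L = D - Ad                    # graph Laplacian L = D - A_d

eig = np.linalg.eigvals(L)
# Lemma 1: zero is a simple eigenvalue of L iff G has a spanning tree,
# and all nonzero eigenvalues of L have positive real parts.
print("eigenvalues of L:", np.round(eig, 3))
print("spanning tree (simple zero eigenvalue):", np.sum(np.abs(eig) < 1e-9) == 1)
```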
A square matrix A ∈ Rn×n is called a singular (non-singular) M-matrix, if all its off-diagonal elements are non-positive and all its eigenvalues have non- (cid:3) negative (positive) real parts. Lemma 1 [27]-[28]. The graph Laplacian matrix L of a directed graph G has at least one zero eigenvalue, and all its nonzero eigenvalues have positive real parts. Zero is a simple eigenvalue of L, if and only if Assumption 1 is satisfied. 2.3 Overview of Consensus in DMASs In this section, we provide an overview of the consensus problem for leaderless DMAS. Consider a group of N homogeneous agents with linear identical dynamics described by ∀ i ∈ N , ˙xi(t) = Axi(t) + Bui(t) (2.1) where xi ∈ Rn and ui ∈ Rm denote, respectively, the state and the control input of agent i. The matrices A ∈ Rn×n and B ∈ Rn×m are, respectively, the drift dynamics and the input matrix. Problem 1. Design local control protocols ui for all agents ∀i ∈ N in (2.1) such that all agents reach consensus or synchronization on some common value or trajectory of interest, i.e., t→∞||xj(t) − xi(t)|| = 0 ∀ i, j ∈ N . lim (2.2) Assumption 2. The system dynamics matrix A in (2.1) is assumed to be marginally stable with all eigenvalues on the imaginary axis [29]. Define the local neighborhood tracking error for the agent i as N(cid:88) ηi(t) = aij(xj(t) − xi(t)), with aij as the (i, j)-th entry of the graph adjacency matrix Ad. j=1 Consider the distributed control protocol for each agent i as [27]-[28] ui(t) = cKηi(t) ∀ i ∈ N , 15 (2.3) (2.4) where c and K ∈ Rm×n denote, respectively, the scalar coupling gain and the feedback control gain matrix. Several approaches are presented to design c and K locally to solve Problem 1 [27]- [30]. To this end, the gains c and K are designed such that A − cλiBK is Hurwitz for all i = 2, . . . , N [27]-[30], with λi as the ith eigenvalue of the graph Laplacian matrix L. In the subsequent sections, we assume that c and K are designed locally by each agent and without using a central agent appropriately to solve problem 1 in the absence of attack. We then analyze the effect of attacks and propose mitigation approaches. Remark 1. Note that the presented results subsume the leader-follower synchronization problem and the average consensus as special cases. For the leader-follower case, the leader is only root node, whereas for the average consensus case, the graph is assumed to be balanced (cid:3) and A = 0 and B = Im. 2.4 Attack Modelling and Analysis for DMASs In this section, attacks on agents are modelled and a complete attack analysis is provided. 2.4.1 Attack Modelling In this subsection, attacks on DMASs are modelled. Attacks on actuators of agent i can be modelled as i = ui + βiud uc i , (2.5) i and uc i denote, respectively, the nominal value of the control protocol for agent where ui, ud i in (2.1), the disrupted signal directly injected into actuators of agent i, and the corrupted control protocol of agent i. If agent i is under actuator attack, then βi = 1, otherwise βi = 0. Similarly, one can model attacks on sensors of agent i as i = xi + αixd xc i , (2.6) 16 i and xc where xi, xd i denote, respectively, the nominal value of the state of agent i in (2.3), the disrupted signal directly injected into sensors of agent i, and the corrupted state of agent i. If agent i is under sensor attack, then αi = 1, otherwise αi = 0. 
Using the corrupted state (2.6) in the controller (2.3)-(2.4), with the corrupted control input (2.5) in (2.1), the system dynamics under attack becomes

\dot{x}_i = A x_i + B u_i + B f_i, ∀i ∈ N,   (2.7)

where f_i denotes the overall attack affecting agent i, which can be written as

f_i = β_i u_i^d + cK ( \sum_{j ∈ N_i} a_ij (α_j x_j^d − α_i x_i^d) ),   (2.8)

with u_i^d and x_i^d as attacks directly on the actuators and sensors of agent i, respectively, and x_j^d as the disruption in the received state of the j-th neighbor due to an attack signal injected either into its sensors or actuators or into the incoming communication link from agent j to agent i.

The following definition categorizes all attacks into two categories. The first type of attack exploits the knowledge of the system dynamics A and uses it in the design of the attack signal. That is, for the first type of attack, f_i in (2.8) satisfies

\dot{f}_i = Ψ f_i,   (2.9)

where Ψ ∈ R^{m×m} depends on the knowledge of the system dynamics A, as discussed in Definition 2. On the other hand, for the second type of attack, the attacker has no knowledge of the system dynamics A; this covers all other attacks that are not in the form of (2.9). Define

E_Ψ = {λ_1(Ψ), ..., λ_m(Ψ)},  E_A = {λ_1(A), ..., λ_n(A)},   (2.10)

where λ_i(Ψ), ∀i = 1, ..., m, and λ_i(A), ∀i = 1, ..., n, are, respectively, the eigenvalues of the attack signal generator dynamics matrix Ψ and of the system dynamics matrix A.
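The containment E_Ψ ⊆ E_A, which Definition 2 below uses to classify attacks, can be checked numerically. A minimal hedged sketch, with a hypothetical helper `is_imp_based` and illustrative matrices:

```python
import numpy as np

# Hypothetical helper: checks the modal containment E_Psi ⊆ E_A from (2.10).
def is_imp_based(Psi, A, tol=1e-8):
    e_psi = np.linalg.eigvals(np.atleast_2d(Psi))
    e_a = np.linalg.eigvals(np.atleast_2d(A))
    # every attacker mode must coincide (up to tol) with some system mode
    return all(np.abs(e_a - lam).min() < tol for lam in e_psi)

A = np.array([[0.0, 1.0], [-1.0, 0.0]])        # marginally stable, modes +/- 1j
Psi_imp = A.copy()                             # E_Psi = {+/- 1j} ⊆ E_A: IMP-based
Psi_non = np.array([[0.0, 2.0], [-2.0, 0.0]])  # modes +/- 2j not in E_A: non-IMP
print(is_imp_based(Psi_imp, A), is_imp_based(Psi_non, A))   # True False
```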
That is, in the absence of attack, uc(t) = u(t) goes to zero (i.e., uc(t) → 0), and the global dynamics of agents become ˙xss(t) = (IN ⊗ A) xss(t), (2.14) where xss = lim Definition 5 (Steady State and Reaching Consensus). We say that agents with the t→∞x(t) is called the global steady state of agents. dynamics given by (2.7) and the global dynamics given by (2.12) reach a steady state if (2.14) is satisfied, i.e., if uc(t) → 0 in (2.13). In the absence of attack, if agents reach a (cid:3) steady state, then, agents reach consensus, i.e., t→∞||xj(t) − xi(t)|| = 0 ∀ i, j ∈ N . lim In the presence of attack, whether agents reach a steady state or not, i.e., Remark 3. whether uc(t) → 0 or uc(t) (cid:54)→ 0, plays an important role in the attack analysis and mitigation to follow. Reaching a steady state is necessary for agents to achieve consensus based on Definition 5. However, we show that even if agents reach a steady state, they do not achieve (cid:3) consensus if the system is under attack. 2.4.2 Attack Analysis In this subsection, a graph theoretic-based approach is utilized to analyze the effect of attacks on DMASs. To this end, the following notation and lemmas are used. Let the graph Laplacian matrix L be partitioned as [27]  Lr×r L = 0r×nr Lnr×r Lnr×nr  , where r and nr in (2.15) denote, respectively, the number of root nodes and non-root nodes. Moreover, Lr×r and Lnr×nr are, respectively, the sub-graph matrices corresponding to the 19 (2.15) sub-graphs of root nodes and non-root nodes. The result of the following lemma is used in the proof of Theorem 1 to show that the local neighborhood tracking error goes to zero even in the presence of attack. Lemma 2. Consider the partitioned graph Laplacian matrix (2.15). Then, Lr×r is a singular M-matrix and Lnr×nr is a non-singular M-matrix. Proof. We first prove that the subgraph of root nodes is strongly connected. According to the definition of a root node, there always exists a directed path from a root node to all other nodes of the graph G, including other root nodes. Therefore, in the graph G, there always exists a path from each root node to all other root nodes. We now show that removing non-root nodes from the graph G does not affect the connectivity of the subgraph comprised of only root nodes. In the graph G, if a non-root node is not an incoming neighbor of a root node, then its removal does not harm the connectivity of the subgraph of the root nodes. Suppose that removing a non-root node affects the connectivity of the subgraph of root nodes. This requires the non-root node to be an incoming neighbor of a root node. However, this makes the removed node a root node, as it can now access all other nodes through the root node it is connected to. Hence, this argument shows that the subgraph of root nodes is always strongly connected. Then, based on Lemma 1, Lr×r has zero as one of its eigenvalues, which implies that Lr×r is a singular M-matrix according to Definition 1. On the other hand, from (2.15), since L is a lower triangular matrix, the eigenvalues of L are the union of the eigenvalues of Lr×r and Lnr×nr. Moreover, as stated in Lemma 1, L has a simple zero eigenvalue and, as shown above, zero is the eigenvalue of Lr×r. Therefore, all eigenvalues of Lnr×nr have positive real parts only, and thus based on Definition 1, Lnr×nr is a non-singular M-matrix. In the following Lemmas 3-4 and Theorem 1, we now provide the conditions under which the agents can reach a steady state. Lemma 3. 
In the following Lemmas 3-4 and Theorem 1, we provide the conditions under which the agents can reach a steady state.

Lemma 3. Consider the global dynamics of the DMAS (2.12) under attack. Let the attack signal f(t) be a non-IMP-based attack with f(t) ≠ 0. Then, agents cannot reach a steady state, i.e., u_c(t) ↛ 0.

Proof. We prove this result by contradiction. Assume that the attack signal f(t) is a non-IMP-based attack, i.e., E_Ψ ⊄ E_A, but u_c(t) → 0 in (2.12), which implies ẋ_i → A x_i for all i ∈ N. Using the modal decomposition, one has

x_i(t) → Σ_{j=1}^{n} (r_j x_i(0)) e^{λ_j(A) t} m_j,   (2.16)

where r_j and m_j denote, respectively, the left and right eigenvectors associated with the eigenvalue λ_j(A). On the other hand, based on (2.13), u_c(t) → 0 implies f(t) → (cL ⊗ K) x(t), or equivalently

f_i(t) → −cK Σ_{j∈N_i} a_ij (x_j(t) − x_i(t))  ∀ i ∈ N.   (2.17)

As shown in (2.16), the right-hand side of (2.17) is generated by the natural modes of the system dynamics, whereas the left-hand side is generated by the natural modes of the attack-signal generator dynamics in (2.9). Since, by assumption, E_Ψ ⊄ E_A, the attacker's natural modes differ from those of the system dynamics. Therefore, (2.17) cannot be satisfied, which contradicts the assumption. This completes the proof. □

Equation (2.17) in Lemma 3 also shows that for non-IMP-based attacks, the local neighborhood tracking error is nonzero for a compromised agent. The following results show that under an IMP-based attack, either the agents' states diverge, or they reach a steady state while their local neighborhood tracking errors converge to zero despite the attack. The following lemma is needed in Theorem 1, which gives conditions under which agents reach a steady state under an IMP-based attack. Theorem 2 then shows under what conditions an IMP-based attack makes the entire network of agents unstable. Define

S_A(t) = [e^{λ_{A_1} t}, …, e^{λ_{A_n} t}],  S_Ψ(t) = [e^{λ_{Ψ_1} t}, …, e^{λ_{Ψ_m} t}],   (2.18)

where e^{λ_{A_i} t}, i = 1, …, n and e^{λ_{Ψ_i} t}, i = 1, …, m are, respectively, the natural modes of the agent dynamics A in (2.1) and of the attacker dynamics Ψ in (2.9).

Lemma 4. Consider the global dynamics of the DMAS (2.12) under attack on non-root nodes. Then, for an IMP-based attack, agents reach a steady state, i.e., u_c(t) → 0.

Proof. According to (2.14), in steady state one has ẋ_ss(t) → (I_N ⊗ A) x_ss(t) since u_c(t) → 0. This implies that x_ss(t) ∈ span(S_A), where S_A is defined in (2.18). On the other hand, if agents reach a steady state, then based on (2.13) one has

(cL ⊗ K) x_ss(t) = f(t).   (2.19)

Define the global steady-state vector x_ss(t) = [x̄_rs^T, x̄_nrs^T]^T, where x̄_rs and x̄_nrs are, respectively, the global steady states of the root nodes and the non-root nodes. Since the attack is only on non-root nodes, f(t) can be written as f(t) = [0_r, f̄_nr^T]^T, where f̄_nr = [f_{r+1}^T, …, f_N^T]^T represents the attack vector on the non-root nodes. Then, using (2.15) and (2.19), one has

(c L_{r×r} ⊗ K) x̄_rs = 0,
(c L_{nr×r} ⊗ K) x̄_rs + (c L_{nr×nr} ⊗ K) x̄_nrs = f̄_nr.   (2.20)

As stated in Lemma 2, L_{r×r} is a singular M-matrix with zero as an eigenvalue and 1_r as its corresponding right eigenvector; thus, the solution to the first equation of (2.20) becomes x̄_rs = c_1 1_r for some positive scalar c_1. Using x̄_rs = c_1 1_r in the second equation of (2.20), the global steady state of the non-root nodes becomes

x̄_nrs = (c L_{nr×nr} ⊗ K)^{−1} [ −(c L_{nr×r} ⊗ K) c_1 1_r + f̄_nr ].   (2.21)

Equation (2.21) shows that the steady states of the non-root nodes are affected by the attack signal f(t).
If E_Ψ ⊄ E_A, this results in x̄_nrs ∈ span(S_A, S_Ψ), where S_A and S_Ψ are defined in (2.18), which contradicts x_ss(t) ∈ span(S_A). Therefore, the condition E_Ψ ⊆ E_A is necessary to conclude that for any f = [0_r, f̄_nr^T]^T there exists a steady-state solution x_ss(t), i.e., that u_c(t) → 0 holds. This completes the proof. □

The following theorem provides necessary and sufficient conditions for IMP-based attacks to make agents reach a steady state, i.e., u_c(t) → 0 in (2.13).

Theorem 1. Consider the global dynamics of the DMAS (2.12) with the control protocol (2.13), where the attack signal f(t) is generated by an IMP-based attack. Then, agents reach a steady state, i.e., u_c(t) → 0, if and only if the attack signals satisfy

Σ_{k=1}^{N} p_k f_k = 0,   (2.22)

where the p_k are the nonzero elements of the left eigenvector of the graph Laplacian matrix L associated with its zero eigenvalue.

Proof. It was shown in Lemma 4 that for an IMP-based attack on non-root nodes, agents reach a steady state, i.e., u_c(t) → 0. Therefore, whether agents reach a steady state or not depends solely upon the attacks on root nodes. Let f(t) = [f̄_r, f̄_nr], where f̄_r = [f_1^T, …, f_r^T]^T represents the vector of attacks on the root nodes. We first prove the necessary condition for root nodes. If u_c(t) → 0, then, using (2.15) and (2.19), there exists a nonzero vector x̄_rs for the root nodes such that

(c L_{r×r} ⊗ K) x̄_rs = f̄_r,   (2.23)

where x̄_rs can be considered the global steady state of the root nodes. Moreover, based on Lemma 3, (2.23) does not hold if E_Ψ ⊄ E_A, which implies that (2.23) is true only for E_Ψ ⊆ E_A. As stated in Lemma 2, L_{r×r} corresponds to the strongly connected subgraph of root nodes and is therefore a singular M-matrix. Let w̄^T = [p_1, …, p_r] be the left eigenvector associated with the zero eigenvalue of L_{r×r}. Pre-multiplying both sides of (2.23) by w̄^T and using the fact that w̄^T L_{r×r} = 0 yields

w̄^T (c L_{r×r} ⊗ K) x̄_rs = w̄^T f̄_r = 0.   (2.24)

This states that IMP-based attacks on root nodes have to satisfy Σ_{k=1}^{N} p_k f_k = 0 to ensure that agents reach a steady state, i.e., u_c(t) → 0. Note that p_k = 0 for k = r+1, …, N, i.e., the elements of the left eigenvector of the graph Laplacian matrix L corresponding to its zero eigenvalue are zero for the non-root nodes [27]-[28]. This proves the necessity part.

Now, we prove the sufficiency part by contradiction for root nodes. Assume agents reach a steady state, i.e., u_c(t) → 0, but Σ_{k=1}^{N} p_k f_k ≠ 0. Reaching a steady state implies that there exists a nonzero vector x̄_rs such that (2.23) holds. Using (2.24) and Σ_{k=1}^{N} p_k f_k ≠ 0, one can conclude that w̄^T (c L_{r×r} ⊗ K) x̄_rs ≠ 0. This can happen only when L_{r×r} does not have a zero eigenvalue, which violates the fact, from Lemma 2, that L_{r×r} corresponds to a strongly connected graph and is a singular M-matrix. Therefore, w̄^T (c L_{r×r} ⊗ K) x̄_rs = 0, which results in Σ_{k=1}^{N} p_k f_k = 0 and contradicts the assumption made. This completes the proof. □
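Condition (2.22) is easy to test once the left eigenvector of L for the zero eigenvalue is available. A minimal sketch, reusing the hypothetical four-agent digraph from above with hypothetical scalar attack magnitudes:

```python
import numpy as np

adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 0, 0],
                      [1, 0, 0, 0],
                      [0, 1, 1, 0]], dtype=float)
L = np.diag(adjacency.sum(axis=1)) - adjacency

# Left eigenvector p of L for the zero eigenvalue (p^T L = 0); its entries
# vanish on the non-root nodes, as noted in the proof of Theorem 1.
eigvals, left_vecs = np.linalg.eig(L.T)
p = np.real(left_vecs[:, np.argmin(np.abs(eigvals))])
p = p / p.sum()

f = np.array([2.0, -2.0, 0.0, 5.0])   # hypothetical attack magnitudes
print(p, np.isclose(p @ f, 0.0))      # True: (2.22) holds, steady state exists
```

Note that the attack on the non-root agent (the last entry of f) does not enter the condition, consistent with Lemma 4.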
The following theorem provides conditions under which an IMP-based attack makes the network unstable.

Theorem 2. Consider the global dynamics of the DMAS (2.12) with the control protocol (2.13) under an IMP-based attack. If (2.22) is not satisfied and E_Ψ ∩ E_A ≠ ∅, then the dynamics of the agents become unstable.

Proof. Since the condition in (2.22) is not satisfied, based on Theorem 1, u_c(t) ↛ 0 even under an IMP-based attack. Thus, the attack signal f(t) does not vanish over time and eventually acts as an input to the system in (2.12). Assume that there exists at least one common marginal eigenvalue between the system dynamics matrix A in (2.1) and the attacker dynamics matrix Ψ in (2.9), i.e., E_Ψ ∩ E_A ≠ ∅. Then, the multiplicity of at least one marginally stable pole becomes greater than one. Therefore, the attacker destabilizes the state of the agent in (2.12). Moreover, since (2.22) is not satisfied, the attack is on root nodes, and since root nodes have a path to all other nodes in the network, the states of all agents become unstable. This completes the proof. □

Theorem 3. Consider the global dynamics of the DMAS (2.12) under attack f(t). Then, the local neighborhood tracking error (2.3) converges to zero for all intact agents if u_c(t) → 0. Moreover, intact agents that are reachable from the compromised agents do not converge to the desired consensus trajectory.

Proof. In the presence of attacks, the global dynamics of the DMAS (2.12) with (2.13) can be written as

ẋ(t) = (I_N ⊗ A) x(t) + (I_N ⊗ B)((−cL ⊗ K) x(t) + f(t)),   (2.25)

where x(t) = [x_1^T(t), …, x_N^T(t)]^T is the global vector of the states of agents and f(t) = [f_1^T(t), …, f_N^T(t)]^T denotes the global vector of attacks. As shown in (2.14), if u_c(t) → 0, agents reach a steady state. That is,

cK η_i → −f_i  ∀ i ∈ N,   (2.26)

where η_i denotes the local neighborhood tracking error of agent i defined in (2.3). For an intact agent, by definition one has f_i = 0, and thus (2.26) implies that the local neighborhood tracking error (2.3) converges to zero.

Now, we show that intact agents that are reachable from the compromised agent do not synchronize to the desired consensus behavior. To this end, let agent j be under attack and assume, for contradiction, that all intact agents synchronize, i.e., x_k = x_i ∀ i, k ∈ N − {j}. Consider an intact agent i that is an immediate neighbor of the compromised agent j. Then, using (2.13), if u_c(t) → 0, for the intact agent i (i.e., f_i = 0) one has

Σ_{k∈N_i−{j}} a_ik (x_k − x_i) + a_ij (x_j − x_i) → 0,   (2.27)

where the x_k denote the states of the intact neighbors of agent i. On the other hand, (2.7) shows that the state of the compromised agent j, i.e., x_j, deviates from the desired consensus value by an amount proportional to f_j. Therefore, (2.27) results in deviating the state of the immediate neighbor of the compromised agent j from the desired consensus behavior, which contradicts the assumption. Consequently, intact agents that have a path to the compromised agent do not reach consensus, even though their local neighborhood tracking errors are zero. This completes the proof. □

Remark 4. An attacker can exploit the security of the network by eavesdropping and monitoring the transmitted data to identify at least one of the marginal eigenvalues of the agent dynamics, e.g., by identifying the system dynamics using data-based approaches. Eavesdropping-based design of attack signals is also discussed in [25]. □

2.4.3 Extension of Analysis Results to the Case of Noisy Communication

Up to now, the presented analysis has assumed that the communication is noise-free. We now briefly discuss what changes when communication noise is present, and propose attack detection and mitigation in the presence of communication noise.
In the presence of Gaussian-distributed communication noise, the local neighborhood tracking error in (2.3) becomes

η̄_i = η_i + ω_i,   (2.28)

where ω_i ∼ N(0, Σ_{ω_i}) denotes the aggregate Gaussian noise affecting the incoming information to agent i, given by

ω_i = Σ_{j∈N_i} a_ij ω_ij,   (2.29)

where ω_ij denotes the noise in the measurement received from agent j at agent i. In this situation, the DMAS consensus problem defined in Problem 1 changes to the mean-square consensus problem [37]. In the presence of Gaussian noise, based on (2.28), the control protocol in (2.4)-(2.3) becomes [37]

u_i(t) = cK a(t) ( Σ_{j=1}^{N} a_ij (x_j(t) − x_i(t)) + ω_i )  ∀ i ∈ N,   (2.30)

with a(t) a time-dependent consensus gain and ω_i ∼ N(0, Σ_{ω_i}) defined in (2.29). Based on mean-square consensus, one has

lim_{t→∞} E[u_i(t)] → 0  ∀ i ∈ N,   (2.31)

and thus, based on (2.1), the steady states of agents converge to a consensus trajectory in the mean-square sense, and the global form of (2.14) becomes

ẋ^m_ss = (I_N ⊗ A) x^m_ss,   (2.32)

where x^m_ss = lim_{t→∞} E[x(t)] denotes the global steady state of agents in the mean-square sense. Then, following the same procedure as in Lemmas 3-4 and Theorems 1-3, one can show that an IMP-based attack does not change the statistical properties of the local neighborhood tracking error, while a non-IMP-based attack does. Moreover, the local neighborhood tracking error converges to zero in mean for an IMP-based attack, but does not converge to zero in mean for a non-IMP-based attack.

Remark 5. In general, the noise associated with electronic circuits at the receiver end falls under the category of thermal noise and is statistically modeled as Gaussian [38]. Therefore, it is standard to assume that ω_ij in (2.29) is Gaussian [37]. However, if this assumption is violated, the same attack detection mechanism can still be developed using the divergence estimation approach presented in [39]-[40], which applies to all distributions. □

2.5 An Attack Detection Mechanism

In this section, Kullback-Leibler (KL) divergence-based attack detection and mitigation approaches are developed for both IMP-based and non-IMP-based attacks.

Definition 6 (Kullback-Leibler divergence) [41]-[42]. The Kullback-Leibler divergence between two probability densities P_X and P_Z of a random variable θ ∈ Θ is defined as

D_KL(X||Z) = ∫_{θ∈Θ} P_X(θ) log( P_X(θ) / P_Z(θ) ) dθ.   (2.33)

In the following subsections, the KL divergence is used to detect IMP-based and non-IMP-based attacks on DMASs.

2.5.1 Attack detection for IMP-based attacks

In this subsection, an attack detector is designed to identify IMP-based attacks. To this end, two error sequences τ_i and φ_i are defined, based on only locally exchanged information, for agent i as

τ_i = || Σ_{j∈N_i} a_ij d_ij ||,   (2.34)

and

φ_i = Σ_{j∈N_i} || a_ij d_ij ||,   (2.35)

where the measured discrepancy d_ij between agent i's state and its neighbor j's state under attack is

d_ij = x^c_j − x^c_i + ω_ij  ∀ j ∈ N_i,   (2.36)

where ω_ij ∼ N(0, Σ_{ω_ij}) denotes the Gaussian incoming communication noise from agent j to agent i. Moreover, x^c_i is the measured state of agent i under attack and x^c_j is the possibly corrupted information agent i receives from its j-th neighbor. If agent i is not compromised, then x^c_i = x_i; similarly, if agent j is not compromised, then x^c_j = x_j. In fact, (2.34) is the norm of the summation of the measured discrepancies between agent i and all its neighbors, and (2.35) is the summation of the norms of those measured discrepancies.
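Both sequences are computed from locally available quantities only. A minimal sketch, assuming hypothetical two-dimensional states and unit edge weights:

```python
import numpy as np

def error_sequences(x_i, neighbor_states, weights, noise_std=0.1, rng=None):
    """tau_i in (2.34) and phi_i in (2.35), built from the measured
    discrepancies d_ij of (2.36)."""
    rng = rng or np.random.default_rng()
    d = [x_j - x_i + rng.normal(0.0, noise_std, size=x_i.shape)
         for x_j in neighbor_states]
    weighted = [a * d_ij for a, d_ij in zip(weights, d)]
    tau = np.linalg.norm(sum(weighted))               # norm of the sum
    phi = sum(np.linalg.norm(w) for w in weighted)    # sum of the norms
    return tau, phi

tau, phi = error_sequences(np.array([1.0, 0.0]),
                           [np.array([1.1, 0.2]), np.array([0.9, -0.1])],
                           weights=[1.0, 1.0])
print(tau, phi)   # near consensus both stay small; attacks drive them apart
```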
In the absence of attack, these two signals show the same behavior in the sense that their means converge to zero.

Remark 6. In the presence of an IMP-based attack and in the absence of noise, based on Theorem 3, τ_i goes to zero for intact agents, but φ_i does not converge to zero, as its convergence would imply consensus, which cannot happen under attack according to Theorem 3. On the other hand, for an IMP-based attack in the presence of noise, based on Theorem 3, τ_i converges to zero in mean because the local neighborhood tracking error converges to zero in mean for all agents. In contrast, the mean of φ_i depends not only on the mean of the noise signal but also on that of the attack signal. Therefore, the behavior of these two signals diverges significantly in the presence of attacks, which can be captured by a KL divergence-based detection mechanism. □

Note that one can measure τ_i and φ_i based on the exchanged information among agents, which might be corrupted by the attack signal. Existing KL divergence methods are, nevertheless, developed for Gaussian signals. Here, while the communication noise is assumed to be Gaussian, the error sequences (2.34) and (2.35) are norms of variables with Gaussian distributions and thus have univariate folded Gaussian distributions [26], φ_i ∼ FN(µ_{1i}, σ²_{1i}) and τ_i ∼ FN(µ_{2i}, σ²_{2i}). That is,

P_{φ_i}(q_i; µ_{1i}, σ_{1i}) = (1/(√(2π)|σ_{1i}|)) e^{−(q_i−µ_{1i})²/2σ²_{1i}} + (1/(√(2π)|σ_{1i}|)) e^{−(q_i+µ_{1i})²/2σ²_{1i}},
P_{τ_i}(q_i; µ_{2i}, σ_{2i}) = (1/(√(2π)|σ_{2i}|)) e^{−(q_i−µ_{2i})²/2σ²_{2i}} + (1/(√(2π)|σ_{2i}|)) e^{−(q_i+µ_{2i})²/2σ²_{2i}},   (2.37)

where µ_{1i} and σ²_{1i} are the mean and variance of the error sequence φ_i, and µ_{2i} and σ²_{2i} are the mean and variance of the error sequence τ_i. Using (2.33), the KL divergence in terms of the local error sequences φ_i and τ_i can be defined as

D_KL(φ_i||τ_i) = ∫ P_{φ_i}(q_i) log( P_{φ_i}(q_i) / P_{τ_i}(q_i) ) dq_i = E_{P_{φ_i}}[ log( P_{φ_i}(q_i) / P_{τ_i}(q_i) ) ],   (2.38)

where E_{P_{φ_i}}[·] represents the expectation with respect to the distribution of the first sequence [26]. A KL divergence formula for folded Gaussian distributions is developed in the following lemma.

Lemma 5. Consider the error sequences τ_i and φ_i in (2.34)-(2.35) with the folded Gaussian distributions P_{φ_i} and P_{τ_i} in (2.37). Then, the KL divergence between the error sequences, D_KL(φ_i||τ_i), becomes

D_KL(φ_i||τ_i) ≈ (1/2) log( σ²_{2i}/σ²_{1i} ) − 1/2 + (1/2)( σ⁻²_{2i} σ²_{1i} ) + (1/2) σ⁻²_{2i} ( µ_{2i} − µ_{1i} )² + 1 + (1/2) e^{4µ²_{1i}/σ²_{1i}} ( 1 − e^{8µ²_{1i}/σ²_{1i}} ) − e^{−µ²_{1i}/2σ²_{1i}} [ e^{ρ²_1/2σ²_{1i}} + e^{ρ²_2/2σ²_{1i}} − (1/2)( e^{ρ²_3/2σ²_{1i}} + e^{ρ²_4/2σ²_{1i}} ) ],   (2.39)

with ρ_1 = µ_{1i} − 2µ_{2i} σ²_{1i} σ⁻²_{2i}, ρ_2 = µ_{1i} + 2µ_{2i} σ²_{1i} σ⁻²_{2i}, ρ_3 = µ_{1i} − 4µ_{2i} σ²_{1i} σ⁻²_{2i}, and ρ_4 = µ_{1i} + 4µ_{2i} σ²_{1i} σ⁻²_{2i}.

Proof. See Appendix A.

Note that in (2.39), τ_i and φ_i are error sequences and the divergence between their distributions depends on their means and variances. One can calculate the values of these error sequences from (2.34) and (2.35) at each time instant, and then determine the means and variances of their distributions from the previous m samples. Therefore, these statistical parameters need not be known explicitly. In the following theorem, we show that the effect of IMP-based attacks can be captured using the KL divergence defined in (2.39).
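The closed form (2.39) can be sanity-checked against a direct Monte Carlo estimate of (2.38), since folded-Gaussian samples are simply absolute values of Gaussian samples. A minimal, self-contained sketch:

```python
import numpy as np

def folded_pdf(q, mu, sigma):
    """Folded Gaussian density FN(mu, sigma^2) from (2.37)."""
    c = 1.0 / (np.sqrt(2.0 * np.pi) * abs(sigma))
    return c * (np.exp(-(q - mu) ** 2 / (2 * sigma ** 2))
                + np.exp(-(q + mu) ** 2 / (2 * sigma ** 2)))

def kl_folded_mc(mu1, s1, mu2, s2, n=200_000, seed=0):
    """Monte Carlo estimate of D_KL(phi||tau) in (2.38): sample from the
    folded Gaussian of phi and average the log-density ratio."""
    q = np.abs(np.random.default_rng(seed).normal(mu1, s1, n))
    return np.mean(np.log(folded_pdf(q, mu1, s1) / folded_pdf(q, mu2, s2)))

print(kl_folded_mc(0.5, 1.0, 0.5, 1.0))   # ~0: identical distributions
print(kl_folded_mc(3.0, 1.0, 0.5, 1.0))   # clearly positive under a mean shift
```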
Theorem 4. Consider the DMAS (2.1) along with the controller (2.13), under IMP-based attacks. Assume that the communication noise sequences are i.i.d. Then, for an intact agent i reachable from the compromised agent,

(1/T) ∫_k^{k+T−1} D_KL(φ_i||τ_i) dκ > γ_i,   (2.40)

where φ_i and τ_i are defined in (2.34) and (2.35), respectively, and T and γ_i represent the window size and the predesigned threshold parameter.

Proof. According to Theorem 3, the local neighborhood tracking error goes to zero for intact agents in the presence of an IMP-based attack when there is no communication noise. In the presence of communication noise with Gaussian distribution, i.e., ω_ij ∼ N(0, Σ_{ω_ij}), and an IMP-based attack, the expected value of the local neighborhood tracking error for an intact agent i becomes

E[η_i] = E[ Σ_{j∈N_i} a_ij d_ij ] → 0,   (2.41)

where the measured discrepancy d_ij is defined in (2.36). Using (2.41), one can write (2.34) as

τ_i = || Σ_{j∈N_i} a_ij d_ij || ∼ FN(0, ῡ²_{ω_i}),   (2.42)

which represents a folded Gaussian distribution with mean zero and variance ῡ²_{ω_i}. Note that the mean and variance of the distribution P_{τ_i} in (2.37) become µ_{2i} = 0 and σ²_{2i} = ῡ²_{ω_i}.

Since the noise signals are independent and identically distributed, from (2.35) one can infer that the folded Gaussian distribution P_{φ_i} in (2.37) has the statistical properties

φ_i ∼ FN( µ_{f^d_i}, ῡ²_{ω_i} + υ̂²_{ω_i} + ῡ²_{f^d_i} ),   (2.43)

where µ_{f^d_i} and ῡ²_{ω_i} + υ̂²_{ω_i} + ῡ²_{f^d_i} represent the overall mean and variance, respectively, due to the communication noise and the overall deviation from the desired behavior in intact neighbors reachable from the compromised agent.

In the absence of attack, the statistical properties of the sequences τ_i and φ_i become FN(0, ῡ²_{ω_i}) and FN(0, ῡ²_{ω_i} + υ̂²_{ω_i}), respectively, and the corresponding KL divergence in (2.39) becomes

D^{wa}_KL(φ_i||τ_i) ≈ (1/2)( log( (ῡ²_{ω_i} + υ̂²_{ω_i}) / ῡ²_{ω_i} ) + ῡ⁻²_{ω_i} υ̂²_{ω_i} ),   (2.44)

where υ̂²_{ω_i} represents the additional variance in the sequence φ_i, which depends on the communication noise. Note that τ_i in (2.34) is the norm of the summation of the measured discrepancies of agent i and all its neighbors, whereas φ_i in (2.35) is the summation of the norms of those measured discrepancies. Even in the absence of attack, they follow folded Gaussian distributions with zero means but different variances, due to the application of the norm to the measured discrepancies.

Now, in the presence of IMP-based attacks, using the derived form of the KL divergence for folded Gaussian distributions from Lemma 5, one can simplify (2.39) using (2.42)-(2.43) as

D_KL(φ_i||τ_i) ≈ (1/2) log( (ῡ²_{ω_i} + υ̂²_{ω_i} + ῡ²_{f^d_i}) / ῡ²_{ω_i} ) + (1/2) ῡ⁻²_{ω_i}( υ̂²_{ω_i} + ῡ²_{f^d_i} ) + (1/2) ῡ⁻²_{ω_i} µ²_{f^d_i} + (1/2) e^{4µ²_{f^d_i}/(ῡ²_{ω_i}+υ̂²_{ω_i}+ῡ²_{f^d_i})} ( 1 − e^{8µ²_{f^d_i}/(ῡ²_{ω_i}+υ̂²_{ω_i}+ῡ²_{f^d_i})} ).   (2.45)

Then, one can design the threshold parameter γ_i such that

(1/T) ∫_k^{k+T−1} D_KL(φ_i||τ_i) dκ > γ_i,   (2.46)

where T denotes the sliding window size. This completes the proof. □

Based on Theorem 4, one can use the following conditions for attack detection [31]:

(1/T) ∫_k^{k+T−1} D_KL(φ_i||τ_i) dκ < γ_i : H_0,
(1/T) ∫_k^{k+T−1} D_KL(φ_i||τ_i) dκ > γ_i : H_1,   (2.47)

where γ_i denotes the designed detection threshold, the null hypothesis H_0 represents the intact mode, and H_1 denotes the compromised mode of an agent.
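The decision rule (2.47) amounts to thresholding a moving average of the divergence. A minimal sketch with a hypothetical divergence trace:

```python
import numpy as np

def sliding_window_test(kl_series, window, threshold):
    """Hypothesis test (2.47): flag H1 when the windowed mean of the KL
    divergence exceeds the design threshold gamma_i."""
    flags = np.zeros(len(kl_series), dtype=bool)
    for k in range(window, len(kl_series) + 1):
        flags[k - 1] = np.mean(kl_series[k - window:k]) > threshold
    return flags

rng = np.random.default_rng(1)
kl = np.concatenate([np.abs(rng.normal(0.05, 0.02, 100)),   # intact phase
                     np.linspace(0.5, 5.0, 100)])           # attack phase
print(np.argmax(sliding_window_test(kl, window=20, threshold=0.3)))
# first detection index, shortly after the attack onset at sample 100
```

The window both delays and robustifies the decision: short transients are averaged out, so legitimate changes are not flagged.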
Remark 7. Note that several existing results on fault/attack detection employ log-likelihood or generalized likelihood ratio-based test statistics [31]-[34]. Similarly, to obtain (2.47), we consider the log-likelihood ratio-based test statistic

Λ(q_i) = log( P_{φ_i}(q_i) / P_{τ_i}(q_i) ),   (2.48)

where the observations q_i are drawn from the probability distribution functions P_{φ_i} and P_{τ_i} defined in (2.37). Based on (2.38), the expectation of Λ(q_i) with respect to the probability distribution function P_{φ_i} becomes

E_{P_{φ_i}}[Λ(q_i)] = E_{P_{φ_i}}[ log( P_{φ_i}(q_i) / P_{τ_i}(q_i) ) ] = D_KL(φ_i||τ_i).   (2.49)

In the absence of attack, i.e., when the intact-mode hypothesis H_0 is true, based on Remark 6 and Theorem 4, D_KL(φ_i||τ_i) in (2.49) is a small positive value, because P_{φ_i} and P_{τ_i} both have zero mean. In the presence of attack, i.e., when the compromised-mode hypothesis H_1 is true, based on Remark 6 and Theorem 4, D_KL(φ_i||τ_i) in (2.49) becomes large, because P_{φ_i} and P_{τ_i} then have different means and variances. Therefore, based on the log-likelihood ratio test statistic, D_KL(φ_i||τ_i) is used in (2.47) for attack detection over a sliding window. The sliding window is employed to avoid false detections that may occur due to legitimate changes, i.e., the transient behavior of agents in the DMAS. The designed threshold γ_i and the sliding window size T in (2.47) are predefined parameters. The threshold γ_i is typically designed based on knowledge of the bound on the communication noise ω_ij in (2.36). This design choice, even though it ensures that the compromised mode H_1 in (2.47) will not be activated in the absence of attacks, can make the threshold conservative. Selection of the predefined parameters for adversary detection and stealthiness based on system knowledge is reported in the literature [35]-[36]. □

2.5.2 Attack detection for non-IMP-based attacks

This subsection presents the design of a KL-based attack detector for non-IMP-based attacks. It was shown in Theorem 3 that the local neighborhood tracking error goes to zero when agents are under IMP-based attacks. Therefore, for the case of non-IMP-based attacks, one can identify these attacks through changes in the statistical properties of the local neighborhood tracking error. In the absence of attack, since Gaussian noise ω_i ∼ N(0, Σ_{ω_i}) is considered in the communication link, the local neighborhood tracking error η_i in (2.28) has the statistical properties

η_i ∼ N(0, Σ_{ω_i}),   (2.50)

which represents the nominal behavior of the system. In the presence of attacks, using (2.28), the local neighborhood tracking error η^a_i can be written as

η^a_i = Σ_{j∈N_i} a_ij d_ij,   (2.51)

where the measured discrepancy under attack d_ij is defined in (2.36). From (2.51), one has

η^a_i ∼ N( µ_{f_i}, Σ_{f_i} + Σ_{ω_i} ),   (2.52)

where µ_{f_i} and Σ_{f_i} are, respectively, the mean and covariance of the overall deviation due to the corrupted states under attack as given in (2.36). Now, since both η^a_i and η_i have Gaussian distributions, the KL divergence D_KL(η^a_i||η_i) can be written as [42]

D_KL(η^a_i||η_i) = (1/2)( log( |Σ_{η_i}| / |Σ_{η^a_i}| ) − n + tr( Σ⁻¹_{η_i} Σ_{η^a_i} ) + (µ_{η_i} − µ_{η^a_i})^T Σ⁻¹_{η_i} (µ_{η_i} − µ_{η^a_i}) ),   (2.53)

where µ_{η_i} and Σ_{η_i} denote the mean and covariance of η_i, µ_{η^a_i} and Σ_{η^a_i} denote the mean and covariance of η^a_i, and n denotes the dimension of the error sequence.
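Formula (2.53) is the standard closed form for the divergence between two multivariate Gaussians and is direct to implement:

```python
import numpy as np

def kl_gaussian(mu0, cov0, mu1, cov1):
    """Closed-form D_KL(N(mu0, cov0) || N(mu1, cov1)), as in (2.53)."""
    n = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.log(np.linalg.det(cov1) / np.linalg.det(cov0)) - n
                  + np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff)

cov_noise = 0.1 * np.eye(2)          # nominal tracking error (2.50)
print(kl_gaussian(np.zeros(2), cov_noise, np.zeros(2), cov_noise))   # 0.0
print(kl_gaussian(np.array([1.0, 0.5]), 0.3 * np.eye(2),             # (2.52)
                  np.zeros(2), cov_noise))                           # large
```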
Define the average of the KL divergence over a window T as

D̄_i = (1/T) ∫_k^{k+T−1} D_KL(η^a_i||η_i) dκ.   (2.54)

The following theorem shows that the effect of non-IMP-based attacks can be detected using the KL divergence between the two error sequences η^a_i and η_i.

Theorem 5. Consider the DMAS (2.1) along with the controller (2.13). Then,
1. in the absence of attack, D̄_i defined in (2.54) tends to zero;
2. in the presence of a non-IMP-based attack, D̄_i defined in (2.54) is greater than a predefined threshold γ_i.

Proof. In the absence of attacks, the statistical properties of the sequences η_i and η^a_i are the same, as in (2.50). Therefore, the KL divergence D_KL(η^a_i||η_i) in (2.53) becomes zero, which makes D̄_i in (2.54) zero. This completes the proof of part 1.

To prove part 2, using (2.50)-(2.52) in (2.53) and the fact that tr( Σ⁻¹_{ω_i} (Σ_{f_i} + Σ_{ω_i}) ) − n = tr( Σ⁻¹_{ω_i} Σ_{f_i} ), one can write the KL divergence between η^a_i and η_i as

D_KL(η^a_i||η_i) = (1/2)( log( |Σ_{ω_i}| / |Σ_{f_i} + Σ_{ω_i}| ) + tr( Σ⁻¹_{ω_i} Σ_{f_i} ) + µ^T_{f_i} Σ⁻¹_{ω_i} µ_{f_i} ).   (2.55)

Then, using (2.54), one has

D̄_i = (1/T) ∫_k^{k+T−1} (1/2)( log( |Σ_{ω_i}| / |Σ_{f_i} + Σ_{ω_i}| ) + tr( Σ⁻¹_{ω_i} Σ_{f_i} ) + µ^T_{f_i} Σ⁻¹_{ω_i} µ_{f_i} ) dκ > γ_i,   (2.56)

where T and γ_i denote the sliding window size and the predefined design threshold, respectively. This completes the proof. □

Based on Theorem 5, one can use the following conditions for attack detection:

D̄_i < γ_i : H_0,  D̄_i > γ_i : H_1,   (2.57)

where γ_i denotes the designed detection threshold, H_0 represents the intact mode of the system, and H_1 denotes the compromised mode.

2.6 An Attack Mitigation Mechanism

In this section, both IMP-based and non-IMP-based attacks are mitigated using the detection mechanisms presented in the previous section.

2.6.1 Self-belief of agents about their outgoing information

To determine the level of trustworthiness of each agent about its own information, a self-belief value is presented. Using D_KL(φ_i||τ_i) and D_KL(η^a_i||η_i) from Theorems 4 and 5, we define c^1_i(t) as the self-belief of agent i calculated under IMP-based attacks and c^2_i(t) as its self-belief under non-IMP-based attacks:

c^j_i(t) = κ_j ∫_0^t e^{κ_j(τ−t)} χ^j_i(τ) dτ,  j = 1, 2,   (2.58)

where 0 ≤ c^j_i(t) ≤ 1, with

χ^1_i(t) = Δ_i / ( Δ_i + D_KL(φ_i||τ_i) ),   (2.59)
χ^2_i(t) = Δ_i / ( Δ_i + D_KL(η^a_i||η_i) ).   (2.60)

Moreover, Δ_i represents a threshold to account for channel fading and other uncertainties, and κ_j > 0 denotes the discount factor. Note that χ^j_i(t), j = 1, 2 in (2.58) depends on the divergence between error sequences, which are functions of time; consequently, χ^j_i(t) is also time-dependent. However, to maintain consistency with the detection part, we have avoided time-indexing the divergence terms D_KL(φ_i||τ_i) and D_KL(η^a_i||η_i). Based on the Leibniz integral rule [43], equation (2.58) can be implemented by the following differential equation

ċ^j_i(t) + κ_j c^j_i(t) = κ_j χ^j_i(t),  j = 1, 2.

According to Theorem 4 (respectively, Theorem 5), for an IMP-based (non-IMP-based) attack, the divergence term D_KL(φ_i||τ_i) (respectively, D_KL(η^a_i||η_i)) increases, which makes χ^1_i(t) (χ^2_i(t)) approach zero and, consequently, makes the value of c^1_i(t) (c^2_i(t)) close to zero. On the other hand, without an attack, the divergence terms tend to zero, making χ^1_i(t) (χ^2_i(t)) approach 1 and, consequently, c^1_i(t) (c^2_i(t)) close to 1. The larger the value of c^j_i(t), the more confident the agent is about the trustworthiness of its broadcasted information.
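The filter (2.58) is a first-order low-pass of χ^j_i and can be run with a simple Euler step. A minimal sketch with hypothetical parameters:

```python
def update_self_belief(c, kl_value, delta, kappa, dt):
    """One Euler step of c_dot + kappa*c = kappa*chi, with chi from
    (2.59)-(2.60): chi is near 1 when the divergence is small."""
    chi = delta / (delta + kl_value)
    return c + dt * kappa * (chi - c)

c, dt, kappa, delta = 1.0, 0.01, 0.8, 0.5
for step in range(1000):                 # 10 s of simulated time
    kl = 0.01 if step < 500 else 20.0    # attack begins at t = 5 s
    c = update_self_belief(c, kl, delta, kappa, dt)
print(round(c, 3))   # near zero: the agent distrusts its own information
```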
Then, using c^j_i(t), j = 1, 2 defined in (2.58), the self-belief of agent i is defined as

ξ_i(t) = min{ c^1_i(t), c^2_i(t) }.   (2.61)

If an agent i is under direct attack or receives corrupted information from its neighbors, then its self-belief tends to zero. In such a situation, it transmits this low self-belief value to its neighbors so that they put less weight on the information they receive from it, which prevents attack propagation in the distributed network.

2.6.2 Trust of agents about their incoming information

The trust value represents the level of confidence of an agent in its neighbors' information. If the self-belief value of an agent is low, it forms beliefs about its neighbors (either intact or compromised) and updates its trust values, which depend on the beliefs about each of its neighbors, using only local information. Therefore, agents identify their compromised neighbors and discard their information. Using the KL divergence between the exchanged information of agent i and its neighbors, one can define η_ij(t) as

η_ij(t) = κ_3 ∫_0^t e^{κ_3(τ−t)} L_ij(τ) dτ,   (2.62)

where 0 ≤ η_ij(t) ≤ 1, with

L_ij(t) = 1 − Λ_1 / ( Λ_1 + e^{Λ_2 D_KL(x_j||m_i)} )  ∀ j ∈ N_i,   (2.63)

with m_i = Σ_{j∈N_i} x_j; Λ_1, Λ_2 > 0 represent thresholds to account for channel fading and other uncertainties, and κ_3 > 0 denotes the discount factor. For a compromised neighbor, the KL divergence D_KL(x_j||m_i) tends to zero, which makes L_ij(t) close to zero and, consequently, the value of η_ij(t) close to zero. On the other hand, if the incoming neighbor is not compromised, then D_KL(x_j||m_i) increases and makes η_ij(t) approach 1. Based on the Leibniz integral rule, equation (2.62) can be implemented using the following differential equation

η̇_ij(t) + κ_3 η_ij(t) = κ_3 L_ij(t).

Now, we define the trust value of an agent on its neighbors as

Ω_ij(t) = max( ξ_i(t), η_ij(t) ),   (2.64)

with 0 ≤ Ω_ij(t) ≤ 1. In the absence of attacks, the states of agents converge to the consensus trajectory and the self-belief ξ_i(t) stays close to one, which results in Ω_ij(t) being 1 ∀ j ∈ N_i. In the presence of attacks, the η_ij(t) corresponding to the compromised agents tend to zero.

Figure 2.1: Schematic representation of the proposed resilient approach for DMASs.

2.6.3 The mitigation mechanism using trust and self-belief values

In this subsection, the trust and self-belief values are utilized to design the mitigation algorithm. To achieve resiliency, both self-belief and trust values are incorporated into the information exchanged among agents, as shown in Fig. 2.1. Consequently, the resilient form of the local neighborhood tracking error (2.28) is

η̃_i = Σ_{j∈N_i} Ω_ij(t) ξ_j(t) a_ij ( x_j − x_i ) + ω_i,   (2.65)

where Ω_ij(t) and ξ_j(t) denote, respectively, the trust value in (2.64) and the self-belief of the neighboring agents in (2.61). Based on the controller (2.4) with the resilient local neighborhood tracking error (2.65), the resilient control protocol can be written as

ũ_i = cK η̃_i,  ∀ i ∈ N.   (2.66)
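A minimal sketch of the trust-weighted error (2.65) and the resilient input (2.66), with hypothetical states and belief values:

```python
import numpy as np

def resilient_tracking_error(x_i, neighbors, a, trust, neighbor_beliefs):
    """(2.65): each edge is scaled by the trust Omega_ij and the neighbor's
    transmitted self-belief xi_j, attenuating corrupted links."""
    err = np.zeros_like(x_i)
    for x_j, a_ij, w_ij, xi_j in zip(neighbors, a, trust, neighbor_beliefs):
        err += w_ij * xi_j * a_ij * (x_j - x_i)
    return err

# Neighbor 0 is intact; neighbor 1 broadcasts a large corrupted state but
# carries near-zero trust and self-belief.
eta = resilient_tracking_error(np.array([1.0, 0.0]),
                               [np.array([1.2, 0.1]), np.array([60.0, -40.0])],
                               a=[1.0, 1.0], trust=[1.0, 0.02],
                               neighbor_beliefs=[1.0, 0.05])
print(eta)   # dominated by the intact neighbor; u_i = c*K @ eta as in (2.66)
```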
According to (2.65), the topology of the graph changes over time due to the incorporation of the trust and self-belief values of agents; we therefore denote the time-varying graph as G(t) = (V, E(t)), with E(t) ⊆ V × V representing the set of time-varying edges.

Now, based on the following definitions and lemma, we formally present Theorem 6 to show that the trust- and self-belief-based resilient control protocol (2.66) solves Problem 1 for all intact agents, i.e., all agents in N_Int = N \ N_Comp, as defined in Definition 3, achieve the final desired consensus regardless of attacks on the distributed network.

Definition 7 (r-reachable set) [20]. Given a directed graph G and a nonempty subset V_s ⊂ V, the set V_s is r-reachable if there exists a node i ∈ V_s such that |N_i \ V_s| ≥ r, where r ∈ Z_{≥0}. □

Definition 8 (r-robust graph) [20]. A directed graph G is called an r-robust graph with r ∈ Z_{≥0} if for every pair of nonempty, disjoint subsets of V, at least one of the subsets is r-reachable. □

Assumption 3. If at most q neighbors of each intact agent are under attack, at least q + 1 neighbors of each intact agent are intact [15].

Lemma 6 [20]. Consider an r-robust time-varying directed graph G(t). Then, the graph has a directed spanning tree if and only if G(t) is 1-robust.

The following theorem shows that the proposed resilient controller (2.66) guarantees synchronization despite attacks.

Theorem 6. Consider the DMAS (2.1) under attack with the proposed resilient control protocol ũ_i in (2.66). Let the time-varying graph G(t) be such that at each time instant t, Assumption 1 and Assumption 3 are satisfied. Then, lim_{t→∞} ||x_j(t) − x_i(t)|| = 0 ∀ i, j ∈ N_Int.

Proof. The DMAS (2.1) with the proposed resilient control protocol ũ_i in (2.66), in the absence of noise, can be written as

ẋ_i = A x_i + cBK Σ_{j∈N_i} ā_ij(t) ( x_j − x_i ),   (2.67)

with ā_ij(t) = Ω_ij(t) ξ_j(t) a_ij, where Ω_ij(t) and ξ_j(t) represent, respectively, the trust value in (2.64) and the self-belief of the neighboring agents in (2.61). The global form of the resilient system dynamics (2.67) becomes

ẋ = ( I_N ⊗ A − cL(t) ⊗ BK ) x,   (2.68)

where L(t) denotes the time-varying graph Laplacian matrix of the directed graph G(t). Based on Assumption 3, even if q neighbors of an intact agent are attacked and collude to send the same corrupted value to misguide it, there still exist q + 1 intact neighbors that communicate values different from the compromised ones. Moreover, since at least q + 1 of the intact agent's neighbors are intact, it can update its trust values to remove the compromised neighbors. Furthermore, since the time-varying graph G(t) resulting from isolating the compromised agents is 1-robust, based on Definition 8 and Lemma 6, the network remains connected through the intact agents. Therefore, there exists a spanning tree in the graph associated with all intact agents N_Int. Hence, as shown in [44], the solutions of the DMAS (2.68) reach consensus on the desired behavior if the time-varying graph G(t) jointly contains a spanning tree as the network evolves with time. This results in lim_{t→∞} ||x_j(t) − x_i(t)|| = 0 ∀ i, j ∈ N_Int asymptotically. This completes the proof. □

Remark 8. The proposed approach discards a compromised agent only when an attack is detected, in contrast to most existing methods, which rely solely on the discrepancy among agents. Note that a discrepancy can be the result of a legitimate change in the state of one agent.
Moreover, at the beginning of synchronization there can be a large discrepancy between agents' states that should not cause information to be discarded. □

2.7 Simulation Results

Consider a group of 5 homogeneous agents with the dynamics

ẋ_k = A x_k + B u_k,  k = 1, …, 5,   (2.69)

where A = [0 −1; 1 0] and B = [1, 1]^T. The communication graph is shown in Fig. 2.2. We assume zero-mean Gaussian communication noise with distribution N(0, 0.1).

Figure 2.2: Communication topology.

2.7.1 IMP-based attacks

Since the eigenvalues of A in (2.69) are λ_{1,2} = ±i, based on Definition 2 the attack signal f = 20 sin(t) is an IMP-based attack. Let this attack signal be injected into Agent 1 (a root node) at time t = 20 s. The results are shown in Fig. 2.3: the compromised agent destabilizes the entire network, consistent with Theorem 2. Fig. 2.4 shows that the same IMP-based attack on Agent 5 (a non-root node) cannot destabilize the entire network. However, Agent 4, which is the only agent reachable from Agent 5, does not synchronize to the desired consensus trajectory. Moreover, Fig. 2.5 shows that the local neighborhood tracking error converges to zero for all agents except the compromised Agent 5. These results are in line with Theorem 3. Fig. 2.6 shows that, under the IMP-based attack on Agent 5, the KL divergence grows for the corrupted agent, which follows Theorem 4. The effect of the attack is then rejected using the belief-based detection and mitigation approach of Theorems 4 and 6: Fig. 2.7 shows that the reachable agents follow the desired consensus trajectory even in the presence of the attack.

Figure 2.3: The state of agents when Agent 1 is under an IMP-based attack.
Figure 2.4: Agent 5 is under an IMP-based attack: the state of agents.
Figure 2.5: Agent 5 is under an IMP-based attack: the local neighborhood tracking error of agents.
Figure 2.6: Divergence for the state of agents when Agent 5 is under an IMP-based attack.
Figure 2.7: The state of agents using the proposed attack detection and mitigation approach for an IMP-based attack.
Figure 2.8: The state of agents when Agent 5 is under a non-IMP-based attack.

2.7.2 Non-IMP-based attacks

The attack signal is assumed to be f = 10 + 5 sin(2t). The effect of this attack on Agent 5 (a non-root node) is shown in Fig. 2.8: the non-IMP-based attack on Agent 5 only affects the reachable Agent 4. Fig. 2.9 then shows that the KL divergence grows for the compromised agent under the non-IMP-based attack on Agent 5. Fig. 2.10 shows that the effect of the attack is removed for the intact Agent 4 using the belief-based detection and mitigation approaches presented in Theorems 5 and 6.

Figure 2.9: Divergence for the state of agents when Agent 5 is under a non-IMP-based attack.
Figure 2.10: The state of agents after attack detection and mitigation for a non-IMP-based attack.
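The nominal part of this example is straightforward to reproduce numerically. A minimal sketch: since Fig. 2.2 and the gain K are not reproduced here, a hypothetical chain digraph and a hypothetical stabilizing gain are assumed, with the IMP-based attack f = 20 sin t injected into the root agent at t = 20 s:

```python
import numpy as np

A = np.array([[0.0, -1.0], [1.0, 0.0]])   # eigenvalues +/- i, as in (2.69)
B = np.array([1.0, 1.0])
K = np.array([0.5, 0.5])                  # hypothetical stabilizing gain
c, N, dt, T = 1.0, 5, 1e-3, 40.0

adj = np.zeros((N, N))
for k in range(1, N):
    adj[k, k - 1] = 1.0                   # chain: agent k listens to k-1

x = np.random.default_rng(2).normal(size=(N, 2))
for step in range(int(T / dt)):
    t = step * dt
    f = 20.0 * np.sin(t) if t >= 20.0 else 0.0   # IMP-based attack signal
    x_new = x.copy()
    for i in range(N):
        eta_i = sum(adj[i, j] * (x[j] - x[i]) for j in range(N))
        u_i = c * K @ eta_i + (f if i == 0 else 0.0)   # attack on the root
        x_new[i] = x[i] + dt * (A @ x[i] + B * u_i)
    x = x_new
print(np.linalg.norm(x, axis=1))   # norms grow without bound: the resonant
                                   # attack on the root propagates, Theorem 2
```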
2.8 Conclusion

A resilient control framework has been introduced for DMASs. First, the effects of IMP-based and non-IMP-based attacks on DMASs have been analyzed using a graph-theoretic approach. Then, a KL divergence-based criterion, using only the observed local information of agents, has been employed to detect attacks. Each agent detects its neighbors' misbehavior, consequently forming a self-belief about the correctness of its own information, and continuously updates this self-belief and communicates it to its neighbors to inform them about the significance of its outgoing information. Additionally, if the self-belief value of an agent is low, it forms beliefs about the type of its neighbors (intact or compromised) and, consequently, updates its trust in its neighbors. Finally, agents incorporate their neighbors' self-beliefs and their own trust values into their control protocols to slow down and mitigate attacks.

2.9 Appendix

Proof of Lemma 5. Using (2.38), the KL divergence between the error sequences φ_i and τ_i can be written as

D_KL(φ_i||τ_i) = E_1[ log P_{φ_i} − log P_{τ_i} ],   (2.70)

where the probability density functions P_{φ_i} and P_{τ_i} are defined in (2.37). Using (2.37), (2.70) and the logarithm property log(a + b) = log(a) + log(1 + b/a), one has

D_KL(φ_i||τ_i) = E_1[ log( (1/(√(2π)|σ_{1i}|)) e^{−(q_i−µ_{1i})²/2σ²_{1i}} ) − log( (1/(√(2π)|σ_{2i}|)) e^{−(q_i−µ_{2i})²/2σ²_{2i}} ) ]
  + E_1[ log( 1 + e^{−2q_iµ_{1i}/σ²_{1i}} ) − log( 1 + e^{−2q_iµ_{2i}/σ²_{2i}} ) ]
  = T_1 + T_2.   (2.71)

The first term T_1 in (2.71) is the KL divergence between two normal Gaussian distributions, which is given in [42] as

T_1 = (1/2) log( σ²_{2i}/σ²_{1i} ) − 1/2 + (1/2)( σ⁻²_{2i} σ²_{1i} ) + (1/2) σ⁻²_{2i} ( µ_{2i} − µ_{1i} )².   (2.72)

The second term T_2 in (2.71), using the power series expansion log(1 + a) = Σ_{n≥0} (−1)^n a^{n+1}/(n+1) and ignoring the higher-order terms, can be approximated as

T_2 ≈ E_1[ e^{−2q_iµ_{1i}/σ²_{1i}} − (1/2) e^{−4q_iµ_{1i}/σ²_{1i}} ] − E_1[ e^{−2q_iµ_{2i}/σ²_{2i}} − (1/2) e^{−4q_iµ_{2i}/σ²_{2i}} ],   (2.73)

which can be expressed as

T_2 ≈ ∫_{−∞}^{∞} P_{φ_i} e^{−2q_iµ_{1i}/σ²_{1i}} dq_i − (1/2) ∫_{−∞}^{∞} P_{φ_i} e^{−4q_iµ_{1i}/σ²_{1i}} dq_i − ∫_{−∞}^{∞} P_{φ_i} e^{−2q_iµ_{2i}/σ²_{2i}} dq_i + (1/2) ∫_{−∞}^{∞} P_{φ_i} e^{−4q_iµ_{2i}/σ²_{2i}} dq_i.   (2.74)

Now, the first term of T_2 can be written as

∫_{−∞}^{∞} P_{φ_i} e^{−2q_iµ_{1i}/σ²_{1i}} dq_i = ∫_{−∞}^{∞} (1/(√(2π)|σ_{1i}|)) e^{−(q_i+µ_{1i})²/2σ²_{1i}} dq_i + ∫_{−∞}^{∞} (1/(√(2π)|σ_{1i}|)) e^{−[(q_i+µ_{1i})²+4q_iµ_{1i}]/2σ²_{1i}} dq_i.   (2.75)

Using the fact that a density integrates to 1, (2.75) becomes

∫_{−∞}^{∞} P_{φ_i} e^{−2q_iµ_{1i}/σ²_{1i}} dq_i = 1 + e^{4µ²_{1i}/σ²_{1i}}.   (2.76)

Similarly, the second term of T_2 can be written as

−(1/2) ∫_{−∞}^{∞} P_{φ_i} e^{−4q_iµ_{1i}/σ²_{1i}} dq_i = −(1/(2√(2π)|σ_{1i}|)) ∫_{−∞}^{∞} ( e^{−[(q_i+3µ_{1i})²−8µ²_{1i}]/2σ²_{1i}} + e^{−[(q_i+5µ_{1i})²−24µ²_{1i}]/2σ²_{1i}} ) dq_i,   (2.77)

which yields

−(1/2) ∫_{−∞}^{∞} P_{φ_i} e^{−4q_iµ_{1i}/σ²_{1i}} dq_i = −(1/2)( e^{4µ²_{1i}/σ²_{1i}} + e^{12µ²_{1i}/σ²_{1i}} ).   (2.78)

The third term of T_2 is

−∫_{−∞}^{∞} P_{φ_i} e^{−2q_iµ_{2i}/σ²_{2i}} dq_i = −(1/(√(2π)|σ_{1i}|)) ∫_{−∞}^{∞} ( e^{−(q_i−µ_{1i})²/2σ²_{1i}} e^{−2q_iµ_{2i}/σ²_{2i}} + e^{−(q_i+µ_{1i})²/2σ²_{1i}} e^{−2q_iµ_{2i}/σ²_{2i}} ) dq_i,   (2.79)
which can be written in the form

−∫_{−∞}^{∞} P_{φ_i} e^{−2q_iµ_{2i}/σ²_{2i}} dq_i = −( e^{−(µ²_{1i}−ρ²_1)/2σ²_{1i}} ∫_{−∞}^{∞} (1/(√(2π)|σ_{1i}|)) e^{−(q_i−ρ_1)²/2σ²_{1i}} dq_i + e^{−(µ²_{1i}−ρ²_2)/2σ²_{1i}} ∫_{−∞}^{∞} (1/(√(2π)|σ_{1i}|)) e^{−(q_i−ρ_2)²/2σ²_{1i}} dq_i ),   (2.80)

where ρ_1 = µ_{1i} − 2µ_{2i}σ²_{1i}σ⁻²_{2i} and ρ_2 = µ_{1i} + 2µ_{2i}σ²_{1i}σ⁻²_{2i}, which becomes

−∫_{−∞}^{∞} P_{φ_i} e^{−2q_iµ_{2i}/σ²_{2i}} dq_i = −( e^{−(µ²_{1i}−ρ²_1)/2σ²_{1i}} + e^{−(µ²_{1i}−ρ²_2)/2σ²_{1i}} ).   (2.81)

Similarly, the last term of T_2 can be simplified as

(1/2) ∫_{−∞}^{∞} P_{φ_i} e^{−4q_iµ_{2i}/σ²_{2i}} dq_i = (1/2)( e^{−(µ²_{1i}−ρ²_3)/2σ²_{1i}} + e^{−(µ²_{1i}−ρ²_4)/2σ²_{1i}} ),   (2.82)

where ρ_3 = µ_{1i} − 4µ_{2i}σ²_{1i}σ⁻²_{2i} and ρ_4 = µ_{1i} + 4µ_{2i}σ²_{1i}σ⁻²_{2i}. Adding (2.76), (2.78), (2.81) and (2.82), T_2 can be written as

T_2 ≈ 1 + (1/2) e^{4µ²_{1i}/σ²_{1i}} ( 1 − e^{8µ²_{1i}/σ²_{1i}} ) − e^{−µ²_{1i}/2σ²_{1i}} [ e^{ρ²_1/2σ²_{1i}} + e^{ρ²_2/2σ²_{1i}} − (1/2)( e^{ρ²_3/2σ²_{1i}} + e^{ρ²_4/2σ²_{1i}} ) ].   (2.83)

Now, using (2.72)-(2.73) and (2.83), one obtains (2.39). This completes the proof. □

CHAPTER 3
DETECTION AND MITIGATION OF DATA MANIPULATION ATTACKS IN AC MICROGRIDS

3.1 Introduction

In the previous chapter, we analyzed the adverse effects of attacks and designed a resilient distributed control mechanism for DMASs with guaranteed performance and consensus under mild assumptions. This chapter validates the effectiveness of the approach developed in Chapter 2 by applying it to distributed frequency and voltage synchronization of AC microgrids under data manipulation attacks. The attack detection mechanism deploys Kullback-Leibler (KL) divergence to measure the discrepancy between the Gaussian distributions of the actual and expected local frequency/active power and voltage/reactive power neighborhood tracking errors. To mitigate the negative impact of an attack, a self-belief value, as an indication of the probability of the presence of attacks on the neighbors of an agent, is presented for each distributed energy resource (DER) by utilizing the KL-based detectors. The self-belief value is a measure of the trustworthiness of the DER's own outgoing information and is transmitted to the neighboring DERs. Moreover, the trustworthiness of the incoming information from neighboring DERs is estimated using a trust factor. Trust for individual DERs is developed based on the relative entropy between a DER's own information and its neighbors' information on the communication graph. The attack mitigation algorithm utilizes the self-belief and trust values to modify the distributed control protocols. Finally, the performance of the proposed resilient frequency and voltage control techniques is verified through simulation of a microgrid test system and a hardware-in-the-loop (HIL) setup using Opal-RT as a real-time digital simulator.

3.2 Preliminaries

The communication network of a microgrid can be modeled by a graph. DERs are considered the nodes of the communication graph, and the communication links are considered the edges. A graph is usually expressed as G = (V, E, A) with a nonempty finite set of N nodes V = {v_1, v_2, …, v_N}, a set of edges or arcs E ⊂ V × V, and the associated adjacency matrix A = [a_ij] ∈ R^{N×N}, where a_ij is the weight of edge (v_j, v_i), with a_ij > 0 if (v_j, v_i) ∈ E and a_ij = 0 otherwise. The set of neighbors of node i is denoted N_i = {j | (v_j, v_i) ∈ E}. The in-degree matrix is defined as D = diag{d_i} ∈ R^{N×N} with d_i = Σ_{j∈N_i} a_ij. The Laplacian matrix is defined as L = D − A [80].

Assumption 1. The communication graph G has a spanning tree.
3.3 Conventional Distributed Secondary Control

In the microgrid hierarchical control structure, the primary control level maintains the voltage and frequency stability of the microgrid, while the secondary control level restores the microgrid voltage and frequency to their nominal values. DERs are integrated into the rest of the microgrid through voltage source inverters (VSIs). Depending on the control objectives, DERs can be of two main types, namely grid-forming and grid-following. Grid-forming DERs utilize a voltage-controlled VSI (VCVSI) and have the capability of dictating the microgrid frequency and voltage. On the other hand, grid-following DERs utilize a current-controlled VSI (CCVSI) and follow the microgrid frequency and voltage while supplying a specified amount of active and reactive power based on external set points [54].

The primary control is locally implemented at grid-forming DERs by the droop technique. This technique prescribes a relation between the frequency ω_i and the active power, and between the voltage magnitude v_{o,mag i} and the reactive power. The frequency and voltage droop characteristics are

ω_i = ω_{ni} − m_{Pi} P_i,
v_{o,mag i} = V_{ni} − n_{Qi} Q_i,   (3.1)

where ω_{ni} and V_{ni} are the primary frequency and voltage control references, and m_{Pi} and n_{Qi} are the active and reactive power droop coefficients, respectively. Conventionally, the active power droop coefficients are selected proportionally to the apparent power ratings of the DERs, whereas the reactive power droop coefficients are selected proportionally to the maximum reactive power, which is calculated using a minimum allowable power factor and the apparent power rating of the DER [46]. The apparent power rating is related to the thermal rating of the DER equipment (e.g., power electronic switches).

The objective of distributed secondary control is to mitigate the microgrid frequency and voltage deviations from their nominal values caused by the primary control. Distributed secondary control utilizes distributed control protocols implemented on individual DERs that communicate with each other through a distributed communication network and share their local information with neighboring DERs.

Problem 1: The distributed secondary control chooses ω_{ni} and V_{ni} in (3.1) such that the operating frequency and terminal voltage magnitude of each DER synchronize to the reference frequency and voltage, ω_ref and v_ref, i.e.,

lim_{t→∞} ||ω_i(t) − ω_ref|| = 0,
lim_{t→∞} ||v_{o,mag i}(t) − v_ref|| = 0  ∀ i ∈ N.   (3.2)

Moreover, the secondary control should guarantee the allocation of the active and reactive power of DERs based on the droop coefficients [51]-[55] as

m_{Pi} P_i = m_{Pj} P_j,   (3.3)
n_{Qi} Q_i = n_{Qj} Q_j,   (3.4)

where the droop coefficients are chosen according to P_max i/Q_max i and P_max j/Q_max j, the active and reactive power ratings of the i-th and j-th DERs, respectively.
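The droop relation (3.1) is a static characteristic and is trivial to evaluate; a minimal sketch with hypothetical coefficients and loading:

```python
import math

def droop(w_n, v_n, m_p, n_q, p, q):
    """Primary droop (3.1): frequency/voltage sag proportional to the
    active power p and reactive power q supplied by the DER."""
    return w_n - m_p * p, v_n - n_q * q

# 60 Hz reference in rad/s; coefficients and operating point are hypothetical.
omega, v = droop(w_n=2 * math.pi * 60, v_n=380.0,
                 m_p=9.4e-5, n_q=1.3e-3, p=12e3, q=8e3)
print(omega / (2 * math.pi), v)   # loaded operating point droops below nominal
```

This droop is precisely the deviation that the distributed secondary control described next must remove by adjusting ω_ni and V_ni.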
The secondary control of a microgrid comprising N DERs is described as a synchronization problem for the following first-order multi-agent system that adjusts the primary control inputs:

ω̇_{ni} = v_{ωi},  V̇_{ni} = v_{vi},  i = 1, …, N,   (3.5)

where v_{ωi} and v_{vi} are the distributed secondary frequency and voltage control (DSFC and DSVC) protocols, which are chosen based on the local information of each DER and its neighbors' information and can be written as [54]

v_{ωi} = −c_ω δ_{ωi},   (3.6)
v_{vi} = −c_v δ_{vi},   (3.7)

where c_ω and c_v are the control gains, and δ_{ωi} and δ_{vi} are the local frequency and voltage neighborhood tracking errors given by

δ_{ωi} = Σ_{j∈N_i} a_ij ( ω_i − ω_j ) + g_i ( ω_i − ω_ref ) + Σ_{j∈N_i} a_ij ( m_{Pi} P_i − m_{Pj} P_j ),   (3.8)

δ_{vi} = Σ_{j∈N_i} a_ij ( v_{o,mag i} − v_{o,mag j} ) + g_i ( v_{o,mag i} − v_ref ) + Σ_{j∈N_i} a_ij ( n_{Qi} Q_i − n_{Qj} Q_j ).   (3.9)

The pinning gain g_i is assumed nonzero for only one DER.

Remark 1. Note that there always exists low-level communication noise in the network of DERs. Therefore, in the presence of communication noise, one can write the auxiliary controls v_{ωi} and v_{vi} of the i-th DER in (3.6) and (3.7) as

ζ_{ωi} = v_{ωi} + η_{ωi},  ζ_{vi} = v_{vi} + η_{vi},   (3.10)

where η_{ωi} ∼ N(0, Σ_{ωi}) and η_{vi} ∼ N(0, Σ_{vi}) denote, respectively, the aggregate Gaussian noise affecting the incoming neighbors' frequency and voltage information at the i-th DER. In general, the noise associated with electronic devices at the receiver end falls under the category of thermal noise and is statistically modeled as Gaussian; we therefore assume the communication noise to be Gaussian, which is a standard assumption in the literature [56]. In noisy scenarios, the synchronization problem for the microgrid frequency and voltage defined in Problem 1 changes to the mean-square synchronization problem

lim_{t→∞} E||ω_i(t) − ω_ref(t)||² = 0,
lim_{t→∞} E||v_{o,mag i}(t) − v_ref(t)||² = 0  ∀ i ∈ N.   (3.11)

3.4 Attack Modeling and Detection Mechanism

This section presents the attack modeling and detection mechanism for the distributed secondary control of the microgrid.

Definition 1 (Compromised DER). A DER that is directly under attack is called a compromised DER.

Definition 2 (Intact DER). A DER that is not compromised, i.e., not under direct attack, is called an intact DER.

3.4.1 Attack Modeling

For a direct attack on the controller, one can model the DER's frequency as

ω^cr_i = ω_i + γ_i ω^a_i,   (3.12)

with ω^a_i the attacker's input injected into the controller of the i-th DER and ω^cr_i the corrupted DER frequency, where the scalar γ_i equals 1 in the presence of an attack. Similarly, for an attack on the communication channel between two DERs, one can model the received corrupted frequency signal from the j-th DER as

ω^cr_j = ω_j + γ_j ω^a_j,   (3.13)

where ω^a_j represents the attacker's input injected into the communication channel between the two DERs and ω^cr_j denotes the corrupted frequency of neighbor j received at the i-th DER, with the scalar γ_j equal to 1 in the presence of an attack.

Remark 2. This subsection discusses the attack model in terms of the DER's frequency, which affects the auxiliary control v_{ωi} in (3.6). The rest of the chapter likewise considers frequency-based attacks and presents the corresponding attack detection and mitigation mechanisms. Without loss of generality, the same approach holds for attack modeling, detection, and mitigation for voltage-based attacks.
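The tracking error (3.8) uses only the DER's own measurements and those of its graph neighbors. A minimal sketch (hypothetical three-DER data, DER 0 pinned to the reference):

```python
import numpy as np

def frequency_tracking_error(i, omega, p, m_p, adj, g, omega_ref):
    """delta_omega_i in (3.8): frequency disagreement + pinning term +
    active-power-sharing mismatch over incoming neighbors."""
    n = len(omega)
    freq = sum(adj[i, j] * (omega[i] - omega[j]) for j in range(n))
    pin = g[i] * (omega[i] - omega_ref)
    power = sum(adj[i, j] * (m_p[i] * p[i] - m_p[j] * p[j]) for j in range(n))
    return freq + pin + power

adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
delta = frequency_tracking_error(0, omega=[376.80, 376.90, 377.00],
                                 p=[10e3, 12e3, 9e3], m_p=[1e-4] * 3,
                                 adj=adj, g=[1.0, 0.0, 0.0],
                                 omega_ref=2 * np.pi * 60)
print(delta)   # the DSFC (3.6) then applies v_omega_0 = -c_omega * delta
```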
Remark 3. The attack models in (3.12)-(3.13) represent frequency manipulation attacks on controllers. Due to the extensive deployment of communication and control technologies and the presence of intelligent electronic devices (IEDs), the microgrid control system is highly vulnerable to cyber-attacks. An attack tree for FDI threat analysis is provided to illustrate the attack path. As seen, an FDI attack can tamper with either the sensors (e.g., phasor measurement units (PMUs)) or the actuators (control and decision-making units). Such attacks can be launched by injecting counterfeit attack signals into the sensors of DER measurement units, by directly injecting a disturbance into the control units, or even by hijacking the entire controller. More specifically, FDI attacks on DERs can endanger microgrid voltage and frequency stability, slow down the DER control system responses, or overload DERs. Existing firewall/intrusion detection systems (IDSs) monitor and analyze the information flow in the network and detect whether there is a considerable change in it. However, no single IDS is able to detect all the different attack types [81]. Moreover, the effectiveness of IDSs highly depends on their parameters: if the IDS parameters are not fine-tuned, the possibility of missing attacks increases [81]. Furthermore, IDSs do not block corrupted information and cannot mitigate attacks. Therefore, it is of vital importance to design a resilient control protocol for microgrids that can mitigate attacks and ensure an acceptable level of functionality for the microgrid despite attacks.

3.4.2 Attack Detection Mechanism

This subsection presents a relative entropy-based attack detection approach for the distributed secondary control of the microgrid. More specifically, the KL divergence, a non-negative measure of the relative entropy between two probability distributions, is employed to measure the discrepancy between them.

Definition 3 (KL divergence) [41], [82]. Let X and Z be two random sequences with probability density functions P_X and P_Z, respectively. The KL divergence between P_X and P_Z in continuous time is defined as

D_KL(X||Z) = ∫ P_X(θ) log( P_X(θ) / P_Z(θ) ) dθ.   (3.14)

If the sequences X and Z are Gaussian distributed, then the KL divergence in (3.14) can be simplified in terms of the means and covariances of the sequences as [41]

D_KL(X||Z) = (1/2)( log( |Σ_Z| / |Σ_X| ) − n + tr( Σ⁻¹_Z Σ_X ) + (µ_Z − µ_X)^T Σ⁻¹_Z (µ_Z − µ_X) ),   (3.15)

where µ_X and Σ_X denote the mean and covariance of sequence X, and µ_Z and Σ_Z denote the mean and covariance of sequence Z. Moreover, n denotes the dimension of the sequences.

For the design of an attack detector, we first rewrite the frequency auxiliary control ζ_{ωi} in (3.10) with its statistical properties and then present an attack detection mechanism based on the KL divergence measure for the distributed secondary control of AC microgrids. We show that, in the presence of an attack, one can identify different sophisticated attacks based on the change in the statistical properties of the auxiliary control variables.
In the absence of attack, since we consider Gaussian noise in the communication channel, the auxiliary control ζ_{ωi} in (3.10) can be written as

ζ_{ωi} = −c_ω δ_{ωi} + η_{ωi},   (3.16)

where η_{ωi} denotes the aggregate Gaussian noise affecting the incoming neighbors' information, given by

η_{ωi} = Σ_{j∈N_i} a_ij η_{ωij} ∼ N(0, Σ_{ωi}).   (3.17)

Due to the presence of noise, the statistical properties of the auxiliary control ζ_{ωi} in (3.10) become ζ_{ωi} ∼ N(0, Σ_{ωi}), which represents the nominal behavior of the DSFC.

In the presence of attacks, using (3.10), the auxiliary control ζ^a_{ωi} becomes

ζ^a_{ωi} = −c_ω δ^cr_{ωi} + η_{ωi},   (3.18)

with the corrupted local neighborhood tracking error

δ^cr_{ωi} = δ_{ωi} + f_i,   (3.19)

where

f_i = ( Σ_{j∈N_i} a_ij + g_i ) ω^a_i − Σ_{j∈N_i} a_ij ω^a_j   (3.20)

denotes the overall deviation in the local neighborhood tracking error due to the attacks on the controllers/communication channels in the network. Note that, in the presence of attacks, one observes the corrupted frequency of the DERs and, based on this corrupted frequency, the corrupted auxiliary control ζ^a_{ωi}. The overall attacker's input f_i is neither measurable nor required to be known. The statistical properties of the corrupted control protocol change due to the effect of attacks. Now, from (3.18)-(3.20), one has

ζ^a_{ωi} ∼ N( µ_{f_i}, Σ_{f_i} + Σ_{ωi} ),   (3.21)

where µ_{f_i} and Σ_{f_i} are the mean and covariance of the injected overall attack signal f_i, respectively. Since both ζ^a_{ωi} and ζ_{ωi} have Gaussian distributions, according to (3.15) the KL divergence D_KL(ζ^a_{ωi}||ζ_{ωi}) between the control sequences becomes

D_KL(ζ^a_{ωi}||ζ_{ωi}) = (1/2)( log( |Σ_{ζωi}| / |Σ_{ζ^a_{ωi}}| ) − 1 + tr( Σ⁻¹_{ζωi} Σ_{ζ^a_{ωi}} ) + (µ_{ζωi} − µ_{ζ^a_{ωi}})^T Σ⁻¹_{ζωi} (µ_{ζωi} − µ_{ζ^a_{ωi}}) ),   (3.22)

where µ_{ζωi} and Σ_{ζωi} denote the mean and covariance of ζ_{ωi}, and µ_{ζ^a_{ωi}} and Σ_{ζ^a_{ωi}} denote the mean and covariance of ζ^a_{ωi}. We define the average KL divergence over a window T as

Ω_i = (1/T) ∫_k^{k+T−1} D_KL(ζ^a_{ωi}||ζ_{ωi}) dτ   (3.23)

to detect the change due to the adversarial input. The following theorem shows that the effect of attacks on the distributed secondary control of the microgrid can be detected based on the discrepancy between the control sequences ζ^a_{ωi} and ζ_{ωi}.

Theorem 1. Consider the distributed auxiliary control ζ_{ωi} in (3.16) under attacks. Then,
a) Ω_i defined in (3.23) becomes zero if there is no attack on the DERs;
b) Ω_i defined in (3.23) is greater than a design threshold γ_i if the microgrid secondary control is under attack.

Proof. In the absence of attacks, the statistical properties of the sequences ζ^a_{ωi} and ζ_{ωi} in (3.18) and (3.16) are the same, because µ_{f_i} and Σ_{f_i} become zero as f_i = 0. Therefore, the KL divergence D_KL(ζ^a_{ωi}||ζ_{ωi}) in (3.22) becomes zero based on (3.15), which yields Ω_i = 0 in (3.23). This completes the proof of part (a).
For the proof of part (b), using (3.18)-(3.21) in (3.22), the KL divergence between ζ^a_{ωi} and ζ_{ωi} becomes

D_KL(ζ^a_{ωi}||ζ_{ωi}) = (1/2)( log( |Σ_{ωi}| / |Σ_{f_i} + Σ_{ωi}| ) + tr( Σ⁻¹_{ωi} Σ_{f_i} ) + µ^T_{f_i} Σ⁻¹_{ωi} µ_{f_i} ).   (3.24)

Then, using (3.23), one has

Ω_i = (1/T) ∫_k^{k+T−1} (1/2)( log( |Σ_{ωi}| / |Σ_{f_i} + Σ_{ωi}| ) + tr( Σ⁻¹_{ωi} Σ_{f_i} ) + µ^T_{f_i} Σ⁻¹_{ωi} µ_{f_i} ) dτ > γ_i,   (3.25)

where T and γ_i denote the sliding window size and the predefined positive design threshold, respectively. This completes the proof. □

Based on Theorem 1, the effect of attacks on the distributed secondary control of microgrids can be detected using the predefined design threshold γ_i. The attack detection in (3.25) uses the idea of averaging over a fixed-length moving window to avoid false detection. If there is a short-period anomaly rather than an attack (such as a disturbance or packet dropout), it vanishes within a few time steps, and such anomalies are not detected as attacks.

3.5 Resilient Distributed Control Mechanism

This section presents a resilient distributed control mechanism for the distributed secondary control of microgrids, based on the attack detection algorithm proposed in the previous section.

Figure 3.1: The flowchart of the proposed attack detection and mitigation approach.

To this end, we first introduce the notions of self-belief and external-belief of DERs about the trustworthiness of their own information and of their neighbors' information, respectively. Then the presented beliefs are incorporated into the distributed secondary control protocols.

3.5.1 Belief of DERs About Their Own Observed Frequency

To measure the level of trustworthiness of each DER about its own observed frequency, which depends on the proximity to the source of the attack in the network, a self-belief is presented. In the presence of an adversary, a DER reduces its level of trustworthiness about its own observed frequency and transmits its self-belief to its immediate neighbors, which prevents the propagation of the attack in the microgrid.

Using D_KL(ζ^a_{ωi}||ζ_{ωi}) from Theorem 1, the self-belief of the i-th DER about its own observed frequency is defined as

I^Bel_i(t) = κ_1 ∫_0^t e^{κ_1(τ−t)} ψ_i(τ) dτ,   (3.26)

where 0 ≤ I^Bel_i(t) ≤ 1, with

ψ_i(t) = Δ_1 / ( Δ_1 + D_KL(ζ^a_{ωi}||ζ_{ωi}) ),   (3.27)

where Δ_1 represents a threshold to account for channel fading and other uncertainties and 0 < κ_1 < 1 denotes the discount factor. Equation (3.26) can be implemented by the differential equation

İ^Bel_i(t) + κ_1 I^Bel_i(t) = κ_1 ψ_i(t).   (3.28)

Based on Theorem 1, in the presence of attacks D_KL(ζ^a_{ωi}||ζ_{ωi}) ≫ Δ_1, which makes ψ_i(t) close to zero and, consequently, the value of I^Bel_i(t) close to zero. On the other hand, based on Theorem 1, in the absence of attack D_KL(ζ^a_{ωi}||ζ_{ωi}) tends to zero, which makes ψ_i(t) close to one and, consequently, I^Bel_i(t) close to one.

If a DER is under direct attack, its self-belief tends to zero according to (3.26). The DER transmits its self-belief value to the neighboring DERs. Using the received self-belief values, the neighboring DERs ignore the information received from the attacked DER, which prevents attack propagation. Note that the discount factor in (3.26) weighs the importance of current information relative to past information.
3.5.2 Belief of DERs About Their Neighbors' Observed Frequency

To evaluate the level of confidence of a DER in its neighbors' observed frequencies, we introduce the notion of external-belief, or trust. If the self-belief value of a DER is low, it forms beliefs on its neighboring DERs' information (either intact or compromised) and updates its external-belief, which depends on the belief about each of its neighbors, using only local information. Therefore, the DERs can identify a compromised neighbor and discard its information in their control protocols. In the worst-case scenario, a compromised DER always transmits a self-belief value of 1 to its neighbors to deceive them; based on the external-belief, a DER can still identify the corrupted neighbors and discard their information. Using the KL divergence between the exchanged information of the i-th DER and its neighbors, one can define Υ_ij(t) as

Υ_ij(t) = κ_2 ∫_0^t e^{κ_2(τ−t)} χ_ij(τ) dτ,   (3.29)

where 0 ≤ Υ_ij(t) ≤ 1 and

χ_ij(t) = ∆_2 / (∆_2 + D_KL(ω_i || m_i)),   ∀j ∈ N_i,   (3.30)

with m_i = (1/|N_i|) Σ_{j∈N_i} ω_j; ∆_2 > 0 represents the threshold accounting for channel fading and other uncertainties, and 0 < κ_2 < 1 denotes the discount factor. For a neighboring DER under direct attack, the KL divergence D_KL(ω_i || m_i) becomes large, which makes χ_ij(t) close to zero. Consequently, the value of Υ_ij(t) becomes close to zero. On the other hand, if the incoming information from the neighboring DER is intact, then D_KL(ω_i || m_i) becomes close to zero, which makes χ_ij(t) close to one. Equation (3.29) can be implemented using the following differential equation

Υ̇_ij(t) + κ_2 Υ_ij(t) = κ_2 χ_ij(t).   (3.31)

Now, we define the external-belief value of a DER about its neighbors as

E^Bel_ij(t) = min(I^Bel_j(t), Υ_ij(t)),   (3.32)

with 0 ≤ E^Bel_ij(t) ≤ 1, where I^Bel_j(t) is the self-belief value received from neighbor j. Note also that the discount factors in (3.26) and (3.29) determine how much we value the current experience relative to past experiences. They also guarantee that if the attack is not persistent and disappears after a while, or if a short-period anomaly rather than an attack (such as a disturbance or packet dropout) occurs, the belief will be recovered, as it mainly depends on the current circumstances.

3.5.3 The Mitigation Mechanism Using Self and External-belief Values

This subsection presents a resilient, i.e., cyber-secure, auxiliary control protocol for the secondary control of the microgrid. We employ the entropy-based self and external-belief values in the mitigation algorithm (see Fig. 3.1). More specifically, both the self and external-belief values in (3.26) and (3.32) are incorporated into the frequency-based auxiliary control in (3.10), and the resilient form is presented as

ζ_ωi = −c_ω ( Σ_{j∈N_i} α_ij(t)(ω_i − ω_j) + g_i(ω_i − ω_ref) + Σ_{j∈N_i} α_ij(t)(m_Pi P_i − m_Pj P_j) ) + η_ωi,   (3.33)

where

α_ij(t) = a_ij I^Bel_i(t) E^Bel_ij(t)   (3.34)

incorporates the self and external-beliefs discussed in the previous subsections.
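A sketch of how the belief values reshape the coupling weights in (3.33)-(3.34) is given below; the noise term η_ωi is omitted for clarity, and all names, gains, and data are illustrative assumptions. Each weight a_ij is scaled by the self-belief and the external belief, so links to distrusted neighbors are smoothly pruned rather than hard-switched.

```python
import numpy as np

def resilient_weights(A_adj, I_bel, E_bel):
    """Belief-weighted adjacency (3.34): alpha_ij = a_ij * I_bel_i * E_bel_ij."""
    N = A_adj.shape[0]
    return A_adj * np.outer(I_bel, np.ones(N)) * E_bel

def resilient_dsfc(omega, P_ratio, A_adj, g, omega_ref, I_bel, E_bel, c_omega=40.0):
    """Resilient frequency auxiliary control (3.33), noise term omitted.
    omega: DER frequencies; P_ratio: active power ratios m_Pi * P_i."""
    alpha = resilient_weights(A_adj, I_bel, E_bel)
    zeta = np.zeros_like(omega)
    for i in range(len(omega)):
        zeta[i] = -c_omega * (alpha[i] @ (omega[i] - omega)          # frequency term
                              + g[i] * (omega[i] - omega_ref)        # pinning term
                              + alpha[i] @ (P_ratio[i] - P_ratio))   # power-sharing term
    return zeta
```

When a neighbor's external belief drops toward zero, its coupling weight α_ij fades out continuously, so the intact DERs keep using the rest of the network without any abrupt topology switch.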
The following theorem solves Problem 1 using the proposed resilient auxiliary control protocol in (3.33) for intact DERs in the presence of attacks.

Assumption 2 (m-local connectivity). If at most m neighbors of each intact DER are under attack, at least m + 1 neighbors of each intact DER are intact [82].

Remark 4. Assumption 2 is a common assumption in the distributed control literature [16], [82]. This assumption provides a minimum requirement for any distributed system to ensure consensus in the presence of attacks.

Theorem 2. Consider the resilient DSFC in (3.33). Let Assumptions 1 and 2 be satisfied. Then, the frequency of the intact DERs synchronizes to the desired nominal frequency in the mean-square sense, despite the m compromised DERs.

Proof. The resilient frequency-based secondary control in (3.33) can be rewritten as

ζ_ωi = −c_ω ( Σ_{j∈N_i} α_ij(t)(ω_i − ω_j) + g_i(ω_i − ω_ref) + Σ_{j∈N_i} α_ij(t)(m_Pi P_i − m_Pj P_j) ) + η_ωi,   (3.35)

where the weight α_ij(t) defined in (3.34) combines the self-belief of agent i and its external belief about agent j. The global form of (3.35) becomes

ζ_ω = −c_ω ( (L(t) + G)(ω − ω̄_ref) + L(t) P̄ ) + η_ω,   (3.36)

where ω = [ω_1, ..., ω_N]^T, ω̄_ref = 1_N ⊗ ω_ref, η_ω = [η_ω1, ..., η_ωN]^T, ζ_ω = [ζ_ω1, ..., ζ_ωN]^T, and P̄ = [m_P1 P_1, ..., m_PN P_N]^T. Moreover, L(t) ∈ R^{N×N} and G ∈ R^{N×N} denote the graph Laplacian matrix and the diagonal gain matrix, with diagonal entries equal to the pinning gains g_i, respectively.

According to Assumption 2, the total number of compromised agents is less than half of the network connectivity, i.e., the connectivity is at least 2m + 1. Therefore, even if m neighbors of an intact DER are attacked and collude to transmit the same value to mislead the intact DER, there still exist m + 1 intact neighbors that transmit the actual values, which differ from the compromised ones. Moreover, since m + 1 of the intact DER's neighbors are intact, it can update its external belief and isolate the compromised neighbors. As shown in [20], the graph obtained after isolating the compromised DERs remains connected over the intact DERs. Therefore, there exists a spanning tree in the graph associated with all intact DERs. On the other hand, it is shown in [83]-[84] that distributed agents reach mean-square consensus in the presence of Gaussian noise if the graph contains a spanning tree. Thus, under the resilient DSFC in (3.33), the intact DERs synchronize to the nominal frequency, i.e., the leader's state. This completes the proof. □

Remark 5. Note that even in the presence of replay attacks, where the attacker replicates all the statistical characteristics of previous control signals for the DER, the intact DERs lose their trust in the compromised DERs due to the divergence term in the calculation of the external-belief in (3.30), and reject the corrupted information in their control protocols.

Remark 6. Although not considered in this chapter, the proposed cyber-secure distributed secondary control can be effectively integrated into event-triggered distributed controls (e.g., [64]) to increase the resilience of the control system with respect to both FDI and DoS attacks.

Figure 3.2: Single line diagram of the microgrid test system in Case A.

Figure 3.3: Communication graph of the microgrid test system in Case A.

3.6 Case Studies

3.6.1 Case A: Simulation results for IEEE 34-bus feeder

The microgrid test system is illustrated in Fig. 3.2. The IEEE 34-bus test feeder is utilized as the backbone of the microgrid, with six DERs integrated at different locations. This microgrid system is simulated in MATLAB/Simulink. The specification of the lines is provided in [85]. A balanced feeder model, obtained by averaging the line parameters, is utilized in the test system. Tables I and II summarize the specifications of the loads and DERs, respectively.
The nominal frequency and line-to-line voltage are set to 60 Hz and 24.9 kV, respectively. The DERs are connected to the feeder through six Y-Y, 480 V/24.9 kV, 400 kVA transformers with a series impedance of 0.03 + j0.12 pu. The communication graph of the distributed secondary control system is depicted in Fig. 3.3. Only DER 1 knows the frequency and voltage reference values, with the pinning gain g_1 = 1. The control gains c_ω and c_v in (3.6) and (3.7) are set to 40. We assume zero-mean Gaussian communication noise with distribution N(0, 0.01). Two different cases are considered to evaluate the presented results for attack detection and mitigation in the distributed secondary control of microgrids: Case A.1 analyzes the results for the DSFC, and Case A.2 presents the results for the DSVC, in the presence of attacks in the microgrid.

Case A.1.1 (effect of attack on the conventional DSFC): In this case, we consider an attack on DER 6 based on (3.12). At t = 0, the microgrid is islanded from the main grid. From t = 0 to t = 0.6 s only the primary control is applied. The primary control takes action to provide frequency stability in the islanded microgrid. However, the primary control only maintains the frequency in stable ranges and cannot maintain the frequency at exactly 60 Hz. Then, the secondary distributed frequency control is applied at t = 0.6 s to restore the microgrid frequency to 60 Hz. However, the attacker hijacks the DSFC of DER 6 and replaces the actual frequency with 60.2 Hz. Fig. 3.4 shows that the conventional DSFC protocol leads to the loss of the desired consensus. The frequency of each DER deviates from the desired frequency of 60 Hz and shows oscillatory behavior. In the presence of the attack, the behavior of the compromised DER 6 is directly affected by the attack signal, and its corrupted frequency is observed by the reachable intact DERs, which are affected by it and also show oscillatory behaviors, as shown in Fig. 3.4. Fig. 3.5 clearly shows that the relative entropy of the compromised and reachable DERs diverges and goes beyond the predefined design threshold (assumed to be γ_i = 5 ∀i) in the presence of the attack. The relative entropy of the compromised DER is much higher than that of the intact DERs, and the designed detector can easily detect the effect of the attack.

Figure 3.4: Case A: Effect of attack on DSFC: (a) frequency; (b) active power ratio.

Figure 3.5: Case A: Relative entropy based on frequency of DERs.

Figure 3.6: Case A: Resilient DSFC: (a) frequency; (b) active power ratio.

Figure 3.7: Case A: Resilient DSFC: (a) relative entropy; (b) self-beliefs of DERs.

Case A.1.2 (attack detection and mitigation): Similar to Case A.1.1, at t = 0, the microgrid is islanded from the main grid. From t = 0 to t = 0.6 s only the primary control is applied. Then, the secondary distributed frequency control is applied at t = 0.6 s to restore the microgrid frequency to 60 Hz. However, the attacker hijacks the DSFC of DER 6 and replaces the actual frequency with 60.2 Hz at t = 0.6 s, and the designed attack detection and mitigation mechanism is applied at t = 0.7 s. As shown in Fig. 3.6, the frequency of the intact DERs is restored to 60 Hz after applying the attack mitigation mechanism at t = 0.7 s. The active power of all DERs is also retrieved as in the intact mode.
After applying the resilient DSFC in (3.33), the intact DERs discard the frequency value received from the corrupted DER, and the mean and variance of their local frequency neighborhood tracking error distributions remain close to the normal case. Therefore, based on (3.22), the relative entropy for the intact DERs remains close to zero, but it keeps growing for the compromised DER 6 due to the deviation of the mean and variance of the corrupted frequency signal from the nominal one, as shown in Fig. 3.7. According to (3.26)-(3.27), the self-belief of a DER depends on its relative entropy, and one can see in Fig. 3.7 that the self-belief becomes one for all DERs except the compromised DER 6, which indicates that all the DERs are confident about their frequencies, except for the compromised one. The self-belief of a DER measures the level of trustworthiness of its observed frequency; it is updated at each iteration and used recursively in the resilient DSFC (3.33) for mitigation of the attack. Based on the presented resilient DSFC, the intact DERs do not incorporate the corrupted frequency from DER 6 and achieve the desired synchronization, as shown in Fig. 3.6.

Case A.1.3 (attack detection and mitigation for periodic adversaries): In this subsection, the effectiveness of the presented attack detection and mitigation algorithm is validated for periodic attacks. The secondary distributed frequency control is applied at t = 0 s, which synchronizes the frequency of the microgrid to 60 Hz. Then, the attacker hijacks the DSFC of DER 6 and replaces the actual frequency with 60.2 Hz at t = 1.2 s and t = 2.2 s. In the following, the simulation results are provided for two different attack durations.

First, it is assumed that when the attack is applied at t = 1.2 s and t = 2.2 s, it is only effective for 0.05 s. Fig. 3.8 shows the DER frequencies and active power ratios. As seen in Fig. 3.8, due to the short duration of the attack, its impact is minimal; the DER frequencies only slightly deviate from 60 Hz. Based on (3.22), the relative entropy for the intact DERs remains close to zero, but it keeps growing for the compromised DER 6 while the attack is effective, due to the deviation of the mean and variance of the corrupted frequency signal from the nominal one, as shown in Fig. 3.9. According to (3.26)-(3.27), the self-belief of a DER depends on its relative entropy; one can see in Fig. 3.9 that the self-belief values during the attack period are one for all DERs except the compromised DER 6, which indicates that all the DERs are confident about their exchanged frequencies, except for the compromised one. As expected, during the intervals in which the attacker turns off its attack signal, the DER frequencies and active power ratios are restored to their intact values before the attack is applied.

Figure 3.8: Effect of periodic attack on DSFC with 0.05 s duration: (a) frequency; (b) active power ratio.

Figure 3.9: Effect of periodic attack on DSFC with 0.05 s duration: (a) relative entropy; (b) self-beliefs of DERs.

In the second simulation scenario, it is assumed that when the attack is applied at t = 1 s and t = 3 s, it is effective for 0.5 s. Fig. 3.10 shows the DER frequencies and active power ratios. As seen in Fig. 3.10, after the attack is applied, the DER frequencies deviate from 60 Hz and the active power ratios experience noticeable oscillations. The relative entropy for the intact DERs remains close to zero, but it keeps growing for the compromised DER 6 during the attack intervals, as shown in Fig. 3.11(a).
As seen in Fig. 3.11(b), the self-belief during the attack periods becomes one for all DERs except the compromised DER 6. The attack mitigation scheme restores the DER frequencies to 60 Hz and the active power ratios to a common value. During the intervals in which the attacker turns off its attack signal, the DER frequencies and active power ratios are restored to their intact values before the attack is applied.

Figure 3.10: Effect of periodic attack on DSFC with 0.5 s duration: (a) frequency; (b) active power ratio.

Figure 3.11: Effect of periodic attack on DSFC with 0.5 s duration: (a) relative entropy; (b) self-beliefs of DERs.

Figure 3.12: Effect of attack on DER 2 in DSVC: (a) voltage (V); (b) reactive power ratio.

Case A.2.1 (effect of attack on the conventional DSVC): In this case, we consider an attack on DER 6 based on (3.12). From t = 0 to t = 0.65 s only the primary control is applied, and then the attacker hijacks the DSVC of DER 6 and replaces the actual voltage with 482 V at t = 0.65 s. In the presence of the attack, the conventional DSVC leads to the loss of the desired consensus, as shown in Fig. 3.12(a) and Fig. 3.12(b). The voltage and reactive power ratio of each DER deviate from the desired consensus and show an oscillatory response. The corrupted voltage magnitude of DER 6 is directly observed by the reachable intact DERs, which are affected by it. The reachable neighboring DERs also show oscillatory behavior in their operating voltage and reactive power, as shown in Fig. 3.12(a). This makes the relative entropy of the compromised and reachable DERs diverge and go beyond the predefined design threshold of γ_i = 5 ∀i in the presence of the attack, as shown in Fig. 3.13.

Figure 3.13: Case A: Relative entropy based on voltage of DERs.

Figure 3.14: Case A: Resilient DSVC: (a) voltage (V); (b) reactive power ratio.

Case A.2.2 (attack detection and mitigation on DSVC): In this case, we consider an attack on DER 6. From t = 0 to t = 0.65 s only the primary control is applied, and then the attacker hijacks the DSVC of DER 6 and replaces the actual voltage with 482 V at t = 0.65 s. As shown in Fig. 3.14(a) and Fig. 3.14(b), the voltages of all DERs except the hijacked one synchronize to 480 V after applying the mitigation mechanism at t = 0.7 s. The reactive power of the DERs is also shared based on their ratings. Fig. 3.15(a) shows that the relative entropy for the intact DERs remains close to zero, but it keeps growing for the compromised DER 6 due to the deviation of the corrupted voltage from the nominal one; consequently, as shown in Fig. 3.15(b), the self-belief becomes one for all DERs except the compromised DER 6.

Figure 3.15: Case A: Resilient DSVC: (a) relative entropy; (b) self-beliefs of DERs.

3.6.2 Case B: Simulation results for an Islanded Microgrid with 20 DERs

Case B verifies the validity of the proposed control techniques on a 60 Hz, 480 V microgrid test system with 20 DERs. The single-line diagram of this microgrid test system is illustrated in Fig. 3.16. This test system is simulated in MATLAB/Simulink. The specifications of the DERs are listed in Table III. Line and load specifications are shown in Table IV. The communication network graph is depicted in Fig. 3.17. The frequency reference value is shared with DER 1, with the pinning gain g_1 = 1. ω_ref is set to 2π × 60 rad/s. The control gain c_ω is set to 40. We assume zero-mean Gaussian communication noise with distribution N(0, 0.01).
This system is used to validate the proposed attack detection and mitigation schemes considering the DSFC.

Case B.1 (effect of attack on the conventional DSFC): We consider an attack on DER 20. At t = 0, the microgrid is islanded from the main grid. From t = 0 to t = 0.7 s only the primary control is applied. The primary control takes action to provide frequency stability in the islanded microgrid. However, the primary control only maintains the frequency in stable ranges and cannot maintain the frequency at exactly 60 Hz. Then, the secondary distributed frequency control is applied at t = 0.7 s to restore the microgrid frequency to 60 Hz. However, the attacker hijacks the DSFC of DER 20 and replaces the actual frequency with 60.2 Hz. Fig. 3.18 shows that the conventional DSFC protocol leads to the loss of the desired consensus. The frequency of each DER deviates from the desired frequency of 60 Hz and shows oscillatory behavior. In the presence of the attack, the behavior of the compromised DER 20 is directly affected by the attack signal, and its corrupted frequency is shared with the neighboring DERs. This causes an oscillatory behavior in the neighboring DERs, as shown in Fig. 3.18(a). Fig. 3.19 shows that the relative entropy of the compromised and neighboring DERs diverges, due to the deviation of their behavior from the nominal one, and goes beyond the predefined design threshold, which is assumed to be γ_i = 5 ∀i.

Figure 3.16: Microgrid testbed with 20 DERs.

Figure 3.17: Communication graph of the microgrid testbed in Case B.

Figure 3.18: Case B: Effect of attack on DSFC: (a) frequency; (b) active power ratio.

Figure 3.19: Case B: Relative entropy based on frequency of DERs.

Figure 3.20: Case B: Resilient DSFC: (a) frequency; (b) active power ratio.

Figure 3.21: Case B: Resilient DSFC: (a) relative entropy; (b) self-beliefs of DERs.

Case B.2 (attack detection and mitigation): Similar to Case B.1, at t = 0, the microgrid is islanded from the main grid. From t = 0 to t = 0.7 s only the primary control is applied. Then, the secondary distributed frequency control is applied at t = 0.7 s. The attacker hijacks the DSFC of DER 20 and replaces the actual frequency with 60.2 Hz at t = 0.7 s, and the designed attack detection and mitigation mechanism is applied at t = 0.75 s. As shown in Fig. 3.20, the frequency of the intact DERs is restored to 60 Hz after applying the attack mitigation mechanism at t = 0.75 s. The active power of all DERs is also retrieved as in the intact mode. After applying the resilient DSFC, the intact DERs discard the locally observed frequency of the corrupted DER. Therefore, the relative entropy for the intact DERs remains close to zero, but it keeps growing for the compromised DER 20, as shown in Fig. 3.21(a). Fig. 3.21(b) shows that the self-belief becomes one for all DERs except the compromised DER 20.

3.6.3 Case C: Experimental verification of proposed techniques using a hardware-in-the-loop testing setup

To experimentally validate the performance of the proposed attack detection and mitigation techniques, a hardware-in-the-loop (HIL) laboratory testbed is developed using Opal-RT as a real-time digital simulator together with Raspberry Pi modules. A microgrid testbed including four DERs is simulated in Opal-RT. The specifications of the DERs, loads, and lines are summarized in Table V. It is assumed that the DERs communicate with each other through the communication graph in Fig. 3.22. The nominal operating voltage and frequency of the microgrid test system are 480 V and 60 Hz, respectively.
The frequency reference value is shared with DER 1, with the pinning gain g_1 = 1. ω_ref is set to 2π × 60 rad/s. The control gain c_ω is set to 40. We assume zero-mean Gaussian communication noise with distribution N(0, 0.01). As seen in Fig. 3.22, four Raspberry Pi modules are utilized in the HIL testing. Each Raspberry Pi module hosts the cyber-secure DSFC protocol for one DER. The Raspberry Pi modules communicate with each other through a distributed communication network. The HIL setup, including Opal-RT, the Raspberry Pi modules, a Gigabit Ethernet switch, and a host computer, is shown in Fig. 3.23. The microgrid electric circuit, including DERs, loads, lines, and primary controllers, is modelled in RT-LAB. The DER local measurements, including the voltage, frequency, and active/reactive power measurements, are sent to the corresponding Raspberry Pi module through the User Datagram Protocol (UDP). Each Raspberry Pi module runs three processes in parallel: receiving real-time DER measurements and sending secondary control references to the DERs, communicating with the neighboring DERs' Raspberry Pi modules, and running the secondary control protocol together with the attack detection and mitigation techniques.

Case C.1 (effect of attack on the conventional DSFC): We consider an attack on DER 2. At t = 0, the microgrid is islanded from the main grid. From t = 0 to t = 30 s only the primary control is applied. Then, the secondary distributed frequency control is applied at t = 30 s to restore the microgrid frequency to 60 Hz. However, the attacker hijacks the DSFC of DER 2 and replaces the actual frequency with 66 Hz. Fig. 3.24(a) and Fig. 3.24(b) show that the conventional DSFC protocol leads to the loss of the desired consensus. The frequency of each DER deviates from the desired frequency of 60 Hz and shows oscillatory behavior. In the presence of the attack, the behavior of the compromised DER 2 is directly affected by the attack signal, and its corrupted frequency is shared with the neighboring DERs. This causes an oscillatory behavior in the neighboring DERs. Fig. 3.25 shows that the relative entropy of the compromised and neighboring DERs diverges due to the deviation of their behavior from the nominal one.

Figure 3.22: Microgrid test system for HIL testing.

Figure 3.23: HIL setup.

Figure 3.24: Case C: Effect of attack on DSFC: (a) frequency; (b) active power ratio.

Figure 3.25: Case C: Relative entropy based on frequency of DERs.

Figure 3.26: Case C: Resilient DSFC: (a) frequency; (b) active power ratio.

Case C.2 (attack detection and mitigation): At t = 0, the microgrid is islanded from the main grid. From t = 0 to t = 30 s only the primary control is applied. Then, the secondary distributed frequency control is applied at t = 30 s. The attacker hijacks the DSFC of DER 2 and replaces the actual frequency with 66 Hz at t = 30 s, and the designed attack detection and mitigation mechanism is applied at the same time. As shown in Fig. 3.26(a) and Fig. 3.26(b), the frequency of the intact DERs is restored to 60 Hz after applying the attack mitigation mechanism. The active power of all DERs is also retrieved as in the intact mode.

3.6.4 Conclusion

This chapter addresses the effects of data manipulation attacks on the distributed secondary frequency and voltage control of AC microgrids. An information-theoretic approach is employed for the design of the detection and mitigation mechanism.
Each DER detects the misbehavior of its neighbors on the distributed communication network and, consequently, calculates a belief reflecting the trustworthiness of the received information. It is shown that, using the proposed cyber-secure approach, a DER can distinguish data manipulation attacks from legitimate events and only discards the information received from a neighbor if that neighbor is compromised. The proposed approach is guaranteed to work under mild communication graph connectivity requirements.

CHAPTER 4

ATTACK ANALYSIS AND RESILIENT CONTROL DESIGN FOR DISCRETE-TIME DISTRIBUTED MULTI-AGENT SYSTEMS

4.1 Introduction

In this chapter, we relax some of the network connectivity assumptions made for the resilient control designs in Chapters 2 and 3, and present a distributed adaptive resilient mechanism to mitigate attacks on sensors and actuators. We first describe, supported by analysis, the adverse effects of cyber-physical attacks on the consensus of the DMAS. Specifically, we show how an attack on a compromised agent can propagate and even destabilize the entire network. Conditions under which the network becomes unstable are provided. We also show that the local neighborhood tracking error of agents becomes zero under specific types of attacks while the agents are far from synchronization. Therefore, existing robust control approaches, such as H∞ designs that aim at minimizing the local neighborhood tracking error, can no longer mitigate these types of attacks. Then, based on the results of the attack analysis, to mitigate the effect of attacks on sensors and actuators, an observer-like anomaly detector is first designed, which provides the expected normal behavior of the agents when there is no attack. An adaptive attack compensator is then designed and augmented with the controller to mitigate attacks without discarding information from compromised or unattacked neighbors. We show that the consensus error is uniformly bounded under the proposed controller in the presence of attacks, and that the bound can be made arbitrarily small.

4.2 Preliminaries

4.2.1 Graph Theory

A directed graph G consists of a pair (V, E), in which the set of nodes and the set of edges are represented by V = {v_1, ..., v_N} and E ⊂ V × V, respectively. The adjacency matrix is defined as A = [a_ij], with a_ij > 0 if (v_j, v_i) ∈ E. The set of nodes v_j with edges incoming to node v_i is called the set of neighbors of node v_i, namely N_i = {v_j : (v_j, v_i) ∈ E}. The graph Laplacian matrix is defined as L = H − A, where H = diag(h_i) is the in-degree matrix, with h_i = Σ_{j∈N_i} a_ij as the weighted in-degree of node i. A node is called a root node if it can reach all other nodes of the graph G through directed paths. A directed tree is an acyclic digraph with a root node, such that any other node of the digraph can be reached by one and only one directed path starting at the root node. A graph is said to have a spanning tree if a subset of its edges forms a directed tree. Throughout this chapter, λ(.) represents the eigenvalues of a matrix, (.)^adj refers to the adjugate of a matrix, and ker(.) denotes the null space. Furthermore, λ_max(.) and λ_min(.) represent the maximum and minimum eigenvalues of a matrix, respectively, and diag(.) denotes a diagonal matrix. If A is an m × n matrix, with a_ij being its entry in the i-th row and j-th column, and B is a p × q matrix, then the Kronecker product A ⊗ B is the mp × nq block matrix given by

A ⊗ B = [ a_11 B  ...  a_1n B ;  ...  ;  a_m1 B  ...  a_mn B ].
Assumption 1. The directed graph G has a spanning tree.

This assumption is a minimum requirement on the graph to guarantee consensus, even in the absence of attacks [91], [27].

4.2.2 Standard Distributed Consensus in MAS

Consider the DMAS with N agents having identical system dynamics represented by

x_i(k + 1) = A x_i(k) + B u_i(k),   ∀i = 1, ..., N,   (4.1)

where x_i(k) ∈ R^n and u_i(k) ∈ R^m are the state and control input of agent i, respectively. A and B are the system and input matrices, respectively, and (A, B) is assumed to be stabilizable. Define the local neighborhood tracking error for agent i as

ε_i(k) = (1 + h_i)^{-1} Σ_{j∈N_i} a_ij (x_j(k) − x_i(k)).   (4.2)

Consider the distributed control law for node i as [91]

u_i(k) = cK ε_i(k),   ∀i = 1, ..., N,   (4.3)

where c is a positive coupling gain and K ∈ R^{m×n} is a control gain, designed to guarantee that the agents reach consensus, i.e., x_i(k) → x_j(k) ∀i, j. Define the global state vector as x(k) = [x_1^T(k), x_2^T(k), ..., x_N^T(k)]^T ∈ R^{nN}. Then, by substituting the controller u_i(k) from (4.3) into the system dynamics (4.1), the global dynamics of the DMAS can be expressed as [27]

x(k + 1) = [I_N ⊗ A − c(I + H)^{-1} L ⊗ BK] x(k).   (4.4)

The solution of (4.4) is then given by

x(k) = [I_N ⊗ A − c L̂ ⊗ BK]^k x(0) ≜ A_c^k x(0),   (4.5)

where L̂ is the normalized graph Laplacian matrix, defined as [91]

L̂ = (I + H)^{-1} L.   (4.6)

Let the eigenvalues of L̂ be λ_i, ∀i = 1, ..., N. Then λ_1 = 0, and λ_i lies inside the unit circle centered at 1 + j0 for i = 2, ..., N [27].

Lemma 1 [27]. Let R ⊂ V be the set of root nodes and r = [p_11, ..., p_1N]^T be the left eigenvector of L̂ for λ_1 = 0. Then, p_1i > 0 if i ∈ R and p_1i = 0 if i ∉ R.

Theorem 1 [91], [27]. Let the feedback gain K be designed such that A − cλ_i BK is Schur stable for i = 2, ..., N. Then, the DMAS reaches consensus and the final consensus value can be written as

x(k) → (r^T ⊗ A^k) [(x_1(0))^T, ..., (x_N(0))^T]^T.   (4.7)
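To make the nominal protocol concrete, the sketch below simulates (4.1)-(4.6) for scalar agent states (A = 1, B = K = 1), forming the normalized Laplacian L̂ = (I + H)^{-1}L and iterating x(k+1) = (I − cL̂)x(k). The graph, gain, and initial conditions are illustrative assumptions.

```python
import numpy as np

# Adjacency matrix of a graph with a spanning tree (Assumption 1):
# a_ij > 0 means agent i receives information from agent j.
A_adj = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
H = np.diag(A_adj.sum(axis=1))                 # in-degree matrix
L = H - A_adj                                  # graph Laplacian
L_hat = np.linalg.inv(np.eye(4) + H) @ L       # normalized Laplacian (4.6)

c = 0.9                                        # coupling gain
x = np.array([1.0, -2.0, 4.0, 0.5])            # initial agent states
for _ in range(200):
    eps = -L_hat @ x                           # local tracking errors (4.2)
    x = x + c * eps                            # scalar case of (4.1) with (4.3)
print(np.round(x, 4))                          # all entries agree: consensus reached
```

The final common value is the weighted average of the initial states under the left eigenvector r, in line with (4.7).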
4.3 Attack Analysis for Discrete-time DMAS

In this section, we model false-data injection attacks on sensors and actuators, and analyze their adverse effects on DMASs. Attacks on the actuators of agent i can be modeled as

u^c_i(k) = u_i(k) + γ_i u^a_i(k),   (4.8)

where u_i is the control law given in (4.3), u^a_i represents the attacker's signal injected into the actuators of agent i, u^c_i is the distorted control law applied to (4.1), and the scalar γ_i is 1 when there is an attack on the actuators of agent i and 0 otherwise.

Attacks on the sensors of agent i can be modeled as

x^c_i(k) = x_i(k) + δ_i x^a_i(k),   (4.9)

where x_i represents the state of agent i, x^a_i is the attacker's signal injected into the sensors of agent i, x^c_i is the distorted state, and the scalar δ_i is 1 when there is an attack on the sensors of agent i and 0 otherwise.

Based on the distributed control law (4.3), and using (4.8) and (4.9) in (4.1), one can express the DMAS dynamics for agent i as

x_i(k + 1) = A x_i(k) + B u_i(k) + B f_i(k),   ∀i = 1, ..., N,   (4.10)

where f_i(k) represents the overall attack signal injected into agent i, given by

f_i(k) = c(1 + h_i)^{-1} K Σ_{j=1}^N a_ij (δ_j x^a_j(k) − δ_i x^a_i(k)) + γ_i u^a_i(k).   (4.11)

Definition 1. In a graph, agent i is reachable from agent j if there exist [v_1, v_2, ..., v_l] ∈ V such that a_{j v_1} a_{v_1 v_2} ... a_{v_l i} ≠ 0 for some l ≥ 0, i.e., there is a directed path of length l + 1 from node j to node i.

Definition 2 (Compromised and unattacked agent). We call an agent that is directly under attack a compromised agent. An agent is called unattacked if it is not compromised. We denote the set of unattacked agents by N_Int, i.e., N_Int = N \ N_Comp, where N_Comp denotes the set of compromised agents.

In control systems, the internal model principle (IMP) states that the controller must incorporate a model of the dynamics that generate the signals the control system is supposed to track. We show in Theorem 2 that the attacker can also leverage the IMP and incorporate some eigenvalues of the consensus dynamics into its attack design to destabilize the entire network.

Definition 3 (IMP-based and non-IMP-based attacks). Let the attacker design its attack signal f_i(k) on the sensors and/or actuators of the compromised agent i as

f_i(k + 1) = W f_i(k),   (4.12)

with W ∈ R^{m×m} as the attacker's dynamics. Define

Λ_W = [λ_W1, ..., λ_Wm],   Λ_A = [λ_A1, ..., λ_An],   (4.13)

as the sets of eigenvalues of W and of the system dynamics matrix A, respectively. Then, if Λ_W ⊆ Λ_A, the attack signal is called an IMP-based attack. Otherwise, if Λ_W ⊄ Λ_A or the attacker has no dynamics (e.g., a random signal), it is called a non-IMP-based attack.

Remark 1. Note that, in this chapter, attack signals are not restricted to IMP-based attacks designed based on the dynamics (4.12). Attacks are categorized into the two classes of Definition 3 based on their effects on the network stability, and it will be shown that IMP-based attacks can destabilize the network while non-IMP-based attacks cannot. Non-IMP-based attacks cover a broad range of attacks. The resilient approach presented in Section 4.4 works for both IMP-based and non-IMP-based attacks. □

Remark 2. Note that if the attack is only on the actuators of agent i, i.e., f_i(k) = u^a_i(k), then the attacker can design u^a_i(k) = W u^a_i(k − 1) as an IMP-based attack signal that follows the dynamics in (4.12). If the attack is only on the sensors of agent i, then f_i(k) = −c(1 + h_i)^{-1} K x^a_i(k), and the attacker can design x^a_i(k) = W x^a_i(k − 1) to follow the dynamics in (4.12). Note that the scalar coefficient cannot change the common modes of the dynamics. Hence, (4.12) can model attacks on both sensors and actuators. □

We assume that the system matrix A in (4.1) is marginally stable, with eigenvalues on the unit circle centered at the origin. This is a standard assumption in the literature on consensus and synchronization problems [29]. Define

S(k) = Σ_{j=1}^N p_1j f_j(k),   (4.14)

where p_1j represents the element of the left eigenvector corresponding to the zero eigenvalue of L̂. S(k) in (4.14) is used in Theorem 2 to analyze the effects of attacks on root nodes and non-root nodes. Note that S(k) = Σ_{j=1}^N p_1j f_j(k) represents the product of the left eigenvector corresponding to the zero eigenvalue of the graph Laplacian matrix L̂ and the attack vector f(k). Based on Lemma 1, the elements of the left eigenvector corresponding to non-root nodes are zero. Thus, for an attack on a non-root node, S(k) = 0 regardless of the attack. On the other hand, the elements of the left eigenvector corresponding to root nodes are nonzero, and thus S(k) ≠ 0 if there is an attack on a root node.
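The distinction drawn in Definition 3 is easy to reproduce numerically: an IMP-based attack embeds a marginally stable eigenvalue of A in W, so the attack signal persists, while a non-IMP-based signal generated by strictly stable dynamics decays. Below is a minimal sketch; the rotation matrix and initial condition are illustrative assumptions.

```python
import numpy as np

theta = 0.1
# A marginally stable oscillatory mode with eigenvalues exp(+/- j*theta),
# assumed shared with the agent dynamics A for the IMP-based case.
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])

def attack_sequence(W, f0, steps):
    """Generate the attack signal f(k+1) = W f(k) as in (4.12)."""
    f, out = np.array(f0, dtype=float), []
    for _ in range(steps):
        out.append(f.copy())
        f = W @ f
    return np.array(out)

f_imp = attack_sequence(rot, [1.0, 0.0], 500)        # IMP-based: Lambda_W in Lambda_A
f_non = attack_sequence(0.7 * rot, [1.0, 0.0], 500)  # non-IMP: eigenvalues inside unit circle
print(np.abs(f_imp[-1]).max())   # magnitude persists (unit-circle eigenvalues)
print(np.abs(f_non[-1]).max())   # magnitude has decayed essentially to zero
```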
Theorem 2. Consider the DMAS (4.10) under the attack f_i(k) with the control protocol (4.3). Let f_i(k) be designed as in (4.12). Then,

1. An IMP-based attack destabilizes the entire network if S(k) ≠ 0, i.e., if the attack is on a root node.

2. Any non-IMP-based attack, or an IMP-based attack with S(k) = 0, deviates the agents from the desired consensus behavior but does not cause instability, if the agents are reachable from the compromised one.

Proof. The transfer function of the DMAS (4.1), from u_i(z) to x_i(z) in the Z-domain, can be written as

G(z) = x_i(z)/u_i(z) = (zI − A)^{-1} B.   (4.15)

Using (4.3), the global control law under the influence of the attack can be expressed as

u(z) = −(c L̂ ⊗ K) x(z) + f(z),   (4.16)

with u(z) = [u_1^T, ..., u_N^T]^T, x(z) = [x_1^T, ..., x_N^T]^T, and f(z) = [f_1^T, ..., f_N^T]^T. Using (4.15) and (4.16), the system state in global form can be written as

x(z) = (I_N ⊗ G(z)) u(z) = (I_N ⊗ G(z)) (−(c L̂ ⊗ K) x(z) + f(z)),   (4.17)

where G(z) ∈ R^{n×m}. Let M be a non-singular matrix such that L̂ = M Λ M^{-1}, with Λ the Jordan canonical form of L̂. The left and right eigenvectors of L̂ corresponding to the zero eigenvalue are r and 1_N, respectively [27]. Define M = [1_N  M_1] and M^{-1} = [r  M_2^T]^T, where M_1 ∈ R^{N×(N−1)} and M_2 ∈ R^{(N−1)×N}. Then, using (4.17) with L̂ = M Λ M^{-1}, one has

(M ⊗ I_n) [I_{nN} + cΛ ⊗ G(z)K] (M^{-1} ⊗ I_n) x(z) = (I_N ⊗ G(z)) f(z).   (4.18)

Defining the state transformation x̂(z) = (M^{-1} ⊗ I_n) x(z) and premultiplying (4.18) by (M^{-1} ⊗ I_n) gives

x̂(z) = [I_{nN} + cΛ ⊗ G(z)K]^{-1} (M^{-1} ⊗ G(z)) f(z).   (4.19)

Assume for simplicity that all the Jordan blocks are simple, i.e., M^{-1} = [p_ij] and M = [m_ij], where p_ij and m_ij represent the elements of the matrices M^{-1} and M. Then, for agent i, using (4.19) and the fact that the first eigenvalue of L̂ is zero with corresponding right eigenvector 1_N, i.e., m_i1 = 1, one has

x_i(z) = G(z) Σ_{j=1}^N p_1j f_j(z) + Σ_{h=2}^N m_ih [I_n + cλ_h G(z)K]^{-1} G(z) Σ_{j=1}^N p_hj f_j(z).   (4.20)

We now show that [I_n + cKG(z)λ_h]^{-1} is Schur, so that to check the stability of the agents under attacks one only needs to analyze the first term of (4.20). To this end, since (A − cλ_h BK), ∀h = 2, ..., N, is Schur, if we show that the roots of the characteristic polynomial of (A − cλ_h BK) are identical to the poles of [I_n + cKG(z)λ_h]^{-1}, then [I_n + cKG(z)λ_h]^{-1} is also Schur. Using (4.15), one has

det(zI_n − (A − cλ_h BK)) = det(zI_n − A + cλ_h BK) = det(zI_n − A) det(I_n + cλ_h (zI_n − A)^{-1} BK) = det(zI_n − A) [det(zI_n − A) + cλ_h (zI_n − A)^{adj} BK] / det(zI_n − A).   (4.21)

Hence, the roots of the characteristic polynomial of (A − cλ_h BK) are identical to the poles of [I_n + cKG(z)λ_h]^{-1}, and thus [I_n + cKG(z)λ_h]^{-1} is Schur.

To analyze the boundedness of the first term in (4.20), note that, according to Lemma 1, Σ_{j=1}^N p_1j f_j(k) in (4.20), which is identical to S(k) in (4.14), is zero for an attack on non-root nodes and nonzero if the attack is launched on a root node.

Consider now an IMP-based attack on a root node. Then, using the transfer function (4.15) and the attack signal defined in (4.12), one can write (4.20) as

x_i(z) = Σ_{j=1}^N p_1j [ (zI_n − A)^{adj} B (zI_n − W)^{adj} f_i(0) ] / [ (z² + λ²_{A_l})² ∏_{i=1, i≠l}^n (z² + λ²_{A_i}) ∏_i (z² + λ²_{W_i}) ] + Σ_{h=2}^N m_ih [1 + cKG(z)λ_h]^{-1} G(z) Σ_{j=1}^N p_hj f_j(z),   (4.22)

with λ_{A_l} the marginal eigenvalue of the system dynamics A, i.e., λ_{A_l} lies on the unit circle centered at the origin. Since the first term of (4.22) shows that the pole λ_{A_l} lies on the unit circle centered at the origin and has multiplicity greater than one due to the IMP-based attack, the system states tend to infinity as k → ∞. Therefore, attacks on root nodes destabilize the entire network, in the sense that the states of all agents go to infinity as time tends to infinity. This completes the proof of part 1.

To prove part 2, note that, based on Lemma 1, if the attack is on a non-root node, then S(k) = Σ_{j=1}^N p_1j f_j(k) is zero.
Therefore, the first term in (4.20) vanishes, and x_i(z) can be expressed as

x_i(z) = Σ_{h=2}^N m_ih [1 + cKG(z)λ_h]^{-1} G(z) Σ_{j=1}^N p_hj f_j(z).   (4.23)

According to (4.21), [I_n + cKG(z)λ_h]^{-1} is Schur stable. Therefore, based on (4.23), and since agent i is unattacked itself, which implies G(z) is also Schur, the system states are bounded, even in the presence of attacks. However, although agents that are reachable from a compromised agent show stable behavior, their states deviate from the desired consensus value, based on the result of Theorem 1 in [94]. This completes the proof. □

Disturbance attenuation approaches focus on minimizing the effects of disturbances on the local neighborhood tracking error [92]. More specifically, the H∞ approach for the DMAS (4.1) in the presence of a disturbance w_i(k) designs a distributed control protocol as in (4.3) such that the desired consensus (4.7) is achieved when w_i(k) = 0 and the bounded L2-gain condition

Σ_{k=0}^∞ ε^T(k) M̄ ε(k) ≤ γ² Σ_{k=0}^∞ w^T(k) N̄ w(k)   (4.24)

is fulfilled for any disturbance w_i(k) ∈ L2[0, ∞), where γ > 0 is the attenuation constant, and M̄ and N̄ are positive definite weight matrices.

Lemma 2. Consider the normalized graph Laplacian matrix L̂ defined in (4.6). Then, [L̂^T L̂ − 2L̂] is negative semidefinite.

Proof. Let λ_k be an eigenvalue of the normalized graph Laplacian matrix L̂. The eigenvalues of [L̂^T L̂ − 2L̂] for an undirected graph can be written as

λ[L̂^T L̂ − 2L̂] = λ_k² − 2λ_k = (λ_k − 1)² − 1,   ∀k = 1, ..., N.   (4.25)

Since all eigenvalues of L̂ lie inside the unit circle centered at 1 + j0, except λ_1 = 0 [27], (λ_k − 1)² − 1 is less than or equal to zero for k = 1, ..., N. This shows that [L̂^T L̂ − 2L̂] is negative semidefinite. □

In the following theorem, for the sake of simplicity, we consider single-integrator dynamics in global form, given by

x(k + 1) = x(k) + u(k).   (4.26)

Under the influence of the attack, one can write the control input u(k) in (4.26) as

u(k) = −L̂ x(k) + f(k).   (4.27)

Theorem 3. Consider the DMAS with single-integrator dynamics (4.26). Assume that the system is under a constant attack signal f(k). Then, ε_i(k) → 0, ∀i ∈ N_Int, while the agents do not reach the desired consensus.

Proof. Consider the Lyapunov function for the discrete-time DMAS

V(x(k), f(k)) = (−L̂x(k) + f(k))^T (−L̂x(k) + f(k)).   (4.28)

For the system (4.26) under the constant attack signal f(k + 1) = f(k) with the control input (4.27), one has

ΔV(x(k), f(k)) = (−L̂[x(k) − L̂x(k) + f(k)] + f(k))^T (−L̂[x(k) − L̂x(k) + f(k)] + f(k)) − (−L̂x(k) + f(k))^T (−L̂x(k) + f(k)).   (4.29)

After simplifying (4.29) and using Lemma 2, one has

ΔV(x(k), f(k)) = (−L̂x(k) + f(k))^T [L̂^T L̂ − 2L̂] (−L̂x(k) + f(k)) ≤ 0.   (4.30)

Then, using LaSalle's invariance principle [93], the trajectories (x(k), f(k)) converge to a set that satisfies ΔV(x(k), f(k)) = 0. Based on (4.30), this yields

(−L̂x(k) + f(k)) ∈ ker(L̂^T L̂ − 2L̂)   (4.31)

or

(−L̂x(k) + f(k)) = 0.   (4.32)

From (4.31), one has (−L̂x(k) + f(k)) = c̄ 1_N. Accordingly, the single-integrator dynamics become x_i(k + 1) = x_i(k) + c̄, which destabilizes the system. Therefore, x_i(k) → ∞ as k → ∞, ∀i = 1, ..., N, while the local neighborhood tracking error goes to zero for all agents. Note that, based on Theorem 2, (4.31) is the possible case when the attack is on a root node. On the other hand, for an attack on a non-root node, from (4.32), one has (−L̂x(k) + f(k)) = 0. Since f_i(k) = 0 for an unattacked agent i, the local neighborhood tracking error of the unattacked agents converges to zero, even in the presence of the attack.

We now show that the unattacked agents do not reach the desired consensus, despite the local neighborhood tracking error being zero. From (4.32), one has L̂x(k) = f(k), which can be written for agent i as

(1 + h_i)^{-1} Σ_{j=1}^N a_ij (x_j(k) − x_i(k)) = f_i(k).   (4.33)

For a compromised agent i, since f_i(k) ≠ 0, one has x_i(k) ≠ x_j(k) for some i, j. Now assume that agent i is unattacked, i.e., f_i(k) = 0. Then, based on (4.33), one has

(1 + h_i)^{-1} Σ_{j=1}^N a_ij (x_j(k) − x_i(k)) = 0.   (4.34)

Consider the unattacked agent i as an immediate neighbor of the compromised agent i_c. Assume by contradiction that only the compromised agent fails to reach the desired consensus while all the unattacked agents reach it. Using (4.34), one can write

(1 + h_i)^{-1} ( Σ_{j∈N_i} a_ij (x_j − x_i) + a_{i i_c} (x_{i_c} − x_i) ) = 0.   (4.35)

If the unattacked agents reached consensus, then x_i(k) = x_j(k), ∀j ∈ N_i. However, (4.35) cannot be satisfied with x_i(k) = x_j(k), ∀j ∈ N_i, because x_{i_c}(k) ≠ x_i(k), which contradicts the assumption. Therefore, the unattacked agent i deviates from the desired consensus value. The same argument shows that all agents reachable from the compromised agent deviate from the desired consensus value. This completes the proof. □
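Theorem 3 can be checked numerically for the single-integrator network (4.26)-(4.27). In the sketch below (the chain graph and the attack value are illustrative assumptions), a constant attack is injected at a non-root node: the local neighborhood tracking errors of the unattacked agents converge to zero, while the states settle away from a common value, so consensus is lost exactly as the theorem predicts.

```python
import numpy as np

A_adj = np.array([[0, 0, 0, 0],      # node 0: root (receives nothing)
                  [1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0]], dtype=float)
H = np.diag(A_adj.sum(axis=1))
L_hat = np.linalg.inv(np.eye(4) + H) @ (H - A_adj)

f = np.array([0.0, 0.0, 0.5, 0.0])   # constant attack on non-root node 2
x = np.array([0.0, 1.0, 2.0, 3.0])
for _ in range(2000):
    x = x + (-L_hat @ x) + f         # (4.26) with the corrupted input (4.27)

eps = -L_hat @ x                     # local neighborhood tracking errors
print(np.round(eps, 6))              # zero for all unattacked agents (Theorem 3)
print(np.round(x, 3))                # states differ: the desired consensus is lost
```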
Corollary 1. Let the attacker design its attack signal using the IMP-based approach described in Theorem 2. Then, it bypasses the H∞ control protocol.

Proof. In the absence of attacks, minimizing the local neighborhood tracking error results in minimizing the consensus error. Therefore, the H∞ control in (4.24) is used to attenuate the effect of the adversarial input on the local neighborhood tracking error. However, according to Theorem 3, in the presence of an IMP-based attack the local neighborhood tracking error goes to zero while the agents do not reach consensus. This completes the proof. □

4.4 Resilient Distributed Control Protocol for Attacks on Sensors and Actuators: An Adaptive Approach

In this section, the expected normal behavior of each agent is predicted using an observer-like predictor (called here the expected state predictor), which employs the agent's dynamics to predict its expected normal state at each time step. This expected state predictor does not use any actual state measurements; instead, it calculates the expected normal state of the agent based on the evolution rule of its dynamics, taking into account the local information it receives from its neighbors. Then, a distributed adaptive compensator is designed using the predicted behavior of the agents to compensate for any discrepancy between the actual state and its predicted normal one.

Denote the estimated state of agent i by x̂_i(k). The distributed expected state predictor is designed as

x̂_i(k + 1) = A x̂_i(k) + cBK(1 + h_i)^{-1} Σ_{j=1}^N a_ij (x̂_j − x̂_i),   (4.36)

where the gain K and the coupling coefficient c are designed to ensure that A_c in (4.5) is Schur. The global expected state predictor vector for (4.36) can be written as x̂(k) = [x̂_1^T(k), x̂_2^T(k), ..., x̂_N^T(k)]^T ∈ R^{nN}.
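Because the predictor (4.36) runs only on the model and the neighbors' predictor states, never on measured states, an attack cannot enter it. A minimal per-step sketch follows; the function signature is an illustrative assumption.

```python
import numpy as np

def predictor_step(x_hat, A, B, K, A_adj, c):
    """One step of the expected state predictor (4.36) for all agents.
    x_hat: (N, n) array of predictor states; no measured states are used."""
    N = x_hat.shape[0]
    h = A_adj.sum(axis=1)                                       # weighted in-degrees
    x_next = np.empty_like(x_hat)
    for i in range(N):
        eps_hat = A_adj[i] @ (x_hat - x_hat[i]) / (1.0 + h[i])  # predictor error (4.38)
        x_next[i] = A @ x_hat[i] + c * B @ (K @ eps_hat)        # (4.36)
    return x_next
```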
Lemma 3. Consider the N expected state predictors given in (4.36). Let the feedback gain K and the coupling coefficient c be designed to ensure that A_c in (4.5) is Schur. Then, the expected state predictor state x̂(k) converges to the desired consensus value.

Proof. The designed expected state predictor in (4.36) can be expressed as

x̂_i(k + 1) = A x̂_i(k) + B û_i(k),   (4.37)

where û_i(k) = cK ε̂_i(k), with the local neighborhood tracking error

ε̂_i(k) = (1 + h_i)^{-1} Σ_{j=1}^N a_ij (x̂_j − x̂_i).   (4.38)

One can write the global expected state predictor dynamics as x̂(k + 1) = A_c x̂(k), which yields x̂(k) = A_c^k x̂(0) ∈ R^{nN}. Since A − cλ_i BK is Schur stable, with λ_i the eigenvalues of the normalized graph Laplacian matrix L̂ for i = 2, ..., N and λ_1 = 0, the expected state predictor states achieve the desired consensus value. □

Remark 3. Note that a broad class of DMASs includes the leader-follower and the containment control problems (i.e., DMASs with multiple leaders), for which Lemma 3 is valid even if x̂_i(0) ≠ x_i(0). This is because the reference trajectory to be followed by the agents is determined only by the leaders, which are assumed to be trusted through the use of more advanced sensors and a larger investment in security. The system (4.36) acts as a reference model for the agents, and if x̂_i(0) ≠ x_i(0), then even for an unattacked DMAS, d_i in (4.42) will be nonzero until the difference between the initial conditions vanishes. The agents converge to the desired behavior irrespective of the initial values. □

We now design a distributed resilient control protocol as

u_{i,r}(k) = u_i(k) + u_{i,comp}(k),   (4.39)

where u_i(k) represents the standard control protocol defined in (4.3) and u_{i,comp}(k) represents the distributed adaptive compensator term responsible for rejecting the adversarial input. Consider the feedback gain K in the control protocol (4.3) given by

K = (R_1 + B^T P_1 B)^{-1} B^T P_1 A = R̄_1^{-1} B^T P_1 A,   (4.40)

where R_1 is a positive definite design matrix and P_1 is the solution of

A^T P_1 A − P_1 − A^T P_1 B (R_1 + B^T P_1 B)^{-1} B^T P_1 A = −Q_1,   (4.41)

with a positive definite matrix Q_1. The designed distributed control protocol is given by

u_{i,r}(k) = cK ε̄_i(k) − d_i(k),   (4.42)

where d_i(k) is the estimated response of the adaptive compensator and K is the gain given by (4.40) and (4.41). The local neighborhood tracking error ε̄_i(k) in (4.42) is given by

ε̄_i(k) = (1 + h_i)^{-1} Σ_{j=1}^N a_ij (x^c_j(k) − x^c_i(k)).   (4.43)

The update law for the distributed adaptive compensator is designed as

d_i(k + 1) = θ cK (ε̂_i(k) − ε̄_i(k)) + θ d_i(k),   (4.44)

where θ > 0 is a design parameter, and ε̄_i(k) and ε̂_i(k) are defined in (4.43) and (4.38), respectively.

According to Lemma 3, the expected state predictor converges to the desired consensus value. Therefore, consensus of the DMAS can be achieved by showing the convergence of the agent state x_i(k) to the predicted state x̂_i(k). Define the consensus error

x̃(k) = x(k) − x̂(k).   (4.45)

In the following theorem, we show that the consensus error remains bounded under the proposed resilient adaptive controller.
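One step of the resilient protocol (4.42)-(4.44) for a single agent can be sketched as follows: the compensator d_i is driven by the gap between the predictor tracking error (4.38) and the possibly corrupted measured tracking error (4.43). The signature is illustrative, and θ must be chosen to satisfy the bound derived in the theorem below.

```python
import numpy as np

def resilient_control_step(eps_bar_i, eps_hat_i, d_i, K, c, theta):
    """Resilient control (4.42) and compensator update (4.44) for agent i.
    eps_bar_i: measured (possibly corrupted) tracking error (4.43)
    eps_hat_i: expected-state-predictor tracking error (4.38)"""
    u_i = c * (K @ eps_bar_i) - d_i                                   # (4.42)
    d_next = theta * c * (K @ (eps_hat_i - eps_bar_i)) + theta * d_i  # (4.44)
    return u_i, d_next
```

When there is no attack and the initial conditions match, ε̄_i tracks ε̂_i and the compensator stays near zero; any persistent discrepancy injected by an attacker is absorbed into d_i and subtracted from the control.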
Theorem 4. Consider the DMAS (4.10) under attacks on sensors and actuators. Let the control protocol be given by (4.42)-(4.44). Then, the consensus error in (4.45) is bounded, i.e., ‖x̃(k)‖ ≤ b_0 for some bound b_0, and this bound can be made arbitrarily small, despite the attack.

Proof. According to Lemma 3, the expected state predictor converges to the desired consensus value. Therefore, consensus of the discrete-time DMAS can be achieved by showing the convergence of the agent state x_i(k) to the predicted state x̂_i(k). With (4.10) and (4.37), one can write x̃(k + 1) as

x̃(k + 1) = (I_N ⊗ A − c L̂ ⊗ BK) x̃(k) − (I_N ⊗ B) d̃(k),   (4.46)

where

d̃(k) = d(k) − f(k)   (4.47)

denotes the attack rejection error, with d(k) = [d_1^T(k), d_2^T(k), ..., d_N^T(k)]^T ∈ R^{mN} the global adaptive compensator vector, and the dynamics of the attack f(k) defined in (4.12). Using (4.44), the global dynamics of the adaptive compensator can be written as

d(k + 1) = θc (L̂ ⊗ R̄_1^{-1} B^T P_1 A) x̃(k) + θ d̃(k) + θ f̄(k),   (4.48)

where R̄_1 = R_1 + B^T P_1 B and f̄(k) = 2f(k) − (γ ⊗ I_N) u^a. Note that f̄(k) = f(k) only if the actuators of an agent are compromised. Define Q_2 = Q_2^T > 0 as Q_2 = cR_2(I + H)^{-1} L = cR_2 L̂, with some positive definite matrix R_2. Let the real part of the minimum nonzero eigenvalue of the normalized graph Laplacian matrix L̂ be λ_m. Define the Lyapunov candidate function

V(k) = x̃^T(k)(Q_2 ⊗ P_1) x̃(k) + θ^{-2} d̃^T(k)(R_2 ⊗ R̄_1) d̃(k).   (4.49)

The difference of the Lyapunov candidate function can be written as

ΔV(k) = V(k + 1) − V(k) = [x̃^T(k + 1)(Q_2 ⊗ P_1) x̃(k + 1) − x̃^T(k)(Q_2 ⊗ P_1) x̃(k)] (part 1) + [θ^{-2} d̃^T(k + 1)(R_2 ⊗ R̄_1) d̃(k + 1) − θ^{-2} d̃^T(k)(R_2 ⊗ R̄_1) d̃(k)] (part 2).   (4.50)

Using (4.46), part 1 of (4.50) can be expressed as

part 1 = x̃^T(k)(Q_2 ⊗ A^T P_1 A − 2c Q_2 L̂ ⊗ A^T P_1 BK + c² L̂^T Q_2 L̂ ⊗ (BK)^T P_1 BK − Q_2 ⊗ P_1) x̃(k) − 2 x̃^T(k)[Q_2 ⊗ A^T P_1 B − c L̂^T Q_2 ⊗ (BK)^T P_1 B] d̃(k) + d̃^T(k)(Q_2 ⊗ B^T P_1 B) d̃(k).   (4.51)

Using Young's inequality, one can further simplify and bound (4.51) as

part 1 ≤ −x̃^T(k)(Q_2 ⊗ Q_1) x̃(k) − x̃^T(k)((−Q_2 + 2c Q_2 L̂) ⊗ A^T P_1 BK) x̃(k) + 2c² λ_min(L̂^T L̂) λ_min(T Q_1^{-1}) x̃^T(k)(Q_2 ⊗ Q_1) x̃(k) − 2 x̃^T(k)(Q_2 ⊗ A^T P_1 B) d̃(k) + 2 d̃^T(k)(Q_2 ⊗ B^T P_1 B) d̃(k),   (4.52)

where T = K^T B^T P_1 B K. We now consider part 2 of (4.50),

part 2 = θ^{-2} d̃^T(k + 1)(R_2 ⊗ R̄_1) d̃(k + 1) − θ^{-2} d̃^T(k)(R_2 ⊗ R̄_1) d̃(k),   (4.53)

where R̄_1 = (R_1 + B^T P_1 B) is a positive definite matrix. Using (4.47), one can express (4.53) as

part 2 = (1/θ²) [d^T(k + 1)(R_2 ⊗ R̄_1) d(k + 1) − 2 d^T(k + 1)(R_2 ⊗ R̄_1) f(k + 1) + f^T(k + 1)(R_2 ⊗ R̄_1) f(k + 1) − d̃^T(k)(R_2 ⊗ R̄_1) d̃(k)].   (4.54)

Using the dynamics of the distributed adaptive compensator (4.48) in (4.54), one has

part 2 = x̃^T(k)(c L̂^T Q_2 ⊗ K^T B^T P_1 A) x̃(k) + 2 d̃^T(k)(Q_2 ⊗ B^T P_1 A) x̃(k) + 2 [f̄(k) − θ^{-1} f(k + 1)]^T (Q_2 ⊗ B^T P_1 A) x̃(k) + (1 − θ^{-2}) d̃^T(k)(R_2 ⊗ R̄_1) d̃(k) + [f̄(k) − θ^{-1} f(k + 1)]^T (R_2 ⊗ R̄_1) d̃(k) + [f̄(k) − θ^{-1} f(k + 1)]^T (R_2 ⊗ R̄_1) [f̄(k) − θ^{-1} f(k + 1)].   (4.55)

Using Young's inequality, one can simplify (4.55) as

part 2 ≤ (3/2) x̃^T(k)(c Q_2 L̂ ⊗ A^T P_1 BK) x̃(k) + 2 d̃^T(k)(Q_2 ⊗ B^T P_1 A) x̃(k) + (2 − θ^{-2}) d̃^T(k)(R_2 ⊗ R̄_1) d̃(k) + 4 [f̄(k) − θ^{-1} ψ(k) f(k)]^T (R_2 ⊗ R̄_1) [f̄(k) − θ^{-1} ψ(k) f(k)],   (4.56)

where ψ(k) captures how the value of the attack signal changes at the next time instant. If the attack signal is constant, i.e., f(k + 1) = f(k), then ψ(k) = 1. One can infer that ψ(k) is always bounded, i.e., |ψ(k)| < ζ ∀k. Combining (4.52) and (4.56), and simplifying further, one has

ΔV ≤ −x̃^T(k)(Q_2 ⊗ Q_1) x̃(k) − x̃^T(k)((−Q_2 + (1/2) c Q_2 L̂) ⊗ A^T P_1 BK) x̃(k) + 2c² λ_min(L̂^T L̂) λ_min(T Q_1^{-1}) x̃^T(k)(Q_2 ⊗ Q_1) x̃(k) − (θ^{-2} − 2 − 2λ_min(c L̂ B^T P_1 B R̄_1^{-1})) d̃^T(k)(R_2 ⊗ R̄_1) d̃(k) + 4 [f̄(k) − θ^{-1} ζ f(k)]^T (R_2 ⊗ R̄_1) [f̄(k) − θ^{-1} ζ f(k)].   (4.57)
One can show that ΔV ≤ 0 if the coupling coefficient satisfies

(1/λ_m) √(2 λ_min(T Q_1^{-1})) < c < 2/λ_m   and   ‖d̃(k)‖ > 4 ‖f̄(k) − θ^{-1} ζ f(k)‖ / (θ^{-2} − 2 − 2 λ_min(c L̂ B^T P_1 B R̄_1^{-1})).   (4.58)

The design parameter θ can be chosen such that θ < √( 1 / (2 + 2 λ_min(c L̂ B^T P_1 B R̄_1^{-1})) ), and then one can ensure the bound in (4.58). This shows that the consensus error is bounded, i.e., ‖x̃(k)‖ ≤ b_0 for some bound b_0. Therefore, the actual agent state x(k) achieves the desired consensus behavior with a bounded error that can be made arbitrarily small by an appropriate selection of the design parameter θ. This completes the proof. □

Remark 4. The coupling coefficient c needs to lie in a certain range, which depends on λ_m and λ_min(T Q_1^{-1}). This condition is standard in the DMAS literature [91]. On the other hand, the condition for the bound on d̃(k) in (4.58) depends on the design parameter θ, and one can select this parameter to satisfy (4.58), which ensures ΔV ≤ 0. Thus, the bound on the consensus error can be made arbitrarily small through the selection of the design parameter θ. Moreover, this bound is conservative, and, as shown in the simulation results, the consensus error almost goes to zero. □

Remark 5. Compromised agents under a sensor attack might not be recovered completely, which results in a nonzero bound on the consensus error defined in (4.45). The proposed distributed adaptive law compensates for the difference between the incoming neighboring sensor measurements x^c_i(k) and the desired state x̂_i(k); note that x^c_i(k) ≠ x_i(k) in the case of a sensor attack. Under an actuator attack, x^c_i(k) = x_i(k), and the bound on the consensus error can be made arbitrarily small. □

4.5 Simulation Results

We consider a leader-follower network of autonomous underwater vehicles (AUVs) for the evaluation of the presented results. The communication network in Fig. 4.1 considers Sentry AUVs, manufactured by the Woods Hole Oceanographic Institution [95], as agents. The linearized model of the Sentry has 6 DOF, but it is generally decomposed into four non-interacting subsystems: the speed subsystem (u), the roll subsystem (φ), the steering subsystem (ν, r, ψ), and the diving subsystem (ω, q, z, θ). Here, we focus on the diving subsystem of the Sentry AUV for desired depth maneuvering in the leader-follower network. The diving subsystem of the Sentry AUV follows the dynamics in (4.1), where

A = [ 0.65  0.54  0.0   0.0019 ;  0.21  1.48  0.0  0.83 ;  0.84  1.0  0.11  1.21 ;  0.0  0.01  0.99  0.99 ]   and   B = [ 0.08  0.13 ;  −0.13  0.20 ;  0.02  0.09 ;  −0.07  0.09 ],   (4.59)

with x_i(k) = [ω_i(k), q_i(k), z_i(k), θ_i(k)]^T and u_i(k) = [δ^b_i(k), δ^s_i(k)]^T, where δ^b_i(k) and δ^s_i(k) represent the bow and stern plane deflections, and ω_i(k), q_i(k), z_i(k), θ_i(k) represent the heave speed, pitch rate, depth, and pitch, respectively.

Figure 4.1: Graph topology.

In the communication graph, agent 0 represents an active non-autonomous leader, which aims to follow a desired sinusoidal depth trajectory, and agents 1 to 5 designate the followers. The leader has the control input u_0(k) = K_0 x_0(k) + r(k), where K_0 is a state feedback gain, x_0 denotes the leader state, and r(k) represents the desired sinusoidal trajectory. The state feedback gain is K_0 = [−0.18  −2.25  0.13  −0.21 ;  1.56  5.39  0.49  1.59].
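The diving-subsystem matrices in (4.59) can be entered directly to reproduce this setup. In the sketch below, the entries are transcribed row-major from (4.59) (this placement is an assumption forced by the extraction), and the open-loop spectrum of A is inspected.

```python
import numpy as np

# Sentry AUV diving subsystem (4.59); entries transcribed in reading order.
A = np.array([[0.65, 0.54, 0.0,  0.0019],
              [0.21, 1.48, 0.0,  0.83  ],
              [0.84, 1.0,  0.11, 1.21  ],
              [0.0,  0.01, 0.99, 0.99  ]])
B = np.array([[ 0.08, 0.13],
              [-0.13, 0.20],
              [ 0.02, 0.09],
              [-0.07, 0.09]])
print(np.abs(np.linalg.eigvals(A)))  # magnitudes of the open-loop eigenvalues
```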
Since the leader input is nonzero, a slightly different discrete-time control protocol from the one proposed in this chapter is used, in which the leader exchanges its input signal u_0 with its neighbors and the agents reach consensus by exchanging states and the leader's input. This, however, does not change our attack analysis and mitigation.

Now, the effect of multiple attacks on the network is analyzed. We consider attacks on the actuators of Agent 2 and Agent 3 with attack signals u^a_2(k) = [30, 30]^T and u^a_3(k) = [20 sin(k), 20 sin(k)]^T, respectively, at t = 40 s. Fig. 4.2 shows that agents reachable from the compromised Agents 2 and 3 deviate from the desired behavior. This verifies the results of Theorem 2. Fig. 4.3 illustrates the response of the system under the influence of multiple attacks using the proposed controller, with Q_1 and R_1 set to identity matrices in (4.40) and (4.41), respectively. The system states achieve the desired consensus behavior, even in the presence of the attacks. This result demonstrates the effectiveness of the proposed resilient controller of Theorem 4 against multiple attacks. Note that the result in Fig. 4.3 also shows that this approach is not limited to a particular attack model; the attack signal can be constant or time-varying. Compared to existing work such as [18], the presented approach brings the compromised agents back into the network. However, approaches such as [18] also work for attacks on the communication network, while the presented approach is limited to attacks on sensors and actuators.

Figure 4.2: The agents' depth trajectories under the influence of the attacks on AUVs 2 and 3, without the adaptive compensator.

Figure 4.3: The agents' depth trajectories under the influence of the attacks on AUVs 2 and 3, with the adaptive compensator.

4.6 Conclusion

This chapter analyzes the effects of attacks on leaderless DMASs and designs an adaptive resilient distributed control protocol for attack mitigation. It is shown how an IMP-based attack on a root node can destabilize the entire network. To overcome the effect of attacks on sensors and actuators, a resilient controller is developed based on an expected state predictor. The presented controller shows that attacks on sensors and actuators can be mitigated without compromising the connectivity of the network, while achieving the desired consensus. Although not considered in this chapter, attacks on the communication links can be handled by integrating our approach with the existing resilient methods presented in [10]-[11]. The presented approach also works for leader-follower problems in which the leaders are assumed to be trusted.

CHAPTER 5

SECURE EVENT-TRIGGERED DISTRIBUTED KALMAN FILTERS FOR STATE ESTIMATION OVER WIRELESS SENSOR NETWORKS

5.1 Introduction

Motivated by the results on resilient designs presented in the previous chapters, we consider the problem of secure state estimation for distributed sensor networks. This chapter analyzes the adverse effects of attacks and designs a resilient event-triggered distributed state estimation approach that can perform accurate state estimation despite attacks. More specifically, we first show that the attacker can cause non-triggering misbehavior, so that the compromised sensors do not broadcast any information to their neighbors.
This can significantly harm the network connectivity and its collective observability, which is a necessary condition for solving the distributed state estimation problem. We then show that an attacker can achieve continuous-triggering misbehavior, which drains the communication resources and degrades performance. To detect adversarial intrusions, a Kullback-Leibler (KL) divergence based detector is presented; the divergence is estimated via a k-nearest neighbors approach to obviate the restrictive Gaussian assumption on the probability density function of the attack signal. Finally, based on the attack detection results, a meta-Bayesian approach is employed to mitigate attacks on the event-triggered DKF: it performs second-order inference to form confidence and trust about the truthfulness or legitimacy of the outcome of its own first-order inference (i.e., the posterior belief about the state estimate) and those of its neighbors, respectively. Each sensor communicates its confidence to its neighbors and incorporates the trust about its neighbors into its posterior update law, putting less weight on untrusted data and thus successfully discarding corrupted information.

5.2 Preliminaries

The data communication among sensors in a WSN is captured by an undirected graph G, consisting of a pair (V, E), where V = {1, 2, . . . , N} is the set of nodes (sensors) and E ⊂ V × V is the set of edges. An edge from node j to node i, represented by (j, i), implies that node j can broadcast information to node i. Moreover, N_i = {j : (j, i) ∈ E} is the set of neighbors of node i on the graph G. An induced subgraph G_w is obtained by removing a set of nodes W ⊂ V from the original graph G; it is represented by the node set V\W and contains the edges of E with both endpoints in V\W.

Throughout this chapter, R and N represent the sets of real numbers and natural numbers, respectively. Aᵀ denotes the transpose of a matrix A. tr(A) and max(a_i) represent the trace of a matrix A and the maximum value in a set, respectively. C(S) represents the cardinality of a set S. σ_max(A), λ_max(A), and I_n represent the maximum singular value of A, the maximum eigenvalue of A, and the identity matrix of dimension n, respectively. U(a, b) with a < b denotes a uniform distribution on the interval (a, b). p_X(x) denotes the probability density of the random variable or vector x, with X taking values in the finite set {0, . . . , p}. When a random variable X is distributed normally with mean ν and variance σ², we use the notation X ∼ N(ν, σ²). E[X] and Σ_X = E[(X − E[X])(X − E[X])ᵀ] denote, respectively, the expectation and the covariance of X. Finally, E[·|·] represents the conditional expectation.

5.2.1 Process Dynamics and Sensor Models

Consider a process that evolves according to

  x(k + 1) = A x(k) + w(k),   (5.1)

where A denotes the process dynamics matrix, and x(k) ∈ Rⁿ and w(k) are, respectively, the process state and the process noise at time k. The process noise w(k) is assumed to be independent and identically distributed (i.i.d.) with a Gaussian distribution, and x₀ ∼ N(x̂₀, P₀) represents the initial process state with mean x̂₀ and covariance P₀. The goal is to estimate the state x(k) of the process (5.1) in a distributed fashion using N sensor nodes that communicate through the graph G, with sensing models

  y_i(k) = C_i x(k) + v_i(k), ∀i = 1, . . . , N,   (5.2)

where y_i(k) ∈ Rᵖ represents the measurement data, v_i(k) is the i.i.d. Gaussian measurement noise, and C_i is the observation matrix of sensor i.
Assumption 1. The process noise w(k), the measurement noise v_i(k), and the initial state x₀ are uncorrelated random vector sequences.

Assumption 2. The sequences w(k) and v_i(k) are zero-mean Gaussian noises with

  E[w(k) w(h)ᵀ] = μ_kh Q   and   E[v_i(k) v_i(h)ᵀ] = μ_kh R_i,

with μ_kh = 0 if k ≠ h and μ_kh = 1 otherwise. Moreover, Q ≥ 0 and R_i > 0 denote the noise covariance matrices of the process and measurement noise, respectively, and both are finite.

Definition 1 (Collectively observable) [106]. We call the plant dynamics (5.1) and the measurement equation (5.2) collectively observable if the pair (A, C_S) is observable, where C_S is the stacked column vector of C_j, ∀j ∈ S, with S ⊆ V and C(S) > N/2.

Assumption 3. The plant dynamics (5.1) and the measurement equation (5.2) are collectively observable, but not necessarily locally observable, i.e., (A, C_i), ∀i ∈ V, is not necessarily observable.

Assumptions 1 and 2 are standard in Kalman filtering. Assumption 3 states that the state of the target in (5.1) need not be observable from the measurements of any single sensor, i.e., the pairs (A, C_i) need not be observable (see, for instance, [106] and [132]). It also provides the collective observability condition necessary for the estimation problem to be solvable. Also note that under Assumption 2, i.e., finite process and measurement covariances, the stochastic observability rank condition coincides with deterministic observability [Theorem 1, 43]. Therefore, the deterministic observability rank condition holds irrespective of the process and measurement noise.

5.2.2 Overview of Event-triggered Distributed Kalman Filter

This subsection presents an overview of the event-triggered DKF for estimating the process state x(k) in (5.1) from a collection of noisy measurements y_i(k) in (5.2). Let the prior and posterior estimates of the target state x(k) for sensor node i at time k be denoted by x_i(k|k−1) and x_i(k|k), respectively. In the centralized Kalman filter, a recursive rule based on Bayesian inference is employed to compute the posterior estimate x_i(k|k) from the prior estimate x_i(k|k−1) and the new measurement y_i(k). When the next measurement arrives, the previous posterior estimate is used as the new prior, and the same recursive estimation rule proceeds. In the event-triggered DKF, the recursion rule for computing the posterior incorporates not only the sensor's own prior and observations, but also its neighbors' predictive state estimates. Sensor i communicates its prior state estimate to its neighbors only if the norm of the error between the actual output and the predicted output exceeds a threshold after a new observation arrives. That is, the exchange of data with neighbors is governed by the event-triggered mechanism

  ‖y_i(k) − C_i x̃_i(k−1)‖ < α,   (5.3)

where α denotes a predefined event-triggering threshold. Moreover, x̃_i(k) denotes the predictive state estimate of sensor i and follows the update law

  x̃_i(k) = ζ_i(k) x_i(k|k−1) + (1 − ζ_i(k)) A x̃_i(k−1), ∀i ∈ V,   (5.4)

with ζ_i(k) ∈ {0, 1} as the transmit function. Note that the predictive state estimate update (5.4) depends on the value of the transmit function ζ_i(k), which is either zero or one depending on the triggering condition (5.3). When ζ_i(k) = 1, the prior and predictive state estimates coincide, i.e., x̃_i(k) = x_i(k|k−1).
When ζ_i(k) = 0, however, the predictive state estimate depends on its own previous state estimate, i.e., x̃_i(k) = A x̃_i(k−1). Incorporating (5.4), the following recursion rule is used to update the posterior state estimate in the event-triggered DKF [112], [114] for sensor i:

  x_i(k|k) = x_i(k|k−1) + K_i(k)(y_i(k) − C_i x_i(k|k−1)) + γ_i Σ_{j∈N_i} (x̃_j(k) − x̃_i(k)),   (5.5)

where

  x_i(k|k−1) = A x_i(k−1|k−1)   (5.6)

is the prior update. Moreover, the second and third terms in (5.5) denote, respectively, the innovation part (i.e., the estimation error based on sensor i's new observation and its prior prediction) and the consensus part (i.e., the deviation of the sensor's state estimate from its neighbors' state estimates). We call this recursion rule the Bayesian first-order inference on the posterior, which provides the belief over the value of the state.

Moreover, K_i(k) and γ_i in (5.5) denote, respectively, the Kalman gain and the coupling coefficient. The Kalman gain K_i(k) in (5.5) depends on the estimation error covariance matrices associated with the prior x_i(k|k−1) and the posterior x_i(k|k) of sensor i. Define the prior and posterior estimation error covariances as

  P_i(k|k−1) = E[(x(k) − x_i(k|k−1))(x(k) − x_i(k|k−1))ᵀ],
  P_i(k|k) = E[(x(k) − x_i(k|k))(x(k) − x_i(k|k))ᵀ],   (5.7)

which are simplified as [112], [114]

  P_i(k|k) = M_i(k) P_i(k|k−1) M_i(k)ᵀ + K_i(k) R_i K_i(k)ᵀ   (5.8)

and

  P_i(k|k−1) = A P_i(k−1|k−1) Aᵀ + Q,   (5.9)

with M_i(k) = I_n − K_i(k) C_i. The Kalman gain K_i(k) is then designed to minimize the estimation covariance and is given by [112], [114]

  K_i(k) = P_i(k|k−1) C_iᵀ (R_i(k) + C_i P_i(k|k−1) C_iᵀ)⁻¹.   (5.10)

Let the innovation sequence r_i(k) for node i be defined as

  r_i(k) = y_i(k) − C_i x_i(k|k−1),   (5.11)

where r_i(k) ∼ N(0, Ω_i(k)) with

  Ω_i(k) = E[r_i(k) r_i(k)ᵀ] = C_i P_i(k|k−1) C_iᵀ + R_i(k).   (5.12)

Note that, for notational simplicity, we henceforth denote the prior and posterior state estimates as x_i(k|k−1) ≜ x̄_i(k) and x_i(k|k) ≜ x̂_i(k), respectively; the prior and posterior covariances are denoted by P_i(k|k−1) ≜ P̄_i(k) and P_i(k|k) ≜ P̂_i(k). The event-triggered DKF algorithm becomes

Time updates:
  x̄_i(k+1) = A x̂_i(k),   (a)
  P̄_i(k+1) = A P̂_i(k) Aᵀ + Q.   (b)   (5.13)

Measurement updates:
  x̂_i(k) = x̄_i(k) + K_i(k)(y_i(k) − C_i x̄_i(k)) + γ_i Σ_{j∈N_i} (x̃_j(k) − x̃_i(k)),   (a)
  x̃_i(k) = ζ_i(k) x̄_i(k) + (1 − ζ_i(k)) A x̃_i(k−1),   (b)
  K_i(k) = P̄_i(k) C_iᵀ (R_i(k) + C_i P̄_i(k) C_iᵀ)⁻¹,   (c)
  P̂_i(k) = M_i(k) P̄_i(k) M_i(k)ᵀ + K_i(k) R_i(k) K_i(k)ᵀ.   (d)   (5.14)

Remark 1. Based on the result presented in [17, Th. 1], the event-triggered DKF (5.13)-(5.14) ensures that the estimation error x̂_i(k) − x(k) is exponentially bounded in the mean-square sense, ∀i ∈ V.

Remark 2. The consensus gain γ_i in (5.5) is designed such that the stability of the event-triggered DKF (5.13)-(5.14) is guaranteed. Specifically, as shown in [Theorem 2, 19], if

  γ_i = 2(I − K_i C_i) Γ_i⁻¹ / (λ_max(L) λ_max(Γ⁻¹)),

where L denotes the Laplacian matrix associated with the graph G and Γ = diag{Γ₁, . . . , Γ_N} with Γ_i = (I − K_i C_i)ᵀ Aᵀ (P̄_i)⁺ A (I − K_i C_i), ∀i ∈ {1, . . . , N}, then the stability of the event-triggered DKF (5.13)-(5.14) is guaranteed. However, the design of the event-triggered DKF itself is not the concern of this chapter; this chapter mainly analyzes the adverse effects of cyber-physical attacks on the event-triggered DKF and proposes an information-theoretic attack detection and mitigation mechanism. Note also that the presented attack analysis and mitigation can be extended to other event-triggered methods, such as [113] and [115].
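To make the recursion concrete, the following Python sketch simulates the process (5.1) and sensors (5.2) and runs the event-triggered DKF (5.13)-(5.14) on a small network. It is a minimal sketch: the process and sensor matrices follow the simulation study of Section 5.6, while the ring graph, horizon, and coupling coefficient gamma are illustrative assumptions rather than values from the text.

import numpy as np

rng = np.random.default_rng(0)
N, n, p = 4, 2, 2
A = np.array([[np.cos(np.pi/200), -np.sin(np.pi/200)],
              [np.sin(np.pi/200),  np.cos(np.pi/200)]])
C = [np.array([[5.0, 0.0], [0.0, 2.0]]) for _ in range(N)]
Q, R = np.eye(n), [np.eye(p) for _ in range(N)]
neighbors = {i: [(i - 1) % N, (i + 1) % N] for i in range(N)}  # assumed ring graph
alpha, gamma = 1.8, 0.05                                       # threshold; assumed coupling

x = np.array([0.5, 0.0])
xbar = [np.zeros(n) for _ in range(N)]   # prior estimates  \bar x_i
xhat = [np.zeros(n) for _ in range(N)]   # posterior estimates \hat x_i
xtil = [np.zeros(n) for _ in range(N)]   # predictive estimates \tilde x_i
Pbar = [np.eye(n) for _ in range(N)]

for k in range(300):
    x = A @ x + rng.multivariate_normal(np.zeros(n), Q)       # process (5.1)
    y = [C[i] @ x + rng.multivariate_normal(np.zeros(p), R[i]) for i in range(N)]
    for i in range(N):                                        # triggering (5.3)-(5.4)
        zeta = float(np.linalg.norm(y[i] - C[i] @ xtil[i]) >= alpha)
        xtil[i] = zeta * xbar[i] + (1 - zeta) * (A @ xtil[i])
    for i in range(N):                                        # measurement updates (5.14)
        K = Pbar[i] @ C[i].T @ np.linalg.inv(R[i] + C[i] @ Pbar[i] @ C[i].T)
        consensus = sum(xtil[j] - xtil[i] for j in neighbors[i])
        xhat[i] = xbar[i] + K @ (y[i] - C[i] @ xbar[i]) + gamma * consensus
        M = np.eye(n) - K @ C[i]
        Phat = M @ Pbar[i] @ M.T + K @ R[i] @ K.T
        xbar[i] = A @ xhat[i]                                 # time updates (5.13)
        Pbar[i] = A @ Phat @ A.T + Q

print("estimation errors:", [float(np.linalg.norm(xhat[i] - x)) for i in range(N)])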
5.2.3 Attack Modeling

In this subsection, we model the effects of attacks on the event-triggered DKF. An attacker can design a false data injection attack to affect the triggering mechanism in (5.3) and consequently compromise the system behavior.

Definition 2 (Compromised and intact sensor node). We call a sensor node that is directly under attack a compromised sensor node. A sensor node is called intact if it is not compromised. Throughout the chapter, V_c and V\V_c denote, respectively, the sets of compromised and intact sensor nodes.

Consider the sensing model (5.2) for sensor node i under attack,

  y^a_i(k) = y_i(k) + f_i(k) = C_i x(k) + v_i(k) + f_i(k),   (5.15)

where y_i(k) and y^a_i(k) are, respectively, sensor i's actual and corrupted measurements and f_i(k) ∈ Rᵖ represents the adversarial input on sensor node i. For a compromised sensor node i, let p′ ⊆ p be the subset of measurements disrupted by the attacker. Let the false data injection attack f̄_j(k) on the communication link be given by

  x̄^a_j(k) = x̄_j(k) + f̄_j(k), ∀j ∈ N_i.   (5.16)

Using (5.15)-(5.16), in the presence of an attack on sensor node i and/or its neighbors, the state estimate equations (5.13)-(5.14) become

  x̂^a_i(k) = x̄^a_i(k) + K^a_i(k)(y_i(k) − C_i x̄^a_i(k)) + γ_i Σ_{j∈N_i} (x̃^a_j(k) − x̃^a_i(k)) + f^a_i(k),
  x̄^a_i(k+1) = A x̂^a_i(k),
  x̃^a_i(k) = ζ_i(k) x̄^a_i(k) + (1 − ζ_i(k)) A x̃^a_i(k−1),   (5.17)

where

  f^a_i(k) = K^a_i(k) f_i(k) + γ_i Σ_{j∈N_i} f̃_j(k),   (5.18)

with

  f̃_j(k) = ζ_j(k) f̄_j(k) + (1 − ζ_j(k)) f̃_j(k−1).   (5.19)

The Kalman gain K^a_i(k) in the presence of attack is given by

  K^a_i(k) = P̄^a_i(k) C_iᵀ (R_i(k) + C_i P̄^a_i(k) C_iᵀ)⁻¹.   (5.20)

The first part of (5.18) represents the direct attack on sensor node i and the second part denotes the aggregate effect of adversarial inputs on the neighboring sensors j ∈ N_i. Moreover, x̂^a_i(k), x̄^a_i(k), and x̃^a_i(k) denote, respectively, the corrupted posterior, prior, and predictive state estimates. The Kalman gain K^a_i(k) depends on the corrupted prior state estimation error covariance

  P̄^a_i(k+1) = A P̂^a_i(k) Aᵀ + Q,   (5.21)

where the evolution of the corrupted posterior state estimation error covariance P̂^a_i(k) is given in the following theorem.

Theorem 1. Consider the process dynamics (5.1) with the compromised sensor model (5.15). Let the state estimation equation be given by (5.17) in the presence of attacks modeled by f^a_i(k) in (5.18). Then, the corrupted posterior state estimation error covariance is

  P̂^a_i(k) = M^a_i(k) P̄^a_i(k) M^a_i(k)ᵀ + K^a_i(k)[R_i(k) + Σ^f_i(k)] K^a_i(k)ᵀ − 2K^a_i(k) Ξ^f(k)
             + 2γ_i Σ_{j∈N_i} (P̆^a_{i,j}(k) − P̆^a_i(k)) + γ_i² Σ_{j∈N_i} (P̃^a_j(k) − 2P̃^a_{i,j}(k) + P̃^a_i(k)),   (5.22)

where Σ^f_i(k) and Ξ^f(k) denote covariance matrices that depend on the attacker's input, and M^a_i(k) = I_n − K^a_i(k) C_i, with K^a_i(k) the Kalman gain and P̄^a_i(k) the prior state estimation error covariance updated according to (5.20) and (5.21), respectively. Moreover, P̃^a_{i,j}(k) and P̆^a_{i,j}(k) are cross-correlated estimation error covariances updated according to (6)-(8).

Proof. See Appendix A.

Note that the corrupted state estimation error covariance recursion P̂^a_i(k) in (5.22) depends on the attacker's input distribution. Since the state estimation depends on the compromised estimation error covariance P̂^a_i(k), the attacker can design its attack signal to blow up the estimates of the desired process state and damage the system performance.
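A minimal sketch of the attack model: the functions below implement the measurement attack (5.15) and the communication-link attack (5.16) as additive injections. The particular attack signals shown (a constant bias and a sinusoidal link injection) are hypothetical choices by the adversary, used only for illustration.

import numpy as np

def attacked_measurement(y, f):       # (5.15): y_i^a(k) = y_i(k) + f_i(k)
    return y + f

def attacked_link(xbar_j, f_bar):     # (5.16): xbar_j^a(k) = xbar_j(k) + fbar_j(k)
    return xbar_j + f_bar

k = 40
y_a = attacked_measurement(np.array([1.0, -0.5]), np.array([30.0, 30.0]))
xbar_a = attacked_link(np.array([0.2, 0.1]), 20 * np.sin(k) * np.ones(2))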
5.3 Effect of Attack on Triggering Mechanism

This section presents the effects of cyber-physical attacks on the event-triggered DKF. We show that although event-triggered approaches are energy efficient, they are prone to triggering misbehaviors that can harm the network connectivity and observability and drain its limited resources.

5.3.1 Non-triggering Misbehavior

In this subsection, we show how an attacker can manipulate the sensor measurement to mislead the event-triggered mechanism and damage network connectivity and collective observability by causing non-triggering misbehavior, defined as follows.

Definition 3 (Non-triggering Misbehavior). The attacker designs an attack strategy such that a compromised sensor node does not transmit any information to its neighbors by misleading the triggering mechanism in (5.3), even if the actual performance deviates from the desired one.

The following theorem shows how a false data injection attack, preceded by an eavesdropping attack, can manipulate the sensor reading so that the event-triggered mechanism (5.3) is never violated, while the actual performance can be far from the desired one. To this end, we first define the vertex cut of a graph.

Definition 4 (Vertex cut). A set of nodes C ⊂ V is a vertex cut of a graph G if removing the nodes in C results in disconnected graph clusters.

Theorem 2. Consider the process dynamics (5.1) with N sensor nodes (5.2) communicating over the graph G. Let sensor i be under the false data injection attack

  y^a_i(k) = y_i(k) + θ^a_i(k) 1_p, ∀k ≥ L + 1,   (5.23)

where y_i(k) is the actual sensor measurement at time k and L denotes the last triggering time instant. Moreover, θ^a_i(k) ∼ U(a(k), b(k)) is a scalar uniformly distributed random variable on the interval (a(k), b(k)), with

  a(k) = −ϕ + ‖C_i x̃_i(k−1)‖ − ‖y_i(k)‖,
  b(k) = ϕ + ‖C_i x̃_i(k−1)‖ − ‖y_i(k)‖,   (5.24)

where x̃_i(k) is the predictive state estimate and ϕ < α is an arbitrary scalar less than the triggering threshold α. Then,

1. the triggering condition (5.3) is never violated for sensor node i, i.e., it shows non-triggering misbehavior;
2. the original graph G is clustered into several subgraphs if all sensors in a vertex cut are under the attack (5.23).

Proof. Taking norms on both sides of (5.23), the corrupted sensor measurement satisfies

  ‖y^a_i(k)‖ = ‖y_i(k) + θ^a_i(k) 1_p‖.   (5.25)

Using the triangle inequality in (5.25) yields

  ‖y_i(k)‖ − ‖θ^a_i(k) 1_p‖ ≤ ‖y^a_i(k)‖ ≤ ‖y_i(k)‖ + ‖θ^a_i(k) 1_p‖.   (5.26)
which yields This implies that the condition (cid:13)(cid:13)ya i (k) − Ci ˜xi(k − 1)(cid:13)(cid:13) ≤ ϕ < α, (5.27) (5.28) (5.29) always holds true. Therefore, under (5.23)-(5.24), the corrupted sensor node i shows non-triggering misbehavior, which proves part 1. We now prove part 2. Let An ⊆ Vc be the set of sensor nodes showing non-triggering misbe- havior. Then, based on the presented result in part 1, under the attack signal (5.23), sensor nodes in the set An are misled by the attacker and consequently do not transmit any information to their neighbors which make them to act as sink nodes. Since the set of sensor nodes An is assumed to be a vertex cut. Then, the non-triggering misbehavior of sensor nodes in An prevents information flow from one portion of the graph G to another portion of the graph G and thus clusters the original graph G into subgraphs. This completes the proof. Remark 3. Note that to design the presented strategic false data injection attack signal given in (5.23) an attacker needs to eavesdrop the actual sensor measurement yi(k) and the last transmitted prior state estimate ¯xi(L) through the communication channel. The attacker then determines the predictive state estimate ˜xi(k) using the dynamics in (5.5) at each time instant k ≥ L + 1 to achieve non-triggering misbehavior for the sensor node i. We provide Example 1 for further illustration of the results of Theorem 2. Example 1. Consider a graph topology for a distributed sensor network given in fig. 5.1. Let the vertex cut An = {5, 6} be under the presented false data injection attack in Theorem 2 and show non-triggering misbehavior. Then, the sensor nodes in An = {5, 6} do not transmit any information to their neighbors under the designed false data injection attack. Moreover, the sensor nodes in An = {5, 6} act as sink nodes and prevent information flow from subgraph G1 to subgraph G2 which clusters the graph G into two non-interacting subgraphs G1 and G2 as shown in Fig. 5.1. This 104 Figure 5.1: Effect of non-triggering misbehavior on sensor nodes {5,6} cluster the graph G in the two isolated graphs G1 and G2. example shows that the attacker can compromise the vertex cut An of the original graph G such that it shows non-triggering misbehavior and harm the network connectivity or cluster the graph into various non-interacting subgraphs. We now analyze the effect of non-triggering misbehavior on the collective observability of the sensor network. To do so the following definitions are needed. Definition 5 (Potential Set). A set of nodes P ⊂V is said to be a potential set of the graph G if the pair (A, CV\P ) is not collectively observable. Definition 6 (Minimal Potential Set). A set of nodes Pm ⊂ V is said to be a minimal potential set if Pm is a potential set and no subset of Pm is a potential set. Remark 4. Note that if the attacker knows the graph structure and the local pair(A, Ci), ∀i ∈ V. Then, the attacker can identify the minimum potential set of sensor nodes Pm in the graph G and achieves non-triggering misbehavior for Pm. Thus, the set of sensor nodes Pm does not exchange any information with its neighbors and becomes isolated in the graph G. 105 Corollary 1. Let the set of sensors that shows non-triggering misbehavior be the minimal potential set Sn. Then, the network is no longer collectively observable and the process state reconstruction from the distributed sensor measurements is impossible. Proof. 
We now analyze the effect of non-triggering misbehavior on the collective observability of the sensor network. To do so, the following definitions are needed.

Definition 5 (Potential Set). A set of nodes P ⊂ V is said to be a potential set of the graph G if the pair (A, C_{V\P}) is not collectively observable.

Definition 6 (Minimal Potential Set). A set of nodes P_m ⊂ V is said to be a minimal potential set if P_m is a potential set and no subset of P_m is a potential set.

Remark 4. Note that if the attacker knows the graph structure and the local pairs (A, C_i), ∀i ∈ V, then it can identify a minimal potential set of sensor nodes P_m in the graph G and achieve non-triggering misbehavior for P_m. Thus, the set P_m does not exchange any information with its neighbors and becomes isolated in the graph G.

Corollary 1. Let the set of sensors showing non-triggering misbehavior be a minimal potential set S_n. Then, the network is no longer collectively observable, and reconstruction of the process state from the distributed sensor measurements is impossible.

Proof. According to the statement of the corollary, S_n is a minimal potential set of the graph G and shows non-triggering misbehavior. The sensor nodes in S_n therefore do not transmit any information to their neighbors and act as sink nodes, i.e., they only absorb information. Hence, the exchange of information happens only among the remaining sensor nodes in the graph G\S_n. After excluding the minimal potential set S_n, the pair (A, C_{G\S_n}) becomes unobservable by Definitions 5 and 6, which makes state reconstruction impossible. This completes the proof. □

5.3.2 Continuous-triggering Misbehavior

In this subsection, we discuss how an attacker can compromise the actual sensor measurement to mislead the event-triggered mechanism and achieve continuous-triggering misbehavior, resulting in a time-driven DKF that not only drains the communication resources but also continuously propagates the adverse effect of the attack through the network.

Definition 7 (Continuous-triggering Misbehavior). Let the attacker design an attack strategy that deceives the triggering mechanism in (5.3) at every time instant. This turns the event-driven DKF into a time-driven DKF that continuously exchanges corrupted information among sensor nodes. We call this continuous-triggering misbehavior.

We now show how a replay attack, preceded by an eavesdropping attack, can manipulate the sensor reading to cause continuous violation of the event-triggered mechanism (5.3).

Theorem 3. Consider the process dynamics (5.1) with N sensor nodes (5.2) communicating over the graph G. Let sensor node i in (5.2) be under the replay attack

  y^a_i(k) = C_i x̄_i(k−1) + υ_i(k), ∀k ≥ l + 1,   (5.30)

where x̄_i(k−1) represents the last transmitted prior state estimate, υ_i(k) denotes a disruption signal, and l denotes the last triggering time instant at which an intact prior state estimate was transmitted. Then, sensor node i shows continuous-triggering misbehavior if the attacker selects ‖υ_i(k)‖ > α.

Proof. To mislead a sensor into continuous-triggering misbehavior, the attacker needs to design the attack signal such that the event-triggered condition (5.3) is constantly violated, i.e., ‖y^a_i(k) − C_i x̃_i(k−1)‖ ≥ α at all times. The attacker can eavesdrop on the last transmitted prior state estimate x̄_i(k−1) and design the strategic attack signal (5.30). Then, one has

  y^a_i(k) − C_i x̃_i(k−1) = C_i x̄_i(k−1) + υ_i(k) − C_i[ζ_i(k−1) x̄_i(k−1) + (1 − ζ_i(k−1)) A x̄_i(k−2)]
                          = (1 − ζ_i(k−1)) C_i [x̄_i(k−1) − A x̄_i(k−2)] + υ_i(k).   (5.31)

Taking the norm of both sides of (5.31) yields

  ‖y^a_i(k) − C_i x̃_i(k−1)‖ = ‖(1 − ζ_i(k−1)) C_i [x̄_i(k−1) − A x̄_i(k−2)] + υ_i(k)‖.   (5.32)

Since ζ_i(l) = 1 for k = l + 1,

  ‖y^a_i(l+1) − C_i x̃_i(l)‖ = ‖υ_i(l+1)‖.   (5.33)

If the attacker selects υ_i(l+1) such that ‖υ_i(l+1)‖ > α, then the attack signal (5.30) ensures triggering at time instant k = l + 1. Then, by a similar argument applied to (5.32), for all k ≥ l + 1,

  ‖y^a_i(k) − C_i x̃_i(k−1)‖ = ‖υ_i(k)‖ > α,   (5.34)

which ensures continuous-triggering misbehavior. This completes the proof. □
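A corresponding sketch of the replay attack (5.30): choosing ‖υ_i(k)‖ > α makes the residual in (5.3) equal ‖υ_i(k)‖ at every step, forcing a transmission each time. The numerical values are illustrative.

import numpy as np

def replay_attack(C, xbar_last, upsilon):
    # (5.30): y_i^a(k) = C_i xbar_i(k-1) + upsilon_i(k)
    return C @ xbar_last + upsilon

alpha = 1.8
upsilon = 2.5 * np.ones(1)            # chosen so that ||upsilon|| > alpha
y_a = replay_attack(np.array([[5.0, 0.0]]), np.array([2.0, 0.3]), upsilon)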
To achieve continuous-triggering misbehavior, the attacker needs to eavesdrop on the prior state estimate x̄_i(k−1) at each triggering instant and select υ_i(k) large enough that ‖υ_i(k)‖ > α always holds.

Note that continuous-triggering misbehavior can completely ruin the advantage of event-triggered mechanisms and turn them into time-driven mechanisms. This significantly increases the communication burden. Since nodes in WSNs are usually powered by batteries with limited energy, the attacker can drain the sensors' limited resources by designing the above attack signals to achieve continuous-triggering misbehavior, consequently rendering them non-operational in the network, along with deteriorating the network's performance.

Note that although we classified attacks into non-triggering and continuous-triggering misbehavior to analyze how the attacker can leverage the event-triggered mechanism, the following analysis, detection, and mitigation approaches are not restricted to either class of attacks.

5.4 Attack Detection

In this section, we present an entropy estimation-based attack detection approach for the event-triggered DKF. The KL divergence is a non-negative measure of the relative entropy between two probability distributions, defined as follows.

Definition 8 (KL Divergence) [36]. Let X and Z be two random variables with probability density functions P_X and P_Z, respectively. The KL divergence between P_X and P_Z is defined as

  D_KL(P_X‖P_Z) = ∫_{θ∈Θ} P_X(θ) log(P_X(θ)/P_Z(θ)) dθ,   (5.35)

with the following properties [41]:

1. D_KL(P_X‖P_Z) ≥ 0;
2. D_KL(P_X‖P_Z) = 0 if and only if P_X = P_Z;
3. D_KL(P_X‖P_Z) ≠ D_KL(P_Z‖P_X).

In the existing resilience literature, entropy-based anomaly detectors need to know the probability density functions of the sequences, i.e., P_X and P_Z, to determine the relative entropy. In most cases, authors assume that the probability density function of the corrupted innovation sequence remains Gaussian (see [36] and [135], for instance). Since the attacker's input signal is unknown, it is restrictive to assume that the probability density function of the corrupted sequence remains Gaussian. To relax this restrictive assumption, we estimate the relative entropy between two random sequences X and Z using a k-nearest neighbor (k-NN) based divergence estimator [40].

Let {X₁, . . . , X_{n₁}} and {Z₁, . . . , Z_{n₂}} be i.i.d. samples drawn independently from P_X and P_Z, respectively, with X_j, Z_j ∈ Rᵐ. Let d^X_k(i) be the Euclidean distance between X_i and its k-NN in {X_l}_{l≠i}. The k-NN of a sample s in {s₁, . . . , s_n} is s_{i(k)}, where i(1), . . . , i(n) are such that

  ‖s − s_{i(1)}‖ ≤ ‖s − s_{i(2)}‖ ≤ . . . ≤ ‖s − s_{i(n)}‖.

More specifically, the Euclidean distance d^X_k(i) is given by [136]

  d^X_k(i) = min_{j=1,...,n₁, j∉{i, j₁,...,j_{k−1}}} ‖X_i − X_j‖.   (5.36)

The k-NN based relative entropy estimator is given by [40]

  D̂_KL(P_X‖P_Z) = (m/n₁) Σ_{i=1}^{n₁} log( d^Z_k(i) / d^X_k(i) ) + log( n₂/(n₁ − 1) ).   (5.37)
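The estimator (5.36)-(5.37) can be implemented in a few lines. The following Python sketch uses brute-force distance computation, adequate for the short windows used here, and checks the estimator on two Gaussian sample sets; the sample sizes are illustrative.

import numpy as np

def knn_kl_divergence(X, Z, k=1):
    # k-NN estimator of D_KL(P_X || P_Z) per (5.37); X: (n1, m), Z: (n2, m)
    n1, m = X.shape
    n2 = Z.shape[0]
    dXX = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dXX, np.inf)
    dX = np.sort(dXX, axis=1)[:, k - 1]        # d_k^X(i), within-sample distances (5.36)
    dXZ = np.linalg.norm(X[:, None, :] - Z[None, :, :], axis=2)
    dZ = np.sort(dXZ, axis=1)[:, k - 1]        # d_k^Z(i), cross-sample distances
    return (m / n1) * np.sum(np.log(dZ / dX)) + np.log(n2 / (n1 - 1))

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))
Z_far = rng.normal(2.0, 1.0, size=(200, 2))
Z_same = rng.normal(0.0, 1.0, size=(200, 2))
print(knn_kl_divergence(X, Z_far))    # clearly positive
print(knn_kl_divergence(X, Z_same))   # near log(200/199), i.e., close to zero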
The innovation sequences represent the deviation of the actual output of the system from the estimated one. It is known that innovation sequences approach a steady state quickly, and it is thus reasonable to design innovation-based anomaly detectors to capture system abnormality [36]. Using the innovation sequence of each sensor and the innovation sequences it estimates for its neighbors, we present an innovation-based divergence estimator and design detectors that capture the effect of attacks on the event-triggered DKF.

Based on the innovation expression (5.11), in the presence of attack, one can write the compromised innovation r^a_i(k) for sensor node i, with disrupted measurement y^a_i(k) in (5.15) and state estimate x̄^a_i(k) based on (5.17), as

  r^a_i(k) = y^a_i(k) − C_i x̄^a_i(k).   (5.38)

Let {r^a_i(l), . . . , r^a_i(l−1+w)} and {r_i(l), . . . , r_i(l−1+w)} be i.i.d. p-dimensional samples of the corrupted and nominal innovation sequences, with probability density functions P_{r^a_i} and P_{r_i}, respectively. The nominal innovation sequence follows r_i(k) defined in (5.11). Using the k-NN based relative entropy estimator (5.37), one has [40]

  D̂_KL(P_{r^a_i}‖P_{r_i}) = (p/w) Σ_{j=1}^{w} log( d^{r_i}_k(j) / d^{r^a_i}_k(j) ) + log( w/(w−1) ), ∀i ∈ V.   (5.39)

Define the average of the estimated KL divergence over a time window T as

  Φ_i(k) = (1/T) Σ_{l=k−T+1}^{k} D̂_KL(P_{r^a_i}‖P_{r_i}), ∀i ∈ V.   (5.40)

The following theorem shows that the effect of attacks on the sensors can be captured using (5.40).

Theorem 4. Consider the distributed sensor network (5.1)-(5.2) under sensor attack. Then,

1. in the absence of attack, Φ_i(k) = log(w/(w−1)), ∀k;
2. in the presence of attack, Φ_i(k) > δ, ∀k > l_a, where δ and l_a denote, respectively, a predefined threshold and the time instant at which the attack happens.

Proof. In the absence of attack, the samples of the innovation sequences {r^a_i(l), . . . , r^a_i(l−1+w)} and {r_i(l), . . . , r_i(l−1+w)} are similar. Then, the Euclidean distances satisfy d^{r^a_i}_k(j) = d^{r_i}_k(j), ∀j ∈ {1, . . . , w}, and one has

  D̂_KL(P_{r^a_i}‖P_{r_i}) = log( w/(w−1) ), ∀i ∈ V.   (5.41)

Based on (5.41), one has

  Φ_i(k) = (1/T) Σ_{l=k−T+1}^{k} log( w/(w−1) ) = log( w/(w−1) ) < δ, ∀i ∈ V,   (5.42)

where log(w/(w−1)) in (5.42) depends on the sample size of the innovation sequence, and log(w/(w−1)) ≤ 0.1 for all w ≥ 10. Therefore, the predefined threshold δ can be selected with δ > 0.1 such that the condition in (5.42) is always satisfied. This completes the proof of part 1.

In the presence of attack, the samples of the innovation sequences {r^a_i(l), . . . , r^a_i(l−1+w)} and {r_i(l), . . . , r_i(l−1+w)} differ, i.e., d^{r^a_i}_k(j) ≠ d^{r_i}_k(j), ∀j ∈ {1, . . . , w}. More specifically, d^{r_i}_k(j) > d^{r^a_i}_k(j), ∀j ∈ {1, . . . , w}, due to the change in the corrupted innovation sequence. Therefore, based on (5.39), the estimated relative entropy between the sequences becomes

  D̂_KL(P_{r^a_i}‖P_{r_i}) = (p/w) Σ_{j=1}^{w} log( 1 + Δ^{r_i}_k(j) / d^{r^a_i}_k(j) ) + log( w/(w−1) ), ∀i ∈ V,   (5.43)

with Δ^{r_i}_k(j) the change in Euclidean distance due to the corrupted innovation sequence. Based on (5.43), one has

  D̂_KL(P_{r^a_i}‖P_{r_i}) = (p/w) Σ_{j=1}^{w} log( 1 + Δ^{r_i}_k(j) / d^{r^a_i}_k(j) ) + log( w/(w−1) ) ≫ log( w/(w−1) ).   (5.44)

Thus, one has

  Φ_i(k) = (1/T) Σ_{l=k−T+1}^{k} D̂_KL(P_{r^a_i}‖P_{r_i}) > δ, ∀i ∈ V,   (5.45)

where T and δ denote the sliding window size and the predefined design threshold, respectively. This completes the proof. □
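A sketch of the resulting detector: the statistic Φ_i(k) of (5.40) is a sliding-window average of the k-NN KL estimates, compared against a threshold δ > 0.1 as suggested by Theorem 4; the hypothesis test itself is stated next, in (5.46). It builds on knn_kl_divergence from the earlier sketch, and the window length and threshold shown are design choices, not values from the text.

import numpy as np

def phi_statistic(kl_estimates, T=20):
    # Phi_i(k) in (5.40): average of the last T KL estimates
    return float(np.mean(kl_estimates[-T:]))

def is_attacked(kl_estimates, delta=0.3, T=20):
    # threshold test of (5.46): True -> H1 (compromised), False -> H0 (intact)
    return phi_statistic(kl_estimates, T) > delta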
Based on Theorem 4, one can use the following condition for attack detection:

  Φ_i(k) < δ : H₀,
  Φ_i(k) > δ : H₁,   (5.46)

where δ denotes the designed detection threshold, the null hypothesis H₀ represents the intact mode of a sensor node, and H₁ denotes the compromised mode.

Remark 5. Note that in the absence of an attack, the innovation sequence has a known zero-mean Gaussian distribution due to the measurement noise. Based on prior system knowledge, one can always take the nominal innovation sequence to be zero-mean Gaussian with a predefined covariance. The bound on the predefined covariance can be determined during normal operation of the event-triggered DKF. This assumption of knowledge of the nominal innovation sequence for attack detection is standard in the existing literature (see [135], for instance). The threshold δ in (5.46) is a predefined parameter chosen appropriately for detection of the attack signal. Moreover, selecting the detection threshold based on expert knowledge is standard in the existing literature; for example, several results on adversary detection and stealthiness consider similar thresholds [36], [124].

Algorithm 1 Detecting attacks on sensors.
1: Initialize with a time window T and detection threshold δ.
2: procedure ∀i = 1, . . . , N
3: Use samples of the innovation sequences {r^a_i(l), . . . , r^a_i(l−1+w)} and {r_i(l), . . . , r_i(l−1+w)} based on (5.38) and (5.11), ∀l ∈ k.
4: Estimate D̂_KL(P_{r^a_i}‖P_{r_i}) using (5.44).
5: Compute Φ_i(k) as in (5.45) and use the condition (5.46) to detect attacks on sensors.
6: end procedure

Based on the results presented in Theorem 4 and Algorithm 1, one can capture attacks on both sensors and communication links, but one cannot identify the specific compromised communication link as modeled in (5.16). To detect the source of attacks, we present an estimated entropy-based detector that captures the effect of attacks on a specific communication channel. More specifically, the relative entropy between the innovation sequences estimated for the neighbors at a particular sensor node and the nominal innovation sequence of that sensor node is estimated using (5.37).

Define the estimated innovation sequence ζ^a_{i,j}(k) for a neighbor j under attacks on the communication channel, as seen from sensor node i, as

  ζ^a_{i,j}(k) = y_i(k) − C_j x̃^a_j(k),   (5.47)

where x̃^a_j(k) is the corrupted communicated state estimate of neighbor j at sensor node i at the last triggering instant.

Let {ζ^a_{i,j}(l), . . . , ζ^a_{i,j}(l−1+w)} be i.i.d. p-dimensional samples of the neighbor's estimated innovation at sensor node i, with probability density function P_{ζ^a_{i,j}}. Using the k-NN based relative entropy estimator (5.37), one has

  D̂_KL(P_{ζ^a_{i,j}}‖P_{r_i}) = (p/w) Σ_{j=1}^{w} log( d^{r_i}_k(j) / d^{ζ^a_{i,j}}_k(j) ) + log( w/(w−1) ), ∀i ∈ V, j ∈ N_i.   (5.48)

Note that in the presence of attacks on the communication channels, the neighbor's actual innovation differs from the neighbor's estimated innovation at sensor i. In the absence of attack, the mean values of all the sensors' state estimates converge to the mean of the desired process state at steady state, and therefore the innovation sequences r_i and ζ^a_{i,j} have the same zero-mean Gaussian distribution. In the presence of attack, however, as shown in Theorem 5 and Algorithm 2, their distributions diverge.

Define the average of the KL divergence over a time window T as

  Ψ_{i,j}(k) = (1/T) Σ_{l=k−T+1}^{k} D̂_KL(P_{ζ^a_{i,j}}‖P_{r_i}), ∀i ∈ V, j ∈ N_i.   (5.49)
Theorem 5. Consider the distributed sensor network (5.1)-(5.2) under the attack (5.16) on the communication links. Then, in the presence of an attack, Ψ_{i,j}(k) > δ, ∀k, where δ denotes a predefined threshold.

Proof. The result follows an argument similar to that in the proof of part 2 of Theorem 4. □

Algorithm 2 Detecting attacks on a specific communication link.
1: Initialize with a time window T and detection threshold δ.
2: procedure ∀i = 1, . . . , N
3: For each sensor node j ∈ N_i, use samples of the innovation sequences {ζ^a_{i,j}(l), . . . , ζ^a_{i,j}(l−1+w)} and {r_i(l), . . . , r_i(l−1+w)} based on (5.47) and (5.11), ∀l ∈ k.
4: Estimate D̂_KL(P_{ζ^a_{i,j}}‖P_{r_i}) using (5.48).
5: Compute Ψ_{i,j}(k) as in (5.49) and use the same argument as in (5.46) to detect attacks on the specific communication link.
6: end procedure

5.5 Secure Distributed Estimation Mechanism

This section presents a meta-Bayesian approach for secure event-triggered DKF, which incorporates the outcome of the attack detection mechanism to perform second-order inference and consequently form beliefs over beliefs. That is, the second-order inference forms confidence and trust about the truthfulness or legitimacy of the sensor's own state estimate (i.e., the posterior belief of the first-order Bayesian inference) and of its neighbors' state estimates, respectively. Each sensor communicates its confidence to its neighbors. Sensors then incorporate their neighbors' confidence and their own trust about their neighbors into their posterior update laws to successfully discard corrupted information.

5.5.1 Confidence of sensor nodes

The second-order inference forms a confidence value for each sensor node that determines the level of trustworthiness of the sensor in its own measurement and state estimate (i.e., the posterior belief of the first-order Bayesian inference). If a sensor node is compromised, the presented attack detector detects the adversary; the node then reduces its level of trustworthiness in its own understanding of the environment and communicates this to its neighbors to inform them of the significance of its outgoing information, thus slowing down the attack propagation.

To determine the confidence of sensor node i, based on the divergence D̂_KL(P_{r^a_i}‖P_{r_i}) from Theorem 4, we first define

  χ_i(k) = Υ₁ / ( Υ₁ + D̂_KL(P_{r^a_i}‖P_{r_i}) ),   (5.50)

where 0 < Υ₁ < 1 represents a predefined threshold that accounts for channel fading and other uncertainties. In the following lemma, we formally present the results on the confidence of sensor node i.
This completes the On the other hand, based on Theorem 4, in the absence of attacks, ˆDKL(Pra i proof of part 2. Note that the expression for the confidence of sensor node i in (5.51) can be implemented using the following difference equation βi(k + 1) = βi(k) + κ1χi(k). (5.52) Note also that the discount factor in (5.51) determines how much we value the current experience with regards to past experiences. It also guarantees that if the attack is not persistent and disappears after a while, or if a short-period adversary rather than attack (such as packet dropout) causes, the belief will be recovered, as it mainly depends on the current circumstances. 5.5.2 Trust of sensor nodes about their incoming information Similar to the previous subsection, the second-order inference forms trust of sensor nodes to represent their level of trust on their neighboring sensor’s state estimates. Trust decides the usefulness of the neighboring information in the state estimation of sensor node i. 114 k−1(cid:88) The trust of the sensor node i on its neighboring sensor j can be determined based on the divergence ˆDKL(Pζa i,j ||Pri) in (5.47) from Theorem 5, from which we define θi,j(k) = Λ1 Λ1 + ˆDKL(Pζa i,j , ||Pri) (5.53) where 0 < Λ1 < 1 represents a predefined threshold to account for the channel fading and other uncertainties. Then, in the following lemma, we formally present the results for the trust of the sensor node i on its neighboring sensor j. Lemma 2. Let σi,j(k) be the trust of the sensor node i on its neighboring sensor j which is updated using σi,j(k) = (κ2)k−l+1θi,j(l), (5.54) where θi,j(k) is defined in (5.53), and 0 < κ2 < 1 is a discount factor. Then, σi,j(k) ∈ (0, 1] and l=0 1. σi,j(k) → 0, ∀j ∈ Vc ∩ Ni; 2. σi,j(k) → 1, ∀j ∈ V\Vc ∩ Ni. Proof. The result follows a similar argument as given in the proof of Lemma 1. Note that the trust of sensor node i in (5.54) can be implemented using the following difference equation σi,j(k + 1) = σi,j(k) + κ2θi,j(k). (5.55) Using the presented idea of trust, one can identify the attacks on the communication channel and discard the contribution of compromised information for the state estimation. 5.5.3 Attack mitigation mechanism using confidence and trust of sensors This subsection incorporates the confidence and trust of sensors to design a resilient event-triggered DKF. To this end, using the presented confidence βi(k) in (5.51) and trust σi,j(k) in (5.54), we design the resilient form of the event-triggered DKF as ˆxi(k) = ¯xi(k) + Ki(k)(βi(k)yi(k) + (1 − βi(k))Cimi(k) − Ci ¯xi(k)) (5.56) (cid:80) j∈Ni σi,j(k)βj(k)(˜xj(k) − ˜xi(k)), where the weighted neighbor’s state estimate mi(k) is defined as +γi 115 mi(k) = 1|Ni| (cid:80) j∈Ni ∀k (cid:107)εi(k)(cid:107) < τ, σi,j(k)βj(k)˜xj(k) ≈ x(k) + εi(k), (5.57) where εi(k) denotes the deviation between the weighted neighbor’s state estimate mi(k) and the actual process state x(k). Note that in (5.57) the weighted state estimate depends on the trust values σi,j(k) and the confidence values βj(k), ∀j ∈ Ni. Since the weighted state estimate depends only on the information from intact neighbors, then one has (cid:107)εi(k)(cid:107) < τ for some τ > 0, ∀k. For the sake of mathematical representation, we approximate the weighted state estimate mi(k) in terms of the actual process state x(k), i.e., mi(k) ≈ x(k) + εi(k). 
We call this a meta-Bayesian inference that integrates the first-order inference (state estimates) with second-order estimates or belief (trust and confidence on the trustworthiness of state estimate beliefs). Define the prior and predictive state estimation errors as ¯ηi(k) = x(k) − ¯xi(k) ˜ηi(k) = x(k) − ˜xi(k), Using the threshold in triggering mechanism (5.3), one has (cid:107)˜ηi(k)(cid:107) − (cid:107)x(k + 1) − x(k) + vi(k + 1)(cid:107) ≤ α/(cid:107)Ci(cid:107) , (cid:107)˜ηi(k)(cid:107) ≤ α/(cid:107)Ci(cid:107) + B, where B denotes the bound on (cid:107)x(k + 1) − x(k) + vi(k + 1)(cid:107) . Other notations used in the following theorem are given by ¯η(k) = [¯η1(k), . . . , ¯ηN (k)], M (k) = diag[M1(k), . . . , MN (k)] Υ = diag[γ1, . . . , γN ], Υm = (cid:107)max{γi}(cid:107) , ∀i ∈ V, ¯β = (IN − diag(βi)), E(k) = [ε1(k), . . . , εN (k)], ˜η(k) = [˜η1(k), . . . , ˜ηN (k)]. Assumption 4. At least (C(Ni)/2) + 1 neighbors of the sensor node i are intact. (5.58) (5.59) (5.60) Assumption 4 is similar to the assumption found in the secure estimation and control literature [7], [125]. Necessary and sufficient condition for any centralized or distributed estimator to resiliently estimate actual state is that the number of attacked sensors is less than half of all sensors. Theorem 6. Consider the resilient event triggered DKF (5.56) with the triggering mechanism (5.3). Let the time-varying graph be G(k) such that at each time instant k, Assumptions 3 and 4 are satisfied. Then, 116 1. The following uniform bound holds on state estimation error in (5.58), despite attacks k−1(cid:88) (cid:107)¯η(k)(cid:107) ≤ (Ao)k (cid:107)¯η(0)(cid:107) + (Ao)k−m−1Bo, where m=0 Ao = σmax((IN ⊗ A)M (k)), Bo = σmax(A)σmax(L(k))Υm +(σmax(A) + σmax(Ao))(cid:13)(cid:13) ¯β(cid:13)(cid:13)√ (cid:112)N (α/(cid:107)Ci(cid:107) + B) N τ, with L(k) denotes the confidence and trust dependent time-varying graph Laplacian matrix, and bound τ defined in (5.57); 2. The uniform bound on the state estimation error (5.61) becomes k→∞(cid:107)¯η(k)(cid:107) ≤ AoBo 1 − Ao lim . Moreover, other notations used in (5.62) are defined in (5.60). Proof. Using the presented resilient estimator (5.56), one has (5.61) (5.62) (5.63) (5.64) (5.65) (5.66) Substituting (5.57) into (5.64) and using (5.58), the state estimation error dynamics becomes ¯xi(k + 1) = Aˆxi(k) (cid:80) j∈Ni = A(¯xi(k) + Ki(k)(βi(k)yi(k) + (1 − βi(k))Cimi(k) σi,j(k)βj(k)(˜xj(k) − ˜xi(k))), −Ci ¯xi(k)) + γi (cid:80) j∈Ni aij(k)(˜ηj(k) − ˜ηi(k)) ¯ηi(k + 1) = AMi(k)¯ηi(k) + Aγi −AKi(k)(1 − βi(k))Ciεi(k), where aij(k) = σi,j(k)βj(k) and Mi(k) = I − Ki(k)Ci. Using (5.65) and notations defined in (5.60), the global form of error dynamics becomes ¯η(k + 1) = (IN ⊗ A)M (k)¯η(k) − (Υ ⊗ A)L(k)˜η(k) −( ¯β ⊗ A)(InN − M (k))E(k)). Note that Assumption 4 implies that the total number of the compromised sensors is less than half of the total number of sensors in the network. That is, if q neighbors of an intact sensor node are attacked and collude to send the same value to mislead it, there still exists q + 1 intact neighbors that communicate values different from the compromised ones. Moreover, since at least 117 half of the intact sensor’s neighbors are intact, it can update its beliefs to discard the compromised neighbor’s state estimates. Furthermore, since the time-varying graph G(k) resulting from isolating the compromised sensors, based on Assumptions 3 and 4, the entire network is still collectively observable. 
Define the prior and predictive state estimation errors as

  η̄_i(k) = x(k) − x̄_i(k),  η̃_i(k) = x(k) − x̃_i(k).   (5.58)

Using the threshold in the triggering mechanism (5.3), one has

  ‖η̃_i(k)‖ − ‖x(k+1) − x(k) + v_i(k+1)‖ ≤ α/‖C_i‖,  i.e.,  ‖η̃_i(k)‖ ≤ α/‖C_i‖ + B,   (5.59)

where B denotes the bound on ‖x(k+1) − x(k) + v_i(k+1)‖. Other notations used in the following theorem are

  η̄(k) = [η̄₁(k), . . . , η̄_N(k)],  M(k) = diag[M₁(k), . . . , M_N(k)],
  Υ = diag[γ₁, . . . , γ_N],  Υ_m = ‖max{γ_i}‖, ∀i ∈ V,
  β̄ = I_N − diag(β_i),  E(k) = [ε₁(k), . . . , ε_N(k)],  η̃(k) = [η̃₁(k), . . . , η̃_N(k)].   (5.60)

Assumption 4. At least (C(N_i)/2) + 1 neighbors of sensor node i are intact.

Assumption 4 is similar to assumptions found in the secure estimation and control literature [7], [125]. A necessary and sufficient condition for any centralized or distributed estimator to resiliently estimate the actual state is that the number of attacked sensors be less than half of all sensors.

Theorem 6. Consider the resilient event-triggered DKF (5.56) with the triggering mechanism (5.3). Let the time-varying graph be G(k), such that at each time instant k Assumptions 3 and 4 are satisfied. Then,

1. the following uniform bound holds on the state estimation error (5.58), despite attacks:

  ‖η̄(k)‖ ≤ (A_o)ᵏ ‖η̄(0)‖ + Σ_{m=0}^{k−1} (A_o)^{k−m−1} B_o,   (5.61)

where

  A_o = σ_max((I_N ⊗ A) M(k)),
  B_o = σ_max(A) σ_max(L(k)) Υ_m √N (α/‖C_i‖ + B) + (σ_max(A) + σ_max(A_o)) ‖β̄‖ √N τ,   (5.62)

with L(k) the confidence- and trust-dependent time-varying graph Laplacian matrix and τ the bound defined in (5.57);

2. the uniform bound on the state estimation error (5.61) becomes, asymptotically,

  lim_{k→∞} ‖η̄(k)‖ ≤ A_o B_o / (1 − A_o).   (5.63)

Moreover, the other notations used in (5.62) are defined in (5.60).

Proof. Using the presented resilient estimator (5.56), one has

  x̄_i(k+1) = A x̂_i(k)
            = A( x̄_i(k) + K_i(k)( β_i(k) y_i(k) + (1 − β_i(k)) C_i m_i(k) − C_i x̄_i(k) )
            + γ_i Σ_{j∈N_i} σ_{i,j}(k) β_j(k) ( x̃_j(k) − x̃_i(k) ) ).   (5.64)

Substituting (5.57) into (5.64) and using (5.58), the state estimation error dynamics become

  η̄_i(k+1) = A M_i(k) η̄_i(k) + A γ_i Σ_{j∈N_i} a_{ij}(k) ( η̃_j(k) − η̃_i(k) ) − A K_i(k)(1 − β_i(k)) C_i ε_i(k),   (5.65)

where a_{ij}(k) = σ_{i,j}(k) β_j(k) and M_i(k) = I − K_i(k) C_i. Using (5.65) and the notation defined in (5.60), the global form of the error dynamics becomes

  η̄(k+1) = (I_N ⊗ A) M(k) η̄(k) − (Υ ⊗ A) L(k) η̃(k) − (β̄ ⊗ A)(I_{nN} − M(k)) E(k).   (5.66)

Note that Assumption 4 implies that the total number of compromised sensors is less than half of the total number of sensors in the network. That is, if q neighbors of an intact sensor node are attacked and collude to send the same value to mislead it, there still exist q + 1 intact neighbors that communicate values different from the compromised ones. Moreover, since at least half of an intact sensor's neighbors are intact, it can update its beliefs to discard the compromised neighbors' state estimates. Furthermore, for the time-varying graph G(k) that results from isolating the compromised sensors, by Assumptions 3 and 4 the entire network remains collectively observable. Using the trust and confidence of neighboring sensors, the incoming information from the compromised communication channels is discarded. Now, taking the norm of both sides of (5.66) and using the triangle inequality, one has

  ‖η̄(k+1)‖ ≤ ‖(I_N ⊗ A) M(k) η̄(k)‖ + ‖(Υ ⊗ A) L(k) η̃(k)‖ + ‖(β̄ ⊗ A)(I_{nN} − M(k)) E(k)‖.   (5.67)

Using (5.57), (5.67) can be rewritten as

  ‖η̄(k+1)‖ ≤ A_o ‖η̄(k)‖ + σ_max(L(k)) ‖(Υ ⊗ A) η̃(k)‖ + ‖((β̄ ⊗ A) − (β̄ ⊗ I_n)(I_N ⊗ A) M(k)) E(k)‖.   (5.68)

After some manipulations, one has

  ‖η̄(k+1)‖ ≤ A_o ‖η̄(k)‖ + σ_max(A) σ_max(L(k)) Υ_m ‖η̃(k)‖ + (σ_max(A) + σ_max(A_o)) ‖β̄‖ √N τ,   (5.69)

with Υ_m defined in (5.60). Then, using (5.59), one can write (5.69) as

  ‖η̄(k+1)‖ ≤ A_o ‖η̄(k)‖ + σ_max(A) σ_max(L(k)) Υ_m √N (α/‖C_i‖ + B) + (σ_max(A) + σ_max(A_o)) ‖β̄‖ √N τ.   (5.70)

Solving (5.70), one has

  ‖η̄(k)‖ ≤ (A_o)ᵏ ‖η̄(0)‖ + Σ_{m=0}^{k−1} (A_o)^{k−m−1} B_o,   (5.71)

where A_o and B_o are given in (5.62). This completes the proof of part 1. Based on Assumption 3, the distributed sensor network is always collectively observable. Thus, based on the result provided in [137], one can conclude that A_o is always Schur, and the upper bound on the state estimation error becomes (5.63). This completes the proof. □

Based on the attack detection approach presented in Algorithms 1 and 2, one can detect the attacker's misbehavior and estimate the actual state using the result presented in Theorem 6 and Algorithm 3.

Algorithm 3 Secure Distributed Estimation Mechanism (SDEM).
1: Start with initial innovation sequences and design parameters Υ₁ and Λ₁.
2: procedure ∀i = 1, . . . , N
3: Use samples of the innovation sequences {r^a_i(l), . . . , r^a_i(l−1+w)} and {r_i(l), . . . , r_i(l−1+w)} based on (5.38) and (5.11), ∀l ∈ k.
4: Estimate D̂_KL(P_{r^a_i}‖P_{r_i}) using (5.44).
5: Based on (5.50)-(5.51), compute the confidence β_i(k) as

  β_i(k) = Σ_{l=0}^{k−1} (κ₁)^{k−l+1} Υ₁ / ( Υ₁ + D̂_KL(P_{r^a_i}‖P_{r_i}) ).   (5.72)

6: For each sensor node j ∈ N_i, use samples of the innovation sequences {ζ^a_{i,j}(l), . . . , ζ^a_{i,j}(l−1+w)} and {r_i(l), . . . , r_i(l−1+w)} based on (5.47) and (5.11), ∀l ∈ k.
7: Estimate D̂_KL(P_{ζ^a_{i,j}}‖P_{r_i}) using (5.48).
8: Using (5.53)-(5.54), compute the trust σ_{i,j}(k) as

  σ_{i,j}(k) = Σ_{l=0}^{k−1} (κ₂)^{k−l+1} Λ₁ / ( Λ₁ + D̂_KL(P_{ζ^a_{i,j}}‖P_{r_i}) ).   (5.73)

9: Using the sensor measurement y_i(k), the confidence β_i(k), the trust in the neighbors σ_{i,j}(k), and the neighbors' state estimates x̃_j(k), ∀j ∈ N_i, update the resilient state estimator (5.56).
10: end procedure

5.6 Simulation Results

In this section, we present simulation results that demonstrate the efficacy of the presented attack detection and mitigation mechanism. The sensor network is assumed to have the undirected graph topology given in Fig. 5.2, with the objective of following the desired process dynamics. Consider the process dynamics (5.1) generating the target trajectory,

  x(k+1) = [ cos(π/200)  −sin(π/200)
             sin(π/200)   cos(π/200) ] x(k) + w(k),   (5.74)

with the observation matrices C_i in (5.2), the noise covariances, and the initial state given by

  C_i = [5 0; 0 2],  Q = I₂,  R_i = I₂,  x₀ = (0.5, 0).   (5.75)

Figure 5.2: Communication topology.

Figure 5.3: Sensor network without any attack. (a) State estimation errors. (b) Transmit function for sensor 2.
For the intact sensor network, the state estimates of the sensors converge to the desired process state in the mean-square sense, and the state estimation error goes to zero for each sensor node, as shown in Fig. 5.3(a). The event generation based on the event-triggering mechanism (5.3) with triggering threshold α = 1.8 is shown in Fig. 5.3(b).

Then, we consider that sensor 2 of the network is compromised by the adversarial input δ₂(k) = 2 + 10 sin(100k) after 20 seconds. Fig. 5.4(a) shows the attacker's effect on sensor 2: the compromised sensor and the other sensors in the network deviate from the desired target state, resulting in nonzero estimation errors driven by the attacker's input.

Figure 5.4: Sensor node 2 under continuous-triggering misbehavior. (a) State estimation errors. (b) Transmit function for sensor 2.

Furthermore, the event generation based on the event-triggering mechanism (5.3) in the presence of the attack is shown in Fig. 5.4(b); after injection of the attack on sensor 2, the event-triggered system becomes time-triggered and shows continuous-triggering misbehavior. This result follows the analysis presented for continuous-triggering misbehavior. In Fig. 5.5, we show the results for non-triggering misbehavior of sensor node 2, which also follow the presented analysis.

Figure 5.5: Sensor node 2 under non-triggering misbehavior. (a) State estimation errors. (b) Transmit function for sensor 2.

Next, we detect the effect of the attack on the sensor using the presented attack detection mechanism. Fig. 5.6(a) shows the result for the estimated KL divergence based attack detection mechanism and illustrates that, after injection of the attack signal, the estimated KL divergence starts increasing for the compromised sensor node as well as for the sensor nodes that have a path from the compromised sensor. One can always design a threshold to detect the effect of the attack in the sensor network and then isolate the corrupted sensor in the WSN to avoid propagation of the attack. The estimated divergence for the compromised sensor, i.e., sensor 2, grows after the attack injection at k = 20, which follows the result presented in Theorem 4. The confidence of each sensor is evaluated based on Lemma 1 with discount factor κ₁ = 0.5 and uncertainty threshold Υ₁ = 0.5. Fig. 5.6(b) shows the confidence of the sensors in the presence of the considered attack, which is close to one for healthy sensors and tends to zero for the compromised one.

Figure 5.6: Sensor node 2 under attack. (a) Estimated KL divergence. (b) Confidence of sensors.

Then, the belief-based proposed resilient estimator is implemented, and Fig. 5.7 shows the result of state estimation using the resilient estimator (5.56). After the injection of the attack, within a few seconds the sensors reach consensus on the state estimates, i.e., the state estimates of the sensors converge to the actual position of the target. The result in Fig. 5.7 follows Theorem 6.

Figure 5.7: State estimation errors under attack on sensor 2 using the proposed resilient state estimator.

5.7 Conclusion

In this chapter, we first analyzed the adverse effects of cyber-physical attacks on the event-triggered distributed Kalman filter (DKF).
We showed that an attacker can adversely affect the performance of the DKF. We also showed that the event-triggered mechanism in the DKF can be leveraged by the attacker to cause non-triggering misbehavior that significantly harms the network connectivity and its collective observability. Then, to detect adversarial intrusions in the DKF, we relaxed the restrictive Gaussian assumption on the probability density functions of the attack signals and estimated the Kullback-Leibler (KL) divergence via a k-nearest neighbors approach. Finally, to mitigate attacks, a meta-Bayesian approach was presented that incorporates the outcome of the attack detection mechanism to perform second-order inference and consequently form beliefs over beliefs, i.e., the confidence and trust of a sensor. Each sensor communicates its confidence to its neighbors. Sensors then incorporate their neighbors' confidence and their own trust about their neighbors into their posterior update laws to successfully discard corrupted sensor information. The simulation results illustrate the performance of the presented resilient event-triggered DKF.

CHAPTER 6
ASSURED LEARNING-ENABLED AUTONOMY: A METACOGNITIVE REINFORCEMENT LEARNING FRAMEWORK

6.1 Introduction

This chapter presents a safe reinforcement learning (RL) framework for autonomous control systems under constraints. RL agents with pre-specified reward functions cannot provide guaranteed safety across the variety of circumstances that an uncertain system might encounter. To guarantee performance while assuring satisfaction of safety constraints across a variety of circumstances, an assured autonomous control framework is designed by empowering RL algorithms with meta-cognitive learning capabilities. We first discuss that RL agents with pre-specified reward functions cannot guarantee satisfaction of the desired specifications and performance across all circumstances that an uncertain system might encounter: the system either violates safety specifications or achieves no optimality and liveness specifications. To overcome this issue, a metacognitive decision-making layer is augmented to the RL agent to learn which reward functions to choose so as to satisfy the desired specifications and achieve good enough performance across a variety of circumstances. More specifically, a fitness function is defined in the metacognitive layer that indicates how safely the system would react in the future for a given reward function; in case of a drop in the fitness function, a Bayesian RL algorithm proactively adapts the reward function parameters to maximize the system's assuredness (i.e., satisfaction of the desired STL safety and liveness specifications) and guarantee performance. Off-policy RL algorithms are proposed to find the optimal policy corresponding to each hyperparameter by reusing the data collected from the system. The proposed approach separates learning the reward function that satisfies the specifications from learning the control policy that maximizes the reward, and thus allows us to evaluate as many hyperparameters as required using reused data collected from the system dynamics.

6.2 Preliminaries

6.2.1 Notations

Throughout the chapter, R and N represent the sets of real numbers and natural numbers, respectively. Rⁿ denotes the n-dimensional Euclidean space. The superscript (·)ᵀ denotes transposition. I denotes the identity matrix of proper dimension.
[K]_j denotes the j-th element of the vector K, and [K]_{ij} denotes the (i, j)-th entry of the matrix K. diag(A) denotes a diagonal matrix in which all off-diagonal entries are zero, i.e., [A]_{ij} = 0, ∀i ≠ j. Tr(A) stands for the trace of the matrix A. When a random variable ε_i is distributed normally with mean m and variance w², we use the notation ε_i ∼ N(m, w²). ⊗ denotes the Kronecker product, and vec(A) denotes the mn-vector constructed by stacking the columns of the matrix A ∈ Rⁿˣᵐ on top of one another.

Definition 1. The weighted Kullback-Leibler (KL) divergence between distributions P_X and P_Z is defined as [164]

  D^h_KL(X‖Z) = ∫ P_X(θ) log( P_X(θ)/P_Z(θ) )^{h(x)} dθ,   (6.1)

where h(x) is a non-negative real-valued weighting function. Note that the weighting function is defined to weight more heavily promising regions of the state space. Note also that D_KL(P_X‖P_Z) ≥ 0, and D_KL(P_X‖P_Z) = 0 if and only if P_X = P_Z.

6.2.2 Signal Temporal Logic

Temporal logics can be used to specify rich time-dependent constraints for control systems in a wide variety of applications. Signal temporal logic (STL) is a category of temporal logic that allows the specification of temporal properties of real-valued signals. STL is a predicate logic defined over continuous-time signals [165]-[168]. Let x(t) be a continuous-time signal. A predicate σ is evaluated as True (⊤) or False (⊥) according to a corresponding predicate function z^σ(x): Rⁿ → R as

  σ = ⊤ if z^σ(x) > 0,  σ = ⊥ if z^σ(x) ≤ 0.   (6.2)

The predicate function z^σ(x) is a linear or nonlinear combination of the elements of the signal x, and the predicate σ belongs to a set of predicates P_σ = [σ₁, σ₂, . . . , σ_N] with N ∈ N the number of predicates. Predicates can be recursively combined using the Boolean logic operators negation (¬), disjunction (∨), and conjunction (∧), as well as the temporal operators eventually (♦), globally or always (□), and until (U), to form increasingly complex formulas ϕ (also referred to as task specifications):

  ϕ := ⊤ | σ | ¬σ | ϕ₁ ∧ ϕ₂ | ϕ₁ ∨ ϕ₂ | ♦_[a,b] ϕ | □_[a,b] ϕ | ϕ₁ U_[a,b] ϕ₂.

For each predicate σ_i, i = 1, . . . , N, a predicate function z^{σ_i}(x(t)) is defined as in (6.2). The time bounds of the until operator ϕ U_[a,b] μ are given as a, b ∈ [0, ∞) with a < b. The commonly used temporal operators eventually and always follow from ♦_[a,b] ϕ = ⊤ U_[a,b] ϕ and □_[a,b] ϕ = ¬♦_[a,b] ¬ϕ, respectively. For example, the temporal formula ♦_[3,6] ϕ is satisfied when the STL formula ϕ becomes True within the time interval of 3 to 6 seconds. A signal x(t) is said to satisfy an STL expression at time t according to the following qualitative semantics [164]-[167]:

  (x, t) ⊨ σ            ⇔ z^σ(x(t)) > 0
  (x, t) ⊨ ¬σ           ⇔ ¬((x, t) ⊨ σ)
  (x, t) ⊨ ϕ ∧ μ        ⇔ (x, t) ⊨ ϕ and (x, t) ⊨ μ
  (x, t) ⊨ ϕ ∨ μ        ⇔ (x, t) ⊨ ϕ or (x, t) ⊨ μ
  (x, t) ⊨ ϕ U_[a,b] μ  ⇔ ∃ t₁ ∈ [t+a, t+b] s.t. (x, t₁) ⊨ μ and ∀ t₂ ∈ [t, t₁], (x, t₂) ⊨ ϕ
  (x, t) ⊨ ♦_[a,b] ϕ    ⇔ ∃ t₁ ∈ [t+a, t+b] s.t. (x, t₁) ⊨ ϕ
  (x, t) ⊨ □_[a,b] ϕ    ⇔ ∀ t₁ ∈ [t+a, t+b], (x, t₁) ⊨ ϕ   (6.3)

The symbol ⊨ denotes satisfaction of an STL formula. The time interval [a, b] differentiates STL from general temporal logic and defines the quantitative timing within which the temporal formula must be achieved. Apart from syntax and qualitative semantics, STL is also equipped with various robustness measures that quantify the extent to which a temporal constraint is satisfied.
Given STL formulas ϕ and µ, the spatial robustness is defined as [167]

ρ^σ(x, t) = z^σ(x(t))
ρ^{¬σ}(x, t) = −ρ^σ(x, t)
ρ^{ϕ∧µ}(x, t) = min(ρ^ϕ(x, t), ρ^µ(x, t))
ρ^{ϕ∨µ}(x, t) = max(ρ^ϕ(x, t), ρ^µ(x, t))
ρ^{ϕ U_{[a,b]} µ}(x, t) = max_{t_1∈[t+a, t+b]} ( min( ρ^µ(x, t_1), min_{t_2∈[t, t_1]} ρ^ϕ(x, t_2) ) )
ρ^{◇_{[a,b]}ϕ}(x, t) = max_{t_1∈[t+a, t+b]} ρ^ϕ(x, t_1)
ρ^{□_{[a,b]}ϕ}(x, t) = min_{t_1∈[t+a, t+b]} ρ^ϕ(x, t_1)    (6.4)

This robustness measure determines how well a given signal x(t) satisfies a specification. The spatial robustness defines a real-valued function ρ^σ(x, t) which is positive if and only if (x, t) ⊨ σ, that is, ρ^σ(x, t) > 0 ⇔ (x, t) ⊨ σ. Let a trajectory τ[0, T] be defined by the signal x(t) throughout its evolution from time 0 to T. A trajectory then satisfies the specification if and only if ρ^σ(x, t) > 0 ∀t ∈ [0, t_f], where t_f is the end time of the STL horizon. The robustness degree is the bound on the perturbation that the signal can tolerate without changing the truth value of the specification.

6.2.3 Gaussian process

A Gaussian process (GP) can be viewed as a distribution over functions, in the sense that a draw from a GP is a function. GPs have been widely used as a nonparametric regression method, where the goal is to find an approximation of a nonlinear map f : X → R from a state x to the function value f(x). The function values f(x) are treated as random variables, so that any finite number of them have a joint Gaussian distribution. When a process f follows a Gaussian process model, then

f(.) ∼ GP(m_0(.), k_0(., .))    (6.5)

where m_0(.) is the mean function and k_0(., .) is the real-valued positive definite covariance kernel function [169]. In GP inference, the posterior mean and covariance of a function value f(x) at an arbitrary state x can be obtained by conditioning the GP distribution of f on a set of past measurements. Let X_n = [x_1, ..., x_n] be a set of discrete state measurements, providing the set of inducing inputs. For each measurement x_i, there is an observed output y_i = f(x_i) + ε_i, where ε_i ∼ N(0, w²). The stacked outputs give y = [y_1, ..., y_n]^T. The posterior distribution at a query point x is also a Gaussian distribution and is given by [169]

m_n(x) = m_0(x) + K(x, X_n)^T (K_n + I w²)^{−1} (y − m_0(X_n))
k_n(x, x′) = k_0(x, x′) − K(x, X_n)^T (K_n + I w²)^{−1} K(x′, X_n)    (6.6)

where the vector K(x, X_n) = [k_0(x, x_1), ..., k_0(x, x_n)] contains the covariance between the new data, x, and the states in X_n, and [K_n]_{ij} = k_0(x_i, x_j), ∀i, j ∈ {1, ..., n}.
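As a concrete illustration of the posterior update (6.6), the following sketch performs GP regression at a query point with a squared-exponential kernel, a zero prior mean, and noise variance w². These modeling choices and all names are assumptions made for the example, not specifics from this chapter.

```python
import numpy as np

def k0(a, b, ell=1.0):
    # squared-exponential kernel (an assumed choice of covariance function)
    return np.exp(-0.5 * np.sum((a - b) ** 2) / ell**2)

def gp_posterior(x, Xn, y, w2=1e-2):
    # posterior mean/variance at query x per (6.6), with m0 = 0 assumed
    n = Xn.shape[0]
    Kn = np.array([[k0(Xn[i], Xn[j]) for j in range(n)] for i in range(n)])
    kx = np.array([k0(x, Xn[i]) for i in range(n)])      # K(x, Xn)
    G = np.linalg.solve(Kn + w2 * np.eye(n), np.eye(n))  # (Kn + I w^2)^{-1}
    mean = kx @ G @ y
    var = k0(x, x) - kx @ G @ kx
    return mean, var

Xn = np.linspace(-2, 2, 15).reshape(-1, 1)               # inducing inputs
y = np.sin(Xn).ravel() + 0.05 * np.random.randn(15)      # noisy observations
print(gp_posterior(np.array([0.5]), Xn, y))
```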
6.3 Problem Statement and Motivation

In this section, the problem of optimal control of systems subject to desired specifications is formulated. We then discuss that optimizing a single reward or performance function cannot work for all circumstances, and that it is essential to adapt the reward function to the context to provide a good enough performance and assure the safety and liveness of the system. Consider the non-linear continuous-time system given by

ẋ(t) = f(x(t)) + g(x(t))u(t)    (6.7)

where x ∈ X and u ∈ U denote the admissible sets of states and inputs, respectively. We assume that f(0) = 0, that f(x(t)) and g(x(t)) are locally Lipschitz functions on a set Ω ⊆ R^n that contains the origin, and that the system is stabilizable on Ω. The control objective is to design the control signal u for the system (6.7) to 1) make the system achieve desired behaviors (e.g., track the desired trajectory x_d(t) with a good transient response) while 2) guaranteeing safety specifications specified by STL, i.e., guaranteeing (x, t) ⊨ σ, where σ belongs to a set of predicates P_σ = [σ_1, σ_2, . . . , σ_N] with N as the number of the constraints. To achieve these goals, one can use an objective function whose minimization subject to (x, t) ⊨ σ provides an optimal and safe control solution aligned with the intention of the designer. That is, the following optimal control formulation can be used to achieve an optimal performance (encoded in the reward function r) while guaranteeing the STL specifications.

Problem 1 (Safety-Certified Optimal Control). Given the system (6.7), find a control policy u that solves the following safe optimal control problem:

min J(x(t), u(t), x_d(t)) = ∫_t^∞ e^{−γ(τ−t)} r(x(τ), u(τ), x_d(τ)) dτ
s.t. (x, τ) ⊨ σ, ∀τ ≥ t    (6.8)

where γ is a positive discount factor,

r(x(t), u(t), x_d(t)) = Σ_j q_j r_j(x(t), u(t), x_d(t))    (6.9)

is the overall reward function with r_j(x(t), u(t), x_d(t)) as the cost for the j-th sub-goal of the system and q_j as its weight, and x_d(t) is an external signal (e.g., a reference trajectory).

The optimization framework in Problem 1, if efficiently solved, works well for systems operating in structured environments in which the system is supposed to perform a single task, the priorities across sub-goals (i.e., q_j in (6.9)) do not change over time, and reference trajectories need not be adjusted. However, first, Problem 1 is hard to solve as it considers both optimality and safety in one framework. Second, even if efficiently solved, for complex systems such as self-driving cars, for which the system might encounter numerous circumstances, a fixed reward function cannot capture the complexity of the semantics of complex tasks across all circumstances. As the circumstance changes, previously rewarding maneuvers might no longer be safely achievable, and thus the feasibility of the solution to Problem 1 might be jeopardized.

Definition 2 (Feasible Control Policy). Consider the system (6.7) with specifications σ. A control policy u(t) = µ(x) is said to be a feasible solution to Problem 1 if

1. µ(x) stabilizes the system (6.7) on Ω.
2. There exists a safe set S ⊆ X such that for every x_0 ∈ S, x_t(x_0, µ) ⊨ σ ∀t, where x_t(x_0, µ) is the state trajectory at time t ≥ 0 generated by (6.7) with the initial condition x_0 and the policy u = µ(x).
3. J(x, u, x_d) < ∞ for all x ∈ Ω.

The feasibility of Problem 1 can be jeopardized as the context changes, unless the reward weights and/or the reference trajectory are adapted to the context. For example, consider the case where a vehicle is safely performing some maneuvers with a desired velocity under a normal road condition. If, however, the friction of the road changes and the vehicle does not adapt its aspiration towards its desired reference trajectory when solving Problem 1, it must either violate its safety specifications or its performance function will become unbounded, as the system's state cannot follow the desired speed without violating safety. Since the performance will be unbounded for any safe policy, the vehicle might only wander around and not reach any goal, providing a very poor performance.
This highlights the importance of proposing a metacognitive framework that adapts to the context.

Remark 1. One might argue that the original weights or desired reference trajectory in Problem 1 can be appropriately designed in a context-dependent fashion to ensure satisfaction of the desired specifications across a variety of circumstances. However, during the design stage, it is generally not possible to foresee the circumstances that will cause violation of the desired specifications and to come up with a context-dependent reward function. This is generally due to modeling errors, unknown changes in the environment, and operator intervention.

Solving Problem 1 for systems with uncertain dynamics is hard. While RL algorithms can solve optimal control problems for systems with uncertain dynamics, they typically do so without taking into account safety constraints. To deal with this challenge, in this chapter, we use two layers of control to solve Problem 1 and guarantee its feasibility. In the lower layer, an RL algorithm is used to find an optimal controller that minimizes the performance (6.8) without considering the safety constraints. The metacognitive layer then monitors the safety constraints and their level of satisfaction to proactively make meta-decisions about what reward function to optimize to guarantee the feasibility of Problem 1 as the context changes. To guarantee satisfaction of the desired specifications with maximum assuredness, the metacognitive layer must be added on top of the lower-layer optimal control design to decide about priorities over sub-goals as well as the adaptation of the desired reference trajectory. The metacognitive layer monitors system-level operation and provides corrective action by optimizing a fitness function that guarantees the system's liveness and safety, and thus ensures maximum assuredness across different circumstances.

6.4 Metacognitive Control Architecture

To find an optimal solution while always guaranteeing satisfaction of the desired specifications with maximum assuredness, as shown in Fig. 6.1, a metacognitive RL algorithm is presented, and it consists of:

• A low-level RL-based controller K for the system S that minimizes the performance (6.8) without considering the safety constraints.

• A high-level metacognitive controller C that adapts the reward function for the low-level controller K to guarantee the feasibility of Problem 1 and to maximize assuredness.

We aim at synthesizing the controller C for the system (6.7) such that the closed-loop system achieves the desired objective, defined in terms of minimization of a low-level cost function J(x(t), u(t), x_d(t)) in (6.8), while guaranteeing the system's liveness and safety, i.e., (x, t) ⊨ σ, in the metacognitive layer. The separation of the RL control design, which optimizes the performance, from the metacognitive design, which maximizes assuredness by optimizing a fitness function, significantly simplifies solving Problem 1 and allows us to present data-based techniques for solving it. Let θ_1 ∈ R^{d_1}, θ_2 ∈ R^{d_2}, and θ_3 ∈ R^{d_3} be vectors of parameters in the matrices Q(θ_1), R(θ_2), and x_d(θ_3). Let θ := [θ_1^T, θ_2^T, θ_3^T]^T and let λ̄ be defined as the set of all admissible hyperparameters θ. Note that we assume that the set of all admissible parameters θ ∈ λ̄ is predefined by the designer based on some prior knowledge. With a slight abuse of notation, we write Qθ, Rθ, and rθ instead of Q(θ_1), R(θ_2), and x_d(θ_3) in what follows.
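Before moving on, the following small sketch illustrates how a hyperparameter vector θ = [θ_1^T, θ_2^T, θ_3^T]^T can be unpacked into Qθ, Rθ, and rθ. The dimensions (n = 4 states, m = 1 input, loosely matching the steering example of Section 6.6) and all names are assumptions made for illustration.

```python
import numpy as np

n, m = 4, 1  # assumed state and input dimensions

def unpack(theta):
    # theta = [theta1; theta2; theta3] -> (Q(theta1), R(theta2), x_d(theta3))
    theta1, theta2, theta3 = theta[:n], theta[n:n + m], theta[n + m:]
    Q = np.diag(theta1)   # diagonal design weight on the tracking error
    R = np.diag(theta2)   # diagonal design weight on the control effort
    r = theta3            # desired set-point r_theta
    return Q, R, r

Q, R, r = unpack(np.array([10.0, 10.0, 10.0, 10.0, 2.0, 1.0, 0.0, 0.0, 0.0]))
```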
6.4.1 Metacognitive Layer Monitoring and Control

The focus of this subsection is on designing a metacognitive layer that guarantees the feasibility of Problem 1, regardless of the type of the RL-based controller (e.g., policy gradient, policy iteration, etc.) used in the lower layer. While the proposed approach is not limited to any specific type of performance, in the sequel, we consider an optimal set-point tracking problem with STL safety specifications. That is, we consider the following parameterized reward function (6.9) in terms of the hyperparameter vector θ:

r(x(τ), u(τ), r(τ), θ) = (x(τ) − rθ)^T Qθ (x(τ) − rθ) + u^T(τ) Rθ u(τ)    (6.10)

where Qθ and Rθ are parametrized design weight matrices, which are assumed diagonal, and rθ is the desired set-point. The hyperparameter vector can then be defined as the stack of all parameters of the design weight matrices and the desired set-point. The performance function in Problem 1 then becomes

J(x, u, r) = ∫_t^∞ e^{−γ(τ−t)} [(x(τ) − rθ)^T Qθ (x(τ) − rθ) + u^T(τ) Rθ u(τ)] dτ    (6.11)

Figure 6.1: Proposed metacognitive control scheme. S: the system to be controlled; K: the low-level RL controller; C: the high-level metacognitive layer scheme.

Assumption 1. Let u*_θ(x) be the optimal solution to Problem 1 for a set of hyperparameters θ in the performance function. At each circumstance, there exists a θ such that its corresponding optimal control policy u*_θ(x) is a feasible solution to Problem 1.

Remark 2. Note that the lower layer receives the hyperparameter vector θ from the metacognitive layer, and any RL algorithm for an unknown system can be used to minimize the accumulated reward function for the specified θ and determine the optimal control policy in the lower layer. While the presented framework considers a model-free RL-based control approach in the lower layer, it can be applied to any performance-driven control architecture such as [170]-[171].

The metacognitive layer monitors the functionality of the system-level operations and performs corrective actions for the lower-level controller when required. That is, the hyperparameters (i.e., the design weights and set-point) are adjusted in the metacognitive layer to guarantee the feasibility of Problem 1 with maximum assuredness as the context changes. To monitor the functionality of the lower layer, i.e., to check if it performs as intended, the accumulated robustness degree of the temporal logic specifications is used to define a fitness function as a metric for measuring the system's safety (constraint satisfaction) and liveness (goal reaching and reference tracking). If the fitness function drops, which is an indication that the feasibility of the solution to Problem 1 is about to be violated, the metacognitive layer adapts the hyperparameters of the reward function to guarantee feasibility. Since the feasibility of Problem 1 is only guaranteed if the safety STL specifications are satisfied and the performance remains bounded, we use the degree of robustness of the safety STL specifications as well as the degree of robustness of the goal- or reference-reaching STL specifications (liveness specifications that assure performance boundedness), and define a fitness function as a metric to proactively monitor the feasibility of Problem 1 and react before the STL specifications are violated.

6.4.1.1 Metacognitive Monitoring

While N safety specifications are considered as constraints in Problem 1, we define the (N+1)-th specification as the liveness (set-point tracking) of the system via the following STL formula.
σ_{N+1} = ◇_{[0,t_s]} ¬(‖x(t) − rθ‖ > ε)    (6.12)

with ε as the envelope on the tracking error and t_s as the expected settling time.

Lemma 1. Let u*_θ(x) be the optimal control policy found by solving Problem 1 with the performance (6.11). If u*_θ(x) is feasible, then x_t(x_0, u*_θ) ⊨ σ_{N+1} ∀t.

Proof. For a given θ, under the assumption that there exists a stabilizing control policy, it is shown in [170] that the performance is bounded for the optimal controller u*_θ(x). On the other hand, based on Barbalat's lemma [172], a uniformly continuous real function whose integral up to infinity exists and is bounded vanishes at infinity. Therefore, the performance is bounded for u*_θ(x) if it makes ‖x(t) − rθ‖ become very small after the settling time t_s. This completes the proof.

Remark 3. Incorporating goal-reaching specifications can help adapt the reward function to avoid performing unintended functionalities in many applications. A classic example for which this could have helped resolve the problem is OpenAI's demo (https://openai.com/blog/faulty-reward-functions/), in which an RL agent in a boat racing game kept going in circles while repeatedly hitting the same reward targets to gain a high score without having to finish the course.

The STL specification (x, t) ⊨ σ_{N+1} essentially states that the trajectory tracking error should eventually become less than ε after the expected settling time. Otherwise, the set-point r is aimed too high to be achieved safely and must be adjusted. The settling time can be obtained from the knowledge that we have of the control expectation from the lower layer, and can be conservative. We now extend the set of predicates from P = [σ_1, ..., σ_N], with N as the number of the constraints, to P = [σ_1, ..., σ_N, σ_{N+1}] to include the liveness STL predicate. The monitor then predicts whether (x, t) ⊨ σ will be satisfied all the time, so as to make proactive meta-decisions accordingly. Let the stack of predicate functions for the safety and liveness specifications σ_i ∈ P_σ be

z^σ(x) = [z^{σ_1}(x(t)), z^{σ_2}(x(t)), . . . , z^{σ_{N+1}}(x(t))]^T    (6.13)

Based on (6.12), the predicate function for liveness becomes

z^{σ_{N+1}}(x(t)) = ε − ‖x(t) − rθ‖    (6.14)

Using z^σ(x), a fitness function is now designed to monitor and estimate the accumulated level of satisfaction of the desired STL specifications (i.e., the safety-value function) in the metacognitive layer. If the fitness function drops, which is an indication that either the safety constraints are about to be violated in the future or the liveness of the system will not be guaranteed, i.e., the feasibility of the solution to Problem 1 is about to be violated, then the metacognitive layer proactively adapts the hyperparameters of the reward function to guarantee feasibility.

Definition 3. Consider a specific hyperparameter vector θ and let u = µ(x) be a feasible solution to Problem 1 with the performance (6.11). The set S ⊆ X is called a viable set (safe and live) of µ(x) if for every x_0 ∈ S, x_t(x_0, µ) ⊨ σ ∀t, and is defined as

S_{µ,θ}(x) = {x_0 : x_t(x_0, µ) ⊨ σ, ∀t ≥ 0}    (6.15)

where σ belongs to the set of predicates P = [σ_1, ..., σ_N, σ_{N+1}] with the predicate functions given in (6.13). Note that the dependence of S_{µ,θ}(x) on θ is because the specification (6.14) depends on θ.
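As an illustration of the stacked predicate functions (6.13)-(6.14), the sketch below evaluates a single safety predicate together with the liveness predicate ε − ‖x − rθ‖. The lane-offset predicate, the numbers, and all names are assumptions borrowed loosely from the example of Section 6.6, not the dissertation's code.

```python
import numpy as np

eps, r_theta = 0.2, np.array([3.0, 0.0, 0.0, 0.0])  # assumed envelope/set-point

def z_sigma(x):
    z_safety = 1.0 - abs(x[0] - r_theta[0])       # z_sigma_1: |x1 - r| < 1
    z_live = eps - np.linalg.norm(x - r_theta)    # z_sigma_{N+1}, eq. (6.14)
    return np.array([z_safety, z_live])

print(z_sigma(np.array([2.9, 0.0, 0.0, 0.0])))    # both positive => satisfied
```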
Lemma 2. Consider a specific hyperparameter vector θ. If the set S_{µ,θ}(x) is empty for all control policies µ ∈ U, then there exists no feasible solution to Problem 1 for the hyperparameter vector θ.

Proof. The set of predicates P includes all safety constraints with predicates σ_1, ..., σ_N as well as the liveness condition with predicate σ_{N+1} defined in (6.12). If the set S_{µ,θ}(x) is empty, then there is no control policy µ(x) that simultaneously satisfies all safety constraints and makes the performance bounded (based on Lemma 1). Therefore, based on Definition 2, there is no feasible solution to Problem 1. This completes the proof.

To monitor the feasibility of Problem 1 for the current hyperparameter vector θ, we now define a fitness function based on the quantitative semantics of the STL specifications under an optimal control policy found by minimizing (6.11) for the given θ. Let u*_θ(x) be the optimal control policy found by minimizing the performance function (6.11) for a given θ and applied to the system. It will be shown in Section 6.5 how to use off-policy learning to find optimal solutions for many hyperparameters while only a behavior policy is applied to the system to collect data. Based on (6.4), once u*_θ(x) is applied to the system, to monitor the degree of robustness of the specifications under u*_θ(x), one can write the conjunction over the predicate functions P = [σ_1, ..., σ_N, σ_{N+1}] as

ξ_θ(x, t, σ) = ∧_{i=1}^{N+1} ρ^{σ_i}(x, t) = min_{i∈[1,...,N+1]} (ρ^{σ_1}(x, t), ..., ρ^{σ_{N+1}}(x, t))    (6.16)

where ρ^{σ_i}(x, t) = z^{σ_i}(x(t)) and x(t) = x_t(x_0, u*_θ(x)) is the state trajectory at time t ≥ 0 generated by (6.7) with the initial condition x_0 and the policy u*_θ(x). In order to avoid non-smooth analysis, a smooth under-approximation of ξ_θ(x, t, σ) is provided in the following lemma.

Lemma 3. Consider a conjunction of N + 1 predicate functions given in (6.13) and their overall robustness ξ_θ(x, t, σ) defined in (6.16). Then,

ξ^a_θ(x, t, σ) ≜ −ln( Σ_{i=1}^{N+1} e^{−ρ^{σ_i}(x,t)} ) ≤ ξ_θ(x, t, σ)    (6.17)

Proof. See [168].

Lemma 4. The sign of the function ξ^a_θ(x, t, σ) is the same as the sign of the function ξ_θ(x, t, σ).

Proof. It is immediate from (6.17) that ξ_θ(x, t, σ) < 0 results in ξ^a_θ(x, t, σ) < 0. On the other hand, if ξ_θ(x, t, σ) > 0, then based on (6.16), ρ^{σ_i}(x, t) > 0 for all i ∈ [1, ..., N+1], and thus −ln(e^{−ρ^{σ_i}(x,t)}) > 0 for i = 1, ..., N+1, which completes the proof.

Now, a fitness function for a hyperparameter vector θ, as a metric for measuring the system's safety and liveness in terms of the overall robustness, is defined as

f_θ(x(t)) = ∫_t^∞ e^{−a(τ−t)} [(1 − l) log(ξ^a_θ(x, τ, σ)) + (1 + l) log(1 + 1/ξ^a_θ(x, τ, σ))] dτ    (6.18)

where l = sgn(ξ^a_θ(x, τ, σ)). The first term is a barrier function that makes the fitness infinite if the degree of robustness becomes negative (i.e., the STL specifications are violated). On the other hand, the lower the fitness, the better the robustness of the safety and liveness specifications. This is because the inverse of the degree of robustness is used in the fitness function. Note that for a nonempty set S_{µ,θ}(x) in (6.15), the fitness function in (6.18) becomes

f_θ(x(t)) = ∫_t^∞ e^{−a(τ−t)} [2 log(1 + 1/ξ^a_θ(x, τ, σ))] dτ    (6.19)
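The smooth under-approximation (6.17) and the barrier structure of (6.18)-(6.19) can be sketched as follows; the explicit guard on the sign of ξ^a mirrors the role of l, and all names are illustrative assumptions.

```python
import numpy as np

def xi_smooth(rho):
    # xi_a = -ln(sum_i exp(-rho_i)) <= min_i rho_i, eq. (6.17)
    return -np.log(np.sum(np.exp(-np.asarray(rho))))

def meta_reward(rho):
    # Integrand of the fitness (6.18): a barrier when robustness is violated,
    # and 2*log(1 + 1/xi_a) on the safe branch, matching (6.19).
    xi = xi_smooth(rho)
    if xi <= 0:
        return np.inf
    return 2.0 * np.log(1.0 + 1.0 / xi)

print(xi_smooth([2.0, 3.0]), meta_reward([2.0, 3.0]))
```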
Theorem 1. There exists a control policy µ(x) for which S_{µ,θ}(x) is nonempty if and only if the fitness function in (6.18) is bounded over some set.

Proof. If the set S_{µ,θ}(x) is empty for all µ(x) ∈ U, then, for any initial condition x_0 and any µ(x) ∈ U, one has x_t(x_0, µ) ⊭ σ, and consequently ξ_θ(x, t, σ) < 0 ⇒ ξ^a_θ(x, t, σ) < 0 for some time t. This makes the fitness function (6.18) unbounded because of its first term. On the other hand, if the set S_{µ,θ}(x) is nonempty, then for some control policy µ(x) ∈ U and any initial condition x_0 ∈ S_{µ,θ}(x), x_t(x_0, µ) ⊨ σ ∀t ≥ 0. Thus ξ_θ(x, t, σ) > 0 ∀t ≥ 0, and based on Lemma 4, ξ^a_θ(x, t, σ) > 0 ∀t ≥ 0. Let ε_0 = min_{t≥0} {ξ^a_θ(x, t, σ)}. Then,

f_θ(x_0) ≤ 2 ln(1 + 1/ε_0) ∫_t^∞ e^{−a(τ−t)} dτ ≤ (2/a) ln(1 + 1/ε_0) < ∞, ∀x_0 ∈ S_{µ,θ}(x)    (6.20)

This completes the proof.

We now present an online data-based approach to learn the fitness function as a function of the state. Define the meta reward as

r_m(x, t) = (1 − l) log(ξ^a_θ(x, t, σ)) + (1 + l) log(1 + 1/ξ^a_θ(x, t, σ))    (6.21)

The fitness function corresponding to u*_θ(x) at one specific state can now be interpreted as the accumulated meta rewards the system receives starting from that state when u*_θ(x) is applied to the system, and thus it can be interpreted as a safety-related value for that state. To calculate the fitness function for all states in a set of interest, one could run the system from all those states to collect trajectories and then calculate the fitness function. This, however, is not practical and not data efficient. To obviate this issue, the fitness function in (6.18) is written as

f_θ(x(t)) = ∫_t^{t+T} e^{−a(τ−t)} r_m(x, τ) dτ + f_θ(x(t + T))    (6.22)

where f_θ(x(t + T)) is the fitness value sampled at time t + T. This equation resembles the Bellman equation in RL, and its importance is that it allows us to express the fitness values of states in terms of the fitness values of other sampled states. This opens the door for value-function-approximation-like approaches for calculating the fitness value of each state. That is, since the fitness values of consecutive samples are related through (6.22), a parametrized form of the fitness function, or a nonparametric form of it, can be used to learn the fitness function for all states of interest using only a single trajectory of the system. This will allow fast proactive decision making in the upper layer and will prevent the system from reaching an irreversible crisis, for which no action can keep the system in its safety envelope in the future. To this end, a data-based approach is used to assess the fitness online in real time without using a model of the system. Once the fitness is learned, a monitor will detect changes in the fitness and consequently in the situation.

We first consider learning the fitness function. Since the form of the fitness function is not known, a Gaussian process (GP) is employed to estimate the function f_θ(x(t)). In analogy to GP regression for RL [173], a GP prior is first imposed over the fitness function, i.e., f_θ(x) ∼ GP(m_0(x), k_0(x, x′)) with mean m_0(x) and covariance k_0(x, x′). The covariance form can be chosen to reflect prior knowledge concerning the similarity of the states' fitness in the domain of interest. To employ GP regression, based on (6.22), the temporal difference (TD) error for the fitness function is written as

f_θ(x(t)) − f_θ(x(t + T)) = ∫_t^{t+T} e^{−a(τ−t)} r_m(x, τ) dτ    (6.23)

To learn the fitness function using a GP, the sequence of samples of the fitness function corresponding to the trajectory x_1, ..., x_L is used to present the following generative model:

R(x_{t+T}) = f_θ(x(t)) − f_θ(x(t + T)) + δ(t)    (6.24)

where

R(x_i) = ∫_{t_{i−1}}^{t_i} e^{−a(τ−t_{i−1})} r_m(x, τ) dτ

and δ(t) ∼ N(0, w²) denotes zero-mean Gaussian noise indicating the uncertainty on the fitness function.
Note that (6.24) can be considered as a latent variable model in which the fitness function plays the role of the latent or hidden variable, while the meta reward plays the role of the observable output variable. As a Bayesian method, GP regression computes a predictive posterior over the latent values by conditioning on the observed meta rewards. Let X^θ_L = [x_1, ..., x_L], with x_t = x_t(x_0, u*_θ(x)), be the trajectory collected after u*_θ(x) (the optimal control input found by the lower-layer RL for the hyperparameter θ) is applied to the system. An algorithm for the derivation of u*_θ(t) for a new hyperparameter θ based on the recorded history data will be given and discussed in detail in Section 6.5. For this finite-state trajectory of length L, define the vectors

R^θ_L = [R(x_1), . . . , R(x_L)]^T
f^θ_L = [f_θ(x_1), . . . , f_θ(x_L)]^T
δ̄_L = [δ(1), . . . , δ(L)]^T    (6.25)

and the covariance vector and matrices

K(x, X^θ_L) = [k_0(x_1, x), . . . , k_0(x_L, x)]^T
K^θ_L = [K(x_1, X^θ_L), . . . , K(x_L, X^θ_L)]
Σ_L = diag(w², . . . , w²)    (6.26)

Based on (6.25)-(6.26), one has

[f^θ_L; δ̄_L] ∼ N( [m̄_0; 0], [K^θ_L, 0; 0, Σ_L] )    (6.27)

where m̄_0 = [m_0(x_1), . . . , m_0(x_L)]^T. Using (6.24), for a finite-state trajectory of length L, one has

R^θ_{L−1} = H_L f^θ_L + δ̄_{L−1}    (6.28)

where

H_L = [1 −1 0 . . . 0; 0 1 −1 . . . 0; . . . ; 0 0 . . . 1 −1] ∈ R^{(L−1)×L}    (6.29)

Based on standard results on jointly Gaussian random variables, one has

[R^θ_{L−1}; f_θ(x)] ∼ N( [H_L m̄_0; m_0(x)], [H_L K^θ_L H_L^T + Σ_{L−1}, H_L K(x, X^θ_L); K(x, X^θ_L)^T H_L^T, k_0(x, x′)] )    (6.30)

Using (6.6), the posterior distribution of the fitness function f_θ(x(t)) at state x(t), conditioned on the observed integral meta-reward values R^θ_{t−1}, is given by

(f_θ(x(t)) | R^θ_{t−1}) ∼ N(ν^θ_t(x), p^θ_t(x))    (6.31)

where

ν^θ_t(x) = m_0(x) + K(x, X^θ_t)^T α^θ_t
p^θ_t(x) = k_0(x, x) − K(x, X^θ_t)^T C^θ_t K(x, X^θ_t)    (6.32)

with

α^θ_t = H_t^T (H_t K^θ_t H_t^T + Σ_{t−1})^{−1} R^θ_{t−1}
C^θ_t = H_t^T (H_t K^θ_t H_t^T + Σ_{t−1})^{−1} H_t    (6.33)

Based on (6.22), the following difference error is used as a surprise signal to detect changes in the fitness:

SP(t) = ν_t(x) − ν_{t+T}(x) − R(x_t)    (6.34)

where ν_t(x) is the mean of the GP and SP(t) is the surprise signal at time t. Note that after the GP learns the fitness, the surprise signal will be small, as the GP is learned to assure that (6.22) is satisfied. However, once the robustness degree changes due to a change in the situation, the surprise signal will increase, and if the average of the surprise signal over a horizon is bigger than a threshold, a new fitness function will be learned and metacognitive decisions will be made (as explained in the next section) to improve the fitness if it is below some desired state-dependent threshold. That is, the monitor performs an evaluation of the surprise signal in a moving-horizon fashion and indicates a change if

∫_t^{t+∆} SP(τ) dτ ≥ β    (6.35)

for some threshold β. The metacognitive layer does not adapt the hyperparameters all the time; it adapts them only when two conditions are satisfied: 1) an event indicating a change is triggered, i.e., (6.35) is satisfied, and 2) the newly learned fitness is below a threshold, i.e., it does not indicate future safety and liveness.
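A minimal sketch of the change monitor (6.34)-(6.35) is given below: the surprise signal compares the learned GP fitness means of consecutive samples against the integrated meta reward, and a moving-horizon integral triggers on a change. The stand-in GP mean ν and all names are assumptions for illustration.

```python
import numpy as np

def surprise(nu, x_t, x_tT, R_t):
    # SP(t) = nu_t(x(t)) - nu_{t+T}(x(t+T)) - R(x_t), eq. (6.34)
    return nu(x_t) - nu(x_tT) - R_t

def change_detected(sp_samples, dt, beta):
    # Trigger when the horizon integral of SP exceeds beta, eq. (6.35)
    return float(np.sum(sp_samples) * dt) >= beta

nu = lambda x: 0.1 * x**2                        # stand-in learned GP mean
sp = [surprise(nu, 1.0, 1.1, 0.0) for _ in range(10)]
print(change_detected(sp, dt=0.1, beta=0.01))
```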
To monitor the second requirement, i.e., to detect a threat that requires adapting the hyperparameters, inspired by [173], a KL divergence metric is defined to measure the similarity between the GP learned for the current fitness and a base GP. The base GP can be obtained based on the knowledge of the constraints and STL specifications so as to assure the minimum safety of the system. Note that constructing the base GP only requires knowledge of the STL specifications and is independent of the system dynamics, the situation, and the control objectives. A library of safe GPs can also be constructed as base GPs, and previously learned GPs for other circumstances can be added to it. If the fitness remains close to any of the GPs in this library, this indicates that the system's safety is still not in danger. If not, it is highly likely that the system's safety is in danger of being violated in the near future.

Since the covariance function K corresponds to a (possibly infinite-dimensional) feature space, the Gaussian process can be viewed as a Gaussian distribution in the feature space. To show this, let φ(x) be the feature-space representation of the covariance function, so that K(x, x) = φ(x)^T φ(x). Then, we use the notation

f_θ(x) = GP_K(α^θ_t, C^θ_t)    (6.36)

to denote the GP with the corresponding covariance function K and parameters α^θ_t and C^θ_t. We define the GP for the base fitness as

f_b(x) = GP_K(α^b_t, C^b_t)    (6.37)

Lemma 5 [174]. Let the base GP and the GP for the hyperparameters θ share the same inducing inputs, i.e., X^b_L = X^θ_L = [x_1, ..., x_L], and the same covariance function K. Let K(X^θ_L, X^b_L) = Q^{−1}. Then, the KL divergence between the two dynamic GPs f_θ(x) and f_b(x) is given by

D_{KL}(f_θ(x) ‖ f_b(x)) = D_{KL}(GP_K(α^θ_t, C^θ_t) ‖ GP_K(α^b_t, C^b_t)) = (α^θ_t − α^b_t)^T V (α^θ_t − α^b_t) + W    (6.38)

where V = (Q + C^θ_t)^{−1} and W = Tr[(Q + C^b_t)V − I] − log det[(Q + C^b_t)V].

Remark 4. Since the base fitness function is learned offline based on the minimum acceptable degree of robustness, one can select many inducing points for the base fitness function and retain only a subset of them in the expressions of the posterior mean and kernel functions, so as to increase the similarity of the inducing inputs for both GPs. To use the KL divergence metric (6.38), one can use the fact that K(x, x) = φ(x)^T φ(x), and so K(x, x_1) = K(x, x_2)K(x_1, x_2), to shift the inducing points of the base fitness function to those of the learned GP.

Let f_b(x) = [f_{b_1}(x), ..., f_{b_M}(x)] be the stack of base GPs. After a change, if the condition min_i D_{KL}(f_θ(x) ‖ f_{b_i}(x)) ≤ ϵ holds, with f_θ(x) as the fitness before the change, then this indicates that the system is still safe despite the change. Therefore, the monitor triggers an event that requires adaptation of the hyperparameters if the following STL condition is violated:

ϕ = □( ( ∫_t^{t+∆} SP(τ) dτ > β ) ∧ ( min_i D_{KL}(f_θ(x) ‖ f_{b_i}(x)) > ϵ ) )    (6.39)

where the first conjunct is denoted by ϕ_1 and the second by ϕ_2.
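The divergence (6.38) is straightforward to evaluate once both GPs share inducing inputs. The following sketch does so, with log det used for the log term; all inputs are fabricated for illustration (a sketch under the stated assumptions, not library code).

```python
import numpy as np

def kl_between_gps(alpha, C, alpha_b, C_b, Q):
    # D_KL(GP_K(alpha, C) || GP_K(alpha_b, C_b)) per (6.38)
    V = np.linalg.inv(Q + C)                   # V = (Q + C_theta)^{-1}
    d = alpha - alpha_b
    M = (Q + C_b) @ V
    W = np.trace(M - np.eye(M.shape[0])) - np.log(np.linalg.det(M))
    return float(d @ V @ d + W)

L = 3
Q = np.eye(L); C = 0.1 * np.eye(L); C_b = 0.2 * np.eye(L)
print(kl_between_gps(np.ones(L), C, np.zeros(L), C_b, Q))
```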
6.4.1.2 Metacognitive Control

In this section, after the STL specification (6.39) is violated, a metacognitive controller is presented to find a new set of hyperparameters that guarantees that the fitness is improved, i.e., min_i D_{KL}(f_θ(x) ‖ f_{b_i}(x)) ≤ ϵ, with the minimum sacrifice on the performance. That is, it is desired to assure safety while achieving a performance as close as possible to that of θ*, the optimal hyperparameter vector found to optimize the performance prior to the change. The metacognitive layer then performs the following optimization:

min ‖θ − θ*‖  s.t.  min_i D_{KL}(f_θ(x) ‖ f_{b_i}(x)) ≤ ϵ    (6.40)

To solve this optimization, one can define the survival score function in the metacognitive layer as

H(θ) = 1/‖θ − θ*‖ + log(1 + ϵ − min_i D_{KL}(f_θ(x) ‖ f_{b_i}(x)))    (6.41)

In this chapter, safe Bayesian optimization (SBO) [175] is used to find the optimal hyperparameters that optimize the survival score function in (6.41). The SBO algorithm, provided in Algorithm 4, guarantees safety by only evaluating hyperparameters that achieve a safe score threshold with high probability. This threshold is chosen as a value below which we do not want the system fitness to fall, which would be an indication of a great risk of violation of the specifications. SBO is a sample-efficient optimization algorithm that requires only a few evaluations of the survival score function to find the optimal hyperparameters. While the safe set of hyperparameters is not known in the beginning, it is estimated after each function evaluation of SBO. In fact, at each iteration, SBO tries not only to find the global maximum within the currently known safe set (exploitation), but also to increase the set of hyperparameters that are known to be safe (exploration), as described in Algorithm 4. More specifically, SBO builds a surrogate model P that maps the hyperparameters to the survival score H(θ) in the metacognitive layer, expressed as

P : D′ → R    (6.42)

where D′ denotes the bounded domain of hyperparameters. Note that the latent function P in (6.42) is unknown, as the dynamics are not known. P can be sampled by running the system and evaluating the survival score function H(θ) from the recorded data. These samples of P are generally uncertain because of noisy data; moreover, the score function is typically nonconvex, and no gradients are easily available. A GP is therefore used to perform non-parametric regression for the latent function P [169], and SBO is then used to optimize the survival score and determine the optimum values of the hyperparameters. For the non-parametric regression, we define a prior mean function µ_0(θ), which encodes prior knowledge about the survival score function P(θ), and a covariance function k(θ, θ′), which defines the covariance of any two function values P(θ) and P(θ′) and is used to model the uncertainty about the mean estimates. One can predict the survival score of the system corresponding to the hyperparameter θ by calculating its mean µ_k(θ) and covariance σ_k²(θ) over a set of k observations {θ_{1:k}, P_{1:k}}, i.e., P(θ) ∼ N(µ_k(θ), σ_k²(θ)). Based on the predicted score function, the lower and upper bounds of the confidence interval at iteration k are given as

m̄_k(θ) = µ_{k−1}(θ) − β_k σ_{k−1}(θ)
M_k(θ) = µ_{k−1}(θ) + β_k σ_{k−1}(θ)    (6.43)

where β_k > 0 denotes a scalar factor that defines the desired confidence interval. Based on (6.43), the safe set of all hyperparameters θ that lead to survival score values above the threshold P_min is given by

S_k ← {θ ∈ D′ | m̄_k(θ) ≥ P_min}    (6.44)

Then, a set of potential maximizers is defined as [175]

T_k ← {θ ∈ S_k | M_k(θ) ≥ max_{θ′} m̄_k(θ′)}    (6.45)

which contains all the safe hyperparameters for which the upper confidence bound M_k(θ) is above the best safe lower bound.
In order to define a set of potential expanders, which quantifies whether a new set of hyperparameters can be classified as safe after a new observation, an optimistic characteristic function for expanders is given as

g_k(θ) = |{θ′ ∈ D′ \ S_k | m̄_{k,(θ,M_k(θ))}(θ′) ≥ P_min}|    (6.46)

where m̄_{k,(θ,M_k(θ))} is the lower bound of the GP based on the prior data and a data point (θ, M_k(θ)) with a noiseless measurement of the upper confidence bound. Based on (6.46), one can determine how many previously unsafe points can be classified as safe according to (6.44), assuming that M_k(θ) is measured while evaluating P(θ). The characteristic function is positive if the new data point has a non-negligible chance of expanding the safe set. Therefore, the set of possible expanders is given as

G_k ← {θ ∈ S_k | g_k(θ) > 0}    (6.47)

Then, a new set of hyperparameters is selected for evaluation on the real system by selecting the hyperparameters about which we are most uncertain from the union of the sets G_k and T_k, i.e., at iteration k the score function is evaluated at

θ_k ← argmax_{θ∈{G_k∪T_k}} (M_k(θ) − m̄_k(θ))    (6.48)

The evaluation approach in (6.48) works well for expanding the safe set [175], with a trade-off between exploration and exploitation. For exploration, the most uncertain parameter locations are usually on the boundary of the safe set, which results in efficient exploration. An estimate of the best currently known set of hyperparameters is obtained from argmax_{θ∈S_k} m̄_k(θ), which corresponds to the point that achieves the best lower bound on the survival score.

Algorithm 4 SBO for Metacognitive Control.
1: procedure
2: Initialize GP with (θ_0, P(θ_0))
3: for k = 1, . . . do
4:   S_k ← {θ ∈ D′ | m̄_k(θ) ≥ P_min}, where m̄_k(θ) = µ_{k−1}(θ) − β_k σ_{k−1}(θ)
5:   T_k ← {θ ∈ S_k | M_k(θ) ≥ max_{θ′} m̄_k(θ′)}, where M_k(θ) = µ_{k−1}(θ) + β_k σ_{k−1}(θ)
6:   G_k ← {θ ∈ S_k | g_k(θ) > 0}, where g_k(θ) = |{θ′ ∈ D′ \ S_k | m̄_{k,(θ,M_k(θ))}(θ′) ≥ P_min}|
7:   θ_k ← argmax_{θ∈{G_k∪T_k}} (M_k(θ) − m̄_k(θ))
8:   Obtain measurement P(θ_k)
9:   Update GP with (θ_k, P(θ_k))
10: end for
11: end procedure

Remark 5. It is shown in [176]-[177] that, given a persistently exciting input, a single rich measured trajectory can be used to characterize the entire set of system trajectories. That is, having a single rich trajectory of an unknown system, the trajectory for a given sequence of inputs and an initial condition can be constructed without even applying it to the system. This can be leveraged to learn the fitness function for a given control policy selected by the Bayesian optimization algorithm without actually applying that policy to the system. More specifically, after a change, to evaluate the fitness of hyperparameters, a rich trajectory of the system is first collected and then used to reconstruct the trajectory of the system, with enough length, for any hyperparameter under evaluation. This trajectory data can then be used to learn the GP for the hyperparameters without even applying the corresponding controller. This allows us to evaluate even unsafe policies without applying them to the system. Hence, for each set of hyperparameters, one can compute the fitness function from measured data without knowledge of the closed-loop system's dynamics and, consequently, find the optimal hyperparameters that optimize the survival score function in (6.41). This resembles off-policy learning in RL.
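The following sketch distills one iteration of Algorithm 4 over a finite grid of candidate hyperparameters, given the GP posterior mean and standard deviation of the survival score. For brevity it omits the expander set G_k of (6.46)-(6.47), which would require re-conditioning the GP on a hypothetical observation; all names are assumptions.

```python
import numpy as np

def sbo_step(mu, sigma, P_min, beta=2.0):
    m_lo = mu - beta * sigma                  # lower bound m_k, eq. (6.43)
    M_hi = mu + beta * sigma                  # upper bound M_k, eq. (6.43)
    safe = m_lo >= P_min                      # safe set S_k, eq. (6.44)
    if not safe.any():
        raise RuntimeError("empty safe set: no hyperparameter certified safe")
    best_lo = m_lo[safe].max()
    T_k = safe & (M_hi >= best_lo)            # potential maximizers, eq. (6.45)
    width = np.where(T_k, M_hi - m_lo, -np.inf)
    k_next = int(np.argmax(width))            # most uncertain candidate, (6.48)
    k_best = int(np.argmax(np.where(safe, m_lo, -np.inf)))  # best known point
    return k_next, k_best

mu = np.array([0.8, 1.0, 0.6, 0.2]); sigma = np.array([0.1, 0.2, 0.3, 0.4])
print(sbo_step(mu, sigma, P_min=0.3))
```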
Remark 6. Note that SBO is a derivative-free optimization algorithm, which is useful since a closed-form expression of P as a function of the hyperparameters θ is not available. It also allows us to tune the hyperparameters with as few evaluations of P as possible, which is crucial since each evaluation can be costly and time-consuming, as it requires a closed-loop experiment.

6.5 Low-Level RL-Based Control Architecture

In this section, a computational data-driven algorithm is first developed to find the new optimal control policy u*_θ(t) for a new hyperparameter vector θ based on recorded history data. It is then shown that the proposed algorithm converges to the optimal control policy u*_θ(t) for all admissible hyperparameters θ. Following the same reasoning as in [171], we are ready to give the following computational data-driven algorithm (Algorithm 5) for finding the new optimal control policy u*_θ(t) for a new hyperparameter θ based on the recorded history data.

Remark 7. Note that Algorithm 5 does not rely on the dynamics of the system. Note also that, inspired by the off-policy algorithm in [171], Algorithm 5 has two separate phases. In the first phase, i.e., Step 2, a fixed initial exploratory control policy u is applied, and the system information is recorded over the time interval [t_0, t_l]. In the second phase, i.e., Steps 3-7, without requiring any knowledge of the system dynamics, the information collected in the first phase is repeatedly used to find a sequence of updated policies converging to u*_θ. The following theorem shows that Algorithm 5 converges to the optimal control policy u*_θ(t) for all admissible hyperparameters θ.

Algorithm 5 Low-level Off-policy RL-based Control.
1: procedure
2: Perform an experiment on the time interval [t_0, t_l] by applying a fixed stabilizing control policy u(t) + e to the system, where e is the exploration noise, and record {X_θ(t)} and the corresponding {u(t)} at N ≥ l_1 + m × l_2 different sampling instants in the time interval [t_0, t_l], where X_θ(t) = [e_d(t)^T, r_θ^T]^T with e_d(t) := x(t) − r_θ.
3: For a new parameter vector θ ∈ λ̄, construct φ_1(X_θ) ∈ R^{l_1} and Φ(X_θ) ∈ R^{l_2} as suitable basis function vectors.
4: Compute Ξ_k(θ) and Θ_k(θ) for the new set of parameters θ based on the recorded history data as follows:

Ξ_k(θ) = −I_{xx}(θ) [vec(Q̄_θ); vec(R_θ)]    (6.49)

Θ_k(θ) = [δ_{xx}, −2 Ī_{xx}(θ)(I_n ⊗ Ŵ̄_k^T R_θ) − 2 I_{xu}(θ)(I_n ⊗ R_θ)]    (6.50)

where Q̄_θ := diag(Q_θ, 0) and

δ_{xx} = [e^{−γT}(φ_1(X_θ(t_1)) − φ_1(X_θ(t_0))), e^{−γT}(φ_1(X_θ(t_2)) − φ_1(X_θ(t_1))), . . . , e^{−γT}(φ_1(X_θ(t_l)) − φ_1(X_θ(t_{l−1})))]^T    (6.51)

Ī_{xx}(θ) = [∫_{t_0}^{t_1} e^{−γ(τ−t)}(Φ(X_θ) ⊗ Φ(X_θ)) dτ, ∫_{t_1}^{t_2} e^{−γ(τ−t)}(Φ(X_θ) ⊗ Φ(X_θ)) dτ, . . . , ∫_{t_{l−1}}^{t_l} e^{−γ(τ−t)}(Φ(X_θ) ⊗ Φ(X_θ)) dτ]^T    (6.52)
I_{xx}(θ) = [ [∫_{t_0}^{t_1} e^{−γ(τ−t)}(X_θ ⊗ X_θ) dτ, . . . , ∫_{t_{l−1}}^{t_l} e^{−γ(τ−t)}(X_θ ⊗ X_θ) dτ]^T, [∫_{t_0}^{t_1} e^{−γ(τ−t)}(Ŵ̄_k^T Φ(X_θ) ⊗ Ŵ̄_k^T Φ(X_θ)) dτ, . . . , ∫_{t_{l−1}}^{t_l} e^{−γ(τ−t)}(Ŵ̄_k^T Φ(X_θ) ⊗ Ŵ̄_k^T Φ(X_θ)) dτ]^T ]    (6.53)

I_{xu}(θ) = [∫_{t_0}^{t_1} e^{−γ(τ−t)}(Φ(X_θ) ⊗ u) dτ, ∫_{t_1}^{t_2} e^{−γ(τ−t)}(Φ(X_θ) ⊗ u) dτ, . . . , ∫_{t_{l−1}}^{t_l} e^{−γ(τ−t)}(Φ(X_θ) ⊗ u) dτ]^T    (6.54)

5: Solve for Ŵ^V_k ∈ R^{l_1} and Ŵ̄_{k+1} ∈ R^{l_2×m} from

[Ŵ^V_k; vec(Ŵ̄^T_{k+1})] = (Θ_k^T(θ) Θ_k(θ))^{−1} Θ_k^T(θ) Ξ_k(θ)    (6.55)

and update the value function and control policy as follows:

V̂^θ_k(X_θ) := (Ŵ^V_k)^T φ_1(X_θ)    (6.56)

u^{k+1}_θ(t) := Ŵ̄^T_{k+1} Φ(X_θ)    (6.57)

6: Let k ← k + 1, and go to Step 5 until ‖Ŵ^V_k − Ŵ^V_{k−1}‖ ≤ ε for k ≥ 1, where the constant ε > 0 is a predefined small threshold.
7: Use u*_θ(t) := u^{k+1}_θ(t) and V*_θ(X_θ) := V̂^θ_k(X_θ) as the approximated optimal control policy and its approximated optimal value function corresponding to the new set of parameters θ, respectively.
8: end procedure

Theorem 2 (Convergence of Algorithm 5). Let the new hyperparameter vector θ be admissible. Using the fixed stabilizing control policy u(t), when N ≥ l_1 + m × l_2, u^{k+1}_θ(t) obtained from solving (6.55) in the off-policy Algorithm 5 converges to the optimal control policy u*_θ(t), ∀θ ∈ λ̄.

Proof. ∀θ ∈ λ̄, set Q_θ, R_θ, and r_θ. Using (6.49)-(6.54), it follows from (6.55) that Ŵ^V_k ∈ R^{l_1} and Ŵ̄_{k+1} ∈ R^{l_2×m} satisfy the following Bellman equation:

e^{−γδt} (Ŵ^V_k)^T (φ_1(X_θ(t + δt)) − φ_1(X_θ(t))) = −∫_t^{t+δt} e^{−γ(τ−t)} ( X_θ^T Q̄_θ X_θ + (u^k)^T R_θ u^k ) dτ + ∫_t^{t+δt} e^{−γ(τ−t)} ( −2(Ŵ̄^T_{k+1} Φ(X_θ))^T R_θ (u − u^k) ) dτ    (6.58)

Now, let W^V ∈ R^{l_1} and W̄ ∈ R^{l_2×m} be such that u^k = W̄^T Φ(X_θ) and

[W^V; vec(W̄^T)] = (Θ_k^T(θ) Θ_k(θ))^{−1} Θ_k^T(θ) Ξ_k(θ)    (6.59)

Then, one immediately has W^V = Ŵ^V_k and vec(W̄) = vec(Ŵ̄^T_{k+1}). If the condition given in Step 2 is satisfied, i.e., the information is collected at N ≥ l_1 + m × l_2 points, then [Ŵ^V_k, vec(Ŵ̄^T_{k+1})] has N independent elements, and therefore the solutions of the least-squares (LS) problems in (6.55) and (6.59) are equal and unique. That is, Ŵ^V_k = W^V and Ŵ̄_{k+1} = W̄. This completes the proof.

6.6 Simulation Results

The presented algorithm is validated for the steering control of an autonomous vehicle in a lane-changing scenario. Consider the following vehicle dynamics for steering control [178]:

ẋ(t) = A x(t) + B u(t)    (6.60)

with

A = [0, v_T, v_T, 0;
     0, 0, 0, 1;
     0, 0, −(k_f + k_r)/(m_T v_T), −1 − (a k_f − b k_r)/(m_T v_T²);
     0, 0, −(a k_f − b k_r)/I_T, −(a² k_f + b² k_r)/(I_T v_T)],
B = [0; 0; k_f/(m_T v_T); a k_f/I_T]    (6.61)

where x(t) = [x_1, x_2, x_3, x_4]^T = [y, ψ, α, ψ̇]^T. The state variables are the lateral position of the vehicle y, the yaw angle ψ, the slip angle α, and the rate of change of the yaw angle ψ̇. Moreover, δ represents the steering angle and acts as the control input. v_T denotes the longitudinal speed of the vehicle, m_T is the total mass of the vehicle, and I_T is its moment of inertia with respect to the center of mass. k_f and k_r denote the stiffness parameters of the front and rear tires, respectively. Finally, a and b denote the distances of the front and rear tires to the center of mass. The values of the vehicle parameters used in the simulation are provided in Table 6.1.

Table 6.1: Vehicle Parameters

Parameter   Value           Parameter   Value
m_T         1300 kg         k_f         91000 N/rad
I_T         10000 m²·kg     k_r         91000 N/rad
v_T         16 m/s          β           2
a           1.6154 m        b           1.8846 m

For the validation of the presented algorithm, a lane-changing scenario on a two-lane highway is considered, as shown in Fig. 6.2. In this simulation, the following STL constraint is considered: the vehicle state is subject to a desired specification on the offset from the center line, i.e., ϕ = □(|x_1 − r| < 1)

Figure 6.2: Lane changing scenario for steering control of autonomous vehicle.

Figure 6.3: Lane changing with fixed value of hyperparameter without any change in dynamics.
with r as the center-lane trajectory (acting as the set-point value). In the simulation, the set-point value is selected as r = 1 for t < 4 s, and r = 3 otherwise. Fig. 6.3 shows the result for lane changing with fixed values of the hyperparameters and without any change in the system dynamics. The control policy, i.e., the steering angle in (6.60), is evaluated based on the off-policy RL algorithm in [171] with the hyperparameter values Q = diag(10, 10, 10, 10) and R = 2. Then, in Fig. 6.4, we consider a change in the system dynamics after t = 4 s, after which the dynamics become

ẋ(t) = (A + ∆A) x(t) + (B + ∆B) u(t)    (6.62)

with

∆A = [0, ∆v_T, ∆v_T, 0;
      0, 0, 0, 0;
      0, 0, −(k_f + k_r)/(m_T ∆v_T), −(a k_f − b k_r)/(m_T ∆v_T²);
      0, 0, 0, −(a² k_f + b² k_r)/(I_T ∆v_T)],
∆B = [0; 0; k_f/(m_T ∆v_T); 0]    (6.63)

The vehicle parameter values for (6.63) are provided in Table 6.1. The control input is evaluated based on the off-policy RL algorithm in [171] with the fixed hyperparameter values Q = diag(10, 10, 10, 10) and R = 2. The result in Fig. 6.4 shows that the vehicle starts wavering and goes out of the lane. That is, the vehicle violates the desired specification ϕ = □(|x_1 − r| < 1) after the change in the dynamics. Now, in order to implement the presented algorithm, the fitness function in (6.22) is first learned as a GP based on the temporal difference in (6.23). Note that the fitness function is learned offline and is then implemented for online monitoring and control in the metacognitive layer. Figs. 6.5 and 6.6 show the predicted fitness function, based on the learned GP, for the vehicle trajectories in Figs. 6.3 and 6.4, respectively. Based on the results in Figs. 6.6 and 6.7, one can see how the fitness value grows due to the operation of the system close to or beyond the desired STL specification. The fitness value is used for metacognitive monitoring and intermittent evaluation of the metacognitive control layer.

Figure 6.4: Constraint violation during lane changing with fixed value of hyperparameter and change in dynamics.

Figure 6.5: Predicted fitness corresponding to vehicle trajectory under normal operation.

Figure 6.6: Predicted fitness corresponding to vehicle trajectory under constraint violation.

Figure 6.7: Overall fitness value under desired STL constraint violation for the vehicle trajectory.

Based on the metacognitive monitor in (6.39), Algorithm 4 is evaluated using the survival score function in (6.41) to determine the optimum hyperparameters that ensure the desired STL specification, i.e., ϕ = □(|x_1 − r| < 1). Fig. 6.8 shows the vehicle trajectory with hyperparameter adaptation based on Algorithm 4 after the change in dynamics. The new optimum hyperparameter values are found to be Q = diag(96.11, 1.2, 1, 1.5) and R = 1. Also, Fig. 6.9 shows how the predicted fitness value converges close to zero after the hyperparameter adaptation, and the overall fitness value becomes constant, as shown in Fig. 6.10.

Figure 6.8: Vehicle trajectory with hyperparameter adaptation based on Algorithm 4 for lane changing scenario.

Figure 6.9: Predicted fitness corresponding to vehicle trajectory with hyperparameter adaptation based on Algorithm 4.
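For reference, the nominal model (6.60)-(6.61) with the Table 6.1 values can be assembled as below. The single-track structure shown is the standard linear steering model that (6.61) instantiates, and the code is an illustrative sketch rather than the simulation code used here.

```python
import numpy as np

mT, IT, vT = 1300.0, 10000.0, 16.0   # mass, inertia, speed (Table 6.1)
kf = kr = 91000.0                    # front/rear tire stiffness (Table 6.1)
a, b = 1.6154, 1.8846                # axle distances to center of mass

def steering_model(v):
    # Linear single-track steering dynamics with states [y, psi, alpha, psi_dot]
    A = np.array([
        [0.0, v,   v,                       0.0],
        [0.0, 0.0, 0.0,                     1.0],
        [0.0, 0.0, -(kf + kr) / (mT * v),  -1.0 - (a * kf - b * kr) / (mT * v**2)],
        [0.0, 0.0, -(a * kf - b * kr) / IT, -(a**2 * kf + b**2 * kr) / (IT * v)],
    ])
    B = np.array([[0.0], [0.0], [kf / (mT * v)], [a * kf / IT]])
    return A, B

A, B = steering_model(vT)            # nominal dynamics per (6.61)
```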
The presented algorithm is employed to learn control solutions with good enough performance while satisfying the desired specifications and properties expressed in terms of STL. As shown in Figs. 6.8 and 6.9, the hyperparameters are adapted based on the metacognitive layer, and the lane-changing problem for the autonomous vehicle is solved without violating any constraint.

Figure 6.10: Overall fitness value under desired STL constraint for the adapted vehicle trajectory.

6.7 Conclusion

In this chapter, an assured metacognitive RL-based autonomous control framework is presented to learn control solutions with good enough performance while satisfying the desired specifications and properties expressed in terms of STL. We discussed that pre-specified reward functions cannot guarantee the satisfaction of the desired specified constraints and properties across all circumstances that an uncertain system might encounter; that is, the system either violates safety specifications or achieves no optimality and liveness specifications. To overcome this issue, and to learn what reward functions to choose to satisfy desired specifications and to achieve a good enough performance across a variety of circumstances, a metacognitive decision-making layer is presented in augmentation with the performance-driven layer. More specifically, an adaptive reward function is presented in terms of its gains and an adaptive reference trajectory (the hyperparameters), and these hyperparameters are determined based on metacognitive monitoring and control to assure the satisfaction of the desired STL safety and liveness specifications. The proposed approach separates learning the reward function that satisfies the specifications from learning the control policy that maximizes the reward, and thus allows us to evaluate as many hyperparameters as required using reused data collected from the system dynamics.

CHAPTER 7 CONCLUSION AND FUTURE WORK

This dissertation analyzed the adverse effects of attacks and designed resilient distributed control mechanisms for multi-agent cyber-physical systems with guaranteed performance and consensus under mild assumptions. The effectiveness of the developed approach is certified by applying it to distributed frequency and voltage synchronization of AC microgrids under data manipulation attacks. Then, the adverse effects of cyber-physical attacks on distributed sensor networks are analyzed, and an attack mitigation mechanism for the event-triggered distributed Kalman filter is presented. It is shown that although event-triggered mechanisms are highly desirable, an attacker can leverage the event-triggered mechanism to cause triggering misbehaviors that significantly harm the network connectivity and performance. Then, entropy estimation-based attack detection and mitigation mechanisms are designed. Finally, a safe reinforcement learning framework for autonomous control systems under constraints is developed. Reinforcement learning agents with pre-specified reward functions cannot provide guaranteed safety across the variety of circumstances that an uncertain system might encounter. To guarantee performance while assuring the satisfaction of safety constraints across a variety of circumstances, an assured autonomous control framework is designed by empowering RL algorithms with meta-cognitive learning capabilities.
The following are some directions for future work.

• A possible direction for future work is to extend the results of the resilient control designs to synchronization of DMASs with heterogeneous nonlinear dynamics. Since nonlinear systems can exhibit finite-time escape behavior, a problem of interest is to find the conditions under which the attacker can make the trajectories of agents unbounded in finite time, and to obtain detection and mitigation mechanisms to counteract such attacks fast and thus avoid instability.

• Another possible direction is to extend the presented results to the containment control problem, for which there exists more than one leader or exo-system with different dynamics under network uncertainties.

• Extension to a meta-cognitive resilient design, i.e., a combination of high-level rules with autonomy, can be the next level of resiliency, which allows the system to learn and adapt from adversarial situations and acquire a level of resiliency against the unforeseen using some prior knowledge.

• A further possible direction for future work is to extend the results of safe reinforcement learning to a safe learning-based control framework with conflicting constraints.

APPENDIX

A.1 Proof of Theorem 1 in Chapter 5

Note that, for notational simplicity, in the following proof we keep the sensor index i but ignore the time index k. Without the time index, we represent the prior at time k + 1 as x̄^a_i(k + 1) ≜ (x̄^a_i)^+ and follow the same convention for the other variables. Using the process dynamics in (5.1) and the corrupted prior state estimate in (5.17), one has

(η̄^a_i)^+ = x^+ − (x̄^a_i)^+ = A(x − x̂^a_i) + w    (1)

where the compromised posterior state estimate x̂^a_i(k) follows the dynamics (5.17). Similarly, using (5.17), the corrupted posterior state estimation error becomes

η^a_i = x − x̂^a_i = x − x̄^a_i − K^a_i(y_i − C x̄^a_i) − γ_i Σ_{j∈N_i} (x̃^a_j − x̃^a_i) − K^a_i f_i    (2)

Then, one can write (1)-(2) as

(η̄^a_i)^+ = A η^a_i + w    (3)

η^a_i = (I_n − K^a_i C_i) η̄^a_i − K^a_i v_i + u^a_i ≜ M^a_i η̄^a_i − K^a_i v_i + u^a_i    (4)

where u^a_i = γ_i Σ_{j∈N_i} (η̃^a_j − η̃^a_i) − K^a_i f_i. Based on (5.4), we define the predictive state estimation error under attack as

(η̃^a_i)^+ = x^+ − (x̃^a_i)^+ = ζ^+_i (η̄^a_i)^+ + (1 − ζ^+_i)(A η̃^a_i + w)    (5)

Using (3), the corrupted covariance of the prior state estimation error becomes

(P̄^a_i)^+ = E[(η̄^a_i)^+ ((η̄^a_i)^+)^T] = E[(A η^a_i + w)(A η^a_i + w)^T] = A P̂^a_i A^T + Q    (6)

Using the corrupted predictive state estimation error (η̃^a_i)^+ in (5) with (P̄^a_{i,j})^+ = A P̂^a_{i,j} A^T + Q, one can write the cross-correlated predictive state estimation error covariance (P̃^a_{i,j})^+ as

(P̃^a_{i,j})^+ = E[(η̃^a_i)^+ ((η̃^a_j)^+)^T] = ζ^+_i ζ^+_j (P̄^a_{i,j})^+ + ζ^+_i (1 − ζ^+_j)(P̀^a_{i,j})^+ + (1 − ζ^+_i) ζ^+_j (P̆^a_{i,j})^+ + (1 − ζ^+_i)(1 − ζ^+_j)(A P̃^a_{i,j} A^T + Q)    (7)

where P̀^a_{i,j} and P̆^a_{i,j} are the cross-correlated estimation error covariances whose updates are given in (8)-(9). The cross-correlated estimation error covariance (P̀^a_{i,j})^+ in (7) is given by

(P̀^a_{i,j})^+ = E[(η̃^a_i)^+ ((η̄^a_j)^+)^T] = ζ^+_i (P̄^a_{i,j})^+ + (1 − ζ^+_i)[ A( γ_i Σ_{r∈N_i} (P̃^a_{i,r} − P̃^a_{i,j}) + P̀^a_{i,j} (M^a_i)^T ) A^T + Q ]    (8)

where P̃^a_{i,j} and P̆^a_{i,j} denote the cross-correlated estimation error covariances that evolve according to (7) and (9).
Similarly, (P̆^a_{i,j})^+ is updated based on the expression given by

(P̆^a_{i,j})^+ = E[(η̄^a_i)^+ ((η̃^a_j)^+)^T] = E[(η̄^a_i)^+ ( ζ^+_j (η̄^a_j)^+ + (1 − ζ^+_j)(A η̃^a_j + w) )^T] = ζ^+_j (P̄^a_{i,j})^+ + (1 − ζ^+_j)[ A( M^a_i P̆^a_{i,j} + γ_i Σ_{s∈N_i} (P̃^a_{s,j} − P̃^a_{i,j}) ) A^T + Q ]    (9)

Now, using (2)-(5), one can write the covariance of the posterior estimation error P̂^a_i as

P̂^a_i = E[M_i η̄^a_i (M_i η̄^a_i)^T] + E[K^a_i v_i (K^a_i v_i)^T] − 2E[K^a_i v_i (M_i η̄^a_i)^T] − 2E[K^a_i v_i (u^a_i)^T] + E[u^a_i (u^a_i)^T] + 2E[M_i η̄^a_i (u^a_i)^T]    (10)

Using (6) and the measurement noise covariance, the first two terms of (10) become

E[M_i η̄^a_i (M_i η̄^a_i)^T] = M_i P̄^a_i M_i^T    (11)

E[K^a_i v_i (K^a_i v_i)^T] = K^a_i R_i (K^a_i)^T    (12)

According to Assumption 1, the measurement noise v_i is i.i.d. and uncorrelated with the state estimation errors; therefore, the third and fourth terms in (10) become zero. Now, using u^a_i in (4) and Assumption 1, the last two terms in (10) can be simplified as

E[u^a_i (u^a_i)^T] = γ_i² E[ (Σ_{j∈N_i}(η̃^a_j − η̃^a_i)) (Σ_{j∈N_i}(η̃^a_j − η̃^a_i))^T ] + K^a_i E[f_i f_i^T](K^a_i)^T − 2 γ_i K^a_i E[f_i (Σ_{j∈N_i}(η̃^a_j − η̃^a_i))^T] = γ_i² Σ_{j∈N_i} (P̃^a_j − 2 P̃^a_{i,j} + P̃^a_i) + K^a_i Σ^f_i (K^a_i)^T − 2 γ_i K^a_i E[f_i Σ_{j∈N_i}(η̃^a_j − η̃^a_i)^T]    (13)

and

2E[u^a_i (M_i η̄^a_i)^T] = 2 γ_i Σ_{j∈N_i} (P̀^a_{i,j} − P̀^a_i)(M^a_i)^T − 2 K^a_i E[f_i (η̄^a_i)^T] M_i^T

where the cross-correlated term P̀^a_{i,j} is updated according to (8). Using (10)-(13), the posterior state estimation error covariance P̂^a_i under attack is given by

P̂^a_i = M^a_i P̄^a_i (M^a_i)^T + K^a_i [R_i + Σ^f_i](K^a_i)^T + 2 γ_i Σ_{j∈N_i} (P̀^a_{i,j} − P̀^a_i)(M^a_i)^T + γ_i² Σ_{j∈N_i} (P̃^a_j − 2 P̃^a_{i,j} + P̃^a_i) − 2 K^a_i Ξ^f    (14)

with Ξ^f = γ_i E[f_i Σ_{j∈N_i}(η̃^a_j − η̃^a_i)^T] + E[f_i (η̄^a_i)^T](M^a_i)^T. This completes the proof.

BIBLIOGRAPHY

[1] R. Olfati-Saber, J. A. Fax, and R. Murray, "Consensus and cooperation in networked multi-agent systems," Proceedings of the IEEE, vol. 95, no. 1, pp. 215-233, 2007.

[2] F. Bullo, J. Cortés, and S. Martinez, Distributed Control of Robotic Networks: A Mathematical Approach to Motion Coordination Algorithms, vol. 27, Princeton University Press, 2009.

[3] A. Khanafer, and T. Başar, "Robust distributed averaging: When are potential-theoretic strategies optimal?," IEEE Transactions on Automatic Control, vol. 61, no. 7, pp. 1767-1779, 2016.

[4] Q. Zhu, and T. Başar, "Game-theoretic methods for robustness, security, and resilience of cyber-physical control systems: games-in-games principle for optimal cross-layer resilient control systems," IEEE Control Systems Magazine, vol. 35, no. 1, pp. 46-65, 2015.

[5] F. Pasqualetti, F. Dorfler, and F. Bullo, "Attack detection and identification in cyber-physical systems," IEEE Transactions on Automatic Control, vol. 58, no. 11, pp. 2715-2729, 2013.

[6] F. Pasqualetti, F. Dorfler, and F. Bullo, "Control-theoretic methods for cyber-physical security: Geometric principles for optimal cross-layer resilient control systems," IEEE Control Systems Magazine, vol. 35, no. 1, pp. 110-127, 2015.

[7] H. Fawzi, P. Tabuada, and S. Diggavi, "Secure estimation and control for cyber-physical systems under adversarial attacks," IEEE Transactions on Automatic Control, vol. 59, no. 6, pp. 1454-1467, 2014.

[8] Y. Shoukry, and P.
Tabuada, "Event-triggered state observers for sparse sensor noise/attacks," IEEE Transactions on Automatic Control, vol. 61, no. 8, pp. 2079-2091, 2016. [9] Y. Yan, P. Antsaklis, and V. Gupta, "A resilient design for cyber physical systems under attack," In American Control Conference (ACC), pp. 4418-4423, 2017. [10] A. Teixeira, I. Shames, H. Sandberg, and K.H. Johansson, "A secure control framework for resource-limited adversaries," Automatica, vol. 51, pp. 135-148, 2015. [11] E. Akyol, T. Başar, and C. Langbort, "Signaling games in networked cyber-physical systems with strategic elements," In 56th IEEE Conference on Decision and Control (CDC), pp. 4576- 4581, 2017. [12] M. O. Sayin, and T. Başar, "Secure sensor design for cyber-physical systems against advanced persistent threats," In International Conference on Decision and Game Theory for Security, pp. 91-111, 2017. [13] A. Kanellopoulos, and K.G. Vamvoudakis, "Non-equilibrium dynamic games and cy- ber–physical security: A cognitive hierarchy approach," Systems and Control Letters, vol. 125, pp. 59-66. 2019 [14] K. G. Vamvoudakis, J. P. Hespanha, B. Sinopoli, and Y. Mo, "Detection in adversarial envi- ronments," IEEE Transactions on Automatic Control, vol. 59, no. 12, pp. 3209-3223, 2014. 162 [15] A. Kanellopoulos, and K.G. Vamvoudakis, "A moving target defense control systems," IEEE Transactions on Automatic Control, frame- doi: work for 10.1109/TAC.2019.2915746. cyber-physical [16] F. Pasqualetti, A. Bicchi, and F. Bullo, "Consensus computation in unreliable networks: A system theoretic approach," IEEE Transactions on Automatic Control, vol. 57, no. 1, pp. 90- 104, 2012. [17] S.Weerakkody, X. Liu, S.H. Son, and B. Sinopoli, "A graph-theoretic characterization of perfect attackability for secure design of distributed control systems," IEEE Transactions on Control of Network Systems, vol. 4, no. 1, pp. 60-70, 2017. [18] S. Sundaram, and C. Hadjicostis, "Distributed function calculation via linear iterative strategies in the presence of malicious agents," IEEE Transactions on Automatic Control, vol. 56, no. 7, pp. 1495-1508, 2011. [19] S. M. Dibaji, H. Ishii, and R. Tempo, "Resilient randomized quantized consensus," IEEE Transactions on Automatic Control, vol. 63, no. 8, pp. 2508-2522, 2018. [20] H. J. LeBlanc, H. Zhang, and X. Koutsoukos, and S. Sundaram, "Resilient asymptotic consen- sus in robust networks," IEEE Journal on Selected Areas in Communications, vol. 31, no. 4, pp. 766-781, 2013. [21] H. J. LeBlanc, and X. Koutsoukos, "Resilient first-order consensus and weakly stable, higher order synchronization of continuous-time networked multiagent systems," IEEE Transactions on Control of Network Systems, vol. 5, no. 3, pp. 1219-1231, 2018. [22] X. Jin, W. Haddad, and T. Yucelen, "An adaptive control architecture for mitigating sensor and actuator attacks in cyber-physical systems," IEEE Transactions on Automatic Control, vol. 62, no. 11, pp. 6058-6064, 2017. [23] K. G. Vamvoudakis, and J. P. Hespanha, "Game-theory-based consensus learning of double- integrator agents in the presence of worst-case adversaries," Journal of Optimization Theory and Applications, vol. 177, no. 1, pp. 222-253, 2018. [24] M. Pirani, E. Nekouei, S. M. Dibaji, H. Sandberg, and K. H. Johansson, "Design of attackt re- silient consensus dynamics: A game-theoretic approach," In 18th European Control Conference (ECC), pp. 2227-2232, 2019. [25] S. M. Dibaji, M. Pirani, D.B. Flamholz, A.M. Annaswamy, K. H. Johansson, and A. 
Chakrabortty, "A systems and control perspective of CPS security," 2019. [26] S. Kotz, and N. Johnson, Process capability indices. Chapman and Hall/CRC, 1993. [27] Z. Li, and Z. Duan, Cooperative control of multi-agent systems: a consensus region approach. CRC Press, 2014. [28] F. L. Lewis, H. Zhang, K. Hengster-Movric, and A. Das, Cooperative control of multi-agent systems: optimal and adaptive design approaches. Springer Science and Business Media, 2013. [29] Y. Su, and J. Huang, "Stability of a class of linear switching systems with applications to two consensus problems," IEEE Transactions on Automatic Control, vol. 57, no. 6, pp. 1420-1430, 2012. 163 [30] H. Zhang, F. L. Lewis, and A. Das, "Optimal design for synchronization of cooperative systems: state feedback, observer and output feedback," IEEE Transactions on Automatic Control, vol. 56, no. 8, pp. 1948-1952, 2011. [31] E. Daniel, E. Frisk, and M. Krysander, "A method for quantitative fault diagnosability analysis of stochastic linear descriptor models," Automatica, vol. 49, no. 6, pp. 1591-1600, 2013. [32] K. Michel, and J. Hao, "Distributed sensor fault detection and isolation over network," IFAC Proceedings Volumes, vol. 47, no. 3, pp. 11458-11463, 2014. [33] D. Kazakos, and P. Kazakos, Detection and Estimation. Computer Science Press, 1990. [34] M. Wax, and T. Kailath, "Detection of signals by information theoretic criteria," IEEE Trans- actions on Acoustics, Speech, and Signal Processing, vol. 33, no. 2, pp. 387-392, 1985. [35] Y. Mo, and B. Sinopoli, "Secure control against replay attacks," In 47th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 911-918, 2009. [36] Z. Guo, K. H. Johansson, and L. Shi, "Worst-case stealthy innovation-based linear attack on remote state estimation," Automatica, vol. 89, pp. 117-124, 2018. [37] T. Li, and J. F. Zhang, "Mean square average-consensus under measurement noises and fixed topologies: Necessary and sufficient conditions," Automatica, vol. 45, no. 8, pp. 1929-1936, 2009. [38] J. G. Proakis, Digital Communications, New York, NY, USA:McGraw-Hill, 1995. [39] P. Fernando, Kullback-Leibler divergence estimation of continuous distributions. IEEE inter- national symposium on information theory, pp. 1666-1670, 2008. [40] Q. Wang, S. Kulkarni, and S. Verdu, "Divergence estimation for multidimensional densities via k-nearest-neighbor Distances," IEEE Transactions on Information Theory, vol. 55, no. 5, pp. 2392-2405, 2009. [41] M. Basseville, and I. V. Nikiforov, Detection of abrupt changes: theory and application, vol. 104, Englewood Cliffs: Prentice Hall, 1993. [42] T. M. Cover, and J. A. Thomas, Elements of information theory. John Wiley and Sons, 2012. [43] M. H. Protter, and C. B. Morrey, Differentiation under the Integral Sign. New York: Springer, 1985. [44] W. Ren, and R. W. Beard, "Consensus seeking in multiagent systems under dynamically chang- ing interaction topologies," IEEE Transactions on automatic control, vol. 50, no. 5, pp. 655-661, 2005. [45] D. T. Ton, and M. A. Smith, “The U.S. Department of Energy’s Microgrid Initiative", Elseiveir, The Electricity Journal, vol. 25, pp. 84-94, Oct. 2012. [46] A. Bidram, and A. Davoudi, “Hierarchical structure of microgrids control system," IEEE Trans. Smart Grid, vol. 3, pp. 1963-1976, Dec. 2012. 164 [47] Z. Li, C. Zang, P. Zeng, H. Yu, and S. Li, “Fully distributed hierarchical control of parallel grid-supporting inverters in islanded AC microgrids," IEEE Trans. Ind. Informat., vol. 14, no. 2, pp. 
679-690, Feb. 2018. [48] J. Schiffer, T. Seel, J. Raisch, and T. Sezi, “Voltage stability and reactive power sharing in inverter-based microgrids with consensus based distributed voltage control," IEEE Trans. Con- trol Syst. Technol., vol. 24, no. 1, pp. 96-109, Jan. 2016. [49] M. Yazdanian and A. Mehrizi-Sani, “Distributed control techniques in microgrids," IEEE Trans. Smart Grid, vol. 5, no. 6, pp. 2901-2909, Nov. 2014. [50] A. Bidram, A. Davoudi, F. L. Lewis, and J. M. Guerrero, “Distributed cooperative control of microgrids using feedback linearization," IEEE Trans. Power Syst., vol. 28, no. 3, pp. 3462- 3470, Aug. 2013. [51] A. Bidram, F. L. Lewis, and A. Davoudi, “Distributed control systems for small-scale power networks: Using multiagent cooperative control theory," IEEE Control Systems Magazine, vol. 34, no. 6, pp. 56-77, Nov. 2014. [52] J. Duan, C. Wang, H. Xu, W. Liu, J. C. Peng, and H. Jiang, “Distributed control of inverter- interfaced microgrids with bounded transient line currents," IEEE Trans. Ind. Informat., vol. 14, no. 5, pp. 2052-2061, May 2018. [53] San Diego Gas and Electric Company, “Smart grid architecutre demonstrations program – EPIC-1, Project 1 report," Electric Power Investment Charge (EPIC), Dec. 2017. [54] A. Bidram, A. Davoudi, and F. L. Lewis, “A Multiobjective distributed control framework for islanded AC microgrids," IEEE Trans. Ind. Informat., vol. 10, no. 3, pp. 1785-1798, May 2014. [55] A. Bidram, A. Davoudi, F. L. Lewis, and Z. Qu, “Secondary control of microgrids based on distributed cooperative control of multi-agent systems," IET Generation, Transmission, Dis- tribution, vol. 7, no. 8, pp. 822-831, Aug. 2013. [56] N. M. Dehkordi, H. R. Baghaee, N. Sadati and J. M. Guerrero, “Distributed noise-resilient secondary voltage and frequency control for islanded microgrids," IEEE Trans. Smart Grid, vol. 10, no. 4, pp. 3780-3790, July 2019. [57] J. Duan, C. Wang, H. Xu, W. Liu, Y. Xu, J. C. Peng, and H. Jiang, “Distributed control of inverter-interfaced microgrids based on consensus algorithm with improved transient perfor- mance," IEEE Trans. Smart Grid, vol. 10, no. 2, pp. 1303-1312, Mar. 2019. [58] D. Jin, Z. Li, C. Hannon, C. Chen, J. Wang, M. Shahidehpour, and C. W. Lee, “Toward a cyber resilient and secure microgrid using software-defined networking," IEEE Trans. Smart Grid, vol. 8, no. 5, pp. 2494-2504, Sept. 2017. [59] X. Liu and Z. Li, "False data attacks against AC state estimation with incomplete network information," IEEE Trans. Smart Grid, vol. 8, no. 5, pp. 2239-2248, Sept. 2017. [60] O. Kosut, L. Jia, R. J. Thomas, and L. Tong, "Limiting false data attacks on power system state estimation," in Proc. 44th Annu. Conf. Inf. Sci. Syst. (CISS), 2010, pp. 1-6. 165 [61] Y. Liu, P. Ning, and M. K. Reiter, "False data injection attacks against state estimation in electric power grids," ACM Trans. Inf. Syst. Security, vol. 14, no. 1, pp. 1-33, May 2011. [62] O. Kosut, L. Jia, R. J. Thomas, and L. Tong, “Malicious data attacks on the smart grid," IEEE Trans. Smart Grid, vol. 2, no. 4, pp. 645-658, Dec. 2011. [63] M. Chlela, D. Mascarella, G. Joos, and M. Kassouf, “Fallback control for isochronous energy storage systems in autonomous microgrids under denial-of-service cyber-attacks," IEEE Trans. Smart Grid, vol. 9, no. 5, pp. 4702-4711, Sept. 2018. [64] W. Meng, X. Wang and S. Liu, “Distributed load sharing of an inverter-based microgrid with reduced communication," IEEE Trans. Smart Grid, vol. 9, no. 2, pp. 1354-1364, Mar. 2018. [65] B. Schafer, D. 
Witthaut, M. Timme, and V. Latora, “Dynamically induced cascading failures in power grids" Nature Communications, vol. 9, Article Number 1975, pp. 1-13, 2018. [66] Y. Huang, J. Tang, Y. Cheng, H. Li, K. A. Campbell, and Z. Han, “Real-time detection of false data injection in smart grid networks: An adaptive cusum method and analysis," IEEE Syst. J., vol. 10, no. 2, pp. 532-543, June 2016. [67] K. Manandhar, X. Cao, F. Hu, and Y. Liu, “Detection of faults and attacks including false data injection attack in smart grid using kalman filter," IEEE Trans. Control Netw. Syst., vol. 1, no. 4, pp. 370-379, Dec. 2014. [68] S. Bi and Y. J. Zhang, “Graphical methods for defense against false-data injection attacks on power system state estimation," IEEE Trans. Smart Grid, vol. 5, no. 3, pp. 1216-1227, May 2014. [69] Y. Mo, R. Chabukswar, and B. Sinopoli, “Detecting integrity attacks on scada systems," IEEE Trans. Control Syst. Technol., vol. 22, no. 4, pp. 1396-1407, July 2014. [70] L. Liu, M. Esmalifalak, Q. Ding, V. A. Emesih, and Z. Han, “Detecting false data injection attacks on power grid by sparse optimization,” IEEE Trans. Smart Grid, vol. 5, no. 2, pp. 612-621, Mar. 2014. [71] D. B. Rawat, and C. Bajracharya, “Detection of false data injection attacks in smart grid communication systems," IEEE Signal Process. Lett., vol. 22, no. 10, pp. 1652–1656, Oct. 2015. [72] X. Wang, X. Luo, Y. Zhang, and X. Guan, “Detection and isolation of false data injection attacks in smart grids via nonlinear internal observer," IEEE Trans. Smart Grid, vol. 6, no. 4, pp. 6498–6512, Aug. 2019. [73] L. Y. Lu, H. J. Liu, and H. Zhu, “Distributed secondary control for isolated microgrids under malicious attacks," in Proc. North American Power Symposium (NAPS), Denver, CO, USA, 2016, pp. 1-6. [74] O. A. Beg, T. T. Johnson, and A. Davoudi, “Detection of false-data injection attacks in cyber- physical DC microgrids," IEEE Trans. Ind. Informat., vol. 13, no. 5, pp. 2693-2703, Oct. 2017. [75] S. Saha, T. K. Roy, M. A. Mahmud, M. E. Haque, S. N. Islam, “Sensor fault and cyber attack resilient operation of DC microgrids," Int. J. Electr. Power Energy Syst., vol. 99, pp. 540-554, 2018. 166 [76] O. A. Beg, T. T. Johnson, and A. Davoudi, “Detection of false-data injection attacks in cyber- physical DC microgrids," IEEE Trans. Ind. Informat., vol. 13, no. 5, pp. 2693-2703, Oct. 2017. [77] S. Abhinav, H. Modares, F. L. Lewis, and A. Davoudi, “Resilient cooperative control of DC microgrids," IEEE Trans. Smart Grid, vol. 10, no. 1, pp. 1083-1085, Jan. 2019. [78] T. R. B. Kushal, K. Lai, and M. S. Illindala, “Risk-based mitigation of load curtailment cyber- attack using intelligent agents in a shipboard power system," IEEE Trans. Smart Grid, vol. 10, no. 5, pp. 4741-4750, Sept. 2019. [79] S. Abhinav, H. Modares, F. L. Lewis, F. Ferrese, and A. Davoudi, “Synchrony in networked microgrids under attacks,” IEEE Trans. Smart Grid, vol. 9, no. 6, pp. 6731-6741, Nov. 2018. [80] Z. Qu, Cooperative control of dynamical systems: Applications to autonomous vehicles. New York: Springer-Verlag, 2009. [81] M. Zhou, Y. Wang, A. K. Srivastava, Y. Wu, and P. Banerjee, “Ensemble-based algorithm for synchrophasor data anomaly detection,” IEEE Trans. Smart Grid, vol. 10, no. 3, pp. 2979-2988, May 2019. [82] S. Kullback, and R. A. Leibler, “On information and sufficiency”, The annals of mathematical statistics, vol. 22, no. 1, pp.79-86, 1951. [83] F. Sun, Z. H. Guan, L. Ding, and Y.W. 
Wang, “Mean square average-consensus for multi-agent systems with measurement noise and time delay,” International Journal of System and Science, vol. 44, no. 6, pp. 995-1005, 2013. [84] T. Li, and J. F. Zhang, “Consensus conditions of multi-agent systems with time-varying topolo- gies and stochastic communication noises,” IEEE Trans. Autom. Control, vol. 55, no. 9, pp. 2043-2057, 2010. [85] N. Mwakabuta and A. Sekar, “Comparative study of the IEEE 34 node test feeder under practical simplifications," in Proc. 39th North American Power Symposium, 2007, pp. 484-491. [86] N Cameron, and J Cortés, "Team-triggered coordination for real-time control of networked cyber-physical systems," IEEE Transactions on Automatic Control, vol. 61, no. 1, pp. 34-47, 2016. [87] K. Saulnier, D. Saldana, A. Prorok, G. Pappas and V. Kumar, "Resilient flocking for mobile robot teams," IEEE Robotics and Automation Letters, vol. 2, no. 2, pp. 1039-1046, 2017. [88] L. Zhou, V. Tzoumas, G. Pappas, and P. Tokekar, "Resilient Active Target Tracking With Multiple Robots," IEEE Robotics and Automation Letters, vol. 4, no. 1, pp. 129-136, 2019. [89] W. Abbas, V. Yevgeniy, and X. Koutsoukos, "Resilient consensus protocol in the presence of trusted nodes," in Proc. of 7th Int. Sym. on Resilient Control Sys., pp. 1-7, 2014. [90] J. Usevitch and D. Panagou, "Resilient leader-follower consensus to arbitrary reference values," in Proc. of American Control Conference., pp. 1292-1298, 2018. [91] F. Lewis, H. Zhang, K. Hengster-Movric, and A. Das, Cooperative Control of Multi-Agent Systems: Optimal and Adaptive Design Approaches, Communications and Control Engineering, Springer, London, 2013. 167 [92] Q. Jiao, H. Modares, F. L. Lewis, S. Xu, and L. Xie, "Distributed gain output-feedback control of homogeneous and heterogeneous systems," Automatica, vol. 71, pp. 361-368, 2016. [93] A. Isidori, Nonlinear control systems, Springer Science and Business Media, 2013. [94] R. Moghadam and H. Modares, "An internal model principle for the attacker in distributed control systems," In Proceedings of IEEE Conference on Decision and Control, pp. 6604-6609, 2017. [95] M. V. Jakuba, "Modeling and control of an autonomous underwater vehicle with combined foil/thruster actuators," M.S. thesis, MIT Woods Hole Oceanographic Inst., USA, 2003. [96] K. D. Kim, and P. R. Kumar, "Cyber-physical systems: A perspective at the centennial", Proceedings of the IEEE, vol. 100, pp. 1287-1308, 2012. [97] J. Lee, B. Bagheri, and H. Kao, "A cyber-physical systems architecture for industry 4.0-based manufacturing systems", Manufacturing Letters, vol. 3, pp. 18-23, 2015. [98] J. Fink, A. Ribeiro, and V. Kumar, "Robust control for mobility and wireless communication in cyber-physical systems with application to robot teams", Proceedings of the IEEE, vol. 100, no. 1, pp. 164-178, 2012. [99] S. Sridhar, A. Hahn, and M. Govindarasu, "Cyber-physical system security for the electric power grid", Proceedings of the IEEE, vol. 100, no. 1, pp. 210-224, 2012. [100] J. J. Blum, A. Eskandarian, and L. J. Hoffman, "Challenges of intervehicle ad hoc networks", IEEE Transactions on Intelligent Transportation Systems, vol. 5, no. 4, pp. 347-351, 2004. [101] J. P. Farwell, and R. Rohozinski, "Stuxnet and the future of cyber war", Survival, vol. 53, no. 1, pp. 23-40, 2011. [102] J. Slay, and M. Miller, "Lessons learned from the Maroochy water breach", Critical Infras- tructure Protection, vol. 253, pp. 73-82, 2007. [103] I. Akyildiz, W. Su, Y. Sankarasubramniam, and E. 
Cayirci, "A survey on sensor networks", IEEE Communications Magazine, vol. 40, no. 8, pp. 102-114, 2002. [104] B. D. O. Anderson, and J. B. Moore, Optimal Filtering, Courier corporation, 2012. [105] D. P. Spanos, R. Olfati-Saber, and R. M. Murray, "Approximate distributed Kalman filter- ing in sensor networks with quantifiable performance", Proceedings of the 4th International Symposium on Information Processing in Sensor Networks, pp. 133-139, 2005. [106] R. Olfati-Saber, "Distributed Kalman filtering for sensor networks", Proceedings of the 46th IEEE Conference on Decision and Control, pp. 5492-5498, 2007. [107] R. Olfati-Saber, "Kalman-Consensus Filter : Optimality, stability, and performance", Pro- ceedings of the 48th IEEE Conference on Decision and Control, pp. 7036-7042, 2009. [108] S. Das and J. M. F. Moura, "Distributed Kalman filtering with dynamic observations consen- sus", IEEE Transactions on Signal Processing, vol. 63, no. 17, pp. 4458-4473, 2015. 168 [109] G. Wei, W. Li, D. Ding, and Y. Liu, "Stability Analysis of Covariance Intersection-Based Kalman Consensus Filtering for Time-Varying Systems", IEEE Transactions on Systems, Man, and Cybernetics: Systems, doi: 10.1109/TSMC.2018.2855741. [110] S. Das and J. M. F. Moura, "Consensus + innovations distributed Kalman filter with optimized gains", IEEE Transactions on Signal Processing, vol. 65, no. 2, pp. 467-481, 2017. [111] U. A. Khan and J. M. F. Moura, "Distributing the Kalman Filter for Large-Scale Systems", IEEE Transactions on Signal Processing, vol. 56, no. 10, pp. 4919-4935, 2008. [112] W. Li, Y. Jia, and J. Du, "Event-triggered Kalman consensus filter over sensor networks", IET Control Theory and Applications, vol. 10, no. 1, pp. 103-110, 2016. [113] Q. Liu, Z. Wang, X. He, and D. H. Zhou, "Event-Based Recursive Distributed Filtering Over Wireless Sensor Networks", IEEE Transactions on Automatic Control, vol. 60, no. 9, pp. 2470- 2475, 2015. [114] X. Meng, and T. Chen, "Optimality and stability of event triggered consensus state estimation for wireless sensor networks", Proceedings of the 48th American Control Conference, pp. 3565- 3570, 2014. [115] G. Battistelli, L. Chisci, and D. Selvi, "A distributed Kalman filter with event-triggered communication and guaranteed stability", Automatica, vol. 93, pp. 75-82, 2018. [116] R.C. Francy, A.M. Farid, and K. Youcef-Toumi, "Event triggered state estimation techniques for power systems with integrated variable energy resources", ISA transactions, vol. 56, pp. 165-172, 2015. [117] S. Li et al., "Event-Trigger Heterogeneous Nonlinear Filter for Wide-Area Measurement Sys- tems in Power Grid", IEEE Transactions on Smart Grid, vol. 10, no. 3, pp. 2752-2764, 2019. [118] N. Sadeghzadeh Nokhodberiz, H. Nemati, and A. Montazeri, "Event-Triggered Based State Estimation for Autonomous Operation of an Aerial Robotic Vehicle", IFAC-Papers On Line, 2019. [119] M. Ouimet, D. Iglesias, N. Ahmed, and S. Martínez, "Cooperative Robot Localization Using Event-Triggered Estimation", Journal of Aerospace Information Systems, vol. 15, no. 7, pp. 427-449, 2018. [120] A. Gupta, C. Langbort, and T. Basar, "Optimal control in the presence of an intelligent jam- mer with limited actions", Proceedings of the 49th IEEE Conference on Decision and Control, pp. 1096-110, 2010. [121] L. Yu, X. Sun, and T. Sui, "False-Data Injection Attack in Electricity Generation System Subject to Actuator Saturation: Analysis and Design," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 49, no. 8, pp. 
1712-1719, 2019. p4r30 Y. Mo, E. Garone, A. Casavola, and B. Sinopoli, "False data injection attacks against state estimation in wireless sensor networks", Proceedings of the 49th IEEE Conference on Decision and Control, pp. 5967-5972. 2010. 169 [122] F. Miao, Q. Zhu, M. Pajic, and G. J. Pappas, "Coding schemes for securing cyber-physical systems against stealthy data injection attacks", IEEE Transactions on Control of Network Systems, vol. 4, no. 1, pp 106-117, 2017. [123] C.Z. Bai, V. Gupta, and F. Pasqualetti, "On Kalman Filtering with Compromised Sensors: Attack Stealthiness and Performance Bounds", IEEE Transactions on Automatic Control, vol. 62, no. 12, pp. 6641-6648, 2017. [124] Y. Chen, S. Kar, and J. M. F. Moura, "Resilient Distributed Estimation Through Adversary Detection", IEEE Transactions on Signal Processing, vol. 66, no. 9, pp. 2455-2469, 2018. [125] Y. Chen, S. Kar, and J. M. F. Moura, "Resilient distributed estimation: sensor attacks", IEEE Transactions on Automatic Control, 2019. [126] A. Mitra, and S. Sundaram, "Byzantine-resilient distributed observers for LTI systems", Au- tomatica, vol. 108, 2019. [127] A. Mitra, J. Richards, S. Bagchi, and S. Sundaram, "Resilient distributed state estimation with mobile agents: overcoming Byzantine adversaries, communication losses, and intermittent measurements", Autonomous Robots, 2018. [128] A. Mustafa, and H. Modares, "Analysis and detection of cyber-physical attacks in distributed sensor networks", Proceedings of the 56th Allerton Conference on Communication, Control, and Computing, pp. 973-980, 2018. [129] W. Chen, D. Ding, H. Dong, and G. Wei, "Distributed Resilient Filtering for Power Systems Subject to Denial-of-Service Attacks", IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 49, no. 8, pp. 1688-1697, 2019. [130] D. Du, X. Li, W. Li, R. Chen, M. Fei, and L. Wu, "ADMM-Based Distributed State Estimation of Smart Grid Under Data Deception and Denial of Service Attacks", IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 49, no. 8, pp. 1698-1711, 2019. [131] B. Chen, D. W. C. Ho, W. Zhang and L. Yu, "Distributed Dimensionality Reduction Fusion Estimation for Cyber-Physical Systems Under DoS Attacks", IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 49, no. 2, pp. 455-468, 2019. [132] P. Millan, L. Orihuela, C. Vivas, F. Rubio, D. Dimarogonas, and K. H. Johansson, "Sensor network-based robust distributed control and estimation", Control Engineering Practice, vol. 21, no. 9, pp. 1238-1249, 2013. [133] A. R. Liu, and R. R. Bitmead, "Stochastic observability in network state estimation and control", Automatica, vol. 47, no. 1, pp. 65-78, 2011. [134] S. Trimpe and R. D’Andrea, "Event-based state estimation with variance-based triggering", IEEE Transactions on Automatic Control, vol. 59, no. 12, pp. 3266-3281, Dec. 2014. [135] S. Weerakkody, B. Sinopoli, S. Kar, and A. Datta, "Information flow for security in control systems", Proceedings of the 55th IEEE Conference on Decision and Control, pp. 5065-5072, 2016. 170 [136] M. N. Goria, N. N. Leonenko, V. V. Mergel, and P. L. N. Inverardi, "A new class of random vector entropy estimators and its applications in testing statistical hypotheses", Journal of Nonparametric Statistics,, vol. 17, no. 3, pp. 277-297, 2005. [137] J. Su, B. Li, and W. Chen, "On existence, optimality and asymptotic stability of the Kalman filter with partially observed inputs", Automatica, vol. 53, pp. 149-154, 2015. [138] R. S. Sutton, and A. G. 
Barto, Reinforcement Learning—An Introduction. Cambridge, MA: MIT Press, 1998. [139] D. P. Bertsekas, and J. N. Tsitsiklis,Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996. [140] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality. New York: Wiley-Interscience, 2007. [141] P. J. Werbos, "A menu of designs for reinforcement learning over time", in Neural Networks for Control, W. T. Miller, R. S. Sutton, and P. J. Werbos, Eds. Cambridge, MA: MIT Press, pp. 67–95, 1991. [142] J. Kober, J. A. Bagnell, and J. Peters, "Reinforcement learning in robotics: A survey", The International Journal of Robotics Research,, vol. 32, no. 11, pp. 1238-1274, 2013. [143] R. S. Sutton, A. G. Barto and R. J. Williams, "Reinforcement learning is direct adaptive optimal control", in IEEE Control Systems Magazine, vol. 12, no. 2, pp. 19-22, 1992. [144] R. Chavarriaga, P.W. Ferrez, and J. D. Millan, "To err is human: Learning from error po- tentials in brain-computer interfaces", in Advances in Cognitive Neurodynamics, Springer, pp. 777-782, 2007. [145] H. Lu, Y. Li, M. Chen, H. Kim, and S. Serikawa, "Brain intelligence: go beyond artificial intelligence", Mobile Networks and Applications, vol. 23, no. 2, pp. 368-375, 2018. [146] A.Y. Ng, D. Harada, and S. Russell, "Policy invariance under reward transformations: Theory and application to reward shaping", in proceedings of the 16th international conference on Machine learning, pp. 278-287, 1999. [147] G. Konidaris, and A. Barto, "Autonomous shaping: Knowledge transfer in reinforcement learning", in proceedings of the 23rd international conference on Machine learning, pp. 489- 496, 2006. [148] N. Chentanez, A. Barto, and S. Singh, "Intrinsically motivated reinforcement learning", in proceedings of Advances in neural information processing systems, pp. 1281-1288, 2005. [149] B. Kiumarsi, K. G. Vamvoudakis, H. Modares and F. L. Lewis, "Optimal and Autonomous Control Using Reinforcement Learning: A Survey", in IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 6, pp. 2042-2062, 2018. [150] K. Doya, "Reinforcement learning in continuous-time and space", Neural Computation, vol. 12, pp. 219–245, 2000. [151] R. S. Sutton, A. G. Barto and R. J. Williams, "Reinforcement learning is direct adaptive optimal control", in IEEE Control Systems Magazine, vol. 12, no. 2, pp. 19-22, 1992. 171 [152] M. Ohnishi, W. Li, N. Gennaro, and M. Egerstedt, "Barrier-certified adaptive reinforcement learning with applications to brushbot navigation", IEEE Transactions on Robotics, vol. 35, no. 5, pp. 1186-1205, 2019. [153] R. Cheng, G. Orosz, R. M. Murray, and J. W. Burdick, "End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks", In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 3387-3395, 2019. [154] F. L. Lewis, and D. Liu, Reinforcement learning and approximate dynamic programming for feedback control, John Wiley & Sons, vol. 17, 2013. [155] J. Si, A. G. Barto, W. B. Powell, and D. C. Wunsch, Handbook of Learning and Approximate Dynamic Programming, IEEE Press and John Wiley & Sons, 2004. [156] J. Liu, N. Ozay, U. Topcu and R. M. Murray, "Synthesis of Reactive Switching Protocols in IEEE Transactions on Automatic Control,vol. 58, From Temporal Logic Specifications", no. 7, pp. 1771-1785, 2013. [157] I. Papusha, J. Fu, U. Topcu and R. M. 
Murray, "Automata theory meets approximate dynamic programming: Optimal control with temporal logic constraints", in proceedings of IEEE 55th Conference on Decision and Control (CDC), pp. 434-440, 2016. [158] Y. Zhou, D. Maity and J. S. Baras, "Timed automata approach for motion planning using metric interval temporal logic", in proceedings of European Control Conference (ECC), pp. 690-695, 2016. [159] S. Saha and A. A. Julius, "Task and Motion Planning for Manipulator Arms With Metric Temporal Logic Specifications", in IEEE Robotics and Automation Letters, vol. 3, no. 1, pp. 379-386, 2018. [160] X. Li, C. Vasile and C. Belta, "Reinforcement learning with temporal logic rewards", in proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3834-3839, 2017. [161] D. Sadigh, E. S. Kim, S. Coogan, S. S. Sastry and S. A. Seshia, "A learning based approach to control synthesis of Markov decision processes for linear temporal logic specifications", in proceedings of 53rd IEEE Conference on Decision and Control, pp. 1091-1096, 2014. [162] M. Wen, R. Ehlers and U. Topcu, "Correct-by-synthesis reinforcement learning with temporal logic constraints", in proceedings of 54th IEEE Conference on Decision and Control, pp. 4983- 4990, 2015. [163] X. Li, Y. Ma and C. Belta, "A Policy Search Method For Temporal Logic Specified Reinforce- ment Learning Tasks", in proceedings of American Control Conference (ACC), pp. 240-245, 2018. [164] M. McIntire, D. Ratner, and S. Ermon, "Sparse Gaussian processes for Bayesian optimiza- tion", in proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI’16), AUAI Press, USA, pp. 517–526, 2016. 172 [165] A. Donze, and O. Maler, "Robust satisfaction of temporal logic over real-valued signals", in proceedings of in Int. Conf. on Formal Modeling and Analysis of Timed Systems, pp. 92-106, 2010. [166] G.E. Fainekos, and G. J. Pappas, "Robustness of temporal logic specifications for continuous- time signals", Theoretical Computer Science, vol. 410, no. 42, pp. 4262-4291, 2009. [167] L. Lindemann, and D. V. Dimarogonas, "Robust control for signal temporal logic specifications using discrete average space robustness", Automatica, vol. 101, pp. 377-387, 2019. [168] L. Lindemann, and D. V. Dimarogonas, "Control Barrier Functions for Signal Temporal Logic Tasks", in IEEE Control Systems Letters, vol. 3, no. 1, pp. 96-101, 2019. [169] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, The MIT Press, 2006. [170] F. L. Lewis, D. Vrabie, and V. Syrmos, Optimal Control, 3rd ed. New York: Wiley, 2012. [171] Y. Jiang, and Z. P. Jiang, "Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics", Automatica, vol. 48, no. 10, pp. 2699–2704, 2012. [172] H. K. Khalil, Nonlinear Systems, 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 2002. [173] Y. Engel, S. Mannor, and R. Meir, "Bayes meets Bellman: The Gaussian process approach to temporal difference learning", in Proceedings of the 20th International Conference on Machine Learning (ICML), pp. 154-161, 2003. [174] M. McIntire, D. Ratner, and S. Ermon, "Sparse Gaussian processes for Bayesian optimiza- tion", in Proceedings of conference on Uncertainty in Artificial Intelligence (UAI), pp. 154-161, 2016. [175] F. Berkenkamp, A.P. Schoellig, and A. Krause, "Safe controller optimization for quadro- tors with Gaussian processes", in IEEE International Conference on Robotics and Automation (ICRA), pp. 491-496, 2016. 
[176] J. Berberich, and F. Allgöwer, "A trajectory-based framework for data-driven system analysis and control", arXiv preprint arXiv:1903.10723, 2019. [177] J. C. Willems, P. Rapisarda, I. Markovsky, and B. De Moor, "A note on persistency of excitation", Systems & Control Letters, vol. 54, pp. 325–329, 2005. [178] A. M. de Souza, D. Meneghetti, M. Ackermann, and A. de Toledo Fleury, "Vehicle Dynamics- Lateral: Open Source Simulation Package for MATLAB", SAE Technical Paper, 2016. 173