ADVERSARIAL MODELING IN GAME-THEORETIC FRAMEWORKS FOR SECURING CYBER-PHYSICAL SYSTEMS

By

Sandeep Banik

A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Electrical and Computer Engineering - Doctor of Philosophy

2023

ABSTRACT

In an era where intricate cyber and physical systems are integrated into daily life, controlling and optimizing them has become a critical need. Doing so addresses global challenges such as climate change, healthcare equity, security, and resilience, and shapes the swiftly evolving concepts of resource sharing and the shared economy. This complexity is particularly evident in critical infrastructures, spanning electrical grids, building management systems, solar farms, autonomous vehicles, and other Cyber-Physical Systems (CPS). Furthermore, securing CPS through decision-making inherently involves collaboration or competition, engaging multiple stakeholders with diverse perspectives and interests. Game theory provides a powerful analytical framework to model strategic conflicts among decision makers and to assure security and resilience.

The realm of security methodologies in CPS is vast and has garnered considerable attention over the past two decades. Different adversarial models impact CPS in various ways by targeting one or multiple security attributes. Therefore, a fundamental aspect of securing a CPS is characterizing the adversary type and developing corresponding defense strategies.

In the first part of the thesis, we introduce a game-theoretic decision-making framework that captures the interaction between a defender and two types of adversaries: a deterministic adversary, and a stochastic adversary capable of both benign and adversarial actions. We analyze this framework under different information structures and focus on characterizing the Nash equilibrium of the game, particularly emphasizing closed-form solutions. We illustrate how this framework can be applied in domains such as path planning, motion planning, and resilient estimation.

In the second part of the thesis, we design a game-theoretic framework to encompass state-dependent decision-making and develop defensive strategies against an adversary capable of a complete takeover of a dynamical system. We employ tools from optimization, control theory, and backward induction to solve for the takeover strategies and control policies of both players. We demonstrate the application of this framework in linear dynamical systems. Finally, we present a domain-aware data-driven framework to determine defensive strategies by simulating an adversary in a high-fidelity CPS. We illustrate the application of this data-driven framework in a smart building system. In conclusion, we discuss potential future extensions and the integration of the game-theoretic framework with the data-driven approach.

Copyright by SANDEEP BANIK 2023

ACKNOWLEDGEMENTS

I would like to express my deepest gratitude to Prof. Shaunak D. Bopardikar for affording me the invaluable opportunity to embark on this Ph.D. journey. His unwavering support has served as a fount of inspiration and a wellspring of innovative ideas. Engaging discussions with him have been both stimulating and enlightening, consistently providing deeper insights into the topics at hand. My sincere thanks extend to Prof. Hayder Radha, Prof. Betty H.C. Cheng, and Prof. Bahare Kiumarsi for their meticulous review of my thesis and for their roles on my Ph.D.
committee. Their constructive feedback has significantly contributed to broadening the horizons of my work.

The accomplishments detailed in this thesis are the fruit of collaboration with exceptional individuals. I am particularly indebted to Arnab Bhattacharya and Thiagarajan Ramachandran from the Pacific Northwest National Laboratory (PNNL) for their joint efforts in the second part of this thesis. Their mentorship and patience have been invaluable, not only in the thesis but also during my enriching summer internship at PNNL.

Reflecting on my years at MSU, the transformative journey has been shaped by remarkable individuals. Joining MSU as graduate students in 2019, Sankhadeep Basu and Aakash Khandelwal were not just roommates but cherished companions in unforgettable movie nights, pizza indulgences, and game sessions. The camaraderie extended to weekend coffees, the madness of preparing weekly meals in a single evening, Friday binges, birthday celebrations, ping-pong games, and the ambitious yet elusive gym plans. The shared experiences, summer walks, and festive celebrations are etched in my memory. Likewise, Shivam Bajaj, with his camaraderie in coffee breaks, shared lunches, Spartan Village strolls, and whimsical whiteboard doodles, has left an indelible mark. I also extend my gratitude to labmates Christopher Calle, Ethan Lau, Bhargav Jha, Pouria Tooranjipour, and Richard Frost for their support and guidance during challenging times.

I owe immeasurable thanks to my family for their unwavering support. To Chetan M Rao, my childhood friend, words cannot capture the depth of gratitude for the shared trips, laughter, and meals. To Shivangi Agarwal, my wife, your steadfast support during tough times has been my anchor. Our shared interests in travel, specialty coffee, South Indian cuisine, and TV series have been a source of bliss. Ishani Banik, my sister, has been my inspiration since my bachelor's degree, motivating me to pursue research and offering invaluable lessons. My heartfelt appreciation goes to my parents, whose unwavering faith and encouragement propelled me toward higher education.

In conclusion, I acknowledge the support received from the NSF Award CNS-2134076 under the Secure and Trustworthy Cyberspace (SaTC) program and the NSF CAREER Award ECCS-2236537, which played a pivotal role in advancing this research.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
1.1 Part I: State Independent Adversarial Models
1.2 Part II: State Based Adversarial Models
CHAPTER 2 DETERMINISTIC ADVERSARY - STOPPING STATE GAMES AND THEIR APPLICATION TO PATH PLANNING
2.1 Introduction
2.2 Problem Formulation
2.3 Full Information Edge Game with a Termination Threshold
2.4 Full Information Edge Game with an Arbitrary Termination Threshold
2.5 Partial Information Edge Game with a Termination Threshold
2.6 Solution of the Meta-game
2.7 Robotic Simulation of PIE-game
2.8 Summary
2.9 Supplementary Materials
CHAPTER 3 STOCHASTIC ADVERSARY - STOCHASTIC STOPPING STATE GAMES AND THEIR APPLICATION TO MOTION PLANNING
3.1 Introduction
3.2 Problem Formulation
3.3 Solution to the M-SSG
3.4 Solution to SSG$_{m \times n}$
3.5 Application
3.6 Summary
CHAPTER 4 TAKEOVER ADVERSARY - FLIPDYN: RESOURCE TAKEOVER GAMES
4.1 Introduction
4.2 Problem Formulation
4.3 FlipDyn for general systems
4.4 FlipDyn for LQ Problems
4.5 FlipDyn-Con for LQ Problems
4.6 Summary
CHAPTER 5 DATA-DRIVEN ADVERSARIAL MODEL
5.1 Introduction
5.2 Model Formulation
5.3 Solution Approaches
5.4 Numerical Experiments
5.5 Summary
CHAPTER 6 FUTURE DIRECTION
BIBLIOGRAPHY

LIST OF TABLES

Table 2.1 Performance of the meta-game vs. Algorithm 1 averaged over 100 runs with an average degree between [2, 3] for every vertex of the roadmap.
Table 2.2 Performance of the meta-game vs. Algorithm 1 averaged over 100 runs in a fully connected roadmap.
Table 5.1 Cyber exploits and their corresponding probability of success.
Table 5.2 Description of the variables in the building model.

LIST OF FIGURES

Figure 1.1 Illustration of attacks on a CAV. The dotted and dashed lines represent potential spots for a cyber-attack. The solid line represents an authentic user communicating with the CAV.
Figure 2.1 The full information edge-game with $L = 1$, along the edge $\nu\xi$ of a given graph with $k_{\nu\xi}$ as the number of stages. The information set for the adversary and defender is indicated by the dotted line and nodes taking on a value $V_i$ for $i \in \{0, 1, \ldots, K_e - 1\}$, respectively. Actions of the adversary (resp. defender) are abbreviated as $\{A, NA\}$ (resp. $\{D, ND\}$) for {Attack, No attack} (resp. {Defend, No defend}).
The stopping state is indicated by $SS$.
Figure 2.2 The partial information edge-game along edge $\nu\xi$ for the given graph. The information set for the adversary and defender is given by the dotted line between the nodes of a stage and on nodes, respectively, indicating the uncertainty of each player.
Figure 2.3 (a) Value of a FIE-game vs. stages $K_e$ for a given set of $r_2$ and $r_1$ with a termination threshold of $L = 1$. (b) Policy of the defender at $k = 0$ of a FIE-game vs. stages $K_e$ for the same set of $r_2$ and $r_1$ with a termination threshold of $L = 1$.
Figure 2.4 (a) Policy of the adversary at the start ($k = 0$) of a FIE-game vs. stages $K_e$ for the same conditions as in Figure 2.3a. (b) Percentage error between the approximate value (equation (2.16)) and recursive value (equation (2.14)) of the FIE-game with $L = 1$ for a set of game parameters.
Figure 2.5 The FIE-game with a termination threshold of $L = 2$. The dynamic game shown can tolerate an action pair of {Defend, Attack} twice, followed by disabling the adversary. The information set for the adversary and defender is indicated by the dotted line and nodes taking on a value $V_i^j$ for $i \in \{0, 1, \ldots, K_e - 1\}$ and $j \in \{0, 1, \ldots, 3\}$, respectively. The actions of the adversary (resp. defender) are abbreviated as $\{A, NA\}$ (resp. $\{D, ND\}$) for {Attack, No attack} (resp. {Defend, No defend}). The termination states are denoted by $SS$ (blue colored node).
Figure 2.6 (a) Value of the FIE-game across multiple $r_2$, stages $K_e$, and termination thresholds $L$ for given $r_1 = 1.5$. (b) Value of the FIE-game across multiple $r_2$, stages $K_e$, and termination thresholds $L$ for given $r_1 = 3.0$.
Figure 2.7 (a) Policy of the defender at the start stage ($k = 0$) of the FIE-game for increasing $r_2$ and number of attacks $L$ across given stages $K_e$. (b) Policy of the adversary at the start stage ($k = 0$) of the FIE-game for increasing $r_2$ and number of attacks $L$ across given stages $K_e$.
Figure 2.8 The PIE-game with a termination threshold of $L = 1$ and $K_e = 2$. The leaf node $SS$ represents the stopping state. The dotted line indicates the information set for the corresponding player. The notation $\alpha_k$ (resp. $\beta_k$) represents the information set for the defender (resp. adversary) for stage $k \in \{1, 2\}$. The value of each leaf node is represented by $Q_m$ for $m \in \{1, 2, \ldots, 8\}$. The leaf node values are presented in Section 2.5.1.
Figure 2.9 Illustration of the PIE-game matrix $A$ structure for any given number of stages $K_e$. The solid square blocks indicate the leaf node entries, the triangle blocks indicate the solution from the preceding stage game, the diamond block indicates the value $V_{K_e}$ with $K_e = 2$, and the empty space indicates zeros. For a given number of stages $K_e$, the game matrix $A$ is recursively solved from $A(2)$ to $A(K_e)$.
Figure 2.10 (a) Value of the PIE-game for a set of $r_1$ and $r_2$ for given stages $K_e$.
(b) Policy of the adversary with the attack action at the start stage ($k = 0$) of a PIE-game for given $r_1$, $r_2$, and stages $K_e$.
Figure 2.11 (a) Policy of the defender with the defend action at the start stage ($k = 0$) of a PIE-game for given $r_1$, $r_2$, and stages $K_e$. (b) The value of a PIE-game and FIE-game vs. stages $K_e$ for the same set of $r_1$ and $r_2$.
Figure 2.12 (a) Illustration of a simple graph with 3 vertices and 3 edges. The start and end vertices are indicated with $\nu$ and $\xi$, respectively. The number of stages between nodes $i$ and $j$ is given by $k_{i,j}$. (b) The simple network (Figure 2.12a) with stages over the edges, $k_{\nu 1} = k_{1\xi} = 3$ and $k_{\nu\xi} = 6$. The shortest path is calculated over the edge weights. (c) The solution of the simple graph meta-game with the defender probabilities over the paths. The shortest path is indicated with a larger arrow compared to the others and with a lighter shade of vertex. (d) The solution of the simple graph meta-game with the adversary probabilities over the edges.
Figure 2.13 (a) The sensitivity of choosing the shortest path ($\pi_{\text{def}}$) with changing $r_1$ and $r_2$ with fixed stages over each edge. (b) The sensitivity of choosing the shortest path edge ($e_{\text{att}}$) with changing $r_1$ and $r_2$ with fixed stages over each edge. (c) The sensitivity of choosing the shortest path ($\pi_{\text{def}}$) with changing number of stages over the edges $K_{\pi_{ST}^{1,1}}$ and $K_{\pi_{ST}}$ given a fixed stage cost. (d) The sensitivity of choosing the shortest path edge ($e_{\text{att}}$) with changing number of stages over the edges $K_{\pi_{ST}^{1,1}}$ and $K_{\pi_{ST}}$ given a fixed stage cost.
Figure 2.14 (a) A graph consisting of 10 nodes which is sparsely connected. The output of Algorithm 1 is the path 1-10. Of all available paths, the path 1-2-10 has the highest likelihood of being selected. (b) Edge 1-10 has the least chance of being attacked, while edge 2-10 has the highest chance of being attacked. (c) The probability of choosing the shortest path for graphs with an average vertex degree in the interval [2, 3]. (d) Probability of choosing the shortest path in a fully connected graph.
Figure 2.15 Illustration of a vehicle attacked from extended view while performing a V2V or V2X communication.
Figure 2.16 (a) An attack realized in ROS with the TurtleBot3 Burger. The attack places obstacles (vehicles) in formation, causing a larger deviation from the normal trajectory. (b) Influence of an attack (obstacle) on the deviation of the path, causing an increase in time to destination (security loss). (c) The experimentally evaluated and expected theoretical value of the PIE-game.
Figure 2.17 (a) The initial position of the robot with the planned trajectory indicated with the dotted line. (b) Attack along the trajectory causing a change in deviation along the covered and planned trajectory. The covered trajectory is represented by the solid line. (c) The change in trajectory after the defender has intercepted the attack and recovery of the planned trajectory. (d) The final position of the robot with the covered trajectory indicated in solid line.
Figure 2.18 Illustration of a vehicle attacked from extended view under a V2V or V2X communication in a single/two lane road.
Figure 2.19 (a) PIE-game in a single traffic lane attack scenario. The figure above is a simulation with the TurtleBot3 Burger in a Gazebo environment and the one below is the corresponding experimental setup. (b) Velocity profile of the vehicle under different actions, with A (resp. NA) as attack (resp. no attack) and D (resp. ND) as defend (resp. no defend).
Figure 2.20 (a) The value of the PIE-game evaluated in ROS using the TurtleBot3 Burger and Gazebo over multiple simulations and compared with the expected value of the PIE-game. (b) The value of the PIE-game evaluated in ROS with the TurtleBot3 Burger over multiple experiments and compared with the expected value of the PIE-game.
Figure 2.21 Illustration of a vehicle attacked from extended view under a V2V or V2X communication in a single/two lane road. The dotted line represents the planned trajectory and the solid line represents the trajectory followed. The solid blocks around the robot at all time instances represent the boundary, and the solid block in front of the robot at time instant 7.89 s represents a spoofed trailer.
Figure 3.1 An M-SSG consisting of $K$ stages with a termination threshold of $L = 2$. The information set for the defender and second player is indicated by the dotted line and nodes taking value $V^i_k$ for $k \in \{1, 2, \ldots, K\}$, $i \in \{0, 1, \ldots, L\}$. The value of an M-SSG under adversarial intent is indicated by $V_k$, $k \in \{0, \ldots, K\}$ (see Remark 3.2.1). At every stage, the game branches with probability $\rho_k$ to indicate an adversarial player. Actions of an adversary (resp. defender) are abbreviated as $\{A, NA\}$ (resp. $\{SD, WD\}$) for {Attack, No attack} (resp. {Strong defense, Weak defense}). SS indicates the stopping state.
Figure 3.2 An SSG$_{m \times n}$ refers to a stochastic stopping state game where there are $m$ possible actions for the defender and $n$ possible actions for the second player. At stage $k$, when the game diverges with probability $1 - \rho_k$, it signifies a non-adversarial player scenario where solely the actions of the defender are applicable.
Figure 3.3 (a) Value of an M-SSG vs. edge-game ($\rho = 1$) over stages $K$ for $\tilde{s}_{2,k} = 0.3$ and $\tilde{s}_{1,k} = 1.25$, $\forall k \in \{1, 2, \ldots, K\}$, with termination thresholds of $L = 2$ and $4$. (b) Probability parameter $\tilde{\rho}^L_k$ for the same set of parameters.
Figure 3.4 (a) Probability of choosing the strong defense action when $i = L$ for the M-SSG and edge-game ($\rho = 1$), solved for $K = 20$ using the same set of parameters $\tilde{s}_{2,k}$, $\tilde{s}_{1,k}$, and $L$. (b) Probability of choosing the attack action when $i = L$ for the M-SSG and edge-game ($\rho = 1$) with the same parameters of stages $K$, $\tilde{s}_{2,k}$, $\tilde{s}_{1,k}$, and $L$.
Figure 3.5 (a) Nash Equilibrium policy of the defender (rows 2, 3 and 4) for an SSG$_{m \times n}$ for a range of $\rho$ solved over a total of $K = 20$ stages.
The SSG$_{m \times n}$ was solved with stage cost matrix entries $s_{1,k} = 1.0$, $s_{2,k} = 1.2$ and $s_{3,k} = 0.3$, $k \in \{1, 2, \ldots, K\}$. (b) Nash Equilibrium policy of the second player (columns 1 and 4) for the same stage cost parameters and total number of stages.
Figure 3.6 (a) Ego and non-ego vehicle policy averaged over 50 experiment runs for $K = 25$ with $\rho = 0.25$. (b) Simulated ego and non-ego vehicle policy with defined stage costs and $\rho = 0.25$.
Figure 3.7 (a) Sample policy of the defender and attacker for a given experimental run. (b) Sampled and expected value of the SSG compared with the theoretical value of the SSG.
Figure 3.8 Illustration of the trajectory under scaled and unscaled control inputs $\gamma u_k$ and $u_k$, respectively, for a finite horizon $T$.
Figure 3.9 (a) Simulated trajectory of an ego and non-ego vehicle in a lane change scenario over 50 time steps with $\rho = 0.1$ and $\rho = 1.0$, and a final time of 25 s. (b) Expected trajectory of the ego and non-ego vehicle averaged over 50 experiment runs, with 50 time steps for corresponding parameters of $\rho$.
Figure 3.10 (a) Ego and non-ego vehicle policy averaged over 50 experiment runs with nominal speeds of 0.15 m/s and 0.18 m/s. (b) Ego and non-ego vehicle policy averaged over 50 simulation runs for the same set of nominal speeds (0.15 and 0.18 m/s).
Figure 3.11 (a) Simulated policy of an ego and non-ego vehicle (possible adversary) in a lane change scenario over a range of sample times with $\rho = 0.1$. (b) Simulated policy of an ego and non-ego vehicle for the same scenario over a range of nominal speeds with $\rho = 0.1$.
Figure 3.12 A typical feedback control system with an estimator. The control law is a function of the estimates. The estimator performance is dependent on the channel used to communicate the data observed from a sensor. An adversary might be present in the feedback loop, impacting the performance of the estimates by injecting noise on different channels.
Figure 3.13 Value of the SSG$_{m \times n}$ for a range of fixed probability $\rho$ solved over a total of $K = 20$ stages with an engagement budget of $L = 1$.
Figure 3.14 (a) Probability of the defender actions, defense 2, 3 and 4 (rows 2, 3 and 4), for an SSG$_{m \times n}$ for the corresponding probability and stages $K$. (b) Probability of the second player actions, attack 1, 2 and 3 (columns 1, 2 and 3), for the same SSG$_{m \times n}$ for the corresponding probability and stages $K$.
Figure 4.1 (a) Closed-loop system with adversaries present at various locations, infecting the reference values, actuator, plant, measurement output and control input. (b) Closed-loop system with the adversary present between the controller and actuator trying to take over the control signals. The takeover action at time $k$ of the defender (resp. adversary) is given by $\pi^0_k$ (resp. $\pi^1_k$). A FlipIt game is set up over the control signal between the defender and adversarial control.
Figure 4.2 (a) Coefficients of the parameterized value function, $\mathbf{p}^0$ and $\mathbf{p}^1$, for a 1-dimensional system where the state is bounded ($F \leq 1$) over a horizon length of $L = 50$. (b) Attack and defense policy corresponding to the value function in Figure 4.2a for the given set of costs.
Figure 4.3 (a) Coefficients of the parameterized value function, $\mathbf{p}^0$ and $\mathbf{p}^1$, for an unbounded ($F \geq 1$) 1-dimensional system with a horizon length of $L = 50$. (b) Policy of defense and attack for the obtained parameterized value function indicated in Figure 4.3a.
Figure 4.4 (a) Minimum eigenvalue of the saddle-point value parameters, $\lambda_n(\hat{P}^0)$ and $\lambda_n(\hat{P}^1)$, given $e = 0.99$ over a horizon length of $L = 100$. (b) Takeover strategies corresponding to the saddle-point value in Figure 4.4a for a given initial state $x_1$ and FlipDyn state $\alpha_k = 0$, $\forall k \in \mathcal{K}$.
Figure 4.5 (a) Minimum eigenvalue of the saddle-point value parameters, $\lambda_n(\hat{P}^0)$ and $\lambda_n(\hat{P}^1)$, given $e = 1.01$ over the same horizon length of $L = 100$. (b) Takeover strategies corresponding to the saddle-point value in Figure 4.5a for a given initial state $x_1$ and FlipDyn state $\alpha_k = 0$, $\forall k \in \mathcal{K}$.
Figure 4.6 Saddle-point value parameters $\mathbf{p}^i_k$, $k \in \{1, 2, \ldots, L\}$, $i \in \{0, 1\}$, for state transition constant (a) $E = 0.85$, (b) $E = 1.0$. The parameters $\mathbf{p}^i_{k,\text{M-NE}}$ correspond to the parameters of the saddle-point under a mixed NE takeover over the entire time horizon.
Figure 4.7 Defender takeover strategies $\beta_k$ and adversary takeover strategies $\gamma_k$ for state transition (a) $E = 0.85$ and (b) $E = 1.0$. M-NE corresponds to the mixed NE policy.
Figure 4.8 Maximum eigenvalues ($\lambda_1(P^\alpha_k)$) of the saddle-point value parameters $P^\alpha_k$, $k \in \{0, 1, \ldots, L + 1\}$, $\alpha \in \{0, 1\}$, for state transition constant (a) $e = 0.85$, (b) $e = 1.0$.
Figure 4.9 Defender takeover strategy $\beta_k$ and adversary takeover strategy $\gamma_k$ for state transition (a) $e = 0.85$ and (b) $e = 1.0$. The parameters $P^i_{k,\text{M-NE}}$ correspond to the saddle-point value parameter recursion under a mixed NE takeover over the entire time horizon; M-NE corresponds to the mixed NE policy.
Figure 5.1 A hybrid attack graph for a single-zone building with four cyber nodes (in red) and one physical node (in blue) [28]. An adversary infiltrates the leaf node (node 1) and progressively secures additional security attributes (nodes 2-4) before attacking the zone temperature controller by perturbing sensor measurements at the root node 5.
Figure 5.2 Switching control graph with nodes 1, 4 and 6 representing adversary control, and nodes 2, 3, 5 representing defender control.
Figure 5.3 (a) An HAG inspired by a ransomware attack graph [126]. The source node 1 is represented by the dashed circle and the physical node (sink node) 9 is represented by concentric circles.
(b) Trajectories of Zone 1 temperature (Zone 1) along with the outside air temperature (Outside T) over a year with upper (T max) and lower (T min) temperature comfort bounds.
Figure 5.4 (a) Defender's ($J_{\text{def}}$) and Attacker's ($J_{\text{att}}$) objective with a hardening cost factor $d_e := 0.1$, where $\min(i)$ is defined as the $i$th argument minimum of $J_{\text{def/att}}$. (b) Defender's ($J_{\text{def}}$) and Attacker's ($J_{\text{att}}$) objective with a hardening cost factor of $d_e := 0.5$. (c) Defender's ($J_{\text{def}}$) and Attacker's ($J_{\text{att}}$) objective with a hardening cost factor $d_e := 1$.
Figure 5.5 (a) Average time steps required for the adversary to reach the physical node for a hardening factor of $d_e := 0.1$. (b) Average time steps to reach the physical node with $d_e := 0.5$. (c) Average time steps to reach the physical node with $d_e := 1.0$.
Figure 5.6 (a) Cyber exploit weights obtained from the result of Algorithm 3 with a cyber cost factor of $d_e = 0.1$, $0.5$ and $1.0$. (b) Time to reach the physical node 9 for varying hardening cost factor, compared with the expected time to reach ($J_{\text{AMC}}$) obtained from (5.11).
Figure 5.7 Sample node trajectory obtained from an attack policy with a hardening cost factor of (a) $d_e = 1.0$, (b) $d_e = 0.5$, and (c) $d_e = 0.1$, where the null action corresponds to no action taken by the adversary.
Figure 5.8 Discomfort under the optimal policy $\{\pi^*, \boldsymbol{\alpha}^*\}$ for the hardening cost factor (a) $d_e := 1.0$, (b) $d_e := 0.5$, and (c) $d_e := 0.1$.

CHAPTER 1 INTRODUCTION

Decision making is an integral part of our daily lives, and it plays a crucial role in shaping the outcomes we experience. Whether we are individuals, organizations, or governments, we are constantly faced with a myriad of choices that have the potential to impact our lives and the lives of those around us. The complexity of decision making becomes even more pronounced in the real world, where we are often confronted with multifaceted problems that require careful analysis and consideration.

Real-world decision making encompasses a wide range of contexts, including business, healthcare, finance, public policy, and security. In these domains, decision makers must navigate a maze of uncertainties, conflicting objectives, limited resources, and evolving circumstances. This complexity is particularly evident in critical infrastructures such as electrical grids, building management systems, solar farms, autonomous vehicles, and other Cyber-Physical Systems (CPS). Furthermore, real-world decision making is inherently collaborative or competitive, involving multiple stakeholders with diverse perspectives and interests. To better understand and navigate the strategic interactions among decision makers, game theory provides a powerful framework. By analyzing the choices and behaviors of individuals or organizations in situations where the outcome of each participant's decision depends not only on their own actions but also on the actions of others, game theory offers valuable insights into decision-making processes.
In this thesis, we address decision-making related to the security of CPS using tools from game theory, control theory, and data-driven methods against diverse adversarial models. We split the thesis into two parts. In part one, we focus on state independent adversaries, where there is no explicit dependence on the underlying state of a CPS. We characterize defense strategies for this type of adversarial model and demonstrate its application. In part two, we develop defensive strategies against adversarial models which take into account the underlying dynamics of the systems.

1.1 Part I: State Independent Adversarial Models

Attacks in path planning are inspired by two specific models. The first is when an adversary can replace the messages received by a vehicle (man-in-the-middle (MITM) or communication attack [71]). The second is when the adversary spoofs the perception module of a vehicle to introduce fake obstacles in the vehicle's occupancy grid [34]. Adversarial methods have been used to fool a multi-object tracking system, thereby causing a large deviation in the vehicle's trajectory and impacting its safety [72]. The vulnerabilities in LiDAR-based perception architectures have been targeted via spoofing attacks with an 80% mean success rate [138]. The vehicle can verify the authenticity of the messages through advanced encryption methods to detect an MITM attack, or can query the infrastructure (V2I) under a perception-based attack. However, doing so consumes resources (energy or bandwidth) and introduces delays. Therefore, it is important to find the right energy allocation between security and mobility. An illustration of different attacks on autonomous vehicles is shown in Figure 1.1.

Figure 1.1 Illustration of attacks on a CAV. The dotted and dashed lines represent potential spots for a cyber-attack. The solid line represents an authentic user communicating with the CAV.

In Chapter 2, we study the problem of path planning on a graph, where a defender endeavors to chart an optimal course from a source to a destination vertex in the presence of a deterministic adversary. We frame this scenario as a zero-sum multi-stage game played over an edge, in which the defender and adversary interact around a crucial element known as the stopping state. The analysis unfolds under two distinct information structures: full information, granting each player complete insight into the opponent's past actions, and partial information, wherein the defender gains a comprehensive understanding of the adversary's actions only upon deploying the countermeasure. We characterize the Nash equilibrium for this edge-game under both information structures, accounting for the fact that each player possesses two strategic actions. The fundamental contribution of this chapter lies in the construction of a meta-game for determining a path resilient to attacks. This construction is compared against a novel heuristic, with a particular emphasis on scenarios where the number of attacked edges is constrained.

Ascertaining adversarial behavior can be thought of as nature choosing one sub-tree of a game at every instance. Bayesian games [50, 57, 112, 65, 153] are used to capture the impact of nature and multiple types of adversaries. The application of Bayesian games extends to discerning deceptive actions by adversaries and employing defensive deception techniques as a deterrent [65].
Notably, Garnaev et al. [57] delve into the game type played by an adversary, whether simultaneous or sequential, aiming to maximize the defender payoff. Horák et al. [63] introduced the concept of cyber deception as a partially observable stochastic game, incorporating one-sided information to evaluate the robustness of defense strategies. Chen et al. [37] proposed a stochastic game within a data-fusion framework for asymmetric threat detection and prediction, relying on advanced knowledge infrastructure. Salem et al. [128] present a two-state stochastic game involving a user aiding a sophisticated intrusion detection system (IDS) and an adversary capable of launching eavesdropping or jamming attacks, providing explicit solutions and numerical results. In the context of honeypot modeling, Tian et al. [141] frame it as a Bayesian game, determining Nash equilibrium strategies under resource constraints. In the next chapter, we assume nature impacts the game not in a Bayesian setting, but rather by governing the presence or absence of an adversary at every stage of the game.

In Chapter 3, we introduce the assumption that the defender wields a classifier, which provides a probability reflecting the second player's potential adversarial intent. This introduces a new layer of complexity in the decision-making process, necessitating an understanding of the potential impact of adversarial actions over multiple planning stages (finite horizon). Our proposed model treats the second player as adversarial with a certain probability at each stage, until the point where adversarial intent is confirmed after a set of actions is played a specified number of times (referred to as a budget). This pivotal event leads the game to a stopping state and results in its termination. We proceed to analytically characterize the Nash equilibria of this game, focusing specifically on the case of two actions per player with an engagement budget, a model termed the M-SSG. This initial framework is subsequently extended to incorporate two additional types of budget: an attack and a defense budget. Furthermore, we broaden the action space for both players beyond the initial two actions, and also provide an analytical condition governing the transition to a pure policy.

1.2 Part II: State Based Adversarial Models

The setup in Chapter 4 is inspired by the cybersecurity game of stealthy takeover known as FlipIt [145]. FlipIt is a two-player game between an adversary and a defender competing to control a shared resource. The resource can represent a critical digital system such as a computing device, virtual machines, or a cloud service [30]. In particular, Chapter 4 takes this further by considering an adversary's potential to seize control of a system entirely. Picture an adversary endowed with knowledge of a prototypical Cyber-Physical System (CPS) control loop, capable of executing a takeover at various critical junctures. These junctures encompass the reference inputs, actuator, state, sensor, and control output, each bearing the potential to sway the system's performance. Unlike conventional adversaries that tamper with either the system's states (actuator attack) or measurements (integrity attack) [1], this chapter envisions a scenario where the adversary commandeers a resource, wielding the power to transmit arbitrary values originating from the controlled resource. The focus of this chapter extends beyond static systems, delving into the dynamic landscape of resource takeovers within a CPS.
It confronts the challenge of devising effective strategies to combat adversaries while navigating the delicate balance between operational costs and system performance.

The core challenge in CPS security is the tight (often nebulous) integration of the cyber, physical, and computational elements. Such an integration, which can expand the CPS to arbitrary dimensions proportional to the complexity of the real-world system, necessitates a scalable framework for developing defense policies. Riding on recent successes, Machine Learning (ML)-based methods use parametric representations to create computational models that capture multi-level abstractions from data. ML has replaced hand-engineered tasks with computational models that offer high accuracy and performance. Although ML is being increasingly used in specific aspects of CPS security, such as anomaly detection [78], malware detection, intrusion detection [32], and prevention of blackouts, attacks and destruction [148], the explicit consideration of the hybrid dynamics governing a CPS is relatively unexplored.

In Chapter 5, we introduce a data-driven, domain-aware, optimization-based approach to fortifying Cyber-Physical Systems (CPS) against potential threats. By emulating a strategic adversary within the system, exploiting vulnerabilities, interconnections, and the dynamics of physical components, we engineer an automated defense strategy. Our approach leverages an adversarial decision-making model founded on a Markov Decision Process (MDP). This model orchestrates the optimal cyber (discrete) and physical (continuous) attack actions across a CPS attack graph. The defense planning problem takes shape as a non-zero-sum game between the adversary and defender. To solve the adversary's problem, we employ a model-free reinforcement learning technique, dynamically adapting to the chosen defense strategy. Next, we employ Bayesian optimization to discern an approximate best response for the defender, hardening the network against the ensuing adversary policy. This iterative process refines the strategies of both players, creating a dynamic and adaptable defense against potential threats.

Finally, in Chapter 6, we discuss some future directions corresponding to both parts of the thesis.

CHAPTER 2 DETERMINISTIC ADVERSARY - STOPPING STATE GAMES AND THEIR APPLICATION TO PATH PLANNING

In this chapter, we introduce a persistent adversary in the CPS, termed a deterministic adversary, alongside the decision-making framework of stopping state games. Our objective is to introduce a mathematical framework to reason about a diverse range of adversaries and to deploy effective defense strategies to counter and possibly capture such adversaries. The aim of such a framework is to bring together tools from game theory, optimization, and backward induction with an emphasis on closed-form solutions. Such closed-form solutions enable computational efficiency and easier transitions to real-world deployment. The concepts and definitions we introduce in this chapter will be carried forward in the later chapters. In particular, the focus of this chapter is on the defense and attack strategies of a defender and a deterministic adversary, respectively. The outcome of this chapter is to ensure security and resilience in path planning problems while striking a balance between performance and costs.
2.1 Introduction

In this chapter, we consider a path planning problem on a graph wherein a vehicle (defender) seeks to find an optimal path from a source to a destination vertex in the presence of a deterministic adversary. The defender is equipped with a countermeasure that can detect and permanently disable the attack if the two occur concurrently. We model the problem over an edge as a zero-sum multi-stage game played between the defender and the adversary with a stopping state, termed the edge-game. We analyze this game under full information, in which each player has complete knowledge of the past actions taken by the opponent at every stage. We also analyze the game under a partial information structure, wherein the defender obtains complete knowledge of the attacker's actions only when the defender uses the countermeasure. We characterize the Nash equilibrium of the edge-game in both information structures with two actions per player and analyze its sensitivity to the game parameters. We then construct a meta-game using the edge-game solutions to determine an attack-resilient path and compare it with an efficient novel heuristic under a constraint on the number of edges attacked.

Attack resilience is an essential attribute for mobile robots and has garnered a lot of attention in recent years. Several methods have been proposed to improve resilience, such as designing robust estimators in the presence of process noise and modeling errors [110, 82]. In the context of input-output attacks and known disturbance bounds for a linear dynamical system, Hespanha et al. [62] employed game-theoretic methods to compute locally optimal solutions. Liu et al. [91] contributed by deriving secure trajectories for robotic systems navigating from a source to a destination, elucidating conditions under which attacks can remain undetected. Furthermore, the study conducted by Bianchin et al. [29] focuses on the localization and navigation of a robot in the presence of attacks, exploring the conditions under which both detectable and undetectable attacks may exist.

Highlighting the significance of communication in preventing damages and system manipulation, Agarwal et al. [2] emphasize the importance of conveying information. In response to the rising prevalence of swarm-robotics applications, a distributed robust sub-modular optimization algorithm is proposed in [160]. Addressing security concerns in swarm-robotics motion planning, Tsiamis et al. [142] implemented security measures to safeguard mobile robots against eavesdropping. Furthermore, sensor network design has been used to address security concerns in disaster relief applications, such as deploying a helicopter in a flood-hit region for search and rescue operations [77].

Game theory can be used to model strategic decision-making in vehicular networks, spanning the communication links, hardware, and software [7]. Attacks have been studied over wireless communication channels [71] and over the inter-connectivity between different components of a CPS [97], where risk is modeled corresponding to a threat profile in conjunction with the communication links, hardware, and software. A MITM attack [122] was demonstrated on commercial UAVs, showing that mission-critical tasks are susceptible to such attacks and can be secured through an appropriate set of countermeasures.
There have been several works on game theory applied to network interdiction [147], which models the interaction between an evader attempting to travel between two nodes in the presence of an edge interdictor as a two-person zero-sum game. Similar network interdiction works have been conducted between an evader and an interdictor under a budget [69]. A subsequent work extends this line of inquiry, introducing asymmetric information between the evader and interdictor and formulating the problem as a mixed-integer nonlinear bilevel program [24]. Addressing the computational efficiency of managing asymmetric information in network interdiction, an efficient approach has been proposed in [87]. Additionally, variants of network interdiction exploit physical flows [115, 43], employing multi-level optimization techniques [133].

Works from Sanjab et al. [130, 129] address the emerging challenge of attack-resilient path planning in mobile robotics. These studies present a comprehensive framework for analyzing the security of drone delivery systems. Central to their approach is the formulation of a zero-sum interdiction game involving a defender (e.g., the drone operator) and a malicious adversary. In this strategic interplay, the drone aims to minimize delivery time, while the adversary strategically identifies locations for interdiction, maximizing delivery time. Another key contribution involves integrating prospect theory to capture nuanced perceptions of success and achievable delivery times for both the defender and the adversary relative to a specified delivery time. The research is further extended with concepts from cumulative prospect theory (PT) [129]. The game is analyzed both with and without PT, leading to the development of algorithms to attain equilibria under PT and the interdiction game.

We conducted a validation of the partial information model using a ground robot traveling from a source pose to a destination pose. This validation was performed using ROS in conjunction with the Gazebo simulation environment [74]. In a second scenario, we simulated and experimented with the ground robot in a single traffic lane setting. This allowed us to demonstrate how an attack could be executed to deceive the robot's perception by introducing spurious obstacles in front of it while in motion. Subsequently, we applied the derived solution to make informed decisions regarding when to query the infrastructure for information validation. The effectiveness of our theoretical solution was evaluated across multiple epochs of the attack simulations and experiments.

The primary contributions of this chapter are four-fold.

1. Game-theoretic modeling: We model the interplay between costs related to mobility and security in an attack-resilient path planning problem using the framework of dynamic zero-sum games with a stopping state, i.e., the game terminates if the players play out of a given subset of their actions at any stage (cf. Figure 2.1). The attack based on the MITM model leads to a dynamic game with a full information structure, i.e., every player has complete knowledge of the past actions of the opponent at every stage. In contrast, the attack based on sensor spoofing yields a partial information structure, i.e., only the adversary has complete information about the past actions of the defender, but the adversary is constrained to attack at all subsequent stages if it decides to spoof the sensor at any stage.
The proposed models can be considered as dual versions of the classic Chicken game or the War of Attrition [61], with an additive stage cost that models look-ahead and the novel aspect of partial (asymmetric) information.

2. Solutions and parameter sensitivity: For both information structures, we first characterize the Nash equilibria and present the solutions to be played over the edges of the roadmap modeled as a graph. For ease of exposition and presentation, we report the analysis and numerical results for the special case of two actions per player, although our solution techniques are applicable to any number of actions per player. Additionally, we study the sensitivity of the obtained solution and the player strategies to: (i) the relative costs of mobility and security, and (ii) the number of stages, along with the extension to multiple attacks in the full information model. We show that the partial information model leads to a linear programming-based formulation yielding an efficient solution to the game. This solution is inspired by analogous techniques for solving games of incomplete information [119].

3. Computation of an optimal attack-resilient path: We use the solutions to the two classes of games to construct a meta-game over a given roadmap. In the meta-game, the objective of the defender vehicle is to go from a source to a destination vertex while minimizing the impact of attack. The adversary's objective is to target the most vulnerable set of edges over the given set of feasible path(s). We assume that the adversary is resource constrained, and is thus restricted to be active on only one edge of the graph. The assumption of a single edge attack arises from the fact that, if an attack is detected along an edge of a given path, the defender becomes highly vigilant along the current path and will either take an alternate path or ensure higher security on the current path, making the adversary less efficient. However, this assumption can be relaxed by allowing the adversary to be active over multiple edges. Making the adversary more capable (multi-edge attack) only changes the size of the meta-game, leading to increased computation. Over a simple graph, we quantify the sensitivity of the choice of the resulting path (resp. edges for the adversary) to the costs of mobility and security.

4. Comparison with competing heuristics: For large-sized roadmaps, we compare the solution of the meta-game against a simple heuristic based on computing the shortest path, constrained to a single attack and defense scenario over an edge. We calculate the shortest path by replacing edge weights with the corresponding solutions of the full or partial information game. We observe that in sparse graphs, the solutions of the meta-game and the heuristic are comparable, whereas in dense graphs, the meta-game yields a reduced cost but with longer computation times.

Outline: This chapter is organized as follows. We present the problem formulation of the zero-sum multi-stage game played over any edge and the meta-game representing the resilient path planning problem in Section 2.2. This is followed by Sections 2.3 and 2.4, wherein we characterize the solution to the full information structure as a function of the stage costs, the number of stages, and a threshold on the number of attacks. We present the solution to the partial information structure in Section 2.5.
Then, we present the solution to the meta-game in Section 2.6 on a simple graph and evaluate the sensitivity of choosing the shortest path versus the alternatives as a function of the stage cost parameters and the number of stages along an edge. Furthermore, we simulate the meta-game over large graphs and compare it against the shortest path heuristic. We implement the partial information game in an open path and a single traffic lane scenario, and validate the approach with simulations and experiments in Section 2.7. Finally, we conclude this chapter in Section 2.8. We present the proofs of all mathematical claims in the appendix.

Figure 2.1 The full information edge-game with $L = 1$, along the edge $\nu\xi$ of a given graph with $k_{\nu\xi}$ as the number of stages. The information set for the adversary and defender is indicated by the dotted line and nodes taking on a value $V_i$ for $i \in \{0, 1, \ldots, K_e - 1\}$, respectively. Actions of the adversary (resp. defender) are abbreviated as $\{A, NA\}$ (resp. $\{D, ND\}$) for {Attack, No attack} (resp. {Defend, No defend}). The stopping state is indicated by $SS$.

Figure 2.2 The partial information edge-game along edge $\nu\xi$ for the given graph. The information set for the adversary and defender is given by the dotted line between the nodes of a stage and on nodes, respectively, indicating the uncertainty of each player.
In the full information e-game, termed as FIE-game, the game state at any stage is common knowledge to both players. In the partial information e-game, referred to as the PIE-game, if, at any given stage, the defender chooses not to defend, they remain unaware of the action taken by 12 the adversary, meaning the game state remains unknown. In contrast, in a PIE-game, we assume that the adversary is fully aware of the game state. However, if the adversary chooses to attack at any stage of a PIE-game, then it must continue to play its action of attack until the game reaches a stopping state or till the last stage ๐พ๐‘’ is reached. An illustration of a simple roadmap with FIE-game and PIE-game are shown in Figures 2.1 and 2.2, respectively. The stopping state (๐‘†๐‘†) models the fact that after getting detected a total of ๐ฟ times, the adversary gets permanently disabled. To define engagement of an e-game formally, consider an indicator function at stage ๐‘˜ defined by: ๏ฃฑ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃฒ ๏ฃด๏ฃด๏ฃด๏ฃด ๏ฃณ The game terminates in a stopping state if there exists a stage ๐‘ก โ‰ค ๐พ๐‘’ for which (cid:205)๐‘ก if {๐‘–๐‘˜ , ๐‘—๐‘˜ } = {๐ท, ๐ด}, 1(๐‘–๐‘˜ , ๐‘—๐‘˜ ) = otherwise. 1, 0, ๐‘˜=1 (2.1) ๐ผ (๐‘–๐‘˜ , ๐‘—๐‘˜ ) = ๐ฟ. Now, given a sequence of player actions {(๐‘–1, ๐‘—1), . . . , (๐‘–๐พ๐‘’, ๐‘—๐พ๐‘’)}, the net payoff to the adversary is given by: ๐‘ก โˆ‘๏ธ ๐ฝ๐พ๐‘’ = ๐‘ ๐‘–๐‘˜ ๐‘—๐‘˜,๐‘˜ + ๐พ๐‘’โˆ‘๏ธ ๐‘ 11,๐œ…, (2.2) ๐œ…=๐‘ก+1 since the game stops at stage ๐‘ก โ‰ค ๐พ๐‘’. The quantity (cid:205)๐พ๐‘’ ๐‘˜=1 ๐œ…=๐‘ก+1 ๐‘ 11,๐œ… represents the cost-to-go from a stopping state to the final stage, which is the additive mobility cost. This chapter analyzes the FIE-game for any ๐ฟ โ‰ฅ 1. For ease of exposition, we restrict ourselves to ๐ฟ = 1 for the PIE-game, although the approach can be extended to ๐ฟ โ‰ฅ 1. For both FIE-game and PIE-game, we consider the space of behavioral policies. A multi- stage behavioral policy [61] for the defender and adversary is a set of probability distributions Y๐‘’ := {๐‘ฆ1, . . . , ๐‘ฆ๐พ๐‘’ } โˆˆ ฮ” ๐พ๐‘’ ๐พ๐‘’ 2 and Z๐‘’ := {๐‘ง1, . . . , ๐‘ง๐พ๐‘’ } โˆˆ ฮ” 2 , respectively, where ฮ”2 is the probability ๐พ๐‘’ ๐พ๐‘’ โ†’ R to the adversary with respect simplex in 2 dimensions. The net expected cost ๐ฝ๐ธ : ฮ” ร— ฮ” 2 2 to the behavioral policies {Y๐‘’, Z๐‘’} is given by: ๐ฝ๐ธ๐‘’ (Y๐‘’, Z๐‘’) = ๐พ๐‘’โˆ‘๏ธ ๐‘˜=1 ๐‘˜ ๐‘†๐‘˜ ๐‘ง๐‘˜ . ๐‘ฆT (2.3) 13 It can be shown that ๐ฝ๐ธ is obtained from the forward recursive equation, given by: ๐ฝ๐‘Ž = ๐‘Ž โˆ‘๏ธ ๐‘˜=1 ๐‘˜ ๐‘†๐‘˜ ๐‘ง๐‘˜ โˆ’ ๐‘ฆT ๐‘Žโˆ’1 โˆ‘๏ธ ๐‘=1 ๐‘ฆ๐‘,1๐‘ง๐‘,1๐ฝ๐‘Žโˆ’๐‘, (2.4) where ๐ฝ๐‘Ž is the expected pay-off at stage ๐‘Ž โˆˆ {1, 2, . . . , ๐พ๐‘’}. If the adversary attacks an edge ๐‘’ โˆˆ ๐ธ, then the cost over the edge ๐‘’ is defined by a pair of behavioral policies (Yโˆ— ๐‘’ , Zโˆ— ๐‘’ ) that are in Nash equilibrium [61], i.e., โˆ€Y, Z โˆˆ ฮ” ๐พ๐‘’ 2 , they satisfy ๐ฝ๐ธ๐‘’ (Yโˆ— ๐‘’ , Z๐‘’) โ‰ค ๐ฝ๐ธ๐‘’ (Yโˆ— ๐‘’ , Zโˆ— ๐‘’ ) โ‰ค ๐ฝ๐ธ๐‘’ (Y๐‘’, Zโˆ— ๐‘’ ). We denote the outcome of the e-game as ๐ฝโˆ— ๐ธ๐‘’ := ๐ฝ๐ธ๐‘’ (Yโˆ— ๐‘’ , Zโˆ— ๐‘’ ). 
2.2.2 Attack-resilient path planning The cost of traversing an edge ๐‘’ โˆˆ ๐ธ is contingent upon whether the edge has been attacked or not, defined as: ๐‘ค๐‘’ = , ๐ฝโˆ— ๐ธ๐‘’ if edge ๐‘’ is attacked, (cid:205)๐พ๐‘’ ๐‘˜=1 ๐‘ 22,๐‘˜ , otherwise, (2.5) ๏ฃฑ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃฒ ๏ฃด๏ฃด๏ฃด๏ฃด ๏ฃณ where the no attack condition corresponds to the mobility cost over the edge. Let ๐‘’๐‘– ๐‘— โˆˆ ๐ธ denote the directed edge connecting vertices ๐‘– and ๐‘—. Let ๐œˆ, ๐œ‰ โˆˆ ๐‘‰ denote a start and destination pair of vertices. A path from ๐œˆ to ๐œ‰ is a collection of at most |๐‘‰ | โˆ’ 1 directed edges. A sample path is defined as ๐œ‹๐œˆ๐œ‰ := {๐‘’๐œˆ๐‘ข, . . . , ๐‘’๐‘ฃ๐œ‰ }, โˆ€๐‘ข, ๐‘ฃ โˆˆ ๐‘‰ \ {๐œˆ, ๐œ‰}. The set of paths from ๐œˆ to ๐œ‰ is denoted as ๐‘ƒ๐œˆ๐œ‰. The cost of a path ๐œ‹๐œˆ๐œ‰ โˆˆ ๐‘ƒ๐œˆ๐œ‰ is defined as ๐‘ค๐œ‹๐œˆ ๐œ‰ = โˆ‘๏ธ ๐‘’โˆˆ๐œ‹๐œˆ ๐œ‰ ๐‘ค๐‘’. (2.6) The cost ๐‘ค๐‘’ can then be used to define a meta-game played between the path defender and an edge adversary. In this game, the adversary selects a subset E โŠ‚ ๐ธ from a total of |๐ธ |๐‘ƒ๐œ‚ edges, where ๐œ‚ is the total number of edge attacks, while the defender selects a pathh ๐œ‹๐‘†๐‘‡ โˆˆ ๐‘ƒ๐‘†๐‘‡ . Assumption 2.2.1 [Single edge attack] We focus on the scenario of a single possible attack edge. Consequently, the total number of edges corresponds to |๐ธ |๐‘ƒ1 = |๐ธ |. 14 This allows for representing the meta-game equivalently through the entries of a matrix ๐‘Š whose number of rows and columns equal the cardinality of |๐‘ƒ๐‘†๐‘‡ | and |๐ธ |, respectively. A mixed policy for the defender (resp. adversary) in the meta-game is a probability distribution ห†๐‘ฆ (resp. ห†๐‘ง) over the set of paths ๐‘ƒ๐‘†๐‘‡ (resp. the subsets of edges E โŠ‚ ๐ธ).In the e-game, we determine the policies for both the defender and adversary over an edge ๐‘’. In contrast, in attack-resilient path planning, we establish a meta-level policy for the defender and an attack policy. Our objective is to compute a Nash equilibrium for the meta-game, defined as: ๐‘Š๐‘ ๐ธ = min ห†๐‘ฆโˆˆฮ”| ๐‘ƒ๐‘†๐‘‡ | max ห†๐‘งโˆˆฮ”|๐ธ | ห†๐‘ฆ๐‘‡๐‘Š ห†๐‘ง, (2.7) with the resulting optimal policies for each player computed as: ห†๐‘ฆโˆ— โˆˆ arg min ห†๐‘ฆโˆˆฮ”| ๐‘ƒ๐‘†๐‘‡ | ห†๐‘ฆT๐‘Š ห†๐‘งโˆ—, ห†๐‘งโˆ— โˆˆ arg max ห†๐‘งโˆˆฮ”| ๐ธ | ห†๐‘ฆโˆ—T๐‘Š ห†๐‘ง. The optimal policies ห†๐‘ฆโˆ— and ห†๐‘งโˆ— represent the probabilities of picking a resilient path ๐œ‹๐‘†๐‘‡ and attacking an edge ๐‘’ โˆˆ ๐ธ, respectively. We expect the complexity of this approach to scale undesirably (exponentially in the case of dense graphs) with the size of the roadmap. Therefore, a second objective in this chapter is to design a computationally efficient approach to find a resilient path. 2.3 Full Information Edge Game with a Termination Threshold In this section, we analyze the full information game over an edge ๐‘’ with a termination threshold of ๐ฟ = 1, i.e., the adversary is disabled immediately when the action pair {defend, attack} is played simultaneously. Furthermore, we assume a fixed stage cost matrix across all the stages, i.e., ๐‘†๐‘˜ = ๐‘†, โˆ€๐‘˜ โˆˆ {1, 2, . . . , ๐พ๐‘’}. In particular, we will derive a method to compute the expected payoff ๐ฝ๐ธ (2.3), resulting in a Nash equilibrium for the FIE-game. As shown in Figure 2.1, the game stops either in the states indicated by ๐‘†๐‘† or at the final stage ๐พ๐‘’. 
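Both the stage games analyzed in this section and the meta-game (2.7) are finite zero-sum matrix games, so their values and mixed policies can be computed with a standard linear program once the payoff matrix is available. The sketch below is our own illustration using SciPy's linprog (the helper name and the toy matrix are assumptions); it returns the row player's (defender's) security level and mixed policy, and the adversary's policy can be recovered analogously from the maximizing program or from the LP dual.

```python
# A sketch of solving a zero-sum matrix game such as the meta-game (2.7) with
# a standard linear program; scipy.optimize.linprog and the helper name are
# our choices, not part of the chapter.
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(W):
    """Return (value, y_hat) for min_y max_z  y^T W z  over simplices.

    W -- (num_paths x num_edges) matrix; rows are defender choices,
         columns are adversary choices.
    """
    m, n = W.shape
    # Decision variables: [y_1, ..., y_m, v];  minimize v.
    c = np.concatenate([np.zeros(m), [1.0]])
    # (W^T y)_j <= v for every adversary column j.
    A_ub = np.hstack([W.T, -np.ones((n, 1))])
    b_ub = np.zeros(n)
    # y lies on the probability simplex.
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    b_eq = [1.0]
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]

# Toy example: two paths, three attackable edges (illustrative numbers only).
W = np.array([[140.0, 100.0, 100.0],
              [100.0, 150.0, 120.0]])
value, y_hat = solve_zero_sum(W)
print(value, y_hat)
```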
2.3.1 Nash equilibria and value of the game We define a matrix, ๐ท = 0 1 1 1 ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ , ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป 15 which encodes if the action pair {defend, attack} was used in stage ๐‘˜ of the FIE-game. The value of zero-sum matrix X is given by Val(๐‘‹) := min๐‘ฆ๐‘˜ โˆˆฮ”2 max๐‘ง๐‘˜ โˆˆฮ”2 ๐‘˜ ๐‘‹ ๐‘ง๐‘˜ , where ๐‘ฆ๐‘˜ and ๐‘ง๐‘˜ are the space ๐‘ฆT of defender and adversary policies, respectively. For a full information game, a standard technique to solve such games using the cost-to-go function (e.g., see [61]) is to compute the solution of the Bellman equation backward in time of the form: ๐‘‰๐‘˜โˆ’1 = Val(๐‘‰๐‘˜ ๐ท + ๐‘†), (cid:18) ๐‘‰๐‘˜ = Val + 0 1 ๏ฃฎ ๏ฃน ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ 1 1 ๏ฃบ ๏ฃฏ (cid:32) (cid:32) ๏ฃฐ ๏ฃป (cid:125) (cid:123)(cid:122) (cid:124) ๐ท ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ (cid:124) ๐‘ 11 + (๐พ๐‘’ โˆ’ ๐‘˜)๐‘ 22 ๐‘ 21 (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) (cid:123)(cid:122) ห†๐‘†๐‘˜ (cid:19) ๐‘ 12 ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๐‘ 22 ๏ฃบ ๏ฃบ (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) ๏ฃป (cid:125) = min ๐‘ฆ๐‘˜ โˆˆฮ”2 max ๐‘ง๐‘˜ โˆˆฮ”2 ๐‘ฆT ๐‘˜ (cid:16) ๐‘‰๐‘˜ ๐ท + ห†๐‘† (cid:17) ๐‘ง๐‘˜ , (2.8) where ๐‘˜ โˆˆ {1, 2, . . . , ๐พ๐‘’} denotes the stage, ๐‘‰๐‘˜ is the expected value of the game at the ๐‘˜ ๐‘กโ„Ž stage, ๐‘†๐‘˜ is the stage cost matrix. The expected value of the game at any stage ๐‘˜ is given by: ๐‘‰๐‘˜โˆ’1 = ๐‘ฆโˆ— ๐‘˜ T (cid:16) ๐‘‰๐‘˜ ๐ท + ห†๐‘† (cid:17) ๐‘งโˆ— ๐‘˜ , (2.9) where {๐‘ฆโˆ— ๐‘˜ , ๐‘งโˆ— ๐‘˜ } is a Nash equilibrium policy at stage ๐‘˜. The following assumption enables us to analyze the e-game and derive closed-form expressions for the value of the game and player policies. Assumption 2.3.1 The following stage cost inequalities hold at any stage ๐‘˜ of the FIE-game, ๐‘ 21 > ๐‘ 12 โ‰ฅ ๐‘ 11 > ๐‘ 22 โ‰ฅ 0 Assumption 2.3.1, is commonly encountered in security-related problems, implying the cost of defense is lower against a no defense under an attack action. Consequently, the cost corresponding to a defense is higher than not defending against an attack-free scenario. The following theorem summarizes the analytic expressions for the Nash equilibrium policies and the corresponding value at stage ๐‘˜. 16 Theorem 2.3.2 Under Assumption 2.3.1, the unique Nash equilibrium at any stage ๐‘˜ for a full information edge game (FIE-game) with a termination threshold of ๐ฟ = 1 is given by: (cid:20) (cid:20) ๐‘ฆโˆ— ๐‘˜ = ๐‘งโˆ— ๐‘˜ = ๐‘ 22 โˆ’ ๐‘ 21 ๐‘ 11 โˆ’ ๐‘ 12 โˆ’ ๐‘ 21 + (๐พ๐‘’ โˆ’ ๐‘˜ + 1)๐‘ 22 โˆ’ ๐‘‰๐‘˜ ๐‘ 11 โˆ’ ๐‘ 12 + (๐พ๐‘’ โˆ’ ๐‘˜)๐‘ 22 โˆ’ ๐‘‰๐‘˜ ๐‘ 11 โˆ’ ๐‘ 12 โˆ’ ๐‘ 21 + (๐พ๐‘’ โˆ’ ๐‘˜ + 1)๐‘ 22 โˆ’ ๐‘‰๐‘˜ (cid:21) T , (2.10) ๐‘ 22 โˆ’ ๐‘ 12 ๐‘ 11 โˆ’ ๐‘ 12 โˆ’ ๐‘ 21 + (๐พ๐‘’ โˆ’ ๐‘˜ + 1)๐‘ 22 โˆ’ ๐‘‰๐‘˜ ๐‘ 11 โˆ’ ๐‘ 21 + (๐พ๐‘’ โˆ’ ๐‘˜)๐‘ 22 โˆ’ ๐‘‰๐‘˜ ๐‘ 11 โˆ’ ๐‘ 12 โˆ’ ๐‘ 21 + (๐พ๐‘’ โˆ’ ๐‘˜ + 1)๐‘ 22 โˆ’ ๐‘‰๐‘˜ (cid:21) T , (2.11) with the boundary condition ๐‘‰๐พ๐‘’ := 0. 
The value of the game is given by: ๐‘‰๐‘˜โˆ’1 = ๐‘‰๐‘˜ + det(๐‘†) + ๐‘ 22((๐พ๐‘’ โˆ’ ๐‘˜)๐‘ 22 โˆ’ ๐‘‰๐‘˜ ) ๐‘ 11 โˆ’ ๐‘ 12 โˆ’ ๐‘ 21 + (๐พ๐‘’ โˆ’ ๐‘˜ + 1)๐‘ 22 โˆ’ ๐‘‰๐‘˜ , where det(๐‘†) is the determinant of the matrix ๐‘†. (2.12) โ–ก Please refer to 2.9 for the proof of Theorem 2.3.2. The derived Theorem 2.3.2 yields a closed- form expression to compute the solution of a FIE-game recursively, and is computationally efficient to evaluate even over very large number of stages ๐พ๐‘’. In order to study sensitivity of the solution of the FIE-game with respect to security and mobility costs, we parameterize the stage cost matrix ๐‘† with two ratios ๐‘Ÿ1 and ๐‘Ÿ2. The parameterized stage cost matrix is given by: ๐‘† = ๐‘ 11 1 ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๐‘Ÿ1 ๏ฃฏ ๏ฃฐ 1 ๐‘Ÿ2 ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป , where ๐‘Ÿ1 := ๐‘ 21 ๐‘ 11 , and ๐‘Ÿ2 := ๐‘ 22 ๐‘ 11 . (2.13) The motivation for such a parameterization stems from the fact that the payoff for defense remains independent of the action taken by the adversary. The cost of defense is represented by ๐‘ 11, whereas the loss of security is represented by ๐‘ 21, i.e., the attack drains resources from the vehicle and goes unnoticed. The mobility cost is denoted by ๐‘ 22, i.e., the cost incurred by the defender vehicle when it goes from current to next stage under no attack. Furthermore, the assumption of ๐‘Ÿ1 โ‰ฅ 1 and ๐‘Ÿ2 < 1 carries over from Assumption 2.3.1. The condition of ๐‘Ÿ1 โ‰ฅ 1 naturally fits the incentive of an adversary to cause a loss and, ๐‘Ÿ2 < 1 represents the minimum cost of mobility. The parameterized matrix (2.13) results in the following recursive value of the game: ๐‘‰๐‘˜โˆ’1 = ๐‘ 11 (cid:18) ๐‘‰๐‘˜ + ๐‘Ÿ2 โˆ’ ๐‘Ÿ1 + ๐‘Ÿ2((๐พ๐‘’ โˆ’ ๐‘˜)๐‘Ÿ2 โˆ’ ๐‘‰๐‘˜ ) ๐‘Ÿ2(๐พ๐‘’ โˆ’ ๐‘˜ + 1) โˆ’ ๐‘Ÿ1 โˆ’ ๐‘‰๐‘˜ (cid:19) . (2.14) 17 With the game well-defined in the final stage ๐พ๐‘’ with ๐‘‰๐พ๐‘’ := 0, the Nash equilibrium at the final stage ๐พ๐‘’ is given by the pair of policies: ๐‘ฆโˆ— ๐พ๐‘’ = (cid:105) T (cid:104) 1 0 , ๐‘งโˆ— ๐พ๐‘’ = (cid:20) 1 โˆ’ ๐‘Ÿ2 ๐‘Ÿ1 โˆ’ ๐‘Ÿ2 (cid:21) T . ๐‘Ÿ1 โˆ’ 1 ๐‘Ÿ1 โˆ’ ๐‘Ÿ2 (2.15) Thus, we can determine the limiting probabilities of attack and defense using (2.15) and assign appropriate costs to balance between performance and cost. The following result summarizes another key property of the FIE-game. Corollary 2.3.3 Given a FIE-game with a constant stage cost ๐‘†, the mixed policy for the defender and adversary at the start of a FIE-game in the limit as the number of stages ๐พ๐‘’ โ†’ โˆž satisfies, lim ๐พ๐‘’โ†’โˆž ๐‘ฆโˆ— 1 = (cid:104) 0 1 (cid:105) T , lim ๐พ๐‘’โ†’โˆž ๐‘งโˆ— 1 = (cid:104) 0 1 (cid:105) T . In short, this means that both players begin with not defending and not attacking, respectively, at the beginning of the FIE-game, and gradually (monotonically) shift the weights toward defending and attacking as the stages progress. Our next result examines what happens when the multi-stage game is played with a very small inter-stage period. Proposition 2.3.4 (Approximate solution) In the limit as the inter-stage time interval tends to zero, the value of an FIE-game at any stage ๐‘˜ satisfies ๐‘‰๐‘˜ โ‰ˆ โˆ’๐‘Ÿ1 + โˆš๏ธƒ ๐‘Ÿ 2 1 + (2๐‘Ÿ1๐พ๐‘’ + 2๐‘Ÿ1(1 โˆ’ ๐‘˜) + 1) + ๐‘Ÿ2(๐พ๐‘’ โˆ’ ๐‘˜) (2.16) โ–ก We present the proof in 2.9. Given the number of stages ๐พ๐‘’, we determine the value of a FIE-game at ๐‘˜ = 0, i.e., ๐‘‰0 using the solution of the game (equation (2.4)). 
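Theorem 2.3.2 translates directly into a short backward sweep. The following sketch is our own illustration (the function name and the choice to return the per-stage policies are ours; the stage costs in the example are the ones used later for the simple graph in Section 2.6): it iterates Equations (2.10)–(2.12) from the boundary condition 𝑉𝐾𝑒 = 0 for a constant stage cost matrix.

```python
# A direct implementation sketch of the backward recursion in Theorem 2.3.2
# (Equations (2.10)-(2.12)) for the FIE-game with L = 1 and a constant stage
# cost matrix. Variable and function names are our own.

def fie_game(s11, s12, s21, s22, K_e):
    """Return (V_0, policies), where policies[k] = (y*, z*) at stage k+1."""
    V = 0.0                                  # boundary condition V_{K_e} = 0
    policies = []
    for k in range(K_e, 0, -1):              # stages K_e, K_e-1, ..., 1
        denom = s11 - s12 - s21 + (K_e - k + 1) * s22 - V
        y = ((s22 - s21) / denom,
             (s11 - s12 + (K_e - k) * s22 - V) / denom)        # Eq. (2.10)
        z = ((s22 - s12) / denom,
             (s11 - s21 + (K_e - k) * s22 - V) / denom)        # Eq. (2.11)
        V = V + (s11 * s22 - s12 * s21
                 + s22 * ((K_e - k) * s22 - V)) / denom        # Eq. (2.12)
        policies.append((y, z))
    policies.reverse()                       # policies[0] is the first stage
    return V, policies

V0, pol = fie_game(s11=30, s12=30, s21=70, s22=10, K_e=6)
print(V0)       # value of the FIE-game over a 6-stage edge
print(pol[0])   # mixed policies (y*, z*) at the first stage
```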
From (2.16), we observe that the value of the FIE-game increases sublinearly with 𝐾𝑒 for sufficiently small values of 𝑟2, and linearly with 𝑟2. We now investigate the sensitivity of the FIE-game solution to the values of 𝑟1 and 𝑟2, assuming a unit defense cost, i.e., 𝑠11 = 1. Figure 2.3a presents the FIE-game solution for various 𝑟1 and 𝑟2 values. The ratio 𝑟2 has a greater impact on the FIE-game's value than 𝑟1, as 𝑟2 represents the minimum possible cost to pay. As 𝑟2 increases, the value of the FIE-game grows approximately linearly with the number of stages, in accordance with the approximate solution in Equation (2.16). The Nash equilibrium probabilities of defense 𝑦∗0(1) and attack 𝑧∗0(1) at the start of an FIE-game are shown in Figures 2.3b and 2.4a, respectively. The probability of an attack is higher for smaller values of 𝑟1 and 𝑟2, indicative of a cautious adversary. In contrast, the defender defends with a lower probability for small values of 𝑟1 and 𝑟2. Furthermore, we observe that the defender and adversary probabilities monotonically decrease with increasing stages 𝐾𝑒, which aligns with Corollary 2.3.3. Finally, we compare the approximate value in Equation (2.16) with the recursive value of the FIE-game in Equation (2.14) and plot the percentage error relative to the recursive value in Figure 2.4b. The approximation accuracy improves with decreasing 𝑟2 and 𝑟1, and the error for any given 𝑟1 and 𝑟2 tends to zero as the number of stages 𝐾𝑒 increases.

Figure 2.3 (a) Value of a FIE-game vs. stages 𝐾𝑒 for a given set of 𝑟2 and 𝑟1 with a termination threshold of 𝐿 = 1. (b) Policy of the defender at 𝑘 = 0 of a FIE-game vs. stages 𝐾𝑒 for the same set of 𝑟2 and 𝑟1 with a termination threshold of 𝐿 = 1.

Figure 2.4 (a) Policy of the adversary at the start (𝑘 = 0) of a FIE-game vs. stages 𝐾𝑒 for the same conditions as in Figure 2.3a. (b) Percentage error between the approximate value (equation (2.16)) and recursive value (equation (2.14)) of the FIE-game with 𝐿 = 1 for a set of game parameters.

2.4 Full Information Edge Game with an Arbitrary Termination Threshold

We now extend the FIE-game to the case of 𝐿 > 1. A FIE-game with 𝐿 = 2 is illustrated in Figure 2.5. With increasing 𝐿, the game tree extends further where it would have originally terminated for lower values of 𝐿.

2.4.1 Value of the game and player policy

For a FIE-game with an arbitrary termination threshold, we introduce a second matrix given by:

$$E = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},$$

which encodes the value of the game corresponding to the action pair {defend, attack} at stage 𝑘 + 1. We build on the procedure described in Section 2.3 and determine a recursive equation for

Figure 2.5 The FIE-game with a termination threshold of 𝐿 = 2. The dynamic game shown can tolerate the action pair {Defend, Attack} twice before the adversary is disabled. The information set for the adversary and defender is indicated by the dotted line and nodes taking on a value 𝑉𝑖^𝑗 for 𝑖 ∈ {0, 1, . . .
๐พ๐‘’ โˆ’ 1} and ๐‘— โˆˆ 0, 1, . . . , 3 respectively. The actions of the ๐‘– adversary (resp. defender) is abbreviated as {๐ด, ๐‘ ๐ด} (resp. {๐ท, ๐‘ ๐ท}) for {Attack, No attack} (resp. {Defend, No defend}). The termination states are denoted by ๐‘†๐‘† (blue colored node). value of the game given by: ๐‘‰ ๐‘– ๐‘˜โˆ’1 = Val(๐‘‰ ๐‘–+1 ๐‘˜ ๐ธ + ๐‘‰ ๐‘– ๐‘˜ ๐ท + ๐‘†), (cid:18) ๐‘‰ ๐‘–+1 ๐‘˜ = Val +๐‘‰ ๐‘– ๐‘˜ ๏ฃฎ ๏ฃน 1 0 ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ 0 0 ๏ฃบ ๏ฃฏ (cid:32) (cid:32) ๏ฃฐ ๏ฃป (cid:125) (cid:123)(cid:122) (cid:124) ๐ธ (cid:19) +๐‘† ๏ฃฎ ๏ฃน 0 1 ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ 1 1 ๏ฃบ ๏ฃฏ (cid:32) (cid:32) ๏ฃฐ ๏ฃป (cid:125) (cid:123)(cid:122) (cid:124) ๐ท (2.17) ๐‘ฆT ๐‘˜ 1 0 max ๐‘ง๐‘˜ โˆˆฮ”2 = min ๐‘ฆ๐‘˜ โˆˆฮ”2 ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ where the number of attacks defended prior to a stopping state is given by ๐‘– โˆˆ {๐ฟ โˆ’ 1, ๐ฟ โˆ’ 2, . . . , 1}, ๐‘˜ is the expected value of the FIE-game at the ๐‘˜ ๐‘กโ„Ž stage after the ๐‘–-th instance of an attack detection. ๐‘‰ ๐‘– For ๐‘– = ๐ฟ โˆ’ 1, we use the solution of FIE-game with single termination threshold. (cid:169) ๐‘‰ ๐‘–+1 (cid:173) ๐‘˜ (cid:173) (cid:171) + ๐‘†(cid:170) (cid:174) (cid:174) (cid:172) + ๐‘‰ ๐‘– ๐‘˜ 0 1 0 0 1 1 ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๐‘ง๐‘˜ , At every stage, only one out of two events can occur โ€“ either the attack is detected and therefore the value of ๐‘– increments by one, or the attack goes undetected and thus, ๐‘– remains unchanged. The 21 expected value of the game for stage ๐‘˜ after the ๐‘–-th attack gets defended is given by: ๐‘˜โˆ’1 = ๐‘ฆ๐‘–โˆ—T ๐‘‰ ๐‘– ๐‘˜ (cid:16) ๐‘˜ ๐ธ + ๐‘‰ ๐‘– ๐‘‰ ๐‘–+1 ๐‘˜ ๐ท + ๐‘† (cid:17) ๐‘ง๐‘–โˆ— ๐‘˜ , (2.18) where {๐‘ฆ๐‘–โˆ— ๐‘˜ ๐‘ง๐‘–โˆ— ๐‘˜ } is the corresponding mixed Nash equilibrium policy. When ๐‘– = ๐ฟ, it corresponds to immediate engagement of the adversary, thus represented by the recursive equation of a single termination threshold (2.8). We now present the results for FIE-game with an arbitrary finite termination threshold: Corollary 2.4.1 Under Assumption 2.3.1, the unique Nash equilibrium policy at any stage ๐‘˜ after ๐‘– instances of {attack, defense} action pairs of the FIE-game with a termination threshold of ๐ฟ is given by: (cid:34) (cid:34) ๐‘ฆ๐‘–โˆ— ๐‘˜ = ๐‘ง๐‘–โˆ— ๐‘˜ = ๐‘ 22 โˆ’ ๐‘ 21 ๐‘ 11 โˆ’ ๐‘ 12 โˆ’ ๐‘ 21 + ๐‘ 22 + ๐‘‰ ๐‘–+1 ๐‘˜ โˆ’ ๐‘‰ ๐‘– ๐‘˜ ๐‘ 11 โˆ’ ๐‘ 12 + ๐‘ 22 + ๐‘‰ ๐‘–+1 ๐‘ 11 โˆ’ ๐‘ 12 โˆ’ ๐‘ 21 + ๐‘ 22 + ๐‘‰ ๐‘–+1 ๐‘˜ โˆ’ ๐‘‰ ๐‘– ๐‘˜ ๐‘˜ โˆ’ ๐‘‰ ๐‘– ๐‘˜ ๐‘ 22 โˆ’ ๐‘ 12 ๐‘ 11 โˆ’ ๐‘ 12 โˆ’ ๐‘ 21 + ๐‘ 22 + ๐‘‰ ๐‘–+1 ๐‘˜ โˆ’ ๐‘‰ ๐‘– ๐‘˜ ๐‘ 11 โˆ’ ๐‘ 21 + ๐‘ 22 + ๐‘‰ ๐‘–+1 ๐‘ 11 โˆ’ ๐‘ 12 โˆ’ ๐‘ 21 + ๐‘ 22 + ๐‘‰ ๐‘–+1 ๐‘˜ โˆ’ ๐‘‰ ๐‘– ๐‘˜ ๐‘˜ โˆ’ ๐‘‰ ๐‘– ๐‘˜ (cid:35) T (cid:35) T , , (2.19) (2.20) with the boundary condition ๐‘‰ ๐‘– ๐พ๐‘’ := 0, โˆ€๐‘– โˆˆ {1, . . . , ๐ฟ โˆ’ 1}. The value of the FIE-game is given by: det(๐‘†) + ๐‘ 22(โˆ’๐‘‰ ๐‘– ๐‘˜ + ๐‘‰ ๐‘–+1 ) ๐‘˜ ๐‘˜ + ๐‘‰ ๐‘–+1 ๐‘ 11 โˆ’ ๐‘ 12 โˆ’ ๐‘ 21 + ๐‘ 22 โˆ’ ๐‘‰ ๐‘– ๐‘˜ (2.21) ๐‘˜ + . ๐‘˜โˆ’1 = ๐‘‰ ๐‘– ๐‘‰ ๐‘– โ–ก We skip the proof for Corollary 2.4.1, as the proof is analogous to Theorem 2.3.2 with a change in the zero-sum matrix. The mixed policies in Equation (2.19) and (2.20) are defined for attacks when the number of stages ๐พ๐‘’ โ‰ฅ ๐ฟ. 
When ๐พ๐‘’ < ๐ฟ, we determine the policies only for ๐‘– โˆˆ {1, 2, . . . , ๐พ๐‘’}. 2.4.2 Parameterized stage cost and numerical evaluation Using the parameterized cost matrix (2.13), the expected value of the FIE-game in equa- tion (2.21) satisfies: ๐‘‰ ๐‘– ๐‘˜โˆ’1 = ๐‘ 11 (cid:32) ๐‘‰ ๐‘– ๐‘˜ + + ๐‘Ÿ2 โˆ’ ๐‘Ÿ1 + ๐‘Ÿ2(โˆ’๐‘‰ ๐‘– ๐‘Ÿ2 โˆ’ ๐‘Ÿ1 โˆ’ ๐‘‰ ๐‘– ๐‘˜ + ๐‘‰ ๐‘–+1 ๐‘˜ ๐‘˜ + ๐‘‰ ๐‘–+1 ๐‘˜ (cid:33) ) (2.22) 22 Equation (2.22) is defined for instances of ๐‘– โˆˆ {1, 2, . . . , ๐ฟ โˆ’ 1}. For ๐‘– = ๐ฟ, we resort to the recursive expected value of the FIE-game defined in Equation (2.14). From Equation (2.22), we observe that the value of the game at any stage ๐‘˜ โˆ’ 1 after the ๐‘–-th instance is dependent on the value of the game at ๐‘˜ under instances ๐‘– and ๐‘– + 1. This dependency can also be observed in Figure 2.5, where the ๐‘–๐‘กโ„Ž attack node branches into the (๐‘– + 1)๐‘กโ„Ž attack. Following the previous section, we study the effect of ๐‘Ÿ1 and ๐‘Ÿ2 under unit defense cost, i.e., ๐‘ 11 = 1. The value of the FIE-game for varying numbers of stages ๐พ๐‘’ and a set of ๐‘Ÿ1 values are shown in Figures 2.6a and 2.6b. We observe that the value of the FIE-game depends strongly on the termination threshold ๐ฟ; for a given ๐‘Ÿ1 and ๐‘Ÿ2, the value of the FIE-game increases by a significant amount with increasing ๐ฟ. Furthermore, with a larger termination threshold, the value of the FIE-game becomes independent of ๐‘Ÿ1. From the analysis of ๐ฟ = 1, we observed that at equilibrium, the probability of defense increases and that of attack decreases with increasing value of ๐‘Ÿ1. Therefore, in this section, we observe the player policies for a fixed ๐‘Ÿ1. The equilibrium policies at the start of FIE-game with ๐ฟ termination threshold are shown in Figure 2.7a and 2.7b. We observe that the probability of defense increases with a larger termination threshold ๐ฟ. This is reflective of a defender accounting for multiple attacks before engagement. In contrast, we observe that the attack probability at any stage is much lower for a large ๐ฟ. This indicates the adversary being aware of multiple attack possibilities and wants to gain as much as possible. Additionally, we observe that the optimal attack policy decreases at a lower rate when compared to smaller value of ๐ฟ. 2.5 Partial Information Edge Game with a Termination Threshold We now present the solution to the PIE-game over an edge ๐‘’ with ๐พ๐‘’ stages. For ease of exposition, we assume ๐ฟ = 1, although by following steps similar to those outlined in Section 2.4, it is possible to extend the approach to a general value of ๐ฟ with careful book-keeping. Recall that the defender has partial information since it is uncertain about the game state whenever it chooses not to defend. This causes the information sets to span across different branches of the game tree (cf. Figure 2.8). Consequently, this model introduces a constraint on the adversary; if it attacks at any stage, then it is constrained to continue to attack at subsequent stages until it gets caught (reaches a 23 (a) (b) Figure 2.6 (a) Value of the FIE-game across multiple ๐‘Ÿ2, stages ๐พ๐‘’ and termination threshold ๐ฟ for given ๐‘Ÿ1 = 1.5. (b) Value of the FIE-game across multiple ๐‘Ÿ2, stages ๐พ๐‘’ and termination threshold ๐ฟ for given ๐‘Ÿ1 = 3.0. 
(a) (b) Figure 2.7 (a) Policy of the defender at start stage (๐‘˜ = 0) of FIE-game for increasing ๐‘Ÿ2 and the number of attacks ๐ฟ across given stages ๐พ๐‘’.(b) Policy of the adversary at start stage (๐‘˜ = 0) of FIE-game for increasing ๐‘Ÿ2 and the number of attacks ๐ฟ across given stages ๐พ๐‘’. stopping state) or the game reaches its final stage. This constraint arises from the perspective of a defender (CAV). In other words, when an attack occurs (spoofed vehicle) followed by a no-attack stage (removing the spoofed vehicle), the defender would be alerted to the presence of an adversary in the system. In a realistic scenario, if a CAV were to observe a vehicle (spoofed) signal toggling 24 5101520Stages Ke51015Value of the fie-game5101520Stages Ke51015Value of the fie-game5101520Stages Ke0.40.60.811.2Defender policy (y0)5101520Stages Ke0.10.20.30.40.5Attacker policy (z0) on and off, this would reveal the existence of an adversary in the current path. Figure 2.8 The PIE-game with a termination threshold of ๐ฟ = 1 with ๐พ๐‘’ = 2. The leaf node ๐‘†๐‘† represents the stopping state. The dotted line indicates the information set for the corresponding player. The notation ๐›ผ๐‘˜ (resp. ๐›ฝ๐‘˜ ) represents information set for the defender (resp. adversary) for the stage ๐‘˜ โˆˆ 1, 2. The value of each leaf node is represented by ๐‘„๐‘š for ๐‘š โˆˆ {1, 2, . . . , 8}. The leaf node values are presented in Section 2.5.1. 2.5.1 Formulation and solution of a 2 stage game We will illustrate a procedure to solve the PIE-game with ๐พ๐‘’ = 2 and use mathematical induction to solve any PIE-game with an arbitrary finite number of stages ๐พ๐‘’. From Figure 2.8, we observe that the PIE-game consists of 8 leaf nodes with values defined as: ๐‘„1 = ๐‘ 11 + ๐‘ 21, ๐‘„2 = ๐‘ 12 + ๐‘‰ 1 1 , ๐‘„3 = ๐‘ 21 + ๐‘ 11, ๐‘„4 = 2๐‘ 21, ๐‘„5 = ๐‘ 22 + ๐‘ 11, ๐‘„6 = ๐‘ 22 + ๐‘ 12, ๐‘„7 = ๐‘ 22 + ๐‘ 21, ๐‘„8 = 2๐‘ 22. Let ๐‘ฆ๐›ผ๐‘˜ ๐‘– and ๐‘ง๐›ฝ๐‘˜ ๐‘– represent the defender and adversary policy with 2 actions, ๐‘– โˆˆ {1, 2} and at stage ๐‘˜ โˆˆ {1, 2, ...๐พ๐‘’}. The information sets for the adversary and defender at stage ๐‘˜ are represented by ๐›ผ๐‘˜ and ๐›ฝ๐‘˜ , respectively. The expected value of the 2 stage game is given by: ๐‘‰0(๐‘ฆ, ๐‘ง) = ๐‘ฆ๐›ผ1 1 ๐‘ง๐›ฝ1 2 ๐‘ฆ๐›ผ1 2 ๐‘ง๐›ฝ1 1 ๐‘ฆ๐›ผ2 1 1 ๐‘„1 + ๐‘ฆ๐›ผ1 ๐‘ง๐›ฝ2 1 ๐‘„5 + ๐‘ฆ๐›ผ1 ๐‘ง๐›ฝ1 2 2 ๐‘„2 + ๐‘ฆ๐›ผ1 2 ๐‘ง๐›ฝ2 ๐‘ง๐›ฝ1 ๐‘ฆ๐›ผ2 2 1 2 ๐‘ง๐›ฝ1 1 ๐‘ฆ๐›ผ2 1 ๐‘„3 + ๐‘ฆ๐›ผ1 2 ๐‘ง๐›ฝ1 ๐‘ฆ๐›ผ2 2 2 ๐‘ง๐›ฝ1 1 ๐‘ง๐›ฝ2 1 2 ๐‘ฆ๐›ผ2 2 ๐‘„4+ ๐‘„7 + ๐‘ฆ๐›ผ1 2 ๐‘„6 + ๐‘ฆ๐›ผ1 ๐‘ง๐›ฝ1 2 ๐‘ฆ๐›ผ2 2 ๐‘ง๐›ฝ2 2 ๐‘„8, (2.23) 25 where ๐‘ฆ, ๐‘ง are the probability distributions of the defender and adversary actions given by: ๐‘ฆ = (cid:104) ๐‘ฆ๐›ผ1 1 (cid:105) T ๐‘ฆ๐›ผ1 2 , ๐‘ง = (cid:104) ๐‘ง๐›ฝ1 1 (cid:105) T . ๐‘ง๐›ฝ1 2 (2.24) By a change of variables, (2.23) can be re-written as, ๐‘‰0( หœ๐‘ฆ, หœ๐‘ง) = หœ๐‘ฆ1 หœ๐‘ง1๐‘„1 + หœ๐‘ฆ1 หœ๐‘ง2๐‘„2 + หœ๐‘ฆ2 หœ๐‘ง1๐‘„3 + หœ๐‘ฆ3 หœ๐‘ง1๐‘„4 + หœ๐‘ฆ2 หœ๐‘ง3๐‘„5 + หœ๐‘ฆ2 หœ๐‘ง4๐‘„6 + หœ๐‘ฆ3 หœ๐‘ง3๐‘„7 + หœ๐‘ฆ3 หœ๐‘ง4๐‘„8, (2.25) where หœ๐‘ฆ, หœ๐‘ง are multinomial probability distributions over the defender and adversary actions, given by: หœ๐‘ฆ = (cid:104) ๐‘ฆ๐›ผ1 1 ๐‘ฆ๐›ผ1 2 ๐‘ฆ๐›ผ2 1 ๐‘ฆ๐›ผ1 2 ๐‘ฆ๐›ผ2 2 (cid:105) T , หœ๐‘ง = (cid:104) ๐‘ง๐›ฝ1 1 ๐‘ง๐›ฝ1 2 ๐‘ง๐›ฝ1 2 ๐‘ง๐›ฝ2 1 ๐‘ง๐›ฝ1 2 ๐‘ง๐›ฝ2 2 (cid:105) T . 
(2.26) The Nash equilibrium policy of the defender and adversary, and the value of the PIE-game are determined by solving the following zero-sum matrix game, ๐‘‰0 (cid:17) min หœ๐‘ฆโˆˆฮ”3 max หœ๐‘งโˆˆR4 โ‰ฅ0 หœ๐‘ฆT 0 0 ๏ฃฎ ๐‘„1 ๐‘„2 ๏ฃฏ ๏ฃฏ ๏ฃฏ ๐‘„3 ๏ฃฏ ๏ฃฏ ๏ฃฏ ๐‘„4 ๏ฃฏ ๏ฃฏ (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) ๏ฃฐ (cid:124) 0 ๐‘„5 ๐‘„6 ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ 0 ๐‘„7 ๐‘„8 ๏ฃบ ๏ฃบ (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) ๏ฃป (cid:123)(cid:122) (cid:125) ๐ด หœ๐‘ง. (2.27) Equation (2.27) can be posed as a linear program to solve for the security level and actions at each stage of both the adversary and defender. 2.5.2 Formulation and solution of a ๐พ๐‘’ stage PIE-game In order to solve a PIE-game with an arbitrary number of stages ๐พ๐‘’, we need to determine the total number of leaf nodes and then generate the corresponding zero-sum matrix (analogous to Equation (2.27)). Using an induction argument, it follows that the structure of the game matrix for varying numbers of stages is illustrated in Figure 2.9. For a given number of stages ๐พ๐‘’ โ‰ฅ 2, the total number of leaf nodes ๐‘‡ (entries in the game matrix) is given by: ๐‘‡ = 4๐พ๐‘’ + (๐พ๐‘’ โˆ’ 1) (๐พ๐‘’ โˆ’ 2) 2 . (2.28) 26 The multinomial probability distribution for the defender and adversary comprising of the behavioral policies for ๐พ๐‘’ stages are defined as: หœ๐‘ฆ = [๐‘ฆ๐›ผ1 1 ๐‘ฆ๐›ผ1 2 ๐‘ฆ๐›ผ2 1 ๐‘ฆ๐›ผ1 2 ๐‘ฆ๐›ผ2 2 ๐‘ฆ๐›ผ3 1 หœ๐‘ง = [๐‘ง๐›ฝ1 1 ๐‘ง๐›ฝ1 2 ๐‘ง๐›ฝ1 2 ๐‘ง๐›ฝ2 1 ๐‘ง๐›ฝ1 2 ๐‘ง๐›ฝ2 2 . . . . . . (cid:16)(cid:206)๐พ๐‘’โˆ’1 ๐‘š=1 (cid:16)(cid:206)๐พ๐‘’โˆ’1 ๐‘š=1 ๐‘ฆ๐›ผ๐‘š 2 ๐‘ง๐›ฝ๐‘š 2 (cid:17) (cid:17) ๐‘ฆ๐›ผ๐พ๐‘’ 1 ๐‘ง๐›ฝ๐พ๐‘’ 1 (cid:206)๐พ๐‘’ ๐‘š=1 ๐‘ฆ๐›ผ๐‘š 2 ]T, (cid:206)๐พ๐‘’ ๐‘š=1 ๐‘ง๐›ฝ๐‘š 2 ]T. (2.29) The dimensions of multinomial probability distribution for the defender and adversary are หœ๐‘ฆ โˆˆ R๐พ๐‘’+1 and หœ๐‘ง โˆˆ R2๐พ๐‘’, respectively. Similar to the FIE-game, for a given number of stages ๐พ๐‘’, the value of PIE-game at any stage ๐‘˜ is a function of the stage cost ๐‘† and the value of PIE-game at the next stage, recursively defined as: ๐‘‰๐‘˜ (cid:17) ๐‘“ (๐‘‰๐‘˜+1, ๐‘†) = Val( ๐ด(๐พ๐‘’ โˆ’ ๐‘˜)), ๐‘˜ = {0, 1, . . . , ๐พ๐‘’ โˆ’ 1}, where ๐ด(๐พ๐‘’ โˆ’ ๐‘˜) โˆˆ R๐‘˜+1ร—2๐‘˜ is the zero-sum matrix at stage ๐‘˜. When ๐‘˜ = ๐พ๐‘’ โˆ’ 1, the zero-sum matrix ๐ด(1) = ๐‘†. Figure 2.9 Illustration of the PIE-game matrix ๐ด structure for any given number of stages, ๐พ๐‘’. The solid square blocks indicate the leaf node entries, the triangle blocks indicate the solution from the preceding stage game, the diamond block indicates the value ๐‘‰๐พ๐‘’ with ๐พ๐‘’ = 2, and the empty space indicates zeros. For a given number of stages ๐พ๐‘’, the game matrix ๐ด is recursively solved from ๐ด(2) to ๐ด(๐พ๐‘’). A pictorial representation in constructing the zero-sum matrix ๐ด(๐พ๐‘’ โˆ’ ๐‘˜) is illustrated in Figure 2.9. The solution(s) of the previous stage(s) is (are) indicated by the triangular blocks. The 27 only exception is for a 2 stage game indicated by a diamond block, where we use the expected solution of the stage cost matrix in a minimax setting. 
Similar to the 2 stage setting, we populate the game matrix for any given number of stages ๐พ๐‘’ to fill the entries of a game matrix for any given number of stages ๐พ๐‘’. We then obtain the value of the PIE-game by solving the problem: The security level of the defender and corresponding probability distribution หœ๐‘ฆ for a PIE-game with ๐‘‰0 (cid:17) min หœ๐‘ฆโˆˆฮ”๐พ๐‘’+1 max หœ๐‘งโˆˆR2๐พ๐‘’ โ‰ฅ0 หœ๐‘ฆT ๐ด(๐พ๐‘’) หœ๐‘ง. (2.30) ๐พ๐‘’ stages are computed from the solution of the linear program: ๐‘ฃ min ๐‘ฃโˆˆR, หœ๐‘ฆโˆˆฮ”๐พ๐‘’+1 subject to ๐ด(๐พ๐‘’)T หœ๐‘ฆ โ‰ค ๐‘ฃ1๐พ๐‘’+1, (2.31) where 1๐พ๐‘’+1 denotes the vector of ones of size ๐พ๐‘’ + 1. Similarly, the security level of the adversary and corresponding probability distribution หœ๐‘ง for the PIE-game with ๐พ๐‘’ stages are obtained from the linear program: max ๐‘ฃโˆˆR, หœ๐‘งโˆˆR2๐พ๐‘’ โ‰ฅ0 ๐‘ฃ subject to ๐ด(๐พ๐‘’) หœ๐‘ง โ‰ฅ ๐‘ฃ1๐พ๐‘’+1, หœ๐‘ง1 + หœ๐‘ง2 = 1, หœ๐‘ง3 + หœ๐‘ง4 = หœ๐‘ง2, ... หœ๐‘ง2๐พ๐‘’โˆ’1 + หœ๐‘ง2๐พ๐‘’ = หœ๐‘ง2๐พ๐‘’โˆ’2. (2.32) Let the solutions obtained from (2.31) and (2.32) be หœ๐‘ฆโˆ— and หœ๐‘งโˆ—, respectively. Thus, the solution of PIE-game for ๐พ๐‘’ stages is: To solve the PIE-game for ๐พ๐‘’ stages, we recursively solve a zero-sum matrix game from stages 2 to ๐‘‰0 (cid:17) หœ๐‘ฆโˆ—T ๐ด(๐พ๐‘’) หœ๐‘งโˆ—. (2.33) ๐พ๐‘’ โˆ’ 1, to construct the matrix ๐ด(๐พ๐‘’) as illustrated in Figure 2.9. 2.5.3 Parametric stage cost and numerical illustration We now evaluate the value of the PIE-game with parametric stage costs and compare it against different parameters. We use the same parametric stage costs ๐‘† as defined in Section 2.3 with a 28 unit defense cost ๐‘ 11 = 1. The values of the PIE-game for different sets of ๐‘Ÿ1 and ๐‘Ÿ2 are shown in Figure 2.10a. Similar to the FIE-game, we observe that ๐‘Ÿ2 impacts the value of the PIE-game to a greater degree compared to ๐‘Ÿ1. In other words, the mobility cost has a greater influence on the value of the game compared to the security cost. For a given number of stages ๐พ๐‘’, the adversary and defender policies for attack and defense actions at the start of the game (first stage) are shown in Figures 2.10b and 2.11a. The probability of attack decreases monotonically for all the values of ๐‘Ÿ1 and ๐‘Ÿ2. A larger security loss leads to a lower attack probability, indicating that the adversary wants to prolong the game without being caught. The defender policy, on the other hand, shows a monotonic decrease when the mobility cost is zero and a decrease followed by an increase with the number of stages ๐พ๐‘’ for non-zero mobility cost. For the monotonically decreasing case, the defender opts to maintain a minimum net cost from the no defense action as the probability of attack is low. This is contrary to the behavior under a non-zero mobility cost, where the probability of defending decreases in the first few stages and then maintains the same probability till the last stage, due to the partial information structure. Finally, we compare the solution of the PIE-game with the FIE-game for the same values of ๐‘Ÿ1 and ๐‘Ÿ2 in Figure 2.11b. We observe that the value of the PIE-game and FIE-game are identical. However, the policies of both games are significantly different due to the difference in information structures between the games. 
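The PIE-game values and policies reported above are computed from the linear programs (2.31) and (2.32). The sketch below is our own illustration of the adversary-side program (2.32) using SciPy's linprog, assuming the game matrix 𝐴(𝐾𝑒) has already been assembled as described in Section 2.5.2; the helper name and the single-stage sanity check are ours. The defender's distribution follows analogously from (2.31).

```python
# A sketch of the adversary-side linear program (2.32) for a PIE-game with
# K_e stages. A(K_e) is assumed given (rows: defender sequences, columns:
# adversary sequences); scipy.optimize.linprog is our choice of solver.
import numpy as np
from scipy.optimize import linprog

def pie_adversary_lp(A):
    """Solve max_{v, z>=0} v  s.t.  A z >= v*1  with the chained
    consistency constraints z1 + z2 = 1, z3 + z4 = z2, ... of Eq. (2.32)."""
    n_rows, n_cols = A.shape          # n_cols = 2 * K_e
    K_e = n_cols // 2
    # Variables x = [z_1, ..., z_{2K_e}, v]; linprog minimizes, so use -v.
    c = np.concatenate([np.zeros(n_cols), [-1.0]])
    # A z - v*1 >= 0  <=>  -A z + v*1 <= 0
    A_ub = np.hstack([-A, np.ones((n_rows, 1))])
    b_ub = np.zeros(n_rows)
    # Consistency constraints on the multinomial distribution z.
    A_eq = np.zeros((K_e, n_cols + 1))
    b_eq = np.zeros(K_e)
    A_eq[0, 0] = A_eq[0, 1] = 1.0     # z1 + z2 = 1
    b_eq[0] = 1.0
    for s in range(1, K_e):           # z_{2s+1} + z_{2s+2} = z_{2s}
        A_eq[s, 2 * s] = A_eq[s, 2 * s + 1] = 1.0
        A_eq[s, 2 * s - 1] = -1.0
    bounds = [(0, None)] * n_cols + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return -res.fun, res.x[:n_cols]   # (game value, adversary distribution)

# Sanity check with a single-stage game, where A(1) = S.
S = np.array([[30.0, 30.0], [70.0, 10.0]])
print(pie_adversary_lp(S)[0])         # value of the one-stage game, here 30.0
```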
2.6 Solution of the Meta-game In sections 2.3 and 2.5, we solved the edge-game under the full (FIE-game) and partial (PIE- game) information structures, respectively. Since we observed the expected value of FIE-game and PIE-game to be identical, in this section, we use either of the edge-game solutions from Section 2.3 and 2.5 to determine a secure path. Consider the roadmap ๐บ with vertices ๐‘‰ and directed edges ๐ธ, with each edge ๐‘’ โˆˆ ๐ธ being associated with a finite number of stages ๐พ๐‘’. For each edge ๐‘’ with stages ๐พ๐‘’ we determine the solution to a FIE/PIE-game. The solutions are then used to populate a meta-game matrix ๐‘Š (from Section 2.2.2) that represents the choice of a path taken by the defender (row of ๐‘Š) and the choice 29 (a) (b) Figure 2.10 (a) Value of the PIE-game for a set of ๐‘Ÿ1 and ๐‘Ÿ2 for given stages ๐พ๐‘’. (b) Policy of the adversary with the attack action at the start stage (๐‘˜ = 0) of a PIE-game for given ๐‘Ÿ1, ๐‘Ÿ2, and stages ๐พ๐‘’. (a) (b) Figure 2.11 (a) Policy of the defender with defend action at the start stage (๐‘˜ = 0) of a PIE-game for given ๐‘Ÿ1, ๐‘Ÿ2, and stages ๐พ๐‘’.(b) The value of a PIE-game and FIE-game vs. stages ๐พ๐‘’ for the same set of ๐‘Ÿ1 and ๐‘Ÿ2. of the edge to attack (column of ๐‘Š). With a slight abuse of notation, we denote ๐œ‹๐‘– โˆˆ ๐‘ƒ๐œˆ,๐œ‰ as the ๐‘–th path out of ๐‘š paths, such that |๐‘ƒ๐œˆ,๐œ‰ | = ๐‘š. Similarly, we use ๐‘’ ๐‘— โˆˆ ๐ธ to denote the ๐‘— th edge out of ๐‘› edges, such that |๐ธ | = ๐‘›. For ๐‘š possible paths and ๐‘› attack edges, the meta-game matrix is given 30 5101520Stages, Ke5101520Value of the gamer1:=1.5,r2:=0.0r1:=1.5,r2:=0.5r1:=3.0,r2:=0.0r1:=3.0,r2:=0.55101520Stages, Ke0.10.20.30.4Probability , z11(1)A, r1:=1.5,r2:=0.0A, r1:=1.5,r2:=0.5A, r1:=3.0,r2:=0.0A, r1:=3.0,r2:=0.55101520Stages, Ke0.40.60.81.01.2Probability , y11(1)D, r1:=1.5,r2:=0.0D, r1:=1.5,r2:=0.5D, r1:=3.0,r2:=0.0D, r1:=3.0,r2:=0.55101520Stages, Ke5101520Value of the gamePIF,r1:=1.5,r2:=0.5FIF,r1:=1.5,r2:=0.5PIF,r1:=3.0,r2:=0.5FIF,r1:=3.0,r2:=0.5 by: ๐‘Š = . . . ๐‘Š๐œ‹1๐‘’1 ๐‘Š๐œ‹1๐‘’2 ๐‘Š๐œ‹2๐‘’1 ๐‘Š๐œ‹2๐‘’2 ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๐‘Š๐œ‹๐‘š๐‘’1 ๐‘Š๐œ‹๐‘š๐‘’2 ๏ฃฏ ๏ฃฐ . . . . . . . . . . . . ๐‘Š๐œ‹1๐‘’๐‘› . . . ๐‘Š๐œ‹2๐‘’๐‘› . . . . . . . . . . . . . . . ๐‘Š๐œ‹๐‘š๐‘’๐‘› ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป . A path ๐œ‹๐‘– contains ๐‘๐‘– โІ ๐ธ linked edges. ๐‘Š๐œ‹๐‘– ๐‘’ ๐‘— represents the sum of edge costs on the path ๐œ‹๐‘– given the adversary attacks edge ๐‘’ ๐‘— , and is given by: ๐‘Š๐œ‹๐‘– ๐‘’ ๐‘— = ๏ฃฑ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃฒ ๏ฃด๏ฃด๏ฃด๏ฃด ๏ฃณ (cid:205)๐‘ฅโˆˆ๐‘๐‘– (cid:205)๐พ๐‘’๐‘ฅ ๐‘˜=1 ๐‘†22,๐‘˜ , if ๐‘’ ๐‘— โˆ‰ ๐œ‹๐‘– (cid:205)๐‘’ ๐‘— โˆˆ๐œ‹๐‘– ๐‘ค๐‘’ ๐‘— , otherwise. (2.34) The if condition refers to the cost of mobility under the assumption that the entire path is free of any attack. The latter condition pertains to the cost of a path while under attack along one of its edges, as defined in Equation (2.6). The zero-sum meta-game ๐‘Š is solved using a standard linear programming technique [61] to obtain an attack-resilient path. The policies obtained for the defender and adversary in the FIE/PIE-game correspond to the actions over an edge ๐‘’, whereas here, the meta-policy of the defender and adversary provides the probability of selecting paths and edges, respectively. 
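As a concrete illustration of Equation (2.34), the sketch below assembles the meta-game matrix 𝑊 from per-edge quantities, assuming the edge values 𝐽∗𝐸 and the attack-free (mobility) costs have already been computed with the e-game machinery of Sections 2.3 and 2.5. The data structures, helper name, and the placeholder numbers in the example are our own and are not the values reported in Figure 2.12.

```python
# A sketch of assembling the meta-game matrix W of Equation (2.34) from
# precomputed per-edge quantities; names and numbers are illustrative.
import numpy as np

def build_meta_game(paths, edges, edge_value, mobility):
    """Assemble W per Equation (2.34).

    paths      -- list of paths, each a list of edge identifiers
    edges      -- list of attackable edges (one column of W per edge)
    edge_value -- dict: edge -> Nash value J*_E of its e-game
    mobility   -- dict: edge -> attack-free cost sum_k s22,k over the edge
    """
    W = np.zeros((len(paths), len(edges)))
    for i, path in enumerate(paths):
        base = sum(mobility[e] for e in path)       # attack-free path cost
        for j, e in enumerate(edges):
            if e in path:
                # Replace the mobility cost of the attacked edge by J*_E.
                W[i, j] = base - mobility[e] + edge_value[e]
            else:
                W[i, j] = base
    return W

# Simple-graph layout of Figure 2.12a: two paths from nu to xi.
paths = [["e_nu_xi"], ["e_nu_1", "e_1_xi"]]
edges = ["e_nu_xi", "e_nu_1", "e_1_xi"]
# Placeholder numbers standing in for computed e-game values and mobility.
edge_value = {"e_nu_xi": 140.0, "e_nu_1": 78.0, "e_1_xi": 78.0}
mobility = {"e_nu_xi": 60.0, "e_nu_1": 30.0, "e_1_xi": 30.0}
W = build_meta_game(paths, edges, edge_value, mobility)
# W can now be passed to a standard zero-sum LP solver to evaluate (2.7).
print(W)
```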
However, the computational complexity of this approach scales undesirably with the number of paths in the roadmap. To address the scalability aspect, we propose the following heuristic: replace every edge of the roadmap by the Nash equilibrium value of the e-game. Then, use any standard shortest path algorithm (e.g., Dฤณkstra algorithm [42] for directed acyclic roadmaps) to compute an optimal path. For the resultant heuristic path, we determine the attack edge which maximizes the path cost. The approach is summarized in Algorithm 1. The meta-game solution ๐‘Š๐‘ ๐ธ is compared against the length ๐ฟ๐‘†๐ธ ๐ด of the shortest path ๐œ‹๐‘†๐‘ƒ (shortest path heuristic) following the constraints that only one of the edge ๐‘’ in the graph ๐บ can be attacked. Here, ๐œ‹๐‘†๐‘ƒ denotes the shortest path. 31 (a) (b) (c) (d) Figure 2.12 (a) Illustration of a simple graph with 3 vertices and 3 edges. The start and end vertex is indicated with ๐œˆ and ๐œ‰ respectively. The number of stages between the nodes ๐‘– and ๐‘— are given (b) The simple network (figure 2.12a) with stages over the edge, ๐‘˜ ๐œˆ1 = ๐‘˜1๐œ‰ = 3 and by ๐‘˜๐‘–, ๐‘— . ๐‘˜ ๐œˆ๐œ‰ = 6. The shortest path is calculated over the edge weights. (c) The solution of the simple graph meta-game with the defender probabilities over the paths. The shortest path is indicated with a larger arrow as compared to others and with lighter shade of vertex. (d) The solution of the simple graph meta-game with the adversary probabilities over the edges. Algorithm 1: Shortest path edge attack Input: G(graph) Output: ๐ฟSEA for every ๐‘’ โˆˆ ๐ธ do Set ๐‘ค๐‘’ = ๐‘‰0 for edge ๐‘’ ; end ยฏ๐œ‹ = Dijkstra (๐‘‰, {๐‘ค๐‘’1, . . . , ๐‘ค๐‘’ | ๐ธ | }) Determine the row ๐‘Š ยฏ๐œ‹ โˆˆ ๐‘Š corresponding to the path ยฏ๐œ‹ โˆˆ ๐‘ƒ๐œˆ๐œ‰. ๐ฟSEA = arg max๐‘ฅโˆˆ๐ธ ๐‘Š ยฏ๐œ‹๐‘ฅ Figure 2.12a illustrates this algorithm on a graph consisting of two paths namely; ๐‘ƒ๐œˆ๐œ‰ = {{๐‘’๐œ‰๐œˆ}, {๐‘’๐œˆ1, ๐‘’1๐œ‰ }}, i.e., from vertex ๐œˆ to ๐œ‰, and from vertex ๐œˆ โ†’ 1 followed by 1 โ†’ ๐œ‰. The set of attack edges is given as ๐ธ = {๐‘’๐œˆ๐œ‰, ๐‘’๐œˆ1, ๐‘’1๐œ‰ }. We assume the fixed stage cost matrix given by: ๐‘† = , ๏ฃฎ 30 30 ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ 70 10 ๏ฃฏ ๏ฃฐ ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป for the graph, and solve the FIE-game and meta-game. We summarize the results of the simple graph in Figure 2.12b. The figure also depicts the shortest path which resulted from Algorithm 1. The defender and adversary probabilities are shown in Figures 2.12c and 2.12d, respectively. We observe that the probability of choosing the shortest path by the defender and an attack edge on the same path is higher compared to the alternate path. The obtained policies are dependent on the 32 S1T77.63158138.194177.63158Shortest pathS1T77.63158138.194177.631580.37860.6214Defender probabilitiesShortest pathS1T77.63158138.194177.631580.31070.3786Attacker probabilitiesShortest path stages ๐พ๐‘’ and stage costs along each edge ๐‘’, thus motivating us to study the game parameters. 2.6.1 Sensitivity of optimal policies to the game parameters We first study the sensitivity of defender (paths) and adversary (edges) policies for the simple graph ๐บ (Figure 2.12a) as a function of stage cost entries and stages ๐พ๐‘’ along an edge ๐‘’. In the first scenario, we examine the sensitivity over stage costs. 
The stage cost is parameterized with two ratios also defined in equation (2.13) as, ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๐‘Ÿ1 ๏ฃฏ ๏ฃฏ ๏ฃฐ We abbreviate the shortest path as ๐œ‹def and edge along the same as ๐‘’att. The sensitivity plot for ๐‘† = ๐‘ 11 ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป ๐‘Ÿ2 1 1 , both ๐œ‹def and ๐‘’att for changing ๐‘Ÿ1 and ๐‘Ÿ2 are shown in Figures 2.13a and 2.13b, respectively. The probability of choosing the shortest path and edge decreases with increasing ๐‘Ÿ1, indicating that the defender (resp. adversary) is aware of risks and chooses alternate paths as opposed to the shortest path. With increasing ๐‘Ÿ2, we observe that the probability of choosing the shortest path also decreases. This relates to a high cost of mobility, i.e., under no defense and no attack, the payoff is high, and therefore, the defender prefers alternate path(s). (a) (b) (c) (d) Figure 2.13 (a) The sensitivity of choosing the shortest path (๐œ‹def) with changing ๐‘Ÿ1 and ๐‘Ÿ2 with fixed stages over each edge. (b) The sensitivity of choosing the shortest path edge (๐‘’att) with changing ๐‘Ÿ1 and ๐‘Ÿ2 with fixed stages over each edge (c) The sensitivity of choosing the shortest path (๐œ‹def) with changing number of stages over the edges ๐พ๐œ‹๐‘†1,1๐‘‡ and ๐พ๐œ‹๐‘†๐‘‡ given a fixed stage cost. (d) The sensitivity of choosing the shortest path edge (๐‘’att) with changing number of stages over the edges ๐พ๐œ‹๐‘†1,1๐‘‡ and ๐พ๐œ‹๐‘†๐‘‡ given a fixed stage cost. Next, we characterize the sensitivity of choosing ๐œ‹def and ๐‘’att with varying number of stages ๐พ๐‘’ along an edge ๐‘’, i.e., along the shortest path and along the alternate path which consists of 33 12345r100.20.40.60.8r20.360.380.4Defender Path probabilities (def)12345r100.20.40.60.8r20.360.380.4Attacker Edge probabilities (eatt)12345KST12345KS1,1T00.51Defender Path probabilities (def)12345KST12345KS1,1T00.51Attack Edge probabilities (eatt) (a) (b) (c) (d) Figure 2.14 (a) A graph consisting of 10 nodes which is sparsely connected. The output of Algorithm 1 is path 1 โˆ’ 10. Of all paths available, the path 1 โˆ’ 2 โˆ’ 10 has highest likelihood of getting selected. (b) Edge 1 โˆ’ 10 has the least chance of being attacked, while edge 2 โˆ’ 10 has the highest chance of getting attacked. (c) The probability of choosing the shortest path for graphs with an average vertex degree in the interval [2, 3]. (d) Probability of choosing the shortest path in a fully connected graph. two edges. We increase the number of stages on both edges equally. From Figure 2.13c, it can be inferred that the probability of choosing the shortest path ๐œ‹def (resp. alternate path) monotonically increases (resp. decreases) with the number of stages. Similarly, from Figure 2.13d, the probability of choosing the shortest path edge ๐‘’att is directly proportional to the number of stages over the edge ๐‘’๐œˆ๐œ‰ and is inversely proportional to the number of stages along the path ๐‘’๐œˆ1 โˆช ๐‘’1๐œ‰. Thus, if the number of stages over a path is significantly higher than over other paths, then the defenderโ€™s probability of selecting such a path is higher. We conclude from the sensitivity analysis that the ratios ๐‘Ÿ1 and ๐‘Ÿ2 govern the defenderโ€™s propensity to be either risk-seeking or risk-averse. That is, when the costs of mobility and security loss are high, the defender is less likely to choose the shortest path, indicating risk aversion; otherwise, it is risk-seeking. 
Furthermore, the influence of edge stages 𝐾𝑒 strongly governs the defender and adversary policies. Alternate paths whose multiple edges comprise a lower number of stages significantly shift the policy away from the shortest path. In the next subsection, we examine how the solution of the meta-game compares with that of Algorithm 1 over larger graphs, and whether the shortest path obtained from Algorithm 1 can serve as a reasonable attack-resilient route.

2.6.2 Comparisons on larger roadmaps

In this section, we solve the meta-game (Section 2.6) played over roadmaps of varying size to determine an optimal attack-resilient path and compare the result against the solution provided by Algorithm 1, which is treated as a baseline. The shortest path, along with the probabilities of choosing the paths and edges on a sparsely connected graph with 10 vertices, is shown in Figures 2.14a and 2.14b. The shortest path is indicated by a square block vertex with an arrow. The path of interest is from the source vertex 1 to the destination vertex 10. We observe a higher probability of picking an alternate path as opposed to the shortest path. However, the probability of choosing an attack edge is distributed across multiple paths. These results indicate that even for a sparse graph, an attack-resilient path is not necessarily the shortest path.

In general, for densely connected directed acyclic graphs (DAGs) with 𝑁 vertices, the number of possible paths scales as $2^{N-2}$, with the total number of edges being $\frac{N(N+1)}{2} - N$. Therefore, the size of the meta-game increases exponentially with the number of vertices, leading to a meta-game matrix $W \in \mathbb{R}^{2^{N-2} \times \left(\frac{N(N+1)}{2} - N\right)}$. We now investigate the solutions of the meta-game from (2.7) and compare them against the output of Algorithm 1 for a given graph.

The connectivity of a graph is characterized by the degree of each vertex. For a sparse graph, the degree of each vertex is less than the number of nodes (assuming no self-loops). A sparse graph is generated by uniformly sampling 𝑁 vertices from a unit square and randomly connecting them such that a desired degree for each vertex is obtained. The number of stages 𝐾𝑒 over an edge 𝑒 is proportional to the Euclidean distance between the connected vertices. Finally, all the stage cost matrices along every edge of the graph are set to a constant value (defined in Section 2.3).

The computation times and costs of both the meta-game and Algorithm 1 for sparse and fully connected graphs are reported in Tables 2.1 and 2.2, respectively. The average degree of each vertex in the sparse graph is set between 2 and 3. From Table 2.1, we observe that the ratio of the average time taken to solve the meta-game to that taken by Algorithm 1 decreases with an increasing number of nodes, but the cost-optimality benefit of the meta-game over Algorithm 1 also shrinks. This decrease in the computation-time ratio is a consequence of the average degree per vertex being held in the same range across graph sizes, which increases the sparsity of graphs with a large number of vertices. In contrast, from Table 2.2, we observe that in dense graphs, the ratio of cost performance between the two approaches decreases with the graph size, but at the expense of an increasing ratio of computation times.
Probabilities of picking the shortest path are reported in Figures 2.14c and 2.14d. From both figures, we observe that the probabilities of picking the shortest path corresponding to Algorithm 1 increase with the sparsity of a graph as opposed to a densely connected graph. This implies that the defender becomes risk-seeking over sparse graphs and risk-averse over densely connected graphs. Table 2.1 Performance of the meta-game vs. Algorithm 1 averaged over 100 runs with an average degree between [2,3] for every vertex of the roadmap. Vertices Time performance, Time(๐‘Š๐‘ ๐ธ )/Time(๐ฟ๐‘†๐ธ ๐ด) Cost performance, ๐‘Š๐‘ ๐ธ /๐ฟ๐‘†๐ธ ๐ด 4 6 8 10 12 14 173.833 145.143 148.500 146.556 111.556 112.222 0.83 0.85 0.87 0.89 0.92 0.92 Table 2.2 Performance of the meta-game vs. Algorithm 1 averaged over 100 runs in a fully connected roadmap. Vertices Time performance, Time(๐‘Š๐‘ ๐ธ )/Time(๐ฟ๐‘†๐ธ ๐ด) Cost performance, ๐‘Š๐‘ ๐ธ /๐ฟ๐‘†๐ธ ๐ด 4 6 8 10 12 14 152.000 179.857 165.429 187.625 283.429 773.100 0.82 0.79 0.80 0.77 0.78 0.77 2.7 Robotic Simulation of PIE-game In this section, we demonstrate the framework of the PIE-game applied to autonomous vehicle navigation via simulations and experiments implemented in a robotic simulation engine. For the 36 setup, we use Robot Operating System (ROS) in conjunction with Gazebo [74]. The defender vehicle in our simulations and experiments is a TurtleBot3 burger. We assume the existence of an architecture, such as a camera network or a positioning and localization system, which provides uncorrupted global knowledge of the environment. This architecture forms the infrastructure for vehicle-to-infrastructure (V2I) communication. In this context, we focus on the vulnerability of the CAV at the perception level and show its impact on the time taken to reach a destination. The Turtlebot3, representing a CAV, is equipped with a set of sensors, including a LiDAR, a camera, and/or a radar, to detect obstacles in its own vicinity (local view). The robot relies on vehicle-to-vehicle (V2V) or vehicle-to-everything (V2X) communication to gain information from the environment beyond its local view, known as the extended view. We consider the presence of an adversary communicating false data, such as position or velocity, from an extended view, as illustrated in Figure 2.15. We assume that the infrastructure can verify any malicious data, like spoofed obstacles or vehicles present in the environment, but at a cost (e.g., delay). Figure 2.15 Illustration of a vehicle attacked from extended view while performing a V2V or V2X communication. 2.7.1 Open Path Attack We construct a PIE-game with the objective of traveling from a source pose (vertex) to a destination pose (vertex) in the presence of an attack. We model the attack as an action that creates fake obstacles in the occupancy grid of the robot (information passed via V2V communication). The actions of the defender (vehicle) are to either communicate with the infrastructure (V2I), analogous to validating the information received from any vehicles/malicious agents, or to do nothing, equivalent to relying on the received information. We assume that the message exchange 37 Local viewExtendedviewAttacker with the infrastructure occurs at a much lower rate compared to the vehicleโ€™s controller. Therefore, the vehicle slows down or does not accelerate during V2I communication to ensure safety. 
When an attack is successful, the vehicle deviates from its planned trajectory by a certain amount, thus adding to the time required to reach the destination. The stage cost matrix represents the time required by the vehicle at every stage and is summarized as: ๐‘† = ๐œ™ฮ”๐‘‘ ๐œ™ฮ”๐‘‘ ฮ”๐‘‘ + ฮ”๐‘Ž ฮ”๐‘‘ ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป , (2.35) where ฮ”๐‘‘ is the time spent per decision epoch, ๐œ™ is the factor by which ฮ”๐‘‘ is increased during a V2I communication (๐œ™ > 1), and ฮ”๐‘Ž is the additional time spent per decision epoch under a successful attack. Although we can solve the PIE-game for varying stage costs, for the sake of clarity and to maintain consistency with the methodology analyzed earlier in this section, we have established a ROS environment with a constant stage cost. In Figures 2.16a and 2.16b, we observe an attack in ROS using a TurtleBot3 burger. It illustrates how a deviation is linked to the attack and contributes to additional time required for travel to a destination. Given a source pose, target pose, and the planned velocity of the robot, we estimate an approximate number of decision epochs needed. This determined number of decision epochs is then employed as the number of stages ๐พ๐‘’ to solve the PIE-game and to define the player policies for use in the simulation. Simulations We conducted multiple simulations of the PIE-game, guiding a TurtleBot3 burger in conjunction with Gazebo from a source pose to a target pose. In Figure 2.17, you can see snapshots of a sample robot trajectory. The planned trajectory is represented by the dotted line, while the solid line depicts the actual trajectory covered by the robot. The attack induces a deviation from the normal trajectory, as shown in Figure 2.17b. This deviation is subsequently corrected once the adversary is apprehended, allowing the robot to safely reach its destination, as demonstrated in Figure 2.17c and 2.17d respectively. 38 (a) (b) (c) Figure 2.16 (a) An attack realized on ROS with Turtlebot3 burger. The attack is obstacles (vehicles) in formation causing a larger deviation in normal trajectory. (b) Influence of an attack (obstacle) on the deviation of path causing an increase in time to destination (security loss). (c) The PIE-game evaluated experimental and expected theoretical value of the PIE-game. Lastly, we present the average time taken by the robot to reach the destination over multiple runs, comparing it against the value predicted by the PIE-game in Figure 2.16c. Itโ€™s evident that the solution obtained from the simulations closely aligns with the theoretical value predicted by the PIE-game. (a) (b) (c) (d) Figure 2.17 (a) The initial position of the robot with the planned trajectory indicated with the dotted line. (b) Attack along the trajectory causing a change in deviation along the covered and planned trajectory. The covered trajectory is represented by the solid line. (c) The change in trajectory after the defender has intercepted the attack and recovery of the planned trajectory.(d) The final position of the robot with the covered trajectory indicated in solid line. 
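To make the setup above concrete, the following sketch constructs the stage cost matrix (2.35) and estimates the number of decision epochs 𝐾𝑒 from the distance to the goal and the planned velocity. All numerical values are hypothetical and chosen only so that Assumption 2.3.1 holds; they are not the parameters used in the reported simulations.

```python
# An illustrative construction of the open-path stage cost matrix (2.35) and
# of the number of stages K_e from the planned velocity; every number below
# is hypothetical and only chosen to satisfy Assumption 2.3.1.
import math

delta_d = 1.0    # time per decision epoch [s]
phi     = 1.5    # slow-down factor during V2I validation (phi > 1)
delta_a = 1.0    # extra time per epoch under a successful attack [s]

# Rows: {defend (V2I check), no defend}; columns: {attack, no attack}.
S = [[phi * delta_d,     phi * delta_d],
     [delta_d + delta_a, delta_d      ]]

# Number of decision epochs needed to cover the planned distance.
distance, velocity = 4.0, 0.2            # [m], [m/s] -- illustrative
K_e = math.ceil(distance / (velocity * delta_d))

# S and K_e can now be fed to the FIE/PIE-game solvers sketched earlier
# to obtain the policies used during a run.
print(K_e, S)
```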
Figure 2.18 Illustration of a vehicle attacked from extended view under a V2V or V2X communication in a single/two lane road.

2.7.2 One/Two Lane Attack

We now extend the previously described attack on an open path to a PIE-game on a TurtleBot inspired by a realistic attack scenario, i.e., on a traffic lane. We assume that the vehicle (robot) is traveling on a single or two lane road where passing another vehicle is unfeasible or unsafe. By spoofing the sensor, the adversary can create a fake large object, such as a trailer in front of a vehicle causing it to slow down. Such a scenario is illustrated in Figure 2.18. The allowable actions of the defender are exactly as described in subsection 2.7.1 – either to validate with the infrastructure (V2I) or do nothing. We model the stage cost matrix for this scenario as:

$$S = \begin{bmatrix} \phi \Delta_d & \phi \Delta_d \\ \tilde{\phi} \Delta_d & \Delta_d \end{bmatrix}, \qquad (2.36)$$

where $\tilde{\phi}$ is the additional time spent per decision epoch when the vehicle slows down due to an attack, and the remaining parameters are presented in Equation (2.35). In the given scenario, $\tilde{\phi} > \phi$ to maintain Assumption 2.3.1.

Experiments and simulations

To realize the described scenario, we constructed the environment depicted in Figure 2.19a for both simulation (top) and experiments (bottom). The environment was designed to emulate a constant stage cost matrix as accurately as possible. Figure 2.19a also illustrates the execution of an attack when moving from a source to a destination pose. Similar to subsection 2.7.1, we determined an approximate number of decision instants 𝐾𝑒 to solve the PIE-game and establish player policies,
However, the average time taken by the robot is observed to be slightly lower than the value of the PIE-game due to several reasons, such as 1) approximating a constant stage cost matrix, 2) using precomputed policies when the vehicle slows down, as opposed to re-evaluating the game and using new policies, and 3) noise in the motion of the vehicle. In conclusion, with appropriate models of the environment, the fie/PIE-game can be successfully applied to a range of problems. 41 0246810120.150.20VelocityND,AD,NA/AND,NA024681012Time instant0.000.501.00ActionDefendAttack (a) (b) Figure 2.20 (a) The value of the PIE-game evaluated in ROS using TurtleBot3 burger and gazebo over multiple simulations and compared with the expected value of the PIE-game. (b) The value of the PIE-game evaluated in ROS with TurtleBot3 burger over multiple experiments and compared with the expected value of the PIE-game. 2.8 Summary In this chapter, we addressed a prototypical path planning problem defined over a roadmap, where a vehicle aims to find an attack-resilient path from a given source to a destination in the presence of an adversary capable of launching an attack on an edge of the roadmap. The defender (vehicle) can take an action to detect an attack at the expense of some cost (energy) and disable the attack permanently if detected multiple times. We formulated this scenario using the framework of a zero-sum multi-stage game, with a stopping state being played simultaneously by the adversary and defender. We characterized the Nash equilibria of an edge-game and provided a detailed analysis for the case where both the defender and adversary are limited to only two actions. Additionally, we con- ducted a comprehensive study of two edge-game variants, namely the fie and PIE-games, defined in terms of the information structure induced by constraints on the type of attack. We also investigated the sensitivity of the edge-game with respect to (i) the cost of using the countermeasure, (ii) the cost of motion, and (iii) the benefit of disabling the attack. Moreover, we demonstrated how the results of either edge game can be used to create a zero-sum meta-game for a given roadmap, and compared 42 02040Epoch141618Time to destinationSimulationPIE Game TheoreticalPIE Game Simulation02040Epoch10111213Time to destinationExperimentsPIE Game TheoreticalPIE Game Experiments Figure 2.21 Illustration of a vehicle attacked from extended view under a V2V or V2X commu- nication in a single/two lane road. The dotted line represent the planned trajectory and the solid line represents the trajectory followed. The solid blocks around the robot at all the time instances represent the boundary and the solid block in from of the robot at time instant 7.89 s represents a spoofed trailer. the meta-game solution with the result of a novel shortest path heuristic. Finally, we reported three sets of numerical validations: (i) computation time and cost optimality of the proposed approaches, (ii) implementation of the PIE-game solution in a robotic simulation engine, and (iii) realization of the PIE-game in a robotic experiment. An initial version of the result consisting of full information stopping state game with a single termination threshold appeared in [14]. The results of the partial information structure along with the experiments were demonstrated in [15]. 
2.9 Supplementary Materials Proof of Theorem 2.3.2 Since the edge-game consists of only 2 players, the following method can be used to determine the policy for each player at any stage ๐‘˜. Equation (2.9) can be represented as a zero-sum matrix given 43 0X coordinate-101Y coordinatetime = 0.01 s0-101time = 7.89 s0-101time = 15.45 s by: ๐‘‰๐‘˜โˆ’1 = ๐‘ฆT ๐‘˜ (cid:169) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:171) ๐‘Ž11,๐‘˜ ๐‘Ž12,๐‘˜ (cid:170) (cid:174) ๏ฃน ๏ฃฎ (cid:174) ๏ฃบ ๏ฃฏ (cid:174) ๏ฃบ ๏ฃฏ (cid:174) ๏ฃบ ๏ฃฏ (cid:174) ๐‘Ž21,๐‘˜ ๐‘Ž22,๐‘˜ ๏ฃบ ๏ฃฏ (cid:174) ๏ฃฏ ๏ฃบ (cid:174) (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) ๏ฃป ๏ฃฐ (cid:174) (cid:125) (cid:123)(cid:122) (cid:124) ฮž(๐‘˜) (cid:172) ๐‘ง๐‘˜ . (2.37) Policy for Defender - The expected value of a zero-sum matrix ฮž(๐‘˜) at any stage ๐‘˜ is given by: ๐‘‰๐‘˜ (ฮž(๐‘˜)) (cid:17) min ๐‘ฆ๐‘˜ โˆˆฮ”2 max ๐‘ง๐‘˜ โˆˆฮ”2 ๐‘˜ ฮž(๐‘˜)๐‘ง๐‘˜ , ๐‘ฆT = min ๐‘ฆโˆˆ{๐‘ฆ1,๐‘˜,๐‘ฆ2,๐‘˜ } max ๐‘งโˆˆ{๐‘ง1,๐‘˜,๐‘ง2,๐‘˜ } (๐‘ฆ1,๐‘˜ ๐‘Ž11,๐‘˜ + ๐‘ฆ2,๐‘˜ ๐‘Ž21,๐‘˜ )๐‘ง1,๐‘˜ + (๐‘ฆ1,๐‘˜ ๐‘Ž12,๐‘˜ + ๐‘ฆ2,๐‘˜ ๐‘Ž22,๐‘˜ )๐‘ง2,๐‘˜ (cid:169) (cid:173) (cid:173) (cid:171) , (cid:170) (cid:174) (cid:174) (cid:172) (2.38) where, ฮ”2 is probability simplex in two dimensions, ๐‘ฆ๐‘–,๐‘˜ and ๐‘ง๐‘–,๐‘˜ , ๐‘– โˆˆ {1, 2} represent the ๐‘–th element of the probability vector ๐‘ฆ๐‘˜ and ๐‘ง๐‘˜ , respectively. The policy of any player is the space of mixed policies can determined analytically or through a graphical approach [61]. Given the policy of second player, the first playerโ€™s policy does not deviate unilaterally if the expected outcome over any of its action result in the same outcome. From the probability simplex of dimension 2, we get ๐‘ฆ2 = 1 โˆ’ ๐‘ฆ1. This leads to the following, ๐‘‰๐‘˜ (ฮž(๐‘˜)) = min ๐‘ฆ1,๐‘˜ โˆˆ[0,1] max ๐‘ฆ1,๐‘˜ ๐‘Ž11,๐‘˜ + (1 โˆ’ ๐‘ฆ1,๐‘˜ )๐‘Ž21,๐‘˜ ๐‘ฆ1,๐‘˜ ๐‘Ž12,๐‘˜ + (1 โˆ’ ๐‘ฆ1,๐‘˜ )๐‘Ž22,๐‘˜ ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ . ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป (2.39) Equation (2.39) yields the policy for player 1/defender is given as: ๐‘ฆ1,๐‘˜ ๐‘Ž11,๐‘˜ + ๐‘Ž21,๐‘˜ โˆ’ ๐‘Ž21,๐‘˜ ๐‘ฆ1,๐‘˜ = ๐‘ฆ1,๐‘˜ ๐‘Ž12,๐‘˜ + ๐‘Ž22,๐‘˜ โˆ’ ๐‘Ž22,๐‘˜ ๐‘ฆ1,๐‘˜ , โ‡’ ๐‘ฆโˆ— 1,๐‘˜ = ๐‘Ž22,๐‘˜ โˆ’ ๐‘Ž21,๐‘˜ (๐‘Ž11,๐‘˜ โˆ’ ๐‘Ž21,๐‘˜ โˆ’ ๐‘Ž12,๐‘˜ + ๐‘Ž22,๐‘˜ ) . The probability of choosing the second action is, ๐‘ฆโˆ— 2,๐‘˜ = ๐‘Ž11,๐‘˜ โˆ’ ๐‘Ž12,๐‘˜ (๐‘Ž11,๐‘˜ โˆ’ ๐‘Ž21,๐‘˜ โˆ’ ๐‘Ž12,๐‘˜ + ๐‘Ž22,๐‘˜ ) . Substituting the values from matrix ฮž(๐‘˜), equation (2.10) yields the optimal policy for defender. 44 Policy for Attacker - The mixed policy for attacker satisfies, ๐‘‰๐‘˜ (ฮž(๐‘˜)) = min max ๐‘ง๐‘˜ โˆˆ{๐‘ง1,๐‘˜ } ๐‘Ž11,๐‘˜ ๐‘ง1,๐‘˜ + ๐‘Ž12,๐‘˜ (1 โˆ’ ๐‘ง1,๐‘˜ ), ๐‘Ž21,๐‘˜ ๐‘ง1,๐‘˜ + ๐‘Ž22,๐‘˜ (1 โˆ’ ๐‘ง1,๐‘˜ ) (cid:169) (cid:173) (cid:173) (cid:171) . (cid:170) (cid:174) (cid:174) (cid:172) (2.40) Equation (2.40) gives us the policy for attacker as, ๐‘ง1,๐‘˜ = ๐‘Ž22,๐‘˜ โˆ’ ๐‘Ž12,๐‘˜ (๐‘Ž11,๐‘˜ โˆ’ ๐‘Ž21,๐‘˜ โˆ’ ๐‘Ž12,๐‘˜ + ๐‘Ž22,๐‘˜ ) , ๐‘ง2,๐‘˜ = ๐‘Ž11,๐‘˜ โˆ’ ๐‘Ž21,๐‘˜ (๐‘Ž11,๐‘˜ โˆ’ ๐‘Ž21,๐‘˜ โˆ’ ๐‘Ž12,๐‘˜ + ๐‘Ž22,๐‘˜ ) This yields the mixed policy of the attacker in equation (2.11). 
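For concreteness, the closed-form policies and value derived above can be evaluated directly. The following Python sketch (illustrative only — the function name and the sample entries of Ξ(k) are ours, not part of the thesis software) computes the mixed saddle point of a generic 2 × 2 zero-sum cost matrix using exactly the indifference conditions above, assuming no row or column dominance so that the common denominator is nonzero.

```python
import numpy as np

def solve_2x2_zero_sum(A):
    """Closed-form mixed saddle point of a 2x2 zero-sum cost matrix
    A = [[a11, a12], [a21, a22]], with the row player (defender) minimizing
    and the column player (attacker) maximizing. Assumes no row/column
    dominance, so den = a11 - a12 - a21 + a22 is nonzero and the resulting
    probabilities lie in [0, 1]."""
    (a11, a12), (a21, a22) = A
    den = a11 - a12 - a21 + a22
    # Defender randomizes so that the attacker is indifferent between columns.
    y = np.array([a22 - a21, a11 - a12]) / den
    # Attacker randomizes so that the defender is indifferent between rows.
    z = np.array([a22 - a12, a11 - a21]) / den
    value = (a11 * a22 - a12 * a21) / den   # equals y' A z
    return y, z, value

# Example stage matrix whose entries satisfy s21 > s12 >= s11 > s22 >= 0.
y, z, V = solve_2x2_zero_sum(np.array([[1.0, 1.05], [1.6, 0.3]]))
```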
Value of the game The value of the game at any stage โ€˜๐‘˜โ€™ is given by, ๐‘‰๐‘˜โˆ’1 = ๐‘ฆTโˆ— ๐‘˜ ฮž(๐‘˜)๐‘งโˆ— ๐‘˜ , Expanding the terms, ๐‘‰๐‘˜โˆ’1 = ๐‘‰๐‘˜ + det(๐‘†๐‘˜ ) + ๐‘ 22,๐‘˜ ((๐พ๐‘’ โˆ’ ๐‘˜)๐‘ 22,๐‘˜ โˆ’ ๐‘‰๐‘˜ ) (๐‘ 11,๐‘˜ โˆ’ ๐‘ 21,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ + (๐พ๐‘’ โˆ’ ๐‘˜ + 1)๐‘ 22,๐‘˜ โˆ’ ๐‘‰๐‘˜ ) . (2.41) Proof of Proposition 2.3.4 To determine an approximate solution of an edge-game at any stage ๐‘˜, we begin with the analysis of equation (2.14). It is observed that the recursive equation can be divided into two parts namely; when ๐‘Ÿ2 = 0 and, when ๐‘Ÿ2 > 0 with increasing number of stages ๐พ๐‘’. Using the parametric recursive equation with unit defense cost (๐‘ 11 = 1), ๐‘‰๐‘˜โˆ’1 = ๐‘‰๐‘˜ + ๐‘Ÿ2 โˆ’ ๐‘Ÿ1 ๐‘Ÿ2(๐พ๐‘’ โˆ’ ๐‘˜ + 1) โˆ’ ๐‘Ÿ1 โˆ’ ๐‘‰๐‘˜ + โˆ’๐‘Ÿ2((๐พ๐‘’ โˆ’ ๐‘˜)๐‘Ÿ2 โˆ’ ๐‘‰๐‘˜ ) ๐‘Ÿ2(๐พ๐‘’ โˆ’ ๐‘˜ + 1) โˆ’ ๐‘Ÿ1 โˆ’ ๐‘‰๐‘˜ โ‡’ ๐‘‰๐‘˜โˆ’1 โˆ’ ๐‘‰๐‘˜ = ๐‘Ÿ2 โˆ’ ๐‘Ÿ1 ๐‘Ÿ2(๐พ๐‘’ โˆ’ ๐‘˜ + 1) โˆ’ ๐‘Ÿ1 โˆ’ ๐‘‰๐‘˜ + โˆ’๐‘Ÿ2((๐พ๐‘’ โˆ’ ๐‘˜)๐‘Ÿ2 โˆ’ ๐‘‰๐‘˜ ) ๐‘Ÿ2(๐พ๐‘’ โˆ’ ๐‘˜ + 1) โˆ’ ๐‘Ÿ1 โˆ’ ๐‘‰๐‘˜ . , (2.42) We will first investigate the case where ๐‘Ÿ2 = 0. It is observed that the recursive equation can be formulated by a continuous version using Taylor series expansion at time instant ๐‘˜ ๐‘‰ (๐‘˜ โˆ’ ฮ”๐‘˜) = ๐‘‰ (๐‘˜) โˆ’ ฮ”๐‘˜๐‘‰ โ€ฒ(๐‘˜), โ‡’ โ‡’ lim ฮ”๐‘˜โ†’0 ๐‘‰ (๐‘˜) โˆ’ ๐‘‰ (๐‘˜ โˆ’ ฮ”๐‘˜) ฮ”๐‘˜ ๐‘‰ (๐‘˜) โˆ’ ๐‘‰ (๐‘˜ โˆ’ ฮ”๐‘˜) ฮ”๐‘˜ = ๐‘‰ โ€ฒ(๐‘˜), โˆ’๐‘Ÿ1 ๐‘‰๐‘˜ + ๐‘Ÿ1 = . 45 We obtain the continuous form of equation (2.14) as, ๐‘‘๐‘‰ ๐‘‘๐‘˜ = โˆ’๐‘Ÿ1 ๐‘‰๐‘˜ + ๐‘Ÿ1 . Integrating with respect to ๐‘‰ and ๐‘˜, โˆซ ๐‘‰๐ผ ๐‘‰ ๐‘‰๐‘˜ + ๐‘Ÿ1 ๐‘‘๐‘‰ = โˆ’ โˆซ ๐พ๐‘’ ๐‘˜ ๐‘Ÿ1 ๐‘‘๐‘ , โ‡’ ๐‘‰ 2 ๐ผ 2 + ๐‘Ÿ1๐‘‰๐ผ โˆ’ ๐‘‰ 2 2 โˆ’ ๐‘Ÿ1๐‘‰ = โˆ’๐‘Ÿ1(๐พ๐‘’ โˆ’ ๐‘˜), where ๐‘‰๐ผ = 1, initial condition. Substituting the value of ๐‘‰๐ผ in the equation (2.44), 1 2 + ๐‘Ÿ1 โˆ’ ๐‘‰ 2 2 โˆ’ ๐‘Ÿ1๐‘‰ = โˆ’๐‘Ÿ1(๐พ๐‘’ โˆ’ ๐‘˜). The solution of equation (2.45) yields the desired result given by, ๐‘‰๐‘˜ = โˆ’๐‘Ÿ1 + โˆš๏ธƒ ๐‘Ÿ 2 1 + (2๐‘Ÿ1๐พ๐‘’ + 2๐‘Ÿ1(1 โˆ’ ๐‘˜) + 1). (2.43) (2.44) (2.45) Similarly, we now determine the solution for ๐‘Ÿ2 > 0. For a given ๐พ๐‘’ the value ๐‘‰๐‘˜ at stage ๐‘˜ monotonically increases. Therefore, for a large ๐พ๐‘’ as ๐‘˜ โ†’ 0, equation (2.42) can be approximated as, ๐‘‰๐‘˜โˆ’1 โˆ’ ๐‘‰๐‘˜ โ‰ˆ โˆ’๐‘Ÿ2((๐พ๐‘’ โˆ’ ๐‘˜)๐‘Ÿ2 โˆ’ ๐‘‰๐‘˜ ) ๐‘Ÿ2(๐พ๐‘’ โˆ’ ๐‘˜ + 1) โˆ’ ๐‘Ÿ1 โˆ’ ๐‘‰๐‘˜ , โ‡’ ๐‘‰๐‘˜โˆ’1 โˆ’ ๐‘‰๐‘˜ โ‰ˆ โˆ’๐‘Ÿ2 (2.46) Using the Taylor series expansion method as described in equation (2.43), we obtain the following solution, ๐‘‰๐‘˜โˆ’1 โ‰ˆ ๐‘Ÿ2(๐พ๐‘’ โˆ’ ๐‘˜). (2.47) Therefore, combining the solutions when ๐‘Ÿ2 = 0 given by equation (2.45) and equation (2.47), we obtain the following approximation, ๐‘‰๐‘˜โˆ’1 โ‰ˆ โˆ’๐‘Ÿ1 + โˆš๏ธƒ ๐‘Ÿ 2 1 + (2๐‘Ÿ1๐พ๐‘’ + 2๐‘Ÿ1(1 โˆ’ ๐‘˜) + 1) + ๐‘Ÿ2(๐พ๐‘’ โˆ’ ๐‘˜). 46 CHAPTER 3 STOCHASTIC ADVERSARY - STOCHASTIC STOPPING STATE GAMES AND THEIR APPLICATION TO MOTION PLANNING In the previous chapter, we introduced a deterministic adversary that aims to maximize itโ€™s payoff from a defender over a finite-stage. Such an interaction between the defender and adversary is modeled as a zero-sum multi-stage game. Furthermore, the adversary is disabled if a specific pair of actions is played ๐ฟ times, and the game reaches a stopping state. 
We characterized the Nash equilibrium of such a game and demonstrated its application on a path planning problem. However, in many real-world applications, a defender may not encounter a deterministic ad- versary, rather a second player which has a certain probability of moving adversarially over each stage of a finite-stage process. We term such an adversary as a stochastic adversary, which can acts in a benign or in an adversarial manner. Such an adversary model generalizes the previous model of a deterministic adversary. Setting the probability of adversarial behavior to 1, retrieves the deterministic adversary model. 3.1 Introduction In this chapter, we develop a framework to plan the actions of a defender accounting for both, the cost of safety and minimum cost per stage in presence of a second player, which might act adversarially. We assume that the defender uses a classifier that outputs a probability indicating the adversarial intent of the second player. Thus, the decision-making process for the defender needs to model the impact of possible adversarial actions over multiple planning stages (finite- horizon). The proposed model treats the second player as adversarial with a certain probability at each stage of the decision-making. Once the second player becomes adversarial, it continues to select adversarial actions for the remaining stages of the game. The adversarial intent of the second player is confirmed only after a set of actions are played for a specified number of times (termed as termination threshold) by both players. This event causes the game to reach a stopping state and the game terminates. We analytically characterize the Nash equilibrium of this game for the case of two actions per player with a termination threshold, termed as M-SSG. We then expand 47 the action space of both players to an arbitrary number of actions, termed as SSG๐‘šร—๐‘›, and solve it using linear programming for a termination threshold of unity. Furthermore, for SSG๐‘šร—๐‘›, we also characterize an analytical condition under which the playersโ€™ transition to a pure policy. We demonstrate the application of M-SSG via two autonomous motion planning applications. The first involves maintaining a safe distance from a non-ego vehicle ahead, modeled using fixed stage costs. The second involves safe lane-changing with costs that are stage dependent. In both scenarios, we provide a comparison between the analytic/simulated and experimental results using ground robots. Finally, we apply our framework with a larger action space to address a resilient estimation problem employing a Kalman filter. Works such as [104, 144, 143] delve into the formulation of a game between an estimator, striving to minimize worst-case error, and an attacker manipulating measurements. This game is explored in both complete [104] and partial information [143] settings, accommodating an arbitrary number of sensors [144]. The study presented by Huang et al. [66] introduces a framework to analyze cross- layer coordinated attacks on Cyber-Physical Systems (CPS). Here, the defender has the option to dispense observation, while an adversary can launch a jamming attack aimed at degrading estimation performance. Additional insights into attacks on CPS are provided by Mahmoud et al. in the survey [96]. In this work, along with the exploration of a defensive policy, considerations are made for an adversaryโ€™s termination scenario (stopping state) once its presence is ascertained. 
There have been a number of works addressing the problem of estimation and control in the presence of an adversary. A hybrid game was proposed between a defender choosing a set of controllers, detector, and estimator against an adversary capable of manipulating the sensor measurements [100], where the authors derived a sub-optimal value iteration method along with a moving horizon approach. The dynamic landscape of security in shared communication networks is explored in the study presented by Xing et al.[150], where a dynamic non-zero-sum game with asymmetric information engages multiple sensors in deciding their investments in security. Addressing a multi-sensor transmission control problem within a signal-to-noise-and-interference communication channel, Li et al.[89] contribute to the understanding of efficient sensor network 48 management. A comprehensive overview of security issues in Cyber-Physical Systems (CPS) is provided in the survey by Zhu et al. [161], summarizing various game-theoretic approaches. Notably, these prior works typically assume a deterministic adversary. In contrast, the present work considers the presence of an adversary in a probabilistic manner. Our prior works [14] and [15] focused on full and partial information scenarios for a single robot with a route planning problem serving as the overall objective. This chapter extends the method- ology from our prior works to: 1) probability of adversarial intent in the information structure of the game, 2) condition on the existence of mixed and pure policy equilibria corresponding to the new structure, 3) the case where the defender requires a finite number of detections to ascertain the adversarial intent of the other player, and 4) the consideration of finitely many number of actions per player. Specifically, in this chapter, we introduce the concept of probabilistic adversary and a finite termination threshold, which represents a limited number of instances in which a specific pair of actions can be played before the game concludes. The termination threshold refers to the number of times a specific set of action pairs (e.g., strong defense being used against an attack) is played, after which the game is terminated. In other words, a termination threshold of ๐ฟ corresponds to playing a specific action pair a total of ๐ฟ times. The termination threshold can be viewed as a dual version of the war of attrition or the chicken game [61], where the game continues only when a specific pair of actions is chosen and terminates otherwise. The termination threshold acts as a counter; once the counter reaches the count ๐ฟ, the multi-stage zero-sum game terminates. The use of such a termination threshold in the presence of a deterministic adversary has been demonstrated in navi- gation and path planning [15]. Additionally, the termination threshold can be modified to included limited attack or defense which are applicable in settings with constrained energy [155, 88], limited energy resources [114], denial-of-service attacks [155, 88], and remote state estimation [114]. A formal definition of termination threshold is provided in Section 3.2 (Definition 1). The concept of termination threshold allows us to consider the impact of error in detecting the player type (for example, false positives). Pure policy Nash equilibrium are preferable since they greatly reduce the computational complexity, which is known to increase polynomially with the number of actions. 49 The contributions of this chapter are as follows, 1. 
Modeling adversarial intent through a stochastic game with a termination threshold: We model the interaction between a defender and a second player, which has a given probability of turning adversarial at any stage of a multi-stage stochastic zero-sum game (M-SSG). We assume that the stage cost matrices and the probability of turning adversarial are known. We begin with a M-SSG with two actions per player. An M-SSG captures two main features: (i) a probable adversarial intent of the second player, i.e., once the second player turns adversarial, it continues to act adversarially for the remaining stages of the game, and (ii) a balance between security and minimum per stage cost via stopping states. Furthermore, we incorporate a termination criteria for the zero-sum game when a specified pair of actions are played ๐ฟ times (known as the termination threshold) and completely characterize the M-SGG. 2. Arbitrary finite number of actions per player with a switching policy: We extend to the case of finitely many number of actions per player and provide a numerical method to solve the game. We then characterize analytic conditions on the problem parameters under which the defender and second player switches to a pure policy of weak defense and no attack, respectively. We demonstrate three applications of the proposed model. The first application is a leader-follower scenario, where a follower (ego vehicle) acts to maintain a safe distance from the leader (non-ego vehicle). The second application is a lane-change scenario, where the velocity of the ego vehicle is regulated while maintaining a balance between safety and the cost incurred to reach its goal. For the motion planning application, we compare the results obtained from analytical/simulation with the experiments. Finally, we demonstrate an application of with a large action space in the context of resilient estimation. This application involves utilizing a Kalman filter with multiple sensor feedback channels in the presence of an adversary that can strategically inject noise into 50 the measurement. Such a decision-making framework empowers a defender to strike a balance between security and performance while operating in the presence of a probable adversary. Outline: The chapter is organized as follows. In Section 3.2, we formulate the decision-making problem as a stochastic multi-stage zero-sum game (M-SSG) with budgets. In Section 3.3, we characterize the solution of the described M-SSG with a engagement, defense and attack budget along with numerical examples. We extend the solution methodology to an arbitrary number of actions per player setting in Section 3.4. Finally, in Section 3.5, we present an application of the M-SSG on i) motion planning problems, and ii) on resilient estimation using a Kalman filter. 3.2 Problem Formulation We consider a finite-horizon decision-making problem between a defender and a second player, whose type (whether non-adversarial or adversarial) is initially unknown. At each decision instant, the second player continues to act in a benign manner or switches to playing adversarially, according to a Bernoulli process. Once it reveals adversarial intent, it continues to act adversarially for all subsequent stages of the problem. The Bernoulli parameter ๐œŒ captures the adversarial intent of the second player. We model this finite stage interaction between a probabilistic adversary and the defender as a multi-stage zero-sum game, termed as the stochastic stopping state game (SSG). 
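The Bernoulli switching behavior of the second player can be made concrete with a short simulation. The sketch below (an illustration under our own naming, not code from the thesis) samples the player's type over a K-stage horizon: while benign, the player turns adversarial with probability ρ at each stage and, once adversarial, remains so for all remaining stages. The empirical fraction of runs that have turned adversarial by stage k can then be checked against the analytic value 1 − (1 − ρ)^k implied by this model.

```python
import numpy as np

def sample_player_type(K, rho, rng):
    """One realization of the second player's type over a K-stage horizon.
    While benign, the player turns adversarial with probability rho at each
    stage; once adversarial, it stays adversarial (absorbing state).
    Returns a boolean array whose k-th entry is True if the player acts
    adversarially at stage k+1."""
    adversarial = np.zeros(K, dtype=bool)
    turned = False
    for k in range(K):
        if not turned and rng.random() < rho:
            turned = True
        adversarial[k] = turned
    return adversarial

rng = np.random.default_rng(0)
runs = np.array([sample_player_type(K=10, rho=0.2, rng=rng) for _ in range(20000)])
print(runs.mean(axis=0))                    # empirical P(adversarial by stage k)
print(1 - (1 - 0.2) ** np.arange(1, 11))    # analytic 1 - (1 - rho)^k
```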
In this chapter, we extend our previous setup [18] consisting of two actions per player to (i) incorporate a finite termination threshold termed as M-SSG, and (ii) finitely many actions per player, termed as SSG๐‘šร—๐‘›, which consists of ๐‘š defender and ๐‘› second player actions with a termination threshold of 1. Every stage ๐‘˜ โˆˆ {1, 2, . . . , ๐พ } in a ๐พ horizon M-SSG and SSG๐‘šร—๐‘› are associated with two matrices: 1. ๐‘†๐‘˜ โˆˆ R๐‘šร—๐‘›: The rows and columns of this matrix correspond to the available actions for the defender and the second player if it acts adversarially. Each row represents a defenderโ€™s action, and each column represents a second playerโ€™s action. A pair of actions ๐‘– and ๐‘— chosen by the defender and adversarial player, respectively, results in a stage cost by the defender, given by the (๐‘–, ๐‘—)th entry of ๐‘†๐‘˜ , denoted by ๐‘ ๐‘– ๐‘—,๐‘˜ . 51 2. ๐‘…๐‘˜ โˆˆ R๐‘šร—1: This matrix contains a single column, wherein rows represent the defenderโ€™s actions along with their associated costs. A single column of costs correspond to a game where the second playerโ€™s action do not impact the defenderโ€™s cost. For the defenderโ€™s action ๐‘–, the stage cost is determined by the ๐‘–th row of ๐‘…๐‘˜ denoted as ๐‘Ÿ๐‘–,๐‘˜ . The discrete state of an M-SSG indicates whether the second player is playing adversarially or not. At any stage ๐‘˜ โˆˆ {1, 2, . . . , ๐พ } and state of an M-SSG, the uncertainty of player type (non-adversarial or adversarial) is captured when the game branches out with probability ๐œŒ๐‘˜ of continued adversarial intent and the complementary probability of 1 โˆ’ ๐œŒ๐‘˜ representing a non-adversarial player. For the case of ๐‘š = ๐‘› = 2, an M-SSG considers two actions per player, i.e., ๐‘†๐‘˜ โˆˆ R2ร—2 and ๐‘…๐‘˜ โˆˆ R2ร—1, ๐‘˜ โˆˆ {1, . . . , ๐พ }. The actions of the players and the corresponding entries of the stage cost matrices are given as, attack no attack no attack strong defense ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป Next, we formally define the term termination threshold and introduce the three types used in ๏ฃฎ ๐‘Ÿ1,๐‘˜ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๐‘Ÿ2,๐‘˜ ๏ฃฏ ๏ฃฏ ๏ฃฐ strong defense weak defense weak defense ๐‘ 22,๐‘˜ ๐‘ 21,๐‘˜ ๐‘ 12,๐‘˜ ๐‘ 11,๐‘˜ ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป , . the work. Definition 1 Let T denote a given subset of action pairs corresponding to the stage cost matrices of a ๐พ stage zero-sum matrix game. A termination threshold of ๐ฟ is defined as the number of times an action pair from T can be played after which the game is terminated. Figure 3.1 shows a game tree where the intent of the second player is uncertain when the set T := {{strong defense, attack}}. Here, the termination threshold ๐ฟ = 2. Figure 3.2 shows a similar game tree with the addition of finitely many number of actions available to each player with a termination threshold of ๐ฟ = 1, given T := {{๐ท1, ๐ด1}, {๐ท1, ๐ด2}, . . . , {๐ท1, ๐ด๐‘›}}. States which realize an adversarial intent (under the branch with probability ๐œŒ๐‘˜ ) continue to branch out since the second player continues to play adversarially for the rest of the game. 52 Figure 3.1 An M-SSG consisting of ๐พ stages a termination threshold of ๐ฟ = 2. The information set for the defender and second player is indicated by the dotted line and nodes taking value ๐‘‰ ๐‘– ๐‘˜ for ๐‘˜ โˆˆ {1, 2, . . . ๐พ }, ๐‘– โˆˆ {0, 1, . . . , ๐ฟ}. 
The value of an M-SSG under an adversarial intent is indicated by ๐‘‰ ๐‘˜ , ๐‘˜ โˆˆ {0, . . . , ๐พ } (see Remark 3.2.1). At every stage, the game branches with probability ๐œŒ๐‘˜ to indicate an adversarial player. Actions of an adversary (resp. defender) abbreviated as {๐ด, ๐‘ ๐ด} (resp. {๐‘†๐ท, ๐‘Š ๐ท}) for {Attack, No attack} (resp. {Strong Defense, Weak defense}). SS indicates the stopping state. Both the M-SSG and SSG๐‘šร—๐‘› are characterized by the sequence of matrices over ๐พ stages given by {{๐‘†1, ๐‘…1}, {๐‘†2, ๐‘…2}, . . . , {๐‘†๐พ, ๐‘…๐พ }}. At any stage ๐‘˜, the number of actions available to the defender and the second player are ๐‘š๐‘˜ and ๐‘›๐‘˜ , respectively, i.e., ๐‘†๐‘˜ โˆˆ R๐‘š๐‘˜ร—๐‘›๐‘˜ , and ๐‘…๐‘˜ โˆˆ R๐‘š๐‘˜ร—1, ๐‘˜ โˆˆ {1, . . . , ๐พ }. At every stage ๐‘˜, the defender (row player) and the second (column player) simultaneously select their respective actions (๐‘–, ๐‘—), leading to an expected cost ๐œŒ๐‘˜ ๐‘ ๐‘– ๐‘—,๐‘˜ + (1โˆ’ ๐œŒ๐‘˜ )๐‘Ÿ๐‘–,๐‘˜ , where ๐‘ ๐‘– ๐‘—,๐‘˜ โˆˆ ๐‘†๐‘˜ , ๐‘Ÿ๐‘–,๐‘˜ โˆˆ ๐‘…๐‘˜ , conditioned on the intent of the second player being unknown in the previous stages of the game. We assume that the game terminates at any stage ๐‘˜ โ‰ค ๐พ, whenever a set of action pairs (๐‘–, ๐‘—) โˆˆ T is played a total of ๐ฟ times. In an M-SSG or SSG๐‘šร—๐‘›, the current game state at any stage and the probability ๐œŒ๐‘˜ governing the presence of an adversary are common knowledge to both players. Since the second playerโ€™s type is governed by a Bernoulli process, we can determine the probability of revealing adversarial intent at any stage ๐‘˜ using the parameter ๐œŒ๐‘˜ . For instance, revealing adversarial intent is equivalent 53 SSAdversarial IntentIntent uncertain SSAdversaryconfirmedSS Figure 3.2 An SSG๐‘šร—๐‘› refers to a stochastic stopping state game where there are ๐‘š possible actions for the defender and ๐‘› possible actions for the second player. At stage ๐‘˜, when the game diverges with a probability of 1 โˆ’ ๐œŒ๐‘˜ , it signifies a non-adversarial player scenario where solely the actions of the defender are applicable. to flipping a coin and obtaining a heads (H), with the probability of getting heads being ๐œŒ๐‘˜ . When the SSG is played only for three stages, i.e., ๐พ = 3, the set of possible events at the final stage are given by {TTT,TTH,THT,THH,HTT,HTH,HHT,HHH}, with T indicating tails (benign type). The event of revealing adversarial intent given the second player is of type (T) in stages 1 and 2 is TTH. Therefore, when ๐ฟ = 1 and ๐œŒ = ๐œŒ๐‘˜ , โˆ€๐‘˜, the probability of revealing adversarial intent at stage 3 is (1 โˆ’ ๐œŒ)2๐œŒ, where (1 โˆ’ ๐œŒ)2 corresponds to the probability of remaining benign until stage 2. Likewise, when ๐ฟ = 2, the events TTH, THH and HTH describe the presence of an adversary in stage 3. Therefore, when ๐ฟ = 2 and ๐œŒ = ๐œŒ๐‘˜ , โˆ€๐‘˜, the probability of revealing adversarial intent at (cid:1) (1 โˆ’ ๐œŒ)2๐œŒ + (1 โˆ’ ๐œŒ)2๐œŒ. In general, any prior presence of adversary is accounted through stage 3 is (cid:0)2 1 the probability of (cid:0)๐พโˆ’1 (cid:1) (1 โˆ’ ๐œŒ) ๐‘˜โˆ’1๐œŒ, ๐‘ฅ โˆˆ {1, 2, . . . , ๐ฟ โˆ’ 1}. Using induction, for a stage varying ๐‘ฅ probability ๐œŒ๐‘˜ , we obtain the presence of an adversary any stage ๐‘˜ using the indicator function 54 SSSSSSAdversarial IntentIntent uncertain SSAdversaryconfirmedSSSS 1 : {1, 2, . . . , ๐‘š๐‘˜ } ร— {1, 2, . . . 
, ๐‘›๐‘˜ } โ†’ {0, 1} as: 1(๐‘–๐‘˜ , ๐‘—๐‘˜ ) = ๏ฃฑ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃฒ ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด ๏ฃณ with probability 1, if {๐‘–๐‘˜ , ๐‘—๐‘˜ } โˆˆ T , 0, otherwise. (cid:169) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:171) ๐ฟโˆ’1 โˆ‘๏ธ ๐‘ฅ=1 (cid:124) (cid:18)๐พ โˆ’ 1 ๐‘ฅ (cid:19) ๐‘˜โˆ’1 (cid:214) ๐‘˜โˆ’1 (cid:214) (1 โˆ’ ๐œŒ๐‘ž) + (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) ๐‘ž=1 ๐‘ž=1 (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) (1 โˆ’ ๐œŒ๐‘ž) ๐œŒ๐‘˜ , (3.1) (cid:123)(cid:122) ๐œš๐‘˜ (cid:170) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:125) (cid:172) The game terminates at a stopping state if there exists a stage ๐‘ก โ‰ค ๐พ such that (cid:205)๐‘ก ๐‘˜=1 1(๐‘–๐‘˜ , ๐‘—๐‘˜ ) = ๐ฟ. At any stage ๐‘˜, for a given pair of actions (๐‘–๐‘˜ , ๐‘—๐‘˜ ) the expected cost for the defender is computed (cid:1). Here, the term ๐œš๐‘˜ ๐œŒ๐‘˜ ๐‘ ๐‘–๐‘˜, ๐‘—๐‘˜ corresponds to the expected cost of the as, ๐œš๐‘˜ (cid:0)๐œŒ๐‘˜ ๐‘ ๐‘–๐‘˜, ๐‘—๐‘˜ + (1 โˆ’ ๐œŒ๐‘˜ )๐‘Ÿ๐‘–๐‘˜ defender when the second player is adversarial, while the term ๐œš๐‘˜ (1โˆ’ ๐œŒ๐‘˜ )๐‘Ÿ๐‘–๐‘˜ represents the expected cost for the defender when facing a non-adversarial player. Now, conditioned on a sequence of player actions {(๐‘–1, ๐‘—1), . . . , (๐‘–๐พ, ๐‘—๐พ)}, the expected cost (with respect to the Bernoulli random variable that defines adversarial intent) ๐ฝ๐พ : (cid:206)๐พ ๐‘˜=1{1, 2 . . . , ๐‘š๐‘˜ } ร— {1, 2 . . . , ๐‘›๐‘˜ } โ†’ R for the defender is given by ๐ฝ๐พ ({(๐‘–1, ๐‘—1), . . . , (๐‘–๐พ, ๐‘—๐พ)}) = ๐พ โˆ‘๏ธ ๐œš๐‘˜ (cid:0)๐œŒ๐‘˜ ๐‘ ๐‘–๐‘˜, ๐‘—๐‘˜ + (1 โˆ’ ๐œŒ๐‘˜ )๐‘Ÿ๐‘–๐‘˜ (cid:1) . (3.2) ๐‘˜=1 For both the players, we consider the space of behavioral policies. A multi-stage behavioral policy [61] for the defender and second player (non-adversarial or adversarial) are defined by a set of probability distributions Y := {๐‘ฆ1, . . . , ๐‘ฆ๐พ } โˆˆ {ฮ”๐‘š1 , ฮ”๐‘š2 , . . . , ฮ”๐‘š๐พ } and Z := {๐‘ง1, . . . , ๐‘ง๐พ } โˆˆ , . . . , ฮ”๐‘›๐พ }, respectively, where ฮ”๐‘–๐‘˜ is the probability simplex in ๐‘– dimensions at stage ๐‘˜. , ฮ”๐‘›2 {ฮ”๐‘›1 In particular, for the M-SSG, Y := {๐‘ฆ1, . . . , ๐‘ฆ๐พ } โˆˆ {ฮ”21 , ฮ”22 , . . . , ฮ”2๐พ } and Z := {๐‘ง1, . . . , ๐‘ง๐พ } โˆˆ {ฮ”21 , ฮ”22 , . . . , ฮ”2๐พ }. Remark 3.2.1 When the probability ๐œŒ๐‘˜ , โˆ€๐‘˜ โˆˆ {1, 2, . . . , ๐พ } is set to 1, an M-SSG or SSG๐‘šร—๐‘› reduces to an edge-game from our previous work [14]. 
The net expected cost ๐ฝ๐ธ : (cid:206)๐พ ๐‘–=1 ฮ”๐‘š๐‘– ร— (cid:206)๐พ ๐‘–=1 ฮ”๐‘›๐‘– โ†’ R for the defender with respect to the 55 behavioral policies {Y, Z} is given by, ๐ฝ๐ธ (Y, Z) = (cid:16) ๐œš๐‘˜ ๐พ โˆ‘๏ธ ๐‘˜=1 ๐œŒ๐‘˜ ๐‘ฆโ€ฒ ๐‘˜ ๐‘†๐‘˜ ๐‘ง๐‘˜ + (1 โˆ’ ๐œŒ๐‘˜ )๐‘ฆโ€ฒ ๐‘˜ ๐‘…๐‘˜ (cid:17) . (3.3) The goal of this chapter is to find a pair of behavioral policies (Yโˆ—, Zโˆ—) that are in Nash equilibrium [61] satisfying ๐ฝ๐ธ (Yโˆ—, Z) โ‰ค ๐ฝ๐ธ (Yโˆ—, Zโˆ—) โ‰ค ๐ฝ๐ธ (Y, Zโˆ—). Since this a complete information full feedback game, there always exists a behavioral saddle point [61]. We denote the outcome of any SSG (M-SSG or SSG๐‘šร—๐‘›) as ๐ธ := ๐ฝ๐ธ (Yโˆ—, Zโˆ—). ๐ฝโˆ— In the following sections we will derive an analytical and numerical method to solve both the M-SSG and SSG๐‘šร—๐‘›. 3.3 Solution to the M-SSG In this section, we will solve and analyze the M-SSG. Limiting the number of actions to two enables us to determine a closed-form expression for the value of the game and the corresponding player policies. The set T indicates the condition for terminating the M-SSG. For the M-SSG, we define the set T := {{strong defense, attack}}. For the given set T and a finite termination threshold ๐ฟ, we will present a procedure to compute the outcome ๐ฝ๐ธ defined in equation (3.3), resulting in a Nash equilibrium. The value of a zero-sum matrix game defined by the matrix ๐‘‹ is given by Val(๐‘‹) := min๐‘ฆ๐‘˜ โˆˆฮ”๐‘š๐‘˜ max๐‘ง๐‘˜ โˆˆฮ”๐‘›๐‘˜ second player policies, respectively. For any ๐‘– โ‰ฅ 0, let ๐‘‰ ๐‘– ๐‘˜โˆ’1 denote the mixed value of an M-SSG with a termination threshold of ๐ฟ โˆ’ ๐‘– at time instant ๐‘˜ โˆ’ 1. Then, there are two possibilities โ€“ either ๐‘ฆโ€ฒ ๐‘˜ ๐‘‹ ๐‘ง๐‘˜ , where ๐‘ฆ๐‘˜ and ๐‘ง๐‘˜ are the space of defender and the pair of actions played at time ๐‘˜ belong to the set T or not. Thus, using backward iteration, ๐‘˜โˆ’1 is a linear combination of ๐‘‰ ๐‘–+1 ๐‘‰ ๐‘– the value of an edge-game, ๐‘‰ ๐‘˜ [14] with probability ๐œŒ๐‘˜ and ๐‘‰ ๐ฟ ๐‘˜ . When ๐‘– = ๐ฟ, the value of the M-SSG depends on ๐‘˜ with probability 1 โˆ’ ๐œŒ๐‘˜ . The next and ๐‘‰ ๐‘– ๐‘˜ subsubsection formalizes this intuitive description. 56 3.3.1 Nash equilibria and value of the game For the given set T , we define two matrices, ๐น = 1 0 0 0 ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป , and ๐ท = 0 1 1 1 ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ , ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป which encodes the event that the action pair from T was used or not used at any stage of the M-SSG, respectively (cf. Figure 3.1 for visualization). Notice the entry of 1 in the matrix ๐น and 0 in the matrix ๐ท correspond to the action pair in the set T . The following Bellman equation will show how the matrices ๐น and ๐ท are incorporated with the stage cost matrices. A standard technique to solve such games using the cost-to-go function (e.g., see [61]) is to compute the solution of the Bellman equation backward in time, ๐‘‰ ๐‘– ๐‘˜โˆ’1 = (cid:16) (cid:16) Val Val ๏ฃฑ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃฒ ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด ๏ฃณ ๐œŒ๐‘˜ (๐‘‰ ๐‘–+1 ๐‘˜ ๐น + ๐‘‰ ๐‘– ๐‘˜ ๐ท + ๐‘†๐‘˜ ) + (1 โˆ’ ๐œŒ๐‘˜ ) (๐‘‰ ๐‘– ๐‘˜ 1 + หœ๐‘…๐‘˜ ) (cid:17) , for ๐‘– โ‰  ๐ฟ, (3.4) ๐œŒ๐‘˜ (๐‘‰ ๐‘˜ ๐ท + ๐‘†๐‘˜ ) + (1 โˆ’ ๐œŒ๐‘˜ ) (๐‘‰ ๐‘– ๐‘˜ 1 + หœ๐‘…๐‘˜ ) (cid:17) , ๐‘– = ๐ฟ, where ๐‘˜ โˆˆ {๐พ, ๐พ โˆ’ 1, . . . , 1} is the stage, ๐‘– โˆˆ {0, 1, . . . 
, ๐ฟ} is the number of times an action pair from T was used, ๐‘‰ ๐‘˜ is the value of the edge-game, ๐‘‰ ๐‘– ๐‘˜ is the value of M-SSG with a threshold of ๐ฟ โˆ’ ๐‘– respectively, 1 โˆˆ R2ร—2 is the matrix of ones, หœ๐‘…๐‘˜ โˆˆ R2ร—2 is a matrix with repeated column entries to convert the vector ๐‘…๐‘˜ to a matrix. The following mild assumption enables us to analyze the M-SSG and derive closed-form expressions for the value of the game and player policies. Assumption 3.3.1 The following stage cost inequalities hold at any stage ๐‘˜ of the M-SSG, ๐‘ 21,๐‘˜ > ๐‘ 12,๐‘˜ โ‰ฅ ๐‘ 11,๐‘˜ > ๐‘ 22,๐‘˜ โ‰ฅ 0 and ๐‘Ÿ1,๐‘˜ > ๐‘Ÿ2,๐‘˜ โ‰ฅ 0. Assumption 3.3.1 is naturally applicable in security-related scenarios. It implies that the cost associated with implementing a strong defense is lower than the cost of using a weak defense against an adversarial player. Similarly, the cost corresponding to a strong defense is higher than a weak defense against a non-adversarial player. 57 Remark 3.3.2 To incorporate costs once a stopping state is reached, we can optionally augment the stage cost entries corresponding to the action pair in T for ๐‘‰ ๐ฟ cost over the remaining stages, such as (cid:205)๐พ ๐‘˜ , i.e., augment ๐‘ 11,๐‘˜ with a fixed ๐‘—=๐‘˜ ๐‘ 22,๐‘˜ . In the analysis of the M-SSG, we do not consider such an augmentation of stage costs. To solve (3.4), we must first determine the value of an edge-game corresponding to the case of ๐‘– = ๐ฟ. This case is equivalent to the setting of ๐œŒ๐‘˜ = 1, โˆ€๐‘˜ โˆˆ {1, 2, . . . , ๐พ }. For the case of two actions per player, under Assumption 3.3.1, the value of the edge-game is computed recursively [14] using ๐‘ 11,๐‘˜ ๐‘ 22,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ ๐‘ 21,๐‘˜ โˆ’ ๐‘ 22,๐‘˜๐‘‰ ๐‘˜ ๐‘ 11,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ โˆ’ ๐‘ 21,๐‘˜ + ๐‘ 22,๐‘˜ โˆ’ ๐‘‰ ๐‘˜ with ๐‘‰๐พ = 0. Next, the recursion (3.4) for the case of ๐‘– โ‰  ๐ฟ is given by ๐‘‰ ๐‘˜โˆ’1 = ๐‘‰ ๐‘˜ + , (3.5) + (cid:33) . 
(3.6) ๐‘‰ ๐‘– ๐‘˜โˆ’1 = Val (cid:32) ๐œŒ๐‘˜๐‘‰ ๐‘–+1 ๐‘˜ +๐œŒ๐‘˜๐‘‰ ๐‘– ๐‘˜ 1 0 ๏ฃฎ ๏ฃน ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ 0 0 ๏ฃฏ ๏ฃบ (cid:32) (cid:32) ๏ฃฐ ๏ฃป (cid:123)(cid:122) (cid:125) (cid:124) ๐น +๐œŒ๐‘˜ ๏ฃฎ ๏ฃน 0 1 ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ 1 1 ๏ฃฏ ๏ฃบ (cid:32) (cid:32) ๏ฃฐ ๏ฃป (cid:123)(cid:122) (cid:125) (cid:124) ๐ท ๐‘ 11,๐‘˜ ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๐‘ 21,๐‘˜ ๏ฃฏ ๏ฃฏ (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) ๏ฃฐ (cid:124) ๐‘ 12,๐‘˜ ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๐‘ 22,๐‘˜ ๏ฃบ ๏ฃบ (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) ๏ฃป (cid:125) (1 โˆ’ ๐œŒ๐‘˜ )๐‘‰ ๐‘– ๐‘˜ 1 1 1 1 ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป + (1 โˆ’ ๐œŒ๐‘˜ ) (cid:123)(cid:122) ๐‘†๐‘˜ ๐‘Ÿ1,๐‘˜ ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๐‘Ÿ2,๐‘˜ ๏ฃบ ๏ฃบ (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) ๏ฃป (cid:125) ๏ฃฎ ๐‘Ÿ1,๐‘˜ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๐‘Ÿ2,๐‘˜ ๏ฃฏ ๏ฃฏ (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) ๏ฃฐ (cid:124) (cid:123)(cid:122) หœ๐‘…๐‘˜ Similarly, when ๐‘– = ๐ฟ, the value of an M-SSG at any stage ๐‘˜ is given by ๐‘‰ ๐ฟ ๐‘˜โˆ’1 = Val (cid:32) ๐œŒ๐‘˜๐‘‰ ๐‘˜ +๐œŒ๐‘˜ ๏ฃฎ ๏ฃน 0 1 ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ ๏ฃบ ๏ฃฏ 1 1 ๏ฃบ ๏ฃฏ (cid:32) (cid:32) ๏ฃฐ ๏ฃป (cid:123)(cid:122) (cid:125) (cid:124) ๐ท ๐‘ 11,๐‘˜ ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๐‘ 21,๐‘˜ ๏ฃฏ ๏ฃฏ (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) ๏ฃฐ (cid:124) + ๐‘ 12,๐‘˜ ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๐‘ 22,๐‘˜ ๏ฃบ ๏ฃบ (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) ๏ฃป (cid:125) (cid:123)(cid:122) ๐‘†๐‘˜ (1 โˆ’ ๐œŒ๐‘˜ )๐‘‰ ๐ฟ ๐‘˜ + (1 โˆ’ ๐œŒ๐‘˜ ) 1 1 1 1 ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป ๏ฃฎ ๐‘Ÿ1,๐‘˜ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๐‘Ÿ2,๐‘˜ ๏ฃฏ ๏ฃฏ (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) ๏ฃฐ (cid:124) ๐‘Ÿ1,๐‘˜ ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๐‘Ÿ2,๐‘˜ ๏ฃบ ๏ฃบ (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) ๏ฃป (cid:125) (cid:123)(cid:122) หœ๐‘…๐‘˜ (cid:33) . (3.7) A quick comparison of (3.6) and (3.7) reveals that due to the termination threshold, the quantities ๐‘‰ ๐‘˜ and ๐‘‰ ๐‘– ๐‘˜ get coupled, โˆ€๐‘˜ โˆˆ {1, 2, . . . , ๐พ } and โˆ€๐‘– โˆˆ {1, 2, . . . , ๐ฟ}. Thus, the expected value of the 58 M-SSG at any stage ๐‘˜ is given by, ๐‘‰ ๐‘– ๐‘˜โˆ’1 = โ€ฒ (cid:16) โ€ฒ (cid:16) ๐‘ฆ๐‘–,โˆ— ๐‘˜ ๐‘ฆ๐‘–,โˆ— ๐‘˜ ๏ฃฑ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃฒ ๏ฃด๏ฃด๏ฃด๏ฃด ๏ฃณ ๐œŒ๐‘˜ (๐‘‰ ๐‘–+1 ๐‘˜ ๐น + ๐‘‰ ๐‘– ๐‘˜ ๐ท + ๐‘†๐‘˜ ) + (1 โˆ’ ๐œŒ๐‘˜ ) (๐‘‰ ๐‘– ๐‘˜ 1 + หœ๐‘…๐‘˜ ) (cid:17) ๐œŒ๐‘˜ (๐‘‰ ๐‘˜ ๐ท + ๐‘†๐‘˜ ) + (1 โˆ’ ๐œŒ๐‘˜ ) (๐‘‰ ๐‘– ๐‘˜ 1 + หœ๐‘…๐‘˜ ) ๐‘ง๐‘–,โˆ— ๐‘˜ , (cid:17) ๐‘ง๐‘–,โˆ— ๐‘˜ , if ๐‘– โ‰  ๐ฟ, otherwise, (3.8) where {๐‘ฆ๐‘–,โˆ— ๐‘˜ , ๐‘ง๐‘–,โˆ— ๐‘– โˆˆ {0, 1, . . . , ๐ฟ}. ๐‘˜ } is a Nash equilibrium pair at stage ๐‘˜ when the termination threshold equals While a mixed Nash equilibrium always exists, it is computationally more efficient to identify whether a pure Nash equilibrium exists at any given stage. Therefore, to aid the search of a pure Nash equilibrium, we present the following result. We derive a general result of switching between mixed and pure policies, corresponding to the stage cost matrices and the Bernoulli parameter ๐œŒ๐‘˜ , which will be used in the subsequent results. 
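Before stating the switching result, we note that one backward step of the recursion (3.4) is straightforward to evaluate numerically. The sketch below (our own illustration, assuming two actions per player and a row-minimizing defender; the function and variable names are not from the thesis code) assembles the stage matrix of (3.6) or (3.7) and returns its value, falling back to the closed-form mixed value when no pure saddle point exists.

```python
import numpy as np

F = np.array([[1.0, 0.0], [0.0, 0.0]])   # 1 marks the action pair in T (counter advances)
D = np.array([[0.0, 1.0], [1.0, 1.0]])   # 1 marks pairs outside T (counter unchanged)
ONES = np.ones((2, 2))

def val_2x2(M):
    """Value of a 2x2 zero-sum cost matrix (row player minimizes, column player
    maximizes): a pure saddle point if one exists, else the closed-form mixed value."""
    lower = M.min(axis=0).max()           # best the attacker can guarantee
    upper = M.max(axis=1).min()           # best the defender can guarantee
    if np.isclose(lower, upper):
        return lower
    a11, a12, a21, a22 = M.ravel()
    return (a11 * a22 - a12 * a21) / (a11 - a12 - a21 + a22)

def mssg_backward_step(V_i, V_ip1, V_edge, S, R, rho, at_threshold):
    """One step of (3.4): returns V^i_{k-1} from the stage-k quantities.
    V_i and V_ip1 stand for V^i_k and V^{i+1}_k, V_edge is the edge-game value,
    and R is the 2-vector of costs against a benign player (tiled into R~)."""
    R_tilde = np.tile(np.asarray(R, dtype=float).reshape(2, 1), (1, 2))
    if at_threshold:   # i = L: one more pair from T stops the game
        M = rho * (V_edge * D + S) + (1 - rho) * (V_i * ONES + R_tilde)
    else:              # i != L
        M = rho * (V_ip1 * F + V_i * D + S) + (1 - rho) * (V_i * ONES + R_tilde)
    return val_2x2(M)
```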
Lemma 3.3.3 Given a Bernoulli parameter ๐œŒ and stage cost matrices: ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป where ๐ต corresponds to costs when playing against an adversarial player and ๐ถ corresponds costs ๐‘21 ๐‘22 ๐‘11 ๐‘12 ๐‘1 ๐‘1 ๐‘2 ๐‘2 , ๐ถ = ๐ต = ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป when playing against a non-adversarial player, satisfying the inequalities: ๐‘21 > ๐‘12 โ‰ฅ ๐‘11 > ๐‘22 โ‰ฅ 0 and ๐‘1 > ๐‘2 โ‰ฅ 0. If where 0 < ๐œŒ < ห†๐œŒ, ห†๐œŒ := ๐‘1 โˆ’ ๐‘2 ๐‘21 โˆ’ ๐‘11 โˆ’ ๐‘2 + ๐‘1 . (3.9) Then, there exists a pure policy Nash equilibrium action pair of {weak defense, attack} for the zero sum game defined by the ๐œŒ๐ต + (1 โˆ’ ๐œŒ)๐ถ, โ–ก 59 Proof: When ๐œŒ = 1, the matrix ๐œŒ๐ต + (1 โˆ’ ๐œŒ)๐ถ simplifies to just the matrix ๐ต, which leads to a mixed strategy Nash equilibrium [132] (no row or column domination). When ๐œŒ = 0, the second playerโ€™s action does not impact the cost. When 0 < ๐œŒ < 1, we examine the entries of the matrix ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๐œŒ๐‘11 + (1 โˆ’ ๐œŒ)๐‘1 ๐œŒ๐‘12 + (1 โˆ’ ๐œŒ)๐‘1 ๐œŒ๐‘21 + (1 โˆ’ ๐œŒ)๐‘2 ๐œŒ๐‘22 + (1 โˆ’ ๐œŒ)๐‘2 . ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป (3.10) Based on the entries in (3.10), the entry ๐œŒ๐‘22 + (1 โˆ’ ๐œŒ)๐‘2 is the smallest entry of the matrix. As ๐œŒ โ†’ 0, the defender switches to the pure policy of weak defense. Thus, we need to determine the value of ๐œŒ for which a row domination (weak defense) occurs. This condition corresponds to: ๐œŒ๐‘21 + (1 โˆ’ ๐œŒ)๐‘2 < ๐œŒ๐‘11 + (1 โˆ’ ๐œŒ)๐‘1, โ‡’ ๐œŒ(๐‘21 โˆ’ ๐‘11 โˆ’ ๐‘2 + ๐‘1) < ๐‘1 โˆ’ ๐‘2, which upon further simplification, leads to equation (3.9). โ–ก Lemma 3.3.3 outlines a condition that dictates the shift from a mixed policy to a pure policy Nash equilibrium. This result will help us derive the conditions for a termination threshold in the following Theorem. Theorem 4 summarizes the analytic expressions for the Nash equilibria in behavioral policies and the corresponding value at any stage ๐‘˜ for any given ๐œŒ๐‘˜ . Theorem 3.3.4 The Nash equilibrium policies at any stage ๐‘˜ โˆˆ {1, . . . 
, ๐พ } for a given M-SSG with a termination threshold of ๐ฟ, stage cost matrices ๐‘†๐‘˜ , and หœ๐‘…๐‘˜ := ๏ฃฎ ๐‘Ÿ1,๐‘˜ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๐‘Ÿ2,๐‘˜ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๐‘Ÿ1,๐‘˜ ๐‘Ÿ2,๐‘˜ ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป = ๐‘ 12,๐‘˜ ๐‘ 22,๐‘˜ ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๐‘ 12,๐‘˜ ๐‘ 22,๐‘˜ , ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป 60 under Assumption 3.3.1 are given by: ๐‘ฆ๐‘–,โˆ— ๐‘˜ = ๐‘ง๐‘–,โˆ— ๐‘˜ = ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๏ฃณ ๏ฃฑ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃฒ ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด ๏ฃฑ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃฒ ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๐‘ 22,๐‘˜ โˆ’ ๐‘ 21,๐‘˜ (cid:101)๐‘‰๐‘˜ ๐‘ 11,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ + ๐‘ 22,๐‘˜ + ๐‘‰ ๐‘–+1 ๐‘˜ โˆ’ ๐‘‰ ๐‘– ๐‘˜ (cid:101)๐‘‰๐‘˜ ๐‘ 22,๐‘˜ โˆ’ ๐‘ 21,๐‘˜ (cid:98)๐‘‰๐‘˜ ๐‘ 11,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ โˆ’ ๐‘‰ ๐‘˜ (cid:98)๐‘‰๐‘˜ ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ , ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป [0 1]โ€ฒ , , if ๐‘– โ‰  ๐ฟ, and ๐œŒ๐‘˜ โ‰ฅ หœ๐œŒ๐‘– ๐‘˜ , ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป if ๐‘– = ๐ฟ and ๐œŒ๐‘˜ โ‰ฅ หœ๐œŒ๐ฟ ๐‘˜ , (3.11) ๐‘ 22,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ ๐œŒ๐‘˜ (cid:101)๐‘‰๐‘˜ ๐‘ 11,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ + ๐‘ 22,๐‘˜ + ๐‘‰ ๐‘–+1 ๐‘˜ โˆ’ ๐‘‰ ๐‘– ๐‘˜ ๐œŒ๐‘˜ (cid:101)๐‘‰๐‘˜ ๐‘ 22,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ ๐œŒ๐‘˜ (cid:98)๐‘‰๐‘˜ ๐‘ 11,๐‘˜ โˆ’ ๐‘ 21,๐‘˜ โˆ’ ๐‘‰ ๐‘˜ ๐œŒ๐‘˜ (cid:98)๐‘‰๐‘˜ ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ , ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป [1 0]โ€ฒ , otherwise, , if ๐‘– โ‰  ๐ฟ, and ๐œŒ๐‘˜ โ‰ฅ หœ๐œŒ๐‘– ๐‘˜ , ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป if ๐‘– = ๐ฟ and ๐œŒ๐‘˜ โ‰ฅ หœ๐œŒ๐ฟ ๐‘˜ , (3.12) otherwise, where ๐‘‰๐พ = 0, (cid:101)๐‘‰๐‘˜ := ๐‘‰ ๐‘–+1 ๏ฃณ ๐‘˜ + ๐‘ 11,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ โˆ’ ๐‘ 21,๐‘˜ + ๐‘ 22,๐‘˜ โˆ’ ๐‘‰ ๐‘– ๐‘˜ , (cid:98)๐‘‰๐‘˜ := ๐‘ 11,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ โˆ’ ๐‘ 21,๐‘˜ + ๐‘ 22,๐‘˜ โˆ’ ๐‘‰ ๐‘˜ , and หœ๐œŒ๐‘– ๐‘˜ := ๐‘ 22,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ (cid:101)๐‘‰๐‘˜ , หœ๐œŒ๐ฟ ๐‘˜ := ๐‘ 22,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ (cid:98)๐‘‰๐‘˜ , โˆ€๐‘– โˆˆ {1, 2, . . . , ๐ฟ}. (3.13) The value of the game at stage ๐‘˜ satisfies ๐‘˜ โˆ’ ๐‘‰ ๐‘– ๐‘ 22,๐‘˜ (๐‘‰ ๐‘–+1 ๐‘˜ ) + det(()๐‘†๐‘˜ ) ๐‘‰ ๐‘– ๐‘˜ + ๐‘˜ + ๐‘ 11,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ โˆ’ ๐‘ 21,๐‘˜ + ๐‘ 22,๐‘˜ โˆ’ ๐‘‰ ๐‘– ๐‘‰ ๐‘–+1 ๐‘˜ det(()๐‘†๐‘˜ ) โˆ’ ๐‘ 22,๐‘˜๐‘‰ ๐‘˜ (cid:98)๐‘‰๐‘˜ ๐œŒ๐‘˜๐‘‰ ๐‘˜ + (1 โˆ’ ๐œŒ๐‘˜ )๐‘‰ ๐‘– ๐‘˜ + , ๐‘‰ ๐‘– ๐‘˜ + ๐œŒ๐‘˜ ๐‘ 21,๐‘˜ + (1 โˆ’ ๐œŒ๐‘˜ )๐‘ 22,๐‘˜ , , if ๐‘– โ‰  ๐ฟ, and ๐œŒ๐‘˜ โ‰ฅ หœ๐œŒ๐‘– ๐‘˜ if ๐‘– = ๐ฟ and ๐œŒ๐‘˜ โ‰ฅ หœ๐œŒ๐ฟ ๐‘˜ , if ๐‘– โ‰  ๐ฟ, and ๐œŒ๐‘˜ < หœ๐œŒ๐‘– ๐‘˜ , ๐‘‰ ๐‘– ๐‘˜โˆ’1 = ๐œŒ๐‘˜ (๐‘ 21,๐‘˜ + ๐‘‰ ๐‘˜ ) + (1 โˆ’ ๐œŒ๐‘˜ ) (๐‘ 22,๐‘˜ + ๐‘‰ ๐ฟ ๐‘˜ ), otherwise, where det(() ๐‘‹) is the determinant of the matrix ๐‘‹. 61 (3.14) โ–ก ๏ฃฑ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃฒ ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด ๏ฃณ Proof: We derive this result by first considering the case of ๐‘– โ‰  ๐ฟ. 
Given any 2 ร— 2 zero-sum game matrix ๐‘ˆ = ๏ฃฎ ๐‘ข1 ๐‘ข2 ๏ฃฏ ๏ฃฏ ๏ฃฏ ๐‘ข3 ๐‘ข4 ๏ฃฏ ๏ฃฏ ๏ฃฐ ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป that does not admit any row or column domination, the unique Nash equilibrium mixed policy of the row (๐œ‹โˆ— row) and column player(๐œ‹โˆ— col), and the value of the game (see e.g., [132]) are given by ๐‘ข4โˆ’๐‘ข2 ๐‘ข1โˆ’๐‘ข2+๐‘ข4โˆ’๐‘ข3 ๐‘ข4โˆ’๐‘ข3 ๐‘ข1โˆ’๐‘ข2+๐‘ข4โˆ’๐‘ข3 ๐‘ข1โˆ’๐‘ข2 ๐‘ข1โˆ’๐‘ข2+๐‘ข4โˆ’๐‘ข3 ๐œ‹โˆ— row = ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ Val(๐‘ˆ) = ๐œ‹โˆ—โ€ฒ ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป ๐‘ˆ๐œ‹โˆ— col = row , ๐œ‹โˆ— col = ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๐‘ข1โˆ’๐‘ข3 ๐‘ข1โˆ’๐‘ข2+๐‘ข4โˆ’๐‘ข3 ๐‘ข1๐‘ข4 โˆ’ ๐‘ข2๐‘ข3 ๐‘ข1 โˆ’ ๐‘ข2 + ๐‘ข4 โˆ’ ๐‘ข3 ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป , . (3.15) (3.16) Rewriting the matrix in the argument of the Val(.) operator from (3.6) in a compact form, we obtain the terms ๐‘ข1 = ๐œŒ๐‘˜ (๐‘‰ ๐‘–+1 ๐‘˜ โˆ’ ๐‘‰ ๐‘– ๐‘˜ + ๐‘ 11,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ ) + ๐‘‰ ๐‘– ๐‘˜ + ๐‘ 12,๐‘˜ , ๐‘ข2 = ๐‘‰ ๐‘– ๐‘˜ + ๐‘ 12,๐‘˜ , ๐‘ข3 = ๐œŒ๐‘˜ (๐‘ 21,๐‘˜ โˆ’ ๐‘ 22,๐‘˜ ) + ๐‘‰ ๐‘– ๐‘˜ + ๐‘ 22,๐‘˜ , ๐‘ข4 = ๐‘‰ ๐‘– ๐‘˜ + ๐‘ 22,๐‘˜ . (3.17) Substituting (3.17) in (3.15) and canceling the common terms including the probability ๐œŒ๐‘˜ in both the numerator and denominator, we obtain the defender policy for ๐‘– โ‰  ๐ฟ as, ๐‘ฆ๐‘–โˆ— ๐‘˜ = ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ (๐‘ 21,๐‘˜ โˆ’ ๐‘ 22,๐‘˜ ) ๐‘˜ + ๐‘ 11,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ โˆ’ ๐‘ 21,๐‘˜ + ๐‘ 22,๐‘˜ โˆ’ ๐‘‰ ๐‘– ๐‘˜ ) (๐‘‰ ๐‘–+1 (๐‘‰ ๐‘–+1 ๐‘˜ โˆ’ ๐‘‰ ๐‘– ๐‘˜ + ๐‘ 11,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ ) (๐‘‰ ๐‘–+1 ๐‘˜ + ๐‘ 11,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ โˆ’ ๐‘ 21,๐‘˜ + ๐‘ 22,๐‘˜ โˆ’ ๐‘‰ ๐‘– ๐‘˜ ) , ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป and the second player policy in (3.12) (derivation skipped for brevity). Similarly, substituting (3.17) in (3.16) we obtain the value of game at stage ๐‘˜ and for any ๐‘– โ‰  ๐ฟ, (cid:16) ๐‘˜ (cid:101)๐‘‰๐‘˜ +๐‘ 22,๐‘˜ (๐‘‰ ๐‘–+1 ๐‘‰ ๐‘– ๐‘˜ โˆ’ ๐‘‰ ๐‘– ๐‘˜ + ๐‘ 11,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ ) โˆ’ ๐‘ 12,๐‘˜ (๐‘ 21,๐‘˜ โˆ’ ๐‘ 22,๐‘˜ ) (cid:17) (cid:26)(cid:26)๐œŒ๐‘˜ ๐‘‰ ๐‘– ๐‘˜โˆ’1 = = ๐‘˜ (cid:101)๐‘‰๐‘˜ + ๐‘ 22,๐‘˜ (๐‘‰ ๐‘–+1 ๐‘˜ โˆ’ ๐‘‰ ๐‘– ๐‘‰ ๐‘– (cid:101)๐‘‰๐‘˜ (cid:26)(cid:26)๐œŒ๐‘˜ (cid:101)๐‘‰๐‘˜ ๐‘˜ ) + det(๐‘†๐‘˜ ) , 62 where (cid:101)๐‘‰๐‘˜ := ๐‘‰ ๐‘–+1 for the case of ๐‘– โ‰  ๐ฟ. Notice the dependency on probability ๐œŒ๐‘˜ is eliminated in computing the value ๐‘˜ and upon further simplification, we obtain (3.14) ๐‘˜ + ๐‘ 11,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ โˆ’ ๐‘ 21,๐‘˜ + ๐‘ 22,๐‘˜ โˆ’ ๐‘‰ ๐‘– of M-SSG for ๐‘– = 1, 2, . . . , ๐ฟ โˆ’ 1. The dependency is coupled only via the case ๐‘– = ๐ฟ, which is solved using the results in [18] (Theorem 4.4) and are applied to (3.7). ๐œŒ(๐‘‰ ๐‘–+1 Next, we derive the expressions for the probabilities หœ๐œŒ๐‘– ๐‘˜ ๐ท + ๐‘†๐‘˜ ) for matrix ๐ต, and (1 โˆ’ ๐œŒ) (๐‘‰ ๐‘– ๐‘˜ ๐น +๐‘‰ ๐‘– ๐‘˜ . For ๐‘– โ‰  ๐ฟ, we substitute the matrix ๐‘˜ 1 + หœ๐‘…๐‘˜ ) for matrix ๐ถ in Lemma 3.3.3, to obtain หœ๐œŒ๐‘–. Similarly, for the case where ๐‘– = ๐ฟ, we utilize the matrix ๐œŒ๐‘˜ (๐‘‰ ๐‘˜ ๐ท + ๐‘†๐‘˜ ) in place of matrix ๐ต, and ๐‘˜ 1 + หœ๐‘…๐‘˜) in place of matrix ๐ถ in Lemma 3.3.3 to yield the probability หœ๐œŒ๐ฟ. Combining (1 โˆ’ ๐œŒ๐‘˜ )(๐‘‰ ๐‘– both scenarios, we arrive at the expression (3.13). 
The value of the game corresponding to the pure policy pair are given by ๐œŒ(๐‘‰ ๐‘– ๐‘˜ + ๐‘ 21,๐‘˜ ) + (1 โˆ’ ๐œŒ๐‘˜ ) (๐‘‰ ๐‘– ๐‘˜ + ๐‘ 22,๐‘˜ ), ๐‘– โ‰  ๐ฟ, ๐œŒ(๐‘‰ ๐‘˜ + ๐‘ 21,๐‘˜ ) + (1 โˆ’ ๐œŒ๐‘˜ ) (๐‘‰ ๐‘– ๐‘˜ + ๐‘ 22,๐‘˜ ), ๐‘– = ๐ฟ. Now we show that หœ๐œŒ๐ฟ ๐‘˜ โˆˆ [0, 1]. The probability threshold หœ๐œŒ๐ฟ ๐‘˜ from Theorem 3.3.4 is defined as หœ๐œŒ๐ฟ ๐‘˜ := ๐‘ 22,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ ๐‘ 11,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ โˆ’ ๐‘ 21,๐‘˜ + ๐‘ 22,๐‘˜ โˆ’ ๐‘‰ ๐‘˜ . Under Assumption 3.3.1 we have ๐‘ 11,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ โˆ’ ๐‘ 21,๐‘˜ + ๐‘ 22,๐‘˜ < 0 and ๐‘ 22,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ < 0. For the final stage ๐พ, ๐‘‰ ๐พ = 0, therefore, ๐‘ 11,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ โˆ’ ๐‘ 21,๐‘˜ + ๐‘ 22,๐‘˜ < ๐‘ 22,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ โ‰ค 0. From (3.5) for the stage ๐พ โˆ’ 1, we have ๐‘‰ ๐พโˆ’1 = ๐‘ 11,๐‘˜ ๐‘ 22,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ ๐‘ 21,๐‘˜ ๐‘ 11,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ โˆ’ ๐‘ 21,๐‘˜ + ๐‘ 22,๐‘˜ . Under Assumption 3.3.1, ๐‘ 11,๐‘˜ ๐‘ 22,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ ๐‘ 21,๐‘˜ < 0, and from the recursion (3.5), we infer that ๐‘‰ ๐‘˜ is a monotonically increasing function from stage ๐‘˜ = ๐พ to ๐‘˜ = 0. Therefore, ๐‘ 11,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ โˆ’ ๐‘ 21,๐‘˜ + ๐‘ 22,๐‘˜ โˆ’ ๐‘‰ ๐‘˜ < ๐‘ 22,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ โ‰ค 0. 63 Thus, we conclude that หœ๐œŒ๐ฟ ๐‘˜ โˆˆ [0, 1]. Under Assumption 3.3.1, หœ๐œŒ๐‘– ๐‘˜ โˆˆ [0, 1], ๐‘– โˆˆ {1, 2, . . . , ๐ฟ โˆ’ 1} if ๐‘ 11,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ โˆ’ ๐‘ 21,๐‘˜ + ๐‘ 22,๐‘˜ โˆ’ ๐‘‰ ๐‘– ๐‘˜ + ๐‘‰ ๐‘–+1 ๐‘˜ < ๐‘ 22,๐‘˜ โˆ’ ๐‘ 12,๐‘˜ . (3.18) The value of the game and player policies for the case of ๐‘– = ๐ฟ are obtained analogous to the case of ๐‘– โ‰  ๐ฟ. Using the zero-sum matrix from (3.7) in (3.16) and (3.15), we obtain (3.11), (3.12) and (3.14) for the case of ๐‘– = ๐ฟ. Combining both the cases of ๐‘– โ‰  ๐ฟ and ๐‘– = ๐ฟ along with the probabilities หœ๐œŒ๐‘–, we obtain the complete case of (3.11), (3.12) and (3.14). โ–ก Theorem 3.3.4 provides a closed-form M-SSG solution along with the player policies with a termination threshold ๐ฟ. Such a solution provides computational efficiency, and a switching policy indicates a clear trade-off between costs and security. (a) (b) Figure 3.3 (a) Value of an M-SSG vs. edge-game (๐œŒ = 1) over stages ๐พ for หœ๐‘ 2,๐‘˜ = 0.3 and หœ๐‘ 1,๐‘˜ = 1.25, โˆ€๐‘˜ โˆˆ {1, 2, . . . , ๐พ } with termination threshold of ๐ฟ = 2 and 4. (b) Probability parameter หœ๐œŒ๐ฟ ๐‘˜ for the same set of parameters. We can further simplify the recursion obtained from Theorem 3.3.4 by parameterizing the stage cost matrices in terms of a uniform strong defense cost. The value of the M-SSG, player policies, and a numerical evaluation for the parameterized stage costs are discussed as follows. 64 51015206WDJHVK5101520V0010.911.011.17.58.08.59.09.5 = 0.50, L = 2 = 0.50, L = 4 = 0.75, L = 2 = 0.75, L = 4 = 1.00, L = 2 = 1.00, L = 451015200.20.40.60.81 = 0.50 = 0.75 = 1.00 (a) (b) Figure 3.4 (a) Probability of choosing strong defense action when ๐‘– = ๐ฟ for the M-SSG and edge-game (๐œŒ = 1), solved for ๐พ = 20 using the same set of parameters, หœ๐‘ 2,๐‘˜ , หœ๐‘ 1,๐‘˜ , and ๐ฟ. (b) Probability of choosing attack action when ๐‘– = ๐ฟ for the M-SSG and edge-game (๐œŒ = 1) with the same parameters of stages ๐พ, หœ๐‘ 2,๐‘˜ , หœ๐‘ 1,๐‘˜ , and ๐ฟ. 3.3.2 Parameterized stage cost and numerical evaluation Following Theorem 3.3.4, we parameterize the stage cost matrix ๐‘†๐‘˜ , for ๐‘˜ โˆˆ {1, 2, . . . 
, ๐พ } with หœ๐‘ 1,๐‘˜ and หœ๐‘ 2,๐‘˜ as, ๐‘†๐‘˜ = ๐‘ 11,๐‘˜ 1 1 หœ๐‘ 1,๐‘˜ หœ๐‘ 2,๐‘˜ ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป , where หœ๐‘ 1,๐‘˜ := ๐‘ 21,๐‘˜ ๐‘ 11,๐‘˜ , and หœ๐‘ 2,๐‘˜ := ๐‘ 22,๐‘˜ ๐‘ 11,๐‘˜ , (3.19) ๐‘Ÿ1,๐‘˜ = ๐‘ 11,๐‘˜ , and ๐‘Ÿ2,๐‘˜ = หœ๐‘ 2,๐‘˜ . A uniform strong defense cost indicates that the amount of resources spent corresponding to such an action is independent of the action taken by the adversarial player. The condition of หœ๐‘ 1,๐‘˜ โ‰ฅ 1 and หœ๐‘ 2,๐‘˜ < 1 follows from Assumption 3.3.1. The parameterized matrix 65 510152000.20.40.60.81 = 1.00, L =4 = 1.00, L =2 = 0.75, L =4 = 0.75, L =2 = 0.50, L =4 = 0.50, L =2510152000.20.40.60.81 = 1.00, L =4 = 1.00, L =2 = 0.75, L =4 = 0.75, L =2 = 0.50, L =4 = 0.50, L =2 (3.19) with a unit strong defense ๐‘ 11,๐‘˜ = 1, โˆ€๐‘˜ results in the following recursive equation, ๐‘˜ โˆ’๐‘‰ ๐‘– ๐‘˜ + หœ๐‘ 2,๐‘˜โˆ’ หœ๐‘ 1,๐‘˜+ หœ๐‘ 2,๐‘˜ (๐‘‰ ๐‘–+1 ๐‘˜) ๐‘‰ ๐‘– ๐‘˜+๐‘‰ ๐‘–+1 หœ๐‘ 2,๐‘˜โˆ’ หœ๐‘ 1,๐‘˜โˆ’๐‘‰ ๐‘– ๐‘˜ , ๐œŒ๐‘˜๐‘‰ ๐‘˜ + (1 โˆ’ ๐œŒ๐‘˜ )๐‘‰ ๐‘– ๐‘˜ + หœ๐‘ 2,๐‘˜ โˆ’ หœ๐‘ 1,๐‘˜ โˆ’ หœ๐‘ 2,๐‘˜๐‘‰ ๐‘˜ หœ๐‘ 2,๐‘˜ โˆ’ หœ๐‘ 1,๐‘˜ โˆ’ ๐‘‰ ๐‘˜ , ๐‘‰ ๐‘– ๐‘˜โˆ’1 = ๐‘‰๐‘˜ + ๐œŒ๐‘˜ หœ๐‘ 1,๐‘˜ + (1 โˆ’ ๐œŒ๐‘˜ ) หœ๐‘ 2,๐‘˜ , if if if ๐‘– โ‰  ๐ฟ and ๐œŒ๐‘˜ โ‰ฅ หœ๐œŒ๐‘– ๐‘˜ , ๐‘– = ๐ฟ and ๐œŒ๐‘˜ โ‰ฅ หœ๐œŒ๐‘– ๐‘˜ , (3.20) ๐‘– โ‰  ๐ฟ and ๐œŒ๐‘˜ < หœ๐œŒ๐‘– ๐‘˜ , ๐œŒ๐‘˜ ( หœ๐‘ 1,๐‘˜ + ๐‘‰ ๐‘˜ ) + (1 โˆ’ ๐œŒ๐‘˜ ) ( หœ๐‘ 2,๐‘˜ + ๐‘‰ ๐‘– ๐‘˜ ), otherwise. ๏ฃฑ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃฒ ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด ๏ฃณ Similarly, we can derive the parameterized policies for both players, but leave those details out for brevity. We now compare an M-SSG with an edge-game for a fixed set of probabilities, i.e., ๐œŒ๐‘˜ = ๐œŒ, โˆ€๐‘˜ โˆˆ {1, 2, . . . , ๐พ }, and a termination threshold ๐ฟ under a unit strong defense cost, i.e., ๐‘ 11,๐‘˜ = 1, โˆ€๐‘˜ โˆˆ {1, 2, . . . , ๐พ }. The value of an M-SSG (๐‘‰ 0 0 ) over the number of stages is shown in Figure 3.3a. We observe that the value of M-SSG increases with increasing values of ๐ฟ and probability ๐œŒ. For instance, the value ๐‘‰ 0 0 for a fixed ๐œŒ = 1.0 is higher for ๐ฟ = 4 compared to ๐ฟ = 2. Similarly, the 0 for a fixed values of ๐ฟ = 2 is higher for ๐œŒ = 0.5 compared to ๐œŒ = 1.0. In summary, as the likelihood of an adversary increase, so does the value of M-SSG. The Nash equilibrium policies value ๐‘‰ 0 for an M-SSG with the corresponding probability ๐œŒ and termination threshold ๐ฟ are shown in Figure 3.4a and 3.4b, respectively. We observe that the defender (resp. second player) switches to a pure policy weak defense (resp. attack) at the stages when ๐œŒ๐‘˜ is below หœ๐œŒ๐‘˜ , indicated in Figure 3.3b. In particular, the probabilities ๐œŒ = 0.4, 0.65 are below หœ๐œŒ๐‘˜ for the stages 20 and 19. As we iterate ๐‘– = {๐ฟ โˆ’ 1, ๐ฟ โˆ’ 2, . . . , 1}, the termination threshold enables greater instances of the policy pair {weak defense, attack}. Finally, for a given a termination threshold ๐ฟ with a fixed probability ๐œŒ, once a player switches to a pure policy, it continues to play the pure policy until the final stage ๐พ. Remark 3.3.5 The presented analysis and numerical evaluation of the M-SSG corresponds to a particular set T . 
However, with a change in the set T , we can derive corresponding Nash 66 equilibrium policies and value of the game, representing different game structures. For instance, when T := {{strong defense, attack}, {strong defense, no attack}}, the termination threshold corresponds to a limit on the number of times the defender can resort to strong defense actions. Similarly, if T := {{strong defense, attack}, {weak defense, attack}}, the termination threshold would correspond to a limit on the number of attack actions. To summarize, in this section, we studied how the termination threshold of ๐ฟ affects the solution of the M-SSG. We derived a recursive equation for both the value of the M-SSG and player policies. The numerical study provides insight into the player policies as a function of ๐ฟ and the number of stages of the game. In the next section, we will extend the SSG model to a larger action space, i.e., more than two actions per player setting and derive a condition similar to Theorem 3.3.4 to switch to a pure player policy. 3.4 Solution to SSG๐‘šร—๐‘› In this section, we analyze the model and solution for an SSG๐‘šร—๐‘› with a termination threshold set to ๐ฟ = 1, i.e., the game reaches a stopping state when both the defender and the second player jointly select an action pair from the set T . While the model SSG๐‘šร—๐‘› can be extended to accommodate termination thresholds greater than 1, for ease of exposition, we present the case of ๐ฟ = 1. Similar to our approach with the M-SSG, we develop a methodology to compute the outcome ๐ฝ๐พ defined in equation (3.3), resulting in a Nash equilibrium for the SSG๐‘šร—๐‘›. With ๐ฟ = 1, the expected value of an SSG๐‘šร—๐‘› at stage ๐‘˜ โˆ’ 1 will be a function of an edge-game with value ๐‘‰ ๐‘˜ [14] at stage ๐‘˜ with probability ๐œŒ๐‘˜ and ๐‘‰๐‘˜ (The superscript 1 has been omitted for clarity) with probability 1 โˆ’ ๐œŒ๐‘˜ . The actions of the players and the corresponding entries of the stage cost matrices ๐‘†๐‘˜ and ๐‘…๐‘˜ 67 are given as, attack 1 . . . no attack no attack defense 1 defense 2 . . . defense m . . . . . . . . . . . . ๐‘ 21,๐‘˜ ๐‘ 11,๐‘˜ ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๐‘ ๐‘š1,๐‘˜ ๏ฃฏ ๏ฃฐ (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) (cid:124) . . . (cid:123)(cid:122) ๐‘†๐‘˜ ๐‘ 1๐‘›,๐‘˜ ๐‘ 2๐‘›,๐‘˜ . . . ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป (cid:125) defense 1 defense 2 , . . . defense ๐‘š , ๐‘Ÿ2,๐‘˜ ๐‘Ÿ1,๐‘˜ ๏ฃน ๏ฃฎ ๏ฃบ ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ ๏ฃฏ ๏ฃบ ๏ฃฏ ๐‘Ÿ๐‘š,๐‘˜ ๏ฃฏ ๏ฃบ ๏ฃป ๏ฃฐ (cid:124)(cid:123)(cid:122)(cid:125) ๐‘…๐‘˜ . . . ๐‘ ๐‘š๐‘›,๐‘˜ (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) where ๐‘†๐‘˜ โˆˆ R๐‘š๐‘˜ร—๐‘›๐‘˜ and ๐‘…๐‘˜ โˆˆ R๐‘š๐‘˜ . Similar to M-SSG, the stage cost matrix หœ๐‘…๐‘˜ โˆˆ R๐‘š๐‘˜ร—๐‘›๐‘˜ is a matrix whose columns are all equal to ๐‘…๐‘˜ , representing a zero-sum matrix with ๐‘š๐‘˜ defender actions and ๐‘›๐‘˜ second player actions. As per Definition 1, let {๐›ผ, ๐›ฝ} represent the set of action pair indices corresponding to the set T . In other words, the game stops at any stage when the players choose actions that belong to this set. 
We define a matrix D such that D๐›พ,๐›ฟ = 0, โˆ€{๐›พ, ๐›ฟ} โˆˆ {๐›ผ, ๐›ฝ}, D๐›พ,๐›ฟ = 1, otherwise, ๏ฃฑ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃฒ ๏ฃด๏ฃด๏ฃด๏ฃด ๏ฃณ where D๐›พ,๐›ฟ is (๐›พ, ๐›ฟ)th entry of the matrix D. Analogous to the M-SSG, the value of an SSG๐‘šร—๐‘› at stage ๐‘˜ is ๐‘‰๐‘˜โˆ’1 = Val(๐œŒ๐‘˜ (๐‘‰ ๐‘˜ D + ๐‘†๐‘˜ ) + (1 โˆ’ ๐œŒ๐‘˜ ) (๐‘‰๐‘˜ 1 + หœ๐‘…๐‘˜ )), (3.21) where ๐‘†๐‘˜ โˆˆ R๐‘š๐‘˜ร—๐‘›๐‘˜ and หœ๐‘…๐‘˜ โˆˆ R๐‘š๐‘˜ร—๐‘›๐‘˜ are stage cost matrices corresponding to adversarial and non- adversarial player type. At stage ๐‘˜, the defender and second player have ๐‘š๐‘˜ and ๐‘›๐‘˜ number of actions, respectively. The solution to (3.21) depends on the edge-game at every stage ๐‘˜ โˆˆ {1, 2, . . . , ๐พ }, which takes the form ๐‘‰ ๐‘˜โˆ’1 = Val(๐‘‰ ๐‘˜ D + ๐‘†๐‘˜ ) := min ๐‘ฆ๐‘˜ โˆˆฮ”๐‘š๐‘˜ max ๐‘ง๐‘˜ โˆˆฮ”๐‘›๐‘˜ ๐‘ฆโ€ฒ ๐‘˜ (๐‘‰ ๐‘˜ D + ๐‘†๐‘˜ )๐‘ง๐‘˜ . (3.22) Problem (3.22) can be formulated as a linear program [61] with (๐‘‰ ๐‘˜ D + ๐‘†๐‘˜ ) as the zero-sum matrix, whose outcome are ๐‘‰ ๐‘˜โˆ’1, ๐‘ฆ๐‘˜ and ๐‘ง๐‘˜ . Similar to (3.22), we can establish a recursive form for an 68 SSG๐‘šร—๐‘› (3.21) given by ๐‘‰๐‘˜โˆ’1 = min ๐‘ฆ๐‘˜ โˆˆฮ”๐‘š๐‘˜ max ๐‘ง๐‘˜ โˆˆฮ”๐‘›๐‘˜ (cid:16) ๐‘ฆโ€ฒ ๐‘˜ ๐œŒ๐‘˜ (๐‘‰ ๐‘˜ D + ๐‘†๐‘˜ ) + (1 โˆ’ ๐œŒ๐‘˜ ) (๐‘‰๐‘˜ 1 + หœ๐‘…๐‘˜ ) (cid:17) ๐‘ง๐‘˜ , (3.23) where 1 โˆˆ R๐‘š๐‘˜ร—๐‘›๐‘˜ is a matrix of ones. Analogous to (3.22), we can formulate (3.23) as a linear program, with (๐œŒ๐‘˜ (๐‘‰ ๐‘˜ D + ๐‘†๐‘˜ ) + (1 โˆ’ ๐œŒ๐‘˜ ) (๐‘‰๐‘˜ 1 + หœ๐‘…๐‘˜ )) serving as a zero-sum matrix. The outcome of this linear program yields the Nash equilibrium policies and the value of the game (๐‘‰๐‘˜ ) at every stage ๐‘˜ โˆˆ {1, 2, . . . , ๐พ }. While it is possible to compute the solution for a given SSG๐‘šร—๐‘› and determine the corresponding player policies, the computational load can increase significantly for large number of stages ๐พ. Hence, in the subsequent result, we derive a sufficient condition to determine when to switch from a numerical solution to an analytical one. Assumption 3.4.1 We assume the following stage cost inequality holds for any stage ๐‘˜ for the SSG๐‘šร—๐‘›, ๐‘Ÿ1,๐‘˜ > ๐‘Ÿ๐‘–,๐‘˜ โ‰ฅ ๐‘Ÿ๐‘š,๐‘˜ โ‰ฅ 0, โˆ€๐‘– โˆˆ {2, 3, . . . , ๐‘š โˆ’ 1} ๐‘ ๐‘š1,๐‘˜ > ๐‘ ๐‘–๐‘,๐‘˜ โ‰ฅ ๐‘ ๐‘š๐‘›,๐‘˜ โ‰ฅ 0, โˆ€๐‘ โˆˆ {2, 3, . . . , ๐‘› โˆ’ 1}, ๐‘ ๐‘š1,๐‘˜ > ๐‘ ๐‘– ๐‘—,๐‘˜ , โˆ€๐‘– โˆˆ {1, 2, . . . , ๐‘š โˆ’ 1}, ๐‘— โˆˆ {1, 2, . . . , ๐‘›}, ๐‘ ๐‘š๐‘›,๐‘˜ โ‰ค ๐‘ ๐‘– ๐‘—,๐‘˜ , โˆ€๐‘– โˆˆ {1, 2, . . . , ๐‘š โˆ’ 1}, ๐‘— โˆˆ {1, 2, . . . , ๐‘›}, Assumption 3.4.1 can be considered as an extension of Assumption 3.3.1. It signifies four conditions: i) Within the matrix ๐‘…๐‘˜ , the defense costs rise monotonically from the last row of defense (๐‘š) to the first row of defense (1). ii) In the matrix ๐‘†๐‘˜ , the costs corresponding to row m decreases across the column, i.e., from the action of attack 1 to no attack. iii) The cost corresponding to the action pair {defense ๐‘š, attack 1} is the largest entry of the matrix ๐‘†๐‘˜ . iv) The cost corresponding to the action pair {defense ๐‘š, no attack } is the smallest entry of the matrix ๐‘†๐‘˜ . Under the specified Assumption 3.4.1, the following result summarizes a switching condition under which a pure Nash equilibrium exists at a given stage. 69 Theorem 3.4.2 The Nash equilibrium policies at any stage ๐‘˜ โˆˆ {1, 2, . . . 
While it is possible to compute the solution for a given SSG_{m×n} and determine the corresponding player policies, the computational load can increase significantly for a large number of stages K. Hence, in the subsequent result, we derive a sufficient condition to determine when to switch from a numerical solution to an analytical one.

Assumption 3.4.1 We assume the following stage cost inequalities hold at any stage k of the SSG_{m×n}:

$$\begin{aligned}
& r_{1,k} > r_{i,k} \ge r_{m,k} \ge 0, && \forall i \in \{2, 3, \dots, m-1\}, \\
& s_{m1,k} > s_{mp,k} \ge s_{mn,k} \ge 0, && \forall p \in \{2, 3, \dots, n-1\}, \\
& s_{m1,k} > s_{ij,k}, && \forall i \in \{1, 2, \dots, m-1\},\ j \in \{1, 2, \dots, n\}, \\
& s_{mn,k} \le s_{ij,k}, && \forall i \in \{1, 2, \dots, m-1\},\ j \in \{1, 2, \dots, n\}.
\end{aligned}$$

Assumption 3.4.1 can be considered an extension of Assumption 3.3.1. It encodes four conditions: i) within the matrix R_k, the defense costs rise monotonically from the last defense row (m) to the first defense row (1); ii) in the matrix S_k, the costs in row m decrease across the columns, i.e., from the action attack 1 to no attack; iii) the cost corresponding to the action pair {defense m, attack 1} is the largest entry of the matrix S_k; and iv) the cost corresponding to the action pair {defense m, no attack} is the smallest entry of the matrix S_k.

Under Assumption 3.4.1, the following result summarizes a switching condition under which a pure Nash equilibrium exists at a given stage.

Theorem 3.4.2 The Nash equilibrium policies at any stage k ∈ {1, 2, ..., K} for a given SSG_{m×n} with a termination threshold of L = 1 and stage cost matrices {S_k, R̃_k} under Assumption 3.4.1 are given by

$$y^*_k = \begin{cases} \begin{bmatrix} 0 & \dots & 1 \end{bmatrix}', & \text{if } \rho_k < \hat\rho_k, \\ \text{Solve (3.23) as a linear program [61]}, & \text{otherwise}, \end{cases} \qquad (3.24)$$

$$z^*_k = \begin{cases} \begin{bmatrix} 1 & \dots & 0 \end{bmatrix}', & \text{if } \rho_k < \hat\rho_k, \\ \text{Solve (3.23) as a linear program [61]}, & \text{otherwise}, \end{cases} \qquad (3.25)$$

where

$$\hat\rho_k = \frac{r_{j^*,k} - r_{m,k}}{s_{m1,k} - s_{j^*1,k} + r_{j^*,k} - r_{m,k} + \bar V_k - D_{m,j^*}\bar V_k}, \qquad j^* := \arg\min_{i \in \{1, 2, \dots, m-1\}} s_{i1,k}. \qquad (3.26)$$

The value of the game at stage k is given by

$$V_{k-1} = \begin{cases} \rho_k(s_{m1,k} + \bar V_k) + (1 - \rho_k)(r_{m,k} + V_k), & \text{if } \rho_k < \hat\rho_k, \\ \text{Solve (3.23) as a linear program [61]}, & \text{otherwise}. \end{cases} \qquad (3.27)$$

□

Proof: The proof closely follows Lemma 3.3.3. When ρ_k = 1, V̄_k D + S_k is the zero-sum matrix under consideration, and a linear program [61] is used to solve for the value of the game and the player policies. In contrast, when ρ_k = 0, the zero-sum matrix is V_k 1 + R̃_k, which corresponds to a repeated-column matrix with a pure policy of weak defense, and where the cost is invariant to the second player's action. Hence, under Assumption 3.4.1 and when 0 < ρ_k < 1, we seek the value of ρ_k at which the solution switches from a linear program to a pure policy of weak defense (row domination). Furthermore, when such row domination is encountered, the second player also switches to a pure policy of attack 1. The pure Nash equilibrium

$$y^*_k = \begin{bmatrix} 0 & \dots & 1 \end{bmatrix}', \qquad z^*_k = \begin{bmatrix} 1 & \dots & 0 \end{bmatrix}'$$

arises when

$$\rho_k(s_{m1,k} + \bar V_k) + (1 - \rho_k)(V_k + r_{m,k}) < \rho_k(s_{mj^*,k} + D_{m,j^*}\bar V_k) + (1 - \rho_k)(V_k + r_{j^*,k}), \qquad (3.28)$$

where j* := arg min_{i∈{1,2,...,m-1}} s_{i1,k}. By rearranging the terms involving ρ_k to the left-hand side, the inequality becomes

$$\rho_k\big(s_{m1,k} + \bar V_k - s_{mj^*,k} - D_{m,j^*}\bar V_k + r_{m,k} - r_{j^*,k}\big) < r_{m,k} - r_{j^*,k}.$$

Further simplification leads to (3.26). We obtain (3.27) for the case ρ_k < ρ̂_k from the left-hand side of the inequality (3.28), and the linear-programming-based solution otherwise. □

Theorem 3.4.2 provides a recursive approach that combines numerical and analytical solutions, effectively reducing the computational burden when solving an SSG_{m×n}. Next, we illustrate the results of Theorem 3.4.2 using a numerical example for a chosen set of stage cost matrices and probability ρ_k. This allows us to observe how the value of the SSG_{m×n} and the corresponding player policies change with varying probability ρ_k.
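To illustrate this hybrid numerical/analytical recursion, the following sketch (not thesis code) evaluates (3.21)-(3.23) backward in time and applies the switching condition (3.26)-(3.27). It reuses the hypothetical solve_zero_sum helper from the earlier sketch and assumes zero terminal values.

import numpy as np

def ssg_backward(S, R, D, rho):
    """Backward recursion for an SSG_{m x n} with L = 1 (sketch).
    S[k]: (m x n) stage cost matrix, R[k]: length-m stage cost vector,
    D: termination mask, rho[k]: probability of facing the adversarial type."""
    K = len(S)
    m, n = S[0].shape
    V, Vbar = 0.0, 0.0                         # terminal values assumed zero
    policies = [None] * K
    for k in reversed(range(K)):
        s, r = S[k], R[k]
        j = int(np.argmin(s[:m - 1, 0]))       # j* in (3.26), 0-based row index
        rho_hat = (r[j] - r[m - 1]) / (
            s[m - 1, 0] - s[j, 0] + r[j] - r[m - 1] + Vbar - D[m - 1, j] * Vbar)
        if rho[k] < rho_hat:
            # Pure policies (3.24)-(3.25): last defense row vs. attack 1; value from (3.27).
            V_new = rho[k] * (s[m - 1, 0] + Vbar) + (1 - rho[k]) * (r[m - 1] + V)
            policies[k] = ("pure", m - 1, 0)
        else:
            # Mixed policies: solve (3.23) as a linear program.
            M = rho[k] * (Vbar * D + s) + (1 - rho[k]) * (V * np.ones((m, n)) + np.tile(r[:, None], (1, n)))
            V_new, y, z = solve_zero_sum(M)
            policies[k] = ("mixed", y, z)
        Vbar, _, _ = solve_zero_sum(Vbar * D + s)   # edge-game recursion (3.22)
        V = V_new
    return V, policies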
Numerical example: We evaluate an SSG_{m×n} with four actions for both the defender and the second player, i.e., S_k ∈ R^{4×4} and R̃_k ∈ R^{4×4}, ∀k ∈ {1, 2, ..., K}. In this numerical example, we use the matrix D given by

$$D = \begin{bmatrix} 0_{3\times 3} & 1_{3\times 1} \\ 1_{1\times 3} & 1 \end{bmatrix},$$

where 0_{a×b} and 1_{a×b} correspond to a matrix of zeros or ones of size a × b, respectively. We parameterize the stage cost matrices S_k and R̃_k using three terms, s_{1,k}, s_{2,k}, and s_{3,k}, to obtain

$$S_k = \begin{bmatrix}
s_{1,k} & s_{1,k} & s_{1,k} & s_{1,k} \\
\tfrac{s_{1,k}+s_{2,k}}{2} & \tfrac{s_{1,k}+s_{3,k}}{2} & \tfrac{s_{1,k}+s_{2,k}}{2} & \tfrac{s_{1,k}+s_{3,k}}{2} \\
\tfrac{s_{1,k}+s_{2,k}}{2} & \tfrac{s_{1,k}+s_{2,k}}{2} & \tfrac{s_{1,k}+s_{3,k}}{2} & \tfrac{s_{1,k}+s_{3,k}}{2} \\
s_{2,k} & \tfrac{s_{1,k}+s_{3,k}}{2} & \tfrac{s_{1,k}+s_{3,k}}{2} & s_{3,k}
\end{bmatrix}, \qquad
\tilde R_k = \begin{bmatrix}
s_{1,k} & s_{1,k} & s_{1,k} & s_{1,k} \\
\tfrac{s_{1,k}+s_{3,k}}{2} & \tfrac{s_{1,k}+s_{3,k}}{2} & \tfrac{s_{1,k}+s_{3,k}}{2} & \tfrac{s_{1,k}+s_{3,k}}{2} \\
\tfrac{s_{1,k}+s_{3,k}}{2} & \tfrac{s_{1,k}+s_{3,k}}{2} & \tfrac{s_{1,k}+s_{3,k}}{2} & \tfrac{s_{1,k}+s_{3,k}}{2} \\
s_{3,k} & s_{3,k} & s_{3,k} & s_{3,k}
\end{bmatrix}.$$

Both stage cost matrices S_k and R̃_k satisfy Assumption 3.4.1. We evaluate an SSG_{m×n} using the parameters s_{1,k} = 1.0, s_{2,k} = 1.2, and s_{3,k} = 0.3, ∀k ∈ {1, 2, ..., K}, for a total of K = 20 stages with fixed probabilities ρ_k = ρ, ∀k ∈ {1, 2, ..., K}.

[Figure 3.5: (a) Nash equilibrium policy of the defender (rows 2, 3, and 4) of an SSG_{m×n} for a range of ρ, solved over a total of K = 20 stages with stage cost entries s_{1,k} = 1.0, s_{2,k} = 1.2, and s_{3,k} = 0.3, k ∈ {1, 2, ..., K}. (b) Nash equilibrium policy of the second player (columns 1 and 4) for the same stage cost parameters and total number of stages.]

The defender and second player policies are shown in Figures 3.5a and 3.5b, respectively. Notably, due to the nearly identical entries in the second and third rows of the parameterized stage cost matrix S_k, the defender's policy of playing defense 2 or 3 is indistinguishable, as seen in Figure 3.5a. As the probability ρ decreases, the defender shifts to a pure policy of defense 4 in the later stages of the SSG_{m×n}. This shift is attributed to the reduced likelihood of an adversarial player being present in the game. The second player's policy involves selecting extreme column choices, specifically attack 1 or no attack, as indicated in Figure 3.5b. In other words, the second player mixes between attack 1 and no attack, while not playing the semi-attack (attack 2 and 3) actions. Similar to the defender's policy, a decrease in the probability ρ leads to an increased likelihood of attack 1 in the later stages of the SSG_{m×n}, with a switch to the pure policy of attack 1. This numerical example demonstrates the SSG framework beyond two actions per player and provides insight into the symmetric policies arising from a parameterized version of the stage cost matrices. Furthermore, it provides an analytical condition under which a player can switch from a numerical solution to a pure policy. In the next section, we will apply the framework of the SSG_{m×n} to an estimation problem and demonstrate the player policies through a numerical example.
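The parameterization above is straightforward to reproduce; the snippet below (illustrative only) builds D, S_k, and R̃_k for the values used in Figure 3.5 and checks the extreme-entry conditions of Assumption 3.4.1.

import numpy as np

s1, s2, s3 = 1.0, 1.2, 0.3                       # s_{1,k}, s_{2,k}, s_{3,k}
D = np.block([[np.zeros((3, 3)), np.ones((3, 1))],
              [np.ones((1, 3)),  np.ones((1, 1))]])
m12, m13 = (s1 + s2) / 2, (s1 + s3) / 2
S = np.array([[s1,  s1,  s1,  s1],
              [m12, m13, m12, m13],
              [m12, m12, m13, m13],
              [s2,  m13, m13, s3]])
R = np.array([s1, m13, m13, s3])
R_tilde = np.tile(R[:, None], (1, 4))            # repeated-column matrix
# conditions iii) and iv) of Assumption 3.4.1
assert S[3, 0] > S[:3, :].max() and S[3, 3] <= S[:3, :].min()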
3.5 Application

We now apply an M-SSG to two motion planning scenarios that involve making decisions to detect and counter adversarial intent with an engagement budget of L = 1. Through these scenarios, we demonstrate how to incorporate the mobility aspects of an autonomous vehicle into the framework of an M-SSG.

[Figure 3.6: (a) Ego and non-ego vehicle policy averaged over 50 experiment runs for K = 25 with ρ = 0.25. (b) Simulated ego and non-ego vehicle policy with the defined stage costs and ρ = 0.25.]

[Figure 3.7: (a) Sample policy of the defender and attacker for a given experimental run. (b) Sampled and expected value of the SSG compared with the theoretical value of the SSG.]

3.5.1 Leader-Follower Game with Fixed Stage Cost Matrix

In this scenario, the ego vehicle aims to maintain a safe distance from a potentially adversarial vehicle on the road, akin to a cruise control behavior [68]. The ego vehicle is assumed to be equipped with sensors such as LiDAR, cameras, and radar, providing information about the vicinity of the ego vehicle, including the likelihood of any other non-ego vehicle behaving adversarially. This likelihood information is used to derive the stochastic parameter ρ of the M-SSG. The adversary has the option to brake, potentially causing the ego vehicle to slow down, or to continue traveling at nominal speed. The ego vehicle, on the other hand, can choose to brake, ensuring that a safe distance is maintained, or can opt to travel at a nominal speed. When a non-ego vehicle chooses to brake and is successful while the ego vehicle is traveling at nominal speed, the distance between the vehicles is reduced, necessitating a slowdown by the ego vehicle. This leads to additional time required to reach its goal. This scenario is modeled through fixed stage cost matrices:

$$S = \begin{bmatrix} \phi\Delta & \phi\Delta \\ \tilde\phi\Delta & \Delta \end{bmatrix}, \qquad R = \begin{bmatrix} \phi\Delta & \phi\Delta \\ \Delta & \Delta \end{bmatrix},$$

where the rows correspond to the ego vehicle's actions {Brake, Nominal speed}, the columns of S correspond to the non-ego vehicle's actions {Brake, Nominal speed}, and both columns of R correspond to the non-ego vehicle traveling at nominal speed (the benign type), making R a repeated-column matrix. Here, Δ represents the time required per decision instant for the ego vehicle, φ > 1 represents the additional time factor incurred by a braking action of the ego vehicle, and φ̃ > φ represents the increased time factor when the ego vehicle moves at nominal speed and the non-ego vehicle acts adversarially.

Simulation and Experiments: We conducted experiments using TurtleBot3 Burger robots to represent both the ego and non-ego vehicles while maintaining a safe distance between them. An OptiTrack motion capture system was used for localization and as infrastructure. The Robot Operating System (ROS) was employed to implement the M-SSG policies and the corresponding actions on both the ego and non-ego vehicles. Given the nominal velocity of the ego vehicle, along with the starting and goal positions, we determined the total number of stages K for the M-SSG. A total of 50 experiments were carried out and compared against the numerical solution from Theorem 3.3.4. The nominal linear velocity of the robot is set to 0.14 m/s in the absence of any brake action by either the ego or non-ego vehicle.
The linear velocity is reduced to 0.10 m/s under a successful attack (a brake action by the non-ego vehicle) and to 0.12 m/s under a brake action by the ego vehicle when the non-ego vehicle travels at nominal speed. These nominal velocities were chosen arbitrarily within the constraints feasible for the TurtleBot3. The change in velocity for each action was chosen to conform to the stage cost matrix structure (Assumption 3.3.1). The M-SSG is played at 1 Hz, where Δ equals the safe distance (1 m) from the adversarial agent ahead of the defender divided by the velocity, i.e., the time required to cover that distance. The entries of the stage cost matrix also account for the cost of returning to the nominal velocity. The numerical entries of the stage cost matrices are

$$S = \begin{bmatrix} 1.18 & 1.18 \\ 1.38 & 1.0 \end{bmatrix}, \qquad R = \begin{bmatrix} 1.18 & 1.18 \\ 1.0 & 1.0 \end{bmatrix}.$$

The actions sampled from the Nash equilibrium policies realized in the experiments and in the numerical evaluation are averaged and illustrated in Figures 3.6a and 3.6b. We observe a constant offset between the averaged M-SSG policies from the experiments and the numerical evaluation, primarily due to uncertainties stemming from lack of synchronization, delays, and the accuracy of the robot's position estimate. Furthermore, we observe that both the non-ego and ego vehicles increase their probability of braking as the game progresses over the stages, with the exception of the ego vehicle dropping out at the end. A sample policy realization of both the ego vehicle and the adversary (non-ego vehicle) is shown in Figure 3.7a, where the shaded region represents the game being active, and a stopping state having been reached otherwise. Finally, the value of the M-SSG from Theorem 3.3.4, i.e., the total time taken to cover the K stages, is compared against the averaged SSG value from the experiments in Figure 3.7b, indicating a difference between theory and experiment arising from the uncertainties indicated earlier. These findings show that, with an appropriate choice of stage cost matrices and scenarios, we can apply the framework of the M-SSG with constant stage cost matrices to reason about the possible actions in the presence of a probabilistic adversary.
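The averaged policies in Figure 3.6 are obtained by repeatedly sampling actions from the mixed Nash equilibrium policies. A simplified way to reproduce such averages is sketched below; the single Bernoulli draw of the second player's type and the joint-brake stopping rule are modeling assumptions made here purely for illustration.

import numpy as np

def average_realizations(p_defend, p_attack, rho, n_runs=50, seed=0):
    """Monte Carlo average of sampled brake/attack actions over n_runs (sketch)."""
    rng = np.random.default_rng(seed)
    K = len(p_defend)
    ego, non_ego = np.zeros(K), np.zeros(K)
    for _ in range(n_runs):
        adversarial = rng.random() < rho                   # assumed: second player's type drawn once
        for k in range(K):
            d = rng.random() < p_defend[k]                 # ego brakes at stage k
            a = adversarial and rng.random() < p_attack[k] # non-ego brakes only if adversarial
            ego[k] += d
            non_ego[k] += a
            if d and a:                                    # assumed stopping pair: joint brake (L = 1)
                break
    return ego / n_runs, non_ego / n_runs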
3.5.2 Lane-merging scenario

The M-SSG framework also supports varying stage costs, motivating us to consider dynamic scenarios commonly encountered in multi-robot problems. We assume the existence of a path prediction mechanism for the ego vehicle (defender), capable of generating its own trajectory and that of any surrounding agents. Similar to the constant stage cost game, we assume that the defender is equipped with an array of sensors. This prediction mechanism also provides the likelihood of a predicted path, enabling us to determine the stochastic parameter ρ. Furthermore, once adversarial intent is ascertained, the non-ego vehicle is modeled as an adversary in all subsequent stages.

[Figure 3.9: (a) Simulated trajectory of an ego and non-ego vehicle in a lane change scenario over 50 time steps with ρ = 0.1 and ρ = 1.0, and a final time of 25 s. (b) Expected trajectory of the ego and non-ego vehicle averaged over 50 experiment runs, with 50 time steps, for the corresponding values of ρ.]

The framework thus relies on two key inputs: (i) a predicted path for the defender and the surrounding agents, and (ii) the likelihood of these predicted paths. We analyze actions related to braking within the predicted trajectories, comparing unscaled versus scaled control inputs.

[Figure 3.8: Illustration of the trajectory under the scaled and unscaled control inputs γu_k and u_k, respectively, for a finite horizon T.]

Figure 3.8 illustrates the distinction between scaled and unscaled inputs over the last three time steps within a horizon of T ≥ 3. At time instant T - 3, under a scaled control input γu_{T-3}, a vehicle reaches the shaded state to the left of x_{T-2}. This implies the existence of a control input that enables the vehicle to reach x_{T-1} from the shaded state at time T - 2. In particular, we make the following assumption:

Assumption 3.5.1 Given a predicted path, when a scaled control input is applied at any time instant t to reach a corresponding new state at t + 1, there exists a control input at time instant t + 1 to reach the exact predicted state at t + 2.

This assumption allows us to employ the principle of optimality. Essentially, if the vehicle can reach the next state from either a scaled or an unscaled state, the vehicle incorporates the cost difference between the current state and the next state into the cost-to-go.

[Figure 3.10: (a) Ego and non-ego vehicle policies averaged over 50 experiment runs with nominal speeds of 0.15 m/s and 0.18 m/s. (b) Ego and non-ego vehicle policies averaged over 50 simulation runs for the same set of nominal speeds (0.15 and 0.18 m/s).]

The stage cost at any time instant comprises a state cost along with a safety cost that governs the distance between the defender vehicle and any adversarial vehicles in its vicinity. In this work, we adopt a logarithmic function for the safety cost:

$$\psi_k(x^a_k, x^d_k, \bar d) = -\lambda \log\!\left(\frac{\|x^a_k - x^d_k\|}{\bar d}\right), \qquad (3.29)$$

where x^a_k and x^d_k represent the predicted positions of the adversary and the defender, respectively. Here, d̄ > ||x^a_k - x^d_k|| is the minimum safe distance, and λ is a scaling factor for safety.
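As a small illustration, (3.29) translates directly into code; the helper below is only a sketch and treats λ and d̄ as free parameters.

import numpy as np

def safety_cost(x_a, x_d, d_bar, lam):
    """Logarithmic safety cost psi_k of (3.29): positive (a penalty) once the
    predicted separation drops below the safe distance d_bar."""
    sep = np.linalg.norm(np.asarray(x_a) - np.asarray(x_d))
    return -lam * np.log(sep / d_bar)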
Let f_k(x^a_k, x^d_k, γu^a_k, γu^d_k) denote the current payoff. The stage cost matrices of the stochastic stopping game (SSG) are then chosen as

$$\tilde S = \begin{bmatrix}
f_k(x^a_k, x^d_k, \gamma u^a_k, \gamma u^d_k) + \psi_k(\tilde x^a_{k+1}, \tilde x^d_{k+1}, \bar d) &
f_k(x^a_k, x^d_k, u^a_k, \gamma u^d_k) + \psi_k(x^a_{k+1}, \tilde x^d_{k+1}, \bar d) \\
f_k(x^a_k, x^d_k, \gamma u^a_k, u^d_k) + \psi_k(\tilde x^a_{k+1}, x^d_{k+1}, \bar d) &
f_k(x^a_k, x^d_k, u^a_k, u^d_k) + \psi_k(x^a_{k+1}, x^d_{k+1}, \bar d)
\end{bmatrix}, \qquad
\tilde R_k = \begin{bmatrix} s_{12,k} & s_{12,k} \\ s_{22,k} & s_{22,k} \end{bmatrix}, \qquad (3.30)$$

where the stage cost matrix S_k is obtained from S̃ by adding the stopping-state cost-to-go f̃_k(x^d_k, γu^d_k) to the entry at which the game reaches a stopping state, and s_{12,k}, s_{22,k} denote the entries of the second column of S_k, i.e., the column in which the second player applies its unscaled, non-adversarial input. Here, x̃^a_{k+1} represents the adversary state reached under the scaled control γu^a_k, x̃^d_{k+1} represents the defender state reached under the scaled control γu^d_k, and f̃_k(x^d_k, γu^d_k) denotes the cost-to-go from a stopping state, i.e., the cost incurred when the defender continues to travel using the scaled control input for the remaining stages.

The dynamic M-SSG is implemented in an online manner, akin to Model Predictive Control (MPC). The implementation is summarized in Algorithm 2.

Simulations and experiments: To demonstrate the framework's capability, we consider a lane-merging scenario. In this scenario, an ego vehicle (defender) is traveling straight in a lane, while a non-ego vehicle (other agent) is planning to merge into the defender's lane. Since the ego vehicle is unaware of the other agent's intent, we assume the possibility of adversarial behavior. However, the non-ego vehicle may not necessarily act adversarially. Therefore, we approach the stochastic stopping game (SSG) entirely from the ego vehicle's perspective, justifying the zero-sum nature of the game. To ensure realistic and feasible trajectories, we construct predicted trajectories for both the ego and non-ego vehicles, reflecting the typical planning behavior of autonomous vehicles [9]. For our experiments, we utilize TurtleBot3 Burger robots equipped with ROS for both the ego and non-ego vehicles. We set the prediction horizon (number of stages) to K = 30 and the safety scaling parameter to λ = 5.0. The algorithm (Algorithm 2) is executed at a frequency of 2 Hz for 50 iterations, resulting in a total of 50 experimental runs.
We introduce a stochastic parameter ρ = 0.1, chosen arbitrarily to observe the algorithm's performance. In real-world scenarios, we anticipate that the parameter ρ would be updated dynamically at each time instant, a situation easily accommodated by Algorithm 2.

Algorithm 2: Dynamic M-SSG
  Input: K (time horizon), ΔT (sample time), γ (control scaling factor), k (current time)
  repeat every ΔT:
    Obtain x^d_{k:k+K}, x^a_{k:k+K} - predicted trajectories
    Obtain ρ - likelihood of an adversary
    Compute x^d_{k:k+K}(γu^d_{k:k+K}) and x^a_{k:k+K}(γu^a_{k:k+K})
    Compute S_{k:k+K} and R_{k:k+K} defined in (3.30)
    Compute the policies (3.11), (3.12) by solving the SSG
    Set d_k ~ y*_k, a_k ~ z*_k, Adversary ~ ρ_k
    defender action = u^d_k if d_k = 0, and γu^d_k otherwise
    adversary action = u^a_k if a_k = 0, and γu^a_k otherwise
    Update the states: x^d_{k+1} <- (defender action); x^a_{k+1} <- (adversary action) if Adversary, and u^a_k otherwise
    Increment time: k = k + 1
  until Goal

[Figure 3.11: (a) Simulated policy of an ego and non-ego vehicle (possible adversary) in the lane change scenario over a range of sample times with ρ = 0.1. (b) Simulated policy of an ego and non-ego vehicle for the same scenario over a range of nominal speeds with ρ = 0.1.]

The expected trajectories in both simulations and experiments are depicted in Figures 3.9a and 3.9b. In these figures, dashed circles indicate the starting pose of the TurtleBots, while solid circles represent the end pose under an uncertain adversarial intent (ρ = 0.1) and under a deterministic adversary (ρ = 1.0). It is important to note that the experimental runs differ from the simulations due to factors such as model discrepancies, measurement inaccuracies, and localization errors. Overall, we observe that the experiments closely approximate the simulations, demonstrating the effectiveness of the stochastic stopping game (SSG) framework in reasoning about such navigation tasks. Under the assumption of a deterministic adversary, the ego vehicle adopts a defensive stance, resulting in a shorter distance covered in the same duration of time. The discrepancy in the final positions of the TurtleBots between the simulation and the experiments arises from an initial delay in applying control commands and from localization inaccuracies. The policies obtained from 50 runs of experiments and simulations, conducted at nominal speeds of v = 0.15 m/s and v = 0.18 m/s, are displayed in Figures 3.10a and 3.10b. We notice a slight shift of approximately 2 seconds in the policies obtained from the experiments compared to the simulation. This discrepancy is primarily attributed to the initial delays experienced by the TurtleBots in synchronizing the control commands. Nevertheless, the trend observed in the experimental policies closely aligns with the simulations, reflecting the online nature of the stochastic stopping game and accounting for model and measurement noise.
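For reference, the receding-horizon loop of Algorithm 2 can be sketched as below. Every component that the thesis leaves abstract at this point (trajectory prediction, adversary-likelihood estimation, stage-cost construction, the SSG solver, and the plant step) is passed in as a callable, so the names here are placeholders rather than the thesis implementation.

import numpy as np

def dynamic_mssg(x_d, x_a, K, gamma, predict_paths, estimate_rho, build_stage_costs,
                 solve_ssg, step, goal_reached, rng=None):
    """Receding-horizon loop of Algorithm 2 (sketch)."""
    rng = rng or np.random.default_rng()
    while not goal_reached(x_d):
        xd_pred, ud, xa_pred, ua = predict_paths(x_d, x_a, K)      # predicted trajectories over K stages
        rho = estimate_rho(x_d, x_a)                               # likelihood of an adversary
        S, R = build_stage_costs(xd_pred, xa_pred, ud, ua, gamma)  # stage costs as in (3.30)
        y, z = solve_ssg(S, R, rho)                                # NE policies over the horizon
        d_k = rng.random() < y[0]                                  # sample defender action (y[0]: brake prob.)
        u_apply = gamma * ud[0] if d_k else ud[0]                  # scaled (brake) vs. nominal control
        x_d, x_a = step(x_d, x_a, u_apply)                         # apply for one sample time, then re-plan
    return x_d, x_a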
Furthermore, we conducted simulations over a range of sample times and nominal speeds, as illustrated in Figures 3.11a and 3.11b, respectively. We observe that with decreasing sample time, the policies of both the ego and non-ego vehicles converge to pure policies, i.e., always traveling at reduced speed and at nominal speed, respectively. Conversely, with increasing speed for a fixed sample time, the duration of active braking by the ego vehicle decreases (respectively, braking by the non-ego vehicle increases), indicating a faster merge at higher speeds. These simulations and experiments provide valuable insights into the performance of the dynamic SSG framework across a spectrum of parameters and scenarios.

3.5.3 Resilient estimation

We now apply the SSG_{m×n} framework to a resilient estimation problem using a Kalman filter operating in a possibly adversarial environment. This example demonstrates how to incorporate the framework of the SSG_{m×n} into a typical cyber-physical system application.

[Figure 3.12: A typical feedback control system with an estimator. The control law is a function of the estimates. The estimator performance depends on the channel used to communicate the data observed from a sensor. An adversary might be present in the feedback loop, impacting the performance of the estimates by injecting noise on different channels.]

A standard feedback control system is depicted in Figure 3.12, accompanied by a potential adversary. In this scenario, there are m feedback channels to communicate the sensor information to an estimator. Each channel choice corresponds to choosing an estimator with a specific sampling frequency. Equivalently, it can be viewed as choosing a different plant model for each channel. Opting for a channel with a high sampling frequency could yield a low error covariance, but might be susceptible to high measurement noise injected by an adversary. The steady-state posterior error covariance for a channel i ∈ {1, 2, ..., m} and injected noise j ∈ {1, 2, ..., n} is obtained by solving the discrete-time algebraic Riccati equation (DARE):

$$\mathrm{P}_\infty = \mathrm{F}_i\big(\mathrm{P}_\infty - \mathrm{P}_\infty \mathrm{H}^{\mathsf T}\mathrm{M}^{-1}\mathrm{H}\,\mathrm{P}_\infty\big)\mathrm{F}_i^{\mathsf T} + \mathrm{Q}_i =: f(\mathrm{F}_i, \mathrm{R}_j), \qquad (3.31)$$

where M := (H P_∞ H^T + R_j) ∈ R^{m×m}, P_∞ ∈ R^{n×n} is the steady-state covariance, F_i ∈ R^{n×n} is the state transition model of channel i, H ∈ R^{m×n} is the observation model, R_j ∈ R^{m×m} is the covariance of the observation noise, and Q_i ∈ R^{n×n} is the covariance of the process noise corresponding to the i-th channel. We assume that the pair (F_i, H) is detectable and (F_i, Q^{1/2}) is controllable on and inside the unit circle [134], so that there exists at least one positive definite P_∞ := f(F_i, R_j), resulting in a stable steady-state Kalman filter. In this context, every stage can be perceived as an episode composed of numerous time steps. We configure the stage cost matrices S_k = S and R̃_k = R̃ for all k ∈ {1, 2, ..., K}, considering m defender channels and n potential noise levels. In this section, we consider four channels for the defender and four potential noise levels introduced by a probable adversary. Each noise level imparts a different observation noise across the various defender channels.
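The stage costs Tr(f(F_i, R_j)) can be computed with a standard Riccati solver. A sketch is shown below; it uses the dual (control-form) DARE solver in SciPy, and the identity observation model in the example is an assumption made only for illustration, since H is not specified numerically in the text.

import numpy as np
from scipy.linalg import solve_discrete_are

def steady_state_cost(F, H, Q, R):
    """Tr(f(F_i, R_j)) used in (3.32): trace of the steady-state error covariance
    of the estimation DARE (3.31), solved via its dual control form."""
    P = solve_discrete_are(F.T, H.T, Q, R)   # P = F (P - P H'(H P H' + R)^{-1} H P) F' + Q
    return float(np.trace(P))

# Illustration with assumed values (Lambda_1, q_1, r_1) and an identity H.
Lam, q, r = 0.6, 0.6, 0.1
F = np.array([[1.0, Lam], [0.0, 1.0]])
cost_11 = steady_state_cost(F, np.eye(2), q * np.eye(2), r * np.eye(2))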
For this analysis, we assume fixed stage cost matrices S_k = S and R̃_k = R̃, ∀k ∈ {1, 2, ..., K}, defined as

$$S = \begin{bmatrix}
\mathrm{Tr}(f(\mathrm{F}_1, \mathrm{R}_1)) & \mathrm{Tr}(f(\mathrm{F}_1, \mathrm{R}_1)) & \mathrm{Tr}(f(\mathrm{F}_1, \mathrm{R}_1)) & \mathrm{Tr}(f(\mathrm{F}_1, \mathrm{R}_1)) \\
\mathrm{Tr}(f(\mathrm{F}_2, \mathrm{R}_2)) & \mathrm{Tr}(f(\mathrm{F}_2, \mathrm{R}_2)) & \mathrm{Tr}(f(\mathrm{F}_2, \mathrm{R}_{2'})) & \mathrm{Tr}(f(\mathrm{F}_2, \mathrm{R}_2)) \\
\mathrm{Tr}(f(\mathrm{F}_3, \mathrm{R}_3)) & \mathrm{Tr}(f(\mathrm{F}_3, \mathrm{R}_{3'})) & \mathrm{Tr}(f(\mathrm{F}_3, \mathrm{R}_3)) & \mathrm{Tr}(f(\mathrm{F}_3, \mathrm{R}_3)) \\
\mathrm{Tr}(f(\mathrm{F}_4, \mathrm{R}_4)) & \mathrm{Tr}(f(\mathrm{F}_4, \mathrm{R}_5)) & \mathrm{Tr}(f(\mathrm{F}_4, \mathrm{R}_6)) & \mathrm{Tr}(f(\mathrm{F}_4, \mathrm{R}_7))
\end{bmatrix},$$
$$\tilde R = \begin{bmatrix}
\mathrm{Tr}(f(\mathrm{F}_1, \mathrm{R}_1)) & \mathrm{Tr}(f(\mathrm{F}_1, \mathrm{R}_1)) & \mathrm{Tr}(f(\mathrm{F}_1, \mathrm{R}_1)) & \mathrm{Tr}(f(\mathrm{F}_1, \mathrm{R}_1)) \\
\mathrm{Tr}(f(\mathrm{F}_2, \mathrm{R}_2)) & \mathrm{Tr}(f(\mathrm{F}_2, \mathrm{R}_2)) & \mathrm{Tr}(f(\mathrm{F}_2, \mathrm{R}_2)) & \mathrm{Tr}(f(\mathrm{F}_2, \mathrm{R}_2)) \\
\mathrm{Tr}(f(\mathrm{F}_3, \mathrm{R}_3)) & \mathrm{Tr}(f(\mathrm{F}_3, \mathrm{R}_3)) & \mathrm{Tr}(f(\mathrm{F}_3, \mathrm{R}_3)) & \mathrm{Tr}(f(\mathrm{F}_3, \mathrm{R}_3)) \\
\mathrm{Tr}(f(\mathrm{F}_4, \mathrm{R}_7)) & \mathrm{Tr}(f(\mathrm{F}_4, \mathrm{R}_7)) & \mathrm{Tr}(f(\mathrm{F}_4, \mathrm{R}_7)) & \mathrm{Tr}(f(\mathrm{F}_4, \mathrm{R}_7))
\end{bmatrix}, \qquad (3.32)$$

where R_x := r_x I, x ∈ {1, 2, 2′, 3, 3′, 4, 5, 6, 7}, i.e., the observation covariance matrices are diagonal. Similarly, the diagonal process noise matrices are given by Q_x := q_x I, x ∈ {1, 2, 3, 4}. The matrix R̃ is the repeated-column matrix of R, which is defined by the last column of the matrix S. The state transition model is defined as

$$\mathrm{F}_x := \begin{bmatrix} 1 & \Lambda_x \\ 0 & 1 \end{bmatrix}, \qquad x \in \{1, 2, 3, 4\}.$$

The defender channel m, which corresponds to the fourth row, is the estimator with the highest sampling rate but is the least secure, captured via the inequality r_4 > r_5 ≥ r_6 ≥ r_7. The rows above m correspond to slower estimator rates but are more secure, with the highest security corresponding to the entries of the first row, captured via the observation covariances r_4 > r_x, ∀x ∈ {1, 2, 2′, 3, 3′, 5, 6, 7}, and Λ_1 > Λ_2 > Λ_3 > Λ_4. The adversary has the ability to introduce varying noise levels when the defender opts for the m-th row, which is the least secure channel. However, as the defender picks more secure channels (rows < m), the adversarial impact degrades and completely diminishes against the first row. For rows ≠ m, the adversary can exert maximum influence by selecting a specific column or action (r_{3′} ≥ r_3 and r_{2′} ≥ r_2). On the other hand, when the defender picks the first row, the costs remain unaffected by the adversary's actions. The matrices S and R̃ in (3.32) satisfy Assumption 3.4.1. We summarize the observation covariances and the sampling frequencies of the estimators in the following inequalities:

$$r_4 > r_5 \ge r_6 \ge r_{3'} \ge r_{2'} \ge r_3 \ge r_2 \ge r_7 > r_1 \ge 0, \qquad \Lambda_1 > \Lambda_2 > \Lambda_3 > \Lambda_4 > 0.$$

[Figure 3.13: Value of the SSG_{m×n} for a range of fixed probabilities ρ, solved over a total of K = 20 stages with an engagement budget of L = 1.]

[Figure 3.14: (a) Probability of the defender actions defense 2, 3, and 4 (rows 2, 3, and 4) of the SSG_{m×n} for the corresponding probability ρ and stages K. (b) Probability of the second player actions attack 1, 2, and 3 (columns 1, 2, and 3) for the same SSG_{m×n}.]
We use the following parameters: Λ_1 = 0.6, Λ_2 = 0.5, Λ_3 = 0.3, Λ_4 = 0.1, r_4 = 2.0, r_5 = 1.5, r_6 = 1.2, r_{3′} = 0.45, r_{2′} = 0.35, r_3 = 0.25, r_2 = 0.15, r_7 = 0.2, r_1 = 0.1. Although the adversarial injected noise decreases from the least secure channel (m) to the most secure channel (1), the process noise covariance has the opposite trend: it increases as we move from the least secure channel (m) to the most secure channel (1). We use the following process noise covariance parameters: q_1 = 0.6, q_2 = 0.4, q_3 = 0.3, q_4 = 0.25. Once an adversarial action is detected, the game reaches a stopping state (i.e., L = 1). The cost-to-go from a stopping state equals the number of remaining stages multiplied by the cost of using the most secure channel. Finally, we define the matrix D as

$$D = \begin{bmatrix} 0_{3\times 3} & 1_{3\times 1} \\ 1_{1\times 3} & 1 \end{bmatrix}.$$

The value of the SSG_{m×n} is shown in Figure 3.13, along with the corresponding defender and second player policies in Figures 3.14a and 3.14b. Similar to the numerical example evaluated for the M-SSG, we observe that the value of the SSG_{m×n} increases as the probability ρ increases. Such an increase indicates that the cost incurred by the defender grows with increased adversarial presence. The defender policy is supported on the actions defense 2, 3, and 4. This suggests that the defender does not choose the most secure channel due to the associated increased cost. Moreover, when the probability of adversarial behavior is low, specifically when ρ = 0.2, the defender selects defense 4 (the weakest defense) and only resorts to a more secure channel in the final stage. In essence, as the probability ρ decreases, the defender's policy shifts towards improving the posterior error covariance by selecting less secure channels. Below a certain probability threshold (ρ̂_1 = 0.223), we observe that the defender deterministically chooses the least secure channel. This corresponds to the pure policy switch indicated by Theorem 3.4.2. We observe a similar trend in the second player's policy, which is supported on the actions attack 1, 2, and 3. When ρ = 1, the second player mixes over attack 2 and 3. As the probability ρ decreases, the adversarial policy shifts towards a pure policy of the highest level of injected noise (attack 1). This application effectively demonstrates the utility of the SSG framework. Solving the SSG_{m×n} empowers the defender to navigate the trade-off between security and a lower posterior error covariance in an environment influenced by potentially adversarial actions.

3.6 Summary

In this chapter, we modeled a finite-horizon decision-making process between a defender and a second player, who has a probabilistic intent of turning into an adversarial player, as a multi-stage zero-sum game with stopping states having a finite termination threshold. We analyzed the M-SSG for an arbitrary finite termination threshold. We characterized the Nash equilibria and value of the M-SSG for the case of two actions per player. We provided a detailed analysis with respect to the termination threshold and a stage-dependent probability of the second player turning adversarial. We derived conditions under which either player opts to play a pure policy. We then extended the M-SSG to an arbitrary finite number of actions per player, termed the SSG_{m×n}, with a termination threshold of one.
We characterized a condition for the SSG๐‘šร—๐‘› under which there exists a pure Nash equilibrium as a function of the game parameters. We applied the SSG framework to a cyber-physical system in a potentially adversarial environment, involving estimation using a Kalman filter in the presence of a probable adversary. The initial set of results consisting of the 86 stochastic adversarial model along with itโ€™s application to motion planning problems was presented in [19]. More recently, the extension to large action space and its application to resilient estimation is under review in [20]. 87 CHAPTER 4 TAKEOVER ADVERSARY - FLIPDYN: RESOURCE TAKEOVER GAMES In the previous chapter, we extended the deterministic adversarial model to a stochastic adversary. Such an adversarial model can act as benign player or as an adversary governed by a Bernoulli process. However, once it switches to an adversarial mode, it remains an adversary till the end of the game. Similar to the deterministic adversary, we model the interaction between the defender and stochastic adversary as a zero-sum multi-stage game. In addition to the model, we introduced the concept of budgets, which allow for multiple action pairs from a defined set to be played before terminating the game. We characterized the Nash equilibrium of such a game and demonstrated its application on a motion planning problem and resilient estimation. In this chapter, we further enhance the capability of an adversary to completely takeover the system. Consider an adversary that has access to a prototypical CPS control loop and can achieve a takeover at various points shown in Figure 4.1a. These points include the (i) reference inputs, (ii) actuator, (iii) state, (iv) sensor and (v) control output, thereby affecting the system performance. As opposed to conventional adversaries perturbing the states of the system (actuator attack) or measurements (integrity attack) [52], this chapter supposes that an adversary completely takes over a resource and can transmit arbitrary values originating from the controlled resource. 4.1 Introduction We present the FlipDyn, a dynamic game in which two opponents (a defender and an adversary) choose strategies to optimally takeover a resource that involves a dynamical system. At any time instant, each player can take over the resource and thereby control the dynamical system after incurring a state-dependent and a control-dependent costs. The resulting model becomes a hybrid dynamical system where the discrete state (FlipDyn state) determines which player is in control of the resource (cf. Figure 4.1b). Our objective is to compute the Nash equilibria of this dynamic zero-sum game. The security attributes of any CPS are broadly classified into three categories; confidentiality, integrity, and availability [12]. Any type of attack on CPS impacts one or multiple of these 88 (a) (b) Figure 4.1 (a) Closed-loop system with adversaries present at various locations infecting the reference values, actuator, plant, measurement output and control input. (b) Closed-loop system with the adversary present between the controller and actuator trying to takeover the control signals. The takeover action at time ๐‘˜ of the defender (resp. adversary) is given by ๐œ‹0 ๐‘˜ ). A FlipIt is setup over the control signal between the defender and adversarial control. ๐‘˜ (resp. ๐œ‹1 attributes. Confidentiality in a CPS is focused on preventing adversaries from deducing the states of the dynamical system or its measurements. 
This is achieved by safeguarding against eavesdropping activities that may occur between different components of the CPS. Integrity, on the other hand, revolves around upholding the systemโ€™s operational goals. This is done by discouraging and identifying deceptive attacks on the information exchanged between various components within the CPS. Lastly, availability signifies the systemโ€™s capability to sustain its operational objectives. This is ensured by actively countering denial-of-service (DOS) attacks that may target different components of the CPS. In this work, an adversary targets both confidentiality and integrity by taking control of a dynamical system when the system is in a vulnerable state. The adversary can then send malicious control signals to drive the system to undesirable states. Such actions can lead to permanent damage, disrupting services and causing operational losses. Therefore, it becomes imperative to develop defensive strategies to continuously scan and act against adversarial behavior while striking a balance between operating costs and system performance. This paper puts forth a framework to formally model the problem of dynamic resource takeovers, design effective defense policies and analyze their performance. 89 PlantActuatorReferenceSensorControllerPlantSensorDefenderControllerAdversaryController The concept of resource takeovers, embodied in the FlipIT game, was first introduced by Van et al.[145]. This game models a zero-sum conflict between a defender and an adversary striving to take over a common resource, such as a computing device, virtual machine, or a cloud service[30]. Subsequent advancements in the FlipIT framework include the incorporation of a dynamic environment with varying costs and success probabilities of attacks, as explored by Johnson et al. [73]. The model of FlipIT was extended to multiple resources, termed as FlipThem [81], which consisted of two models: an AND model, where all resources must be compromised, and an OR model, where a single resource is chosen for compromise. There are variations of FlipThem, such as a defender which configures the resources such that an adversary has no incentive to attack beyond a certain number of resources [84]. References [157] and [158] introduced resource constraints in a two-player non-zero-sum game of multiple resource takeovers. Similar to the FlipIt model, a threshold based takeover was introduced with operational dynamics of critical infrastructure as a part of the zero-sum game [33]. Reference [123] introduced FlipNet, a graph based representation of FlipIT, investigating the graph structure, complexity of best response strategies and Nash equilibria. Beyond cybersecurity, [92] introduced the model of FlipIT in supervisory control and data acquisition (SCADA) to evaluate the impact of cyberattacks with insider assistance. A diverse array of applications for the FlipIT model have been explored in secure a variety of systems [30]. Notably, the aforementioned works primarily focused on resource takeovers within a static system, lacking consideration for the dynamic evolution of physical systems. In contrast, our work incorporates the dynamics of a physical system in the game of resource takeovers between an adversary and a defender. The framework outlined by Ding et al.[47] for analyzing probabilistic reachability in discrete- time stochastic hybrid systems addresses the synthesis of safety controls in a finite-horizon zero- sum stochastic game. 
Our work also deals with a discrete-time game involving two hybrid states. However, a key distinction is that only one player has control over the system in a hybrid state at any given time, while allowing for a potential switch to the other hybrid state. A similar investigation into safe controller designs within two hybrid states was conducted by Dallal et al.[44], formulating 90 a game between a controller aiming to enforce a safety property and an environment seeking to violate it. The solution proposed by Dallal et al.[44] is confined to finite states and actions, whereas our work extends to continuous states and actions. Fiscko et al.[55] introduce a multi-player game with a superplayer controlling a parameterized utility of all the players, resulting in a cost-optimal policy derived from dynamic programming. Building upon this, Fiscko et al.[53] focus on systems with multiple agents that can be clustered, with the superplayer applying a cluster-based policy. The work by Fiscko et al.[54] generalizes the cluster-based approach of a multi-player game with a superplayer to a Transition Independent Markov Decision Process (TI-MDP), proposing a Clustered Value Iteration method to solve the TI-MDP. The work presented in this chapter can be mapped to the case of two clusters, albeit with the added challenge of determining control policies in the presence of coupling between the clusters. The application of game theory to formulate security policies in cyber-physical control systems was addressed by Zhu et al.[161]. The setup introduced by Kontouras et al.[75] closely resembles the game of resource takeover in the dynamical system FlipDyn[16]. However, Kontouras et al.[75] do not address the action of takeovers; they assume takeovers can occur periodically and focus solely on deriving control policies for both the defender and adversary, limited to a one-dimensional control input. Expanding on this, Kontouras et al.[76] incorporate a multi-dimensional control input, solving for a contractive control against covert attacks subject to control and state constraints. The authors in[135] introduce a covert misappropriation of a plant, where a feedback structure allows an attacker to take over control of the plant while remaining hidden from the supervisory system. Similar covert attacks have also been explored that can take control of a load frequency control (LFC) system using a covert reference signal [105]. In contrast to previous research, this chapter provides a feedback signal to infer who is in control and offers the ability to take control of the plant at any instant of time, balancing a trade-off between operational cost and performance. Our prior work [16] presented a game of resource takeovers in a dynamical system. However, we assumed the control policies were time-invariant for both the defender and adversary. In this chapter, we relax this assumption of static control policies and solve for both the takeover strategies 91 and control policies. The contributions of this work are four-fold: 1. Takeover strategies for any discrete-time dynamical system: We formulate a two-player zero-sum takeover game involving a defender and an adversary seeking to control a dynamical system in discrete-time. This game encompasses dynamic takeover scenarios, considering costs that are contingent on the systemโ€™s state and control inputs. 
Assuming knowledge of the control policies, we establish analytical expressions for the NE takeover strategies and saddle-point values, in the space of pure and mixed strategies. 2. Optimal linear state-feedback control policies: For a linear dynamical system with quadratic takeover, state, and control costs, we derive an analytic state-feedback control policy for both the defender and adversary. Furthermore, we provide sufficient conditions under which the game admits a saddle-point in the space of feedback control policies that are affine in the state. 3. Exact takeover strategies and saddle-point value parameters for scalar/1โˆ’ dimensional system: For a linear dynamical system in one dimension with quadratic takeover, state and control costs, we derive the corresponding analytical state-feedback control policies for both the defender and adversary. In particular, we derive closed-form expressions for the NE takeover and parameterized value of the game independent of the state. 4. Approximate takeover strategies and saddle-point value parameters for ๐‘›โˆ’ dimensional system: For a linear dynamical system in ๐‘› dimensions with quadratic takeover, state and control costs, we derive upper and lower bounds on the defender and the attacker value functions, respectively, when both players use a linear state-feedback control policy. Using these bounds, we derive approximate NE takeover strategies and the corresponding value of the game in a parameterized form. We illustrate our results for the scalar/1โˆ’dimensional and ๐‘›โˆ’dimensional systems through numerical examples. 92 Outline: This chapter is structured as follows. Section 4.2 formally defines the FlipDyn problem, considering unknown control policies with state and control-dependent costs. In Sec- tion 4.3, we outline a solution methodology applicable to discrete-time dynamical systems with non-negative costs, under the assumption of known control policies. Section 4.5.1 presents a solution for determining optimal linear state-feedback control policies, specifically designed for linear discrete-time dynamical systems featuring quadratic costs. In Section 4.5.2, we delve into the takeover strategies and saddle-point value parameters for the scalar/1โˆ’dimensional system. Finally, Section 4.5.3 addresses the approximate takeover strategies and saddle-point value parameters for the ๐‘›โˆ’dimensional system. The chapter concludes with a discussion in Section 4.6. 4.2 Problem Formulation Consider a discrete-time dynamical system, whose state evolution is given by: ๐‘ฅ๐‘˜+1 = ๐น0 ๐‘˜ (๐‘ฅ๐‘˜ , ๐‘ข๐‘˜ ), (4.1) where ๐‘˜ denotes the discrete time index, taking values from the integer set K := {1, 2, . . . , ๐ฟ} โŠ‚ N, ๐‘ฅ๐‘˜ โˆˆ X โІ R๐‘› is the state of the system with X denoting the Euclidean state space, ๐‘ข๐‘˜ โˆˆ U๐‘˜ โŠ‚ R๐‘š is the control input of the system with U๐‘˜ as the Euclidean control input space at time instant ๐‘˜, and ๐น0 ๐‘˜ : X ร— U๐‘˜ โ†’ X is the state transition function. We consider a single adversary trying to takeover the dynamical system resource. In particular, we assume the adversary to be located between the controller and actuator. The FlipDyn state, ๐›ผ๐‘˜ โˆˆ {0, 1} indicates whether the defender (๐›ผ๐‘˜ = 0) or the adversary (๐›ผ๐‘˜ = 1) has taken over the system at time ๐‘˜. 
We describe a takeover through the action π^j_k ∈ {0, 1}, which denotes the action of player j ∈ {0, 1} at time k, where j = 0 denotes the defender and j = 1 denotes the adversary. The binary FlipDyn state update based on the players' takeover actions satisfies

$$\alpha_{k+1} = \begin{cases} \alpha_k, & \text{if } \pi^1_k = \pi^0_k, \\ j, & \text{if } \pi^j_k = 1. \end{cases} \qquad (4.2)$$

The FlipDyn update (4.2) states that if both players act to take over the resource at the same time instant, then their actions are nullified and the FlipDyn state remains unchanged. However, if the resource is under the control of one of the players who does not exert a takeover action, while the other player moves to gain control at time k + 1, then the FlipDyn state toggles at time k + 1. Finally, if a player is already in control and continues the takeover while the other player remains idle, then the FlipDyn state is unchanged. Thus, the FlipDyn dynamics are compactly described by

$$\alpha_{k+1} = \big(\bar\pi^0_k \bar\pi^1_k + \pi^0_k \pi^1_k\big)\,\alpha_k + \bar\pi^0_k\big(\pi^0_k + \pi^1_k\big), \qquad (4.3)$$

where, for a binary variable, x̄ := 1 - x. Takeovers are mutually exclusive, i.e., only one player is in control of the system at any given time. The continuous state x_{k+1} at time k + 1 depends on α_{k+1}. The inclusion of an adversary modifies the state evolution (4.1), resulting in

$$x_{k+1} = (1 - \alpha_{k+1})\, F^0_k(x_k, u_k) + \alpha_{k+1}\, F^1_k(x_k, w_k), \qquad (4.4)$$

where F^1_k : X × W_k → X is the state transition function under the adversary's control, and w_k ∈ W_k ⊂ R^p is the attack input, with W_k the Euclidean attack input space.

In this work, we aim to determine an optimal control input for the dynamical system along with the corresponding takeover strategy for each player. Given a non-zero initial state x_1, we pose the resource takeover and dynamic system control problem as a zero-sum dynamic game described by the dynamics (4.4) and (4.3) over a finite horizon L, where the defender aims to minimize the net cost

$$J(x_1, \alpha_1, \{\pi^1_{\mathrm L}\}, \{\pi^0_{\mathrm L}\}, u_{\mathrm L}, w_{\mathrm L}) = \sum_{t=1}^{L} \big[\, g_t(x_t, \alpha_t) + \pi^0_t d_t(x_t) + \bar\alpha_t m_t(u_t) - \pi^1_t a_t(x_t) - \alpha_t n_t(w_t) \,\big] + g_{L+1}(x_{L+1}, \alpha_{L+1}), \qquad (4.5)$$

where g_t(x_t, α_t) : R^n × {0, 1} → R represents the state cost, with g_{L+1}(x_{L+1}, α_{L+1}) : R^n × {0, 1} → R the terminal state cost; d_t(x_t) : R^n → R and a_t(x_t) : R^n → R are the instantaneous takeover costs of the defender and adversary, respectively; and m_t(u_t) : R^m → R and n_t(w_t) : R^p → R are the control costs of the defender and adversary, respectively. We use the notation {π^j_L} := {π^j_1, ..., π^j_L}, j ∈ {0, 1}, u_L := {u_1, ..., u_L}, and w_L := {w_1, ..., w_L}. In contrast, the adversary aims to maximize the cost function (4.5), leading to a zero-sum dynamic game, termed the FlipDyn game [16] with control.
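To make the hybrid update concrete, a small sketch of (4.3)-(4.4) is given below; flipdyn_step is an illustrative name, and F0, F1 stand in for the two transition maps.

def flipdyn_step(alpha, pi0, pi1, x, u, w, F0, F1):
    """One step of the FlipDyn state update (4.3) and hybrid dynamics (4.4) (sketch).
    pi0, pi1 are the binary takeover actions; F0, F1 are the two transition maps."""
    alpha_next = ((1 - pi0) * (1 - pi1) + pi0 * pi1) * alpha + (1 - pi0) * pi1  # equivalent form of (4.3)
    x_next = (1 - alpha_next) * F0(x, u) + alpha_next * F1(x, w)                # (4.4)
    return alpha_next, x_next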
We seek Nash equilibrium (NE) solutions of the game (4.5). To guarantee the existence of a pure or mixed NE takeover strategy, we expand the set of player policies to behavioral strategies, i.e., probability distributions over the space of discrete actions at each time step [61]. Specifically, let

$$y^{\alpha_k}_k = \begin{bmatrix} 1 - \beta^{\alpha_k}_k & \beta^{\alpha_k}_k \end{bmatrix}^{\mathsf T}, \qquad
z^{\alpha_k}_k = \begin{bmatrix} 1 - \gamma^{\alpha_k}_k & \gamma^{\alpha_k}_k \end{bmatrix}^{\mathsf T}, \qquad (4.6)$$

be the behavioral strategies of the defender and adversary at time instant k for the FlipDyn state α_k, with β^{α_k}_k ∈ [0, 1] and γ^{α_k}_k ∈ [0, 1], respectively. The takeover actions π^0_k ∼ y^{α_k}_k and π^1_k ∼ z^{α_k}_k of each player at any time k are sampled from the corresponding behavioral strategy. The behavioral strategies satisfy y^{α_k}_k, z^{α_k}_k ∈ Δ_2, where Δ_2 is the probability simplex in two dimensions. Over the finite horizon L, let y_L := {y^{α_1}_1, y^{α_2}_2, ..., y^{α_L}_L} ∈ Δ^L_2 and z_L := {z^{α_1}_1, z^{α_2}_2, ..., z^{α_L}_L} ∈ Δ^L_2 be the sequences of defender and adversary behavioral strategies. Thus, the expected outcome of the zero-sum game (4.5) is given by

$$J_E(x_1, \alpha_1, y_{\mathrm L}, z_{\mathrm L}, u_{\mathrm L}, w_{\mathrm L}) := \mathbb{E}\big[\,J(x_1, \alpha_1, \{\pi^1_{\mathrm L}\}, \{\pi^0_{\mathrm L}\}, u_{\mathrm L}, w_{\mathrm L})\,\big], \qquad (4.7)$$

where the expectation is computed with respect to the distributions y_L and z_L. Specifically, we seek a saddle-point solution (y*_L, z*_L, u*_L, w*_L) in the space of behavioral strategies and control inputs such that, for any non-zero initial state x_0 ∈ X and α_0 ∈ {0, 1},

$$\underline{J}_E \le J_E(x_0, \alpha_0, y^*_{\mathrm L}, z^*_{\mathrm L}, u^*_{\mathrm L}, w^*_{\mathrm L}) \le \bar J_E,$$

where J̲_E := J_E(x_0, α_0, y*_L, z_L, u*_L, w_L) and J̄_E := J_E(x_0, α_0, y_L, z*_L, u_L, w*_L). The FlipDyn game with control of a dynamical system, termed FlipDyn-Con, is completely defined by the expected cost (4.7) and the space of player takeover strategies and control input policies, subject to the dynamics (4.4) and (4.3). In the next section, we derive the outcome of the FlipDyn game with control for both FlipDyn states, α = 0 and α = 1, for general systems.

4.3 FlipDyn for general systems

We begin by deriving the NE takeover strategies of the FlipDyn game, given any control policy pair u_L, w_L, in each of the two takeover states. Our approach begins by defining the saddle-point value of the game.

4.3.1 Saddle-point value

At time instant k ∈ K, given an initial FlipDyn state, the saddle-point value consists of the instantaneous state and control-dependent cost and an additive cost-to-go based on the players' takeover actions. The cost-to-go is determined via a cost-to-go matrix in each FlipDyn state, represented by Ξ^0_{k+1} ∈ R^{2×2} and Ξ^1_{k+1} ∈ R^{2×2} for the FlipDyn states α_k = 0 and α_k = 1, respectively. Let V^0_k(x, u_k, Ξ^0_{k+1}) and V^1_k(x, w_k, Ξ^1_{k+1}) be the saddle-point values at time instant k with continuous state x, for a given control policy pair u_k, w_k and cost-to-go matrices Ξ^0_{k+1} and Ξ^1_{k+1} corresponding to the FlipDyn states α = 0 and 1, respectively.
The entries of the cost-to-go matrix Ξ^0_{k+1} corresponding to each pair of takeover actions are given by

$$\Xi^0_{k+1} = \begin{bmatrix} v^0_{k+1} & v^1_{k+1} - a_k(x) \\ v^0_{k+1} + d_k(x) & v^0_{k+1} + d_k(x) - a_k(x) \end{bmatrix}, \qquad (4.8)$$

where the rows correspond to the defender's actions {Idle, Takeover}, the columns to the adversary's actions {Idle, Takeover}, and

$$v^0_{k+1} := V^0_{k+1}\big(F^0_k(x, u_k), u_{k+1}, \Xi^0_{k+2}\big), \qquad (4.9)$$
$$v^1_{k+1} := V^1_{k+1}\big(F^1_k(x, w_k), w_{k+1}, \Xi^1_{k+2}\big). \qquad (4.10)$$

The entries of Ξ^0_{k+1} are determined using the defender and adversary control policies and the dynamics (4.4) and (4.3). X(i, j) corresponds to the (i, j)-th entry of the matrix X. The diagonal entries Ξ^0_{k+1}(1, 1) and Ξ^0_{k+1}(2, 2) correspond to both the defender and the adversary acting idle and taking over, respectively. The off-diagonal entries correspond to exactly one player taking over the resource. The entries of the cost-to-go matrix couple the value functions of the two FlipDyn states. Thus, at time k, for a given control policy u_k, state x, and α_k = 0, the saddle-point value satisfies

$$V^0_k(x, u_k, \Xi^0_{k+1}) = g_k(x, 0) + m_k(u_k) + \mathrm{Val}(\Xi^0_{k+1}), \qquad (4.11)$$

where Val(X^{α_k}_{k+1}) := min_{y^{α_k}_k} max_{z^{α_k}_k} (y^{α_k}_k)^T X_{k+1} z^{α_k}_k represents the (mixed) saddle-point value of the zero-sum matrix X_{k+1} for the FlipDyn state α_k, and Ξ^0_{k+1} ∈ R^{2×2} is the cost-to-go zero-sum matrix. The defender's (row player) and adversary's (column player) actions result either in an entry of Ξ^0_{k+1} (if the matrix has a saddle point in pure strategies) or in an expected value, yielding the cost-to-go from state x at time k.
Similarly, for ๐›ผ๐‘˜ = 1, โˆ€๐‘˜, the cost-to-go matrix entries ฮž1 ๐‘˜+1 and the saddle-point value are given by: Idle Takeover Idle Takeover ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ (cid:124) ๐‘ฃ1 ๏ฃน ๐‘˜+1 โˆ’ ๐‘Ž๐‘˜ (๐‘ฅ) ๐‘ฃ1 ๏ฃบ ๐‘˜+1 ๏ฃบ ๏ฃบ ๐‘ฃ0 ๐‘ฃ1 + ๐‘‘๐‘˜ (๐‘ฅ) ๐‘˜+1 + ๐‘‘๐‘˜ (๐‘ฅ) โˆ’ ๐‘Ž๐‘˜ (๐‘ฅ) ๏ฃบ ๐‘˜+1 ๏ฃบ (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) ๏ฃป (cid:123)(cid:122) (cid:125) ฮž1 ๐‘˜+1 , with ๐‘‰ 1 ๐‘˜ (๐‘ฅ, ๐‘ค ๐‘˜ , ฮž1 ๐‘˜+1) = ๐‘”๐‘˜ (๐‘ฅ, 1) โˆ’ ๐‘›๐‘˜ (๐‘ค ๐‘˜ ) + Val(ฮž1 ๐‘˜+1). (4.12) (4.13) With the saddle-point values established in each of the FlipDyn states, in the following subsec- tion, we will characterize the NE takeover strategies and the saddle-point values for the entire time horizon ๐ฟ. 4.3.2 NE takeover strategies of the FlipDyn game In order to characterize the saddle-point value of the game, we restrict the cost functions to a particular domain, stated in the following mild assumption. Assumption 4.3.1 [Non-negative costs] For any time instant ๐‘˜ โˆˆ K, the state and control depen- dent costs ๐‘”๐‘˜ (๐‘ฅ, ๐›ผ), ๐‘‘๐‘˜ (๐‘ฅ), ๐‘Ž๐‘˜ (๐‘ฅ), ๐‘š๐‘˜ (๐‘ข๐‘˜ ), ๐‘›๐‘˜ (๐‘ค ๐‘˜ ), for all ๐‘ฅ โˆˆ X, ๐‘ข๐‘˜ โˆˆ U๐‘˜ , ๐‘ค โˆˆ W๐‘˜ , and ๐›ผ โˆˆ {0, 1} are non-negative (Rโ‰ฅ0). Assumption 4.3.1 enables us to compare the entries of the cost-to-go matrix without changes in the sign of the costs, thereby, characterizing the strategies of the players (pure or mixed strategies). 97 Under this assumption, we derive the following result to compute a recursive saddle-point value for the entire horizon length and the NE takeover strategies for both the players. Theorem 4.3.2 (Case ๐›ผ๐‘˜ = 0) Under Assumption 4.3.1, for a given choice of control policies, ๐‘ขL and ๐‘คL, the unique NE takeover strategies of the FlipDyn-Con game (4.7) at any time ๐‘˜ โˆˆ K, subject to the continuous state dynamics (4.4) and FlipDyn dynamics (4.3) are given by: ๐‘ฆ0โˆ— ๐‘˜ = ๐‘ง0โˆ— ๐‘˜ = ๏ฃณ ๏ฃฑ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃฒ ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด ๏ฃฑ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃฒ ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด ๏ฃณ (cid:20) ๐‘Ž๐‘˜ (๐‘ฅ) ห‡ฮž๐‘˜+1 1 โˆ’ (cid:21) T , ๐‘Ž๐‘˜ (๐‘ฅ) ห‡ฮž๐‘˜+1 if ห‡ฮž๐‘˜+1 > ๐‘‘๐‘˜ (๐‘ฅ), ห‡ฮž๐‘˜+1 > ๐‘Ž๐‘˜ (๐‘ฅ), (cid:20) 1 0 (cid:20) (cid:20) (cid:20) 1 โˆ’ ๐‘‘๐‘˜ (๐‘ฅ) ห‡ฮž๐‘˜+1 ๐‘‘๐‘˜ (๐‘ฅ) ห‡ฮž๐‘˜+1 0 1 1 0 (cid:21) T (cid:21) T (cid:21) T (cid:21) T , , , , otherwise, if if ห‡ฮž๐‘˜+1 > ๐‘‘๐‘˜ (๐‘ฅ), ห‡ฮž๐‘˜+1 > ๐‘Ž๐‘˜ (๐‘ฅ), ห‡ฮž๐‘˜+1 โ‰ค ๐‘‘๐‘˜ (๐‘ฅ), , ห‡ฮž๐‘˜+1 > ๐‘Ž๐‘˜ (๐‘ฅ), otherwise, (4.14) (4.15) where ห‡ฮž๐‘˜+1 := ๐‘‰ 1 ๐‘˜+1(๐น1 The saddle-point value is given by: ๐‘˜ (๐‘ฅ, ๐‘ค ๐‘˜ ), ๐‘ค ๐‘˜+1, ฮž1 ๐‘˜+2) โˆ’ ๐‘‰ 0 ๐‘˜+1 (๐น0 ๐‘˜ (๐‘ฅ, ๐‘ข๐‘˜ ), ๐‘ข๐‘˜+1, ฮž0 ๐‘˜+2 ). 
๐‘˜ (๐‘ฅ, ๐‘ข๐‘˜ , ฮž0 ๐‘‰ 0 ๐‘˜+1) = ๏ฃฑ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃฒ ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด ๏ฃณ ๐‘”๐‘˜ (๐‘ฅ, 0) + ๐‘ฃ0 ๐‘˜+1 + ๐‘š๐‘˜ (๐‘ข๐‘˜ ) + ๐‘‘๐‘˜ (๐‘ฅ) โˆ’ ๐‘Ž๐‘˜ (๐‘ฅ)๐‘‘๐‘˜ (๐‘ฅ) ห‡ฮž๐‘˜+1 ๐‘”๐‘˜ (๐‘ฅ, 0) + ๐‘š๐‘˜ (๐‘ข๐‘˜ ) + ๐‘ฃ1 ๐‘˜+1 โˆ’ ๐‘Ž๐‘˜ (๐‘ฅ), ๐‘”๐‘˜ (๐‘ฅ, 0) + ๐‘ฃ0 ๐‘˜+1 + ๐‘š๐‘˜ (๐‘ข๐‘˜ ), 98 , if ห‡ฮž๐‘˜+1 > ๐‘‘๐‘˜ (๐‘ฅ), ห‡ฮž๐‘˜+1 > ๐‘Ž๐‘˜ (๐‘ฅ), if ห‡ฮž๐‘˜+1 โ‰ค ๐‘‘๐‘˜ (๐‘ฅ), ห‡ฮž๐‘˜+1 > ๐‘Ž๐‘˜ (๐‘ฅ), otherwise. (4.16) (4.17) (4.18) ห‡ฮž๐‘˜+1 > ๐‘‘๐‘˜ (๐‘ฅ) ห‡ฮž๐‘˜+1 > ๐‘Ž๐‘˜ (๐‘ฅ) ห‡ฮž๐‘˜+1 > ๐‘‘๐‘˜ (๐‘ฅ) ห‡ฮž๐‘˜+1 โ‰ค ๐‘Ž๐‘˜ (๐‘ฅ) (Case ๐›ผ๐‘˜ = 1) The unique NE takeover strategies are (cid:20) (cid:20) (cid:20) ๏ฃฑ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃฒ ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด ๏ฃฑ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃฒ ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด ๏ฃณ ๏ฃณ ๐‘ฆ1โˆ— ๐‘˜ = ๐‘ง1โˆ— ๐‘˜ = 1 โˆ’ ๐‘Ž๐‘˜ (๐‘ฅ) ห‡ฮž๐‘˜+1 (cid:21) T , ๐‘Ž๐‘˜ (๐‘ฅ) ห‡ฮž๐‘˜+1 0 1 1 0 (cid:20) ๐‘‘๐‘˜ (๐‘ฅ) ห‡ฮž๐‘˜+1 1 โˆ’ ๐‘‘๐‘˜ (๐‘ฅ) ห‡ฮž๐‘˜+1 (cid:21) T (cid:21) T (cid:21) T , , , if if ห‡ฮž๐‘˜+1 > ๐‘‘๐‘˜ (๐‘ฅ), ห‡ฮž๐‘˜+1 > ๐‘Ž๐‘˜ (๐‘ฅ), ห‡ฮž๐‘˜+1 > ๐‘‘๐‘˜ (๐‘ฅ), ห‡ฮž๐‘˜+1 โ‰ค ๐‘Ž๐‘˜ (๐‘ฅ), otherwise, if ห‡ฮž๐‘˜+1 > ๐‘‘๐‘˜ (๐‘ฅ), ห‡ฮž๐‘˜+1 > ๐‘Ž๐‘˜ (๐‘ฅ), (cid:20) 1 (cid:21) T , 0 otherwise. The saddle-point value is given by: ๐‘”๐‘˜ (๐‘ฅ, 1) + ๐‘ฃ1 ๐‘˜+1 โˆ’ ๐‘›๐‘˜ (๐‘ค ๐‘˜ ) โˆ’ ๐‘Ž๐‘˜ (๐‘ฅ) + ๐‘Ž๐‘˜ (๐‘ฅ)๐‘‘๐‘˜ (๐‘ฅ) ห‡ฮž๐‘˜+1 , if ๐‘”๐‘˜ (๐‘ฅ, 1) โˆ’ ๐‘›๐‘˜ (๐‘ค ๐‘˜ ) + ๐‘ฃ0 ๐‘˜+1 + ๐‘‘๐‘˜ (๐‘ฅ), if ๐‘˜ (๐‘ฅ, ๐‘ค ๐‘˜ , ฮž1 ๐‘‰ 1 ๐‘˜+1) = ๏ฃฑ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃฒ ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด ๏ฃณ ๐‘”๐‘˜ (๐‘ฅ, 1) + ๐‘ฃ1 ๐‘˜+1 โˆ’ ๐‘›๐‘˜ (๐‘ค ๐‘˜ ), otherwise. The boundary condition at ๐‘˜ = ๐ฟ is given by: ๐‘ข๐ฟ+1 := 0๐‘š, ๐‘ค ๐ฟ+1 := 0๐‘, ฮž1 ๐ฟ+2 := 02ร—2, ฮž0 ๐ฟ+2 := 02ร—2, where 0๐‘–ร— ๐‘— โˆˆ R๐‘–ร— ๐‘— represents a matrix of zeros. (4.19) (4.20) โ–ก Proof: We will only derive the NE takeover strategies and saddle-point value for case of ๐›ผ๐‘˜ = 0. We leave out the derivations for ๐›ผ = 1 as they are analogous to ๐›ผ = 0. There are three cases to consider for the 2 ร— 2 matrix game defined by the matrix in (4.8). We start by identifying the NE takeover in pure strategies in the cost-to-go matrix ฮž0 ๐‘˜ (4.11). 99 i) Pure strategy: Both the defender and adversary choose the action of staying idle. First, we determine the conditions under which the defender always chooses to play idle. Under Assump- tion 4.3.1, we compare the entries of ฮž0 ๐‘˜+1 when the adversary opts to remain idle to obtain the condition: ๐‘ฃ0 ๐‘˜+1 โ‰ค ๐‘ฃ0 ๐‘˜+1 + ๐‘‘๐‘˜ (๐‘ฅ). (4.21) Similarly, when the adversary opts to takeover, if the condition ๐‘˜+1 โ‰ค ๐‘ฃ0 ๐‘ฃ1 ๐‘˜+1 + ๐‘‘๐‘˜ (๐‘ฅ), โ‡’๐‘ฃ1 ๐‘˜+1 โˆ’ ๐‘ฃ0 ๐‘˜+1 โ‰ค ๐‘‘๐‘˜ (๐‘ฅ) holds, then defender always remains idle. Next, we determine the conditions for the adversary to always remain idle. Under Assumption 4.3.1, when the defender chooses to takeover, we compare the entries of ฮž0 ๐‘˜+1 to infer the condition ๐‘˜+1 + ๐‘‘๐‘˜ (๐‘ฅ) โ‰ฅ๐‘ฃ0 ๐‘ฃ0 ๐‘˜+1 + ๐‘‘๐‘˜ (๐‘ฅ) โˆ’ ๐‘Ž๐‘˜ (๐‘ฅ) โ‡’ 0 โ‰ฅ โˆ’ ๐‘Ž๐‘˜ (๐‘ฅ) always holds. Finally, when the defender opts to remain idle, if the condition, ๐‘˜+1 โ‰ค ๐‘ฃ0 ๐‘ฃ1 ๐‘˜+1 + ๐‘Ž๐‘˜ (๐‘ฅ), holds, then the adversary always remains idle. 
The saddle-point value corresponding to the pure strategy of both players playing idle is the entry $\Xi^0_{k+1}(1,1)$, given by
\[
V^0_k(x, u_k, \Xi^0_{k+1}) = g_k(x,0) + v^0_{k+1} + m_k(u_k).
\]

ii) Pure strategy: The defender chooses to stay idle, whereas the adversary chooses to take over. We derive the conditions under which the adversary opts to take over. When the defender plays idle, if the condition
\[
v^1_{k+1} \geq v^0_{k+1} + a_k(x)
\]
holds, then the adversary always opts to take over. The saddle point corresponding to the pure strategy in which the defender remains idle while the adversary plays a takeover action is the entry $\Xi^0_{k+1}(1,2)$, given by
\[
V^0_k(x, u_k, \Xi^0_{k+1}) = g_k(x,0) + m_k(u_k) + v^1_{k+1} - a_k(x).
\]
Finally, we derive conditions under which the cost-to-go matrix $\Xi^0_{k+1}$ has a saddle point in mixed strategies.

iii) Mixed strategies: Mixed strategies are played by both players if none of the pure-strategy conditions are met, i.e., when both
\[
v^1_{k+1} - v^0_{k+1} > d_k(x), \qquad v^1_{k+1} - v^0_{k+1} > a_k(x)
\]
hold. In this case, no single row or column dominates. A mixed-strategy NE takeover for any $2\times 2$ game is given by (cf. [132])
\[
y^{0*}_k = \Big[\; \tfrac{a_k(x)}{\check{\Xi}_{k+1}} \quad 1 - \tfrac{a_k(x)}{\check{\Xi}_{k+1}} \;\Big]^{\mathrm{T}}, \qquad
z^{0*}_k = \Big[\; 1 - \tfrac{d_k(x)}{\check{\Xi}_{k+1}} \quad \tfrac{d_k(x)}{\check{\Xi}_{k+1}} \;\Big]^{\mathrm{T}}.
\]
Thus, for the FlipDyn state $\alpha_k = 0$, $k \in \mathcal{K}$, we obtain the complete NE takeover strategies (pure and mixed) of the defender and adversary in (4.14) and (4.15), respectively. The mixed saddle-point value of the $2\times 2$ zero-sum matrix $\Xi^0_{k+1}$ is given by (cf. [132])
\[
y^{0*\,\mathrm{T}}_k\, \Xi^0_{k+1}\, z^{0*}_k := v^0_{k+1} + d_k(x) - \frac{a_k(x)\, d_k(x)}{\check{\Xi}_{k+1}}.
\]
Collecting the saddle-point values corresponding to the pure- and mixed-strategy NE, we obtain the saddle-point value update equation over the horizon $L$ in (4.16). Notice that $g_k(x,0)$ and $m_k(u_k)$ represent the instantaneous state- and control-dependent costs and are not part of the zero-sum matrix, as shown in (4.11). The boundary conditions (4.20) imply that the saddle-point values at $k = L+1$ satisfy
\[
V^0_{L+1}(x, 0_m, 0_{2\times 2}) = g_{L+1}(x, 0), \qquad V^1_{L+1}(x, 0_p, 0_{2\times 2}) = g_{L+1}(x, 1). \qquad \square
\]

For a finite cardinality of the state space $\mathcal{X}$, fixed player policies $u_k$ and $w_k$, $k \in \mathcal{K}$, and a finite horizon $L$, Theorem 4.3.2 yields an exact saddle-point value of the FlipDyn-Con game (4.7). However, the computational and storage complexities scale undesirably with the cardinality of $\mathcal{X}$, especially in continuous state spaces. For this purpose, in the next section, we provide a parametric form of the saddle-point value for the case of linear dynamics with quadratic costs.

4.4 FlipDyn for LQ Problems

To address continuous state spaces arising in the FlipDyn-Con game, we restrict our attention to a linear dynamical system with quadratic costs (LQ problems). Furthermore, we segment our analysis into two distinct cases: a 1-dimensional (scalar) and an $n$-dimensional system.
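To make the backward recursion of Theorem 4.3.2 concrete in the finite-state setting discussed above, the following is a minimal sketch (not the implementation used in this thesis). It assumes the dynamics maps F0, F1, the costs g, d, a, m, n and the fixed policies u, w are supplied as Python callables over a finite state set X, and it reuses the val_2x2 helper from the earlier sketch; its storage grows with the cardinality of X times the horizon, which is precisely the scaling issue that motivates the LQ parametrization below.

```python
import numpy as np

def solve_flipdyn_tabular(X, L, F0, F1, g, d, a, m, n, u, w):
    """Exact saddle-point values V0[k][x], V1[k][x] of the finite-state FlipDyn game
    for fixed control policies, computed backward in time (Theorem 4.3.2)."""
    V0 = {L + 1: {x: g(L + 1, x, 0) for x in X}}   # boundary values at k = L+1
    V1 = {L + 1: {x: g(L + 1, x, 1) for x in X}}
    takeover = {}
    for k in range(L, 0, -1):
        V0[k], V1[k] = {}, {}
        for x in X:
            v0 = V0[k + 1][F0(k, x, u(k, x))]      # continuation if the defender retains
            v1 = V1[k + 1][F1(k, x, w(k, x))]      # continuation if the adversary retains
            Xi0 = np.array([[v0,           v1 - a(k, x)],
                            [v0 + d(k, x), v0 + d(k, x) - a(k, x)]])
            Xi1 = np.array([[v1,           v1 - a(k, x)],
                            [v0 + d(k, x), v1 + d(k, x) - a(k, x)]])
            val0, y0, z0 = val_2x2(Xi0)
            val1, y1, z1 = val_2x2(Xi1)
            V0[k][x] = g(k, x, 0) + m(k, u(k, x)) + val0     # cf. (4.11)
            V1[k][x] = g(k, x, 1) - n(k, w(k, x)) + val1     # cf. (4.13)
            takeover[(k, x)] = (y0, z0, y1, z1)
    return V0, V1, takeover
```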
The dynamics of the linear system at time instant $k \in \mathcal{K}$, when the defender has taken over, satisfy
\[
x_{k+1} = F^0_k(x_k, u_k) := E_k x_k + B_k u_k, \tag{4.22}
\]
where $E_k \in \mathbb{R}^{n\times n}$ denotes the state transition matrix and $B_k \in \mathbb{R}^{n\times m}$ the defender control matrix. Similarly, the dynamics of the linear system when the adversary has taken over satisfy
\[
x_{k+1} = F^1_k(x_k, w_k) := E_k x_k + H_k w_k, \tag{4.23}
\]
where $H_k \in \mathbb{R}^{n\times p}$ denotes the adversary control matrix. The FlipDyn dynamics (4.4) then reduce to
\[
x_{k+1} = E_k x_k + (1-\alpha_k) B_k u_k + \alpha_k H_k w_k. \tag{4.24}
\]
The stage, takeover and control quadratic costs of the players are given by
\[
g_k(x, \alpha_k) = x^{\mathrm{T}} G^{\alpha_k}_k x, \quad d_k(x) = x^{\mathrm{T}} D_k x, \quad a_k(x) = x^{\mathrm{T}} A_k x, \quad m_k(u) = u^{\mathrm{T}} M_k u, \quad n_k(w) = w^{\mathrm{T}} N_k w, \tag{4.25}
\]
where $G^{\alpha_k}_k \in \mathbb{S}^{n\times n}_{+}$, $D_k \in \mathbb{S}^{n\times n}_{+}$, $A_k \in \mathbb{S}^{n\times n}_{+}$, $M_k \in \mathbb{S}^{m\times m}_{+}$ and $N_k \in \mathbb{S}^{p\times p}_{+}$ are positive definite matrices.

Remark 4.4.1 The control policies of the two players act in a mutually exclusive manner within their respective FlipDyn states. Specifically, the defender control policy $u_k$ affects the dynamics when the FlipDyn state is $\alpha_k = 0$, while the adversary control policy $w_k$ comes into effect when $\alpha_k = 1$. If we constrain the control policies of both players to be functions of the continuous state $x$, then the saddle-point value in each FlipDyn state depends solely on the continuous state $x$, rather than on both the continuous state $x$ and the control input of the corresponding FlipDyn state. This restriction is formally stated in the following assumption.

Assumption 4.4.2 At any time instant $k \in \mathcal{K}$, the defender and adversary control policies are linear state-feedback policies in the continuous state $x$, defined by
\[
u_k(x) := K_k x, \qquad w_k(x) := W_k x, \tag{4.26}
\]
where $K_k \in \mathbb{R}^{m\times n}$ and $W_k \in \mathbb{R}^{p\times n}$ are known defender and adversary control gain matrices, respectively.

Under Assumption 4.4.2, and from the saddle-point values (4.16) and (4.19), we postulate a parametric form for the saddle-point value in each FlipDyn state:
\[
V^0_k(x, u_k(x), \Xi^0_{k+1}) \;\Rightarrow\; V^0_k(x) := x^{\mathrm{T}} P^0_k x, \qquad
V^1_k(x, w_k(x), \Xi^1_{k+1}) \;\Rightarrow\; V^1_k(x) := x^{\mathrm{T}} P^1_k x,
\]
where $P^0_k$ and $P^1_k$ are $n\times n$ real symmetric matrices corresponding to the FlipDyn states $\alpha = 0$ and $1$, respectively. We impose Assumption 4.4.2 to enable factoring out the state $x$ while computing the saddle-point value update backward in time. Under Assumption 4.4.2, the defender and adversary dynamics become
\[
x_{k+1} = \widetilde{B}_k x_k := (E_k + B_k K_k) x_k, \qquad x_{k+1} = \widetilde{W}_k x_k := (E_k + H_k W_k) x_k. \tag{4.27}
\]
Next, we outline the NE takeover strategies of both players, along with the corresponding parameters of the saddle-point values in each FlipDyn state, for discrete-time linear dynamics with known linear state-feedback control policies and quadratic costs.
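As an illustration of how the FlipDyn state and the continuous state evolve jointly, the following minimal sketch (not from this thesis) simulates one sample path of (4.24). The gains, takeover probabilities and numerical values are placeholders, and the convention that the post-takeover owner's dynamics govern the transition from $k$ to $k+1$ follows the definitions of $v^0_{k+1}$ and $v^1_{k+1}$ above.

```python
import numpy as np

rng = np.random.default_rng(0)
E = np.array([[1.0, 0.1], [0.0, 1.0]])     # state transition matrix E_k (placeholder)
B = np.array([[0.0], [0.1]])               # defender control matrix B_k (placeholder)
H = np.array([[0.0], [0.1]])               # adversary control matrix H_k (placeholder)
K = np.array([[-0.5, -1.0]])               # defender state-feedback gain (placeholder)
W = np.array([[0.3, 0.4]])                 # adversary state-feedback gain (placeholder)

x, alpha, L = np.array([1.0, 0.0]), 0, 50
for k in range(1, L + 1):
    y = rng.random() < 0.2                 # defender attempts a takeover (placeholder prob.)
    z = rng.random() < 0.3                 # adversary attempts a takeover (placeholder prob.)
    # The resource flips only when the non-owner acts alone, consistent with the
    # cost-to-go matrices Xi^0_{k+1} and Xi^1_{k+1} above.
    if alpha == 0 and z and not y:
        alpha = 1
    elif alpha == 1 and y and not z:
        alpha = 0
    # Only the current owner's control enters the continuous-state update (4.24).
    x = E @ x + B @ (K @ x) if alpha == 0 else E @ x + H @ (W @ x)
```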
We begin by analyzing the case where $x$ is a scalar, for which we compute the saddle-point value exactly, and subsequently proceed to approximate the saddle-point value for the $n$-dimensional case.

4.4.1 Scalar/1-dimensional dynamical system

The quadratic costs at any time $k \in \mathcal{K}$ stated in (4.25) reduce, for a scalar dynamical system, to
\[
g_k(x, \alpha_k) = G^{\alpha_k}_k x^2, \quad d_k(x) = d_k x^2, \quad a_k(x) = a_k x^2, \quad m_k(u) = M_k K_k^2 x^2, \quad n_k(w) = N_k W_k^2 x^2, \tag{4.28}
\]
for non-negative values of $G^{\alpha_k}_k$, $d_k$, $a_k$, $M_k$ and $N_k$. For a scalar dynamical system, we use the following notation for the saddle-point value in each FlipDyn state. Let
\[
V^0_k(x) := \mathrm{p}^0_k x^2, \qquad V^1_k(x) := \mathrm{p}^1_k x^2,
\]
where $\mathrm{p}^{\alpha}_k \in \mathbb{R}$, $\alpha \in \{0,1\}$, $k \in \mathcal{K}$. Building on Theorem 4.3.2, we present the following result, which provides a closed-form expression for the NE takeover strategies, in both pure and mixed strategies, of both players, and outlines the saddle-point value update of the parameter $\mathrm{p}^{\alpha}_k$.

Corollary 1 (Case $\alpha_k = 0$) Under Assumption 4.4.2, the unique NE takeover strategies of the FlipDyn game at any time $k \in \mathcal{K}$, subject to the dynamics (4.27) for a scalar dynamical system with quadratic costs (4.28) and FlipDyn dynamics (4.3), are given by:
\[
y^{0*}_k =
\begin{cases}
\big[\; \tfrac{a_k}{\tilde{\mathrm{p}}_{k+1}} \quad 1 - \tfrac{a_k}{\tilde{\mathrm{p}}_{k+1}} \;\big]^{\mathrm{T}}, & \text{if } \tilde{\mathrm{p}}_{k+1} > d_k,\ \tilde{\mathrm{p}}_{k+1} > a_k, \\[4pt]
[\,1 \quad 0\,]^{\mathrm{T}}, & \text{otherwise},
\end{cases}
\tag{4.29}
\]
\[
z^{0*}_k =
\begin{cases}
\big[\; 1 - \tfrac{d_k}{\tilde{\mathrm{p}}_{k+1}} \quad \tfrac{d_k}{\tilde{\mathrm{p}}_{k+1}} \;\big]^{\mathrm{T}}, & \text{if } \tilde{\mathrm{p}}_{k+1} > d_k,\ \tilde{\mathrm{p}}_{k+1} > a_k, \\[4pt]
[\,0 \quad 1\,]^{\mathrm{T}}, & \text{if } \tilde{\mathrm{p}}_{k+1} \leq d_k,\ \tilde{\mathrm{p}}_{k+1} > a_k, \\[4pt]
[\,1 \quad 0\,]^{\mathrm{T}}, & \text{otherwise},
\end{cases}
\tag{4.30}
\]
where
\[
\tilde{\mathrm{p}}_{k+1} := \widetilde{W}_k^2\, \mathrm{p}^1_{k+1} - \widetilde{B}_k^2\, \mathrm{p}^0_{k+1}.
\]
The saddle-point value parameter at time $k$ is given by:
\[
\mathrm{p}^0_k =
\begin{cases}
G^0_k + d_k - \dfrac{d_k a_k}{\tilde{\mathrm{p}}_{k+1}} + \widetilde{B}_k^2\, \mathrm{p}^0_{k+1} + K_k^2 M_k, & \text{if } \tilde{\mathrm{p}}_{k+1} > d_k,\ \tilde{\mathrm{p}}_{k+1} > a_k, \\[8pt]
G^0_k - a_k + \widetilde{W}_k^2\, \mathrm{p}^1_{k+1} + K_k^2 M_k, & \text{if } \tilde{\mathrm{p}}_{k+1} \leq d_k,\ \tilde{\mathrm{p}}_{k+1} > a_k, \\[4pt]
G^0_k + K_k^2 M_k + \widetilde{B}_k^2\, \mathrm{p}^0_{k+1}, & \text{otherwise}.
\end{cases}
\tag{4.31}
\]

(Case $\alpha_k = 1$) The unique NE takeover strategies are given by:
\[
y^{1*}_k =
\begin{cases}
\big[\; 1 - \tfrac{a_k}{\tilde{\mathrm{p}}_{k+1}} \quad \tfrac{a_k}{\tilde{\mathrm{p}}_{k+1}} \;\big]^{\mathrm{T}}, & \text{if } \tilde{\mathrm{p}}_{k+1} > d_k,\ \tilde{\mathrm{p}}_{k+1} > a_k, \\[4pt]
[\,0 \quad 1\,]^{\mathrm{T}}, & \text{if } \tilde{\mathrm{p}}_{k+1} > d_k,\ \tilde{\mathrm{p}}_{k+1} \leq a_k, \\[4pt]
[\,1 \quad 0\,]^{\mathrm{T}}, & \text{otherwise},
\end{cases}
\tag{4.32}
\]
\[
z^{1*}_k =
\begin{cases}
\big[\; \tfrac{d_k}{\tilde{\mathrm{p}}_{k+1}} \quad 1 - \tfrac{d_k}{\tilde{\mathrm{p}}_{k+1}} \;\big]^{\mathrm{T}}, & \text{if } \tilde{\mathrm{p}}_{k+1} > d_k,\ \tilde{\mathrm{p}}_{k+1} > a_k, \\[4pt]
[\,1 \quad 0\,]^{\mathrm{T}}, & \text{otherwise}.
\end{cases}
\tag{4.33}
\]
The saddle-point value parameter at time $k$ is given by:
\[
\mathrm{p}^1_k =
\begin{cases}
G^1_k - a_k + \dfrac{d_k a_k}{\tilde{\mathrm{p}}_{k+1}} + \widetilde{W}_k^2\, \mathrm{p}^1_{k+1} - W_k^2 N_k, & \text{if } \tilde{\mathrm{p}}_{k+1} > d_k,\ \tilde{\mathrm{p}}_{k+1} > a_k, \\[8pt]
G^1_k + d_k - W_k^2 N_k + \widetilde{B}_k^2\, \mathrm{p}^0_{k+1}, & \text{if } \tilde{\mathrm{p}}_{k+1} > d_k,\ \tilde{\mathrm{p}}_{k+1} \leq a_k, \\[4pt]
G^1_k - W_k^2 N_k + \widetilde{W}_k^2\, \mathrm{p}^1_{k+1}, & \text{otherwise}.
\end{cases}
\tag{4.34}
\]
The terminal conditions for the recursions (4.31) and (4.34) are $\mathrm{p}^0_{L+1} := G^0_{L+1}$ and $\mathrm{p}^1_{L+1} := G^1_{L+1}$. □

Proof: We begin by determining the NE takeover strategies, in both pure and mixed strategies, and computing the corresponding saddle-point value parameter for the FlipDyn state $\alpha = 0$. Substituting the quadratic costs (4.28) and the linear dynamics (4.27) into the term $\widetilde{P}_{k+1}(x)$ from (4.53) yields
\[
\widetilde{P}_{k+1}(x) := \big( (E_k + H_k W_k)^2\, \mathrm{p}^1_{k+1} - (E_k + B_k K_k)^2\, \mathrm{p}^0_{k+1} \big) x^2 = \tilde{\mathrm{p}}_{k+1}\, x^2.
\]
Substituting $\tilde{\mathrm{p}}_{k+1}$ and the takeover costs (4.28) into (4.14) and (4.15), we obtain the NE takeover strategies in (4.29) and (4.30), respectively. The NE takeover strategies for the FlipDyn state $\alpha_k = 1$ are obtained as the complements of (4.29) and (4.30), resulting in (4.32) and (4.33), respectively. To obtain a backward recursion for the parameter $\mathrm{p}^0_k$, we substitute the linear dynamics (4.27) and the quadratic costs (4.28) into (4.16), which yields
\[
\mathrm{p}^0_k x^2 =
\begin{cases}
(G^0_k + d_k) x^2 - \dfrac{d_k a_k x^4}{\tilde{\mathrm{p}}_{k+1} x^2} + \big( K_k^2 M_k + \widetilde{B}_k^2\, \mathrm{p}^0_{k+1} \big) x^2, & \text{if } \tilde{\mathrm{p}}_{k+1} > d_k,\ \tilde{\mathrm{p}}_{k+1} > a_k, \\[8pt]
\big( G^0_k + \widetilde{W}_k^2\, \mathrm{p}^1_{k+1} - a_k + K_k^2 M_k \big) x^2, & \text{if } \tilde{\mathrm{p}}_{k+1} \leq d_k,\ \tilde{\mathrm{p}}_{k+1} > a_k, \\[4pt]
\big( G^0_k + K_k^2 M_k + \widetilde{B}_k^2\, \mathrm{p}^0_{k+1} \big) x^2, & \text{otherwise}.
\end{cases}
\]
Factoring out the term $x^2$, we arrive at (4.31). Analogous substitutions for the FlipDyn state $\alpha_k = 1$ yield (4.34).
The state cost at the time instant $L+1$ yields the terminal conditions on the saddle-point value parameters, $\mathrm{p}^0_{L+1} := G^0_{L+1}$ and $\mathrm{p}^1_{L+1} := G^1_{L+1}$. □

Corollary 1 presents a closed-form solution of the FlipDyn game (4.7) for a given control policy, with NE takeover strategies that are independent of the state of the scalar/1-dimensional system. The saddle-point values of the FlipDyn game for a given control policy, corresponding to $\alpha_1 = 0$ and $\alpha_1 = 1$, are given by
\[
J_E(x_1, 0, y^*_{\rm L}, z^*_{\rm L}, u_{\rm L}, w_{\rm L}) = x_1^{\mathrm{T}}\, \mathrm{p}^0_1\, x_1, \qquad
J_E(x_1, 1, y^*_{\rm L}, z^*_{\rm L}, u_{\rm L}, w_{\rm L}) = x_1^{\mathrm{T}}\, \mathrm{p}^1_1\, x_1.
\]
We can determine a minimum adversarial state cost $G^{1*}_k$ that guarantees a mixed-strategy NE takeover at every time $k \in \mathcal{K}$. We characterize such an adversarial state cost in the following remark.

Remark 4.4.3 Given a scalar/1-dimensional system (4.27) with quadratic costs (4.28), the mixed-strategy NE takeover and the corresponding recursion for the saddle-point value parameter, as outlined in Corollary 1, exist for an adversary state-dependent cost $G^{1*}_k \leq G^1_k$ provided
\[
\tilde{\mathrm{p}}_{k+1} > d_k, \quad \tilde{\mathrm{p}}_{k+1} > a_k, \quad \forall k \in \mathcal{K},
\]
with the parameters at time $L+1$ satisfying
\[
\mathrm{p}^1_{L+1} = G^{1*}_{L+1}, \qquad \mathrm{p}^0_{L+1} = G^0_{L+1}. \tag{4.35}
\]

The parameters $G^{1*}_k$ in Remark 4.4.3 can be computed using a bisection method at every time $k \in \mathcal{K}$. Given an arbitrary adversary state cost $G^1_k$, we start by updating the saddle-point value parameters in (4.31) and (4.34) backward in time. At any time instant $k \in \mathcal{K}$, if either of the inequalities $\tilde{\mathrm{p}}_{k+1} > d_k$ or $\tilde{\mathrm{p}}_{k+1} > a_k$ is violated, the adversary cost $G^1_k$ is updated using the bisection method. This process is repeated iteratively until the time instant $k = 0$ is reached and the bisection method has converged. The resulting cost $G^{1*}_k$ is the minimal cost the adversary must bear to achieve a mixed-strategy takeover. Next, we illustrate the results of Corollary 1 through a numerical example.

A Numerical Example (Mixed strategy NE)

In this numerical example we focus only on the mixed-strategy NE and the corresponding saddle-point value parameters obtained in Corollary 1, on a linear time-invariant (LTI) scalar system with a horizon length of $L = 50$. The defender and adversary dynamics are given by
\[
F^0_k(x_k) := (E + BK)\, x_k, \qquad F^1_k(x_k) := E\, x_k.
\]
In this example we assume that the adversary cannot directly control the system; in other words, the control matrix $H_k := 0_{n\times p}$ and the control gain $W_k := 0_{p\times n}$, $\forall k \in \mathcal{K}$.

Figure 4.2 (a) Coefficients of the parameterized value function, $\mathrm{p}^0$ and $\mathrm{p}^1$, for a 1-dimensional system whose state remains bounded ($E \leq 1$) over a horizon length of $L = 50$. (b) Attack and defense policies corresponding to the value function in Figure 4.2a for the given set of costs.

Figure 4.3 (a) Coefficients of the parameterized value function, $\mathrm{p}^0$ and $\mathrm{p}^1$, for an unbounded ($E \geq 1$) 1-dimensional system with a horizon length of $L = 50$. (b) Defense and attack policies for the parameterized value function shown in Figure 4.3a.

The quadratic costs (4.28) are assumed to be fixed $\forall k \in \mathcal{K}$, given by
\[
G^0_k = G^0 = 1, \quad G^1_k = G^1 = 1, \quad M_k = M = 0.65, \quad a_k = a = \{0.5, 0.9\}, \quad d_k = d = \{0.5, 0.9\}.
\]
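To make the backward recursion of Corollary 1, and the bisection idea of Remark 4.4.3, concrete for this example, the following is a minimal sketch (not the implementation used in this thesis). The defender gain K below is a placeholder standing in for the LQR gain described next, and the adversary has no direct control input (H = 0, W = 0), as assumed above.

```python
import numpy as np

def corollary1_recursion(E, B, H, K, W, G0, G1, M, N, a, d, L):
    """Backward recursion (4.31), (4.34) for the scalar FlipDyn game with fixed gains."""
    Bt, Wt = E + B * K, E + H * W           # closed-loop constants (4.27)
    p0, p1 = G0, G1                         # terminal values p^0_{L+1}, p^1_{L+1}
    trajectory = []
    for k in range(L, 0, -1):
        ptil = Wt**2 * p1 - Bt**2 * p0      # \tilde p_{k+1}
        if ptil > d and ptil > a:           # mixed-strategy NE takeover
            p0n = G0 + d - d * a / ptil + Bt**2 * p0 + K**2 * M
            p1n = G1 - a + d * a / ptil + Wt**2 * p1 - W**2 * N
        else:                               # pure-strategy branches of (4.31) and (4.34)
            p0n = ((G0 - a + Wt**2 * p1 + K**2 * M) if (ptil <= d and ptil > a)
                   else (G0 + K**2 * M + Bt**2 * p0))
            p1n = ((G1 + d - W**2 * N + Bt**2 * p0) if (ptil > d and ptil <= a)
                   else (G1 - W**2 * N + Wt**2 * p1))
        p0, p1 = p0n, p1n
        trajectory.append((k, p0, p1, ptil))
    return trajectory

# Placeholder gain K; in the example the gain is obtained from an LQR design.
traj = corollary1_recursion(E=0.99, B=0.1, H=0.0, K=-1.0, W=0.0,
                            G0=1.0, G1=1.0, M=0.65, N=0.0, a=0.5, d=0.5, L=50)
mixed_everywhere = all(step[3] > 0.5 for step in traj)   # 0.5 = max(a, d) here
# If mixed_everywhere is False, Remark 4.4.3 suggests increasing G1 by bisection
# until the mixed-strategy conditions hold at every time step.
```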
The control matrix for the defender is $B_k = \Delta t$, $\forall k \in \mathcal{K}$, where $\Delta t = 0.1$ for the numerical evaluation. We obtain the defender gain $K$ by solving the LQR problem with arbitrarily weighted state and control costs. We solve for the NE takeover strategies in the space of mixed strategies, and the saddle-point value parameters, for two values of the fixed state transition constant $E_k = E$, $\forall k \in \mathcal{K}$: $E = 0.99$ and $E = 1.1$, for a given choice of takeover costs.

Figure 4.2a illustrates the saddle-point value parameters $\mathrm{p}^i_k$, $i \in \{0,1\}$, $k \in \mathcal{K}$, for $E = 0.99$ and different takeover costs. We observe that the saddle-point value parameters are bounded and reach an asymptotic value. On the contrary, from Figure 4.3a we observe that for $E = 1.1$ the saddle-point value parameters of the adversary increase exponentially backward in time, although the saddle-point value parameters of the defender remain bounded and reach a fixed value in both cases, $E = 0.99$ and $E = 1.1$. Such an evolution of the saddle-point value parameters indicates that as the system moves from open-loop stable ($E < 1$) to unstable ($E \geq 1$), there is a large incentive for an adversary to take over the system.

Figure 4.2b shows the takeover policies of both players for the case $E = 0.99$ when $\alpha_k = 0$, $\forall k \in \mathcal{K}$. When the defender takeover cost is lower than the adversary's, the defender takes over with low probability compared to the adversary, except for the last few time instants of the game. This takeover strategy changes when the takeover cost of the defender is higher than that of the adversary, resulting in a higher probability of takeover by both players. Finally, Figure 4.3b illustrates the takeover policies for the case $E = 1.1$ and $\alpha_k = 0$, $k \in \mathcal{K}$, where the defender and the adversary converge to asymptotic probabilities of taking over and remaining idle, respectively.

Next, we extend our derivation and analysis of the FlipDyn game with known control policies for discrete-time linear dynamics and quadratic costs to $n$ dimensions.

4.4.2 n-dimensional system

Unlike the scalar case, wherein the state $x$ was factored out during the computation of the NE takeover strategies and the saddle-point value parameters $\mathrm{p}^0_k$ and $\mathrm{p}^1_k$, such a simplification does not yield exact results for an $n$-dimensional system. The challenge in factoring out the state at any time $k \in \mathcal{K}$ arises from the term
\[
\frac{x^{\mathrm{T}} A_k x \; x^{\mathrm{T}} D_k x}{\widetilde{P}_{k+1}(x)}, \qquad
\widetilde{P}_{k+1}(x) = x^{\mathrm{T}} \big( \widetilde{W}_k^{\mathrm{T}} P^1_{k+1} \widetilde{W}_k - \widetilde{B}_k^{\mathrm{T}} P^0_{k+1} \widetilde{B}_k \big) x,
\tag{4.36}
\]
which appears whenever a mixed-strategy NE takeover is played in either of the FlipDyn states. To address this challenge, we impose a particular form on the takeover costs, stated in the following assumption. Here, and in the sequel, let $\mathrm{I}_n \in \mathbb{R}^{n\times n}$ denote the identity matrix.

Assumption 4.4.4 At any time instant $k \in \mathcal{K}$, the defender and adversary takeover costs are
\[
d_k(x) := d_k\, x^{\mathrm{T}} x, \qquad a_k(x) := a_k\, x^{\mathrm{T}} x, \tag{4.37}
\]
where $d_k \in \mathbb{R}$ and $a_k \in \mathbb{R}$ are non-negative scalars.

Next, we introduce an approximation consisting of the state $x$ and a matrix $P$.
Proposition 4.4.5 Given a positive definite matrix $\Psi$, the state-dependent term $\dfrac{x^{\mathrm{T}} x}{x^{\mathrm{T}} \Psi x}$ can be upper bounded as
\[
\frac{x^{\mathrm{T}} x}{x^{\mathrm{T}} \Psi x} \;\leq\; \frac{x^{\mathrm{T}} \Psi^{-1} x}{x^{\mathrm{T}} x}. \tag{4.38}
\]

Proof: Setting $\Gamma := \Psi^{1/2} x$ and $\Omega := \Psi^{-1/2} x$, observe that the $2\times 2$ matrix
\[
\mathrm{M} :=
\begin{bmatrix}
\Gamma^{\mathrm{T}} \Gamma & \Gamma^{\mathrm{T}} \Omega \\
\Omega^{\mathrm{T}} \Gamma & \Omega^{\mathrm{T}} \Omega
\end{bmatrix}
=
\big[\, \Gamma \;\; \Omega \,\big]^{\mathrm{T}} \big[\, \Gamma \;\; \Omega \,\big] \succeq 0.
\]
Therefore, $\det(\mathrm{M}) = (\Gamma^{\mathrm{T}} \Gamma)(\Omega^{\mathrm{T}} \Omega) - (\Gamma^{\mathrm{T}} \Omega)^2 \geq 0$, i.e., $(x^{\mathrm{T}} \Psi x)(x^{\mathrm{T}} \Psi^{-1} x) \geq (x^{\mathrm{T}} x)^2$, and thus the claim (4.38) holds. □

Using Assumption 4.4.4 and the result of Proposition 4.4.5, we derive an approximation for the saddle-point value parameters, stated in the following result.

Lemma 4.4.6 Under Assumptions 4.4.2 and 4.4.4, consider a linear dynamical system described by (4.27) with quadratic costs (4.25) and FlipDyn dynamics (4.3), with known saddle-point value parameters $P^1_{k+1}$ and $P^0_{k+1}$ such that the following conditions are satisfied:
\[
(E_k + H_k W_k)^{\mathrm{T}} P^1_{k+1} (E_k + H_k W_k) - (E_k + B_k K_k)^{\mathrm{T}} P^0_{k+1} (E_k + B_k K_k) \succ d_k \mathrm{I}_n, \tag{4.39}
\]
\[
(E_k + H_k W_k)^{\mathrm{T}} P^1_{k+1} (E_k + H_k W_k) - (E_k + B_k K_k)^{\mathrm{T}} P^0_{k+1} (E_k + B_k K_k) \succ a_k \mathrm{I}_n. \tag{4.40}
\]
Then, the saddle-point value parameters at time $k \in \mathcal{K}$, under a mixed-strategy NE takeover in both FlipDyn states, satisfy
\[
P^0_k \preceq G^0_k + d_k \mathrm{I}_n + K_k^{\mathrm{T}} M_k K_k + \widetilde{B}_k^{\mathrm{T}} P^0_{k+1} \widetilde{B}_k - d_k\, \grave{P}^{-1}_{k+1}\, a_k, \tag{4.41}
\]
\[
P^1_k \succeq G^1_k - a_k \mathrm{I}_n - W_k^{\mathrm{T}} N_k W_k + \widetilde{W}_k^{\mathrm{T}} P^1_{k+1} \widetilde{W}_k + d_k\, \grave{P}^{-1}_{k+1}\, a_k, \tag{4.42}
\]
where $\grave{P}_{k+1} := \widetilde{W}_k^{\mathrm{T}} P^1_{k+1} \widetilde{W}_k - \widetilde{B}_k^{\mathrm{T}} P^0_{k+1} \widetilde{B}_k$. □

Proof: We show the proof only for (4.41), as the derivation for (4.42) is analogous. Under a mixed-strategy NE takeover, substituting the linear dynamics (4.27), the quadratic costs (4.25) and the takeover costs (4.37) into (4.16) gives
\[
x^{\mathrm{T}} P^0_k x = x^{\mathrm{T}} \big( G^0_k + d_k \mathrm{I}_n + K_k^{\mathrm{T}} M_k K_k + \widetilde{B}_k^{\mathrm{T}} P^0_{k+1} \widetilde{B}_k \big) x - \frac{d_k\, x^{\mathrm{T}} x \; a_k\, x^{\mathrm{T}} x}{x^{\mathrm{T}} \grave{P}_{k+1} x}.
\]
Using (4.38) to bound the term involving the state $x$ and $\grave{P}_{k+1}$, and factoring out the state, we obtain (4.41). □

The bounds on the saddle-point value parameters derived in Lemma 4.4.6 enable us to recursively define approximate saddle-point values of the form
\[
\hat{V}^0_k(x) := x^{\mathrm{T}} \hat{P}^0_k x, \qquad \hat{V}^1_k(x) := x^{\mathrm{T}} \hat{P}^1_k x, \tag{4.43}
\]
where $\hat{P}^1_k \in \mathbb{R}^{n\times n}$ and $\hat{P}^0_k \in \mathbb{R}^{n\times n}$. Similar to the results obtained in Corollary 1, we use Theorem 4.3.2 to provide an approximate NE takeover pair $\{\hat{y}^{\alpha*}_k, \hat{z}^{\alpha*}_k\}$, in both pure and mixed strategies of both players, and the corresponding approximate saddle-point value update of the parameter $\hat{P}^{\alpha}_k \in \mathbb{R}^{n\times n}$, $\alpha \in \{0,1\}$.
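The bound (4.38) is a direct consequence of the Cauchy-Schwarz inequality. As a quick numerical sanity check, the following minimal sketch (not from this thesis) verifies it on randomly generated positive definite matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(1000):
    n = int(rng.integers(2, 6))
    A = rng.normal(size=(n, n))
    Psi = A @ A.T + n * np.eye(n)            # a positive definite matrix
    x = rng.normal(size=n)
    lhs = (x @ x) / (x @ Psi @ x)
    rhs = (x @ np.linalg.inv(Psi) @ x) / (x @ x)
    assert lhs <= rhs + 1e-12                # (x'x)^2 <= (x' Psi x)(x' Psi^{-1} x)
```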
Corollary 2 (Case $\alpha_k = 0$) The approximate NE takeover strategies of the FlipDyn-Con game (4.7) with known control policies at any time $k \in \mathcal{K}$, subject to the dynamics (4.27), with quadratic costs (4.25), takeover costs (4.37), and FlipDyn dynamics (4.3), are given by:
\[
\hat{y}^{0*}_k =
\begin{cases}
\Big[\; \dfrac{a_k x^{\mathrm{T}} x}{x^{\mathrm{T}} \check{P}_{k+1} x} \quad 1 - \dfrac{a_k x^{\mathrm{T}} x}{x^{\mathrm{T}} \check{P}_{k+1} x} \;\Big]^{\mathrm{T}}, & \text{if } \widetilde{P}_{k+1}(x) > a_k x^{\mathrm{T}} x,\ \widetilde{P}_{k+1}(x) > d_k x^{\mathrm{T}} x, \\[10pt]
[\,1 \quad 0\,]^{\mathrm{T}}, & \text{otherwise},
\end{cases}
\tag{4.44}
\]
\[
\hat{z}^{0*}_k =
\begin{cases}
\Big[\; 1 - \dfrac{d_k x^{\mathrm{T}} x}{x^{\mathrm{T}} \check{P}_{k+1} x} \quad \dfrac{d_k x^{\mathrm{T}} x}{x^{\mathrm{T}} \check{P}_{k+1} x} \;\Big]^{\mathrm{T}}, & \text{if } \widetilde{P}_{k+1}(x) > a_k x^{\mathrm{T}} x,\ \widetilde{P}_{k+1}(x) > d_k x^{\mathrm{T}} x, \\[10pt]
[\,0 \quad 1\,]^{\mathrm{T}}, & \text{if } \widetilde{P}_{k+1}(x) > a_k x^{\mathrm{T}} x,\ \widetilde{P}_{k+1}(x) \leq d_k x^{\mathrm{T}} x, \\[4pt]
[\,1 \quad 0\,]^{\mathrm{T}}, & \text{otherwise},
\end{cases}
\tag{4.45}
\]
where $\widetilde{P}_{k+1}(x) := x^{\mathrm{T}} \check{P}_{k+1} x$ and
\[
\check{P}_{k+1} := \widetilde{W}_k^{\mathrm{T}} \hat{P}^1_{k+1} \widetilde{W}_k - \widetilde{B}_k^{\mathrm{T}} \hat{P}^0_{k+1} \widetilde{B}_k.
\]
The approximate saddle-point value parameter at time $k$ is given by:
\[
\hat{P}^0_k =
\begin{cases}
G^0_k + d_k \mathrm{I}_n + K_k^{\mathrm{T}} M_k K_k + \widetilde{B}_k^{\mathrm{T}} \hat{P}^0_{k+1} \widetilde{B}_k - d_k \check{P}^{-1}_{k+1} a_k, & \text{if } \widetilde{P}_{k+1}(x) > a_k x^{\mathrm{T}} x,\ \widetilde{P}_{k+1}(x) > d_k x^{\mathrm{T}} x, \\[6pt]
G^0_k - a_k \mathrm{I}_n + K_k^{\mathrm{T}} M_k K_k + \widetilde{W}_k^{\mathrm{T}} \hat{P}^1_{k+1} \widetilde{W}_k - W_k^{\mathrm{T}} N_k W_k, & \text{if } \widetilde{P}_{k+1}(x) \leq d_k x^{\mathrm{T}} x,\ \widetilde{P}_{k+1}(x) > a_k x^{\mathrm{T}} x, \\[4pt]
G^0_k + K_k^{\mathrm{T}} M_k K_k + \widetilde{B}_k^{\mathrm{T}} \hat{P}^0_{k+1} \widetilde{B}_k, & \text{otherwise}.
\end{cases}
\tag{4.46}
\]

(Case $\alpha_k = 1$) The approximate NE takeover strategies are given by:
\[
\hat{y}^{1*}_k =
\begin{cases}
\Big[\; 1 - \dfrac{a_k x^{\mathrm{T}} x}{x^{\mathrm{T}} \check{P}_{k+1} x} \quad \dfrac{a_k x^{\mathrm{T}} x}{x^{\mathrm{T}} \check{P}_{k+1} x} \;\Big]^{\mathrm{T}}, & \text{if } \widetilde{P}_{k+1}(x) > a_k x^{\mathrm{T}} x,\ \widetilde{P}_{k+1}(x) > d_k x^{\mathrm{T}} x, \\[10pt]
[\,0 \quad 1\,]^{\mathrm{T}}, & \text{if } \widetilde{P}_{k+1}(x) \leq a_k x^{\mathrm{T}} x,\ \widetilde{P}_{k+1}(x) > d_k x^{\mathrm{T}} x, \\[4pt]
[\,1 \quad 0\,]^{\mathrm{T}}, & \text{otherwise},
\end{cases}
\tag{4.47}
\]
\[
\hat{z}^{1*}_k =
\begin{cases}
\Big[\; \dfrac{d_k x^{\mathrm{T}} x}{x^{\mathrm{T}} \check{P}_{k+1} x} \quad 1 - \dfrac{d_k x^{\mathrm{T}} x}{x^{\mathrm{T}} \check{P}_{k+1} x} \;\Big]^{\mathrm{T}}, & \text{if } \widetilde{P}_{k+1}(x) > a_k x^{\mathrm{T}} x,\ \widetilde{P}_{k+1}(x) > d_k x^{\mathrm{T}} x, \\[10pt]
[\,1 \quad 0\,]^{\mathrm{T}}, & \text{otherwise}.
\end{cases}
\tag{4.48}
\]
The approximate saddle-point value parameter at time $k$ is given by:
\[
\hat{P}^1_k =
\begin{cases}
G^1_k - a_k \mathrm{I}_n - W_k^{\mathrm{T}} N_k W_k + \widetilde{W}_k^{\mathrm{T}} \hat{P}^1_{k+1} \widetilde{W}_k + d_k \check{P}^{-1}_{k+1} a_k, & \text{if } \widetilde{P}_{k+1}(x) > a_k x^{\mathrm{T}} x,\ \widetilde{P}_{k+1}(x) > d_k x^{\mathrm{T}} x, \\[6pt]
G^1_k + d_k \mathrm{I}_n - W_k^{\mathrm{T}} N_k W_k + \widetilde{B}_k^{\mathrm{T}} \hat{P}^0_{k+1} \widetilde{B}_k, & \text{if } \widetilde{P}_{k+1}(x) \leq a_k x^{\mathrm{T}} x,\ \widetilde{P}_{k+1}(x) > d_k x^{\mathrm{T}} x, \\[4pt]
G^1_k - W_k^{\mathrm{T}} N_k W_k + \widetilde{W}_k^{\mathrm{T}} \hat{P}^1_{k+1} \widetilde{W}_k, & \text{otherwise}.
\end{cases}
\tag{4.49}
\]
The terminal conditions for the recursions (4.46) and (4.49) are $\hat{P}^0_{L+1} := G^0_{L+1}$ and $\hat{P}^1_{L+1} := G^1_{L+1}$. □

Proof: [Outline] We begin by determining the NE takeover strategies in pure and mixed strategies for the FlipDyn state $\alpha = 0$. We substitute the quadratic costs (4.25) and the linear dynamics (4.27) into $\widetilde{P}_{k+1}(x)$, with the approximate saddle-point value parameters $\hat{P}^0_{k+1}$ and $\hat{P}^1_{k+1}$, to obtain
\[
\widetilde{P}_{k+1}(x) := \hat{V}^1_{k+1}(\widetilde{W}_k x) - \hat{V}^0_{k+1}(\widetilde{B}_k x)
= x^{\mathrm{T}} \big( \widetilde{W}_k^{\mathrm{T}} \hat{P}^1_{k+1} \widetilde{W}_k - \widetilde{B}_k^{\mathrm{T}} \hat{P}^0_{k+1} \widetilde{B}_k \big) x
= x^{\mathrm{T}} \check{P}_{k+1} x.
\]
Substituting the takeover costs (4.37) and $x^{\mathrm{T}} \check{P}_{k+1} x$ into (4.14) and (4.15), we obtain the NE takeover strategies in (4.44) and (4.45), respectively. The approximate NE takeover strategies for the FlipDyn state $\alpha = 1$ are complementary to those for $\alpha = 0$ and are given in (4.47) and (4.48). To determine the approximate saddle-point value parameters under a mixed-strategy NE takeover in the FlipDyn state $\alpha = 0$, we substitute the upper bound (4.41) from Lemma 4.4.6 and replace $P^0_{k+1}$ with $\hat{P}^0_{k+1}$. Under a pure-strategy NE takeover, we substitute the quadratic costs (4.25) and the discrete-time linear dynamics (4.27) to obtain the corresponding parameters. Combining the mixed- and pure-strategy NE takeover solutions, we obtain (4.46). We omit the derivation of the saddle-point value parameter $\hat{P}^1_k$ for brevity. □

Recursions (4.46) and (4.49) provide an approximate solution to the FlipDyn problem (4.7) for the $n$-dimensional case with known control policies. Analogous to the scalar case, we can determine a minimum adversarial state cost $G^{1*}_k$ that guarantees a mixed-strategy NE takeover at every time $k \in \mathcal{K}$. We characterize such an adversarial state cost in the following remark.

Remark 4.4.7 Given an $n$-dimensional linear system (4.27) with quadratic costs (4.25), the mixed-strategy NE takeover and the corresponding recursion for the approximate saddle-point value parameter, as outlined in Corollary 2, exist for an adversary state-dependent cost $G^{1*}_k \preceq G^1_k$ provided
\[
\check{P}_{k+1} \succ d_k \mathrm{I}_n, \quad \check{P}_{k+1} \succ a_k \mathrm{I}_n, \quad \forall k \in \mathcal{K},
\]
with the parameters at time $L+1$ given by:
\[
\hat{P}^1_{L+1} := G^{1*}_{L+1}, \qquad \hat{P}^0_{L+1} := G^0_{L+1}. \tag{4.50}
\]
Similar to the scalar/1-dimensional case, we can determine $G^{1*}_k$ using a bisection method. Next, we illustrate the results of the approximate value function on a numerical example.

A Numerical Example (Mixed strategy NE)

Figure 4.4 (a) Minimum eigenvalue of the saddle-point value parameters, $\lambda_n(\hat{P}^0)$ and $\lambda_n(\hat{P}^1)$, for $e = 0.99$ over a horizon length of $L = 100$. (b) Takeover strategies corresponding to the saddle-point value in Figure 4.4a for a given initial state $x_1$ and FlipDyn state $\alpha_k = 0$, $\forall k \in \mathcal{K}$.

Similar to the scalar/1-dimensional case, in this numerical example we focus only on the mixed-strategy NE and the corresponding saddle-point value parameters obtained in Corollary 2, on a linear time-invariant (LTI) system over a horizon length of $L = 100$. For this example, we use a double integrator for the defender and adversary, given by
\[
F^0_k(x_k) := (E + BK)\, x_k, \qquad F^1_k(x_k) := E\, x_k,
\]

Figure 4.5 (a) Minimum eigenvalue of the saddle-point value parameters, $\lambda_n(\hat{P}^0)$ and $\lambda_n(\hat{P}^1)$, for $e = 1.01$ over the same horizon length of $L = 100$. (b) Takeover strategies corresponding to the saddle-point value in Figure 4.5a for a given initial state $x_1$ and FlipDyn state $\alpha_k = 0$, $\forall k \in \mathcal{K}$.

where
\[
E_k = E = \begin{bmatrix} e & \Delta t \\ 0 & e \end{bmatrix}, \qquad
B_k = \begin{bmatrix} \Delta t \\ 0 \end{bmatrix}, \qquad \forall k \in \mathcal{K}.
\]
Similar to the scalar/1-dimensional case, we solve for the approximate NE takeover strategies and saddle-point value parameters for two values of the fixed state transition constant $e_k = e$, $\forall k \in \mathcal{K}$: $e = 0.99$ and $e = 1.01$, with $\Delta t = 0.1$. The system is a second-order system with acceleration as the control input. Analogous to the scalar case, we obtain the defender's gain $K$ using the LQR method. The quadratic costs (4.25) are assumed to be fixed $\forall k \in \mathcal{K}$, given by
\[
G^0_k = G^0 = \mathrm{I}_n, \quad G^1_k = G^1 = 1.35\, \mathrm{I}_n, \quad D_k = D = \{0.5, 0.9\}\, \mathrm{I}_n, \quad A_k = A = \{0.5, 0.9\}\, \mathrm{I}_n, \quad M_k = M = 0.65, \quad N_k = N = 0.45.
\]
Since the saddle-point value parameters in $n$ dimensions are symmetric positive definite matrices, we plot the minimum eigenvalues of $\hat{P}^1_k$ and $\hat{P}^0_k$, shown in Figures 4.4a and 4.5a. Akin to the scalar case, we observe a similar trend of converging coefficients when $e \leq 1$, i.e., when the system remains bounded in the absence of control, whereas the eigenvalues of the saddle-point parameters of the adversary diverge for $e > 1$, indicating a large incentive for the adversary to take over the system backward in time. Since the player policies in the $n$-dimensional case are functions of the state, and the FlipDyn state is a random variable, the attack and defense takeover actions are averaged over 500 independent simulations for $e := 0.99$ and $e := 1.01$, shown in Figures 4.4b and 4.5b, respectively, with the initial state $x_1 = [0 \;\; 1]^{\mathrm{T}}$. We observe a dynamic policy over the horizon length for the case $e := 0.99$, and a converging pure policy for $e := 1.01$, for the FlipDyn state $\alpha = 0$.
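The approximate recursion driving Figures 4.4 and 4.5 can be sketched as follows. This is a minimal illustration (not the thesis implementation) of the mixed-strategy branch of (4.46) and (4.49), checked through the sufficient matrix conditions of Remark 4.4.7; the gains K and W and the adversary input matrix H are placeholders, and the pure-strategy branches are kept deliberately simple.

```python
import numpy as np

dt, e, L = 0.1, 0.99, 100
E = np.array([[e, dt], [0.0, e]])
B = np.array([[dt], [0.0]])
H = np.array([[dt], [0.0]])                  # placeholder adversary input matrix
K = np.array([[-2.0, -1.5]])                 # placeholder defender gain (LQR in the text)
W = np.array([[0.5, 0.4]])                   # placeholder adversary gain
G0, G1 = np.eye(2), 1.35 * np.eye(2)
d_c, a_c = 0.5, 0.5                          # takeover cost scalars (Assumption 4.4.4)
M, N = 0.65 * np.eye(1), 0.45 * np.eye(1)

Bt, Wt = E + B @ K, E + H @ W                # closed-loop matrices (4.27)
P0, P1 = G0.copy(), G1.copy()                # terminal values at k = L+1
for k in range(L, 0, -1):
    Pc = Wt.T @ P1 @ Wt - Bt.T @ P0 @ Bt     # \check P_{k+1}
    lam = np.linalg.eigvalsh(Pc).min()
    if lam > d_c and lam > a_c:              # sufficient conditions of Remark 4.4.7
        P0n = G0 + d_c * np.eye(2) + K.T @ M @ K + Bt.T @ P0 @ Bt - d_c * a_c * np.linalg.inv(Pc)
        P1n = G1 - a_c * np.eye(2) - W.T @ N @ W + Wt.T @ P1 @ Wt + d_c * a_c * np.linalg.inv(Pc)
    else:                                    # pure-strategy branches kept minimal in this sketch
        P0n = G0 + K.T @ M @ K + Bt.T @ P0 @ Bt
        P1n = G1 - W.T @ N @ W + Wt.T @ P1 @ Wt
    P0, P1 = P0n, P1n
print(np.linalg.eigvalsh(P0).min(), np.linalg.eigvalsh(P1).min())
```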
The converging pure policy for $e := 1.01$ reflects the ever-increasing value to the adversary over the horizon length.

Corollaries 1 and 2 completely characterize the takeover strategies of the FlipDyn game, when the control policies are known, in the space of pure and mixed strategies. The closed-form expressions of the takeover strategies provide computational efficiency and scalability to large horizons. Next, we will derive both the control policies and the takeover strategies of both players.

4.5 FlipDyn-Con for LQ Problems

In this section we will solve the complete FlipDyn-Con game. We will state the underlying control problem and show how it decouples from the takeover game. We will derive conditions under which we obtain linear state-feedback control policies and show how they impact the saddle-point value of the game.

4.5.1 Control policy for the FlipDyn-Con LQ Problem

To determine the control policies of both players, we need to solve the following problems in each FlipDyn state:
\[
\min_{u_k(x)} \max_{w_k(x)}
\begin{cases}
v^0_{k+1} + u_k(x)^{\mathrm{T}} M_k u_k(x) - \dfrac{x^{\mathrm{T}} D_k x \; x^{\mathrm{T}} A_k x}{\widetilde{P}_{k+1}(x)}, & \text{if } \widetilde{P}_{k+1}(x) > x^{\mathrm{T}} D_k x,\ \widetilde{P}_{k+1}(x) > x^{\mathrm{T}} A_k x, \\[8pt]
v^1_{k+1} + u_k(x)^{\mathrm{T}} M_k u_k(x) - x^{\mathrm{T}} A_k x, & \text{if } \widetilde{P}_{k+1}(x) \leq x^{\mathrm{T}} D_k x,\ \widetilde{P}_{k+1}(x) > x^{\mathrm{T}} A_k x, \\[4pt]
v^0_{k+1} + u_k(x)^{\mathrm{T}} M_k u_k(x), & \text{otherwise},
\end{cases}
\tag{4.51}
\]
and
\[
\min_{u_k(x)} \max_{w_k(x)}
\begin{cases}
v^1_{k+1} - w_k(x)^{\mathrm{T}} N_k w_k(x) + \dfrac{x^{\mathrm{T}} D_k x \; x^{\mathrm{T}} A_k x}{\widetilde{P}_{k+1}(x)}, & \text{if } \widetilde{P}_{k+1}(x) > x^{\mathrm{T}} D_k x,\ \widetilde{P}_{k+1}(x) > x^{\mathrm{T}} A_k x, \\[8pt]
v^0_{k+1} - w_k(x)^{\mathrm{T}} N_k w_k(x) + x^{\mathrm{T}} D_k x, & \text{if } \widetilde{P}_{k+1}(x) > x^{\mathrm{T}} D_k x,\ \widetilde{P}_{k+1}(x) \leq x^{\mathrm{T}} A_k x, \\[4pt]
v^1_{k+1} - w_k(x)^{\mathrm{T}} N_k w_k(x), & \text{otherwise},
\end{cases}
\tag{4.52}
\]
where
\[
\widetilde{P}_{k+1}(x) := v^1_{k+1} - v^0_{k+1}. \tag{4.53}
\]
The terms $v^0_{k+1}$ and $v^1_{k+1}$ are defined in (4.9) and (4.10), respectively. The first condition in both (4.51) and (4.52) pertains to an NE takeover in mixed strategies by both players, while the remaining conditions correspond to NE takeovers in pure strategies. Notably, the problems corresponding to NE takeovers in mixed strategies contain the term $\widetilde{P}_{k+1}(x)$, which couples the saddle-point values of the two FlipDyn states. In the following results, we will derive the control policies for NE takeovers, both in pure and mixed strategies, in each FlipDyn state. Furthermore, we observe that the min-max problems corresponding to NE takeovers in pure strategies in each FlipDyn state rely on the solution of the NE takeover in mixed strategies ($\widetilde{P}_{k+1}(x) > x^{\mathrm{T}} D_k x$, $\widetilde{P}_{k+1}(x) > x^{\mathrm{T}} A_k x$). Thus, we will begin by deriving the control policies for the NE takeover in mixed strategies. We restrict the control policies to be linear state-feedback in the continuous state $x$, as stated in Assumption 4.4.2.
Under Assumption 4.4.2, the parametric form of the saddle-point value in each FlipDyn state still holds:
\[
V^0_k(x, u_k(x), \Xi^0_{k+1}) \;\Rightarrow\; V^0_k(x) := x^{\mathrm{T}} P^0_k x, \qquad
V^1_k(x, w_k(x), \Xi^1_{k+1}) \;\Rightarrow\; V^1_k(x) := x^{\mathrm{T}} P^1_k x,
\]
where $P^0_k$ and $P^1_k$ are $n\times n$ real symmetric matrices corresponding to the FlipDyn states $\alpha = 0$ and $1$, respectively. Furthermore, Assumption 4.4.4 plays an essential role in computing the saddle-point value for the $n$-dimensional dynamical system (Section 4.5.3). Next, we derive conditions under which there exists an optimal linear state-feedback control policy pair $\{u^*_k, w^*_k\}$.

Theorem 4.5.1 Under Assumptions 4.4.2 and 4.4.4, consider a linear dynamical system described by (4.24) with quadratic costs (4.25), takeover costs (4.37), and FlipDyn dynamics (4.3). Suppose that for every $k \in \mathcal{K}$,
\[
B_k^{\mathrm{T}} P^0_{k+1} B_k + M_k \succ 0, \qquad H_k^{\mathrm{T}} P^1_{k+1} H_k - N_k \prec 0. \tag{4.54}
\]
Then, optimal linear state-feedback control policies of the form (4.26), under a mixed-strategy NE takeover, are given for the defender and the adversary by
\[
u^*_k(x) := K^*_k(\eta_k)\, x = -\big( \hat{\eta}_k B_k^{\mathrm{T}} P^0_{k+1} B_k + M_k \big)^{-1} \big( \hat{\eta}_k B_k^{\mathrm{T}} P^0_{k+1} E_k \big) x, \tag{4.55}
\]
\[
w^*_k(x) := W^*_k(\eta_k)\, x = -\big( \hat{\eta}_k H_k^{\mathrm{T}} P^1_{k+1} H_k - N_k \big)^{-1} \big( \hat{\eta}_k H_k^{\mathrm{T}} P^1_{k+1} E_k \big) x, \tag{4.56}
\]
where $\hat{\eta}_k := 1 - \eta^2_k$ and the parameter $\eta_k$ satisfies the following conditions:
\[
(E_k + H_k W^*_k(\eta_k))^{\mathrm{T}} P^1_{k+1} (E_k + H_k W^*_k(\eta_k)) - (E_k + B_k K^*_k(\eta_k))^{\mathrm{T}} P^0_{k+1} (E_k + B_k K^*_k(\eta_k)) \succ d_k \mathrm{I}_n, \tag{4.57}
\]
\[
(E_k + H_k W^*_k(\eta_k))^{\mathrm{T}} P^1_{k+1} (E_k + H_k W^*_k(\eta_k)) - (E_k + B_k K^*_k(\eta_k))^{\mathrm{T}} P^0_{k+1} (E_k + B_k K^*_k(\eta_k)) \succ a_k \mathrm{I}_n, \tag{4.58}
\]
\[
x^{\mathrm{T}} \Big( (E_k + H_k W^*_k(\eta_k))^{\mathrm{T}} P^1_{k+1} (E_k + H_k W^*_k(\eta_k)) - (E_k + B_k K^*_k(\eta_k))^{\mathrm{T}} P^0_{k+1} (E_k + B_k K^*_k(\eta_k)) \Big) x = \frac{\sqrt{a_k d_k}}{\eta_k}\, x^{\mathrm{T}} x. \tag{4.59}
\]
□

Proof: Under Assumptions 4.4.2 and 4.4.4, if the adversary control policy $w^*_k(x)$ is known, then the defender's control problem reduces to
\[
\min_{K_k}\; v^0_{k+1} + x^{\mathrm{T}} K_k^{\mathrm{T}} M_k K_k x - \frac{x^{\mathrm{T}} d_k \mathrm{I}_n x \; x^{\mathrm{T}} a_k \mathrm{I}_n x}{v^{1*}_{k+1} - v^0_{k+1}}, \tag{4.60}
\]
where $v^{1*}_{k+1} := x^{\mathrm{T}} (E_k + H_k W^*_k)^{\mathrm{T}} P^1_{k+1} (E_k + H_k W^*_k) x$ and $v^0_{k+1}$ is defined in (4.9). Similarly, the adversary's control problem for a known defender policy $u^*_k(x)$ is given by
\[
\max_{W_k}\; v^1_{k+1} - x^{\mathrm{T}} W_k^{\mathrm{T}} N_k W_k x + \frac{x^{\mathrm{T}} d_k \mathrm{I}_n x \; x^{\mathrm{T}} a_k \mathrm{I}_n x}{v^1_{k+1} - v^{0*}_{k+1}}, \tag{4.61}
\]
where $v^{0*}_{k+1} := x^{\mathrm{T}} (E_k + B_k K^*_k)^{\mathrm{T}} P^0_{k+1} (E_k + B_k K^*_k) x$, and $v^1_{k+1}$ is defined in (4.10).
Taking the first derivative of (4.60) and (4.61) with respect to the control gains $K_k$ and $W_k$, respectively, and solving the first-order optimality conditions, we obtain
\[
B_k^{\mathrm{T}} P^0_{k+1} (E_k + B_k K_k) + M_k K_k - \frac{a_k d_k (x^{\mathrm{T}} x)^2\, B_k^{\mathrm{T}} P^0_{k+1} (E_k + B_k K_k)}{(v^{1*}_{k+1} - v^0_{k+1})^2} = 0_{m\times n}, \tag{4.62}
\]
\[
H_k^{\mathrm{T}} P^1_{k+1} (E_k + H_k W_k) - N_k W_k - \frac{a_k d_k (x^{\mathrm{T}} x)^2\, H_k^{\mathrm{T}} P^1_{k+1} (E_k + H_k W_k)}{(v^1_{k+1} - v^{0*}_{k+1})^2} = 0_{p\times n}, \tag{4.63}
\]
where $0_{i\times j} \in \mathbb{R}^{i\times j}$ is a matrix of zeros. The terms
\[
\frac{a_k d_k (x^{\mathrm{T}} x)^2\, B_k^{\mathrm{T}} P^0_{k+1} (E_k + B_k K_k)}{(v^{1*}_{k+1} - v^0_{k+1})^2}
\quad\text{and}\quad
\frac{a_k d_k (x^{\mathrm{T}} x)^2\, H_k^{\mathrm{T}} P^1_{k+1} (E_k + H_k W_k)}{(v^1_{k+1} - v^{0*}_{k+1})^2}
\]
introduce nonlinearities in $K_k$ and $W_k$ in (4.62) and (4.63), respectively, which prevents us from deriving an optimal linear control policy of the form (4.26). In order to address this limitation and obtain a linear control policy, we look for scalar parameters $\eta_{k,0} \in \mathbb{R}$ and $\eta_{k,1} \in \mathbb{R}$ such that
\[
x^{\mathrm{T}} \Big( (E_k + H_k W^*_k)^{\mathrm{T}} P^1_{k+1} (E_k + H_k W^*_k) - (E_k + B_k K_k)^{\mathrm{T}} P^0_{k+1} (E_k + B_k K_k) \Big) x = \frac{\sqrt{a_k d_k}}{\eta_{k,0}}\, x^{\mathrm{T}} x, \tag{4.64}
\]
\[
x^{\mathrm{T}} \Big( (E_k + H_k W_k)^{\mathrm{T}} P^1_{k+1} (E_k + H_k W_k) - (E_k + B_k K^*_k)^{\mathrm{T}} P^0_{k+1} (E_k + B_k K^*_k) \Big) x = \frac{\sqrt{a_k d_k}}{\eta_{k,1}}\, x^{\mathrm{T}} x. \tag{4.65}
\]
Substituting (4.64) and (4.65) into (4.62) and (4.63), respectively, and solving for the parameterized control gains, we obtain
\[
K^*_k = -\big( (1-\eta^2_{k,0}) B_k^{\mathrm{T}} P^0_{k+1} B_k + M_k \big)^{-1} \big( (1-\eta^2_{k,0}) B_k^{\mathrm{T}} P^0_{k+1} E_k \big), \tag{4.66}
\]
\[
W^*_k = -\big( (1-\eta^2_{k,1}) H_k^{\mathrm{T}} P^1_{k+1} H_k - N_k \big)^{-1} \big( (1-\eta^2_{k,1}) H_k^{\mathrm{T}} P^1_{k+1} E_k \big), \tag{4.67}
\]
$\forall x \in \mathcal{X}$. Substituting (4.66) and (4.67) back into (4.64) and (4.65), respectively, yields an identical equation. This observation implies that if there exists a common parameter $\eta_k$ such that $\eta_k = \eta_{k,0} = \eta_{k,1}$, we can derive the control policy pair (4.55) and (4.56), with the condition for existence given in (4.59). The control policy pair $\{K^*_k, W^*_k\}$ constitutes a mixed-strategy NE takeover with the saddle-point values $V^0_k(x)$ and $V^1_k(x)$, provided it satisfies the conditions $\widetilde{P}_{k+1}(x) > d_k x^{\mathrm{T}} x$ and $\widetilde{P}_{k+1}(x) > a_k x^{\mathrm{T}} x$. Substituting the dynamics (4.24) and the parameterized optimal control policies $(u^*_k(x), w^*_k(x))$ into (4.53) and factoring out the state $x$, we obtain the conditions (4.57) and (4.58). Furthermore, substituting (4.64) and (4.65) into (4.62) and (4.63), respectively, and then taking the second derivative with respect to $K^*_k$ and $W^*_k$ and checking the second-order conditions, we conclude that the controls are optimal provided
\[
(1-\eta^2_k)\, B_k^{\mathrm{T}} P^0_{k+1} B_k + M_k \succ 0, \qquad
(1-\eta^2_k)\, H_k^{\mathrm{T}} P^1_{k+1} H_k - N_k \prec 0. \tag{4.68}
\]
Given the quadratic costs (4.25), as $\eta_k \to 1$ the second-order optimality condition (4.68) is always satisfied, while setting $\eta_k = 0$ in (4.68) yields the limiting conditions (4.54). The obtained conditions certify strong convexity in the control gain $K_k$ and strong concavity in $W_k$, ensuring the existence of a unique saddle-point equilibrium. □

Theorem 4.5.1 provides conditions under which a linear state-feedback control policy pair exists. This characterization will enable us to compute the saddle-point value efficiently by backward iteration. The following result further bounds the range of the parameter $\eta_k$ corresponding to the mixed-strategy NE takeover.

Proposition 4.5.2 The permissible range of the parameter $\eta_k$ satisfying condition (4.59) is
\[
0 < \eta_k < \sqrt{\frac{\min\{d_k, a_k\}}{\max\{d_k, a_k\}}} < 1. \tag{4.69}
\]
□

Proof: A permissible parameter $\eta_k$ satisfying condition (4.59) corresponds to a control policy pair $\{u^*_k(x), w^*_k(x)\}$ that constitutes a mixed-strategy NE takeover with saddle-point values $V^0_k(x)$ and $V^1_k(x)$. Such a control policy pair and $\eta_k$ must satisfy
\[
\widetilde{P}_{k+1}(x) > x^{\mathrm{T}} d_k \mathrm{I}_n x, \qquad \widetilde{P}_{k+1}(x) > x^{\mathrm{T}} a_k \mathrm{I}_n x.
\]
Since a lower bound on the term $\widetilde{P}_{k+1}(x)$ is equivalent to condition (4.59), substituting the right-hand side of (4.59) into the above conditions yields
\[
\frac{\sqrt{a_k d_k}}{\eta_k}\, x^{\mathrm{T}} x > d_k\, x^{\mathrm{T}} x, \qquad \frac{\sqrt{a_k d_k}}{\eta_k}\, x^{\mathrm{T}} x > a_k\, x^{\mathrm{T}} x.
\]
Eliminating the state $x$ and combining the two inequalities, we arrive at (4.69). □

This proposition enables us to reduce the search space for the permissible parameter $\eta_k$. In the subsequent sections, we illustrate how this constrained range proves instrumental in determining a feasible $\eta_k$ in both the scalar and the $n$-dimensional case. Given the control policies corresponding to the mixed-strategy NE, we now characterize the control policies for NE takeovers in both pure and mixed strategies.

Theorem 4.5.3 Under Assumptions 4.4.2 and 4.4.4, consider a linear dynamical system described by (4.24) with quadratic costs (4.25), takeover costs (4.37), and FlipDyn dynamics (4.3).
Optimal linear state-feedback control policies of the form (4.26), parametrized by a scalar $\eta_k \in [0, 1]$, are given by
\[
u^*_k(x) =
\begin{cases}
K^*_k(\eta_k)\, x, & \text{if } \widetilde{P}^*_{k+1}(x) > x^{\mathrm{T}} d_k \mathrm{I}_n x,\ \widetilde{P}^*_{k+1}(x) > x^{\mathrm{T}} a_k \mathrm{I}_n x, \\[4pt]
K^*_k(1)\, x, & \text{if } \widetilde{P}^*_{k+1}(x) \leq x^{\mathrm{T}} d_k \mathrm{I}_n x,\ \widetilde{P}^*_{k+1}(x) > x^{\mathrm{T}} a_k \mathrm{I}_n x, \\[4pt]
K^*_k(0)\, x, & \text{otherwise},
\end{cases}
\tag{4.70}
\]
\[
w^*_k(x) =
\begin{cases}
W^*_k(\eta_k)\, x, & \text{if } \widetilde{P}^*_{k+1}(x) > x^{\mathrm{T}} d_k \mathrm{I}_n x,\ \widetilde{P}^*_{k+1}(x) > x^{\mathrm{T}} a_k \mathrm{I}_n x, \\[4pt]
W^*_k(1)\, x, & \text{if } \widetilde{P}^*_{k+1}(x) > x^{\mathrm{T}} d_k \mathrm{I}_n x,\ \widetilde{P}^*_{k+1}(x) \leq x^{\mathrm{T}} a_k \mathrm{I}_n x, \\[4pt]
W^*_k(0)\, x, & \text{otherwise},
\end{cases}
\tag{4.71}
\]
where
\[
\widetilde{P}^*_{k+1}(x) := x^{\mathrm{T}} \Big( (E_k + H_k W^*_k(\eta_k))^{\mathrm{T}} P^1_{k+1} (E_k + H_k W^*_k(\eta_k)) - (E_k + B_k K^*_k(\eta_k))^{\mathrm{T}} P^0_{k+1} (E_k + B_k K^*_k(\eta_k)) \Big) x,
\]
and $\eta_k$, $P^1_{k+1}$ and $P^0_{k+1}$ satisfy conditions (4.57), (4.58) and (4.59). □

Proof: We establish the proof only for the defender's control policy, as the derivation for the adversary's control policy is analogous. We begin with the first condition in both (4.70) and (4.71), namely $\widetilde{P}^*_{k+1}(x) > x^{\mathrm{T}} d_k \mathrm{I}_n x$ and $\widetilde{P}^*_{k+1}(x) > x^{\mathrm{T}} a_k \mathrm{I}_n x$. Under these conditions and the conditions (4.57), (4.58) and (4.59), Theorem 4.5.1 yields the mixed-strategy NE takeover policies. To complete the claim, we derive the control policies for NE takeovers in pure strategies.

i) Pure strategy: The defender chooses to stay idle, whereas the adversary chooses to take over. This takeover strategy is characterized by the conditions $\widetilde{P}^*_{k+1}(x) \leq x^{\mathrm{T}} d_k \mathrm{I}_n x$ and $\widetilde{P}^*_{k+1}(x) > x^{\mathrm{T}} a_k \mathrm{I}_n x$. If the optimal adversary control policy $w^*_k(x)$ for the corresponding pure-strategy NE takeover is known, the defender's control problem simplifies to
\[
\min_{K_k}\; v^{1*}_{k+1} + x^{\mathrm{T}} K_k^{\mathrm{T}} M_k K_k x - x^{\mathrm{T}} a_k \mathrm{I}_n x. \tag{4.72}
\]
Taking the first derivative of (4.72) with respect to $K_k$, and applying the first-order optimality condition given $M_k \in \mathbb{S}^{m\times m}_{+}$, we obtain
\[
M_k K_k x x^{\mathrm{T}} = 0_{m\times n} \;\Rightarrow\; M_k^{-1} M_k K_k x x^{\mathrm{T}} = 0_{m\times n}, \ \forall x \in \mathcal{X} \;\Rightarrow\; K_k = 0_{m\times n} = K^*_k(\eta_k = 1).
\]
This means that the defender refrains from applying any control input due to the deterministic adversarial takeover at $k+1$. Notice that this zero control gain coincides with setting $\eta_k = 1$ in (4.66).

ii) Pure strategy: Both the defender and the adversary choose to stay idle. In this case, the takeover strategy corresponds to the conditions $\widetilde{P}^*_{k+1}(x) \leq x^{\mathrm{T}} d_k \mathrm{I}_n x$ and $\widetilde{P}^*_{k+1}(x) \leq x^{\mathrm{T}} a_k \mathrm{I}_n x$.
Given the absence of an adversary control term in determining the saddle-point value of the game, the defender's control problem simplifies to
\[
\min_{K_k}\; v^0_{k+1} + x^{\mathrm{T}} K_k^{\mathrm{T}} M_k K_k x. \tag{4.73}
\]
Taking the first derivative of (4.73) with respect to $K_k$ and solving the first-order optimality condition, we obtain
\[
B_k^{\mathrm{T}} P^0_{k+1} (E_k + B_k K_k) = -M_k K_k \;\Rightarrow\; K_k = -\big( M_k + B_k^{\mathrm{T}} P^0_{k+1} B_k \big)^{-1} B_k^{\mathrm{T}} P^0_{k+1} E_k := K^*_k(\eta_k = 0).
\]
This control policy corresponds to a single-player control problem, since the FlipDyn state deterministically remains at $\alpha_{k+1} = 0$. Furthermore, it coincides with setting $\eta_k = 0$ in (4.66). □

Theorems 4.5.1 and 4.5.3 completely characterize the control policies of both players in the space of pure and mixed NE takeover strategies. This characterization enables us to compute the saddle-point value efficiently. If we parameterize the dynamics of the defender and adversary by $\tau_k \in \mathbb{R}$, then the continuous-state evolution is given by
\[
x_{k+1} = \check{B}_k(\tau_k)\, x_k := (E_k + B_k K^*_k(\tau_k))\, x_k, \qquad
x_{k+1} = \check{W}_k(\tau_k)\, x_k := (E_k + H_k W^*_k(\tau_k))\, x_k. \tag{4.74}
\]
The parameter $\tau_k = \eta_k$ when we use the control policies (4.55) and (4.56) derived under a mixed-strategy NE. Next, we outline the NE takeover strategies of both players, along with the corresponding saddle-point values in each FlipDyn state, for discrete-time linear dynamics, linear state-feedback control policies, and quadratic costs. We begin by analyzing the case where $x$ is a scalar, for which we compute the saddle-point value exactly, and subsequently proceed to approximate the saddle-point value for the $n$-dimensional case.

4.5.2 Scalar/1-dimensional dynamical system

The quadratic costs at any time $k \in \mathcal{K}$ are as stated in (4.28) for a scalar dynamical system. For a scalar dynamical system, we use the following notation for the saddle-point value in each FlipDyn state. Let
\[
V^0_k(x) := \mathrm{p}^0_k x^2, \qquad V^1_k(x) := \mathrm{p}^1_k x^2,
\]
where $\mathrm{p}^{\alpha}_k \in \mathbb{R}$, $\alpha \in \{0,1\}$, $k \in \mathcal{K}$. Building on Theorem 4.3.2, we present the following result, which provides a closed-form expression for the NE takeover strategies, in both pure and mixed strategies, of both players, and outlines the saddle-point value update of the parameter $\mathrm{p}^{\alpha}_k$.
Corollary 3 (Case $\alpha_k = 0$) The unique NE takeover strategies of the FlipDyn-Con game (4.7) at any time $k \in \mathcal{K}$, subject to the dynamics (4.74) for a scalar dynamical system with quadratic costs (4.28), takeover costs (4.37), and FlipDyn dynamics (4.3), are given by:
\[
y^{0*}_k =
\begin{cases}
\big[\; \tfrac{a_k}{\check{\mathrm{p}}_{k+1}} \quad 1 - \tfrac{a_k}{\check{\mathrm{p}}_{k+1}} \;\big]^{\mathrm{T}}, & \text{if } \check{\mathrm{p}}_{k+1} > d_k,\ \check{\mathrm{p}}_{k+1} > a_k, \\[4pt]
[\,1 \quad 0\,]^{\mathrm{T}}, & \text{otherwise},
\end{cases}
\tag{4.75}
\]
\[
z^{0*}_k =
\begin{cases}
\big[\; 1 - \tfrac{d_k}{\check{\mathrm{p}}_{k+1}} \quad \tfrac{d_k}{\check{\mathrm{p}}_{k+1}} \;\big]^{\mathrm{T}}, & \text{if } \check{\mathrm{p}}_{k+1} > d_k,\ \check{\mathrm{p}}_{k+1} > a_k, \\[4pt]
[\,0 \quad 1\,]^{\mathrm{T}}, & \text{if } \check{\mathrm{p}}_{k+1} \leq d_k,\ \check{\mathrm{p}}_{k+1} > a_k, \\[4pt]
[\,1 \quad 0\,]^{\mathrm{T}}, & \text{otherwise},
\end{cases}
\tag{4.76}
\]
where
\[
\check{\mathrm{p}}_{k+1} := \left( \frac{N_k^2\, \mathrm{p}^1_{k+1}}{\big(N_k - (1-\eta^2_k) H_k^2\, \mathrm{p}^1_{k+1}\big)^2} - \frac{M_k^2\, \mathrm{p}^0_{k+1}}{\big(M_k + (1-\eta^2_k) B_k^2\, \mathrm{p}^0_{k+1}\big)^2} \right) E_k^2.
\]
The saddle-point value parameter at time $k$ is given by:
\[
\mathrm{p}^0_k =
\begin{cases}
G^0_k + d_k - \dfrac{d_k a_k}{\check{\mathrm{p}}_{k+1}} + K^*_k(\eta_k)^2 M_k + \dfrac{M_k^2\, \mathrm{p}^0_{k+1}}{\big(M_k + (1-\eta^2_k) B_k^2\, \mathrm{p}^0_{k+1}\big)^2} E_k^2, & \text{if } \check{\mathrm{p}}_{k+1} > d_k,\ \check{\mathrm{p}}_{k+1} > a_k, \\[12pt]
G^0_k - a_k + K^*_k(1)^2 M_k + \dfrac{N_k^2\, \mathrm{p}^1_{k+1}}{\big(N_k - (1-\eta^2_k) H_k^2\, \mathrm{p}^1_{k+1}\big)^2} E_k^2, & \text{if } \check{\mathrm{p}}_{k+1} \leq d_k,\ \check{\mathrm{p}}_{k+1} > a_k, \\[12pt]
G^0_k + \dfrac{M_k^2\, \mathrm{p}^0_{k+1}}{\big(M_k + B_k^2\, \mathrm{p}^0_{k+1}\big)^2} E_k^2 + K^*_k(0)^2 M_k, & \text{otherwise}.
\end{cases}
\tag{4.77}
\]

(Case $\alpha_k = 1$) The unique NE takeover strategies are given by:
\[
y^{1*}_k =
\begin{cases}
\big[\; 1 - \tfrac{a_k}{\check{\mathrm{p}}_{k+1}} \quad \tfrac{a_k}{\check{\mathrm{p}}_{k+1}} \;\big]^{\mathrm{T}}, & \text{if } \check{\mathrm{p}}_{k+1} > d_k,\ \check{\mathrm{p}}_{k+1} > a_k, \\[4pt]
[\,0 \quad 1\,]^{\mathrm{T}}, & \text{if } \check{\mathrm{p}}_{k+1} > d_k,\ \check{\mathrm{p}}_{k+1} \leq a_k, \\[4pt]
[\,1 \quad 0\,]^{\mathrm{T}}, & \text{otherwise},
\end{cases}
\tag{4.78}
\]
\[
z^{1*}_k =
\begin{cases}
\big[\; \tfrac{d_k}{\check{\mathrm{p}}_{k+1}} \quad 1 - \tfrac{d_k}{\check{\mathrm{p}}_{k+1}} \;\big]^{\mathrm{T}}, & \text{if } \check{\mathrm{p}}_{k+1} > d_k,\ \check{\mathrm{p}}_{k+1} > a_k, \\[4pt]
[\,1 \quad 0\,]^{\mathrm{T}}, & \text{otherwise}.
\end{cases}
\tag{4.79}
\]
The saddle-point value parameter at time $k$ is given by:
\[
\mathrm{p}^1_k =
\begin{cases}
G^1_k - a_k + \dfrac{d_k a_k}{\check{\mathrm{p}}_{k+1}} - W^*_k(\eta_k)^2 N_k + \dfrac{N_k^2\, \mathrm{p}^1_{k+1}}{\big(N_k - (1-\eta^2_k) H_k^2\, \mathrm{p}^1_{k+1}\big)^2} E_k^2, & \text{if } \check{\mathrm{p}}_{k+1} > d_k,\ \check{\mathrm{p}}_{k+1} > a_k, \\[12pt]
G^1_k + d_k - W^*_k(1)^2 N_k + \dfrac{M_k^2\, \mathrm{p}^0_{k+1}}{\big(M_k + (1-\eta^2_k) B_k^2\, \mathrm{p}^0_{k+1}\big)^2} E_k^2, & \text{if } \check{\mathrm{p}}_{k+1} > d_k,\ \check{\mathrm{p}}_{k+1} \leq a_k, \\[12pt]
G^1_k + \dfrac{N_k^2\, \mathrm{p}^1_{k+1}}{\big(N_k - H_k^2\, \mathrm{p}^1_{k+1}\big)^2} E_k^2 - W^*_k(0)^2 N_k, & \text{otherwise}.
\end{cases}
\tag{4.80}
\]
The recursions (4.77) and (4.80) hold provided
\[
\mathrm{p}^0_{k+1} B_k^2 + M_k \geq 0, \qquad \mathrm{p}^1_{k+1} H_k^2 - N_k \leq 0. \tag{4.81}
\]
The terminal conditions for the recursions (4.77) and (4.80) are:
\[
p^0_{L+1} := G^0_{L+1}, \qquad p^1_{L+1} := G^1_{L+1}. \qquad \square
\]
Proof: We begin the proof by determining the NE takeover in both pure and mixed strategies, and computing the corresponding saddle-point value parameter, for the FlipDyn state $\alpha = 0$. We substitute the quadratic costs (4.28), the linear dynamics (4.74), and the optimal control policies (4.70) and (4.71) in the term $\widetilde{P}_{k+1}(x)$ from (4.53) to obtain:
\[
\widetilde{P}_{k+1}(x) := \Big( (E_k + H_k W^*_k(\eta_k))^2 p^1_{k+1} - (E_k + B_k K^*_k(\eta_k))^2 p^0_{k+1} \Big) x^2
= \left( \frac{N_k^2\, p^1_{k+1} E_k^2}{\big(N_k - (1-\eta_k^2) H_k^2\, p^1_{k+1}\big)^2} - \frac{M_k^2\, p^0_{k+1} E_k^2}{\big(M_k + (1-\eta_k^2) B_k^2\, p^0_{k+1}\big)^2} \right) x^2
= \check{p}_{k+1}\, x^2.
\]
Substituting $\check{p}_{k+1}$ and the takeover costs (4.37) in (4.14) and (4.15), we obtain the NE takeover strategies presented in (4.75) and (4.76), respectively. Notably, as observed in Theorem 4.3.2, the NE takeover strategies for the FlipDyn state $\alpha_k = 1$ can also be obtained by taking the complement of (4.75) and (4.76), resulting in (4.78) and (4.79), respectively. To obtain a recurrence relation for the parameter $p^0_k$, we substitute the linear dynamics (4.74), the quadratic costs (4.28), and the takeover costs (4.37) in the saddle-point value recursion. This yields
\[
p^0_k x^2 =
\begin{cases}
(G^0_k + d_k)x^2 - \dfrac{d_k a_k x^4}{\check{p}_{k+1} x^2} + \big(K^*_k(\eta_k)^2 M_k + \check{B}_k(\eta_k)^2 p^0_{k+1}\big)x^2, & \text{if } \check{p}_{k+1} > d_k,\ \check{p}_{k+1} > a_k, \\[6pt]
\big(G^0_k + \check{W}_k(\eta_k)^2 p^1_{k+1} - a_k\big)x^2, & \text{if } \check{p}_{k+1} \le d_k,\ \check{p}_{k+1} > a_k, \\[4pt]
\big(G^0_k + K^*_k(0)^2 M_k + \check{B}_k(0)^2 p^0_{k+1}\big)x^2, & \text{otherwise}.
\end{cases}
\]
Substituting the control gains $K^*_k(\eta_k)$ and $W^*_k(\eta_k)$ from (4.70) and (4.71), and factoring out the term $x^2$, we arrive at (4.77). Employing analogous substitutions for the FlipDyn state $\alpha_k = 1$, we obtain (4.80). Condition (4.81) corresponds to a second-order optimality condition for the policy pair $u^*_k(x)$ and $w^*_k(x)$ derived from (4.54) for a scalar dynamical system. This condition ensures that the control policies form a saddle-point equilibrium. $\square$

Corollary 3 presents a closed-form solution for the FlipDyn-Con game (4.7) with NE takeover strategies that are independent of the state of the scalar/1-dimensional system. However, it is important to note that not all control quadratic costs (4.28) satisfy the recursion of the saddle-point value parameter outlined in Corollary 3. The following remark presents the minimum adversary control cost that satisfies the parameter recursions described in (4.77) and (4.80).
Remark 4.5.4 Given a scalar/1-dimensional system (4.74) with quadratic costs (4.28), the NE takeover strategies and the recursion for the saddle-point value parameter, as outlined in Corollary 3, exist for adversary control costs $N^*_k \le N_k$ provided
\[
-N^*_k + H_k^2\, p^1_{k+1} < 0, \qquad \forall k \in \mathcal{K}.
\]
The parameter $N^*_k$ in Remark 4.5.4 can be computed using a bisection method at every time $k \in \mathcal{K}$. Given an arbitrary adversary control cost $N_k$, we start by updating the saddle-point value parameters in (4.77) and (4.80) backward in time. At any time instant $k \in \mathcal{K}$, if the inequality $-N_k + H_k^2\, p^1_{k+1} \le 0$ is not satisfied, the adversary cost $N_k$ is updated using the bisection method. This process is repeated iteratively until the time $k = 0$ is reached and the bisection has converged. The determined cost $N^*_k$ indicates the minimal cost the adversary must bear to control the system effectively.

Similar to the findings presented in [16], in addition to the minimum adversary control costs, we can determine a minimum adversarial state cost $G^{1*}_k$ that guarantees a mixed strategy NE takeover at every time $k \in \mathcal{K}$. We characterize such an adversarial state cost in the following remark.

Remark 4.5.5 Given a scalar/1-dimensional system (4.74) with quadratic costs (4.28), the mixed strategy NE takeover and the corresponding recursion for the saddle-point value parameter, as outlined in Corollary 3, exist for an adversary state-dependent cost $G^{1*}_k \le G^1_k$ provided
\[
\check{p}_{k+1} > d_k, \qquad \check{p}_{k+1} > a_k, \qquad \forall k \in \mathcal{K},
\]
with the parameters at time $L+1$ satisfying
\[
p^1_{L+1} = G^{1*}_{L+1}, \qquad p^0_{L+1} = G^0_{L+1}. \tag{4.82}
\]
The process for determining the minimum adversary state cost $G^{1*}_k$ is analogous to that for $N^*_k$ and also employs a bisection method. Simultaneously computing both $G^{1*}_k$ and $N^*_k$ requires a dual bisection approach, with an outer bisection loop for $N^*_k$ and an inner bisection loop for $G^{1*}_k$. This iterative procedure continues until we reach the time instant $k = 0$ and both bisections have converged. Next, we illustrate the results of Corollary 3 through a numerical example.

A Numerical Example
We evaluate the NE takeover strategies and saddle-point value parameters obtained in Corollary 3 on a linear time-invariant (LTI) scalar system for a horizon length of $L = 20$. The quadratic costs (4.28) are assumed to be fixed for all $k \in \mathcal{K}$, and are given by
\[
G^0_k = G^0 = 1, \quad G^1_k = G^1 = 1, \quad d_k = d = 0.45, \quad a_k = a = 0.25, \quad M_k = M = 0.65.
\]
The control matrices of both players reduce to $B_k = H_k = \Delta t$, $\forall k \in \mathcal{K}$, where $\Delta t = 0.1$ for the numerical evaluation. We solve for the NE takeover strategies and the saddle-point value parameters for two cases of a fixed state transition constant $E_k = E$, $\forall k \in \mathcal{K}$: $E = 0.85$ and $E = 1.0$.
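As a concrete illustration of how the bisection described above could be organized, the following is a minimal Python sketch. The routine run_recursion is a placeholder standing in for an implementation of the backward recursions (4.77) and (4.80) with a constant adversary control cost; the bracketing interval, the tolerance, and the assumption that feasibility is monotone in the cost are illustrative choices rather than part of the original development.

```python
def min_adversary_cost(run_recursion, H, N_lo=1e-3, N_hi=10.0, tol=1e-4):
    """Bisection sketch for the minimal adversary control cost N*_k (Remark 4.5.4).

    `run_recursion(N)` is assumed to return the sequence of parameters p^1_{k+1}
    generated by iterating (4.77) and (4.80) backward in time with a constant
    adversary control cost N; H is the (constant) scalar adversary input gain.
    """
    def feasible(N):
        # Remark 4.5.4 requires -N + H_k^2 * p^1_{k+1} < 0 at every stage k.
        return all(-N + H ** 2 * p1 < 0.0 for p1 in run_recursion(N))

    if not feasible(N_hi):
        raise ValueError("upper bracket N_hi is infeasible; enlarge it")
    while N_hi - N_lo > tol:
        N_mid = 0.5 * (N_lo + N_hi)
        if feasible(N_mid):
            N_hi = N_mid   # feasible: try a smaller (cheaper) adversary control cost
        else:
            N_lo = N_mid   # infeasible: the minimal admissible cost must be larger
    return N_hi
```

With the example data above ($G^0 = G^1 = 1$, $d = 0.45$, $a = 0.25$, $M = 0.65$, $B = H = \Delta t = 0.1$), the same skeleton can be nested to realize the dual bisection over $G^{1*}_k$ and $N^*_k$ described in the text.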
For $E = 0.85$, the minimal adversary control costs are
\[
N^*_k = N^* =
\begin{cases}
0.39, & \text{if } \check{p}_{k+1} \ge a_k,\ \check{p}_{k+1} \ge d_k, \\
0.25, & \text{otherwise},
\end{cases}
\]
whereas, for $E = 1.0$, the minimal adversary control costs are
\[
N^*_k = N^* =
\begin{cases}
2.17, & \text{if } \check{p}_{k+1} \ge a_k,\ \check{p}_{k+1} \ge d_k, \\
1.51, & \text{otherwise}.
\end{cases}
\]
To obtain a mixed strategy NE takeover over the entire horizon $L$, we solve for the adversary state cost $G^{1*}_k$ for each case, given by
\[
G^{1*}_k = G^{1*} =
\begin{cases}
1.56, & \text{when } E = 0.85, \\
1.43, & \text{when } E = 1.00.
\end{cases}
\]
Figures 4.6a and 4.6b illustrate the saddle-point value parameters $p^0_k$ and $p^1_k$ for both $E = 0.85$ and $1.00$. In Figure 4.6a, M-NE represents a mixed strategy NE takeover over the entire horizon $L$, achieved through $N^*_k$ and $G^{1*}_k$. We observe that the saddle-point value parameter for the adversary increases with increasing $E$; i.e., as the system shifts from open-loop stable ($E < 1$) to unstable ($E \ge 1$), there is a larger incentive for the adversary to take over the system. Figures 4.7a and 4.7b show the probabilities of takeover by the defender and adversary when $\alpha_k = 0$. For both $E = 0.85$ and $1.00$, the probabilities decrease monotonically for the defender and increase monotonically for the adversary. When the obtained takeover strategies contain both pure and mixed strategy NE, there exists a time instant beyond which both players switch to a pure strategy NE for all future time instants. This switch indicates that, under the given costs, there is no incentive for either player to take over. Finally, the difference between $E = 0.85$ and $1.00$ shows the rate at which the takeover strategies change over time. The probability of takeover when $E = 1.00$ is higher than when $E = 0.85$, and it decreases rapidly toward the end of the horizon.

Figure 4.6 Saddle-point value parameters $p^i_k$, $k \in \{1, 2, \ldots, L\}$, $i \in \{0, 1\}$, for state transition constant (a) $E = 0.85$, (b) $E = 1.0$. The parameters $p^i_{k,\text{M-NE}}$ correspond to the parameters of the saddle-point under a mixed NE takeover over the entire time horizon.

Figure 4.7 Defender takeover strategies $\beta_k$ and adversary takeover strategies $\gamma_k$ for state transition (a) $E = 0.85$ and (b) $E = 1.0$. M-NE corresponds to the mixed NE policy.

Next, we will extend our derivation and analysis of the FlipDyn-Con game with discrete-time linear dynamics and quadratic costs to $n$ dimensions.

4.5.3 n-dimensional system
Unlike the scalar case, wherein the state $x$ was factored out during the computation of the NE takeover strategies and saddle-point value parameters $p^0_k$ and $p^1_k$, that simplification does not yield exact results for an $n$-dimensional system.
The challenge in factoring out the state at any time $k \in \mathcal{K}$ arises from the term
\[
\frac{x^\top a_k \mathrm{I}_n x \;\; x^\top d_k \mathrm{I}_n x}{\underbrace{x^\top\big( \check{W}_k(\eta_k)^\top P^1_{k+1}\check{W}_k(\eta_k) - \check{B}_k(\eta_k)^\top P^0_{k+1}\check{B}_k(\eta_k)\big)x}_{\widetilde{P}_{k+1}(x)}}, \tag{4.83}
\]
which appears when a mixed strategy NE takeover is played in either of the FlipDyn states. A similar challenge was encountered in [16], where the aforementioned term was approximated in order to factor out the state $x$ while computing the saddle-point value parameters backward in time. Here, we propose a more general approach that leverages the results of Theorem 4.5.1 to address this limitation. Recall that the parameterized control policy pair $\{u^*_k(\eta_k), w^*_k(\eta_k)\}$ with a feasible parameter $\eta_k$ must satisfy condition (4.59):
\[
x^\top\big( \check{W}_k(\eta_k)^\top P^1_{k+1}\check{W}_k(\eta_k) - \check{B}_k(\eta_k)^\top P^0_{k+1}\check{B}_k(\eta_k)\big)x = \frac{\sqrt{a_k d_k}}{\eta_k}\, x^\top x.
\]
Substituting condition (4.59) in (4.83) yields:
\[
\frac{x^\top a_k \mathrm{I}_n x \;\; x^\top d_k \mathrm{I}_n x}{\widetilde{P}_{k+1}(x)} = \eta_k\, \frac{x^\top a_k \mathrm{I}_n x \;\; x^\top d_k \mathrm{I}_n x}{\sqrt{a_k d_k}\; x^\top x} = \eta_k \sqrt{a_k d_k}\; x^\top x. \tag{4.84}
\]
Analogous to the scalar/1-dimensional case, we will use Theorem 4.3.2 to present the following result, which provides a closed-form expression for the NE takeover in both pure and mixed strategies
of both players, and outlines the saddle-point value update of the parameter $P^\alpha_k \in \mathbb{R}^{n \times n}$, $\alpha \in \{0,1\}$.

Corollary 4 (Case $\alpha_k = 0$) The unique NE takeover strategies of the FlipDyn-Con game (4.7) for every $k \in \mathcal{K}$, subject to the dynamics (4.74), with quadratic costs (4.25), takeover costs (4.37), and FlipDyn dynamics (4.3), are given by
\[
y^{0*}_k =
\begin{cases}
\begin{bmatrix} 1 - \eta_k\sqrt{\tfrac{a_k}{d_k}} & \eta_k\sqrt{\tfrac{a_k}{d_k}} \end{bmatrix}^\top, & \text{if } \widetilde{P}_{k+1}(x) > x^\top a_k \mathrm{I}_n x,\ \widetilde{P}_{k+1}(x) > x^\top d_k \mathrm{I}_n x, \\[6pt]
\begin{bmatrix} 1 & 0 \end{bmatrix}^\top, & \text{otherwise},
\end{cases}
\tag{4.85}
\]
\[
z^{0*}_k =
\begin{cases}
\begin{bmatrix} 1 - \eta_k\sqrt{\tfrac{d_k}{a_k}} & \eta_k\sqrt{\tfrac{d_k}{a_k}} \end{bmatrix}^\top, & \text{if } \widetilde{P}_{k+1}(x) > x^\top a_k \mathrm{I}_n x,\ \widetilde{P}_{k+1}(x) > x^\top d_k \mathrm{I}_n x, \\[6pt]
\begin{bmatrix} 0 & 1 \end{bmatrix}^\top, & \text{if } \widetilde{P}_{k+1}(x) > x^\top a_k \mathrm{I}_n x,\ \widetilde{P}_{k+1}(x) \le x^\top d_k \mathrm{I}_n x, \\[4pt]
\begin{bmatrix} 1 & 0 \end{bmatrix}^\top, & \text{otherwise}.
\end{cases}
\tag{4.86}
\]
The saddle-point value parameter at time $k$ is given by:
\[
P^0_k =
\begin{cases}
G^0_k + \check{B}_k(\eta_k)^\top P^0_{k+1}\check{B}_k(\eta_k) + K^*_k(\eta_k)^\top M_k K^*_k(\eta_k) + d_k \mathrm{I}_n - \mathrm{I}_n\,\eta_k\sqrt{a_k d_k}, & \text{if } \widetilde{P}_{k+1}(x) > x^\top a_k \mathrm{I}_n x,\ \widetilde{P}_{k+1}(x) > x^\top d_k \mathrm{I}_n x, \\[4pt]
G^0_k + \check{W}_k(\eta_k)^\top P^1_{k+1}\check{W}_k(\eta_k) - a_k \mathrm{I}_n, & \text{if } \widetilde{P}_{k+1}(x) > x^\top a_k \mathrm{I}_n x,\ \widetilde{P}_{k+1}(x) \le x^\top d_k \mathrm{I}_n x, \\[4pt]
G^0_k + K^*_k(0)^\top M_k K^*_k(0) + \check{B}_k(0)^\top P^0_{k+1}\check{B}_k(0), & \text{otherwise}.
\end{cases}
\tag{4.87}
\]
(Case $\alpha_k = 1$) The unique NE takeover strategies are given by:
\[
y^{1*}_k =
\begin{cases}
\begin{bmatrix} 1 - \eta_k\sqrt{\tfrac{a_k}{d_k}} & \eta_k\sqrt{\tfrac{a_k}{d_k}} \end{bmatrix}^\top, & \text{if } \widetilde{P}_{k+1}(x) > x^\top a_k \mathrm{I}_n x,\ \widetilde{P}_{k+1}(x) > x^\top d_k \mathrm{I}_n x, \\[6pt]
\begin{bmatrix} 0 & 1 \end{bmatrix}^\top, & \text{if } \widetilde{P}_{k+1}(x) \le x^\top a_k \mathrm{I}_n x,\ \widetilde{P}_{k+1}(x) > x^\top d_k \mathrm{I}_n x, \\[4pt]
\begin{bmatrix} 1 & 0 \end{bmatrix}^\top, & \text{otherwise},
\end{cases}
\tag{4.88}
\]
\[
z^{1*}_k =
\begin{cases}
\begin{bmatrix} 1 - \eta_k\sqrt{\tfrac{d_k}{a_k}} & \eta_k\sqrt{\tfrac{d_k}{a_k}} \end{bmatrix}^\top, & \text{if } \widetilde{P}_{k+1}(x) > x^\top a_k \mathrm{I}_n x,\ \widetilde{P}_{k+1}(x) > x^\top d_k \mathrm{I}_n x, \\[6pt]
\begin{bmatrix} 1 & 0 \end{bmatrix}^\top, & \text{otherwise}.
\end{cases}
\tag{4.89}
\]
The saddle-point value parameter at time $k$ is given by:
\[
P^1_k =
\begin{cases}
G^1_k + \check{W}_k(\eta_k)^\top P^1_{k+1}\check{W}_k(\eta_k) - W^*_k(\eta_k)^\top N_k W^*_k(\eta_k) - a_k \mathrm{I}_n + \mathrm{I}_n\,\eta_k\sqrt{a_k d_k}, & \text{if } \widetilde{P}_{k+1}(x) > x^\top a_k \mathrm{I}_n x,\ \widetilde{P}_{k+1}(x) > x^\top d_k \mathrm{I}_n x, \\[4pt]
G^1_k + \check{B}_k(\eta_k)^\top P^0_{k+1}\check{B}_k(\eta_k) + d_k \mathrm{I}_n, & \text{if } \widetilde{P}_{k+1}(x) \le x^\top a_k \mathrm{I}_n x,\ \widetilde{P}_{k+1}(x) > x^\top d_k \mathrm{I}_n x, \\[4pt]
G^1_k - W^*_k(0)^\top N_k W^*_k(0) + \check{W}_k(0)^\top P^1_{k+1}\check{W}_k(0), & \text{otherwise}.
\end{cases}
\tag{4.90}
\]
The recursions (4.87) and (4.90) hold provided
\[
B_k^\top P^0_{k+1} B_k + M_k \succ 0, \qquad H_k^\top P^1_{k+1} H_k - N_k \prec 0. \tag{4.91}
\]
The terminal conditions for the recursions (4.87) and (4.90) are:
\[
P^0_{L+1} := G^0_{L+1}, \qquad P^1_{L+1} := G^1_{L+1}. \qquad \square
\]
Proof: We begin the proof by determining the NE takeover in pure and mixed strategies for the FlipDyn state $\alpha = 0$. We substitute the takeover cost (4.37) and the terms from (4.84) in (4.14) and (4.15) to obtain the NE takeover policies in (4.85) and (4.86), respectively. Analogous to the scalar/1-dimensional case, the NE takeover strategies (4.88) and (4.89) for the FlipDyn state $\alpha = 1$ are the complementary takeover strategies of the FlipDyn state $\alpha = 0$. To determine the saddle-point value parameters for the FlipDyn state $\alpha = 0$, we substitute (4.84), the discrete-time linear dynamics (4.74), the quadratic costs (4.25), and the takeover costs (4.37) in (4.16) and factor out the state $x$ to obtain (4.87). Through similar substitutions and factorization, we obtain (4.90), corresponding to the FlipDyn state $\alpha = 1$. $\square$

Similar to the 1-dimensional case, Corollary 4 presents a closed-form solution for the FlipDyn-Con game (4.7) with NE takeover strategies independent of the state. However, these NE takeover strategies and saddle-point value parameters are conditioned on finding a feasible parameter $\eta_k$, for all $k \in \mathcal{K}$, that satisfies (4.84). A feasible parameter $\eta_k$ is seldom found for the linear dynamics (4.74), as the matrices $\check{B}_k(\tau_k)$ and $\check{W}_k(\tau_k)$ are generally non-diagonal. Therefore, there is a need to find approximate NE takeover strategies and bounds on the saddle-point values for a general $n$-dimensional case that need not satisfy (4.84). A solution addressing the limitation in determining a parameter $\eta_k$ is found by revisiting the optimal linear state-feedback control from Theorem 4.5.1, described in the following result.

Lemma 4.5.6 Under Assumptions 4.4.2 and 4.4.4, consider a linear dynamical system described by (4.24) with quadratic costs (4.25), takeover costs (4.37), and FlipDyn dynamics (4.3), with known saddle-point value parameters $P^1_{k+1}$ and $P^0_{k+1}$.
Suppose that for every $k \in \mathcal{K}$,
\[
B_k^\top P^0_{k+1} B_k + M_k \succ 0, \qquad H_k^\top P^1_{k+1} H_k - N_k \prec 0, \tag{4.92}
\]
and for every $x \in \mathcal{X}$, there exist scalars $\underline{\eta}_k \in \mathbb{R}$ and $\overline{\eta}_k \in \mathbb{R}$ that correspond to an optimal linear state-feedback control pair $\{K^*_k(\overline{\eta}_k), W^*_k(\underline{\eta}_k)\}$ of the form (4.55) and (4.56), such that the following conditions are satisfied:
\[
\frac{\sqrt{a_k d_k}}{\overline{\eta}_k}\, x^\top x \;\le\; x^\top \mathcal{P}_{k+1}\, x \;\le\; \frac{\sqrt{a_k d_k}}{\underline{\eta}_k}\, x^\top x, \tag{4.93}
\]
\[
(E_k + H_k W^*_k(\underline{\eta}_k))^\top P^1_{k+1} (E_k + H_k W^*_k(\underline{\eta}_k)) - (E_k + B_k K^*_k(\overline{\eta}_k))^\top P^0_{k+1} (E_k + B_k K^*_k(\overline{\eta}_k)) \succ d_k \mathrm{I}_n, \tag{4.94}
\]
\[
(E_k + H_k W^*_k(\underline{\eta}_k))^\top P^1_{k+1} (E_k + H_k W^*_k(\underline{\eta}_k)) - (E_k + B_k K^*_k(\overline{\eta}_k))^\top P^0_{k+1} (E_k + B_k K^*_k(\overline{\eta}_k)) \succ a_k \mathrm{I}_n, \tag{4.95}
\]
where
\[
\mathcal{P}_{k+1} = (E_k + H_k W^*_k(\underline{\eta}_k))^\top P^1_{k+1} (E_k + H_k W^*_k(\underline{\eta}_k)) - (E_k + B_k K^*_k(\overline{\eta}_k))^\top P^0_{k+1} (E_k + B_k K^*_k(\overline{\eta}_k)).
\]
Then, the saddle-point value parameters at time $k \in \mathcal{K}$, under a mixed strategy NE takeover in both FlipDyn states, satisfy
\[
P^0_k \preceq G^0_k + d_k \mathrm{I}_n + K^*_k(\overline{\eta}_k)^\top M_k K^*_k(\overline{\eta}_k) - \mathrm{I}_n\,\overline{\eta}_k\sqrt{a_k d_k} + \check{B}_k(\overline{\eta}_k)^\top P^0_{k+1}\check{B}_k(\overline{\eta}_k), \tag{4.96}
\]
\[
P^1_k \succeq G^1_k - a_k \mathrm{I}_n - W^*_k(\underline{\eta}_k)^\top N_k W^*_k(\underline{\eta}_k) + \mathrm{I}_n\,\underline{\eta}_k\sqrt{a_k d_k} + \check{W}_k(\underline{\eta}_k)^\top P^1_{k+1}\check{W}_k(\underline{\eta}_k). \tag{4.97}
\]
$\square$

Proof: From (4.55), a linear defender control policy gain parameterized by a scalar $\overline{\eta}_k$ is given by:
\[
K^*_k(\overline{\eta}_k) = -\big(\vartheta(\overline{\eta}_k) B_k^\top P^0_{k+1} B_k + M_k\big)^{-1}\big(\vartheta(\overline{\eta}_k) B_k^\top P^0_{k+1} E_k\big), \tag{4.98}
\]
where $\vartheta(c) := 1 - c^2$. Likewise, from (4.56), a linear adversary control policy gain parameterized by a scalar $\underline{\eta}_k$ is given by:
\[
W^*_k(\underline{\eta}_k) = -\big(\vartheta(\underline{\eta}_k) H_k^\top P^1_{k+1} H_k - N_k\big)^{-1}\big(\vartheta(\underline{\eta}_k) H_k^\top P^1_{k+1} E_k\big). \tag{4.99}
\]
Upon substituting condition (4.93) in (4.60) and (4.61) and solving the second-order optimality condition (similar to Theorem 4.5.1), we obtain (4.92), which certifies a saddle-point equilibrium. Recall that any control policy pair $\{K_k, W_k\}$ that constitutes a mixed strategy NE takeover for both saddle-point values $V^0_k(x)$ and $V^1_k(x)$ must satisfy the conditions:
\[
\widetilde{P}_{k+1}(x) > d_k\, x^\top x, \qquad \widetilde{P}_{k+1}(x) > a_k\, x^\top x.
\]
Thus, upon substituting the linear dynamics (4.24) and the optimal control gains $\{K^*_k(\overline{\eta}_k), W^*_k(\underline{\eta}_k)\}$ in (4.53) and factoring out the state $x$, we obtain conditions (4.94) and (4.95). Next, we establish only (4.96), as the derivation for (4.97) is analogous.
Under a mixed strategy NE takeover, we substitute the quadratic costs (4.25), the discrete-time linear dynamics (4.74), and the defender control (4.98) in (4.16) to obtain:
\[
x^\top P^0_k x = x^\top\big( G^0_k + d_k \mathrm{I}_n + K^*_k(\overline{\eta}_k)^\top M_k K^*_k(\overline{\eta}_k)\big)x + x^\top\big( \check{B}_k(\overline{\eta}_k)^\top P^0_{k+1}\check{B}_k(\overline{\eta}_k)\big)x - \frac{x^\top a_k \mathrm{I}_n x \;\; x^\top d_k \mathrm{I}_n x}{\underbrace{x^\top\big( \check{W}_k(\underline{\eta}_k)^\top P^1_{k+1}\check{W}_k(\underline{\eta}_k) - \check{B}_k(\overline{\eta}_k)^\top P^0_{k+1}\check{B}_k(\overline{\eta}_k)\big)x}_{x^\top \mathcal{P}_{k+1} x}}.
\]
Using condition (4.93), we bound the term containing $\mathcal{P}_{k+1}$ by
\[
\frac{x^\top a_k \mathrm{I}_n x \;\; x^\top d_k \mathrm{I}_n x}{x^\top \mathcal{P}_{k+1} x} \;\le\; \overline{\eta}_k\, \frac{x^\top a_k \mathrm{I}_n x \;\; x^\top d_k \mathrm{I}_n x}{\sqrt{a_k d_k}\; x^\top x} \;=\; \overline{\eta}_k \sqrt{a_k d_k}\; x^\top x.
\]
Substituting this bound in $x^\top P^0_k x$ and factoring out the state $x$, we obtain (4.96). $\square$

Lemma 4.5.6 provides a linear state-feedback control for the defender (resp. adversary) and enables us to compute bounds on the saddle-point values, independent of the state $x$, backward in time. More importantly, condition (4.93) serves as a relaxation of (4.84). Such a relaxation enables us to determine upper and lower bounds, in a semi-definite sense, on the saddle-point value parameters using the scalars $\underline{\eta}_k$ and $\overline{\eta}_k$, which can then be used to compute the saddle-point value parameters approximately and recursively. Therefore, following the same methodology as [16], for the $n$-dimensional case we solve for approximate NE takeover strategies and saddle-point values using the parameterization
\[
V^0_k(x) := x^\top P^0_k x, \qquad V^1_k(x) := x^\top P^1_k x, \tag{4.100}
\]
where $P^1_k \in \mathbb{R}^{n \times n}$ and $P^0_k \in \mathbb{R}^{n \times n}$. As in Corollary 4, we use the results from Theorem 4.3.2 to provide an approximate NE takeover pair $\{y^{\alpha*}_k, z^{\alpha*}_k\}$, in both pure and mixed strategies of both players, and the corresponding approximate saddle-point value update of the parameter $P^\alpha_k \in \mathbb{R}^{n \times n}$, $\alpha \in \{0, 1\}$.
Corollary 5 (Case $\alpha_k = 0$) The approximate NE takeover strategies of the FlipDyn-Con game (4.7) at any time $k \in \mathcal{K}$, subject to the dynamics (4.74), with quadratic costs (4.25), takeover costs (4.37), and FlipDyn dynamics (4.3), are given by:
\[
y^{0*}_k =
\begin{cases}
\begin{bmatrix} 1 - \dfrac{a_k x^\top x}{x^\top \mathcal{P}_{k+1} x} & \dfrac{a_k x^\top x}{x^\top \mathcal{P}_{k+1} x} \end{bmatrix}^\top, & \text{if } \widetilde{P}_{k+1}(x) > a_k x^\top x,\ \widetilde{P}_{k+1}(x) > d_k x^\top x, \\[8pt]
\begin{bmatrix} 1 & 0 \end{bmatrix}^\top, & \text{otherwise},
\end{cases}
\tag{4.101}
\]
\[
z^{0*}_k =
\begin{cases}
\begin{bmatrix} 1 - \dfrac{d_k x^\top x}{x^\top \mathcal{P}_{k+1} x} & \dfrac{d_k x^\top x}{x^\top \mathcal{P}_{k+1} x} \end{bmatrix}^\top, & \text{if } \widetilde{P}_{k+1}(x) > a_k x^\top x,\ \widetilde{P}_{k+1}(x) > d_k x^\top x, \\[8pt]
\begin{bmatrix} 0 & 1 \end{bmatrix}^\top, & \text{if } \widetilde{P}_{k+1}(x) > a_k x^\top x,\ \widetilde{P}_{k+1}(x) \le d_k x^\top x, \\[4pt]
\begin{bmatrix} 1 & 0 \end{bmatrix}^\top, & \text{otherwise},
\end{cases}
\tag{4.102}
\]
where
\[
\mathcal{P}_{k+1} := \check{W}_k(\underline{\eta}_k)^\top P^1_{k+1}\check{W}_k(\underline{\eta}_k) - \check{B}_k(\overline{\eta}_k)^\top P^0_{k+1}\check{B}_k(\overline{\eta}_k), \qquad \widetilde{P}_{k+1}(x) := x^\top \mathcal{P}_{k+1} x.
\]
The approximate saddle-point value parameter at time $k$ is given by:
\[
P^0_k =
\begin{cases}
G^0_k + \check{B}_k(\overline{\eta}_k)^\top P^0_{k+1}\check{B}_k(\overline{\eta}_k) + K^*_k(\overline{\eta}_k)^\top M_k K^*_k(\overline{\eta}_k) + d_k \mathrm{I}_n - \mathrm{I}_n\,\overline{\eta}_k\sqrt{a_k d_k}, & \text{if } \widetilde{P}_{k+1}(x) > a_k x^\top x,\ \widetilde{P}_{k+1}(x) > d_k x^\top x, \\[4pt]
G^0_k + \check{W}_k(\underline{\eta}_k)^\top P^1_{k+1}\check{W}_k(\underline{\eta}_k) - a_k \mathrm{I}_n, & \text{if } \widetilde{P}_{k+1}(x) > a_k x^\top x,\ \widetilde{P}_{k+1}(x) \le d_k x^\top x, \\[4pt]
G^0_k + K^*_k(0)^\top M_k K^*_k(0) + \check{B}_k(0)^\top P^0_{k+1}\check{B}_k(0), & \text{otherwise}.
\end{cases}
\tag{4.103}
\]
(Case $\alpha_k = 1$) The approximate NE takeover strategies are given by:
\[
y^{1*}_k =
\begin{cases}
\begin{bmatrix} 1 - \dfrac{a_k x^\top x}{x^\top \mathcal{P}_{k+1} x} & \dfrac{a_k x^\top x}{x^\top \mathcal{P}_{k+1} x} \end{bmatrix}^\top, & \text{if } \widetilde{P}_{k+1}(x) > a_k x^\top x,\ \widetilde{P}_{k+1}(x) > d_k x^\top x, \\[8pt]
\begin{bmatrix} 0 & 1 \end{bmatrix}^\top, & \text{if } \widetilde{P}_{k+1}(x) \le a_k x^\top x,\ \widetilde{P}_{k+1}(x) > d_k x^\top x, \\[4pt]
\begin{bmatrix} 1 & 0 \end{bmatrix}^\top, & \text{otherwise},
\end{cases}
\tag{4.104}
\]
\[
z^{1*}_k =
\begin{cases}
\begin{bmatrix} 1 - \dfrac{d_k x^\top x}{x^\top \mathcal{P}_{k+1} x} & \dfrac{d_k x^\top x}{x^\top \mathcal{P}_{k+1} x} \end{bmatrix}^\top, & \text{if } \widetilde{P}_{k+1}(x) > a_k x^\top x,\ \widetilde{P}_{k+1}(x) > d_k x^\top x, \\[8pt]
\begin{bmatrix} 1 & 0 \end{bmatrix}^\top, & \text{otherwise}.
\end{cases}
\tag{4.105}
\]
The approximate saddle-point value parameter at time $k$ is given by:
\[
P^1_k =
\begin{cases}
G^1_k + \check{W}_k(\underline{\eta}_k)^\top P^1_{k+1}\check{W}_k(\underline{\eta}_k) - W^*_k(\underline{\eta}_k)^\top N_k W^*_k(\underline{\eta}_k) - a_k \mathrm{I}_n + \mathrm{I}_n\,\underline{\eta}_k\sqrt{a_k d_k}, & \text{if } \widetilde{P}_{k+1}(x) > a_k x^\top x,\ \widetilde{P}_{k+1}(x) > d_k x^\top x, \\[4pt]
G^1_k + \check{B}_k(\overline{\eta}_k)^\top P^0_{k+1}\check{B}_k(\overline{\eta}_k) + d_k \mathrm{I}_n, & \text{if } \widetilde{P}_{k+1}(x) \le a_k x^\top x,\ \widetilde{P}_{k+1}(x) > d_k x^\top x, \\[4pt]
G^1_k - W^*_k(0)^\top N_k W^*_k(0) + \check{W}_k(0)^\top P^1_{k+1}\check{W}_k(0), & \text{otherwise}.
\end{cases}
\tag{4.106}
\]
The recursions (4.103) and (4.106) hold provided
\[
B_k^\top P^0_{k+1} B_k + M_k \succ 0, \qquad H_k^\top P^1_{k+1} H_k - N_k \prec 0. \tag{4.107}
\]
The terminal conditions for the recursions (4.103) and (4.106) are:
\[
P^0_{L+1} := G^0_{L+1}, \qquad P^1_{L+1} := G^1_{L+1}. \qquad \square
\]
Proof: [Outline] Similar to the proofs in the prior sections, we begin by determining the NE takeover in pure and mixed strategies for the FlipDyn state $\alpha = 0$. We substitute the quadratic costs (4.25), the linear dynamics (4.74), and the linear control gains (4.98) and (4.99) in the term $\widetilde{P}_{k+1}(x)$, with the approximate saddle-point value parameters $P^0_{k+1}$ and $P^1_{k+1}$ from (4.53), to obtain:
\[
\widetilde{P}_{k+1}(x) := V^1_{k+1}\big(\check{W}_k(\underline{\eta}_k)x\big) - V^0_{k+1}\big(\check{B}_k(\overline{\eta}_k)x\big)
= x^\top\big( \check{W}_k(\underline{\eta}_k)^\top P^1_{k+1}\check{W}_k(\underline{\eta}_k) - \check{B}_k(\overline{\eta}_k)^\top P^0_{k+1}\check{B}_k(\overline{\eta}_k)\big)x
= x^\top \mathcal{P}_{k+1}\, x.
\]
We substitute the takeover cost (4.37) and $x^\top \mathcal{P}_{k+1} x$ in (4.14) and (4.15) to obtain the NE takeover policies in (4.101) and (4.102), respectively. The approximate NE takeover strategies for the FlipDyn state $\alpha = 1$ are complementary to those for $\alpha = 0$, and are presented in (4.104) and (4.105). To determine the approximate saddle-point value parameters under a mixed strategy NE takeover of the FlipDyn state $\alpha = 0$, we substitute the upper bound (4.96) from Lemma 4.5.6 in place of the parameter $P^0_{k+1}$. Under a pure strategy NE takeover, we substitute the quadratic costs (4.25), the discrete-time linear dynamics (4.74), and the adversary linear state-feedback control (4.99) to obtain the exact saddle-point value parameters. Combining the solutions from the mixed and pure strategy NE takeovers, we obtain (4.103). $\square$

Recursions (4.103) and (4.106) provide an approximate solution to the FlipDyn-Con problem (4.7) for the $n$-dimensional case, with corresponding takeover and control policies. Similar to the range of the parameter $\eta_k$ presented in Lemma 4.5.6, the parameters $\underline{\eta}_k$ and $\overline{\eta}_k$ under a mixed strategy NE takeover can be bounded using condition (4.59), as indicated in the following remark.

Remark 4.5.7 The permissible range of the parameters $\underline{\eta}_k$ and $\overline{\eta}_k$ satisfying condition (4.93), corresponding to a mixed strategy NE, is given by:
\[
0 < \underline{\eta}_k \le \eta_k \le \overline{\eta}_k < \sqrt{\frac{\min\{d_k, a_k\}}{\max\{d_k, a_k\}}} < 1. \tag{4.108}
\]
Remark 4.5.7 is a direct consequence of Lemma 4.5.6. Similar to the scalar/1-dimensional case, not all control costs (4.25) satisfy the approximate saddle-point recursion. The following remark provides the minimum adversarial control cost required to satisfy the recursions (4.103) and (4.106).

Remark 4.5.8 Given an $n$-dimensional system (4.74) with quadratic costs (4.28), the NE takeover strategies and the recursion for the approximate saddle-point value parameter, as outlined in Corollary 5, exist for adversary control costs $N^*_k \prec N_k$ provided
\[
-N^*_k + H_k^\top P^1_{k+1} H_k \prec 0, \qquad \forall k \in \mathcal{K}.
\]
Analogous to the scalar/1-dimensional system, the parameter $N^*_k$ can be found using a bisection method at every stage $k \in \mathcal{K}$. A candidate initial value for all $k \in \mathcal{K}$ can be set to $N_L := \nu \mathrm{I}_N$, with $\nu \in \mathbb{R}_{>0}$ such that $\nu \mathrm{I}_N \succ H_L^\top P^1_{L+1} H_L$. Similarly, we can also determine a minimum adversarial state cost $G^{1*}_k$ that guarantees a mixed strategy NE takeover at every time $k$ for the $n$-dimensional system. The following remark summarizes such an adversarial cost.

Remark 4.5.9 Given an $n$-dimensional system (4.74) with quadratic costs (4.28), the mixed strategy NE takeover and the corresponding recursion for the approximate saddle-point value parameter, as outlined in Corollary 5, exist for an adversary state-dependent cost $G^{1*}_k \preceq G^1_k$ provided
\[
\mathcal{P}_{k+1} \succ d_k \mathrm{I}_n, \qquad \mathcal{P}_{k+1} \succ a_k \mathrm{I}_n, \qquad \forall k \in \mathcal{K},
\]
with the parameters at time $L+1$ given by:
\[
P^1_{L+1} := G^{1*}_{L+1}, \qquad P^0_{L+1} := G^0_{L+1}. \tag{4.109}
\]
As in the scalar/1-dimensional case, we can determine $G^{1*}_k$ using a bisection method, and we can simultaneously determine $G^{1*}_k$ and $N^*_k$ using a double bisection method. Next, we illustrate the results of the approximate value function on a numerical example.

Figure 4.8 Maximum eigenvalues $\lambda_1(P^\alpha_k)$ of the saddle-point value parameters $P^\alpha_k$, $k \in \{0, 1, \ldots, L+1\}$, $\alpha \in \{0, 1\}$, for state transition constant (a) $e = 0.85$, (b) $e = 1.0$. The parameters $P^i_{k,\text{M-NE}}$ correspond to the saddle-point value parameter recursion under a mixed NE takeover over the entire time horizon.

Figure 4.9 Defender takeover strategy $\beta_k$ and adversary takeover strategy $\gamma_k$ for state transition constant (a) $e = 0.85$ and (b) $e = 1.0$. M-NE corresponds to the mixed NE policy.

A Numerical Example
We now evaluate the results of the approximate NE takeover and the corresponding saddle-point value parameters presented in Corollary 5 on a discrete-time, two-dimensional, linear time-invariant (LTI) system for a horizon length of $L = 20$. The quadratic costs (4.25) are assumed to be fixed for all $k \in \mathcal{K}$ and are given by:
\[
G^0_k = G^0 = \mathrm{I}_n, \quad G^1_k = G^1 = 1.35\,\mathrm{I}_n, \quad D_k = D = 0.45\,\mathrm{I}_n, \quad A_k = A = 0.25\,\mathrm{I}_n, \quad M_k = M = 0.65.
\]
The system transition matrix $E_k = E$ and the control matrices of the defender and adversary are given by:
\[
E_k = E = \begin{bmatrix} e & \Delta t \\ 0 & e \end{bmatrix}, \qquad B_k = H_k = \begin{bmatrix} \Delta t \\ 0 \end{bmatrix}, \qquad \forall k \in \{0, 1, \ldots, L\},
\]
where $\Delta t = 0.1$ for the numerical example.
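Before turning to the results, the following minimal Python sketch shows how parameterized gains of the form (4.98) and (4.99) and the definiteness conditions (4.107) could be evaluated for the two-dimensional example above. The adversary control cost N, the choice of the η values, and the terminal value parameters are illustrative assumptions; the snippet is a sketch rather than the procedure used to generate the reported numbers.

```python
import numpy as np

dt, e = 0.1, 0.85                          # example values from the text
E = np.array([[e, dt], [0.0, e]])          # state transition matrix
B = H = np.array([[dt], [0.0]])            # defender/adversary input matrices (as above)
M = 0.65 * np.eye(1)                       # defender control cost
N = 3.0 * np.eye(1)                        # adversary control cost (illustrative)

def gains(P0_next, P1_next, eta_bar, eta_under):
    """Parameterized state-feedback gains in the spirit of (4.98)-(4.99)."""
    theta = lambda c: 1.0 - c ** 2
    K = -np.linalg.solve(theta(eta_bar) * B.T @ P0_next @ B + M,
                         theta(eta_bar) * B.T @ P0_next @ E)
    W = -np.linalg.solve(theta(eta_under) * H.T @ P1_next @ H - N,
                         theta(eta_under) * H.T @ P1_next @ E)
    return K, W

def conditions_hold(P0_next, P1_next):
    """Second-order conditions (4.107): B'P0B + M > 0 and H'P1H - N < 0."""
    pos = np.all(np.linalg.eigvalsh(B.T @ P0_next @ B + M) > 0)
    neg = np.all(np.linalg.eigvalsh(H.T @ P1_next @ H - N) < 0)
    return bool(pos and neg)

# Terminal choices taken from the example costs: P^0_{L+1} = G^0, P^1_{L+1} = G^1.
P0_next, P1_next = np.eye(2), 1.35 * np.eye(2)
K, W = gains(P0_next, P1_next, eta_bar=0.5, eta_under=0.4)
print(conditions_hold(P0_next, P1_next), K, W)
```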
Similar to the scalar/1-dimensional case, we solve for the approximate NE takeover strategies and saddle-point value function parameters for two cases of a fixed state transition constant $e_k = e$, $\forall k \in \mathcal{K}$: $e = 0.85$ and $e = 1.0$. Since the saddle-point value parameters in $n$ dimensions are symmetric positive definite matrices, we plot the maximum eigenvalues of the value function matrices $P^1_k$ and $P^0_k$, shown in Figures 4.8a and 4.8b, with M-NE indicating a mixed strategy NE takeover over the entire horizon $L$, achieved through $N^*$ and $G^{1*}$. We obtain the adversary control costs $N^*_k$, $\forall k \in \mathcal{K}$, for the case of $e = 0.85$ as:
\[
N^*_k = N^* =
\begin{cases}
0.42, & \text{if } \widetilde{P}_{k+1}(x) \ge x^\top a_k x,\ \widetilde{P}_{k+1}(x) \ge x^\top d_k x, \\
0.45, & \text{otherwise},
\end{cases}
\]
and for the case of $e = 1.0$ as:
\[
N^*_k = N^* =
\begin{cases}
3.73, & \text{if } \widetilde{P}_{k+1}(x) \ge x^\top a_k x,\ \widetilde{P}_{k+1}(x) \ge x^\top d_k x, \\
3.40, & \text{otherwise}.
\end{cases}
\]
Similarly, we determine the minimum adversary cost $G^{1*}_k$ for each case of $e$, corresponding to a mixed strategy NE takeover over the entire time horizon $L$, given by:
\[
G^{1*}_k = G^{1*} =
\begin{cases}
1.67\,\mathrm{I}_n, & \text{when } e = 0.85, \\
1.48\,\mathrm{I}_n, & \text{when } e = 1.00.
\end{cases}
\]
Similar to the scalar/1-dimensional case, we observe that the eigenvalues of the saddle-point value parameters are significantly lower when $e = 0.85$ than when $e = 1.0$. This corresponds to a lower incentive for a takeover when the system is open-loop stable ($e < 1$) as opposed to the unstable condition $e \ge 1$. However, the value function parameter $P^0_k$ always reaches a steady-state value for either value of $e$, implying that the system remains stable under the defender's control. For the $n$-dimensional case, the takeover policy is a function of the state $x$. Therefore, we simulate the system for a total of 100 iterations with the initial state $x_0 = [1 \;\; 0]^\top$ and show the average takeover policies in Figures 4.9a and 4.9b. When a mixed NE takeover (M-NE) is played, we observe for both $e = 0.85$ and $e = 1.0$, given $\alpha = 0$ (i.e., when the defender is in control), that the probability of takeover for the defender (resp. adversary) increases (resp. decreases) backward in time. This takeover policy indicates that the defender retains control of the system while the adversary remains idle. When play is between pure and mixed NE takeovers, we observe that for both $e = 0.85$ and $e = 1.0$, given $\alpha = 0$, both players alternate between pure and mixed NE over the horizon. This numerical example illustrates the use of the approximate saddle-point value parameters in determining the takeover strategies for each player. Additionally, it provides insight into the system's behavior for the given costs and into the system's stability properties. These insights are useful when designing the costs, which in turn impact the control and takeover policies.

4.6 Summary
This chapter introduced FlipDyn-Con, a finite-horizon, zero-sum game of resource takeovers involving a discrete-time dynamical system. Our contributions are distilled into four key facets. First, we presented analytical expressions for the saddle-point value of the FlipDyn-Con game, alongside the corresponding NE takeover in both pure and mixed strategies.
Second, we derived optimal control policies for linear dynamical systems characterized by quadratic costs, and we provided sufficient conditions under which there is a saddle point in the space of linear state-feedback policies. Third, for scalar/1-dimensional dynamical systems with quadratic costs, we derived exact saddle-point value parameters and NE takeover strategies that are independent of the state of the dynamical system. Finally, for higher-dimensional dynamical systems with quadratic costs, we provided approximate NE takeover strategies and control policies. Our approach enables computation for general linear systems, broadening its applicability. The practical implications of our findings were showcased through a numerical study involving the control of a linear dynamical system in the presence of an adversary. The results on the NE takeover strategies with known control policies were demonstrated in [17]. The results containing both the control policies and the takeover strategies are under review in [21].

CHAPTER 5
DATA-DRIVEN ADVERSARIAL MODEL

All of the prior chapters introduced various types of adversarial models and formulated corresponding decision-making frameworks to reason about defensive strategies given the underlying costs and the model of the system. These frameworks are computationally efficient and scalable over long time horizons. However, when the underlying model of the system or the costs are unknown, such frameworks are not suitable. This necessitates a framework that can reason using the data available from the underlying system.

5.1 Introduction
In this chapter, we introduce a novel data-driven, domain-aware, optimization-based approach to determine an effective defense strategy for CPS in an automated fashion – by emulating a strategic adversary in the loop that exploits system vulnerabilities, the interconnections of the CPS, and the dynamics of the physical components. Our approach builds on an adversarial decision-making model based on a Markov Decision Process (MDP) that determines the optimal cyber (discrete) and physical (continuous) attack actions over a CPS attack graph. The defense planning problem is modeled as a non-zero-sum game between the adversary and the defender. We use a model-free reinforcement learning method to solve the adversary's problem as a function of the defense strategy. We then employ Bayesian optimization (BO) to find an approximate best response for the defender to harden the network against the resulting adversary policy. This process is iterated multiple times to improve the strategies of both players.

A majority of the world's critical infrastructure depends on Cyber-Physical Systems (CPS) to manage essential and complex, domain-specific operational processes. Historically, CPS operational risk could be attributed to human operator errors, natural disasters, and acts of physical sabotage. However, with the rapid integration of physical and cyber-security processes and the increased reliance on internet-based networks, CPS are now vulnerable to sophisticated cyber attacks that can result in significant equipment damage, service disruptions, and potential loss of life.
These attacks vary in severity and application; well-known examples include the Stuxnet attack [80] on supervisory control and data acquisition (SCADA) systems, the German steel mill attack [83] caused by advanced persistent threats (APTs), the Ukrainian grid attack [120] via denial-of-service (DoS) tactics, and the derailment of trams [85] using basic network access methods. In each instance, strategic threat actors used a sequence of atomic attack actions to exploit known vulnerabilities in both the cyber and physical layers of the system. The MITRE ATT&CK framework is a continuously growing database of such atomic actions corresponding to specific goals on different platforms, primarily used to characterize post-compromise adversarial behavior in cybersecurity and in Industrial Control Systems (ICS) [6].

This chapter proposes a general framework for modeling and uncovering an adversary's movements using a hybrid attack graph (HAG) and relating the security status of the cyber layer to that of the physical layer, while effectively configuring the HAG to ensure resilient operation of the CPS. The proposed framework has two components: (a) an adversary's model and policy, and (b) a defender's network hardening policy. The adversary's movement is modeled using a Markov Decision Process (MDP) on the HAG, while its policy is determined using an ML method. The defender evaluates the security of the CPS using partial observations of the HAG. The security of the CPS is quantified by the adversary's movements and the disruption of some measurable services of the physical processes. The defender uses partial observations to reason about the security of the CPS and to balance reconfiguring the HAG via network hardening against the corresponding costs. This chapter extends the linear parameterized ML method introduced in our preliminary work [28] with a defender that uses Bayesian optimization to achieve successful network hardening. The proposed framework can be applied to a wide range of CPS and enhances the security of the system by preventing attacks and ensuring resilient operation.

There is a large body of work on securing CPS from an attack-prevention perspective in the cyber layer, categorized broadly into (a) resilience-by-design and (b) resilience-by-reaction [36]. To position our work in the literature, we organize the related work into the categories below.

Control-Theoretic Methods: The utilization of control theory for securing Cyber-Physical Systems (CPS) has received substantial attention in the literature. For instance, Miehling et al. [103] propose a sampling-based worst-case design approach to overcome observation challenges and develop corresponding policies. Similarly, the work by Nguyen et al. [106] introduces a system identification and control-theoretic framework to ensure safety-critical operations in CPS. Comprehensive surveys of control-theoretic methods for securing CPS are presented by Dibaji et al. [46] and Lun et al. [154]. Recently, Miehling et al. [101] developed a model that explicitly links the security status between the cyber and physical layers to design an intrusion response system. However, all of these approaches require knowledge of the system model at the cyber level, the physical level, or both, making them challenging to apply in scenarios where the system model is unknown.
Attack-graphs: Attack graphs are commonly used to model the movement of adversaries in a cyber environment, allowing for the quantification of attack-path vulnerabilities using the Common Vulnerability Scoring System (CVSS) [146]. Bayesian attack graphs are used to determine cyber attack scenarios on supervisory control and data acquisition (SCADA) and energy management systems (EMS) of wind farms [159]. Petri nets, with their increased flexibility and resolution compared to attack graphs, have been a long-standing tool for a range of applications, including modeling cyber attacks [98, 38].

Adversarial identification frameworks: The MITRE Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) [1] framework provides a knowledge database to characterize post-compromise detection of an adversary targeting a given platform. MITRE ATT&CK has recently been extended to Industrial Control Systems (ICS) [6]. Using the MITRE ATT&CK framework, a cyber kill chain (CKC) has been developed and evaluated to determine the resiliency of Distributed Energy Resources (DER) [3, 111]. Similar models characterizing the security attributes of a CPS are presented by Bakirtzis et al. [13] using a post-compromise database such as MITRE ATT&CK.

Attack Detection via Machine Learning: ML methods have shown significant success in enhancing the security of CPS in various applications [148]. Some of these methods target attack detection, for example to detect false data injection attacks [99]. To capture the temporal and spatial structure of an anomaly, convolutional and memory-based encoder-decoder models are employed [95]. A comprehensive list of ML-based attack detectors is provided in the survey by Olowononi et al. [109]. However, these methods are only used for detecting attacks and lack a defense mechanism to counteract an attack on the system.

Defense Mechanisms via Reinforcement Learning: Reinforcement learning (RL), a subfield of ML, has been used to develop a variety of defense mechanisms in CPS [86, 107]. For instance, RL has been used to develop anti-jamming [39] and anti-spoofing policies, such as the use of dynamic-threshold hypothesis testing for authentic user verification in [90, 149]. Moreover, RL methods have also been used to identify vulnerabilities in smart grid CPS [152, 40]. However, the described RL methods assume a fixed policy for the CPS or the adversary, and do not account for any deviation while identifying system vulnerabilities or developing a defense mechanism.

Game-Theoretic Methods and Network Hardening: Game-theoretic formulations in conjunction with Reinforcement Learning (RL) have been employed, as seen in studies such as [108], where an adversary-defender zero-sum dynamic game is formulated to determine optimal actions for damaging (resp. protecting) transmission lines in smart grids. Two-player games have been utilized to model the security policies of Cyber-Physical Systems (CPS) in vehicular ad-hoc networks (VANETs) [94], addressing vulnerabilities to jamming attacks. Game theory has found application in modeling preemptive defender measures, such as anti-virus software or honeypot mechanisms [49], designed to secure IT systems before granting access to potential users.
Furthermore, game theory has been instrumental in analyzing advanced persistent threats [123, 127, 125, 126], where a defender can resort to Dynamic Information Flow Tracking (DIFT) – a mechanism developed to dynamically track the usage of information flows during program execution [137].

In addition to game theory, network hardening techniques have been employed to secure Cyber-Physical Systems (CPS). However, the problem of network hardening has been shown to be NP-hard [131], necessitating the use of heuristic solutions. Identifying system vulnerabilities along with attack-graph-based hardening is proposed by Saha et al. [124], offering efficient algorithms with provable guarantees and exploring trade-offs between the hardening cost and the damage inflicted on the system. In this chapter, we present a novel approach to securing CPS through a non-zero-sum game between an adversary and a defender. The adversary's policy is dynamic in nature and is determined using a Reinforcement Learning (RL) agent, while the defender's policy is static and sequentially hardens the network. For a principled approach to updating the defender's actions, we resort to Bayesian optimization methods [136].

Blackbox Optimization: Blackbox (particularly, Bayesian) optimization has its roots in early methods such as Taguchi techniques [58, 11]. Techniques for blackbox optimization can be classified into two categories: deterministic [27, 25, 26] and stochastic. Among stochastic approaches to blackbox optimization, a popular approach is based on the assumption that the unknown function can be represented as a Gaussian process [136]. Recent research has applied Bayesian optimization to compute approximate Nash equilibria of general-sum games with continuous action spaces [117, 4] or of potential games [10].

This chapter presents a framework to design network hardening strategies for CPS by integrating a learning-based adversarial attack modeling approach [28] with the defense planning process. The contributions of this work are three-fold.

1. A game-theoretic formulation under information asymmetry and partial system observability: This work presents a game-theoretic formulation for a CPS using an HAG model to capture the probabilistic transitions of the adversary. We assume that the defender does not have direct access to the adversary's actions (policy) and rewards during an attack, and operates solely on a belief of the cyber-layer security status and some measurable attributes in the physical layer (e.g., temperature measurements in smart buildings). The interaction between the adversary and the defender is modeled as a non-zero-sum game, where the goal is to find a defense strategy based solely on appropriately modeled reward/cost functions. By formulating the adversary's and defender's problems as an MDP [28] with cyber (discrete) and physical (continuous) states, the defender's actions correspond to hardening the network, i.e., to shaping the success probabilities of the cyber exploits. The solution concept that we seek is that of a Nash equilibrium, i.e., a pair of policies from which neither player has any incentive to deviate.

2. Data-driven adversarial network hardening: Our work starts by demonstrating that the network hardening problem is equivalent to designing a slowly absorbing Markov chain that represents the progression of an attack in any CPS.
Such a slowly absorbing Markov chain design is cast as a constrained optimization problem, which is non-convex, and hence a global solution is not guaranteed using standard optimization methods. To address this, we propose a data-driven approach to compute a best response for each player iteratively, and then find an approximate NE using the best iterated response. Given a security policy of the defender, we adapt an Actor-Critic algorithm – an RL method – to solve the adversary's problem and extract the corresponding policy. To solve for the defender's best response, Bayesian optimization (BO) is used, given the adversary's policy. Neither the Actor-Critic method nor BO requires explicit knowledge of the underlying dynamics of the physical or cyber processes, making them attractive for joint attack and defense planning (referred to as purple teaming) of any complex CPS.

3. Evaluation on a smart building case study: We evaluate our proposed approach on a smart building system, where the dynamics of the physical process were obtained from a highly accurate truncated model based on real-world measurements. The cyber layer of the CPS is modeled as a truncated version of a ransomware graph, created using an information flow graph [126]. The simulation results demonstrate the effectiveness of our approach in hardening the network, while also characterizing a trade-off between hardening costs and the security status of the CPS. Furthermore, we observe that the adversary and defender objectives exhibit diminishing marginal improvement with an increasing number of iterations of our approach, suggesting proximity to an approximate NE of the game.

Outline: The chapter is organized as follows. The model formulation of the HAG, describing the dynamics of the cyber (discrete) and physical (continuous) components and their interactions, is presented in Section 5.2. Solution approaches for the adversary and defender problems are described in Section 5.3. Numerical experiments, including descriptions of the cyber layer, physical layer, and defense layer, together with the results of the proposed approaches in a smart building case study, are presented in Section 5.4. Finally, we conclude this chapter in Section 5.5.

5.2 Model Formulation
In this section, we present our adversarial threat model, which characterizes the cross-layer coupling between the cyber and physical vulnerabilities in a CPS using an HAG [93, 60, 67, 79, 102, 28, 8]. An HAG is a directed acyclic graph whose nodes represent exploitable security attributes and physical processes, and whose edges represent adversarial exploits (or actions). The leaf nodes represent exploitable cyber attributes that serve as attack entry points (e.g., a malware download onto a local workstation), while the root nodes denote an adversary's target set of physical-layer attributes (e.g., energy consumption, thermal comfort, traffic-lane assist). An HAG models the space of all possible attack paths available to a strategic adversary aiming to compromise cyber and physical components. Figure 5.1 illustrates a representative HAG used in [28] to model cross-layer sensor-deception attacks in buildings.

Figure 5.1 A hybrid attack graph for a single-zone building with four cyber nodes (in red) and one physical node (in blue) [28]. An adversary infiltrates the leaf node (node 1) and progressively secures additional security attributes (nodes 2-4) before attacking the zone temperature controller by perturbing sensor measurements at the root node 5.
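To fix ideas, the following is a minimal sketch of how an HAG such as the one in Figure 5.1 could be encoded in Python. Only the node roles follow the figure; the specific edge set and the nominal success probabilities w_e are assumptions made purely for illustration.

```python
# Nodes of the HAG in Figure 5.1: node 1 is the attack entry point (leaf),
# nodes 2-4 are intermediate cyber attributes, and node 5 is the physical root.
hag_nodes = {
    1: "leaf (attack entry point)",
    2: "cyber attribute",
    3: "cyber attribute",
    4: "cyber attribute",
    5: "root (zone temperature controller)",
}

# Directed edges (cyber exploits) with assumed nominal success probabilities w_e.
hag_edges = {
    (1, 2): 0.7,
    (2, 3): 0.6,
    (3, 4): 0.5,
    (4, 5): 0.4,
}

def successors(node):
    """Nodes reachable from `node` through a single exploit."""
    return [head for (tail, head) in hag_edges if tail == node]

print(successors(1))   # -> [2]
```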
The success probability of each cyber exploit along an edge of the HAG depends on the defense configuration. For example, the probability of detecting adversarial activity (equivalent to an unsuccessful attack action) is a function of the number of honeypots installed in the network [64]. The cyber exploits are represented using techniques from the MITRE ATT&CK framework for ICS, as in previous works such as [6, 41]. The authors in [41] developed an automated attack sequence generator represented as a hidden Markov model (HMM) using the same framework, with transition probabilities between tactics (nodes) and emission probabilities from tactics to techniques. In our work, we use similar representations; i.e., the nodes can be represented as equivalent tactics and the exploitable edges as techniques. Once a root node is breached, every attack action in the physical system is assumed to be successful with probability 1, and the adversary earns a corresponding reward. The adversary's objective is to progressively learn the best attack path(s) in the HAG to reach the target root node and maximize the cumulative reward earned over a finite attack horizon. This learning problem is posed as a Markov Decision Process (MDP). On the other hand, the defender's objective is to preemptively minimize any costs incurred due to the adversary compromising physical attributes at the root node(s), such as the disruption of physical processes, together with the cost of network hardening. This is achieved by selecting the success probabilities of the cyber exploits appropriately. Next, we present the modeling assumptions in our problem setup.

5.2.1 Modeling Assumptions
Assumption 5.2.1 The adversary has full knowledge of the HAG topology but has limited (no) knowledge of the success probabilities (set by the defender) at the onset of an attack.

Assumption 5.2.2 The defender has complete knowledge of the cyber exploits (edges in the HAG) and can allocate resources to harden the cyber network, but not the physical layer.

Assumption 5.2.3 The defender cannot observe the adversary's sequence of actions and rewards while the system is under attack.¹

Assumption 5.2.4 The HAG exhibits the well-known monotonicity property, which states that an adversary never willingly relinquishes attributes once obtained [102]. This simplifies our analysis by avoiding attack paths with self-loops.

¹Under a full-information scenario between the adversary and the defender, the defender's cost and the adversary's net reward would be interchangeable.

In what follows, $\mathcal{G} = (\mathcal{N}, \mathcal{E})$ denotes an HAG, where $\mathcal{N}$ and $\mathcal{E}$ are the sets of nodes and edges in $\mathcal{G}$, respectively. For notational clarity, we assume that $\mathcal{G}$ has only one root node; however, this assumption can be relaxed. Next, we discuss the preliminaries for the adversary's MDP model.

5.2.2 Preliminaries
States, Actions and Rewards
We define $\Phi$ as the set of attack success probabilities over all edges of $\mathcal{G}$. The success probability of a cyber exploit $e \in \mathcal{E}$, conditioned on the adversary using $e$, is denoted by $\Phi_e \in \Phi$ and is given by $\Phi_e \triangleq \alpha_e w_e$, where $\alpha_e \in [\underline{\alpha}, 1]$ is chosen by the defender, $\underline{\alpha} \in (0, 1)$ is a positive lower bound on $\alpha_e$, and $w_e$ is a default (nominal) value. The defender can adjust $\alpha_e$ to control the success probability of $e$; as $\alpha_e$ increases, so does the success probability of $e$.
Note that, ๐›ผ > 0 ensures that an exploit ๐‘’ is not made redundant by assigning a zero success probability. Let ๐œถ = (๐›ผ๐‘’ : ๐‘’ โˆˆ E) be the tuple of all defender-assigned weights in G; henceforth, we will refer to ๐œถ as the defenderโ€™s policy. Note that ๐œถ is set prior to the onset of an attack and is constant over the attack horizon. Hardening an exploitable edge corresponds to improving defense mechanisms over the techniques (MITRE ATT&CK for ICS) used by the adversary. For instance, a cyber node such as impair process control (a tactic) can be hardened over exploitable techniques such as alarm suppression, denial of service, and others that require corresponding costs. Let T = {1, 2, . . . , ๐‘‡ } be a finite attack horizon. The security state of the CPS at time ๐‘ก is denoted by a hybrid state variable ๐‘ ๐‘ก = (๐›พ๐‘ก, ๐‘ฅ๐‘ก), where (a) ๐›พ๐‘ก โˆˆ {0, 1}|N | is the discrete security state describing the current state of compromise of each node (1 means node is compromised and 0 means otherwise), and (b) ๐‘ฅ๐‘ก โˆˆ R๐‘š is the continuous state of the physical process at the root node. The set of available attack actions in the cyber and physical layers at time ๐‘ก is denoted by A (๐‘ ๐‘ก). Let ฮฅ be the total number of root nodes in G, and ๐›พ๐‘ก = ๐›พroot,๐‘–, for any ๐‘– โˆˆ {1, 2, . . . , ฮฅ} represent the breach of the ๐‘–th physical node. 155 Let ๐‘Ž(๐‘ ๐‘ก) โˆˆ A (๐‘ ๐‘ก) denote an attack action taken in state ๐‘ ๐‘ก for a given defense policy ๐œถ. Then, we denote the adversaryโ€™s instantaneous net reward at time ๐‘ก by ๐‘Ÿ (๐‘ ๐‘ก, ๐‘Ž(๐‘ ๐‘ก), ๐œถ) โˆˆ R. Note that the net reward includes the cost incurred to launch an exploit, irrespective of whether it is successful or not. CPS State Transitions Suppose a non-root node ๐‘› is compromised at time ๐‘ก, and there are E๐‘›,๐‘›โ€ฒ exploits available to compromise a neighboring node ๐‘›โ€ฒ. Assuming independence between different exploits, the proba- bility that ๐‘›โ€ฒ is compromised at time ๐‘ก +1 is given by 1โˆ’(cid:206)๐‘’โˆˆE๐‘›,๐‘›โ€ฒ (1โˆ’ฮฆ๐‘’). Such transitions represent various techniques from MITRE ATT&CK [5], and the graph nodes N represent equivalent tactics. For instance, an entry leaf node can be represented as an initial access (tactic), connected to lateral movement (another tactic) via cyber exploits (techniques), such as default credentials, I/O module discovery, and so on. Thus, the success probabilities ฮฆ (or equivalently the defender policy ๐œถ) influence the probabilistic evolution of the discrete state ๐›พ๐‘ก; this dependence is compactly expressed as: ๐›พ๐‘ก+1 = ๐‘”cyb(๐›พ๐‘ก, ๐‘Ž(๐‘ ๐‘ก), ๐œถ), (5.1) where ๐‘”cyb is an appropriate probability transition kernel. Moreover, the physical-process dynamics at the root node is represented using a state-space model of the form: ๐‘ฅ๐‘ก+1 = ๐‘”phy(๐‘ฅ๐‘ก, ๐‘ข๐‘ก, ๐‘ค๐‘ก, ๐‘Ž(๐‘ ๐‘ก)), ๐‘ฆ๐‘ก = ๐ป (๐‘ฅ๐‘ก, ๐‘ค๐‘ก, ๐‘Ž(๐‘ ๐‘ก), ๐œถ), (5.2) (5.3) where ๐‘”phy is the state transition function, ๐‘ฆ๐‘ก is the measurements, ๐ป is the measurement function, ๐‘ข๐‘ก is a suitably designed control, and ๐‘ค๐‘ก is the disturbance. Note that the attack term ๐‘Ž(๐‘ ๐‘ก) in (5.1) and (5.2) accounts for the attack impact on the root (physical) node, only after the root node is compromised. 
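As a minimal sketch of the discrete cyber transition in (5.1), the following Python fragment draws one step of the security state given the defender weights and nominal probabilities; the function and variable names, as well as the numerical values, are illustrative assumptions.

    import random

    def success_prob(alpha_e, w_e):
        # Phi_e = alpha_e * w_e, with alpha_e in [alpha_lb, 1] set by the defender.
        return alpha_e * w_e

    def compromise_prob(edges, alpha, w):
        # Probability that the targeted neighbor is compromised at t+1 when the
        # adversary uses every exploit in `edges`: 1 - prod_e (1 - Phi_e),
        # assuming independent exploits.
        p_fail = 1.0
        for e in edges:
            p_fail *= 1.0 - success_prob(alpha[e], w[e])
        return 1.0 - p_fail

    def cyber_step(gamma, target, edges, alpha, w, rng=random):
        # One transition of the discrete security state gamma (dict: node -> 0/1).
        gamma = dict(gamma)
        if rng.random() < compromise_prob(edges, alpha, w):
            gamma[target] = 1
        return gamma

    # Illustrative usage (placeholder values):
    alpha = {(1, 2): 1.0}
    w = {(1, 2): 0.8}
    gamma1 = cyber_step({1: 1, 2: 0}, target=2, edges=[(1, 2)], alpha=alpha, w=w)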
Combining (5.1) and (5.2), the security state ๐‘ ๐‘ก transition can be compactly denoted as 156 ๐‘ ๐‘ก+1 = ๐‘”(๐‘ ๐‘ก, ๐‘Ž(๐‘ ๐‘ก), ๐œถ), (5.4) where ๐‘” comprises ๐‘”cyb and ๐‘”phy. A detailed version of the HAG and its components are described in [28]. Next, we formally present the adversaryโ€™s MDP model. 5.2.3 Adversaryโ€™s Learning Problem Let ๐œ‹(๐‘ ๐‘ก) denote a stationary attack policy that assigns a probability to each action in the set A (๐‘ ๐‘ก) for a given state ๐‘ ๐‘ก and a defenderโ€™s policy ๐œถ. If ๐‘ ๐‘ก is the physical node, then ๐œ‹ is a distribution over a finite set of actions on the physical dynamics. Let ฮ  be the space of all feasible attack policies. Starting from an initial state ๐‘ 0 โˆˆ S and for a given defender policy ๐œถ, the adversary seeks a policy ๐œ‹โˆ— โˆˆ ฮ  that maximizes the objective function ๐ฝatt comprising the cumulative net reward over the attack horizon T , ๐ฝatt(๐‘ 0, ๐œ‹, ๐œถ) := E (cid:35) ๐‘Ÿ (๐‘ ๐‘ก, ๐œ‹, ๐œถ) , (cid:34) โˆ‘๏ธ ๐‘กโˆˆT ๐œ‹โˆ—(๐‘ 0, ๐œถ) โˆˆ arg max ๐œ‹โˆˆฮ  ๐ฝatt(๐‘ 0, ๐œ‹, ๐œถ), (5.5) (5.6) where the expectation is taken with respect to the transition kernel that defines the evolution in (5.4). 5.2.4 Defenderโ€™s Cyber Network Hardening Problem The defenderโ€™s objective is to minimize the combined impact of cyber attacks on the CPS and the cost of network hardening by choosing its actions ๐›ผ๐›ผ๐›ผ. Let ๐‘๐‘‘ (๐‘ , ๐œ‹(๐‘ ), ๐œถ) be the cost incurred by the defender under an attack policy ๐œ‹(.) for a given choice of defense action ๐›ผ๐›ผ๐›ผ. The cost may depend on the cyber states and/or physical layer attributes (discomfort or temperature fluctuations). Given a tuple of non-negative weights ๐›ผ๐›ผ๐›ผ, the network hardening cost is computed as โ„Ž(๐›ผ๐›ผ๐›ผ) = ๐‘‘๐‘’ โˆ‘๏ธ ๐‘’โˆˆE (cid:18) 1 โˆ’ ๐›ผ๐‘’ ๐›ผ๐‘’ (cid:19) , (5.7) where ๐‘‘๐‘’ is a hardening cost factor, which will be studied in Section 5.4. If a cyber exploit ๐‘’ is not hardened, then the corresponding cost is zero, i.e., ๐›ผ๐‘’ = 1. 157 We seek to minimize the defenderโ€™s objective ๐ฝdef over the attack horizon T , given any initial state ๐‘ 0 โˆˆ S and an attack policy ๐œ‹ โˆˆ ฮ . The objective function is defined as follows: (cid:34) (cid:35) ๐ฝdef(๐‘ 0, ๐œ‹, ๐œถ) := E โˆ‘๏ธ ๐‘๐‘‘ (๐‘ ๐‘ก, ๐œ‹(๐‘ ๐‘ก), ๐œถ) ๐‘กโˆˆT ๐›ผ๐›ผ๐›ผโˆ—(๐‘ 0, ๐œ‹) โˆˆ arg min ๐›ผ๐›ผ๐›ผโˆˆ[๐›ผ,1] | E | ๐ฝdef(๐‘ 0, ๐œ‹, ๐œถ), + โ„Ž(๐›ผ๐›ผ๐›ผ), (5.8) (5.9) where the expectation is taken with respect to the transition kernel in (5.4). Using (5.4), (5.5) and (5.8) we define a non-zero-sum stochastic game being played between the defender and the adversary. The desired solution concept is that of an open-loop Nash equilibrium [23], , where we find a pair of attack-defense policies {๐œ‹โˆ—, ๐›ผ๐›ผ๐›ผโˆ—} that are best-responses to each other, i.e, for which (5.6) and (5.9) hold simultaneously, given any ๐‘ 0. We identify sufficient conditions such as the stochastic game being zero-sum or having a specific structure (such as additive rewards for one player while the transitions are controlled by the other [70]) that guarantee the existence of Nash equilibrium policies. In particular, we adopt an iterative approach to find the best response of one player by fixing the policy of the other. We formally characterize technical conditions on the cost functions that ensures our proposed approach converges to a Nash equilibrium in a zero-sum and non zero-sum settings. 
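To make the two objectives concrete, the following is a minimal Python sketch of the hardening cost h in (5.7) and of Monte Carlo estimates of the objectives in (5.5) and (5.8); the routine simulate_episode, which stands in for a rollout of the dynamics (5.4) over the horizon T, is an assumed interface and not part of the chapter's formulation.

    def hardening_cost(alpha, d):
        # h(alpha) = sum_e d_e * (1 - alpha_e) / alpha_e; zero when alpha_e = 1.
        return sum(d[e] * (1.0 - alpha[e]) / alpha[e] for e in alpha)

    def estimate_objectives(simulate_episode, pi, alpha, d, n_rollouts=100):
        # simulate_episode(pi, alpha) is assumed to roll out (5.4) for T steps and
        # return (sum_t r(s_t, pi, alpha), sum_t c_d(s_t, pi(s_t), alpha)).
        att_total, def_total = 0.0, 0.0
        for _ in range(n_rollouts):
            r_sum, c_sum = simulate_episode(pi, alpha)
            att_total += r_sum
            def_total += c_sum
        j_att = att_total / n_rollouts                              # estimate of (5.5)
        j_def = def_total / n_rollouts + hardening_cost(alpha, d)   # estimate of (5.8)
        return j_att, j_def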
The defender's and adversary's objectives are interdependent through each other's policy, creating a circular dependency that prevents either problem from being solved in isolation. In a non-zero-sum game, the defender's and adversary's objectives should be evaluated and optimized simultaneously. Since simultaneously solving non-zero-sum games is challenging, we propose an iterative approach to tackle the joint problem: we first fix the policy of one player (e.g., the defender), solve for an optimal attack policy, and then optimize over the defender's policies. We numerically investigate the convergence of this approach on a CPS example in Section 5.4.

5.2.5 Computational Challenges

We elaborate on the major challenges in solving both the adversary's and the defender's problems. The adversary's problem focuses on solving the MDP (5.5). Traditional dynamic programming algorithms, such as value iteration and policy iteration [139], are infeasible for solving the optimality equation in each state due to the uncountable hybrid state space S. Moreover, these methods assume perfect knowledge of the system and transition probabilities. However, an adversary usually has limited knowledge of the dynamics in (5.2) and of the attack success probabilities. Similarly, the defender's objective is to solve Equation (5.9) using the HAG and the adversary's policy. However, the defender also lacks explicit knowledge of the dynamics in the HAG and of the adversary's policy. This motivates the need for an automated purple teaming process, wherein both players solve their respective problems sequentially, until an equilibrium is reached or a specified number of iterations has been completed. In the next section, we discuss how an actor critic (AC) RL algorithm is used to approximately solve the adversary's problem (5.5), as also described in our recent work [28]. For the defender's problem, we propose the use of Bayesian optimization to efficiently explore the defender's search space and identify a potential solution.

5.3 Solution Approaches

In this section, we begin by deriving an analytical expression for the expected time required by the adversary to reach the physical node(s), utilizing the properties of Markov chains. The expected time to reach the physical node(s) is a function of the cyber exploits: hardening the network results in a longer expected time to reach them. However, we will see that the underlying network hardening problem is non-convex, which necessitates the use of efficient search methods, such as Bayesian optimization, for the defender.

5.3.1 Markov Chain Hardening using Expected Time

The attributes of the HAG, namely (a) the directed acyclic nature of the attack graph, (b) the presence of leaf and root nodes acting as source (cyber) and sink (physical) nodes, respectively, and (c) a probabilistic distribution over the cyber exploits, make it ideal for modeling as an Absorbing Markov Chain (AMC). Using the defender's actions α and the adversary's policy π, we determine the transition probabilities of the AMC states. We now describe the components of the AMC and show how network hardening is posed as a constrained optimization problem.

Given the node set N and edge set E of an HAG, we define a Markov chain M with a transition probability matrix P̃ ∈ [0, 1]^{|N|×|N|}. The Markov chain M defined by g_cyb is naturally absorbing due to the presence of sink nodes (physical nodes).
Let S̃ ⊆ N be the set of absorbing states and T̃ ⊆ N the set of transient states, such that N = S̃ ∪ T̃. The canonical form of the transition probability matrix P̃ is given by

P̃ = [ Q̃  0
      R̃  I ],   (5.10)

where Q̃ ∈ R^{|T̃|×|T̃|} is the block corresponding to the transient states, R̃ ∈ R^{|S̃|×|T̃|} is the block corresponding to transitions into the absorbing states, 0 ∈ R^{|T̃|×|S̃|} is a zero matrix, and I ∈ R^{|S̃|×|S̃|} is an identity matrix corresponding to the absorbing states. Let τ0 ∈ Γ be the initial state distribution of the Markov chain. Note that τ0 only contains the cyber state and represents a distribution over the transient states. For the transition probability matrix P̃, the expected absorption time [51] starting from the distribution τ0 is given by

E[t_absorb(P̃)] = J_AMC(Q̃, τ0) := 1^T (I − Q̃)^{−1} τ0.   (5.11)

The expected time governs how quickly the adversary can reach the physical node(s). The work in [51] focuses on designing fast absorbing Markov chains, such that an absorbing state is reached as soon as possible. Hardening the network, however, requires designing the matrix Q̃ to deter the adversary from reaching the sink node. The optimization problem for shaping the matrix Q̃ through the defender actions α is given by

max_α  J_AMC(Q̃(α), τ0) = 1^T (I − Q̃(α))^{−1} τ0   (5.12a)
s.t.   α ∈ [α, 1]^{|E|},   (5.12b)

where the scalar 1 > α > 0 is the user-defined lower bound on the defender weights α_e. The directed acyclic structure of the HAG makes the transition matrix P̃ a block lower triangular, column stochastic matrix. The fundamental matrix J_FM := (I − Q̃(α))^{−1} is the inverse of the lower-triangular matrix

I − Q̃(α) = [ Σ_{j:(j,1)∈E} α_{j,1} p_{j,1}      0             ⋯
             −α_{2,1} p_{2,1}                   ⋱
             ⋮                       Σ_{j:(j,i)∈E} α_{j,i} p_{j,i}   ⋯ ],   (5.13)

Equation (5.12a) can then be re-expressed as

J_AMC(α) = 1^T [ adj(I − Q̃(α)) / det(I − Q̃(α)) ] τ0,   (5.14)

where det(A) and adj(A) denote the determinant and adjugate of a matrix A, respectively. Since Q̃ is affine in α, we can express Equation (5.14) as the ratio of two polynomials in the entries of α, given by

J_AMC(α) = P_{|N|−1}(α) / P_{|N|}(α),   (5.15)

where P_{|N|−1}(x) is a polynomial in x of degree at most |N| − 1. Note that the degree of the denominator exceeds that of the numerator by at least one, so J_AMC tends to infinity if and only if α_e approaches zero for all e ∈ E. This suggests the trivial solution of driving all the cyber exploit weights α_e to zero. However, setting α_e to zero in practice can disconnect different components of a CPS, rendering the problem infeasible. Moreover, the optimization problem under constraint (5.12b) is non-convex, and hence, a global solution is not guaranteed.
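The expected absorption time in (5.11) is straightforward to evaluate numerically once the transient block Q̃ is formed; the following is a minimal sketch using NumPy, where the matrix entries and initial distribution are placeholders and the column-stochastic convention of (5.10) is assumed.

    import numpy as np

    def expected_absorption_time(Q, tau0):
        # J_AMC(Q, tau0) = 1^T (I - Q)^{-1} tau0, per (5.11).
        # Q    : |T~| x |T~| transient block of the canonical form (5.10),
        #        column-stochastic convention (entry [i, j] is Prob(j -> i)).
        # tau0 : initial distribution over the transient (cyber) states.
        n = Q.shape[0]
        fundamental = np.linalg.inv(np.eye(n) - Q)   # J_FM in (5.13)
        return float(np.ones(n) @ fundamental @ tau0)

    # Illustrative 3-transient-state chain (placeholder numbers): column j lists
    # the probabilities of moving from transient state j to transient state i;
    # the remaining mass in each column is absorbed at the physical node.
    Q = np.array([[0.0, 0.0, 0.0],
                  [0.6, 0.0, 0.0],
                  [0.2, 0.7, 0.0]])
    tau0 = np.array([1.0, 0.0, 0.0])   # the attack starts at the leaf node
    print(expected_absorption_time(Q, tau0))

Shrinking the weights α_e that scale the entries of Q̃ increases this expected time, which is exactly what the defender maximizes in (5.12a).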
Proposition 5.3.1 (Convexity of cost) Suppose all entries in α are identical, i.e., α_e = α_a, ∀e ∈ E. Then,

1. J_AMC(α_a) is convex in α_a, ∀α_a ∈ [α, 1];
2. the optimizer of (5.12a) lies on the constraint boundary.

Proof: Under the assumption α_e = α_a, ∀e ∈ E, (5.15) changes from a ratio of polynomials in the entries of α to the ratio of a polynomial to a monomial in α_a, given by

J_AMC(α_a) = (k_1 α_a^{|N|−1} + k_2 α_a^{|N|−2} + ⋯ + k_{|N|}) / (q_1 α_a^{|N|}),   (5.16)

where k_i, i ∈ {1, 2, ..., |N|}, and q_1 are positive coefficients. Since 1/α_a^k is convex for k ≥ 1 and α_a > 0, J_AMC is a sum of convex functions and is therefore convex. The second part follows from the fact that the maximum of a convex function over an interval is always attained at the boundary of the domain. □

Observe that the formulation above does not yet include an additional (marginal) cost for hardening the network. Under the assumption α_e = α_a, ∀e ∈ E, we add the cost of hardening the network to obtain the hardening Markov chain objective J_HMC, given by

J_HMC(α_a) := J_AMC(α_a) + h(α_a),   (5.17)

where h(α_a) is the cost of hardening. If h(α_a) is also convex, then J_HMC remains convex in α_a. Therefore, by Proposition 5.3.1, the solution will always lie at the boundary, i.e., for a given topology and given costs, the defender will choose the cyber exploit weights α_e to either harden completely or not harden at all. In order to model more general reward functions that also include the physical attributes, we employ Bayesian optimization to efficiently search for non-trivial solutions. However, before we describe the approach, we briefly review the technique used to compute the optimal attack policy.

5.3.2 Model-Free Reinforcement Learning for Adversarial Policy Learning

Actor Critic (AC) is a model-free RL approach that learns an agent's (in this case, the adversary's) policy without explicit knowledge of the probabilistic dynamics of the system (5.2), even for hybrid MDP state spaces. AC concurrently trains two models (called the actor and the critic) to learn a parametric form of a policy in an interactive setting with the environment (the HAG). Let θ ∈ Θ be a vector used to parameterize a value function approximating

V*(s_t, α) = max_{π∈Π} E[ r(s_t, π(s_t), α) + V*(s_{t+1}, α) ],   (5.18)

where V*(s_t, α) is the optimal value function for the state s_t and Θ has a much lower dimension than S. The AC aims to learn θ* ∈ Θ such that ∀s ∈ S, |V*(s, α) − J_att(s, α; θ*)| < ε, where J_att(s, α; θ) is a parameterized value function and ε > 0 is an error tolerance. Analogous to the parameterized value function, let π(s, α; ψ) denote a stochastic policy parameterized by ψ ∈ Ψ. At each time step, the critic updates the value-function parameters θ using sampled actions and successor states, while the actor updates the policy parameters ψ in a direction suggested by the critic.
The parameters ๐œ“ and ๐œƒ are updated using a stochastic gradient scheme of the form ๐œƒ โ† ๐œƒ + ๐›ฝ๐œƒ (๐‘Ÿ๐‘ก + ๐œ‚๐ฝatt(๐‘ โ€ฒ, ๐›ผ๐›ผ๐›ผ; ๐œƒ) โˆ’ ๐ฝatt(๐‘ , ๐›ผ๐›ผ๐›ผ; ๐œƒ)) โˆ‡๐œƒ, ๐œ“ โ† ๐œ“ + ๐›ฝ๐œ“ (๐‘Ÿ๐‘ก + ๐œ‚๐ฝatt(๐‘ โ€ฒ, ๐›ผ๐›ผ๐›ผ; ๐œƒ) โˆ’ ๐ฝatt(๐‘ , ๐›ผ๐›ผ๐›ผ; ๐œƒ)) โˆ‡๐œ“ ln ๐œ‹(๐‘ , ๐›ผ๐›ผ๐›ผ; ๐œ“), (5.19a) (5.19b) where ๐›ฝ๐œ“ > 0 and ๐›ฝ๐œƒ > 0 are step-sizes for the actor and critic, respectively, that vary over the iterations, and โˆ‡๐œƒ is the gradient of ๐ฝatt with respect to ๐œƒ evaluated at (๐‘ , ๐›ผ๐›ผ๐›ผ, ๐œƒ), ๐œ‚ is the discount factor, and ๐‘ โ€ฒ is the next state. The process is repeated until ๐œƒ converges or a prescribed number of iterations is completed. To apply the AC algorithm in the MDP (5.5) with discrete actions, we use an exponential softmax distribution ๐‘’โ„Ž(๐‘ ,๐‘Ž,๐œ“) (cid:205)๐‘โˆˆA๐‘ก (๐‘ ) ๐‘’โ„Ž(๐‘ ,๐‘,๐œ“) where ๐‘’ is the Euler constant. Here, the function โ„Ž(๐‘ , ๐‘Ž, ๐œ“) denotes a real-valued parametric , โˆ€๐‘Ž โˆˆ A๐‘ก (๐‘ ), ๐œ‹(๐‘ , ๐›ผ๐›ผ๐›ผ; ๐œ“) = (5.20) preference defined for each state-action pair, which can be determined using tile coding or deep neural networks. The complete steps of various AC algorithms are described in [139]. To implement the AC algorithm, we use an on-policy linear function approximation [139]. We use tile coding to represent multi-dimensional continuous state space, where the receptive fields of the features are grouped into partitions of state space. The convergence of temporal difference (TD) (๐œ†) with probability 1 when the learning rates follow certain properties was demonstrated in [45]. Similarly, the author in [31] proved the convergence of on-line TD(0) with probability 1 while using a linear function approximator. [140] introduced fast convergence algorithms for both on-line and offline policy training with linear function approximation. A comprehensive list of RL using function approximation and its convergence were reported in [151]. We use the policy obtained from AC 163 algorithm to determine an effective sequence of attacks to eventually reach the physical node(s) causing damage or disruption in service. Next, we present the solution to the defenderโ€™s problem while keeping the obtained adversary policy fixed. 
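Before turning to the defender's problem, the following is a minimal Python sketch of one actor-critic step implementing the updates (5.19a)-(5.19b) with a linear value function and the softmax policy (5.20); the feature maps (e.g., produced by tile coding) and the step-size values are illustrative assumptions.

    import numpy as np

    def softmax_policy(psi, feats_sa, actions):
        # pi(a|s) per (5.20): softmax over linear preferences h(s, a, psi) = psi . x(s, a).
        prefs = np.array([psi @ feats_sa(a) for a in actions])
        prefs -= prefs.max()                      # numerical stability
        p = np.exp(prefs)
        return p / p.sum()

    def ac_step(theta, psi, phi_s, phi_s_next, feats_sa, actions, a_idx, r,
                eta=0.95, beta_theta=0.05, beta_psi=0.01, terminal=False):
        # theta, psi : critic / actor parameters; phi_s : state features of s;
        # feats_sa   : function mapping an action to state-action features x(s, a).
        v_s = theta @ phi_s
        v_next = 0.0 if terminal else theta @ phi_s_next
        delta = r + eta * v_next - v_s                      # TD error in (5.19)
        theta = theta + beta_theta * delta * phi_s          # critic update (5.19a)
        pi = softmax_policy(psi, feats_sa, actions)
        x_a = feats_sa(actions[a_idx])
        x_bar = sum(pi[i] * feats_sa(a) for i, a in enumerate(actions))
        # For the softmax policy, grad_psi ln pi(a|s) = x(s, a) - sum_b pi(b|s) x(s, b).
        psi = psi + beta_psi * delta * (x_a - x_bar)        # actor update (5.19b)
        return theta, psi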
5.3.3 Bayesian Optimization for Network Hardening Algorithm 3: Adversarial Network Hardening Input: HAG, ๐‘‡ (Time horizon), {๐‘ค๐‘’}, โˆ€๐‘’ โˆˆ E (default success probabilities), ๐พ (Hardening iteration) Result: Attack policy ๐œ‹โˆ—, Defenderโ€™s actions ๐›ผโˆ—๐›ผโˆ—๐›ผโˆ— Initialize ๐›ผ๐›ผ๐›ผ1 (๐›ผ๐‘’ := 1, โˆ€๐‘’ โˆˆ E) for ๐‘˜ โ† 1 to ๐พ: do # Actor Critic for adversary Initialize Actor Critic weights; # Number of episodes of the attack for episode โ† 1 to ๐‘: do Initialize ๐‘ 0 โˆˆ S for ๐‘ก โ† 1 to ๐‘‡: do ๐‘Ž๐‘ก โˆผ ๐œ‹๐‘˜ (๐‘ ๐‘ก; ๐œ“) ๐‘ ๐‘ก+1 = ๐‘”(๐‘ ๐‘ก, ๐‘Ž๐‘ก) Update ๐œƒ and ๐œ“ end end # Bayesian optimization for defender Initialize surrogate model parameters: ๐œ‡0(ยท), ๐œŽ0(ยท), ๐‘˜ (ยท, ยท), ๐œŒ, ๐ท0 = โˆ… for b โ† 1 to ๐ต: do Obtain ๐œ‰๐‘ = ๐น (๐›ผ๐›ผ๐›ผ๐‘˜,๐‘, ๐œ‹๐‘˜ ) Augment data, ๐ท ๐‘ = ๐ท ๐‘โˆ’1 โˆช {๐›ผ๐›ผ๐›ผ๐‘˜,๐‘, ๐œ‰๐‘} Update the GP parameters ๐œ‡๐‘ (ยท), ๐œŽ๐‘ (ยท) using (5.22) Choose ๐œถ๐‘˜,๐‘+1 โˆˆ arg min๐œถ ๐‘ž(๐›ผ๐›ผ๐›ผ|๐ท ๐‘), end Choose ๐›ผ๐›ผ๐›ผ๐‘˜+1 = arg min ๐‘ž(๐›ผ๐›ผ๐›ผ|๐ท ๐ต) end Output: ๐œ‹โˆ— = ๐œ‹๐พ, ๐›ผ๐›ผ๐›ผโˆ— = ๐›ผ๐›ผ๐›ผ๐พ+1 Recall that our best-response based solution approach is iterative in nature: We begin with a defender policy, compute the optimal policy for the adversary (using the AC algorithm in Section 5.3.2), update the defender policy and repeat the process. Due to lack of knowledge of the underlying physical dynamics (5.2) along with requiring multiple evaluations (expected value), we treat the problem as a black box and use Bayesian optimization [113] to solve the defenderโ€™s problem. To 164 account for the computational complexity of the defenderโ€™s problem using Bayesian optimization (BO), we evaluate the expectation with limited samples, to average out any measurement noise. We initialize the defenderโ€™s policies with ๐›ผ๐‘’ = 1, โˆ€๐‘’ โˆˆ E, and train the adversaryโ€™s policy using AC algorithm with weights ๐œƒ and ๐œ“. Once we learn an attack policy, we determine the defenderโ€™s best response with respect to each exploit using BO. The goal of a BO process is to minimize an unknown function given by (5.7) expressed by, ๐น (๐›ผ๐›ผ๐›ผ, ๐œ‹) = E ๐‘๐‘‘ (๐‘ ๐‘ก, ๐œ‹(๐‘ ๐‘ก), ๐œถ) (cid:35) + โ„Ž(๐›ผ๐›ผ๐›ผ). (cid:34) โˆ‘๏ธ ๐‘กโˆˆT (5.21) At each BO iteration ๐‘ we select a tuple ๐›ผ๐›ผ๐›ผ๐‘˜,๐‘ and evaluate the corresponding function value ๐น (๐›ผ๐›ผ๐›ผ๐‘˜, ๐‘, ๐œ‹๐‘˜ ), where ๐œ‹๐‘˜ is the attack policy for the ๐‘˜-th hardening epoch. The main idea behind BO is to maintain a surrogate function of ๐น, such as a Gaussian process2, which is updated with noisy observations ๐œ‰ := [๐œ‰1, . . . , ๐œ‰๐ต]โ€ฒ of ๐น at the set ๐ด๐ต := {๐›ผ๐›ผ๐›ผ๐‘˜,1, . . . , ๐›ผ๐›ผ๐›ผ๐‘˜,๐ต} using an acquisition function ๐‘ž(๐›ผ๐›ผ๐›ผ). The posterior over ๐น is a Gaussian distribution with mean ๐œ‡๐ต (๐›ผ๐›ผ๐›ผ) and covariance ๐‘˜ ๐ต (๐›ผ๐›ผ๐›ผ, ๐›ผ๐›ผ๐›ผโ€ฒ) given by ๐œ‡๐ต (๐›ผ๐›ผ๐›ผ) = k๐ต (๐›ผ๐›ผ๐›ผ)๐‘‡ (๐พ๐ต + ๐œŒ๐ผ)โˆ’1๐œ‰, ๐‘˜ ๐ต (๐›ผ๐›ผ๐›ผ, ๐›ผ๐›ผ๐›ผโ€ฒ) = ๐‘˜ (๐›ผ๐›ผ๐›ผ, ๐›ผ๐›ผ๐›ผโ€ฒ) โˆ’ k๐ต (๐›ผ๐›ผ๐›ผ)๐‘‡ (๐พ๐ต + ๐œŒ๐ผ)โˆ’1k๐ต (๐›ผ๐›ผ๐›ผโ€ฒ), ๐œŽ๐ต (๐›ผ๐›ผ๐›ผ)2 = ๐‘˜ ๐ต (๐›ผ๐›ผ๐›ผ, ๐›ผ๐›ผ๐›ผโ€ฒ), (5.22) where ๐‘˜ : A ร— A โ†’ Rโ‰ฅ0 is the kernel function, the vector k๐ต (๐›ผ๐›ผ๐›ผ) := [๐‘˜ (๐›ผ๐›ผ๐›ผ๐‘˜,1, ๐›ผ๐›ผ๐›ผ) . . . 
๐‘˜ (๐›ผ๐›ผ๐›ผ๐‘˜,๐ต, ๐›ผ๐›ผ๐›ผ)]๐‘‡ , ๐พ๐ต is the positive semi-definite kernel matrix [๐‘˜ (๐›ผ๐›ผ๐›ผ, ๐›ผ๐›ผ๐›ผโ€ฒ)]๐›ผ๐›ผ๐›ผ,๐›ผ๐›ผ๐›ผโ€ฒโˆˆ๐ด๐‘›, ๐œŒ โ‰ฅ 0, and ๐œŽ๐ต (๐›ผ๐›ผ๐›ผ) is the standard deviation of the Gaussian measurement noise for the samples ๐œ‰. In this work, we use the expected improvement as the acquisition function, which is defined by (5.23). Let ๐นโ€ฒ ๐ต (๐›ผ๐›ผ๐›ผ) := min๐‘šโ‰ค๐ต ๐น (๐›ผ๐›ผ๐›ผ๐‘˜,๐‘š) represent the minimal observed value of ๐น () at the current iterate ๐ต, then expected improvement is defined as, ๐‘ž(๐›ผ๐›ผ๐›ผ) = EI๐ต (๐›ผ๐›ผ๐›ผ) := E(cid:104) (cid:0)๐นโ€ฒ ๐ต (๐›ผ๐›ผ๐›ผ) โˆ’ ๐น (๐›ผ๐›ผ๐›ผ)(cid:1) + (cid:12) (cid:12) (cid:12) ๐›ผ๐›ผ๐›ผ๐‘˜,1:๐ต, ๐œ‰1:๐ต (cid:105) , (5.23) 2A Gaussian process is a stochastic process, i.e., random variables indexed by space and time, such that any finite collection of those random variables has a multivariate normal distribution. 165 where ๐‘ฅ+ (cid:17) max{๐‘ฅ, 0}. To obtain theoretical guarantees on the suboptimality of ๐›ผ after ๐ต iterations, we also use the upper confidence bound (UCB)[136], which is given by ๐‘ž๐ต (๐œถ) := ๐œ‡๐ต (๐œถ) + โˆš๏ธ๐›ฝ๐ต๐œŽ๐ต (๐œถ), where, for a discrete choice of ๐œถ, ๐›ฝ๐ต := 2 ln(|๐œถ|๐œ‰๐ต/๐œ) with an user-defined ๐œ โˆˆ (0, 1) , and ๐œ‰๐‘˜ is a sequence such that (cid:205)โˆž ๐‘˜=1 ๐œ‰โˆ’1 ๐‘˜ = 1. The BO algorithm in conjuction with AC is summarized in Algorithm 3, where ๐พ is the total number of BO iterations, ๐‘ is the total number of episodes of the AC algorithm and ๐‘‡ is total time duration for the system. At each BO iteration ๐‘˜, we return the updated cyber exploits which are used to re-train the adversaryโ€™s policy with the new set of success probabilities and repeat the same process for the defined number of iterations ๐พ. Once this process terminates, we obtain the best set of defenderโ€™s actions (non-negative weights) ๐›ผโˆ— ๐‘’, โˆ€๐‘’ โˆˆ E and the corresponding adversary policy ๐œ‹โˆ—. 5.3.4 Analytic properties for Zero-sum games We provide analytical guarantees for our proposed approach, which involves analyzing Algo- rithm 1 in a zero-sum scenario by considering a finite set of pure policies for each player. For the zero-sum analysis, we swap the minimizer and maximizer. In particular, the adversary (min- imizer) picks out of the set {๐œ‹1, ๐œ‹2, . . . , ๐œ‹๐‘š} and the defender (maximizer) picks out of the set {๐›ผ1๐›ผ1๐›ผ1, ๐›ผ2๐›ผ2๐›ผ2, . . . , ๐›ผ๐‘›๐›ผ๐‘›๐›ผ๐‘›}. The cost of player policy ๐œ‹๐‘– against ๐›ผ๐›ผ๐›ผ ๐‘— equals ๐‘€๐‘– ๐‘— (๐‘ 0), where ๐‘€ (๐‘ 0) โˆˆ R๐‘šร—๐‘› is the cost/payoff matrix. In what follows, we will drop the explicit dependence of ๐‘€ on ๐‘ 0 for ease of notation. Any Hannan consistent algorithm has properties of (i) time-average convergence to the best response policy, and (ii) 2๐œ€โˆ’ approximate Nash equilibrium with ๐œ€ โ‰ฅ 0 when both players update their policy using a Hannan consistent algorithm [59]. As such, our proposed approach employs a single-agent reinforcement learning (adversary) to determine Nash equilibria for such repeated zero-sum games [156]. Assuming ๐พ iterations of Algorithm 1, we will leverage the following properties : 166 Proposition 5.3.2 ( [35] Theorem 4.1 and 7.2) Given ๐พ as the number of iterations of Algorithm 3, and let {๐‘ƒ1, . . . , ๐‘ƒ๐พ } and { ๐‘—1, . . . 
, ๐‘—๐พ } be the possibly mixed adversary policies and pure defender policies at the corresponding iterations, respectively. Then, the adversary algorithm satisfies the following inequality 1 ๐พ ๐พ โˆ‘๏ธ ๐‘˜=1 ๐‘ƒ๐‘‡ ๐‘˜ ๐‘€๐‘’ ๐‘—๐‘˜ โ‰ค 1 ๐พ min ยฏ๐‘ƒโˆˆฮ”๐‘š ยฏ๐‘ƒ๐‘‡ ๐พ โˆ‘๏ธ ๐‘˜=1 ๐‘€๐‘’ ๐‘—๐‘˜ + ๐›ฟ(๐‘š, ๐พ), where ๐‘’ ๐‘—๐‘˜ is the ๐‘—๐‘˜ -th basis vector in R๐‘›, ฮ”๐‘š is the probability simplex in ๐‘š dimensions, ๐›ฟ(๐‘š, ๐พ) โ‰ฅ 0 is an Hannan consistent regret that depends on the number of adversary actions ๐‘š and number of iterations ๐พ, obtained using any fixed distribution ยฏ๐‘ƒ. ๐›ฟ(๐‘š, ๐พ) โ‰ฅ 0 corresponds to regret when the adversary uses a Hannan consistent [59] algorithm to update its policy every iteration. There exist many Hannan consistent algorithms, such as exponential weighted average [35] or multiplicative weight update [56], where ๐›ฟ(๐‘š, ๐พ) = O (โˆš๏ธlog(๐‘š)/๐พ). Before we proceed with the defenderโ€™s analysis, we need to make the following assumption on the entries of ๐‘€. Assumption 5.3.3 Each row of ๐‘€ is assumed to be drawn out of a Gaussian process with a given mean (typically equal to zero) and prior covariance defined by a kernel matrix ๐พ๐‘– ( ๐‘—, โ„“) โ‰ฅ 0, for the ๐‘–-th row. Note that this assumption automatically implies that any linear combination of the rows is also a sample of a Gaussian process with a mean and a linear combination of the kernel matrices. Proposition 5.3.4 ( [136] Theorem 1 and Lemma 7.6) Suppose that Assumption 5.3.3 holds. Then, against any attack distribution ๐‘ƒ๐‘˜ , Bayesian optimization yields a pure policy ๐‘’ ๐‘—๐‘˜ , such that ๐‘˜ ๐‘€๐›ผ๐›ผ๐›ผ โ‰ค ๐‘ƒ๐‘‡ ๐‘ƒ๐‘‡ ๐‘˜ ๐‘€๐‘’ ๐‘—๐‘˜ + ๐œ–๐‘˜ , max ๐œถ with probability of at least 1 โˆ’ ๐œ, where โˆš๏ธ„ ๐›พ๐ต (๐‘ƒ๐‘‡ ๐‘˜ ๐‘€) ๐›ฝ๐ต (๐‘›) ๐ต . (cid:170) (cid:174) (cid:172) ๐œ–๐‘˜ โˆˆ ๐‘‚ (cid:169) (cid:173) (cid:171) 167 gain ๐›พ๐ต (๐‘ƒ๐‘‡ Recall that ๐›ฝ๐ต (๐‘›) = 2 ln(๐‘›๐œ‰๐ต/๐œ), where the sequence ๐œ‰๐‘˜ is such that (cid:205)โˆž ๐œ‰โˆ’1 ๐‘˜ = 1. The information ๐‘˜=1 โ„“=1 log(1 + ๐œŽโˆ’2๐‘”โ„“๐œ†โ„“), where ๐œ†โ€™s are the eigenvalues ๐‘˜ ๐‘€, and ๐œŽ is the variance of the noise in obtaining the ๐‘˜ ๐‘€) := 0.5/(1 โˆ’ 1/๐‘’) max๐‘”1,...,๐‘”๐‘˜ of the kernel matrix of the weighted rows ๐‘ƒ๐‘‡ (cid:205)๐ต payoff. We are now ready to state and prove a convergence result for the zero-sum setting. Proposition 5.3.5 Consider the average of the attack distributions produced by Algorithm 1, ห†๐‘ƒ๐พ := 1 ๐พ (cid:205)๐พ ๐‘˜=1 ๐‘ƒ๐‘˜ . This distribution satisfies max ๐œถ ห†๐‘ƒ๐‘‡ ๐พ ๐‘€๐›ผ๐›ผ๐›ผ โ‰ค min ๐‘ƒ๐‘‡ ๐‘€๐›ผ (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) ๐‘ƒโˆˆฮ”๐‘š max ๐›ผ๐›ผ๐›ผ (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) (cid:123)(cid:122) (cid:124) Value of the matrix game ๐‘€ (cid:125) + 1 ๐พ ๐พ โˆ‘๏ธ ๐‘˜=1 ๐œ–๐‘˜ + ๐›ฟ(๐‘š, ๐พ), with probability of at least 1 โˆ’ ๐พ๐œ. Proof: We start with ห†๐‘ƒ๐‘‡ ๐พ ๐‘€๐›ผ๐›ผ๐›ผ = max ๐›ผ 1 ๐พ max ๐›ผ๐›ผ๐›ผ ๐พ โˆ‘๏ธ ๐‘˜=1 ๐‘ƒ๐‘‡ ๐‘˜ ๐‘€๐›ผ โ‰ค โ‰ค โ‰ค 1 ๐พ 1 ๐พ 1 ๐พ ๐พ โˆ‘๏ธ ๐‘˜=1 ๐พ โˆ‘๏ธ ๐‘˜=1 ๐พ โˆ‘๏ธ ๐‘˜=1 ๐‘ƒ๐‘‡ ๐‘˜ ๐‘€๐›ผ max ๐›ผ๐›ผ๐›ผ (๐‘ƒ๐‘‡ ๐‘˜ ๐‘€๐‘’ ๐‘—๐‘˜ + ๐œ–๐‘˜ ) Using Prop. 5.3.4 with prob. 
at least 1 โˆ’ ๐พ๐œ, ( ยฏ๐‘ƒ๐‘‡ ๐‘€๐‘’ ๐‘—๐‘˜ + ๐œ–๐‘˜ ) + ๐›ฟ(๐‘š, ๐พ) using Prop. 5.3.2, = min ยฏ๐‘ƒโˆˆฮ”๐‘š ยฏ๐‘ƒ๐‘‡ 1 ๐พ ๐พ โˆ‘๏ธ ๐‘˜=1 โ‰ค max ๐›ผ๐›ผ๐›ผ ยฏ๐‘ƒ๐‘‡ ๐‘€๐›ผ + ๐‘€๐‘’ ๐‘—๐‘˜ + 1 ๐พ ๐พ โˆ‘๏ธ ๐‘˜=1 ๐œ–๐‘˜ + ๐›ฟ(๐‘š, ๐พ) 1 ๐พ ๐พ โˆ‘๏ธ ๐‘˜=1 ๐œ–๐‘˜ + ๐›ฟ(๐‘š, ๐พ). Since this holds for any fixed distribution ยฏ๐‘ƒ, one such particular choice is a saddle-point policy for the adversary. This completes the proof. โ–ก Remark 5.3.6 Proposition 5.3.5 quantifies the proximity of the outcome of Algorithm 1 to the saddle-point value (i.e., the Nash equilibrium) of the matrix game ๐‘€ with high probability, under 168 certain technical assumptions on the entries of the payoff matrix. Furthermore, the error in the outcome depends logarithmically on the number of rows ๐‘š and columns ๐‘› of the payoff matrix ๐‘€. This means that one can use a large number of pure policies while incurring only a modest increase in the error bound. 5.3.5 Analytic properties of the non zero-sum set-up In this subsection, we derive analytical properties of the non-zero-sum game under some assumptions. Consider a two-player stochastic game with a finite state space ๐‘  โˆˆ S, having finite action spaces ๐œ‹(๐‘ ) and ๐œถ for the adversary and defender, respectively, in each state ๐‘ . We denote this game by ฮ“ = {S, ๐œ‹(๐‘ ), ๐œถ, ห†๐‘Ÿ, ๐‘}, (5.24) where ห†๐‘Ÿ := { ห†๐‘Ÿ1, ห†๐‘Ÿ2} is a vector-valued function for the defender and adversary, respectively, in the domain Z = {(๐‘ , ๐œ‹(๐‘ ), ๐œถ); ๐‘  โˆˆ S, ๐œ‹(๐‘ ) โˆˆ ฮ , ๐œถ โˆˆ [๐›ผ, 1] |E |}. In particular, ห†๐‘Ÿ := { ห†๐‘Ÿ1 := ๐‘๐‘‘ (๐‘ , ๐œ‹, ๐œถ), ห†๐‘Ÿ2 := ๐‘Ÿ (๐‘ , ๐œ‹, ๐œถ)} for the described problem (5.9) and (5.6), respectively. Lastly, the state transition probability is given by p = {๐‘(๐‘ง|๐‘ , ๐œ‹(๐‘ ), ๐œถ); ๐‘ง โˆˆ S, (๐‘ , ๐œ‹(๐‘ ), ๐œถ) โˆˆ Z}, where ๐‘(๐‘ง|๐‘ , ๐œ‹(๐‘ ), ๐œถ) denotes the probability that the state moves from state ๐‘  to ๐‘ง when the actions ๐œ‹(๐‘ ) and ๐œถ are taken in the state ๐‘ . The state transition probabilities satisfy the following properties, ๐‘(๐‘ง|๐‘ , ๐œ‹(๐‘ ), ๐œถ) โ‰ฅ 0, and ๐‘(๐‘ง|๐‘ , ๐œ‹(๐‘ ), ๐œถ) = 1. โˆ‘๏ธ ๐‘งโˆˆS Definition 2 (Additive reward (AR) and additive transition (AT) game (ARAT game) [121]) The stochastic game ฮ“ (5.24) possesses an additive rewards property, if for all (๐‘ , ๐œ‹(๐‘ ), ๐œถ) โˆˆ Z, ๐‘๐‘‘ (๐‘ , ๐œ‹, ๐œถ) = ๐‘๐‘‘ 1 (๐‘ , ๐œถ) + ๐‘๐‘‘ 2 (๐‘ , ๐œ‹), ๐‘Ÿ (๐‘ , ๐œ‹, ๐œถ) = ๐‘Ÿ1(๐‘ , ๐œถ) + ๐‘Ÿ2(๐‘ , ๐œ‹), 169 for appropriate functions ๐‘๐‘‘ 1 , ๐‘๐‘‘ 2 , ๐‘Ÿ1 and ๐‘Ÿ2 on the domain. The game ฮ“ (5.24) simplifies to a controlling game if the states can be partitioned into two sets S1 and S2 such that โˆ€๐‘  โˆˆ S1, ๐‘(๐‘ง|๐‘ , ๐œ‹(๐‘ ), ๐œถ) = ๐‘1(๐‘ง|๐‘ , ๐œถ) โˆ€๐‘  โˆˆ S2, ๐‘(๐‘ง|๐‘ , ๐œ‹(๐‘ ), ๐œถ) = ๐‘2(๐‘ง|๐‘ , ๐œ‹(๐‘ )). The partitioning of states enables the game ฮ“ (5.24) to possess additive transitions for all (๐‘ , ๐œ‹(๐‘ ), ๐œถ) โˆˆ Z of the form ๐‘(๐‘ง|๐‘ , ๐œ‹(๐‘ ), ๐œถ) = ๐‘1(๐‘ง|๐‘ , ๐œถ) + ๐‘2(๐‘ง|๐‘ , ๐œ‹(๐‘ )) Assumption 5.3.7 ("Switching control graphs") The graph G satisfies the following properties: 1. There are no self loops, 2. the defense policy ๐œถ is such that for every cyber node with a single outgoing edge ๐‘’, ๐›ผ๐‘’ โ‰  1, 3. for every other edge, ๐›ผ๐‘’ = 1, and 4. 
the game is played over an infinite horizon in a discounted setting A line graph represents one such example. Figure 5.2 shows a non-trivial example of a switching control graph. Then, the following is a property of the game described in Sections 5.4.2 and 5.4.3. Figure 5.2 Switching control graph with nodes 1, 4 and 6 representing adversary control, and nodes 2, 3, 5 representing defender control. Proposition 5.3.8 ( ARAT game with switching control graphs ) Under Assumption 5.3.7, the stochastic game ฮ“ defined by (5.24) is an ARAT game. 170 156324132 Proof: We will verify that the cyber rewards (5.25) and physical rewards (5.27) satisfy the AR property, and the state transitions satisfy the AT properties. Under Assumption 5.3.7, the expected cyber rewards are partitioned as ๐‘Ÿ (๐‘ ๐‘ก, ๐œ‹, ๐œถ) = ๏ฃฑ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃฒ ๏ฃด๏ฃด๏ฃด๏ฃด ๏ฃณ ๐‘Ÿ1(๐‘ ๐‘ก, ๐œถ) := ๐›ผ๐‘’๐‘ค๐‘’ โˆ’ ๐‘, ๐‘ ๐‘ก โˆˆ S1, ๐‘Ÿ2(๐‘ ๐‘ก, ๐œ‹) := ๐œ‹๐‘’ (๐‘ ๐‘ก)๐‘ค๐‘’ โˆ’ ๐‘(๐œ‹๐‘’ (๐‘ ๐‘ก)), ๐‘ ๐‘ก โˆˆ S2, where ๐‘ corresponds to the cyber cost for all the states belonging to the set S1, i.e., states under defenderโ€™s control. Note that when the adversary reaches the physical state ๐‘ ๐‘ก = {๐›พroot, ๐‘ฅ๐‘ก } the defenderโ€™s action has no impact on the reward obtained by the adversary. The state transitions under Assumption 5.3.7 are of the form ๐‘1(๐‘ง|๐‘ ๐‘ก, ๐œถ) = ๐›ผ๐‘’๐‘ค๐‘’, โˆ€๐‘ ๐‘ก โˆˆ S1, ๐‘2(๐‘ง|๐‘ ๐‘ก, ๐œถ) = ๐œ‹๐‘’ (๐‘ ๐‘ก)๐‘ค๐‘’, โˆ€๐‘ ๐‘ก โˆˆ S2. Therefore, we satisfy both ARAT property for the stochastic game ฮ“ (5.24). โ–ก Using Theorem 3.1 from [121], we conclude that the ARAT game ฮ“ (5.24) admits a Nash equilibrium in stationary strategies which uses at most two pure actions for each player in each state. This result will allow us to significantly prune down the adversary edges of a large graph that satisfies Assumption 5.3.7. 5.4 Numerical Experiments We now demonstrate the effectiveness of our proposed network hardening algorithm on a smart building case-study with a cyber layer inspired by a ransomware attack graph and the physical layer obtained from a truncated model identified using real-world experiments. 5.4.1 Case-study: Sensor Deception Attacks on Building In this use case, the adversary aims to maximize the occupant discomfort of a single zone in the given building over a defined time horizon, while the defender seeks to minimize a combination of the discomfort and the hardening cost. The buildingโ€™s air-handling unit (AHU) performs standard 171 operations by reconditioning ambient air and return air to a specific supply-air temperature and then supplying it to various building zones using a supply fan. The adversary aims to manipulate temperature measurements from various zone-level sensors to deceive the AHU control system and send poorly conditioned air into various zones, causing comfort-bound violations over time. However, to gain access to the temperature sensors at various zones, the adversary has to penetrate the sensor unit via a set of cyber exploits present on different components of a Building Automation System (BAS), such as IoT devices (e.g., IP cameras and smart thermostats), building-management workstations, and programmable logic controllers (PLC). For the cyber layer, we use a pruned version of a ransomware attack graph [126] created using information flow. 
The original graph represents multiple stages of an attack progression: (a) a privilege escalation stage, (b) lateral movement over the cyber nodes and (c) reaching the goal node. We use an HAG to represent these specific stages, as shown in Figure 5.3a. Similar attack graphs for BAS were used in [48], where the attack paths involved executing a subset of tactics defined in popular attack frameworks, such as MITREโ€™s ATT&CK [1]. The reward functions used for an adversary in the cyber and physical layer of a CPS is usually system-specific and depends on the systemโ€™s overall security objective and specifications. For instance, the cyber reward at a certain node in an HAG can be set equal to the loss a defender or system administrator would incur in case an adversary were to successfully access the corresponding node. For this case study, we set the cyber reward to a positive value that incentivizes a resource- and/or time-constrained adversary to reach the physical node as quickly as possible. However, other cyber-layer reward specifications can be easily integrated in our framework. On the other hand, reward in the physical layer is generally associated with a metric that corresponds to loss in physical- system performance due to the adversaryโ€™s actions. Examples include power, energy, efficiency or deviation of performance beyond a specified bound. It is also important to note that probability of transitions between different nodes in an HAG is usually determined from related attack-incident reports in the literature (see [41] for more details). However, we use synthetic transition-probability values in the ransomware attack graph for demonstrative purposes only. Next, we elaborate the 172 cyber and physical layer components of the proposed HAG using notation described in Section 5.2. (a) (b) Figure 5.3 (a) An HAG inspired from a ransomware attack graph [126]. The source node 1 is represented by the dashed circle and the physical node (sink node) 9 is represented by concentric circles. (b) Trajectories of Zone 1 temperature (Zone 1) along with the outside air temperature (Outside T) over a year with upper (T max) and lower temperature (T min) comfort bounds. 5.4.2 Cyber Layer The HAG consists of eight cyber vertices with the associated cyber exploits also known as tactics from MITRE ATT&CK framework. The physical node is represented via concentric blue circles (node 9 in Figure 5.3a). Each vertex (tactic) and its corresponding edge (technique) are shown in Table 5.1. A user can generate such attack graphs and models using the framework in [41]. The success probability of any of the cyber exploit is independently sampled from a uniform distribution, U โˆผ [0.5, 1). For an attack action ๐‘Ž๐‘ก โˆผ ๐œ‹(๐‘ ๐‘ก, ๐›ผ๐›ผ๐›ผ; ๐œ“) on the cyber layer, the adversary incurs a cost ๐‘(๐‘Ž๐‘ก) of 0.1 and a nominal reward of 1 if an exploit is successful, while the reward for doing nothing is assigned a value of 0. The reward from the cyber layer to the adversary is given by, ๐‘Ÿ (๐‘ ๐‘ก, ๐œ‹, ๐œถ) = ๏ฃฑ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃฒ ๏ฃด๏ฃด๏ฃด๏ฃด ๏ฃณ 1 โˆ’ ๐‘(๐œ‹๐‘’ (๐‘ ๐‘ก)), with probability ๐›ผ๐‘’๐‘ค๐‘’๐œ‹๐‘’ (๐‘ ๐‘ก), โˆ’๐‘(๐œ‹๐‘’ (๐‘ ๐‘ก)), with probability 1 โˆ’ ๐›ผ๐‘’๐‘ค๐‘’๐œ‹๐‘’ (๐‘ ๐‘ก), (5.25) where ๐œ‹๐‘’ (๐‘ ๐‘ก) denotes the adversaryโ€™s probability of choosing exploit ๐‘’ while in the state ๐‘ ๐‘ก. 
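As a minimal sketch of the cyber-layer reward in (5.25), the following Python fragment draws sampled rewards for a single exploit using the case-study values (exploit cost 0.1, nominal reward 1); the edge probability used here is a placeholder rather than a specific entry of Table 5.1, and reading the attempt cost as proportional to the attempt probability is an assumption. The empirical mean of the samples matches the expected net reward derived in the next paragraph.

    import random

    EXPLOIT_COST = 0.1     # c(a_t) for an attempted exploit (case-study value)
    NOMINAL_REWARD = 1.0   # reward for a successful exploit (case-study value)

    def sample_cyber_reward(alpha_e, w_e, rng=random):
        # One draw of (5.25) when the adversary commits to exploit e:
        # success occurs with probability alpha_e * w_e.
        if rng.random() < alpha_e * w_e:
            return NOMINAL_REWARD - EXPLOIT_COST
        return -EXPLOIT_COST

    def expected_cyber_reward(alpha_e, w_e, pi_e):
        # alpha_e * w_e * pi_e - c(pi_e), reading c(pi_e) as EXPLOIT_COST * pi_e
        # (an assumption about how the attempt cost scales with pi_e).
        return alpha_e * w_e * pi_e - EXPLOIT_COST * pi_e

    # Placeholder edge values:
    w_e, alpha_e = 0.8, 1.0
    samples = [sample_cyber_reward(alpha_e, w_e) for _ in range(10_000)]
    print(sum(samples) / len(samples), expected_cyber_reward(alpha_e, w_e, 1.0))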
Then, the expected reward until the root (physical) node is not compromised is given by E[๐‘Ÿ (๐‘ ๐‘ก, ๐œ‹, ๐œถ)] = ๐›ผ๐‘’๐‘ค๐‘’๐œ‹๐‘’ (๐‘ ๐‘ก) โˆ’ ๐‘(๐œ‹๐‘’ (๐‘ ๐‘ก)), 173 123456782345Physical processTemperaturesensorAHU UnitZone temperaturedynamics90100200300Day0102030Temperature COutside TZone 1T minT max5010022.525.0 Table 5.1 Cyber exploits and their corresponding probability of success. Node Tactic 1 2 Initial access Execution 3 Persistence 4 (1,2) Edge Transition Probability Node 0.82 (Internet Accessible Device) 0.63 (Execution through API) 0.88 (Man in the middle) 0.89 (Module Firmware) (3,4) (2,3) (2,4) 6 5 Tactic Edge Transition Probability Node Tactic Edge Transition Probability Evasion (4,5) 0.56 (Utilize/Change operating module) (4,6) 0.94 (Rootkit) Discovery (5,6) Lateral movement (6,7) 0.97 (Control Device Identification) 0.59 (External Remote Services) 6 7 8 Lateral movement Inhibit response function Impair process control (6,8) (6,9) (7,8) (8,9) 0.87 (Remote File Copy) 0.78 (Program Organization Units) 0.87 (Block serial COM) 0.50 (Change Program State) subject to the dynamics in (5.4). Note that each exploit has a positive expected net reward, which incentivizes the adversary to reach the root node as quickly as possible. 5.4.3 Physical Layer We consider a multi-zone residential building with a single floor as our representative building, which is based on the setup described in [116]. The building has 6 conditioning zones and a central Air Handling Unit (AHU) that sends thermally conditioned air to each zone using a supply-air fan. The AHU unit uses an absorption chiller for conventional cooling and a backup boiler for emergency heating during very low ambient temperatures. Conventional heating is provided by Variable Air Volume (VAV) terminal units with reheat coils that regulate the temperature and flow-rate of the air entering each zone. To accurately model the building dynamics, a linearized, time-invariant, discrete-time, reduced- order state-space model (SSM) can be used, as discussed in [116]. We use the RenoLight SSM as part of the Python Systems Library (PSL) [118] to simulate the dynamics of our representative building. The RenoLight model comprises of 250 states (building envelope variables), 6 control inputs (amount of heating or cooling for each zone) and 6 observations (zone temperatures). The sampling frequency of the model is set to 15 minutes. Notation and description of the different components of the SSM are reported in Table 5.2. We use a rule-based controller to provide Table 5.2 Description of the variables in the building model. Variable Description ๐‘ฅ๐‘ก ๐‘ฆ๐‘ก ๐‘ข๐‘ก ๐‘ค๐‘ก Building envelope states Zone temperature measurements Amount of heating or cooling (control inputs) Ambient temperature (disturbance) Unit โ—ฆC โ—ฆC โ—ฆC kg sโˆ’1 โ—ฆC 174 occupant thermal comfort by maintaining zone temperature in each zone within specified comfort bounds. 
Specifically, the amount of heating or cooling at time ๐‘ก in zone ๐‘– was set according to โˆ’๐‘ขmax min โˆ’๐‘ขmax min (cid:110) ๐‘ฆ๐‘– ๐‘ก โˆ’๐‘ฆmax+๐›ฟ หœ๐œ– (cid:111) , , 1 (cid:110) โˆ’๐‘ฆ๐‘– ๐‘ก +๐‘ฆmin+๐›ฟ หœ๐œ– (cid:111) , , 1 if ๐‘ฆ๐‘– ๐‘ก > ๐‘ฆmax โˆ’ ๐‘œ, if ๐‘ฆ๐‘– ๐‘ก โ‰ค ๐‘ฆmin + ๐‘œ, 0, otherwise, ๐‘ข๐‘– ๐‘ก = ๏ฃฑ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃฒ ๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด๏ฃด ๏ฃณ where ๐‘ฆmin and ๐‘ฆmax are the prescribed lower and upper comfort bounds, ๐‘œ is hysteresis parameter, หœ๐œ– is proportional gain and ๐‘ขmax is the maximum heating or cooling capacity of the controller. For our experiments, we set ๐‘ฆmin = 23โ—ฆC and ๐‘ฆmax = 25โ—ฆC, respectively. Figure 5.3b shows the nominal annual performance of the rule-based controller (under no attacks), which clearly shows that the zone temperatures stay within the comfort bounds with high probability. On acquiring access to a zone temperature sensor, the adversary can perturb the sensor mea- surements to cause occupant discomfort in that zone. With a slight abuse of notation, let ๐‘Ž๐‘ก be the adversarial temperature perturbation at time ๐‘ก. For demonstrative purposes, only the temperatures in zone 1 are allowed to be perturbed; henceforth, we drop the zone superscripts. The perturbed zone temperature measurement at time ๐‘ก changes to ๐‘ฆ๐‘ก = ๐‘ฅ๐‘ก + ๐‘Ž๐‘ก. The adversaryโ€™s reward for executing the action ๐‘Ž๐‘ก when the physical state is ๐‘ ๐‘ก = {๐›พ๐‘ก, ๐‘ฅ๐‘ก } = {๐›พ๐‘Ÿ๐‘œ๐‘œ๐‘ก, ๐‘ฅ๐‘ก }, denoted by ๐‘Ÿ (๐‘ ๐‘ก, ๐‘Ž๐‘ก), equals ๐‘Ÿ (๐‘ ๐‘ก, ๐‘Ž๐‘ก) = (๐‘ฆmin โˆ’ ๐‘ฆ๐‘ก)+ + (๐‘ฆ๐‘ก โˆ’ ๐‘ฆmax)+ โˆ’ ๐‘๐‘Ž2 ๐‘ก . (5.26) For ๐‘ข โˆˆ R, the first (resp. second) term is the thermal discomfort caused by temperature deviation from the lower (resp. upper) comfort bound. The cost for executing an action ๐‘Ž๐‘ก is scaled by a proportional term ๐‘. Since ๐‘Ž๐‘ก takes values in a discrete set, the expected reward is given by E[๐‘Ÿ (๐‘ ๐‘ก, ๐œ‹๐‘ก)] = โˆ‘๏ธ (cid:16) ๐œ‹๐‘Ž๐‘ก (๐‘ ๐‘ก) (๐‘ฆmin โˆ’ ๐‘ฆ๐‘ก)+ + (๐‘ฆ๐‘ก โˆ’ ๐‘ฆmax)+ โˆ’ ๐‘๐‘Ž2 ๐‘ก (cid:17) . ๐‘Ž๐‘ก โˆˆA (๐‘ ๐‘ก ) Note that based on the action and the state, the adversary will either observe the cyber or the physical reward. Once the root node is compromised, the defender can only measure the discomfort caused by the adversaryโ€™s perturbation in any zone. The expected return incurred under a set of defenses and 175 adversary policy in state ๐‘ ๐‘ก equals ๐‘๐‘‘ (๐‘ ๐‘ก, ๐œ‹) = โˆ’E [๐‘Ÿ (๐‘ ๐‘ก, ๐œ‹)] . (5.27) Since the defenderโ€™s actions are purely on the cyber layer, once an adversary reaches a root node, the return ๐‘๐‘‘ is invariant of the defender policy ๐œถ. Network Hardening (a) (b) Figure 5.4 (a) Defenderโ€™s (๐ฝdef) and Attackerโ€™s (๐ฝatt) objective with a hardening cost factor ๐‘‘๐‘’ := 0.1, where min(๐‘–) is defined as the ๐‘–๐‘กโ„Ž argument minimum of ๐ฝdef/att. (b) Defenderโ€™s (๐ฝdef) and Attackerโ€™s (๐ฝatt) objective with a hardening cost factor of ๐‘‘๐‘’ := 0.5. (c) Defenderโ€™s (๐ฝdef) and Attackerโ€™s (๐ฝatt) objective with a hardening cost factor ๐‘‘๐‘’ := 1. (c) (a) (b) (c) Figure 5.5 (a) Average time steps required to reach the physical node for the adversary for the hardening factor of ๐‘‘๐‘’ := 0.1. (b) Average time steps to reach the physical node with ๐‘‘๐‘’ := 0.5 (c) Average time steps to reach the physical node with ๐‘‘๐‘’ := 1.0. 
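Referring back to the rule-based zone controller and the discomfort reward in (5.26), the following is a minimal Python sketch of both; the sign convention (negative commands denote cooling), the saturation form, and all parameter values other than the comfort bounds of 23 °C and 25 °C are assumptions based on one reading of the control law above.

    def rule_based_control(y, y_min=23.0, y_max=25.0, o=0.5, delta=0.5,
                           eps=1.0, u_max=5.0):
        # Heating/cooling command for one zone based on the measured temperature y;
        # o is the hysteresis parameter, eps the proportional gain, u_max the
        # capacity. All values except the comfort bounds are placeholders.
        if y > y_max - o:                                    # too warm -> cool
            return -u_max * min((y - y_max + delta) / eps, 1.0)
        if y <= y_min + o:                                   # too cold -> heat
            return u_max * min((-y + y_min + delta) / eps, 1.0)
        return 0.0

    def discomfort_reward(x, a, y_min=23.0, y_max=25.0, b=0.01):
        # Adversary reward (5.26) once the root node is compromised: the sensor
        # reads y = x + a, and the reward is the deviation of y outside the
        # comfort band minus a quadratic action cost b * a^2 (b is a placeholder).
        y = x + a
        return max(y_min - y, 0.0) + max(y - y_max, 0.0) - b * a * a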
We numerically demonstrate the outcome of Algorithm 3 with the following parameters, (a) time horizon ๐‘‡ = 48, (b) hardening iteration ๐พ = 100, (c) AC episodes ๐‘ = 30000 and (d) lower 176 020406080100Hardening Epoch10.0020.00Jdefmin 0min 1020406080100Hardening Epoch10.0020.00Jattmin0min10255075100Hardening Epoch20.0040.00Jdefmin 0min 10255075100Hardening Epoch20.0030.00Jattmin0min10255075100Hardening Epoch25.0050.0075.00Jdefmin 0min 10255075100Hardening Epoch0.0020.00Jattmin0min10255075100Hardening Epoch15.0020.0025.0030.0035.00[Time]argmin Jdef0255075100Hardening Epoch10.0015.0020.0025.0030.00[Time]argmin Jdef0255075100Hardening Epoch10.0020.0030.00[Time]argmin Jdef bound for hardening ๐›ผ = 0.1. Figure 5.4 illustrates the defenderโ€™s and adversaryโ€™s objectives at the end of the hardening iteration for different values of the hardening cost factor ๐‘‘๐‘’ = 0.1, 0.5, and 1. As shown in Figures 5.4a, 5.4b and 5.4c, increasing values of ๐‘‘๐‘’ lead to higher objectives for both the adversary and defender. The adversaryโ€™s and defenderโ€™s objective show diminishing marginal improvement with increasing number of iterations of the approach, suggesting proximity to an approximate NE of the game. But since this is a non-zero-sum game, characterizing additional properties such as the price of anarchy and convergence to an NE will require additional assumptions on the structure of the playersโ€™ objectives, and is a topic of future investigation. We quantify the effectiveness of Algorithm 3, by measuring the average time taken by the adversary to reach the physical node during the BO process for different values of ๐‘‘๐‘’, as shown in Figures 5.5a, 5.5b and 5.5c. We observe that as ๐‘‘๐‘’ increases, the average time taken to reach the physical node decreases. Furthermore, we compare the distribution of the time required to reach the physical node for the corresponding values of ๐‘‘๐‘’ against the expected absorption time in (5.11) as shown in Figure 5.6b. We observe that the expected absorption time ๐ฝAMC is greater than the median value of the empirically determined time. This result justifies the use of the proposed approach over standard optimization methods for optimizing ๐ฝAMC Next, we visualize the defender policy ๐›ผ๐›ผ๐›ผ for the three values of ๐‘‘๐‘’ shown in Figure 5.6a. We observe that a majority of the weights are hardened for a smaller values of ๐‘‘๐‘’, indicating the effectiveness of our approach in balancing between the cost of hardening and the cost of securing the CPS. We demonstrate a sample node trajectory for the corresponding values of ๐‘‘๐‘’ shown in Figures 5.7a, 5.7b and 5.7c. As expected, the adversary takes significantly longer to reach the physical node with ๐‘‘๐‘’ = 0.1 as compared to ๐‘‘๐‘’ = 1.0. Finally, as the defender can only observe the discomfort in HAG, we evaluated the same for the prior defined values of ๐‘‘๐‘’ using the obtained policies of {๐œ‹โˆ—, ๐œถโˆ—} shown in Figure 5.8a, 5.8b and 5.8c. The results show a decrease in discomfort for the lowest value of ๐‘‘๐‘’ := 0.1. Our approach to adversarial network hardening provides a principled defense planning solution in the presence of an adversary. Despite the defenderโ€™s limited knowledge of the adversaryโ€™s 177 movements and only being able to measure physical attributes, our approach prevents the adversary from gaining privileges in the HAG. 
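The numerical study above instantiates the outer loop of Algorithm 3; the following is a minimal Python sketch of that loop with the parameters listed above (T = 48, K = 100, N = 30000, and a lower bound of 0.1 on every α_e). The routines train_adversary and bayes_opt_defender stand in for the actor-critic and Bayesian-optimization steps of Sections 5.3.2 and 5.3.3 and are assumed interfaces rather than the exact implementation used for the experiments.

    T = 48          # attack horizon
    K = 100         # hardening iterations
    N = 30_000      # actor-critic episodes per hardening iteration
    ALPHA_LB = 0.1  # lower bound on every alpha_e

    def purple_team(hag_edges, train_adversary, bayes_opt_defender, bo_budget=25):
        # Iterative best responses (purple teaming): adversary first, then defender.
        alpha = {e: 1.0 for e in hag_edges}     # start fully un-hardened
        pi = None
        for _ in range(K):
            # Adversary best response to the current defense (Section 5.3.2).
            pi = train_adversary(alpha, episodes=N, horizon=T)
            # Defender best response to the learned attack policy (Section 5.3.3).
            alpha = bayes_opt_defender(pi, bounds=(ALPHA_LB, 1.0), budget=bo_budget)
        return pi, alpha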
Our framework optimizes network hardening and adversary cost simultaneously, resulting in robust policies for both players, leading to an approximate best- response pair for the non-zero-sum game. This approach offers a promising defense mechanism against adversarial attacks. (a) (b) Figure 5.6 (a) Cyber exploits weights obtained from the result of Algorithm 3 with a cyber cost factor of ๐‘‘๐‘’ = 0.1, 0.5 and 1.0. (b) Time to reach the physical node 9 for varying hardening cost factor and compared with the expected time to reach (๐ฝAMC) obtained from (5.11). (a) (b) (c) Figure 5.7 Sample node trajectory obtained from an attack policy with a hardening cost factor of (a) ๐‘‘๐‘’ = 1.0, (b)๐‘‘๐‘’ = 0.5, and (c) ๐‘‘๐‘’ = 0.1, where with null action corresponding to no action taken by the adversary. 178 0.00.51.0123456789101112Parametersde=0.1de=0.5de=1.00.10.51.0Hardening cost factor de1020304050Time to reach node 9JAMC0255075Time (hour)123456789NodesTransitionPhysical node0255075Time (hour)123456789NodesTransitionPhysical node0255075Time (hour)123456789NodesTransitionNullPhysical node (a) (b) Figure 5.8 Discomfort corresponding under the optimal policy {๐œ‹โˆ—, ๐œถโˆ—} for the hardening cost factor (a) ๐‘‘๐‘’ := 1.0, (b) ๐‘‘๐‘’ := 0.5, and (c) ๐‘‘๐‘’ := 0.1. (c) 5.5 Summary This chapter developed a domain-aware framework for automated adversarial defense planning, accounting for cross-layer interaction between the cyber and physical components of a CPS. Our approach leveraged an MDP with a hybrid state representing the cyber (discrete) and physical (continuous) state of the system to capture the adversaryโ€™s progression over the HAG. We formulated the automated defense planning as a non-zero-sum game between an adversary and a defender. We used Actor Critic, a RL method and Bayesian optimization to iteratively solve the adversaryโ€™s and defenderโ€™s problem, respectively. Finally, we demonstrated the effectiveness of our proposed framework on a ransomware inspired graph in conjunction with smart building dynamics. The obtained results show a hardened network for varying hardening costs along with diminishing marginal improvement for both players. A preliminary set of results with an adversary emulation were presented in [28]. The results presented in this chapter were recently published in [22]. 179 0255075Time (hour)0.00.10.20.30.4Discomfort0255075Time (hour)0.00.20.4Discomfort0255075Time (hour)0.00.10.20.30.4Discomfort CHAPTER 6 FUTURE DIRECTION In this thesis, we developed a diverse range of adversarial models commonly encountered in various CPS. For each type of adversarial model, we developed a corresponding game-theoretic framework to reason about the possible defensive actions. The first part of the thesis consisted of state-independent adversarial models. First, we presented a deterministic adversary and developed corresponding defensive strategies, demonstrating the use of such a framework in path planning applications. Next, we extended the adversarial model to a stochastic adversary, which includes both benign and adversarial actions. Furthermore, we incorporated the concept of budget and its variants. The budget enables accounting for false positives and prevents the defender from choosing overly conservative policies. We also extended the analysis to more than two actions per player and characterized the Nash equilibrium policies for both players. 
We demonstrated the use of this stochastic adversarial game-theoretic framework in motion planning problems and resilient estimation. In the second part of the thesis, we focused on state-based adversarial models, accounting for state and control-dependent costs. We presented a game of resource takeovers in dynamical systems. In such games, the adversary can completely take over the system and drive it to undesirable states. We characterized the Nash equilibrium takeover strategies for both players and derived conditions under which a linear state-feedback control law exists for both players. We applied this game- theoretic framework of resource takeovers to linear dynamical systems. Finally, we presented a data-driven domain-aware approach to safeguard CPS. We created an automated approach to emulate an adversary in a high-fidelity model and determine the corre- sponding optimal adversary policy using reinforcement learning. For the optimal adversary policy, we determined defensive strategies using Bayesian optimization. We demonstrated the application of this framework in a smart building system. We present future directions for each part of the thesis as follows: โ€ข Deterministic Adversary: Future directions encompass non-zero-sum formulations, which 180 could model different objectives for both the adversary and defender. We also aim to relax the constraint of single-edge attacks and defenses over the graph to include attacks over multiple edges. Formulations involving multiple vehicles are also a topic of future investigation. โ€ข Stochastic Adversary: Future directions include a non-zero-sum formulation of the M-SSG, taking into account different objectives for the defender and the malign player. Asymmetric or partial information in the SSG is another promising direction. Furthermore, we aim to integrate control-oriented applications into the M-SSG framework, such as the incorporation of adversarial multi-armed bandit systems and multi-plant control. โ€ข Takeover Adversary: Our future efforts will focus on expanding the scope of our model. We aim to incorporate partial state observability, wherein the discrete FlipDyn state of the system needs to be estimated. We also plan to introduce bounded process and measurement noise into the framework, investigating its impact on the FlipDyn-Con game. Additionally, we plan to extend the number of FlipDyn states to more than two. Lastly, we intend to conduct a comparative study between our established solution and a learning-based approach, evaluating their performance across various objectives and cost functions. โ€ข Data-Driven Adversary: Future work will focus on studying the convergence properties of our proposed approach. Additionally, integrating an Intrusion Detection System (IDS) and an Intrusion Response System (IRS) on the cyber layer would enable a more informed and active defender. We also plan to extend the defenderโ€™s policy from a static network hardening approach to an active network reconfiguration with one or multiple adversaries in the HAG. Exploring zero-day exploits and preemptive defense mechanisms within the framework is another area of interest. Finally, we will investigate the strategic use of backup systems and their interaction within the CPS. These backup systems could represent hidden parts of the HAG, and the defender may choose to activate them to improve the current systemโ€™s performance. 181 [1] MITRE ATT&CK, https://attack.mitre.org, 2021. 
BIBLIOGRAPHY

[1] MITRE ATT&CK, https://attack.mitre.org, 2021.

[2] Gaurav Kumar Agarwal, Mohammed Karmoose, Suhas N. Diggavi, Christina Fragouli, and Paulo Tabuada. Distorting an adversary's view in cyber-physical systems. 2018 IEEE Conference on Decision and Control (CDC), pages 1476–1481, 2018.

[3] BoHyun Ahn, Taesic Kim, Jinchun Choi, Sung-won Park, Kuchan Park, and Dongjun Won. A cyber kill chain model for distributed energy resources (der) aggregation systems. In 2021 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), pages 1–5, 2021.

[4] Abdullah Al-Dujaili, Erik Hemberg, and Una-May O'Reilly. Approximating Nash Equilibria for Black-Box Games: A Bayesian Optimization Approach. In International Workshop on Optimization in Multiagent Systems. AAMAS, 2018.

[5] Rawan Al-Shaer, Jonathan M. Spring, and Eliana Christou. Learning the associations of mitre att&ck adversarial techniques. In 2020 IEEE Conference on Communications and Network Security (CNS), pages 1–9, 2020.

[6] Otis Alexander, Misha Belisle, and Jacob Steele. Mitre att&ck® for industrial control systems: Design and philosophy. Technical report, 2020.

[7] Tansu Alpcan and Tamer Başar. Network security: A decision and game-theoretic approach. Cambridge University Press, 2010.

[8] Paul Ammann, Duminda Wijesekera, and Saket Kaushik. Scalable, graph-based network vulnerability analysis. CCS '02, page 217–224, New York, NY, USA, 2002. Association for Computing Machinery.

[9] Cyrus Anderson, Ram Vasudevan, and Matthew Johnson-Roberson. A kinematic model for trajectory prediction in general highway scenarios. IEEE Robotics and Automation Letters, 6(4):6757–6764, 2021.

[10] Anup Aprem and Stephen Roberts. A Bayesian Optimization Approach to Compute Nash Equilibrium of Potential Games Using Bandit Feedback. The Computer Journal, 64(12):1801–1813, 12 2019.

[11] Martin Arvidsson and Ida Gremyr. Principles of robust design methodology. Quality and Reliability Engineering International, 24(1):23–35, 2008.

[12] Algirdas Avizienis, J-C Laprie, Brian Randell, and Carl Landwehr. Basic concepts and taxonomy of dependable and secure computing. IEEE Transactions on Dependable and Secure Computing, 1(1):11–33, 2004.

[13] Georgios Bakirtzis, Bryan T. Carter, Carl R. Elks, and Cody H. Fleming. A model-based approach to security analysis for cyber-physical systems. In 2018 Annual IEEE International Systems Conference (SysCon), pages 1–8, 2018.

[14] Sandeep Banik and Shaunak D. Bopardikar. Secure route planning using dynamic games with stopping states. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2404–2409, 2020.

[15] Sandeep Banik and Shaunak D. Bopardikar. Attack-resilient path planning using dynamic games with stopping states. IEEE Transactions on Robotics, 38(1):25–41, 2022.

[16] Sandeep Banik and Shaunak D. Bopardikar. Flipdyn: A game of resource takeovers in dynamical systems. In 2022 IEEE 61st Conference on Decision and Control (CDC), pages 2506–2511, 2022.

[17] Sandeep Banik and Shaunak D. Bopardikar. FlipDyn: A game of resource takeovers in dynamical systems. In 2022 IEEE Conference on Decision and Control (CDC), to appear. IEEE, 2022.

[18] Sandeep Banik and Shaunak D. Bopardikar. Stochastic games with stopping states and their application to adversarial motion planning problems. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 13181–13188, 2022.

[19] Sandeep Banik and Shaunak D. Bopardikar. Stochastic games with stopping states and their application to adversarial motion planning problems. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 13181–13188, 2022.
[20] Sandeep Banik and Shaunak D. Bopardikar. Budget-based stochastic games with stopping states. European Journal of Control (under review), 2023.

[21] Sandeep Banik and Shaunak D. Bopardikar. FlipDyn with control: Resource takeover games with dynamics. IEEE Transactions on Automatic Control (under review), 2023.

[22] Sandeep Banik, Thiagarajan Ramachandran, Arnab Bhattacharya, and Shaunak D Bopardikar. Automated adversary-in-the-loop cyber-physical defense planning. ACM Transactions on Cyber-Physical Systems, 2023.

[23] Tamer Başar and Geert Jan Olsder. Dynamic Noncooperative Game Theory, 2nd Edition. Society for Industrial and Applied Mathematics, 1998.

[24] Halil Bayrak and Matthew D Bailey. Shortest path network interdiction with asymmetric information. Networks: An International Journal, 52(3):133–140, 2008.

[25] Dimitris Bertsimas and Omid Nohadani. Robust optimization with simulated annealing. Journal of Global Optimization, 48(2):323–334, 2010.

[26] Dimitris Bertsimas, Omid Nohadani, and Kwong Meng Teo. Nonconvex robust optimization for problems with constraints. INFORMS Journal on Computing, 22(1):44–58, 2010.

[27] Dimitris Bertsimas, Omid Nohadani, and Kwong Meng Teo. Robust optimization for unconstrained simulation-based problems. Operations Research, 58(1):161–178, 2010.

[28] Arnab Bhattacharya, Thiagarajan Ramachandran, Sandeep Banik, Chase P Dowling, and Shaunak D Bopardikar. Automated adversary emulation for cyber-physical systems via reinforcement learning. In 2020 IEEE International Conference on Intelligence and Security Informatics (ISI), pages 1–6. IEEE, 2020.

[29] Gianluca Bianchin, Yin-Chen Liu, and Fabio Pasqualetti. Secure navigation of robots in adversarial environments. IEEE Control Systems Letters, 4(1):1–6, 2019.

[30] Kevin D Bowers, Marten Van Dijk, Robert Griffin, Ari Juels, Alina Oprea, Ronald L Rivest, and Nikos Triandopoulos. Defending against the unknown enemy: Applying flipit to system security. In International Conference on Decision and Game Theory for Security, pages 248–263. Springer, 2012.

[31] Steven J Bradtke. Incremental dynamic programming for on-line adaptive optimal control. PhD thesis, Citeseer, 1994.

[32] Anna L Buczak and Erhan Guven. A survey of data mining and machine learning methods for cyber security intrusion detection. IEEE Communications Surveys & Tutorials, 18(2):1153–1176, 2015.

[33] Elisa Canzani and Stefan Pickl. Cyber epidemics: Modeling attacker-defender dynamics in critical infrastructure systems. In Advances in Human Factors in Cybersecurity: Proceedings of the AHFE 2016 International Conference on Human Factors in Cybersecurity, July 27-31, 2016, Walt Disney World®, Florida, USA, pages 377–389. Springer, 2016.

[34] Yulong Cao, Chaowei Xiao, Benjamin Cyr, Yimeng Zhou, Won Park, Sara Rampazzi, Qi Alfred Chen, Kevin Fu, and Zhuoqing Morley Mao. Adversarial Sensor Attack on LiDAR-based Perception in Autonomous Driving. In Proceedings of the 26th ACM Conference on Computer and Communications Security (CCS'19), London, UK, November 2019.

[35] Nicolo Cesa-Bianchi and Gábor Lugosi. Prediction, learning, and games. Cambridge University Press, 2006.
[36] Somali Chaterji, Parinaz Naghizadeh, Muhammad Ashraful Alam, Saurabh Bagchi, Mung Chiang, David Corman, Brian Henz, Suman Jana, Na Li, Shaoshuai Mou, Meeko Oishi, Chunyi Peng, Tiark Rompf, Ashutosh Sabharwal, Shreyas Sundaram, James Weimer, and Jennifer Weller. Resilient cyberphysical systems and their application drivers: A technology roadmap, 2019.

[37] Genshe Chen, Dan Shen, Chiman Kwan, Jose B. Cruz, and Martin Kruger. Game theoretic approach to threat prediction and situation awareness. In 2006 9th International Conference on Information Fusion, pages 1–8, 2006.

[38] Thomas M Chen, Juan Carlos Sanchez-Aarnoutse, and John Buford. Petri net modeling of cyber-physical attacks on smart grid. IEEE Transactions on Smart Grid, 2(4):741–749, 2011.

[39] Ye Chen, Yanda Li, Dongjin Xu, and Liang Xiao. DQN-based power control for IoT transmission against jamming. In 2018 IEEE 87th Vehicular Technology Conference (VTC Spring), pages 1–5. IEEE, 2018.

[40] Ying Chen, Shaowei Huang, Feng Liu, Zhisheng Wang, and Xinwei Sun. Evaluation of reinforcement learning-based false data injection attack to automatic voltage control. IEEE Transactions on Smart Grid, 10(2):2158–2169, 2018.

[41] Seungoh Choi, Jeong-Han Yun, and Byung-Gil Min. Probabilistic attack sequence generation and execution based on mitre att&ck for ics datasets. In Cyber Security Experimentation and Test Workshop, CSET '21, page 41–48, New York, NY, USA, 2021. Association for Computing Machinery.

[42] Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, and Clifford Stein. Introduction to algorithms. MIT Press, 2009.

[43] Mathieu Dahan and Saurabh Amin. Network flow routing under strategic link disruptions. In 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 353–360. IEEE, 2015.

[44] Eric Dallal, Daniel Neider, and Paulo Tabuada. Synthesis of safety controllers robust to unmodeled intermittent disturbances. In 2016 IEEE 55th Conference on Decision and Control (CDC), pages 7425–7430. IEEE, 2016.

[45] Peter Dayan and Terrence J Sejnowski. Td converges with probability 1. Machine Learning, 14(3):295–301, 1994.

[46] Seyed Mehran Dibaji, Mohammad Pirani, David Bezalel Flamholz, Anuradha M. Annaswamy, Karl Henrik Johansson, and Aranya Chakrabortty. A systems and control perspective of cps security. Annual Reviews in Control, 47:394–411, 2019.

[47] Jerry Ding, Maryam Kamgarpour, Sean Summers, Alessandro Abate, John Lygeros, and Claire Tomlin. A stochastic games framework for verification and control of discrete time stochastic hybrid systems. Automatica, 49(9):2665–2674, 2013.

[48] Daniel dos Santos, Clement Speybrouck, and Elisa Costante. Cybersecurity in Building Automation Systems. Technical report, Forescout Technologies, 2019.

[49] Karel Durkota, Viliam Lisy, Branislav Bošansky, and Christopher Kiekintveld. Optimal network security hardening using attack graph games. In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI'15, page 526–532. AAAI Press, 2015.

[50] Mahsa Emami-Taba and Ladan Tahvildari. A Bayesian game decision-making model for uncertain adversary types. In Proceedings of the 26th Annual International Conference on Computer Science and Software Engineering, CASCON '16, page 39–49, USA, 2016. IBM Corp.

[51] Stefano Ermon, Carla Gomes, Ashish Sabharwal, and Bart Selman. Designing fast absorbing markov chains. Proceedings of the AAAI Conference on Artificial Intelligence, 28(1), Jun. 2014.
[52] Hamza Fawzi, Paulo Tabuada, and Suhas Diggavi. Security for control systems under sensor and actuator attacks. In 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), pages 3412–3417. IEEE, 2012.

[53] Carmel Fiscko, Soummya Kar, and Bruno Sinopoli. Efficient solutions for targeted control of multi-agent mdps. In 2021 American Control Conference (ACC), pages 690–696. IEEE, 2021.

[54] Carmel Fiscko, Soummya Kar, and Bruno Sinopoli. Cluster-based control of transition-independent mdps. arXiv preprint arXiv:2207.05224, 2022.

[55] Carmel Fiscko, Brian Swenson, Soummya Kar, and Bruno Sinopoli. Control of parametric games. In 2019 18th European Control Conference (ECC), pages 1036–1042. IEEE, 2019.

[56] Yoav Freund and Robert E Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29(1-2):79–103, 1999.

[57] Andrey Garnaev, Melike Baykal-Gursoy, and H Vincent Poor. Security games with unknown adversarial strategies. IEEE Transactions on Cybernetics, 46(10):2291–2299, 2015.

[58] TN Goh. Taguchi methods: some technical, cultural and pedagogical perspectives. Quality and Reliability Engineering International, 9(3):185–202, 1993.

[59] James Hannan. Approximation to bayes risk in repeated play. Contributions to the Theory of Games, 3(2):97–139, 1957.

[60] Peter J Hawrylak, Michael Haney, Mauricio Papa, and John Hale. Using hybrid attack graphs to model cyber-physical attacks in the smart grid. In 2012 5th International Symposium on Resilient Control Systems, pages 161–164. IEEE, 2012.

[61] João P Hespanha. Noncooperative game theory: An introduction for engineers and computer scientists. Princeton University Press, 2017.

[62] João P Hespanha and Shaunak D Bopardikar. Output-feedback linear quadratic robust control under actuation and deception attacks. In 2019 American Control Conference (ACC), pages 489–496. IEEE, 2019.

[63] Karel Horák, Quanyan Zhu, and Branislav Bošanský. Manipulating adversary's belief: A dynamic game approach to deception by design for proactive network security. In Stefan Rass, Bo An, Christopher Kiekintveld, Fei Fang, and Stefan Schauer, editors, Decision and Game Theory for Security, pages 273–294, Cham, 2017. Springer International Publishing.

[64] Ashish R. Hota, Abraham A. Clements, Saurabh Bagchi, and Shreyas Sundaram. A Game-Theoretic Framework for Securing Interdependent Assets in Networks, pages 157–184. Springer International Publishing, Cham, 2018.

[65] Linan Huang and Quanyan Zhu. Dynamic Bayesian Games for Adversarial and Defensive Cyber Deception, pages 75–97. Springer International Publishing, Cham, 2019.

[66] Yunhan Huang, Zehui Xiong, and Quanyan Zhu. Cross-layer coordinated attacks on cyber-physical systems: A lqg game framework with controlled observations. In 2021 European Control Conference (ECC), pages 521–528. IEEE, 2021.

[67] Mariam Ibrahim and Ahmad Alsheikh. Automatic hybrid attack graph (ahag) generation for complex engineering systems. Processes, 7(11), 2019.

[68] Petros A Ioannou and Cheng-Chih Chien. Autonomous intelligent cruise control. IEEE Transactions on Vehicular Technology, 42(4):657–672, 1993.

[69] Eitan Israeli and R Kevin Wood. Shortest-path network interdiction. Networks: An International Journal, 40(2):97–111, 2002.

[70] Anna Jaśkiewicz and Andrzej S Nowak. On pure stationary almost markov nash equilibria in nonzero-sum arat stochastic games. Mathematical Methods of Operations Research, 81(2):169–179, 2015.

[71] A. Y. Javaid, W. Sun, V. K. Devabhaktuni, and M. Alam. Cyber security threat analysis and modeling of an unmanned aerial vehicle system. In 2012 IEEE Conference on Technologies for Homeland Security (HST), pages 585–590, Nov 2012.
[72] Yunhan Jia, Yantao Lu, Junjie Shen, Qi Alfred Chen, Hao Chen, Zhenyu Zhong, and Tao Wei. Fooling detection alone is not enough: Adversarial attack against multiple object tracking. In International Conference on Learning Representations (ICLR'20), 2020.

[73] Benjamin Johnson, Aron Laszka, and Jens Grossklags. Games of timing for security in dynamic environments. In Decision and Game Theory for Security: 6th International Conference, GameSec 2015, London, UK, November 4-5, 2015, Proceedings 6, pages 57–73. Springer, 2015.

[74] Nathan Koenig and Andrew Howard. Design and use paradigms for gazebo, an open-source multi-robot simulator. In 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No. 04CH37566), volume 3, pages 2149–2154. IEEE, 2004.

[75] Efstathios Kontouras, Anthony Tzes, and Leonidas Dritsas. Adversary control strategies for discrete-time systems. In 2014 European Control Conference (ECC), pages 2508–2513. IEEE, 2014.

[76] Efstathios Kontouras, Anthony Tzes, and Leonidas Dritsas. Covert attack on a discrete-time system with limited use of the available disruption resources. In 2015 European Control Conference (ECC), pages 812–817. IEEE, 2015.

[77] Andreas Krause, Alex Roper, and Daniel Golovin. Randomized sensing in adversarial environments. In International Joint Conference on Artificial Intelligence, 2011.

[78] Donghwoon Kwon, Hyunjoo Kim, Jinoh Kim, Sang C Suh, Ikkyun Kim, and Kuinam J Kim. A survey of deep learning-based network anomaly detection. Cluster Computing, 22(1):949–961, 2019.

[79] Harjinder Singh Lallie, Kurt Debattista, and Jay Bal. A review of attack graph and attack tree visual syntax in cyber security. Computer Science Review, 35:100219, 2020.

[80] Ralph Langner. Stuxnet: Dissecting a cyberwarfare weapon. IEEE Security & Privacy, 9(3):49–51, 2011.

[81] Aron Laszka, Gabor Horvath, Mark Felegyhazi, and Levente Buttyán. FlipThem: Modeling targeted attacks with FlipIt for multiple resources. In International Conference on Decision and Game Theory for Security, pages 175–194. Springer, 2014.

[82] Chanhwa Lee, Hyungbo Shim, and Yongsoon Eun. Secure and robust state estimation under sensor attacks, measurement noises, and process disturbances: Observer-based combinatorial approach. In 2015 European Control Conference (ECC), pages 1872–1877. IEEE, 2015.

[83] Robert M Lee, Michael J Assante, and Tim Conway. German steel mill cyber attack. Technical report, 2014.

[84] David Leslie, Chris Sherfield, and Nigel P Smart. Threshold flipthem: When the winner does not need to take all. In Decision and Game Theory for Security: 6th International Conference, GameSec 2015, London, UK, November 4-5, 2015, Proceedings 6, pages 74–92. Springer, 2015.

[85] John Leyden. Polish teen derails tram after hacking train network, 2008.

[86] Chong Li and Meikang Qiu. Reinforcement Learning for Cyber-Physical Systems: with Cybersecurity Case Studies. CRC Press, 2019.

[87] L. Li and J. S. Shamma. Efficient strategy computation in zero-sum asymmetric information repeated games. IEEE Transactions on Automatic Control, 65(7):2785–2800, 2020.

[88] Li Li, Huixia Zhang, Yuanqing Xia, and Hongjiu Yang. Security estimation under denial-of-service attack with energy constraint. Neurocomputing, 292:111–120, 2018.
[89] Yuzhe Li, Aryan Saadat Mehr, and Tongwen Chen. Multi-sensor transmission power control for remote estimation through a sinr-based communication channel. Automatica, 101:78–86, 2019.

[90] Jinliang Liu, Liang Xiao, Guolong Liu, and Yifeng Zhao. Active authentication with reinforcement learning based on ambient radio signals. Multimedia Tools and Applications, 76(3):3979–3998, 2017.

[91] Yin-Chen Liu, Gianluca Bianchin, and Fabio Pasqualetti. Secure trajectory planning against undetectable spoofing attacks. Automatica, 112:108655, 2020.

[92] Zhaoxi Liu and Lingfeng Wang. Flipit game model-based defense strategy against cyberattacks on scada systems considering insider assistance. IEEE Transactions on Information Forensics and Security, 16:2791–2804, 2021.

[93] George Louthan, Phoebe Hardwicke, Peter Hawrylak, and John Hale. Toward hybrid attack dependency graphs. In Proceedings of the Seventh Annual Workshop on Cyber Security and Information Intelligence Research, CSIIRW '11, New York, NY, USA, 2011. Association for Computing Machinery.

[94] Xiaozhen Lu, Dongjin Xu, Liang Xiao, Lei Wang, and Weihua Zhuang. Anti-jamming communication game for UAV-aided VANETs. In GLOBECOM 2017 - 2017 IEEE Global Communications Conference, pages 1–6, 2017.

[95] Mayra Macas and Wu Chunming. Enhanced cyber-physical security through deep learning techniques. In 2019 Proceedings of the Cyber-Physical Systems PhD Workshop, pages 72–83, 2013.

[96] Magdi S Mahmoud, Mutaz M Hamdan, and Uthman A Baroudi. Modeling and control of cyber-physical systems subject to cyber attacks: A survey of recent advances and challenges. Neurocomputing, 338:101–115, 2019.

[97] K. Mansfield, T. Eveleigh, T. H. Holzer, and S. Sarkani. Unmanned aerial vehicle smart device ground control station cyber security threat model. In 2013 IEEE International Conference on Technologies for Homeland Security (HST), pages 722–728, Nov 2013.

[98] J. P. McDermott. Attack net penetration testing. In Proceedings of the 2000 Workshop on New Security Paradigms, NSPW '00, page 15–21, New York, NY, USA, 2001. Association for Computing Machinery.

[99] Fei Miao, Quanyan Zhu, Miroslav Pajic, and George J Pappas. Coding schemes for securing cyber-physical systems against stealthy data injection attacks. IEEE Transactions on Control of Network Systems, 4(1):106–117, 2016.

[100] Fei Miao, Quanyan Zhu, Miroslav Pajic, and George J Pappas. A hybrid stochastic game for secure control of cyber-physical systems. Automatica, 93:55–63, 2018.

[101] Erik Miehling, Cedric Langbort, and Tamer Başar. Secure contingency prediction and response for cyber-physical systems. In 2020 IEEE Conference on Control Technology and Applications (CCTA), pages 998–1003, 2020.

[102] Erik Miehling, Mohammad Rasouli, and Demosthenis Teneketzis. Optimal defense policies for partially observable spreading processes on bayesian attack graphs. In Proceedings of the Second ACM Workshop on Moving Target Defense, MTD '15, page 67–76, New York, NY, USA, 2015. Association for Computing Machinery.

[103] Erik Miehling, Mohammad Rasouli, and Demosthenis Teneketzis. Control-Theoretic Approaches to Cyber-Security, page 12–28. Springer-Verlag, Berlin, Heidelberg, 2022.

[104] Yilin Mo, Joao Hespanha, and Bruno Sinopoli. Robust detection in the presence of integrity attacks. In 2012 American Control Conference (ACC), pages 3541–3546. IEEE, 2012.

[105] Athira M Mohan, Nader Meskin, and Hasan Mehrjerdi. Covert attack in load frequency control of power systems. In 2020 6th IEEE International Energy Conference (ENERGYCon), pages 802–807. IEEE, 2020.
[106] Luan Nguyen and Vijay Gupta. Towards a framework of enforcing resilient operation of cyber-physical systems with unknown dynamics. IET Cyber-Physical Systems: Theory & Applications, 6(3):125–138, 2021.

[107] Thanh Thi Nguyen and Vijay Janapa Reddi. Deep reinforcement learning for cyber security. IEEE Transactions on Neural Networks and Learning Systems, pages 1–17, 2021.

[108] Zhen Ni and Shuva Paul. A multistage game in smart grid security: A reinforcement learning solution. IEEE Transactions on Neural Networks and Learning Systems, 30(9):2684–2695, 2019.

[109] Felix O. Olowononi, Danda B Rawat, and Chunmei Liu. Resilient machine learning for networked cyber physical systems: A survey for machine learning security to securing machine learning for cps. IEEE Communications Surveys & Tutorials, 23(1):524–552, 2021.

[110] Miroslav Pajic, James Weimer, Nicola Bezzo, Paulo Tabuada, Oleg Sokolsky, Insup Lee, and George J Pappas. Robustness of attack-resilient state estimators. In 2014 ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS), pages 163–174. IEEE, 2014.

[111] Kyuchan Park, Bohyun Ahn, Jinsan Kim, Dongjun Won, Youngtae Noh, Jinchun Choi, and Taesic Kim. An advanced persistent threat (apt)-style cyberattack testbed for distributed energy resources (der). In 2021 IEEE Design Methodologies Conference (DMC), pages 1–5, 2021.

[112] Praveen Paruchuri, Jonathan P Pearce, Milind Tambe, Fernando Ordonez, and Sarit Kraus. An efficient heuristic approach for security against multiple adversaries. In Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems, pages 1–8, 2007.

[113] Martin Pelikan, David E. Goldberg, and Erick Cantú-Paz. Boa: The bayesian optimization algorithm. In Proceedings of the 1st Annual Conference on Genetic and Evolutionary Computation - Volume 1, GECCO'99, page 525–532, San Francisco, CA, USA, 1999. Morgan Kaufmann Publishers Inc.

[114] Lianghong Peng, Ling Shi, Xianghui Cao, and Changyin Sun. Optimal attack energy allocation against remote state estimation. IEEE Transactions on Automatic Control, 63(7):2199–2205, 2018.

[115] Lina Perelman and Saurabh Amin. A network interdiction model for analyzing the vulnerability of water distribution systems. In Proceedings of the 3rd International Conference on High Confidence Networked Systems, pages 135–144, 2014.

[116] Damien Picard, Ján Drgoňa, Michal Kvasnica, and Lieve Helsen. Impact of the controller model complexity on model predictive control performance for buildings. Energy and Buildings, 152:739–751, 2017.

[117] Victor Picheny, Mickael Binois, and Abderrahmane Habbal. A Bayesian optimization approach to find Nash equilibria. Journal of Global Optimization, 73(1):171–192, 2019.

[118] PNNL. Python systems library, 2019.

[119] J-P Ponssard and Sylvain Sorin. The lp formulation of finite zero-sum games with incomplete information. International Journal of Game Theory, 9(2):99–105, 1980.

[120] Tereza Pultarova. Cyber security-ukraine grid hack is wake-up call for network operators [news briefing]. Engineering & Technology, 11(1):12–13, 2016.

[121] Tirukkannamangai ES Raghavan, SH Tijs, and OJ Vrieze. On stochastic games with additive reward and transition structure. Journal of Optimization Theory and Applications, 47(4):451–464, 1985.

[122] Nils Miro Rodday, Ricardo de O Schmidt, and Aiko Pras. Exploring security vulnerabilities of unmanned aerial vehicles. In NOMS 2016 - 2016 IEEE/IFIP Network Operations and Management Symposium, pages 993–994. IEEE, 2016.
[123] Sudip Saha, Anil Vullikanti, and Mahantesh Halappanavar. Flipnet: Modeling covert and persistent attacks on networked resources. In 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pages 2444–2451, 2017.

[124] Sudip Saha, Anil Kumar S. Vullikanti, Mahantesh Halappanavar, and Samrat Chatterjee. Identifying vulnerabilities and hardening attack graphs for networked systems. In 2016 IEEE Symposium on Technologies for Homeland Security (HST), pages 1–6, 2016.

[125] Dinuka Sahabandu, Shana Moothedath, Joey Allen, Linda Bushnell, Wenke Lee, and Radha Poovendran. Stochastic dynamic information flow tracking game with reinforcement learning. In Tansu Alpcan, Yevgeniy Vorobeychik, John S. Baras, and György Dán, editors, Decision and Game Theory for Security, pages 417–438, Cham, 2019. Springer International Publishing.

[126] Dinuka Sahabandu, Shana Moothedath, Joey Allen, Linda Bushnell, Wenke Lee, and Radha Poovendran. A reinforcement learning approach for dynamic information flow tracking games for detecting advanced persistent threats, 2021.

[127] Dinuka Sahabandu, Baicen Xiao, Andrew Clark, Sangho Lee, Wenke Lee, and Radha Poovendran. Dift games: Dynamic information flow tracking games for advanced persistent threats. In 2018 IEEE Conference on Decision and Control (CDC), pages 1136–1143, 2018.

[128] Ahmed Salem, Xuening Liao, Yulong Shen, and Xiang Lu. Provoking the adversary by dual detection techniques: A game theoretical framework. In 2017 International Conference on Networking and Network Applications (NaNA), pages 326–329. IEEE, 2017.

[129] Anibal Sanjab, Walid Saad, and Tamer Basar. A game of drones: Cyber-physical security of time-critical UAV applications with cumulative prospect theory perceptions and valuations. CoRR, abs/1902.03506, 2019.

[130] Anibal Sanjab, Walid Saad, and Tamer Başar. Prospect theory for enhanced cyber-physical security of drone delivery systems: A network interdiction game. 2017 IEEE International Conference on Communications (ICC), pages 1–6, 2017.

[131] Aaron Schlenker, Omkar Thakoor, Haifeng Xu, Fei Fang, Milind Tambe, Long Tran-Thanh, Phebe Vayanos, and Yevgeniy Vorobeychik. Deceiving Cyber Adversaries: A Game Theoretic Approach. AAMAS '18, page 892–900, Richland, SC, 2018. International Foundation for Autonomous Agents and Multiagent Systems.

[132] Lloyd S Shapley and RN Snow. Basic solutions of discrete games. Contributions to the Theory of Games, 1:27–35, 1952.

[133] Devendra Shelar and Saurabh Amin. Security assessment of electricity distribution networks under DER node compromises. IEEE Transactions on Control of Network Systems, 4(1):23–36, 2016.

[134] Dan Simon. Optimal state estimation: Kalman, H infinity, and nonlinear approaches. John Wiley & Sons, 2006.

[135] Roy S Smith. Covert misappropriation of networked control systems: Presenting a feedback structure. IEEE Control Systems Magazine, 35(1):82–92, 2015.

[136] Niranjan Srinivas, Andreas Krause, Sham M Kakade, and Matthias W Seeger. Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 58(5):3250–3265, 2012.

[137] G Edward Suh, Jae W Lee, David Zhang, and Srinivas Devadas. Secure program execution via dynamic information flow tracking. ACM Sigplan Notices, 39(11):85–96, 2004.
[138] Jiachen Sun, Yulong Cao, Qi Alfred Chen, and Z Morley Mao. Towards robust lidar-based perception in autonomous driving: General black-box adversarial sensor attack and countermeasures. In 29th USENIX Security Symposium (USENIX Security 20), pages 877–894, 2020.

[139] Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT Press, 2018.

[140] Richard S. Sutton, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver, Csaba Szepesvári, and Eric Wiewiora. Fast gradient-descent methods for temporal-difference learning with linear function approximation. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, page 993–1000, New York, NY, USA, 2009. Association for Computing Machinery.

[141] Wen Tian, Xiao-Peng Ji, Weiwei Liu, Jiangtao Zhai, Guangjie Liu, Yuewei Dai, and Shuhua Huang. Honeypot game-theoretical model for defending against apt attacks with limited resources in cyber-physical systems. Etri Journal, 41(5):585–598, 2019.

[142] Anastasios Tsiamis, Andreea B. Alexandru, and George J. Pappas. Motion planning with secrecy. 2019 American Control Conference (ACC), pages 784–791, 2019.

[143] Kyriakos G Vamvoudakis, João P Hespanha, Bruno Sinopoli, and Yilin Mo. Adversarial detection as a zero-sum game. In 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), pages 7133–7138. IEEE, 2012.

[144] Kyriakos G Vamvoudakis, Joao P Hespanha, Bruno Sinopoli, and Yilin Mo. Detection in adversarial environments. IEEE Transactions on Automatic Control, 59(12):3209–3223, 2014.

[145] Marten Van Dijk, Ari Juels, Alina Oprea, and Ronald L Rivest. Flipit: The game of “stealthy takeover”. Journal of Cryptology, 26(4):655–713, 2013.

[146] Huan Wang, Zhanfang Chen, Jianping Zhao, Xiaoqiang Di, and Dan Liu. A vulnerability assessment method in industrial internet of things based on attack graph and maximum flow. IEEE Access, 6:8599–8609, 2018.

[147] Alan Washburn and Kevin Wood. Two-person zero-sum games for network interdiction. Operations Research, 43(2):243–251, 1995.

[148] Chathurika S. Wickramasinghe, Daniel L. Marino, Kasun Amarasinghe, and Milos Manic. Generalization of deep learning for cyber-physical system security: A survey. In IECON 2018 - 44th Annual Conference of the IEEE Industrial Electronics Society, pages 745–751, 2018.

[149] Liang Xiao, Yan Li, Guoan Han, Guolong Liu, and Weihua Zhuang. Phy-layer spoofing detection with reinforcement learning in wireless networks. IEEE Transactions on Vehicular Technology, 65(12):10037–10047, 2016.

[150] Wei Xing, Xudong Zhao, Tamer Başar, and Weiguo Xia. Security investment in cyber-physical systems: Stochastic games with asymmetric information and resource-constrained players. IEEE Transactions on Automatic Control, 67(10):5384–5391, 2021.

[151] Xin Xu, Lei Zuo, and Zhenhua Huang. Reinforcement learning algorithms with function approximation: Recent advances and applications. Information Sciences, 261:1–31, 2014.

[152] Jun Yan, Haibo He, Xiangnan Zhong, and Yufei Tang. Q-learning-based vulnerability analysis of smart grid against sequential topology attacks. IEEE Transactions on Information Forensics and Security, 12(1):200–210, 2016.

[153] Dayong Ye, Tianqing Zhu, Sheng Shen, and Wanlei Zhou. A differentially private game theoretic approach for deceiving cyber adversaries. IEEE Transactions on Information Forensics and Security, 16:569–584, 2020.
[154] Yuriy Zacchia Lun, Alessandro D'Innocenzo, Francesco Smarra, Ivano Malavolta, and Maria Domenica Di Benedetto. State of the art of cyber-physical systems security: An automatic control perspective. Journal of Systems and Software, 149:174–216, 2019.

[155] Heng Zhang, Peng Cheng, Ling Shi, and Jiming Chen. Optimal denial-of-service attack scheduling with energy constraint. IEEE Transactions on Automatic Control, 60(11):3023–3028, 2015.

[156] Kaiqing Zhang, Zhuoran Yang, and Tamer Başar. Multi-agent reinforcement learning: A selective overview of theories and algorithms, pages 321–384. Springer, Cham, 2021.

[157] Ming Zhang, Zizhan Zheng, and Ness B Shroff. A game theoretic model for defending against stealthy attacks with limited resources. In Decision and Game Theory for Security: 6th International Conference, GameSec 2015, London, UK, November 4-5, 2015, Proceedings 6, pages 93–112. Springer, 2015.

[158] Ming Zhang, Zizhan Zheng, and Ness B Shroff. Defending against stealthy attacks on multiple nodes with limited resources: A game-theoretic analysis. IEEE Transactions on Control of Network Systems, 7(4):1665–1677, 2020.

[159] Yichi Zhang, Yingmeng Xiang, and Lingfeng Wang. Power system reliability assessment incorporating cyber attacks against wind farm energy management systems. IEEE Transactions on Smart Grid, 8(5):2343–2357, 2016.

[160] Lifeng Zhou, Vasileios Tzoumas, George J. Pappas, and Pratap Tokekar. Distributed Attack-Robust Submodular Maximization for Multi-Robot Planning. arXiv e-prints, page arXiv:1910.01208, October 2019.

[161] Quanyan Zhu and Tamer Basar. Game-theoretic methods for robustness, security, and resilience of cyberphysical control systems: games-in-games principle for optimal cross-layer resilient control systems. IEEE Control Systems Magazine, 35(1):46–65, 2015.