Deep Multi-agent Reinforcement Learning for Efficient and Scalable Networked System Control
Recently, intelligent systems such as robots, connected automated vehicles, and smart grids have emerged as promising tools to enhance efficiency and sustainability across diverse areas, including intelligent transportation, industrial automation, and energy management. These systems can be connected via local communication networks, forming networked systems with high scalability and robustness. Yet controlling these networked systems presents great challenges, mainly due to the high dimensionality of their state/action spaces and the complex interactions among their components. Traditional control methods often struggle with the real-time management of these systems, given their inherent complexity and uncertainty. Fortunately, reinforcement learning (RL), especially multi-agent reinforcement learning (MARL), offers an effective solution through its adaptive online capabilities and its proficiency in solving intricate problems. In this thesis, three deep MARL algorithms are developed for safe, efficient, and scalable networked system control (NSC). The efficacy of these algorithms is validated in several practical, real-world applications, including power grids and connected automated vehicles (CAVs).

In the first algorithm, a safe, scalable, and efficient MARL framework is introduced for on-ramp merging in mixed-traffic scenarios, where both human-driven vehicles and CAVs are present. By leveraging parameter sharing and local reward design, the framework fosters cooperation among agents without compromising scalability. To reduce collision rates and expedite training, a priority-based safety supervisor is developed and incorporated into the MARL framework. In addition, a gym-like simulation environment with three traffic density levels is developed and open-sourced.
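The parameter-sharing idea mentioned above can be illustrated with a minimal sketch: a single policy network is shared by all agents, each of which feeds in only its own local observation. All class and variable names here are hypothetical and the "network" is reduced to one weight matrix for brevity; this is not the thesis's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

class SharedPolicy:
    """One set of parameters used by every agent (parameter sharing)."""

    def __init__(self, obs_dim, n_actions):
        # A single weight matrix stands in for a full policy network.
        self.W = rng.normal(scale=0.1, size=(obs_dim, n_actions))

    def act(self, local_obs):
        # Each agent maps its own local observation to action logits
        # through the same shared parameters.
        logits = local_obs @ self.W
        return int(np.argmax(logits))  # greedy action, for illustration

policy = SharedPolicy(obs_dim=4, n_actions=3)

# Five CAV agents query the same network with their own observations,
# so the parameter count stays constant as agents are added.
observations = [rng.normal(size=4) for _ in range(5)]
actions = [policy.act(obs) for obs in observations]
print(actions)
```

Because the parameter count does not grow with the number of agents, this design is what allows the framework to scale across the three traffic density levels without retraining a separate policy per vehicle.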
Extensive experimental results show that the proposed MARL model consistently surpasses several state-of-the-art (SOTA) benchmarks, demonstrating its promise for managing CAVs in on-ramp merging scenarios.

In the second exploration, we propose a fully decentralized MARL framework for cooperative adaptive cruise control (CACC). This approach differs substantially from the conventional centralized training and decentralized execution (CTDE) paradigm: each agent acts based on its own observations and rewards, eliminating the need for a central controller. We further introduce a quantization-based communication protocol that improves communication efficiency and reduces bandwidth consumption by applying randomized rounding to each transmitted data element and sending only the non-zero components after quantization. Validation on two distinct CACC scenarios shows that our method outperforms SOTA models in both control precision and communication efficiency.

In the third exploration, we present an efficient MARL algorithm for cooperative control of power grids. In particular, we focus on decentralized inverter-based secondary voltage control, formulating it as a cooperative MARL problem. We then introduce a novel on-policy MARL algorithm, named PowerNet, in which each agent (i.e., each distributed generator (DG)) learns a control policy from a (sub-)global reward as well as encoded communication messages from its neighbors. Additionally, a novel spatial discount factor is introduced to mitigate the effect of remote agents, expedite training, and improve scalability. Moreover, a differentiable, learning-based communication protocol is employed to enhance collaboration among adjacent agents. To support comprehensive training and evaluation, we introduce PGSim, an open-source, high-performance power grid simulation platform.
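The quantization-based communication protocol can be sketched as follows: randomized rounding quantizes each message element to a grid unbiasedly (the quantized value equals the original in expectation), and only the non-zero components are then transmitted as (index, value) pairs. The step size, message vector, and function names below are illustrative assumptions, not the thesis's exact protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_round(x, step=0.5):
    """Quantize x to multiples of `step` via randomized rounding:
    round up with probability equal to the fractional part, so the
    quantized value is unbiased (E[q] == x)."""
    scaled = x / step
    low = np.floor(scaled)
    prob_up = scaled - low                 # fractional part in [0, 1)
    up = rng.random(x.shape) < prob_up     # stochastic round-up decision
    return (low + up) * step

# A hypothetical message vector one agent would broadcast each step.
message = np.array([0.12, -0.03, 0.87, 0.0, -0.49])
quantized = randomized_round(message)

# Transmit only the non-zero components, as (index, value) pairs,
# which shrinks the payload whenever quantization zeroes out entries.
payload = [(i, float(v)) for i, v in enumerate(quantized) if v != 0.0]
print(payload)
```

Small-magnitude entries are rounded to zero most of the time and dropped from the payload, which is where the bandwidth savings come from, while the unbiasedness of randomized rounding keeps the received messages statistically faithful.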
The evaluation across two microgrid configurations shows that PowerNet outperforms not only conventional model-based control techniques but also several SOTA MARL strategies.
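One plausible reading of the spatial discount factor described above is a learning signal in which each neighbor's reward is weighted by alpha raised to its hop distance on the communication graph, so remote agents contribute less. The functional form, alpha value, and names below are assumptions for illustration, not PowerNet's exact definition.

```python
import numpy as np

def spatially_discounted_reward(rewards, hop_dist, alpha=0.7):
    """Weight each agent's reward by alpha**d, where d is its hop
    distance from the learning agent, and sum the result. Nearby
    agents dominate; remote agents are progressively discounted."""
    weights = alpha ** np.asarray(hop_dist, dtype=float)
    return float(np.sum(weights * np.asarray(rewards, dtype=float)))

# Rewards of four DGs and their hop distances from the learning agent
# (distance 0 is the agent itself).
rewards = [1.0, 0.5, -0.2, 0.8]
dists = [0, 1, 2, 3]
print(spatially_discounted_reward(rewards, dists))
# contributions: 1.0, 0.7*0.5, 0.49*(-0.2), 0.343*0.8
```

Shrinking the effective reward horizon to an agent's neighborhood is what lets training cost stay roughly constant as the grid grows, which matches the scalability claim above.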
- In Collections: Electronic Theses & Dissertations
- Copyright Status: Attribution-NonCommercial-ShareAlike 4.0 International
- Material Type: Theses
- Authors: Chen, Dong
- Thesis Advisors: Li, Zhaojian
- Committee Members: Srivastava, Vaibhav; Bopardikar, Shaunak D.; Modares, Hamidreza
- Date Published: 2023
- Program of Study: Electrical and Computer Engineering - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: 104 pages
- Permalink: https://doi.org/doi:10.25335/rqvt-z015