Electrical and Computer Engineering ETDs

Publication Date

Spring 5-11-2024

Abstract

In the dynamic landscape of autonomous aerial systems, the integration of uncrewed aerial vehicles (UAVs) has sparked a paradigm shift, offering unprecedented opportunities and challenges in collaborative decision-making and navigation. This thesis explores the application of multi-agent reinforcement learning (MARL) for the planning and coordination of UAVs in complex environments.

The first part of this thesis provides an introduction to single-agent reinforcement learning and MARL, with examples of using MARL for countering uncrewed aerial systems (C-UAS). We formulate the counter-UAS problem as a multi-agent partially observable Markov decision process (MAPOMDP), and we propose Multi-AGent partial observable deep reiNforcement lEarning for pursuer conTrol optimization (MAGNET) to train a group of UASs, the pursuers or agents, to pursue and intercept a faster UAS, the evader, which tries to escape capture while navigating crowded airspace containing several moving non-cooperating interacting entities (NCIEs). In MAGNET, we integrate a Control Barrier Function (CBF) based safety layer into proximal policy optimization (PPO) to provide safety guarantees during both training and testing. In addition, we incorporate a DeepSet network into MAGNET to handle the time-varying dimension of an agent's observations. Extensive simulations show that MAGNET maintains a collision-free environment at the cost of a slight reduction in the evader capture rate compared to the baseline implementations.
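The core idea of a CBF-based safety layer is to take the action proposed by the learned policy and minimally modify it so that a safety constraint stays satisfied. The following is a minimal illustrative sketch, not the thesis's implementation: it assumes a single affine constraint `a @ u >= b` (as arises from one CBF inequality), for which the safety-filter quadratic program has a closed-form projection; the function name and arguments are hypothetical.

```python
import numpy as np

def cbf_safety_filter(u_nom, a, b):
    """Minimally adjust the nominal control u_nom (e.g., from a PPO policy)
    so that the affine CBF constraint a @ u >= b holds.

    This is the closed-form solution of
        min_u ||u - u_nom||^2   s.t.   a @ u >= b,
    i.e., a Euclidean projection onto the constraint half-space."""
    slack = b - a @ u_nom
    if slack <= 0.0:
        # Nominal control is already safe; pass it through unchanged.
        return u_nom
    # Otherwise, push u_nom the minimum distance onto the constraint boundary.
    return u_nom + (slack / (a @ a)) * a
```

With multiple simultaneous constraints (several nearby NCIEs or agents), the projection no longer has this closed form and a small quadratic program is solved per time step instead; the pass-through behavior for already-safe actions is what keeps the filter from distorting the learned policy unnecessarily.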

The second part of this thesis deals with learning safe control methods for multi-agent systems. We explore a more complicated scenario in advanced air mobility applications, where a group of autonomous uncrewed aerial vehicles (UAVs) may need to cooperate to arrive at their predefined destinations simultaneously, for example, to attack a target or carry heavy cargo. Controlling a group of UAVs to arrive simultaneously is nontrivial because they must satisfy spatial constraints: the control algorithm has to avoid collisions not only among the UAVs themselves but also between UAVs and non-cooperative flying objects (NCFOs), which are not coordinated by the control algorithm. Existing time-coordinated control algorithms can achieve simultaneous arrival for a multi-UAV system but cannot guarantee collision-free operation. We therefore propose a safe linear quadratic optimal control algorithm comprising two major parts: a time-coordinated planner and a safety layer. The time-coordinated planner derives the accelerations of the UAVs that minimize the difference between each UAV's arrival time and the predefined termination time, and the safety layer applies a control barrier function based solution to generate feasible accelerations that keep the environment collision-free.
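The arrival-time matching done by a time-coordinated planner can be illustrated with a deliberately simplified sketch: assuming 1-D motion along each UAV's remaining path and a constant acceleration over the remaining horizon, the kinematic relation d = v t + a t²/2 gives the acceleration that makes the UAV cover the remaining distance exactly at the predefined termination time. This is only a back-of-the-envelope version of the idea, not the linear quadratic formulation in the thesis; the function name is hypothetical.

```python
def coordination_accel(dist_remaining, speed, time_remaining):
    """Constant acceleration that covers dist_remaining in exactly
    time_remaining, starting from the current speed:
        d = v t + a t^2 / 2   =>   a = 2 (d - v t) / t^2
    Applying this per UAV drives every arrival time toward the common
    predefined termination time."""
    t = time_remaining
    if t <= 0.0:
        raise ValueError("time_remaining must be positive")
    return 2.0 * (dist_remaining - speed * t) / t**2
```

In the full algorithm, the planner's output would then pass through the CBF-based safety layer, which perturbs these accelerations only when a collision constraint would otherwise be violated.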

Finally, we use the MARL framework to solve the terminal time-coordination problem, achieving simultaneous arrival of the UAVs at their destinations while avoiding collisions with other UAVs and non-cooperative flying objects (NCFOs).

Keywords

reinforcement learning, machine learning, uncrewed aerial vehicles, safety

Sponsors

Sandia National Laboratories, Air Force Research Laboratory.

Document Type

Dissertation

Language

English

Degree Name

Computer Engineering

Level of Degree

Doctoral

Department Name

Electrical and Computer Engineering

First Committee Member (Chair)

Rafael Fierro

Second Committee Member

Xiang Sun

Third Committee Member

Marios Pattichis

Fourth Committee Member

Claus Danielson
