- Optimal Learning of Deployment and Search Strategies for Robotic Teams
- Wei, Lai
- Electronic Theses & Dissertations
In the problem of optimal learning, the dilemma of exploration and exploitation stems from the fact that gathering information and exploiting it are, in many cases, two mutually exclusive activities. The key to optimal learning is to strike a balance between exploration and exploitation. The Multi-Armed Bandit (MAB) problem is a prototypical example of such an explore-exploit tradeoff, in which a decision-maker sequentially allocates a single resource by repeatedly choosing one among a set of options that provide stochastic rewards. The MAB setup has been applied to many robotics problems, such as foraging, surveillance, and target search, wherein the task of the robots can be modeled as collecting stochastic rewards. The theoretical work of this dissertation builds on the MAB setup, and three problem variations are studied: heavy-tailed bandits, nonstationary bandits, and multi-player bandits. The first two variations capture two key features of stochastic feedback in complex and uncertain environments, namely heavy-tailed distributions and nonstationarity, while the last addresses the problem of achieving coordination in uncertain environments. We design several algorithms that are robust to heavy-tailed distributions and nonstationary environments. In addition, two distributed policies that require no communication among agents are designed for multi-player stochastic bandits in a piecewise-stationary environment.
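The stationary MAB setup summarized above admits a compact illustration. The sketch below is a generic UCB1-style index policy on Gaussian rewards, not one of the dissertation's algorithms (which handle heavy tails, nonstationarity, and multiple players); the arm means, noise level, horizon, and exploration constant are illustrative assumptions.

```python
import numpy as np

# Minimal UCB1-style sketch of the explore-exploit tradeoff in a stationary
# stochastic MAB. Generic baseline only; arm means, noise level, horizon, and
# the exploration bonus are assumed for illustration.
rng = np.random.default_rng(0)
means = np.array([0.2, 0.5, 0.45, 0.8])        # unknown to the learner
T, K = 2000, len(means)

counts = np.zeros(K)                            # times each arm was pulled
estimates = np.zeros(K)                         # empirical mean reward per arm
regret = 0.0

for t in range(T):
    if t < K:
        arm = t                                 # pull each arm once to initialize
    else:
        ucb = estimates + np.sqrt(2 * np.log(t) / counts)   # optimism bonus
        arm = int(np.argmax(ucb))               # exploit, but keep exploring
    reward = means[arm] + 0.1 * rng.standard_normal()
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]
    regret += means.max() - means[arm]

print(f"cumulative regret after {T} rounds: {regret:.1f}")
```

The index adds an optimism bonus that shrinks as an arm is sampled more often, which is the balance between gathering information and exploiting it that the abstract refers to.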
The MAB problems provide a natural framework to study robotic search problems: the above variations directly map to search tasks in which a robot team searches for a target from a fixed set of viewpoints (arms). We further focus on the class of search problems involving an unknown number of targets in a large or continuous space. We view the multi-target search problem as a hot-spot identification problem in which, instead of the global maximum of the field, all locations with a value greater than a threshold need to be identified. We consider a robot moving in 3D space with a downward-facing camera sensor and model its sensing output using a multi-fidelity Gaussian Process (GP) that systematically describes the sensing information available at different altitudes above the floor. Based on this sensing model, we design a novel algorithm that (i) addresses the coverage-accuracy tradeoff: sampling at a location farther from the floor provides a wider field of view but less accurate measurements, (ii) computes an occupancy map of the floor within a prescribed accuracy and quickly eliminates unoccupied regions from the search space, and (iii) travels efficiently to collect the required samples for target detection. We rigorously analyze the algorithm and establish formal guarantees on the target detection accuracy and the detection time.
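As a rough illustration of the hot-spot identification step described above, the sketch below classifies locations of a 1-D field against a threshold using a single-fidelity GP posterior; the dissertation's multi-fidelity sensing model, altitude planning, and specific elimination rule are not reproduced here. The kernel, threshold, and confidence width are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, length=0.2, var=1.0):
    # squared-exponential kernel between two sets of points
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return var * np.exp(-0.5 * d2 / length**2)

def gp_posterior(X_train, y_train, X_test, noise=0.05):
    # standard GP regression posterior mean and standard deviation
    K = rbf_kernel(X_train, X_train) + noise**2 * np.eye(len(X_train))
    Ks = rbf_kernel(X_train, X_test)
    Kss = rbf_kernel(X_test, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss) - np.sum(v**2, axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 50)[:, None]          # candidate floor locations (1-D for brevity)
field = lambda x: np.sin(6 * x).ravel()        # unknown field; hot-spots are values above tau
tau, beta = 0.5, 2.0                           # threshold and confidence width (assumed)

X = rng.uniform(0, 1, (15, 1))                 # samples collected so far
y = field(X) + 0.05 * rng.standard_normal(15)

mu, sigma = gp_posterior(X, y, grid)
occupied = mu - beta * sigma > tau             # confidently above threshold
eliminated = mu + beta * sigma < tau           # confidently below: drop from search space
uncertain = ~(occupied | eliminated)           # needs more (possibly lower-altitude) samples
print(f"occupied: {occupied.sum()}, eliminated: {eliminated.sum()}, uncertain: {uncertain.sum()}")
```

Regions whose upper confidence bound falls below the threshold can be discarded, which mirrors the idea of quickly eliminating unoccupied regions from the search space.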
An approach to extending the single-robot search policy to multiple robots is to partition the environment into multiple regions such that the workload is equitably distributed among them, and then assign a robot to each region. Coverage control focuses on such equitable partitioning, where the workload corresponds to the so-called service demands in the coverage-control literature. In particular, we study the adaptive coverage control problem, in which the demands for robotic service within the environment are modeled as a GP. To optimize the coverage of service demands in the environment, the team of robots aims to partition the environment and achieve a configuration that minimizes the coverage cost, a measure of the average distance of a service demand from the nearest robot. The robots must address the explore-exploit tradeoff: to minimize the coverage cost, they need to gather information about demands within the environment, whereas information gathering diverts them from maintaining a good coverage configuration. We propose an algorithm that schedules learning and coverage epochs such that its emphasis gradually shifts from exploration to exploitation while never fully ceasing to learn. Using a novel definition of coverage regret, we analyze the algorithm and characterize its coverage performance over a finite time horizon.
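For concreteness, the sketch below computes a squared-distance variant of the coverage cost on a discretized environment and runs a few Lloyd-type updates toward density-weighted centroids. Here the demand density is a known toy function, whereas in the adaptive setting above it must be learned online, which is exactly what forces the scheduling of learning and coverage epochs; all names and parameters are illustrative, not the dissertation's algorithm.

```python
import numpy as np

def coverage_cost(robots, pts, phi):
    # density-weighted average squared distance of a demand to its nearest robot
    d2 = ((pts[:, None, :] - robots[None, :, :]) ** 2).sum(-1)   # (points, robots)
    return float((phi * d2.min(axis=1)).sum() / phi.sum())

def lloyd_step(robots, pts, phi):
    # move each robot to the density-weighted centroid of its Voronoi cell
    d2 = ((pts[:, None, :] - robots[None, :, :]) ** 2).sum(-1)
    owner = d2.argmin(axis=1)
    new = robots.copy()
    for r in range(len(robots)):
        w = phi[owner == r]
        if w.sum() > 0:
            new[r] = (w[:, None] * pts[owner == r]).sum(0) / w.sum()
    return new

# toy environment: unit-square grid with demand concentrated near (0.7, 0.3)
g = np.linspace(0, 1, 40)
pts = np.array([(x, y) for x in g for y in g])
phi = np.exp(-10 * ((pts[:, 0] - 0.7) ** 2 + (pts[:, 1] - 0.3) ** 2))

robots = np.random.default_rng(1).uniform(0, 1, (4, 2))
for _ in range(20):
    robots = lloyd_step(robots, pts, phi)
print("coverage cost after 20 Lloyd steps:", round(coverage_cost(robots, pts, phi), 4))
```

When the density phi is unknown, each Lloyd step can only use an estimate of it, so the team must interleave sampling the environment with reconfiguring, which is the explore-exploit tradeoff the abstract describes.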