Yoann Sola PhD

v.2021-12

Contributions to the development of Deep Reinforcement Learning-based controllers for AUV

Defense (link)

  • funding : AID and Region Bretagne

  • December 3rd, 2021 at ENSTA Bretagne

  • Supervisors : Gilles LE CHENADEC and Benoit CLEMENT

  • a video link is here

Dissertation is available here (link) and the slides are here (link)

Board of Examiners

  • Jean-Philippe DIGUET - IRL CROSSING - reviewer - (report)

  • Vincent CREUZE - LIRMM Montpellier - reviewer - (report)

  • Veronique SERFATY - AID

  • Ali MANSOUR - Lab-STICC

  • Gilles LE CHENADEC - Lab-STICC - Supervisor

  • Benoit CLEMENT - Lab-STICC - Supervisor

  • Benoit DESROCHERS (invited)

Abstract

The marine environment is a very hostile setting for robotics. It is strongly unstructured, highly uncertain, and subject to many external disturbances that cannot easily be predicted or modelled. In this work, we control an Autonomous Underwater Vehicle (AUV) on a waypoint tracking task using a machine learning-based controller. Machine learning has enabled impressive progress across many domains in recent years, and the subfield of deep reinforcement learning has produced several algorithms well suited to the continuous control of dynamical systems. We chose to implement the Soft Actor-Critic (SAC) algorithm, an entropy-regularized deep reinforcement learning algorithm that pursues the learning task while simultaneously encouraging exploration of the environment. We compared a SAC-based controller with a Proportional-Integral-Derivative (PID) controller on a waypoint tracking task, using specific performance metrics. All tests were performed in simulation with the UUV Simulator. We applied these two controllers to the RexROV 2, a six degrees of freedom cube-shaped Remotely Operated underwater Vehicle (ROV) converted into an AUV. Through these tests, we propose several contributions: making the SAC achieve end-to-end control of the AUV, outperforming the PID controller in terms of energy saving, and reducing the amount of information needed by the SAC algorithm. Moreover, we propose a methodology for training deep reinforcement learning algorithms on control tasks, as well as a discussion of the absence of guidance algorithms for our end-to-end AUV controller.
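As a point of reference for the comparison described above, the classical PID baseline can be sketched in a few lines. The class, gains, and interface below are illustrative assumptions for a single control axis, not the tuning used in the thesis; in practice one such loop would run per controlled degree of freedom of the vehicle.

```python
class PID:
    """Minimal single-axis Proportional-Integral-Derivative controller.

    A hypothetical sketch: `error` is the tracking error on one axis
    (e.g. distance to the current waypoint along that axis), and the
    returned value is the control effort for that axis.
    """

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0      # accumulated error (I term)
        self.prev_error = 0.0    # previous error (for the D term)

    def update(self, error):
        # Integrate the error over the time step.
        self.integral += error * self.dt
        # Finite-difference approximation of the error derivative.
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        # Sum of the three weighted terms gives the control output.
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)


# Illustrative use with made-up gains and a 10 Hz loop:
pid = PID(kp=1.0, ki=0.1, kd=0.01, dt=0.1)
u = pid.update(1.0)  # control effort for a unit error at the first step
```

The SAC controller, by contrast, learns its policy by maximizing the expected return plus an entropy bonus weighted by a temperature coefficient, which is what drives the exploration behaviour mentioned in the abstract.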