Master Thesis, D-MAVT, ETH Zurich.
The main goal of this work is to learn a local path planning policy for mobile robots from the input of a single depth camera. We formulate the end-to-end local planning problem as a Partially Observable Markov Decision Process and solve it using a Deep Reinforcement Learning algorithm. The main challenges of this setting come from
We resolve these problems with memory-based Deep Reinforcement Learning. This framework represents the policy as a network with a memory unit that can remember past observations. As a result, the trained policy can generate collision-safe trajectories based not only on the current observation but also on previous observations. We also address the sample inefficiency of end-to-end learning by
In the quantitative evaluation, our policy with memory units outperforms a standard CNN policy. Notably, the policy with Temporal Convolutional layers learned much faster than the policy with a conventional LSTM. In the subsequent real-robot experiments, we deployed the trained policy on the quadrupedal robot ANYmal equipped with an Intel RealSense depth camera. Our policy reactively generated collision-safe paths in both stationary and dynamic environments.
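To illustrate how a temporal convolutional memory can condition an output on past observations, the following is a minimal sketch of a causal, dilated 1-D convolution in plain Python. It is not the network used in the thesis: the function names `causal_conv1d` and `receptive_field`, the kernel weights, and the scalar per-frame features are all assumptions made for illustration.

```python
from typing import List, Sequence


def causal_conv1d(
    features: Sequence[float],
    kernel: Sequence[float],
    dilation: int = 1,
) -> List[float]:
    """Causal dilated 1-D convolution over a time series.

    The output at step t depends only on features[t], features[t - dilation],
    features[t - 2 * dilation], ... Steps before the start of the sequence
    are treated as zeros, so no future observation is ever used.
    """
    out = []
    for t in range(len(features)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * features[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of time steps (current plus past) seen by a stack of causal layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical scalar feature per depth frame (e.g. one channel of an image embedding).
history = [1.0, 2.0, 3.0, 4.0, 5.0]

# Two stacked layers with illustrative averaging kernels and growing dilation.
layer1 = causal_conv1d(history, kernel=[0.5, 0.5], dilation=1)
layer2 = causal_conv1d(layer1, kernel=[0.5, 0.5], dilation=2)

print(layer2[-1])                    # feature for the latest time step
print(receptive_field(2, [1, 2]))    # how many frames influence that feature
```

In this sketch the last output of `layer2` summarizes the four most recent frames. All time steps of such a layer can be computed in parallel, whereas an LSTM processes a sequence step by step; whether this accounts for the faster learning reported above is not something the sketch establishes.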
Paper: [ETH Research Collection]