End-to-End Collision Avoidance from Depth Input with Memory-based Deep Reinforcement Learning

Master Thesis, D-MAVT, ETH Zurich.


The main goal of this work is to learn a local path planning policy for mobile robots from a single depth camera input. We formulate the end-to-end local planning problem as a Partially Observable Markov Decision Process (POMDP) and solve it with a Deep Reinforcement Learning algorithm. The main challenges in this setting come from

  1. the short-sightedness of reaction-based planners, and
  2. the limited field-of-view of the depth camera,
which significantly degrade the planner's performance.
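To make the field-of-view limitation concrete, here is a minimal 2D sketch of when an obstacle is visible to a forward-facing depth camera. The function name `in_field_of_view`, the wedge-shaped sensor model, and the 90° field of view and 5 m range used in the example are illustrative assumptions. They are not the RealSense specification or code from the thesis.

```python
import math


def in_field_of_view(robot_xy, heading_rad, point_xy, fov_deg, max_range):
    """Return True if point_xy lies inside a forward-facing 2D sensor wedge.

    The wedge is centred on the robot heading, has a horizontal opening
    angle of fov_deg, and is cut off at max_range (a simplifying assumption).
    """
    dx = point_xy[0] - robot_xy[0]
    dy = point_xy[1] - robot_xy[1]
    if math.hypot(dx, dy) > max_range:
        return False
    bearing = math.atan2(dy, dx)
    # Angle of the point relative to the heading, wrapped into [-pi, pi).
    relative = (bearing - heading_rad + math.pi) % (2.0 * math.pi) - math.pi
    return abs(relative) <= math.radians(fov_deg) / 2.0


# An obstacle seen from the origin...
print(in_field_of_view((0.0, 0.0), 0.0, (1.0, 0.5), 90.0, 5.0))
# ...is no longer visible once the robot walks up beside it.
print(in_field_of_view((1.0, 0.0), 0.0, (1.0, 0.5), 90.0, 5.0))
```

The obstacle at (1.0, 0.5) is visible from the origin but drops out of view once the robot reaches (1.0, 0.0), even though it is then only 0.5 m away. A policy that reacts only to the current frame has no information about it at that moment, which is the situation a memory unit is meant to handle.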

We tackle these problems with memory-based Deep Reinforcement Learning. This framework represents the policy as a network with a memory unit that can remember past observations. As a result, the trained policy can generate collision-safe trajectories based not only on the current observation but also on previous ones. We also address the sample inefficiency of end-to-end learning with

  1. two-stream feature extraction with a pre-trained autoencoder, and
  2. an asymmetric actor-critic method.
Our ablation study shows that both methods are effective for fast convergence. Finally, we bridge the reality gap between real and simulated depth images with a real-time depth completion algorithm and by pre-training the autoencoder on both real and simulated images.
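This summary does not name the depth completion algorithm. As a rough illustration of the idea, the sketch below fills the invalid pixels that real depth sensors produce by propagating valid depth from neighboring pixels. This simple neighbor fill is a swapped-in stand-in, not the method used in the thesis. The function name `fill_depth_holes`, the convention that a depth of 0 marks an invalid pixel, and the choice of the minimum neighbor depth are all assumptions.

```python
def fill_depth_holes(depth, max_iterations=10):
    """Fill invalid (zero) pixels of a depth image by neighbor propagation.

    In each iteration, every invalid pixel that has at least one valid
    8-neighbor takes the smallest valid neighbor depth, i.e. it assumes the
    nearest surrounding surface. Stops early once nothing changes.
    """
    rows, cols = len(depth), len(depth[0])
    filled = [row[:] for row in depth]  # never modify the input image
    for _ in range(max_iterations):
        updated = [row[:] for row in filled]
        changed = False
        for r in range(rows):
            for c in range(cols):
                if filled[r][c] > 0.0:
                    continue  # pixel already holds a valid measurement
                neighbors = [
                    filled[rr][cc]
                    for rr in range(max(0, r - 1), min(rows, r + 2))
                    for cc in range(max(0, c - 1), min(cols, c + 2))
                    if (rr, cc) != (r, c) and filled[rr][cc] > 0.0
                ]
                if neighbors:
                    updated[r][c] = min(neighbors)
                    changed = True
        filled = updated
        if not changed:
            break
    return filled


image = [[1.0, 0.0, 2.0],
         [0.0, 0.0, 0.0],
         [3.0, 0.0, 4.0]]
print(fill_depth_holes(image))
```

Taking the minimum neighbor depth biases filled pixels toward closer surfaces. For collision avoidance this errs on the safe side, at the cost of slightly inflating obstacles.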

In the quantitative evaluation, our policy with memory units outperforms a standard CNN policy. Notably, the policy with Temporal Convolutional layers learned much faster than the policy with a conventional LSTM. In the subsequent real-robot experiments, we deployed the trained policy on the quadrupedal robot ANYmal equipped with an Intel RealSense depth camera. Our policy reactively generated collision-safe paths in both stationary and dynamic environments.
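A temporal convolutional network gives a policy memory without recurrent state. Each output is a fixed function of a window of past inputs, computed with causal dilated convolutions. A commonly given explanation for why this can train faster than an LSTM is that it avoids backpropagation through a long recurrent chain. The pure-Python sketch below shows this building block on scalar per-step features and how stacking dilations 1, 2 and 4 widens the window. The function names, the kernel size of 2, the dilation schedule and the scalar features are assumptions for illustration, not the architecture used in the thesis.

```python
def causal_dilated_conv1d(sequence, weights, dilation):
    """Causal 1D convolution: output[t] uses inputs t, t-d, t-2d, ...

    weights[k] multiplies sequence[t - k * dilation]. Inputs before the
    start of the sequence are treated as zero (left zero-padding).
    """
    output = []
    for t in range(len(sequence)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * sequence[idx]
        output.append(acc)
    return output


def receptive_field(kernel_size, dilations):
    """Time steps (including the current one) seen by a stack of layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Impulse at t = 0, passed through three stacked layers.
hidden = [1.0] + [0.0] * 8
for d in (1, 2, 4):
    hidden = causal_dilated_conv1d(hidden, [1.0, 1.0], d)
print(hidden)
print(receptive_field(2, [1, 2, 4]))
```

The impulse at the first step still affects the output seven steps later but not eight steps later, matching a receptive field of 8 steps. An LSTM has no such fixed horizon, but it has to carry information forward through its recurrent state.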

Paper: [ETH Research Collection]

Supplementary videos

Simulation results

Real robot experiments


    title={End-to-End Collision Avoidance from Depth Input with Memory-based Deep Reinforcement Learning},
    author={Kang, Dongho},
    school={ETH Zurich},