Novel depth representations for depth completion with application in 3D object detection

Depth completion refers to interpolating a dense, regular depth grid from sparse and irregularly sampled depth values, often guided by high-resolution color imagery. The primary goal of depth completion is to estimate depth accurately. In practice, methods are trained by minimizing an error between predicted dense depth and ground-truth depth, and are evaluated by how small they make this error. Here we identify a second goal: avoiding the smearing of depth across depth discontinuities. This second goal is important because it can improve downstream applications of depth completion such as object detection and pose estimation. However, we also show that the goal of minimizing error can conflict with the goal of eliminating depth smearing. In this thesis, we propose two novel depth representations that can encode depth discontinuities across object surfaces by allowing multiple depth estimates in the spatial domain. To learn these new representations, we propose carefully designed loss functions and show their effectiveness in training deep neural networks. We show that our representations avoid inter-object depth mixing and also outperform the state of the art on depth completion metrics. The quality of ground-truth depth in real-world depth completion problems is another key challenge for both learning and accurate evaluation of methods. Ground-truth depth created by semi-automatic methods suffers from sparse sampling and errors at object boundaries. We show that the combination of these errors and the commonly used evaluation measure has promoted solutions that mix depths across boundaries in current methods. The thesis proposes alternative depth completion performance measures that reduce the preference for mixed depths and promote sharp boundaries. The thesis also investigates whether additional points from depth completion methods can help in a challenging, high-level perception problem: 3D object detection. It shows how different kinds of noise in the depth estimates affect detection performance and proposes effective ways to reduce noise in the estimates and overcome architecture limitations. The method is demonstrated on both real-world and synthetic datasets.
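
The conflict between minimizing error and avoiding depth smearing can be seen in a toy calculation. The sketch below is only an illustration and is not the method proposed in the thesis: it assumes a single boundary pixel whose true depth is either a foreground surface at 5 m or a background surface at 20 m with equal probability (all of these values, and the function names, are hypothetical). Under a squared-error criterion, the best single estimate is the probability-weighted mean, which lies between the two surfaces and so corresponds to a mixed depth that belongs to neither object.

```python
# Toy illustration (assumed values, not from the thesis): a boundary pixel
# whose true depth is ambiguous between a foreground and a background surface.

def expected_squared_error(estimate, depths, probs):
    """Expected squared error of a single depth estimate under the given
    discrete distribution over candidate true depths."""
    return sum(p * (estimate - d) ** 2 for d, p in zip(depths, probs))


def squared_error_optimal_estimate(depths, probs):
    """The estimate that minimizes expected squared error is the mean."""
    return sum(p * d for d, p in zip(depths, probs))


# Hypothetical scene: foreground at 5 m, background at 20 m, equally likely.
depths = [5.0, 20.0]
probs = [0.5, 0.5]

mixed = squared_error_optimal_estimate(depths, probs)
err_mixed = expected_squared_error(mixed, depths, probs)
err_foreground = expected_squared_error(depths[0], depths, probs)
err_background = expected_squared_error(depths[1], depths, probs)

print(f"mixed estimate: {mixed} m, expected squared error: {err_mixed}")
print(f"foreground estimate: {depths[0]} m, expected squared error: {err_foreground}")
print(f"background estimate: {depths[1]} m, expected squared error: {err_background}")
```

In this toy setting the mixed estimate scores better under squared error than committing to either surface, even though it places the point in empty space between the two objects. This is the kind of preference that a representation allowing multiple depth estimates, or an evaluation measure that penalizes mixed depths, is meant to counteract.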
