We propose a new method for fast and accurate monocular depth estimation on embedded systems. Our objective was to improve the FastDepth model, which achieves state-of-the-art performance for mobile applications. To do this, we tested different loss functions for training the model and used more advanced lightweight CNNs as encoders. Although switching from the default L2 loss to the BerHu loss yielded only a minor improvement in the quality of the generated depth maps, we significantly enhanced the performance of FastDepth by using an EfficientNet-based CNN as the encoder. These changes had only a minor effect on the model's size and latency, which is why we argue that our approach is an improvement for mobile applications.
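For illustration, the BerHu (reverse Huber) loss we compared against L2 acts as an L1 loss for small residuals and as a scaled L2 loss for large ones. The PyTorch sketch below is only an assumed implementation, not our exact training code; the function name and the common threshold heuristic c = 0.2 · max|error| are illustrative.

```python
import torch

def berhu_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """BerHu (reverse Huber) loss: L1 below a threshold c, scaled L2 above it.

    A minimal sketch; the adaptive threshold c = 0.2 * max absolute error
    is one common choice, not necessarily the setting used in our experiments.
    """
    diff = torch.abs(target - pred)
    c = (0.2 * diff.max()).clamp(min=1e-6)    # adaptive per-batch threshold
    l2_part = (diff ** 2 + c ** 2) / (2 * c)  # quadratic branch for large errors
    return torch.where(diff <= c, diff, l2_part).mean()
```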
Figure: 1st column: RGB input image, 2nd column: ground truth depth map, 3rd column: generated depth map
| Model | Size | No. Params | Avg. Latency | RMSE (mm) | δ < 1.25 |
|---|---|---|---|---|---|
| Original FastDepth | 31 MB | 4 M | 9 ms | 796 | 0.603 |
| EfficientDepth (ours) | 40 MB | 6.5 M | 23 ms | 678 | 0.688 |