
Depth-Anything - Android Demo

An Android app that runs inference on the popular Depth-Anything model, which performs monocular depth estimation

(App screenshots: app_img_01, app_img_02, app_img_03)

Project Setup

  1. Clone the repository, and open the resulting directory in Android Studio
$> git clone --depth=1 https://github.com/shubham0204/Depth-Anything-Android
  2. Download the ONNX models from Releases and place them in the app/src/main/assets directory. The models are used by ONNX Runtime's OrtSession to load the computation graph and parameters in memory (see the sketch after this list). Any one of the following models can be placed in the assets directory:
  • model.onnx: the Depth-Anything model
  • model_fp16.onnx: a float16-quantized version of model.onnx
  3. Connect a device to Android Studio and select Run Application from the top navigation pane.
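For reference, here is a minimal Kotlin sketch of how one of the bundled models could be loaded from the assets directory into an OrtSession using the ONNX Runtime Android API. The function name and the choice of model_fp16.onnx are illustrative assumptions, not the app's actual code.

```kotlin
import ai.onnxruntime.OrtEnvironment
import ai.onnxruntime.OrtSession
import android.content.Context

// Illustrative sketch (not the app's actual code): read one of the bundled
// ONNX models from app/src/main/assets and create an OrtSession from it.
fun createDepthAnythingSession(context: Context): OrtSession {
    // model_fp16.onnx stands in for whichever model file was placed in assets
    val modelBytes = context.assets.open("model_fp16.onnx").readBytes()
    val env = OrtEnvironment.getEnvironment()
    return env.createSession(modelBytes, OrtSession.SessionOptions())
}
```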

Useful Resources

Note

The app contains an ONNX model that was created by combining the pre/post-processing operations required by Depth-Anything into a single model. To learn more about how the model was built, refer to this notebook.
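As a rough sketch of what the combined model implies at inference time (pre/post-processing happen inside the graph, so the caller only feeds pixels and reads back a depth map), here is a hedged Kotlin example. The NCHW float32 layout and the 518x518 resolution are assumptions for illustration; the notebook above documents the actual exported input/output signature.

```kotlin
import ai.onnxruntime.OnnxTensor
import ai.onnxruntime.OrtEnvironment
import ai.onnxruntime.OrtSession
import java.nio.FloatBuffer

// Hedged sketch: the input shape, layout and dtype below are assumptions.
// Because pre/post-processing is baked into the exported graph, the first
// output is already the depth map (its concrete Java type depends on the export).
fun estimateDepth(session: OrtSession, pixels: FloatArray): Any {
    val env = OrtEnvironment.getEnvironment()
    val shape = longArrayOf(1, 3, 518, 518) // assumed NCHW input at 518x518
    val input = OnnxTensor.createTensor(env, FloatBuffer.wrap(pixels), shape)
    val results = session.run(mapOf(session.inputNames.first() to input))
    val depthMap = results[0].value
    results.close()
    input.close()
    return depthMap
}
```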

Paper Summary

  • An MDE (monocular depth estimation) model trained on labeled data is used to annotate a large corpus of unlabeled images (62M) during training (semi-supervised learning, self-training, or pseudo-labelling)
  • A teacher model is trained on labeled images and then used to annotate the unlabeled images; a student model is trained on all images (labeled + teacher-annotated)
  • This alone yielded no performance gain, so a more difficult optimization target was introduced for the student model: unlabeled images are perturbed with (1) strong color distortions and (2) CutMix (mostly used in image classification)
  • Semantics-assisted perception: depth estimation is improved with an auxiliary semantic segmentation task, using one shared encoder and two separate decoders

Citation

@article{depthanything,
  title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv:2401.10891},
  year={2024}
}
@misc{oquab2023dinov2,
  title={DINOv2: Learning Robust Visual Features without Supervision},
  author={Oquab, Maxime and Darcet, Timothée and Moutakanni, Theo and Vo, Huy V. and Szafraniec, Marc and Khalidov, Vasil and Fernandez, Pierre and Haziza, Daniel and Massa, Francisco and El-Nouby, Alaaeldin and Howes, Russell and Huang, Po-Yao and Xu, Hu and Sharma, Vasu and Li, Shang-Wen and Galuba, Wojciech and Rabbat, Mike and Assran, Mido and Ballas, Nicolas and Synnaeve, Gabriel and Misra, Ishan and Jegou, Herve and Mairal, Julien and Labatut, Patrick and Joulin, Armand and Bojanowski, Piotr},
  journal={arXiv:2304.07193},
  year={2023}
}