An Android app that runs inference with the popular Depth-Anything model, which is used for monocular depth estimation
- Clone the repository, and open the resulting directory in Android Studio
$> git clone --depth=1 https://github.com/shubham0204/Depth-Anything-Android
- Download the ONNX models from Releases and place them in the app/src/main/assets directory. The models are used by ONNX Runtime's OrtSession to load the computation graph and parameters in-memory. Any one of the following models can be placed in the assets directory:
  - model.onnx: Depth-Anything module
  - model_fp16.onnx: float16 quantized version of model.onnx
- Connect a device to Android Studio, and select Run Application from the top navigation pane.
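To give a rough feel for what the float16 variant listed above trades away, the sketch below round-trips a few values through half precision using Python's standard struct module. The weight values are hypothetical, and this is not how model_fp16.onnx was produced; it only illustrates that half precision stores each value in 2 bytes instead of 4, at the cost of some rounding error.

```python
import struct


def to_float16_and_back(x: float) -> float:
    """Round-trip a value through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical parameter values, chosen only for illustration.
weights = [0.1, -1.5, 3.14159, 1024.25]

# Storage needed for the same values in single vs. half precision.
fp32_bytes = len(struct.pack(f"<{len(weights)}f", *weights))
fp16_bytes = len(struct.pack(f"<{len(weights)}e", *weights))

# Rounding error introduced by the narrower format.
roundtrip = [to_float16_and_back(w) for w in weights]
max_abs_error = max(abs(a - b) for a, b in zip(weights, roundtrip))

print(f"float32: {fp32_bytes} bytes, float16: {fp16_bytes} bytes")
print(f"largest rounding error: {max_abs_error}")
```

Whether this rounding affects the predicted depth maps noticeably depends on the model, so it is worth comparing the outputs of both files on a few images before picking one.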
Note
The app contains an ONNX model which was created by combining the pre/post-processing operations required by Depth-Anything in a single model. To learn more about how the model was built, refer to this notebook.
- Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
- fabio-sim/Depth-Anything-ONNX
- ONNX Runtime: How to develop a mobile application with ONNX Runtime
- ONNX Runtime: Create Float16 and Mixed Precision Models
- Build a image preprocessing model using Pytorch and integrate into your model using ONNX
- An MDE (monocular depth estimation) model trained on labeled data is used to annotate 62M unlabeled images during training (semi-supervised learning, also called self-training or pseudo-labelling)
- A teacher model is trained on labeled images and then used to annotate the unlabeled images. A student model is then trained on all images (labeled + teacher-annotated)
- This alone produced no performance gain, so a more difficult optimization target was introduced for the student model: unlabeled images are perturbed with (1) strong color distortions and (2) CutMix (mostly used in image classification)
- Semantic-assisted perception: improve depth estimation with an auxiliary semantic segmentation task, using one shared encoder and two separate decoders
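The CutMix perturbation mentioned in the notes above can be sketched in a few lines. This is a toy version under stated assumptions, not the paper's training code: images are plain 2D lists of floats, the helper name cutmix is hypothetical, and the mask convention (1 marks pixels taken from the second image) is chosen here for illustration.

```python
import random


def cutmix(image_a, image_b, rng):
    """Paste a random rectangle of image_b into image_a.

    Both images are equally sized 2D lists (height x width).
    Returns the mixed image and a binary mask where 1 marks
    pixels copied from image_b and 0 marks pixels kept from image_a.
    """
    height, width = len(image_a), len(image_a[0])

    # Sample the rectangle size first, then a top-left corner that
    # keeps the rectangle fully inside the image.
    box_h = rng.randint(1, height)
    box_w = rng.randint(1, width)
    top = rng.randint(0, height - box_h)
    left = rng.randint(0, width - box_w)

    mask = [
        [1 if top <= r < top + box_h and left <= c < left + box_w else 0
         for c in range(width)]
        for r in range(height)
    ]
    mixed = [
        [image_b[r][c] if mask[r][c] else image_a[r][c] for c in range(width)]
        for r in range(height)
    ]
    return mixed, mask


rng = random.Random(0)  # fixed seed so the example is repeatable
image_a = [[0.0] * 6 for _ in range(4)]  # stand-in for one unlabeled image
image_b = [[1.0] * 6 for _ in range(4)]  # stand-in for another unlabeled image
mixed, mask = cutmix(image_a, image_b, rng)

for row in mixed:
    print(row)
```

In a pseudo-labelling setup, the same mask would also tell the loss which teacher annotation applies to each pixel of the mixed image, which is what makes the student's task harder than simply copying the teacher.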
@article{depthanything,
title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data},
author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
journal={arXiv:2401.10891},
year={2024}
}
@misc{oquab2023dinov2,
title={DINOv2: Learning Robust Visual Features without Supervision},
author={Oquab, Maxime and Darcet, Timothée and Moutakanni, Theo and Vo, Huy V. and Szafraniec, Marc and Khalidov, Vasil and Fernandez, Pierre and Haziza, Daniel and Massa, Francisco and El-Nouby, Alaaeldin and Howes, Russell and Huang, Po-Yao and Xu, Hu and Sharma, Vasu and Li, Shang-Wen and Galuba, Wojciech and Rabbat, Mike and Assran, Mido and Ballas, Nicolas and Synnaeve, Gabriel and Misra, Ishan and Jegou, Herve and Mairal, Julien and Labatut, Patrick and Joulin, Armand and Bojanowski, Piotr},
journal={arXiv:2304.07193},
year={2023}
}