We provide training instructions for the Unreal4kStereo dataset in this document. Users can adapt this Unreal4kStereo version to custom datasets. We provide all configs for Depth-Anything and ZoeDepth training.
Download the dataset from https://github.com/fabiotosi92/SMD-Nets.
Preprocess the dataset following the instructions there (convert images to raw format).
Copy the split files from `./splits/u4k` into the dataset folder and organize the folder structure as:
monocular-depth-estimation-toolbox
├── estimator
├── docs
├── ...
├── data (it's included in `.gitignore`)
│ ├── u4k (recommend ln -s)
│ │ ├── 00000
│ │ │ ├── Disp0
│ │ │ │ ├── 00000.npy
│ │ │ │ ├── 00001.npy
│ │ │ │ ├── ...
│ │ │ ├── Extrinsics0
│ │ │ ├── Extrinsics1
│ │ │ ├── Image0
│ │ │ │ ├── 00000.raw (Note it's important to convert png to raw to speed up training)
│ │ │ │ ├── 00001.raw
│ │ │ │ ├── ...
│ │ ├── 00001
│ │ │ ├── Disp0
│ │ │ ├── Extrinsics0
│ │ │ ├── Extrinsics1
│ │ │ ├── Image0
│ │ ├── ...
│ │ ├── 00008
│ │ ├── splits
│ │ │ ├── train.txt
│ │ │ ├── val.txt
│ │ │ ├── test.txt
│ │ │ ├── test_out.txt
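Before training, it can help to verify that the folder matches the layout above. The following is a minimal sketch, not part of the repository: the function name `check_u4k_layout` is hypothetical, and it assumes that every scene folder contains the four sub-folders shown for `00000` and that each `.raw` image has a `.npy` disparity map with the same stem.

```python
from pathlib import Path

# Sub-folders and split files taken from the tree above.
REQUIRED_SUBDIRS = ("Disp0", "Extrinsics0", "Extrinsics1", "Image0")
SPLIT_FILES = ("train.txt", "val.txt", "test.txt", "test_out.txt")


def check_u4k_layout(root):
    """Return a list of human-readable problems found under `root` (hypothetical helper)."""
    root = Path(root)
    if not root.is_dir():
        return [f"dataset root not found: {root}"]

    problems = []
    # Scene folders are the purely numeric directories (00000 ... 00008).
    scenes = sorted(p for p in root.iterdir() if p.is_dir() and p.name.isdigit())
    if not scenes:
        problems.append(f"no scene folders found in {root}")

    for scene in scenes:
        for sub in REQUIRED_SUBDIRS:
            if not (scene / sub).is_dir():
                problems.append(f"missing folder: {scene / sub}")
        image_dir = scene / "Image0"
        if not image_dir.is_dir():
            continue
        # Leftover PNG files suggest the raw conversion step was skipped.
        for png in sorted(image_dir.glob("*.png")):
            problems.append(f"image not converted to raw: {png}")
        # Assumption: each image has a disparity map with the same stem.
        for raw in sorted(image_dir.glob("*.raw")):
            disp = scene / "Disp0" / f"{raw.stem}.npy"
            if not disp.is_file():
                problems.append(f"missing disparity map: {disp}")

    for name in SPLIT_FILES:
        if not (root / "splits" / name).is_file():
            problems.append(f"missing split file: {root / 'splits' / name}")
    return problems
```

Calling `check_u4k_layout("./data/u4k")` returns an empty list when nothing is missing, so it can be run once after preprocessing and before launching a job.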
Before training, please download the pre-trained metric depth estimators from https://huggingface.co/zhyever/PatchFusion/tree/main. We provide pre-trained checkpoints for Depth-Anything and ZoeDepth.
Model Name | Checkpoint |
---|---|
Depth-Anything-vitl | https://huggingface.co/zhyever/PatchFusion/blob/main/DepthAnything_vitl.pt |
Depth-Anything-vitb | https://huggingface.co/zhyever/PatchFusion/blob/main/DepthAnything_vitb.pt |
Depth-Anything-vits | https://huggingface.co/zhyever/PatchFusion/blob/main/DepthAnything_vits.pt |
ZoeDepth-N | https://huggingface.co/zhyever/PatchFusion/blob/main/patchfusion_u4k.pt |
Note that these checkpoints were pre-trained using the official implementation, so their file names end with `.pt`.
Put them in one folder, for example, `./work_dir/DepthAnything_vitl.pt`. (Note: `./work_dir` is included in `.gitignore`.)
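The download can also be scripted. The sketch below is an illustration rather than a repository tool: the helper names `checkpoint_url` and `download_checkpoint` are hypothetical, and it relies on the Hugging Face convention that a file listed under `/blob/main/` is served directly from `/resolve/main/`.

```python
import urllib.request
from pathlib import Path

REPO_URL = "https://huggingface.co/zhyever/PatchFusion"


def checkpoint_url(filename, revision="main"):
    """Direct-download URL for a file in the repository (`resolve`, not the `blob` page)."""
    return f"{REPO_URL}/resolve/{revision}/{filename}"


def download_checkpoint(filename, target_dir="./work_dir"):
    """Download `filename` into `target_dir` unless it is already there (hypothetical helper)."""
    target = Path(target_dir) / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        urllib.request.urlretrieve(checkpoint_url(filename), str(target))
    return target


# Example (performs a network download):
# download_checkpoint("DepthAnything_vitl.pt")
```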
This repo follows the design of Monocular-Depth-Estimation-Toolbox, but it is more flexible in training and inference. The general form of the training command is:
bash tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
Arguments Explanation:
- `${CONFIG_FILE}`: Select the configuration file for training.
- `${GPU_NUM}`: Specify the number of GPUs used for training (we use 4 by default).
- `[optional arguments]`: You can specify more arguments. Some important ones are:
  - `--log-name`: experiment name shown on the wandb website
  - `--work-dir`: `work-dir` + `log-name` gives the path where logs and checkpoints are saved
  - `--tag`: tags shown on the wandb website
  - `--debug`: if set, wandb logging is skipped
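To make the relationship between the arguments concrete, here is a minimal sketch that assembles the command line and derives the output folder. The helpers `build_train_command` and `output_dir` are hypothetical and not part of the repository. The sketch uses only the flags listed above and assumes that tags are passed as one comma-separated value.

```python
import shlex
from pathlib import Path


def build_train_command(config_file, gpu_num=4, work_dir=None, log_name=None,
                        tags=(), debug=False):
    """Assemble the dist_train.sh invocation as an argument list (hypothetical helper)."""
    cmd = ["bash", "./tools/dist_train.sh", config_file, str(gpu_num)]
    if work_dir:
        cmd += ["--work-dir", work_dir]
    if log_name:
        cmd += ["--log-name", log_name]
    if tags:
        # Assumption: multiple tags are joined with commas in a single value.
        cmd += ["--tag", ",".join(tags)]
    if debug:
        cmd.append("--debug")
    return cmd


def output_dir(work_dir, log_name):
    """Logs and checkpoints are saved under work-dir + log-name."""
    return Path(work_dir) / log_name


cmd = build_train_command(
    "configs/patchfusion_depthanything/depthanything_vitl_coarse_pretrain_u4k.py",
    gpu_num=4,
    work_dir="./work_dir/depthanything_vitl_u4k",
    log_name="coarse_pretrain",
    tags=("coarse", "da", "vitl"),
)
print(shlex.join(cmd))
print(output_dir("./work_dir/depthanything_vitl_u4k", "coarse_pretrain"))
```

Returning a list rather than a string keeps the command safe to pass to `subprocess.run(cmd, check=True)` without shell quoting issues.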
You will see examples in the following sections.
Training PatchFusion includes three steps. Here, we take Depth-Anything vitl as an example.
First, check the config file: `./configs/patchfusion_depthanything/depthanything_vitl_coarse_pretrain_u4k.py`. Set the config item `zoe_depth_config.pretrained_resource` to the checkpoint path (the default path is `local::./work_dir/DepthAnything_vitl.pt`). The `local::` prefix is necessary because we build on the official implementation.
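If the checkpoint lives somewhere other than the default location, the value for `pretrained_resource` is the file path with `local::` in front of it. A small sketch with a hypothetical helper, `local_resource`, that also catches a wrong path before a job is launched:

```python
from pathlib import Path


def local_resource(checkpoint_path, must_exist=True):
    """Format a checkpoint path as a `local::` resource string (hypothetical helper)."""
    if must_exist and not Path(checkpoint_path).is_file():
        raise FileNotFoundError(f"checkpoint not found: {checkpoint_path}")
    return f"local::{checkpoint_path}"


# must_exist=False only so the example runs without the file being present.
print(local_resource("./work_dir/DepthAnything_vitl.pt", must_exist=False))
```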
Then, run:
bash ./tools/dist_train.sh configs/patchfusion_depthanything/depthanything_vitl_coarse_pretrain_u4k.py 4 --work-dir ./work_dir/depthanything_vitl_u4k --log-name coarse_pretrain --tag coarse,da,vitl
This command uses the config `depthanything_vitl_coarse_pretrain_u4k.py` and 4 GPUs to train the model, and saves the logs and checkpoints to `./work_dir/depthanything_vitl_u4k/coarse_pretrain`. During training, you can check the logs under the experiment name `coarse_pretrain` on wandb.
Next, check the config file: `./configs/patchfusion_depthanything/depthanything_vitl_fine_pretrain_u4k.py`. As before, set the config item `zoe_depth_config.pretrained_resource` to the checkpoint path (the default path is `local::./work_dir/DepthAnything_vitl.pt`).
Then, run:
bash ./tools/dist_train.sh configs/patchfusion_depthanything/depthanything_vitl_fine_pretrain_u4k.py 4 --work-dir ./work_dir/depthanything_vitl_u4k --log-name fine_pretrain --tag fine,da,vitl
Finally, we can train the fusion model. Check the config file: `./configs/patchfusion_depthanything/depthanything_vitl_patchfusion_u4k.py`. Now, you need to set the config item `model.config.pretrain_model` to the checkpoint paths of both the pre-trained coarse and fine models (the default is `['./work_dir/depthanything_vitl_u4k/coarse_pretrain/checkpoint_24.pth', './work_dir/depthanything_vitl_u4k/fine_pretrain/checkpoint_24.pth']`).
Finally, run:
bash ./tools/dist_train.sh configs/patchfusion_depthanything/depthanything_vitl_patchfusion_u4k.py 4 --work-dir ./work_dir/depthanything_vitl_u4k --log-name patchfusion --tag patchfusion,da,vitl
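Because the fusion stage depends on the outputs of the two earlier stages, a quick check that both checkpoints exist can save a failed launch. The sketch below is hypothetical and not a repository tool. It assumes the `checkpoint_<epoch>.pth` naming and the `coarse_pretrain` / `fine_pretrain` log names seen in the default paths above.

```python
from pathlib import Path


def pretrain_checkpoints(work_dir, epoch=24):
    """Coarse and fine checkpoint paths, in the order used by the default config."""
    return [
        f"{work_dir}/{stage}_pretrain/checkpoint_{epoch}.pth"
        for stage in ("coarse", "fine")
    ]


def missing_checkpoints(paths):
    """Return the subset of `paths` that does not exist on disk."""
    return [p for p in paths if not Path(p).is_file()]


paths = pretrain_checkpoints("./work_dir/depthanything_vitl_u4k")
for path in missing_checkpoints(paths):
    print(f"missing: {path}")
```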
Model Name | Coarse Config | Fine Config | PatchFusion Config |
---|---|---|---|
Depth-Anything-vitl | ./configs/patchfusion_depthanything/depthanything_vitl_coarse_pretrain_u4k.py | ./configs/patchfusion_depthanything/depthanything_vitl_fine_pretrain_u4k.py | ./configs/patchfusion_depthanything/depthanything_vitl_patchfusion_u4k.py |
Depth-Anything-vitb | ./configs/patchfusion_depthanything/depthanything_vitb_coarse_pretrain_u4k.py | ./configs/patchfusion_depthanything/depthanything_vitb_fine_pretrain_u4k.py | ./configs/patchfusion_depthanything/depthanything_vitb_patchfusion_u4k.py |
Depth-Anything-vits | ./configs/patchfusion_depthanything/depthanything_vits_coarse_pretrain_u4k.py | ./configs/patchfusion_depthanything/depthanything_vits_fine_pretrain_u4k.py | ./configs/patchfusion_depthanything/depthanything_vits_patchfusion_u4k.py |
ZoeDepth-N | ./configs/patchfusion_zoedepth/zoedepth_coarse_pretrain_u4k.py | ./configs/patchfusion_zoedepth/zoedepth_fine_pretrain_u4k.py | ./configs/patchfusion_zoedepth/zoedepth_patchfusion_u4k.py |
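The config paths in the table follow a regular naming pattern, so they can be derived from the model name when scripting all three stages. The following sketch assumes only the four models listed in the table. The function name `stage_configs` is hypothetical.

```python
STAGES = ("coarse_pretrain", "fine_pretrain", "patchfusion")


def stage_configs(model_name):
    """Map a model name from the table to its three config paths (hypothetical helper)."""
    if model_name == "ZoeDepth-N":
        folder, prefix = "patchfusion_zoedepth", "zoedepth"
    elif model_name.startswith("Depth-Anything-"):
        variant = model_name.rsplit("-", 1)[1]
        if variant not in ("vitl", "vitb", "vits"):
            raise ValueError(f"unknown Depth-Anything variant: {variant}")
        folder, prefix = "patchfusion_depthanything", f"depthanything_{variant}"
    else:
        raise ValueError(f"unknown model: {model_name}")
    return {stage: f"./configs/{folder}/{prefix}_{stage}_u4k.py" for stage in STAGES}


for stage, path in stage_configs("Depth-Anything-vitb").items():
    print(stage, path)
```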
During training, validation runs periodically. You can change the related settings in the config file. For example,
train_cfg=dict(max_epochs=16, val_interval=2, save_checkpoint_interval=16, log_interval=100, train_log_img_interval=500, val_log_img_interval=50, val_type='epoch_base', eval_start=0)
By setting `val_interval=4`, you can validate every 4 epochs.
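As an illustration, the snippet below applies that change to the example settings and lists the epochs on which validation would then run. The schedule is an assumption made for this sketch (epochs counted from 1, validation on every `val_interval`-th epoch at or after `eval_start`). The actual runner may count differently.

```python
train_cfg = dict(
    max_epochs=16,
    val_interval=2,
    save_checkpoint_interval=16,
    log_interval=100,
    train_log_img_interval=500,
    val_log_img_interval=50,
    val_type='epoch_base',
    eval_start=0,
)

# Validate every 4 epochs instead of every 2.
train_cfg.update(val_interval=4)

# Assumed schedule, for illustration only (see the note above).
val_epochs = [
    epoch
    for epoch in range(1, train_cfg['max_epochs'] + 1)
    if epoch >= train_cfg['eval_start'] and epoch % train_cfg['val_interval'] == 0
]
print(val_epochs)  # [4, 8, 12, 16]
```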
To test a trained checkpoint, run:
bash ./tools/dist_test.sh configs/patchfusion_depthanything/depthanything_vitl_patchfusion_u4k.py 4 --ckp-path ./work_dir/depthanything_vitl_u4k/patchfusion/checkpoint_16.pth --cai-mode m1
See Inference with Multiple GPUs and Running for more details on the arguments.