[Bug] DVCLiveVisBackend initializing DVC workspace in subdirectories #1508

Open
2 tasks done
smarais opened this issue Feb 29, 2024 · 1 comment
Labels
bug Something isn't working

smarais commented Feb 29, 2024

Prerequisite

Environment

OrderedDict([('sys.platform', 'linux'), ('Python', '3.10.8 (main, Nov 4 2022, 13:48:29) [GCC 11.2.0]'), ('CUDA available', True), ('MUSA available', False), ('numpy_random_seed', 2147483648), ('GPU 0', 'NVIDIA GeForce RTX 3080 Ti Laptop GPU'), ('CUDA_HOME', '/opt/conda'), ('NVCC', 'Cuda compilation tools, release 11.6, V11.6.124'), ('GCC', 'gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0'), ('PyTorch', '1.13.1'), ('PyTorch compiling details', 'PyTorch built with:\n - GCC 9.3\n - C++ Version: 201402\n - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - LAPACK is enabled (usually provided by MKL)\n - NNPACK is enabled\n - CPU capability usage: AVX2\n - CUDA Runtime 11.6\n - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37\n - CuDNN 8.3.2 (built against CUDA 11.5)\n - Magma 2.6.1\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations 
-Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, \n'), ('TorchVision', '0.14.1'), ('OpenCV', '4.9.0'), ('MMEngine', '0.10.3')])

Reproduces the problem - code sample

I am using DVC to run an MMPretrain task, with the DVCLiveVisBackend enabled in the training config like so:

visualizer = dict(
    type="Visualizer",
    vis_backends=[dict(type="DVCLiveVisBackend", save_dir="dvclive")],
)

With reference to: https://github.com/open-mmlab/mmengine/blame/v0.10.3/mmengine/visualization/vis_backend.py#L1185-L1189

This code block attempts to discover a Git repository starting from the current directory (os.curdir). If it fails to find Git information (i.e., if a KeyError is raised), it initializes a new DVC repository in the current directory, without checking whether a DVC repository already exists higher up in the directory tree.
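The decision described above can be modeled roughly as follows. This is a simplified sketch, not the MMEngine code: the function name dvc_init_dir and the git_identity_ok callable are hypothetical stand-ins for the pygit2 discovery plus default_signature lookup, and returning False plays the role of the KeyError branch.

```python
from pathlib import Path
from typing import Callable, Optional


def dvc_init_dir(cwd: Path, git_identity_ok: Callable[[Path], bool]) -> Optional[Path]:
    """Hypothetical model: return the directory where DVC would be initialized.

    git_identity_ok stands in for the pygit2 check; False models the KeyError.
    """
    if git_identity_ok(cwd):
        return None  # Git signature found: nothing is initialized
    # KeyError branch: DVC is initialized in cwd itself. No parent directory
    # is inspected for an existing .dvc folder, which is the reported problem.
    return cwd


if __name__ == "__main__":
    sub = Path("/workspace/myproject/subfolder")
    # Prints the subfolder, even if /workspace/myproject is already a DVC repo.
    print(dvc_init_dir(sub, lambda p: False))
```

The point of the model is that the only input to the decision is the Git check; the location of any existing DVC repository never enters into it.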

In my use case, os.curdir is the directory containing dvc.yaml, which is not the DVC/Git root directory.

Suggestions:

  1. Adjust the initialization logic to detect existing DVC repositories (for example, by locating the DVC root) and avoid reinitializing DVC in subdirectories.
  2. Add a parameter to disable this behavior or to specify the workspace directory.
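A minimal sketch of both suggestions, assuming a DVC repository can be recognized by a .dvc directory in it or one of its ancestors. The helper names find_dvc_root and resolve_dvc_workspace and the workspace and auto_init parameters are hypothetical; they are not part of MMEngine or DVC.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_dvc_root(start: Path) -> Optional[Path]:
    """Hypothetical helper: walk up from start to the first dir containing .dvc."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / ".dvc").is_dir():
            return candidate
    return None  # no ancestor is a DVC repository


def resolve_dvc_workspace(start: Path,
                          workspace: Optional[Path] = None,
                          auto_init: bool = True) -> Optional[Path]:
    """Pick the DVC workspace to use (hypothetical parameters).

    workspace: explicit directory supplied by the user (suggestion 2).
    auto_init: False means never initialize a new repository (suggestion 2).
    Returns None when no repository exists and auto_init is disabled.
    """
    if workspace is not None:
        return workspace
    existing = find_dvc_root(start)
    if existing is not None:
        return existing  # reuse the parent repository (suggestion 1)
    return start.resolve() if auto_init else None


def demo() -> tuple:
    """Build root/.dvc and root/subfolder, then resolve from the subfolder."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp).resolve()
        (root / ".dvc").mkdir()
        sub = root / "subfolder"
        sub.mkdir()
        return (resolve_dvc_workspace(sub) == root,
                resolve_dvc_workspace(sub, auto_init=False) == root)


if __name__ == "__main__":
    print(demo())
```

Walking up to the nearest .dvc directory mirrors how Git-style tools locate their root; an explicit workspace argument always wins so users in unusual layouts can opt out of the search.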

Reproduces the problem - command or script

In my case, I am testing in MMPretrain, running dvc repro subfolder/dvc.yaml from the project root folder.

Example dvc.yaml:

stages:
  train:
    cmd: python /workspace/mmpretrain/tools/train.py --work-dir /workspace/myproject/output /workspace/mmpretrain/configs/resnet/resnet34_8xb32_in1k.py

Reproduces the problem - error message

There is no error message. However, creating a DVC workspace in a subdirectory causes problems, because DVC then treats the subfolder as a separate project.

Additional information

No response

@smarais smarais added the bug Something isn't working label Feb 29, 2024
@michaelgruner

I ran into a similar situation. After some debugging, I found that the problem was that no default_signature could be found. Here's the check it performs:

>>> import os
>>> import pygit2
>>> path = pygit2.discover_repository(os.fspath(os.curdir), True, '')
>>> pygit2.Repository(path).default_signature
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: "config value 'user.name' was not found"
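To catch this before training starts, a pre-flight check along these lines could be run, e.g. in a container entrypoint. This is a sketch under assumptions: the function name is hypothetical, it assumes the signature needs both user.name and user.email (the keys set by the fix below), and it relies on git config --get exiting with a non-zero status when a key is unset. It approximates, but is not identical to, the libgit2 lookup that pygit2 performs.

```python
import subprocess
from typing import Callable


def git_identity_configured(
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Hypothetical check: True if user.name and user.email are both set.

    `run` is injectable so the check can be exercised without Git installed.
    """
    for key in ("user.name", "user.email"):
        try:
            result = run(["git", "config", "--get", key],
                         capture_output=True, text=True, check=False)
        except FileNotFoundError:
            return False  # git binary not available at all
        if result.returncode != 0 or not (result.stdout or "").strip():
            return False  # key unset or empty
    return True


if __name__ == "__main__":
    if not git_identity_configured():
        print("Warning: Git user.name/user.email not set; "
              "DVCLiveVisBackend may initialize a new DVC repository here.")
```

Turning the missing identity into an explicit warning is cheaper to debug than discovering a silently created DVC repository afterwards.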

This caused a new DVC repository to be initialized, breaking everything in its way. This happens in most Docker containers, which typically have no Git user identity configured.

After running:

git config --global user.name "Michael Gruner"
git config --global user.email "michael.gruner@ridgerun.ai"

everything works as expected. Hope it helps someone!
