
Add PatchMatchNet module for MVS and calculation of normals from depth #1129

Open · wants to merge 20 commits into base: main
Conversation

anmatako
Contributor

This work mainly integrates PatchMatchNet functionality into colmap using a TorchScript pre-trained module. Additionally, it introduces functionality to calculate normal maps from depth maps, since PatchMatchNet evaluation does not create normal maps as part of its process. More details about the changes:

  • Colmap can compile with Torch support to enable PatchMatchNet. For this, the pre-compiled LibTorch library needs to be downloaded from https://pytorch.org/ for the desired configuration (GPU or CPU-only) and the archive extracted under <colmap-root>/lib/, creating a libtorch subfolder. CMake should then be able to find the dependency and set the correct compilation flags.
  • PatchMatchNet can now be enabled from patch_match_stereo by setting the mvs_module_path option to a valid TorchScript module. One such module is included as part of this PR in <colmap-root>/mvs-modules/patchmatchnet-module.pt
    • The TorchScript interface is fairly generic, using the following input structure: (images: List[Tensor], intrinsics: Tensor, extrinsics: Tensor, depth_params: Tensor), with the output being a tuple of (depth: Tensor, confidence: Tensor). Thus any module that conforms to this input/output format for forward evaluation can be used instead.
  • Functionality of standard patch-match remains unchanged. There is now an inheritance structure used to select between standard and PMNet processing.
  • Normal maps are no longer required for stereo fusion. If missing, they are calculated from the depth maps themselves. This is needed to accommodate PMNet processing, which does not produce normal maps as part of its estimation work.
    • Note that the use of calculated normal maps can be forced even for standard patch-match processing through the new stereo fusion option --StereoFusion.calculate_normals.
  • Confidence maps can now be used for stereo fusion; they are optional. If missing, a confidence of 1 is assumed everywhere. This makes use of the confidence maps created as part of PMNet estimation.
  • A new method for finding related images for fusion based on triangulation scoring is introduced and can be enabled with the option --StereoFusion.use_triangulation_scoring. This is included for parity with PatchMatchNet, which uses this method for finding related images instead of the colmap default (useful for comparing results between colmap and Python).
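For illustration, the forward calling convention described above can be mimicked by any module. Below is a minimal shape-checking mock (a sketch, not part of this PR): numpy arrays stand in for torch tensors, and the per-view [min_depth, max_depth] layout of depth_params is an assumption made here for the example.

```python
import numpy as np

class MockMVSModule:
    """Illustrative stand-in for a TorchScript MVS module.

    forward(images, intrinsics, extrinsics, depth_params) -> (depth, confidence)
      images:       list of (C, H, W) arrays; the first entry is the reference view
      intrinsics:   (N, 3, 3) camera matrices
      extrinsics:   (N, 4, 4) world-to-camera poses
      depth_params: (N, 2) assumed here to hold [min_depth, max_depth] per view
    """

    def forward(self, images, intrinsics, extrinsics, depth_params):
        n = len(images)
        assert intrinsics.shape == (n, 3, 3)
        assert extrinsics.shape == (n, 4, 4)
        assert depth_params.shape == (n, 2)
        _, h, w = images[0].shape
        # A real module would run its network here; the mock returns a
        # constant depth at the near plane and full confidence.
        depth = np.full((h, w), depth_params[0, 0], dtype=np.float32)
        confidence = np.ones((h, w), dtype=np.float32)
        return depth, confidence
```

Any TorchScript module whose forward evaluation matches this signature could then be selected via --PatchMatchStereo.mvs_module_path.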

Antonios Matakos and others added 15 commits February 10, 2021 20:48
…stereo fusion

- Normal maps are optional during stereo fusion. If a map is not found, one is estimated from the depth map using the cross-product method.
- Confidence maps are now also used in stereo fusion, along with a threshold specified in the options. If the confidence probability is below the threshold, the specific depth is ignored. If the confidence map does not exist, a default map with probability 1.0 is used.
- Introduced an alternative calculation of overlapping images in the `Model` class based on triangulation score, instead of using the median triangulation angle and sorting by number of common points. This calculation can be enabled for fusion with the new `use_triangulation_scoring` option. The new method is what MVSNet and its variants use when processing a COLMAP workspace.
- Added a flag to allow use of calculated normals (from depth maps) in stereo fusion, instead of the ones estimated from patch-match.
- Added a flag to control whether the normal maps should be renormalized when rescaled. This completes earlier work that was supposed to avoid normalization of normals during fusion.
  - As a corollary, we now have to explicitly normalize the normal vectors used to calculate the angular difference when filtering points during fusion.
- Added a utility method `SetSlice` to the `Mat` class to allow setting an entire normal-map entry more conveniently.
- Minor cleanup to fix compilation warnings.

Merged PR 3157: Include confidence maps in the MVS setup

- Confidence maps are now part of the MVS setup and participate as dependencies in undistortion and batching
- Calculated normal and confidence maps are written out when using a cached workspace to avoid redoing the calculations when a map gets evicted from the cache
- Minor change in normal map calculation to avoid using pixels with invalid (<=0) depth

Merged PR 3356: Robust estimation of normals using planes

- Improved calculation of normals from depth maps using plane estimation with a configurable window around the pixel of interest
- Fixed a bug in the normal calculation that was using pixel coordinates instead of local-frame coordinates (this needed multiplication with the inverse of the camera intrinsics matrix)
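The plane-based estimation can be illustrated with a total-least-squares fit over the window (a sketch, not the PR's implementation; it assumes the window's pixels have already been back-projected to camera-frame 3D points with the inverse intrinsics, per the coordinate fix above):

```python
import numpy as np

def normal_from_plane_fit(points):
    """Fit a plane to camera-frame 3D points (k, 3) and return its unit
    normal: the eigenvector of the centered scatter matrix with the
    smallest eigenvalue (classic total least squares)."""
    centered = points - points.mean(axis=0)
    # np.linalg.eigh returns eigenvalues in ascending order, so column 0
    # of the eigenvector matrix is the plane normal direction.
    _, vecs = np.linalg.eigh(centered.T @ centered)
    n = vecs[:, 0]
    # Orient toward the camera at the origin (against the viewing ray
    # through the patch centroid).
    return -n if np.dot(n, points.mean(axis=0)) > 0 else n
```

Compared with the cross-product method, fitting over a window averages out depth noise at the cost of smoothing across depth discontinuities.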

Cherry picked from !3327

Remove option to re-normalize normals
Initial implementation of PatchmatchNet evaluation modules. This is currently a standalone "library" not connected to any other parts of ColMap yet.

Merged PR 3299: Add PatchmatchNet processing through TorchScript module

- Added a PatchMatchNet implementation as an alternative to standard patch-match
  - The new functionality is controlled by the new option to load a TorchScript module from file: `--PatchMatchStereo.mvs_module_path`
- Created an inheritance structure with `PatchMatch` (base class) and `PatchMatchCuda` and `PatchMatchNet` (derived) to facilitate the choice of processing method

Remove LibTorch components

Merged PR 3450: Update PatchMatchNet module and interface

Fixes two issues from the previous module:
- Sizes are now handled internally to ensure each dimension is a multiple of 8
- Images are a vector of tensors to allow different sizes between reference and source images
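The multiple-of-8 size handling can be illustrated with a pad-then-crop wrapper around evaluation. This is a hypothetical sketch; the module handles sizes internally and may do it differently.

```python
import numpy as np

def pad_to_multiple(image, multiple=8):
    """Pad an (H, W, ...) image on the bottom/right so that H and W become
    multiples of `multiple`; returns the padded image and the original size."""
    h, w = image.shape[:2]
    ph = (-h) % multiple  # rows to add
    pw = (-w) % multiple  # columns to add
    pad = [(0, ph), (0, pw)] + [(0, 0)] * (image.ndim - 2)
    return np.pad(image, pad, mode="edge"), (h, w)

def crop_to_size(image, size):
    """Undo pad_to_multiple after inference (e.g. on the predicted depth map)."""
    h, w = size
    return image[:h, :w]
```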
This reverts commit 57113e3.
-#ifndef CUDA_ENABLED
+#if !defined(CUDA_ENABLED) && !defined(TORCH_ENABLED)
   std::cerr << "ERROR: Dense stereo reconstruction requires CUDA, which is not "
                "available on your system."
Contributor Author

The logic here is changed so that we now fail immediately only if both CUDA and Torch are missing. If either is present, we can do patch-match through the existing method or through PMNet.

Antonios Matakos and others added 2 commits February 22, 2021 15:15
@Dawars
Contributor

Dawars commented Feb 28, 2021

I've been trying to compile this but I get the following error:

-- Caffe2: CUDA detected: 11.2
-- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda
-- Caffe2: Header version is: 11.2
-- Found cuDNN: v8.1.0  (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libcudnn.so)
-- Autodetected CUDA architecture(s):  6.1
-- Added CUDA NVCC flags for: -gencode;arch=compute_61,code=sm_61
-- Build type specified as Release
-- Enabling SIMD support
-- Enabling OpenMP support
-- Disabling interprocedural optimization
-- Autodetected CUDA architecture(s):  6.1
-- Enabling CUDA support (version: 11.2, archs: sm_61)
-- Enabling LibTorch support
-- Enabling OpenGL support
-- Disabling profiling support
-- Enabling CGAL support
-- Configuring done
CMake Error in src/CMakeLists.txt:
  Imported target "torch" includes non-existent path

    "MKL_INCLUDE_DIR-NOTFOUND"

  in its INTERFACE_INCLUDE_DIRECTORIES.  Possible reasons include:

  * The path was deleted, renamed, or moved to another location.

  * An install or uninstall procedure did not complete successfully.

  * The installation package was faulty and references files it does not
  provide.


Which libtorch/cuda version are you using?
I've tried CUDA 11.2, cuDNN v8.1.0, MKL 2020.04, and libtorch 1.7.1 on a 1080Ti. The same happens with CUDA 10.2 and cuDNN v7.

@anmatako
Contributor Author


@Dawars It seems that LibTorch requires MKL as a dependency even though it already contains the headers and binaries in the LibTorch package itself. See if installing MKL on your system would resolve your issue.

On my end I made some modifications in the CMake configurations of LibTorch itself to make things work. I'll see if I can make changes in colmap CMake instead and have things work with vanilla LibTorch.

For reference here's a diff between my modified LibTorch and the vanilla one (LibTorch 1.7.1 for CUDA 10.1 with CUDNN 7.6.0)

diff --git "a/c:\\Users\\anmatako\\Downloads\\libtorch/include/ATen/Parallel.h" "b/lib\\libtorch/include/ATen/Parallel.h"
index 9e2f9be..cc652f2 100644
--- "a/c:\\Users\\anmatako\\Downloads\\libtorch/include/ATen/Parallel.h"
+++ "b/lib\\libtorch/include/ATen/Parallel.h"
@@ -38,7 +38,7 @@ namespace internal {

 // Initialise num_threads lazily at first parallel call
 inline CAFFE2_API void lazy_init_num_threads() {
-  thread_local bool init = false;
+  static thread_local bool init = false;
   if (C10_UNLIKELY(!init)) {
     at::init_num_threads();
     init = true;
diff --git "a/c:\\Users\\anmatako\\Downloads\\libtorch/include/c10/util/StringUtil.h" "b/lib\\libtorch/include/c10/util/StringUtil.h"
index d2744f1..79da0ae 100644
--- "a/c:\\Users\\anmatako\\Downloads\\libtorch/include/c10/util/StringUtil.h"
+++ "b/lib\\libtorch/include/c10/util/StringUtil.h"
@@ -74,7 +74,7 @@ struct _str_wrapper<const char*> final {
 template<>
 struct _str_wrapper<> final {
   static const std::string& call() {
-    thread_local const std::string empty_string_literal;
+    static thread_local const std::string empty_string_literal;
     return empty_string_literal;
   }
 };
diff --git "a/c:\\Users\\anmatako\\Downloads\\libtorch/share/cmake/Caffe2/public/cuda.cmake" "b/lib\\libtorch/share/cmake/Caffe2/public/cuda.cmake"
index 8b60915..041e19b 100644
--- "a/c:\\Users\\anmatako\\Downloads\\libtorch/share/cmake/Caffe2/public/cuda.cmake"
+++ "b/lib\\libtorch/share/cmake/Caffe2/public/cuda.cmake"
@@ -480,7 +480,7 @@ endforeach()
 # Set C++14 support
 set(CUDA_PROPAGATE_HOST_FLAGS_BLACKLIST "-Werror")
 if(MSVC)
-  list(APPEND CUDA_NVCC_FLAGS "--Werror" "cross-execution-space-call")
+  # list(APPEND CUDA_NVCC_FLAGS "--Werror" "cross-execution-space-call")
   list(APPEND CUDA_NVCC_FLAGS "--no-host-device-move-forward")
 else()
   list(APPEND CUDA_NVCC_FLAGS "-std=c++14")
diff --git "a/c:\\Users\\anmatako\\Downloads\\libtorch/share/cmake/Caffe2/public/mkl.cmake" "b/lib\\libtorch/share/cmake/Caffe2/public/mkl.cmake"
index 9515a4a..c68074b 100644
--- "a/c:\\Users\\anmatako\\Downloads\\libtorch/share/cmake/Caffe2/public/mkl.cmake"
+++ "b/lib\\libtorch/share/cmake/Caffe2/public/mkl.cmake"
@@ -1,4 +1,4 @@
-find_package(MKL QUIET)
+set(MKL_INCLUDE_DIR ${CMAKE_TORCHLIB_PATH}/include)

 if(NOT TARGET caffe2::mkl)
   add_library(caffe2::mkl INTERFACE IMPORTED)

@anmatako
Contributor Author

anmatako commented Mar 1, 2021

@Dawars @ahojnnes I updated colmap's CMake to set the MKL flags without needing the full dependency for LibTorch to build. Also, I removed an NVCC flag set by LibTorch that was causing issues with Eigen/Core.

However I'm not sure what to do with this part of the diff:

diff --git "a/c:\\Users\\anmatako\\Downloads\\libtorch/include/ATen/Parallel.h" "b/lib\\libtorch/include/ATen/Parallel.h"
index 9e2f9be..cc652f2 100644
--- "a/c:\\Users\\anmatako\\Downloads\\libtorch/include/ATen/Parallel.h"
+++ "b/lib\\libtorch/include/ATen/Parallel.h"
@@ -38,7 +38,7 @@ namespace internal {

 // Initialise num_threads lazily at first parallel call
 inline CAFFE2_API void lazy_init_num_threads() {
-  thread_local bool init = false;
+  static thread_local bool init = false;
   if (C10_UNLIKELY(!init)) {
     at::init_num_threads();
     init = true;
diff --git "a/c:\\Users\\anmatako\\Downloads\\libtorch/include/c10/util/StringUtil.h" "b/lib\\libtorch/include/c10/util/StringUtil.h"
index d2744f1..79da0ae 100644
--- "a/c:\\Users\\anmatako\\Downloads\\libtorch/include/c10/util/StringUtil.h"
+++ "b/lib\\libtorch/include/c10/util/StringUtil.h"
@@ -74,7 +74,7 @@ struct _str_wrapper<const char*> final {
 template<>
 struct _str_wrapper<> final {
   static const std::string& call() {
-    thread_local const std::string empty_string_literal;
+    static thread_local const std::string empty_string_literal;
     return empty_string_literal;
   }
 };

I'm not sure if the issue with thread_local having to be static is specific to MSVC (Windows) or if it happens on other platforms as well, since I have no good way to test this cross-platform.

@Dawars
Contributor

Dawars commented Mar 1, 2021 via email

@Dawars
Contributor

Dawars commented Mar 1, 2021

Now it compiles and runs fine, no additional cmake config needed for mkl.

However the model file seems to be corrupted. I get the following error at: torch::jit::load(options_.mvs_module_path, kDevIn);

cache_size: 20
write_consistency_graph: 0
mvs_module_path: /home/dawars/projects/colmap_torch/mvs-modules/patchmatchnet-module.pt
allow_missing_files: 0
First definition of patch-match module for thread index: 0
Signal: SIGSEGV (signal SIGSEGV: invalid address (fault address: 0x0))

Process finished with exit code 9

I checked it in Python and Netron as well:
Error loading Python module. Unknown expression '=' in 'patchmatchnet-module3.pt'.

Python 3.7.9 (default, Aug 31 2020, 12:42:55) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.18.1 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 7.18.1
Python 3.7.9 (default, Aug 31 2020, 12:42:55) 
[GCC 7.3.0] on linux
import torch
with open('/home/dawars/projects/colmap_torch/mvs-modules/patchmatchnet-module.pt') as f:
    model = torch.load(f)
    
Traceback (most recent call last):
  File "/home/dawars/miniconda3/envs/historic/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3417, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-f97813dbac00>", line 2, in <module>
    model = torch.load(f)
  File "/home/dawars/miniconda3/envs/historic/lib/python3.7/site-packages/torch/serialization.py", line 572, in load
    if _is_zipfile(opened_file):
  File "/home/dawars/miniconda3/envs/historic/lib/python3.7/site-packages/torch/serialization.py", line 56, in _is_zipfile
    byte = f.read(1)
  File "/home/dawars/miniconda3/envs/historic/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa2 in position 72: invalid start byte
with open('/home/dawars/projects/colmap_torch/mvs-modules/patchmatchnet-module3.pt') as f:
    model = torch.load(f)
    
Traceback (most recent call last):
  File "/home/dawars/miniconda3/envs/historic/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3417, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-a6ef56580e99>", line 2, in <module>
    model = torch.load(f)
  File "/home/dawars/miniconda3/envs/historic/lib/python3.7/site-packages/torch/serialization.py", line 572, in load
    if _is_zipfile(opened_file):
  File "/home/dawars/miniconda3/envs/historic/lib/python3.7/site-packages/torch/serialization.py", line 56, in _is_zipfile
    byte = f.read(1)
  File "/home/dawars/miniconda3/envs/historic/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa2 in position 72: invalid start byte

@anmatako
Contributor Author

anmatako commented Mar 1, 2021

I can load the module just fine in C++ and Python 3.8.5 on Windows using torch.jit.load; even torch.load works, with a warning like this:

...Python\Python38\site-packages\torch\serialization.py:589: UserWarning: 'torch.load' received a zip file that looks like a TorchScript archive dispatching to 'torch.jit.load' (call 'torch.jit.load' directly to silence this warning)
  warnings.warn("'torch.load' received a zip file that looks like a TorchScript archive"

I'm wondering if there's some issue with committing the binary as part of the repo, or an issue with the Python version. See if it will run with a different Python version. Also, I can send you the module file directly so we can see if it's an issue caused when the file gets committed.
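Since TorchScript .pt archives are zip files (that is what torch's _is_zipfile probes for), one quick way to rule out text-mode or newline corruption of a committed binary is to check the zip magic bytes. A diagnostic sketch, not part of this PR:

```python
def looks_like_torchscript_archive(path):
    """TorchScript .pt modules are zip archives; a valid file starts with
    the zip local-file-header magic 'PK\\x03\\x04'. Text-mode checkouts or
    newline conversion typically destroy this header."""
    with open(path, "rb") as f:  # binary mode: never decode as text
        return f.read(4) == b"PK\x03\x04"
```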

@anmatako
Contributor Author

anmatako commented Mar 1, 2021

@Dawars one more thing you can try, in case it's an issue with encodings between Windows and Linux, would be to pull PatchMatchNet from the tip of my branch here: https://github.com/anmatako/PatchmatchNet

Then uncomment these 3 lines here: https://github.com/anmatako/PatchmatchNet/blob/e21992b1c2d028536403632eb1bf4bfb1aa8f176/eval.py#L97-L99

and you can run from within the root folder of PatchmatchNet as follows:

python eval.py --output_folder <your output folder> --checkpoint_path checkpoints/patchmatchnet-params.pt --input_type params --output_type depth

This will create a new TorchScript module named patchmatchnet-module.pt in your specified output folder. If you can load that module then it must be some conversion issue between OSes.

@Dawars
Contributor

Dawars commented Mar 2, 2021

With PyTorch 1.7.1 I can read the model file properly.
I think the problem is that libtorch tries to open the file as a text file, not binary; that was one of my problems with Python as well.

I tried explicitly setting the file mode via:

std::ifstream model_file(options_.mvs_module_path,
                         std::ios::in | std::ios::binary);
model_[thread_index_] = torch::jit::load(model_file, kDevIn);

but I still get the same result.

I'll probably have to compile a debug version of libtorch for Linux to get more info. I have little experience with it, but I'll try.

Here is the stack trace:

First definition of patch-match module for thread index: 0
Signal: SIGSEGV (signal SIGSEGV: invalid address (fault address: 0x0))
*** Aborted at 1614714901 (unix time) try "date -d @1614714901" if you are using GNU date ***
PC: @     0x7f2b751ee986 std::__detail::_Executor<>::_M_dfs()
*** SIGSEGV (@0x3e8000044a0) received by PID 17575 (TID 0x7f2b22fc4700) from PID 17568; stack trace: ***
    @     0x7f2b84b3a631 (unknown)
    @     0x7f2b8305f3c0 (unknown)
    @     0x7f2b751ee986 std::__detail::_Executor<>::_M_dfs()
    @     0x7f2b751eeb53 std::__detail::_Executor<>::_M_dfs()
    @     0x7f2b751eec6c std::__detail::_Executor<>::_M_dfs()
    @     0x7f2b751ef412 std::__detail::__regex_algo_impl<>()
    @     0x7f2b319995fe c10::Device::Device()
    @     0x7f2b7544963d torch::jit::Unpickler::readInstruction()
    @     0x7f2b7544b540 torch::jit::Unpickler::run()
    @     0x7f2b7544baf1 torch::jit::Unpickler::parse_ivalue()
    @     0x7f2b753ef9c2 torch::jit::readArchiveAndTensors()
    @     0x7f2b753efcdd torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive()
    @     0x7f2b753f2605 torch::jit::(anonymous namespace)::ScriptModuleDeserializer::deserialize()
    @     0x7f2b753f2bd9 torch::jit::load()
    @     0x7f2b753f5455 torch::jit::load()
    @     0x55620f2f4c46 colmap::mvs::PatchMatchNet::InitModule()
    @     0x55620f2f43d6 colmap::mvs::PatchMatchNet::PatchMatchNet()
    @     0x55620ec7c9b0 colmap::mvs::PatchMatchController::ProcessProblem()
    @     0x55620ec8fb63 std::__invoke_impl<>()
    @     0x55620ec8fa50 std::__invoke<>()
    @     0x55620ec8f851 _ZNSt5_BindIFMN6colmap3mvs20PatchMatchControllerEFvRKNS1_17PatchMatchOptionsEmEPS2_S3_mEE6__callIvJEJLm0ELm1ELm2EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
    @     0x55620ec8f367 std::_Bind<>::operator()<>()
    @     0x55620ec8efdd std::__invoke_impl<>()
    @     0x55620ec8ed55 std::__invoke<>()
    @     0x55620ec8ea7d _ZZNSt13__future_base11_Task_stateISt5_BindIFMN6colmap3mvs20PatchMatchControllerEFvRKNS3_17PatchMatchOptionsEmEPS4_S5_mEESaIiEFvvEE6_M_runEvENKUlvE_clEv
    @     0x55620ec8f436 _ZNKSt13__future_base12_Task_setterISt10unique_ptrINS_7_ResultIvEENS_12_Result_base8_DeleterEEZNS_11_Task_stateISt5_BindIFMN6colmap3mvs20PatchMatchControllerEFvRKNSA_17PatchMatchOptionsEmEPSB_SC_mEESaIiEFvvEE6_M_runEvEUlvE_vEclEv
    @     0x55620ec8f08c _ZNSt17_Function_handlerIFSt10unique_ptrINSt13__future_base12_Result_baseENS2_8_DeleterEEvENS1_12_Task_setterIS0_INS1_7_ResultIvEES3_EZNS1_11_Task_stateISt5_BindIFMN6colmap3mvs20PatchMatchControllerEFvRKNSD_17PatchMatchOptionsEmEPSE_SF_mEESaIiEFvvEE6_M_runEvEUlvE_vEEE9_M_invokeERKSt9_Any_data
    @     0x55620eacd258 std::function<>::operator()()
    @     0x55620eacc75e std::__future_base::_State_baseV2::_M_do_set()
    @     0x55620ead4019 std::__invoke_impl<>()
    @     0x55620ead1136 std::__invoke<>()
    @     0x55620eacce3e _ZZSt9call_onceIMNSt13__future_base13_State_baseV2EFvPSt8functionIFSt10unique_ptrINS0_12_Result_baseENS4_8_DeleterEEvEEPbEJPS1_S9_SA_EEvRSt9once_flagOT_DpOT0_ENKUlvE_clEv
Signal: SIGSEGV (unknown crash reason)

Process finished with exit code 11

This is the error I got with PyTorch 1.6; it might be related:

with open('/home/dawars/projects/colmap_torch/mvs-modules/patchmatchnet-module_windows.pt', 'br') as f:
    model = torch.jit.load(f)
Traceback (most recent call last):
  File "/home/dawars/miniconda3/envs/historic/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3417, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-d6e3587a7e88>", line 2, in <module>
    model = torch.jit.load(f)
  File "/home/dawars/miniconda3/envs/historic/lib/python3.7/site-packages/torch/jit/__init__.py", line 277, in load
    cpp_module = torch._C.import_ir_module_from_buffer(cu, f.read(), map_location, _extra_files)
RuntimeError: 
Arguments for call are not valid.
The following variants are available:
  
  aten::upsample_nearest1d.out(Tensor self, int[1] output_size, float? scales=None, *, Tensor(a!) out) -> (Tensor(a!)):
  Expected a value of type 'List[int]' for argument 'output_size' but instead found type 'Optional[List[int]]'.
  
  aten::upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> (Tensor):
  Expected a value of type 'List[int]' for argument 'output_size' but instead found type 'Optional[List[int]]'.
The original call is:
  File "C:\Users\anmatako\AppData\Roaming\Python\Python38\site-packages\torch\nn\functional.py", line 3130
    if input.dim() == 3 and mode == 'nearest':
        return torch._C._nn.upsample_nearest1d(input, output_size, scale_factors)
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    if input.dim() == 4 and mode == 'nearest':
        return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
Serialized   File "code/__torch__/torch/nn/functional/___torch_mangle_46.py", line 155
    _49 = False
  if _49:
    _51 = torch.upsample_nearest1d(input, output_size3, scale_factors6)
          ~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _50 = _51
  else:
'interpolate' is being compiled since it was called from 'FeatureNet.forward'
Serialized   File "code/__torch__/models/net.py", line 139
  def forward(self: __torch__.models.net.FeatureNet,
    x: Tensor) -> List[Tensor]:
    _35 = __torch__.torch.nn.functional.___torch_mangle_46.interpolate
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _36 = torch.empty([1], dtype=None, layout=None, device=None, pin_memory=None, memory_format=None)
    _37 = torch.empty([1], dtype=None, layout=None, device=None, pin_memory=None, memory_format=None)

@anmatako
Contributor Author

anmatako commented Mar 2, 2021

Being able to load the module with PyTorch 1.7.1 at least means that the module does not seem to be corrupted. The PyTorch 1.6 issue you see looks like a simple incompatibility with older versions.

As for the error you get when you try to load with LibTorch, I'm quite confused as well, as it should not need any special configuration of the fstream and should load without issues. Can you try LibTorch 1.7.1 for CUDA 10.1 and cuDNN 7.6.0? That's the same package I'm using, and I'm wondering if there's something in these dependencies that makes the loading incompatible when done from colmap.

With PyTorch 1.7.1 I can read the model file properly.
I think the problem is that libtorch tries to open the file as a text file, not binary, that was one of my problems with Python.

I tried explicitly setting the file mode via:

std::ifstream model_file(options_.mvs_module_path, std::ios::in | std::ios::binary);

    model_[thread_index_] = torch::jit::load(model_file, kDevIn);

but I still get the same result.

Probably I'll have to compile a debug version of libtorch for linux to get more info. I have little experience with it but I'll try.

Here is the stack trace:

First definition of patch-match module for thread index: 0
Signal: SIGSEGV (signal SIGSEGV: invalid address (fault address: 0x0))
*** Aborted at 1614714901 (unix time) try "date -d @1614714901" if you are using GNU date ***
PC: @     0x7f2b751ee986 std::__detail::_Executor<>::_M_dfs()
*** SIGSEGV (@0x3e8000044a0) received by PID 17575 (TID 0x7f2b22fc4700) from PID 17568; stack trace: ***
    @     0x7f2b84b3a631 (unknown)
    @     0x7f2b8305f3c0 (unknown)
    @     0x7f2b751ee986 std::__detail::_Executor<>::_M_dfs()
    @     0x7f2b751eeb53 std::__detail::_Executor<>::_M_dfs()
    @     0x7f2b751eec6c std::__detail::_Executor<>::_M_dfs()
    @     0x7f2b751ef412 std::__detail::__regex_algo_impl<>()
    @     0x7f2b319995fe c10::Device::Device()
    @     0x7f2b7544963d torch::jit::Unpickler::readInstruction()
    @     0x7f2b7544b540 torch::jit::Unpickler::run()
    @     0x7f2b7544baf1 torch::jit::Unpickler::parse_ivalue()
    @     0x7f2b753ef9c2 torch::jit::readArchiveAndTensors()
    @     0x7f2b753efcdd torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive()
    @     0x7f2b753f2605 torch::jit::(anonymous namespace)::ScriptModuleDeserializer::deserialize()
    @     0x7f2b753f2bd9 torch::jit::load()
    @     0x7f2b753f5455 torch::jit::load()
    @     0x55620f2f4c46 colmap::mvs::PatchMatchNet::InitModule()
    @     0x55620f2f43d6 colmap::mvs::PatchMatchNet::PatchMatchNet()
    @     0x55620ec7c9b0 colmap::mvs::PatchMatchController::ProcessProblem()
    @     0x55620ec8fb63 std::__invoke_impl<>()
    @     0x55620ec8fa50 std::__invoke<>()
    @     0x55620ec8f851 _ZNSt5_BindIFMN6colmap3mvs20PatchMatchControllerEFvRKNS1_17PatchMatchOptionsEmEPS2_S3_mEE6__callIvJEJLm0ELm1ELm2EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
    @     0x55620ec8f367 std::_Bind<>::operator()<>()
    @     0x55620ec8efdd std::__invoke_impl<>()
    @     0x55620ec8ed55 std::__invoke<>()
    @     0x55620ec8ea7d _ZZNSt13__future_base11_Task_stateISt5_BindIFMN6colmap3mvs20PatchMatchControllerEFvRKNS3_17PatchMatchOptionsEmEPS4_S5_mEESaIiEFvvEE6_M_runEvENKUlvE_clEv
    @     0x55620ec8f436 _ZNKSt13__future_base12_Task_setterISt10unique_ptrINS_7_ResultIvEENS_12_Result_base8_DeleterEEZNS_11_Task_stateISt5_BindIFMN6colmap3mvs20PatchMatchControllerEFvRKNSA_17PatchMatchOptionsEmEPSB_SC_mEESaIiEFvvEE6_M_runEvEUlvE_vEclEv
    @     0x55620ec8f08c _ZNSt17_Function_handlerIFSt10unique_ptrINSt13__future_base12_Result_baseENS2_8_DeleterEEvENS1_12_Task_setterIS0_INS1_7_ResultIvEES3_EZNS1_11_Task_stateISt5_BindIFMN6colmap3mvs20PatchMatchControllerEFvRKNSD_17PatchMatchOptionsEmEPSE_SF_mEESaIiEFvvEE6_M_runEvEUlvE_vEEE9_M_invokeERKSt9_Any_data
    @     0x55620eacd258 std::function<>::operator()()
    @     0x55620eacc75e std::__future_base::_State_baseV2::_M_do_set()
    @     0x55620ead4019 std::__invoke_impl<>()
    @     0x55620ead1136 std::__invoke<>()
    @     0x55620eacce3e _ZZSt9call_onceIMNSt13__future_base13_State_baseV2EFvPSt8functionIFSt10unique_ptrINS0_12_Result_baseENS4_8_DeleterEEvEEPbEJPS1_S9_SA_EEvRSt9once_flagOT_DpOT0_ENKUlvE_clEv
Signal: SIGSEGV (unknown crash reason)

Process finished with exit code 11

This is the error I got with PyTorch 1.6; it might be related:

with open('/home/dawars/projects/colmap_torch/mvs-modules/patchmatchnet-module_windows.pt', 'br') as f:
    model = torch.jit.load(f)
Traceback (most recent call last):
  File "/home/dawars/miniconda3/envs/historic/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3417, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-d6e3587a7e88>", line 2, in <module>
    model = torch.jit.load(f)
  File "/home/dawars/miniconda3/envs/historic/lib/python3.7/site-packages/torch/jit/__init__.py", line 277, in load
    cpp_module = torch._C.import_ir_module_from_buffer(cu, f.read(), map_location, _extra_files)
RuntimeError: 
Arguments for call are not valid.
The following variants are available:
  
  aten::upsample_nearest1d.out(Tensor self, int[1] output_size, float? scales=None, *, Tensor(a!) out) -> (Tensor(a!)):
  Expected a value of type 'List[int]' for argument 'output_size' but instead found type 'Optional[List[int]]'.
  
  aten::upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> (Tensor):
  Expected a value of type 'List[int]' for argument 'output_size' but instead found type 'Optional[List[int]]'.
The original call is:
  File "C:\Users\anmatako\AppData\Roaming\Python\Python38\site-packages\torch\nn\functional.py", line 3130
    if input.dim() == 3 and mode == 'nearest':
        return torch._C._nn.upsample_nearest1d(input, output_size, scale_factors)
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    if input.dim() == 4 and mode == 'nearest':
        return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
Serialized   File "code/__torch__/torch/nn/functional/___torch_mangle_46.py", line 155
    _49 = False
  if _49:
    _51 = torch.upsample_nearest1d(input, output_size3, scale_factors6)
          ~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _50 = _51
  else:
'interpolate' is being compiled since it was called from 'FeatureNet.forward'
Serialized   File "code/__torch__/models/net.py", line 139
  def forward(self: __torch__.models.net.FeatureNet,
    x: Tensor) -> List[Tensor]:
    _35 = __torch__.torch.nn.functional.___torch_mangle_46.interpolate
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _36 = torch.empty([1], dtype=None, layout=None, device=None, pin_memory=None, memory_format=None)
    _37 = torch.empty([1], dtype=None, layout=None, device=None, pin_memory=None, memory_format=None)
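
The schema error above points to a version mismatch between export and load: the `functional.py` shown passes a `scale_factors` argument to `upsample_nearest1d`, a signature that PyTorch 1.6's TorchScript parser does not know, so the serialized call fails schema resolution at load time. Below is a minimal, torch-free sketch of a pre-load version guard; the `(1, 7, 0)` threshold is an assumption inferred from this traceback, not a documented bound, and `can_load_exported_module` is a hypothetical helper, not part of the PR.

```python
def version_tuple(version):
    # "1.6.0+cu101" -> (1, 6, 0); PyTorch appends a build tag after '+'.
    return tuple(int(part) for part in version.split("+")[0].split(".")[:3])

# Assumed minimum runtime able to resolve the exported module's operator
# schemas (hypothetical threshold inferred from the traceback above).
MIN_RUNTIME = (1, 7, 0)

def can_load_exported_module(runtime_version):
    # Version tuples compare lexicographically, which matches semver here.
    return version_tuple(runtime_version) >= MIN_RUNTIME
```

With the versions in this thread, `can_load_exported_module("1.6.0")` returns False, matching the RuntimeError above; re-exporting the TorchScript module with the same PyTorch version used at load time sidesteps the mismatch entirely.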

@Dawars (Contributor) commented Mar 5, 2021 via email

Comment on lines +80 to +88
if (model_.count(thread_index_) == 0) {
  std::cout << "First definition of patch-match module for thread index: "
            << options_.gpu_index << std::endl;
  model_[thread_index_] = torch::jit::load(options_.mvs_module_path, kDevIn);
} else {
  std::cout << "Patch-match module already defined for thread index: "
            << options_.gpu_index << std::endl;
}
Contributor
Only the first run succeeds; on subsequent runs the thread terminates at model.forward(...) (https://github.com/colmap/colmap/pull/1129/files#diff-ec9150c5522870ad0fd07f523905377b4d0670e7f42d92f2bbe11ceb42adb1beR59).

When I load the PyTorch module every time, this problem doesn't occur.

@anmatako (Contributor, Author) commented Mar 5, 2021

That's a very interesting failure mode. Failing at the forward() evaluation likely means the module is no longer there for the subsequent runs. Are you executing in a single- or multi-GPU environment? If multi-GPU, try using just a single GPU index in the patch-match options and see if it fails the same way.

The reason I ended up with this setup, instead of loading the module for each problem, is to take advantage of the optimizations LibTorch applies to JIT modules. My main finding was that loading the module every time, with no optimizations, is about 2x slower than reusing the module and letting the optimizations kick in.

If you can share a dataset that causes this failure, I can try to reproduce it on my end and see if I can debug it effectively.
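
The load-once-per-worker trade-off described above can be sketched without LibTorch. In the sketch below, `load_module` is a hypothetical stand-in for `torch::jit::load` / `torch.jit.load`, and the dictionary mirrors the PR's `model_` map keyed by `thread_index_`; this illustrates the caching pattern only, not the PR's actual code:

```python
class ModuleCache:
    """Caches one loaded module per worker thread index."""

    def __init__(self, load_module):
        self._load = load_module   # stand-in for torch.jit.load
        self._models = {}          # thread_index -> loaded module
        self.num_loads = 0

    def get(self, thread_index, module_path):
        # Load only on the first request from this worker, so the JIT's
        # runtime optimizations persist across problems handled by the
        # same thread (~2x faster per the discussion above).
        if thread_index not in self._models:
            self._models[thread_index] = self._load(module_path)
            self.num_loads += 1
        return self._models[thread_index]
```

The first `get(0, path)` loads the module; every later `get(0, path)` returns the same instance, which is where the reuse saving comes from.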

Contributor
I'm using a single GPU; not loading the model each time makes sense.

I'll send you the dataset.

@jingyibo123 commented
Upvote for the integration of third-party learning-based MVS methods.

With the recent popularity of COLMAP in the greater CV community, and the advances in learning-based SfM & MVS methods, it would be very beneficial for both sides to be able to incorporate methods such as PatchMatchNet, MVSNet, SuperPoint, SuperGlue, etc.
