
Different results depending on machine #38

Open
Hugo-Pereira opened this issue Oct 16, 2019 · 11 comments

@Hugo-Pereira

Hi,

I am running the same example on two different machines (different hardware, same OS - ubuntu 18.04).
I have ENABLE_NON_DETERMINISTIC_PARALLELISM=OFF.
The results I get on each machine are slightly different. Is this expected behaviour?

Thanks

@kzampog
Owner

kzampog commented Oct 16, 2019

Hi,

The flag only toggles behavior in cases that would be affected by the non-associativity of floating point arithmetic. If the results are consistent for a given machine, differences may be due to different hardware architectures. Which example is generating non-deterministic results?
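For illustration, here is a generic sketch of the non-associativity the flag guards against. It is not cilantro code, and `sum_left` and `sum_right` are hypothetical names. A parallel reduction may combine partial sums in a different grouping from run to run, and regrouping the same float operands can change the rounded result:

```cpp
// Floating-point addition is not associative: grouping the same three
// operands differently can change the rounded result.
float sum_left(float a, float b, float c) { return (a + b) + c; }   // (a + b) + c
float sum_right(float a, float b, float c) { return a + (b + c); }  // a + (b + c)
```

With a = 1e8f, b = -1e8f and c = 1.0f, `sum_left` yields 1 while `sum_right` yields 0, because -1e8f + 1.0f rounds back to -1e8f in single precision. This assumes plain IEEE 754 single-precision arithmetic without extended-precision intermediates.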

@Hugo-Pereira
Author

Hugo-Pereira commented Oct 18, 2019

I have been running some tests, and it appears the results start to differ at the output of cilantro::RGBDImagesToPointsNormalsColors. The last 6 points and colors have different values.
The "example" I am talking about is from scans I am performing with an iPhone, not the ones in the repo, sorry.

This is the comparison between points on the cloud generated on my dev machine and on my CI machine.
[Screenshot: point values from the dev machine and the CI machine, side by side]

I just confirmed the inputs I pass to RGBDImagesToPointsNormalsColors are exactly the same.

@Hugo-Pereira
Author

If the results are consistent for a given machine, differences may be due to different hardware architectures.

Is this intended? I really need the results to be the same across different hardware :\

@kzampog
Owner

kzampog commented Oct 19, 2019

This is interesting. I would not be surprised by differences on the order of machine epsilon, but that does not seem to be the case here (e.g., the 5th point). If the input images are exactly the same, maybe you are using a custom depth converter that behaves non-deterministically? I can't think of anything else right now; the function itself only performs simple operations. If you could share a minimal example (code and data) that reproduces the problem, that would really help!
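One way to tell differences "on the order of machine epsilon" apart from genuine discrepancies is a relative tolerance check. The sketch below is minimal and assumes single-precision values. `nearly_equal` and its `eps_scale` factor are hypothetical and would need tuning for a given pipeline:

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// True if x and y differ by at most eps_scale machine epsilons, measured
// relative to the larger of their magnitudes. eps_scale is a hypothetical
// tolerance knob. Values very close to zero would also need an absolute
// tolerance, since a purely relative test becomes too strict there.
bool nearly_equal(float x, float y, float eps_scale = 4.0f) {
    const float diff = std::fabs(x - y);
    const float scale = std::max(std::fabs(x), std::fabs(y));
    return diff <= eps_scale * std::numeric_limits<float>::epsilon() * scale;
}
```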

@Algomorph
Collaborator

Algomorph commented Oct 19, 2019

@Hugo-Pereira, I suggest trying a couple of things (you may have tried some of these already):

    • Check that the Eigen versions are the same on your CI and your dev machine
    • If they are the same but new(ish), try an older Eigen release on both
    • Analyze the differences in the compiler stack between your two machines: try to make them converge, to see whether the problem stems from slightly different environments (CMake version, gcc version)
    • Check whether the results differ only for specific examples and match for most, or differ for most (this may help isolate the problem)
    • Make triple-sure that the CI environment reads its inputs from the same paths, i.e. that it is actually reading the same files / file versions
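For the last point, a cheap check is to hash each input file on both machines and compare the printed values in the logs. Below is a minimal sketch using 64-bit FNV-1a, a fast non-cryptographic hash. `fnv1a64` and `hash_file` are hypothetical helpers, not part of cilantro:

```cpp
#include <cstdint>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// 64-bit FNV-1a over a byte buffer. Not cryptographic, but sufficient to
// detect whether two machines read byte-identical inputs.
std::uint64_t fnv1a64(const std::string& bytes) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (unsigned char c : bytes) {
        h ^= c;                 // XOR in the next byte...
        h *= 1099511628211ULL;  // ...then multiply by the FNV prime (mod 2^64)
    }
    return h;
}

// Reads a whole file in binary mode and hashes its contents.
std::uint64_t hash_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    const std::string bytes((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());
    return fnv1a64(bytes);
}
```

Printing `hash_file(path)` for every depth and color image on both machines, then diffing the output, would confirm or rule out byte-identical inputs.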

@Hugo-Pereira
Author

I am using TruncatedDepthValueConverter.
The inputs are the same, and I am using docker so all libraries and dependencies are (should be? :) ) the same.
I am using Eigen 3.3.4.

I'll assemble a sample project and send it over as soon as I am able.

@Hugo-Pereira
Author

Oops, I was reading the wrong memory addresses :\ Sorry for the confusion.
The points of the point cloud match 100%. However, I am getting different results for the normals, which by itself will cause the alignment results to differ.

[Screenshot: normals from the two machines, side by side]
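To check whether values "match 100%" and, if not, how far apart they are, one option is to compare floats by their distance in units in the last place (ULPs). The sketch below is minimal. `ordered_bits` and `ulp_distance` are hypothetical helpers, and NaN inputs are not handled:

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "assumes 32-bit IEEE 754 floats");

// Reinterprets a float's bits as an integer that increases monotonically with
// the float's value: +0.0f and -0.0f both map to 0, and adjacent floats map to
// adjacent integers.
std::int64_t ordered_bits(float f) {
    std::int32_t i;
    std::memcpy(&i, &f, sizeof f);  // well-defined type punning
    // Negative floats are stored as sign + magnitude; mirror them below zero.
    return i < 0 ? static_cast<std::int64_t>(INT32_MIN) - i : i;
}

// Number of representable floats between a and b (0 means identical values).
std::int64_t ulp_distance(float a, float b) {
    const std::int64_t d = ordered_bits(a) - ordered_bits(b);
    return d < 0 ? -d : d;
}
```

A distance of 1 or 2 ULPs per component would point to rounding differences (e.g. in sqrt). Much larger distances would suggest that the inputs to the normal computation already differ.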

Is there a way around this, i.e., to force the results to be the same across machines?

@kzampog
Owner

kzampog commented Oct 23, 2019

Oh OK!

Regarding the normal computation, are you using single or double precision floats? I think differences look normal (no pun intended) for single precision. Are you using the NormalEstimation class or the image conversion utility?

@Hugo-Pereira
Author

Single precision, and I am using RGBDImagesToPointsNormalsColors. I get exactly the same results for the points and colors, but not for the normals. I compiled cilantro with -DENABLE_NATIVE_BUILD_OPTIMIZATIONS=OFF and -DENABLE_NON_DETERMINISTIC_PARALLELISM=OFF.

@kzampog
Copy link
Owner

kzampog commented Oct 24, 2019

That function computes a cross product and normalizes it for each normal vector. Eigen's cross product looks innocent, but it seems that the sqrt implementation used by .normalized() uses platform-dependent intrinsics. You could try manually normalizing instead using std::sqrt, although I'm not sure what guarantees that comes with!
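Here is a minimal sketch of that suggestion. It assumes plain `std::array<float, 3>` vectors in place of Eigen types. `Vec3f`, `cross` and `normalized` are hypothetical and are not cilantro's or Eigen's implementation. IEEE 754 requires square root and division to be correctly rounded, so `std::sqrt` followed by a division should give bit-identical results for identical inputs, provided the compiler does not rewrite the arithmetic:

```cpp
#include <array>
#include <cmath>

using Vec3f = std::array<float, 3>;

// Cross product written out explicitly, so the operation order is fixed.
Vec3f cross(const Vec3f& a, const Vec3f& b) {
    return {a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
}

// Normalizes v with a correctly rounded std::sqrt and plain divisions,
// instead of a reciprocal-sqrt approximation. A zero vector is returned
// unchanged to avoid dividing by zero.
Vec3f normalized(const Vec3f& v) {
    const float sq = v[0] * v[0] + v[1] * v[1] + v[2] * v[2];
    if (sq == 0.0f) return v;
    const float len = std::sqrt(sq);
    return {v[0] / len, v[1] / len, v[2] / len};
}
```

One caveat: when FMA instructions are enabled for the target (e.g. via -march=native), GCC and Clang may contract expressions such as `a[1] * b[2] - a[2] * b[1]` into fused multiply-adds, which round differently. Compiling with `-ffp-contract=off` prevents that.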

Edit: This might also be worth checking:
https://eigen.tuxfamily.org/dox/TopicPreprocessorDirectives.html
It appears EIGEN_FAST_MATH is defined by default!
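Below is a sketch of disabling the macro, based on the linked documentation. That page states that EIGEN_FAST_MATH is defined to 1 by default, that it speeds up single-precision sqrt(), and that it can be defined to 0. The macro has to be visible before the first Eigen include in every translation unit, including any that compile cilantro itself. Passing it as a compile definition (e.g. `-DEIGEN_FAST_MATH=0`) for all targets is therefore probably safer than a `#define`. Whether this explains the discrepancy here is untested:

```cpp
// Disable Eigen's accuracy-affecting fast paths (e.g. the faster
// single-precision sqrt). Must come before any Eigen header is included.
#define EIGEN_FAST_MATH 0
#include <Eigen/Core>
```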

@Hugo-Pereira
Author

The thing is, I am using Eigen for other things too (like triangle mesh deformation, normals, etc.), and those results are consistent between my dev machine and CI.
I'll look into it further.
