Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generated operations are in a different order on different platforms #343

Open
gareth-cross opened this issue Jun 7, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@gareth-cross
Copy link
Contributor

Describe the bug
Operations are placed in a different order on different platforms. This can result in unexpected numerical issues in some cases.

Testing details:

  • Linux: Ubuntu 22.04, Python 3.8.16, x86
  • Mac: OSX Ventura 13.3.1, Python 3.8.16, Apple M2/ARM
  • Symforce v0.8.0, installed from pip

To Reproduce

  1. Run the following script to generate a test function on both OSX and Ubuntu:
from pathlib import Path

import symforce
symforce.set_symbolic_api('symengine')
symforce.set_epsilon_to_number(1.0e-16)

from symforce import geo
from symforce import symbolic as sm
from symforce.codegen import Codegen, CppConfig
from symforce import typing as T


def test_function(time: T.Scalar) -> geo.V4:
    rate = sm.pi / 6
    angle = rate * time
    x_axis = geo.V3([-sm.cos(angle), -sm.sin(angle), 0.0])
    z_axis = geo.V3([0, 0, 1])
    y_axis = z_axis.cross(x_axis)

    world_R_body = geo.Matrix33.column_stack(x_axis, y_axis, z_axis)
    world_R_body_quat = geo.Rot3.from_rotation_matrix(R=world_R_body)
    quat_elements = geo.V4(world_R_body_quat.to_storage())

    world_R_body_subbed = world_R_body.subs(time, 3.01).evalf()
    quat_elements_subbed = quat_elements.subs(time, 3.01).evalf()
    print(world_R_body_subbed)
    print(quat_elements_subbed)
    return quat_elements


if __name__ == '__main__':
    cg = Codegen.function(func=test_function, config=CppConfig())
    path = Path(__file__).parent.absolute()
    cg.generate_function(output_dir=path / 'output')
  1. Observe the output of the python script (the values of world_R_body_subbed and quat_elements_subbed)
[0.00523596383141919, 0.999986292247427, 0.0]
[-0.999986292247427, 0.00523596383141919, 0.0]
[0.0, 0.0, 1.0]

[0.0]
[0.0]
[-0.705253158861618]
[0.708955557080773]
  1. Observe that the order of operations is different on the two platforms. This is the exact diff:
35,42c35,43
<   const Scalar _tmp4 = -_tmp1;
<   const Scalar _tmp5 = -std::max<Scalar>(1, std::max<Scalar>(_tmp4, _tmp3 + 1));
<   const Scalar _tmp6 = 1 - std::max<Scalar>(0, -(((_tmp4 + _tmp5) > 0) - ((_tmp4 + _tmp5) < 0)));
<   const Scalar _tmp7 = std::min<Scalar>(1, _tmp6);
<   const Scalar _tmp8 = std::min<Scalar>(_tmp6, 1 - std::max<Scalar>(0, _tmp7));
<   const Scalar _tmp9 = _tmp5 + 1;
<   const Scalar _tmp10 = std::min<Scalar>(1 - std::max<Scalar>(0, std::max<Scalar>(_tmp7, _tmp8)),
<                                          1 - std::max<Scalar>(0, -(((_tmp9) > 0) - ((_tmp9) < 0))));
---
>   const Scalar _tmp4 = _tmp3 + 1;
>   const Scalar _tmp5 = -_tmp1;
>   const Scalar _tmp6 = -std::max<Scalar>(1, std::max<Scalar>(_tmp4, _tmp5));
>   const Scalar _tmp7 = 1 - std::max<Scalar>(0, -(((_tmp5 + _tmp6) > 0) - ((_tmp5 + _tmp6) < 0)));
>   const Scalar _tmp8 = std::min<Scalar>(1, _tmp7);
>   const Scalar _tmp9 = std::min<Scalar>(_tmp7, 1 - std::max<Scalar>(0, _tmp8));
>   const Scalar _tmp10 =
>       std::min<Scalar>(1 - std::max<Scalar>(0, std::max<Scalar>(_tmp8, _tmp9)),
>                        1 - std::max<Scalar>(0, -(((_tmp6 + 1) > 0) - ((_tmp6 + 1) < 0))));
46,47c47,48
<       1 - std::max<Scalar>(0, std::max<Scalar>(_tmp10, std::max<Scalar>(_tmp7, _tmp8))),
<       1 - std::max<Scalar>(0, -(((_tmp3 + _tmp9) > 0) - ((_tmp3 + _tmp9) < 0))));
---
>       1 - std::max<Scalar>(0, std::max<Scalar>(_tmp10, std::max<Scalar>(_tmp8, _tmp9))),
>       1 - std::max<Scalar>(0, -(((_tmp4 + _tmp6) > 0) - ((_tmp4 + _tmp6) < 0))));
53,54c54,55
<   _res(0, 0) = (Scalar(1) / Scalar(2)) * _tmp7 * (1 - _tmp7);
<   _res(1, 0) = (Scalar(1) / Scalar(2)) * _tmp8 * (1 - _tmp8);
---
>   _res(0, 0) = (Scalar(1) / Scalar(2)) * _tmp8 * (1 - _tmp8);
>   _res(1, 0) = (Scalar(1) / Scalar(2)) * _tmp9 * (1 - _tmp9);
  1. Run the C++ test code and evaluate at time=3.01:
#include <fmt/format.h>

#include "output/cpp/symforce/sym/test_function.h"

int main() {
  const double time = 3.01;
  auto v = sym::TestFunction(time);
  fmt::print("time = {:>6}, v = {:>6}, {:>6}, {:>6}, {:>6}\n", time, v[0], v[1],
             v[2], v[3]);
  return 0;
}
  1. Observe that the numerical output is non-trivially different:
Mac version (resultant quaternion has zero norm):
time =   3.01, v =      0,      0,      0,      0
Linux version (resultant quaternion matches python code):
time =   3.01, v =      0,      0, -0.7052531588616179, 0.7089555570807733

NOTE: In this particular case, there may additionally be a bug in Rot3.from_rotation_matrix - the epsilon value is unused. This example just highlights how unpredictable behavior emerges from this issue.

Expected behavior
Ideally order of operations would be identical on all operating systems.

Environment

  • OS and version: Ubuntu 22.04, OSX Ventura
  • Python version: 3.8
  • SymForce Version 0.8
@gareth-cross gareth-cross added the bug Something isn't working label Jun 7, 2023
@aaron-skydio
Copy link
Member

aaron-skydio commented Jun 7, 2023

This is a weird one - we had similar issues in the past when SymEngine was built with different implementations of std::unordered_map. I'm a little surprised we haven't seen this, we should definitely fix

NOTE: In this particular case, there may additionally be a bug in Rot3.from_rotation_matrix - the epsilon value is unused. This example just highlights how unpredictable behavior emerges from this issue.

Yep, good callout - this is not a bug but is definitely surprising - our current implementation of from_rotation_matrix doesn't require an epsilon, but we still have the argument mostly to not break the API. In general we aim for our generated code to be identical everywhere, and treat it as a bug if it's not, regardless of whether evaluated numerical results are meaningfully different

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants