Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Floating Point Exception in Actions build process #475

Open
1 of 3 tasks
bcoconni opened this issue Jul 11, 2021 · 11 comments · Fixed by #477
Open
1 of 3 tasks

Floating Point Exception in Actions build process #475

bcoconni opened this issue Jul 11, 2021 · 11 comments · Fixed by #477
Labels

Comments

@bcoconni
Copy link
Member

bcoconni commented Jul 11, 2021

I'm submitting a ...

  • bug report
  • feature request
  • support request => Please do not submit support request here

Describe the issue
The build for MSVC repeatedly fails while executing the test CheckScripts.py and more specifically while running the script c172_runway_at_rest_cg_shift.xml.

What is the current behavior?
The test reports an FPE (Floating Point Exception) which seems to be located in the waypoint/geodetic computations code.
Here is the stack trace as reported by the test failure:

#11   Source "D:\a\jsbsim\jsbsim\build\python\jsbsim.cxx", line 14008, in __pyx_pf_6jsbsim_9FGFDMExec_10run [00007FFF462BEB9D]
       14005:  */
       14006:   __Pyx_XDECREF(__pyx_r);
       14007:   try {
      >14008:     __pyx_t_1 = __pyx_v_self->thisptr->Run();
       14009:   } catch(...) {
       14010:     convertJSBSimToPyExc(); if (!PyErr_Occurred())PyErr_SetString(PyExc_RuntimeError, "Error converting c++ exception.");
       14011:     __PYX_ERR(0, 579, __pyx_L1_error)
#10   Source "D:\a\jsbsim\jsbsim\src\FGFDMExec.cpp", line 423, in JSBSim::FGFDMExec::Run [00007FFF46310F82]
        420:   // returns true if success, false if complete
        421:   if (Script && !IntegrationSuspended()) success = Script->RunScript();
        422: 
      > 423:   for (unsigned int i = 0; i < Models.size(); i++) {
        424:     LoadInputs(i);
        425:     Models[i]->Run(holding);
        426:   }
#9    Source "D:\a\jsbsim\jsbsim\src\models\FGFCS.cpp", line 171, in JSBSim::FGFCS::Run [00007FFF4637492E]
        168:   for (i=0; i<SystemChannels.size(); i++) {
        169:     if (debug_lvl & 4) cout << "    Executing System Channel: " << SystemChannels[i]->GetName() << endl;
        170:     ChannelRate = SystemChannels[i]->GetRate();
      > 171:     SystemChannels[i]->Execute();
        172:   }
        173:   ChannelRate = 1;
#8    Source "D:\a\jsbsim\jsbsim\src\models\flight_control\FGWaypoint.cpp", line 185, in JSBSim::FGWaypoint::Run [00007FFF463DB519]
        182:     double heading_to_waypoint_rad = source.GetHeadingTo(target_longitude_rad,
        183:                                                          target_latitude_rad);
        184: 
      > 185:     if (eUnit == eDeg) Output = heading_to_waypoint_rad * radtodeg;
        186:     else               Output = heading_to_waypoint_rad;
        187: 
        188:   } else {                            // Calculate Distance
#7    Source "D:\a\jsbsim\jsbsim\src\math\FGLocation.cpp", line 402, in JSBSim::FGLocation::GetHeadingTo [00007FFF4636379A]
        399:   geod.Inverse(mGeodLat * radtodeg, mLon * radtodeg, target_latitude * radtodeg,
        400:                target_longitude * radtodeg, heading, azimuth2);
        401: 
      > 402:   return heading * degtorad;
        403: }
        404: 
        405: //%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
#6    Source "D:\a\jsbsim\jsbsim\src\GeographicLib\Geodesic.cpp", line 500, in GeographicLib::Geodesic::GenInverse [00007FFF463C1C22]
        497:                                   real& S12) const {
        498:     outmask &= OUT_MASK;
        499:     real salp1, calp1, salp2, calp2,
      > 500:       a12 =  GenInverse(lat1, lon1, lat2, lon2,
        501:                         outmask, s12, salp1, calp1, salp2, calp2,
        502:                         m12, M12, M21, S12);
        503:     if (outmask & AZIMUTH) {
#5    Source "D:\a\jsbsim\jsbsim\src\GeographicLib\Geodesic.cpp", line 400, in GeographicLib::Geodesic::GenInverse [00007FFF463C1756]
        397:                   cbet1, cbet2, lengthmask, s12x, m12x, dummy, M12, M21, Ca);
        398:         }
        399:         m12x *= _b;
      > 400:         s12x *= _b;
        401:         a12 = sig12 / Math::degree();
        402:         if (outmask & AREA) {
        403:           // omg12 = lam12 - domg12
#4    Object "", at 00007FFF7E50379E, in KiUserExceptionDispatcher
#3    Object "", at 00007FFF7E464CEF, in RtlWalkFrameChain
#2    Object "", at 00007FFF7E504A2F, in _chkstk
#1    Object "", at 00007FFF484FEBC0, in _C_specific_handler
Attempt to access invalid address.
#0    Object "", at 00007FF609A224CC, in _C_specific_handler
FAIL
ERROR

======================================================================
ERROR: testScripts (__main__.CheckScripts)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\jsbsim\jsbsim\tests\JSBSim_utils.py", line 156, in tearDown
    self.sandbox.erase()
  File "D:\a\jsbsim\jsbsim\tests\JSBSim_utils.py", line 56, in erase
    shutil.rmtree(self._tmpdir)
  File "C:\hostedtoolcache\windows\Python\3.9.5\x64\lib\shutil.py", line 740, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "C:\hostedtoolcache\windows\Python\3.9.5\x64\lib\shutil.py", line 618, in _rmtree_unsafe
    onerror(os.unlink, fullname, sys.exc_info())
  File "C:\hostedtoolcache\windows\Python\3.9.5\x64\lib\shutil.py", line 616, in _rmtree_unsafe
    os.unlink(fullname)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'D:\\a\\jsbsim\\jsbsim\\build\\tests\\tmpbzkk6123\\JSBout172B.csv'

======================================================================
FAIL: testScripts (__main__.CheckScripts)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\jsbsim\jsbsim\tests\CheckScripts.py", line 39, in testScripts
    ExecuteUntil(fdm, 30.)
fpectl.FloatingPointError: Caught signal SIGFPE in JSBSim

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\a\jsbsim\jsbsim\tests\CheckScripts.py", line 43, in testScripts
    self.fail("Script %s failed:\n%s" % (s, e.args[0]))
AssertionError: Script ..\..\..\scripts\c172_runway_at_rest_cg_shift.xml failed:
Caught signal SIGFPE in JSBSim

What is the expected behavior?
All the tests to run successfully.

What is the motivation / use case for changing the behavior?
N/A

Please tell us about your environment:
OS: GitHub Actions windows-latest host runner
JSBSim: commit 11ea52e from the master branch.
Python: 3.6

Other information
None

@bcoconni bcoconni added the bug label Jul 11, 2021
@seanmcleod
Copy link
Member

When I run the scripts\c172_runway_at_rest_cg_shift.xml locally here on Windows I don't see the issue. I think I saw you making a comment on one of the commits that you also couldn't reproduce it locally on Windows?

PS C:\source\jsbsim> .\Release\JSBSim.exe --script=scripts\c172_runway_at_rest_cg_shift.xml
...
...
------------------------------------------------------------------
State Report at sim time: 0.000000 seconds
  Position
    ECI:   -1329521.725534581, -14711849.15995605, 14771801.97958151 (x,y,z, in ft)
    ECEF:  -1329521.725535 , -14711849.159956 , 14771801.979582 (x,y,z, in ft)
    Local: 45.192423, -95.163839, 4.600026 (geodetic lat, lon, alt ASL in deg and ft)

  Orientation
    ECI:   -180, -44.80757682585688, 84.836161 (phi, theta, psi in deg)
    Local: -5.264532060613332e-16, -1.590277340731758e-15, 8.20496608763031e-15 (phi, theta, psi in deg)

  Velocity
    ECI:   1072.80495937053, -96.95025317596601, 0 (x,y,z in ft/s)
    ECEF:  0, 0, 0 (x,y,z in ft/s)
    Local: 0.000000 , 0.000000 , 0.000000 (n,e,d in ft/sec)
    Body:  0.000000 , 0.000000 , 0.000000 (u,v,w in ft/sec)

  Body Rates (relative to given frame, expressed in body frame)
    ECI:   0.002944405985131796, -4.638594100204067e-19, -0.002964249795347291 (p,q,r in deg/s)
    ECEF:  0, 0, 0 (p,q,r in deg/s)

---- JSBSim Execution beginning ... --------------------------------------------

Start: Sunday July 11 2021 21:44:18 (HH:MM:SS)
0: GEAR_CONTACT: 0 seconds: Nose Gear 1
1: GEAR_CONTACT: 0.091663 seconds: Left Main Gear 1
2: GEAR_CONTACT: 0.091663 seconds: Right Main Gear 1

Shift weight (Event 0) executed at time: 1.508273

End: Sunday July 11 2021 21:44:18 (HH:MM:SS)
PS C:\source\jsbsim>

@bcoconni
Copy link
Member Author

bcoconni commented Jul 11, 2021

I think I saw you making a comment on one of the commits that you also couldn't reproduce it locally on Windows?

Indeed, the test completes successfully for me on Linux and Windows. It only fails when the test is executed on GitHub host runners for MSVC. Most likely there is a combination of criteria that are met on GitHub and which do not exist on our PC. May be the version of Python ? It's 3.6 on GitHub while being 3.7 on my Windows PC and 3.9 on my Linux PC.

At the moment, there is no reason to discard this problem as being a mere coincidence. Especially since the failure is reproducing repeatedly at each build which IMHO indicates that it is a genuine bug.

@bcoconni
Copy link
Member Author

At some point later this week, I intend to run scripts\c172_runway_at_rest_cg_shift.xml with a debugger and check if the variables s12x or m12x are close to values that could trigger an FPE: either extremely small or extremely huge. A division by zero or an illegal value seem to be excluded at first since the FPE is raised during multiplications.

@seanmcleod
Copy link
Member

Hmm, I wasn't running it from python, I ran jsbsim.exe directly. Just trying python CheckScripts.py now it doesn't look like fpectl is a standard installed module?

PS C:\source\jsbsim\tests> python .\CheckScripts.py
Traceback (most recent call last):
  File ".\CheckScripts.py", line 23, in <module>
    import fpectl
ModuleNotFoundError: No module named 'fpectl'

Also looking at CheckScipts.py I'm a bit confused, I don't see c172_runway_at_rest_cg_shift.xml referenced?

for s in self.script_list(['737_cruise_steady_turn_simplex.xml']):

@bcoconni
Copy link
Member Author

Just trying python CheckScripts.py now it doesn't look like fpectl is a standard installed module?

The command python CheckScripts.py should be executed from the tests folder where the Python modules jsbsim and fpectl have been built: it should be something like build/Release/tests, build/RelWithDebInfo/tests or build/Debug/test.

Also looking at CheckScipts.py I'm a bit confused, I don't see c172_runway_at_rest_cg_shift.xml referenced?

Yeah, the test API is poorly designed (who read the tests anyway ? 😄): the list provided to self.script_list is actually a blacklist i.e. the list of scripts to be excluded from the test.

def script_list(self, blacklist=[]):
script_path = self.sandbox.path_to_jsbsim_file('scripts')
for f in os.listdir(script_path):
if f in blacklist:
continue
fullpath = os.path.join(script_path, f)
# Does f contain a JSBSim script ?
if CheckXMLFile(fullpath, 'runscript'):
yield fullpath

I guess the line of code would be clearer if it would be written like so:

- for s in self.script_list(['737_cruise_steady_turn_simplex.xml']):
+ for s in self.script_list(blacklist=['737_cruise_steady_turn_simplex.xml']):

@seanmcleod
Copy link
Member

Ah, I did run python CheckScripts.py in the tests folder, however I hadn't performed a cmake build, just the Visual Studio build of the project which only builds jsbsim.exe.

I guess the line of code would be clearer if

Yep, especially while taking a quick glance at the code while watching the Euros soccer final at the same time! 😉

@bcoconni
Copy link
Member Author

Of course, when running the test from build/.../tests you need to give the relative path to python: something like

> python ../../../tests/CheckScripts.py

In order to save myself the trouble of all these dots, I generally use the ctest command from the build directory root (i.e. build/Debug, build/Release, etc.)

> ctest -R CheckScript --output-on-failure

The last parameter --output-on-failure to make sure that the complete test log is only dumped if there is a failure in the test.

bcoconni added a commit that referenced this issue Jul 14, 2021
GeographicLib v1.52 contains the following update:
* Geodesic routines: be more aggressive in preventing negative s12 and m12 for short lines
@bcoconni
Copy link
Member Author

It finally happened to be a genuine bug which was located is our dependency GeographicLib. It has been fixed in the release v1.52 of GeographicLib:

  • Geodesic routines: be more aggressive in preventing negative s12 and m12 for short lines (all languages).

However, in its release v1.51, GeographicLib switched to C++11 math routines rather than using their own:

  • C++11 compiler required for C++ library. As a consequence:
    • The workaround implementations for C++11 routines (Math::hypot, Math::expm1, Math::log1p, Math::asinh, Math::atanh, Math::copysign, Math::cbrt, Math::remainder, Math::remquo, Math::round, Math::lround, Math::fma, Math::isfinite, and Math::isnan) are now deprecated. Just use the versions in the std:: namespace instead.
    • SphericalEngine class, fix the namespace for using streamoff.
    • Some templated functions, e.g., Math::degree(), now have default template parameters, T = Math::real.

So it may be that Microsoft (i.e. stdlib) C++11 math functions are better handling the subtleties of floating point arithmetic than GeographicLib's !?!

sthagen added a commit to sthagen/JSBSim-Team-jsbsim that referenced this issue Jul 14, 2021
Update GeographicLib to version 1.52 to fix issue JSBSim-Team#475. (JSBSim-Team#477)
@seanmcleod
Copy link
Member

So how come I didn't see the error when I ran .\Release\JSBSim.exe --script=scripts\c172_runway_at_rest_cg_shift.xml?

Since I wasn't running it via the python test, rather running the JSBSim.exe were the set of floating point exceptions that are enabled different?

@bcoconni
Copy link
Member Author

Floating point exceptions are only raised when the Python module fpectl is used (which function turnon_sigfpe sets some CPU flags to send a signal SIGFPE whenever an FPE occurs). Otherwise, FPEs are plainly and simply ignored.

@bcoconni
Copy link
Member Author

There has just been another occurrence of the bug on the Linux runner: same stack trace than above, script scripts/c1724.xml.
It smells like there is an uninitialized variable somewhere in GeographicLib code.

@bcoconni bcoconni reopened this Jul 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants