Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduced numerical accuracy in cythonized thermo.py integrals #20

Closed
gmagoon opened this issue Feb 15, 2010 · 13 comments
Closed

Reduced numerical accuracy in cythonized thermo.py integrals #20

gmagoon opened this issue Feb 15, 2010 · 13 comments

Comments

@gmagoon
Copy link
Contributor

gmagoon commented Feb 15, 2010

Numerical error seems to accumulate more readily within cythonized integrals and/or using computations based on results of cythonized integrals. As a result, when run in cython mode, the integral of squared errors can sometimes be substantially negative, when, in fact, the result must be non-negative. Example cases that cause such problems appear in thermotest.py unit tests. These cases do not cause such problems when run in pure-python mode. (There is also some evidence that the cython results are platform-dependent.)

@gmagoon
Copy link
Contributor Author

gmagoon commented Feb 15, 2010

Results of cythonized integrals differ from pure-python mode results as soon as the 7th digit. (actual differences can be seen by running thermotest.py in cython mode)

@gmagoon
Copy link
Contributor Author

gmagoon commented Feb 16, 2010

Exact value for testWilhoitIntegralT0 = 37743349890545371795103/1251583950064471569000
Numeric version of exact value (to 17 digits) = 30.156466842356949
Result of code in pure-python mode = 30.156466842356952 (differs in 16th digit)
Result of code with cython = 30.156465590000153 (differs in 8th digit)

Thus, it seems that there is substantial error WITHIN the cythonized integrals. Pure-python mode seems to be OK. Maybe the cythonized case is only using single precision?

@gmagoon
Copy link
Contributor Author

gmagoon commented Feb 16, 2010

Numbers copied from above to align for easy comparison:
30.156466842356949
30.156466842356952 (differs in 16th digit)
30.156465590000153 (differs in 8th digit)

@athlonshi
Copy link

If cythonized case only use single precision by default, can we force it to double precision by defining variables?

@rwest
Copy link
Member

rwest commented Feb 16, 2010

It should certainly be possible to declare a variable as a double. hopefully this helps, and if so, we should make a note in developer documentation!

play around with things like:

cython.declare(a=cython.float, b=cython.float)
cython.declare(a=cython.double, b=cython.double)

@gmagoon
Copy link
Contributor Author

gmagoon commented Feb 16, 2010

Thanks for the suggestions...I was under the impression that python float (which we use) corresponded to double in C, but I'll look into it some more tomorrow.

@gmagoon
Copy link
Contributor Author

gmagoon commented Feb 16, 2010

I tried changing the cython.float to cython.double in thermo.py, and it does seem to change the results, but doesn't seem to improve them...I now get the following for the test case:
30.15646755695343 (differs in 8th digit)
I can also try changing "float" to "double" in "thermo.pxd".

@rwest
Copy link
Member

rwest commented Feb 16, 2010

You can check the thermo.html file that is created by cython (or used to be, I assume the current setup.py script still does it?) to see exactly what C code is being created for those functions. Scroll to the relevant lines then click to reveal the C code.

@gmagoon
Copy link
Contributor Author

gmagoon commented Feb 16, 2010

OK, I also changed "float" to "double" in "thermo.pxd" and results are promising:
30.156466842356949 (equivalent in all 17 digits)

@gmagoon
Copy link
Contributor Author

gmagoon commented Feb 16, 2010

Thanks for your helpful suggestions! (FWIW, I don't seem to get a thermo.html file.)

@rwest
Copy link
Member

rwest commented Feb 16, 2010

I have to wonder, do we really need 16 significant figures here? ;-)

@gmagoon
Copy link
Contributor Author

gmagoon commented Feb 16, 2010

I'll try to clarify the issues we were encountering and elaborate in greater detail:

In the original case, (apparently single precision) I would get -0.00022270 for testObjectiveFunctionForOxygen while pure-python mode (double precision) gives a much more reasonable value of 0.00018295. (Note that this is an integral of squared errors the exact result should be non-negative, and in this case, non-zero.) Even if we don't care about error metrics, some results on Linux for O2 suggested that the actual Wilhoit fit parameters could be significantly different...although the Wilhoit polynomials agreed well in the range 300-1500 K, there was substantial difference below 300 K.

The thing is, the numerical error accumulates during the arithmetic and, at the end of many arithmetic operations (particularly things like subtracting very close values), the accuracy can be significantly lower than 1 part in 10^8 with single precision.

Also, based on some of the unit test results Josh sent me, it seemed that the single precision arithmetic exacerbated platform-dependent numerical issues, giving significantly different results on his system vs. mine (for the integral of squared errors test). Of course, even on the same platform, there is the issue that running in cython-mode and pure-python mode gives significantly different results if you are using single-precision in one case and double-precision in the other.

So, I would argue that the answer to your question is: "No, we do not need 16 significant figures, but double precision is highly desirable over single precision."

@gmagoon
Copy link
Contributor Author

gmagoon commented Feb 16, 2010

changed "float" to "double" in thermo.pxd

in conjunction with my previous commit, I believe this closed by 0ed788f

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants