-
Notifications
You must be signed in to change notification settings - Fork 270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make parsing locale independent if possible #1567
base: master
Are you sure you want to change the base?
Conversation
Looks like strtod_l/_strtod_l is available on linux, osx, mingw-w64, msvc15 environments. (Not on mingw though) |
Thanks @isuruf. It looks quite complicated. Are we talking about converting "2.5" to 2.5? It seems it would be more robust and perhaps even faster to simply maintain our own conversion routine. |
Yes, it also includes scientific notation and some edge cases like inf, nan. I don't know if it will be easy to maintain our conversion routine, but it'll certainly be slower. |
It won't be slower if we take it from some of the libc implementations. In fact it can be faster, because it does not need to check the locale. |
Right. We can't get it from glibc due to license. Let me have a look at what musl does. |
res = parse(s); | ||
REQUIRE(eq(*res, *real_double(2.5))); | ||
std::setlocale(LC_NUMERIC, "C"); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we only run this tests when HAVE_STRTOD_L
is defined? Is there a reason not to run this always?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strtod_l
is not in C standard and almost all platforms I've checked supports it. Linux, OSX, MinGW-w64 and MSVC15 supports it. Only MinGW 32-bit platform (MinGW-w64 32-bit supports it) doesn't support it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but what happens if the platform does not support it? It means the test will fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. That's why the test is only run if HAVE_STRTOD_L
is defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. I would really ship our own implementation of this very important conversion, that works everywhere. Here is how musl does it:
https://github.com/ifduyue/musl/blob/79f653c6bc2881dd6855299c908a442f56cb7c2b/src/stdlib/strtod.c#L6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can extract it into the symengine_strtod.h
header file, merge this PR, and then substitute symengine_strtod.h
for our own implementation later and change this test to run everywhere.
#else | ||
return strtod(startptr, endptr); | ||
#endif | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should probably go into its own header, say symengine_strtod.h
. So that if we provide our own implementation in the future, it is easy to change.
#include <locale.h> | ||
#endif | ||
#endif | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be moved to symengine_strtod.h
.
Also note that this symengine/symengine/parser/scanner.ll Line 13 in f7e253f
We should split the float and int versions, so the float is just ({dig}*\.?{dig}+([eE][-+]?{dig}+)?) . I wonder if this can be converted already in the scanner. That would make the most sense, there might be a way to do this very quickly: for example, the code in the scanner currently contains:
{numeric} {
*dval = std::string(matched());
return Parser::NUMERIC;
} So I wonder if there is a way to extract the parts of the floating point from the regular expression matcher (perhaps if we use some parenthesis like in Python) and simply convert it right there. That will be the most robust and probably also the fastest. |
symengine/symengine/parser/scanner.ll Line 14 in b699c32
|
Here is how to parse floating point: https://github.com/skvadrik/re2c/blob/f04d430876a5f65d025c7a6aaca47184520e277a/examples/07_cxx98.i.re#L133, it's very simple and should be very fast. |
Fixes #1566