Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make parsing locale independent if possible #1567

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

isuruf
Copy link
Member

@isuruf isuruf commented Jul 20, 2019

Fixes #1566

@isuruf
Copy link
Member Author

isuruf commented Jul 20, 2019

Looks like strtod_l/_strtod_l is available on linux, osx, mingw-w64, msvc15 environments. (Not on mingw though)

@certik
Copy link
Contributor

certik commented Jul 20, 2019

Thanks @isuruf. It looks quite complicated. Are we talking about converting "2.5" to 2.5? It seems it would be more robust and perhaps even faster to simply maintain our own conversion routine.

@isuruf
Copy link
Member Author

isuruf commented Jul 20, 2019

Are we talking about converting "2.5" to 2.5?

Yes, it also includes scientific notation and some edge cases like inf, nan. I don't know if it will be easy to maintain our conversion routine, but it'll certainly be slower.

@certik
Copy link
Contributor

certik commented Jul 20, 2019

It won't be slower if we take it from some of the libc implementations. In fact it can be faster, because it does not need to check the locale.

@isuruf
Copy link
Member Author

isuruf commented Jul 20, 2019

Right. We can't get it from glibc due to license. Let me have a look at what musl does.

res = parse(s);
REQUIRE(eq(*res, *real_double(2.5)));
std::setlocale(LC_NUMERIC, "C");
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we only run this tests when HAVE_STRTOD_L is defined? Is there a reason not to run this always?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strtod_l is not in C standard and almost all platforms I've checked supports it. Linux, OSX, MinGW-w64 and MSVC15 supports it. Only MinGW 32-bit platform (MinGW-w64 32-bit supports it) doesn't support it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but what happens if the platform does not support it? It means the test will fail?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. That's why the test is only run if HAVE_STRTOD_L is defined.

Copy link
Contributor

@certik certik Jul 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can extract it into the symengine_strtod.h header file, merge this PR, and then substitute symengine_strtod.h for our own implementation later and change this test to run everywhere.

#else
return strtod(startptr, endptr);
#endif
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should probably go into its own header, say symengine_strtod.h. So that if we provide our own implementation in the future, it is easy to change.

#include <locale.h>
#endif
#endif

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be moved to symengine_strtod.h.

@certik
Copy link
Contributor

certik commented Jul 22, 2019

Also note that this strtod is not just passed any floating point. It is passed in precisely a floating point in this form:

numeric ({dig}*\.?{dig}+([eE][-+]?{dig}+)?)|({dig}+\.)

We should split the float and int versions, so the float is just ({dig}*\.?{dig}+([eE][-+]?{dig}+)?). I wonder if this can be converted already in the scanner. That would make the most sense, there might be a way to do this very quickly: for example, the code in the scanner currently contains:

{numeric}           {
                        *dval = std::string(matched());
                        return Parser::NUMERIC;
                    }

So I wonder if there is a way to extract the parts of the floating point from the regular expression matcher (perhaps if we use some parenthesis like in Python) and simply convert it right there. That will be the most robust and probably also the fastest.

@isuruf
Copy link
Member Author

isuruf commented Jul 23, 2019

strtod is also passed a IMPLICIT_MUL token as in

implicitmul ({numeric}{ident})

@certik
Copy link
Contributor

certik commented Jul 30, 2019

Here is how to parse floating point: https://github.com/skvadrik/re2c/blob/f04d430876a5f65d025c7a6aaca47184520e277a/examples/07_cxx98.i.re#L133, it's very simple and should be very fast.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

parsing of doubles depends on locale
2 participants