Make parsing locale independent if possible #1567

isuruf · 2019-07-20T14:33:59Z

isuruf · 2019-07-20T18:19:43Z

Looks like strtod_l/_strtod_l is available on Linux, OSX, MinGW-w64 and MSVC15 environments (but not on the original MinGW).

certik · 2019-07-20T19:35:08Z

Thanks @isuruf. It looks quite complicated. Are we talking about converting "2.5" to 2.5? It seems it would be more robust and perhaps even faster to simply maintain our own conversion routine.

isuruf · 2019-07-20T20:13:21Z

> Are we talking about converting "2.5" to 2.5?

Yes, it also includes scientific notation and some edge cases like inf, nan. I don't know if it will be easy to maintain our conversion routine, but it'll certainly be slower.

certik · 2019-07-20T20:41:35Z

It won't be slower if we take it from some of the libc implementations. In fact it can be faster, because it does not need to check the locale.

isuruf · 2019-07-20T23:06:30Z

Right. We can't get it from glibc due to license. Let me have a look at what musl does.

certik · 2019-07-22T16:22:09Z

symengine/tests/basic/test_parser.cpp

+    res = parse(s);
+    REQUIRE(eq(*res, *real_double(2.5)));
+    std::setlocale(LC_NUMERIC, "C");
+#endif


Why do we only run this test when HAVE_STRTOD_L is defined? Is there a reason not to run it always?

strtod_l is not in the C standard, but almost all platforms I've checked support it: Linux, OSX, MinGW-w64 and MSVC15. Only the original 32-bit MinGW lacks it (32-bit MinGW-w64 does support it).

Yes, but what happens if the platform does not support it? Does that mean the test will fail?

Yes. That's why the test is only run if HAVE_STRTOD_L is defined.

Hm. I would really ship our own implementation of this very important conversion, that works everywhere. Here is how musl does it:

https://github.com/ifduyue/musl/blob/79f653c6bc2881dd6855299c908a442f56cb7c2b/src/stdlib/strtod.c#L6

https://github.com/ifduyue/musl/blob/b4b1e10364c8737a632be61582e05a8d3acf5690/src/internal/floatscan.c#L429

We can extract it into the symengine_strtod.h header file, merge this PR, and later replace the contents of symengine_strtod.h with our own implementation and change this test to run everywhere.

certik · 2019-07-22T16:24:39Z

symengine/parser/parser.ih

+#else
+    return strtod(startptr, endptr);
+#endif
+}


I think this should probably go into its own header, say symengine_strtod.h. So that if we provide our own implementation in the future, it is easy to change.
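A minimal sketch of what such a header could look like, not the code from this PR. The function name `parse_double_c_locale` is hypothetical, and the platform detection below is an assumption made so the example is self-contained; in the PR, HAVE_STRTOD_L would presumably come from the build system instead.

```cpp
// Hypothetical symengine_strtod.h-style wrapper (illustration only).
#include <cassert>
#include <cstdlib>
#include <locale.h>
#if defined(__APPLE__)
#include <xlocale.h> // strtod_l and newlocale live here on OSX
#endif

// Assumption for this sketch: guess availability from compiler macros.
// A real build would define HAVE_STRTOD_L from a configure-time check.
#if !defined(HAVE_STRTOD_L)                                                  \
    && (defined(__GLIBC__) || defined(__APPLE__) || defined(_MSC_VER))
#define HAVE_STRTOD_L 1
#endif

// Parse a double using the "C" numeric locale when a *_l variant exists;
// otherwise fall back to plain strtod, which depends on the current locale.
inline double parse_double_c_locale(const char *startptr, char **endptr)
{
    assert(startptr != nullptr);
#if defined(HAVE_STRTOD_L) && defined(_MSC_VER)
    // Created once and never freed; acceptable for a sketch.
    static _locale_t c_locale = _create_locale(LC_NUMERIC, "C");
    if (c_locale != nullptr)
        return _strtod_l(startptr, endptr, c_locale);
    return std::strtod(startptr, endptr);
#elif defined(HAVE_STRTOD_L)
    // Created once and never freed; acceptable for a sketch.
    static locale_t c_locale = newlocale(LC_NUMERIC_MASK, "C", (locale_t)0);
    if (c_locale != (locale_t)0)
        return strtod_l(startptr, endptr, c_locale);
    return std::strtod(startptr, endptr);
#else
    return std::strtod(startptr, endptr);
#endif
}
```

With everything behind one function, the parser only ever calls `parse_double_c_locale`, so replacing the body with a hand-written routine later would not touch any call site.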

certik · 2019-07-22T16:25:08Z

symengine/parser/parser.ih

+#include <locale.h>
+#endif
+#endif
+


These should be moved to symengine_strtod.h.

certik · 2019-07-22T17:11:51Z

Also note that strtod is not passed arbitrary floating-point input here. It is passed exactly a token of this form:

symengine/symengine/parser/scanner.ll

Line 13 in f7e253f

numeric ({dig}*\.?{dig}+([eE][-+]?{dig}+)?)|({dig}+\.)

We should split the float and int versions, so the float is just ({dig}*\.?{dig}+([eE][-+]?{dig}+)?). I wonder if this can be converted already in the scanner. That would make the most sense, and there might be a way to do it very quickly. For example, the code in the scanner currently contains:

{numeric}           {
                        *dval = std::string(matched());
                        return Parser::NUMERIC;
                    }

So I wonder if there is a way to extract the parts of the floating point number from the regular expression matcher (perhaps using parenthesized groups, as in Python) and simply convert it right there. That would be the most robust and probably also the fastest.
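As a rough illustration of converting the token by hand, here is a sketch that assumes its input has already been matched by the `numeric` rule, so it handles no sign, whitespace, inf/nan or locale. The helper name `numeric_token_to_double` is hypothetical. The accumulate-and-scale approach is also not guaranteed to be correctly rounded for long mantissas or large exponents, which is the hard part that a routine like musl's floatscan.c takes care of.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>

// Hypothetical helper: converts text of the form
//   {dig}*\.?{dig}+([eE][-+]?{dig}+)?  or  {dig}+\.
// The decimal point is always '.', independent of the current locale.
inline double numeric_token_to_double(const std::string &tok)
{
    auto is_digit = [](char c) { return c >= '0' && c <= '9'; };
    const std::size_t n = tok.size();
    std::size_t i = 0;
    double mantissa = 0.0; // all digits read so far, as an integer value
    int exp10 = 0;         // power of ten to apply at the end

    // Integer part.
    for (; i < n && is_digit(tok[i]); ++i)
        mantissa = mantissa * 10.0 + (tok[i] - '0');

    // Fractional part: each digit shifts the decimal exponent down by one.
    if (i < n && tok[i] == '.') {
        ++i;
        for (; i < n && is_digit(tok[i]); ++i) {
            mantissa = mantissa * 10.0 + (tok[i] - '0');
            --exp10;
        }
    }

    // Optional exponent.
    if (i < n && (tok[i] == 'e' || tok[i] == 'E')) {
        ++i;
        bool negative = false;
        if (i < n && (tok[i] == '+' || tok[i] == '-')) {
            negative = (tok[i] == '-');
            ++i;
        }
        int e = 0;
        for (; i < n && is_digit(tok[i]); ++i) {
            if (e < 10000) // clamp to avoid int overflow on absurd input
                e = e * 10 + (tok[i] - '0');
        }
        exp10 += negative ? -e : e;
    }

    assert(i == n); // the scanner should have matched the whole token

    // Dividing by a positive power of ten is more accurate than
    // multiplying by a negative one, since 0.1 is not exact in binary.
    return exp10 < 0 ? mantissa / std::pow(10.0, -exp10)
                     : mantissa * std::pow(10.0, exp10);
}
```

In a scanner action, the idea would be to call something like this on `matched()` and store the resulting double instead of the string.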

isuruf · 2019-07-23T04:46:01Z

strtod is also passed an IMPLICIT_MUL token, as in

symengine/symengine/parser/scanner.ll

Line 14 in b699c32

implicitmul ({numeric}{ident})
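To illustrate why the end pointer matters for this token, here is a small sketch of splitting an implicit-multiplication token such as "2.5x" into its numeric prefix and identifier with standard `std::strtod`. The helper name `split_implicit_mul` is hypothetical and this is not necessarily how the parser does it.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <utility>

// Hypothetical helper: strtod stops at the first character that cannot be
// part of a number, so `end` points at the start of the identifier.
inline std::pair<double, std::string>
split_implicit_mul(const std::string &tok)
{
    const char *start = tok.c_str();
    char *end = nullptr;
    double coeff = std::strtod(start, &end);
    assert(end != start); // the token must begin with a number
    return {coeff, std::string(end)};
}
```

Any replacement for strtod therefore has to report where the number ends, not just its value. One caveat, as far as I can tell: strtod accepts more than the scanner's `numeric` rule (for example hexadecimal input on C99-conforming libraries), so the two can disagree about where the numeric prefix stops.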

certik · 2019-07-30T21:59:49Z

Here is how to parse floating point: https://github.com/skvadrik/re2c/blob/f04d430876a5f65d025c7a6aaca47184520e277a/examples/07_cxx98.i.re#L133. It's very simple and should be very fast.

isuruf added 4 commits July 20, 2019 09:32

Make parsing locale independent if possible

83386ba

Use LC_NUMERIC

ef3f68c

Use _create_locale on windows

3d0d4ab

Fix for mingw-w64

52d03a8

