Question: parsing strings containing two related values #138

rowlesmr · 2022-08-24T06:57:20Z

I'm writing a library for scientific data management, and I need to parse strings such as "+12.345e-02(13)"**, which encodes the value 0.12345 and the error 0.00013.

I can see how to adapt your library to cope with the leading +. How difficult do you think it would be to extend the parsing to include the bracketed digits? For now, I only need to go from char to double.

I don't know if my code would be worthy of putting back into the library, but I can make everything available.

.

** In the grammar I'm following:
SIGN = [+-]
DIGIT = [0-9]
UINT = DIGIT+
INT = SIGN? UINT
EXP = [eE] INT
FLOAT = INT EXP | SIGN? DIGIT* '.' UINT EXP? | INT '.' EXP?
NUMB = INT | FLOAT
NUMERIC = NUMB | NUMB '(' UINT ')'

+ = 1 or more of
* = 0 or more of
? = 0 or 1 of
| = or
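
As a quick way to check candidate strings against the grammar above, here is a minimal sketch that collapses NUMERIC into a single regular expression. The function name `is_numeric` is hypothetical and the regex is my own transcription of the rules, not something taken from the CIF specification or from this library.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `s` matches NUMERIC from the grammar above.
//   [+-]?                         optional SIGN
//   [0-9]+(\.[0-9]*)? | \.[0-9]+  INT, INT '.', or DIGIT* '.' UINT
//   ([eE][+-]?[0-9]+)?            optional EXP
//   (\([0-9]+\))?                 optional '(' UINT ')'
bool is_numeric(const std::string& s) {
    static const std::regex numeric(
        R"re([+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?(?:\([0-9]+\))?)re");
    return std::regex_match(s, numeric);
}
```

This only validates the shape of the string; it does not extract the value or the bracketed digits.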


lemire · 2022-08-24T11:58:49Z

Is that a known standard? I am not familiar with your notation.

rowlesmr · 2022-08-24T12:17:09Z

It's a standard scientific representation of a value and its uncertainty. The number in brackets applies to the rightmost digits of the value: 123(45) means 123 and 45, -64.3(12) means -64.3 and 1.2, and 1.23e3(4) means 1230 and 40.

The reference for the grammar is https://www.iucr.org/resources/cif/spec/version1.1/cifsyntax

lemire · 2022-08-24T12:49:02Z

Looks like it could be added as a new template function using a few handfuls of lines. You would need to define a data type corresponding to this format because double and float won't do. If the implementation is, as I expect, quite compact, and you can write a reasonable number of tests to make sure that the code is correct, then it looks like something we could merge.

Note that any pull request you provide should be additive. We don't want to change the existing parser or the existing code. We are very deliberate about the syntax we follow currently. E.g., folks may want to put a + in front of their numbers, but that's forbidden by the C++ standard (std::from_chars does not accept a leading +). If you can write new code that meets your needs, and if this code is small enough that it can be examined and does not risk harming other users (bloat, bugs, ...), then it is good. We will put the bar rather high: it needs to be good code because we can't risk breaking this library. So it needs to be clean, efficient and well tested. However, we can be constructive about it.

rowlesmr · 2022-08-24T13:25:18Z

Thanks for your feedback. I'll try to put something together to see if I can make it work, and then start on finessing it.
