Question: parsing strings containing two related values #138

Open
rowlesmr opened this issue Aug 24, 2022 · 4 comments

rowlesmr commented Aug 24, 2022

I'm writing a library for scientific data management, and I need to parse strings such as "+12.345e-02(13)"**, which encodes the value 0.12345 and its error 0.00013.

I can see how I can adapt your library to cope with the leading +. How difficult do you think it would be to extend the parsing to include the bracketed digits? For now, I only need to go from char to double.

I don't know if my code would be worthy of putting back into the library, but I can make everything available.

.

** In the grammar I'm following:
SIGN = [+-]
DIGIT = [0-9]
UINT = DIGIT+
INT = SIGN? UINT
EXP = [eE] INT
FLOAT = INT EXP | SIGN? DIGIT* '.' UINT EXP? | INT '.' EXP?
NUMB = INT | FLOAT
NUMERIC = NUMB | NUMB '(' UINT ')'

+ = 1 or more of
* = 0 or more of
? = 0 or 1 of
| = or
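As a quick way to check candidate strings against the grammar above, here is a minimal sketch that transcribes NUMERIC into a `std::regex`. The function name `is_cif_numeric` is hypothetical and not part of any library; the pattern is a direct reading of the rules as written in this comment (with `INT = SIGN? UINT`), not a verified implementation of the CIF specification.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of fast_float): returns true if `s`
// matches NUMERIC as written in the grammar above.
//
//   INT | INT EXP            -> [+-]?[0-9]+(EXP)?
//   SIGN? DIGIT* '.' UINT EXP? -> [+-]?[0-9]*\.[0-9]+(EXP)?
//   INT '.' EXP?             -> [+-]?[0-9]+\.(EXP)?
//   optional '(' UINT ')'    -> (\([0-9]+\))?
inline bool is_cif_numeric(const std::string& s) {
    static const std::regex numeric(
        R"([+-]?(?:[0-9]+(?:[eE][+-]?[0-9]+)?)"
        R"(|[0-9]*\.[0-9]+(?:[eE][+-]?[0-9]+)?)"
        R"(|[0-9]+\.(?:[eE][+-]?[0-9]+)?))"
        R"((?:\([0-9]+\))?)");
    // regex_match requires the whole string to match, not just a prefix.
    return std::regex_match(s, numeric);
}
```

`std::regex` is convenient for checking the grammar but is not fast; a hand-written scanner would be the better fit for a performance-sensitive parser.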

Member

lemire commented Aug 24, 2022

Is that a known standard? I am not familiar with your notation.

Author

rowlesmr commented Aug 24, 2022

It's a standard scientific representation of a value and its uncertainty. The number in brackets applies to the rightmost digits of the value: 123(45) is 123 ± 45, -64.3(12) is -64.3 ± 1.2, and 1.23e3(4) is 1230 ± 40.

The reference for the grammar is https://www.iucr.org/resources/cif/spec/version1.1/cifsyntax

Member

lemire commented Aug 24, 2022

Looks like it could be added as a new template function using a few handfuls of lines. You would need to define a data type corresponding to this format because double and float won't do. If the implementation is, as I expect, quite compact, and you can write a reasonable number of tests to make sure that the code is reasonably correct, then it looks like something we could merge.

Note that any pull request you provide should be additive. We don't want to change the existing parser or the existing code. We are very deliberate about the syntax we currently follow. E.g., folks may want to put a + in front of their numbers, but that's forbidden by the C++ standard. If you can write new code that meets your needs, and if this code is small enough that it can be examined and does not risk harming other users (bloat, bugs, ...), then it is good. We will put the bar rather high: it needs to be good code because we can't risk breaking this library. So it needs to be clean, efficient and well tested. However, we can be constructive about it.

@rowlesmr
Author

Thanks for your feedback. I'll try to put something together to see if I can make it work, and then start on finessing it.
