feature request: to_chars() alternative? #23

Open
biojppm opened this issue Nov 7, 2020 · 11 comments

Comments

@biojppm
Contributor

biojppm commented Nov 7, 2020

Thanks for your work -- it ticks all the boxes! C++11, non-terminated strings, and zero allocations - just what I was looking for in my library to address a really nasty issue caused by trying to stick to standard facilities while avoiding the performance-killing allocation cookie monsters from the STL.

But I am also looking for a matching to_chars() version/alternative that writes into a given buffer+size. I do not care about the roundtrip guarantee dictated by the standard.

Are you considering adding such a thing? Or are you aware of any implementation providing this function with similar quality and design choices?
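To make the request concrete, here is a minimal sketch of what a buffer+size interface could look like. The names `write_result` and `write_double` are hypothetical and do not exist in fast_float; the body simply delegates to `std::snprintf` with `%g` (which assumes the default "C" locale), so it only illustrates the calling convention, not a fast algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical result type, loosely modeled on std::to_chars_result.
struct write_result {
    char* ptr;  // one past the last character written, or `buf` on failure
    bool ok;    // false if the output did not fit
};

// Hypothetical helper: formats `value` into [buf, buf + size) without
// allocating and without writing a terminating '\0' into the caller's buffer.
inline write_result write_double(char* buf, std::size_t size, double value,
                                 int precision = 6) {
    assert(buf != nullptr);
    assert(precision >= 1 && precision <= 17);
    char tmp[32];  // sign + 17 digits + '.' + "e-308" fits comfortably
    const int n = std::snprintf(tmp, sizeof tmp, "%.*g", precision, value);
    if (n < 0 || static_cast<std::size_t>(n) >= sizeof tmp ||
        static_cast<std::size_t>(n) > size) {
        return write_result{buf, false};
    }
    std::memcpy(buf, tmp, static_cast<std::size_t>(n));  // no '\0' is copied
    return write_result{buf + n, true};
}
```

The caller gets the written length as `result.ptr - buf`, which is what makes the interface usable with non-terminated strings.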

I looked at ryu which does not tick all the boxes and has a large lookup table, but could work. I've also found fp which seems better but is C++17 and maybe a bit too fresh.

@lemire
Member

lemire commented Nov 8, 2020

I believe that the state of the art is the Schubfach algorithm, but I did not find a C++ implementation that I liked. In simdjson, we adopted Grisu2, which is not as good, but for which I could find good-looking C++ implementations.

So I think that taking Schubfach, building a good implementation, testing it, tuning it, would be great. Note that there might be a good Schubfach implementation out there in C++, I just did not find one.

(For obvious reasons, when you are building software, you don't just want to use something that has the best algorithm. You have other constraints... like... can I trust the code not to blow up? Can I read through the code and understand it?)

> I do not care about the roundtrip guarantee dictated by the standard.

Actually, this should come for free. The from_chars implemented in fast_float is exact (with round-to-even and all that) so if you have exact to_chars, then you get the round-trip for free. In fact, you get better.
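The round-trip property can be checked with a small sketch. As assumptions for illustration only, `std::strtod` stands in for `fast_float::from_chars` and `%.17g` stands in for an exact `to_chars`; the helper name `round_trips` is hypothetical, and the check relies on the C library performing correctly rounded conversions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Returns true if printing `value` and parsing the text back yields the
// same bit pattern. 17 significant digits are always enough for a double.
inline bool round_trips(double value) {
    assert(std::isfinite(value));
    char buf[32];
    const int n = std::snprintf(buf, sizeof buf, "%.17g", value);
    if (n < 0 || n >= static_cast<int>(sizeof buf)) return false;
    const double parsed = std::strtod(buf, nullptr);  // stand-in for an exact parser
    return std::memcmp(&parsed, &value, sizeof value) == 0;
}
```

With an exact parser, any printer that emits enough correct digits round-trips; a shortest-digits printer gives the same guarantee with fewer characters.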

@biojppm
Contributor Author

biojppm commented Nov 10, 2020

There's a Schubfach implementation here:

https://github.com/jk-jeon/fp/tree/master/subproject/3rdparty/schubfach

But am I correct in thinking Schubfach will only give us the equivalent to printf("%g")? I'd also like to have %e and %f together with a specified precision.

@lemire
Member

lemire commented Nov 10, 2020

Schubfach is the high-level algorithm and not a formatter per se, so you are correct that it does not do everything (nor is it meant to).

I have not looked closely at the pointer you give, but at a glance it does look like a good APL 2 library.

Let us look at the std::to_chars specification... So if we are just talking about std::to_chars, then you always want the shortest representation, though you need to support both f and e.

> the value is converted to a string as if by std::printf in the default ("C") locale. The conversion specifier is f or e (resolving in favor of f in case of a tie), chosen according to the requirement for a shortest representation: the string representation consists of the smallest number of characters such that there is at least one digit before the radix point (if present) and parsing the representation using the corresponding std::from_chars function recovers value exactly. If there are several such representations, one with the smallest difference to value is chosen, resolving any remaining ties using rounding according to std::round_to_nearest.
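The selection rule quoted above can be illustrated with a brute-force sketch. It is not Schubfach, Ryu, or any production algorithm, and it allocates via `std::string` for brevity. It tries increasing `printf` precisions until `std::strtod` recovers the value, once for `e` and once for `f`, then keeps the shorter string and prefers `f` on a tie. The names `shortest_with` and `shortest_repr` are hypothetical. Because it only tries correctly rounded candidates, it may in rare cases return a string longer than the true shortest one, and it assumes a C library with correctly rounded conversions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <string>

// Lowest printf precision (for conversion 'e' or 'f') whose output parses
// back to `value`. Returns an empty string if none is found or fits.
inline std::string shortest_with(char conv, double value, int max_prec) {
    assert(conv == 'e' || conv == 'f');
    char buf[512];
    for (int prec = 0; prec <= max_prec; ++prec) {
        const int n = (conv == 'e')
            ? std::snprintf(buf, sizeof buf, "%.*e", prec, value)
            : std::snprintf(buf, sizeof buf, "%.*f", prec, value);
        if (n < 0 || n >= static_cast<int>(sizeof buf)) break;  // does not fit
        if (std::strtod(buf, nullptr) == value) {
            return std::string(buf, static_cast<std::size_t>(n));
        }
    }
    return std::string();
}

// Picks the shorter of the two candidates, resolving ties in favor of 'f'.
inline std::string shortest_repr(double value) {
    assert(std::isfinite(value));
    const std::string e = shortest_with('e', value, 16);   // up to 17 significant digits
    const std::string f = shortest_with('f', value, 340);  // enough decimals for tiny doubles
    if (f.empty()) return e;
    if (e.empty()) return f;
    return f.size() <= e.size() ? f : e;
}
```

With a conforming `printf`, `shortest_repr(1e4)` gives `10000` (a tie, so `f` wins) while `shortest_repr(1e5)` gives `1e+05`, since the `e` form is one character shorter.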

@biojppm
Contributor Author

biojppm commented Nov 12, 2020

On a related note, I've gathered the first benchmark results here.

Overall, fast_float is really fast on Windows: 4x faster than std::from_chars(), and faster than everything else.

On Linux, it's among the fastest, but there are some outliers and I have some suspicion over the results (e.g., for clang10/Release/double, std::atof() is ~870MB/s, compared with fast_float at ~360MB/s). To be clear, this is WSL, so let's not jump to conclusions.

I do have some concerns over binary size. If you look at the data on the linux sizes, fast float is above 1.3MB, while a scanf is 12KB; even iostream has a smaller size, at ~1.2MB. To make things more comparable, I tried to request the static standard library, but I had no time to check if that was successful.

So, something to look at.

(And apologies if this is not the place to post such data.)

@lemire
Member

lemire commented Nov 12, 2020

> If you look at the data on the linux sizes, fast float is above 1.3MB

It is a header-only library, but let us look at the size of the compiled binaries (which include the header, compiled in release mode with -O3):

$ ls -alh example_test
-rwxr-xr-x  1 lemire dialout  35K Nov 12 01:23 example_test

Now an empty "int main() {}" binary will use 17KB. So fast_float itself cannot be much more than about ~15KB in that case. It may be a bit more, I am not being very precise, but it is not 1.2MB.

For comparison, if you grab Gay's dtoa.c (which is effectively the inspiration/source for strtod), you will find that it compiles down to a 55 KB binary.

Note that a simpler version of this algorithm is part of the Go standard library (as of a few weeks ago), and they did consider binary size as a factor.

@lemire
Member

lemire commented Nov 12, 2020

Regarding benchmarking, I do have a pretty decent one there:

https://github.com/lemire/simple_fastfloat_benchmark

It used to support Visual Studio, but over several rounds of reengineering, I broke compatibility with Visual Studio. This could be fixed with some work?

@lemire
Member

lemire commented Nov 12, 2020

> (And apologies if this is not the place to post such data.)

It is totally fair to assess binary size, but it would be better to do it in a separate issue.

@biojppm
Contributor Author

biojppm commented Nov 15, 2020

> Now an empty "int main() {}" binary will use 17KB. So fast_float itself cannot be much more than about ~15KB in that case. It may be a bit more, I am not being very precise, but it is not 1.2MB.

Strange - that's exactly what I did. In my results, main is a loop that uses fgets() to read from stdin and then calls a macro, which either expands to the call to fast_float::from_chars() or is simply empty for the baseline. The Release size of the baseline with the empty loop comes to about 8.5KB on Linux and 11KB on Windows.

But it is really relevant here that I compiled this with the static standard library, so that may be causing the increased size. I will investigate this further and - if justified - pick this up in a different issue.
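For reference, a harness along the lines described above might look roughly like this. It is a sketch, not the actual benchmark code: `PARSE_LINE` and `process_lines` are hypothetical names, and the commented-out definition shows one way the measured build could plug in `fast_float::from_chars`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical hook: leave it undefined for the baseline build, or define it
// before this point for the measured build, for example:
//   #include "fast_float/fast_float.h"
//   #define PARSE_LINE(first, last, out) \
//       fast_float::from_chars((first), (last), (out))
#ifndef PARSE_LINE
#define PARSE_LINE(first, last, out) ((void)(first), (void)(last), (void)(out))
#endif

// Reads lines from `in`, hands each one to PARSE_LINE, and returns how many
// fgets() reads were processed (lines over 255 characters count more than once).
inline std::size_t process_lines(std::FILE* in) {
    assert(in != nullptr);
    char line[256];
    std::size_t count = 0;
    double value = 0.0;
    while (std::fgets(line, sizeof line, in) != nullptr) {
        const char* first = line;
        const char* last = line + std::strlen(line);
        PARSE_LINE(first, last, value);
        ++count;
    }
    return count;
}
```

Calling `process_lines(stdin)` from main and building the file twice with identical flags (including the same static or dynamic standard library choice) gives two binaries whose size difference can be attributed to the parser.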

@lemire
Member

lemire commented Nov 15, 2020

@biojppm There is about 84 KB of code in there, most of it made of comments. The code volume is about the same as dtoa.c. I am not denying that you are seeing a potential issue, but one would still have to explain how ~85KB of code (mostly comments) turns into 1.2MB of binary.

@sirinath

I am also interested in a to_chars implementation that is super fast, ideally faster than the Dragonbox algorithm.

@lemire lemire mentioned this issue Jan 23, 2021
@ecorm

ecorm commented Mar 29, 2021

The dragonbox.cc implementation from abolz/Drachennest has been recommended to me by the author of Dragonbox. It doesn't require C++17; it compiles for me in C++11 mode.

It seems to work, but I've had to modify it for header-only use.
