Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discussion] Floating point precision #104

Open
MichielDeMey opened this issue Jan 26, 2021 · 1 comment
Open

[Discussion] Floating point precision #104

MichielDeMey opened this issue Jan 26, 2021 · 1 comment
Labels
version-2 Saving this for Data-Forge version 2.

Comments

@MichielDeMey
Copy link

Hi

We're currently using data-forge for a fintech project and we're loving it!
Current usage is to aggregate a list of financial transactions by month and create a sum, basically giving us a time-series representation of our data.

This works great, however JavaScript is notorious for (incorrectly) handling floating points and losing precision in the process.
Unfortunately, when working with financial data you'd need that floating point precision sweetness.

Relevant snippet:

const result = data.groupBy((row) => moment(row.due_date).startOf(aggregation).format("YYYY-MM"))
  .select((group) => ({
    type: group.first().type,
    due_date: group.first().due_date,
    amount: group.deflate((row) => row.amount).sum()
  }))

Which will return something like

__index__   type     due_date    amount             
----------  -------  ----------  -------------------
2020-01-01  payable  2020-01-01  -300046.7899999998 
2020-02-01  payable  2020-02-01  -399881.57999999955
2020-03-01  payable  2020-03-01  -4724.579999999969 
2020-04-01  payable  2020-04-01  29564.520000000066 
2021-01-01  payable  2021-01-01  1750               

As you can see, the floating point precision is not very favourable.
Looking at the code, the sum function does exactly what you'd expect as it just sums up two number using plain JavaScript.

https://github.com/data-forge/data-forge-ts/blob/master/src/lib/series.ts#L4011-L4019

Floating point precision has been an issue for a long time and have been solved by several libraries already.
I noticed this library already using numeral, which can handle floating point precision.

With that in mind, we wrote a custom aggregation function to use numeral to fix our floating point precision.

const result = data.groupBy((row) => moment(row.due_date).startOf(aggregation).format("YYYY-MM"))
  .select((group) => ({
    type: group.first().type,
    due_date: group.first().due_date,
    amount: group
      .deflate((row) => row.amount)
      .aggregate((prev: number, value: number) =>
        numeral(prev).add(value).value() // Use Numeral to add two numbers, returning the resulting value
      )
  }))

And correctly returns

__index__   type     due_date    amount    
----------  -------  ----------  ----------
2020-01-01  payable  2020-01-01  -300046.79
2020-02-01  payable  2020-02-01  -399881.58
2020-03-01  payable  2020-03-01  -4724.58  
2020-04-01  payable  2020-04-01  29564.52  
2021-01-01  payable  2021-01-01  1750      

I created a separate CodeSandbox with the actual code so you can play with it:
https://codesandbox.io/s/eloquent-elbakyan-16v5i?file=/src/App.tsx


With this in mind, I'd like to open up the discussion of handling floating point precision in data-forge.

The way I see it, we can either

a) Provide support for floating point precision using numeral
This would improve the developer experience of this library as floating point precision "just works".
(For the manipulation functions supported by numeral (add, subtract, multiply, divide))

e.g.

amount: group.deflate((row) => row.amount).sum(true) // pass in "true" to allow floating point precision

or a separate util function

amount: group.deflate((row) => row.amount).sumPrecise()

b) Clearly document that data-forge does not handle floating point precision out-of-the-box and provide an example on how to keep floating point precision using numeral or another library (numbro, bigdecimal.js etc.).

Curious to hear your thoughts about this!

@ashleydavis
Copy link
Member

This is awesome thanks so much for bringing it up.

I have actually had to deal with this before (in multiple languages actually not just JavaScript).

For JavaScript, I've previously used Decimal.js, but Numeral is also good.

I've wanted to fix this directly in Data-Forge for a few years now. I'm just not sure the best way to go about it.

Maybe what it needs is some kind of plugin to DF that (when supplied) can handle the arithmetic. Then we could plugin Numeral and Decimal.js. Or maybe a more generic statics plugin so that we can override all the statistical capabilities to DF with customized versions.

What do you think? Any better ideas?

Would you be interested in working on this?

If it were backward compatible it could be included in an update to version 2 of DF.

@ashleydavis ashleydavis added the version-2 Saving this for Data-Forge version 2. label Feb 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
version-2 Saving this for Data-Forge version 2.
Projects
None yet
Development

No branches or pull requests

2 participants