trying to add round for c #1193

yassine-alaoui · 2022-09-19T15:47:02Z

Hey, I have a problem where Python's round sometimes behaves in a surprising way, as described in the documentation: https://docs.python.org/3/library/functions.html#round. I was wondering if there is a way around this issue. Thanks in advance.

EmilyBourne · 2022-09-19T17:14:06Z

See PR #996

We have not found a way to fix this other than by using the same technique that Python uses internally (convert to str, then back to int).

yassine-alaoui · 2022-09-19T18:27:44Z

Should I keep working on this or just stop, since someone is already working on it? (I just saw that it is being fixed in #996.)

EmilyBourne · 2022-09-21T06:18:17Z

As you prefer. @nandiniraja348 doesn't seem to be working on this anymore. I have occasionally been working on it when I have time, but I don't have much time at the moment. Another option would be for you to pick up @nandiniraja348's fork and complete it. I think I prefer the version in that PR as:

there are a couple of minor bug fixes for other things there
there are more tests
the syntactic and semantic stages are handled more simply

All that is missing there is the C function

yassine-alaoui · 2022-09-21T13:53:38Z

Okay, I'll try to do that

yassine-alaoui · 2022-09-30T15:34:31Z

I changed the way the round function worked to approximately the same as #983. I added tests to check for all the results, but still had issues with this case:

def testNdigitsEdgeCases():
    f = epyccel(roundNdigits, language='c')
    x = 2.675
    n = 2
    print("testNdigitsEdgeCases 2")
    print(x, n)
    f_output = f(x, n)
    python_output = roundNdigits(x, n)
    print(f_output, python_output)
    assert isclose(f_output, python_output, rtol=RTOL, atol=ATOL)
    assert isinstance(f_output, type(python_output))

where the Python result is unexpected (I get 2.68, which makes sense, but Python gives 2.67). I also wanted to do the conversion internally to get closer to the Python result, but found that there are far too many edge cases and the implementation would be hard, so I wanted to ask if there is a better solution for this.
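For context, here is a minimal sketch (not taken from this PR) of why the two results can differ. It assumes IEEE 754 doubles. `naive_round` is a hypothetical stand-in for a scale-and-round implementation, not the code in this branch.

```python
from decimal import Decimal


def naive_round(x, ndigits):
    """Hypothetical scale-and-round: multiply, round to nearest integer, divide."""
    scale = 10 ** ndigits
    return round(x * scale) / scale


# The literal 2.675 is stored as the nearest double, which lies slightly below 2.675.
stored = Decimal(2.675)
print(stored)                      # exact decimal expansion of the stored double
print(stored < Decimal("2.675"))   # True

# The product 2.675 * 100 is itself rounded to a double and lands exactly on 267.5,
# so the scaled value looks like a tie even though the stored input was below it.
print(2.675 * 100)
print(naive_round(2.675, 2))       # 2.68
print(round(2.675, 2))             # 2.67
```

Python's answer follows from the stored value being below the halfway point, while the scaled computation loses that information in the multiplication.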

EmilyBourne · 2022-10-11T16:25:34Z

This is very strange behaviour. It is not a normal round or a banker's round :|

Python's implementation of round can be found here:
https://github.com/python/cpython/blob/main/Objects/floatobject.c#L962-L1012

It converts the float to a string, then back to a truncated float. This seems to be necessary for integers at least (see the failing test on #996), to avoid floating point imprecision leading to the wrong value. But I don't see why it ought to be necessary for floats, and I imagine it will be significantly more costly than what is implemented currently. @yguclu what do you think?
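The string round trip described above can be sketched in Python. This is an illustration under assumptions, not CPython's code: `round_via_string` is a hypothetical name, it only covers ndigits >= 0 and finite inputs, and it relies on format() producing a correctly rounded decimal string.

```python
def round_via_string(x, ndigits):
    """Round x to ndigits >= 0 decimals by formatting to text and parsing back.

    Sketch only: negative ndigits, inf and nan are not handled.
    """
    text = format(x, f".{ndigits}f")  # decimal string with ndigits places
    return float(text)                # nearest double to that decimal string


# Compare against the built-in round on a few of the cases from this thread.
for x, n in [(2.675, 2), (0.023, 3), (1.005, 2), (2.5, 0)]:
    assert round_via_string(x, n) == round(x, n)
    print(x, n, round_via_string(x, n))
```

In C the analogous calls would presumably be snprintf with "%.*f" followed by strtod, but whether every C library rounds the formatted string correctly is an assumption that would need checking.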

yassine-alaoui · 2022-10-11T19:45:21Z

I ran the failing test on #996 on an Ubuntu image and it didn't fail. Is it a problem only on Windows?
I also wanted to ask: if we have to do the integer to string and back to integer conversion only for integers and not for floats, should we do it our own way or try to mimic the CPython code you referenced? There is a lot going on there, including the function _Py_dg_dtoa() that they use for double_round. Here is the link for the function: https://github.com/python/cpython/blob/main/Python/dtoa.c#L2247-L2855

EmilyBourne · 2022-10-11T20:25:44Z

> I ran the failing test on #996 on a Ubuntu image and it didn't fail, is it a problem only with windows ?

I don't think so. It's a floating point problem. If the closest floating point number to 230 that can be stored in the computer's memory is 229.999999999999 and you convert to an int you will get 229 instead of 230.
I just haven't happened upon the right corner case for Ubuntu
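A small illustration of the effect described above, assuming IEEE 754 doubles. The two products are well-known examples chosen for illustration, not values from the failing test. Whole numbers such as 230 are themselves exactly representable; the problem appears when an intermediate result lands just below one.

```python
# Products that are mathematically whole numbers but land just below them as doubles.
a = (0.1 + 0.7) * 10   # mathematically 8
b = 0.57 * 100         # mathematically 57

for value in (a, b):
    # int() truncates towards zero, while round() picks the nearest integer.
    print(repr(value), int(value), round(value))
```

Truncation turns these into 7 and 56, whereas rounding to nearest recovers 8 and 57.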

> And I wanted to ask; if we have to do the integer to string and back to integer again only for integers not floats should we do it in our own way or try and mimic the code in cpython reference you gave me, because there is a lot of stuff they have going on and one of them is the function _Py_dg_dtoa() that they use for double_round here is the link for the function : https://github.com/python/cpython/blob/main/Python/dtoa.c#L2247-L2855

The implementation I linked to is for floats. Here's the integer version:
https://github.com/python/cpython/blob/main/Objects/longobject.c#L5659
I'm not sure if you can get the same results with a different method. If you can find something clearer then by all means keep that. If not :/

Given that the round function is on the C side of the Python code, one would hope that the guys at Python have already spent some time getting something reasonably optimised.

yassine-alaoui · 2022-10-11T22:58:11Z

That makes sense, I'll look more into this tomorrow

I don't think there is a simpler method. We probably have to follow their steps, because finding a way around this is probably unachievable.

yassine-alaoui · 2022-10-12T11:42:38Z

What if we make a special function for integers like 230? For example, if we get round(230, -1), we check whether the first argument is an integer and call a C function that takes a long long, something like long_long_round(long long x, int ndigits). That way we wouldn't store the number as a float and risk losing precision.
What do you think @EmilyBourne?
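The integer-only idea can be sketched in Python using nothing but integer arithmetic. `int_round` is a hypothetical name used for illustration. A C version taking a long long would also need to guard against overflow when computing the power of ten, which is not shown here.

```python
def int_round(x, ndigits):
    """Round the integer x to a multiple of 10**(-ndigits), ties to even.

    Sketch only: uses integer arithmetic throughout, so no float precision is lost.
    """
    if ndigits >= 0:
        return x                   # nothing to round for an integer
    p = 10 ** (-ndigits)
    q, r = divmod(x, p)            # floor division, so 0 <= r < p
    # Round up when past the halfway point, or exactly halfway with an odd quotient.
    if 2 * r > p or (2 * r == p and q % 2 == 1):
        q += 1
    return q * p


print(int_round(230, -1))   # 230
print(int_round(235, -1))   # 240
print(int_round(225, -1))   # 220
```

On the values tried here it agrees with the built-in round for integers, including negative inputs.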

EmilyBourne · 2022-10-12T11:54:58Z

That is possible, but I'm not sure this is restricted to ints. For example, round(0.023, 3) gets multiplied up to 23, which may have a rounding problem (22.99999999), which then gives 0.022. The problem comes from the use of lrint, not from the fact that the input is an integer.

yassine-alaoui · 2022-10-12T12:11:38Z

I think that's what happens in the failing case round(2.675, 2), where Python always returns 2.67 instead of 2.68. They probably have an algorithm that unifies the result in some way so that they never run into the precision problem, but to be honest I'm not sure how we are going to mimic that behavior.

EmilyBourne · 2022-10-12T12:23:46Z

This is why this issue's been sitting around for so long ^^'. I think it is quite complicated, so I don't think it's worth looking for shortcuts. I think we should just implement what Python does.

yassine-alaoui · 2022-10-12T12:44:15Z

Yeah, I don't think we can find a way around this. I'll try to understand the CPython solution and see if I can do the same on our end.

trying to add round for c

90c8dbc

yassine-alaoui requested a review from EmilyBourne September 19, 2022 17:09

added some tests and changed round to use lrint

06201ba

yassine-alaoui removed the request for review from EmilyBourne September 30, 2022 15:35

yassine-alaoui added help wanted needs_initial_review labels Sep 30, 2022

smaller ndigits to prevent overflow

e69bbc0

EmilyBourne removed needs_initial_review help wanted labels Oct 18, 2022

yguclu added the duplicate label Mar 3, 2023

yguclu assigned jalalium Mar 3, 2023
