Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trying to add round for c #1193

Draft
wants to merge 3 commits into
base: devel
Choose a base branch
from
Draft

trying to add round for c #1193

wants to merge 3 commits into from

Conversation

yassine-alaoui
Copy link
Contributor

Screenshot 2022-09-19 at 16 22 38

Hey, I have this problem where sometimes python gives a surprising behavior as described in the documentation -> "https://docs.python.org/3/library/functions.html#round" I was wondering if there is a way to go around this issue. Thanks in advance.

@EmilyBourne
Copy link
Member

See PR #996

We have not found a way to fix this other than by using the same technique that python uses internally (convert to str then back to int)

@yassine-alaoui
Copy link
Contributor Author

Should I keep working on this or just stop, since someone is already, working on it(I just saw that it was getting fixed on the 996)

@EmilyBourne
Copy link
Member

Should I keep working on this or just stop, since someone is already, working on it(I just saw that it was getting fixed on the 996)

As you prefer @nandiniraja348 doesn't seem to be working on this anymore. I have occasionally been working on it when I have time but I don't have much time at the moment. Another option would be for you to pick up @nandiniraja348 's fork and complete it. I think I prefer the version in that PR as:

  • there are a couple of minor bug fixes for other things there
  • there are more tests
  • the syntactic and semantic stages are handled more simply

All that is missing there is the C function

@yassine-alaoui
Copy link
Contributor Author

Okay, I'll try to do that

@yassine-alaoui
Copy link
Contributor Author

I changed the way the round function worked to approximately the same as #983. I added tests to check for all the results, but still had issues with this case:

def testNdigitsEdgeCases():
    f = epyccel(roundNdigits, language='c')

    x = 2.675

    n = 2

    print("testNdigitsEdgeCases 2")

    print(x, n)

    f_output = f(x, n)

    python_output = roundNdigits(x, n)

    print(f_output, python_output)

    assert isclose(f_output, python_output, rtol=RTOL, atol=ATOL)

    assert isinstance(f_output ,type(python_output))

where the number is unexpected in the python version(I get 2.68 which makes sense, but we get 2.67 from python), also I wanted to make the conversion internally to get it close to the python result but found out that there are way too many edge cases for that and the implementation is going to be hard, so I wanted to ask if there are any better solutions for this.

@EmilyBourne
Copy link
Member

I changed the way the round function worked to approximately the same as #983. I added tests to check for all the results, but still had issues with this case:

def testNdigitsEdgeCases():
    f = epyccel(roundNdigits, language='c')

    x = 2.675

    n = 2

    print("testNdigitsEdgeCases 2")

    print(x, n)

    f_output = f(x, n)

    python_output = roundNdigits(x, n)

    print(f_output, python_output)

    assert isclose(f_output, python_output, rtol=RTOL, atol=ATOL)

    assert isinstance(f_output ,type(python_output))

where the number is unexpected in the python version(I get 2.68 which makes sense, but we get 2.67 from python), also I wanted to make the conversion internally to get it close to the python result but found out that there are way too many edge cases for that and the implementation is going to be hard, so I wanted to ask if there are any better solutions for this.

This is very strange behaviour. It is not a normal round or a banker's round :|

Python's implementation of round can be found here:
https://github.com/python/cpython/blob/main/Objects/floatobject.c#L962-L1012

It converts the float to a string then back to a truncated float. This seems to be necessary for integers at least (see failing test on #996 ) to avoid floating point imprecision leading to the wrong value. But I don't see why it ought to be necessary for floats and I imagine it will be significantly more costly than what is implemented currently. @yguclu what do you think?

@yassine-alaoui
Copy link
Contributor Author

I changed the way the round function worked to approximately the same as #983. I added tests to check for all the results, but still had issues with this case:

def testNdigitsEdgeCases():
    f = epyccel(roundNdigits, language='c')

    x = 2.675

    n = 2

    print("testNdigitsEdgeCases 2")

    print(x, n)

    f_output = f(x, n)

    python_output = roundNdigits(x, n)

    print(f_output, python_output)

    assert isclose(f_output, python_output, rtol=RTOL, atol=ATOL)

    assert isinstance(f_output ,type(python_output))

where the number is unexpected in the python version(I get 2.68 which makes sense, but we get 2.67 from python), also I wanted to make the conversion internally to get it close to the python result but found out that there are way too many edge cases for that and the implementation is going to be hard, so I wanted to ask if there are any better solutions for this.

This is very strange behaviour. It is not a normal round or a banker's round :|

Python's implementation of round can be found here: https://github.com/python/cpython/blob/main/Objects/floatobject.c#L962-L1012

It converts the float to a string then back to a truncated float. This seems to be necessary for integers at least (see failing test on #996 ) to avoid floating point imprecision leading to the wrong value. But I don't see why it ought to be necessary for floats and I imagine it will be significantly more costly than what is implemented currently. @yguclu what do you think?

I ran the failing test on #996 on a Ubuntu image and it didn't fail, is it a problem only with windows ?
And I wanted to ask; if we have to do the integer to string and back to integer again only for integers not floats should we do it in our own way or try and mimic the code in cpython reference you gave me, because there is a lot of stuff they have going on and one of them is the function _Py_dg_dtoa() that they use for double_round here is the link for the function : https://github.com/python/cpython/blob/main/Python/dtoa.c#L2247-L2855

@EmilyBourne
Copy link
Member

EmilyBourne commented Oct 11, 2022

I changed the way the round function worked to approximately the same as #983. I added tests to check for all the results, but still had issues with this case:

def testNdigitsEdgeCases():
    f = epyccel(roundNdigits, language='c')

    x = 2.675

    n = 2

    print("testNdigitsEdgeCases 2")

    print(x, n)

    f_output = f(x, n)

    python_output = roundNdigits(x, n)

    print(f_output, python_output)

    assert isclose(f_output, python_output, rtol=RTOL, atol=ATOL)

    assert isinstance(f_output ,type(python_output))

where the number is unexpected in the python version(I get 2.68 which makes sense, but we get 2.67 from python), also I wanted to make the conversion internally to get it close to the python result but found out that there are way too many edge cases for that and the implementation is going to be hard, so I wanted to ask if there are any better solutions for this.

This is very strange behaviour. It is not a normal round or a banker's round :|

Python's implementation of round can be found here: https://github.com/python/cpython/blob/main/Objects/floatobject.c#L962-L1012

It converts the float to a string then back to a truncated float. This seems to be necessary for integers at least (see failing test on #996 ) to avoid floating point imprecision leading to the wrong value. But I don't see why it ought to be necessary for floats and I imagine it will be significantly more costly than what is implemented currently. @yguclu what do you think?

I ran the failing test on #996 on a Ubuntu image and it didn't fail, is it a problem only with windows ?

I don't think so. It's a floating point problem. If the closest floating point number to 230 that can be stored in the computer's memory is 229.999999999999 and you convert to an int you will get 229 instead of 230.
I just haven't happened upon the right corner case for Ubuntu

And I wanted to ask; if we have to do the integer to string and back to integer again only for integers not floats should we do it in our own way or try and mimic the code in cpython reference you gave me, because there is a lot of stuff they have going on and one of them is the function _Py_dg_dtoa() that they use for double_round here is the link for the function : https://github.com/python/cpython/blob/main/Python/dtoa.c#L2247-L2855

The implementation I linked to is for floats, here's the integer version
https://github.com/python/cpython/blob/main/Objects/longobject.c#L5659
I'm not sure if you can get the same results with a different method. If you can find something clearer then by all means keep that. If not :/

Given that the round function is in the c side of the python code, one would hope that the guys at python have already spent some time getting something reasonably optimised.

@yassine-alaoui
Copy link
Contributor Author

I changed the way the round function worked to approximately the same as #983. I added tests to check for all the results, but still had issues with this case:

def testNdigitsEdgeCases():
    f = epyccel(roundNdigits, language='c')

    x = 2.675

    n = 2

    print("testNdigitsEdgeCases 2")

    print(x, n)

    f_output = f(x, n)

    python_output = roundNdigits(x, n)

    print(f_output, python_output)

    assert isclose(f_output, python_output, rtol=RTOL, atol=ATOL)

    assert isinstance(f_output ,type(python_output))

where the number is unexpected in the python version(I get 2.68 which makes sense, but we get 2.67 from python), also I wanted to make the conversion internally to get it close to the python result but found out that there are way too many edge cases for that and the implementation is going to be hard, so I wanted to ask if there are any better solutions for this.

This is very strange behaviour. It is not a normal round or a banker's round :|
Python's implementation of round can be found here: https://github.com/python/cpython/blob/main/Objects/floatobject.c#L962-L1012
It converts the float to a string then back to a truncated float. This seems to be necessary for integers at least (see failing test on #996 ) to avoid floating point imprecision leading to the wrong value. But I don't see why it ought to be necessary for floats and I imagine it will be significantly more costly than what is implemented currently. @yguclu what do you think?

I ran the failing test on #996 on a Ubuntu image and it didn't fail, is it a problem only with windows ?

I don't think so. It's a floating point problem. If the closest floating point number to 230 that can be stored in the computer's memory is 229.999999999999 and you convert to an int you will get 229 instead of 230. I just haven't happened upon the right corner case for Ubuntu

That makes sense, I'll look more into this tomorrow

And I wanted to ask; if we have to do the integer to string and back to integer again only for integers not floats should we do it in our own way or try and mimic the code in cpython reference you gave me, because there is a lot of stuff they have going on and one of them is the function _Py_dg_dtoa() that they use for double_round here is the link for the function : https://github.com/python/cpython/blob/main/Python/dtoa.c#L2247-L2855

The implementation I linked to is for floats, here's the integer version https://github.com/python/cpython/blob/main/Objects/longobject.c#L5659 I'm not sure if you can get the same results with a different method. If you can find something clearer then by all means keep that. If not :/

Given that the round function is in the c side of the python code, one would hope that the guys at python have already spent some time getting something reasonably optimised.

I don't think there are any other simpler methods and we probably have to follow their steps because trying to get a solution around this is probably unachievable.

@yassine-alaoui
Copy link
Contributor Author

what if we make a special function for the integers like 230, for example, if we get round(230, -1) we check if the first argument type is an integer and we call a function that takes long long in c, something like this: long_long_round(long long x, int ndigits) that way we wouldn't store the number as float and risque losing precision.
what do you think @EmilyBourne ?

@EmilyBourne
Copy link
Member

what if we make a special function for the integers like 230, for example, if we get round(230, -1) we check if the first argument type is an integer and we call a function that takes long long in c, something like this: long_long_round(long long x, int ndigits) that way we wouldn't store the number as float and risque losing precision. what do you think @EmilyBourne ?

That is possible. But I'm not sure this is restricted to ints. For example round(0.023,3) gets multiplied to 23 which may have a rounding problem(22.99999999) which then gives 0.022. The problem comes from the use of lrint not the fact that the input is an integer

@yassine-alaoui
Copy link
Contributor Author

what if we make a special function for the integers like 230, for example, if we get round(230, -1) we check if the first argument type is an integer and we call a function that takes long long in c, something like this: long_long_round(long long x, int ndigits) that way we wouldn't store the number as float and risque losing precision. what do you think @EmilyBourne ?

That is possible. But I'm not sure this is restricted to ints. For example round(0.023,3) gets multiplied to 23 which may have a rounding problem(22.99999999) which then gives 0.022. The problem comes from the use of lrint not the fact that the input is an integer

I think that's what happens in the failing case we have round(2.675, 2) where in python they always result in 2.67 instead of 2.68, they probably have an algorithm that unifies the result in some way so they never run into the precision problem, but I'm not sure how are we going to be able to mimic that behavior, to be honest.

@EmilyBourne
Copy link
Member

what if we make a special function for the integers like 230, for example, if we get round(230, -1) we check if the first argument type is an integer and we call a function that takes long long in c, something like this: long_long_round(long long x, int ndigits) that way we wouldn't store the number as float and risque losing precision. what do you think @EmilyBourne ?

That is possible. But I'm not sure this is restricted to ints. For example round(0.023,3) gets multiplied to 23 which may have a rounding problem(22.99999999) which then gives 0.022. The problem comes from the use of lrint not the fact that the input is an integer

I think that's what happens in the failing case we have round(2.675, 2) where in python they always result in 2.67 instead of 2.68, they probably have an algorithm that unifies the result in some way so they never run into the precision problem, but I'm not sure how are we going to be able to mimic that behavior, to be honest.

This is why this issue's been sitting around for so long ^^'. I think it is quite complicated so I don't think it's worth looking for shortcuts. I think we should just implement what python does

@yassine-alaoui
Copy link
Contributor Author

what if we make a special function for the integers like 230, for example, if we get round(230, -1) we check if the first argument type is an integer and we call a function that takes long long in c, something like this: long_long_round(long long x, int ndigits) that way we wouldn't store the number as float and risque losing precision. what do you think @EmilyBourne ?

That is possible. But I'm not sure this is restricted to ints. For example round(0.023,3) gets multiplied to 23 which may have a rounding problem(22.99999999) which then gives 0.022. The problem comes from the use of lrint not the fact that the input is an integer

I think that's what happens in the failing case we have round(2.675, 2) where in python they always result in 2.67 instead of 2.68, they probably have an algorithm that unifies the result in some way so they never run into the precision problem, but I'm not sure how are we going to be able to mimic that behavior, to be honest.

This is why this issue's been sitting around for so long ^^'. I think it is quite complicated so I don't think it's worth looking for shortcuts. I think we should just implement what python does

yeah, I don't think we can quite find a way around this, I'll try to understand the CPython solution for this and see if I can do the same on our end.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants