Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] New string::replace in Cython 3.0.9 feature overrides the bytes.replace method #6123

Open
Source61 opened this issue Apr 1, 2024 · 5 comments

Comments

@Source61
Copy link

Source61 commented Apr 1, 2024

Describe the bug

Implicitly or explicitly calling a string object as a object to use bytes.replace($1, $2) no longer works since version 3.0.9.

Code to reproduce the behaviour:

from libcpp.string cimport string

cdef:
        string tmp = b"abcdefg"

tmp = tmp.replace(b"f", b"H")

Expected behaviour

No response

OS

Debian 12

Python version

3.11.2

Cython version

3.0.9

Additional context

No response

@da-woods
Copy link
Contributor

da-woods commented Apr 2, 2024

I'm slightly amazed that this ever worked. It should be possible to do it manually via casting:

tmp = (<bytes>tmp).replace(b"f", b"H")

It isn't obvious to me whether this is something we should fix or not. It seems pretty unintuitive to be implicitly calling bytes methods like this, but the worry is that lots of people are relying on it.

@scoder
Copy link
Contributor

scoder commented Apr 2, 2024

This kind of breakage shouldn't happen in a point release, so it's an unintended regression. However, now that the regression is there, I wonder if we should really go back and change it again. I doubt that "lots of people" are relying on this specific behaviour. And it's easy to work around for users by making their code more explicit.

It worked in the sense that it creates one or two Python bytes objects, one before and one after the replacement, in which case the code throws away the first. We also allow cpp_string.decode(), so it's not that far fetched.

However, the difference to cpp_string.decode() is that it only creates one decoded string object, whereas the old cpp_string.replace() could create two bytes objects, and that's not visible from the code. So making the conversion explicit really is an improvement to user code.

I'd keep the new behaviour, unless there is broader opposition.

But it's one more reason to be conservative with updates in point releases.

@Source61
Copy link
Author

Source61 commented Apr 2, 2024

I'm slightly amazed that this ever worked. It should be possible to do it manually via casting:

tmp = (<bytes>tmp).replace(b"f", b"H")

It isn't obvious to me whether this is something we should fix or not. It seems pretty unintuitive to be implicitly calling bytes methods like this, but the worry is that lots of people are relying on it.

Thanks, I had tried (tmp) but that didn't work, (tmp) on the other hand worked.

To me it's incredibly useful to be able to code like this so I hope you don't remove the option altogether, my project that's years in the making would probably not survive such an update (making every call explicit it might survive), but removing the ability to use Python calls like this altogether would be detrimental, after all Python has a lot of good libraries and functions for manipulating strings and such so you don't have to in C++, and these libraries are so optimized that it's often just as cheap if not cheaper to use the Python library, so why force users to do it in C++, you know what I mean?

I understand that's not what you're doing though, just expressing my concern about the hypothetical that this functionality would be removed in the future, I hope it doesn't.

Thanks for being reasonable as always and coming up with solutions/workarounds whenever possible.

@scoder
Copy link
Contributor

scoder commented Apr 2, 2024

Don't worry. Being able to mix C, C++ and Python APIs freely is at the core of Cython's feature set.

This is just a rather special case where C++ and Python methods with the same name exist and disagree. In that case, it seems reasonable that the C++ method of a C++ object is looked up first, before trying a conversion to Python and looking up the method there. So, if the old behaviour hadn't existed before, the new behaviour would be the expected, unsurprising one.

@da-woods
Copy link
Contributor

da-woods commented Apr 2, 2024

I wonder if it's worth putting a warning on the implicit conversion for an attribute lookup since it's an area where it's easy to get unexpected behaviour (and also missed attribute lookup becomes a runtime error rather than a compile time error).

I definitely don't want to stop anyone doing it deliberately but I wonder if it's unintended rather than intended in most cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants