Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constructor fed with iterable bytes starting with a leading 0x00-0x07 silently treats the bytes as a python pickle... #206

Open
manor opened this issue Aug 2, 2023 · 4 comments

Comments

@manor
Copy link

manor commented Aug 2, 2023

The documentation of the bitarray class claims that the initializer can be:

 |  `int`: Create a bitarray of given integer length.  The initial values are
 |  uninitialized.
 |
 |  `str`: Create bitarray from a string of `0` and `1`.
 |
 |  `iterable`: Create bitarray from iterable or sequence of integers 0 or 1.

Since bytes are iterable, it is natural to assume that providing bytes as an initializer will be equivalent to a .frombytes() call... However, it appears that if the leading byte is in the range '0x00' through '0x07', the initializer is assumed to be a python pickle. The difference is counter-intuitive and confusing since a call fed from a binary file with a leading 0x00 will yield an almost "equivalent" bitarray with exactly one byte missing... Which can be hard to spot :-)

from bitarray import bitarray

b = bitarray(b'\x00\x0F', endian="big"); print(b)
# bitarray('00001111')

b = bitarray(endian="big"); b.frombytes(b'\x00\x0F'); print(b)
# bitarray('0000000000001111')

These are even more counter-intuitive:

b = bitarray(b'\x01\x0F', endian="big"); print(b)
# bitarray('0000111')

b = bitarray(b'\x02\x0F', endian="big"); print(b)
# bitarray('000011')

b = bitarray(b'\x03\x0F', endian="big"); print(b)
# bitarray('00001')

b = bitarray(b'\x04\x0F', endian="big"); print(b)
# bitarray('0000')

b = bitarray(b'\x05\x0F', endian="big"); print(b)
# bitarray('000')

b = bitarray(b'\x06\x0F', endian="big"); print(b)
# bitarray('00')

b = bitarray(b'\x07\x0F', endian="big"); print(b)
# bitarray('0')
@ilanschnell
Copy link
Owner

Thank you for using bitarray and filing this issue! The described behavior is correct, i.e. then a bitarray is initialized with bytes and the leading one is the range 0x00 to 0x07, then the remaining bytes are treated as the raw buffer (with the leading byte indicating how many unused bits are present).
This was behavior is only used to support unpickling, and therefore is not documented.
I agree that this might be confusing as bytes are also iterable, e.g. the iterable b'\1\0\1' is not interpreted as bitarray('101').

What would you suggest to fix this issue? Change the documentation?

@scott-griffiths
Copy link

Hi, another example of how this can surprise:

>>> bitarray(bytearray([0, 1]))
bitarray('01')
>>> bitarray(bytes([0, 1]))
Traceback (most recent call last):
ValueError: endianness missing for pickle

I don't really wish to weigh in on what should be happening, but I don't think that documenting it is the answer. I'm not very familiar with pickle, but I thought you could avoid using __init__ when unpickling? Other options seem messy to me.

@ilanschnell
Copy link
Owner

@scott-griffiths I agree that it would be best to avoid __init__ in the unpickling process completely. When looking at the Python standard library array module, I see an internal function _array_reconstructor which does the unpickling:

>>> import array
>>> array._array_reconstructor(array.array, 'b', 1, bytes([11, 22, 33]))
array('b', [11, 22, 33])

I will try to use this approach in bitarray as well, while keeping things backwards compatible.

@ilanschnell
Copy link
Owner

ilanschnell commented Aug 5, 2023

@scott-griffiths I just created #207, which uses a reconstructor function and also explains the history and technical details.

TL;DR: To allow a backwards compatibility transition, this issue (#206) can be closed a year from now!

ilanschnell added a commit that referenced this issue Dec 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants