Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bitarray should be a subtype of frozenbitarray #212

Open
scott-griffiths opened this issue Dec 7, 2023 · 8 comments
Open

bitarray should be a subtype of frozenbitarray #212

scott-griffiths opened this issue Dec 7, 2023 · 8 comments

Comments

@scott-griffiths
Copy link

Hi,

A feature suggestion to consider as I see you're preparing a 3.0 release, and this would be slightly backward incompatible.

At the moment frozenbitarray is a subclass of bitarray, so is essentially a bitarray which has had various mutating methods removed from it. The more natural arrangement (for me) would be to have frozenbitarray be the base class and have bitarray be a subclass that adds the mutating methods.

This is a breaking change as for example isinstance(some_bitarray, bitarray.frozenbitarray) would become True instead of False, but I'd argue that this is less surprising behaviour.

The real reason for doing this is that it could allow a frozenbitarray object to perhaps be simpler (and thus faster) than a bitarray object. Trivial example: a copy method wouldn't have to make a copy but could return the original object as an optimisation.

I don't know how realistic this is as I appreciate that it might be a real chunk of work, but I think switching their relationship around would allow for some future optimisations.

Thanks. 👍

@ilanschnell
Copy link
Owner

Hey,
thanks for your suggestion. This is an interesting idea. This would be cleaner, as it would avoid to "removal" of mutating the methods in the frozenbitarray class. I'm not sure how much would this would be, as I haven't tried yet used the Python C API to create a subclass.

I also wonder how the relationship between the Python bytes and bytearray objects are implemented. This is actually very similar, i.e. bytes being the "frozenbytearray".

I will give this some more thought. Please let comment on this issue if you have more ideas about this topic. And please open a new issue if you have other thoughts about the 3.0 release.

One idea I have is to rename the iter* methods and remove the non-iter variants. For example itersearch -> search. This transition was made in Python 3.0 also, e.g. the dictionary used to have both .keys() and .iterkeys() in Python 2. But now in Python 3 you have just .keys(), and if you really want a list of the keys you have to say list(d.keys()).

@scott-griffiths
Copy link
Author

Rather than open any new issues I'll just make some observations here and let you decide if they merit their own issues. These might just be personal to me so I hope I don't sound too critical. Some are breaking changes and some are not:

  1. I agree on the renaming of the iter methods. Nothing is lost if people need to recode to list(b.search(x)).
  2. I dislike the way object memory is left uninitialised by default. I understand it's marginally faster at the C level, but it's unexpected in Python and leads to surprising and non-deterministic behaviour. Perhaps a do_not_initialise parameter could be used to get the current behaviour (though I suspect it would be rarely used).
  3. The itersearch method could really use start and stop parameters, the same as in find and index.
  4. The relationship between find, index, itersearch, search and rindex is rather confusing. For example index is a method but rindex is a utility function, while find and index only differ in how they signal failure.
  5. And having just complained about there being too many find methods, it would be useful to also have a reverse search (returning an iterator).
  6. The methods to convert to strings of different bases are not uniform. For binary it's b.to01(), for hex it's util.ba2hex(b) and for octal (for example) it's util.ba2base(8, b). The note in the documentation about not using ba2base for hex is a bit strange as surely the optimal code could be called programatically? I'm not sure what a good interface would be here - personally I might go with b.bin(), b.hex(), b.oct() together with b.base(n).

[And if you follow the bytes/bytearray naming then frozenbitarray would be called just bits (this is how the Bits and BitArray classes in bitstring came about!)]

Hope those suggestions / observations are useful. Thanks.

@ilanschnell
Copy link
Owner

Thank you for all the remarks. Here are some of my thoughts:

  1. yeah, will do this!
  2. yes, I as thinking about this too. bitarray(n) should give a bitarray with n zeros. This will make util.zeros() obsolete, but we'll keep it! As this change will not break anything, I'll probably do it in a bitarray 2.9 release much sooner. Also, I want to add util.ones() in the 2.9 release.
  3. Agreed: .search(start=0, stop=<end>)
  4. Have to think about this some more.
  5. Yes, maybe just an extra argument, e.g. .search(start=0, stop=<end>, reverse=False)
  6. Having these additional string methods in the bitarray object would clutter it, I think. This is why I added the util module in the first place. Also, addind these "to methods", it would mean that the "from functions" (like hex2ba()) would still be somewhere else. Unless of course on changes the bitarray initialize string to accept 0x, e.g. a = bitarrat("0xa0f"). But I think I'll leave these things as they are.

@scott-griffiths
Copy link
Author

That all looks good to me. I agree that it's probably not good to parse initialisation strings to see if they are hex or bin etc. as that could add an unavoidable overhead for everyone.

One other wrinkle I've only just noticed is that rindex can only search for a single bit, whereas index can search for a sub_bitarray.

@ilanschnell
Copy link
Owner

Good catch! I remember adding sub_bitarray search to .index(), but it did occur to me that rindex() doesn't have it either.

@ilanschnell
Copy link
Owner

@scott-griffiths I'm planning a 2.9 release in the next few days. Everything for the release if in master, in particular the changelog. I'm deprecating util.rindex() and adding an optional right keyword argument to .find(), .index() and .itersearch() (leaving .search() as is because it's going to be replaced by itersearch in 3.0 anyway.

Let me know what you think. Thanks

@scott-griffiths
Copy link
Author

Looks good.

My one comment is that the right parameters should be of type bool for consistency (c.f signed, reverse parameters).

I notice that the count method could now search for a sub_bitarray. This method also has a step parameter, so I presume you could now count the number of byte-aligned occurrences of a sub_bitarray using a step of 8 for example? That would be really useful in both find and itersearch (and index too for consistency) - I can add a separate feature request if you'd like.

@ilanschnell
Copy link
Owner

Thanks!

I've updated the documentation for the right parameters to be Boolean. These paramters (including signed and reverse) are integers, such that one can say reverse=1. In fact signed can be of any type, as the code only checks it's truth value.

Yes, I added sub_bitarray to .count(). This method has had the step parameter for almost 2 years now. However, when a sub_bitarray (of length other than 1) is used, the step must be 1. I though about allowing sub_bitarray search for arbitrary steps, but it is not clear what this should actually do. Only consider the step for the starting possitions? Or count occurences in a[start:stop:step] as the item-count does? For the same reason (and because Python's string, bytes and bytearray .find() does not have a step either) I think it's better to not add step to .find, etc., and require step=1 for sub_bitarray count.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants