Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pydicom v3.0 release #1232

Open
5 tasks
darcymason opened this issue Oct 20, 2020 · 19 comments
Open
5 tasks

Pydicom v3.0 release #1232

darcymason opened this issue Oct 20, 2020 · 19 comments

Comments

@darcymason
Copy link
Member

darcymason commented Oct 20, 2020

Keeping track of issues for pydicom 3.X release series.

Some ideas carried over from #1137:

  • a config option for non-dicom attributes CamelCase. It defaults now to 'warn' in pydicom 2.x but 'error' in 3.x.
  • limit Tag instantiation argument(s). Perhaps Tag(0x10, 0x10), Tag((0x10, 0x10)), Tag(0x100010), and Tag("PatientName")
  • Remove the ability to iterate DataElement, instead iterate over its value
  • FileDataset.filename should only be the file path as str, a separate attribute should be used for datasets read from BytesIO, etc.

Others:

@richard-moss
Copy link

Hi Darcy,

are you looking for comments on just those issues, or looking for other ideas as well?

@darcymason
Copy link
Member Author

are you looking for comments on just those issues, or looking for other ideas as well?

I'd be happy to entertain other ideas. If necessary (lots of discussion) they could be forked off into other issues to try to keep this issue a little more focused.

@darcymason
Copy link
Member Author

Actually - changed my mind about process -- let's put them in as separate issues first, with the "RFC" label, then bring to the list here if confirmed.

@scaramallion
Copy link
Member

What's the plan for 2.x and 3.0? I.e, at what release are we expecting to end the 2.x series?

@darcymason
Copy link
Member Author

I was thinking maybe only one or two more releases, then 3.0. What do you think? I hadn't given it a lot of thought, but it is nice to keep moving forward with important changes.

@scaramallion
Copy link
Member

I'm not fussed, just wanted to know if I should create a new release issue or not

@darcymason
Copy link
Member Author

I was thinking maybe only one or two more releases, then 3.0.

Revisiting this in thinking ahead to upcoming v2.2 release (~May), I'm happy with two releases. The second one, v2.3 could be ~Sept, (although the current release schedule says Nov.), then v3.0 in ~Jan-Feb 2022

However, if we need to work through some big changes, we could have an additional release v2.4 in ~Jan/2022, with v3.0 in ~Apr 2022.

We should be thinking now (for v2.2 release #1259) about any major deprecations that aren't already warned about -- it would be nice to start warning with at least two versions before breaking changes. The items in the checklist in the first post above are a start, but are there others?

We could consider big items like:

  • rethinking the way we handle 'strict' / 'enforce_valid' DICOM
  • value representations? e.g. PersonName, DS, etc., some inconsitencies often come up - perhaps we could look to harmonize the way things are done.
  • encoding / decoding of values to/from files, in particular unicode - it wasn't handled way back in original pydicom, and perhaps it is not integrated into the code as best it could be?
  • perhaps better hooks for fixing invalid DICOM before/after being read in.

Those are a few ideas that come to mind. If anyone has any ideas about structural issues pydicom could address, we could try to come up with a few to tackle, if there is a reasonable deprecation path leading to v3.0.

@mrbean-bremen
Copy link
Member

Good point to handle this before the next release! I haven't really looked at this, here are just my ad-hoc thoughts:

  • handling of strict - yes, this is somewhat inconsistent now, I think, so it would make sense; it probably wouldn't break much code, as it is seldom used (I think)
  • harmonizing value representations sounds good, as related issues come up frequently, though I haven't thought this through yet
  • not sure about encoding/decoding - I may have a look at this later, though I don't see any breaking changes there at a first glance
  • I haven't looked into the hooks, but that is certainly useful. Not sure if it needs to be a breaking change, though.

Maybe I will have a closer look and think about other changes at the weekend...

@mrbean-bremen
Copy link
Member

I didn't see any other breaking changes needed. I'm not sure about the possible reworking in pixel data handling and modularization, but as far as I understood the comments in the related issues, this can all be done without breaking changes.

Other suspects where I'm not completely sure are SR handling (I simply lack the expertise here), and maybe JSON handling, though I expect this to be expanded rather than changed in an incompatible way.

I also had another look at encoding / decoding, but I don't see the need for breaking changes (or any major changes) there at the moment.

@darcymason
Copy link
Member Author

@mrbean-bremen, thanks for all the work on issue cleanup and looking into the breaking changes possibilities. I'll try to have a good look myself this week.

@darcymason
Copy link
Member Author

Thought I'd drop another thought here based on my comment in #1366...

Perhaps some grouping of 'strict' flags could be done - a dict or data class where could turn or off combinations of warning/errors/automatically converting and which to use on reading/writing. E.g. could warn on reading invalid values, auto-convert on writing. Or error on reading, warn on writing - any combination.

@scaramallion
Copy link
Member

scaramallion commented Apr 27, 2021

I'd like to change the config option setting to use a single entry point, probably backed by a dict or class as you suggested:

from pydicom import config

config.set_option("opt name", value, **kwargs)

I still like the idea of have a bunch of 'fixer' functions that could be dropped in or out as desired. We could flag up non-conformance issues during read/parsing and (by default) process them with the fixer functions. If a user doesn't want that fixed (or if they turn off the conformance setting) then we just remove function(s) from the corresponding list. Users could define their own processor list to mix and match fixers. I think it'd tie in nicely with the processing hooks, too.

@darcymason
Copy link
Member Author

config.set_option("opt name", value, **kwargs)

Yeah, that looks good, and it is a nice evolutionary path that can be used without breaking backward compatibility.

I still like the idea of have a bunch of 'fixer' functions that could be dropped in or out as desired

Well, I'm sure you know that I have been a big advocate of the 'fixer' concept in many issue answers. I really like this too but it is a bit slower option because it is a callback - the usual situation now is to have a None callback check, so no expensive function call is made. I think these non-conformance options might be turned on more commonly, so "inline" branching logic via checking config flags could be much faster. Maybe there can be some mix of both concepts, with the really common ones going inline. For example, I think we are agreed on being (by default) forgiving during reading, but strict on writing, so maybe the latter could benefit from being inline.

@scaramallion
Copy link
Member

scaramallion commented May 4, 2021

I've also never really been happy with the current way we handle pixel data decoding (too much boilerplate, hard to customise). I'd like to replace it wholesale with a system similar to the proposed encoding method, where there's a single generic management class and decoding runs via plugins to the class instances. I can probably add it as an option beforehand and then make it default in v3.0 (and remove the old system).

@mrbean-bremen
Copy link
Member

I'd like to replace it wholesale with a system similar to the proposed encoding method, where there's a single generic management class and decoding runs via plugins to the class instances.

That would be really nice! I liked the way you handle the encoding in the PR. A plugin concept is more flexible, and also more convenient for contributions outside of the pydicom package.

@darcymason
Copy link
Member Author

I've been reading the "What's new" for the Python versions, and I'd like to propose Python 3.10+ only for pydicom 3.X.

Some of the reasons:

  • fewer Python versions = faster test suites and fewer maintenance issues
  • because we are jumping a major version, breaking changes aren't as unexpected
  • we could start using some newer Python features much more quickly than waiting years to drop 3.8, 3.9.
  • Data Classes (available Python 3.7+) also allow slots for speed improvement in Python 3.10+. (may be useful for DataElement and others?)
  • Typing notation is greatly improved, much more readable by Python 3.10:
    • Union, Optional can be replaced with X | Y, X | None.
    • TypeAliass might be helpful.
  • for users, upgrading Python 3.X versions is not a big issue (e.g. compared with 2.X -> 3.X)
  • by the time we actually release pydicom 3.X, Python 3.12 (release candidates scheduled for summer 2023) may already be in place or not be far away.
  • we can target any speed profiling (e.g. as requested in Seeking ideas for performance improvements with large segmentation files #1728) to the faster, newer versions of Python which people interested in speed should be using anyway.

So, we still have 2.4 to release, but after that, go to Python 3.10+?

@mrbean-bremen
Copy link
Member

In principle, I agree - I've also seen these improvements and like them, too.
But I also want pydicom users to be able to use the newest version, which depends on their environment.
I think that it would be good if we can at least support the Python version that is used (by default) by the main Linux distributions (in the latest LTS version). It would be good to allow them to package the latest version.
I haven't checked this yet, but we have some time until we finally decide this...

@darcymason
Copy link
Member Author

I think that it would be good if we can at least support the Python version that is used (by default) by the main Linux distributions (in the latest LTS version). It would be good to allow them to package the latest version.

Yes, that would be ideal... but we won't know 6 months ahead of time while we are trying some major changes, so I'm kind of trying to be selfish with my time...

but we have some time until we finally decide this...

On a very quick google, it seems Py 3.10 is already out there.

As of writing this guide, Python 3.10 is adopted by most of the current distros. For example, Ubuntu 22.04 LTS, and Fedora 36 all have Python 3.10 by default.

@mrbean-bremen
Copy link
Member

As of writing this guide, Python 3.10 is adopted by most of the current distros

Ok, that looks good then - thanks for looking that up. Well, I have no more objections in that case 😀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants