Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] [RFC] Framework for extending Dataset #989

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

darcymason
Copy link
Member

Describe the changes

Created an "extension" concept for Dataset, where a class can be hooked into the Dataset class for specific tags.

Early at this point, only allows the "get" operation to be intercepted for any item by tag number, or existing items by keyword. Could look at handling tag groups or slices, or even defining new methods or properties (that are not dicom dict items).

Looking for feedback on the overall concept before coding extension by setting operations. See the test_extension code for an example of how it is used.

One tricky issue was the handler accessing dataset information without getting into an infinite loop. Used the Dataset context manager to 'turn off' handlers when control passed to the registered handler. Could write a separate context manager instead.

Related to recent discussion in #824.

Tasks

  • Unit tests added that reproduce the issue or prove feature is working
  • Fix or feature added
  • Documentation updated (if relevant)
    • No warnings during build
    • Preview link (CircleCI -> Artifacts -> [...]/_build/html/index.html)
  • Unit tests passing and overall coverage the same or better

@pep8speaks
Copy link

pep8speaks commented Dec 10, 2019

Hello @darcymason! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-12-10 01:18:44 UTC

@mrbean-bremen
Copy link
Member

I had a quick look, and so far it looks good to me - will make a proper review probably tomorrow (spent too much time with the VOI LUT PR...)

Copy link
Member

@mrbean-bremen mrbean-bremen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks fine to me, I just want to understand a bit more about the usage.

@@ -0,0 +1,53 @@
# Copyright 2008-2018 pydicom authors. See LICENSE file for details.
"""Test for encaps.py"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy/paste?

existing_tags = set(Dataset._handler_classes.keys()) & set(tags)
if existing_tags:
warnings.warn(
"Registration of handler {!r} is overriding"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make sense to allow more than one handler? I'm thinking of handlers for different SOP classes that may handle the same tag differently. Not sure if that is an intended use case - maybe that would need another mechanism that takes the SOP class into account (similar to the supported TransferSyntax in the image data handlers).
Or I'm just misunderstanding the use case here. I had been thinking about these handler classes as a replacement for the proposed SR classes (SRDocumentStorage, WaveformStorage etc) - maybe you could elaborate a bit about the intended use cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points ... I had planned to extend this to tag groups, or slices of tag numbers, which I think could handle most special cases. For example, SR code elements seem to be largely group 0040.

The idea of handling SOP classes is interesting, and we could put handlers in for that as well, but I don't know how to extend that to nested datasets within Sequences, which wouldn't have an SOP class data element. Maybe there is some way in analogy to what was done with SpecificCharacterSet, keeping track of parents and using their handlers. Would take some thought to figure that out. But if someone were building something up from scratch, they might start with some SR content items which they then assemble into an overall dataset. Along the way, there would only be the tag numbers to identify what each data element was.

In terms of more than one handler, I'm not sure how that would work - generally speaking only one operation can actually occur. E.g. for 'get' operations, some data element has to come back, and we would have to allow only one handler to actually provide it. Similarly with setting items, it would be very confusing if multiple handlers could be doing different things to the dataset contents.

In the end, I'd like to see something here that can handle some things with relatively clearly defined conditions. There will no doubt be some cases where subclassing would be the only thing flexible enough to handle all cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add that I need to go back and review the classes in this PR again, and see how/if these ideas can fit. I started with something really general, but should check it in the specifics of this SR code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of more than one handler, I'm not sure how that would work - generally speaking only one operation can actually occur.

Yes, I thought about calling handlers until some of them is able to handle it. That would imply that the handlers should know if they want to handle the dataset, and also, that there may be conflicts if two handlers want to handle the same data for the same tag or tag range, so that the first would win... which is probably not a good thing, as it would depend on the order of the handlers. It would actually be similar to the pixel data handlers in this aspect. Also, you are right - if you want to extend this to nested sequences, it would be more complicated.
What I had in mind was actually a generalization of the pixel data handler concept for other tags beside PixelData. So, if this is not really a use case, it may just be a bad idea.

I think your idea is an interesting one. I'm not sure yet how to apply it for SR code, but there are other use cases anyway. I can think for example about handling vendor-specific and/or non-standard-conform tags, or a kind of cheap anonymization - actually there are lots of possible use cases for pydicom users now I think about it. I guess I have been thinking more about using it inside pydicom so far...

@mrbean-bremen mrbean-bremen added this to the v1.4 milestone Dec 22, 2019
@mrbean-bremen mrbean-bremen mentioned this pull request Dec 22, 2019
25 tasks
@darcymason darcymason modified the milestones: v1.4, v2.0 Jan 1, 2020
@darcymason
Copy link
Member Author

Leading up to v2.0 release, I'm not convinced this is the right way to go, and don't want to put something in place and later break code because a different path is needed. Pushing this to a later version unless there are objections.

@darcymason darcymason modified the milestones: v2.0, v2.3 Apr 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants