Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] [RFC] Framework for extending Dataset #989

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 2 additions & 0 deletions doc/release_notes/v1.4.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,7 @@ Enhancements
* Added optional `handler` argument to
:func:`~pydicom.dataset.Dataset.decompress`. This lets you specify a
particular handler, rather than following pydicom's default order (:issue:`537`)
* Added keyword propery to Tag class. e.g. Tag(0x7fe00010).keyword give 'PixelData'
* Added :func:`~pydicom.pixel_data_handlers.util.apply_voi_lut` function for
applying VOI LUTs or windowing operations.
* Added support for (7fe0,0008) *Float Pixel Data* and (7fe0,0009) *Double
Expand All @@ -69,3 +70,4 @@ Changes
.......

* :func:`~pydicom.encaps.get_frame_offsets` now returns whether the Basic Offset Table is empty and a list of the offsets.

37 changes: 37 additions & 0 deletions pydicom/dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -352,6 +352,12 @@ class Dataset(dict):
# Python 2: Classes defining __eq__ should flag themselves as unhashable
__hash__ = None

# Dataset extension tag handlers.
# Format {tag: handler class, ...},
# where handler class defines get_item and set_item,
# possibly, get_attr and set_attr if desired
_handler_classes = {}

def __init__(self, *args, **kwargs):
"""Create a new :class:`Dataset` instance."""
self._parent_encoding = kwargs.get('parent_encoding', default_encoding)
Expand Down Expand Up @@ -382,13 +388,20 @@ def __init__(self, *args, **kwargs):
# known private creator blocks
self._private_blocks = {}

# Instances of handler classes for this Dataset instance
# The instance is created if a matching tag is found
self._handlers = {}
self.handlers_on = True

def __enter__(self):
"""Method invoked on entry to a with statement."""
self.handlers_on = False
return self

def __exit__(self, exc_type, exc_val, exc_tb):
"""Method invoked on exit from a with statement."""
# Returning False will re-raise any exceptions that occur
self.handlers_on = True
return False

def add(self, data_element):
Expand Down Expand Up @@ -429,6 +442,10 @@ def add_new(self, tag, VR, value):
# use data_element.tag since DataElement verified it
self._dict[data_element.tag] = data_element

@classmethod
def clear_handler_classes(cls):
cls._handler_classes = {}

def data_element(self, name):
"""Return the element corresponding to the element keyword `name`.

Expand Down Expand Up @@ -791,6 +808,21 @@ def _character_set(self):

return char_set

def _call_handler(self, tag, method):
"""
tag:Tag
tag to handle
method: str
name of class method to call
"""
if tag not in self._handlers:
# Create handler instance for the dataset instance
# Pass it a reference to this Dataset instance
self._handlers[tag] = self._handler_classes[tag](self)
with self as ds: # handlers turned off in context to avoid inf loop
handler_method = getattr(ds._handlers[tag], method)
return handler_method(tag)

def __getitem__(self, key):
"""Operator for ``Dataset[key]`` request.

Expand Down Expand Up @@ -846,6 +878,11 @@ def __getitem__(self, key):
tag = key
else:
tag = Tag(key)

if self._handler_classes and self.handlers_on:
if tag in self._handler_classes:
return self._call_handler(tag, "get_item")

data_elem = self._dict[tag]

if isinstance(data_elem, DataElement):
Expand Down
20 changes: 20 additions & 0 deletions pydicom/extensions.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
from pydicom.dataset import Dataset
from pydicom.tag import Tag
import warnings


def handle_dataset_tags(tags):
"""Define a decorator to use with classes which handle tags"""
def decorator(handler_class):
existing_tags = set(Dataset._handler_classes.keys()) & set(tags)
if existing_tags:
warnings.warn(
"Registration of handler {!r} is overriding"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make sense to allow more than one handler? I'm thinking of handlers for different SOP classes that may handle the same tag differently. Not sure if that is an intended use case - maybe that would need another mechanism that takes the SOP class into account (similar to the supported TransferSyntax in the image data handlers).
Or I'm just misunderstanding the use case here. I had been thinking about these handler classes as a replacement for the proposed SR classes (SRDocumentStorage, WaveformStorage etc) - maybe you could elaborate a bit about the intended use cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points ... I had planned to extend this to tag groups, or slices of tag numbers, which I think could handle most special cases. For example, SR code elements seem to be largely group 0040.

The idea of handling SOP classes is interesting, and we could put handlers in for that as well, but I don't know how to extend that to nested datasets within Sequences, which wouldn't have an SOP class data element. Maybe there is some way in analogy to what was done with SpecificCharacterSet, keeping track of parents and using their handlers. Would take some thought to figure that out. But if someone were building something up from scratch, they might start with some SR content items which they then assemble into an overall dataset. Along the way, there would only be the tag numbers to identify what each data element was.

In terms of more than one handler, I'm not sure how that would work - generally speaking only one operation can actually occur. E.g. for 'get' operations, some data element has to come back, and we would have to allow only one handler to actually provide it. Similarly with setting items, it would be very confusing if multiple handlers could be doing different things to the dataset contents.

In the end, I'd like to see something here that can handle some things with relatively clearly defined conditions. There will no doubt be some cases where subclassing would be the only thing flexible enough to handle all cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add that I need to go back and review the classes in this PR again, and see how/if these ideas can fit. I started with something really general, but should check it in the specifics of this SR code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of more than one handler, I'm not sure how that would work - generally speaking only one operation can actually occur.

Yes, I thought about calling handlers until some of them is able to handle it. That would imply that the handlers should know if they want to handle the dataset, and also, that there may be conflicts if two handlers want to handle the same data for the same tag or tag range, so that the first would win... which is probably not a good thing, as it would depend on the order of the handlers. It would actually be similar to the pixel data handlers in this aspect. Also, you are right - if you want to extend this to nested sequences, it would be more complicated.
What I had in mind was actually a generalization of the pixel data handler concept for other tags beside PixelData. So, if this is not really a use case, it may just be a bad idea.

I think your idea is an interesting one. I'm not sure yet how to apply it for SR code, but there are other use cases anyway. I can think for example about handling vendor-specific and/or non-standard-conform tags, or a kind of cheap anonymization - actually there are lots of possible use cases for pydicom users now I think about it. I guess I have been thinking more about using it inside pydicom so far...

"existing handling of tags {}".format(handler_class,
existing_tags)
)
Dataset._handler_classes.update({Tag(tag): handler_class
for tag in tags})
return handler_class

return decorator
4 changes: 4 additions & 0 deletions pydicom/tag.py
Original file line number Diff line number Diff line change
Expand Up @@ -222,6 +222,10 @@ def is_private_creator(self):
"""Return ``True`` if the tag is a private creator."""
return self.is_private and 0x0010 <= self.element < 0x0100

@property
def keyword(self):
from pydicom.datadict import keyword_for_tag
return keyword_for_tag(self)

def TupleTag(group_elem):
"""Fast factory for :class:`BaseTag` object with known safe (group, elem)
Expand Down
53 changes: 53 additions & 0 deletions pydicom/tests/test_extensions.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
# Copyright 2008-2018 pydicom authors. See LICENSE file for details.
"""Test for encaps.py"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy/paste?


import pytest

from pydicom.tag import Tag
from pydicom.dataset import Dataset
from pydicom.dataelem import DataElement
from pydicom.extensions import handle_dataset_tags


class TestDatasetExtension(object):
"""Test handlers which extend Dataset"""
def setup(self):
@handle_dataset_tags(["ImagePositionPatient", "PatientName"])
class MyHandler:
def __init__(self, dataset):
self.ds = dataset

def get_item(self, tag):
"""Return the data element instance"""
if tag.keyword == "ImagePositionPatient":
x, y, z = self.ds.ImagePositionPatient
# This one updates underlying dataset
self.ds.ImagePositionPatient = [x + 1, y + 1, z + 1]
return self.ds["ImagePositionPatient"]
elif tag.keyword == "PatientName":
# This one leaves underlying dataset alone
return DataElement(tag, "PN", "anonymous")
else:
NotImplementedError("Unhandled tag")
self.ds = Dataset()
self.ds.PatientName = "Patient^Joe"
self.ds.ImagePositionPatient = [0.0, 0.0, 0.0]
self.ds.PatientID = "123"

def teardown(self):
Dataset.clear_handler_classes()

def test_get_item(self):
"""Test handler handles specified tags when retrieved by tag."""
assert [1.0, 1.0, 1.0] == self.ds["ImagePositionPatient"].value
assert "anonymous" == self.ds["PatientName"].value
# Check that one we haven't changed is okay
assert "123" == self.ds["PatientID"].value

def test_get_attr(self):
"""Test handler handles specified tags when retrieved by keyword."""
assert [1.0, 1.0, 1.0] == self.ds.ImagePositionPatient
assert "anonymous" == self.ds.PatientName

# assert that unchanged on still okay
assert "123" == self.ds.PatientID