-
-
Notifications
You must be signed in to change notification settings - Fork 471
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrated handling of decompression and Dataset state #525
Comments
…ansfer Syntax is uncompressed type. Related to #525.
Sorry to belabor this, but I'm stuck on the side-effect thing, and need it solved for version 1,0. To take the above a little further: If code accesses any data element, for example, Currently in repository, accessing The PR would need to separate things out a little further to make that happen. But I'm interested in comments first before coding that. |
@darcymason Yes, I agree that accessing the pixel_array should not change anything else. This keeps previous behavior. I also like that the state of pixel_array and PixelData are not connected. To me, it is clear that PixelData is just the content of the original dicom source, pixel_array is just a method to get a numpy array of the PixelData. I would suggest that decompression should remain implicit as I doubt that users would like to have to manage this on their client end (I wouldn't). I am already well trained that if I need to change pixel values and save it to a file, I have to do more that just I would encourage a return to the previous behaviour where pixel_array and PixelData were two different things and pixel_array had no effect on PixelData. |
I agree with @rhaxton. I have thought a little about the possible use cases, and there are really not many. The most common is probably just accessing the pixel data, as described by @rhaxton, to do some image processing or similar - in this case handling the data independent of the DICOM file is correct, I think. If the result shall be written back to a DICOM file, this has to be done separately. |
Thanks for your comments. I'm okay with pixel_array and PixelData separate for the usual case of accessing We'll have to be clear in the pixel_array documentation that for color three-plane it always returns RGB, even though the dataset PhotometricInterpretation says it is YBR. I realized that anyone checking the PI before accessing pixel_array as opposed to after would have that problem anyway. |
Agreed. Anyway, I think this is the wanted behavior. |
@darcymason pixel_array always returns a decompressed numpy copy of the data in PixelData (or throws an exception) [This is the 0.9 behavior]
Right now, it is 0 because the pixel_array is cached with the dataset. If you wanted to get the original data again, you would have to manually delete/clear either dataset._pixel_array or dataset._pixel_id (or re-read the file). I'm OK with this as I do re-read the file if I want the original data. This is different from other attributes that return copies:
I don't think that PixelData should be changed on writing after the numpy array has been modified (as the other attributes are not like PatientSex above). However, I think it would be a small improvement to allow:
That just implicitly calls the |
I was just thinking about that behavior too in the last couple of days. It was cached like that to avoid a lot of conversions (e.g. byte-swapping, if needed) every time the pixel data was accessed. That could be dropped and just documented that it is better to keep your own copy only and don't access it again.
Note that is not true for mutables: ds.PixelSpacing
['0.661468', '0.661468']
spacing = ds.PixelSpacing
spacing[0] = 42
ds.PixelSpacing
['42.0', '0.661468'] For both cases, that is the way python works for any mutables and immutables. The numpy array would normally fall in the mutable camp, but pixel_array is behaving like an immutable. Another thought (two related ones actually) I had recently, just to confuse this even more ;-) The current behavior does not work well for very large pixel data (see first comment #139; that user has > 1 GB files), because we are keeping two copies of the pixels. The second related point is that if you then try to avoid loading pixels into memory using memmap, then the only thing in the dicom file to map to is the original PixelData. How would a separate copy be implemented? So in the memmap scenario, any changes would be in-place editing the pixel data. Using the python/numpy buffer protocol, for non-compressed images we could point the numpy array to the pixel bytes directly and have only one copy. Numpy can also transparently deal with byte order without actually moving the bytes around. And then memmap would work in exactly the same way as fully loaded pixels. How to use memmap and deal with compressed formats, I'm not sure. |
The point about large pixel data is a good one, but I would leave that for a post 1.0 version. Using About mutable/immutable pixel data: Regarding |
I'm closing this for now. I think the |
Looking for a solution to one of the last remaining problems before version 1.0. As discussed in #521, decompression will have side-effects that change other data elements. Pydicom should keep track of that state and do some updating and checking on writing files.
A simple use case for pydicom might be to decompress a set of files and write them back out again:
I am submitting a pull request to separate out the decompress() step which had not previously existed (the only way to decompress was to ask for the pixels using
ds.pixel_array
).At the moment, the state of
pixel_array
andPixelData
are not connected, so that in the past the user would have to useds.PixelData = pix.tobytes()
to set the pixel data before writing the file. Here I propose that pydicom should take care of that step implicitly sincedecompress()
is an in-place modification of the dataset.The text was updated successfully, but these errors were encountered: