Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invoking pixel_array truncates valid data if NumberOfFrames is not defined #2035

Open
luissantosHCIT opened this issue Mar 26, 2024 · 4 comments

Comments

@luissantosHCIT
Copy link

Describe the issue
For my projects, I recently came across a situation in which the file scanner service used to upload documents to our PACS does not break a multipage scan into individual images within a series. Instead, it concatenates the pixel data of each scanned page into a single array in PixelData. Unfortunately, the Rows and Columns tags are set to represent the dimensions of a single scanned page. When calling pixel_array, it warns that it has truncated 66% of the data because of excess padding. As a result, I end up with only the first page of a three page scanned document. I found out about this interaction during testing of a feature I was setting up.

image

I see the following in the numpy data handler
image
get_expected_length() has the following
image
get_nr_frames() only checks for NumberOfFrames
image

image

I worked around this issue by computing the frame count as
page_count = int(len(img_ds.PixelData) / (rows * columns * samples_per_pixel))
I then adjusted the Rows value to trick pydicom into giving me the full pixel data as a numpy array (convenient for other image manipulations I do).
img_ds.Rows = rows * page_count

I know that the scanner is at fault for not recording the NumberOfFrames in the header, but do you think that we could also check if we get an integer ratio between the expected size and the actual size and use that as the result of get_nr_frames() iff NumberOfFrames is not defined?
Maybe we can default getattr(ds, 'NumberOfFrames', 1) to getattr(ds, "NumberOfFrames", computed_integer_frames), where computed_integer_frames = int(len(ds.PixelData) / (rows * columns * samples_per_pixel)).

If this is a valid idea, I will try to come up with a PR later in the week.
Let me know if this is something I should open a PR for.

Steps to reproduce

  1. Create a DICOM file.
  2. Set Rows and Columns to reflect the length, width of a frame.
  3. Add n frames into the PixelData array.
  4. Save the DICOM.
  5. Load the DICOM and call pixel_array()
  6. Observe warning about excess padding.

image

I do not have an anonymized representative sample at hand right now.

@scaramallion
Copy link
Member

scaramallion commented Mar 26, 2024

I have though about including your suggested approach, but the problem is you have to assume you're getting the excess padding warning because everything except number of frames is correct. So then what happens if instead of number of frames being incorrect, the rows is wrong? Or columns? Or bits allocated? Or samples per pixel? How do I decide which of those is likely in error when there's excess padding?

And then keep in mind that whatever solution you suggest has to at least make things occasionally better for everyone and hopefully never worse.

And finally, in v3.0 the pixel_data_handlers backend is being deprecated and we'll be moving to the pixels module instead: frame length calculation function and native array conversion function.

@scaramallion
Copy link
Member

scaramallion commented Mar 28, 2024

Thinking about it a bit more, I think it does make sense when you go from 1 frame to say... 3+ frames? But 1 -> 2 frames may be a bits allocated or image size issue and 1 -> 3 frames may be a samples per pixel issue (although you could check the photometric interpretation is consistent). And I don't think its a good idea to do anything except try and anticipate there being excess frames, since changing the other pixel module elements may result in negative effects.

So yes, if you want to submit a PR that takes those things into account it would be greatly appreciated

@luissantosHCIT
Copy link
Author

luissantosHCIT commented Mar 28, 2024

I have though about including your suggested approach, but the problem is you have to assume you're getting the excess padding warning because everything except number of frames is correct. So then what happens if instead of number of frames being incorrect, the rows is wrong? Or columns? Or bits allocated? Or samples per pixel? How do I decide which of those is likely in error when there's excess padding?

Yes, you are correct I am assuming that other systems do not give us the wrong rows, columns, samples_per_pixel, and bits_allocated. In fact, earlier in the week, I had forgotten about samples_per_pixel until I was testing against a representative cohort of what I normally experience in production. Since then and now I took into account the samples_per_pixel and will update my own workaround to also account for bits allocated (get the correct number of bytes per pixel when calculating the frame_size. So you bring some excellent points.

And finally, in v3.0 the pixel_data_handlers backend is being deprecated and we'll be moving to the pixels module instead: frame length calculation function and native array conversion function.

I am excited about the new version but was hoping to have an interim solution at the library level so that I could skip performing deep copies of the ds instance so that I could clutch my way into receiving the correct numpy array and enable subframe iteration.

Thinking about it a bit more, I think it does make sense when you go from 1 frame to say... 3+ frames? But 1 -> 2 frames may be a bits allocated or image size issue and 1 -> 3 frames may be a samples per pixel issue (although you could check the photometric interpretation is consistent). And I don't think its a good idea to do anything except try and anticipate there being excess frames, since changing the other pixel module elements may result in negative effects.

Hmm, I did not consider the photometric interpretation in my latest local implementation. It's going to come down to calculating the frame size as properly as possible like you said. I do think that changing the default value for getattr(ds, "NumberOfFrames", 1) line is the safer spot to introduce a best attempt at calculating the number of frames. I think that if a source is malicious enough to mess with the tags associated with frame size calculations there is not much to do than have them correct their programs. Setting the default to the calculated frame count should improve situations in which a good actor properly recorded all other characteristics of the frame stored but did not explicitly record the expected number of frames in the PixelData buffer.

Let me update my current solution to account for bits allocated and photometric interpretation. I will take into account the new functions you shared. It's possible that backporting frame length calculation function is all that we need right now until 3.0 is officially released. If 3.0 is releasing soon, we can leave things as they currently are (I can learn to live with my hacky solution haha). Thank you!

One question, does this issue need to remain open when I am ready to open a PR? If not, you can close or I can close. I will simply point back to this issue as reference when the time comes.

@scaramallion
Copy link
Member

No problem, looking forward to seeing your PR. You can keep the issue open and we'll close it once the PR is merged.

luissantosHCIT added a commit to luissantosHCIT/pydicom that referenced this issue Apr 3, 2024
Updated pixel_data_handlers.util.get_expected_length()
- Accounts for predicted number of frames if the tag NumberOfFrames is not defined and the rows/columns reflect the size of a single frame in a multiframe DICOM slice.
- Does not remove previous behavior of defaulting to 1 frame  if NumberOfFrames is defined as None.
- Refactored variable naming to reflect when computing pixels vs. bytes for the expected length.
- closes pydicom#2035
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants