Photometric Interpretation of the dataset is YBR_FULL_422 [...] You may need to change the Photometric Interpretation to the correct value #240

Open
javitab opened this issue Nov 24, 2022 · 14 comments

Comments

@javitab

javitab commented Nov 24, 2022

I'm trying to use DicomCleaner. I run detect, followed by clean and save, but I get the error message below. It appears to come from pydicom, so I'm not sure whether this issue belongs there instead. The process succeeds for, say, a demographics page (RGB). However, any of the other dcm files that do have a Photometric Interpretation of YBR_FULL_422 throw the error below, and when I try to load the output in, for example, Micro Dicom Viewer, it's a bunch of nonsense.

Do I really need to handle this myself, or am I missing something? The same thing happens with files from different manufacturers as well.

C:\git\dicomtools\venv\Lib\site-packages\pydicom\pixel_data_handlers\numpy_handler.py:250: UserWarning: The Photometric Interpretation of the dataset is YBR_FULL_422, however the length of the pixel data (21823488 bytes) is a third larger than expected (14548992 bytes) which indicates that this may be incorrect. You may need to change the Photometric Interpretation to the correct value.
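
For what it's worth, the two byte counts in the warning are consistent with pixel data stored at three bytes per pixel while the header says YBR_FULL_422, which implies an average of two (one Cb/Cr pair shared by every two pixels). Below is a minimal sketch of that arithmetic, assuming 8-bit samples. The helper name is made up for illustration and is not part of pydicom.

```python
def stored_bytes_per_pixel(actual_len: int, expected_422_len: int) -> float:
    """Infer how many bytes per pixel are actually stored.

    Assumes 8-bit samples. Under YBR_FULL_422 two pixels share one Cb/Cr
    pair, so the expected length works out to 2 bytes per pixel.
    """
    pixels = expected_422_len // 2
    return actual_len / pixels


# Byte counts copied from the warning above.
print(stored_bytes_per_pixel(21823488, 14548992))  # 3.0
```

A result of 3.0 would mean the data is laid out with full chroma sampling (as RGB or YBR_FULL would be), not subsampled as the tag claims.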

@vsoch
Member

vsoch commented Nov 24, 2022

I'm not sure - I would test loading / interacting with the dataset first and seeing if it's an issue with pydicom or within deid. If it's within deid, then we should figure out what changes are being done to lead to the issue. My suggestion:

  1. Try loading / saving with pydicom alone to see if it reproduces
  2. You might want to open an issue over there anyway to see if the core devs have feedback
  3. If it's related to a deid-specific thing, we will need a dummy dataset and an example of how to reproduce your issue.
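
The first step in the list above could be sketched roughly as follows. This is only an illustration: the function name and the summary keys are hypothetical, and it relies on nothing beyond `pydicom.dcmread` and `Dataset.save_as`.

```python
import sys


def roundtrip(src_path: str, dst_path: str) -> dict:
    """Read a DICOM file with pydicom alone and write it back out unchanged."""
    import pydicom  # imported lazily so the helper can be defined without pydicom

    ds = pydicom.dcmread(src_path)
    # A few values worth comparing before and after, or pasting into an issue.
    summary = {
        "PhotometricInterpretation": ds.get("PhotometricInterpretation"),
        "TransferSyntaxUID": str(ds.file_meta.get("TransferSyntaxUID")),
        "NumberOfFrames": ds.get("NumberOfFrames", 1),
        "PixelDataBytes": len(ds.get("PixelData", b"")),
    }
    ds.save_as(dst_path)  # no deid involved at any point
    return summary


if __name__ == "__main__" and len(sys.argv) == 3:
    print(roundtrip(sys.argv[1], sys.argv[2]))
```

If the copy written this way opens fine in a viewer, the problem is more likely in what deid does to the pixels than in pydicom's read/write path.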

pinging @wetzelj for thoughts!

@javitab
Author

javitab commented Nov 24, 2022

I'm fairly new to pydicom at large, I will try loading and saving with only pydicom and see what my luck is. You're suggesting just the act of loading the files and writing them back out with minimal to no changes, correct?

@vsoch
Member

vsoch commented Nov 24, 2022

Yes! And then ping the pydicom maintainers in case they have an idea. You'll probably wind up back here, in which case I'll need a dummy dataset to reproduce your error before I can work on it.

And no worries about being new to pydicom - welcome!

@wetzelj
Contributor

wetzelj commented Nov 28, 2022

Unfortunately I haven't seen this issue - and don't think I've dealt with YBR_FULL_422 images at all.

That said, I noticed that in clean.py deid is switching the PhotometricInterpretation to RGB in order to obtain the pixel_array. This makes me wonder whether we're doing something incorrectly when masking the pixels on a YBR_FULL_422 image, and whether the error is being produced by pydicom when the file is written.
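
As background for that suspicion: changing the PhotometricInterpretation tag and converting the sample values are two separate things. Below is a rough per-pixel sketch of the YBR_FULL to RGB conversion, using the coefficients from the DICOM definition of YBR_FULL (the same ones JFIF uses). The function name is made up for illustration, and this is not a claim about what deid or pydicom actually does internally. For YBR_FULL_422 the chroma would also have to be upsampled first.

```python
def ybr_full_to_rgb(y: int, cb: int, cr: int) -> tuple:
    """Convert one 8-bit YBR_FULL pixel to RGB (illustrative helper)."""

    def clamp(value: float) -> int:
        # Round to the nearest integer and keep the result in 0..255.
        return max(0, min(255, round(value)))

    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)


# Neutral chroma (Cb = Cr = 128) leaves grey values unchanged.
print(ybr_full_to_rgb(128, 128, 128))  # (128, 128, 128)
# With non-neutral chroma, the converted pixel differs a lot from the raw triple.
print(ybr_full_to_rgb(76, 85, 255))
```

If YBR samples are simply relabeled as RGB without a conversion like this, greyscale content would look roughly right while anything with color would not.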

Ultimately, I think we're going to need a sample dataset in order to debug fully.

@vsoch
Member

vsoch commented Nov 28, 2022

+1! I hope you can provide us with a dataset to reproduce @johnavitable. I'm suspicious of the same.

@javitab
Author

javitab commented Nov 28, 2022

So, as a sanity check, I ran over the dicom cookies dataset and encountered the same issue, so it appears this is not specific to my own datasets. The dicom cookies are all YBR_FULL_422 as well. I've tried a variety of Python versions; the latest I tried with dicom cookies was 3.10.7.

Running through dicom cookies, it throws the error on image1.dcm, and the rest hit the blacklist but are also rendered unviewable. I'm not sure whether something changed with numpy or with a new version of Python. I tried to spin up the docker container, but based on the Dockerfile it didn't look like it was doing much that I was missing. I'm not sure what other info you may need, given that the problem is present in the sample data. Below is the code I'm using to run the anonymization. I'm no expert, but I think this should be doing the trick:

from deid.dicom import get_files, replace_identifiers, get_identifiers, clean_pixel_data, DicomCleaner
from deid.utils import get_installdir
from deid.data import get_dataset
from deid.config import DeidRecipe
from deid.logger.message import bot
from pydicom import read_file, dataset
from pprint import pprint
import os
import secrets


if __name__ == "__main__":

    #Generating directory for exam
    random_string = secrets.token_urlsafe()
    print(f"Session ID: {random_string}")

    
    current_directory = os.getcwd()
    final_directory = os.path.join(current_directory, r'data')
    if not os.path.exists(final_directory): os.makedirs(final_directory)
    os.makedirs(os.path.join(f'data/{random_string}','input/'))
    os.makedirs(os.path.join(f'data/{random_string}','output/'))
    input_dir=f'data/{random_string}/input/'
    output_dir=f'data/{random_string}/output/'

    input("Please put files in input folder as named above (ENTER)")
    
    input_files=list(get_files(input_dir))

    bot.log("Discovering DICOM files")
    for file in input_files:
        dicom = read_file(file)    
        print(f"###DICOM File: {os.path.basename(file)}:{dicom.get('PatientID')} - {dicom.get('PatientName')} - {dicom.get('PatientSex')} \n {dicom.get('PhotometricInterpretation')}")
    
    deid = 'stock.deid.dicom'
    recipe = DeidRecipe(deid)

    cleaner = DicomCleaner(deid=deid)
    ifile=0
    for file in input_files:
        ifile+=1
        try:
            dicom=read_file(file)
            print(f"### Reading DICOM File {ifile}/{len(input_files)}: {os.path.basename(file)}:{dicom.get('PatientID')} - {dicom.get('PatientName')} - {dicom.get('PatientSex')}")
        except Exception as e:
            bot.warning(f"Error reading file {file}")
        try:
            detected = cleaner.detect(file)
            pprint(detected)
            cleaned = cleaner.clean(fix_interpretation=False)
            cleaner.save_dicom(output_folder=output_dir)
        except Exception as e:
            bot.warning("### START ERROR CLEANING PIXELS###")
            print(e)
            bot.warning("### END ERROR CLEANING PIXELS###")

    output_files=list(get_files(output_dir))
    

    bot.log("Getting current identifiers" )
    #Get real patient identifiers
    ids = get_identifiers(output_files)

    bot.log("Removing identifiers")
    #Remove patient identifiers
    files_ids_removed = replace_identifiers(
        dicom_files=output_files, 
        deid=recipe,
        ids=ids,
        save=True,
        overwrite=True
    )

    bot.log(f"Identifiers removed from {len(files_ids_removed)} files.")

    bot.log(f"Anonymization complete. Files have been written to {output_dir}")

@vsoch
Member

vsoch commented Nov 28, 2022

Thanks @javitab! I'll add this to my list of TODO but it's quite chonky at the moment so I might not get to testing it out immediately. Thanks for figuring out a reproducing case! We did recently have changes to the cleaner (the coordinate system was off I think) so I wouldn't be surprised if there is a bug.

@javitab
Author

javitab commented Nov 28, 2022

Thanks, that sounds plausible to me. I think I saw an instance or two where there was an RGB image that was almost viewable, but the "green" layer was skewed, if that makes sense.

@vsoch
Member

vsoch commented Nov 30, 2022

Okay got a chance to run this, and on the dicom cookies as you suggested! I don't think I see the error block?

$ python test.py 
Session ID: RbEPDAqbQyD8usnKtpDVb-8pCrExTJ6n5CbmZ_wCpvU
Please put files in input folder as named above (ENTER)
LOG Discovering DICOM files
###DICOM File: image5.dcm:cookie-47 - falling disk - M 
 YBR_FULL_422
###DICOM File: image3.dcm:cookie-47 - still salad - F 
 YBR_FULL_422
###DICOM File: image6.dcm:cookie-47 - noisy feather - M 
 YBR_FULL_422
###DICOM File: image7.dcm:cookie-47 - frosty paper - F 
 YBR_FULL_422
###DICOM File: image2.dcm:cookie-47 - billowing mode - F 
 YBR_FULL_422
###DICOM File: image1.dcm:cookie-47 - nameless waterfall - F 
 YBR_FULL_422
###DICOM File: image4.dcm:cookie-47 - flat glade - M 
 YBR_FULL_422
WARNING Problem loading stock.deid.dicom, skipping.
WARNING Problem loading stock.deid.dicom, skipping.
### Reading DICOM File 1/7: image5.dcm:cookie-47 - falling disk - M
{'flagged': True,
 'results': [{'coordinates': [],
              'group': 'blacklist',
              'reason': ' ImageType missing  or ImageType empty '}]}
Scrubbing data/RbEPDAqbQyD8usnKtpDVb-8pCrExTJ6n5CbmZ_wCpvU/input/image5.dcm.
/home/vanessa/Desktop/Code/deid/env/lib/python3.9/site-packages/pydicom/pixel_data_handlers/numpy_handler.py:341: UserWarning: The Photometric Interpretation of the dataset is YBR_FULL_422, however the length of the pixel data (9437184 bytes) is a third larger than expected (6291456 bytes) which indicates that this may be incorrect. You may need to change the Photometric Interpretation to the correct value.
  warnings.warn(msg)
### Reading DICOM File 2/7: image3.dcm:cookie-47 - still salad - F
{'flagged': True,
 'results': [{'coordinates': [],
              'group': 'blacklist',
              'reason': ' ImageType missing  or ImageType empty '}]}
Scrubbing data/RbEPDAqbQyD8usnKtpDVb-8pCrExTJ6n5CbmZ_wCpvU/input/image3.dcm.
### Reading DICOM File 3/7: image6.dcm:cookie-47 - noisy feather - M
{'flagged': True,
 'results': [{'coordinates': [],
              'group': 'blacklist',
              'reason': ' ImageType missing  or ImageType empty '}]}
Scrubbing data/RbEPDAqbQyD8usnKtpDVb-8pCrExTJ6n5CbmZ_wCpvU/input/image6.dcm.
### Reading DICOM File 4/7: image7.dcm:cookie-47 - frosty paper - F
{'flagged': True,
 'results': [{'coordinates': [],
              'group': 'blacklist',
              'reason': ' ImageType missing  or ImageType empty '}]}
Scrubbing data/RbEPDAqbQyD8usnKtpDVb-8pCrExTJ6n5CbmZ_wCpvU/input/image7.dcm.
### Reading DICOM File 5/7: image2.dcm:cookie-47 - billowing mode - F
{'flagged': True,
 'results': [{'coordinates': [],
              'group': 'blacklist',
              'reason': ' ImageType missing  or ImageType empty '}]}
Scrubbing data/RbEPDAqbQyD8usnKtpDVb-8pCrExTJ6n5CbmZ_wCpvU/input/image2.dcm.
### Reading DICOM File 6/7: image1.dcm:cookie-47 - nameless waterfall - F
{'flagged': True,
 'results': [{'coordinates': [],
              'group': 'blacklist',
              'reason': ' ImageType missing  or ImageType empty '}]}
Scrubbing data/RbEPDAqbQyD8usnKtpDVb-8pCrExTJ6n5CbmZ_wCpvU/input/image1.dcm.
### Reading DICOM File 7/7: image4.dcm:cookie-47 - flat glade - M
{'flagged': True,
 'results': [{'coordinates': [],
              'group': 'blacklist',
              'reason': ' ImageType missing  or ImageType empty '}]}
Scrubbing data/RbEPDAqbQyD8usnKtpDVb-8pCrExTJ6n5CbmZ_wCpvU/input/image4.dcm.
LOG Getting current identifiers
LOG Removing identifiers
LOG Identifiers removed from 7 files.
LOG Anonymization complete. Files have been written to data/RbEPDAqbQyD8usnKtpDVb-8pCrExTJ6n5CbmZ_wCpvU/output/

But I'm not sure these are the best reproducing cases to test - these were fake images I made, mostly for the header parsing.

image

Do we have a reproducing image that wasn't artificially made by me?

@javitab
Author

javitab commented Dec 8, 2022

I've been doing some more testing and have a few discoveries. I've attached some sample data and information from my environment. More specifically, I have noticed that while all of the dicom-cookies samples get destroyed when I run them through, for the most part any file that is a static image succeeds and anything that includes a cine clip gets destroyed.

Further, for the files that do get destroyed (the cine clips), the file size grows by several orders of magnitude. I haven't included the output from my runs because it is far too large to upload here, but if it is needed, I'm happy to work on sharing that data.

Lastly, looking for the difference between what @vsoch gets from running the same script and what I get, I noticed in your output that you're on Python 3.9. I switched to 3.9 from 3.11, though that unfortunately didn't make any difference. I've also included the output of pip freeze in current_packages.txt, in case a dependency might have broken something.
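
Since the outputs are too large to attach, one way to share the size growth is a small per-file ratio report. This is only a sketch using the standard library. The function name and the `prefix` parameter are hypothetical, and it assumes each output file has the same name as its input, optionally with a prefix in front.

```python
import os


def growth_report(input_dir: str, output_dir: str, prefix: str = "") -> dict:
    """Map each input file name to output_size / input_size.

    Assumption: the output for "x.dcm" is stored as prefix + "x.dcm" in
    output_dir. Files without a matching output are skipped.
    """
    report = {}
    for name in sorted(os.listdir(input_dir)):
        src = os.path.join(input_dir, name)
        dst = os.path.join(output_dir, prefix + name)
        if os.path.isfile(src) and os.path.isfile(dst) and os.path.getsize(src) > 0:
            report[name] = os.path.getsize(dst) / os.path.getsize(src)
    return report
```

Calling `growth_report(input_dir, output_dir)` with the two folders from the script above and pasting the resulting dictionary would show which files blow up and by how much, without uploading anything large.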

Thanks again!

dicom_data.zip

EDIT: I've also tried running all of this through the docker container with no difference.

@vsoch
Member

vsoch commented Dec 8, 2022

Thanks! I'm not sure what further help I can offer. This hasn't been my primary area of work for several years, and I lack the expertise. If someone who has the expertise wants to take charge of investigating this and opening a PR, it would be greatly appreciated.

@javitab
Author

javitab commented Dec 8, 2022

Would you by any chance be able to share the output of a pip freeze from where you ran the code earlier? At the very least, I'm confused as to why we get different outcomes on the dicom-cookies dataset. I was looking at a prior issue that's been logged here about RGB cine loops, so I'm wondering if my problem with dicom-cookies might point at anything else.

@vsoch
Member

vsoch commented Dec 8, 2022

The environment above is long gone - but here is my Python:

python --version
Python 3.9.12

I use anaconda and basically create a new venv, source it, then pip install -e . within deid.

@vsoch
Member

vsoch commented Dec 8, 2022

I think it's almost time for a python update - my pip freeze seems borked :(

 pip freeze
ERROR: Exception:
Traceback (most recent call last):
  File "/home/vanessa/Desktop/Code/deid/env/lib/python3.9/site-packages/pip/_internal/cli/base_command.py", line 160, in exc_logging_wrapper
    status = run_func(*args)
  File "/home/vanessa/Desktop/Code/deid/env/lib/python3.9/site-packages/pip/_internal/commands/freeze.py", line 87, in run
    for line in freeze(
  File "/home/vanessa/Desktop/Code/deid/env/lib/python3.9/site-packages/pip/_internal/operations/freeze.py", line 43, in freeze
    req = FrozenRequirement.from_dist(dist)
  File "/home/vanessa/Desktop/Code/deid/env/lib/python3.9/site-packages/pip/_internal/operations/freeze.py", line 237, in from_dist
    req, comments = _get_editable_info(dist)
  File "/home/vanessa/Desktop/Code/deid/env/lib/python3.9/site-packages/pip/_internal/operations/freeze.py", line 164, in _get_editable_info
    vcs_backend = vcs.get_backend_for_dir(location)
  File "/home/vanessa/Desktop/Code/deid/env/lib/python3.9/site-packages/pip/_internal/vcs/versioncontrol.py", line 238, in get_backend_for_dir
    repo_path = vcs_backend.get_repository_root(location)
  File "/home/vanessa/Desktop/Code/deid/env/lib/python3.9/site-packages/pip/_internal/vcs/mercurial.py", line 143, in get_repository_root
    r = cls.run_command(
  File "/home/vanessa/Desktop/Code/deid/env/lib/python3.9/site-packages/pip/_internal/vcs/versioncontrol.py", line 650, in run_command
    return call_subprocess(
  File "/home/vanessa/Desktop/Code/deid/env/lib/python3.9/site-packages/pip/_internal/utils/subprocess.py", line 141, in call_subprocess
    proc = subprocess.Popen(
  File "/home/vanessa/anaconda3/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/home/vanessa/anaconda3/lib/python3.9/subprocess.py", line 1821, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
NotADirectoryError: [Errno 20] Not a directory: 'hg'
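
Judging by the traceback, pip fails while looking up version control information for the editable install. A possible way to compare environments without going through pip at all is to read the installed package metadata directly with `importlib.metadata` (standard library, Python 3.8+). The package list below is an assumption based on what has come up in this thread.

```python
from importlib import metadata


def package_versions(names=("deid", "pydicom", "numpy")) -> dict:
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in package_versions().items():
    print(f"{name}=={version}" if version else f"{name} (not installed)")
```

Running this in both environments gives a small, directly comparable list of the versions most likely to matter here.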
