Annotate MR/PET (+CT) images with deidentification methods #1709
The situation has definitely changed since this was opened. Last year, the following text was adopted:
A dataset that is skull-stripped or defaced prior to or during curation into BIDS is raw, by definition.
I would be +1 for adding these DICOM fields to BIDS directly.
While this is useful for converting from de-identified DICOM, it would be good to have recommendations for how a curating tool should populate these fields. Something like:
I don't have a strong opinion on what recommendation is to be made. Doing something similar to what is done in DICOM is probably best, but I haven't seen these. |
Thank you for your support! I do want to note again that this isn't just MRI but also PET and CT, while the topic title and tag applied here are both MRI-specific.
I'm far from a DICOM expert, but I tried to read the standards carefully and the way we've implemented in our deface tools (which are being applied to some large and significant open datasets in the wild) looks like this:
De-identification Method and De-identification Method Code Sequence definitely apply to more than just de-facing, i.e. removal of various DICOM tags is also coded here. The defined codes (values 113101-113112) are listed in the table at https://dicom.nema.org/medical/dicom/current/output/chtml/part16/chapter_d.html. Users can also define their own codes, as we've done in the second sequence-block. We used the second (custom) sequence block because DICOM de-identifier software would often be run after de-facing and could easily overwrite De-identification Method with its own information, so using the sequence better ensured that the info would carry through. Something that's gone through both a de-facer and a DICOM de-identifier could look like this:
One issue for BIDS is that most of DICOM de-identification is irrelevant to the converted .nii. Things that affect the pixel data are relevant, but removal of PatientName and the like is not. The "safe" thing for BIDS would be to serialize all of this information and store it, capturing both what's relevant and what's irrelevant, but that would be very long and difficult to parse.
A less-verbose option could be to exclude the code sequences that shouldn't modify anything relevant to nii+json (from this example, ignore the blocks with 113100, 113107, 113111) and keep 113102 + the custom-defined (since there's no way to know if that will be relevant or irrelevant).
A low-effort variant for BIDS could be to just create json fields that directly capture DICOM's Patient Identity Removed (boolean) and De-identification Method (string, non-standardized), ignoring De-identification Method Code Sequence because it's just too complex. This may not capture everything, but it would still be a big improvement over not having any information captured at all. It could even be considered to skip the boolean PatientIdentityRemoved, since basically every BIDS dataset will at least have PatientID re-coded, so it's always TRUE and therefore meaningless. With only DeIdentificationMethod, there's just an unstandardized string, so it would be up to users to do whatever and it won't really be machine-parseable. The CodeSequence was designed to solve the latter, but it's not straightforward.
Are there any other areas in BIDS where DICOM CodeSequences were captured? That could offer some guidance. Otherwise it's perfectly fine for BIDS to add a direct capture/translation of "(0012, 0063) De-identification Method" and solve 90% of this need with very small effort. |
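As a rough illustration of the "less-verbose option" described in the comment above, here is a minimal Python sketch. It assumes the code sequence has already been read into a list of dictionaries using DICOM-style key names; the function name `filter_code_sequence` and the `HEADER_ONLY_CODES` set are hypothetical, and which codes count as irrelevant to nii+json is taken from this thread's example only, not from any BIDS or DICOM rule.

```python
import json

# Codes that, per the example in this thread, only describe header-level
# de-identification. This set is an assumption, not a BIDS or DICOM rule.
HEADER_ONLY_CODES = {"113100", "113107", "113111"}


def filter_code_sequence(sequence):
    """Drop DCM entries listed as header-only; keep everything else.

    Custom (non-DCM) entries are always kept, because there is no way to
    know whether they describe changes to the pixel data.
    """
    kept = []
    for item in sequence:
        is_dcm = item.get("CodingSchemeDesignator") == "DCM"
        if is_dcm and item.get("CodeValue") in HEADER_ONLY_CODES:
            continue
        kept.append(item)
    return kept


example = [
    {"CodeValue": "113100", "CodingSchemeDesignator": "DCM",
     "CodeMeaning": "Basic Application Confidentiality Profile"},
    {"CodeValue": "113107", "CodingSchemeDesignator": "DCM",
     "CodeMeaning": "Retain Longitudinal Temporal Information Modified Dates Option"},
    {"CodeValue": "113111", "CodingSchemeDesignator": "DCM",
     "CodeMeaning": "Retain Safe Private Option"},
    {"CodeValue": "113102", "CodingSchemeDesignator": "DCM",
     "CodeMeaning": "Clean Recognizable Visual Features Option"},
    {"CodeValue": "replace_recognizable", "CodingSchemeDesignator": "mri_reface",
     "CodingSchemeVersion": "0.3.3",
     "CodeMeaning": "Replace face, ears, and artifacts in air"},
]

print(json.dumps(filter_code_sequence(example), indent=2))
```

Keeping an explicit exclusion list, rather than an inclusion list, means unknown or custom codes survive by default, which matches the "no way to know" reasoning in the comment.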
Cool, thanks for sharing that. So an absolutely minimal-effort translation would be:
"DeidentificationMethod": "Per DICOM PS 3.15 AnnexE. Details in 0012,0064, mri_reface 0.3.3",
"DeidentificationMethodCodeSequence": [
  {
    "CodeValue": "113100",
    "CodingSchemeDesignator": "DCM",
    "CodeMeaning": "Basic Application Confidentiality Profile"
  },
  {
    "CodeValue": "113107",
    "CodingSchemeDesignator": "DCM",
    "CodeMeaning": "Retain Longitudinal Temporal Information Modified Dates Option"
  },
  {
    "CodeValue": "113111",
    "CodingSchemeDesignator": "DCM",
    "CodeMeaning": "Retain Safe Private Option"
  },
  {
    "CodeValue": "113102",
    "CodingSchemeDesignator": "DCM",
    "CodingSchemeVersion": "01",
    "CodeMeaning": "Clean Recognizable Visual Features Option"
  },
  {
    "CodeValue": "replace_recognizable",
    "CodingSchemeDesignator": "mri_reface",
    "CodingSchemeVersion": "0.3.3",
    "CodeMeaning": "Replace face, ears, and artifacts in air"
  }
]
It looks like we could recommend that tools add or append a comma-separated short description to
{
  "Code": "replace_recognizable",
  "Designator": "mri_reface",
  "Version": "0.3.3",
  "Description": "Replace face, ears, and artifacts in air"
}
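For anyone wanting to experiment with the shortened form, a minimal Python sketch of the key renaming follows. The mapping mirrors the two examples in this comment; the names `KEY_MAP` and `shorten_keys` are hypothetical, and nothing here reflects an adopted BIDS convention.

```python
# Mapping from DICOM-style keys to the shorter keys proposed in this comment.
# Purely illustrative; no BIDS field names have been decided in this thread.
KEY_MAP = {
    "CodeValue": "Code",
    "CodingSchemeDesignator": "Designator",
    "CodingSchemeVersion": "Version",
    "CodeMeaning": "Description",
}


def shorten_keys(item):
    """Return a copy of one code-sequence entry with shortened key names.

    Keys that are not in KEY_MAP are passed through unchanged.
    """
    return {KEY_MAP.get(key, key): value for key, value in item.items()}


dicom_style = {
    "CodeValue": "replace_recognizable",
    "CodingSchemeDesignator": "mri_reface",
    "CodingSchemeVersion": "0.3.3",
    "CodeMeaning": "Replace face, ears, and artifacts in air",
}

short_style = shorten_keys(dicom_style)
print(short_style)
```

Because the mapping is one-to-one, the renaming is reversible, so a tool could round-trip between the DICOM-style and shortened forms without losing information.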
I don't think so.
IMO it would be reasonable for a DICOM converter to blacklist known "uninteresting" fields, but I think the standard is plenty verbose without dictating those decisions.
I added PET, but CT is not yet in BIDS. There is a moribund BEP (https://bids.neuroimaging.io/bep024) that could be revived. It seems short enough that it should not be a heavy lift. It needs a champion that is familiar with CT to wrap up the discussions and bring it into the main spec. |
Thanks Chris! I was thinking too rigidly about BIDS fields as single strings rather than using a more complex .json structure to capture sequences. In that case I think I agree most with the "minimal-effort translation" example you proposed. BIDS could easily drop PatientIdentityRemoved while keeping both DeidentificationMethod and DeidentificationMethodCodeSequence in totality. Whether to drop some specific numeric codes would really then be up to Chris Rorden et al rather than BIDS, but my guess is they'd choose to keep them all as the "safe" option. I have no objection to also shortening some of those fields as in your second example. Does BIDS have any general guidance on keeping field names verbatim from DICOM vs making them shorter/friendlier? That general concept seems like it'd have already come up and been decided at some point. Are we at the point of making a PR out of this, or should it sit open for discussion for a while first? |
We definitely want to give people some time to chime in, but a PR can help make further discussion more concrete. I don't think I'm up to writing a PR yet, but I don't want to stop you. Just be aware that conversations can change direction dramatically, and writing a PR does not necessarily mean acceptance (see previous thread). |
I think in general we try to keep things pretty close when there's a 1-1 correspondence. On the other hand, DICOM seems to have a kind of global namespace, where BIDS is pretty comfortable with reusing fields (like "Description") when they fit in multiple places, especially in nested structures. So I would probably encourage keeping the two top-level as direct DICOM. I could go either way for the others.
You'd think so. If it's been written down, I can't readily find it. @yarikoptic might be the most likely to know for sure. |
Another mild point of support: I've learned that at least some Siemens MRI scanners have built-in "anonymization" options that strip some tags, and these also fill DeidentificationMethod and DeidentificationMethodCodeSequence. These DICOM tags are being used out in the wild beyond just de-facing, and it would be really great if the BIDS json files could capture that information. |
I suppose it's been enough time for people to raise objections. Would you be up to drafting some text or writing a PR? |
Yes, but I am away for the next couple of weeks. I can start to work on it after I return. Thanks for your continued interest and support! |
I'm having a hard time figuring out how to officially link them, but I created PR #1772 for this. |
Originally posted by @CGSchwarzMayo in #666 (comment):
I came upon [#666] in looking for a .json field that I could populate to indicate that images had been de-faced, and with what software/version they had been de-faced. I think the de-facing field has changed a bit since 2020 and I'm hoping the BIDS group might be willing to revisit it.
Some major points of change in the past few years:
One solution proposed here has been BIDS Derivatives. De-facing is actually discussed as the first example on the BIDS-Derivatives page: https://bids-specification.readthedocs.io/en/stable/derivatives/introduction.html. When de-facing is performed during the curation process, the image would be stored as if it were raw data, e.g. sub-01/anat/sub-01_T1w.nii.gz. From what I can tell, this solution makes the image indistinguishable from unmodified raw data. For most downstream consumers the de-faced data will be the only copy they ever see, so treating it as the original, primary data is appropriate. However, I don't see any standard fields in BIDS-specified .json files that would allow specifying the de-facing software or version used in a standardized and machine-readable format.
Without a standardized .json field, there is no place to store this information, which is critical to both users and database maintainers in maintaining and understanding the images. Adding such a field would also allow its preservation when de-faced DICOM are converted to .nii via dcm2niix and similar programs, whereas the information is currently lost because no BIDS fields have been standardized.
Would the BIDS maintainers be interested in re-visiting this idea with top-level fields for deface-software and deface-software-version? While I'm personally less invested in skull-stripping, I would envision the same argument for software+version would be appropriate for skull-strip related tags as well.