Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when checking whether code is contained in context group #1388

Open
hackermd opened this issue May 16, 2021 · 21 comments
Open

Error when checking whether code is contained in context group #1388

hackermd opened this issue May 16, 2021 · 21 comments
Labels
Milestone

Comments

@hackermd
Copy link
Contributor

Describe the bug
For certain context groups, an AssertionError is raised when one tries to check whether a given code is contained in the context group.

Expected behavior
When a check is performed using the in operator whether an instance of pydicom.sr.coding.Code is contained in pydicom.sr.codedict.codes.cid*, a Boolean value should be returned and no Exception should be raised.

Steps To Reproduce

>>> from pydicom.sr.codedict import codes
>>> codes.SCT.Lung in codes.cid4
AssertionError: HipJoint had multiple code matches for cid4

Your environment

module       | version
------       | -------
platform     | macOS-10.15.6-x86_64-i386-64bit
Python       | 3.9.1 (default, Feb  3 2021, 07:04:15)  [Clang 12.0.0 (clang-1200.0.32.29)]
pydicom      | 2.1.2
@scaramallion
Copy link
Member

scaramallion commented May 26, 2021

Issue occurs because Hip joint is in CIDs 4 and 4031 (which is an inclusion of an inclusion). The newest version looks fine so the data probably needs to be regenerated, but I don't know enough about SR to judge whether or not that's a good idea without any versioning info...

@hackermd
Copy link
Contributor Author

Issue occurs because Hip joint is in CIDs 4 and 4031 (which is an inclusion of an inclusion).

The context groups represent value sets and it should be fine for a value to be in more than one set.

@hackermd
Copy link
Contributor Author

The newest version looks fine so the data probably needs to be regenerated, but I don't know enough about SR to judge whether or not that's a good idea without any versioning info...

An update can be problematic if for whatever reason the meanings in the standard change, which would result in different Python dictionary keys and potentially break existing code. SNOMED-CT codes have synonyms and a different code meaning could in principle be used for the same code value.

For example, the SNOMED Browser lists the following synonyms for Hip joint:

en Hip joint
en Coxofemoral joint
en Hip joint structure (body structure)
en Hip joint structure
en Hip
en Coxal joint

@dclunie could you provide some insight the process for updating CIDs in Part 16 (in particular the FHIR JSON format). Is there a chance for meanings to change from one edition of the standard to another?

We should probably improve the implementation in the generate_code_dicts.py script. For example, we could check whether a code already exists for a given coding scheme (using the Code.__eq__ method, which compares the Code Value and Coding Scheme Designator rather than Code Meaning) and re-use the existing name in case of a collision.

@dclunie
Copy link

dclunie commented May 27, 2021

Yes, code meanings are changed/corrected/rationalized occasionally.

@hackermd
Copy link
Contributor Author

Thanks for confirming @dclunie

@darcymason
Copy link
Member

AssertionError: HipJoint had multiple code matches for cid4

I believe I wrote that code, and vaguely remember putting in asserts in a couple of places just as a defence against unusual situations that were (to me at least) unexpected.

The context groups represent value sets and it should be fine for a value to be in more than one set.

I'll try to get my head around the code logic again and see how the assertion could be safely removed. Likely is an easy fix.

@scaramallion scaramallion mentioned this issue Jun 15, 2021
7 tasks
@scaramallion
Copy link
Member

scaramallion commented Jun 15, 2021

Hmm, would it make sense to decouple the SR tables from pydicom itself and have them as a separate package (pydicom-sr-data)? That way we could explicitly release them with a given version and users could pin the one that fits their needs.

Alternatively we could release pydicom packaged with the newest version of the tables and make it possible to use older (or newer) versions by installing a specific version of the separate package. Could probably just make it so a Github Action runs weekly (?) and checks the DICOM docbook for changes and performs auto-releases.

If we wanted to we could probably do the same thing with the regular data dictionaries as well

@darcymason
Copy link
Member

Hmm, would it make sense to decouple the SR tables from pydicom itself and have them as a separate package (pydicom-sr-data)? That way we could explicitly release them with a given version and users could pin the one that fits their needs.

That could help, but by itself doesn't handle all problems because it is the meaning of the standards themselves that changes. But the pinning possibility would give users some options. @hackermd, thoughts?

If we wanted to we could probably do the same thing with the regular data dictionaries as well

Hmmm, that actually sounds good. We can have an import check for a separate pydicom-dicom-dict (or whatever it is called) and if newer than the one with pydicom, import that dictionary. Then users could update to latest DICOM dictionary without waiting for a pydicom release to catch up. I don't see older as being necessary, changes are backwards compatible, with the exception that some currently 'retired' items might not have been in previous version, or (rarely) some typo has been fixed. Having said that, it would be easy to have a flag for 'use even if older version', so I'd support it.

@hackermd
Copy link
Contributor Author

That could help, but by itself doesn't handle all problems because it is the meaning of the standards themselves that changes. But the pinning possibility would give users some options. @hackermd, thoughts?

I would prefer if we'd keep the tables in the library. There are only a few editions of the standard per year and I am thus not very concerned about the standard and the library getting out of sync.

Alternatively we could release pydicom packaged with the newest version of the tables and make it possible to use older (or newer) versions by installing a specific version of the separate package.

We need to ensure that the codes are fully backwards compatible, therefore the need to use an older version of the table should not arise. If a code gets retired in the underlying coding schemes, I would suggest exposing the new version (rather than the retired version), but making sure that pydicom.sr.coding.Code.__eq__ takes the retired code into account when establishing equality.

@darcymason
Copy link
Member

@hackermd, I started looking at this again - the original point about the in operator is easily solved (just trap the error and return True), but is it enough to fix just that problem? I imagine the error will just be shifted to when one tries to use a value. Is it okay to just return the first match to the code found, or is there some other better choice?

@scaramallion
Copy link
Member

scaramallion commented Jun 27, 2021

The problem is that the pydicom attribute name matches multiple codes and may do so (and in fact actually does) for codes in the same CID and same scheme. I would suggest that we add explicit custom attribute names for those duplicates, possibly by appending the code to the end of the attribute?

A list of duplicated attributes for the recent DICOM version:

ERROR:sr.process:'RightPosteriorOblique': 'Right posterior oblique' in CID 501 is duplicated!
ERROR:sr.process:'LeftPosteriorOblique': 'left posterior oblique' in CID 501 is duplicated!
ERROR:sr.process:'Axial': 'Axial' in CID 501 is duplicated!
ERROR:sr.process:'LeftAnteriorOblique': 'left anterior oblique' in CID 501 is duplicated!
ERROR:sr.process:'RightAnteriorOblique': 'right anterior oblique' in CID 501 is duplicated!
ERROR:sr.process:'Oblique': 'oblique' in CID 501 is duplicated!
ERROR:sr.process:'SagittalObliqueAxial': 'sagittal-oblique axial' in CID 501 is duplicated!
ERROR:sr.process:'LateralMedial': 'Lateral-medial' in CID 501 is duplicated!
ERROR:sr.process:'MedialLateral': 'medial-lateral' in CID 501 is duplicated!
ERROR:sr.process:'MedioLateralOblique': 'Medio-lateral oblique' in CID 501 is duplicated!
ERROR:sr.process:'LateroMedialOblique': 'Latero-medial oblique' in CID 501 is duplicated!
ERROR:sr.process:'ObliqueAxial': 'oblique axial' in CID 501 is duplicated!
ERROR:sr.process:'Lateral': 'lateral' in CID 501 is duplicated!
ERROR:sr.process:'MusculusAdductorHallucis': 'Musculus adductor hallucis' in CID 3031 is duplicated!
ERROR:sr.process:'MusculusAbductorDigitiMinimi': 'Musculus abductor digiti minimi' in CID 3031 is duplicated!
ERROR:sr.process:'MusculusAbductorDigitiMinimiLeft': 'Musculus abductor digiti minimi, left' in CID 3031 is duplicated!
ERROR:sr.process:'MusculusAbductorDigitiMinimi': 'Musculus abductor digiti minimi' in CID 3031 is duplicated!
ERROR:sr.process:'MusculusFlexorDigitiMinimiBrevisLeft': 'Musculus flexor digiti minimi brevis, left' in CID 3031 is duplicated!
ERROR:sr.process:'MusculusFlexorDigitiMinimiBrevis': 'Musculus flexor digiti minimi brevis' in CID 3031 is duplicated!
ERROR:sr.process:'MusculusFlexorDigitiMinimiBrevisRight': 'Musculus flexor digiti minimi brevis, right' in CID 3031 is duplicated!
ERROR:sr.process:'Mammographic': 'Mammographic (grid)' in CID 6058 is duplicated!
ERROR:sr.process:'Mammographic': 'Mammographic (grid)' in CID 6060 is duplicated!
ERROR:sr.process:'DescendingAorta': 'Descending aorta' in CID 7151 is duplicated!
ERROR:sr.process:'CardiacPacemaker': 'Cardiac Pacemaker' in CID 7151 is duplicated!
ERROR:sr.process:'PulmonaryVein': 'Pulmonary vein' in CID 7151 is duplicated!
ERROR:sr.process:'PulmonaryVein': 'Pulmonary vein' in CID 7192 is duplicated!
ERROR:sr.process:'DescendingAorta': 'Descending aorta' in CID 7192 is duplicated!
ERROR:sr.process:'CardiacPacemaker': 'Cardiac Pacemaker' in CID 7193 is duplicated!
ERROR:sr.process:'Nasopharynx': 'Nasopharynx' in CID 8134 is duplicated!
ERROR:sr.process:'Pelvis': 'Pelvis' in CID 8134 is duplicated!
ERROR:sr.process:'DescendingAorta': 'Descending aorta' in CID 8134 is duplicated!
ERROR:sr.process:'LumboSacralSpine': 'Lumbo-Sacral Spine' in CID 8134 is duplicated!
ERROR:sr.process:'PulmonaryVein': 'Pulmonary vein' in CID 9514 is duplicated!
ERROR:sr.process:'DescendingAorta': 'Descending aorta' in CID 9514 is duplicated!
ERROR:sr.process:'LeftVentricularInternalDiastolicDimensionBSA': 'Left ventricular internal diastolic dimension / BSA' in CID 12300 is duplicated!
ERROR:sr.process:'LeftVentricularPosteriorWallSystolicThickness': 'Left ventricular posterior wall systolic thickness' in CID 12300 is duplicated!
ERROR:sr.process:'LeftVentricularPosteriorWallDiastolicThickness': 'Left ventricular posterior wall diastolic thickness' in CID 12300 is duplicated!
ERROR:sr.process:'LeftVentricularInternalSystolicDimensionBSA': 'Left ventricular internal systolic dimension / BSA' in CID 12300 is duplicated!

I've fixed some of the duplicates by adding more rules during processing, all the above are "true" duplicates in the sense that they have the same meaning in current DICOM.

@darcymason
Copy link
Member

Sorry all, I'd forgotten about cleaning this up before the 2.2 release (#1259).

I would suggest that we add explicit custom attribute names for those duplicates, possibly by appending the code to the end of the attribute?

Do we have agreement on this? If the user needs to know the code to add to the name, then perhaps they can just use that directly instead.

As always, I'm no expert in SR use, so appreciate comments from those that are, and how it best makes sense to solve this. As I noted before, the in operator is easily handled - just trap the duplication error and return True, but then the user presumably wants to actually access a value after confirming it (or they) exist(s).

@hackermd
Copy link
Contributor Author

I would suggest that we add explicit custom attribute names for those duplicates, possibly by appending the code to the end of the attribute?

I would not do that. The problem with context groups such as CID 501 is that these groups include multiple other groups and some codes may occur in multiple included groups.

We should now be able to handle these duplicates relatively easily using the __hash__ method, which we recently introduced. We could just collect codes in a set, which would automatically avoid any duplication.

We already have duplicate code entries in the code dicts, which are semantically equivalent (i.e., have the same Code Value and Coding Scheme Designator and __eq__ would return True), but have a different Code Meaning and thus result in separate dict keys (attributes).
For example, there is codes.SCT.Rectum (Code(value='34402009', scheme_designator='SCT', meaning='Rectum', schem_version=None) and codes.SCT.RectumStructure (Code(value='34402009', scheme_designator='SCT', meaning='Rectum structure (body structure)', scheme_version=None)), which are semantically equivalent (codes.SCT.Rectum == codes.SCT.RectumStructure). Several similar entries exist.
There is no harm in keeping those, since they are not actual duplicates but different pointers to the same code.

@scaramallion
Copy link
Member

I would not do that.

Yeah agreed, apparently I'd forgotten what I'd said earlier in the same comment...

@darcymason
Copy link
Member

We should now be able to handle these duplicates relatively easily using the __hash__ method, which we recently introduced. We could just collect codes in a set, which would automatically avoid any duplication.

Right, the __hash__ method - I did see that but already forgot about the connection to this. Checking by set does seem like a great solution.

@darcymason
Copy link
Member

So I started looking into this, and in current master, the first post code lines above give a different error:

>>> from pydicom.sr.codedict import codes
>>> codes.SCT.Lung in codes.cid4
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\git\pydicom\pydicom\sr\codedict.py", line 196, in __contains__
    return any([concept == code for concept in self.concepts.values()])
  File "c:\git\pydicom\pydicom\sr\codedict.py", line 87, in __getattr__
    raise AttributeError(
AttributeError: 'concepts' not found in CID 4

I assume this relates to something done in the last few pulls (original error replicated when I checkout v2.1.2 tag). Does anyone recognize where this is coming from? ... if not, I'll dig deeper.

@darcymason
Copy link
Member

I've looked into this a bit - as shown, the error is in __getattr__, when it is trying to look up the name 'concepts', which is actually a property of the class, and so should not get to __getattr__ (which is only supposed to be triggered after normal class values etc. have been checked).

It appears this is happening because in building self._concepts in the property, an AttributeError is raised (the multiple matches problem) which Python absorbs and calls __getattr__. I've tested this with a toy class and that is what Python does - so in a class with both properties and __getattr__, one probably shouldn't allow an AttributeError exception in a property.

But I'm not sure yet why this is happening only in recent code in pydicom.

I think I'll leave this for now unless someone knows where the new behavior came in, and work on the v2.2 pre-release (#1259) at a point before this was introduced.

@hackermd
Copy link
Contributor Author

Thanks @darcymason for looking into the issue.

@darcymason
Copy link
Member

Looked into again while figuring out what commit to use for v2.2 release - this was introduced in 1546700. Simple, really, the name of the exception thrown was changed from AssertionError to AttributeError.

So, nothing too mysterious, decided to go to latest commit for release. We can fix this issue after the release.

@scaramallion
Copy link
Member

Something else to do: Tables J-1 and -2 need to be included and retired codes handled properly

@mrbean-bremen mrbean-bremen added this to the 2.4 milestone Oct 8, 2022
@mrbean-bremen
Copy link
Member

Adding 2.4 milestone to consider this for the release - this has been somewhat forgotten, it seems...

@darcymason darcymason modified the milestones: 2.4, v3.0 Jun 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants