Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DicomValidation for VR PN #1785

Closed
alpha-yanagida opened this issue May 9, 2024 · 17 comments
Closed

DicomValidation for VR PN #1785

alpha-yanagida opened this issue May 9, 2024 · 17 comments

Comments

@alpha-yanagida
Copy link

alpha-yanagida commented May 9, 2024

Overview

Hello.

Please be aware that I am new to DICOM.

I am running into a problem with an application that receives a MWM worklist retrieval request (C-FIND) from a modality and returns a dataset with the requested tag.

(0010,0010) VR=PN VM=1 In Patient's Name, the byte data encoded with ISO 2022 IR87 for "莉" (JIS X 0208 2nd level Kanji), which is part of the name, contains "=", so when validating the dataset DicomValidation causes an error.

VR=PN allows up to two component delimiters "=", but because the byte data of the name contains "=", it reads as three "=", which causes an error if there are too many components in the VR=PN validation.

Currently, we are temporarily coping by removing DicomValidation and are able to communicate with the modality for now.

I referred to DICOM PS3.5 6.2, which is not a violation of the DICOM standard, so it seems normal to be able to send data without error with DicomValidation.

I don't want to remove DicomValidation to avoid other problems, but do you have any suggestions for a fundamental solution?

Expected behavior

DicomValidationException is not thrown.

Actual behavior

The method throws an DicomValidationException.

if (groups.Length > 3)

Actual error content

Individual names will be withheld.

例外がスローされました: 'FellowOakDicom.DicomValidationException' (fo-dicom.core.dll の中)
例外がスローされました: 'FellowOakDicom.DicomValidationException' (System.Private.CoreLib.dll の中)
例外がスローされました: 'FellowOakDicom.DicomValidationException' (System.Private.CoreLib.dll の中)
例外がスローされました: 'FellowOakDicom.DicomValidationException' (System.Private.CoreLib.dll の中)
| 18:13:14.020 | EROR | 07:__ | 26728:MyDicomApp | Exception processing P-Data-TF PDU | FellowOakDicom.Network |  |  L | MyDicomApp | 1.1.1.0 | PC | PC\user | 32270544 B|
FellowOakDicom.DicomValidationException: Content "****************=***********�$Bh=**�(B=***************************" does not validate VR PN: value contains too many groups
   at FellowOakDicom.DicomValidation.ValidatePN(String content)
   at System.Collections.Generic.List`1.ForEach(Action`1 action)
   at FellowOakDicom.DicomElement.Validate()
   at FellowOakDicom.DicomDataset.DoAdd(DicomItem item, Boolean allowUpdate)
   at FellowOakDicom.DicomDataset.DoAdd[T](DicomVR vr, DicomTag tag, IList`1 values, Boolean allowUpdate)
   at FellowOakDicom.DicomDataset.DoAdd[T](DicomTag tag, IList`1 values, Boolean allowUpdate)
   at FellowOakDicom.DicomDataset.AddOrUpdate[T](DicomTag tag, T[] values)
   at MyDicomApp.Services.MyDicomService.AddIfExistsInRequest[T](DicomDataset result, DicomDataset request, DicomTag tag, T value) in C:\Users\user\Documents\GitHub\MyDicomApp\MyDicomApp\Services\MyDicomService.cs:line 477
   at MyDicomApp.Services.MyDicomService.FilterWorklistItems(DicomDataset request, List`1 worklistItems)+MoveNext() in C:\Users\user\Documents\GitHub\MyDicomApp\MyDicomApp\Services\MyDicomService.cs:line 358
   at MyDicomApp.Services.MyDicomService.OnCFindRequestAsync(DicomCFindRequest request)+MoveNext() in C:\Users\user\Documents\GitHub\MyDicomApp\MyDicomApp\Services\MyDicomService.cs:line 123
   at MyDicomApp.Services.MyDicomService.OnCFindRequestAsync(DicomCFindRequest request)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at FellowOakDicom.Network.DicomService.PerformDimseAsync(DicomMessage dimse)
   at FellowOakDicom.Network.DicomService.PerformDimseAsync(DicomMessage dimse)
   at FellowOakDicom.Network.DicomService.ProcessPDataTFAsync(PDataTF pdu)
例外がスローされました: 'FellowOakDicom.DicomValidationException' (fo-dicom.core.dll の中)
例外がスローされました: 'FellowOakDicom.DicomValidationException' (System.Private.CoreLib.dll の中)
例外がスローされました: 'FellowOakDicom.DicomValidationException' (System.Private.CoreLib.dll の中)
| 18:13:14.301 | EROR | 07:__ | 26728:MyDicomApp | Exception processing PDU | FellowOakDicom.Network |  |  L | MyDicomApp | 1.1.1.0 | PC | PC\user | 32715848 B|
FellowOakDicom.DicomValidationException: Content "****************=***********�$Bh=**�(B=***************************" does not validate VR PN: value contains too many groups
   at FellowOakDicom.DicomValidation.ValidatePN(String content)
   at System.Collections.Generic.List`1.ForEach(Action`1 action)
   at FellowOakDicom.DicomElement.Validate()
   at FellowOakDicom.DicomDataset.DoAdd(DicomItem item, Boolean allowUpdate)
   at FellowOakDicom.DicomDataset.DoAdd[T](DicomVR vr, DicomTag tag, IList`1 values, Boolean allowUpdate)
   at FellowOakDicom.DicomDataset.DoAdd[T](DicomTag tag, IList`1 values, Boolean allowUpdate)
   at FellowOakDicom.DicomDataset.AddOrUpdate[T](DicomTag tag, T[] values)
   at MyDicomApp.Services.MyDicomService.AddIfExistsInRequest[T](DicomDataset result, DicomDataset request, DicomTag tag, T value) in C:\Users\user\Documents\GitHub\MyDicomApp\MyDicomApp\Services\MyDicomService.cs:line 477
   at MyDicomApp.Services.MyDicomService.FilterWorklistItems(DicomDataset request, List`1 worklistItems)+MoveNext() in C:\Users\user\Documents\GitHub\MyDicomApp\MyDicomApp\Services\MyDicomService.cs:line 358
   at MyDicomApp.Services.MyDicomService.OnCFindRequestAsync(DicomCFindRequest request)+MoveNext() in C:\Users\user\Documents\GitHub\MyDicomApp\MyDicomApp\Services\MyDicomService.cs:line 123
   at MyDicomApp.Services.MyDicomService.OnCFindRequestAsync(DicomCFindRequest request)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at FellowOakDicom.Network.DicomService.PerformDimseAsync(DicomMessage dimse)
   at FellowOakDicom.Network.DicomService.PerformDimseAsync(DicomMessage dimse)
   at FellowOakDicom.Network.DicomService.ProcessPDataTFAsync(PDataTF pdu)
   at FellowOakDicom.Network.DicomService.ProcessPDataTFAsync(PDataTF pdu)
   at FellowOakDicom.Network.DicomService.ListenAndProcessPDUAsync()

Byte data of patient name from Wireshark.

00a0   32 20 49 52 20 38 37 20 10 00 10 00 42 00 00 00 // 10~ 
00b0   4e 49 53 48 49 4b 41 57 41 5e 52 49 53 41 54 4f
00c0   3d 1b 24 42 40 3e 40 6e 1b 28 42 5e 1b 24 42 68
00d0   3d 4e 24 1b 28 42 3d 1b 24 42 25 4b 25 37 25 2b
00e0   25 6f 1b 28 42 5e 1b 24 42 25 6a 25 35 25 48 1b
00f0   28 42

Steps to reproduce the behavior

Part of the process is shown in accordance with this issue.

// Format patientName to romaji=kanji=kana
using FellowOakDicom;
var encoding1 = DicomEncoding.GetEncoding("ISO 2022 IR 6");
var encoding = DicomEncoding.GetEncoding("ISO 2022 IR 87");
var nameByte = encoding1.GetBytes("ROMAJI^NAME=").ToList();
nameByte.AddRange(encoding2.GetBytes("KANJI^NAME")); // 莉 to h=
nameByte.AddRange(encoding2.GetBytes("KANA^NAME"));
PatientName = DicomEncoding.GetEncoding("ISO 2022 IR 6").GetString(nameByte.ToArray());

// Add patient name to dataset
DicomDataset resultDataset;
resultDataset.AddOrUpdate(DicomTag.SpecificCharacterSet, "\\ISO 2022 IR 87");
resultDataset.AddOrUpdate(DicomTag.PatientName, patientName); // MyDicomService.cs:line 477

reference data

In byte data, "莉" is 68 3D and "=" is 3D.

ASCII Table

JIS-X0208-1997 Code Table

DICOM PS3.5

fo-dicom version and OS/platform

Fellow Oak DICOM version: 5.1.1
OS: Windows 10 64bit
Platform: .NET 6

@mrbean-bremen
Copy link
Collaborator

mrbean-bremen commented May 9, 2024

Can you please add the SpecificCharacterSet value of the dataset? Also, can you show the byte values for the encoded patient name? From the ASCII output above (�$Bh=**�(B=), they are not completely apparent.

@mrbean-bremen
Copy link
Collaborator

So, you have "\ISO 2022 IR 87" as SpecificCharacterSet in the result, but I'm still not sure what is the value of the returned patient name tag (in bytes). Can you please add this?

I understand that you are showing some code that reproduces the behavior, but I don't see where that code is used (and it is incomplete). As far as I understood, you get the problem with a dataset received via a C-FIND response, correct?

@alpha-yanagida
Copy link
Author

alpha-yanagida commented May 9, 2024

I added the following

  • resultDataset.AddOrUpdate(DicomTag.SpecificCharacterSet, "\ISO 2022 IR 87");
  • Byte data of patient name from Wireshark.
  • Added stack trace to Actual error content

I understand that you are showing some code that reproduces the behavior, but I don't see where that code is used (and it is incomplete). As far as I understood, you get the problem with a dataset received via a C-FIND response, correct?

Sorry for the poor explanation, but the app has the ability to act as a MWM server (SCP) and return data such as patient name according to the C=FIND received from the modality (SCU).

The role of the application is to mediate DICOM communication between the core system and the modalities.

  1. csv file of patient information from core system falls into folder after barcode reading
  2. when the app receives C=FIND from the modality, it reads the csv file, formats it for DICOM communication, and sends it to the modality.

The problem occurred when the patient name was formatted for DICOM communication, i.e., when the patient name was added to the data set. (See also the actual error description.)

@mrbean-bremen
Copy link
Collaborator

mrbean-bremen commented May 9, 2024

I have checked how fo-dicom handles that patient name, and I did not have any problems. To test it, I also patched a real DICOM file with that patient name and encoding, and the patient name is correctly read, decoded and validated on reading the dataset.

It would help if you could show some code with that behavior. My assumption is that the patient name is somehow wrongly decoded, so that that "=" will still be in the string and be seen as a delimiter.

What is done in the code is something like this (not exactly, this is for illustration), which works without problems:

          var bytes = new byte[]  // your patient name
            {
                0x4e, 0x49, 0x53, 0x48, 0x49, 0x4b, 0x41, 0x57, 0x41, 0x5e, 0x52, 0x49, 0x53, 0x41, 0x54, 0x4f,
                0x3d, 0x1b, 0x24, 0x42, 0x40, 0x3e, 0x40, 0x6e, 0x1b, 0x28, 0x42, 0x5e, 0x1b, 0x24, 0x42, 0x68,
                0x3d, 0x4e, 0x24, 0x1b, 0x28, 0x42, 0x3d, 0x1b, 0x24, 0x42, 0x25, 0x4b, 0x25, 0x37, 0x25, 0x2b,
                0x25, 0x6f, 0x1b, 0x28, 0x42, 0x5e, 0x1b, 0x24, 0x42, 0x25, 0x6a, 0x25, 0x35, 0x25, 0x48, 0x1b,
                0x28, 0x42
            };
            var dataset = new DicomDataset();
            result.Add(DicomTag.SpecificCharacterSet, "\\ISO 2022 IR 87");

            var patientName = new DicomPersonName(DicomTag.PatientName,
                dataset.GetEncodingsForSerialization(), new MemoryByteBuffer(bytes));
            // patientName has now the correct decoded value
            dataset.Add(patientName);  // does validation

@alpha-yanagida
Copy link
Author

alpha-yanagida commented May 10, 2024

The following is an excerpt of the code.

PatientNameAll in the WorklistItem contained "=".

Individual names will be withheld.
� is control code ESC.

// in worklist item model
public string patientName { get; private set; }
public string patientNameAll => DicomEncoding.GetEncoding("ISO 2022 IR 6").GetString(patientNameAllByte); }
private byte[] patientNameAllByte { get; set; }

List<string> namesKanji = {"**", "莉*"}; // patient name read from csv
// Format patientName to ROMAJI^NAME=KANJI^NAME=KANA^NAME
var encoding1 = DicomEncoding.GetEncoding("ISO 2022 IR 6");
var encoding2 = DicomEncoding.GetEncoding("ISO 2022 IR 87");

var nameByte = encoding1.GetBytes($"{string.Join("^", namesRomaji)}=").ToList(); // romaji
for (int i = 0; i < namesKanji.Count; i++) // Kanji
{
    nameByte.AddRange(encoding2.GetBytes($"{namesKanji[i]}")); if (i ! = namesKanji.Count; i++)
    if (i ! = namesKanji.Count - 1)
    {
        nameByte.AddRange(encoding1.GetBytes("^")); if (i ! = namesKanji.Count - 1)
    }
}
// Kana is treated in the same way as Kanji

patientNameAllByte = nameByte.ToArray();
patientName = patientNameAll; // patientNameAll is *********^******=�$B@>@n�(B^�$Bh=N$�(B=�$B%K%7%+%o�(B^�$B%j%5%H�(B

// in MyDicomService file

var resultDataset = new DicomDataset();
resultDataset.AddOrUpdate(DicomTag.SpecificCharacterSet, "\\ISO 2022 IR 87");
resultDataset.AddOrUpdate(DicomTag.PatientName, patientName);

Byte data of nameByte

  | [0] | 78 | byte
  | [1] | 73 | byte
  | [2] | 83 | byte
  | [3] | 72 | byte
  | [4] | 73 | byte
  | [5] | 75 | byte
  | [6] | 65 | byte
  | [7] | 87 | byte
  | [8] | 65 | byte
  | [9] | 94 | byte
  | [10] | 82 | byte
  | [11] | 73 | byte
  | [12] | 83 | byte
  | [13] | 65 | byte
  | [14] | 84 | byte
  | [15] | 79 | byte
  | [16] | 61 | byte
  | [17] | 27 | byte
  | [18] | 36 | byte
  | [19] | 66 | byte
  | [20] | 64 | byte
  | [21] | 62 | byte
  | [22] | 64 | byte
  | [23] | 110 | byte
  | [24] | 27 | byte
  | [25] | 40 | byte
  | [26] | 66 | byte
  | [27] | 94 | byte
  | [28] | 27 | byte
  | [29] | 36 | byte
  | [30] | 66 | byte
  | [31] | 104 | byte
  | [32] | 61 | byte
  | [33] | 78 | byte
  | [34] | 36 | byte
  | [35] | 27 | byte
  | [36] | 40 | byte
  | [37] | 66 | byte
  | [38] | 61 | byte
  | [39] | 27 | byte
  | [40] | 36 | byte
  | [41] | 66 | byte
  | [42] | 37 | byte
  | [43] | 75 | byte
  | [44] | 37 | byte
  | [45] | 55 | byte
  | [46] | 37 | byte
  | [47] | 43 | byte
  | [48] | 37 | byte
  | [49] | 111 | byte
  | [50] | 27 | byte
  | [51] | 40 | byte
  | [52] | 66 | byte
  | [53] | 94 | byte
  | [54] | 27 | byte
  | [55] | 36 | byte
  | [56] | 66 | byte
  | [57] | 37 | byte
  | [58] | 106 | byte
  | [59] | 37 | byte
  | [60] | 53 | byte
  | [61] | 37 | byte
  | [62] | 72 | byte
  | [63] | 27 | byte
  | [64] | 40 | byte
  | [65] | 66 | byte

@mrbean-bremen
Copy link
Collaborator

I'm not sure yet if this behavior is expected or a bug (I may have another look some time later), but you could avoid this problem by using the code I showed above, e.g.:

var dataset = new DicomDataset();
result.Add(DicomTag.SpecificCharacterSet, "\\ISO 2022 IR 87");

// assuming patientNameAllByte is of type byte[] and has the correct value
var patientName = new DicomPersonName(DicomTag.PatientName,
     dataset.GetEncodingsForSerialization(), new MemoryByteBuffer(patientNameAllByte));
dataset.Add(patientName);

@alpha-yanagida
Copy link
Author

Sorry for the late reply.

I tried the above code and was able to work around the problem.

Thanks for your time!

@alpha-yanagida
Copy link
Author

I'm not sure yet if this behavior is expected or a bug (I may have another look some time later), but you could avoid this problem by using the code I showed above, e.g.:

I will leave this issue as it may be investigated at a later date from the above.
If you do not need this issue, you can close it. Or I will close it.

@chen-wu
Copy link

chen-wu commented May 17, 2024

It seems the "dataset.Add" function cannot accept CJK string. If you are using some CJK characters in the string, it may not be encoded correctly.

    var raw = new byte[] {
        0x59, 0x61, 0x6d, 0x61, 0x64, 0x61, 0x5e, 0x54, 0x61, 0x72, 0x6f, 0x75, 0x3d, 0x1b, 0x24, 0x42, 0x3b, 0x33,
        0x45,
        0x44, 0x1b, 0x28, 0x42, 0x5e, 0x1b, 0x24, 0x42, 0x42, 0x40, 0x4f, 0x3a, 0x1b, 0x28, 0x42, 0x3d, 0x1b, 0x24,
        0x42,
        0x24, 0x64, 0x24, 0x5e, 0x24, 0x40, 0x1b, 0x28, 0x42, 0x5e, 0x1b, 0x24, 0x42, 0x24, 0x3f, 0x24, 0x6d, 0x24,
        0x26,
        0x1b, 0x28, 0x42
    }; // https://dicom.nema.org/medical/dicom/current/output/html/part05.html#table_H.3-1

    var ds1 = new DicomDataset();
    ds1.Add(DicomTag.SpecificCharacterSet, "\\ISO 2022 IR 87");
    ds1.Add(DicomTag.PatientName, "Yamada^Tarou=山田^太郎=やまだ^たろう");
    
    var ds2 = new DicomDataset();
    ds2.Add(DicomTag.SpecificCharacterSet, "\\ISO 2022 IR 87");
    var pn = new DicomPersonName(DicomTag.PatientName,
        DicomEncoding.GetEncodings(ds2.GetValues<string>(DicomTag.SpecificCharacterSet)),
        new MemoryByteBuffer(raw));
    ds2.Add(pn);
    
    var out1 = ds1.GetString(DicomTag.PatientName); // "Yamada^Tarou=山田^太郎=やまだ^たろう"
    var out2 = ds2.GetString(DicomTag.PatientName); // "Yamada^Tarou=山田^太郎=やまだ^たろう"
    
    var outbuf1 = ds1.GetValues<byte>(DicomTag.PatientName); // length of outbuf1 is 26
    var outbuf2 = ds2.GetValues<byte>(DicomTag.PatientName); // length of outbuf2 is 60

The ds2 is a correct way to add such DicomItem. However, the GetEncodingsForSerialization() seems to be an internal method. So, I replaced it to GetEncodings.
26 is exactly the length of "Yamada^Tarou=山田^太郎=やまだ^たろう" string. But it should be encoded as a 60 bytes array.

@mrbean-bremen
Copy link
Collaborator

@chen-wu - yes, this is exactly the described behavior, including the workaround, and the reason why this issue is still open.
I suspect that the problem has to do with the multi-valued encoding (e.g. only the first encoding is used, which is ASCII in this case).
And yes, I used an internal instead of the public method for the workaround, good catch.

@mrbean-bremen
Copy link
Collaborator

I just confirmed that saving with a single-value encoding works well, it only does not work with a multi-valued encoding.
Note that the problem only occurs with newly added data, that cannot be represented by the first encoding.

I just had forgotten that this is not implemented (and never has been), as I wrote a comment to that extent myself in the code in a PR I made some time ago...

@gofal
Copy link
Contributor

gofal commented May 18, 2024

patientNameAllByte = nameByte.ToArray();
patientName = patientNameAll; // patientNameAll is ***^=�$B@>@n�(B^�$Bh=N$�(B=�$B%K%7%+%o�(B^�$B%j%5%H�(B

// in MyDicomService file

var resultDataset = new DicomDataset();
resultDataset.AddOrUpdate(DicomTag.SpecificCharacterSet, "\ISO 2022 IR 87");
resultDataset.AddOrUpdate(DicomTag.PatientName, patientName);

The problem, that I see in your example is, that you have various strings that should be binary serialized using different encodings.
Then you use the various encodings to generate an array of bytes. But then you join them together and instead of passing this already serialized byte array to the DicomDataset, you convert it back into a .net string into the variable patientName. This might cause the error, because when the byte-array is converted to a string, then .net takes one encoding and decodes it into a string. And after passing this string to DicomDataset, then this string is again binary serialized using one encoding.

This is why you did not face any issue as soon as you pass the bytearray directly into the DicomDataset

@mrbean-bremen
Copy link
Collaborator

@gofal - that is true, but maybe the example by @chen-wu shows the problem better. If using a single value encoding that can encode the name (like GB18030 or UTF-8), this works without a problem.
The problem is that fo-dicom always uses the first encoding to encode the string, and ignores the rest - thus the comment in the code. This is something that I should have implemented at the time but didn't, probably because I wasn't sure how to do it.
If the whole string can be encoded using one of the encodings, this would be relatively easy - we could just set the fallback encoding to EncoderFallback.ExceptionFallback while encoding, and handling the exception (which will appear rarely, so this will probably not be a performance problem) by trying the other encodings. For PN though special handling is needed, as each component has to be encoded separately.
The code where the encoding should be changed inside the string is more complicated. I'm not sure how to do this with Encoding, but then I'm not sure if this a realistic use case at all.

@gofal
Copy link
Contributor

gofal commented May 18, 2024

The code where the binary data is added directly seems to solve the issue that no validation error is raised. But does it also work on the modality side? Is the modality able, to render the PatientName correctly?
https://dicom.nema.org/dicom/2013/output/chtml/part05/chapter_6.html Dicom defines that it is not sufficient, just to concanate the various binary streams of the various encodings, but it also has to contain the separator 0x1b followed by 2 bytes that tell the parser which codec to use. This ESC-separtor followed by the two bytes referencin the codec ISO 2022 IR 87 seems to be missing.

fo-dicom is able to decode text that is encoded with mutliple encodings. But I also highly recommend taht if someone creates new data to use an encoding that contains all used characters, e.g. UTF-8. I also cannot imagine a usecase, where this is not possible.

If we wanted to support encoding a text with various encodings, then this would require some API like StringBuilder, where you can add one text after the other and with each text also pass the encoding to use.
And those values should then be excluded from transcoding in case the SpecificCharacterSet of the DicomDataset changes.
Not a trivial task.

@mrbean-bremen
Copy link
Collaborator

Yes, I agree. I had a closer look, and there is a reason why I didn't implement this last time - it can get quite complicated.
It should be possible to implement this without manually adding the encoding (I did this in pydicom), but there are ways to workaround the problem, and the code to do this would be quite complicated, and also possibly slower.

So while I'm not really happy with the current state (especially because I did some implementation on the decoding side and should have also done the encoding), I think we can live with this for now.

@gofal
Copy link
Contributor

gofal commented May 19, 2024

Thank you! I could not find a issue about that encoding. Could you please open a new issue with label "Enhancement" about the feature of encodig a text with codeextensions? Then we could close this issue, but we have the feature in our backlog for future versions,

@mrbean-bremen
Copy link
Collaborator

Closing this issue after creating the spin-off, as discussed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants