Internationalization support #146

malaterre · 2020-05-11T09:07:42Z

I cannot find anything related to internationalization support, either in the documentation or in the code itself. What is the status of internationalization support in dicomParser?

I have not checked, but I suspect some strings won't play well with JSON, which is limited to UTF-8/UTF-16/UTF-32 strings.

Ref:

Table D.6.2-1. Supported Specific Character Set Defined Terms

malaterre · 2020-05-11T09:25:27Z

Here is what I get using the DICOM Dump to JSON live example:

while it should look like:

ref:

Internationalized character set test DICOM images

malaterre · 2020-05-11T14:00:45Z

readFixedString does not seem to check the value of SpecificCharacterSet (0008,0005), as seen at:

dicomParser/src/byteArrayParser.js

Lines 29 to 37 in 82573d9

    
           for (var i = 0; i < length; i++) { 
        
             byte = byteArray[position + i]; 
        
             if (byte === 0) { 
        
               position += length; 
        
               return result; 
        
             } 
        
             result += String.fromCharCode(byte); 
        
           }
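To illustrate why the per-byte conversion matters, here is a minimal sketch (not dicomParser code) comparing it with a character-set-aware decode. It assumes the value is UTF-8 encoded, which is what the Defined Term `ISO_IR 192` denotes, and uses the standard `TextDecoder` available in browsers and recent Node.js versions. The variable names are purely illustrative.

```javascript
// UTF-8 encoding of "é" (U+00E9) is the two bytes 0xC3 0xA9.
const utf8Bytes = new Uint8Array([0xc3, 0xa9]);

// Per-byte conversion: every byte becomes its own UTF-16 code unit,
// so the two bytes turn into two unrelated characters ("Ã" and "©").
const perByte = String.fromCharCode(...utf8Bytes);

// Character-set-aware conversion: the two bytes are decoded as one character.
const decoded = new TextDecoder('utf-8').decode(utf8Bytes);

console.log(perByte.length, decoded.length);
```

The per-byte result has two characters while the decoded result has one, which is the kind of mismatch visible in the JSON dump.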

refs:

yagni · 2020-05-11T14:17:17Z

@malaterre Currently dicomParser itself doesn't do any character set decoding. However, if you need it now, you can pair dicomParser with the dicom-character-set library that I wrote.

malaterre · 2020-05-11T14:21:04Z

@yagni That looks pretty promising. One thing I still fail to understand. The original string function from dicomParser seems to be doing the following:

1. Take a raw byte.
2. Consider it as an ISO-8859-1 character and turn it into UTF-16.
3. Return a truncated string (stop at the first byte === 0).

So I am wondering what the expected input to your library is. Can I directly pass the output of the string element function?
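The three steps above can be sketched as follows. This is a hypothetical re-statement for discussion (the function name `readLatin1UntilNull` is made up), not the library's actual implementation. It shows that the result holds one code unit per input byte and that anything after a zero byte is dropped.

```javascript
// Hypothetical helper mirroring the three steps described above.
function readLatin1UntilNull(byteArray, position, length) {
  let result = '';
  for (let i = 0; i < length; i++) {
    const byte = byteArray[position + i]; // step 1: take the raw byte
    if (byte === 0) {
      break; // step 3: truncate at the first zero byte
    }
    result += String.fromCharCode(byte); // step 2: byte value used as a code unit
  }
  return result;
}

// Four bytes in, but only the two before the zero byte survive.
const sample = new Uint8Array([0xc5, 0xc7, 0x00, 0x41]);
const text = readLatin1UntilNull(sample, 0, sample.length);
console.log(text.length);
```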

chafey · 2020-05-11T14:23:09Z

I didn't have any experience with DICOM character sets so didn't factor it into the original design. I like the idea of putting it in a separate library like @yagni did so it can be added in for those that need it. I specifically didn't add image decompression to this library for the same reason.

malaterre · 2020-05-11T14:25:36Z

@chafey I am pretty sure that this test is just wrong:

if (byte === 0) {

This feels like a C-string ASCII terminator. I am sure we can have byte === 0 in Unicode (we should rely only on the length).
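A length-only read could look like the sketch below. It is an assumption-laden illustration, not a proposed patch: the helper name `sliceValueBytes` is hypothetical, and it assumes that trailing 0x20 (space) and 0x00 (NUL) bytes are padding that can be dropped, which may not be appropriate for every VR. It returns raw bytes and leaves decoding to the caller.

```javascript
// Hypothetical helper: uses only the declared length to delimit the value,
// then drops trailing padding bytes (assumed to be 0x20 or 0x00).
function sliceValueBytes(byteArray, position, length) {
  let end = position + length;
  while (end > position &&
         (byteArray[end - 1] === 0x20 || byteArray[end - 1] === 0x00)) {
    end--;
  }
  // subarray returns a view onto the same memory, no copy is made.
  return byteArray.subarray(position, end);
}

// A zero byte in the middle of the value is preserved; only the trailing space is removed.
const padded = new Uint8Array([0x41, 0x00, 0x42, 0x20]);
const valueBytes = sliceValueBytes(padded, 0, padded.length);
console.log(Array.from(valueBytes));
```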

yagni · 2020-05-11T14:26:47Z

@malaterre You'll need to pass the raw bytes to dicom-character-set. If I remember correctly, fromCharCode converts it to UTF so you end up with bytes not in the original data. So just slice the byteArray starting at the element's dataOffset and going for its length number of bytes, then pass that into dicom-character-set, along with the Specific Character Set and optional VR (see the readme for more details).

malaterre · 2020-05-11T14:27:50Z

@yagni Thanks for the confirmation. @chafey it would be nice to document what string is actually doing. I hope the next version will offer a rawString function; that would be clearer (IMHO).

chafey · 2020-05-11T14:38:14Z

It probably makes sense to revisit the whole repo in light of non-ASCII character sets. Lots of code is using this library now, and we should not be propagating designs which are not character set aware.

creemer · 2022-11-28T09:35:06Z

Hello!
Is there any news about this feature?
Or maybe someone can help me understand how to get the raw data from a tag, so I can parse it myself?
For example, how can I get the binary data from "x00100010"?

Thanks a lot!

yagni · 2022-11-28T23:22:01Z

@creemer To get the raw data, create a Uint8Array at the data offset of the element (like we do in the readme) of the appropriate length:

const patientNameElement = dataSet.elements.x00100010;
const patientNameBytes = new Uint8Array(dataSet.byteArray.buffer, patientNameElement.dataOffset, patientNameElement.length);

If you don't want to parse those bytes yourself at this point, you can pass them, along with the value of the Specific Character Set element, to my dicom-character-set library:

import { convertBytes } from 'dicom-character-set';
const str = convertBytes(dataSet.string('x00080005'), patientNameBytes, {vr: 'PN'});
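One caveat about the `new Uint8Array(buffer, offset, length)` form above: it measures the offset from the start of the underlying `ArrayBuffer`, so it assumes `dataSet.byteArray` itself starts at byte 0 of that buffer. If the byte array might be a view into a larger buffer, `subarray` takes the view's own offset into account. The sketch below uses a hypothetical helper `elementBytes` and a hand-built stand-in object with the same `byteArray` / `elements` shape as in the snippet above. It does not call dicomParser.

```javascript
// Hypothetical helper: returns a view of an element's raw value bytes,
// or undefined when the tag is absent.
function elementBytes(dataSet, tag) {
  const element = dataSet.elements[tag];
  if (!element) {
    return undefined;
  }
  // subarray offsets are relative to byteArray, not to byteArray.buffer.
  return dataSet.byteArray.subarray(
    element.dataOffset,
    element.dataOffset + element.length
  );
}

// Stand-in data set whose byteArray starts 2 bytes into its buffer.
const backing = new Uint8Array([0, 1, 2, 3, 4, 5, 6, 7]);
const fakeDataSet = {
  byteArray: new Uint8Array(backing.buffer, 2, 6), // holds 2, 3, 4, 5, 6, 7
  elements: { x00100010: { dataOffset: 1, length: 2 } },
};

const viaSubarray = elementBytes(fakeDataSet, 'x00100010');
const viaBuffer = new Uint8Array(fakeDataSet.byteArray.buffer, 1, 2);
console.log(Array.from(viaSubarray), Array.from(viaBuffer));
```

In this stand-in, `viaSubarray` holds the intended bytes while `viaBuffer` is shifted by the view's offset. When the byte array covers its whole buffer, both forms give the same result.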

creemer · 2022-11-29T09:21:49Z

@yagni Thanks a lot! It is all I need :)
