Internationalization support #146

Open
malaterre opened this issue May 11, 2020 · 12 comments

@malaterre
Contributor

malaterre commented May 11, 2020

I cannot find anything about internationalization support in either the documentation or the code itself. What is the status of internationalization support in dicomParser?

I have not checked, but I suspect some strings won't play well with JSON, which is limited to UTF-8/UTF-16/UTF-32 strings.

Ref:

@malaterre
Contributor Author

malaterre commented May 11, 2020

Here is what I get using the DICOM Dump to JSON live example:

[image]

while it should look like:

[image]

ref:

@malaterre
Contributor Author

malaterre commented May 11, 2020

readFixedString does not seem to check the value of SpecificCharacterSet (0008,0005), as seen at:

    for (var i = 0; i < length; i++) {
      byte = byteArray[position + i];
      if (byte === 0) {
        position += length;
        return result;
      }
      result += String.fromCharCode(byte);
    }

refs:

@yagni
Collaborator

yagni commented May 11, 2020

@malaterre Currently dicomParser itself doesn't do any character set decoding. However, if you need it now, you can pair dicomParser with the dicom-character-set library that I wrote.

@malaterre
Contributor Author

malaterre commented May 11, 2020

@yagni That looks pretty promising. There is one thing I still fail to understand, though. The original string function from dicomParser seems to do the following:

  1. Take each raw byte
  2. Treat it as an ISO-8859-1 character and turn it into UTF-16
  3. Return a truncated string (stop at the first byte === 0)

So I am wondering what the expected input to your library is. Can I directly pass it the output of the string element function?
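
To make the three steps above concrete, here is a minimal sketch of that behaviour. The function name `naiveFixedString` is hypothetical and is not part of dicomParser; it only mirrors the loop quoted earlier in this thread. It suggests why multi-byte encoded text comes out garbled: every byte becomes its own UTF-16 code unit, and anything after a zero byte is dropped.

```javascript
// Hypothetical helper mirroring the loop quoted earlier in this thread;
// it is not part of dicomParser's API.
function naiveFixedString(byteArray, position, length) {
  let result = '';
  for (let i = 0; i < length; i++) {
    const byte = byteArray[position + i];
    if (byte === 0) {
      return result; // stop at the first zero byte, like a C string
    }
    // Each byte is mapped to the code point with the same number,
    // which is equivalent to decoding the data as ISO-8859-1.
    result += String.fromCharCode(byte);
  }
  return result;
}

// "J", then U+00E9 encoded as UTF-8 (0xC3 0xA9), then a zero byte, then "A".
const utf8Bytes = new Uint8Array([0x4a, 0xc3, 0xa9, 0x00, 0x41]);
const naive = naiveFixedString(utf8Bytes, 0, utf8Bytes.length);

console.log(naive === 'J\u00c3\u00a9'); // true: U+00C3 U+00A9 instead of U+00E9
console.log(naive.length);              // 3: the "A" after the zero byte is lost
```

Because the original bytes cannot always be recovered from such a string, a character-set-aware decoder needs the raw bytes rather than this output.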

@chafey
Collaborator

chafey commented May 11, 2020

I didn't have any experience with DICOM character sets, so I didn't factor them into the original design. I like the idea of putting it in a separate library like @yagni did so it can be added in for those that need it. I specifically didn't add image decompression to this library for the same reason.

@malaterre
Contributor Author

@chafey I am pretty sure that this test is just wrong:

if (byte === 0) {

This feels like a C-string (NUL-terminated ASCII) check. I am sure we can have byte === 0 in Unicode (we should rely only on the length).

@yagni
Collaborator

yagni commented May 11, 2020

@malaterre You'll need to pass the raw bytes to dicom-character-set. If I remember correctly, fromCharCode converts each byte to a UTF-16 code unit, so you end up with bytes that are not in the original data. So just slice the byteArray starting at the element's dataOffset and going for its length number of bytes, then pass that into dicom-character-set, along with the Specific Character Set and optional VR (see the readme for more details).

@malaterre
Contributor Author

@yagni Thanks for the confirmation. @chafey it would be nice to document what string is actually doing. I hope the next version will offer a rawString function; that would be clearer (IMHO).

@chafey
Collaborator

chafey commented May 11, 2020

It probably makes sense to revisit the whole repo in light of non-ASCII character sets. Lots of code is using this library now, and we should not be propagating designs which are not character set aware.

@creemer

creemer commented Nov 28, 2022

Hello!
Is there any new information about this feature?
Or maybe someone can help me understand how to get the raw data from a tag, so that I can parse it myself?
For example, how can I get the binary data from "x00100010"?
[Screenshot 2022-11-28 at 12 34 09]

Thanks a lot!

@yagni
Collaborator

yagni commented Nov 28, 2022

@creemer To get the raw data, create a Uint8Array at the data offset of the element (like we do in the readme) of the appropriate length:

const patientNameElement = dataSet.elements.x00100010;
const patientNameBytes = new Uint8Array(dataSet.byteArray.buffer, patientNameElement.dataOffset, patientNameElement.length);

If you don't want to parse those bytes yourself at this point, you can pass them, along with the value of the Specific Character Set element, to my dicom-character-set library:

import { convertBytes } from 'dicom-character-set';
const str = convertBytes(dataSet.string('x00080005'), patientNameBytes, {vr: 'PN'});
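
If you do want to decode the bytes yourself, the sketch below covers only the simplest case, where Specific Character Set is ISO_IR 192 (UTF-8). `rawValueBytes` and `decodeUtf8Value` are hypothetical helper names, not dicomParser or dicom-character-set functions, and the `dataSet` object is a hand-built stand-in for a parsed data set. The sketch also adds `byteArray.byteOffset`, on the assumption that the byte array may be a view that does not start at the beginning of its underlying buffer.

```javascript
// Hypothetical helper: returns a view over an element's raw value bytes.
function rawValueBytes(dataSet, tag) {
  const element = dataSet.elements[tag];
  const bytes = dataSet.byteArray;
  // Add byteOffset in case byteArray is itself a view into a larger buffer.
  return new Uint8Array(bytes.buffer, bytes.byteOffset + element.dataOffset, element.length);
}

// Hypothetical helper: only valid when (0008,0005) is ISO_IR 192 (UTF-8).
function decodeUtf8Value(bytes) {
  // DICOM pads odd-length text values with a trailing space (NUL for UI),
  // so strip that padding after decoding.
  return new TextDecoder('utf-8').decode(bytes).replace(/[ \0]+$/, '');
}

// Stand-in data: "Müller^Jörg " (with one padding space) encoded as UTF-8,
// placed 4 bytes into a backing buffer.
const encoded = new TextEncoder().encode('M\u00fcller^J\u00f6rg ');
const backing = new Uint8Array(4 + encoded.length);
backing.set(encoded, 4);

// byteArray is a view starting 2 bytes into the buffer, so the element's
// dataOffset relative to that view is 2.
const dataSet = {
  byteArray: new Uint8Array(backing.buffer, 2),
  elements: { x00100010: { dataOffset: 2, length: encoded.length } },
};

const patientName = decodeUtf8Value(rawValueBytes(dataSet, 'x00100010'));
console.log(patientName === 'M\u00fcller^J\u00f6rg'); // true
```

Other character sets, and values that switch encodings with escape sequences, need more than `TextDecoder`, which is where a dedicated library is the safer choice.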

@creemer

creemer commented Nov 29, 2022

@yagni Thanks a lot! It is all I need :)
