Struct.UTFString.get() fails for UTF-16 #30

blschatz · 2014-11-19T00:09:01Z

This fails because of the underlying call to IO.getZeroTerminatedByteArray, which should look for a double null rather than a single null when the Charset is a wide one such as UTF-16.

headius · 2015-04-23T19:17:45Z

This should probably be using Java's charset logic to decode. Will investigate.

headius · 2015-04-23T19:57:07Z

Ahh I see, it's just looking for the nulls to peel them off. Will see what I can do.

headius · 2015-04-23T21:00:26Z

Ok, I understand now.

getZeroTerminatedByteArray is used to return the bytes of a string sans the null terminator. It does this by taking the given string address and calling strlen on it. strlen only looks for \0, and then that length is used to allocate and populate a Java byte array.

This is a problem whenever there are embedded null bytes, which is always the case for UTF-16 text in the ASCII range, where every other byte is zero.
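To make the truncation concrete, here is a minimal sketch in plain Java. It does not call jnr-ffi or jffi; `strlenLike` and `utf16leWithTerminator` are hypothetical helpers that only imitate what a single-byte scan does to UTF-16LE bytes sitting in native memory.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class StrlenTruncationDemo {
    // Imitates C strlen: counts bytes up to the first single 0x00 byte.
    static int strlenLike(byte[] memory) {
        int i = 0;
        while (i < memory.length && memory[i] != 0) {
            i++;
        }
        return i;
    }

    // Builds the bytes a native UTF-16LE string would occupy:
    // the encoded text followed by a two-byte null terminator.
    static byte[] utf16leWithTerminator(String text) {
        byte[] encoded = text.getBytes(StandardCharsets.UTF_16LE);
        // copyOf pads the extra two bytes with zeros
        return Arrays.copyOf(encoded, encoded.length + 2);
    }

    public static void main(String[] args) {
        byte[] memory = utf16leWithTerminator("Hi"); // 48 00 69 00 00 00
        System.out.println("bytes in memory: " + memory.length);
        System.out.println("strlen-style length: " + strlenLike(memory));
    }
}
```

For "Hi" the scan stops at the high byte of 'H', so only one of the four payload bytes would be copied into the Java byte array.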

This is going to be a much more difficult fix, since the actual strlen call happens inside native code. Whenever we change native code, we need to rebuild the native stubs across platforms.

I'm also not sure that just changing strlen is the right fix. These functions have no way of knowing what encoding the bytes are in.

Here's what I think we should do:

1. As a workaround, you could work with the strings as bytes and deal with the nulls yourself. Not ideal, I know.
2. Add a second version of this logic that takes either an encoding or an explicit terminator to look for, along the lines of getTerminatedByteArray(addr, [terminator|encoding]).
3. Finally figure out how to set up VMs for all the platforms we support, so we can more easily update the native bits (ping @tduehr).
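The second proposal could look roughly like the sketch below. It is plain Java over a byte array rather than native memory, and every name in it (`TerminatedBytes`, `indexOfTerminator`, `decodeTerminated`, the `terminatorWidth` parameter) is an assumption made for illustration, not an existing jnr-ffi or jffi API. The caller states how wide the terminator is, and the scan only checks positions aligned to that width.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TerminatedBytes {
    // Hypothetical helper: offset of the first run of terminatorWidth zero
    // bytes that starts on a multiple of terminatorWidth, or memory.length
    // if no such run exists.
    static int indexOfTerminator(byte[] memory, int terminatorWidth) {
        if (terminatorWidth < 1) {
            throw new IllegalArgumentException("terminatorWidth must be >= 1");
        }
        for (int i = 0; i + terminatorWidth <= memory.length; i += terminatorWidth) {
            boolean allZero = true;
            for (int j = 0; j < terminatorWidth; j++) {
                if (memory[i + j] != 0) {
                    allZero = false;
                    break;
                }
            }
            if (allZero) {
                return i;
            }
        }
        return memory.length;
    }

    // Decodes everything before the terminator using the given charset.
    static String decodeTerminated(byte[] memory, int terminatorWidth, Charset cs) {
        int end = indexOfTerminator(memory, terminatorWidth);
        return cs.decode(ByteBuffer.wrap(memory, 0, end)).toString();
    }

    public static void main(String[] args) {
        // "Hi" in UTF-16LE, a two-byte terminator, then unrelated trailing bytes
        byte[] wide = { 0x48, 0, 0x69, 0, 0, 0, 0x7A, 0 };
        System.out.println(decodeTerminated(wide, 2, StandardCharsets.UTF_16LE));
    }
}
```

Stepping by the terminator width matters: a code unit such as U+0100 encodes as 00 01 in UTF-16LE, and an unaligned scan could pair its zero byte with a neighbouring zero and stop too early. Passing the width explicitly also sidesteps the question of deriving it from a Charset, which is awkward for charsets that emit a byte-order mark.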

blschatz · 2015-04-27T06:16:36Z

My fix was as follows:

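// Declared inside a jnr.ffi.Struct subclass, so "String" here is Struct.String.
// Needs java.nio.ByteBuffer, java.nio.CharBuffer and java.nio.charset.Charset.
public class UTF16String extends String {

    // length is the size of the field in bytes (see get() below)
    public UTF16String(int length, Charset cs) {
        super(length * 8, 8, length, cs);
    }

    protected jnr.ffi.Pointer getStringMemory() {
        return getMemory().slice(offset(), length());
    }

    public final void set(java.lang.String value) {
        getStringMemory().putString(0, value, length, charset);
    }

    public final java.lang.String get() {
        jnr.ffi.Pointer memory = getStringMemory();
        byte[] bytes = new byte[length];
        memory.get(0, bytes, 0, length);

        // find the two-byte null terminator first, stepping one 16-bit code unit at a time;
        // the i + 1 bound keeps an odd-length field from reading past the array
        int nullPos = bytes.length;
        for (int i = 0; i + 1 < bytes.length; i += 2) {
            if (bytes[i] == 0 && bytes[i + 1] == 0) {
                nullPos = i;
                break;
            }
        }
        // decode only the bytes before the terminator
        CharBuffer res = charset.decode(ByteBuffer.wrap(bytes, 0, nullPos));
        return res.toString();
    }

}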

headius · 2016-09-26T18:57:01Z

@blschatz Possible for you to turn that into a pull request we can integrate? I'm not sure how you're using that within jnr-ffi and your own code (i.e. I'd like to see some examples and ideally tests in a PR).

pepijnve mentioned this issue Sep 3, 2018

DirectMemoryIO#getString(long, int, java.nio.charset.Charset) is broken for multi-byte encodings #166

Closed

demon36 added a commit to demon36/jnr-ffi that referenced this issue Jul 6, 2021

fix jnr#30

a88b8a6

DirectMemoryIO.getString() fails for non UTF-8

demon36 linked a pull request Jul 6, 2021 that will close this issue

fix #30 #254

Open

demon36 mentioned this issue Aug 30, 2021

add support for varying width null terminator for (get/put)ZeroTerminatedByteArray() jnr/jffi#112

Open
