Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Base64 MIME variant does not ignore white space chars as per RFC2045 #414

Closed
tmoschou opened this issue Oct 11, 2017 · 3 comments
Closed

Comments

@tmoschou
Copy link

Hello,

The Base64 Mime variant is non compliant as per RFC2045 Section 6.8. Base64 Content-Transfer-Encoding, which specifies

Any characters outside of the base64 alphabet are to be ignored in
base64-encoded data.

With this regard, any character including newline characters and whitespace should not be required to fall on 4 character boundaries.

Sample test case that demonstrates the issues (Note I'm using the Yaml dataformat, since base64 mime encoded documents in json strings with (escaped) newlines seems completely broken regardless of whether newlines fall on 4 char boundary or not).

Input document from "Example 2.23. Various Explicit Tags" from spec (1.1 and 1.2) - http://www.yaml.org/spec/1.2/spec.html

---
picture: !!binary |
 R0lGODlhDAAMAIQAAP//9/X
 17unp5WZmZgAAAOfn515eXv
 Pz7Y6OjuDg4J+fn5OTk6enp
 56enmleECcgggoBADs=
@Test
public void testBinaryDecoder() throws IOException {
    final ObjectMapper mapper = new ObjectMapper(new YAMLFactory());
    mapper.setBase64Variant(Base64Variants.MIME);
    try (final InputStream inputStream = getClass().getResourceAsStream("yaml1.3-example2.23.yaml")) {
        final JsonNode bean = mapper.readTree(inputStream);
        final JsonNode picture = bean.get("picture");
        java.util.Base64.getMimeDecoder().decode(picture.asText()); // works fine
        final byte[] gif = picture.binaryValue(); // fails
        assertEquals(65, gif.length);
        final byte[] actualFileHeader = Arrays.copyOfRange(gif, 0, 6);
        final byte[] expectedFileHeader = new byte[]{'G', 'I', 'F', '8', '9', 'a'};
        assertArrayEquals(expectedFileHeader, actualFileHeader);
    }
}

Note that java.util.Base64.getMimeDecoder() has no issues decoding this document.

Also note that technically speaking the YAML codec should be not using the MIME decoder but another variant, not covered in core. http://yaml.org/type/binary.html specifies that

Binary data is serialized using the base64 format as defined by RFC2045 (MIME), with the following notes:

  • The content is not restricted to lines of 76 characters or less.
  • Characters other than the base64 alphabet, line breaks and white space are considered an error.

However the current API does not allow specifying such a custom variant as far as I am aware. This should not have any baring on the test case provided (since the document only consists on the base64 alphabet/whitespace and the document lines are not more than 76 characters).

@cowtowncoder
Copy link
Member

Hmmh. That seems kind of .... wrong to me -- simply ignoring non-alphabet characters would be very likely to simply mask legitimate issues without helping inter-operability... I can understand whitespace being exception.
But at the same time following specifications is important.
I'll have to think about this question; and in all likelihood change would need to be:

  1. Go in 3.0, not 2.9 as this is major behavioral difference
  2. Possibly have matching JsonParser.Feature to allow failure (but default to permissive, specification-defined approach)

The other question wrt YAML is something we could perhaps tackle more easily by allowing use of different default Base64 encoding. Would one of Base64Variants be closer to what YAML expects?
If so, could you please file a separate issue under jackson-dataformats-text for changing defaults that YAML uses?

@cowtowncoder
Copy link
Member

Quick note: fixes in YAML module (to accept type tag on read, write one when serializing) also includes change to use SnakeYAML's codec so this will be resolved with YAML, to the degree that it will skip whitespace characters.

As to JSON, I am sort of torn between allowing inclusion of white space for indentation and not, but just because there is no way to actually enclose unescaped linefeeds in String values, and as such indentation does not make much sense there. So although you can certainly include white space, there isn't much reason to do so: output will not look any better.

I will leave this issue open, at any rate, to let other comments be added as necessary.

@cowtowncoder
Copy link
Member

Ah. Actually, I think I may know what one problem here is -- as-is, code does not deal properly with trailing white space. So, for specific case of YAML, it seems that there is often a single trailing linefeed (whether it should be there or not I don't know, wrt yaml content model, but it is there....).
So I'll add handling of this first, see how much it helps.

Also note related issue: FasterXML/jackson-dataformats-text#62

@cowtowncoder cowtowncoder changed the title Base64 MIME variant does not ignore chars outside alphabet as per RFC2045 Base64 MIME variant does not ignore white space chars as per RFC2045 Dec 15, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants