[Question] Invalid input data handling ? #27

mayeut · 2016-10-21T21:33:59Z

I'm not sure how "invalid" input data shall be processed. Could this be clarified ?

A few samples using base64_decode, what's the expected output for those ?

"Zm9vYg=" decodes successfully though it's not properly terminated
"Zm9vYg" decodes successfully though it's not properly terminated
"Zm9vY" decodes successfully though it misses data
"Zm9vYmF=Zm9v" decodes successfully though it contains too much data or corruption occurred.

This could be extended to streaming API, but I thought I'd start with the simple block decode.

The text was updated successfully, but these errors were encountered:

aklomp · 2016-10-21T21:44:10Z

The compliance baseline for this library should be the relevant RFC 4648. I never liked Postel's law much ("be strict in what you send, and liberal in what you accept"). Being liberal just invites abuse. All your examples should throw an error when they violate the preconditions.

mayeut · 2016-10-21T22:07:15Z

I'll check the RFC 4648and try to make a PR to update the test suite and library to handle invalid input properly.

aklomp · 2016-10-21T22:15:03Z

Thanks! Let's have a library that can at least strictly validate. If at any later moment we want to relax the validation, we could create a flag for it.

(Personally, I think that any string that violates the base64 spec as given in the RFC, is not a valid base64 string, and should not be given the benefit of the doubt.)

aklomp · 2016-10-21T22:30:48Z

Maybe one comment: it's quite common for base64 input to contain whitespace (newlines and such), and it would be helpful to have a flag to tell the library to ignore it. This does not negate the fact that a bona-fide base64 string with such characters is invalid, but it would indicate that it would be useful to have a flag that tells the library to strip out those characters before processing. (See also #15).

mayeut · 2016-10-23T19:00:01Z

So, not much else than what you're saying after reviewing RFC 4648.

First step, report all violations.
Second step, allow whitespace. See #33

mayeut · 2016-11-06T10:40:24Z

For now, I added validation for base64_decode function only.
The streaming implementation would require a final step (as can be seen in validation for base64_decode). This requires adding a new function in the API or asking integrators to do the same check as what's done for base64_decode.
I'm more in favor of adding a new API function. Could be something like:

int
base64_stream_decode_final (struct base64_state *state)
{
    state->eof = BASE64_EOF;
    if (state->bytes == 0) {
        return 1;
    }
    return 0;
}

aklomp · 2016-11-06T17:18:10Z

Hm, I'd like to avoid changing the API if possible. Your suggested addition basically checks if there are no trailing characters in the decode buffer. I'm not sure I consider that something that we must intercept and note. My attitude would be "as long as we honour the EOF marker and output no further bytes, who cares about trailing bytes". As far as decoding is concerned, those bytes are neither here or there, and it seems pedantic to point out to the user that his buffer length calculation is "optimistic".

What do you think? Maybe the right way is to be pedantic and technically correct.

mayeut · 2016-11-06T17:37:46Z

Well, if I quote what you said in an earlier comment:

The compliance baseline for this library should be the relevant RFC 4648. I never liked Postel's law much ("be strict in what you send, and liberal in what you accept"). Being liberal just invites abuse. All your examples should throw an error when they violate the preconditions.

I'd say that the library should throw an error even when there are trailing bytes remaining. For sure, it shall be the case for the non streaming API. Since there's no API to get the number of unread bytes, I'd say the library shall return an error in this case (it might be possible to compute this with this version but adding whitespace character stripping will remove that possibility).

I did not understand this:

As far as decoding is concerned, those bytes are neither here or there, and it seems pedantic to point out to the user that his buffer length calculation is "optimistic".

The only thing that will be pointed out to the user is that he's feeding too much (or not enough) data in the decoder.

The API change might be considered an enhancement only. Backward compatibility is still here. If the user wants a stricter behavior then, he can integrate the new function in his code.

aklomp · 2016-11-06T18:27:22Z

The viewpoint changes depending on how you conceptualize the buffer. If you define it as being strictly a Base64-encoded string and nothing more, you get a different viewpoint than when you define it as a buffer that contains a properly-terminated Base64-encoded string starting at offset 0.

In the first case trailing bytes are an error, while in the second case trailing bytes after a valid string are simply irrelevant (which is what I meant with "neither here nor there": they don't matter).

I've been holding more or less the second view of the buffer, whereas you are closer to the first. I don't think my view contradicts what I said earlier about violating the preconditions, because the preconditions are defined for the Base64-encoded string itself and not the buffer it's embedded in. Of your four examples, I believe the first three should throw errors, whereas the fourth is probably OK (but no bytes should be output after seeing the EOF marker).

Concretely, I've assumed that it's not an error to pass an oversized buffer (of "optimistic" length :), as long as the Base64 string starting at offset 0 is valid.

Maybe a stricter check would be a good idea, though. I do still think that it's better to be strict than liberal in these cases.

mayeut · 2016-11-07T21:10:22Z

OK, I see what you meant now.
What bothers me with that viewpoint is that, if I'm not mistaken or missing something else, it's only valid for "terminated" Base64-encoded string. That is, for Base64-encoded strings that are multiple of 4 characters, what remains will be decoded (or return an error at some point).
Given what I just said, I don't know what are the real use cases for using an "optimistic" length. It's seems to be error prone more than anything else. Still, I might be missing something.

mayeut changed the title ~~Invalid input data handling~~ [Question] Invalid input data handling ? Oct 21, 2016

mayeut mentioned this issue Nov 6, 2016

Fix invalid input data handling for base64_decode #32

Merged

mayeut mentioned this issue Nov 6, 2016

Whitespace character filtering #33

Open

BurningEnlightenment mentioned this issue Apr 5, 2017

Add constant-time Base64 encoding and decoding jedisct1/libsodium#524

Closed

aklomp added the question label Nov 16, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Question] Invalid input data handling ? #27

[Question] Invalid input data handling ? #27

mayeut commented Oct 21, 2016

aklomp commented Oct 21, 2016

mayeut commented Oct 21, 2016 •

edited

aklomp commented Oct 21, 2016

aklomp commented Oct 21, 2016

mayeut commented Oct 23, 2016 •

edited

mayeut commented Nov 6, 2016

aklomp commented Nov 6, 2016

mayeut commented Nov 6, 2016

aklomp commented Nov 6, 2016

mayeut commented Nov 7, 2016

[Question] Invalid input data handling ? #27

[Question] Invalid input data handling ? #27

Comments

mayeut commented Oct 21, 2016

aklomp commented Oct 21, 2016

mayeut commented Oct 21, 2016 • edited

aklomp commented Oct 21, 2016

aklomp commented Oct 21, 2016

mayeut commented Oct 23, 2016 • edited

mayeut commented Nov 6, 2016

aklomp commented Nov 6, 2016

mayeut commented Nov 6, 2016

aklomp commented Nov 6, 2016

mayeut commented Nov 7, 2016

mayeut commented Oct 21, 2016 •

edited

mayeut commented Oct 23, 2016 •

edited