
Default multipart form encoding uses noncompliant URLEncoder for tilde (~) #1966

Open
AlexanderNull opened this issue Oct 10, 2023 · 3 comments


AlexanderNull commented Oct 10, 2023

Problem:
Following the documentation's examples for submitting form data with the multipart method produces an encoding that does not comply with RFC 3986, specifically in how the tilde (~) character is handled.
For example, multipart("form", Map("some_key" -> "~")) generates a StringBody(some_key=%7E,utf-8,text/plain) form component in which the tilde has been encoded as %7E. Per RFC 3986, ~ is an unreserved character and should not be percent-encoded.
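A minimal Java reproduction of just the encoding step, outside sttp. It assumes (as the issue title states) that the form part's key and value pass through java.net.URLEncoder; the class and method names are illustrative only, not sttp code.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class TildeRepro {
    // Encodes one form field the way java.net.URLEncoder does
    // (HTML form rules, not RFC 3986). Hypothetical helper.
    static String encodeField(String key, String value) {
        return URLEncoder.encode(key, StandardCharsets.UTF_8) + "="
             + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prints some_key=%7E: URLEncoder percent-encodes the tilde.
        System.out.println(encodeField("some_key", "~"));
    }
}
```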

It looks like this problem has already been partially addressed within sttp: the internal UriCompatibility object has an additional encodeDNSHost method that uses the spec-compliant Rfc3986.encode method. Unfortunately, the multipart method uses the noncompliant encodeQuery method on UriCompatibility instead.

Would it be possible to switch the multipart method's form handling from URLEncoder to the Rfc3986 encoder? (Edit: I originally mentioned an incorrect allowedCharacters set here.) It looks like a new character set would need to be defined within the Rfc3986 object to get a fully correct allowedCharacters set. Want a PR for this?
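To illustrate the proposal, here is a standalone sketch of an allowed-set percent-encoder in the spirit of Rfc3986.encode. It is not sttp's implementation: the names Rfc3986Sketch, encode and UNRESERVED are hypothetical, and it assumes the allowed set is RFC 3986's unreserved characters.

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.stream.Collectors;

public class Rfc3986Sketch {
    // RFC 3986 section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    static final Set<Character> UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
            .chars().mapToObj(c -> (char) c).collect(Collectors.toSet());

    // Hypothetical encoder: keeps ASCII characters found in `allowed` and
    // percent-encodes every other UTF-8 byte as %XX (uppercase hex).
    static String encode(String s, Set<Character> allowed) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int u = b & 0xFF;
            if (u < 0x80 && allowed.contains((char) u)) {
                out.append((char) u);
            } else {
                out.append('%').append(String.format("%02X", u));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("~", UNRESERVED));      // ~
        System.out.println(encode("a b*", UNRESERVED));   // a%20b%2A
        System.out.println(encode("\u00E9", UNRESERVED)); // %C3%A9
    }
}
```

Encoding byte-by-byte after UTF-8 conversion keeps non-ASCII input correct without special cases, since every multi-byte sequence falls outside the ASCII allowed set.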

AlexanderNull (Author)

Disregard that edit note: the Rfc3986.Unreserved character set would be correct. I got confused because URLEncoder also mishandles the asterisk (*): it leaves it unencoded even though it is a reserved sub-delimiter character in RFC 3986.
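The distinction between ~ and * comes from RFC 3986's character classes (section 2.2 for gen-delims and sub-delims, section 2.3 for unreserved). A small classifier makes it concrete; the class and method names are illustrative assumptions, and whether a reserved character must be encoded in practice depends on the URI component it appears in.

```java
public class Rfc3986Classes {
    // Character classes from RFC 3986, sections 2.2 and 2.3.
    static final String GEN_DELIMS = ":/?#[]@";
    static final String SUB_DELIMS = "!$&'()*+,;=";
    static final String UNRESERVED_MARKS = "-._~"; // plus ALPHA and DIGIT

    // Hypothetical helper: reports which RFC 3986 class an ASCII char is in.
    static String classify(char c) {
        if (GEN_DELIMS.indexOf(c) >= 0) return "gen-delim (reserved)";
        if (SUB_DELIMS.indexOf(c) >= 0) return "sub-delim (reserved)";
        if (UNRESERVED_MARKS.indexOf(c) >= 0
                || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9')) return "unreserved";
        return "other (must be percent-encoded)";
    }

    public static void main(String[] args) {
        for (char c : new char[] {'~', '*'}) {
            System.out.println(c + " -> " + classify(c));
        }
    }
}
```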

adamw (Member) commented Oct 11, 2023

Yes, a PR would be great, though I'd like to include this change in sttp4 (so the PR should target the master branch). If you could also add a test, that would be even better :)

Pask423 self-assigned this May 13, 2024
Pask423 (Contributor) commented May 21, 2024

Java's URLEncoder source contains the following comment about encoding ~:

* mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
*
* It appears that both Netscape and Internet Explorer escape
* all special characters from this list with the exception
* of "-", "_", ".", "*". While it is not clear why they are
* escaping the other characters, perhaps it is safest to
* assume that there might be contexts in which the others
* are unsafe if not escaped. Therefore, we will use the same
* list. It is also noteworthy that this is consistent with
* O'Reilly's "HTML: The Definitive Guide" (page 164).

So there may be some corner cases around parsing ~. They probably do not apply to multipart requests, but I wanted to share this for more context.
I think we can safely use Rfc3986.encode for parts.
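The behaviour described in the quoted JDK comment can be checked directly: of the RFC 2396 "mark" characters, URLEncoder leaves only "-", "_", ".", "*" unescaped. Compared with RFC 3986's unreserved set ("-", ".", "_", "~"), the mismatches are exactly the two characters discussed in this thread. A minimal sketch; the class and method names are illustrative.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UrlEncoderMarks {
    // The RFC 2396 "mark" characters listed in the URLEncoder source comment.
    static final String MARKS = "-_.!~*'()";

    // Hypothetical helper: returns the marks URLEncoder leaves unescaped.
    static String passthrough() {
        StringBuilder kept = new StringBuilder();
        for (char c : MARKS.toCharArray()) {
            String s = String.valueOf(c);
            if (URLEncoder.encode(s, StandardCharsets.UTF_8).equals(s)) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        for (char c : MARKS.toCharArray()) {
            String s = String.valueOf(c);
            System.out.println(s + " -> " + URLEncoder.encode(s, StandardCharsets.UTF_8));
        }
        System.out.println("unescaped: " + passthrough()); // unescaped: -_.*
    }
}
```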
