Parse percent encoded query parameter using a different charset than UTF-8 #468

Open
conet opened this issue Oct 4, 2022 · 3 comments
conet commented Oct 4, 2022

I'm trying to parse a relative URL whose query is percent-encoded in ISO-8859-1, but I can't seem to get the properly decoded string out of this value:

scala> val v = RelativeUrl.parse("/path?param=r%F3n")(UriConfig(charset = "ISO-8859-1"))
val v: io.lemonlabs.uri.RelativeUrl = /path?param=r%3Fn

scala> v.query.param("param")
val res9: Option[String] = Some(r�n)

scala> v.toStringRaw
val res10: String = /path?param=r?n

The param value should be rón. So I tried the following example, but I can't seem to get the round trip to work for a custom charset; for UTF-8 it works:

scala> import io.lemonlabs.uri.RelativeUrl
import io.lemonlabs.uri.RelativeUrl

scala> import io.lemonlabs.uri.config.UriConfig
import io.lemonlabs.uri.config.UriConfig

scala> val v1 = RelativeUrl.parse("/uris-in-scala.html?chinese=网址")(UriConfig(charset = "GB2312"))
val v1: io.lemonlabs.uri.RelativeUrl = /uris-in-scala.html?chinese=%CD%F8%D6%B7

scala> val v2 = RelativeUrl.parse(v1.toString)(UriConfig(charset = "GB2312"))
val v2: io.lemonlabs.uri.RelativeUrl = /uris-in-scala.html?chinese=%3F%3F%3F

scala> val v3 = RelativeUrl.parse("/uris-in-scala.html?chinese=网址")
val v3: io.lemonlabs.uri.RelativeUrl = /uris-in-scala.html?chinese=%E7%BD%91%E5%9D%80

scala> val v4 = RelativeUrl.parse(v3.toString)
val v4: io.lemonlabs.uri.RelativeUrl = /uris-in-scala.html?chinese=%E7%BD%91%E5%9D%80

I'm testing the round trip because it's what I ultimately need. v3 and v4 are the same, which makes sense, but v1 and v2 are not. Am I missing something? What is the proper way to parse a percent-encoded representation that was encoded using a custom character set?
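For reference, the expected round trip can be sketched with the JDK alone, independent of scala-uri. This is only an illustration of the bytes involved: `java.net.URLEncoder` and `URLDecoder` implement form encoding (a space becomes `+`), so they are not a drop-in replacement for URI parsing. The object name `CharsetRoundTrip` is made up for this example, and it assumes the JVM ships the GB2312 charset.

```scala
import java.net.{URLDecoder, URLEncoder}

object CharsetRoundTrip {
  // Percent-encode a value using the bytes of the given charset.
  def encode(value: String, charset: String): String =
    URLEncoder.encode(value, charset)

  // Percent-decode a value, interpreting the escaped bytes in the same charset.
  def decode(encoded: String, charset: String): String =
    URLDecoder.decode(encoded, charset)

  def main(args: Array[String]): Unit = {
    val gb = encode("网址", "GB2312") // same escapes as v1 above: %CD%F8%D6%B7
    // Decoding with the same charset recovers the original text.
    println(decode(gb, "GB2312") == "网址")
    println(decode("r%F3n", "ISO-8859-1") == "rón")
  }
}
```

Both comparisons should hold when encoder and decoder agree on the charset, which is the behaviour v2 was expected to show.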

conet commented Oct 4, 2022

What I'm saying is that it works properly this way:

scala> val v = RelativeUrl.parse("/path?param=rón")(UriConfig(charset = "ISO-8859-1"))
val v: io.lemonlabs.uri.RelativeUrl = /path?param=r%F3n

scala> v.toString
val res11: String = /path?param=r%F3n

scala> v.toStringRaw
val res12: String = /path?param=rón

But not the other way around:

scala> val v = RelativeUrl.parse("/path?param=r%F3n")(UriConfig(charset = "ISO-8859-1"))
val v: io.lemonlabs.uri.RelativeUrl = /path?param=r%3Fn

scala> v.toString
val res13: String = /path?param=r%3Fn

scala> v.toStringRaw
val res14: String = /path?param=r?n
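The `%3F` in these outputs is consistent with one possible explanation, offered here as an assumption rather than a confirmed reading of the library: the escaped bytes are decoded as UTF-8 regardless of the configured charset, and the resulting string is later re-encoded in that charset. The JDK-only sketch below reproduces that sequence; `LossyDecodeDemo` and its helpers are hypothetical names, and GB2312 is assumed to be available on the JVM.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object LossyDecodeDemo {
  // Decode raw bytes as UTF-8 (the suspected step), then re-encode
  // the resulting string in the configured charset.
  def decodeUtf8ThenEncode(raw: Array[Byte], charset: Charset): Array[Byte] =
    new String(raw, StandardCharsets.UTF_8).getBytes(charset)

  // Render every byte as a %XX escape so the result is easy to compare.
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => "%%%02X".format(b & 0xff)).mkString

  def main(args: Array[String]): Unit = {
    // "r%F3n" percent-decodes to the bytes 0x72 0xF3 0x6E.
    val latin1 = Array(0x72, 0xF3, 0x6E).map(_.toByte)
    // 0xF3 is not valid UTF-8 here, so it becomes U+FFFD, which
    // ISO-8859-1 cannot represent and replaces with '?' (0x3F).
    println(hex(decodeUtf8ThenEncode(latin1, StandardCharsets.ISO_8859_1)))
    // prints %72%3F%6E, i.e. r?n

    // "%CD%F8%D6%B7" is 网址 in GB2312.
    val gb = Array(0xCD, 0xF8, 0xD6, 0xB7).map(_.toByte)
    println(hex(decodeUtf8ThenEncode(gb, Charset.forName("GB2312"))))
    // prints %3F%3F%3F
  }
}
```

The second result has three `?` rather than four because `0xD6 0xB7` happens to be a valid two-byte UTF-8 sequence, which matches the `%3F%3F%3F` seen for v2 earlier in the thread.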

conet commented Oct 4, 2022

OK, I think I found a workaround based on PercentDecoder, which is where UTF-8 is hardcoded:

val queryDecoder = PercentDecoder

new String(queryDecoder.decodeBytes("/path?param=r%F3n", "ISO-8859-1"), "ISO-8859-1")
val res21: String = /path?param=rón

I can then use the decoded string as the input to RelativeUrl.parse.

theon (Member) commented Oct 4, 2022

Thanks for raising this and figuring out where the shortcoming is 🙇‍♂️

I'm thinking we should make the PercentDecoder charset configurable.
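As a rough illustration of what a charset-configurable decoder could look like, here is a minimal standalone sketch. It is not scala-uri's implementation or API: `CharsetPercentDecoder` and its `decode` method are hypothetical, and leaving malformed escapes such as a trailing `%` untouched is an assumption made for simplicity.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.Charset

object CharsetPercentDecoder {
  private def isHex(c: Char): Boolean =
    "0123456789abcdefABCDEF".indexOf(c.toInt) >= 0

  // Hypothetical helper: percent-decode `s`, treating escaped bytes and
  // literal characters alike as bytes of `charset`.
  def decode(s: String, charset: Charset): String = {
    val out = new ByteArrayOutputStream(s.length)
    var i = 0
    while (i < s.length) {
      if (s.charAt(i) == '%' && i + 2 < s.length &&
          isHex(s.charAt(i + 1)) && isHex(s.charAt(i + 2))) {
        // A well-formed %XX escape contributes exactly one raw byte.
        out.write(Integer.parseInt(s.substring(i + 1, i + 3), 16))
        i += 3
      } else {
        // Literal code point (or malformed escape): emit its bytes in `charset`.
        val cp = s.codePointAt(i)
        val bytes = new String(Character.toChars(cp)).getBytes(charset)
        out.write(bytes, 0, bytes.length)
        i += Character.charCount(cp)
      }
    }
    // Interpret the collected bytes once, in the configured charset.
    new String(out.toByteArray, charset)
  }
}
```

With this sketch, `CharsetPercentDecoder.decode("/path?param=r%F3n", Charset.forName("ISO-8859-1"))` yields `/path?param=rón`, the same result as the workaround above. Collecting bytes first and converting to a string once at the end is what keeps multi-byte charsets such as GB2312 intact.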

fedorovar added a commit to ActianCorp/scala-uri that referenced this issue Apr 23, 2024
FLimburg pushed a commit to ActianCorp/scala-uri that referenced this issue Apr 23, 2024