Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StandaloneWSClient.url(String) permits inconsistent states with query params #267

Open
htmldoug opened this issue Jul 31, 2018 · 5 comments

Comments

@htmldoug
Copy link
Contributor

htmldoug commented Jul 31, 2018

Summary

Behavior of query parameters in wsClient.url(String) is not well defined and permits inconsistent states. For example, wsClient.url("http://postman-echo.com/get?pipe=|").execute() successfully sends the request as http://postman-echo.com/get?pipe=%7C", but wsRequest.queryString returns an empty map, and wsRequest.uri throws a URISyntaxException.

This makes methods uri and queryString hard to reason about and use, since they may differ from what is actually sent or throw exceptions.

Behavior Example

/**
  * Demonstrates inconsistencies in [[StandaloneAhcWSClient.url()]].
  */
object UrlTest extends App {

  implicit val actorSystem = ActorSystem()
  implicit val materializer = ActorMaterializer()

  val wsClient = StandaloneAhcWSClient()
  val wsRequest = wsClient.url("http://postman-echo.com/get?pipe=|")
  val wsResponse = Await.result(wsRequest.execute(), 1.minute)
  println("Response body: " + wsResponse.body) // success!

  println("Request Query params: " + wsRequest.queryString) // empty, but the server disagrees.

  try {
    println("URI: " + wsRequest.uri) // throws! and yet, AHC figured it out somehow.
  } catch {
    case t: URISyntaxException => t.printStackTrace(System.out)
  }
  finally {
    wsClient.close()
    materializer.shutdown()
    actorSystem.terminate()
  }
}

Prints:

Response body: {"args":{"pipe":"|"},"headers":{"host":"postman-echo.com","accept":"*/*","user-agent":"AHC/2.0","x-forwarded-port":"80","x-forwarded-proto":"http"},"url":"http://postman-echo.com/get?pipe=%7C"}
Request Query params: Map()
java.net.URISyntaxException: Illegal character in query at index 33: http://postman-echo.com/get?pipe=|
	at java.net.URI$Parser.fail(URI.java:2848)
	at java.net.URI$Parser.checkChars(URI.java:3021)
	at java.net.URI$Parser.parseHierarchical(URI.java:3111)
	at java.net.URI$Parser.parse(URI.java:3053)
	at java.net.URI.<init>(URI.java:588)
	at play.api.libs.ws.ahc.StandaloneAhcWSRequest.uri$lzycompute(StandaloneAhcWSRequest.scala:59)
	at play.api.libs.ws.ahc.StandaloneAhcWSRequest.uri(StandaloneAhcWSRequest.scala:52)
	at play.api.libs.UrlTest$.delayedEndpoint$play$api$libs$UrlTest$1(UrlTest.scala:29)
	at play.api.libs.UrlTest$delayedInit$body.apply(UrlTest.scala:16)
	at scala.Function0.apply$mcV$sp(Function0.scala:34)
	at scala.Function0.apply$mcV$sp$(Function0.scala:34)
	at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
	at scala.App.$anonfun$main$1$adapted(App.scala:76)
	at scala.collection.immutable.List.foreach(List.scala:389)
	at scala.App.main(App.scala:76)
	at scala.App.main$(App.scala:74)
	at play.api.libs.UrlTest$.main(UrlTest.scala:16)
	at play.api.libs.UrlTest.main(UrlTest.scala)

Wireshark sees:

GET /get?pipe=%7C HTTP/1.1
Host: postman-echo.com
Accept: */*
User-Agent: AHC/2.0

Observations

The scaladoc is unclear. "url" implies https://tools.ietf.org/html/rfc1738, and yet "base URL" implies maybe not?

/**
 * Generates a request.  Throws IllegalArgumentException if the URL is invalid.
 *
 * @param url The base URL to make HTTP requests to.
 * @return a request
 */
@throws[IllegalArgumentException]
def url(url: String): StandaloneWSRequest

Tolerant or Strict?

I think we'd benefit from committing to one of these two approaches and documenting.

Tolerant

The behavior of org.asynchttpclient.util.UriEncoder.FIXING could be pulled up to StandaloneAhcWSClient.url(String) to create something sane, parsing query params, perhaps decoding the values, and putting them in the queryString field. From there, lazy val uri: URI can figure it out and queryString would reflect reality.

Downside is that StandaloneWSRequest.copy(url = "") exists and would still allow for inconsistent states. At least it'd be better.

Scaladoc would be updated to indicate that the URI parameter means https://tools.ietf.org/html/rfc1738 and tolerantly accepts query string values either encoded or in plaintext.

Strict

StandaloneAhcWSRequest would throw if query params are present in url, which could be renamed baseUrl: String, and defined as "everything up to the query params".

A consistency oriented implementation could add require(uri.getQuery == null) which would force parsing of the `java.net.URI. This is a bit heavy to do every method call of the builder.

Alternatively, we could move this check into StandaloneAhcWSClient.validate().

Scaladoc would be updated to indicate that query params MUST NOT be part of the "baseUrl".

Workarounds

In my current production Play 2.5 services, I'm defensively decorating WSClient to intercept the WSRequest prior to execution, and running it through UriEncoder.FIXING just as AHC does under the hood, and moving any query params into the WSRequest.queryString field.

It's kludgy without #264. I ended up requiring a WSClient and just rebuilding the whole request from scratch. I'd love for this to be handled upfront by the library so I don't have to second guess every request.

Relates to playframework/playframework#7444.

@htmldoug htmldoug mentioned this issue Jul 31, 2018
7 tasks
@htmldoug
Copy link
Contributor Author

@gmethvin & @marcospereira, I could use your guidance here. Would be nice to nail this down before v2.

@marcospereira
Copy link
Member

Hi @htmldoug,

Thanks for the detailed and well-thought issue.

Right now, I think the tolerant approach is the recommended one. There are some other expectations that we can do around the API:

  1. We can elaborate that queryString is actually used to define additional query strings, besides the ones already present at the url. Changing the name to something like additionalQueryString can make it clear that it is not related to what we have in the url.
  2. uri should be consistent with execute and validate. If it is possible to run the request, than it should be possible to call uri.

Ideally, these changes should add comprehensive tests to ensure they are consistent.

Downside is that StandaloneWSRequest.copy(url = "") exists and would still allow for inconsistent states. At least it'd be better.

We can have a method like this:

def copy(url: String)

That does all the work necessary to ensure things are consistent.

@htmldoug
Copy link
Contributor Author

htmldoug commented Sep 7, 2018

Thanks for the direction. I'd be interested in submitting a PR for the tolerant route, having written one already around my existing use of the play25 WSClient. Implementation is here: https://gist.github.com/htmldoug/38cfa714b33123c6695ec272cfcae6c1 Before proceeding, I'd like for you to witness how gnarly it is and confirm that you really want it. Feel free to mark it up with comments for any changes you'd like to see.

Overloading copy(url: String) can still be bypassed with req.copy(url = "/pipe=|", method = "POST"). There would also be no validation if you call StandaloneAhcWSRequest.apply() directly. Seems like a reasonable escape hatch to leave open anyway for performance-sensitive extreme cases. Now that we have a happy path of withUrl(String) from #268 (which would apply validation/fixing), I think that's fine. (Thank you for merging, btw!)

Note that one consequence of removing the query params from the url: String is that the url value you get back may be different from the one you put in.

val baseUrl: String = "http://example.com/path?a=1"
ws.url(baseUrl).url == baseUrl // false. yields "http://example.com/path" perhaps surprising.
ws.queryString("a") // this would work now though. yay!

Overall, I think this is a sensible way to go. Just give me the green light and I'll put up a PR.

@htmldoug htmldoug mentioned this issue Oct 29, 2018
7 tasks
@wsargent
Copy link
Member

wsargent commented Dec 17, 2018

On the meaning of URL -- URL parsing has been a mixed bag overall. RFC 1738 is effectively obsoleted by RFC 3986 and RFC 7320 and so using "URI" over "URL" would be more accurate in general.

@wsargent
Copy link
Member

wsargent commented Dec 17, 2018

There are a couple of posts that talk about parsing URIs:

There are external libraries that will augment the functionality of WS: galimatias validates URLs, and scala-url-builder and url-builder will build up guaranteed valid URLs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants