Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get.d: explain separators used in URL #13706

Closed

Conversation

pszlazak
Copy link
Contributor

Prior to 7.86.0 symbol '?' could be used again as a separator in some edge cases.
This behavior was fixed by b82eb72 (precedes 7.86.0).

This is NOK example for curl 7.74.0 (? used twice as a separator):

$ curl --silent --get --data key=value 'https://httpbin.org/get?foo/bar=baz' -v
[...]
> GET /get?foo/bar=baz?key=value HTTP/1.1
[...]

This is OK example for curl 8.4.0 (& properly used in front of key=value):

$ curl --silent --get --data key=value 'https://httpbin.org/get?foo/bar=baz' -v
[...]
> GET /get?foo/bar=baz&key=value HTTP/1.1
[...]

The problem was very edge case. I managed to reproduce it only when using / (forward slash) in name or value of data provided in URL.

I tried to update documentation to:

  • briefly explain that sometimes ? is to be expected as a separator and sometimes &
  • mark that there were edge cases that before 7.86.0 curl could use ? instead of &. But without going into details when exactly to not overwhelm user.

Prior to 7.86.0 symbol '?' could be used again as a separator in some edge cases.
This was fixed by b82eb72 (precedes 7.86.0).

Closes curl#13706
@pszlazak pszlazak force-pushed the documentation-update-for-get-option branch from baf629d to 86fd3e8 Compare May 19, 2024 16:00
@@ -22,7 +22,11 @@ Example:
When used, this option makes all data specified with --data, --data-binary
or --data-urlencode to be used in an HTTP GET request instead of the POST
request that otherwise would be used. The data is appended to the URL
with a '?' separator.
with a '?' as a separator. Or with '&' if more data is provided (check
--data for details).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not true.? is a query separator. The ampersand (&) is inserted between multiple data strings provided on the command line. Different things.

Copy link
Contributor Author

@pszlazak pszlazak May 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to point that curl will use proper separator depending on number of data:

  • ? if only one data is to be appended to URL and there is no other data in URL
  • ? and/or & if more data is to be appended or there is already some data in URL. & can be considered as separator between two data fields in query string.

I'm writing ? and/or & because of following cases:

  1. User already provided data in URL, so curl only used &
$ curl --silent --get --data key=value 'https://httpbin.org/get?foo=bar' -v
[...]
> GET /get?foo=bar&key=value HTTP/1.1
[...]
  1. User provided all data as command line options, so curl used both ? (to separate query string from main part of URL) and & (to separate data fields):
$ curl --silent --get --data key=value --data foo=bar 'https://httpbin.org/get' -v
[...]
> GET /get?key=value&foo=bar HTTP/1.1
[...]

My colleague was afraid that ? will be used always as per current statement:

The data is appended to the URL with a '?' separator.

Maybe it's not worth mentioning those separators at all? curl will simply do the proper thing 😉

Copy link
Member

@bagder bagder May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ampersand (&) is used as a separator between multiple parts of query (actually POST) pieces. That is described in section for --data.

When all of that data is added to the URL, it is put as a query component. Query components are always provided in the URL with a ? separator. This latter part is what --get does, and that is why it is added in the documentation for --get.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @bagder!
I pushed new commit with simplified description of --get / -G option.

Why I wanted update in original description? With colleague we were preparing a command to send some data to API endpoint. I prepared it working on 8.4.0 and it worked. He was using some older version of curl (prior to 7.86.0) and for him same command was not working. Command was something like this:

$ cat file.txt | curl --get --data-urlencode key@- 'http://example.com?foo=bar&data=leading/to/corner/case'

He was debugging and found that for him final URL is:

GET /?foo=bar&data=leading/to/corner/case?key=world%0A HTTP/1.1
     ^                                   ^

He checked the documentation and found:

The data is appended to the URL with a '?' separator.

So he assumed that whole idea is wrong, as curl will always use ?.

Of course I was surprised that it does not work for him, so I started looking if it was corrected and when.
Knowing it, I wanted to update curl manual to nuance a bit what separator will be used. And to document that there was a change in 7.86.0 which fixed one problematic corner case.

I hope knowing this story, you will better understand my motivation 😄

PS. As it can happen that in our env there will be more people with curl older that 7.86.0 we finally switched to:

$ cat file.txt | curl --get --data-urlencode key@- --data 'foo=bar' --data 'data=leading/to/corner/case' 'http://example.com'

--data for details).

Prior to 7.86.0 curl could again use '?' as a separator in some
edge cases even if '?' was already in place.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This explanation is not helping. What dies it mean to "use it again" as a separator? Separator for what?

If there was a URL parser bug back before 7.86.0, why do we need to explain that to users now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per NOK example from PR description:

$ curl --silent --get --data key=value 'https://httpbin.org/get?foo/bar=baz' -v
[...]
> GET /get?foo/bar=baz?key=value HTTP/1.1
[...]

You can see ? used twice. So ? was "used again" by curl, although it was already provided in URL.

Maybe better wording could be:

Prior to 7.86.0 curl could use '?' as a separator between data fields in some edge cases.

Why to mention it if already fixed? If somebody uses curl prior to 7.86.0 and notices this problem and checks manual, he will switch to newer version instead of creating issue etc.

as a query string.

Prior to 7.86.0 curl could use '?' as a separator between data fields in some
edge cases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to refer to a bug that we have not been notified about which also is already fixed.

If we should document an old bug in the current manpage, I think we need more details and a more specific description. But I will repeat that I don't think that this bug seems important enough to warrant a mention in current documentation.

I don't think this suggested wording will help any users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only scenario to reproduce this problem is to:

  • have data with forward slash (/) in key or value provided already in URL
  • add additional data using --data or --data-urlencode (this may be valid to all others variants of --data-XXX - I was not testing)
  • have curl older than 7.86.0 😉

As in example provided here: #13706 (comment)

Not to make manual page too overwhelming, I called it 'edge case'. I hope if anyone notices ? used as separator between data fields (instead only between URL and query string) and finds this description in manual page, he will be able to figure out that he is using too old version of curl. So he will upgrade instead of reporting issue, etc.

It's hard to match this behavior with correction description in changelog as it is called:

single_transfer: use the libcurl URL parser when appending query parts

So maybe it somehow helps to have it mentioned additionally like this.

It was done a bit like here:

Copy link
Member

@bagder bagder May 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bug happened if there was an existing query in the provided URL and that query contained a slash. Then curl would wrongly append the provided --data to the right side of the query with a ? separator because the slash to the right side of the question mark makes it think there was no query part.

I still don't think this is a bug important enough to mention in the manpage, 17 releases after it was fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, part related to edge case is removed. I kept the other portion of the change. If not accepted let's close this PR.

@bagder
Copy link
Member

bagder commented May 30, 2024

docs/cmdline-opts/get.md:24:44: error: found bad word "\bwill\b"
   24 | request that otherwise would be used. curl will append provided data to URL
      |                                            ^~~~~~~~
 maybe use "rewrite to present tense" instead?

It should rather be written like:

curl appends the provided data...

I'll fix when I merge.

@bagder bagder closed this in 7d5b0ba May 30, 2024
@bagder
Copy link
Member

bagder commented May 30, 2024

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

2 participants