HTTP Module: http:get(), http:post(), … #914

Open
ChristianGruen opened this issue Mar 27, 2014 · 17 comments

Comments

@ChristianGruen
Member

New functions should be added to the HTTP Module, which shouldn’t do any magic as http:send-request does. Next, make functions streamable:

http://www.mail-archive.com/basex-talk@mailman.uni-konstanz.de/msg03981.html

@ChristianGruen
Member Author

@ChristianGruen
Member Author

Postponed.

@ChristianGruen ChristianGruen reopened this Jul 4, 2018
@ChristianGruen ChristianGruen added this to the 9.1 milestone Jul 23, 2018
@ChristianGruen
Member Author

ChristianGruen commented Jul 25, 2018

Here is some more information on the planned enhancements:

  • All new functions will start with the mandatory URI.
  • The 2nd argument of http:send will be the mandatory HTTP method.
  • The provided options will be pretty close to the element attributes of the EXPath module.

Function signatures:

http:delete($uri[,$options])
http:get($uri[,$options])
http:head($uri[,$options])
http:options($uri[,$options])
http:post($uri[,$options])
http:put($uri[,$options])
http:send($uri,$method[,$options])

Options:

map {
  "username" : "user",
  "password" : "pass",
  "auth-method" : "Basic",
  "status-only": true(),
  "override-media-type": "text/plain",
  "follow-redirect": false(),
  "timeout": 30,
  "headers": map {
    "user-agent": "..." (: , ... :)
  },
  "body": map {
    "media-type": "text/plain",
    "content": "..."
  }
}
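
For comparison, most of these options correspond to attributes of the EXPath `<http:request/>` element. The following is a minimal sketch of that correspondence in standard XQuery 3.1, not part of the proposal: `local:to-request` is a hypothetical helper, `http://example.org/` is a placeholder URI, and only single bodies are handled.

```xquery
declare namespace http = "http://expath.org/ns/http-client";

(: Hypothetical helper: translate a proposed options map into an
   EXPath <http:request/> element. Multipart bodies are not covered. :)
declare function local:to-request(
  $uri     as xs:string,
  $method  as xs:string,
  $options as map(*)
) as element(http:request) {
  element http:request {
    attribute method { $method },
    attribute href { $uri },
    (: copy simple options to the attributes of the same name :)
    for $key in ('username', 'password', 'auth-method', 'status-only',
                 'override-media-type', 'follow-redirect', 'timeout')
    where map:contains($options, $key)
    return attribute { $key } { $options($key) },
    (: one <http:header/> element per entry of the "headers" map :)
    let $headers := ($options?headers, map { })[1]
    for $name in map:keys($headers)
    return element http:header {
      attribute name { $name },
      attribute value { $headers($name) }
    },
    (: single body, if present :)
    let $body := $options?body
    where exists($body)
    return element http:body {
      attribute media-type { $body?media-type },
      $body?content
    }
  }
};

local:to-request('http://example.org/', 'get', map {
  'timeout': 30,
  'headers': map { 'user-agent': 'BaseX' }
})
```

The resulting element could presumably be passed to the existing `http:send-request` function, which makes the map-based functions look like a thin layer over the current implementation.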

@adamretter

So if I understand there are two main drivers here:

  1. Split the legacy send function into constituent methods; Whilst also offering a send variant that takes a Map of options.
  2. Switch the $options to a Map instead of an XML document?

What do you see the return type of these functions looking like? I was never too sure about the sequence approach. There are some arguments that a Map would be a better result.

The most interesting thing for me would be how to handle multipart request/response in a way that does not prohibit lazy construction and/or streaming.

Perhaps we need to also give some consideration to WebSockets, or do you see that as a separate module?

@ChristianGruen
Member Author

  1. Split the legacy send function into constituent methods; Whilst also offering a send variant that takes a Map of options.
  2. Switch the $options to a Map instead of an XML document?

Exactly. The design heavily borrows from the Zorba HTTP Client implementation, which I referenced in an earlier comment, and which I was happy to use in the past. The result of a simple GET request looks as follows:

map {
  "message": "OK",
  "status": "200",
  "headers": map {
    "Last-Modified": "Thu, 31 May 2018 11:55:40 GMT",
    "Server": "Apache",
    "Content-Type": "text/html",
    "Date": "Wed, 25 Jul 2018 17:09:22 GMT",
    "Content-Length": "15749",
    "Accept-Ranges": "bytes"
  },
  "body": map {
    "media-type": "text/html",
    "content": <html class="homepage" lang="en">...</html>
  }
}

The most interesting thing for me would be how to handle multipart request/response in a way that does not prohibit lazy construction and/or streaming.

In BaseX, lazy/streamable items can be used, which will only retrieve the actual result when requested. For example, if file:read-binary(...) is defined as content (either in the main body or as multipart item), the bytes will only be streamed from disk when the request is actually sent. I am wondering if eXist would handle this similarly?

I am not sure what this could look like when creating the HTTP response. After all, it seems to depend on the underlying implementation anyway.

Perhaps we need to also give some consideration to WebSockets, or do you see that as a separate module?

Regarding WebSockets, I would indeed propose to handle those in a separate module. As far as I know, while WebSockets are downward compatible with HTTP (in particular the handshake), they are actually a different protocol.

I already implemented large parts of this proposal today; it was pretty straightforward.

@ChristianGruen
Member Author

Some more notes:

  • As the use of specific methods is pretty common, we decided to add a function for all major HTTP methods.
  • We didn’t add dedicated functions for delivering text and binary (as we e.g. have in the File Module or Archive Module), because we already have the override-media-type attribute in the EXPath specification, which is more flexible.
  • As options (such as for authentication) are pretty common, we added them to all functions.
  • In contrast to the Zorba module, we didn’t add a $body parameter and an optional $content-type parameter, …
    • as we would limit the functions to singlepart requests, and
    • as the format would differ from the $options parameter, which already provides support for supplying a payload.

Here is the full list of function signatures:

  • http:get($uri as xs:string) as map(*)
  • http:get($uri as xs:string, $options as map(*)) as map(*)
  • http:post($uri as xs:string) as map(*)
  • http:post($uri as xs:string, $options as map(*)) as map(*)
  • http:put($uri as xs:string) as map(*)
  • http:put($uri as xs:string, $options as map(*)) as map(*)
  • http:delete($uri as xs:string) as map(*)
  • http:delete($uri as xs:string, $options as map(*)) as map(*)
  • http:head($uri as xs:string) as map(*)
  • http:head($uri as xs:string, $options as map(*)) as map(*)
  • http:options($uri as xs:string) as xs:string*
  • http:options($uri as xs:string, $options as map(*)) as xs:string*
  • http:send($uri as xs:string, $method as xs:string) as map(*)
  • http:send($uri as xs:string, $method as xs:string, $options as map(*)) as map(*)

@adamretter

adamretter commented Jul 25, 2018

So a few questions:

  1. I don't really understand why you would want functions for a POST or PUT request without a body?
  2. The structure for the request does not seem to support multipart - it seems you only have a single body? which is a map. The same goes for response. Could you provide a multipart example of a request and response for discussion please?
  3. Semantically I wonder about including the body in the "options". It doesn't feel to me that the body of the request is quite the same as say setting a http-header.

We implement lazy streamable for binary files in a similar manner to yourselves I think. Everything there is a stream which is untouched until serialization time.

I was wondering more about if there was value in making it explicit in the API, so that XQuery users could have control over this. i.e. for the body they supply a function instead of an xs:anyAtomicType or node(). I can see some use-cases where this would be important.

@ChristianGruen
Member Author

ChristianGruen commented Jul 25, 2018

Could you provide a multipart example of a request and response for discussion please?

The map for a multipart request could look as follows:

map {
  "status-only": true(),
  "headers": map {
    "user-agent": "BaseX"
  },
  "multipart": map {
    "boundary": "--AaB03x", (: optional :)
    "parts": (
      map {
        "headers" : map {
          "Content-Disposition": "file"
        },
        "body": map {
          "media-type": "image/png",
          "content": http:get("http://docs.basex.org/skins/vector/images/wiki.png")
        }
      }
    )
  }
}

As you may see, it’s pretty close to the structure of EXPath requests. In Zorba, an array was used for the entries of the parts key, but I think that a simple sequence does the job as well.

The response for multipart looks pretty much the same: Both the headers and the payload are returned as a single map. Similar to EXPath and Zorba, a multipart map entry would be returned instead of the body map entry.

You are right, the term “options” is misleading. We could give it a more generic name (such as “request”)?

Or we choose a more radical approach indeed, and add another 2nd parameter to http:post & http:put and a 3rd parameter to http:send (with a maximum of 4 parameters). I decided not to choose this path so far, because we would then get different formats for requests and responses (currently, they are more or less similar).

If we introduced an extra parameter for bodies in those 3 cases, we should probably return a sequence instead:

  • The first item would be the map, which contains the status, message, and the headers.
  • The remaining items would be maps with media-type and content, or with the multipart (boundary and single parts, containing the headers and bodies).

Here are the two representations:

(: Request with 2 arguments :)
http:post(
  'URI',
  map {
    'status-only': true(),
    'body': map {
      'media-type': 'text/plain',
      'content': file:read-text('bla')
    }
  }
)

(: Request with 3 arguments :)
http:post(
  'URI',
  map {
    'media-type': 'text/plain',
    'content': file:read-text('bla')
  },
  map {
    'status-only': true()
  }
)

For the second version, we would need to find an intuitive solution for indicating if a response is multipart or not.

I haven’t thought about nested multipart requests. In principle, all approaches here (requests and responses) can be arbitrarily nested.

for the body they supply a function instead of an xs:anyAtomicType or node(). I can see some use-cases where this would be important.

Currently, I’m not aware of any other functions that behave like this, but, yes, it could be an option. In that case, I would tend to use item() as type, and evaluate the body if it’s a function.
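
A minimal sketch of that evaluation step in standard XQuery 3.1. The helper name `local:resolve-body` is hypothetical and not part of the proposal. The test uses `function() as item()*` because an inline function without a declared return type is typed as returning `item()*` and would not match a narrower function test.

```xquery
(: Hypothetical helper: if the supplied body is a zero-arity function,
   call it to obtain the actual body; otherwise return it unchanged.
   Maps and arrays are functions of arity 1, so they pass through. :)
declare function local:resolve-body($body as item()?) as item()* {
  if ($body instance of function() as item()*)
  then $body()
  else $body
};

(: lazy body: the element is only constructed when the function is called :)
local:resolve-body(function() { <html>expensive computation...</html> }),
(: eager body: returned as-is :)
local:resolve-body('plain text')
```

With this shape, the decision about when to compute the body stays with the caller, and the HTTP function only needs a single check before serializing the payload.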

@ChristianGruen
Member Author

Some more thoughts on http:post (and, similar, http:put and http:send). We could define the following two signatures:

http:post($uri as xs:string, $payload as item()?) as map(*)
http:post($uri as xs:string, $payload as item()?, $options as map(*)) as map(*)

The following types are accepted for the payload:

Type Description
empty-sequence() No body
xs:string Send as text/plain
xs:base64Binary, xs:hexBinary Send as application/octet-stream
node() Send as application/xml (text/xml ?)
map(xs:string, item()) Map with…
• single body: media-type, content
• multipart body: boundary, parts

@adamretter: We could add an optional type function() as item()? for lazy evaluation.

The result is returned in a single map. If a payload is available, it is bound to the body or multipart key:

map {
  "message": "OK",
  "status": "200",
  "headers": map {
    "Last-Modified": "Thu, 1 Jan 1995 01:01:01 GMT" (: , ... :)
  },
  "body": map {
    "media-type": "text/html",
    "content": <html class="homepage" lang="en">...</html>
  }
}

The full list of function signatures:

http:options($uri as xs:string) as xs:string*
http:options($uri as xs:string, $options as map(*)) as xs:string*

http:get($uri as xs:string) as map(*)
http:get($uri as xs:string, $options as map(*)) as map(*)

http:post($uri as xs:string, $payload as item()?) as map(*)
http:post($uri as xs:string, $payload as item()?, $options as map(*)) as map(*)

http:put($uri as xs:string, $payload as item()?) as map(*)
http:put($uri as xs:string, $payload as item()?, $options as map(*)) as map(*)

http:delete($uri as xs:string) as map(*)
http:delete($uri as xs:string, $options as map(*)) as map(*)

http:head($uri as xs:string) as map(*)
http:head($uri as xs:string, $options as map(*)) as map(*)

http:send($uri as xs:string, $method as xs:string, $payload as item()?) as map(*)
http:send($uri as xs:string, $method as xs:string, $payload as item()?, $options as map(*)) as map(*)

@adamretter

adamretter commented Jul 29, 2018

  1. So I do like your idea of the request and response being the same format. Maybe it is enough to rename $options to $request (or even $data).

  2. I agree that using item() as the type so that users can control lazy-evaluation of a complex body calculation could work nicely.

  3. I have a few queries about the parts: field:

    1. Why use a sequence and not an array here? I can see that either would work fine, perhaps there are some advantages in using an array here; Perhaps even as simple as then making it parsable as JSON.
    2. I have some reservations about the use of either body or "parts". As I understand it the HTTP Request/Response has a "body" and that body might be a multipart body. How about if we could just have body and the content of that is an item()?; For multipart this would be an array of maps.

Request with normal body

"body": <html class="homepage" lang="en">...</html>

Request with lazy body

"body": function() { <html class="homepage" lang="en">expensive computation...</html> }

Request with explicit media-type conversion

"body": map {
    "media-type": "text/html",
    "content": <html class="homepage" lang="en">...</html>
}

Request with lazy explicit media-type conversion

"body": function() { map {
    "media-type": "text/html",
    "content": <html class="homepage" lang="en">expensive computation...</html>
}}

Request with multi-part body (two parts)

"body": [
    map {
        "content": <html class="homepage" lang="en">...</html>
    },
    map {
        "headers": map {
            "Content-Disposition": 'form-data; name="uploadedfile"; filename="hello.o"',
            "Content-Type": "application/x-object"
        },
        "content": someBase64==
    }
]

Request with multi-part body (one part)

"body": [
    map {
        "headers": map {
            "Content-Disposition": 'form-data; name="uploadedfile"; filename="hello.o"',
            "Content-Type": "application/x-object"
        },
        "content": someBase64==
    }
]

@ChristianGruen
Member Author

  1. So I do like your idea of the request and response being the same format. Maybe it is enough to rename $options to $request (or even $data).

And I developed sympathy for your suggestion to have explicit payload arguments… Someone else I talked to was wondering as well if a POST function without body makes sense.

  1. I have a few queries about the parts: field:
    i. Why use a sequence and not an array here?

I thought about using sequences, because some users still fight with the syntax of arrays (by mixing up array { ... } and [ ... ]). But you are right, JSON serialization would be simpler with arrays.

ii. I have some reservations about the use of either body or "parts". As I understand it the HTTP Request/Response has a "body" and that body might be a multipart body. How about if we could just have body and the content of that is an item()?

I like this idea a lot…

  • The boundary key for multipart messages, which the previous specs had and which I adopted in the first draft, should probably be skipped anyway, as it’s something the implementation can take care of.
  • The media-type for single bodies is redundant (I guess?), as we also have the Content-Type header.
  • Arrays, as you suggested, would probably be the better fit for multipart messages: The array type of the response could serve as indicator that the response is multipart.
  • If we stick with the $payload (or $body) argument, and if we want to have the same format for requests and responses, we could drop my intermediate idea of allowing maps, and instead supply the media type as header in the last argument:

An exemplary POST request:

(: OLD :)
http:post(
  'http://json.io/',
  map {
    'media-type': 'application/json',
    'content': '{ "key": "value" }'
  }
)

(: NEW :)
http:post(
  'http://json.io/',
  '{ "key": "value" }',
  map { 'headers': map { 'Content-Type': 'application/json' } }
)

An exemplary GET response:

map {
  'headers': map {
    'Content-Type': 'application/json'
  },
  'body': '{ "key": "value" }' (: string? map? binary? :)
}
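
If the array type serves as the multipart indicator, client code could tell the two response shapes apart with a single type test. The following is a minimal sketch in standard XQuery 3.1; `local:parts` is a hypothetical helper, and the response maps are hand-written stand-ins rather than output of any existing function.

```xquery
(: Hypothetical helper: return the parts of a multipart response as a
   sequence, or the single body if the response is not multipart. :)
declare function local:parts($response as map(*)) as item()* {
  let $body := $response?body
  return if ($body instance of array(*))
         then $body?*   (: multipart: unwrap the array members :)
         else $body     (: single body: return it unchanged :)
};

(: single-body response :)
local:parts(map {
  'headers': map { 'Content-Type': 'application/json' },
  'body': '{ "key": "value" }'
}),
(: multipart response with two parts :)
local:parts(map {
  'headers': map { },
  'body': [ map { 'content': 'first' }, map { 'content': 'second' } ]
})
```

Note that this check only works as long as a single body is never itself returned as an array, which is one reason to be careful about converting JSON responses to XQuery arrays implicitly.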

One thing I haven’t touched so far is in which format a response will be returned to the client. Currently, I simply adopted the existing conversion rules.

@ChristianGruen
Member Author

ChristianGruen commented Jul 29, 2018

PS: As JSON is one of the predominant data types for HTTP requests nowadays, we could even go as far as to interpret map arguments as JSON:

Body Type Content-Type
xs:string text/plain
xs:base64Binary, xs:hexBinary application/octet-stream
node() application/xml
map(*) application/json
array(*) multipart/mixed

@adamretter

adamretter commented Jul 29, 2018

The boundary key for multipart messages, which the previous specs had and which I adopted in the first draft, should probably be skipped anyway, as it’s something the implementation can take care of.

I can't think of a use-case where a user would need to specify the boundary manually. So unless anyone can claim otherwise, I think we can probably drop it.

The media-type for single bodies is redundant (I guess?), as we also have the Content-Type header.

I think we have to be a little bit careful here. We have to clearly differentiate between the HTTP Content-Type header, and cases for implicit/explicit conversion of the XDM data types to some form of serialized HTTP Request data. For example if you specify text/html as the content type, you may still want to actually send XHTML 1.1 as the content body.

Perhaps we should work next on defining implicit and explicit serialization rules?
It seems sensible to me to decouple the HTTP Content-Type header from the serialization itself, as many HTTP services can be fickle and the user likely has a better knowledge of what the server requires (even if it is non-sensical). We can of course offer some implicit serialization rules, but for explicit serialization, I think we should incorporate https://www.w3.org/TR/xslt-xquery-serialization-30/ either as an additional function parameter, or perhaps within the "options", although as an additional parameter seems to fit better with existing methods like fn:serialize etc.

we could even go as far as to interpret map arguments as JSON:

Whilst I like the spirit of that idea. Again, I think we need to be very careful here. Whilst we could do that for non-multipart, it brings in an asymmetry with a multipart request/response where each multipart is a Map(*), but is in fact not mapped to/from JSON.

@ChristianGruen
Member Author

ChristianGruen commented Jul 29, 2018

I think we have to be a little bit careful here. We have to clearly differentiate between the "Content-Type" header, and cases for implicit/explicit conversion of the XDM data type to some form of serialized HTTP Request data. For example if you specify "text/html" as the content type, you may still want to actually send XHTML 1.1 as the content body.

My thought here was that the content-type would only be assigned as header and passed on to the server if no explicit content-type was supplied by the user. If a user specifies the content-type in the request, (s)he still has the choice of converting the body to a binary or string before supplying it to the HTTP function.

It would work similarly for multipart responses: If a user supplies no Content-Type in the headers section of a multipart body, we would rely on the implicit conversion rules.

To summarize this (all rules would apply to both the single main body as well as the multipart bodies):

  1. Implicit conversion/serialization rules (to discuss):
    1. xs:...Binary will be sent as-is (→ the implicit conversion can always be circumvented by supplying binary data).
    2. xs:string could be serialized as UTF-8.
    3. node() could be serialized as XML.
    4. map(*) could be serialized as JSON.
  2. The media-type, which results from the implicit conversion, will be sent as Content-Type if no other content-type header was specified by the user.
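
The two rules above can be sketched in standard XQuery 3.1 as follows. Both helper names are hypothetical, the rules themselves are still marked as open for discussion, and the `array(*)` case is taken from the earlier table rather than from this list.

```xquery
(: Hypothetical helper: media type implied by the type of the body. :)
declare function local:implicit-type($body as item()?) as xs:string? {
  typeswitch ($body)
    case xs:base64Binary | xs:hexBinary return 'application/octet-stream'
    case xs:string return 'text/plain'
    case node()    return 'application/xml'
    case array(*)  return 'multipart/mixed'
    case map(*)    return 'application/json'
    default        return ()
};

(: Hypothetical helper: add the implied Content-Type header only if the
   user did not supply one (header names are compared case-insensitively). :)
declare function local:effective-headers(
  $body    as item()?,
  $headers as map(*)
) as map(*) {
  let $names := map:keys($headers) ! lower-case(.)
  let $type  := local:implicit-type($body)
  return if ($names = 'content-type' or empty($type))
         then $headers
         else map:put($headers, 'Content-Type', $type)
};

(: no user header: Content-Type text/plain is added :)
local:effective-headers('{ "key": "value" }', map { }),
(: user header present: the map is returned unchanged :)
local:effective-headers('{ "key": "value" }',
  map { 'content-type': 'application/json' })
```

Keeping the type lookup separate from the header merge means the same lookup could be reused for each part of a multipart body.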

My early stance on all this was to completely drop implicit conversions in the newly defined functions, because the EXPath solution regularly leads to confused questions on our mailing list. A bare-bones design would have been to only allow a single binary item as body argument. However, that would have required additional functions for converting multipart messages to binary data… And it would have made streaming more difficult. The convenience conversions may simplify streaming in general.

I think there is an important difference between requests and responses:

Requests

As you indicated, a developer usually knows what the server requires, so we should have total control of the data that will be sent to the server.

Responses

  • Unfortunately, we cannot control the quality of the response.
  • Obviously, we should trust the server that the response conforms to the basic rules of the protocols (for example, of multipart messages).
  • However, we cannot take for granted anymore (in particular when working with older server instances) that a text/plain response will be encoded as UTF-8 (or that the encoding is specified at all in the response).
  • So maybe we shouldn’t do any implicit result conversions here and only return (single body and several multipart) items of type xs:base64Binary? The binary data could then be further processed by the implementation-specific conversion functions (in BaseX, that’d be bin:decode-string, fetch:xml-binary, html:parse, csv:parse, …).
  • To make this more user-friendly, we could have implicit conversions here as well (ideally based on the same rules as for sending requests), but we could make them optional. EXPath provides override-media-type to enforce a different conversion. I am not convinced that this is such a good solution…
    • Maybe we should rather have something like a boolean convert-response option?
    • Thinking of your suggestion, the existence of a serialize option could trigger an additional serialization of the result.

I think we should incorporate https://www.w3.org/TR/xslt-xquery-serialization-30/ either as an additional function parameter, or perhaps within the "options", although as an additional parameter seems to fit better with existing methods like fn:serialize etc.

Do you have an example of what this could look like? Would you like to have something like this for both requests and responses?

@adamretter

  1. Your first section regarding the Content-Type header sounds very reasonable. Just one question there still remains from above:

    iv. map(*) could be serialized as JSON.

    Whilst I like the spirit of that idea. Again, I think we need to be very careful here. Whilst we could do that for non-multipart, it brings in an asymmetry with a multipart request/response where each multipart is a Map(*), but is in fact not mapped to/from JSON. How do you see this working when you need a map to specify the multipart?

  2. In the responses section:

    1. However, we cannot take for granted anymore (in particular when working with older server instances) that a text/plain response will be encoded as UTF-8 (or that the encoding is specified at all in the response).

    Actually the HTTP spec says that unless the server explicitly specifies the character encoding, then it is in fact ISO-8859-1. So I think we should be able to rely on that.

    I do think that some implicit conversions of the response are reasonable. However, like with the request, I think we have to trust that the user/developer knows best. So wherever we employ implicit conversions, we should allow the user to override them with an explicit definition.

    1. Regarding serialization, yes, I think it could be equally applied to the request or the response to explicitly override any implicit conversion. I could try and work up some example if you like, however see the next point...
  3. I think at this point it might be simpler for us to collaborate on a short document (a simplified version of a spec if you like) where we can define the function signatures etc. I feel like this thread is becoming a little confusing for me. Do you have a preferred collaborative editing tool? Wiki, Google Doc, Markdown over email, other? Should I create something as a starting draft?

@ChristianGruen
Member Author

How do you see this working when you need a map to specify the multipart?

I will write down examples in an upcoming document (see below…).

Actually the HTTP spec says that unless the server explicitly specifies the character encoding, then it is in fact ISO-8859-1. So I think we should be able to rely on that.

Thanks, good to remember.

I think at this point it might be simpler for us to collaborate on a short document […]

Absolutely! I have just sent you an e-mail to discuss what might be the best domain (expath, exquery). Ideally, we can already use the collaborative result as final spec without too many adaptations.

@ChristianGruen ChristianGruen modified the milestones: 9.1, 9.2 Oct 30, 2018
@ChristianGruen ChristianGruen modified the milestones: 9.2, 10 Feb 7, 2022
@ChristianGruen ChristianGruen removed this from the 10 milestone Jul 31, 2022
@ChristianGruen
Member Author

Postponed to a later version.
