-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document Proxy-Wasm ABI v0.2.1. #42
base: main
Are you sure you want to change the base?
Conversation
Fixes proxy-wasm#41. Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Please take a look, I'm sure there are some errors, but it should be fairly complete. I'll wait a week or two before merging this to make sure all mistakes are caught and fixed. Afterwards, I'll push v0.1.0 and v0.2.0 for completeness, since it's going to be trivial to do those. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started making some comments, then realized they would be the same through the whole doc. Hopefully this gives a good idea of what may be helpful.
abi-versions/v0.2.1/README.md
Outdated
Called when the Wasm module is first loaded. | ||
|
||
|
||
### `proxy_on_memory_allocate` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe somewhere say that either proxy_on_memory_allocate
or malloc
may be absent (not exported by the guest), but not both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little tricky. One of those callbacks is required in the current Envoy implementation, but technically plugins don't need to export this if they don't retrieve strings/vectors/buffers/maps/properties from the host.
|
||
|
||
### `wasi_snapshot_preview1.fd_write` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same.. summary comment if this is required, which would be odd. in fact, _start and this may be better put into an "optional support" section unless they are required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in general, it would be better to keep all things like this (things that are not defined in this ABI) as a historical appendix. That envoy's host exports a few functions is interesting info, so is that some compilers only have a wasi target. However, even though these topics affect portability, these are different than the ABI in general which is used by hosts besides envoy such as mosn cc @antJack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But those are neither historical not optional, and intercepting fd_write
is actually quite useful when integrating with existing 3rd party libraries that print to stdout
and/or stderr
.
Ideally, all Proxy-Wasm plugins should use proper logging facilitates and use proxy_log
, but that's not always possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, thanks for writing it!
- `SERIALIZATION_FAILURE` TODO | ||
- `INVALID_MEMORY_ACCESS` when `path_data`, `path_size`, | ||
`return_value_data` and/or `return_value_size` point to invalid | ||
memory address. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is incomplete without a description of the encoding of the input / output I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, but I don't think that should hold up this PR; personally I'd be fine with it landing in a follow-on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I'm treating proxy_{get,set}_property
in v0.2.1 as a opaque pass-thru to CEL (hence the comment at the beginning of this section), and I was hoping not to document existing paths, since the set isn't stable across host implementations.
Maybe we could redirect users to the documentation in proxies for CEL attributes?
In any case, I think CEL makes sense for a lot of use-cases (e.g. when used in expressions, or to access unspecified objects like Envoy's filter state or dynamic metadata), but I don't think it works great as a generic property getter for predefined values.
Notably:
- Querying using strings vs enums is wasteful, and inconsistent with rest of the Proxy-Wasm ABI.
- The set isn't stable, so it's impossible to enforce it at the ABI layer, and we had issues with that in the past.
- There is a some overlap with existing functions (e.g. access to header maps), which is source of some confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it works great as a generic property getter for predefined values.
huge +1
|
||
## HTTP fields and gRPC metadata | ||
|
||
TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Data encoding is almost as important as the function declarations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I'll add this shortly. I left TODO
because I didn't want to delay this PR any longer, but I intend to complete this sections before merging this PR.
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
|
||
#### `proxy_header_map_type_t` | ||
|
||
- `HTTP_REQUEST_HEADERS` = `0` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @PiotrSikora! Quick question: has it been considered to make URI arguments a map type as well? How are proxy-wasm users currently expected to manipulate that part of the request? Through platform/implementation-specific properties maybe? Would it make sense to introduce a HTTP_REQUEST_URI_ARGS
type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @thibaultcha, it's been a while!
I don't recall a map with URI query parameters ever being discussed, no.
Right now, the only option is to modify the :path
pseudo-header which contains the query parameters.
Adding HTTP_REQUEST_URI_ARGS
map that allows access to individual parameters could work, but I'm not sure if there is much value there vs modifying :path
directly. Most APIs I know would only separate path
and query
strings, but not give access to individual parameters. Are many users asking for it?
In any case, this PR is about documenting v0.2.1 as-is, so this is not something that can be introduced here.
|
||
Plugin must return one of the following values: | ||
- `CONTINUE` to forward `HTTP_RESPONSE_BODY` buffer downstream. | ||
- `PAUSE` to pause processing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the Rust SDK examples we recently noticed that returning PAUSE
expects the host implementation to buffer the response body until eof
, at which point it will call on_response_body
once more (with a full body and eof=true
). We saw this described in this example but not written explicitly so anywhere else. Would this be the right place to clarify this behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh, that is a terrible example, and it works only for HTTP responses with a small body, but it would deadlock with anything larger than the configured buffer size.
Hosts are expected to buffer request/response body while the processing is paused, but they can limit the amount of data that may be buffered. I've added small note about it, PTAL.
The issue is that v0.2.1 doesn't have a way to notify the plugin when that limit is reached. We could add a note that calling proxy_on_{request,response}_body
with the same buffer size twice in a row implies that the buffer is full, but that's a bit hacky and not currently implemented. @mpwarres any thoughts?
Note that this is fixed in vNEXT with a much richer set of actions (see: vNEXT#proxy_on_http_request_body), but that version is neither reviewed nor implemented yet.
Also, see my previous comment about buffering (proxy-wasm/proxy-wasm-cpp-host#143 (comment)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add a note that calling proxy_on_{request,response}_body with the same buffer size twice in a row implies that the buffer is full, but that's a bit hacky and not currently implemented.
I suppose that is not what the proxy-wasm-rust-sdk example expects, and I don't know how much flexibility there is for v0.2.1 and this corner-case, but what if we made it clear body handlers can always be invoked with more chunks so long as eof: false
? In short doing something like vNEXT
WaitForEndOrFull
, but implied.
- Handlers return
again
the first time which enables buffering. - The same handler expects another invocation with a chunk that may or may not fit in the body buffer.
- Next body handler invocation:
3a. Ifeof: true
, the full body fits in the buffer on this invocation.
3b. Ifeof: false
, the buffer may or may not be full, but either way the handler can consider this body to only be handled by chunks, and expect more invocations.
In this context, hosts are expected to print appropriate error logs for a full body buffer or a second again
return value. Filters are them expected to keep track of body handlers invocations themselves on a per-need basis. Or rather, body handlers are expected to always handle chunked bodies as a fail-safe.
Maybe this misses cases I haven't thought of, or maybe it matches existing implementations, I don't know. Plus of course I am unsure how much flexibility there is for existing host and filters of v0.2.1.
By the way, what would happen to the proxy-wasm-rust-sdk example on Envoy, would the body not fit the buffer? I ought to check but I don't know my way around Envoy very well...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall the actions need to extend a bit and make it more explicit? the Pause will actually convert into Http::FilterDataStatus::StopIterationAndBuffer
status implicitly which will buffer the response body. more context: tetratelabs/proxy-wasm-go-sdk#414
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM mod comments, acknowledging some TODOs could be addressed in later PRs.
- `SERIALIZATION_FAILURE` TODO | ||
- `INVALID_MEMORY_ACCESS` when `path_data`, `path_size`, | ||
`return_value_data` and/or `return_value_size` point to invalid | ||
memory address. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, but I don't think that should hold up this PR; personally I'd be fine with it landing in a follow-on.
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haven't reviewed all, but flushing the first comments. Thank you so much for your work! @PiotrSikora
Called to allocate continuous memory buffer of `memory_size` using | ||
the in-VM memory allocator. | ||
|
||
Plugin must return `memory_data` pointing to the start of the allocated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should clarify what Plugin
means here (this is the first appearance I guess), maybe adding a Terminology subsection?
> This callback has been deprecated in favor of [`proxy_on_memory_allocate`], | ||
> and it's called only in its absence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should clarify when this deprecation happens. Do you want to create an issue for something like "support policy"?
#### `proxy_on_vm_start` | ||
|
||
* params: | ||
- `i32 (uint32_t) plugin_context_id` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we missing a doc for this param? maybe at least say "unused" or something like that?
|
||
Plugin must return one of the following values: | ||
- `true` to indicate that the configuration was processed successfully. | ||
- `false` to indicate that the configuration processing failed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should document the desired semantic when false to answer the question like will the same Wasm VM be used?
|
||
Plugin must return one of the following values: | ||
- `true` to indicate that the configuration was processed successfully. | ||
- `false` to indicate that the configuration processing failed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same about semantics
- `INVALID_MEMORY_ACCESS` when `return_pairs_data` and/or | ||
`return_pairs_size` point to invalid memory address. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- `INVALID_MEMORY_ACCESS` when `return_pairs_data` and/or | |
`return_pairs_size` point to invalid memory address. | |
- `INVALID_MEMORY_ACCESS` when `return_serialized_pairs_data` and/or | |
`return_serialized_pairs_size` point to invalid memory address. |
- `i32 (`[`proxy_status_t`]`) status` | ||
|
||
Retrieves value (`return_value_data`, `return_value_size`) of the header | ||
key (`key_data`, `key_value`) from the header map `map_id`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment on header map
- `i32 (`[`proxy_status_t`]`) status` | ||
|
||
Adds key (`key_data`, `key_size`) with value (`value_data`, | ||
`value_size`) to the header map `map_id`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment on header map
- `i32 (`[`proxy_status_t`]`) status` | ||
|
||
Adds or replaces key's (`key_data`, `key_value`) value with the provided | ||
value (`value_data`, `value_size`) in the header map `map_id`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment on header map
* returns: | ||
- `i32 (`[`proxy_status_t`]`) status` | ||
|
||
Removes the key (`key_data`, `key_value`) from the header map `map_id`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment on header map
point to invalid memory address. | ||
|
||
|
||
## TCP/UDP/QUIC streams |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should limit this to only TCP streams
since UDP/QUIC are not supposed in v0.2.1.
invalid memory address. | ||
|
||
|
||
## Common stream operations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we clarify what Common
mean here? This includes not only TCP but also HTTP, right?
Called when upstream connection is closed. | ||
|
||
|
||
## HTTP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency, shouldn't this be HTTP Steams
(as in Common stream operations
)
|
||
> **Note** | ||
> The same unique `stream_context_id` is shared by | ||
> HTTP request and response. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trailers as well?
Plugin must return one of the following values: | ||
- `CONTINUE` to forward `HTTP_RESPONSE_BODY` buffer downstream. | ||
- `PAUSE` to pause processing. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we clarify that we cannot use proxy_send_local_response
at this point since the headers might have already arrived at the downstream?
Plugin must return one of the following values: | ||
- `CONTINUE` to forward `HTTP_RESPONSE_TRAILERS` fields downstream. | ||
- `PAUSE` to pause processing. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment on proxy_send_local_response
above.
- `i32 (const uint8_t *) serialized_trailers_data` | ||
- `i32 (size_t) serialized_trailers_size` | ||
- `i32 (uint32_t) timeout` | ||
- `i32 (uint32_t *) return_call_id` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we document how return_call_id
will be used?
* returns: | ||
- `i32 (`[`proxy_status_t`]`) status` | ||
|
||
Resolves existing shared queue using the provided VM ID (`vm_id_data`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we document what VM ID
mean here? this is the first appearance.
- `SERIALIZATION_FAILURE` TODO | ||
- `INVALID_MEMORY_ACCESS` when `path_data`, `path_size`, | ||
`return_value_data` and/or `return_value_size` point to invalid | ||
memory address. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it works great as a generic property getter for predefined values.
huge +1
cc @spacewander @shukitchan as non-Envoy implementors |
|
||
When `fd_id` is `STDOUT`, then it's logged at the `INFO` level. | ||
|
||
When `fd_id` is `STDERR`, then it's logged at the `ERROR` level. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some well-known loggers write log to stderr by default, like logrus (https://github.com/sirupsen/logrus/blob/dd1b4c2e81afc5c255f216a722b012ed26be57df/logger.go#L89).
It might cause trouble if all the logs are considered in ERROR level.
`buffer_id`. | ||
|
||
Returned `status` value is: | ||
- `OK` on success. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we doc the behavior when the return buffer is empty? (For example, request without a body)
|
||
### Functions exposed by the host | ||
|
||
#### `proxy_http_call` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PiotrSikora Is the function proxy_http_call
the replacement of proxy_dispatch_http_call
in the old spec ? What about the code ? Will dispatch_http_call
disappear ? I'm asking these questions because I'm developing proxies and it's not very clear to me. Thank you Piotr
* returns: | ||
- `i32 (`[`proxy_status_t`]`) status` | ||
|
||
Defines a new metric of type `metric_type` with a name (`name_data`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of questions about behaviors that are not specified here:
-
What happens if I give it a metric name that was previously defined? Do I get the preexisting id in
return_metric_id
?- What happens if I pass a pre-existing name but with a different
metric_type
?- Are the names namespaced by type and I get a new metric id?
- Do I get a
BAD_ARGUMENT
error? - Do I get the preexisting id and the type is ignored?
- Do I get the preexisting id and the type is updated?
- What happens if I pass a pre-existing name but with a different
-
This call creates a metric; how do I dispose of a metric? For other functions such as
proxy_set_property
setting the value to NULL can be assumed to mean "unset", but how to "undefine" and release memory from metrics?
|
||
Plugin must return one of the following values: | ||
- `CONTINUE` to forward `HTTP_REQUEST_HEADERS` fields downstream. | ||
- `PAUSE` to pause processing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
notice a case that the action pause is not actual pausing the processing: tetratelabs/proxy-wasm-go-sdk#425
`HTTP_REQUEST`. | ||
|
||
Additionally, instead of forwarding request upstream, a HTTP response | ||
can be sent using [`proxy_send_local_response`]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like there are several variations of local responses like:
- sendDirectLocalReply https://github.com/envoyproxy/envoy/blob/50324229d7cf6024c43c7fd4bae1e042beea4728/source/common/http/filter_manager.cc#L1099
- sendLocalReply https://github.com/envoyproxy/envoy/blob/50324229d7cf6024c43c7fd4bae1e042beea4728/source/common/http/filter_manager.cc#L944
- sendLocalReplyViaFilterChain https://github.com/envoyproxy/envoy/blob/50324229d7cf6024c43c7fd4bae1e042beea4728/source/common/http/filter_manager.cc#L1061
Is it possible to make here proxy_send_local_response
more explicitly to indicate which type specifically it will trigger or specifying clearly in the doc?
@PiotrSikora any update on when this might be merged? |
I'll try to wrap this this week. |
🎉 |
@PiotrSikora Is there anything blocking this? Reference documentation for |
Nothing major, I need to address some comments and do a minor cleanup. It's finally on top of my TODO list, so I should land it in a day or two. |
Sets all key-value pairs in the header map `map_id` to the provided | ||
[serialized] map (`serialized_pairs_data`, `serialized_pairs_size`). | ||
|
||
Returned `status` value is: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the expected behavior when the underlying function fails to set the header map pairs? (For example, if attempting to set response headers during proxy_on_response_body
, after the headers have already been sent).
What should be the result when failing to set the header map:
- set
status
toOK
and fail silently? - set
status
toBAD_ARGUMENT
, for trying to use amap_id
that is invalid in the given context? - issue a wasm trap, which triggers a panic in the client SDK?
Fixes #41.