
HBONE Specification? #660

Open
elevran opened this issue Aug 22, 2023 · 7 comments


@elevran

elevran commented Aug 22, 2023

I am considering writing a dataplane proxy that can interoperate with ambient mesh for experimental functionality that might not make sense inside the core Istio components.
Therefore, I'm looking to find a specification of the protocols used.

While control plane protocols are relatively accessible through the corresponding protobuf and xDS (workload and policy) and gRPC (identity) definitions, the HBONE protocol between dataplanes is not as directly accessible. Specifically, things such as the headers passed in requests and responses, the use of methods and paths, stream management, etc. are less "formally" defined and can only be inferred by reading the actual implementation code.

Reading through https://istio.io/latest/blog/2022/introducing-ambient-mesh/ it says:

Ambient mesh uses HTTP CONNECT over mTLS to implement its secure tunnels and insert waypoint proxies in the path,
a pattern we call HBONE (HTTP-Based Overlay Network Environment). HBONE provides for a cleaner encapsulation of traffic
than TLS on its own while enabling interoperability with common load-balancer infrastructure. FIPS builds are used by default
to meet compliance needs. More details on HBONE, its standards-based approach, and plans for UDP and other non-TCP
protocols will be provided in a future blog.

However, I was unable to find a detailed protocol specification on the Istio website or GitHub. Perhaps I somehow missed the aforementioned blog entry.

Is there such a spec available? If not, is there interest in having a contribution which documents it?

@elevran elevran changed the title HBONE Specification HBONE Specification? Aug 22, 2023
@keithmattix
Contributor

The closest thing we have is being submitted in a PR: istio/istio#46472. Take a read through that and see what gaps exist in the doc; we'll be happy to fill them.

@elevran
Author

elevran commented Aug 22, 2023

Thanks @keithmattix!
I'll go through that and note gaps (if any) in the PR discussion.

@kyessenov

This might be intentional since we don't want Istio to reinvent the wheel. The standardization effort is happening in https://www.ietf.org/archive/id/draft-schinazi-masque-02.html. The specific instance used by Istio differs in two ways:

  • it's mTLS, not TLS - QUIC implementations are not ready for mutual TLS yet
  • it's only TCP - hence it's regular HTTP CONNECT with the authority populated (no need for extended CONNECT or UDP yet)
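For readers less familiar with HTTP/2, "regular HTTP CONNECT with the authority populated" maps onto a very small header block. The sketch below is a minimal illustration based on RFC 9113, section 8.5, not on ztunnel's code; the function name is hypothetical, and it assumes, per the bullet above, that `:authority` carries the destination address and port.

```python
from __future__ import annotations


def connect_pseudo_headers(target_host: str, target_port: int) -> list[tuple[str, str]]:
    """Build the pseudo-header block of a plain (non-extended) HTTP/2 CONNECT.

    RFC 9113, section 8.5: only :method and :authority are sent;
    :scheme and :path MUST be omitted.
    """
    if not 0 < target_port < 65536:
        raise ValueError(f"invalid port: {target_port}")
    # IPv6 literals are bracketed inside an authority (RFC 3986, section 3.2.2).
    host = f"[{target_host}]" if ":" in target_host else target_host
    return [
        (":method", "CONNECT"),
        (":authority", f"{host}:{target_port}"),
    ]


# Example: tunnel a TCP connection to a (hypothetical) destination pod.
print(connect_pseudo_headers("10.244.1.7", 8080))
# [(':method', 'CONNECT'), (':authority', '10.244.1.7:8080')]
```

After a 2xx response, the tunneled TCP bytes simply travel in DATA frames on that stream.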

The operational concerns (stream management, flow control) are handled via regular HTTP means.
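"Regular HTTP means" for flow control are HTTP/2's per-stream and per-connection windows. The hypothetical sketch below models only the sender's bookkeeping described in RFC 9113 (a DATA frame is bounded by both windows; WINDOW_UPDATE on stream 0 refills the connection). It is not ztunnel's or Envoy's implementation, and `FlowControl` is an illustrative name.

```python
from __future__ import annotations

from dataclasses import dataclass, field

DEFAULT_WINDOW = 65_535  # RFC 9113 initial window (connection and each stream)
MAX_WINDOW = 2**31 - 1   # largest legal flow-control window


@dataclass
class FlowControl:
    """Sender-side view of HTTP/2 flow control on one connection."""

    connection_window: int = DEFAULT_WINDOW
    stream_windows: dict[int, int] = field(default_factory=dict)

    def open_stream(self, stream_id: int, initial: int = DEFAULT_WINDOW) -> None:
        self.stream_windows[stream_id] = initial

    def send(self, stream_id: int, wanted: int) -> int:
        """Consume window for a DATA frame; returns the bytes actually allowed."""
        n = max(0, min(wanted, self.stream_windows[stream_id], self.connection_window))
        self.stream_windows[stream_id] -= n
        self.connection_window -= n
        return n

    def window_update(self, stream_id: int, increment: int) -> None:
        """Apply a WINDOW_UPDATE; stream id 0 targets the whole connection."""
        if not 1 <= increment <= MAX_WINDOW:
            raise ValueError("WINDOW_UPDATE increment must be in 1..2^31-1")
        current = self.connection_window if stream_id == 0 else self.stream_windows[stream_id]
        if current + increment > MAX_WINDOW:
            raise OverflowError("FLOW_CONTROL_ERROR: window would exceed 2^31-1")
        if stream_id == 0:
            self.connection_window += increment
        else:
            self.stream_windows[stream_id] += increment


fc = FlowControl()
fc.open_stream(1)
fc.open_stream(3)
print(fc.send(1, 60_000))  # 60000: stream 1 uses most of the connection window
print(fc.send(3, 60_000))  # 5535: stream 3 is capped by what is left on the connection
```

The second call shows how one busy stream can crowd out others that share a connection until the receiver sends WINDOW_UPDATE frames.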

The one extra header (baggage) has been removed from the protocol specification after extensive discussion.

@kyessenov

Client authorization is done in a simplistic way: the source IP and identity are forged by ztunnel. Neither delegation (a separate ztunnel identity) nor client forwarding (XFF) is needed right now, but either could easily be added.
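A minimal sketch of what such a simplistic check could look like on the receiving side. It assumes Istio's usual SPIFFE identity format (`spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>`) taken from the peer certificate, plus a hypothetical policy combining an allowed-namespace set with a source CIDR. `Peer`, `parse_istio_spiffe` and `authorize` are illustrative names, not Istio's authorization engine.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    spiffe_id: str  # identity from the client certificate (URI SAN)
    source_ip: str  # original source address, as set ("forged") by ztunnel


def parse_istio_spiffe(spiffe_id: str) -> tuple[str, str, str]:
    """Split spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>."""
    prefix = "spiffe://"
    if not spiffe_id.startswith(prefix):
        raise ValueError(f"not a SPIFFE ID: {spiffe_id}")
    trust_domain, _, path = spiffe_id[len(prefix):].partition("/")
    parts = path.split("/")
    if len(parts) != 4 or parts[0] != "ns" or parts[2] != "sa":
        raise ValueError(f"unexpected identity path: {path}")
    return trust_domain, parts[1], parts[3]


def authorize(peer: Peer, allowed_namespaces: set[str], allowed_cidr: str) -> bool:
    """Hypothetical policy: the namespace must be allowed AND the IP in range."""
    _, namespace, _ = parse_istio_spiffe(peer.spiffe_id)
    source = ipaddress.ip_address(peer.source_ip)
    return namespace in allowed_namespaces and source in ipaddress.ip_network(allowed_cidr)
```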

@bleggett
Contributor

bleggett commented Aug 24, 2023

This might be intentional since we don't want Istio to reinvent the wheel.

Well, we invented a codename/acronym, and we have to at least contend with that :D

My $0.02: just put @kyessenov's explanation above into architecture/HBONE.md and call it a day.

@elevran
Author

elevran commented Oct 26, 2023

@kyessenov - apologies for not replying earlier, hope you are ok restarting this conversation.

I've taken the time to read through the ztunnel architecture doc and the code.

This might be intentional since we don't want Istio to reinvent the wheel. The standardization effort is happening in https://www.ietf.org/archive/id/draft-schinazi-masque-02.html. The specific instance used by Istio differs in two ways:

  • it's mTLS, not TLS - QUIC implementations are not ready for mutual TLS yet
  • it's only TCP - hence it's regular HTTP CONNECT with the authority populated (no need for extended CONNECT or UDP yet)

Based on my (admittedly limited) understanding of Rust, and in line with what you've written above, it seems that the transport connection is H2 frames over TCP. Connections with identical client and server identities are pooled into a single ztunnel connection (TCP socket). Is that correct?

Also, which identities (i.e., certificates) are used between ztunnels? I think it's the workloads' identities, presented at each TCP connection establishment, but I would like to confirm. In addition, since it is H2/TCP, it would seem that mTLS is supported, but I may have misunderstood the Rust TLS acceptor setup.

The operational concerns (stream management, flow control) are handled via regular HTTP means.

I understand how the operational concerns of different (i.e., "unpooled") connections between ztunnel gateways are handled, as each is its own TCP connection and the kernel stack handles fairness, congestion, etc.
It is less clear to me how the operational concerns of multiple client-server connections using the same tunnel are handled. Is the ztunnel implementation responsible for allocating a fair share of the TCP connection among multiple pooled streams?

What was the rationale for pooling (as opposed to opening a new ztunnel socket for each client-server pair)? Was it resource usage, the latency of establishing new streams (the current scheme may avoid the round trips needed for TCP establishment), or something else?

Is there a general timeline for switching over to MASQUE/QUIC (via https://www.ietf.org/archive/id/draft-schinazi-masque-02.html or any of the more specific encodings specified under the WG)?

@kyessenov

Based on my (admittedly limited) understanding of Rust, and in line with what you've written above, it seems that the transport connection is H2 frames over TCP. Connections with identical client and server identities are pooled into a single ztunnel connection (TCP socket). Is that correct?

Yes. This is the same for Envoy, and should not be Rust specific, really.
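To make the pooling key concrete, here is a hypothetical sketch, not ztunnel's or Envoy's code. It keys connections on the client and server identities, as confirmed above, and additionally on the remote address. The address is an assumption added because streams to two different hosts cannot share one TCP connection.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class PoolKey:
    src_identity: str  # identity presented on behalf of the source workload
    dst_identity: str  # identity expected from the destination
    dst_addr: str      # remote address (assumption: also part of the key)


@dataclass
class Connection:
    conn_id: int
    key: PoolKey


@dataclass
class Pool:
    conns: dict[PoolKey, Connection] = field(default_factory=dict)
    handshakes: int = 0  # how many mTLS handshakes a real pool would have done

    def get(self, key: PoolKey) -> Connection:
        """Reuse the connection for this key, or 'dial' a new one."""
        conn = self.conns.get(key)
        if conn is None:
            self.handshakes += 1  # a real implementation would do TCP + mTLS here
            conn = Connection(conn_id=len(self.conns), key=key)
            self.conns[key] = conn
        return conn
```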

Also, which identities (i.e., certificates) are used between ztunnels? I think it's the workloads' identities, presented at each TCP connection establishment, but I would like to confirm. In addition, since it is H2/TCP, it would seem that mTLS is supported, but I may have misunderstood the Rust TLS acceptor setup.

Ztunnel doesn't present its own identity, it presents the identities of the source/destination pods on their behalf. This is a limitation, but it works for now.
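One way to picture presenting identities on behalf of pods is a node proxy that holds one certificate per local workload identity and picks it by the connection's source address, while expecting the peer to present the destination's identity. The sketch below is hypothetical: `Workload`, `CertStore` and `outbound_tls_params` are illustrative names, the byte strings stand in for real certificates, and no actual TLS is performed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    ip: str
    identity: str  # e.g. spiffe://cluster.local/ns/<namespace>/sa/<service-account>


class CertStore:
    """Certificates the node proxy holds for the workloads it serves."""

    def __init__(self) -> None:
        self._certs: dict[str, bytes] = {}

    def install(self, identity: str, cert: bytes) -> None:
        self._certs[identity] = cert

    def cert_for(self, identity: str) -> bytes:
        if identity not in self._certs:
            raise LookupError(f"no certificate for {identity}")
        return self._certs[identity]


def outbound_tls_params(
    src_ip: str,
    dst_ip: str,
    local: dict[str, Workload],
    remote: dict[str, Workload],
    store: CertStore,
) -> tuple[bytes, str]:
    """Return (certificate to present, identity the server must present)."""
    src = local[src_ip]   # the source pod runs on this node
    dst = remote[dst_ip]  # the destination is known from the control plane
    return store.cert_for(src.identity), dst.identity
```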

I understand how the operational concerns of different (i.e., "unpooled") connections between ztunnel gateways are handled, as each is its own TCP connection and the kernel stack handles fairness, congestion, etc. It is less clear to me how the operational concerns of multiple client-server connections using the same tunnel are handled. Is the ztunnel implementation responsible for allocating a fair share of the TCP connection among multiple pooled streams?

Yes, it should be responsible for fair-sharing network resources and handling noisy neighbors. This is my main concern about the production readiness of ambient - I don't think we have enough evidence that the current settings are adequate.
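As one illustration of fair sharing among pooled streams, the sketch below uses deficit round robin (DRR), a classic fair-queuing scheduler. It is a swapped-in technique chosen for illustration; the thread does not say ztunnel uses DRR, and `quantum` is a hypothetical tuning parameter (bytes of credit per stream per round).

```python
from __future__ import annotations

from collections import deque


def deficit_round_robin(queues: dict[int, list[int]], quantum: int) -> list[tuple[int, int]]:
    """Interleave frames from several streams onto one connection.

    queues maps stream id -> sizes (bytes) of its pending frames, in order.
    Returns the send order as (stream_id, frame_size) pairs.
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    pending = {sid: deque(sizes) for sid, sizes in queues.items() if sizes}
    deficit = {sid: 0 for sid in pending}
    active = deque(pending)  # round-robin order of streams that have data
    order: list[tuple[int, int]] = []
    while active:
        sid = active.popleft()
        deficit[sid] += quantum
        q = pending[sid]
        # Send frames while this stream's accumulated credit covers them.
        while q and q[0] <= deficit[sid]:
            size = q.popleft()
            deficit[sid] -= size
            order.append((sid, size))
        if q:
            active.append(sid)  # still backlogged: revisit next round
        else:
            deficit[sid] = 0  # idle streams do not bank credit
    return order
```

With a backlogged stream 1 and a small stream 3, stream 3's frame goes out in the first round instead of waiting behind all of stream 1's data.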

What was the rationale for pooling (as opposed to opening a new ztunnel socket for each client-server pair)? Was it resource usage, the latency of establishing new streams (the current scheme may avoid the round trips needed for TCP establishment), or something else?

The main benefit of pooling is reducing the overhead of the TLS handshake. I don't know how ztunnel pools, but in Envoy we set max streams to 100 to ensure better CPU utilization.
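A hypothetical sketch of the max-streams idea: reuse a pooled connection until it carries `max_streams` concurrent streams (100 here, matching the Envoy setting mentioned above), then pay for one more handshake. `CappedPool` and its methods are illustrative names, not an Envoy or ztunnel API.

```python
from __future__ import annotations


class CappedPool:
    """Assign streams to connections, opening a new one only when all are full."""

    def __init__(self, max_streams: int = 100) -> None:
        if max_streams < 1:
            raise ValueError("max_streams must be >= 1")
        self.max_streams = max_streams
        self.active: list[int] = []  # concurrent stream count per connection
        self.handshakes = 0          # connections opened so far

    def open_stream(self) -> int:
        """Return the index of the connection that carries the new stream."""
        for i, n in enumerate(self.active):
            if n < self.max_streams:
                self.active[i] += 1
                return i
        # Every connection is at its cap: open (and handshake) another one.
        self.active.append(1)
        self.handshakes += 1
        return len(self.active) - 1

    def close_stream(self, conn: int) -> None:
        if self.active[conn] == 0:
            raise ValueError("no open streams on this connection")
        self.active[conn] -= 1
```

Under these assumptions, 250 concurrent streams cost 3 handshakes instead of 250 with one connection per stream.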

Is there a general timeline for switching over to MASQUE/QUIC (via https://www.ietf.org/archive/id/draft-schinazi-masque-02.html or any of the more specific encodings specified under the WG)?

You don't need full MASQUE if all traffic is TCP. We will need it when UDP or arbitrary IP traffic becomes in scope for Istio.
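For contrast, proxying UDP needs the extended CONNECT method (RFC 8441, negotiated with SETTINGS_ENABLE_CONNECT_PROTOCOL) together with CONNECT-UDP, which the MASQUE work later standardized as RFC 9298. Below is a minimal sketch of that request's pseudo-headers, assuming RFC 9298's default URI template; the function name and the proxy authority used in testing are hypothetical.

```python
from __future__ import annotations

from urllib.parse import quote


def connect_udp_pseudo_headers(
    proxy_authority: str, target_host: str, target_port: int
) -> list[tuple[str, str]]:
    """Pseudo-headers for an HTTP/2 extended CONNECT request that opens a UDP proxy.

    Uses the default template /.well-known/masque/udp/{target_host}/{target_port}/;
    characters such as ':' in IPv6 hosts are percent-encoded.
    """
    if not 0 < target_port < 65536:
        raise ValueError(f"invalid port: {target_port}")
    path = f"/.well-known/masque/udp/{quote(target_host, safe='')}/{target_port}/"
    return [
        (":method", "CONNECT"),
        (":protocol", "connect-udp"),     # extended CONNECT: names the tunneled protocol
        (":scheme", "https"),
        (":authority", proxy_authority),  # the proxy, not the UDP target
        (":path", path),
    ]
```

Unlike the plain TCP CONNECT, `:scheme` and `:path` are present, and `:authority` names the proxy rather than the target.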
