Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

THRIFT-5778 DRAFT/PROPOSAL: Thrift URI specification/implementation #2963

Closed
wants to merge 3 commits into from

Conversation

Jens-G
Copy link
Member

@Jens-G Jens-G commented Apr 21, 2024

Describing the endpoint specifics for a Thrift API today is a purely textual exercise, which leaves the client end with the task to set up and stack together a proper protocol/transport stack. That sometimes even leads to headaches, e.g. if the server requires e.g. framed protocol which might not be obvious. The use of a generally accepted, machine-readable and extensible Thrift URI format to describe client bindings could streamline that process.

Details see https://github.com/Jens-G/thrift/blob/thrift-uri/doc/specs/thrift-uri.md

@Jens-G Jens-G force-pushed the thrift-uri branch 3 times, most recently from 50dcab2 to 49a5928 Compare April 21, 2024 13:53
@Jens-G Jens-G changed the title DRAFT/PROPOSAL: Thrift URI specification/implementation THRIFT-5778 DRAFT/PROPOSAL: Thrift URI specification/implementation Apr 22, 2024
The data part consists of the target pipe, either name only or in full format:

Examples:
* thrift://binary/namedpipes?mypipe
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As referenced in https://www.ietf.org/rfc/rfc2396.txt, even though this is not how URI is defined, a common syntax used widely defines it as:

<scheme>://<authority><path>?<query>

so this will be parsed into:

  • scheme: thrift
  • authority: binary
  • path: /namedpipes
  • query: mypipe

and the authority part can be misleading/confusing.

I haven't read that RFC fully carefully yet, but I think that we can add another / to make authority part empty:

thrift:///binary/namedpipes?mypipe

or, if that doesn't work, then maybe we should try to define it similar to mailto:, without the // part followed by :.


# Motivation and use case

Describing the endpoint specifics for a Thrift API today is a purely text-adventure like exercise, which leaves the client side developer end with the tedious task to set up and put together a proper protocol/transport stack. That sometimes even leads to headaches, e.g. if the server requires e.g. framed protocol which might not be obvious. The use of a generally accepted, machine-readable and extensible Thrift URI format to describe client bindings could streamline that process.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we limit the scope of the issue to endpoints (e.g. limit to RPC part, and exclude pure serialization/deserialization use-cases), I would say the THeaderProtocol resolves the majority of the problems here.

THeaderProtocol already supports the auto detection of:

  • binary vs. compact
  • framed or not

it also supports zlib (but only if it's enabled inside Header, it does not support raw zlib transport unlike raw framed/binary/compact)

So the only issues left are:

  • json protocol is not covered (supported)
  • the underlying, final, raw transport used (socket/sslsocket/file/pipes/namedpipes)

(I think memory and buffered used in RPCs are pure implementation details and do not really affect the wire protocol, but please correct me if I'm wrong)

so maybe we should shift focus to support THeader in more language implementations (it's currently only implemented in c++/py/go)?

when a server uses THeaderProtocol, any of the following client will be able to talk to it without issue:

  • THeaderProtocol
  • TBinaryProtocol
  • TBinaryProtocol+TFramedTransport
  • TCompactProtocol
  • TCompactProtocol+TFramedTransport

(the final raw transport, e.g. one of socket/sslsocket/file/pipe/namedpipe, still need to be specified)

that won't resolve the json problem, obvious (I also don't think add json support to THeader is a good idea), but from my experience TJSONProtocol used in thrift RPC are usually more special cases (for example, only being combined with THTTPTransport) that it's usually unambiguous and has much less chance for the client to use the wrong protocol/transport.

Copy link
Member Author

@Jens-G Jens-G Apr 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the only issues left are:

  • Extensibility: How do I integrate my custom transport or protocol into the THeader system?
  • How does THeader deal with multiplex (I'm about to add that, not finsihed yet)?

memory and buffered used in RPCs are pure implementation details and do not really affect the wire protocol

Kind of "for the sake of completeness" thing. They probably could be safely excluded.

As referenced in https://www.ietf.org/rfc/rfc2396.txt, even though this is not how URI is defined, a common syntax used widely defines it as:

Yes I noticed. RFC3986 is more recent, but essentially tells the same story. Will be fixed soon.

@Jens-G
Copy link
Member Author

Jens-G commented Apr 24, 2024

THeaderProtocol already supports the auto detection of: [...] framed or not

  • Assumed, I absolutely want to force the use of framed transport for some reasons, I would have to derive an own variant and handle that myself? Otherwise bad counterparts could by crafting a suitable THeader force me into unframed mode which I don't want. Correct? Intended?

  • Same with HMAC. I can intercept a message, strip the HMAC stuff and then deliver some "improved" version of the message instead. Correct? Intended?

  • CLIENT_TYPE somehow reminds me of something.

@fishy
Copy link
Member

fishy commented Apr 24, 2024

THeaderProtocol already supports the auto detection of: [...] framed or not

  • Assumed, I absolutely want to force the use of framed transport for some reasons, I would have to derive an own variant and handle that myself? Otherwise bad counterparts could by crafting a suitable THeader force me into unframed mode which I don't want. Correct? Intended?

The "auto detection" part in THeader spec is more for auto "backward" compatibility with non-THeader clients. e.g. if a client talking to THeader server does not use THeader at all, it still works if it uses one of TBinary or TCompact protocol, and with or without TFramedTransport. but if a client actually uses THeader frame is enforced (it cannot write/read any header without frame)

  • Same with HMAC. I can intercept a message, strip the HMAC stuff and then deliver some "improved" version of the message instead. Correct? Intended?

if we have a MITM to rewrite the message then yes HMAC can be stripped.

I get your point, this proposal does add additional enforcement not possible with THeader alone.

@Jens-G
Copy link
Member Author

Jens-G commented Apr 27, 2024

Thanks for the feedback. I'm currently re-evaluating the entire design. I'll come up with an enhanced proposal later.

@Jens-G Jens-G closed this Apr 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants