Consider allowing query strings in SPIFFE IDs #88

Open
jkanywhere opened this issue Oct 19, 2018 · 6 comments

Comments

@jkanywhere

Hello SPIFFE team,

I have a use case which would benefit from using query strings in SPIFFE IDs, a use currently forbidden by The SPIFFE Identity and Verifiable Identity Document standard. ("Valid SPIFFE IDs ... MUST NOT include a query ... component.")
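
As a rough illustration of the rule quoted above, here is a minimal Python sketch that flags a SPIFFE ID carrying a query component. The function name `has_query_component` is hypothetical, and the check covers only the query rule, not the rest of the standard.

```python
from urllib.parse import urlsplit


def has_query_component(spiffe_id: str) -> bool:
    """Return True if the URI carries a query component (currently forbidden)."""
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe":
        raise ValueError(f"not a spiffe:// URI: {spiffe_id!r}")
    # Drop any fragment first so a "?" inside it is not mistaken for a query.
    # Looking for "?" rather than parts.query also catches an empty query.
    without_fragment = spiffe_id.split("#", 1)[0]
    return "?" in without_fragment
```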

Our services have [at least] two attributes

  • application id (app) - Used when services have multiple entry points or execution modes, such as web application or batch processing. Also used when the same service runs with a variety of different configurations such as a log processor ingesting from two different streams.
  • run time environment (env) - Used to indicate production vs staging. Production takes customer traffic, staging does not.

Both app and env are used to make authorization decisions: staging services can only write to other staging services; different applications should be able to read only from their corresponding input source.

(A similar example is given in the 2.2 Path section using service account and namespace.)

Adding app and env to URI path implies a hierarchical relationship when there is none:

  • Do applications live inside an environment, for example
    spiffe://k8s-west.example.com/service/foo/env/staging/app/streaming
  • or, is environment a sub-component of an application?
    spiffe://k8s-west.example.com/service/foo/app/streaming/env/staging

Neither is strictly true; app and env are orthogonal concepts. (The problem is multiplied when adding a third attribute.)

To highlight the key-value nature of the data, we could use =, one of the "sub-delims" reserved by RFC 3986, instead of /:
spiffe://k8s-west.example.com/service/foo/env=staging/app=streaming
spiffe://k8s-west.example.com/service/foo/app=streaming/env=staging

We can of course build application logic that would treat these two URIs as identical, however the naive approach would treat them differently and most developers are used to treating paths as hierarchical, regardless of separator.
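
A minimal sketch of such application logic, assuming that any path segment containing = is a key=value attribute and every other segment is positional. The helper name `split_attrs` and the duplicate-key rule are assumptions for illustration, not part of any SPIFFE library.

```python
from urllib.parse import urlsplit


def split_attrs(spiffe_id):
    """Return (trust domain, positional segments, unordered attributes)."""
    parts = urlsplit(spiffe_id)
    positional, attrs = [], {}
    for segment in parts.path.strip("/").split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            if key in attrs:
                # Assumption: a repeated key is an error, not "last one wins".
                raise ValueError(f"duplicate attribute {key!r}")
            attrs[key] = value
        else:
            positional.append(segment)
    return parts.netloc, tuple(positional), attrs


a = "spiffe://k8s-west.example.com/service/foo/env=staging/app=streaming"
b = "spiffe://k8s-west.example.com/service/foo/app=streaming/env=staging"

naive_equal = a == b                              # plain string comparison
parsed_equal = split_attrs(a) == split_attrs(b)   # dict comparison ignores order
```

Here `naive_equal` is false while `parsed_equal` is true, which is the gap between the naive approach and the custom logic.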

On the other hand, most web application developers and URL parsing libraries already treat a query string as an unordered map, so it is clearer that
spiffe://k8s-west.example.com/service/foo?env=staging&app=streaming
is identical to
spiffe://k8s-west.example.com/service/foo?app=streaming&env=staging.
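
For illustration, here is how a standard URL library treats the two forms, sketched with Python's `urllib.parse`. The helper name `identity_key` is hypothetical, and these URIs are not valid SPIFFE IDs under the current standard.

```python
from urllib.parse import parse_qs, urlsplit


def identity_key(uri):
    """Split a URI and parse its query string into an unordered mapping."""
    parts = urlsplit(uri)
    # parse_qs returns a dict such as {"env": ["staging"], "app": ["streaming"]};
    # dict equality does not depend on the order of the keys.
    return parts.scheme, parts.netloc, parts.path, parse_qs(parts.query)


first = "spiffe://k8s-west.example.com/service/foo?env=staging&app=streaming"
second = "spiffe://k8s-west.example.com/service/foo?app=streaming&env=staging"

same_string = first == second
same_identity = identity_key(first) == identity_key(second)
```

The strings differ, so `same_string` is false, but the parsed forms compare equal.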

How would you recommend handling this situation?
What are your reasons for excluding query component from SPIFFE URIs? (Perhaps trying to add metadata to identity is a mistake in the first place.)
Are you open to expanding the standard?

Thank you.

@evan2645
Member

Hi Josh - thank you very much for filing this.

There is currently active discussion within the community on this point... I will share some thoughts/information here and am curious to hear what you think.

Adding app and env to URI path implies a hierarchical relationship when there is none

The spec does not specify hierarchy in any portion of the SPIFFE ID, however it is definitely not forbidden, and it is understood that in many cases a hierarchy does make sense. We expect site-specific hierarchies to arise as a result.

I am personally not a fan of the key-value pattern in the path component. It usually leads to someone trying to parse the pairs out of the path... Square has adopted a hierarchical pattern in their deployment, and they write authorization rules using wildcards (where necessary). Here is an example of how they are doing it: ghostunnel/ghostunnel#186

We can of course build application logic that would treat these two URIs as identical, however the naive approach would treat them differently and most developers are used to treating paths as hierarchical, regardless of separator.

I think there is a similar problem here with query component, as the extra naive approach is to do a straight string comparison on the URI SAN. This is the primary reason that we have avoided supporting query, k/v pair, etc.

What are your reasons for excluding query component from SPIFFE URIs? (Perhaps trying to add metadata to identity is a mistake in the first place.)

It is my personal opinion that adding metadata to the identity document will lead to interoperability problems. One of the major problems SPIFFE is trying to address (particularly with X.509) is that different X.509-using implementations encode information into the document in all manner of ways (e.g. common name, multiple SANs, OU, etc). Reconciling these differences is difficult or impossible.

Adding metadata which is then used to make authorization decisions feels very similar to me. Additional metadata will need to be opaque as far as the standard goes, since we won't be able to capture all the use cases in a rigid set. Deployments leveraging custom metadata will face challenges interoperating with deployments which aren't aware of the metadata scheme in use. IMO, it conflates authentication and authorization.

I think that the ideal scenario is that such metadata originates from a different system (perhaps the authz system itself), where the SPIFFE ID serves as the primary key. Though I understand that this introduces another challenge, which is the reason we are hearing that folks want to put metadata into the SVID (this has come up a few times already).
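
A minimal sketch of that arrangement, assuming a hypothetical attribute table owned by the authorization system. The SPIFFE IDs, the table, and the `may_write` function are all invented for illustration. Only the "staging writes to staging" rule from the original post is modeled; everything else is allowed by assumption.

```python
# Hypothetical attribute table kept outside the SVID.
# The SPIFFE IDs are opaque primary keys; none of them encode app or env.
ATTRIBUTES = {
    "spiffe://k8s-west.example.com/service/foo-stream": {"app": "streaming", "env": "staging"},
    "spiffe://k8s-west.example.com/service/foo-batch": {"app": "batch", "env": "production"},
    "spiffe://k8s-west.example.com/service/bar": {"app": "web", "env": "staging"},
}


def may_write(source_id, target_id):
    """Decide a write using attributes looked up by SPIFFE ID."""
    source = ATTRIBUTES.get(source_id)
    target = ATTRIBUTES.get(target_id)
    if source is None or target is None:
        return False  # unknown identity: deny by default
    if source["env"] == "staging":
        # Staging services may only write to other staging services.
        return target["env"] == "staging"
    return True
```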

The approach that Square is taking (linked above) is a good one, as it doesn't sacrifice interoperability and is flexible enough to accomplish some of the things you are asking about. I have also heard of folks using OPA bundles to ship this kind of metadata to the authorization system in a way that remains performant.

Are you open to expanding the standard?

We are absolutely open to changing things around if/when/where it makes sense. This is by all means a community-led project, and we want to see it provide value and solve problems. The SPIFFE community holds bi-weekly calls to discuss matters relevant to the SPIFFE specification set - it would be a great venue to hold a high bandwidth conversation on this topic. Curious to hear your thoughts on all this, and thanks again for bringing it up.

@jkanywhere
Author

Adding app and env to URI path implies a hierarchical relationship when there is none

The spec does not specify hierarchy in any portion of the SPIFFE ID, however it is definitely not forbidden, and it is understood that in many cases a hierarchy does make sense. We expect site-specific hierarchies to arise as a result.

I agree the spec allows hierarchy in any portion of the SPIFFE ID. In my view our infrastructure is non-hierarchical with respect to app and env, and I prefer not to convey hierarchy by forcing an order in the path component.

I am personally not a fan of the key-value pattern in the path component. It usually leads to someone trying to parse the pairs out of the path...

In my experience someone definitely tries to parse identity strings.
I think they would try parsing equally given any of spiffe://.../app/bar/, spiffe://.../app=bar/, spiffe://...?app=bar.
I have use cases where parsing is needed to enforce complex authorization decisions. I want to make it easier and less error-prone by using a key-value delimiter different from the path delimiter, along with rigorous internal standards for the meaning of these SPIFFE IDs.

I'm open to parsing or matching against regular expression, wildcard, or more complex key/value pairs.

Another approach is to specify a strict ordering and force all SPIFFE IDs to include service name, application, and environment even when not needed. Even if I left out the "keys", someone would still try to parse spiffe://k8s-west.example.com/foo/bar/production; it would just be harder for humans to remember what it meant.

Square has adopted a hierarchical pattern in their deployment, and they write authorization rules using wildcards (where necessary). Here is an example of how they are doing it: square/ghostunnel#186

To allow all environments of application bar of service foo with the * matcher I would need two matchers:

  • spiffe://k8s-west.example.com/service/foo/app/bar/**
  • spiffe://k8s-west.example.com/service/foo/env/*/app/bar/**

And it would break when adding a third attribute, e.g. deployment, because
spiffe://k8s-west.example.com/service/foo/deployment/prod1/app/bar/ matches neither pattern.

If ** is permitted anywhere I could use a single matcher
spiffe://k8s-west.example.com/service/foo/**/app/bar/**.

I could also use spiffe://k8s-west.example.com/service/foo/**/app=bar/** which is better in case bar is also a valid attribute name.
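
A minimal sketch of such a matcher, assuming * matches exactly one path segment and ** matches zero or more segments anywhere in the pattern. That reading is an assumption chosen to fit the examples above, not the documented behavior of any existing tool, and the function names are hypothetical.

```python
def match(pattern, spiffe_id):
    """Match a SPIFFE ID against a pattern, segment by segment."""
    return _match(pattern.split("/"), spiffe_id.split("/"))


def _match(pat, segs):
    if not pat:
        return not segs  # pattern exhausted: succeed only if the ID is too
    head, rest = pat[0], pat[1:]
    if head == "**":
        # Let ** absorb 0, 1, 2, ... segments and try the rest of the pattern.
        return any(_match(rest, segs[i:]) for i in range(len(segs) + 1))
    if not segs:
        return False
    if head == "*" or head == segs[0]:
        return _match(rest, segs[1:])
    return False


no_env = "spiffe://k8s-west.example.com/service/foo/app/bar/**"
with_env = "spiffe://k8s-west.example.com/service/foo/env/*/app/bar/**"
anywhere = "spiffe://k8s-west.example.com/service/foo/**/app/bar/**"
deployment_id = "spiffe://k8s-west.example.com/service/foo/deployment/prod1/app/bar/"

matched_by_pair = match(no_env, deployment_id) or match(with_env, deployment_id)
matched_by_single = match(anywhere, deployment_id)
```

Under these assumptions `matched_by_pair` is false and `matched_by_single` is true. The recursion can backtrack heavily on patterns with many ** segments, so a production matcher would likely need a different strategy.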

We can of course build application logic that would treat these two URIs as identical, however the naive approach would treat them differently and most developers are used to treating paths as hierarchical, regardless of separator.

I think there is a similar problem here with query component, as the extra naive approach is to do a straight string comparison on the URI SAN. This is the primary reason that we have avoided supporting query, k/v pair, etc.

Correct, naive string comparison will fail in both cases and wildcard matching would work in both cases. One benefit of the query string is that most URI libraries can parse it into an unordered map (e.g. URL.Query, url.ParseQuery).

What are your reasons for excluding query component from SPIFFE URIs? (Perhaps trying to add metadata to identity is a mistake in the first place.)

It is my personal opinion that adding metadata to the identity document will lead to interoperability problems.

An entity could be denied access by a resource that doesn't understand how to parse its SPIFFE ID (a false negative); that seems like a good default.
I'm struggling to think of a case where an actor with a query string in its SPIFFE ID would be granted access erroneously (a false positive).
spiffe://k8s-west.example.com/service/foo/** could be taken to match spiffe://k8s-west.example.com/service/foo?app=bar&env=production, and I think that is consistent with the meaning of trailing **: the remainder of the identity is unimportant.

Adding metadata which is then used to make authorization decisions feels very similar to me. Additional metadata will need to be opaque as far as the standard goes, since we won't be able to capture all the use cases in a rigid set. Deployments leveraging custom metadata will face challenges interoperating with deployments which aren't aware of the metadata scheme in use. IMO, it conflates authentication and authorization.

I agree metadata can encode authorization in the general case, things like admin=true or scope=contacts and that should be avoided.
Application and environment are elements of actor identity and imply nothing about what it is allowed to do. Perhaps metadata is the wrong word here.
Some actors have two names like humans: think of service name as surname, and application name as given name.
Similarly, attributes like facial biometrics (photo) and age are encoded into your driver's license identity document. Enforcers such as bars parse age from your identity document, whereas the TSA parses your photo and name to match against your actual face and airline ticket.

I think that the ideal scenario is that such metadata originates from a different system (perhaps the authz system itself), where the SPIFFE ID serves as the primary key. Though I understand that this introduces another challenge, which is the reason we are hearing that folks want to put metadata into the SVID (this has come up a few times already).

In cases where data is keyed by SPIFFE ID, we will need different data returned for different values of app and env.

The approach that Square is taking (linked above) is a good one, as it doesn't sacrifice interoperability and is flexible enough to accomplish some of the things you are asking about.

Looks like Square's matchers could be expanded to meet my needs by allowing ** anywhere in the URI.
It would not support using = as key value separator in path.
I'm not sure how you would match the case where app is missing entirely. I suppose you could use not spiffe://**/app/**. Seems awkward.

I have also heard of folks using OPA bundles to ship this kind of metadata to the authorization system in a way that remains performant.

I think we're using OPA in some cases to distribute authorization policies. How does OPA address including app and env in the identity of an actor?

Are you open to expanding the standard?

We are absolutely open to changing things around if/when/where it makes sense. This is by all means a community-led project, and we want to see it provide value and solve problems. The SPIFFE community holds bi-weekly calls to discuss matters relevant to the SPIFFE specification set - it would be a great venue to hold a high bandwidth conversation on this topic. Curious to hear your thoughts on all this, and thanks again for bringing it up.

Sounds great, I'm happy to join a call. When is your next call please?
Thank you as well for your thoughtful response.

@bradleyjames

bradleyjames commented Oct 22, 2018

I'm curious about the environment use case. My assumed usage of environments is cross-environment communication should never occur. It seems like separate trust domains would be a more appropriate solution than metadata.
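
A minimal sketch of the trust-domain approach, assuming one hypothetical trust domain per environment (`staging.example.com` and `prod.example.com` are invented names). With that layout, the environment check needs no metadata parsing at all.

```python
from urllib.parse import urlsplit


def trust_domain(spiffe_id):
    """Return the trust domain (the authority part) of a SPIFFE ID."""
    return urlsplit(spiffe_id).netloc


def same_environment(source_id, target_id):
    # Assumption: each environment has its own trust domain, so comparing
    # trust domains is enough to rule out cross-environment traffic.
    return trust_domain(source_id) == trust_domain(target_id)


staging_foo = "spiffe://staging.example.com/service/foo"
staging_bar = "spiffe://staging.example.com/service/bar"
prod_bar = "spiffe://prod.example.com/service/bar"
```

With these IDs, `same_environment(staging_foo, staging_bar)` holds and `same_environment(staging_foo, prod_bar)` does not.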

@jkanywhere
Author

Good question. We do not have a full staging environment, instead staging and production are intermingled on the same cluster. When teams choose to run a staging copy of their service some communicate only to other staging services, and some interact with production services using test tenancy data.

I agree trust domain is another solution for env, which does not change my initial proposal that query parameters are appropriate for non-hierarchical attributes of identity.
I could have also used deployment or pod as examples.

Perhaps I should not have said "metadata"; these are fundamental aspects of service identity that are required to make authorization decisions.

I am prepared to use key=value path segments and specify that order cannot be relied on. At least one colleague asked why I was not using query parameters, the existing key-value portion of URLs.

@zhangweikop

Hi @jkanywhere,
We are considering doing the same thing.
I would like to hear more about your experience if you have been using this for a few years.

Thank you.

@andrewpmartinez

andrewpmartinez commented Jun 16, 2022

If query parameters were allowed, the spiffe:// scheme of the URI would most likely have to be incremented to spiffe2:// or risk breakage.

The query portion of the URI has no RFC requirement to be rendered deterministically. Library, language, and compilation environment changes could cause URIs with query parameters to render differently (e.g. due to looping through k/v's in a map that had its hashing algo changed). That would introduce "it worked in dev but not in production" scenarios, because the "naive string comparison" mentioned above is allowed today.
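
To make the rendering concern concrete, here is a minimal sketch of one possible mitigation: canonicalizing the query by sorting its pairs before rendering. The `canonicalize` helper is hypothetical and is not something proposed in this thread. It only helps if every issuer and verifier applies the same rule, which is exactly the kind of agreement a naive string comparison does not need.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonicalize(uri):
    """Re-render a URI with its query pairs in sorted order."""
    parts = urlsplit(uri)
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(pairs), parts.fragment)
    )


rendered_one = "spiffe://k8s-west.example.com/service/foo?env=staging&app=streaming"
rendered_two = "spiffe://k8s-west.example.com/service/foo?app=streaming&env=staging"

stable = canonicalize(rendered_one) == canonicalize(rendered_two)
```

Both renderings canonicalize to the same string, so `stable` is true even though the inputs differ byte for byte.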
