Add new security considerations section for SPIFFE IDs #215
base: main
The SPIFFE ID spec has within it a set of character restrictions that are intended to mitigate confusion and potential exploitation around how to interpret an ID and the equality of IDs. It is important to communicate the reasons and necessity of these restrictions to our users so that they are preserved in their implementations.

This commit adds a new security considerations section that details why these restrictions are in place, and touches up a few spots that have fallen out of date following the changes in these restrictions. It is a follow-on to the change made in PR spiffe#183.

Signed-off-by: Evan Gilman <egilman@vmware.com>
1. Includes only uppercase or lowercase alpha-numeric characters, `.`s, `-`s, `_`s, and `/`s
1. The character sequences `//`, `/./`, and `/../` do not appear anywhere in the string

If the above checks are successful, the SPIFFE ID is valid.
I think @spikecurtis's Python script can be linked to from here. Maybe we can add it to this PR and check it into a new /standards/examples directory?
Alternatively, we could link to it directly from section 4.2 and omit this appendix item
I could go either way on whether to include the above appendix text.
Pointing to example parsers sounds like a good idea.
@@ -109,12 +113,18 @@ Paths MAY be hierarchical - similar to filesystem paths. The specific meaning of

URIs, as defined by [RFC 3986](https://tools.ietf.org/html/rfc3986), do not have a maximal length. As an interoperability consideration, SPIFFE implementations MUST support SPIFFE URIs up to 2048 bytes in length and SHOULD NOT generate URIs of length greater than 2048 bytes. [RFC 3986](https://tools.ietf.org/html/rfc3986) permits only ASCII characters, thus the recommended maximum length of a SPIFFE ID is 2048 bytes.

All URI components contribute to the URI length, including the "spiffe" scheme, "://" separator, trust domain name, and path component. Non-ASCII characters contribute to the URI length after they are percent encoded as ASCII characters. Note that [RFC 3986](https://tools.ietf.org/html/rfc3986) defines a maximum length of 255 characters for the "host" component of a URI; therefore a maximum length of a trust domain name is 255 bytes.
All URI components contribute to the URI length, including the "spiffe" scheme, "://" separator, trust domain name, and path component. Note that [RFC 3986](https://tools.ietf.org/html/rfc3986) defines a maximum length of 255 characters for the "host" component of a URI; therefore a maximum length of a trust domain name is 255 bytes.
Suggested change:
All URI components contribute to the URI length, including the "spiffe" scheme, "://" separator, trust domain name, and path component. Note that [RFC 3986](https://tools.ietf.org/html/rfc3986) defines a maximum length of 255 characters for the "host" component of a URI; therefore a maximum length of a trust domain name is 255 characters.
Why is the substitution of "bytes" for "characters" important?
What are we asking implementors to do differently with characters than with bytes?
If I were to implement this spec and encountered "bytes" here, I would know to process the input string byte-by-byte. But if the spec says "characters", then that suggests to me that I need to know more about the definition of "character". Are they fixed-width? Do I need to use some particular charset?
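A quick way to see what "counting" means here: the Python sketch below measures an ID against the 2048 and 255 limits quoted above. The `length_report` helper and its constant names are hypothetical, not from the spec. For an ASCII-only string, the character count and the ASCII/UTF-8 byte count are the same number, so the two wordings only diverge once some other encoding is involved.

```python
MAX_ID_CHARS = 2048           # overall limit discussed above
MAX_TRUST_DOMAIN_CHARS = 255  # trust domain limit cited above

def length_report(spiffe_id: str) -> dict:
    # Assumes the input starts with "spiffe://".
    prefix = "spiffe://"
    trust_domain = spiffe_id[len(prefix):].split("/", 1)[0]
    return {
        # Every component counts: scheme, "://", trust domain, and path.
        "id_chars": len(spiffe_id),
        "trust_domain_chars": len(trust_domain),
        # Raises UnicodeEncodeError for non-ASCII input; for ASCII input the
        # byte count equals the character count.
        "ascii_bytes": len(spiffe_id.encode("ascii")),
        "within_limits": len(spiffe_id) <= MAX_ID_CHARS
        and len(trust_domain) <= MAX_TRUST_DOMAIN_CHARS,
    }

print(length_report("spiffe://example.org/workload"))
# {'id_chars': 29, 'trust_domain_chars': 11, 'ascii_bytes': 29, 'within_limits': True}
```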
### 2.4. SPIFFE ID Parsing

SPIFFE IDs follow the URI specification as defined by [RFC 3986](https://tools.ietf.org/html/rfc3986). The scheme and trust domain name of the SPIFFE ID are case-insensitive. The path is case-sensitive.
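As an illustration of these case rules, Python's standard `urllib.parse.urlsplit` (a general-purpose URI parser, not a SPIFFE library) already treats the scheme and host as case-insensitive while leaving the path untouched. This is a sketch of one parser's behavior; other URI libraries may behave differently.

```python
from urllib.parse import urlsplit

parts = urlsplit("SPIFFE://Example.ORG/Billing/Payments")

print(parts.scheme)    # spiffe             (urlsplit lowercases the scheme)
print(parts.hostname)  # example.org        (.hostname is lowercased)
print(parts.netloc)    # Example.ORG        (.netloc keeps the original case)
print(parts.path)      # /Billing/Payments  (path case is preserved)
```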

### 2.5. SPIFFE ID Equivalency

Two SPIFFE IDs are equivalent if and only if they match on a byte-for-byte basis. Note that since SPIFFE IDs allow only ASCII characters without percent-encoding, and SPIFFE also forbids capital letters in the `host` part of the authority (which is traditionally case-insensitive), comparisons of legal SPIFFE IDs are disambiguated.
Suggested change:
Two SPIFFE IDs are equivalent if and only if they match on a character-for-character basis. Note that since SPIFFE IDs forbid percent-encoded characters, and forbid capital letters in the `host` part of the authority (which is traditionally case-insensitive), comparisons of legal SPIFFE IDs are disambiguated.
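To make the comparison rule concrete, here is a minimal Python sketch. `ids_equivalent` and its regex are hypothetical and deliberately simplified (they do not implement the dot-segment rules). The point is that legal IDs are compared with plain string equality, while an ID with an uppercase host is rejected rather than case-folded into a match.

```python
import re

# Hypothetical, simplified legality check: lowercase trust domain of [a-z0-9.-]
# and path segments of [A-Za-z0-9._-]. It does NOT implement the dot-segment rules.
_LEGAL_ID = re.compile(r"spiffe://[a-z0-9.\-]+(?:/[A-Za-z0-9._\-]+)*")

def ids_equivalent(a: str, b: str) -> bool:
    # Reject illegal input instead of normalizing it into something legal.
    if not (_LEGAL_ID.fullmatch(a) and _LEGAL_ID.fullmatch(b)):
        raise ValueError("not a legal SPIFFE ID")
    # For legal IDs, plain string equality is the entire comparison.
    return a == b

print(ids_equivalent("spiffe://example.org/db", "spiffe://example.org/db"))  # True
print(ids_equivalent("spiffe://example.org/db", "spiffe://example.org/DB"))  # False
```

Passing `"spiffe://Example.org/db"` raises `ValueError` instead of silently comparing equal to the lowercase form.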
### 4.2. ID Equivalency
The comparison of SPIFFE IDs is a security critical operation. In allowing for internationalization, as well as compatibility with the DNS system, the URI standard which SPIFFE IDs rely upon has made a handful of decisions which complicate the equivalency process. Specifically, case-insensitivity, ambiguous percent-encoding rules and support for UTF-8, and the implementation of many conditionally-special characters can make the topic a confusing one.

This specification has specifically forbidden the use of characters and encoding schemes that complicate comparison (please see sections [2.1](#21-trust-domain) and [2.2](#22-path) for more information). Thus, any legal SPIFFE ID can be considered safe. It will not be misinterpreted or transformed by traditional URI libraries, and it can be safely compared with other legal IDs on a byte-for-byte basis.
Suggested change:
This specification has specifically forbidden the use of characters and encoding schemes that complicate comparison (please see sections [2.1](#21-trust-domain) and [2.2](#22-path) for more information). Thus, any legal SPIFFE ID can be considered safe. It will not be misinterpreted or transformed by traditional URI libraries, and it can be safely compared with other legal IDs on a character-for-character basis.
"Thus, any legal SPIFFE ID can be considered safe. It will not be misinterpreted or transformed by traditional URI libraries, and it can be safely compared with other legal IDs on a byte-for-byte basis."
This is the intention, or aspiration of the design. But as stated, I read this as a fact. Are we actually able to make this claim? Do we know that we haven't made a mistake somewhere or that some future parser may deviate from today's norms?
1. What remains is the path - scan it for the following conditions in sequential order:
1. The string does not start or end with `/`
1. Includes only uppercase or lowercase alpha-numeric characters, `.`s, `-`s, `_`s, and `/`s
1. The character sequences `//`, `/./`, and `/../` do not appear anywhere in the string
I think this would allow something like `spiffe://mydomain/spike/..`; that is to say, it doesn't eliminate relative path segments as the final segment.
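The gap described in this comment can be checked directly. The sketch below assumes the `/` separating the trust domain from the path has already been consumed. `passes_listed_checks` implements only the three checks quoted above, and `passes_segment_checks` is one hypothetical way to also reject `.` and `..` as the final segment; both names are illustrative.

```python
import re

_ALLOWED = re.compile(r"[A-Za-z0-9._/\-]*")

def passes_listed_checks(path: str) -> bool:
    # 1. Must not start or end with "/".
    if path.startswith("/") or path.endswith("/"):
        return False
    # 2. Only alphanumerics, ".", "-", "_", and "/".
    if not _ALLOWED.fullmatch(path):
        return False
    # 3. None of the forbidden sequences may appear.
    return not any(seq in path for seq in ("//", "/./", "/../"))

def passes_segment_checks(path: str) -> bool:
    # Hypothetical tightening: reject empty, "." and ".." segments anywhere,
    # including the final segment.
    return passes_listed_checks(path) and all(
        seg not in ("", ".", "..") for seg in path.split("/"))

print(passes_listed_checks("spike/.."))   # True: the gap noted above
print(passes_segment_checks("spike/.."))  # False
```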
It is still possible for a legal URI (but illegal SPIFFE ID) to be legally processed according to URI normalization rules _into_ a legal SPIFFE ID. Thus, under certain circumstances, it is possible for an illegal SPIFFE ID to be passed through a URI parser and produce a legal SPIFFE ID. These two SPIFFE IDs **will not match** on a byte-for-byte basis, however there exists room for confusion.
Suggested change:
It is still possible for a legal URI (but illegal SPIFFE ID) to be legally processed according to URI normalization rules _into_ a legal SPIFFE ID. Thus, under certain circumstances, it is possible for an illegal SPIFFE ID to be passed through a URI parser and produce a legal SPIFFE ID. These two SPIFFE IDs **will not match** on a character-for-character basis, however there exists room for confusion.
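To make the normalization risk concrete, `naive_normalize` below is a hypothetical function that roughly approximates the syntax-based normalization steps of RFC 3986 section 6.2.2 (lowercasing the host, decoding percent-escapes, removing dot segments) using Python's standard library. It turns an illegal SPIFFE ID into a legal one that does not match the original string.

```python
import posixpath
from urllib.parse import unquote, urlsplit, urlunsplit

def naive_normalize(uri: str) -> str:
    # Hypothetical, NOT SPIFFE-aware: lowercase the host, percent-decode the
    # path (more aggressively than RFC 3986 describes), and remove dot segments.
    parts = urlsplit(uri)
    path = posixpath.normpath(unquote(parts.path)) if parts.path else ""
    return urlunsplit((parts.scheme, parts.hostname or "", path, "", ""))

raw = "spiffe://Example.org/billing/%2E%2E/admin"  # illegal SPIFFE ID
normalized = naive_normalize(raw)                   # legal SPIFFE ID

print(normalized)         # spiffe://example.org/admin
print(raw == normalized)  # False
```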
## 5. Appendix A. Lightweight SPIFFE ID validation
SPIFFE IDs have a strict character set that is designed to be as consistent and as easy to validate as possible. Here is a logic example demonstrating a simple SPIFFE ID validation mechanism:
1. Total number of characters is less than 2048
Since less than 2048 is only a SHOULD in the standard, I don't think this can be part of the validation algorithm
It is important to try to minimize the amount of processing done on SPIFFE IDs received from untrusted sources prior to comparing them, as it is possible for some URI parsers to normalize an illegal SPIFFE ID into a legal one. Please see Security Considerations [section 4.2](#42-id-equivalency) for more information.
Suggested change:
As it is possible for some URI parsers to normalize an illegal SPIFFE ID into a legal one, it is important to try to minimize the amount of processing done on SPIFFE IDs received from untrusted sources prior to comparing them. Please see Security Considerations [section 4.2](#42-id-equivalency) for more information.
@@ -185,3 +195,29 @@ For example, imagine that the operator of trust domain A and trust domain B have

Now imagine that a request for the customer’s data is received by the storage service in trust domain A, except the caller presented an SVID from trust domain B. Even though the presented SVID may have the necessary assertion indicating that the shared customer was authenticated and authorized the request, it is a bad idea to blindly assume that trust domain B has indeed authenticated your customer. If trust domain B is compromised, or if it has a malicious internal actor, it could claim to have authenticated any user it wishes, thus creating the very circumstance that the measure was designed to mitigate in the first place.
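One way to express the rule in this scenario is to scope such assertions by trust domain. In the sketch below, `LOCAL_TRUST_DOMAIN`, `trust_domain`, and `may_act_for_customer` are hypothetical names, and the input is assumed to be an already-validated SPIFFE ID. Note that the check is exact equality on the trust domain, not a prefix match.

```python
PREFIX = "spiffe://"
LOCAL_TRUST_DOMAIN = "a.example"  # hypothetical name for trust domain A

def trust_domain(spiffe_id: str) -> str:
    # Assumes spiffe_id is an already-validated, legal SPIFFE ID.
    return spiffe_id[len(PREFIX):].split("/", 1)[0]

def may_act_for_customer(caller_id: str) -> bool:
    # Honor "customer authenticated" claims only from our own trust domain.
    # Exact equality: "a.example.evil" must not pass as "a.example".
    return trust_domain(caller_id) == LOCAL_TRUST_DOMAIN

print(may_act_for_customer("spiffe://a.example/frontend"))  # True
print(may_act_for_customer("spiffe://b.example/frontend"))  # False
```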

### 4.2. ID Equivalency
The comparison of SPIFFE IDs is a security critical operation. In allowing for internationalization, as well as compatibility with the DNS system, the URI standard which SPIFFE IDs rely upon has made a handful of decisions which complicate the equivalency process. Specifically, case-insensitivity, ambiguous percent-encoding rules and support for UTF-8, and the implementation of many conditionally-special characters can make the topic a confusing one.
Suggested change:
The comparison of SPIFFE IDs is a security critical operation. In allowing for internationalization, as well as compatibility with the DNS system, the URI standard which SPIFFE IDs rely upon has made a handful of decisions which complicate the equivalency process. Specifically, case-insensitivity, ambiguous percent-encoding rules and support for UTF-8, and the implementation of many conditionally-special characters can make interpretation and implementation of the rules challenging.

This specification has specifically forbidden the use of characters and encoding schemes that complicate comparison (please see sections [2.1](#21-trust-domain) and [2.2](#22-path) for more information). Thus, any legal SPIFFE ID can be considered safe. It will not be misinterpreted or transformed by traditional URI libraries, and it can be safely compared with other legal IDs on a byte-for-byte basis.

It is still possible for a legal URI (but illegal SPIFFE ID) to be legally processed according to URI normalization rules _into_ a legal SPIFFE ID. Thus, under certain circumstances, it is possible for an illegal SPIFFE ID to be passed through a URI parser and produce a legal SPIFFE ID. These two SPIFFE IDs **will not match** on a byte-for-byte basis, however there exists room for confusion.
"These two SPIFFE IDs will not match on a byte-for-byte basis, however there exists room for confusion."
Is it the IDs or the URLs which will not match byte-for-byte/character-for-character?
Consider rephrasing to something like: The pre- and post-processed SPIFFE IDs/URIs will not match on a ... basis.
"however, there exists room for confusion"
I don't understand this last phrase. Can it be dropped?
This is an important detail to keep in mind, particularly when accepting untrusted input that is not signed over by a trust domain authority (e.g. user input from a web app). Untrusted input should always be validated against the SPIFFE ID ruleset prior to accepting and parsing with a traditional URI parser. Please see [Appendix A](#5-appendix-a) for a brief example.
"This is an important detail to keep in mind"
What is the subject of the sentence? (What does "this" refer to?)
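The "validate first, parse second" ordering recommended in the paragraph above might look like the following Python sketch. `parse_untrusted` and its patterns are hypothetical, assembled from the character rules quoted in this PR (including rejecting `.` and `..` segments); an actual implementation should follow the final spec text.

```python
import re
from typing import Tuple
from urllib.parse import urlsplit

# Hypothetical patterns based on the character rules quoted in this PR.
_TRUST_DOMAIN = re.compile(r"[a-z0-9.\-]+")
_SEGMENT = re.compile(r"[A-Za-z0-9._\-]+")

def parse_untrusted(raw: str) -> Tuple[str, str]:
    """Validate raw input strictly, then hand it to a generic URI parser."""
    prefix = "spiffe://"
    if not raw.startswith(prefix):
        raise ValueError("missing spiffe:// prefix")
    trust_domain, slash, path = raw[len(prefix):].partition("/")
    if not _TRUST_DOMAIN.fullmatch(trust_domain):
        raise ValueError("illegal trust domain")
    if slash:
        for segment in path.split("/"):
            # Empty segments ("//" or a trailing "/"), ".", and ".." are rejected.
            if segment in ("", ".", "..") or not _SEGMENT.fullmatch(segment):
                raise ValueError("illegal path segment")
    # Only input that passed every check above reaches urlsplit, so there is
    # nothing left for a URI library to lowercase, decode, or collapse.
    parts = urlsplit(raw)
    return parts.netloc, parts.path

print(parse_untrusted("spiffe://example.org/ns/prod"))  # ('example.org', '/ns/prod')
```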
1. The (new) first character must not be a `/`
1. From the beginning, strip one character at a time:
1. Ensure it is either a lowercase alpha-numeric, a `.`, a `-`, or a `/`
1. When `/` is detected, stop
Suggested change:
1. If the character is `/`, stop
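The character-by-character trust domain step above can be sketched as follows. `split_trust_domain` is a hypothetical helper; it assumes the `spiffe://` prefix was removed by the earlier (not shown) steps, and it returns the path without its leading `/`.

```python
import string
from typing import Tuple

_ALLOWED = set(string.ascii_lowercase + string.digits + ".-")

def split_trust_domain(rest: str) -> Tuple[str, str]:
    # `rest` is the ID with "spiffe://" already stripped (assumed earlier step).
    if not rest:
        raise ValueError("empty trust domain")
    if rest[0] == "/":
        raise ValueError("the (new) first character must not be '/'")
    for i, ch in enumerate(rest):
        if ch == "/":
            # Stop at the first "/": everything after it is the path.
            return rest[:i], rest[i + 1:]
        if ch not in _ALLOWED:
            raise ValueError(f"illegal trust domain character {ch!r}")
    return rest, ""  # no path component

print(split_trust_domain("example.org/ns/prod"))  # ('example.org', 'ns/prod')
```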
@justinburke, interestingly, GitHub is letting me edit your comment but not reply to it, so I'll reply here.
A URI is defined as a sequence of characters, not bytes. To quote from the RFC
So, it is simply not logically consistent to speak about the length of a URI or URI component in terms of bytes unless you also specify the character encoding. As a simple example, an ASCII-only URI encoded in UTF-16 will occupy twice as many bytes as the same URI encoded in UTF-8. Implementers might need to think about both characters and bytes, depending on the question they are asking themselves. Like, if I write an in-memory data structure using SPIFFE-IDs as keys in Java, and I want to know worst-case occupancy, that would be 4096 bytes to store the key, since Java internally stores Strings with two bytes per character. But if I'm writing a SQL database schema, I can just write CHAR(2048) as the column type and not really worry too much about it.
Same here. Strange bug.
We currently have in section 2.3 of this spec (and unmodified by this PR) that we're using ASCII ("[RFC 3986](https://tools.ietf.org/html/rfc3986) permits only ASCII characters"). I interpret that as specifying a character encoding. Do you disagree?
I think that what an implementation does within the confines of itself is beyond scope for this spec. What is in scope (or should be, IMO) are the requirements and expectations for interoperating between distinct implementations. What are the requirements for how an ID is represented in and communicated via an SVID? The spec needs to unambiguously state what is required for the exchange and safe interpretation of SVIDs. For example, if I send you a valid, spec-conforming X.509 SVID, should you expect that the SPIFFE ID is contained in an ASN.1 Octet String, which has maximum length 2048 octets and that each octet has a one-to-one correspondence with an ASCII character? If we change this spec to prefer 'character' instead of 'byte', then we need follow-up PRs to update each SVID spec so that they clearly define encoding requirements. Currently, both specs refer to this one to define the SPIFFE ID concept. I suggest that we strongly prefer language in this spec which promotes safe implementation and strive to make it difficult for ordinary, non-security experts to incorrectly implement. So far, the distinction between 'character' and 'byte' feels to me like a distinction without a practical difference and an unnecessary obfuscation. If it would make the text clearer, perhaps we could include a warning to implementors that not all string libraries are the same (see Java Strings).
I think there was a desire to express the requirements in bytes for some reason that I don't fully recall. What we have now seems to go something like this:
I personally think 2 and 3 are unnecessary, and we should just stop at 1.
Here again, I think we just need to talk about characters, not bytes, in this spec. A safe SVID must unambiguously encode a SPIFFE-ID, which itself is a string of characters. Exactly how the SPIFFE-ID is encoded depends on the SVID type and is therefore beyond the scope of this document. In the case of X.509, because we encode the SPIFFE-ID in a URI SAN, it is unambiguous how the SPIFFE-ID is encoded: it's an IA5String encoded in ASN.1 (https://datatracker.ietf.org/doc/html/rfc4985). In the case of JWT, JWT is itself a text format, and so the SPIFFE-ID is encoded the same way as the rest of the document. Any ambiguity on the encoding there affects the document as a whole, so SPIFFE-ID is not specially affected. We might need to revisit whether the spec is clear about safe exchange of these documents vis-a-vis encoding.
I agree that we want to promote safe implementation for non-security experts, and I think just sticking to "characters" and not talking about "bytes" is the way forward. String libraries in most modern languages are organized around characters, and so if we talk about and encourage people to think about characters they should be fine. Telling people to do things like compare byte-for-byte is actually where the foot-guns lie.
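The UTF-8 versus UTF-16 point made earlier in this thread is easy to check: the same ASCII-only ID has one character count but different byte counts depending on the encoding. The ID below is an arbitrary example.

```python
spiffe_id = "spiffe://example.org/ns/prod/sa/web"

chars = len(spiffe_id)                            # characters
utf8_bytes = len(spiffe_id.encode("utf-8"))       # 1 byte per ASCII character
utf16_bytes = len(spiffe_id.encode("utf-16-le"))  # 2 bytes per character; "-le" omits the BOM

print(chars, utf8_bytes, utf16_bytes)  # 35 35 70
```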
I also feel that character is the right "primitive" to talk about here, instead of bytes. As mentioned, different programming languages and libraries use different "natural" string encoding. When you are consuming things from these languages/libraries (e.g. URI SAN in an X.509 certificate), more likely than not it is going to be represented to your code using the natural string encoding. If we mandate that people do byte-by-byte comparison, then they have to do one more conversion, for really no benefit that I can see. For validation purposes, it is generally easy to ask the question "is this character ASCII" in whatever encoding is being used (e.g. does the Win32 IsTextUnicode function return IS_TEXT_UNICODE_ASCII16 to indicate that the string is zero-extended ASCII). If you are comparing two SPIFFE IDs that you have already validated, then a character by character comparison feels unambiguous and hard to get wrong.
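A small sketch of this argument, assuming two hypothetical APIs that hand over the same ID in different encodings: the raw buffers differ byte-for-byte, but once each is decoded and confirmed to be ASCII, a plain character comparison agrees.

```python
spiffe_id = "spiffe://example.org/db"

# Hypothetical: the same ID as delivered by two APIs with different encodings.
from_utf8_api = spiffe_id.encode("utf-8")
from_utf16_api = spiffe_id.encode("utf-16-le")

# Comparing the raw buffers byte-for-byte reports a mismatch.
raw_bytes_equal = from_utf8_api == from_utf16_api

# Decode each buffer with its own encoding, confirm ASCII, compare characters.
a = from_utf8_api.decode("utf-8")
b = from_utf16_api.decode("utf-16-le")
same_id = a.isascii() and b.isascii() and a == b

print(raw_bytes_equal, same_id)  # False True
```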