Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix: wrong certificate chain is used if preferredChain is configured #6755

Merged
merged 8 commits into from
Feb 20, 2024

Conversation

import-shiburin
Copy link
Contributor

@import-shiburin import-shiburin commented Feb 13, 2024

Pull Request Motivation

Due to recent change of Let's Encrypt Chain of Trust, the short certificate chain (having ISRG Root X1 as root certificate) became the default certificate provided for /acme/certificate API endpoint. Old long certificate chain, which is cross-signed with DST Root CA X3, is now served as an alternate certificate chain.

Caused by this change, if preferredChain is configured as ISRG Root X1 in Issuer or ClusterIssuer, cert-manager now returns a long-chain certificate, cross-signed by DST Root CA, since the current implementation does not include default certificate chain while looking for preferred chain.

Going worse, this does not fall back to the default chain but returns the cross-signed chain since that chain also includes ISRG Root CA X1 as intermediate CA.

This PR addresses the issue by

  1. Including default certificate chain while evaluating preferred certificate bundle
  2. Checking Issuer CN of topmost certificate, like certbot

Kind

bug

Release Note

Breaking Change: Fixed unintended certificate chain is used if `preferredChain` is configured.

Signed-off-by: Sam Lee <me@shibuya-rin.moe>
Signed-off-by: Sam Lee <me@shibuya-rin.moe>
@jetstack-bot jetstack-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 13, 2024
@jetstack-bot
Copy link
Contributor

Hi @import-shiburin. Thanks for your PR.

I'm waiting for a cert-manager member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jetstack-bot jetstack-bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. area/acme Indicates a PR directly modifies the ACME Issuer code labels Feb 13, 2024
@inteon
Copy link
Member

inteon commented Feb 13, 2024

/ok-to-test

@jetstack-bot jetstack-bot added ok-to-test and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 13, 2024
@jetstack-bot jetstack-bot added dco-signoff: no Indicates that at least one commit in this pull request is missing the DCO sign-off message. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 13, 2024
Signed-off-by: Sam Lee <me@shibuya-rin.moe>
@jetstack-bot jetstack-bot added dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. and removed dco-signoff: no Indicates that at least one commit in this pull request is missing the DCO sign-off message. labels Feb 13, 2024
Signed-off-by: Sam Lee <me@shibuya-rin.moe>
@inteon
Copy link
Member

inteon commented Feb 13, 2024

EXTRA INFO: this feature was introduced in #3208

return true, altChain, nil
}
// Check topmost certificate
cert, err := x509.ParseCertificate(certChain[len(certChain)-1])
Copy link
Member

@inteon inteon Feb 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be non-ideal to backport this change, some non-let's encrypt use cases might rely on selecting a chain using the CommonName of the 2nd or nth certificate.

For the patch release, we can maybe pick the chain which has a match closes to the top?

And make a breaking change in the next full cert-manager release?

@SgtCoDFish 2nd opinion would be good here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matching closest to top might still break weird edge cases, like primary cert chain and alt cert chain having identical intermediate CA on the same depth, thus we cannot ensure 100% compatibility if we pick the closest chain.

However, if we still want to cover some edge cases like you've mentioned, there's no reason not to do it.

@inteon
Copy link
Member

inteon commented Feb 14, 2024

xref: #4505 and certbot/certbot#8596

@inteon
Copy link
Member

inteon commented Feb 14, 2024

@import-shiburin
Do you agree that this bug will not cause any outages due to the current and upcoming Let's encrypt changes?
Based on the timeline that Let's encrypt published, I think the following will happen if we do not fix the bug:

  1. starting from February 8th:
    • preferredChain: ISRG Root X1 now results in the wrong chain (the cross-signed chain), this chain should still work for all use cases, but will just be longer.
      EDIT: this will break clients that rely on the specific ISRG Root X1 to be returned by the webserver.
  2. starting from June 6th:
    • preferredChain: ISRG Root X1 will fail to match an alternative chain, and will default to the default chain. This default chain is the ISRG Root X1 chain, so users will be back to the original behavior (they will serve the shorter chain again).

@import-shiburin
Copy link
Contributor Author

import-shiburin commented Feb 14, 2024

@import-shiburin Do you agree that this bug will not cause any outages due to the current and upcoming Let's encrypt changes? Based on the timeline that Let's encrypt published, I think the following will happen if we do not fix the bug:

1. starting from February 8th:
   
   * `preferredChain: ISRG Root X1` now results in the wrong chain (the cross-signed chain), this chain should still work for all use cases, but will just be longer.

2. starting from June 6th:
   
   * `preferredChain: ISRG Root X1` will fail to match an alternative chain, and will default to the default chain. This default chain is the `ISRG Root X1` chain, so users will be back to the original behavior (they will serve the shorter chain again).

Yes in general, but not 100%. Your thoughts are correct, except this line: this chain should still work for all use cases, but will just be longer..

I've found this bug because one of services served by our team is connected by Unity application, which is affected by this issue. If anybody mitigated that issue by configuring preferredChain, they may face potential service outage.

@wallrj wallrj linked an issue Feb 14, 2024 that may be closed by this pull request
@inteon
Copy link
Member

inteon commented Feb 14, 2024

Thanks for the extra information @import-shiburin.
We discussed how we think we can fix this issue for as many users as possible without breaking existing users (seems to be non-trivial).
The action items we agreed on:

  1. merge this fix into master as-is to fix future issues with future releases (we will also add a release note that explains the new behavior)
  2. [DONE] update existing release notes with a note that explains that this is a known issue and that the immediate fix is to remove the preferredChain: ISRG Root X1 property (since this does not require upgrading to a new version).
  3. if we see more people running into this problem (please like this PR), we will consider backporting this fix and possibly breaking users who were relying on the incorrect behavior.

Do you think this plan of action makes sense?

/cc @wallrj @SgtCoDFish @ThatsMrTalbot

…chain

Signed-off-by: Tim Ramlot <42113979+inteon@users.noreply.github.com>
Signed-off-by: Tim Ramlot <42113979+inteon@users.noreply.github.com>
@jetstack-bot jetstack-bot added area/testing Issues relating to testing size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 16, 2024
Copy link
Member

@wallrj wallrj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR (as it stands) changes two things about the original behaviour which is documented in the API, here:

// PreferredChain is the chain to use if the ACME server outputs multiple.
// PreferredChain is no guarantee that this one gets delivered by the ACME
// endpoint.
// For example, for Let's Encrypt's DST crosssign you would use:
// "DST Root CA X3" or "ISRG Root X1" for the newer Let's Encrypt root CA.
// This value picks the first certificate bundle in the ACME alternative
// chains that has a certificate with this value as its issuer's CN
// +optional
// +kubebuilder:validation:MaxLength=64
PreferredChain string `json:"preferredChain,omitempty"`

1. Compare with last cert in chain

Instead of selecting a chain that contains a certificate issued by a parent with CN of preferredChain, it now selects a chain that ends with a certificate issued by a parent with CN of preferredChain. That conflicts with documented behaviour of this feature. And it may conflict with the intended behaviour which was supposed to fix the problem reported by @OndroNR in #1700 (Option to select intermediate certificate)...it could be that the title of that issue is misleading....I don't know.

It also seems to me that this change alone would be enough to fix the problem reported by @import-shiburin, because they configure preferredChain: ISRG Root X1 in order to avoid DST Root CA X3, because that certificate crashes some of their clients. With this change, and since Feb 8, none of the alternative chains would end with a certificate issued by ISRG Root X1 and therefore the default chain would be used.

UPDATED: I looked up the RFC and realized I was mixing up first and last, so I've updated the comment.

The server MAY provide one or more link relation header fields
[RFC8288] with relation "alternate". Each such field SHOULD express
an alternative certificate chain starting with the same end-entity
certificate. This can be used to express paths to various trust
anchors. Clients can fetch these alternates and use their own
heuristics to decide which is optimal.

https://www.rfc-editor.org/rfc/rfc8555#section-7.4.2

2. Search the default chain first

I think this is how the feature ought to have been implemented in the first place,
but it clearly conflicts with the API documentation which says:

This value picks the first certificate bundle in the ACME alternative chains

I'm inclined to drop the second change for this PR.
But happy to give way if people think I'm wrong.

preferredChain := issuer.GetSpec().ACME.PreferredChain
found, altChain, err := getAltCertChain(ctx, cl, certURL, preferredChain)
preferredChainName := issuer.GetSpec().ACME.PreferredChain
found, preferredCertChain, err := getPreferredCertChain(ctx, cl, certURL, certSlice, preferredChainName)
if err != nil {
return fmt.Errorf("error retrieving alternate chain: %w", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return fmt.Errorf("error retrieving alternate chain: %w", err)
return fmt.Errorf("error retrieving preferred chain: %w", err)

Signed-off-by: Tim Ramlot <42113979+inteon@users.noreply.github.com>
@jetstack-bot jetstack-bot added the area/api Indicates a PR directly modifies the 'pkg/apis' directory label Feb 16, 2024
Signed-off-by: Tim Ramlot <42113979+inteon@users.noreply.github.com>
@jetstack-bot jetstack-bot added the area/deploy Indicates a PR modifies deployment configuration label Feb 18, 2024
@import-shiburin
Copy link
Contributor Author

@wallrj

At a glance, #1700 seems like the author needed to select certs between DST and ISRG, which is implemented on #3208.

I agree with your opinion (applying the first change only) since the implementation may differ by spec considering this line from RFC:

Clients can fetch these alternates and use their own heuristics to decide which is optimal.

However, it still leaves some consideration that users may be confused by different behaviors among ACME client implementations (especially considering normal users - at least I think - are not aware of ACME spec and thus may expect the same behavior while using various ACME clients)

@inteon
Copy link
Member

inteon commented Feb 20, 2024

@wallrj

At a glance, #1700 seems like the author needed to select certs between DST and ISRG, which is implemented on #3208.

I agree with your opinion (applying the first change only) since the implementation may differ by spec considering this line from RFC:

Clients can fetch these alternates and use their own heuristics to decide which is optimal.

However, it still leaves some consideration that users may be confused by different behaviors among ACME client implementations (especially considering normal users - at least I think - are not aware of ACME spec and thus may expect the same behavior while using various ACME clients)

DST and ISRG both are root certificates, so it seems like the title of #1700 should actually be "Option to select intermediate root certificate"?

Copy link
Member

@wallrj wallrj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Thanks @import-shiburin and @inteon for fixing this and for answering my questions.

I prefer this new behaviour of the preferredChain feature and I think our users will prefer it too.
It will need a note in the release notes and upgrade notes for 1.15 explaining that this is a possible breaking change.
So please also prefix the release note in this PR with "Breaking Change: " (as we have done in the past) so that the release manager doesn't forget to highlight this.

I think it is ambiguous to identify the preferred chain by the "common name" of the issuer. Only convention prevents there being two root certificates with the same common name, right?

In hindsight it might have been more intuitive if the user could specify exactly which CA certificate they trust and we could then have programmed cert-manager to select the chain with the shortest route to that specific certificate.

It also occurs to me that Let's Encrypt could / should include all the chains when you call their "alternativeChains" API...would seem more natural to me that it gives you all the possible chains for a particular signed certificate in a single API call.

/lgtm
/kind bug

@jetstack-bot jetstack-bot added kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Feb 20, 2024
@jetstack-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wallrj

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jetstack-bot jetstack-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 20, 2024
@jetstack-bot jetstack-bot merged commit f643eef into cert-manager:master Feb 20, 2024
7 checks passed
@wallrj wallrj modified the milestones: 1.15-alpha.0, 1.15 Feb 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/acme Indicates a PR directly modifies the ACME Issuer code area/api Indicates a PR directly modifies the 'pkg/apis' directory area/deploy Indicates a PR modifies deployment configuration area/testing Issues relating to testing dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. ok-to-test release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
4 participants