Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

caddytls: tailscale cert manager not used as fallback for *.ts.net certs #6327

Closed
willnorris opened this issue May 19, 2024 · 9 comments
Closed
Milestone

Comments

@willnorris
Copy link
Contributor

It appears that the tailscale cert manager is only used when a ts.net domain is explicitly configured for a site:

samson.tailb575b.ts.net:443 {
  respond OK
}

This works fine, and shows debug output of:

2024/05/19 01:44:25.371 DEBUG   tls.handshake   no matching certificates and no custom selection logic  {"identifier": "samson.tailb575b.ts.net"}
2024/05/19 01:44:25.371 DEBUG   tls.handshake   no matching certificates and no custom selection logic  {"identifier": "*.tailb575b.ts.net"}
2024/05/19 01:44:25.372 DEBUG   tls.handshake   no matching certificates and no custom selection logic  {"identifier": "*.*.ts.net"}
2024/05/19 01:44:25.372 DEBUG   tls.handshake   no matching certificates and no custom selection logic  {"identifier": "*.*.*.net"}
2024/05/19 01:44:25.372 DEBUG   tls.handshake   no matching certificates and no custom selection logic  {"identifier": "*.*.*.*"}
2024/05/19 01:44:38.957 DEBUG   tls.handshake   using externally-managed certificate    {"remote_ip": "100.120.137.94", "remote_port": "53195", "sni": "samson.tailb575b.ts.net", "names": ["samson.tailb575b.ts.net"], "expiration": "2024/08/17 00:44:38.000"}

However, when configured as:

:443 {
  respond OK
}

I get debug logs:

2024/05/19 01:48:42.921 DEBUG   tls.handshake   no matching certificates and no custom selection logic  {"identifier": "samson.tailb575b.ts.net"}
2024/05/19 01:48:42.921 DEBUG   tls.handshake   no matching certificates and no custom selection logic  {"identifier": "*.tailb575b.ts.net"}
2024/05/19 01:48:42.921 DEBUG   tls.handshake   no matching certificates and no custom selection logic  {"identifier": "*.*.ts.net"}
2024/05/19 01:48:42.921 DEBUG   tls.handshake   no matching certificates and no custom selection logic  {"identifier": "*.*.*.net"}
2024/05/19 01:48:42.921 DEBUG   tls.handshake   no matching certificates and no custom selection logic  {"identifier": "*.*.*.*"}
2024/05/19 01:48:42.921 DEBUG   tls.handshake   no certificate matching TLS ClientHello {"remote_ip": "100.120.137.94", "remote_port": "53237", "server_name": "samson.tailb575b.ts.net", "remote": "100.120.137.94:53237", "identifier": "samson.tailb575b.ts.net", "cipher_suites": [4867, 4866, 4865, 52393, 52392, 52394, 49200, 49196, 49192, 49188, 49172, 49162, 159, 107, 57, 65413, 196, 136, 129, 157, 61, 53, 192, 132, 49199, 49195, 49191, 49187, 49171, 49161, 158, 103, 51, 190, 69, 156, 60, 47, 186, 65, 49169, 49159, 5, 4, 49170, 49160, 22, 10, 255], "cert_cache_fill": 0, "load_or_obtain_if_necessary": true, "on_demand": false}
2024/05/19 01:48:42.922 DEBUG   http.stdlib     http: TLS handshake error from 100.120.137.94:53237: no certificate available for 'samson.tailb575b.ts.net'

Since caddy is already treating ts.net domains special when setting up cert managers, I'm wondering if it could do the same in the fallback handler (or maybe I've got some terminology wrong here?). I'm already looking into this myself, but wanted to open the issue in case there was a reason it behaves this way, or if you have any pointers on where to hook in.

@willnorris
Copy link
Contributor Author

Okay, I think I have found some clues. From autohttps.go

// tailscale names go in their own automation policies because
// they require on-demand TLS to be enabled, which we obviously
// can't enable for everything

So configuring on-demand access for the site does seem to work...

{
  debug
  on_demand_tls {
    ask http://localhost:8080/
  }
}

:443 {
  tls {
    on_demand
  }
  respond OK
}

:8080 {
  respond OK
}

Do I have that right? I'm not too worried about requiring a proper hostname for production use, I'm just trying to add to the caddy-tailscale examples that will work without additional config changes

@francislavoie
Copy link
Member

francislavoie commented May 19, 2024

I think you need to do this:

https:// {
	tls {
		get_certificate tailscale
	}
	respond OK
}

https://caddyserver.com/docs/caddyfile/directives/tls#certificate-managers

We could probably adjust the docs to mention that this pattern can be used if you need wildcard TS domains 🤔

@willnorris
Copy link
Contributor Author

But even that still requires the on_demand_tls global option, right? Otherwise, you get the error:

Error: loading initial config: loading new config: loading http app module: provision http: on-demand TLS cannot be enabled without a permission module to prevent abuse; please refer to documentation for details

So then I'm either specifying get_certificate tailscale or on_demand on each site, so it's roughly equivalent.

@francislavoie
Copy link
Member

I'm pretty sure enabling get_certificate should obviate the need for that. Are you testing with latest on master? I thought we adjusted that already 🤔

@willnorris
Copy link
Contributor Author

yes, this is with the last master branch... still see that error.

@francislavoie
Copy link
Member

francislavoie commented May 19, 2024

Okay - PRs welcome if you want to patch that (to skip the permission module check if get_certificate is used).

Basically a get_certificate module is effectively a permission module that returns a certificate or nil instead of a boolean. So I don't think it makes sense to ever require a permission module when get_certificate is used.

willnorris added a commit to willnorris/caddy that referenced this issue May 19, 2024
Certificate automation has permission modules that are designed to
prevent inappropriate issuance of unbounded or wildcard certificates.
When an explicit cert manager is used, no additional permission should
be necessary. For example, this should be a valid caddyfile:

    https:// {
      tls {
        get_certificate tailscale
      }
      respond OK
    }

This is accomplished when provisioning an AutomationPolicy by tracking
whether there were explicit managers configured directly on the policy
(in the ManagersRaw field). Only when a number of potentially unsafe
conditions are present AND no explicit cert managers are configured is
an error returned.

The problem arises from the fact that ctx.LoadModule deletes the raw
bytes after loading in order to save memory. The first time an
AutomationPolicy is provisioned, the ManagersRaw field is populated, and
everything is fine.

An AutomationPolicy with no subjects is treated as a special "catch-all"
policy. App.createAutomationPolicies ensures that this catch-all policy
has an ACME issuer, and then calls its Provision method again because it
may have changed. This second time Provision is called, ManagesRaw is no
longer populated, and the permission check fails because it appears as
though the policy has no explicit managers.

Address this by storing a new boolean on AutomationPolicy recording
whether it had explicit cert managers configured on it.

Also fix an inverted boolean check on this value when setting
failClosed.

Updates caddyserver#6060
Updates caddyserver#6229
Updates caddyserver#6327

Signed-off-by: Will Norris <will@tailscale.com>
mholt pushed a commit that referenced this issue May 20, 2024
Certificate automation has permission modules that are designed to
prevent inappropriate issuance of unbounded or wildcard certificates.
When an explicit cert manager is used, no additional permission should
be necessary. For example, this should be a valid caddyfile:

    https:// {
      tls {
        get_certificate tailscale
      }
      respond OK
    }

This is accomplished when provisioning an AutomationPolicy by tracking
whether there were explicit managers configured directly on the policy
(in the ManagersRaw field). Only when a number of potentially unsafe
conditions are present AND no explicit cert managers are configured is
an error returned.

The problem arises from the fact that ctx.LoadModule deletes the raw
bytes after loading in order to save memory. The first time an
AutomationPolicy is provisioned, the ManagersRaw field is populated, and
everything is fine.

An AutomationPolicy with no subjects is treated as a special "catch-all"
policy. App.createAutomationPolicies ensures that this catch-all policy
has an ACME issuer, and then calls its Provision method again because it
may have changed. This second time Provision is called, ManagesRaw is no
longer populated, and the permission check fails because it appears as
though the policy has no explicit managers.

Address this by storing a new boolean on AutomationPolicy recording
whether it had explicit cert managers configured on it.

Also fix an inverted boolean check on this value when setting
failClosed.

Updates #6060
Updates #6229
Updates #6327

Signed-off-by: Will Norris <will@tailscale.com>
@mholt
Copy link
Member

mholt commented May 20, 2024

@willnorris Is this patched up now? If so I'll close this and prepare to release 2.8 rc.1.

@willnorris
Copy link
Contributor Author

willnorris commented May 20, 2024

Yes, I think so. I just tested again, and while we still have some pending changes on the Tailscale side to merge (for things like proper QUIC support), everything on the Caddy side is done.

@mholt
Copy link
Member

mholt commented May 20, 2024

Excellent!! Thanks. I'll tag rc1 today.

@mholt mholt closed this as completed May 20, 2024
@mholt mholt added this to the v2.8.0 milestone May 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants