Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--auto-hsts should disable HSTS if renewal has failed N times #6239

Closed
pde opened this issue Jul 18, 2018 · 7 comments
Closed

--auto-hsts should disable HSTS if renewal has failed N times #6239

pde opened this issue Jul 18, 2018 · 7 comments

Comments

@pde
Copy link
Member

pde commented Jul 18, 2018

Having an HSTS header makes a certificate error unrecoverable on the user end, which is obviously very bad. The --auto-hsts option should try to avoid contributing to such situations if at all possible.

It's reasonably common for certificate renewal to be failing for some silly reason, but if that's happening the auto hsts code should notice and disable the HSTS header to mitigate the damage.

Ideally there should be a constraint N + HSTS-max-age-in-days < 30, if that's possible.

@bmw
Copy link
Member

bmw commented Jul 19, 2018

Happy to have a separate meeting or just chat here, but I'm unsure we want to add this feature. I have some concerns about how much this really buys users when it works, the additional code complexity, and the additional cognitive complexity pushed on users trying to understand this feature, but my biggest concerns are:

Do we want users to click through TLS warnings?

The main motivation I heard for this feature and the reason we need it now when we didn't for HTTP->HTTPS redirects is that users can't click through TLS warnings on sites that used HSTS. Is the fact that we disable HSTS so users can do this beneficial? I know many users do click through these warnings when they can, but most should not and browsers are trying all the time to reduce the rate that users do so.

Even more seriously with us talking about making this feature Certbot's default behavior some day, do all sysadmins want their users to (be able to) send their data to a potentially compromised site or should they wait for things to be fixed? I think that depends on the Certbot user and the site and us making this change allows users to make more harmful decisions by default. Perhaps we could make this aspect of AutoHSTS configurable, but that brings me back to my concerns about code and cognitive complexity.

We can't disable HSTS reliably

There are many cases where both renewal will fail and we'll be unable to disable HSTS. A few examples of this are:

Neither us or our users can rely on this feature to work. I think features like this can lull us and our users into a false sense of security and leave everyone in a bad place when it doesn't work as expected. I'd rather we worked under the assumption that we cannot do this and focus on improving documentation, features, and usability in ways we expect to be reliable.

@pde
Copy link
Member Author

pde commented Jul 20, 2018

A large part of the problem we face is browser behavior. Browsers treat certificate expiry as equally severe a failure mode as having a name mismatch or a self-signed cert, which was a poor choice; they'd do better to show more moderate but unmissable UI degradation.

Having said that, your point about not wanting to encourage cert warning click-through is reasonable. Option: also disable redirects at the same time as HSTS.

But given the current state of affairs (browsers treat expiry the same way as MITM attacks; browsers don't offer a way to disable HTTP without also making errors user-unrecoverable; we know from experience that it's hard to categorically prevent Certbot from failing sometimes) my instinct was to ship patches that reduce the damage.

A few key questions:

  1. how much complexity will this fallback actually add?
  2. what fraction of the time would this actually prevent unrecoverable errors for users? My sense after your post is about 50% of the time, but that's based on a hunch and not good data.

Certainly a call for you and @joohoi to make.

@joohoi
Copy link
Member

joohoi commented Jul 24, 2018

After thinking this through for quite some time I'm leaning towards thinking we probably should not do this after all for reasons other than possible issues with the implementation and / or reliability.

At the time of the first renewal, AutoHSTS max-age should be 24h. The lockout would most likely be a lot shorter, as 24h is the worst case scenario where the user has made the last request just before the certificate expired.

To prevent the lockout, we would have to disable HSTS at least 24h prior to the expiration, leaving the server vulnerable and breaking the promise of keeping the configuration secure. Also in the end the issue is the certificate and/or renewal automation which the user should probably fix in the first place.

@pde
Copy link
Member Author

pde commented Jul 24, 2018

I have probably come to agree here, though for different reasons :). I don't think the 24 hour window is an issue (we can act well before expiry), but I am persuaded the correlation between Apache authenticator failure and failure of this enhancement rollback mechanism might be quite high (maybe even >80%). And doing this work for the 20% case probably isn't worth it.

@pde pde closed this as completed Jul 24, 2018
@bmw bmw removed this from the 0.27.0 milestone Jul 24, 2018
@bmw
Copy link
Member

bmw commented Jul 24, 2018

It sounds like we're all in agreement here, but @pde, if you're possibly not convinced we shouldn't do this, I'm happy to chat about it more if you want. The more I think and talk about it, the more confident I become we shouldn't add this feature, but if you think not doing so might be a significant mistake, I'm happy to continue the conversation.

@pde
Copy link
Member Author

pde commented Jul 24, 2018 via email

@bmw
Copy link
Member

bmw commented Jul 25, 2018

I can think of a few problems with that:

  1. This won't work for problems that are occurring outside of the Apache plugin. There are examples of this in my previous list.
  2. Any changes the user made to their vhost files after running Certbot will be lost.
  3. The reverter currently has to rollback each checkpoint in the opposite order they were made. This means that any other changes Certbot made to the user's config such as enabling TLS on other vhosts will have to be reverted as well.
  4. Apache needs to be restarted for any config changes to take effect.

We could start to work around 2 - 3 by doing #5553. In addition to using git, we'd have to store outside of the Apache plugin which changes need to be reverted if renewal is failing and develop a strategy for handling any conflicts that occur. I think we'd probably want to avoid modifying the user's config in the case of any conflicts.

To solve 4, we'd need to reload Apache outside of the Apache plugin. Since how to start Apache is user configurable, depends on the distro, can be a sequence of commands, and we probably want this to work for other plugins like Nginx, we'd need a way for a plugin to provide and Certbot to retrieve and store arbitrary Python/shell commands that should be run to reload the server. The restart command and commits to revert need to be saved somewhere that we expect to be reliable in the case of arbitrary Certbot failures.

I think using the reverter to work around plugin failures is a lot more work and comes with its own set of problems. Additionally, my concerns about the reliability of this feature aren't attached to a specific implementation. I think any attempt to perform cleanup in the case of arbitrary failures has to be on a best effort basis which means that neither we or our users should rely on them to work.

Since my previous post, I also have thought and talked about user promises/expectations and the fact this feature focuses on the side effects of unattended renewal failures rather than the issue itself. If you think it's a good idea for me to expand on these issues, I can do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants