Define unique user-agent #107

Open
danDanV1 opened this issue Jul 3, 2018 · 16 comments

@danDanV1

danDanV1 commented Jul 3, 2018

The user agent for hstspreload requests is generic
user-agent: Go-http-client/2.0

Can this be set to something specific to identify the bot? This would enable server admins to whitelist the bot if necessary and distinguish it from any other bot using the Go HTTP library.

Can I suggest hstspreload client/2.0?
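For readers unfamiliar with how a Go client ends up sending `Go-http-client/2.0`: that is the default `net/http` sends when a request carries no `User-Agent` header. The following is a minimal sketch of overriding it on a single request. The helper name `newProbeRequest` and the value `hstspreload-example/1.0` are placeholders for illustration, not anything the hstspreload project defines.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// exampleUserAgent is a placeholder value for illustration only.
const exampleUserAgent = "hstspreload-example/1.0"

// newProbeRequest (hypothetical helper) builds a GET request that identifies
// itself with userAgent instead of Go's default "Go-http-client/x.y" string.
func newProbeRequest(url, userAgent string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", userAgent)
	return req, nil
}

func main() {
	// Local test server that echoes back the User-Agent it received.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.UserAgent())
	}))
	defer srv.Close()

	req, err := newProbeRequest(srv.URL, exampleUserAgent)
	if err != nil {
		panic(err)
	}
	resp, err := srv.Client().Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println("server saw:", string(body))
}
```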

@lgarron
Collaborator

lgarron commented Jul 3, 2018

So far, we've tried to encourage configurations that did not depend on any feature of the client, especially things like the user agent or source IP. The HSTS preload list website has no promises (only requirements), and there are no guarantees any particular part of the system will remain the same in the future.

> This would enable server admins to whitelist the bot if necessary and distinguish it from any other bot using the Go HTTP library.

It is safest if every (modern) browser and everyone connecting using other libraries (e.g. via the hstspreload command-line tool) gets the same responses with the same headers. In particular, a site will not end up on Mozilla's HSTS preload list unless their scanner is able to observe the header.

This also brings to mind the issue that people copy-paste recommendations from others on the internet. If these "recommended" configurations have a specific user agent they can start sniffing, preloading issues could become very difficult to debug. (I try to track bad recommendations on the web and ask their owners to improve them, but this is a manual, imperfect process.)

For these reasons, I'm against tweaking the user agent. However, I'll let @nharper have final say about it.

@lightswitch05

In my case, a custom user agent would have prevented the bot from being blacklisted. The default Go-http-client/1.1 and Go-http-client/2.0 user agents were flagged as someone scraping the site. Blocking based on user agent is a quick fix, although not a great one. Anyway, it probably was a bot scraping the site, but by blacklisting it, this scanner was thrown under the bus as well.

@lgarron
Collaborator

lgarron commented Aug 20, 2018

@nharper, do you have an opinion about this either way?

@nharper
Collaborator

nharper commented Aug 20, 2018

I don't think the hstspreload tool needs to specify its own UA. In general, I don't like servers sniffing UA strings.

@devonobrien
Collaborator

Reopening this issue as we now have a more compelling reason to reconsider custom UA strings.

The outbound scans used by hstspreload.org and the bulk updates to check for preloading eligibility have started to be blocked by several CDNs' spam/fraud detection. These CDNs only offer allowlisting by (User-Agent, ASN) tuples, and they are understandably not fans of allowlisting the default Go UA string. While I agree that we do not want to encourage UA sniffing generally speaking, I don't think we have many other options here. Once we get a custom string, we still need to reach out to the affected CDNs to start the process of unblocking.

I'd not seen this discussion before filing #118 and tweaking the UA string to "user-agent: hsts-preload-bot" on a new branch, but I'm happy to settle on a more amenable custom string, if folks have a strong opinion on what it should be.

@devonobrien devonobrien reopened this Sep 2, 2021
@lgarron
Collaborator

lgarron commented Sep 2, 2021

> Reopening this issue as we now have a more compelling reason to reconsider custom UA strings.

I think now's a good time!

This has turned into something people ask for more regularly.

We could still discourage UA detection by scanning using the default user agent first, and e.g. redoing the whole scan using the custom UA if there is a relevant failure.
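As a rough sketch of that "default first, custom UA on failure" idea: the code below probes with Go's default user agent and only retries with a custom one when no HSTS header was observed. Treating "a relevant failure" as "transport error or missing header" is an assumption made here for illustration, and `fetchHSTS`, `scanWithFallback` and `newPickyServer` are hypothetical names, not functions from hstspreload. The `hsts-preload-bot` value is the string mentioned earlier in this thread.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// customUA is the string floated earlier in the thread; it may not be final.
const customUA = "hsts-preload-bot"

// fetchHSTS returns the Strict-Transport-Security header observed for url.
// An empty ua leaves Go's default User-Agent in place.
func fetchHSTS(client *http.Client, url, ua string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	if ua != "" {
		req.Header.Set("User-Agent", ua)
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return resp.Header.Get("Strict-Transport-Security"), nil
}

// scanWithFallback probes with the default UA first and retries with customUA
// only if the first attempt errored or saw no header (assumed failure rule).
func scanWithFallback(client *http.Client, url string) (hsts string, usedFallback bool, err error) {
	hsts, err = fetchHSTS(client, url, "")
	if err == nil && hsts != "" {
		return hsts, false, nil
	}
	hsts, err = fetchHSTS(client, url, customUA)
	return hsts, true, err
}

// newPickyServer simulates a site that blocks every client except customUA.
func newPickyServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.UserAgent() != customUA {
			w.WriteHeader(http.StatusForbidden)
			return
		}
		w.Header().Set("Strict-Transport-Security", "max-age=31536000; includeSubDomains; preload")
	}))
}

func main() {
	srv := newPickyServer()
	defer srv.Close()

	hsts, usedFallback, err := scanWithFallback(srv.Client(), srv.URL)
	if err != nil {
		panic(err)
	}
	fmt.Printf("fallback used: %v, header: %q\n", usedFallback, hsts)
}
```

A scanner built this way could report `usedFallback` as a warning, so sites that only pass via the custom UA still learn that their header is conditional.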

@nharper
Collaborator

nharper commented Sep 2, 2021

If we want to change the UA string from the default golang string, we could also consider scanning using a few common UA strings from browsers. This way the behavior observed by the probe would more closely match what browsers would see in the real world.

@lgarron
Collaborator

lgarron commented Sep 2, 2021

> If we want to change the UA string from the default golang string, we could also consider scanning using a few common UA strings from browsers. This way the behavior observed by the probe would more closely match what browsers would see in the real world.

Sounds like a good idea! Maybe also include things like curl, which benefits more from dynamic HSTS than actual browsers (which have preload lists)? :-D

@jdeblasio
Collaborator

From my perspective, scanning multiple times with multiple UAs feels a little silly. The number one reason to scan at all is to ensure that the site has authorized preloading. As long as that works with any UA, I'm comfortable saying that we're authorized. Conversely, scanning n times adds a bunch of additional complexity. Besides the obvious code complexity, there's also more legwork for maintainers when the emails with harder-to-debug failures start rolling in.

That's not to say that I'm 100% opposed to this, but I'm not totally clear on what problem scanning with multiple UAs would actually solve.

@lgarron
Collaborator

lgarron commented Sep 2, 2021

@jdeblasio hstspreload.org has always scanned for more than the super-basic requirements, and issued errors or warnings for practices that could leave users unprotected.

If a site is dynamically calculating whether to send an HSTS header, then users with a client that doesn't have preloaded HSTS are more likely to be unprotected because they're not getting dynamic HSTS.
(This is getting less and less of a concern, but it certainly has had its value.)

Also, dynamic HSTS configuration means that the header may change or get dropped by accident. We had to specifically add a guard for the removal criteria because this was happening too often, and I think it would be good to encourage sending the header as unconditionally as possible.

@nharper
Collaborator

nharper commented Sep 2, 2021

> The outbound scans used by hstspreload.org and the bulk updates to check for preloading eligibility have started to be blocked by several CDNs' spam/fraud detection. These CDNs only offer allowlisting by (User-Agent, ASN) tuples, and they are understandably not fans of allowlisting the default Go UA string.

I'm assuming (perhaps incorrectly) that CDNs are adding the STS header based on a configuration option. Could you work with CDNs so that the HTTP response they send in spam/fraud cases still includes the STS header, i.e. apply the STS header before the spam/fraud check? (This is also assuming that the CDN's response to such a request is an HTTP response vs closing a connection or similar.) That would be more in line with the philosophy that an STS header should be set unconditionally on a domain.

@jdeblasio
Collaborator

I'd like to argue that we should spin off the "fetch with multiple UAs" idea into a separate feature request.

I 100% agree that scanning for more than super-basic requirements is great, and that this could help solve a real issue that occurs in some cases. There are just also some additional risks. One thing I'm worried about, for instance, is that we'll run into CDNs who aren't enthusiastic about allowlisting fetches that look like they're from a bot but are using a browser-like UA string. If we encounter that, then we've obligated ourselves to either bake in ways to account for those CDNs (more complexity), or remove the check (wasted effort).

Separate from that improvement is the present buggy reality that some folks behind CDNs can't preload their domains without manual intervention because those requests are getting blocked.

The former is a cool nice-to-have. The latter needs addressing pretty urgently.

@lgarron
Collaborator

lgarron commented Sep 2, 2021

> The former is a cool nice-to-have. The latter needs addressing pretty urgently.

Could I ask what makes it urgent? I think it's worth looking at solutions, but we've successfully asked sites to handle this on their end for over half a decade.

Do we know what CDNs are causing most of the issues? Is it e.g. mostly Cloudflare?

We could consider asking them if they would apply the domain's HSTS setting to their interception page.
This would mean we don't see the correct response and redirect chain, but it's another option.

@lgarron
Collaborator

lgarron commented Sep 2, 2021

In any case, I offer this strawperson:

  • Run all the tests with an up-to-date Chrome UA. (Is there a good dynamic way to get that, ideally without manual work or a network call? Maybe a GitHub action to update a config file?)
  • Issue one additional request to the root path over HTTPS with the default user agent and add a warning if the response has a different HSTS header / status code / redirect.
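The second bullet amounts to a differential check. A minimal sketch of it, under several assumptions: `observation`, `observe` and `diffObservations` are hypothetical names, the browser-like UA string is a placeholder with a dummy version number rather than a real current Chrome UA, and the test server is a local stand-in for a site that varies its headers by user agent.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// browserLikeUA is a placeholder; the version number is deliberately fake.
const browserLikeUA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/0.0.0.0 Safari/537.36"

// observation records the parts of a response the strawperson compares.
type observation struct {
	Status   int
	HSTS     string
	Location string
}

// observe fetches url with the given UA (empty means Go's default).
func observe(client *http.Client, url, ua string) (observation, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return observation{}, err
	}
	if ua != "" {
		req.Header.Set("User-Agent", ua)
	}
	resp, err := client.Do(req)
	if err != nil {
		return observation{}, err
	}
	defer resp.Body.Close()
	return observation{
		Status:   resp.StatusCode,
		HSTS:     resp.Header.Get("Strict-Transport-Security"),
		Location: resp.Header.Get("Location"),
	}, nil
}

// diffObservations returns one warning per field that differs.
func diffObservations(primary, control observation) []string {
	var warnings []string
	if primary.Status != control.Status {
		warnings = append(warnings, fmt.Sprintf("status code differs: %d vs %d", primary.Status, control.Status))
	}
	if primary.HSTS != control.HSTS {
		warnings = append(warnings, fmt.Sprintf("HSTS header differs: %q vs %q", primary.HSTS, control.HSTS))
	}
	if primary.Location != control.Location {
		warnings = append(warnings, fmt.Sprintf("redirect differs: %q vs %q", primary.Location, control.Location))
	}
	return warnings
}

func main() {
	// Stand-in site that only sends HSTS to Chrome-looking clients.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.Contains(r.UserAgent(), "Chrome") {
			w.Header().Set("Strict-Transport-Security", "max-age=31536000")
		}
	}))
	defer srv.Close()

	client := srv.Client()
	// Do not follow redirects, so status and Location are observed directly.
	client.CheckRedirect = func(*http.Request, []*http.Request) error {
		return http.ErrUseLastResponse
	}

	primary, err := observe(client, srv.URL, browserLikeUA)
	if err != nil {
		panic(err)
	}
	control, err := observe(client, srv.URL, "")
	if err != nil {
		panic(err)
	}
	for _, w := range diffObservations(primary, control) {
		fmt.Println("warning:", w)
	}
}
```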

@jdeblasio
Collaborator

There's another reality here: we don't have a ton of cycles for HSTS preload stuff right now. We (the Chromium-based maintainers) are definitely committed to supporting the list for as long as it's valuable, and we might be able to give it more cycles in the future, but presently we're looking to get the most value per (very little) time spent.

Setting a single UA header is a trivial change that meets the present need. We'd also be delighted to receive PRs for more comprehensive solutions.

@devonobrien
Collaborator

From my perspective, setting an hstspreload-specific UA string gets us an immediate win with virtually no downside. Whether sites selectively serve headers based on UA string is largely orthogonal to which UA hstspreload uses. We can consider fancier approaches later if we can articulate benefits that are worth the implementation effort.

Site operators are already responsible for the consequences of "bad" HSTS behavior, like ignoring the deployment recommendations when submitting their domain for preloading, regardless of whether they selectively serve headers based on UA. The immediate need is for hstspreload.org and our bulk update infrastructure to be identifiable so that its header checks can be unblocked at the CDN level. We've so far identified 2 CDNs (including Cloudflare) that are known to be blocking requests, and after discussing it with them, the established way to circumvent this for bots is to allowlist based on ASN and UA string.

If there are no objections to this immediate path forward, I suggest we:

  • update the UA string to e.g. hsts-preload-bot in redirects.go and response.go,
  • update the version of hstspreload used on hstspreload.org,
  • work with the CDNs on allowlisting.
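One possible way to apply a single UA string across several call sites is an `http.RoundTripper` wrapper, sketched below. This is a generic `net/http` pattern and does not reflect how `redirects.go` or `response.go` actually build their clients; `uaTransport`, `newScannerClient`, `newRedirectingEchoServer` and `fetchBody` are hypothetical names introduced for the example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// uaTransport stamps a fixed User-Agent on every outgoing request,
// including the follow-up requests the client issues for redirects.
type uaTransport struct {
	userAgent string
	base      http.RoundTripper
}

func (t uaTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// RoundTrippers should not mutate the caller's request, so clone it.
	clone := req.Clone(req.Context())
	clone.Header.Set("User-Agent", t.userAgent)
	return t.base.RoundTrip(clone)
}

// newScannerClient returns a client whose requests all carry userAgent.
func newScannerClient(userAgent string) *http.Client {
	return &http.Client{
		Transport: uaTransport{userAgent: userAgent, base: http.DefaultTransport},
	}
}

// newRedirectingEchoServer redirects /start to /end, which echoes the UA.
func newRedirectingEchoServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/start", func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, "/end", http.StatusFound)
	})
	mux.HandleFunc("/end", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.UserAgent())
	})
	return httptest.NewServer(mux)
}

// fetchBody GETs url and returns the response body as a string.
func fetchBody(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	srv := newRedirectingEchoServer()
	defer srv.Close()

	seen, err := fetchBody(newScannerClient("hsts-preload-bot"), srv.URL+"/start")
	if err != nil {
		panic(err)
	}
	fmt.Println("UA seen after redirect:", seen)
}
```

Whether this is preferable to setting the header at each request site depends on how the existing clients are constructed, which this sketch does not assume.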
