Define unique user-agent #107

danDanV1 · 2018-07-03T04:44:07Z

The user agent for hstspreload requests is generic:
user-agent: Go-http-client/2.0

Can this be set to something specific to identify the bot? This would enable server admins to whitelist the bot if necessary and distinguish it from any other bot using the Go http library.

Can I suggest hstspreload client/2.0?

lgarron · 2018-07-03T05:02:50Z

So far, we've tried to encourage configurations that did not depend on any feature of the client, especially things like the user agent or source IP. The HSTS preload list website has no promises (only requirements), and there are no guarantees any particular part of the system will remain the same in the future.

> This would enable server admins to whitelist the bot if necessary and distinguish it from any other bot using the Go http library.

It is safest if every (modern) browser and everyone connecting using other libraries (e.g. via the hstspreload command-line tool) gets the same responses with the same headers. In particular, a site will not end up on Mozilla's HSTS preload list unless their scanner is able to observe the header.

This also brings to mind the issue that people copy-paste recommendations from others on the internet. If these "recommended" configurations have a specific user agent they can start sniffing, preloading issues could become very difficult to debug. (I try to track bad recommendations on the web and ask their owners to improve them, but this is a manual, imperfect process.)

For these reasons, I'm against tweaking the user agent. However, I'll let @nharper have final say about it.

lightswitch05 · 2018-07-23T19:33:59Z

In my case, having a custom user agent would have prevented the bot from being blacklisted. Because it uses the default Go-http-client/1.1 and Go-http-client/2.0 user agents, it was flagged as someone scraping the site. Blocking based on user agent is a quick fix, although not a really great one. Anyway, it probably was a bot scraping the site, but by blacklisting that user agent, hstspreload was thrown under the bus as well.

lgarron · 2018-08-20T07:15:58Z

@nharper, do you have an opinion about this either way?

nharper · 2018-08-20T17:33:41Z

I don't think the hstspreload tool needs to specify its own UA. In general, I don't like servers sniffing UA strings.

devonobrien · 2021-09-02T20:32:19Z

Reopening this issue as we now have a more compelling reason to reconsider custom UA strings.

The outbound scans used by hstspreload.org and the bulk updates to check for preloading eligibility have started to be blocked by several CDNs' spam/fraud detection. These CDNs only offer allowlisting by (User-Agent, ASN)-tuples, and they are understandably not a fan of allowlisting the default go UA string. While I agree that we do not want to encourage UA sniffing generally speaking, I don't think we have many other options here. Once we get a custom string, we still need to reach out to the affected CDNs to start the process of unblocking.

I'd not seen this discussion before filing #118 and tweaking the UA string to "user-agent: hsts-preload-bot" on a new branch, but I'm happy to settle on a more amenable custom string, if folks have a strong opinion on what it should be.

lgarron · 2021-09-02T20:37:09Z

> Reopening this issue as we now have a more compelling reason to reconsider custom UA strings.

I think now's a good time!

This has turned into something people ask for more regularly.

We could still discourage UA detection by scanning using the default user agent first, and e.g. redoing the whole scan using the custom UA if there is a relevant failure.
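A minimal sketch of that fallback idea, not the tool's actual code. It assumes that a "relevant failure" means the response carried no Strict-Transport-Security header, and it uses the hsts-preload-bot string proposed above. The names fetchHSTS, scanWithFallback, and newBlockingServer are hypothetical, and the local test server only imitates a CDN that rejects Go's default user agent.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// customUA is the example string floated in this thread, not a settled value.
const customUA = "hsts-preload-bot"

const hstsValue = "max-age=31536000; includeSubDomains; preload"

// fetchHSTS requests url and returns the Strict-Transport-Security header.
// An empty ua leaves Go's default User-Agent ("Go-http-client/x.y") in place.
func fetchHSTS(client *http.Client, url, ua string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	if ua != "" {
		req.Header.Set("User-Agent", ua)
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return resp.Header.Get("Strict-Transport-Security"), nil
}

// scanWithFallback scans with the default UA first. Only when that attempt
// errors or yields no HSTS header (the assumed "relevant failure") does it
// retry with customUA. usedFallback reports whether the retry happened.
func scanWithFallback(client *http.Client, url string) (hsts string, usedFallback bool, err error) {
	hsts, err = fetchHSTS(client, url, "")
	if err == nil && hsts != "" {
		return hsts, false, nil
	}
	hsts, err = fetchHSTS(client, url, customUA)
	return hsts, true, err
}

// newBlockingServer imitates a CDN that blocks the default Go UA and serves
// the HSTS header to everyone else. It exists only for this demonstration.
func newBlockingServer() *httptest.Server {
	return httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.HasPrefix(r.UserAgent(), "Go-http-client/") {
			http.Error(w, "blocked", http.StatusForbidden)
			return
		}
		w.Header().Set("Strict-Transport-Security", hstsValue)
		fmt.Fprintln(w, "ok")
	}))
}

func main() {
	srv := newBlockingServer()
	defer srv.Close()

	hsts, usedFallback, err := scanWithFallback(srv.Client(), srv.URL)
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	fmt.Printf("HSTS=%q usedFallback=%v\n", hsts, usedFallback)
}
```

Sites that send the header unconditionally never see the custom string in this scheme, which is what keeps it from becoming something to sniff for.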

nharper · 2021-09-02T21:14:42Z

If we want to change the UA string from the default golang string, we could also consider scanning using a few common UA strings from browsers. This way the behavior observed by the probe would more closely match what browsers would see in the real world.

lgarron · 2021-09-02T21:25:30Z

> If we want to change the UA string from the default golang string, we could also consider scanning using a few common UA strings from browsers. This way the behavior observed by the probe would more closely match what browsers would see in the real world.

Sounds like a good idea! Maybe also include things like curl, which benefits more from dynamic HSTS than actual browsers (who have preload lists)? :-D

jdeblasio · 2021-09-02T22:46:08Z

Scanning multiple times with multiple UAs feels a little silly to me. The number one reason to scan at all is to ensure that the site has authorized preloading. As long as that works with any UA, I'm comfortable saying that we're authorized. Conversely, scanning n times adds a bunch of additional complexity. Besides the obvious code complexity, there's also more legwork for maintainers when the emails with harder-to-debug failures start rolling in.

That's not to say that I'm 100% opposed to this, but I'm not totally clear on what problem scanning with multiple UAs would actually solve.

lgarron · 2021-09-02T23:12:08Z

@jdeblasio hstspreload.org has always scanned for more than the super-basic requirements, and issued errors or warnings for practices that could leave users unprotected.

If a site is dynamically calculating whether to send an HSTS header, then users with a client that doesn't have preloaded HSTS are more likely to be unprotected because they're not getting dynamic HSTS.
(This is getting less and less of a concern, but it certainly has had its value.)

Also, dynamic HSTS configuration means that the header may change or get dropped by accident. We had to specifically add a guard for the removal criteria because this was happening too often, and I think it would be good to encourage sending the header as unconditionally as possible.

nharper · 2021-09-02T23:26:54Z

> The outbound scans used by hstspreload.org and the bulk updates to check for preloading eligibility have started to be blocked by several CDNs' spam/fraud detection. These CDNs only offer allowlisting by (User-Agent, ASN)-tuples, and they are understandably not a fan of allowlisting the default go UA string.

I'm assuming (perhaps incorrectly) that CDNs are adding the STS header based on a configuration option. Could you work with CDNs so that the HTTP response they send in spam/fraud cases still includes the STS header, i.e. apply the STS header before the spam/fraud check? (This is also assuming that the CDN's response to such a request is an HTTP response vs closing a connection or similar.) That would be more in line with the philosophy that an STS header should be set unconditionally on a domain.

jdeblasio · 2021-09-02T23:35:40Z

I think I'd like to argue that we should spin off the "check multiple fetches with multiple UAs" idea into a separate feature request.

I 100% agree that scanning for more than super-basic requirements is great, and that this could help solve a real issue that occurs in some cases. There are just also some additional risks. One thing I'm worried about, for instance, is that we'll run into CDNs who aren't enthusiastic about allowlisting fetches that look like they're from a bot but are using a browser-like UA string. If we encounter that, then we've obligated ourselves to either bake in ways to account for those CDNs (more complexity), or remove the check (wasted effort).

Separate from that improvement is the present buggy reality that some folks behind CDNs can't preload their domains without manual intervention because those requests are getting blocked.

The former is a cool nice-to-have. The latter needs addressing pretty urgently.

lgarron · 2021-09-02T23:40:02Z

> The former is a cool nice-to-have. The latter needs addressing pretty urgently.

Could I ask what makes it urgent? I think it's worth looking at solutions, but we've successfully asked sites to handle this on their end for over half a decade.

Do we know what CDNs are causing most of the issues? Is it e.g. mostly Cloudflare?

We could consider asking them if they would apply the domain's HSTS setting to their interception page.
This would mean we don't see the correct response and redirect chain, but it's another option.

lgarron · 2021-09-02T23:42:11Z

In any case, I offer this strawperson:

Run all the tests with an up-to-date Chrome UA. (Is there a good dynamic way to get that, ideally without manual work or a network call? Maybe a GitHub action to update a config file?)
Issue one additional request to the root path over HTTPS with the default user agent and add a warning if the response has a different HSTS header / status code / redirect.

jdeblasio · 2021-09-02T23:57:09Z

There's another reality here: we don't have a ton of cycles for HSTS preload stuff right now. We (the Chromium-based maintainers) are definitely committed to supporting the list for as long as it's valuable, and we might be able to give it more cycles in the future, but presently we're looking to get the most value per (very little) time spent.

Setting a single UA header is a trivial change that meets the present need. We'd also be delighted to receive PRs for more comprehensive solutions.

devonobrien · 2021-09-03T00:45:29Z

From my perspective, setting an hstspreload-specific UA string gets us an immediate win with virtually no downside, and whether sites selectively serve headers based on UA string is a bit of an orthogonal issue to what hstspreload uses. We can consider fancier approaches later if we can articulate benefits that are worth the implementation effort.

Site operators are already responsible for the consequences of "bad" HSTS behavior like ignoring the deployment recommendations when submitting their domain for preloading, regardless of whether they selectively serve headers based on UA. The immediate need we have now is for hstspreload.org and our bulk update infrastructure to be identifiable so its header checks can be unblocked at the CDN level. We've so far identified 2 CDNs (including Cloudflare) that are known to be blocking requests, and after discussing it with them, the established way to circumvent this for bots is to allowlist based on ASN and UA string.

If there are no objections to this immediate path forwards, I suggest we:

update the UA string to e.g. hsts-preload-bot in redirects.go and response.go,
update the version of hstspreload used on hstspreload.org,
work with the CDNs on allowlisting.
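For reference, overriding the default UA with Go's net/http is a one-line header change per request. The following is a minimal sketch rather than the actual contents of redirects.go or response.go; newScanRequest and observedUserAgent are hypothetical helpers, and the local test server is only there to show which User-Agent the server side would see.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// botUserAgent is the example string from this thread, not a settled value.
const botUserAgent = "hsts-preload-bot"

// newScanRequest builds a GET request that identifies itself with
// botUserAgent instead of Go's default "Go-http-client/x.y".
func newScanRequest(url string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", botUserAgent)
	return req, nil
}

// observedUserAgent sends one request to a throwaway local server and
// returns the User-Agent string that server received.
func observedUserAgent() (string, error) {
	seen := make(chan string, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen <- r.UserAgent()
	}))
	defer srv.Close()

	req, err := newScanRequest(srv.URL)
	if err != nil {
		return "", err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	resp.Body.Close()
	return <-seen, nil
}

func main() {
	ua, err := observedUserAgent()
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("server saw User-Agent:", ua)
}
```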

nharper closed this as completed Aug 20, 2018

nharper mentioned this issue Jun 10, 2021

Set unique user-agent chromium/hstspreload.org#180

Open

nharper mentioned this issue Sep 2, 2021

Set custom User-Agent string for hstspreload #118

Closed

devonobrien reopened this Sep 2, 2021

devonobrien mentioned this issue Jan 31, 2022

Set custom user agent string for hstspreload #121

Merged
