Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic public suffix list support. #1082

Closed
wants to merge 1 commit into from
Closed

Basic public suffix list support. #1082

wants to merge 1 commit into from

Conversation

rthalley
Copy link
Owner

@rthalley rthalley commented May 5, 2024

No description provided.

@codecov-commenter
Copy link

codecov-commenter commented May 5, 2024

Codecov Report

Attention: Patch coverage is 84.78261% with 14 lines in your changes are missing coverage. Please review.

Project coverage is 93.88%. Comparing base (bfd0919) to head (87765ce).
Report is 5 commits behind head on main.

Files Patch % Lines
dns/psl.py 84.78% 9 Missing and 5 partials ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1082      +/-   ##
==========================================
- Coverage   93.92%   93.88%   -0.05%     
==========================================
  Files         143      144       +1     
  Lines       13435    13527      +92     
  Branches     2606     2627      +21     
==========================================
+ Hits        12619    12700      +81     
- Misses        480      488       +8     
- Partials      336      339       +3     
Flag Coverage Δ
unittests 93.84% <84.78%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pspacek
Copy link
Collaborator

pspacek commented May 6, 2024

Hello! Out of curiosity, why this needs to be in dnspython?

I mean, there are at least two existing libraries for Python:

I wonder if there is a way improve dnspython interoperability with these? Or what is even the problem?

@rthalley
Copy link
Owner Author

rthalley commented May 6, 2024

I don't know that it "needs" to be in dnspython, but it goes along well with other things dnspython provides. Of the examples you cite, one hasn't been updated since 2019, and both of them use split(".") to convert a name into labels. This branch is based on code I've used for several years, and operates on dns.name.Name objects.

@pspacek
Copy link
Collaborator

pspacek commented May 8, 2024

I agree that split(".") is not how things should be done, definitely not. I'm just trying to find out if there is a good way to improve things for everyone and not just dnspython users, that's all.

@peterthomassen
Copy link
Contributor

I did a DNS-based implementation of the PSL as a Python module (https://github.com/sse-secure-systems/psl-dns). Docs and demo are at https://publicsuffix.zone/.

This is publicly queryable, i.e. doesn't require applications to bring the PSL along. Not sure if it aligns with what you have in mind for dnspython's PSL support, but using it might avoid duplicate work and in particular update problems.

@rthalley
Copy link
Owner Author

I think your solution is probably better if you don't have to make too many queries and don't mind the information leak, as it would not have up-to-date problems. I had automatic updating in an earlier draft, but I got rid of it. For my typical PSL use cases, I would probably be making a rude number of queries and/or not want the information leak, so I think there's still value in this code too.

Copy link
Collaborator

@bwelling bwelling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the other comments, in that I'm not sure that this needs to be in dnspython. There are no users of it here, and I'm not sure what any in-tree users would be doing.

That said, I don't object to it being added if there are reasons. I think the implementation looks fine, and only have a few minor comments.

Comment on lines +111 to +119
if not _have_httpx:
raise ValueError(
"download_if_needed is True but httpx is not available"
)
if not os.path.isfile(filename):
response = httpx.request("GET", url)
if response.status_code == 200:
with open(filename, "w") as f:
f.write(response.text)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some reason this couldn't use urllib.request(), or is it doing something that requires httpx?

node = _ExceptionNode(n)
else:
node = _ExactNode(n)
if self.suffixes.has_key(n):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should be n in self.suffixes, as has_key usage has been deprecated for years (assuming that works; NameDict probably should be defining __contains__, but MutableMapping does).

Also, is it possible that some alternate PSL might contain the root domain, which would conflict with the automatically added root domain earlier?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The root is not a public suffix, and the PSL algorithm stops at the top level (with all TLDs considered public suffixes).

My interpretation of that is that the root is outside of the reach of the PSL algorithm, and its appearance in this PR's code is just a data structure convenience.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that depends on whether the intent here is to support loading the official public suffix list, or other lists which are compatible. If the latter, there is a possibility that it could contain the root, as I don't see anything in the spec that explicitly forbids it.

I don't see anything in the linked algorithm that prevents there being a rule at the root. Also, according to the algorithm:

If no rules match, the prevailing rule is "*".

So, maybe the PR should be adding a default wildcard rule.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the latter, there is a possibility that it could contain the root, as I don't see anything in the spec that explicitly forbids it.

But there's no point to that, because all TLDs are public suffixes by definition (see your other quote above), so you can never climb up to the root as a TLD public suffix will always be in the way.

The code here handles that with the allow_unlisted_gtlds parameter which defaults to True.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean. If there's a rule at ., then there will never be the case that no rules match, because that rule will always match at the end. So, the text I quoted above would be irrelevant in this case.

I'm also not sure why any of this really matters. I simply suggested avoiding a misleading error. There's nothing in the spec that prevents a rule for ., so having a rule for . should not result in an error. Even if you think it should result in an error, it should not be a redefinition error, as it's not a redefinition.

@pspacek
Copy link
Collaborator

pspacek commented May 15, 2024

Wondering out loud: Maybe best course of action - for the overall ecosystem - would be to add label-aware API to an existing library? I agree that split('.') is horrendous thing to do, but if that's the only problem maybe it can be fixed in the library instead of reinventing it here?

@rthalley
Copy link
Owner Author

I'll just set this one aside for now.

@rthalley rthalley closed this May 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants