Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a safe_url based on all standards #221

Draft
wants to merge 36 commits into
base: master
Choose a base branch
from

Conversation

Gallaecio
Copy link
Member

@Gallaecio Gallaecio commented Feb 13, 2024

Continuation of #201.

Fixes #193.

Tasks:

  • Provide an alternative implementation for safe_url_string based on the URL living standard
  • Add missing typing information
  • Provide comprehensive test coverage for both implementations, highlighting the differences, and in the process make the new implementation support also RFC 2396 (used by java.net.URI) and RFC 3986 (the previous standard, still popular).
  • Ensure the following issues are addressed by the new implementation:
  • Have the URL living standard parse implementation mostly pass the upstream tests.
  • Improve performance
    The new implementation is currently around 3-4 times slower than the previous implementation for test cases where both have the same, non-error outcome.
  • Address to-dos in test_parse_url.
  • Clean up the implementation: cleaner code, docstrings…
  • Discuss what to do API-wise (deprecate safe_url_string in favor of safe_url? Remove safe_url and make safe_url_string use the new implementation?

@Gallaecio
Copy link
Member Author

I am having a hard time lowering the gap below the current one, which is 4-3 times slower than the previous implementation 😞

@Gallaecio
Copy link
Member Author

Looking at scrapy/scrapy#1306, maybe we could try going for https://chromium.googlesource.com/chromium/src/+/HEAD/url/ .

@Gallaecio
Copy link
Member Author

@kmike
Copy link
Member

kmike commented Mar 16, 2024

An interesting project: https://github.com/TkTech/can_ada

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

safe_url_string handling IPv6 URLs
2 participants