Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't check preconnect links #1187

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Don't check preconnect links #1187

wants to merge 5 commits into from

Conversation

mre
Copy link
Member

@mre mre commented Jul 29, 2023

Preconnect links are used to establish a server connection without loading a specific resource yet.
Not always do these links point to a URL that should return a 200, and they are not user-facing, i.e. they don't show up in the final rendered version of a page.

Therefore, I think we should exclude them at all; not even in --include-verbatim
mode, as they might not point to a valid resource.

Fixes #897

Preconnect links are used to establish a server connection without loading a
specific resource yet. Not always do these links point to a URL that should
return a 200, and they are not user-facing, i.e. they don't show up in the
final rendered version of a page.

Therefore, I think we should them at all; not even in `--include-verbatim`
mode, as they might not point to a valid resource.

Fixes #897
@mre
Copy link
Member Author

mre commented Aug 17, 2023

Huh, I don't know why this breaks a test. Somehow the parser state gets tripped up by my change. Maybe I need to reset some state.
https://github.com/lycheeverse/lychee/actions/runs/5872452302/job/15923983895?pr=1187

@untitaker, any ideas?

#[test]
fn test_skip_preconnect() {
let input = r#"
<link rel="preconnect" href="https://example.com">
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding a testcase for the opposite order of attributes, <link href=... rel=...> -- not sure at the moment if it would pass

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it fails.
The other, similar nofollow feature is broken in the same way and the test fails for that, too, if I change the ordering of attributes.

@mre
Copy link
Member Author

mre commented Aug 22, 2023

I'm thinking I need to improve the attribute handling in general to make the tests pass.

I would have to cache the attributes and only extract links once I'm fully done with reading all attributes for a single tag.
Quite frankly, driving the HTML parsing state-machine is a bit finicky with html5gum. Is there a way to change the parsing such that I can get a fully parsed set of attributes for a tag? It would probably come with a performance penalty, but it would reduce the complexity of the implementation.

(That's the one reason why I haven't removed html5ever support yet. I like how simple the parsing is, although it's slower of course.)

@untitaker
Copy link
Collaborator

instead of writing your own emitter you can always just read through the DefaultEmitter (like how tags are being processed in the README), though it is a lot slower and I think also not competitive with html5ever

@mre
Copy link
Member Author

mre commented Aug 24, 2023

@untitaker, the way I would model it (without switching to the DefaultEmitter) would be to create a HashMap with key being the attribute and value being a list/set of links, e.g. { key: "href", "value": [https://example.com] }. When I reach the end of a tag, I check the attributes and emit the list of links if there are no excluded attributes (like rel=prefetch).

Does that make sense? Is there an easier way in html5gum (without DefaultEmitter)? It feels like the perf could suffer from the additional bookkeeping.

@untitaker
Copy link
Collaborator

untitaker commented Aug 24, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't check preconnect links if not in verbatim mode
2 participants