
Wallabag can't parse JS-rendered pages #6737

Open
pipoprods opened this issue Jul 18, 2023 · 0 comments · May be fixed by #7060

Comments

@pipoprods

More and more sites render their content in frontend JavaScript. Wallabag can't handle such content because its parser (curl-like) doesn't run any scripts.

It would be great if Wallabag could parse these pages.
We could achieve this by fetching problematic sites with a headless Chromium.

This project could help get this working: https://github.com/gildas-lormeau/single-file-cli
I've run experiments on the command line, and `lynx -dump` shows the page content properly.

The next steps I plan to experiment with are:

  • make a Web API from single-file-cli (see the sketch after this list)
  • manually feed my Wallabag instance with pages from this API
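
Here's a minimal sketch of what that web API could look like: a tiny Node/TypeScript HTTP server that shells out to single-file-cli and returns the rendered HTML. This assumes `single-file` is on the PATH with a Chromium it can drive, and relies on its `--dump-content` option to print the page to stdout; the `/fetch` endpoint name and port are made up for illustration, not a finished design.

```ts
// Rough sketch only: a minimal HTTP wrapper around single-file-cli.
// Assumes `single-file` is on the PATH and can drive a local Chromium;
// the endpoint path and port are hypothetical.
import { createServer } from "node:http";
import { execFile } from "node:child_process";

const PORT = 8080;

createServer((req, res) => {
  const reqUrl = new URL(req.url ?? "/", `http://localhost:${PORT}`);
  const target = reqUrl.searchParams.get("url");
  if (reqUrl.pathname !== "/fetch" || !target) {
    res.writeHead(400).end("usage: GET /fetch?url=<page>");
    return;
  }
  // --dump-content asks single-file-cli to print the rendered,
  // self-contained HTML to stdout instead of writing a file.
  execFile(
    "single-file",
    [target, "--dump-content"],
    { maxBuffer: 50 * 1024 * 1024 }, // rendered pages can be large
    (err, stdout) => {
      if (err) {
        res.writeHead(502).end(`rendering failed: ${err.message}`);
        return;
      }
      res.writeHead(200, { "Content-Type": "text/html; charset=utf-8" });
      res.end(stdout);
    },
  );
}).listen(PORT);
```

For the second step, Wallabag (or a feeding script) could then be pointed at `http://localhost:8080/fetch?url=...` to get back a fully rendered page.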

If it works correctly, we could imagine this as a fallback for when the internal parser fails to retrieve the page body.

We'll have to deal with security issues: it would be dangerous to let arbitrary URLs be handled by a headless Chromium. The fallback API should restrict which pages it accepts to handle.
This will probably lead to friction between user expectations and admin restrictions.
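
As a sketch of that restriction, the API could check incoming URLs against an admin-maintained allowlist before handing them to the browser. The host list and helper name below are hypothetical; the guard would sit at the top of the `/fetch` handler from the earlier sketch, so denied URLs never reach Chromium.

```ts
// Illustrative allowlist guard; hosts and helper name are made up.
const ALLOWED_HOSTS = new Set(["example.com", "spa.example.org"]);

function isAllowed(target: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(target);
  } catch {
    return false; // reject anything that isn't a valid absolute URL
  }
  // Reject non-http(s) schemes (file:, chrome:, ...) outright.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return false;
  }
  return ALLOWED_HOSTS.has(parsed.hostname);
}
```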

pipoprods pushed a commit to pipoprods/wallabag that referenced this issue Nov 9, 2023
pipoprods linked a pull request Nov 9, 2023 that will close this issue