
Wallabag can't parse JS-rendered pages #6737

Open
pipoprods opened this issue Jul 18, 2023 · 0 comments · May be fixed by #7060

Comments

@pipoprods

More and more sites render their content in frontend JavaScript. Wallabag can't handle such content because its parser (curl-like) doesn't run any scripts.

It would be great if Wallabag could parse these pages.
We could achieve this by fetching problematic sites with a headless Chromium.

This project could help get this working: https://github.com/gildas-lormeau/single-file-cli
I've run experiments on the command line, and `lynx -dump` shows the page content properly.

The next steps I plan to experiment with are:

  • make a Web API from single-file-cli (see the sketch after this list)
  • manually feed my Wallabag instance with pages from this API
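
Here's a minimal sketch of what that web API could look like: a tiny Node/TypeScript HTTP server that shells out to single-file-cli and returns the rendered HTML. This assumes `single-file` is on the PATH with a Chromium it can drive, and relies on its `--dump-content` option to print the page to stdout; the `/fetch` endpoint name and port are made up for illustration, not a finished design.

```ts
// Rough sketch only: a minimal HTTP wrapper around single-file-cli.
// Assumes `single-file` is on the PATH and can drive a local Chromium;
// the endpoint path and port are hypothetical.
import { createServer } from "node:http";
import { execFile } from "node:child_process";

const PORT = 8080;

createServer((req, res) => {
  const reqUrl = new URL(req.url ?? "/", `http://localhost:${PORT}`);
  const target = reqUrl.searchParams.get("url");
  if (reqUrl.pathname !== "/fetch" || !target) {
    res.writeHead(400).end("usage: GET /fetch?url=<page>");
    return;
  }
  // --dump-content asks single-file-cli to print the rendered,
  // self-contained HTML to stdout instead of writing a file.
  execFile(
    "single-file",
    [target, "--dump-content"],
    { maxBuffer: 50 * 1024 * 1024 }, // rendered pages can be large
    (err, stdout) => {
      if (err) {
        res.writeHead(502).end(`rendering failed: ${err.message}`);
        return;
      }
      res.writeHead(200, { "Content-Type": "text/html; charset=utf-8" });
      res.end(stdout);
    },
  );
}).listen(PORT);
```

For the second step, Wallabag (or a feeding script) could then be pointed at `http://localhost:8080/fetch?url=...` to get back a fully rendered page.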

If it works correctly, we could imagine this as a fallback for when the internal parser fails to retrieve the page body.

We'll have to deal with security issues: it would be dangerous to let arbitrary URLs be handled by a headless Chromium. The fallback API should restrict which pages it accepts to handle.
This will probably lead to friction between user expectations and admin restrictions.
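
As a sketch of that restriction, the API could check incoming URLs against an admin-maintained allowlist before handing them to the browser. The host list and helper name below are hypothetical; the guard would sit at the top of the `/fetch` handler from the earlier sketch, so denied URLs never reach Chromium.

```ts
// Illustrative allowlist guard; hosts and helper name are made up.
const ALLOWED_HOSTS = new Set(["example.com", "spa.example.org"]);

function isAllowed(target: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(target);
  } catch {
    return false; // reject anything that isn't a valid absolute URL
  }
  // Reject non-http(s) schemes (file:, chrome:, ...) outright.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return false;
  }
  return ALLOWED_HOSTS.has(parsed.hostname);
}
```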

pipoprods pushed a commit to pipoprods/wallabag that referenced this issue Nov 9, 2023
pipoprods linked a pull request Nov 9, 2023 that will close this issue