[Feature] Some sites block scraping content without javascript. #6447

sherlcok314159 · 2024-05-10T03:32:51Z

Some sites cannot be scraped without JavaScript. I tried different user agents, such as curl/8.21, but all of them failed.

Site: https://rsshub.app/zhubai/posts/havefun

Alkarex · 2024-05-11T21:39:28Z

You can try curl-impersonate (https://github.com/lwthiker/curl-impersonate/), which sometimes helps.
Otherwise, you will need a more sophisticated system.
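One way to check manually whether curl-impersonate gets past the block is to call it from Python. This is a minimal sketch that assumes one of curl-impersonate's browser wrapper scripts (here `curl_chrome116`; the exact names depend on the release you installed) is on `PATH`. `build_impersonate_cmd` and `fetch_impersonated` are hypothetical helper names, not part of curl-impersonate.

```python
from __future__ import annotations

import shutil
import subprocess


def build_impersonate_cmd(url: str, wrapper: str = "curl_chrome116") -> list[str]:
    # The wrapper scripts accept ordinary curl flags:
    # -s hides the progress meter, -L follows redirects.
    return [wrapper, "-s", "-L", url]


def fetch_impersonated(url: str, wrapper: str = "curl_chrome116", timeout: float = 30.0) -> str:
    # Fail early with a clear error if the (assumed) wrapper is not installed.
    if shutil.which(wrapper) is None:
        raise FileNotFoundError(f"{wrapper} not found on PATH")
    # check=True raises CalledProcessError if curl exits non-zero.
    result = subprocess.run(
        build_impersonate_cmd(url, wrapper),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

For example, `fetch_impersonated("https://rsshub.app/zhubai/posts/havefun")` returns the response body, which you can compare against what a browser shows. If the content is still missing, the page most likely needs JavaScript to run.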

sherlcok314159 · 2024-05-12T01:57:07Z

Thanks. But how can I combine this with FreshRSS?

Alkarex · 2024-05-12T17:16:24Z

A typical approach is to use a system such as RSS Bridge, which outputs an RSS feed that FreshRSS can consume.
But the first step is to find an approach that works manually.
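To illustrate the "outputs an RSS feed" part: once you can extract posts manually, turning them into a feed FreshRSS can subscribe to is the easy step. This is a minimal sketch using only the Python standard library, not RSS Bridge itself. `build_rss` and the item keys (`title`, `link`, optional `published`) are hypothetical choices for this example.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import format_datetime


def build_rss(title: str, link: str, items: list[dict]) -> str:
    """Serialize already-scraped items as a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Scraped from {link}"
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        # Using the post URL as guid lets the reader deduplicate entries.
        ET.SubElement(node, "guid").text = item["link"]
        published: datetime | None = item.get("published")
        if published is not None:
            # RSS 2.0 expects RFC 822 dates, which email.utils produces.
            ET.SubElement(node, "pubDate").text = format_datetime(published)
    return ET.tostring(rss, encoding="unicode")
```

Write the result to a file, serve it over HTTP (for example with `python -m http.server`), and add that URL in FreshRSS like any other feed.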

squromiv · 2024-05-22T05:18:37Z

> Some sites can not be scraped without javascript

Try the feedless tool. It can help in some cases.

sherlcok314159 · 2024-05-22T07:59:18Z

Thanks for the replies above. My solution is to run a local headless browser from Python to handle this. It is quite lightweight.
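The script itself is not shared here, so the following is only a sketch of the general idea, assuming Playwright for Python as the headless browser (`pip install playwright`, then `playwright install chromium`). `render_page`, `LinkExtractor` and `extract_links` are hypothetical names. Pulling out every `<a>` element is a placeholder, because the real markup of the target page is unknown and you would normally target its specific post elements.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin


def render_page(url: str, wait_selector: str | None = None, timeout_ms: int = 30000) -> str:
    """Load the page in headless Chromium so its JavaScript runs, then return the DOM as HTML."""
    # Imported lazily so the parsing helpers below work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            if wait_selector:
                # Optionally wait until a known element has been rendered.
                page.wait_for_selector(wait_selector, timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()


class LinkExtractor(HTMLParser):
    """Collect (text, absolute URL) pairs for every <a href> in the rendered HTML."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[tuple[str, str]] = []
        self._href: str | None = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace; skip links with no visible text.
            text = " ".join("".join(self._text).split())
            if text:
                self.links.append((text, self._href))
            self._href = None


def extract_links(html: str, base_url: str) -> list[tuple[str, str]]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Usage would be along the lines of `extract_links(render_page(url), url)`. The resulting (title, link) pairs could then be written out as an RSS feed for FreshRSS.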

sherlcok314159 closed this as completed May 22, 2024
