incomplete content on multiple pages #739

Open
Grienauer opened this issue Apr 17, 2023 · 2 comments

Comments

@Grienauer

On the following pages the parser currently seems to get lost.
I don't see any markup problems.
Maybe the newspapers detect and block the scraper?

https://www.derstandard.at/story/2000145508819/franzoesischer-verfassungsrat-stimmt-umstrittener-pensionsreform-zu
Here a notice is added to the text saying that some "software" is blocking content and should be removed.

https://kurier.at/wirtschaft/atomausstieg-wie-die-abschaltung-eines-kernkraftwerks-funktioniert/402412829
Here only one line of text is returned.

Thanks for any info. Happy to help.

@Overwatching

There are multiple mentions in the issues section about header content being removed erroneously.
I think this is the same problem.

I came here to report the same thing happening on Hackaday.com/blog

@ctipper

ctipper commented Jul 30, 2023

The same happens on https://www.thetimes.co.uk/ across multiple articles: it clips the first one or two paragraphs on every page I've tried. Kind of useless in this state.
