Parsing issues on theverge.com #3787

Open
johnlago opened this issue Apr 4, 2024 · 1 comment
johnlago commented Apr 4, 2024

Omnivore pulls in extraneous content and applies incorrect formatting when parsing Verge articles. One example:


Native website (https://www.theverge.com/2024/4/3/24119918/elon-musk-reputation-impact-tesla-falling-sales):
Screenshot 2024-04-04 at 12 43 40 PM

Omnivore version:
Screenshot 2024-04-04 at 12 43 57 PM


Pullquotes also render oddly in Omnivore. On the website, they are set apart from the normal flow of text, but Omnivore parses them as regular paragraphs. This is confusing to read, since they often look like random repeats of text you've already read or are about to read.

Setting them off as block quotes or, even better, removing them altogether would make the text more readable.

Native website (https://www.theverge.com/24094310/vice-media-layoffs-bankruptcy-shane-smith):
Screenshot 2024-04-04 at 2 51 59 PM

Omnivore version:
Screenshot 2024-04-04 at 2 51 30 PM

johnlago commented Apr 4, 2024

One more issue: articles from theverge.com end with a strange comments box in Omnivore.

Screenshot 2024-04-04 at 2 56 39 PM
