RssCrawler doesn't support valid Rss XML #129

Open
JeromeGill opened this issue Oct 31, 2019 · 2 comments

JeromeGill commented Oct 31, 2019

Describe the bug
RssCrawler raises a RuntimeError ("No crawler found. Quit.") when configured for a valid RSS feed. The exception is raised here:
https://github.com/fhamborg/news-please/blob/master/newsplease/single_crawler.py#L202

To Reproduce

dockerfile

FROM python:latest
RUN pip3 install news-please

The following sitelist.hjson

  "base_urls" : [
    {
      "url": "http://feeds.bbci.co.uk/news/world/latin_america/rss.xml",
      "crawler": "RssCrawler"
    }
  ]

Expected behavior

RssCrawler is loaded and scrapes the feed.

Log

[newsplease.config:164|INFO] Loading config-file (/root/news-please-repo/config/config.cfg)
[newsplease.config:164|INFO] Loading config-file (/root/news-please-repo/config/config.cfg)
[__main__:200|ERROR] No crawlers (incl. fallbacks) found for url http://feeds.bbci.co.uk/news/world/latin_america/rss.xml.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/newsplease/single_crawler.py", line 265, in <module>
    SingleCrawler(cfg_file_path=sys.argv[1],
  File "/usr/local/lib/python3.8/site-packages/newsplease/single_crawler.py", line 117, in __init__
    crawler_class = self.get_crawler(self.crawler_name, site["url"])
  File "/usr/local/lib/python3.8/site-packages/newsplease/single_crawler.py", line 202, in get_crawler
    raise RuntimeError("No crawler found. Quit.")
RuntimeError: No crawler found. Quit.
[newsplease.__main__:269|INFO] Graceful stop called manually. Shutting down.

Versions (please complete the following information):

  • Python 3.8.0
  • news-please Version 1.4.3

fhamborg commented Nov 1, 2019

Could this be related to this PR? #119

JeromeGill commented

Yes, the URL in this issue is an Atom feed.
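
For anyone triaging similar reports, it can help to check what kind of feed a URL actually serves before suspecting the crawler. Below is a minimal sketch, not taken from news-please, that classifies a feed document by its root element using only the standard library. The function name `detect_feed_format` and the sample documents are hypothetical and exist only for illustration. Note that an RSS 2.0 document may declare the Atom namespace (for example to carry an `atom:link` element) and still be RSS, so the sketch looks at the root tag rather than at namespace declarations.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by Atom and by RSS 1.0 (RDF) root elements.
ATOM_NS = "http://www.w3.org/2005/Atom"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def detect_feed_format(xml_text):
    """Hypothetical helper: return 'rss', 'atom', 'rdf', or 'unknown'.

    The decision is based only on the root element of the document.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return "unknown"
    if root.tag == "rss":
        return "rss"
    if root.tag == "{%s}feed" % ATOM_NS:
        return "atom"
    if root.tag == "{%s}RDF" % RDF_NS:
        return "rdf"
    return "unknown"


# Illustrative documents only; these are not copies of any real feed.
RSS_SAMPLE = """<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Example</title>
    <atom:link href="http://example.com/rss.xml" rel="self"/>
  </channel>
</rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
</feed>"""

if __name__ == "__main__":
    print(detect_feed_format(RSS_SAMPLE))
    print(detect_feed_format(ATOM_SAMPLE))
```

To check a live feed, the response body could be fetched with any HTTP client and passed to the same function. Whether RssCrawler's own support check works this way is not something this thread confirms.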
