RssCrawler doesn't support valid Rss XML #129

JeromeGill · 2019-10-31T16:05:21Z

Describe the bug
RssCrawler raises a RuntimeError for a valid RSS feed instead of crawling it. The error is raised here:
https://github.com/fhamborg/news-please/blob/master/newsplease/single_crawler.py#L202

To Reproduce

dockerfile

FROM python:latest
RUN pip3 install news-please

The following sitelist.hjson

  "base_urls" : [
    {
      "url": "http://feeds.bbci.co.uk/news/world/latin_america/rss.xml",
      "crawler": "RssCrawler"
    }
  ]

Expected behavior

RssCrawler is loaded and scrapes the feed.

Log

[newsplease.config:164|INFO] Loading config-file (/root/news-please-repo/config/config.cfg)
[newsplease.config:164|INFO] Loading config-file (/root/news-please-repo/config/config.cfg)
[__main__:200|ERROR] No crawlers (incl. fallbacks) found for url http://feeds.bbci.co.uk/news/world/latin_america/rss.xml.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/newsplease/single_crawler.py", line 265, in <module>
    SingleCrawler(cfg_file_path=sys.argv[1],
  File "/usr/local/lib/python3.8/site-packages/newsplease/single_crawler.py", line 117, in __init__
    crawler_class = self.get_crawler(self.crawler_name, site["url"])
  File "/usr/local/lib/python3.8/site-packages/newsplease/single_crawler.py", line 202, in get_crawler
    raise RuntimeError("No crawler found. Quit.")
RuntimeError: No crawler found. Quit.
[newsplease.__main__:269|INFO] Graceful stop called manually. Shutting down.

Versions (please complete the following information):

Python 3.8.0
news-please Version 1.4.3


fhamborg · 2019-11-01T13:52:49Z

Could this be related to this PR? #119

JeromeGill · 2019-11-01T14:05:02Z

The URL in this issue is an Atom feed, yeah
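
For anyone triaging this, here is a minimal sketch of one way to tell RSS from Atom by looking only at the root element of the feed XML. It is not how news-please decides which crawler supports a URL; `detect_feed_type` and the two sample strings are hypothetical names made up for illustration, and the samples are hand-written, not copied from the feed above.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom 1.0 documents and by atom:* elements embedded in RSS.
ATOM_NS = "http://www.w3.org/2005/Atom"


def detect_feed_type(xml_text):
    """Classify a feed by its root element: 'rss', 'atom' or 'unknown'.

    Hypothetical helper for illustration; not part of news-please.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Not well-formed XML at all.
        return "unknown"
    if root.tag == "rss":
        return "rss"
    if root.tag == "{%s}feed" % ATOM_NS:
        return "atom"
    return "unknown"


# Hand-written sample: RSS 2.0 that merely declares the Atom namespace.
RSS_SAMPLE = (
    '<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">'
    "<channel><title>Example</title>"
    '<atom:link href="http://example.com/rss.xml" rel="self"/>'
    "</channel></rss>"
)

# Hand-written sample: a bare Atom 1.0 feed.
ATOM_SAMPLE = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<title>Example</title>"
    "</feed>"
)

if __name__ == "__main__":
    print(detect_feed_type(RSS_SAMPLE))
    print(detect_feed_type(ATOM_SAMPLE))
```

The first sample is there because a document can have an `<rss>` root and still carry `atom:` elements, so a check that only searches for the Atom namespace anywhere in the text would misclassify it.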

fhamborg added the help wanted label Nov 7, 2019
