
Problems Parsing Titles #262

Open
grantdelozier opened this issue Oct 3, 2016 · 1 comment

grantdelozier commented Oct 3, 2016

I'm seeing extraction errors on certain websites: the title extractor raises an IndexError while cleaning the title.

  File "/usr/local/lib/python2.7/site-packages/ContentAnalysis-0.1.1-py2.7.egg/ContentAnalysis/document.py", line 53, in parse
    ginfo = g.extract(url=self.link)
  File "/usr/local/lib/python2.7/site-packages/goose/__init__.py", line 56, in extract
    return self.crawl(cc)
  File "/usr/local/lib/python2.7/site-packages/goose/__init__.py", line 66, in crawl
    article = crawler.crawl(crawl_candiate)
  File "/usr/local/lib/python2.7/site-packages/goose/crawler.py", line 154, in crawl
    self.article.title = self.title_extractor.extract()
  File "/usr/local/lib/python2.7/site-packages/goose/extractors/title.py", line 99, in extract
    return self.get_title()
  File "/usr/local/lib/python2.7/site-packages/goose/extractors/title.py", line 78, in get_title
    return self.clean_title(title)
  File "/usr/local/lib/python2.7/site-packages/goose/extractors/title.py", line 56, in clean_title
    if title_words[0] in TITLE_SPLITTERS:
IndexError: list index out of range

You can reproduce this by running goose's extract on a site like http://daydreamingfoodie.com/

grantdelozier (Author) commented

The issue on this site, and on plenty of others, occurs when the page title is identical to the Open Graph site name.

Fixed the issue in this commit of my fork
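
For anyone hitting the same traceback, here is a minimal sketch of the failure mode and one possible guard. It is not the code from goose or from the fork. It assumes that `clean_title` strips the Open Graph site name from the title before splitting it into words, which would leave an empty word list when the two are identical. The standalone `clean_title` function, its `site_name` parameter, and the contents of `TITLE_SPLITTERS` are hypothetical stand-ins for illustration.

```python
# Hypothetical stand-in for goose's TITLE_SPLITTERS; the real list may differ.
TITLE_SPLITTERS = ["|", "-", "»", ":"]


def clean_title(title, site_name=None):
    """Strip the site name and a leading or trailing splitter from a title.

    Simplified illustration only; goose's actual method works on the
    article object rather than on plain arguments.
    """
    if site_name:
        # If title == site_name, this leaves an empty string.
        title = title.replace(site_name, "").strip()

    title_words = title.split()

    # Guard: an empty list has no first word, so indexing title_words[0]
    # would raise IndexError. Return an empty title instead.
    if not title_words:
        return ""

    if title_words[0] in TITLE_SPLITTERS:
        title_words.pop(0)

    # Re-check emptiness, since the pop above may have removed the only word.
    if title_words and title_words[-1] in TITLE_SPLITTERS:
        title_words.pop()

    return " ".join(title_words)
```

With this guard, `clean_title("Daydreaming Foodie", "Daydreaming Foodie")` returns an empty string instead of raising. Whether an empty title or the original unstripped title is the better fallback is a design choice left to the maintainers.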
