Commit

spider fix: don't pass empty docs to language filter
adbar committed Apr 14, 2023
1 parent 6a3ce96 commit 1b8ebe9
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion trafilatura/spider.py
@@ -87,7 +87,7 @@ def process_links(htmlstring, base_url, language=None, rules=None):
     Store the links in todo-list while prioritizing the navigation ones."""
     links, links_priority = [], []
     # optional language check: run baseline extraction + language identifier
-    if language is not None and LANGID_FLAG is True:
+    if language is not None and LANGID_FLAG is True and htmlstring is not None:
         _, text, _ = baseline(htmlstring)
         result, _ = py3langid.classify(text)
         if result != language:
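The effect of the added `htmlstring is not None` condition can be sketched in isolation. This is a minimal illustration, not the code in `trafilatura/spider.py`: `fake_baseline`, `fake_classify`, and `passes_language_check` are hypothetical stand-ins invented here so the snippet runs without trafilatura or py3langid installed, and the toy classifier only pretends to detect a language.

```python
from typing import Optional, Tuple

# Assumption: stands in for the module-level flag used in the diff.
LANGID_FLAG = True


def fake_baseline(htmlstring: str) -> Tuple[None, str, int]:
    """Hypothetical stand-in for baseline(): returns a 3-tuple whose
    middle element is the extracted text."""
    text = htmlstring.strip()
    return None, text, len(text)


def fake_classify(text: str) -> Tuple[str, float]:
    """Hypothetical stand-in for py3langid.classify(): returns
    (language code, score). The detection rule here is a toy."""
    language = "de" if "und" in text.split() else "en"
    return language, 0.0


def passes_language_check(htmlstring: Optional[str], language: Optional[str]) -> bool:
    """Return False only when the check actually runs and the detected
    language differs from the requested one."""
    # Same condition as the patched line: the check is skipped when no
    # target language is set, when language identification is disabled,
    # or when there is no document to inspect.
    if language is not None and LANGID_FLAG is True and htmlstring is not None:
        _, text, _ = fake_baseline(htmlstring)
        result, _ = fake_classify(text)
        if result != language:
            return False
    return True


print(passes_language_check(None, "en"))
print(passes_language_check("Hund und Katze", "en"))
```

With the guard in place, a missing document never reaches the extraction and classification calls. Without it, `fake_baseline(None)` in this sketch would raise an `AttributeError` on `.strip()`. How the real `baseline` behaves on `None` is not shown in the diff, so that failure mode is an illustration only.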
