So far, Trafilatura is entwined with a version of readability-lxml; it also uses jusText as a fallback before triggering the baseline extraction as a last resort. This combination is robust and performs well in the benchmark. However, it could be beneficial to refactor the code so as to expose the extractor chain.
The current configuration can be written as follows:
fast mode: ["trafilatura", "baseline"]
normal mode: ["trafilatura+readability", "justext", "baseline"]
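To make the idea concrete, here is a minimal sketch of what an exposed chain could look like. It is not Trafilatura's actual code: `Extractor`, `CHAINS`, `run_chain`, the `registry` mapping and the `min_length` threshold are all hypothetical names chosen for illustration, and the chain entries are only the labels listed above, not real function names.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical interface: an extractor takes raw HTML and returns the
# extracted text, or None (or an empty string) when it finds nothing usable.
Extractor = Callable[[str], Optional[str]]

# The two configurations from this issue, written as ordered lists of labels.
CHAINS: Dict[str, List[str]] = {
    "fast": ["trafilatura", "baseline"],
    "normal": ["trafilatura+readability", "justext", "baseline"],
}


def run_chain(
    html: str,
    chain: List[str],
    registry: Dict[str, Extractor],
    min_length: int = 1,  # assumed acceptance threshold, not from the source
) -> Tuple[Optional[str], Optional[str]]:
    """Try each extractor in order and return (label, text) for the first
    one whose output is at least min_length characters long."""
    for name in chain:
        text = registry[name](html)
        if text and len(text) >= min_length:
            return name, text
    # Every extractor in the chain failed or returned too little text.
    return None, None
```

With something along these lines, a caller would pass a `registry` that maps each label to a callable wrapping the corresponding extractor, and could reorder or shorten the list to build a custom chain.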