trafilatura-1.6.2

@adbar adbar released this 06 Sep 15:45
5ce31d9

Extraction:

  • more lenient HTML parsing (#370)
  • improved code block support with @idoshamun (#372, #401)
  • conversion of relative links to absolute by @feltcat (#377)
  • removed use of signal from core functions (#384)
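
To illustrate what converting relative links to absolute means in practice, here is a minimal sketch using only the Python standard library. It is not trafilatura's actual implementation from #377; the helper name `absolutize_links` and the example URLs are hypothetical.

```python
from urllib.parse import urljoin


def absolutize_links(hrefs, base_url):
    """Resolve each href against base_url (hypothetical helper, not trafilatura's API).

    Links that are already absolute are returned unchanged by urljoin.
    """
    return [urljoin(base_url, href) for href in hrefs]


# Example page URL and links, made up for illustration
base = "https://example.org/blog/post.html"
links = ["/about", "img/pic.png", "../index.html", "https://other.example/page"]
resolved = absolutize_links(links, base)
```

Resolving against the page URL rather than the bare domain matters for links such as `img/pic.png`, which are interpreted relative to the directory of the current document.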

Metadata:

Command-line interface:

  • more robust batch processing (#381)
  • added --probe option to CLI to check for extractable content (#378, #392)

Maintenance:

  • simplified code (#408)
  • support for Python 3.12
  • pinned lxml version for macOS (#393)
  • updated dependencies and parameters (notably htmldate and courlan)
  • code cleaning by @marksmayo (#406)