
Releases: VikParuchuri/marker

Speed improvements

23 May 23:24
0d9b0db
  • Enable parallel text extraction, with worker count settings
  • Bump the surya version to pull in layout and line segmentation speed improvements, plus an OCR bug fix
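These notes don't say how the extraction workers are wired up. As a rough illustration only, here is a minimal sketch of fanning per-page text extraction out over a configurable worker count. The names extract_page_text, extract_all_pages, and worker_count are hypothetical and are not marker's settings or API. A thread pool is used for simplicity; CPU-bound pure-Python extraction would usually swap in a process pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def extract_page_text(page: str) -> str:
    # Hypothetical stand-in for real per-page PDF text extraction.
    return page.strip().upper()


def extract_all_pages(
    pages: Sequence[str],
    worker_count: int = 4,
    extract: Callable[[str], str] = extract_page_text,
) -> List[str]:
    """Extract text from every page using up to `worker_count` workers.

    executor.map yields results in input order, so page order is preserved
    even if individual pages finish out of order.
    """
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    if worker_count == 1:
        # One worker: skip the pool overhead and run sequentially.
        return [extract(p) for p in pages]
    with ThreadPoolExecutor(max_workers=worker_count) as executor:
        return list(executor.map(extract, pages))


pages = ["  page one ", "page two", " page three"]
print(extract_all_pages(pages, worker_count=2))
# → ['PAGE ONE', 'PAGE TWO', 'PAGE THREE']
```

Keeping a sequential path for a single worker makes it easy to compare parallel output against a plain loop when tuning the worker count.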

Faster OCR

18 May 04:28
cc9d830
  • OCR is now ~2.5x faster, due to improvements in surya

Speed up inference

17 May 22:57
a056562
  • Faster OCR, line detection, and layout inference (from surya)
  • Unpin the transformers version after testing

This should be significantly faster now, but I haven't fully benchmarked it, since I'm running low on time this week!

Fix memory leak

16 May 22:46
74adf35
  • Fix a memory leak that caused high CPU memory usage on long docs (fixed in surya; version bumped)
  • Improve load_all_models so it accepts device and dtype

Marker v2

10 May 16:02
6f8b239

Basically a full rewrite!

Main features:

  • Extracts and saves images
  • Improved table formatting
  • Better markdown wrapping
  • Better reading order on complex docs
  • Improved OCR engine with more language options
  • Simple pip package install with no required system dependencies, so it can be used easily on Windows
  • Can be used commercially (pymupdf and layoutlmv3 dependencies removed)

It takes about 2x as long to run now, but that seems like a decent tradeoff.

See the README for details.