pd3f

PDF text extraction pipeline: self-hosted, local-first and Docker-based

Pinned

  1. pd3f (Public)

    🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based

    HTML · 273 stars · 35 forks

  2. pd3f-core (Public)

    📑 Python package to reconstruct the original continuous text from PDFs with language models

    Jupyter Notebook · 35 stars · 8 forks

  3. dehyphen (Public)

    📜 Dehyphenation of broken text (mainly German), e.g., text extracted from a PDF

    Python · 38 stars · 4 forks

Repositories

Showing 7 of 7 repositories
