Releases: Filimoa/open-parse

v0.5.6 (2024-05-01)

02 May 00:35

Full Changelog: v0.5.5...v0.5.6

v0.5.5

28 Apr 17:39
7e3fdd8

What's Changed

  • [Memory Leak Fix] Create Fitz Pdf From Bytestream by @Filimoa in #39

Full Changelog: v0.5.4...v0.5.5

v0.5.4

24 Apr 15:00
56b5a88

Full Changelog: v0.5.3...v0.5.4

v0.5.3 (2024-04-21)

22 Apr 03:56

Minor bug fixes

What's Changed

  • Update pymupdf.md by @ada-lovecraft in #20
  • update the cookbooks link by @brianjking in #24
  • fix: Fix "sequence item 2: expected str instance, NoneType found" exception when table output is set to markdown. by @ic-xu in #27
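
The error text in #27 is the message Python's str.join raises when the sequence it is joining contains None. The sketch below reproduces that failure mode and shows one way to guard against it. It is an illustration only: row_to_markdown is a hypothetical helper, not open-parse's actual code, and the real fix in #27 may differ.

```python
from typing import List, Optional


def row_to_markdown(cells: List[Optional[str]]) -> str:
    """Render one table row as Markdown, treating missing cells as empty."""
    # Hypothetical helper: swap None for "" so str.join only ever sees strings.
    safe_cells = ["" if cell is None else str(cell) for cell in cells]
    return "| " + " | ".join(safe_cells) + " |"


row = ["Revenue", "2023", None]

# Joining the raw row fails because the third item (index 2) is None.
try:
    " | ".join(row)  # type: ignore[arg-type]
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

# The guarded version renders the missing cell as an empty column.
fixed_row = row_to_markdown(row)
```

Coercing missing cells to empty strings keeps the column count intact, which matters for Markdown tables where every row needs the same number of cells.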

Full Changelog: v0.5.2...v0.5.3

v0.5.2 (2024-04-11)

11 Apr 14:23

Features

  • Better version display
  • Fixed PyTorch device bug. Thanks @jinmang2
  • Add global config to set the PyTorch device

v0.5.1 (2024-04-05)

08 Apr 20:58
06509d1

Bug Fixes

  • Fixed type hinting bug for Python < 3.10
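
The notes do not say which annotation caused the bug. One common source of type hinting failures on Python versions before 3.10 is the `X | Y` union syntax (PEP 604), which raises a TypeError when the function is defined unless annotation evaluation is deferred. Assuming that was the kind of problem involved, the portable spelling looks like the sketch below; first_or_default is a hypothetical function used only for illustration.

```python
from typing import List, Optional


# Works on Python 3.8+: typing.Optional and typing.List instead of
# `int | None` (3.10+) and `list[int]` (3.9+).
def first_or_default(values: List[int], default: Optional[int] = None) -> Optional[int]:
    """Return the first value, or `default` when the list is empty."""
    return values[0] if values else default


first = first_or_default([3, 4])
missing = first_or_default([])
fallback = first_or_default([], 7)
```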

Full Changelog: v0.5.0...v0.5.1

v0.5.0 (2024-04-07)

08 Apr 04:20

0.5.0 (2024-04-01)

What's Changed

  • SemanticProcessing! This is the recommended processing pipeline.
  • Add optional annotations to the pdf draw functions
  • Fixed reading order bug

Breaking Changes

  1. Renaming
       • Node.aggregate_position renamed to Node.reading_order.
       • RemoveStubs renamed to RemoveNodesBelowNTokens.
  1. Refactored processing pipelines to use a class, which makes them easier to reuse.

Previously

import openparse
from openparse import ProcessingStep, default_pipeline, Node
from typing import List


class CustomCombineTables(ProcessingStep):
    def process(self, nodes: List[Node]) -> List[Node]:
        # custom table-combining logic goes here
        return nodes


# copy the default pipeline (or create a new one)
custom_pipeline = default_pipeline.copy()
custom_pipeline.append(CustomCombineTables())

parser = openparse.DocumentParser(
    table_args={"parsing_algorithm": "pymupdf"}, processing_pipeline=custom_pipeline
)
# meta10k_path is the path to the PDF being parsed
custom_10k = parser.parse(meta10k_path)

Now becomes

import openparse
from openparse import processing, Node
from typing import List


class CustomCombineTables(processing.ProcessingStep):
    def process(self, nodes: List[Node]) -> List[Node]:
        # custom table-combining logic goes here
        return nodes


# start from the basic ingestion pipeline (or create a new one)
custom_pipeline = processing.BasicIngestionPipeline()
custom_pipeline.append_transform(CustomCombineTables())

parser = openparse.DocumentParser(
    table_args={"parsing_algorithm": "pymupdf"}, processing_pipeline=custom_pipeline
)
# meta10k_path is the path to the PDF being parsed
custom_10k = parser.parse(meta10k_path)
  1. openai and numpy are now required dependencies; they will likely be split out in the future.
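
To make the shape of the class-based pipeline concrete, here is a self-contained sketch of the pattern. It does not import open-parse: Node, ProcessingStep, and Pipeline below are simplified stand-ins written for this example, and only the process and append_transform names are taken from the notes above. The run method and the transformations attribute are assumptions, not the library's documented API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Stand-in for a parsed document node (assumption: just text)."""
    text: str


class ProcessingStep(ABC):
    """A single transformation over a list of nodes."""

    @abstractmethod
    def process(self, nodes: List[Node]) -> List[Node]:
        ...


class DropEmptyNodes(ProcessingStep):
    def process(self, nodes: List[Node]) -> List[Node]:
        # Remove nodes that contain only whitespace.
        return [node for node in nodes if node.text.strip()]


class StripWhitespace(ProcessingStep):
    def process(self, nodes: List[Node]) -> List[Node]:
        # Trim leading and trailing whitespace from each node.
        return [Node(node.text.strip()) for node in nodes]


class Pipeline:
    """Hypothetical pipeline class: each instance owns its own step list."""

    def __init__(self, transformations: Optional[List[ProcessingStep]] = None):
        self.transformations: List[ProcessingStep] = list(transformations or [])

    def append_transform(self, step: ProcessingStep) -> None:
        self.transformations.append(step)

    def run(self, nodes: List[Node]) -> List[Node]:
        # Apply every step in order, feeding each output into the next step.
        for step in self.transformations:
            nodes = step.process(nodes)
        return nodes


pipeline = Pipeline([DropEmptyNodes()])
pipeline.append_transform(StripWhitespace())
result = pipeline.run([Node("  a "), Node("   "), Node("b")])
```

Because each pipeline instance holds its own list of steps, customizing one pipeline cannot accidentally mutate a shared module-level default, which is one plausible reading of the "ease of reuse" motivation.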

Full Changelog: v0.4.1...v0.5.0

v0.4.1 (2024-04-05)

05 Apr 19:33
dd33fb0

What's Changed

  • Better error messages for missing weights
  • Fixed type hinting bug with Python 3.8

Full Changelog: v0.4.0...v0.4.1

0.4.0 (2024-04-04)

05 Apr 04:54
80e2df9

What's Changed

  • ✨ Unitable support for table content extraction!
  • 🐛 Fixed bug with table transformers failing on multiple pages.
  • ✨ Improved table docs

Full Changelog: v0.3.1...v0.4.0

0.3.1 (2024-04-01)

01 Apr 21:36
252390f

What's Changed

  • Fixed #4