Releases: Filimoa/open-parse
v0.5.6 (2024-05-01)
v0.5.5
v0.5.4
v0.5.3 (2024-04-21)
Minor bug fixes
What's Changed
- Update pymupdf.md by @ada-lovecraft in #20
- update the cookbooks link by @brianjking in #24
- fix: Fix "sequence item 2: expected str instance, NoneType found" exception when table output is set to markdown by @ic-xu in #27
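The error fixed in #27 ("sequence item 2: expected str instance, NoneType found") is the `TypeError` Python's `str.join` raises when one of the joined items is `None` (here, the item at index 2). The PR's actual diff isn't shown in these notes. Below is a minimal sketch of the usual guard, assuming table cells can come back as `None` when rendering markdown. `row_to_markdown` is a hypothetical helper, not open-parse's API.

```python
from typing import List, Optional


def row_to_markdown(cells: List[Optional[str]]) -> str:
    """Render one table row as a Markdown line, treating missing (None) cells as empty."""
    # str.join requires every item to be a str; passing a None cell straight through
    # raises "TypeError: sequence item N: expected str instance, NoneType found".
    safe_cells = ["" if cell is None else cell for cell in cells]
    return "| " + " | ".join(safe_cells) + " |"


print(row_to_markdown(["Revenue", "2023", None]))
```

Substituting an empty string keeps the column count intact, so the Markdown table stays aligned even when a cell could not be extracted.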
New Contributors
- @ada-lovecraft made their first contribution in #20
- @brianjking made their first contribution in #24
- @ic-xu made their first contribution in #27
Full Changelog: v0.5.2...v0.5.3
v0.5.2 (2024-04-11)
Features
- Better version display
- Fixed PyTorch device bug. Thanks @jinmang2
- Added a global config option to set the PyTorch device
v0.5.1 (2024-04-05)
v0.5.0 (2024-04-07)
0.5.0 (2024-04-01)
What's Changed
- SemanticProcessing! This is the recommended processing pipeline.
- Add optional annotations to the PDF draw functions
- Fixed reading order bug
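These notes don't say how SemanticProcessing works internally. Embedding-based chunking is a common technique and would be consistent with `openai` and `numpy` becoming required dependencies in this release, so here is a minimal sketch of that idea under those assumptions: each chunk is compared with its immediate predecessor by cosine similarity and merged into the current group when the score clears a threshold. The function names, the 0.9 threshold and the toy 2-D vectors (standing in for real embeddings) are hypothetical, not open-parse's API.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge_similar_neighbors(
    texts: List[str],
    embeddings: List[Sequence[float]],
    threshold: float = 0.9,  # hypothetical cut-off; a real pipeline would tune this
) -> List[str]:
    """Greedily append each chunk to the current group when it is similar to the previous chunk."""
    if not texts:
        return []
    groups = [texts[0]]
    previous = embeddings[0]
    for text, emb in zip(texts[1:], embeddings[1:]):
        if cosine_similarity(previous, emb) >= threshold:
            groups[-1] = groups[-1] + " " + text  # same topic: extend current group
        else:
            groups.append(text)  # topic shift: start a new group
        previous = emb
    return groups


texts = [
    "Revenue grew 10%.",
    "Net income rose as well.",
    "Risk factors include supply chain issues.",
]
toy_embeddings = [[1.0, 0.0], [0.95, 0.05], [0.0, 1.0]]
groups = merge_similar_neighbors(texts, toy_embeddings)
print(groups)
```

With these toy vectors the first two chunks point in nearly the same direction and are merged, while the third starts a new group.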
Breaking Changes
- Renaming: `Node.aggregate_position` renamed to `Node.reading_order`, and `RemoveStubs` renamed to `RemoveNodesBelowNTokens`
- Refactored processing pipelines to use a class to promote ease of reuse
Previously:

```python
from typing import List

import openparse
from openparse import ProcessingStep, default_pipeline, Node


class CustomCombineTables(ProcessingStep):
    def process(self, nodes: List[Node]) -> List[Node]:
        # custom logic goes here; this placeholder returns the nodes unchanged
        return nodes


# copy the default pipeline (or create a new one)
custom_pipeline = default_pipeline.copy()
custom_pipeline.append(CustomCombineTables())

parser = openparse.DocumentParser(
    table_args={"parsing_algorithm": "pymupdf"}, processing_pipeline=custom_pipeline
)
custom_10k = parser.parse(meta10k_path)  # meta10k_path: path to the PDF to parse
```
Now becomes:

```python
from typing import List

import openparse
from openparse import processing, Node


class CustomCombineTables(processing.ProcessingStep):
    def process(self, nodes: List[Node]) -> List[Node]:
        # custom logic goes here; this placeholder returns the nodes unchanged
        return nodes


# start from the basic ingestion pipeline (or build your own)
custom_pipeline = processing.BasicIngestionPipeline()
custom_pipeline.append_transform(CustomCombineTables())

parser = openparse.DocumentParser(
    table_args={"parsing_algorithm": "pymupdf"}, processing_pipeline=custom_pipeline
)
custom_10k = parser.parse(meta10k_path)  # meta10k_path: path to the PDF to parse
```
- `openai` and `numpy` are now required dependencies; we will likely split them out in the future.
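The pipeline refactor above replaces a plain list of steps with a pipeline object that owns its transforms. The sketch below mimics that pattern without depending on open-parse, so `Node`, `ProcessingStep`, `Pipeline`, `RemoveShortNodes` and `SortByReadingOrder` are all hypothetical stand-ins; only the `append_transform` method name and the `reading_order` field are borrowed from these notes. `RemoveShortNodes` approximates the idea behind `RemoveNodesBelowNTokens` by counting whitespace-separated words, which is an assumption rather than the library's tokenizer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a parsed node: text plus its reading order."""
    text: str
    reading_order: int


class ProcessingStep:
    """Hypothetical base class: each step maps a list of nodes to a new list."""

    def process(self, nodes: List[Node]) -> List[Node]:
        raise NotImplementedError


class SortByReadingOrder(ProcessingStep):
    def process(self, nodes: List[Node]) -> List[Node]:
        return sorted(nodes, key=lambda n: n.reading_order)


class RemoveShortNodes(ProcessingStep):
    """Drop nodes with fewer than `min_tokens` whitespace-separated words (a simplification)."""

    def __init__(self, min_tokens: int) -> None:
        self.min_tokens = min_tokens

    def process(self, nodes: List[Node]) -> List[Node]:
        return [n for n in nodes if len(n.text.split()) >= self.min_tokens]


class Pipeline:
    """Hypothetical pipeline object holding an ordered, reusable list of steps."""

    def __init__(self, steps: Optional[List[ProcessingStep]] = None) -> None:
        self.steps: List[ProcessingStep] = list(steps or [])

    def append_transform(self, step: ProcessingStep) -> None:
        self.steps.append(step)

    def run(self, nodes: List[Node]) -> List[Node]:
        # apply each step in order, feeding its output into the next one
        for step in self.steps:
            nodes = step.process(nodes)
        return nodes


pipeline = Pipeline([SortByReadingOrder()])
pipeline.append_transform(RemoveShortNodes(min_tokens=3))

nodes = [
    Node("Item 7. Management's Discussion", 2),
    Node("Page 3", 1),
    Node("Revenue increased year over year", 0),
]
result = pipeline.run(nodes)
print([n.text for n in result])
```

Because the pipeline is an object rather than a copied list, one configured instance can be handed to several parsers and extended in one place.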
Full Changelog: v0.4.1...v0.5.0
v0.4.1 (2024-04-05)
What's Changed
- Better error messages for missing weights
- Type hinting bug with Python 3.8 fixed
Full Changelog: v0.4.0...v0.4.1
0.4.0 (2024-04-04)
What's Changed
- ✨ Unitable support for table content extraction!
- 🐛 Fixed bug with table transformers failing on multiple pages.
- ✨ Improved table docs
Full Changelog: v0.3.1...v0.4.0
0.3.1 (2024-04-01)
What's Changed
- Fixed #4