Last year, I reached out to the community seeking a Python solution for extracting and parsing content from Grobid's TEI-XML output. Under the original issue, I noticed other users expressing the same need. Faced with these challenges, I've taken the initiative to develop a Python package named grobidmonkey to address this issue.
While it's still in its early versions, I believe grobidmonkey can be a valuable tool for the community. I'm eager to hear your thoughts and feedback to make it better.
The package is currently available only through pip and can be installed with:
pip install grobidmonkey
To use it, you can run:
from grobidmonkey import reader

monkeyReader = reader.MonkeyReader('monkey')  # or 'lxml' or 'x2d'

# read paper outline
outline = monkeyReader.readOutline('/path/to/your/paper.pdf.tei.xml')

# read paper content
essay = monkeyReader.readEssay('/path/to/your/paper.pdf.tei.xml')
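For anyone curious what this kind of parsing involves, here is a minimal standard-library sketch that pulls section titles and paragraphs out of a Grobid-style TEI document. It is not grobidmonkey's implementation and says nothing about what `readOutline` or `readEssay` return; the function name `read_sections` and the inline sample document are hypothetical, and the sketch assumes sections appear as `div` elements directly under `body`, each with an optional `head`.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by Grobid's TEI-XML output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily trimmed stand-in for a real .tei.xml file.
SAMPLE_TEI = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div><head n="1">Introduction</head><p>First paragraph.</p></div>
      <div><head n="2">Methods</head><p>Second paragraph.</p><p>Third paragraph.</p></div>
    </body>
  </text>
</TEI>
"""


def read_sections(tei_xml):
    """Return a list of {'title', 'paragraphs'} dicts, one per top-level body div."""
    root = ET.fromstring(tei_xml.strip())
    sections = []
    for div in root.findall(".//tei:body/tei:div", TEI_NS):
        head = div.find("tei:head", TEI_NS)
        # itertext() flattens inline markup (e.g. citation refs) into plain text.
        title = "".join(head.itertext()).strip() if head is not None else ""
        paragraphs = [
            "".join(p.itertext()).strip() for p in div.findall("tei:p", TEI_NS)
        ]
        sections.append({"title": title, "paragraphs": paragraphs})
    return sections


if __name__ == "__main__":
    for section in read_sections(SAMPLE_TEI):
        print(section["title"], len(section["paragraphs"]))
```

To try it on a real file, read the file's contents into a string and pass that to `read_sections`; nested sub-sections and figures would need extra handling that this sketch leaves out.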
com3dian changed the title from "Introducing grobidmonkey: A Python Package for TEI-XML Parsing" to "Introducing grobidmonkey: A Python Package for grobid output Parsing" on Apr 14, 2024
GitHub Repository: grobidmonkey