Introducing grobidmonkey: A Python Package for grobid output Parsing #1098

Open
com3dian opened this issue Apr 14, 2024 · 1 comment

@com3dian

Last year, I asked the community for a Python solution for extracting and parsing content from Grobid's TEI-XML output. Under the original issue, other users expressed the same need. To address it, I have developed a Python package named grobidmonkey.

While it's still at an early version, I believe grobidmonkey can be a valuable tool for the community. I'm eager to hear your thoughts and feedback so I can improve it.

GitHub Repository: grobidmonkey

The package is currently available only through pip and can be installed with:

pip install grobidmonkey

To use it, you can run:

from grobidmonkey import reader
monkeyReader = reader.MonkeyReader('monkey') # or 'lxml' or 'x2d'

# read paper outline
outline = monkeyReader.readOutline('/path/to/your/paper.pdf.tei.xml')

# read paper content
essay = monkeyReader.readEssay('/path/to/your/paper.pdf.tei.xml')
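
For readers unfamiliar with the TEI-XML that Grobid emits, the following is a minimal sketch of the kind of extraction involved, written with only the Python standard library. It does not use grobidmonkey and is not a description of how the package works internally. The sample XML is a hand-written stand-in rather than real Grobid output, and the `extract_sections` helper is a hypothetical name introduced for illustration. The one fact it relies on is that TEI elements live in the `http://www.tei-c.org/ns/1.0` namespace.

```python
import xml.etree.ElementTree as ET

# Namespace of TEI elements, as used in Grobid's TEI-XML output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written stand-in for a Grobid result (assumption: not real output).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div><head n="1">Introduction</head><p>First paragraph.</p></div>
      <div><head n="2">Method</head><p>Second paragraph.</p><p>Third paragraph.</p></div>
    </body>
  </text>
</TEI>"""


def extract_sections(tei_xml):
    """Hypothetical helper: return one dict per top-level body <div>,
    holding the heading text and the list of paragraph texts."""
    root = ET.fromstring(tei_xml)
    sections = []
    for div in root.findall(".//tei:body/tei:div", TEI_NS):
        head = div.find("tei:head", TEI_NS)
        # itertext() also collects text nested inside inline child elements.
        title = "".join(head.itertext()).strip() if head is not None else ""
        paragraphs = [
            "".join(p.itertext()).strip() for p in div.findall("tei:p", TEI_NS)
        ]
        sections.append({"title": title, "paragraphs": paragraphs})
    return sections


sections = extract_sections(SAMPLE_TEI)
for section in sections:
    print(section["title"], len(section["paragraphs"]))
```

The list of section titles roughly corresponds to an outline, and the paragraph lists to the body content. Real Grobid files contain nested divisions, references, figures, and formulas that this sketch ignores, which is where a dedicated parser helps.
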

@com3dian changed the title from "Introducing grobidmonkey: A Python Package for TEI-XML Parsing" to "Introducing grobidmonkey: A Python Package for grobid output Parsing" on Apr 14, 2024

@lfoppiano
Collaborator

@com3dian thanks for your contribution. I have not yet had the opportunity to test it. As soon as I do, I will send you my feedback.
