PyGrobid

A Python wrapper for the Grobid scholarly information extraction library

Simple Demo

Note: pygrobid needs the Grobid Java server to be running, and it does not (yet) start the Java side of things itself. For now, the server can be started from the root directory with:

cd pygrobid && mvn exec:exec -Pstart_grobid

Then you can do:

from pygrobid import Grobid, start_server

start_server()
g = Grobid()
g.process_references('some_pdf_file.pdf')
g.shutdown()
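
Because the demo depends on the Java server already listening, it can help to wait for it before creating a Grobid object. The following is a minimal sketch that uses only the standard library. The helper name wait_for_port is hypothetical and not part of pygrobid, and the host and port are assumptions you would replace with whatever your Grobid setup actually uses.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to (host, port) succeeds, False on timeout.

    Hypothetical helper, not part of pygrobid. It only checks that something
    is listening; it does not verify that the listener is Grobid.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means some process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: give up if the deadline has passed,
            # otherwise pause briefly and retry.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_port with the host and port of your server right after launching it, and only continuing when it returns True, avoids a confusing connection error on the first call.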

About Grobid

GROBID (or Grobid) means GeneRation Of BIbliographic Data.

GROBID is a machine learning library for extracting, parsing and re-structuring raw documents such as PDF into structured TEI-encoded documents, with a particular focus on technical and scientific publications. Development started in 2008 as a hobby, and the tool was released as open source in 2011. Work on GROBID has continued steadily as a side project since the beginning and is expected to continue until at least 2020 :)

For a list of features and more information, see the Kermitt2/Grobid repo.
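
Since GROBID produces TEI-encoded XML, a short sketch of reading such a document from Python may be useful. This README does not say what pygrobid's methods return, so the example assumes you already hold a TEI XML string. The sample document is hand-written for illustration and is not real GROBID output, and the function name reference_titles is hypothetical.

```python
import xml.etree.ElementTree as ET

# TEI documents place their elements in this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written sample for illustration only; not real GROBID output.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><back><listBibl>
    <biblStruct>
      <analytic><title level="a">An Example Article</title></analytic>
      <monogr><title level="j">Journal of Examples</title>
        <imprint><date when="2015"/></imprint></monogr>
    </biblStruct>
  </listBibl></back></text>
</TEI>"""


def reference_titles(tei_xml):
    """Return the article-level title of each biblStruct entry in a TEI string."""
    root = ET.fromstring(tei_xml)
    titles = []
    # Walk every bibliographic entry, wherever it sits in the tree.
    for bibl in root.iter("{http://www.tei-c.org/ns/1.0}biblStruct"):
        title = bibl.find("tei:analytic/tei:title", TEI_NS)
        if title is not None and title.text:
            titles.append(title.text.strip())
    return titles


print(reference_titles(SAMPLE_TEI))
```

Entries without an analytic title (for example, books described only at the monograph level) are skipped by this sketch.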

Installation

Java Dependencies

These are required for running Grobid.

Java 1.8+
Apache Maven
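
To confirm that both dependencies are reachable before installing, a small check along these lines can be run. It is a sketch only: the function name missing_tools is hypothetical, the executable names java and mvn are the usual ones but may differ on your system, and the check does not verify the Java version.

```python
import shutil


def missing_tools(tools=("java", "mvn")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("java and mvn were both found on PATH")
```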

Package Installation

pip install pygrobid && python -c 'import pygrobid; pygrobid.get_dependencies()'

License

This package is licensed under the Apache License 2.0, the same license as the base Java library.
