Hi, I noticed that, at least since v0.7.3, GROBID returns BibTeX by default for /api/processHeaderDocument. This contradicts https://grobid.readthedocs.io/en/latest/Grobid-service/#apiprocessheaderdocument, which states that the default is TEI XML and that the Accept: application/x-bibtex header must be set to get BibTeX.
Note that it's possible to get an XML response by using Accept: application/xml.
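Until the default is settled, a client can avoid depending on it by always setting the Accept header. The following is a minimal sketch using only the JDK's `java.net.http` client (Java 11+). It builds a multipart upload equivalent to `curl --form input=@file.pdf` and sets `Accept: application/xml`. The class name `GrobidHeaderRequest`, the helper `buildRequest`, and the boundary format are illustrative assumptions, not part of GROBID.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

public class GrobidHeaderRequest {

    // Builds a multipart/form-data POST with the PDF in the "input" field (as curl --form does)
    // and an explicit Accept header, so the response format does not depend on the server default.
    static HttpRequest buildRequest(URI endpoint, byte[] pdf, String fileName, String accept) {
        String boundary = "grobid-" + UUID.randomUUID();
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        String partHeader = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"input\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/pdf\r\n\r\n";
        body.writeBytes(partHeader.getBytes(StandardCharsets.UTF_8));
        body.writeBytes(pdf);
        body.writeBytes(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));

        return HttpRequest.newBuilder(endpoint)
                .header("Accept", accept)
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body.toByteArray()))
                .build();
    }

    // Usage: java GrobidHeaderRequest.java <grobid-base-url> <pdf-path>
    public static void main(String[] args) throws Exception {
        URI endpoint = URI.create(args[0] + "/api/processHeaderDocument");
        Path pdfPath = Path.of(args[1]);
        HttpRequest request = buildRequest(endpoint, Files.readAllBytes(pdfPath),
                pdfPath.getFileName().toString(), "application/xml");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body()); // expected to be TEI XML when the header is honored
    }
}
```

For example: `java GrobidHeaderRequest.java https://kermitt2-grobid.hf.space Downloads/2212.12604v1.pdf`. With curl, the equivalent is adding `-H 'Accept: application/xml'` to the command.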
Make a request against the GROBID API. I used the HuggingFace demo API: curl https://kermitt2-grobid.hf.space/api/processHeaderDocument --form input=@Downloads/2212.12604v1.pdf
See that the output contains BibTeX and not TEI XML:
@misc{-1,
author = {},
title = {Search for new physics in the τ lepton plus missing transverse momentum final state in proton-proton collisions at √ s = 13 TeV The CMS Collaboration},
date = {2022-12-23},
year = {2022},
month = {12},
day = {23},
eprint = {arXiv:2212.12604v1[hep-ex]},
abstract = {A search for physics beyond the standard model (SM) in the final state with a hadronically decaying tau lepton and a neutrino is presented. This analysis is based on data recorded by the CMS experiment from proton-proton collisions at a center-ofmass energy of 13 TeV at the LHC, corresponding to a total integrated luminosity of 138 fb-1. The transverse mass spectrum is analyzed for the presence of new physics. No significant deviation from the SM prediction is observed. Limits are set on the production cross section of a W boson decaying into a tau lepton and a neutrino. Lower limits are set on the mass of the sequential SM-like heavy charged vector boson and the mass of a quantum black hole. Upper limits are placed on the couplings of a new boson to the SM fermions. Constraints are put on a nonuniversal gauge interaction model and an effective field theory model. For the first time, upper limits on the cross section of t-channel leptoquark (LQ) exchange are presented. These limits are translated into exclusion limits on the LQ mass and on its coupling in the t-channel. The sensitivity of this analysis extends into the parameter space of LQ models that attempt to explain the anomalies observed in B meson decays. The limits presented for the various interpretations are the most stringent to date. Additionally, a model-independent limit is provided.}
}
Linux amd64, through the lfoppiano/grobid:0.7.3 Docker image, and whatever Hugging Face is using.
What is your Java version (java --version)?
openjdk 17.0.2 2022-01-18
OpenJDK Runtime Environment (build 17.0.2+8-86)
OpenJDK 64-Bit Server VM (build 17.0.2+8-86, mixed mode, sharing)
In case of build or run errors, please submit the error while running gradlew with --stacktrace and --info for better log traces (e.g. ./gradlew run --stacktrace --info) or attach the log file logs/grobid-service.log.
Hi @michamos, long time no see 😄
Are you back working with Grobid? That's nice!
Thanks for opening the issue.
It seems to be more of a problem with how Jakarta (JAX-RS) selects the default response type when the Accept header is not specified.
Locally, when I send the same request you posted, I get TEI XML, so I think it depends on how the methods are loaded. There seems to be no clearly defined behaviour, although this looks strange.
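One plausible reading of "it depends on how the methods are loaded" fits in a few lines. Under HTTP semantics, a request without an Accept header accepts any media type (like `*/*`), so both a TEI XML handler and a BibTeX handler match, and whatever tie-break is applied decides the winner. If that tie-break follows discovery order, note that Java's `Class.getDeclaredMethods()` is documented to return methods in no particular order, which is one way the order could differ between environments. This is a deliberately simplified model with hypothetical names (`NegotiationModel`, `select`). It ignores quality values and specificity ranking and is not Jersey's actual selection algorithm.

```java
import java.util.List;
import java.util.Optional;

public class NegotiationModel {

    // True if one Accept media range (e.g. "*/*", "application/*", "application/xml")
    // matches a concrete media type such as "application/x-bibtex".
    static boolean matches(String range, String type) {
        if (range.equals("*/*")) return true;
        String[] r = range.split("/", 2);
        String[] t = type.split("/", 2);
        if (r.length < 2 || t.length < 2) return false;
        return r[0].equalsIgnoreCase(t[0]) && (r[1].equals("*") || r[1].equalsIgnoreCase(t[1]));
    }

    // Simplified selection: candidates are scanned in "discovery order" and the first one
    // matching any range in the Accept header wins. A missing or blank Accept header is
    // treated as "*/*". Parameters such as ";q=" are stripped and ignored on purpose.
    static Optional<String> select(String acceptHeader, List<String> candidates) {
        String accept = (acceptHeader == null || acceptHeader.isBlank()) ? "*/*" : acceptHeader;
        for (String candidate : candidates) {
            for (String range : accept.split(",")) {
                String mediaRange = range.split(";", 2)[0].trim();
                if (matches(mediaRange, candidate)) return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> orderA = List.of("application/xml", "application/x-bibtex");
        List<String> orderB = List.of("application/x-bibtex", "application/xml");
        System.out.println(select(null, orderA));              // Optional[application/xml]
        System.out.println(select(null, orderB));              // Optional[application/x-bibtex]
        System.out.println(select("application/xml", orderB)); // Optional[application/xml]
    }
}
```

In this model, the same request yields different formats purely because the candidate order changed, while an explicit Accept header makes the result independent of that order.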
One solution I found is to add a request filter that defaults the Accept header to application/xml when it is not set, but that feels like a bit of a hack and might affect other endpoints.
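The filter idea can be sketched without a framework. In a JAX-RS application, this logic would typically live in a `@PreMatching` `ContainerRequestFilter` so it runs before the resource method is selected. Here it operates on a plain, case-insensitive header map so it runs without a server. Restricting it to a hypothetical allow-list of paths (`XML_DEFAULT_PATHS`) is one way to limit the effect on other endpoints. Because curl sends `Accept: */*` when no Accept header is given, the sketch treats a bare `*/*` the same as a missing header. That is a design assumption, not GROBID's current behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class DefaultAcceptSketch {

    // Hypothetical allow-list: only these endpoints get the TEI XML default,
    // so other endpoints keep whatever negotiation they have today.
    static final Set<String> XML_DEFAULT_PATHS = Set.of("/api/processHeaderDocument");

    // True when the header expresses no preference: absent, blank, or only "*/*"
    // (curl sends "Accept: */*" when no -H 'Accept: ...' is given).
    static boolean hasNoPreference(List<String> acceptValues) {
        if (acceptValues == null) return true;
        return acceptValues.stream()
                .map(String::trim)
                .allMatch(v -> v.isEmpty() || v.equals("*/*"));
    }

    // Mutates the headers in place. The map must be case-insensitive on keys,
    // because HTTP header names are case-insensitive.
    static void applyDefaultAccept(String path, Map<String, List<String>> headers) {
        if (!XML_DEFAULT_PATHS.contains(path)) return;
        if (hasNoPreference(headers.get("Accept"))) {
            headers.put("Accept", new ArrayList<>(List.of("application/xml")));
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        headers.put("accept", new ArrayList<>(List.of("*/*"))); // what a bare curl request sends
        applyDefaultAccept("/api/processHeaderDocument", headers);
        System.out.println(headers.get("Accept")); // [application/xml]
    }
}
```

An explicit `Accept: application/x-bibtex` passes through unchanged, so BibTeX clients are unaffected. Only requests that state no preference get the documented TEI XML default.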
Hi @lfoppiano, indeed :) We've been using GROBID in prod for INSPIRE for quite a while now. We use it to extract author and affiliation info from PDFs and to segment references for interactive search (so users can copy/paste references from a paper and it magically works). Unfortunately, our current resources are very limited, so we can't really contribute beyond submitting bug reports.
I dug into this and did not find a clean solution. I'm quite surprised that there is no way to define a default behavior.
The behavior seems to vary unpredictably depending on the platform it runs on.
Nevertheless, I updated the documentation to state that the Accept header is required.