This is my attempt to use GROBID on macOS ARM. The docs state that macOS is not fully supported, so feel free to mark this issue as out of scope.
If anybody has got it working, I would be interested in the package versions used.
Here I document what I tried and how far I got:
System
MacOS 14.4.1 (ARM M3)
java --version
openjdk 17.0.10 2024-01-16 LTS
OpenJDK Runtime Environment Zulu17.48+15-CA (build 17.0.10+7-LTS)
OpenJDK 64-Bit Server VM Zulu17.48+15-CA (build 17.0.10+7-LTS, mixed mode, sharing)
Steps
# clone grobid
# cd grobid
# shared venv
uv venv -p 3.9
source .venv/bin/activate
uv pip install jep==4.2.0
cp .venv/lib/python3.9/site-packages/jep/jep.cpython-39-darwin.so grobid-home/lib/mac_arm-64/libjep.dylib
# prepare delft
# I think 0.3.3 is used in the container if I remember correctly
git clone --branch v0.3.3 https://github.com/kermitt2/delft
cd delft
# change requirements until delft works - very scientific
wget -O delftMacArm.patch http://sprunge.us/iFQCZx
git apply delftMacArm.patch
uv pip install -r requirements.txt
python setup.py build install
# test delft
# python delft/applications/grobidTagger.py date tag --architecture BidLSTM_CRF
# enjoy json output :)
cd ..
# build grobid
./gradlew clean install
# the patch edits grobid-home/config/grobid.yaml
# 1. change delft: install: "../delft" to: delft: install: "delft"
# 2. use delft models
wget -O grobidConf.patch http://sprunge.us/o8IbpR
git apply grobidConf.patch
# I had to include the path to the libpython from the venv here
java -Xmx4G -Djava.library.path=grobid-home/lib/mac_arm-64:/opt/homebrew/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/lib -jar grobid-core/build/libs/grobid-core-0.8.1-SNAPSHOT-onejar.jar -gH grobid-home -dIn /Users/ascadian/Projects/paperSegmentation/train_data/raw -dOut /Users/ascadian/Projects/paperSegmentation/train_data/anno_raw_test -exe createTraining
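The libpython directory in the command above is hardcoded to a Homebrew location. As a minimal sketch, assuming `python3` on the PATH is the interpreter JEP was installed for, the directory can be queried from the interpreter instead. The variable names `PY_LIBDIR` and `JAVA_LIB_PATH` are illustrative only and are not part of GROBID.

```shell
# Ask the active interpreter where its shared library (libpython) lives.
# sysconfig.get_config_var("LIBDIR") is standard Python; it may be empty on unusual builds.
PY_LIBDIR="$(python3 -c 'import sysconfig; print(sysconfig.get_config_var("LIBDIR") or "")')"

if [ -z "$PY_LIBDIR" ]; then
    echo "Could not determine LIBDIR for python3" >&2
fi

# Combine it with the GROBID native library folder used in the steps above.
JAVA_LIB_PATH="grobid-home/lib/mac_arm-64:${PY_LIBDIR}"

# Print the option so it can be checked before launching Java.
echo "-Djava.library.path=${JAVA_LIB_PATH}"
```

The printed value can then replace the hand-written `-Djava.library.path=...` argument. This only removes the hardcoded path; it is not claimed to fix the JEP initialisation failure shown below.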
Output/Error
22:38:05.157 [main] INFO org.grobid.core.main.GrobidHomeFinder - No Grobid property was provided. Attempting to find Grobid home in the current directory...
22:38:05.161 [main] INFO org.grobid.core.main.GrobidHomeFinder - *** USING GROBID HOME: /Users/ascadian/Projects/grobid3/grobid-home
22:38:05.163 [main] INFO org.grobid.core.main.GrobidHomeFinder - No Grobid property was provided. Attempting to find Grobid home in the current directory...
22:38:05.163 [main] INFO org.grobid.core.main.GrobidHomeFinder - *** USING GROBID HOME: /Users/ascadian/Projects/grobid3/grobid-home
22:38:05.163 [main] INFO org.grobid.core.main.GrobidHomeFinder - Grobid config file location was not explicitly set via 'org.grobid.config' system variable, defaulting to: /Users/ascadian/Projects/grobid3/grobid-home/config/grobid.yaml
22:38:05.280 [main] INFO org.grobid.core.main.LibraryLoader - Loading external native sequence labelling library
22:38:05.286 [main] INFO org.grobid.core.main.LibraryLoader - Loading Wapiti native library...
22:38:05.489 [main] INFO org.grobid.core.main.LibraryLoader - Loading JEP native library for DeLFT... /Users/ascadian/Projects/grobid3/grobid-home/lib/mac_arm-64
22:38:05.640 [main] INFO org.grobid.core.main.LibraryLoader - Native library for sequence labelling loaded
22:38:05.642 [main] INFO org.grobid.core.lexicon.Lexicon - Initiating dictionary
22:38:05.642 [main] INFO org.grobid.core.lexicon.Lexicon - End of Initialization of dictionary
22:38:05.642 [main] INFO org.grobid.core.lexicon.Lexicon - Initiating names
22:38:05.642 [main] INFO org.grobid.core.lexicon.Lexicon - End of initialization of names
22:38:05.885 [main] INFO org.grobid.core.lexicon.Lexicon - Initiating country codes
22:38:05.888 [main] INFO org.grobid.core.lexicon.Lexicon - End of initialization of country codes
.DS_Store
2004.03577.pdf
NeurIPS-2023-modelling-cellular-perturbations-with-the-sparse-additive-mechanism-shift-variational-autoencoder-Paper-Conference.pdf
s41586-024-07303-5.pdf
fpsyg-07-00789.pdf
4 files to be processed.
/Users/ascadian/Projects/paperSegmentation/train_data/raw/2004.03577.pdf
[Wapiti] Loading model: "/Users/ascadian/Projects/grobid3/grobid-home/models/fulltext/model.wapiti"
Model path: /Users/ascadian/Projects/grobid3/grobid-home/models/fulltext/model.wapiti
[Wapiti] Loading model: "/Users/ascadian/Projects/grobid3/grobid-home/models/segmentation/model.wapiti"
Model path: /Users/ascadian/Projects/grobid3/grobid-home/models/segmentation/model.wapiti
22:38:09.792 [main] INFO org.grobid.core.jni.DeLFTModel - Loading DeLFT model for reference-segmenter with architecture BidLSTM_ChainCRF_FEATURES...
22:38:09.794 [pool-1-thread-1] INFO org.grobid.core.jni.JEPThreadPool - Creating JEP instance for thread 19
WARNING: Failed to get and cache frequent class types!
WARNING: Failed to get and cache primitive class types!
22:38:09.846 [pool-1-thread-1] ERROR org.grobid.core.jni.JEPThreadPool - JEP initialisation failed
22:38:09.878 [pool-1-thread-1] INFO org.grobid.core.jni.JEPThreadPool - Creating JEP instance for thread 19
WARNING: Failed to get and cache frequent class types!
WARNING: Failed to get and cache primitive class types!
22:38:09.879 [pool-1-thread-1] ERROR org.grobid.core.jni.JEPThreadPool - JEP initialisation failed
22:38:09.884 [main] ERROR org.grobid.core.jni.DeLFTModel - DeLFT model reference_segmenter labelling failed
java.util.concurrent.ExecutionException: java.lang.RuntimeException: JEP initialisation failed
at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
at org.grobid.core.jni.JEPThreadPool.call(JEPThreadPool.java:176)
at org.grobid.core.jni.DeLFTModel.label(DeLFTModel.java:194)
at org.grobid.core.engines.tagging.DeLFTTagger.label(DeLFTTagger.java:29)
at org.grobid.core.engines.AbstractParser.label(AbstractParser.java:47)
at org.grobid.core.engines.ReferenceSegmenterParser.createTrainingData(ReferenceSegmenterParser.java:334)
at org.grobid.core.engines.FullTextParser.createTraining(FullTextParser.java:1153)
at org.grobid.core.engines.Engine.createTraining(Engine.java:551)
at org.grobid.core.engines.Engine.batchCreateTraining(Engine.java:655)
at org.grobid.core.engines.ProcessEngine.createTraining(ProcessEngine.java:376)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.grobid.core.utilities.Utilities.launchMethod(Utilities.java:344)
at org.grobid.core.main.batch.GrobidMain.main(GrobidMain.java:194)
Caused by: java.lang.RuntimeException: JEP initialisation failed
at org.grobid.core.jni.JEPThreadPool.createJEPInstance(JEPThreadPool.java:135)
at org.grobid.core.jni.JEPThreadPool.getJEPInstance(JEPThreadPool.java:151)
at org.grobid.core.jni.DeLFTModel$LabelTask.call(DeLFTModel.java:119)
at org.grobid.core.jni.DeLFTModel$LabelTask.call(DeLFTModel.java:84)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
22:38:09.886 [main] ERROR org.grobid.core.engines.Engine - An error occured while processing the following pdf: /Users/ascadian/Projects/paperSegmentation/train_data/raw/2004.03577.pdf
org.grobid.core.exceptions.GrobidException: [GENERAL] An exception occurred while running Grobid training data generation for full text.
at org.grobid.core.engines.FullTextParser.createTraining(FullTextParser.java:1562)
at org.grobid.core.engines.Engine.createTraining(Engine.java:551)
at org.grobid.core.engines.Engine.batchCreateTraining(Engine.java:655)
at org.grobid.core.engines.ProcessEngine.createTraining(ProcessEngine.java:376)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.grobid.core.utilities.Utilities.launchMethod(Utilities.java:344)
at org.grobid.core.main.batch.GrobidMain.main(GrobidMain.java:194)
Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.commons.lang3.tuple.Pair.getLeft()" because "result" is null
at org.grobid.core.engines.FullTextParser.createTraining(FullTextParser.java:1154)
... 9 common frames omitted
Used patches (in case the pastebin is unavailable)
I usually use Conda, and I install most of the packages with pip unless they are particularly annoying (e.g. they try to compile and fail).
The JEP library should not need to be copied under grobid-home, because the version in the Python environment should be used directly. To do that, export the equivalent of the CONDA_PREFIX directory for your venv before running grobid.
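One possible reading of this suggestion, sketched for a plain venv: reuse the `VIRTUAL_ENV` variable that `source .venv/bin/activate` sets, and export it as `CONDA_PREFIX` when no Conda environment is active. Whether GROBID reads `CONDA_PREFIX`, `VIRTUAL_ENV`, or both is an assumption here and should be checked against the GROBID version in use. The helper name `resolve_python_prefix` is hypothetical.

```shell
# Print the root of the active Python environment: conda first, then venv.
# Returns non-zero when neither variable is set.
resolve_python_prefix() {
    if [ -n "${CONDA_PREFIX:-}" ]; then
        printf '%s\n' "$CONDA_PREFIX"
    elif [ -n "${VIRTUAL_ENV:-}" ]; then
        printf '%s\n' "$VIRTUAL_ENV"
    else
        return 1
    fi
}

# Assumption: GROBID picks up the environment root from CONDA_PREFIX.
if prefix="$(resolve_python_prefix)"; then
    export CONDA_PREFIX="$prefix"
    echo "Using Python environment: $CONDA_PREFIX"
else
    echo "No active conda env or venv found; activate one first" >&2
fi
```

Run this in the same shell, after activating the venv and before starting the `java` command, so the exported variable is inherited by the JVM.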