Grobid with DL models natively on MacOS ARM #1108

Open
Schroedi opened this issue Apr 29, 2024 · 1 comment
Labels
macOS-specific Issue visible only on macOS environments

@Schroedi

This is my attempt to run Grobid with DL models natively on macOS ARM. The docs state that macOS is not fully supported, so feel free to mark this issue as out of scope.

If anybody has got it working, I would be interested in the package versions used.

Here I document what I tried and how far I got:

System

MacOS 14.4.1 (ARM M3)
java --version

openjdk 17.0.10 2024-01-16 LTS
OpenJDK Runtime Environment Zulu17.48+15-CA (build 17.0.10+7-LTS)
OpenJDK 64-Bit Server VM Zulu17.48+15-CA (build 17.0.10+7-LTS, mixed mode, sharing)

Steps

#clone grobid
#cd grobid

# shared venv
uv venv -p 3.9
source .venv/bin/activate
uv pip install jep==4.2.0
cp .venv/lib/python3.9/site-packages/jep/jep.cpython-39-darwin.so grobid-home/lib/mac_arm-64/libjep.dylib

# prepare delft
# I think 0.3.3 is used in the container if I remember correctly
git clone --branch v0.3.3 https://github.com/kermitt2/delft
cd delft
# change requirements until delft works - very scientific
wget -O delftMacArm.patch http://sprunge.us/iFQCZx
git apply delftMacArm.patch
uv pip install -r requirements.txt
python setup.py build install
# test delft
# python delft/applications/grobidTagger.py date tag --architecture BidLSTM_CRF
# enjoy json output :)
cd ..

# build grobid
./gradlew clean install

# the patch edits grobid-home/config/grobid.yaml
# 1. change delft: install: "../delft" to: delft: install: "delft"
# 2. use delft models
wget -O grobidConf.patch http://sprunge.us/o8IbpR
git apply grobidConf.patch

# I had to include the path to the libpython from the venv here
java -Xmx4G \
  -Djava.library.path=grobid-home/lib/mac_arm-64:/opt/homebrew/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/lib \
  -jar grobid-core/build/libs/grobid-core-0.8.1-SNAPSHOT-onejar.jar \
  -gH grobid-home \
  -dIn /Users/ascadian/Projects/paperSegmentation/train_data/raw \
  -dOut /Users/ascadian/Projects/paperSegmentation/train_data/anno_raw_test \
  -exe createTraining
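
The library path in the command above is hard-coded to a Homebrew location. As a rough sketch, the same entries could be derived from whichever interpreter is active. This assumes that interpreter is the one jep was built against; `PYTHON_BIN`, `python_libdir`, `jep_dir` and `join_paths` are names made up for this example and are not part of Grobid or jep.

```shell
#!/usr/bin/env bash
# Sketch only: build a java.library.path value from the active Python interpreter.
# PYTHON_BIN is a hypothetical override; it defaults to python3 on the PATH.
PYTHON_BIN="${PYTHON_BIN:-python3}"

# Directory that holds libpython for this interpreter (empty if unknown).
python_libdir() {
  "$PYTHON_BIN" -c 'import sysconfig; print(sysconfig.get_config_var("LIBDIR") or "")' 2>/dev/null
}

# Directory of the installed jep package (empty if jep is not importable).
jep_dir() {
  "$PYTHON_BIN" -c 'import importlib.util as u; s = u.find_spec("jep"); print(s.submodule_search_locations[0] if s and s.submodule_search_locations else "")' 2>/dev/null
}

# Join all non-empty arguments with ":".
join_paths() {
  local out="" p
  for p in "$@"; do
    [ -n "$p" ] || continue
    out="${out:+$out:}$p"
  done
  printf '%s\n' "$out"
}

LIB_PATH="$(join_paths "grobid-home/lib/mac_arm-64" "$(jep_dir)" "$(python_libdir)")"
echo "java.library.path=$LIB_PATH"
```

The printed value can be pasted into `-Djava.library.path=...`. Whether `LIBDIR` points at a usable libpython depends on how the interpreter was built, so check the result before relying on it.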

Output/Error

22:38:05.157 [main] INFO  org.grobid.core.main.GrobidHomeFinder - No Grobid property was provided. Attempting to find Grobid home in the current directory...
22:38:05.161 [main] INFO  org.grobid.core.main.GrobidHomeFinder - *** USING GROBID HOME: /Users/ascadian/Projects/grobid3/grobid-home
22:38:05.163 [main] INFO  org.grobid.core.main.GrobidHomeFinder - No Grobid property was provided. Attempting to find Grobid home in the current directory...
22:38:05.163 [main] INFO  org.grobid.core.main.GrobidHomeFinder - *** USING GROBID HOME: /Users/ascadian/Projects/grobid3/grobid-home
22:38:05.163 [main] INFO  org.grobid.core.main.GrobidHomeFinder - Grobid config file location was not explicitly set via 'org.grobid.config' system variable, defaulting to: /Users/ascadian/Projects/grobid3/grobid-home/config/grobid.yaml
22:38:05.280 [main] INFO  org.grobid.core.main.LibraryLoader - Loading external native sequence labelling library
22:38:05.286 [main] INFO  org.grobid.core.main.LibraryLoader - Loading Wapiti native library...
22:38:05.489 [main] INFO  org.grobid.core.main.LibraryLoader - Loading JEP native library for DeLFT... /Users/ascadian/Projects/grobid3/grobid-home/lib/mac_arm-64
22:38:05.640 [main] INFO  org.grobid.core.main.LibraryLoader - Native library for sequence labelling loaded
22:38:05.642 [main] INFO  org.grobid.core.lexicon.Lexicon - Initiating dictionary
22:38:05.642 [main] INFO  org.grobid.core.lexicon.Lexicon - End of Initialization of dictionary
22:38:05.642 [main] INFO  org.grobid.core.lexicon.Lexicon - Initiating names
22:38:05.642 [main] INFO  org.grobid.core.lexicon.Lexicon - End of initialization of names
22:38:05.885 [main] INFO  org.grobid.core.lexicon.Lexicon - Initiating country codes
22:38:05.888 [main] INFO  org.grobid.core.lexicon.Lexicon - End of initialization of country codes
.DS_Store
2004.03577.pdf
NeurIPS-2023-modelling-cellular-perturbations-with-the-sparse-additive-mechanism-shift-variational-autoencoder-Paper-Conference.pdf
s41586-024-07303-5.pdf
fpsyg-07-00789.pdf
4 files to be processed.
/Users/ascadian/Projects/paperSegmentation/train_data/raw/2004.03577.pdf
[Wapiti] Loading model: "/Users/ascadian/Projects/grobid3/grobid-home/models/fulltext/model.wapiti"
Model path: /Users/ascadian/Projects/grobid3/grobid-home/models/fulltext/model.wapiti
[Wapiti] Loading model: "/Users/ascadian/Projects/grobid3/grobid-home/models/segmentation/model.wapiti"
Model path: /Users/ascadian/Projects/grobid3/grobid-home/models/segmentation/model.wapiti
22:38:09.792 [main] INFO  org.grobid.core.jni.DeLFTModel - Loading DeLFT model for reference-segmenter with architecture BidLSTM_ChainCRF_FEATURES...
22:38:09.794 [pool-1-thread-1] INFO  org.grobid.core.jni.JEPThreadPool - Creating JEP instance for thread 19
WARNING: Failed to get and cache frequent class types!
WARNING: Failed to get and cache primitive class types!
22:38:09.846 [pool-1-thread-1] ERROR org.grobid.core.jni.JEPThreadPool - JEP initialisation failed
22:38:09.878 [pool-1-thread-1] INFO  org.grobid.core.jni.JEPThreadPool - Creating JEP instance for thread 19
WARNING: Failed to get and cache frequent class types!
WARNING: Failed to get and cache primitive class types!
22:38:09.879 [pool-1-thread-1] ERROR org.grobid.core.jni.JEPThreadPool - JEP initialisation failed
22:38:09.884 [main] ERROR org.grobid.core.jni.DeLFTModel - DeLFT model reference_segmenter labelling failed
java.util.concurrent.ExecutionException: java.lang.RuntimeException: JEP initialisation failed
	at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
	at org.grobid.core.jni.JEPThreadPool.call(JEPThreadPool.java:176)
	at org.grobid.core.jni.DeLFTModel.label(DeLFTModel.java:194)
	at org.grobid.core.engines.tagging.DeLFTTagger.label(DeLFTTagger.java:29)
	at org.grobid.core.engines.AbstractParser.label(AbstractParser.java:47)
	at org.grobid.core.engines.ReferenceSegmenterParser.createTrainingData(ReferenceSegmenterParser.java:334)
	at org.grobid.core.engines.FullTextParser.createTraining(FullTextParser.java:1153)
	at org.grobid.core.engines.Engine.createTraining(Engine.java:551)
	at org.grobid.core.engines.Engine.batchCreateTraining(Engine.java:655)
	at org.grobid.core.engines.ProcessEngine.createTraining(ProcessEngine.java:376)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.grobid.core.utilities.Utilities.launchMethod(Utilities.java:344)
	at org.grobid.core.main.batch.GrobidMain.main(GrobidMain.java:194)
Caused by: java.lang.RuntimeException: JEP initialisation failed
	at org.grobid.core.jni.JEPThreadPool.createJEPInstance(JEPThreadPool.java:135)
	at org.grobid.core.jni.JEPThreadPool.getJEPInstance(JEPThreadPool.java:151)
	at org.grobid.core.jni.DeLFTModel$LabelTask.call(DeLFTModel.java:119)
	at org.grobid.core.jni.DeLFTModel$LabelTask.call(DeLFTModel.java:84)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)
22:38:09.886 [main] ERROR org.grobid.core.engines.Engine - An error occured while processing the following pdf: /Users/ascadian/Projects/paperSegmentation/train_data/raw/2004.03577.pdf
org.grobid.core.exceptions.GrobidException: [GENERAL] An exception occurred while running Grobid training data generation for full text.
	at org.grobid.core.engines.FullTextParser.createTraining(FullTextParser.java:1562)
	at org.grobid.core.engines.Engine.createTraining(Engine.java:551)
	at org.grobid.core.engines.Engine.batchCreateTraining(Engine.java:655)
	at org.grobid.core.engines.ProcessEngine.createTraining(ProcessEngine.java:376)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.grobid.core.utilities.Utilities.launchMethod(Utilities.java:344)
	at org.grobid.core.main.batch.GrobidMain.main(GrobidMain.java:194)
Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.commons.lang3.tuple.Pair.getLeft()" because "result" is null
	at org.grobid.core.engines.FullTextParser.createTraining(FullTextParser.java:1154)
	... 9 common frames omitted
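
In the log above, the NullPointerException at FullTextParser.java:1154 appears to be a follow-on effect: the labelling call on line 1153 failed with "JEP initialisation failed", so `result` was null. The log does not say why JEP failed to initialise. One thing that can be checked, as a sketch and not a confirmed diagnosis, is whether the copied native library and the JVM have the same architecture and which shared libraries the JEP library links against. `inspect_native_lib` is a hypothetical helper name; `file`, `otool -L` and `java -XshowSettings:properties` are standard tools.

```shell
#!/usr/bin/env bash
# Sketch only: inspect a native library and the JVM architecture.
# inspect_native_lib is a hypothetical helper, not part of Grobid.
inspect_native_lib() {
  local lib="$1"
  if [ ! -f "$lib" ]; then
    echo "missing: $lib"
    return 1
  fi
  if command -v file >/dev/null 2>&1; then
    file "$lib"        # on Apple Silicon the output should mention arm64
  fi
  if command -v otool >/dev/null 2>&1; then
    otool -L "$lib"    # macOS only: shared libraries the file links against
  fi
}

# Path taken from the cp step earlier in this issue.
inspect_native_lib grobid-home/lib/mac_arm-64/libjep.dylib

# The JVM prints its settings to stderr; os.arch shows the JVM architecture.
if command -v java >/dev/null 2>&1; then
  java -XshowSettings:properties -version 2>&1 | grep 'os.arch'
fi
```

If the architectures differ, or the library links against a Python that is not on `java.library.path`, that would be one candidate explanation for the failure.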

Used patches (in case the pastebin is unavailable)

grobidConf.patch
delftMacArm.patch

@lfoppiano
Collaborator

lfoppiano commented Apr 29, 2024

For TensorFlow on Apple ARM, you should install tensorflow-deps using conda (see https://github.com/lfoppiano/material-parsers?tab=readme-ov-file#set-up-on-apple-m1; you can stop before the spaCy model download step. Same scientific approach 😄)

I usually use Conda and install most packages with pip, unless they are particularly annoying (e.g. they try to compile and fail).

The JEP library should not need to be copied under grobid-home, because the version in the Python environment should be used directly. To do that, export the equivalent of CONDA_PREFIX, pointing at the directory of your venv, before running Grobid.
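
A minimal sketch of that last step, under assumptions: `source .venv/bin/activate` exports `VIRTUAL_ENV`, which is the venv counterpart of conda's `CONDA_PREFIX`. Which of these variables Grobid actually reads is not confirmed in this thread, so treat this only as a way to check that an environment prefix is set and that it contains jep. `active_env_prefix` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: resolve the active Python environment prefix, venv first, then conda.
# active_env_prefix is a hypothetical helper, not part of Grobid.
active_env_prefix() {
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    printf '%s\n' "$VIRTUAL_ENV"
  elif [ -n "${CONDA_PREFIX:-}" ]; then
    printf '%s\n' "$CONDA_PREFIX"
  else
    return 1
  fi
}

if prefix="$(active_env_prefix)"; then
  echo "Python environment: $prefix"
  # List jep package directories inside the environment, if any.
  find "$prefix" -type d -name jep -path '*site-packages*' 2>/dev/null
else
  echo "No active environment; run 'source .venv/bin/activate' first" >&2
fi
```

Run it in the same shell that launches the `java` command, so the JVM inherits the exported variable.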

@lfoppiano lfoppiano added the macOS-specific Issue visible only on macOS environments label Apr 29, 2024