
Updated llama.cpp engine to version b2581 #3066

Open · MatPere wants to merge 156 commits into master
Conversation

@MatPere commented Apr 5, 2024

Description

The llama engine code is now compatible with the b2581 release of the llama.cpp repository, upgrading from b1696.

  • Note: This is purely a compatibility update. Some code had changed on the llama.cpp interface, leaving this engine unable to run. This pull request does not provide any new functionality in itself, but it opens up the opportunity to further enhance the llama engine with the newest llama.cpp tools.
  • As a consequence of this change, the llama engine now supports Qwen-based GGUF models, which were incompatible with version b1696.
  • The tests were done on a linux-x86_64 architecture. The changes were not tested on linux-aarch64, osx-x86_64, osx-aarch64, or win-x86_64.
  • IMPORTANT: djl/engines/llama/build.gradle downloads binaries for the different architectures from "https://publish.djl.ai/llama/${llamacpp_version}/jnilib/${djl_version}". Since I did not update that repository with the newest binaries, ./gradlew build WILL FAIL.
  • As a temporary workaround for the tests, disable the download in build.gradle, and use the task ./gradlew compileJNI to build your own binaries for your OS.
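To make the workaround above concrete, the download step could be skipped with a change along these lines. This is only a sketch: the task name "downloadJnilib" is an assumption (I have not checked what the download task in engines/llama/build.gradle is actually called), so substitute the real task name.

```groovy
// engines/llama/build.gradle -- hypothetical snippet, adapt to the real task name.
// Disable the task that fetches prebuilt binaries from publish.djl.ai,
// so the build falls back to locally compiled JNI libraries.
tasks.matching { it.name == "downloadJnilib" }.configureEach {
    enabled = false
}
```

With the download disabled, running ./gradlew compileJNI from engines/llama builds the native library for your own OS, as noted above.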

SidneyLann and others added 30 commits September 19, 2023 17:36
---------

Co-authored-by: Administrator <Administrator@tech8>
Co-authored-by: KexinFeng <fenkexin@amazon.com>
* Implement PtNDArraryEx.multiboxDetection

* MultiboxDetection - code cleanup

* MultiboxDetection - code cleanup

* MultiboxDetection - code cleanup

* MultiboxDetection - code cleanup

* format code

* Fix, add tests, and pass CI

---------

Co-authored-by: Zach Kimberg <kimbergz@amazon.com>
This fixes the markdown headers to be h1 so they render correctly in docs.
…valibrary#2806)

* [api] Added Early stopping configuration (deepjavalibrary#38)

* [api] Added Builder for Early stopping configuration (deepjavalibrary#38)

* Explicitly set NDManager for dataset in EarlyStoppingListenerTest to make the test run on JDK11 in gradle.
This creates an abstraction for combining devices into a single device. The main
use case for now is in DJL Serving TP_parallel. It will allow us to create a
WorkerGroup and a PyPredictor for a set of devices and then track the usage of
devices properly. It could also be used later for multi-gpu training or other
multi-device cases.
* Updates doc versions to 0.24.0

Also moves android gradle.properties to the new 0.25.0.

* Remove android change
* Updates XGBoost to 2.0.1

* Use devtools 8

* Updates based on new Xgboost JNI API.

---------

Co-authored-by: Frank Liu <frankfliu2000@gmail.com>
* Added element-wise gauss error function (ERF)

* Added element-wise arctan2

* Format java

* Fixed docs

* added * to other_ptr in Atan2
* Added 2D FFT

* Format java

* Add default fft2

* Convert array to vectors

* Add inverse fft2

* Add better assersion in ifft2 test

* Add really better assersion in ifft2 test

* Move cast bellow ifft2 for unsupported exception

* Format java

* changed dims to axes

* changed dims to axes
* only build triton binaries

* install requests library

* remove script
frankfliu and others added 24 commits March 5, 2024 15:00
…brary#3032)

* support includeTokenTypes in TextEmbeddingBatchTranslator

Co-authored-by: Frank Liu <frankfliu2000@gmail.com>
* Increase DJL version to 0.27.0

* Update README
@MatPere MatPere requested review from zachgk, frankfliu and a team as code owners April 5, 2024 14:57
@frankfliu (Contributor) commented Apr 7, 2024

@MatPere

Would you please take a look at this test failure: https://github.com/deepjavalibrary/djl/actions/runs/8590262606/job/23537569545#step:5:201

It seems to fail on mac when loading the model.

You can reproduce the error locally on your mac:

    cd engines/llama
    ./gradlew compileJNI
    ./gradlew test -Dnightly=true -Pjni

@MatPere (Author) commented Apr 9, 2024

I don't think I'll be able to get my hands on an OSX machine anytime soon, so I can only suggest leads for anyone willing to help. I did not manage to reproduce the error on my linux-x86_64 machine, so it is likely an OS-specific issue, and I have very little experience dealing with OSX-related errors.

From my understanding, the error message comes from within llama.cpp itself (llama.cpp/common/common.cpp, function llama_init_from_gpt_params). Normally such an error stems from the content of the model file or the parameters given to the loader, but without access to the LLAMA_LOG_ERROR output itself I can't tell from that message alone. Because it works on the other OSs, my guess is that the cause lies in the way the model is downloaded or the code is compiled.

Maybe (though I doubt it) this is an internal issue with llama.cpp itself being unable to handle that exact model on OSX for some reason, in which case manually downloading and building the llama.cpp repository as well as the model tinyllama-1.1b-1t-openorca.Q4_K_M.gguf, then launching ./main -m path/to/model.gguf, would presumably result in the same failure (and in that case we'd have to open a new issue on their repository).
