
Updated llama.cpp engine to version b2581 #3066

Open · MatPere wants to merge 156 commits into master
Conversation

@MatPere commented Apr 5, 2024

Description

The llama engine code is now compatible with the b2581 release of the llama.cpp repository, upgrading from b1696.

  • Note: This is purely a compatibility update. Some code had changed on the llama.cpp interface, leaving this engine unable to run. This pull request does not provide any new functionality in itself, but it opens up the opportunity to further enhance the llama engine with the newest llama.cpp tools.
  • As a consequence of this change, the llama engine now supports Qwen-based GGUF models, which were incompatible with version b1696.
  • The tests were done on a linux-x86_64 architecture. The changes were not tested on linux-aarch64, osx-x86_64, osx-aarch64, or win-x86_64.
  • IMPORTANT: djl/engines/llama/build.gradle downloads binaries for the different architectures from "https://publish.djl.ai/llama/${llamacpp_version}/jnilib/${djl_version}". Since I did not update that repository with the newest binaries, ./gradlew build WILL FAIL.
  • As a temporary workaround for the tests, disable the download in build.gradle, and use the task ./gradlew compileJNI to build your own binaries for your OS.
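To make the workaround above concrete, the download step could be skipped with a change along these lines. This is only a sketch: the task name "downloadJnilib" is an assumption (I have not checked what the download task in engines/llama/build.gradle is actually called), so substitute the real task name.

```groovy
// engines/llama/build.gradle -- hypothetical snippet, adapt to the real task name.
// Disable the task that fetches prebuilt binaries from publish.djl.ai,
// so the build falls back to locally compiled JNI libraries.
tasks.matching { it.name == "downloadJnilib" }.configureEach {
    enabled = false
}
```

With the download disabled, running ./gradlew compileJNI from engines/llama builds the native library for your own OS, as noted above.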

SidneyLann and others added 30 commits September 19, 2023 17:36
---------

Co-authored-by: Administrator <Administrator@tech8>
Co-authored-by: KexinFeng <fenkexin@amazon.com>
* Implement PtNDArraryEx.multiboxDetection

* MultiboxDetection - code cleanup

* MultiboxDetection - code cleanup

* MultiboxDetection - code cleanup

* MultiboxDetection - code cleanup

* format code

* Fix, add tests, and pass CI

---------

Co-authored-by: Zach Kimberg <kimbergz@amazon.com>
This fixes the markdown headers to be h1 so they render correctly in docs.
…valibrary#2806)

* [api] Added Early stopping configuration (deepjavalibrary#38)

* [api] Added Builder for Early stopping configuration (deepjavalibrary#38)

* Explicitly set NDManager for dataset in EarlyStoppingListenerTest to make the test run on JDK11 in gradle.
This creates an abstraction for combining devices into a single device. The main
use case for now is in DJL Serving TP_parallel. It will allow us to create a
WorkerGroup and a PyPredictor for a set of devices and then track the usage of
devices properly. It could also be used later for multi-gpu training or other
multi-device cases.
* Updates doc versions to 0.24.0

Also moves android gradle.properties to the new 0.25.0.

* Remove android change
* Updates XGBoost to 2.0.1

* Use devtools 8

* Updates based on new Xgboost JNI API.

---------

Co-authored-by: Frank Liu <frankfliu2000@gmail.com>
* Added element-wise gauss error function (ERF)

* Added element-wise arctan2

* Format java

* Fixed docs

* added * to other_ptr in Atan2
* Added 2D FFT

* Format java

* Add default fft2

* Convert array to vectors

* Add inverse fft2

* Add better assersion in ifft2 test

* Add really better assersion in ifft2 test

* Move cast bellow ifft2 for unsupported exception

* Format java

* changed dims to axes

* changed dims to axes
* only build triton binaries

* install requests library

* remove script
frankfliu and others added 24 commits March 5, 2024 15:00
…brary#3032)

* support includeTokenTypes in TextEmbeddingBatchTranslator

Co-authored-by: Frank Liu <frankfliu2000@gmail.com>
* Increase DJL version to 0.27.0

* Update README
@MatPere MatPere requested review from zachgk, frankfliu and a team as code owners April 5, 2024 14:57
@frankfliu (Contributor) commented Apr 7, 2024

@MatPere

Would you please take a look at this test failure: https://github.com/deepjavalibrary/djl/actions/runs/8590262606/job/23537569545#step:5:201

It seems to fail on mac when loading the model.

You can reproduce the error locally on your mac:

    cd engines/llama
    ./gradlew compileJNI
    ./gradlew test -Dnightly=true -Pjni

@MatPere (Author) commented Apr 9, 2024

I don't think I'll be able to get my hands on an OSX machine anytime soon, so I can only suggest leads for anyone willing to help. I did not manage to reproduce the error on my linux-x86_64 machine, so it is likely an OS-specific issue, and I have very little experience dealing with OSX-related errors.

From my understanding, the error message comes from within llama.cpp itself (llama.cpp/common/common.cpp, function llama_init_from_gpt_params). Normally such an error stems from the content of the model file or the parameters given to the loader, but without access to the LLAMA_LOG_ERROR output itself I can't tell from that message alone. Because it works on the other OSs, my guess is that the cause lies in the way the model is downloaded or the code is compiled.

Maybe (though I doubt it) this is an internal issue with llama.cpp itself being unable to handle that exact model on OSX for some reason, in which case manually downloading and building the llama.cpp repository as well as the model tinyllama-1.1b-1t-openorca.Q4_K_M.gguf, then launching ./main -m path/to/model.gguf, would presumably result in the same failure (and in that case we'd have to open a new issue on their repository).
