Run a WasmEdge LLAMA chat server app with Containerd over Kubernetes

Environment

We use Ubuntu 20.04 x86_64 in the following example.

Install containerd, customized crun, and WasmEdge

Reuse the install script from the containerd example, but rewrite its crun repository URL so that it uses the experimental enable-wasmedge-plugin branch of second-state/crun.

sed 's|https://github.com/containers/crun|-b enable-wasmedge-plugin https://github.com/second-state/crun|g' containerd/install.sh | bash
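To see what this substitution does before piping anything into bash, it can be applied to a single sample line. This is a minimal sketch: the `sample` line is a hypothetical stand-in for whatever clone command containerd/install.sh actually contains, and only the sed expression is taken from the command above.

```shell
# Hypothetical line standing in for the real crun clone command in containerd/install.sh.
sample='git clone https://github.com/containers/crun'

# Same substitution as in the install command, applied to the sample line only.
rewritten=$(printf '%s\n' "$sample" | sed 's|https://github.com/containers/crun|-b enable-wasmedge-plugin https://github.com/second-state/crun|g')

# Show the rewritten line.
printf '%s\n' "$rewritten"
# prints: git clone -b enable-wasmedge-plugin https://github.com/second-state/crun
```

To preview the effect on the real script, one option is to drop the trailing `| bash` and pipe the sed output into `diff containerd/install.sh -` instead.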

Install k8s

Reuse the install script from the kubernetes_containerd example.

bash kubernetes_containerd/install.sh

Run LLAMA chat server app

The llama_server_application.sh script shows how to pull a WASM container image with WASI-NN-GGML plugin support from Docker Hub and then run it as a containerized application in Kubernetes.

bash k8s_containerd_llama/llama_server_application.sh

Test the API service from another session

curl -X POST http://localhost:8080/v1/chat/completions -H 'accept:application/json' -H 'Content-Type: application/json' -d '{"messages":[{"role":"system", "content": "You are a helpful assistant."}, {"role":"user", "content": "Who is Robert Oppenheimer?"}], "model":"llama-2-chat"}' | jq .
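The one-line request above is awkward to edit by hand. As a sketch, the same request body can be kept in a heredoc and sent through a small wrapper function. The names `PAYLOAD` and `send_chat` are hypothetical and not part of the example repository. The endpoint, headers, and JSON fields are copied from the command above.

```shell
# Request body from the curl command above, laid out for easier editing.
PAYLOAD=$(cat <<'EOF'
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who is Robert Oppenheimer?"}
  ],
  "model": "llama-2-chat"
}
EOF
)

# Hypothetical helper: POST the payload to the URL given as the first argument.
# -sS hides the progress meter but still reports errors.
send_chat() {
  curl -sS -X POST "$1" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d "$PAYLOAD"
}
```

Once the server is running, `send_chat http://localhost:8080/v1/chat/completions | jq .` should send the same request as the one-liner.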

Check output
