This is the companion code for the blog post *The Total Noob's Guide to Harnessing the GPU for LLaMA Inference*.
- Docker
- VS Code
- Open this project in the provided devcontainer
- Run:
```shell
# compile llamacpp and install its dependencies
make clone-llamacpp-repo
make compile-llamacpp
make install-llamacpp-deps

# get a model and convert it to something llamacpp can use
make download-model
make convert-model-to-f16
make quantize-model

# view inference timings
make eval
```
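For orientation, here is a rough sketch of what targets like these might contain. Only the target names come from this repo; the repository URL, file paths, and the `convert.py`/`quantize`/`main` invocations are assumptions based on the common llama.cpp workflow, and the actual Makefile may differ:

```makefile
# Hypothetical sketch -- not this repo's actual Makefile.
MODEL_DIR ?= models
MODEL_F16  = $(MODEL_DIR)/model-f16.gguf
MODEL_Q4   = $(MODEL_DIR)/model-q4_0.gguf

clone-llamacpp-repo:
	git clone https://github.com/ggerganov/llama.cpp

compile-llamacpp:
	$(MAKE) -C llama.cpp

install-llamacpp-deps:
	pip install -r llama.cpp/requirements.txt

download-model:
	@echo "model source is repo-specific; download weights into $(MODEL_DIR)"

convert-model-to-f16:
	python llama.cpp/convert.py $(MODEL_DIR) --outtype f16 --outfile $(MODEL_F16)

quantize-model:
	./llama.cpp/quantize $(MODEL_F16) $(MODEL_Q4) q4_0

eval:
	./llama.cpp/main -m $(MODEL_Q4) -p "Hello" -n 64
```

Quantizing to 4-bit (`q4_0`) shrinks the f16 model roughly 4x, which is what makes consumer-GPU inference practical; `main` prints per-token timing stats at the end of a run, which is what `make eval` surfaces.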