This is the companion code for the blog post *The Total Noob's Guide to Harnessing the GPU for LLaMA Inference*.
- Docker
- VS Code
- Open this project in the provided devcontainer
- Run:
```shell
# compile llamacpp and install its dependencies
make clone-llamacpp-repo
make compile-llamacpp
make install-llamacpp-deps

# get a model and convert it to something llamacpp can use
make download-model
make convert-model-to-f16
make quantize-model

# view inference timings
make eval
```
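For orientation, here is a rough sketch of what targets like these might contain. Only the target names come from this repo; the repository URL, file paths, and the `convert.py`/`quantize`/`main` invocations are assumptions based on the common llama.cpp workflow, and the actual Makefile may differ:

```makefile
# Hypothetical sketch -- not this repo's actual Makefile.
MODEL_DIR ?= models
MODEL_F16  = $(MODEL_DIR)/model-f16.gguf
MODEL_Q4   = $(MODEL_DIR)/model-q4_0.gguf

clone-llamacpp-repo:
	git clone https://github.com/ggerganov/llama.cpp

compile-llamacpp:
	$(MAKE) -C llama.cpp

install-llamacpp-deps:
	pip install -r llama.cpp/requirements.txt

download-model:
	@echo "model source is repo-specific; download weights into $(MODEL_DIR)"

convert-model-to-f16:
	python llama.cpp/convert.py $(MODEL_DIR) --outtype f16 --outfile $(MODEL_F16)

quantize-model:
	./llama.cpp/quantize $(MODEL_F16) $(MODEL_Q4) q4_0

eval:
	./llama.cpp/main -m $(MODEL_Q4) -p "Hello" -n 64
```

Quantizing to 4-bit (`q4_0`) shrinks the f16 model roughly 4x, which is what makes consumer-GPU inference practical; `main` prints per-token timing stats at the end of a run, which is what `make eval` surfaces.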