hsm207/howto-llamacpp-with-gpu

Step-by-step guide on running LLaMA language models using llama.cpp with GPU acceleration. Includes detailed examples and a performance comparison. Based on the OpenLLaMA project.

Introduction

This is the companion code for the blog post "The Total Noob's Guide to Harnessing the GPU for LLaMA Inference".

Prerequisites

  1. Docker
  2. VS Code
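Since the goal is GPU-accelerated inference from inside a container, the host machine also needs an NVIDIA GPU with a working driver, and Docker needs the NVIDIA Container Toolkit so the devcontainer can see the GPU. A quick sanity check (the CUDA image tag below is only an example):

    # confirm the driver works on the host
    nvidia-smi

    # confirm Docker can pass the GPU through (image tag is illustrative)
    docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi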

Usage

  1. Open this project in the provided devcontainer

  2. Run:

    # compile llamacpp and install its dependencies
    make clone-llamacpp-repo
    make compile-llamacpp
    make install-llamacpp-deps
    
    # get a model and convert it to something llamacpp can use
    make download-model
    make convert-model-to-f16
    make quantize-model
    
    # view inference timings
    make eval
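The make targets are convenience wrappers. If you want to see roughly what each step does, or reproduce it outside the Makefile, the underlying llama.cpp workflow at the time this guide was written looked something like the sketch below. The model path, quantization type, prompt, and layer count are illustrative assumptions, not the exact values baked into this repo's Makefile:

    # clone llama.cpp and build it with cuBLAS so inference can use the GPU
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    make LLAMA_CUBLAS=1

    # Python dependencies for the model conversion script
    pip install -r requirements.txt

    # convert downloaded OpenLLaMA weights (Hugging Face format) to f16 ggml,
    # then quantize to 4 bits (paths and quant type are examples)
    python convert.py /path/to/open_llama_7b
    ./quantize /path/to/open_llama_7b/ggml-model-f16.bin \
               /path/to/open_llama_7b/ggml-model-q4_0.bin q4_0

    # run inference with layers offloaded to the GPU; main prints timing
    # statistics (load time, prompt eval and eval tokens per second) on exit
    ./main -m /path/to/open_llama_7b/ggml-model-q4_0.bin \
           -p "Building a website can be done in 10 simple steps:" \
           -n 128 --n-gpu-layers 32

The --n-gpu-layers (-ngl) flag controls how many transformer layers are offloaded to the GPU; raising it until VRAM runs out is the usual way to trade memory for speed, and it is what makes the timings from make eval differ between CPU-only and GPU builds.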
