
Convert directly from llama3 #4268

Merged: 9 commits from pdevine/llama3 into main, May 21, 2024
Conversation

pdevine (Contributor) commented on May 8, 2024

This change allows you to convert directly from a llama3-derived safetensors model into Ollama.

It is currently missing:

  • PyTorch support: conversion almost works, but the embeddings layer size is off by the eos/bos tokens

This will work with most llama3 derivatives that use safetensors, including dolphin-2.9-llama3, NousResearch's Hermes 2 Pro, and NVIDIA's ChatQA.
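For context, the import flow this enables looks roughly like the following; the path and model name here are hypothetical, and this assumes the safetensors weights have already been downloaded locally:

```
# Modelfile
FROM /path/to/dolphin-2.9-llama3-8b
```

followed by `ollama create dolphin-llama3 -f Modelfile` and then `ollama run dolphin-llama3`.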

@mxyng force-pushed the pdevine/llama3 branch 8 times, most recently from 9b83ecb to 27588a7 on May 16, 2024
@mxyng changed the base branch from main to mxyng/cache-intermediate-layers on May 17, 2024
@mxyng force-pushed the mxyng/cache-intermediate-layers branch from 39efb30 to 8d807d7 on May 17, 2024
@mxyng force-pushed the mxyng/cache-intermediate-layers branch from 8d807d7 to 0aba2d5 on May 17, 2024
@mxyng changed the base branch from mxyng/cache-intermediate-layers to mxyng/fix-quantize on May 17, 2024
@mxyng marked this pull request as ready for review on May 18, 2024
mxyng (Contributor) commented on May 18, 2024

Updated the safetensors and pytorch conversion interfaces to take F32, F16, and BF16 inputs. This allows this change to convert llama3 derivatives such as NVIDIA's ChatQA and NousResearch's Hermes 2 Pro.
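The interface change itself isn't shown in this thread, but as a rough illustration of what accepting BF16 input involves (this sketch is mine, not the PR's code, and the function name is hypothetical): bfloat16 keeps the top 16 bits of an IEEE-754 float32, so widening it to F32 is a single shift.

```go
package main

import (
	"fmt"
	"math"
)

// bf16ToF32 widens a bfloat16 value (stored as a uint16) to float32.
// bfloat16 keeps the sign, exponent, and top 7 mantissa bits of a
// float32, so the conversion is a shift into the high half-word.
func bf16ToF32(b uint16) float32 {
	return math.Float32frombits(uint32(b) << 16)
}

func main() {
	// 0x3F80 is 1.0 in bfloat16 (the top bits of float32 1.0).
	fmt.Println(bf16ToF32(0x3F80)) // 1
}
```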

pdevine (Contributor, Author) left a comment:

LGTM.

```go
}
}

switch digest := fmt.Sprintf("%x", sha256sum.Sum(nil)); digest {
```
pdevine (Contributor, Author): This is pain.

(Member): hex.EncodeToString?

pdevine (Contributor, Author): I meant more that we have to look up these hex digests this way. I'm OK with the Sprintf().

(Contributor): I agree it's not ideal, but it's a dead-simple way to differentiate between pretokenizers.

@mxyng force-pushed the pdevine/llama3 branch 3 times, most recently from 0a7d4fb to 2bf8089 on May 20, 2024
Base automatically changed from mxyng/fix-quantize to main on May 20, 2024
(Contributor): This test is intended to run locally; it runs only when built with -tags slow and the model testdata exists.

"github.com/ollama/ollama/llm"
)

func convertFull(t *testing.T, p string) (llm.KV, llm.Tensors) {
(Contributor): There's room for improvement here. Ideally there would be a single function call to convert, or at least to set up for writing the binary. I missed calling GetTensors on the first pass, but the write succeeded without writing out any tensors.

jmorganca (Member) left a comment:
Overall looks good, @pdevine may have some comments

```go
	tensors int
	layers  int
}{
	{"Meta-Llama-3-8B-Instruct", "llama", 291, 35},
```
pdevine (Contributor, Author): How does the test data get populated?

(Contributor): I symlinked it into the directory since I have it in another directory, but you can clone or unpack it as well.

pdevine (Contributor, Author) left a comment:
LGTM.

@mxyng merged commit 96236b7 into main on May 21, 2024; 15 checks passed
@mxyng deleted the pdevine/llama3 branch on May 21, 2024