Convert directly from llama3 #4268
Conversation
Force-pushed 9b83ecb to 27588a7
Force-pushed 39efb30 to 8d807d7
Force-pushed 8d807d7 to 0aba2d5
Updated the safetensors and pytorch conversion interfaces to take F32, F16, and BF16 inputs. This allows the change to convert llama3 derivatives such as NVIDIA's ChatQA and NousResearch's Hermes 2 Pro.
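For context, a minimal sketch of what accepting BF16 input involves. This is not the PR's actual code: the package, tensorToFloat32, and bf16ToF32 names are illustrative, the dtype strings follow the safetensors header convention, and F16 is omitted for brevity.

```go
package convert

import (
	"encoding/binary"
	"fmt"
	"math"
)

// bf16ToF32 widens a bfloat16 value (stored as a uint16) to float32.
// bfloat16 keeps the upper 16 bits of an IEEE-754 float32, so widening
// is just a shift into the high half of the 32-bit word.
func bf16ToF32(b uint16) float32 {
	return math.Float32frombits(uint32(b) << 16)
}

// tensorToFloat32 converts one raw little-endian tensor buffer to
// []float32 based on the dtype string from the safetensors header.
// A real converter would also handle F16 via a half-precision decode.
func tensorToFloat32(dtype string, raw []byte) ([]float32, error) {
	switch dtype {
	case "F32":
		out := make([]float32, len(raw)/4)
		for i := range out {
			out[i] = math.Float32frombits(binary.LittleEndian.Uint32(raw[i*4:]))
		}
		return out, nil
	case "BF16":
		out := make([]float32, len(raw)/2)
		for i := range out {
			out[i] = bf16ToF32(binary.LittleEndian.Uint16(raw[i*2:]))
		}
		return out, nil
	default:
		return nil, fmt.Errorf("unsupported dtype %q", dtype)
	}
}
```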
LGTM.
	}
}

switch digest := fmt.Sprintf("%x", sha256sum.Sum(nil)); digest {
This is pain.
hex.EncodeToString?
I meant more that we have to look up these hex digests this way. I'm OK w/ the Sprintf()
I agree it's not ideal, but it's a dead simple way to differentiate between different pretokenizers.
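For illustration, a sketch of the digest-based lookup being discussed. The digest value below is a placeholder, not a real pretokenizer hash; hex.EncodeToString(sum[:]) and fmt.Sprintf("%x", sum) yield the same lowercase hex string, so the switch works identically with either.

```go
package convert

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// pretokenizerID hashes the serialized pretokenizer configuration and
// maps the digest to a known identifier.
func pretokenizerID(config []byte) (string, error) {
	sum := sha256.Sum256(config)
	// equivalent to fmt.Sprintf("%x", sum)
	digest := hex.EncodeToString(sum[:])

	switch digest {
	case "0000000000000000000000000000000000000000000000000000000000000000": // placeholder digest
		return "llama-bpe", nil
	default:
		return "", fmt.Errorf("unknown pretokenizer digest %s", digest)
	}
}
```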
Force-pushed 0a7d4fb to 2bf8089
This test is intended to run locally; it runs only if -tags slow is set and the model testdata exists.
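As a sketch of that pattern (file layout and model name are illustrative; the actual test in this PR may be structured differently), a build tag keeps the test out of normal runs and a skip covers missing testdata:

```go
//go:build slow

package convert

import (
	"os"
	"path/filepath"
	"testing"
)

// Compiled only with `go test -tags slow`; skips itself when the model
// weights have not been placed under testdata locally.
func TestConvertFullLocal(t *testing.T) {
	p := filepath.Join("testdata", "Meta-Llama-3-8B-Instruct")
	if _, err := os.Stat(p); os.IsNotExist(err) {
		t.Skipf("%s not present; skipping", p)
	}
	// ... run the conversion against p and assert on the result ...
}
```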
"github.com/ollama/ollama/llm" | ||
) | ||
|
||
func convertFull(t *testing.T, p string) (llm.KV, llm.Tensors) { |
There's room for improvement here. Ideally there would be a single function call to convert, or at least to set up for writing the binary. I missed calling GetTensors on the first pass, but the write still succeeded without writing out any tensors.
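One way to make that failure mode loud, sketched here as a standalone check rather than the PR's actual helper: read back just the fixed GGUF header (assuming the GGUF v2/v3 little-endian layout) and fail the test when the tensor count is zero.

```go
package convert

import (
	"encoding/binary"
	"fmt"
	"os"
)

// ggufTensorCount reads only the fixed-size GGUF header (magic, version,
// tensor count, metadata kv count) and returns the tensor count, so a
// test can catch a write that succeeded without emitting any tensors.
func ggufTensorCount(path string) (uint64, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	var hdr struct {
		Magic       [4]byte
		Version     uint32
		TensorCount uint64
		MetaKVCount uint64
	}
	if err := binary.Read(f, binary.LittleEndian, &hdr); err != nil {
		return 0, err
	}
	if string(hdr.Magic[:]) != "GGUF" {
		return 0, fmt.Errorf("%s is not a GGUF file", path)
	}
	return hdr.TensorCount, nil
}
```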
Overall looks good; @pdevine may have some comments.
	tensors int
	layers  int
}{
	{"Meta-Llama-3-8B-Instruct", "llama", 291, 35},
how does the test data get populated?
I symlinked it into the directory since I already have it in another directory, but you can clone or unpack it as well.
LGTM.
This change allows you to convert directly from a llama3-derived safetensors model into Ollama.
It is currently missing:
This will work with most llama3 derivatives if they are using safetensors, including dolphin-2.9-llama3, NousResearch's Hermes 2 Pro, and NVIDIA's ChatQA.
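As a rough usage sketch (the directory and model names below are placeholders, not part of this PR): point a Modelfile's FROM at the local safetensors directory and build the model with ollama create.

```
# Modelfile — FROM points at a local llama3-derivative safetensors directory
FROM ./Meta-Llama-3-8B-Instruct

# then, from the same directory:
#   ollama create my-llama3 -f Modelfile
```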