
Convert directly from llama3 #4268

Merged: 9 commits from pdevine/llama3 into main, May 21, 2024
Conversation

pdevine (Contributor) commented on May 8, 2024

This change allows you to convert directly from a llama3-derived safetensors model into Ollama.

It is currently missing:

  • PyTorch support: conversion almost works, but the embeddings layer size is off by the eos/bos tokens

This will work with most llama3 derivatives that use safetensors, including dolphin-2.9-llama3, NousResearch's Hermes 2 Pro, and NVIDIA's ChatQA.
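For context, the import flow this enables looks roughly like the following; the path and model name here are hypothetical, and this assumes the safetensors weights have already been downloaded locally:

```
# Modelfile
FROM /path/to/dolphin-2.9-llama3-8b
```

followed by `ollama create dolphin-llama3 -f Modelfile` and then `ollama run dolphin-llama3`.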

@mxyng force-pushed the pdevine/llama3 branch 8 times, most recently from 9b83ecb to 27588a7 on May 16, 2024
@mxyng changed the base branch from main to mxyng/cache-intermediate-layers on May 17, 2024
@mxyng force-pushed the mxyng/cache-intermediate-layers branch from 39efb30 to 8d807d7 on May 17, 2024
@mxyng force-pushed the mxyng/cache-intermediate-layers branch from 8d807d7 to 0aba2d5 on May 17, 2024
@mxyng changed the base branch from mxyng/cache-intermediate-layers to mxyng/fix-quantize on May 17, 2024
@mxyng marked this pull request as ready for review on May 18, 2024
mxyng (Contributor) commented on May 18, 2024

Updated the safetensors and pytorch conversion interfaces to take F32, F16, and BF16 inputs. This allows this change to convert llama3 derivatives such as NVIDIA's ChatQA and NousResearch's Hermes 2 Pro.
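The interface change itself isn't shown in this thread, but as a rough illustration of what accepting BF16 input involves (this sketch is mine, not the PR's code, and the function name is hypothetical): bfloat16 keeps the top 16 bits of an IEEE-754 float32, so widening it to F32 is a single shift.

```go
package main

import (
	"fmt"
	"math"
)

// bf16ToF32 widens a bfloat16 value (stored as a uint16) to float32.
// bfloat16 keeps the sign, exponent, and top 7 mantissa bits of a
// float32, so the conversion is a shift into the high half-word.
func bf16ToF32(b uint16) float32 {
	return math.Float32frombits(uint32(b) << 16)
}

func main() {
	// 0x3F80 is 1.0 in bfloat16 (the top bits of float32 1.0).
	fmt.Println(bf16ToF32(0x3F80)) // 1
}
```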

pdevine (Contributor, Author) left a comment:

LGTM.

```go
}
}

switch digest := fmt.Sprintf("%x", sha256sum.Sum(nil)); digest {
```
pdevine (Contributor, Author): This is pain.

(Member): hex.EncodeToString?

pdevine (Contributor, Author): I meant more that we have to look up these hex digests this way. I'm OK with the Sprintf().

(Contributor): I agree it's not ideal, but it's a dead-simple way to differentiate between pretokenizers.

@mxyng force-pushed the pdevine/llama3 branch 3 times, most recently from 0a7d4fb to 2bf8089 on May 20, 2024
Base automatically changed from mxyng/fix-quantize to main on May 20, 2024
(Contributor): This test is intended to run locally; it runs only when built with -tags slow and the model testdata exists.

"github.com/ollama/ollama/llm"
)

func convertFull(t *testing.T, p string) (llm.KV, llm.Tensors) {
(Contributor): There's room for improvement here. Ideally there would be a single function call to convert, or at least to set up for writing the binary. I missed calling GetTensors on the first pass, but the write succeeded without writing out any tensors.

jmorganca (Member) left a comment:
Overall looks good, @pdevine may have some comments

```go
	tensors int
	layers  int
}{
	{"Meta-Llama-3-8B-Instruct", "llama", 291, 35},
```
pdevine (Contributor, Author): How does the test data get populated?

(Contributor): I symlinked it into the directory since I have it in another directory, but you can clone or unpack it as well.

pdevine (Contributor, Author) left a comment:
LGTM.

@mxyng merged commit 96236b7 into main on May 21, 2024; 15 checks passed
@mxyng deleted the pdevine/llama3 branch on May 21, 2024