
Update Llama config to use Llama block and RoPE lower precision #358

Open · wants to merge 7 commits into main

Conversation

@2015aroras (Contributor)

Updating the Llama config to use the Llama block and lower-precision RoPE, to match the behavior of bf16-autocast Llama more closely.
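
For context, a minimal sketch of what such a config change might look like, assuming the OLMo YAML config exposes `block_type` and `rope_full_precision` under the `model` section (these field names and values are my assumption for illustration, not copied from this PR's diff):

```yaml
# Sketch only -- assumed field names, not taken from this PR's changed files.
model:
  block_type: llama          # use the Llama block implementation instead of the default block
  rope_full_precision: false # let RoPE run in the autocast (bf16) precision rather than forcing fp32
```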

@dirkgr (Member) commented Nov 2, 2023

This is good, but I want to run that separately first to see if it makes a difference. We'll have to wait a bit to get cluster time.

@2015aroras (Contributor, Author)

> This is good, but I want to run that separately first to see if it makes a difference. We'll have to wait a bit to get cluster time.

Do you want a separate config then, or should I just keep this PR on hold until you're ready to take it?

@dirkgr (Member) commented Nov 2, 2023

Let's keep this on hold for a bit, but if it gets too long, we'll merge it as a separate config.

@dirkgr (Member) commented Nov 2, 2023

I just ran this on Beaker, and it said this:

RuntimeError: When using the full_megatron init, every module must have a type.

coming from /home/dirkg/LLM/olmo/model.py:826.

Can you find somewhere to run this to catch errors like this, even if it's with a tiny batch and sequence length and just for a few batches?

@2015aroras (Contributor, Author)

> I just ran this on Beaker, and it said this:
>
> RuntimeError: When using the full_megatron init, every module must have a type.
>
> coming from /home/dirkg/LLM/olmo/model.py:826.
>
> Can you find somewhere to run this to catch errors like this, even if it's with a tiny batch and sequence length and just for a few batches?

That runtime error is now fixed in a hackish local setup I have. I'll try running it on Beaker briefly to see if anything else shows up.

@2015aroras (Contributor, Author) commented Nov 3, 2023

This seems to run fine on Beaker (with a reduced model size to compensate for the lack of GPUs).
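
As a rough illustration, a smoke-test run of this sort might override the model size and run length to something tiny; the keys and values below are assumptions based on a typical OLMo config layout, not the settings actually used on Beaker:

```yaml
# Hypothetical tiny smoke-test overrides -- assumed key names and values.
model:
  d_model: 256
  n_heads: 8
  n_layers: 4
  max_sequence_length: 256
global_train_batch_size: 8
max_duration: 10   # stop after a handful of batches
```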
