
Modified export.py to add the ability to export fp16 weights. #345

Status: Open. rdentato wants to merge 4 commits into master.

Conversation

@rdentato (Contributor) commented Aug 23, 2023

Added a --version 3 option that works like legacy_export but writes the weights in fp16 format.

Warning: this still needs testing. I have too little memory (64 GB are needed for a 7B-parameter model, and I couldn't find a smaller one).
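For reference, a minimal sketch of what such an fp16 serializer could look like. The helper name and the raw-little-endian-bytes convention are assumptions modeled on export.py's serialize_fp32 style; the actual PR uses torch, but NumPy keeps the sketch self-contained:

```python
import io
import struct
import numpy as np

def serialize_fp16(file, arr):
    # Sketch only: flatten, cast to IEEE 754 half precision, and write
    # the raw little-endian bytes, mirroring a serialize_fp32-style helper.
    d = np.asarray(arr, dtype=np.float16).reshape(-1)
    # struct format code 'e' is a half-precision float (2 bytes each)
    b = struct.pack(f'<{len(d)}e', *d)
    file.write(b)

# usage: write three fp16-exact values (3 values * 2 bytes = 6 bytes)
buf = io.BytesIO()
serialize_fp16(buf, [1.0, -2.0, 0.5])
```

The values chosen in the usage example are exactly representable in fp16, so they round-trip without loss.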

@rdentato changed the title from "Added a versione (3) that is like legacy but exports to fp16." to "Modified export.py to export to fp16." on Aug 23, 2023
@kroggen (Contributor) commented Aug 23, 2023

I am against creating a new version. It should export to the existing versions, while letting the user choose the output format (fp32, fp16, int8...).

The current code still expects freq_cis to be present in the final file.

I also think you mean 64 GB instead of MB. BTW, this OOM problem is tracked in #341.
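The suggestion above (keep the existing version numbers, select precision separately) could look something like this. This is a hedged sketch: the --dtype flag and its choices are hypothetical, not part of the actual export.py:

```python
import argparse

# Hypothetical CLI shape: keep the existing file-format versions and add a
# separate --dtype switch to pick the weight precision within a version.
parser = argparse.ArgumentParser(prog='export.py')
parser.add_argument('filepath', help='the output filepath')
parser.add_argument('--version', type=int, default=0, help='file format version')
parser.add_argument('--dtype', choices=['fp32', 'fp16', 'int8'], default='fp32',
                    help='precision of the exported weights')

# usage: same version as before, different precision
args = parser.parse_args(['model.bin', '--dtype', 'fp16'])
```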

@rdentato (Contributor, Author) commented Aug 23, 2023

It will be the "legacy version" but with fp16 weights.
Because of how export.py works, you need to give it a version number:

usage: export.py [-h] [--version VERSION] (--checkpoint CHECKPOINT | --meta-llama META_LLAMA | --hf HF) filepath

Changing the command-line parameters of export.py could be the topic of another PR, for example having a "name" and a "version" for each model format that can be exported.

@rdentato (Contributor, Author) commented Aug 23, 2023

I just thought of another way, though I'm not sure I like it: use the extension of the output file to determine the fp32/fp16 format.
For example:

python export.py --hf mrm8488/llama-2-coder-7b llama-2-coder-7b.f16

could signal --version 0 (the default) but with weights in fp16 format. The code would be a little uglier, but it might be worth it if it avoids confusion...
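That extension-based convention might be sketched as follows. The helper name and the ".f16" suffix rule are assumptions taken from the example command above:

```python
import os

def dtype_from_extension(filepath):
    # Hypothetical rule: a '.f16' extension selects fp16 weights;
    # everything else falls back to fp32, the current default.
    ext = os.path.splitext(filepath)[1]
    return 'fp16' if ext == '.f16' else 'fp32'

# usage
fmt = dtype_from_extension('llama-2-coder-7b.f16')
```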

@rdentato changed the title from "Modified export.py to export to fp16." to "Modified export.py to add the ability to export fp16 weights." on Aug 23, 2023
@kroggen (Contributor) commented Aug 23, 2023

Why is your serialize_half not named serialize_fp16, to be consistent with the existing serialize_fp32?

Why is your function not placed together with the other two at the beginning of the file?

@kroggen (Contributor) commented Aug 23, 2023

My attempt is at #347

@rdentato (Contributor, Author) commented Aug 24, 2023

About half: it's to stay consistent with torch.half in the code. I liked it better (I had fp16 initially).

As for the position, I don't like to disrupt existing code (at least not for something so minimal), so I just added it at the end of the block. But in this case I agree I should have put it earlier, together with the other two.
