export model to fp16 #347

Open · kroggen wants to merge 3 commits into master
Conversation

kroggen (Contributor) commented Aug 23, 2023

No description provided.

rdentato (Contributor):

Ok. I see you went for a much deeper change.

Did you manage to test it?

kroggen (Contributor, Author) commented Aug 23, 2023

It is not tested. I am trying to implement the loading of the model (version 0 and maybe 1).

karpathy (Owner):

Question: what is the benefit of fp16?

  • As the Llama 2 models were trained in bf16 I find fp16 conversion sketchy. For newly trained models this is less of a concern
  • The file sizes are ofc ~2X smaller
  • The code is a little bit more bloated

Am I missing some considerations?
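For context on why the bf16-to-fp16 cast can be lossy: bf16 keeps the fp32 exponent range with fewer mantissa bits, while fp16 has finer mantissa but a maximum of about 65504 (and a subnormal floor around 6e-8). A minimal sketch, not part of the PR, showing the overflow side of that trade-off:

import torch

# bf16 shares fp32's exponent range; fp16 saturates at ~65504,
# so anything beyond that overflows to inf when cast down
x = torch.tensor([1e5, 3.0, -2.5e4], dtype=torch.bfloat16)
print(x.to(torch.float16))  # first element becomes inf; the others survive

For typical weight magnitudes this rarely triggers, but very small values can also flush to zero in fp16, which is the other half of the "sketchy" concern.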

    b = struct.pack(f'{len(d)}b', *d)

def serialize(file, tensor, type):
    """ writes one tensor to file that is open in wb mode """
    if type == 'fp32':
karpathy (Owner):

feels simplifiable
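One possible simplification, sketched under the assumption that the type string maps cleanly onto a numpy dtype (bf16 would still need special handling, since numpy has no bfloat16); the DTYPES table is illustrative, not the PR's code:

import numpy as np
import torch

# hypothetical lookup table: one code path instead of per-type struct.pack branches
DTYPES = {'fp32': np.float32, 'fp16': np.float16}

def serialize(file, tensor, type='fp32'):
    """ writes one tensor to file that is open in wb mode """
    d = tensor.detach().cpu().view(-1).to(torch.float32).numpy()
    file.write(d.astype(DTYPES[type]).tobytes())  # native (little-endian) byte order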

@@ -129,7 +133,7 @@ def legacy_export(model, filepath):
 # -----------------------------------------------------------------------------
 # new version

-def version1_export(model, filepath):
+def version1_export(model, filepath, type):
karpathy (Owner):

we'd have to serialize the type to the header too
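A sketch of what serializing the type into the header could look like; the magic/version fields follow the new-format header, but the field names, function name, and DTYPE_CODES numbering are assumptions for illustration:

import struct

DTYPE_CODES = {'fp32': 0, 'fp16': 1, 'bf16': 2}  # hypothetical numbering

def write_header(file, version, p, shared_classifier, dtype='fp32'):
    # illustrative layout: magic, version, config ints, flags, then the dtype code;
    # a real header would also keep its fixed-size padding consistent
    file.write(struct.pack('I', 0x616b3432))   # magic ("ak42")
    file.write(struct.pack('i', version))
    file.write(struct.pack('7i', p.dim, p.hidden_dim, p.n_layers, p.n_heads,
                           p.n_kv_heads, p.vocab_size, p.max_seq_len))
    file.write(struct.pack('B', int(shared_classifier)))
    file.write(struct.pack('i', DTYPE_CODES[dtype]))  # new: weight data type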

rdentato (Contributor) commented Aug 25, 2023

The point is that they can be loaded directly onto the GPU. Not needing on-the-fly conversion (and having a smaller file to load) significantly reduces the load time, which for my Tesla T4 is around 2 min for the llama2_7b models.

Also, I tested llama2.c on an ARM machine using its native support for fp16 and it works like a charm (and ARM CPUs are cheaper on AWS).
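To illustrate the "no on-the-fly conversion" point, here is a minimal sketch; the function name and the offset/shape arguments are hypothetical, not llama2.c's loader:

import numpy as np
import torch

def load_fp16_weights(path, offset, shape):
    # map the fp16 bytes straight from disk and move them to the GPU
    # without ever materializing a float32 copy
    w = np.memmap(path, dtype=np.float16, mode='r', offset=offset, shape=shape)
    return torch.from_numpy(np.ascontiguousarray(w)).cuda()  # stays float16 on device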

@@ -450,6 +454,7 @@ def torchscript_export(model, filepath, zero_params=False, gzip_output=False):
 parser = argparse.ArgumentParser()
 parser.add_argument("filepath", type=str, help="the output filepath")
 parser.add_argument("--version", default=0, type=int, help="the version to export with")
+parser.add_argument("--type", default='fp32', type=str, help="the data type to export to (fp32, fp16, bfloat16)")
karpathy (Owner):

I'm not 100% decided if type should be a separate variable that is written into the header, or if it should just be absorbed into version, e.g.:

  • version 0: original, float32
  • version 1: original, float16
  • version 2: new header, int8

etc., and just go that way.
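The "absorb type into version" option would look roughly like this; the table contents and numbering are hypothetical, following the comment above:

# hypothetical mapping: each version number fixes both the header layout
# and the weight data type
VERSION_TABLE = {
    0: ('original header', 'float32'),
    1: ('original header', 'float16'),
    2: ('new header', 'int8'),
}

def resolve_version(version):
    layout, dtype = VERSION_TABLE[version]
    return layout, dtype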

Contributor:

There's another PR that just uses "--version" for this.

kroggen (Contributor, Author):

Don't forget support for bf16 (and maybe others to come).

If a version number is used for each data type, it is both unintuitive and will end up with a lot of "versions".
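A quick way to see the proliferation concern: with separate fields, file layouts and data types combine additively rather than multiplicatively (the lists below are illustrative):

layouts = ['legacy', 'new-header']          # file-layout revisions
dtypes  = ['fp32', 'fp16', 'bf16']          # plus int8, fp8, ... later

print(len(layouts) * len(dtypes))  # 6 version numbers if type is folded into version
print(len(layouts) + len(dtypes))  # 5 codes if version and type stay separate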
