
Adding Phi-3 model #1580

Open · monk1337 wants to merge 4 commits into main
Conversation

@monk1337 (Contributor) commented Apr 30, 2024

Adding Phi-3 model

fsdp_transformer_layer_cls_to_wrap: Phi3DecoderLayer

Comment on lines 68 to 81
fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_limit_all_gathers: true
  fsdp_sync_module_states: true
  fsdp_offload_params: true
  fsdp_use_orig_params: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: Phi3DecoderLayer
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sharding_strategy: FULL_SHARD
resize_token_embeddings_to_32x: true
Collaborator

Generally we prefer not to have FSDP- or DeepSpeed-specific configs (which require multi-GPU) for smaller models that should fit on single-GPU systems.
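For comparison, here is a minimal sketch of what a single-GPU variant of this section could look like, dropping the FSDP keys in favor of QLoRA. The keys below are standard axolotl options rather than anything from this PR, and the values are purely illustrative:

# Illustrative single-GPU alternative: no fsdp / fsdp_config keys at all.
load_in_4bit: true             # quantize base weights so the model fits on one card
adapter: qlora                 # train low-rank adapters instead of full parameters
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true       # apply LoRA to all linear layers, incl. the Phi-3 projections
gradient_checkpointing: true   # trade recompute for activation memory
micro_batch_size: 1
gradient_accumulation_steps: 8
resize_token_embeddings_to_32x: true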

@maziyarpanahi (Contributor)

Will you also add support for the Phi-3 chat template for fine-tuning? As a reference: https://github.com/unslothai/unsloth/blob/4211cc01409e3ced4f7abebaf68e244193b46e2c/unsloth/chat_templates.py#L269C3-L269C8
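For context, the prompt format that template encodes looks roughly like the following (reconstructed from the linked unsloth file and the Phi-3 model card, not from this PR; {system}, {prompt}, and {response} are placeholders):

<|system|>
{system}<|end|>
<|user|>
{prompt}<|end|>
<|assistant|>
{response}<|end|>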

@winglian (Collaborator)

will merge this after #1582

@vinamrabenara

Was wondering when this would be merged?

@monk1337 (Contributor, Author)

> Was wondering when this would be merged?

Too busy to work on this :P

@maziyarpanahi (Contributor)

@winglian I am having some issues with phi-3-small and phi-3-medium models.

  • phi-3-medium 4k instruct constantly fails with: RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
  • phi-3-small 8k instruct fails with:
ValueError: For now, we do not support unknown special tokens
In the future, if there is a need for this, we can add special tokens to the tokenizer
starting from rank 100261 - 100263 and then 100266 - 100275.
And finally, we can re-construct the enc object back

Is it correct to assume this PR won't solve these?
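As an aside on the first error: "Cannot re-initialize CUDA in forked subprocess" usually means dataset-preprocessing workers were forked after CUDA had already been initialized. A common mitigation, offered here as an assumption rather than anything this PR changes, is to keep preprocessing in the main process:

# Assumed workaround, not part of this PR: avoid forking workers after CUDA init.
dataset_processes: 1   # run the tokenization/map step in the main process only

Alternatively, running tokenization up front with python -m axolotl.cli.preprocess config.yml does the fork-heavy work before any CUDA context exists.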
