
Adding Phi-3 model #1580

Open · monk1337 wants to merge 4 commits into main
Conversation

@monk1337 (Contributor) commented Apr 30, 2024

Adding Phi-3 model

fsdp_transformer_layer_cls_to_wrap: Phi3DecoderLayer

Comment on lines 68 to 81
fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_limit_all_gathers: true
  fsdp_sync_module_states: true
  fsdp_offload_params: true
  fsdp_use_orig_params: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: Phi3DecoderLayer
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sharding_strategy: FULL_SHARD
resize_token_embeddings_to_32x: true
Collaborator

Generally we prefer not to have FSDP- or DeepSpeed-specific configs (which require multi-GPU) for smaller models that should fit on single-GPU systems.
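For comparison, here is a minimal sketch of what a single-GPU variant of this section could look like, dropping the FSDP keys in favor of QLoRA. The keys below are standard axolotl options rather than anything from this PR, and the values are purely illustrative:

# Illustrative single-GPU alternative: no fsdp / fsdp_config keys at all.
load_in_4bit: true             # quantize base weights so the model fits on one card
adapter: qlora                 # train low-rank adapters instead of full parameters
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true       # apply LoRA to all linear layers, incl. the Phi-3 projections
gradient_checkpointing: true   # trade recompute for activation memory
micro_batch_size: 1
gradient_accumulation_steps: 8
resize_token_embeddings_to_32x: true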

@maziyarpanahi (Contributor)

Will you also add support for the Phi-3 chat template for fine-tuning? As a reference: https://github.com/unslothai/unsloth/blob/4211cc01409e3ced4f7abebaf68e244193b46e2c/unsloth/chat_templates.py#L269C3-L269C8
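For context, the prompt format that template encodes looks roughly like the following (reconstructed from the linked unsloth file and the Phi-3 model card, not from this PR; {system}, {prompt}, and {response} are placeholders):

<|system|>
{system}<|end|>
<|user|>
{prompt}<|end|>
<|assistant|>
{response}<|end|>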

@winglian (Collaborator)

will merge this after #1582

@vinamrabenara

Was wondering when this would be merged?

@monk1337 (Contributor, Author)

> Was wondering when this would be merged?

Too busy to work on this :P

@maziyarpanahi (Contributor)

@winglian I am having some issues with phi-3-small and phi-3-medium models.

  • phi-3-medium 4k instruct constantly fails with: RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
  • phi-3-small 8k instruct fails with:
ValueError: For now, we do not support unknown special tokens
In the future, if there is a need for this, we can add special tokens to the tokenizer
starting from rank 100261 - 100263 and then 100266 - 100275.
And finally, we can re-construct the enc object back

Is it correct to assume this PR won't solve these?
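As an aside on the first error: "Cannot re-initialize CUDA in forked subprocess" usually means dataset-preprocessing workers were forked after CUDA had already been initialized. A common mitigation, offered here as an assumption rather than anything this PR changes, is to keep preprocessing in the main process:

# Assumed workaround, not part of this PR: avoid forking workers after CUDA init.
dataset_processes: 1   # run the tokenization/map step in the main process only

Alternatively, running tokenization up front with python -m axolotl.cli.preprocess config.yml does the fork-heavy work before any CUDA context exists.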
