Model parallel v2 llama finetuning notebook fixes #4646

Open
wants to merge 3 commits into
base: main

Conversation

ArjunKrishnak

Description of changes:

  • Updating the model parallel v2 README to clarify usage of shared-scripts directory
  • Disabling fp8 by default for backward compatibility
  • Updating the Llama finetuning example with inline comments for the FSx args and an upgrade command for pytest

Testing done:
Ran smp-finetuning-llama-fsdp-tp.ipynb in a SageMaker notebook and confirmed that the SageMaker training job succeeded.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

  • I have read the CONTRIBUTING doc and adhered to the example notebook best practices
  • I have updated any necessary documentation, including READMEs
  • I have tested my notebook(s) and ensured it runs end-to-end
  • I have linted my notebook(s) and code using black-nb -l 100 {path}/{notebook-name}.ipynb

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: SageMakerNotebooksProd-USWEST2-amazon-sagemaker-examples-pr
  • Commit ID: f7d67ba
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: SageMakerNotebooksProd-USWEST2-sagemaker-examples-code-formatting
  • Commit ID: f7d67ba
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)


@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: SageMakerNotebooksProd-USWEST2-sagemaker-examples-link-check
  • Commit ID: f7d67ba
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)


@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: SageMakerNotebooksProd-USWEST2-sagemaker-examples-grammar
  • Commit ID: f7d67ba
  • Result: FAILED
  • Build Logs (available for 30 days)


@@ -27,7 +27,7 @@ def parse_args(): # pylint: disable=too-many-statements
 opt_grp.add_argument("--seed", type=int, default=12345)
 opt_grp.add_argument("--same_seed", type=int, default=0)
 opt_grp.add_argument("--bf16", default=1, type=int, help="automatic mixed precision training")
-opt_grp.add_argument("--fp8", default=1, type=int, help="fp8 mixed precision training")
+opt_grp.add_argument("--fp8", default=0, type=int, help="fp8 mixed precision training")
Contributor

Did you run into any issues with the default enabled? Normally, fp8 will only be enabled for models that support fp8 and should switch to bf16 for other cases.
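
For reference, the fallback described in this comment could look roughly like the sketch below. It is not the training script's actual logic: `FP8_CAPABLE_MODEL_TYPES`, the `--model_type` flag, its values, and `resolve_precision` are hypothetical names introduced only for illustration. Only the `--bf16` and `--fp8` arguments come from the diff above.

```python
import argparse

# Hypothetical allow-list of model types that can train in fp8 (an assumption,
# not taken from the PR or from the SMP library).
FP8_CAPABLE_MODEL_TYPES = {"model_with_fp8"}


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    opt_grp = parser.add_argument_group(title="optimization")
    # These two arguments mirror the diff, with fp8 disabled by default.
    opt_grp.add_argument("--bf16", default=1, type=int, help="automatic mixed precision training")
    opt_grp.add_argument("--fp8", default=0, type=int, help="fp8 mixed precision training")
    # Hypothetical flag, used here only to drive the fallback decision.
    parser.add_argument("--model_type", default="model_with_fp8", type=str)
    return parser.parse_args(argv)


def resolve_precision(args):
    """Pick fp8 only when requested and supported; otherwise fall back."""
    if args.fp8 > 0 and args.model_type in FP8_CAPABLE_MODEL_TYPES:
        return "fp8"
    if args.bf16 > 0:
        return "bf16"
    return "fp32"


if __name__ == "__main__":
    print(resolve_precision(parse_args()))
```

Under these assumptions, running with no flags resolves to bf16, `--fp8 1` opts in to fp8 for a supported model type, and `--fp8 1` with an unsupported model type still falls back to bf16.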


3 participants