FPO-201: Generate new models #64

Merged
willfish merged 93 commits into main from FPO-201-create-instance on May 17, 2024

@willfish commented May 2, 2024

Jira link

FPO-201

What?

I have added/removed/altered:

Added script to:

  • Check if we need to train a new model
  • Bring up training instance from our AMI
  • Generate a model on this instance from our current ref
  • Persist the model contents to our S3 bucket
  • Rotate any lingering running instances at midnight
  • Support deploying new model
  • Completely restructure and consolidate how we configure model generation
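
As a rough illustration of how the steps listed above might fit together, here is a minimal Python sketch. It is not the script added in this PR: every name in it (`TrainingSteps`, `run_training`, and the injected callables) is hypothetical, and the AWS calls are deliberately left as injected functions because the real implementation is not shown in this conversation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TrainingSteps:
    """Hypothetical bundle of the side-effecting steps, injected so the flow is testable."""
    model_exists: Callable[[str], bool]     # assumed: is a model for this ref already in S3?
    launch_instance: Callable[[], str]      # assumed: start an instance from the AMI, return its id
    train: Callable[[str, str], str]        # assumed: train on (instance_id, ref), return model path
    persist: Callable[[str, str], None]     # assumed: upload (model_path, ref) to the S3 bucket


def run_training(ref: str, steps: TrainingSteps) -> bool:
    """Train and persist a model for `ref` unless one already exists.

    Returns True if a new model was trained, False if training was skipped.
    """
    # Step 1: check if we need to train a new model.
    if steps.model_exists(ref):
        return False
    # Step 2: bring up a training instance.
    instance_id = steps.launch_instance()
    # Step 3: generate the model on that instance from the current ref.
    model_path = steps.train(instance_id, ref)
    # Step 4: persist the model contents.
    steps.persist(model_path, ref)
    return True
```

Passing the steps in as callables keeps the ordering logic separate from the AWS-specific code, so the flow can be exercised with fakes before it touches real instances.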

Why?

I am doing this because:

  • This is a prerequisite for this part of the training pipeline

AC

  • Pipeline deploys new models that can be consumed by our deployment process

@willfish marked this pull request as draft May 2, 2024 15:00
@willfish changed the title from "FPO-201: Adds createec2 script" to "FPO-201: Generate new models" May 2, 2024
when:
  equal: [ true, << pipeline.parameters.train_model >> ]

nightly-clear-model-instances:
Contributor

Nice 👍
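
For context on what a nightly job such as `nightly-clear-model-instances` might need to decide, here is a small Python sketch of picking out lingering training instances. It is an assumption-laden illustration, not the code from this PR: the function name, the `Purpose`/`model-training` tag, and the sample data are all hypothetical. The dictionaries only mimic the instance shape returned by EC2's describe-instances call (`InstanceId`, `State.Name`, `Tags`).

```python
from typing import Iterable, List

# Hypothetical tag marking training instances; the real tag is not shown in this PR.
TRAINING_TAG_KEY = "Purpose"
TRAINING_TAG_VALUE = "model-training"


def lingering_instance_ids(instances: Iterable[dict]) -> List[str]:
    """Return ids of instances that are still running and carry the training tag."""
    ids = []
    for instance in instances:
        is_running = instance.get("State", {}).get("Name") == "running"
        tags = {tag["Key"]: tag["Value"] for tag in instance.get("Tags", [])}
        if is_running and tags.get(TRAINING_TAG_KEY) == TRAINING_TAG_VALUE:
            ids.append(instance["InstanceId"])
    return ids


# Made-up sample data for illustration only.
sample_instances = [
    {"InstanceId": "i-aaa", "State": {"Name": "running"},
     "Tags": [{"Key": "Purpose", "Value": "model-training"}]},
    {"InstanceId": "i-bbb", "State": {"Name": "terminated"},
     "Tags": [{"Key": "Purpose", "Value": "model-training"}]},
    {"InstanceId": "i-ccc", "State": {"Name": "running"},
     "Tags": [{"Key": "Purpose", "Value": "web"}]},
]
to_terminate = lingering_instance_ids(sample_instances)
```

Only `i-aaa` is both running and tagged as a training instance, so it is the only id selected.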

]

[project.optional-dependencies]
dev = [
  "aws-lambda-powertools",
Member Author

This ends up being a development-only dependency, since it's a requirements_lambda.txt requirement and is only needed to run the lambda's tests.

- batch_size = 1000
- embedding_batch_size = 100
+ model_batch_size = 1000
+ embedding_batch_size = 1000
Member Author

A value of 2k causes the GPU to run out of memory (it loads about 750 MB for each batch).
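
To illustrate why the batch size bounds memory use, here is a minimal Python sketch of splitting inputs into fixed-size batches so that only one batch is handled at a time. The `batched` helper and the sample inputs are assumptions for illustration, not code from this PR; only the `embedding_batch_size` name and its value of 1000 come from the snippet quoted above.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


embedding_batch_size = 1000  # value quoted in the snippet above

# Made-up inputs for illustration only.
texts = [f"description {i}" for i in range(2500)]

# Each batch would be embedded on its own, so peak memory scales with
# embedding_batch_size rather than with len(texts).
batch_lengths = [len(batch) for batch in batched(texts, embedding_batch_size)]
# 2500 items split into batches of 1000, 1000 and 500
```

Because each batch is processed separately, lowering the batch size trades throughput for a smaller peak memory footprint.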

@willfish marked this pull request as ready for review May 17, 2024 09:52
Contributor

@alexdesi left a comment

🚀 🚀 🚀 🚀
🚢 🚢 🚢 🚢

@willfish merged commit c555ac0 into main May 17, 2024
9 checks passed
@willfish deleted the FPO-201-create-instance branch May 17, 2024 13:12
3 participants