FPO-201: Generate new models #64

willfish · 2024-05-02T14:59:15Z

Jira link

FPO-201

What?

I have added/removed/altered:

Added script to:

Check if we need to train a new model
Bring up training instance from our AMI
Generate a model on this instance from our current ref
Persist the model contents to our S3 bucket
Rotate any lingering running instances at midnight
Support deploying new model
Completely restructure and consolidate how we configure model generation
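
The "check if we need to train a new model" step can be pictured roughly as follows. This is a minimal sketch, not the script in this PR: it assumes models are stored under a per-version key prefix, and the names `model_prefix`, `needs_model`, the `models/` root and the example keys are all hypothetical.

```python
from typing import Iterable


def model_prefix(version: str, root: str = "models") -> str:
    """Build the key prefix for one model version (this layout is an assumption)."""
    return f"{root}/{version}/"


def needs_model(version: str, existing_keys: Iterable[str]) -> bool:
    """Return True when no stored object sits under this version's prefix."""
    prefix = model_prefix(version)
    return not any(key.startswith(prefix) for key in existing_keys)


# Keys as a bucket listing might return them (illustrative values only).
keys = ["models/0.0.1/model.pt", "models/0.0.1/search-config.toml"]

print(needs_model("0.0.1", keys))  # a model already exists for this version
print(needs_model("0.0.2", keys))  # a bumped version has nothing stored yet
```

Keeping the decision as a pure function over a list of keys means it can be tested without touching S3; the real listing call would sit in a thin wrapper around it.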

Also:

Removed the Terraform scripts from this project (these are replaced by FPO-201: Adds dedicated VPC for extra-region us-east-1 trade-tariff-platform-aws-terraform#240)

Why?

I am doing this because:

This is a prerequisite for this part of the training pipeline

AC

Pipeline deploys new models that can be consumed by our deployment process

shjohnson · 2024-05-08T13:36:26Z

.circleci/config.yml

-    when:
-      equal: [ true, << pipeline.parameters.train_model >>]
-
+  nightly-clear-model-instances:


This should include more up-to-date CUDA drivers https://aws.amazon.com/releasenotes/aws-deep-learning-base-ami-amazon-linux-2/
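
For the `nightly-clear-model-instances` job in the hunk above, the selection logic might look something like the sketch below. It assumes a lingering instance is one that is still running after some maximum age; the `Instance` type, the `lingering` function and the 12-hour threshold are hypothetical and are not taken from this PR.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class Instance:
    """The few instance fields this sketch needs (hypothetical type)."""
    instance_id: str
    state: str
    launch_time: datetime


def lingering(
    instances: Iterable[Instance],
    now: datetime,
    max_age: timedelta = timedelta(hours=12),  # assumed threshold
) -> List[str]:
    """Return ids of instances that are still running and older than max_age."""
    cutoff = now - max_age
    return [
        instance.instance_id
        for instance in instances
        if instance.state == "running" and instance.launch_time < cutoff
    ]


now = datetime(2024, 5, 9, 0, 0, tzinfo=timezone.utc)
fleet = [
    Instance("i-old", "running", now - timedelta(hours=20)),
    Instance("i-new", "running", now - timedelta(hours=1)),
    Instance("i-gone", "terminated", now - timedelta(hours=30)),
]

print(lingering(fleet, now))  # ['i-old']
```

Only the returned ids would then be passed to whatever call terminates the instances, so an instance that has just started training is left alone.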

willfish · 2024-05-17T08:14:12Z

pyproject.toml

 ]

 [project.optional-dependencies]
 dev = [
+    "aws-lambda-powertools",


This ends up being a development-only dependency, since it's a requirements_lambda.txt requirement and is only needed to run the lambda's tests

willfish · 2024-05-17T08:14:49Z

search-config.toml

-batch_size = 1000
-embedding_batch_size = 100
+model_batch_size = 1000
+embedding_batch_size = 1000


A batch size of 2k causes the GPU to run out of memory (it loads about 750 MB for each batch)
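
To illustrate what `embedding_batch_size` controls, here is a minimal sketch of splitting inputs into fixed-size batches so that only one batch is held on the GPU at a time. The `batched` helper and the sample data are hypothetical; the actual embedding code in this project may batch differently.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


EMBEDDING_BATCH_SIZE = 1000  # value from search-config.toml in this diff

# Placeholder inputs standing in for the texts to embed.
descriptions = [f"description {n}" for n in range(2500)]
sizes = [len(batch) for batch in batched(descriptions, EMBEDDING_BATCH_SIZE)]

print(sizes)  # [1000, 1000, 500]
```

Because peak memory follows the largest single batch rather than the whole dataset, lowering the batch size is the usual way to trade throughput for headroom.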

alexdesi

🚀 🚀 🚀 🚀
🚢 🚢 🚢 🚢

willfish added 2 commits May 2, 2024 15:55

FPO-201: Adds createec2 script

ecaa112

FPO-201: Migrate terraform trade-tariff/trade-tariff-platform-aws-ter…

8deeed5

…raform#240

willfish marked this pull request as draft May 2, 2024 15:00

willfish changed the title ~~FPO-201: Adds createec2 script~~ FPO-201: Generate new models May 2, 2024

willfish added 2 commits May 2, 2024 17:06

FPO-201: Code review

e0f868d

FPO-201: Consolidate configuration under file

a2c8dbc

willfish force-pushed the FPO-201-create-instance branch from 18b0843 to a2c8dbc Compare May 7, 2024 10:35

FPO-201: Checkout HEAD of current branch

b50ae1c

willfish force-pushed the FPO-201-create-instance branch from 4c16310 to b50ae1c Compare May 7, 2024 12:43

willfish added 2 commits May 7, 2024 13:46

FPO-201: Lint

5c72afb

FPO-201: Fixes limit reference

fbe1baa

willfish force-pushed the FPO-201-create-instance branch from 4c20760 to fbe1baa Compare May 7, 2024 13:47

willfish added 10 commits May 7, 2024 16:20

FPO-201: Working model generation

ffbdc99

FPO-201: Plumb together CI run

ada4199

FPO-201: Tidy old train scripts

1388a77

FPO-201: Disable strict host checking

c63161c

FPO-201: Fix rotation of old running instances

288eeaf

FPO-201: Persist the search-config.toml to our prefix

68ff34b

FPO-201: Adds a getmodel script to fetch the latest model

693b415

FPO-201: Disable model training until we change version

3e99988

FPO-201: Increment model version

0ca77e7

FPO-201: Update needsmodel

28d6c1b

willfish force-pushed the FPO-201-create-instance branch from 044f399 to 28d6c1b Compare May 8, 2024 11:43

willfish added 2 commits May 8, 2024 12:47

FPO-201: Fixes needsami

c70df90

FPO-201: Set transfomer device to GPU

7c59acf

shjohnson reviewed May 8, 2024

.circleci/config.yml

when:

equal: [ true, << pipeline.parameters.train_model >>]

nightly-clear-model-instances:

Nice 👍

willfish force-pushed the FPO-201-create-instance branch from c271ac8 to 1836440 Compare May 8, 2024 14:08

FPO-201: Remove limit

be7cb3e

willfish force-pushed the FPO-201-create-instance branch from 1836440 to be7cb3e Compare May 8, 2024 14:24

FPO-201: Read the flippin manual

a038fe6

FPO-201: Specify ami exactly

5568c2d

willfish force-pushed the FPO-201-create-instance branch from c02571d to 5568c2d Compare May 15, 2024 21:51

willfish added 2 commits May 16, 2024 08:43

FPO-201: Make wait time for ami 50 minutes

ebcf038

FPO-201: Try a Deep Learning Amazon Linux ami

1745a9e

This should include more up-to-date CUDA drivers https://aws.amazon.com/releasenotes/aws-deep-learning-base-ami-amazon-linux-2/

willfish force-pushed the FPO-201-create-instance branch from 8f388f2 to 1745a9e Compare May 16, 2024 11:43

FPO-201: Use DLAMI

b1d91bc

willfish force-pushed the FPO-201-create-instance branch from 6a94c6a to b1d91bc Compare May 16, 2024 12:31

willfish added 9 commits May 16, 2024 14:19

FPO-201: Remove limits

044b4a0

FPO-201: Adds shell.nix

7d5ba23

FPO-201: Decreases the batch size

9757e81

FPO-201: Code review

ca3436c

FPO-201: Limit 1 for initial deployment

1ecfd62

FPO-201: Code review

79af5fc

FPO-201: Updates to nix

a909b88

FPO-201: Code review

82b97d4

FPO-201: Code review

f979692

willfish force-pushed the FPO-201-create-instance branch from 4c81fe9 to c970b4a Compare May 16, 2024 19:05

FPO-201: Rationalise scripts and add fetchembeddings

aff1242

willfish force-pushed the FPO-201-create-instance branch from c970b4a to aff1242 Compare May 16, 2024 19:14

willfish commented May 17, 2024

willfish added 4 commits May 17, 2024 09:42

FPO-201: Fixes model prefix environment mark

8e18ca2

FPO-201: Adds some initial model generation documentation

3f80da7

FPO-201: Tweak Key Components of pipeline

e4ca1a7

FPO-201: Adds design objectives

0932bb7

willfish marked this pull request as ready for review May 17, 2024 09:52

FPO-201: Use underscores for documentation file names

aacde5e

alexdesi approved these changes May 17, 2024

willfish merged commit c555ac0 into main May 17, 2024
9 checks passed

willfish deleted the FPO-201-create-instance branch May 17, 2024 13:12
