
AWS Lambda Model Training


A lambda to split pre-processed data into training and validation sets, which are then uploaded to an S3 bucket. The training and validation data uploaded to the bucket are used when triggering the training job.

This repository does not create the S3 bucket; it is created via Terraform, found in terraform-aws-machine-learning-pipeline. For more details on the entire flow and how this lambda is deployed, see aws-automlops-serverless-deployment.

Flowchart

The diagram below demonstrates what happens when the lambda is triggered after a new .csv object has been uploaded to the S3 bucket.

graph LR
  S0(Start)
  T1(Dataset pulled from S3 Bucket)
  T2(Random split and sort using Numpy)
  T3[["`70% training data
    20% validation data
    10% test data`"]]
  T4("Upload split data into S3 Bucket as `.csv`")
  T5("Start training job with training and validation data")
  E0(End)

  S0-->T1
  T1-->T2
  T2-->T3
  T3-->T4
  T4-->T5
  T5-->E0
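
The 70/20/10 split in the diagram is a one-liner with NumPy. A minimal sketch of that step, assuming the dataset has already been read into a pandas DataFrame (the function name, seed and variable names are illustrative, not this repository's actual code):

    import numpy as np
    import pandas as pd

    def split_dataset(df: pd.DataFrame):
        """Shuffle rows, then split into 70% train / 20% validation / 10% test."""
        shuffled = df.sample(frac=1, random_state=42)  # random shuffle with a fixed seed
        train, validation, test = np.split(
            shuffled, [int(0.7 * len(df)), int(0.9 * len(df))]
        )
        return train, validation, test

Each split can then be written out with DataFrame.to_csv and uploaded to the bucket for the training job to consume.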

Development

Dependencies

Usage

  1. Build the docker image locally:

    docker build --no-cache -t model_training:local .
    
  2. Run the docker image built:

    docker run --platform linux/amd64 -p 9000:8080 model_training:local
    
  3. Send an event to the lambda via curl:

    curl "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{<REPLACE_WITH_JSON_BELOW>}'
    
    {
      "Records": [
        {
          "eventVersion": "2.0",
          "eventSource": "aws:s3",
          "awsRegion": "us-east-1",
          "eventTime": "1970-01-01T00:00:00.000Z",
          "eventName": "ObjectCreated:Put",
          "userIdentity": { "principalId": "EXAMPLE" },
          "requestParameters": { "sourceIPAddress": "127.0.0.1" },
          "responseElements": {
            "x-amz-request-id": "EXAMPLE123456789",
            "x-amz-id-2": "EXAMPLE123/5678abcdefghijklambdaisawesome/mnopqrstuvwxyzABCDEFGH"
          },
          "s3": {
            "s3SchemaVersion": "1.0",
            "configurationId": "testConfigRule",
            "bucket": {
              "name": "example-bucket",
              "ownerIdentity": { "principalId": "EXAMPLE" },
              "arn": "arn:aws:s3:::example-bucket"
            },
            "object": {
              "key": "data/example-bank-file.csv",
              "size": 515246,
              "eTag": "0e29c0d99c654bbe83c42097c97743ed",
              "sequencer": "00656A54CA3D69362D"
            }
          }
        }
      ]
    }
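
For reference, the handler only needs the bucket name and object key out of an event like the one above before it can download, split and re-upload the dataset. A hedged sketch of that plumbing with boto3, where the job name, instance type and S3 prefixes are illustrative placeholders rather than this repository's actual configuration:

    import boto3

    s3 = boto3.client("s3")
    sagemaker = boto3.client("sagemaker")

    def lambda_handler(event, context):
        # Extract the uploaded object's location from the S3 event record.
        record = event["Records"][0]
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # ... download the object, e.g. s3.download_file(bucket, key, "/tmp/data.csv"),
        # split it 70/20/10 and upload the resulting .csv files back to the bucket ...

        # Start the training job against the uploaded train and validation channels.
        sagemaker.create_training_job(
            TrainingJobName="model-training-example",  # illustrative name
            AlgorithmSpecification={
                "TrainingImage": "<REPLACE_WITH_ALGORITHM_IMAGE_URI>",
                "TrainingInputMode": "File",
            },
            RoleArn="<REPLACE_WITH_SAGEMAKER_EXECUTION_ROLE_ARN>",
            InputDataConfig=[
                {
                    "ChannelName": "train",
                    "DataSource": {
                        "S3DataSource": {
                            "S3DataType": "S3Prefix",
                            "S3Uri": f"s3://{bucket}/train/",
                        }
                    },
                },
                {
                    "ChannelName": "validation",
                    "DataSource": {
                        "S3DataSource": {
                            "S3DataType": "S3Prefix",
                            "S3Uri": f"s3://{bucket}/validation/",
                        }
                    },
                },
            ],
            OutputDataConfig={"S3OutputPath": f"s3://{bucket}/output/"},
            ResourceConfig={
                "InstanceType": "ml.m5.large",
                "InstanceCount": 1,
                "VolumeSizeInGB": 10,
            },
            StoppingCondition={"MaxRuntimeInSeconds": 3600},
        )
        return {"statusCode": 200}

The curl command above should return whatever this handler returns, confirming the container wiring works end to end.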

GitHub Action (CI/CD)

The GitHub Action "🚀 Push Docker image to AWS ECR" will check out the repository and push a Docker image to the chosen AWS ECR repository using the configure-aws-credentials action. The following repository secrets need to be set:

| Secret             | Description                   |
| ------------------ | ----------------------------- |
| AWS_REGION         | The AWS Region.               |
| AWS_ACCOUNT_ID     | The AWS account ID.           |
| AWS_ECR_REPOSITORY | The AWS ECR repository name.  |
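
Inside the workflow, these secrets typically feed the credentials and push steps directly. A sketch of what those steps can look like, assuming OIDC-style role assumption (the role name and image tag are illustrative; the repository's actual workflow may differ):

    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        aws-region: ${{ secrets.AWS_REGION }}
        role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/github-actions  # illustrative role

    - name: Login to Amazon ECR
      id: login-ecr
      uses: aws-actions/amazon-ecr-login@v2

    - name: Build and push Docker image
      run: |
        docker build -t ${{ steps.login-ecr.outputs.registry }}/${{ secrets.AWS_ECR_REPOSITORY }}:latest .
        docker push ${{ steps.login-ecr.outputs.registry }}/${{ secrets.AWS_ECR_REPOSITORY }}:latest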
