aws-samples/amazon-sagemaker-pipelines-mxnet-image-classification

Build and Register an MXNet Image Classification Model via Amazon SageMaker Pipelines

This architecture describes how to: (1) preprocess an image (.jpg) dataset into the recommended RecordIO input format for image classification, (2) train and evaluate an MXNet binary image classification model using SageMaker, and (3) register the trained model in SageMaker Model Registry. This pattern also demonstrates how all of these ML workflow steps can be defined and automated using SageMaker Pipelines.
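
As a rough illustration of the preprocessing step, the sketch below builds the tab-separated listing (index, label, relative path) that MXNet's im2rec tool reads before packing images into RecordIO. This is not the repository's preprocessing script; the function name `build_lst_lines`, the label mapping, and the directory layout are assumptions made for illustration.

```python
import os


def build_lst_lines(root_dir, class_to_label):
    """Return im2rec-style listing lines: index<TAB>label<TAB>relative_path.

    root_dir is assumed to contain one subfolder per class, and
    class_to_label maps each subfolder name to a numeric label.
    """
    lines = []
    index = 0
    for class_name in sorted(class_to_label):
        class_dir = os.path.join(root_dir, class_name)
        for file_name in sorted(os.listdir(class_dir)):
            # Skip anything that is not a JPEG image.
            if not file_name.lower().endswith((".jpg", ".jpeg")):
                continue
            label = float(class_to_label[class_name])
            lines.append(f"{index}\t{label}\t{class_name}/{file_name}")
            index += 1
    return lines


# Example (hypothetical local layout):
# lines = build_lst_lines("ImageData", {"NotPizza": 0, "Pizza": 1})
# with open("train.lst", "w") as f:
#     f.write("\n".join(lines) + "\n")
```

The resulting .lst file can then be passed to im2rec to produce the .rec file used for training.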

Prerequisites and limitations

Prerequisites

  • An active AWS account
  • A downloaded copy of the Pizza or Not Pizza? public dataset
    • Note: For this pattern, you will build a binary image classification model that detects whether an input image contains pizza or not. However, you can modify this pattern to use any image dataset that has two distinct classes (e.g., cat vs. dog)
  • An Amazon Simple Storage Service (Amazon S3) bucket to store the image (.jpg) dataset
  • Access to create and configure an Amazon SageMaker Domain and User Profile. For more information about this, see Onboard to Amazon SageMaker Domain in the Amazon SageMaker documentation
  • Access to Amazon SageMaker Studio
  • An understanding of Amazon SageMaker notebooks and Jupyter notebooks
  • An understanding of how to create an AWS Identity and Access Management (IAM) role with basic SageMaker role permissions and S3 bucket access permissions
  • Familiarity with Python
  • Familiarity with common ML terms and concepts such as “binary classification”, “preprocessing”, and “hyperparameters”. For more information about this, see Machine Learning Concepts in the Amazon Machine Learning documentation
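
For the S3 bucket access permissions mentioned above, a minimal policy document might look like the following sketch. The bucket name and the choice of actions are assumptions; adjust them to what your workflow actually needs.

```python
import json


def s3_bucket_policy(bucket_name):
    """Build a minimal IAM policy document granting access to one bucket.

    ListBucket applies to the bucket itself; GetObject and PutObject apply
    to the objects inside it, hence the two different resource ARNs.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }


# "my-image-bucket" is a placeholder name, not a real bucket.
print(json.dumps(s3_bucket_policy("my-image-bucket"), indent=2))
```

The printed JSON can be attached to the SageMaker execution role as an inline policy in the IAM console.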

Limitations

  • To save processing time and cut costs, only a subset (1000 images) of the Pizza or Not Pizza? dataset is used to build the image classification model. You can choose to use more (or less) data or choose another dataset entirely (as mentioned above)
  • Certain hyperparameters in the model training step are hard-coded (manually set). These are specified in the image-classification-sagemaker-pipelines.ipynb Jupyter notebook. For more information about this, see Image Classification Hyperparameters in the Amazon SageMaker documentation.
  • You can extend the existing image classification ML workflow by adding steps (e.g., a model tuning step) as needed. For more information about this, see Pipeline Steps in the Amazon SageMaker documentation.
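
To make the hard-coded hyperparameters easier to adjust, they could be collected in one place, as in the sketch below. The keys are hyperparameters of the SageMaker built-in image classification algorithm, but every value here is an illustrative assumption and not necessarily what the notebook uses.

```python
def image_classification_hyperparameters(
    num_training_samples,
    num_classes=2,
    epochs=5,
    learning_rate=0.001,
    mini_batch_size=32,
):
    """Return a hyperparameter dict for the built-in image classification algorithm.

    All default values are assumptions chosen for illustration only.
    """
    if mini_batch_size > num_training_samples:
        raise ValueError("mini_batch_size cannot exceed num_training_samples")
    return {
        "num_layers": 18,                # assumed network depth
        "use_pretrained_model": 1,       # assumed: start from pretrained weights
        "image_shape": "3,224,224",      # channels,height,width
        "num_classes": num_classes,      # 2 for pizza vs. not pizza
        "num_training_samples": num_training_samples,
        "mini_batch_size": mini_batch_size,
        "epochs": epochs,
        "learning_rate": learning_rate,
    }


hp = image_classification_hyperparameters(num_training_samples=800)
```

With the SageMaker Python SDK, a dict like this can be applied to an estimator with `estimator.set_hyperparameters(**hp)`.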

Architecture

Target technology stack

Target architecture

Architecture Diagram

Automation and scale

After registering the trained model to SageMaker Model Registry, you can choose to deploy the model to a SageMaker endpoint for real-time inference. For more information about this, see Deploy a Model from the Registry in the Amazon SageMaker documentation.
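
A deployment from the registry could be sketched as follows, assuming boto3 and the SageMaker Python SDK are installed and the model package has been approved. The helper names, the instance type, and the role ARN parameter are assumptions; the selection logic is kept separate so it can be checked without AWS access.

```python
def latest_approved_package_arn(response):
    """Pick the newest approved model package from a list_model_packages response."""
    summaries = response.get("ModelPackageSummaryList", [])
    approved = [s for s in summaries if s.get("ModelApprovalStatus") == "Approved"]
    if not approved:
        return None
    newest = max(approved, key=lambda s: s["CreationTime"])
    return newest["ModelPackageArn"]


def deploy_latest(group_name, role_arn, instance_type="ml.m5.large"):
    """Deploy the newest approved package in a model group to a real-time endpoint."""
    # Imported here so the helper above works without the AWS SDKs installed.
    import boto3
    from sagemaker import ModelPackage

    sm = boto3.client("sagemaker")
    response = sm.list_model_packages(
        ModelPackageGroupName=group_name,
        ModelApprovalStatus="Approved",
        SortBy="CreationTime",
        SortOrder="Descending",
    )
    arn = latest_approved_package_arn(response)
    if arn is None:
        raise RuntimeError(f"No approved model package found in {group_name}")
    model = ModelPackage(role=role_arn, model_package_arn=arn)
    model.deploy(initial_instance_count=1, instance_type=instance_type)
    return arn
```

If the pipeline registers the model with a pending approval status, approve it in the Model Registry first; otherwise the lookup returns nothing.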

Tools

Getting started

Step 1. Prepare Amazon S3 Bucket for image dataset

  • Create a new S3 bucket with default settings via the Amazon S3 console
  • Create a new folder named “ImageData” within the newly created S3 bucket
    • Within the “ImageData” folder, create two subfolders named “Pizza” and “NotPizza”.
  • Download the Pizza or Not Pizza? dataset to your computer and unzip its contents
    • You should notice two subdirectories within the downloaded file: (1) “pizza” and (2) “not_pizza”
    • Note: You may have to create a free account with Kaggle.com to access the dataset.
  • Navigate to the “Pizza” folder in the S3 bucket and upload 500 randomly selected images from the “pizza” subdirectory from the locally downloaded dataset.
  • Navigate to the “NotPizza” folder in the S3 bucket and upload 500 randomly selected images from the “not_pizza” subdirectory from the locally downloaded dataset.
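
If you prefer to script the selection and upload instead of using the console, the steps above can be sketched as follows. The function names and the fixed random seed are assumptions; the upload function requires boto3 and valid AWS credentials.

```python
import os
import random


def sample_images(local_dir, count, seed=0):
    """Return `count` randomly chosen .jpg file names from local_dir."""
    names = sorted(n for n in os.listdir(local_dir) if n.lower().endswith(".jpg"))
    if count > len(names):
        raise ValueError(f"{local_dir} has only {len(names)} .jpg files")
    # A seeded generator keeps the selection reproducible between runs.
    return random.Random(seed).sample(names, count)


def upload_plan(local_root, count=500, seed=0):
    """Map sampled local files to S3 keys under ImageData/Pizza and ImageData/NotPizza."""
    folder_map = {"pizza": "Pizza", "not_pizza": "NotPizza"}
    plan = []
    for local_sub, s3_sub in folder_map.items():
        local_dir = os.path.join(local_root, local_sub)
        for name in sample_images(local_dir, count, seed):
            plan.append((os.path.join(local_dir, name), f"ImageData/{s3_sub}/{name}"))
    return plan


def upload(plan, bucket_name):
    """Upload every (local_path, key) pair in the plan to the given bucket."""
    import boto3  # imported lazily so the planning helpers work offline

    s3 = boto3.client("s3")
    for local_path, key in plan:
        s3.upload_file(local_path, bucket_name, key)
```

Building the plan first makes it easy to inspect which files will be uploaded before any data is transferred.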

Step 2. Configure Amazon SageMaker Studio environment

  • Create a new SageMaker Domain and User Profile via the Amazon SageMaker console
    • Follow the instructions from Onboard to Amazon SageMaker Domain Using Quick setup from the Amazon SageMaker documentation.
    • Note: When setting up the IAM role for the user profile, ensure that you give access to the Amazon S3 bucket you created earlier.
    • Note: Ensure that the SageMaker Domain is created in the same AWS Region as the S3 bucket you created earlier
  • Launch the SageMaker Studio application via the User Profile
  • Download the image-classification-sagemaker-pipelines.ipynb Jupyter notebook and scripts folder from this GitHub repository
  • Upload the image-classification-sagemaker-pipelines.ipynb Jupyter notebook and scripts folder to the SageMaker Studio application
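
One way to confirm that the S3 bucket and the SageMaker Domain share a Region is sketched below. The helper names are assumptions. The normalization step reflects how the S3 GetBucketLocation API reports locations: buckets in us-east-1 return no location constraint, and some older buckets in eu-west-1 return "EU".

```python
def bucket_region(location_constraint):
    """Translate a GetBucketLocation LocationConstraint value into a Region name."""
    if location_constraint is None:
        return "us-east-1"
    if location_constraint == "EU":
        return "eu-west-1"
    return location_constraint


def same_region(bucket_name, domain_region):
    """Return True if the bucket lives in the same Region as the SageMaker Domain."""
    import boto3  # requires AWS credentials with s3:GetBucketLocation

    response = boto3.client("s3").get_bucket_location(Bucket=bucket_name)
    return bucket_region(response.get("LocationConstraint")) == domain_region
```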

Step 3. Run the Jupyter notebook in Amazon SageMaker Studio

  • Sequentially run the code cells from the image-classification-sagemaker-pipelines.ipynb Jupyter notebook within SageMaker Studio
    • Note: Make sure to appropriately configure the TODO portions of the code as you run the code cells
  • You can graphically monitor the pipeline execution in SageMaker Studio.
    • Follow the instructions from View a Pipeline from the Amazon SageMaker documentation.
  • After the pipeline is finished, you can view the registered model and associated metadata within SageMaker Studio.
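
Besides the graphical view, step status can also be inspected programmatically. The sketch below summarizes the step records returned by the SageMaker ListPipelineExecutionSteps API; the helper names are assumptions, and the execution ARN must come from your own pipeline run.

```python
def summarize_steps(steps):
    """Map each step name to its status, e.g. Executing, Succeeded, or Failed."""
    return {step["StepName"]: step["StepStatus"] for step in steps}


def failed_steps(steps):
    """Return the names of steps whose status is Failed."""
    return [step["StepName"] for step in steps if step["StepStatus"] == "Failed"]


def fetch_steps(pipeline_execution_arn):
    """Fetch step records for one pipeline execution (requires AWS credentials)."""
    import boto3

    sm = boto3.client("sagemaker")
    response = sm.list_pipeline_execution_steps(
        PipelineExecutionArn=pipeline_execution_arn
    )
    return response["PipelineExecutionSteps"]
```

In the notebook, the execution object returned by `pipeline.start()` offers the same records through `execution.list_steps()`.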

Clean up

  1. Delete the S3 bucket with the image dataset
    • For more information on this, follow the instructions from Deleting a bucket from the Amazon S3 documentation.
  2. Delete the default S3 bucket created by the SageMaker session
    • Note: The default S3 bucket created by the SageMaker session should be in the following format: “sagemaker-{region}-{aws-account-id}”
  3. Delete model group from SageMaker Model Registry
    • Follow the instructions from Delete a Model Group from the Amazon SageMaker documentation.
    • Note: The model group name should be “MXNet-Image-Classification”. This was previously defined in the image-classification-sagemaker-pipelines.ipynb Jupyter notebook
  4. Delete SageMaker IAM execution role
    • First, navigate to your SageMaker Domain via the Amazon SageMaker console. Next, click on the “Domain Settings” tab. Under “General settings”, you should find the “Execution role” for your SageMaker Domain. Copy the name of that execution role (e.g., “AmazonSageMaker-ExecutionRole-XXXXX”).
    • Next, navigate to the AWS IAM console and delete the SageMaker Execution (IAM) role you just copied. For more information about this, refer to Deleting an IAM role (console) from the AWS IAM documentation.
  5. Delete SageMaker Domain
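
Parts of the clean-up can also be scripted. The following is a minimal sketch, assuming boto3 is installed, the buckets are not versioned, and the model group holds few enough packages to fit in one response page. The function names are hypothetical, and both deletion functions are destructive, so double-check the names you pass in.

```python
def default_bucket_name(region, account_id):
    """Build the name of the default bucket created by the SageMaker session."""
    return f"sagemaker-{region}-{account_id}"


def empty_and_delete_bucket(bucket_name):
    """Delete all objects in a non-versioned bucket, then the bucket itself."""
    import boto3

    bucket = boto3.resource("s3").Bucket(bucket_name)
    bucket.objects.all().delete()
    bucket.delete()


def delete_model_group(group_name):
    """Delete every model package in a group, then the group."""
    import boto3

    sm = boto3.client("sagemaker")
    # Assumption: a single page of results; use a paginator for larger groups.
    response = sm.list_model_packages(ModelPackageGroupName=group_name)
    for summary in response["ModelPackageSummaryList"]:
        sm.delete_model_package(ModelPackageName=summary["ModelPackageArn"])
    sm.delete_model_package_group(ModelPackageGroupName=group_name)


# Example with placeholder values:
# empty_and_delete_bucket(default_bucket_name("us-east-1", "123456789012"))
# delete_model_group("MXNet-Image-Classification")
```

A model group can only be deleted once it contains no model packages, which is why the packages are removed first.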

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

Authors