SageMaker Endpoint stuck at “Creating” #92

vas610 opened this issue Jan 12, 2021 · 0 comments

Describe the bug
I'm trying to deploy a SageMaker endpoint, and it stays in the "Creating" stage indefinitely. Below are my Dockerfile and my combined training/serving script. The model trains without any issue; only the endpoint deployment gets stuck in the "Creating" stage.

To reproduce

Folder structure

|_code
   |_train_serve.py
|_Dockerfile

Dockerfile

# ##########################################################

# Adapt your container (to work with SageMaker)
# # https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-training-container.html
# # https://hub.docker.com/r/huanjason/scikit-learn/dockerfile

ARG REGION=us-east-1

FROM python:3.7

RUN apt-get update && apt-get -y install gcc

RUN pip3 install \
        # numpy==1.16.2 \
        numpy \
        # scikit-learn==0.20.2 \
        scikit-learn \
        pandas \
        # scipy==1.2.1 \
        scipy \
        mlflow

RUN rm -rf /root/.cache

ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE

# Install sagemaker-training toolkit to enable SageMaker Python SDK
RUN pip3 install sagemaker-training

ENV PATH="/opt/ml/code:${PATH}"

# Copies the training code inside the container
COPY  /code /opt/ml/code

# Defines train_serve.py as script entrypoint
ENV SAGEMAKER_PROGRAM=train_serve.py

train_serve.py

import os
import ast
import warnings
import sys
import json
import argparse
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import PolynomialFeatures
from urllib.parse import urlparse
import logging
import pickle

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script.
    # Data, model, and output directories
    parser.add_argument('--model-dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
    parser.add_argument('--test', type=str, default=os.environ.get('SM_CHANNEL_TEST'))
    parser.add_argument('--train-file', type=str, default='kc_house_data_train.csv')
    parser.add_argument('--test-file', type=str, default='kc_house_data_test.csv')
    parser.add_argument('--features', type=str)  # we ask user to explicitly name features
    parser.add_argument('--target', type=str) # we ask user to explicitly name the target

    args, _ = parser.parse_known_args()

    warnings.filterwarnings("ignore")
    np.random.seed(40)

    # Reading training and testing datasets
    logging.info('reading training and testing datasets')
    logging.info(f"{args.train} {args.train_file} {args.test} {args.test_file}")
    train_df = pd.read_csv(os.path.join(args.train, args.train_file))
    test_df = pd.read_csv(os.path.join(args.test, args.test_file))
    
    logging.info(args.features.split(','))
    logging.info(args.target)
    train_x = np.array(train_df[args.features.split(',')]).reshape(-1,1)
    test_x = np.array(test_df[args.features.split(',')]).reshape(-1,1)
    train_y = np.array(train_df[args.target]).reshape(-1,1)
    test_y = np.array(test_df[args.target]).reshape(-1,1)  

    reg = linear_model.LinearRegression()

    reg.fit(train_x, train_y)
    predicted_price = reg.predict(test_x)
    (rmse, mae, r2) = eval_metrics(test_y, predicted_price)

    logging.info(f"        Linear model: (features={args.features}, target={args.target})")
    logging.info(f"            RMSE: {rmse}")
    logging.info(f"            MAE: {mae}")
    logging.info(f"            R2: {r2}")

    model_path = os.path.join(args.model_dir, "model.pkl")
    logging.info(f"saving to {model_path}")          
    logging.info(args.model_dir)
    with open(model_path, 'wb') as path:
        pickle.dump(reg, path)


def model_fn(model_dir):
    with open(os.path.join(model_dir, "model.pkl"), "rb") as input_model:
        model = pickle.load(input_model)
    return model
    
def predict_fn(input_object, model):
    _return = model.predict(input_object)
    return _return
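
The script takes its directories from SageMaker's `SM_*` environment variables and its hyperparameters from command-line flags. A minimal sketch of that argument handling in isolation, runnable outside SageMaker, is shown here. The helper name `parse_args`, the `required=True` settings, and the sample column names are assumptions for illustration and are not part of the original script.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the flags train_serve.py declares (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    # SageMaker sets SM_MODEL_DIR and SM_CHANNEL_* inside a training job;
    # outside of one these defaults are simply None.
    parser.add_argument('--model-dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
    parser.add_argument('--test', type=str, default=os.environ.get('SM_CHANNEL_TEST'))
    # Assumption: fail early if the caller forgets to name features or target.
    parser.add_argument('--features', type=str, required=True)
    parser.add_argument('--target', type=str, required=True)
    # parse_known_args ignores unrecognised flags instead of exiting.
    args, _ = parser.parse_known_args(argv)
    args.features = [name.strip() for name in args.features.split(',')]
    return args


# Sample column names are assumptions for illustration only.
args = parse_args(['--features', 'sqft_living, bedrooms', '--target', 'price', '--epochs', '3'])
print(args.features, args.target)
```

Splitting `--features` once, right after parsing, avoids repeating `args.features.split(',')` at every use.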

Expected behavior
SageMaker Endpoint should get deployed successfully
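
For context: SageMaker hosting starts the container with the `serve` argument and only marks the endpoint as in service once `GET /ping` on port 8080 returns HTTP 200; predictions arrive as `POST /invocations`. The image above installs only `sagemaker-training`, so one possible explanation (an assumption, not confirmed in this issue) is that nothing in the container answers those requests. A minimal standard-library sketch of such a serving process follows. The names `InferenceHandler` and `make_server` are hypothetical, and the `predict` placeholder only echoes its input where real code would call `model_fn` and `predict_fn`.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class InferenceHandler(BaseHTTPRequestHandler):
    """Hypothetical handler; `predict` stands in for model_fn/predict_fn."""

    predict = staticmethod(lambda rows: rows)  # placeholder: echoes the input

    def _send(self, status, body=b""):
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Health check: SageMaker polls GET /ping and expects HTTP 200.
        self._send(200 if self.path == "/ping" else 404)

    def do_POST(self):
        if self.path != "/invocations":
            self._send(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        rows = json.loads(self.rfile.read(length) or b"[]")
        self._send(200, json.dumps(self.predict(rows)).encode())

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def make_server(port=8080):
    """Bind on all interfaces; SageMaker hosting sends traffic to port 8080."""
    return HTTPServer(("0.0.0.0", port), InferenceHandler)


# To run as the serving process: make_server().serve_forever()
```

In a container this would typically be launched by an executable named `serve` on the `PATH`; the `sagemaker-inference` toolkit is the more complete route to the same contract.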
