models compatibility for local mode #20

Open
joejztang opened this issue Sep 14, 2022 · 7 comments

Comments

@joejztang

Hi AWS team,
I have run a couple of the examples in this repo, but none of them has worked so far. I was able to solve the S3 authentication issues, but when it comes to creating the container, either to train or to predict, there are always issues.

The most common ones I have seen are:

  1. [Errno 2] No such file or directory: '/opt/ml/input/config/resourceconfig.json'.
  2. Some indication that data parallelism is not supported in local mode (sorry, I don't remember the details).
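
To narrow down the first error, it may help to check which of the files a training container conventionally expects are actually present. The following is a minimal diagnostic sketch, not part of the repo: the helper name `missing_paths` and the list `EXPECTED_PATHS` are hypothetical, and the list assumes the usual SageMaker training layout under /opt/ml. It only reports what is missing; it does not fix anything.

```python
from pathlib import Path

# Assumption: the conventional SageMaker training layout under /opt/ml.
EXPECTED_PATHS = (
    "input/config/resourceconfig.json",
    "input/config/hyperparameters.json",
    "input/config/inputdataconfig.json",
    "input/data",
    "model",
)


def missing_paths(root="/opt/ml", expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under root."""
    base = Path(root)
    return [str(base / rel) for rel in expected if not (base / rel).exists()]


if __name__ == "__main__":
    # Run inside the container to see which paths were not mounted or written.
    for path in missing_paths():
        print(f"missing: {path}")
```

Run from inside the container, an empty result would suggest the files were written and the problem lies elsewhere.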

A question on my side: is there a way to bypass these issues, and any other potential ones, so that the examples run well locally?
Any comments or solutions are welcome. Thanks in advance!

@eitansela
Contributor

Hi @joejztang

From the errors you describe, it looks like you are trying to use data parallel in local mode.
Is this what you are trying to do?

@joejztang
Author

@eitansela Hi, I am not trying to use data parallel locally. I am trying to run this example locally: https://github.com/aws-samples/amazon-sagemaker-local-mode/tree/main/tensorflow_script_mode_local_training_and_serving. After I solved the S3 issue, it fails with the error [Errno 2] No such file or directory: '/opt/ml/input/config/resourceconfig.json'.

@eitansela
Contributor

eitansela commented Sep 20, 2022

Got it. Which version of the SageMaker SDK do you have installed, and which operating system are you on?
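
One way to gather these details in a single step is a short script like the one below. It is only a sketch using the Python standard library; the function name `collect_environment` is hypothetical, and the package list is an assumption about what is relevant here. It also reports the machine architecture, which can matter on a Mac.

```python
import platform
from importlib import metadata


def collect_environment(packages=("sagemaker", "boto3")):
    """Gather Python, OS, architecture, and package versions as a dict."""
    info = {
        "python": platform.python_version(),
        "os": platform.system(),          # "Darwin" on macOS
        "os_release": platform.release(),
        "machine": platform.machine(),    # CPU architecture string
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Run it inside the same environment used for the examples, so the reported versions match what the failing script actually imports.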

@joejztang
Author

@eitansela Sorry for the late reply.
I will share some info below.

SageMaker SDK version: 2.110.0
OS: macOS

Additional info:

  1. I installed the environment through Miniconda: conda create --name localmode python=3.9. It is running Python version 3.9.13.
  2. Taking scikit_learn_script_mode_local_training_and_serving as an example: to run it in compliance with my company's setup, I have to pass in sagemaker_session=LocalSession(boto_session=boto3.Session(region_name='us-west-2', profile_name='<awesomeprofile>')) at https://github.com/aws-samples/amazon-sagemaker-local-mode/blob/main/scikit_learn_script_mode_local_training_and_serving/scikit_learn_script_mode_local_training_and_serving.py#L66. Personally I don't think this would interfere with anything, but if it does, it is worth mentioning.
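
For readability, the session setup described in item 2 can be sketched as a small helper. This is an illustration of the same call, not the repo's code: the function name `build_local_session` is hypothetical, and the profile name is a placeholder to be replaced with a profile that exists in the local AWS configuration.

```python
def build_local_session(profile_name, region_name="us-west-2"):
    """Return a SageMaker LocalSession bound to a named AWS profile."""
    # Imported lazily so this module loads even where the SDKs are absent.
    import boto3
    from sagemaker.local import LocalSession

    # boto3 raises ProfileNotFound if the profile is not configured locally.
    boto_session = boto3.Session(
        region_name=region_name,
        profile_name=profile_name,
    )
    return LocalSession(boto_session=boto_session)


if __name__ == "__main__":
    # Placeholder profile name, as in the comment above.
    session = build_local_session("<awesomeprofile>")
    print(type(session).__name__)
```

The returned object is what gets passed as sagemaker_session= to the estimator at the linked line.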

Thanks for the reply. Please tag @joejztang if you find anything.

@eitansela
Contributor

Hi @joejztang, are you running it on an Intel-based or Arm-based Mac?

@joejztang
Author

joejztang commented Oct 3, 2022

@eitansela Intel-based.

@eitansela
Contributor

Can you please attach the full logs?
