scikit_bring_your_own.ipynb train model pandas error #219

Closed
professoroakz opened this issue Mar 26, 2018 · 3 comments

professoroakz commented Mar 26, 2018

Hello!

I am following the scikit_bring_your_own tutorial and trying to set up a bring-your-own (BYO) model for production use, but I encounter the following error when training the model on Amazon SageMaker.


    AlgorithmError: Exception during training: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'.
    Traceback (most recent call last):
      File "/opt/program/train", line 48, in train
        raw_data = [ pd.read_csv(file, header=None) for file in input_files ]
      File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 709, in parser_f
        return _read(filepath_or_buffer, kwds)
      File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 449, in _read
        parser = TextFileReader(filepath_or_buffer, **kwds)
      File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 818, in __init__
        self._make_engine(self.engine)
      File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1049, in _make_engine
        self._engine = CParserWrapper(self.f, **self.options)
      File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1695, in __init__
        self._reader = parsers.TextReader(src, **kwds)
      File "pandas/_libs/parser

I uploaded the data to S3 using:

    def upload_data(self):
        self.logger.info(
            'Uploading locally available data to s3 in path: %s, using bucket: %s using s3 directory prefix: %s'
            % (
                self.config.data_directory_path,
                self.config.data_upload_bucket,
                self.config.s3_data_directory_prefix,
            )
        )

        self.train_data_location = self.session.upload_data(
            path=self.config.data_directory_path,
            bucket=self.config.data_upload_bucket,
            key_prefix=self.config.s3_data_directory_prefix
        )

        self.logger.info('Uploaded local data to s3 path: %s' % (self.train_data_location))

I ran the build_and_push.sh script.

Then I tried to train the model using:

    def estimator(self):
        self.logger.info(
            'Creating estimator for %s model %s using image %s' % (
                'BYO',
                self.config.model_name,
                self.image,
            )
        )

        return Estimator(
            image_name=self.image,
            role=self.config.role,
            train_instance_count=self.config.train_instance_count,
            train_instance_type=self.config.train_instance_type,
            output_path=self.config.output_path,
            base_job_name=self.config.base_job_name,
            sagemaker_session=self.session,
        )

(I'm using the same code as in the notebook, just rewritten as a class.)

Am I missing something or doing something wrong?

djarpin (Contributor) commented Mar 26, 2018

Thanks @OktayGardener. It looks like you're running into an error when pandas tries to read in your data, similar to this one. Are you using the same iris.csv dataset from the scikit example? If not, does the fix recommended in that link help?
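
One possible cause of this kind of read failure, not confirmed in this thread, is that the list of input files handed to `pd.read_csv` contains a directory rather than a regular file. The sketch below is an assumption about how a training script might guard against that; `list_input_files` is a hypothetical helper, not part of the tutorial's `train` script.

```python
import os


def list_input_files(training_path):
    """Return the regular files directly under training_path.

    Hypothetical helper: subdirectories are skipped so that a CSV reader
    is never handed a directory path.
    """
    candidates = (
        os.path.join(training_path, name)
        for name in sorted(os.listdir(training_path))
    )
    files = [path for path in candidates if os.path.isfile(path)]
    if not files:
        raise ValueError("No input files found in {}".format(training_path))
    return files
```

The returned paths could then be passed one by one to `pd.read_csv(path, header=None)`, as the traceback shows the training script doing.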

djarpin (Contributor) commented Apr 10, 2018

Closing this issue, but feel free to re-open if you run into other problems with this.

djarpin closed this as completed Apr 10, 2018

bkitano commented Apr 11, 2020

Hi!

To resolve this issue, append the filename to the directory path, e.g.:

    tree.fit(data_location + '/iris.csv')
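
A minimal sketch of the same idea, assuming `data_location` is an S3 prefix URI (for example, the value returned after uploading a directory). The helper name `s3_file_uri` and the bucket name are hypothetical; the helper only normalizes slashes so that the joined URI has no double or missing separator.

```python
def s3_file_uri(prefix_uri, filename):
    """Join an S3 prefix URI and a filename with exactly one slash between them."""
    return prefix_uri.rstrip("/") + "/" + filename.lstrip("/")


# Hypothetical prefix; in the thread this would be data_location.
data_location = "s3://my-bucket/data/"
train_uri = s3_file_uri(data_location, "iris.csv")
```

The result could then be passed to the estimator in place of the bare prefix, e.g. `tree.fit(train_uri)`.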

atqy pushed a commit to atqy/amazon-sagemaker-examples that referenced this issue Nov 30, 2022
atqy added a commit that referenced this issue Nov 30, 2022