Adding S3 contents to python path #56

Open
cjacksudo opened this issue Jan 11, 2019 · 3 comments

Comments

@cjacksudo

When I'm using the local file system, I can add my home directory to my Python path, upload a Python file to my root directory, and then run a line like this in my notebook:

from my_uploaded_file import *

However, when using s3contents, I get an import error. Interestingly, if I run the following, it looks as if the file is already in my home directory:

In:
     import os
     file_path = os.path.abspath("my_uploaded_file.py")
     print(file_path)
Out:
   '/home/jovyan/my_uploaded_file.py'

However, if I actually look at that directory, the file is missing...
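A likely explanation for the output above: `os.path.abspath` only joins the current working directory with the given name and never checks whether the file exists, so it returns a path even for a file that is not on disk. A minimal sketch (the file name is hypothetical and assumed not to exist in the working directory):

```python
import os

# Hypothetical file name; assumed NOT to exist in the working directory.
name = "file_that_was_never_created_12345.py"

# abspath is purely string-based: it prepends the current working directory.
resolved = os.path.abspath(name)

# exists is the call that actually consults the file system.
present = os.path.exists(resolved)

print(resolved)
print(present)
```

Checking `os.path.exists` (or listing the directory) is therefore the reliable way to see whether the kernel can find the file locally.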

Is there a way to make this import work?

@danielfrg
Owner

This is an interesting issue. It won't work in a notebook, since the Python file lives on S3 and not on the session/kernel path where it could be imported.

One way to make this work would be to keep the Python files locally and use the HybridContentsManager, but this won't save the Python files to S3.

@GergelyKalmar

A possibly nicer approach is to use boto3 in a notebook to download all files from a given S3 path to the local file system, which ultimately achieves the same thing. It should also be possible to attach this to a post-save hook and download files automatically whenever they are saved (I haven't done that, though; the occasional manual sync works for now).

It works roughly like this (assuming we want to sync files from a utils folder; the paths are specific to AWS EMR):

import os

import boto3

NOTEBOOK_BUCKET = "name-of-the-bucket-which-holds-your-notebooks"
UTILS_PATH = "jupyter/user_name/utils/"

s3 = boto3.resource("s3")
print(f"Loading '{NOTEBOOK_BUCKET}'")
bucket = s3.Bucket(NOTEBOOK_BUCKET)
for obj in bucket.objects.filter(Prefix=UTILS_PATH):
    # Strip only the leading prefix to get the path relative to the utils folder.
    path = obj.key[len(UTILS_PATH):]
    # Skip folder placeholder keys and dotfiles such as .s3keep.
    if not path or path.endswith("/") or os.path.basename(path).startswith("."):
        continue
    path = f"utils/{path}"
    print(f"Downloading '{path}'")
    # Create the local parent directory if it does not exist yet.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    bucket.download_file(obj.key, path)
print("Done!")

Note that we skip files starting with a dot (like the .s3keep files).
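Once the files are on local disk, importing them is ordinary Python. One thing to watch for is that a module that was already imported is not refreshed by a later sync until it is reloaded. The sketch below illustrates this with a temporary directory standing in for the kernel's working directory; the package name `utils_demo` and the module `helpers` are hypothetical names chosen to avoid clashing with anything installed.

```python
import importlib
import os
import sys
import tempfile

# Stand-in for the directory that holds the synced folder.
sync_root = tempfile.mkdtemp()
os.makedirs(os.path.join(sync_root, "utils_demo"))
module_file = os.path.join(sync_root, "utils_demo", "helpers.py")

# Pretend this file was just downloaded from S3.
with open(module_file, "w") as f:
    f.write("VALUE = 1\n")

# Make the folder importable (a kernel's working directory usually is already).
if sync_root not in sys.path:
    sys.path.insert(0, sync_root)
importlib.invalidate_caches()

import utils_demo.helpers as helpers
first = helpers.VALUE

# Pretend a later sync replaced the file with a newer version.
with open(module_file, "w") as f:
    f.write("VALUE = 22\n")

# Without reload, the kernel would keep using the old module object.
helpers = importlib.reload(helpers)
second = helpers.VALUE

print(first, second)
```

Restarting the kernel after a sync has the same effect as reloading, at the cost of losing the session state.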

@GergelyKalmar

Nevertheless, it would be very nice to have an option for turning on "automatic local syncing upon save" (at least for .py files).
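As a rough illustration of what such a sync-on-save step might look like, here is a sketch of the per-file logic only. Everything in it is an assumption: the function names are hypothetical, the prefix is the one from the manual sync snippet, and how the callback would be registered depends on whether the contents manager in use exposes a save hook, which is not shown here. The download function is passed in, so something like boto3's `Bucket.download_file` could be supplied in real use.

```python
import os
import tempfile

UTILS_PREFIX = "jupyter/user_name/utils/"  # same prefix as in the manual sync


def local_path_for(s3_key, local_root, prefix=UTILS_PREFIX):
    """Map an S3 key under `prefix` to a local path, or return None to skip it."""
    if not s3_key.startswith(prefix):
        return None
    relative = s3_key[len(prefix):]
    # Only sync .py files, and ignore dotfiles such as .s3keep.
    if not relative.endswith(".py") or os.path.basename(relative).startswith("."):
        return None
    return os.path.join(local_root, *relative.split("/"))


def sync_on_save(s3_key, download, local_root="utils"):
    """Hypothetical save callback: fetch the saved key locally if it qualifies.

    `download(key, path)` is injected by the caller.
    """
    path = local_path_for(s3_key, local_root)
    if path is None:
        return None
    os.makedirs(os.path.dirname(path), exist_ok=True)
    download(s3_key, path)
    return path


# Usage with a fake downloader that only records what it was asked to fetch.
calls = []
root = os.path.join(tempfile.mkdtemp(), "utils")
saved = sync_on_save(UTILS_PREFIX + "pkg/helpers.py",
                     lambda key, path: calls.append((key, path)), root)
skipped = sync_on_save("jupyter/user_name/analysis.ipynb",
                       lambda key, path: calls.append((key, path)), root)
print(saved, skipped, len(calls))
```

Injecting the downloader keeps the path logic testable without AWS credentials.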
