-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new functions to support downloading from AWS s3 #218
base: new_inference
Are you sure you want to change the base?
Conversation
…into new_inference
…into new_inference
@fang19911030 I got errors when invoking
Are any prerequisites required? |
after updated conda env and gempyor:
|
Hi @kjsato , I corrected the wrong function name used in unit tests. These errors are solved. |
I confirmed, thx |
…ename()' in utils.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about changing it like L364, for readability in utils.py?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this is very good, I had some comments along the way we can discuss.
The end goal is to have a command line to resume like there is currently.gempyor-simulate
and these functions should be adapted to be used through this entry point, that will read environment variable or command line argument through click.
def create_resume_out_filename(filetype: str, liketype: str) -> str: | ||
run_id = os.environ.get("FLEPI_RUN_INDEX") | ||
prefix = f"{os.environ.get('FLEPI_PREFIX')}/{os.environ.get('FLEPI_RUN_INDEX')}" | ||
inference_filepath_suffix = f"{liketype}/intermidate" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intermediate
prefix = f"{os.environ.get('FLEPI_PREFIX')}/{os.environ.get('FLEPI_RUN_INDEX')}" | ||
inference_filepath_suffix = f"{liketype}/intermidate" | ||
inference_filename_prefix = "{:09d}.".format(int(os.environ.get("FLEPI_SLOT_INDEX"))) | ||
index = "{:09d}.{:09d}".format(1, int(os.environ.get("FLEPI_BLOCK_INDEX")) - 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the create_file_name function handles the formatting of the index.
) | ||
|
||
|
||
def get_parquet_types_for_resume() -> List[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very good, but these functions should take arguments and not read environment variables, because we want to use them from gempyor (and have less and fewer environment variable), and because if we need an environment variable version we can add a wrapper around it.
(think of a script like simulate.py, that is e.g called resume_from.py : this script parse the environment variable from the safety of the click module and call these functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another note: seeding is a csv, so I think this function should be renamed "get_filetype_for_resume" (this 4 letter id is called filetype throughout the code)-
return resume_file_name_mapping | ||
|
||
|
||
def download_file_from_s3(name_map: Dict[str, str]) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing this is very good. Ideally we also have a "move on filesystem" function that can be used as drop-in when resuming from another folder. In this case, it should just move the file in a safe way (creating folders along the way)
(again, sorry for the delay reviewing, as you know round 18 has been really hard) |
No worry, I will do change based the feedbacks |
Add new python functions which can be used to download files from AWS s3 to the utils.py