Add new functions to support downloading from AWS s3 #218

fang19911030 · 2024-05-13T14:34:04Z

Add new Python functions to utils.py that can be used to download files from AWS S3.

kjsato · 2024-05-13T19:00:20Z

@fang19911030 I got errors when invoking pytest test_utils.py such as:

============================================== test session starts ===============================================
platform darwin -- Python 3.9.13, pytest-7.4.0, pluggy-1.0.0
rootdir: /Users/koji/Codes/flepimop/flepiMoP/flepimop/gempyor_pkg/tests/utils
plugins: anyio-3.5.0, cov-4.1.0
collected 17 items

test_utils.py ......F....FFFFFF [100%]

==================================================== FAILURES ====================================================

============================================ short test summary info =============================================
FAILED test_utils.py::test_print_disk_diagnosis_success - AttributeError: module 'gempyor.utils' has no attribute 'print_disk_diagnosis'
FAILED test_utils.py::test_create_resume_out_filename - AttributeError: module 'gempyor.utils' has no attribute 'create_resume_out_filename'
FAILED test_utils.py::test_create_resume_input_filename - AttributeError: module 'gempyor.utils' has no attribute 'create_resume_input_filename'
FAILED test_utils.py::test_get_parquet_types_resume_discard_seeding_true_flepi_block_index_1 - AttributeError: module 'gempyor.utils' has no attribute 'get_parquet_types'
FAILED test_utils.py::test_get_parquet_types_resume_discard_seeding_false_flepi_block_index_1 - AttributeError: module 'gempyor.utils' has no attribute 'get_parquet_types'
FAILED test_utils.py::test_get_parquet_types_flepi_block_index_2 - AttributeError: module 'gempyor.utils' has no attribute 'get_parquet_types'
FAILED test_utils.py::test_create_resume_file_names_map - AttributeError: module 'gempyor.utils' has no attribute 'create_resume_file_names_map'
========================================== 7 failed, 10 passed in 3.31s ==========================================

Are any prerequisites required?

kjsato · 2024-05-14T00:48:29Z

After updating the conda env and gempyor:

============================================== test session starts ===============================================
platform darwin -- Python 3.11.4, pytest-7.4.0, pluggy-1.0.0
rootdir: /Users/koji/Codes/flepimop/flepiMoP/flepimop/gempyor_pkg/tests/utils
collected 17 items

test_utils.py .............FFF. [100%]

==================================================== FAILURES ====================================================
============================================ short test summary info =============================================
FAILED test_utils.py::test_get_parquet_types_resume_discard_seeding_true_flepi_block_index_1 - AttributeError: module 'gempyor.utils' has no attribute 'get_parquet_types'
FAILED test_utils.py::test_get_parquet_types_resume_discard_seeding_false_flepi_block_index_1 - AttributeError: module 'gempyor.utils' has no attribute 'get_parquet_types'
FAILED test_utils.py::test_get_parquet_types_flepi_block_index_2 - AttributeError: module 'gempyor.utils' has no attribute 'get_parquet_types'
==================================== 3 failed, 14 passed, 9 warnings in 3.83s ====================================

fang19911030 · 2024-05-14T13:05:04Z

Hi @kjsato, I corrected the wrong function names used in the unit tests. These errors are resolved.

kjsato · 2024-05-14T14:45:59Z

> Hi @kjsato, I corrected the wrong function names used in the unit tests. These errors are resolved.

I confirmed, thx

kjsato

How about changing it like L364, for readability in utils.py?

jcblemai

Thank you, this is very good. I had some comments along the way that we can discuss.

The end goal is to have a command-line entry point for resuming, like the existing gempyor-simulate. These functions should be adapted to be used through that entry point, which will read environment variables or command-line arguments through click.

jcblemai · 2024-05-21T08:47:27Z

flepimop/gempyor_pkg/src/gempyor/utils.py

+def create_resume_out_filename(filetype: str, liketype: str) -> str:
+    run_id = os.environ.get("FLEPI_RUN_INDEX")
+    prefix = f"{os.environ.get('FLEPI_PREFIX')}/{os.environ.get('FLEPI_RUN_INDEX')}"
+    inference_filepath_suffix = f"{liketype}/intermidate"


intermediate

jcblemai · 2024-05-21T08:50:15Z

flepimop/gempyor_pkg/src/gempyor/utils.py

+    prefix = f"{os.environ.get('FLEPI_PREFIX')}/{os.environ.get('FLEPI_RUN_INDEX')}"
+    inference_filepath_suffix = f"{liketype}/intermidate"
+    inference_filename_prefix = "{:09d}.".format(int(os.environ.get("FLEPI_SLOT_INDEX")))
+    index = "{:09d}.{:09d}".format(1, int(os.environ.get("FLEPI_BLOCK_INDEX")) - 1)


I think the create_file_name function handles the formatting of the index.
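For reference, here is a minimal sketch of what the two format strings in this hunk produce. The helper name `format_resume_index` is hypothetical and not part of gempyor; it only mirrors the `"{:09d}"` formatting shown in the diff. Whether `create_file_name` already covers this needs to be checked against its actual implementation.

```python
from typing import Tuple


def format_resume_index(slot_index: int, block_index: int) -> Tuple[str, str]:
    """Hypothetical helper mirroring the formatting in the diff above.

    Returns the zero-padded slot prefix and the "<1>.<block - 1>" index,
    each number padded to 9 digits.
    """
    inference_filename_prefix = "{:09d}.".format(slot_index)
    # The diff pins the first component to 1 and uses the previous block.
    index = "{:09d}.{:09d}".format(1, block_index - 1)
    return inference_filename_prefix, index


# Slot 2, block 3 -> ("000000002.", "000000001.000000002")
print(format_resume_index(2, 3))
```

If `create_file_name` already pads these values, passing the raw integers through would avoid formatting them twice.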

jcblemai · 2024-05-21T09:20:39Z

flepimop/gempyor_pkg/src/gempyor/utils.py

+    )
+
+
+def get_parquet_types_for_resume() -> List[str]:


This is very good, but these functions should take arguments rather than read environment variables: we want to use them from gempyor (and rely on fewer and fewer environment variables), and if we need an environment-variable version we can add a wrapper around them.

(Think of a script like simulate.py, called e.g. resume_from.py: this script parses the environment variables from the safety of the click module and calls these functions.)

Another note: seeding is a csv, so I think this function should be renamed "get_filetype_for_resume" (this 4-letter id is called filetype throughout the code).
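A rough sketch of the pattern suggested above, using only the standard library: a core function that takes plain arguments, plus a thin wrapper that reads the environment. The function names `build_resume_prefix` and `build_resume_prefix_from_env` are hypothetical. Only the `FLEPI_PREFIX` and `FLEPI_RUN_INDEX` variable names and the `prefix/run_index` layout come from the diff. In the real entry point the wrapper's job would presumably be done by click options instead.

```python
import os
from typing import Mapping, Optional


def build_resume_prefix(flepi_prefix: str, run_index: str) -> str:
    """Core function: everything it needs arrives as arguments."""
    return f"{flepi_prefix}/{run_index}"


def build_resume_prefix_from_env(env: Optional[Mapping[str, str]] = None) -> str:
    """Thin wrapper: reads the environment, then delegates to the core function.

    `env` defaults to os.environ; passing a dict makes the wrapper easy to test.
    """
    env = os.environ if env is None else env
    missing = [k for k in ("FLEPI_PREFIX", "FLEPI_RUN_INDEX") if k not in env]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return build_resume_prefix(env["FLEPI_PREFIX"], env["FLEPI_RUN_INDEX"])
```

With this split, unit tests can call the core function directly without patching `os.environ`, and a missing variable fails loudly instead of producing a path that contains the string `None`.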

jcblemai · 2024-05-21T09:22:51Z

flepimop/gempyor_pkg/src/gempyor/utils.py

+    return resume_file_name_mapping
+
+
+def download_file_from_s3(name_map: Dict[str, str]) -> None:


Amazing, this is very good. Ideally we would also have a "move on filesystem" function that can be used as a drop-in replacement when resuming from another folder. In that case, it should just move the files in a safe way (creating folders along the way).
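One possible shape for such a function, as a sketch only. The name `move_files_on_filesystem` is hypothetical. It keeps the same `Dict[str, str]` signature as `download_file_from_s3` so it could be swapped in, and it assumes the map goes from source path to destination path, which is a guess about how `name_map` is laid out.

```python
import shutil
from pathlib import Path
from typing import Dict


def move_files_on_filesystem(name_map: Dict[str, str]) -> None:
    """Move each source file to its destination, creating parent folders.

    Assumption: name_map maps source path -> destination path.
    """
    for src, dst in name_map.items():
        src_path, dst_path = Path(src), Path(dst)
        # Fail early with a clear message rather than midway through a move.
        if not src_path.is_file():
            raise FileNotFoundError(f"Resume source file not found: {src_path}")
        # Create any missing destination folders along the way.
        dst_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src_path), str(dst_path))
```

If the source folder must stay intact (for example, when several runs resume from the same outputs), `shutil.copy2` could replace `shutil.move` with no other changes.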

jcblemai · 2024-05-21T09:27:18Z

(again, sorry for the delay reviewing, as you know round 18 has been really hard)

fang19911030 · 2024-05-22T12:25:02Z

No worries, I will make changes based on the feedback.

fang19911030 added 9 commits April 17, 2024 14:37

add function to generate filename for resume files

c8fce03

Merge branch 'sep_seeding_and_ic' into new_inference

4503fcf

fix functions and adding unit tests

a63105e

add copy function

aca80bc

format change

727169c

Merge branch 'new_inference' of https://github.com/HopkinsIDD/flepiMoP into new_inference

386da14

add new function to set parquest types and add tests for it

2d35b8d

add functions to create resume file name map and download from s3 bucket

bb5c191

Merge branch 'new_inference' of https://github.com/HopkinsIDD/flepiMoP into new_inference

4915bbf

fang19911030 requested review from jcblemai and kjsato May 13, 2024 14:34

fang19911030 changed the title ~~New inference pengcheng~~ Add new functions to support downloading from AWS s3 May 13, 2024

correct wrong functions name in tests

34394a3

modified to use a common style in the function 'create_resume_out_filename()' in utils.py

aab8388

kjsato requested changes May 14, 2024

kjsato self-requested a review May 14, 2024 23:17

kjsato approved these changes May 14, 2024

address requested changes by koji

dc44991

jcblemai requested changes May 21, 2024
