Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

idc-index CSV is not packaged with the refactored organization of the repo #41

Closed
fedorov opened this issue Feb 12, 2024 · 6 comments · Fixed by #48
Closed

idc-index CSV is not packaged with the refactored organization of the repo #41

fedorov opened this issue Feb 12, 2024 · 6 comments · Fixed by #48

Comments

@fedorov
Copy link
Member

fedorov commented Feb 12, 2024

@jcfr it is my fault that I missed it during my testing, but we realized it only today that the CSV file that serves as the content of the index is not packaged in the wheel.

This is how we did it before:

urllib.request.urlretrieve('https://github.com/ImagingDataCommons/idc-index/releases/download/'+package_version+'/idc_index.csv.zip', os.path.join(save_location, 'idc_index.csv.zip'))
.

Currently, since this file is not packaged, the fallback is activated and it is downloaded when the user instantiates the index:

if not os.path.exists(file_path):
.

Before declaring this to be a bug and working on a solution, I would like to understand what is the recommendation/best practice on how this should be done? We discussed this initially, and thought that it is better to install this file at the time the package is installed, and to avoid surprising user by downloading it at runtime, but maybe that is not the best practice? Do you have a perspective on this?

@jcfr
Copy link
Collaborator

jcfr commented Feb 12, 2024

re: that the CSV file that serves as the content of the index is not packaged in the wheel.

Thanks for pointing this out. Packaging the csv file as part of the wheel would allow to streamline the implementation by avoiding runtime download (that would need to be updated by adding checksum verification as it currently only check1 for file existence)

As you said, it would avoid an additional download upon the first instantiation of the IDCClient.

To avoid uploading the cvs file (~45M) for each platform specific wheels, we could also create a wheel called s5cmd-python-distributions that would be responsible to only install the s5cmd, that way we would be a good "citizen" and avoid re-uploading 90MB with each release of the idx-index wheel which would distributed as a pure-python one called idx-index-X.Y.Z-py3-none-any.whl.

Last, we could also look into having an other wheel responsible to only distribute the csv file, this wheel would be called idc-index-csv.

In the short term, I suggest to create s5cmd-python-distributions as well as post an issue at https://github.com/peak/s5cmd suggesting the creation of https://github.com/peak/s5cmd-python-distributions so that a wheel called s5cmd is available for download.

Footnotes

  1. https://github.com/ImagingDataCommons/idc-index/blob/597aef6d20da12f3071eacdc11c844115a5743ba/idc_index/index.py#L34-L40

@jcfr
Copy link
Collaborator

jcfr commented Feb 12, 2024

In my initial reply, I suggested idc-s5cmd, but I think creating https://github.com/ImagingDataCommons/s5cmd-python-distributions is better (conceptually similar to scikit-build/cmake-python-distributions1 or scikit-build/ninja-python-distributions2)

Footnotes

  1. https://github.com/scikit-build/cmake-python-distributions

  2. https://github.com/scikit-build/ninja-python-distributions

@jcfr
Copy link
Collaborator

jcfr commented Feb 12, 2024

@fedorov I am happy to help by posting an issue at https://github.com/peak/s5cmd-python-distributions and then move forward with the creation of the relevant project.

@fedorov
Copy link
Member Author

fedorov commented Feb 12, 2024

@jcfr we discussed this with @vkt1414 and agreed these are very cool suggestions! Yes, I think no one would be able to formulate the issue better for the s5cmd repo than you, it would be great if you could do it.

@jcfr
Copy link
Collaborator

jcfr commented Feb 13, 2024

it would be great if you could do it.

👌 I will take care of this either this evening or tomorrow.

@jcfr
Copy link
Collaborator

jcfr commented Feb 14, 2024

Corresponding issue has been created:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants