Custom file download location #915

Open
eculler opened this issue Oct 2, 2023 · 1 comment
Labels
enhancement New feature or request

Contributor

eculler commented Oct 2, 2023

Earthpy users need to be able to download and cache data from API URLs, which may contain special characters or have other qualities that make automatic file name creation unreliable.

Allow users to set their own file name using the file_name parameter of et.data.get_data(url='...', file_name='...').

As a starting point, the cache can be keyed on the file name rather than the URL.
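
To make the request concrete, here is a minimal sketch of caching keyed on a user-supplied file name. It is not earthpy's implementation: the `download_cached` name, the `data_dir` argument, and the injectable `fetch` argument are all assumptions made for illustration.

```python
import os
import urllib.request


def download_cached(url, file_name, data_dir=".", fetch=urllib.request.urlretrieve):
    """Download url to data_dir/file_name unless that file already exists.

    Hypothetical helper for illustration only, not part of earthpy.
    """
    os.makedirs(data_dir, exist_ok=True)
    path = os.path.join(data_dir, file_name)
    # The cache key is the file name: if the file is present, skip the download.
    if not os.path.exists(path):
        fetch(url, path)
    return path
```

Because the URL never has to be turned into a file name, query strings and special characters in the URL stop mattering. The trade-off is that two different URLs saved under the same `file_name` would share one cache entry.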

@eculler eculler added the enhancement New feature or request label Oct 2, 2023
Contributor Author

eculler commented Oct 2, 2023

In general, I think we should consider the following behavior:

  • Allow users to set their own earthlab HOME directory through a configuration file and/or environment variable. A common scenario is that data must be stored on a larger external hard drive. If it is not set, the default ~/earth-analytics can be used. My preference is that this is not set within workflow code, so that the workflow stays reproducible.
  • We could also use a project_dir parameter to customize a project directory within ETHOME or ETHOME/data. ETHOME/earthpy-downloads could then be the default. Personally I think it makes more sense to put the key downloads and project directories both directly in ETHOME, as we do not put anything except data in there anyway, but I'm happy to keep it the same as it is now.
  • Personally, I'd like to avoid setting the working directory in code. We could instead write an et.get_path() function or something like that, which would use the configured ETHOME, an optional project_dir, and an optional file_name or file_re to generate paths.
  • We could consider allowing users to keep their data in the project directory, adding it to the .gitignore file by default.
  • Finally, I would love to see a computation caching feature. I write these for my workflows, and it looks something like (with proper use of the pickle library):
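    import os
    import pickle

    def cache(func, cache_id, *args, override=False, **kwargs):
        path = f"{cache_id}.jar"  # assumes "id.jar" meant a file named after the id
        if override or not os.path.exists(path):
            result = func(*args, **kwargs)
            with open(path, "wb") as f:  # save_pickle
                pickle.dump(result, f)
        else:
            with open(path, "rb") as f:  # load_pickle
                result = pickle.load(f)
        return result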
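
Going back to the ETHOME and et.get_path() ideas in the list above, a rough sketch could look like the following. Reading ETHOME from an environment variable of that name, the `env` argument, and the exact directory layout are assumptions for illustration; file_re matching and configuration file support are left out.

```python
import os
from pathlib import Path

# Default home directory mentioned above; used when ETHOME is not set.
DEFAULT_HOME = Path.home() / "earth-analytics"


def get_path(project_dir=None, file_name=None, env=os.environ):
    """Build a path under the configured home directory.

    Hypothetical sketch: the ETHOME variable name and layout are assumptions.
    """
    home = Path(env.get("ETHOME", DEFAULT_HOME)).expanduser()
    # Fall back to a shared downloads directory when no project is given.
    path = home / (project_dir or "earthpy-downloads")
    if file_name is not None:
        path = path / file_name
    return path
```

With this shape, workflow code only ever calls `get_path("my-project", "dem.tif")`, and where the data physically lives is decided outside the workflow.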

There's lots of fancy stuff we could do, like a ComputationCache parent class that users could inherit from when defining workflow steps, or better yet a @cache decorator that adds this functionality to functions automatically. But we should first look at some of the newer workflow-organizing tools; it might be better for us to keep it simple and let folks pick their own interface when they want to level up.
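
For reference, the decorator variant could be as small as the sketch below. It is only one possible shape: the `path` and `override` arguments are assumptions, and the cache deliberately ignores the function's arguments, so it suits workflow steps that run once with fixed inputs.

```python
import functools
import os
import pickle


def cache(path, override=False):
    """Decorator factory: pickle the wrapped function's result to path."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Recompute when forced or when no cached result exists yet.
            if override or not os.path.exists(path):
                result = func(*args, **kwargs)
                with open(path, "wb") as f:
                    pickle.dump(result, f)
                return result
            # Otherwise reuse the stored result, whatever the arguments are.
            with open(path, "rb") as f:
                return pickle.load(f)
        return wrapper
    return decorator
```

A step would then be written as `@cache("ndvi.pickle")` above its function definition. Since pickle can execute arbitrary code on load, this only makes sense for cache files the user created themselves.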
