Improve download options #480

Open
ivastar opened this issue Jul 17, 2018 · 2 comments

ivastar commented Jul 17, 2018

What is the preferred way to download reference data automatically? It would be great if the download options were improved. This is what I currently have:

import glob
import os
from urllib.request import urlretrieve

from astropy.io import fits

# Expand "~" explicitly; os.environ stores the string exactly as given.
for var in ('iref', 'jref'):
    ref_dir = os.path.expanduser('~/' + var + '/')
    os.environ[var] = ref_dir
    os.makedirs(ref_dir, exist_ok=True)

base_url = 'https://hst-crds.stsci.edu/unchecked_get/references/hst/'

files = glob.glob('data_download/mastDownload/HST/*/*fl?.fits')

for file in files:
    with fits.open(file) as hdu:
        for key in ['IDCTAB', 'NPOLFILE', 'D2IMFILE']:
            # Header values look like 'iref$name.fits'; store them under ~/iref/, ~/jref/.
            reffile_name = os.path.expanduser('~/' + hdu[0].header[key].replace('$', '/'))
            print(reffile_name)
            if not os.path.exists(reffile_name):
                urlretrieve(base_url + os.path.basename(reffile_name), reffile_name)
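
The header-to-URL step in the loop above can also be pulled into a small pure function, which makes it easy to check without touching the network. This is only a minimal sketch: it assumes header values have the form `iref$name.fits` and that unset keywords hold `N/A`, and `reference_target` is a hypothetical helper name, not part of any library.

```python
import os

BASE_URL = 'https://hst-crds.stsci.edu/unchecked_get/references/hst/'


def reference_target(header_value, root='~'):
    """Map a header value such as 'iref$abc_idc.fits' to (url, local_path).

    Returns None when the keyword is unset ('N/A' or empty, an assumption)
    or has no '$' prefix. `root` is the directory holding iref/, jref/, ...
    """
    value = header_value.strip()
    if not value or value.upper() == 'N/A' or '$' not in value:
        return None
    prefix, basename = value.split('$', 1)
    local_path = os.path.join(os.path.expanduser(root), prefix, basename)
    return BASE_URL + basename, local_path


if __name__ == '__main__':
    # Hypothetical file name, used only to show the mapping.
    print(reference_target('jref$example_idc.fits', root='/tmp/refs'))
    print(reference_target('N/A'))
```

Returning `None` for unset keywords lets the download loop skip them instead of requesting a URL that cannot exist.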
@jaytmiller
Collaborator

CRDS has a client package which interacts with the servers and implements something like this:


In [1]: import os; os.environ["CRDS_PATH"] = os.path.expanduser("~/crds_cache")
In [2]: os.environ["CRDS_SERVER_URL"] = "https://hst-crds.stsci.edu"

In [3]: from crds import client

In [4]: globbed = client.list_references("hst", "*fl?.fits")

In [5]: client.dump_files(files=globbed[:3])
CRDS - INFO -  Fetching  /Users/jmiller/crds_cache_dev/references/hst/acs/03h17189j_fls.fits  268.5 M bytes  (1 / 3 files) (0 / 805.5 M bytes)
CRDS - INFO -  Fetching  /Users/jmiller/crds_cache_dev/references/hst/acs/03h1718aj_fls.fits  268.5 M bytes  (2 / 3 files) (268.5 M / 805.5 M bytes)
CRDS - INFO -  Fetching  /Users/jmiller/crds_cache_dev/references/hst/acs/03h1718bj_fls.fits  268.5 M bytes  (3 / 3 files) (537.0 M / 805.5 M bytes)

Out[5]:
({'03h17189j_fls.fits': '/Users/jmiller/crds_cache_dev/references/hst/acs/03h17189j_fls.fits',
  '03h1718aj_fls.fits': '/Users/jmiller/crds_cache_dev/references/hst/acs/03h1718aj_fls.fits',
  '03h1718bj_fls.fits': '/Users/jmiller/crds_cache_dev/references/hst/acs/03h1718bj_fls.fits'},
 3,
 805472640)

dump_files() returns a 3-tuple of the form:

( {basename: full_cache_path, ...}, files_downloaded, bytes_downloaded)
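
As an illustration of consuming that return value, the sketch below unpacks the tuple shown in Out[5]. The literal is copied from the session output rather than produced by a live call, so it runs without the CRDS client installed; the variable names are my own.

```python
# Return value copied from Out[5] above; no CRDS call is made here.
cache_dir = '/Users/jmiller/crds_cache_dev/references/hst/acs/'
result = (
    {
        '03h17189j_fls.fits': cache_dir + '03h17189j_fls.fits',
        '03h1718aj_fls.fits': cache_dir + '03h1718aj_fls.fits',
        '03h1718bj_fls.fits': cache_dir + '03h1718bj_fls.fits',
    },
    3,
    805472640,
)

# Unpack in the documented order: mapping, file count, byte count.
paths, files_downloaded, bytes_downloaded = result

summary = f"{files_downloaded} files, {bytes_downloaded / 1e6:.1f} M bytes"
print(summary)
for basename, full_path in sorted(paths.items()):
    print(basename, '->', full_path)
```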

There are also simpler functions, dump_mappings() and dump_references(), which return only the dictionary mapping each basename to its full download path.

(I'm in the process of fixing a bug which prevents ? in glob expressions, but other than that all of this already exists. Replacing ? with [explicit characters] should let you try it immediately.)
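
The `?` workaround can be sketched as a one-line pattern rewrite. This assumes the server-side matching follows the same bracket syntax as Python's `fnmatch`, which I have not verified; `expand_question_marks` is a hypothetical helper and the character class `a-z0-9` is a guess at what a reference file name can contain.

```python
import fnmatch


def expand_question_marks(pattern, chars='a-z0-9'):
    """Replace each '?' wildcard with an explicit character class.

    Note: a '?' that already sits inside [...] would be rewritten too,
    so this is only suitable for simple patterns.
    """
    return pattern.replace('?', '[' + chars + ']')


pattern = expand_question_marks('*fl?.fits')
print(pattern)  # *fl[a-z0-9].fits

# Local check with fnmatch against a name from the session above.
print(fnmatch.fnmatchcase('03h17189j_fls.fits', pattern))
```

The rewritten pattern could then be passed wherever `"*fl?.fits"` was used in the session above.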


ivastar commented Jan 24, 2021

This is great and probably pretty close to what I had in mind. However, it needs to be documented. I don't see any documentation of the Python client anywhere in the docs, for either JWST or HST. The HST documentation in particular, which serves a big user community, steers users to the command-line client, which results in the monstrosity illustrated above.
