Improve download options #480

ivastar · 2018-07-17T19:35:47Z

What is the preferred way to download data automatically? It would be great if the download options were improved. This is what I currently have:

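import glob
import os
from urllib.request import urlretrieve

from astropy.io import fits

# Point iref/jref at local directories and create them if missing.
# '~' is not expanded automatically, so use os.path.expanduser.
for var in ('iref', 'jref'):
    ref_dir = os.path.expanduser(os.path.join('~', var))
    os.makedirs(ref_dir, exist_ok=True)
    os.environ[var] = ref_dir + os.sep

base_url = 'https://hst-crds.stsci.edu/unchecked_get/references/hst/'

files = glob.glob('data_download/mastDownload/HST/*/*fl?.fits')

for file in files:
    with fits.open(file) as hdu:
        for key in ['IDCTAB', 'NPOLFILE', 'D2IMFILE']:
            # Header values look like 'jref$<name>.fits'; skip values
            # without a 'prefix$' part (for example 'N/A').
            prefix, sep, name = hdu[0].header[key].partition('$')
            if not sep:
                continue
            reffile_name = os.path.join(os.environ[prefix], name)
            print(reffile_name)
            if not os.path.exists(reffile_name):
                urlretrieve(base_url + name, reffile_name)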
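
The core of the loop above is the mapping from a header value to a local path and a download URL. A minimal sketch of that step as a standalone function, assuming header values have the form `prefix$name`; the function name `resolve_reference`, the `cache_root` parameter, and the example file name are hypothetical, and the URL is the one used in the snippet above:

```python
import os

BASE_URL = 'https://hst-crds.stsci.edu/unchecked_get/references/hst/'


def resolve_reference(header_value, cache_root):
    """Return (local_path, url) for a value like 'jref$name.fits'.

    Returns None when the value has no 'prefix$' part (assumed here to
    mean that no reference file applies).
    """
    prefix, sep, name = header_value.partition('$')
    if not sep or not name:
        return None
    local_path = os.path.join(cache_root, prefix, name)
    return local_path, BASE_URL + name


# Hypothetical file name, for illustration only.
print(resolve_reference('jref$example_idc.fits', '/tmp/refs'))
```

Keeping this step free of network and FITS access makes it easy to check on its own before wiring it into the download loop.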


jaytmiller · 2019-09-12T19:11:22Z

CRDS has a client package which interacts with the servers and implements something like this:


In [1]: import os; os.environ["CRDS_PATH"] = os.path.expanduser("~/crds_cache")
In [2]: os.environ["CRDS_SERVER_URL"] = "https://hst-crds.stsci.edu"

In [3]: from crds import client

In [4]: globbed = client.list_references("hst", "*fl?.fits")

In [5]: client.dump_files(files=globbed[:3])
CRDS - INFO -  Fetching  /Users/jmiller/crds_cache_dev/references/hst/acs/03h17189j_fls.fits  268.5 M bytes  (1 / 3 files) (0 / 805.5 M bytes)
CRDS - INFO -  Fetching  /Users/jmiller/crds_cache_dev/references/hst/acs/03h1718aj_fls.fits  268.5 M bytes  (2 / 3 files) (268.5 M / 805.5 M bytes)
CRDS - INFO -  Fetching  /Users/jmiller/crds_cache_dev/references/hst/acs/03h1718bj_fls.fits  268.5 M bytes  (3 / 3 files) (537.0 M / 805.5 M bytes)

Out[5]:
({'03h17189j_fls.fits': '/Users/jmiller/crds_cache_dev/references/hst/acs/03h17189j_fls.fits',
  '03h1718aj_fls.fits': '/Users/jmiller/crds_cache_dev/references/hst/acs/03h1718aj_fls.fits',
  '03h1718bj_fls.fits': '/Users/jmiller/crds_cache_dev/references/hst/acs/03h1718bj_fls.fits'},
 3,
 805472640)

dump_files() returns a 3-tuple of the form:

( {basename: full_cache_path, ...}, files_downloaded, bytes_downloaded)

There are also simpler functions, dump_mappings() and dump_references(), which return only the dictionary mapping each basename to its full download path.
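
As a small illustration of working with that return value, the sketch below unpacks a 3-tuple of the shape shown in Out[5]. It does not call CRDS: the tuple is copied from the output above, and `summarize_dump` is a hypothetical helper name.

```python
def summarize_dump(result):
    """Format a (paths, files_downloaded, bytes_downloaded) tuple as text."""
    paths, n_files, n_bytes = result
    lines = [f"{n_files} files, {n_bytes / 1e6:.1f} MB downloaded"]
    for basename, full_path in sorted(paths.items()):
        lines.append(f"  {basename} -> {full_path}")
    return "\n".join(lines)


# Value copied from the Out[5] listing above, not produced by a live call.
cache = '/Users/jmiller/crds_cache_dev/references/hst/acs/'
result = (
    {
        '03h17189j_fls.fits': cache + '03h17189j_fls.fits',
        '03h1718aj_fls.fits': cache + '03h1718aj_fls.fits',
        '03h1718bj_fls.fits': cache + '03h1718bj_fls.fits',
    },
    3,
    805472640,
)
print(summarize_dump(result))
```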

(I'm in the process of fixing a bug which prevents ? in glob expressions, but other than that all of this already exists. Replacing ? with [explicit characters] should let you try it immediately.)
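
Until that fix lands, one possible workaround is to match the `?` locally with the standard library rather than sending it to the server. This is only a sketch: it assumes the list of basenames has already been obtained some other way (for example from a broader `*` pattern), `filter_basenames` is a hypothetical helper, and the last file name is made up for the example.

```python
import fnmatch


def filter_basenames(names, pattern):
    """Keep only the names that match a shell-style glob pattern."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


names = [
    "03h17189j_fls.fits",
    "03h1718aj_fls.fits",
    "03h1718bj_fls.fits",
    "example_idc.fits",  # made-up name that should not match
]

# '?' matched locally by fnmatch.
print(filter_basenames(names, "*fl?.fits"))

# The same selection with an explicit character class instead of '?'.
print(filter_basenames(names, "*fl[st].fits"))
```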

ivastar · 2021-01-24T21:19:51Z

This is great and probably pretty close to what I had in mind. It needs to be documented, however. I don't see any documentation of the Python client anywhere in the docs, for either JWST or HST. HST in particular, which has a big user community, steers users toward the command-line client, which results in the monstrosity illustrated above.
