Intake catalog referencing ISCCP HGH, HGM, HGG products

ISSI-CONSTRAIN/isccp

ISCCP intake data catalog

Intake catalog referencing the official International Satellite Cloud Climatology Project (ISCCP) dataset on the NOAA S3 Bucket s3://noaa-cdr-cloud-properties-isccp-pds with the DOI 10.7289/V5QZ281S.

Statement of need

The datasets on the S3 bucket are stored as individual netCDF files at 3-hourly or monthly resolution, depending on the ISCCP product. Accessing or querying the entire ISCCP time series from July 1983 to June 2017 would otherwise require downloading the individual files and concatenating them offline. Because the dataset is available on cloud object storage, requests can be made more efficient by loading only the chunks of data needed for the computation of interest. To make this possible and to have the entire ISCCP dataset lazily available, this repository provides so-called reference files that virtually merge the individual netCDF files into one dataset.
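
The idea behind a reference file can be illustrated with a toy example. The sketch below is not this repository's workflow or file format: the layout of `refs`, the toy binary files, and the helpers `write_toy_file` and `read_chunk` are all assumptions made up for illustration. It builds a small index that maps chunk keys to a file, a byte offset, and a length, and then reads only the bytes of the requested chunk. The same principle applies to object storage that supports byte-range requests.

```python
import os
import struct
import tempfile


def write_toy_file(path, header, values):
    """Write a header followed by float64 values; return (offset, length) of the values."""
    payload = struct.pack(f"<{len(values)}d", *values)
    with open(path, "wb") as f:
        f.write(header)
        f.write(payload)
    return len(header), len(payload)


def read_chunk(refs, key):
    """Read only the byte range recorded for `key` and decode it as float64 values."""
    path, offset, length = refs[key]
    with open(path, "rb") as f:
        f.seek(offset)          # skip everything before the chunk
        raw = f.read(length)    # read just the chunk, not the whole file
    return list(struct.unpack(f"<{length // 8}d", raw))


# Two toy "files", one per time step, standing in for individual netCDF files
tmpdir = tempfile.mkdtemp()
refs = {}
for step, values in enumerate([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]):
    path = os.path.join(tmpdir, f"toy_{step}.bin")
    offset, length = write_toy_file(path, b"HEADER-BYTES", values)
    # Hypothetical key layout: "<variable>/<time index>"
    refs[f"cldamt/{step}"] = (path, offset, length)

# The index presents both files as one dataset; only one chunk is actually read
second_step = read_chunk(refs, "cldamt/1")
print(second_step)
```

Because the index is small and the data stay where they are, the merged view can be opened quickly and chunks are fetched only when a computation needs them.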

Usage

Warning

This is not an official repository. Because this catalog only references the original dataset, post-processing issues are limited but might still exist, particularly in the form of missing timesteps and metadata inconsistencies. Attributing this work is encouraged but the original data source provider should always be acknowledged and their reference policy followed.

Python environment requirements

pip install "intake<2.0.0" xarray intake-xarray zarr s3fs requests
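
To confirm that the environment is complete before opening the catalog, a small check along the following lines may help. It is a sketch and not part of the repository: the helper name `missing_modules` is hypothetical, and the import names are assumed to match the package names above, except for `intake-xarray`, which is assumed to be imported as `intake_xarray`.

```python
from importlib.util import find_spec

# Import names assumed to correspond to the packages in the pip command above
REQUIRED = ["intake", "xarray", "intake_xarray", "zarr", "s3fs", "requests"]


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

This only checks that the modules can be found; it does not verify the `intake<2.0.0` version constraint.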

Opening dataset

>>> import intake
>>>
>>> # Load catalog
>>> cat = intake.open_catalog("https://raw.githubusercontent.com/ISSI-CONSTRAIN/isccp/main/catalog.yaml")
>>>
>>> # List catalog entries
>>> list(cat)
['ISCCP_BASIC_HGH', 'ISCCP_BASIC_HGG', 'ISCCP_BASIC_HGM']
>>>
>>> # Load dataset lazily as xarray dataset
>>> ds = cat['ISCCP_BASIC_HGG'].to_dask()
>>> ds
<xarray.Dataset> Size: 9TB
Dimensions:             (time: 99352, lat: 180, lon: 360, cloud_irtype: 3,
                         cloud_type: 18, edge: 2, levpc: 7, levtau: 6,
                         satpos: 12)
Coordinates:
  * lat                 (lat) float32 720B -89.5 -88.5 -87.5 ... 87.5 88.5 89.5
  * levpc               (levpc) float32 28B 95.0 245.0 375.0 ... 740.0 912.5
  * levtau              (levtau) float32 24B 0.5 2.3 6.0 14.5 34.74 109.8
  * lon                 (lon) float32 1kB 0.5 1.5 2.5 3.5 ... 357.5 358.5 359.5
  * time                (time) datetime64[ns] 795kB 1983-07-01 ... 2017-06-30...
Dimensions without coordinates: cloud_irtype, cloud_type, edge, satpos
Data variables: (12/43)
    cell_origin         (time, lat, lon) float32 26GB dask.array<chunksize=(1, 180, 360), meta=np.ndarray>
    cldamt              (time, lat, lon) float64 52GB dask.array<chunksize=(1, 180, 360), meta=np.ndarray>
    cldamt_ir           (time, lat, lon) float64 52GB dask.array<chunksize=(1, 180, 360), meta=np.ndarray>
    cldamt_irmarg       (time, lat, lon) float64 52GB dask.array<chunksize=(1, 180, 360), meta=np.ndarray>
    cldamt_irtypes      (time, cloud_irtype, lat, lon) float64 155GB dask.array<chunksize=(1, 3, 180, 360), meta=np.ndarray>
    cldamt_types        (time, cloud_type, lat, lon) float64 927GB dask.array<chunksize=(1, 18, 180, 360), meta=np.ndarray>
    ...                  ...
    tc_pcdist           (time, levpc, lat, lon) float64 361GB dask.array<chunksize=(1, 7, 180, 360), meta=np.ndarray>
    tc_type             (time, cloud_type, lat, lon) float64 927GB dask.array<chunksize=(1, 18, 180, 360), meta=np.ndarray>
    time_bounds         (time, edge) datetime64[ns] 2MB dask.array<chunksize=(1, 2), meta=np.ndarray>
    wp                  (time, lat, lon) float64 52GB dask.array<chunksize=(1, 180, 360), meta=np.ndarray>
    wp_ir               (time, lat, lon) float64 52GB dask.array<chunksize=(1, 180, 360), meta=np.ndarray>
    wp_type             (time, cloud_type, lat, lon) float64 927GB dask.array<chunksize=(1, 18, 180, 360), meta=np.ndarray>
Attributes: (12/67)
    Conventions:                              CF-1.6, ACDD-1.3
    NCO:                                      4.4.4
    acknowledgement:                          This project received funding s...
    cdm_data_type:                            Grid
    comment:                                  ---------- TO RE-MAP EQUAL-AREA...
    contributor_name:                         William B. Rossow, Alison Walke...
    ...                                       ...
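
The sizes in the listing follow directly from the dimension lengths and data types, which also shows why lazy loading matters here. The arithmetic below is a worked check that uses only numbers from the output above, with decimal units (1 GB = 10^9 bytes); the helper `nbytes` is hypothetical. A full float64 variable on the (time, lat, lon) grid amounts to tens of gigabytes, whereas a single time step, which is one chunk, is about half a megabyte.

```python
# Dimension lengths taken from the dataset listing above
n_time, n_lat, n_lon, n_cloud_type = 99352, 180, 360, 18


def nbytes(shape, itemsize):
    """Uncompressed size in bytes of an array with the given shape and item size."""
    total = itemsize
    for n in shape:
        total *= n
    return total


full_float64 = nbytes((n_time, n_lat, n_lon), 8)                 # e.g. cldamt
full_float32 = nbytes((n_time, n_lat, n_lon), 4)                 # e.g. cell_origin
full_by_type = nbytes((n_time, n_cloud_type, n_lat, n_lon), 8)   # e.g. cldamt_types
one_chunk = nbytes((1, n_lat, n_lon), 8)                         # one time step of cldamt

print(f"cldamt (float64):        {full_float64 / 1e9:.1f} GB")
print(f"cell_origin (float32):   {full_float32 / 1e9:.1f} GB")
print(f"cldamt_types (float64):  {full_by_type / 1e9:.1f} GB")
print(f"one cldamt time step:    {one_chunk / 1e3:.1f} kB")
```

Selecting a single time step of a (time, lat, lon) variable therefore touches one chunk of roughly 0.5 MB (uncompressed) instead of the full variable.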

Example plot

>>> ds.tau.sel(time='2017-05-01 00:00:00').plot()

[Image: output of the plot command above]

Reproduce reference files

DVC is used to track the workflow that creates the reference files. The individual commands are listed in the dvc.yaml file and can be rerun with dvc repro. This step should only be necessary if the ISCCP dataset on the NOAA S3 bucket changes and errors occur.

Similar work

These datasets seem to be unavailable