Proposed Recipes for ISIMIP #158

Open
larsbuntemeyer opened this issue Jul 21, 2022 · 3 comments

Comments

@larsbuntemeyer

larsbuntemeyer commented Jul 21, 2022

Source Dataset

ISIMIP provides bias-adjusted CMIP6 datasets.

Quote:

The Inter-Sectoral Impact Model Intercomparison Project (ISIMIP) offers a framework for consistently projecting the impacts of climate change across affected sectors and spatial scales. An international network of climate-impact modellers contribute to a comprehensive and consistent picture of the world under different climate-change scenarios.

Transformation / Alignment / Merging

The files should be concatenated along the time dimension. The directory structure is similar to ESGF; in fact, the data used to be available through ESGF as well, but that is no longer the case.

Output Dataset

Zarr output format.

@chuckwondo

chuckwondo commented Jul 22, 2022

@larsbuntemeyer, I've started to take a stab at this, but have some questions about the required dimensions.

First, I want to understand how many recipes make sense for the data:

  1. In looking at https://data.isimip.org/, it appears that you would want at least 3 recipes: (a) climate forcing, (b) socioeconomic forcing, and (c) static geographic information. Are those indeed the only top-level collections?
  2. Within each of those 3, there are 4 simulation rounds. Do you want each of those simulation rounds to be a separate recipe, thus resulting in 3 x 4 = 12 recipes?

Further, in looking at the climate forcing data, it appears that the following dimensions might make sense:

  1. year range (e.g., 1901-1910)
  2. climate forcing (e.g., GSWP3-EWEMBI)
  3. climate variable (e.g., huss)

Please let me know if that makes sense, or if I'm off base (please know that I'm a noob to all of this, so I'm not familiar with all of the domain-specific terminology).

So far, not including "climate forcing" as a dimension (i.e., only year range and climate variable dimensions), I have this:

from typing import Tuple
from pangeo_forge_recipes.patterns import ConcatDim, FilePattern, MergeDim


def make_url(variable: str, year_range: Tuple[int, int]):
    start_year, end_year = year_range
    template = (
        "https://files.isimip.org/ISIMIP2a/InputData/climate_co2/climate/HistObs/"
        "GSWP3-EWEMBI/{variable}_gswp3-ewembi_{start_year}_{end_year}.nc4"
    )

    return template.format(
        variable=variable,
        start_year=start_year,
        end_year=end_year
    )


year_ranges = (
    *((start_year, start_year + 9) for start_year in range(1901, 2011, 10)),
    (2011, 2016)
)
variables = (
    "huss",
    "pr",
    "ps",
    "rhs",
    "rlds",
    "rsds",
    "tas",
    "tasmax",
    "tasmin",
    "wind",
)

pattern = FilePattern(
    make_url,
    MergeDim(name="variable", keys=variables),
    ConcatDim(name="year_range", keys=year_ranges, nitems_per_file=10),
)

The URLs of the items generated by the pattern look like this:

https://files.isimip.org/ISIMIP2a/InputData/climate_co2/climate/HistObs/GSWP3-EWEMBI/huss_gswp3-ewembi_1901_1910.nc4
https://files.isimip.org/ISIMIP2a/InputData/climate_co2/climate/HistObs/GSWP3-EWEMBI/huss_gswp3-ewembi_1911_1920.nc4
...
https://files.isimip.org/ISIMIP2a/InputData/climate_co2/climate/HistObs/GSWP3-EWEMBI/huss_gswp3-ewembi_2011_2016.nc4
https://files.isimip.org/ISIMIP2a/InputData/climate_co2/climate/HistObs/GSWP3-EWEMBI/pr_gswp3-ewembi_1901_1910.nc4
https://files.isimip.org/ISIMIP2a/InputData/climate_co2/climate/HistObs/GSWP3-EWEMBI/pr_gswp3-ewembi_1911_1920.nc4
...
https://files.isimip.org/ISIMIP2a/InputData/climate_co2/climate/HistObs/GSWP3-EWEMBI/wind_gswp3-ewembi_2001_2010.nc4
https://files.isimip.org/ISIMIP2a/InputData/climate_co2/climate/HistObs/GSWP3-EWEMBI/wind_gswp3-ewembi_2011_2016.nc4
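
For a quick sanity check without installing pangeo_forge_recipes, the same URL list can be enumerated with the standard library alone. This is only a sketch: it reuses the URL template, variables, and year ranges from the snippet above, and the names URL_TEMPLATE and list_urls are made up for illustration. Note that every entry in the variables tuple ends with a comma, since Python silently concatenates adjacent string literals that are not separated by one.

```python
from itertools import product
from typing import List, Tuple

# URL template copied from the recipe draft above.
URL_TEMPLATE = (
    "https://files.isimip.org/ISIMIP2a/InputData/climate_co2/climate/HistObs/"
    "GSWP3-EWEMBI/{variable}_gswp3-ewembi_{start_year}_{end_year}.nc4"
)

# One entry per line, each followed by a comma.
VARIABLES: Tuple[str, ...] = (
    "huss",
    "pr",
    "ps",
    "rhs",
    "rlds",
    "rsds",
    "tas",
    "tasmax",
    "tasmin",
    "wind",
)

# Eleven 10-year files (1901-1910 ... 2001-2010) plus one 6-year file.
YEAR_RANGES: Tuple[Tuple[int, int], ...] = tuple(
    (start, start + 9) for start in range(1901, 2011, 10)
) + ((2011, 2016),)


def list_urls() -> List[str]:
    """Hypothetical helper: build one URL per (variable, year range) pair."""
    return [
        URL_TEMPLATE.format(variable=variable, start_year=start, end_year=end)
        for variable, (start, end) in product(VARIABLES, YEAR_RANGES)
    ]


urls = list_urls()
print(len(urls))  # number of (variable, year range) combinations
print(urls[0])
print(urls[-1])
```

The sketch only builds strings; it does not check that each file actually exists on the server.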

I've managed to bend Ryan Abernathy's (@rabernat) ear about this during the ESIP Summer 2022 meeting, so I'm looking to get some traction while it's still fresh in my mind.

@chuckwondo

chuckwondo commented Jul 22, 2022

@rabernat, each file contains 10 years of data, except for the last file in each group, which contains only 6 years (2011-2016). Does specifying nitems_per_file=10 cause a problem for those files that don't span 10 years?
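
As far as I understand it, nitems_per_file counts the items along the concat dimension inside each file (i.e., time steps), not years. The sketch below estimates those counts under two assumptions that are not confirmed anywhere in this thread: that the files hold daily time steps, and that they use a standard Gregorian calendar. The helper name days_in_file is made up for illustration.

```python
from datetime import date
from typing import Dict, Tuple

YearRange = Tuple[int, int]


def days_in_file(start_year: int, end_year: int) -> int:
    """Hypothetical helper: days from Jan 1 of start_year through Dec 31 of end_year.

    Assumes daily time steps and a standard Gregorian calendar.
    """
    return (date(end_year + 1, 1, 1) - date(start_year, 1, 1)).days


# Same year ranges as in the recipe draft above.
year_ranges: Tuple[YearRange, ...] = tuple(
    (start, start + 9) for start in range(1901, 2011, 10)
) + ((2011, 2016),)

counts: Dict[YearRange, int] = {yr: days_in_file(*yr) for yr in year_ranges}

for (start, end), n_days in counts.items():
    print(f"{start}-{end}: {n_days} days")

# More than one distinct value means no single nitems_per_file fits all files.
print(sorted(set(counts.values())))
```

If those assumptions hold, the count differs between files (leap years, plus the shorter final file), so a single constant would not describe every file. Since nitems_per_file is optional, leaving it out may be the safer choice here.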

@larsbuntemeyer
Author

@chuckwondo, thanks for picking this up so quickly. I haven't really been able to wrap my head around those questions yet, and I just wanted to drop that recipe idea here since we have some PhD things coming up and I wanted to avoid everybody starting to download those datasets, urgh... 😩 As you mentioned, it surely makes sense to split those recipes up, at least by simulation round. For more details, I first need to get more experience with those datasets...

I think building up URLs from dataset attributes makes total sense for ISIMIP, but I also wanted to draw attention to the search API, which might give some more control.
