[Feature]: Retain bounds and compute time point for group averaging operations #565

Open
pochedls opened this issue Nov 13, 2023 · 2 comments
Labels
type: enhancement New enhancement request

Comments

@pochedls
Collaborator

Is your feature request related to a problem?

Time bounds are dropped when computing group averages and the time point is set to the beginning of the averaging period.

Note that time bounds exist in the initial dataset:

# import xcdat
import xcdat as xc
# open dataset
dpath = '/p/user_pub/work/CMIP6/CMIP/E3SM-Project/E3SM-2-0/historical/r1i1p1f1/Amon/ts/gr/v20220830/'
ds = xc.open_mfdataset(dpath)
# show time bounds present
ds.time_bnds

<xarray.DataArray 'time_bnds' (time: 1980, bnds: 2)>
dask.array<concatenate, shape=(1980, 2), dtype=object, chunksize=(600, 2), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
Dimensions without coordinates: bnds

But the bounds disappear for the group average values:

# compute annual averages
ds = ds.temporal.group_average('ts', freq='year')
# extract time_bnds
ds.time_bnds

AttributeError: 'Dataset' object has no attribute 'time_bnds'

And the time point for each group average is at the beginning of the period:

# inspect time values
ds.time.values

array([cftime.DatetimeNoLeap(1850, 1, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(1851, 1, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(1852, 1, 1, 0, 0, 0, 0, has_year_zero=True),
...

Describe the solution you'd like

  1. Ideally, group averaging calculations would return time_bnds. I think the returned bounds could be the lowermost and uppermost bounds of the averaged data.

  2. The returned time points could then be the mean of these returned time bounds, which would be more representative than a time point at the beginning of the averaged period.
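To make the two points above concrete, here is a minimal sketch in plain Python. It is not xcdat's implementation: the helper names `group_bounds_and_midpoints` and `month_start` are hypothetical, and it uses the standard library `datetime` (Gregorian calendar) rather than the `cftime` no-leap objects in the dataset. For each group it takes the lowermost and uppermost bound and places the time point midway between them.

```python
from datetime import datetime
from itertools import groupby


def group_bounds_and_midpoints(bounds, key):
    """Collapse per-sample (lower, upper) bounds into one entry per group.

    bounds: list of (lower, upper) datetimes, sorted in time.
    key: function mapping a sample's lower bound to its group label.
    Returns a list of (label, lower, upper, midpoint) tuples.
    """
    result = []
    for label, members in groupby(bounds, key=lambda b: key(b[0])):
        members = list(members)
        lower = min(lo for lo, _ in members)   # lowermost bound in the group
        upper = max(hi for _, hi in members)   # uppermost bound in the group
        midpoint = lower + (upper - lower) / 2
        result.append((label, lower, upper, midpoint))
    return result


def month_start(year, month):
    """First instant of a month; month 13 rolls over to January of the next year."""
    return datetime(year + (month - 1) // 12, (month - 1) % 12 + 1, 1)


# Hypothetical monthly bounds for two years of data.
monthly_bounds = [
    (month_start(y, m), month_start(y, m + 1))
    for y in (1850, 1851)
    for m in range(1, 13)
]

annual = group_bounds_and_midpoints(monthly_bounds, key=lambda t: t.year)
for label, lower, upper, midpoint in annual:
    print(label, lower, upper, midpoint)
```

With these inputs the 1850 group spans 1850-01-01 to 1851-01-01, and its time point falls in early July rather than on January 1.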

Describe alternatives you've considered

No response

Additional context

No response

@taylor13

To be clear about what the bounds should be on the mean, consider a daily mean computed from four 6-hourly mean samples centered at 3Z, 9Z, 15Z, and 21Z. If the 6-hour means have bounds 0-6Z, 6-12Z, 12-18Z, and 18-24Z, then we want the bounds of the daily mean to extend from the beginning of the interval represented by the first sample (i.e., from 0Z) to the end of the last sample (i.e., to 24Z, or 0Z of the next day). So for a daily mean for the first day of this year, the bounds would be 2024-01-01 0:00:00 and 2024-01-02 0:00:00, while the coordinate value would be 2024-01-01 12:00:00.

@taylor13

Also, in the above example, one could simply average the four 6-hour time means because they were fully contained within a single day (and fully covered all the hours of the day). If the 6-hourly time-mean samples were centered instead on 0Z, 6Z, 12Z, and 18Z, then to form a daily mean extending from 0-24Z, you would need to compute the mean as (0.5*x0 + x1 + x2 + x3 + 0.5*x4)/4, where x4 is the sample centered on 0Z of the next day. That is, each sample should be weighted by the time interval overlapping the daily time interval of interest.
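The overlap weighting described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not xcdat code: `overlap_weighted_mean` is a hypothetical helper, the sample values are made up, and standard library `datetime` objects stand in for `cftime` ones. Each sample is weighted by how much of its own bounds falls inside the averaging window.

```python
from datetime import datetime, timedelta


def overlap_weighted_mean(values, bounds, window):
    """Mean of `values`, each weighted by the overlap of its bounds with `window`.

    values: list of floats (time-mean samples).
    bounds: list of (lower, upper) datetimes, one pair per sample.
    window: (lower, upper) datetimes of the averaging period.
    """
    win_lo, win_hi = window
    total = 0.0
    weight_sum = 0.0
    for value, (lo, hi) in zip(values, bounds):
        # Length of the intersection of [lo, hi] with the window, in seconds.
        overlap = (min(hi, win_hi) - max(lo, win_lo)).total_seconds()
        if overlap > 0:
            total += value * overlap
            weight_sum += overlap
    return total / weight_sum


day = (datetime(2024, 1, 1), datetime(2024, 1, 2))
half = timedelta(hours=3)

# Samples centered on 0Z, 6Z, 12Z, 18Z, and 0Z of the next day (x0..x4).
centers = [day[0] + timedelta(hours=6 * i) for i in range(5)]
offset_bounds = [(c - half, c + half) for c in centers]
x = [1.0, 4.0, 6.0, 8.0, 3.0]  # made-up sample values

weighted = overlap_weighted_mean(x, offset_bounds, day)
by_formula = (0.5 * x[0] + x[1] + x[2] + x[3] + 0.5 * x[4]) / 4
print(weighted, by_formula)

# Samples centered on 3Z, 9Z, 15Z, 21Z lie fully inside the day,
# so the weighted mean reduces to a simple average.
aligned_bounds = [
    (day[0] + timedelta(hours=6 * i), day[0] + timedelta(hours=6 * (i + 1)))
    for i in range(4)
]
aligned = overlap_weighted_mean([1.0, 2.0, 3.0, 4.0], aligned_bounds, day)
print(aligned)
```

In the offset case the first and last samples each overlap the day by 3 of their 6 hours, which gives the 0.5 weights in the formula; in the aligned case all weights are equal.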
