netCDF time encoding challenges #1203

Open

rabernat opened this issue Sep 8, 2020 · 9 comments

rabernat commented Sep 8, 2020

I am working with @neerajabhamidipati to create some analysis-ready MOM6 data for the Ocean Eddy CPT.

I have noticed some issues with time encoding in MOM6 files. To summarize, they are the following:

  • Some time variables in our data (such as time itself) use a non-CF-compliant calendar attribute: THIRTY_DAY_MONTHS. The CF conventions state it should be 360_day.
  • Other time variables (average_T1, average_T2, time_bnds) do not have the calendar attribute set, but they do have timelike units: days since 0001-01-01 00:00:00. This leads to problems decoding these variables with cftime inside xarray. This could be considered an xarray bug, since perhaps xarray should "know" to propagate the time attributes to other timelike variables. But currently, these variables can't be decoded.

A workaround to decode times in xarray is as follows:

import xarray as xr

ds = xr.open_dataset('file.nc', decode_times=False)

for var in ds.variables:
    # fix wrong calendar encoding
    if ds[var].attrs.get('calendar') == 'THIRTY_DAY_MONTHS':
        ds[var].attrs['calendar'] = '360_day'
    # fix missing calendar encoding
    elif ds[var].dims == ('time',) and ds[var].attrs.get('units', '').startswith('days'):
        ds[var].attrs['calendar'] = '360_day'
 
ds = xr.decode_cf(ds)  

Note that you need cftime installed for this to work.

For reference, here is the CDL header (from ncdump -h) of one of the files:

netcdf averages_00000002 {
dimensions:
	xq = 241 ;
	yh = 560 ;
	zl = 15 ;
	time = UNLIMITED ; // (100 currently)
	nv = 2 ;
	xh = 240 ;
	yq = 561 ;
variables:
	double xq(xq) ;
		xq:long_name = "q point nominal longitude" ;
		xq:units = "degrees_east" ;
		xq:cartesian_axis = "X" ;
	double yh(yh) ;
		yh:long_name = "h point nominal latitude" ;
		yh:units = "degrees_north" ;
		yh:cartesian_axis = "Y" ;
	double zl(zl) ;
		zl:long_name = "Layer Target Potential Density" ;
		zl:units = "kg m-3" ;
		zl:cartesian_axis = "Z" ;
		zl:positive = "up" ;
	double time(time) ;
		time:long_name = "time" ;
		time:units = "days since 0001-01-01 00:00:00" ;
		time:cartesian_axis = "T" ;
		time:calendar_type = "THIRTY_DAY_MONTHS" ;
		time:calendar = "THIRTY_DAY_MONTHS" ;
		time:bounds = "time_bnds" ;
	double nv(nv) ;
		nv:long_name = "vertex number" ;
		nv:units = "none" ;
		nv:cartesian_axis = "N" ;
	double xh(xh) ;
		xh:long_name = "h point nominal longitude" ;
		xh:units = "degrees_east" ;
		xh:cartesian_axis = "X" ;
	double yq(yq) ;
		yq:long_name = "q point nominal latitude" ;
		yq:units = "degrees_north" ;
		yq:cartesian_axis = "Y" ;
	float u(time, zl, yh, xq) ;
		u:long_name = "Zonal velocity" ;
		u:units = "m s-1" ;
		u:missing_value = 1.e+20f ;
		u:_FillValue = 1.e+20f ;
		u:cell_methods = "zl:mean yh:mean xq:point time: mean" ;
		u:time_avg_info = "average_T1,average_T2,average_DT" ;
		u:interp_method = "none" ;
	float v(time, zl, yq, xh) ;
		v:long_name = "Meridional velocity" ;
		v:units = "m s-1" ;
		v:missing_value = 1.e+20f ;
		v:_FillValue = 1.e+20f ;
		v:cell_methods = "zl:mean yq:point xh:mean time: mean" ;
		v:time_avg_info = "average_T1,average_T2,average_DT" ;
		v:interp_method = "none" ;
	float h(time, zl, yh, xh) ;
		h:long_name = "Layer Thickness" ;
		h:units = "m" ;
		h:missing_value = 1.e+20f ;
		h:_FillValue = 1.e+20f ;
		h:cell_methods = "area:mean zl:sum yh:mean xh:mean time: mean" ;
		h:cell_measures = "area: area_t" ;
		h:time_avg_info = "average_T1,average_T2,average_DT" ;
	float uh(time, zl, yh, xq) ;
		uh:long_name = "Zonal Thickness Flux" ;
		uh:units = "m3 s-1" ;
		uh:missing_value = 1.e+20f ;
		uh:_FillValue = 1.e+20f ;
		uh:cell_methods = "zl:sum yh:sum xq:point time: mean" ;
		uh:time_avg_info = "average_T1,average_T2,average_DT" ;
		uh:interp_method = "none" ;
	float vh(time, zl, yq, xh) ;
		vh:long_name = "Meridional Thickness Flux" ;
		vh:units = "m3 s-1" ;
		vh:missing_value = 1.e+20f ;
		vh:_FillValue = 1.e+20f ;
		vh:cell_methods = "zl:sum yq:point xh:sum time: mean" ;
		vh:time_avg_info = "average_T1,average_T2,average_DT" ;
		vh:interp_method = "none" ;
	float RV(time, zl, yq, xq) ;
		RV:long_name = "Relative Vorticity" ;
		RV:units = "s-1" ;
		RV:missing_value = 1.e+20f ;
		RV:_FillValue = 1.e+20f ;
		RV:cell_methods = "zl:mean yq:point xq:point time: mean" ;
		RV:time_avg_info = "average_T1,average_T2,average_DT" ;
		RV:interp_method = "none" ;
	float RV2(time, zl, yq, xq) ;
		RV2:long_name = "Relative Vorticity" ;
		RV2:units = "s-1" ;
		RV2:missing_value = 1.e+20f ;
		RV2:_FillValue = 1.e+20f ;
		RV2:cell_methods = "zl:mean yq:point xq:point time: mean_pow(2)" ;
		RV2:time_avg_info = "average_T1,average_T2,average_DT" ;
		RV2:interp_method = "none" ;
	float NoSt2(time, zl, yh, xh) ;
		NoSt2:long_name = "Normal Stress" ;
		NoSt2:units = "s-1" ;
		NoSt2:missing_value = 1.e+20f ;
		NoSt2:_FillValue = 1.e+20f ;
		NoSt2:cell_methods = "area:mean zl:mean yh:mean xh:mean time: mean_pow(2)" ;
		NoSt2:cell_measures = "area: area_t" ;
		NoSt2:time_avg_info = "average_T1,average_T2,average_DT" ;
	float ShSt2(time, zl, yq, xq) ;
		ShSt2:long_name = "Shear Stress" ;
		ShSt2:units = "s-1" ;
		ShSt2:missing_value = 1.e+20f ;
		ShSt2:_FillValue = 1.e+20f ;
		ShSt2:cell_methods = "zl:mean yq:point xq:point time: mean_pow(2)" ;
		ShSt2:time_avg_info = "average_T1,average_T2,average_DT" ;
		ShSt2:interp_method = "none" ;
	float dudt(time, zl, yh, xq) ;
		dudt:long_name = "Zonal Acceleration" ;
		dudt:units = "m s-2" ;
		dudt:missing_value = 1.e+20f ;
		dudt:_FillValue = 1.e+20f ;
		dudt:cell_methods = "zl:mean yh:mean xq:point time: mean" ;
		dudt:time_avg_info = "average_T1,average_T2,average_DT" ;
		dudt:interp_method = "none" ;
	float dvdt(time, zl, yq, xh) ;
		dvdt:long_name = "Meridional Acceleration" ;
		dvdt:units = "m s-2" ;
		dvdt:missing_value = 1.e+20f ;
		dvdt:_FillValue = 1.e+20f ;
		dvdt:cell_methods = "zl:mean yq:point xh:mean time: mean" ;
		dvdt:time_avg_info = "average_T1,average_T2,average_DT" ;
		dvdt:interp_method = "none" ;
	float CAu(time, zl, yh, xq) ;
		CAu:long_name = "Zonal Coriolis and Advective Acceleration" ;
		CAu:units = "m s-2" ;
		CAu:missing_value = 1.e+20f ;
		CAu:_FillValue = 1.e+20f ;
		CAu:cell_methods = "zl:mean yh:mean xq:point time: mean" ;
		CAu:time_avg_info = "average_T1,average_T2,average_DT" ;
		CAu:interp_method = "none" ;
	float CAv(time, zl, yq, xh) ;
		CAv:long_name = "Meridional Coriolis and Advective Acceleration" ;
		CAv:units = "m s-2" ;
		CAv:missing_value = 1.e+20f ;
		CAv:_FillValue = 1.e+20f ;
		CAv:cell_methods = "zl:mean yq:point xh:mean time: mean" ;
		CAv:time_avg_info = "average_T1,average_T2,average_DT" ;
		CAv:interp_method = "none" ;
	float PFu(time, zl, yh, xq) ;
		PFu:long_name = "Zonal Pressure Force Acceleration" ;
		PFu:units = "m s-2" ;
		PFu:missing_value = 1.e+20f ;
		PFu:_FillValue = 1.e+20f ;
		PFu:cell_methods = "zl:mean yh:mean xq:point time: mean" ;
		PFu:time_avg_info = "average_T1,average_T2,average_DT" ;
		PFu:interp_method = "none" ;
	float PFv(time, zl, yq, xh) ;
		PFv:long_name = "Meridional Pressure Force Acceleration" ;
		PFv:units = "m s-2" ;
		PFv:missing_value = 1.e+20f ;
		PFv:_FillValue = 1.e+20f ;
		PFv:cell_methods = "zl:mean yq:point xh:mean time: mean" ;
		PFv:time_avg_info = "average_T1,average_T2,average_DT" ;
		PFv:interp_method = "none" ;
	double average_T1(time) ;
		average_T1:long_name = "Start time for average period" ;
		average_T1:units = "days since 0001-01-01 00:00:00" ;
		average_T1:missing_value = 1.e+20 ;
		average_T1:_FillValue = 1.e+20 ;
	double average_T2(time) ;
		average_T2:long_name = "End time for average period" ;
		average_T2:units = "days since 0001-01-01 00:00:00" ;
		average_T2:missing_value = 1.e+20 ;
		average_T2:_FillValue = 1.e+20 ;
	double average_DT(time) ;
		average_DT:long_name = "Length of average period" ;
		average_DT:units = "days" ;
		average_DT:missing_value = 1.e+20 ;
		average_DT:_FillValue = 1.e+20 ;
	double time_bnds(time, nv) ;
		time_bnds:long_name = "time axis boundaries" ;
		time_bnds:units = "days" ;
		time_bnds:missing_value = 1.e+20 ;
		time_bnds:_FillValue = 1.e+20 ;
// global attributes:
		:filename = "averages_00000002.nc" ;
		:title = "NeverWorld2" ;
		:associated_files = "area_t: static.nc" ;
		:grid_type = "regular" ;
		:grid_tile = "N/A" ;
}
rabernat (Author) commented Sep 8, 2020

From my point of view, I would prefer if MOM6 did the following:

  • use the CF-compliant name for the calendar attribute: 360_day
  • add calendar attributes to all timelike variables

adcroft (Collaborator) commented Sep 8, 2020

CF compliance and the details of the file format are, for the most part, handled by FMS. All of our output passes through FMS's diag_manager. Fixing this requires a change to FMS. We've traced the source of the "THIRTY_DAY_MONTHS" string to https://github.com/NOAA-GFDL/FMS/blob/master/time_manager/time_manager.F90#L3275. A simple fix is to change:

    valid_calendar_types = 'THIRTY_DAY_MONTHS       '

to

    valid_calendar_types = '360_DAY                 '

which I've suggested as a temporary workaround to @neerajabhamidipati. I've submitted PR NOAA-GFDL/FMS#574, but there's no telling how long it will take to make it into a tagged branch.

adcroft (Collaborator) commented Sep 8, 2020

I agree that where a time-like variable has units such as "days since ...", a calendar attribute would make it less ambiguous and easier for software to interpret. Changing this in FMS will be a much bigger exercise than the one-liner above. However, in the CF specification, all of the examples of time_bnds have no attributes at all. I presume this is because time_bnds uses the dimension time and is associated with time by name. Unfortunately, the first paragraph of the calendar section says "... we recommend that the calendar be specified by the attribute calendar which is assigned to the time coordinate variable.", which I think leaves open the possibility that the calendar attribute need only be associated with the time dimension variable. :(

rabernat (Author) commented Sep 9, 2020

which I think leaves open the possibility that the calendar attribute need only be associated with the time dimension variable

So in this case, we probably need to make xarray a bit smarter about how it decodes timelike variables. Since pydata/xarray#2571 (see the v0.11.1 release notes), xarray has done this correctly for time_bnds. This is doable because time references time_bnds explicitly via the bounds attribute. Variables like average_T1 and average_T2 are much trickier, since there is no explicit linkage between them and time, other than the fact that they have time as their only dimension. It might be assuming too much to expect the calendar attribute to be propagated automatically to all such variables. On the other hand, it would be nice if xarray's time decoding did not completely fail for the whole file in these cases.
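In the meantime, a variant of the workaround above could follow the explicit metadata links that do exist in this particular file (the bounds attribute on time and the GFDL-specific time_avg_info attribute on the data variables) instead of guessing from dimensions. This is only a sketch, using the attribute and variable names from the header above:

import xarray as xr

ds = xr.open_dataset('file.nc', decode_times=False)

time_attrs = ds['time'].attrs
calendar = '360_day'
ds['time'].attrs['calendar'] = calendar  # correct THIRTY_DAY_MONTHS on time itself

# collect the variables that are linked to time explicitly: the bounds
# attribute on time, plus the GFDL-specific time_avg_info attribute that
# each data variable carries
linked = set()
if 'bounds' in time_attrs:
    linked.add(time_attrs['bounds'])
for var in ds.data_vars:
    linked.update(ds[var].attrs.get('time_avg_info', '').split(','))

# propagate the calendar only to variables that actually exist in the file
for name in linked & set(ds.variables):
    ds[name].attrs['calendar'] = calendar

ds = xr.decode_cf(ds)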

I'll open some kind of xarray issue to work out that case.

adcroft (Collaborator) commented Sep 9, 2020

In the case of average_T1 and average_T2, it seems to me we should investigate whether we can add the appropriate attributes. I don't think these are mentioned in the CF document (?), in which case they are simply data variables with units of date (which need a calendar). Given how quickly the FMS team responded to the PR above, this might be the best option, but I won't be able to look into it immediately myself.
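For files that have already been written, the attributes could also be patched in place as a stopgap; a minimal sketch with netCDF4, with the filename and variable names taken from the header above:

import netCDF4

# patch one existing output file in place (mode 'r+' opens it for editing)
with netCDF4.Dataset('averages_00000002.nc', 'r+') as nc:
    for name in ('time', 'average_T1', 'average_T2', 'time_bnds'):
        if name in nc.variables:
            nc.variables[name].setncattr('calendar', '360_day')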

balaji-gfdl commented

average_t1, _t2, and _dt are GFDLisms and not part of CF. They allow you to perform time averaging where end intervals that are shared across two averaging periods can be half-weighted.

Hallberg-NOAA (Collaborator) commented

Does anyone foresee any downside in adding a calendar and other related attributes to these variables?

adcroft (Collaborator) commented Sep 9, 2020

It is hard to imagine a downside to adding these attributes. The reason I haven't rushed in yet (apart from being busy) is that I suspect this would require somewhat more code than the one-line change in the above PR.

klindsay28 (Contributor) commented

I think the primary risk is that putting metadata in multiple places opens the door to it becoming inconsistent, either through a later change to MOM6 and/or FMS, or through some post-processing tool manipulating the metadata in one place and not the other.

Quoting myself from an xarray issue, the CF conventions section on the bounds attribute states:

Since a boundary variable is considered to be part of a coordinate variable’s metadata, it is not necessary to provide it with attributes such as long_name and units.

Boundary variable attributes which determine the coordinate type (units, standard_name, axis and positive) or those which affect the interpretation of the array values (units, calendar, leap_month, leap_year and month_lengths) must always agree exactly with the same attributes of its associated coordinate, scalar coordinate or auxiliary coordinate variable. To avoid duplication, however, it is recommended that these are not provided to a boundary variable.

So CF recommends that time_bnds not have these attributes. Note that this is just a recommendation and not a requirement; adding more complete metadata to time_bnds would not violate CF.

Also note that CF has nothing to say regarding average_T1, average_T2, and average_DT (as @balaji-gfdl pointed out). The lack of a CF-based connection between time and these latter variables might be an argument for giving these variables more complete metadata. Consistency might then be an argument for also adding more complete metadata to time_bnds.

I think the problem in that xarray issue arose because a tool in the user's workflow messed up the metadata. I recall looking into it and concluding that the culprit was CDO: experimenting with it, I found that it could alter the value of the time variable's bounds attribute without renaming the variable it pointed to, breaking the metadata connection. It is unfortunate that tools do such things.
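One cheap guard against that failure mode is to verify, before relying on the link, that the variable named by time's bounds attribute actually exists; a minimal sketch in xarray (this check is not part of any existing tool):

import xarray as xr

ds = xr.open_dataset('file.nc', decode_times=False)

# check that the metadata link from time to its bounds variable is intact
bounds_name = ds['time'].attrs.get('bounds')
if bounds_name and bounds_name not in ds.variables:
    raise ValueError(f"time:bounds points to {bounds_name!r}, which is not in the file")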
