Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

problem with slicing of time in open_dataset_extra #797

Open
SCLaan opened this issue Feb 27, 2024 · 2 comments
Open

problem with slicing of time in open_dataset_extra #797

SCLaan opened this issue Feb 27, 2024 · 2 comments

Comments

@SCLaan
Copy link

SCLaan commented Feb 27, 2024

The time slicing in the function open_dataset_extra in interpolate_grid2bnd.py causes an error if the timestamp of the requested tstart is not at midnight (00:00).

Think of an xarray.DataArray, where:

In  : data_xr.time
Out : 
<xarray.DataArray 'time' (time: 4)> Size: 28B
array(['2021-12-31T00:00:00.000000000', '2022-01-01T00:00:00.000000000',
       '2022-01-02T00:00:00.000000000', '2022-01-03T00:00:00.000000000'],
       dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 28B 2021-12-31 2022-01-01 ... 2022-01-03
Attributes:
    valid_min:  2021-12-31T00:00:00.000000000
    valid_max:  2022-01-03T00:00:00.000000000

and: tstart, tstop = Timestamp('2021-12-31 12:00:00'), Timestamp('2022-01-02 12:00:00')

This line:

data_xr_vars = data_xr[quantity_list].sel(time=slice(tstart,tstop))

returns an xarray.DataArray without the first and last times:

In  : data_xr.time
Out : 
<xarray.DataArray 'time' (time: 2)> Size: 14B
array(['2022-01-01T00:00:00.000000000','2022-01-02T00:00:00.000000000'],
       dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 14B 2022-01-01 2022-01-02 ... 2022-01-02
Attributes:
    valid_min:  2021-12-31T00:00:00.000000000
    valid_max:  2022-01-03T00:00:00.000000000

This causes an error in the following line, where a check is done:

check_time_extent(data_xr_vars, tstart, tstop)

I suggest to change line 335 to:
data_xr_vars = data_xr[quantity_list].sel(time=slice(tstart.floor('1d'),tstop.ceil('1d')))

@SCLaan
Copy link
Author

SCLaan commented Feb 27, 2024

MWE:

import os
from pathlib import Path

import dfm_tools as dfmt

# Settings
timevect    = {'t_start' : '2021-12-31 12:00:00', # include last time step in year before simulation starts
               't_end'   : '2022-01-02 12:00:00'} # include first time step in year after simulation finishes
coords      = [-76,-59,35,46]
refdate_str = 'MINUTES since 2015-01-01 00:00:00'
datadir     = r'.\data_test' # directory for downloaded files
overwrite   = True # overwrite existing files?

# Download data
os.makedirs(datadir, exist_ok=True)
dfmt.download_CMEMS('zos',
                    longitude_min=coords[0],longitude_max=coords[1],
                    latitude_min=coords[2], latitude_max=coords[3],
                    date_min=timevect['t_start'], date_max=timevect['t_end'],
                    dir_output=datadir, overwrite=overwrite)

# Convert to .bc
conversion_dict  = dfmt.get_conversion_dict()
data_xr_vars = dfmt.open_dataset_extra(dir_pattern=Path(datadir,'{ncvarname}_*.nc'),
                                       quantity='waterlevelbnd', reverse_depth=True,
                                       tstart=timevect['t_start'], tstop=timevect['t_end'],
                                       conversion_dict=conversion_dict,
                                       refdate_str=refdate_str)

Another good example would be to download monthly output and convert this to a period that doesn't start or end on the first of the month. For example when downloading and converting monthly-mean NO3 (freq='M') from dataset_id='global-analysis-forecast-bio-001-028-monthly':

import os
from pathlib import Path

import dfm_tools as dfmt

# Settings
timevect    = {'t_start' : '2021-12-31 12:00:00', # include last time step in year before simulation starts
               't_end'   : '2022-01-02 12:00:00'} # include first time step in year after simulation finishes
coords      = [-76,-59,35,46]
refdate_str = 'MINUTES since 2015-01-01 00:00:00'
datadir     = r'.\data_test' # directory for downloaded files
overwrite   = True # overwrite existing files?

# Download data
os.makedirs(datadir, exist_ok=True)
dfmt.download_CMEMS('no3',
                    longitude_min=coords[0],longitude_max=coords[1],
                    latitude_min=coords[2], latitude_max=coords[3],
                    date_min=timevect['t_start'], date_max=timevect['t_end'],
                    freq='M', dataset_id='global-analysis-forecast-bio-001-028-monthly',
                    dir_output=datadir, overwrite=overwrite)

# Convert to .bc
conversion_dict  = dfmt.get_conversion_dict()
data_xr_vars = dfmt.open_dataset_extra(dir_pattern=Path(datadir,'{ncvarname}_*.nc'),
                                       quantity='tracerbndNO3', reverse_depth=True,
                                       tstart=timevect['t_start'], tstop=timevect['t_end'],
                                       conversion_dict=conversion_dict,
                                       refdate_str=refdate_str)

This was referenced Mar 13, 2024
@veenstrajelmer
Copy link
Collaborator

veenstrajelmer commented Mar 14, 2024

This also happens in tests/examples/preprocess_interpolate_nc_to_bc.py now (#802), which was not the case before: "OutOfRangeError: requested tstop 2012-04-01 12:00:00 outside of available range 2012-01-16 12:00:00 to 2012-03-16 12:00:00."

import os
import dfm_tools as dfmt
import xarray as xr
file_pli = r'p:\archivedprojects\11208054-004-dcsm-fm\models\model_input\bnd_cond\pli\DCSM-FM_OB_all_20181108.pli'
tstart = '2012-01-16 12:00'
tstop = '2012-04-16 12:00'
dir_sourcefiles_waq = r'p:\archivedprojects\11206304-futuremares-rawdata-preps\python_scripts\ocean_boundaryCMEMS\data_monthly'
dir_pattern = os.path.join(dir_sourcefiles_waq,'cmems_mod_glo_bgc_my_0.25_P1M-m_{ncvarname}_*.nc') 
ds = xr.open_mfdataset(dir_pattern.format(ncvarname="no3"))
print(ds.time)
data_xr_vars = dfmt.open_dataset_extra(dir_pattern=dir_pattern, 
                                       quantity='tracerbndNO3',
                                       tstart=tstart, tstop=tstop)`

But this is actually since the source data is monthly and mixed noon/midnight times.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants