
[BUG] ERA5 Downloader -d flag error if run twice #227

Open
bnubald opened this issue Mar 10, 2024 · 1 comment
Labels: bug (Something isn't working), defensive (Attempting to cover different error scenarios)

Comments

@bnubald
Collaborator

bnubald commented Mar 10, 2024

  • IceNet version: v0.2.7
  • Python version: 3.11.7
  • Operating System: Linux x64

Description

Running the ERA5 downloader with the -d flag (which prevents deletion of temp files) causes an error on the second run of the same command.

A subsequent run of the same command does not show the error (since the temp file was deleted during the second run but not replaced because of the error).

What I Did

❯ icenet_data_era5 south -d --vars tas --levels '' 2022-6-1 2022-6-2
# Runs fine
❯ icenet_data_era5 south -d --vars tas --levels '' 2022-6-1 2022-6-2
# Error

Since this is the same as using the entrypoint in the command above, this might be unnecessary, but the error is the same via library usage (the date range does not matter), i.e. running the following script twice causes the issue (the relevant part is delete_tempfiles=False):

import numpy as np
import pandas as pd

# We also set the logging level so that we get some feedback from the API
import logging
logging.basicConfig(level=logging.INFO)

from icenet.data.interfaces.cds import ERA5Downloader

era5 = ERA5Downloader(
    var_names=["tas"],
    dates=[
        pd.to_datetime(date).date()
        for date in pd.date_range("2023-01-01", "2023-01-01", freq="D")
    ],
    delete_tempfiles=False,
    download=True,
    levels=[None],
    max_threads=8,
    postprocess=True,
    north=False,
    south=True,
    use_toolbox=False)
era5.download()
era5.regrid()

The error is:

❯ icenet_data_era5 south -d --vars tas --levels '' 2022-6-1 2022-6-2
[09-03-24 15:39:54 :INFO    ] - ERA5 Data Downloading
[09-03-24 15:39:54 :WARNING ] - !!! Deletions of temp files are switched off: be careful with this, you need to manage your files manually
[09-03-24 15:39:54 :INFO    ] - Building request(s), downloading and daily averaging from ERA5 API
[09-03-24 15:39:54 :INFO    ] - Processing single download for tas @ None with 2 dates
[09-03-24 15:39:54 :INFO    ] - Downloading data for tas...
2024-03-09 15:39:55,060 INFO Welcome to the CDS
[09-03-24 15:39:55 :INFO    ] - Welcome to the CDS
2024-03-09 15:39:55,061 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/reanalysis-era5-single-levels
[09-03-24 15:39:55 :INFO    ] - Sending request to https://cds.climate.copernicus.eu/api/v2/resources/reanalysis-era5-single-levels
2024-03-09 15:39:55,235 INFO Request is completed
[09-03-24 15:39:55 :INFO    ] - Request is completed
2024-03-09 15:39:55,236 INFO Downloading https://download-0018.copernicus-climate.eu/cache-compute-0018/cache/data1/adaptor.mars.internal-1709912224.1319973-6433-16-ed132964-18c4-4ef0-8c01-93aef39a8b1c.nc to /tmp/tmpmbyjwlqm/latlon_2022.nc.download (713.9M)
[09-03-24 15:39:55 :INFO    ] - Downloading https://download-0018.copernicus-climate.eu/cache-compute-0018/cache/data1/adaptor.mars.internal-1709912224.1319973-6433-16-ed132964-18c4-4ef0-8c01-93aef39a8b1c.nc to /tmp/tmpmbyjwlqm/latlon_2022.nc.download (713.9M)
2024-03-09 15:41:08,107 INFO Download rate 9.8M/s
[09-03-24 15:41:08 :INFO    ] - Download rate 9.8M/s
[09-03-24 15:41:08 :INFO    ] - Download completed: /tmp/tmpmbyjwlqm/latlon_2022.nc.download
/data/hpcdata/users/username/miniconda3/envs/icenet-nb/lib/python3.11/site-packages/icenet/data/interfaces/downloader.py:287: UserWarning: Times can't be serialized faithfully to int64 with requested units 'days since 2022-01-01'. Serializing with units 'hours since 2022-01-01' instead. Set encoding['dtype'] to floating point dtype to serialize with units 'days since 2022-01-01'. Set encoding['units'] to 'hours since 2022-01-01' to silence this warning .
  da.to_netcdf(latlon_path)
[09-03-24 15:41:17 :INFO    ] - Downloaded to ./data/era5/south/tas/latlon_2022.nc
[09-03-24 15:41:17 :INFO    ] - Postprocessing CDS API data at ./data/era5/south/tas/latlon_2022.nc
[09-03-24 15:41:18 :ERROR   ] - Thread failure: size of dimension 'latitude' on inputs was unexpectedly changed by applied function from 361 to 1. Only dimensions specified in ``exclude_dims`` with xarray.apply_ufunc are allowed to change size. The data returned was:

array([], shape=(0, 1, 1), dtype=float32)
Traceback (most recent call last):
  File "/data/hpcdata/users/username/miniconda3/envs/icenet-nb/lib/python3.11/site-packages/icenet/data/interfaces/downloader.py", line 226, in download
    future.result()
  File "/data/hpcdata/users/username/miniconda3/envs/icenet-nb/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/data/hpcdata/users/username/miniconda3/envs/icenet-nb/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/data/hpcdata/users/username/miniconda3/envs/icenet-nb/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/hpcdata/users/username/miniconda3/envs/icenet-nb/lib/python3.11/site-packages/icenet/data/interfaces/downloader.py", line 302, in _single_download
    self.postprocess(var, latlon_path)
  File "/data/hpcdata/users/username/miniconda3/envs/icenet-nb/lib/python3.11/site-packages/icenet/data/interfaces/cds.py", line 226, in postprocess
    da = da.where(da.time < pd.Timestamp(strip_dates_before), drop=True)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/hpcdata/users/username/miniconda3/envs/icenet-nb/lib/python3.11/site-packages/xarray/core/common.py", line 1250, in where
    return ops.where_method(self, cond, other)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/hpcdata/users/username/miniconda3/envs/icenet-nb/lib/python3.11/site-packages/xarray/core/ops.py", line 179, in where_method
    return apply_ufunc(
           ^^^^^^^^^^^^
  File "/data/hpcdata/users/username/miniconda3/envs/icenet-nb/lib/python3.11/site-packages/xarray/core/computation.py", line 1270, in apply_ufunc
    return apply_dataarray_vfunc(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/data/hpcdata/users/username/miniconda3/envs/icenet-nb/lib/python3.11/site-packages/xarray/core/computation.py", line 316, in apply_dataarray_vfunc
    result_var = func(*data_vars)
                 ^^^^^^^^^^^^^^^^
  File "/data/hpcdata/users/username/miniconda3/envs/icenet-nb/lib/python3.11/site-packages/xarray/core/computation.py", line 860, in apply_variable_ufunc
    raise ValueError(
ValueError: size of dimension 'latitude' on inputs was unexpectedly changed by applied function from 361 to 1. Only dimensions specified in ``exclude_dims`` with xarray.apply_ufunc are allowed to change size. The data returned was:

array([], shape=(0, 1, 1), dtype=float32)
[09-03-24 15:41:18 :INFO    ] - 0 daily files downloaded
[09-03-24 15:41:18 :INFO    ] - No regrid batches to processing, moving on...
[09-03-24 15:41:18 :INFO    ] - Rotating wind data prior to merging
/data/hpcdata/users/username/miniconda3/envs/icenet-nb/lib/python3.11/site-packages/icenet/data/interfaces/downloader.py:361: FutureWarning: Ignoring a datum in netCDF load for consistency with existing behaviour. In a future version of Iris, this datum will be applied. To apply the datum when loading, use the iris.FUTURE.datum_support flag.
  iris.load_cube(sic_day_path, 'sea_ice_area_fraction')
/data/hpcdata/users/username/miniconda3/envs/icenet-nb/lib/python3.11/site-packages/icenet/data/interfaces/downloader.py:361: FutureWarning: Ignoring a datum in netCDF load for consistency with existing behaviour. In a future version of Iris, this datum will be applied. To apply the datum when loading, use the iris.FUTURE.datum_support flag.
  iris.load_cube(sic_day_path, 'sea_ice_area_fraction')
/data/hpcdata/users/username/miniconda3/envs/icenet-nb/lib/python3.11/site-packages/icenet/data/interfaces/downloader.py:361: FutureWarning: Ignoring a datum in netCDF load for consistency with existing behaviour. In a future version of Iris, this datum will be applied. To apply the datum when loading, use the iris.FUTURE.datum_support flag.
  iris.load_cube(sic_day_path, 'sea_ice_area_fraction')
/data/hpcdata/users/username/miniconda3/envs/icenet-nb/lib/python3.11/site-packages/icenet/data/interfaces/downloader.py:361: FutureWarning: Ignoring a datum in netCDF load for consistency with existing behaviour. In a future version of Iris, this datum will be applied. To apply the datum when loading, use the iris.FUTURE.datum_support flag.
  iris.load_cube(sic_day_path, 'sea_ice_area_fraction')
[09-03-24 15:41:19 :INFO    ] - Rotating wind data in ./data/era5/south/uas ./data/era5/south/vas
[09-03-24 15:41:19 :INFO    ] - 0 files for uas
[09-03-24 15:41:19 :INFO    ] - 0 files for vas

Additional information

Relevant part of the code (this seems to be a known issue in the codebase, based on the FIXME comment):

# There are situations where the API will spit out unordered and
# partial data, so we ensure here means come from full days and don't
# leave gaps. If we can avoid expver with this, might as well, so
# that's second
# FIXME: This will cause issues for already processed latlon data
if len(doy_counts[doy_counts < 24]) > 0:
    strip_dates_before = min([
        dt.datetime.strptime(
            "{}-{}".format(d,
                           pd.to_datetime(da.time.values[0]).year),
            "%j-%Y")
        for d in doy_counts[doy_counts < 24].dayofyear.values
    ])
    da = da.where(da.time < pd.Timestamp(strip_dates_before), drop=True)
if 'expver' in da.coords:
    logging.warning("expvers {} in coordinates, will process out but "
                    "this needs further work: expver needs storing for "
                    "later overwriting".format(da.expver))
    # Ref: https://confluence.ecmwf.int/pages/viewpage.action?pageId=173385064
    da = da.sel(expver=1).combine_first(da.sel(expver=5))
da = da.sortby("time").resample(time='1D').mean()
da.to_netcdf(download_path)

Relates to #188
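
One possible direction for a defensive fix (just a sketch, not tested against the codebase; is_subdaily is a hypothetical helper) would be to skip the partial-day strip whenever the time coordinate is already daily, which is exactly what a latlon file kept from a previous -d run looks like:

import numpy as np
import pandas as pd
import xarray as xr

def is_subdaily(da: xr.DataArray) -> bool:
    # Hypothetical helper: True if the time coordinate has more than one
    # sample per day, i.e. the data has not already been daily-averaged.
    times = pd.DatetimeIndex(da.time.values)
    if len(times) < 2:
        return False
    return (times[1] - times[0]) < pd.Timedelta(days=1)

# Tiny synthetic example: hourly data (as freshly downloaded) vs daily means
# (as left behind in latlon_2022.nc by a run with delete_tempfiles=False).
hourly = xr.DataArray(
    np.arange(48, dtype=float),
    coords={"time": pd.date_range("2022-06-01", periods=48, freq="h")},
    dims="time")
daily = hourly.resample(time="1D").mean()

print(is_subdaily(hourly))  # True  -> safe to apply the strip_dates_before logic
print(is_subdaily(daily))   # False -> skip it; the file is already processed

With something like this, the strip_dates_before block above would only run when is_subdaily(da) is True, leaving already-processed latlon data untouched.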

bnubald added the bug label on Mar 10, 2024
@bnubald
Collaborator Author

bnubald commented Mar 10, 2024

The current way of getting around the error is to make sure the temp files are deleted upon completion, i.e. running without the -d flag:

❯ icenet_data_era5 south --vars tas --levels '' 2022-6-1 2022-6-2
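
If keeping -d is important, an alternative workaround (untested sketch; the ./data/era5/south/tas/latlon_*.nc path pattern is assumed from the log output above) is to remove the stale temp file manually before re-running, so the downloader regenerates it from fresh hourly data:

from pathlib import Path

# Hypothetical cleanup: remove latlon temp files left over from a previous -d
# run (directory layout assumed from the log output above).
for stale in Path("./data/era5/south/tas").glob("latlon_*.nc"):
    print(f"Removing stale temp file: {stale}")
    stale.unlink()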

bnubald added the defensive label on Apr 12, 2024