Mean irradiance data #112

Open
2 of 4 tasks
peterdudfield opened this issue Apr 22, 2024 · 2 comments

Comments

@peterdudfield
Contributor

peterdudfield commented Apr 22, 2024

We've noticed that some of the ICON DWD Hugging Face irradiance data is the mean since the start of the forecast, not the hourly average.

Detailed Description

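import xarray as xr
import ocf_blosc2  # imported for its side effect: lets the Blosc2-compressed chunks be decoded

file = "zip:///::hf://datasets/openclimatefix/dwd-icon-eu/data/2021/1/1/20210101_00.zarr.zip"
data = xr.open_zarr(file, chunks="auto")

# Diffuse downward shortwave radiation at the surface
dd = data["aswdifd_s"]
# Spatial mean for each forecast step
dd.mean(dim=["latitude", "longitude"]).plot()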
Screenshot 2024-04-22 at 17 37 29

Context

Our model is trained on hourly average data, so this may cause the evaluation to underperform.

Possible Implementation

  • Have a transform back to hourly data. I think you take the differences, and then multiply by the number of data points.
  • Check that live data doesn't have this problem
  • Check that other variables don't have this problem
  • Check other data on Hugging Face; do we also see this? It looks similar
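
One way to write down the first task, as a sketch rather than a confirmed description of the dataset: assume the stored value $\bar{x}_k$ is the mean from forecast start ($t_0 = 0$) up to lead time $t_k$, and let $x_k$ be the mean over the interval $(t_{k-1}, t_k]$. The symbols here are introduced for illustration only.

```latex
\begin{align}
  t_k \, \bar{x}_k &= \sum_{j=1}^{k} (t_j - t_{j-1}) \, x_j
    && \text{(running mean times elapsed time is the accumulated total)} \\
  x_k &= \frac{t_k \, \bar{x}_k - t_{k-1} \, \bar{x}_{k-1}}{t_k - t_{k-1}}
    && \text{(difference of totals divided by the interval length)}
\end{align}
```

For hourly steps the denominator is 1, so the interval mean is just the difference of consecutive accumulated totals.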
@peterdudfield
Contributor Author

Looking at the following variables

  • t_2m: Looks fine, hourly
  • tot_prec: This looks cumulative, as suggested by the name
  • aswdifd_s: mean since the start of the forecast
  • aswdir_s: mean since the start of the forecast
  • clcl: not sure; seems to range between 0 and 100 in this forecast. Not sure whether this is the hourly mean or the mean since the start
  • clcm: not sure; seems to range between 0 and 100 in this forecast. Not sure whether this is the hourly mean or the mean since the start
  • clch: not sure; seems to range between 0 and 100 in this forecast. Not sure whether this is the hourly mean or the mean since the start
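
For the variables marked "not sure", one possible sanity check is sketched below. It relies on a simple fact: if a non-negative quantity is stored as a mean since forecast start, then value times lead time (the accumulated total) can never decrease. Failing the check rules out a running mean; passing it proves nothing. The function name `could_be_running_mean` and the sample numbers are made up for illustration and are not taken from the dataset.

```python
import numpy as np

def could_be_running_mean(values, lead_hours, tol=1e-6):
    """Return False if a non-negative series cannot be a mean since forecast start."""
    # Running mean times elapsed time would be the accumulated total
    accumulated = np.asarray(values, dtype=float) * np.asarray(lead_hours, dtype=float)
    # An accumulated total of a non-negative quantity must be non-decreasing
    return bool(np.all(np.diff(accumulated) >= -tol))

# Synthetic example: cloud-cover-like values in percent at lead times 1..5 hours
lead_hours = np.arange(1, 6)
hourly_like = np.array([80.0, 10.0, 0.0, 50.0, 100.0])
# The same values re-expressed as a mean since forecast start
running_like = np.cumsum(hourly_like) / lead_hours

print(could_be_running_mean(hourly_like, lead_hours))
print(could_be_running_mean(running_like, lead_hours))
```

A series that swings quickly between 0 and 100 will usually fail this check, which would point towards an hourly value rather than a mean since the start.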

@peterdudfield
Contributor Author

peterdudfield commented Apr 23, 2024

I think I've managed an algorithm to change back to the hourly average, rather than the mean across the forecast horizon

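import numpy as np

# Lead time of each forecast step in hours (assumes `step` is a timedelta64 coordinate)
hours_steps = dd.step / np.timedelta64(1, "h")
hours_steps_diff = hours_steps.diff("step")

# Running mean times elapsed hours gives the accumulated total since forecast start
dd_cum_sum = dd * hours_steps
# Difference of totals divided by interval length gives the mean over each interval
dd_mean = dd_cum_sum.diff("step") / hours_steps_diff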

Might need to add the first step back in
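
A minimal NumPy sketch of how the first step could be kept, using a synthetic 1-D series rather than the real dataset. It assumes every lead time is positive and that the value at each step is the mean over everything since forecast start; the helper name `running_mean_to_interval_mean` and the numbers are hypothetical. Prepending a zero total at t = 0 means the first interval is not lost to the differencing.

```python
import numpy as np

def running_mean_to_interval_mean(running_mean, lead_hours):
    """Convert a mean-since-forecast-start series into per-interval means.

    running_mean[k] is assumed to be the average over (0, lead_hours[k]];
    lead_hours must be positive and strictly increasing.
    """
    running_mean = np.asarray(running_mean, dtype=float)
    lead_hours = np.asarray(lead_hours, dtype=float)
    # Accumulated total since the forecast start
    accumulated = running_mean * lead_hours
    # Prepending 0 for t = 0 keeps the first interval, (0, lead_hours[0]]
    totals = np.diff(accumulated, prepend=0.0)
    lengths = np.diff(lead_hours, prepend=0.0)
    return totals / lengths

# Synthetic check: hourly steps at 1..4 h, then one 3-hour step ending at 7 h
lead_hours = np.array([1, 2, 3, 4, 7])
true_interval_mean = np.array([0.0, 10.0, 30.0, 20.0, 5.0])
accumulated = np.cumsum(true_interval_mean * np.diff(lead_hours, prepend=0))
running_mean = accumulated / lead_hours

recovered = running_mean_to_interval_mean(running_mean, lead_hours)
print(np.allclose(recovered, true_interval_mean))
```

With this convention the first recovered value equals the first stored value, since the mean from the start to the first step is already the mean over that first interval.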

Screenshot 2024-04-23 at 18 17 04
