Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix results from the Python/bash pipeline #100

Open
1 of 5 tasks
Tracked by #93
griff-rees opened this issue Dec 5, 2023 · 5 comments
Open
1 of 5 tasks
Tracked by #93

Fix results from the Python/bash pipeline #100

griff-rees opened this issue Dec 5, 2023 · 5 comments
Assignees
Labels
bug Something isn't working

Comments

@griff-rees
Copy link
Collaborator

griff-rees commented Dec 5, 2023

We were able to transfer the results but found a bug in the date range (I think that was the core @RuthBowyer? sorry if I'm misremembering). Below are components to check that may have a played a role as best I can remember:

@RuthBowyer
Copy link
Collaborator

Hey there - came to add to this - I believe the issue was that per year, there were not 360 days in the corrected dataset. This might be fixed anyway when we update the resampling as in #106 - but it could have been introduced at a later stage?

@griff-rees griff-rees added the bug Something isn't working label Jan 19, 2024
@griff-rees griff-rees changed the title Results from the Python pipeline Results from the Python/bash pipeline Jan 19, 2024
@griff-rees griff-rees changed the title Results from the Python/bash pipeline Fix results from the Python/bash pipeline Feb 5, 2024
@andrewphilipsmith
Copy link
Collaborator

andrewphilipsmith commented Feb 5, 2024

Note (for future investigation):

In python/resampling/resampling_hads.py we call

data_360 = data.convert_calendar(dim="time", calendar="360_day", align_on="year")

followed by

# apply correction if leap year
if data.time.dt.is_leap_year.any():
    data_360 = enforce_date_dropping(data, data_360)

However in python/load_data/data_loader.py we just call without calling enforce_date_dropping:

x = x.convert_calendar(dim="time", calendar="360_day", align_on="year")

@griff-rees
Copy link
Collaborator Author

griff-rees commented Feb 5, 2024

Thanks @andrewphilipsmith. Just for continuity since comments added in #122, the enforce_date_dropping function is written within python/resampling/resampling_hads.py and seems a local attempt to address pydata/xarray#8086 which was raised in relation to #32.

@griff-rees
Copy link
Collaborator Author

griff-rees commented Feb 5, 2024

@RuthBowyer, do you know the details of the method used on the R side, and whether it's dropping days (which I think is what enforce_date_dropping is designed to do) versus a decimal interpolation like interp_calendar in xarray? I'm guessing this relates to #106 and #32 at least.

@RuthBowyer
Copy link
Collaborator

Hey there
So sorry for the lack of clarity on this. The 'R pipeline' only ever had code to resample the data to the same grid, not calendar days - because when we were at this stage during our initial workings we were focused on the whole of the UK, we decided to only do the resampling of the calendar days in python, and then use this input into the R work. I'm so sorry this is so unclear and messy! The original plan was to have a single pre-processing pipeline that then had data useable in both R and python, but that's not how it all worked out, apologies!
When we realised the calendar day problem, for the previous R running of everything, I just did a 'hacky' fix of the days and, as at that time the dataset had too many days in leap years just hard coded the removal of the extra days in those years.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants