
Benchmark #27

Open
peterdudfield opened this issue Dec 19, 2023 · 10 comments
Labels
help wanted (Extra attention is needed)

Comments

@peterdudfield
Contributor

peterdudfield commented Dec 19, 2023

Detailed Description

It would be great to benchmark the model.

Context

Always good to benchmark

Possible Implementation

  • a baseline model could simply predict the mean PV value; obviously this model will be bad, but it gives some context to the numbers in the evaluation (see the sketch below)
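A minimal sketch of that kind of baseline, with made-up numbers (none of these names come from the codebase):

```python
import numpy as np

# Made-up PV generation values (kW) from a training period.
train_pv_kw = np.array([0.0, 0.4, 1.8, 3.1, 2.6, 0.9, 0.0])

# The "model": always predict the mean generation seen in training.
mean_baseline_kw = train_pv_kw.mean()

def predict_mean_baseline(n_timestamps: int) -> np.ndarray:
    # Same constant prediction for every requested timestamp.
    return np.full(n_timestamps, mean_baseline_kw)
```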
@peterdudfield mentioned this issue Dec 19, 2023
@peterdudfield added the help wanted label Jan 8, 2024
@felipewhitaker

Hi! Could I take this one? Is there any deadline? I would aim to do it in the coming weeks.

@peterdudfield
Contributor Author

Thanks @felipewhitaker, there is no deadline, so I really appreciate you taking this on.

@ombhojane
Contributor

Hello, could anyone please guide me on how to do this the right way?
I'm thinking of evaluating with Mean Absolute Error, comparing predictions against the train and validation PV values.
Is this the correct approach? It would be great if you could explain it briefly.

@felipewhitaker

@ombhojane, it is quite common to use Mean Absolute Error (MAE) for evaluating models, including in weather research. Another common metric is the Continuous Ranked Probability Score (CRPS), which is a generalization of MAE that takes forecast scenarios into consideration (properscoring has an implementation of it).

Regardless of the metric, what do you expect a correct way to look like? When comparing models, it is important that both are evaluated on a dataset that neither has used for learning (a test dataset), and that the comparison is fair (it doesn't make much sense to compare two models that predict different things).
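For example, with made-up numbers (just to show the two metrics side by side, not code from this repo):

```python
import numpy as np
import properscoring as ps

rng = np.random.default_rng(42)

# Made-up data: 100 observed PV values and, for each, 50 forecast scenarios (kW).
y_true = rng.uniform(0, 5, size=100)
scenarios = y_true[:, None] + rng.normal(0, 0.5, size=(100, 50))

# MAE of a deterministic forecast (here: the ensemble mean).
mae = np.mean(np.abs(y_true - scenarios.mean(axis=1)))

# CRPS of the full ensemble; it reduces to MAE when there is only one scenario.
crps = ps.crps_ensemble(y_true, scenarios).mean()

print(f"MAE:  {mae:.3f} kW")
print(f"CRPS: {crps:.3f} kW")
```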

@felipewhitaker

After exploring psp, my next step is to use the dataset available in Hugging Face (linked in the first comment of #30) to make an historic average model. What interface should it support? The current model has some attributes (e.g. _config, _nwp_tolerance, _nwp_dropout): should every model include these?
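Roughly what I have in mind, as a sketch; the fit/predict interface here is just a placeholder for discussion, not the interface psp or this repo actually uses:

```python
import pandas as pd


class HistoricAverageModel:
    """Predicts the average PV generation seen historically at the same hour of day."""

    def fit(self, history: pd.Series) -> "HistoricAverageModel":
        # `history` is PV generation (kW) indexed by timestamp; average per hour of day.
        self._hourly_mean = history.groupby(history.index.hour).mean()
        return self

    def predict(self, timestamps: pd.DatetimeIndex) -> pd.Series:
        # Look up the historic mean for each requested timestamp's hour.
        values = self._hourly_mean.reindex(timestamps.hour).to_numpy()
        return pd.Series(values, index=timestamps)
```

Fitting something like this on the Hugging Face dataset and running the existing evaluation against it would give a first benchmark number.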

@peterdudfield
Contributor Author

I think ideally it would be similar to this: https://github.com/openclimatefix/Open-Source-Quartz-Solar-Forecast/blob/main/quartz_solar_forecast/forecast.py#L11. Does this answer your question?

@felipewhitaker

I think ideally it would be similar to this: https://github.com/openclimatefix/Open-Source-Quartz-Solar-Forecast/blob/main/quartz_solar_forecast/forecast.py#L11. Does this answer your question?

It does help, thanks! I might've missed some details there. Moreover, is there a file containing how the current model was trained (which I believe is in psp)? It would be nice to be able to follow the same rough steps.

@ombhojane
Contributor

After exploring psp, my next step is to use the dataset available in Hugging Face (linked in the first comment of #30) to make an historic average model. What interface should it support? The current model has some attributes (e.g. _config, _nwp_tolerance, _nwp_dropout): should every model include these?

  • Thanks for referencing the prerequisites and for the suggestions, that makes things much clearer.

@peterdudfield
Contributor Author

peterdudfield commented Mar 14, 2024

I think ideally it would be similar to this: https://github.com/openclimatefix/Open-Source-Quartz-Solar-Forecast/blob/main/quartz_solar_forecast/forecast.py#L11. Does this answer your question?

It does help, thanks! I might've missed some details there. Moreover, is there a file containing how the current model was trained (which I believe is in psp)? It would be nice to be able to follow the same rough steps.

The running of the model is in here: https://github.com/openclimatefix/Open-Source-Quartz-Solar-Forecast/blob/main/quartz_solar_forecast/forecasts/v1.py. I'm hoping we can make v2, v3, etc.
The actual model is in pv-site-prediction, but I'm not sure it's worth going into that code as it might be a bit dense. The train script is here though

A really simple benchmark could be to always predict half the capacity and then run the evaluation. Obviously it would be a very bad model, but it helps give an impression of what the MAE numbers mean.
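A rough sketch of that idea with made-up numbers (not the repo's actual evaluation code):

```python
import numpy as np

# Made-up test data: observed generation (kW) for one site with a 4 kW capacity.
capacity_kw = 4.0
observed_kw = np.array([0.0, 0.5, 1.9, 3.2, 2.4, 0.8, 0.0])

# "Half the capacity" baseline: the same constant prediction at every timestamp.
predicted_kw = np.full_like(observed_kw, 0.5 * capacity_kw)

# Any trained model should beat this MAE comfortably.
mae_kw = np.abs(observed_kw - predicted_kw).mean()
print(f"Half-capacity baseline MAE: {mae_kw:.2f} kW")
```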
