-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Noorain ensembling #4
base: main
Are you sure you want to change the base?
Conversation
…training.py and not from main.py
…on storage as parquet file with one target column and 36 predictions step-wise
…orecasts in that particular run
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I've noticed that this Pull Request (PR) seems to be mixed up with your baseline_pgm branch, resulting in changes being scattered throughout. Consequently, I'm stopping the review here.
Please create a PR specifically for the baseline models and another separate one for the ensembles. Otherwise, I'm just going over code that I've already reviewed, which is not an efficient use of time.
Unless there's a very compelling reason to build off a branch that has not yet been merged into the main branch, I strongly advise starting a new branch from the main for each new project. For instance, when transitioning from working on baseline_pgm
to working on ensembles
, it's best practice to initiate the latter from the main branch. This approach helps in maintaining a clear and organized codebase, making it easier to review and manage changes. It also avoids the complications that arise from intertwining the code of different features or projects, which can lead to confusion and errors. By following this guideline, we ensure that each feature or improvement is isolated, making it simpler to track progress, identify issues, and merge updates into the main branch once they are ready and approved.
The simplest solution for you here is to delete the baseline models on this branch. I'll take a look again after this is done and after some docstrings and comments are added for clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in the other PR why are you deleting this?
def ensemble_median(): | ||
|
||
forecasts = {} | ||
path_to_test = os.path.dirname(os.getcwd()) | ||
print(path_to_test) | ||
for file in glob.glob(path_to_test + "/views_pipeline/models/**/data/generated/forecasts.parquet", recursive=True): | ||
print(file) | ||
# get the model name from the file path | ||
model_name = file.split('/')[-4] | ||
forecasts[model_name] = pd.read_parquet(file) | ||
|
||
|
||
ensemble = pd.concat(forecasts.values()).groupby(level=[0,1]).median() | ||
ensemble = ensemble.filter(regex='^step_pred_|^ln_ged_sb_dep$') | ||
|
||
# get the directory of this function | ||
function_file = inspect.getfile(ensemble_median) | ||
function_dir = os.path.dirname(function_file) | ||
parent_dir = os.path.dirname(function_dir) | ||
ensemble_model_name = parent_dir.split('/')[-2] | ||
ensemble.forecasts.to_store(ensemble_model_name, overwrite=True) | ||
print(ensemble) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add docstrings and comments where needed.
Docstrings tell you what is happening, comments why/how it is done (in the best of worlds)
import inspect | ||
|
||
def ensemble_mean(): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the structure is subject to chang, I'm not gonna comment on the configs
from ..blue_sea.src.dataloaders.fetch_data_run_query import fetch_data | ||
from ..blue_sea.src.forecasting.true_future_36m import forecast | ||
from ..blue_sea.src.evaluation.evaluation_mse import evaluate_mse | ||
from ..blue_sea.configs import config | ||
|
||
from .src.utils.utils import wandb_init, wandb_finish | ||
from .configs import wandb_config | ||
from ..blue_sea.src.visualization.visual import visualize_forecasts_in_maps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a comment on this in the baseline_pgm PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will den up in config after new structure implemented
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this aligned with Malika's work? And is this not also in a different PR? or?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, I know this is also in a different PR? If they are all mixing up, this makes me use a lot more time here. Please be a bit considered here.
from ..green_oracle.src.dataloaders.fetch_data_run_query import fetch_data | ||
from ..green_oracle.src.forecasting.true_future_36m import forecast | ||
from ..green_oracle.src.evaluation.evaluation_mse import evaluate_mse | ||
from ..green_oracle.configs import config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I realize this PR is mixup with your baseline pgm branch - so all the changes are everywhere. As such I stop the review here.
Please have a PR specifically for the baseline and a separate one for the ensembles. Otherwise, I do nothing but go over code I've already been over...
Unless you have a very good reason to build of a branch that has not yet been merged to main, please make a new branch from main when starting on a new project - I.e. when you go from making baseline_pgm to ensembles.
I think the easiest solution here is for you to delete your baseline models on this branch - should not be a big deal.
No description provided.