-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Noorain pgm baseline #3
base: main
Are you sure you want to change the base?
Conversation
…training.py and not from main.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of things are happening in this PR. Mostly, I'm confused as to whether this contains more than just the baseline models indicated by the title.
The task was to create a no-change baseline model and a zero baseline model, but a lot of other things appear to be happening here.
I need this PR to focus on these two baselines, or at least the readmes and the title should reflect what is going on otherwise.
The fact that there are no docstrings or comments does also not help me understand what is happening. I need more inline code documentation to parse what is happening without needing to run this on my own machine
Lastly, have you checked the documentation regarding the repository structure and naming? And if so, where? If we have something misleading around, it should be removed (yes, I realize that having text on both GitHub and Notion can result in one source not being updated...). Anyway, please make sure to consult the documentation before making a PR so that trivial issues like this are in order.
We'll set up a GitHub action set of rules soon enough to catch these things automatically, so we might as well get used to reviewing the documentation carefully (Sara will be setting up a wiki when she has a moment where she is not actively working on the repository code, and hopefully, you'll have an easier time finding what you need).
.gitignore
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you deleting these things in the gitignore? Sure a lot of the things are not currently relevant but many of the things you do delete could become relevant very quickly.
You can of course add to the gitignore all you want - to the extent that it does not interfere with other people's code, but there is usually no reason for you to delete stuff in here. If, for some reason we need to delete something in here, please raise the issue. As I see it this have nothing to do with anything else in this PR - but please correct me if I'm wrong?
So please undelete and add you own stuff - unless very good reason, then I'm all ears
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this comment. I assumed that some of the frameworks like Django, Scrapy, Flask and various file types were not relevant to our models and data pipeline. So I only kept basic files in .gitignore relevant to the model folder, Python, parquet, Jupyter. If all the others are relevant I can add them back for future. :)
models/blue_sea/README.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please write a small paragraph about what the point of this model is. Why do we have a zero model? What does it mean that it is a baseline? Just you two or three sentences more and we're good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted. I'll update the readme.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll might change this format soon - but for now don't worry about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted.
models/blue_sea/configs/config.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is clear after the retreat that the config structure should be different. I'll present the structure on Monday, so unless this PR takes a long time to resolve I suggest you just leave the configs as is, for now. We'll make a new branch to fix those next week instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted.
models/blue_sea/main.py
Outdated
@@ -0,0 +1,21 @@ | |||
from ..blue_sea.src.dataloaders.fetch_data_run_query import fetch_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a big fan of implicit relative imports, such as from . import module. I think it is a bit more prudent that we are more explicit (yet still relative), for instance:
# Set the base path relative to the current script's location
src_path = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
# Alternative version using pathlib's Path (potentially more best-practice)
# src_path = f"{Path(__file__).parent.parent}"
# Add the required directories to the system path
sys.path.insert(0, os.path.join(src_path, "architectures"))
sys.path.insert(0, os.path.join(src_path, "configs"))
sys.path.insert(0, os.path.join(src_path, "utils"))
# Now a module from, e.g., utils can be loaded directly
from utils import util_module
Note that if everyone, thinks I'm being silly here I am willing to discuss it, but currently this is my conviction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thanks for this comment. I will modify imports using sys if that is the requirement. :)
import wandb | ||
def evaluate_mse() -> float: | ||
data = pd.read_parquet( | ||
f"{Path(__file__).parent.parent.parent}/data/generated/forecasts.parquet") | ||
mse_mean = 0 | ||
for i in range(1, 37): | ||
mse_mean = mse_mean + mean_squared_error(data['ln_ged_sb_dep'], data[f'step_pred_{i}']) | ||
print('The MSE is ', mse_mean/36) | ||
|
||
#project = wandb_config.project_config['project'] | ||
#entity=wandb_config.project_config['entity'] | ||
|
||
#wandb_log(project, entity, mse_mean/36, 'mse_mean_36_months') | ||
wandb.log({'Mean MSE 36 months': mse_mean/36}) | ||
|
||
return mse_mean/36 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have a fixe W&B scheme/standard yet, so this is fine for now. But we'll have to converge at some point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted.
"test", "predict", model['RunResult_test'].data) | ||
|
||
predictions_test.to_parquet( | ||
f"{Path(__file__).parent.parent.parent}/data/generated/{model['modelname']}_true_forecasts.parquet") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
config in future - fine for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
model['RunResult_test'] = RunResult.retrain_or_retrieve( | ||
retrain=bool(config.common_config['force_retrain']), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Whether or not this is a zero baseline or something completely different:) We'll go with a different solution here - please consult Xiolong - or his code in the stepshifter branch (if the beach is not there is is simply because it has been merge to main)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blue_sea is all zero baseline, yellow_duck is no change model. :) green_oracle which is referred here is a test model. I'll change the readme of green_oracle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a lot of hassle for a zero baseline model? I'm starting to suspect that is not what is happening here? Either the readme is wrong or?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blue_sea is all zero baseline, yellow_duck is no change model. :) green_oracle which is referred here is a test model. I'll change the readme of green_oracle.
model['RunResult_calib'] = RunResult.retrain_or_retrieve( | ||
retrain=bool(config.common_config['force_retrain']), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll go another way - please consult Xiaolong or his code in the stepshifter branch (if deleted, it is merged with main)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thanks! baseline models will not require this for now and so I'll take the latest trainer for future models.
Hi, thank you for the feedback. I think I have taken care of most of the comments and suggestions. This branch would require a check of the outputs from the models so that we are sure that the baseline models are forecasting in the way that we expect them to. If this can be checked then we are ready to merge this branch to main. I have provided with enough documentation and docstrings in this branch. I am sure that this branch can be cloned and run locally. Please give me a feedback if there is an issue running the branch and if the outputs are not as expected. This will also check that the branch have no issues running on clients which were not used for development. It can work well with Prefect workflow code, I have tested it using Prefect. Just use the main.py files in the model specific folder to run them through Prefect or through Terminal. Also, as decided in one of our meetings, the models store the forecasts locally and not on the prediction store. So, please look into the data/generated/ folder to download the predictions in the form of parquet file. Thanks! |
Morning! Here are some comments while we wait for Simon and Xiaolong to get back. General comments here, more specific ones in code:
|
No description provided.