Noorain pgm baseline #3

noorains · 2024-02-13T12:39:56Z

No description provided.

Polichinel

A lot of things are happening in this PR. Mostly, I'm confused as to whether this contains more than just the baseline models indicated by the title.

The task was to create a no-change baseline model and a zero baseline model, but a lot of other things appear to be happening here.

I need this PR to focus on these two baselines, or at least the readmes and the title should reflect what is going on otherwise.

The lack of docstrings and comments also does not help me understand what is happening. I need more inline code documentation to parse the code without having to run it on my own machine.

Lastly, have you checked the documentation regarding the repository structure and naming? And if so, where? If we have something misleading around, it should be removed (yes, I realize that having text on both GitHub and Notion can result in one source not being updated...). Anyway, please make sure to consult the documentation before making a PR so that trivial issues like this are in order.

We'll set up a set of GitHub Actions rules soon enough to catch these things automatically, so we might as well get used to reviewing the documentation carefully. Sara will be setting up a wiki when she has a moment where she is not actively working on the repository code, and hopefully you'll then have an easier time finding what you need.

Polichinel · 2024-02-29T22:29:27Z

.gitignore

Why are you deleting these things in the gitignore? Sure, a lot of them are not currently relevant, but many of the things you delete could become relevant very quickly.

You can of course add to the gitignore all you want, to the extent that it does not interfere with other people's code, but there is usually no reason for you to delete stuff in here. If, for some reason, we need to delete something in here, please raise the issue. As I see it, this has nothing to do with anything else in this PR, but please correct me if I'm wrong.

So please undelete and add your own stuff. If there is a very good reason not to, I'm all ears.

Thanks for this comment. I assumed that some of the frameworks like Django, Scrapy, and Flask, and various file types, were not relevant to our models and data pipeline. So I only kept the basic entries in .gitignore relevant to the model folder, Python, parquet, and Jupyter. If all the others are relevant, I can add them back for future use. :)

Polichinel · 2024-02-29T22:31:27Z

models/blue_sea/README.md

Please write a small paragraph about what the point of this model is. Why do we have a zero model? What does it mean that it is a baseline? Just two or three more sentences and we're good.

Noted. I'll update the readme.
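
For readers of this thread who have not met the terms, here is a minimal sketch of the two baselines requested in this PR. It assumes the usual meanings: a zero baseline predicts 0 for every step ahead, and a no-change baseline repeats the last observed value. The function names and the plain-list data layout are hypothetical and are not taken from the blue_sea or yellow_duck code; only the 36-step horizon comes from the evaluation script in this PR.

```python
from typing import Dict, List

N_STEPS = 36  # forecast horizon in months, matching the 36 steps used in this PR


def zero_baseline(n_steps: int = N_STEPS) -> List[float]:
    """Predict zero for every step ahead, ignoring the history entirely."""
    return [0.0] * n_steps


def no_change_baseline(history: List[float], n_steps: int = N_STEPS) -> List[float]:
    """Repeat the last observed value for every step ahead."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * n_steps


def forecast_all(histories: Dict[int, List[float]]) -> Dict[int, Dict[str, List[float]]]:
    """Run both baselines per unit (hypothetical unit ids mapped to observed series)."""
    return {
        unit_id: {
            "zero": zero_baseline(),
            "no_change": no_change_baseline(series),
        }
        for unit_id, series in histories.items()
    }
```

Any real model should beat both of these on the evaluation metric; if it does not, the extra complexity is not paying for itself, which is the sense in which they are baselines.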

Polichinel · 2024-02-29T22:32:15Z

models/blue_sea/artifacts/metadata_dict.py

We might change this format soon, but for now don't worry about it.

Polichinel · 2024-02-29T23:26:07Z

models/blue_sea/configs/config.py

It is clear after the retreat that the config structure should be different. I'll present the structure on Monday, so unless this PR takes a long time to resolve I suggest you just leave the configs as is, for now. We'll make a new branch to fix those next week instead.

Polichinel · 2024-02-29T23:28:35Z

models/blue_sea/main.py

@@ -0,0 +1,21 @@
+from ..blue_sea.src.dataloaders.fetch_data_run_query import fetch_data


I'm not a big fan of package-relative imports, such as from . import module. I think it is more prudent to spell the paths out explicitly (while still keeping them relative to the script), for instance:

import os
import sys

# Set the base path relative to the current script's location
src_path = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

# Alternative version using pathlib's Path (potentially more best-practice)
# src_path = f"{Path(__file__).parent.parent}"

# Add the required directories to the system path
sys.path.insert(0, os.path.join(src_path, "architectures"))
sys.path.insert(0, os.path.join(src_path, "configs"))
sys.path.insert(0, os.path.join(src_path, "utils"))

# Now a module from, e.g., utils can be loaded directly
from utils import util_module

Note that if everyone thinks I'm being silly here, I am willing to discuss it, but currently this is my conviction.

Hi, thanks for this comment. I will modify imports using sys if that is the requirement. :)
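
The sys.path approach suggested above could be wrapped in a small helper so that each main.py does not repeat the same lines. This is only a sketch using pathlib: the function name add_model_paths is hypothetical, and it assumes, like the snippet above, that the model root is two levels up from the calling script. Note that Path.resolve() also follows symlinks, which os.path.abspath does not.

```python
import sys
from pathlib import Path
from typing import Iterable, List


def add_model_paths(script_file: str, subdirs: Iterable[str]) -> List[str]:
    """Insert <model root>/<subdir> for each subdir at the front of sys.path.

    The model root is assumed to be the parent of the directory that holds
    script_file. Returns the paths that were actually added.
    """
    root = Path(script_file).resolve().parent.parent
    added = []
    for name in subdirs:
        path = str(root / name)
        if path not in sys.path:  # avoid duplicate entries on repeated calls
            sys.path.insert(0, path)
            added.append(path)
    return added
```

A script would call it once near the top, for example add_model_paths(__file__, ["architectures", "configs", "utils"]), before importing from those folders.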

Polichinel · 2024-02-29T23:57:45Z

models/green_oracle/src/evaluation/evaluation_mse.py

+import wandb
+def evaluate_mse() -> float:
+    data = pd.read_parquet(
+        f"{Path(__file__).parent.parent.parent}/data/generated/forecasts.parquet")
+    mse_mean = 0
+    for i in range(1, 37):
+        mse_mean = mse_mean + mean_squared_error(data['ln_ged_sb_dep'], data[f'step_pred_{i}'])
+    print('The MSE is ', mse_mean/36)
+
+    #project = wandb_config.project_config['project']
+    #entity=wandb_config.project_config['entity']
+
+    #wandb_log(project, entity, mse_mean/36, 'mse_mean_36_months')
+    wandb.log({'Mean MSE 36 months': mse_mean/36})
+
+    return mse_mean/36
+


We don't have a fixed W&B scheme/standard yet, so this is fine for now. But we'll have to converge at some point.
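
For reference, the metric computed in the excerpt above is the per-step MSE averaged over the 36 step_pred_<i> columns. The sketch below reproduces that calculation without pandas, scikit-learn, or W&B, so it can be checked by hand. The dict-of-lists layout and the function names are assumptions for illustration; the column names ln_ged_sb_dep and step_pred_<i> are the ones used in the PR.

```python
from typing import Dict, List, Sequence


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error between two equally long, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_mse_over_steps(
    columns: Dict[str, List[float]],
    target: str = "ln_ged_sb_dep",
    n_steps: int = 36,
) -> float:
    """Average the per-step MSE over step_pred_1 .. step_pred_<n_steps>."""
    per_step = [
        mse(columns[target], columns[f"step_pred_{i}"])
        for i in range(1, n_steps + 1)
    ]
    return sum(per_step) / n_steps
```

Keeping the per-step values in a list before averaging also makes it easy to log them individually later, once a shared W&B scheme has been agreed on.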

Polichinel · 2024-02-29T23:58:20Z

models/green_oracle/src/forecasting/true_future_36m.py

+            "test", "predict", model['RunResult_test'].data)
+
+    predictions_test.to_parquet(
+        f"{Path(__file__).parent.parent.parent}/data/generated/{model['modelname']}_true_forecasts.parquet")


This path should come from the config in the future, but it is fine for now.

Polichinel · 2024-02-29T23:59:57Z

models/green_oracle/src/forecasting/true_future_36m.py

+    model['RunResult_test'] = RunResult.retrain_or_retrieve(
+        retrain=bool(config.common_config['force_retrain']),


(Whether or not this is a zero baseline or something completely different:) We'll go with a different solution here. Please consult Xiaolong, or his code in the stepshifter branch (if the branch is not there, it is simply because it has been merged into main).

blue_sea is the all-zero baseline, yellow_duck is the no-change model. :) green_oracle, which is referred to here, is a test model. I'll change the readme of green_oracle.

Polichinel · 2024-03-01T00:01:16Z

models/green_oracle/src/forecasting/true_future_36m.py

This seems like a lot of hassle for a zero baseline model? I'm starting to suspect that is not what is happening here? Either the readme is wrong or?

blue_sea is the all-zero baseline, yellow_duck is the no-change model. :) green_oracle, which is referred to here, is a test model. I'll change the readme of green_oracle.

Polichinel · 2024-03-01T00:02:41Z

models/green_oracle/src/training/training.py

+    model['RunResult_calib'] = RunResult.retrain_or_retrieve(
+        retrain=bool(config.common_config['force_retrain']),


We'll go another way - please consult Xiaolong or his code in the stepshifter branch (if deleted, it is merged with main)

Hi, thanks! The baseline models will not require this for now, so I'll use the latest trainer for future models.

noorains · 2024-03-26T12:53:20Z

Hi, thank you for the feedback. I think I have taken care of most of the comments and suggestions. This branch still needs a check of the model outputs, so that we are sure the baseline models are forecasting the way we expect them to. Once that is checked, we are ready to merge this branch into main.

I have added documentation and docstrings in this branch, and I am confident that it can be cloned and run locally. Please give me feedback if there is an issue running the branch or if the outputs are not as expected. This will also check that the branch has no issues running on machines that were not used for development.

The branch works with the Prefect workflow code; I have tested it using Prefect. Just use the main.py file in each model-specific folder to run the model through Prefect or from the terminal. Also, as decided in one of our meetings, the models store the forecasts locally and not in the prediction store, so please look in the data/generated/ folder for the predictions, which are saved as parquet files. Thanks!
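
For anyone checking the outputs, a small helper along these lines can list what a model has written. This is a sketch only: the function name is hypothetical, and it assumes the layout described above, where each model folder contains data/generated/ with parquet files.

```python
from pathlib import Path
from typing import List


def list_forecast_files(model_dir: str) -> List[Path]:
    """Return the parquet files under <model_dir>/data/generated, sorted by name."""
    generated = Path(model_dir) / "data" / "generated"
    if not generated.is_dir():
        return []  # the model has not been run yet, or writes elsewhere
    return sorted(generated.glob("*.parquet"))
```

For example, list_forecast_files("models/blue_sea") would return the forecast files for the zero baseline after a local run; each one can then be opened with pd.read_parquet, as the evaluation script in this PR does.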

sarakallis · 2024-04-08T06:24:51Z

Morning! Here are some comments while we wait for Simon and Xiaolong to get back.

General comments here, more specific ones in code:

I wonder why there is an __init__.py in every folder. I assume this would only be needed for packages like Xiaolong's views_stepshift that we have in common_utils. The other three of us don't have these either, so they should be redundant, no?
Please make sure that you adhere to the file naming conventions; the most up-to-date version is always in the main README (you have to toggle it now to see it). Changes will be needed in the .py file names in dataloaders, evaluation, forecasting, artifacts, and configs.
Have a look at the new path solution outlined here.
I couldn't find the wandb initialization in blue_sea. Could you re-add this to main.py? Same goes for yellow_duck, where it's in the evaluation script. On that note, we are implementing more evaluation metrics than MSE, outlined here. I'm also still working on implementing this, but if you have time it would be good to start looking through it as well :)

noorains added 7 commits February 13, 2024 08:56

3 baseline pgm models with maps

da6e728

folder names changed to adjective_noun

d4efb65

running wandb function in utils.py

51be98f

included wandb in evaluation of pgm model, this model runs from main_…

9ddcb19

…training.py and not from main.py

added wandb folder to .gitignore

24b2405

wandb code for no change model

ab45c93

refactored wandb initialization and added log statements to visual.py

5803165

Polichinel self-assigned this Feb 29, 2024

Polichinel requested changes Mar 1, 2024

noorains added 21 commits March 18, 2024 11:54

updated readme for this test model

0124e80

updated readme of zero baseline model

819797e

merge main to this branch

acecf48

updated description

c58c3fe

removed green_oracle

80b6087

added mean of actuals and removed zeros

9a43a78

removed green_oracle folder

e06d483

stepshifting of forecasts

5befddd

updated description

4aaf89e

updated no change description

a10674d

Update model description

01686e1

updated shifting

aae449b

changed the name of forecast script to standard format

edc78e3

removed comments

88862a0

changed name of zero baseline variable

59da286

updated .gitignore with **/.DS_Store

a763aa2

deleted redundant file

9fdbf44

using subprocess to orchestrate

1680293

added deployment

354747f

added keyboard interrupt

825337b

import simplification

f539de7

noorains and others added 6 commits March 22, 2024 13:23

added loading message

c3e9ee8

removed requirements.txt

d938025

Delete run_all_models.py

1c8290e

updated requirements

79a3690

keeping folders on github

2e1540e

added docstrings

7f01c3b

noorains added 2 commits April 8, 2024 10:45

added wandb logging to zero baseline model

53d9d78

keeping folders in data

e12c08e
