Paper results show different forecast horizon than reported #6

tiagoyukio12 opened this issue Apr 17, 2023 · 8 comments

@tiagoyukio12

I'm currently reproducing the paper results for the UCI dataset, using the GRU architecture.
My results were almost identical to those reported:
RMSE: 0.745 ± 0.001
MAE: 0.529 ± 0.002

But looking at the recurrent.yaml file used, the output_sequence_length seems to be only 24 steps ahead:

train: False
dataset: 'uci'
exogenous: False
epochs: 1
batch_size: 1024
input_sequence_length: 96
output_sequence_length:  24
dropout: 0.0
layers: 1
units: 50
learning_rate: 0.001
cell: 'gru'
l2: 0.0005
MIMO: True
detrend: True

Due to the UCI dataset having 15min sampling frequency, this means the model is forecasting only 6h into the future, instead of the 24h reported.
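The unit mismatch can be illustrated with a small sketch (the function name here is mine, not from the repo):

```python
def horizon_steps(horizon_hours: float, sampling_minutes: int) -> int:
    """Convert a wall-clock forecast horizon into a number of model steps."""
    return int(horizon_hours * 60 / sampling_minutes)

# With the UCI dataset's 15-min sampling frequency:
print(horizon_steps(24, 15))  # 96 steps are needed to cover a 24h horizon
print(horizon_steps(6, 15))   # output_sequence_length: 24 covers only 6h
```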

@tiagoyukio12
Author

For clarification, the config/recurrent.yaml parameter output_sequence_length is used in the dts/examples/recurrent.py main function, on line 62:

X_train, y_train = get_rnn_inputs(train,
                                  window_size=params['input_sequence_length'],
                                  horizon=params['output_sequence_length'],
                                  shuffle=True,
                                  multivariate_output=True)

From a snippet of the get_rnn_inputs docstring (implemented in dts/utils/split.py):

"""
:param horizon: int
    Forecasting horizon, the number of future steps that have to be forecasted
"""

On line 119 of dts/utils/split.py, we can see the targets list is made of slices of size horizon:

targets.append(
                X[i + window_size: i + window_size + horizon])
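For context, that windowing logic amounts to something like this simplified sketch (not the repo's exact implementation): each target is a slice of length horizon starting right after the input window, so horizon is counted directly in sampling steps.

```python
import numpy as np

def make_windows(X, window_size, horizon):
    """Build (input, target) pairs by sliding over the series; the target
    is the `horizon` values immediately following each input window."""
    inputs, targets = [], []
    for i in range(len(X) - window_size - horizon + 1):
        inputs.append(X[i: i + window_size])
        targets.append(X[i + window_size: i + window_size + horizon])
    return np.array(inputs), np.array(targets)

series = np.arange(10)
x, y = make_windows(series, window_size=4, horizon=2)
print(x.shape, y.shape)  # (5, 4) (5, 2)
print(y[0])              # [4 5] -- the 2 steps right after the first window
```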

So there seems to be no correcting factor to convert the desired 24h forecast horizon reported in the paper to a horizon of 96 15-min steps.

I believe an erratum should be issued for the paper, clarifying the UCI dataset results use a 6h forecast horizon instead of 24h.

Your paper was extremely thorough and comprehensive, which is why I am using it as a benchmark; resolving this GitHub issue is important for the accuracy and integrity of my research.

I understand that you must be busy, but I would appreciate any assistance you could provide. Thank you for your time and effort sharing your source code.

@albertogaspar
Owner

Hi, I am sorry for the late reply. The UCI dataset results are correct and use a 24h forecast horizon (output_sequence_length: 96) with input_sequence_length: 384.
I see that in your config file you are using a single epoch, which makes the learning process too short. In the paper I used 200.

To show that the results in the paper were not obtained with a 6h forecast horizon instead of 24h, I ran a simple experiment: I used your settings with a (slightly) higher number of epochs (which is of course not optimal):

train: False
dataset: 'uci'
exogenous: False
epochs: 10
batch_size: 1024
input_sequence_length: 96
output_sequence_length:  24
dropout: 0.0
layers: 1
units: 50
learning_rate: 0.001
cell: 'gru'
l2: 0.0005
MIMO: True
detrend: True

obtaining the following results (shown in brackets are the results presented in the paper for GRU-MIMO with a 24h forecast horizon):

RMSE: 0.72 (0.75 ± 0.0)
MAE: 0.514 (0.52 ± 0.0)
NRMSE: 9.47 (9.83 ± 0.03)
R2: 0.333 (0.279 ± 0.004)

As you can see (and I encourage you to try it yourself), the results obtained for a 6h horizon are better than what is presented in the paper, even though the training of the model was cut short.

@tiagoyukio12
Author

Thank you for your response and for clarifying the forecast horizon discrepancy. I appreciate your efforts in running the experiment with my configuration settings and providing the results.

However, despite your explanation and the additional experiment, I'm still unable to reproduce the exact results mentioned in the paper for the UCI dataset. Even with a training duration of 200 epochs, the obtained results differ slightly from the reported values.

Here are the results I obtained with the updated configuration:

  • RMSE: 0.67 ± 0.01
  • MAE: 0.51 ± 0.01
{'mse': 0.43373233, 'mae': 0.49384913, 'nrmse_a': 1.3187281, 'nrmse_b': 745.49725, 'nrmsd': 0.1212474, 'r2': 0.3009019, 'smape': 35.21651, 'mape': 72.9558}
{'mse': 0.45289233, 'mae': 0.51428014, 'nrmse_a': 1.295853, 'nrmse_b': 761.7854, 'nrmsd': 0.123896495, 'r2': 0.27001953, 'smape': 37.224, 'mape': 80.34695}
{'mse': 0.4351429, 'mae': 0.49716938, 'nrmse_a': 1.2940199, 'nrmse_b': 746.70856, 'nrmsd': 0.121444404, 'r2': 0.29862827, 'smape': 35.53708, 'mape': 71.81708}
{'mse': 0.49264866, 'mae': 0.54142165, 'nrmse_a': 1.2457576, 'nrmse_b': 794.51807, 'nrmsd': 0.12922013, 'r2': 0.20593947, 'smape': 38.69682, 'mape': 86.43704}
{'mse': 0.43954045, 'mae': 0.50468266, 'nrmse_a': 1.2942526, 'nrmse_b': 750.4721, 'nrmsd': 0.122056514, 'r2': 0.29154032, 'smape': 36.274437, 'mape': 76.50844}
{'mse': 0.4387224, 'mae': 0.5003349, 'nrmse_a': 1.3664513, 'nrmse_b': 749.7735, 'nrmsd': 0.12194287, 'r2': 0.29285884, 'smape': 35.66649, 'mape': 73.54895}
{'mse': 0.45094696, 'mae': 0.5106664, 'nrmse_a': 1.3728217, 'nrmse_b': 760.1476, 'nrmsd': 0.123630114, 'r2': 0.27315503, 'smape': 36.90767, 'mape': 79.35831}
{'mse': 0.46145782, 'mae': 0.52123094, 'nrmse_a': 1.2773947, 'nrmse_b': 768.9555, 'nrmsd': 0.12506263, 'r2': 0.25621337, 'smape': 37.673584, 'mape': 81.25483}
{'mse': 0.4475309, 'mae': 0.5054144, 'nrmse_a': 1.2984899, 'nrmse_b': 757.2628, 'nrmsd': 0.12316096, 'r2': 0.27866113, 'smape': 36.24459, 'mape': 75.25322}
{'mse': 0.4330459, 'mae': 0.5002167, 'nrmse_a': 1.2826424, 'nrmse_b': 744.9071, 'nrmsd': 0.12115142, 'r2': 0.3020084, 'smape': 36.32402, 'mape': 75.93276}

These results were obtained after running the experiment 10 times using this configuration:

train: False
dataset: 'uci'
exogenous: False
epochs: 200
batch_size: 1024
input_sequence_length: 384
output_sequence_length:  96
dropout: 0.0
layers: 1
units: 50
learning_rate: 0.001
cell: 'gru'
l2: 0.0005
MIMO: True
detrend: True

I obtained most of these hyperparameters from Table 4 of your paper for GRU-MIMO.

I couldn't find the batch_size and learning_rate in the paper, so I left them at the default values found in this repo.

Can you confirm if the paper really used batch_size = 1024 and learning_rate = 0.001 to generate the results?
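For reference, the RMSE summary above can be recomputed from the per-run dicts, assuming RMSE = sqrt(mse):

```python
import math
import statistics

# Per-run 'mse' values from the ten runs listed above.
mses = [0.43373233, 0.45289233, 0.4351429, 0.49264866, 0.43954045,
        0.4387224, 0.45094696, 0.46145782, 0.4475309, 0.4330459]

rmses = [math.sqrt(m) for m in mses]
print(f"RMSE: {statistics.mean(rmses):.2f} ± {statistics.stdev(rmses):.2f}")
# RMSE: 0.67 ± 0.01
```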

@albertogaspar
Owner

Yes, the batch size and the learning rate are correct. Note that, as written in the README file, the code changed slightly before being published, so some differences may be observed.

@tiagoyukio12
Author

Thank you for your previous response. I have thoroughly reviewed the code and could not find any apparent errors or issues that could explain the deviations in the obtained results. I would appreciate it if you could provide more information about the changes made to the code before publication, as this would help me understand the potential factors contributing to the differences.
Alternatively, if possible, could you share with me the latest version of the code?
Thank you for your assistance, and I look forward to your response.

@albertogaspar
Owner

The code in the repo is the latest version. The code for the experiments in the paper was refactored and then published here. This is why some differences can be observed.

@tiagoyukio12
Author

I would really appreciate it if you could share the original code used in the experiments, so I can understand the observed differences.

@albertogaspar
Owner

Unfortunately I only have the refactored code. I am really sorry for that.
