Paper results show different forecast horizon than reported #6

tiagoyukio12 opened this issue Apr 17, 2023 · 8 comments

@tiagoyukio12

I'm currently reproducing the paper results for the UCI dataset, using the GRU architecture.
My results were almost identical to those reported:
RMSE: 0.745 ± 0.001
MAE: 0.529 ± 0.002

But looking at the recurrent.yaml file used, the output_sequence_length seems to be only 24 steps ahead:

train: False
dataset: 'uci'
exogenous: False
epochs: 1
batch_size: 1024
input_sequence_length: 96
output_sequence_length:  24
dropout: 0.0
layers: 1
units: 50
learning_rate: 0.001
cell: 'gru'
l2: 0.0005
MIMO: True
detrend: True

Due to the UCI dataset having 15min sampling frequency, this means the model is forecasting only 6h into the future, instead of the 24h reported.
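The unit mismatch can be illustrated with a small sketch (the function name here is mine, not from the repo):

```python
def horizon_steps(horizon_hours: float, sampling_minutes: int) -> int:
    """Convert a wall-clock forecast horizon into a number of model steps."""
    return int(horizon_hours * 60 / sampling_minutes)

# With the UCI dataset's 15-min sampling frequency:
print(horizon_steps(24, 15))  # 96 steps are needed to cover a 24h horizon
print(horizon_steps(6, 15))   # output_sequence_length: 24 covers only 6h
```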

@tiagoyukio12
Author

For clarification, the config/recurrent.yaml parameter output_sequence_length is used in the dts/examples/recurrent.py main function, on line 62:

X_train, y_train = get_rnn_inputs(train,
                                  window_size=params['input_sequence_length'],
                                  horizon=params['output_sequence_length'],
                                  shuffle=True,
                                  multivariate_output=True)

From a snippet of the get_rnn_inputs docstring (implemented in dts/utils/split.py):

"""
:param horizon: int
    Forecasting horizon, the number of future steps that have to be forecasted
"""

On line 119 of dts/utils/split.py, we can see the targets list is made of slices of size horizon:

targets.append(
                X[i + window_size: i + window_size + horizon])
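For context, that windowing logic amounts to something like this simplified sketch (not the repo's exact implementation): each target is a slice of length horizon starting right after the input window, so horizon is counted directly in sampling steps.

```python
import numpy as np

def make_windows(X, window_size, horizon):
    """Build (input, target) pairs by sliding over the series; the target
    is the `horizon` values immediately following each input window."""
    inputs, targets = [], []
    for i in range(len(X) - window_size - horizon + 1):
        inputs.append(X[i: i + window_size])
        targets.append(X[i + window_size: i + window_size + horizon])
    return np.array(inputs), np.array(targets)

series = np.arange(10)
x, y = make_windows(series, window_size=4, horizon=2)
print(x.shape, y.shape)  # (5, 4) (5, 2)
print(y[0])              # [4 5] -- the 2 steps right after the first window
```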

So there seems to be no correcting factor to convert the desired 24h forecast horizon reported in the paper to a horizon of 96 15-min steps.

I believe an erratum should be issued for the paper, clarifying the UCI dataset results use a 6h forecast horizon instead of 24h.

Your paper was extremely thorough and comprehensive, which is why I am using it as a benchmark; resolving this GitHub issue is important for the accuracy and integrity of my research.

I understand that you must be busy, but I would appreciate any assistance you could provide. Thank you for your time and effort sharing your source code.

@albertogaspar
Owner

Hi, I am sorry for the late reply. The UCI dataset results are correct and use a 24h forecast horizon (output_sequence_length: 96) with input_sequence_length: 384.
I see that in your config file you are using a single epoch, which makes the learning process too short. In the paper I used 200.

To show that the results in the paper were not obtained with a 6h forecast horizon instead of 24h, I ran a simple experiment: I used your settings with a (slightly) higher number of epochs (which is of course not optimal):

train: False
dataset: 'uci'
exogenous: False
epochs: 10
batch_size: 1024
input_sequence_length: 96
output_sequence_length:  24
dropout: 0.0
layers: 1
units: 50
learning_rate: 0.001
cell: 'gru'
l2: 0.0005
MIMO: True
detrend: True

obtaining the following results (shown in brackets are the results presented in the paper for GRU-MIMO with a 24h forecast horizon):

RMSE: 0.72 (0.75 ± 0.0)
MAE: 0.514 (0.52 ± 0.0)
NRMSE: 9.47 (9.83 ± 0.03)
R2: 0.333 (0.279 ± 0.004)

As you can see (and I encourage you to try it yourself), the results obtained for a 6h horizon are better than what is presented in the paper, even though the training of the model was cut short.

@tiagoyukio12
Author

Thank you for your response and for clarifying the forecast horizon discrepancy. I appreciate your efforts in running the experiment with my configuration settings and providing the results.

However, despite your explanation and the additional experiment, I'm still unable to reproduce the exact results mentioned in the paper for the UCI dataset. Even with a training duration of 200 epochs, the obtained results differ slightly from the reported values.

Here are the results I obtained with the updated configuration:

  • RMSE: 0.67 ± 0.01
  • MAE: 0.51 ± 0.01
{'mse': 0.43373233, 'mae': 0.49384913, 'nrmse_a': 1.3187281, 'nrmse_b': 745.49725, 'nrmsd': 0.1212474, 'r2': 0.3009019, 'smape': 35.21651, 'mape': 72.9558}
{'mse': 0.45289233, 'mae': 0.51428014, 'nrmse_a': 1.295853, 'nrmse_b': 761.7854, 'nrmsd': 0.123896495, 'r2': 0.27001953, 'smape': 37.224, 'mape': 80.34695}
{'mse': 0.4351429, 'mae': 0.49716938, 'nrmse_a': 1.2940199, 'nrmse_b': 746.70856, 'nrmsd': 0.121444404, 'r2': 0.29862827, 'smape': 35.53708, 'mape': 71.81708}
{'mse': 0.49264866, 'mae': 0.54142165, 'nrmse_a': 1.2457576, 'nrmse_b': 794.51807, 'nrmsd': 0.12922013, 'r2': 0.20593947, 'smape': 38.69682, 'mape': 86.43704}
{'mse': 0.43954045, 'mae': 0.50468266, 'nrmse_a': 1.2942526, 'nrmse_b': 750.4721, 'nrmsd': 0.122056514, 'r2': 0.29154032, 'smape': 36.274437, 'mape': 76.50844}
{'mse': 0.4387224, 'mae': 0.5003349, 'nrmse_a': 1.3664513, 'nrmse_b': 749.7735, 'nrmsd': 0.12194287, 'r2': 0.29285884, 'smape': 35.66649, 'mape': 73.54895}
{'mse': 0.45094696, 'mae': 0.5106664, 'nrmse_a': 1.3728217, 'nrmse_b': 760.1476, 'nrmsd': 0.123630114, 'r2': 0.27315503, 'smape': 36.90767, 'mape': 79.35831}
{'mse': 0.46145782, 'mae': 0.52123094, 'nrmse_a': 1.2773947, 'nrmse_b': 768.9555, 'nrmsd': 0.12506263, 'r2': 0.25621337, 'smape': 37.673584, 'mape': 81.25483}
{'mse': 0.4475309, 'mae': 0.5054144, 'nrmse_a': 1.2984899, 'nrmse_b': 757.2628, 'nrmsd': 0.12316096, 'r2': 0.27866113, 'smape': 36.24459, 'mape': 75.25322}
{'mse': 0.4330459, 'mae': 0.5002167, 'nrmse_a': 1.2826424, 'nrmse_b': 744.9071, 'nrmsd': 0.12115142, 'r2': 0.3020084, 'smape': 36.32402, 'mape': 75.93276}

These results were obtained after running the experiment 10 times using this configuration:

train: False
dataset: 'uci'
exogenous: False
epochs: 200
batch_size: 1024
input_sequence_length: 384
output_sequence_length:  96
dropout: 0.0
layers: 1
units: 50
learning_rate: 0.001
cell: 'gru'
l2: 0.0005
MIMO: True
detrend: True

I obtained most of these hyperparameters from Table 4 of your paper for GRU-MIMO.

I couldn't find the batch_size and learning_rate in the paper, so I left them at the default values found in this repo.

Can you confirm if the paper really used batch_size = 1024 and learning_rate = 0.001 to generate the results?
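For reference, the RMSE summary above can be recomputed from the per-run dicts, assuming RMSE = sqrt(mse):

```python
import math
import statistics

# Per-run 'mse' values from the ten runs listed above.
mses = [0.43373233, 0.45289233, 0.4351429, 0.49264866, 0.43954045,
        0.4387224, 0.45094696, 0.46145782, 0.4475309, 0.4330459]

rmses = [math.sqrt(m) for m in mses]
print(f"RMSE: {statistics.mean(rmses):.2f} ± {statistics.stdev(rmses):.2f}")
# RMSE: 0.67 ± 0.01
```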

@albertogaspar
Owner

Yes, the batch size and the learning rate are correct. Note that, as written in the README file, the code changed slightly before being published, so some differences may be observed.

@tiagoyukio12
Author

Thank you for your previous response. I have thoroughly reviewed the code and could not find any apparent errors or issues that could explain the deviations in the obtained results. I would appreciate it if you could provide more information about the changes made to the code before publication, as this would help me understand the potential factors contributing to the differences.
Alternatively, if possible, could you share with me the latest version of the code?
Thank you for your assistance, and I look forward to your response.

@albertogaspar
Owner

The code in the repo is the latest version. The code for the experiments in the paper was refactored and then published here. This is why some differences can be observed.

@tiagoyukio12
Author

I would really appreciate it if you could share the original code used in the experiments, so I can understand the observed differences.

@albertogaspar
Owner

Unfortunately I only have the refactored code. I am really sorry for that.
