Paper results show different forecast horizon than reported #6
For clarification, the training inputs are built with:

```python
X_train, y_train = get_rnn_inputs(train,
                                  window_size=params['input_sequence_length'],
                                  horizon=params['output_sequence_length'],
                                  shuffle=True,
                                  multivariate_output=True)
```

From the docstring:

```
:param horizon: int
    Forecasting horizon, the number of future steps that have to be forecasted
```

And on line 119:

```python
targets.append(X[i + window_size: i + window_size + horizon])
```

So there seems to be no correcting factor to convert the 24h forecast horizon reported in the paper into a horizon of 96 15-minute steps. I believe an erratum should be issued for the paper, clarifying that the UCI dataset results use a 6h forecast horizon instead of 24h. Your paper was extremely thorough and comprehensive, and because of this I am using it as a benchmark, so this GitHub issue is important for the accuracy and integrity of my research. I understand that you must be busy, but I would appreciate any assistance you could provide. Thank you for your time and for sharing your source code.
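For reference, the quoted windowing logic can be sketched as a minimal, simplified re-implementation (the `make_windows` helper below is hypothetical, not the repo's `get_rnn_inputs`): each target slice covers exactly `horizon` consecutive samples, with no unit conversion applied anywhere.

```python
import numpy as np

def make_windows(X, window_size, horizon):
    """Simplified sketch of sliding-window input/target construction.

    Each target is the `horizon` samples immediately following its
    input window -- the horizon is counted in raw time steps.
    """
    inputs, targets = [], []
    for i in range(len(X) - window_size - horizon + 1):
        inputs.append(X[i: i + window_size])
        targets.append(X[i + window_size: i + window_size + horizon])
    return np.asarray(inputs), np.asarray(targets)

series = np.arange(200, dtype=float)
inp, tgt = make_windows(series, window_size=96, horizon=24)
print(tgt.shape)  # -> (81, 24): each target spans exactly 24 steps
```

This makes the point above concrete: `horizon` is a number of steps, so at a 15-minute sampling period a value of 24 spans 6 hours, not 24.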
Hi, I am sorry for the late reply. The UCI dataset results are correct and use a 24h forecast horizon. To show that the results in the paper were not obtained with a 6h forecast horizon instead of 24h, I ran a simple experiment: I used your settings with a (slightly) higher number of epochs (which is of course not optimal),
obtaining the following results (shown in brackets are the results presented in the paper for GRU-MIMO with a 24h forecast horizon):
As you can see (and I encourage you to try yourself), the results obtained for a 6h horizon are better than what is presented in the paper, even though the training of the model was cut short.
Thank you for your response and for clarifying the forecast horizon discrepancy. I appreciate your efforts in running the experiment with my configuration settings and providing the results. However, despite your explanation and the additional experiment, I'm still unable to reproduce the exact results mentioned in the paper for the UCI dataset. Even with a training duration of 200 epochs, the obtained results differ slightly from the reported values. Here are the results I obtained with the updated configuration:
These results were obtained after running the experiment 10 times with this configuration:

```yaml
train: False
dataset: 'uci'
exogenous: False
epochs: 200
batch_size: 1024
input_sequence_length: 384
output_sequence_length: 96
dropout: 0.0
layers: 1
units: 50
learning_rate: 0.001
cell: 'gru'
l2: 0.0005
MIMO: True
detrend: True
```

I obtained most of these hyperparameters from Table 4 of your paper for GRU-MIMO. I couldn't find the batch_size and learning_rate in the paper, so I left them at the default values found in this repo. Can you confirm that the paper really used batch_size = 1024 and learning_rate = 0.001 to generate the results?
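As a side note on how such figures are typically aggregated, the "mean ± std" values reported over the 10 runs can be computed with a small sketch like the one below (the input values are synthetic placeholders, not actual metrics from any run):

```python
import statistics

def summarize(metric_runs):
    """Format a list of per-run metric values as 'mean ± std'."""
    mean = statistics.mean(metric_runs)
    std = statistics.stdev(metric_runs)  # sample standard deviation
    return f"{mean:.3f} ± {std:.3f}"

# Synthetic placeholder values, chosen only to make the output obvious:
print(summarize([1.0, 2.0, 3.0]))  # -> "2.000 ± 1.000"
```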
Yes, the batch size and the learning rate are correct. Notice that, as written in the README file, the code changed slightly before being published, so some differences may be observed.
Thank you for your previous response. I have thoroughly reviewed the code and could not find any apparent errors or issues that could explain the deviations in the obtained results. I would appreciate it if you could provide more information about the changes made to the code before publication, as this would help me understand the potential factors contributing to the differences.
The code in the repo is the latest version. The code for the experiments in the paper was refactored and then published here. This is why some differences can be observed. |
I would really appreciate it if you could share the original code used in the experiments, so I can understand the observed differences. |
Unfortunately I only have the refactored code. I am really sorry for that. |
I'm currently reproducing the paper results for the UCI dataset, using the GRU architecture.
My results were almost identical to those reported:
RMSE: 0.745 ± 0.001
MAE: 0.529 ± 0.002
But looking at the recurrent.yaml file used, the output_sequence_length seems to be only 24 steps ahead:
Since the UCI dataset has a 15-minute sampling frequency, this means the model forecasts only 6h into the future, instead of the 24h reported.
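The step-to-hours arithmetic above can be sketched as follows (a hypothetical helper for illustration, not part of the repo):

```python
def steps_to_hours(steps, sampling_minutes=15):
    """Convert a number of forecast steps into hours at a given sampling period."""
    return steps * sampling_minutes / 60

print(steps_to_hours(24))  # 24 steps at 15 min -> 6.0 hours
print(steps_to_hours(96))  # 96 steps at 15 min -> 24.0 hours
```

So a 24h horizon at 15-minute sampling requires output_sequence_length = 96, not 24.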