[BUG] Unexpectedly High Forecast Values in Batch Prediction with cuML's Auto ARIMA #5838
Comments
After several attempts, I determined that the problem lies in auto-ARIMA's model search. Even for the same sequence, the search can select a different best model depending on how the batch is composed. Here is a minimal example.
@Nyrio I noticed that you are the main contributor to this code. Could you provide some help and mark this as a feature? Alternatively, I can try to fix this bug myself.
I have further confirmed the problem: when ARIMA is fitted with method='ml', the results for batch input and individual input differ, whereas method='css' does not show this problem.
@Apolsus thanks for the issue and reproducer! Would using the |
No, the css method also causes this at a much larger batch size (10000). I checked the source code. In theory, parameter optimization is performed independently for each sequence, but some sequences fail to converge in batch prediction even though they converge in individual prediction.
After looking into it (with help from @Nyrio), this seems to stem from numerical stability issues, particularly around the different code paths taken for different batch sizes. It might take some time to create workarounds or a general fix, but we will look into it as soon as we can.
Describe the bug
I'm having some issues using cuML's Auto ARIMA model for large-scale time series forecasting. Specifically, when I ran a batch forecast on about 50,000 time series, I got some unusually high values in the forecast results. However, when I pick one of the anomalous sequences out of the data and forecast it alone, I get a normal prediction.
Steps/Code to reproduce bug
It is hard to describe here because the data is private and large. One of the sequences is:
[224, 69, 115, 94, 59, 63, 60, 52, 87, 118, 132, 149, 139, 89, 97, 115, 98, 82, 55, 77, 96, 133, 112, 92, 170, 128, 94, 84, 63, 75, 56, 77, 85, 121, 126, 101, 197, 98, 89, 71, 72, 30, 47, 73, 69, 106, 110, 128]
Batch prediction gives '7405687891.374923'.
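To show how far off that value is, here is a small sketch that flags a forecast as implausible when it is not finite or is much larger than anything in the history. The helper name `is_implausible` and the threshold `factor=10.0` are my own assumptions for illustration and are not part of cuML. Only the series and the batch value come from the report above.

```python
import math

# Series quoted in the report above (48 observations).
series = [224, 69, 115, 94, 59, 63, 60, 52, 87, 118, 132, 149,
          139, 89, 97, 115, 98, 82, 55, 77, 96, 133, 112, 92,
          170, 128, 94, 84, 63, 75, 56, 77, 85, 121, 126, 101,
          197, 98, 89, 71, 72, 30, 47, 73, 69, 106, 110, 128]

# Value reported for the batch forecast of this series.
batch_forecast = 7405687891.374923


def is_implausible(forecast, history, factor=10.0):
    """Return True if the forecast is not finite, or if its magnitude
    exceeds `factor` times the largest magnitude seen in `history`.

    `factor` is an arbitrary threshold chosen for illustration only.
    """
    if not math.isfinite(forecast):
        return True
    bound = factor * max(abs(v) for v in history)
    return abs(forecast) > bound


print(is_implausible(batch_forecast, series))
```

A check like this could be run over every column of a large batch to find the affected series without inspecting 50,000 forecasts by hand.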
Expected behavior
The results of individual predictions and batch predictions should be the same.
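A rough sketch of how this expectation could be checked is below. The comparison helper `divergent_columns` is plain Python and hypothetical (the name and tolerances are my own). `forecast_batch_and_single` assumes the `cuml.tsa.arima.ARIMA` interface as I understand it from the cuML documentation (`order`, `fit_intercept`, `output_type`, `fit(method=...)`, `forecast(nsteps)`), so please verify it against your installed version. It uses a fixed order rather than the Auto ARIMA search, so it only isolates the fitting step.

```python
import math


def divergent_columns(batch_fc, single_fc, rel_tol=1e-3, abs_tol=1e-6):
    """Return indices of series whose batch and individual forecasts disagree.

    Both arguments are sequences with one inner list per series and
    one value per forecast step. The tolerances are illustrative defaults.
    """
    bad = []
    for i, (b, s) in enumerate(zip(batch_fc, single_fc)):
        same = len(b) == len(s) and all(
            math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
            for x, y in zip(b, s)
        )
        if not same:
            bad.append(i)
    return bad


def forecast_batch_and_single(y, order=(1, 1, 1), nsteps=1, method="ml"):
    """Fit ARIMA once on the whole batch and once per column.

    Requires NumPy and a GPU with cuML; both are imported lazily so the
    helper above stays usable without them.
    `y` is array-like with shape (n_obs, batch_size).
    """
    import numpy as np
    from cuml.tsa.arima import ARIMA

    y = np.asarray(y, dtype=np.float64)

    # One model for all series at once.
    batch_model = ARIMA(y, order=order, fit_intercept=True,
                        output_type="numpy")
    batch_model.fit(method=method)
    batch_fc = batch_model.forecast(nsteps)  # assumed shape (nsteps, batch_size)

    # One model per series, fitted in isolation.
    single_fc = []
    for j in range(y.shape[1]):
        model = ARIMA(y[:, j:j + 1], order=order, fit_intercept=True,
                      output_type="numpy")
        model.fit(method=method)
        single_fc.append(model.forecast(nsteps)[:, 0].tolist())

    return batch_fc.T.tolist(), single_fc
```

Usage would be something like `batch_fc, single_fc = forecast_batch_and_single(y)` followed by `divergent_columns(batch_fc, single_fc)`. An empty list means the two paths agree within tolerance.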
Environment details (please complete the following information):