Time complexity (execution time) #31

Open
buhbuhtig opened this issue Jul 29, 2018 · 6 comments

Comments

buhbuhtig commented Jul 29, 2018

It's clear to me why the .fit() time complexity (execution time) depends on the time series length, probably O(N^2). But why is the same true for predictN(n=1), as I found out? The dependence looks like something between O(N^2) and O(N^3).
Thanks.

buhbuhtig changed the title from "Prediction performance (execution time)" to "Time complexity (execution time)" on Jul 29, 2018

buhbuhtig (Author) commented

OK, I found a possible bottleneck: the predictModel = deepcopy(self) line within the predict method (pydlm/dlm.py). It seems to me this is far from optimal in terms of both memory and CPU.
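
For illustration, a rough timing sketch of the effect (a hedged example assuming the usual pydlm workflow of dlm, trend, fit, append, fitForwardFilter and predictN; the series lengths are arbitrary):

```python
import time
from pydlm import dlm, trend

# build and fit a small model, then keep appending data and timing predictN(N=1)
data = [0.0] * 100
model = dlm(data) + trend(degree=1, discount=0.98)
model.fit()

for _ in range(5):
    start = time.time()
    model.predictN(N=1)  # internally deep-copies the whole model
    print('series length %d: predictN took %.4f s' % (len(data), time.time() - start))

    new_points = [0.0] * 200
    model.append(new_points, component='main')
    model.fitForwardFilter()  # continue filtering over the appended data
    data += new_points
```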

wwrechard (Owner) commented

fit() should be O(N). Yes, predictN() could be much improved. The deepcopy exists in the current implementation because my original implementation of predictN() modified the model in place, i.e., it changed the model status for prediction purposes. That implementation was fast but caused many problems. The mutability makes it hard to track the model status: once the model has been used to make a prediction, it can no longer accept new data for continuous filtering unless the last status was memorized somewhere. Worse, if predictN() is called at some middle time point, it might modify the latent states and mess up the results.

A quick mitigation is to ask clients to copy the original model when they want to predict. The copied model can then be modified and used for prediction while the original model continues to work as is. This is why the deepcopy appears in the code.
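
For example, a minimal sketch of that client-side pattern (myDLM and new_observations are placeholder names for a fitted pydlm.dlm instance and freshly arrived data):

```python
from copy import deepcopy

# predict on a throwaway copy so the original model keeps its filtering state
prediction_model = deepcopy(myDLM)  # myDLM: a fitted pydlm.dlm (placeholder name)
predicted_mean, predicted_var = prediction_model.predictN(N=1)

# the original model keeps absorbing new data as if no prediction had happened
myDLM.append(new_observations, component='main')  # new_observations: placeholder
myDLM.fitForwardFilter()
```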

I've started a new repository called pydlm-lite, in which the structure of the dlm class will be redesigned. The goal is to make the model immutable, with the model status and data passed in and out together to avoid these issues. The complexity improvement will most likely land there.

buhbuhtig (Author) commented Aug 2, 2018

Thank you for the clear and helpful explanation.
I'm trying to use pydlm for "online" predictions (add data, predict one step ahead, add data, predict one step ahead, and so on).
Since deepcopy re-copies the continuously growing data, each call takes more and more time.
The refactoring is greatly anticipated. In the short term, however, I found a simple way to avoid the unnecessary deep copying of the arrays: using monkey-patching, I rewrote deepcopy and excluded some fields from cloning (keeping them as references instead). For one-day-ahead out-of-sample prediction it works for me. Yes, it's an ugly solution...
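
For reference, a hedged sketch of that kind of monkey patch. The set of attributes to share is a placeholder (the real field names depend on the pydlm internals), so treat this as an illustration of the idea rather than a drop-in fix:

```python
import copy
from pydlm import dlm

SHARED_FIELDS = {'data'}  # placeholder: heavy, effectively read-only attributes to share

def _partial_deepcopy(self, memo):
    # clone the object but keep plain references to the fields listed above
    clone = self.__class__.__new__(self.__class__)
    memo[id(self)] = clone
    for name, value in self.__dict__.items():
        if name in SHARED_FIELDS:
            clone.__dict__[name] = value  # shared by reference, not copied
        else:
            clone.__dict__[name] = copy.deepcopy(value, memo)
    return clone

dlm.__deepcopy__ = _partial_deepcopy  # monkey patch: deepcopy(model) now uses this
```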

romainjln commented Dec 14, 2018

Hi,

I was experiencing a similar issue (with a use case much like buhbuhtig's). Calling predictN() several times in a row seems to cause an exponential increase in the method's running time.

This increase in running time actually comes from the combination of deepcopy(model) and self._predictModel = predictModel in the predict() method.

Because self._predictModel is not reset to None at the end of predict() or predictN(), the deepcopy in consecutive calls grows exponentially in size (it copies a model whose self._predictModel itself holds a self._predictModel, and so on).

Setting self._predictModel to None at the end of predict/predictN should, I think, solve part of the problem.
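
A hedged sketch of that suggestion written as a thin client-side wrapper (rather than a patch inside pydlm), which keeps consecutive deep copies bounded in size:

```python
def predict_one_step(model):
    # call predictN and then drop the cached copy so the next deepcopy
    # does not drag a nested _predictModel chain along with it
    try:
        return model.predictN(N=1)
    finally:
        model._predictModel = None  # the private attribute discussed above
```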

wwrechard (Owner) commented

Ah, that's a good call! Thank you!

I'm adding a new predictN function which should give constant prediction time (or linear in the prediction length). But this is a quick fix for the current state of things; I will push an update shortly.

wwrechard (Owner) commented

Thanks for both! The fix has been checked in and released on PyPI.
