very slow on freq='D' or 365 #63

Open
ahmad-shahi opened this issue Sep 1, 2023 · 8 comments

Comments

@ahmad-shahi

When I run pydlm on my daily data with a seasonality of 365, it is very slow.

@wwrechard
Owner

Hi @ahmad-shahi, could you share a few more details? A seasonality of 365 creates a 365-dimensional transition matrix. Running the Kalman filter with it means doing a 365-dimensional matrix multiplication and inversion at every step, so it is reasonable for it to take some time to finish. I tested it locally: with 1000 data points it finishes in roughly 30s, with 2000 data points in roughly 68s, and so on. Please let me know if this matches your observations.

Also, please make sure a seasonality of 365 is actually what you want. I assume you want to model the day-of-year pattern, but a seasonality of 365 might not give you that because of leap years. I would rather use the dynamic component to create the day-of-year pattern (a 366-dimensional vector) than use seasonality. It should also be faster.
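To illustrate the leap-year point, here is a minimal sketch using only the standard library. The helper name `phase` and the start date are assumptions made for illustration; they are not part of pydlm. It computes where the same calendar date falls inside a fixed 365-day cycle, counted from the first observation.

```python
from datetime import date


def phase(day, start, period=365):
    """Position of `day` inside a fixed-length cycle that begins at `start`."""
    return (day - start).days % period


# Hypothetical series starting on 1 June 2022.
start = date(2022, 6, 1)

# The same calendar date, one year apart each time.
phases = {year: phase(date(year, 6, 1), start) for year in (2022, 2023, 2024, 2025)}

for year, p in phases.items():
    print(year, p)
```

Once 29 February 2024 has passed, 1 June no longer lands on the same position of the 365-day cycle, so a fixed period of 365 slowly drifts away from the calendar.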

@ahmad-shahi
Author

Hi, thanks for looking into the problem. My data is seasonal, starting in June and ending at the end of May, and this pattern repeats. My data is collected daily.

Based on your explanation, I believe it will still be slow. However, your previous version was much faster.
I will try the dynamic component and see how it goes.

Thanks

@ahmad-shahi
Author

from pydlm import dlm, trend, seasonality

# A linear trend
linear_trend = trend(degree=1, discount=0.95, name='linear_trend', w=10)

# A seasonality
seasonal365 = seasonality(period=365, discount=0.99, name='seasonal365', w=10)

# Build a simple dlm
simple_dlm = dlm(time_series) + linear_trend + seasonal365

# Fit the model
simple_dlm.fit()

# Plot the fitted results
simple_dlm.turnOff('data points')
simple_dlm.plot()

@wwrechard
Owner

> Hi, thanks for looking into the problem. My data is seasonal, starting in June and ending at the end of May, and this pattern repeats. My data is collected daily.
>
> Based on your explanation, I believe it will still be slow. However, your previous version was much faster. I will try the dynamic component and see how it goes.
>
> Thanks

That's very interesting. Let me take a deeper look and see what is happening here. Will keep you posted.

@wwrechard
Owner

I just did a quick profiling run for a 1000-point time series. Most of the time was spent in numpy functions: dot (7s), and pinv and svd (20s). Let me revert the numpy version and see if there is any regression there.

@wwrechard
Owner

Hi @ahmad-shahi, I tested a few Python and numpy versions with 1000 data points and a seasonality of 365, and did not find a better-performing one.

| Python version | numpy version | Profiling of svd | Profiling of dot |
|---|---|---|---|
| 3.11 | 1.25 | 20s | 7s |
| 3.8 | 1.20 | 21s | 7s |
| 3.6 | 1.70 | 62s | 7s |

I profiled numpy's dot() and pinv() independently: 1000 dot() calls on a 365-dimensional matrix take roughly 0.8s, and 1000 pinv() calls on a 365-dimensional matrix take roughly 22s. In pydlm, the Kalman filter does roughly 5 dot() calls per step in each of fitForwardFilter() and fitBackwardSmoother(), i.e. about 10 per step, which gives a total of 8s assuming 1000 steps. fitBackwardSmoother() also does 1 exact pinv() per step, which gives a total of 22s. The result is 30s, which seems to match the profiling data.
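For anyone who wants to repeat this measurement on their own machine, a minimal sketch follows. The function names (`time_calls`, `benchmark`, `estimate_fit_seconds`) are hypothetical helpers written for this example, not pydlm functions, and the random square matrices are an assumption standing in for the real filter state. Timings will vary with hardware and the BLAS/LAPACK build behind numpy.

```python
import time

import numpy as np


def time_calls(fn, n_calls):
    """Total wall-clock seconds spent calling `fn` `n_calls` times."""
    start = time.perf_counter()
    for _ in range(n_calls):
        fn()
    return time.perf_counter() - start


def benchmark(dim=365, n_calls=1000, seed=0):
    """Time np.dot and np.linalg.pinv on random dim x dim matrices."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((dim, dim))
    b = rng.standard_normal((dim, dim))
    dot_total = time_calls(lambda: np.dot(a, b), n_calls)
    pinv_total = time_calls(lambda: np.linalg.pinv(a), n_calls)
    return dot_total, pinv_total


def estimate_fit_seconds(dot_per_call, pinv_per_call, n_steps,
                         dots_per_step=10, pinvs_per_step=1):
    """Back-of-envelope cost: roughly 10 dot() and 1 pinv() per time step."""
    return n_steps * (dots_per_step * dot_per_call + pinvs_per_step * pinv_per_call)
```

Calling `benchmark()` with its defaults reproduces the 1000-call measurement described above and may take tens of seconds. Plugging in the figures from this thread, `estimate_fit_seconds(0.8 / 1000, 22 / 1000, 1000)` gives about 30 seconds, which is the 8s + 22s breakdown.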

@ahmad-shahi
Author

Thanks for the details and clarification. What is the alternative for running a DLM with a seasonality of 365? You mentioned using the dynamic component; could you please share an example of how to use it? I did not find one in the docs. Thanks again, and I appreciate your good work.

@wwrechard
Owner

Yeah, it's not currently implemented. The basic idea is to get a list of datetime.date for the time series and convert it into a list of 366-dimensional vectors, with the coordinate for the day of the year set to 1 and all other coordinates set to zero. I'll see if I can find time to implement one.
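Until that exists, the feature construction described above can be sketched with the standard library alone. The helper `day_of_year_features` and the example date range are assumptions for illustration, not part of pydlm. The commented lines at the end show how the result might be passed to pydlm's `dynamic` component; that wiring is untested here, and the discount and `w` values are placeholders.

```python
from datetime import date, timedelta


def day_of_year_features(dates, dim=366):
    """One-hot encode each date by its day of the year (1..366)."""
    features = []
    for d in dates:
        row = [0.0] * dim
        # tm_yday is 1-based, so shift it to a 0-based index.
        row[d.timetuple().tm_yday - 1] = 1.0
        features.append(row)
    return features


# Hypothetical two-year daily series starting on 1 June 2023.
start = date(2023, 6, 1)
dates = [start + timedelta(days=i) for i in range(730)]
features = day_of_year_features(dates)

# Possible use with pydlm (assumed, not verified in this thread):
# from pydlm import dlm, trend, dynamic
# day_of_year = dynamic(features=features, discount=0.99, name='day_of_year', w=10)
# model = dlm(time_series) + trend(degree=1, discount=0.95, name='linear_trend', w=10) + day_of_year
```

The feature list must have one row per observation, in the same order as the time series.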
