
Recursive Least Squares with Exponential Forgetting #75

Open · 4 tasks
buybackoff opened this issue Dec 9, 2016 · 8 comments

Comments


buybackoff commented Dec 9, 2016

Rewrite this R implementation in efficient non-allocating .NET code https://gist.github.com/buybackoff/bfeb9c8959157dbf86310d383c93f00d

  • Should return a structure with the coefficients, the current residual value, and the EW variance of the residuals with the same alpha.
  • Return t-stats for the coefficients, R^2, F-stat, etc. We could calculate an EW R^2 just by using two EW variances - one for Y and one for the residuals. The t-stats could be calculated from the xpxi matrix, which is a part of the state.
  • A serial correlation test could be done as another online regression of current on previous residuals, but it should be optional (or we could just take the residuals from the output and apply RLSEF to them; the t-stat of that coefficient is then the DW test).
  • Test with alpha = 1 and compare to the known results for all stats (the R version is OK).

NB: since at each step the matrix sizes are limited by the number of X variables, a manual implementation of mmult/minverse could be cheaper than calling any "optimized" library.
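
For orientation, here is a minimal sketch of what a single exponentially-forgetting RLS step could look like in non-allocating .NET code. The names (`RlsState`, `Rls.Step`), the diffuse-prior `delta`, and the exact EW-variance weighting are illustrative assumptions, not the gist's API; `lambda` is assumed to play the role of the gist's alpha.

```csharp
// Sketch of one RLS step with exponential forgetting. State: coefficient vector
// Beta and P ~ (X'WX)^-1, the inverse of the exponentially weighted X'X.
// All names here are illustrative assumptions, not the repo's API.
public sealed class RlsState
{
    public readonly double[] Beta;   // coefficients
    public readonly double[,] P;     // inverse of the weighted X'X (xpxi)
    public double Residual;          // last a-priori residual
    public double EwVariance;        // EW variance of the residuals, same forgetting

    public RlsState(int k, double delta = 1000.0)
    {
        Beta = new double[k];
        P = new double[k, k];
        for (int i = 0; i < k; i++) P[i, i] = delta; // large diagonal = diffuse prior (assumed)
    }
}

public static class Rls
{
    public static void Step(RlsState s, double[] x, double y, double lambda)
    {
        int k = s.Beta.Length;
        // px = P * x and xpx = x' P x (P is symmetric)
        var px = new double[k]; // a truly non-allocating version would keep this buffer in the state
        double xpx = 0.0;
        for (int i = 0; i < k; i++)
        {
            double acc = 0.0;
            for (int j = 0; j < k; j++) acc += s.P[i, j] * x[j];
            px[i] = acc;
            xpx += x[i] * acc;
        }
        double denom = lambda + xpx;

        // a-priori residual e = y - x'beta
        double yHat = 0.0;
        for (int i = 0; i < k; i++) yHat += x[i] * s.Beta[i];
        double e = y - yHat;

        // beta += (P x / denom) * e;   P = (P - (P x)(P x)' / denom) / lambda
        for (int i = 0; i < k; i++) s.Beta[i] += px[i] * e / denom;
        for (int i = 0; i < k; i++)
            for (int j = 0; j < k; j++)
                s.P[i, j] = (s.P[i, j] - px[i] * px[j] / denom) / lambda;

        s.Residual = e;
        // EW variance of residuals; the exact weighting convention here is an assumption
        s.EwVariance = lambda * s.EwVariance + (1.0 - lambda) * e * e;
    }
}
```

Each step is O(K^2) in the number of regressors and never touches the history, which is also why hand-rolled small-matrix code for K <= 4 (next comment) is enough.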

@buybackoff

Code for up to 4x4 is there in C. Above 4 is not practical in reality, because we would use the rolling regression only when we are "quite confident" in the model and already have some meaningful interpretation of the betas, which we then re-estimate in online mode. I have never seen a trustworthy linear regression with more than 2-3 real variables (not dummy ones).

(An interesting, but at the moment theoretical, question is dummy variables, for which we could monitor rolling t-stats just as a signal that some binary factor becomes significant - there could be many of them. However, this usually works on large data and in exploratory mode while refining the "true" meaningful model.)

https://github.com/niswegmann/small-matrix-inverse
http://stackoverflow.com/questions/1148309/inverting-a-4x4-matrix
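
For illustration only, closed-form cofactor inverses for the 2x2 and 3x3 cases (the 4x4 case follows the same pattern, see the small-matrix-inverse link above); the class and method names are made up for this sketch:

```csharp
// Illustration only: closed-form cofactor inverses for the tiny K x K matrices this
// issue cares about. No pivoting, so the caller is responsible for the matrix being
// well conditioned (which X'X with 2-3 sane regressors normally is).
public static class SmallInverse
{
    public static double[,] Inverse2x2(double[,] a)
    {
        double inv = 1.0 / (a[0, 0] * a[1, 1] - a[0, 1] * a[1, 0]);
        return new[,]
        {
            {  a[1, 1] * inv, -a[0, 1] * inv },
            { -a[1, 0] * inv,  a[0, 0] * inv }
        };
    }

    public static double[,] Inverse3x3(double[,] a)
    {
        // first-row cofactors: they form the first column of the adjugate and
        // give the determinant via expansion along row 0
        double c00 = a[1, 1] * a[2, 2] - a[1, 2] * a[2, 1];
        double c01 = a[1, 2] * a[2, 0] - a[1, 0] * a[2, 2];
        double c02 = a[1, 0] * a[2, 1] - a[1, 1] * a[2, 0];
        double inv = 1.0 / (a[0, 0] * c00 + a[0, 1] * c01 + a[0, 2] * c02);
        return new[,]
        {
            { c00 * inv, (a[0, 2] * a[2, 1] - a[0, 1] * a[2, 2]) * inv, (a[0, 1] * a[1, 2] - a[0, 2] * a[1, 1]) * inv },
            { c01 * inv, (a[0, 0] * a[2, 2] - a[0, 2] * a[2, 0]) * inv, (a[0, 2] * a[1, 0] - a[0, 0] * a[1, 2]) * inv },
            { c02 * inv, (a[0, 1] * a[2, 0] - a[0, 0] * a[2, 1]) * inv, (a[0, 0] * a[1, 1] - a[0, 1] * a[1, 0]) * inv }
        };
    }
}
```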

@buybackoff

  • It would be interesting to estimate the constant factor; at first glance the big O should be NNK vs K^3 just for xpx0.


buybackoff commented Jan 10, 2017

  • It is also possible to make a simple moving regression by applying SMA logic to the elements of the cov/var matrices: for a window of Xs and Ys of size M we know which values leave the window, so we can subtract their contribution in the same way we add the contribution of the new ones (see the sketch below). A simple moving regression will not carry large outliers forever and can be reproduced from any key, while exponential forgetting depends on the starting point.
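
A minimal sketch of that idea, assuming a fixed window of size M and the hypothetical `SmallInverse` helper sketched above (with K = 2 here); names and structure are illustrative, not the actual implementation:

```csharp
using System.Collections.Generic;

// Sketch of a fixed-window ("simple moving") regression: keep running X'X and X'y
// sums over the last M observations, add the contribution of the entering point
// and subtract the contribution of the leaving one.
public sealed class MovingRegression
{
    private readonly int _m;                          // window size M
    private readonly Queue<(double[] x, double y)> _window;
    private readonly double[,] _xpx;                  // running X'X over the window
    private readonly double[] _xpy;                   // running X'y over the window

    public MovingRegression(int k, int m)
    {
        _m = m;
        _window = new Queue<(double[] x, double y)>(m + 1);
        _xpx = new double[k, k];
        _xpy = new double[k];
    }

    // Returns the current betas, or null until the window is full.
    public double[]? Update(double[] x, double y)
    {
        int k = _xpy.Length;
        Accumulate(x, y, +1.0);
        _window.Enqueue(((double[])x.Clone(), y));
        if (_window.Count > _m)
        {
            var (xOld, yOld) = _window.Dequeue();
            Accumulate(xOld, yOld, -1.0);             // subtract the leaving observation
        }
        if (_window.Count < _m) return null;

        // betas = (X'X)^-1 X'y; a closed-form inverse is enough for K <= 4
        var inv = SmallInverse.Inverse2x2(_xpx);      // assumes K == 2 in this sketch
        var beta = new double[k];
        for (int i = 0; i < k; i++)
            for (int j = 0; j < k; j++)
                beta[i] += inv[i, j] * _xpy[j];
        return beta;
    }

    private void Accumulate(double[] x, double y, double sign)
    {
        for (int i = 0; i < _xpy.Length; i++)
        {
            _xpy[i] += sign * x[i] * y;
            for (int j = 0; j < _xpy.Length; j++) _xpx[i, j] += sign * x[i] * x[j];
        }
    }
}
```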

buybackoff added a commit that referenced this issue Jan 27, 2017
…erformance test vs non-online algo. Also need weights, probably some multiplier to reduce weights in the similar way to exponential forgetting. But won't do the original exponential version because it depends on the starting point.

buybackoff commented Jan 27, 2017

  • The simple moving regression is done as a proof of concept. It is not profiled for allocations, performance, etc.; the betas just look correct but have not been compared to the exact non-online output. It is commented out temporarily!

@buybackoff

The state is not cached; we recalculate the lagged xpxi on every move since it is done via zipn, but that is straightforward to fix via a manual implementation of BindCursor. The math is the same.

@buybackoff

But we do just twice the work of the window size M, not of N, so this is already online as well.

@buybackoff

Review dotnet/corefx#31779: small matrices are the most common case.

@buybackoff

https://en.wikipedia.org/wiki/Sherman%E2%80%93Morrison_formula
https://en.wikipedia.org/wiki/Recursive_least_squares_filter

Compared to most of its competitors, the RLS exhibits extremely fast convergence. However, this benefit comes at the cost of high computational complexity.

https://en.wikipedia.org/wiki/Woodbury_matrix_identity

This is applied, e.g., in the Kalman filter and recursive least squares methods, to replace the parametric solution, requiring inversion of a state vector sized matrix, with a condition equations based solution. In case of the Kalman filter this matrix has the dimensions of the vector of observations, i.e., as small as 1 in case only one new observation is processed at a time. This significantly speeds up the often real time calculations of the filter.

In the original implementation here only (X'X) was calculated in an online fashion, but the most expensive part is inverting it. cblas_?syrk + cblas_?symm + Sherman-Morrison should be O(N)-efficient, but may suffer from native call overhead for small N.
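
A minimal sketch of the Sherman-Morrison update referenced above, in the same illustrative C# as the earlier snippets: once (X'X)^-1 itself is part of the state, adding or dropping a single observation never needs a full re-inversion.

```csharp
// Sketch of the Sherman-Morrison rank-1 update: given aInv = (X'X)^-1, fold one
// observation x in (sign = +1) or out (sign = -1) without re-inverting.
// (A + sign * x x')^-1 = A^-1 - sign * (A^-1 x)(A^-1 x)' / (1 + sign * x' A^-1 x)
public static class ShermanMorrison
{
    public static void RankOneUpdate(double[,] aInv, double[] x, double sign)
    {
        int k = x.Length;
        var v = new double[k];      // v = A^-1 x; since A^-1 is symmetric, x' A^-1 = v'
        double xAx = 0.0;
        for (int i = 0; i < k; i++)
        {
            double acc = 0.0;
            for (int j = 0; j < k; j++) acc += aInv[i, j] * x[j];
            v[i] = acc;
            xAx += x[i] * acc;
        }
        double denom = 1.0 + sign * xAx;
        for (int i = 0; i < k; i++)
            for (int j = 0; j < k; j++)
                aInv[i, j] -= sign * v[i] * v[j] / denom;
    }
}
```

For exponential forgetting the same update applies after scaling, since (lambda * A)^-1 = A^-1 / lambda, then the new x x' is added via the rank-1 formula; with sign = -1 it removes the observation leaving a fixed window, so the simple moving regression above can also avoid a per-step inverse.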
