
Hessian performance #293

Open
Jhsmit opened this issue Feb 5, 2020 · 9 comments
@Jhsmit
Collaborator

Jhsmit commented Feb 5, 2020

I'm having some performance issues with a modestly sized model (37 parameters). The fitting takes a rather long time, and profiling revealed that 80% of the time is spent in calls to covariance_matrix / eval_hessian.

symfit_calltree.pdf

I use the standard LBFGSB minimizer, and when I replace Model with CallableModel the total fitting time drops from 200 seconds to 17 seconds. The fit result is the same.

@pckroon
Collaborator

pckroon commented Feb 5, 2020

Hmmn, that's not great. I'm not sure I'd call 37 parameters modestly sized though ;)

  1. Could you try with GradientModel? That will only use an analytical Jacobian.
  2. The fit result should be the same with Model vs CallableModel for LBFGSB, since LBFGSB doesn't use the Hessian. What could differ is the covariance matrix; could you check that? Even then, there's probably only a difference for non-linear models with multiple components, but I'm not up to speed on the maths/stats.

@Jhsmit
Collaborator Author

Jhsmit commented Feb 5, 2020

GradientModel takes 32 seconds.
The covariance matrix is None for CallableModel as well as for GradientModel.

Actually, on closer inspection, the model has 27 parameters. Math is hard. But compared to the ~300 parameter model I would like to fit, it's still modest.

@pckroon
Collaborator

pckroon commented Feb 6, 2020

GradientModel takes 32 seconds.

Sounds much better. Still long-ish, but I'd say acceptable. Even 200 s could be considered acceptable, depending on the application.

The covariance matrix is None for CallableModel as well as for GradientModel.

That's somewhere between strange and worrying. All Models should result in a covariance matrix, either from directly inverting the Hessian (if it's available) or from the approximation J.T.dot(J). It could be that it's not invertible for your model because you have e.g. codependent parameters.
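For illustration, here is a minimal numpy sketch (the model and all names are made up for this example, not symfit internals) of why codependent parameters make J.T.dot(J) singular, and of the usual covariance estimate when it is invertible:

```python
import numpy as np

# Toy model y = a*x + b*x: a and b are codependent, since only their sum
# a + b is identifiable from the data.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 3.0 * x + rng.normal(scale=0.1, size=x.size)

# Jacobian w.r.t. (a, b): both columns are x, so J^T J has rank 1.
J = np.stack([x, x], axis=1)
JTJ = J.T @ J
print("rank of J^T J:", np.linalg.matrix_rank(JTJ))  # 1 < 2: not invertible

# A well-posed model y = a*x + b has a full-rank J^T J and hence a
# covariance estimate cov ~ s^2 * (J^T J)^-1.
J2 = np.stack([x, np.ones_like(x)], axis=1)
p, *_ = np.linalg.lstsq(J2, y, rcond=None)
residuals = y - J2 @ p
s2 = residuals @ residuals / (x.size - 2)  # residual variance estimate
cov = s2 * np.linalg.inv(J2.T @ J2)
print("covariance diagonal:", np.diag(cov))
```

When the rank-deficient J^T J is passed to a matrix inverse, the inversion fails, which would be one plausible path to a None covariance matrix.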

PS. Do you really need all 300 parameters?

@Jhsmit
Collaborator Author

Jhsmit commented Feb 6, 2020

Yes, I agree, 32 seconds is perfectly acceptable for a 27-parameter model. However, whether 200 s would be acceptable depends, in my opinion, on how much time symfit adds to the fit compared to a scipy-only fit, weighed against the gain in fitting accuracy or, of course, the programming time saved.

For example, in my profiling I find 4,314,861 calls to sympy.core.basic._aresame, with a total time of 100 s (33.8%; the percentages don't really add up to the 200 seconds). Similarly, there are 7308 calls to sympy.core.basic.subs (76%), which originate from 3 calls to jacobian_from_model.
Why do we need 3 calls to get the Jacobian model? This seems like a one-time thing, so perhaps a cached_property of sorts could reduce the total time by half.
And why does subs need to be called roughly 2500 times to generate the Jacobian model? Is this because of all the permutations of derivatives for 27 parameters?
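To see where counts of that order come from, here is a rough sketch (with a toy model, not symfit's actual jacobian_from_model) of how the number of symbolic derivatives scales: m components times n parameters for the Jacobian, and m·n² for the Hessian:

```python
import sympy as sp

# Toy model with m = 2 components and n = 3 parameters (illustrative only).
a, b, c, x = sp.symbols('a b c x')
components = [a * sp.exp(-b * x), b * x + c]
params = [a, b, c]

# Jacobian: one derivative per (component, parameter) pair.
jacobian = [[sp.diff(f, p) for p in params] for f in components]
# Hessian: the gradient of the Jacobian, one more derivative per entry.
hessian = [[[sp.diff(df, q) for q in params] for df in row] for row in jacobian]

n_jac = sum(len(row) for row in jacobian)
n_hess = sum(len(col) for row in hessian for col in row)
print(n_jac, n_hess)  # 6 = 2*3 and 18 = 2*3*3
```

For 9 components and 27 parameters that works out to 9 × 27 = 243 Jacobian entries plus 9 × 27 × 27 = 6561 Hessian entries, which is the right order of magnitude for the thousands of subs calls observed.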

PS. My physical system has a total of ~800 observables, and we have a total of ~1400 'measurements'. However, not all measurements are independent, so the system is underdetermined (aside from being a total pain to fit). Also, not all parameters affect all measurements; i.e., say parameters 0-4 affect measurement 1, parameters 1-6 affect measurement 2, etc. So in a way there are many quasi-independent fitting problems with some coupling between them.

@Jhsmit
Collaborator Author

Jhsmit commented Feb 6, 2020

My model's function_dict has 9 entries, and the number of parameters is 27. However, as a consequence of what I described in the PS above, not all parameters appear in each function.
I suspect that because of this, the nested for loop in jacobian_from_model can be sped up quite a bit by using that information and setting those derivatives to zero without going through the effort of actually calculating them.
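A minimal sympy sketch of the proposed shortcut (illustrative only, not symfit's actual loop): skip the diff call whenever a parameter is absent from a component's free_symbols, since the derivative is then known to be zero.

```python
import sympy as sp

# Toy model: each of the 2 components depends on only 2 of the 4 parameters.
a, b, c, d, x = sp.symbols('a b c d x')
components = [a * x + b, c * sp.sin(d * x)]
params = [a, b, c, d]

skipped = computed = 0
jacobian = []
for f in components:
    row = []
    for p in params:
        if p not in f.free_symbols:
            row.append(sp.Integer(0))  # known zero: no diff call needed
            skipped += 1
        else:
            row.append(sp.diff(f, p))
            computed += 1
    jacobian.append(row)

print(computed, skipped)  # 4 computed, 4 skipped: half the work avoided
```

For a model where each component touches only a handful of the 27 parameters, the fraction of skipped entries would be correspondingly larger.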

@pckroon
Collaborator

pckroon commented Feb 6, 2020

Symfit was never written for performance :)
I think the bigger concern is how the calculation of e.g. Hessians scales. For 5 parameters it'll still be near-instant, so no problem no matter how much slower.
The Jacobian of GradientModels is cached (IIRC), so nothing to gain there. If you really want to dig down to the bottom, run a line profiler (kernprof [1]). I don't have the call graph clearly enough in mind to comment on why some methods/functions are called disproportionately often.

You can always create your own case-specific subclass of Model or GradientModel where you just hardcode Python expressions appropriate for your case. However, I'm not sure it'll be worth the effort: I don't know exactly where the bottleneck is.
Of course, you can also just decide you don't care about gradients of any kind, make a CallableModel, and fit using something gradient-free such as Powell.
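A sketch of that gradient-free route using plain scipy (toy objective, illustrative names; no Jacobian or Hessian is ever evaluated):

```python
import numpy as np
from scipy.optimize import minimize

# Noiseless toy data for a linear model y = a*x + b.
x = np.linspace(0.0, 1.0, 30)
y = 2.5 * x + 0.7

def chi2(p):
    # Sum-of-squares objective; Powell only ever calls this, never a gradient.
    r = y - (p[0] * x + p[1])
    return r @ r

res = minimize(chi2, x0=[1.0, 0.0], method='Powell')
print(res.x)  # approximately [2.5, 0.7]
```

Gradient-free methods trade fewer symbolic/derivative costs for more function evaluations, which can be a net win when the derivatives are the expensive part.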

[1] https://github.com/rkern/line_profiler
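For reference, a typical kernprof invocation might look like the following (the script name is a placeholder, and the function you want profiled must be decorated with @profile first):

```shell
pip install line_profiler        # provides the kernprof command
kernprof -l -v my_fit_script.py  # -l: line-by-line stats, -v: print the report
```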

@tBuLi
Owner

tBuLi commented Mar 19, 2020

There are some interesting points here. Indeed, for GradientModel we still evaluate the Hessian, but using the Jacobian approximation (J^T . J). This is significantly faster than calculating it fully analytically, but it still involves calculating the Jacobian analytically.

I think the 3 calls to jacobian_from_model originate from 1 call to make the Jacobian and two to make the Hessian, since the Hessian is the gradient of the Jacobian. I don't think there is a way to shorten this to 2, because the first call is done with some special flags to make sure the result can be used by the second call.

It could be worth trying to optimize that code a bit, but this feature was never intended for big systems of equations like this. Computing the Hessian symbolically for such a big system of equations is always going to be expensive. If you need the covariance matrix, I think it is worth looking into how exactly the results from the optimizer are treated. Sometimes the optimizer will return a numerical estimate of the Hessian, Jacobian, or covariance matrix, which we could use. Currently I think we recalculate it, because we like to be precise here at symfit. However, if the user uses a CallableModel, then I think we should recycle whatever numerical values we get from scipy as much as possible. I don't think we are currently doing that; we return None instead. (Please correct me on this if I'm wrong.)

As a separate question, is this problem not better solved using the highly experimental matrix_bounds branch?

@Jhsmit
Collaborator Author

Jhsmit commented Mar 19, 2020

I indeed considered this a matrix_bounds jobby; however, my problem also has a temporal component, with basically a matrix equation per time point. I wasn't sure how to best implement this, so I went for the old-fashioned way.

@tBuLi
Owner

tBuLi commented Mar 20, 2020

Hmmm, well, maybe you can turn it into subproblems using CallableNumericalModel? Or perhaps using IndexedVariables? Those already work, but there are no IndexedParameters, and I guess you need those. A lot of this kind of advanced work still remains to be done!
