
Infinite loop in WolfLSZoom when using stan for optimization #3229

Open
jhjourdan opened this issue Sep 11, 2023 · 8 comments

Comments

@jhjourdan

Summary:

Infinite loop in WolfLSZoom when using stan for optimization

Description:

When using Stan for optimization (with optimizing in rstan), the function WolfLSZoom (file bfgs_linesearch.hpp) is called, and under some conditions it gets stuck in an infinite loop.

When I run my script, it appears to be stuck in an infinite loop. Using gdb on the code, I get the following values for the local variables in the main loop of the WolfLSZoom function:

itNum = 59171617
alo = 0.73678486652303887
ahi = 0.73678486652303898

This indicates that there is indeed an infinite loop (itNum is enormous). The reason is that the loop is expected to terminate when ahi - alo < min_range, but ahi and alo stay constant because of rounding errors in the loop, and, in my particular case, ahi - alo = 2^(-53) > min_range = 10^(-16).

Do we expect to always have abs(ahi) < 1 and abs(alo) < 1? If so, then a simple fix could be to choose a larger value for min_range (10^(-15) would do the trick).
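To make the scale concrete (a minimal standalone sketch, not Stan code; the constants come from the gdb dump above): the spacing between adjacent doubles near 0.74 is 2^(-53), about 1.11e-16, which already exceeds min_range = 1e-16, and the bisection midpoint of two adjacent doubles rounds back onto one of the endpoints, so the bracket can never shrink below that.

#include <cmath>
#include <cstdio>

int main() {
  const double alo = 0.73678486652303887;
  const double ahi = std::nextafter(alo, 1.0);  // next representable double; the two gdb values differ by one ULP

  const double gap = ahi - alo;                 // 2^(-53), about 1.11e-16
  const double min_range = 1e-16;               // absolute tolerance used by the loop's exit test
  std::printf("gap = %.17g, exits? %d\n", gap, static_cast<int>(gap < min_range));  // exits? 0

  // The midpoint of two adjacent doubles rounds back to an endpoint,
  // so the bracket [alo, ahi] cannot shrink any further.
  const double mid = 0.5 * (alo + ahi);
  std::printf("midpoint stuck? %d\n", static_cast<int>(mid == alo || mid == ahi));  // 1
  return 0;
}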

Reproducible Steps:

The bug is 100% reproducible with my R script, but, unfortunately, I cannot publish the source code of my example.

Version:

I am using rstan 2.21.8, but the bug still seems to exist in Stan's current sources.

@bob-carpenter
Contributor

Thanks for reporting this and for the suggested fix.

> The bug is 100% reproducible with my R script, but, unfortunately, I cannot publish the source code of my example.

Is there a way you can publish a simple example that has the problem with simulated data? Without an example, it can be very hard for us to diagnose what's going on, though this is a hint that maybe we want to cap the number of iterations.

I'm pinging @SteveBronder, who is the person most familiar with our existing L-BFGS code. Some of that's been refactored, I think, on the way to implementing our new variational inference algorithm, but Steve will know more.

@jhjourdan
Author

> Is there a way you can publish a simple example that has the problem with simulated data? Without an example, it can be very hard for us to diagnose what's going on, though this is a hint that maybe we want to cap the number of iterations.

Sorry, but it's really difficult to publish this piece of code. That's the reason why I tried to debug the code myself. I really think the current value of min_range does not guarantee termination, and that a larger value would.

> I'm pinging @SteveBronder, who is the person most familiar with our existing L-BFGS code. Some of that's been refactored, I think, on the way to implementing our new variational inference algorithm, but Steve will know more.

Great! Let's wait for Steve then!

@jhjourdan
Author

I confirm that replacing 1e-16 with 1e-15 fixes the issue in my particular case. But it's unclear whether it would fix it in general, since there is no reason alo and ahi should be smaller than 1 (if I understand correctly...).

@nhuurre
Contributor

nhuurre commented Sep 13, 2023

A general fix could be to use a relative tolerance instead of an absolute one, for example something like

std::fabs(alo - ahi) < 1e-15 * std::max(1.0, alpha)

However, the infinite looping suggests a different problem to me. WolfLSZoom() is supposed to stop as soon as it finds a point that satisfies the Wolfe conditions. If the range shrinks to the limit of floating-point precision, then either WolfLSZoom did not recognize a valid point, or it was called on a range that does not contain any interval where the Wolfe conditions hold. The latter would be an error in WolfLineSearch(), assuming your target function is continuously differentiable.

@jhjourdan
Author

Do you mean that there is some theorem saying that a point satisfying the Wolfe conditions necessarily exists in the interval? How robust is this theorem w.r.t. floating-point approximations?

@nhuurre
Contributor

nhuurre commented Sep 13, 2023

Condition (i) is equivalent to saying that the average of the derivative over the interval between the starting point and the final point is at least c1 times the derivative at the starting point. This always holds in some neighbourhood of the starting point.
Condition (ii) says the derivative at the final point is at most c2 times the derivative at the starting point, and must hold eventually or the target distribution cannot be proper.
When c1 < c2 the mean value theorem guarantees there is some overlap. Stan uses c1 = 0.0001 and c2 = 0.9, so I think it's quite robust.
Condition (iii) refines condition (ii) but does not change the analysis: a continuously differentiable target function cannot skip the region where (iii) holds when moving into a region where (ii) holds.
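
For reference, here is a sketch of the standard (strong) Wolfe conditions that (i)-(iii) above appear to correspond to, in LaTeX notation; x is the current point, p the search direction, and alpha the step size (this notation is assumed here, not taken from the Stan source):

% (i) sufficient decrease (Armijo)
f(x + \alpha p) \le f(x) + c_1 \alpha \, \nabla f(x)^\top p
% (ii) curvature
\nabla f(x + \alpha p)^\top p \ge c_2 \, \nabla f(x)^\top p
% (iii) strong curvature
\left| \nabla f(x + \alpha p)^\top p \right| \le c_2 \left| \nabla f(x)^\top p \right|
% with 0 < c_1 < c_2 < 1; per the comment above, Stan uses c_1 = 10^{-4} and c_2 = 0.9.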

@SteveBronder
Collaborator

@jhjourdan apologies for my slow reply; IMO this sounds like a floating-point error due to rounding. Can you try @nhuurre's suggestion? Something like replacing line 147 with

const Scalar epsilon = std::numeric_limits<Scalar>::epsilon();
const auto min_a = std::min(alo, ahi);
// exit once alpha is within a relative (epsilon-scaled) plus absolute (min_range) tolerance of the bracket
if (std::fabs(min_a - alpha) <= epsilon * std::fabs(min_a) + min_range)

If that works then I'll put in a PR to patch that. Though it would be nice to be able to find a test that catches this, so we know we have fixed it and not done something silly. I'd have to think of a model that would cause this to happen.
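
A minimal standalone check along those lines (a hypothetical sketch, not Stan's test harness; it only exercises the exit condition on the stuck bracket from the report, not the full line search):

#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

int main() {
  const double min_range = 1e-16;
  const double alo = 0.73678486652303887;
  const double ahi = std::nextafter(alo, 1.0);  // bracket stuck at one ULP wide
  const double alpha = ahi;                     // assume the trial step lies in [alo, ahi]

  // Old exit test: never true, since one ULP near 0.74 is about 1.11e-16 > 1e-16.
  const bool old_exit = std::fabs(alo - ahi) < min_range;

  // Proposed exit test (relative plus absolute tolerance), as sketched above.
  const double epsilon = std::numeric_limits<double>::epsilon();
  const double min_a = std::min(alo, ahi);
  const bool new_exit =
      std::fabs(min_a - alpha) <= epsilon * std::fabs(min_a) + min_range;

  assert(!old_exit && new_exit);
  return 0;
}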

@nhuurre there are some extra heuristics during the search to satisfy the Wolfe conditions, and this could be caused by those, but I'd need to dig deeper to figure out where those heuristics came from and compare them against other ideas.

@jhjourdan
Author

Sorry for the delayed answer: I needed some time to realize that, for my particular model, the infinite loop disappears with rstan 2.26.23, and I needed some more time to actually test, with lots of different data, that the patch you are proposing fixes the issue on 2.21.

(I'm not sure why 2.26 fixes the issue, perhaps the model is evaluated with better accuracy. In any case, the piece of code where the infinite loop appears does not seem to have significantly changed between the two versions, so it's still worth fixing.)

So the answer is yes, your patch fixes the issue: I no longer observe the infinite loop that we were seeing.
