
meaning of f_calls_limit is misleading #1053

Open
MariusDrulea opened this issue Oct 10, 2023 · 7 comments

Comments

@MariusDrulea

See the following code and its output. We count the number of calls made to the rosenbrock function. As you can see in the output, optimize reports only 53 calls of the function, yet our own counter shows count_calls = [265], well above the f_calls_limit = 100, so clearly not all function calls are reported by optimize.
Perhaps optimize only counts the direct calls to the function and ignores the calls required to compute the gradient (and the Hessian) by finite differences. In my opinion, f_calls_limit should refer to all calls of the original function; otherwise we just get hidden behavior.

using Optim

count_calls = [0]   # user-side counter for objective evaluations
function rosenbrock(x)
    count_calls[1] += 1
    return (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
end
result = optimize(rosenbrock, zeros(2), BFGS(),
                  Optim.Options(x_tol = 1e-12, g_tol = 1e-08, f_calls_limit = 100))
@show count_calls
rosenbrock (generic function with 1 method)

 * Status: success

 * Candidate solution
    Final objective value:     5.471433e-17

 * Found with
    Algorithm:     BFGS

 * Convergence measures
    |x - x'|               = 3.47e-07 ≰ 1.0e-12
    |x - x'|/|x'|          = 3.47e-07 ≰ 0.0e+00
    |f(x) - f(x')|         = 6.59e-14 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 1.20e+03 ≰ 0.0e+00
    |g(x)|                 = 2.33e-09 ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    16
    f(x) calls:    53
    ∇f(x) calls:   53

count_calls = [265]
@MariusDrulea (Author) commented Oct 10, 2023

Something I just noticed:

f(x) calls:    53
∇f(x) calls:   53
    => there are 2 parameters; with central differences the gradient needs 2 evaluations of f per parameter
    => 2 params * 2 f(x) calls/param * 53 gradient calls = 4*53

So in total we have 53 + 4*53 = 265, which is exactly the number of calls we measured above.

This opens up another possibility:
Can we specify which type of finite differences to use? For example, forward differences (f(x+e) - f(x))/e instead of central differences (f(x+e) - f(x-e))/2e. With multiple parameters (multiple direction vectors e), the single f(x) evaluation is shared across all parameters in forward differences, so we save a lot of function calls (see the sketch below).
My real-world function is expensive to evaluate.
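
A minimal sketch of what I mean, in plain Julia rather than Optim internals; grad_central and grad_forward are hypothetical helpers written only to illustrate the call counts:

count = Ref(0)   # hypothetical call counter, incremented on every evaluation
f(x) = (count[] += 1; (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2)

function grad_central(f, x, e = 1e-8)   # 2 calls per parameter: 2n total
    map(1:length(x)) do i
        xp = copy(x); xp[i] += e
        xm = copy(x); xm[i] -= e
        (f(xp) - f(xm)) / (2e)
    end
end

function grad_forward(f, x, e = 1e-8)   # n + 1 calls: f(x) is shared
    fx = f(x)
    map(1:length(x)) do i
        xp = copy(x); xp[i] += e
        (f(xp) - fx) / e
    end
end

count[] = 0; grad_central(f, zeros(2)); @show count[]   # 4 calls
count[] = 0; grad_forward(f, zeros(2)); @show count[]   # 3 calls

For n = 2 that is 3 calls per gradient instead of 4, and only 2 if the f(x) the optimizer already evaluates each iteration were reused.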

Edit:
While looking over the Optim.jl and NLSolversBase.jl code I found the autodiff = :finiteforward option:
result = optimize(rosenbrock, zeros(2), BFGS(), opts, autodiff = :finiteforward). Unfortunately, :finiteforward behaves just like :finitecentral and does not reduce the number of function calls. So I believe that in the current implementation f(x) is unnecessarily computed multiple times. This could be optimized.

@johnmyleswhite (Contributor)

> In my opinion, f_calls_limit should refer to all calls of the original function; otherwise we just get hidden behavior.

This is a very reasonable thing to want, but it's not clear it's technically feasible. Suppose you defined rosenbrock the way you did and then also defined rosenbrock_gradient by calling into your rosenbrock. How is Optim supposed to know this and produce counts consistent with your internal counter?

It is true that in the special case when Optim has created the gradient itself, Optim could know how many calls are being made. But is it better for the interpretation of f_calls_limit to vary depending on whether Optim creates gradients rather than taking them in as black-box inputs?

More generally, I think the right way to optimize this is for the caller to do the optimization rather than for Optim to be more clever -- this is why the Optim interface takes in things like fg!, so you can compute many things in a single call (see the sketch below).
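
For reference, a minimal sketch of that interface using the Optim.only_fg! wrapper; the gradient formulas are hand-derived for the Rosenbrock example above:

function fg!(F, G, x)
    a = 1.0 - x[1]          # shared intermediates, computed once per call
    b = x[2] - x[1]^2
    if G !== nothing        # fill the gradient only when requested
        G[1] = -2.0 * a - 400.0 * x[1] * b
        G[2] = 200.0 * b
    end
    if F !== nothing        # return the objective value only when requested
        return a^2 + 100.0 * b^2
    end
end

result = optimize(Optim.only_fg!(fg!), zeros(2), BFGS())

Because the objective and gradient share the intermediates a and b, the expensive work is done once even when both are needed at the same point.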

@MariusDrulea (Author)

> But is it better for the interpretation of f_calls_limit to vary depending on whether Optim creates gradients rather than taking them in as black-box inputs?

I think so, why not? If you provide the gradient, the number of calls to the underlying function is expected to be much smaller (see the sketch below). It's another way to sense the usefulness of providing a gradient function.
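
A sketch of what that looks like for the example above; the mutating g!(G, x) signature is the standard Optim form, and rosenbrock_grad! is a hypothetical name for illustration:

function rosenbrock_grad!(G, x)
    G[1] = -2.0 * (1.0 - x[1]) - 400.0 * x[1] * (x[2] - x[1]^2)
    G[2] = 200.0 * (x[2] - x[1]^2)
end

result = optimize(rosenbrock, rosenbrock_grad!, zeros(2), BFGS())
# With an analytical gradient there are no finite-difference evaluations,
# so the reported f(x) calls should match count_calls exactly.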

@pkofod (Member) commented Oct 14, 2023

I think this has been discussed quite a few times. You can argue either way. But you're right, it's not necessarily counting the number of times rosenbrock is called.

@MariusDrulea (Author)

I think a better description of what it counts (direct objective calls) and what it does not count (objective evaluations made inside finite-difference gradients) would be sufficient: https://julianlsolvers.github.io/Optim.jl/stable/#user/config/#general-options.

@MariusDrulea (Author)

I have added a better description in #1054.
The PR also shows the minimizer at the end of the minimization. Let me know if this is also appropriate.

@pkofod (Member) commented Dec 12, 2023

I think they are two separate issues. I'm on board with merging your change to the description, but I'm not sure about showing the full minimizer.
