FAILURE with L-BFGS depending on system size and starting point #198

Open

Liozou opened this issue Apr 27, 2023 · 0 comments

Liozou commented Apr 27, 2023

Hi! I am facing a :FAILURE return when trying to minimize a function of 62339366 variables using L-BFGS with a box constraint. The issue does not appear with just one fewer variable.

The function is $f: \vec x\mapsto \sum_i -50x_i + 300 x_i(\log(x_i) - 1)$ where $\vec x$ is a vector of 62339366 variables $x_i$. The box constraint requires each $x_i$ to stay above a fixed SMALL_CONSTANT, for example 0.01, so that $(\vec\nabla f(\vec x))_i = -50 + 300\log(x_i)$ does not diverge. The exact value of SMALL_CONSTANT does not matter here; the same issue occurs with SMALL_CONSTANT = 1e-100.

The numerical solution should be a vector filled with ω = 1.1813604128656459, since $-50 + 300\log(\omega) \approx 0$. The initial point is a vector of numbers drawn uniformly from $[(1-η)ω,\ (1+η)ω]$, with η = 0.001 for instance.
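
For reference, setting each component of the gradient to zero recovers this value exactly:

$$-50 + 300\log(x_i) = 0 \iff x_i = e^{50/300} = e^{1/6} \approx 1.1813604128656459$$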

Here is the minimal reproducer (I think it requires about 16 GiB of RAM):

module TestNLoptLBFGSfailure
using NLopt

import LinearAlgebra: norm

const SMALL_CONSTANT = 1e-2
const ω = 1.1813604128656459
const η = 1e-3

function objective(point::Vector{Float64}, gradient::Vector{Float64})
    if length(gradient) > 0
        # in-place gradient: (∇f(x))_i = -50 + 300 log(x_i)
        gradient .= -50 .+ 300 .* log.(point)
    end
    value = sum(-50*r + 300*(r*(log(r) - 1)) for r in point)
    @show value, norm(gradient)
    return value
end

function test(n)
    opt = Opt(:LD_LBFGS, n)
    opt.ftol_rel = 1e-10 # or 1e-3, it does not matter
    opt.lower_bounds = fill(SMALL_CONSTANT, n)
    opt.min_objective = objective

    init = ω .* (1 .+ η .* (2 .* rand(n) .- 1)) # uniform in [(1-η)ω, (1+η)ω]
    # init = fill(ω, n) # actual target

    minf, minx, ret = optimize(opt, init)
    ret
end

end

and the observations:

julia> TestNLoptLBFGSfailure.test(62339366)
(value, norm(gradient)) = (-2.2093566731912422e10, 2369.8435877962493)
:FAILURE

julia> TestNLoptLBFGSfailure.test(62339365) # just 62339366 - 1
(value, norm(gradient)) = (-2.2093566377504475e10, 2369.843568788648)
(value, norm(gradient)) = (-2.143919631036808e10, 534364.1274153551)
(value, norm(gradient)) = (-2.209355118106552e10, 3646.620883772766)
(value, norm(gradient)) = (-2.209357736243934e10, 2.2617428385919083)
(value, norm(gradient)) = (-2.209357736243935e10, 0.001131058801969029)
(value, norm(gradient)) = (-2.209357736243935e10, 6.7321322189234e-10)
:SUCCESS

If I use the target value ω directly as initialization (i.e. taking η = 0), then it works:

julia> TestNLoptLBFGSfailure.test(62339366)
(value, norm(gradient)) = (-2.2093577716847473e10, 2.2440440909730785e-10)
:SUCCESS

julia> TestNLoptLBFGSfailure.test(62339365)
(value, norm(gradient)) = (-2.209357736243935e10, 2.2440440729744666e-10)
:SUCCESS
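
As a sanity check (a standalone computation, not part of the reproducer above), this minimum matches the analytic value: since $\log(ω) = 1/6$, each term of the sum at the optimum equals $-50ω + 300ω(1/6 - 1) = -300ω$, so the expected minimum over $n$ variables is $-300ωn$:

-300 * exp(1/6) * 62339366 # expected minimum for n = 62339366, ≈ -2.20936e10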

And if I set the starting point too close to the target, i.e. with η too small (for instance η = 1e-9), then it fails in both cases, though not after the same number of steps:

julia> TestNLoptLBFGSfailure.test(62339366)
(value, norm(gradient)) = (-2.2093577716847473e10, 0.0023686585521977225)
:FAILURE

julia> TestNLoptLBFGSfailure.test(62339365)
(value, norm(gradient)) = (-2.209357736243935e10, 0.0023686585331996264)
(value, norm(gradient)) = (-2.209357736243935e10, 0.5991391139757002)
(value, norm(gradient)) = (-2.209357736243935e10, 0.3982398117902327)
(value, norm(gradient)) = (-2.209357736243935e10, 0.2643059496298344)
(value, norm(gradient)) = (-2.209357736243935e10, 0.17501521403019882)
(value, norm(gradient)) = (-2.209357736243935e10, 0.1154857940136743)
(value, norm(gradient)) = (-2.209357736243935e10, 0.07579606004479056)
(value, norm(gradient)) = (-2.209357736243935e10, 0.049330911796086924)
(value, norm(gradient)) = (-2.209357736243935e10, 0.03167915082330152)
(value, norm(gradient)) = (-2.209357736243935e10, 0.01989798378814819)
(value, norm(gradient)) = (-2.209357736243935e10, 0.012021766484118854)
(value, norm(gradient)) = (-2.209357736243935e10, 0.006732064617095256)
:FAILURE

This may be related to #33. The last point (optimization failing when the initial point is too close to the optimum) reflects #33 (comment), but the difference between 62339365 and 62339366 variables is new, I think. My real problem is of course more complex and involves even more than 62339366 variables.
