Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LSR1Operator(1) fails the secant equation #225

Open
paraynaud opened this issue Mar 29, 2022 · 6 comments · May be fixed by #235
Open

LSR1Operator(1) fails the secant equation #225

paraynaud opened this issue Mar 29, 2022 · 6 comments · May be fixed by #235

Comments

@paraynaud
Copy link
Member

paraynaud commented Mar 29, 2022

Hello all,

I did test the LSR1 operators for myself, and it seems that there is a problem with the LSR1Operator of dimension 1.
In my work, I generate automatically several LSR1Operators, and some of them are of dimension 1.

Here is an example to replicate the issue:

n = 1
B = LSR1Operator(n)
r = 0.0
for i = 1:100
  global r
  s = rand(n)
  y = rand(n)
  push!(B, s, y)  
  r += norm(B * s - y) #secant equation
end
r #166.39...

If you replace n=1 by n=2 then r = 1e-14.

The problem does not occur with the LBFGSOperators.

@paraynaud paraynaud changed the title LSR1Operator(1) fail the secant equation LSR1Operator(1) fails the secant equation Mar 29, 2022
@dpo
Copy link
Member

dpo commented Mar 29, 2022

@paraynaud
Copy link
Member Author

paraynaud commented Mar 29, 2022

The conditions for the update are fulfilled in the code above.
The next piece of code use a loop which guarantee which garantees that the conditons of LSR1 are satisfied before the push!.

The strange thing is that in some cases the update happens, and in other cases it does not happen.
For example the pair (s,y) ,

B = LSR1Operator(1)
s = [0.5809871055619441]
y = [0.8263178027435335]
push!(B, s, y)  #1×1 Matrix{Float64}: 1.4222653047425207

produces an updated 1x1 matrix.

To extend the analysis, I tried the push! in a loop similar to the previous one.

n = 1
r = 0.0
cpt = 0 
cpt_approx_null = 0

for i = 1:100
  B = LSR1Operator(n)
  global r
  global cpt
  global cpt_approx_null
  s = rand(n)
  y = rand(n)
  ϵ = eps(eltype(y))  
  ymBs = y - B * s
  well_defined(ymBs, s) = abs(dot(ymBs, s))  ϵ + ϵ * norm(ymBs) * norm(s, 2)
  sufficient_curvature(s, y) = abs(dot(y, s)) >= ϵ * norm(y, 2) * norm(s, 2)
  scaling_factor(s, y) = dot(y, s) / dot(y, y)
  scaling_condition(s, y) = norm((y - s) / scaling_factor(s, y), 2) >= ϵ * norm(y, 2) * norm(s, 2)
  if well_defined(ymBs, s) && scaling_condition(s, y) && sufficient_curvature(s, y)
    push!(B, s, y)  
    cpt += 1
    ymBs = y - B*s    
    norm_ymBs = norm(ymBs, 2)
    # @show norm_ymBs
    r += norm_ymBs
    if isapprox(norm_ymBs, 0., atol=1e-10)
      cpt_approx_null += 1
    end 
  end 
  # println(Matrix(B))
end
r, cpt, cpt_approx_null # (22.107767681026328, 100, 29)

Usually, every pair s,y should update the virgin LSR1 operator, but only 29 updates satisfy the secant equation.

@paraynaud
Copy link
Member Author

paraynaud commented Jun 22, 2022

I have new examples of failures for LSR1Operator and an explanation of why they fail.

lsr1 = LSR1Operator(2)
y = [-5., -5.]
s = [-1., -1.]
push!(lsr1, s ,y)
Matrix(lsr1) # indentity matrix

The vectors y and s are collinear, I tried other collinear vectors to verify

y = [-5., -4.]
s = [-1.25, -1.]
push!(lsr1, s ,y)
Matrix(lsr1) # still the identity

and i get the same result.

The test

if !(well_defined && sufficient_curvature && scaling_condition)

fails, because the scaling condition
scaling_condition = norm(y - s / scaling_factor) >= ϵ * yNorm * sNorm

is false.

If we suppose s = $\alpha$ * y (s,y collinear), the scaling factor ys/yy is of value $\alpha$.
Then, the scaling condition norm(y - s / scaling_factor) will always be 0 since s / scaling_factor is of value $\tfrac{\alpha y }{\alpha} = y$.
It explains why LSR1Operator(1) fails, since there is two component of size one (s,y), inevitably collinear.

When I saw this, I tried again the loop test that I made previously and I spotted an error in my numerical test

scaling_condition(s, y) = norm((y - s) / scaling_factor(s, y), 2) >= ϵ * norm(y, 2) * norm(s, 2)

which should be

scaling_condition(s, y) = norm(y - s / scaling_factor(s, y), 2) >= ϵ * norm(y, 2) * norm(s, 2)

I restarted the same loop with the modified scaling_condition

using LinearOperators, LinearAlgebra

n = 1
r = 0.0
cpt = 0 
cpt_approx_null = 0
for i = 1:100
  B = LSR1Operator(n)
  global r
  global cpt
  global cpt_approx_null
  s = rand(n)
  y = rand(n)
  ϵ = eps(eltype(y))  
  ymBs = y - B * s
  well_defined(ymBs, s) = abs(dot(ymBs, s))  ϵ + ϵ * norm(ymBs) * norm(s, 2)
  sufficient_curvature(s, y) = abs(dot(y, s)) >= ϵ * norm(y, 2) * norm(s, 2)
  scaling_factor(s, y) = dot(y, s) / dot(y, y)
  scaling_condition(s, y) = norm(y - s / scaling_factor(s, y), 2) >= ϵ * norm(y, 2) * norm(s, 2)
  if well_defined(ymBs, s) && scaling_condition(s, y) && sufficient_curvature(s, y)
    @show s, y # that way you can try the pairs that you get
    push!(B, s, y)  
    cpt += 1
    ymBs = y - B*s    
    norm_ymBs = norm(ymBs, 2)
    r += norm_ymBs
    if isapprox(norm_ymBs, 0., atol=1e-10)
      cpt_approx_null += 1
    end 
  end 
end
r, cpt, cpt_approx_null # (0.0, 25, 25)

Out of 100 random pairs of vectors of size 1, only 25 passed the numerical tests (mainly scaling_condition).
One of those was

(s, y) = ([0.49377733218773445], [0.11429787904158029])

This pair passes the scaling_condition with

norm(y - s / scaling_factor(s, y), 2) # 5.551115123125783e-17 a numerical 0

superior to

ϵ * norm(y, 2) * norm(s, 2) # 1.2531687196363857e-17

And all the other pairs s,y that I tried where similar.

I don't know if this is an expected behaviour, because the collinear pairs satisfy the well_defined condition

well_defined = abs(dot(ymBs, s)) ϵ + ϵ * norm(ymBs) * sNorm

I made several tests on my SR1 implementation using collinear pairs s,y, and the resulting matrices satisfied the secant equation and were well-defined.

I'm not sure how to test the collinear vectors, but here is a try

s = [1., 2.]
y = [2., 4.]
myand(a,b) = a && b
s_y = (s ./ y)
collinear = (length(s) == 1) || mapreduce(i-> i == s_y[1], myand, s_y) # true

collinear check if s is a vector of size one, or if the result of an componentwise division s_y = s ./ y have the same value for every component (i.e. the same value than s_y[1]).
collinear could be use to force scaling_condition to true.

@paraynaud paraynaud linked a pull request Jun 24, 2022 that will close this issue
@dpo
Copy link
Member

dpo commented Jun 25, 2022

Thank you for this analysis.

I made several tests on my SR1 implementation using collinear pairs s,y, and the resulting matrices satisfied the secant equation and were well-defined.

How does your implementation differ?

@paraynaud
Copy link
Member Author

paraynaud commented Jun 25, 2022

My SR1 implementation is very simple, it doesn't have as many numerical conditions as push! of LSR1Operator does.
The only numerical test of my SR1 implementation is the same as

if !(well_defined && sufficient_curvature && scaling_condition)

It does not implement the sufficient_curvature and scaling_condition tests.

sufficient_curvature = abs(ys) ϵ * yNorm * sNorm

scaling_condition = norm(y - s / scaling_factor) >= ϵ * yNorm * sNorm

The PR #235 sets data.scaling of the current iterate to false, which implies that sufficient_curvature and scaling_condition stay true. In addition, it dosen't update scaling_factor.

@dpo
Copy link
Member

dpo commented Jun 25, 2022

In fact, I don't remember where scaling_condition came from. Maybe it was an experiment and should be removed. Clearly, it prevents updates when s and y are collinear.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants