BUG: Gracefully fail for OOM (Out of Memory) in Presolve #1467

HaoZeke · 2023-10-21T22:42:09Z

As noted in the title. Essentially this is a fix for scipy/scipy#15888.

Although this does introduce HighsExceptions it requires no changes to the C API or any other part of the code, this is just to propagate the error internally, and it maps (within the C++ part itself) to existing error codes, raising Infeasible.

Some more context. For an unreasonably large problem (which none the less has a trivial solution which can be found without Presolve):

from scipy.optimize import linprog
import numpy as np
np.random.seed(0)
n_ctr = 500000
n_var = 500
c_ = np.ones(n_var)
A_ub = np.random.random((n_ctr, n_var))
b_ub = np.zeros((n_ctr, ))

res = linprog(c_, A_ub=A_ub, b_ub=b_ub, method='highs-ds', bounds=(0, None), options={'disp': True})

print('End')

This previously triggered the out of memory killer and crashes. Now it will instead report:

Running HiGHS 1.6.0: Copyright (c) 2023 HiGHS under MIT licence terms
Presolving model
ERROR:   Problem too large for Presolve, try without
Specific error raised:
Failed to resize the vector. Requested new size: -2142950640. Size to add is 24. Current size: 2152016632.
Problem status detected on presolve: Infeasible
Model   status      : Infeasible
Objective value     :  0.0000000000e+00
HiGHS run time      :         29.61

This will also save similar crashes across the rest of highs, I used the python example because that was where it was reported first.

FWIW (maybe something for the Highs team) this problem has a trivial solution:

from scipy.optimize import linprog
import numpy as np
np.random.seed(0)
n_ctr = 500000
n_var = 500
c_ = np.ones(n_var)
A_ub = np.random.random((n_ctr, n_var))
b_ub = np.zeros((n_ctr, ))

res = linprog(c_, A_ub=A_ub, b_ub=b_ub, method='highs-ds', bounds=(0, None), options={'disp': True, 'presolve': False})
Solving LP without presolve or with basis
Model   status      : Optimal
Objective value     :  0.0000000000e+00
HiGHS run time      :         25.84

So my understanding is that this should somehow trigger HighsPresolveStatus::kReducedToEmpty and then bailout (special cased) instead of entering the PostSolve phase (the length error comes from the PostSolveStack). Currently the code treats kReducedToEmpty and kReduced the same way:

// Highs.cpp ~ line 1398
      if (model_presolve_status_ == HighsPresolveStatus::kReduced ||
          model_presolve_status_ == HighsPresolveStatus::kReducedToEmpty) {

However, I think that can be dealt with later (if at all), as that would be an improvement (faster runtime than the current approach), while this PR fixes a bug (the crashing of Highs).

No change in C-API, maps to existing errors

jajhall

Thanks, but there are a couple of issues here.

Firstly, return Result::kPrimalInfeasible; cannot be returned, as infeasibility hasn't been determined. The postsolve_stack data should be cleared, and presolve should return as if no reductions can be performed.

Secondly, I'm unclear why you're thinking it should somehow trigger HighsPresolveStatus::kReducedToEmpty. It's not reduced to empty if presolve fails. For me, it should trigger HighsPresolveStatus::kNotReduced, in which case there's no call to postsolve, even if the postsolve stack isn't cleared.

It's correct to treat kReducedToEmpty and kReduced the same way. If the model is reduced to empty, then solution_ and basis_ are trivially empty, so postsolve builds on this. If presolve yields kReduced (ie reductions identified, but to a non-empty LP), then the LP solver applied to the reduced LP yields a non-empty solution_ and basis_.

I'm happy to merge this into a development branch for me to fix the issues above, before merging to latest.

HaoZeke · 2023-10-22T14:15:55Z

Thanks, but there are a couple of issues here.

Firstly, return Result::kPrimalInfeasible; cannot be returned, as infeasibility hasn't been determined. The postsolve_stack data should be cleared, and presolve should return as if no reductions can be performed.

Is there an existing Result:: which should be raised instead? With kStopped it will seg-fault afterwards:

Presolving model
ERROR:   Problem too large for Presolve, try without
Specific error raised:
Failed to resize the vector. Requested new size: -2142950640. Size to add is 24. Current size: 2152016632.
presolve_.run() returns status: Reduced
Presolve : Reductions: rows 499999(-1); columns 231(-269); elements 115500000(-134500000)
Solving the presolved LP

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test_oomkill is a Catch v2.13.8 host application.
Run with -? for options

-------------------------------------------------------------------------------
linprog_oom
-------------------------------------------------------------------------------
../check/TestOOMKill.cpp:11
...............................................................................

../check/TestOOMKill.cpp:11: FAILED:
  {Unknown expression after the reported line}
due to a fatal error condition:
  SIGSEGV - Segmentation violation signal

Secondly, I'm unclear why you're thinking it should somehow trigger HighsPresolveStatus::kReducedToEmpty. It's not reduced to empty if presolve fails. For me, it should trigger HighsPresolveStatus::kNotReduced, in which case there's no call to postsolve, even if the postsolve stack isn't cleared.

It's correct to treat kReducedToEmpty and kReduced the same way. If the model is reduced to empty, then solution_ and basis_ are trivially empty, so postsolve builds on this. If presolve yields kReduced (ie reductions identified, but to a non-empty LP), then the LP solver applied to the reduced LP yields a non-empty solution_ and basis_.

Makes sense. This was a misunderstanding on my part, on a computer with more memory and with 64-bit integers, the presolve phase reduced to empty, so I thought there might be an optimized path for it, but that would be incorrect here.

I'm happy to merge this into a development branch for me to fix the issues above, before merging to latest.

That'd be great, I've pushed an test in the C++ code-base as well. To run it:

meson setup bbdir --buildtype="debug" -Dwith_tests=True -Dhighsint64=False
meson compile -C bbdir
./bbdir/check/test_oomkill
Running HiGHS 1.6.0: Copyright (c) 2023 HiGHS under MIT licence terms
Running with 6 thread(s)
Presolving model
ERROR:   Problem too large for Presolve, try without
Specific error raised:
Failed to resize the vector. Requested new size: -2142950640. Size to add is 24. Current size: 2152016632.
presolve_.run() returns status: Infeasible
Problem status detected on presolve: Infeasible
Model   status      : Infeasible
Objective value     :  0.0000000000e+00
HiGHS run time      :         24.44
Running HiGHS 1.6.0: Copyright (c) 2023 HiGHS under MIT licence terms
Running with 6 thread(s)
Solving LP without presolve or with basis
Postsolve  : 
Time       :    19.49
Time Pre   :    -1.00
Time PreLP :    -1.00
Time PostLP:    19.49
For LP                 : Solve original LP    19.49 ( 99%)
Model   status      : Optimal
Objective value     :  0.0000000000e+00
HiGHS run time      :         20.06
===============================================================================
All tests passed (6 assertions in 2 test cases)

It seems like there should be a new status flag which might be needed, perhaps something like:
Result::kPresolveFail -> HighsPresolveStatus::kNotPresolved -> HighsModelStatus::kNotSet perhaps

jajhall · 2023-10-22T14:36:55Z

Thanks. Overall, this is another very useful contribution to HiGHS from you. I will look at this in a week's time. I'm hard pressed with work this week

HaoZeke · 2023-10-22T15:46:12Z

Thanks. Overall, this is another very useful contribution to HiGHS from you. I will look at this in a week's time. I'm hard pressed with work this week

Awesome, thanks so much. For now, I've added a new error code so it goes from Result::kError->HighsPresolveStatus::kError->HighsModelStatus::kPresolveError and added tests for this:

./bbdir/check/test_oomkill
Running HiGHS 1.6.0: Copyright (c) 2023 HiGHS under MIT licence terms
Running with 6 thread(s)
Presolving model
ERROR:   Problem too large for Presolve, try without
Specific error raised:
Failed to resize the vector. Requested new size: -2142950640. Size to add is 24. Current size: 2152016632.
presolve_.run() returns status: Presolve error
Presolve returned status 9
Model   status      : Presolve error
HiGHS run time      :         24.17
Running HiGHS 1.6.0: Copyright (c) 2023 HiGHS under MIT licence terms
Running with 6 thread(s)
Solving LP without presolve or with basis
Postsolve  : 
Time       :    16.43
Time Pre   :    -1.00
Time PreLP :    -1.00
Time PostLP:    16.43
For LP                 : Solve original LP    16.43 ( 99%)
Model   status      : Optimal
Objective value     :  0.0000000000e+00
HiGHS run time      :         17.02
===============================================================================
All tests passed (6 assertions in 2 test cases)

Feel free to make changes as needed next week :)

HaoZeke added 7 commits October 21, 2023 20:22

ENH: Add exceptions

b730d75

MAINT: Use exceptions to propagate errors

4d4b394

MAINT: Cleanup and propagate correct exceptions

eaf282e

MAINT: Rework fix for newest Highs

8ee4bde

MAINT: Localize the error more, update text

a51dd8f

MAINT: Cleanup unused exceptions

7264b18

ENH: Use minimal exceptions

3a66980

No change in C-API, maps to existing errors

HaoZeke force-pushed the oomKillFix branch from 998ded0 to 3a66980 Compare October 21, 2023 22:45

This was referenced Oct 21, 2023

BUG: Fix gh-15888 for HiGHs OOM crashes scipy/scipy#18990

Closed

ENH: Use highspy in linprog scipy/scipy#19255

Merged

jajhall reviewed Oct 22, 2023

View reviewed changes

TST: Add a C++ test for OOM failures

8193683

MAINT: Rework to use kError over kPrimalInfeasible

2d570bb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Gracefully fail for OOM (Out of Memory) in Presolve #1467

BUG: Gracefully fail for OOM (Out of Memory) in Presolve #1467

HaoZeke commented Oct 21, 2023 •

edited

jajhall left a comment

HaoZeke commented Oct 22, 2023 •

edited

jajhall commented Oct 22, 2023

HaoZeke commented Oct 22, 2023

BUG: Gracefully fail for OOM (Out of Memory) in Presolve #1467

Are you sure you want to change the base?

BUG: Gracefully fail for OOM (Out of Memory) in Presolve #1467

Conversation

HaoZeke commented Oct 21, 2023 • edited

jajhall left a comment

Choose a reason for hiding this comment

HaoZeke commented Oct 22, 2023 • edited

jajhall commented Oct 22, 2023

HaoZeke commented Oct 22, 2023

HaoZeke commented Oct 21, 2023 •

edited

HaoZeke commented Oct 22, 2023 •

edited