Performance regression from v9.7 -> v9.8/v9.9 #4166

hanno-becker · 2024-03-31T20:32:25Z

What version of OR-Tools and what language are you using?
Version: v9.7, v9.8, v9.9
Language: Python

Which solver are you using (e.g. CP-SAT, Routing Solver, GLOP, BOP, Gurobi)

CP-SAT

What operating system (Linux, Windows, ...) and version?

Apple M1 Pro, MacOS Sonoma Version 14.3.1

What did you do?

Updated OR-Tools from v9.7 to v9.8 and v9.9 when used as the backend for the SLOTHY assembly superoptimizer.

What did you expect to see

CP-SAT performance that is similar or better in terms of runtime and consistency.

What did you see instead?

Significant inconsistency in the runtime of CP-SAT.

Steps to reproduce:

Save the attached model.txt (5.4MB) and the following as run_model.py:

# Following https://groups.google.com/g/or-tools-discuss/c/jg-LYSWAgZA/m/QMsyJQI-AgAJ

import ortools
from ortools.sat.python import cp_model
from google.protobuf import text_format

TIMEOUT=15 # In the good case, the model solves in ~8s on an Apple M1
ITERATIONS=30

def load_and_solve(i):
    model = cp_model.CpModel()
    with open("model.txt", "r") as file:
        text_format.Parse(file.read(), model.Proto())
    print (f"[{i}]: Solve using OR-Tools v{ortools.__version__} ... ", end='', flush=True)
    solver = cp_model.CpSolver()
    solver.parameters.max_time_in_seconds = TIMEOUT
    status = solver.Solve(model)
    status_str = solver.StatusName(status)
    print(f"{status_str}, wall time: {solver.WallTime():.4f} s")

for i in range(ITERATIONS):
    load_and_solve(i)

In an environment with OR-Tools v9.7/v9.8/v9.9 setup, do

python3 run_model.py

Observe
- High consistency in performance for v9.7
- Low consistency in performance for v9.8 and v9.9

Here are the outputs on my local machine (see above):

[0]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.7095 s
[1]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6358 s
[2]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6029 s
[3]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6173 s
[4]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6677 s
[5]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6342 s
[6]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6091 s
[7]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6222 s
[8]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6853 s
[9]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6149 s
[10]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6368 s
[11]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6400 s
[12]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6311 s
[13]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6256 s
[14]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6580 s
[15]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6790 s
[16]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.5866 s
[17]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6641 s
[18]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6367 s
[19]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.5504 s
[20]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6438 s
[21]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6866 s
[22]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6646 s
[23]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6997 s
[24]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.7523 s
[25]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6659 s
[26]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6842 s
[27]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6527 s
[28]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6678 s
[29]: Solve using OR-Tools v9.7.2996 ... OPTIMAL, wall time: 7.6791 s

[0]: Solve using OR-Tools v9.8.3296 ... UNKNOWN, wall time: 15.0477 s
[1]: Solve using OR-Tools v9.8.3296 ... UNKNOWN, wall time: 15.0498 s
[2]: Solve using OR-Tools v9.8.3296 ... OPTIMAL, wall time: 8.0398 s
[3]: Solve using OR-Tools v9.8.3296 ... UNKNOWN, wall time: 15.0509 s
[4]: Solve using OR-Tools v9.8.3296 ... UNKNOWN, wall time: 15.0495 s
[5]: Solve using OR-Tools v9.8.3296 ... OPTIMAL, wall time: 7.7925 s
[6]: Solve using OR-Tools v9.8.3296 ... UNKNOWN, wall time: 15.0494 s
[7]: Solve using OR-Tools v9.8.3296 ... UNKNOWN, wall time: 15.0490 s
[8]: Solve using OR-Tools v9.8.3296 ... UNKNOWN, wall time: 15.0499 s
[9]: Solve using OR-Tools v9.8.3296 ... UNKNOWN, wall time: 15.0521 s
[10]: Solve using OR-Tools v9.8.3296 ... UNKNOWN, wall time: 15.0441 s
[11]: Solve using OR-Tools v9.8.3296 ... OPTIMAL, wall time: 7.5001 s
[12]: Solve using OR-Tools v9.8.3296 ... UNKNOWN, wall time: 15.0480 s
[13]: Solve using OR-Tools v9.8.3296 ... OPTIMAL, wall time: 7.7570 s
[14]: Solve using OR-Tools v9.8.3296 ... UNKNOWN, wall time: 15.0510 s
[15]: Solve using OR-Tools v9.8.3296 ... UNKNOWN, wall time: 15.0435 s
[16]: Solve using OR-Tools v9.8.3296 ... UNKNOWN, wall time: 15.0512 s
[17]: Solve using OR-Tools v9.8.3296 ... UNKNOWN, wall time: 15.0548 s
[18]: Solve using OR-Tools v9.8.3296 ... UNKNOWN, wall time: 15.0482 s
[19]: Solve using OR-Tools v9.8.3296 ... UNKNOWN, wall time: 15.0505 s
[20]: Solve using OR-Tools v9.8.3296 ... OPTIMAL, wall time: 7.7389 s
[21]: Solve using OR-Tools v9.8.3296 ... UNKNOWN, wall time: 15.0518 s
[22]: Solve using OR-Tools v9.8.3296 ... UNKNOWN, wall time: 15.0549 s
[23]: Solve using OR-Tools v9.8.3296 ... UNKNOWN, wall time: 15.0517 s
[24]: Solve using OR-Tools v9.8.3296 ... OPTIMAL, wall time: 8.5971 s
[25]: Solve using OR-Tools v9.8.3296 ... UNKNOWN, wall time: 15.0511 s
[26]: Solve using OR-Tools v9.8.3296 ... UNKNOWN, wall time: 15.0454 s
[27]: Solve using OR-Tools v9.8.3296 ... UNKNOWN, wall time: 15.0479 s
[28]: Solve using OR-Tools v9.8.3296 ... UNKNOWN, wall time: 15.0486 s
[29]: Solve using OR-Tools v9.8.3296 ... UNKNOWN, wall time: 15.0584 s

[0]: Solve using OR-Tools v9.9.3963 ... OPTIMAL, wall time: 7.5733 s
[1]: Solve using OR-Tools v9.9.3963 ... UNKNOWN, wall time: 15.1111 s
[2]: Solve using OR-Tools v9.9.3963 ... UNKNOWN, wall time: 15.1614 s
[3]: Solve using OR-Tools v9.9.3963 ... OPTIMAL, wall time: 8.0639 s
[4]: Solve using OR-Tools v9.9.3963 ... OPTIMAL, wall time: 8.1368 s
[5]: Solve using OR-Tools v9.9.3963 ... OPTIMAL, wall time: 7.9110 s
[6]: Solve using OR-Tools v9.9.3963 ... UNKNOWN, wall time: 15.1061 s
[7]: Solve using OR-Tools v9.9.3963 ... UNKNOWN, wall time: 15.1078 s
[8]: Solve using OR-Tools v9.9.3963 ... UNKNOWN, wall time: 15.1067 s
[9]: Solve using OR-Tools v9.9.3963 ... OPTIMAL, wall time: 7.9369 s
[10]: Solve using OR-Tools v9.9.3963 ... OPTIMAL, wall time: 8.0701 s
[11]: Solve using OR-Tools v9.9.3963 ... OPTIMAL, wall time: 7.9486 s
[12]: Solve using OR-Tools v9.9.3963 ... OPTIMAL, wall time: 7.9123 s
[13]: Solve using OR-Tools v9.9.3963 ... UNKNOWN, wall time: 15.1139 s
[14]: Solve using OR-Tools v9.9.3963 ... UNKNOWN, wall time: 15.1011 s
[15]: Solve using OR-Tools v9.9.3963 ... OPTIMAL, wall time: 7.5487 s
[16]: Solve using OR-Tools v9.9.3963 ... OPTIMAL, wall time: 7.9708 s
[17]: Solve using OR-Tools v9.9.3963 ... UNKNOWN, wall time: 15.1062 s
[18]: Solve using OR-Tools v9.9.3963 ... UNKNOWN, wall time: 15.1557 s
[19]: Solve using OR-Tools v9.9.3963 ... UNKNOWN, wall time: 15.1104 s
[20]: Solve using OR-Tools v9.9.3963 ... UNKNOWN, wall time: 15.1042 s
[21]: Solve using OR-Tools v9.9.3963 ... UNKNOWN, wall time: 15.1049 s
[22]: Solve using OR-Tools v9.9.3963 ... OPTIMAL, wall time: 7.9224 s
[23]: Solve using OR-Tools v9.9.3963 ... OPTIMAL, wall time: 8.0476 s
[24]: Solve using OR-Tools v9.9.3963 ... UNKNOWN, wall time: 15.1151 s
[25]: Solve using OR-Tools v9.9.3963 ... OPTIMAL, wall time: 7.9463 s
[26]: Solve using OR-Tools v9.9.3963 ... UNKNOWN, wall time: 15.1065 s
[27]: Solve using OR-Tools v9.9.3963 ... OPTIMAL, wall time: 7.9260 s
[28]: Solve using OR-Tools v9.9.3963 ... OPTIMAL, wall time: 7.9473 s
[29]: Solve using OR-Tools v9.9.3963 ... OPTIMAL, wall time: 7.9736 s

Anything else we should know about your project / environment

The issue was observed when trying to update OR-Tools from v9.7 to v9.8 (PR) or v9.9 (PR) in SLOTHY, which led to CI failing because of timeouts.
The model provided here is representative of what seems to be a general slowdown / increase of performance variability of CP-SAT for the kinds of models generated by SLOTHY. It can be generated as logs/ntt_dilithium_45678_a55_model.txt via python3 example.py --examples ntt_dilithium_45678_a55 --log-model based off the SLOTHY main branch.
Thank you for your work! CP-SAT has so far worked amazingly well as SLOTHY's backend.

If you need any more information, please let me know.

The text was updated successfully, but these errors were encountered:

lperron · 2024-04-01T10:16:55Z

how many workers are you using ?

lperron · 2024-04-01T10:20:15Z

the hint is incomplete. Can you improve it ?

lperron · 2024-04-01T10:28:04Z

I have consistent results with these parameters:

fix_variables_to_their_hinted_value:true,num_workers:10,use_feasibility_jump:false,use_rins_lns:false,use_feasibility_pump:false,cp_model_probing_level:0

It seems to solve consistently in 6-7s.

hanno-becker · 2024-04-01T15:50:49Z

how many workers are you using ?

The minimal example above does not set them explicitly, so I assume it's determined by the number of cores on the system? In my case, that's 8.

the hint is incomplete. Can you improve it ?

Can you elaborate? Should one either give no hints at all or hints to all variables?

I have consistent results with these parameters:
fix_variables_to_their_hinted_value:true,num_workers:10,use_feasibility_jump:false,use_rins_lns:false,use_feasibility_pump:false,cp_model_probing_level:0

The configurationfix_variables_to_their_hinted_value:true does not seem like an option in my case, because the hints really are only hints -- I set them based on an expectation that for the majority of variables of a certain kind the hinted property will be true, but there will be exceptions (in more detail: SLOTHY can interleave neighbouring loop iterations, and there are booleans indicating if an instruction is pulled forward into the previous iteration (e.g. an early load), or deferred to the next iteration (e.g. late store) -- most instructions will stay in their original iteration and hence the tool is hinting at that, but without early/late instructions altogether, the tool would be much less powerful).

Can you comment further on the configuration options you have set to force consistency? It does look like search either succeeds quickly, or some search strategy leads the solver astray entirely (because where the usual ~8s solution is missed, solution are often not even found with a significantly larger budget.

lperron · 2024-04-01T15:56:57Z

The closer to completeness the hint is, the less effort is needed in search. We do process complete feasible hints differently. Laurent Perron | Operations Research | ***@***.*** | (33) 1 42 68 53 00 Le lun. 1 avr. 2024 à 17:51, Hanno Becker ***@***.***> a écrit :

…

how many workers are you using ? The minimal example above does not set them explicitly, so I assume it's determined by the number of cores on the system? In my case, that's 8. the hint is incomplete. Can you improve it ? Can you elaborate? Should one either give no hints at all or hints to all variables? I have consistent results with these parameters: fix_variables_to_their_hinted_value:true,num_workers:10,use_feasibility_jump:false,use_rins_lns:false,use_feasibility_pump:false,cp_model_probing_level:0 The configurationfix_variables_to_their_hinted_value:true does not seem like an option in my case, because the hints really are only hints -- I set them based on an expectation that for the majority of variables of a certain kind the hinted property will be true, but there will be exceptions (in more detail: SLOTHY can interleave neighbouring loop iterations, and there are booleans indicating if an instruction is pulled forward into the previous iteration (e.g. an early load), or deferred to the next iteration (e.g. late store) -- most instructions will stay in their original iteration and hence the tool is hinting at that, but without early/late instructions altogether, the tool would be much less powerful). Can you comment further on the configuration options you have set to force consistency? It does look like search either succeeds quickly, or some search strategy leads the solver astray entirely (because where the usual ~8s solution is missed, solution are often not even found with a significantly larger budget. — Reply to this email directly, view it on GitHub <#4166 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACUPL3M5G7FGLMAHO44BJJLY3F667AVCNFSM6AAAAABFQV5KC6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMZQGAZTMNJWGI> . You are receiving this because you commented.Message ID: ***@***.***>

hanno-becker · 2024-04-01T15:59:12Z

The closer to completeness the hint is, the less effort is needed in search.

There are usually simple solutions which follow the (incomplete) hints, but they won't minimize the given objective (stall minimization, in SLOTHY's case) -- are those still useful hints in your experience, or should they be removed?

lperron · 2024-04-01T16:33:29Z

can you try num_workers:24 ?

lperron · 2024-04-01T16:34:46Z

or just num_workers:1,search_branching:FIXED_SEARCH

hanno-becker · 2024-04-01T18:25:10Z

num_workers=24 does not seem to help
num_workers=1, search_braching=FIXED_SEARCH gives consistently fast results in ~7s
search_branching=FIXED_SEARCH gives consistent results ~8-9s.

What is the takeaway here? When should I consider setting search_branching=FIXED_SEARCH and/or num_workers=1?

I will start a SLOTHY CI run using search_branching=FIXED_SEARCH and see how other workloads fare.

Following suggestions in google/or-tools#4166

hanno-becker · 2024-04-02T07:47:23Z

@lperron Unfortunately, unconditionally setting search_branching=FIXED_SEARCH (with or without num_workers=1) does lead to performance problems in other SLOTHY CI workloads.

How would you suggest to proceed here? Do you have a sense of what v9.7->v9.8 change might have triggered this performance change? Has some new search strategy been added in v9.8 that might lead the solver astray in the models produced by SLOTHY?

lperron · 2024-04-05T15:51:43Z

OK. I have no quick solution.
Stay and 9.7 for the time being.

If you could send me a collection of models, I can integrate those into out benchmark suite.

hanno-becker · 2024-04-05T16:28:03Z

@lperron I will prepare a set of models representative of SLOTHY workloads and share them in the coming days.

hanno-becker · 2024-04-09T08:36:39Z

@lperron @Mizux I have exported some of the models exercised in the SLOTHY CI here: https://github.com/slothy-optimizer/slothy/tree/ci_models/paper/scripts/models Performance numbers as observed on my local Apple M1 are in https://github.com/slothy-optimizer/slothy/blob/ci_models/paper/scripts/models/results.txt. Some of them are solved/refuted very quickly, so one should probably hand-select a few that can be solved in seconds-minutes.

Please let me know if this is useful to you, or what kind of models/format you would prefer otherwise.

lperron · 2024-04-17T20:34:26Z

can you try with these parameters ?

um_workers:10,linearization_level:0,use_rins_lns:false,use_feasibility_pump:false,cp_model_probing_level:0

lperron · 2024-04-19T08:23:08Z

I ran all the models with 15 runs per model with different settings

16 workers, 20s:

    problems      optimal     feasible   infeasible      unknown
        1440          945           15          465           15

12 workers, 20s, linearization_level:0,use_rins_lns:false,use_feasibility_pump:false,cp_model_probing_level:0

    problems      optimal     feasible   infeasible      unknown
        1440          950           15          465           10

hanno-becker · 2024-04-20T03:52:37Z

@lperron Thank you for investigating! What do your measurements tell you?

lperron · 2024-04-20T07:00:26Z

The second set of parameters is stable, and solve all the set of problems reliably.

hanno-becker · 2024-04-23T03:14:56Z

@lperron Thank you very much for investigating. Are you going to make changes in CP-SAT to make the behaviour the default, or what are next steps?

lperron · 2024-04-23T03:54:52Z

No. Try setting these parameters in your code, and t ll me how it performs. Le mar. 23 avr. 2024, 05:15, Hanno Becker ***@***.***> a écrit :

…

@lperron <https://github.com/lperron> Thank you very much for investigating. Are you going to make changes in CP-SAT to make the behaviour the default, or what are next steps? — Reply to this email directly, view it on GitHub <#4166 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACUPL3I2OYNWR5VWZRBGWT3Y6XG4NAVCNFSM6AAAAABFQV5KC6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANZRGMZTANJWHA> . You are receiving this because you were mentioned.Message ID: ***@***.***>

hanno-becker · 2024-04-23T04:02:31Z

@lperron I'll run the CI on the proposed parameters and get back to you.

cf. google/or-tools#4166

hanno-becker added a commit to slothy-optimizer/slothy that referenced this issue Apr 1, 2024

Tweak OR-Tools CP-SAT solver parameters

5e9cfdc

Following suggestions in google/or-tools#4166

hanno-becker added a commit to slothy-optimizer/slothy that referenced this issue Apr 1, 2024

Tweak OR-Tools CP-SAT solver parameters

394e018

Following suggestions in google/or-tools#4166

Mizux assigned lperron Apr 2, 2024

Mizux added this to the v10.0 milestone Apr 2, 2024

Mizux added the Solver: CP-SAT Solver Relates to the CP-SAT solver label Apr 2, 2024

hanno-becker mentioned this issue Apr 9, 2024

DRAFT: Add models and performance statistics for models exercised in CI slothy-optimizer/slothy#51

Open

hanno-becker added a commit to slothy-optimizer/slothy that referenced this issue Apr 23, 2024

Experiment with different solver parameters to fix performance issue

90abaf7

cf. google/or-tools#4166

Mizux modified the milestones: v10.0, Backlog Apr 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance regression from v9.7 -> v9.8/v9.9 #4166

Performance regression from v9.7 -> v9.8/v9.9 #4166

hanno-becker commented Mar 31, 2024 •

edited

lperron commented Apr 1, 2024

lperron commented Apr 1, 2024

lperron commented Apr 1, 2024 •

edited

hanno-becker commented Apr 1, 2024

lperron commented Apr 1, 2024 via email

hanno-becker commented Apr 1, 2024 •

edited

lperron commented Apr 1, 2024

lperron commented Apr 1, 2024

hanno-becker commented Apr 1, 2024

hanno-becker commented Apr 2, 2024

lperron commented Apr 5, 2024

hanno-becker commented Apr 5, 2024

hanno-becker commented Apr 9, 2024 •

edited

lperron commented Apr 17, 2024 •

edited

lperron commented Apr 19, 2024 •

edited

hanno-becker commented Apr 20, 2024

lperron commented Apr 20, 2024

hanno-becker commented Apr 23, 2024

lperron commented Apr 23, 2024 via email

hanno-becker commented Apr 23, 2024

Performance regression from v9.7 -> v9.8/v9.9 #4166

Performance regression from v9.7 -> v9.8/v9.9 #4166

Comments

hanno-becker commented Mar 31, 2024 • edited

lperron commented Apr 1, 2024

lperron commented Apr 1, 2024

lperron commented Apr 1, 2024 • edited

hanno-becker commented Apr 1, 2024

lperron commented Apr 1, 2024 via email

hanno-becker commented Apr 1, 2024 • edited

lperron commented Apr 1, 2024

lperron commented Apr 1, 2024

hanno-becker commented Apr 1, 2024

hanno-becker commented Apr 2, 2024

lperron commented Apr 5, 2024

hanno-becker commented Apr 5, 2024

hanno-becker commented Apr 9, 2024 • edited

lperron commented Apr 17, 2024 • edited

lperron commented Apr 19, 2024 • edited

hanno-becker commented Apr 20, 2024

lperron commented Apr 20, 2024

hanno-becker commented Apr 23, 2024

lperron commented Apr 23, 2024 via email

hanno-becker commented Apr 23, 2024

hanno-becker commented Mar 31, 2024 •

edited

lperron commented Apr 1, 2024 •

edited

hanno-becker commented Apr 1, 2024 •

edited

hanno-becker commented Apr 9, 2024 •

edited

lperron commented Apr 17, 2024 •

edited

lperron commented Apr 19, 2024 •

edited