Data set failing using OR-Tools #1

torressa · 2024-03-07T13:19:23Z

Running data_extraction_from_rf_experiments.py with method = "cp-sat" fails on the adult dataset as follows:

Local Config:
 dataset =  adult
 n_estimators =  1
 max_depth_t =  None
 seed =  0
Using dataset adult, training set size is 10 with 19 attributes.
accuracy_train= 1.0 accuracy_test= 0.7657718120805369
RF parsing done!
Model creation done!

Starting CP-SAT solver v9.8.3296
Parameters: random_seed: 0 max_time_in_seconds: 18000 log_search_progress: true num_workers: 8

Initial satisfaction model '': (model_fingerprint: 0x388a71457e5c97d2)
#Variables: 190
  - 190 Booleans in [0,1]
#kLinear0: 10
#kLinearN: 10 (#terms: 60)

Starting presolve at 0.00s
INFEASIBLE: 'proven during initial copy of constraint #10:
linear {
  domain: [1, 1]
}
With current variable domains:
'

Presolve summary:
  - 0 affine relations were detected.
Problem closed by presolve.
CpSolverResponse summary:
status: INFEASIBLE
objective: NA
best_bound: NA
integers: 0
booleans: 0
conflicts: 0
branches: 0
propagations: 0
integer_propagations: 0
restarts: 0
lp_iterations: 0
walltime: 0.000337
usertime: 0.000337
deterministic_time: 0
gap_integral: 0

Traceback (most recent call last):
  File "DRAFT/data_extraction_from_rf_experiments.py", line 265, in <module>
    dict_res = extractor.fit(
               ^^^^^^^^^^^^^^
  File "DRAFT/DRAFT.py", line 299, in fit
    self.perform_reconstruction_v1_CP_SAT(
  File "DRAFT/DRAFT.py", line 474, in perform_reconstruction_v1_CP_SAT
    raise RuntimeError(
RuntimeError: Infeasible model: the reconstruction problem has no solution. Please make sure the provided one-hot encoding constraints are correct. Else, report this issue to the developers.

I have not touched the one-hot encoding and it works for the other 2 datasets.

Additionally, as a general comment: in your experiments, model-building time is included in solve_duration_time. It may be worth tracking it with a separate timer, since it reflects the speed of the modeling API (which can of course vary depending on the implementation and the language used) rather than the solver's ability to solve the problem.


ferryjul · 2024-03-07T19:36:27Z

Thank you for your feedback, @torressa! I was able to reproduce this error. It was caused by an edge case where all the training data belongs to the same class, so each resulting tree has a single leaf node (and no internal nodes). I have fixed that in our code, so the error should no longer appear. Note, however, that in such a case the forest provides no useful information for reconstruction.
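For readers wondering why the log reports `linear { domain: [1, 1] }` with no variables: a linear constraint without terms always sums to 0, so it can never equal 1. The sketch below illustrates only that arithmetic, plus one possible guard for single-class training data. It is not the actual fix from the commit; `linear_constraint_feasible` and `has_single_class` are hypothetical helper names, and the link between single-leaf trees and empty constraints is an assumption based on the log above.

```python
from itertools import product


def linear_constraint_feasible(coeffs, var_domains, lower, upper):
    """Return True if some assignment puts sum(coeff * var) inside [lower, upper].

    Brute force over all assignments, so only suitable for tiny illustrations.
    """
    for values in product(*var_domains):
        total = sum(c * v for c, v in zip(coeffs, values))
        if lower <= total <= upper:
            return True
    return False


def has_single_class(labels):
    """Return True when every training label is identical (hypothetical guard)."""
    return len(set(labels)) <= 1


# A constraint with no terms sums to 0, which lies outside the domain [1, 1].
empty_ok = linear_constraint_feasible([], [], 1, 1)

# With two Booleans, x0 + x1 == 1 is satisfiable (e.g. x0 = 1, x1 = 0).
two_vars_ok = linear_constraint_feasible([1, 1], [(0, 1), (0, 1)], 1, 1)

# Hypothetical pre-check before building the reconstruction model.
degenerate = has_single_class([0, 0, 0, 0])
```

A check like `has_single_class` could be run before model creation to skip or special-case forests whose trees contain no splits.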

Regarding the model-building time being included in the returned solve_duration_time: this is because we aim to measure the runtime of our entire method, not the performance of the solvers themselves. Indeed, model creation would be much faster using the C++ APIs rather than the Python ones!
We could have measured it the other way to quantify the solvers' ability to solve the problem, and we may change that later.
In our experiments, we limit run-times using the timeout parameter, which applies only to the solvers' runtimes, as we cannot bound the models' creation time. As a result, the comparisons presented in the Appendices of our paper only compare the solvers' runtimes once the model is built.
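As a rough illustration of reporting the two durations separately, here is a minimal sketch using only the standard library. `build_model` and `solve` are hypothetical stand-ins, not functions from DRAFT or OR-Tools, and the key names in `timings` are made up for the example.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def build_model(n):
    """Hypothetical stand-in for model creation (API-bound work)."""
    return [(i, i + 1) for i in range(n)]


def solve(model):
    """Hypothetical stand-in for the solver call."""
    return sum(a * b for a, b in model)


model, build_time = timed(build_model, 1000)
solution, solve_time = timed(solve, model)

# Report both parts and their sum, so either view of the runtime is available.
timings = {
    "build_time": build_time,
    "solve_time": solve_time,
    "total_time": build_time + solve_time,
}
```

Keeping both numbers would let the total reflect the whole method while the solve time alone reflects the solver.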

I'm closing this issue now as the problem is fixed, but don't hesitate to reach out for any additional comment/feedback!

torressa · 2024-03-07T20:56:45Z

Cool! Thanks for the clarification!
Very cool piece of work BTW!

ferryjul self-assigned this Mar 7, 2024

ferryjul added a commit that referenced this issue Mar 7, 2024

Fix issue #1

19273ec

ferryjul closed this as completed Mar 7, 2024
