Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SAT_ORACLE gives wrong answers #1170

Open
myreen opened this issue Dec 5, 2023 · 11 comments
Open

SAT_ORACLE gives wrong answers #1170

myreen opened this issue Dec 5, 2023 · 11 comments

Comments

@myreen
Copy link
Contributor

myreen commented Dec 5, 2023

This is correct behaviour:

> minisatProve.SAT_PROVE ``b ==> T``;
val it = ⊢ b ⇒ T: thm

This is incorrect behaviour:

> minisatProve.SAT_ORACLE ``b ==> T``;
Exception- SAT_cex ⊢ b ⇒ ¬(b ⇒ T) raised

In other words, the oracle version of the interface to SAT returns a bogus theorem for this simple input.

The following shows that one can prove false from the bogus theorem.

> (minisatProve.SAT_ORACLE ``b ==> T``; fail()) handle minisatProve.SAT_cex th => th;
val it = ⊢ b ⇒ ¬(b ⇒ T): thm
> it |> GEN_ALL |> SPEC T |> REWRITE_RULE [];
val it = ⊢ F: thm
> Tag.dest_tag (tag it);
val it = (["DISK_THM", "HolSatLib"], []): string list * string list

The last line shows that the proof of false depends on a trusted oracle call made by HolSatLib. That is to say: this isn't a soundness bug in HOL but rather a bug report about minisatProve.SAT_ORACLE.

@binghe
Copy link
Contributor

binghe commented Dec 6, 2023

@myreen What's your operating system? I can't reproduce your steps on Intel macOS:

> minisatProve.SAT_PROVE ``b ==> T``;
val it = |- b ==> T: thm

@tanyongkiam
Copy link
Contributor

I can confirm the bug with SAT_ORACLE on my machine

uname -a
Linux yongkiam-ThinkPad-X1C 6.2.0-36-generic #37~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Oct  9 15:34:04 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

@myreen
Copy link
Contributor Author

myreen commented Dec 6, 2023

@binghe I'm on MacOS running on Mac. But note that you need to call SAT_ORACLE rather than SAT_PROVE.

@binghe
Copy link
Contributor

binghe commented Dec 6, 2023

@binghe I'm on MacOS running on Mac. But note that you need to call SAT_ORACLE rather than SAT_PROVE.

Sorry, I didn't notice that! Now I can reproduce too. (Then this seems not an issue of MiniSAT but the SML code behind SAT_ORACLE).

@binghe
Copy link
Contributor

binghe commented Dec 6, 2023

I think the issue is (at least) in the function invoke_solver" (src/HolSat/minisatProve.sml, line 57-78)

fun invoke_solver solver lfn ntm clauseth cnfv vc is_proved svm sva in_name =
    let val nr = Array.length clauseth
    in
      if access(getSolverExe solver,[A_EXEC]) then
        let
          val answer = invokeSat solver T (SOME vc) (SOME nr) svm sva in_name
        in case answer of
              SOME model => (* returns |- model ==> ~t *)
              let val model' = transform_model cnfv lfn model
              in if is_proved then satCheck model' ntm
                 else mk_sat_oracle_thm([],mk_imp(list_mk_conj model',ntm)) end
            | NONE    => (* returns ~t |- F *)
                replay_proof is_proved sva nr in_name solver vc clauseth
                             lfn ntm NONE
        end
      else (* do not have execute access to solver binary, or it doesn't exist*)
        (if nr > !sat_limit then
           warn "SAT solver not found. Using slow internal prover."
         else ();
         DPLL_TAUT (dest_neg ntm))
    end
end

In the above function, is_proved is false when SAT_ORACLE is used, and true when SAT_PROVE is used.

@binghe
Copy link
Contributor

binghe commented Dec 6, 2023

Hmm... I think just replace mk_sat_oracle_thm([],mk_imp(list_mk_conj model',ntm)) with mk_sat_oracle_thm([ntm],F) will fix the bug. The above mentioned satCheck() actually should always return [.] |- F, which after DISCH_ALL becomes [] |- ~t ==> F (t is the original problem). If we fully trust the SAT solver and do not do this further verification, mk_sat_oracle_thm([ntm],F) will give exactly the same output, only this is a theorem without proof.

binghe added a commit to binghe/HOL that referenced this issue Dec 6, 2023
In case of SAT, instead of making [] |- model ==> ~t, it should make [~t] |- F to simulate the output of satCheck model' ntm
@binghe
Copy link
Contributor

binghe commented Dec 6, 2023

No... if the input term is a /\ ~a (clearly UNSAT), the value of model' in the code is [``a``] and satCheck model' ntm gives [] |- a ==> ~(a /\ ~a), which is indeed in the form model ==> ~t. I was wrong in saying that satCheck always returns something like [.] |- F.

Now I think perhaps the whole idea of SAT_ORACLE is wrong, without calling satCheck the intended oracle theorem cannot be determined, because at that moment the code doesn't even know the input term is SAT or UNSAT... @mn200

@binghe
Copy link
Contributor

binghe commented Dec 6, 2023

I think... it's possible that MiniSAT's behavior has changed, say, under new C++ compilers. Previously on Linux ARM64, MiniSAT every wrongly reported SAT for an UNSAT problem, and such a mistake was later captured by HOL when trying to construct the fake SAT result as a theorem (whose concl is the input term). Now, perhaps MiniSAT is doing the opposite thing: reporting UNSAT for an SAT problem (e.g. b ==> T) but this time the further verification (when constructing the output theorem) has automatically fixed the answer, thus the issue was hidden. Using SAT_ORACLE merely exposed this bug.

@binghe
Copy link
Contributor

binghe commented Dec 8, 2023

Now I think it's MiniSAT's problem on certain inputs (~(b ==> T) is just one of them). Picking another tautology term like b \/ ~b the SAT_ORACLE returns correct answer:

> SAT_ORACLE ``b \/ ~b``;
val it = [oracles: DISK_THM, HolSatLib] [axioms: ] [] ⊢ b ∨ ¬b: thm
> SAT_ORACLE ``b ==> T``;
Exception- SAT_cex [oracles: HolSatLib] [axioms: ] [] ⊢ b ⇒ ¬(b ⇒ T) raised

I tried HOL builds back to k6, and also tried latest HOL built with very old C++ compilers (on Mac OS X 10.6, GCC 4.2), the reported issue with b ==> T is always there. Thus now I think it should be MiniSAT 1.x's original bug.

@mn200
Copy link
Member

mn200 commented Dec 11, 2023

We're using such an old version of MiniSAT that no-one other than us will ever fix it. Is there an obvious test-case we can feed into minisat to demonstrate the problem as a first step?

@binghe
Copy link
Contributor

binghe commented Dec 12, 2023

We're using such an old version of MiniSAT that no-one other than us will ever fix it. Is there an obvious test-case we can feed into minisat to demonstrate the problem as a first step?

Yes, I should first confirm if MiniSAT behaves correct on raw inputs (of DIMACS). But then I found MiniSAT is innocent this time!

In minisatProve.sml, if I manually evaluate the ML code (first, the opens, then all code inside the local block). In function GEN_SAT, I got the following values of its local variables:

val cnfv = SOME "SP": string option
val in_name = "/var/tmp//MLTEMPECqLFC.cnf": string
val lfn = <Redblackmap(1)>: (term, term) RBM.dict
val ntm = ``~(b ==> T)``: term
val sva = fromList[``T``, ``SP0``]: term array
val svm = (2, <Redblackmap(1)>): int * (term, term * int) SVM.dict
val tmpname = "/var/tmp//MLTEMPECqLFC": string
val vc = 1: int

Then, the file contents of in_name ("/var/tmp//MLTEMPECqLFC.cnf") is the following:

c File /var/tmp//MLTEMPECqLFC.cnf generated by HolSatLib
c
p cnf 1 1
1 
0

But the above DIMACS is wrong. Here the input term of SAT_ORACLE is b ==> T, which is equivalent to ^b \/ T. Recall that the goal was to check if this term is a tautology (!b. b ==> T) and the method is to let SAT solver to check the negation of the input term, i.e. ~(b ==> T), which is b /\ F, a CNF.

In DIMACS format, the line p cnf n m indicates there are totally n Boolean variables and m clauses, and the rest lines each represent one clause, each is a disjunction of involved variables (zero or more), ended with zero. Thus b is 1 0, and F should be just 0. The above DIMACS file generated by HOL has two issues:

  1. The line p cnf ... has wrong numbers. It should be p cnf 1 2 instead of p cnf 1 1 (line 3).
  2. Somehow the line for the first clause (1) misses the ending zero, i.e., it should be 1 0 (line 4).

There's a MiniSat web interface [1] taking DIMACS inputs. If I put the above DIMACS contents as is, the solving result is SAT. But after manual fixes according to the above two issues (only the 2nd is fatal), the result is UNSAT, the correct one.

[1] http://logicrunch.it.uu.se:4096/~wv/minisat/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants