Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complete model #1019

Merged
merged 13 commits into from
Jan 18, 2024
Merged

Complete model #1019

merged 13 commits into from
Jan 18, 2024

Conversation

Halbaroth
Copy link
Collaborator

@Halbaroth Halbaroth commented Jan 2, 2024

This PR fixes the issue #778. The suggested solution by this PR is as follows:

  • Collect user-declared symbols in D_cnf;
  • As the function make of the module D_cnf returns a list of
    commands, we add a new command Decl in Frontend to declare
    symbols;
  • The declared symbols are given to the SAT solver through a new
    function declaration in sat_solver_sig;
  • These symbols are collected in vectors and we keep indexes in another
    vector in order to manage properly the push/pop mechanism.
  • Finally the declared symbols are given to Uf and we create the empty
    model with them.

Another annoying point comes from the current data structure used in
ModelMap. The type of ModelMap saves model values in the form of graphs.
A graph is a finite set of constraints but there is no appropriate
representation for the graph without constraints. Let's imagine we have
the input:

(set-option :produce-models true)
(set-logic ALL)
(declare-fun (Int) Int)
(check-sat)
(get-model)

We expect an output as follows:

(
  (define-fun f (_x Int) (@A as Int))
)

But Graph.empty cannot hold the abstract value @a. A naive solution
consists in using an ADT:

type graph =
| Empty of Id.typed
| G of Graph.t

But we'll add an extra indirection. Probably the best solution would be
to use a custom data structure instead of a map to store the constraints.
In this PR, we detect in ModelMap.add if the only constraint in the
graph is of the form something -> abstract value. In this case we
remove this constraint.

This solution isn't perfect as explained in issue #1018. The biggest
drawback of this solution is the fact we have to declare symbols in the
SAT API, otherwise models returned by get_model could be
incomplete.

@Halbaroth Halbaroth added the models This issue is related to model generation. label Jan 2, 2024
@Halbaroth Halbaroth added this to the 2.6.0 milestone Jan 2, 2024
Copy link
Collaborator

@bclement-ocp bclement-ocp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR adds a mechanism to keep track of symbols that were declared, and ensure that all declared symbols are defined in generated models. The general design is sound, but I would like the implementation of ModelMap to be changed to no longer assume that expressions can't be abstract names. The unused DeclSet functors should also be removed.

@@ -53,6 +53,12 @@ module HT = Hashtbl.Make(
let hash = Fun.id
end)

module DeclSet = Set.Make
(struct
type t = Id.typed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to use the type here? Can we have multiple declarations with the same name but different types?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is because we need the type in Uf, this should be a map from identifiers to types rather than a set of typed identifiers.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this doesn't seem to be used.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I forgot to remove it ;)

src/lib/frontend/d_cnf.ml Show resolved Hide resolved
Comment on lines 2104 to 2106
let decl = match td with
| `Type_decl (td, _def) ->
mk_ty_decl td;
None

| `Term_decl td ->
Some (mk_term_decl td)
in
begin match decl with
| Some d ->
let st_loc = dl_to_ael dloc_file loc in
C.{ st_decl = Decl d; st_loc } :: acc
| None ->
acc
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to build an option to destruct it immediately:

Suggested change
let decl = match td with
| `Type_decl (td, _def) ->
mk_ty_decl td;
None
| `Term_decl td ->
Some (mk_term_decl td)
in
begin match decl with
| Some d ->
let st_loc = dl_to_ael dloc_file loc in
C.{ st_decl = Decl d; st_loc } :: acc
| None ->
acc
end
begin match td with
| `Type_decl (td, _def) ->
mk_ty_decl td;
acc
| `Term_decl td ->
let d = mk_term_decl td in
let st_loc = dl_to_ael dloc_file loc in
C.{ st_decl = Decl d; st_loc } :: acc
end

end
in
aux [] dcl;
acc
aux [] dcl @ acc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: acc could be passed as a new 3rd argument to aux here (to return instead of [] in the final case) so that we avoid building a list just to concatenate it to another one.

Comment on lines 180 to 181
declare_ids : Id.typed Vec.t;
declare_lim : int Vec.t;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to do this dance in Fun_sat which is used as a functional data structure, this can simply be a declare_ids : Id.typed list field that is updated functionally ({ env with declare_ids = id :: env.declare_ids }) in declare.

Copy link
Collaborator Author

@Halbaroth Halbaroth Jan 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was my first implementation and it didn't work. A functional solution would be to use a list of lists:

declare_stack: Id.typed list list

We have to pop/push this stack in Fun_sat.pop/Fun_sat.push.
I didn't adopt this solution because I was afraid that a list of lists isn't cache-friendly. I understand that you want to keep the environment of FunSAT as pure as possible.

Honestly, I believe we should pull up all the imperative stuff of FunSAT into Funsat_frontend and use a stack of environment of FunSAT to simulate the push and the pop functions. In this design, we'll use a vector for declarations in the imperative environment of Funsat_frontend.

I also tried to implement the feature in Funsat_frontend without modifying the environment of FunSAT but we have no access to the right environment of Theory here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, push and pop forces us to be imperative. Oh well. I guess the way to do this while staying consistent with the current Fun_sat module is Id.typed list Stack.t rather than Id.typed list list then.

I didn't adopt this solution because I afraid that a list of lists isn't cache-friendly.

You did write too much C++ :) A list of lists (or a stack of lists, which is the same thing) is very probably not going to be an issue here because we never actually look at the lists until model generation (and also because this only impacts calls to push and pop, which is virtually 0% of Alt-Ergo's runtime).

In this design, we'll use a vector for declarations in the imperative environment of Funsat_frontend.

If we use Funsat_frontend for push and pop we don't need a separate vector for declarations in Funsat_frontend, we can simply have declare_ids : Id.typed list in Funsat which will get tracked by the main stack in the frontend, I believe.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree to use Id.typed list Stack.t :)

Comment on lines 70 to 71
declare_ids : Id.typed Vec.t;
declare_lim : int Vec.t;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be simpler to use a vector of lists here:

mutable declare_ids : Id.typed list ;
declare_queue : Id.typed list Vec.t ;

Copy link
Collaborator Author

@Halbaroth Halbaroth Jan 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhm, Id.typed list Vec.tis a stack, not a queue in your suggestion, right?

I adopt this solution because it's the current style of SatML.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's a stack (for some reasons they are called "queues" elsewhere in satML but we can use the proper teminology ;) ).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So why do we need an extra field declare_ids here? The top element of the stack can be accessed in O(1). I believe

declare_stack: Id.typed list Vec.t 

is sufficient.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if the algorithmic complexity is identical, the constant term is not the same; having a separate field here avoids having to check that the vector is not empty on all accesses. It probably does not matter here because we never look at the declarations except during model generation but for the other vectors in SatML which are accessed much more frequently it might matter.

It's fine to use a non-empty Vec.t here (or even a Stack.t from the stdlib actually).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will use the current design with an extra field.

let values = P.add sy (Graph.add arg_vals ret_val graph) values in
{ values; suspicious }

let empty ~suspicious = { values = P.empty; suspicious }
module DeclSets = Set.Make
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem used.

it means there is no constraint on this graph. We replace it by
the graph with the only constraint given by [arg_vals] and
[ret_val]. *)
if Graph.cardinal graph == 1 then
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use == to compare integers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oups, I did too much C++ lately :)

Comment on lines 161 to 164
let arg_vals = List.map Expr.mk_abstract arg_tys in
let ret_val = Expr.mk_abstract ret_ty in
let graph = Graph.add arg_vals ret_val Graph.empty in
P.add sy graph values
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you are using a graph with a single "dummy" definition here to then detect it in add and fix things up if the graph actually has definitions. This is quite hacky in part because it introduces a global invariant that abstract values are never otherwise placed into graphs — I believe it is true currently (although I actually had to check in several places to say so) but could easily cause subtle bugs in the future due to unrelated refactorings.

Your suggestion in the PR message to use an ADT (or a record carrying an explicit default value) would be better here; I don't think the extra indirection will be a problem (especially given that it will only impact model generation).

Copy link
Collaborator Author

@Halbaroth Halbaroth Jan 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is a hackish solution. I think the best solution would be to add a function to generate dummy values. If you run cvc5 or z3 on an input file with completely unused symbols, they generate dummy values for them. I didn't do it in this PR because I want a small modification here in order to implement (get-value).

Besides, I'm not sure to understand what you mean by a global invariant here. The add replaces a valid model by another valid model and you don't need to maintain an invariant outside ModelMap because the only way to add new constraints in your graph is by using ModelMap.add.

I agree the indirection isn't a big deal.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure to understand what you mean by a global invariant here. The add replaces a valid model by another valid model and you don't need to maintain an invariant outside ModelMap because the only way to add new constraints in your graph is by using ModelMap.add.

I mean that in the current implementation it is incorrect to call ModelMap.add with a ret_val that is an abstract value:

  • The graph is initially { [@a0] -> @a1 } as created by empty
  • Adding the mapping 0 -> @x, the graph becomes { [0] -> @x } because it only contains a single abstract value
  • Adding the mapping 1 -> @y, the graph becomes { [1] -> @y } because it only contains a single abstract value

But this is incorrect and now we will print (define-fun f ((_ Int)) T @y) but (f 0) must be @x, not @y (abstract values with different names are distinct).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhm right, it should be fine with the current state of the code but it's a fragile design. I'll fix it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't really determine whether this file is intended to test model generation or the interaction of model generation with incremental solving.

In general it is better to have smaller tests so that it is easier to understand what they are actually testing — for instance, is it important to have x and z in the last get-model here, or are we actually testing model generation for functions?

I'd prefer if you could split this test into smaller files, ideally with a single (check-sat) per problem file.

(As an aside: are we currently guaranteed to have the same abstract values in two consecutive calls to (get-model)?)

@Halbaroth Halbaroth linked an issue Jan 3, 2024 that may be closed by this pull request
Copy link
Collaborator

@bclement-ocp bclement-ocp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — just a couple nits and one question about popping too much but it is not really related to this PR per se.

@@ -177,6 +177,12 @@ module Make (Th : Theory.S) = struct
unit_facts_cache : (E.gformula * Ex.t) ME.t ref;
last_saved_model : Models.t Lazy.t option ref;
unknown_reason : Sat_solver_sig.unknown_reason option;

declare_top : Id.typed list ref;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: unless there is a reason to pass the ref around, it is usually better to use mutable record fields rather than refs, both for performance (it is slightly more efficient, although as we have established it won't matter here) and to make the intent more clear, in particular that there will be no sharing of the field.

Copy link
Collaborator Author

@Halbaroth Halbaroth Jan 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was my first attempt of course for sake of performance but it doesn't work. It seems we need to share this field between several environments so we cannot avoid the double indirection with the current implementation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that may sound like a bug? We do create new environments (copies) all the time, but the old ones should no longer be used (we overwrite the fields in Fun_sat_frontend). The pop issue should also no longer be relevant, because we can write to the mutable field here. Can you elaborate on what "it doesn't work" means?

Copy link
Collaborator Author

@Halbaroth Halbaroth Jan 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will check what's happened. When I tried using a mutable field instead of a reference, FunSAT keeps declarations of popped assertion levels. Then I switched to a reference (I only did this modification) and it magically worked.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that with a mutable field you need to set acc.declare_top <- Stack.pop acc.declare_tail not env.declare_top <- Stack.pop env.declare_tail in the pop function (which I'd guess is the issue given what you describe).

try env.declare_top := Stack.pop env.declare_tail
with Stack.Empty ->
(* Happens if we perform more pops than pushes. *)
invalid_arg "Fun_sat.pop"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this actually happen in practice, or is there some mechanism to catch it before the fact? We shouldn't show a naked exception to the user in this case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no mechanism to catch it if you use the SAT API with the library. But there is a mechanism to prevent it with the binary and this hero is Dolmen :)

I may create a new error in Error to manage this case?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An exception is appropriate for library use, thanks for checking!

(fun graph ->
match graph with
| C constraints -> C (Constraints.map (subst_in_term id e) constraints)
| Free a -> Free (subst_in_term id e a)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| Free a -> Free (subst_in_term id e a)
| Free _ -> graph

Copy link
Collaborator Author

@Halbaroth Halbaroth Jan 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was my first code but let's imagine you're using the library and you get a model with SAT.get_model. One of the values of the model is an abstract value. You may want to substitute it by something else. Sometimes, it will work because this is not a free graph, sometimes the function subst silently ignores your request.

My code isn't good because after doing this substitution, the user could add a new constraint to the graph but the current add function will erase the effect of the previous substitution.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The model is an output value of the solver, I don't think the use case where we allow people to alter the model after the fact is one that we want to support. If people want to change the model they need to add assertions to the problem and call solve again to get a new model.

Under that assumption, the semantics of the Free constructor is "this function is unconstrained", and the inner abstract value is a "dummy" name that is guaranteed to not occur anywhere else in the model (by construction). Given that semantics, we shouldn't substitute under the Free constructor.

If we want to have a proper semantics for the graphs, the only easy way I see given the current architecture is a record with a default value instead, and print something like (define-fun f (x Int) (ite (= x 0) 0 @a0)) when then only constraint is [0] -> 0 — but again, I don't think we need to do this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under that assumption, the semantics of the Free constructor is "this function is unconstrained", and the inner abstract value is a "dummy" name that is guaranteed to not occur anywhere else in the model (by construction). Given that semantics, we shouldn't substitute under the Free constructor.

Note, I don't care either way — I think both implementations are OK as long as we don't expect the user to modify the model. If we do, neither implementation is correct as you point out (but again — I don't think we want to support this).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we don't need to support it. At some point, we should stop to expose subst in the library.

@bclement-ocp
Copy link
Collaborator

@Halbaroth should we merge this?

@bclement-ocp
Copy link
Collaborator

@Halbaroth second ping; are we good to merge or did you want to make additional changes?

This PR fixes the issue OCamlPro#778. The suggested solution by this PR is as follows:
- Collect user-declared symbols in `D_cnf`;
- As the function `make` of the module `D_cnf` returns a list of
  commands, we add a new command `Decl` in `Frontend` to declare
  symbols;
- The declared symbols are given to the SAT solver through a new
  function `declaration` in `sat_solver_sig`.
- These symbols are collected in vectors and we keep indexes in another
  PR in order to manage properly the push/pop mechanism.
- Finally the declared symbols are given to `Uf` and we create the empty
  model with them.

Another annoying point comes from the current datastructure used in
`ModelMap`. The type of `ModelMap` saves model values in the form of graphs.
A graph is a finite set of constraints but there is no appropriate
representation for the graph without constraints. Let's imagine we have
the input:
```smt2
(set-option :produce-models true)
(set-logic ALL)
(declare-fun (Int) Int)
(check-sat)
(get-model)
```
We expect an output as follows:
```smt2
(
  (define-fun f (_x Int) (@A as Int))
)
```
But `Graph.empty` cannot hold the abstract value `@a`. A naive solution
consists in using an ADT:
```ocaml
type graph =
| Empty of Id.typed
| G of Graph.t
```
But we'll add an extra indirection. Probably the best solution would be
to use a custom datastructure instead of a map to store the constraints.
In this PR, we detect in `ModelMap.add` if the only constraint in the
graph is of the form `something -> abstract value`. In this case we
remove this constraint.

This solution isn't perfect as explained in issue OCamlPro#1018. The biggest
drawback of this solution is the fact we have to declare symbols in the
SAT API, otherwise models returned by `get_model` could be
incomplete.
@Halbaroth Halbaroth merged commit 373f903 into OCamlPro:next Jan 18, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
models This issue is related to model generation.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrong model in the non-regression tests
2 participants