Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support PySMT #26

Open
cnheitman opened this issue Feb 22, 2016 · 12 comments
Open

Support PySMT #26

cnheitman opened this issue Feb 22, 2016 · 12 comments
Labels

Comments

@cnheitman
Copy link
Collaborator

Current support for SMT solvers interaction is provided by a Python module taken from PySymEmu with custom modifications. It only supports Z3 and CVC4 solvers. The goal is to replace the aforementioned module with the package PySMT which supports multiple solvers very easily.

@cnheitman cnheitman added the idea label Feb 22, 2016
@ia-sadosky
Copy link

Implementing this would close issue #6

@marcogario
Copy link

I can help with this (if it is still of interest). We are preparing the new release of pySMT (0.7.0) and afterwards, we would like to better understand possible use-cases that could be improved.

@cnheitman
Copy link
Collaborator Author

Hi! Yes, it is still of interest. Any help will be very much appreciated. Let me know when you are ready and we'll talk about the details.

@marcogario
Copy link

We recently released pySMT 0.7.0 !
I am working on the next release (that will include the theory of Strings) but I would like to start working on this in parallel, so that any insight from integrating pySMT might benefit the next release.

I took a quick look at barf.core.smt and it seems that:

  • You are interfacing with CVC4 and Z3 via SMT-LIB
  • You are using BV and Array, and none of the other theories (FP, Real, Int) or quantifiers. This is an important point because currently we do not support Floating Point
  • You are not using unsat cores or interpolants

If the above is true, integrating pySMT should be reasonably straight forward. I would probably do it directly within the smttranslator module, by replacing calls to smtlibv2 to corresponding calls to pySMT.

Testing-wise, it seems that the tests in test_smttranslator are accessing directly the underlying solver. This means that the tests will have to change as the same time as the implementation. Do you have system tests that cover the end-to-end behavior of the tool?

@cnheitman
Copy link
Collaborator Author

Great! :)

Yes, your assessment is accurate. All the changes should be made in the smttranslatormodule.

Yes, the current tests need to be improved. We use BARF in another project which uses the SMT features extensively so we can test its behavior in case of changes. Unfortunately, that project is not release yet. As a intermediate solution you can run the scripts located in the examples folder (check_constraint{1,2,3}.py) which use most of the functionality you are about to modify.

@marcogario
Copy link

marcogario commented Sep 9, 2017

I looked closer into this and looked into a couple of approaches. In particular, I was attempting to implement a PySMTSolver class that would be equivalent to Z3Solver and CVC4Solver as defined in smtlibv2.py. This would make the change backwards compatible, however, the current implementation mixes a bit the definition of the formula and the use of the solver, and this makes it hard to keep the change contained in one place. For example, the following is a simplified version of what happens in test_smttranslator.py :

    def test_add_mem_reg(self):
        self._solver.reset()
        # add constrains
        mem = self._translator.get_memory()
        eax = self._solver.mkBitVec(32, "eax_0")    # !!!
        constraint = (mem[eax] != 42)    # This is not supported by pySMT
        self._solver.add(constraint)
        trans = self._translator.translate(instr)
        self._solver.add(trans)
        [...]

The symbol eax is created in the solver, instead of being defined in the translator. My understanding is that this is in order to perform the symbol definitions, since that function calls another more general function where the solver is stored within the expression smtlibv2.py:BitVec.

If the introduction of pySMT is meant to completely replace smtlibv2.py, then we can go with a more clean separation of expressions and solvers, but this will break backwards compatibility, and touch many parts of the code, therefore, I wanted to ask you before spending time on this.
In summary:

  • The best way to implement this would to make a cleaner distinction between formula and solver, however this will break existing code. In this case, expressions within the application would be pysmt FNode objects.
  • If backwards compatibility is required, the change will probably have to touch many parts of smtlibv2.py, since you have quite a bit of code handling expressions (e.g., class Symbol), that would need to be extended to detect when running pysmt. In this case, expressions within the application would be barf Symbol objects.

Let me know what you think!

@cnheitman
Copy link
Collaborator Author

I would definetly like a cleaner separation between formula and solver. I'm not sure, right now, which is the best way to achieve this; I need to take a better look into the code. However, I think the first step is to refactor the SmtTranslator tests in order to: a) improve the code quality of the module, b) better test the functionality of the SmtTranslator class and c) extract the code related to the SmtSolver into its own test class. This will provide a better understanding of how the functionality is used and it will help deciding which is the way to go.

In conclusion, let me do the refactoring and then I will get back to you on this issue.

@cnheitman
Copy link
Collaborator Author

An update on this. I refactored the entire smt package. It changed quite a bit from last time. I keep only functionality that was used and removed the rest.

Now there is a cleaner separation between modules and, specially, between formulae and solvers (symbols are no longer created in the solver). Also, I added some more tests for all the modules.

Please, take a look into the new code and let me known whether pySMT integration is possible.

@marcogario
Copy link

https://github.com/programa-stic/barf-project/compare/master...marcogario:intro_pysmt?expand=1

I made an example of how the testcases in test_smtsolver.py would be converted after the introduction of pySMT. There are two points that I would like your opinion on:

  1. SMT Expressions are currently in barf's Symbol (and related) hierarchy of objects. The easiest way to integrate pySMT would be to replace this with pySMT expressions (internally called FNode). Expressions handling and solving would be up-to pySMT. barf would focus on building the expressions and interacting with the solving layer. Would this work for you?

  2. In the test case you used == and != quite a lot among expressions. In pySMT we decided not to support these two operators for infix notation. We have two equivalent operators .Equals and .NotEquals that I used in the testcase. The main reason not to support this is performance. If we allow the use of == and !=, then every time you check for equality of an expression (e.g., when using a formula as key of a dictionary or while iterating over multiple formulas) you will have to create a new expression and then convert it into a boolean if this is being evaluated as a bool. This is a significant performance hit for little gain. Nevertheless, if for your usecase it is essential to do x == y instead of x.Equals(y), then we can try to override this within barf.

@cnheitman
Copy link
Collaborator Author

Hi! I took a look into the test_smtsolver.py example. The code resembles pretty much to the original which is great. Regarding you questions:

  1. Do you mean replacing barf's objects completely or adding a wrapper around pySMT's objects like you did in the example? I prefer the latter approach. I want all smt related code within barf's smt package.

  2. I don't see any problem moving from == and != to .Equals and .NotEquals, specially if it improves performance. As you mention if the need arises we can override them within barf.

I have a question. In your example it is no longer necessary to declare BitVec variables before adding them to the solver (as I currently do with the declare_fun method.) Where exactly are they declared? Within pySMT's Symbol class (when they are instanciated) or within the solver (when they are added as assertions)?

@marcogario
Copy link

In pySMT we distinguish between expressions and the solver being used to reason on the expression. These are two distinct concepts that can live independently of each other. In many cases, we use pySMT to manipulate those expressions, and solving is just one of these manipulations. For this reason, the pySMT object that represents an expression is not linked with the solver. Each solver contains a Converter service that translates expressions back and forth from pySMT representation to the solver specific one. This has the big advantage that you can work on the same expression with multiple solvers, and can reset the state of the solver while preserving the expressions that you have build so far. The main drawback is that you use 2 times the amount of memory, since the expression exists at least twice.

Having said this, the way I see the integration is by replacing the class barf.core.smt.smtsymbol.Symbol with pySMT FNode class. To make this more concrete, I searched for where you import barf.core.smt. I got:

  • within the smt package (e.g., smttranslator)
  • within codeanalyzer package (codeanalyzer.py and gadgetverifier.py)

Outside of the smt package, you can expose the pySMT objects directly. See line 93 of code analyzer:

            ret_val = smtfunction.extract(ret_val, offset, self._arch_info.registers_size[register_name])

If ret_val is a pySMT expression, you can do:

            ret_val = BVExtract(ret_val, offset, self._arch_info.registers_size[register_name])

or

            ret_val = ret_val[offset:self._arch_info.registers_size[register_name]]

However, as an initial step, I will try to see if it is possible to only change smtfunction.py and wrap pySMT there.

@cnheitman
Copy link
Collaborator Author

Great! Thanks for the explanation.
I agree with your proposal of how integrate pySMT.
Let me known if you encounter any issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants