Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trying to collect symbolic address from MemoryAccess #1313

Open
GreenieQwQ opened this issue Mar 20, 2024 · 2 comments
Open

Trying to collect symbolic address from MemoryAccess #1313

GreenieQwQ opened this issue Mar 20, 2024 · 2 comments

Comments

@GreenieQwQ
Copy link

I recently embarked on a project utilizing Triton's Python interface for AArch64 binary analysis and I must express my gratitude for providing such a user-friendly tool.

My current task involves generating a mapping of all registers and memory used within a function to their symbolic states at the end of the function, symbolizing all registers and memory states automatically upon their initial access. As a newcomer to Triton, I've referred to the SeedCoverage class within src/testers/unittests/test_simulation.py to perform dynamic symbolic execution and set a boundary for visiting the same address to mitigate path explosion.

Below is a snippet of the Python code I've been utilizing to symbolize registers and conduct emulation (returning only x0 for test):

def symbolize_inputs(self, seed):
        """Add symbols in memory/register for seed."""
        self.ctx.reset()
        """Seed registers for taken branches."""
        for reg, value in list(seed.items()):
            self.ctx.setConcreteRegisterValue(reg, value)
        """Symbolize all registers."""
        for reg in self.ctx.getAllRegisters():
            try:
                self.ctx.symbolizeRegister(reg, reg.getName())
            except TypeError as e:
                # print(f"Warning: No symbolic value for {reg.getName()} since it is immutable.")
                pass

def seed_emulate(self, pc=DEF_START_ADDRESS):
        """Emulate one run, return (x0_expr, path_condition)."""
        bound_record = dict()
        while pc:
            bound_record[pc] = bound_record.get(pc, 0) + 1
            if bound_record[pc] > self.bound:
                return None, None
            opcode = self.ctx.getConcreteMemoryAreaValue(pc, 4)
            instruction = Instruction(pc, opcode)
            self.ctx.processing(instruction)
            pc = self.ctx.getConcreteRegisterValue(self.ctx.registers.pc)
        # endwhile
        x0_expr = self.ctx.getSymbolicRegister(self.ctx.registers.x0)
        pathCondition = self.ctx.getPathPredicate()
        return x0_expr.getAst(), pathCondition

def new_inputs(self):
        """Look for another branching using current constraints found."""
        astCtxt = self.ctx.getAstContext()
        # Set of new inputs
        inputs = list()
        # Get path constraints from the last execution
        pco = self.ctx.getPathConstraints()
        # We start with any input. T (Top)
        previousConstraints = astCtxt.equal(astCtxt.bvtrue(), astCtxt.bvtrue())
        # Go through the path constraints
        for pc in pco:
            # If there is a condition
            if pc.isMultipleBranches():
                # Get all branches
                branches = pc.getBranchConstraints()
                for branch in branches:
                    # Get the constraint of the branch which has been not taken
                    if branch['isTaken'] == False:
                        # Ask for a model
                        models = self.ctx.getModel(astCtxt.land([previousConstraints, branch['constraint']]))
                        seed = dict()
                        for k, v in list(models.items()):
                            # Get the symbolic variable assigned to the model
                            symVar = self.ctx.getSymbolicVariable(k)
                            # Save the new input as seed. Maybe not working for registers?
                            # seed.update({symVar.getOrigin(): v.getValue()})
                            # Register case
                            seed.update({self.ctx.getRegister(symVar.getOrigin()): v.getValue()})
                        if seed:
                            inputs.append(seed)
            # Update the previous constraints with true branch to keep a good path.
            previousConstraints = astCtxt.land([previousConstraints, pc.getTakenPredicate()])

However, when trying to collect all the referenced symbolic address in seed_emulate, I encountered a challenge. It appears that the MemoryAccess information stored in instruction might only contain concrete access information, even with SYMBOLIC_STORE and SYMBOLIC_LOAD modes enabled. Consequently, I am curious about the following:

  1. How can Triton be utilized to collect all symbolic addresses (or, symbolic accesses) during one emulation?
  2. In cases where uninitialized memory is referenced, does Triton automatically symbolize its value (note that it may be referenced by a symbolic address)? If not, is there an API available to accomplish this task?

It's been an excellent experience for using Triton! Any help would be greatly appreciated!

@JonathanSalwan
Copy link
Owner

It appears that the MemoryAccess information stored in instruction might only contain concrete access information,

Have you tried to use MemoryAccess::getLeaAst()? That should return the symbolic memory access.

@GreenieQwQ
Copy link
Author

GreenieQwQ commented Mar 23, 2024

It appears that the MemoryAccess information stored in instruction might only contain concrete access information,

Have you tried to use MemoryAccess::getLeaAst()? That should return the symbolic memory access.

Much appreciated! By digging out MemoryAccess::getLeaAst() and its related documentation, I've learned a lot from Triton's memory modeling procedure.

However, I'm still confused about how to symbolize memory before their initial access. I attempted a simple test case as shown below, yet I realized that I may not be able to retrieve the very memory access until having processed the instruction (e.g., I've tried using disassembly before processing the instruction, but the memory access information in getOperands is incomplete until having the instruction processed).

def test(self):
        self.ctx = TritonContext(ARCH.X86_64)
        self.ctx.setMode(MODE.MEMORY_ARRAY, True)
        self.ctx.setMode(MODE.SYMBOLIZE_LOAD, True)
        self.ctx.setMode(MODE.SYMBOLIZE_STORE, True)
        code = [
            (1, b"\x8b\x0c\x18"), # mov ecx, dword ptr [rax + rbx]
            (2, b"\x48\x81\xf9\xad\xde\x00\x00"), # cmp rcx, 0xdead
        ]

        self.ctx.symbolizeRegister(self.ctx.registers.rax, 's_rax')
        self.ctx.symbolizeRegister(self.ctx.registers.rbx, 's_rbx')
        self.ctx.symbolizeRegister(self.ctx.registers.rcx, 's_rcx')
        count = 0 # counter for symbolized memory
        for i, op in code:
            i= Instruction(op)
            self.ctx.processing(i)
            if i.isMemoryRead():
                a = i.getLoadAccess()
                readMemory = a[0][0]
                if not self.ctx.isMemorySymbolized(readMemory):
                    memAlias = f'mem_{count}'
                    self.ctx.symbolizeMemory(readMemory, memAlias)
                    print(f"symbolize memory {readMemory} with {memAlias}")
                    count += 1
            if i.isMemoryWrite():
                # similar to memory read
        zf = self.ctx.getRegisterAst(self.ctx.registers.zf)
        m = self.ctx.getModel(zf == 1)
        print(m)

Ideally, a model where [rax+rbx] is set to 0xdead will be printed. However, since the memory is symbolized after its initial access, m is empty in actual.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants