Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generator Gadgets #42

Open
wants to merge 12 commits into
base: arm-port
Choose a base branch
from
Open

Generator Gadgets #42

wants to merge 12 commits into from

Conversation

cwshugg
Copy link
Collaborator

@cwshugg cwshugg commented Jan 26, 2023

This PR implements the concept of gadgets into Revizor's program generator. Here, a gadget refers to a small list of assembly instructions placed in a specific order for a specific reason. I've made this PR description somewhat lengthy to make sure I document my thought process in implementing this feature.

Benefits of Gadgets

Considering Spectre v1, the below gadget (in pseudocode), which features two memory load instructions, would be useful for causing an extra line in the CPU cache to be allocated during speculative execution.

LOAD REG1, [REG2] # REG1 is destination, REG2 is source address
LOAD REG3, [REG1] # REG1's result from previous load is used as the next load's address

Or consider another gadget, potentially useful for increasing the CPU's speculative execution window by creating a chain of data-dependent instructions:

MUL REG1, REG2, REG3
MUL REG4, REG1, REG3 # depends on REG1
MUL REG2, REG4, REG3 # depends on REG4
MUL REG3, REG2, REG1 # depends on REG2
# ...

Because Revizor's program generator randomly places assembly instructions, specific assembly sequences like these won't easily be generated. Gadgets make this possible.

Gadget File

At runtime, the user can specify the path to a gadget file through a config field (gadget_file). This file is formatted in JSON and uses the same syntax for specifying assembly instructions as Revizor's ISA specification parser. Within this file, the user can place interesting assembly gadgets he/she wants to see in Revizor's generated programs. Each gadget has a name and an optional weight (used to tune how often gadgets appear in generated programs). Here's an example of what one gadget might look like:

{
    "name": "double_subtraction",
    "description": "Two consecutive subtraction instructions.",
    "weight": 0.5,
    "instructions":
    [
        {
            "name": "SUB", "category": "general", "control_flow": false,
            "operands": [
                {"dest": true, "src": false, "comment": "", "type_": "REG", "width": 32, "values": ["GPR", "SP"]},
                {"dest": false, "src": true, "comment": "", "type_": "REG", "width": 32, "values": ["GPR", "SP"]},
                {"dest": false, "src": true, "comment": "", "type_": "REG", "width": 32, "values": ["GPR"]}
            ],
            "implicit_operands": [
                {"type_": "FLAGS", "width": 0, "src": false, "dest": false, "values": ["w", "", "", "w", "w", "", "", "", "w"]}
            ]
        },
        {
            "name": "SUB", "category": "general", "control_flow": false,
            "operands": [
                {"dest": true, "src": false, "comment": "", "type_": "REG", "width": 32, "values": ["GPR", "SP"]},
                {"dest": false, "src": true, "comment": "", "type_": "REG", "width": 32, "values": ["GPR", "SP"]},
                {"dest": false, "src": true, "comment": "", "type_": "REG", "width": 32, "values": ["GPR"]}
            ],
            "implicit_operands": [
                {"type_": "FLAGS", "width": 0, "src": false, "dest": false, "values": ["w", "", "", "w", "w", "", "", "", "w"]}
            ]
        }
    ]
}

If the generator chooses to place this particular gadget into a program, two randomized SUB instructions will be placed (consecutively) within a basic block.

(I've created an example gadget file at src/arm64/tests/example_gadgets.json).

Gadget Selection

As the program generator fills basic blocks with instructions, it may choose to instead select a random gadget from the gadget file and place it into the program. This pseudocode describes the basic idea of how it decides:

while (program isn't full yet)
{
    bb = choose_random_basic_block();
    if (random_chance_to_use_gadget())
    {
        gadget = select_random_gadget_that_fits();
        bb.insert(gadget.instructions);
    }
    else
    {
        instruction = generate_random_instruction();
        bb.insert(instruction);
    }
}

The randomness of gadget placement is configurable using a few new config fields:

  • CONF.avg_gadgets_per_bb dictates, roughly, how many gadgets are selected and placed into each basic block of the test case.
  • CONF.max_gadgets_per_bb forces a maximum number of gadgets that can be placed into each basic block.

Gadget placement will also not occur if:

  • A gadget is too big (i.e. won't fit in CONF.program_size)
  • The generator is in the middle of placing two memory access instructions
  • A gadget would exceed the average number of memory accesses (CONF.avg_mem_accesses)

Operand Aliasing

I've also given gadgets the ability to specify unique operand aliases, to control how instruction operands (such as registers) are used within the same gadget. Defining an operand alias (let's say it's named REG_A) as a GPR operand allows us to specify REG_A in the operands of the gadget's instructions. When the gadgets is placed into the code, all instances of REG_A will be replaced by the same randomly-chosen register.

See this gadget as an example:

{
    "name": "spectre_double_load",
    "description": "Two consecutive load instructions in a spectre-like style (with the first load result used as the second load's address)..",
    "weight": 0.75,
    "operand_aliases":
    [
        {
            "name": "REG_A",
            "operand": {"dest": false, "src": true, "comment": "", "type_": "REG", "width": 32, "values": ["GPR"]}
        },
        {
            "name": "REG_B",
            "operand": {"dest": false, "src": true, "comment": "", "type_": "REG", "width": 32, "values": ["GPR"]}
        },
        {
            "name": "REG_B",
            "operand": {"dest": false, "src": true, "comment": "", "type_": "REG", "width": 32, "values": ["GPR"]}
        }
    ],
    "instructions":
    [
        {
            "name": "LDR",
            "category": "general",
            "control_flow": false,
            "operands": [
                {"dest": true, "src": false, "type_": "REG", "width": 64, "values": ["REG_A"]},
                {"dest": false, "src": true, "type_": "MEM", "width": 64, "values": ["REG_C"]},
                {"dest": false, "src": true, "type_": "IMM", "values": ["[-256-255]"]}
            ],
            "implicit_operands": []
        },
        {
            "name": "LDR",
            "category": "general",
            "control_flow": false,
            "operands": [
                {"dest": true, "src": false, "type_": "REG", "width": 64, "values": ["REG_B"]},
                {"dest": false, "src": true, "type_": "MEM", "width": 64, "values": ["REG_A"]},
                {"dest": false, "src": true, "type_": "IMM", "values": ["[-256-255]"]}
            ],
            "implicit_operands": []
        }
    ]
}

The operand_aliases allow this gadget to use random registers to fill in REG_A, REG_B, and REG_C, but keep their values consistent across all uses in each of the two load instructions.

Other Additions

Literal Operand

To support extra operands that may not fit under the existing types, I've implemented a LiteralOperand, which allows for any arbitrary literal string to be used as an operand for an assembly instruction. Take this EOR instruction specification as an example:

{
    "name": "EOR",
    "category": "general",
    "control_flow": false,
    "operands": [
        {"dest": true, "src": false, "type_": "REG", "width": 32, "values": ["W0"]},
        {"dest": false, "src": true, "type_": "REG", "width": 32, "values": ["W0"]},
        {"dest": false, "src": true, "type_": "REG", "width": 32, "values": ["W0"]},
        {"type_": "LITERAL", "values": ["ROR #9"]}
    ],
    "implicit_operands": [
        {"type_": "FLAGS", "width": 0, "src": false, "dest": false, "values": ["w", "", "", "w", "w", "", "", "", "w"]}
    ]
}

The final operand is a "LITERAL", and contains the value "ROR #9". This type of operand will be placed in the assembly code as-is. In this example, it allows the EOR instruction to have a right-rotate barrel-shift operand:

EOR W0, W0, W0, ROR #9

…ompletely optional and are specified by the user in a JSON file. Each gadget specifies a number of instruction specifications (in the same syntax as the isa_spec's base.json file). During program generation, the generator may choose a random gadget. If this happens, all instructions in the gadget's list will be generated and placed. It's a customizeable way for the user to give the generator chunks of assembly code that might get included in the test
…lacing gadgets. Gadget placement is mindful of the number of memory accesses already placed into the program. If placing a specific gadget will exceed the config field's value, the gadget will not be placed
…ithin gadgets. Control-flow is already decided based on the basic blocks and successors generated in earlier steps. We don't want control-flow in gadgets, as this will mess with the test DAG
…f 1) the average number of gadgets placed into each basic block, and 2) the maximum number of gadgets placed into each basic block. Also implemented individual probabilities for each gadget, which allows for the gadgets in a gadget file to be weighed differently
@cwshugg cwshugg self-assigned this Jan 26, 2023
@cwshugg cwshugg marked this pull request as ready for review January 26, 2023 17:00
@cwshugg
Copy link
Collaborator Author

cwshugg commented Feb 8, 2023

Hi @OleksiiOleksenko - have you had the chance to look at this? I had another idea that could be added to this PR; what do you think of this?

Currently I've implemented this gadget system to forbid any control-flow instructions. The motivation here was to prevent the disruption of your test case's DAG; if control-flow instructions were allowed in gadgets, the user could force more basic blocks to be added to the test case that aren't a part of the official DAG.

However, what do you think about allowing small, self-contained, loops to be present? A "loop gadget" might look like this:

MOV X0, #4
LOOP_GADGET_START:
SUBS X0, X0, #1
B.PL LOOP_GADGET_START

Something like this might allow for forced misprediction to occur, without creating infinite loops or doing anything unreasonable. If this isn't worth implementing as a gadget, perhaps implementing loops would be best as a separate feature of the program generator. (I might actually prefer the latter.) What do you think?

… allows any arbitrary assembly to be placed as an operand to an instruction. I'm using this as a way to specify arm64 barrel-shifting operations, but it could be used for *any* string literal
@cwshugg
Copy link
Collaborator Author

cwshugg commented Feb 8, 2023

I'm also implementing a way to "alias" operands in a gadget, to allow for the same unique operand to be used in two different places in the same gadget. I'll push this up as soon as it's done, but for now, I'll mark the PR as a draft. More details to come on this particular part of the PR.

@cwshugg cwshugg marked this pull request as draft February 8, 2023 21:50
…es use unique registers and allow for more control over the random-ness of gadget instruction operands
@cwshugg cwshugg marked this pull request as ready for review February 9, 2023 19:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant