Minor Optimization for the Generic Dominator Calculation Algorithm #1424

omerk2511 · 2022-03-27T17:35:13Z

The generic dominator calculation algorithm sets the initial todo set to be just all the nodes. This leads to redundant calculations which can be saved.

Explanation:

The algorithm's main loop tries to see if the dominators of the current node have changed by calculating the intersection of every previous node's dominators. Since the initial dominator set of all nodes is always the entire node set, except for the head node (whose dominator set is set([head])), this intersection calculation will always initially return the entire node set for all nodes except the next nodes of the head node. Therefore, we can prevent this waste of cycles by setting the initial todo set to contain only them.

Note - I used the terms previous and next nodes instead of predecessors and successors, since this function supports both dominators and postdominators. I initially didn't notice the support for postdominators as well, and implemented an additional optimization which was correct only for dominator calculation. Luckily, the regression tests caught this.

Benchmarks:

When adding a log of the amount of such "misses" and running the graph.py test suite, these are the results before this PR:

# of misses: 1
# of misses: 5
# of misses: 4
# of misses: 2
# of misses: 3
# of misses: 1
# of misses: 0
# of misses: 5
# of misses: 1
# of misses: 0
# of misses: 5
# of misses: 3

And after:

# of misses: 1
# of misses: 0
# of misses: 1
# of misses: 0
# of misses: 0
# of misses: 1
# of misses: 0
# of misses: 0
# of misses: 1
# of misses: 0
# of misses: 0
# of misses: 2

Not so dramatic - but I guess it might be beneficial for analysis of huge CFGs... Let's test it for large CFGs as well. I used this tester program which creates a pretty linear CFG of 20K nodes:

from miasm.core.graph import *
from random import randint

for i in range(5):
    x = {}
    for a in range(1, 20001):
        x[a] = str(randint(1000, 10000))

    g = DiGraph()
    g.add_edge(x[1], x[2])
    g.add_edge(x[2], x[3])
    g.add_edge(x[2], x[4])
    g.add_edge(x[3], x[5])
    g.add_edge(x[4], x[5])
    g.add_edge(x[5], x[2])
    g.add_edge(x[2], x[6])
    for a in range(1, 20000):
        g.add_edge(x[a], x[a+1])

    dominators = g.compute_dominators(x[1])

Here are the results before this PR:

# of misses: 13471
# of misses: 13558
# of misses: 13658
# of misses: 13541
# of misses: 13652
python3 ./graph.py  95.40s user 4.41s system 99% cpu 1:39.82 total

And after:

# of misses: 5687
# of misses: 5633
# of misses: 5624
# of misses: 5602
# of misses: 5771
python3 ./graph.py  30.75s user 1.32s system 99% cpu 32.075 total

About 3 times faster, nice...

…dant initial calculations

… any node in the graph is correct, but this is a *generic* function, so it should handle postdominators as well, whose head can be a predecessor of some node in the graph. Thus, the node == head check is still required.

serpilliere · 2022-04-07T18:48:48Z

Hey @omerk2511
I will analyze this in a near future: I have to think about it :)
Thank you for the fix!

Optimized the generic dominator calculation a bit by preventing redun…

8493ca2

…dant initial calculations

omerk2511 changed the title ~~Minor Generic Dominator Calculation Algorithm Optimization~~ Minor Optimization for the Generic Dominator Calculation Algorithm Mar 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Minor Optimization for the Generic Dominator Calculation Algorithm #1424

Minor Optimization for the Generic Dominator Calculation Algorithm #1424

omerk2511 commented Mar 27, 2022 •

edited

serpilliere commented Apr 7, 2022

Minor Optimization for the Generic Dominator Calculation Algorithm #1424

Are you sure you want to change the base?

Minor Optimization for the Generic Dominator Calculation Algorithm #1424

Conversation

omerk2511 commented Mar 27, 2022 • edited

serpilliere commented Apr 7, 2022

omerk2511 commented Mar 27, 2022 •

edited