Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2045 cfg implement abs int driver #2050

Open
wants to merge 21 commits into
base: develop
Choose a base branch
from

Conversation

mayanje
Copy link
Contributor

@mayanje mayanje commented Oct 15, 2021

What have been done

  • The interface IMultiBranchContext is removed
  • The class MultiBranchContext is moved in Graph namespace
  • CfgAbstractInterpretation class is added in Graph with Environment and Metrics subclasses, and also IObserver interface.
  • Test cases have been added in BasicAbstractInterpretationTest

@mayanje mayanje self-assigned this Oct 15, 2021
@trafico-bot trafico-bot bot added the 🔍 Ready for Review Pull Request is not reviewed yet label Oct 15, 2021
TypeCobol.Analysis/Graph/BasicBlock.cs Outdated Show resolved Hide resolved
TypeCobol.Analysis/Cfg/ControlFlowGraphBuilder.cs Outdated Show resolved Hide resolved
TypeCobol.Analysis/Cfg/ControlFlowGraphBuilder.cs Outdated Show resolved Hide resolved
TypeCobol.Analysis/Graph/CfgAbstractInterpretation.cs Outdated Show resolved Hide resolved
/// <summary>
/// The context of a Multi Branch instruction like IF or Evaluate.
/// </summary>
public class MultiBranchContext<N, D> : System.ICloneable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that the abstract interpretation is done parser-side, the context class does not need to be public. But ok we can keep it that way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used in qualis test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean in unit tests or in the analyzer code ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In analyzer code

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a problem then because we have to ensure all blocks or block indexes publicly exposed through this class are the 'final' ones otherwise we'll have to give access to the relocation map (which we don't want).
the MultiBranchContext may not be required with this new observer mechanism on the abstract interpretation.

@fm-117 fm-117 self-requested a review October 15, 2021 15:37
/// <summary>
/// The Abstract Interpretation Stack
/// </summary>
private Stack<BasicBlock<N, D>> InterpretationStack { get; set; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readonly field _interpretationStack

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in dc9a4c4

/// <summary>
/// The DFS Run stack
/// </summary>
private Stack<BasicBlock<N, D>> DFSStack;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readonly + rename (_dfsStack)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in dc9a4c4

/// <summary>
/// Already Visited Block
/// </summary>
private System.Collections.BitArray Visited;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readonly + rename (_visited)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in dc9a4c4

/// <summary>
/// Some metrics calculated.
/// </summary>
public readonly Metrics Metrics;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readonly property ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not now

Comment on lines 81 to 90
IterateBlock(block);
BasicBlock<N, D> nextFlowBlock = null;

// Track Branching blocks
if (block.Context != null)
{
nextFlowBlock = InterpretContext(block);
}

LeaveBlock(block);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly ! Look at the pseudo-code I wrote, you can see clearly the three step one line above each other so the Enter - Interpret - Leave workflow is obvious for the reader.

But it's ok, we can keep it like that.

Comment on lines 104 to 110
foreach (var edge in block.SuccessorEdges)
{
if (Cfg.SuccessorEdges[edge] == nextFlowBlock)
{
Metrics.EdgeCount++;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Equivalent to Metrics.EdgeCount++ ? Among the successors there would be multiple edges to the same block ?
I think I need an example to see how this next flow block special case work.

Copy link
Contributor Author

@mayanje mayanje Oct 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok you are right better to do: Metrics.EdgeCount += block.SuccessorEdges.Count;
I wanted to match the execution process and to have the same result, but after checking, it appers that no.
done in dc9a4c4

/// <param name="block"></param>
private void IterateBlock(BasicBlock<N, D> block)
{
if (_observers != null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should keep the condition && block.Instructions != null because here we actually use them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in dc9a4c4

@trafico-bot trafico-bot bot added ⚠️ Changes requested Pull Request needs changes before it can be reviewed again and removed 🔍 Ready for Review Pull Request is not reviewed yet labels Oct 15, 2021
@trafico-bot trafico-bot bot added 🔍 Ready for Review Pull Request is not reviewed yet and removed ⚠️ Changes requested Pull Request needs changes before it can be reviewed again labels Oct 18, 2021
@mayanje mayanje requested a review from fm-117 October 18, 2021 08:30
/// <summary>
/// The DFS Run stack
/// </summary>
private Stack<BasicBlock<N, D>> _dFSStack;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming _dfsStack

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok for all your suggestion,, but I suggest to keep Cfg public so that observers that receive the evironment instance thru IObserver interface can accessed thru the environment instance to which graph the current execution is associted

/// Current CFG being run.
/// </summary>
public ControlFlowGraph<N, D> Cfg { get; private set; }
private IObserver[] _observers;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readonly as it is set only at construction time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in a840251

@mayanje mayanje requested a review from fm-117 October 18, 2021 11:35
fm-117
fm-117 previously approved these changes Oct 18, 2021
@@ -0,0 +1,34 @@
Entering block : 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This program gives the following result when run:

IN PARA-A
IN PARA-C
IN PARA-F
IN PARA-G

So is the result of the file the reflect of a running program or something else ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A CFG reflect all possible executions of a program. But in reality this can represent exponential possibilities, so Abstract interpretation can only be view of a subset of all possibilities.
But the most important thing here we are just implementing a driver, we don't evaluate variable value to decide for instance what to do in the instruction GO TO PARA-E PARA-F PARA-G DEPENDING ON m
depending on the value of m.
We are not intrepreting the COBOL program.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment on the default execution performed is added. 7f8af6b
Not all possible execution paths can be followed, in this default implementation. But it is the base for other execution scheme.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this code:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. MyPGM.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  Group1.
           05 X pic X occurs 100 indexed by Idx.
           05 Y pic X occurs  50 indexed by Idx2.
       PROCEDURE DIVISION.
           MOVE 0   TO IDX2

           PERFORM VARYING Idx FROM 1 BY 1
                             UNTIL Idx > 100
      
               IF (IDX2 < 50)
                              PERFORM ADD-ITEM
               END-IF
           END-PERFORM
      *    Ko, ADD-ITEM increment Idx2 without enclosing IF
           PERFORM ADD-ITEM
           goback
           .
       ADD-ITEM.
           display "ADD-ITEM"
           ADD 1  TO Idx2
           MOVE X(Idx)   TO Y(IDX2)
           perform DisplayELt
           .
       DisplayELt.
           display "X(Idx)=" X(Idx)  
           .
      
       END PROGRAM MyPGM.
      

Paragraph ADD-ITEM is called outside of an IF statement (just before the goback), and our custom quality rule must detect it.

Does this PR handle this case?
On my execution I get:

    MOVE 0   TO IDX2
Entering block : 1
    PERFORM VARYING Idx FROM 1 BY 1
                      UNTIL Idx > 100
Leaving block : 1
Entering block : 2
        IF (IDX2 < 50)
Entering block : 3
Leaving block : 3
Entering block : 4
                       PERFORM ADD-ITEM
Leaving block : 4
Entering block : 5
Leaving block : 5
Leaving block : 2
Entering block : 6
Leaving block : 6
Leaving block : 0
Entering block : 7
Leaving block : 7
Entering block : 8
    PERFORM ADD-ITEM
Leaving block : 8
Entering block : 9
    goback
Leaving block : 9

which is not clear for me if the paragraph ADD-ITEM if the driver will pass on it 2 times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this the result:
--- Diagnostics ---
Line 9[12,27] <47, Warning, CodeAnalysis> - [Qualis] Protect table index increment (Incremented table index indX was not conditionnaly protected by IF or PERFORM UNTIL clause.)
Line 24[12,25] <47, Warning, CodeAnalysis> - [Qualis] Protect table index increment (Incremented table index indX was not conditionnaly protected by IF or PERFORM UNTIL clause.)

That is to say:
MOVE 0 TO IDX2
and
ADD 1 TO Idx2

Are detected.

@@ -0,0 +1,34 @@
Entering block : 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this code:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. MyPGM.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  Group1.
           05 X pic X occurs 100 indexed by Idx.
           05 Y pic X occurs  50 indexed by Idx2.
       PROCEDURE DIVISION.
           MOVE 0   TO IDX2

           PERFORM VARYING Idx FROM 1 BY 1
                             UNTIL Idx > 100
      
               IF (IDX2 < 50)
                              PERFORM ADD-ITEM
               END-IF
           END-PERFORM
      *    Ko, ADD-ITEM increment Idx2 without enclosing IF
           PERFORM ADD-ITEM
           goback
           .
       ADD-ITEM.
           display "ADD-ITEM"
           ADD 1  TO Idx2
           MOVE X(Idx)   TO Y(IDX2)
           perform DisplayELt
           .
       DisplayELt.
           display "X(Idx)=" X(Idx)  
           .
      
       END PROGRAM MyPGM.
      

Paragraph ADD-ITEM is called outside of an IF statement (just before the goback), and our custom quality rule must detect it.

Does this PR handle this case?
On my execution I get:

    MOVE 0   TO IDX2
Entering block : 1
    PERFORM VARYING Idx FROM 1 BY 1
                      UNTIL Idx > 100
Leaving block : 1
Entering block : 2
        IF (IDX2 < 50)
Entering block : 3
Leaving block : 3
Entering block : 4
                       PERFORM ADD-ITEM
Leaving block : 4
Entering block : 5
Leaving block : 5
Leaving block : 2
Entering block : 6
Leaving block : 6
Leaving block : 0
Entering block : 7
Leaving block : 7
Entering block : 8
    PERFORM ADD-ITEM
Leaving block : 8
Entering block : 9
    goback
Leaving block : 9

which is not clear for me if the paragraph ADD-ITEM if the driver will pass on it 2 times.

@mayanje
Copy link
Contributor Author

mayanje commented Nov 4, 2021

@smedilol "which is not clear for me if the paragraph ADD-ITEM if the driver will pass on it 2 times."
The answer is yes, it will be considered 2 times.

@mayanje
Copy link
Contributor Author

mayanje commented Nov 9, 2021

Due to important observations by @smedilol the abstract interpretation algorithm has been improved.
The new algorithm is no longer based on marking visited blocks, any block can be visited several times, therefore a cyclic threshold is used to avoid infinite recursion on cyclic blocks. See 203b206

Comment on lines 2205 to 2212
//Create the real perform block context for abstract interpretation
//This is not the context of the perform block.
ctx.OriginBlock.Context = null;
MultiBranchContext<Node, D> ctxPerform = new MultiBranchContext<Node, D>(perform);
ctxPerform.Start(ctx.Branches[0]); //Branches[0] is the perfom block
ctxPerform.AddBranch(ctx.Branches[1]); //Branches[1] i sthe perform body block
ctxPerform.BranchIndices.Add(ctx.BranchIndices[1]); //BranchIndices[1] is index in the SuccessorEdge of the perfomr body block.
ctxPerform.Terminals = terminals;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple typos:

  • Line 2209: 'the perform block'
  • Line 2210: 'Branches[1] is the perform'
  • Line 2211: 'perform body block'

About the comment '//This is not the context of the perform block.', I'm confused. If you call Start on the perform block, this will actually attach this context to the block so it looks like it is truly its context.

Also we previously did not need to create a MultiBranchContext to represent perform loops, if we need it now for abstract interpretation does it mean that all blocks having multiple successors require in fact a MultiBranchContext ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact only Control Flow instruction needs a MultiBranchContext. The first context calculated was only for constructing the graph, but the execution context of the PerformLoop is the one abtained after knowing the branching depending if it was in iterative perfotm or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 7c56456

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Control Flow Instructions" and "Blocks having multiple successors" are not the same thing ?

/// <summary>
/// The Cyclic relation from one block index to another block index
/// </summary>
private Dictionary<int, HashSet<int> >_cyclicRelation;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spacing

This is the transitive closure for each block I guess. Why do we need to compute it before running ?
Let's say we keep a dictionary that gives for each block the number of times we visited it. Upon entering the block we check this number against the threshold and increment it before continuing.

Also this last commit adds much complexity to the abstract interpretation algorithm. What about letting the client decide for each block if it needs to visit it again ? By changing the EnterBlock method or the interface IObserver, we could add this capability and this would split the compexity between the abstract runner and its observer while being more flexible.

Copy link
Contributor Author

@mayanje mayanje Nov 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _cyclicRelation is only for remembering which relation (which transition) leads to a cycle, it is not the entire transtive closure relation of the graph. So it is now called _cyclicTransition.
We just want to limit how many time a cyclic transition will be executed to avoid infinite transitions. otherwise there is no limitation on the number of time a non cyclic transition can be executed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understood that, I know we have to limit the number of times a block is visited. My idea is to let the client decide instead of fixing a maximum for all blocks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can client decides on block he doesn't know? the cyclicExecutionThreshold is already a parameter of the method, do you mean that we should add an option for that? even more we don't know in adance what transition will be cyclic.
So there can only be an unique threshold for each block.

/// <summary>
/// Number maximal of cyclic execution for a block index.
/// </summary>
private int _cyclicExecutionThresold ;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: _cyclicExecutionThreshold

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 7c56456

Comment on lines 70 to 82
if (_interpretationStack == null)
_interpretationStack = new Stack<BasicBlock<N, D>>();
if (_dfsStack == null)
_dfsStack = new Stack<BasicBlock<N, D>>();
_metricUpdated = new System.Collections.BitArray(Cfg.AllBlocks.Count);
if (_cyclicThreshold == null)
_cyclicThreshold = new Dictionary<int, int>();
else
_cyclicThreshold.Clear();
if (_cyclicRelation == null)
_cyclicRelation = new Dictionary<int, HashSet<int>>();
else
_cyclicRelation.Clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we have more fields to initialize, I think it is cleaner to do all the the initializations in the constructor and create a private Reset method for those that need to be cleared. This would also allow to mark fields as readonly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is renamed Reset, no need to have it in the constructor, beacuse it is any call to Run that need to Reset value.

Copy link
Contributor Author

@mayanje mayanje Nov 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 7c56456

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Init operations in constructor and Clear operations in Reset, this would avoird testing for null at each run. This was my idea but ok to keep it like this.

/// <summary>
/// Current Cyclic execution Thresold by block index.
/// </summary>
private Dictionary<int,int> _cyclicThreshold;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spacing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 7c56456

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still missing a space after int,

}
}

bool CanExecute(BasicBlock<N, D> b)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a CanExecute and CanAddExecution ? Why checking the value twice ?

Copy link
Contributor Author

@mayanje mayanje Nov 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// CanExecute is used during execution to check for a block that contributes to a cycle,
/// if can be executed, that is to say if its cyclic threshold is not consumed.
/// Other blocks are always executed.

/// CanAddExecution Checks if a transition from one block to another can be pushed as to be executed.
/// For a transition that can leads to a cycle, the cyclic threshold value is decreased, if the threshold
/// is consumed the transition cannot be executed.

/// </summary>
public void SetCyclicityThreshold()
{
BitSet _domain = new BitSet();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why prefix local variable with _ ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in 98d7aaa

Comment on lines 140 to 141
var sb = Cfg.SuccessorEdges[s];
int b = sb.Index;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use more meaningful variable name.
Like:

Suggested change
var sb = Cfg.SuccessorEdges[s];
int b = sb.Index;
int blockBIndex = Cfg.SuccessorEdges[s].Index;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in 98d7aaa

/// The default execution performed is based on a dfs algorithm, with exception that
/// branching instructions are handled specifically.
/// Thus not all execution paths are applied, but all nodes are visited using dfs like algorithm if they are accessible.
/// A node cannot be visited twice.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is now obsolete with the cyclicity threshold mechanism.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in a049ba1

Stack<int> _stack = new Stack<int>();

//Compute the domain
Cfg.DFS((block, incomingEdge, predecessorBlock, graph) => { _domain.Set(block.Index); return true; });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not take into account group blocks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, abstract interpretation like dfa analysis can only be performed on grapth generated using Extended or WithDfa mode. I know tests are executed by default in Standard mode, thus the algorithm doesn't go in group blocks. But for tracing execution steps it is not really important here.

/// <summary>
/// Current Cyclic execution Thresold by block index.
/// </summary>
private Dictionary<int,int> _cyclicThreshold;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still missing a space after int,

/// <summary>
/// The Cyclic transition from one block index to another block index
/// </summary>
private Dictionary<int, HashSet<int> >_cyclicTransition;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spacing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in a049ba1

dfs(b);
}
int _na = Visited(a);
_nb = Visited(b);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be in the previous if body as if we don't DFS the block the value won't change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in a049ba1

Copy link
Contributor

@fm-117 fm-117 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok for me, the last remaining thing is my remark about the MultiBranchContext (see https://github.com/TypeCobolTeam/TypeCobol/pull/2050/files#r740842228). We should try to keep it internal because of the cloned/relocated blocks.

I've not checked how this PR would impact the code in our private analyzer, but ideally the context is used by the Abstract Interpretation driver but not exposed to observers.

@mayanje mayanje requested a review from smedilol January 3, 2022 10:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🔍 Ready for Review Pull Request is not reviewed yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants