-
Notifications
You must be signed in to change notification settings - Fork 12.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support mcdc analysis for pattern matching #124278
base: master
Are you sure you want to change the base?
Conversation
r? @wesleywiser rustbot has assigned @wesleywiser. Use |
Failed to set assignee to
|
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
39732bf
to
0255b0b
Compare
Determining decision of pattern matching is more tricky than expected. I should write it down here in case someone feels confused about it. We call patterns like The first rule is that candidate is constructed where all match pairs representing The second rule is the first pair will be removed and compiler will insert its sub pairs into the candidate's match pairs if it is "full matched" Thus in all, the evaluate order of
While the order of
And the order of
A more mischievous example is
Hopefully the decision structure were not taken as mischief by users. |
This comment has been minimized.
This comment has been minimized.
I have pushed an impractical commit to show design for constructing decisions and mcdc branches of pattern matching. |
This comment has been minimized.
This comment has been minimized.
☔ The latest upstream changes (presumably #124255) made this pull request unmergeable. Please resolve the merge conflicts. |
Split mcdc code to a sub module of coverageinfo A further work from rust-lang#124217 . I have made relatively large changes when working on rust-lang#124278 so that it would better split them from `coverageinfo.rs` to avoid potential troubling merge work with improved branch coverage by `@Zalathar` . Besides `BlockMarkerGenerator` is added to avoid ownership problems (mostly needed for following change of rust-lang#124278 ) All code changes are done in [a37d737a](rust-lang@a3d737a) while the second commit just renames the file. cc `@RenjiSann` `@Zalathar` This will impact your current work.
Implementation for if-let has been drafted. Due to llvm does not support nested decisions yet tests for let-chains are not added. I should try to reduce coverage expressions and investigate if it works for matching guards later. |
☔ The latest upstream changes (presumably #124972) made this pull request unmergeable. Please resolve the merge conflicts. |
Right now I'm working on a PR to simplify obviously-redundant expressions. My plan is to get it merged before this PR, since it should be a lot easier to review. EDIT: Filed as #125106. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a proposal to properly track the data you need. Let me know what you think.
17b9e82
to
baefce9
Compare
// By here the candidate is the first candidate for this branch. The matched span must be from | ||
// one of its match pairs if existed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// By here the candidate is the first candidate for this branch. The matched span must be from | |
// one of its match pairs if existed. | |
// The candidate is the first candidate for this branch. If the corresponding match_pair | |
// was fully matched, we record its span here for coverage tracking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From looking at the wikipedia article for MCDC, it sounds like we should be tracking every possible outcome of a decision, no? In that case, shouldn't we record all possible branches, even if the match pair was not fully matched?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I might be confused about what "decision" means. The simpler to me would be to track each match branch, including sub-branches due to or-patterns. If that's the case, then what this is doing is incorrect. Instead, I'd suggest considering the whole "match" as one condition, with one outcome per subcandidate. You'd collect the information you need in lower_match_tree
, where start_block
is the start block and the block for each outcome is the subcandidate.pre_binding_block
. Would that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From looking at the wikipedia article for MCDC, it sounds like we should be tracking every possible outcome of a decision, no? In that case, shouldn't we record all possible branches, even if the match pair was not fully matched?
Yes. I was not clear about how to process the branches not fully matched until understand what full_matched
means from your reviews. Now I modified it a bit so we could handle not fully matched branches and generate appropriate results for more patterns.
Instead, I'd suggest considering the whole "match" as one condition, with one outcome per subcandidate.
Let's clarify it more. Consider such code
// val: Pat
match val {
Pat::A | Pat::B => { /* Arm 1 */ }
Pat::C => { /* Arm 2 */ }
_ => { /* Arm 3 */ }
}
Here Pat::A | Pat::B
, Pat::C
are decisions, their outcome mean "whether these pattern is matched or not". In another word, outcome of decision Pat::A | Pat:B
is equivalent to results of matches!(val, Pat::A | Pat::B)
and the conditions are sub patterns Pat::A
, Pat::B
. By setting val
to Pat::A
, we show the outcome is true. By setting val
to Pat::C
, we show the outcome is false. Thus we say condition Pat::A
can effect outcome of the decision because the outcome varies as value of val == Pat::A
changes.
Note that the last pattern _
is not a decision because it never fails to match val
.
Taking the whole match
as a decision (do I misunderstand?) might be a bit avant-garde. To some extent I love this view but this means a decision might have more than two outcome, which is not compatible with traditional definition only caring boolean value and is not supported by llvm implementation.
This pr just generates branch coverage report here but I should try to investigate if we could find a way to implement mcdc defined above.
EDIT:
Now the main problem is the common conditions. Let's consider a more complex example:
// val: (Pat, Pat)
match val {
(Pat::A | Pat::B, Pat::C) => { /* Arm 1 */ }
(Pat::D, Pat::C) => { /* Arm 2 */ }
_ => { /* Arm 3 */ }
}
Suppose the span of Pat::C
in Pat::A | Pat::B, Pat::C
is S1, span of Pat::C
in Pat::D
is S2. The problem is we can only see S1 when val.1
is tested, but we also need S2 to fill span of a condition in decision (Pat::D, Pat::C)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to handle decisions with multiple outcomes anyway, since individual tests (e.g. enum variant test) can have many branches. How do you deal with that in the current state of the code?
This example is good: could you spell out precisely what conditions and decisions you expect to see in this example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example is good: could you spell out precisely what conditions and decisions you expect to see in this example?
Sure let's introduce what decisions and conditions need for mcdc first:
struct Decision {
// Span of the whole decision, which should contain all spans of its conditions
span: Span,
// Total number of conditions in this decision
conditions_num: usize,
}
struct Condition {
// Span of this condition, should be contained by its decision
span: Span,
// Id of this condition, starting from 1, should be unique in the decision. Can be assigned arbitrarily except that the first evaluated condition should be assigned to 1
condition_id: usize,
// Id of the next condition to be evaluated if this condition is true/matched. 0 if the evaluation ends here.
true_next_id: usize,
// Id of the next condition to be evaluated if this condition is false/unmatched. 0 if the evaluation ends here.
false_next_id: usize,
// The block this condition is tested,
test_block: BasicBlock,
// The succeeding block if this condition is true
matched_block: BasicBlock
}
For code like
// val: (Pat, Pat)
match val {
(Pat::A | Pat::B, Pat::C) => { /* Arm 1 */ } // Suppose this is line 3
(Pat::D, Pat::C) => { /* Arm 2 */ } // Suppose this is line 4
_ => { /* Arm 3 */ }
}
For decision (Pat::A | Pat::B, Pat::C)
we expect (use string to represent span here)
let decision = Decision {
span: "(Pat::A | Pat::B, Pat::C)",
conditions_num: 3,
}
// This condition can be viewed as "val.1==Pat::C"
let condition_1 = Condition {
span: "Pat::C", // This is "Pat::C" in line 3
condition_id: 1, // This condition is the first to be checked
true_next_id: 2, // Next we check "Pat::A" if this is true
false_next_id: 0,
test_block: _, // The block we check if val.1==Pat::C
matched_block: _ // The block switch to if val.1==Pat::C
}
// This condition can be viewed as "val.0==Pat::A"
let condition_2 = Condition {
span: "Pat::A",
condition_id: 2,
true_next_id: 0, // Matched all here
false_next_id: 3, // Next we check "Pat::B" if this is false
test_block: _, // The block we check if val.0==Pat::A, can be same as condition_1.matched_block
matched_block: _ // The block switch to if val.0==Pat::A, may be the block to Arm 1
}
// This condition can be viewed as "val.0==Pat::B"
let condition_3 = Condition {
span: "Pat::B",
condition_id: 3,
// No matter what this condition outcomes, we reach end
true_next_id: 0,
false_next_id: 0,
test_block: _, // The block we check if val.0==Pat::B
matched_block: _ // The block switch to if val.0==Pat::B, may be the block to Arm 2
}
And for decision (Pat::D, Pat::C)
we expect
let decision = Decision {
span: "(Pat::D, Pat::C)",
conditions_num: 2,
}
let condition_1 = Condition {
span: "Pat::C", // This is "Pat::C" in line 4
condition_id: 1,
true_next_id: 2, // Next we check "Pat::D" if this is true
false_next_id: 0,
test_block: _,// The block we check if val.1==Pat::C
matched_block: _ // The block switch to if val.1==Pat::C
}
let condition_2 = Condition {
span: "Pat::D",
condition_id: 2,
true_next_id: 0,
false_next_id: 0,
test_block: _, // The block we check if val.1==Pat::D
matched_block: _ // The block switch to if val.1==Pat::D, may be the block to Arm 2
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to handle decisions with multiple outcomes anyway, since individual tests (e.g. enum variant test) can have many branches. How do you deal with that in the current state of the code?
Suppose the decision is Pat::A | Pat::B
, Do you mean "multiple outcomes" due to logic like
_2 = discriminant((_1: Pat));
switchInt(move _2) -> [0: bb2 "block to match Pat::A", 1: bb3 "block to match Pat::B" otherwise: bb4 "unmatched block"];
In such cases, now both _2==0
(Pat::A
) and _2==1
(Pat::B
) mean the decision is true
, and all others such as Pat::C
, Pat::D
indicate the decision is false
.
Considering Pat::A
and Pat::B
are conditions of the decision. We will update the bit of some values for the decision representing Pat::A
to 1 at bb2
and similar as Pat::B
at bb3
. Then from the condition info introduced above, llvm can know the value of decision from the bits by taking it as val==Pat::A || val==Pat::B
. (llvm can reconstruct the decision tree from true_next_id
and false_next_id
provided).
Take that as example, suppose we use a value a: i32
to trace the conditions. In bb2
we insert a|=1 << 0
while in bb3
we insert a|=1 << 1
. Then we store a
to some places assigned by llvm. If llvm saw a==1
later, it knows the value has been matched with Pat::A
. If llvm saw a==0b10
, it knows the value has been matched with Pat::B
. Otherwise if llvm see a==0
, it knows neither matched the value. Besides from the decision tree llvm knows that a==0b10
and a==0b1
mean the decision is true
while a==0
means it is false
. So llvm can show how the two conditions affect outcome of the decision with mcdc criteria.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, that's really helpful! I didn't have a clear idea of the goal here but I think I do now.
I seem to understand that every branch in the match is a decision, and every match pair in a branch is a condition for that branch?
The current way of doing things misses some conditions then I think. I particular the second C
is missed in your example, because you only record conditions for the first candidate in each branch of a given test. Why not record all the match pairs when they're fully_matched? This would ensure that every match branch gets one condition per match pair.
Off the top of my head, what I'd try is:
- make
sort_candidate
outputOption<(TestBranch, Option<MatchPair>)>
that returns the match pair of it was fully matched; - make
sort_candidates
returnMap<Branch, Vec<(Candidate, Option<MatchPair>)>>
, to record every fully_matched match pair alongside the candidate it came from; - when building
target_blocks
, output one condition for each match pair we get.
Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note also that I'm not sure we're handling spans like you want right now: can you try and example like Some(true) | None
? I suspect the true
would have an incorrect decision span.
To solve this, I'd suggest accumulating conditions in a new field of Candidate
, and process them all at the end of lower_match_tree
. The trick is that you want to use the span of the top-level candidate, but the conditions might be stored in sub-candidates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Off the top of my head, what I'd try is:
* make `sort_candidate` output `Option<(TestBranch, Option<MatchPair>)>` that returns the match pair of it was fully matched; * make `sort_candidates` return `Map<Branch, Vec<(Candidate, Option<MatchPair>)>>`, to record every fully_matched match pair alongside the candidate it came from; * when building `target_blocks`, output one condition for each match pair we get.
Yes that's exactly I am going to do. I would try to find a way to identify each decision alongside lowering the match tree and group its conditions respectively.
can you try and example like
Some(true) | None
? I suspect thetrue
would have an incorrect decision span.
Decision Some(true) | None
would have 3 conditons: val.is_some()
, val.is_none()
, <val as Option::Some>.0 == true
. To simplify it we can take it as something like (val.is_some() && <val as Option::Some>.0 == true) || val.is_none()
, then the decision span is Some(true) | None
and conditions are
let condition_1 = Condition {
span: "Some(" // Primarily is `Some(true)` but we extract the span of its sub pattern from it.
condition_id: 1,
true_next_id: 2,
false_next_id: 3
// omit blocks
}
let condition_2 = Condition {
span: "true"
condition_id: 2,
true_next_id: 0,
false_next_id: 0,
}
let condition_3 = Condition {
span: "None"
condition_id: 3,
true_next_id: 0,
false_next_id: 0,
}
For now the decision span is right because I get it by combining spans of all conditions with Span::to
. Indeed this method would not work if we record all target spans.
To solve this, I'd suggest accumulating conditions in a new field of
Candidate
, and process them all at the end oflower_match_tree
. The trick is that you want to use the span of the top-level candidate, but the conditions might be stored in sub-candidates.
Agree with that, I should have a try in this way. Thank you!
05534ce
to
386f870
Compare
coverage: Memoize and simplify counter expressions When creating coverage counter expressions as part of coverage instrumentation, we often end up creating obviously-redundant expressions like `c1 + (c0 - c1)`, which is equivalent to just `c0`. To avoid doing so, this PR checks when we would create an expression matching one of 5 patterns, and uses the simplified form instead: - `(a - b) + b` → `a`. - `(a + b) - b` → `a`. - `(a + b) - a` → `b`. - `a + (b - a)` → `b`. - `a - (a - b)` → `b`. Of all the different ways to combine 3 operands and 2 operators, these are the patterns that allow simplification. (Some of those patterns currently don't occur in practice, but are included anyway for completeness, to avoid having to add them later as branch coverage and MC/DC coverage support expands.) --- This PR also adds memoization for newly-created (or newly-simplified) counter expressions, to avoid creating duplicates. This currently makes no difference to the final mappings, but is expected to be useful for MC/DC coverage of match expressions, as proposed by rust-lang#124278 (comment).
Rollup merge of rust-lang#125106 - Zalathar:expressions, r=davidtwco coverage: Memoize and simplify counter expressions When creating coverage counter expressions as part of coverage instrumentation, we often end up creating obviously-redundant expressions like `c1 + (c0 - c1)`, which is equivalent to just `c0`. To avoid doing so, this PR checks when we would create an expression matching one of 5 patterns, and uses the simplified form instead: - `(a - b) + b` → `a`. - `(a + b) - b` → `a`. - `(a + b) - a` → `b`. - `a + (b - a)` → `b`. - `a - (a - b)` → `b`. Of all the different ways to combine 3 operands and 2 operators, these are the patterns that allow simplification. (Some of those patterns currently don't occur in practice, but are included anyway for completeness, to avoid having to add them later as branch coverage and MC/DC coverage support expands.) --- This PR also adds memoization for newly-created (or newly-simplified) counter expressions, to avoid creating duplicates. This currently makes no difference to the final mappings, but is expected to be useful for MC/DC coverage of match expressions, as proposed by rust-lang#124278 (comment).
☔ The latest upstream changes (presumably #125331) made this pull request unmergeable. Please resolve the merge conflicts. |
659eede
to
7e0374f
Compare
Thanks to @Nadrieril now I could find a way to treat each arm in The basic idea is to update test vectors of every decision in every arms. For instance, match val {
Pat::A | Pat::B => { /* update test vectors of `Pat::A | Pat::B` and `Pat::C | Pat::D` here */}
Pat::C | Pat::D=> { /* update test vectors of `Pat::A | Pat::B` and `Pat::C | Pat::D` here */}
_ => { /* update test vectors of `Pat::A | Pat::B` and `Pat::C | Pat::D` here */}
} By updating test vectors of all decisions simultaneously we can deal with tests affecting different decisions. Fortunately we have implemented decision depth to such cases. |
normal_branch_spans: Vec<MCDCBranchSpan>, | ||
mcdc_spans: Vec<(MCDCDecisionSpan, Vec<MCDCBranchSpan>)>, | ||
decision_ctx_stack: Vec<DecisionCtx>, | ||
decision_ends: FxIndexMap<Span, Rc<RefCell<Vec<BlockMarkerId>>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decision_ends
is introduced because:
- Candidates with
|
may not havepre_binding_block
when matching finishes. - For every arm in
match
statements,arm_block
is generated inlower_match_arms
and immediately compiler digs into code in the arm, which might contain many other decisions.
Hence when a PatternDecisionCtx
is finished, we do not know all blocks which are "ends" of these decisions. Instead we fulfill them in add_ends_to_decision
later. Because decisions produced by same ctx
share same ends, Rc<RefCell<_>>
is used here.
@@ -755,7 +755,7 @@ impl<'a, 'tcx> Builder<'a, 'tcx> { | |||
candidate.match_pairs.sort_by_key(|pair| matches!(pair.test_case, TestCase::Or { .. })); | |||
} | |||
|
|||
ret | |||
ret.map(|branch| (branch, matched_span)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you add the span regardless of fully_matched
, some conditions will be duplicated; this may or may not be what you want. For example:
match x {
(10.., true) => {}
(100.., false) => {}
_ => {}
}
In that example, the first condition we try is 10..
, which also matches 100..
but not fully_matches it. So 100..
will show up twice: once when we test if n>=10
, and the second time when we test if n>=100
.
Intuitively I'd expect that we only want the second time; what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is handled in PatternDecisionCtx::visit_matching_arms
in mcdc.rs
, I use false_matched
to indicate if that target is full_matched
or not and find the right block for it.
EDIT: false_matched
is renamed to partial_matched
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should clarify it a bit clearer.
The motivation to catch conditions not full_matched
is I want to reconstruct decision tree in mcdc.
Say the code is
// arr: &[i32]
match arr {
[one] => { /*...*/ }
[_, two] => { /*...*/ }
_ => { /*..*/ }
}
First we check whether arr
matches [one]
. By engaging "partial matched" pattern [_, two]
, we know condition [one]
has a false next test [_, two]
. If we only pass full_matched
targets to mcdc builder we have no idea how [one]
is connected to [_, two]
, true next or false next.
(I forgot to add tests to check this kind of code. Thanks for the reminder now a simple test is added and it helps me fix a bug, see comparison)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see! If you're reconstructing the whole decision tree, could it be easier to add enough Span
information to the MIR so that you can act directly on the MIR? I don't know how easy that would be but it feels conceptually simpler. For example we could add info to SwitchInt
or SourceInfo
to indicate what pattern it comes from (if any).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed that may help a lot. Nevertheless I'm afraid that it results in uncomfortable side effects. Though mcdc is meaningful in some areas, mostly it is required by embedded systems working for safety critical components, which comprise only a very small portion of all applications. I'd tend to encapsulate complexity in mcdc and minimize its effects on other parts that are much more widely depended on.
Besides, mcdc criteria for pattern matching has not reached a consensus within industries yet (due to C is the only practical choice in these areas in the past and few manufacturers are aware of modern language features). The basic idea behind this implementation is addressed by a paper from ferrous system but still there might be other opinions worth consideration. So we'd better keep it in an easy-to-change way.
coverage: Memoize and simplify counter expressions When creating coverage counter expressions as part of coverage instrumentation, we often end up creating obviously-redundant expressions like `c1 + (c0 - c1)`, which is equivalent to just `c0`. To avoid doing so, this PR checks when we would create an expression matching one of 5 patterns, and uses the simplified form instead: - `(a - b) + b` → `a`. - `(a + b) - b` → `a`. - `(a + b) - a` → `b`. - `a + (b - a)` → `b`. - `a - (a - b)` → `b`. Of all the different ways to combine 3 operands and 2 operators, these are the patterns that allow simplification. (Some of those patterns currently don't occur in practice, but are included anyway for completeness, to avoid having to add them later as branch coverage and MC/DC coverage support expands.) --- This PR also adds memoization for newly-created (or newly-simplified) counter expressions, to avoid creating duplicates. This currently makes no difference to the final mappings, but is expected to be useful for MC/DC coverage of match expressions, as proposed by rust-lang/rust#124278 (comment).
…anch mappings any more
Changed a bit to fit with constant folding. The main goal is to find all |
To finish the second task at #124144.
Changes
That means,
coverage-options=mcdc
may give different branch coverage results of pattern matching withcoverage-options=branch
temporarily.match
.Note. The results might be a bit counter intuitive due to reorder as this comment introduces.