Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support mcdc analysis for pattern matching #124278

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

ZhuUx
Copy link
Contributor

@ZhuUx ZhuUx commented Apr 23, 2024

To finish the second task at #124144.

Changes

  • Implement branch coverage for pattern matching specific to mcdc. As suggested at #124217, mcdc would better implement its own branch coverage methods for pattern match.
    That means, coverage-options=mcdc may give different branch coverage results of pattern matching with coverage-options=branch temporarily.
  • Generate decision mappings for all refutable patterns in match.

Note. The results might be a bit counter intuitive due to reorder as this comment introduces.

@rustbot
Copy link
Collaborator

rustbot commented Apr 23, 2024

r? @wesleywiser

rustbot has assigned @wesleywiser.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 23, 2024
@ZhuUx
Copy link
Contributor Author

ZhuUx commented Apr 23, 2024

For now there are still some work need to do and some code is not well optimized to reduce work in merging #124255 .
r? @ghost
@rustbot label +A-code-coverage
@rustbot author

@rustbot
Copy link
Collaborator

rustbot commented Apr 23, 2024

Failed to set assignee to ghost: invalid assignee

Note: Only org members with at least the repository "read" role, users with write permissions, or people who have commented on the PR may be assigned.

@rustbot rustbot added A-code-coverage Area: Source-based code coverage (-Cinstrument-coverage) S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 23, 2024
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@ZhuUx ZhuUx force-pushed the pattern-match branch 2 times, most recently from 39732bf to 0255b0b Compare April 24, 2024 02:39
@Lambdaris
Copy link

Lambdaris commented Apr 24, 2024

Determining decision of pattern matching is more tricky than expected. I should write it down here in case someone feels confused about it.
I'm not sure if rustc promises anything about the order of checked pattern (probably not), so the description here should be only taken as a specific way determined by the implementation of compiler when this pr is drafting.

We call patterns like ( A | B, C (X | Y) ) as candidates, each candidate has match pairs which indicates sub patterns and also has their "sub pairs". For example, candidate for ( A | B, C (X | Y) ) has match pairs for A | B (let's call it M1) and C (X | Y) (M2) separately. Also M1 has sub pairs for A and B and M2 has sub pair X | Y.

The first rule is that candidate is constructed where all match pairs representing | pattern (directly) are moved to last
This is done at simplify_match_pairs.
For the example compiler checks C first because A | B is or pattern while C is not directly (though it has an or sub pattern).

The second rule is the first pair will be removed and compiler will insert its sub pairs into the candidate's match pairs if it is "full matched"
This is done at sort_candidate.
So when compiler checks C first, it removes it and takes match pair representing X | Y as the candidate's last match pair. If the sub pattern in C were not |, it would be inserted at front of all or patterns as the code suggests.

Thus in all, the evaluate order of if let ( A | B, C (X | Y) ) = val is:

  1. Check whether val.1 is C,
  2. Check whether val.0 is A | B,
  3. Check whether val.1.0 is X | Y.

While the order of if let (A | B, C(X)) = val is:

  1. Check whether val.1 is C,
  2. Check whether val.1.0 is X,
  3. Check whether val.0 is A | B.

And the order of if let (A(Z), C) = val is:

  1. Check whether val.0 is A,
  2. Check whether val.0.0 is Z,
  3. Check whether val.1 is C.

A more mischievous example is if let (A | B, C (X | Y, Z), D) = val, its control flow would be like:

  1. Check whether val.1 is C,
  2. Check whether val.2 is D,
  3. Check whether val.1.1 is Z,
  4. Check whether val.0 is A | B,
  5. Check whether val.1.0 is X | Y.

Hopefully the decision structure were not taken as mischief by users.

@ZhuUx ZhuUx changed the title Support mcdc analysis for pattern match Support mcdc analysis for pattern matching Apr 24, 2024
@rust-log-analyzer

This comment has been minimized.

@ZhuUx
Copy link
Contributor Author

ZhuUx commented Apr 28, 2024

I have pushed an impractical commit to show design for constructing decisions and mcdc branches of pattern matching.
Mostly is in MCDCDecisionBuilder. I'd like to finish it after merging #124255 and #124399 because the way to inject block marker should be changed.
But still any comments about it are welcomed.

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Apr 29, 2024

☔ The latest upstream changes (presumably #124255) made this pull request unmergeable. Please resolve the merge conflicts.

bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 30, 2024
Split mcdc code to a sub module of coverageinfo

A further work from rust-lang#124217 . I have made relatively large changes when working on rust-lang#124278 so that it would better split them from `coverageinfo.rs` to avoid potential troubling merge work with improved branch coverage by `@Zalathar` .

Besides `BlockMarkerGenerator` is added to avoid ownership problems (mostly needed for following change of rust-lang#124278 )

All code changes are done in [a37d737a](rust-lang@a3d737a) while the second commit just renames the file.

cc `@RenjiSann` `@Zalathar`
This will impact your current work.
@ZhuUx
Copy link
Contributor Author

ZhuUx commented May 10, 2024

Implementation for if-let has been drafted. Due to llvm does not support nested decisions yet tests for let-chains are not added. I should try to reduce coverage expressions and investigate if it works for matching guards later.

@bors
Copy link
Contributor

bors commented May 10, 2024

☔ The latest upstream changes (presumably #124972) made this pull request unmergeable. Please resolve the merge conflicts.

@Zalathar
Copy link
Contributor

Zalathar commented May 14, 2024

Sure I do think so. Thus that commit is not contained in this pr yet. I planned to commit it later. Or shall we rewrite expressions before these issues?

Right now I'm working on a PR to simplify obviously-redundant expressions. My plan is to get it merged before this PR, since it should be a lot easier to review.

EDIT: Filed as #125106.

Copy link
Member

@Nadrieril Nadrieril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a proposal to properly track the data you need. Let me know what you think.

compiler/rustc_mir_build/src/build/matches/test.rs Outdated Show resolved Hide resolved
compiler/rustc_mir_build/src/build/matches/mod.rs Outdated Show resolved Hide resolved
compiler/rustc_mir_build/src/build/matches/mod.rs Outdated Show resolved Hide resolved
@ZhuUx ZhuUx force-pushed the pattern-match branch 2 times, most recently from 17b9e82 to baefce9 Compare May 15, 2024 02:08
Comment on lines 1795 to 1796
// By here the candidate is the first candidate for this branch. The matched span must be from
// one of its match pairs if existed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// By here the candidate is the first candidate for this branch. The matched span must be from
// one of its match pairs if existed.
// The candidate is the first candidate for this branch. If the corresponding match_pair
// was fully matched, we record its span here for coverage tracking.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From looking at the wikipedia article for MCDC, it sounds like we should be tracking every possible outcome of a decision, no? In that case, shouldn't we record all possible branches, even if the match pair was not fully matched?

Copy link
Member

@Nadrieril Nadrieril May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I might be confused about what "decision" means. The simpler to me would be to track each match branch, including sub-branches due to or-patterns. If that's the case, then what this is doing is incorrect. Instead, I'd suggest considering the whole "match" as one condition, with one outcome per subcandidate. You'd collect the information you need in lower_match_tree, where start_block is the start block and the block for each outcome is the subcandidate.pre_binding_block. Would that make sense?

Copy link
Contributor Author

@ZhuUx ZhuUx May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From looking at the wikipedia article for MCDC, it sounds like we should be tracking every possible outcome of a decision, no? In that case, shouldn't we record all possible branches, even if the match pair was not fully matched?

Yes. I was not clear about how to process the branches not fully matched until understand what full_matched means from your reviews. Now I modified it a bit so we could handle not fully matched branches and generate appropriate results for more patterns.

Instead, I'd suggest considering the whole "match" as one condition, with one outcome per subcandidate.

Let's clarify it more. Consider such code

// val: Pat
match val {
    Pat::A | Pat::B => { /* Arm 1 */ }
    Pat::C  => { /* Arm 2 */ }
    _ => { /* Arm 3 */ }
}

Here Pat::A | Pat::B, Pat::C are decisions, their outcome mean "whether these pattern is matched or not". In another word, outcome of decision Pat::A | Pat:B is equivalent to results of matches!(val, Pat::A | Pat::B) and the conditions are sub patterns Pat::A, Pat::B. By setting val to Pat::A, we show the outcome is true. By setting val to Pat::C, we show the outcome is false. Thus we say condition Pat::A can effect outcome of the decision because the outcome varies as value of val == Pat::A changes.

Note that the last pattern _ is not a decision because it never fails to match val.

Taking the whole match as a decision (do I misunderstand?) might be a bit avant-garde. To some extent I love this view but this means a decision might have more than two outcome, which is not compatible with traditional definition only caring boolean value and is not supported by llvm implementation.

This pr just generates branch coverage report here but I should try to investigate if we could find a way to implement mcdc defined above.

EDIT:
Now the main problem is the common conditions. Let's consider a more complex example:

// val: (Pat, Pat)
match val {
    (Pat::A | Pat::B, Pat::C) => { /* Arm 1 */ }
    (Pat::D, Pat::C)  => { /* Arm 2 */ }
    _ => { /* Arm 3 */ }
}

Suppose the span of Pat::C in Pat::A | Pat::B, Pat::C is S1, span of Pat::C in Pat::D is S2. The problem is we can only see S1 when val.1 is tested, but we also need S2 to fill span of a condition in decision (Pat::D, Pat::C)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to handle decisions with multiple outcomes anyway, since individual tests (e.g. enum variant test) can have many branches. How do you deal with that in the current state of the code?

This example is good: could you spell out precisely what conditions and decisions you expect to see in this example?

Copy link
Contributor Author

@ZhuUx ZhuUx May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example is good: could you spell out precisely what conditions and decisions you expect to see in this example?

Sure let's introduce what decisions and conditions need for mcdc first:

struct Decision {
    // Span of the whole decision, which should contain all spans of its conditions
    span: Span,
    // Total number of conditions in this decision
    conditions_num: usize,
}

struct Condition {
    // Span of this condition, should be contained by its decision
    span: Span,
    // Id of this condition, starting from 1, should be unique in the decision. Can be assigned arbitrarily except that the first evaluated condition should be assigned to 1
    condition_id: usize,
    // Id of the next condition to be evaluated if this condition is true/matched. 0 if the evaluation ends here.
    true_next_id: usize,
    // Id of the next condition to be evaluated if this condition is false/unmatched. 0 if the evaluation ends here.
    false_next_id: usize,
    // The block this condition is tested,
    test_block: BasicBlock,
    // The succeeding block if this condition is true
    matched_block: BasicBlock
}

For code like

// val: (Pat, Pat)
match val {
    (Pat::A | Pat::B, Pat::C) => { /* Arm 1 */ } // Suppose this is line 3
    (Pat::D, Pat::C)  => { /* Arm 2 */ } // Suppose this is line 4
    _ => { /* Arm 3 */ }
}

For decision (Pat::A | Pat::B, Pat::C) we expect (use string to represent span here)

let decision = Decision {
    span: "(Pat::A | Pat::B, Pat::C)", 
    conditions_num: 3,
}
// This condition can be viewed as "val.1==Pat::C"
let condition_1 = Condition {
    span: "Pat::C", // This is "Pat::C" in line 3
    condition_id: 1, // This condition is the first to be checked
    true_next_id: 2, // Next we check "Pat::A" if this is true
    false_next_id: 0,
    test_block: _, //  The block we check if val.1==Pat::C
    matched_block: _ // The block switch to if val.1==Pat::C
}
// This condition can be viewed as "val.0==Pat::A"
let condition_2 = Condition {
    span: "Pat::A",
    condition_id: 2,
    true_next_id: 0, // Matched all here
    false_next_id: 3, // Next we check "Pat::B" if this is false
    test_block: _, //  The block we check if val.0==Pat::A, can be same as condition_1.matched_block
    matched_block: _ // The block switch to if val.0==Pat::A, may be the block to Arm 1
}
// This condition can be viewed as "val.0==Pat::B"
let condition_3 = Condition {
    span: "Pat::B",
    condition_id: 3,
    // No matter what this condition outcomes, we reach end
    true_next_id: 0,
    false_next_id: 0,
    test_block: _, //  The block we check if val.0==Pat::B
    matched_block: _ // The block switch to if val.0==Pat::B, may be the block to Arm 2
}

And for decision (Pat::D, Pat::C) we expect

let decision = Decision {
    span: "(Pat::D, Pat::C)", 
    conditions_num: 2,
}
let condition_1 = Condition {
    span: "Pat::C", // This is "Pat::C" in line 4
    condition_id: 1,
    true_next_id: 2, // Next we check "Pat::D" if this is true
    false_next_id: 0,
    test_block: _,//  The block we check if val.1==Pat::C
    matched_block: _ // The block switch to if val.1==Pat::C
}
let condition_2 = Condition {
    span: "Pat::D",
    condition_id: 2,
    true_next_id: 0,
    false_next_id: 0,
    test_block: _, //  The block we check if val.1==Pat::D
    matched_block: _ // The block switch to if val.1==Pat::D, may be the block to Arm 2
}

Copy link
Contributor Author

@ZhuUx ZhuUx May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to handle decisions with multiple outcomes anyway, since individual tests (e.g. enum variant test) can have many branches. How do you deal with that in the current state of the code?

Suppose the decision is Pat::A | Pat::B, Do you mean "multiple outcomes" due to logic like

_2 = discriminant((_1: Pat));
switchInt(move _2) -> [0: bb2 "block to match Pat::A", 1: bb3 "block to match Pat::B" otherwise: bb4 "unmatched block"];

In such cases, now both _2==0 (Pat::A) and _2==1 (Pat::B) mean the decision is true, and all others such as Pat::C, Pat::D indicate the decision is false.

Considering Pat::A and Pat::B are conditions of the decision. We will update the bit of some values for the decision representing Pat::A to 1 at bb2 and similar as Pat::B at bb3. Then from the condition info introduced above, llvm can know the value of decision from the bits by taking it as val==Pat::A || val==Pat::B. (llvm can reconstruct the decision tree from true_next_id and false_next_id provided).

Take that as example, suppose we use a value a: i32 to trace the conditions. In bb2 we insert a|=1 << 0 while in bb3 we insert a|=1 << 1. Then we store a to some places assigned by llvm. If llvm saw a==1 later, it knows the value has been matched with Pat::A . If llvm saw a==0b10, it knows the value has been matched with Pat::B. Otherwise if llvm see a==0, it knows neither matched the value. Besides from the decision tree llvm knows that a==0b10 and a==0b1 mean the decision is true while a==0 means it is false. So llvm can show how the two conditions affect outcome of the decision with mcdc criteria.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, that's really helpful! I didn't have a clear idea of the goal here but I think I do now.

I seem to understand that every branch in the match is a decision, and every match pair in a branch is a condition for that branch?

The current way of doing things misses some conditions then I think. I particular the second C is missed in your example, because you only record conditions for the first candidate in each branch of a given test. Why not record all the match pairs when they're fully_matched? This would ensure that every match branch gets one condition per match pair.

Off the top of my head, what I'd try is:

  • make sort_candidate output Option<(TestBranch, Option<MatchPair>)> that returns the match pair of it was fully matched;
  • make sort_candidates return Map<Branch, Vec<(Candidate, Option<MatchPair>)>>, to record every fully_matched match pair alongside the candidate it came from;
  • when building target_blocks, output one condition for each match pair we get.

Does that make sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note also that I'm not sure we're handling spans like you want right now: can you try and example like Some(true) | None? I suspect the true would have an incorrect decision span.

To solve this, I'd suggest accumulating conditions in a new field of Candidate, and process them all at the end of lower_match_tree. The trick is that you want to use the span of the top-level candidate, but the conditions might be stored in sub-candidates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off the top of my head, what I'd try is:

* make `sort_candidate` output `Option<(TestBranch, Option<MatchPair>)>` that returns the match pair of it was fully matched;

* make `sort_candidates` return `Map<Branch, Vec<(Candidate, Option<MatchPair>)>>`, to record every fully_matched match pair alongside the candidate it came from;

* when building `target_blocks`, output one condition for each match pair we get.

Yes that's exactly I am going to do. I would try to find a way to identify each decision alongside lowering the match tree and group its conditions respectively.

can you try and example like Some(true) | None? I suspect the true would have an incorrect decision span.

Decision Some(true) | None would have 3 conditons: val.is_some(), val.is_none(), <val as Option::Some>.0 == true. To simplify it we can take it as something like (val.is_some() && <val as Option::Some>.0 == true) || val.is_none(), then the decision span is Some(true) | None and conditions are

let condition_1 = Condition {
    span: "Some(" // Primarily is `Some(true)` but we extract the span of its sub pattern from it.
    condition_id: 1,
    true_next_id: 2,
    false_next_id: 3
    // omit blocks
}
let condition_2 = Condition {
   span: "true"
   condition_id: 2,
   true_next_id: 0,
   false_next_id: 0,
}
let condition_3 = Condition {
   span: "None"
   condition_id: 3,
   true_next_id: 0,
   false_next_id: 0,
}

For now the decision span is right because I get it by combining spans of all conditions with Span::to. Indeed this method would not work if we record all target spans.

To solve this, I'd suggest accumulating conditions in a new field of Candidate, and process them all at the end of lower_match_tree. The trick is that you want to use the span of the top-level candidate, but the conditions might be stored in sub-candidates.

Agree with that, I should have a try in this way. Thank you!

@ZhuUx ZhuUx force-pushed the pattern-match branch 2 times, most recently from 05534ce to 386f870 Compare May 16, 2024 03:16
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request May 20, 2024
coverage: Memoize and simplify counter expressions

When creating coverage counter expressions as part of coverage instrumentation, we often end up creating obviously-redundant expressions like `c1 + (c0 - c1)`, which is equivalent to just `c0`.

To avoid doing so, this PR checks when we would create an expression matching one of 5 patterns, and uses the simplified form instead:
- `(a - b) + b` → `a`.
- `(a + b) - b` → `a`.
- `(a + b) - a` → `b`.
- `a + (b - a)` → `b`.
- `a - (a - b)` → `b`.

Of all the different ways to combine 3 operands and 2 operators, these are the patterns that allow simplification.

(Some of those patterns currently don't occur in practice, but are included anyway for completeness, to avoid having to add them later as branch coverage and MC/DC coverage support expands.)

---

This PR also adds memoization for newly-created (or newly-simplified) counter expressions, to avoid creating duplicates.

This currently makes no difference to the final mappings, but is expected to be useful for MC/DC coverage of match expressions, as proposed by rust-lang#124278 (comment).
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request May 20, 2024
Rollup merge of rust-lang#125106 - Zalathar:expressions, r=davidtwco

coverage: Memoize and simplify counter expressions

When creating coverage counter expressions as part of coverage instrumentation, we often end up creating obviously-redundant expressions like `c1 + (c0 - c1)`, which is equivalent to just `c0`.

To avoid doing so, this PR checks when we would create an expression matching one of 5 patterns, and uses the simplified form instead:
- `(a - b) + b` → `a`.
- `(a + b) - b` → `a`.
- `(a + b) - a` → `b`.
- `a + (b - a)` → `b`.
- `a - (a - b)` → `b`.

Of all the different ways to combine 3 operands and 2 operators, these are the patterns that allow simplification.

(Some of those patterns currently don't occur in practice, but are included anyway for completeness, to avoid having to add them later as branch coverage and MC/DC coverage support expands.)

---

This PR also adds memoization for newly-created (or newly-simplified) counter expressions, to avoid creating duplicates.

This currently makes no difference to the final mappings, but is expected to be useful for MC/DC coverage of match expressions, as proposed by rust-lang#124278 (comment).
@bors
Copy link
Contributor

bors commented May 20, 2024

☔ The latest upstream changes (presumably #125331) made this pull request unmergeable. Please resolve the merge conflicts.

@ZhuUx ZhuUx force-pushed the pattern-match branch 2 times, most recently from 659eede to 7e0374f Compare May 21, 2024 08:16
@ZhuUx
Copy link
Contributor Author

ZhuUx commented May 21, 2024

Thanks to @Nadrieril now I could find a way to treat each arm in match statements as a decision.

The basic idea is to update test vectors of every decision in every arms. For instance,

match val {
    Pat::A | Pat::B => { /* update test vectors of `Pat::A | Pat::B` and `Pat::C | Pat::D` here */}
    Pat::C | Pat::D=> { /* update test vectors of `Pat::A | Pat::B`  and `Pat::C | Pat::D` here */}
    _ => { /* update test vectors of `Pat::A | Pat::B`  and `Pat::C | Pat::D` here */}
}

By updating test vectors of all decisions simultaneously we can deal with tests affecting different decisions. Fortunately we have implemented decision depth to such cases.

normal_branch_spans: Vec<MCDCBranchSpan>,
mcdc_spans: Vec<(MCDCDecisionSpan, Vec<MCDCBranchSpan>)>,
decision_ctx_stack: Vec<DecisionCtx>,
decision_ends: FxIndexMap<Span, Rc<RefCell<Vec<BlockMarkerId>>>>,
Copy link
Contributor Author

@ZhuUx ZhuUx May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decision_ends is introduced because:

  1. Candidates with | may not have pre_binding_block when matching finishes.
  2. For every arm in match statements, arm_block is generated in lower_match_arms and immediately compiler digs into code in the arm, which might contain many other decisions.

Hence when a PatternDecisionCtx is finished, we do not know all blocks which are "ends" of these decisions. Instead we fulfill them in add_ends_to_decision later. Because decisions produced by same ctx share same ends, Rc<RefCell<_>> is used here.

@@ -755,7 +755,7 @@ impl<'a, 'tcx> Builder<'a, 'tcx> {
candidate.match_pairs.sort_by_key(|pair| matches!(pair.test_case, TestCase::Or { .. }));
}

ret
ret.map(|branch| (branch, matched_span))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you add the span regardless of fully_matched, some conditions will be duplicated; this may or may not be what you want. For example:

match x {
    (10.., true) => {}
    (100.., false) => {}
    _ => {}
}

In that example, the first condition we try is 10.., which also matches 100.. but not fully_matches it. So 100.. will show up twice: once when we test if n>=10, and the second time when we test if n>=100.

Intuitively I'd expect that we only want the second time; what do you think?

Copy link
Contributor Author

@ZhuUx ZhuUx May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is handled in PatternDecisionCtx::visit_matching_arms in mcdc.rs, I use false_matched to indicate if that target is full_matched or not and find the right block for it.

EDIT: false_matched is renamed to partial_matched

Copy link
Contributor Author

@ZhuUx ZhuUx May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should clarify it a bit clearer.
The motivation to catch conditions not full_matched is I want to reconstruct decision tree in mcdc.
Say the code is

// arr: &[i32]
match arr {
    [one] => { /*...*/ }
    [_, two] => { /*...*/ }
    _ => { /*..*/ }
}

First we check whether arr matches [one]. By engaging "partial matched" pattern [_, two], we know condition [one] has a false next test [_, two]. If we only pass full_matched targets to mcdc builder we have no idea how [one] is connected to [_, two], true next or false next.

(I forgot to add tests to check this kind of code. Thanks for the reminder now a simple test is added and it helps me fix a bug, see comparison)

Copy link
Member

@Nadrieril Nadrieril May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see! If you're reconstructing the whole decision tree, could it be easier to add enough Span information to the MIR so that you can act directly on the MIR? I don't know how easy that would be but it feels conceptually simpler. For example we could add info to SwitchInt or SourceInfo to indicate what pattern it comes from (if any).

Copy link
Contributor Author

@ZhuUx ZhuUx May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed that may help a lot. Nevertheless I'm afraid that it results in uncomfortable side effects. Though mcdc is meaningful in some areas, mostly it is required by embedded systems working for safety critical components, which comprise only a very small portion of all applications. I'd tend to encapsulate complexity in mcdc and minimize its effects on other parts that are much more widely depended on.

Besides, mcdc criteria for pattern matching has not reached a consensus within industries yet (due to C is the only practical choice in these areas in the past and few manufacturers are aware of modern language features). The basic idea behind this implementation is addressed by a paper from ferrous system but still there might be other opinions worth consideration. So we'd better keep it in an easy-to-change way.

flip1995 pushed a commit to flip1995/rust-clippy that referenced this pull request May 24, 2024
coverage: Memoize and simplify counter expressions

When creating coverage counter expressions as part of coverage instrumentation, we often end up creating obviously-redundant expressions like `c1 + (c0 - c1)`, which is equivalent to just `c0`.

To avoid doing so, this PR checks when we would create an expression matching one of 5 patterns, and uses the simplified form instead:
- `(a - b) + b` → `a`.
- `(a + b) - b` → `a`.
- `(a + b) - a` → `b`.
- `a + (b - a)` → `b`.
- `a - (a - b)` → `b`.

Of all the different ways to combine 3 operands and 2 operators, these are the patterns that allow simplification.

(Some of those patterns currently don't occur in practice, but are included anyway for completeness, to avoid having to add them later as branch coverage and MC/DC coverage support expands.)

---

This PR also adds memoization for newly-created (or newly-simplified) counter expressions, to avoid creating duplicates.

This currently makes no difference to the final mappings, but is expected to be useful for MC/DC coverage of match expressions, as proposed by rust-lang/rust#124278 (comment).
@ZhuUx
Copy link
Contributor Author

ZhuUx commented May 28, 2024

Changed a bit to fit with constant folding. The main goal is to find all false blocks of branches and let the false term be sum of all counters in false blocks.
Now it uses two variants of MCDCBranchMarker to represent markers for boolean expressions and pattern matching respectively. The difference on false blocks between boolean expressions and pattern matching is the true blocks and false blocks of boolean expressions possibly does not share same predecessor in nested decisions. While for pattern matching, false block might not exist when building mir (such false blocks are inserted when instrumenting coverage statements for bcb edges). Hence it's hard to find a unified way to get false blocks for boolean expressions and patterns.
See comparison for details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-code-coverage Area: Source-based code coverage (-Cinstrument-coverage) S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants