[CIR][CIRGen] Enhance switch #528
base: main
Actually, the body of a switch statement can be neither of
int f(int x) {
  switch (x)
    return 1;
  return 2;
}
This is accepted by clang (
Force-pushed from 8cba98d to b5eaa96
@Lancern Added |
Force-pushed from a7b0446 to 5915d80
Hey, thanks for your first patch and for working on this.
clang/test/CIR/CodeGen/switch.cpp
Outdated
// CHECK: cir.func @_Z9caseNone1i
// CHECK: cir.scope {
// CHECK: cir.switch
// CHECK-NEXT: case (none) {
Given that the body of the switch is never going to be executed, instead of introducing `none`, we should just emit an empty switch body.
Well, things are a little tricky in the case when the body of a
void foo(int x) {
  switch (x)
    while (condition()) {
    case 42:
      do_something();
    }
}
The
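To see why a `case` label nested inside a loop is tricky, it may help to run a small standalone variant of the snippet above. The function name `countFrom42` and the counter are made up for this sketch; only the shape (a `while` as the switch body with a `case` label inside it) follows the example. When `x` is 42, control jumps straight into the loop body, skipping the first condition check, and the loop then continues normally; any other value skips the whole body.

```cpp
#include <cassert>

// Hypothetical helper, not part of the patch: the switch body is a
// `while` statement and the only `case` label lives inside the loop body.
int countFrom42(int x) {
  int n = 0;
  switch (x)
    while (n < 3) {
    case 42: // entry point when x == 42; the first `n < 3` test is skipped
      ++n;
    }
  return n; // 3 when x == 42, otherwise 0 because no label matched
}

// Small usage check covering both paths through the function.
void demoCountFrom42() {
  assert(countFrom42(42) == 3);
  assert(countFrom42(7) == 0);
}
```

A structured `cir.switch` would have to model that jump into the middle of the loop region, which is the cross-scope problem discussed in this thread.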
It becomes a little complex when we consider The definition assumes that the number of case attributes is the same as the number of regions; unfortunately, the regions may be nested.
I'm not sure what a reasonable
Otherwise I noticed |
Since |
Force-pushed from 1e75d7b to e19aa5a
Let me know when this comes out of Draft state and I'll take a look again.
Force-pushed from 072a50b to c9f7e9a
I appreciate the suggestions!
Great, almost there. A few more inline comments.
@@ -1991,6 +1994,11 @@ class CIRGenFunction : public CIRGenTypeCache {
mlir::Block *getEntryBlock() { return EntryBlock; }

mlir::Location BeginLoc, EndLoc;
// Each SmallVector<APSInt> object is corresponding to a case region, empty
// vector means default case region.
Do these 3 new members really need to be inside `LexicalScope`? Perhaps I missed something. It feels like they expose more details of switch lowering than we'd want inside `LexicalScope`. Worst case, they can live in a struct declared inside `CIRGenStmt.cpp` and be passed around over there? My impression is:
More specifically, `lastCaseBlock` could just be a local variable inside `buildCaseCompoundStmt`?
Yes, it's an unnecessary change; let me explain.
My initial (now obsolete) idea was that a `CaseStmt` may appear inside any statement, so it's better to handle it in `buildStmt()` instead of `buildSwitchStmt()`. The logic would be like:
auto buildStmt(const Stmt *S) {
  ...
  if (S is CaseStmt)
    buildCaseStmt(S);
  ...
}
auto buildSwitchStmt(const SwitchStmt &S) {
  ...
  buildStmt(S.body());
  ...
}
auto buildCaseStmt(const CaseStmt &S) {
  ...
  auto switchLexScope = currLexScope;
  while (!switchLexScope->isSwitch())
    switchLexScope = switchLexScope->parentLexScope;
  // Then update case attributes and `lastCaseBlock` in switchLexScope
  ...
}
So I moved some variables to `LexicalScope` so that `buildCaseStmt()` only takes one parameter, `const CaseStmt &S`.
Later, I found that all I need to do is handle the `CaseStmt` in the `CompoundStmt`'s body, now that we won't handle scope crossing in this PR. At that point the earlier change that defined variables inside `LexicalScope` looks weird.
After thinking it over, I kept the change, because we will still need it in the future when we want to support a `CaseStmt` nested in another scope such as a `WhileStmt`.
I hope I explained it clearly. Let me know if you still have concerns about this change; I could revert it :)
Thanks for the explanation. It makes sense for the most part, though I still have a few concerns.
Before we settle on the final approach, one note is that we usually try to avoid visiting nodes multiple times. For example, `collectCompoundStmtHasCase` is called before another method that applies the same traversal. Since we can already walk scopes up to update information, why is this needed?
In most cases we do one visit and keep some information around (e.g. LexScope), and that's usually enough.
I'm afraid we have to visit the nodes twice; the problem is also related to scope.
Ideally, we should create a new scope for every `CompoundStmt`, and all the problems would be gone.
However, we don't yet support a `CaseStmt` inside the new scope, so I added a function `buildCaseCompoundStmt()`, which won't create a new scope.
If there is a `CaseStmt` inside the `CompoundStmt` subtree, we call `buildCaseCompoundStmt()`; if not, we call `buildStmt()` to create a new scope.
switch(a) {
  // Should we create a new scope for case body stmt? It depends.
  case 1: {
    {
      // There may be case here
      ...
    }
    Foo foo;
  }
}
switch(a) {
  // Call buildStmt() and create a new scope.
  case 1: {
    {
      int x = 0;
    }
    Foo foo;
  }
}
switch(a) {
  // Call buildCaseCompoundStmt(), don't create a new scope.
  case 1: {
    {
      case 2:
    }
    Foo foo;
  }
}
So the logic will be like
buildCaseCompoundStmt(CompoundStmt &S) {
  for (auto *body : S.body()) {
    if (body is compound) {
      if (hasCase)
        buildCaseCompoundStmt(body);
      else
        // We can't call buildCaseCompoundStmt() here, otherwise we will lose the new scope.
        buildStmt(body, /*useCurrentScope=*/false);
    } else {
      // If we want to calculate hasCase inside buildCaseCompoundStmt,
      // we need to build this node before the parent node could decide whether should create a new scope.
      buildStmt(body, /*useCurrentScope=*/true);
    }
  }
}
So we can't compute `hasCase` inside `buildCaseCompoundStmt`, and an extra collect function is needed.
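The two-pass idea described above can be sketched on a toy statement tree. Everything here (`ToyStmt`, `containsCase`, `countNewScopes`) is a made-up stand-in for illustration and does not use the clang AST or CIRGen APIs; it only shows why the "has a case label" answer must be known before deciding whether a nested block gets its own scope.

```cpp
#include <cassert>
#include <vector>

// Toy statement tree; a stand-in for the clang AST, not its real API.
struct ToyStmt {
  enum Kind { Compound, Case, Other };
  Kind kind;
  std::vector<ToyStmt> children;
};

// Pre-pass: report whether a case label appears anywhere in the subtree.
bool containsCase(const ToyStmt &stmt) {
  if (stmt.kind == ToyStmt::Case)
    return true;
  for (const ToyStmt &child : stmt.children)
    if (containsCase(child))
      return true;
  return false;
}

// Build pass: count how many nested blocks would get a fresh scope.
// Blocks that contain a case label reuse the current scope instead.
int countNewScopes(const ToyStmt &compound) {
  int scopes = 0;
  for (const ToyStmt &child : compound.children) {
    if (child.kind != ToyStmt::Compound)
      continue;                        // emitted in the current scope
    if (containsCase(child))
      scopes += countNewScopes(child); // no new scope, keep descending
    else
      ++scopes;                        // ordinary nested block
  }
  return scopes;
}

// Usage check mirroring the second and third switch examples above.
void demoCountNewScopes() {
  ToyStmt other{ToyStmt::Other, {}};
  ToyStmt label{ToyStmt::Case, {}};
  ToyStmt blockWithCase{ToyStmt::Compound, {label}};
  ToyStmt plainBlock{ToyStmt::Compound, {other}};
  ToyStmt bodyWithNestedCase{ToyStmt::Compound, {blockWithCase, other}};
  ToyStmt bodyWithoutCase{ToyStmt::Compound, {plainBlock, other}};
  assert(countNewScopes(bodyWithNestedCase) == 0);
  assert(countNewScopes(bodyWithoutCase) == 1);
}
```

Note that `containsCase` re-walks each subtree before the build pass visits it, which is exactly the repeated traversal the review is questioning.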
> I'm afraid we have to visit the nodes twice, the problem is also related to scope.
As general feedback, that much fine-grained tracking plus multiple visitors complicates the design and is usually an indication that we're trying to do more work than necessary and/or that the CIR design needs more improvements. You might have picked a complicated feature to work on in this PR, but bear with me because eventually we are going to converge on something. Also, apologies for pushing harder to make this more comprehensive than initially needed.
> Ideally, we should create a new scope for every CompoundStmt, all the problems will be gone. However we haven't supported CaseStmt inside the new scope.
Somehow I missed that in previous reviews. Given the mentioned complexity, it's possible that we might have to change `cir.switch`/`cir.case` a bit to better support scopes as written. We can then later add extra passes that will clean up unnecessary stuff.
As you pointed out (thanks for the nice examples), they are very complex to handle. In the end, the approach of the current PR tries to detect simple cases involving nested compound statements, but it is just papering over all the inter-scope jumps that are necessary (e.g. https://godbolt.org/z/E65PE4vG1). I don't think we are ready to mess with compound statements just yet; we need to build on Oleg's work here. To make the CIRGen flow natural, we'll need to create new scopes and add synthetic forms of `cir.case` that we can patch up later while flattening the CFG. Not much different from case statements in loops like we discussed before.
My suggestion is to take a step back here:
- Don't touch compound statements in this PR (unless you are adding new NYI asserts, but no additional visitors).
- Tackle only cases that don't have extra compound statements, like in Assertion failure on switch statement with non-block substatement #520 and Assertion failure on switch statement with code outside a case #521.
Once Oleg lands his work on gotos, I'm happy to help you continue the compound work and improve CIR to make it happen. How about that?
Hi. I took another look at the patch, and there is too much refactoring going on at the same time as the functionality change, so I'm not very confident about letting this land right now. I'd like to first approve a PR that only does the NFC refactoring, and later a PR that changes the minimum code needed to fix only the problem in question. You could create a new PR for the refactoring and update this one once it lands; the PR approach is your call.
Sure, I created the refactoring PR #552. Thanks for the comments~
clang/test/CIR/CodeGen/switch.cpp
Outdated
@@ -275,14 +276,124 @@ void sw12(int a) {
// CHECK-NEXT: cir.break
// CHECK-NEXT: }

void fallthrough(int x) {
void sw13(int a) {
These testcases are great. Can you add a couple more variations where extra nested switch shows up? Just wanna make sure the lexical scope here is doing the right thing.
Force-pushed from c9f7e9a to 14b2183
Force-pushed from 14b2183 to b7e1c76
Comments addressed.
Make logic cleaner and more extensible. Separate collecting `SwitchStmt` information and building op logic into different functions. Add more UT to cover nested switch, which also worked before this pr. This pr is split from #528.
Let me know once this is ready for review again!
Force-pushed from 9ddab69 to ef970b7
Hi, this PR is ready for review, thanks~
Thanks for the update, much easier for me to understand the approach now that the refactoring part is gone. More comments inline.
clang/lib/CIR/CodeGen/CIRGenStmt.cpp
Outdated
@@ -976,7 +1000,8 @@ mlir::LogicalResult CIRGenFunction::buildSwitchBody(
builder.setInsertionPointToEnd(lastCaseBlock);
res = buildStmt(c, /*useCurrentScope=*/!isa<CompoundStmt>(c));
} else {
llvm_unreachable("statement doesn't belong to any case region, NYI");
checkCaseNoneStmt(*c);
We should just call `buildStmt()` instead, or rewrite more of the logic so that CIRGen just falls out naturally. As I mentioned in previous reviews, this dispatches another visitor just for the sake of grabbing information that could be handled in our regular CIRGen visiting path.
Looking at the `checkCaseNoneStmt` impl, specifically `Stmt::CaseStmtClass`/`DefaultStmtClass`: you should add a `buildCaseStmt` and a `buildDefaultStmt` and call them from `buildSimpleStmt`. That code should already be part of CIRGen emission, not something that does side checking. If you need this info, you can walk LexScopes up to find whether we are in a switch or not.
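The "walk LexScopes up" suggestion can be sketched as follows. `ToyScope` and `findEnclosingSwitch` are hypothetical names used only for illustration; the real `LexicalScope` in CIRGen carries much more state, and this is not its API.

```cpp
#include <cassert>

// Toy stand-in for a lexical scope; not the real LexicalScope class.
struct ToyScope {
  bool isSwitch;
  const ToyScope *parent;
};

// Walk outward from `scope` and return the nearest enclosing switch scope,
// or nullptr when the statement is not inside any switch.
const ToyScope *findEnclosingSwitch(const ToyScope *scope) {
  for (; scope != nullptr; scope = scope->parent)
    if (scope->isSwitch)
      return scope;
  return nullptr;
}

// Usage check: function scope -> switch scope -> loop scope.
void demoFindEnclosingSwitch() {
  ToyScope fn{false, nullptr};
  ToyScope sw{true, &fn};
  ToyScope loop{false, &sw};
  assert(findEnclosingSwitch(&loop) == &sw);
  assert(findEnclosingSwitch(&fn) == nullptr);
}
```

With a lookup like this, a case or default statement can find its owning switch during the normal visit, so no separate pre-pass is needed just to answer "are we in a switch?".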
If we call `buildStmt()`, then for a `ReturnStmt` we need to get `currLexScope->RetBlocks`. We expect the block to belong to the case region, but a `CaseNoneStmt` has no region.
I'll keep working to find a solution. I pushed a `wip` commit in case you want to have a look at the draft.
Force-pushed from ef970b7 to 5223c3c
@wenpen still working on this? I don't usually look at draft PRs; I'm just trying to make sure whether there's something I should be looking at here.
@bcardosolopes Yes, I've just been a little busy recently. I will update the PR and request a review from you in the coming days, thanks~
Force-pushed from 5223c3c to 9ae5d1f
clang/lib/CIR/CodeGen/CIRGenStmt.cpp
Outdated
// TODO: Rewrite the logic to handle ReturnStmt inside SwitchStmt, then
// clean up the code below.
if (currLexScope->IsInsideCaseNoneStmt)
return mlir::success();
I found many code samples that fail due to an incorrect terminator in a block, e.g.
switch(a) {
case 0:
  break;
  int x = 1;
}
switch(a) {
case 0:
  return 0;
  return 1;
  int x = 1;
}
for (;;) {
  break;
  int x = 1;
}
This looks like another large piece of work, so I just skip `ReturnStmt` here.
Right, can you file a new issue and list these?
I'm opposed to `return mlir::success();` because it will just silently skip something we don't know how to handle. I'd rather these things fail or crash, so that it's clear that they aren't implemented. What happens when you remove this return?
clang/lib/CIR/CodeGen/CIRGenStmt.cpp
Outdated
@@ -328,6 +328,14 @@ mlir::LogicalResult CIRGenFunction::buildLabelStmt(const clang::LabelStmt &S) {
// IsEHa: not implemented.
assert(!(getContext().getLangOpts().EHAsynch && S.isSideEntry()));

// TODO: After support case stmt crossing scopes, we should build LabelStmt
Any `TODO` in CIRGen should be `TODO(cir)`.
@@ -2027,6 +2031,8 @@ class CIRGenFunction : public CIRGenTypeCache {
// Scope entry block tracking
mlir::Block *getEntryBlock() { return EntryBlock; }

bool IsInsideCaseNoneStmt = false;
I don't think we need this, reasons below.
// and clean LexicalScope::IsInsideCaseNoneStmt.
for (auto *lexScope = currLexScope; lexScope;
lexScope = lexScope->getParentScope()) {
assert(!lexScope->IsInsideCaseNoneStmt &&
What happens if you remove this code? Also, why doesn't it work to just walk the scope up until you find a switch?
Firstly, we won't need this assert anymore if we can keep the case none stmt somehow, as you suggested.
> What happens if you remove this code?
Removing this code won't cause incorrect behavior currently (as we don't support goto in that case yet), but I think it may produce strange error messages in the future.
switch (int x) {
foo:
  x = 1;
  break;
case 2:
  goto foo;
}
We need to avoid erasing the `CaseNoneStmt` containing the label `foo`.
> why doesn't it work to just walk the scope up until you find a switch?
Refer to the code below: we need to guarantee that the removed `Stmt` won't contain any `LabelStmt`, whether the `LabelStmt` is inside another nested switch or not.
switch(x) {
  switch(x) {
  case 1:
  foo:
    break;
  }
  break;
case 1:
  goto foo;
}
@@ -704,6 +717,22 @@ CIRGenFunction::buildSwitchCase(const SwitchCase &S, mlir::Type condType,
llvm_unreachable("expect case or default stmt");
}

mlir::LogicalResult CIRGenFunction::buildCaseNoneStmt(const Stmt *S) {
// Create orphan region to skip over the case none stmts.
Because you are creating an orphan region, this means that anything emitted inside `buildCaseNoneStmt` will never execute, right? The problem with an orphan region is that it won't get attached to anything, so it really adds no value (not even for unreachable code analysis). If so, it's better to split the current basic block A into two: B and C. A should jump to C, and you emit the code in B.
I didn't find a good place to hold the block of the `CaseNoneStmt`.
For example:
void f(int x) {
  switch(x) {
    break;
  }
}
There is no region inside the `SwitchOp`, so we have to put the `break` block outside the `SwitchOp`, which causes a verification failure: 'cir.break' op must be within a loop or switch.
Did I misunderstand something? Looking forward to your suggestions~
Force-pushed from f726860 to 7a61b3c
// TODO(cir): Rewrite the logic to handle ReturnStmt inside SwitchStmt, then
// clean up the code below.
if (currLexScope->IsInsideCaseNoneStmt)
return mlir::success();
> I'm opposed to return mlir::success(); because it will just silently skip something we don't know how to handle. I'd rather these things fail or crash, so that it's clear that they aren't implemented. What happens when you remove this return?
`buildReturnStmt()` assumes there is exactly one return block in a region and one region in a lexical scope; the only exception is a switch scope, which has multiple regions. The related code is
mlir::Block *getOrCreateRetBlock(CIRGenFunction &CGF, mlir::Location loc) {
  unsigned int regionIdx = 0;
  if (isSwitch())
    regionIdx = SwitchRegions.size() - 1;
  if (regionIdx >= RetBlocks.size())
    return createRetBlock(CGF, loc);
  return &*RetBlocks.back();
}
So if we remove the return here, the following code will crash: `regionIdx` will be -1, and we'll call `RetBlocks.back()` with an empty `RetBlocks`.
int f(int x) {
  switch(x) {
    return 0;
  }
  return 1;
}
By the way, I believe the current implementation of `getOrCreateRetBlock()` for switch is incorrect and should also be fixed after changing the definition of `SwitchOp`.
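One detail worth spelling out: in the helper as quoted, `regionIdx` is declared `unsigned int`, so with an empty `SwitchRegions` the subtraction wraps to the largest representable value rather than holding a signed -1. The sketch below isolates only that arithmetic using a plain `std::vector<int>`; the function names and the guarded variant are assumptions made for illustration, not code from the patch or from CIRGen.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Mirrors only the index arithmetic from the quoted helper; `regions`
// stands in for SwitchRegions and holds ints purely for illustration.
unsigned int lastRegionIndexUnchecked(const std::vector<int> &regions) {
  // size() is unsigned, so 0 - 1 wraps around instead of producing -1.
  return static_cast<unsigned int>(regions.size() - 1);
}

// A guarded variant (one possible shape of a fix, assumed here):
// report "no region yet" explicitly instead of computing an index.
bool tryLastRegionIndex(const std::vector<int> &regions, unsigned int &idx) {
  if (regions.empty())
    return false;
  idx = static_cast<unsigned int>(regions.size() - 1);
  return true;
}

// Usage check for the empty and non-empty cases.
void demoRegionIndex() {
  std::vector<int> none;
  std::vector<int> two{10, 20};
  unsigned int idx = 0;
  assert(lastRegionIndexUnchecked(none) ==
         std::numeric_limits<unsigned int>::max());
  assert(!tryLastRegionIndex(none, idx));
  assert(tryLastRegionIndex(two, idx) && idx == 1);
}
```

Whatever the exact failure mode in the real code, a switch with statements but no case region yet is a state the index computation does not account for, which fits the point about revisiting this helper.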
Support non-block `case` and statements that don't belong to any `case` region; fix #520 #521