Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CIR][CIRGen][LLVMLowering] Vtable support for simple multiple inhertance without thunk #569

Merged
merged 1 commit into from May 21, 2024

Conversation

Laity000
Copy link
Contributor

@Laity000 Laity000 commented Apr 27, 2024

This PR adds Vtable support for C++ multiple inheritance without thunk. This change contains the CIR codegen and lowering work:

  1. VTableAttr should allow adding multiple ArrayAttr for multi-inheritance.
  2. VTableAddrPointOpLowering has been fixed for the multi-vtable during the MLIR lowering phase.

Example:

    class Mother {
      virtual void MotherFoo() {}
      virtual void MotherFoo2() {}
    }
    
    class Father {
      virtual void FatherFoo() {}
    }
    
    class Child : public Mother, public Father {
      void MotherFoo() override {}
    }
    cir.global linkonce_odr @_ZTV5Child = #cir.vtable<
    {#cir.const_array<[
      #cir.ptr<null> :  #!cir.ptr<!u8i>,
      #cir.global_view<@_ZTI5Child> : !cir.ptr<!u8i>,
      #cir.global_view<@_ZN5Child9MotherFooEv> : !cir.ptr<!u8i>,
      #cir.global_view<@_ZN6Mother10MotherFoo2Ev> : !cir.ptr<!u8i>]> : !cir.array<!cir.ptr<!u8i> x 4>,
     #cir.const_array<[
      #cir.ptr<-8> : !cir.ptr<!u8i>,
      #cir.global_view<@_ZTI5Child> : !cir.ptr<!u8i>,
      #cir.global_view<@_ZN6Father9FatherFooEv> : !cir.ptr<!u8i>]
    > : !cir.array<!cir.ptr<!u8i> x 3>}> : !ty_anon_struct3

Copy link
Collaborator

@Lancern Lancern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need some further discussions on the typing of vtables. CIRGen currently follows the original clang CodeGen scheme which represents vtables as arrays of pointers. Although most elements in a vtable are indeed pointers, elements such as offset-to-top are not. Representing these elements as pointers requires ugly hack as shown in this commit (i.e. #cir.ptr<-8>).

let description = [{
A pointer attribute is a literal attribute that represents an integral
value of a pointer type.
}];
let builders = [
AttrBuilderWithInferredContext<(ins "Type":$type, "uint64_t":$value), [{
AttrBuilderWithInferredContext<(ins "Type":$type, "int64_t":$value), [{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change can be strange. A "negative" pointer value does not make any sense in general.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand ConstPtrAttr to be equivalent to llvm.inttoptr. #cir.ptr<-8> can be translated as ptr inttoptr (i64 -8 to ptr). Or should a new attribute be added to represent relative offset?

Copy link
Collaborator

@Lancern Lancern Apr 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand ConstPtrAttr to be equivalent to llvm.inttoptr

Yes, they are the same on the code they eventually generate. But we should focus more on the abstraction they provide, especially when designing an IR that contains high-level semantics. #cir.const_ptr actually models reinterpret_cast in C++ that casts integers to pointers. It just does not make sense to cast a negative integer to a pointer, thus the argument to #cir.const_ptr should be an unsigned integer.

As you pointed out in the comment, the offset-to-top element in a vtable is an integer of type ptrdiff_t under Itanium C++ ABI. Thus a better design would be using #cir.int<-8> : !T to represent the offset-to-top element in a vtable. where !T is the CIR integer type for ptrdiff_t. But this is impossible now because we use a pointer array to represent a vtable. This is exactly what we need to further discuss with Bruno and other CIR folks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just does not make sense to cast a negative integer to a pointer, thus the argument to #cir.const_ptr should be an unsigned integer.

Intuitivelly it doesn't make much sense, but in practice we find code like that all around, simple example: https://godbolt.org/z/aMdxfvrf3. This is how CIR handles it: https://godbolt.org/z/55fx96z1Y. Note that if we #cir.ptr was a bit more flexible here we wouldn't need the unary op. There are many ways to represent this, for example, it's possible to keep encoding those as unsigned and make them signed when materializing as part of some use. However, I rather we use a more obvious representation.

In general, though not related to vtables, This could be used for things like tagging pointers with some sort of non-null sentinel, etc. The way to thing about this is that the constant initialization isn't related to the pointee. We should be able to write #cir.ptr<-1 : i64> : !cir.ptr<whatever>

As you pointed out in the comment, the offset-to-top element in a vtable is an integer of type ptrdiff_t under Itanium C++ ABI. Thus a better design would be using #cir.int<-8> : !T to represent the offset-to-top element in a vtable. where !T is the CIR integer type for ptrdiff_t. But this is impossible now because we use a pointer array to represent a vtable. This is exactly what we need to further discuss with Bruno and other CIR folks.

Since the offset-to-top is always a constant, there are a couple of different approaches to consider:

  1. Teach #cir.ptr to accept a MLIR integer instead of a plain uint64_t. We should do that regardless of this PR.
  2. Change the vtable representation from constant array of pointers to a constant struct of pointers. I need to refresh my itanium-abi-fu but if we need more entries to keep offsets, we might a way for the operation to track them from each other?
  3. Raise the vtable operation a bit more such that we keep offset info as a MLIR integer in some other param outside vtable_data. This allows us to write some nice query like attr.getOffsetFromTop or somethimg. I was expecting the multiple inheritance to force us into improving the design, this is probably one way to go about this.

I'm more inclined towards (3), but would love to see improvements in (1) at some point in case someone wants to tackle it. For (3), we need to change that in another PR, and then use it here once that lands.

Any extra thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intuitivelly it doesn't make much sense, but in practice we find code like that all around, simple example: https://godbolt.org/z/aMdxfvrf3. This is how CIR handles it: https://godbolt.org/z/55fx96z1Y. Note that if we #cir.ptr was a bit more flexible here we wouldn't need the unary op.

OK I can buy this.

I'm more inclined towards (3), ... Any extra thoughts?

The problem of (3) is that a single vtable may contain multiple offset-to-top elements in the presence of multiple inheritance. For example:

struct Left { virtual ~Left(); };
struct Right { virtual ~Right(); };
struct Derived : Left, Right {};
@vtable for Derived = linkonce_odr dso_local unnamed_addr constant { [4 x ptr], [4 x ptr] } {
  [4 x ptr] [
    ptr null,                      ;; Offset-to-top from Left to Derived
    ptr @typeinfo for Derived,
    ptr @Derived::~Derived(),
    ptr @Derived::~Derived()
  ],
  [4 x ptr] [
    ptr inttoptr (i64 -8 to ptr),  ;; Offset-to-top from Right to Derived
    ptr @typeinfo for Derived,
    ptr @non-virtual thunk to Derived::~Derived(),
    ptr @non-virtual thunk to Derived::~Derived()
  ]
}, comdat, align 8

Thus if offset-to-top values are kept separate from vtable_data we may need a way to track which base class each of them corresponds to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thus if offset-to-top values are kept separate from vtable_data we may need a way to track which base class each of them corresponds to.

As long as offset-to-top is a list of the same size as vtable_data, all should be good, shouldn't it?

But more that that, we could track multiple lanes just as easy with some other approaches. The vtable right now is still a llvm friendly layout, but it could be anything that makes more sense / easier to use - as long as it could keep lowering to the same LLVM code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this PR attempting to modify #cir.ptr to accept mlir::integer instead of plain uint64_t.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Laity000, but as previously mentioned, changing #cir.ptr should be another PR on it own, can you please address it outside of this one?

Copy link
Contributor Author

@Laity000 Laity000 May 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, though not related to vtables, This could be used for things like tagging pointers with some sort of non-null sentinel, etc. The way to thing about this is that the constant initialization isn't related to the pointee. We should be able to write #cir.ptr<-1 : i64> : !cir.ptr<whatever>

Sorry for the delay of a few days. I've created a new PR #598 . Is it possible to directly use the int64_t type as the value of the #cir.ptr? If using MLIR integer types like #cir.ptr<-1 : i64>, I understand we should consider the address size based on XX ABI to determine the type, include #cir.ptr<null>. Can we just not worry about the address size when we create it, and instead get it from the DataLayout when we use the attribute? @bcardosolopes

Comment on lines 3012 to 2977
auto llvmSrcVal = adaptor.getOperands().front();
auto llvmDstTy =
getTypeConverter()->convertType(op.getBaseAddr().getType());
rewriter.replaceOpWithNewOp<mlir::LLVM::BitcastOp>(op, llvmDstTy,
llvmSrcVal);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cir.base_class_addr is not simply a bitcast. It is expected to perform some pointer arithmetic to convert an object pointer to a base subobject pointer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the tip. So, like you mentioned, BaseClassAddrOp needs to be more abstract in its syntax. For instance, we should also include the baseoffset values for casting to base classes in this attribute. And instead of generating them directly in CIRGen, they should all be handled together during the lowering phase. That's what I've got in mind. Any other situations?
I removed this in the PR, and perhaps it can be implemented separately later on.

Copy link
Collaborator

@Lancern Lancern May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any other situations?

At least 2 approaches exist that provides necessary information to cir.base_class_addr and allows proper lowering of it:

  1. Attach cir.base_class_addr with a new attribute, e.g. #cir.base_class_addr_info, that contains the base-to-derived offsets. During lowering prepare, the offsets are extracted from the attribute and used for lowering the operation.
  2. Attach cir.base_class_addr with the AST node corresponds to the derived-to-base cast expression. During lowering prepare, execute any necessary AST queries to find the offset for lowering the operation.

An example of the first approach can be found at the implementation of the cir.dyn_cast operation. The operation carries a #cir.dyn_cast_info attribute that contains all necessary ABI-specific information for lowering the operation during lowering prepare. While the 2nd approach allows a clearer assembly syntax.

As a side note, cir.base_class_addr should also handle derived-to-virtual-base casts whose offsets cannot be known statically. Thus we need at least 2 offsets to do the conversion job: 1) the address point in the vtable that corresponds to the vbase offset entry; 2) a fixed offset from the derived object (or vbase subobject) to the target base object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lancern very good explanation!

@Laity000 I suggest you start simple: only handle static offsets for now and assert when that's not the case. Please let's build this incrementally. Both (1) and (2) seems fine to me, but I'm inclined for (1) given that we already raise a bunch of vtable information and we have a use for that in mind in the future.

clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp Outdated Show resolved Hide resolved
@Laity000
Copy link
Contributor Author

Laity000 commented Apr 27, 2024

I think we need some further discussions on the typing of vtables. CIRGen currently follows the original clang CodeGen scheme which represents vtables as arrays of pointers. Although most elements in a vtable are indeed pointers, elements such as offset-to-top are not. Representing these elements as pointers requires ugly hack as shown in this commit (i.e. #cir.ptr<-8>).

Thank you for your review.

I've reviewed the Itanium C++ ABI document:

Offsets are of type ptrdiff_t unless otherwise stated. We call it the address point of the virtual table. The virtual table may therefore contain components at either positive or negative offsets from its address point.

So, I'm not sure whether to use ptrdiff_t, ptr, or a new attr for the type of vtable?

This might require a further discussion. @Lancern @bcardosolopes

Copy link
Member

@bcardosolopes bcardosolopes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see your first PR here, thanks! Comments inline

let description = [{
A pointer attribute is a literal attribute that represents an integral
value of a pointer type.
}];
let builders = [
AttrBuilderWithInferredContext<(ins "Type":$type, "uint64_t":$value), [{
AttrBuilderWithInferredContext<(ins "Type":$type, "int64_t":$value), [{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just does not make sense to cast a negative integer to a pointer, thus the argument to #cir.const_ptr should be an unsigned integer.

Intuitivelly it doesn't make much sense, but in practice we find code like that all around, simple example: https://godbolt.org/z/aMdxfvrf3. This is how CIR handles it: https://godbolt.org/z/55fx96z1Y. Note that if we #cir.ptr was a bit more flexible here we wouldn't need the unary op. There are many ways to represent this, for example, it's possible to keep encoding those as unsigned and make them signed when materializing as part of some use. However, I rather we use a more obvious representation.

In general, though not related to vtables, This could be used for things like tagging pointers with some sort of non-null sentinel, etc. The way to thing about this is that the constant initialization isn't related to the pointee. We should be able to write #cir.ptr<-1 : i64> : !cir.ptr<whatever>

As you pointed out in the comment, the offset-to-top element in a vtable is an integer of type ptrdiff_t under Itanium C++ ABI. Thus a better design would be using #cir.int<-8> : !T to represent the offset-to-top element in a vtable. where !T is the CIR integer type for ptrdiff_t. But this is impossible now because we use a pointer array to represent a vtable. This is exactly what we need to further discuss with Bruno and other CIR folks.

Since the offset-to-top is always a constant, there are a couple of different approaches to consider:

  1. Teach #cir.ptr to accept a MLIR integer instead of a plain uint64_t. We should do that regardless of this PR.
  2. Change the vtable representation from constant array of pointers to a constant struct of pointers. I need to refresh my itanium-abi-fu but if we need more entries to keep offsets, we might a way for the operation to track them from each other?
  3. Raise the vtable operation a bit more such that we keep offset info as a MLIR integer in some other param outside vtable_data. This allows us to write some nice query like attr.getOffsetFromTop or somethimg. I was expecting the multiple inheritance to force us into improving the design, this is probably one way to go about this.

I'm more inclined towards (3), but would love to see improvements in (1) at some point in case someone wants to tackle it. For (3), we need to change that in another PR, and then use it here once that lands.

Any extra thoughts?

@bcardosolopes
Copy link
Member

@lanza rebased over the weekend, sorry for the churn. Can you rebase this PR again?

@Laity000 Laity000 force-pushed the multi-vtable branch 2 times, most recently from 1717333 to 9a5d0e5 Compare April 30, 2024 08:50
Laity000 added a commit to Laity000/clangir that referenced this pull request May 2, 2024
…tance without thunk

This PR adds Vtable support for C++ multiple inheritance without thunk:
1. The `VTableAttr` supports multiple `ArrayAttr`, and the `VTableAddrPointOpLowering` supports multi-vtable lowering
2. To support the `offset-to-top` members of multi-vtable, the current implementation is to make `ConstPtrAttr`accepts a MLIR integer instead of a plain uint64_t. More further discussions by [this comment](llvm#569 (comment))

Example:
```mlir
cir.global linkonce_odr @_ZTV5Child = #cir.vtable<
{#cir.const_array<[
  #cir.ptr<null> :  #!cir.ptr<!u8i>,
  #cir.global_view<@_ZTI5Child> : !cir.ptr<!u8i>,
  #cir.global_view<@_ZN5Child9MotherFooEv> : !cir.ptr<!u8i>,
  #cir.global_view<@_ZN6Mother10MotherFoo2Ev> : !cir.ptr<!u8i>]> : !cir.array<!cir.ptr<!u8i> x 4>,
 #cir.const_array<[
  #cir.ptr<-8 : i64> : !cir.ptr<!u8i>,
  #cir.global_view<@_ZTI5Child> : !cir.ptr<!u8i>,
  #cir.global_view<@_ZN6Father9FatherFooEv> : !cir.ptr<!u8i>]
> : !cir.array<!cir.ptr<!u8i> x 3>}> : !ty_anon_struct3
```
Laity000 added a commit to Laity000/clangir that referenced this pull request May 2, 2024
…tance without thunk

This PR adds Vtable support for C++ multiple inheritance without thunk:
1. The `VTableAttr` supports multiple `ArrayAttr`, and the `VTableAddrPointOpLowering` supports multi-vtable lowering
2. To support the `offset-to-top` members of multi-vtable, the current implementation is to make `ConstPtrAttr`accepts a MLIR integer instead of a plain uint64_t. More further discussions by [this comment](llvm#569 (comment))

Example:
```mlir
cir.global linkonce_odr @_ZTV5Child = #cir.vtable<
{#cir.const_array<[
  #cir.ptr<null> :  #!cir.ptr<!u8i>,
  #cir.global_view<@_ZTI5Child> : !cir.ptr<!u8i>,
  #cir.global_view<@_ZN5Child9MotherFooEv> : !cir.ptr<!u8i>,
  #cir.global_view<@_ZN6Mother10MotherFoo2Ev> : !cir.ptr<!u8i>]> : !cir.array<!cir.ptr<!u8i> x 4>,
 #cir.const_array<[
  #cir.ptr<-8 : i64> : !cir.ptr<!u8i>,
  #cir.global_view<@_ZTI5Child> : !cir.ptr<!u8i>,
  #cir.global_view<@_ZN6Father9FatherFooEv> : !cir.ptr<!u8i>]
> : !cir.array<!cir.ptr<!u8i> x 3>}> : !ty_anon_struct3
```
@@ -248,18 +248,26 @@ def FPAttr : CIR_Attr<"FP", "fp", [TypedAttrInterface]> {

def ConstPtrAttr : CIR_Attr<"ConstPtr", "ptr", [TypedAttrInterface]> {
let summary = "Holds a constant pointer value";
let parameters = (ins AttributeSelfTypeParameter<"">:$type, "uint64_t":$value);
let parameters = (ins AttributeSelfTypeParameter<"">:$type, "mlir::IntegerAttr":$value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it's the same change I suggested in the other PR, please incorporate this bit over there instead.

delete c;

return 0;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this build all the way to LLVM? Would be nice to have LLVM tests right here as well (see other test for examples) and already include the LLVM lowering support needed (if any is necessary), so we can make sure that we are matching what's expected out from C++ source.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Updated.

Copy link
Member

@bcardosolopes bcardosolopes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks almost good! Thanks

…tance without thunk

This PR adds Vtable support for C++ multiple inheritance without thunk:
1. The `VTableAttr` supports multiple `ArrayAttr`.
2. The `VTableAddrPointOpLowering` supports multi-vtable lowering

Example:
```mlir
cir.global linkonce_odr @_ZTV5Child = #cir.vtable<
{#cir.const_array<[
  #cir.ptr<null> :  #!cir.ptr<!u8i>,
  #cir.global_view<@_ZTI5Child> : !cir.ptr<!u8i>,
  #cir.global_view<@_ZN5Child9MotherFooEv> : !cir.ptr<!u8i>,
  #cir.global_view<@_ZN6Mother10MotherFoo2Ev> : !cir.ptr<!u8i>]> : !cir.array<!cir.ptr<!u8i> x 4>,
 #cir.const_array<[
  #cir.ptr<-8 : i64> : !cir.ptr<!u8i>,
  #cir.global_view<@_ZTI5Child> : !cir.ptr<!u8i>,
  #cir.global_view<@_ZN6Father9FatherFooEv> : !cir.ptr<!u8i>]
> : !cir.array<!cir.ptr<!u8i> x 3>}> : !ty_anon_struct3
```
Copy link
Member

@bcardosolopes bcardosolopes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for applying all the reviews. LGTM (but see minor comment)

// CIR: }

// LLVM-DAG: define linkonce_odr void @_ZN6MotherC2Ev(ptr %0)
// LLVM-DAG: store ptr getelementptr inbounds ({ [4 x ptr] }, ptr @_ZTV6Mother, i32 0, i32 0, i32 2), ptr %{{[0-9]+}}, align 8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like in OG codegen we get store ptr getelementptr inbounds inrange(-16, 16) ({ [4 x ptr] }, ptr @_ZTV6Mother, i32 0, i32 0, i32 2), ptr %3, align 8. After this lands, can you work on adding inrange(-16, 16) support in another PR?

@bcardosolopes bcardosolopes merged commit 632d7e0 into llvm:main May 21, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants