Interface to LLVM::DataLayout:: methods #360

Open
jfaure opened this issue Aug 31, 2021 · 14 comments
Labels
api-coverage Issues relating to our coverage of the LLVM C API enhancement

Comments

@jfaure
Contributor

jfaure commented Aug 31, 2021

It is realistically necessary to be able to use getPointerSize and getTypeAllocSize (at least) from https://llvm.org/doxygen/classllvm_1_1DataLayout.html

The hack of computing sizeof as a gep to the end of a null pointer has the disadvantage that the result cannot influence code generation.

getTypeAllocSize in particular requires us to pre-convert llvm-hs types to C++ so this may be tricky.

@andrew-wja
Member

This is actually very straightforward to do, because LLVM exposes this through the C API: https://llvm.org/doxygen/group__LLVMCTarget.html

However, I don't quite understand what you mean by "not being able to influence code generation". Do you have an example?

@jfaure
Contributor Author

jfaure commented Sep 1, 2021

Allocating a tagged union: you want to allocate enough space for the biggest member (this cannot reliably be achieved by inspecting the llvm-hs types, which don't know the pointer size and alignment details)

@andrew-wja
Member

Ah, I see. I was only thinking about datatypes that LLVM provides, but a tagged union is indeed tricky. You would need to iterate over the union types and statically determine which is largest. Using gep only works if you can define the type using the builtin types already provided by LLVM!

Sounds like the use case is maximumBy (comparing getTypeAllocSize), which we can add to the tests to make sure that that functionality continues to work. I'll see if I can add getPointerSize and getTypeAllocSize to the FFI!
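As a rough illustration of the maximumBy (comparing getTypeAllocSize) idea, here is a minimal sketch that uses only base. Everything in it is an assumption made for illustration: Ty is a toy stand-in for the llvm-hs type AST, Target is a hypothetical target description, and allocSize is a simplified natural-alignment layout model rather than the real DataLayout query. It only shows why the largest union member can change with the pointer size.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)
import Data.Word (Word64)

-- | Toy stand-in for LLVM types (hypothetical, NOT the llvm-hs AST).
data Ty = I8 | I32 | I64 | Ptr | Struct [Ty]
  deriving (Eq, Show)

-- | Hypothetical target description: only the pointer size in bytes.
newtype Target = Target { pointerBytes :: Word64 }

-- | Round n up to the next multiple of the alignment a.
roundUp :: Word64 -> Word64 -> Word64
roundUp a n = ((n + a - 1) `div` a) * a

-- | Simplified model: every scalar is aligned to its own size.
alignOf :: Target -> Ty -> Word64
alignOf _ I8 = 1
alignOf _ I32 = 4
alignOf _ I64 = 8
alignOf t Ptr = pointerBytes t
alignOf t (Struct fs) = maximum (1 : map (alignOf t) fs)

-- | Simplified stand-in for getTypeAllocSize: size including padding.
allocSize :: Target -> Ty -> Word64
allocSize _ I8 = 1
allocSize _ I32 = 4
allocSize _ I64 = 8
allocSize t Ptr = pointerBytes t
allocSize t s@(Struct fs) = roundUp (alignOf t s) (foldl place 0 fs)
  where
    -- Pad up to the field's alignment, then append the field.
    place off f = roundUp (alignOf t f) off + allocSize t f

-- | Pick the union member that needs the most storage.
-- Partial: maximumBy fails on an empty member list.
largestMember :: Target -> [Ty] -> Ty
largestMember t = maximumBy (comparing (allocSize t))
```

In this model, given the members Struct [I64, I64] and Struct [Ptr, Ptr, I8], largestMember picks the pointer-carrying struct for Target 8 and the two-i64 struct for Target 4, which is the kind of target-dependent answer that inspecting the type AST alone cannot give.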

@andrew-wja andrew-wja self-assigned this Sep 1, 2021
@andrew-wja andrew-wja added api-coverage Issues relating to our coverage of the LLVM C API enhancement labels Sep 1, 2021
@jfaure
Contributor Author

jfaure commented Sep 3, 2021

A couple other use-cases:

  1. Checking if a struct field containing a pointer is large enough for some other operand.
  2. Checking if a previous alloca or struct field is reusable (to avoid making LLVM figure it out with llvm.lifetime.(start|end) intrinsics)
  3. Checking if a struct is small enough to be returned by value (the System V ABI allows using two 64-bit registers for this, and sadly it is up to the front-end to figure this out pre-LLVM), or whether it needs to be written through an sret pointer
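Once alloc sizes are available in the front-end, the checks listed above reduce to plain comparisons. The sketch below assumes the sizes have already been obtained (for example from getTypeAllocSize); the names fitsIn, ReturnConvention and returnConvention are hypothetical. The 16-byte threshold is only the "two 64-bit registers" rule of thumb from the list, since the full System V classification also depends on field classes and alignment.

```haskell
import Data.Word (Word64)

-- | How a front-end might return a struct (hypothetical names).
data ReturnConvention = InRegisters | ViaSret
  deriving (Eq, Show)

-- | Use-cases 1 and 2: can a slot (struct field or earlier alloca)
-- of slotSize bytes hold an operand of operandSize bytes?
fitsIn :: Word64 -> Word64 -> Bool
fitsIn slotSize operandSize = operandSize <= slotSize

-- | Use-case 3, size check only: structs of at most two 64-bit
-- registers (16 bytes) are candidates for return by value.
-- Assumption: the real ABI rules are stricter than this test.
returnConvention :: Word64 -> ReturnConvention
returnConvention structSize
  | structSize <= 2 * 8 = InRegisters
  | otherwise = ViaSret
```

The point is that these decisions happen while the front-end is still choosing which IR to emit, so an IR-level sizeof constant arrives too late to drive them.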

@luc-tielen
Contributor

I also just ran into the issue of needing DataLayout functionality.
Here's the C code I'm trying to port:

#define DESIRED_NUM_KEYS \
    (((BLOCK_SIZE > sizeof(struct node_data)) \
        ? BLOCK_SIZE - sizeof(struct node_data) \
        : 0) / sizeof(value))

#define NUM_KEYS (DESIRED_NUM_KEYS > 3 ? DESIRED_NUM_KEYS : 3)

typedef struct node
{
    node_type type;
    struct node_data meta;
    value values[NUM_KEYS];
} node;

Basically: I need "sizeof" so that I can use that result during codegen to determine the length of an array in some other type.
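For reference, the macro arithmetic itself ports directly once the two sizes are known as ordinary numbers. This is a minimal sketch under that assumption: numKeys is a hypothetical helper, and its arguments are expected to come from a DataLayout query, which is exactly the part that is missing.

```haskell
import Data.Word (Word64)

-- | Port of the DESIRED_NUM_KEYS / NUM_KEYS macros.
-- Arguments: block size, sizeof(struct node_data), sizeof(value).
-- Assumption: valueSize is non-zero (div would throw otherwise).
numKeys :: Word64 -> Word64 -> Word64 -> Word64
numKeys blockSize nodeDataSize valueSize = max 3 desired
  where
    -- Guard the subtraction: Word64 would wrap around below zero.
    spare
      | blockSize > nodeDataSize = blockSize - nodeDataSize
      | otherwise = 0
    desired = spare `div` valueSize
```

The result is a plain Word64, so it could be used as an array length when building the node type, provided the sizes are available while the types are being constructed.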

@jfaure
Contributor Author

jfaure commented Sep 17, 2021

I added this as a sort of hack some time back https://hackage.haskell.org/package/llvm-hs-pure-9.0.0/docs/LLVM-AST-Constant.html#v:sizeof

@luc-tielen
Contributor

@jfaure I don't think that works? ArrayType requires a Word64 for size: https://hackage.haskell.org/package/llvm-hs-pure-9.0.0/docs/LLVM-AST-Type.html#t:Type

@jfaure
Contributor Author

jfaure commented Sep 17, 2021

That's no problem: it wraps the type in some LLVM instructions. The size won't be available to you as a value the way it is with DataLayout, but you can use it in the emitted LLVM, where it will hopefully be constant-folded.

@luc-tielen
Contributor

@jfaure How then? It just doesn't typecheck..? Also there's no function to go from Constant to Word64.. and the other functions in that module are partial and would error out if I tried converting that way.
I would prefer defining my types all using the typedef function which uses the LLVM.AST.Type I mentioned earlier. For this I think the only way is with DataLayout..

BTW: here's what I tried:

experiment :: ModuleBuilder ()
experiment = do
  s <- typedef "struct_t" $ Just $ StructureType False [i8, i64]
  let x = Constant.sizeof s
  let a = ArrayType x i32 -- Couldn't match expected type 'Word64' with 'Constant'
  -- ...

@andrew-wja I'm not familiar with the codebase but if you give me some high level pointers on how to best approach this, I can try giving it a shot..

@andrew-wja
Member

andrew-wja commented Sep 17, 2021

@luc-tielen I understand what you want to do, but I don't think it's possible with llvm-hs right now, so you're correct to post under this issue.

LLVM wants you to pass an integer to the ArrayType constructor, even in C++: https://llvm.org/doxygen/classllvm_1_1ArrayType.html#adf411edc4f135b570ab218079474ce77

So you really do need to ask libLLVM through an IO operation what the size of the laid-out struct type is.

Right now, it isn't possible using llvm-hs to construct any type that depends on IR-level values. It might be possible to work around this in your code generation. For example, you can use alloca to allocate an array with an IR-level Operand element count. In this case that's not very appealing, though.

@luc-tielen
Contributor

luc-tielen commented Sep 17, 2021

This got me a little further for my specific case (it looks like some datalayout functionality is exposed in internals?):

experiment :: ModuleBuilderT IO ()
experiment = do
  s <- typedef "struct_t" $ Just $ StructureType False [i8, i64]
  size <- liftIO $ do
    s' <- Context.withContext $ flip runEncodeAST $ encodeM s
    let dl = defaultDataLayout LittleEndian
    DL.withFFIDataLayout dl $ flip DL.getTypeAllocSize s'
  liftIO $ print ("size =", size) -- print is an IO action, so it is lifted into ModuleBuilderT IO

This snippet works if you use i8 or any of the other builtin types instead of s in the encodeM function, but with my custom struct I get EncodeException "reference to undefined type: Name \"struct_t\""

If I could get an up-to-date DataLayout inside the ModuleBuilder monad (like for example a currentDatalayout helper function), my problem would be fixed?

@andrew-wja
Member

andrew-wja commented Oct 6, 2021

@luc-tielen mixing and matching between the high-level llvm-hs-pure and low-level llvm-hs FFI interface directly in this way is uncharted territory, but it makes sense that builtin types should always be visible.

I think what is happening is that the explicit runEncodeAST is blowing away the local encode state, so the type definition is no longer visible. If you look at what happens in the EncodeM instance for Type, specifically for NamedTypeReference we end up calling lookupNamedType. However, if you look at the definition of runEncodeAST it creates a new, empty encode state.

runEncodeAST is designed to be the top-level entry point to the encoding, but your code snippet is calling it inside a module builder context. I think if you add a runEncodeAST' which takes an existing encode state as a parameter and extends it, rather than running the AST encoding in a fresh encode state, that should solve your problem.

@luc-tielen
Contributor

@andrew-wja I tried my hand at it today, but a fix is non-obvious (at least to me).
The IR / Module builder monad keeps the definitions hidden internally. You can extract them if you make a variant of runEncodeAST that basically runs in ModuleBuilderT IO a, but then I tried reusing some other functionality and got stuck with a cycle in my imports.

@luc-tielen
Contributor

Did another attempt today:

experiment :: ModuleBuilderT IO ()
experiment = do
  let n = "struct_t"
      ty = StructureType False [i8, i64]
  s <- typedef n $ Just ty
  size <- liftIO $ do
    withHostTargetMachine PIC JITDefault None $ \tm -> do
      dl <- getTargetMachineDataLayout tm

      Context.withContext $ flip runEncodeAST $ do
        createType n ty
        s' <- encodeM s
        liftIO $ DL.withFFIDataLayout dl $ flip DL.getTypeAllocSize s'
  liftIO $ print ("size = ", size) -- print is an IO action, so it is lifted into ModuleBuilderT IO

createType :: Name -> Type -> EncodeAST ()
createType n ty = do
  (t', n') <- createNamedType n
  defineType n n' t'
  setNamedType t' ty

This prints out size = 16 for me. Not obvious at all, but it works. Now I need to refactor it and figure out a way to nicely integrate it in my compiler 😅.


3 participants