State of the MIR union #335

Open
DangerMouseB opened this issue May 24, 2023 · 16 comments

Comments

@DangerMouseB

Hi Vladimir,

I've been watching your repo for a while now (maybe 2 years) - the project is very interesting to me. However, I've held off investing much time in learning it, as the front page states that it is in the initial stages of development and is there just for the purposes of getting acquainted. There are ongoing commits away from master, which makes me curious. Could you maybe give an update and say a bit about how it's going and where you're hoping to take it?

Many thx,

David

@vnmakarov
Owner

All development is now done on the bbv branch. There is a lot of activity on this branch, but GitHub does not report activity on non-main branches.

My goal is to merge the bbv branch into trunk when it is ready. I am planning to do this at the end of summer or the beginning of fall.

New insns will be added. There will be new features I have found useful for writing better-performing JIT compilers: global register variables, simplified call/return MIR insns, and builtin expect. There will be support for basic block versioning and maybe even for a meta-tracing JIT compiler. These features are described in my blog post https://developers.redhat.com/articles/2022/02/16/code-specialization-mir-lightweight-jit-compiler#aliasing_in_a_jit_compiler.

The MIR generator will change a lot. It will no longer be constrained to using only conventional SSA.

Optimizations for values in memory will be added, especially for memory that is accessed in a stack-like manner (as interpreters usually do). New optimizations such as register pressure relief, coalescing, SSA insn combining, loop-invariant code motion, etc. will be added.

The register allocator will be significantly improved and will support live-range splitting (e.g. around loops).

All of this will be done to improve JITted code for interpreters as well as for typical C code (generated by c2m).

So the next release will be a major one. The ambitious goals currently presented on the project page will be changed. The project will be more oriented toward simplifying the implementation of JIT compilers for static and dynamic programming languages that are currently implemented by interpretation.

@dibyendumajumdar
Contributor

hi @vnmakarov

Making MIR more focused to its use as JIT backend for languages is a good direction IMO.
I hope some C extensions can be added to help this. You mentioned built-in expect.
How about the ability to have local functions without the regular function call overhead - more like local jump and return. This will be a boon for converting interpreter VM bytecode to JIT - as typically each byte code implementation can be then executed in this way rather than having to inline them causing code bloat. LuaJIT uses this approach - the VM bytecode functions are just local jumps.

@vnmakarov
Owner

I think it is done. There is already a lighter version of calls. The bbv branch has new MIR insns (JCALL and JRET) and corresponding C builtins.

Usually an interpreter is implemented through indirect goto like pc+=insn_len; goto *pc, where the first word of a VM insn contains the address of the labeled interpreter code responsible for executing that VM insn. With the new MIR insns it is easy to switch to JITted code (for one or more VM insns) by changing the address in the first VM insn word. The global context can be passed in global register variables (another MIR extension on the bbv branch), which are also used in the interpreter as local register variables (a GCC extension).
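A minimal sketch of that dispatch style, assuming GCC or Clang (labels as values is a GNU C extension). The three-opcode VM and the names vm_word, vm_run, and vm_demo are hypothetical and made up for illustration; the sketch uses plain labels only, not the new JCALL/JRET insns.

```c
#include <stddef.h>

/* Hypothetical three-instruction VM, for illustration only. */
enum { OP_PUSH, OP_ADD, OP_HALT };

/* One VM word: either a handler address or an immediate operand. */
typedef union {
  void *handler;
  long imm;
} vm_word;

/* With prog == NULL, hand out the handler table; otherwise run prog. */
static long vm_run (vm_word *prog, void ***table_out) {
  static void *table[] = {&&do_push, &&do_add, &&do_halt};
  long stack[16], *sp = stack;
  vm_word *pc = prog;

  if (prog == NULL) {
    *table_out = table;
    return 0;
  }
  goto *pc->handler; /* the first word of each insn is a code address */

do_push:
  *sp++ = pc[1].imm;
  pc += 2; /* pc += insn_len */
  goto *pc->handler;
do_add:
  sp--;
  sp[-1] += sp[0];
  pc += 1;
  goto *pc->handler;
do_halt:
  return sp[-1];
}

/* Assemble and run: push 40; push 2; add; halt. */
long vm_demo (void) {
  void **t;
  vm_word prog[6];

  vm_run (NULL, &t);
  prog[0].handler = t[OP_PUSH];
  prog[1].imm = 40;
  prog[2].handler = t[OP_PUSH];
  prog[3].imm = 2;
  prog[4].handler = t[OP_ADD];
  prog[5].handler = t[OP_HALT];
  return vm_run (prog, NULL); /* returns 42 */
}
```

Because every insn starts with a code address, redirecting an insn only requires overwriting that one word, which is the hook the paragraph above describes for switching to JITted code.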

Also, by default all calls in MIR are implemented through a calling thunk. This permits changing the generated code on the fly. When that is not necessary, on the bbv branch you can get and use the generated MIR function directly by using a new function (_MIR_get_thunk_addr; the name of this function is not final and may change in the release).

I use these extensions to implement an experimental JIT for Ruby https://github.com/vnmakarov/ruby.

@mingodad

When testing the bbv branch with sqlite3, I'm still getting the same problem reported in #286.

@dibyendumajumdar
Contributor

Thank you, are those features usable from C code since I don't directly generate MIR instructions?

@vnmakarov
Owner

are those features usable from C code since I don't directly generate MIR instructions?

Yes. Btw, I also used only C for the Ruby JIT I mentioned.

@DangerMouseB
Author

Thank you for clarifying that. I've been using QBE so far - I like the simplicity of the IR (no types, it fixes up non SSA) and the fact that the ABIs are already implemented - though my end goal is JIT compilation for a hopefully highish performance functional style language with multi-methods and generics based on Smalltalk and q and I'm wondering about putting in more time to get up to speed with MIR.

It sounds like it will be worth my while judging from your comments above. If I may, a few more questions come to mind.

  • Can I check that MIR handles function calls to C code (i.e. the ABI)? (and can be called from C).
  • Does MIR handle both cdecl and stdcall on windows? (I'm on macos aarch64 but windows feels important).
  • Is it possible to insert debug info?
  • will it be possible to have a zero cost style exception handling, (i.e. zero cost on the happy path) , e.g. like the itanium one that C++ uses?

One last question, can you point me to anything that explains how to link with other libraries into JIT compiled code?

@vnmakarov
Owner

  • Can I check that MIR handles function calls to C code (i.e. the ABI)? (and can be called from C).

Yes, c2m implements the target ABIs. ABI-compatible calls can be described using only MIR, but the complex cases (usually small structs/unions) are not documented yet. You can still figure them out by looking at the MIR code generated by c2m with the -S option. The trick is to use the right BLK and RBLK args.

  • Does MIR handle both cdecl and stdcall on windows? (I'm on macos aarch64 but windows feels important).

Yes, the Windows C ABI is implemented, but Windows is still not among the supported targets, mostly because setjmp/longjmp is not implemented by c2m. This is due to the lack of good, detailed documentation about setjmp/longjmp.

  • Is it possible to insert debug info?

No, although I have plans to generate debug info. It would be quite valuable for debugging code generated by c2m. But I cannot give a timeline for this feature.

  • will it be possible to have a zero cost style exception handling, (i.e. zero cost on the happy path) , e.g. like the itanium one that C++ uses?

Exception handling is out of scope for the MIR project. But I guess unwinding could be implemented outside of MIR-related code.

One last question, can you point me to anything that explains how to link with other libraries into JIT compiled code?

MIR_link takes a function, import_resolver, as an argument. This function should return the address of the external whose name is passed as its argument. That way you can refer to external data or call functions outside JITted code from the JITted code.
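A sketch of what such a resolver could look like, assuming it receives a symbol name and returns an address (or NULL when the name is unknown). The table, host_add, and resolve_import are hypothetical names for illustration. The MIR_link call shown in the comment is an assumption about the signature in mir.h and should be checked against the version in use.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical host function that JITted code might want to call. */
long host_add (long a, long b) { return a + b; }

/* Name -> address table of everything exposed to JITted code. */
struct import_entry {
  const char *name;
  void *addr;
};

static const struct import_entry imports[] = {
  {"host_add", (void *) host_add},
  {"strlen", (void *) strlen},
};

/* Return the address registered under name, or NULL if there is none. */
void *resolve_import (const char *name) {
  size_t i;

  for (i = 0; i < sizeof (imports) / sizeof (imports[0]); i++)
    if (strcmp (imports[i].name, name) == 0) return imports[i].addr;
  return NULL;
}

/* Assumed usage (not compiled here, since it needs mir.h and a context):
     MIR_link (ctx, MIR_set_gen_interface, resolve_import);  */
```

A fixed table keeps the set of symbols visible to JITted code explicit; a resolver could equally fall back to a dynamic lookup such as dlsym for names it does not list.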

@DangerMouseB
Author

Thank you - I've been hacking a C compiler (as well as my main language project) and it may be that attempting to emit MIR is a good place to start.

Do you have a list of other projects using MIR or is it early days yet?

Really looking for mutual support / conversation / community around using MIR as a backend rather than working on it (although you never know - but my key interest presently is seeing if being able to dynamically generate generic and templated functions works as well as I hope).

Can I confirm my understanding? - I was under the impression that Itanium style exception handling is implemented more in the ABI than as IR features and is thus presumably somewhat independent of the IR.

@vnmakarov
Owner

Do you have a list of other projects using MIR or is it early days yet?

As far as I know, it is used in a Lua dialect (https://github.com/dibyendumajumdar/ravi) and the Faust language (https://github.com/grame-cncm/faust). There are probably more that I am not aware of.

I also tried to make a MIR-based JIT (https://github.com/vnmakarov/ruby) but recently basically stopped this work because of the success of Shopify's YJIT (they have a whole team working on it, while I was able to spend very little of my time on my MIR-based Ruby JIT).

@dibyendumajumdar
Contributor

are those features usable from C code since I don't directly generate MIR instructions?

Yes. Btw, I also used only C for the Ruby JIT I mentioned.

Hi, are there any docs on the C extensions?

@dibyendumajumdar
Contributor

I also tried to make a MIR-based JIT (https://github.com/vnmakarov/ruby) but recently basically stopped this work because of the success of Shopify's YJIT (they have a whole team working on it, while I was able to spend very little of my time on my MIR-based Ruby JIT).

That's a pity - I think you could continue on Ruby not to compete with YJIT but more as a way to discover how to improve MIR.

@vnmakarov
Owner

Hi, are there any docs on the C extensions?

Not really. Sorry, I'll create a document when I have time, or definitely for the release. Global register vars look analogous to the GCC extension:

register void *ec asm("r13");

Overflow builtins can be found in the tests c-tests/new/{add,sub,mul}-overflow.c. Builtin expect is analogous to the GCC one.
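For reference, here is how the GCC forms of these builtins are typically combined. This is a sketch under the assumption that the c2m builtins follow the GCC names and semantics (__builtin_add_overflow and __builtin_expect), which the test files above should confirm. checked_add and sum_or_default are hypothetical helper names.

```c
#include <limits.h>

/* Store a + b in *res and return 1, or return 0 if the sum overflows int. */
int checked_add (int a, int b, int *res) {
  /* __builtin_add_overflow returns nonzero on overflow;
     __builtin_expect (..., 0) marks that branch as unlikely. */
  if (__builtin_expect (__builtin_add_overflow (a, b, res), 0)) return 0;
  return 1;
}

/* Return a + b, or fallback when the addition would overflow. */
int sum_or_default (int a, int b, int fallback) {
  int r;

  return checked_add (a, b, &r) ? r : fallback;
}
```

This pattern suits interpreters that keep small integers unboxed and fall back to a slower path on overflow, since the hint keeps the common non-overflowing path straight-line.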

@vnmakarov
Owner

That's a pity - I think you could continue on Ruby not to compete with YJIT but more as a way to discover how to improve MIR.

Actually, I spent half of my work time on the Ruby JIT. But for some reasons, it became impossible to continue.

@brynne8

brynne8 commented Jun 9, 2023

I think it is done. There is already a lighter version of calls. The bbv branch has new MIR insns (JCALL and JRET) and corresponding C builtins.

Usually an interpreter is implemented through indirect goto like pc+=insn_len; goto *pc, where the first word of a VM insn contains the address of the labeled interpreter code responsible for executing that VM insn. With the new MIR insns it is easy to switch to JITted code (for one or more VM insns) by changing the address in the first VM insn word. The global context can be passed in global register variables (another MIR extension on the bbv branch), which are also used in the interpreter as local register variables (a GCC extension).

Also, by default all calls in MIR are implemented through a calling thunk. This permits changing the generated code on the fly. When that is not necessary, on the bbv branch you can get and use the generated MIR function directly by using a new function (_MIR_get_thunk_addr; the name of this function is not final and may change in the release).

I use these extensions to implement an experimental JIT for Ruby https://github.com/vnmakarov/ruby.

Is it possible to implement labels as values through this?

@vnmakarov
Owner

Is it possible to implement labels as values through this?

Not really. Thunks are only for function calls.

With thunks, it is possible to generate a new version of the code for the same function, and this code will automatically be used by already generated code. For example, the first version of the code can be minimally optimized and, when the function is executed frequently, you can switch to more optimized code. In general, it is even possible to use another compiler (e.g. LLVM) to generate another version of the function code.
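The effect can be pictured with an ordinary function pointer standing in for the thunk. This is only an analogy under stated assumptions, not how MIR implements thunks: square_slot, HOT_THRESHOLD, and both versions of the function are hypothetical, and the "optimized" version is hand-written here rather than generated.

```c
#define HOT_THRESHOLD 3 /* hypothetical call count that triggers the switch */

typedef long (*fn_t) (long);

static long square_slow (long x);
static long square_fast (long x) { return x * x; }

/* Stand-in for the thunk: every caller goes through this slot. */
static fn_t square_slot = square_slow;
static int calls, active_version = 1;

/* First, minimally optimized version: repeated addition. */
static long square_slow (long x) {
  long n = x < 0 ? -x : x, r = 0, i;

  for (i = 0; i < n; i++) r += n;
  if (++calls >= HOT_THRESHOLD) {
    square_slot = square_fast; /* retarget the slot; callers are untouched */
    active_version = 2;
  }
  return r;
}

/* What already generated code calls; it never needs to be patched. */
long square (long x) { return square_slot (x); }

/* Report which version the slot currently points at (1 or 2). */
int square_version (void) { return active_version; }
```

Callers keep calling square, and after the third call they transparently reach the faster version. The cost is one extra indirection per call, which is the overhead that taking the function address directly avoids when replacement is not needed.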

Implementing labels as values requires considerable changes in the MIR generator, as any indirect goto can potentially result in a jump to any BB. It is a moderate-size project, and I intend to implement it some day.
