[RFC] Intrinsic implementation #116
Open: fsfod wants to merge 23 commits into LuaJIT:v2.1 from fsfod:intrinsicpr
fsfod force-pushed the intrinsicpr branch 4 times, most recently from 2ed3d2e to ef61fa2 (December 23, 2015 17:46)
fsfod force-pushed the intrinsicpr branch 3 times, most recently from 563a59c to fccea5a (January 8, 2016 19:32)
fsfod force-pushed the intrinsicpr branch 3 times, most recently from 072540e to b5f446d (January 19, 2016 15:59)
fsfod force-pushed the intrinsicpr branch 2 times, most recently from d17c803 to 0312429 (January 27, 2016 04:56)
fsfod changed the title from "[WIP, RFC] Intrinsic implementation" to "[RFC] Intrinsic implementation" (Jan 27, 2016)
fsfod force-pushed the intrinsicpr branch 4 times, most recently from 99c5915 to 2102433 (February 5, 2016 07:25)
fsfod force-pushed the intrinsicpr branch 6 times, most recently from 8a198e6 to 97f8c97 (February 11, 2016 07:02)
This was referenced Feb 15, 2016
fsfod force-pushed the intrinsicpr branch 2 times, most recently from 5bc3850 to f57039e (March 28, 2016 15:36)
This was referenced Mar 9, 2017
…to 3 byte form if needed
…_tv by using a special cast flag(CCF_INTRINS_ARG) for intrinsic vector arguments
…abled DCE of intrinsics. Intrinsics are now assumed to have no side effects unless flagged with either memory side effects (S) or non-memory side effects (s)
…trinsics that have no side effects and are not forced indirect ModRM which could be a load or store
…rectly allocated an input register
…us ways. Fix wrappers truncating GCobj pointers in GC64 mode when loading them from the stack to store output registers into cdata. Fix the stack for intrinsics not being adjusted correctly in their interpreter wrapper when it uses the RID_DISPATCH register on GC64 because RSET_GPR does not contain it
…g RID_DISPATCH Make RID_DISPATCH an unallocatable register for intrinsics when building as GC64. Fix trying to evict RID_DISPATCH for LJ_GC64 builds on x64 for intrinsics and add some asserts that we never try to again. Don't set register hints for intrinsic input\output registers that are RID_DISPATCH. Restore RID_DISPATCH first when handling output registers and defer it till last for input registers of intrinsics in the JIT.
…ests causing random test failures
…ifferent builds of LuaJIT
…s to allow pointer based intrinsics to work in both 64 bit and 32 bit with the same definition.
This is an implementation of #39. For the time being it is limited to x86/x64 with the Windows and Linux ABIs.
There are some working toy examples of the current API in test/test.lua and test/intrinsic_spec.lua. JIT support for vector registers will be left as NYI because it needs various changes to the JIT's systems. If you are feeling brave, you can try out an experimental branch with JIT vector support.
An intrinsic can be either a single machine instruction that LuaJIT might have some specialized understanding of, or an opaque blob of one or more machine instructions that may be user supplied. Intrinsics behave like callable functions in the interpreter. Their argument order is the same as the order in which the input registers were declared in the register list.
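The argument ordering rule can be illustrated with a small sketch. Everything here is hypothetical (the `IntrinsicDesc` struct, the `REG_*` ids and `bind_args` are invented for illustration and are not the PR's actual data structures); it only shows that the i-th call argument goes to the i-th register in the declared input list, whatever the hardware numbering of that register is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register ids; the real ones live in LuaJIT's target headers. */
enum { REG_EAX, REG_ECX, REG_EDX, REG_EBX, REG_COUNT };

/* Hypothetical descriptor: inputs are stored in declaration order. */
typedef struct IntrinsicDesc {
  uint8_t in_regs[4];  /* Input registers, in declared order. */
  size_t  nin;         /* Number of input registers. */
} IntrinsicDesc;

/* Place the i-th call argument into the i-th declared input register. */
static void bind_args(const IntrinsicDesc *d, const int32_t *args,
                      size_t nargs, int32_t regfile[REG_COUNT])
{
  assert(nargs == d->nin);
  for (size_t i = 0; i < d->nin; i++)
    regfile[d->in_regs[i]] = args[i];
}
```

With an input list declared as ecx, eax, a call with arguments (10, 20) would therefore put 10 in ecx and 20 in eax.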
API
Declaring a vector opcode intrinsic with an immediate control byte
Declaring an opcode with both a prefix and an immediate byte that takes an address and has memory side effects
Running intrinsics in the interpreter
To allow calling intrinsics in the interpreter, an internal wrapper function is generated using part of the existing JIT engine. In theory the full JIT engine could be used by generating IR instead of using the raw emit system, but that would probably require many fixes in places where it is assumed that the code is being generated for a trace. The wrapper is called with two pointers: the first is the input context structure, which contains the values of the input registers (or, for vectors, pointers to the values), and the second is the Lua stack to write the results to. After the intrinsic's code in the wrapper has run, the wrapper writes output registers directly to the Lua stack if they are 32-bit signed numbers. Otherwise it copies the output registers into the cdata that was created on the Lua stack before the wrapper was called.
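The output half of that wrapper can be sketched in plain C. This is only a model under stated assumptions: `Slot` and `OutReg` are hypothetical stand-ins for a Lua stack slot and a captured output register, not LuaJIT's `TValue` or the PR's real structures, and the real wrapper is generated machine code rather than a C function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a Lua stack slot. */
typedef struct Slot {
  int     is_number;   /* 1: holds a number, 0: refers to cdata. */
  double  n;           /* Lua number value. */
  void   *cdata;       /* Pre-created cdata payload, if any. */
  size_t  cdata_size;  /* Size of the payload in bytes. */
} Slot;

/* Hypothetical captured output register. */
typedef struct OutReg {
  int      is_int32;   /* Output declared as a 32-bit signed integer. */
  uint64_t bits;       /* Raw register contents after the intrinsic ran. */
} OutReg;

/* Write each output either as a number or into its pre-created cdata. */
static void store_outputs(const OutReg *out, size_t nout, Slot *stack)
{
  for (size_t i = 0; i < nout; i++) {
    if (out[i].is_int32) {
      /* 32-bit signed results become plain Lua numbers on the stack. */
      stack[i].is_number = 1;
      stack[i].n = (double)(int32_t)(uint32_t)out[i].bits;
    } else {
      /* Everything else is copied into cdata allocated before the call. */
      assert(!stack[i].is_number && stack[i].cdata != NULL);
      assert(stack[i].cdata_size <= sizeof(out[i].bits));
      memcpy(stack[i].cdata, &out[i].bits, stack[i].cdata_size);
    }
  }
}
```

Creating the cdata before the call means the copy path performs no allocation after the intrinsic has run.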
Intrinsics in the JIT
Three new IR instructions are added for intrinsics:
op2(literal) holds the fixed register id that the output value gets written to.
ASMRET instructions for fixed registers have matching register hints set in the register hint prepass.
Design notes
The mcode API/system was generalized to allow more than one mcode area, since the existing JIT one is flushed when a full trace flush happens, while the generated wrappers need to stay around until the state is closed. In theory the FFI callback stubs could also live in this mcode area instead of living in fixed-size memory.
Currently, arguments passed to an intrinsic in the interpreter are handled using a data-driven approach in which they are converted and packed into a context, like the one the FFI system uses to call C functions. If the input values were treated as strongly typed (a direct ctype id match for cdata, or a built-in Lua type), the wrapper could load the values directly off the Lua stack and move them into registers, skipping the need to save and load input values through the context.
Currently the only way to express the memory side effects of an intrinsic is XBAR, when all that might be needed is a fake store of a particular size that the pointer aliasing system understands. See also the previous discussion of how s/l/mfence could work.
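A minimal sketch of the two-area idea, assuming simple bump allocation over ordinary byte buffers: `MCodeArea`, `JitState`, `area_reserve` and `flush_traces` are hypothetical names, and a real mcode area would be executable memory managed by LuaJIT's own allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump-allocated code area. */
typedef struct MCodeArea {
  uint8_t *base;  /* Start of the backing buffer. */
  size_t   size;  /* Total capacity in bytes. */
  size_t   used;  /* Bytes handed out so far. */
} MCodeArea;

typedef struct JitState {
  MCodeArea trace_area;    /* Reset on a full trace flush. */
  MCodeArea wrapper_area;  /* Kept until the state is closed. */
} JitState;

/* Reserve n bytes from an area, or return NULL if it is full. */
static void *area_reserve(MCodeArea *a, size_t n)
{
  assert(a->used <= a->size);
  if (n > a->size - a->used) return NULL;
  void *p = a->base + a->used;
  a->used += n;
  return p;
}

/* A full trace flush discards trace code only; wrappers survive. */
static void flush_traces(JitState *J)
{
  J->trace_area.used = 0;
}
```

Keeping the lifetimes in separate areas means a flush never has to know which wrappers are still reachable from Lua.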
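The data-driven path described above can be modelled roughly as follows. The `Value`, `ValKind` and `Context` types and the `pack_args` function are hypothetical simplifications, not the FFI's real call state; the point is that every argument is inspected, converted and stored before the wrapper reloads it into a register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified view of values that may be passed in. */
typedef enum { VAL_NUMBER, VAL_INT32_CDATA, VAL_POINTER_CDATA } ValKind;

typedef struct Value {
  ValKind kind;
  union { double n; int32_t i; void *p; } u;
} Value;

/* Hypothetical input context: one slot per general-purpose input. */
typedef struct Context {
  intptr_t gpr[4];
} Context;

/* Inspect each value's kind, convert it, and store it in the context. */
static int pack_args(Context *cx, const Value *args, size_t nargs)
{
  assert(cx != NULL);
  if (nargs > 4) return -1;
  for (size_t i = 0; i < nargs; i++) {
    switch (args[i].kind) {
    case VAL_NUMBER:        cx->gpr[i] = (intptr_t)(int32_t)args[i].u.n; break;
    case VAL_INT32_CDATA:   cx->gpr[i] = (intptr_t)args[i].u.i; break;
    case VAL_POINTER_CDATA: cx->gpr[i] = (intptr_t)args[i].u.p; break;
    default: return -1;
    }
  }
  return 0;
}
```

In the strongly typed variant, the per-argument `switch` would be resolved once when the wrapper is generated, so the wrapper could read each stack slot directly and the intermediate `Context` store would disappear.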
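The difference between a full barrier and a sized fake store can be sketched with a simple byte-range overlap test. This assumes concrete addresses are known, which the real alias analysis (working on IR references) does not have; `MemRange` and both functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte range touched by a load or store. */
typedef struct MemRange {
  uintptr_t addr;  /* Start address. */
  uint32_t  size;  /* Length in bytes. */
} MemRange;

/* A full barrier conservatively invalidates every earlier load. */
static int barrier_kills_load(MemRange load)
{
  (void)load;
  return 1;
}

/* A sized store only invalidates loads whose byte range overlaps it. */
static int store_kills_load(MemRange store, MemRange load)
{
  assert(store.size > 0 && load.size > 0);
  return store.addr < load.addr + load.size &&
         load.addr < store.addr + store.size;
}
```

Under this model a 16-byte fake store leaves loads from unrelated addresses available for forwarding, whereas the barrier discards all of them.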
Tasks