Add AVX/AVX2 support #43

melsman opened this issue Apr 27, 2020 · 0 comments

melsman commented Apr 27, 2020

Add support for packed vector instructions for floating-point and integer operations.

  • Design and implement a generic signature that supports explicit operations (e.g., mul, add) on packed values, for instance four 64-bit floating-point values held in a 256-bit packed vector register.

  • Design and implement structures that match the above signature (e.g., for packed 64-bit floats and for packed 64-bit integers). Use the MLKit prim feature to expose the intrinsics.

  • Add the intrinsics to the MLKit intermediate language Compiler/Lambda/LambdaExp so that the operations in the structures can target them, and support these operations all the way down to the Compiler/Backend/X64/CodeGenX64 and Compiler/Backend/X64/CodeGenUtilX64 modules (e.g., by extending the operations in Compiler/Backend/PrimName.sml).

  • Implement operations for loading from and storing to memory. BlockF64 values can be used to represent and allocate that memory.
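As a language-neutral sketch of the operations listed above, the C below models one 256-bit register holding four 64-bit floats as a plain struct, with lane-wise add and mul plus load/store against a flat array of doubles standing in for a BlockF64. The names f64x4, f64x4_add, f64x4_mul, f64x4_load and f64x4_store are hypothetical and are not MLKit identifiers. The scalar loops only model the semantics; an AVX backend would emit a single instruction per operation (vaddpd, vmulpd, vmovupd), and a packed 64-bit integer structure would follow the same shape (e.g., vpaddq under AVX2).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one 256-bit ymm register: four 64-bit float lanes. */
typedef struct { double lane[4]; } f64x4;

/* Lane-wise add; an AVX backend would emit one vaddpd for this. */
static f64x4 f64x4_add(f64x4 a, f64x4 b) {
    f64x4 r;
    for (int k = 0; k < 4; k++) r.lane[k] = a.lane[k] + b.lane[k];
    return r;
}

/* Lane-wise mul; an AVX backend would emit one vmulpd for this. */
static f64x4 f64x4_mul(f64x4 a, f64x4 b) {
    f64x4 r;
    for (int k = 0; k < 4; k++) r.lane[k] = a.lane[k] * b.lane[k];
    return r;
}

/* Load block[i..i+3] from a flat buffer of len doubles (a stand-in for a
   BlockF64); an AVX backend would emit one unaligned vmovupd load. */
static f64x4 f64x4_load(const double *block, size_t len, size_t i) {
    assert(i + 4 <= len);  /* the whole 256-bit access must be in bounds */
    f64x4 r;
    for (int k = 0; k < 4; k++) r.lane[k] = block[i + k];
    return r;
}

/* Store the four lanes into block[i..i+3]. */
static void f64x4_store(double *block, size_t len, size_t i, f64x4 v) {
    assert(i + 4 <= len);
    for (int k = 0; k < 4; k++) block[i + k] = v.lane[k];
}
```

For example, loading xs = {1, 2, 3, 4} and ys = {10, 20, 30, 40}, then storing f64x4_add(f64x4_mul(x, y), y) back into xs, leaves xs = {20, 60, 120, 200}.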

Discussion.

An important aspect is that the implementation must include boxing operations that implicitly box vector values into memory; the optimiser can then eliminate box-unbox and unbox-box compositions. Boxing is needed because, in general, it is impossible to ensure that a value is not passed to a generic function, stored in a data structure, or captured in a closure, and the compiler assumes that every value can be represented in one 64-bit word (perhaps tagged with the least significant bit set to 1 if the GC should not traverse the value).
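The elimination of box/unbox compositions can be sketched as a bottom-up peephole rewrite. The C below uses a hypothetical toy expression type (Exp with constructors VAR, BOX, UNBOX, VADD), not MLKit's LambdaExp, and the function name elim_boxing is likewise illustrative. It assumes boxed vectors are immutable, so replacing box(unbox e) by e (sharing the existing heap cell) is safe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy IR: just enough structure to show the rewrite. */
typedef enum { VAR, BOX, UNBOX, VADD } Op;

typedef struct Exp {
    Op op;
    struct Exp *a, *b;   /* operands; b is only used by VADD */
    const char *name;    /* only used by VAR */
} Exp;

/* Remove box/unbox pairs bottom-up:
     unbox(box e) => e   the vector never has to leave its register
     box(unbox e) => e   reuse the existing heap cell instead of copying
                         (safe only because boxed values are immutable) */
static Exp *elim_boxing(Exp *e) {
    if (e == NULL) return NULL;
    e->a = elim_boxing(e->a);
    e->b = elim_boxing(e->b);
    if ((e->op == UNBOX && e->a->op == BOX) ||
        (e->op == BOX && e->a->op == UNBOX))
        return e->a->a;
    return e;
}
```

For instance, vadd(unbox(box x), y) rewrites to vadd(x, y), while box(unbox(box x)) reduces to box x: a box whose contents really do escape is kept.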

I foresee some issues with implementing register allocation for the ymm registers. We must also make sure that the optimiser (i.e., module Compiler/Lambda/OptLambda) does not pass wide 256-bit values to generic functions. Moreover, such values can be neither passed as arguments to functions nor stored in closures; they are meant solely for operations within basic blocks. Ideally, these restrictions would be enforced in Compiler/Lambda/LambdaStatSem.
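The restriction on wide values amounts to a simple syntactic check over the intermediate language. The C below is a minimal sketch over a hypothetical toy IR (types TY_WORD and TY_F64X4, node kinds E_VAR, E_APP, E_CLOSURE, E_BOX, E_VADD, and the checker wide_ok are all illustrative, not part of LambdaStatSem). It assumes that a 256-bit value may feed vector primitives and boxing, but must never appear as a function argument or as a captured closure variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types: a value either fits in one 64-bit word or is a
   256-bit packed vector that must stay in a ymm register. */
typedef enum { TY_WORD, TY_F64X4 } Ty;

/* E_APP: args are call arguments; E_CLOSURE: args are captured variables. */
typedef enum { E_VAR, E_APP, E_CLOSURE, E_BOX, E_VADD } Kind;

typedef struct Exp {
    Kind kind;
    Ty ty;               /* type of this expression's result */
    struct Exp **args;   /* operands, arguments or captured variables */
    size_t nargs;
} Exp;

/* Returns true if no 256-bit value escapes through a call or a closure. */
static bool wide_ok(const Exp *e) {
    for (size_t i = 0; i < e->nargs; i++) {
        const Exp *a = e->args[i];
        if ((e->kind == E_APP || e->kind == E_CLOSURE) && a->ty == TY_F64X4)
            return false;  /* a wide value would leave its basic block */
        if (!wide_ok(a)) return false;
    }
    return true;
}
```

Under this check, passing vadd(v, v) to a function is rejected, whereas passing box(vadd(v, v)) is accepted, which is exactly why the boxing operations discussed above are needed.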

An interesting application would be to use these operations to implement some of the operations in the Real64Array and Real64Vector structures efficiently.
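As an illustration of such an application, summing an array can process four elements per step in a vector accumulator, then combine the lanes and finish the leftover elements with a scalar tail. The C below is a sketch of that pattern only; the function name sum_f64 is hypothetical and the inner loop over k models what one vaddpd would do. Note that this reassociates the additions, so the result may differ in the last bits from a strict left-to-right fold, which matters if an operation's specified evaluation order must be preserved.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n doubles four lanes at a time; acc models one ymm accumulator. */
static double sum_f64(const double *xs, size_t n) {
    double acc[4] = {0.0, 0.0, 0.0, 0.0};
    size_t i = 0;
    for (; i + 4 <= n; i += 4)          /* main loop: whole 256-bit chunks */
        for (int k = 0; k < 4; k++)
            acc[k] += xs[i + k];
    double s = (acc[0] + acc[1]) + (acc[2] + acc[3]);  /* combine lanes */
    for (; i < n; i++)                  /* scalar tail: the n % 4 leftovers */
        s += xs[i];
    return s;
}
```

For xs = {1, 2, 3, 4, 5, 6, 7}, the vector loop handles the first four elements, the lanes combine to 10, and the tail adds 5, 6 and 7 for a total of 28.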

