An implementation of the ARM NEON intrinsics in plain C, intended to show exactly what each "neon function" does.
Replace
#if __ARM_NEON
#include <arm_neon.h>
#endif // __ARM_NEON
with
#include "neon_sim.h"
- Truly cross-platform
- 100% C/C++, with no SSE dependency
- "Debuggable inside the intrinsics": you can step into each intrinsic with a debugger
- Closer to native NEON usage
- Supports initializer lists:
int8x8_t a = { 1, -2, 3, 4, 5, 6, 7, 8 };
- Supports compile-time vector type checking; for example, the compiler flags the following mismatch (a uint8_t array passed to vld1_u16):
uint8_t b[4] = { 5, 6, 7, 8 }; uint16x4_t v_b = vld1_u16(b);
- Supports same-length, different-type vector conversion (sometimes requires -flax-vector-conversions)
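To make "100% in C" and "debuggable inside the intrinsics" concrete, here is a minimal sketch of what a plain-C simulation of one intrinsic could look like. The names `sim_int8x8_t` and `sim_vadd_s8` are hypothetical and chosen for illustration; they are not taken from neon_sim.h, whose actual definitions may differ.

```c
#include <stdint.h>

/* Hypothetical stand-in for a simulated 8-lane vector of int8_t.
   The real neon_sim.h type may be defined differently. */
typedef struct {
    int8_t lane[8];
} sim_int8x8_t;

/* Lane-wise add in the spirit of vadd_s8: each lane wraps modulo 2^8. */
static sim_int8x8_t sim_vadd_s8(sim_int8x8_t a, sim_int8x8_t b)
{
    sim_int8x8_t r;
    for (int i = 0; i < 8; i++) {
        /* Add as unsigned so the wrap-around is well defined, then
           convert back to signed (two's complement assumed). */
        r.lane[i] = (int8_t)(uint8_t)((uint8_t)a.lane[i] + (uint8_t)b.lane[i]);
    }
    return r;
}
```

Because the body is an ordinary loop, a breakpoint inside it shows every lane as it is computed. With this struct layout, a vector can be initialized as `sim_int8x8_t a = { { 1, -2, 3, 4, 5, 6, 7, 8 } };`.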
- The correctness of the simulation is not guaranteed.
- However, its accuracy can be improved by adding examples and verifying them continuously.
- Inline assembly is not supported, e.g. asm volatile("...").
- An out-of-range lane index is not detected at compile time: vgetq_lane_f32(sum_vec, 4) compiles without a diagnostic even though the valid range is [0, 3].
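Since the lane index is not checked at compile time, a runtime guard is one way to catch the mistake while debugging. The sketch below is an assumption about how such a guard could be written; `sim_float32x4_t`, `sim_lane_in_range`, and `sim_vgetq_lane_f32` are hypothetical names, not part of neon_sim.h.

```c
#include <assert.h>

/* Hypothetical stand-in for a simulated 4-lane float vector. */
typedef struct {
    float lane[4];
} sim_float32x4_t;

/* Returns nonzero when lane is a valid index for a vector of lane_count lanes. */
static int sim_lane_in_range(int lane, int lane_count)
{
    return lane >= 0 && lane < lane_count;
}

/* Lane read in the spirit of vgetq_lane_f32, with a runtime bounds check.
   The assert fires in debug builds if lane is outside [0, 3]. */
static float sim_vgetq_lane_f32(sim_float32x4_t v, int lane)
{
    assert(sim_lane_in_range(lane, 4));
    return v.lane[lane];
}
```

With this guard, a call such as `sim_vgetq_lane_f32(sum_vec, 4)` aborts in a debug build instead of silently reading past the last lane. The check happens at run time only, so it does not replace the compile-time diagnostic a native toolchain gives.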