Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intrinsic test tool to compare neon intrinsics with C #1170

Merged
merged 4 commits into from Sep 9, 2021

Conversation

JamieCunliffe
Copy link
Contributor

As mentioned in #1125 this is the start of a tool to compare the intrinsics between C and Rust.

Some notable things:

  • All NaN's are treated as equal. I do have a branch that prints them as hex if you would prefer though.
  • Intrinsics with a lane or n argument are skipped due to constraints on the values.

I did notice that some of the intrinsics were marked as done in the tracking spreadsheet but didn't actually exist.

I have also noticed that a few vcvt intrinsics have differences, I haven't yet fully investigated that though.

@rust-highfive
Copy link

r? @Amanieu

(rust-highfive has picked a reviewer for you, use r? to override)

Copy link
Member

@Amanieu Amanieu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a lot of trouble getting this to run: it's hard to get an aarch64 clang toolchain set up properly on an x86_64 host. Could you add the intrinsics test to the CI scripts so we can run them in a consistent docker environment?

crates/intrinsic-test/src/main.rs Outdated Show resolved Hide resolved
@Amanieu
Copy link
Member

Amanieu commented May 25, 2021

Overall, this looks great! Do you think this could be extended in the future to support arm and possibly x86 as well?

@JamieCunliffe
Copy link
Contributor Author

CI - I'll take a look at adding it, I'll try to get something done next week for it.

arm I think this one should be fairly simple. I suspect it will just be a case of changing the target on the build commands and updating the use statement. The input list would need to be filtered to remove the intrinsics that are AArch64 only though.

x86 I haven't looked at those intrinsics in any detail but it looks like the types have different names so it would probably require a few changes to types.rs to emit the right type names (and I suspect the get lane function would have a different name). Also it looks like the type names don't contain as much information as they do on arm so you might just have to treat them as some number of 64 bit values, that shouldn't cause an issue other than the printing of values might not be as human readable. If you could determine the layout while loading the intrinsic list that would be fixed though.

Given that it's mostly just creating some values calling a function with those and then printing the return value, I expect it shouldn't be too far away to be honest.

@Amanieu
Copy link
Member

Amanieu commented May 28, 2021

In stdarch-verify we currently parse the HTML from the ARM website to get a list of all intrinsics and their arguments (crates/stdarch-verify/tests/arm.rs). Is there any difference between this and the data source you are using for your CSV?

We have some intrinsics that require that an argument is a compile-time constant, e.g. vcvtd_n_f64_u64 or vfma_lane_f32. Is this properly handled by your tool? These are a bit tricky since an out-of-range constant will cause a compile-time error in both rustc and clang.

@JamieCunliffe
Copy link
Contributor Author

My data source is https://docs.google.com/spreadsheets/d/1MqW1g8c7tlhdRWQixgdWvR4uJHNZzCYAf4V0oHjZkwA/edit#gid=0 it was easier to use that as the first column can be used to check if it's enabled in Rust.
If we do use the same data as stdarch-verify it would be nice to try and unify some of the code between verify and this test tool though.

As for the compile time constants, in the spreadsheet and documentation they are named n and lane so I was filtering those out as a starting point. I do have plans to add support for them though.

@Amanieu
Copy link
Member

Amanieu commented Jun 19, 2021

The checker seems to be failing on some intrinsics. Are these real bugs in our implementation?

Also some of the C intrinsics fail to run for some reason.

@JamieCunliffe
Copy link
Contributor Author

I haven't had time to investigate the failures properly, but the differences in values could be actual errors with the implementation, when looking at the assembly for some of those the fcvtz(u,s) doesn't appear in the Rust binary.

The failure to run the C is because it didn't find the intrinsic, I just remembered I probably need to set the crypto feature.

@Amanieu
Copy link
Member

Amanieu commented Jul 20, 2021

Ping @JamieCunliffe!

Any progress update on this? From what I can tell the main blockers are:

  • Failed to run C program for intrinsic test failures.
  • vcvt intrinsics producing the wrong results.
    • I can look into this, but it would be helpful if failure messages showed the inputs that caused the mismatch.

The implementation itself looks fine otherwise.

@JamieCunliffe
Copy link
Contributor Author

I haven't forgot about this.

I just pushed a change that should fix the Failed to run c program.

I plan to take a look at the conversion issue now.

@JamieCunliffe
Copy link
Contributor Author

It looks like the issue with the conversion is due to optimisations, when building the C and Rust with optimisations turned off they produce the same correct output.

One set of values that are causing an error are: f32::from_bits(0x7F8FFFFF),f32::from_bits(0x7F800000),f32::from_bits(0x7FD23456),f32::from_bits(0x7FC00000) -> (NaN, Inf, NaN, NaN).
Those values should be converted to (0, MAX, 0, 0).

With optimisations on for converting to u32 Rust produces:
(4096, 0, 304480, 85) if only attempting to convert those values.
(10, 10, 10, 10) when run in the CI when converting multiple. 10 was the last valid floating point value to have been converted.
C produces:
(0, 0, 0, 0)

If I use std::hint::black_box on the input to the intrinsic then we get the fcvtzu and the correct result.

As a side note f32::from_bits(0x7F800000) as u32 and f32::from_bits(0x7F8FFFFF) as u32 appear correct in release. So this does look like it only impacts simd_cast.

@Amanieu
Copy link
Member

Amanieu commented Jul 29, 2021

This seems to be a case of rust-lang/rust#10184. This was fixed for normal float to int casts but not for simd_cast.

However the root cause is that we shouldn't be using simd_cast at all: we should be calling the llvm.aarch64.neon.fcvtzs builtin instead of using a normal cast which is UB for out-of-range values.

This was actually fixed in Clang 12, which produces the correct result: https://godbolt.org/z/jG1x1Yvx5

@JamieCunliffe
Copy link
Contributor Author

I started a patch to use llvm.aarch64.neon.fcvtzs. However for armv7 it would still need to use simd_cast, unless I'm missing something in CGBuiltin, it doesn't appear that armv7 has an equivalent.

There is the llvm.fptoui.sat and llvm.fptosi.sat (https://llvm.org/docs/LangRef.html#llvm-fptoui-sat-intrinsic) that we could use for this. That intrinsic has the correct semantics, however the codegen isn't optimal for simd types at the minute. There is a follow up on https://reviews.llvm.org/D102353 to fix this for vector types.
Using this the tests don't currently pass due to the instruction limit for those tests.

My thoughts are to use those saturating intrinsics and until LLVM can be updated, add an exception in the instruction count limit to the tests for that intrinsic. Does that seem reasonable?

@Amanieu
Copy link
Member

Amanieu commented Aug 4, 2021

My thoughts are to use those saturating intrinsics and until LLVM can be updated, add an exception in the instruction count limit to the tests for that intrinsic. Does that seem reasonable?

Sounds good!

This tool generates rust and C programs that will call intrinsics and
print the results with multiple different inputs. It will build all
the programs and then run each of them and diff the output printing
any differences that are found.

It uses the tracking spreadsheet (with the % column renamed to
enabled) to determine which intrinsics to test.

It will filter out any intrinsics that have an argument with the name
"n" or "lane" as those have constraints on them as to what numbers can
be used.
Copy link
Contributor

@lu-zero lu-zero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems good, shall we land it now? :)

@Amanieu
Copy link
Member

Amanieu commented Sep 9, 2021

Oh! I missed the update since github doesn't give notifications for force-push. Yes, let's merge this!

@Amanieu Amanieu merged commit 1a7d0ce into rust-lang:master Sep 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants