Model RVV permutation instructions more granularly #776

Open
camel-cdr opened this issue Jan 16, 2024 · 0 comments

Disclaimer: I don't know anything about gem5 internals, so this might be misdirected.

Currently the RVV permutation operations are modeled as VectorMiscOp, which doesn't really reflect reality.

Specifically, vrgather.vv, vrgatherei16.vv, and vcompress.vm are performance outliers, which isn't currently reflected in the model.
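
For reference, here is a scalar sketch of what the two main outliers do per element (plain C with illustrative names, not gem5 or intrinsic code); it shows why vrgather.vv is an all-to-all shuffle across the whole register group, while vcompress.vm only ever moves data towards lower element positions:

```c
#include <stddef.h>
#include <stdint.h>

/* vrgather.vv (e8, unmasked, simplified): vd[i] = vs1[i] < VLMAX ? vs2[vs1[i]] : 0.
 * Every destination element may read any element of the source register
 * group, so the datapath needs an all-to-all crossbar across the group. */
void vrgather_vv_ref(uint8_t *vd, const uint8_t *vs2, const uint8_t *vs1,
                     size_t vl, size_t vlmax)
{
    for (size_t i = 0; i < vl; i++)
        vd[i] = vs1[i] < vlmax ? vs2[vs1[i]] : 0;
}

/* vcompress.vm (e8, simplified): pack the active (mask-set) elements of vs2
 * into the low end of vd. Each destination slot depends on the popcount of
 * all earlier mask bits, but data only moves towards lower indices. */
void vcompress_vm_ref(uint8_t *vd, const uint8_t *vs2,
                      const uint8_t *mask, size_t vl)
{
    size_t j = 0;
    for (size_t i = 0; i < vl; i++)
        if ((mask[i / 8] >> (i % 8)) & 1)
            vd[j++] = vs2[i];
}
```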

Here are a few measurements; to summarize, these are throughput measurements in cycles per instruction (lower is better):

vcompress.vm:

| core    | VLEN | e8m1 | e8m2 | e8m4 | e8m8  |
|---------|------|------|------|------|-------|
| c906    | 128  | 4    | 10   | 32   | 136   |
| c908    | 128  | 4    | 10   | 32   | 139.4 |
| c920    | 128  | 0.5  | 2.4  | 5.4  | 20.0  |
| bobcat* | 256  | 32   | 64   | 132  | 260   |
| x280*   | 512  | 65   | 129  | 257  | 513   |

vrgather.vv:

| core    | VLEN | e8m1 | e8m2 | e8m4 | e8m8  |
|---------|------|------|------|------|-------|
| c906    | 128  | 4    | 16   | 64   | 256   |
| c908    | 128  | 4    | 16   | 64.9 | 261.1 |
| c920    | 128  | 0.5  | 2.4  | 8.0  | 32.0  |
| bobcat* | 256  | 68   | 132  | 260  | 516   |
| x280*   | 512  | 65   | 129  | 257  | 513   |

*bobcat: note that it was explicitly stated that they didn't optimize the permutation instructions.

*x280: the numbers are from llvm-mca, but I was told they match reality. There is also supposed to be a vrgather fast path for vl<=256. I think they didn't have much incentive to make this fast, as the x280 mostly targets AI.

I think the C920 results are the most representative of what to expect from future desktop CPUs.
Personally, I suspect we'll see vrgather.vv perform well for any SEW at LMUL=1, and then, even in the best case, its per-element cost grow with each doubling of LMUL (so roughly quadratic cost overall), as an all-to-all mapping is quite expensive to scale.

vcompress.vm should scale better than vrgather.vv, since the work can be subdivided; I think we might see a range of implementations, from scaling like vrgather.vv down to almost linear growth with LMUL.
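
To make the expected scaling concrete, here is a tiny, purely illustrative cost sketch (plain C, not gem5 internals): it models the per-instruction cost as base · LMUL^alpha, where alpha ≈ 2 matches the C920-like vrgather.vv numbers above and values between 1 and 2 cover the range described for vcompress.vm. base and alpha are made-up tuning parameters a model would have to calibrate per core.

```c
#include <math.h>
#include <stdio.h>

/* Illustrative only: cost(LMUL) = base * LMUL^alpha.
 *   alpha = 2.0 -> quadratic growth, matching the c920 vrgather.vv row
 *                  above (0.5 * 8^2 = 32 cycles at e8m8),
 *   alpha = 1.0 -> the near-linear best case for vcompress.vm.
 * base and alpha are hypothetical tuning knobs, not gem5 parameters. */
double permute_cycles(double base, int lmul, double alpha)
{
    return base * pow((double)lmul, alpha);
}

int main(void)
{
    for (int lmul = 1; lmul <= 8; lmul *= 2)
        printf("LMUL=%d  vrgather.vv ~%5.1f cycles  vcompress.vm ~%5.1f cycles\n",
               lmul,
               permute_cycles(0.5, lmul, 2.0),   /* all-to-all scaling           */
               permute_cycles(0.5, lmul, 1.7));  /* between linear and quadratic */
    return 0;
}
```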
