Disclaimer: I don't know anything about gem5 internals, so this might be misdirected.
Currently the RVV permutation operations are modeled as `VectorMiscOp`, which doesn't really reflect reality.
Specifically, `vrgather.vv`, `vrgatherei16.vv`, and `vcompress.vm` are performance outliers, which isn't currently reflected in the model.
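For reference, here is a scalar sketch of what the two main offenders have to compute (semantics per the RVV 1.0 spec, shown for SEW=8; the function names and the byte-array mask layout are just for illustration):

```c
#include <stdint.h>
#include <stddef.h>

/* vrgather.vv vd, vs2, vs1: every destination element can read from any
 * position in the source register group, i.e. an all-to-all mapping. */
void vrgather_vv_e8(uint8_t *vd, const uint8_t *vs2, const uint8_t *vs1,
                    size_t vl, size_t vlmax)
{
    for (size_t i = 0; i < vl; i++)
        vd[i] = (vs1[i] < vlmax) ? vs2[vs1[i]] : 0;
}

/* vcompress.vm vd, vs2, vs1: pack the elements of vs2 whose mask bit is
 * set into the low elements of vd; each result position depends on all
 * earlier mask bits. */
void vcompress_vm_e8(uint8_t *vd, const uint8_t *vs2, const uint8_t *mask,
                     size_t vl)
{
    size_t j = 0;
    for (size_t i = 0; i < vl; i++)
        if ((mask[i / 8] >> (i % 8)) & 1)
            vd[j++] = vs2[i];
}
```

With LMUL > 1 the source of `vrgather.vv` spans a whole register group, so hardware has to be able to route any source element to any destination element across the group, which is why its cost tends to blow up with LMUL.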
Here are a few measurements; to summarize, these are some throughput numbers:
`vcompress.vm`:

|         | VLEN | e8m1 | e8m2 | e8m4 | e8m8  |
|---------|------|------|------|------|-------|
| c906    | 128  | 4    | 10   | 32   | 136   |
| c908    | 128  | 4    | 10   | 32   | 139.4 |
| c920    | 128  | 0.5  | 2.4  | 5.4  | 20.0  |
| bobcat* | 256  | 32   | 64   | 132  | 260   |
| x280*   | 512  | 65   | 129  | 257  | 513   |
`vrgather.vv`:

|         | VLEN | e8m1 | e8m2 | e8m4 | e8m8  |
|---------|------|------|------|------|-------|
| c906    | 128  | 4    | 16   | 64   | 256   |
| c908    | 128  | 4    | 16   | 64.9 | 261.1 |
| c920    | 128  | 0.5  | 2.4  | 8.0  | 32.0  |
| bobcat* | 256  | 68   | 132  | 260  | 516   |
| x280*   | 512  | 65   | 129  | 257  | 513   |
\*bobcat: note that it was explicitly stated that they didn't optimize the permutation instructions.
\*x280: the numbers are from llvm-mca, but I was told they match reality. There is also supposed to be a `vrgather` fast path for `vl<=256`. I think they didn't have much incentive to make this fast, as the x280 mostly targets AI.
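The exact methodology behind the hardware numbers above isn't shown here, but a throughput measurement of this kind can be sketched with the RVV C intrinsics roughly as follows. This is only an illustrative harness: it assumes a user-readable `rdcycle` counter, and the four independent dependency chains are there so the loop is bound by throughput rather than by the latency of a single instruction:

```c
/* compile with e.g. -O2 -march=rv64gcv */
#include <riscv_vector.h>
#include <stdint.h>
#include <stdio.h>

/* cycle CSR; user-mode access may need to be enabled by the kernel/firmware */
static inline uint64_t rdcycle(void)
{
    uint64_t c;
    __asm__ volatile("rdcycle %0" : "=r"(c));
    return c;
}

int main(void)
{
    size_t vl = __riscv_vsetvlmax_e8m1();
    vuint8m1_t idx = __riscv_vid_v_u8m1(vl);     /* identity permutation */
    vuint8m1_t d0 = idx, d1 = idx, d2 = idx, d3 = idx;
    enum { ITERS = 1 << 16 };

    uint64_t t0 = rdcycle();
    for (int i = 0; i < ITERS; i++) {
        /* four independent chains to hide the latency of each gather */
        d0 = __riscv_vrgather_vv_u8m1(d0, idx, vl);
        d1 = __riscv_vrgather_vv_u8m1(d1, idx, vl);
        d2 = __riscv_vrgather_vv_u8m1(d2, idx, vl);
        d3 = __riscv_vrgather_vv_u8m1(d3, idx, vl);
    }
    uint64_t t1 = rdcycle();

    /* fold the results so the loop can't be optimized away */
    d0 = __riscv_vxor_vv_u8m1(__riscv_vxor_vv_u8m1(d0, d1, vl),
                              __riscv_vxor_vv_u8m1(d2, d3, vl), vl);
    printf("cycles per vrgather.vv (e8m1): %.2f (checksum %u)\n",
           (double)(t1 - t0) / (4.0 * ITERS),
           (unsigned)__riscv_vmv_x_s_u8m1_u8(d0));
    return 0;
}
```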
I think the C920 results are the most representative of what to expect from future desktop CPUs.
Personally, I suspect we'll see `vrgather.vv` perform well for any SEW at LMUL=1, and then grow exponentially per element with higher LMUL even in the best case, as an all-to-all mapping is quite expensive to scale.
`vcompress.vm` should scale better than `vrgather.vv`, since the work is subdividable, and I think we might see a range of implementations, from growth similar to `vrgather.vv` to almost linear growth with LMUL.
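To make that concrete, here is a purely illustrative cost curve, not a gem5 model: the base cost and the exact blend for `vcompress.vm` are assumptions, chosen so that the curves roughly track the C920 column above (quadratic growth in LMUL for `vrgather.vv`, something between linear and quadratic for `vcompress.vm`):

```c
#include <stdio.h>

/* Hypothetical per-instruction cost in cycles as a function of LMUL. */
static double vrgather_vv_cost(int lmul)
{
    const double base = 0.5;         /* assumed cost at LMUL=1 */
    return base * lmul * lmul;       /* all-to-all mapping: ~LMUL^2 */
}

static double vcompress_vm_cost(int lmul)
{
    const double base = 0.5;
    /* subdividable work: somewhere between linear and quadratic */
    return 0.5 * (base * lmul * lmul) + 0.5 * (base * lmul);
}

int main(void)
{
    for (int lmul = 1; lmul <= 8; lmul *= 2)
        printf("LMUL=%d: vrgather.vv ~%.1f, vcompress.vm ~%.1f cycles\n",
               lmul, vrgather_vv_cost(lmul), vcompress_vm_cost(lmul));
    return 0;
}
```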