Simulation hangs for longer running functions using the vector extension #2890
Update: I tried running it on a few other branches:
I also ran the current master with MinimalConfig instead of DefaultConfig.
Other functions still seem to stall though, e.g.:
# asm.S
.text
.balign 8
# generated by clang, see: https://github.com/camel-cdr/rvv-bench/blob/main/bench/mandelbrot.S
.global mandelbrot_rvv
mandelbrot_rvv:
beqz a0, rvv_13
beqz a1, rvv_9
li a7, 0
fcvt.s.wu fa5, a0
lui a3, 262144
fmv.w.x fa4, a3
fdiv.s fa5, fa4, fa5
lui a3, 785408
fmv.w.x fa4, a3
lui a3, 784384
fmv.w.x fa3, a3
lui a3, 264192
fmv.w.x fa2, a3
slli a6, a0, 2
j rvv_4
rvv_3:
addi a7, a7, 1
add a2, a2, a6
beq a7, a0, rvv_13
rvv_4:
fcvt.s.wu fa1, a7
mv t0, a0
j rvv_6
rvv_5:
slli a3, t0, 2
add a3, a3, a2
vsetvli zero, zero, e32, m1, ta, ma
vse32.v v8, (a3)
beqz t0, rvv_3
rvv_6:
vsetvli t1, t0, e32, m1, ta, ma
sub t0, t0, t1
vmset.m v0
vmv.v.i v8, 0
viota.m v10, v0
vadd.vx v10, v10, t0
vfcvt.f.xu.v v10, v10
vfmv.v.f v12, fa1
vfmul.vf v10, v10, fa5
vfadd.vf v10, v10, fa4
vfmul.vf v12, v12, fa5
vfadd.vf v12, v12, fa3
vmv.v.i v18, 0
li a3, 1
mv a5, a1
vmv.v.i v14, 0
vmv.v.i v16, 0
vmv.v.i v20, 0
rvv_7:
vsetvli zero, t1, e8, mf4, ta, ma
vfirst.m a4, v0
bltz a4, rvv_5
vsetvli zero, zero, e32, m1, ta, ma
vfadd.vv v22, v16, v20
vmflt.vf v0, v22, fa2
vfsub.vv v16, v16, v20
vfadd.vv v18, v18, v18
vfadd.vv v22, v16, v10
vfmadd.vv v14, v18, v12
vfmul.vv v16, v22, v22
vfmul.vv v20, v14, v14
vmerge.vxm v8, v8, a3, v0
addi a5, a5, -1
addi a3, a3, 1
vmv.v.v v18, v22
bnez a5, rvv_7
j rvv_5
rvv_9:
slli a3, a0, 2
rvv_10:
mv a4, a0
rvv_11:
vsetvli a5, a4, e32, m1, ta, ma
sub a4, a4, a5
vmv.v.i v8, 0
slli a5, a4, 2
add a5, a5, a2
vse32.v v8, (a5)
bnez a4, rvv_11
addi a1, a1, 1
add a2, a2, a3
bne a1, a0, rvv_10
rvv_13:
ret
// hello.c
#include <klib.h>
void mandelbrot_rvv(size_t width, size_t maxIter, uint32_t *res);
int main(void) {
#define W 10
static uint32_t img[W*W] = {0};
printf("beg\n");
mandelbrot_rvv(W, 20, img);
printf("end\n");
return 0;
}
Update: Retested on newer branches:
7fd388c: all problems persist
78c76c7: all problems persist
7390003: all problems persist
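For reference, here is a scalar C sketch of what the mandelbrot_rvv assembly above appears to compute. It is reconstructed by reading the listing, not taken from the rvv-bench sources: the name mandelbrot_scalar_ref is hypothetical, the constants are decoded from the lui immediates (2.0f, -1.5f, -1.0f, 4.0f), and rounding may differ slightly from the vector code because the assembly uses a fused vfmadd.vv.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar sketch of mandelbrot_rvv, based on a reading of the assembly.
 * Hypothetical helper: useful for checking results once the vector
 * version runs to completion. */
void mandelbrot_scalar_ref(size_t width, size_t maxIter, uint32_t *res)
{
    if (width == 0)
        return;                                   /* beqz a0, rvv_13 */

    const float scale = 2.0f / (float)width;      /* fa5 */

    for (size_t y = 0; y < width; ++y) {
        const float cy = (float)y * scale + -1.0f;        /* fa3 = -1.0f */
        for (size_t x = 0; x < width; ++x) {
            const float cx = (float)x * scale + -1.5f;    /* fa4 = -1.5f */
            float zx = 0.0f, zy = 0.0f;   /* v18, v14 */
            float zx2 = 0.0f, zy2 = 0.0f; /* v16, v20 */
            uint32_t iter = 0;            /* v8 */

            /* Iterate while |z|^2 < 4 (fa2 = 4.0f), up to maxIter times. */
            while (iter < maxIter && zx2 + zy2 < 4.0f) {
                const float next_zx = zx2 - zy2 + cx;
                zy = 2.0f * zx * zy + cy;
                zx = next_zx;
                zx2 = zx * zx;
                zy2 = zy * zy;
                ++iter;
            }
            res[y * width + x] = iter;
        }
    }
}
```

Calling it with the same arguments as hello.c (width 10, maxIter 20) fills a 10x10 buffer that could be compared element by element against the output of mandelbrot_rvv.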
Thank you for your bug report. We are handling this.
The vector extension is still a work in progress. It may be more stable after Apr. 30.
I just tried running it on the development branches, and while it behaved the same on fp-split and new-csr, the
I'll now try it again on DefaultConfig and update this comment once it's done building and I've been able to run the tests.
Update:
Edit: Just tried vlsu-merge-master-0504, which from what I can tell merges the vlsu-240315 branch with master, and the problems are back. It sounds like the problem was introduced between those commits.
Thank you very much for your attention to the development of XiangShan and sorry for not replying in time. |
Recently RVV support was merged into the master branch. I tried running a few of my benchmarks on it, but ran into problems: only very basic RVV functions worked, and the others seem to silently hang the simulation.
For the following I've modified the $AM_HOME/apps/hello example code, and added asm.S to SRCS in the Makefile.
I've attached my entire reproducible docker setup at the end of the issue.
Here are two of the programs that hang the simulation indefinitely:
The problems only seem to occur with larger iteration counts, e.g. the ascii_to_utf16 code works fine when processing 80 instead of 100 elements. This seems to indicate that there might be a problem with a scheduler or an internal buffer filling up.
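One way to narrow down the threshold is to run the kernel over increasing sizes and print a marker before and after each call, so that the last line printed before the hang identifies the first failing size. The following is a minimal sketch under stated assumptions: sweep_sizes and dummy_kernel are hypothetical names, and dummy_kernel only stands in for a wrapper around the real RVV function. Under AM, klib.h would take the place of the standard headers.

```c
#include <stddef.h>
#include <stdio.h>

typedef void (*kernel_fn)(size_t n);

/* Hypothetical stand-in for the function under test; replace it with a
 * wrapper that calls the real RVV kernel on n elements. */
static void dummy_kernel(size_t n)
{
    volatile size_t sink = 0;
    for (size_t i = 0; i < n; ++i)
        sink += i;
}

/* Runs fn for n = lo, lo + step, ... up to hi, printing a marker before
 * and after each call. Returns the last size that completed, or 0 if no
 * size was run. If the simulation hangs, the final "beg" line without a
 * matching "end" line names the failing size. */
static size_t sweep_sizes(kernel_fn fn, size_t lo, size_t hi, size_t step)
{
    size_t last_ok = 0;
    if (step == 0)
        return last_ok; /* avoid an endless loop on a zero step */

    for (size_t n = lo; n <= hi; n += step) {
        printf("n=%d beg\n", (int)n);
        fn(n);
        printf("n=%d end\n", (int)n);
        last_ok = n;
    }
    return last_ok;
}
```

For the case described above, something like sweep_sizes(kernel, 80, 100, 1) would show where between 80 and 100 elements the behaviour changes, assuming the hang is deterministic.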
Since I also ran into problems on other implementations, I've got a quick instruction testing script that executes random instructions. However, the ~50 trials of short random instruction streams I've tested didn't run into any problems.
That's good, and points towards this being a single problem that seems to occur only with longer runs.
Environment Reproduction
I've used the following Dockerfile to build the repository on top of the latest commit to master.
It was run when 0c00289 was the latest commit. Since then there has been only a single new commit, which doesn't look like it would fix the problem, since it's a tiny adjustment to the LSU.
PS: I've also run into problems with rdcycle not working properly with vector instructions: a loop with 10x more iterations took fewer cycles than one with fewer iterations. Is rdcycle supposed to work with vector instructions in the current implementation? I'll have to investigate this further and share reproducible code.
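Until a proper reproducer is available, the kind of measurement in question can be sketched as follows. This is not the code that produced the observation above: read_cycles, cycles_for and scalar_workload are hypothetical names, the workload is a scalar stand-in for a vector loop, and the clock() fallback exists only so the sketch builds on a non-RISC-V host. The "memory" clobber only stops the compiler from reordering around the counter read. Whether the core orders rdcycle against in-flight vector instructions is exactly the open question.

```c
#include <stddef.h>
#include <stdint.h>
#if !(defined(__riscv) && __riscv_xlen == 64)
#include <time.h> /* host-only fallback so the sketch builds off-target */
#endif

/* Reads the cycle counter: a single rdcycle on RV64, clock() elsewhere. */
static inline uint64_t read_cycles(void)
{
#if defined(__riscv) && __riscv_xlen == 64
    uint64_t cycles;
    __asm__ volatile("rdcycle %0" : "=r"(cycles) : : "memory");
    return cycles;
#else
    return (uint64_t)clock();
#endif
}

typedef void (*bench_fn)(size_t n);

/* Hypothetical stand-in workload; replace it with the vector loop. */
static void scalar_workload(size_t n)
{
    volatile uint64_t sink = 0;
    for (size_t i = 0; i < n; ++i)
        sink += i;
}

/* Returns the counter difference across one call of fn(n). */
static uint64_t cycles_for(bench_fn fn, size_t n)
{
    uint64_t beg = read_cycles();
    fn(n);
    uint64_t end = read_cycles();
    return end - beg;
}
```

Comparing cycles_for(fn, n) against cycles_for(fn, 10 * n) for the same function gives the two numbers described above. A smaller result for the larger n would reproduce the anomaly.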