Program is halted when running Coremark.riscv #69

Open
AnouarNechi opened this issue Apr 11, 2020 · 7 comments

@AnouarNechi

Hello
I want to run CoreMark on the simulator. I used the code from riscv-coremark, which generates two versions: one for bare metal and one for Linux or pk. I compiled both and copied them into the built benchmarks folder.
I used the following command:
sims -sys=manycore -vlt_run -x_tiles=1 -y_tiles=1 coremark.bare.riscv -ariane -precompiled -rtl_timeout=1000000
For the bare-metal version, execution halts before the rtl_timeout is reached, as follows:

TILE0-------------------------------------
0000000001d3904a
0000000001d3904a P1S3 msg type:     st_req     addr: 0x0080001000, Data_size: 100, cache_type: 0
P1S3 valid: recycle: 0, stall: 0
State wr en: 1
Dir data: 0x0000000000000000
CSM enable: 0
Msg from mshr: 1
P1S3 addr: 0x0080001000
P1S3 valid: l2_hit: 1, l2_evict: 0
Data data: 0x000000000000000000000000000000000000
State:mesi: 00, vd: 10, subline: 0000, cache_type: 0, owner: 000000
sdid:    0, lsid:  0
TILE0-------------------------------------
0000000001d3923e
0000000001d3923e P1S4 msg type:     st_req     addr: 0x0080001000, Data_size: 100, cache_type: 0
P1S4 valid: recycle: 0, stall: 0, msg_stall: 0, dir_data_stall: 0, stall_inv_counter: 0, stall_smc_buf: 0, smc_stall: 0, global_stall: 0, broadcast_stall: 0
Control signals: 0100011101000011110
CSM enable: 0
broadcast coreid: (    0,   0,   0)
broadcast state: 0, broadcast op val: 0
Special addr type: 0
MSHR state wr en: 0
MSHR data wr en: 0
MSHR data wr : 0x00000000000000040a00c0080001000
MSHR inv counter :  0
State wr en: 1
Dir data: 0x0000000000000000
Dir sharer counter:  1
State data in: 0x01000000000004040
State data mask in: 0x0f0000000000067ff
State wr addr: 0x40
Msg data: 0x0000000000000000
SMC miss: 0
SMC data out: 0x00000000
SMC tag out: 0x0000
SMC valid out: 0x0
Msg send valid: 1, send ready: 1, mode: 011, length: 00000010
Msg send type:     data_ack   Msg send data_size: 000, cache_type: 0, mesi: 11, l2_miss: 1, mshrid: 00000011, subline_vector: 0000
Msg from mshr: 1
P1S4 addr: 0x0080001000
P1S4 valid: l2_hit: 1, l2_evict: 0
Data data: 0x000000000000000000000000000000000000
State:mesi: 00, vd: 10, subline: 0000, cache_type: 0, owner: 000000
Msg send: addr: 0x0080001000, dst_x: 00000000, dst_y: 00000000, dst_fbits: 0000
Msg send data: 0x00000000000000000000000000000000
src x: 00000000, src y: 00000000
sdid:    0, lsid:  0
0000000001d39432
TILE0 noc2 flit raw: 0x00000000008740f8
0000000001d39626
TILE0 noc2 flit raw: 0x0000000000000000
0000000001d3981a
TILE0 noc2 flit raw: 0x0000000000000000
30646000 TILE0 L1.5: Received NOC2                                                MSG_TYPE_DATA_ACK   mshrid 3, l2miss 1, f4b 0, ackstate 3, address 0x0000000000
   Data1: 0x0000000000000000
   Data2: 0x0000000000000000
   Data3: 0x0000000000000000
   Data4: 0x0000000000000000

0000000001d39ef0 L15 TILE0:
NoC1 credit:  8
NoC1 reserved credit:  1
TILE0 Pipeline: *  X  X 
Stage 1 status:    Operation: L15_REQTYPE_ACKDT_ST_IM
   TILE0 S1 Address: 0x0080001000
L15_MON_END


0000000001d3a0e4 L15 TILE0:
NoC1 credit:  7
NoC1 reserved credit:  0
TILE0 Pipeline: *  *  X 
Stage 1 status:    Operation: L15_REQTYPE_ACKDT_ST_IM
   TILE0 S1 Address: 0x0080001000
Stage 2 status:    Operation: L15_REQTYPE_ACKDT_ST_IM
   TILE0 S2 Address: 0x0080001000
   TILE0 S2 Cache index:   0
   DTAG way0 state: 0x3
   DTAG way0 data: 0x0000000080024000
   DTAG way1 state: 0x2
   DTAG way1 data: 0x0000000080003800
   DTAG way2 state: 0x3
   DTAG way2 data: 0x0000000080004000
   DTAG way3 state: 0x0
   DTAG way3 data: 0x0000000000000000
L15_MON_END


0000000001d3a2d8 L15 TILE0:
NoC1 credit:  7
NoC1 reserved credit:  0
TILE0 Pipeline: X  *  * 
Stage 2 status:    Operation: L15_REQTYPE_ACKDT_ST_IM
   TILE0 S2 Address: 0x0080001000
   TILE0 S2 Cache index:   0
   MESI write way: 0x3
   MESI write data: 0x3
HMT writing: 0
Stage 3 status:    Operation: L15_REQTYPE_ACKDT_ST_IM
   TILE0 S3 Address: 0x0080001000
   TILE0 WMT read index: 00
   WMT way          0: 1 0x1
   WMT way          1: 0 0x0
   WMT way          2: 0 0x0
   WMT way          3: 0 0x0
L15_MON_END

30647500 TILE0 L1.5 th0: Sent CPX ST_ACK   l2miss 1, nc 0, atomic 0, threadid 0, pf 0, f4b 0, iia 0, dia 0, dinval 0, iinval 0, invalway 0, blkinit 0
   Data0: 0x0000000000000000
   Data1: 0x0000000000000000
   Data2: 0x0000000000000000
   Data3: 0x0000000000000000

0000000001d3a4cc L15 TILE0:
NoC1 credit:  8
NoC1 reserved credit:  0
TILE0 Pipeline: X  X  * 
Stage 3 status:    Operation: L15_REQTYPE_ACKDT_ST_IM
   TILE0 S3 Address: 0x0080001000
L15_MON_END

Info: spc(0) thread(1) -> timeout happen
Info: spc(0) thread(2) -> timeout happen
Info: spc(0) thread(3) -> timeout happen
Info: spc(0) thread(1) -> timeout happen
...

I also tried the other version, for Linux and pk (just to check), and it keeps running until it reaches the timeout.
Could you please help?

@Jbalkind
Collaborator

Looking at the code, this version assumes that it has some kind of test harness that doesn't exist in our environment. You'd have to modify the compilation environment to use the syscalls.c, crt.S, etc. that we have in the OpenPiton+Ariane environment instead. I don't think that should be too troublesome, but it'd take some tinkering.

@Jbalkind
Collaborator

I could get it to build by copying $PITON_ROOT/piton/verif/diag/assembly/include/riscv/ariane/* into the riscv-coremark/riscv64-baremetal/ directory and then modifying the gcc build command to add -fno-builtin-printf. However, I get a bad trap when it runs. Not sure what the issue is there.

Looking at trace_hart_0.log it seems there's an illegal instruction exception.

@Jbalkind
Collaborator

Looks like the issue is on rdcycle - there is a discussion of this on the PULP forum here: https://pulp-platform.org/community/showthread.php?tid=133

@Jbalkind
Collaborator

Based on the above post, I think that essentially coremark assumes it's running in some kind of user mode environment with the ability to use rdcycle. Ariane doesn't seem to have the mcounteren/scounteren register for you to enable lower-privilege (like user-mode) access to the register.

As Florian and Frank say in the post above, you can do one of two things:

  1. Expand the trap handling to enable access to the register (should be a small software change). You could just add an if statement in handle_trap to check the cause. If it's trying to do a rdcycle, perform the rdcycle there since you're in machine mode then, and put the value in the right place. If it's not a rdcycle, then you can just exit as it does already. See handle_trap here:

    uintptr_t __attribute__((weak)) handle_trap(uintptr_t cause, uintptr_t epc, uintptr_t regs[32])
    {
        tohost_exit(1337);
    }

  2. Implement the mcounteren or scounteren register in Ariane and modify the software to enable access in the beginning. This is more involved but would improve Ariane
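Option 1 above could be sketched roughly as follows. This is a minimal illustration, not the OpenPiton or Ariane implementation. The helper names (`is_rdcycle`, `emulate_rdcycle`) and the macros are hypothetical. It also assumes that `regs[n]` holds the saved value of register `xn`, that the value returned from `handle_trap` is written back to `mepc`, and that `tohost_exit` is visible with the declaration shown. All of these depend on how trap_entry in crt.S and syscalls.c are written, so they should be checked before use.

```c
#include <stdint.h>

/* Field extraction for a 32-bit RISC-V instruction word. */
#define INSN_OPCODE(i) ((i) & 0x7fu)
#define INSN_RD(i)     (((i) >> 7) & 0x1fu)
#define INSN_FUNCT3(i) (((i) >> 12) & 0x7u)
#define INSN_RS1(i)    (((i) >> 15) & 0x1fu)
#define INSN_CSR(i)    (((i) >> 20) & 0xfffu)

#define CAUSE_ILLEGAL_INSTRUCTION 2u     /* mcause value for illegal instruction */
#define OPCODE_SYSTEM             0x73u
#define FUNCT3_CSRRS              0x2u
#define CSR_CYCLE                 0xc00u /* user-level cycle counter CSR */

/* "rdcycle rd" is the pseudo-instruction for "csrrs rd, cycle, x0". */
static int is_rdcycle(uint32_t insn)
{
    return INSN_OPCODE(insn) == OPCODE_SYSTEM &&
           INSN_FUNCT3(insn) == FUNCT3_CSRRS &&
           INSN_RS1(insn) == 0 &&
           INSN_CSR(insn) == CSR_CYCLE;
}

/* Hypothetical helper: if the trap was caused by rdcycle, place the cycle
 * count in the saved destination register and return 1; otherwise return 0.
 * Assumes regs[n] is the saved copy of xn. */
static int emulate_rdcycle(uintptr_t cause, uint32_t insn,
                           uintptr_t regs[32], uintptr_t cycles)
{
    if (cause != CAUSE_ILLEGAL_INSTRUCTION || !is_rdcycle(insn))
        return 0;
    unsigned rd = INSN_RD(insn);
    if (rd != 0)            /* x0 is hard-wired to zero */
        regs[rd] = cycles;
    return 1;
}

#if defined(__riscv)
/* Assumed declaration; check it against the one in syscalls.c. */
extern void tohost_exit(uintptr_t code);

/* Strong definition that would override the weak default. */
uintptr_t handle_trap(uintptr_t cause, uintptr_t epc, uintptr_t regs[32])
{
    uintptr_t cycles;
    /* We are in machine mode here, so mcycle is readable. */
    __asm__ volatile("csrr %0, mcycle" : "=r"(cycles));

    /* Read the faulting instruction as two halfwords, since with the
     * compressed extension it may only be 2-byte aligned. */
    const uint16_t *p = (const uint16_t *)epc;
    uint32_t insn = (uint32_t)p[0] | ((uint32_t)p[1] << 16);

    if (emulate_rdcycle(cause, insn, regs, cycles))
        return epc + 4;     /* skip the emulated instruction */

    tohost_exit(1337);      /* any other trap: exit as before */
    return epc;
}
#endif
```

The decoding is kept separate from the RISC-V-only part so it can be checked on a host machine. For example, `rdcycle a0` encodes as `0xc0002573`, which `is_rdcycle` accepts and for which `INSN_RD` yields 10.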

@Jbalkind
Collaborator

Since this caught my attention and I've seen others complain of the same problem, I decided to help out the Ariane project and implement the registers (option 2). You can see my PR here: openhwgroup/cva6#411

If you use this, follow my steps above, and add the following to our crt.S (not the one from riscv-coremark, which should be replaced):

@@ -110,6 +139,12 @@ _start:
   la t0, trap_entry
   csrw mtvec, t0

+  # initialize mcounteren and scounteren
+  # allow access from user mode
+  li a0, 0x7
+  csrw mcounteren, a0
+  csrw scounteren, a0
+
   # initialize global pointer
 .option push
 .option norelax

Then you will be able to run the bare-metal CoreMark. It will run for a very long time in simulation (particularly because there is a lot of printing)! I added -DCORE_DEBUG to my gcc build command to generate a bare-metal version that runs for only a single iteration, and it runs in a much more watchable timeframe. That's obviously not a valid result, but it does at least let us see that it's making progress as intended and not just getting stuck in an infinite loop.
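For reference, the value 0x7 written in the crt.S change sets the three lowest bits of each counter-enable register. The sketch below spells out those bits as defined in the RISC-V privileged specification. The macro and function names are made up for illustration and are not taken from OpenPiton or Ariane.

```c
#include <stdint.h>

/* Bit positions shared by mcounteren and scounteren. */
#define COUNTEREN_CY (1u << 0) /* cycle   (read by rdcycle)   */
#define COUNTEREN_TM (1u << 1) /* time    (read by rdtime)    */
#define COUNTEREN_IR (1u << 2) /* instret (read by rdinstret) */

/* With supervisor mode implemented, user mode can read a counter only if
 * the corresponding bit is set in both mcounteren and scounteren. */
static int user_can_read_counter(uint32_t mcounteren, uint32_t scounteren,
                                 uint32_t bit)
{
    return (mcounteren & bit) != 0 && (scounteren & bit) != 0;
}
```

This is why the change writes the same value to both registers: setting only mcounteren would open the counters to supervisor mode but not to user mode.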

Also: be careful with that version of coremark. Looking at the core_portme.* files it seems like they're choosing some random values for frequency and so on. You'll need to change those to make sure you're getting valid numbers.
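To see why the frequency value matters, here is a rough sketch of how a tick count turns into seconds and into a minimum iteration count for the 10-second rule. The function names are hypothetical and the arithmetic is only the general idea; the actual conversion lives in the core_portme.* files and should be read there. It assumes one tick per clock cycle.

```c
#include <stdint.h>

/* Seconds elapsed for a given tick count at a given tick rate. */
static double ticks_to_seconds(uint64_t ticks, uint64_t ticks_per_sec)
{
    return (double)ticks / (double)ticks_per_sec;
}

/* Smallest iteration count whose total ticks reach min_secs of run time,
 * given the measured ticks for one iteration (rounds up). */
static uint64_t min_iterations(uint64_t ticks_per_iter,
                               uint64_t ticks_per_sec,
                               uint64_t min_secs)
{
    uint64_t needed = ticks_per_sec * min_secs;
    return (needed + ticks_per_iter - 1) / ticks_per_iter;
}
```

As an example with an assumed 50 MHz clock (a placeholder, not a value from this thread) and the 527497 ticks reported for a single iteration further down, `min_iterations(527497, 50000000, 10)` gives 948. A wrong frequency in the port files scales the reported time and score by the same factor, so the printed numbers are only meaningful once that value matches the real clock.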

@AnouarNechi
Author

Thank you, Jonathan, for your efforts to help me.
I followed your instructions step by step and also got a good trap with 1 iteration. As you mentioned, the results are not valid, since CoreMark must run for at least 10 seconds and 1 iteration is not enough for that. So I am trying to increase the number of iterations, but I have already hit a timeout.
Questions:

  • Is there a way to prevent the timeout?
  • In the fake_uart log I got this:
2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 527497
Total time (secs): %f
Iterations/Sec   : %f
ERROR! Must execute for at least 10 secs for a valid result!
Iterations       : 1
Compiler version : GCC8.2.0
Compiler flags   : -O2 -mcmodel=medany -static -std=gnu99 -fno-builtin-printf -fno-common -nostdlib -nostartfiles -lm -lgcc -T ../riscv64-baremetal/link.ld   
Memory location  : Please put data memory location here
			(e.g. code in flash, data on heap etc)
seedcrc          : 0xe9f5
[0]crclist       : 0xe714
[0]crcmatrix     : 0x1fd7
[0]crcstate      : 0x8e3a
[0]crcfinal      : 0xe714
Errors detected

Do you think this is normal, or does printf() have a problem dealing with floats?
Thank you

@Jbalkind
Collaborator

You can set -rtl_timeout to a large number. I think it's in either microseconds or nanoseconds. In other circumstances I've set it to something around 1000000 to stave off the checks. I think there's another way to turn it off, but I can't immediately recall it or exactly which file that checker is in. Increasing the number should hopefully be sufficient, as we've just turned it up before to do Linux boots, which can take a couple of days.

10 seconds is quite a long time to run for in simulation. I'd guess it'll take a few hours. It may be better to move to FPGA instead. The test should be able to be pitonstreamed, and because you're using our syscalls.c etc., adding -DPITONSTREAM when you compile the benchmark to .riscv will also add the load to the good/bad trap into the test, which can be recognised on FPGA to show pass/fail there. Your bitfile may need to be modified to change the timeout there, but I don't recall how that works. I think it may just be a software-based check in pitonstream on the host, which you should be able to override without having to recompile the bitfile.

As for printf, yes. We have a very simple implementation of printf in syscalls.c/util.h/etc. which doesn't include floats because we hadn't needed them. You could add an implementation from elsewhere, or just take it easy and print as hex instead, then use Python on the command line, for example, to quickly get the corresponding float value.
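A minimal C sketch of the hex route, plus an integer-only alternative, might look like the following. The function names are hypothetical. It assumes `memcpy` is available in the bare-metal build and that the simple printf can print 64-bit hex and integer values; both should be checked against syscalls.c.

```c
#include <stdint.h>
#include <string.h>

/* Raw IEEE 754 bit pattern of a double, suitable for printing as hex. */
static uint64_t double_to_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Inverse conversion, e.g. for decoding a logged hex value on the host. */
static double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Integer-only alternative: split a non-negative value into its whole part
 * and thousandths (truncated), so both can be printed as integers. */
static void split_milli(double value, uint64_t *whole, uint32_t *milli)
{
    *whole = (uint64_t)value;
    *milli = (uint32_t)((value - (double)*whole) * 1000.0);
}
```

On the target, the "Total time (secs)" and "Iterations/Sec" values could then be printed either as the result of `double_to_bits` in hex or as two integers from `split_milli`. For instance, a logged value of `0x4024000000000000` decodes to 10.0.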
