[dv] Integrity error generation in memory busses #1811

ctopal · 2022-09-20T13:30:00Z

1st commit changes the old memory_error_seq so that it uses the new sequence library.

2nd commit is a bug-fix guaranteeing not injecting an error while handshaking.

3rd commit allows us to stop/start InfiniteRuns type of sequences. Necessary for stopping injecting errors while we are in an IRQ handler.

4th commit improves memory interface to have an integrity error response on demand, also changes the memory_error_seq to inject integrity errors as well as bus errors. Also introduces a new seperate test named riscv_mem_intg_err_test that randomly picks between having an integrity error or a bus error on a memory response to a request.

5th commit integrates internal NMI handling to our cosim environment by driving nmi_int in Spike and set internal NMI traps as synchronous.

6th commit updates CI to have latest version of ibex_cosim branch of Spike.

7th commit is a simple fix for tying up pc_wdata of RVFI interface

8th commit is an RTL change about allowing the register write in the case of an integrity error. Reason being, if we don't let this instruction execute, this means the interrupt caused by this instruction does need to happen pretty much instantly. And letting Spike know about that behaviour (we retired a faulty instruction that caused an interrupt) was challenging. Instead, after the load/store we now flag the next RVFI package (with the jump instruction) with int_nmi causing Spike to match with Ibex.

For more details please check commit messages.

Resolves #1731
Resolves #1732

ctopal · 2022-09-22T16:27:05Z

This does not take into account the type of memory request. That means we won't differentiate between seeing an integrity error on the memory interface while doing a store or a load (either way if rdata integrity bits are faulty, we're seeing an internal NMI -is this okay? WDYT @GregAC ). I believe this might not be ideal and we should only send an integrity error while we are doing a read. Should I change it?

GregAC · 2022-09-23T14:09:02Z

That means we won't differentiate between seeing an integrity error on the memory interface while doing a store or a load (either way if rdata integrity bits are faulty, we're seeing an internal NMI -is this okay? WDYT @GregAC ). I believe this might not be ideal and we should only send an integrity error while we are doing a read. Should I change it?

With the current Ibex RTL integrity errors on rdata for writes are ignored. However we're planning on going back to checking it (see lowRISC/opentitan#14611) so the current behaviour is fine.

GregAC

This is looking good @ctopal just a few tweaks til it's ready to go.

dv/uvm/core_ibex/tests/core_ibex_test_lib.sv

dv/uvm/core_ibex/tests/core_ibex_new_seq_lib.sv

GregAC · 2022-09-23T14:13:23Z

dv/uvm/core_ibex/tests/core_ibex_test_lib.sv

@@ -1266,6 +1266,7 @@ class core_ibex_mem_error_test extends core_ibex_directed_test;
    memory_error_seq_h = memory_error_seq::type_id::create("memory_error_seq_h", this);
    `uvm_info(`gfn, "Running core_ibex_mem_error_test", UVM_LOW)
    memory_error_seq_h.vseq = vseq;
+    `DV_ASSERT_CTRL_REQ("NoAlertsTriggered", 1'b0)


I think this is a little too blunt. We wouldn't want alerts to occur on bus errors for example.

Perhaps randomly choose whether or not a test will generate integrity errors at all and then only disable this assertion if we'll be generating integrity errors?

GregAC · 2022-09-23T14:21:29Z

Oh and we do need a check that an integrity error triggers an alert. This can be done as an assert for the data side. Instruction side requires a little more thought (a fetch may not actually get executed so the integrity error doesn't get consumed and doesn't trigger an alert). Though strictly that's a V2S task. So for this PR at least we don't need to worry about it, but we must not forget about it (best to create an issue to track).

ctopal · 2022-10-14T01:54:00Z

While simulating with this version I found a minor mismatch regarding two back to back integrity errors (with load instructions). RTL side does not take the faulty data from memory into the register, Spike does.

I might need help about how to tell Spike to not go into a fault but also not process the retired instruction.

ctopal · 2022-10-14T12:34:02Z

Test pass rate for this memory test is now around 60%. We hit internal NMI coverpoints though. I suspect almost all failures are related with the issue that I mentioned in the previous message.

GregAC · 2022-10-14T15:26:53Z

As discussed in the Ibex DV Sync I think we should avoid generating integrity errors in the riscv_mem_error_test and introduce a new riscv_mem_error_intg_test that does have integrity error. Just add a config option for core_ibex_mem_error_test that enables generating integrity errors and a new entry to the testlist that turns that on.

With that done this can pass CI and we can get it merged. If the new test pass rate is 60% that should do for our V2 targets and we can fix up remaining issues in V3.

hcallahan-lowrisc · 2022-10-17T09:53:09Z

I think it might be cleaner to split out the cosim/NMI changes into a separate PR? Those changes are quite significant on their own. And could you add some more detail to the PR comment, the messages for 7222d4a and 66f3e2d are a good start but I think some more context would be really useful in the future.

ctopal · 2022-10-17T15:47:55Z

Thanks @hcallahan-lowrisc for having a look, I updated the commit messages for almost all of them and dropped that one unnecessary commit about catching external RVFI signals -that one should belong to another PR I agree. Latest changes should make the mem_test pass in CI (as can be seen in Ibex CI + Private CI).

Might worth just another quick look @GregAC since now it includes cosim integration too.

PS Failures on the CI says cancelled,I'm not sure why

GregAC · 2022-10-17T16:32:57Z

dv/cosim/cosim.h

+  // Set the state of the internal NMI (non-maskable interrupt) line.
+  // Behaviour wise this is almost as same as external NMI case explained at
+  // set_nmi method. Main difference to consider is that this interrupt is
+  // synchronous because it comes from the integrity checking logic inside the


The internal interrupt is effectively synchronous as currently implemented but it should be consider asynchronous (in particular we don't make any guarantees about when it'll occur relative to a memory access). So I think just drop the final statement here or tweak the language so it doesn't call it 'synchronous'

GregAC · 2022-10-17T16:37:58Z

dv/cosim/spike_cosim.cc

@@ -200,7 +200,10 @@ bool SpikeCosim::step(uint32_t write_reg, uint32_t write_reg_data,

  if (processor->get_state()->last_inst_pc == PC_INVALID) {
    if (!(processor->get_state()->mcause->read() & 0x80000000) ||
-        processor->get_state()->debug_mode) { // (Async-Traps are disabled in debug mode)
+        (processor->get_state()->mcause->read() ==


Internal NMIs should be handled as an async trap here, had you encountered difficulties using internal NMI with the sync trap here?

I tested removing it now, it didn't changed the pass rate. Possibly it was some other problem that lead me to believe this was the correct behaviour. I'll change it to behave just as another NMI now.

ctopal · 2022-10-18T09:40:50Z

I think this PR increases our test time for mem_error_test (after changing it with the new sequence we probably are injecting way more memory errors) and tips it over 30 mins. What should we do about it?

GregAC · 2022-10-19T08:57:27Z

I think this PR increases our test time for mem_error_test (after changing it with the new sequence we probably are injecting way more memory errors) and tips it over 30 mins. What should we do about it?

Let's try dropping the frequency of memory errors and see what that does to the run time.

Signed-off-by: Canberk Topal <ctopal@lowrisc.org>

This will prevent seeing mismatches right at the end of our test. Before this change, mem_error_test could inject error at the store instruction in which we finish up the test, resulting with mismatches with Spike and Ibex on the instructions after finishing. Also do the same prevention for signature_addr as well, since we also don't want to corrupt that memory transaction too. Signed-off-by: Canberk Topal <ctopal@lowrisc.org>

Signed-off-by: Canberk Topal <ctopal@lowrisc.org>

This test picks between inserting an integrity error or a bus error to the response in the case of a memory request from Ibex. Introduces a control knob `enable_mem_intg_err` which can control the rate of having integrity errors per request. This commit also disables checking for double fault alerts in the scoreboard because they're expected to be seen while simulating and they don't cause infinite loop problems because every time a memory response is requested the error causing part is just randomized. That means Ibex trying to execute same instruction again would have a chance to succeed this time. Signed-off-by: Canberk Topal <ctopal@lowrisc.org>

This commit is mainly an extension to cosim environment to drive the newly introduced state variable `nmi_int` in Spike. This commit - Extends RVFI interface by a single bit (ext_nmi_int) - Configures cosim to set nmi_int inside Spike Signed-off-by: Canberk Topal <ctopal@lowrisc.org>

Signed-off-by: Canberk Topal <ctopal@lowrisc.org>

ctopal · 2022-10-20T17:19:49Z

@GregAC can you have a look at 66d8d0f to see if this makes sense?

GregAC · 2022-10-21T13:53:55Z

@GregAC can you have a look at 66d8d0f to see if this makes sense?

It does make sense, though having reviewed our discussion I think we want to leave the RTL behaviour as is. This means we need to instead alter cosim to not do the register writes on ECC failures. I will take a look at this.

GregAC · 2022-10-24T19:21:14Z

@GregAC can you have a look at 66d8d0f to see if this makes sense?

It does make sense, though having reviewed our discussion I think we want to leave the RTL behaviour as is. This means we need to instead alter cosim to not do the register writes on ECC failures. I will take a look at this.

I've worked out how I'm going to do this, just need to implement it. I'm going to use a method that doesn't require spike changes (as otherwise these would be quite invasive to suppress a register write on a load).

GregAC · 2022-11-04T21:48:01Z

I've added some stuff and reworked commits here to give #1811 which supersedes this PR and should get the memory integrity test running and passing.

ctopal added Type:Enhancement Feature requests, enhancements Component:DV Design verification (DV) or testing issue labels Sep 20, 2022

ctopal requested review from GregAC and hcallahan-lowrisc September 20, 2022 13:30

GregAC reviewed Sep 23, 2022

View reviewed changes

ctopal force-pushed the intg_err branch 3 times, most recently from 1a49c7c to 1e3b1fb Compare October 14, 2022 01:50

ctopal force-pushed the intg_err branch from 1e3b1fb to 7a07a66 Compare October 14, 2022 01:55

ctopal requested a review from GregAC October 14, 2022 01:56

ctopal force-pushed the intg_err branch from 2b6939a to 7224ed2 Compare October 14, 2022 12:30

GregAC approved these changes Oct 14, 2022

View reviewed changes

ctopal force-pushed the intg_err branch 4 times, most recently from 36a9048 to 5435edf Compare October 17, 2022 08:40

ctopal force-pushed the intg_err branch 7 times, most recently from 55ab9bc to 2bd646b Compare October 17, 2022 13:30

ctopal force-pushed the intg_err branch 3 times, most recently from e75daed to bc16e84 Compare October 17, 2022 15:04

GregAC reviewed Oct 17, 2022

View reviewed changes

ctopal force-pushed the intg_err branch 2 times, most recently from 14633ae to 8fc2e59 Compare October 18, 2022 09:04

ctopal force-pushed the intg_err branch from 8fc2e59 to 62877ad Compare October 19, 2022 09:13

ctopal added 8 commits October 20, 2022 18:06

[ibex,dv] Switch to new sequence for memory errors

488f535

Signed-off-by: Canberk Topal <ctopal@lowrisc.org>

Fix-up new_seq_lib to start after stopping

2fb3588

Signed-off-by: Canberk Topal <ctopal@lowrisc.org>

Update COSIM_VERSION to latest ibex_cosim hash

d2995a3

Signed-off-by: Canberk Topal <ctopal@lowrisc.org>

Fixup RVFI connection for pc_wdata

d257dd8

Signed-off-by: Canberk Topal <ctopal@lowrisc.org>

[rtl] Remove data_intg_err gating register writes

66d8d0f

Signed-off-by: Canberk Topal <ctopal@lowrisc.org>

ctopal force-pushed the intg_err branch from 62877ad to 66d8d0f Compare October 20, 2022 17:18

This was referenced Nov 1, 2022

[dv] Add ecc error generation to memory agent #1731

Closed

[dv,cosim] Implement internal interrupts in co-sim #1732

Closed

Mem intg testing #1907

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[dv] Integrity error generation in memory busses #1811

[dv] Integrity error generation in memory busses #1811

ctopal commented Sep 20, 2022 •

edited

ctopal commented Sep 22, 2022

GregAC commented Sep 23, 2022

GregAC left a comment

GregAC Sep 23, 2022

GregAC commented Sep 23, 2022

ctopal commented Oct 14, 2022 •

edited

ctopal commented Oct 14, 2022

GregAC commented Oct 14, 2022

hcallahan-lowrisc commented Oct 17, 2022

ctopal commented Oct 17, 2022 •

edited

GregAC Oct 17, 2022

GregAC Oct 17, 2022

ctopal Oct 18, 2022

ctopal commented Oct 18, 2022

GregAC commented Oct 19, 2022

ctopal commented Oct 20, 2022

GregAC commented Oct 21, 2022

GregAC commented Oct 24, 2022

GregAC commented Nov 4, 2022

[dv] Integrity error generation in memory busses #1811

Are you sure you want to change the base?

[dv] Integrity error generation in memory busses #1811

Conversation

ctopal commented Sep 20, 2022 • edited

ctopal commented Sep 22, 2022

GregAC commented Sep 23, 2022

GregAC left a comment

Choose a reason for hiding this comment

GregAC Sep 23, 2022

Choose a reason for hiding this comment

GregAC commented Sep 23, 2022

ctopal commented Oct 14, 2022 • edited

ctopal commented Oct 14, 2022

GregAC commented Oct 14, 2022

hcallahan-lowrisc commented Oct 17, 2022

ctopal commented Oct 17, 2022 • edited

GregAC Oct 17, 2022

Choose a reason for hiding this comment

GregAC Oct 17, 2022

Choose a reason for hiding this comment

ctopal Oct 18, 2022

Choose a reason for hiding this comment

ctopal commented Oct 18, 2022

GregAC commented Oct 19, 2022

ctopal commented Oct 20, 2022

GregAC commented Oct 21, 2022

GregAC commented Oct 24, 2022

GregAC commented Nov 4, 2022

ctopal commented Sep 20, 2022 •

edited

ctopal commented Oct 14, 2022 •

edited

ctopal commented Oct 17, 2022 •

edited