
LLC responses to backend with the same ID are not in correct order #643

Open
farzadfch opened this issue Oct 8, 2020 · 9 comments · May be fixed by #914
Comments

@farzadfch
Contributor

My understanding of the AXI spec is that transactions with the same ID must be responded to in the order they were received on the request channel. However, it seems there are corner cases in which the LLC does not comply with this rule. I will try to explain what I have found using the attached waveform.

At marker C, LLC receives a request on address 8000B1C0 with ID = 01. This request is a miss and is sent to DRAM at marker D.
At B and F, LLC receives requests on addresses 8002B1C0 and 80001000, respectively, with ID = 01.
The request on 80001000 is a hit, and the LLC responds to it at marker E without following the correct ordering. So the backend perceives this as the response to 8000B1C0 and releases the response on the memory model R channel earlier than it should (I have not included the waveform for this channel).
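For illustration, the ordering rule and the violation described above can be modeled with a small checker (a hypothetical sketch, not FireSim code): per AXI ID, responses must come back in the order the requests were accepted.

```python
from collections import defaultdict, deque

class SameIdOrderChecker:
    """Models the AXI4 rule that responses sharing an ID must return
    in the order their requests were accepted on the request channel."""
    def __init__(self):
        # axi_id -> addresses, in request-acceptance order
        self.pending = defaultdict(deque)

    def request(self, axi_id, addr):
        self.pending[axi_id].append(addr)

    def response(self, axi_id, addr):
        expected = self.pending[axi_id].popleft()
        assert addr == expected, (
            f"ID {axi_id:#x}: got response for {addr:#x}, "
            f"but {expected:#x} was requested first")

# Replay the scenario from the waveform:
chk = SameIdOrderChecker()
chk.request(0x01, 0x8000B1C0)  # marker C: miss, forwarded to DRAM at D
chk.request(0x01, 0x8002B1C0)  # marker B
chk.request(0x01, 0x80001000)  # marker F: hit
try:
    chk.response(0x01, 0x80001000)  # hit responds first at marker E
except AssertionError as e:
    print("ordering violation:", e)
```

The backend in this model would have attributed the early response to 8000B1C0, which is exactly the misbehavior observed.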

Please let me know if my understanding is correct.

[Waveform screenshot: Capture3]

@davidbiancolin
Contributor

Thanks for the detailed description, your understanding is correct -- this is a pretty serious timing bug.

An immediate workaround is to limit ID reuse to 1 in the target and increase the ID width of the interface to support the same number of transactions. Do you happen to know what agent is generating the requests with the same ID?

@davidbiancolin
Contributor

I think in the short term the workaround is more useful, as it used to be what we did in FireChip before the Chipyard migration. The fact that we lost that modification was an oversight resulting from my haste and our desire to use upstream code. That said, the easiest fix is to block on requests to the same ID. Unfortunately, the front end of the cache schedules transactions in FIFO order, so that would lead to HOL blocking. If you wanted to implement that, you could mirror this but instead look for an ID collision, and then L189 would read:

```scala
val can_deq_read = reads.valid && !read_set_collision && !miss_resource_hazard && io.rResp.ready && !read_id_inflight
```

You'd have to do the same with writes.

A higher performance, more complex but more accurate solution is to improve the cache frontend to let it schedule across transactions in the read and write buffers.
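To make the HOL-blocking concern concrete, here is a small Python model of the blocking approach under the FIFO constraint (names like `read_id_inflight` are borrowed from the discussion; the function itself is a hypothetical sketch, not the cache frontend code):

```python
from collections import deque

def schedule_fifo_block_same_id(read_queue, inflight_ids):
    """Sketch of a FIFO read scheduler with the same-ID blocking check.
    The head of the queue is held back while another transaction with
    the same ID is in flight (the read_id_inflight condition). Because
    dequeue is strictly FIFO, a blocked head also stalls every younger
    read behind it -- head-of-line (HOL) blocking."""
    if not read_queue:
        return None
    head_id, head_addr = read_queue[0]
    if head_id in inflight_ids:   # same-ID collision: cannot dequeue head
        return None               # ...and nothing younger can go either
    return read_queue.popleft()

q = deque([(0x01, 0x8000B1C0),    # same ID as the outstanding miss
           (0x02, 0x80001000)])   # different ID, could legally proceed
inflight = {0x01}                 # an ID-0x01 miss is outstanding in DRAM
print(schedule_fifo_block_same_id(q, inflight))  # None: HOL blocking
```

The higher-performance alternative mentioned above would instead scan past the blocked head and issue the ID-0x02 read.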

@farzadfch
Contributor Author

Thank you, David. Those requests were generated by different agents (core 3, core 1, and the serial adaptor). I think the ID reuse is caused by the AXI ID width being smaller than the TL source width at the mbus.

I actually use FireSim 1.5 and my target is a Quad-core BOOM with 4 MSHRs in each data cache. That results in a 9-bit TL source at mbus input. I have previously increased the AXI ID width to 5 bits from the default 4, but that obviously wasn't enough.

So I set the mbus ID width to 9 and it seems that this limits the ID reuse to 1 and fixes the original problem. (I also had to increase the ID width in the memory model here).

I still want to know whether setting the ID width equal to the TL source width is the most efficient way to limit ID reuse to 1. Most combinations of those 9 bits are unused, so this does not seem very efficient.
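For illustration, one way to limit ID reuse to 1 without a full 9-bit ID space would be to allocate AXI IDs dynamically from a small dense pool and route responses back via the recorded mapping (a hypothetical sketch; these names are not FireSim or rocket-chip code):

```python
class DenseIdAllocator:
    """Sketch: instead of passing the 9-bit TL source through as the
    AXI ID (512 IDs, mostly unused), hand out IDs from a small dense
    pool. Each AXI ID has at most one outstanding transaction, so the
    same-ID ordering problem cannot arise."""
    def __init__(self, num_ids):
        self.free = list(range(num_ids))
        self.owner = {}               # axi_id -> originating TL source

    def alloc(self, tl_source):
        if not self.free:
            return None               # pool exhausted: back-pressure
        axi_id = self.free.pop()
        self.owner[axi_id] = tl_source
        return axi_id

    def release(self, axi_id):        # called when the response returns
        tl_source = self.owner.pop(axi_id)
        self.free.append(axi_id)
        return tl_source

alloc = DenseIdAllocator(num_ids=16)  # 4-bit AXI ID space
a = alloc.alloc(tl_source=0x1A3)      # 9-bit TL source
print(a, hex(alloc.release(a)))
```

The trade-off is extra mapping state and stalls when the pool is exhausted, versus the wide-but-sparse direct mapping.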

@davidbiancolin
Contributor

In the UserYanker you can limit maxFlight to some number (specifically by setting it to Some(1)). To achieve the best system performance, you really need to finesse how TL sources are folded onto AXI4 IDs. I'd have to RTFC for that -- it would likely require some diplomacy hacking.
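As an illustrative model of what a per-ID maxFlight of 1 buys you (a hypothetical sketch, not the actual diplomacy code): each AXI ID is gated so at most one transaction is outstanding, which makes same-ID reordering downstream impossible.

```python
from collections import Counter

class MaxFlightGate:
    """Sketch of a per-ID in-flight limiter: a request may issue only
    if fewer than max_flight transactions with its ID are outstanding.
    With max_flight = 1, each ID has at most one transaction in flight,
    so responses with the same ID can never be reordered downstream."""
    def __init__(self, max_flight=1):
        self.max_flight = max_flight
        self.inflight = Counter()     # axi_id -> outstanding count

    def can_issue(self, axi_id):
        return self.inflight[axi_id] < self.max_flight

    def issue(self, axi_id):
        assert self.can_issue(axi_id)
        self.inflight[axi_id] += 1

    def retire(self, axi_id):         # response observed
        self.inflight[axi_id] -= 1

g = MaxFlightGate(max_flight=1)
g.issue(0x01)
print(g.can_issue(0x01))  # False: second ID-0x01 request must wait
```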

1.5 is a fairly old version; I'd highly encourage you to bump to a later one with the SiFive L2. In that system only the cache masters the mbus, so it would be trivial to get more uniform use of the ID space. Plus, we have a bunch of new features.

@farzadfch
Contributor Author

Thanks, David. Not trying to get off topic, but as for why I am still using 1.5: I tried multi-core BOOM with the SiFive L2 on 1.7 and 1.8, and it hangs very often.

@jerryz123
Contributor

I think all of those BOOM issues have been resolved by 1.10.

@davidbiancolin davidbiancolin added this to the 1.11.0 milestone Nov 9, 2020
@davidbiancolin davidbiancolin self-assigned this Nov 9, 2020
@farzadfch
Contributor Author

@jerryz123, I understand that BOOM has changed a lot since FireSim 1.5, but can you pinpoint what exactly was fixed that allows BOOM to work with the SiFive L2? I am thinking of using the SiFive L2 with 1.5 so I don't have to go through the pain of porting my custom hardware and software to the latest FireSim version.

@jerryz123
Contributor

I believe this PR had some fixes. chipsalliance/rocket-chip#2053

@davidbiancolin davidbiancolin modified the milestones: 1.11.0, 1.12.0 Jan 11, 2021
@davidbiancolin davidbiancolin modified the milestones: 1.12.0, 1.13.0 May 25, 2021
@davidbiancolin
Contributor

For 1.13.0, add an elaboration-time error if ID reuse is enabled under a model that may reorder across IDs.

Longer-term fix: limit inflight requests in the transaction scheduler.

@davidbiancolin davidbiancolin linked a pull request Jan 27, 2022 that will close this issue
@davidbiancolin davidbiancolin modified the milestones: 1.13.0, 1.14.0 Feb 17, 2022
@davidbiancolin davidbiancolin modified the milestones: 1.14.0, 1.15.0 Jun 7, 2022