Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add illegal instruction detection to RVC decoder #3613

Merged
merged 1 commit into from Apr 23, 2024

Conversation

chenguokai
Copy link

Related issue: In OpenXiangShan repo, this PR included some rocket-chip modifications that may contribute to upstream.

Type of change: feature request

Impact: API addition (no impact on existing code)

Development Phase: implementation

Release Notes

Add explicit illegal RVC instruction detection logic to RVC decoder, enabling a easy implementation for Sstvala

Copy link

linux-foundation-easycla bot commented Apr 10, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: chenguokai / name: Guokai Chen (c8eb0ee)

@jerryz123 jerryz123 changed the base branch from master to dev April 15, 2024 17:24
@jerryz123
Copy link
Contributor

Can you describe how these bits were determined? I notice this doesn't check for the supported ISA subset.
Are there cases where the RVCDecoder would report a illegal instruction, but not the full decoder, or vice-versa?

What is the benefit/use case of this as well? Is there really a substantial improvement over the QMC-based decode of the expanded instruction?

@chenguokai
Copy link
Author

Can you describe how these bits were determined? I notice this doesn't check for the supported ISA subset.

I determined these bits according to RISC-V spec ver 20191213. I will check newer versions and update this PR if necessary. Thanks for the reminder.

What is the benefit/use case of this as well? Is there really a substantial improvement over the QMC-based decode of the expanded instruction?

The original decoder does ensure that one illegal instruction is still decoded as illegal instruction after expansion. The benefit comes mainly from adapting to newer specifications. Sstvala requires illegal instructions be written to stval. It would be more feasible if illegal RVC instructions are identified earlier such that only first faulting compressed instruction should be kept. Otherwise the hardware has to keep all compressed instructions in case one of them is illegal, at least before decode stage.

@jerryz123
Copy link
Contributor

Are there cases where the RVCDecoder would report a illegal instruction, but not the full decoder, or vice-versa?

I think this is the crux of the issue for me. What happens if I use the RVCDecoder in a RV32 implementation? Will this detect that C.LD is illegal? Or C.FLW in a no-F implementation? The current implementation is not parameterized by ISA subset.

if illegal RVC instructions are identified earlier

I agree the challenge is identifying illegal RVC instructions earlier. But what is the relative cost of this approach, vs just doing the expand+decode-only-illegal early in the pipeline? Does that combinational logic synthesize something that is fundamentally more complex than your explicitly defined illegal-decoder?

@chenguokai
Copy link
Author

chenguokai commented Apr 16, 2024

But what is the relative cost of this approach, vs just doing the expand+decode-only-illegal early in the pipeline? Does that combinational logic synthesize something that is fundamentally more complex than your explicitly defined illegal-decoder?

This PR originally target XiangShan which utilizes this decoder. In our case, the instructions are passed through instruction buffer before they get decoded. We will have to add 16 bit to each buffer entry if we do not want to do major modification to the pipeline. Also it seems to be absurd to refactor our pipeline for this minor feature. Overall, delaying illegal RVC detection will bring at least 1(pipeline stage)*16(fetch width)*16(RVC inst width) + 48(ibuffer size)*16(RVC inst width) bit additional storage cost for our implementation.

It should be noted that our cost model may not apply for rocket-chip. I am not quite familiar with cost in your case. If the cost for rocket is minor, maybe we can just keep this modification in XiangShan branch only.

@jerryz123
Copy link
Contributor

I am not suggesting you store the expanded bits early in the frontend. You can expand+decode-illegal, then throw away the expanded bits to minimize instruction buffer size, before re-expanding later in the pipeline.

I am asking specifically about the combinational logic cost of expand+decode-illegal, vs decode-rvc-illegal. If expand+decode-illegal reduces to combinational logic with similar depth/area as direct-decode-rvc-illegal, then there really isn't much of a need for direct-decode-rvc-illegal.

I would prefer you upstream this if the use case is justified, although my comments on the correctness of this for non RV64GC implementations still stands.

@chenguokai
Copy link
Author

I am asking specifically about the combinational logic cost of expand+decode-illegal, vs decode-rvc-illegal. If expand+decode-illegal reduces to combinational logic with similar depth/area as direct-decode-rvc-illegal, then there really isn't much of a need for direct-decode-rvc-illegal.

For combinational logic only, direct-decode method will obviously introduce additional cost, as the logic deals with a strict subset of main decoder(or possibly a dedicated decode-illegal unit). But other costs should not be ignored. I have no timing report for rocket, but for XiangShan, the RVC expander lies in a timing critical path, from fetch unit to instruction buffer, so your proposal will introduce either storage cost(by appending new pipeline stages) or timing cost(by prolonging critical path).

although my comments on the correctness of this for non RV64GC implementations still stands.

As I stated previously, I will update this PR if necessary. But the necessity for mainline rocket-chip remains to be discussed.

@jerryz123
Copy link
Contributor

Thanks, I had not considered that the expand+decode might generate additional logic for instructions which have no compressed forms (I assume this is what you mean by "as the logic deals with a strict subset of main decoder").
Based on that, I am more inclined to believe that the direct-RVC-decode-illegal you have implemented has lower cost.

As I stated previously, I will update this PR if necessary. But the necessity for mainline rocket-chip remains to be discussed.

I think this change should go in mainline then, after it has been parameterized by the ISA subset.

@jerryz123 jerryz123 merged commit d8de943 into chipsalliance:dev Apr 23, 2024
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants