Support for memories with read latency higher than 1 #1392
Comments
Agree. One question I have: how do those 2-cycle-latency memory IPs support a clock enable for the second stage (if at all)? |
XPM does not support a clock enable for the memory itself, only for the last output register stage:
Definition for memory latency:
|
Isn't that what ena / enb are for? They act as the first stage "clock enable", so ena / enb => first stage |
The way I see it, the way forward would be to change the current:

```scala
def readSyncImpl(address: UInt, data : Data, enable: Bool = null, readUnderWrite: ReadUnderWritePolicy = dontCare, clockCrossing: Boolean = false, allowMixedWidth: Boolean = false): Unit = {
```

into:

```scala
def readSyncImpl(address: UInt, data : Data, enable: Bool = null, readUnderWrite: ReadUnderWritePolicy = dontCare, clockCrossing: Boolean = false, allowMixedWidth: Boolean = false, latency : Int = 1, stageEnable : Seq[Bool] = Nil): Unit = {
```

and then adjust things around it. Same for the readWrite ports. Would that fit? |
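As a plain-Scala sketch (not the actual SpinalHDL implementation; the `latency`/`stageEnable` names just mirror the proposal above), the semantics of such a port can be modeled as a chain of `latency` registers, with stage 0 gated by the read enable and each extra stage by its own enable bit:

```scala
// Hypothetical behavioral model of the proposed readSync with latency > 1.
class ReadPortModel(mem: IndexedSeq[Int], latency: Int) {
  require(latency >= 1)
  private val regs = Array.fill(latency)(0)

  // One clock cycle; returns the value currently on the read-data output.
  def cycle(addr: Int, rdEn: Boolean, stageEnable: Seq[Boolean]): Int = {
    require(stageEnable.length == latency - 1)
    val prev = regs.clone()                      // register values before the edge
    if (rdEn) regs(0) = mem(addr)                // stage 0: the memory read itself
    for (i <- 1 until latency if stageEnable(i - 1))
      regs(i) = prev(i - 1)                      // extra stages, individually gated
    regs(latency - 1)
  }
}
```

With `latency = 1` and `stageEnable = Nil` this degenerates to today's single-cycle `readSync` behavior.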
Hmm, I'm a bit confused about the stage enable signals being separate; what's the point of having them all separate? For instance, I'm not sure how this will map to the en/regce scheme of the XPM. In a broader sense I also don't understand why you would want to read the value from the RAM without enabling the output registers. |
The point is to be able to implement streamed memory reads with multiple cycles of latency, with backpressure and the ability to collapse bubbles. |
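A minimal sketch of that idea in plain Scala (not SpinalHDL; the model and its names are hypothetical): a 2-deep pipeline where each stage has its own enable, so the second stage can load from the first while the first holds, which is what lets a bubble collapse.

```scala
object StreamPipeModel {
  // Each stage holds an optional payload; None models a bubble.
  case class St(s1: Option[Int], s2: Option[Int])

  // One clock cycle: en1 gates stage 1 and en2 gates stage 2, independently.
  def cycle(st: St, in: Option[Int], en1: Boolean, en2: Boolean): St =
    St(
      s1 = if (en1) in else st.s1,   // stage 1 loads or holds
      s2 = if (en2) st.s1 else st.s2 // stage 2 can drain stage 1 while stage 1 holds
    )
}
```

With a single shared enable, a bubble in stage 2 could only be filled by also clocking stage 1 and everything upstream; separate stage enables let stage 2 advance on its own under backpressure. (Real control logic would also clear stage 1's valid flag once its data moves on.)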
That's indeed interesting. I don't have a detailed idea how that would work, but from the viewpoint of the XPM:

regcea => enable

If that was a 3-cycle latency =>

So in other words: Stage 1 => enable |
No, but this is not possible, since en and regce are both single-bit signals in the XPM, regardless of how many reg stages there are. Or did I understand it wrong? |
I would say it is possible up to a 2-cycle latency for the XPM. |
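To make the mapping concrete, here is a plain-Scala model of an XPM-style 2-cycle synchronous read, under the assumption stated above that `en` gates the memory's internal output stage and `regce` gates the extra output register (the port names follow the XPM, the model itself is illustrative):

```scala
object XpmReadModel {
  val mem: Array[Int] = Array.tabulate(16)(i => i * 10)

  // One clock edge on the state (stage1, stage2): en gates the memory's
  // internal output stage, regce gates the extra output register.
  def cycle(st: (Int, Int), addr: Int, en: Boolean, regce: Boolean): (Int, Int) = {
    val (s1, s2) = st
    (if (en) mem(addr) else s1, if (regce) s1 else s2)
  }
}
```

With latency 2 this gives exactly one enable bit per stage; with latency 3 or more, the extra internal stages of the primitive would all share the single `en` bit, which matches the objection above.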
Sounds great. I'm not too familiar with the internals of the Mem API, so it would be great if you, @Dolu1990, could implement the fix; I can then start upstreaming the XPM blackboxing pass I have, as seen on Gitter :) |
Will do. |
Yeah, that makes sense. I think this should be left to the blackboxing pass to handle: whether to use the output flop on the RAM primitive or separate stages. |
I don't know. |
Ah, regarding the extra pipeline stages, after reading this thread on the Xilinx forum: they're useful to reduce routing congestion on the FPGA. Pipelining with extra flops allows the placer to move logic away from the congestion site, which in turn improves QoR. Not completely useless :) |
Hmm, I'm thinking now that maybe a way to implement it would be to keep MemReadSync as it is and add a regular Reg on its read data; then we can add a special SpinalTag to help inference / blackboxing figure out that those registers can be merged. That would have the advantage of keeping things "simple" and not requiring any update in the Verilog/VHDL backends. |
That sounds exciting, but how would you guarantee that the unregistered RAM output is not consumed by anyone other than the reg stage? |
What do you mean by consumed ? |
Hi @developfpga I did some experiments locally by extending the readSyncImpl API, but now I think the best approach would instead be for the blackboxing phase to be capable of recognizing a Reg after a readSync and fusing them into the blackbox. That way, we could define pipelines in the regular way, and as long as there is a simple register after the read sync, everything can be "magically" merged. So what would be needed is to implement a phase which iterates over the AST and adds tags to Mem to document "fusion" possibilities, which can then be used by the blackboxer. |
Hi @Dolu1990 From my perspective and the Xilinx user guide, here is my advice:

```scala
readData := ram.readSync(addr, rdEn, latency, stageEnable)
```

The default would be `latency = 1` with no `stageEnable` needed, and `stageEnable.getBitsWidth == latency - 1`:

```scala
readData := ram.readSync(addr, rdEn, 2, stageEnable)
```

The Verilog code I expect to be generated is like:

```verilog
always @(posedge rd_clk) begin
  if (rdEn) begin
    read_data <= ram[addr];
  end
  if (stageEnable) begin
    read_data_stage_1 <= read_data;
  end
end

assign readData = read_data_stage_1;
```

The Verilog blackbox is like the following, where the parameter `rdLatency` and the IO `rd_stage_enable` are added:

```verilog
blackbox #(
  .wordCount(1024),
  .wordWidth(36),
  .clockCrossing(1'b0),
  .technology("auto"),
  .readUnderWrite("dontCare"),
  .wrAddressWidth(10),
  .wrDataWidth(36),
  .wrMaskWidth(1),
  .wrMaskEnable(1'b0),
  .rdAddressWidth(10),
  .rdDataWidth(36),
  .rdLatency(2) // added
) u_ram(
  .wr_clk          (), //i
  .wr_en           (), //i
  .wr_mask         (), //i
  .wr_addr         (), //i
  .wr_data         (), //i
  .rd_clk          (), //i
  .rd_en           (), //i
  .rd_addr         (), //i
  .rd_data         (), //o
  .rd_stage_enable ()  // added
);
```

So I don't understand the part about 'add tags'; I don't think any tags need to be added here. |
Hi @Dolu1990 Any updates on this issue? |
No progress so far, I'm quite underwater :/ |
Got some progress: the following code will tag every ReadSync port with a special tag when its output is only used by registers.

```scala
import spinal.core._
import spinal.core.internals._
import spinal.lib.graphic.Rgb
import scala.collection.mutable.ArrayBuffer

object PlayMemSync3 extends App {
  class MemDataRegTag(reg: BaseType, rs: MemPortStatement, through: List[BaseNode]) extends SpinalTag

  class InferTagPhase extends PhaseNetlist {
    override def impl(pc: PhaseContext): Unit = {
      val algo = pc.globalData.allocateAlgoIncrementale()
      val portsHits = ArrayBuffer[MemPortStatement]()
      pc.walkDeclarations {
        case reg: BaseType if reg.isReg && !reg.hasInit && reg.hasOnlyOneStatement && reg.isDirectionLess => {
          def walkStm(s: LeafStatement, through: List[BaseNode]): Unit = s match {
            case s: DataAssignmentStatement if s.target.isInstanceOf[BaseType] =>
              val target = s.target.asInstanceOf[BaseType]
              if (through == Nil || s.parentScope == target.parentScope) {
                walkExp(s.source, s :: through)
              }
            case _ =>
          }
          def walkExp(e: Expression, through: List[BaseNode]): Unit = e match {
            case e: CastEnumToBits =>
            case e: CastEnumToEnum =>
            case e: CastBitsToEnum =>
            case e: Cast => walkExp(e.input, e :: through)
            case e: BitsRangedAccessFixed => walkExp(e.source, e :: through)
            case bt: BaseType if bt.isComb && bt.hasOnlyOneStatement => walkStm(bt.head, bt :: through)
            case rs: MemReadSync if reg.component == rs.component => {
              println(s"hit $rs through ${through.map(e => "- " + e).mkString("\n")}")
              val hitId = portsHits.size
              portsHits += rs
              rs.addTag(new MemDataRegTag(reg, rs, through))
              through.foreach { bn =>
                bn.algoIncrementale = algo
                bn.algoInt = hitId
              }
            }
            case _ =>
          }
          walkStm(reg.head.asInstanceOf[DataAssignmentStatement], Nil)
        }
        case _ =>
      }
      // Drop the tag again if anything other than the register chain consumes the port.
      pc.walkStatements { s =>
        if (s.algoIncrementale != algo) {
          s.walkDrivingExpressions { e =>
            if (e.algoIncrementale == algo) {
              println("Remove it")
              val port = portsHits(e.algoInt)
              port.removeTags(port.getTags().filter(_.isInstanceOf[MemDataRegTag]))
            }
          }
        }
      }
    }
  }

  val config = SpinalConfig()
  config.addTransformationPhase(new InferTagPhase)
  config.addStandardMemBlackboxing(blackboxAllWhatsYouCan)
  config.generateVerilog(new Component {
    val dataType = HardType(Rgb(5, 6, 5))
    val rom = Mem.fill(256)(dataType)
    val addr = in(rom.addressType)
    val readed = rom.readSync(addr)
    val miaou = dataType()
    miaou := readed
    val buffer = RegNext(miaou)
    val result = out(CombInit(buffer))
  })
}
```
|
With the commit above, the following code seems fine:

```scala
object PlayMemSync3 extends App {
  val config = SpinalConfig()
  config.addTransformationPhase(new MemReadBufferPhase)
  config.addStandardMemBlackboxing(blackboxAllWhatsYouCan)
  config.generateVerilog(new Component {
    val dataType = HardType(Rgb(5, 6, 5))
    val ram = Mem.fill(256)(dataType)
    val write = ram.writePort().asSlave
    val addr = in(ram.addressType)
    val readed = ram.readSync(addr)
    val miaou = dataType()
    miaou := readed
    val condA = in Bool()
    val buffer = RegNextWhen(miaou, condA)
    val result = out(CombInit(buffer))
  })
}
```
|
This doesn't seem to be working with the following slightly modified code:

```scala
object PlayMemSync3 extends App {
  val config = SpinalConfig()
  config.addTransformationPhase(new MemReadBufferPhase)
  config.addStandardMemBlackboxing(blackboxAllWhatsYouCan)
  config.generateVerilog(new Component {
    val dataType = HardType(Rgb(5, 6, 5))
    val ram = Mem.fill(256)(dataType)
    val write = ram.writePort()
    write.asSlave
    val addr = in(ram.addressType)
    val readed = ram.readSync(addr)
    val miaou = dataType()
    miaou := readed
    val condA = in Bool()
    val buffer = RegNextWhen(miaou, condA)
    buffer.asOutput
    //val result = out(CombInit(buffer))
  })
}
```
|
@kartikp4892 Fixed ^^ |
At the moment, the `readSync` method of `Mem` produces its result in the following cycle (a read latency of 1). However, some memory IPs (see here) support read latencies higher than 2. Especially on FPGAs, this can be important to achieve high frequency. Ideally, `Mem` would allow the read latency to be configured, and convey this information to the blackboxing step.