

# SOC Lab#D - SDRAM report

110061217 王彥智 110590022 陳冠晰 110000107 陳柏翰

- **Introduction of SDRAM**

Synchronous Dynamic Random Access Memory (SDRAM) control involves managing data storage and retrieval in synchronized cycles. It utilizes a clock signal for precise timing, enhancing memory access speed. Control includes commands like activate, read, and write, synchronized with the system clock. CAS latency determines the delay between column access and data output. Refresh cycles prevent data loss due to charge leakage. Memory controllers orchestrate these operations, optimizing data flow between the CPU and SDRAM, crucial for high-performance computing. Efficient SDRAM control ensures rapid and synchronized data handling, enhancing overall system responsiveness and computational efficiency.



State diagram for SDRAM

- SDRAM control design

SDRAM Controller :

SDRAM Refresh Cycle 750T :



FSM :



Initialization



Wait for  
 $\begin{cases} 7T \text{ delay (Refresh Latency)} \\ 3T \text{ delay (ACT Latency)} \\ (CAS Latency) \\ (\text{Precharging}) \end{cases}$

If refresh flags  $\Rightarrow$  Precharge  $\Rightarrow$  Refresh

precharge\_bank\_q = 3'b100

else if not Ready : Save the value (ADDR, R/W)

If rows of its Bank is open (No need to activate)

If row\_addr\_queue[Bank Addr] = Row Addr  $\Leftarrow$  Means that Row  
is already open

If (RW = 1) : next\_state = WRITE

else (RW = 0) : next\_state = READ

else : Open (Precharge) current Row  $\Rightarrow$  next\_state = PRECHARGE

precharge\_bank\_d = {0, BankAddr}

else : Open (Activate) the row  $\Rightarrow$  next\_state = ACTIVATE, since no rows open



cmd\_d = CMD\_REFRESH  $\Rightarrow$  CS = 0 RAS = 0 CAS = 0 WE = 1  
next\_state = WAIT  $\Rightarrow$  IDLE

Since refresh takes 7T latency



cmd\_d = CMD\_WRITE  $\Rightarrow$  CS = 0 RAS = 1 CAS = 0 WE = 0  
data\_q = data\_q

data\_en\_q = 1 (Enable the output Bus)

a\_d = {00, 0, ColumnAddr, 00}

ba\_d = BankAddr

next\_state = IDLE

User Address

| [22:10]     | [9:8]        | [7:0]          |
|-------------|--------------|----------------|
| Row Address | Bank Address | Column Address |
| 22          | 9            | 1 0            |

Mode Register

|          |    |         |             |    |              |   |   |   |   |   |   |   |
|----------|----|---------|-------------|----|--------------|---|---|---|---|---|---|---|
| 12       | 11 | 10      | 9           | 8  | 7            | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| Reserved | WB | Op Mode | Cas Latency | BT | Burst Length |   |   |   |   |   |   |   |

Will be used when we implement BURST command.



cmd\_d = CMD\_PRECHARGE  $\Rightarrow$  CS = 0, RAS = 0, CAS = 1, WE = 0

a\_d[0] = precharge\_bank\_q[2]

ba\_d = precharge\_bank\_q[1:0]  $\leftarrow$  closed all rows

If precharge\_bank\_q[2]  $\Rightarrow$  row\_open\_d = 4'b0000

else  $\Rightarrow$  row\_open\_d[precharge\_bank\_q[1:0]] = 1b0  $\leftarrow$  closed one row

next\_state = WAIT  $\Rightarrow$  ACTIVATE



cmd\_d = CMD\_ACTIVATE  $\Rightarrow$  CS = 0 RAS = 0 CAS = 1 WE = 1

a\_d = RowAddr

ba\_d = BankAddr

next\_state = WAIT  $\leftarrow$  WRITE if rw = 1

next\_state = WAIT  $\leftarrow$  READ if rw = 0

since ACT takes 3T delay

precharge\_bank\_q = 

|     |          |
|-----|----------|
| 2   | 1        |
| Cmd | BankAddr |

CMD\_Q[3:0] = 

|    |     |     |    |
|----|-----|-----|----|
| CS | RAS | CAS | WE |
|----|-----|-----|----|



cmd\_d = CMD\_READ  $\Rightarrow$  CS = 0 RAS = 1 CAS = 0 WE = 1

a\_d = {00, 0, ColumnAddr, 00}

ba\_d = BankAddr

next\_state = WAIT  $\Rightarrow$  READRES

↑  
since CASL = 3T

data\_d = data\_q

out\_valid = 1

next\_state = IDLE



Note : A[10] is the precharge signal

## ● SDRAM

### SDRAM

Mode Register

| 12       | 11 | 10 | 9       | 8  | 7           | 6  | 5   | 4            | 3 | 2 | 1 | 0 |
|----------|----|----|---------|----|-------------|----|-----|--------------|---|---|---|---|
| Reserved | WB |    | Op Mode |    | Cas Latency | BT |     | Burst Length |   |   |   |   |
| 0        | 0  | 0  | 1       | 00 | 011         | 0  | 000 |              |   |   |   |   |

#### Command Decode

|               |                                                                                  |                                            |
|---------------|----------------------------------------------------------------------------------|--------------------------------------------|
| Active_enable | $\sim \text{Cs}_n \& \sim \text{Ras}_n \& \text{Cas}_n \& \text{We}_n$           | From controller : CMD_ACTIVATE = 0011      |
| Aref_enable   | $\sim \text{Cs}_n \& \sim \text{Ras}_n \& \sim \text{Cas}_n \& \text{We}_n$      | From controller : CMD_REFRESH = 0001       |
| Burst_term    | $\sim \text{Cs}_n \& \text{Ras}_n \& \text{Cas}_n \& \sim \text{We}_n$           | From controller : CMD_TERMINATE = 0110     |
| Mode_renable  | $\sim \text{Cs}_n \& \sim \text{Ras}_n \& \sim \text{Cas}_n \& \sim \text{We}_n$ | From controller : CMD_LOAD_MODE_REG = 0000 |
| Prefc_enable  | $\sim \text{Cs}_n \& \sim \text{Ras}_n \& \text{Cas}_n \& \sim \text{We}_n$      | From controller : CMD_PREFETCH = 0010      |
| Read_enable   | $\sim \text{Cs}_n \& \text{Ras}_n \& \sim \text{Cas}_n \& \text{We}_n$           | From controller : CMD_READ = 0101          |
| Write_enable  | $\sim \text{Cs}_n \& \text{Ras}_n \& \sim \text{Cas}_n \& \sim \text{We}_n$      | From controller : CMD_WRITE = 0100         |



#### Pipeline Command



#### Read/Write MUX



#### Active Block



#### Terminate the WRITE Immediately

If Prefc.enable

If Data\_in.enable & (Bank = Ba || Addr[10] = 1)

    Data\_in.enable <= 0

else If Burst\_term

    If Data\_in.enable

        Data\_in.enable <= 0

else If Readable

    If Data\_in.enable

        Data\_in.enable <= 0

else If Write\_enable

    If Data\_in.enable

        Data\_in.enable <= 0

else If (Read\_enable || Command[0] == "WRITE")

    If (Read\_enable)

        Data\_in.enable <= 0

        Data\_out.enable <= 1

    else

        Data\_in.enable <= 1

        Data\_out.enable <= 0

    else

        If (Data\_in.enable)

            Data\_in.enable <= 0

        if (Data\_out.enable)

            Data\_out.enable <= 0

since the READ command takes 1T delay, we should put bank.addr in buffer

1  
0  
Data\_out\_enable\_buf  
If (Data\_out\_enable\_buf)  
| bren  
case (Bank\_temp\_buf)  
00:Dq\_reg & bdf[0]  
01:Dq\_reg & bdf[1]  
10:Dq\_reg & bdf[2]  
11:Dq\_reg & bdf[3]

Means that if  
one bit of bren  
is 1, then TRUE  
Output from Bank  
⇒ Load into Dq register



#### SDRAM:

##### 1. Dynamic Storage:

SDRAM stores data in cells that use capacitors to hold charges. SDRAM cells lose their charge over time due to leakage. To maintain the stored data, SDRAM needs to periodically refresh the charges in the cells.

##### 2. Row Activation:

Activating a row in SDRAM involves loading the contents of that row into a row buffer for read/write operations.  
⇒ Allows for faster access to the data in the row.

##### 3. Precharge Operation:

After accessing a row, it needs to be precharged to restore the charge in the cells. This is essential for the proper functioning of the memory and to prepare it for the activation of other rows.

- SDRAM FIFO request

Since the 10T latency, we pipeline our request



Since each read or write has 10T latency, we try to add request FIFO to pipeline the request. The other implementation we can think of is implementing SDRAM with burst.

- Prefetch schematic



Determine the length of the prefetch buffer. Common prefetch sizes are 4 words, 8 words. Configure the SDRAM controller to use an optimal burst length.

The burst length determines the number of data words that are transferred in a single

burst. A longer burst length can improve memory access performance. Also, we have to identify the memory access pattern so that we can implement the prefetch buffer with maximum rate of hits, which is the meaning of adding a prefetch buffer.

## Design consideration

### Block diagram



We have 3 prefetch buffers (FIR/MM/QS), here we use FIR buffer to explain our idea. Our prefetch buffer acts like shift registers, it shift out the stored data when the read access came. It prefetch data until all buffers are filled up before our workload start. If the buffer is empty, controller will send a Empty signal to tell the arbiter to let the priority of that buffer to be the last since the status of that buffer is busy, it is being filled up.

### Prefetch buffer



## Prefetch controller

| SPEC         |           |           |           |          |                                    |          |          |
|--------------|-----------|-----------|-----------|----------|------------------------------------|----------|----------|
| // INITready | FIR       | QS        | MM        |          | Signal from DMA                    |          | \        |
| // BIT       | 2         | 1         | 0         |          |                                    |          | /        |
| //+-----+    | +-----+   | +-----+   | +-----+   | +-----+  | +-----+                            | +-----+  | +-----+\ |
| // MEET/HIT  | FIR       | QS        | MM        |          | Check if meet the data in buffer   |          | /        |
| // BIT       | 2         | 1         | 0         |          |                                    |          | /        |
| //+-----+    | +-----+   | +-----+   | +-----+   | +-----+  | +-----+                            | +-----+  | +-----+\ |
| // FetchACK  | FIR       | QS        | MM        |          | The ACK signal of fetching request |          | /        |
| // BIT       | 2         | 1         | 0         |          |                                    |          | /        |
| //+-----+    | +-----+   | +-----+   | +-----+   | +-----+  | +-----+                            | +-----+  | +-----+\ |
| // STAT_REG  | FIR FULL  | FIR empty | QS FULL   | QS empty | MM FULL                            | MM empty | /        |
| // BIT       | 5         | 4         | 3         | 2        | 1                                  | 0        | /        |
| //+-----+    | +-----+   | +-----+   | +-----+   | +-----+  | +-----+                            | +-----+  | +-----+\ |
| // Module    | FIR       | QS        | MM        |          |                                    |          | /        |
| // Address   | 3800_1000 | 3800_1180 | 3800_1100 |          |                                    |          | /        |
| // Length    | 64        | 10        | 32        |          |                                    |          | /        |
| //+-----+    | +-----+   | +-----+   | +-----+   | +-----+  | +-----+                            | +-----+  | +-----+\ |

## Signal Configuration

Above figure shows our design about prefetch controller.

### — Setup

- When setup(before all buffers are FULL), data will store in SDRAM first.
- After we got the initial address(sent by DMA) => start prefetch.
- Fill the data until it buffer length, then set the state of that buffer to 'FULL'.
- After all prefetch buffer is 'FULL', call ap\_start.

### — Running

- If input address meet the saving address => HIT
- If HIT, terminate the wishbone read request by sending the ACK immediately.
- If the buffer is Empty, start prefetch the data from SDRAM into buffer.

### • Banks interleave for code and data

We intend to separate the memory for firmware instructions and data for computation. So the bank address should be modified so that the instruction and data are stored in different banks.

### • Modification the linker

Address decoder should be modified, since the original decoder use addr[9:8] to decide which bank the data or instructions to put in. However, the instruction of firmware code is larger than the 8 bits address. Therefore we modified the linker of bank from addr[9:8] to addr[13:12] to make more space (8bits to 12bits) to store the assembly instructions.

### • Observe SDRAM access conflicts with SDRAM refresh

In the SDRAM design, we know that the refresh cycle is 750T, then how does it the access signals and refresh operations collide with each other but still make it work?

- Access During Refresh:

If an SDRAM access (read or write) coincides with a refresh cycle, there may be contention for the memory bus and other resources. This can lead to increased

latency for the access operation. SDRAM controllers are designed to manage these situations by prioritizing access over refresh and scheduling them in a way that minimizes conflicts.

- Refresh Overhead:

Refresh cycles take away bandwidth and time that could otherwise be used for data access. As such, frequent refresh requirements can impact the effective bandwidth of the SDRAM. The SDRAM controller must strike a balance between ensuring timely refreshes and allowing for efficient data access.

- Waveform



This is the read pattern of SDRAM control when there's no refresh encountered. The latency of read is 7T.



This is the read pattern of SDRAM control when refresh counter= 750. As we can see, the latency of read increases from 7T to 22T since the refresh operation in the SDRAM. The 'busy' signal also lasts longer from 1T to 10T, which is because the refresh time = 10T.

- How do we expect to implement SDRAM in our final project



In the final project, we want to implement the computation (FIR, QS, and MM) using SDRAM to store the instructions and data sets.

## Further Optimization

### DMA

we plan to design a DMA to improve the execution particularly when it comes to optimizing data transfer between peripherals and memory.

### ARB

we also want to design a ARB to distribute the signals from CPU to hardware to make the tasks more efficient.

### Prefetch buffer with burst

More, since computation speed heavily dependent on the data transfer time, this is the place we are most interested in optimization. Prefetch buffer with high accurate of hit pattern can greatly reduce the time for SDRAM latency to make the task faster.

- Summary

## Performance

### Computation – FIR, QS, MM

|       | Software | Hardware without prefetch | Hardware with prefetch |
|-------|----------|---------------------------|------------------------|
| MM    | 55303    | 756                       | X                      |
| FIR   | 65890    | 1471                      | 893                    |
| QS    | 14394    | 315                       | X                      |
| total | 135587   | 1570                      | X                      |

73.2X    46.5X    45.7X

Faster than software MM

Faster than software FIR

Faster than software QS

### Overall performance Optimization

86.4X

Faster than original  
design