

# 06: Memory-Mapped I/O

Engr 315: Hardware / Software Codesign

Andrew Lukefahr

*Indiana University*



# Announcements

- P2 due Frrday

- P3 is live!

Popcount      v1

# Optimizations thus far

- Algorithmic complexity
- Removing redundant computation
- ~~Multithreading~~
- ~~Multiprocessing\*~~
- Python/C/Asm Interfacing
- **Map to Hardware**

# We could also map popcount to hardware

```
import cPopcount  
print (cPopcount.cPopcount(0))
```

```
import hwPopcount  
print hwPopcount.hwPopcount(0))
```



# Data is only transferred when

→ **ACLK** is on the positive edge AND

→ **TVALID** is 1 AND

This indicates the **MASTER** is trying to transmit new data.

→ **TREADY** is 1.

This indicates the **SLAVE** is ready to receive the data.

If either **TVALID** or **TREADY** are 0, no data is transmitted.

# Transferring data on a AXI4-Stream Bus.



# How would I flip all the bits of TDATA?

```
module custom_hw (
    input          ACLK,
    input          ARESET,
    input [31:0]   S_TDATA,
    input          S_TVALID,
    output         S_TREADY,
    output [31:0]  M_TDATA,
    output          M_TVALID,
    input          M_TREADY
);
```

```
assign M_TDATA = ~S_TDATA;
assign M_TVALID = S_TVALID;
assign S_TREADY = M_TREADY;
```

```
endmodule
```



# Let's build a buffer state machine.



# Let's build a buffer state machine.

```
module custom_hw_buf (
    input          ACLK,
    input          ARESET,
    input [31:0]   S_TDATA,
    input          S_TVALID,
    output         S_TREADY,
    output [31:0]  M_TDATA,
    output          M_TVALID,
    output         M_TREADY
);

enum {S0, S1} state, nextState;
reg [31:0] nextVal;

always_ff @ (posedge ACLK) begin
    if (ARESET) begin
        state <= S0;
        M_TDATA <= 32'h0
    end else begin
        state <= nextState;
        M_TDATA <= nextVal;
    end
end

```

```
always_comb begin
    S_TREADY = 'h1;
    M_TVALID = 'h0;
    nextState = state;
    nextVal = M_TDATA;
    case(state)
        S0: begin
            if (S_TVALID) begin
                nextState = S1;
                nextVal = S_TDATA;
            end
        end
        S1: begin
            S_TREADY = 'h0;
            M_TVALID = 'h1;
            if (M_TREADY) begin
                nextState = S0;
            end
        end
    endcase
end
endmodule
```



C: register x = arr[42];  $\leftarrow$  load

## The Input/Output Problem



$arr[42] = x; // \text{store}$



Memory  
Inputs + Outputs  
(MMIO)  
Map<sup><= 0</sup>

# CPUs + Memory

$$2^{64}$$

$$32\text{bit} = 2^{32} = 4\text{ GBytes}$$

$$64\text{bit} = 2^{64} = 1.8 \cdot 10^{19} \text{ bytes}$$

$$\text{e.g., } \sim 64 \cdot 10^9 \text{ bytes}$$

- ① Empty memory space
- ② Use memory space for I/O
- ③ MMIO is not real memory



# CPUs + Memory



# Memory-Mapped I/O

Goal: Connect I/O to memory address



store  
load  
(load)

0x5 → 0x0  
Addr  
0x0 → 0x5  
value  
0x8 → 0x2  
(Doesn't work)

load  
load

0x8 → 0x1  
0x8 → 0x0



# Memory-Mapped I/O

Yeh

connect I/O to memory address  
not CPU register

load  $0xabcd \rightarrow 5$

store  $5 \Rightarrow 0xabcd$

store  $1 \Rightarrow 0xffff$

load  $0xffff \rightarrow r0$

0xffff

0xffffe

0xabcd

0x0000



# Memory-Mapped I/O

- I/O devices pretend to be memory
- “Pretend Memory” I/O accessed with native CPU load/store instructions

# MMIO from Assembly

# ARM Registers

R0 - R15

| ARM      | Description                                 | x86                     |
|----------|---------------------------------------------|-------------------------|
| R0       | General Purpose                             | EAX                     |
| R1-R5    | General Purpose                             | EBX, ECX, EDX, ESI, EDI |
| R6-R10   | General Purpose                             | -                       |
| R11 (FP) | Frame Pointer                               | EBP                     |
| R12      | Intra Procedural Call                       | -                       |
| R13 (SP) | Stack Pointer                               | ESP                     |
| R14 (LR) | Link Register                               | -                       |
| R15 (PC) | <- Program Counter / Instruction Pointer -> | EIP                     |
| CPSR     | Current Program State Register/Flags        | EFLAGS                  |

# ARM Instructions

| Instruction | Description            | Instruction | Description                   |
|-------------|------------------------|-------------|-------------------------------|
| MOV         | Move data              | EOR         | Bitwise XOR                   |
| MVN         | Move and negate        | LDR         | Load                          |
| ADD         | Addition               | STR         | Store                         |
| SUB         | Subtraction            | LDM         | Load Multiple                 |
| MUL         | Multiplication         | STM         | Store Multiple                |
| LSL         | Logical Shift Left     | PUSH        | Push on Stack                 |
| LSR         | Logical Shift Right    | POP         | Pop off Stack                 |
| ASR         | Arithmetic Shift Right | B           | Branch                        |
| ROR         | Rotate Right           | BL          | Branch with Link              |
| CMP         | Compare                | BX          | Branch and eXchange           |
| AND         | Bitwise AND            | BLX         | Branch with Link and eXchange |
| ORR         | Bitwise OR             | SWI/SVC     | System Call                   |

# ARM Load + Store

memory of [Address in R0]

→ **LDR R2, [R0]** @ [R0] - origin address is the value found in R0.

**STR R2, [R1]** @ [R1] - destination address is the value found in R1.

put value of  
~~Rb~~ R2 into  
mem @ address R1



# MMIO Store from ASM (ARM)

```
mov r0, #0x1  
mov r1, #0x1000  
str r0,[r1]
```



# MMIO Store from ASM (ARM)

```
mov r0, 0x1  
mov r1, [0x100]  
str r0, [r1]
```



Store  $0x1$  to  $\text{mem}[0x100]$



# MMIO Load from ASM (ARM)

# MMIO Load from ASM (ARM)



# MMIO Store from C

```
#define LED_ADDR 0xfffff  
register uint32_t * LED_REG = (uint32_t *) (LED_ADDR);  
*LED_REG = 0x1;
```

# MMIO Store from C

```
→ #define LED_ADDR 0xfffff  
      uint32_t * LED_REG = (uint32_t *) (LED_ADDR);  
→ *LED_REG = 0x1;
```

#  
 $\downarrow \downarrow$   
number

$(\text{uint32\_t}^*)$        $0x100$   
 $= (\text{pointer to } \underline{0x100})$

$\{\text{LED\_REG}$

$*\text{LED\_REG} = 1$

↑ dereferenced  $\text{LED\_REG} \rightarrow$  Value pointed to by  $\text{LED\_REG}$   
 $\rightarrow 0x100$

set  $0x100 = 0x1$

# MMIO Load from C

# MMIO Load from C

```
#define SW_ADDR 0xffffe  
uint32_t * SW_REG = (uint32_t *) SW_ADDR;  
int y = (*SW_REG);  
  
↑  
Dereference to get memory  
at the address
```

*cast value as an address*

**Problem:** Does quit ever change here?  
Do I need to recompute (`!quit`)?

```
int y = 0;  
  
int quit = y,  
while(!quit)  
{  
    //your code  
    quit = y;  
}
```

$y=0$   
 $quit=0$

**Problem:** Does quit ever change here?  
Do I need to recompute (`!quit`)?

```
int y = 0;  
  
int quit = y;  
while(!quit) {  
    // your code  
    quit = y;  
}
```

⇒ `while (1){`  
    `// your code`  
    `}`

Problem: Does quit ever change here?  
Do I need to recompute (`!quit`)?  
(-O3 edition)

```
int y = 1;
```

```
int quit = y;  
while(!quit)  
{  
    //your code  
    quit = !y; //
```

j // nothing



Problem: Does quit ever change here? -03  
Do I need to recompute (`!quit`)?  
(-03 edition)

→ `int y = 0;` // hooked up to a switch

→ `int quit = y;`  
→ `while(!quit) = 1`  
{  
    // your code  
    → `quit = y;`  
}

while (i) {  
    ; ← warning }  
                  { skip whole loop

# What about here?

```
int y = 0;  
uint32_t * SW_REG = &y;
```

```
int quit == 0  
while(!quit)
```

```
{
```

```
// your code
```

```
quit = 0  
(*SW_REG);
```

```
}
```

= while( )

# What about here?

```
int y = 0; constant
uint32_t * SW_REG = &y;
int quit = (*SW_REG);
while(!quit)
{
    //your code
    quit = (*SW_REG); constant =>
} // your code
```

while () {  
 // your code  
}

# Use `volatile` for MMIO addresses!

```
#define SW_ADDR 0xffffe
volatile uint32_t * SW_REG = (uint32_t * SW_ADDR);

int quit = (*SW_REG);
while(!quit)
{
    //more code
    quit = (*SW_REG);
}
```

# volatile Variables

- volatile keyword tells compiler that the memory value is subject to change randomly.
- Use volatile for all MMIO memory. The values change randomly!

Use `volatile` for  
all MMIO memory.

# What happens here?

```
#include <stdio.h>
#include <inttypes.h>

#define REG_FOO 0x40000140

int main () {
    volatile uint32_t *reg = (uint32_t *)(REG_FOO);
    (*reg) += 3;

    printf("0x%x\n", *reg); // Prints out new value
}
```

# Let's find out...

```
vi test.c
```

```
arm-none-eabi-gcc -o test.o test.c -g  
                  -O1 --specs=nosys.specs
```

```
arm-none-eabi-objdump -DS test.o > test.dis
```

# What do the CPU and Memory need to communicate?

or I/O



# The System Bus



# Next Time

- Combine MMIO + AXI Bus

# 06: Memory-Mapped I/O

Engr 315: Hardware / Software Codesign

Andrew Lukefahr

*Indiana University*

