

# 五、目标代码生成

## (16. 指令选择)

魏恒峰

hfwei@nju.edu.cn

2024 年 06 月 12 日





[https://llvm.org/docs/  
CodeGenerator.html](https://llvm.org/docs/CodeGenerator.html)





(in-memory) LLVM IR    SelectionDAG    MachineInstr    MCInst



Where is “Prologue/Epilogue Insertion”?

f0-O0 @ Compiler Explorer



# f0-O1 @ Compiler Explorer

The screenshot shows the Compiler Explorer interface with two main panes. The left pane displays the LLVM IR Viewer for RISC-V code, and the right pane displays the Opt Pipeline Viewer.

**LLVM IR Viewer (RISC-V rv32gc clang (trunk)):**

```
define dso_local range(i32 0, 2)
entry:
%call = tail call i32 @_factorial
%cmp = icmp eq i32 %call, 6
%conv = zext i1 %cmp to i32
ret i32 %conv
}
# Machine code for function main:
bb.0.entry:
ADJCALLSTACKDOWN 0, 0, implicit
%2:gpr = ADDI $x0, 2
$x10 = COPY %2:gpr; example.c:4
PseudoCALL target-flags(riscv-c)
ADJCALLSTACKUP 0, 0, implicit-c
%3:gpr = COPY $x10; example.c:4
%4:gpr = ADDI %3:gpr, -6; example.c:4
%5:gpr = SLTIU killed %4:gpr, 1
$x10 = COPY %5:gpr; example.c:4
PseudoRET implicit $x10; example.c:4
# End machine code for function main:
```

**Opt Pipeline Viewer (RISC-V rv32gc clang (trunk)):**

The pipeline viewer shows the execution flow through various passes:

- AssignmentTrackingPass on [module]
- SROAPass on main
- IPSCCPPass on [module]
- GlobalOptPass on [module]
- InstCombinePass on main
- PostOrderFunctionAttrsPass on (main)
- TailCallElimPass on main
- RISC-V DAG->DAG Pattern
- Instruction Selection (riscv-isel)
- Slot index numbering (slotindexes)
- Merge disjoint stack slots (stack-coloring)
- Machine code sinking (machine-sink)
- Live Variable Analysis (livevars)
- Eliminate PHI nodes for register allocation (phi-node-elimination)
- Two-Address instruction pass (twoaddressinstruction)
- Slot index numbering (slotindexes)
- Live Interval Analysis

## Instruction Selection

Instruction Selection is the process of translating LLVM code presented to the code generator into target-specific machine instructions. There are several well-known ways to do this in the literature. LLVM uses a SelectionDAG based instruction selector.



SDISel    FastISel (per basic block)    GlobalISel (per function)



GlobalISel (per function)

```
unsigned int MUL(unsigned long long int x, unsigned int y)
{
    return x * y;
}
```

```
define dso_local i32 @MUL(i64 %x, i32 %y) local_unnamed_addr #0 {
entry:
    %0 = trunc i64 %x to i32
    %conv1 = mul i32 %0, %y
    ret i32 %conv1
}
```

```

define dso_local i32 @MUL(i64 %x, i32 %y) local_unnamed_addr #0 {
entry:
    %0 = trunc i64 %x to i32
    %conv1 = mul i32 %0, %y
    ret i32 %conv1
}

```



GraphRoot

dag-combine1 input for MUL::entry







### dag-combine2 input for MUL:entry



### scheduler input for MUL:entry



bb. 0. entry:

liveins: \$x0, \$x2

%2:gpr = COPY \$x2

%0:gpr = COPY \$x0

%3:gpr = MUL %0:gpr, %2:gpr

\$x0 = COPY %3:gpr

PseudoRET implicit \$x0

## SelectionDAG Select Phase

The Select phase is the bulk of the target-specific code for instruction selection. This phase takes a legal SelectionDAG as input, pattern matches the instructions supported by the target to this DAG, and produces a new DAG of target code. For example, consider the following LLVM fragment:





局部变量 a 与 i 的地址: 相对于 SP (栈指针) 的常数偏移量  $C_a$  和  $C_i$



LD RO, #a  
 ADD RO, RO, SP  
 ADD RO, RO, i(SP)  
 LD R1, b  
 INC R1  
 ST \*RO, R1

$R_i \leftarrow$   
 $R_i$     +     $C_a$   
{ if ( $a = 1$ )  
 INC  $R_i$   
 else  
 ADD  $R_i, R_i, \#a$  }



$= \text{ind} + + C_a R_{SP} \text{ ind} + C_i R_{SP} + M_b C_1$

- |                                                  |                       |
|--------------------------------------------------|-----------------------|
| 1) $R_i \rightarrow c_a$                         | { LD Ri, #a }         |
| 2) $R_i \rightarrow M_x$                         | { LD Ri, x }          |
| 3) $M \rightarrow = M_x R_i$                     | { ST x, Ri }          |
| 4) $M \rightarrow = \text{ind} R_i R_j$          | { ST *Ri, Rj }        |
| 5) $R_i \rightarrow \text{ind} + c_a R_j$        | { LD Ri, a(Rj) }      |
| 6) $R_i \rightarrow + R_i \text{ ind} + c_a R_j$ | { ADD Ri, Ri, a(Rj) } |
| 7) $R_i \rightarrow + R_i R_j$                   | { ADD Ri, Ri, Rj }    |
| 8) $R_i \rightarrow + R_i c_1$                   | { INC Ri }            |
| 9) $\boxed{R \rightarrow sp}$                    |                       |
| 10) $\boxed{M \rightarrow m}$                    |                       |

$= \text{ind} + + c_a sp \text{ ind} + c_i sp + m_b c_1$

前缀表示

二义性: 偏向于执行较大的归约, 而不是较小的归约

归约/归约冲突      移入/归约冲突

优先选择较长的归约      优先选择移入动作



# 算术与存储指令及树型 (鲲鹏处理器)

| 指    令              | 树    型                                                                                                |
|---------------------|-------------------------------------------------------------------------------------------------------|
| $r_i$               | x                                                                                                     |
| add $r_i, r_j, r_k$ | $+ \swarrow \searrow$                                                                                 |
| mul $r_i, r_j, r_k$ | $* \swarrow \searrow$                                                                                 |
| sub $r_i, r_j, r_k$ | $- \swarrow \searrow$                                                                                 |
| div $r_i, r_j, r_k$ | $/ \swarrow \searrow$                                                                                 |
| addi $r_i, r_j, c$  | $+ \swarrow \quad + \swarrow \quad c$                                                                 |
| subi $r_i, r_j, c$  | $- \swarrow \quad - \swarrow \quad c$                                                                 |
| ldr $r_i, [r_j, c]$ | $[] \quad [] \quad [] \quad []$<br>$c \quad + \quad c \quad + \quad c \quad  $                        |
| str $[r_j, c], r_i$ | $str \quad str \quad str \quad str$<br>$[] \quad [] \quad [] \quad []$<br>$c \quad + \quad c \quad  $ |

## 瓦片覆盖 (tiling)



```
ldr r1, [fp, a]
addi r2, r0, 4
mul r2, ri, r2
add r1, r1, r2
ldr r2, [fp, b]
str [r1, 0], r2
```



$a[i] = b$

addi r<sub>1</sub>, r<sub>0</sub>, a  
 add r<sub>1</sub>, fp, r<sub>1</sub>  
 ldr r<sub>1</sub>, [r<sub>1</sub>, 0]  
 addi r<sub>2</sub>, r<sub>0</sub>, 4  
 mul r<sub>2</sub>, r<sub>1</sub>, r<sub>2</sub>  
 add r<sub>1</sub>, r<sub>1</sub>, r<sub>2</sub>  
 addi r<sub>2</sub>, r<sub>0</sub>, b  
 add r<sub>2</sub>, fp, r<sub>2</sub>  
 ldr r<sub>2</sub>, [r<sub>2</sub>, 0]  
 str [r<sub>1</sub>, 0], r<sub>2</sub>

小树型覆盖方案

## 瓦片覆盖 (tiling)



最大吞进算法  
(maximal munch)

```
ldr r1, [fp, a]
addi r2, r0, 4
mul r2, ri, r2
add r1, r1, r2
ldr r2, [fp, b]
str [r1, 0], r2
```

最佳覆盖方案

(不存在可以合并成更小代价树型的相邻树型)

## 使用动态规划思想计算最优覆盖方案



$$\forall n \in Node. c(n) = \min_{t \in tiles(n)} \left( c(t) + \sum_{n_i \in descendants(n)} c(n_i) \right)$$

$$\text{Opt} = c(r) \quad (r \text{ is the root})$$



| 瓦片 | 指令   | 瓦片代价 | 叶子节点代价 | 总代价 |
|----|------|------|--------|-----|
|    | add  | 1    | $1+1$  | 3   |
|    | addi | 1    | 1      | 2   |
|    | addi | 1    | 1      | 2   |

| 瓦片 | 指令  | 瓦片代价 | 叶子节点代价 | 总代价 |
|----|-----|------|--------|-----|
|    | ldr | 1    | 2      | 3   |
|    | ldr | 1    | 1      | 2   |
|    | ldr | 1    | 1      | 2   |

Gabriel Hjort Blindell

# Instruction Selection

Principles, Methods, and Applications

 Springer

# Thank You!



Office 926

hfwei@nju.edu.cn