

# RISC-V 분석

커널연구회([www.kernel.bz](http://www.kernel.bz))

정재준([rabi3307@nate.com](mailto:rabi3307@nate.com))



사진: 커널연구회에서 제작한 RISC-V (E31) 임베디드 보드

## 발표자 소개



정재준 ([rgb13307@nate.com](mailto:rgb13307@nate.com)) / 커널연구회([www.kernel.bz](http://www.kernel.bz))

저자는 학창시절 마이크로프로세서 제어 기술을 배웠으며 리눅스 커널을 연구하고 있다. 15년 이상 쌓아온 실무 경험을 바탕으로 "C언어와 자료구조 알고리즘", "리눅스 시스템 프로그래밍2", "리눅스 커널과 디바이스드라이버 실습2", "자료구조 알고리즘 & 머신러닝 상세분석"등의 책을 집필하고, 월간임베디드월드 및 전자과학 잡지에 다수의 글을 기고 하였다. 또한 "맞춤형 문장 자동 번역 시스템 및 이를 위한 데이터베이스 구축방법 (The System for the customized automatic sentence translation and database construction method)"라는 내용으로 프로그래밍을 하여 특허 등록 하였고, 서울시 버스와 지하철 교통카드 요금결제 단말기에 들어가는 리눅스 커널과 디바이스드라이버 개발 프로젝트를 성공적으로 수행했으며 여러가지 임베디드 제품을 개발했다. 저자는 Patterson(California 대학), Hennessy(Stanford 대학) 교수의 저서 "Computer Organization and Design" 책을 읽고 깊은 감명을 받았으며, 컴퓨터구조와 자료구조 알고리즘 효율성 연구를 주제로한 저술활동을 활발히 하고 있다. 아울러 커널연구회([www.kernel.bz](http://www.kernel.bz)) 웹사이트를 운영하면서 연구개발, 교육, 관련기술 공유 등을 위해 노력하고 있다.



- ♣ 저자가 집필한 책들 (시중서점에서 "커널연구회"로 검색하여 구매 가능)

# 목차

## 내용

|                                                          |    |
|----------------------------------------------------------|----|
| RISC-V 분석 .....                                          | 1  |
| 발표자 소개 .....                                             | 2  |
| 목차 .....                                                 | 3  |
| 1. 개요 .....                                              | 5  |
| 1.1 참고 문서 및 서적 .....                                     | 5  |
| 1.2 INSTRUCTION SET 빌드 과정 .....                          | 6  |
| 1.3 ISA (INSTRUCTION SET ARCHITECTURE) .....             | 7  |
| 1.4 CPU 성능(PERFORMANCE) 평가 .....                         | 8  |
| 1.5. INSTRUCTION SET 비교 .....                            | 9  |
| 2. RISC-V INSTRUCTION SET .....                          | 12 |
| 2.1 RISC-V 레지스터 정의 .....                                 | 13 |
| 2.2 RISC-V 기본 명령 형식 .....                                | 13 |
| 2.2.1 Integer Computational Instructions .....           | 14 |
| 2.2.2 Control Transfer Instructions .....                | 15 |
| 2.2.3 Load and Store Instructions .....                  | 15 |
| 2.2.4 Memory Model .....                                 | 16 |
| 2.2.5 Control and Status Register Instructions .....     | 16 |
| 2.3 MULTIPLICATION AND DIVISION .....                    | 17 |
| 2.3.1 Multiplication Operations .....                    | 17 |
| 2.3.2 Division Operations .....                          | 17 |
| 2.4 FLOATING-POINT INSTRUCTIONS .....                    | 18 |
| 2.4.1 Single-Precision Load and Store Instructions ..... | 18 |
| 2.4.2 Double-Precision Load and Store Instructions ..... | 19 |
| 2.4.3 Quad-Precision Load and Store Instructions .....   | 19 |
| 2.5 COMPRESSED INSTRUCTIONS .....                        | 20 |
| 2.6 ATOMIC INSTRUCTIONS .....                            | 20 |
| 3. RISC-V 어셈블리 언어 .....                                  | 21 |

|                                         |           |
|-----------------------------------------|-----------|
| 3.1 기본명령 .....                          | 21        |
| 3.2 곱셈, 나눗셈 연산 명령 .....                 | 22        |
| 3.3 실수 연산 명령 .....                      | 23        |
| 3.4 명령 사용 빈도 .....                      | 24        |
| <b>4. RISC-V ADDRESSING MODE.....</b>   | <b>25</b> |
| <b>5. PIPELINING.....</b>               | <b>26</b> |
| <b>6. FREEDOM STUDIO .....</b>          | <b>29</b> |
| <b>7. RISC-V 커널 소스 분석 .....</b>         | <b>31</b> |
| 7.1 RISC-V 컴파일러(툴체인) 설치 .....           | 31        |
| 7.2 RISC-V 커널 소스 컴파일 .....              | 33        |
| 7.2.1 최신 리눅스 커널 소스(V4.15 이상) 빌드.....    | 33        |
| 7.2.2 <i>RISC-V 커널 소스 빌드</i> .....      | 34        |
| 7.3 RISC-V 커널 소스 분석 .....               | 36        |
| 7.3.1 중요 소스 경로(요약) .....                | 36        |
| 7.3.2 64비트 메모리맵 .....                   | 37        |
| 7.3.2 <i>RISC-V 아키텍쳐 Setup 소스</i> ..... | 38        |
| 7.3.3 <i>Atomic Operation</i> .....     | 39        |
| 7.3.4 <i>SpinLock</i> .....             | 41        |
| 7.3.3 <i>Memory Barrier</i> .....       | 44        |

# 1. 개요

## 1.1 참고 문서 및 서적

### The RISC-V Instruction Set Manual (PDF)

Volume I: User-Level ISA

Document Version 2.2

May 7, 2017

### Computer Organization and Design RISC-V Edition

**David A. Patterson**

University of California, Berkeley

**John L. Hennessy**

Stanford University



## 1.2 Instruction Set 빌드 과정



출처: Computer Organization and Design RISC-V Edition (David A. Patterson, John L. Hennessy)

## 1.3 ISA (Instruction Set Architecture)

CISC(Complex Instruction Set Computer)

Computer Architecture: A Quantitative Approach (Hennessy and Patterson)

RISC(Reduced Instruction Set Computer)

MIPS, ARM Instruction Set

### RISC-V

동일한 명령 셋으로 32 비트와 64 비트 주소 명령어 모두 지원

Open Source Instruction Set

RISC-V foundation (<https://riscv.org/>)



ARM, Intel 제외

AMD, Google, Hewlett Packard, IBM, Microsoft, NVIDIA, Oracle, Qualcomm 참여



## 1.4 CPU 성능(Performance) 평가

clock cycles per instruction (CPI)

$$CPI = \frac{\text{CPU clock cycles}}{\text{Instruction count}}$$

### 전력소모

$$\text{Power} \propto 1/2 \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched}$$

### MIPS (million instructions per second)

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6}$$

$$\text{MIPS} = \frac{\frac{\text{Instruction count}}{\text{Instruction count} \times \text{CPI}} \times 10^6}{\text{Clock rate}} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$

## 1.5. Instruction Set 비교

### Intel x86 Instruction Set

a. JE EIP + displacement

|    |           |              |
|----|-----------|--------------|
| 4  | 4         | 8            |
| JE | Condition | Displacement |

b. CALL

|      |        |
|------|--------|
| 8    | 32     |
| CALL | Offset |

c. MOV EBX, [EDI + 45]

|     |   |   |                 |              |
|-----|---|---|-----------------|--------------|
| 6   | 1 | 1 | 8               | 8            |
| MOV | d | w | r/m<br>Postbyte | Displacement |

d. PUSH ESI

|      |     |
|------|-----|
| 5    | 3   |
| PUSH | Reg |

e. ADD EAX, #6765

|     |     |   |           |
|-----|-----|---|-----------|
| 4   | 3   | 1 | 32        |
| ADD | Reg | w | Immediate |

f. TEST EDX, #42

|      |   |          |           |
|------|---|----------|-----------|
| 7    | 1 | 8        | 32        |
| TEST | w | Postbyte | Immediate |

출처: Computer Organization and Design RISC-V Edition (David A. Patterson, John L. Hennessy)

### ARM과 MIPS Instruction Set 비교

|                   |                  |                  |                                    |                     |                                |                  |
|-------------------|------------------|------------------|------------------------------------|---------------------|--------------------------------|------------------|
|                   | 31 28 27         | 20 19            | 16 15                              | 12 11               | 4 3                            | 0                |
| ARM               | Opx <sup>4</sup> | Op <sup>8</sup>  | Rs1 <sup>4</sup>                   | Rd <sup>4</sup>     | Opx <sup>8</sup>               | Rs2 <sup>4</sup> |
| Register-register | 31 26 25         | 21 20            | 16 15                              | 11 10               | 6 5                            | 0                |
| MIPS              | Op <sup>6</sup>  | Rs1 <sup>5</sup> | Rs2 <sup>5</sup>                   | Rd <sup>5</sup>     | Const <sup>5</sup>             | Op <sup>6</sup>  |
|                   | 31 28 27         | 20 19            | 16 15                              | 12 11               |                                | 0                |
| ARM               | Opx <sup>4</sup> | Op <sup>8</sup>  | Rs1 <sup>4</sup>                   | Rd <sup>4</sup>     | Const <sup>12</sup>            |                  |
| Data transfer     | 31 26 25         | 21 20            | 16 15                              |                     |                                | 0                |
| MIPS              | Op <sup>6</sup>  | Rs1 <sup>5</sup> | Rd <sup>5</sup>                    | Const <sup>16</sup> |                                |                  |
|                   | 31 28 27 24 23   |                  |                                    |                     |                                | 0                |
| ARM               | Opx <sup>4</sup> | Op <sup>4</sup>  |                                    |                     | Const <sup>24</sup>            |                  |
| Branch            | 31 26 25         | 21 20            | 16 15                              |                     |                                | 0                |
| MIPS              | Op <sup>6</sup>  | Rs1 <sup>5</sup> | Opx <sup>5</sup> /Rs2 <sup>5</sup> | Const <sup>16</sup> |                                |                  |
|                   | 31 28 27 24 23   |                  |                                    |                     |                                | 0                |
| ARM               | Opx <sup>4</sup> | Op <sup>4</sup>  |                                    | Const <sup>24</sup> |                                |                  |
| Jump/Call         | 31 26 25         |                  |                                    |                     |                                | 0                |
| MIPS              | Op <sup>6</sup>  |                  | Const <sup>26</sup>                |                     |                                |                  |
|                   |                  |                  |                                    |                     | ■ Opcode □ Register □ Constant |                  |

출처: Computer Organization and Design RISC-V Edition (David A. Patterson, John L. Hennessy)

## RISC-V와 MIPS Instruction Set 비교

### Register-register

|        | 31 | 25 24     | 20 19  | 15 14  | 12 11     | 7 6      | 0         |
|--------|----|-----------|--------|--------|-----------|----------|-----------|
| RISC-V |    | funct7(7) | rs2(5) | rs1(5) | funct3(3) | rd(5)    | opcode(7) |
| MIPS   | 31 | 26 25     | 21 20  | 16 15  | 11 10     | 6 5      | 0         |
|        |    | Op(6)     | Rs1(5) | Rs2(5) | Rd(5)     | Const(5) | Opx(6)    |

### Load

|        | 31    | 20 19         | 15 14  | 12 11     | 7 6       | 0         |
|--------|-------|---------------|--------|-----------|-----------|-----------|
| RISC-V |       | immediate(12) | rs1(5) | funct3(3) | rd(5)     | opcode(7) |
| MIPS   | 31    | 26 25         | 21 20  | 16 15     |           | 0         |
|        | Op(6) | Rs1(5)        | Rs2(5) |           | Const(16) |           |

### Store

|        | 31    | 25 24        | 20 19  | 15 14  | 12 11     | 7 6          | 0         |
|--------|-------|--------------|--------|--------|-----------|--------------|-----------|
| RISC-V |       | immediate(7) | rs2(5) | rs1(5) | funct3(3) | immediate(5) | opcode(7) |
| MIPS   | 31    | 26 25        | 21 20  | 16 15  |           | Const(16)    | 0         |
|        | Op(6) | Rs1(5)       | Rs2(5) |        | Const(16) |              |           |

### Branch

|        | 31    | 25 24        | 20 19     | 15 14  | 12 11     | 7 6          | 0         |
|--------|-------|--------------|-----------|--------|-----------|--------------|-----------|
| RISC-V |       | immediate(7) | rs2(5)    | rs1(5) | funct3(3) | immediate(5) | opcode(7) |
| MIPS   | 31    | 26 25        | 21 20     | 16 15  |           | Const(16)    | 0         |
|        | Op(6) | Rs1(5)       | Op/Rs2(5) |        | Const(16) |              |           |

출처: Computer Organization and Design RISC-V Edition (David A. Patterson, John L. Hennessy)

## Subset Naming Convention

| Subset                                   | Name      |
|------------------------------------------|-----------|
| Standard General-Purpose ISA             |           |
| Integer                                  | I         |
| Integer Multiplication and Division      | M         |
| Atomics                                  | A         |
| Single-Precision Floating-Point          | F         |
| Double-Precision Floating-Point          | D         |
| General                                  | G = IMAFD |
| Standard User-Level Extensions           |           |
| Quad-Precision Floating-Point            | Q         |
| Decimal Floating-Point                   | L         |
| 16-bit Compressed Instructions           | C         |
| Bit Manipulation                         | B         |
| Dynamic Languages                        | J         |
| Transactional Memory                     | T         |
| Packed-SIMD Extensions                   | P         |
| Vector Extensions                        | V         |
| User-Level Interrupts                    | N         |
| Non-Standard User-Level Extensions       |           |
| Non-standard extension "abc"             | Xabc      |
| Standard Supervisor-Level ISA            |           |
| Supervisor extension "def"               | Sdef      |
| Non-Standard Supervisor-Level Extensions |           |
| Supervisor extension "ghi"               | SXghi     |

## 2. RISC-V Instruction Set

### Instruction Length Encoding

|         |                                          |                                     |
|---------|------------------------------------------|-------------------------------------|
|         | xxxxxxxxxxxxxxaa                         | 16-bit ( $aa \neq 11$ )             |
|         | xxxxxxxxxxxxxxx   xxxxxxxxxxxxxxxbbb11   | 32-bit ( $bbb \neq 111$ )           |
| ...xxxx | xxxxxxxxxxxxxxx   xxxxxxxxxxxxxxx011111  | 48-bit                              |
| ...xxxx | xxxxxxxxxxxxxxx   xxxxxxxxxxxxxxx0111111 | 64-bit                              |
| ...xxxx | xxxxxxxxxxxxxxx   xnnnnxxxxx1111111      | ( $80+16*nnn$ )-bit, $nnn \neq 111$ |
| ...xxxx | xxxxxxxxxxxxxxx   x111xxxxx1111111       | Reserved for $\geq 192$ -bits       |

Byte Address: base+4                                  base+2                                  base

Figure 1.1: RISC-V instruction length encoding.

| inst[4:2] | 000    | 001      | 010             | 011      | 100    | 101             | 110                   | 111<br>(> 32b) |
|-----------|--------|----------|-----------------|----------|--------|-----------------|-----------------------|----------------|
| inst[6:5] |        |          |                 |          |        |                 |                       |                |
| 00        | LOAD   | LOAD-FP  | <i>custom-0</i> | MISC-MEM | OP-IMM | AUIPC           | OP-IMM-32             | 48b            |
| 01        | STORE  | STORE-FP | <i>custom-1</i> | AMO      | OP     | LUI             | OP-32                 | 64b            |
| 10        | MADD   | MSUB     | NMSUB           | NMADD    | OP-FP  | <i>reserved</i> | <i>custom-2/rv128</i> | 48b            |
| 11        | BRANCH | JALR     | <i>reserved</i> | JAL      | SYSTEM | <i>reserved</i> | <i>custom-3/rv128</i> | $\geq 80b$     |

Table 19.1: RISC-V base opcode map, inst[1:0]=11

## 2.1 RISC-V 레지스터 정의

| Name    | Register number | Usage                          | Preserved on call? |
|---------|-----------------|--------------------------------|--------------------|
| x0      | 0               | The constant value 0           | n.a.               |
| x1 (ra) | 1               | Return address (link register) | yes                |
| x2 (sp) | 2               | Stack pointer                  | yes                |
| x3 (gp) | 3               | Global pointer                 | yes                |
| x4 (tp) | 4               | Thread pointer                 | yes                |
| x5-x7   | 5-7             | Temporaries                    | no                 |
| x8-x9   | 8-9             | Saved                          | yes                |
| x10-x17 | 10-17           | Arguments/results              | no                 |
| x18-x27 | 18-27           | Saved                          | yes                |
| x28-x31 | 28-31           | Temporaries                    | no                 |

FIGURE 2.14 RISC-V register conventions.

출처: Computer Organization and Design RISC-V Edition (David A. Patterson, John L. Hennessy)

For RV32, the x registers are 32 bits wide, and for RV64, they are 64 bits wide. This document uses the term XLEN to refer to the current width of an x register in bits (either 32 or 64)

## 2.2 RISC-V 기본 명령 형식



Figure 2.3: RISC-V base instruction formats showing immediate variants.

## 2.2.1 Integer Computational Instructions

### Integer Register-Immediate Instructions

| 31                | 20 19 | 15 14         | 12 11 | 7 6    | 0 |
|-------------------|-------|---------------|-------|--------|---|
| imm[11:0]         | rs1   | funct3        | rd    | opcode |   |
| 12                | 5     | 3             | 5     | 7      |   |
| I-immediate[11:0] | src   | ADDI/SLTI[U]  | dest  | OP-IMM |   |
| I-immediate[11:0] | src   | ANDI/ORI/XORI | dest  | OP-IMM |   |

| 31        | 25 24      | 20 19 | 15 14  | 12 11 | 7 6    | 0 |
|-----------|------------|-------|--------|-------|--------|---|
| imm[11:5] | imm[4:0]   | rs1   | funct3 | rd    | opcode |   |
| 7         | 5          | 5     | 3      | 5     | 7      |   |
| 0000000   | shamt[4:0] | src   | SLLI   | dest  | OP-IMM |   |
| 0000000   | shamt[4:0] | src   | SRLI   | dest  | OP-IMM |   |
| 0100000   | shamt[4:0] | src   | SRAI   | dest  | OP-IMM |   |

| 31                 | 20 | 12 11 | 7 6    | 0 |
|--------------------|----|-------|--------|---|
| imm[31:12]         |    | rd    | opcode |   |
|                    | 5  | 5     | 7      |   |
| U-immediate[31:12] |    | dest  | LUI    |   |
| U-immediate[31:12] |    | dest  | AUIPC  |   |

### Integer Register-Register Operations

| 31      | 25 24 | 20 19 | 15 14        | 12 11 | 7 6    | 0 |
|---------|-------|-------|--------------|-------|--------|---|
| funct7  | rs2   | rs1   | funct3       | rd    | opcode |   |
| 7       | 5     | 5     | 3            | 5     | 7      |   |
| 0000000 | src2  | src1  | ADD/SLT/SLTU | dest  | OP     |   |
| 0000000 | src2  | src1  | AND/OR/XOR   | dest  | OP     |   |
| 0000000 | src2  | src1  | SLL/SRL      | dest  | OP     |   |
| 0100000 | src2  | src1  | SUB/SRA      | dest  | OP     |   |

### NOP Instruction

| 31        | 20 19 | 15 14  | 12 11 | 7 6    | 0 |
|-----------|-------|--------|-------|--------|---|
| imm[11:0] | rs1   | funct3 | rd    | opcode |   |
| 12        | 5     | 3      | 5     | 7      |   |
| 0         | 0     | ADDI   | 0     | OP-IMM |   |

## 2.2.2 Control Transfer Instructions

### Unconditional Jumps

|               |          |         |         |                  |             |             |             |
|---------------|----------|---------|---------|------------------|-------------|-------------|-------------|
| 31<br>imm[20] | 30<br>10 | 21<br>1 | 20<br>8 | 19<br>imm[19:12] | 12 11<br>rd | 7 6<br>dest | 0<br>opcode |
|               |          |         |         |                  |             |             | JAL         |

|                 |                             |                    |                 |                   |             |
|-----------------|-----------------------------|--------------------|-----------------|-------------------|-------------|
| 31<br>imm[11:0] | 20 19<br>12<br>offset[11:0] | 15 14<br>5<br>base | 12 11<br>3<br>0 | 7 6<br>rd<br>dest | 0<br>opcode |
|                 |                             |                    |                 |                   | JALR        |

### Conditional Branches

|                      |                 |              |              |                 |                     |                     |                     |             |
|----------------------|-----------------|--------------|--------------|-----------------|---------------------|---------------------|---------------------|-------------|
| 31<br>imm[12]        | 30<br>imm[10:5] | 25 24<br>rs2 | 20 19<br>rs1 | 15 14<br>funct3 | 12 11<br>imm[4:1]   | 8<br>imm[11]        | 7<br>6              | 0<br>opcode |
| 1<br>offset[12,10:5] | 6<br>src2       | 5<br>src1    | 5<br>src1    | 3<br>BEQ/BNE    | 4<br>offset[11,4:1] | 1<br>offset[11,4:1] | 7<br>offset[11,4:1] | BRANCH      |
| offset[12,10:5]      | src2            | src1         | src1         | BLT[U]          | offset[11,4:1]      | offset[11,4:1]      | 7                   | BRANCH      |
| offset[12,10:5]      | src2            | src1         | src1         | BGE[U]          | offset[11,4:1]      | offset[11,4:1]      | 7                   | BRANCH      |

## 2.2.3 Load and Store Instructions

|                 |                             |                    |                     |                   |             |
|-----------------|-----------------------------|--------------------|---------------------|-------------------|-------------|
| 31<br>imm[11:0] | 20 19<br>12<br>offset[11:0] | 15 14<br>5<br>base | 12 11<br>3<br>width | 7 6<br>rd<br>dest | 0<br>opcode |
|                 |                             |                    |                     |                   | LOAD        |

|                 |                            |                   |                    |                     |                                |             |
|-----------------|----------------------------|-------------------|--------------------|---------------------|--------------------------------|-------------|
| 31<br>imm[11:5] | 25 24<br>7<br>offset[11:5] | 20 19<br>5<br>src | 15 14<br>5<br>base | 12 11<br>3<br>width | 7 6<br>imm[4:0]<br>offset[4:0] | 0<br>opcode |
|                 |                            |                   |                    |                     |                                | STORE       |

## 2.2.4 Memory Model

### RISC-V harts

| 31 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20  | 19     | 15 14 | 12 11 | 7 6      | 0 |
|----|----|----|----|----|----|----|----|----|-----|--------|-------|-------|----------|---|
| 0  | PI | PO | PR | PW | SI | SO | SR | SW | rs1 | funct3 | rd    |       | opcode   |   |
| 4  | 1  | 1  | 1  | 1  | 1  | 1  | 1  | 1  | 1   | 5      | 3     | 5     | 7        |   |
| 0  |    |    |    |    |    |    |    |    |     | 0      | FENCE | 0     | MISC-MEM |   |

| 31 | 20 19     | 15 14 | 12 11   | 7 6 | 0        |
|----|-----------|-------|---------|-----|----------|
|    | imm[11:0] | rs1   | funct3  | rd  | opcode   |
| 12 |           | 5     | 3       | 5   | 7        |
| 0  |           | 0     | FENCE.I | 0   | MISC-MEM |

## 2.2.5 Control and Status Register Instructions

### CSR Instructions

| 31          | 20 19     | 15 14  | 12 11  | 7 6 | 0      |
|-------------|-----------|--------|--------|-----|--------|
|             |           | rs1    | funct3 | rd  | opcode |
| 12          |           | 5      | 3      | 5   | 7      |
| source/dest | source    | CSRRW  | dest   |     | SYSTEM |
| source/dest | source    | CSRRS  | dest   |     | SYSTEM |
| source/dest | source    | CSRRC  | dest   |     | SYSTEM |
| source/dest | uimm[4:0] | CSRRWI | dest   |     | SYSTEM |
| source/dest | uimm[4:0] | CSRRSI | dest   |     | SYSTEM |
| source/dest | uimm[4:0] | CSRRCI | dest   |     | SYSTEM |

### Timers and Counters

| 31           | 20 19 | 15 14 | 12 11  | 7 6 | 0      |
|--------------|-------|-------|--------|-----|--------|
|              |       | rs1   | funct3 | rd  | opcode |
| 12           |       | 5     | 3      | 5   | 7      |
| RDCYCLE[H]   | 0     | CSRRS | dest   |     | SYSTEM |
| RDTIME[H]    | 0     | CSRRS | dest   |     | SYSTEM |
| RDINSTRET[H] | 0     | CSRRS | dest   |     | SYSTEM |

## 2.3 Multiplication and Division

### 2.3.1 Multiplication Operations

| 31     | 25 24      | 20 19        | 15 14          | 12 11 | 7 6    | 0 |
|--------|------------|--------------|----------------|-------|--------|---|
| funct7 | rs2        | rs1          | funct3         | rd    | opcode |   |
| 7      | 5          | 5            | 3              | 5     | 7      |   |
| MULDIV | multiplier | multiplicand | MUL/MULH[[S]U] | dest  | OP     |   |
| MULDIV | multiplier | multiplicand | MULW           | dest  | OP-32  |   |

### 2.3.2 Division Operations

| 31     | 25 24   | 20 19    | 15 14           | 12 11 | 7 6    | 0 |
|--------|---------|----------|-----------------|-------|--------|---|
| funct7 | rs2     | rs1      | funct3          | rd    | opcode |   |
| 7      | 5       | 5        | 3               | 5     | 7      |   |
| MULDIV | divisor | dividend | DIV[U]/REM[U]   | dest  | OP     |   |
| MULDIV | divisor | dividend | DIV[U]W/REM[U]W | dest  | OP-32  |   |

## 2.4 Floating-Point Instructions



Figure 8.2: Floating-point control and status register.

| Rounding Mode | Mnemonic | Meaning                                                                                                         |
|---------------|----------|-----------------------------------------------------------------------------------------------------------------|
| 000           | RNE      | Round to Nearest, ties to Even                                                                                  |
| 001           | RTZ      | Round towards Zero                                                                                              |
| 010           | RDN      | Round Down (towards $-\infty$ )                                                                                 |
| 011           | RUP      | Round Up (towards $+\infty$ )                                                                                   |
| 100           | RMM      | Round to Nearest, ties to Max Magnitude                                                                         |
| 101           |          | <i>Invalid. Reserved for future use.</i>                                                                        |
| 110           |          | <i>Invalid. Reserved for future use.</i>                                                                        |
| 111           |          | In instruction's <i>rm</i> field, selects dynamic rounding mode;<br>In Rounding Mode register, <i>Invalid</i> . |

Table 8.1: Rounding mode encoding.

### 2.4.1 Single-Precision Load and Store Instructions



## 2.4.2 Double-Precision Load and Store Instructions

| 31                 | 20 19     | 15 14  | 12 11     | 7 6          | 0 |
|--------------------|-----------|--------|-----------|--------------|---|
| imm[11:0]          | rs1       | width  | rd        | opcode       |   |
| 12<br>offset[11:0] | 5<br>base | 3<br>D | 5<br>dest | 7<br>LOAD-FP |   |

| 31                | 25 24    | 20 19     | 15 14  | 12 11            | 7 6           | 0 |
|-------------------|----------|-----------|--------|------------------|---------------|---|
| imm[11:5]         | rs2      | rs1       | width  | imm[4:0]         | opcode        |   |
| 7<br>offset[11:5] | 5<br>src | 5<br>base | 3<br>D | 5<br>offset[4:0] | 7<br>STORE-FP |   |

FLD and FSD are only guaranteed to execute atomically if the effective address is naturally aligned and XLEN≥64.

## 2.4.3 Quad-Precision Load and Store Instructions

| 31                 | 20 19     | 15 14  | 12 11     | 7 6          | 0 |
|--------------------|-----------|--------|-----------|--------------|---|
| imm[11:0]          | rs1       | width  | rd        | opcode       |   |
| 12<br>offset[11:0] | 5<br>base | 3<br>Q | 5<br>dest | 7<br>LOAD-FP |   |

| 31                | 25 24    | 20 19     | 15 14  | 12 11            | 7 6           | 0 |
|-------------------|----------|-----------|--------|------------------|---------------|---|
| imm[11:5]         | rs2      | rs1       | width  | imm[4:0]         | opcode        |   |
| 7<br>offset[11:5] | 5<br>src | 5<br>base | 3<br>Q | 5<br>offset[4:0] | 7<br>STORE-FP |   |

FLQ and FSQ are only guaranteed to execute atomically if the effective address is naturally aligned and XLEN=128.

## 2.5 Compressed Instructions

| Format | Meaning              | 15 | 14 | 13 | 12 | 11     | 10  | 9      | 8      | 7           | 6 | 5   | 4      | 3    | 2  | 1  | 0  |
|--------|----------------------|----|----|----|----|--------|-----|--------|--------|-------------|---|-----|--------|------|----|----|----|
| CR     | Register             |    |    |    |    | funct4 |     | rd/rs1 |        |             |   | rs2 |        |      | op |    |    |
| CI     | Immediate            |    |    |    |    | funct3 | imm |        | rd/rs1 |             |   | imm |        |      | op |    |    |
| CSS    | Stack-relative Store |    |    |    |    | funct3 |     |        | imm    |             |   | rs2 |        |      | op |    |    |
| CIW    | Wide Immediate       |    |    |    |    | funct3 |     |        | imm    |             |   | rd' |        |      | op |    |    |
| CL     | Load                 |    |    |    |    | funct3 |     | imm    |        | rs1'        |   | imm |        | rd'  |    |    | op |
| CS     | Store                |    |    |    |    | funct3 |     | imm    |        | rs1'        |   | imm |        | rs2' |    |    | op |
| CB     | Branch               |    |    |    |    | funct3 |     | offset |        | rs1'        |   |     | offset |      |    | op |    |
| CJ     | Jump                 |    |    |    |    | funct3 |     |        |        | jump target |   |     |        |      | op |    |    |

Table 12.1: Compressed 16-bit RVC instruction formats.

## 2.6 Atomic Instructions

| 31     | 27       | 26 | 25  | 24  | 20 19 | 15 14  | 12 | 11   | 7 6 | 0      |
|--------|----------|----|-----|-----|-------|--------|----|------|-----|--------|
| funct5 | aq       | rl |     | rs2 | rs1   | funct3 |    | rd   |     | opcode |
| 5      | 1        | 1  |     | 5   | 5     | 3      |    | 5    |     | 7      |
| LR     | ordering |    |     | 0   | addr  | width  |    | dest |     | AMO    |
| SC     | ordering |    | src |     | addr  | width  |    | dest |     | AMO    |

| 31            | 27       | 26 | 25 | 24  | 20 19 | 15 14  | 12 | 11   | 7 6 | 0      |
|---------------|----------|----|----|-----|-------|--------|----|------|-----|--------|
| funct5        | aq       | rl |    | rs2 | rs1   | funct3 |    | rd   |     | opcode |
| 5             | 1        | 1  |    | 5   | 5     | 3      |    | 5    |     | 7      |
| AMOSWAP.W/D   | ordering |    |    | src | addr  | width  |    | dest |     | AMO    |
| AMOADD.W/D    | ordering |    |    | src | addr  | width  |    | dest |     | AMO    |
| AMOAND.W/D    | ordering |    |    | src | addr  | width  |    | dest |     | AMO    |
| AMOOR.W/D     | ordering |    |    | src | addr  | width  |    | dest |     | AMO    |
| AMOXOR.W/D    | ordering |    |    | src | addr  | width  |    | dest |     | AMO    |
| AMOMAX[U].W/D | ordering |    |    | src | addr  | width  |    | dest |     | AMO    |
| AMOMIN[U].W/D | ordering |    |    | src | addr  | width  |    | dest |     | AMO    |

## 3. RISC-V 어셈블리 언어

### 3.1 기본명령

RISC-V operands

| Name                  | Example                                                        | Comments                                                                                                                                                                             |
|-----------------------|----------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 32 registers          | x0-x31                                                         | Fast locations for data. In RISC-V, data must be in registers to perform arithmetic. Register x0 always equals 0.                                                                    |
| $2^{63}$ memory words | Memory[0], Memory[8], ..., Memory[18,446,744,073,709,551, 608] | Accessed only by data transfer instructions. RISC-V uses byte addresses, so sequential doubleword accesses differ by 8. Memory holds data structures, arrays, and spilled registers. |

RISC-V assembly language

| Category      | Instruction                      | Example           | Meaning                            | Comments                                   |
|---------------|----------------------------------|-------------------|------------------------------------|--------------------------------------------|
| Arithmetic    | Add                              | add x5, x6, x7    | $x5 = x6 + x7$                     | Three register operands; add               |
|               | Subtract                         | sub x5, x6, x7    | $x5 = x6 - x7$                     | Three register operands; subtract          |
|               | Add immediate                    | addi x5, x6, 20   | $x5 = x6 + 20$                     | Used to add constants                      |
| Data transfer | Load doubleword                  | ld x5, 40(x6)     | $x5 = \text{Memory}[x6 + 40]$      | Doubleword from memory to register         |
|               | Store doubleword                 | sd x5, 40(x6)     | $\text{Memory}[x6 + 40] = x5$      | Doubleword from register to memory         |
|               | Load word                        | lw x5, 40(x6)     | $x5 = \text{Memory}[x6 + 40]$      | Word from memory to register               |
|               | Load word, unsigned              | lwu x5, 40(x6)    | $x5 = \text{Memory}[x6 + 40]$      | Unsigned word from memory to register      |
|               | Store word                       | sw x5, 40(x6)     | $\text{Memory}[x6 + 40] = x5$      | Word from register to memory               |
|               | Load halfword                    | lh x5, 40(x6)     | $x5 = \text{Memory}[x6 + 40]$      | Halfword from memory to register           |
|               | Load halfword, unsigned          | lhu x5, 40(x6)    | $x5 = \text{Memory}[x6 + 40]$      | Unsigned halfword from memory to register  |
|               | Store halfword                   | sh x5, 40(x6)     | $\text{Memory}[x6 + 40] = x5$      | Halfword from register to memory           |
|               | Load byte                        | lb x5, 40(x6)     | $x5 = \text{Memory}[x6 + 40]$      | Byte from memory to register               |
|               | Load byte, unsigned              | lbu x5, 40(x6)    | $x5 = \text{Memory}[x6 + 40]$      | Byte unsigned from memory to register      |
|               | Store byte                       | sb x5, 40(x6)     | $\text{Memory}[x6 + 40] = x5$      | Byte from register to memory               |
|               | Load reserved                    | lr.d x5, (x6)     | $x5 = \text{Memory}[x6]$           | Load; 1st half of atomic swap              |
|               | Store conditional                | sc.d x7, x5, (x6) | $\text{Memory}[x6] = x5; x7 = 0/1$ | Store; 2nd half of atomic swap             |
|               | Load upper immediate             | lui x5, 0x12345   | $x5 = 0x12345000$                  | Loads 20-bit constant shifted left 12 bits |
| Logical       | And                              | and x5, x6, x7    | $x5 = x6 \& x7$                    | Three reg. operands; bit-by-bit AND        |
|               | Inclusive or                     | or x5, x6, x8     | $x5 = x6   x8$                     | Three reg. operands; bit-by-bit OR         |
|               | Exclusive or                     | xor x5, x6, x9    | $x5 = x6 ^ x9$                     | Three reg. operands; bit-by-bit XOR        |
|               | And immediate                    | andi x5, x6, 20   | $x5 = x6 \& 20$                    | Bit-by-bit AND reg. with constant          |
|               | Inclusive or immediate           | ori x5, x6, 20    | $x5 = x6   20$                     | Bit-by-bit OR reg. with constant           |
|               | Exclusive or immediate           | xori x5, x6, 20   | $x5 = x6 ^ 20$                     | Bit-by-bit XOR reg. with constant          |
| Shift         | Shift left logical               | sll x5, x6, x7    | $x5 = x6 \ll x7$                   | Shift left by register                     |
|               | Shift right logical              | srl x5, x6, x7    | $x5 = x6 \gg x7$                   | Shift right by register                    |
|               | Shift right arithmetic           | sra x5, x6, x7    | $x5 = x6 \gg x7$                   | Arithmetic shift right by register         |
|               | Shift left logical immediate     | slli x5, x6, 3    | $x5 = x6 \ll 3$                    | Shift left by immediate                    |
|               | Shift right logical immediate    | srli x5, x6, 3    | $x5 = x6 \gg 3$                    | Shift right by immediate                   |
|               | Shift right arithmetic immediate | srai x5, x6, 3    | $x5 = x6 \gg 3$                    | Arithmetic shift right by immediate        |

|                      |                                      |                  |                                |                                                            |
|----------------------|--------------------------------------|------------------|--------------------------------|------------------------------------------------------------|
| Conditional branch   | Branch if equal                      | beq x5, x6, 100  | if ( $x5 == x6$ ) go to PC+100 | PC-relative branch if registers equal                      |
|                      | Branch if not equal                  | bne x5, x6, 100  | if ( $x5 != x6$ ) go to PC+100 | PC-relative branch if registers not equal                  |
|                      | Branch if less than                  | blt x5, x6, 100  | if ( $x5 < x6$ ) go to PC+100  | PC-relative branch if registers less                       |
|                      | Branch if greater or equal           | bge x5, x6, 100  | if ( $x5 >= x6$ ) go to PC+100 | PC-relative branch if registers greater or equal           |
|                      | Branch if less, unsigned             | bltu x5, x6, 100 | if ( $x5 < x6$ ) go to PC+100  | PC-relative branch if registers less, unsigned             |
|                      | Branch if greater or equal, unsigned | bgeu x5, x6, 100 | if ( $x5 >= x6$ ) go to PC+100 | PC-relative branch if registers greater or equal, unsigned |
| Unconditional branch | Jump and link                        | jal x1, 100      | x1 = PC+4; go to PC+100        | PC-relative procedure call                                 |
|                      | Jump and link register               | jalr x1, 100(x5) | x1 = PC+4; go to x5+100        | Procedure return; indirect call                            |

## 3.2 곱셈, 나눗셈 연산 명령

| Category   | Instruction                      | Example           | Meaning                                       | Comments                                         |
|------------|----------------------------------|-------------------|-----------------------------------------------|--------------------------------------------------|
| Arithmetic | Add                              | add x5, x6, x7    | $x5 = x6 + x7$                                | Three register operands                          |
|            | Subtract                         | sub x5, x6, x7    | $x5 = x6 - x7$                                | Three register operands                          |
|            | Add immediate                    | addi x5, x6, 20   | $x5 = x6 + 20$                                | Used to add constants                            |
|            | Set if less than                 | slt x5, x6, x7    | $x5 = 1 \text{ if } x5 < x6, \text{ else } 0$ | Three register operands                          |
|            | Set if less than, unsigned       | sltu x5, x6, x7   | $x5 = 1 \text{ if } x5 < x6, \text{ else } 0$ | Three register operands                          |
|            | Set if less than, immediate      | slli x5, x6, x7   | $x5 = 1 \text{ if } x5 < x6, \text{ else } 0$ | Comparison with immediate                        |
|            | Set if less than immediate, uns. | slliu x5, x6, x7  | $x5 = 1 \text{ if } x5 < x6, \text{ else } 0$ | Comparison with immediate                        |
|            | Multiply                         | mul x5, x6, x7    | $x5 = x6 \times x7$                           | Lower 64 bits of 128-bit product                 |
|            | Multiply high                    | mulh x5, x6, x7   | $x5 = (x6 \times x7) \gg 64$                  | Upper 64 bits of 128-bit signed product          |
|            | Multiply high, unsigned          | mulhu x5, x6, x7  | $x5 = (x6 \times x7) \gg 64$                  | Upper 64 bits of 128-bit unsigned product        |
|            | Multiply high, signed-unsigned   | mulhsu x5, x6, x7 | $x5 = (x6 \times x7) \gg 64$                  | Upper 64 bits of 128-bit signed-unsigned product |
|            | Divide                           | div x5, x6, x7    | $x5 = x6 / x7$                                | Divide signed 64-bit numbers                     |
|            | Divide unsigned                  | divu x5, x6, x7   | $x5 = x6 / x7$                                | Divide unsigned 64-bit numbers                   |
|            | Remainder                        | rem x5, x6, x7    | $x5 = x6 \% x7$                               | Remainder of signed 64-bit division              |
|            | Remainder unsigned               | remu x5, x6, x7   | $x5 = x6 \% x7$                               | Remainder of unsigned 64-bit division            |

### 3.3 실수 연산 명령

RISC-V floating-point operands

| Name                         | Example                                                       | Comments                                                                                                                                                                             |
|------------------------------|---------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 32 floating-point registers  | f0-f31                                                        | An f-register can hold either a single-precision floating-point number or a double-precision floating-point number.                                                                  |
| $2^{61}$ memory double words | Memory[0], Memory[8], ..., Memory[18,446,744,073,709,551,608] | Accessed only by data transfer instructions. RISC-V uses byte addresses, so sequential doubleword accesses differ by 8. Memory holds data structures, arrays, and spilled registers. |

RISC-V floating-point assembly language

| Category      | Instruction                   | Example           | Meaning                                          | Comments                          |
|---------------|-------------------------------|-------------------|--------------------------------------------------|-----------------------------------|
| Arithmetic    | FP add single                 | fadd.s f0, f1, f2 | $f0 = f1 + f2$                                   | FP add (single precision)         |
|               | FP subtract single            | fsub.s f0, f1, f2 | $f0 = f1 - f2$                                   | FP subtract (single precision)    |
|               | FP multiply single            | fmul.s f0, f1, f2 | $f0 = f1 * f2$                                   | FP multiply (single precision)    |
|               | FP divide single              | fdiv.s f0, f1, f2 | $f0 = f1 / f2$                                   | FP divide (single precision)      |
|               | FP square root single         | fsqrt.s f0, f1    | $f0 = \sqrt{f1}$                                 | FP square root (single precision) |
|               | FP add double                 | fadd.d f0, f1, f2 | $f0 = f1 + f2$                                   | FP add (double precision)         |
|               | FP subtract double            | fsub.d f0, f1, f2 | $f0 = f1 - f2$                                   | FP subtract (double precision)    |
|               | FP multiply double            | fmul.d f0, f1, f2 | $f0 = f1 * f2$                                   | FP multiply (double precision)    |
|               | FP divide double              | fdiv.d f0, f1, f2 | $f0 = f1 / f2$                                   | FP divide (double precision)      |
|               | FP square root double         | fsqrt.d f0, f1    | $f0 = \sqrt{f1}$                                 | FP square root (double precision) |
| Comparison    | FP equality single            | feq.s x5, f0, f1  | $x5 = 1 \text{ if } f0 == f1, \text{ else } 0$   | FP comparison (single precision)  |
|               | FP less than single           | flt.s x5, f0, f1  | $x5 = 1 \text{ if } f0 < f1, \text{ else } 0$    | FP comparison (single precision)  |
|               | FP less than or equals single | fle.s x5, f0, f1  | $x5 = 1 \text{ if } f0 \leq f1, \text{ else } 0$ | FP comparison (single precision)  |
|               | FP equality double            | feq.d x5, f0, f1  | $x5 = 1 \text{ if } f0 == f1, \text{ else } 0$   | FP comparison (double precision)  |
|               | FP less than double           | flt.d x5, f0, f1  | $x5 = 1 \text{ if } f0 < f1, \text{ else } 0$    | FP comparison (double precision)  |
|               | FP less than or equals double | fle.d x5, f0, f1  | $x5 = 1 \text{ if } f0 \leq f1, \text{ else } 0$ | FP comparison (double precision)  |
| Data transfer | FP load word                  | flw f0, 4(x5)     | $f0 = \text{Memory}[x5 + 4]$                     | Load single-precision from memory |
|               | FP load doubleword            | fld f0, 8(x5)     | $f0 = \text{Memory}[x5 + 8]$                     | Load double-precision from memory |
|               | FP store word                 | fsw f0, 4(x5)     | $\text{Memory}[x5 + 4] = f0$                     | Store single-precision to memory  |
|               | FP store doubleword           | fsd f0, 8(x5)     | $\text{Memory}[x5 + 8] = f0$                     | Store double-precision to memory  |

### 3.4 명령 사용 빈도

| RISC-V Instruction      | Name    | Frequency | Cumulative |
|-------------------------|---------|-----------|------------|
| Add immediate           | addi    | 14.36%    | 14.36%     |
| Load doubleword         | ld      | 8.27%     | 22.63%     |
| Load fl. pt. double     | fld     | 6.83%     | 29.46%     |
| Add registers           | add     | 6.23%     | 35.69%     |
| Load word               | lw      | 4.38%     | 40.07%     |
| Store doubleword        | sd      | 4.29%     | 44.36%     |
| Branch if not equal     | bne     | 4.14%     | 48.50%     |
| Shift left immediate    | slli    | 3.65%     | 52.15%     |
| Fused mul-add double    | fmadd.d | 3.49%     | 55.64%     |
| Branch if equal         | beq     | 3.27%     | 58.91%     |
| Add immediate word      | addiw   | 2.86%     | 61.77%     |
| Store fl. pt. double    | fsd     | 2.24%     | 64.00%     |
| Multiply fl. pt. double | fmul.d  | 2.02%     | 66.02%     |
| Load upper immediate    | lui     | 1.56%     | 67.59%     |
| Store word              | sw      | 1.52%     | 69.10%     |
| Jump and link           | jal     | 1.38%     | 70.49%     |
| Branch if less than     | blt     | 1.37%     | 71.86%     |
| Add word                | addw    | 1.34%     | 73.19%     |
| Subtract fl. pt. double | fsub.d  | 1.28%     | 74.47%     |
| Branch if greater/equal | bge     | 1.27%     | 75.75%     |

FIGURE 3.24 The frequency of the RISC-V instructions for the SPEC CPU2006 benchmarks.

## 4. RISC-V Addressing Mode

### 1. Immediate addressing



### 2. Register addressing



### 3. Base addressing



### 4. PC-relative addressing



FIGURE 2.17 Illustration of four RISC-V addressing modes.

출처: Computer Organization and Design RISC-V Edition (David A. Patterson, John L. Hennessy)

## 5. Pipelining



FIGURE 4.23 The laundry analogy for pipelining.

| Instruction class            | Instruction fetch | Register read | ALU operation | Data access | Register write | Total time |
|------------------------------|-------------------|---------------|---------------|-------------|----------------|------------|
| Load doubleword (ld)         | 200 ps            | 100 ps        | 200 ps        | 200 ps      | 100 ps         | 800 ps     |
| Store doubleword (sd)        | 200 ps            | 100 ps        | 200 ps        | 200 ps      |                | 700 ps     |
| R-format (add, sub, and, or) | 200 ps            | 100 ps        | 200 ps        |             | 100 ps         | 600 ps     |
| Branch (beq)                 | 200 ps            | 100 ps        | 200 ps        |             |                | 500 ps     |

FIGURE 4.24 Total time for each instruction calculated from the time for each component.



**FIGURE 4.25 Single-cycle, nonpipelined execution (top) versus pipelined execution (bottom).**



FIGURE 4.42 Traditional multiple-clock-cycle pipeline diagram of five instructions in Figure 4.41.

$$\text{Time between instructions}_{\text{pipelined}} = \frac{\text{Time between instructions}_{\text{non-pipelined}}}{\text{Number of pipe stages}}$$

$$\frac{800,002,400 \text{ ps}}{200,001,400 \text{ ps}} \approx \frac{800 \text{ ps}}{200 \text{ ps}} \approx 4.00$$

## 6. Freedom Studio

Freedom Studio는 Eclipse 기반의 통합개발환경(IDE)이다. SiFive에서 제작한 대부분의 제품들을 이 환경에서 프로그래밍할 수 있도록 툴체인 컴파일러를 포함하고 있다.

### Freedom Studio 스크린샷



Freedom Studio 다운로드 경로:

<https://www.sifive.com/products/tools/>

위의 경로에서 다운로드 받은 파일을 압축해제하면 다음과 같은 파일들이 나타난다.

### 원도우용 Freedom Studio 파일 내용

| 이름                            | 수정한 날짜             | 유형                 | 크기      |
|-------------------------------|--------------------|--------------------|---------|
| build-tools                   | 2018-02-22 오전 8:31 | 파일 폴더              |         |
| eclipse                       | 2018-02-22 오전 8:31 | 파일 폴더              |         |
| jre                           | 2018-02-22 오전 8:29 | 파일 폴더              |         |
| SiFive                        | 2018-02-22 오전 8:31 | 파일 폴더              |         |
| Freedom Studio.exe            | 2018-01-22 오전 7:22 | 응용 프로그램            | 324KB   |
| FreedomStudio_Manual.v1p3.pdf | 2018-01-22 오전 7:22 | PDF-XChange Vie... | 1,314KB |

위의 파일 목록에서 SiFive 경로를 보면, 다음과 같이 riscv 용 툴체인 컴파일러가 있는 경로를 확인할 수 있다.

| 이름                                                  | 수정한 날짜           | 유형    |
|-----------------------------------------------------|------------------|-------|
| Documentation                                       | 2018-02-22 오전... | 파일 폴더 |
| Drivers                                             | 2018-02-22 오전... | 파일 폴더 |
| Examples                                            | 2018-02-22 오전... | 파일 폴더 |
| Licenses                                            | 2018-02-22 오전... | 파일 폴더 |
| Misc                                                | 2018-02-22 오전... | 파일 폴더 |
| riscv64-unknown-elf-gcc-20171231-x86_64-w64-mingw32 | 2018-02-22 오전... | 파일 폴더 |
| riscv-openocd-20171231-x86_64-w64-mingw32           | 2018-02-22 오전... | 파일 폴더 |

SDK(Freedom E SDK) 및 예제 소스들은 아래 링크에서 다운로드할 수 있다.

Freedom E SDK 다운로드 경로:

<https://github.com/sifive/freedom-e-sdk>

## 7. RISC-V 커널 소스 분석

### 7.1 RISC-V 컴파일러(툴체인) 설치

참고 문서:

<https://riscv.org/software-tools/>

#### 설치 스크립트

```
#!/bin/bash

export CC=gcc
export CXX=g++
export TOP=$(pwd)

git clone https://github.com/riscv/riscv-tools.git

cd $TOP/riscv-tools

git submodule update --init --recursive

sudo apt-get install autoconf automake autotools-dev curl device-tree-compiler libmpc-dev
libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc
zlib1g-dev

export RISCV=$TOP/riscv

export PATH=$PATH:$RISCV/bin

./build.sh
```

#### automake update to 1.14

automake-1.14.tar

libjaylink/Makefile.am:23: error: Libtool library used but 'LIBTOOL' is undefined

```
$ libtool --version  
libtool (GNU libtool) 2.4.2  
Written by Gordon Matzigkeit <gord@gnu.ai.mit.edu>, 1996
```

#### libtool update to 2.4.6

```
wget http://ftp.gnu.org/gnu/libtool/libtool-2.4.6.tar.gz  
tar -zxvf libtool-2.4.6.tar.gz  
cd libtool-2.4.6  
.configure --prefix=/usr/  
make  
sudo make install  
  
echo /usr/share/aclocal > /usr/local/share/aclocal/dirlist
```

```
$ libtool --version  
libtool (GNU libtool) 2.4.6  
Written by Gordon Matzigkeit, 1996
```

cc1plus: error: unrecognized command line option '-std=c++11'  
add -std=gnu++0x to your g++ command line. GCC 4.7 and later support -std=c++11 and -std=gnu++11 as well.

#### gcc update to 4.9

```
sudo add-apt-repository ppa:ubuntu-toolchain-r/test  
sudo apt-get update  
sudo apt-get install gcc-4.9 g++-4.9  
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 60 --slave  
/usr/bin/g++ g++ /usr/bin/g++-4.9
```

## 7.2 RISC-V 커널 소스 컴파일

참조: <https://www.kernel.org/>

참조: <https://kernelnewbies.org/KernelBuild>

참조: <https://riscv.org/software-tools/>

### 7.2.1 최신 리눅스 커널 소스(V4.15 이상) 받기

커널 소스 빌드 도구(패키지) 설치

```
sudo apt-get install libncurses5-dev gcc make git exuberant-ctags bc libssl-dev
```

리눅스 커널 git 보관소에서 최신 안정화 버전 소스를 다운로드 한다.

```
git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git  
cd linux-stable
```

최신의 안정화된 커널의 tag 을 검색한다.

```
git tag -l | less
```

검색한 커널 tag 을 체크아웃 한다. 여기서 tag 는 보통 커널 버전(vX.Y.Z)으로 나타난다.

```
git checkout -b stable tag
```

참고로, 현재 개발 진행중인 최신의 리눅스 커널 -rc 소스는 다음과 같이 다운로드 한다.

```
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git  
cd linux
```

## 7.2.2 RISC-V 커널 소스 빌드

RISC-V 커널 소스 빌드(컴파일)는 다음과 같이 명령어를 실행한다.

### 환경 설정

```
$ make ARCH=riscv defconfig
```

또는

```
$ make ARCH=riscv menuconfig
```

### .config 파일

```
CONFIG_RISCV=y
CONFIG_MMU=y
CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_PAGE_OFFSET=0xffffffffe000000000
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_CSUM=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_PGTABLE_LEVELS=3
CONFIG_DMA_NOOP_OPS=y

# Platform type
# CONFIG_ARCH_RV32I is not set
CONFIG_ARCH_RV64I=y
# CONFIG_CMODEL_MEDLOW is not set
CONFIG_CMODEL_MEDANY=y
# CONFIG_MAXPHYSMEM_2GB is not set
```

```
CONFIG_MAXPHYSMEM_128GB=y
CONFIG_SMP=y
CONFIG_NR_CPUS=8
CONFIG_CPU_SUPPORTS_64BIT_KERNEL=y
CONFIG_TUNE_GENERIC=y
CONFIG_RISCV_ISA_C=y
CONFIG_RISCV_ISA_A=y

# Kernel type
CONFIG_64BIT=y
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_HAVE_MEMBLOCK=y
CONFIG_NO_BOOTMEM=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_COMPACTION=y
CONFIG_MIGRATION=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_PREEMPT_NONE=y
CONFIG_HZ_250=y
CONFIG_HZ=250

//이하 생략..
```

### 빌드(컴파일)

```
$ make ARCH=riscv CROSS_COMPILE=riscv64-unknown-elf-
```

또는

```
$ make -j# ARCH=riscv CROSS_COMPILE=riscv64-unknown-elf-
```

## 7.3 RISC-V 커널 소스 분석



RISC-V U540(64 비트) SiFive 제작



RISC-V E31(32 비트) 커널연구회 제작

### 7.3.1 중요 소스 경로(요약)

```
arch/riscv/  
arch/riscv/kernel/  
arch/riscv/lib/  
arch/riscv/mm/  
arch/riscv/include/asm  
init/  
kernel/  
kernel/irq/  
kernel/time/  
kernel/locking/  
kernel/rcu/  
kernel/sched/  
mm/  
fs/  
fs/proc/  
fs/sysfs/  
drivers/base/  
drivers/of/
```

### 7.3.2 64비트 메모리맵

#### User 공간 메모리 주소(64비트)

SP → 0000 003f ffff ffff0<sub>hex</sub>

0000 0000 1000 0000<sub>hex</sub>

PC → 0000 0000 0040 0000<sub>hex</sub>



FIGURE 2.13 The RISC-V memory allocation for program and data.

SP:255GB, PC:256KB

#### 커널공간 메모리 주소(64비트 System.map)

fffffe000000000 T \_\_init\_begin

fffffe000000000 T \_sinittext

fffffe000000000 T \_start

fffffe00002e098 D \_\_init\_end

fffffe00002e098 D \_\_per\_cpu\_end

fffffe00002e098 T \_stext

fffffe00002e098 T \_text

fffffe0005623b2 T \_etext

fffffe0005623b2 T \_sdata

fffffe000721020 G \_edata

fffffe000722ec0 B \_\_bss\_start

fffffe0007f59b8 R \_\_bss\_stop

fffffe0007f74d4 R \_end

### 7.3.2 RISC-V 아키텍쳐 Setup 소스

```
//init/main.c
start_kernel()
    setup_arch(&command_line)

//arch/riscv/kernel/setup.c
void __init setup_arch(char **cmdline_p)
{
    init_mm.start_code = (unsigned long) _stext; //fffffe0_0002e098 T _stext
    init_mm.end_code   = (unsigned long) _etext; //fffffe0_005623b2 T _etext
    init_mm.end_data   = (unsigned long) _edata; //fffffe0_00721020 G _edata
    init_mm.brk        = (unsigned long) _end; //fffffe0_007f74d4 R _end

    setup_bootmem()           //Find the memory region containing the kernel
    paging_init()             //arch/riscv/mm/init.c
    unflatten_device_tree()   //create tree of device_nodes from flat blob
#endif CONFIG_SMP
    setup_smp()
#endif
}

//arch/riscv/kernel/smpboot.c
void __init setup_smp(void)
{
    struct device_node *dn = NULL;
    int hart, im_okay_therefore_i_am = 0;

    while ((dn = of_find_node_by_type(dn, "cpu"))) {
        hart = riscv_of_processor_hart(dn);
        if (hart >= 0) {
            set_cpu_possible(hart, true);
            set_cpu_present(hart, true);
            if (hart == smp_processor_id()) {
```

```

        BUG_ON(im_okay_therefore_i_am);
        im_okay_therefore_i_am = 1;
    }
}

BUG_ON(!im_okay_therefore_i_am);
}

//kernel/cpu.c
fffffe000721198 G __cpu_active_mask
fffffe0007211a0 G __cpu_present_mask
fffffe0007211a8 G __cpu_online_mask
fffffe0007211b0 G __cpu_possible_mask
fffffe0007211b8 G __boot_cpu_id
EXPORT_SYMBOL(__cpu_possible_mask);

//include/linux/cpumask.h
static inline void
set_cpu_possible(unsigned int cpu, bool possible)
    cpumask_set_cpu(cpu, &__cpu_possible_mask);

```

### 7.3.3 Atomic Operation

#### arch/riscv/include/asm/atomic.h

```

typedef struct {
    int counter;
} atomic_t;

typedef struct {
    long counter;
} atomic64_t;

/***
static inline void atomic_add(int i, atomic_t *v)
{
    __asm__ __volatile__(
        "amo" "add" ".w" "zero, %1, %0"
        : "+A" (v->counter)

```

```

.file   "atomic.c"
.option nopic
.text
.align 1
.type   atomic_add, @function
atomic_add:
    addi   sp,sp,-32
    sd    s0,24(sp)
    addi   s0,sp,32
    mv    a5,a0
    sd    a1,-32(s0)
    sw    a5,-20(s0)
    ld    a5,-32(s0)
    lw    a4,-20(s0)
    ld    a3,-32(s0)

```

```

        : "r" (i)
        : "memory");
}
*/
#define ATOMIC_OP(op, asm_op, l, asm_type,
c_type, prefix)           W
static inline void atomic##prefix##_##op(c_type
i, atomic##prefix##_t *v) W
{
    __asm__ __volatile__ ( W
        "amo" #asm_op ". " #asm_type " zero, %1, %0"
        W
        : "+A" (v->counter) W
        : "r" (l) W
        : "memory"); W
}

#define ATOMIC_OPS(op, asm_op, l)
W
ATOMIC_OP (op, asm_op, l, w, int, )
W
ATOMIC_OP (op, asm_op, l, d, long, 64)

ATOMIC_OPS(add, add, i)
ATOMIC_OPS(sub, add, -i)
ATOMIC_OPS(and, and, i)
ATOMIC_OPS(or, or, i)
ATOMIC_OPS(xor, xor, i)

#define atomic_inc(v) atomic_add(1, v)
#define atomic_dec(v) atomic_sub(1, v)

int main(void)
{
    int i = 0;
    atomic_t *v;
    long j = 0;
    atomic64_t *w;

    atomic_inc(v);
    atomic_add(i, v);
    atomic64_add(j, w);

    /**
     atomic_add(i, v);
     atomic_sub(i, v);
     atomic_and(i, v);
     atomic_or(i, v);
     atomic_xor(i, v);

     atomic64_add(j, w);
     atomic64_sub(j, w);
     atomic64_and(j, w);
     atomic64_or(j, w);
     atomic64_xor(j, w);
    */
}

```

```

#APP
# 35 "atomic.c" 1
    amoadd.w zero, a4, 0(a5)
# 0 "" 2
#NO_APP
    nop
    ld      s0,24(sp)
    addi   sp,sp,32
    jr      ra
    .size   atomic_add, .-atomic_add
    .align  1
    .type   atomic64_add, @function
atomic64_add:
    addi   sp,sp,-32
    sd     s0,24(sp)
    addi   s0,sp,32
    sd     a0,-24(s0)
    sd     a1,-32(s0)
    ld     a5,-32(s0)
    ld     a4,-24(s0)
    ld     a3,-32(s0)
#APP
# 35 "atomic.c" 1
    amoadd.d zero, a4, 0(a5)
# 0 "" 2
#NO_APP
    nop
    ld      s0,24(sp)
    addi   sp,sp,32
    jr      ra
    .size   atomic64_add, .-atomic64_add
    .align  1
    .globl  main
    .type   main, @function
main:
    addi   sp,sp,-48
    sd     ra,40(sp)
    sd     s0,32(sp)
    addi   s0,sp,48
    sw     zero,-20(s0)
    sd     zero,-32(s0)
    ld     a1,-40(s0)
    li     a0,1
    call   atomic_add
    lw     a5,-20(s0)
    ld     a1,-40(s0)
    mv     a0,a5
    call   atomic_add
    ld     a1,-48(s0)
    ld     a0,-32(s0)
    call   atomic64_add
    li     a5,0
    mv     a0,a5
    ld     ra,40(sp)
    ld     s0,32(sp)
    addi   sp,sp,48

```

|                |                                                                         |
|----------------|-------------------------------------------------------------------------|
| return 0;<br>} | <pre> jr      ra .size   main, .-main .ident  "GCC: (GNU) 7.2.0" </pre> |
|----------------|-------------------------------------------------------------------------|

### 7.3.4 SpinLock

#### arch/riscv/include/asm/spinlock.h

```

///__asm__ (asms : output : input : clobber);

typedef struct {
    volatile unsigned int lock;
} arch_spinlock_t;

#define __ARCH_SPIN_LOCK_UNLOCKED 0
#define arch_spin_is_locked(x) ((x)->lock != 0)

static inline void
arch_spin_unlock(arch_spinlock_t *lock)
{
    //amoswap.w.rl x0, x0 lock->lock
    //amoswap.w.rl x0, x0, 0(a5)
    __asm__ __volatile__ (
        "amoswap.w.rl x0, x0, %0"
        : "=A" (lock->lock)
        :: "memory");
}

static inline int
arch_spin_trylock(arch_spinlock_t *lock)
{
    int tmp = 1, busy;

    //amoswap.w.aq busy, tmp, lock->lock
    //amoswap.w.aq a5, a5, 0(a3)
    __asm__ __volatile__ (
        "amoswap.w.aq %0, %2, %1"
        : "=r" (busy), "+A" (lock->lock)
        : "r" (tmp)
        : "memory");

    return !busy;
}

//static inline void
arch_spin_lock(arch_spinlock_t *lock)
int main(void)
{
    arch_spinlock_t *lock;
    lock->lock = __ARCH_SPIN_LOCK_UNLOCKED;

    while (1) {
        if (arch_spin_is_locked(lock))
            continue;
}

```

```

.file    "spinlock.c"
.option  nopic
.text
.align   1
.type    arch_spin_unlock, @function
arch_spin_unlock:
    addi    sp,sp,-32
    sd     s0,24(sp)
    addi    s0,sp,32
    sd     a0,-24(s0)
    ld     a5,-24(s0)
#APP
# 16 "spinlock.c" 1
    amoswap.w.rl x0, x0, 0(a5)
# 0 "" 2
#NO_APP
    nop
    ld     s0,24(sp)
    addi   sp,sp,32
    jr     ra
    .size   arch_spin_unlock, .-
arch_spin_unlock
    .align   1
    .type    arch_spin_trylock, @function
arch_spin_trylock:
    addi    sp,sp,-48
    sd     s0,40(sp)
    addi    s0,sp,48
    sd     a0,-40(s0)
    li     a5,1
    sw     a5,-20(s0)
    ld     a3,-40(s0)
    lw     a5,-20(s0)
    ld     a4,-40(s0)
#APP
# 28 "spinlock.c" 1
    amoswap.w.aq a5, a5, 0(a3)
# 0 "" 2
#NO_APP
    sw     a5,-24(s0)
    lw     a5,-24(s0)
    sext.w a5,a5
    seqz  a5,a5
    andi  a5,a5,0xff
    sext.w a5,a5
    mv     a0,a5

```

|                                                                                                                         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
|-------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <pre>         if (arch_spin_trylock(lock))             break;     }     arch_spin_unlock(lock);      return 0; } </pre> | <pre>         ld      s0,40(sp)         addi   sp,sp,48         jr     ra         .size   arch_spin_trylock, .- arch_spin_trylock         .align 1         .globl  main         .type   main, @function main:         addi   sp,sp,-32         sd     ra,24(sp)         sd     s0,16(sp)         addi   s0,sp,32         ld     a5,-24(s0)         sw     zero,0(a5) .L8:         ld     a5,-24(s0)         lw     a5,0(a5)         sext.w a5,a5         bnez  a5,.L11         ld     a0,-24(s0)         call   arch_spin_trylock </pre> |
|-------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|

**arch/arm64/include/asm/spinlock.h**

```

/*
 * Spinlock implementation.
 *
 * The memory barriers are implicit with the load-acquire and store-release
 * instructions.
 */

static inline void arch_spin_lock(arch_spinlock_t *lock)
{
    unsigned int tmp;
    arch_spinlock_t lockval, newval;

    asm volatile(
        /* Atomically increment the next ticket. */
        ARM64_LSE_ATOMIC_INSN(
            /* LL/SC */
            " prfm    pst11strm, %3\n"
        "1:  ldxr    %w0, %3\n"
        "     add    %w1, %w0, %w5\n"
        "     stxr    %w2, %w1, %3\n"
        "     cbnz    %w2, 1b\n",
            /* LSE atomics */
            "     mov    %w2, %w5\n"
            "     lddadda %w2, %w0, %3\n"
            __nops(3)
        )

        /* Did we get the lock? */
        "     eor    %w1, %w0, %w0, ror #16\n"
    );
}

```

```

"      cbz      %w1, 3f\n"
/*
 * No: spin on the owner. Send a local event to avoid missing an
 * unlock before the exclusive load.
 */
"      sev\n"
"2:    wfe\n"
"      ldxrh    %w2, %4\n"
"      eor      %w1, %w2, %w0, lsr #16\n"
"      cbnz    %w1, 2b\n"
/* We got the lock. Critical section starts here. */
"3:    :=&r (lockval), =&r (newval), =&r (tmp), +Q (*lock)
: "Q" (lock->owner), "I" (1 << TICKET_SHIFT)
: "memory";
}

static inline int arch_spin_trylock(arch_spinlock_t *lock)
{
    unsigned int tmp;
    arch_spinlock_t lockval;

    asm volatile(ARM64_LSE_ATOMIC_INSN(
/* LL/SC */
        "      prfm    pstl1strm, %2\n"
"1:    ldxr    %w0, %2\n"
        "      eor      %w1, %w0, %w0, ror #16\n"
        "      cbnz    %w1, 2f\n"
        "      add      %w0, %w0, %3\n"
        "      stxr    %w1, %w0, %2\n"
        "      cbnz    %w1, 1b\n"
"2:", /* LSE atomics */
        "      ldr      %w0, %2\n"
        "      eor      %w1, %w0, %w0, ror #16\n"
        "      cbnz    %w1, 1f\n"
        "      add      %w1, %w0, %3\n"
        "      casa     %w0, %w1, %2\n"
        "      sub      %w1, %w1, %3\n"
        "      eor      %w1, %w1, %w0\n"
"1:")
:=&r (lockval), =&r (tmp), +Q (*lock)
: "I" (1 << TICKET_SHIFT)
: "memory");

    return !tmp;
}

static inline void arch_spin_unlock(arch_spinlock_t *lock)
{
    unsigned long tmp;

    asm volatile(ARM64_LSE_ATOMIC_INSN(
/* LL/SC */
        "      ldrh    %w1, %0\n"
        "      add     %w1, %w1, #1\n"

```

```

        "      stirh    %w1, %0",
/* LSE atomics */
        "      mov      %w1, #1Wn"
        "      staddlh %w1, %0Wn"
__nops(1)
: "=Q" (lock->owner), "=&r" (tmp)
:
: "memory";
}

```

### 7.3.3 Memory Barrier

#### arch/riscv/include/asm/barrier.h

```

#define nop()           __asm__ __volatile__
("nop")

#define RISCV_FENCE(p, s) \
__asm__ __volatile__ ("fence " #p ", " #s : : :
"memory")

/* These barriers need to enforce ordering on
both devices or memory. */
#define mb()            RISCV_FENCE(iorw,iorw)
#define rmb()           RISCV_FENCE(ir,ir)
#define wmb()           RISCV_FENCE(ow,ow)

/* These barriers do not need to enforce ordering
on devices, just memory. */
#define __smp_mb()       RISCV_FENCE(rw,rw)
#define __smp_rmb()     RISCV_FENCE(r,r)
#define __smp_wmb()     RISCV_FENCE(w,w)

int A=1;
long B=0;

int main(void)
{
    long x, y;

    x = A;
    __smp_mb();
    x = B;

    y = x;
    return 0;
}

```

```

.file   "barrier.c"
.option nopic
.globl A
.section .sdata,"aw",@progbits
.align 2
.type   A, @object
.size   A, 4
A:
.word   1
.globl B
.section .sbss,"aw",@nobits
.align 3
.type   B, @object
.size   B, 8
B:
.zero   8
.text
.align 1
.globl main
.type   main, @function
main:
    addi   sp,sp,-32      #sp=sp-32
    sd    s0,24(sp)      #[sp+24]=s0
    addi   s0,sp,32      #s0=sp+32
    lui    a5,%hi(A)    #a5=upper[A]
    lw    a5,%lo(A)(a5)  #a5=lower[a5+A]
    sd    a5,-24(s0)    #[s0-24]=a5 #x=A
#APP
# 28 "barrier.c" 1
    fence rw,rw
# 0 "" 2
#NO_APP
    lui    a5,%hi(B)    #a5=upper[B]
    ld    a5,%lo(B)(a5)  #a5=lower[a5+B]
    sd    a5,-24(s0)    #[s0-24]=a5 #x=B
    ld    a5,-24(s0)    #a5=[s0-24]
    sd    a5,-32(s0)    #[s0-32]=a5 #y=x
    li    a5,0          #a5=0
    mv    a0,a5
    ld    s0,24(sp)      #s0=[sp+24]

```

|  |                                                                           |
|--|---------------------------------------------------------------------------|
|  | addi sp,sp,32<br>jr ra<br>.size main, .-main<br>.ident "GCC: (GNU) 7.2.0" |
|--|---------------------------------------------------------------------------|

기타 문의 사항은 커널연구회 정재준([rgb13307@nate.com](mailto:rgb13307@nate.com))에게 메일 주시기 바랍니다.

참고로, 리눅스 커널 RISC-V 소스 분석 스터디 모임을 정기적으로 할 예정입니다.

아래 링크 참조 하시어 많은 참여 부탁드립니다.

<https://kernel.bz/news>

감사합니다.