

---

# **Tutorial: Creating an LLVM Backend for the Cpu0 Architecture**

***Release 3.3.0***

**Chen Chung-Shu**    `gamma_chen@yahoo.com.tw`  
**Anoushe Jamshidi**    `ajamshidi@gmail.com`

July 13, 2013



# CONTENTS

| Section   | Title                                                                         |
|-----------|-------------------------------------------------------------------------------|
| <b>1</b>  | <b>About</b>                                                                  |
| 1.1       | Authors                                                                       |
| 1.2       | Contributors                                                                  |
| 1.3       | Acknowledgments                                                               |
| 1.4       | Support                                                                       |
| 1.5       | Revision history                                                              |
| 1.6       | Licensing                                                                     |
| 1.7       | Preface                                                                       |
| 1.8       | Prerequisites                                                                 |
| 1.9       | Outline of Chapters                                                           |
| <b>2</b>  | <b>Cpu0 Instruction Set and LLVM Target Description</b>                       |
| 2.1       | Cpu0 Processor Architecture Details                                           |
| 2.2       | LLVM Structure                                                                |
| 2.3       | .td: LLVM's Target Description Files                                          |
| 2.4       | Creating the Initial Cpu0 .td Files                                           |
| 2.5       | Write cmake file                                                              |
| 2.6       | Target Registration                                                           |
| 2.7       | Build libraries and td                                                        |
| <b>3</b>  | <b>Backend structure</b>                                                      |
| 3.1       | TargetMachine structure                                                       |
| 3.2       | Add AsmPrinter                                                                |
| 3.3       | LLVM Code Generation Sequence                                                 |
| 3.4       | DAG (Directed Acyclic Graph)                                                  |
| 3.5       | Instruction Selection                                                         |
| 3.6       | Add Cpu0DAGToDAGISel class                                                    |
| 3.7       | Add Prologue/Epilogue functions                                               |
| 3.8       | Summary of this Chapter                                                       |
| <b>4</b>  | <b>Adding arithmetic and local pointer support</b>                            |
| 4.1       | Support arithmetic instructions                                               |
| 4.2       | Operator “not” !                                                              |
| 4.3       | Display llvm IR nodes with Graphviz                                           |
| 4.4       | Local variable pointer                                                        |
| 4.5       | Operator mod, %                                                               |
| 4.6       | Full support %                                                                |
| 4.7       | Summary                                                                       |
| <b>5</b>  | <b>Generating object files</b>                                                |
| 5.1       | Translate into obj file                                                       |
| 5.2       | Backend Target Registration Structure                                         |
| <b>6</b>  | <b>Global variables, structs and arrays, other type</b>                       |
| 6.1       | Global variable                                                               |
| 6.2       | Array and struct support                                                      |
| 6.3       | Type of char and short int                                                    |
| <b>7</b>  | <b>Control flow statements</b>                                                |
| 7.1       | Control flow statement                                                        |
| 7.2       | RISC CPU knowledge                                                            |
| <b>8</b>  | <b>Function call</b>                                                          |
| 8.1       | Mips stack frame                                                              |
| 8.2       | Load incoming arguments from stack frame                                      |
| 8.3       | Store outgoing arguments to stack frame                                       |
| 8.4       | Fix issues                                                                    |
| 8.5       | Support features                                                              |
| 8.6       | Summary of this chapter                                                       |
| <b>9</b>  | <b>ELF Support</b>                                                            |
| 9.1       | ELF format                                                                    |
| 9.2       | ELF header and Section header table                                           |
| 9.3       | Relocation Record                                                             |
| 9.4       | Cpu0 ELF related files                                                        |
| 9.5       | lld                                                                           |
| 9.6       | llvm-objdump                                                                  |
| 9.7       | Dynamic link                                                                  |
| <b>10</b> | <b>Run backend</b>                                                            |
| 10.1      | AsmParser support                                                             |
| 10.2      | Verilog of CPU0                                                               |
| 10.3      | Run program on CPU0 machine                                                   |
| <b>11</b> | <b>Backend Optimization</b>                                                   |
| 11.1      | Cpu0 backend Optimization: Remove useless JMP                                 |
| 11.2      | Cpu0 Optimization: Redesign instruction sets                                  |
| <b>12</b> | <b>Appendix A: Getting Started: Installing LLVM and the Cpu0 example code</b> |
| 12.1      | Setting Up Your Mac                                                           |
| 12.2      | Setting Up Your Linux Machine                                                 |
| <b>13</b> | <b>Appendix B: LLVM changes</b>                                               |
| 13.1      | Difference between 3.2 and 3.1                                                |
| 13.2      | Difference in Mips backend                                                    |
| <b>14</b> | <b>Appendix C: instructions discuss</b>                                       |
| 14.1      | Implicit operand                                                              |
| <b>15</b> | <b>Todo List</b>                                                              |
| <b>16</b> | <b>Book example code</b>                                                      |
| <b>17</b> | <b>Alternate formats</b>                                                      |

**Warning:** This is a work in progress. If you would like to contribute, please push updates and patches to the main GitHub project available at <http://github.com/Jonathan2251/lbd> for review.



# ABOUT

## 1.1 Authors

陳鍾樞

**Chen Chung-Shu** [gamma\_chen@yahoo.com.tw](mailto:gamma_chen@yahoo.com.tw)

<http://jonathan2251.github.com/web/index.html>

**Anoushe Jamshidi** [ajamshidi@gmail.com](mailto:ajamshidi@gmail.com)

## 1.2 Contributors

Chen Wei-Ren, [chenwj@iis.sinica.edu.tw](mailto:chenwj@iis.sinica.edu.tw), assisted with text and code formatting.

Chen Zhong-Cheng is the author of the original Cpu0 Verilog code.

## 1.3 Acknowledgments

We would like to thank Sean Silva, [silvas@purdue.edu](mailto:silvas@purdue.edu), for his help, encouragement, and assistance with the Sphinx document generator. Without his help, this book would not have been finished and published online. We also thank the readers whose corrections have made the book more accurate.

## 1.4 Support

We also received kind help from members of the LLVM development mailing list, [llvmdev@cs.uiuc.edu](mailto:llvmdev@cs.uiuc.edu), even though we had never met them. Our experience is that you are not alone: you can get help from the development list members when working with the LLVM project. They are:

Akira Hatanaka <[ahatanak@gmail.com](mailto:ahatanak@gmail.com)>, who answered our va\_arg question.

Ulrich Weigand <[Ulrich.Weigand@de.ibm.com](mailto:Ulrich.Weigand@de.ibm.com)>, who answered our AsmParser question.

## 1.5 Revision history

Version 3.3.01, not yet released

**Version 3.3.0, Released July 13, 2013** Add Table: C operator ! corresponding IR of .bc and IR of DAG and Table: C operator ! corresponding IR of Type-legalized selection DAG and Cpu0 instructions. Add explanation in section Full support %. Add Table: Chapter 4 operators. Add Table: Chapter 3 .bc IR instructions. Rewrite Chapter 5 Global variables. Rewrite section Handle \$gp register in PIC addressing mode. Add Large Frame Stack Pointer support. Add dynamic link section in elf.rst. Re-organize Chapter 3. Re-organize Chapter 8. Re-organize Chapter 10. Re-organize Chapter 11. Re-organize Chapter 12. Fix bug that ret not \$lr register. Porting to LLVM 3.3.

**Version 3.2.15, Released June 12, 2013** Port to LLVM 3.3. Rewrite section Support arithmetic instructions of chapter Adding arithmetic and local pointer support, adding a table. Add two sentences in Preface. Add *llc -debug-pass* in section LLVM Code Generation Sequence. Remove section Adjust cpu0 instructions. Remove section Use cpu0 official LDI instead of ADDiu of Appendix-C.

**Version 3.2.14, Released May 24, 2013** Fix error that made the example code disappear.

**Version 3.2.13, Released May 23, 2013** Add sub-section “Setup llvm-lit on iMac” of Appendix A. Replace some code-block with literalinclude in \*.rst. Add Fig 9 of chapter Backend structure. Add section Dynamic stack allocation support of chapter Function call. Fix bug of Cpu0DelUselessJMP.cpp. Fix cpu0 instruction table errors.

**Version 3.2.12, Released March 9, 2013** Add section “Type of char and short int” of chapter “Global variables, structs and arrays, other type”.

**Version 3.2.11, Released March 8, 2013** Fix bug in generate elf of chapter “Backend Optimization”.

**Version 3.2.10, Released February 23, 2013** Add chapter “Backend Optimization”.

**Version 3.2.9, Released February 20, 2013** Correct the “Variable number of arguments” such as sum\_i(int amount, ...) errors.

**Version 3.2.8, Released February 20, 2013** Add section llvm-objdump -t -r.

**Version 3.2.7, Released February 14, 2013** Add chapter Run backend. Add Icarus Verilog tool installation in Appendix A.

**Version 3.2.6, Released February 4, 2013** Update CMP instruction implementation. Add llvm-objdump section.

**Version 3.2.5, Released January 27, 2013** Add “LLVMBackendTutorialExampleCode/llvm3.1”. Add section “Structure type support”. Change reference from Figure title to Figure number.

**Version 3.2.4, Released January 17, 2013** Update for LLVM 3.2. Change title (book name) from “Write An LLVM Backend Tutorial For Cpu0” to “Tutorial: Creating an LLVM Backend for the Cpu0 Architecture”.

**Version 3.2.3, Released January 12, 2013** Add chapter “Porting to LLVM 3.2”.

**Version 3.2.2, Released January 10, 2013** Add section “Full support %” and section “Verify DIV for operator %”.

**Version 3.2.1, Released January 7, 2013** Add Footnote for references. Reorganize chapters (Move bottom part of chapter “Global variable” to chapter “Other instruction”; Move section “Translate into obj file” to new chapter “Generate obj file”). Fix errors in Fig/otherinst/2.png and Fig/otherinst/3.png.

**Version 3.2.0, Released January 1, 2013** Add chapter Function. Move Chapter “Installing LLVM and the Cpu0 example code” from beginning to Appendix A. Add subsection “Install other tools on Linux”. Add chapter ELF.

**Version 3.1.2, Released December 15, 2012** Fix the error in section 6.1 by adding “def : Pat<(brcond RC:\$cond, bb:\$dst), (JNEOp (CMPOp RC:\$cond, ZEROReg), bb:\$dst)>;” as the last pattern. Modify section 5.5. Fix bug in Cpu0InstrInfo.cpp: SW to ST. Correct LW to LD, LB to LDB, and SB to STB.

**Version 3.1.1, Released November 28, 2012** Add Revision history. Correct the ldi instruction error (replace the ldi instruction with addiu from the beginning and in all of the example code). Move the ldi instruction change from section “Adjust cpu0 instruction and support type of local variable pointer” to section “CPU0 processor architecture”. Correct some English and typing errors.

## 1.6 Licensing

---

### Todo

Add info about LLVM documentation licensing.

---

## 1.7 Preface

The LLVM Compiler Infrastructure provides a versatile structure for creating new backends. Creating a new backend should not be too difficult once you familiarize yourself with this structure. However, the available backend documentation is fairly high level and leaves out many details. This tutorial will provide step-by-step instructions to write a new backend for a new target architecture from scratch.

We will use the Cpu0 architecture as an example to build our new backend. Cpu0 is a simple RISC architecture that has been designed for educational purposes. More information about Cpu0, including its instruction set, is available [here](#). The Cpu0 example code referenced in this book can be found [here](#). As you progress from one chapter to the next, you will incrementally build the backend’s functionality.

Because Cpu0 is a simple RISC CPU designed for educational purposes, the Cpu0 LLVM backend code is also simple and easy to learn. In addition, Cpu0 comes with Verilog source code that you can run on your PC or an FPGA platform when you reach the chapter Run backend.

This tutorial was written using the LLVM 3.1 Mips backend as a reference. Since Cpu0 is an educational architecture, it is missing some key pieces of documentation needed when developing a compiler, such as an Application Binary Interface (ABI). We implement our backend borrowing information from the Mips ABI as a guide. You may want to familiarize yourself with the relevant parts of the Mips ABI as you progress through this tutorial.

## 1.8 Prerequisites

Readers should be comfortable with the C++ language and Object-Oriented Programming concepts. LLVM has been developed and implemented in C++, and it is written in a modular way so that various classes can be adapted and reused as often as possible.

Already having conceptual knowledge of how compilers work is a plus, and if you already have implemented compilers in the past you will likely have no trouble following this tutorial. As this tutorial will build up an LLVM backend step-by-step, we will introduce important concepts as necessary.

This tutorial references the following materials. We highly recommend you read these documents to get a deeper understanding of what the tutorial is teaching:

[The Architecture of Open Source Applications Chapter on LLVM](#)

[LLVM’s Target-Independent Code Generation documentation](#)

[LLVM’s TableGen Fundamentals documentation](#)

[LLVM’s Writing an LLVM Compiler Backend documentation](#)

[Description of the Tricore LLVM Backend](#)

[Mips ABI document](#)

## 1.9 Outline of Chapters

### *Cpu0 Instruction Set and LLVM Target Description:*

This chapter introduces the Cpu0 architecture, a high-level view of LLVM, and how Cpu0 will be targeted in an LLVM backend. This chapter will run you through the initial steps of building the backend, including initial work on the target description (td), setting up cmake and LLVMBuild files, and target registration. Around 750 lines of source code are added by the end of this chapter.

### *Backend structure:*

This chapter highlights the structure of an LLVM backend using UML graphs, and we continue to build the Cpu0 backend. Around 2300 lines of source code are added, most of which are common from one LLVM backend to another, regardless of the target architecture. By the end of this chapter, the Cpu0 LLVM backend will support three instructions to generate some initial assembly output.

### *Adding arithmetic and local pointer support:*

Over ten C operators and their corresponding LLVM IR instructions are introduced in this chapter. Around 345 lines of source code, mostly in .td Target Description files, are added. With these 345 lines, the backend can now translate the +, -, \*, /, &, |, ^, <<, >>, ! and % C operators into the appropriate Cpu0 assembly code. Use of the `llc` debug option and of **Graphviz** as a debug tool are introduced in this chapter.

### *Generating object files:*

Object file generation support for the Cpu0 backend is added in this chapter, as the Target Registration structure is introduced. With 700 lines of additional code, the Cpu0 backend can now generate big and little endian object files.

### *Global variables, structs and arrays, other type:*

Support for global variables, structs and arrays, and the char and short int types is added in this chapter. About 300 lines of source code are added to do this. Cpu0 supports both PIC and static addressing modes, each of which is explained as its functionality is implemented.

### *Control flow statements:*

Support for the **if**, **else**, **while**, **for**, and **goto** control flow statements is added in this chapter. Around 150 lines of source code are added.

### *Function call:*

This chapter details the implementation of function calls in the Cpu0 backend. The stack frame, handling incoming & outgoing arguments, and their corresponding standard LLVM functions are introduced. Over 700 lines of source code are added.

### *ELF Support:*

This chapter details Cpu0 support for the well-known ELF object file format. The ELF format and binutils tools are not a part of LLVM, but are introduced. This chapter details how to use the ELF tools to verify and analyze the object files created by the Cpu0 backend. Support for `llvm-objdump -d`, which translates ELF into a hex file format, is added in the last section.

### *Run backend:*

AsmParser support for translating hand-written assembly language into object files is added first. Next, the Cpu0 processor is implemented in Verilog and simulated with the Icarus Verilog tool. Finally, the hex file generated by `llvm-objdump` is fed to the simulator to observe the result of running a program on Cpu0.

### *Backend Optimization:*

Introduces backend optimization through a simple but effective example, and redesigns the Cpu0 instruction set to make it an efficient RISC CPU.

### *Appendix A: Getting Started: Installing LLVM and the Cpu0 example code:*

Details how to set up the LLVM source code, development tools, and environment settings for the Mac OS X and Linux platforms.

### *Appendix B: LLVM changes:*

Describes the differences in the LLVM APIs used by Cpu0 and Mips that were encountered when updating this guide from one LLVM version to another.

### *Appendix C: instructions discuss:*

Discusses other backend instructions.



# CPU0 INSTRUCTION SET AND LLVM TARGET DESCRIPTION

Before you begin this tutorial, you should know that you can always try to develop your own backend by porting code from existing backends. The majority of the code you will want to investigate can be found in the `/lib/Target` directory of your root LLVM installation. As most major RISC instruction sets have some similarities, this may be the avenue you might try if you are an experienced programmer and knowledgeable of compiler backends.

On the other hand, there is a steep learning curve and you may easily get stuck debugging your new backend. You can easily spend a lot of time tracing which methods are callbacks of some function, or which are calling some overridden method deep in the LLVM codebase - and with a codebase as large as LLVM, all of this can easily become difficult to keep track of. This tutorial will help you work through this process while learning the fundamentals of LLVM backend design. It will show you what is necessary to get your first backend functional and complete, and it should help you understand how to debug your backend when it produces incorrect machine code using output provided by the compiler.

This section details the Cpu0 instruction set and the structure of LLVM. The LLVM structure information is adapted from Chris Lattner's LLVM chapter of the Architecture of Open Source Applications book <sup>1</sup>. You can read the original article on the AOSA website if you prefer. Finally, you will begin to create a new LLVM backend by writing register and instruction definitions in the Target Description files, which will be used in the next section.

## 2.1 Cpu0 Processor Architecture Details

This subsection is based on materials available here <sup>2</sup> (Chinese) and <sup>3</sup> (English).

### 2.1.1 Brief introduction

Cpu0 is a 32-bit architecture. It has 16 general-purpose registers (R0, ..., R15), an Instruction Register (IR), and the memory access registers MAR and MDR. Its structure is illustrated in Figure 2.1 below.

The registers are used for the following purposes:

<sup>1</sup> Chris Lattner, **LLVM**. Published in The Architecture of Open Source Applications. <http://www.aosabook.org/en/llvm.html>

<sup>2</sup> Original Cpu0 architecture and ISA details (Chinese). <http://ccckmit.wikidot.com/ocs:cpu0>

<sup>3</sup> English translation of Cpu0 description. <http://translate.google.com.tw/translate?js=n&prev=_t&hl=zh-TW&ie=UTF-8&layout=2&eotf=1&sl=zh-CN&tl=en&u=http://ccckmit.wikidot.com/ocs:cpu0>



Figure 2.1: Architectural block diagram of the Cpu0 processor

Table 2.1: Cpu0 registers purposes

| Register | Description                   |
|----------|-------------------------------|
| IR       | Instruction register          |
| R0       | Constant register, value is 0 |
| R1-R11   | General-purpose registers     |
| R12      | Status Word register (SW)     |
| R13      | Stack Pointer register (SP)   |
| R14      | Link Register (LR)            |
| R15      | Program Counter (PC)          |
| MAR      | Memory Address Register (MAR) |
| MDR      | Memory Data Register (MDR)    |
| HI       | High part of MULT result      |
| LO       | Low part of MULT result       |

### 2.1.2 The Cpu0 Instruction Set

The Cpu0 instruction set can be divided into three types: L-type instructions, which are generally associated with memory operations, A-type instructions for arithmetic operations, and J-type instructions that are typically used when altering control flow (i.e. jumps). Figure 2.2 illustrates how the bitfields are broken down for each type of instruction.



Figure 2.2: Cpu0's three instruction formats
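The three formats can be sketched as bit-packing functions. This is a minimal illustration that assumes an 8-bit opcode in bits 31-24, 4-bit register fields, and immediate widths of 12 bits (A-type), 16 bits (L-type), and 24 bits (J-type); check these widths against Figure 2.2 before relying on them. The function names `encode_a`, `encode_l`, and `encode_j` are hypothetical and do not appear in the Cpu0 example code.

```python
# Assumed field layout (verify against Figure 2.2):
#   A-type: op[31:24] Ra[23:20] Rb[19:16] Rc[15:12] Cx[11:0]
#   L-type: op[31:24] Ra[23:20] Rb[19:16] Cx[15:0]
#   J-type: op[31:24] Cx[23:0]

def encode_a(op, ra, rb, rc, cx=0):
    """Pack an A-type (arithmetic) instruction into a 32-bit word."""
    return (op << 24) | (ra << 20) | (rb << 16) | (rc << 12) | (cx & 0xFFF)

def encode_l(op, ra, rb, cx):
    """Pack an L-type (load/store/immediate) instruction into a 32-bit word."""
    return (op << 24) | (ra << 20) | (rb << 16) | (cx & 0xFFFF)

def encode_j(op, cx):
    """Pack a J-type (jump) instruction into a 32-bit word."""
    return (op << 24) | (cx & 0xFFFFFF)

add_word = encode_a(0x13, 1, 2, 3)   # ADD R1, R2, R3
ld_word = encode_l(0x01, 2, 13, 8)   # LD R2, [R13+8]
jmp_word = encode_j(0x26, -4)        # JMP -4 (two's complement offset)

print(f"{add_word:08X} {ld_word:08X} {jmp_word:08X}")
# prints: 13123000 012D0008 26FFFFFC
```

Masking the immediate with `& 0xFFF`, `& 0xFFFF`, or `& 0xFFFFFF` keeps a negative offset inside its field as a two's complement value.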

The following table details the Cpu0 instruction set:

- The first column, F., gives the instruction format (L, A, or J).

Table 2.2: Cpu0 Instruction Set

| F. | Mnemonic | Opcode | Meaning                             | Syntax           | Operation                       |
|----|----------|--------|-------------------------------------|------------------|---------------------------------|
| L  | LD       | 01     | Load word                           | LD Ra, [Rb+Cx]   | Ra <= [Rb+Cx]                   |
| L  | ST       | 02     | Store word                          | ST Ra, [Rb+Cx]   | [Rb+Cx] <= Ra                   |
| L  | LB       | 03     | Load byte                           | LB Ra, [Rb+Cx]   | Ra <= (byte)[Rb+Cx]             |
| L  | LBu      | 04     | Load byte unsigned                  | LBu Ra, [Rb+Cx]  | Ra <= (byte)[Rb+Cx]             |
| L  | SB       | 05     | Store byte                          | SB Ra, [Rb+Cx]   | [Rb+Cx] <= (byte)Ra             |
| L  | LH       | 06     | Load half word                      | LH Ra, [Rb+Cx]   | Ra <= (2bytes)[Rb+Cx]           |
| L  | LHu      | 07     | Load half word unsigned             | LHu Ra, [Rb+Cx]  | Ra <= (2bytes)[Rb+Cx]           |
| L  | SH       | 08     | Store half word                     | SH Ra, [Rb+Cx]   | [Rb+Cx] <= Ra                   |
| L  | ADDiu    | 09     | Add immediate                       | ADDiu Ra, Rb, Cx | Ra <= (Rb + Cx)                 |
| L  | ANDi     | 0C     | AND imm                             | ANDi Ra, Rb, Cx  | Ra <= (Rb & Cx)                 |
| L  | ORi      | 0D     | OR imm                              | ORi Ra, Rb, Cx   | Ra <= (Rb \| Cx)                |
| L  | XORi     | 0E     | XOR imm                             | XORi Ra, Rb, Cx  | Ra <= (Rb ^ Cx)                 |
| A  | CMP      | 10     | Compare                             | CMP Ra, Rb       | SW <= (Ra cond Rb) <sup>4</sup> |
| A  | ADDu     | 11     | Add unsigned                        | ADDu Ra, Rb, Rc  | Ra <= Rb + Rc                   |
| A  | SUBu     | 12     | Sub unsigned                        | SUBu Ra, Rb, Rc  | Ra <= Rb - Rc                   |
| A  | ADD      | 13     | Add                                 | ADD Ra, Rb, Rc   | Ra <= Rb + Rc                   |
| A  | SUB      | 14     | Subtract                            | SUB Ra, Rb, Rc   | Ra <= Rb - Rc                   |
| A  | MUL      | 15     | Multiply                            | MUL Ra, Rb, Rc   | Ra <= Rb * Rc                   |
| A  | DIV      | 16     | Divide                              | DIV Ra, Rb       | HI<=Ra%Rb, LO<=Ra/Rb            |
| A  | AND      | 18     | Bitwise and                         | AND Ra, Rb, Rc   | Ra <= Rb & Rc                   |
| A  | OR       | 19     | Bitwise or                          | OR Ra, Rb, Rc    | Ra <= Rb \| Rc                  |
| A  | XOR      | 1A     | Bitwise exclusive or                | XOR Ra, Rb, Rc   | Ra <= Rb ^ Rc                   |
| A  | SRA      | 1B     | Shift right arithmetic              | SRA Ra, Rb, Cx   | Ra <= (h80000000 \| Rb>>Cx)     |
| A  | ROL      | 1C     | Rotate left                         | ROL Ra, Rb, Cx   | Ra <= Rb rol Cx                 |
| A  | ROR      | 1D     | Rotate right                        | ROR Ra, Rb, Cx   | Ra <= Rb ror Cx                 |
| A  | SHL      | 1E     | Shift left                          | SHL Ra, Rb, Cx   | Ra <= Rb << Cx                  |
| A  | SHR      | 1F     | Shift right                         | SHR Ra, Rb, Cx   | Ra <= Rb >> Cx                  |
| J  | JEQ      | 20     | Jump if equal (==)                  | JEQ Cx           | if SW(==), PC <= PC + Cx        |
| J  | JNE      | 21     | Jump if not equal (!=)              | JNE Cx           | if SW(!=), PC <= PC + Cx        |
| J  | JLT      | 22     | Jump if less than (<)               | JLT Cx           | if SW(<), PC <= PC + Cx         |
| J  | JGT      | 23     | Jump if greater than (>)            | JGT Cx           | if SW(>), PC <= PC + Cx         |
| J  | JLE      | 24     | Jump if less than or equals (<=)    | JLE Cx           | if SW(<=), PC <= PC + Cx        |
| J  | JGE      | 25     | Jump if greater than or equals (>=) | JGE Cx           | if SW(>=), PC <= PC + Cx        |
| J  | JMP      | 26     | Jump (unconditional)                | JMP Cx           | PC <= PC + Cx                   |
| J  | SWI      | 2A     | Software interrupt                  | SWI Cx           | LR <= PC; PC <= Cx              |
| J  | JSUB     | 2B     | Jump to subroutine                  | JSUB Cx          | LR <= PC; PC <= PC + Cx         |
| J  | RET      | 2C     | Return from subroutine              | RET LR           | PC <= LR                        |
| J  | IRET     | 2D     | Return from interrupt handler       | IRET             | PC <= LR; INT 0                 |
| J  | JALR     | 2E     | Jump to subroutine (indirect)       | JALR Rb          | LR <= PC; PC <= Rb              |
| L  | MFHI     | 40     | Move HI to GPR                      | MFHI Ra          | Ra <= HI                        |
| L  | MFLO     | 41     | Move LO to GPR                      | MFLO Ra          | Ra <= LO                        |
| L  | MTHI     | 42     | Move GPR to HI                      | MTHI Ra          | HI <= Ra                        |
| L  | MTLO     | 43     | Move GPR to LO                      | MTLO Ra          | LO <= Ra                        |
| L  | MULT     | 50     | Multiply for 64 bits result         | MULT Ra, Rb      | (HI,LO) <= MULT(Ra,Rb)          |
| L  | MULTU    | 51     | MULT for unsigned 64 bits           | MULTU Ra, Rb     | (HI,LO) <= MULTU(Ra,Rb)         |
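The HI and LO registers used by DIV, MULT, and MULTU can be modeled in a few lines. The sketch below follows the Operation column of Table 2.2; it assumes that DIV truncates toward zero as C does (the table does not state the rounding direction) and that registers hold 32-bit two's complement values. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit register value as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mult(ra, rb):
    """MULT: signed 32x32 multiply; returns the 64-bit product as (HI, LO)."""
    product = (to_signed32(ra) * to_signed32(rb)) & MASK64
    return (product >> 32) & MASK32, product & MASK32

def multu(ra, rb):
    """MULTU: unsigned 32x32 multiply; returns the 64-bit product as (HI, LO)."""
    product = (ra & MASK32) * (rb & MASK32)
    return (product >> 32) & MASK32, product & MASK32

def div(ra, rb):
    """DIV: returns (HI, LO) = (Ra % Rb, Ra / Rb).

    Assumption: signed division truncating toward zero, as in C.
    """
    a, b = to_signed32(ra), to_signed32(rb)
    quotient = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        quotient = -quotient
    remainder = a - quotient * b
    return remainder & MASK32, quotient & MASK32

hi, lo = div(7, 3)
print(hi, lo)  # prints: 1 2
```

A C expression such as `a % b` would then read its result from HI (via MFHI) after a DIV, and `a / b` from LO (via MFLO). Division by zero is not handled in this sketch.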

### 2.1.3 The Status Register

The Cpu0 status word register (SW) contains the Negative (N), Zero (Z), Carry (C), and Overflow (V) condition flags, together with the Interrupt (I), Trap (T), and Mode (M) flags. Each flag is a single bit. The bit layout of the SW register is shown in Figure 2.3 below.

<sup>4</sup> Conditions include the following comparisons: >, >=, ==, !=, <, <=. SW is actually set by the subtraction of the two register operands, and the flags indicate which conditions are present.



Figure 2.3: Cpu0 status word (SW) register

When a CMP Ra, Rb instruction executes, the condition flags will change. For example:

- If Ra > Rb, then N = 0, Z = 0
- If Ra < Rb, then N = 1, Z = 0
- If Ra = Rb, then N = 0, Z = 1

The direction (i.e. taken/not taken) of the conditional jump instructions JGT, JLT, JGE, JLE, JEQ, JNE is determined by the N and Z flags in the SW register.
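The relationship between CMP, the N and Z flags, and the six conditional jumps can be written out as follows. This is a simplified sketch: it uses Python's unbounded integers, so it ignores the Carry and Overflow flags and any 32-bit wraparound, and the names `cmp_flags` and `JUMP_TAKEN` are hypothetical.

```python
def cmp_flags(ra, rb):
    """Return (N, Z) as CMP Ra, Rb would set them, based on Ra - Rb."""
    diff = ra - rb
    n = 1 if diff < 0 else 0
    z = 1 if diff == 0 else 0
    return n, z

# Each conditional jump is taken or not taken depending only on N and Z.
JUMP_TAKEN = {
    "JEQ": lambda n, z: z == 1,
    "JNE": lambda n, z: z == 0,
    "JLT": lambda n, z: n == 1,
    "JGT": lambda n, z: n == 0 and z == 0,
    "JLE": lambda n, z: n == 1 or z == 1,
    "JGE": lambda n, z: n == 0,
}

def is_taken(jump, ra, rb):
    """Would `jump` be taken after executing CMP ra, rb?"""
    return JUMP_TAKEN[jump](*cmp_flags(ra, rb))

print(cmp_flags(5, 3), cmp_flags(3, 5), cmp_flags(4, 4))
# prints: (0, 0) (1, 0) (0, 1)
```

For example, `is_taken("JGT", 5, 3)` is true because CMP leaves N = 0 and Z = 0, while `is_taken("JGT", 4, 4)` is false because Z = 1.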

### 2.1.4 Cpu0's Stages of Instruction Execution

The Cpu0 architecture has a three-stage pipeline. The stages are instruction fetch (IF), decode (D), and execute (EX), and they occur in that order. Here is a description of what happens in the processor:

1. Instruction fetch
  - The Cpu0 fetches the instruction pointed to by the Program Counter (PC) into the Instruction Register (IR): IR = [PC].
  - The PC is then updated to point to the next instruction: PC = PC + 4.
2. Decode
  - The control unit decodes the instruction stored in IR, which routes necessary data stored in registers to the ALU, and sets the ALU's operation mode based on the current instruction's opcode.
3. Execute
  - The ALU executes the operation designated by the control unit upon data in registers. After the ALU is done, the result is stored in the destination register.
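The three stages above can be sketched as a small interpreter loop. This is an illustration only, not the Verilog model from the chapter Run backend: it handles just ADDiu and ADD, runs the stages one after another without any pipelining overlap, and assumes the L-type and A-type field positions noted in the comments. The function names are hypothetical.

```python
def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of `value`."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def run(memory, steps):
    """Execute `steps` instructions from `memory` (byte address -> 32-bit word)."""
    regs = [0] * 16
    PC = 15  # R15 is the program counter (Table 2.1)
    for _ in range(steps):
        # 1. Instruction fetch: IR = [PC], then PC = PC + 4
        ir = memory[regs[PC]]
        regs[PC] += 4
        # 2. Decode: split IR into opcode and register fields
        #    (assumed layout: op[31:24] Ra[23:20] Rb[19:16] Rc[15:12])
        op = ir >> 24
        ra, rb, rc = (ir >> 20) & 0xF, (ir >> 16) & 0xF, (ir >> 12) & 0xF
        # 3. Execute: the "ALU" computes the result for the decoded opcode
        if op == 0x09:    # ADDiu Ra, Rb, Cx (16-bit immediate assumed)
            result = regs[rb] + sign_extend(ir & 0xFFFF, 16)
        elif op == 0x13:  # ADD Ra, Rb, Rc
            result = regs[rb] + regs[rc]
        else:
            raise ValueError(f"opcode {op:02X} is not handled in this sketch")
        if ra != 0:       # R0 is the constant zero register
            regs[ra] = result & 0xFFFFFFFF
    return regs

program = {
    0: 0x09100005,  # ADDiu R1, R0, 5
    4: 0x09200007,  # ADDiu R2, R0, 7
    8: 0x13312000,  # ADD   R3, R1, R2
}
regs = run(program, 3)
print(regs[1], regs[2], regs[3], regs[15])  # prints: 5 7 12 12
```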

## 2.2 LLVM Structure

The text in this and the following section comes from the AOSA chapter on LLVM written by Chris Lattner<sup>1</sup>.

The most popular design for a traditional static compiler (like most C compilers) is the three phase design whose major components are the front end, the optimizer and the back end, as seen in Figure 2.4. The front end parses source code, checking it for errors, and builds a language-specific Abstract Syntax Tree (AST) to represent the input code. The AST is optionally converted to a new representation for optimization, and the optimizer and back end are run on the code.

The optimizer is responsible for doing a broad variety of transformations to try to improve the code's running time, such as eliminating redundant computations, and is usually more or less independent of language and target. The back end (also known as the code generator) then maps the code onto the target instruction set. In addition to making correct code, it is responsible for generating good code that takes advantage of unusual features of the supported architecture. Common parts of a compiler back end include instruction selection, register allocation, and instruction scheduling.



Figure 2.4: Three Major Components of a Three Phase Compiler

This model applies equally well to interpreters and JIT compilers. The Java Virtual Machine (JVM) is also an implementation of this model, which uses Java bytecode as the interface between the front end and optimizer.

The most important win of this classical design comes when a compiler decides to support multiple source languages or target architectures. If the compiler uses a common code representation in its optimizer, then a front end can be written for any language that can compile to it, and a back end can be written for any target that can compile from it, as shown in Figure 2.5.



Figure 2.5: Retargetability

With this design, porting the compiler to support a new source language (e.g., Algol or BASIC) requires implementing a new front end, but the existing optimizer and back end can be reused. If these parts weren't separated, implementing a new source language would require starting over from scratch, so supporting $N$ targets and $M$ source languages would need $N*M$ compilers.

Another advantage of the three-phase design (which follows directly from retargetability) is that the compiler serves a broader set of programmers than it would if it only supported one source language and one target. For an open source project, this means that there is a larger community of potential contributors to draw from, which naturally leads to more enhancements and improvements to the compiler. This is the reason why open source compilers that serve many communities (like GCC) tend to generate better optimized machine code than narrower compilers like FreePASCAL. This isn't the case for proprietary compilers, whose quality is directly related to the project's budget. For example, the Intel ICC Compiler is widely known for the quality of code it generates, even though it serves a narrow audience.

A final major win of the three-phase design is that the skills required to implement a front end are different than those required for the optimizer and back end. Separating these makes it easier for a “front-end person” to enhance and maintain their part of the compiler. While this is a social issue, not a technical one, it matters a lot in practice, particularly for open source projects that want to reduce the barrier to contributing as much as possible.

The most important aspect of LLVM's design is the LLVM Intermediate Representation (IR), which is the form it uses to represent code in the compiler. LLVM IR is designed to host mid-level analyses and transformations that you find in the optimizer section of a compiler. It was designed with many specific goals in mind, including supporting lightweight runtime optimizations, cross-function/interprocedural optimizations, whole program analysis, and aggressive restructuring transformations. The most important aspect of it, though, is that it is itself defined as a first class language with well-defined semantics. To make this concrete, here is a simple example of a .ll file:

```

define i32 @add1(i32 %a, i32 %b) {
entry:
  %tmp1 = add i32 %a, %b
  ret i32 %tmp1
}
define i32 @add2(i32 %a, i32 %b) {
entry:
  %tmp1 = icmp eq i32 %a, 0
  br i1 %tmp1, label %done, label %recurse
recurse:
  %tmp2 = sub i32 %a, 1
  %tmp3 = add i32 %b, 1
  %tmp4 = call i32 @add2(i32 %tmp2, i32 %tmp3)
  ret i32 %tmp4
done:
  ret i32 %b
}
```

This LLVM IR corresponds to this C code, which provides two different ways to add integers:

```
unsigned add1(unsigned a, unsigned b) {
  return a+b;
}
// Perhaps not the most efficient way to add two numbers.
unsigned add2(unsigned a, unsigned b) {
  if (a == 0) return b;
  return add2(a-1, b+1);
}

```

As you can see from this example, LLVM IR is a low-level RISC-like virtual instruction set. Like a real RISC instruction set, it supports linear sequences of simple instructions like add, subtract, compare, and branch. These instructions are in three address form, which means that they take some number of inputs and produce a result in a different register. LLVM IR supports labels and generally looks like a weird form of assembly language.

Unlike most RISC instruction sets, LLVM is strongly typed with a simple type system (e.g., i32 is a 32-bit integer, i32\*\* is a pointer to pointer to 32-bit integer) and some details of the machine are abstracted away. For example, the calling convention is abstracted through call and ret instructions and explicit arguments. Another significant difference from machine code is that the LLVM IR doesn't use a fixed set of named registers, it uses an infinite set of temporaries named with a % character.

Beyond being implemented as a language, LLVM IR is actually defined in three isomorphic forms: the textual format above, an in-memory data structure inspected and modified by optimizations themselves, and an efficient and dense on-disk binary “bitcode” format. The LLVM Project also provides tools to convert the on-disk format from text to binary: llvm-as assembles the textual .ll file into a .bc file containing the bitcode goop and llvm-dis turns a .bc file into a .ll file.

The intermediate representation of a compiler is interesting because it can be a “perfect world” for the compiler optimizer: unlike the front end and back end of the compiler, the optimizer isn't constrained by either a specific source language or a specific target machine. On the other hand, it has to serve both well: it has to be designed to be easy for a front end to generate and be expressive enough to allow important optimizations to be performed for real targets.

## 2.3 .td: LLVM's Target Description Files

The “mix and match” approach allows target authors to choose what makes sense for their architecture and permits a large amount of code reuse across different targets. This brings up another challenge: each shared component needs to be able to reason about target specific properties in a generic way. For example, a shared register allocator needs to know the register file of each target and the constraints that exist between instructions and their register operands. LLVM’s solution to this is for each target to provide a target description in a declarative domain-specific language (a set of .td files) processed by the `tblgen` tool. The (simplified) build process for the x86 target is shown in Figure 2.6.



Figure 2.6: Simplified x86 Target Definition

The different subsystems supported by the .td files allow target authors to build up the different pieces of their target. For example, the x86 back end defines a register class that holds all of its 32-bit registers named “GR32” (in the .td files, target specific definitions are all caps) like this:

```
def GR32 : RegisterClass<[i32], 32,
[EAX, ECX, EDX, ESI, EDI, EBX, EBP, ESP,
R8D, R9D, R10D, R11D, R14D, R15D, R12D, R13D]> { ... }
```

## 2.4 Creating the Initial Cpu0 .td Files

As discussed in the previous section, LLVM uses target description files (which use the .td file extension) to describe various components of a target’s backend. For example, these .td files may describe a target’s register set, instruction set, scheduling information for instructions, and calling conventions. When your backend is being compiled, the `tablegen` tool that ships with LLVM will translate these .td files into C++ source code written to files that have a .inc extension. Please refer to the TableGen documentation<sup>5</sup> for more information regarding how to use `tablegen`.

Every backend has a .td file which defines some target information, including what other .td files are used by the backend. These files have a syntax similar to C++. For Cpu0, the target description file is called Cpu0.td, which is shown below:

<sup>5</sup> <http://llvm.org/docs/TableGenFundamentals.html>

### LLVMBackendTutorialExampleCode/Chapter2/Cpu0.td

```
//===-- Cpu0.td - Describe the Cpu0 Target Machine ---------*- tablegen -*-===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
// This is the top level entry point for the Cpu0 target.
//===----------------------------------------------------------------------===//

//===----------------------------------------------------------------------===//
// Target-independent interfaces
//===----------------------------------------------------------------------===//

include "llvm/Target/Target.td"

//===----------------------------------------------------------------------===//
// Register File, Calling Conv, Instruction Descriptions
//===----------------------------------------------------------------------===//

include "Cpu0RegisterInfo.td"
include "Cpu0Schedule.td"
include "Cpu0InstrInfo.td"

def Cpu0InstrInfo : InstrInfo;

def Cpu0 : Target {
// def Cpu0InstrInfo : InstrInfo as before.
  let InstructionSet = Cpu0InstrInfo;
}
```

Cpu0.td includes a few other .td files. Cpu0RegisterInfo.td (shown below) describes the Cpu0's set of registers. In this file, we see that registers have been given names, e.g. def PC indicates that there is a register called PC. There is also a register class named CPURegs that contains all of the registers. You may have multiple register classes (see the X86 backend, for example), which can help if certain instructions can only write to specific registers. In this case, there is only one set of general purpose registers for Cpu0, and some registers are reserved so that they are not modified by instructions during execution.

### LLVMBackendTutorialExampleCode/Chapter2/Cpu0RegisterInfo.td

```
//===-- Cpu0RegisterInfo.td - Cpu0 Register defs -----------*- tablegen -*-===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//

//===----------------------------------------------------------------------===//
//  Declarations that describe the Cpu0 register file
//===----------------------------------------------------------------------===//

// We have banks of 16 registers each.
class Cpu0Reg<string n> : Register<n> {
  field bits<4> Num;
  let Namespace = "Cpu0";
}

// Cpu0 CPU Registers
class Cpu0GPRReg<bits<4> num, string n> : Cpu0Reg<n> {
  let Num = num;
}

//===----------------------------------------------------------------------===//
//  Registers
//===----------------------------------------------------------------------===//
// The register string, such as "9" or "gp" will show on "llvm-objdump -d"
let Namespace = "Cpu0" in {
  // General Purpose Registers
  def ZERO : Cpu0GPRReg< 0, "zero">, DwarfRegNum<[0]>;
  def AT   : Cpu0GPRReg< 1, "1">,    DwarfRegNum<[1]>;
  def V0   : Cpu0GPRReg< 2, "2">,    DwarfRegNum<[2]>;
  def V1   : Cpu0GPRReg< 3, "3">,    DwarfRegNum<[3]>;
  // DWARF numbers 4 and 5 follow the register numbers; the original listing
  // reused 6 and 7 here, which collided with T9 and S0.
  def A0   : Cpu0GPRReg< 4, "4">,    DwarfRegNum<[4]>;
  def A1   : Cpu0GPRReg< 5, "5">,    DwarfRegNum<[5]>;
  def T9   : Cpu0GPRReg< 6, "t9">,   DwarfRegNum<[6]>;
  def S0   : Cpu0GPRReg< 7, "7">,    DwarfRegNum<[7]>;
  def S1   : Cpu0GPRReg< 8, "8">,    DwarfRegNum<[8]>;
  def S2   : Cpu0GPRReg< 9, "9">,    DwarfRegNum<[9]>;
  def GP   : Cpu0GPRReg< 10, "gp">,  DwarfRegNum<[10]>;
  def FP   : Cpu0GPRReg< 11, "fp">,  DwarfRegNum<[11]>;
  def SW   : Cpu0GPRReg< 12, "sw">,  DwarfRegNum<[12]>;
  def SP   : Cpu0GPRReg< 13, "sp">,  DwarfRegNum<[13]>;
  def LR   : Cpu0GPRReg< 14, "lr">,  DwarfRegNum<[14]>;
  def PC   : Cpu0GPRReg< 15, "pc">,  DwarfRegNum<[15]>;
  // def MAR : Register< 16, "mar">, DwarfRegNum<[16]>;
  // def MDR : Register< 17, "mdr">, DwarfRegNum<[17]>;
}

//===----------------------------------------------------------------------===//
// Register Classes
//===----------------------------------------------------------------------===//

def CPURegs : RegisterClass<"Cpu0", [i32], 32, (add
  // Reserved
  ZERO, AT,
  // Return Values and Arguments
  V0, V1, A0, A1,
  // Not preserved across procedure calls
  T9,
  // Callee save
  S0, S1, S2,
  // Reserved
  GP, FP,
  // Not preserved across procedure calls
  SW,
  // Reserved
  SP, LR, PC)>;
```

In C++, classes typically provide a structure to lay out some data and functions, while definitions are used to allocate memory for specific instances of a class. For example:

```
class Date { // declare Date
    int year, month, day;
};

Date birthday; // define birthday, an instance of Date
```

The class `Date` has the members `year`, `month`, and `day`, however these do not yet belong to an actual object. By defining an instance of `Date` called `birthday`, you have allocated memory for a specific object, and can set the `year`, `month`, and `day` of this instance of the class.

In `.td` files, classes describe the structure of how data is laid out, while definitions act as the specific instances of the classes. If we look back at the `Cpu0RegisterInfo.td` file, we see a class called `Cpu0Reg<string n>` which is derived from the `Register<n>` class provided by LLVM. `Cpu0Reg` inherits all the fields that exist in the `Register` class, and also adds a new field called `Num` which is four bits wide.

The `def` keyword is used to create instances of classes. In the following line, the `ZERO` register is defined as an instance of the `Cpu0GPRReg` class:

```
def ZERO : Cpu0GPRReg< 0, "zero">, DwarfRegNum<[0]>;
```

The `def ZERO` indicates the name of this register. `< 0, "zero">` are the parameters used when creating this specific instance of the `Cpu0GPRReg` class, thus the four bit `Num` field is set to 0, and the string `n` is set to `zero`.

As the register lives in the `Cpu0` namespace, you can refer to the `ZERO` register in C++ code in a backend using `Cpu0::ZERO`.
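
To make the class/def analogy concrete, the register records can be mirrored by a small C++ hierarchy. This is a rough analogue written for illustration, not the code that `tablegen` emits: the struct names simply reuse the TableGen class names, and the `makeZERO` and `makeSP` helpers are hypothetical.

```cpp
#include <cassert>
#include <string>

// Rough C++ analogue of the TableGen records; not what tablegen generates.
struct Register {            // stands in for LLVM's Register<n> class
  std::string Namespace;     // empty by default
  std::string AsmName;
  explicit Register(const std::string &n) : AsmName(n) {}
};

struct Cpu0Reg : Register {  // class Cpu0Reg<string n> : Register<n>
  unsigned Num : 4;          // field bits<4> Num
  explicit Cpu0Reg(const std::string &n) : Register(n), Num(0) {
    Namespace = "Cpu0";      // let Namespace = "Cpu0"
  }
};

struct Cpu0GPRReg : Cpu0Reg {  // class Cpu0GPRReg<bits<4> num, string n>
  Cpu0GPRReg(unsigned num, const std::string &n) : Cpu0Reg(n) {
    assert(num < 16 && "Num is only four bits wide");
    Num = num;                 // let Num = num
  }
};

// def ZERO : Cpu0GPRReg<0, "zero">  and  def SP : Cpu0GPRReg<13, "sp">
inline Cpu0GPRReg makeZERO() { return Cpu0GPRReg(0, "zero"); }
inline Cpu0GPRReg makeSP() { return Cpu0GPRReg(13, "sp"); }
```

Each `def` corresponds to one constructed object here, while the class templates only describe which fields such an object carries.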


Notice the use of the `let` expressions: these allow you to override values that are initially defined in a superclass. For example, `let Namespace = "Cpu0"` in the `Cpu0Reg` class will override the default namespace declared in the `Register` class. `Cpu0RegisterInfo.td` also defines `CPURegs` as an instance of the class `RegisterClass`, which is a built-in LLVM class. A `RegisterClass` is a set of `Register` instances, thus `CPURegs` can be described as a set of registers.
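
The idea of a register class as a set of registers can be sketched in C++ as follows. This is a simplified model under stated assumptions: the `sketch` namespace, the `Reg` enum values and `makeCPURegs` are hypothetical, and the register ids generated by `tablegen` for a real build are numbered differently.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

namespace sketch {
// Hypothetical register ids, listed in the order of the .td definitions.
enum Reg { ZERO, AT, V0, V1, A0, A1, T9, S0, S1, S2,
           GP, FP, SW, SP, LR, PC, NumRegs };

// A register class modeled as an ordered set of registers.
struct RegisterClass {
  unsigned Alignment;        // third argument of RegisterClass<...>
  std::vector<Reg> Members;  // the (add ...) list
  bool contains(Reg R) const {
    assert(R < NumRegs && "not a register id");
    return std::find(Members.begin(), Members.end(), R) != Members.end();
  }
};

// def CPURegs : RegisterClass<"Cpu0", [i32], 32, (add ZERO, AT, ...)>
inline RegisterClass makeCPURegs() {
  RegisterClass RC;
  RC.Alignment = 32;
  RC.Members = {ZERO, AT, V0, V1, A0, A1, T9, S0,
                S1, S2, GP, FP, SW, SP, LR, PC};
  return RC;
}
}  // namespace sketch
```

A target with several register classes would build several such sets, and an instruction operand would name the set it accepts.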

The Cpu0 instruction `.td` file is named `Cpu0InstrInfo.td`; its contents are as follows:

### LLVMBackendTutorialExampleCode/Chapter2/Cpu0InstrInfo.td

```
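//===- Cpu0InstrInfo.td - Target Description for Cpu0 Target -*- tablegen -*-=//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file contains the Cpu0 implementation of the TargetInstrInfo class.
//
//===----------------------------------------------------------------------===//

//===----------------------------------------------------------------------===//
// Instruction format superclass
//===----------------------------------------------------------------------===//
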
include "Cpu0InstrFormats.td"

//=====
// Cpu0 profiles and nodes
//=====

def SDT_Cpu0Ret      : SDTypeProfile<0, 1, [SDTCisInt<0>]>;

// Return
def Cpu0Ret : SDNode<"Cpu0ISD::Ret", SDT_Cpu0Ret, [SDNPHasChain,
SDNPOptInGlue]>;

//=====
// Cpu0 Operand, Complex Patterns and Transformations Definitions.
//=====

// Signed Operand
def simm16      : Operand<i32> {
    let DecoderMethod= "DecodeSimm16";
}

// Address operand
def mem : Operand<i32> {
    let PrintMethod = "printMemOperand";
    let MIOperandInfo = (ops CPURegs, simm16);
    let EncoderMethod = "getMemEncoding";
}

// Node immediate fits as 16-bit sign extended on target immediate.
// e.g. addi, andi
def immSExt16 : PatLeaf<(imm), [{ return isInt<16>(N->getSExtValue()); }]>;

// Cpu0 Address Mode! SDNode frameindex could possibly be a match
// since load and store instructions from stack used it.
def addr : ComplexPattern<iPTR, 2, "SelectAddr", [frameindex], [SDNPWantParent]>;

//=====
// Pattern fragment for load/store
//=====

class AlignedLoad<PatFrag Node> :
    PatFrag<(ops node:$ptr), (Node node:$ptr), [{
        LoadSDNode *LD = cast<LoadSDNode>(N);
        return LD->getMemoryVT().getSizeInBits()/8 <= LD->getAlignment();
    }]>;

class AlignedStore<PatFrag Node> :
    PatFrag<(ops node:$val, node:$ptr), (Node node:$val, node:$ptr), [{
        StoreSDNode *SD = cast<StoreSDNode>(N);
        return SD->getMemoryVT().getSizeInBits()/8 <= SD->getAlignment();
    }]>;

// Load/Store PatFrags.
def load_a      : AlignedLoad<load>;
def store_a     : AlignedStore<store>;

//=====
// Instructions specific format
//=====

// Arithmetic and logical instructions with 2 register operands.
class ArithLogicI<bits<8> op, string instr_asm, SDNode OpNode,
    Operand Od, PatLeaf imm_type, RegisterClass RC> :
    FL<op, (outs RC:$ra), (ins RC:$rb, Od:$imm16),
    !strconcat(instr_asm, "\t$ra, $rb, $imm16"),
    [(set RC:$ra, (OpNode RC:$rb, imm_type:$imm16))], IIAlu> {
    let isReMaterializable = 1;
}

class FMem<bits<8> op, dag outs, dag ins, string asmstr, list<dag> pattern,
    InstrItinClass itin>: FL<op, outs, ins, asmstr, pattern, itin> {
    bits<20> addr;
    let Inst{19-16} = addr{19-16};
    let Inst{15-0} = addr{15-0};
    let DecoderMethod = "DecodeMem";
}

// Memory Load/Store
let canFoldAsLoad = 1 in
class LoadM<bits<8> op, string instr_asm, PatFrag OpNode, RegisterClass RC,
    Operand MemOpnd, bit Pseudo>:
    FMem<op, (outs RC:$ra), (ins MemOpnd:$addr),
    !strconcat(instr_asm, "\t$ra, $addr"),
    [(set RC:$ra, (OpNode addr:$addr))], IILoad> {
    let isPseudo = Pseudo;
}

class StoreM<bits<8> op, string instr_asm, PatFrag OpNode, RegisterClass RC,
    Operand MemOpnd, bit Pseudo>:
    FMem<op, (outs), (ins RC:$ra, MemOpnd:$addr),
    !strconcat(instr_asm, "\t$ra, $addr"),
    [(OpNode RC:$ra, addr:$addr)], IIStore> {
    let isPseudo = Pseudo;
}

// 32-bit load.
multiclass LoadM32<bits<8> op, string instr_asm, PatFrag OpNode,
    bit Pseudo = 0> {
    def #NAME# : LoadM<op, instr_asm, OpNode, CPURegs, mem, Pseudo>;
}

// 32-bit store.
multiclass StoreM32<bits<8> op, string instr_asm, PatFrag OpNode,
    bit Pseudo = 0> {
    def #NAME# : StoreM<op, instr_asm, OpNode, CPURegs, mem, Pseudo>;
}

//=====
// Instruction definition
//=====

//=====
// Cpu0I Instructions
//=====

/// Load and Store Instructions
/// aligned
defm LD      : LoadM32<0x01, "ld", load_a>;
defm ST      : StoreM32<0x02, "st", store_a>;

/// Arithmetic Instructions (ALU Immediate)
// IR "add" defined in include/llvm/Target/TargetSelectionDAG.td, line 315 (def add).
def ADDiu   : ArithLogicI<0x09, "addiu", add, simm16, immSExt16, CPURegs>;

let isReturn=1, isTerminator=1, hasDelaySlot=1, isCodeGenOnly=1,
    isBarrier=1, hasCtrlDep=1 in
def RET : FJ <0x2C, (outs), (ins CPURegs:$target),
    "ret\t$target", [(Cpu0Ret CPURegs:$target)], IIBranch>;

//=====
// Arbitrary patterns that map to one or more instructions
//=====

// Small immediates

def : Pat<(i32 immSExt16:$in),
    (ADDiu ZERO, imm:$in)>;

```
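
The C++ fragments embedded in `immSExt16`, `AlignedLoad` and `AlignedStore` are plain predicates. The sketch below restates them as free functions so they can be read in isolation; `fitsSImm16` and `isAlignedAccess` are hypothetical names, and the first is a hand-written equivalent of the `isInt<16>` check rather than LLVM's own implementation.

```cpp
#include <cassert>
#include <cstdint>

// True when x fits in a signed 16-bit immediate, which is the condition
// immSExt16 tests with isInt<16>(N->getSExtValue()).
inline bool fitsSImm16(int64_t x) { return x >= -32768 && x <= 32767; }

// Mirrors the AlignedLoad/AlignedStore predicate: the access counts as
// aligned when its size in bytes does not exceed the known alignment.
inline bool isAlignedAccess(unsigned sizeInBits, unsigned alignInBytes) {
  assert(sizeInBits % 8 == 0 && "memory types are whole bytes in this sketch");
  return sizeInBits / 8 <= alignInBytes;
}
```

With these definitions, a constant such as 32767 passes the `immSExt16` test and can be matched by the "Small immediates" pattern, while 32768 does not fit and is therefore not matched by that pattern. A 32-bit load with 4-byte alignment satisfies `load_a`, while the same load with 2-byte alignment does not.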

Cpu0InstrFormats.td is included by Cpu0InstrInfo.td; its contents are as follows:

### LLVMBackendTutorialExampleCode/Chapter2/Cpu0InstrFormats.td

```
//===-- Cpu0InstrFormats.td - Cpu0 Instruction Formats ----*- tablegen -*-===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//

//===----------------------------------------------------------------------===//
//  Describe CPU0 instructions format
//
//  CPU INSTRUCTION FORMATS
//
//  opcode  - operation code.
//  ra      - dst reg, only used on 3 reg instr.
//  rb      - src reg.
//  rc      - src reg (on a 3 reg instr).
//  cx      - immediate
//
//===----------------------------------------------------------------------===//

// Format specifies the encoding used by the instruction. This is part of the
// ad-hoc solution used to emit machine instruction encodings by our machine
// code emitter.
class Format<bits<4> val> {
  bits<4> Value = val;
}

def Pseudo    : Format<0>;
def FrmA      : Format<1>;
def FrmL      : Format<2>;
def FrmJ      : Format<3>;
def FrmOther  : Format<4>; // Instruction w/ a custom format

// Generic Cpu0 Format
class Cpu0Inst<dag outs, dag ins, string asmstr, list<dag> pattern,
               InstrItinClass itin, Format f>: Instruction
{
  field bits<32> Inst;
  Format Form = f;

  let Namespace = "Cpu0";

  let Size = 4;

  bits<8> Opcode = 0;

  // Top 8 bits are the 'opcode' field
  let Inst{31-24} = Opcode;

  let OutOperandList = outs;
  let InOperandList  = ins;

  let AsmString   = asmstr;
  let Pattern     = pattern;
  let Itinerary   = itin;

  //
  // Attributes specific to Cpu0 instructions...
  //
  bits<4> FormBits = Form.Value;

  // TSFlags layout should be kept in sync with Cpu0InstrInfo.h.
  let TSFlags{3-0}   = FormBits;

  let DecoderNamespace = "Cpu0";

  field bits<32> SoftFail = 0;
}

//===----------------------------------------------------------------------===//
// Format A instruction class in Cpu0 : </opcode/ra/rb/rc/cx/>
//===----------------------------------------------------------------------===//

class FA<bits<8> op, dag outs, dag ins, string asmstr,
         list<dag> pattern, InstrItinClass itin>:
      Cpu0Inst<outs, ins, asmstr, pattern, itin, FrmA>
{
  bits<4>  ra;
  bits<4>  rb;
  bits<4>  rc;
  bits<12> shamt;

  let Opcode = op;

  let Inst{23-20} = ra;
  let Inst{19-16} = rb;
  let Inst{15-12} = rc;
  let Inst{11-0}  = shamt;
}

//===----------------------------------------------------------------------===//
// Format L instruction class in Cpu0 : </opcode/ra/rb/cx/>
//===----------------------------------------------------------------------===//

class FL<bits<8> op, dag outs, dag ins, string asmstr, list<dag> pattern,
         InstrItinClass itin>: Cpu0Inst<outs, ins, asmstr, pattern, itin, FrmL>
{
  bits<4>  ra;
  bits<4>  rb;
  bits<16> imm16;

  let Opcode = op;

  let Inst{23-20} = ra;
  let Inst{19-16} = rb;
  let Inst{15-0}  = imm16;
}

//===----------------------------------------------------------------------===//
// Format J instruction class in Cpu0 : </opcode/address/>
//===----------------------------------------------------------------------===//

class FJ<bits<8> op, dag outs, dag ins, string asmstr, list<dag> pattern,
         InstrItinClass itin>: Cpu0Inst<outs, ins, asmstr, pattern, itin, FrmJ>
{
  bits<24> addr;

  let Opcode = op;

  let Inst{23-0} = addr;
}
```
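
The `Inst{...}` assignments in the three format classes describe how fields are packed into a 32-bit instruction word. A minimal C++ sketch of that packing follows; the `encodeFA`, `encodeFL` and `encodeFJ` functions are hypothetical helpers written for this explanation and are not the encoder that LLVM generates.

```cpp
#include <cassert>
#include <cstdint>

// Bit layouts follow the Inst{...} assignments in Cpu0InstrFormats.td.

// Format A: opcode[31-24] ra[23-20] rb[19-16] rc[15-12] shamt[11-0]
inline uint32_t encodeFA(uint8_t op, unsigned ra, unsigned rb, unsigned rc,
                         unsigned shamt) {
  assert(ra < 16 && rb < 16 && rc < 16 && shamt < 4096);
  return (uint32_t(op) << 24) | (ra << 20) | (rb << 16) | (rc << 12) | shamt;
}

// Format L: opcode[31-24] ra[23-20] rb[19-16] imm16[15-0]
inline uint32_t encodeFL(uint8_t op, unsigned ra, unsigned rb, int16_t imm16) {
  assert(ra < 16 && rb < 16);
  return (uint32_t(op) << 24) | (ra << 20) | (rb << 16) | uint16_t(imm16);
}

// Format J: opcode[31-24] addr[23-0]
inline uint32_t encodeFJ(uint8_t op, uint32_t addr) {
  assert(addr < (1u << 24));
  return (uint32_t(op) << 24) | addr;
}
```

For example, `encodeFL(0x09, 2, 0, 1)` packs opcode 0x09 with `ra` = 2, `rb` = 0 and an immediate of 1, giving the word `0x09200001`.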

ADDiu is an instance of class ArithLogicI, which inherits from FL. Its definition can be expanded step by step to find the value of each member, as follows:

```

def ADDiu : ArithLogicI<0x09, "addiu", add, simm16, immSExt16, CPURegs>;
// Arithmetic and logical instructions with 2 register operands.
class ArithLogicI<bits<8> op, string instr_asm, SDNode OpNode,
                  Operand Od, PatLeaf imm_type, RegisterClass RC> :
    FL<op, (outs RC:$ra), (ins RC:$rb, Od:$imm16),
       !strconcat(instr_asm, "\t$ra, $rb, $imm16"),
       [(set RC:$ra, (OpNode RC:$rb, imm_type:$imm16))], IIAlu> {
    let isReMaterializable = 1;
}

So,
op = 0x09
instr_asm = "addiu"
OpNode = add
Od = simm16
imm_type = immSExt16
RC = CPURegs

```

Expand with FL further,

```

: FL<op, (outs RC:$ra), (ins RC:$rb, Od:$imm16),
!strconcat(instr_asm, "\t$ra, $rb, $imm16"),
[(set RC:$ra, (OpNode RC:$rb, imm_type:$imm16))], IIAlu>

class FL<bits<8> op, dag outs, dag ins, string asmstr, list<dag> pattern,
InstrItinClass itin>: Cpu0Inst<outs, ins, asmstr, pattern, itin, FrmL>
{
    bits<4> ra;
    bits<4> rb;
    bits<16> imm16;

    let Opcode = op;

    let Inst{23-20} = ra;
    let Inst{19-16} = rb;
    let Inst{15-0} = imm16;
}

So,
op = 0x09
outs = CPURegs:$ra
ins = CPURegs:$rb, simm16:$imm16
asmstr = "addiu\t$ra, $rb, $imm16"
pattern = [(set CPURegs:$ra, (add CPURegs:$rb, immSExt16:$imm16))]
itin = IIAlu

Members are,
ra = CPURegs:$ra
rb = CPURegs:$rb
imm16 = simm16:$imm16
Opcode = 0x09;
Inst{23-20} = CPURegs:$ra;
Inst{19-16} = CPURegs:$rb;
Inst{15-0} = simm16:$imm16;

```

Expand with Cpu0Inst further,

```

class FL<bits<8> op, dag outs, dag ins, string asmstr, list<dag> pattern,
InstrItinClass itin>: Cpu0Inst<outs, ins, asmstr, pattern, itin, FrmL>

class Cpu0Inst<dag outs, dag ins, string asmstr, list<dag> pattern,
InstrItinClass itin, Format f>: Instruction
{
    field bits<32> Inst;
    Format Form = f;

    let Namespace = "Cpu0";

    let Size = 4;

    bits<8> Opcode = 0;

    // Top 8 bits are the 'opcode' field
    let Inst{31-24} = Opcode;

    let OutOperandList = outs;
    let InOperandList = ins;

    let AsmString = asmstr;
    let Pattern = pattern;
    let Itinerary = itin;

    //
    // Attributes specific to Cpu0 instructions...
    //
    bits<4> FormBits = Form.Value;

    // TSFlags layout should be kept in sync with Cpu0InstrInfo.h.
    let TSFlags{3-0} = FormBits;

    let DecoderNamespace = "Cpu0";

    field bits<32> SoftFail = 0;
}

So,
outs = CPURegs:$ra
ins = CPURegs:$rb, simm16:$imm16
asmstr = "addiu\t$ra, $rb, $imm16"
pattern = [(set CPURegs:$ra, (add CPURegs:$rb, immSExt16:$imm16))]
itin = IIAlu
f = FrmL

Members are,
Inst{31-24} = 0x09;
OutOperandList = CPURegs:$ra
InOperandList = CPURegs:$rb, simm16:$imm16
AsmString = "addiu\t$ra, $rb, $imm16"
Pattern = [(set CPURegs:$ra, (add CPURegs:$rb, immSExt16:$imm16))]
Itinerary = IIAlu

Summary with all members are,
// Inherited from parent like Instruction
Namespace = "Cpu0";
DecoderNamespace = "Cpu0";
Inst{31-24} = 0x09;
Inst{23-20} = CPURegs:$ra;
Inst{19-16} = CPURegs:$rb;
Inst{15-0}  = simm16:$imm16;
OutOperandList = CPURegs:$ra
InOperandList = CPURegs:$rb, simm16:$imm16
AsmString = "addiu\t$ra, $rb, $imm16"
Pattern = [(set CPURegs:$ra, (add CPURegs:$rb, immSExt16:$imm16))]
Itinerary = IIAlu
// From Cpu0Inst
Opcode = 0x09;
// From FL
ra = CPURegs:$ra
rb = CPURegs:$rb
imm16 = simm16:$imm16

```

This is a tedious process. The LD and ST instruction definitions can be expanded in the same way. Note that `Pattern = [(set CPURegs:$ra, (add CPURegs:$rb, immSExt16:$imm16))]` includes the keyword “**add**”. We will use it in DAG transformations later.
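
The `AsmString` member obtained in the expansion is a template in which `$ra`, `$rb` and `$imm16` stand for operands. The following sketch shows one way such a template could be built and filled in. It is an assumption-laden illustration: `makeAsmString` imitates the `!strconcat` call, `renderAsm` is a hypothetical helper, and the assembly printer that LLVM generates works differently.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Imitates !strconcat(instr_asm, "\t$ra, $rb, $imm16") from ArithLogicI.
inline std::string makeAsmString(const std::string &mnemonic) {
  assert(!mnemonic.empty() && "an instruction needs a mnemonic");
  return mnemonic + "\t$ra, $rb, $imm16";
}

// Replaces each "$name" in the template with the text bound to that operand.
// Unknown names are left in place unchanged.
inline std::string renderAsm(const std::string &tmpl,
                             const std::map<std::string, std::string> &ops) {
  std::string out;
  for (std::size_t i = 0; i < tmpl.size();) {
    if (tmpl[i] != '$') {
      out += tmpl[i++];
      continue;
    }
    std::size_t j = i + 1;
    while (j < tmpl.size() &&
           (std::isalnum(static_cast<unsigned char>(tmpl[j])) || tmpl[j] == '_'))
      ++j;
    std::string name = tmpl.substr(i + 1, j - i - 1);
    std::map<std::string, std::string>::const_iterator it = ops.find(name);
    out += (it != ops.end()) ? it->second : "$" + name;
    i = j;
  }
  return out;
}
```

Binding `ra` to `2`, `rb` to `zero` and `imm16` to `5` turns the ADDiu template into `addiu`, a tab, and `2, zero, 5`.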

## 2.5 Write cmake file

The Target/Cpu0 directory has two build files, CMakeLists.txt and LLVMBuild.txt, with the following contents:

### LLVMBackendTutorialExampleCode/Chapter2/CMakeLists.txt

```
# CMakeLists.txt
# All our .td files are reached through Cpu0.td, which includes
# Cpu0RegisterInfo.td and Cpu0InstrInfo.td.
set(LLVM_TARGET_DEFINITIONS Cpu0.td)

# Generate Cpu0GenRegisterInfo.inc and Cpu0GenInstrInfo.inc, which are included
# by your hand-written C++ files.
# Cpu0GenRegisterInfo.inc comes from Cpu0RegisterInfo.td, Cpu0GenInstrInfo.inc
# comes from Cpu0InstrInfo.td.
tablegen(LLVM Cpu0GenRegisterInfo.inc -gen-register-info)
tablegen(LLVM Cpu0GenInstrInfo.inc -gen-instr-info)
tablegen(LLVM Cpu0GenSubtargetInfo.inc -gen-subtarget)

# Cpu0CommonTableGen must be defined
add_public_tablegen_target(Cpu0CommonTableGen)

# Cpu0CodeGen should match with LLVMBuild.txt Cpu0CodeGen
add_llvm_target(Cpu0CodeGen
  Cpu0TargetMachine.cpp
  )

# Should match with "subdirectories = MCTargetDesc TargetInfo" in LLVMBuild.txt
add_subdirectory(TargetInfo)
add_subdirectory(MCTargetDesc)
```

### LLVMBackendTutorialExampleCode/Chapter2/LLVMBuild.txt

```
;===- ./lib/Target/Cpu0/LLVMBuild.txt --------------------------*- Conf -*--===;
;
;                     The LLVM Compiler Infrastructure
;
; This file is distributed under the University of Illinois Open Source
; License. See LICENSE.TXT for details.
;
;===------------------------------------------------------------------------===;
;
; This is an LLVMBuild description file for the components in this subdirectory.
;
; For more information on the LLVMBuild system, please see:
;
;   http://llvm.org/docs/LLVMBuild.html
;
;===------------------------------------------------------------------------===;

# Following comments extracted from http://llvm.org/docs/LLVMBuild.html

[common]
subdirectories = MCTargetDesc TargetInfo

[component_0]
# TargetGroup components are an extension of LibraryGroups, specifically for
# defining LLVM targets (which are handled specially in a few places).
type = TargetGroup
# The name of the component should always be the name of the target. (should
# match "def Cpu0 : Target" in Cpu0.td)
name = Cpu0
# Cpu0 component is located in directory Target/
parent = Target
# Whether this target defines an assembly parser, assembly printer, disassembler
# , and supports JIT compilation. They are optional.
#has_asmparser = 1
#has_asmprinter = 1
#has_disassembler = 1
#has_jit = 1

[component_1]
# component_1 is a Library type and its name is Cpu0CodeGen. After the build it
# will be in lib/libLLVMCpu0CodeGen.a of your build command directory.
type = Library
name = Cpu0CodeGen
# Cpu0CodeGen component (Library) is located in directory Cpu0/
parent = Cpu0
# If given, a list of the names of Library or LibraryGroup components which
# must also be linked in whenever this library is used. That is, the link time
# dependencies for this component. When tools are built, the build system will
# include the transitive closure of all required_libraries for the components
# the tool needs.
required_libraries = CodeGen Core MC Cpu0Desc Cpu0Info SelectionDAG Support
                     Target
# All LLVMBuild.txt in Target/Cpu0 and subdirectory use 'add_to_library_groups
# = Cpu0'
add_to_library_groups = Cpu0
```

CMakeLists.txt holds the build rules for cmake. LLVMBuild.txt files are written in a simple variant of the INI (configuration file) format. Comments are prefixed by # in both files; the banner of LLVMBuild.txt also uses ;. The settings in these two files are explained in their comments, so please take a little time to read them.

CMakeLists.txt and LLVMBuild.txt files also exist in the sub-directories MCTargetDesc and TargetInfo. Their contents indicate that they generate the Cpu0Desc and Cpu0Info libraries. After building, you will find three libraries, libLLVMCpu0CodeGen.a, libLLVMCpu0Desc.a and libLLVMCpu0Info.a, in lib/ of your build directory. For more details please see “Building LLVM with CMake”<sup>6</sup> and “LLVMBuild Guide”<sup>7</sup>.

## 2.6 Target Registration

You must also register your target with the TargetRegistry, which is what other LLVM tools use to look up and use your target at runtime. The TargetRegistry can be used directly, but for most targets there are helper templates which should take care of the work for you.

All targets should declare a global Target object which is used to represent the target during registration. Then, in the target’s TargetInfo library, the target should define that object and use the RegisterTarget template to register the target. For example, the file TargetInfo/Cpu0TargetInfo.cpp registers TheCpu0Target for big endian and TheCpu0elTarget for little endian, as follows.

---

<sup>6</sup> <http://llvm.org/docs/CMake.html>

<sup>7</sup> <http://llvm.org/docs/LLVMBuild.html>

### LLVMBackendTutorialExampleCode/Chapter2/TargetInfo/Cpu0TargetInfo.cpp

```
//===-- Cpu0TargetInfo.cpp - Cpu0 Target Implementation -------------------===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//

#include "Cpu0.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/TargetRegistry.h"
using namespace llvm;

Target llvm::TheCpu0Target, llvm::TheCpu0elTarget;

extern "C" void LLVMInitializeCpu0TargetInfo() {
  RegisterTarget<Triple::cpu0,
        /*HasJIT=*/true> X(TheCpu0Target, "cpu0", "Cpu0");

  RegisterTarget<Triple::cpu0el,
        /*HasJIT=*/true> Y(TheCpu0elTarget, "cpu0el", "Cpu0el");
}
```
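
The registration idiom used in the listing above, a helper object whose constructor records the target in a global registry, can be illustrated with a self-contained sketch. Everything in the `sketch` namespace is hypothetical and heavily simplified: the real `RegisterTarget` template is also parameterized on a `Triple` architecture, and the real TargetRegistry stores more than a name and a description.

```cpp
#include <cassert>
#include <map>
#include <string>

namespace sketch {
struct Target {
  std::string Name;
  std::string ShortDesc;
  bool HasJIT = false;
};

// Registry keyed by target name. A function-local static avoids
// initialization-order problems between translation units.
inline std::map<std::string, Target *> &registry() {
  static std::map<std::string, Target *> R;
  return R;
}

// Helper whose constructor performs the registration, in the spirit of
// RegisterTarget<...> X(TheCpu0Target, "cpu0", "Cpu0").
template <bool HasJIT = false> struct RegisterTarget {
  RegisterTarget(Target &T, const char *Name, const char *Desc) {
    T.Name = Name;
    T.ShortDesc = Desc;
    T.HasJIT = HasJIT;
    registry()[Name] = &T;
  }
};

// What a tool would do at runtime: find a target by name.
inline const Target *lookupTarget(const std::string &Name) {
  std::map<std::string, Target *>::const_iterator It = registry().find(Name);
  return It == registry().end() ? nullptr : It->second;
}

Target TheCpu0Target, TheCpu0elTarget;

inline void initializeCpu0TargetInfo() {
  RegisterTarget<true> X(TheCpu0Target, "cpu0", "Cpu0");
  RegisterTarget<true> Y(TheCpu0elTarget, "cpu0el", "Cpu0el");
}
}  // namespace sketch
```

After `sketch::initializeCpu0TargetInfo()` runs, `sketch::lookupTarget("cpu0")` returns the address of `TheCpu0Target`, while a name that was never registered yields a null pointer.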

The files Cpu0TargetMachine.cpp and MCTargetDesc/Cpu0MCTargetDesc.cpp define only empty initialization functions, since we register nothing in them for the moment.

### LLVMBackendTutorialExampleCode/Chapter2/MCTargetDesc/Cpu0MCTargetDesc.h

```
//===-- Cpu0MCTargetDesc.h - Cpu0 Target Descriptions -----------*- C++ -*-===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file provides Cpu0 specific target descriptions.
//
//===----------------------------------------------------------------------===//

#ifndef CPU0MCTARGETDESC_H
#define CPU0MCTARGETDESC_H

namespace llvm {
class Target;

extern Target TheCpu0Target;
extern Target TheCpu0elTarget;
} // End llvm namespace

// Defines symbolic names for Cpu0 registers. This defines a mapping from
// register name to register number.
#define GET_REGINFO_ENUM
#include "Cpu0GenRegisterInfo.inc"

// Defines symbolic names for the Cpu0 instructions.
#define GET_INSTRINFO_ENUM
#include "Cpu0GenInstrInfo.inc"

#define GET_SUBTARGETINFO_ENUM
#include "Cpu0GenSubtargetInfo.inc"
#endif
```

### LLVMBackendTutorialExampleCode/Chapter2/MCTargetDesc/Cpu0MCTargetDesc.cpp

```
//===== Cpu0MCTargetDesc.cpp - Cpu0 Target Descriptions =====//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//=====//
//
// This file provides Cpu0 specific target descriptions.
//
//=====//

#include "Cpu0MCTargetDesc.h"
#include "llvm/MC/MachineLocation.h"
#include "llvm/MC/MCCodeGenInfo.h"
#include "llvm/MC/MCInstrInfo.h"
#include "llvm/MC/MCRegisterInfo.h"
#include "llvm/MC/MCStreamer.h"
#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/TargetRegistry.h"

#define GET_INSTRINFO_MC_DESC
#include "Cpu0GenInstrInfo.inc"

#define GET_SUBTARGETINFO_MC_DESC
#include "Cpu0GenSubtargetInfo.inc"

#define GET_REGINFO_MC_DESC
#include "Cpu0GenRegisterInfo.inc"

using namespace llvm;

extern "C" void LLVMInitializeCpu0TargetMC() {
}
```

Please see “Target Registration”<sup>8</sup> for reference.

---

<sup>8</sup> <http://llvm.org/docs/WritingAnLLVMBackend.html#target-registration>

## 2.7 Build libraries and td

The LLVM source code is in /Users/Jonathan/llvm/release/src, and the LLVM release build is in /Users/Jonathan/llvm/release/configure\_release\_build. For how to build LLVM, please refer to <sup>9</sup>. We copied /Users/Jonathan/llvm/release/src to /Users/Jonathan/llvm/test/src to work on the Cpu0 target back end. The sub-directory src holds the source code, and cmake\_debug\_build is the debug build directory.

Besides the directory src/lib/Target/Cpu0, a few other files are modified to support the new cpu0 target. Please check the files in src\_files\_modify/src\_files\_modified/src/.

You can copy the modified files into your LLVM working copy and then locate them with the following commands,

```
cp -rf LLVMBackendTutorialExampleCode/src_files_modified/src_files_modified/src/* yourllvm/workingcopy/sourcedir/.
```

```
118-165-78-230:test Jonathan$ pwd
/Users/Jonathan/test
118-165-78-230:test Jonathan$ grep -R "cpu0" src
src/cmake/config-ix.cmake:elseif (LLVM_NATIVE_ARCH MATCHES "cpu0")
src/include/llvm/ADT/Triple.h:#undef cpu0
src/include/llvm/ADT/Triple.h:    cpu0,      // Gamma add
src/include/llvm/ADT/Triple.h:    cpu0el,
src/include/llvm/Support/ELF.h:  EF_CPU0_ARCH_32R2 = 0x70000000, // cpu032r2
src/include/llvm/Support/ELF.h:  EF_CPU0_ARCH_64R2 = 0x80000000, // cpu064r2
src/lib/Support/Triple.cpp:  case cpu0:    return "cpu0";
...
...
```

Now, run the cmake command and Xcode to build the td files (the following cmake command is for my setup),

```
118-165-78-230:test Jonathan$ cmake -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang -DCMAKE_BUILD_TYPE=Debug -G "Unix Makefiles" ../src/
-- Targeting Cpu0
...
-- Targeting XCore
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/Jonathan/llvm/test/cmake_debug_build
```


After the build, you can run the command llc --version to find the cpu0 backend,

```
118-165-78-230:test Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/bin/Debug/llc --version
LLVM (http://llvm.org/):
...
Registered Targets:
arm      - ARM
cellspu  - STI CBEA Cell SPU [experimental]
cpp      - C++ backend
cpu0    - Cpu0
cpu0el  - Cpu0el
...
```

---

<sup>9</sup> <http://clang.llvm.org/get_started.html>

The `llc -version` command displays “`cpu0`” and “`cpu0el`” because of the following code from the file `TargetInfo/Cpu0TargetInfo.cpp`, which we wrote in the section “Target Registration”<sup>10</sup>. It is listed again as follows,

### LLVMBackendTutorialExampleCode/Chapter2/TargetInfo/Cpu0TargetInfo.cpp

```
//===== Cpu0TargetInfo.cpp - Cpu0 Target Implementation =====//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//=====//

#include "Cpu0.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/TargetRegistry.h"
using namespace llvm;

Target llvm::TheCpu0Target, llvm::TheCpu0elTarget;

extern "C" void LLVMInitializeCpu0TargetInfo() {
    RegisterTarget<Triple::cpu0,
        /*HasJIT=*/true> X(TheCpu0Target, "cpu0", "Cpu0");

    RegisterTarget<Triple::cpu0el,
        /*HasJIT=*/true> Y(TheCpu0elTarget, "cpu0el", "Cpu0el");
}
```

Let's build the LLVMBackendTutorialExampleCode/Chapter2 code as follows,

```
118-165-75-57:ExampleCode Jonathan$ pwd
/Users/Jonathan/llvm/test/src/lib/Target/Cpu0/ExampleCode
118-165-75-57:ExampleCode Jonathan$ sh removecpu0.sh
118-165-75-57:ExampleCode Jonathan$ cp -rf LLVMBackendTutorialExampleCode/Chapter2/* ../.

118-165-75-57:cmake_debug_build Jonathan$ pwd
/Users/Jonathan/llvm/test/cmake_debug_build
118-165-75-57:cmake_debug_build Jonathan$ rm -rf lib/Target/Cpu0/*
118-165-75-57:cmake_debug_build Jonathan$ cmake -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang -DCMAKE_BUILD_TYPE=Debug -G "Xcode" ../src/
...
-- Targeting Cpu0
...
-- Targeting XCore
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/Jonathan/llvm/test/cmake_debug_build
```

Now try the `llc` command to compile the input file `ch3.cpp` as follows,

---

<sup>10</sup> <http://jonathan2251.github.com/lbd/llvmstructure.html#target-registration>

### LLVMBackendTutorialExampleCode/InputFiles/ch3.cpp

```
int main()
{
    return 0;
}
```

First, compile it with clang to get the output ch3.bc as follows,

```
[Gamma@localhost InputFiles]$ clang -c ch3.cpp -emit-llvm -o ch3.bc
```

Next, convert the bitcode file ch3.bc to human-readable text format as follows,

```
118-165-78-230:test Jonathan$ llvm-dis ch3.bc -o ch3.ll

// ch3.ll
; ModuleID = 'ch3.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define i32 @main() nounwind uwtable {
    %1 = alloca i32, align 4
    store i32 0, i32* %1
    ret i32 0
}
```

Now, when we compile ch3.bc into ch3.cpu0.s, we get the following error message,

```
118-165-78-230:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm ch3.bc -o ch3.cpu0.s
Assertion failed: (target.get() && "Could not allocate target machine!"), function main, file /Users/Jonathan/llvm/test/src/tools/llc/llc.cpp, line 271.
...
```

So far we have only defined the target td files (Cpu0.td, Cpu0RegisterInfo.td, ...). According to the LLVM structure, we need to define our target machine and include those td-related files. The error message says that we have not defined our target machine.
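To make the failure concrete, the sketch below models a target whose name is registered but whose target machine factory has not been installed yet. It is a simplified standalone model, not LLVM code: `ToyTarget`, `ToyTargetMachine` and `registerToyTargetMachine` are hypothetical names, and the triple string in the usage note is an arbitrary example.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>

// Hypothetical stand-in for a target machine.
struct ToyTargetMachine {
    std::string Triple;
};

// Hypothetical stand-in for a registered target. The factory stays empty
// until a second registration step installs it.
struct ToyTarget {
    std::string Name;
    std::function<std::unique_ptr<ToyTargetMachine>(const std::string &)> Factory;

    std::unique_ptr<ToyTargetMachine>
    createTargetMachine(const std::string &TT) const {
        if (!Factory)
            return nullptr; // the name is known, but no machine was registered
        return Factory(TT);
    }
};

// Second registration step: install a factory for the target machine.
void registerToyTargetMachine(ToyTarget &T) {
    T.Factory = [](const std::string &TT) {
        return std::unique_ptr<ToyTargetMachine>(new ToyTargetMachine{TT});
    };
}
```

In this model, calling `createTargetMachine("cpu0-unknown")` before `registerToyTargetMachine()` yields a null pointer, which corresponds to the situation the assertion above complains about. After the factory is installed, the same call returns a usable object.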



# BACKEND STRUCTURE

This chapter first introduces the back end class inheritance tree and class members. Next, following the back end structure, it adds the implementation of each class, section by section. Some compiler knowledge, such as DAG (Directed Acyclic Graph) and instruction selection, is needed in this chapter; it is explained where it is needed. At the end of this chapter, we will have a back end that compiles LLVM intermediate code into cpu0 assembly code.

A lot of code is added in this chapter. Almost all of it is common to every back end except for the back end name (cpu0 or mips ...). In fact, we copied almost all of the code from mips and replaced the name with cpu0. Please focus on the relationships between the classes in this backend structure. Once you know the structure, you can create your own backend structure as quickly as we did, even though there are 3000 lines of code in this chapter.

## 3.1 TargetMachine structure

Your back end should define a TargetMachine class; for example, we define the Cpu0TargetMachine class. The Cpu0TargetMachine class contains its own instruction class, frame/stack class, DAG (Directed Acyclic Graph) class, and register class. The contents of Cpu0TargetMachine and its member classes are as follows,

include/llvm/Target/TargetMachine.h

```
//- TargetMachine.h
class TargetMachine {
    TargetMachine(const TargetMachine &) LLVM_DELETED_FUNCTION;
    void operator=(const TargetMachine &) LLVM_DELETED_FUNCTION;
...
public:
    // Interfaces to the major aspects of target machine information:
    // -- Instruction opcode and operand information
    // -- Pipelines and scheduling information
    // -- Stack frame information
    // -- Selection DAG lowering information
    //
    virtual const TargetInstrInfo        *getInstrInfo() const { return 0; }
    virtual const TargetFrameLowering    *getFrameLowering() const { return 0; }
    virtual const TargetLowering         *getTargetLowering() const { return 0; }
    virtual const TargetSelectionDAGInfo *getSelectionDAGInfo() const { return 0; }
    virtual const DataLayout             *getDataLayout() const { return 0; }
...
    /// getSubtarget - This method returns a pointer to the specified type of
    /// TargetSubtargetInfo. In debug builds, it verifies that the object being
    /// returned is of the correct type.
    template<typename STC> const STC &getSubtarget() const {
        return *static_cast<const STC*>(getSubtargetImpl());
    }
...
class LLVMTargetMachine : public TargetMachine {
protected: // Can only create subclasses.
    LLVMTargetMachine(const Target &T, StringRef TargetTriple,
                      StringRef CPU, StringRef FS, TargetOptions Options,
                      Reloc::Model RM, CodeModel::Model CM,
                      CodeGenOpt::Level OL);
...
};
```
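Two idioms in the excerpt above are worth seeing in isolation: the copy operations marked LLVM_DELETED_FUNCTION (which has the effect of `= delete` on a C++11 compiler), and the `getSubtarget<STC>()` template that casts the generic subtarget pointer to the type the caller asks for. The sketch below is a standalone model with hypothetical `Toy...` names; it is not the LLVM class itself.

```cpp
#include <cassert>
#include <type_traits>

struct ToySubtargetInfo {
    virtual ~ToySubtargetInfo() {}
};

struct ToyCpu0Subtarget : ToySubtargetInfo {
    bool IsLittle = false;
};

class ToyTargetMachine {
    // Deleted copy operations: a target machine cannot be copied.
    ToyTargetMachine(const ToyTargetMachine &) = delete;
    void operator=(const ToyTargetMachine &) = delete;

public:
    ToyTargetMachine() {}
    virtual ~ToyTargetMachine() {}

    virtual const ToySubtargetInfo *getSubtargetImpl() const { return nullptr; }

    // The caller names the concrete subtarget type it expects; the cast is
    // only valid if the derived machine really holds that type.
    template <typename STC> const STC &getSubtarget() const {
        return *static_cast<const STC *>(getSubtargetImpl());
    }
};

class ToyCpu0TargetMachine : public ToyTargetMachine {
    ToyCpu0Subtarget Subtarget;

public:
    explicit ToyCpu0TargetMachine(bool Little) { Subtarget.IsLittle = Little; }
    const ToySubtargetInfo *getSubtargetImpl() const override {
        return &Subtarget;
    }
};
```

Given a `ToyCpu0TargetMachine TM(true)` viewed through a `const ToyTargetMachine &`, the call `getSubtarget<ToyCpu0Subtarget>().IsLittle` reads the flag stored in the derived object. Attempting to copy a `ToyTargetMachine` does not compile.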

### LLVMBackendTutorialExampleCode/Chapter3\_1/Cpu0TargetMachine.h

```

//===== Cpu0TargetMachine.h - Define TargetMachine for Cpu0 -*- C++ -*-=====//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//=====//
//
// This file declares the Cpu0 specific subclass of TargetMachine.
//
//=====//

#ifndef CPU0TARGETMACHINE_H
#define CPU0TARGETMACHINE_H

#include "Cpu0FrameLowering.h"
#include "Cpu0InstrInfo.h"
#include "Cpu0ISelLowering.h"
#include "Cpu0SelectionDAGInfo.h"
#include "Cpu0Subtarget.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/Target/TargetFrameLowering.h"

namespace llvm {
    class formatted_raw_ostream;

    class Cpu0TargetMachine : public LLVMTargetMachine {
        Cpu0Subtarget Subtarget;
        const DataLayout DL; // Calculates type size & alignment
        Cpu0InstrInfo InstrInfo; //-- Instructions
        Cpu0FrameLowering FrameLowering; //-- Stack(Frame) and Stack direction
        Cpu0TargetLowering TLInfo; //-- Lowering of LLVM code to the selection DAG
        Cpu0SelectionDAGInfo TSInfo; //-- Map .bc DAG to backend DAG

    public:
        Cpu0TargetMachine(const Target &T, StringRef TT,
                          StringRef CPU, StringRef FS, const TargetOptions &Options,
                          Reloc::Model RM, CodeModel::Model CM,
                          CodeGenOpt::Level OL,
                          bool isLittle);

        virtual const Cpu0InstrInfo *getInstrInfo() const
        { return &InstrInfo; }
        virtual const TargetFrameLowering *getFrameLowering() const
        { return &FrameLowering; }
        virtual const Cpu0Subtarget *getSubtargetImpl() const
        { return &Subtarget; }
        virtual const DataLayout *getDataLayout() const
        { return &DL; }

        virtual const Cpu0RegisterInfo *getRegisterInfo() const {
            return &InstrInfo.getRegisterInfo();
        }

        virtual const Cpu0TargetLowering *getTargetLowering() const {
            return &TLInfo;
        }

        virtual const Cpu0SelectionDAGInfo* getSelectionDAGInfo() const {
            return &TSInfo;
        }

        // Pass Pipeline Configuration
        virtual TargetPassConfig *createPassConfig(PassManagerBase &PM);
    };

    /// Cpu0ebTargetMachine - Cpu032 big endian target machine.
    ///
    class Cpu0ebTargetMachine : public Cpu0TargetMachine {
        virtual void anchor();
    public:
        Cpu0ebTargetMachine(const Target &T, StringRef TT,
                            StringRef CPU, StringRef FS, const TargetOptions &Options,
                            Reloc::Model RM, CodeModel::Model CM,
                            CodeGenOpt::Level OL);
    };

    /// Cpu0elTargetMachine - Cpu032 little endian target machine.
    ///
    class Cpu0elTargetMachine : public Cpu0TargetMachine {
        virtual void anchor();
    public:
        Cpu0elTargetMachine(const Target &T, StringRef TT,
                            StringRef CPU, StringRef FS, const TargetOptions &Options,
                            Reloc::Model RM, CodeModel::Model CM,
                            CodeGenOpt::Level OL);
    };
} // End llvm namespace

#endif

```
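The getters in Cpu0TargetMachine override base class getters while returning a more specific pointer type (for example, `const Cpu0InstrInfo *` where the base declares `const TargetInstrInfo *`). C++ allows this through covariant return types. The sketch below shows the pattern on its own; all `Toy...` names and the opcode count are made up for illustration.

```cpp
#include <cassert>

struct ToyInstrInfo {
    virtual ~ToyInstrInfo() {}
    virtual int numOpcodes() const { return 0; }
};

struct ToyCpu0InstrInfo : ToyInstrInfo {
    int numOpcodes() const override { return 3; } // made-up count
    bool isCpu0Specific() const { return true; }
};

struct ToyTargetMachine {
    virtual ~ToyTargetMachine() {}
    // Generic default: no instruction information available.
    virtual const ToyInstrInfo *getInstrInfo() const { return nullptr; }
};

struct ToyCpu0TargetMachine : ToyTargetMachine {
    ToyCpu0InstrInfo InstrInfo;
    // Covariant override: returns the more specific pointer type.
    const ToyCpu0InstrInfo *getInstrInfo() const override { return &InstrInfo; }
};
```

Target-independent code that holds a `ToyTargetMachine &` gets the generic interface, while back end code that holds a `ToyCpu0TargetMachine` can call `isCpu0Specific()` on the result without a cast.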

### LLVMBackendTutorialExampleCode/Chapter3\_1/Cpu0TargetMachine.cpp

```
//===== Cpu0TargetMachine.cpp - Define TargetMachine for Cpu0 =====//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//=====//
//
// Implements the info about Cpu0 target spec.
//
//=====//

#include "Cpu0TargetMachine.h"
#include "Cpu0.h"
#include "llvm/PassManager.h"
#include "llvm/CodeGen/Passes.h"
#include "llvm/Support/TargetRegistry.h"
using namespace llvm;

extern "C" void LLVMInitializeCpu0Target() {
    // Register the target.
    // Big endian Target Machine
    RegisterTargetMachine<Cpu0ebTargetMachine> X(TheCpu0Target);
    // Little endian Target Machine
    RegisterTargetMachine<Cpu0elTargetMachine> Y(TheCpu0elTarget);
}

// DataLayout --> Big-endian, 32-bit pointer/ABI/alignment
// The stack is always 8 byte aligned
// On function prologue, the stack is created by decrementing
// its pointer. Once decremented, all references are done with positive
// offset from the stack/frame pointer, using StackGrowsUp enables
// an easier handling.
// Using CodeModel::Large enables different CALL behavior.
Cpu0TargetMachine::
Cpu0TargetMachine(const Target &T, StringRef TT,
                  StringRef CPU, StringRef FS, const TargetOptions &Options,
                  Reloc::Model RM, CodeModel::Model CM,
                  CodeGenOpt::Level OL,
                  bool isLittle)
    // Default is big endian
    : LLVMTargetMachine(T, TT, CPU, FS, Options, RM, CM, OL),
      Subtarget(TT, CPU, FS, isLittle),
      DL(isLittle ?
          ("e-p:32:32:32-i8:8:32-i16:16:32-i64:64:64-n32") :
          ("E-p:32:32:32-i8:8:32-i16:16:32-i64:64:64-n32")),
      InstrInfo(*this),
      FrameLowering(Subtarget),
      TLInfo(*this), TSInfo(*this) {
}

void Cpu0ebTargetMachine::anchor() { }

Cpu0ebTargetMachine::
Cpu0ebTargetMachine(const Target &T, StringRef TT,
                    StringRef CPU, StringRef FS, const TargetOptions &Options,
                    Reloc::Model RM, CodeModel::Model CM,
                    CodeGenOpt::Level OL)
    : Cpu0TargetMachine(T, TT, CPU, FS, Options, RM, CM, OL, false) {}

void Cpu0elTargetMachine::anchor() { }

Cpu0elTargetMachine::
Cpu0elTargetMachine(const Target &T, StringRef TT,
                    StringRef CPU, StringRef FS, const TargetOptions &Options,
                    Reloc::Model RM, CodeModel::Model CM,
                    CodeGenOpt::Level OL)
    : Cpu0TargetMachine(T, TT, CPU, FS, Options, RM, CM, OL, true) {}

namespace {
    /// Cpu0 Code Generator Pass Configuration Options.
    class Cpu0PassConfig : public TargetPassConfig {
    public:
        Cpu0PassConfig(Cpu0TargetMachine *TM, PassManagerBase &PM)
        : TargetPassConfig(TM, PM) {}

        Cpu0TargetMachine &getCpu0TargetMachine() const {
            return getTM<Cpu0TargetMachine>();
        }

        const Cpu0Subtarget &getCpu0Subtarget() const {
            return *getCpu0TargetMachine().getSubtargetImpl();
        }
    };
} // namespace

TargetPassConfig *Cpu0TargetMachine::createPassConfig(PassManagerBase &PM) {
    return new Cpu0PassConfig(this, PM);
}

```
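The two DataLayout strings in the constructor differ only in the first field: `e` selects little endian and `E` selects big endian. The `p:32:32:32` field gives the pointer size and alignments in bits, and `n32` gives the native integer width. The sketch below decodes just these three fields. It is a toy parser for illustration, not the llvm::DataLayout implementation, and `ToyLayout` and `parseToyLayout` are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct ToyLayout {
    bool LittleEndian = false;
    unsigned PointerBits = 0;
    unsigned NativeIntBits = 0;
};

// Parses only the fields discussed here: "e"/"E", "p:<size>:...", and
// "n<width>". All other fields are skipped.
ToyLayout parseToyLayout(const std::string &Desc) {
    ToyLayout L;
    std::istringstream In(Desc);
    std::string Field;
    while (std::getline(In, Field, '-')) {
        if (Field == "e")
            L.LittleEndian = true;
        else if (Field == "E")
            L.LittleEndian = false;
        else if (Field.compare(0, 2, "p:") == 0)
            // stoul stops at the first ':' so only the size is read.
            L.PointerBits = static_cast<unsigned>(std::stoul(Field.substr(2)));
        else if (!Field.empty() && Field[0] == 'n')
            L.NativeIntBits = static_cast<unsigned>(std::stoul(Field.substr(1)));
    }
    return L;
}
```

Feeding the big endian string from the constructor into `parseToyLayout` gives `LittleEndian == false`, `PointerBits == 32` and `NativeIntBits == 32`; the little endian string differs only in the first value.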

include/llvm/Target/TargetInstrInfo.h

```

class TargetInstrInfo : public MCInstrInfo {
    TargetInstrInfo(const TargetInstrInfo &) LLVM_DELETED_FUNCTION;
    void operator=(const TargetInstrInfo &) LLVM_DELETED_FUNCTION;
public:
    ...
}
...
class TargetInstrInfoImpl : public TargetInstrInfo {
protected:
    TargetInstrInfoImpl(int CallFrameSetupOpcode = -1,
                        int CallFrameDestroyOpcode = -1)
    : TargetInstrInfo(CallFrameSetupOpcode, CallFrameDestroyOpcode) {}
public:
    ...
}

```

### LLVMBackendTutorialExampleCode/Chapter3\_1/Cpu0CallingConv.td

```
//===== Cpu0CallingConv.td - Calling Conventions for Cpu0 -*- tablegen -*-=====//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//=====//
// This describes the calling conventions for Cpu0 architecture.
//=====//

/// CCIfSubtarget - Match if the current subtarget has a feature F.
class CCIfSubtarget<string F, CCAction A>:
    CCIf<!strconcat("State.getTarget().getSubtarget<Cpu0Subtarget>().", F), A>;

def CSR_O32 : CalleeSavedRegs<(add LR, FP,
                               (sequence "S%u", 2, 0))>;
```

### LLVMBackendTutorialExampleCode/Chapter3\_1/Cpu0FrameLowering.h

```
//===== Cpu0FrameLowering.h - Define frame lowering for Cpu0 -*- C++ -*-=====//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//=====//
//
// This file defines the Cpu0FrameLowering class.
//
//=====//
#ifndef CPU0_FRAMEINFO_H
#define CPU0_FRAMEINFO_H

#include "Cpu0.h"
#include "Cpu0Subtarget.h"
#include "llvm/Target/TargetFrameLowering.h"

namespace llvm {
    class Cpu0Subtarget;

    class Cpu0FrameLowering : public TargetFrameLowering {
    protected:
        const Cpu0Subtarget &STI;

    public:
        explicit Cpu0FrameLowering(const Cpu0Subtarget &sti)
            : TargetFrameLowering(StackGrowsDown, 8, 0),
              STI(sti) {
        }

        /// emitProlog/emitEpilog - These methods insert prolog and epilog code into
        /// the function.
        void emitPrologue(MachineFunction &MF) const;
        void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const;
        bool hasFP(const MachineFunction &MF) const;
    };

} // End llvm namespace

#endif

```

### LLVMBackendTutorialExampleCode/Chapter3\_1/Cpu0FrameLowering.cpp

```

//===== Cpu0FrameLowering.cpp - Cpu0 Frame Information =====//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//=====//
//
// This file contains the Cpu0 implementation of TargetFrameLowering class.
//
//=====//

#include "Cpu0FrameLowering.h"
#include "Cpu0InstrInfo.h"
#include "Cpu0MachineFunction.h"
#include "llvm/IR/Function.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineModuleInfo.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/Target/TargetOptions.h"
#include "llvm/Support/CommandLine.h"

using namespace llvm;

// emitPrologue() and emitEpilogue() must exist for main().

//=====//
//
// Stack Frame Processing methods
// +----------------------------+
//
// The stack is allocated by decrementing the stack pointer on
// the first instruction of a function prologue. Once decremented,
// all stack references are done through a positive offset
// from the stack/frame pointer, so the stack is considered
// to grow up! Otherwise terrible hacks would have to be made
// to get this stack ABI compliant :)
//
// The stack frame required by the ABI (after call):
// Offset
//
// 0           -----
// 4           Args to pass
// .           saved $GP (used in PIC)
// .           Alloca allocations
// .           Local Area
// .           CPU "Callee Saved" Registers
// .           saved FP
// .           saved RA
// .           FPU "Callee Saved" Registers
// StackSize   -----
//
// Offset - offset from sp after stack allocation on function prologue
//
// The sp is the stack pointer subtracted/added from the stack size
// at the Prologue/Epilogue
//
// References to the previous stack (to obtain arguments) are done
// with offsets that exceed the stack size: (stacksize+(4*(num_arg-1)))
//
// Examples:
// - reference to the actual stack frame
//   for any local area var there is something like : FI >= 0, StackOffset: 4
//     st REGX, 4(SP)
//
// - reference to previous stack frame
//   suppose there's a load to the 5th argument : FI < 0, StackOffset: 16.
//   The emitted instruction will be something like:
//     ld REGX, 16+StackSize(SP)
//
// Since the total stack size is unknown on LowerFormalArguments, all
// stack references (ObjectOffset) created to reference the function
// arguments, are negative numbers. This way, on eliminateFrameIndex it's
// possible to detect those references and the offsets are adjusted to
// their real location.
//
//=====//

//-- Must have, hasFP() is pure virtual of parent
// hasFP - Return true if the specified function should have a dedicated frame
// pointer register. This is true if the function has variable sized allocas or
// if frame pointer elimination is disabled.
bool Cpu0FrameLowering::hasFP(const MachineFunction &MF) const {
    const MachineFrameInfo *MFI = MF.getFrameInfo();
    return MF.getTarget().Options.DisableFramePointerElim(MF) ||
        MFI->hasVarSizedObjects() || MFI->isFrameAddressTaken();
}

void Cpu0FrameLowering::emitPrologue(MachineFunction &MF) const {
}

void Cpu0FrameLowering::emitEpilogue(MachineFunction &MF,
                                     MachineBasicBlock &MBB) const {
}

```
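The offset rule described in the long comment above, and the decision made by hasFP(), can be written as two small functions. This is a simplified sketch of the rule as the comment states it, not the code that LLVM or this back end runs; `finalSpOffset` and `needsFramePointer` are hypothetical names, and the stack size of 24 in the usage note is an arbitrary example value.

```cpp
#include <cassert>

// Final SP-relative offset for a frame object, following the comment's rule:
// incoming arguments (negative frame index) live in the caller's frame, so
// the callee's stack size is added to their offset. Locals keep their offset.
int finalSpOffset(int FrameIndex, int StackOffset, int StackSize) {
    if (FrameIndex < 0)
        return StackOffset + StackSize;
    return StackOffset;
}

// Same decision as hasFP(): a frame pointer is needed when frame pointer
// elimination is disabled, when the frame has variable sized objects, or
// when the frame address is taken.
bool needsFramePointer(bool DisableFPElim, bool HasVarSizedObjects,
                       bool FrameAddressTaken) {
    return DisableFPElim || HasVarSizedObjects || FrameAddressTaken;
}
```

With a stack size of 24, the local from the comment's first example stays at `4(SP)`, while the argument from the second example (StackOffset 16) ends up at `40(SP)`.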

**LLVMBackendTutorialExampleCode/Chapter3\_1/Cpu0InstrInfo.h**

```

//===== Cpu0InstrInfo.h - Cpu0 Instruction Information -*- C++ -*-=====//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//=====//
//
// This file contains the Cpu0 implementation of the TargetInstrInfo class.
//
//=====//

#ifndef CPU0INSTRUCTIONINFO_H
#define CPU0INSTRUCTIONINFO_H

#include "Cpu0.h"
#include "Cpu0RegisterInfo.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Target/TargetInstrInfo.h"

#define GET_INSTRINFO_HEADER
#include "Cpu0GenInstrInfo.inc"

namespace llvm {

class Cpu0InstrInfo : public Cpu0GenInstrInfo {
    Cpu0TargetMachine &TM;
    const Cpu0RegisterInfo RI;
public:
    explicit Cpu0InstrInfo(Cpu0TargetMachine &TM);

    /// getRegisterInfo - TargetInstrInfo is a superset of MRegister info. As
    /// such, whenever a client has an instance of instruction info, it should
    /// always be able to get register info as well (through this method).
    ///
    virtual const Cpu0RegisterInfo &getRegisterInfo() const;

public:
};
}

#endif

```

**LLVMBackendTutorialExampleCode/Chapter3\_1/Cpu0InstrInfo.cpp**

```

//===== Cpu0InstrInfo.cpp - Cpu0 Instruction Information =====//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//=====//
//
// This file contains the Cpu0 implementation of the TargetInstrInfo class.
//
//=====//

#include "Cpu0InstrInfo.h"
#include "Cpu0TargetMachine.h"
#define GET_INSTRINFO_CTOR
#include "Cpu0GenInstrInfo.inc"

using namespace llvm;

Cpu0InstrInfo::Cpu0InstrInfo(Cpu0TargetMachine &tm)
    :
    TM(tm),
    RI(*TM.getSubtargetImpl(), *this) {}

const Cpu0RegisterInfo &Cpu0InstrInfo::getRegisterInfo() const {
    return RI;
}
```
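The Cpu0InstrInfo constructor calls `TM.getSubtargetImpl()` while the target machine is still being constructed. This is safe because C++ constructs data members in declaration order, and Cpu0TargetMachine declares Subtarget before InstrInfo. The standalone sketch below demonstrates that ordering; the `Toy...` names and `constructionLog` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records construction order for the sketch.
std::vector<std::string> &constructionLog() {
    static std::vector<std::string> Log;
    return Log;
}

struct ToySubtarget {
    bool Ready;
    ToySubtarget() : Ready(true) { constructionLog().push_back("Subtarget"); }
};

struct ToyMachine;

struct ToyInstrInfo {
    bool SawReadySubtarget;
    explicit ToyInstrInfo(const ToyMachine &TM);
};

struct ToyMachine {
    // Members are constructed in declaration order, regardless of the order
    // written in the constructor's initializer list.
    ToySubtarget Subtarget;
    ToyInstrInfo InstrInfo;

    ToyMachine() : Subtarget(), InstrInfo(*this) {}
    const ToySubtarget *getSubtargetImpl() const { return &Subtarget; }
};

// Reads the subtarget through the partially constructed machine; this is
// only valid because Subtarget is declared before InstrInfo.
ToyInstrInfo::ToyInstrInfo(const ToyMachine &TM)
    : SawReadySubtarget(TM.getSubtargetImpl()->Ready) {
    constructionLog().push_back("InstrInfo");
}
```

If the two members of `ToyMachine` were declared in the opposite order, `ToyInstrInfo` would read a subtarget that has not been constructed yet, so the declaration order in the target machine class matters.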

### LLVMBackendTutorialExampleCode/Chapter3\_1/Cpu0ISelLowering.h

```
//===== Cpu0ISelLowering.h - Cpu0 DAG Lowering Interface -*- C++ -*-=====//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//=====//
//
// This file defines the interfaces that Cpu0 uses to lower LLVM code into a
// selection DAG.
//
//=====//

#ifndef Cpu0ISELLOWERING_H
#define Cpu0ISELLOWERING_H

#include "Cpu0.h"
#include "Cpu0Subtarget.h"
#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/Target/TargetLowering.h"

namespace llvm {
    namespace Cpu0ISD {
        enum NodeType {
            // Start the numbering from where ISD NodeType finishes.
            FIRST_NUMBER = ISD::BUILTIN_OP_END,
            Ret
        };
    }

    //=====//
    // TargetLowering Implementation
    //=====//

    class Cpu0TargetLowering : public TargetLowering {
    public:
        explicit Cpu0TargetLowering(Cpu0TargetMachine &TM);

    private:
        // Subtarget Info
        const Cpu0Subtarget *Subtarget;

        //-- Must exist even without any function call
        virtual SDValue
            LowerFormalArguments(SDValue Chain,
                                 CallingConv::ID CallConv, bool isVarArg,
                                 const SmallVectorImpl<ISD::InputArg> &Ins,
                                 DebugLoc dl, SelectionDAG &DAG,
                                 SmallVectorImpl<SDValue> &InVals) const;

        //-- Must exist even without any function call
        virtual SDValue
            LowerReturn(SDValue Chain,
                        CallingConv::ID CallConv, bool isVarArg,
                        const SmallVectorImpl<ISD::OutputArg> &Outs,
                        const SmallVectorImpl<SDValue> &OutVals,
                        DebugLoc dl, SelectionDAG &DAG) const;
    };
}

#endif // Cpu0ISELLOWERING_H

```
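The Cpu0ISD::NodeType enum starts at ISD::BUILTIN_OP_END so that target-specific node numbers continue after the target-independent ones and the two ranges never overlap. The sketch below shows the same numbering scheme with a made-up, much shorter generic list; the `ToyISD` and `ToyCpu0ISD` namespaces and `isTargetOpcode` are hypothetical.

```cpp
#include <cassert>

namespace ToyISD {
// Stand-in for the target-independent opcodes; the real list is much longer.
enum NodeType { ADD, SUB, LOAD, STORE, BUILTIN_OP_END };
}

namespace ToyCpu0ISD {
// Target opcodes continue after the generic ones, so a single integer can
// identify either kind of node.
enum NodeType { FIRST_NUMBER = ToyISD::BUILTIN_OP_END, Ret };
}

// True when an opcode belongs to the target-specific range.
bool isTargetOpcode(unsigned Opcode) {
    return Opcode >= static_cast<unsigned>(ToyISD::BUILTIN_OP_END);
}
```

In this sketch `ToyCpu0ISD::Ret` takes the first value after `BUILTIN_OP_END`, so `isTargetOpcode` can tell generic and target nodes apart with one comparison.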

### LLVMBackendTutorialExampleCode/Chapter3\_1/Cpu0ISelLowering.cpp

```

//===== Cpu0ISelLowering.cpp - Cpu0 DAG Lowering Implementation =====//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//=====//
//
// This file defines the interfaces that Cpu0 uses to lower LLVM code into a
// selection DAG.
//
//=====//

#define DEBUG_TYPE "cpu0-lower"
#include "Cpu0ISelLowering.h"
#include "Cpu0TargetMachine.h"
#include "llvm/CodeGen/TargetLoweringObjectFileImpl.h"
#include "Cpu0Subtarget.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/CallingConv.h"
#include "llvm/CodeGen/CallingConvLower.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/SelectionDAGISel.h"
#include "llvm/CodeGen/ValueTypes.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

Cpu0TargetLowering::
Cpu0TargetLowering(Cpu0TargetMachine &TM)
  : TargetLowering(TM, new TargetLoweringObjectFileELF()),
    Subtarget(&TM.getSubtarget<Cpu0Subtarget>()) {
}

#include "Cpu0GenCallingConv.inc"

/// LowerFormalArguments - transform physical registers into virtual registers
/// and generate load operations for arguments placed on the stack.
SDValue
Cpu0TargetLowering::LowerFormalArguments(SDValue Chain,
                                         CallingConv::ID CallConv,
                                         bool isVarArg,
                                         const SmallVectorImpl<ISD::InputArg> &Ins,
                                         DebugLoc dl, SelectionDAG &DAG,
                                         SmallVectorImpl<SDValue> &InVals)
                                         const {
  return Chain;
}

//=====//
//          Return Value Calling Convention Implementation
//=====//

SDValue
Cpu0TargetLowering::LowerReturn(SDValue Chain,
                                CallingConv::ID CallConv, bool isVarArg,
                                const SmallVectorImpl<ISD::OutputArg> &Outs,
                                const SmallVectorImpl<SDValue> &OutVals,
                                DebugLoc dl, SelectionDAG &DAG) const {

  return DAG.getNode(Cpu0ISD::Ret, dl, MVT::Other,
                     Chain, DAG.getRegister(Cpu0::LR, MVT::i32));
}

```

### LLVMBackendTutorialExampleCode/Chapter3\_1/Cpu0MachineFunction.h

```

//===== Cpu0MachineFunctionInfo.h - Private data used for Cpu0 -*- C++ -*-=====//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//=====//
//
// This file declares the Cpu0 specific subclass of MachineFunctionInfo.
//
//=====//

#ifndef CPU0_MACHINE_FUNCTION_INFO_H
#define CPU0_MACHINE_FUNCTION_INFO_H

#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include <utility>

namespace llvm {

    /// Cpu0FunctionInfo - This class is derived from MachineFunction private
    /// Cpu0 target-specific information for each MachineFunction.
    class Cpu0FunctionInfo : public MachineFunctionInfo {
        MachineFunction& MF;
        unsigned MaxCallFrameSize;

    public:
        Cpu0FunctionInfo(MachineFunction& MF)
            : MF(MF), MaxCallFrameSize(0)
        {}

        unsigned getMaxCallFrameSize() const { return MaxCallFrameSize; }
        void setMaxCallFrameSize(unsigned S) { MaxCallFrameSize = S; }
    };

} // end of namespace llvm

#endif // CPU0_MACHINE_FUNCTION_INFO_H

```

### LLVMBackendTutorialExampleCode/Chapter3\_1/Cpu0SelectionDAGInfo.h

```

//===== Cpu0SelectionDAGInfo.h - Cpu0 SelectionDAG Info -*- C++ -*-=====//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//=====//
//
// This file defines the Cpu0 subclass for TargetSelectionDAGInfo.
//
//=====//

#ifndef CPU0SELECTIONDAGINFO_H
#define CPU0SELECTIONDAGINFO_H

#include "llvm/Target/TargetSelectionDAGInfo.h"

namespace llvm {

class Cpu0TargetMachine;

class Cpu0SelectionDAGInfo : public TargetSelectionDAGInfo {
public:
    explicit Cpu0SelectionDAGInfo(const Cpu0TargetMachine &TM);
    ~Cpu0SelectionDAGInfo();
};

}

#endif
```

### LLVMBackendTutorialExampleCode/Chapter3\_1/Cpu0SelectionDAGInfo.cpp

```
//===-- Cpu0SelectionDAGInfo.cpp - Cpu0 SelectionDAG Info -----------------===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file implements the Cpu0SelectionDAGInfo class.
//
//===----------------------------------------------------------------------===//

#define DEBUG_TYPE "cpu0-selectiondag-info"
#include "Cpu0TargetMachine.h"
using namespace llvm;

Cpu0SelectionDAGInfo::Cpu0SelectionDAGInfo(const Cpu0TargetMachine &TM)
  : TargetSelectionDAGInfo(TM) {
}

Cpu0SelectionDAGInfo::~Cpu0SelectionDAGInfo() {
}
```

### LLVMBackendTutorialExampleCode/Chapter3\_1/Cpu0Subtarget.h

```
#define GET_SUBTARGETINFO_HEADER
#include "Cpu0GenSubtargetInfo.inc"
...
class Cpu0Subtarget : public Cpu0GenSubtargetInfo {
...
// Virtual function, must have
/// ParseSubtargetFeatures - Parses features string setting specified
/// subtarget options. Definition of function is auto generated by tblgen.
void ParseSubtargetFeatures(StringRef CPU, StringRef FS);
...
};
```

### LLVMBackendTutorialExampleCode/Chapter3\_1/Cpu0Subtarget.cpp

```
//===-- Cpu0Subtarget.cpp - Cpu0 Subtarget Information --------------------===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file implements the Cpu0 specific subclass of TargetSubtargetInfo.
//
//===----------------------------------------------------------------------===//

#include "Cpu0Subtarget.h"
#include "Cpu0.h"
#include "llvm/Support/TargetRegistry.h"

#define GET_SUBTARGETINFO_TARGET_DESC
#define GET_SUBTARGETINFO_CTOR
#include "Cpu0GenSubtargetInfo.inc"

using namespace llvm;

void Cpu0Subtarget::anchor() { }

Cpu0Subtarget::Cpu0Subtarget(const std::string &TT, const std::string &CPU,
                             const std::string &FS, bool little) :
  Cpu0GenSubtargetInfo(TT, CPU, FS),
  Cpu0ABI(UnknownABI), IsLittle(little)
{
  std::string CPUName = CPU;
  if (CPUName.empty())
    CPUName = "cpu032";

  // Parse features string.
  ParseSubtargetFeatures(CPUName, FS);

  // Initialize scheduling itinerary for the specified CPU.
  InstrItins = getInstrItineraryForCPU(CPUName);

  // Set Cpu0ABI if it hasn't been set yet.
  if (Cpu0ABI == UnknownABI)
    Cpu0ABI = O32;
}
```
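The constructor above picks a default CPU name and then hands the feature string to the tblgen-generated `ParseSubtargetFeatures()`. As a rough illustration only (the generated code is table-driven and differs in detail), the sketch below shows the two ideas involved: the fallback to `"cpu032"` and the interpretation of a comma-separated `+name`/`-name` feature string. `ToyFeatures`, `resolveCpuName` and `parseToyFeatures` are hypothetical names and not part of LLVM.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Hypothetical stand-in for the state a subtarget keeps after parsing.
struct ToyFeatures {
  std::string Cpu;
  std::set<std::string> Enabled;
};

// Mirrors the fallback in the Cpu0Subtarget constructor: an empty CPU name
// selects "cpu032".
inline std::string resolveCpuName(const std::string &Cpu) {
  return Cpu.empty() ? std::string("cpu032") : Cpu;
}

// Interprets a comma-separated list such as "+cpu032,-foo".
// '+' enables a feature, '-' disables it; other entries are ignored.
inline ToyFeatures parseToyFeatures(const std::string &Cpu,
                                    const std::string &FS) {
  ToyFeatures F;
  F.Cpu = resolveCpuName(Cpu);
  assert(!F.Cpu.empty() && "a CPU name is always selected");

  std::istringstream In(FS);
  std::string Item;
  while (std::getline(In, Item, ',')) {
    if (Item.size() < 2)
      continue;
    const std::string Name = Item.substr(1);
    if (Item[0] == '+')
      F.Enabled.insert(Name);
    else if (Item[0] == '-')
      F.Enabled.erase(Name);
  }
  return F;
}
```

Calling `parseToyFeatures("", "+cpu032")` would select the CPU `cpu032` and enable the single feature `cpu032`, which corresponds to what llc does for Cpu0 when no `-mcpu` option is given.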

### LLVMBackendTutorialExampleCode/Chapter3\_1/Cpu0RegisterInfo.h

```
//===-- Cpu0RegisterInfo.h - Cpu0 Register Information Impl -----*- C++ -*-===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file contains the Cpu0 implementation of the TargetRegisterInfo class.
//
//===----------------------------------------------------------------------===//

#ifndef CPU0REGISTERINFO_H
#define CPU0REGISTERINFO_H

#include "Cpu0.h"
#include "llvm/Target/TargetRegisterInfo.h"

#define GET_REGINFO_HEADER
#include "Cpu0GenRegisterInfo.inc"

namespace llvm {
class Cpu0Subtarget;
class TargetInstrInfo;
class Type;

struct Cpu0RegisterInfo : public Cpu0GenRegisterInfo {
  const Cpu0Subtarget &Subtarget;
  const TargetInstrInfo &TII;

  Cpu0RegisterInfo(const Cpu0Subtarget &Subtarget, const TargetInstrInfo &tti);

  /// getRegisterNumbering - Given the enum value for some register, e.g.
  /// Cpu0::LR, return the number that it corresponds to (e.g. 14).
  static unsigned getRegisterNumbering(unsigned RegEnum);

  /// Code Generation virtual methods...
  const uint16_t *getCalleeSavedRegs(const MachineFunction* MF = 0) const;
  const uint32_t *getCallPreservedMask(CallingConv::ID) const;

  // pure virtual method
  BitVector getReservedRegs(const MachineFunction &MF) const;

  // pure virtual method
  /// Stack Frame Processing Methods
  void eliminateFrameIndex(MachineBasicBlock::iterator II,
                           int SPAdj, unsigned FIOperandNum,
                           RegScavenger *RS = NULL) const;

  // pure virtual method
  /// Debug information queries.
  unsigned getFrameRegister(const MachineFunction &MF) const;
};

} // end namespace llvm

#endif
```

### LLVMBackendTutorialExampleCode/Chapter3\_1/Cpu0RegisterInfo.cpp

```
//===-- Cpu0RegisterInfo.cpp - CPU0 Register Information ------------------===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file contains the CPU0 implementation of the TargetRegisterInfo class.
//
//===----------------------------------------------------------------------===//

#define DEBUG_TYPE "cpu0-reg-info"

#include "Cpu0RegisterInfo.h"
#include "Cpu0.h"
#include "Cpu0Subtarget.h"
#include "Cpu0MachineFunction.h"
#include "llvm/IR/Constants.h"
#include "llvm/DebugInfo.h"
#include "llvm/IR/Type.h"
#include "llvm/IR/Function.h"
#include "llvm/CodeGen/ValueTypes.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/Target/TargetFrameLowering.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetOptions.h"
#include "llvm/Target/TargetInstrInfo.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/ADT/BitVector.h"
#include "llvm/ADT/STLExtras.h"

#define GET_REGINFO_TARGET_DESC
#include "Cpu0GenRegisterInfo.inc"

using namespace llvm;

Cpu0RegisterInfo::Cpu0RegisterInfo(const Cpu0Subtarget &ST,
                                   const TargetInstrInfo &tti)
  : Cpu0GenRegisterInfo(Cpu0::LR), Subtarget(ST), TII(tti) {}

//===----------------------------------------------------------------------===//
// Callee Saved Registers methods
//===----------------------------------------------------------------------===//
/// Cpu0 Callee Saved Registers
// In Cpu0CallConv.td,
// def CSR_O32 : CalleeSavedRegs<(add LR, FP,
//                                (sequence "S%u", 2, 0))>;
// llvm-tblgen creates CSR_O32_SaveList and CSR_O32_RegMask from the
// definition above.
const uint16_t* Cpu0RegisterInfo::
getCalleeSavedRegs(const MachineFunction *MF) const
{
  return CSR_O32_SaveList;
}

const uint32_t*
Cpu0RegisterInfo::getCallPreservedMask(CallingConv::ID) const
{
  return CSR_O32_RegMask;
}

// pure virtual method
BitVector Cpu0RegisterInfo::
getReservedRegs(const MachineFunction &MF) const {
  static const uint16_t ReservedCPURegs[] = {
    Cpu0::ZERO, Cpu0::AT, Cpu0::SP, Cpu0::LR, Cpu0::PC
  };
  BitVector Reserved(getNumRegs());
  typedef TargetRegisterClass::iterator RegIter;

  for (unsigned I = 0; I < array_lengthof(ReservedCPURegs); ++I)
    Reserved.set(ReservedCPURegs[I]);

  return Reserved;
}

// pure virtual method
// A FrameIndex represents an object inside an abstract stack.
// We must replace each FrameIndex with a direct stack/frame pointer
// reference.
void Cpu0RegisterInfo::
eliminateFrameIndex(MachineBasicBlock::iterator II, int SPAdj,
                    unsigned FIOperandNum, RegScavenger *RS) const {
}

// pure virtual method
unsigned Cpu0RegisterInfo::
getFrameRegister(const MachineFunction &MF) const {
  const TargetFrameLowering *TFI = MF.getTarget().getFrameLowering();
  return TFI->hasFP(MF) ? (Cpu0::FP) :
                          (Cpu0::SP);
}
```
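`getReservedRegs()` above builds a bit set with one bit per register and sets the bits of the registers that the register allocator must not hand out. The following stand-alone sketch shows the same pattern with `std::bitset` instead of `llvm::BitVector`. The `toy` namespace, its `Reg` enum and `isAllocatable()` are assumptions made for illustration; the real `Cpu0::` register enum values are generated by llvm-tblgen and need not match this ordering.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

namespace toy {

// Hypothetical register list in the order of the Cpu0 hardware numbering.
enum Reg {
  ZERO, AT, V0, V1, A0, A1, T9, S0, S1, S2, GP, FP, SW, SP, LR, PC,
  NumRegs
};

// Marks the registers that are never available for allocation.
inline std::bitset<NumRegs> getReservedRegs() {
  static const Reg Reserved[] = {ZERO, AT, SP, LR, PC};
  std::bitset<NumRegs> Bits;
  for (std::size_t I = 0; I < sizeof(Reserved) / sizeof(Reserved[0]); ++I)
    Bits.set(Reserved[I]);
  return Bits;
}

// A register may be allocated only if its reserved bit is clear.
inline bool isAllocatable(Reg R) {
  assert(R < NumRegs && "not a register");
  return !getReservedRegs().test(R);
}

} // namespace toy
```

With this table, `toy::isAllocatable(toy::SP)` is false while `toy::isAllocatable(toy::V0)` is true.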

### LLVMBackendTutorialExampleCode/Chapter3\_1/Cpu0TargetMachine.h

```
//===-- Cpu0TargetMachine.h - Define TargetMachine for Cpu0 -----*- C++ -*-===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file declares the Cpu0 specific subclass of TargetMachine.
//
//===----------------------------------------------------------------------===//

#ifndef CPU0TARGETMACHINE_H
#define CPU0TARGETMACHINE_H

#include "Cpu0FrameLowering.h"
#include "Cpu0InstrInfo.h"
#include "Cpu0ISelLowering.h"
#include "Cpu0SelectionDAGInfo.h"
#include "Cpu0Subtarget.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/Target/TargetFrameLowering.h"

namespace llvm {
  class formatted_raw_ostream;

  class Cpu0TargetMachine : public LLVMTargetMachine {
    Cpu0Subtarget Subtarget;
    const DataLayout DL; // Calculates type size & alignment
    Cpu0InstrInfo InstrInfo; // Instructions
    Cpu0FrameLowering FrameLowering; // Stack(Frame) and Stack direction
    Cpu0TargetLowering TLInfo; // Stack(Frame) and Stack direction
    Cpu0SelectionDAGInfo TSInfo; // Map .bc DAG to backend DAG

  public:
    Cpu0TargetMachine(const Target &T, StringRef TT,
                      StringRef CPU, StringRef FS, const TargetOptions &Options,
                      Reloc::Model RM, CodeModel::Model CM,
                      CodeGenOpt::Level OL,
                      bool isLittle);

    virtual const Cpu0InstrInfo *getInstrInfo() const
    { return &InstrInfo; }
    virtual const TargetFrameLowering *getFrameLowering() const
    { return &FrameLowering; }
    virtual const Cpu0Subtarget *getSubtargetImpl() const
    { return &Subtarget; }
    virtual const DataLayout *getDataLayout() const
    { return &DL; }

    virtual const Cpu0RegisterInfo *getRegisterInfo() const {
      return &InstrInfo.getRegisterInfo();
    }

    virtual const Cpu0TargetLowering *getTargetLowering() const {
      return &TLInfo;
    }

    virtual const Cpu0SelectionDAGInfo* getSelectionDAGInfo() const {
      return &TSInfo;
    }

    // Pass Pipeline Configuration
    virtual TargetPassConfig *createPassConfig(PassManagerBase &PM);
  };

  /// Cpu0ebTargetMachine - Cpu032 big endian target machine.
  ///
  class Cpu0ebTargetMachine : public Cpu0TargetMachine {
    virtual void anchor();
  public:
    Cpu0ebTargetMachine(const Target &T, StringRef TT,
                        StringRef CPU, StringRef FS, const TargetOptions &Options,
                        Reloc::Model RM, CodeModel::Model CM,
                        CodeGenOpt::Level OL);
  };

  /// Cpu0elTargetMachine - Cpu032 little endian target machine.
  ///
  class Cpu0elTargetMachine : public Cpu0TargetMachine {
    virtual void anchor();
  public:
    Cpu0elTargetMachine(const Target &T, StringRef TT,
                        StringRef CPU, StringRef FS, const TargetOptions &Options,
                        Reloc::Model RM, CodeModel::Model CM,
                        CodeGenOpt::Level OL);
  };
} // End llvm namespace

#endif
```

### LLVMBackendTutorialExampleCode/Chapter3\_1/Cpu0TargetMachine.cpp

```
//===-- Cpu0TargetMachine.cpp - Define TargetMachine for Cpu0 -------------===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// Implements the info about Cpu0 target spec.
//
//===----------------------------------------------------------------------===//

#include "Cpu0TargetMachine.h"
#include "Cpu0.h"
#include "llvm/PassManager.h"
#include "llvm/CodeGen/Passes.h"
#include "llvm/Support/TargetRegistry.h"
using namespace llvm;

extern "C" void LLVMInitializeCpu0Target() {
  // Register the target.
  // Big endian Target Machine
  RegisterTargetMachine<Cpu0ebTargetMachine> X(TheCpu0Target);
  // Little endian Target Machine
  RegisterTargetMachine<Cpu0elTargetMachine> Y(TheCpu0elTarget);
}

// DataLayout --> Big-endian, 32-bit pointer/ABI/alignment
// The stack is always 8 byte aligned
// On function prologue, the stack is created by decrementing
// its pointer. Once decremented, all references are done with positive
// offset from the stack/frame pointer, using StackGrowsUp enables
// an easier handling.
// Using CodeModel::Large enables different CALL behavior.
Cpu0TargetMachine::
Cpu0TargetMachine(const Target &T, StringRef TT,
                  StringRef CPU, StringRef FS, const TargetOptions &Options,
                  Reloc::Model RM, CodeModel::Model CM,
                  CodeGenOpt::Level OL,
                  bool isLittle)
  //- Default is big endian
  : LLVMTargetMachine(T, TT, CPU, FS, Options, RM, CM, OL),
    Subtarget(TT, CPU, FS, isLittle),
    DL(isLittle ?
       ("e-p:32:32:32-i8:8:32-i16:16:32-i64:64:64-n32") :
       ("E-p:32:32:32-i8:8:32-i16:16:32-i64:64:64-n32")),
    InstrInfo(*this),
    FrameLowering(Subtarget),
    TLInfo(*this), TSInfo(*this) {
}

void Cpu0ebTargetMachine::anchor() { }

Cpu0ebTargetMachine::
Cpu0ebTargetMachine(const Target &T, StringRef TT,
                    StringRef CPU, StringRef FS, const TargetOptions &Options,
                    Reloc::Model RM, CodeModel::Model CM,
                    CodeGenOpt::Level OL)
  : Cpu0TargetMachine(T, TT, CPU, FS, Options, RM, CM, OL, false) {}

void Cpu0elTargetMachine::anchor() { }

Cpu0elTargetMachine::
Cpu0elTargetMachine(const Target &T, StringRef TT,
                    StringRef CPU, StringRef FS, const TargetOptions &Options,
                    Reloc::Model RM, CodeModel::Model CM,
                    CodeGenOpt::Level OL)
  : Cpu0TargetMachine(T, TT, CPU, FS, Options, RM, CM, OL, true) {}

namespace {
/// Cpu0 Code Generator Pass Configuration Options.
class Cpu0PassConfig : public TargetPassConfig {
public:
  Cpu0PassConfig(Cpu0TargetMachine *TM, PassManagerBase &PM)
    : TargetPassConfig(TM, PM) {}

  Cpu0TargetMachine &getCpu0TargetMachine() const {
    return getTM<Cpu0TargetMachine>();
  }

  const Cpu0Subtarget &getCpu0Subtarget() const {
    return *getCpu0TargetMachine().getSubtargetImpl();
  }
};
} // namespace

TargetPassConfig *Cpu0TargetMachine::createPassConfig(PassManagerBase &PM) {
  return new Cpu0PassConfig(this, PM);
}
```
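The `DL` member is initialized from a layout description string whose fields are separated by `-`: a leading `e` or `E` selects little or big endian, `p:32:32:32` describes the pointer size and alignments, and `n32` names the native integer width. The sketch below parses just those three fields to make the format concrete. It is a simplified reading aid and not how `llvm::DataLayout` is implemented; `ToyLayout` and `parseToyLayout` are hypothetical names, and the alignment fields are deliberately ignored.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical summary of the few fields this sketch understands.
struct ToyLayout {
  bool LittleEndian;
  unsigned PointerBits;
  unsigned NativeIntBits;
  ToyLayout() : LittleEndian(false), PointerBits(0), NativeIntBits(0) {}
};

// Splits the description at '-' and inspects each specifier.
inline ToyLayout parseToyLayout(const std::string &Desc) {
  ToyLayout L;
  std::istringstream In(Desc);
  std::string Spec;
  while (std::getline(In, Spec, '-')) {
    if (Spec == "e")
      L.LittleEndian = true;
    else if (Spec == "E")
      L.LittleEndian = false;
    else if (Spec.compare(0, 2, "p:") == 0)
      // "p:<size>:<abi>:<pref>": only the size is read here.
      L.PointerBits =
          static_cast<unsigned>(std::strtoul(Spec.c_str() + 2, NULL, 10));
    else if (Spec.size() > 1 && Spec[0] == 'n')
      // "n<width>": native integer width.
      L.NativeIntBits =
          static_cast<unsigned>(std::strtoul(Spec.c_str() + 1, NULL, 10));
  }
  return L;
}
```

Applied to the two strings in the constructor, the sketch reports a 32-bit pointer and a 32-bit native integer in both cases, and differs only in the endianness flag.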

`cmake_debug_build/lib/Target/Cpu0/Cpu0GenInstrInfo.inc`

```
//- Cpu0GenInstrInfo.inc, which is generated from Cpu0InstrInfo.td
#ifdef GET_INSTRINFO_HEADER
#undef GET_INSTRINFO_HEADER
namespace llvm {
struct Cpu0GenInstrInfo : public TargetInstrInfoImpl {
  explicit Cpu0GenInstrInfo(int SO = -1, int DO = -1);
};
} // End llvm namespace
#endif // GET_INSTRINFO_HEADER

//- Cpu0InstrInfo.h
#define GET_INSTRINFO_HEADER
#include "Cpu0GenInstrInfo.inc"
class Cpu0InstrInfo : public Cpu0GenInstrInfo {
  Cpu0TargetMachine &TM;
public:
  explicit Cpu0InstrInfo(Cpu0TargetMachine &TM);
};
```



Figure 3.1: TargetMachine class diagram 1

The inheritance chain of `Cpu0TargetMachine` is `TargetMachine <- LLVMTargetMachine <- Cpu0TargetMachine`.

Cpu0TargetMachine contains objects of the classes Cpu0Subtarget, Cpu0InstrInfo, Cpu0FrameLowering, Cpu0TargetLowering and Cpu0SelectionDAGInfo. These classes inherit from the parent classes TargetSubtargetInfo, TargetInstrInfo, TargetFrameLowering, TargetLowering and TargetSelectionDAGInfo, respectively.

Figure 3.1 shows the inheritance tree of Cpu0TargetMachine and of its Cpu0InstrInfo class. Cpu0TargetMachine contains Cpu0InstrInfo among other classes, and Cpu0InstrInfo in turn contains a Cpu0RegisterInfo member named RI. From Cpu0InstrInfo.td and Cpu0RegisterInfo.td, llvm-tblgen generates Cpu0GenInstrInfo.inc and Cpu0GenRegisterInfo.inc, which implement some of the member functions of Cpu0InstrInfo and Cpu0RegisterInfo.

Figure 3.2 below shows that Cpu0TargetMachine contains the members TSInfo (Cpu0SelectionDAGInfo), FrameLowering (Cpu0FrameLowering), Subtarget (Cpu0Subtarget) and TLInfo (Cpu0TargetLowering).



Figure 3.2: TargetMachine class diagram 2
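The containment relation in the two class diagrams can be summarized in a few lines of C++. The sketch below is a heavily reduced model and not the LLVM class hierarchy: every `Toy*` name is hypothetical, and only the ownership pattern (components held by value, exposed through virtual getters that return pointers) is taken from `Cpu0TargetMachine.h`.

```cpp
#include <cassert>

// All Toy* names are hypothetical; they only mimic the shape of the classes.
struct ToyRegisterInfo {};

class ToyInstrInfo {
  ToyRegisterInfo RI; // owned by the instruction info, like Cpu0InstrInfo's RI
public:
  const ToyRegisterInfo &getRegisterInfo() const { return RI; }
};

struct ToySubtarget {
  bool IsLittle;
  explicit ToySubtarget(bool Little) : IsLittle(Little) {}
};

// Generic interface: clients ask the machine for its components.
class ToyTargetMachineBase {
public:
  virtual ~ToyTargetMachineBase() {}
  virtual const ToyInstrInfo *getInstrInfo() const = 0;
  virtual const ToySubtarget *getSubtargetImpl() const = 0;
  virtual const ToyRegisterInfo *getRegisterInfo() const = 0;
};

// Concrete machine: owns its components by value and hands out pointers.
class ToyTargetMachine : public ToyTargetMachineBase {
  ToySubtarget Subtarget;
  ToyInstrInfo InstrInfo;
public:
  explicit ToyTargetMachine(bool IsLittle) : Subtarget(IsLittle) {}
  virtual const ToyInstrInfo *getInstrInfo() const { return &InstrInfo; }
  virtual const ToySubtarget *getSubtargetImpl() const { return &Subtarget; }
  // The register info is reached through the instruction info.
  virtual const ToyRegisterInfo *getRegisterInfo() const {
    return &InstrInfo.getRegisterInfo();
  }
};
```

Target-independent code only sees the base interface, so it can query any backend in the same way; each backend decides which concrete objects stand behind the getters.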

Figure 3.3 shows some members and operations (member functions) of the parent class TargetMachine. Figure 3.4 below shows some members of the classes InstrInfo, RegisterInfo and TargetLowering. Class DAGInfo is skipped here.

Thanks to this inheritance structure, we only need to implement a small amount of code in the instruction, frame/stack and DAG selection classes; much of the code is already implemented by their parent classes. llvm-tblgen generates Cpu0GenInstrInfo.inc from Cpu0InstrInfo.td. Cpu0InstrInfo.h extracts the code it needs from Cpu0GenInstrInfo.inc by writing "#define GET\_INSTRINFO\_HEADER" before including the file. The following code fragment is from Cpu0GenInstrInfo.inc. The code between "#ifdef GET\_INSTRINFO\_HEADER" and "#endif // GET\_INSTRINFO\_HEADER" is the part that Cpu0InstrInfo.h extracts.



Figure 3.3: TargetMachine members and operators



Figure 3.4: Other class members and operators

### cmake\_debug\_build/lib/Target/Cpu0/Cpu0GenInstrInfo.inc

```
//- Cpu0GenInstrInfo.inc, which is generated from Cpu0InstrInfo.td
#ifdef GET_INSTRINFO_HEADER
#undef GET_INSTRINFO_HEADER
namespace llvm {
struct Cpu0GenInstrInfo : public TargetInstrInfoImpl {
    explicit Cpu0GenInstrInfo(int SO = -1, int DO = -1);
};
} // End llvm namespace
#endif // GET_INSTRINFO_HEADER
```
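The selection mechanism can be tried out without LLVM. In the sketch below the text of a generated file is pasted inline between two marker comments instead of being pulled in with `#include`, so that the example fits in one file; `ToyGenInfo`, `ToyInstrInfo`, `ToyOpcodeNames` and the `GET_TOYINFO_*` macros are invented for illustration. It assumes the `#ifdef`/`#undef` form of the guard.

```cpp
#include <cassert>

// The consumer selects a section by defining its guard macro before the
// (here simulated) #include of the generated file.
#define GET_TOYINFO_HEADER

// ---- begin: text that would normally live in ToyGenInfo.inc ----
#ifdef GET_TOYINFO_HEADER
#undef GET_TOYINFO_HEADER
struct ToyGenInfo {
  explicit ToyGenInfo(int SO = -1, int DO = -1) : SetupOp(SO), DestroyOp(DO) {}
  int SetupOp;
  int DestroyOp;
};
#endif // GET_TOYINFO_HEADER

#ifdef GET_TOYINFO_DESC
#undef GET_TOYINFO_DESC
// Skipped in this translation unit because GET_TOYINFO_DESC is not defined.
static const char *const ToyOpcodeNames[] = {"add", "sub"};
#endif // GET_TOYINFO_DESC
// ---- end of simulated ToyGenInfo.inc ----

// Only the HEADER section was compiled in, so ToyGenInfo is visible here
// while ToyOpcodeNames is not.
struct ToyInstrInfo : public ToyGenInfo {
  ToyInstrInfo() : ToyGenInfo(10, 11) {}
};
```

Because each section undefines its own guard, the same generated file can be included several times in one translation unit, each time with a different macro defined, and every inclusion contributes exactly one section.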

For reference, see the "Writing an LLVM Backend" web site<sup>1</sup>.

Now, the code in Chapter3\_1/ adds the classes Cpu0TargetMachine (Cpu0TargetMachine.h and .cpp), Cpu0Subtarget (Cpu0Subtarget.h and .cpp), Cpu0InstrInfo (Cpu0InstrInfo.h and .cpp), Cpu0FrameLowering (Cpu0FrameLowering.h and .cpp), Cpu0TargetLowering (Cpu0ISelLowering.h and .cpp) and Cpu0SelectionDAGInfo (Cpu0SelectionDAGInfo.h and .cpp). CMakeLists.txt is modified to list the newly added \*.cpp files as follows,

### LLVMBackendTutorialExampleCode/Chapter3\_1/CMakeLists.txt

```
# Cpu0CodeGen should match with LLVMBuild.txt Cpu0CodeGen
add_llvm_target(Cpu0CodeGen
    Cpu0ISelLowering.cpp
    Cpu0InstrInfo.cpp
    Cpu0FrameLowering.cpp
    Cpu0RegisterInfo.cpp
    Cpu0Subtarget.cpp
    Cpu0TargetMachine.cpp
    Cpu0SelectionDAGInfo.cpp
)
```

Please take a look at the Chapter3\_1 code. After that, build Chapter3\_1 with make as in chapter 2 (of course, you should first remove the old contents of src/lib/Target/Cpu0 and replace them with those of src/lib/Target/Cpu0/LLVMBackendTutorialExampleCode/Chapter3\_1/). You can remove cmake\_debug\_build/lib/Target/Cpu0/\*.inc before running “make” to ensure that your code is rebuilt completely. Once the \*.inc files are removed, every file that includes them is rebuilt and the Target library is regenerated. The command is as follows,

```
118-165-78-230:cmake_debug_build Jonathan$ rm -rf lib/Target/Cpu0/*
```

Now, let's build Chapter3\_1 with the following commands,

```
118-165-75-57:ExampleCode Jonathan$ pwd
/Users/Jonathan/llvm/test/src/lib/Target/Cpu0/LLVMBackendTutorialExampleCode
118-165-75-57:LLVMBackendTutorialExampleCode Jonathan$ sh removecpu0.sh
118-165-75-57:LLVMBackendTutorialExampleCode Jonathan$ cp -rf Chapter3_1/* ../.
```

```
118-165-78-230:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm ch3.bc -o ch3.cpu0.s
Assertion failed: (AsmInfo && "MCAsmInfo not initialized."
...
```

The error says that the target does not have an AsmPrinter yet. Let's add it in the next section.
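One way to picture why the assertion fires: a target record holds factory callbacks for its components, and a callback that was never registered stays null, so the lookup returns a null pointer that a later assertion catches. The sketch below models this with a single callback. It is an assumption-laden illustration and not the `TargetRegistry` implementation; `ToyTarget`, `ToyAsmInfo` and `registerToyAsmInfo` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of assembler syntax properties.
struct ToyAsmInfo {
  const char *CommentString;
};

// Hypothetical target record with one optional factory callback.
struct ToyTarget {
  typedef ToyAsmInfo *(*AsmInfoCtor)();
  AsmInfoCtor AsmInfoCtorFn;

  ToyTarget() : AsmInfoCtorFn(NULL) {}

  // Returns NULL when the backend has not registered a factory yet.
  ToyAsmInfo *createAsmInfo() const {
    return AsmInfoCtorFn ? AsmInfoCtorFn() : NULL;
  }
};

// Factory a backend would provide once it implements the component.
inline ToyAsmInfo *createToyAsmInfo() {
  static ToyAsmInfo Info = {"#"};
  return &Info;
}

// Registration step, normally run from an initialization function.
inline void registerToyAsmInfo(ToyTarget &T) {
  T.AsmInfoCtorFn = createToyAsmInfo;
}
```

Before `registerToyAsmInfo()` runs, `createAsmInfo()` yields a null pointer; afterwards it yields a usable object. The next section adds the corresponding registrations for Cpu0.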

---

<sup>1</sup> <http://llvm.org/docs/WritingAnLLVMBackend.html#target-machine>

## 3.2 Add AsmPrinter

Chapter3\_2/cpu0 contains the Cpu0AsmPrinter definition. First, we add definitions to Cpu0.td to support the assembly writer. The following fragment is added to Cpu0.td,

### LLVMBackendTutorialExampleCode/Chapter3\_2/Cpu0.td

```
// Without this, llc reports the error: 'cpu032' is not a recognized
// processor for this target (ignoring processor)
//===----------------------------------------------------------------------===//
// Cpu0 Subtarget features
//===----------------------------------------------------------------------===//

def FeatureCpu032 : SubtargetFeature<"cpu032", "Cpu0ArchVersion", "Cpu032",
                                     "Cpu032 ISA Support">;

//===----------------------------------------------------------------------===//
// Cpu0 processors supported.
//===----------------------------------------------------------------------===//

class Proc<string Name, list<SubtargetFeature> Features>
  : Processor<Name, Cpu0GenericItineraries, Features>;

def : Proc<"cpu032", [FeatureCpu032]>;

def Cpu0AsmWriter : AsmWriter {
  string AsmWriterClassName = "InstPrinter";
  bit isMCAsmWriter = 1;
}

// Will generate Cpu0GenAsmWriter.inc included by Cpu0InstPrinter.cpp, contents
// as follows,
// void Cpu0InstPrinter::printInstruction(const MCInst *MI, raw_ostream &O) {...}
// const char *Cpu0InstPrinter::getRegisterName(unsigned RegNo) {...}
def Cpu0 : Target {
// def Cpu0InstrInfo : InstrInfo as before.
  let InstructionSet = Cpu0InstrInfo;
  let AssemblyWriters = [Cpu0AsmWriter];
}
```

As the comments indicate, llvm-tblgen generates Cpu0GenAsmWriter.inc, which is included by Cpu0InstPrinter.cpp. Cpu0GenAsmWriter.inc contains the implementation of Cpu0InstPrinter::printInstruction() and Cpu0InstPrinter::getRegisterName(). Both functions are generated automatically from the information we defined in Cpu0InstrInfo.td and Cpu0RegisterInfo.td. To make these two functions work in our code, the only thing we need to do is add a class Cpu0InstPrinter and include the generated file, as we did in Chapter3\_1.

The file Chapter3\_1/Cpu0/InstPrinter/Cpu0InstPrinter.cpp includes Cpu0GenAsmWriter.inc and calls the auto-generated functions as shown in the last section.

Next, add Cpu0MCInstLower (Cpu0MCInstLower.h, Cpu0MCInstLower.cpp), as well as Cpu0BaseInfo.h, Cpu0FixupKinds.h and Cpu0MCAsmInfo (Cpu0MCAsmInfo.h, Cpu0MCAsmInfo.cpp) in the sub-directory MCTargetDesc, as follows,

### LLVMBackendTutorialExampleCode/Chapter3\_2/Cpu0MCInstLower.h

```
//===-- Cpu0MCInstLower.h - Lower MachineInstr to MCInst --------*- C++ -*-===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//

#ifndef CPU0MCINSTLOWER_H
#define CPU0MCINSTLOWER_H
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/Support/Compiler.h"

namespace llvm {
  class MCContext;
  class MCInst;
  class MCOperand;
  class MachineInstr;
  class MachineFunction;
  class Mangler;
  class Cpu0AsmPrinter;

  /// Cpu0MCInstLower - This class is used to lower a MachineInstr into an
  /// MCInst.
  class LLVM_LIBRARY_VISIBILITY Cpu0MCInstLower {
    typedef MachineOperand::MachineOperandType MachineOperandType;
    MCContext *Ctx;
    Mangler *Mang;
    Cpu0AsmPrinter &AsmPrinter;
  public:
    Cpu0MCInstLower(Cpu0AsmPrinter &asmprinter);
    void Initialize(Mangler *mang, MCContext* C);
    void Lower(const MachineInstr *MI, MCInst &OutMI) const;
  private:
    MCOperand LowerSymbolOperand(const MachineOperand &MO,
                                 MachineOperandType MOTy, unsigned Offset) const;
    MCOperand LowerOperand(const MachineOperand& MO, unsigned offset = 0) const;
  };
}

#endif
```

### LLVMBackendTutorialExampleCode/Chapter3\_2/Cpu0MCInstLower.cpp

```
//===-- Cpu0MCInstLower.cpp - Convert Cpu0 MachineInstr to MCInst ---------===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
// This file contains code to lower Cpu0 MachineInstrs to their corresponding
// MCInst records.
//
//===----------------------------------------------------------------------===//


#include "Cpu0MCInstLower.h"
#include "Cpu0AsmPrinter.h"
#include "Cpu0InstrInfo.h"
#include "MCTargetDesc/Cpu0BaseInfo.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCInst.h"
#include "llvm/Target/Mangler.h"

using namespace llvm;

Cpu0MCInstLower::Cpu0MCInstLower(Cpu0AsmPrinter &asmprinter)
  : AsmPrinter(asmprinter) {}

void Cpu0MCInstLower::Initialize(Mangler *M, MCContext* C) {
  Mang = M;
  Ctx = C;
}

MCOperand Cpu0MCInstLower::LowerSymbolOperand(const MachineOperand &MO,
                                              MachineOperandType MOTy,
                                              unsigned Offset) const {
  MCSymbolRefExpr::VariantKind Kind;
  const MCSymbol *Symbol;

  switch (MO.getTargetFlags()) {
  default: llvm_unreachable("Invalid target flag!");
  }

  switch (MOTy) {
  case MachineOperand::MO_GlobalAddress:
    Symbol = Mang->getSymbol(MO.getGlobal());
    break;

  default:
    llvm_unreachable("<unknown operand type>");
  }

  const MCSymbolRefExpr *MCSym = MCSymbolRefExpr::Create(Symbol, Kind, *Ctx);

  if (!Offset)
    return MCOperand::CreateExpr(MCSym);

  // Assume offset is never negative.
  assert(Offset > 0);

  const MCConstantExpr *OffsetExpr = MCConstantExpr::Create(Offset, *Ctx);
  const MCBinaryExpr *AddExpr = MCBinaryExpr::CreateAdd(MCSym, OffsetExpr, *Ctx);
  return MCOperand::CreateExpr(AddExpr);
}

MCOperand Cpu0MCInstLower::LowerOperand(const MachineOperand& MO,
                                        unsigned offset) const {
  MachineOperandType MOTy = MO.getType();

  switch (MOTy) {
  default: llvm_unreachable("unknown operand type");
  case MachineOperand::MO_Register:
    // Ignore all implicit register operands.
    if (MO.isImplicit()) break;
    return MCOperand::CreateReg(MO.getReg());
  case MachineOperand::MO_Immediate:
    return MCOperand::CreateImm(MO.getImm() + offset);
  case MachineOperand::MO_RegisterMask:
    break;
  }

  return MCOperand();
}

void Cpu0MCInstLower::Lower(const MachineInstr *MI, MCInst &OutMI) const {
  OutMI.setOpcode(MI->getOpcode());

  for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
    const MachineOperand &MO = MI->getOperand(i);
    MCOperand MCOp = LowerOperand(MO);

    if (MCOp.isValid())
      OutMI.addOperand(MCOp);
  }
}
```
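The core of `Lower()` is a filter: each machine operand is converted, and operands that have no counterpart at the MC level (implicit registers, register masks) come back invalid and are dropped. The sketch below reproduces that control flow with plain structs so it can be compiled on its own. `ToyMachineOperand`, `ToyMCOperand`, `lowerToyOperand` and `lowerToyInstr` are hypothetical simplifications and not LLVM types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, much-reduced operand types.
struct ToyMachineOperand {
  enum Kind { Register, Immediate, RegisterMask };
  Kind Ty;
  long Value;    // register number or immediate value
  bool Implicit; // only meaningful for registers
};

struct ToyMCOperand {
  bool Valid;
  long Value;
};

// Returns an invalid operand for anything that should not be emitted.
inline ToyMCOperand lowerToyOperand(const ToyMachineOperand &MO) {
  ToyMCOperand Out = {false, 0};
  switch (MO.Ty) {
  case ToyMachineOperand::Register:
    if (MO.Implicit)
      break; // implicit registers are neither printed nor encoded
    Out.Valid = true;
    Out.Value = MO.Value;
    break;
  case ToyMachineOperand::Immediate:
    Out.Valid = true;
    Out.Value = MO.Value;
    break;
  case ToyMachineOperand::RegisterMask:
    break; // carries liveness information only
  }
  return Out;
}

// Keeps only the operands that lowered to something valid.
inline std::vector<ToyMCOperand>
lowerToyInstr(const std::vector<ToyMachineOperand> &Ops) {
  std::vector<ToyMCOperand> Out;
  for (std::size_t I = 0, E = Ops.size(); I != E; ++I) {
    const ToyMCOperand MCOp = lowerToyOperand(Ops[I]);
    if (MCOp.Valid)
      Out.push_back(MCOp);
  }
  assert(Out.size() <= Ops.size() && "lowering never adds operands");
  return Out;
}
```

An instruction with one explicit register, one implicit register, one immediate and one register mask therefore lowers to two operands, in the original order.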

### LLVMBackendTutorialExampleCode/Chapter3\_2/MCTargetDesc/Cpu0BaseInfo.h

```
//===-- Cpu0BaseInfo.h - Top level definitions for CPU0 MC ------*- C++ -*-===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file contains small standalone helper functions and enum definitions for
// the Cpu0 target useful for the compiler back-end and the MC libraries.
//
//===----------------------------------------------------------------------===//
#ifndef CPU0BASEINFO_H
#define CPU0BASEINFO_H

#include "Cpu0FixupKinds.h"
#include "Cpu0MCTargetDesc.h"
#include "llvm/MC/MCExpr.h"
#include "llvm/Support/DataTypes.h"
#include "llvm/Support/ErrorHandling.h"

namespace llvm {

/// Cpu0II - This namespace holds all of the target specific flags that
/// instruction info tracks.
///
namespace Cpu0II {
  /// Target Operand Flag enum.
  enum {
    //===------------------------------------------------------------------===//
    /// Instruction encodings. These are the standard/most common forms for
    /// Cpu0 instructions.
    ///

    /// Pseudo - This represents an instruction that is a pseudo instruction
    /// or one that has not been implemented yet. It is illegal to code generate
    /// it, but tolerated for intermediate implementation stages.
    Pseudo = 0,

    /// FrmR - This form is for instructions of the format R.
    FrmR = 1,
    /// FrmI - This form is for instructions of the format I.
    FrmI = 2,
    /// FrmJ - This form is for instructions of the format J.
    FrmJ = 3,
    /// FrmOther - This form is for instructions that have no specific format.
    FrmOther = 4,

    FormMask = 15
  };
}

/// getCpu0RegisterNumbering - Given the enum value for some register,
/// return the number that it corresponds to.
inline static unsigned getCpu0RegisterNumbering(unsigned RegEnum)
{
  switch (RegEnum) {
  case Cpu0::ZERO:
    return 0;
  case Cpu0::AT:
    return 1;
  case Cpu0::V0:
    return 2;
  case Cpu0::V1:
    return 3;
  case Cpu0::A0:
    return 4;
  case Cpu0::A1:
    return 5;
  case Cpu0::T9:
    return 6;
  case Cpu0::S0:
    return 7;
  case Cpu0::S1:
    return 8;
  case Cpu0::S2:
    return 9;
  case Cpu0::GP:
    return 10;
  case Cpu0::FP:
    return 11;
  case Cpu0::SW:
    return 12;
  case Cpu0::SP:
    return 13;
  case Cpu0::LR:
    return 14;
  case Cpu0::PC:
    return 15;
  default: llvm_unreachable("Unknown register number!");
  }
}

} // namespace llvm

#endif
```
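`FormMask = 15` selects the low four bits of a flags word, which is enough room for the five format values defined in `Cpu0II`. The sketch below shows how such a mask is typically applied. It assumes, by analogy with other LLVM targets, that the format is stored in the low bits of a per-instruction flags word; whether and where Cpu0 stores it is not shown in this section. The `toy` namespace, `getFormat` and `getFormatName` are hypothetical.

```cpp
#include <cassert>
#include <stdint.h>

namespace toy {

// Same values as the Cpu0II enum in Cpu0BaseInfo.h.
enum { Pseudo = 0, FrmR = 1, FrmI = 2, FrmJ = 3, FrmOther = 4, FormMask = 15 };

// Assumption: the format occupies the low four bits of the flags word.
inline unsigned getFormat(uint64_t TSFlags) {
  const unsigned Form = static_cast<unsigned>(TSFlags & FormMask);
  assert(Form <= 15 && "mask keeps four bits");
  return Form;
}

// Human-readable name for a format value, for debugging output.
inline const char *getFormatName(unsigned Form) {
  switch (Form) {
  case Pseudo:   return "Pseudo";
  case FrmR:     return "FrmR";
  case FrmI:     return "FrmI";
  case FrmJ:     return "FrmJ";
  case FrmOther: return "FrmOther";
  default:       return "unknown";
  }
}

} // namespace toy
```

For a flags word of `0x32`, the upper bits are masked away and the remaining value `2` identifies the I format.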

### LLVMBackendTutorialExampleCode/Chapter3\_2/MCTargetDesc/Cpu0FixupKinds.h

```
//===-- Cpu0FixupKinds.h - Cpu0 Specific Fixup Entries ----------*- C++ -*-===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_CPU0_CPU0FIXUPKINDS_H
#define LLVM_CPU0_CPU0FIXUPKINDS_H

#include "llvm/MC/MCFixup.h"

namespace llvm {
namespace Cpu0 {
  // Although most of the current fixup types reflect a unique relocation
  // one can have multiple fixup types for a given relocation and thus need
  // to be uniquely named.
  //
  // This table *must* be in the same order of
  // MCFixupKindInfo Infos[Cpu0::NumTargetFixupKinds]
  // in Cpu0AsmBackend.cpp.
  //
  enum Fixups {
    // Branch fixups resulting in R_CPU0_16.
    fixup_Cpu0_16 = FirstTargetFixupKind,

    // Marker
    LastTargetFixupKind,
    NumTargetFixupKinds = LastTargetFixupKind - FirstTargetFixupKind
  };
} // namespace Cpu0
} // namespace llvm


#endif // LLVM_CPU0_CPU0FIXUPKINDS_H
```
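The comment in the header explains why the enum order matters: target fixup kinds are numbered from `FirstTargetFixupKind` upwards, and a description table is indexed with `Kind - FirstTargetFixupKind`. The sketch below shows that indexing scheme in isolation. The value 128 for `FirstTargetFixupKind`, the table contents and the names in the `toy` namespace are assumptions for illustration and are not taken from Cpu0AsmBackend.cpp.

```cpp
#include <cassert>
#include <cstddef>

namespace toy {

// Assumed value; generic kinds occupy the range below it.
enum { FirstTargetFixupKind = 128 };

enum Fixups {
  fixup_Toy_16 = FirstTargetFixupKind,

  // Marker
  LastTargetFixupKind,
  NumTargetFixupKinds = LastTargetFixupKind - FirstTargetFixupKind
};

struct FixupInfo {
  const char *Name;
  unsigned Offset; // bit offset inside the instruction (illustrative)
  unsigned Bits;   // width of the patched field (illustrative)
};

// The table must list the entries in the same order as the enum above.
inline const FixupInfo &getFixupInfo(unsigned Kind) {
  static const FixupInfo Infos[NumTargetFixupKinds] = {
    { "fixup_Toy_16", 0, 16 }
  };
  const unsigned First = FirstTargetFixupKind;
  assert(Kind >= First && Kind - First < unsigned(NumTargetFixupKinds) &&
         "not a target fixup kind");
  return Infos[Kind - First];
}

} // namespace toy
```

Adding a new fixup kind then means adding one enumerator before the marker and one table row at the matching position; `NumTargetFixupKinds` follows automatically.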

### LLVMBackendTutorialExampleCode/Chapter3\_2/MCTargetDesc/Cpu0MCAsmInfo.h

```
//===-- Cpu0MCAsmInfo.h - Cpu0 Asm Info ------------------------*- C++ -*--===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file contains the declaration of the Cpu0MCAsmInfo class.
//
//===----------------------------------------------------------------------===//

#ifndef CPU0TARGETASMINFO_H
#define CPU0TARGETASMINFO_H

#include "llvm/MC/MCAsmInfo.h"

namespace llvm {
  class StringRef;
  class Target;

  class Cpu0MCAsmInfo : public MCAsmInfo {
    virtual void anchor();
  public:
    explicit Cpu0MCAsmInfo(const Target &T, StringRef TT);
  };

} // namespace llvm

#endif
```

### LLVMBackendTutorialExampleCode/Chapter3\_2/MCTargetDesc/Cpu0MCAsmInfo.cpp

```
//===-- Cpu0MCAsmInfo.cpp - Cpu0 Asm Properties ---------------------------===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file contains the declarations of the Cpu0MCAsmInfo properties.
//
//===----------------------------------------------------------------------===//

#include "Cpu0MCAsmInfo.h"
#include "llvm/ADT/Triple.h"

using namespace llvm;

void Cpu0MCAsmInfo::anchor() { }

Cpu0MCAsmInfo::Cpu0MCAsmInfo(const Target &T, StringRef TT) {
  Triple TheTriple(TT);
  if ((TheTriple.getArch() == Triple::cpu0))
    IsLittleEndian = false;

  AlignmentIsInBytes = false;
  Data16bitsDirective = "\t.2byte\t";
  Data32bitsDirective = "\t.4byte\t";
  Data64bitsDirective = "\t.8byte\t";
  PrivateGlobalPrefix = "$";
  CommentString = "#";
  ZeroDirective = "\t.space\t";
  GPRel32Directive = "\t.gpword\t";
  GPRel64Directive = "\t.gpdword\t";
  WeakRefDirective = "\t.weak\t";

  SupportsDebugInformation = true;
  ExceptionsType = ExceptionHandling::DwarfCFI;
  HasLEB128 = true;
  DwarfRegNumForCFI = true;
}
```
```

Finally, add code in Cpu0MCTargetDesc.cpp to register Cpu0InstPrinter as follows,

### LLVMBackendTutorialExampleCode/MCTargetDesc/Cpu0MCTargetDesc.cpp

```

static std::string ParseCpu0Triple(StringRef TT, StringRef CPU) {
    std::string Cpu0ArchFeature;
    size_t DashPosition = 0;
    StringRef TheTriple;

    // Let's see if there is a dash, like cpu0-unknown-linux.
    DashPosition = TT.find('-');

    if (DashPosition == StringRef::npos) {
        // No dash, we check the string size.
        TheTriple = TT.substr(0);
    } else {
        // We are only interested in substring before dash.
        TheTriple = TT.substr(0, DashPosition);
    }

    if (TheTriple == "cpu0" || TheTriple == "cpu0el") {
        if (CPU.empty() || CPU == "cpu032") {
            Cpu0ArchFeature = "+cpu032";
        }
    }
    return Cpu0ArchFeature;
}

static MCInstrInfo *createCpu0MCInstrInfo() {
    MCInstrInfo *X = new MCInstrInfo();
    InitCpu0MCInstrInfo(X); // defined in Cpu0GenInstrInfo.inc
    return X;
}


static MCRegisterInfo *createCpu0MCRegisterInfo(StringRef TT) {
    MCRegisterInfo *X = new MCRegisterInfo();
    InitCpu0MCRegisterInfo(X, Cpu0::LR); // defined in Cpu0GenRegisterInfo.inc
    return X;
}

static MCSubtargetInfo *createCpu0MCSubtargetInfo(StringRef TT, StringRef CPU,
                                                 StringRef FS) {
    std::string ArchFS = ParseCpu0Triple(TT, CPU);
    if (!FS.empty()) {
        if (!ArchFS.empty())
            ArchFS = ArchFS + "," + FS.str();
        else
            ArchFS = FS;
    }
    MCSubtargetInfo *X = new MCSubtargetInfo();
    InitCpu0MCSubtargetInfo(X, TT, CPU, ArchFS); // defined in Cpu0GenSubtargetInfo.inc
    return X;
}

static MCAsmInfo *createCpu0MCAsmInfo(const Target &T, StringRef TT) {
    MCAsmInfo *MAI = new Cpu0MCAsmInfo(T, TT);

    MachineLocation Dst(MachineLocation::VirtualFP);
    MachineLocation Src(Cpu0::SP, 0);
    MAI->addInitialFrameState(0, Dst, Src);

    return MAI;
}

static MCCodeGenInfo *createCpu0MCCodeGenInfo(StringRef TT, Reloc::Model RM,
                                              CodeModel::Model CM,
                                              CodeGenOpt::Level OL) {
    MCCodeGenInfo *X = new MCCodeGenInfo();
    if (CM == CodeModel::JITDefault)
        RM = Reloc::Static;
    else if (RM == Reloc::Default)
        RM = Reloc::PIC_;
    X->InitMCCodeGenInfo(RM, CM, OL); // defined in lib/MC/MCCodeGenInfo.cpp
    return X;
}

static MCInstPrinter *createCpu0MCInstPrinter(const Target &T,
                                                unsigned SyntaxVariant,
                                                const MCAsmInfo &MAI,
                                                const MCInstrInfo &MII,
                                                const MCRegisterInfo &MRI,
                                                const MCSubtargetInfo &STI) {
    return new Cpu0InstPrinter(MAI, MII, MRI);
}

extern "C" void LLVMInitializeCpu0TargetMC() {
    // Register the MC asm info.
    RegisterMCAsmInfoFn X(TheCpu0Target, createCpu0MCAsmInfo);
    RegisterMCAsmInfoFn Y(TheCpu0elTarget, createCpu0MCAsmInfo);

    // Register the MC codegen info.
    TargetRegistry::RegisterMCCodeGenInfo(TheCpu0Target,
                                          createCpu0MCCodeGenInfo);
    TargetRegistry::RegisterMCCodeGenInfo(TheCpu0elTarget,
                                          createCpu0MCCodeGenInfo);

    // Register the MC instruction info.
    TargetRegistry::RegisterMCInstrInfo(TheCpu0Target, createCpu0MCInstrInfo);
    TargetRegistry::RegisterMCInstrInfo(TheCpu0elTarget, createCpu0MCInstrInfo);

    // Register the MC register info.
    TargetRegistry::RegisterMCRegInfo(TheCpu0Target, createCpu0MCRegisterInfo);
    TargetRegistry::RegisterMCRegInfo(TheCpu0elTarget, createCpu0MCRegisterInfo);

    // Register the MC subtarget info.
    TargetRegistry::RegisterMCSubtargetInfo(TheCpu0Target,
                                            createCpu0MCSubtargetInfo);
    TargetRegistry::RegisterMCSubtargetInfo(TheCpu0elTarget,
                                            createCpu0MCSubtargetInfo);

    // Register the MCInstPrinter.
    TargetRegistry::RegisterMCInstPrinter(TheCpu0Target,
                                          createCpu0MCInstPrinter);
    TargetRegistry::RegisterMCInstPrinter(TheCpu0elTarget,
                                          createCpu0MCInstPrinter);
}
```

Now, it's time to work with AsmPrinter. According to the section Target Registration <sup>2</sup>, we can register our AsmPrinter when we need it, as in the following function LLVMInitializeCpu0AsmPrinter(),

### LLVMBackendTutorialExampleCode/Chapter3\_2/Cpu0AsmPrinter.h

```
//===-- Cpu0AsmPrinter.h - Cpu0 LLVM Assembly Printer ----------*- C++ -*--===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// Cpu0 Assembly printer class.
//
//===----------------------------------------------------------------------===//

#ifndef CPU0ASMPRINTER_H
#define CPU0ASMPRINTER_H

#include "Cpu0MachineFunction.h"
#include "Cpu0MCInstLower.h"
#include "Cpu0Subtarget.h"
#include "llvm/CodeGen/AsmPrinter.h"
#include "llvm/Support/Compiler.h"
#include "llvm/Target/TargetMachine.h"

namespace llvm {
class MCStreamer;
class MachineInstr;
class MachineBasicBlock;
class Module;
class raw_ostream;

class LLVM_LIBRARY_VISIBILITY Cpu0AsmPrinter : public AsmPrinter {

    void EmitInstrWithMacroNoAT(const MachineInstr *MI);

public:

    const Cpu0Subtarget *Subtarget;
    const Cpu0FunctionInfo *Cpu0FI;
    Cpu0MCInstLower MCInstLowering;

    explicit Cpu0AsmPrinter(TargetMachine &TM, MCStreamer &Streamer)
        : AsmPrinter(TM, Streamer), MCInstLowering(*this) {
        Subtarget = &TM.getSubtarget<Cpu0Subtarget>();
    }

    virtual const char *getPassName() const {
        return "Cpu0 Assembly Printer";
    }

    virtual bool runOnMachineFunction(MachineFunction &MF);

    //- EmitInstruction() must exist, or a run-time error will occur.
    void EmitInstruction(const MachineInstr *MI);
    void printSavedRegsBitmask(raw_ostream &O);
    void printHex32(unsigned int Value, raw_ostream &O);
    void emitFrameDirective();
    const char *getCurrentABIString() const;
    virtual void EmitFunctionEntryLabel();
    virtual void EmitFunctionBodyStart();
    virtual void EmitFunctionBodyEnd();
    void EmitStartOfAsmFile(Module &M);
    virtual MachineLocation getDebugValueLocation(const MachineInstr *MI) const;
    void PrintDebugValueComment(const MachineInstr *MI, raw_ostream &OS);
};
}

#endif
```

<sup>2</sup> <http://jonathan2251.github.com/lbd/llvmstructure.html#target-registration>

### LLVMBackendTutorialExampleCode/Chapter3\_2/Cpu0AsmPrinter.cpp

```

//===-- Cpu0AsmPrinter.cpp - Cpu0 LLVM Assembly Printer -------------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file contains a printer that converts from our internal representation
// of machine-dependent LLVM code to GAS-format CPU0 assembly language.
//
//===----------------------------------------------------------------------===//

#define DEBUG_TYPE "cpu0-asm-printer"
#include "Cpu0AsmPrinter.h"
#include "Cpu0.h"
#include "Cpu0InstrInfo.h"
#include "InstPrinter/Cpu0InstPrinter.h"
#include "MCTargetDesc/Cpu0BaseInfo.h"
#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/Twine.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instructions.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineConstantPool.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineMemOperand.h"
#include "llvm/MC/MCStreamer.h"
#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCSymbol.h"
#include "llvm/Support/TargetRegistry.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/Mangler.h"
#include "llvm/Target/TargetLoweringObjectFile.h"
#include "llvm/Target/TargetOptions.h"

using namespace llvm;

bool Cpu0AsmPrinter::runOnMachineFunction(MachineFunction &MF) {
    Cpu0FI = MF.getInfo<Cpu0FunctionInfo>();
    AsmPrinter::runOnMachineFunction(MF);
    return true;
}

//- EmitInstruction() must exist, or a run-time error will occur.
void Cpu0AsmPrinter::EmitInstruction(const MachineInstr *MI) {
    if (MI->isDebugValue()) {
        SmallString<128> Str;
        raw_svector_ostream OS(Str);

        PrintDebugValueComment(MI, OS);
        return;
    }

    MCInst TmpInst0;
    MCInstLowering.Lower(MI, TmpInst0);
    OutStreamer.EmitInstruction(TmpInst0);
}

//===----------------------------------------------------------------------===//
//
// Cpu0 Asm Directives
//
// -- Frame directive "frame Stackpointer, Stacksize, RARegister"
// Describe the stack frame.
//
// -- Mask directives "(f)mask bitmask, offset"
// Tells the assembler which registers are saved and where.
// bitmask - contain a little endian bitset indicating which registers are
//           saved on function prologue (e.g. with a 0x80000000 mask, the
//           assembler knows the register 31 (RA) is saved at prologue).
// offset  - the position before stack pointer subtraction indicating where
//           the first saved register on prologue is located. (e.g. with a
//
// Consider the following function prologue:
//
//     .frame $fp,48,$ra
//     .mask 0xc0000000,-8
//     addiu $sp, $sp, -48
//     st $ra, 40($sp)
//     st $fp, 36($sp)
//
//     With a 0xc0000000 mask, the assembler knows the register 31 (RA) and
//     30 (FP) are saved at prologue. As the save order on prologue is from
//     left to right, RA is saved first. A -8 offset means that after the
//     stack pointer subtraction, the first register in the mask (RA) will be
//     saved at address 48-8=40.
//
//===----------------------------------------------------------------------===//

//===----------------------------------------------------------------------===//
// Mask directives
//===----------------------------------------------------------------------===//
//       .frame     $sp,8,$lr
//->     .mask      0x00000000,0
//       .set       noreorder
//       .set       nomacro

// Create a bitmask with all callee saved registers for CPU or Floating Point
// registers. For CPU registers consider RA, GP and FP for saving if necessary.
void Cpu0AsmPrinter::printSavedRegsBitmask(raw_ostream &O) {
    // CPU and FPU Saved Registers Bitmasks
    unsigned CPUBitmask = 0;
    int CPUTopSavedRegOff;

    // Set the CPU and FPU Bitmasks
    const MachineFrameInfo *MFI = MF->getFrameInfo();
    const std::vector<CalleeSavedInfo> &CSI = MFI->getCalleeSavedInfo();
    // size of stack area to which FP callee-saved regs are saved.
    unsigned CPURegSize = Cpu0::CPURegsRegClass.getSize();
    unsigned i = 0, e = CSI.size();

    // Set CPU Bitmask.
    for ( ; i != e; ++i) {
        unsigned Reg = CSI[i].getReg();
        unsigned RegNum = getCPURegisterNumbering(Reg);
        CPUBitmask |= (1 << RegNum);
    }

    CPUTopSavedRegOff = CPUBitmask ? -CPURegSize : 0;

    // Print CPUBitmask
    O << "\t.mask \t"; printHex32(CPUBitmask, O);
    O << ',' << CPUTopSavedRegOff << '\n';
}

// Print a 32 bit hex number with all numbers.
void Cpu0AsmPrinter::printHex32(unsigned Value, raw_ostream &O) {
    O << "0x";
    for (int i = 7; i >= 0; i--)
        O.write_hex((Value & (0xF << (i*4))) >> (i*4));
}

//===----------------------------------------------------------------------===//
// Frame and Set directives
//===----------------------------------------------------------------------===//
//->     .frame     $sp,8,$lr
//       .mask      0x00000000,0
//       .set       noreorder
//       .set       nomacro
/// Frame Directive
void Cpu0AsmPrinter::emitFrameDirective() {
    const TargetRegisterInfo &RI = *TM.getRegisterInfo();

    unsigned stackReg = RI.getFrameRegister(*MF);
    unsigned returnReg = RI.getRARegister();
    unsigned stackSize = MF->getFrameInfo()->getStackSize();

    if (OutStreamer.hasRawTextSupport())
        OutStreamer.EmitRawText("\t.frame\t$" +
            StringRef(Cpu0InstPrinter::getRegisterName(stackReg)).lower() +
            "," + Twine(stackSize) + ",$" +
            StringRef(Cpu0InstPrinter::getRegisterName(returnReg)).lower());
}

/// Emit Set directives.
const char *Cpu0AsmPrinter::getCurrentABIString() const {
    switch (Subtarget->getTargetABI()) {
    case Cpu0Subtarget::O32: return "abi32";
    default: llvm_unreachable("Unknown Cpu0 ABI");
    }
}

//       .type      main,@function
//->     .ent       main                      # @main
//     main:
void Cpu0AsmPrinter::EmitFunctionEntryLabel() {
    if (OutStreamer.hasRawTextSupport())
        OutStreamer.EmitRawText("\t.ent\t" + Twine(CurrentFnSym->getName()));
    OutStreamer.EmitLabel(CurrentFnSym);
}

//       .frame     $sp,8,$pc
//       .mask      0x00000000,0
//->     .set       noreorder
//->     .set       nomacro
/// EmitFunctionBodyStart - Targets can override this to emit stuff before
/// the first basic block in the function.
void Cpu0AsmPrinter::EmitFunctionBodyStart() {
    MCInstLowering.Initialize(Mang, &MF->getContext());

    emitFrameDirective();

    if (OutStreamer.hasRawTextSupport()) {
        SmallString<128> Str;
        raw_svector_ostream OS(Str);
        printSavedRegsBitmask(OS);
        OutStreamer.EmitRawText(OS.str());
        OutStreamer.EmitRawText(StringRef("\t.set\tnoreorder"));
        OutStreamer.EmitRawText(StringRef("\t.set\tnomacro"));
        if (Cpu0FI->getEmitNOAT())
            OutStreamer.EmitRawText(StringRef("\t.set\tnoat"));
    }
}

//->     .set       macro
//->     .set       reorder
//->     .end       main
/// EmitFunctionBodyEnd - Targets can override this to emit stuff after
/// the last basic block in the function.
void Cpu0AsmPrinter::EmitFunctionBodyEnd() {
    // There are instruction for this macros, but they must
    // always be at the function end, and we can't emit and
    // break with BB logic.
    if (OutStreamer.hasRawTextSupport()) {
        if (Cpu0FI->getEmitNOAT())
            OutStreamer.EmitRawText(StringRef("\t.set\tat"));
        OutStreamer.EmitRawText(StringRef("\t.set\tmacro"));
        OutStreamer.EmitRawText(StringRef("\t.set\treorder"));
        OutStreamer.EmitRawText("\t.end\t" + Twine(CurrentFnSym->getName()));
    }
}

//     .section .mdebug.abi32
//     .previous
void Cpu0AsmPrinter::EmitStartOfAsmFile(Module &M) {
    // FIXME: Use SwitchSection.

    // Tell the assembler which ABI we are using
    if (OutStreamer.hasRawTextSupport())
        OutStreamer.EmitRawText("\t.section .mdebug." +
            Twine(getCurrentABIString()));

    // return to previous section
    if (OutStreamer.hasRawTextSupport())
        OutStreamer.EmitRawText(StringRef("\t.previous"));
}

MachineLocation
Cpu0AsmPrinter::getDebugValueLocation(const MachineInstr *MI) const {
    // Handles frame addresses emitted in Cpu0InstrInfo::emitFrameIndexDebugValue.
    assert(MI->getNumOperands() == 4 && "Invalid no. of machine operands!");
    assert(MI->getOperand(0).isReg() && MI->getOperand(1).isImm() &&
        "Unexpected MachineOperand types");
    return MachineLocation(MI->getOperand(0).getReg(),
                           MI->getOperand(1).getImm());
}

void Cpu0AsmPrinter::PrintDebugValueComment(const MachineInstr *MI,
                                            raw_ostream &OS) {
    // TODO: implement
    OS << "PrintDebugValueComment ()";
}

// Force static initialization.
extern "C" void LLVMInitializeCpu0AsmPrinter() {
    RegisterAsmPrinter<Cpu0AsmPrinter> X(TheCpu0Target);
    RegisterAsmPrinter<Cpu0AsmPrinter> Y(TheCpu0elTarget);
}
```
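The `.mask` arithmetic described in the comments of Cpu0AsmPrinter.cpp can be tried outside LLVM. The sketch below is only an illustration: it assumes the callee-saved registers are already given as register numbers (the job done by `getCPURegisterNumbering()` in the listing), and the names `savedRegsBitmask` and `hex32` are invented for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build the bitmask: bit N is set when register number N is callee-saved.
unsigned savedRegsBitmask(const std::vector<unsigned> &RegNums) {
  unsigned Mask = 0;
  for (unsigned RegNum : RegNums) {
    assert(RegNum < 32 && "register number must fit in a 32-bit mask");
    Mask |= (1u << RegNum);
  }
  return Mask;
}

// Format a value as "0x" followed by exactly eight hexadecimal digits,
// mirroring what printHex32() writes to the stream.
std::string hex32(unsigned Value) {
  static const char Digits[] = "0123456789abcdef";
  std::string Out = "0x";
  for (int i = 7; i >= 0; --i)
    Out += Digits[(Value >> (i * 4)) & 0xF];
  return Out;
}
```

With registers 31 and 30 saved, `hex32(savedRegsBitmask({31, 30}))` yields `0xc0000000`, the mask used in the prologue example in the comment; with no saved registers it yields `0x00000000`.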

The dynamic registration mechanism used by `LLVMInitializeCpu0AsmPrinter()` is a good idea: a target's AsmPrinter is registered only when it is needed.
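To make the idea concrete, here is a much-simplified sketch of such a registry. It is not LLVM's `TargetRegistry`; the names `Registry`, `PrinterCtor`, `RegisterPrinter` and `initializeCpu0AsmPrinterSketch` are invented for illustration. A target stores a constructor callback under its name, and a driver looks the callback up only when that target is requested.

```cpp
#include <cassert>
#include <map>
#include <string>

// A callback that creates (here: merely names) a target's assembly printer.
typedef std::string (*PrinterCtor)();

// Simplified registry: target name -> constructor callback.
class Registry {
public:
  static std::map<std::string, PrinterCtor> &table() {
    static std::map<std::string, PrinterCtor> Table; // created on first use
    return Table;
  }
  // Returns the registered callback, or nullptr if the target is unknown.
  static PrinterCtor lookup(const std::string &Target) {
    auto I = table().find(Target);
    return I == table().end() ? nullptr : I->second;
  }
};

// Helper object whose constructor performs the registration, in the spirit
// of RegisterAsmPrinter<Cpu0AsmPrinter> X(TheCpu0Target).
struct RegisterPrinter {
  RegisterPrinter(const std::string &Target, PrinterCtor Ctor) {
    assert(Ctor && "constructor callback must not be null");
    Registry::table()[Target] = Ctor;
  }
};

std::string createCpu0Printer() { return "Cpu0 Assembly Printer"; }

// Analogue of LLVMInitializeCpu0AsmPrinter(): nothing is registered until
// this function is called.
void initializeCpu0AsmPrinterSketch() {
  RegisterPrinter X("cpu0", createCpu0Printer);
  RegisterPrinter Y("cpu0el", createCpu0Printer);
}
```

The driver code never names the Cpu0 printer class directly; it only asks the registry, which is what lets a tool such as `llc` be built with any subset of targets.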

Besides adding these new .cpp files to CMakeLists.txt, please remember to add the subdirectory InstPrinter, enable the asm printer, and add the libraries AsmPrinter and Cpu0AsmPrinter to LLVMBuild.txt, as follows,

### LLVMBackendTutorialExampleCode/Chapter3\_2/CMakeLists.txt

```
tablegen(LLVM Cpu0GenCodeEmitter.inc -gen-emitter)
tablegen(LLVM Cpu0GenMCCodeEmitter.inc -gen-emitter -mc-emitter)

tablegen(LLVM Cpu0GenAsmWriter.inc -gen-asm-writer)
...
add_llvm_target(Cpu0CodeGen
    Cpu0AsmPrinter.cpp
    ...
    Cpu0MCInstLower.cpp
    ...
)
...
add_subdirectory(InstPrinter)
...
```

### LLVMBackendTutorialExampleCode/Chapter3\_2/LLVMBuild.txt

```
// LLVMBuild.txt
[common]
subdirectories = InstPrinter MCTargetDesc TargetInfo

[component_0]
...
# Please enable asmprinter
has_asmprinter = 1
...

[component_1]
# Add AsmPrinter Cpu0AsmPrinter
required_libraries = AsmPrinter ... Cpu0AsmPrinter ...
```

Now, run Chapter3\_2/Cpu0 with AsmPrinter support; `llc` reports the following error message,

```
118-165-78-230:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm ch3.bc -o
ch3.cpu0.s
/Users/Jonathan/llvm/test/cmake_debug_build/bin/Debug/llc: target does not
support generation of this file type!
```

`llc` fails to compile the IR code into machine code because we have not yet implemented the class `Cpu0DAGToDAGISel`. Before implementing it, we introduce the LLVM code generation sequence, DAG, and LLVM instruction selection in the next 3 sections.

### 3.3 LLVM Code Generation Sequence

The following diagram comes from `tricore_llvm.pdf`.



Figure 3.5: `tricore_llvm.pdf`: Code generation sequence. On the path from LLVM code to assembly code, numerous passes are run through and several data structures are used to represent the intermediate results.

LLVM IR is a Static Single Assignment (SSA) based representation. It provides an unlimited number of virtual registers, each of which can hold a value of primitive type (integer, floating point, or pointer). So, every operand can be kept in a different virtual register in the llvm SSA representation. A comment begins with `;` in the llvm representation. The following are llvm SSA instructions.

```

store i32 0, i32* %a  ; store the i32 value 0 to the memory that %a points to;
                      ; %a is a pointer to an i32 value
store i32 %b, i32* %c ; store the contents of %b to the memory that %c points
                      ; to; %b is an i32 virtual register, %c is a pointer to
                      ; an i32 value
%a1 = load i32* %a    ; load the value from the memory that %a points to and
                      ; assign it to %a1
%a3 = add i32 %a2, 1  ; add %a2 and 1 and save the result to %a3

```

We explain the code generation process below. If it is unfamiliar, please read section 4.2 of `tricore_llvm.pdf` first. You can also read “The LLVM Target-Independent Code Generator” <sup>3</sup> and the “LLVM Language Reference Manual” <sup>4</sup> before going ahead, but we think reading section 4.2 of `tricore_llvm.pdf` is enough. We suggest reading those web documents only if things are still unclear after you have read this section and the next 2 sections on DAG and Instruction Selection.

<sup>3</sup> <http://llvm.org/docs/CodeGenerator.html>

<sup>4</sup> <http://llvm.org/docs/LangRef.html>

### 1. Instruction Selection

```
// In this stage, the llvm opcode is translated into a machine opcode, but the
// operands are still llvm virtual operands.
store i16 0, i16* %a // store 0 of i16 type to where virtual register %a
                     // points to
=> st i16 0, i32* %a // the Cpu0 instruction st replaces the IR store
```

### 2. Scheduling and Formation

```
// In this stage, the instruction sequence is reordered to reduce
// instruction cycles or register pressure.
st i32 %a, i16* %b, i16 5 // st %a to *(%b+5)
st %b, i32* %c, i16 0
%d = ld i32* %c

// Reorder the instructions above as follows. On a RISC CPU such as Mips, an
// ld of %c that follows the st to %c must wait more than 1 cycle, meaning
// the ld cannot follow the st immediately.
=> st %b, i32* %c, i16 0
    st i32 %a, i16* %b, i16 5
    %d = ld i32* %c, i16 0
// Without reordering, a nop instruction, which does nothing, must be
// inserted, costing one more instruction cycle than the optimized version.
// (Actually, Mips is scheduled dynamically by hardware, which will insert a
// nop between the st and ld instructions if the compiler didn't insert one.)
st i32 %a, i16* %b, i16 5
st %b, i32* %c, i16 0
nop
%d = ld i32* %c, i16 0

// Minimum register pressure
// Suppose %c is alive after the basic block (meaning %c will be used after
// the basic block), while %a and %b are not alive after it.
// The following version without reordering needs at least 3 registers
%a = add i32 1, i32 0
%b = add i32 2, i32 0
st %a, i32* %c, 1
st %b, i32* %c, 2

// The reordered version needs only 2 registers (by allocating %a and %b to
// the same register)
=> %a = add i32 1, i32 0
    st %a, i32* %c, 1
    %b = add i32 2, i32 0
    st %b, i32* %c, 2
```

### 3. SSA-based Machine Code Optimization

For example, common subexpression elimination, shown in the next section, DAG.

### 4. Register Allocation

Allocate physical registers for the virtual registers.

### 5. Prologue/Epilogue Code Insertion

Explained in the section Add Prologue/Epilogue functions.

### 6. Late Machine Code Optimizations

Any “last-minute” peephole optimizations of the final machine code can be applied during this phase. For example, replace  $x = x * 2$  by  $x = x << 1$  for an integer operand.

### 7. Code Emission

Finally, the completed machine code is emitted. For static compilation, the end result is an assembly code file; for JIT compilation, the opcodes of the machine instructions are written into memory.
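The register pressure claim in step 2 (3 registers without reordering, 2 with it) can be checked mechanically. The following is a minimal sketch, not LLVM's scheduler or register allocator: the `Instr` struct and the functions `maxLiveRegisters`, `originalOrder` and `reorderedOrder` are hypothetical names, and the block is the four-instruction example from step 2 with `%c` assumed live on exit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One pseudo instruction: an optional defined register plus the registers
// it reads.
struct Instr {
  std::string Def;               // empty when nothing is defined
  std::vector<std::string> Uses; // registers read by the instruction
};

// Walk the block backwards and return the largest number of registers that
// are live at the same time.
std::size_t maxLiveRegisters(const std::vector<Instr> &Block,
                             const std::set<std::string> &LiveOut) {
  std::set<std::string> Live = LiveOut;
  std::size_t MaxLive = Live.size();
  for (auto It = Block.rbegin(); It != Block.rend(); ++It) {
    if (!It->Def.empty())
      Live.erase(It->Def); // the value does not exist before its definition
    for (const std::string &U : It->Uses) {
      assert(!U.empty() && "a used register needs a name");
      Live.insert(U);      // operands must be live before the instruction
    }
    MaxLive = std::max(MaxLive, Live.size());
  }
  return MaxLive;
}

// %a = add; %b = add; st %a, %c; st %b, %c
std::vector<Instr> originalOrder() {
  std::vector<Instr> B;
  B.push_back(Instr{"%a", {}});
  B.push_back(Instr{"%b", {}});
  B.push_back(Instr{"", {"%a", "%c"}});
  B.push_back(Instr{"", {"%b", "%c"}});
  return B;
}

// %a = add; st %a, %c; %b = add; st %b, %c
std::vector<Instr> reorderedOrder() {
  std::vector<Instr> B;
  B.push_back(Instr{"%a", {}});
  B.push_back(Instr{"", {"%a", "%c"}});
  B.push_back(Instr{"%b", {}});
  B.push_back(Instr{"", {"%b", "%c"}});
  return B;
}
```

Calling `maxLiveRegisters` on the two orders with `%c` as the only live-out register gives 3 and 2 respectively, matching the comments in step 2.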

The LLVM code generation sequence can also be obtained with `llc -debug-pass=Structure`, as follows. The first 4 code generation sequences from Figure 3.5 are in the ‘**DAG->DAG Pattern Instruction Selection**’ pass displayed by `llc -debug-pass=Structure`. The order of Peephole Optimizations and Prologue/Epilogue Insertion differs between the two (please check the \* marks in the following). There is no need to worry about this, since LLVM is under development and changes all the time.

```
118-165-79-200:InputFiles Jonathan$ llc --help-hidden
OVERVIEW: llvm system compiler

USAGE: llc [options] <input bitcode>

OPTIONS:
...
  -debug-pass           - Print PassManager debugging information
  =None                - disable debug output
  =Arguments           - print pass arguments to pass to 'opt'
  =Structure            - print pass structure before run()
  =Executions           - print pass name before it is executed
  =Details              - print pass details when it is executed

118-165-79-200:InputFiles Jonathan$ llc -march=mips -debug-pass=Structure ch3.bc
...
Target Library Information
Target Transform Info
Data Layout
Target Pass Configuration
No Alias Analysis (always returns 'may' alias)
Type-Based Alias Analysis
Basic Alias Analysis (stateless AA impl)
Create Garbage Collector Module Metadata
Machine Module Information
Machine Branch Probability Analysis
  ModulePass Manager
    FunctionPass Manager
      Preliminary module verification
      Dominator Tree Construction
      Module Verifier
      Natural Loop Information
      Loop Pass Manager
        Canonicalize natural loops
      Scalar Evolution Analysis
      Loop Pass Manager
        Canonicalize natural loops
        Induction Variable Users
        Loop Strength Reduction
    Lower Garbage Collection Instructions
    Remove unreachable blocks from the CFG
    Exception handling preparation
    Optimize for code generation
    Insert stack protectors
    Preliminary module verification
    Dominator Tree Construction
    Module Verifier
    Machine Function Analysis
    Natural Loop Information
    Branch Probability Analysis
  * MIPS DAG->DAG Pattern Instruction Selection
    Expand ISel Pseudo-instructions
    Tail Duplication
    Optimize machine instruction PHIs
    MachineDominator Tree Construction
    Slot index numbering
    Merge disjoint stack slots
    Local Stack Slot Allocation
    Remove dead machine instructions
    MachineDominator Tree Construction
    Machine Natural Loop Construction
    Machine Loop Invariant Code Motion
    Machine Common Subexpression Elimination
    Machine code sinking
  * Peephole Optimizations
    Process Implicit Definitions
    Remove unreachable machine basic blocks
    Live Variable Analysis
    Eliminate PHI nodes for register allocation
    Two-Address instruction pass
    Slot index numbering
    Live Interval Analysis
    Debug Variable Analysis
    Simple Register Coalescing
    Live Stack Slot Analysis
    Calculate spill weights
    Virtual Register Map
    Live Register Matrix
    Bundle Machine CFG Edges
    Spill Code Placement Analysis
  * Greedy Register Allocator
    Virtual Register Rewriter
    Stack Slot Coloring
    Machine Loop Invariant Code Motion
  * Prologue/Epilogue Insertion & Frame Finalization
    Control Flow Optimizer
    Tail Duplication
    Machine Copy Propagation Pass
  * Post-RA pseudo instruction expansion pass
    MachineDominator Tree Construction
    Machine Natural Loop Construction
    Post RA top-down list latency scheduler
    Analyze Machine Code For Garbage Collection
    Machine Block Frequency Analysis
    Branch Probability Basic Block Placement
    Mips Delay Slot Filler
    Mips Long Branch
    MachineDominator Tree Construction
    Machine Natural Loop Construction
  * Mips Assembly Printer
    Delete Garbage Collector Information
```

## 3.4 DAG (Directed Acyclic Graph)

Many important techniques for local optimization begin by transforming a basic block into a DAG. For example, a basic block and its corresponding DAG are shown in Figure 3.6.



Figure 3.6: DAG example

If  $b$  is not live on exit from the block, then we can apply common subexpression elimination to get the following code.

```

a = b + c
d = a - d
c = d + c
  
```

As you can imagine, common subexpression elimination can be applied to either IR or machine code.

A DAG is like a tree in which each opcode is a node and each operand (register, constant, immediate, or offset) is a leaf. It can also be represented as a list in prefix order. For example,  $(+ b, c), (+ b, 1)$  is an IR DAG representation.
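How a DAG exposes a common subexpression can be sketched with local value numbering. This is an illustration only, not LLVM's SelectionDAG: the class `BlockDAG` and its methods are invented names, and the input block `a = b + c; b = a - d; c = b + c; d = a - d` is an assumption chosen to be consistent with the simplified code shown above, since Figure 3.6 is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>

// Builds a DAG for a basic block: two statements that apply the same
// operator to the same operand nodes share a single interior node.
class BlockDAG {
public:
  // Record "Dst = Lhs Op Rhs" and return the id of the node holding it.
  int assign(const std::string &Dst, char Op,
             const std::string &Lhs, const std::string &Rhs) {
    assert((Op == '+' || Op == '-' || Op == '*' || Op == '/') &&
           "unsupported operator in this sketch");
    int L = nodeOf(Lhs);
    int R = nodeOf(Rhs);
    std::tuple<char, int, int> Key(Op, L, R);
    auto It = Interior.find(Key);
    int Id;
    if (It != Interior.end()) {
      Id = It->second;    // common subexpression: reuse the existing node
    } else {
      Id = NextId++;
      Interior[Key] = Id; // first time this expression is seen
    }
    Current[Dst] = Id;    // Dst now names this node
    return Id;
  }

  // Number of operator nodes in the DAG.
  std::size_t interiorNodes() const { return Interior.size(); }

private:
  // Node currently named by a variable; an unseen variable becomes a leaf.
  int nodeOf(const std::string &Var) {
    auto It = Current.find(Var);
    if (It != Current.end())
      return It->second;
    int Id = NextId++;
    Current[Var] = Id;
    return Id;
  }

  std::map<std::string, int> Current;
  std::map<std::tuple<char, int, int>, int> Interior;
  int NextId = 0;
};
```

Feeding the four assumed statements to `assign` in order creates only three operator nodes: the second and fourth statements both compute `a - d` from the same operand nodes and therefore share one node, which is the redundancy that the simplified code removes.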

## 3.5 Instruction Selection

In the back end, we need to translate IR code into machine code in the instruction selection process, as in Figure 3.7.

|      |             |                      |
|------|-------------|----------------------|
| MOV  | $r_d = r_s$ | $r_d = r_s + 0$      |
| MOV  | $r_d = r_s$ | $r_d = r_{s1} + r_0$ |
| MOVI | $r_d = c$   | $r_d = r_0 + c$      |

Figure 3.7: IR and its corresponding machine instruction

For machine instruction selection, the better solution is to represent both IR and machine instructions as DAGs. In Figure 3.8, we skip the register leaves. The  $r_j + r_k$  is the IR DAG representation (in symbol notation, not llvm SSA form). ADD is the machine instruction.

**Instruction Tree Patterns**

| Name | Effect                         | Trees                                                                    |
|------|--------------------------------|--------------------------------------------------------------------------|
| -    | $r_i$                          | TEMP                                                                     |
| ADD  | $r_i \leftarrow r_j + r_k$     | `+` with two subtrees                                                    |
| MUL  | $r_i \leftarrow r_j \times r_k$ | `*` with two subtrees                                                    |
| SUB  | $r_i \leftarrow r_j - r_k$     | `-` with two subtrees                                                    |
| DIV  | $r_i \leftarrow r_j / r_k$     | `/` with two subtrees                                                    |
| ADDI | $r_i \leftarrow r_j + c$       | `+` with a `CONST` leaf; `CONST`                                         |
| SUBI | $r_i \leftarrow r_j - c$       | `-` with a `CONST` leaf                                                  |
| LOAD | $r_i \leftarrow M[r_j + c]$    | `MEM` over `+` with a `CONST` leaf; `MEM` over `CONST`; `MEM`            |

Figure 3.8: Instruction DAG representation

The IR DAG and machine instruction DAG can also be represented as lists. For example,  $(+ r_i, r_j)$ ,  $(- r_i, 1)$  are lists for IR DAG;  $(ADD r_i, r_j)$ ,  $(SUBI r_i, 1)$  are lists for machine instruction DAG.

Now, let's recall the ADDiu instruction defined in Cpu0InstrInfo.td in the previous chapter. It is listed again as follows,

### LLVMBackendTutorialExampleCode/Chapter3\_2/Cpu0InstrFormats.td

```
//===----------------------------------------------------------------------===//
// Format L instruction class in Cpu0 : <|opcode|ra|rb|cx|>
//===----------------------------------------------------------------------===//

class FL<bits<8> op, dag outs, dag ins, string asmstr, list<dag> pattern,
         InstrItinClass itin>: Cpu0Inst<outs, ins, asmstr, pattern, itin, FrmL>
{
    bits<4> ra;
    bits<4> rb;
    bits<16> imm16;

    let Opcode = op;

    let Inst{23-20} = ra;
    let Inst{19-16} = rb;
    let Inst{15-0} = imm16;
}
```

### LLVMBackendTutorialExampleCode/Chapter3\_2/Cpu0InstrInfo.td

```
// Arithmetic and logical instructions with 2 register operands.
class ArithLogicI<bits<8> op, string instr_asm, SDNode OpNode,
    Operand Od, PatLeaf imm_type, RegisterClass RC> :
    FL<op, (outs RC:$ra), (ins RC:$rb, Od:$imm16),
    !strconcat(instr_asm, "\t$ra, $rb, $imm16"),
    [(set RC:$ra, (OpNode RC:$rb, imm_type:$imm16))], IIAlu> {
    let isReMaterializable = 1;
}
...
def ADDiu : ArithLogicI<0x09, "addiu", add, simm16, immSExt16, CPURegs>;
```

Figure 3.9 shows how pattern matching works between the IR node **add** and the instruction ADDiu defined in Cpu0InstrInfo.td. In this example, the IR node “add %a, 5” is translated to “addiu %r1, 5” because the IR pattern [(set RC:\$ra, (OpNode RC:\$rb, imm\_type:\$imm16))] is set in ADDiu and its 2nd operand is a signed immediate, which matches “%a, 5”. In addition to the pattern match, the .td file also sets the assembly string “addiu” and the opcode 0x09. With this information, LLVM TableGen generates the instruction both in assembly and in binary automatically (the binary instruction in an obj file of ELF format is shown in a later chapter). Similarly, the machine instruction DAG nodes LD and ST can be obtained from the IR DAG nodes **load** and **store**.

List

- $(\text{add } \%a, 5) \rightarrow (\text{addiu } \$r1, 5)$

Figure 3.9: Pattern match for ADDiu instruction and IR node add
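The match in Figure 3.9 can be imitated by hand to see what the TableGen-generated selector has to check. The sketch below is a simplification under stated assumptions: the `Node` type, the helpers `reg`, `imm` and `add`, and the function `selectADDiu` are all hypothetical; real selection works on SelectionDAG nodes and is generated from the .td pattern.

```cpp
#include <cassert>
#include <memory>
#include <string>

// A tiny IR DAG node: an add with two operands, a register leaf, or a
// constant leaf.
struct Node {
  enum Kind { Add, Reg, Const };
  Kind K = Reg;
  std::string Name;               // register name when K == Reg
  long Value = 0;                 // constant value when K == Const
  std::shared_ptr<Node> Lhs, Rhs; // operands when K == Add
};
typedef std::shared_ptr<Node> NodePtr;

NodePtr reg(const std::string &Name) {
  NodePtr N = std::make_shared<Node>();
  N->K = Node::Reg;
  N->Name = Name;
  return N;
}

NodePtr imm(long Value) {
  NodePtr N = std::make_shared<Node>();
  N->K = Node::Const;
  N->Value = Value;
  return N;
}

NodePtr add(NodePtr L, NodePtr R) {
  assert(L && R && "add needs two operands");
  NodePtr N = std::make_shared<Node>();
  N->K = Node::Add;
  N->Lhs = L;
  N->Rhs = R;
  return N;
}

// Counterpart of the immSExt16 leaf: the value fits a signed 16-bit field.
bool isSExt16(long V) { return V >= -32768 && V <= 32767; }

// Mimics the pattern (set RC:$ra, (add RC:$rb, immSExt16:$imm16)).
// Returns the assembly text on a match and an empty string otherwise.
std::string selectADDiu(const Node &N, const std::string &DstReg) {
  if (N.K != Node::Add || !N.Lhs || !N.Rhs)
    return "";
  if (N.Lhs->K != Node::Reg)
    return "";
  if (N.Rhs->K != Node::Const || !isSExt16(N.Rhs->Value))
    return "";
  return "addiu\t" + DstReg + ", " + N.Lhs->Name + ", " +
         std::to_string(N.Rhs->Value);
}
```

An add of a register and the constant 5 matches and produces `addiu` with the three operands of the assembly string in `ArithLogicI`; an add of two registers, or of a constant that does not fit in 16 bits, does not match and would need another instruction pattern.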

Some CPUs/FPUs (floating point processors) have a multiply-and-add floating point instruction, fmadd. It can be represented by the DAG list (fadd (fmul ra, rc), rb). To implement it, we can assign the fmadd DAG pattern to the instruction .td as follows,

```
def FMADDS : AForm_1<59, 29,
  (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRC, F4RC:$FRB),
  "fmadds $FRT, $FRA, $FRC, $FRB",
  [(set F4RC:$FRT, (fadd (fmul F4RC:$FRA, F4RC:$FRC),
  F4RC:$FRB))]>;
```

As with ADDiu, [(set F4RC:\$FRT, (fadd (fmul F4RC:\$FRA, F4RC:\$FRC), F4RC:\$FRB))] is the pattern, which includes the nodes **fmul** and **fadd**.

Now, consider the following basic block, written first in notation IR form and then in llvm SSA IR form,

```
d = a * c
e = d + b
...
%d = fmul %a, %c
%e = fadd %d, %b
...
```

The llvm SelectionDAG Optimization Phase (part of the Instruction Selection Process) prefers to translate these two IR DAG nodes, (fmul %a, %c) and (fadd %d, %b), into one machine instruction DAG node (**fmadd** %a, %c, %b) rather than into the two machine instruction nodes **fmul** and **fadd**.

```
%e = fmadd %a, %c, %b
...
```

As you can see, the notation IR form is easier to read than the llvm SSA IR form, so we sometimes use the notation form in this book.
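The fmul/fadd fusion described above can be sketched on a small expression tree. This is only an illustration under simplifying assumptions: `Node`, `leaf`, `node`, `fuseFmadd` and `toList` are hypothetical names, and the sketch only handles the case where the fmul is the first operand of the fadd. LLVM's real matcher is generated from the .td patterns.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct Node;
typedef std::shared_ptr<Node> NodePtr;

// Hypothetical DAG node: Op is an operation name, or a value name for leaves.
struct Node {
  std::string Op;
  std::vector<NodePtr> Kids;
};

NodePtr leaf(const std::string &Name) {
  NodePtr N = std::make_shared<Node>();
  N->Op = Name;
  return N;
}

NodePtr node(const std::string &Op, std::vector<NodePtr> Kids) {
  NodePtr N = std::make_shared<Node>();
  N->Op = Op;
  N->Kids = Kids;
  return N;
}

// Rewrite (fadd (fmul a, c), b) into (fmadd a, c, b); return other trees
// unchanged.
NodePtr fuseFmadd(const NodePtr &N) {
  if (N->Op == "fadd" && N->Kids.size() == 2 &&
      N->Kids[0]->Op == "fmul" && N->Kids[0]->Kids.size() == 2) {
    const NodePtr &Mul = N->Kids[0];
    return node("fmadd", {Mul->Kids[0], Mul->Kids[1], N->Kids[1]});
  }
  return N;
}

// Print a tree in the list notation used in the text.
std::string toList(const NodePtr &N) {
  if (N->Kids.empty())
    return N->Op;
  std::string S = "(" + N->Op;
  for (std::size_t i = 0; i < N->Kids.size(); ++i)
    S += (i == 0 ? " " : ", ") + toList(N->Kids[i]);
  return S + ")";
}
```

Building the tree for `%e = fadd (fmul %a, %c), %b` and passing it to `fuseFmadd` yields a single node whose operands are %a, %c and %b, in the same order as the fmadd instruction above.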

For the following basic block code,

```
a = b + c // in notation IR form
d = a - d
%e = fmadd %a, %c, %b // in llvm SSA IR form
```

We can apply the instruction tree patterns of Figure 3.7 to get the following machine code,

```
load rb, M(sp+8); // assume b is allocated at sp+8; sp is the stack pointer register
load rc, M(sp+16);
add ra, rb, rc;
load rd, M(sp+24);
sub rd, ra, rd;
fmadd re, ra, rc, rb;
```

## 3.6 Add Cpu0DAGToDAGISel class

The IR DAG to machine instruction DAG transformation was introduced in the previous section. Now, let's check which IR DAG nodes the file ch3.bc contains. The listing of ch3.ll is as follows,

```
// ch3.ll
define i32 @main() nounwind uwtable {
  %1 = alloca i32, align 4
  store i32 0, i32* %1
  ret i32 0
}
```

As shown above, ch3.ll uses the IR DAG nodes **store** and **ret**. It also uses **add** to adjust the sp (stack pointer) register. So the following definitions in Cpu0InstrInfo.td are enough. The IR DAG nodes are defined in the file include/llvm/Target/TargetSelectionDAG.td.

### LLVMBackendTutorialExampleCode/Chapter3\_2/Cpu0InstrInfo.td

```
//=====

/// Load and Store Instructions
/// aligned
defm LD      : LoadM32<0x01, "ld", load_a>;
defm ST      : StoreM32<0x02, "st", store_a>;

/// Arithmetic Instructions (ALU Immediate)
// IR "add" defined in include/llvm/Target/TargetSelectionDAG.td, line 315 (def add).
def ADDiu   : ArithLogicI<0x09, "addiu", add, simm16, immSExt16, CPURegs>;

let isReturn=1, isTerminator=1, hasDelaySlot=1, isCodeGenOnly=1,
    isBarrier=1, hasCtrlDep=1 in
  def RET : FJ <0x2C, (outs), (ins CPURegs:$target),
    "ret\t$target", [(Cpu0Ret CPURegs:$target)], IIBranch>;

//=====
```

Add class Cpu0DAGToDAGISel (Cpu0ISelDAGToDAG.cpp) to CMakeLists.txt, and add the following fragment to Cpu0TargetMachine.cpp,

### LLVMBackendTutorialExampleCode/Chapter3\_3/CMakeLists.txt

```
add_llvm_target(...

...
Cpu0ISelDAGToDAG.cpp
...
)
```

### LLVMBackendTutorialExampleCode/Chapter3\_3/Cpu0TargetMachine.cpp

```
}

virtual bool addInstSelector();

bool Cpu0PassConfig::addInstSelector() {
  addPass(createCpu0ISelDag(getCpu0TargetMachine()));
  return false;
}
```

### LLVMBackendTutorialExampleCode/Chapter3\_3/Cpu0ISelDAGToDAG.cpp

```
//===== Cpu0ISelDAGToDAG.cpp - A Dag to Dag Inst Selector for Cpu0 =====//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//=====//
//
// This file defines an instruction selector for the CPU0 target.
//
//=====//

#define DEBUG_TYPE "cpu0-isel"
#include "Cpu0.h"
#include "Cpu0RegisterInfo.h"
#include "Cpu0Subtarget.h"
#include "Cpu0TargetMachine.h"
#include "MCTargetDesc/Cpu0BaseInfo.h"
#include "llvm/IR/GlobalValue.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/Support/CFG.h"
#include "llvm/IR/Type.h"
#include "llvm/CodeGen/MachineConstantPool.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/SelectionDAGISel.h"
#include "llvm/CodeGen/SelectionDAGNodes.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

//=====//
// Instruction Selector Implementation
//=====//

//=====//
// Cpu0DAGToDAGISel - CPU0 specific code to select CPU0 machine
// instructions for SelectionDAG operations.
//=====//
namespace {

class Cpu0DAGToDAGISel : public SelectionDAGISel {

    /// TM - Keep a reference to Cpu0TargetMachine.
    Cpu0TargetMachine &TM;

    /// Subtarget - Keep a pointer to the Cpu0Subtarget around so that we can
    /// make the right decision when generating code for different targets.
    const Cpu0Subtarget &Subtarget;

public:
    explicit Cpu0DAGToDAGISel(Cpu0TargetMachine &tm) :
        SelectionDAGISel(tm),
        TM(tm), Subtarget(tm.getSubtarget<Cpu0Subtarget>()) {}

    // Pass Name
    virtual const char *getPassName() const {
        return "CPU0 DAG->DAG Pattern Instruction Selection";
    }

    virtual bool runOnMachineFunction(MachineFunction &MF);

private:
    // Include the pieces autogenerated from the target description.
    #include "Cpu0GenDAGISel.inc"

    /// getTargetMachine - Return a reference to the TargetMachine, casted
    /// to the target-specific type.
    const Cpu0TargetMachine &getTargetMachine() {
        return static_cast<const Cpu0TargetMachine &>(TM);
    }

    /// getInstrInfo - Return a reference to the TargetInstrInfo, casted
    /// to the target-specific type.
    const Cpu0InstrInfo *getInstrInfo() {
        return getTargetMachine().getInstrInfo();
    }

    SDNode *getGlobalBaseReg();

    SDNode *Select(SDNode *N);
    // Complex Pattern.
    bool SelectAddr(SDNode *Parent, SDValue N, SDValue &Base, SDValue &Offset);
    // getImm - Return a target constant with the specified value.
    inline SDValue getImm(const SDNode *Node, unsigned Imm) {
        return CurDAG->getTargetConstant(Imm, Node->getValueType(0));
    }
};
}

bool Cpu0DAGToDAGISel::runOnMachineFunction(MachineFunction &MF) {
    bool Ret = SelectionDAGISel::runOnMachineFunction(MF);

    return Ret;
}

/// ComplexPattern used on Cpu0InstrInfo
/// Used on Cpu0 Load/Store instructions
bool Cpu0DAGToDAGISel::
SelectAddr(SDNode *Parent, SDValue Addr, SDValue &Base, SDValue &Offset) {
    EVT ValTy = Addr.getValueType();

    // If Parent is an unaligned f32 load or store, select a (base + index)
    // floating point load/store instruction (luxcl or luxcl).
    const LSBaseSDNode* LS = 0;

    if (Parent && (LS = dyn_cast<LSBaseSDNode>(Parent))) {
        EVT VT = LS->getMemoryVT();

        if (VT.getSizeInBits() / 8 > LS->getAlignment()) {
            assert(TLI.allowsUnalignedMemoryAccesses(VT) &&
                   "Unaligned loads/stores not supported for this type.");
            if (VT == MVT::f32)
                return false;
        }
    }

    // if Address is FI, get the TargetFrameIndex.
    if (FrameIndexSDNode *FIN = dyn_cast<FrameIndexSDNode>(Addr)) {
        Base = CurDAG->getTargetFrameIndex(FIN->getIndex(), ValTy);
        Offset = CurDAG->getTargetConstant(0, ValTy);
        return true;
    }

    Base = Addr;
    Offset = CurDAG->getTargetConstant(0, ValTy);
    return true;
}

/// Select instructions not customized! Used for
/// expanded, promoted and normal instructions
SDNode* Cpu0DAGToDAGISel::Select(SDNode *Node) {
    unsigned Opcode = Node->getOpcode();

    // Dump information about the Node being selected
    DEBUG(errs() << "Selecting: "; Node->dump(CurDAG); errs() << "\n");

    // If we have a custom node, we already have selected!
    if (Node->isMachineOpcode()) {
        DEBUG(errs() << " == "; Node->dump(CurDAG); errs() << "\n");
        return NULL;
    }

    ///
    // Instruction Selection not handled by the auto-generated
    // tablegen selection should be handled here.
    ///

    switch(Opcode) {
    default: break;

    case ISD::Constant: {
        const ConstantSDNode *CN = dyn_cast<ConstantSDNode>(Node);
        unsigned Size = CN->getValueSizeInBits(0);

        if (Size == 32)
            break;
    }
    }

    // Select the default instruction
    SDNode *ResNode = SelectCode(Node);

    DEBUG(errs() << "=> ");
    if (ResNode == NULL || ResNode == Node)
        DEBUG(Node->dump(CurDAG));
    else
        DEBUG(ResNode->dump(CurDAG));
    DEBUG(errs() << "\n");
    return ResNode;
}

/// createCpu0ISelDag - This pass converts a legalized DAG into a
/// CPU0-specific DAG, ready for instruction scheduling.
FunctionPass *llvm::createCpu0ISelDag(Cpu0TargetMachine &TM) {
    return new Cpu0DAGToDAGISel(TM);
}
```
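To make the address-selection step easier to follow, here is a stand-alone sketch of the decision `SelectAddr()` makes in the listing above, with the unaligned-access check left out. The types `Address` and `BaseOffset` and the function `selectAddrSketch` are hypothetical simplifications. They are not LLVM classes, and the string values only stand in for SDValue operands.

```cpp
#include <string>

// Hypothetical, simplified description of an address operand.
struct Address {
  bool IsFrameIndex;  // true when the address is an abstract stack slot
  int FrameIndex;     // meaningful only when IsFrameIndex is true
  std::string Value;  // any other address expression, e.g. a register
};

// Hypothetical result: the (base, offset) pair used by ld/st.
struct BaseOffset {
  std::string Base;
  int Offset;
};

// A frame-index address becomes a target frame index with offset 0.
// Any other address is used directly as the base, again with offset 0.
BaseOffset selectAddrSketch(const Address &Addr) {
  BaseOffset Result;
  Result.Offset = 0;
  if (Addr.IsFrameIndex)
    Result.Base = "FrameIndex<" + std::to_string(Addr.FrameIndex) + ">";
  else
    Result.Base = Addr.Value;
  return Result;
}
```

In both branches the offset is 0 at this stage; the frame index is only turned into a real register-plus-offset pair later, by `eliminateFrameIndex()`.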

This version also adds the following code to Cpu0InstrInfo.h and Cpu0InstrInfo.cpp to emit debug information; llvm calls it at the proper time.

#### LLVMBackendTutorialExampleCode/Chapter3\_3/Cpu0InstrInfo.h

```

class Cpu0InstrInfo : public Cpu0GenInstrInfo {
    ...
    virtual MachineInstr* emitFrameIndexDebugValue(MachineFunction &MF,
                                                    int FrameIx, uint64_t Offset,
                                                    const MDNode *MDPtr,
                                                    DebugLoc DL) const;
};

```

#### LLVMBackendTutorialExampleCode/Chapter3\_3/Cpu0InstrInfo.cpp

```

#include "llvm/CodeGen/MachineInstrBuilder.h"
}

MachineInstr*
Cpu0InstrInfo::emitFrameIndexDebugValue(MachineFunction &MF, int FrameIx,
                                         uint64_t Offset, const MDNode *MDPtr,
                                         DebugLoc DL) const {
    MachineInstrBuilder MIB = BuildMI(MF, DL, get(Cpu0::DBG_VALUE))
        .addFrameIndex(FrameIx).addImm(0).addImm(Offset).addMetadata(MDPtr);
    return &*MIB;
}

```

Build Chapter3\_3 and run it. The error message from Chapter3\_2 is gone. The new error message for Chapter3\_3 is as follows,

```

118-165-78-230:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm ch3.bc -o
ch3.cpu0.s
...
Target didn't implement TargetInstrInfo::storeRegToStackSlot!
1. Running pass 'Function Pass Manager' on module 'ch3.bc'.
2. Running pass 'Prologue/Epilogue Insertion & Frame Finalization' on function
'@main'
...

```

## 3.7 Add Prologue/Epilogue functions

The following is quoted from tricore\_llvm.pdf, section “4.4.2 Non-static Register Information”.

For some target architectures, some aspects of the target architecture’s register set are dependent upon variable factors and have to be determined at runtime. As a consequence, they cannot be generated statically from a TableGen description – although that would be possible for the bulk of them in the case of the TriCore backend. Among them are the following points:

- Callee-saved registers. Normally, the ABI specifies a set of registers that a function must save on entry and restore on return if their contents are possibly modified during execution.
- Reserved registers. Although the set of unavailable registers is already defined in the TableGen file, TriCoreRegisterInfo contains a method that marks all non-allocatable register numbers in a bit vector.

The following methods are implemented:

- emitPrologue() inserts prologue code at the beginning of a function. Thanks to TriCore’s context model, this is a trivial task as it is not required to save any registers manually. The only thing that has to be done is reserving space for the function’s stack frame by decrementing the stack pointer. In addition, if the function needs a frame pointer, the frame register `%a14` is set to the old value of the stack pointer beforehand.
- emitEpilogue() is intended to emit instructions to destroy the stack frame and restore all previously saved registers before returning from a function. However, as `%a10` (stack pointer), `%a11` (return address), and `%a14` (frame pointer, if any) are all part of the upper context, no epilogue code is needed at all. All cleanup operations are performed implicitly by the `ret` instruction.
- eliminateFrameIndex() is called for each instruction that references a word of data in a stack slot. All previous passes of the code generator have been addressing stack slots through an abstract frame index and an immediate offset. The purpose of this function is to translate such a reference into a register-offset pair. Depending on whether the machine function that contains the instruction has a fixed or a variable stack frame, either the stack pointer `%a10` or the frame pointer `%a14` is used as the base register. The offset is computed accordingly. [Figure 3.10](#) demonstrates for both cases how a stack slot is addressed.

If the addressing mode of the affected instruction cannot handle the address because the offset is too large (the offset field has 10 bits for the BO addressing mode and 16 bits for the BOL mode), a sequence of instructions is emitted that explicitly computes the effective address. Interim results are put into an unused address register. If none is available, an already occupied address register is scavenged. For this purpose, LLVM’s framework offers a class named `RegScavenger` that takes care of all the details.

Figure 3.10: Addressing of a variable a located on the stack. If the stack frame has a variable size, the slot must be addressed relative to the frame pointer

We will explain the Prologue and Epilogue further with example code. For the following llvm IR code, the Cpu0 backend emits the corresponding machine instructions as follows,

```
define i32 @main() nounwind uwtable {
    %1 = alloca i32, align 4
    store i32 0, i32* %1
    ret i32 0
}
```

```
.section .mdebug.abi32
.previous
.file "ch3.bc"
.text
.globl main
.align 2
.type main,@function
.ent main          # @main
main:
.cfi_startproc
.frame $sp,8,$lr
.mask 0x00000000,0
.set noreorder
.set nomacro
# BB#0:
    addiu $sp, $sp, -8
$tmp1:
    .cfi_def_cfa_offset 8
    addiu $2, $zero, 0
    st $2, 4($sp)
    addiu $sp, $sp, 8
    ret $lr
.set macro
.set reorder
.end main
$tmp2:
.size main, ($tmp2)-main
.cfi_endproc
```

LLVM gets the stack size by parsing the IR and counting how many virtual registers are assigned to local variables. After that, it calls `emitPrologue()`. This function emits machine instructions to adjust `sp` (the stack pointer register) for local variables, since we don't use `fp` (the frame pointer register). For our example, it emits the instruction,

```
addiu $sp, $sp, -8
```

`emitEpilogue()` emits “`addiu $sp, $sp, 8`”, where 8 is the stack size.
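The stack adjustment made by the prologue and epilogue can be sketched in a few lines of stand-alone C++. This is a simplified model, not the backend code: the function names are hypothetical, the 8-byte stack alignment used in the usage note is an assumption for illustration, and the large-immediate case is only signalled, not expanded.

```cpp
#include <cstdint>
#include <string>

// Round Value up to the next multiple of Align (Align must be nonzero).
uint64_t roundUpTo(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// True when V fits in the signed 16-bit immediate field of addiu.
bool fitsInSigned16(int64_t V) {
  return V >= -32768 && V <= 32767;
}

// Text of the instruction that moves $sp by Amount bytes. Returns an empty
// string when Amount does not fit in 16 bits; the real backend then expands
// the immediate into several instructions.
std::string adjustStackPointer(int64_t Amount) {
  if (!fitsInSigned16(Amount))
    return std::string();
  return "addiu $sp, $sp, " + std::to_string(Amount);
}

// The prologue decrements $sp by the aligned frame size.
std::string prologueText(uint64_t LocalBytes, uint64_t StackAlign) {
  int64_t Size = static_cast<int64_t>(roundUpTo(LocalBytes, StackAlign));
  return adjustStackPointer(-Size);
}

// The epilogue increments $sp by the same amount.
std::string epilogueText(uint64_t LocalBytes, uint64_t StackAlign) {
  int64_t Size = static_cast<int64_t>(roundUpTo(LocalBytes, StackAlign));
  return adjustStackPointer(Size);
}
```

With one 4-byte local and an assumed 8-byte alignment, `prologueText(4, 8)` and `epilogueText(4, 8)` give the two `addiu` instructions shown in the example.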

Since Instruction Selection and Register Allocation occur before Prologue/Epilogue Code Insertion, `eliminateFrameIndex()` is called after machine instructions have been selected and real registers allocated. It translates the frame index of each local variable (%1 and %2 in the following example) into a stack offset, assigning slots in frame index order from the bottom of the frame toward the top (the stack grows downward from high address to low address; 0(\$sp) is the top and 52(\$sp) is the bottom), as follows,

```

define i32 @main() nounwind uwtable {
    %1 = alloca i32, align 4
    %2 = alloca i32, align 4
    ...
store i32 0, i32* %1
store i32 5, i32* %2, align 4
...
ret i32 0

=> # BB#0:
    addiu $sp, $sp, -56
$tmp1:
    addiu $3, $zero, 0
    st $3, 52($sp)    // %1 is the first frame index local variable, so allocate
                       // in 52($sp)
    addiu $2, $zero, 5
    st $2, 48($sp)    // %2 is the second frame index local variable, so
                       // allocate in 48($sp)
...
ret $lr

```
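The offsets in this example follow from the computation in `eliminateFrameIndex()` (`Offset = spOffset + stackSize`, plus the immediate already in the instruction). A minimal sketch of that arithmetic is shown below. The function name `frameOffset` is hypothetical, and the object offsets -4 and -8 used in the usage note are assumed values for two 4-byte locals, chosen to be consistent with the listing.

```cpp
#include <cstdint>

// Final offset from $sp for a stack object.
//   ObjectOffset: offset of the object relative to the stack pointer on
//                 function entry (negative for locals)
//   StackSize:    size of the frame allocated by the prologue
//   Imm:          immediate already present in the load/store instruction
int64_t frameOffset(int64_t ObjectOffset, uint64_t StackSize, int64_t Imm) {
  return ObjectOffset + static_cast<int64_t>(StackSize) + Imm;
}
```

Under these assumptions, `frameOffset(-4, 56, 0)` is 52 for %1 and `frameOffset(-8, 56, 0)` is 48 for %2. With the 8-byte frame of the earlier example, `frameOffset(-4, 8, 0)` is 4, matching `st $2, 4($sp)`.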

The Prologue and Epilogue functions are as follows,

### LLVMBackendTutorialExampleCode/Chapter3\_4/Cpu0FrameLowering.h

```

void emitPrologue(MachineFunction &MF) const;
void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const;

```

### LLVMBackendTutorialExampleCode/Chapter3\_4/Cpu0FrameLowering.cpp

```

static void expandLargeImm(unsigned Reg, int64_t Imm,
                           const Cpu0InstrInfo &TII, MachineBasicBlock &MBB,
                           MachineBasicBlock::iterator II, DebugLoc DL) {
    unsigned ADDu = Cpu0::ADDu;
    unsigned ZEROReg = Cpu0::ZERO;
    unsigned ATReg = Cpu0::AT;
    Cpu0AnalyzeImmediate AnalyzeImm;
    const Cpu0AnalyzeImmediate::InstSeq &Seq =
        AnalyzeImm.Analyze(Imm, 32, false /* LastInstrIsADDiu */);
    Cpu0AnalyzeImmediate::InstSeq::const_iterator Inst = Seq.begin();

    BuildMI(MBB, II, DL, TII.get(Inst->Opc), ATReg).addReg(ZEROReg)
        .addImm(SignExtend64<16>(Inst->ImmOpnd));

    // Build the remaining instructions in Seq.
    for (++Inst; Inst != Seq.end(); ++Inst)
        BuildMI(MBB, II, DL, TII.get(Inst->Opc), ATReg).addReg(ATReg)
            .addImm(SignExtend64<16>(Inst->ImmOpnd));

    BuildMI(MBB, II, DL, TII.get(ADDu), Reg).addReg(Reg).addReg(ATReg);
}

void Cpu0FrameLowering::emitPrologue(MachineFunction &MF) const {
    MachineBasicBlock &MBB = MF.front();
    MachineFrameInfo *MFI = MF.getFrameInfo();
    Cpu0FunctionInfo *Cpu0FI = MF.getInfo<Cpu0FunctionInfo>();
    const Cpu0InstrInfo &TII =
        *static_cast<const Cpu0InstrInfo*>(MF.getTarget().getInstrInfo());
    MachineBasicBlock::iterator MBBI = MBB.begin();
DebugLoc dl = MBBI != MBB.end() ? MBBI->getDebugLoc() : DebugLoc();
unsigned SP = Cpu0::SP;
unsigned ADDiu = Cpu0::ADDiu;
// First, compute final stack size.
unsigned StackAlign = getStackAlignment();
unsigned LocalVarAreaOffset = Cpu0FI->getMaxCallFrameSize();
uint64_t StackSize = RoundUpToAlignment(LocalVarAreaOffset, StackAlign) +
    RoundUpToAlignment(MFI->getStackSize(), StackAlign);

// Update stack size
MFI->setStackSize(StackSize);

// No need to allocate space on the stack.
if (StackSize == 0 && !MFI->adjustsStack()) return;

MachineModuleInfo &MMI = MF.getMMI();
std::vector<MachineMove> &Moves = MMI.getFrameMoves();
MachineLocation DstML, SrcML;

// Adjust stack.
if (isInt<16>(-StackSize)) // addiu sp, sp, (-stacksize)
    BuildMI(MBB, MBBI, dl, TII.get(ADDiu), SP).addReg(SP).addImm(-StackSize);
else { // Expand immediate that doesn't fit in 16-bit.
    Cpu0FI->setEmitNOAT();
    expandLargeImm(SP, -StackSize, TII, MBB, MBBI, dl);
}

// emit ".cfi_def_cfa_offset StackSize"
MCSymbol *AdjustSPLabel = MMI.getContext().CreateTempSymbol();
BuildMI(MBB, MBBI, dl,
    TII.get(TargetOpcode::PROLOG_LABEL)).addSym(AdjustSPLabel);
DstML = MachineLocation(MachineLocation::VirtualFP);
SrcML = MachineLocation(MachineLocation::VirtualFP, -StackSize);
Moves.push_back(MachineMove(AdjustSPLabel, DstML, SrcML));

const std::vector<CalleeSavedInfo> &CSI = MFI->getCalleeSavedInfo();

if (CSI.size()) {
    // Find the instruction past the last instruction that saves a callee-saved
    // register to the stack.
    for (unsigned i = 0; i < CSI.size(); ++i)
        ++MBBI;

    // Iterate over list of callee-saved registers and emit .cfi_offset
    // directives.
    MCSymbol *CSLabel = MMI.getContext().CreateTempSymbol();
    BuildMI(MBB, MBBI, dl,
        TII.get(TargetOpcode::PROLOG_LABEL)).addSym(CSLabel);

    for (std::vector<CalleeSavedInfo>::const_iterator I = CSI.begin(),
        E = CSI.end(); I != E; ++I) {
        int64_t Offset = MFI->getObjectOffset(I->getFrameIdx());
        unsigned Reg = I->getReg();
        {
            // Reg is either in CPUREgs or FGR32.
            DstML = MachineLocation(MachineLocation::VirtualFP, Offset);
            SrcML = MachineLocation(Reg);
            Moves.push_back(MachineMove(CSLabel, DstML, SrcML));
        }
    }
}
}

void Cpu0FrameLowering::emitEpilogue(MachineFunction &MF,
                                      MachineBasicBlock &MBB) const {
    MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
    MachineFrameInfo *MFI = MF.getFrameInfo();
    Cpu0FunctionInfo *Cpu0FI = MF.getInfo<Cpu0FunctionInfo>();
    const Cpu0InstrInfo &TII =
        *static_cast<const Cpu0InstrInfo*>(MF.getTarget().getInstrInfo());
    DebugLoc dl = MBBI->getDebugLoc();
    unsigned SP = Cpu0::SP;
    unsigned ADDiu = Cpu0::ADDiu;

    // Get the number of bytes from FrameInfo
    uint64_t StackSize = MFI->getStackSize();

    if (!StackSize)
        return;

    // Adjust stack.
    if (isInt<16>(StackSize)) // addiu sp, sp, (stacksize)
        BuildMI(MBB, MBBI, dl, TII.get(ADDiu), SP).addReg(SP).addImm(StackSize);
    else { // Expand immediate that doesn't fit in 16-bit.
        Cpu0FI->setEmitNOAT();
        expandLargeImm(SP, StackSize, TII, MBB, MBBI, dl);
    }
}

// This method is called immediately before PrologEpilogInserter scans the
// physical registers used to determine what callee saved registers should be
// spilled. This method is optional.
// Without this will have following errors,
// Target didn't implement TargetInstrInfo::storeRegToStackSlot!
// UNREACHABLE executed at /usr/local/llvm/3.1.test/cpu0/1/src/include/llvm/
// Target/TargetInstrInfo.h:390!
// Stack dump:
// 0.      Program arguments: /usr/local/llvm/3.1.test/cpu0/1/cmake_debug_build/
// bin/llc -march=cpu0 -relocation-model=pic -filetype=asm ch0.bc -o
// ch0.cpu0.s
// 1.      Running pass 'Function Pass Manager' on module 'ch0.bc'.
// 2.      Running pass 'Prologue/Epilogue Insertion & Frame Finalization' on
//         function '@main'
// Aborted (core dumped)

// Must exist
//     ldi      $sp, $sp, 8
//->     ret      $lr
//     .set      macro
//     .set      reorder
//     .end      main
void Cpu0FrameLowering::
processFunctionBeforeCalleeSavedScan(MachineFunction &MF,
                                      RegScavenger *RS) const {
    MachineRegisterInfo& MRI = MF.getRegInfo();
// FIXME: remove this code if register allocator can correctly mark
//         $fp and $ra used or unused.

// The register allocator might determine $ra is used after seeing
// instruction "jr $ra", but we do not want PrologEpilogInserter to insert
// instructions to save/restore $ra unless there is a function call.
// To correct this, $ra is explicitly marked unused if there is no
// function call.
if (MF.getFrameInfo()->hasCalls())
    MRI.setPhysRegUsed(Cpu0::LR);
else {
    MRI.setPhysRegUnused(Cpu0::LR);
}
}

```

### LLVMBackendTutorialExampleCode/Chapter3\_4/Cpu0AnalyzeImmediate.h

```
//===== Cpu0AnalyzeImmediate.h - Analyze Immediates -----*- C++ -*-=====//
//
//          The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//=====//
#ifndef CPU0_ANALYZE_IMMEDIATE_H
#define CPU0_ANALYZE_IMMEDIATE_H

#include "llvm/ADT/SmallVector.h"
#include "llvm/Support/DataTypes.h"

namespace llvm {

    class Cpu0AnalyzeImmediate {
    public:
        struct Inst {
            unsigned Opc, ImmOpnd;
            Inst(unsigned Opc, unsigned ImmOpnd);
        };
        typedef SmallVector<Inst, 7> InstSeq;

        /// Analyze - Get an instruction sequence to load immediate Imm. The last
        /// instruction in the sequence must be an ADDiu if LastInstrIsADDiu is
        /// true;
        const InstSeq &Analyze(uint64_t Imm, unsigned Size, bool LastInstrIsADDiu);
    private:
        typedef SmallVector<InstSeq, 5> InstSeqLs;

        /// AddInstr - Add I to all instruction sequences in SeqLs.
        void AddInstr(InstSeqLs &SeqLs, const Inst &I);

        /// GetInstSeqLsADDiu - Get instruction sequences which end with an ADDiu to
        /// load immediate Imm
        void GetInstSeqLsADDiu(uint64_t Imm, unsigned RemSize, InstSeqLs &SeqLs);

        /// GetInstSeqLsORi - Get instruction sequences which end with an ORi to
        /// load immediate Imm
        void GetInstSeqLsORi(uint64_t Imm, unsigned RemSize, InstSeqLs &SeqLs);

        /// GetInstSeqLsSHL - Get instruction sequences which end with a SHL to
        /// load immediate Imm
        void GetInstSeqLsSHL(uint64_t Imm, unsigned RemSize, InstSeqLs &SeqLs);

        /// GetInstSeqLs - Get instruction sequences to load immediate Imm.
        void GetInstSeqLs(uint64_t Imm, unsigned RemSize, InstSeqLs &SeqLs);

        unsigned Size;
        unsigned ADDiu, ORi, SHL;
        InstSeq Insts;
    };
}

#endif
```

### LLVMBackendTutorialExampleCode/Chapter3\_4/Cpu0AnalyzeImmediate.cpp

```
//===== Cpu0AnalyzeImmediate.cpp - Analyze Immediates =====//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//=====//
#include "Cpu0AnalyzeImmediate.h"
#include "Cpu0.h"
#include "llvm/Support/MathExtras.h"

using namespace llvm;

Cpu0AnalyzeImmediate::Inst::Inst(unsigned O, unsigned I) : Opc(O), ImmOpnd(I) {}

// Add I to the instruction sequences.
void Cpu0AnalyzeImmediate::AddInstr(InstSeqLs &SeqLs, const Inst &I) {
    // Add an instruction sequence consisting of just I.
    if (SeqLs.empty()) {
        SeqLs.push_back(InstSeq(1, I));
        return;
    }

    for (InstSeqLs::iterator Iter = SeqLs.begin(); Iter != SeqLs.end(); ++Iter)
        Iter->push_back(I);
}

void Cpu0AnalyzeImmediate::GetInstSeqLsADDiu(uint64_t Imm, unsigned RemSize,
                                             InstSeqLs &SeqLs) {
    GetInstSeqLs((Imm + 0x8000ULL) & 0xffffffffffff0000ULL, RemSize, SeqLs);
    AddInstr(SeqLs, Inst(ADDiu, Imm & 0xffffULL));
}

void Cpu0AnalyzeImmediate::GetInstSeqLsORi(uint64_t Imm, unsigned RemSize,
                                           InstSeqLs &SeqLs) {
    GetInstSeqLs(Imm & 0xffffffffffff0000ULL, RemSize, SeqLs);
    // ORi takes a 16-bit immediate, so only the low 16 bits are used here.
    AddInstr(SeqLs, Inst(ORi, Imm & 0xffffULL));
}

void Cpu0AnalyzeImmediate::GetInstSeqLsSHL(uint64_t Imm, unsigned RemSize,
                                           InstSeqLs &SeqLs) {
    unsigned Shamt = CountTrailingZeros_64(Imm);
    GetInstSeqLs(Imm >> Shamt, RemSize - Shamt, SeqLs);
    AddInstr(SeqLs, Inst(SHL, Shamt));
}

void Cpu0AnalyzeImmediate::GetInstSeqLs(uint64_t Imm, unsigned RemSize,
                                        InstSeqLs &SeqLs) {
    uint64_t MaskedImm = Imm & (0xffffffffffffffffULL >> (64 - Size));

    // Do nothing if Imm is 0.
    if (!MaskedImm)
        return;

    // A single ADDiu will do if RemSize <= 16.
    if (RemSize <= 16) {
        AddInstr(SeqLs, Inst(ADDiu, MaskedImm));
        return;
    }

    // Shift if the lower 16-bit is cleared.
    if (!(Imm & 0xffff)) {
        GetInstSeqLsSHL(Imm, RemSize, SeqLs);
        return;
    }

    GetInstSeqLsADDiu(Imm, RemSize, SeqLs);

    // If bit 15 is cleared, it doesn't make a difference whether the last
    // instruction is an ADDiu or ORi. In that case, do not call GetInstSeqLsORi.
    if (Imm & 0x8000) {
        InstSeqLs SeqLsORi;
        GetInstSeqLsORi(Imm, RemSize, SeqLsORi);
        SeqLs.insert(SeqLs.end(), SeqLsORi.begin(), SeqLsORi.end());
    }
}

const Cpu0AnalyzeImmediate::InstSeq
&Cpu0AnalyzeImmediate::Analyze(uint64_t Imm, unsigned Size,
                               bool LastInstrIsADDiu) {
    this->Size = Size;

    ADDiu = Cpu0::ADDiu;
    ORi = Cpu0::ORi;
    SHL = Cpu0::SHL;

    InstSeqLs SeqLs;

    // Get the list of instruction sequences.
    if (LastInstrIsADDiu | !Imm)
        GetInstSeqLsADDiu(Imm, Size, SeqLs);
    else
        GetInstSeqLs(Imm, Size, SeqLs);

    Insts.clear();
    Insts.append(SeqLs.begin()->begin(), SeqLs.begin()->end());

    return Insts;
}
```
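The reason `GetInstSeqLsADDiu()` adds 0x8000 before masking, while `GetInstSeqLsORi()` does not, can be checked with a small stand-alone sketch. It is restricted to 32-bit values for simplicity, and the names `HiLo`, `splitForADDiu`, `splitForORi`, `rebuildADDiu` and `rebuildORi` are hypothetical helpers for illustration only.

```cpp
#include <cstdint>

// An immediate split into an upper part (low 16 bits zero) and the 16-bit
// immediate of the final instruction.
struct HiLo {
  uint32_t Hi;
  uint16_t Lo;
};

// Final instruction is ADDiu, which sign-extends Lo. Adding 0x8000 first
// bumps Hi by 0x10000 whenever bit 15 of Imm is set, which cancels the
// effect of Lo being treated as a negative number.
HiLo splitForADDiu(uint32_t Imm) {
  HiLo R;
  R.Hi = (Imm + 0x8000u) & 0xffff0000u;
  R.Lo = static_cast<uint16_t>(Imm & 0xffffu);
  return R;
}

// Final instruction is ORi, which zero-extends Lo, so no correction is needed.
HiLo splitForORi(uint32_t Imm) {
  HiLo R;
  R.Hi = Imm & 0xffff0000u;
  R.Lo = static_cast<uint16_t>(Imm & 0xffffu);
  return R;
}

// Value produced by "load Hi, then ADDiu Lo" (Lo is sign-extended).
uint32_t rebuildADDiu(HiLo P) {
  int32_t SExt = static_cast<int16_t>(P.Lo);
  return P.Hi + static_cast<uint32_t>(SExt);
}

// Value produced by "load Hi, then ORi Lo" (Lo is zero-extended).
uint32_t rebuildORi(HiLo P) {
  return P.Hi | P.Lo;
}
```

For an immediate such as 0x12348765, whose bit 15 is set, the two splits give different upper parts (0x12350000 for ADDiu, 0x12340000 for ORi), yet both rebuild the original value. When bit 15 is clear the two splits coincide, which is why the listing skips the ORi variant in that case.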

### LLVMBackendTutorialExampleCode/Chapter3\_4/Cpu0RegisterInfo.cpp

```
}

//- If eliminateFrameIndex() is empty, it will hang on run.
// pure virtual method
// FrameIndex represents objects inside an abstract stack.
// We must replace FrameIndex with a stack/frame pointer
// direct reference.
void Cpu0RegisterInfo::
eliminateFrameIndex(MachineBasicBlock::iterator II, int SPAdj,
                    unsigned FIOperandNum, RegScavenger *RS) const {
    MachineInstr &MI = *II;
    MachineFunction &MF = *MI.getParent()->getParent();
    MachineFrameInfo *MFI = MF.getFrameInfo();

    unsigned i = 0;
    while (!MI.getOperand(i).isFI()) {
        ++i;
        assert(i < MI.getNumOperands() &&
               "Instr doesn't have FrameIndex operand!");
    }

    DEBUG(errs() << "\nFunction : " << MF.getFunction()->getName() << "\n";
          errs() << "<----->\n" << MI);

    int FrameIndex = MI.getOperand(i).getIndex();
    uint64_t stackSize = MF.getFrameInfo()->getStackSize();
    int64_t spOffset = MF.getFrameInfo()->getObjectOffset(FrameIndex);

    DEBUG(errs() << "FrameIndex : " << FrameIndex << "\n"
          << "spOffset : " << spOffset << "\n"
          << "stackSize : " << stackSize << "\n");

    const std::vector<CalleeSavedInfo> &CSI = MFI->getCalleeSavedInfo();
    int MinCSFI = 0;
    int MaxCSFI = -1;

    if (CSI.size()) {
        MinCSFI = CSI[0].getFrameIdx();
        MaxCSFI = CSI[CSI.size() - 1].getFrameIdx();
    }

    // The following stack frame objects are always referenced relative to $sp:
    // 1. Outgoing arguments.
    // 2. Pointer to dynamically allocated stack space.
    // 3. Locations for callee-saved registers.
    // Everything else is referenced relative to whatever register
    // getFrameRegister() returns.
    unsigned FrameReg;

    FrameReg = getFrameRegister(MF);

    // Calculate final offset.
    // - There is no need to change the offset if the frame object is one of the
    //   following: an outgoing argument, pointer to a dynamically allocated
    //   stack space or a $gp restore location,
    // - If the frame object is any of the following, its offset must be adjusted
    //   by adding the size of the stack:
    //   incoming argument, callee-saved register location or local variable.
    int64_t Offset;
    Offset = spOffset + (int64_t)stackSize;

    Offset += MI.getOperand(i+1).getImm();

    DEBUG(errs() << "Offset : " << Offset << "\n" << "-----\n");

    // If MI is not a debug value, make sure Offset fits in the 16-bit immediate
    // field.
    if (!MI.isDebugValue() && !isInt<16>(Offset)) {
        // Note: a string literal is always true, so this assert never fires.
        assert("(!MI.isDebugValue() && !isInt<16>(Offset))");
    }

    MI.getOperand(i).ChangeToRegister(FrameReg, false);
    MI.getOperand(i+1).ChangeToImmediate(Offset);
}

// pure virtual method
```

### LLVMBackendTutorialExampleCode/Chapter3\_4/CMakeLists.txt

```

add_llvm_target(...

...
Cpu0AnalyzeImmediate.cpp
...
)

```

After adding these Prologue and Epilogue functions, build with Chapter3\_4/Cpu0. Now we are ready to compile our example code ch3.bc into cpu0 assembly code. The command and the output file ch3.cpu0.s are as follows,

```

118-165-78-12:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm -debug ch3.bc -o ch3.cpu0.s
Args: /Users/Jonathan/llvm/test/cmake_debug_build/bin/Debug/llc -march=cpu0
-relocation-model=pic -filetype=asm -debug ch3.bc -o ch3.cpu0.s
118-165-78-12:InputFiles Jonathan$ cat ch3.cpu0.s
.section .mdebug.abi32
.previous
.file "ch3.bc"
.text
.globl main
.align 2
.type main,@function
.ent main          # @main
main:
.cfi_startproc
.frame $sp,8,$lr
.mask 0x00000000,0


.set  noreorder
.set  nomacro
# BB#0:
    addiu $sp, $sp, -8
$tmp1:
    .cfi_def_cfa_offset 8
    addiu $2, $zero, 0
    st $2, 4($sp)
    addiu $sp, $sp, 8
    ret $lr
.set  macro
.set  reorder
.end  main
$tmp2:
.size main, ($tmp2)-main
.cfi_endproc

```

To see how the ‘**DAG->DAG Pattern Instruction Selection**’ works in llc, let’s compile with the `llc -debug` option and see what happens.

```

118-165-78-12:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm -debug ch3.bc -o -
Args: /Users/Jonathan/llvm/test/cmake_debug_build/bin/Debug/llc -march=cpu0
-relocation-model=pic -filetype=asm -debug ch3.bc -o -
...
Optimized legalized selection DAG: BB#0 'main:'
SelectionDAG has 8 nodes:
0x7fbe4082d010: i32 = Constant<0> [ORD=1] [ID=1]

0x7fbe4082d410: i32 = Register %V0 [ID=4]

0x7fbe40410668: ch = EntryToken [ORD=1] [ID=0]

0x7fbe4082d010: <multiple use>
0x7fbe4082d110: i32 = FrameIndex<0> [ORD=1] [ID=2]

0x7fbe4082d210: i32 = undef [ORD=1] [ID=3]

0x7fbe4082d310: ch = store 0x7fbe40410668, 0x7fbe4082d010, 0x7fbe4082d110,
0x7fbe4082d210<ST4[%1]> [ORD=1] [ID=5]

0x7fbe4082d410: <multiple use>
0x7fbe4082d010: <multiple use>
0x7fbe4082d510: ch,glue = CopyToReg 0x7fbe4082d310, 0x7fbe4082d410,
0x7fbe4082d010 [ID=6]

0x7fbe4082d510: <multiple use>
0x7fbe4082d410: <multiple use>
0x7fbe4082d510: <multiple use>
0x7fbe4082d610: ch = Cpu0ISD::Ret 0x7fbe4082d510, 0x7fbe4082d410,
0x7fbe4082d510:1 [ID=7]

===== Instruction selection begins: BB#0 ''
Selecting: 0x7fbe4082d610: ch = Cpu0ISD::Ret 0x7fbe4082d510, 0x7fbe4082d410,
0x7fbe4082d510:1 [ID=7]

ISEL: Starting pattern match on root node: 0x7fbe4082d610: ch = Cpu0ISD::Ret


0x7fbe4082d510, 0x7fbe4082d410, 0x7fbe4082d510:1 [ID=7]

Morphed node: 0x7fbe4082d610: ch = RET 0x7fbe4082d410, 0x7fbe4082d510,
0x7fbe4082d510:1

ISEL: Match complete!
=> 0x7fbe4082d610: ch = RET 0x7fbe4082d410, 0x7fbe4082d510, 0x7fbe4082d510:1

Selecting: 0x7fbe4082d510: ch,glue = CopyToReg 0x7fbe4082d310, 0x7fbe4082d410,
0x7fbe4082d010 [ID=6]

=> 0x7fbe4082d510: ch,glue = CopyToReg 0x7fbe4082d310, 0x7fbe4082d410,
0x7fbe4082d010

Selecting: 0x7fbe4082d310: ch = store 0x7fbe40410668, 0x7fbe4082d010,
0x7fbe4082d110, 0x7fbe4082d210<ST4[%1]> [ORD=1] [ID=5]

ISEL: Starting pattern match on root node: 0x7fbe4082d310: ch = store 0x7fbe40410668,
0x7fbe4082d010, 0x7fbe4082d110, 0x7fbe4082d210<ST4[%1]> [ORD=1] [ID=5]

    Initial Opcode index to 166
    Morphed node: 0x7fbe4082d310: ch = ST 0x7fbe4082d010, 0x7fbe4082d710,
0x7fbe4082d810, 0x7fbe40410668<Mem:ST4[%1]> [ORD=1]

ISEL: Match complete!
=> 0x7fbe4082d310: ch = ST 0x7fbe4082d010, 0x7fbe4082d710, 0x7fbe4082d810,
0x7fbe40410668<Mem:ST4[%1]> [ORD=1]

Selecting: 0x7fbe4082d410: i32 = Register %v0 [ID=4]

=> 0x7fbe4082d410: i32 = Register %v0

Selecting: 0x7fbe4082d010: i32 = Constant<0> [ORD=1] [ID=1]

ISEL: Starting pattern match on root node: 0x7fbe4082d010: i32 =
Constant<0> [ORD=1] [ID=1]

    Initial Opcode index to 1201
    Morphed node: 0x7fbe4082d010: i32 = ADDiu 0x7fbe4082d110, 0x7fbe4082d810 [ORD=1]

ISEL: Match complete!
=> 0x7fbe4082d010: i32 = ADDiu 0x7fbe4082d110, 0x7fbe4082d810 [ORD=1]

Selecting: 0x7fbe40410668: ch = EntryToken [ORD=1] [ID=0]

=> 0x7fbe40410668: ch = EntryToken [ORD=1]

===== Instruction selection ends:

```

The translation above is summarized in Table 3.1.

Table 3.1: Chapter 3 .bc IR instructions

| .bc        | Optimized legalized selection DAG | Cpu0  |
|------------|-----------------------------------|-------|
| constant 0 | constant 0                        | addiu |
| store      | store                             | st    |
| ret        | Cpu0ISD::Ret                      | ret   |

From the llc -debug output above, we see that **store** and **ret** are translated into **store** and **Cpu0ISD::Ret** in the Optimized legalized selection DAG stage, and then finally into the Cpu0 instructions **st** and **ret**. Since the store uses **constant 0** (**store i32 0, i32\* %1** in this example), the constant 0 is translated into “**addiu \$2, \$zero, 0**” via the following pattern defined in Cpu0InstrInfo.td.

### LLVMBackendTutorialExampleCode/Chapter3\_4/Cpu0InstrInfo.td

```
//=====
// Small immediates

def : Pat<(i32 immSExt16:$in),
      (ADDiu ZERO, imm:$in)>;
```
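What the pattern above does for a small constant can be sketched as follows. This is a rough model under stated assumptions: `fitsSignedImm16` is a hypothetical stand-in for the `immSExt16` predicate (whose definition is not shown in this section; the name is read here as "immediate that fits a sign-extended 16-bit field"), and `addiuFromZero` is a hypothetical model of `addiu $ra, $zero, imm`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for immSExt16: true when the constant can be
// encoded as a signed 16-bit immediate.
constexpr bool fitsSignedImm16(int64_t value) {
  return value >= -32768 && value <= 32767;
}

// Hypothetical model of "addiu $ra, $zero, imm": the register receives
// 0 + sign-extended imm, kept as a 32-bit pattern.
constexpr uint32_t addiuFromZero(int16_t imm) {
  return static_cast<uint32_t>(static_cast<int32_t>(imm));
}

static_assert(fitsSignedImm16(0), "constant 0 matches the pattern");
static_assert(addiuFromZero(0) == 0u, "addiu $2, $zero, 0 yields 0");
static_assert(addiuFromZero(-5) == 0xfffffffbu, "-5 is sign-extended");
static_assert(!fitsSignedImm16(0x8000), "32768 does not fit in 16 signed bits");
```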

At this point, we have translated the very simple main() function, which consists of the single statement return 0. The Cpu0AnalyzeImmediate.cpp file defined above and the Cpu0InstrInfo.td instructions added below take care of the 32-bit stack size adjustments.

### LLVMBackendTutorialExampleCode/Chapter3\_4/Cpu0InstrInfo.td

```
def shamt      : Operand<i32>;
// Unsigned Operand
def uimm16    : Operand<i32> {
    let PrintMethod = "printUnsignedImm";
}
...
// Node immediate fits as 16-bit zero extended on target immediate.
// The LO16 param means that only the lower 16 bits of the node
// immediate are caught.
// e.g. addiu, sltiu
def immZExt16 : PatLeaf<(imm), [{
    if (N->getValueType(0) == MVT::i32)
        return (uint32_t)N->getZExtValue() == (unsigned short)N->getZExtValue();
    else
        return (uint64_t)N->getZExtValue() == (unsigned short)N->getZExtValue();
}], LO16>;
// shamt field must fit in 5 bits.
def immZExt5 : ImmLeaf<i32, [{return Imm == (Imm & 0x1f);}]>;
...
// Arithmetic and logical instructions with 3 register operands.
class ArithLogicR<bits<8> op, string instr_asm, SDNode OpNode,
                  InstrItinClass itin, RegisterClass RC, bit isComm = 0>:
    FA<op, (outs RC:$ra), (ins RC:$rb, RC:$rc),
    !strconcat(instr_asm, "\t$ra, $rb, $rc"),
    [(set RC:$ra, (OpNode RC:$rb, RC:$rc))], itin> {
    let shamt = 0;
    let isCommutable = isComm; // e.g. add rb rc = add rc rb
    let isReMaterializable = 1;
}
...
// Shifts
class shift_rotate_imm<bits<8> op, bits<4> isRotate, string instr_asm,
                       SDNode OpNode, PatFrag PF, Operand ImmOpnd,
                       RegisterClass RC>:
    FA<op, (outs RC:$ra), (ins RC:$rb, ImmOpnd:$shamt),
    !strconcat(instr_asm, "\t$ra, $rb, $shamt"),
    [(set RC:$ra, (OpNode RC:$rb, PF:$shamt))], IIAlu> {
    let rc = isRotate;
    let shamt = shamt;
}

// 32-bit shift instructions.
class shift_rotate_imm32<bits<8> func, bits<4> isRotate, string instr_asm,
  SDNode OpNode>:
  shift_rotate_imm<func, isRotate, instr_asm, OpNode, immZExt5, shamt, CPURegs>;
...
def ORi      : ArithLogicI<0x0D, "ori", or, uimm16, immZExt16, CPURegs>;
/// Arithmetic Instructions (3-Operand, R-Type)
def ADDu     : ArithLogicR<0x11, "addu", add, IIAlu, CPURegs, 1>;
/// Shift Instructions
def SHL      : shift_rotate_imm32<0x1E, 0x00, "shl", shl>;
...

```
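The two zero-extension predicates in the listing above boil down to simple bit tests. The sketch below restates them as standalone functions; the names `fitsZExt16` and `fitsZExt5` are hypothetical and are used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Same comparison as immZExt16 in the listing: the value must equal its own
// low 16 bits.
constexpr bool fitsZExt16(uint32_t value) {
  return value == static_cast<uint16_t>(value);
}

// Same comparison as immZExt5: a shift amount must equal its low 5 bits.
constexpr bool fitsZExt5(uint32_t value) {
  return value == (value & 0x1fu);
}

static_assert(fitsZExt16(0xffffu), "largest 16-bit unsigned immediate");
static_assert(!fitsZExt16(0x10000u), "needs 17 bits");
static_assert(fitsZExt5(31u), "largest 32-bit shift amount");
static_assert(!fitsZExt5(32u), "out of range for a 5-bit shamt field");
```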

Cpu0AnalyzeImmediate.cpp is written recursively and its logic is a little complicated. You can skip the recursive code for now and come back to it in the last chapter, Chapter 12. Since Chapter 12 (Optimization) replaces the addiu and shl pair with a single lui instruction, you will have a chance to think it through in detail there. In any case, recursion is a technique used in front end compiler books, so you should be familiar with it. Instead of tracing the code, we list the stack sizes and the instructions generated for them in Table 3.2 as follows,

Table 3.2: Cpu0 stack adjustment instructions

| stack size range     | ex. stack size | Cpu0 Prologue instructions                                                                              | Cpu0 Epilogue instructions                                                                                  |
|----------------------|----------------|---------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|
| 0 ~ 0x7fff           | 0x7fff         | • addiu \$sp, \$sp, -32767;                                                                             | • addiu \$sp, \$sp, 32767;                                                                                  |
| 0x8000 ~ 0xffff      | 0x8000         | • addiu \$sp, \$sp, -32768;                                                                             | • addiu \$1, \$zero, 1;<br>• shl \$1, \$1, 16;<br>• addiu \$1, \$1, -32768;<br>• addu \$sp, \$sp, \$1;      |
| 0x10000 ~ 0xffffffff | 0x7fffffff     | • addiu \$1, \$zero, -1;<br>• shl \$1, \$1, 31;<br>• addiu \$1, \$1, 1;<br>• addu \$sp, \$sp, \$1;      | • addiu \$1, \$zero, 1;<br>• shl \$1, \$1, 31;<br>• addiu \$1, \$1, -1;<br>• addu \$sp, \$sp, \$1;          |
| 0x10000 ~ 0xffffffff | 0x90008000     | • addiu \$1, \$zero, -9;<br>• shl \$1, \$1, 28;<br>• addiu \$1, \$1, -32768;<br>• addu \$sp, \$sp, \$1; | • addiu \$1, \$zero, -28671;<br>• shl \$1, \$1, 16;<br>• addiu \$1, \$1, -32768;<br>• addu \$sp, \$sp, \$1; |

Assume sp = 0xa0008000 and stack size = 0x90008000, then (0xa0008000 - 0x90008000) => 0x10000000. Verify with the Cpu0 Prologue instructions as follows,

1. “addiu \$1, \$zero, -9” => (\$1 = 0 + 0xfffffff7) => \$1 = 0xfffffff7.
2. “shl \$1, \$1, 28;” => \$1 = 0x70000000.
3. “addiu \$1, \$1, -32768” => \$1 = (0x70000000 + 0xffff8000) => \$1 = 0x6fff8000.
4. “addu \$sp, \$sp, \$1” => \$sp = (0xa0008000 + 0x6fff8000) => \$sp = 0x10000000.

Verify with the Cpu0 Epilogue instructions with sp = 0x10000000 and stack size = 0x90008000 as follows,

1. “addiu \$1, \$zero, -28671” => (\$1 = 0 + 0xffff9001) => \$1 = 0xffff9001.
2. “shl \$1, \$1, 16;” => \$1 = 0x90010000.
3. “addiu \$1, \$1, -32768” => \$1 = (0x90010000 + 0xffff8000) => \$1 = 0x90008000.
4. “addu \$sp, \$sp, \$1” => \$sp = (0x10000000 + 0x90008000) => \$sp = 0xa0008000.
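The two hand verifications above can also be replayed mechanically. The following sketch models the three instructions as plain 32-bit wrap-around arithmetic; the functions `addiu`, `shl`, `addu`, `prologue` and `epilogue` are hypothetical models written for this check, not code from the Cpu0 backend.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit models of the three Cpu0 instructions used above.
// addiu adds a sign-extended 16-bit immediate; all results wrap modulo 2^32.
constexpr uint32_t addiu(uint32_t rb, int16_t imm) {
  return rb + static_cast<uint32_t>(static_cast<int32_t>(imm));
}
constexpr uint32_t shl(uint32_t rb, unsigned shamt) { return rb << shamt; }
constexpr uint32_t addu(uint32_t rb, uint32_t rc) { return rb + rc; }

// Prologue for stack size 0x90008000:
//   addiu $1, $zero, -9; shl $1, $1, 28; addiu $1, $1, -32768; addu $sp, $sp, $1
constexpr uint32_t prologue(uint32_t sp) {
  return addu(sp, addiu(shl(addiu(0u, -9), 28), -32768));
}

// Epilogue for stack size 0x90008000:
//   addiu $1, $zero, -28671; shl $1, $1, 16; addiu $1, $1, -32768; addu $sp, $sp, $1
constexpr uint32_t epilogue(uint32_t sp) {
  return addu(sp, addiu(shl(addiu(0u, -28671), 16), -32768));
}

static_assert(addiu(0u, -9) == 0xfffffff7u, "step 1 of the prologue");
static_assert(prologue(0xa0008000u) == 0x10000000u, "sp after prologue");
static_assert(epilogue(0x10000000u) == 0xa0008000u, "sp restored by epilogue");
```

Running the epilogue on the prologue's result gives back the starting sp, which is the property the two numbered lists check by hand.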

## 3.8 Summary of this Chapter

The following llc output and Table 3.3 summarize the functions for the llvm backend stages.

```
118-165-79-200:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm ch3.bc
-debug-pass=Structure -o -
...
Machine Branch Probability Analysis
ModulePass Manager
FunctionPass Manager
...
CPU0 DAG->DAG Pattern Instruction Selection
  Initial selection DAG
  Optimized lowered selection DAG
  Type-legalized selection DAG
  Optimized type-legalized selection DAG
  Legalized selection DAG
  Optimized legalized selection DAG
  Instruction selection
  Selected selection DAG
  Scheduling
...
Greedy Register Allocator
...
Post-RA pseudo instruction expansion pass
...
Cpu0 Assembly Printer
```

Table 3.3: Functions for llvm backend stages

| Stage                                              | Function                                                                           |
|----------------------------------------------------|------------------------------------------------------------------------------------|
| Before CPU0 DAG->DAG Pattern Instruction Selection | • Cpu0TargetLowering::LowerFormalArguments<br>• Cpu0TargetLowering::LowerReturn    |
| Instruction selection                              | • Cpu0DAGToDAGISel::Select                                                         |
| Prologue/Epilogue Insertion & Frame Finalization   | • Cpu0FrameLowering.cpp<br>• Cpu0RegisterInfo::eliminateFrameIndex()               |
| Cpu0 Assembly Printer                              | • Cpu0AsmPrinter.cpp -> Cpu0MCInstLower.cpp<br>• Cpu0InstPrinter.cpp               |

We have finished a simple assembler for cpu0 which supports only 7 instructions: **ld, st, addiu, ori, addu, shl** and **ret**.

We are satisfied with this result. But you may think, “After so much code, we only get these 7 instructions.” The point is that we have created a framework for the cpu0 target machine (please look back at the llvm back end structure class inheritance tree earlier in this chapter). So far, we have over 3000 lines of source code with comments, including the \*.cpp, \*.h, \*.td, CMakeLists.txt and LLVMBuild.txt files. They can be counted with the command `` wc `find dir -name *.cpp` `` for the \*.cpp files, and likewise for \*.h, \*.td and \*.txt. The LLVM front end tutorial has 700 lines of source code without comments in total. Don’t feel down about this result. In reality, writing a back end warms up slowly but runs fast. Clang has over 500,000 lines of source code with comments in the clang/lib directory, which includes C++ and Obj C support. The Mips back end has only 15,000 lines with comments. Even the complicated X86 CPU, which is CISC outside and RISC inside (micro instructions), has only 45,000 lines with comments. In the next chapter, we will show you that adding support for a new instruction is as easy as 123.



# ADDING ARITHMETIC AND LOCAL POINTER SUPPORT

This chapter first adds support for more cpu0 arithmetic instructions. The logic operation “**not**” and its translation are covered in section [Operator “not” !](#). The [section Display llvm IR nodes with Graphviz](#) shows the DAG optimization steps and their corresponding llc display options. The results of these DAG optimization steps can be displayed with the Graphviz graphic tool, which supplies very useful information in a graphic view. We think you will appreciate Graphviz support when debugging. In [section Adjust cpu0 instructions](#), we adjust the cpu0 instructions to support some data types of the C language. The [section Local variable pointer](#) introduces the translation of local variable pointers. Finally, [section Operator mod, %](#) takes care of the C operator %.

## 4.1 Support arithmetic instructions

Running the Chapter3\_5/Cpu0 llc with input file ch4\_1\_1.bc gives the following error,

### LLVMBackendTutorialExampleCode/InputFiles/ch4\_1\_1.cpp

```
int main()
{
    int a = 5;
    int b = 2;
    int c = 0;

    c = a + b;

    return c;
}
```

```
118-165-78-230:InputFiles Jonathan$ clang -c ch4_1_1.cpp -emit-llvm -o
ch4_1_1.bc
118-165-78-230:InputFiles Jonathan$ llvm-dis ch4_1_1.bc -o ch4_1_1.ll
118-165-78-230:InputFiles Jonathan$ cat ch4_1_1.ll
; ModuleID = 'ch4_1_1.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-
f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:
32:64-S128"
target triple = "x86_64-apple-macosx10.8.0"

define i32 @main() nounwind uwtable ssp {
%1 = alloca i32, align 4
%a = alloca i32, align 4
%b = alloca i32, align 4
%c = alloca i32, align 4
store i32 0, i32* %1
store i32 5, i32* %a, align 4
store i32 2, i32* %b, align 4
store i32 0, i32* %c, align 4
%2 = load i32* %a, align 4
%3 = load i32* %b, align 4
%4 = add nsw i32 %2, %3
store i32 %4, i32* %c, align 4
%5 = load i32* %c, align 4
ret i32 %5
}

118-165-78-230:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm ch4_1_1.bc -o
ch4_1_1.cpu0.s
LLVM ERROR: Cannot select: 0x7ff02102b010: i32 = add 0x7ff02102ae10, ...
...
```

This error says we have no instruction to translate the IR DAG node **add**. The ADDiu instruction is defined for node **add** with 1 register operand and 1 immediate operand, while this **add** node has 2 register operands. So, append the following code to Cpu0InstrInfo.td and Cpu0Schedule.td in Chapter4\_1/,

### LLVMBackendTutorialExampleCode/Chapter4\_1/Cpu0InstrInfo.td

```
def shamt      : Operand<i32>;
...
// shamt field must fit in 5 bits.
def immZExt5 : ImmLeaf<i32, [{return Imm == (Imm & 0x1f);}]>;
...
// Arithmetic and logical instructions with 3 register operands.
class ArithLogicR<bits<8> op, string instr_asm, SDNode OpNode,
    InstrItinClass itin, RegisterClass RC, bit isComm = 0>:
    FA<op, (outs RC:$ra), (ins RC:$rb, RC:$rc),
    !strconcat(instr_asm, "\t$ra, $rb, $rc"),
    [(set RC:$ra, (OpNode RC:$rb, RC:$rc))], itin> {
let shamt = 0;
let isCommutable = isComm; // e.g. add rb rc = add rc rb
let isReMaterializable = 1;
}

class CmpInstr<bits<8> op, string instr_asm,
    InstrItinClass itin, RegisterClass RC, bit isComm = 0>:
    FA<op, (outs RC:$SW), (ins RC:$ra, RC:$rb),
    !strconcat(instr_asm, "\t$ra, $rb"), [], itin> {
let rc = 0;
let shamt = 0;
let isCommutable = isComm;
}
...
// Shifts
class shift_rotate_imm<bits<8> op, bits<4> isRotate, string instr_asm,
    SDNode OpNode, PatFrag PF, Operand ImmOpnd,
    RegisterClass RC>:
    FA<op, (outs RC:$ra), (ins RC:$rb, ImmOpnd:$shamt),
    !strconcat(instr_asm, "\t$ra, $rb, $shamt"),
    [(set RC:$ra, (OpNode RC:$rb, PF:$shamt))], IIAlu> {
let rc = isRotate;
let shamt = shamt;
}

// 32-bit shift instructions.
class shift_rotate_imm32<bits<8> func, bits<4> isRotate, string instr_asm,
    SDNode OpNode>:
    shift_rotate_imm<func, isRotate, instr_asm, OpNode, immZExt5, shamt, CPURegs>;

// Load Upper Immediate
class LoadUpper<bits<8> op, string instr_asm, RegisterClass RC, Operand Imm>:
FL<op, (outs RC:$ra), (ins Imm:$imm16),
  !strconcat(instr_asm, "\t$ra, $imm16"), [], IIAlu> {
let rb = 0;
let neverHasSideEffects = 1;
let isReMaterializable = 1;
}
...

/// Arithmetic Instructions (3-Operand, R-Type)
def CMP : CmpInstr<0x10, "cmp", IIAlu, CPURegs, 1>;
def ADD : ArithLogicR<0x13, "add", add, IIAlu, CPURegs, 1>;
def SUB : ArithLogicR<0x14, "sub", sub, IIAlu, CPURegs, 1>;
def MUL : ArithLogicR<0x15, "mul", mul, IIImul, CPURegs, 1>;
def DIV : ArithLogicR<0x16, "div", sdiv, IIIdiv, CPURegs, 1>;
def UDIV : ArithLogicR<0x17, "udiv", udiv, IIIdiv, CPURegs, 1>;
def AND : ArithLogicR<0x18, "and", and, IIAlu, CPURegs, 1>;
def OR : ArithLogicR<0x19, "or", or, IIAlu, CPURegs, 1>;
def XOR : ArithLogicR<0x1A, "xor", xor, IIAlu, CPURegs, 1>;

/// Shift Instructions
// sra is the IR node for the ashr llvm IR instruction of .bc
def SRA : shift_rotate_imm32<0x1B, 0x00, "sra", sra>;
def ROL : shift_rotate_imm32<0x1C, 0x01, "rol", rotl>;
def ROR : shift_rotate_imm32<0x1D, 0x01, "ror", rotr>;
...
// srl is the IR node for the lshr llvm IR instruction of .bc
def SHR : shift_rotate_imm32<0x1F, 0x00, "shr", srl>;

```

### LLVMBackendTutorialExampleCode/Chapter4\_1/Cpu0Schedule.td

```

...
def IMULDIV : FuncUnit;
...
def IIImul      : InstrItinClass;
def IIIdiv      : InstrItinClass;
...
// http://llvm.org/docs/doxygen/html/structllvm_1_1InstrStage.html
def Cpu0GenericItineraries : ProcessorItineraries<[ALU, IMULDIV], [], [
...
InstrItinData<IIImul      , [InstrStage<17, [IMULDIV]>]>,
InstrItinData<IIIdiv      , [InstrStage<38, [IMULDIV]>]>
]>;

```

In a RISC CPU like Mips, the multiply/divide function unit and the add/sub/logic unit are built from two different hardware circuits and, moreover, their data paths are separate. We think cpu0 is the same, even though its web site gives no explanation. So, these two function units can execute at the same time (instruction level parallelism). See reference <sup>1</sup> for instruction itineraries.

Now, let's build Chapter4\_1/ and run with input file ch4\_1\_2.cpp as follows,

```
118-165-78-12:InputFiles Jonathan$ clang -c ch4_1_2.cpp -emit-llvm -o ch4_1_2.bc
118-165-78-12:InputFiles Jonathan$ llvm-dis ch4_1_2.bc -o -
; ModuleID = 'ch4_1_2.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-
f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:
32:64-S128"
target triple = "x86_64-apple-macosx10.8.0"

define i32 @main() nounwind uwtable ssp {
  %1 = alloca i32, align 4
  %a = alloca i32, align 4
  %b = alloca i32, align 4
  %c = alloca i32, align 4
  %d = alloca i32, align 4
  %e = alloca i32, align 4
  %f = alloca i32, align 4
  %g = alloca i32, align 4
  %h = alloca i32, align 4
  %i = alloca i32, align 4
  %j = alloca i32, align 4
  %k = alloca i32, align 4
  %l = alloca i32, align 4
  %a1 = alloca i32, align 4
  %k1 = alloca i32, align 4
  %f1 = alloca i32, align 4
  %j1 = alloca i32, align 4
  store i32 0, i32* %1
  store i32 5, i32* %a, align 4
  store i32 2, i32* %b, align 4
  store i32 0, i32* %c, align 4
  store i32 0, i32* %d, align 4
  store i32 0, i32* %l, align 4
  store i32 -5, i32* %a1, align 4
  store i32 0, i32* %k1, align 4
  store i32 0, i32* %f1, align 4
  %2 = load i32* %a, align 4
  %3 = load i32* %b, align 4
  %4 = add nsw i32 %2, %3
  store i32 %4, i32* %c, align 4
  %5 = load i32* %a, align 4
  %6 = load i32* %b, align 4
  %7 = sub nsw i32 %5, %6
  store i32 %7, i32* %d, align 4
  %8 = load i32* %a, align 4
  %9 = load i32* %b, align 4
  %10 = mul nsw i32 %8, %9
  store i32 %10, i32* %e, align 4
  %11 = load i32* %a, align 4
  %12 = load i32* %b, align 4
  %13 = sdiv i32 %11, %12
  store i32 %13, i32* %f, align 4
```

<sup>1</sup> <http://llvm.org/docs/doxygen/html/structllvm_1_1InstrStage.html>

```

%14 = load i32* %a1, align 4
%15 = load i32* %b, align 4
%16 = udiv i32 %14, %15
store i32 %16, i32* %f1, align 4
%17 = load i32* %a, align 4
%18 = load i32* %b, align 4
%19 = and i32 %17, %18
store i32 %19, i32* %g, align 4
%20 = load i32* %a, align 4
%21 = load i32* %b, align 4
%22 = or i32 %20, %21
store i32 %22, i32* %h, align 4
%23 = load i32* %a, align 4
%24 = load i32* %b, align 4
%25 = xor i32 %23, %24
store i32 %25, i32* %i, align 4
%26 = load i32* %a, align 4
%27 = shl i32 %26, 2
store i32 %27, i32* %j, align 4
%28 = load i32* %a1, align 4
%29 = shl i32 %28, 2
store i32 %29, i32* %j1, align 4
%30 = load i32* %a, align 4
%31 = ashr i32 %30, 2
store i32 %31, i32* %k, align 4
%32 = load i32* %a1, align 4
%33 = lshr i32 %32, 2
store i32 %33, i32* %k1, align 4
%34 = load i32* %c, align 4
ret i32 %34
}

118-165-78-12:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm ch4_1_2.bc -o -
.section .mdebug.abi32
.previous
.file "ch4_1_2.bc"
.text
.globl main
.align 2
.type main,@function
.ent main           # @main
main:
.cfi_startproc
.frame $sp,72,$lr
.mask 0x00000000,0
.set noreorder
.set nomacro
# BB#0:
addiu $sp, $sp, -72
$tmp1:
.cfi_def_cfa_offset 72
addiu $2, $zero, 0
st $2, 68($sp)
addiu $3, $zero, 5
st $3, 64($sp)
addiu $3, $zero, 2
st $3, 60($sp)

st  $2, 56($sp)
st  $2, 52($sp)
st  $2, 20($sp)
addiu $3, $zero, -5
st  $3, 16($sp)
st  $2, 12($sp)
st  $2, 8($sp)
ld   $2, 60($sp)
ld   $3, 64($sp)
add $2, $3, $2
st  $2, 56($sp)
ld   $2, 60($sp)
ld   $3, 64($sp)
sub $2, $3, $2
st  $2, 52($sp)
ld   $2, 60($sp)
ld   $3, 64($sp)
mul $2, $3, $2
st  $2, 48($sp)
ld   $2, 60($sp)
ld   $3, 64($sp)
div $3, $2
mflo $2
st  $2, 44($sp)
ld   $2, 60($sp)
ld   $3, 16($sp)
divu $3, $2
mflo $2
st  $2, 8($sp)
ld   $2, 60($sp)
ld   $3, 64($sp)
and $2, $3, $2
st  $2, 40($sp)
ld   $2, 60($sp)
ld   $3, 64($sp)
or   $2, $3, $2
st  $2, 36($sp)
ld   $2, 60($sp)
ld   $3, 64($sp)
xor $2, $3, $2
st  $2, 32($sp)
ld   $2, 64($sp)
shl $2, $2, 2
st  $2, 28($sp)
ld   $2, 16($sp)
shl $2, $2, 2
st  $2, 4($sp)
ld   $2, 64($sp)
sra $2, $2, 2
st  $2, 24($sp)
ld   $2, 16($sp)
shr $2, $2, 2
st  $2, 12($sp)
ld   $2, 56($sp)
addiu $sp, $sp, 72
ret $2
.set  macro
.set  reorder

.end main
$tmp2:
.size main, ($tmp2)-main
.cfi_endproc

```

This version can process the `+, -, *, /, &, |, ^, <<` and `>>` operators of the C language. The corresponding LLVM IR instructions are **add**, **sub**, **mul**, **sdiv**, **and**, **or**, **xor**, **shl** and **ashr**. The IR instruction **sdiv** stands for signed div, while **udiv** is for unsigned div. The ‘**ashr**’ instruction (arithmetic shift right) returns the first operand shifted to the right by a specified number of bits with sign extension. In brief, we call **ashr** “shift with sign extension fill”.

---

**Note: ashr**

**Example:** `<result> = ashr i32 4, 1 ; yields {i32}:result = 2`

`<result> = ashr i8 -2, 1 ; yields {i8}:result = -1`

`<result> = ashr i32 1, 32 ; undefined`

---

The result of the C operator `>>` on a negative operand is implementation dependent. Most compilers translate it into “shift with sign extension fill”; for example, Mips uses the **sra** instruction. The following is the explanation from the Microsoft web site,

**Note: >>, Microsoft Specific**

The result of a right shift of a signed negative quantity is implementation dependent. Although Microsoft C++ propagates the most-significant bit to fill vacated bit positions, there is no guarantee that other implementations will do likewise.

---

In addition to **ashr**, LLVM has the other instruction **lshr**, “shift with zero filled” (Mips implements lshr with the instruction **srl**), which has the following meaning.

**Note: lshr**

**Example:** `<result> = lshr i8 -2, 1 ; yields {i8}:result = 0x7F`

---

In LLVM, the IR node **sra** is defined for the ashr IR instruction and the node **srl** is defined for the lshr instruction (I don’t know why ashr and lshr are not used directly as the IR node names). This is summarized in Table 4.1.

Table 4.1: C operator `>>` implementation

| Description                       | Shift with zero filled | Shift with signed extension filled |
|-----------------------------------|------------------------|------------------------------------|
| symbol in .bc                     | lshr                   | ashr                               |
| symbol in IR node                 | srl                    | sra                                |
| Mips instruction                  | srl                    | sra                                |
| Cpu0 instruction                  | shr                    | sra                                |
| signed example before `x >> 1`    | 0xfffffffe i.e. -2     | 0xfffffffe i.e. -2                 |
| signed example after `x >> 1`     | 0x7fffffff i.e. 2G-1   | 0xffffffff i.e. -1                 |
| unsigned example before `x >> 1`  | 0xfffffffe i.e. 4G-2   | 0xfffffffe i.e. 4G-2               |
| unsigned example after `x >> 1`   | 0x7fffffff i.e. 2G-1   | 0xffffffff i.e. 4G-1               |

**lshr:** Logical SHift Right

**ashr:** Arithmetic SHift right

**srl:** Shift Right Logically

**sra:** Shift Right Arithmetically

**shr:** SHift Right

Suppose the compiler implements `x >> 1` with the definition x = x/2. As you can see from Table 4.1, **lshr** fails on some signed values (such as -2). In the same way, **ashr** fails on some unsigned values (such as 4G-2). So, in order to satisfy this definition for both signed and unsigned integer x, we need these two instructions, **lshr** and **ashr**.
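The difference between the two right shifts can be checked on the 32-bit pattern of -2 (which is also the pattern of 4G-2) with a small sketch. The functions `lshr` and `ashr` below are hypothetical models named after the IR instructions. `ashr` is built from logical shifts so that it does not depend on how a particular compiler treats `>>` on negative signed values.

```cpp
#include <cassert>
#include <cstdint>

// Logical shift right: vacated high bits are filled with zero.
constexpr uint32_t lshr(uint32_t x, unsigned n) { return x >> n; }

// Arithmetic shift right on a 32-bit pattern: vacated high bits are filled
// with copies of the sign bit. For a negative pattern, shifting the
// complement and complementing again fills with ones. Valid for n in 0..31.
constexpr uint32_t ashr(uint32_t x, unsigned n) {
  return (x & 0x80000000u) ? ~(~x >> n) : (x >> n);
}

// 0xfffffffe read as unsigned is 4G-2; half of it is 2G-1 = 0x7fffffff.
static_assert(lshr(0xfffffffeu, 1) == 0x7fffffffu, "unsigned case needs lshr");
// 0xfffffffe read as signed is -2; half of it is -1 = 0xffffffff.
static_assert(ashr(0xfffffffeu, 1) == 0xffffffffu, "signed case needs ashr");
// For a non-negative pattern both shifts agree (5 >> 2 == 1).
static_assert(lshr(5u, 2) == 1u && ashr(5u, 2) == 1u, "same for positive x");
```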

Table 4.2: C operator `<<` implementation

| Description                       | Shift with zero filled |
|-----------------------------------|------------------------|
| symbol in .bc                     | shl                    |
| symbol in IR node                 | shl                    |
| Mips instruction                  | sll                    |
| Cpu0 instruction                  | shl                    |
| signed example before `x << 1`    | 0x40000000 i.e. 1G     |
| signed example after `x << 1`     | 0x80000000 i.e. -2G    |
| unsigned example before `x << 1`  | 0x40000000 i.e. 1G     |
| unsigned example after `x << 1`   | 0x80000000 i.e. 2G     |

Again, consider the definition of `x << 1` to be x = x*2. From Table 4.2, we see that **shl** satisfies the unsigned x=1G but fails on the signed x=1G. That is fine, since 2G is out of the 32-bit signed integer range (-2G ~ 2G-1). In the overflow case, there is no way to keep the correct result in the register, so any value in the register is OK. You can check that **shl** satisfies x = x*2 for `x << 1` whenever the result is not out of range, no matter whether the operand x is a signed or unsigned integer.
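A small sketch makes the same point for the left shift: one instruction serves both signed and unsigned operands because the resulting bit pattern is identical. The function `shl` here is a hypothetical model of the instruction on 32-bit patterns, and the values 5 and -5 are taken from ch4\_1\_2.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Shift left on a 32-bit pattern: vacated low bits are filled with zero and
// bits shifted past bit 31 are dropped. Valid for n in 0..31.
constexpr uint32_t shl(uint32_t x, unsigned n) { return x << n; }

// In range: 5 << 2 is 20, and the pattern of -5 << 2 is the pattern of -20.
static_assert(shl(5u, 2) == 20u, "5 * 4");
static_assert(shl(static_cast<uint32_t>(-5), 2) == static_cast<uint32_t>(-20),
              "-5 * 4 as a 32-bit pattern");
// 1G << 1 gives 0x80000000: 2G when read as unsigned, but out of range
// (it reads as -2G) for a 32-bit signed integer.
static_assert(shl(0x40000000u, 1) == 0x80000000u, "1G << 1");
```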

For the Microsoft implementation, see reference <sup>2</sup>. For LLVM, see the sub-sections “‘ashr’ Instruction” and “‘lshr’ Instruction” of reference <sup>3</sup>.

The Chapter4\_1 version adds just 70 lines of code to the td files. With these 70 lines, it handles 9 more C operators and their corresponding llvm IR instructions. The arithmetic instructions are easy to implement: only definitions in the td files need to be added.

Running ch4\_1\_3.cpp with the Chapter4\_1/ code, which supports udiv and sra, gives the following result,

### LLVMBackendTutorialExampleCode/InputFiles/ch4\_1\_3.cpp

```

int main()
{
    int a = 1;
    int b = 2;
    int k = 0;
    unsigned int a1 = -5, f1 = 0;

    f1 = a1 / b;
    k = (a >> 2);

    return k;
}
```

```

118-165-13-40:InputFiles Jonathan$ clang -c ch4_1_3.cpp -emit-llvm -o ch4_1_3.bc
118-165-13-40:InputFiles Jonathan$ /Users/Jonathan/llvm/test/
cmake_debug_build/bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm
ch4_1_3.bc -o ch4_1_3.cpu0.s
118-165-13-40:InputFiles Jonathan$ cat ch4_1_3.cpu0.s
...

```

<sup>2</sup> <http://msdn.microsoft.com/en-us/library/336xbhcz%28v=vs.80%29.aspx>

<sup>3</sup> <http://llvm.org/docs/LangRef.html>

```

    udiv    $2, $3, $2
    st    $2, 0($sp)
    ld    $2, 16($sp)
    sra   $2, $2, 2
...

```

## 4.2 Operator “not” !

Files ch4\_2.cpp and ch4\_2.bc are the C source code for the boolean “not” operator and its corresponding llvm IR. They are listed as follows,

### LLVMBackendTutorialExampleCode/InputFiles/ch4\_2.cpp

```

int main()
{
    int a = 5;
    int b = 0;

    b = !a;

    return b;
}

; ModuleID = 'ch4_2.bc'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-
f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128-n8:16:32-S128"
target triple = "i386-apple-macosx10.8.0"

define i32 @main() nounwind ssp {
entry:
    %retval = alloca i32, align 4
    %a = alloca i32, align 4
    %b = alloca i32, align 4
    store i32 0, i32* %retval
    store i32 5, i32* %a, align 4
    store i32 0, i32* %b, align 4
    %0 = load i32* %a, align 4      ; a = %0
    %tobool = icmp ne i32 %0, 0    ; ne: stands for not equal
    %lnot = xor i1 %tobool, true
    %conv = zext i1 %lnot to i32
    store i32 %conv, i32* %b, align 4
    %1 = load i32* %b, align 4
    ret i32 %1
}

```

As shown above, `b = !a` is translated into `(xor (icmp ne i32 %0, 0), true)`. `%0` is the virtual register of variable **a**, and the result of `(icmp ne i32 %0, 0)` is 1 bit wide. To prove that the translation is correct, first assume `%0 != 0`; then `(icmp ne i32 %0, 0) = 1` (true) and `(xor 1, 1) = 0`. When `%0 = 0`, `(icmp ne i32 %0, 0) = 0` (false) and `(xor 0, 1) = 1`. So, the translation is correct.
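The same case analysis can be written as a small check. The functions below are hypothetical one-line models of the three IR instructions involved (`icmp ne`, `xor` on i1, and `zext`), composed in the order in which they appear in the listing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical models of the IR instructions from the listing.
constexpr bool icmpNe(int32_t x, int32_t y) { return x != y; }  // icmp ne
constexpr bool xorI1(bool a, bool b) { return a != b; }         // xor i1
constexpr int32_t zextI1ToI32(bool b) { return b ? 1 : 0; }     // zext i1 to i32

// b = !a, following the IR: zext(xor(icmp ne a, 0), true)
constexpr int32_t logicalNotViaIR(int32_t a) {
  return zextI1ToI32(xorI1(icmpNe(a, 0), true));
}

static_assert(logicalNotViaIR(5) == 0, "a != 0 gives !a == 0");
static_assert(logicalNotViaIR(0) == 1, "a == 0 gives !a == 1");
static_assert(logicalNotViaIR(-7) == static_cast<int32_t>(!(-7)),
              "matches the C operator for a negative value");
```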

Now, let’s run ch4\_2.bc with Chapter4\_1/ and the `llc -debug` option to get the following result,

```

118-165-16-22:InputFiles Jonathan$ /Users/Jonathan/llvm/test/
cmake_debug_build/bin/Debug/llc -march=cpu0 -debug -relocation-model=pic
-filetype=asm ch4_2.bc -o ch4_2.cpu0.s

...
==== main
Initial selection DAG: BB#0 'main:entry'
SelectionDAG has 20 nodes:
...
0x7ffb7982ab10: <multiple use>
  0x7ffb7982ab10: <multiple use>
  0x7ffb7982a210: <multiple use>
  0x7ffb7982ac10: ch = setne [ORD=5]

0x7ffb7982ad10: i1 = setcc 0x7ffb7982ab10, 0x7ffb7982a210, 0x7ffb7982ac10
[ORD=5]

0x7ffb7982ae10: i1 = Constant<-1> [ORD=6]

0x7ffb7982af10: i1 = xor 0x7ffb7982ad10, 0x7ffb7982ae10 [ORD=6]

0x7ffb7982b010: i32 = zero_extend 0x7ffb7982af10 [ORD=7]
...
Replacing.3 0x7ffb7982af10: i1 = xor 0x7ffb7982ad10, 0x7ffb7982ae10 [ORD=6]

With: 0x7ffb7982d210: i1 = setcc 0x7ffb7982ab10, 0x7ffb7982a210, 0x7ffb7982cf10

Optimized lowered selection DAG: BB#0 'main:'
SelectionDAG has 17 nodes:
...
0x7ffb7982ab10: <multiple use>
  0x7ffb7982ab10: <multiple use>
  0x7ffb7982a210: <multiple use>
  0x7ffb7982cf10: ch = seteq

0x7ffb7982d210: i1 = setcc 0x7ffb7982ab10, 0x7ffb7982a210, 0x7ffb7982cf10

0x7ffb7982b010: i32 = zero_extend 0x7ffb7982d210 [ORD=7]
...
Type-legalized selection DAG: BB#0 'main:entry'
SelectionDAG has 18 nodes:
...
0x7ffb7982ab10: <multiple use>
  0x7ffb7982ab10: <multiple use>
  0x7ffb7982a210: <multiple use>
  0x7ffb7982cf10: ch = seteq [ID=-3]

0x7ffb7982ac10: i32 = setcc 0x7ffb7982ab10, 0x7ffb7982a210, 0x7ffb7982cf10
[ID=-3]

0x7ffb7982ad10: i32 = Constant<1> [ID=-3]

0x7ffb7982ae10: i32 = and 0x7ffb7982ac10, 0x7ffb7982ad10 [ID=-3]
...
ISEL: Starting pattern match on root node: 0x7ffb7982ac10: i32 = setcc
0x7ffb7982ab10, 0x7ffb7982a210, 0x7ffb7982cf10 [ID=14]

Initial Opcode index to 0
Match failed at index 0
LLVM ERROR: Cannot select: 0x7ffb7982ac10: i32 = setcc 0x7ffb7982ab10,
0x7ffb7982a210, 0x7ffb7982cf10 [ID=14]
0x7ffb7982ab10: i32, ch = load 0x7ffb7982aa10, 0x7ffb7982a710,
0x7ffb7982a410<LD4[%a]> [ORD=4] [ID=13]
0x7ffb7982a710: i32 = FrameIndex<1> [ORD=2] [ID=5]
0x7ffb7982a410: i32 = undef [ORD=1] [ID=3]
0x7ffb7982a210: i32 = Constant<0> [ORD=1] [ID=1]
In function: main

```

The following table summarizes the IR in the .bc file and the DAG IR that correspond to the C operator `!`.

Table 4.3: C operator ! corresponding IR of .bc and IR of DAG

| IR of .bc                    | Optimized lowered selection DAG   | Type-legalized selection DAG      |
|------------------------------|-----------------------------------|-----------------------------------|
| %tobool = icmp ne i32 %0, 0  |                                   |                                   |
| %lnot = xor i1 %tobool, true | %lnot = (setcc %tobool, 0, seteq) | %lnot = (setcc %tobool, 0, seteq) |
| %conv = zext i1 %lnot to i32 | %conv = (zero_extend %lnot)       | %conv = (and %lnot, 1)            |

From the DAG translation result of `llc -debug` above, we see that the nodes of the “Initial selection DAG” stage correspond one to one to the IR in `ch4_2.bc`.

The `(setcc %0, 0, setne)` and `(xor %tobool, -1)` nodes in the “Initial selection DAG” stage correspond to `(icmp ne i32 %0, 0)` and `(xor i1 %tobool, true)` in `ch4_2.bc`. The xor argument is 1 bit wide, so 1 and -1 are the same value (both are represented by the single bit 1). The `(zero_extend %lnot)` node of the “Initial selection DAG” corresponds to `(zext i1 %lnot to i32)` in `ch4_2.bc`.

As shown above, the “Optimized lowered selection DAG” stage translates the 2 DAG nodes `(setcc %0, 0, setne)` and `(xor %tobool, -1)` into 1 DAG node `(setcc %tobool, 0, seteq)`. This translation is right because, for 1-bit values, `(xor %tobool, 1)` and `(not %tobool)` give the same result, and `(not (setcc %tobool, 0, setne))` is equal to `(setcc %tobool, 0, seteq)`.

In the “Type-legalized selection DAG” stage, `(zero_extend i1 %lnot to i32)` is translated into `(and %lnot, 1)`. Zero-extending `%lnot` just widens the 1-bit value to a 32-bit result, so translating it into `(and %lnot, 1)` is correct.
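As a rough check of the three stages discussed above, the sketch below models each DAG form as a small C++ function and compares them. The function names are hypothetical, and 1-bit values are modeled as `bool`; this is not LLVM code.

```cpp
#include <cstdint>

// Illustrative models of the setcc condition codes.
constexpr bool setcc_ne(uint32_t x, uint32_t y) { return x != y; }
constexpr bool setcc_eq(uint32_t x, uint32_t y) { return x == y; }

// Initial selection DAG: zero_extend(xor(setcc(x, 0, setne), -1)).
constexpr uint32_t initial_dag(uint32_t x) {
  return static_cast<uint32_t>(setcc_ne(x, 0u) ^ true);
}
// Optimized lowered selection DAG: zero_extend(setcc(x, 0, seteq)).
constexpr uint32_t optimized_dag(uint32_t x) {
  return static_cast<uint32_t>(setcc_eq(x, 0u));
}
// Type-legalized selection DAG: i32 setcc result masked by (and ..., 1).
constexpr uint32_t legalized_dag(uint32_t x) {
  return static_cast<uint32_t>(setcc_eq(x, 0u)) & 1u;
}

static_assert(initial_dag(0) == 1 && optimized_dag(0) == 1 &&
              legalized_dag(0) == 1, "all stages agree for 0");
static_assert(initial_dag(5) == 0 && optimized_dag(5) == 0 &&
              legalized_dag(5) == 0, "all stages agree for 5");
```

All three forms agree on the sampled inputs, which is what the argument above predicts.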

Finally, according to the DAG translation messages, instruction selection fails at `(setcc %tobool, 0, seteq)`. Run it with `Chapter4_2/`, which adds the code below to handle the pattern `(setcc %tobool, 0, seteq)`, to get the following result.

#### LLVMBackendTutorialExampleCode/Chapter4\_2/Cpu0InstrInfo.td

```
def : Pat<(not CPURegs:$in),
          (XOR CPURegs:$in, (ADDiu ZERO, 1))>;

// setcc patterns
multiclass SeteqPats<RegisterClass RC, Instruction XOROp> {
  def : Pat<(seteq RC:$lhs, RC:$rhs),
            (XOROp (XOROp RC:$lhs, RC:$rhs), (ADDiu ZERO, 1))>;
}

defm : SeteqPats<CPURegs, XOR>;
```

```
118-165-78-230:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=pic -debug -filetype=asm ch4_2.bc
-o ch4_2.cpu0.s
...
ISEL: Starting pattern match on root node: 0x7fbc6902ac10: i32 = setcc
0x7fbc6902ab10, 0x7fbc6902a210, 0x7fbc6902cf10 [ID=14]

Initial Opcode index to 365
Created node: 0x7fbc6902af10: i32 = XOR 0x7fbc6902ab10, 0x7fbc6902a210

Created node: 0x7fbc6902d510: i32 = ADDiu 0x7fbc6902d310, 0x7fbc6902d410

Morphed node: 0x7fbc6902ac10: i32 = XOR 0x7fbc6902af10, 0x7fbc6902d510

ISEL: Match complete!
=> 0x7fbc6902ac10: i32 = XOR 0x7fbc6902af10, 0x7fbc6902d510
```

The following table summarizes the type-legalized DAG IR and the Cpu0 instructions that correspond to the C operator `!`.

Table 4.4: C operator ! corresponding IR of Type-legalized selection DAG (this stage and later) and Cpu0 instructions

| Type-legalized selection DAG and later stages | Cpu0 instruction |
|-----------------------------------------------|------------------|
| `%lnot = (setcc %tobool, 0, seteq)` | `%1 = (xor %tobool, 0)`<br>`%true = (addiu $r0, 1)`<br>`%lnot = (xor %1, %true)` |
| `%conv = (and %lnot, 1)` | `%conv = (and %lnot, 1)` |

Chapter4\_2/ defines the seteq DAG pattern. In the “Instruction selection” stage it translates `%lnot = (setcc %tobool, 0, seteq)` into `%1 = (xor %tobool, 0)`, `%true = (addiu $r0, 1)` and `%lnot = (xor %1, %true)` according to the rules defined in Cpu0InstrInfo.td above. The translation can be checked as follows:

1. The expected result is `%lnot = 1` when `%tobool = 0` and `%lnot = 0` when `%tobool != 0`.
2. `%true = (addiu $r0, 1)` is always 1 because `$r0` is zero. For `%tobool` equal to 0 or 1: when `%tobool = 0`, `%1 = (xor 0, 0) = 0` and `%lnot = (xor 0, %true) = 1`; when `%tobool = 1`, `%1 = (xor 1, 0) = 1` and `%lnot = (xor 1, %true) = 0`.
3. For any other nonzero `%tobool`, `%1 = %tobool` is neither 0 nor 1, so `%lnot = (xor %1, %true)` is nonzero; the `and` with 1 that follows keeps only its lowest bit.
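The three checks above can also be run mechanically. The following is a minimal sketch that models the selected Cpu0 sequence with plain integer operations; the function names are hypothetical and the model ignores everything except the arithmetic.

```cpp
#include <cstdint>

// Cpu0 sequence selected for (setcc v, 0, seteq), as in Table 4.4:
//   %1    = xor v, 0
//   %true = addiu $r0, 1      ($r0 is always zero)
//   %lnot = xor %1, %true
constexpr uint32_t cpu0_seteq_zero(uint32_t v) {
  return (v ^ 0u) ^ (0u + 1u);
}

// %conv = and %lnot, 1
constexpr uint32_t cpu0_lnot(uint32_t v) { return cpu0_seteq_zero(v) & 1u; }

static_assert(cpu0_lnot(0) == 1, "!0 -> 1");
static_assert(cpu0_lnot(1) == 0, "!1 -> 0");
static_assert(cpu0_lnot(5) == 0, "!5 -> 0");
```

In this model an even nonzero input such as 2 gives `cpu0_lnot(2) == 1` rather than 0, which suggests the sequence is only exact when the compared value is 0 or 1 (or otherwise has its low bit set).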

After the xor, the IR `(and %lnot, 1)` is translated into the Cpu0 instruction `(and $lnot, 1)`, which was defined before. The asm file ch4\_2.cpu0.s is listed below; you can check it against the final result.

```
118-165-16-22:InputFiles Jonathan$ cat ch4_2.cpu0.s
...
# BB#0:
addiu $sp, $sp, -16
$tmp1:
.cfi_def_cfa_offset 16
addiu $2, $zero, 0
st $2, 12($sp)
addiu $3, $zero, 5
st $3, 8($sp)
st $2, 4($sp)
ld $3, 8($sp)
xor $2, $3, $2
addiu $3, $zero, 1
xor $2, $2, $3
and $2, $2, $3
st $2, 4($sp)
addiu $sp, $sp, 16
ret $lr
...
```

## 4.3 Display llvm IR nodes with Graphviz

The previous section displayed the DAG translation process as text on the terminal with the `llc -debug` option. `llc` also supports a graphic display. The [section Install other tools on iMac](#) mentioned the web site for `llc` graphic display information. This section introduces the `llc` graphic display with the tool Graphviz. The graphic display is easier to read than text in a terminal. It's not necessary, but it helps a lot, especially when you are tired of tracking the DAG translation process. The `llc` graphic support options, from the sub-section “SelectionDAG Instruction Selection Process” of the web page <sup>4</sup>, are listed as follows,

---

**Note:** The `llc` Graphviz DAG display options

- `-view-dag-combine1-dags` displays the DAG after being built, before the first optimization pass.
- `-view-legalize-dags` displays the DAG before Legalization.
- `-view-dag-combine2-dags` displays the DAG before the second optimization pass.
- `-view-isel-dags` displays the DAG before the Select phase.
- `-view-sched-dags` displays the DAG before Scheduling.

---

By tracking `llc -debug`, you can see the DAG translation steps as follows,

```
Initial selection DAG
Optimized lowered selection DAG
Type-legalized selection DAG
Optimized type-legalized selection DAG
Legalized selection DAG
Optimized legalized selection DAG
Instruction selection
Selected selection DAG
Scheduling
...
```

Let's run `llc` with option `-view-dag-combine1-dags`, and open the output result with Graphviz as follows,

```
118-165-12-177:InputFiles Jonathan$ /Users/Jonathan/llvm/test/
cmake_debug_build/bin/Debug/llc -view-dag-combine1-dags -march=cpu0
-relocation-model=pic -filetype=asm ch4_2.bc -o ch4_2.cpu0.s
Writing '/tmp/llvm_84ibpm/dag.main.dot'... done.
118-165-12-177:InputFiles Jonathan$ Graphviz /tmp/llvm_84ibpm/dag.main.dot
```

It will show the `/tmp/llvm_84ibpm/dag.main.dot` as [Figure 4.1](#).

From [Figure 4.1](#), we can see that the `-view-dag-combine1-dags` option shows the Initial selection DAG. The other view options and their corresponding DAG translation stages are listed as follows,

---

**Note:** `llc` Graphviz options and corresponding DAG translation stage

- `-view-dag-combine1-dags`: Initial selection DAG
- `-view-legalize-dags`: Optimized type-legalized selection DAG
- `-view-dag-combine2-dags`: Legalized selection DAG
- `-view-isel-dags`: Optimized legalized selection DAG
- `-view-sched-dags`: Selected selection DAG

---

The `-view-isel-dags` option is important and often used by llvm backend writers because it shows the DAG just before instruction selection. The backend programmer needs to know what the DAG looks like in order to write the pattern match instructions in the target description file `.td`.

---

<sup>4</sup> <http://llvm.org/docs/CodeGenerator.html>



Figure 4.1: llc option -view-dag-combine1-dags graphic view

## 4.4 Local variable pointer

To support pointers to local variables, add the following code fragments to Cpu0InstrInfo.td and Cpu0InstPrinter.cpp,

### LLVMBackendTutorialExampleCode/Chapter4\_4/Cpu0InstrInfo.td

```
def mem_ea : Operand<i32> {
  let PrintMethod = "printMemOperandEA";
  let MIOperandInfo = (ops CPURegs, simm16);
  let EncoderMethod = "getMemEncoding";
}
...
class EffectiveAddress<string instr_asm, RegisterClass RC, Operand Mem> :
  FMem<0x09, (outs RC:$ra), (ins Mem:$addr),
       instr_asm, [(set RC:$ra, addr:$addr)], IIAlu>;
...
// FrameIndexes are legalized when they are operands from load/store
// instructions. The same does not happen for stack address copies, so an
// add op with mem ComplexPattern is used and the stack address copy
// can be matched. It's similar to Sparc LEA_ADDRi
def LEA_ADDiu : EffectiveAddress<"addiu\t$ra, $addr", CPURegs, mem_ea> {
  let isCodeGenOnly = 1;
}
```

### LLVMBackendTutorialExampleCode/Chapter4\_4/Cpu0InstPrinter.cpp

```
void Cpu0InstPrinter::
printMemOperandEA(const MCInst *MI, int opNum, raw_ostream &O) {
  // when using stack locations for not load/store instructions
  // print the same way as all normal 3 operand instructions.
  printOperand(MI, opNum, O);
  O << ", ";
  printOperand(MI, opNum+1, O);
  return;
}
```

Running ch4\_4.cpp with the Chapter4\_4/ code, which supports pointers to local variables, gives the following result,

### LLVMBackendTutorialExampleCode/InputFiles/ch4\_4.cpp

```
int main()
{
    int b = 3;

    int* p = &b;

    return *p;
}
```

```
118-165-66-82:InputFiles Jonathan$ clang -c ch4_4.cpp -emit-llvm -o ch4_4.bc
118-165-66-82:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_
debug_build/bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm
ch4_4.bc -o ch4_4.cpu0.s
```

```
118-165-66-82:InputFiles Jonathan$ cat ch4_4.cpu0.s
.section .mdebug.abi32
.previous
.file "ch4_5.bc"
.text
.globl main
.align 2
.type main,@function
.ent main           # @main
main:
.cfi_startproc
.frame $sp,16,$lr
.mask 0x00000000,0
.set noreorder
.set nomacro
# BB#0:
    addiu $sp, $sp, -16
$tmp1:
.cfi_def_cfa_offset 16
    addiu $2, $zero, 0
    st $2, 12($sp)
    addiu $2, $zero, 3
    st $2, 8($sp)
    addiu $2, $sp, 8
    st $2, 0($sp)
    addiu $sp, $sp, 16
    ret $lr
.set macro
.set reorder
.end main
$tmp2:
.size main, ($tmp2)-main
.cfi_endproc
```

## 4.5 Operator mod, %

### 4.5.1 The DAG of %

The example input code ch4\_5.cpp, which contains the C operator “%”, and its corresponding llvm IR are as follows,

#### LLVMBackendTutorialExampleCode/InputFiles/ch4\_5.cpp

```
int main()
{
    int b = 11;
//  unsigned int b = 11;

    b = (b+1)%12;

    return b;
}
```

```

; ModuleID = 'ch4_5.bc'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-
f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128-n8:16:32-S128"
target triple = "i386-apple-macosx10.8.0"

define i32 @main() nounwind ssp {
entry:
%retval = alloca i32, align 4
%b = alloca i32, align 4
store i32 0, i32* %retval
store i32 11, i32* %b, align 4
%0 = load i32* %b, align 4
%add = add nsw i32 %0, 1
%rem = srem i32 %add, 12
store i32 %rem, i32* %b, align 4
%1 = load i32* %b, align 4
ret i32 %1
}

```

LLVM **srem** is the IR corresponding to “%”; see the sub-section “srem instruction” of <sup>3</sup>. The reference is copied as follows,

---

**Note:** ‘srem’ Instruction

Syntax: <result> = srem <ty> <op1>, <op2> ; yields {ty}:result

Overview: The ‘srem’ instruction returns the remainder from the signed division of its two operands. This instruction can also take vector versions of the values in which case the elements must be integers.

Arguments: The two arguments to the ‘srem’ instruction must be integer or vector of integer values. Both arguments must have identical types.

Semantics: This instruction returns the remainder of a division (where the result is either zero or has the same sign as the dividend, op1), not the modulo operator (where the result is either zero or has the same sign as the divisor, op2) of a value. For more information about the difference, see The Math Forum. For a table of how this is implemented in various languages, please see Wikipedia: modulo operation.

Note that signed integer remainder and unsigned integer remainder are distinct operations; for unsigned integer remainder, use ‘urem’.

Taking the remainder of a division by zero leads to undefined behavior. Overflow also leads to undefined behavior; this is a rare case, but can occur, for example, by taking the remainder of a 32-bit division of -2147483648 by -1. (The remainder doesn’t actually overflow, but this rule lets srem be implemented using instructions that return both the result of the division and the remainder.)

Example: <result> = srem i32 4, %var ; yields {i32}:result = 4 % %var

---

Running Chapter4\_5/ with input file ch4\_5.bc and the llc option `-view-isel-dags`, as below, gives the following error message and the llvm DAG of Figure 4.2 below.

```

118-165-79-37:InputFiles Jonathan$ /Users/Jonathan/llvm/test/
cmake_debug_build/bin/Debug/llc -march=cpu0 -view-isel-dags -relocation-model=
pic -filetype=asm ch4_5.bc -o ch4_5.cpu0.s
...
LLVM ERROR: Cannot select: 0x7fa73a02ea10: i32 = mulhs 0x7fa73a02c610,
0x7fa73a02e910 [ID=12]
0x7fa73a02c610: i32 = Constant<12> [ORD=5] [ID=7]
0x7fa73a02e910: i32 = Constant<715827883> [ID=9]

```



Figure 4.2: `ch4_5.bc` DAG

LLVM replaces the srem divide operation with a multiply operation during DAG optimization because DIV costs more time than MUL. For the example code “**int b = 11; b=(b+1)%12;**”, the result is [Figure 4.2](#). We verify the result and explain it by calculating the value in each node. First, `0xC * 0x2AAAAAAB = 0x2,00000004`, where `(mulhs 0xC, 0x2AAAAAAB)` means taking the high word (32 bits) of the signed multiplication. Multiplying 2 operands of 1 word each generates a 2-word result, here `(0x2, 0x00000004)`, so the high word is `0x2`. The remaining nodes derive the quotient 1 from this high word and multiply it by 12, so the final result `(sub 12, 12)` is 0, which matches the statement `(11+1)%12`.
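To make the arithmetic above easier to follow, here is a small C++ sketch of a signed remainder by 12 computed through the high word of a multiplication. The constant `0x2AAAAAAB` (715827883) comes from the `llc` output above; the shift by 1 and the sign-bit correction follow the usual magic-number division scheme and are assumptions of this sketch, as are the function names.

```cpp
#include <cstdint>

// High 32 bits of the signed 64-bit product (models the mulhs DAG node).
// Right-shifting a negative value is implementation-defined before C++20;
// GCC and Clang perform an arithmetic shift.
constexpr int32_t mulhs(int32_t a, int32_t b) {
  return static_cast<int32_t>((static_cast<int64_t>(a) * b) >> 32);
}

// Quotient n / 12: high word shifted right by 1, plus the high word's sign bit.
constexpr int32_t quot12(int32_t n) {
  return (mulhs(n, 0x2AAAAAAB) >> 1) +
         static_cast<int32_t>(static_cast<uint32_t>(mulhs(n, 0x2AAAAAAB)) >> 31);
}

// Remainder n % 12 = n - (n / 12) * 12.
constexpr int32_t srem12(int32_t n) { return n - quot12(n) * 12; }

static_assert(mulhs(12, 0x2AAAAAAB) == 2, "high word of 0xC * 0x2AAAAAAB");
static_assert(srem12(12) == 0, "(11+1) % 12");
static_assert(srem12(-13) == -13 % 12, "sign follows the dividend");
```

For the tutorial's input, `srem12(11 + 1)` evaluates to 0, matching the `(sub 12, 12)` node.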

### 4.5.2 Arm solution

Let's run `Chapter4_5_1/` with `ch4_5.cpp` and the `llc -view-sched-dags` option to get [Figure 4.3](#). Similarly, SMMUL gets the high word of the multiplication result.

The following is the result of running `Chapter4_5_1/` with `ch4_5.bc`.

```
118-165-66-82:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_
debug_build/bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm
ch4_5.bc -o ch4_5.cpu0.s
118-165-71-252:InputFiles Jonathan$ cat ch4_5.cpu0.s
    .section .mdebug.abi32
    .previous
    .file "ch4_5.bc"
    .text
    .globl main
    .align 2
    .type main,@function
    .ent main          # @main
main:
    .cfi_startproc
    .frame $sp,8,$lr
    .mask 0x00000000,0
    .set noreorder
    .set nomacro
# BB#0:
    addiu $sp, $sp, -8
$tmp1:
    .cfi_def_cfa_offset 8
    addiu $2, $zero, 0
    st $2, 4($sp)
    addiu $2, $zero, 11
    st $2, 0($sp)
    addiu $2, $zero, 10922
    shl $2, $2, 16
    ori $3, $2, 43691
    addiu $2, $zero, 12
    smmul $3, $2, $3
    shr $4, $3, 31
    sra $3, $3, 1
    addu $3, $3, $4
    mul $3, $3, $2
    sub $2, $2, $3
    st $2, 0($sp)
    addiu $sp, $sp, 8
    ret $lr
    .set macro
    .set reorder
    .end main
$tmp2:
    .size main, ($tmp2)-main
    .cfi_endproc
```

Figure 4.3: Translate ch4\_5.bc into cpu0 backend DAG

The other instruction, UMMUL, and the llvm IR mulhu handle the unsigned int type for operator `%`. You can check this by uncommenting “**unsigned int b = 11;**” in ch4\_5.cpp.
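For the unsigned case, the same idea can be sketched with an unsigned high-word multiply. Note the assumptions: the magic constant `0xAAAAAAAB` and the shift by 3 are a standard choice for dividing a 32-bit unsigned value by 12 and are not taken from the tutorial's output, and the function names are hypothetical.

```cpp
#include <cstdint>

// High 32 bits of the unsigned 64-bit product (models the mulhu DAG node).
constexpr uint32_t mulhu(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>((static_cast<uint64_t>(a) * b) >> 32);
}

// n % 12 for unsigned n: quotient is the high word shifted right by 3
// (assumed magic constant, see the lead-in).
constexpr uint32_t urem12(uint32_t n) {
  return n - (mulhu(n, 0xAAAAAAABu) >> 3) * 12u;
}

static_assert(urem12(12u) == 0u, "(11+1) % 12");
static_assert(urem12(0xFFFFFFFFu) == 0xFFFFFFFFu % 12u, "largest input");
```

The unsigned variant needs no sign-bit correction, which is one reason a separate mulhu node (and UMMUL instruction) exists alongside mulhs.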

ARM uses the SMMUL instruction to get the high word of a multiplication result, and Chapter4\_5\_1/ adopts this ARM solution. With this solution, the following code is needed.

#### LLVMBackendTutorialExampleCode/Chapter4\_5\_1/Cpu0InstrInfo.td

```
// Transformation Function - get the lower 16 bits.
def LO16 : SDNodeXForm<imm, [{
  return getImm(N, N->getZExtValue() & 0xFFFF);
}]>;

// Transformation Function - get the higher 16 bits.
def HI16 : SDNodeXForm<imm, [{
  return getImm(N, (N->getZExtValue() >> 16) & 0xFFFF);
}]>;
...
def SMMUL    : ArithLogicR<0x50, "smmul", mulhs, IIImul, CPURegs, 1>;
def UMMUL    : ArithLogicR<0x51, "ummul", mulhu, IIImul, CPURegs, 1>;
...
// Arbitrary immediates
def : Pat<(i32 imm:$imm),
          (OR (SHL (ADDiu ZERO, (HI16 imm:$imm)), 16), (ADDiu ZERO, (LO16 imm:$imm)))>;
```
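The HI16/LO16 transformations above split a 32-bit immediate into two 16-bit halves that are later recombined with a shift and an or. A minimal C++ sketch of that split follows; the helper names are illustrative, and the model treats both halves as unsigned (it ignores any sign extension of 16-bit immediates).

```cpp
#include <cstdint>

// Models HI16: the upper 16 bits of the immediate.
constexpr uint32_t hi16(uint32_t imm) { return (imm >> 16) & 0xFFFFu; }
// Models LO16: the lower 16 bits of the immediate.
constexpr uint32_t lo16(uint32_t imm) { return imm & 0xFFFFu; }
// Models (OR (SHL hi, 16), lo): rebuild the full 32-bit value.
constexpr uint32_t rebuild(uint32_t imm) {
  return (hi16(imm) << 16) | lo16(imm);
}

// 715827883 is the constant that appears in the ch4_5 assembly output.
static_assert(hi16(715827883u) == 10922u, "matches addiu $2, $zero, 10922");
static_assert(lo16(715827883u) == 43691u, "matches ori $3, $2, 43691");
static_assert(rebuild(715827883u) == 715827883u, "round trip");
```

The two halves, 10922 and 43691, are exactly the immediates visible in the generated assembly for ch4\_5.bc.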

### 4.5.3 Mips solution

Mips uses the MULT instruction and saves the high and low parts of the result in registers HI and LO. After that, mfhi/mflo move register HI/LO to a general purpose register. ARM SMMUL is fast if you only need the HI part of the result (it ignores the LO part of the operation). ARM also provides SMULL (signed multiply long) to get the whole 64-bit result. If you need the LO part of the result, you can use the Cpu0 MUL instruction, which only produces the LO part.

Chapter4\_5\_2/ is implemented in the Mips MULT style. We choose it as the implementation for this book in order to add as few instructions as possible. This approach keeps Cpu0 suitable as a tutorial architecture for teaching in schools, and as learning material for engineers in compilers, system programs and Verilog CPU hardware design. For the Mips style implementation, we add the following code to Cpu0RegisterInfo.td, Cpu0InstrInfo.td and Cpu0ISelDAGToDAG.cpp. We also list the related DAG nodes mulhs and mulhu, which are used in Chapter4\_5\_2/, from TargetSelectionDAG.td.

#### LLVMBackendTutorialExampleCode/Chapter4\_5\_2/Cpu0RegisterInfo.td

```
// Hi/Lo registers
def HI  : Register<"HI">, DwarfRegNum<[18]>;
def LO  : Register<"LO">, DwarfRegNum<[19]>;
...
// Hi/Lo Registers
def HILO : RegisterClass<"Cpu0", [i32], 32, (add HI, LO)>;
...
// Cpu0Schedule.td
...
def IIHiLo      : InstrItinClass;
...
def Cpu0GenericItineraries : ProcessorItineraries<[ALU, IMULDIV], [], [
...
InstrItinData<IIHiLo      , [InstrStage<1, [IMULDIV]>]>,
...
]>;
```

### LLVMBackendTutorialExampleCode/Chapter4\_5\_2/Cpu0InstrInfo.td

```
// Mul, Div
class Mult<bits<8> op, string instr_asm, InstrItinClass itin,
    RegisterClass RC, list<Register> DefRegs>:
FL<op, (outs), (ins RC:$ra, RC:$rb),
!strconcat(instr_asm, "\t$ra, $rb"), [], itin> {
let imm16 = 0;
let isCommutable = 1;
let Defs = DefRegs;
let neverHasSideEffects = 1;
}

class Mult32<bits<8> op, string instr_asm, InstrItinClass itin>:
Mult<op, instr_asm, itin, CPURegs, [HI, LO]>;

// Move from Hi/Lo
class MoveFromLOHI<bits<8> op, string instr_asm, RegisterClass RC,
    list<Register> UseRegs>:
FL<op, (outs RC:$ra), (ins),
!strconcat(instr_asm, "\t$ra"), [], IIHiLo> {
let rb = 0;
let imm16 = 0;
let Uses = UseRegs;
let neverHasSideEffects = 1;
}
...
def MULT      : Mult32<0x50, "mult", IIImul>;
def MULTu     : Mult32<0x51, "multu", IIImul>;

def MFHI : MoveFromLOHI<0x40, "mfhi", CPURegs, [HI]>;
def MFLO : MoveFromLOHI<0x41, "mflo", CPURegs, [LO]>;
```

### LLVMBackendTutorialExampleCode/Chapter4\_5\_2/Cpu0ISelDAGToDAG.cpp

```
/// Select multiply instructions.
std::pair<SDNode*, SDNode*>
Cpu0DAGToDAGISel::SelectMULT(SDNode *N, unsigned Opc, DebugLoc dl, EVT Ty,
    bool HasLo, bool HasHi) {
SDNode *Lo = 0, *Hi = 0;
SDNode *Mul = CurDAG->getMachineNode(Opc, dl, MVT::Glue, N->getOperand(0),
    N->getOperand(1));
SDValue InFlag = SDValue(Mul, 0);

if (HasLo) {
    Lo = CurDAG->getMachineNode(Cpu0::MFLO, dl,
        Ty, MVT::Glue, InFlag);
    InFlag = SDValue(Lo, 1);
}

if (HasHi)
    Hi = CurDAG->getMachineNode(Cpu0::MFHI, dl,
                                   Ty, InFlag);

return std::make_pair(Lo, Hi);
}

/// Select instructions not customized! Used for
/// expanded, promoted and normal instructions
SDNode* Cpu0DAGToDAGISel::Select(SDNode *Node) {
unsigned Opcode = Node->getOpcode();
DebugLoc dl = Node->getDebugLoc();
...
EVT NodeTy = Node->getValueType(0);
unsigned MultOpc;
switch(Opcode) {
default: break;

case ISD::MULHS:
case ISD::MULHU: {
    MultOpc = (Opcode == ISD::MULHU ? Cpu0::MULTu : Cpu0::MULT);
    return SelectMULT(Node, MultOpc, dl, NodeTy, false, true).second;
}
...
}

```

#### include/llvm/Target/TargetSelectionDAG.td

```

def mulhs      : SDNode<"ISD::MULHS"      , SDTIntBinOp, [SDNPCommutative]>;
def mulhu     : SDNode<"ISD::MULHU"     , SDTIntBinOp, [SDNPCommutative]>;

```

Except for operations of custom type, llvm IR operations of expand and promote type call Cpu0DAGToDAGISel::Select() during the instruction selection stage of DAG translation. For the IR operations mulhs and mulhu, Select() emits a multiply that leaves the HI part of the result in the HI register. After that, the MFHI instruction moves the HI register to the cpu0 field “a” register, \$ra. MFHI is in FL format and only uses the cpu0 field “a” register, so we set \$rb and imm16 to 0. [Figure 4.4](#) and ch4\_5.cpu0.s are the result of compiling ch4\_5.bc.

```
118-165-66-82:InputFiles Jonathan$ cat ch4_5.cpu0.s
    .section .mdebug.abi32
    .previous
    .file "ch4_5.bc"
    .text
    .globl main
    .align 2
    .type main,@function
    .ent main           # @main
main:
    .cfi_startproc
    .frame $sp,8,$lr
    .mask 0x00000000,0
    .set noreorder
    .set nomacro
# BB#0:
    addiu $sp, $sp, -8
$tmp1:
    .cfi_def_cfa_offset 8
    addiu $2, $zero, 0
    st $2, 4($sp)
    addiu $2, $zero, 11
    st $2, 0($sp)
    addiu $2, $zero, 10922
    shl $2, $2, 16
    ori $3, $2, 43691
    addiu $2, $zero, 12
    mult $2, $3
    mfhi $3
    shr $4, $3, 31
    sra $3, $3, 1
    addu $3, $3, $4
    mul $3, $3, $2
    sub $2, $2, $3
    st $2, 0($sp)
    addiu $sp, $sp, 8
    ret $lr
    .set macro
    .set reorder
    .end main
$tmp2:
    .size main, ($tmp2)-main
    .cfi_endproc
```

Figure 4.4: DAG for ch4\_5.bc with Mips style MULT
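The `mult`/`mfhi` pair in the listing above can be modeled in a few lines of C++. This is a sketch only: the `HiLo` struct and the function names are invented for illustration, and the model covers just the 64-bit product split, not the real instruction encoding.

```cpp
#include <cstdint>

// Hypothetical model of the HI and LO registers written by MULT.
struct HiLo {
  uint32_t hi;  // upper 32 bits of the 64-bit product
  uint32_t lo;  // lower 32 bits of the 64-bit product
};

// Models "mult $ra, $rb": signed 32x32 -> 64-bit multiply into HI/LO.
constexpr HiLo mult(int32_t a, int32_t b) {
  return HiLo{
      static_cast<uint32_t>(
          static_cast<uint64_t>(static_cast<int64_t>(a) * b) >> 32),
      static_cast<uint32_t>(
          static_cast<uint64_t>(static_cast<int64_t>(a) * b))};
}

// Models "mfhi $ra" and "mflo $ra".
constexpr uint32_t mfhi(HiLo r) { return r.hi; }
constexpr uint32_t mflo(HiLo r) { return r.lo; }

// 12 * 715827883 = 0x2,00000004, as computed in section 4.5.1.
static_assert(mfhi(mult(12, 715827883)) == 2u, "HI holds 0x2");
static_assert(mflo(mult(12, 715827883)) == 4u, "LO holds 0x00000004");
```

Compared with the ARM style SMMUL, the Mips style always produces both halves and lets a later `mfhi` or `mflo` pick the one that is needed.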

## 4.6 Full support %

Attentive readers may notice that llvm uses “**multiplication**” instead of “**div**” to get the “%” result only because our example uses a constant as the divisor, “**(b+1)%12**”. What happens if the programmer uses a variable as the divisor, as in “**(b+1)%a**”? The answer is that our code fails to handle it. In [section Support arithmetic instructions](#), we use “**div a, b**” to hold the quotient in a register. The multiplication operator “**\***” needs 64 bits of register to hold the result of multiplying two 32-bit operands. In the last section we modified cpu0 to use the register pair LO and HI, just like Mips, to solve this issue. Now it’s time to modify cpu0 for the integer “**divide**” operator as well. We use the LO and HI registers to hold the “**quotient**” and “**remainder**”, and use the instructions “**mflo**” and “**mfhi**” to get the result from the LO or HI register. With this solution, “**c = a / b**” can be computed by “**div a, b**” and “**mflo c**”, and “**c = a % b**” by “**div a, b**” and “**mfhi c**”.
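The division scheme just described can be sketched in C++ as follows. The `DivResult` struct and the function names are hypothetical; the model assumes a nonzero divisor and does not cover the overflow case of dividing the most negative value by -1.

```cpp
#include <cstdint>

// Hypothetical model of the HI and LO registers written by "div a, b".
struct DivResult {
  int32_t hi;  // remainder
  int32_t lo;  // quotient
};

// Models "div a, b": LO receives the quotient, HI the remainder.
// Assumes b != 0 and no overflow (illustration only).
constexpr DivResult cpu0_div(int32_t a, int32_t b) {
  return DivResult{a % b, a / b};
}

// Models "mflo c" (c = a / b) and "mfhi c" (c = a % b).
constexpr int32_t mflo(DivResult r) { return r.lo; }
constexpr int32_t mfhi(DivResult r) { return r.hi; }

static_assert(mflo(cpu0_div(12, 5)) == 2, "12 / 5");
static_assert(mfhi(cpu0_div(12, 5)) == 2, "12 % 5");
static_assert(mfhi(cpu0_div(11 + 1, 12)) == 0, "(b+1) % a with b=11, a=12");
```

One `div` therefore serves both operators; only the move instruction that follows it differs.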

Chapter4\_6/ supports the operators “%” and “/”. The code added in Chapter4\_6/ is as follows,

### LLVMBackendTutorialExampleCode/Chapter4\_6/Cpu0InstrInfo.cpp

```
void Cpu0InstrInfo::
copyPhysReg(MachineBasicBlock &MBB,
            MachineBasicBlock::iterator I, DebugLoc DL,
            unsigned DestReg, unsigned SrcReg,
            bool KillSrc) const {
  unsigned Opc = 0, ZeroReg = 0;

  if (Cpu0::CPURegsRegClass.contains(DestReg)) { // Copy to CPU Reg.
    if (Cpu0::CPURegsRegClass.contains(SrcReg))
      Opc = Cpu0::ADD, ZeroReg = Cpu0::ZERO;
    else if (SrcReg == Cpu0::HI)
      Opc = Cpu0::MFHI, SrcReg = 0;
    else if (SrcReg == Cpu0::LO)
      Opc = Cpu0::MFLO, SrcReg = 0;
  }
  else if (Cpu0::CPURegsRegClass.contains(SrcReg)) { // Copy from CPU Reg.
    if (DestReg == Cpu0::HI)
      Opc = Cpu0::MTHI, DestReg = 0;
    else if (DestReg == Cpu0::LO)
      Opc = Cpu0::MTLO, DestReg = 0;
  }

  assert(Opc && "Cannot copy registers");

  MachineInstrBuilder MIB = BuildMI(MBB, I, DL, get(Opc));

  if (DestReg)
    MIB.addReg(DestReg, RegState::Define);

  if (ZeroReg)
    MIB.addReg(ZeroReg);

  if (SrcReg)
    MIB.addReg(SrcReg, getKillRegState(KillSrc));
}
```

### LLVMBackendTutorialExampleCode/Chapter4\_6/Cpu0InstrInfo.h

```
  virtual void copyPhysReg(MachineBasicBlock &MBB,
                           MachineBasicBlock::iterator MI, DebugLoc DL,
                           unsigned DestReg, unsigned SrcReg,
                           bool KillSrc) const;
```

### LLVMBackendTutorialExampleCode/Chapter4\_6/Cpu0InstrInfo.td

```
def SDT_Cpu0DivRem      : SDTypeProfile<0, 2,
                                        [SDTCisInt<0>,
                                         SDTCisSameAs<0, 1>]>;
...
// DivRem(u) nodes
def Cpu0DivRem : SDNode<"Cpu0ISD::DivRem", SDT_Cpu0DivRem,
                        [SDNPOutGlue]>;
def Cpu0DivRemU : SDNode<"Cpu0ISD::DivRemU", SDT_Cpu0DivRem,
                         [SDNPOutGlue]>;
...
class Div<SDNode opNode, bits<8> op, string instr_asm, InstrItinClass itin,
          RegisterClass RC, list<Register> DefRegs>:
  FL<op, (outs), (ins RC:$rb, RC:$rc),
     !strconcat(instr_asm, "\t$$zero, $rb, $rc"),
     [(opNode RC:$rb, RC:$rc)], itin> {
  let imm16 = 0;
  let Defs = DefRegs;
}

class Div32<SDNode opNode, bits<8> op, string instr_asm, InstrItinClass itin>:
  Div<opNode, op, instr_asm, itin, CPURegs, [HI, LO]>;
...
class MoveToLOHI<bits<8> op, string instr_asm, RegisterClass RC,
                 list<Register> DefRegs>:
  FL<op, (outs), (ins RC:$ra),
     !strconcat(instr_asm, "\t$ra"), [], IIHiLo> {
  let rb = 0;
  let imm16 = 0;
  let Defs = DefRegs;
  let neverHasSideEffects = 1;
}
...
def SDIV      : Div32<Cpu0DivRem, 0x16, "div", IIIdiv>;
def UDIV      : Div32<Cpu0DivRemU, 0x17, "divu", IIIdiv>;
...
def MTHI : MoveToLOHI<0x42, "mthi", CPURegs, [HI]>;
def MTLO : MoveToLOHI<0x43, "mtlo", CPURegs, [LO]>;
```

### LLVMBackendTutorialExampleCode/Chapter4\_6/Cpu0ISelLowering.cpp

```

Cpu0TargetLowering::
Cpu0TargetLowering(Cpu0TargetMachine &TM)
: TargetLowering(TM, new TargetLoweringObjectFileELF()),
Subtarget(&TM.getSubtarget<Cpu0Subtarget>()) {
...
setOperationAction(ISD::SDIV, MVT::i32, Expand);
setOperationAction(ISD::SREM, MVT::i32, Expand);
setOperationAction(ISD::UDIV, MVT::i32, Expand);
setOperationAction(ISD::UREM, MVT::i32, Expand);

setTargetDAGCombine(ISD::SDIVREM);
setTargetDAGCombine(ISD::UDIVREM);
...
}
...
static SDValue PerformDivRemCombine(SDNode *N, SelectionDAG& DAG,
    TargetLowering::DAGCombinerInfo &DCI,
    const Cpu0Subtarget* Subtarget) {
if (DCI.isBeforeLegalizeOps())
return SDValue();

EVT Ty = N->getValueType(0);
unsigned LO = Cpu0::LO;
unsigned HI = Cpu0::HI;
unsigned opc = N->getOpcode() == ISD::SDIVREM ? Cpu0ISD::DivRem :
    Cpu0ISD::DivRemU;
DebugLoc dl = N->getDebugLoc();

SDValue DivRem = DAG.getNode(opc, dl, MVT::Glue,
    N->getOperand(0), N->getOperand(1));
SDValue InChain = DAG.getEntryNode();
SDValue InGlue = DivRem;

// insert MFLO
if (N->hasAnyUseOfValue(0)) {
SDValue CopyFromLo = DAG.getCopyFromReg(InChain, dl, LO, Ty,
    InGlue);
DAG.ReplaceAllUsesOfValueWith(SDValue(N, 0), CopyFromLo);
InChain = CopyFromLo.getValue(1);
InGlue = CopyFromLo.getValue(2);
}

// insert MFHI
if (N->hasAnyUseOfValue(1)) {
SDValue CopyFromHi = DAG.getCopyFromReg(InChain, dl,
    HI, Ty, InGlue);
DAG.ReplaceAllUsesOfValueWith(SDValue(N, 1), CopyFromHi);
}

return SDValue();
}

SDValue Cpu0TargetLowering::PerformDAGCombine(SDNode *N, DAGCombinerInfo &DCI)
const {
SelectionDAG &DAG = DCI.DAG;
unsigned opc = N->getOpcode();

switch (opc) {
default: break;
case ISD::SDIVREM:
case ISD::UDIVREM:
return PerformDivRemCombine(N, DAG, DCI, Subtarget);
}

return SDValue();
}

```

### LLVMBackendTutorialExampleCode/Chapter4\_6/Cpu0ISelLowering.h

```

namespace llvm {
namespace Cpu0ISD {
enum NodeType {
    // Start the numbering from where ISD NodeType finishes.
    FIRST_NUMBER = ISD::BUILTIN_OP_END,
    Ret,
    // DivRem(u)
    DivRem,
    DivRemU
};
}
...

```

Running with ch4\_6\_2.cpp gives the result for operator “/” shown below. But running with ch4\_6\_1.cpp, shown below, does not produce “div” for operator “%”. It still uses “multiplication” instead of “div” because llvm applies “Constant Propagation Optimization” to it. With ch4\_6\_2.cpp we get “div” for “%” because the divisor is a function argument, which makes the llvm “Constant Propagation Optimization” ineffective here.

### LLVMBackendTutorialExampleCode/InputFiles/ch4\_6\_1.cpp

```
int main()
{
    int b = 11;
    int a = 12;

    b = (b+1)%a;

    return b;
}
```

#### LLVMBackendTutorialExampleCode/InputFiles/ch4\_6\_2.cpp

```
int test_mod(int c)
{
    int b = 11;

    b = (b+1)%c;

    return b;
}
```

```

118-165-77-79:InputFiles Jonathan$ clang -c ch4_6_2.cpp -emit-llvm -o ch4_6_2.bc
118-165-77-79:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_
debug_build/bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm
ch4_6_2.bc -o -
...
div $zero, $3, $2
mflo $2
...
```

To explain how Chapter4\_6 works with “**div**”, let’s run Chapter8\_8 with ch4\_6\_2.cpp as follows,

```

118-165-83-58:InputFiles Jonathan$ clang -c ch4_6_2.cpp -I/Applications/Xcode.app/
Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk/usr/
include/ -emit-llvm -o ch4_6_2.bc
118-165-83-58:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/bin/
Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm -debug ch4_6_2.bc -o -
Args: /Users/Jonathan/llvm/test/cmake_debug_build/bin/Debug/llc -march=cpu0
-relocation-model=pic -filetype=asm -debug ch4_6_2.bc -o -

==== _Z8test_modi
Initial selection DAG: BB#0 '_Z8test_modi:'
SelectionDAG has 21 nodes:
 0x7fed68410bc8: ch = EntryToken [ORD=1]

 0x7fed6882cb10: i32 = undef [ORD=1]

 0x7fed6882cd10: i32 = FrameIndex<0> [ORD=1]

 0x7fed6882ce10: i32 = Constant<0>

 0x7fed6882d110: i32 = FrameIndex<1> [ORD=2]

 0x7fed68410bc8: <multiple use>
 0x7fed68410bc8: <multiple use>
 0x7fed6882ca10: i32 = FrameIndex<-1> [ORD=1]

 0x7fed6882cb10: <multiple use>
 0x7fed6882cc10: i32, ch = load 0x7fed68410bc8, 0x7fed6882ca10,
0x7fed6882cb10<LD4[FixedStack-1]> [ORD=1]

0x7fed6882cd10: <multiple use>
0x7fed6882cb10: <multiple use>
0x7fed6882cf10: ch = store 0x7fed68410bc8, 0x7fed6882cc10, 0x7fed6882cd10,
0x7fed6882cb10<ST4[%1]> [ORD=1]

0x7fed6882d010: i32 = Constant<11> [ORD=2]

0x7fed6882d110: <multiple use>
0x7fed6882cb10: <multiple use>
0x7fed6882d210: ch = store 0x7fed6882cf10, 0x7fed6882d010, 0x7fed6882d110,
0x7fed6882cb10<ST4[%b]> [ORD=2]

0x7fed6882d210: <multiple use>
0x7fed6882d110: <multiple use>
0x7fed6882cb10: <multiple use>
0x7fed6882d310: i32, ch = load 0x7fed6882d210, 0x7fed6882d110,
0x7fed6882cb10<LD4[%b]> [ORD=3]

0x7fed6882d210: <multiple use>
0x7fed6882cd10: <multiple use>
0x7fed6882cb10: <multiple use>
0x7fed6882d610: i32, ch = load 0x7fed6882d210, 0x7fed6882cd10,
0x7fed6882cb10<LD4[%1]> [ORD=5]

0x7fed6882d310: <multiple use>
0x7fed6882d610: <multiple use>
0x7fed6882d810: ch = TokenFactor 0x7fed6882d310:1, 0x7fed6882d610:1 [ORD=7]

0x7fed6882d310: <multiple use>
0x7fed6882d410: i32 = Constant<1> [ORD=4]

0x7fed6882d510: i32 = add 0x7fed6882d310, 0x7fed6882d410 [ORD=4]

0x7fed6882d610: <multiple use>
0x7fed6882d710: i32 = srem 0x7fed6882d510, 0x7fed6882d610 [ORD=6]

0x7fed6882d110: <multiple use>
0x7fed6882cb10: <multiple use>
0x7fed6882fc10: ch = store 0x7fed6882d810, 0x7fed6882d710, 0x7fed6882d110,
0x7fed6882cb10<ST4[%b]> [ORD=7]

0x7fed6882fe10: i32 = Register %V0

0x7fed6882fc10: <multiple use>
0x7fed6882fe10: <multiple use>
0x7fed6882fc10: <multiple use>
0x7fed6882d110: <multiple use>
0x7fed6882cb10: <multiple use>
0x7fed6882fd10: i32, ch = load 0x7fed6882fc10, 0x7fed6882d110,
0x7fed6882cb10<LD4[%b]> [ORD=8]

0x7fed6882ff10: ch, glue = CopyToReg 0x7fed6882fc10, 0x7fed6882fe10,
0x7fed6882fd10

0x7fed6882ff10: <multiple use>
0x7fed6882fe10: <multiple use>
0x7fed6882ff10: <multiple use>
0x7fed68830010: ch = Cpu0ISD::Ret 0x7fed6882ff10, 0x7fed6882fe10,
0x7fed6882ff10:1

Replacing.1 0x7fed6882fd10: i32, ch = load 0x7fed6882fc10, 0x7fed6882d110,
0x7fed6882cb10<LD4[%b]> [ORD=8]

With: 0x7fed6882d710: i32 = srem 0x7fed6882d510, 0x7fed6882d610 [ORD=6]
and 1 other values

Replacing.1 0x7fed6882d310: i32, ch = load 0x7fed6882d210, 0x7fed6882d110,
0x7fed6882cb10<LD4[%b]> [ORD=3]

With: 0x7fed6882d010: i32 = Constant<11> [ORD=2]
and 1 other values

Replacing.3 0x7fed6882d810: ch = TokenFactor 0x7fed6882d210,
0x7fed6882d610:1 [ORD=7]

With: 0x7fed6882d610: i32, ch = load 0x7fed6882d210, 0x7fed6882cd10,
0x7fed6882cb10<LD4[%1]> [ORD=5]

Replacing.3 0x7fed6882d510: i32 = add 0x7fed6882d010, 0x7fed6882d410 [ORD=4]

With: 0x7fed6882d810: i32 = Constant<12>

Replacing.1 0x7fed6882cc10: i32, ch = load 0x7fed68410bc8, 0x7fed6882ca10,
0x7fed6882cb10<LD4[FixedStack-1]> [align=8] [ORD=1]

With: 0x7fed6882cc10: i32, ch = load 0x7fed68410bc8, 0x7fed6882ca10,
0x7fed6882cb10<LD4[FixedStack-1]> [align=8] [ORD=1]
and 1 other values
Optimized lowered selection DAG: BB#0 '_Z8test_modi:'
SelectionDAG has 16 nodes:
0x7fed68410bc8: ch = EntryToken [ORD=1]

0x7fed6882cb10: i32 = undef [ORD=1]

0x7fed6882cd10: i32 = FrameIndex<0> [ORD=1]

0x7fed6882d110: i32 = FrameIndex<1> [ORD=2]

0x7fed68410bc8: <multiple use>
0x7fed68410bc8: <multiple use>
0x7fed6882ca10: i32 = FrameIndex<-1> [ORD=1]

0x7fed6882cb10: <multiple use>
0x7fed6882cc10: i32, ch = load 0x7fed68410bc8, 0x7fed6882ca10,
0x7fed6882cb10<LD4[FixedStack-1]> [align=8] [ORD=1]

0x7fed6882cd10: <multiple use>
0x7fed6882cb10: <multiple use>
0x7fed6882cf10: ch = store 0x7fed68410bc8, 0x7fed6882cc10, 0x7fed6882cd10,
0x7fed6882cb10<ST4[%1]> [ORD=1]

0x7fed6882d010: i32 = Constant<11> [ORD=2]

0x7fed6882d110: <multiple use>
0x7fed6882cb10: <multiple use>
0x7fed6882d210: ch = store 0x7fed6882cf10, 0x7fed6882d010, 0x7fed6882d110,
0x7fed6882cb10<ST4[%b]> [ORD=2]

0x7fed6882cd10: <multiple use>
0x7fed6882cb10: <multiple use>
0x7fed6882d610: i32, ch = load 0x7fed6882d210, 0x7fed6882cd10,
0x7fed6882cb10<LD4[%1]> [ORD=5]

0x7fed6882d810: i32 = Constant<12>

0x7fed6882d610: <multiple use>
0x7fed6882d710: i32 = srem 0x7fed6882d810, 0x7fed6882d610 [ORD=6]

0x7fed6882fe10: i32 = Register %V0

0x7fed6882d610: <multiple use>
0x7fed6882d710: <multiple use>
0x7fed6882d110: <multiple use>
0x7fed6882cb10: <multiple use>
0x7fed6882fc10: ch = store 0x7fed6882d610:1, 0x7fed6882d710, 0x7fed6882d110,
0x7fed6882cb10<ST4[%b]> [ORD=7]

0x7fed6882fe10: <multiple use>
0x7fed6882d710: <multiple use>
0x7fed6882ff10: ch, glue = CopyToReg 0x7fed6882fc10, 0x7fed6882fe10,
0x7fed6882d710

0x7fed6882ff10: <multiple use>
0x7fed6882fe10: <multiple use>
0x7fed6882ff10: <multiple use>
0x7fed68830010: ch = Cpu0ISD::Ret 0x7fed6882ff10, 0x7fed6882fe10,
0x7fed6882ff10:1

Type-legalized selection DAG: BB#0 '_Z8test_modi:'
SelectionDAG has 16 nodes:
...
0x7fed6882d610: i32, ch = load 0x7fed6882d210, 0x7fed6882cd10,
0x7fed6882cb10<LD4[%1]> [ORD=5] [ID=-3]

0x7fed6882d810: i32 = Constant<12> [ID=-3]

0x7fed6882d610: <multiple use>
0x7fed6882d710: i32 = srem 0x7fed6882d810, 0x7fed6882d610 [ORD=6] [ID=-3]
...

Legalized selection DAG: BB#0 '_Z8test_modi:'
SelectionDAG has 16 nodes:
0x7fed68410bc8: ch = EntryToken [ORD=1] [ID=0]

0x7fed6882cb10: i32 = undef [ORD=1] [ID=2]

0x7fed6882cd10: i32 = FrameIndex<0> [ORD=1] [ID=3]

0x7fed6882d110: i32 = FrameIndex<1> [ORD=2] [ID=5]

0x7fed6882fe10: i32 = Register %V0 [ID=6]

...
0x7fed6882d810: i32 = Constant<12> [ID=7]

0x7fed6882d610: <multiple use>
0x7fed6882ce10: i32,i32 = sdivrem 0x7fed6882d810, 0x7fed6882d610

Optimized legalized selection DAG: BB#0 '_Z8test_modi:'
SelectionDAG has 18 nodes:
...
0x7fed6882d510: i32 = Register %HI

0x7fed6882d810: i32 = Constant<12> [ID=7]

0x7fed6882d610: <multiple use>
0x7fed6882d410: glue = Cpu0ISD::DivRem 0x7fed6882d810, 0x7fed6882d610

0x7fed6882d310: i32,ch,glue = CopyFromReg 0x7fed68410bc8, 0x7fed6882d510,
0x7fed6882d410
...
===== Instruction selection begins: BB#0 ''
...
Selecting: 0x7fed6882d410: glue = Cpu0ISD::DivRem 0x7fed6882d810,
0x7fed6882d610 [ID=13]

ISEL: Starting pattern match on root node: 0x7fed6882d410: glue =
Cpu0ISD::DivRem 0x7fed6882d810, 0x7fed6882d610 [ID=13]

Initial Opcode index to 1355
Morphed node: 0x7fed6882d410: i32,glue = SDIV 0x7fed6882d810, 0x7fed6882d610

ISEL: Match complete!
=> 0x7fed6882d410: i32,glue = SDIV 0x7fed6882d810, 0x7fed6882d610
...

```

According to the DAG translation messages from llc -debug above, llc does the following:

1. Reduces the DAG nodes in the stage “Optimized lowered selection DAG” (see the “Replacing ...” messages displayed before “Optimized lowered selection DAG: BB#0 '\_Z8test\_modi:'”). Since the SSA form contains redundant store and load nodes, they can be removed.
2. Changes the DAG node srem to sdivrem in the stage “Legalized selection DAG”.
3. Changes the DAG node sdivrem to Cpu0ISD::DivRem in the stage “Optimized legalized selection DAG”.
4. Adds the DAG nodes “0x7fed6882d510: i32 = Register %HI” and “CopyFromReg 0x7fed68410bc8, 0x7fed6882d510, 0x7fed6882d410” in the stage “Optimized legalized selection DAG”.

These steps are summarized in Table 4.5 (Stages for C operator %) and Table 4.6 (Functions that handle the DAG translation and pattern match for C operator %).

Table 4.5: Stages for C operator %

| Stage                             | IR/DAG/instruction | IR/DAG/instruction     |
|-----------------------------------|--------------------|------------------------|
| .bc                               | srem               |                        |
| Legalized selection DAG           | sdivrem            |                        |
| Optimized legalized selection DAG | Cpu0ISD::DivRem    | CopyFromReg xx, Hi, xx |
| pattern match                     | div                | mfhi                   |

Table 4.6: Functions that handle the DAG translation and pattern match for C operator %

| Translation                       | Done by                                          |
|-----------------------------------|--------------------------------------------------|
| srem => sdivrem                   | setOperationAction(ISD::SREM, MVT::i32, Expand); |
| sdivrem => Cpu0ISD::DivRem        | setTargetDAGCombine(ISD::SDIVREM);               |
| sdivrem => CopyFromReg xx, Hi, xx | PerformDivRemCombine();                          |
| Cpu0ISD::DivRem => div            | SDIV (Cpu0InstrInfo.td)                          |
| CopyFromReg xx, Hi, xx => mfhi    | MFHI (Cpu0InstrInfo.td)                          |

Item 2 above is triggered by the code “setOperationAction(ISD::SREM, MVT::i32, Expand);” in Cpu0ISelLowering.cpp. For details about **Expand**, please refer to <sup>5</sup> and <sup>6</sup>. Item 3 is triggered by the code “setTargetDAGCombine(ISD::SDIVREM);” in Cpu0ISelLowering.cpp. Item 4 is triggered by PerformDivRemCombine(), which is called by PerformDAGCombine(): the **srem** corresponding to % makes “N->hasAnyUseOfValue(1)” true in PerformDivRemCombine(), so it creates “CopyFromReg 0x7fed68410bc8, 0x7fed6882d510, 0x7fed6882d410”. Using “/” in C makes “N->hasAnyUseOfValue(0)” true instead. In other words, for sdivrem, **sdiv** makes “N->hasAnyUseOfValue(0)” true while **srem** makes “N->hasAnyUseOfValue(1)” true.

The items above change the DAG while llc is running. After that, the pattern match defined in Chapter4\_6/Cpu0InstrInfo.td translates **Cpu0ISD::DivRem** to **div**, and “**CopyFromReg 0x7fed68410bc8, Register %HI, 0x7fed6882d410**” to **mfhi**.
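The decision described above can be sketched as a small stand-alone model. This is not the code of PerformDivRemCombine(); the struct `DivRemUses` and the function `lowerSDivRem` are hypothetical names, and the use of `mflo` for the quotient is an assumption made by analogy with `mfhi` for the remainder.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for the two results of an ISD::SDIVREM node:
// result 0 is the quotient, result 1 is the remainder.
struct DivRemUses {
  bool quotientUsed;   // models N->hasAnyUseOfValue(0), true for C "/"
  bool remainderUsed;  // models N->hasAnyUseOfValue(1), true for C "%"
};

// Returns the Cpu0 instructions this simplified model produces for one
// sdivrem node after the DAG combine and the pattern match.
std::vector<std::string> lowerSDivRem(const DivRemUses &uses) {
  // Assumption of the sketch: a node with no used result never gets here.
  assert(uses.quotientUsed || uses.remainderUsed);
  std::vector<std::string> insts;
  // Cpu0ISD::DivRem is matched by the SDIV pattern and becomes div.
  insts.push_back("div");
  if (uses.quotientUsed)
    insts.push_back("mflo");  // assumed: CopyFromReg xx, Lo, xx => mflo
  if (uses.remainderUsed)
    insts.push_back("mfhi");  // CopyFromReg xx, Hi, xx => mfhi
  return insts;
}
```

Calling `lowerSDivRem({false, true})` models the `%` case traced in this section: only the remainder is used, so the model yields `div` followed by `mfhi`.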

## 4.7 Summary

This chapter adds support for most C operators. The backend now has around 3400 lines of source code, including comments. The 345 lines added in this chapter raise the number of supported operators from three to over ten.

Table 4.7 (Chapter 4 operators) lists the C operators, the IR in the .bc file, the Optimized legalized selection DAG nodes and the Cpu0 instructions implemented in this chapter.

<sup>5</sup> <http://llvm.org/docs/WritingAnLLVMBackend.html#expand>

<sup>6</sup> <http://llvm.org/docs/CodeGenerator.html#selectiondag-legalizetypes-phase>

Table 4.7: Chapter 4 operators

| C | .bc | Optimized legalized selection DAG | Cpu0 |
|---|-----|-----------------------------------|------|
| + | add | add | add |
| - | sub | sub | sub |
| * | mul | mul | mul |
| / | <ul style="list-style-type: none"> <li>sdiv</li> <li>udiv</li> </ul> | <ul style="list-style-type: none"> <li>Cpu0ISD::DivRem</li> <li>Cpu0ISD::DivRemU</li> </ul> | <ul style="list-style-type: none"> <li>div</li> <li>divu</li> </ul> |
| &, && | and | and | and |
| \|, \|\| | or | or | or |
| ^ | xor | xor | xor |
| << | shl | shl | shl |
| >> | <ul style="list-style-type: none"> <li>ashr</li> <li>lshr</li> </ul> | <ul style="list-style-type: none"> <li>sra</li> <li>srl</li> </ul> | <ul style="list-style-type: none"> <li>sra</li> <li>shr</li> </ul> |
| ! | <ul style="list-style-type: none"> <li>%tobool = icmp ne i32 %0, 0</li> <li>%lnot = xor i1 %tobool, true</li> <li>%conv = zext i1 %lnot to i32</li> </ul> | <ul style="list-style-type: none"> <li>%lnot = (setcc %tobool, 0, seteq)</li> <li>%conv = (and %lnot, 1)</li> </ul> | <ul style="list-style-type: none"> <li>%1 = (xor %tobool, 0)</li> <li>%true = (addiu \$r0, 1)</li> <li>%lnot = (xor %1, %true)</li> <li>%conv = (and %lnot, 1)</li> </ul> |
| % | <ul style="list-style-type: none"> <li>srem</li> <li>urem</li> </ul> | <ul style="list-style-type: none"> <li>Cpu0ISD::DivRem</li> <li>Cpu0ISD::DivRemU</li> </ul> | <ul style="list-style-type: none"> <li>div</li> <li>divu</li> </ul> |



---

# GENERATING OBJECT FILES

The previous chapters only covered assembly code generation. This chapter first introduces obj file support and displays the generated obj files with the objdump utility. With LLVM's support, the cpu0 backend can generate both big endian and little endian obj files with only a little extra code. The Target Registration mechanism and its structure are also introduced in this chapter.

## 5.1 Translate into obj file

So far, we only support translating llvm IR code into assembly code. If you try to generate obj code with Chapter4\_6\_2/, you will get the following error message:

```
[Gamma@localhost 3]$ /usr/local/llvm/test/cmake_debug_build/bin/
llc -march=cpu0 -relocation-model=pic -filetype=obj ch4_1_2.bc -o ch4_1_2.cpu0.o
/usr/local/llvm/test/cmake_debug_build/bin/llc: target does not
support generation of this file type!
```

Chapter5/ supports obj file generation. It produces big endian and little endian output with the commands llc -march=cpu0 and llc -march=cpu0el, respectively. Running them gives the following obj files:

```
[Gamma@localhost InputFiles]$ cat ch4_1_2.cpu0.s
...
.set nomacro
# BB#0:
addiu $sp, $sp, -72
addiu $2, $zero, 0
st $2, 68($sp)
addiu $3, $zero, 5
st $3, 64($sp)
...
[Gamma@localhost 3]$ /usr/local/llvm/test/cmake_debug_build/bin/
llc -march=cpu0 -relocation-model=pic -filetype=obj ch4_2.bc -o ch4_2.cpu0.o
[Gamma@localhost InputFiles]$ objdump -s ch4_2.cpu0.o

ch4_2.cpu0.o:      file format elf32-big

Contents of section .text:
0000 09ddffb8 09200000 022d0044 09300005  ..... .-.D.0..
0010 023d0040 09300002 023d003c 022d0038  .=.0.0...=.<.-.8
0020 022d0034 022d0014 0930fffb 023d0010  .-.4.-...0...=...
0030 022d000c 022d0008 012d003c 013d0040  .-...-...-.<.=.0
0040 13232000 022d0038 012d003c 013d0040  .# ..-.8.-.<.=.0
0050 14232000 022d0034 012d003c 013d0040 .# ..-.4.-.<.=.@
0060 15232000 022d0030 012d003c 013d0040 .# ..-.0.-.<.=.@
0070 16320000 41200000 022d002c 012d003c .2..A ...-.,--.<
0080 013d0010 17320000 41200000 022d0008 .=...2..A ...-..
0090 012d003c 013d0040 18232000 022d0028 .-.<.=.0.# ...-(.
00a0 012d003c 013d0040 19232000 022d0024 .-.<.=.0.# ...-$
00b0 012d003c 013d0040 1a232000 022d0020 .-.<.=.0.# ...-
00c0 012d0040 1e220002 022d001c 012d0010 .-.@."...-.-.-
00d0 1e220002 022d0004 012d0040 1b220002 ."...-.-.0."...
00e0 022d0018 012d0010 1f220002 022d000c .-.-.-."...-..
00f0 012d0038 09dd0048 2c000000 .-8...H, ...

Contents of section .eh_frame:
0000 00000010 00000000 017a5200 017c0e01 .....zR...|...
0010 1b0c0d00 00000010 00000018 00000000 .....
0020 000000fc 00440e48 .....D.H

[Gamma@localhost InputFiles]$ /usr/local/llvm/test/
cmake_debug_build/bin/llc -march=cpu0el -relocation-model=pic -filetype=obj
ch4_2.bc -o ch4_2.cpu0el.o
[Gamma@localhost InputFiles]$ objdump -s ch4_2.cpu0el.o

ch4_2.cpu0el.o:      file format elf32-little

Contents of section .text:
0000 b8ffdd09 00002009 44002d02 05003009 .....D.-...0.
0010 40003d02 02003009 3c003d02 38002d02 @.=...0.<.=.8-.
0020 34002d02 14002d02 fbff3009 10003d02 4.-...-...0...=.
0030 0c002d02 08002d02 3c002d01 40003d01 ...-.-.-<.-.0.=.
0040 00202313 38002d02 3c002d01 40003d01 . #.8.-.<.-.0.=.
0050 00202314 34002d02 3c002d01 40003d01 . #.4.-.<.-.0.=.
0060 00202315 30002d02 3c002d01 40003d01 . #.0.-.<.-.0.=.
0070 00003216 00002041 2c002d02 3c002d01 ..2... A,.-.<.-.
0080 10003d01 00003217 00002041 08002d02 ..=...2... A...-
0090 3c002d01 40003d01 00202318 28002d02 <.-.0.=... #.(.-.
00a0 3c002d01 40003d01 00202319 24002d02 <.-.0.=... #.$.-.
00b0 3c002d01 40003d01 0020231a 20002d02 <.-.0.=... #. .-.
00c0 40002d01 0200221e 1c002d02 10002d01 @.-...".-.-.-
00d0 0200221e 04002d02 40002d01 0200221b .."...-..0.-...".
00e0 18002d02 10002d01 0200221f 0c002d02 .-.-.-."...-.
00f0 38002d01 4800dd09 0000002c 8.-.H....,
Contents of section .eh_frame:
0000 10000000 00000000 017a5200 017c0e01 .....zR...|...
0010 1b0c0d00 10000000 18000000 00000000 .....
0020 fc000000 00440e48 .....D.H

```

The first instruction is “**addiu \$sp, \$sp, -72**” and its corresponding obj is 0x09ddffb8. The addiu opcode is 0x09 (8 bits), the \$sp register number is 13 (0xd, 4 bits, used for both register operands), and the 16-bit immediate is -72 (= 0xffb8), so the encoding is correct. The third instruction is “**st \$2, 68(\$sp)**” and its corresponding obj is 0x022d0044. The st opcode is **0x02**, \$2 is 0x2, \$sp is 0xd and the immediate is 68 (0x0044). In the cpu0 instruction format, the sizes of the opcode, the register operands and the offset (immediate value) are all multiples of 4 bits, so each field lines up with whole hex digits and the obj format is easy to check by eye. The big endian byte order is (B0, B1, B2, B3) = (09, dd, ff, b8), which objdump displays from B0 to B3 as 0x09ddffb8; the little endian byte order is (B3, B2, B1, B0) = (09, dd, ff, b8), which objdump displays from B0 to B3 as 0xb8ffdd09.
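The check done by eye above can also be reproduced with a few lines of C++. The following is only a sketch under the field layout stated in the text (8-bit opcode, two 4-bit register numbers, 16-bit immediate); the function names `encodeRegImm` and `toBytes` are hypothetical and are not part of the Cpu0 backend.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Pack one instruction word: 8-bit opcode, two 4-bit register numbers
// and a 16-bit immediate, as described in the text.
std::uint32_t encodeRegImm(std::uint8_t opcode, unsigned ra, unsigned rb,
                           std::int16_t imm) {
  assert(ra < 16 && rb < 16 && "register numbers are 4 bits wide");
  return (std::uint32_t{opcode} << 24) | (std::uint32_t{ra} << 20) |
         (std::uint32_t{rb} << 16) | static_cast<std::uint16_t>(imm);
}

// Return the four bytes in the order they are stored in the obj file,
// i.e. the order in which objdump -s prints them (B0 first).
std::array<std::uint8_t, 4> toBytes(std::uint32_t word, bool littleEndian) {
  std::array<std::uint8_t, 4> bytes{};
  for (int i = 0; i < 4; ++i) {
    // Big endian stores the most significant byte first.
    int shift = littleEndian ? 8 * i : 8 * (3 - i);
    bytes[i] = static_cast<std::uint8_t>(word >> shift);
  }
  return bytes;
}
```

With these helpers, `encodeRegImm(0x09, 13, 13, -72)` gives the word for the addiu instruction and `encodeRegImm(0x02, 2, 13, 68)` the word for the st instruction; passing the result to `toBytes` with `littleEndian` set to true or false gives the byte order of the cpu0el or cpu0 obj file.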

## 5.2 Backend Target Registration Structure

Now, let’s examine Cpu0MCTargetDesc.cpp.

**LLVMBackendTutorialExampleCode/Chapter5\_1/MCTargetDesc/Cpu0MCTargetDesc.cpp**

```

// ...
}

extern "C" void LLVMInitializeCpu0TargetMC() {
  // Register the MC asm info.
  RegisterMCAsmInfoFn X(TheCpu0Target, createCpu0MCAsmInfo);
  RegisterMCAsmInfoFn Y(TheCpu0elTarget, createCpu0MCAsmInfo);

  // Register the MC codegen info.
  TargetRegistry::RegisterMCCodeGenInfo(TheCpu0Target,
                                        createCpu0MCCodeGenInfo);
  TargetRegistry::RegisterMCCodeGenInfo(TheCpu0elTarget,
                                        createCpu0MCCodeGenInfo);
  // Register the MC instruction info.
  TargetRegistry::RegisterMCInstrInfo(TheCpu0Target, createCpu0MCInstrInfo);
  TargetRegistry::RegisterMCInstrInfo(TheCpu0elTarget, createCpu0MCInstrInfo);

  // Register the MC register info.
  TargetRegistry::RegisterMCRegInfo(TheCpu0Target, createCpu0MCRegisterInfo);
  TargetRegistry::RegisterMCRegInfo(TheCpu0elTarget, createCpu0MCRegisterInfo);

  // Register the MC Code Emitter (one per endianness).
  TargetRegistry::RegisterMCCodeEmitter(TheCpu0Target,
                                        createCpu0MCCodeEmitterEB);
  TargetRegistry::RegisterMCCodeEmitter(TheCpu0elTarget,
                                        createCpu0MCCodeEmitterEL);

  // Register the object streamer.
  TargetRegistry::RegisterMCObjectStreamer(TheCpu0Target, createMCStreamer);
  TargetRegistry::RegisterMCObjectStreamer(TheCpu0elTarget, createMCStreamer);

  // Register the asm backend (one per endianness).
  TargetRegistry::RegisterMCAsmBackend(TheCpu0Target,
                                       createCpu0AsmBackendEB32);
  TargetRegistry::RegisterMCAsmBackend(TheCpu0elTarget,
                                       createCpu0AsmBackendEL32);
  // Register the MC subtarget info.
  TargetRegistry::RegisterMCSubtargetInfo(TheCpu0Target,
                                          createCpu0MCSubtargetInfo);
  TargetRegistry::RegisterMCSubtargetInfo(TheCpu0elTarget,
                                          createCpu0MCSubtargetInfo);
  // Register the MCInstPrinter.
  TargetRegistry::RegisterMCInstPrinter(TheCpu0Target,
                                        createCpu0MCInstPrinter);
  TargetRegistry::RegisterMCInstPrinter(TheCpu0elTarget,
                                        createCpu0MCInstPrinter);
}

```

Cpu0MCTargetDesc.cpp performs the target registration mentioned in “section Target Registration”<sup>1</sup> of Chapter 2. Figure 5.1 to Figure 5.9 depict the register functions and the classes they register.

In Figure 5.1, an object of class Cpu0AsmInfo is registered for the targets TheCpu0Target and TheCpu0elTarget. TheCpu0Target is for big endian and TheCpu0elTarget is for little endian. Cpu0AsmInfo is derived from MCAsmInfo, which is an llvm built-in class. Most of the code is implemented in its parent; the backend reuses that code by inheritance.

In Figure 5.2, an MCCodeGenInfo object is instantiated and initialized with Reloc::PIC\_ because we run llc with `-relocation-model=pic`, which tells `llc` to compile in position-independent code mode. Recall from system programming books that there are two addressing modes: PIC mode and absolute addressing mode. MC stands for Machine Code.

<sup>1</sup> <http://jonathan2251.github.com/lbd/llvmstructure.html#target-registration>

Figure 5.1: Register Cpu0MCAsmInfo

Figure 5.2: Register MCCodeGenInfo

Figure 5.3: Register MCInstrInfo

Figure 5.4: Register MCRegisterInfo

Figure 5.5: Register Cpu0MCCodeEmitter

Figure 5.6: Register MCELFStreamer

Figure 5.7: Register Cpu0AsmBackend

Figure 5.8: Register Cpu0MCSubtargetInfo

Figure 5.9: Register Cpu0InstPrinter

Figure 5.10: MCELFStreamer inheritance tree

In [Figure 5.3](#), an `MCInstrInfo` object `X` is instantiated and initialized by `InitCpu0MCInstrInfo(X)`. Since `InitCpu0MCInstrInfo(X)` is defined in `Cpu0GenInstrInfo.inc`, it adds the information we specified in `Cpu0InstrInfo.td`. [Figure 5.4](#) is similar to [Figure 5.3](#), but it initializes the register information specified in `Cpu0RegisterInfo.td`. Both share a lot of code with the instruction/register td descriptions.

In [Figure 5.5](#), two `Cpu0MCCodeEmitter` objects are instantiated, one for big endian and the other for little endian. They take care of the obj format generated, so they are not defined in [Chapter4\_6\_2/](#), which supports assembly code only.

In [Figure 5.6](#), `MCELFStreamer` also takes care of the obj format. The `Cpu0MCCodeEmitter` of [Figure 5.5](#) handles code emission while `MCELFStreamer` handles the obj output stream. [Figure 5.10](#) shows the `MCELFStreamer` inheritance tree, in which you can find a lot of operations.

Readers may wonder what the actual arguments of `createCpu0MCCodeEmitterEB(const MCInstrInfo &MCII, const MCSubtargetInfo &STI, MCContext &Ctx)` are and when they are assigned. In fact, we never assign them; we only register the `createXXX()` function through a function pointer (in C terms, `TargetRegistry::RegisterXXX(TheCpu0Target, createXXX)` where `createXXX` is a function pointer). LLVM keeps the function pointer to `createXXX()` when we call `RegisterXXX()` during target registration, and calls `createXXX()` back at the proper time with the arguments it supplies.
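The callback idea can be illustrated with a minimal, self-contained sketch. It is not LLVM's TargetRegistry: the types `InstrInfo`, `CodeEmitter` and `Registry` and the functions `createEmitterEB` and `createEmitterEL` are hypothetical stand-ins chosen for this example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for MCInstrInfo and MCCodeEmitter.
struct InstrInfo { int numOpcodes; };
struct CodeEmitter { bool littleEndian; int numOpcodes; };

// Same shape as a createXXX() function: the caller of the pointer,
// not the code that registers it, decides the arguments.
using EmitterCtor = CodeEmitter (*)(const InstrInfo &);

CodeEmitter createEmitterEB(const InstrInfo &ii) { return {false, ii.numOpcodes}; }
CodeEmitter createEmitterEL(const InstrInfo &ii) { return {true, ii.numOpcodes}; }

class Registry {
  std::map<std::string, EmitterCtor> ctors;

public:
  // Registration only stores the pointer; nothing is constructed yet.
  void registerEmitter(const std::string &target, EmitterCtor fn) {
    ctors[target] = fn;
  }

  // Later, the framework calls back with arguments that it owns.
  CodeEmitter create(const std::string &target, const InstrInfo &ii) const {
    auto it = ctors.find(target);
    assert(it != ctors.end() && "target was never registered");
    return it->second(ii);
  }
};
```

A caller would register `createEmitterEB` under one target name and `createEmitterEL` under another, then ask the registry to `create()` an emitter once an `InstrInfo` is available. The two phases are deliberately separate, which mirrors why no arguments appear in the registration calls of Cpu0MCTargetDesc.cpp.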

In [Figure 5.7](#), the `Cpu0AsmBackend` class is the bridge from asm to obj. Again, two objects take care of big endian and little endian. It is derived from `MCAsmBackend`. Most of the code for object file generation is implemented by `MCELFStreamer` and by `MCAsmBackend`, the parent of `Cpu0AsmBackend`.

In [Figure 5.8](#), an `MCSubtargetInfo` object is instantiated and initialized with the information from `Cpu0.td`. In [Figure 5.9](#), a `Cpu0InstPrinter` is instantiated to take care of printing instructions. Like [Figure 5.1](#) to [Figure 5.4](#), it has already been defined in the [Chapter4\_6\_2/](#) code to support assembly file generation.



# GLOBAL VARIABLES, STRUCTS AND ARRAYS, OTHER TYPE

In the previous two chapters, we only accessed local variables. This chapter deals with the translation of global variable accesses. After that, [section Array and struct support](#) introduces the struct and array types, their corresponding llvm IR statements, and how cpu0 translates these llvm IR statements. Finally, the last section deals with other types such as “**short int**” and **char**.

The global variable DAG translation differs from the DAG translations we have seen so far. It creates DAG nodes at run time in our backend C++ code according to the `llc -relocation-model` option, while the other DAG translations just map IR DAG to Machine DAG directly according to the IR DAG of the input file.

## 6.1 Global variable

Chapter6\_1/ supports global variables. Let's compile ch6\_1.cpp with this version first and explain the code changes afterwards.

[LLVMBackendTutorialExampleCode/InputFiles/ch6\_1.cpp](#)

```
int gStart = 2;
int gI = 100;
int fun()
{
  int c = 0;

  c = gI;

  return c;
}
```

  

```
118-165-78-166:InputFiles Jonathan$ llvm-dis ch6_1.bc -o -
; ModuleID = 'ch6_1.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-
f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-
n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.8.0"

@gStart = global i32 2, align 4
@gI = global i32 100, align 4

define i32 @_Z3funv() nounwind uwtable ssp {
  %1 = alloca i32, align 4
  %c = alloca i32, align 4
  store i32 0, i32* %1
  store i32 0, i32* %c, align 4
  %2 = load i32* @gI, align 4
  store i32 %2, i32* %c, align 4
  %3 = load i32* %c, align 4
  ret i32 %3
}
```

### 6.1.1 Cpu0 global variable options

Like Mips, Cpu0 supports both static and pic mode. In static mode, there are two different layouts for global variables, controlled by the option `cpu0-use-small-section`. Chapter6\_1/ supports the global variable translation. Let's run Chapter6\_1/ on `ch6_1.cpp` with three different option sets, `llc -relocation-model=static -cpu0-use-small-section=false`, `llc -relocation-model=static -cpu0-use-small-section=true` and `llc -relocation-model=pic`, to trace the DAG and the Cpu0 instructions.

```
118-165-78-166:InputFiles Jonathan$ clang -c ch6_1.cpp -emit-llvm -o ch6_1.bc
118-165-78-166:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=static -cpu0-use-small-section=false
-filetype=asm -debug ch6_1.bc -o -
...
Type-legalized selection DAG: BB#0 '_Z3funv:entry'
SelectionDAG has 12 nodes:
...
0x7ffd5902cc10: <multiple use>
0x7ffd5902cf10: ch = store 0x7ffd5902cd10, 0x7ffd5902ca10, 0x7ffd5902ce10,
0x7ffd5902cc10<ST4[%c]> [ORD=2] [ID=-3]

0x7ffd5902d010: i32 = GlobalAddress<i32* @gI> 0 [ORD=3] [ID=-3]

0x7ffd5902cc10: <multiple use>
0x7ffd5902d110: i32,ch = load 0x7ffd5902cf10, 0x7ffd5902d010,
0x7ffd5902cc10<LD4[@gI]> [ORD=3] [ID=-3]
...

Legalized selection DAG: BB#0 '_Z3funv:entry'
SelectionDAG has 16 nodes:
...
0x7ffd5902cc10: <multiple use>
0x7ffd5902cf10: ch = store 0x7ffd5902cd10, 0x7ffd5902ca10, 0x7ffd5902ce10,
0x7ffd5902cc10<ST4[%c]> [ORD=2] [ID=8]

0x7ffd5902d310: i32 = TargetGlobalAddress<i32* @gI> 0 [TF=5]

0x7ffd5902d710: i32 = Cpu0ISD::Hi 0x7ffd5902d310

0x7ffd5902d610: i32 = TargetGlobalAddress<i32* @gI> 0 [TF=6]

0x7ffd5902d810: i32 = Cpu0ISD::Lo 0x7ffd5902d610

0x7ffd5902fe10: i32 = add 0x7ffd5902d710, 0x7ffd5902d810

0x7ffd5902cc10: <multiple use>
0x7ffd5902d110: i32, ch = load 0x7ffd5902cf10, 0x7ffd5902fe10,
0x7ffd5902cc10<LD4[@gI]> [ORD=3] [ID=9]
...
addiu  $2, $zero, %hi(gI)
shl   $2, $2, 16
addiu $2, $2, %lo(gI)
ld    $2, 0($2)
...
.type  gStart,@object          # @gStart
.data
.globl gStart
.align 2
gStart:
.4byte 2                      # 0x2
.size   gStart, 4

.type  gI,@object             # @gI
.globl gI
.align 2
gI:
.4byte 100                     # 0x64
.size   gI, 4

118-165-78-166:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=static -cpu0-use-small-section=true
-filetype=asm -debug ch6_1.bc -o -
...
Type-legalized selection DAG: BB#0 '_Z3funv:entry'
SelectionDAG has 12 nodes:
...
0x7fc5f382cc10: <multiple use>
0x7fc5f382cf10: ch = store 0x7fc5f382cd10, 0x7fc5f382ca10, 0x7fc5f382ce10,
0x7fc5f382cc10<ST4[%c]> [ORD=2] [ID=-3]

0x7fc5f382d010: i32 = GlobalAddress<i32* @gI> 0 [ORD=3] [ID=-3]

0x7fc5f382cc10: <multiple use>
0x7fc5f382d110: i32, ch = load 0x7fc5f382cf10, 0x7fc5f382d010,
0x7fc5f382cc10<LD4[@gI]> [ORD=3] [ID=-3]

Legalized selection DAG: BB#0 '_Z3funv:entry'
SelectionDAG has 15 nodes:
...
0x7fc5f382cc10: <multiple use>
0x7fc5f382cf10: ch = store 0x7fc5f382cd10, 0x7fc5f382ca10, 0x7fc5f382ce10,
0x7fc5f382cc10<ST4[%c]> [ORD=2] [ID=8]

0x7fc5f382d710: i32 = GLOBAL_OFFSET_TABLE

0x7fc5f382d310: i32 = TargetGlobalAddress<i32* @gI> 0 [TF=4]

0x7fc5f382d610: i32 = Cpu0ISD::GPRel 0x7fc5f382d310

0x7fc5f382d810: i32 = add 0x7fc5f382d710, 0x7fc5f382d610

0x7fc5f382cc10: <multiple use>
0x7fc5f382d110: i32, ch = load 0x7fc5f382cf10, 0x7fc5f382d810,
0x7fc5f382cc10<LD4[@gI]> [ORD=3] [ID=9]
...
addiu  $2, $gp, %gp_rel(gI)
ld    $2, 0($2)
...
.type  gStart,@object          # @gStart
.section .sdata, "aw", @progbits
.globl gStart
.align 2
gStart:
    .4byte 2                  # 0x2
    .size   gStart, 4

    .type  gI,@object          # @gI
    .globl gI
    .align 2
gI:
    .4byte 100                 # 0x64
    .size   gI, 4

118-165-78-166:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm -debug ch6_1.bc
-o -

...
Type-legalized selection DAG: BB#0 '_Z3funv:entry'
SelectionDAG has 12 nodes:
...
0x7fad7102cc10: <multiple use>
0x7fad7102cf10: ch = store 0x7fad7102cd10, 0x7fad7102ca10, 0x7fad7102ce10,
0x7fad7102cc10<ST4[%c]> [ORD=2] [ID=-3]

0x7fad7102d010: i32 = GlobalAddress<i32* @gI> 0 [ORD=3] [ID=-3]

0x7fad7102cc10: <multiple use>
0x7fad7102d110: i32, ch = load 0x7fad7102cf10, 0x7fad7102d010,
0x7fad7102cc10<LD4[@gI]> [ORD=3] [ID=-3]
...
Legalized selection DAG: BB#0 '_Z3funv:entry'
SelectionDAG has 15 nodes:
0x7ff3c9c10b98: ch = EntryToken [ORD=1] [ID=0]
...
0x7fad7102cc10: <multiple use>
0x7fad7102cf10: ch = store 0x7fad7102cd10, 0x7fad7102ca10, 0x7fad7102ce10,
0x7fad7102cc10<ST4[%c]> [ORD=2] [ID=8]

0x7fad70c10b98: <multiple use>
0x7fad7102d610: i32 = Register %GP

0x7fad7102d310: i32 = TargetGlobalAddress<i32* @gI> 0 [TF=1]

0x7fad7102d710: i32 = Cpu0ISD::Wrapper 0x7fad7102d610, 0x7fad7102d310

0x7fad7102cc10: <multiple use>
0x7fad7102d810: i32, ch = load 0x7fad70c10b98, 0x7fad7102d710,
0x7fad7102cc10<LD4 [<unknown>]>

0x7ff3ca02cc10: <multiple use>
0x7ff3ca02d110: i32, ch = load 0x7ff3ca02cf10, 0x7ff3ca02d810,
0x7ff3ca02cc10<LD4[@gI]> [ORD=3] [ID=9]

...
.set noreorder
.cupload $6
.set nomacro
...
ld $2, %got(gI)($gp)
ld $2, 0($2)
...
.type gStart,@object      # @gStart
.data
.globl gStart
.align 2
gStart:
.4byte 2                  # 0x2
.size gStart, 4

.type gI,@object          # @gI
.globl gI
.align 2
gI:
.4byte 100                # 0x64
.size gI, 4

```

The information above is summarized in Table 6.1 (Cpu0 global variable options).

Table 6.1: Cpu0 global variable options

| option name             | default | other option value | description                                                                                                                                         |
|-------------------------|---------|--------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| -relocation-model       | pic     | static             | pic: Position Independent Address<br>static: Absolute Address                                                                                        |
| -cpu0-use-small-section | false   | true               | false: .data or .bss, 32 bits addressable<br>true: .sdata or .sbss, 16 bits addressable                                                              |

Table 6.2: Cpu0 DAGs and instructions for -relocation-model=static

| option: cpu0-use-small-section | false                                                                     | true                                                    |
|--------------------------------|---------------------------------------------------------------------------|---------------------------------------------------------|
| addressing mode                | absolute                                                                  | \$gp relative                                           |
| addressing                     | absolute                                                                  | \$gp+offset                                             |
| Legalized selection DAG        | (add Cpu0ISD::Hi<gI offset Hi16><br>Cpu0ISD::Lo<gI offset Lo16>)          | (add GLOBAL_OFFSET_TABLE,<br>Cpu0ISD::GPRel<gI offset>) |
| Cpu0                           | addiu \$2, \$zero, %hi(gI); shl \$2, \$2, 16;<br>addiu \$2, \$2, %lo(gI); | addiu \$2, \$gp, %gp_rel(gI);                           |
| relocation records solved      | link time                                                                 | link time                                               |

- In static mode with `cpu0-use-small-section=true`, the offset between `gI` and `.data` can be calculated because `$gp` is assigned the fixed start address of the global address table.
- In static mode with `cpu0-use-small-section=false`, the high and low parts of the `gI` address (`%hi(gI)` and `%lo(gI)`) are translated into an absolute address.

Table 6.3: Cpu0 DAGs and instructions for `-relocation-model=pic`

| option: cpu0-use-small-section | false                                        | true                                                                                                                  |
|--------------------------------|----------------------------------------------|-----------------------------------------------------------------------------------------------------------------------|
| addressing mode                | \$gp relative                                | \$gp relative                                                                                                         |
| addressing                     | \$gp+offset                                  | \$gp+offset                                                                                                           |
| Legalized selection DAG        | (load (Cpu0ISD::Wrapper %GP, <gI offset>))   | (load EntryToken, (Cpu0ISD::Wrapper (add Cpu0ISD::Hi<gI offset Hi16>, Register %GP),<br>Cpu0ISD::Lo<gI offset Lo16>)) |
| Cpu0                           | ld \$2, %got(gI)(\$gp);                      | addiu \$2, \$zero, %got_hi(gI); shl \$2, \$2, 16; add \$2, \$2, \$gp;<br>ld \$2, %got_lo(gI)(\$2);                    |
| relocation records solved      | link/load time                               | link/load time                                                                                                        |

- In `pic` mode, the offset between `gI` and `.data` cannot be calculated if the function is loaded at run time (dynamic link); it can be calculated if static link is used.
- In C, all variable names are bound statically. In C++, overloaded variables or functions are bound dynamically.

According to system programming textbooks, there are two addressing modes: Absolute Addressing Mode and Position Independent Addressing Mode. A dynamically linked function must be compiled with Position Independent Addressing Mode. In principle, the option `-relocation-model` selects between Absolute Addressing and Position Independent Addressing. The exception is `-relocation-model=static` combined with `-cpu0-use-small-section=true`. In this case, register `$gp` is reserved and set to the start address of the global variable area, and Cpu0 uses `$gp` relative addressing.

To support global variables, first add the command line variable **UseSmallSectionOpt** to `Cpu0Subtarget.cpp`. The user can then run `llc -cpu0-use-small-section=false` to set **UseSmallSectionOpt** to false. **UseSmallSectionOpt** defaults to false when the option is not specified. For more about **cl::opt** command line variables, refer to <sup>1</sup>.

### LLVMBackendTutorialExampleCode/Chapter6\_1/Cpu0Subtarget.h

```
class Cpu0Subtarget : public Cpu0GenSubtargetInfo {
    ...
    // UseSmallSection - Small section is used.
    bool UseSmallSection;
    ...
    bool useSmallSection() const { return UseSmallSection; }
};
```

### LLVMBackendTutorialExampleCode/Chapter6\_1/Cpu0Subtarget.cpp

```
static cl::opt<bool>
UseSmallSectionOpt("cpu0-use-small-section", cl::Hidden, cl::init(false),
                  cl::desc("Use small section. Only work with -relocation-model=" "static. pic always not use small section."));
```

<sup>1</sup> <http://llvm.org/docs/CommandLine.html>

Next, add the files Cpu0TargetObjectFile.h and Cpu0TargetObjectFile.cpp, and add the following code to Cpu0RegisterInfo.cpp and Cpu0ISelLowering.cpp.

**LLVMBackendTutorialExampleCode/Chapter6\_1/Cpu0TargetObjectFile.h**

```
//===-- llvm/Target/Cpu0TargetObjectFile.h - Cpu0 Object Info ---*- C++ -*-===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_TARGET_CPU0_TARGETOBJECTFILE_H
#define LLVM_TARGET_CPU0_TARGETOBJECTFILE_H

#include "llvm/CodeGen/TargetLoweringObjectFileImpl.h"

namespace llvm {

  class Cpu0TargetObjectFile : public TargetLoweringObjectFileELF {
    const MCSection *SmallDataSection;
    const MCSection *SmallBSSSection;
  public:

    void Initialize(MCContext &Ctx, const TargetMachine &TM);

    /// IsGlobalInSmallSection - Return true if this global address should be
    /// placed into small data/bss section.
    bool IsGlobalInSmallSection(const GlobalValue *GV,
                                const TargetMachine &TM, SectionKind Kind) const;
    bool IsGlobalInSmallSection(const GlobalValue *GV,
                                const TargetMachine &TM) const;

    const MCSection *SelectSectionForGlobal(const GlobalValue *GV,
                                            SectionKind Kind,
                                            Mangler *Mang,
                                            const TargetMachine &TM) const;

    // TODO: Classify globals as cpu0 wishes.
  };
} // end namespace llvm

#endif
```

**LLVMBackendTutorialExampleCode/Chapter6\_1/Cpu0TargetObjectFile.cpp**

```
//===-- Cpu0TargetObjectFile.cpp - Cpu0 Object Files ----------------------===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//

#include "Cpu0TargetObjectFile.h"
#include "Cpu0Subtarget.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCSectionELF.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/ELF.h"
using namespace llvm;

static cl::opt<unsigned>
SSThreshold("cpu0-ssection-threshold", cl::Hidden,
            cl::desc("Small data and bss section threshold size (default=8)"),
            cl::init(8));

void Cpu0TargetObjectFile::Initialize(MCContext &Ctx, const TargetMachine &TM) {
  TargetLoweringObjectFileELF::Initialize(Ctx, TM);

  SmallDataSection =
    getContext().getELFSection(".sdata", ELF::SHT_PROGBITS,
                               ELF::SHF_WRITE | ELF::SHF_ALLOC,
                               SectionKind::getDataRel());

  SmallBSSSection =
    getContext().getELFSection(".sbss", ELF::SHT_NOBITS,
                               ELF::SHF_WRITE | ELF::SHF_ALLOC,
                               SectionKind::getBSS());

}

// An address must be loaded from a small section if its size is less than the
// small section size threshold. Data in this section must be addressed using
// gp_rel operator.
static bool IsInSmallSection(uint64_t Size) {
  return Size > 0 && Size <= SSThreshold;
}

bool Cpu0TargetObjectFile::IsGlobalInSmallSection(const GlobalValue *GV,
                                                  const TargetMachine &TM) const {
  if (GV->isDeclaration() || GV->hasAvailableExternallyLinkage())
    return false;

  return IsGlobalInSmallSection(GV, TM, getKindForGlobal(GV, TM));
}

/// IsGlobalInSmallSection - Return true if this global address should be
/// placed into small data/bss section.
bool Cpu0TargetObjectFile::
IsGlobalInSmallSection(const GlobalValue *GV, const TargetMachine &TM,
                       SectionKind Kind) const {

  // Only use small section for non linux targets.
  const Cpu0Subtarget &Subtarget = TM.getSubtarget<Cpu0Subtarget>();

  // Return if small section is not available.
  if (!Subtarget.useSmallSection())
    return false;

  // Only global variables, not functions.
  const GlobalVariable *GVA = dyn_cast<GlobalVariable>(GV);
  if (!GVA)
    return false;

  // We can only do this for datarel or BSS objects for now.
  if (!Kind.isBSS() && !Kind.isDataRel())
    return false;

  // If this is an internal constant string, there is a special
  // section for it, but not in small data/bss.
  if (Kind.isMergeable1ByteCString())
    return false;

  Type *Ty = GV->getType()->getElementType();
  return IsInSmallSection(TM.getDataLayout()->getTypeAllocSize(Ty));
}

const MCSection *Cpu0TargetObjectFile::
SelectSectionForGlobal(const GlobalValue *GV, SectionKind Kind,
                       Mangler *Mang, const TargetMachine &TM) const {
  // TODO: Could also support "weak" symbols as well with ".gnu.linkonce.s.*"
  // sections?

  // Handle Small Section classification here.
  if (Kind.isBSS() && IsGlobalInSmallSection(GV, TM, Kind))
    return SmallBSSSection;
  if (Kind.isDataNoRel() && IsGlobalInSmallSection(GV, TM, Kind))
    return SmallDataSection;

  // Otherwise, we work the same as ELF.
  return TargetLoweringObjectFileELF::SelectSectionForGlobal(GV, Kind, Mang, TM);
}
```
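The sequence of checks in `IsGlobalInSmallSection()` can be tried out without LLVM. The following is a minimal sketch only: `GlobalInfo`, `kSmallSectionThreshold`, and the lower-case function names are hypothetical stand-ins for the LLVM types and the `cpu0-ssection-threshold` option, and the declaration/linkage check of the two-argument overload is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the cpu0-ssection-threshold option (default 8).
const std::uint64_t kSmallSectionThreshold = 8;

// Same size test as IsInSmallSection(): non-empty and within the threshold.
inline bool isInSmallSection(std::uint64_t size) {
  return size > 0 && size <= kSmallSectionThreshold;
}

// Hypothetical summary of the properties the real function reads from
// GlobalValue and SectionKind.
struct GlobalInfo {
  bool isVariable;       // a GlobalVariable, not a function
  bool isBssOrDataRel;   // Kind.isBSS() || Kind.isDataRel()
  bool isCString;        // Kind.isMergeable1ByteCString()
  std::uint64_t allocSize;
};

// Mirrors the order of the early returns in IsGlobalInSmallSection().
inline bool isGlobalInSmallSection(const GlobalInfo &g, bool useSmallSection) {
  if (!useSmallSection) return false;   // option is off
  if (!g.isVariable) return false;      // only global variables
  if (!g.isBssOrDataRel) return false;  // only datarel or BSS objects
  if (g.isCString) return false;        // strings have their own section
  return isInSmallSection(g.allocSize);
}

// Usage: a 4-byte int such as gI qualifies, a 16-byte array does not.
inline void smallSectionExample() {
  GlobalInfo gI = {true, true, false, 4};
  GlobalInfo arr = {true, true, false, 16};
  assert(isGlobalInSmallSection(gI, true));
  assert(!isGlobalInSmallSection(arr, true));
}
```

With the default threshold of 8, a 4-byte `int` is placed in the small section only when the small section option is enabled.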

### LLVMBackendTutorialExampleCode/Chapter6\_1/Cpu0RegisterInfo.cpp

```

// pure virtual method
BitVector Cpu0RegisterInfo::
getReservedRegs(const MachineFunction &MF) const {
    ...
    // Reserve GP if small section is used.
    if (Subtarget.useSmallSection()) {
        Reserved.set(Cpu0::GP);
    }
    ...
}

```

**LLVMBackendTutorialExampleCode/Chapter6\_1/Cpu0ISelLowering.cpp**

```

#include "Cpu0MachineFunction.h"
...
#include "Cpu0TargetObjectFile.h"
...
#include "MCTargetDesc/Cpu0BaseInfo.h"
...
#include "llvm/Support/CommandLine.h"
SDValue Cpu0TargetLowering::getGlobalReg(SelectionDAG &DAG, EVT Ty) const {
    Cpu0FunctionInfo *FI = DAG.getMachineFunction().getInfo<Cpu0FunctionInfo>();
    return DAG.getRegister(FI->getGlobalBaseReg(), Ty);
}

static SDValue getTargetNode(SDValue Op, SelectionDAG &DAG, unsigned Flag) {
    EVT Ty = Op.getValueType();

    if (GlobalAddressSDNode *N = dyn_cast<GlobalAddressSDNode>(Op))
        return DAG.getTargetGlobalAddress(N->getGlobal(), Op.getDebugLoc(), Ty, 0,
                                         Flag);
    if (ExternalSymbolSDNode *N = dyn_cast<ExternalSymbolSDNode>(Op))
        return DAG.getTargetExternalSymbol(N->getSymbol(), Ty, Flag);
    if (BlockAddressSDNode *N = dyn_cast<BlockAddressSDNode>(Op))
        return DAG.getTargetBlockAddress(N->getBlockAddress(), Ty, 0, Flag);
    if (JumpTableSDNode *N = dyn_cast<JumpTableSDNode>(Op))
        return DAG.getTargetJumpTable(N->getIndex(), Ty, Flag);
    if (ConstantPoolSDNode *N = dyn_cast<ConstantPoolSDNode>(Op))
        return DAG.getTargetConstantPool(N->getConstVal(), Ty, N->getAlignment(),
                                         N->getOffset(), Flag);

    llvm_unreachable("Unexpected node type.");
    return SDValue();
}

SDValue Cpu0TargetLowering::getAddrLocal(SDValue Op, SelectionDAG &DAG) const {
    DebugLoc DL = Op.getDebugLoc();
    EVT Ty = Op.getValueType();
    unsigned GOTFlag = Cpu0II::MO_GOT;
    SDValue GOT = DAG.getNode(Cpu0ISD::Wrapper, DL, Ty, getGlobalReg(DAG, Ty),
                             getTargetNode(Op, DAG, GOTFlag));
    SDValue Load = DAG.getLoad(Ty, DL, DAG.getEntryNode(), GOT,
                               MachinePointerInfo::getGOT(), false, false, false,
                               0);
    unsigned LoFlag = Cpu0II::MO_ABS_LO;
    SDValue Lo = DAG.getNode(Cpu0ISD::Lo, DL, Ty, getTargetNode(Op, DAG, LoFlag));
    return DAG.getNode(ISD::ADD, DL, Ty, Load, Lo);
}

SDValue Cpu0TargetLowering::getAddrGlobal(SDValue Op, SelectionDAG &DAG,
                                         unsigned Flag) const {
    DebugLoc DL = Op.getDebugLoc();
    EVT Ty = Op.getValueType();
    SDValue Tgt = DAG.getNode(Cpu0ISD::Wrapper, DL, Ty, getGlobalReg(DAG, Ty),
                             getTargetNode(Op, DAG, Flag));
    return DAG.getLoad(Ty, DL, DAG.getEntryNode(), Tgt,
                       MachinePointerInfo::getGOT(), false, false, false, 0);
}


SDValue Cpu0TargetLowering::getAddrGlobalLargeGOT(SDValue Op, SelectionDAG &DAG,
                                                unsigned HiFlag,
                                                unsigned LoFlag) const {
    DebugLoc DL = Op.getDebugLoc();
    EVT Ty = Op.getValueType();
    SDValue Hi = DAG.getNode(Cpu0ISD::Hi, DL, Ty, getTargetNode(Op, DAG, HiFlag));
    Hi = DAG.getNode(ISD::ADD, DL, Ty, Hi, getGlobalReg(DAG, Ty));
    SDValue Wrapper = DAG.getNode(Cpu0ISD::Wrapper, DL, Ty, Hi,
                                   getTargetNode(Op, DAG, LoFlag));
    return DAG.getLoad(Ty, DL, DAG.getEntryNode(), Wrapper,
                        MachinePointerInfo::getGOT(), false, false, false, 0);
}

const char *Cpu0TargetLowering::getTargetNodeName(unsigned Opcode) const {
    switch (Opcode) {
        case Cpu0ISD::JmpLink: return "Cpu0ISD::JmpLink";
        case Cpu0ISD::Hi: return "Cpu0ISD::Hi";
        case Cpu0ISD::Lo: return "Cpu0ISD::Lo";
        case Cpu0ISD::GPRel: return "Cpu0ISD::GPRel";
        case Cpu0ISD::Ret: return "Cpu0ISD::Ret";
        case Cpu0ISD::DivRem: return "Cpu0ISD::DivRem";
        case Cpu0ISD::DivRemU: return "Cpu0ISD::DivRemU";
        case Cpu0ISD::Wrapper: return "Cpu0ISD::Wrapper";
        default: return NULL;
    }
}

Cpu0TargetLowering::
Cpu0TargetLowering(Cpu0TargetMachine &TM)
: TargetLowering(TM, new Cpu0TargetObjectFile()),
  Subtarget(&TM.getSubtarget<Cpu0Subtarget>()) {
    ...
    // Cpu0 Custom Operations
    setOperationAction(ISD::GlobalAddress, MVT::i32, Custom);
    ...
}

SDValue Cpu0TargetLowering::
LowerOperation(SDValue Op, SelectionDAG &DAG) const
{
    switch (Op.getOpcode())
    {
        case ISD::GlobalAddress: return LowerGlobalAddress(Op, DAG);
    }
    return SDValue();
}

//=====//
// Lower helper functions
//=====//

//=====//
// Misc Lower Operation implementation
//=====//

SDValue Cpu0TargetLowering::LowerGlobalAddress(SDValue Op,
                                               SelectionDAG &DAG) const {
    // FIXME there isn't actually debug info here
    DebugLoc dl = Op.getDebugLoc();
    const GlobalValue *GV = cast<GlobalAddressSDNode>(Op)->getGlobal();
    // Declared at function scope because both the static and the pic paths use it.
    Cpu0TargetObjectFile &TLOF = (Cpu0TargetObjectFile&)getObjFileLowering();

    if (getTargetMachine().getRelocationModel() != Reloc::PIC_) {
        SDVTList VTs = DAG.getVTList(MVT::i32);

        // %gp_rel relocation
        if (TLOF.IsGlobalInSmallSection(GV, getTargetMachine())) {
            SDValue GA = DAG.getTargetGlobalAddress(GV, dl, MVT::i32, 0,
                                                    Cpu0II::MO_GPREL);
            SDValue GPRelNode = DAG.getNode(Cpu0ISD::GPRel, dl, VTs, &GA, 1);
            SDValue GOT = DAG.getGLOBAL_OFFSET_TABLE(MVT::i32);
            return DAG.getNode(ISD::ADD, dl, MVT::i32, GOT, GPRelNode);
        }
        // %hi/%lo relocation
        SDValue GAHi = DAG.getTargetGlobalAddress(GV, dl, MVT::i32, 0,
                                                  Cpu0II::MO_ABS_HI);
        SDValue GALo = DAG.getTargetGlobalAddress(GV, dl, MVT::i32, 0,
                                                  Cpu0II::MO_ABS_LO);
        SDValue HiPart = DAG.getNode(Cpu0ISD::Hi, dl, VTs, &GAHi, 1);
        SDValue Lo = DAG.getNode(Cpu0ISD::Lo, dl, MVT::i32, GALo);
        return DAG.getNode(ISD::ADD, dl, MVT::i32, HiPart, Lo);
    }

    if (GV->hasInternalLinkage() || (GV->hasLocalLinkage() && !isa<Function>(GV)))
        return getAddrLocal(Op, DAG);

    if (TLOF.IsGlobalInSmallSection(GV, getTargetMachine()))
        return getAddrGlobal(Op, DAG, Cpu0II::MO_GOT16);
    else
        return getAddrGlobalLargeGOT(Op, DAG, Cpu0II::MO_GOT_HI16,
                                     Cpu0II::MO_GOT_LO16);
}
```

The call `setOperationAction(ISD::GlobalAddress, MVT::i32, Custom)` tells `llc` that the global address operation is implemented in the C++ function `Cpu0TargetLowering::LowerOperation()`. LLVM calls this function only when it needs to translate the IR DAG for loading a global variable into machine code. The constructor `Cpu0TargetLowering()` can register many operations with `setOperationAction(ISD::XXX, MVT::XXX, Custom)`, and each of them makes LLVM call `Cpu0TargetLowering::LowerOperation()` in stage “Legalized selection DAG”. A global address access is identified by checking whether the opcode of the DAG node equals `ISD::GlobalAddress`.
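The dispatch just described can be illustrated with a toy model that does not depend on LLVM. This is only a sketch of the idea: `MiniLowering`, `Opcode`, `Action`, and `legalize()` are hypothetical names invented here, and the real legalizer handles far more cases than the two actions shown.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical miniature of the opcodes and actions; not LLVM's real types.
enum class Opcode { Add, Load, GlobalAddress };
enum class Action { Legal, Custom };

class MiniLowering {
  std::map<Opcode, Action> actions_;

public:
  MiniLowering() {
    // Counterpart of setOperationAction(ISD::GlobalAddress, MVT::i32, Custom).
    setOperationAction(Opcode::GlobalAddress, Action::Custom);
  }

  void setOperationAction(Opcode op, Action a) { actions_[op] = a; }

  // Operations that were never registered are treated as Legal.
  Action getOperationAction(Opcode op) const {
    std::map<Opcode, Action>::const_iterator it = actions_.find(op);
    return it == actions_.end() ? Action::Legal : it->second;
  }

  // Counterpart of LowerOperation(): dispatch on the node's opcode.
  std::string lowerOperation(Opcode op) const {
    switch (op) {
    case Opcode::GlobalAddress: return "LowerGlobalAddress";
    default: return "";  // no custom lowering for this opcode
    }
  }

  // Only nodes whose action is Custom reach lowerOperation().
  std::string legalize(Opcode op) const {
    if (getOperationAction(op) == Action::Custom)
      return lowerOperation(op);
    return "unchanged";
  }
};

// Usage: a GlobalAddress node is custom lowered, an Add node is left alone.
inline void loweringExample() {
  MiniLowering tl;
  assert(tl.legalize(Opcode::GlobalAddress) == "LowerGlobalAddress");
  assert(tl.legalize(Opcode::Add) == "unchanged");
}
```

Registering a further custom operation in this model takes one more `setOperationAction()` call in the constructor and one more `case` in `lowerOperation()`, which is the same shape the Cpu0 code follows.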

Finally, add the following code in Cpu0InstrInfo.td.

### LLVMBackendTutorialExampleCode/Chapter6\_1/Cpu0InstrInfo.td

```

// Hi and Lo nodes are used to handle global addresses. Used on
// Cpu0ISelLowering to lower stuff like GlobalAddress, ExternalSymbol
// static model. (nothing to do with Cpu0 Registers Hi and Lo)
def Cpu0Hi : SDNode<"Cpu0ISD::Hi", SDTIntUnaryOp>;
def Cpu0Lo : SDNode<"Cpu0ISD::Lo", SDTIntUnaryOp>;
def Cpu0GPRel : SDNode<"Cpu0ISD::GPRel", SDTIntUnaryOp>;
...
// hi/lo relocs
def : Pat<(Cpu0Hi tglobaladdr:$in), (SHL (ADDiu ZERO, tglobaladdr:$in), 16)>;
// Expect cpu0 add LUi support, like Mips
//def : Pat<(Cpu0Hi tglobaladdr:$in), (LUi tglobaladdr:$in)>;
def : Pat<(Cpu0Lo tglobaladdr:$in), (ADDiu ZERO, tglobaladdr:$in)>;

def : Pat<(add CPURegs:$hi, (Cpu0Lo tglobaladdr:$lo)),
          (ADDiu CPURegs:$hi, tglobaladdr:$lo)>;

// gp_rel relocs
def : Pat<(add CPURegs:$gp, (Cpu0GPRel tglobaladdr:$in)),
          (ADDiu CPURegs:$gp, tglobaladdr:$in)>;

```

## 6.1.2 Static mode

As Table 6.1 (Cpu0 global variable options) shows, option `cpu0-use-small-section=false` puts global variables in `data/bss`, while `cpu0-use-small-section=true` puts them in `sdata/sbss`. `sdata` stands for small data area. Sections `data` and `sdata` hold global variables with an initial value (such as `int gI = 100` in this example), while sections `bss` and `sbss` hold global variables without an initial value (for example, `int gI;`).

### data or bss

The `data/bss` sections are 32 bits addressable areas, since Cpu0 is a 32 bits architecture. Option `cpu0-use-small-section=false` generates the following instructions.

```

...
    addiu  $2, $zero, %hi(gI)
    shl    $2, $2, 16
    addiu  $2, $2, %lo(gI)
    ld     $2, 0($2)

...
.type   gStart,@object          # @gStart
.data
.globl  gStart
.align  2

gStart:
    .4byte 2                  # 0x2
    .size   gStart, 4

    .type   gI,@object          # @gI
    .globl  gI
    .align  2

gI:
    .4byte 100                 # 0x64
    .size   gI, 4

```

The code above loads the high 16 bits of the absolute address of `gI` into register `$2` and shifts them left by 16 bits, so `$2` holds the high part of the `gI` absolute address. Next, it adds the low part of the `gI` absolute address to `$2`. At this point, `$2` holds the memory address of `gI`. Finally, the instruction “`ld $2, 0($2)`” loads the content of `gI`.

The option `llc -relocation-model=static` selects absolute address mode, which must be used with static link. Dynamically linked code must be encoded with Position Independent Addressing. A PC relative address can also be solved in static link: in static mode the function `fun()` is included in the whole executable ELF file, so the offset between `.data` and the instruction “`addiu $2, $zero, %hi(gI)`” can be calculated. With PC relative address coding, the program can be loaded at any address and run correctly there. If the program uses absolute addresses and is loaded at a specific address known at link stage, the relocation records of the instructions that access `gI`, such as “`addiu $2, $zero, %hi(gI)`” and “`addiu $2, $2, %lo(gI)`”, can be solved at link time. If the program uses absolute addresses and the loading address is only known at load time, these relocation records are solved by the loader at loading time.

`IsGlobalInSmallSection()` returns true or false depending on `UseSmallSectionOpt`.

The following code fragment of `LowerGlobalAddress()`, which corresponds to option `llc -relocation-model=static -cpu0-use-small-section=false`, translates DAG (`GlobalAddress<i32* @gI> 0`) into (`add Cpu0ISD::Hi<gI offset Hi16> Cpu0ISD::Lo<gI offset Lo16>`) in stage “Legalized selection DAG”, as shown below.

### LLVMBackendTutorialExampleCode/Chapter6\_1/Cpu0ISelLowering.cpp

```
// Cpu0ISelLowering.cpp
...
// %hi/%lo relocation
SDValue GAHi = DAG.getTargetGlobalAddress(GV, dl, MVT::i32, 0,
                                         Cpu0II::MO_ABS_HI);
SDValue GALo = DAG.getTargetGlobalAddress(GV, dl, MVT::i32, 0,
                                         Cpu0II::MO_ABS_LO);
SDValue HiPart = DAG.getNode(Cpu0ISD::Hi, dl, VTs, &GAHi, 1);
SDValue Lo = DAG.getNode(Cpu0ISD::Lo, dl, MVT::i32, GALo);
return DAG.getNode(ISD::ADD, dl, MVT::i32, HiPart, Lo);

118-165-78-166:InputFiles Jonathan$ clang -c ch6_1.cpp -emit-llvm -o ch6_1.bc
118-165-78-166:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=static -cpu0-use-small-section=false
-filetype=asm -debug ch6_1.bc -o -
...
Type-legalized selection DAG: BB#0 '_Z3funv:entry'
SelectionDAG has 12 nodes:
...
0x7ffd5902cc10: <multiple use>
0x7ffd5902cf10: ch = store 0x7ffd5902cd10, 0x7ffd5902ca10, 0x7ffd5902ce10,
0x7ffd5902cc10<ST4[%c]> [ORD=2] [ID=-3]

0x7ffd5902d010: i32 = GlobalAddress<i32* @gI> 0 [ORD=3] [ID=-3]

0x7ffd5902cc10: <multiple use>
0x7ffd5902d110: i32,ch = load 0x7ffd5902cf10, 0x7ffd5902d010,
0x7ffd5902cc10<LD4[@gI]> [ORD=3] [ID=-3]
...
Legalized selection DAG: BB#0 '_Z3funv:entry'
SelectionDAG has 16 nodes:
...
0x7ffd5902cc10: <multiple use>
0x7ffd5902cf10: ch = store 0x7ffd5902cd10, 0x7ffd5902ca10, 0x7ffd5902ce10,
0x7ffd5902cc10<ST4[%c]> [ORD=2] [ID=8]

0x7ffd5902d310: i32 = TargetGlobalAddress<i32* @gI> 0 [TF=5]

0x7ffd5902d710: i32 = Cpu0ISD::Hi 0x7ffd5902d310

0x7ffd5902d610: i32 = TargetGlobalAddress<i32* @gI> 0 [TF=6]

0x7ffd5902d810: i32 = Cpu0ISD::Lo 0x7ffd5902d610

0x7ffd5902fe10: i32 = add 0x7ffd5902d710, 0x7ffd5902d810

0x7ffd5902cc10: <multiple use>
0x7ffd5902d110: i32,ch = load 0x7ffd5902cf10, 0x7ffd5902fe10,
0x7ffd5902cc10<LD4[@gI]> [ORD=3] [ID=9]
```

Finally, the following patterns defined in Cpu0InstrInfo.td translate DAG (add Cpu0ISD::Hi<gI offset Hi16> Cpu0ISD::Lo<gI offset Lo16>) into the Cpu0 instructions shown below.

### LLVMBackendTutorialExampleCode/Chapter6\_1/Cpu0InstrInfo.td

```
// Hi and Lo nodes are used to handle global addresses. Used on
// Cpu0ISelLowering to lower stuff like GlobalAddress, ExternalSymbol
// static model. (nothing to do with Cpu0 Registers Hi and Lo)
def Cpu0Hi      : SDNode<"Cpu0ISD::Hi", SDTIntUnaryOp>;
def Cpu0Lo      : SDNode<"Cpu0ISD::Lo", SDTIntUnaryOp>;
...
// hi/lo relocs
def : Pat<(Cpu0Hi tglobaladdr:$in), (SHL (ADDiu ZERO, tglobaladdr:$in), 16)>;
// Expect cpu0 add LUi support, like Mips
//def : Pat<(Cpu0Hi tglobaladdr:$in), (LUi tglobaladdr:$in)>;
def : Pat<(Cpu0Lo tglobaladdr:$in), (ADDiu ZERO, tglobaladdr:$in)>;
def : Pat<(add CPURegs:$hi, (Cpu0Lo tglobaladdr:$lo)),
          (ADDiu CPURegs:$hi, tglobaladdr:$lo)>;
...
    addiu  $2, $zero, %hi(gI)
    shl    $2, $2, 16
    addiu $2, $2, %lo(gI)
    ...
```

As shown above, `Pat<(...),(...)>` includes two lists of DAGs. The left one is the IR DAG and the right one is the machine instruction DAG.

- `Pat<(Cpu0Hi tglobaladdr:$in), (SHL (ADDiu ZERO, tglobaladdr:$in), 16)>;` translates DAG (Cpu0ISD::Hi tglobaladdr) into (shl (addiu ZERO, tglobaladdr), 16).
- `Pat<(Cpu0Lo tglobaladdr:$in), (ADDiu ZERO, tglobaladdr:$in)>;` translates DAG (Cpu0ISD::Lo tglobaladdr) into (addiu ZERO, tglobaladdr).
- `Pat<(add CPURegs:$hi, (Cpu0Lo tglobaladdr:$lo)), (ADDiu CPURegs:$hi, tglobaladdr:$lo)>;` translates DAG (add Cpu0ISD::Hi, Cpu0ISD::Lo) into the Cpu0 instruction (addiu `$hi`, tglobaladdr), where `$hi` is the register that holds the result of the Cpu0ISD::Hi node.
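The effect of the three instructions selected by these patterns can be checked with plain integer arithmetic. The sketch below assumes a made-up address for `gI` and assumes the low half is added as an unsigned 16-bit value; if the hardware sign-extends the `addiu` immediate, `%hi` has to compensate for that, which this sketch ignores. The helper names `hi16`, `lo16`, and `materializeAddress` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: the halves that %hi(gI) and %lo(gI) stand for.
inline std::uint32_t hi16(std::uint32_t addr) { return addr >> 16; }
inline std::uint32_t lo16(std::uint32_t addr) { return addr & 0xFFFFu; }

// Replays the three-instruction sequence on a 32-bit register value.
inline std::uint32_t materializeAddress(std::uint32_t addr) {
  std::uint32_t r2 = hi16(addr);  // addiu $2, $zero, %hi(gI)
  r2 <<= 16;                      // shl   $2, $2, 16
  r2 += lo16(addr);               // addiu $2, $2, %lo(gI)
  return r2;
}

// Usage: the sequence rebuilds the full (made-up) address of gI.
inline void hiLoExample() {
  const std::uint32_t gI = 0x00012344u;  // assumed address, for illustration
  assert(hi16(gI) == 0x0001u);
  assert(lo16(gI) == 0x2344u);
  assert(materializeAddress(gI) == gI);
}
```

Because both halves are fixed once the absolute address of `gI` is known, the linker (or loader) can fill them in directly, which is what the relocation records for `%hi(gI)` and `%lo(gI)` request.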

### sdata or sbss

The sdata/sbss sections are 16 bits addressable areas which are planned in ELF for fast access. Option cpu0-use-small-section=true generates the following instructions.

```
addiu  $2, $gp, %gp_rel(gI)
ld     $2, 0($2)
...
.type  gStart,@object          # @gStart
.section .sdata,"aw",@progbits
.globl gStart
.align 2
gStart:
    .4byte 2                  # 0x2
    .size   gStart, 4

    .type  gI,@object          # @gI
    .globl gI
    .align 2
gI:
    .4byte 100                 # 0x64
    .size   gI, 4
```

The following code fragment of `LowerGlobalAddress()`, which corresponds to option `llc -relocation-model=static -cpu0-use-small-section=true`, translates DAG (`GlobalAddress<i32* @gI> 0`) into (`add GLOBAL_OFFSET_TABLE Cpu0ISD::GPRel<gI offset>`) in stage “Legalized selection DAG”, as shown below.

### LLVMBackendTutorialExampleCode/Chapter6\_1/Cpu0ISelLowering.cpp

```
// Cpu0ISelLowering.cpp
...
// %gp_rel relocation
if (TLOF.IsGlobalInSmallSection(GV, getTargetMachine())) {
    SDValue GA = DAG.getTargetGlobalAddress(GV, dl, MVT::i32, 0,
                                             Cpu0II::MO_GPREL);
    SDValue GPRelNode = DAG.getNode(Cpu0ISD::GPRel, dl, VTs, &GA, 1);
    SDValue GOT = DAG.getGLOBAL_OFFSET_TABLE(MVT::i32);
    return DAG.getNode(ISD::ADD, dl, MVT::i32, GOT, GPRelNode);
}

...
Type-legalized selection DAG: BB#0 '_Z3funv:entry'
SelectionDAG has 12 nodes:
...
0x7fc5f382cc10: <multiple use>
0x7fc5f382cf10: ch = store 0x7fc5f382cd10, 0x7fc5f382ca10, 0x7fc5f382ce10,
0x7fc5f382cc10<ST4[%c]> [ORD=2] [ID=-3]

0x7fc5f382d010: i32 = GlobalAddress<i32* @gI> 0 [ORD=3] [ID=-3]

0x7fc5f382cc10: <multiple use>
0x7fc5f382d110: i32, ch = load 0x7fc5f382cf10, 0x7fc5f382d010,
0x7fc5f382cc10<LD4[@gI]> [ORD=3] [ID=-3]

Legalized selection DAG: BB#0 '_Z3funv:entry'
SelectionDAG has 15 nodes:
...
0x7fc5f382cc10: <multiple use>
0x7fc5f382cf10: ch = store 0x7fc5f382cd10, 0x7fc5f382ca10, 0x7fc5f382ce10,
0x7fc5f382cc10<ST4[%c]> [ORD=2] [ID=8]

0x7fc5f382d710: i32 = GLOBAL_OFFSET_TABLE

0x7fc5f382d310: i32 = TargetGlobalAddress<i32* @gI> 0 [TF=4]

0x7fc5f382d610: i32 = Cpu0ISD::GPRel 0x7fc5f382d310

0x7fc5f382d810: i32 = add 0x7fc5f382d710, 0x7fc5f382d610

0x7fc5f382cc10: <multiple use>
0x7fc5f382d110: i32, ch = load 0x7fc5f382cf10, 0x7fc5f382d810,
0x7fc5f382cc10<LD4[@gI]> [ORD=3] [ID=9]
...
```

Finally, the following pattern defined in `Cpu0InstrInfo.td` translates DAG (`add GLOBAL_OFFSET_TABLE Cpu0ISD::GPRel<gI offset>`) into the Cpu0 instruction shown below. The following code in `Cpu0ISelDAGToDAG.cpp` translates `GLOBAL_OFFSET_TABLE` into `$gp`.

#### LLVMBackendTutorialExampleCode/Chapter6\_1/Cpu0ISelDAGToDAG.cpp

```

/// getGlobalBaseReg - Output the instructions required to put the
/// GOT address into a register.
SDNode *Cpu0DAGToDAGISel::getGlobalBaseReg() {
    unsigned GlobalBaseReg = MF->getInfo<Cpu0FunctionInfo>()->getGlobalBaseReg();
    return CurDAG->getRegister(GlobalBaseReg, TLI.getPointerTy()).getNode();
}

/// Select instructions not customized! Used for
/// expanded, promoted and normal instructions
SDNode* Cpu0DAGToDAGISel::Select(SDNode *Node) {
    ...
    // Get target GOT address.
    // For global variables as follows,
    // - @gI = global i32 100, align 4
    // - %2 = load i32* @gI, align 4
    // =>
    // - .cupload $gp
    // - ld      $2, %got(gI) ($gp)
    case ISD::GLOBAL_OFFSET_TABLE:
        return getGlobalBaseReg();
    ...
}

```

#### LLVMBackendTutorialExampleCode/Chapter6\_1/Cpu0InstrInfo.td

```

// Cpu0InstrInfo.td
def Cpu0GPRel : SDNode<"Cpu0ISD::GPRel", SDTIntUnaryOp>;
...
// gp_rel relocs
def : Pat<(add CPURegs:$gp, (Cpu0GPRel tglobaladdr:$in)),
          (ADD CPURegs:$gp, (ADDiu ZERO, tglobaladdr:$in))>;
addiu  $2, $gp, %gp_rel(gI)
...

```

`Pat<(add CPURegs:$gp, (Cpu0GPRel tglobaladdr:$in)), (ADD CPURegs:$gp, (ADDiu ZERO, tglobaladdr:$in))>;` translates (add `$gp`, Cpu0ISD::GPRel tglobaladdr) into (add `$gp`, (addiu ZERO, tglobaladdr)).

In this mode, the content of `$gp` is assigned at compile/link time, changes only when the program is loaded, and stays fixed while the program runs. With `-relocation-model=pic`, by contrast, `$gp` can change while the program runs. In this example, if the loader assigns `$gp` the start address of `.sdata` when the program `ch6_1.cpu0.s` is loaded, the linker can calculate `%gp_rel(gI)` as the relative address distance between `gI` and the start of the `.sdata` section. This means the relocation record can be solved at link time, which is why this is static mode.
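The `%gp_rel(gI)` calculation can be worked through with concrete numbers. The sketch below uses made-up addresses, with `.sdata` (and therefore `$gp`) at 0x1000, `gStart` first and `gI` four bytes later; the function names and the assumption that the offset must fit a signed 16-bit immediate are illustrative, not taken from the Cpu0 sources.

```cpp
#include <cassert>
#include <cstdint>

// Link-time value of %gp_rel(sym): the signed distance from $gp to the symbol.
inline std::int32_t gpRel(std::uint32_t symAddr, std::uint32_t gp) {
  return static_cast<std::int32_t>(symAddr - gp);
}

// Assumed constraint: the offset must fit a signed 16-bit immediate.
inline bool fitsInSimm16(std::int32_t off) {
  return off >= -32768 && off <= 32767;
}

// Run-time effect of "addiu $2, $gp, %gp_rel(gI)".
inline std::uint32_t addressViaGp(std::uint32_t gp, std::int32_t off) {
  return gp + static_cast<std::uint32_t>(off);
}

// Usage with made-up addresses: .sdata starts at 0x1000, gI follows gStart.
inline void gpRelExample() {
  const std::uint32_t gp = 0x1000u;      // assumed start of .sdata
  const std::uint32_t gIAddr = 0x1004u;  // gStart occupies the first 4 bytes
  const std::int32_t off = gpRel(gIAddr, gp);
  assert(off == 4);
  assert(fitsInSimm16(off));
  assert(addressViaGp(gp, off) == gIAddr);
}
```

Since the offset depends only on the layout of `.sdata`, it stays the same wherever the section is loaded, as long as `$gp` is set to the section start.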

In this mode, `$gp` is reserved for a specific fixed address that both the linker and the loader agree on. So `$gp` cannot be allocated as a general purpose register for variables. The following code tells llvm never to allocate `$gp` for variables.

**LLVMBackendTutorialExampleCode/Chapter6\_1/Cpu0Subtarget.cpp**

```
Cpu0Subtarget::Cpu0Subtarget(const std::string &TT, const std::string &CPU,
                            const std::string &FS, bool little,
                            Reloc::Model _RM) :
    Cpu0GenSubtargetInfo(TT, CPU, FS),
    Cpu0ABI(UnknownABI), IsLittle(little), RM(_RM)
{
    ...
    // Set UseSmallSection.
    UseSmallSection = UseSmallSectionOpt;
    if (RM == Reloc::Static && !UseSmallSection)
        FixGlobalBaseReg = false;
    else
        FixGlobalBaseReg = true;
}
```

**LLVMBackendTutorialExampleCode/Chapter6\_1/Cpu0RegisterInfo.cpp**

```
// pure virtual method
BitVector Cpu0RegisterInfo::
getReservedRegs(const MachineFunction &MF) const {
    ...
    const Cpu0FunctionInfo *Cpu0FI = MF.getInfo<Cpu0FunctionInfo>();
    // Reserve GP if globalBaseRegFixed()
    if (Cpu0FI->globalBaseRegFixed()) {
        Reserved.set(Cpu0::GP);
    }
    ...
}
```

### 6.1.3 pic mode

#### sdata or sbss

The option `llc -relocation-model=pic -cpu0-use-small-section=true` generates the following instructions.

```
...
.set noreorder
.cupload    $6
.set nomacro
...
ld      $2, %got(gI)($gp)
ld      $2, 0($2)
...
.type   gStart,@object      # @gStart
.data
.globl  gStart
.align  2
gStart:
.4byte 2                  # 0x2
.size   gStart, 4
.type   gI,@object        # @gI
.globl  gI
.align  2
gI:
    .4byte 100          # 0x64
    .size   gI, 4

```

The following code fragments of Cpu0MachineFunction.h, Cpu0MachineFunction.cpp and Cpu0AsmPrinter.cpp emit the **.cupload** asm pseudo instruction at the function entry point, as shown below.

#### LLVMBackendTutorialExampleCode/Chapter6\_1/Cpu0MachineFunction.h

```

//===== Cpu0MachineFunction.h - Private data used for Cpu0 -----*-- C++ -*==//
...
class Cpu0FunctionInfo : public MachineFunctionInfo {
    virtual void anchor();
    ...

    /// GlobalBaseReg - keeps track of the virtual register initialized for
    /// use as the global base register. This is used for PIC in some PIC
    /// relocation models.
    unsigned GlobalBaseReg;
    int GPFI; // Index of the frame object for restoring $gp
    ...

public:
    Cpu0FunctionInfo(MachineFunction& MF)
        : ..., GlobalBaseReg(0), ...
    {}

    bool globalBaseRegFixed() const;
    bool globalBaseRegSet() const;
    unsigned getGlobalBaseReg();
};

} // end of namespace llvm

#endif // CPU0_MACHINE_FUNCTION_INFO_H

```

#### LLVMBackendTutorialExampleCode/Chapter6\_1/Cpu0MachineFunction.cpp

```

//===== Cpu0MachineFunctionInfo.cpp - Private data used for Cpu0 =====//
//
//          The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//=====-----=====//

#include "Cpu0MachineFunction.h"
#include "Cpu0InstrInfo.h"
#include "Cpu0Subtarget.h"
#include "MCTargetDesc/Cpu0BaseInfo.h"
#include "llvm/IR/Function.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"

using namespace llvm;

bool FixGlobalBaseReg = true;

bool Cpu0FunctionInfo::globalBaseRegFixed() const {
    return FixGlobalBaseReg;
}

bool Cpu0FunctionInfo::globalBaseRegSet() const {
    return GlobalBaseReg;
}

unsigned Cpu0FunctionInfo::getGlobalBaseReg() {
    // Return if it has already been initialized.
    if (GlobalBaseReg)
        return GlobalBaseReg;

    if (FixGlobalBaseReg) // $gp is the global base register.
        return GlobalBaseReg = Cpu0::GP;

    const TargetRegisterClass *RC;
    RC = (const TargetRegisterClass*)&Cpu0::CPURegsRegClass;

    return GlobalBaseReg = MF.getRegInfo().createVirtualRegister(RC);
}

void Cpu0FunctionInfo::anchor() { }

```

### LLVMBackendTutorialExampleCode/Chapter6\_1/Cpu0AsmPrinter.cpp

```

/// EmitFunctionBodyStart - Targets can override this to emit stuff before
/// the first basic block in the function.
void Cpu0AsmPrinter::EmitFunctionBodyStart() {
    ...
    bool EmitCPLoad = (MF->getTarget().getRelocationModel() == Reloc::PIC_) &&
        Cpu0FI->globalBaseRegSet() &&
        Cpu0FI->globalBaseRegFixed();
    if (OutStreamer.hasRawTextSupport()) {
        ...
        OutStreamer.EmitRawText(StringRef("\t.set\tnoreorder"));
        // Emit .cupload directive if needed.
        if (EmitCPLoad)
            OutStreamer.EmitRawText(StringRef("\t.cupload\t$6"));
        OutStreamer.EmitRawText(StringRef("\t.set\tnomacro"));
        if (Cpu0FI->getEmitNOAT())
            OutStreamer.EmitRawText(StringRef("\t.set\tnoat"));
    } else if (EmitCPLoad) {
        SmallVector<MCInst, 4> MCInsts;
        MCInstLowering.LowerCPLOAD(MCInsts);
        for (SmallVector<MCInst, 4>::iterator I = MCInsts.begin();
            I != MCInsts.end(); ++I)
            OutStreamer.EmitInstruction(*I);
    }
}

```

```

...
.set noreorder
.cupload $6
.set nomacro
...

```

The **.cupload** is an assembly directive (macro) that expands to several instructions. Issue **.cupload** before **.set nomacro**, since the **.set nomacro** option causes the assembler to print a warning whenever an assembler operation generates more than one machine language instruction; see the Mips ABI<sup>2</sup>.

The following code expands **.cupload** into machine instructions as below. “09a00000 1eaa0010 09aa0000 13aa6000” is the machine code of the **.cupload** expansion; the corresponding assembly is shown in the comments of **Cpu0MCInstLower.cpp**.

### LLVMBackendTutorialExampleCode/Chapter6\_1/Cpu0MCInstLower.cpp

```

...

static void CreateMCInst(MCInst& Inst, unsigned Opc, const MCOperand& Opnd0,
                         const MCOperand& Opnd1,
                         const MCOperand& Opnd2 = MCOperand()) {
    Inst.setOpcode(Opc);
    Inst.addOperand(Opnd0);
    Inst.addOperand(Opnd1);
    if (Opnd2.isValid())
        Inst.addOperand(Opnd2);
}

// Lower ".cupload $reg" to
// "addiu $gp, $zero, %hi(_gp_disp)"
// "shl $gp, $gp, 16"
// "addiu $gp, $gp, %lo(_gp_disp)"
// "add $gp, $gp, $t9"
void Cpu0MCInstLower::LowerCPLOAD(SmallVector<MCInst, 4>& MCInsts) {
    MCOperand GPRReg = MCOperand::CreateReg(Cpu0::GP);
    MCOperand T9Reg = MCOperand::CreateReg(Cpu0::T9);
    MCOperand ZEROReg = MCOperand::CreateReg(Cpu0::ZERO);
    StringRef SymName("_gp_disp");
    const MCSymbol *Sym = Ctx->GetOrCreateSymbol(SymName);
    const MCSymbolRefExpr *MCSSym;

    MCSSym = MCSymbolRefExpr::Create(Sym, MCSymbolRefExpr::VK_Cpu0_ABS_HI, *Ctx);
    MCOperand SymHi = MCOperand::CreateExpr(MCSSym);
    MCSSym = MCSymbolRefExpr::Create(Sym, MCSymbolRefExpr::VK_Cpu0_ABS_LO, *Ctx);
    MCOperand SymLo = MCOperand::CreateExpr(MCSSym);

    MCInsts.resize(4);

    CreateMCInst(MCInsts[0], Cpu0::ADDiu, GPRReg, ZEROReg, SymHi);
    CreateMCInst(MCInsts[1], Cpu0::SHL, GPRReg, GPRReg, MCOperand::CreateImm(16));
    CreateMCInst(MCInsts[2], Cpu0::ADDiu, GPRReg, GPRReg, SymLo);
    CreateMCInst(MCInsts[3], Cpu0::ADD, GPRReg, GPRReg, T9Reg);
}

```
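To see why the four instructions above rebuild a full 32-bit value, here is a small arithmetic model of the sequence. It is only a sketch under stated assumptions: we assume addiu sign-extends its 16-bit immediate and that %hi compensates for that with a 0x8000 adjustment, as in the Mips convention; whether the Cpu0 assembler of this release does exactly this is not shown in the listing. The names `signExtend16`, `lo16`, `hi16` and `expandCpload` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend a 16-bit immediate, as an addiu-style instruction is assumed to do.
int32_t signExtend16(uint16_t imm) { return (imm ^ 0x8000) - 0x8000; }

// Low 16 bits of the value (%lo).
uint16_t lo16(uint32_t v) { return static_cast<uint16_t>(v & 0xFFFF); }

// High 16 bits (%hi), adjusted by 0x8000 so that adding the sign-extended
// low half later gives back the original value (assumed Mips-style rule).
uint16_t hi16(uint32_t v) { return static_cast<uint16_t>((v + 0x8000u) >> 16); }

// Model of the four-instruction expansion; t9 holds the function address.
uint32_t expandCpload(uint32_t gpDisp, uint32_t t9) {
    uint32_t gp = static_cast<uint32_t>(signExtend16(hi16(gpDisp))); // addiu $gp, $zero, %hi(_gp_disp)
    gp <<= 16;                                                       // shl   $gp, $gp, 16
    gp += static_cast<uint32_t>(signExtend16(lo16(gpDisp)));         // addiu $gp, $gp, %lo(_gp_disp)
    gp += t9;                                                        // add   $gp, $gp, $t9
    return gp;
}
```

For any displacement the result equals `gpDisp + t9` in 32-bit arithmetic, for example `expandCpload(0x0001F000, 0x1000)` gives 0x00020000 even though the low half 0xF000 is negative when sign-extended.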

```

118-165-76-131:InputFiles Jonathan$ /Users/Jonathan/llvm/test/
cmake_debug_build/bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=
obj ch6_1.bc -o ch6_1.cpu0.o

```

<sup>2</sup> <http://www.linux-mips.org/pub/linux/mips/doc/ABI/mipsabi.pdf>

```
118-165-76-131:InputFiles Jonathan$ gobjdump -s ch6_1.cpu0.o

ch6_1.cpu0.o:      file format elf32-big

Contents of section .text:
0000 09a00000 1eaa0010 09aa0000 13aa6000  .....`.
0010 09ddfff8 09200000 022d0004 022d0000  ....-....-...
...
118-165-76-131:InputFiles Jonathan$ gobjdump -tr ch6_1.cpu0.o
...
RELOCATION RECORDS FOR [.text]:
OFFSET   TYPE        VALUE
00000000 UNKNOWN    _gp_disp
00000008 UNKNOWN    _gp_disp
00000020 UNKNOWN    gI
```

---

**Note: // Mips ABI: \_gp\_disp** After calculating the gp, a function allocates the local stack space and saves the gp on the stack, so it can be restored after subsequent function calls. In other words, the gp is a caller saved register.

...

\_gp\_disp represents the offset between the beginning of the function and the global offset table. Various optimizations are possible in this code example and the others that follow. For example, the calculation of gp need not be done for a position-independent function that is strictly local to an object module.

---

The \_gp\_disp above is a relocation record. It means that both machine instructions 09a00000 (offset 0), which corresponds to the assembly “addiu \$gp, \$zero, %hi(\_gp\_disp)”, and 09aa0000 (offset 8), which corresponds to “addiu \$gp, \$gp, %lo(\_gp\_disp)”, are relocation entries that depend on \_gp\_disp. The loader or OS can calculate \_gp\_disp as (x - start address of .data) when it loads the dynamic function at memory address x, and then patch these two instructions accordingly. Since a shared function is loaded when it is called, the relocation record “ld \$2, %got(gI)(\$gp)” cannot be resolved at link time. Although the relocation record is resolved at load time, the name binding is still static: the linker passes the memory address to the loader, and the loader resolves it by calculating the offset directly, with no need to search for the variable name at run time. The ELF relocation records will be introduced in Chapter ELF Support. Don’t worry if you don’t quite understand it at this point.
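The four words at the start of .text can be split into fields to check the claim above by hand. The sketch below assumes the Cpu0 instruction layout of an 8-bit opcode followed by two 4-bit register fields and a 16-bit immediate (with the third register in the top 4 bits of the immediate field for three-register forms); the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Fields of one 32-bit Cpu0 instruction word (assumed layout).
struct Cpu0Fields {
    unsigned opcode; // bits 31..24
    unsigned ra;     // bits 23..20, destination register
    unsigned rb;     // bits 19..16, first source register
    unsigned rc;     // bits 15..12, second source register (three-register forms only)
    unsigned imm16;  // bits 15..0, immediate (immediate forms only)
};

Cpu0Fields decode(uint32_t word) {
    Cpu0Fields f;
    f.opcode = (word >> 24) & 0xFF;
    f.ra     = (word >> 20) & 0xF;
    f.rb     = (word >> 16) & 0xF;
    f.rc     = (word >> 12) & 0xF;
    f.imm16  = word & 0xFFFF;
    return f;
}
```

Decoding 09a00000 gives opcode 0x09 with ra = 10 and rb = 0, which matches “addiu \$gp, \$zero, ...” if register 10 is \$gp; its immediate is still 0 because the \_gp\_disp relocation has not been applied yet. Decoding 1eaa0010 gives an immediate of 16 (the shift amount), and 13aa6000 gives rc = 6, the register printed as \$6 in the **.cupload** directive.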

The following code fragment of LowerGlobalAddress(), which corresponds to the option llc -relocation-model=pic, translates the DAG (GlobalAddress<i32\* @gI> 0) into (load EntryToken, (Cpu0ISD::Wrapper Register %GP, TargetGlobalAddress<i32\* @gI> 0)) in the “Legalized selection DAG” stage, as shown below.

### LLVMBackendTutorialExampleCode/Chapter6\_1/Cpu0ISelLowering.cpp

```
SDValue Cpu0TargetLowering::getAddrGlobal(SDValue Op, SelectionDAG &DAG,
                                         unsigned Flag) const {
    DebugLoc DL = Op.getDebugLoc();
    EVT Ty = Op.getValueType();
    SDValue Tgt = DAG.getNode(Cpu0ISD::Wrapper, DL, Ty, getGlobalReg(DAG, Ty),
                             getTargetNode(Op, DAG, Flag));
    return DAG.getLoad(Ty, DL, DAG.getEntryNode(), Tgt,
                        MachinePointerInfo::getGOT(), false, false, false, 0);
}

SDValue Cpu0TargetLowering::LowerGlobalAddress(SDValue Op,
                                               SelectionDAG &DAG) const {
    ...
    if (TLOF.IsGlobalInSmallSection(GV, getTargetMachine()))
        return getAddrGlobal(Op, DAG, Cpu0II::MO_GOT16);
    ...
}
```

### LLVMBackendTutorialExampleCode/Chapter6\_1/Cpu0ISelDAGToDAG.cpp

```

/// ComplexPattern used on Cpu0InstrInfo
/// Used on Cpu0 Load/Store instructions
bool Cpu0DAGToDAGISel::
SelectAddr(SDNode *Parent, SDValue Addr, SDValue &Base, SDValue &Offset) {
    ...
    // on PIC code Load GA
    if (Addr.getOpcode() == Cpu0ISD::Wrapper) {
        Base = Addr.getOperand(0);
        Offset = Addr.getOperand(1);
        return true;
    }
    ...
}
...

Type-legalized selection DAG: BB#0 '_Z3funv:entry'
SelectionDAG has 12 nodes:
...
0x7fad7102cc10: <multiple use>
0x7fad7102cf10: ch = store 0x7fad7102cd10, 0x7fad7102ca10, 0x7fad7102ce10,
0x7fad7102cc10<ST4[%c]> [ORD=2] [ID=-3]

0x7fad7102d010: i32 = GlobalAddress<i32* @gI> 0 [ORD=3] [ID=-3]

0x7fad7102cc10: <multiple use>
0x7fad7102d110: i32, ch = load 0x7fad7102cf10, 0x7fad7102d010,
0x7fad7102cc10<LD4[@gI]> [ORD=3] [ID=-3]
...
Legalized selection DAG: BB#0 '_Z3funv:entry'
SelectionDAG has 15 nodes:
0x7ff3c9c10b98: ch = EntryToken [ORD=1] [ID=0]
...
0x7fad7102cc10: <multiple use>
0x7fad7102cf10: ch = store 0x7fad7102cd10, 0x7fad7102ca10, 0x7fad7102ce10,
0x7fad7102cc10<ST4[%c]> [ORD=2] [ID=8]

0x7fad70c10b98: <multiple use>
0x7fad7102d610: i32 = Register %GP

0x7fad7102d310: i32 = TargetGlobalAddress<i32* @gI> 0 [TF=1]

0x7fad7102d710: i32 = Cpu0ISD::Wrapper 0x7fad7102d610, 0x7fad7102d310

0x7fad7102cc10: <multiple use>
0x7fad7102d810: i32, ch = load 0x7fad70c10b98, 0x7fad7102d710,
0x7fad7102cc10<LD4[<unknown>]>
0x7ff3ca02cc10: <multiple use>
0x7ff3ca02d110: i32, ch = load 0x7ff3ca02cf10, 0x7ff3ca02d810,
0x7ff3ca02cc10<LD4[@gI]> [ORD=3] [ID=9]

```

...

Finally, the pattern for the Cpu0 instruction **ld**, defined earlier in Cpu0InstrInfo.td, translates the DAG (load EntryToken, (Cpu0ISD::Wrapper Register %GP, TargetGlobalAddress<i32\* @gI> 0)) into the Cpu0 instruction below.

```
...
    ld      $2, %got(gI)($gp)
...
```

Recall that in pic mode Cpu0 uses ".cupload" and "ld \$2, %got(gI)(\$gp)" to access a global variable. This takes 5 instructions in Cpu0 and 4 instructions in Mips. The cost arises because we do not assume that register \$gp always holds the address of .sdata and stays fixed there. Even if we reserve \$gp in this function, other functions can still change it. In the last sub-section, by contrast, \$gp is assumed to be preserved in every function. If \$gp is fixed at run time, ".cupload" can be removed here and a global variable access costs only one instruction. The price of removing ".cupload" is losing one general purpose register, \$gp, which could otherwise be allocated for variables.

In the last sub-section (.sdata mode) we remove ".cupload" because the program is statically linked, and dropping ".cupload" saves four instructions, which makes the code faster. In pic mode, dynamic loading already takes a lot of time, so removing ".cupload" at the cost of losing one general purpose register in all functions is not worthwhile. In pic mode with static linking you can still choose to remove ".cupload", but we prefer to keep \$gp available as a general purpose register. The relocation records of ".cupload" from llc -relocation-model=pic can also be resolved at the link stage if we want to link this function statically.

#### data or bss

The following code fragment of LowerGlobalAddress(), which corresponds to the option llc -relocation-model=pic, translates the DAG (GlobalAddress<i32\* @gI> 0) into (load EntryToken, (Cpu0ISD::Wrapper (add Cpu0ISD::Hi<gI offset Hi16>, Register %GP), TargetGlobalAddress<i32\* @gI> 0)) in the "Legalized selection DAG" stage, as shown below.

### LLVMBackendTutorialExampleCode/Chapter6\_1/Cpu0ISelLowering.cpp

```
SDValue Cpu0TargetLowering::getAddrGlobalLargeGOT(SDValue Op, SelectionDAG &DAG,
                                                unsigned HiFlag,
                                                unsigned LoFlag) const {
    DebugLoc DL = Op.getDebugLoc();
    EVT Ty = Op.getValueType();
    SDValue Hi = DAG.getNode(Cpu0ISD::Hi, DL, Ty, getTargetNode(Op, DAG, HiFlag));
    Hi = DAG.getNode(ISD::ADD, DL, Ty, Hi, getGlobalReg(DAG, Ty));
    SDValue Wrapper = DAG.getNode(Cpu0ISD::Wrapper, DL, Ty, Hi,
                                   getTargetNode(Op, DAG, LoFlag));
    return DAG.getLoad(Ty, DL, DAG.getEntryNode(), Wrapper,
                       MachinePointerInfo::getGOT(), false, false, false, 0);
}

SDValue Cpu0TargetLowering::LowerGlobalAddress(SDValue Op,
                                              SelectionDAG &DAG) const {
    ...
    if (TLOF.IsGlobalInSmallSection(GV, getTargetMachine()))
        ...
    else
        return getAddrGlobalLargeGOT(Op, DAG, Cpu0II::MO_GOT_HI16,
                                     Cpu0II::MO_GOT_LO16);
}
```

```

...
Type-legalized selection DAG: BB#0 '_Z3funv:'
SelectionDAG has 10 nodes:
...
0x7fb77a02cd10: ch = store 0x7fb779c10a08, 0x7fb77a02ca10, 0x7fb77a02cb10,
0x7fb77a02cc10<ST4[%c]> [ORD=1] [ID=-3]

0x7fb77a02ce10: i32 = GlobalAddress<i32* @gI> 0 [ORD=2] [ID=-3]

0x7fb77a02cc10: <multiple use>
0x7fb77a02cf10: i32, ch = load 0x7fb77a02cd10, 0x7fb77a02ce10,
0x7fb77a02cc10<LD4[@gI]> [ORD=2] [ID=-3]
...
Legalized selection DAG: BB#0 '_Z3funv:'
SelectionDAG has 16 nodes:
...
0x7fb77a02cd10: ch = store 0x7fb779c10a08, 0x7fb77a02ca10, 0x7fb77a02cb10,
0x7fb77a02cc10<ST4[%c]> [ORD=1] [ID=6]

0x7fb779c10a08: <multiple use>
0x7fb77a02d110: i32 = TargetGlobalAddress<i32* @gI> 0 [TF=19]

0x7fb77a02d410: i32 = Cpu0ISD::Hi 0x7fb77a02d110

0x7fb77a02d510: i32 = Register %GP

0x7fb77a02d610: i32 = add 0x7fb77a02d410, 0x7fb77a02d510

0x7fb77a02d710: i32 = TargetGlobalAddress<i32* @gI> 0 [TF=20]

0x7fb77a02d810: i32 = Cpu0ISD::Wrapper 0x7fb77a02d610, 0x7fb77a02d710

0x7fb77a02cc10: <multiple use>
0x7fb77a02fe10: i32, ch = load 0x7fb779c10a08, 0x7fb77a02d810,
0x7fb77a02cc10<LD4[GOT]>

0x7fb77a02cc10: <multiple use>
0x7fb77a02cf10: i32, ch = load 0x7fb77a02cd10, 0x7fb77a02fe10,
0x7fb77a02cc10<LD4[@gI]> [ORD=2] [ID=7]
...

```

Finally, the pattern for the Cpu0 instruction **ld**, defined earlier in Cpu0InstrInfo.td, translates the DAG (load EntryToken, (Cpu0ISD::Wrapper (add Cpu0ISD::Hi<gI offset Hi16>, Register %GP), Cpu0ISD::Lo<gI offset Lo16>)) into the Cpu0 instructions below.

```

...
    addiu $2, $zero, %got_hi(gI)
    shl  $2, $2, 16
    add  $2, $2, $gp
    ld   $2, %got_lo(gI) ($2)
...

```
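Both pic sequences reach the variable through the GOT: one load fetches the address of gI from a GOT slot, and a second load fetches the value. The toy model below sketches this double indirection for the small GOT case (one 16-bit offset from \$gp) and the large GOT case (%got\_hi/%got\_lo). The memory map, the addresses and the function names are assumptions made up for illustration; they are not taken from the Cpu0 code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy word-addressed memory: address -> 32-bit word.
using Memory = std::map<uint32_t, uint32_t>;

uint32_t load(const Memory &mem, uint32_t addr) {
    auto it = mem.find(addr);
    assert(it != mem.end() && "load from unmapped address");
    return it->second;
}

// Base register plus sign-extended 16-bit offset, as in "ld $r1, offset($r2)".
uint32_t addrWithImm(uint32_t base, int16_t imm) {
    return base + static_cast<uint32_t>(static_cast<int32_t>(imm));
}

// ld $2, %got(gI)($gp) ; ld $2, 0($2)
uint32_t loadViaSmallGot(const Memory &mem, uint32_t gp, int16_t gotOff) {
    uint32_t addrOfVar = load(mem, addrWithImm(gp, gotOff)); // GOT slot holds &gI
    return load(mem, addrOfVar);                             // value of gI
}

// addiu $2, $zero, %got_hi(gI) ; shl $2, $2, 16 ; add $2, $2, $gp ;
// ld $2, %got_lo(gI)($2) ; followed by the load of the variable itself.
uint32_t loadViaLargeGot(const Memory &mem, uint32_t gp,
                         uint16_t gotHi, int16_t gotLo) {
    uint32_t base = (static_cast<uint32_t>(gotHi) << 16) + gp;
    uint32_t addrOfVar = load(mem, addrWithImm(base, gotLo));
    return load(mem, addrOfVar);
}
```

The large GOT form spends three extra instructions building the slot address, which is the price for not being limited to slots within a 16-bit offset of \$gp.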

### 6.1.4 Global variable print support

The code above handles the global address DAG translation. Next, add the following code to Cpu0MCInstLower.cpp, Cpu0InstPrinter.cpp and Cpu0ISelLowering.cpp to support printing global variable operands.

### LLVMBackendTutorialExampleCode/Chapter6\_1/Cpu0MCInstLower.cpp

```

MCOperand Cpu0MCInstLower::LowerSymbolOperand(const MachineOperand &MO,
                                              MachineOperandType MOTy,
                                              unsigned Offset) const {
    MCSymbolRefExpr::VariantKind Kind;
    const MCSymbol *Symbol;

    switch (MO.getTargetFlags()) {
    default: llvm_unreachable("Invalid target flag!");
    // Cpu0_GPREL is for llc -march=cpu0 -relocation-model=static
    // -cpu0-use-small-section=true (global var in .sdata)
    case Cpu0II::MO_GPREL: Kind = MCSymbolRefExpr::VK_Cpu0_GPREL; break;

    case Cpu0II::MO_GOT16: Kind = MCSymbolRefExpr::VK_Cpu0_GOT16; break;
    case Cpu0II::MO_GOT: Kind = MCSymbolRefExpr::VK_Cpu0_GOT; break;
    // ABS_HI and ABS_LO is for llc -march=cpu0 -relocation-model=static
    // (global var in .data)
    case Cpu0II::MO_ABS_HI: Kind = MCSymbolRefExpr::VK_Cpu0_ABS_HI; break;
    case Cpu0II::MO_ABS_LO: Kind = MCSymbolRefExpr::VK_Cpu0_ABS_LO; break;
    }

    switch (MOTy) {
    case MachineOperand::MO_GlobalAddress:
        Symbol = Mang->getSymbol(MO.getGlobal());
        break;

    default:
        llvm_unreachable("<unknown operand type>");
    }
    ...
}

MCOperand Cpu0MCInstLower::LowerOperand(const MachineOperand& MO,
                                         unsigned offset) const {
    MachineOperandType MOTy = MO.getType();

    switch (MOTy) {
        ...
        case MachineOperand::MO_GlobalAddress:
            return LowerSymbolOperand(MO, MOTy, offset);
        ...
    }
}

```

### LLVMBackendTutorialExampleCode/Chapter6\_1/InstPrinter/Cpu0InstPrinter.cpp

```

static void printExpr(const MCExpr *Expr, raw_ostream &OS) {
    ...
    switch (Kind) {
    default: llvm_unreachable("Invalid kind!");
    case MCSymbolRefExpr::VK_None: break;
    // Cpu0_GPREL is for llc -march=cpu0 -relocation-model=static
    case MCSymbolRefExpr::VK_Cpu0_GPREL: OS << "%gp_rel("; break;
    case MCSymbolRefExpr::VK_Cpu0_GOT16: OS << "%got("; break;
    case MCSymbolRefExpr::VK_Cpu0_GOT: OS << "%got("; break;
    case MCSymbolRefExpr::VK_Cpu0_ABS_HI: OS << "%hi("; break;
    case MCSymbolRefExpr::VK_Cpu0_ABS_LO: OS << "%lo("; break;
    }
    ...
}

```

The following function supplies the DAG node names printed by llc -debug.

#### LLVMBackendTutorialExampleCode/Chapter6\_1/Cpu0ISelLowering.cpp

```

const char *Cpu0TargetLowering::getTargetNodeName(unsigned Opcode) const {
    switch (Opcode) {
    case Cpu0ISD::JmpLink:           return "Cpu0ISD::JmpLink";
    case Cpu0ISD::Hi:                return "Cpu0ISD::Hi";
    case Cpu0ISD::Lo:                return "Cpu0ISD::Lo";
    case Cpu0ISD::GPRel:             return "Cpu0ISD::GPRel";
    case Cpu0ISD::Ret:               return "Cpu0ISD::Ret";
    case Cpu0ISD::DivRem:            return "Cpu0ISD::DivRem";
    case Cpu0ISD::DivRemU:           return "Cpu0ISD::DivRemU";
    case Cpu0ISD::Wrapper:           return "Cpu0ISD::Wrapper";
    default:                         return NULL;
    }
}

```

In printExpr() above, OS is the output stream that writes to the assembly file.
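The effect of printExpr() on an operand can be sketched with a standalone function that maps a relocation kind to its assembly prefix. This is a simplified stand-in, not the LLVM code: the enum `VariantKind` and the function `printSymbol` are hypothetical, and the closing parenthesis is our assumption about the part elided by "..." in the listing.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Simplified stand-in for the MCSymbolRefExpr variant kinds used above.
enum class VariantKind { None, GPREL, GOT16, GOT, ABS_HI, ABS_LO };

// Render a symbol operand with the relocation prefix chosen by its kind.
std::string printSymbol(VariantKind kind, const std::string &symbol) {
    std::ostringstream os;
    switch (kind) {
    case VariantKind::None:   break;
    case VariantKind::GPREL:  os << "%gp_rel("; break;
    case VariantKind::GOT16:  os << "%got(";    break;
    case VariantKind::GOT:    os << "%got(";    break;
    case VariantKind::ABS_HI: os << "%hi(";     break;
    case VariantKind::ABS_LO: os << "%lo(";     break;
    }
    os << symbol;
    if (kind != VariantKind::None)
        os << ')'; // assumed: close the parenthesis opened by the prefix
    return os.str();
}
```

For example, `printSymbol(VariantKind::GOT16, "gI")` produces the operand text seen in "ld \$2, %got(gI)(\$gp)".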

### 6.1.5 Summary

Instruction selection for global variables differs from ordinary IR node translation: it has a static (absolute address) mode and a PIC mode. The backend handles this translation by creating DAG nodes in the function LowerGlobalAddress(), which is called by LowerOperation(). LowerOperation() takes care of all operations of type Custom. The backend marks the global address as a Custom operation with `"setOperationAction(ISD::GlobalAddress, MVT::i32, Custom);"` in the Cpu0TargetLowering() constructor. Each addressing mode creates its own DAG list. With the patterns `Pat<>` set in Cpu0InstrInfo.td, llvm can apply its pattern-match mechanism in the Instruction Selection stage.

There are three types for setXXXAction(): Promote, Expand and Custom. Apart from Custom, the other two usually need no extra coding. See the section “Instruction Selector” of <sup>3</sup> for reference.
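The relationship between marking an operation as Custom and the lowering hook being called can be pictured with a toy model. It is deliberately not the LLVM API: the class `ToyLowering`, its enums and the strings it returns are hypothetical, and only the overall idea (a per-operation action table consulted before target code runs) reflects the description above.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class Action { Legal, Promote, Expand, Custom };
enum class Op { Add, GlobalAddress };

// Toy model of a target lowering object with a per-operation action table.
class ToyLowering {
    std::map<Op, Action> actions_;

public:
    void setOperationAction(Op op, Action a) { actions_[op] = a; }

    Action getOperationAction(Op op) const {
        auto it = actions_.find(op);
        return it == actions_.end() ? Action::Legal : it->second;
    }

    // Stand-in for LowerOperation(): only Custom operations reach target code.
    std::string legalize(Op op, bool pic) const {
        if (getOperationAction(op) != Action::Custom)
            return "unchanged";
        switch (op) {
        case Op::GlobalAddress:
            // Each addressing mode builds its own DAG shape.
            return pic ? "load(Wrapper(GP, sym))" : "add(Hi(sym), Lo(sym))";
        case Op::Add:
            break;
        }
        return "unchanged";
    }
};
```

Until `setOperationAction(Op::GlobalAddress, Action::Custom)` is called on the toy object, `legalize()` leaves the node alone; afterwards it returns the pic or static shape depending on the mode, mirroring the two DAG lists shown in this section.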

As shown in this section, a global variable can be placed in .sdata/.sbss with the option `-cpu0-use-small-section=true`. It is possible for the small data section (16-bit addressable) to fill up at the link stage. When this happens, the linker reports the error and forces the toolchain user to fix it. The toolchain user then needs to reconsider which global variables should be moved from .sdata/.sbss to .data/.bss by setting the option `-cpu0-use-small-section=false` for the file in which those global variables are declared. The rule for global variable allocation is “put the small and frequently used variables in the small 16-bit addressable area”.

## 6.2 Array and struct support

LLVM uses getelementptr to represent array and struct accesses in C. Please refer to the section getelementptr of <sup>4</sup>. For ch6\_2.cpp, the llvm IR is as follows,

<sup>3</sup> <http://llvm.org/docs/WritingAnLLVMBackend.html>

<sup>4</sup> <http://llvm.org/docs/LangRef.html>

### LLVMBackendTutorialExampleCode/InputFiles/ch6\_2.cpp

```
struct Date
{
    int year;
    int month;
    int day;
};

Date date = {2012, 10, 12};
int a[3] = {2012, 10, 12};

int main()
{
    int day = date.day;
    int i = a[1];

    return 0;
}

// ch6_2.ll
; ModuleID = 'ch6_2.bc'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-
f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128-n8:16:32-S128"
target triple = "i386-apple-macosx10.8.0"

%struct.Date = type { i32, i32, i32 }

@date = global %struct.Date { i32 2012, i32 10, i32 12 }, align 4
@a = global [3 x i32] [i32 2012, i32 10, i32 12], align 4

define i32 @main() nounwind ssp {
entry:
    %retval = alloca i32, align 4
    %day = alloca i32, align 4
    %i = alloca i32, align 4
    store i32 0, i32* %retval
    %0 = load i32* getelementptr inbounds (%struct.Date* @date, i32 0, i32 2),
    align 4
    store i32 %0, i32* %day, align 4
    %1 = load i32* getelementptr inbounds ([3 x i32]* @a, i32 0, i32 1), align 4
    store i32 %1, i32* %i, align 4
    ret i32 0
}
```

Running Chapter6\_1/ with ch6\_2.bc in static mode produces the following incorrect asm file,

```
118-165-66-82:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_
debug_build/bin/Debug/llc -march=cpu0 -relocation-model=static -filetype=asm
ch6_2.bc -o ch6_2.cpu0.static.s
118-165-66-82:InputFiles Jonathan$ cat ch6_2.cpu0.static.s
.section .mdebug.abi32
.previous
.file "ch6_2.bc"
.text
.globl main
.align 2
.type main,@function
.ent main          # @main
main:
.cfi_startproc
.frame $sp,16,$lr
.mask 0x00000000,0
.set noreorder
.set nomacro
# BB#0:
    addiu $sp, $sp, -16
$tmp1:
    .cfi_def_cfa_offset 16
    addiu $2, $zero, 0
    st $2, 12($sp)
    addiu $2, $zero, %hi(date)
    shl $2, $2, 16
    addiu $2, $2, %lo(date)
    ld $2, 0($2)    // the correct one is ld $2, 8($2)
    st $2, 8($sp)
    addiu $2, $zero, %hi(a)
    shl $2, $2, 16
    addiu $2, $2, %lo(a)
    ld $2, 0($2)
    st $2, 4($sp)
    addiu $sp, $sp, 16
    ret $lr
.set macro
.set reorder
.end main
$tmp2:
.size main, ($tmp2)-main
.cfi_endproc

.type date,@object          # @date
.data
.globl date
.align 2
date:
    .4byte 2012          # 0x7dc
    .4byte 10             # 0xa
    .4byte 12             # 0xc
.size date, 12

.type a,@object            # @a
.globl a
.align 2
a:
    .4byte 2012          # 0x7dc
    .4byte 10             # 0xa
    .4byte 12             # 0xc
.size a, 12

```
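The comment in the listing above says the load for date.day should use offset 8. The expected offsets can be checked directly in C++; in this sketch we use `int32_t` to pin the field size to the 4 bytes that int has on cpu0, and the helper names `dayOffset` and `elementOffset` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same layout as struct Date in ch6_2.cpp, with the field size made explicit.
struct Date {
    int32_t year;
    int32_t month;
    int32_t day;
};

// Byte offset that belongs in "ld $2, <offset>($2)" for date.day.
constexpr std::size_t dayOffset() { return offsetof(Date, day); }

// Byte offset of element `index` in an array of 4-byte ints, e.g. a[1].
constexpr std::size_t elementOffset(std::size_t index) {
    return index * sizeof(int32_t);
}

static_assert(sizeof(Date) == 12, "three 4-byte fields, no padding expected");
```

`dayOffset()` is 8 and `elementOffset(1)` is 4, the same constants that appear as Constant<8> and Constant<4> in the initial selection DAG of the llc -debug output.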

For “**day = date.day**”, the correct instruction is “**ld \$2, 8(\$2)**”, not “**ld \$2, 0(\$2)**”, since date.day is at offset 8 from date: type int is 4 bytes in cpu0, and date.day has the fields year and month before it. Let's use the debug option of llc to see what’s wrong,

```

jonathantekimac:InputFiles Jonathan$ /Users/Jonathan/llvm/test/
cmake_debug_build/bin/Debug/llc -march=cpu0 -debug -relocation-model=static
-filetype=asm ch6_2.bc -o ch6_2.cpu0.static.s
...

```

```
==== main
Initial selection DAG: BB#0 'main:entry'
SelectionDAG has 20 nodes:
 0x7f7f5b02d210: i32 = undef [ORD=1]

 0x7f7f5ac10590: ch = EntryToken [ORD=1]

 0x7f7f5b02d010: i32 = Constant<0> [ORD=1]

 0x7f7f5b02d110: i32 = FrameIndex<0> [ORD=1]

 0x7f7f5b02d210: <multiple use>
0x7f7f5b02d310: ch = store 0x7f7f5ac10590, 0x7f7f5b02d010, 0x7f7f5b02d110,
0x7f7f5b02d210<ST4[%retval]> [ORD=1]

 0x7f7f5b02d410: i32 = GlobalAddress<%struct.Date* @date> 0 [ORD=2]

 0x7f7f5b02d510: i32 = Constant<8> [ORD=2]

 0x7f7f5b02d610: i32 = add 0x7f7f5b02d410, 0x7f7f5b02d510 [ORD=2]

 0x7f7f5b02d210: <multiple use>
0x7f7f5b02d710: i32, ch = load 0x7f7f5b02d310, 0x7f7f5b02d610, 0x7f7f5b02d210
<LD4[getelementptr inbounds (%struct.Date* @date, i32 0, i32 2)]> [ORD=3]

 0x7f7f5b02db10: i64 = Constant<4>

 0x7f7f5b02d710: <multiple use>
 0x7f7f5b02d710: <multiple use>
 0x7f7f5b02d810: i32 = FrameIndex<1> [ORD=4]

 0x7f7f5b02d210: <multiple use>
0x7f7f5b02d910: ch = store 0x7f7f5b02d710:1, 0x7f7f5b02d710, 0x7f7f5b02d810,
0x7f7f5b02d210<ST4[%day]> [ORD=4]

 0x7f7f5b02da10: i32 = GlobalAddress<[3 x i32]* @a> 0 [ORD=5]

 0x7f7f5b02dc10: i32 = Constant<4> [ORD=5]

 0x7f7f5b02dd10: i32 = add 0x7f7f5b02da10, 0x7f7f5b02dc10 [ORD=5]

 0x7f7f5b02d210: <multiple use>
0x7f7f5b02de10: i32, ch = load 0x7f7f5b02d910, 0x7f7f5b02dd10, 0x7f7f5b02d210
<LD4[getelementptr inbounds ([3 x i32]* @a, i32 0, i32 1)]> [ORD=6]

...

Replacing.3 0x7f7f5b02dd10: i32 = add 0x7f7f5b02da10, 0x7f7f5b02dc10 [ORD=5]
With: 0x7f7f5b030010: i32 = GlobalAddress<[3 x i32]* @a> + 4

Replacing.3 0x7f7f5b02d610: i32 = add 0x7f7f5b02d410, 0x7f7f5b02d510 [ORD=2]
With: 0x7f7f5b02db10: i32 = GlobalAddress<%struct.Date* @date> + 8

Optimized lowered selection DAG: BB#0 'main:entry'
SelectionDAG has 15 nodes:
0x7f7f5b02d210: i32 = undef [ORD=1]

0x7f7f5ac10590: ch = EntryToken [ORD=1]

0x7f7f5b02d010: i32 = Constant<0> [ORD=1]

0x7f7f5b02d110: i32 = FrameIndex<0> [ORD=1]

0x7f7f5b02d210: <multiple use>
0x7f7f5b02d310: ch = store 0x7f7f5ac10590, 0x7f7f5b02d010, 0x7f7f5b02d110,
0x7f7f5b02d210<ST4[%retval]> [ORD=1]

0x7f7f5b02db10: i32 = GlobalAddress<%struct.Date* @date> + 8

0x7f7f5b02d210: <multiple use>
0x7f7f5b02d710: i32, ch = load 0x7f7f5b02d310, 0x7f7f5b02db10, 0x7f7f5b02d210
<LD4[getelementptr inbounds (%struct.Date* @date, i32 0, i32 2)]> [ORD=3]

0x7f7f5b02d710: <multiple use>
0x7f7f5b02d710: <multiple use>
0x7f7f5b02d810: i32 = FrameIndex<1> [ORD=4]

0x7f7f5b02d210: <multiple use>
0x7f7f5b02d910: ch = store 0x7f7f5b02d710:1, 0x7f7f5b02d710, 0x7f7f5b02d810,
0x7f7f5b02d210<ST4[%day]> [ORD=4]

0x7f7f5b030010: i32 = GlobalAddress<[3 x i32]* @a> + 4

0x7f7f5b02d210: <multiple use>
0x7f7f5b02de10: i32, ch = load 0x7f7f5b02d910, 0x7f7f5b030010, 0x7f7f5b02d210
<LD4[getelementptr inbounds ([3 x i32]* @a, i32 0, i32 1)]> [ORD=6]

...

```

With `llc -debug`, you can see the DAG translation process. As above, the DAG list for `date.day` (add `GlobalAddress<%struct.Date* @date> 0, Constant<8>`) with 3 nodes is replaced by 1 node `GlobalAddress<%struct.Date* @date> + 8`. The DAG list for `a[1]` is handled the same way. The replacement occurs because `TargetLowering.cpp::isOffsetFoldingLegal(...)` returns true in `llc -relocation-model=static` static addressing mode, as shown below. The Cpu0 lowering code does not yet take the folded offset into account, so the + 8 is lost. In Cpu0 the `ld` instruction format is “`ld $r1, offset($r2)`”, which means load the value at address `$r2`+offset into `$r1`. Since the offset can be encoded in the instruction itself, we simply override the `isOffsetFoldingLegal(...)` function as below.

### lib/CodeGen/SelectionDAG/TargetLowering.cpp

```

bool
TargetLowering::isOffsetFoldingLegal(const GlobalAddressSDNode *GA) const {
    // Assume that everything is safe in static mode.
    if (getTargetMachine().getRelocationModel() == Reloc::Static)
        return true;

    // In dynamic-no-pic mode, assume that known defined values are safe.
    if (getTargetMachine().getRelocationModel() == Reloc::DynamicNoPIC &&
        GA &&
        !GA->getGlobal()->isDeclaration() &&
        !GA->getGlobal()->isWeakForLinker())
        return true;

    // Otherwise assume nothing is safe.
    return false;
}
```

### LLVMBackendTutorialExampleCode/Chapter6\_2/Cpu0ISelLowering.cpp

```
bool  
Cpu0TargetLowering::isOffsetFoldingLegal (const GlobalAddressSDNode *GA) const {  
    // The Cpu0 target isn't yet aware of offsets.  
    return false;  
}
```

Beyond that, we need to add the following code fragment to Cpu0ISelDAGToDAG.cpp,

### LLVMBackendTutorialExampleCode/Chapter6\_2/Cpu0ISelDAGToDAG.cpp

```
// Cpu0ISelDAGToDAG.cpp  
/// ComplexPattern used on Cpu0InstrInfo  
/// Used on Cpu0 Load/Store instructions  
bool Cpu0DAGToDAGISel::  
SelectAddr(SDNode *Parent, SDValue Addr, SDValue &Base, SDValue &Offset) {  
...  
    // Addresses of the form FI+const or FI|const
    if (CurDAG->isBaseWithConstantOffset(Addr)) {  
        ConstantSDNode *CN = dyn_cast<ConstantSDNode>(Addr.getOperand(1));  
        if (isInt<16>(CN->getSExtValue())) {  
  
            // If the first operand is a FI, get the TargetFI Node  
            if (FrameIndexSDNode *FIN = dyn_cast<FrameIndexSDNode>  
                (Addr.getOperand(0)))  
                Base = CurDAG->getTargetFrameIndex(FIN->getIndex(), ValTy);  
            else  
                Base = Addr.getOperand(0);  
  
            Offset = CurDAG->getTargetConstant(CN->getZExtValue(), ValTy);  
            return true;  
        }  
    }  
}
```

Recall that we have translated the DAG list for date.day (add GlobalAddress<%struct.Date\* @date> 0, Constant<8>) into (add (add Cpu0ISD::Hi (Cpu0II::MO\_ABS\_HI), Cpu0ISD::Lo (Cpu0II::MO\_ABS\_LO)), Constant<8>) by the following code in Cpu0ISelLowering.cpp.

### LLVMBackendTutorialExampleCode/Chapter6\_1/Cpu0ISelLowering.cpp

```
// Cpu0ISelLowering.cpp  
SDValue Cpu0TargetLowering::LowerGlobalAddress(SDValue Op,
                                               SelectionDAG &DAG) const {
    ...
    // %hi/%lo relocation
    SDValue GAHi = DAG.getTargetGlobalAddress(GV, dl, MVT::i32, 0,
                                              Cpu0II::MO_ABS_HI);
    SDValue GALo = DAG.getTargetGlobalAddress(GV, dl, MVT::i32, 0,
                                              Cpu0II::MO_ABS_LO);
    SDValue HiPart = DAG.getNode(Cpu0ISD::Hi, dl, VTs, &GAHi, 1);
    SDValue Lo = DAG.getNode(Cpu0ISD::Lo, dl, MVT::i32, GALo);
    return DAG.getNode(ISD::ADD, dl, MVT::i32, HiPart, Lo);
    ...
}
```

So, when `SelectAddr(...)` of `Cpu0ISelDAGToDAG.cpp` is called, the `Addr` `SDValue` in `SelectAddr(..., Addr, ...)` is the `DAG` list for `date.day` (`add (add Cpu0ISD::Hi (Cpu0II::MO_ABS_HI), Cpu0ISD::Lo(Cpu0II::MO_ABS_LO)), Constant<8>`). Since `Addr.getOpcode() = ISD::ADD`, `Addr.getOperand(0) = (add Cpu0ISD::Hi (Cpu0II::MO_ABS_HI), Cpu0ISD::Lo(Cpu0II::MO_ABS_LO))` and `Addr.getOperand(1).getOpcode() = ISD::Constant`, we get `Base = SDValue (add Cpu0ISD::Hi (Cpu0II::MO_ABS_HI), Cpu0ISD::Lo(Cpu0II::MO_ABS_LO))` and `Offset = Constant<8>`. After `Base` and `Offset` are set, the `load` `DAG` translates the global address `date.day` into the machine instruction **“ld \$r1, 8(\$r2)”** in the `Instruction Selection` stage.

Chapter6\_2/ includes the changes above. You can run it with `ch6_2.cpp` to get the correct generated instruction **“ld \$r1, 8(\$r2)”** for the `date.day` access, as follows.

```

...
ld $2, 8($2)
st $2, 8($sp)
addiu $2, $zero, %hi(a)
shl $2, $2, 16
addiu $2, $2, %lo(a)
ld $2, 4($2)
    
```

## 6.3 Type of char and short int

To support signed/unsigned char and short int, we add the following code to Chapter6\_3/.

### LLVMBackendTutorialExampleCode/Chapter6\_3/Cpu0InstrInfo.td

```

def sextloadi16_a : AlignedLoad<sextloadi16>;
def zextloadi16_a : AlignedLoad<zextloadi16>;
def extloadi16_a : AlignedLoad<extloadi16>;
...
def truncstorei16_a : AlignedStore<truncstorei16>;
...
defm LB      : LoadM32<0x03, "lb", sextloadi8>;
defm LBu    : LoadM32<0x04, "lbu", zextloadi8>;
defm SB      : StoreM32<0x05, "sb", truncstorei8>;
defm LH      : LoadM32<0x06, "lh", sextloadi16_a>;
defm LHu    : LoadM32<0x07, "lhu", zextloadi16_a>;
defm SH      : StoreM32<0x08, "sh", truncstorei16_a>;
    
```

Running Chapter6\_3/ with `ch6_3.cpp` gives the following result.

### LLVMBackendTutorialExampleCode/InputFiles/ch6\_3.cpp

```
struct Date
{
    short year;
    char month;
    char day;
    char hour;
    char minute;
    char second;
};

unsigned char b[4] = {'a', 'b', 'c', '\0'};

int main()
{
    unsigned char a = b[1];
    char c = (char)b[1];
    Date date1 = {2012, (char)11, (char)25, (char)9, (char)40, (char)15};
    char m = date1.month;
    char s = date1.second;

    return 0;
}
```

```

118-165-64-245:InputFiles Jonathan$ clang -c ch6_3.cpp -emit-llvm -o ch6_3.bc
118-165-64-245:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm ch6_3.bc -o
ch6_3.cpu0.s
```

```

118-165-64-245:InputFiles Jonathan$ cat ch6_3.cpu0.s
    .section .mdebug.abi32
    .previous
    .file   "ch6_3.bc"
    .text
    .globl  main
    .align  2
    .type   main,@function
    .ent    main                      # @main
main:
    .cfi_startproc
    .frame  $sp,32,$lr
    .mask   0x00000000,0
    .set    noreorder
    .cupload $6
    .set    nomacro
# BB#0:
    addiu   $sp, $sp, -32
$tmp1:
    .cfi_def_cfa_offset 32
    addiu   $2, $zero, 0
    st      $2, 28($sp)
    ld      $3, %got(b) ($gp)
    lbu    $4, 1($3)
    sb      $4, 24($sp)
    lbu    $3, 1($3)
    sb      $3, 20($sp)
    ld      $3, %got($_ZZ4mainE5date1) ($gp)
    addiu   $3, $3, %lo($_ZZ4mainE5date1)
    lhu    $4, 4($3)
    shl    $4, $4, 16
    lhu    $5, 6($3)
    or     $4, $4, $5
    st      $4, 12($sp)           // store hour, minute and second on 12($sp)
```

```

lhu    $4, 2($3)
lhu    $3, 0($3)
shl    $3, $3, 16
or     $3, $3, $4
st     $3, 8($sp)           // store year, month and day on 8($sp)
lbu    $3, 10($sp)          // m = date1.month;
sb     $3, 4($sp)
lbu    $3, 14($sp)          // s = date1.second;
sb     $3, 0($sp)
addiu $sp, $sp, 32
ret    $lr
.set   macro
.set   reorder
.end   main
$tmp2:
.size  main, ($tmp2)-main
.cfi_endproc

.type   b,@object          # @b
.data
.globl b
b:
.asciz "abc"
.size  b, 4

.type   $_ZZ4mainE5date1,@object # @_ZZ4mainE5date1
.section .rodata.cst8,"aM",@progbits,8
.align 1
$_ZZ4mainE5date1:
.2byte 2012                # 0x7dc
.byte   11                  # 0xb
.byte   25                  # 0x19
.byte   9                   # 0x9
.byte   40                  # 0x28
.byte   15                  # 0xf
.space  1
.size   $_ZZ4mainE5date1, 8

```
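The offsets used in the listing above (`10($sp)` for `month` and `14($sp)` for `second`, with the struct copy starting at `8($sp)`) follow from the layout of `Date`, and the `lb`/`lbu` and `lh`/`lhu` pairs differ only in how they extend the loaded value. The following is a minimal sketch, assuming a 2-byte `short` and 1-byte `char`; the helper names are hypothetical and only model the extension behavior, not the real instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same member order as the Date struct in ch6_3.cpp.
struct Date {
  short year;
  char month;
  char day;
  char hour;
  char minute;
  char second;
};

// Model of a sign-extending byte load (the sextloadi8 case).
inline int32_t loadByteSigned(uint8_t byte) {
  return static_cast<int32_t>(static_cast<int8_t>(byte));
}

// Model of a zero-extending byte load (the zextloadi8 case).
inline int32_t loadByteUnsigned(uint8_t byte) {
  return static_cast<int32_t>(byte);
}

// Models of the 16-bit variants (sextloadi16 and zextloadi16).
inline int32_t loadHalfSigned(uint16_t half) {
  return static_cast<int32_t>(static_cast<int16_t>(half));
}

inline int32_t loadHalfUnsigned(uint16_t half) {
  return static_cast<int32_t>(half);
}
```

With this layout `offsetof(Date, month)` is 2 and `offsetof(Date, second)` is 6, which matches the `8 + 2` and `8 + 6` stack offsets in the generated code.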



# CONTROL FLOW STATEMENTS

This chapter illustrates the IR that corresponds to control flow statements such as the “**if else**”, “**while**” and “**for**” loop statements in C, and shows how to translate these control flow statements from llvm IR into cpu0 instructions.

## 7.1 Control flow statement

Compiling ch7\_1\_1.cpp with clang gives the following result,

**LLVMBackendTutorialExampleCode/InputFiles/ch7\_1\_1.cpp**

```
int main()
{
    unsigned int a = 0;
    int b = 1;
    int c = 2;
    int d = 3;
    int e = 4;
    int f = 5;
    int g = 6;
    int h = 7;
    int i = 8;

    if (a == 0) {
        a++;
    }
    if (b != 0) {
        b++;
    }
    if (c > 0) {
        c++;
    }
    if (d >= 0) {
        d++;
    }
    if (e < 0) {
        e++;
    }
    if (f <= 0) {
        f++;
    }
    if (g <= 1) {
        g++;
    }
    if (h >= 1) {
        h++;
    }
    if (i < h) {
        i++;
    }
    if (a != b) {
        a++;
    }

    return a;
}
```

```
; ModuleID = 'ch7_1_1.bc'
target datalayout = "e-p:32:32:32-i1:8:8-i16:16:16-i32:32:32-i64:32:64-
f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128-n8:16:32-S128"
target triple = "i386-apple-macosx10.8.0"

define i32 @main() nounwind ssp {
entry:
    %retval = alloca i32, align 4
    %a = alloca i32, align 4
    %b = alloca i32, align 4
    %c = alloca i32, align 4
    %d = alloca i32, align 4
    %e = alloca i32, align 4
    %f = alloca i32, align 4
    %g = alloca i32, align 4
    %h = alloca i32, align 4
    %i = alloca i32, align 4
    store i32 0, i32* %retval
    store i32 0, i32* %a, align 4
    store i32 1, i32* %b, align 4
    store i32 2, i32* %c, align 4
    store i32 3, i32* %d, align 4
    store i32 4, i32* %e, align 4
    store i32 5, i32* %f, align 4
    store i32 6, i32* %g, align 4
    store i32 7, i32* %h, align 4
    store i32 8, i32* %i, align 4
    %0 = load i32* %a, align 4
    %cmp = icmp eq i32 %0, 0
    br i1 %cmp, label %if.then, label %if.end

if.then:                                     ; preds = %entry
    %1 = load i32* %a, align 4
    %inc = add i32 %1, 1
    store i32 %inc, i32* %a, align 4
    br label %if.end

if.end:                                      ; preds = %if.then, %entry
    %2 = load i32* %b, align 4
    %cmp1 = icmp ne i32 %2, 0
    br i1 %cmp1, label %if.then2, label %if.end4

if.then2:                                     ; preds = %if.end

```

```

%3 = load i32* %b, align 4
%inc3 = add nsw i32 %3, 1
store i32 %inc3, i32* %b, align 4
br label %if.end4

if.end4:                                ; preds = %if.then2, %if.end
%4 = load i32* %c, align 4
%cmp5 = icmp sgt i32 %4, 0
br i1 %cmp5, label %if.then6, label %if.end8

if.then6:                                ; preds = %if.end4
%5 = load i32* %c, align 4
%inc7 = add nsw i32 %5, 1
store i32 %inc7, i32* %c, align 4
br label %if.end8

if.end8:                                ; preds = %if.then6, %if.end4
%6 = load i32* %d, align 4
%cmp9 = icmp sge i32 %6, 0
br i1 %cmp9, label %if.then10, label %if.end12

if.then10:                                ; preds = %if.end8
%7 = load i32* %d, align 4
%inc11 = add nsw i32 %7, 1
store i32 %inc11, i32* %d, align 4
br label %if.end12

if.end12:                                ; preds = %if.then10, %if.end8
%8 = load i32* %e, align 4
%cmp13 = icmp slt i32 %8, 0
br i1 %cmp13, label %if.then14, label %if.end16

if.then14:                                ; preds = %if.end12
%9 = load i32* %e, align 4
%inc15 = add nsw i32 %9, 1
store i32 %inc15, i32* %e, align 4
br label %if.end16

if.end16:                                ; preds = %if.then14, %if.end12
%10 = load i32* %f, align 4
%cmp17 = icmp sle i32 %10, 0
br i1 %cmp17, label %if.then18, label %if.end20

if.then18:                                ; preds = %if.end16
%11 = load i32* %f, align 4
%inc19 = add nsw i32 %11, 1
store i32 %inc19, i32* %f, align 4
br label %if.end20

if.end20:                                ; preds = %if.then18, %if.end16
%12 = load i32* %g, align 4
%cmp21 = icmp sle i32 %12, 1
br i1 %cmp21, label %if.then22, label %if.end24

if.then22:                                ; preds = %if.end20
%13 = load i32* %g, align 4
%inc23 = add nsw i32 %13, 1
store i32 %inc23, i32* %g, align 4

```

```

br label %if.end24

if.end24:                                ; preds = %if.then22, %if.end20
    %14 = load i32* %h, align 4
    %cmp25 = icmp sge i32 %14, 1
    br i1 %cmp25, label %if.then26, label %if.end28

if.then26:                                ; preds = %if.end24
    %15 = load i32* %h, align 4
    %inc27 = add nsw i32 %15, 1
    store i32 %inc27, i32* %h, align 4
    br label %if.end28

if.end28:                                ; preds = %if.then26, %if.end24
    %16 = load i32* %i, align 4
    %17 = load i32* %h, align 4
    %cmp29 = icmp slt i32 %16, %17
    br i1 %cmp29, label %if.then30, label %if.end32

if.then30:                                ; preds = %if.end28
    %18 = load i32* %i, align 4
    %inc31 = add nsw i32 %18, 1
    store i32 %inc31, i32* %i, align 4
    br label %if.end32

if.end32:                                ; preds = %if.then30, %if.end28
    %19 = load i32* %a, align 4
    %20 = load i32* %b, align 4
    %cmp33 = icmp ne i32 %19, %20
    br i1 %cmp33, label %if.then34, label %if.end36

if.then34:                                ; preds = %if.end32
    %21 = load i32* %a, align 4
    %inc35 = add i32 %21, 1
    store i32 %inc35, i32* %a, align 4
    br label %if.end36

if.end36:                                ; preds = %if.then34, %if.end32
    %22 = load i32* %a, align 4
    ret i32 %22
}

```

“**icmp ne**” stands for integer compare not equal, “**slt**” stands for signed less than, and “**sle**” stands for signed less than or equal. Run the Chapter6\_2/ version with the `llc -view-isel-dags` or `-debug` option and you can see that the **if** statement has been translated into `(br(brcond(%1, setcc(%2, Constant<c>, setne)), BasicBlock_02), BasicBlock_01)`. Ignoring `%1`, we get the form `(br(brcond(setcc(%2, Constant<c>, setne)), BasicBlock_02), BasicBlock_01)`. For explanation, we list the IR DAG as follows,

```
%cond=setcc(%2, Constant<c>, setne)
brcond %cond, BasicBlock_02
br BasicBlock_01
```

We want to translate them into the following cpu0 instruction DAG,

```
addiu %3, ZERO, Constant<c>
cmp %2, %3
jne BasicBlock_02
jmp BasicBlock_01
```

The first instruction above, addiu, moves Constant<c> into a register. We defined it earlier with the following code,

**LLVMBackendTutorialExampleCode/Chapter7\_1/Cpu0InstrInfo.td**

```
// Small immediates
def : Pat<(i32 immSExt16:$in),
      (ADDiu ZERO, imm:$in)>;

// Arbitrary immediates
def : Pat<(i32 imm:$imm),
      (OR (SHL (ADDiu ZERO, (HI16 imm:$imm)), 16),
       (ADDiu ZERO, (LO16 imm:$imm)))>;
```
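The two immediate patterns above choose between one instruction for values that fit a signed 16-bit field and a shift/OR sequence for everything else. The arithmetic can be sketched as follows. This is a minimal illustration, assuming `HI16` and `LO16` are plain 16-bit field extractions; it does not model how `ADDiu` extends its immediate, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True when the value fits a signed 16-bit immediate (the immSExt16 case).
inline bool fitsSExt16(int32_t v) { return v >= -32768 && v <= 32767; }

// Upper and lower 16-bit halves of a 32-bit constant.
inline uint32_t hi16(uint32_t v) { return v >> 16; }
inline uint32_t lo16(uint32_t v) { return v & 0xFFFFu; }

// Recombine the halves in the order the second pattern uses: shift the
// high half left by 16, then OR in the low half.
inline uint32_t combine(uint32_t hi, uint32_t lo) {
  assert(hi <= 0xFFFFu && lo <= 0xFFFFu);
  return (hi << 16) | lo;
}
```

For example, `0x12345678` does not fit in 16 bits, so it takes the second pattern: the high half `0x1234` is shifted left by 16 and ORed with the low half `0x5678`.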

For the last IR instruction, br, we translate the unconditional branch (br BasicBlock\_01) into jmp BasicBlock\_01 with the following pattern definition,

**LLVMBackendTutorialExampleCode/Chapter7\_1/Cpu0InstrInfo.td**

```
def brtarget : Operand<OtherVT> {
    let EncoderMethod = "getBranchTargetOpValue";
    let OperandType = "OPERAND_PCREL";
    let DecoderMethod = "DecodeBranchTarget";
}

...
// Unconditional branch
class UncondBranch<bits<8> op, string instr_asm>:
    BranchBase<op, (outs), (ins brtarget:$imm24),
               !strconcat(instr_asm, "\t$imm24"), [(br bb:$imm24)], IIBranch> {
        let isBranch = 1;
        let isTerminator = 1;
        let isBarrier = 1;
        let hasDelaySlot = 0;
    }
    ...
def JMP : UncondBranch<0x26, "jmp">;
```

The pattern [(br bb:\$imm24)] in class UncondBranch is translated into the jmp machine instruction. Translating the other two cpu0 instructions is more complicated than the simple one-to-one IR to machine instruction translation we have seen so far. To handle this chained IR to machine instruction translation, we define the following pattern,

**LLVMBackendTutorialExampleCode/Chapter7\_1/Cpu0InstrInfo.td**

```
// brcond patterns
multiclass BrcondPats<RegisterClass RC, Instruction JEQOp, Instruction JNEOp,
                      Instruction JLTOp, Instruction JGTOp, Instruction JLEOp,
                      Instruction JGEOp, Instruction CMPOp, Register ZEROReg> {
  ...
  def : Pat<(brcond (i32 (setne RC:$lhs, RC:$rhs)), bb:$dst),
            (JNEOp (CMPOp RC:$lhs, RC:$rhs), bb:$dst)>;
  ...
  def : Pat<(brcond RC:$cond, bb:$dst),
            (JNEOp (CMPOp RC:$cond, ZEROReg), bb:$dst)>;
}
```

The definition above supports the register-to-register compare (setne RC:\$lhs, RC:\$rhs). There are other compare patterns, such as seteq, setlt, and so on. In addition to seteq, setne, ..., we define setueq, setune, ..., following the Mips code, even though we did not find where setune comes from. We tried declaring the variable as unsigned int, but clang still generates setne instead of setune. Patterns are searched in the order in which they appear in the file. The last pattern, (brcond RC:\$cond, bb:\$dst), means branch to \$dst if \$cond != 0; in the cpu0 translation it is equivalent to (JNEOp (CMPOp RC:\$cond, ZEROReg), bb:\$dst).
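One likely reason clang keeps emitting setne for unsigned operands is that equality does not depend on signedness, while ordering does. The following sketch, with hypothetical helper names, compares the same 32-bit patterns under both interpretations.

```cpp
#include <cassert>
#include <cstdint>

// Ordering compare with both operands interpreted as signed.
inline bool signedLess(uint32_t a, uint32_t b) {
  return static_cast<int32_t>(a) < static_cast<int32_t>(b);
}

// Ordering compare with both operands interpreted as unsigned.
inline bool unsignedLess(uint32_t a, uint32_t b) { return a < b; }

// Inequality looks only at the bit pattern, so a single predicate
// covers signed and unsigned operands alike.
inline bool notEqual(uint32_t a, uint32_t b) { return a != b; }
```

With the bit pattern `0xFFFFFFFF` on the left and `1` on the right, the signed ordering sees -1 < 1 while the unsigned ordering sees 4294967295 > 1, yet the not-equal result is the same in both views.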

The CMP instruction writes its result to register SW, and JNE then checks the condition based on the SW status, as shown in [Figure 7.1](#). Since SW belongs to a different register class, the result is still correct even if an instruction is inserted between CMP and JNE, as follows,



Figure 7.1: JNE (CMP \$r2, \$r3)

```
cmp %2, %3
addiu $r1, $r2, 3    // $r1 is never allocated to $SW
jne BasicBlock_02
```
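The reason the inserted addiu is harmless can be shown with a toy machine model in which the status word is stored separately from the general registers. This is only a sketch: the `Cpu` struct, its flag names and the helper functions are hypothetical and do not describe the real cpu0 SW bit layout.

```cpp
#include <cassert>
#include <cstdint>

// Toy machine state: general registers plus a separate status word.
struct Cpu {
  int32_t r[16] = {};
  bool swZero = false;      // set by cmp when the operands are equal
  bool swNegative = false;  // set by cmp when lhs < rhs
};

// cmp writes only the status flags.
inline void cmp(Cpu &cpu, int lhs, int rhs) {
  assert(lhs >= 0 && lhs < 16 && rhs >= 0 && rhs < 16);
  cpu.swZero = cpu.r[lhs] == cpu.r[rhs];
  cpu.swNegative = cpu.r[lhs] < cpu.r[rhs];
}

// addiu writes only a general register, never the status flags.
inline void addiu(Cpu &cpu, int dst, int src, int32_t imm) {
  assert(dst >= 0 && dst < 16 && src >= 0 && src < 16);
  cpu.r[dst] = cpu.r[src] + imm;
}

// jne reads only the status flags.
inline bool jneTaken(const Cpu &cpu) { return !cpu.swZero; }
```

Because `addiu` can only write one of the general registers, the flags set by `cmp` are still intact when `jneTaken` reads them.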

The reserved registers are set by the following function, which we defined before,

**LLVMBackendTutorialExampleCode/Chapter7\_1/Cpu0RegisterInfo.cpp**

```
// pure virtual method
BitVector Cpu0RegisterInfo::
getReservedRegs(const MachineFunction &MF) const {
  static const uint16_t ReservedCPURegs[] = {
    Cpu0::ZERO, Cpu0::AT, Cpu0::SP, Cpu0::LR, Cpu0::PC
  };
  BitVector Reserved(getNumRegs());
  typedef TargetRegisterClass::iterator RegIter;

  for (unsigned I = 0; I < array_lengthof(ReservedCPURegs); ++I)
    Reserved.set(ReservedCPURegs[I]);

  const Cpu0FunctionInfo *Cpu0FI = MF.getInfo<Cpu0FunctionInfo>();
  // Reserve GP if globalBaseRegFixed()
  if (Cpu0FI->globalBaseRegFixed())
    Reserved.set(Cpu0::GP);

  return Reserved;
}
```
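The same idea can be tried outside LLVM with `std::bitset` standing in for `BitVector`. This is a minimal sketch: the register numbering in the `Reg` enum and the `isAllocatable` helper are made up for the illustration and are not the generated Cpu0 register numbers or an LLVM API.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical register numbering, used only in this sketch.
enum Reg : std::size_t {
  ZERO, AT, V0, V1, A0, A1, T9, S0, S1, S2, GP, FP, SP, LR, PC, SW, NumRegs
};

// Mark the registers the allocator must never hand out. GP is reserved
// only when the global base register is fixed.
inline std::bitset<NumRegs> getReservedRegs(bool globalBaseRegFixed) {
  static const Reg reserved[] = {ZERO, AT, SP, LR, PC};
  std::bitset<NumRegs> bits;
  for (Reg r : reserved)
    bits.set(r);
  if (globalBaseRegFixed)
    bits.set(GP);
  return bits;
}

// A register may be allocated only if its bit is clear.
inline bool isAllocatable(const std::bitset<NumRegs> &reserved, Reg r) {
  assert(r < NumRegs);
  return !reserved.test(r);
}
```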

Although the following definition in Cpu0RegisterInfo.td has no real effect on the reserved registers, you should mark the reserved registers in it with a comment for readability. SW is placed in a separate register class to prevent the SW register from being allocated as a register used by other instructions. copyPhysReg() is called when DestReg and SrcReg belong to different register classes. As the comment says, the only possibility for (DestReg==SW, SrcReg==CPU0Regs) is “cmp \$SW, \$ZERO, \$rc”.

**LLVMBackendTutorialExampleCode/Chapter7\_1/Cpu0RegisterInfo.td**

```

//=====
// Register Classes
//=====

def CPURegs : RegisterClass<"Cpu0", [i32], 32, (add
    // Return Values and Arguments
    V0, V1, A0, A1,
    // Not preserved across procedure calls
    T9,
    // Callee save
    S0, S1, S2,
    // Reserved
    ZERO, AT, GP, FP, SP, LR, PC)>;
...
// Status Registers
def SR    : RegisterClass<"Cpu0", [i32], 32, (add SW)>;

```

### LLVMBackendTutorialExampleCode/Chapter7\_1/Cpu0InstrInfo.cpp

```

// Called when DestReg and SrcReg belong to different Register Class.
void Cpu0InstrInfo::
copyPhysReg(MachineBasicBlock &MBB,
            MachineBasicBlock::iterator I, DebugLoc DL,
            unsigned DestReg, unsigned SrcReg,
            bool KillSrc) const {
    unsigned Opc = 0, ZeroReg = 0;

    if (Cpu0::CPURegsRegClass.contains(DestReg)) { // Copy to CPU Reg.
    ...
    else if (SrcReg == Cpu0::SW) // add $ra, $ZERO, $SW
        Opc = Cpu0::ADD, ZeroReg = Cpu0::ZERO;
    }
    else if (Cpu0::CPURegsRegClass.contains(SrcReg)) { // Copy from CPU Reg.
    ...
    // Only possibility in (DestReg==SW, SrcReg==CPU0Regs) is
    // cmp $SW, $ZERO, $rc
    else if (DestReg == Cpu0::SW)
        Opc = Cpu0::CMP, ZeroReg = Cpu0::ZERO;
    }
}

```

Chapter7\_1/ includes support for control flow statements. Run it with the following llc options to get the obj file, and dump its content with hexdump, as follows,

```

118-165-79-206:InputFiles Jonathan$ cat ch7_1_1.cpu0.s
...
    ld $3, 32($sp)
    cmp $3, $2
    jne $BB0_2
    jmp $BB0_1
$BB0_1:                      # %if.then
    ld $2, 32($sp)
    addiu $2, $2, 1
    st $2, 32($sp)
$BB0_2:                      # %if.end
    ld $2, 28($sp)
...
118-165-79-206:InputFiles Jonathan$ /Users/Jonathan/llvm/test/
cmake_debug_build/bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=obj
ch7_1_1.bc -o ch7_1_1.cpu0.o

118-165-79-206:InputFiles Jonathan$ hexdump ch7_1_1.cpu0.o
    // jmp offset is 0x10=16 bytes which is correct
00000080 ..... 10 20 20 02 21 00 00 10
00000090 26 00 00 00 .....

```

The immediate value of jne (op 0x21) is 16, while the offset between jne and \$BB0\_2 is 20 (5 words = 5\*4 bytes). Suppose the jne address is X; then the label \$BB0\_2 is at X+20. According to the cpu0 web site, Cpu0 is a RISC CPU with a 3-stage pipeline: fetch, decode and execution. Like Mips, cpu0 executes branch instructions at the decode stage. After the jne instruction is fetched, the PC (Program Counter) is X+4, since cpu0 updates the PC at the fetch stage. Because the jne branch executes at the decode stage, the \$BB0\_2 address equals PC+16. We list and explain this again as follows,

```

        // Fetch instruction stage for jne instruction. The fetch stage
        // can be divided into 2 cycles. First cycle fetch the
        // instruction. Second cycle adjust PC = PC+4.
        jne $BB0_2 // Do jne compare in decode stage. PC = X+4 at this stage.
        // When jne immediate value is 16, PC = PC+16. It will fetch
        // X+20 which equal to label $BB0_2 instruction, ld $2, 28($sp).
        jmp $BB0_1
$BB0_1:                                # %if.then
        ld $2, 32($sp)
        addiu $2, $2, 1
        st $2, 32($sp)
$BB0_2:                                # %if.end
        ld $2, 28($sp)

```

If cpu0 did the “**jne**” compare in the execution stage, we would have to set **PC**=**PC**+12 in this example, that is, the offset between \$BB0\_2 and jne (20) minus 8.
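The offset arithmetic above can be written down as two small functions. This is a sketch under the stated assumptions: 4-byte instructions, and a PC that has advanced by 4 when the branch resolves in decode or by 8 if it resolved in execute. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Immediate to encode when the branch resolves in the decode stage,
// where PC already equals branchAddr + 4.
inline int32_t branchImmDecodeStage(uint32_t branchAddr, uint32_t targetAddr) {
  assert(branchAddr % 4 == 0 && targetAddr % 4 == 0);
  return static_cast<int32_t>(targetAddr - (branchAddr + 4));
}

// Hypothetical variant: the branch resolves in the execution stage,
// where PC would already equal branchAddr + 8.
inline int32_t branchImmExecuteStage(uint32_t branchAddr, uint32_t targetAddr) {
  assert(branchAddr % 4 == 0 && targetAddr % 4 == 0);
  return static_cast<int32_t>(targetAddr - (branchAddr + 8));
}
```

With the jne at address X and `$BB0_2` at X+20, the first function gives 16 and the second gives 12, matching the two cases discussed in the text.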

Cpu0 is designed for teaching and does not consider performance. In real CPU design, conditional branches are important for performance. According to benchmark information, one branch instruction is encountered for every 7 instructions on average. Cpu0 takes 2 instructions for a conditional branch (**jne(cmp...)**), while Mips uses one instruction (**bne**).

Finally, we list the code added for full support of control flow statements,

### LLVMBackendTutorialExampleCode/Chapter7\_1/MCTargetDesc/Cpu0MCCodeEmitter.cpp

```
/// getBranchTargetOpValue - Return binary encoding of the branch
/// target operand. If the machine operand requires relocation,
/// record the relocation and return zero.
unsigned Cpu0MCCodeEmitter::
getBranchTargetOpValue(const MCInst &MI, unsigned OpNo,
                       SmallVectorImpl<MCFixup> &Fixups) const {

  const MCOperand &MO = MI.getOperand(OpNo);
  assert(MO.isExpr() && "getBranchTargetOpValue expects only expressions");

  const MCExpr *Expr = MO.getExpr();
  Fixups.push_back(MCFixup::Create(0, Expr,
                                   MCFixupKind(Cpu0::fixup_Cpu0_PC24)));
  return 0;
}
```

### LLVMBackendTutorialExampleCode/Chapter7\_1/Cpu0MCInstLower.cpp

```
MCOperand Cpu0MCInstLower::LowerSymbolOperand(const MachineOperand &MO,
                                              MachineOperandType MOTy,
                                              unsigned Offset) const {
  ...
  switch(MO.getTargetFlags()) {
  default:                   llvm_unreachable("Invalid target flag!");
  case Cpu0II::MO_NO_FLAG:   Kind = MCSymbolRefExpr::VK_None; break;
  ...
  }
  ...
  switch (MOTy) {
  case MachineOperand::MO_MachineBasicBlock:
    Symbol = MO.getMBB()->getSymbol();
    break;
  ...
}

MCOperand Cpu0MCInstLower::LowerOperand(const MachineOperand& MO,
                                        unsigned offset) const {
  MachineOperandType MOTy = MO.getType();

  switch (MOTy) {
  default: llvm_unreachable("unknown operand type");
  case MachineOperand::MO_Register:
  ...
  case MachineOperand::MO_MachineBasicBlock:
  case MachineOperand::MO_GlobalAddress:
  case MachineOperand::MO_BlockAddress:
  ...
  }
  ...
}
```

### LLVMBackendTutorialExampleCode/Chapter7\_1/Cpu0InstrInfo.cpp

```

// Called when DestReg and SrcReg belong to different Register Class.
void Cpu0InstrInfo::
copyPhysReg(MachineBasicBlock &MBB,
            MachineBasicBlock::iterator I, DebugLoc DL,
            unsigned DestReg, unsigned SrcReg,
            bool KillSrc) const {
  if (Cpu0::CPURegsRegClass.contains(DestReg)) { // Copy to CPU Reg.
    ...
  } else if (SrcReg == Cpu0::SW) // add $ra, $ZERO, $SW
    Opc = Cpu0::ADD, ZeroReg = Cpu0::ZERO;
  } else if (Cpu0::CPURegsRegClass.contains(SrcReg)) { // Copy from CPU Reg.
    ...
    // Only possibility in (DestReg==SW, SrcReg==CPU0Regs) is
    // cmp $SW, $ZERO, $rc
    else if (DestReg == Cpu0::SW)
      Opc = Cpu0::CMP, ZeroReg = Cpu0::ZERO;
  }
  ...
}

```

### LLVMBackendTutorialExampleCode/Chapter7\_1/Cpu0ISelLowering.cpp

```
Cpu0TargetLowering::
Cpu0TargetLowering(Cpu0TargetMachine &TM)
  : TargetLowering(TM, new Cpu0TargetObjectFile()),
    Subtarget(&TM.getSubtarget<Cpu0Subtarget>()) {
  ...
  // Used by legalize types to correctly generate the setcc result.
  // Without this, every float setcc comes with a AND/OR with the result,
  // we don't want this, since the fpcmp result goes to a flag register,
  // which is used implicitly by brcond and select operations.
  AddPromotedToType(ISD::SETCC, MVT::i1, MVT::i32);
  ...
  setOperationAction(ISD::BRCOND, MVT::Other, Custom);

  // Operations not directly supported by Cpu0.
  setOperationAction(ISD::BR_CC, MVT::i32, Expand);
  ...
}
```

### LLVMBackendTutorialExampleCode/Chapter7\_1/Cpu0InstrFormats.td

```

//=====
// Format J instruction class in Cpu0 : </opcode/address/>
//=====

class FJ<bits<8> op, dag outs, dag ins, string asmstr, list<dag> pattern,
        InstrItinClass itin>: Cpu0Inst<outs, ins, asmstr, pattern, itin, FrmJ>
{
    bits<24> addr;

    let Opcode = op;

    let Inst{23-0} = addr;
}

```
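The J format above can be exercised with a small encoder and decoder. This is a sketch, assuming the 8-bit opcode occupies the top byte of the 32-bit word (which is how the bytes `21 00 00 10` and `26 00 00 00` in the earlier hexdump read) and that the 24-bit field is a signed byte offset; `encodeFJ` and `decodeFJOffset` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Pack an opcode and a signed 24-bit offset into one J-format word.
inline uint32_t encodeFJ(uint8_t opcode, int32_t offset) {
  assert(offset >= -(1 << 23) && offset < (1 << 23));
  return (static_cast<uint32_t>(opcode) << 24) |
         (static_cast<uint32_t>(offset) & 0x00FFFFFFu);
}

// Recover the signed offset by sign-extending the low 24 bits.
inline int32_t decodeFJOffset(uint32_t word) {
  uint32_t field = word & 0x00FFFFFFu;
  if (field & 0x00800000u)
    field |= 0xFF000000u;
  return static_cast<int32_t>(field);
}
```

Encoding jne (0x21) with offset 16 gives `0x21000010`, and jmp (0x26) with offset 0 gives `0x26000000`, the same byte sequences that appear in the hexdump.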

### LLVMBackendTutorialExampleCode/Chapter7\_1/Cpu0InstrInfo.td

```

// Cpu0InstrInfo.td
// Instruction operand types
def brtarget : Operand<OtherVT> {
    let EncoderMethod = "getBranchTargetOpValue";
    let OperandType = "OPERAND_PCREL";
    let DecoderMethod = "DecodeBranchTarget";
}

...
/// Conditional Branch
class CBranch<bits<8> op, string instr_asm, RegisterClass RC,
            list<Register> UseRegs>:
    FJ<op, (outs), (ins RC:$ra, brtarget:$addr),
                !strconcat(instr_asm, "\t$addr"),
                [(brcond RC:$ra, bb:$addr)], IIBranch> {
        let isBranch = 1;
        let isTerminator = 1;
        let hasDelaySlot = 0;
        let neverHasSideEffects = 1;
    }

    // Unconditional branch, such as JMP
    class UncondBranch<bits<8> op, string instr_asm>:
        FJ<op, (outs), (ins brtarget:$addr),
                    !strconcat(instr_asm, "\t$addr"), [(br bb:$addr)], IIBranch> {
            let isBranch = 1;

```

```

let isTerminator = 1;
let isBarrier = 1;
let hasDelaySlot = 0;
let DecoderMethod = "DecodeJumpRelativeTarget";
}

...
/// Jump and Branch Instructions
def JEQ      : CBranch<0x20, "jeq", CPURegs>;
def JNE      : CBranch<0x21, "jne", CPURegs>;
def JLT      : CBranch<0x22, "jlt", CPURegs>;
def JGT      : CBranch<0x23, "jgt", CPURegs>;
def JLE      : CBranch<0x24, "jle", CPURegs>;
def JGE      : CBranch<0x25, "jge", CPURegs>;
def JMP      : UncondBranch<0x26, "jmp">;
...

// brcond patterns
multiclass BrcondPats<RegisterClass RC, Instruction JEQOp,
Instruction JNEOp, Instruction JLTOp, Instruction JGTOp,
Instruction JLEOp, Instruction JGEOp, Instruction CMPOp,
Register ZEROReg> {
def : Pat<(brcond (i32 (seteq RC:$lhs, RC:$rhs)), bb:$dst),
(JEQOp (CMPOp RC:$lhs, RC:$rhs), bb:$dst)>;
def : Pat<(brcond (i32 (setueq RC:$lhs, RC:$rhs)), bb:$dst),
(JEQOp (CMPOp RC:$lhs, RC:$rhs), bb:$dst)>;
def : Pat<(brcond (i32 (setne RC:$lhs, RC:$rhs)), bb:$dst),
(JNEOp (CMPOp RC:$lhs, RC:$rhs), bb:$dst)>;
def : Pat<(brcond (i32 (setune RC:$lhs, RC:$rhs)), bb:$dst),
(JNEOp (CMPOp RC:$lhs, RC:$rhs), bb:$dst)>;
def : Pat<(brcond (i32 (setlt RC:$lhs, RC:$rhs)), bb:$dst),
(JLTOp (CMPOp RC:$lhs, RC:$rhs), bb:$dst)>;
def : Pat<(brcond (i32 (setult RC:$lhs, RC:$rhs)), bb:$dst),
(JLTOp (CMPOp RC:$lhs, RC:$rhs), bb:$dst)>;
def : Pat<(brcond (i32 (setgt RC:$lhs, RC:$rhs)), bb:$dst),
(JGTOp (CMPOp RC:$lhs, RC:$rhs), bb:$dst)>;
def : Pat<(brcond (i32 (setugt RC:$lhs, RC:$rhs)), bb:$dst),
(JGTOp (CMPOp RC:$lhs, RC:$rhs), bb:$dst)>;
def : Pat<(brcond (i32 (setle RC:$lhs, RC:$rhs)), bb:$dst),
(JLEOp (CMPOp RC:$rhs, RC:$lhs), bb:$dst)>;
def : Pat<(brcond (i32 (setule RC:$lhs, RC:$rhs)), bb:$dst),
(JLEOp (CMPOp RC:$rhs, RC:$lhs), bb:$dst)>;
def : Pat<(brcond (i32 (setge RC:$lhs, RC:$rhs)), bb:$dst),
(JGEOp (CMPOp RC:$lhs, RC:$rhs), bb:$dst)>;
def : Pat<(brcond (i32 (setuge RC:$lhs, RC:$rhs)), bb:$dst),
(JGEOp (CMPOp RC:$lhs, RC:$rhs), bb:$dst)>;

def : Pat<(brcond RC:$cond, bb:$dst),
(JNEOp (CMPOp RC:$cond, ZEROReg), bb:$dst)>;
}

defm : BrcondPats<CPURegs, JEQ, JNE, JLT, JGT, JLE, JGE, CMP, ZERO>;

```

ch7\_1\_2.cpp tests “**nested if**”. ch7\_1\_3.cpp tests the “**for loop**” as well as “**while loop**”, “**continue**”, “**break**” and “**goto**”. ch7\_1\_6.cpp tests “**goto**”. You can run them if you would like to test more.

Finally, Chapter7\_1/ supports local array definitions by adding an empty LowerCall() function to Cpu0ISelLowering.cpp as follows,

**LLVMBackendTutorialExampleCode/Chapter7\_1/Cpu0ISelLowering.cpp**

```
// Cpu0ISelLowering.cpp
SDValue
Cpu0TargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
                               SmallVectorImpl<SDValue> &InVals) const {
    return CLI.Chain;
}
```

With this LowerCall(), ch7\_1\_4.cpp (through ch7\_1\_4.bc) can be translated to ch7\_1\_4.cpu0.s as follows,

**LLVMBackendTutorialExampleCode/InputFiles/ch7\_1\_4.cpp**

```
int main()
{
    int a[3]={0, 1, 2};

    return 0;
}
```

```
; ModuleID = 'ch7_1_4.bc'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-
f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128-n8:16:32-S128"
target triple = "i386-apple-macosx10.8.0"

@_ZZ4mainE1a = private unnamed_addr constant [3 x i32] [i32 0, i32 1, i32 2],
align 4

define i32 @_main() nounwind ssp {
entry:
    %retval = alloca i32, align 4
    %a = alloca [3 x i32], align 4
    store i32 0, i32* %retval
    %0 = bitcast [3 x i32]* %a to i8*
    call void @llvm.memcpy.p0i8.p0i8.i32(i8* %0, i8* bitcast ([3 x i32]*
        @_ZZ4mainE1a to i8*), i32 12, i32 4, i1 false)
    ret i32 0
}

118-165-79-206:InputFiles Jonathan$ cat ch7_1_4.cpu0.s
.section .mdebug.abi32
.previous
.file "ch7_1_4.bc"
.text
.globl main
.align 2
.type main,@function
.ent main                # @_main
main:
    .frame $sp,24,$lr
    .mask 0x00000000,0
    .set noreorder
    .cupload $t9
    .set nomacro
# BB#0:                      # @_entry
    addiu $sp, $sp, -24
    ld $2, %got(__stack_chk_guard) ($gp)
```

```

ld  $3, 0($2)
st  $3, 20($sp)
addiu $3, $zero, 0
st  $3, 16($sp)
ld  $3, %got($_ZZ4mainE1a)($gp)
addiu $3, $3, %lo($_ZZ4mainE1a)
ld  $4, 8($3)
st  $4, 12($sp)
ld  $4, 4($3)
st  $4, 8($sp)
ld  $3, 0($3)
st  $3, 4($sp)
ld  $2, 0($2)
ld  $3, 20($sp)
cmp $2, $3
jne $BB0_2
jmp $BB0_1
$BB0_1:                                # %SP_return
    addiu $sp, $sp, 24
    ret $lr
$BB0_2:                                # %CallStackCheckFailBlk
    .set    macro
    .set    reorder
    .end    main
$tmp1:
    .size   main, ($tmp1)-main

    .type   $_ZZ4mainE1a,@object      # @_ZZ4mainE1a
    .section .rodata,"a",@progbits
    .align  2
$_ZZ4mainE1a:
    .4byte 0                         # 0x0
    .4byte 1                         # 0x1
    .4byte 2                         # 0x2
    .size   $_ZZ4mainE1a, 12

```

ch7\_1\_5.cpp tests the C operators **==**, **!=**, **&&** and **||**. No code needs to be added since we have taken care of them before, but they can be tested only once control flow statement support is ready, as follows,

### LLVMBackendTutorialExampleCode/InputFiles/ch7\_1\_5.cpp

```
int main()
{
    unsigned int a = 0;
    int b = 1;
    int c = 2;

    if ((a == 0 && b == 2) || (c != 2)) {
        a++;
    }

    return 0;
}
```

```
118-165-78-230:InputFiles Jonathan$ clang -c ch7_1_5.cpp -emit-llvm -o ch7_1_5.bc
118-165-78-230:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm ch7_1_5.bc -o
ch7_1_5.cpu0.s
```

```
118-165-78-230:InputFiles Jonathan$ cat ch7_1_5.cpu0.s
.section .mdebug.abi32
.previous
.file "ch7_1_5.bc"
.text
.globl main
.align 2
.type main,@function
.ent main          # @main
main:
.cfi_startproc
.frame $sp,16,$lr
.mask 0x00000000,0
.set noreorder
.set nomacro
# BB#0:
addiu $sp, $sp, -16
$tmp1:
.cfi_def_cfa_offset 16
addiu $3, $zero, 0
st $3, 12($sp)
st $3, 8($sp)
addiu $2, $zero, 1
st $2, 4($sp)
addiu $2, $zero, 2
st $2, 0($sp)
ld $4, 8($sp)
cmp $4, $3
jne $BB0_2          // a != 0
jmp $BB0_1
$BB0_1:             // a == 0
ld $3, 4($sp)
cmp $3, $2
jeq $BB0_3          // b == 2
jmp $BB0_2
$BB0_2:
ld $3, 0($sp)
cmp $3, $2          // c == 2
jeq $BB0_4
jmp $BB0_3
$BB0_3:             // (a == 0 && b == 2) || (c != 2)
ld $2, 8($sp)
addiu $2, $2, 1     // a++
st $2, 8($sp)
$BB0_4:
addiu $sp, $sp, 16
ret $lr
.set macro
.set reorder
.end main
$tmp2:
.size main, ($tmp2)-main
.cfi_endproc

```
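The branch structure of the listing above can be checked by writing it with explicit gotos and comparing it against the original condition. This is only a sketch: the function names are hypothetical, and the labels simply mirror `$BB0_1` to `$BB0_4`.

```cpp
#include <cassert>

// Same branch structure as the generated assembly, written with gotos.
inline unsigned runBranches(unsigned a, int b, int c) {
  if (a != 0) goto bb2;  // jne $BB0_2
  // $BB0_1: a == 0
  if (b == 2) goto bb3;  // jeq $BB0_3
bb2:
  if (c == 2) goto bb4;  // jeq $BB0_4
bb3:
  a++;                   // body of the if statement
bb4:
  return a;
}

// The source-level condition from ch7_1_5.cpp.
inline unsigned runSource(unsigned a, int b, int c) {
  if ((a == 0 && b == 2) || (c != 2)) {
    a++;
  }
  return a;
}
```

For the values used in ch7\_1\_5.cpp (a = 0, b = 1, c = 2) both versions skip the increment, and they agree for the other combinations of true and false sub-conditions as well.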

## 7.2 RISC CPU knowledge

As mentioned in the previous section, cpu0 is a RISC (Reduced Instruction Set Computer) CPU with a 3-stage pipeline. RISC CPUs are everywhere. Even X86, a CISC (Complex Instruction Set Computer) design, is RISC inside: it translates CISC instructions into micro-instructions, which are pipelined as in a RISC. Knowledge of RISC will serve you well in compiler design. Below we list two excellent books we have read that contain the real RISC CPU knowledge needed, for reference. Of course, there are many books on computer architecture, and some of them contain the RISC CPU knowledge needed, but these two are the ones we read.

Computer Organization and Design: The Hardware/Software Interface (The Morgan Kaufmann Series in Computer Architecture and Design)

Computer Architecture: A Quantitative Approach (The Morgan Kaufmann Series in Computer Architecture and Design)

“Computer Organization and Design: The Hardware/Software Interface” (4 editions at the time of writing) is the introductory, simpler one. “Computer Architecture: A Quantitative Approach” (5 editions at the time of writing) is more complicated and goes deeper into CPU architecture.

The two books above use the Mips CPU as their example, since Mips is more RISC-like than other CPUs on the market. The ARM series of CPUs dominates the embedded market, especially in mobile phones and other portable devices. The following book, which I am reading now, is also good.

ARM System Developer’s Guide: Designing and Optimizing System Software (The Morgan Kaufmann Series in Computer Architecture and Design).

# FUNCTION CALL

This chapter adds support for subroutine/function calls to the backend code translation. A lot of code is needed for function calls. We break it down according to the interfaces llvm supplies, for ease of explanation. The chapter starts by introducing the Mips stack frame structure, since we borrow many parts of the ABI from it. Although each CPU has its own ABI, the ABIs of most RISC CPUs are similar. In addition to function calls with a fixed number of arguments, cpu0 also supports a variable number of arguments, since C/C++ supports this feature. Internet links to the Mips ABI and assembly language manual are supplied in this chapter for your reference. Section “4.5 DAG Lowering” of tricore\_llvm.pdf contains some knowledge about the lowering process, and section “4.5.1 Calling Conventions” of tricore\_llvm.pdf is related material you can reference.

This chapter is more complicated than any of the previous chapters. It includes the stack frame and the related ABI support. If you have trouble reading the stack frame illustrated in the first three sections of this chapter, you can read appendix B, “Procedure Call Convention”, of the book “Computer Organization and Design”, which is listed in section “RISC CPU knowledge” of chapter “Control flow statement”<sup>1</sup>; “Run Time Memory” in a compiler book; or “Function Call Sequence” and “Stack Frame” in the Mips ABI.

## 8.1 Mips stack frame

The first step in designing the cpu0 function call is deciding how to pass arguments. There are two options. The first is to pass all arguments on the stack. The second is to pass arguments in registers reserved for function arguments, and to put the remaining arguments on the stack when there are more arguments than reserved registers. For example, Mips passes the first 4 arguments in registers \$a0, \$a1, \$a2 and \$a3, and any further arguments on the stack. Figure 8.1 shows the Mips stack frame.
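The second option can be sketched in a few lines of C++. This is only a toy model, not LLVM code: the names `ArgLoc` and `assignMipsO32` are hypothetical, and the sketch assumes 4-byte integer arguments only, with the first 16 bytes of the argument area kept reserved for the four register arguments (as the offsets 16 and 20 in the Mips output later in this section suggest).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of where one integer argument lives.
struct ArgLoc {
    bool inRegister;   // true: passed in $a0..$a3
    unsigned value;    // register number ($4..$7) or byte offset from $sp
};

// Toy model of the Mips rule for 4-byte integer arguments:
// arguments 1-4 go to registers $4..$7 ($a0..$a3); later arguments go
// to the caller's stack at offset 4 * index, because the first 16 bytes
// stay reserved for the register arguments.
std::vector<ArgLoc> assignMipsO32(std::size_t numArgs) {
    std::vector<ArgLoc> locs;
    for (std::size_t i = 0; i < numArgs; ++i) {
        if (i < 4)
            locs.push_back({true, static_cast<unsigned>(4 + i)});
        else
            locs.push_back({false, static_cast<unsigned>(4 * i)});
    }
    assert(locs.size() == numArgs);  // sanity check: one location per argument
    return locs;
}
```

For the six arguments of `sum_i()` in the example that follows, this model places arguments 1 to 4 in \$4..\$7 and arguments 5 and 6 at stack offsets 16 and 20.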

Run `llc -march=mips` on `ch8_1.bc` and you will get the following result. See the comments marked “//”.

### LLVMBackendTutorialExampleCode/InputFiles/ch8\_1.cpp

```
int gI = 100;

int sum_i(int x1, int x2, int x3, int x4, int x5, int x6)
{
    int sum = gI + x1 + x2 + x3 + x4 + x5 + x6;

    return sum;
}

int main()
{
    int a = sum_i(1, 2, 3, 4, 5, 6);

    return a;
}
```

<sup>1</sup> <http://jonathan2251.github.com/lbd/ctrlflow.html#risc-cpu-knowledge>

| Base     | Offset | Contents                                                    | Frame                 |
|----------|--------|-------------------------------------------------------------|-----------------------|
| old \$sp | +16    | unspecified                                                 | <i>High addresses</i> |
|          |        | ...                                                         |                       |
|          |        | variable size                                               |                       |
|          |        | (if present)<br>incoming arguments<br>passed in stack frame |                       |
| \$sp     | +0     | space for incoming<br>arguments 1-4                         | Previous              |
|          |        | locals and<br>temporaries                                   |                       |
|          |        | general register<br>save area                               |                       |
|          |        | floating-point<br>register save area                        |                       |
| \$sp     | +0     | argument<br>build area                                      | Current               |
|          |        |                                                             |                       |
|          |        |                                                             | <i>Low addresses</i>  |

Figure 8.1: Mips stack frame

```
118-165-78-230:InputFiles Jonathan$ clang -c ch8_1.cpp -emit-llvm -o ch8_1.bc
118-165-78-230:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=mips -relocation-model=pic -filetype=asm ch8_1.bc -o
ch8_1.mips.s
118-165-78-230:InputFiles Jonathan$ cat ch8_1.mips.s
    .section .mdebug.abi32
    .previous
    .file "ch8_1.bc"
    .text
    .globl _Z5sum_iiiiiii
    .align 2
    .type _Z5sum_iiiiiii,@function
    .set nomips16          # @_Z5sum_iiiiiii
    .ent _Z5sum_iiiiiii
_Z5sum_iiiiiii:
    .cfi_startproc
    .frame $sp,32,$ra
    .mask 0x00000000,0
    .fmask 0x00000000,0
    .set noreorder
    .set nomacro
    .set noat
# BB#0:
    addiu $sp, $sp, -32
$tmp1:
    .cfi_def_cfa_offset 32
    sw $4, 28($sp)
    sw $5, 24($sp)
    sw $6, 20($sp)
    sw $7, 16($sp)
    lw $1, 48($sp) // load argument 5
    sw $1, 12($sp)
    lw $1, 52($sp) // load argument 6
    sw $1, 8($sp)
    lw $2, 24($sp)
    lw $3, 28($sp)
    addu $2, $3, $2
    lw $3, 20($sp)
    addu $2, $2, $3
    lw $3, 16($sp)
    addu $2, $2, $3
    lw $3, 12($sp)
    addu $2, $2, $3
    addu $2, $2, $1
    sw $2, 4($sp)
    jr $ra
    addiu $sp, $sp, 32
    .set at
    .set macro
    .set reorder
    .end _Z5sum_iiiiiii
$tmp2:
    .size _Z5sum_iiiiii, ($tmp2)-_Z5sum_iiiiii
.cfi_endproc

.globl main
.align 2
.type main,@function
.set nomips16           # @main
.ent main
main:
.cfi_startproc
.frame $sp,40,$ra
.mask 0x80000000,-4
.fmask 0x00000000,0
.set noreorder
.set nomacro
.set noat
# BB#0:
lui $2, %hi(_gp_disp)
addiu $2, $2, %lo(_gp_disp)
addiu $sp, $sp, -40
$tmp5:
.cfi_def_cfa_offset 40
sw $ra, 36($sp)          # 4-byte Folded Spill
$tmp6:
.cfi_offset 31, -4
addu $gp, $2, $25
sw $zero, 32($sp)
addiu $1, $zero, 6
sw $1, 20($sp) // Save argument 6 to 20($sp)
addiu $1, $zero, 5
sw $1, 16($sp) // Save argument 5 to 16($sp)
lw $25, %call16(_Z5sum_iiiiii)($gp)
addiu $4, $zero, 1 // Pass argument 1 to $4 (=a0)
addiu $5, $zero, 2 // Pass argument 2 to $5 (=a1)
addiu $6, $zero, 3
jalr $25
addiu $7, $zero, 4
sw $2, 28($sp)
lw $ra, 36($sp)          # 4-byte Folded Reload
jr $ra
addiu $sp, $sp, 40
.set at
.set macro
.set reorder
.end main
$tmp7:
.size main, ($tmp7)-main
.cfi_endproc

```

From the Mips assembly code generated above, we can see that main() passes the first 4 arguments in \$a0..\$a3 and saves the last 2 arguments to 16(\$sp) and 20(\$sp). Figure 8.2 shows the argument locations for the example code ch8\_1.cpp. sum\_i() loads argument 5 from 48(\$sp) because main() saved argument 5 to 16(\$sp) and the stack size of sum\_i() is 32, so incoming argument 5 is located at 16+32 = 48(\$sp).
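The offset arithmetic can be checked with a small simulation. The sketch below is a toy model (the `Stack` type and `calleeOffset` are hypothetical names, not part of LLVM or of any Mips tool chain): the caller stores a word at an offset from its stack pointer, the callee lowers the stack pointer by its frame size, and the same word is then found at the caller's offset plus the frame size.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy stack memory plus a stack pointer (hypothetical model).
struct Stack {
    std::map<std::uint32_t, std::int32_t> mem;  // address -> stored word
    std::uint32_t sp = 0x1000;                  // arbitrary start value

    // Store/load one word at offset($sp).
    void store(std::uint32_t offset, std::int32_t v) { mem[sp + offset] = v; }
    std::int32_t load(std::uint32_t offset) const { return mem.at(sp + offset); }

    // Function prologue/epilogue: move $sp down/up by the frame size.
    void enter(std::uint32_t frameSize) {
        assert(frameSize <= sp);
        sp -= frameSize;
    }
    void leave(std::uint32_t frameSize) { sp += frameSize; }
};

// Offset at which the callee sees a word the caller stored at callerOffset.
std::uint32_t calleeOffset(std::uint32_t callerOffset, std::uint32_t frameSize) {
    return callerOffset + frameSize;
}
```

With the numbers from ch8\_1.cpp, storing argument 5 at offset 16, entering a 32-byte frame and then loading from `calleeOffset(16, 32)`, that is offset 48, returns the stored value.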

The file 007-2418-003.pdf in <sup>2</sup> is the Mips assembly language manual. <sup>3</sup> is the Mips Application Binary Interface, which includes Figure 8.1.

<sup>2</sup> <https://www.dropbox.com/sh/2pkh1fewlq2zag9/OHnrYn2nOs/doc/MIPSproAssemblyLanguageProgrammerGuide>

<sup>3</sup> <http://www.linux-mips.org/pub/linux/mips/doc/ABI/mipsabi.pdf>



Figure 8.2: Mips arguments location in stack frame

## 8.2 Load incoming arguments from stack frame

As the last section shows, to support function calls we need to implement the argument passing mechanism with the stack frame. Before doing that, let's run the old version of the code, Chapter7\_1/, with ch8\_1.cpp and see what happens.

```
118-165-79-31:InputFiles Jonathan$ /Users/Jonathan/llvm/test/
cmake_debug_build/bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm
ch8_1.bc -o ch8_1.cpu0.s
Assertion failed: (InVals.size() == Ins.size() && "LowerFormalArguments didn't
emit the correct number of values!"), function LowerArguments, file /Users/
Jonathan/llvm/test/src/lib/CodeGen/SelectionDAG/
SelectionDAGBuilder.cpp, ...
...
0. Program arguments: /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm ch8_1.bc -o
ch8_1.cpu0.s
1. Running pass 'Function Pass Manager' on module 'ch8_1.bc'.
2. Running pass 'CPU0 DAG->DAG Pattern Instruction Selection' on function
'@_Z5sum_iiiiii'
Illegal instruction: 4
```

Since Chapter7\_1/ defines LowerFormalArguments() with an empty body, we get the error message above. Before defining LowerFormalArguments(), we have to choose how to pass arguments in a function call. We choose to pass all arguments in the stack frame. We don't reserve any dedicated registers for argument passing since cpu0 has only 16 registers while Mips has 32. Cpu0CallingConv.td defines the cpu0 passing rules as follows,

### LLVMBackendTutorialExampleCode/Chapter8\_1/Cpu0CallingConv.td

```
def RetCC_Cpu0EABI : CallingConv<[
    // i32 are returned in registers V0, V1, A0, A1
    CCIfType<[i32], CCAssignToReg<[V0, V1, A0, A1]>>
]>;

//=====
// Cpu0 EABI Calling Convention
//=====

def CC_Cpu0EABI : CallingConv<[
    // Promote i8/i16 arguments to i32.
    CCIfType<[i8, i16], CCPromoteToType<i32>>,
    // Integer values get stored in stack slots that are 4 bytes in
    // size and 4-byte aligned.
    CCIfType<[i32], CCAssignToStack<4, 4>>
]>;

//=====
// Cpu0 Calling Convention Dispatch
//=====

def CC_Cpu0 : CallingConv<[
    CCDelegateTo<CC_Cpu0EABI>
]>;

def RetCC_Cpu0 : CallingConv<[
    CCDelegateTo<RetCC_Cpu0EABI>
]>;

def CSR_O32 : CalleeSavedRegs<(add LR, FP,
                               (sequence "S%u", 2, 0))>;
```

As shown above, CC\_Cpu0 is the cpu0 calling convention; it delegates to CC\_Cpu0EABI, which holds the actual definition. The reason we don't define the calling convention directly in CC\_Cpu0 is that a real general-purpose CPU like Mips can have several calling conventions. Combined with the mechanism of section “Target Registration”<sup>4</sup> that llvm supplies, we can use different calling conventions for different targets. Although cpu0 has only one calling convention right now, defining it under a dedicated calling convention name (CC\_Cpu0EABI in this example) is a better solution for future system expansion, and it gives your calling convention a name. CC\_Cpu0EABI, as above, says that arguments are passed in the stack frame.
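The effect of the two CC\_Cpu0EABI rules can be illustrated with a small stand-alone C++ model. It is a sketch under stated assumptions rather than what TableGen generates: `Ty`, `Slot` and `assignCpu0Stack` are hypothetical names, and only the integer types named in the rules are modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical value types, standing in for LLVM's i8/i16/i32.
enum class Ty { I8, I16, I32 };

// Stack slot chosen for one argument.
struct Slot {
    Ty locType;        // type after promotion
    unsigned offset;   // byte offset of the slot in the argument area
};

// Toy model of the two CC_Cpu0EABI rules: promote i8/i16 to i32, then
// give every i32 its own 4-byte, 4-byte-aligned stack slot.
std::vector<Slot> assignCpu0Stack(const std::vector<Ty> &args) {
    std::vector<Slot> slots;
    unsigned nextOffset = 0;
    for (std::size_t i = 0; i < args.size(); ++i) {
        Ty promoted =
            (args[i] == Ty::I8 || args[i] == Ty::I16) ? Ty::I32 : args[i];
        slots.push_back({promoted, nextOffset});
        nextOffset += 4;  // CCAssignToStack<4, 4>: 4-byte slot, 4-byte aligned
    }
    assert(slots.size() == args.size());
    return slots;
}
```

Under this model, six integer arguments receive the offsets 0, 4, 8, 12, 16 and 20, and no argument is ever assigned to a register.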

Function LowerFormalArguments() is in charge of creating the incoming arguments of a function. We define it as follows,

### LLVMBackendTutorialExampleCode/Chapter8\_1/Cpu0ISelLowering.cpp

```
/// LowerFormalArguments - transform physical registers into virtual registers
/// and generate load operations for arguments places on the stack.
SDValue
Cpu0TargetLowering::LowerFormalArguments(SDValue Chain,
                                         CallingConv::ID CallConv,
                                         bool isVarArg,
                                         const SmallVectorImpl<ISD::InputArg> &Ins,
                                         DebugLoc dl, SelectionDAG &DAG,
                                         SmallVectorImpl<SDValue> &InVals)
                                          const {
  MachineFunction &MF = DAG.getMachineFunction();
  MachineFrameInfo *MFI = MF.getFrameInfo();
  Cpu0FunctionInfo *Cpu0FI = MF.getInfo<Cpu0FunctionInfo>();

  Cpu0FI->setVarArgsFrameIndex(0);

  // Used with vargs to accumulate store chains.
  std::vector<SDValue> OutChains;

  // Assign locations to all of the incoming arguments.
  SmallVector<CCValAssign, 16> ArgLocs;
  CCState CCInfo(CallConv, isVarArg, DAG.getMachineFunction(),
                 getTargetMachine(), ArgLocs, *DAG.getContext());

  CCInfo.AnalyzeFormalArguments(Ins, CC_Cpu0);

  Function::const_arg_iterator FuncArg =
    DAG.getMachineFunction().getFunction()->arg_begin();
  int LastFI = 0; // Cpu0FI->LastInArgFI is 0 at the entry of this function.

  for (unsigned i = 0, e = ArgLocs.size(); i != e; ++i, ++FuncArg) {
    CCValAssign &VA = ArgLocs[i];
    EVT ValVT = VA.getValVT();
    ISD::ArgFlagsTy Flags = Ins[i].Flags;
    bool IsRegLoc = VA.isRegLoc();

    if (Flags.isByVal()) {
      assert(Flags.getByValSize() &&
             "ByVal args of size 0 should have been ignored by front-end.");
      continue;
    }
    // sanity check
    assert(VA.isMemLoc());

    // The stack pointer offset is relative to the caller stack frame.
    LastFI = MFI->CreateFixedObject(ValVT.getSizeInBits()/8,
                                    VA.getLocMemOffset(), true);

    // Create load nodes to retrieve arguments from the stack
    SDValue FIN = DAG.getFrameIndex(LastFI, getPointerTy());
    InVals.push_back(DAG.getLoad(ValVT, dl, Chain, FIN,
                                 MachinePointerInfo::getFixedStack(LastFI),
                                 false, false, false, 0));
  }
  Cpu0FI->setLastInArgFI(LastFI);
  // All stores are grouped in one node to allow the matching between
  // the size of Ins and InVals. This only happens when on varg functions
  if (!OutChains.empty()) {
    OutChains.push_back(Chain);
    Chain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other,
                        &OutChains[0], OutChains.size());
  }
  return Chain;
}
```

<sup>4</sup> <http://jonathan2251.github.com/lbd/llvmstructure.html#target-registration>

To review section “Global variable”<sup>5</sup>: we handled global variable translation by first creating the IR DAG in LowerGlobalAddress(), and then doing instruction selection through the corresponding machine instruction DAG in Cpu0InstrInfo.td. LowerGlobalAddress() is called when llc meets a global variable access. LowerFormalArguments() works the same way. It is called when a function is entered. It gets the incoming argument information through CCInfo(CallConv, ..., ArgLocs, ...) before entering the “**for loop**”. In ch8\_1.cpp, sum\_i(...) has 6 arguments, and we use only the stack frame for argument passing, without passing any argument in registers. So ArgLocs.size() is 6, the information for each argument is in ArgLocs[i], and ArgLocs[i].isMemLoc() is true. In the “**for loop**”, it creates a frame index object for each argument with LastFI = MFI->CreateFixedObject(ValVT.getSizeInBits()/8, VA.getLocMemOffset(), true) and FIN = DAG.getFrameIndex(LastFI, getPointerTy()). It then creates an IR DAG load node and puts it into the vector InVals with InVals.push\_back(DAG.getLoad(ValVT, dl, Chain, FIN, MachinePointerInfo::getFixedStack(LastFI), false, false, false, 0)). Cpu0FI->setVarArgsFrameIndex(0) and Cpu0FI->setLastInArgFI(LastFI) are called before and after this work, respectively.

In the ch8\_1.cpp example, LowerFormalArguments() is called twice. The first time is for sum\_i(), where it creates 6 load DAG nodes for the 6 incoming arguments. The second time is for main(), where it creates no load DAG nodes since main() has no incoming arguments.

In addition to LowerFormalArguments(), which creates the load DAG nodes, we need to define loadRegFromStackSlot() to issue the machine instruction “**ld \$r, offset(\$sp)**”, which loads incoming arguments from the stack frame offset. GetMemOperand(..., FI, ...) returns the memory location of the frame index variable, which is the offset.

### LLVMBackendTutorialExampleCode/Chapter8\_1/Cpu0InstrInfo.cpp

```
static MachineMemOperand* GetMemOperand(MachineBasicBlock &MBB, int FI,
                                        unsigned Flag) {
  MachineFunction &MF = *MBB.getParent();
  MachineFrameInfo &MFI = *MF.getFrameInfo();
  unsigned Align = MFI.getObjectAlignment(FI);

  return MF.getMachineMemOperand(MachinePointerInfo::getFixedStack(FI), Flag,
                                 MFI.getObjectSize(FI), Align);
}

void Cpu0InstrInfo::
loadRegFromStackSlot(MachineBasicBlock &MBB, MachineBasicBlock::iterator I,
                     unsigned DestReg, int FI,
                     const TargetRegisterClass *RC,
                     const TargetRegisterInfo *TRI) const
{
  DebugLoc DL;
  if (I != MBB.end()) DL = I->getDebugLoc();
  MachineMemOperand *MMO = GetMemOperand(MBB, FI, MachineMemOperand::MOLoad);
  unsigned Opc = 0;

  if (Cpu0::CPURegsRegClass.hasSubClassEq(RC))
    Opc = Cpu0::LD;
  assert(Opc && "Register class not handled!");
  BuildMI(MBB, I, DL, get(Opc), DestReg).addFrameIndex(FI).addImm(0)
    .addMemOperand(MMO);
}
```

<sup>5</sup> <http://jonathan2251.github.com/lbd/globalvar.html#global-variable>

In addition to the calling convention and LowerFormalArguments(), Chapter8\_1/ adds the following code to define and print the cpu0 instructions **swi** (Software Interrupt), **jsub** and **jalr** (function call).

### LLVMBackendTutorialExampleCode/Chapter8\_1/Cpu0InstrFormats.td

```

// Cpu0 Pseudo Instructions Format
class Cpu0Pseudo<dag outs, dag ins, string asmstr, list<dag> pattern>:
    Cpu0Inst<outs, ins, asmstr, pattern, IIPseudo, Pseudo> {
    let isCodeGenOnly = 1;
    let isPseudo = 1;
}

```

### LLVMBackendTutorialExampleCode/Chapter8\_1/Cpu0InstrInfo.td

```

def SDT_Cpu0JmpLink      : SDTypeProfile<0, 1, [SDTCisVT<0, iPTR>]>;
...
// Call
def Cpu0JmpLink : SDNode<"Cpu0ISD::JmpLink", SDT_Cpu0JmpLink,
    [SDNPHasChain, SDNPOutGlue, SDNPOptInGlue,
     SDNPVariadic]>;
...
def jmptarget  : Operand<OtherVT> {
    let EncoderMethod = "getJumpTargetOpValue";
}
...
def calltarget : Operand<iPTR> {
    let EncoderMethod = "getJumpTargetOpValue";
}
...
// Jump and Link (Call)
let isCall=1, hasDelaySlot=0 in {
  class JumpLink<bits<8> op, string instr_asm>:
    FJ<op, (outs), (ins calltarget:$target, variable_ops),
    !strconcat(instr_asm, "\t$target"), [(Cpu0JmpLink imm:$target)],
    IIBranch> {
    let DecoderMethod = "DecodeJumpTarget";
  }

  class JumpLinkReg<bits<8> op, string instr_asm,
    RegisterClass RC>:
    FA<op, (outs), (ins RC:$rb, variable_ops),
    !strconcat(instr_asm, "\t$rb"), [(Cpu0JmpLink RC:$rb)], IIBranch> {
    let rc = 0;
    let ra = 14;
    let shamt = 0;
  }
}

...
/// Jump and Branch Instructions
def SWI : JumpLink<0x2A, "swi">;
def JSUB : JumpLink<0x2B, "jsub">;
...
def IRET : JumpFR<0x2D, "iret", CPURegs>;
def JALR : JumpLinkReg<0x2E, "jalr", CPURegs>;
...
def : Pat<(Cpu0JmpLink (i32 tglobaladdr:$dst)),
      (JSUB tglobaladdr:$dst)>;
...

```

### LLVMBackendTutorialExampleCode/Chapter8\_1/Cpu0InstPrinter.cpp

```
static void printExpr(const MCExpr *Expr, raw_ostream &OS) {
  switch (Kind) {
  ...
  case MCSymbolRefExpr::VK_Cpu0_GOT_CALL: OS << "%call24("; break;
  ...
  }
}
...
```

### LLVMBackendTutorialExampleCode/Chapter8\_1/MCTargetDesc/Cpu0MCCodeEmitter.cpp

```
unsigned Cpu0MCCodeEmitter::getMachineOpValue(const MCInst &MI, const MCOperand &MO,
                                              SmallVectorImpl<MCFixup> &Fixups) const {
...
  switch(cast<MCSymbolRefExpr>(Expr)->getKind()) {
  ...
  case MCSymbolRefExpr::VK_Cpu0_GOT_CALL:
    FixupKind = Cpu0::fixup_Cpu0_CALL24;
    break;
  ...
}
}
...
}
```

### LLVMBackendTutorialExampleCode/Chapter8\_1/Cpu0MachineFunction.h

```
class Cpu0FunctionInfo : public MachineFunctionInfo {
    ...
    /// VarArgsFrameIndex - FrameIndex for start of varargs area.
    int VarArgsFrameIndex;

    // Range of frame object indices.
    // InArgFIRange: Range of indices of all frame objects created during call to
    //                 LowerFormalArguments.
    // OutArgFIRange: Range of indices of all frame objects created during call to
    //                 LowerCall except for the frame object for restoring $gp.
    std::pair<int, int> InArgFIRange, OutArgFIRange;
    int GPFI; // Index of the frame object for restoring $gp
    mutable int DynAllocFI; // Frame index of dynamically allocated stack area.
    unsigned MaxCallFrameSize;

public:
    Cpu0FunctionInfo(MachineFunction& MF)
        : MF(MF), GlobalBaseReg(0),
          VarArgsFrameIndex(0), InArgFIRange(std::make_pair(-1, 0)),
          OutArgFIRange(std::make_pair(-1, 0)), GPFI(0), DynAllocFI(0),
          MaxCallFrameSize(0)
    {}

    bool isInArgFI(int FI) const {
        return FI <= InArgFIRange.first && FI >= InArgFIRange.second;
    }
    void setLastInArgFI(int FI) { InArgFIRange.second = FI; }

    void extendOutArgFIRange(int FirstFI, int LastFI) {
        if (!OutArgFIRange.second)
            // this must be the first time this function was called.
            OutArgFIRange.first = FirstFI;
        OutArgFIRange.second = LastFI;
    }

    int getGPFI() const { return GPFI; }
    void setGPFI(int FI) { GPFI = FI; }
    bool needGPSaveRestore() const { return getGPFI(); }
    bool isGPFI(int FI) const { return GPFI && GPFI == FI; }

    // The first call to this function creates a frame object for dynamically
    // allocated stack area.
    int getDynAllocFI() const {
        if (!DynAllocFI)
            DynAllocFI = MF.getFrameInfo() -> CreateFixedObject(4, 0, true);

        return DynAllocFI;
    }
    bool isDynAllocFI(int FI) const { return DynAllocFI && DynAllocFI == FI; }
    ...
    int getVarArgsFrameIndex() const { return VarArgsFrameIndex; }
    void setVarArgsFrameIndex(int Index) { VarArgsFrameIndex = Index; }

    unsigned getMaxCallFrameSize() const { return MaxCallFrameSize; }
    void setMaxCallFrameSize(unsigned S) { MaxCallFrameSize = S; }
};

```
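The InArgFIRange bookkeeping is easier to follow once you know that fixed frame objects receive negative frame indices (-1, -2, ...), which is why the range starts at (-1, 0). The following stand-alone C++ sketch imitates that behaviour; `ToyFrameInfo`, `ToyFunctionInfo` and `createIncomingArgs` are hypothetical stand-ins written for illustration, not the LLVM classes themselves.

```cpp
#include <cassert>
#include <utility>

// Toy stand-in for the fixed-object side of MachineFrameInfo: every new
// fixed object receives the next negative index (-1, -2, -3, ...).
struct ToyFrameInfo {
    int numFixedObjects = 0;
    int createFixedObject() { return -++numFixedObjects; }
};

// Toy stand-in for the InArgFIRange bookkeeping in Cpu0FunctionInfo.
struct ToyFunctionInfo {
    std::pair<int, int> inArgFIRange{-1, 0};

    void setLastInArgFI(int fi) { inArgFIRange.second = fi; }
    bool isInArgFI(int fi) const {
        return fi <= inArgFIRange.first && fi >= inArgFIRange.second;
    }
};

// Mimics the loop in LowerFormalArguments(): one fixed object per argument,
// then record the index of the last one.
int createIncomingArgs(ToyFrameInfo &mfi, ToyFunctionInfo &fnInfo, int numArgs) {
    assert(numArgs >= 0);
    int lastFI = 0;
    for (int i = 0; i < numArgs; ++i)
        lastFI = mfi.createFixedObject();
    fnInfo.setLastInArgFI(lastFI);
    return lastFI;
}
```

In this model, the six arguments of sum\_i() occupy the indices -1 to -6, so `isInArgFI()` is true exactly for that range. For main(), which has no arguments, the last index stays 0 and `isInArgFI()` is false for every index.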

After the above changes, you can run Chapter8\_1/ with ch8\_1.cpp and see what happens below,

```

118-165-79-83:InputFiles Jonathan$ /Users/Jonathan/llvm/test/
cmake_debug_build/bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm
ch8_1.bc -o ch8_1.cpu0.s
Assertion failed: ((CLI.IsTailCall || InVals.size() == CLI.Ins.size()) &&
"LowerCall didn't emit the correct number of values!"), function LowerCallTo,
file /Users/Jonathan/llvm/test/src/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.
cpp, ...
...
0. Program arguments: /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm ch8_1.bc -o
ch8_1.cpu0.s
1. Running pass 'Function Pass Manager' on module 'ch8_1.bc'.
2. Running pass 'CPU0 DAG->DAG Pattern Instruction Selection' on function
'@main'
Illegal instruction: 4

```

Now LowerFormalArguments() emits the correct number of values, but LowerCall() does not!

## 8.3 Store outgoing arguments to stack frame

Figure 8.2 depicts the two steps needed for argument passing. One is storing the outgoing arguments in the caller function, and the other is loading the incoming arguments in the callee function. In the last section we defined LowerFormalArguments() to “**load incoming arguments**” in the callee function. Now we will finish “**store outgoing arguments**” in the caller function. LowerCall() is responsible for this. The implementation is as follows,

### LLVMBackendTutorialExampleCode/Chapter8\_2/Cpu0ISelLowering.cpp

```
SDValue
Cpu0TargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
                              SmallVectorImpl<SDValue> &InVals) const {
  SelectionDAG &DAG                     = CLI.DAG;
  DebugLoc &dl                          = CLI.DL;
  SmallVector<ISD::OutputArg, 32> &Outs = CLI.Outs;
  SmallVector<SDValue, 32> &OutVals     = CLI.OutVals;
  SmallVector<ISD::InputArg, 32> &Ins   = CLI.Ins;
  SDValue InChain                       = CLI.Chain;
  SDValue Callee                        = CLI.Callee;
  bool &isTailCall                      = CLI.IsTailCall;
  CallingConv::ID CallConv              = CLI.CallConv;
  bool isVarArg                         = CLI.IsVarArg;

  MachineFunction &MF = DAG.getMachineFunction();
  MachineFrameInfo *MFI = MF.getFrameInfo();
  const TargetFrameLowering *TFL = MF.getTarget().getFrameLowering();
  bool IsPIC = getTargetMachine().getRelocationModel() == Reloc::PIC_;
  Cpu0FunctionInfo *Cpu0FI = MF.getInfo<Cpu0FunctionInfo>();

  // Analyze operands of the call, assigning locations to each operand.
  SmallVector<CCValAssign, 16> ArgLocs;
  CCState CCInfo(CallConv, isVarArg, DAG.getMachineFunction(),
                 getTargetMachine(), ArgLocs, *DAG.getContext());

  CCInfo.AnalyzeCallOperands(Outs, CC_Cpu0);

  // Get a count of how many bytes are to be pushed on the stack.
  unsigned NextStackOffset = CCInfo.getNextStackOffset();

  // If this is the first call, create a stack frame object that points to
  // a location to which .cprestore saves $gp.
  if (IsPIC && Cpu0FI->globalBaseRegFixed() && !Cpu0FI->getGPFI())
    Cpu0FI->setGPFI(MFI->CreateFixedObject(4, 0, true));
  // Get the frame index of the stack frame object that points to the location
  // of dynamically allocated area on the stack.
  int DynAllocFI = Cpu0FI->getDynAllocFI();
  unsigned MaxCallFrameSize = Cpu0FI->getMaxCallFrameSize();

  if (MaxCallFrameSize < NextStackOffset) {
    Cpu0FI->setMaxCallFrameSize(NextStackOffset);

    // Set the offsets relative to $sp of the $gp restore slot and dynamically
    // allocated stack space. These offsets must be aligned to a boundary
    // determined by the stack alignment of the ABI.
    unsigned StackAlignment = TFL->getStackAlignment();
    NextStackOffset = (NextStackOffset + StackAlignment - 1) /
                      StackAlignment * StackAlignment;

    MFI->setObjectOffset(DynAllocFI, NextStackOffset);
  }
  // Chain is the output chain of the last Load/Store or CopyToReg node.
  // ByValChain is the output chain of the last Memcpy node created for copying
  // byval arguments to the stack.
  SDValue Chain, CallSeqStart, ByValChain;
  SDValue NextStackOffsetVal = DAG.getIntPtrConstant(NextStackOffset, true);
  Chain = CallSeqStart = DAG.getCALLSEQ_START(InChain, NextStackOffsetVal);
  ByValChain = InChain;

  // With EABI is it possible to have 16 args on registers.
  SmallVector<std::pair<unsigned, SDValue>, 16> RegsToPass;
  SmallVector<SDValue, 8> MemOpChains;

  int FirstFI = -MFI->getNumFixedObjects() - 1, LastFI = 0;

  // Walk the register/memloc assignments, inserting copies/loads.
  for (unsigned i = 0, e = ArgLocs.size(); i != e; ++i) {
    SDValue Arg = OutVals[i];
    CCValAssign &VA = ArgLocs[i];
    MVT ValVT = VA.getValVT(), LocVT = VA.getLocVT();
    ISD::ArgFlagsTy Flags = Outs[i].Flags;

    // ByVal Arg.
    if (Flags.isByVal()) {
      assert("!!!Error!!!, Flags.isByVal()==true");
      assert(Flags.getByValSize() &&
             "ByVal args of size 0 should have been ignored by front-end.");
      continue;
    }

    // Register can't get to this point...
    assert(VA.isMemLoc());

    // Create the frame index object for this incoming parameter
    LastFI = MFI->CreateFixedObject(ValVT.getSizeInBits()/8,
                                    VA.getLocMemOffset(), true);
    SDValue PtrOff = DAG.getFrameIndex(LastFI, getPointerTy());

    // emit ISD::STORE whichs stores the
    // parameter value to a stack Location
    MemOpChains.push_back(DAG.getStore(Chain, dl, Arg, PtrOff,
                                       MachinePointerInfo(), false, false, 0));
  }

  // Extend range of indices of frame objects for outgoing arguments that were
  // created during this function call. Skip this step if no such objects were
  // created.
  if (LastFI)
    Cpu0FI->extendOutArgFIRange(FirstFI, LastFI);

  // If a memcpy has been created to copy a byval arg to a stack, replace the
  // chain input of CallSeqStart with ByValChain.
  if (InChain != ByValChain)
    DAG.UpdateNodeOperands(CallSeqStart.getNode(), ByValChain,
                           NextStackOffsetVal);

  // Transform all store nodes into one single node because all store
  // nodes are independent of each other.
  if (!MemOpChains.empty())
    Chain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other,
                        &MemOpChains[0], MemOpChains.size());

  // If the callee is a GlobalAddress/ExternalSymbol node (quite common, every
  // direct call is) turn it into a TargetGlobalAddress/TargetExternalSymbol
  // node so that legalize doesn't hack it.
  unsigned char OpFlag;
  bool IsPICCall = IsPIC; // true if calls are translated to jalr $25
  bool GlobalOrExternal = false;
  SDValue CalleeLo;

  if (GlobalAddressSDNode *G = dyn_cast<GlobalAddressSDNode>(Callee)) {
    OpFlag = IsPICCall ? Cpu0II::MO_GOT_CALL : Cpu0II::MO_NO_FLAG;
    Callee = DAG.getTargetGlobalAddress(G->getGlobal(), dl,
                                        getPointerTy(), 0, OpFlag);
    GlobalOrExternal = true;
  }
  else if (ExternalSymbolSDNode *S = dyn_cast<ExternalSymbolSDNode>(Callee)) {
    if (!IsPIC) // static
      OpFlag = Cpu0II::MO_NO_FLAG;
    else // O32 & PIC
      OpFlag = Cpu0II::MO_GOT_CALL;
    Callee = DAG.getTargetExternalSymbol(S->getSymbol(), getPointerTy(),
                                         OpFlag);
    GlobalOrExternal = true;
  }

  SDValue InFlag;

  // Create nodes that load address of callee and copy it to T9
  if (IsPICCall) {
    if (GlobalOrExternal) {
      // Load callee address
      Callee = DAG.getNode(Cpu0ISD::Wrapper, dl, getPointerTy(),
                           getGlobalReg(DAG, getPointerTy()), Callee);
      SDValue LoadValue = DAG.getLoad(getPointerTy(), dl, DAG.getEntryNode(),
                                      Callee, MachinePointerInfo::getGOT(),
                                      false, false, false, 0);

      // Use GOT+LO if callee has internal linkage.
      if (CalleeLo.getNode()) {
        SDValue Lo = DAG.getNode(Cpu0ISD::Lo, dl, getPointerTy(), CalleeLo);
        Callee = DAG.getNode(ISD::ADD, dl, getPointerTy(), LoadValue, Lo);
      } else
        Callee = LoadValue;
    }
  }

  // T9 should contain the address of the callee function if
  // -relocation-model=pic or it is an indirect call.
  if (IsPICCall || !GlobalOrExternal) {
    // copy to T9
    unsigned T9Reg = Cpu0::T9;
    Chain = DAG.getCopyToReg(Chain, dl, T9Reg, Callee, SDValue(0, 0));
    InFlag = Chain.getValue(1);
    Callee = DAG.getRegister(T9Reg, getPointerTy());
  }

  // Cpu0JmpLink = #chain, #target_address, #opt_in_flags...
  //             = Chain, Callee, Reg#1, Reg#2, ...
  //
  // Returns a chain & a flag for retval copy to use.
  SDVTList NodeTys = DAG.getVTList(MVT::Other, MVT::Glue);
  SmallVector<SDValue, 8> Ops;
  Ops.push_back(Chain);
  Ops.push_back(Callee);

  // Add argument registers to the end of the list so that they are
  // known live into the call.
  for (unsigned i = 0, e = RegsToPass.size(); i != e; ++i)
    Ops.push_back(DAG.getRegister(RegsToPass[i].first,
                                  RegsToPass[i].second.getValueType()));

  // Add a register mask operand representing the call-preserved registers.
  const TargetRegisterInfo *TRI = getTargetMachine().getRegisterInfo();
  const uint32_t *Mask = TRI->getCallPreservedMask(CallConv);
  assert(Mask && "Missing call preserved mask for calling convention");
  Ops.push_back(DAG.getRegisterMask(Mask));

  if (InFlag.getNode())
    Ops.push_back(InFlag);

  Chain = DAG.getNode(Cpu0ISD::JmpLink, dl, NodeTys, &Ops[0], Ops.size());
  InFlag = Chain.getValue(1);

  // Create the CALLSEQ_END node.
  Chain = DAG.getCALLSEQ_END(Chain,
                             DAG.getIntPtrConstant(NextStackOffset, true),
                             DAG.getIntPtrConstant(0, true), InFlag);
  InFlag = Chain.getValue(1);

  // Handle result values, copying them out of physregs into vregs that we
  // return.
  return LowerCallResult(Chain, InFlag, CallConv, isVarArg,
                         Ins, dl, DAG, InVals);
}

/// LowerCallResult - Lower the result values of a call into the
/// appropriate copies out of appropriate physical registers.
SDValue
Cpu0TargetLowering::LowerCallResult(SDValue Chain, SDValue InFlag,
                                    CallingConv::ID CallConv, bool isVarArg,
                                    const SmallVectorImpl<ISD::InputArg> &Ins,
                                    DebugLoc dl, SelectionDAG &DAG,
                                    SmallVectorImpl<SDValue> &InVals) const {
  // Assign locations to each value returned by this call.
  SmallVector<CCValAssign, 16> RVLocs;
  CCState CCInfo(CallConv, isVarArg, DAG.getMachineFunction(),
                 getTargetMachine(), RVLocs, *DAG.getContext());

  CCInfo.AnalyzeCallResult(Ins, RetCC_Cpu0);

  // Copy all of the result registers out of their specified physreg.
  for (unsigned i = 0; i != RVLocs.size(); ++i) {
    Chain = DAG.getCopyFromReg(Chain, dl, RVLocs[i].getLocReg(),
                               RVLocs[i].getValVT(), InFlag).getValue(1);
    InFlag = Chain.getValue(2);
    InVals.push_back(Chain.getValue(0));
  }

  return Chain;
}
```

Just as when loading incoming arguments from the stack frame, we call CCInfo(CallConv, ..., ArgLocs, ...) to get the outgoing argument information before entering the “**for loop**”, and we align the stack to 8 bytes. The “**for loop**” is almost the same as in LowerFormalArguments(), except that LowerCall() creates a vector of store DAG nodes instead of load DAG nodes. After the “**for loop**”, it creates “**ld \$t9, %call24(\_Z5sum\_iiiiii)(\$gp)**” and “**jalr \$t9**” to call the subroutine in PIC mode (\$6 is \$t9). DAG.getCALLSEQ\_START() and DAG.getCALLSEQ\_END() are called before the “**for loop**” and after the subroutine call, respectively. They insert the CALLSEQ\_START and CALLSEQ\_END nodes, which are later translated into the pseudo machine instructions !ADJCALLSTACKDOWN and !ADJCALLSTACKUP according to the Cpu0InstrInfo.td definitions as follows.

### LLVMBackendTutorialExampleCode/Chapter8\_2/Cpu0InstrInfo.td

```

def SDT_Cpu0CallSeqStart : SDCallSeqStart<[SDTCisVT<0, i32>]>;
def SDT_Cpu0CallSeqEnd   : SDCallSeqEnd<[SDTCisVT<0, i32>, SDTCisVT<1, i32>]>;
...
// These are target-independent nodes, but have target-specific formats.
def callseq_start : SDNode<"ISD::CALLSEQ_START", SDT_Cpu0CallSeqStart,
                      [SDNPHasChain, SDNPOutGlue]>;
def callseq_end   : SDNode<"ISD::CALLSEQ_END", SDT_Cpu0CallSeqEnd,
                      [SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;

```

```
//=====
// Pseudo instructions
//=====

// As stack alignment is always done with addiu, we need a 16-bit immediate
let Defs = [SP], Uses = [SP] in {
def ADJCALLSTACKDOWN : Cpu0Pseudo<(outs), (ins uimm16:$amt),
    "ADJCALLSTACKDOWN $amt",
    [(callseq_start timm:$amt)]>;
def ADJCALLSTACKUP   : Cpu0Pseudo<(outs), (ins uimm16:$amt1, uimm16:$amt2),
    "ADJCALLSTACKUP $amt1",
    [(callseq_end timm:$amt1, timm:$amt2)]>;
}
```
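The caller-side layout described above can be sketched in plain C++ outside LLVM. This is a simplified model, assuming every argument is a 4-byte integer passed on the stack and that the call frame is rounded up to the 8-byte stack alignment; `OutgoingArgLayout` and `layoutOutgoingArgs` are hypothetical names, not part of the Cpu0 backend.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of what LowerCall() computes for stack-passed arguments.
struct OutgoingArgLayout {
  std::vector<uint32_t> Offsets; // offset of each argument relative to $sp
  uint32_t CallFrameSize;        // amount carried by CALLSEQ_START / CALLSEQ_END
};

// Assumes 4-byte arguments and a power-of-two stack alignment.
OutgoingArgLayout layoutOutgoingArgs(unsigned NumArgs, uint32_t StackAlign = 8) {
  assert(StackAlign != 0 && (StackAlign & (StackAlign - 1)) == 0);
  OutgoingArgLayout L;
  uint32_t Next = 0;
  for (unsigned i = 0; i != NumArgs; ++i) {
    L.Offsets.push_back(Next); // one store per argument at this offset
    Next += 4;
  }
  // Round the total up to the stack alignment.
  L.CallFrameSize = (Next + StackAlign - 1) & ~(StackAlign - 1);
  return L;
}
```

For the six arguments of sum\_i(), this model yields offsets 0, 4, ..., 20 and a call frame size of 24, which is the amount that appears in `!ADJCALLSTACKDOWN 24`.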

As with loading incoming arguments, we need to implement storeRegToStackSlot() to store outgoing arguments at their stack frame offsets.

#### LLVMBackendTutorialExampleCode/Chapter8\_2/Cpu0InstrInfo.cpp

```
//- st SrcReg, MMO(FI)
void Cpu0InstrInfo::
storeRegToStackSlot(MachineBasicBlock &MBB, MachineBasicBlock::iterator I,
    unsigned SrcReg, bool isKill, int FI,
    const TargetRegisterClass *RC,
    const TargetRegisterInfo *TRI) const {
    DebugLoc DL;
    if (I != MBB.end()) DL = I->getDebugLoc();
    MachineMemOperand *MMO = GetMemOperand(MBB, FI, MachineMemOperand::MOStore);

    unsigned Opc = 0;

    if (RC == Cpu0::CPURegsRegisterClass)
        Opc = Cpu0::ST;
    assert(Opc && "Register class not handled!");
    BuildMI(MBB, I, DL, get(Opc)).addReg(SrcReg, getKillRegState(isKill))
        .addFrameIndex(FI).addImm(0).addMemOperand(MMO);
}
```

Now, let's run Chapter8\_2/ with ch8\_1.cpp to get the following result.

```
118-165-78-230:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm ch8_1.bc -o
ch8_1.cpu0.s
118-165-78-230:InputFiles Jonathan$ cat ch8_1.cpu0.s
    .section .mdebug.abi32
    .previous
    .file "ch8_1.bc"
    .text
    .globl _Z5sum_iiiiii
    .align 2
    .type _Z5sum_iiiiii,@function
    .ent _Z5sum_iiiiii          # @_Z5sum_iiiiii
_Z5sum_iiiiii:
    .cfi_startproc
    .frame $sp,32,$lr
    .mask 0x00000000,0
    .set noreorder
.cupload $t9
.set nomacro
# BB#0:
addiu $sp, $sp, -32
$tmp1:
.cfi_def_cfa_offset 32
ld $2, 32($sp)
st $2, 28($sp)
ld $2, 36($sp)
st $2, 24($sp)
ld $2, 40($sp)
st $2, 20($sp)
ld $2, 44($sp)
st $2, 16($sp)
ld $2, 48($sp)
st $2, 12($sp)
ld $2, 52($sp)
st $2, 8($sp)
addiu $3, $zero, %got_hi(gI)
shl $3, $3, 16
addu $3, $3, $gp
ld $3, %got_lo(gI)($3)
ld $3, 0($3)
ld $4, 28($sp)
addu $3, $3, $4
ld $4, 24($sp)
addu $3, $3, $4
ld $4, 20($sp)
addu $3, $3, $4
ld $4, 16($sp)
addu $3, $3, $4
ld $4, 12($sp)
addu $3, $3, $4
addu $2, $3, $2
st $2, 4($sp)
addiu $sp, $sp, 32
ret $lr
.set macro
.set reorder
.end _Z5sum_iiiiii
$tmp2:
.size _Z5sum_iiiiii, ($tmp2)-_Z5sum_iiiiii
.cfi_endproc

.globl main
.align 2
.type main,@function
.ent main          # @main
main:
.cfi_startproc
.frame $sp,40,$lr
.mask 0x00004000,-4
.set noreorder
.cupload $t9
.set nomacro
# BB#0:
addiu $sp, $sp, -40
$tmp5:

.cfi_def_cfa_offset 40
st    $lr, 36($sp)           # 4-byte Folded Spill
$tmp6:
.cfi_offset 14, -4
addiu $2, $zero, 0
st    $2, 32($sp)
!ADJCALLSTACKDOWN 24
addiu $2, $zero, 6
st    $2, 60($sp)
addiu $2, $zero, 5
st    $2, 56($sp)
addiu $2, $zero, 4
st    $2, 52($sp)
addiu $2, $zero, 3
st    $2, 48($sp)
addiu $2, $zero, 2
st    $2, 44($sp)
addiu $2, $zero, 1
st    $2, 40($sp)
ld    $t9, %call24(_Z5sum_iiiiiii)($gp)
jalr $t9
!ADJCALLSTACKUP 24
st    $2, 28($sp)
ld    $lr, 36($sp)           # 4-byte Folded Reload
addiu $sp, $sp, 40
ret   $lr
.set  macro
.set  reorder
.end  main
$tmp7:
.size  main, ($tmp7)-main
.cfi_endproc

.type  gI,@object          # @gI
.data
.globl gI
.align 2
gI:
.4byte 100                 # 0x64
.size  gI, 4

```

## 8.4 Fix issues

Run Chapter8\_2/ with ch6\_2.cpp, and main() returns an incorrect value (the return register \$2 is not 0), as follows.

**LLVMBackendTutorialExampleCode/InputFiles/ch6\_2.cpp**

```
struct Date
{
    int year;
    int month;
    int day;
};

Date date = {2012, 10, 12};
int a[3] = {2012, 10, 12};

int main()
{
    int day = date.day;
    int i = a[1];

    return 0;
}
```

```
118-165-78-31:InputFiles Jonathan$ clang -c ch6_2.cpp -emit-llvm -o ch6_2.bc
118-165-78-31:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=static -filetype=asm ch6_2.bc -o
ch6_2.cpu0.static.s
118-165-78-31:InputFiles Jonathan$ cat ch6_2.cpu0.static.s
.section .mdebug.abi32
.previous
.file "ch6_2.bc"
.text
.globl main
.align 2
.type main,@function
.ent main          # @main
main:
.cfi_startproc
.frame $sp,16,$lr
.mask 0x00000000,0
.set noreorder
.set nomacro
# BB#0:
    addiu $sp, $sp, -16
$tmp1:
.cfi_def_cfa_offset 16
    addiu $2, $zero, 0
    st $2, 12($sp)
    addiu $2, $zero, %hi(date)
    shl $2, $2, 16
    addiu $2, $2, %lo(date)
    ld $2, 8($2)
    st $2, 8($sp)
    addiu $2, $zero, %hi(a)
    shl $2, $2, 16
    addiu $2, $2, %lo(a)
    ld $2, 4($2)
    st $2, 4($sp)
    addiu $sp, $sp, 16
    ret $lr
.set macro
.set reorder
.end main
...

```

The issues in the code generated above and in the last section are summarized as follows:

1. It stores the arguments at the wrong offsets.
2. The pseudo instructions !ADJCALLSTACKUP and !ADJCALLSTACKDOWN remain in the output.
3. \$gp is a caller-saved register. The caller main() does not save \$gp, which is a bug if the callee sum\_i() changes \$gp. A programmer can change \$gp with assembly code in sum\_i().
4. The return value of main() is wrong.

The following sub-sections solve these issues one by one.

#### 8.4.1 Fix the wrong offset in storing arguments to stack frame

To fix the wrong offset in storing arguments, we modify eliminateFrameIndex() as follows. The code below is modified in Chapter8\_3/ to place the caller's outgoing arguments at spOffset(\$sp) (Chapter8\_2/ placed them at spOffset+stackSize(\$sp)).

**LLVMBackendTutorialExampleCode/Chapter8\_3/Cpu0RegisterInfo.cpp**

```
void Cpu0RegisterInfo::  
eliminateFrameIndex(MachineBasicBlock::iterator II, int SPAdj,  
                    RegScavenger *RS) const {  
    ...  
    Cpu0FunctionInfo *Cpu0FI = MF.getInfo<Cpu0FunctionInfo>();  
    ...  
    if (Cpu0FI->isOutArgFI(FrameIndex) || Cpu0FI->isDynAllocFI(FrameIndex) ||  
        (FrameIndex >= MinCSFI && FrameIndex <= MaxCSFI))  
        FrameReg = Cpu0::SP;  
    else  
        FrameReg = getFrameRegister(MF);  
    ...  
    // Calculate final offset.  
    // - There is no need to change the offset if the frame object is one of the  
    //   following: an outgoing argument, pointer to a dynamically allocated  
    //   stack space or a $gp restore location,  
    // - If the frame object is any of the following, its offset must be adjusted  
    //   by adding the size of the stack:  
    //   incoming argument, callee-saved register location or local variable.  
    if (Cpu0FI->isOutArgFI(FrameIndex) || Cpu0FI->isGPFI(FrameIndex) ||
        Cpu0FI->isDynAllocFI(FrameIndex))  
        Offset = spOffset;  
    else  
        Offset = spOffset + (int64_t)stackSize;  
    Offset += MI.getOperand(i+1).getImm();  
    ...  
}
```

**LLVMBackendTutorialExampleCode/Chapter8\_3/Cpu0MachineFunction.h**

```
/// SRetReturnReg - Some subtargets require that sret lowering includes  
/// returning the value of the returned struct in a register. This field  
/// holds the virtual register into which the sret argument is passed.  
unsigned SRetReturnReg;  
...  
Cpu0FunctionInfo(MachineFunction& MF)  
: MF(MF), SRetReturnReg(0)  
...  
bool isOutArgFI(int FI) const {  
    return FI <= OutArgFIRange.first && FI >= OutArgFIRange.second;  
}
...
unsigned getSRetReturnReg() const { return SRetReturnReg; }
void setSRetReturnReg(unsigned Reg) { SRetReturnReg = Reg; }
...
```

Running Chapter8\_3/ with ch8\_1.cpp gives the following result. It corrects the argument offsets in main() from (0+40)(\$sp), (4+40)(\$sp), ..., to 0(\$sp), 4(\$sp), ..., where 40 is the stack size of main().

```
118-165-78-230:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm ch8_1.bc -o
ch8_1.cpu0.s
118-165-78-230:InputFiles Jonathan$ cat ch8_1.cpu0.s
...
addiu $2, $zero, 6
st $2, 20($sp)           // Correct offset
addiu $2, $zero, 5
st $2, 16($sp)
addiu $2, $zero, 4
st $2, 12($sp)
addiu $2, $zero, 3
st $2, 8($sp)
addiu $2, $zero, 2
st $2, 4($sp)
addiu $2, $zero, 1
st $2, 0($sp)
ld $t9, %call24(_Z5sum_iiiiii)($gp)
jalr $t9
...
```

The incoming arguments are the formal arguments described in compiler and programming language books, and the outgoing arguments are the actual arguments. Table 8.1 summarizes the callee incoming arguments and the caller outgoing arguments.

Table 8.1: Callee incoming arguments and caller outgoing arguments

| Description          | Callee                                    | Caller                                     |
|----------------------|-------------------------------------------|--------------------------------------------|
| Responsible function | LowerFormalArguments()                    | LowerCall()                                |
| DAG nodes created    | Vector of loads for incoming arguments    | Vector of stores for outgoing arguments    |
| Argument location    | spOffset + stackSize                      | spOffset                                   |
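
The offset rule in Table 8.1 can be written as a small stand-alone function. This is only a sketch of the final-offset step of eliminateFrameIndex(); the enum `FrameObjectKind` and the function `finalOffset` are hypothetical, and the real backend classifies frame objects through Cpu0FunctionInfo instead.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of a frame object.
enum class FrameObjectKind { OutgoingArg, IncomingArg, LocalVariable };

// Outgoing arguments keep spOffset; incoming arguments and local variables
// are adjusted by the stack size of the current function.
int64_t finalOffset(FrameObjectKind Kind, int64_t SpOffset, int64_t StackSize,
                    int64_t Imm = 0) {
  assert(StackSize >= 0 && "stack size cannot be negative");
  int64_t Offset = (Kind == FrameObjectKind::OutgoingArg)
                       ? SpOffset
                       : SpOffset + StackSize;
  return Offset + Imm; // add the immediate already present in the instruction
}
```

With the stack size of 40 in main(), the first outgoing argument lands at 0(\$sp), while in the callee with stack size 32 the first incoming argument is read from 32(\$sp), as in the listings above.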

#### 8.4.2 Pseudo hook instruction ADJCALLSTACKDOWN and ADJCALLSTACKUP

To remove !ADJCALLSTACKDOWN and !ADJCALLSTACKUP, we pass Cpu0::ADJCALLSTACKDOWN and Cpu0::ADJCALLSTACKUP to Cpu0GenInstrInfo() in the Cpu0InstrInfo() constructor and define eliminateCallFramePseudoInstr() as follows.

##### LLVMBackendTutorialExampleCode/Chapter8\_3/Cpu0InstrInfo.cpp

```
Cpu0InstrInfo::Cpu0InstrInfo(Cpu0TargetMachine &tm)
: Cpu0GenInstrInfo(Cpu0::ADJCALLSTACKDOWN, Cpu0::ADJCALLSTACKUP),
...
```

#### LLVMBackendTutorialExampleCode/Chapter8\_3/Cpu0FrameLowering.h

```
void eliminateCallFramePseudoInstr(MachineFunction &MF,
                                    MachineBasicBlock &MBB,
                                    MachineBasicBlock::iterator I) const;
```

#### LLVMBackendTutorialExampleCode/Chapter8\_3/Cpu0FrameLowering.cpp

```
...
// Cpu0
// This function eliminate ADJCALLSTACKDOWN,
// ADJCALLSTACKUP pseudo instructions
void Cpu0FrameLowering::eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB,
                                                       MachineBasicBlock::iterator I) const {
    // Simply discard ADJCALLSTACKDOWN, ADJCALLSTACKUP instructions.
    MBB.erase(I);
}
```

With the above definition, `eliminateCallFramePseudoInstr()` is called when LLVM meets the pseudo instructions `ADJCALLSTACKDOWN` and `ADJCALLSTACKUP`. We simply discard these two pseudo instructions. Running `Chapter8_3/` with `ch8_1.cpp` removes them from the output.
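
The effect of discarding the two pseudo instructions can be imitated on a toy instruction list. This is a minimal sketch, assuming a basic block is just a vector of assembly strings; `ToyBlock` and `eraseCallFramePseudos` are hypothetical names and do not use any LLVM API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a machine basic block: one string per instruction.
using ToyBlock = std::vector<std::string>;

// Drops every call-frame pseudo instruction and returns how many were removed.
std::size_t eraseCallFramePseudos(ToyBlock &Block) {
  const std::size_t OldSize = Block.size();
  auto IsPseudo = [](const std::string &Inst) {
    // rfind(prefix, 0) == 0 is a "starts with" test.
    return Inst.rfind("ADJCALLSTACKDOWN", 0) == 0 ||
           Inst.rfind("ADJCALLSTACKUP", 0) == 0;
  };
  auto NewEnd = std::remove_if(Block.begin(), Block.end(), IsPseudo);
  const std::size_t Removed = static_cast<std::size_t>(Block.end() - NewEnd);
  Block.erase(NewEnd, Block.end());
  assert(Block.size() + Removed == OldSize);
  return Removed;
}
```

The remaining instructions keep their relative order, which matches the behavior of simply erasing each pseudo instruction in place.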

### 8.4.3 Handle \$gp register in PIC addressing mode

In “section Global variable”<sup>5</sup>, we mentioned two link types, static link and dynamic link. The option `-relocation-model=static` is for statically linked functions, while the option `-relocation-model=pic` is for dynamically linked functions. One example of dynamically linked functions is a shared library. A shared library usually contains many dynamically linked functions that can be loaded at run time. Since a shared library can be loaded at different memory addresses, the address of a global variable it accesses cannot be decided at link time. However, we can calculate the distance between the global variable address and the start address of the shared library function, and this distance stays the same wherever the library is loaded.
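
The idea that only the distance is fixed can be illustrated with a few lines of C++. This is a sketch with made-up numbers; `LoadedLibrary`, its fields and the two helper functions are hypothetical and only model the arithmetic, not the loader.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical image of a shared library after it has been loaded.
struct LoadedLibrary {
  uint32_t Base;         // load address, chosen at run time
  uint32_t FuncOffset;   // offset of the function, fixed at link time
  uint32_t GlobalOffset; // offset of the global variable, fixed at link time

  uint32_t funcAddr() const { return Base + FuncOffset; }
  uint32_t globalAddr() const { return Base + GlobalOffset; }
};

// Distance from the function start to the global; independent of Base.
int32_t distanceToGlobal(const LoadedLibrary &Lib) {
  return static_cast<int32_t>(Lib.globalAddr() - Lib.funcAddr());
}

// Recovers the global's address from the function address held in a register.
uint32_t addressViaBase(uint32_t FuncAddr, int32_t Distance) {
  return FuncAddr + static_cast<uint32_t>(Distance);
}
```

Loading the same library at two different base addresses gives two different absolute addresses for the global, but the same distance, so code that knows the function's own address at run time can still reach the global.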

Let’s run `Chapter8_3/` with `ch8_1.cpp` to get the following correct result. We put comments in the result for explanation.

```
118-165-78-230:InputFiles Jonathan$ cat ch8_1.cpu0.s
_Z5sum_iiiiiii:
...
.cupload $t9 // assign $gp = $t9 by loader when loader load re-entry
               // function (shared library) of _Z5sum_iiiiiii
.set      nomacro
# BB#0:
addiu   $sp, $sp, -32
$tmp1:
.cfi_def_cfa_offset 32
...
ld      $3, %got(gI)($gp)    // %got(gI) is offset of (gI - _Z5sum_iiiiiii)
...
ret    $lr
.set      macro
.set      reorder
.end     _Z5sum_iiiiiii
...
.ent    main                  # @main
main:
.cfi_startproc
...
.cupload $t9
.set nomacro
...
.cprestore 24    // save $gp to 24($sp)
addiu $2, $zero, 0
...
ld $t9, %call24(_Z5sum_iiiiii)($gp)
jalr $t9          // $t9 register is the alias of $6
ld $gp, 24($sp)  // restore $gp from 24($sp)
...
.end main
$tmp7:
.size main, ($tmp7)-main
.cfi_endproc

.type gI,@object      # @gI
.data
.globl gI
.align 2
gI:
.4byte 100           # 0x64
.size gI, 4

```

As the comments above show, “**.cprestore 24**” is a pseudo instruction that saves **\$gp** to **24(\$sp)**, while the instruction “**ld \$gp, 24(\$sp)**” restores **\$gp**. In other words, **\$gp** is a caller-saved register, so **main()** needs to save **\$gp** before, and restore it after, calling the shared library function **\_Z5sum\_iiiiii()**. In **\_Z5sum\_iiiiii()**, we translate the address of the global variable **gI** with “**ld \$3, %got(gI)(\$gp)**”, where **%got(gI)** is the offset **(gI - \_Z5sum\_iiiiii)**, which can be calculated at link time.

According to the original cpu0 web site, cpu0 only supports “**jsub**”, which has a 24-bit address range. We add “**jalr**” to cpu0 to extend the range to 32 bits. We made this change for two reasons. One is that cpu0 can be expanded to a 32-bit address space by adding only this instruction. The other is that cpu0, like this book, is designed for teaching purposes. We reserve “**jalr**” for dynamically linked functions in PIC mode to demonstrate:

1. How the caller handles the caller-saved register **\$gp** when calling a function.
2. How the code in a shared library function uses **\$gp** to access the address of a global variable.
3. That **jalr** for dynamically linked functions is easier to implement and faster, as described in section “pic mode” of chapter “Global variables, structs and arrays, other type”. This solution is popular in practice, which justifies changing the official cpu0 design for a compiler book.

Now, with the following code added in Chapter8\_3/, we issue “**.cprestore**” in **emitPrologue()** and emit “**ld \$gp, (\$gp save slot on stack)**” after **jalr** by creating the file **Cpu0EmitGPRestore.cpp**, which runs as a function pass.

#### LLVMBackendTutorialExampleCode/Chapter8\_3/CMakeLists.txt

```

add_llvm_target (Cpu0CodeGen
  ...
  Cpu0EmitGPRestore.cpp
  ...

```

### LLVMBackendTutorialExampleCode/Chapter8\_3/Cpu0TargetMachine.cpp

```
Cpu0elTargetMachine::
Cpu0elTargetMachine(const Target &T, StringRef TT,
                    StringRef CPU, StringRef FS, const TargetOptions &Options,
                    Reloc::Model RM, CodeModel::Model CM,
                    CodeGenOpt::Level OL)
: Cpu0TargetMachine(T, TT, CPU, FS, Options, RM, CM, OL, true) {}
namespace {
...
virtual bool addPreRegAlloc();
...
}

bool Cpu0PassConfig::addPreRegAlloc() {
    // Do not restore $gp if target is Cpu064.
    // In N32/64, $gp is a callee-saved register.

    addPass(createCpu0EmitGPRestorePass(getCpu0TargetMachine()));
    return true;
}
```

### LLVMBackendTutorialExampleCode/Chapter8\_3/Cpu0.h

```
FunctionPass *createCpu0EmitGPRestorePass(Cpu0TargetMachine &TM);
```

### LLVMBackendTutorialExampleCode/Chapter8\_3/Cpu0FrameLowering.cpp

```
void Cpu0FrameLowering::emitPrologue(MachineFunction &MF) const {
...
unsigned RegSize = 4;
unsigned LocalVarAreaOffset = Cpu0FI->needGPSaveRestore() ?
(MFI->getObjectOffset(Cpu0FI->getGPFI()) + RegSize) :
Cpu0FI->getMaxCallFrameSize();
...
// Restore GP from the saved stack location
if (Cpu0FI->needGPSaveRestore()) {
    unsigned Offset = MFI->getObjectOffset(Cpu0FI->getGPFI());
    BuildMI(MBB, MBBI, dl, TII.get(Cpu0::CPRESTORE)).addImm(Offset)
        .addReg(Cpu0::GP);
}
}
```

### LLVMBackendTutorialExampleCode/Chapter8\_3/Cpu0InstrInfo.td

```
let neverHasSideEffects = 1 in
def CPRESTORE : Cpu0Pseudo<(outs), (ins i32imm:$loc, CPURegs:$gp),
    ".cprestore\t$loc", []>;
```

### LLVMBackendTutorialExampleCode/Chapter8\_3/Cpu0ISelLowering.cpp

```
SDValue
Cpu0TargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
                               SmallVectorImpl<SDValue> &InVals) const {
    ...
    // If this is the first call, create a stack frame object that points to
    // a location to which .cprestore saves $gp.
    if (IsPIC && Cpu0FI->globalBaseRegFixed() && !Cpu0FI->getGPFI())
    ...
    if (MaxCallFrameSize < NextStackOffset) {
        ...
        if (Cpu0FI->needGPSaveRestore())
            MFI->setObjectOffset(Cpu0FI->getGPFI(), NextStackOffset);
        ...
    }
    ...
}
```

### LLVMBackendTutorialExampleCode/Chapter8\_3/Cpu0EmitGPRestore.cpp

```
//===== Cpu0EmitGPRestore.cpp - Emit GP Restore Instruction =====//
//
//          The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//=====//
//
// This pass emits instructions that restore $gp right
// after jalr instructions.
//
//=====//

#define DEBUG_TYPE "emit-gp-restore"

#include "Cpu0.h"
#include "Cpu0TargetMachine.h"
#include "Cpu0MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/Target/TargetInstrInfo.h"
#include "llvm/ADT/Statistic.h"

using namespace llvm;

namespace {
    struct Inserter : public MachineFunctionPass {

        TargetMachine &TM;
        const TargetInstrInfo *TII;

        static char ID;
        Inserter(TargetMachine &tm)
            : MachineFunctionPass(ID), TM(tm), TII(tm.getInstrInfo()) { }

        virtual const char *getPassName() const {
            return "Cpu0 Emit GP Restore";
        }

        bool runOnMachineFunction(MachineFunction &F);
    };
    char Inserter::ID = 0;
} // end of anonymous namespace

bool Inserter::runOnMachineFunction(MachineFunction &F) {
    Cpu0FunctionInfo *Cpu0FI = F.getInfo<Cpu0FunctionInfo>();

    if ((TM.getRelocationModel() != Reloc::PIC_) ||
        (!Cpu0FI->globalBaseRegFixed()))
        return false;

    bool Changed = false;
    int FI = Cpu0FI->getGPFI();

    for (MachineFunction::iterator MFI = F.begin(), MFE = F.end();
         MFI != MFE; ++MFI) {
        MachineBasicBlock& MBB = *MFI;
        MachineBasicBlock::iterator I = MFI->begin();

        /// IsLandingPad - Indicate that this basic block is entered via an
        /// exception handler.
        // If MBB is a landing pad, insert instruction that restores $gp after
        // EH_LABEL.
        if (MBB.isLandingPad()) {
            // Find EH_LABEL first.
            for (; I->getOpcode() != TargetOpcode::EH_LABEL; ++I) ;

            // Insert ld.
            ++I;
            DebugLoc dl = I != MBB.end() ? I->getDebugLoc() : DebugLoc();
            BuildMI(MBB, I, dl, TII->get(Cpu0::LD), Cpu0::GP).addFrameIndex(FI)
                                                             .addImm(0);
            Changed = true;
        }

        while (I != MFI->end()) {
            if (I->getOpcode() != Cpu0::JALR) {
                ++I;
                continue;
            }

            DebugLoc dl = I->getDebugLoc();
            // emit ld $gp, ($gp save slot on stack) after jalr
            BuildMI(MBB, ++I, dl, TII->get(Cpu0::LD), Cpu0::GP).addFrameIndex(FI)
                                                               .addImm(0);
            Changed = true;
        }
    }

    return Changed;
}

/// createCpu0EmitGPRestorePass - Returns a pass that emits instructions that
/// restores $gp clobbered by jalr instructions.
FunctionPass *llvm::createCpu0EmitGPRestorePass(Cpu0TargetMachine &tm) {
    return new Inserter(tm);
}
```
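
The core loop of this pass, inserting a restore after every jalr, can be tried out on a toy instruction list without building LLVM. This is a minimal sketch, assuming a basic block is a vector of assembly strings; `ToyBlock` and `insertGPRestore` are hypothetical names, and the landing pad case is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a machine basic block: one string per instruction.
using ToyBlock = std::vector<std::string>;

// Inserts "ld $gp, <slot>($sp)" after every instruction that starts with
// "jalr". Returns true if the block changed, like runOnMachineFunction().
bool insertGPRestore(ToyBlock &Block, int GPSaveSlot) {
  assert(GPSaveSlot >= 0 && "save slot offset must be non-negative");
  const std::string Restore =
      "ld $gp, " + std::to_string(GPSaveSlot) + "($sp)";
  bool Changed = false;
  for (std::size_t i = 0; i < Block.size(); ++i) {
    if (Block[i].rfind("jalr", 0) != 0)
      continue;
    Block.insert(Block.begin() + static_cast<std::ptrdiff_t>(i) + 1, Restore);
    ++i; // step over the instruction just inserted
    Changed = true;
  }
  return Changed;
}
```

Calling `insertGPRestore(Block, 24)` on a block containing `jalr $t9` places `ld $gp, 24($sp)` directly after it, which is the sequence shown in the PIC listing for main().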

### LLVMBackendTutorialExampleCode/Chapter8\_3/Cpu0AsmPrinter.cpp

```

void Cpu0AsmPrinter::EmitInstrWithMacroNoAT(const MachineInstr *MI) {
    MCInst TmpInst;

    MCInstLowering.Lower(MI, TmpInst);
    OutStreamer.EmitRawText(StringRef("\t.set\tmacro"));
    if (Cpu0FI->getEmitNOAT())
        OutStreamer.EmitRawText(StringRef("\t.set\tat"));
    OutStreamer.EmitInstruction(TmpInst);
    if (Cpu0FI->getEmitNOAT())
        OutStreamer.EmitRawText(StringRef("\t.set\tnoat"));
    OutStreamer.EmitRawText(StringRef("\t.set\tnomacro"));
}

...
void Cpu0AsmPrinter::EmitInstruction(const MachineInstr *MI) {
    ...
    unsigned Opc = MI->getOpcode();
    MCInst TmpInst0;
    SmallVector<MCInst, 4> MCInsts;

    switch (Opc) {
        case Cpu0::CPRESTORE: {
            const MachineOperand &MO = MI->getOperand(0);
            assert(MO.isImm() && "CPRESTORE's operand must be an immediate.");
            int64_t Offset = MO.getImm();

            if (OutStreamer.hasRawTextSupport()) {
                if (!isInt<16>(Offset)) {
                    EmitInstrWithMacroNoAT(MI);
                    return;
                }
            } else {
                MCInstLowering.LowerCPRESTORE(Offset, MCInsts);

                for (SmallVector<MCInst, 4>::iterator I = MCInsts.begin();
                      I != MCInsts.end(); ++I)
                    OutStreamer.EmitInstruction(*I);

                return;
            }

            break;
        }
        default:
            break;
    }

    MCInstLowering.Lower(MI, TmpInst0);
    OutStreamer.EmitInstruction(TmpInst0);
}

```


### LLVMBackendTutorialExampleCode/Chapter8\_3/Cpu0MCInstLower.cpp

```
// Lower ".cprestore offset" to "st $gp, offset($sp)".
void Cpu0MCInstLower::LowerCPRESTORE(int64_t Offset,
                                     SmallVector<MCInst, 4>& MCInsts) {
    assert(isInt<32>(Offset) && (Offset >= 0) &&
           "Imm operand of .cprestore must be a non-negative 32-bit value.");

    MCOperand SPReg = MCOperand::CreateReg(Cpu0::SP), BaseReg = SPReg;
    MCOperand GPReg = MCOperand::CreateReg(Cpu0::GP);
    MCOperand ZEROReg = MCOperand::CreateReg(Cpu0::ZERO);

    if (!isInt<16>(Offset)) {
        unsigned Hi = ((Offset + 0x8000) >> 16) & 0xffff;
        Offset &= 0xffff;
        MCOperand ATReg = MCOperand::CreateReg(Cpu0::AT);
        BaseReg = ATReg;

        // addiu    at,zero,hi
        // shl      at,at,16
        // add      at,at,sp
        MCInsts.resize(3);
        CreateMCInst(MCInsts[0], Cpu0::ADDIU, ATReg, ZEROReg, MCOperand::CreateImm(Hi));
        CreateMCInst(MCInsts[1], Cpu0::SHL, ATReg, ATReg, MCOperand::CreateImm(16));
        CreateMCInst(MCInsts[2], Cpu0::ADD, ATReg, ATReg, SPReg);
    }

    MCInst St;
    CreateMCInst(St, Cpu0::ST, GPReg, BaseReg, MCOperand::CreateImm(Offset));
    MCInsts.push_back(St);
}
```

The code added to Cpu0AsmPrinter.cpp above calls LowerCPRESTORE() when the user runs llc -filetype=obj. The code added to Cpu0MCInstLower.cpp above lowers .cprestore into machine instructions.
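
The high/low split that LowerCPRESTORE() applies to offsets that do not fit in 16 bits can be checked in isolation. The following sketch reuses the two expressions from the listing; `SplitOffset`, `splitOffset` and `recombine` are hypothetical helpers, and the recombination assumes that the 16-bit immediate of the final store is sign-extended.

```cpp
#include <cassert>
#include <cstdint>

struct SplitOffset {
  uint32_t Hi; // value loaded by addiu and then shifted left by 16
  uint32_t Lo; // low 16 bits, used as the immediate of the final st
};

// Same arithmetic as in LowerCPRESTORE(): adding 0x8000 before the shift
// compensates for the sign extension of the low half.
SplitOffset splitOffset(int64_t Offset) {
  assert(Offset >= 0 && Offset <= INT32_MAX);
  SplitOffset S;
  S.Hi = static_cast<uint32_t>((Offset + 0x8000) >> 16) & 0xffff;
  S.Lo = static_cast<uint32_t>(Offset) & 0xffff;
  return S;
}

// (Hi << 16) plus the sign-extended low half gives back the original offset.
int64_t recombine(const SplitOffset &S) {
  int64_t SignExtLo =
      static_cast<int64_t>(S.Lo) - ((S.Lo & 0x8000) ? 0x10000 : 0);
  return (static_cast<int64_t>(S.Hi) << 16) + SignExtLo;
}
```

For example, the offset 0x18000 splits into Hi = 2 and Lo = 0x8000; the low half is read as -0x8000, and 0x20000 - 0x8000 gives 0x18000 again.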

```

118-165-76-131:InputFiles Jonathan$ /Users/Jonathan/llvm/test/
cmake_debug_build/bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=
obj ch8_1.bc -o ch8_1.cpu0.o
118-165-76-131:InputFiles Jonathan$ hexdump ch8_1.cpu0.o
...
// .cprestore machine instruction " 01 ad 00 18"
00000d0 01 ad 00 18 09 20 00 00 01 2d 00 40 09 20 00 06
...
118-165-67-25:InputFiles Jonathan$ cat ch8_1.cpu0.s
...
.ent _Z5sum_iiiiiii          # @_Z5sum_iiiiiii
_Z5sum_iiiiiii:
...
.cupload $t9 // assign $gp = $t9 by loader when loader load re-entry function
               // (shared library) of _Z5sum_iiiiiii
.set nomacro

# BB#0:
...
.ent  main           # @main
...
.cprestore 24 // save $gp to 24($sp)
...
```

Running llc with -relocation-model=static emits the jsub instruction instead of jalr, as follows.

```
118-165-76-131:InputFiles Jonathan$ /Users/Jonathan/llvm/test/
cmake_debug_build/bin/Debug/llc -march=cpu0 -relocation-model=static -filetype=
asm ch8_1.bc -o ch8_1.cpu0.s
118-165-76-131:InputFiles Jonathan$ cat ch8_1.cpu0.s
...
    jsub  _Z5sum_iiiiiii
...
```

Run with llc -filetype=obj and you will find that the Cx of “**jsub Cx**” is 0, since Cx is calculated by the linker, as shown below. Mips has the same 0 in its jal instruction. The files ch8\_1\_2.cpp, ch8\_1\_3.cpp and ch8\_1\_4.cpp are further example code for testing.

```
// jsub _Z5sum_iiiiiii translate into 2B 00 00 00
00F0: 2B 00 00 00 01 2D 00 34 00 ED 00 3C 09 DD 00 40
```
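
The relation between the object file bytes and the later linker fix-up can be sketched as follows. This is an illustration only: it assumes the instruction word is an 8-bit opcode (0x2B, taken from the dump above) followed by a 24-bit Cx field treated as a raw unsigned value, and `encodeJsub` and `patchCx` are hypothetical helpers rather than backend or linker code.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t JsubOpcode = 0x2B; // first byte of "2B 00 00 00"

// Builds a 32-bit word: 8-bit opcode followed by the 24-bit Cx field.
uint32_t encodeJsub(uint32_t Cx) {
  assert(Cx <= 0xFFFFFFu && "Cx must fit in 24 bits");
  return (JsubOpcode << 24) | Cx;
}

// What a linker conceptually does: overwrite Cx and keep the opcode byte.
uint32_t patchCx(uint32_t Word, uint32_t Cx) {
  assert(Cx <= 0xFFFFFFu && "Cx must fit in 24 bits");
  return (Word & 0xFF000000u) | Cx;
}
```

`encodeJsub(0)` yields 0x2B000000, the word emitted into the object file, and `patchCx()` fills in the field once the target address is known.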

#### 8.4.4 Correct the return of main()

LowerReturn() is modified in Chapter8\_3/ as follows.

##### LLVMBackendTutorialExampleCode/Chapter8\_3/Cpu0ISelLowering.cpp

```
SDValue
Cpu0TargetLowering::LowerReturn(SDValue Chain,
                                CallingConv::ID CallConv, bool isVarArg,
                                const SmallVectorImpl<ISD::OutputArg> &Outs,
                                const SmallVectorImpl<SDValue> &OutVals,
                                DebugLoc dl, SelectionDAG &DAG) const {

    // CCValAssign - represent the assignment of
    // the return value to a location
    SmallVector<CCValAssign, 16> RVLocs;

    // CCState - Info about the registers and stack slot.
    CCState CCInfo(CallConv, isVarArg, DAG.getMachineFunction(),
                   getTargetMachine(), RVLocs, *DAG.getContext());

    // Analyze return values.
    CCInfo.AnalyzeReturn(Outs, RetCC_Cpu0);

    SDValue Flag;
    SmallVector<SDValue, 4> RetOps(1, Chain);

    // Copy the result values into the output registers.
    for (unsigned i = 0; i != RVLocs.size(); ++i) {
        CCValAssign &VA = RVLocs[i];
        assert(VA.isRegLoc() && "Can only return in registers!");

        Chain = DAG.getCopyToReg(Chain, dl, VA.getLocReg(), OutVals[i], Flag);

        // Guarantee that all emitted copies are stuck together with flags.
        Flag = Chain.getValue(1);
        RetOps.push_back(DAG.getRegister(VA.getLocReg(), VA.getLocVT()));
    }

    RetOps[0] = Chain; // Update chain.

    // Add the flag if we have it.
    if (Flag.getNode())
        RetOps.push_back(Flag);

    // Return on Cpu0 is always a "ret $lr"
    return DAG.getNode(Cpu0ISD::Ret, dl, MVT::Other, &RetOps[0], RetOps.size());
}
```

#### LLVMBackendTutorialExampleCode/Chapter8\_3/Cpu0InstrInfo.h

```
virtual bool expandPostRAPseudo(MachineBasicBlock::iterator MI) const;

private:
    void ExpandRetLR(MachineBasicBlock &MBB, MachineBasicBlock::iterator I,
                     unsigned Opc) const;
```

#### LLVMBackendTutorialExampleCode/Chapter8\_3/Cpu0InstrInfo.cpp

```
/// Expand Pseudo instructions into real backend instructions
bool Cpu0InstrInfo::expandPostRAPseudo(MachineBasicBlock::iterator MI) const {
    MachineBasicBlock &MBB = *MI->getParent();

    switch(MI->getDesc().getOpcode()) {
    default:
        return false;
    case Cpu0::RetLR:
        ExpandRetLR(MBB, MI, Cpu0::RET);
        break;
    }

    MBB.erase(MI);
    return true;
}

void Cpu0InstrInfo::ExpandRetLR(MachineBasicBlock &MBB,
                                MachineBasicBlock::iterator I,
                                unsigned Opc) const {
    BuildMI(MBB, I, I->getDebugLoc(), get(Opc)).addReg(Cpu0::LR);
}
```

### LLVMBackendTutorialExampleCode/Chapter8\_3/Cpu0InstrInfo.td

```
// Return
def Cpu0Ret : SDNode<"Cpu0ISD::Ret", SDTNone,
               [SDNPHasChain, SDNPOptInGlue, SDNPVariadic]>;
...
let isBranch=1, isTerminator=1, isBarrier=1, imm16=0, hasDelaySlot = 1,
    isIndirectBranch = 1 in
class JumpFR<bits<8> op, string instr_asm, RegisterClass RC>:
    FL<op, (outs), (ins RC:$ra),
        !strconcat(instr_asm, "\t$ra"), [(brind RC:$ra)], IIBranch> {
    let rb = 0;
    let imm16 = 0;
}
// Return instruction
class RetBase<RegisterClass RC>: JumpFR<0x2C, "ret", RC> {
    let isReturn = 1;
    let isCodeGenOnly = 1;
    let hasCtrlDep = 1;
    let hasExtraSrcRegAllocReq = 1;
}
...
let isReturn=1, isTerminator=1, hasDelaySlot=1, isCodeGenOnly=1,
    isBarrier=1, hasCtrlDep=1, addr=0 in
def RetLR : Cpu0Pseudo<(outs), (ins), "", [(Cpu0Ret)]>;
def RET      : RetBase<CPURegs>;

```

The above code does the following:

1. Declare a pseudo node with the following code.

### LLVMBackendTutorialExampleCode/Chapter8\_3/Cpu0InstrInfo.td

```
// Return
def Cpu0Ret : SDNode<"Cpu0ISD::Ret", SDTNone,
               [SDNPHasChain, SDNPOptInGlue, SDNPVariadic]>;
...
let isReturn=1, isTerminator=1, hasDelaySlot=1, isCodeGenOnly=1,
    isBarrier=1, hasCtrlDep=1, addr=0 in
def RetLR : Cpu0Pseudo<(outs), (ins), "", [(Cpu0Ret)]>;

```

2. Create the Cpu0ISD::Ret node in LowerReturn(), which is called when a function return is met, as shown in the Chapter8\_3/Cpu0ISelLowering.cpp code above. More specifically, it creates the DAG (Cpu0ISD::Ret (CopyToReg %X, %V0, %Y), %V0, Flag). Since the V0 register is assigned in CopyToReg and Cpu0ISD::Ret uses V0, the CopyToReg with the V0 register is live out and will not be removed by any later optimization step. Remember, if we use "return DAG.getNode(Cpu0ISD::Ret, dl, MVT::Other, Chain, DAG.getRegister(Cpu0::LR, MVT::i32));" instead of "return DAG.getNode(Cpu0ISD::Ret, dl, MVT::Other, &RetOps[0], RetOps.size());", the V0 register is not live out and the previous DAG (CopyToReg %X, %V0, %Y) is removed in a later optimization stage. The result is then the same as in Chapter8\_2, where the return value is wrong.

```
Initial selection DAG: BB#0 'main:entry'
SelectionDAG has 21 nodes:
...
0x1ea1e50: i32 = Register %V0

0x1e9fd20: <multiple use>
0x1e9fd20: <multiple use>
0x1ea1c50: i32 = FrameIndex<2> [ORD=7]

0x1e9f120: <multiple use>
0x1ea1d50: ch = store 0x1e9fd20:1, 0x1e9fd20, 0x1ea1c50,
0x1e9f120<ST4[%i]> [ORD=7]

0x1ea1e50: <multiple use>
0x1e9ef20: <multiple use>
0x1ea1f50: ch,glue = CopyToReg 0x1ea1d50, 0x1ea1e50, 0x1e9ef20

0x1ea1f50: <multiple use>
0x1ea1e50: <multiple use>
0x1ea1f50: <multiple use>
0x1ea2050: ch = Cpu0ISD::Ret 0x1ea1f50, 0x1ea1e50, 0x1ea1f50:1

```

3. After instruction selection, Cpu0ISD::Ret is replaced by Cpu0::RetLR, as shown below. This is the effect of “def RetLR” in step 1.

```

===== Instruction selection begins: BB#0 'entry'
Selecting: 0x1ea4050: ch = Cpu0ISD::Ret 0x1ea3f50, 0x1ea3e50,
0x1ea3f50:1 [ID=27]

ISEL: Starting pattern match on root node: 0x1ea4050: ch = Cpu0ISD::Ret
0x1ea3f50, 0x1ea3e50, 0x1ea3f50:1 [ID=27]

Morphed node: 0x1ea4050: ch = RetLR 0x1ea3e50, 0x1ea3f50, 0x1ea3f50:1
...
ISEL: Match complete!
=> 0x1ea4050: ch = RetLR 0x1ea3e50, 0x1ea3f50, 0x1ea3f50:1
...
===== Instruction selection ends:
Selected selection DAG: BB#0 'main:entry'
SelectionDAG has 28 nodes:
...
0x1ea3e50: <multiple use>
0x1ea3f50: <multiple use>
0x1ea3f50: <multiple use>
0x1ea4050: ch = RetLR 0x1ea3e50, 0x1ea3f50, 0x1ea3f50:1

```

4. Expand Cpu0::RetLR into the instruction **ret \$lr** in the “Post-RA pseudo instruction expansion pass” stage, using the Chapter8\_3/Cpu0InstrInfo.cpp code above. This stage comes after register allocation, so we can replace V0 (\$r2) with LR (\$lr) without any side effect.
5. Print assembly or object code in the “Cpu0 Assembly Printer” stage, according to the information (the \*.inc files generated by TableGen from \*.td) generated from the following code.

#### LLVMBackendTutorialExampleCode/Chapter8\_3/Cpu0InstrInfo.td

```

class JumpFR<bits<8> op, string instr_asm, RegisterClass RC>:
  FL<op, (outs), (ins RC:$ra),
  !strconcat(instr_asm, "\t$ra"), [(brind RC:$ra)], IIBranch> {
  let rb = 0;
  let imm16 = 0;
}
// Return instruction
class RetBase<RegisterClass RC>: JumpFR<0x2C, "ret", RC> {

let isReturn = 1;
let isCodeGenOnly = 1;
let hasCtrlDep = 1;
let hasExtraSrcRegAllocReq = 1;
}
...
def RET      : RetBase<CPUREgs>;

```

The stages mentioned in Chapter 3 and the sub-stages in Chapter 4 are listed again below. Step 2 above happens before the “CPU0 DAG->DAG Pattern Instruction Selection” stage, step 3 happens in the “Instruction selection” sub-stage, step 4 in the “Post-RA pseudo instruction expansion pass” stage, and step 5 in the “Cpu0 Assembly Printer” stage.

```

118-165-79-200:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=static -filetype=asm ch6_2.bc
-debug-pass=Structure -o -
...
Machine Branch Probability Analysis
ModulePass Manager
FunctionPass Manager
...
CPU0 DAG->DAG Pattern Instruction Selection
  Initial selection DAG
  Optimized lowered selection DAG
  Type-legalized selection DAG
  Optimized type-legalized selection DAG
  Legalized selection DAG
  Optimized legalized selection DAG
  Instruction selection
  Selected selection DAG
  Scheduling
...
Greedy Register Allocator
...
Post-RA pseudo instruction expansion pass
...
Cpu0 Assembly Printer

```

Table 8.2 summarizes how the return value is corrected in each stage.

Table 8.2: Correct the return value in each stage

| Stage                                              | Function                                |
|----------------------------------------------------|-----------------------------------------|
| Write Code                                         | Declare a pseudo node Cpu0::RetLR       |
| Before CPU0 DAG->DAG Pattern Instruction Selection | Create Cpu0ISD::Ret DAG                 |
| Instruction selection                              | Cpu0ISD::Ret is replaced by Cpu0::RetLR |
| Post-RA pseudo instruction expansion pass          | Cpu0::RetLR -> ret \$lr                 |
| Cpu0 Assembly Printer                              | Print according to “def RET”            |
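
The stages in Table 8.2 can be pictured as a chain of rewrites applied to one opcode. The following is a minimal standalone C++ sketch of that chain, not LLVM code: the `Opcode` enum and the functions `selectInstruction()`, `expandPostRAPseudo()`, `printAsm()` and `lowerReturnToAsm()` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the nodes discussed above (not LLVM types).
enum class Opcode { Cpu0ISD_Ret, RetLR, RET };

// "Instruction selection": the Cpu0ISD::Ret node is matched to the pseudo
// instruction RetLR. Other opcodes pass through unchanged.
constexpr Opcode selectInstruction(Opcode op) {
  return op == Opcode::Cpu0ISD_Ret ? Opcode::RetLR : op;
}

// "Post-RA pseudo instruction expansion pass": RetLR becomes the real RET.
constexpr Opcode expandPostRAPseudo(Opcode op) {
  return op == Opcode::RetLR ? Opcode::RET : op;
}

// "Cpu0 Assembly Printer": only a real instruction can be printed.
inline std::string printAsm(Opcode op) {
  assert(op == Opcode::RET && "pseudo nodes must be expanded before printing");
  return "ret $lr";
}

// Runs the three stages in the order listed in the table.
inline std::string lowerReturnToAsm() {
  return printAsm(expandPostRAPseudo(selectInstruction(Opcode::Cpu0ISD_Ret)));
}
```

In this toy model, calling `lowerReturnToAsm()` walks the same path as the table: DAG node, pseudo instruction, real instruction, printed text.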

Run Chapter8\_3/ to get the correct result (return register \$2 is 0) as follows,

```

118-165-78-31:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=static -filetype=asm ch6_2.bc -o
ch6_2.cpu0.static.s
118-165-78-31:InputFiles Jonathan$ cat ch6_2.cpu0.static.s
.section .mdebug.abi32
.previous
.file "ch6_2.bc"
.text
.globl main
.align 2
.type main,@function
.ent main                                # @main
main:
.cfi_startproc
.frame $sp,16,$lr
.mask 0x00000000,0
.set noreorder
.set nomacro
# BB#0:
addiu $sp, $sp, -16
$tmp1:
.cfi_def_cfa_offset 16
addiu $2, $zero, 0
st $2, 12($sp)
addiu $3, $zero, %hi(date)
shl $3, $3, 16
addiu $3, $3, %lo(date)
ld $3, 8($3)
st $3, 8($sp)
addiu $3, $zero, %hi(a)
shl $3, $3, 16
addiu $3, $3, %lo(a)
ld $3, 4($3)
st $3, 4($sp)
addiu $sp, $sp, 16
ret $lr
.set macro
.set reorder
.end main
$tmp2:
.size main, ($tmp2)-main
.cfi_endproc

.type date,@object                         # @date
.data
.globl date
.align 2
date:
.4byte 2012                                # 0x7dc
.4byte 10                                    # 0xa
.4byte 12                                    # 0xc
.size date, 12

.type a,@object                            # @a
.globl a
.align 2
a:
.4byte 2012                                # 0x7dc
.4byte 10                                    # 0xa
.4byte 12                                    # 0xc
.size a, 12

```

## 8.5 Support features

This section adds support for struct types, variable numbers of arguments, and dynamic stack allocation.

Running Chapter8\_3/ with ch8\_2\_1.cpp produces the following error message.

#### LLVMBackendTutorialExampleCode/InputFiles/ch8\_2\_1.cpp

```
struct Date
{
    int year;
    int month;
    int day;
    int hour;
    int minute;
    int second;
};
Date gDate = {2012, 10, 12, 1, 2, 3};

struct Time
{
    int hour;
    int minute;
    int second;
};
Time gTime = {2, 20, 30};

Date getDate()
{
    return gDate;
}

Date copyDate(Date date)
{
    return date;
}

Date copyDate(Date* date)
{
    return *date;
}

Time copyTime(Time time)
{
    return time;
}

Time copyTime(Time* time)
{
    return *time;
}

int main()
{
    Time time1 = {1, 10, 12};
    Date date1 = getDate();
    Date date2 = copyDate(date1);
    Date date3 = copyDate(&date1);
    Time time2 = copyTime(time1);
    Time time3 = copyTime(&time1);

    return 0;
}
```

```
JonathanMac:InputFiles Jonathan$ clang -c ch8_2_1.cpp -emit-llvm -o
ch8_2_1.bc
JonathanMac:InputFiles Jonathan$ /Users/Jonathan/llvm/test/
cmake_debug_build/bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm
ch8_2_1.bc -o ch8_2_1.cpu0.s
...
Assertion failed: (InVals.size() == Ins.size() && "LowerFormalArguments didn't
emit the correct number of values!"), function LowerArguments, file /Users/
Jonathan/llvm/test/src/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp,
line 6712.
...
```

Run Chapter8\_3/ with ch8\_3.cpp to get the following error,

#### LLVMBackendTutorialExampleCode/InputFiles/ch8\_3.cpp

```
#include <stdio.h>
#include <stdarg.h>

int sum_i(int amount, ...)
{
    int i = 0;
    int val = 0;
    int sum = 0;

    va_list vl;
    va_start(vl, amount);
    for (i = 0; i < amount; i++)
    {
        val = va_arg(vl, int);
        sum += val;
    }
    va_end(vl);

    return sum;
}

int main()
{
    int a = sum_i(6, 0, 1, 2, 3, 4, 5);
    // printf("a = %d\n", a);

    return a;
}
```

```
118-165-78-230:InputFiles Jonathan$ clang -target `llvm-config --host-target` -c
ch8_3.cpp -emit-llvm -o ch8_3.bc
118-165-78-230:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm ch8_3.bc -o
ch8_3.cpu0.s
LLVM ERROR: Cannot select: 0x7f8b6902fd10: ch = vastart 0x7f8b6902fa10,
0x7f8b6902fb10, 0x7f8b6902fc10 [ORD=9] [ID=22]
 0x7f8b6902fb10: i32 = FrameIndex<5> [ORD=7] [ID=9]
In function: _Z5sum_iiz
```

#### LLVMBackendTutorialExampleCode/InputFiles/ch8\_4.cpp

```
#include <alloca.h>

int sum(int x1, int x2, int x3, int x4, int x5, int x6)
{
    int sum = x1 + x2 + x3 + x4 + x5 + x6;

    return sum;
}

int weight_sum(int x1, int x2, int x3, int x4, int x5, int x6)
{
    int *b = (int*)alloca(sizeof(int) * x1);
    *b = 1111;
    int weight = sum(6*x1, x2, x3, x4, 2*x5, x6);

    return weight;
}

int main()
{
    int a = weight_sum(1, 2, 3, 4, 5, 6);

    return a;
}
```

Running Chapter8\_3/ with ch8\_4.cpp produces the following error.

```
118-165-72-242:InputFiles Jonathan$ clang -I/Applications/Xcode.app/Contents/  
Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk/usr/include/  
-c ch8_4.cpp -emit-llvm -o ch8_4.bc  
118-165-72-242:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/  
bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm ch8_4.bc -o  
ch8_4.cpu0.s  
LLVM ERROR: Cannot select: 0x7ffd8b02ff10: i32, ch = dynamic_stackalloc  
0x7ffd8b02f910:1, 0x7ffd8b02fe10, 0x7ffd8b02c010 [ORD=12] [ID=48]  
 0x7ffd8b02fe10: i32 = and 0x7ffd8b02fc10, 0x7ffd8b02fd10 [ORD=12] [ID=47]  
 0x7ffd8b02fc10: i32 = add 0x7ffd8b02fa10, 0x7ffd8b02fb10 [ORD=12] [ID=46]  
 0x7ffd8b02fa10: i32 = shl 0x7ffd8b02f910, 0x7ffd8b02f510 [ID=45]  
 0x7ffd8b02f910: i32, ch = load 0x7ffd8b02ee10, 0x7ffd8b02e310,  
 0x7ffd8b02b310<LD4[%1]> [ID=44]  
 0x7ffd8b02e310: i32 = FrameIndex<1> [ORD=3] [ID=10]  
 0x7ffd8b02b310: i32 = undef [ORD=1] [ID=2]  
 0x7ffd8b02f510: i32 = Constant<2> [ID=25]  
 0x7ffd8b02fb10: i32 = Constant<7> [ORD=12] [ID=16]  
 0x7ffd8b02fd10: i32 = Constant<-8> [ORD=12] [ID=17]  
 0x7ffd8b02c010: i32 = Constant<0> [ORD=12] [ID=8]  
In function: _Z5sum_iiz
```
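
The DAG in the error message above also shows how the size operand of `dynamic_stackalloc` is computed from `x1`: the `shl` by 2 multiplies the element count by 4, and the `add` of 7 followed by the `and` with -8 rounds the byte count up to a multiple of 8. A minimal C++ sketch of that arithmetic follows. `allocaBytes()` and `allocaOnStack()` are hypothetical helper names, and the downward stack adjustment is an assumption about how such a node is eventually lowered, not code taken from the tutorial.

```cpp
#include <cassert>
#include <cstdint>

// Size operand from the DAG: and(add(shl(count, 2), 7), -8).
// As an unsigned 32-bit value, -8 equals ~7.
constexpr std::uint32_t allocaBytes(std::uint32_t count) {
  return ((count << 2) + 7u) & ~std::uint32_t{7};
}

static_assert(allocaBytes(1) == 8, "one int is rounded up to 8 bytes");
static_assert(allocaBytes(3) == 16, "12 bytes are rounded up to 16");

// Assumed model of the allocation itself: the stack grows towards lower
// addresses, so the new block starts at sp - bytes.
inline std::uint32_t allocaOnStack(std::uint32_t sp, std::uint32_t count) {
  const std::uint32_t bytes = allocaBytes(count);
  assert(sp >= bytes && "stack exhausted in this toy model");
  return sp - bytes;
}
```

For `weight_sum(1, ...)` in ch8\_4.cpp, `x1` is 1, so this model reserves 8 bytes for `b`.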

### 8.5.1 Structure type support

Chapter8\_4/ adds the following code to support structure types in function calls.

#### LLVMBackendTutorialExampleCode/Chapter8\_4/Cpu0ISelLowering.cpp

```

// AddLiveIn - This helper function adds the specified physical register to the
// MachineFunction as a live in value. It also creates a corresponding
// virtual register for it.
static unsigned
AddLiveIn(MachineFunction &MF, unsigned PReg, const TargetRegisterClass *RC)
{
    assert(RC->contains(PReg) && "Not the correct regclass!");
    unsigned VReg = MF.getRegInfo().createVirtualRegister(RC);
    MF.getRegInfo().addLiveIn(PReg, VReg);
    return VReg;
}
...
//=====
//           Call Calling Convention Implementation
//=====

static const unsigned IntRegsSize = 2;

static const uint16_t IntRegs[] = {
    Cpu0::A0, Cpu0::A1
};

// Write ByVal Arg to arg registers and stack.
static void
WriteByValArg(SDValue& ByValChain, SDValue Chain, DebugLoc dl,
              SmallVector<std::pair<unsigned, SDValue>, 16>& RegsToPass,
              SmallVector<SDValue, 8>& MemOpChains, int& LastFI,
              MachineFrameInfo *MFI, SelectionDAG &DAG, SDValue Arg,
              const CCValAssign &VA, const ISD::ArgFlagsTy& Flags,
              MVT PtrType, bool isLittle) {
    unsigned LocMemOffset = VA.getLocMemOffset();
    unsigned Offset = 0;
    uint32_t RemainingSize = Flags.getByValSize();
    unsigned ByValAlign = Flags.getByValAlign();

    if (RemainingSize == 0)
        return;

    // Create a fixed object on stack at offset LocMemOffset and copy
    // remaining part of byval arg to it using memcpy.
    SDValue Src = DAG.getNode(ISD::ADD, dl, MVT::i32, Arg,
                              DAG.getConstant(Offset, MVT::i32));
    LastFI = MFI->CreateFixedObject(RemainingSize, LocMemOffset, true);
    SDValue Dst = DAG.getFrameIndex(LastFI, PtrType);
    ByValChain = DAG.getMemcpy(ByValChain, dl, Dst, Src,
                               DAG.getConstant(RemainingSize, MVT::i32),
                               std::min(ByValAlign, (unsigned)4),
                               /*isVolatile=*/false, /*AlwaysInline=*/false,
                               MachinePointerInfo(0), MachinePointerInfo(0));
}
...

SDValue
Cpu0TargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
                             SmallVectorImpl<SDValue> &InVals) const {
    ...
    // Walk the register/memloc assignments, inserting copies/loads.
    for (unsigned i = 0, e = ArgLocs.size(); i != e; ++i) {
        ...
        // ByVal Arg.
        if (Flags.isByVal()) {
            ...
            WriteByValArg( ByValChain, Chain, dl, RegsToPass, MemOpChains, LastFI,
                           MFI, DAG, Arg, VA, Flags, getPointerTy(),
                           Subtarget->isLittle());
            ...
        }
        ...
    }
    ...
}

//=====
//      Formal Arguments Calling Convention Implementation
//=====

static void ReadByValArg(MachineFunction &MF, SDValue Chain, DebugLoc dl,
                           std::vector<SDValue> &OutChains,
                           SelectionDAG &DAG, unsigned NumWords, SDValue FIN,
                           const CCValAssign &VA, const ISD::ArgFlagsTy &Flags,
                           const Argument *FuncArg) {
    unsigned LocMem = VA.getLocMemOffset();
    unsigned FirstWord = LocMem / 4;

    // copy register A0 - A1 to frame object
    for (unsigned i = 0; i < NumWords; ++i) {
        unsigned CurWord = FirstWord + i;
        if (CurWord >= IntRegsSize)
            break;

        unsigned SrcReg = IntRegs[CurWord];
        unsigned Reg = AddLiveIn(MF, SrcReg, &Cpu0::CPURegsRegClass);
        SDValue StorePtr = DAG.getNode(ISD::ADD, dl, MVT::i32, FIN,
                                       DAG.getConstant(i * 4, MVT::i32));
        SDValue Store = DAG.getStore(Chain, dl, DAG.getRegister(Reg, MVT::i32),
                                     StorePtr, MachinePointerInfo(FuncArg, i * 4),
                                     false, false, 0);
        OutChains.push_back(Store);
    }
}

SDValue
Cpu0TargetLowering::LowerFormalArguments(SDValue Chain,
                                         CallingConv::ID CallConv,
                                         bool isVarArg,
                                         const SmallVectorImpl<ISD::InputArg> &Ins,
                                         DebugLoc dl, SelectionDAG &DAG,
                                         SmallVectorImpl<SDValue> &InVals)
                                         const {
    ...
    for (unsigned i = 0, e = ArgLocs.size(); i != e; ++i, ++FuncArg) {
        ...
        if (Flags.isByVal()) {
            assert(Flags.getByValSize() &&
                   "ByVal args of size 0 should have been ignored by front-end.");
            unsigned NumWords = (Flags.getByValSize() + 3) / 4;
            LastFI = MFI->CreateFixedObject(NumWords * 4, VA.getLocMemOffset(),
                                            true);
            SDValue FIN = DAG.getFrameIndex(LastFI, getPointerTy());
            InVals.push_back(FIN);
            ReadByValArg(MF, Chain, dl, OutChains, DAG, NumWords, FIN, VA, Flags,
                         &*FuncArg);
            continue;
        }
        ...
    }
    // The cpu0 ABIs for returning structs by value requires that we copy
    // the sret argument into $v0 for the return. Save the argument into
    // a virtual register so that we can access it from the return points.
    if (DAG.getMachineFunction().getFunction()->hasStructRetAttr()) {
        unsigned Reg = Cpu0FI->getSRetReturnReg();
        if (!Reg) {
            Reg = MF.getRegInfo().createVirtualRegister(getRegClassFor(MVT::i32));
            Cpu0FI->setSRetReturnReg(Reg);
        }
        SDValue Copy = DAG.getCopyToReg(DAG.getEntryNode(), dl, Reg, InVals[0]);
        Chain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Copy, Chain);
    }
    ...
}
...
SDValue
Cpu0TargetLowering::LowerReturn(SDValue Chain,
                                CallingConv::ID CallConv, bool isVarArg,
                                const SmallVectorImpl<ISD::OutputArg> &Outs,
                                const SmallVectorImpl<SDValue> &OutVals,
                                DebugLoc dl, SelectionDAG &DAG) const {
    ...
    // The cpu0 ABIs for returning structs by value requires that we copy
    // the sret argument into $v0 for the return. We saved the argument into
    // a virtual register in the entry block, so now we copy the value out
    // and into $v0.
    if (DAG.getMachineFunction().getFunction()->hasStructRetAttr()) {
        MachineFunction &MF = DAG.getMachineFunction();
        Cpu0FunctionInfo *Cpu0FI = MF.getInfo<Cpu0FunctionInfo>();
        unsigned Reg = Cpu0FI->getSRetReturnReg();

        if (!Reg)
            llvm_unreachable("sret virtual register not created in the entry block");
        SDValue Val = DAG.getCopyFromReg(Chain, dl, Reg, getPointerTy());

        Chain = DAG.getCopyToReg(Chain, dl, Cpu0::V0, Val, Flag);
        Flag = Chain.getValue(1);
        RetOps.push_back(DAG.getRegister(Cpu0::V0, getPointerTy()));
    }
    ...
}

```

In addition to the code above, the calling convention was defined earlier in this chapter, as follows.

#### LLVMBackendTutorialExampleCode/Chapter8\_4/Cpu0CallingConv.td

```
def RetCC_Cpu0EABI : CallingConv<[  
    // i32 are returned in registers V0, V1, A0, A1  
    CCIfType<[i32], CCAssignToReg<[V0, V1, A0, A1]>>  
]>;
```

This means that a return value is kept in registers V0, V1, A0 and A1 if it fits in 4 registers; if it is larger than that, cpu0 returns it through a pointer. To illustrate, let's run Chapter8\_4/ with ch8\_2\_1.cpp and walk through the output.
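
The rule can be stated as a small size check. The sketch below is a simplified standalone C++ model of the rule as described in the text, not the code LLVM runs: `RetKind`, `wordsFor()` and `classifyReturn()` are hypothetical names, and the sizes assume 4-byte `int` fields as in ch8\_2\_1.cpp.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical classification of how a value travels back to the caller.
enum class RetKind { InRegisters, ViaPointer };

constexpr std::size_t kWordBytes = 4;  // size of an i32
constexpr std::size_t kRetRegs = 4;    // V0, V1, A0, A1 in RetCC_Cpu0EABI

// Number of 32-bit words needed to hold `bytes` bytes.
constexpr std::size_t wordsFor(std::size_t bytes) {
  return (bytes + kWordBytes - 1) / kWordBytes;
}

// Values that fit in four registers stay in registers; larger ones are
// written through a pointer supplied by the caller.
constexpr RetKind classifyReturn(std::size_t bytes) {
  return wordsFor(bytes) <= kRetRegs ? RetKind::InRegisters
                                     : RetKind::ViaPointer;
}

// Time has 3 ints (12 bytes); Date has 6 ints (24 bytes).
static_assert(classifyReturn(3 * kWordBytes) == RetKind::InRegisters, "Time");
static_assert(classifyReturn(6 * kWordBytes) == RetKind::ViaPointer, "Date");
```

Under this model, `Time` is returned in registers and `Date` is returned through a pointer, which matches the two cases exercised by ch8\_2\_1.cpp.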

```
JonathantekiiMac:InputFiles Jonathan$ cat ch8_2_1.cpu0.s  
.section .mdebug.abi32  
.previous  
.file "ch8_2_1.bc"  
.text  
.globl _Z7getDatev  
.align 2  
.type _Z7getDatev,@function  
.ent _Z7getDatev # @_Z7getDatev  
  
_Z7getDatev:  
.cfi_startproc  
.frame $sp,0,$lr  
.mask 0x00000000,0  
.set noreorder  
.cpload $t9
.set nomacro  
  
# BB#0:  
ld $2, 0($sp) // $2 is 192($sp)  
ld $3, %got(gDate)($gp) // $3 is &gDate  
ld $4, 20($3) // save gDate contents to 212..192($sp)  
st $4, 20($2)  
ld $4, 16($3)  
st $4, 16($2)  
ld $4, 12($3)  
st $4, 12($2)  
ld $4, 8($3)  
st $4, 8($2)  
ld $4, 4($3)  
st $4, 4($2)  
ld $3, 0($3)  
st $3, 0($2)  
ret $lr  
.set macro  
.set reorder  
.end _Z7getDatev  
  
$tmp0:  
.size _Z7getDatev, ($tmp0)-_Z7getDatev  
.cfi_endproc  
  
.globl _Z8copyDate4Date  
.align 2  
.type _Z8copyDate4Date,@function  
.ent _Z8copyDate4Date # @_Z8copyDate4Date  
  
_Z8copyDate4Date:  
.cfi_startproc  
.frame $sp,0,$lr  
.mask 0x00000000,0
.set    noreorder
.set    nomacro
# BB#0:
    st  $5, 4($sp)
    ld  $2, 0($sp)          // $2 = 168($sp)
    ld  $3, 24($sp)
    st  $3, 20($2)          // copy date1, 24..4($sp), to date2,
    ld  $3, 20($sp)          // 188..168($sp)
    st  $3, 16($2)
    ld  $3, 16($sp)
    st  $3, 12($2)
    ld  $3, 12($sp)
    st  $3, 8($2)
    ld  $3, 8($sp)
    st  $3, 4($2)
    ld  $3, 4($sp)
    st  $3, 0($2)
    ret $lr
.set    macro
.set    reorder
.end  _Z8copyDate4Date
$tmp1:
.size _Z8copyDate4Date, ($tmp1)-_Z8copyDate4Date
.cfi_endproc

.globl _Z8copyDateP4Date
.align 2
.type _Z8copyDateP4Date, @function
.ent _Z8copyDateP4Date      # @_Z8copyDateP4Date
_Z8copyDateP4Date:
.cfi_startproc
.frame $sp,8,$lr
.mask 0x00000000,0
.set    noreorder
.set    nomacro
# BB#0:
    addiu $sp, $sp, -8
$tmp3:
.cfi_def_cfa_offset 8
    ld  $2, 8($sp)          // $2 = 120($sp of main) date2
    ld  $3, 12($sp)          // $3 = 192($sp of main) date1
    st  $3, 0($sp)
    ld  $4, 20($3)          // copy date1, 212..192($sp of main),
    st  $4, 20($2)          // to date2, 140..120($sp of main)
    ld  $4, 16($3)
    st  $4, 16($2)
    ld  $4, 12($3)
    st  $4, 12($2)
    ld  $4, 8($3)
    st  $4, 8($2)
    ld  $4, 4($3)
    st  $4, 4($2)
    ld  $3, 0($3)
    st  $3, 0($2)
    addiu $sp, $sp, 8
    ret $lr
.set    macro
.set    reorder
.end _Z8copyDateP4Date
$tmp4:
.size _Z8copyDateP4Date, ($tmp4)-_Z8copyDateP4Date
.cfi_endproc

.globl _Z8copyTime4Time
.align 2
.type _Z8copyTime4Time, @function
.ent _Z8copyTime4Time      # @_Z8copyTime4Time
_Z8copyTime4Time:
.cfi_startproc
.frame $sp,64,$lr
.mask 0x00000000,0
.set noreorder
.set nomacro
# BB#0:
addiu $sp, $sp, -64
$tmp6:
.cfi_def_cfa_offset 64
ld $2, 68($sp)      // save 8..0 ($sp of main) to 24..16($sp)
st $2, 20($sp)
ld $2, 64($sp)
st $2, 16($sp)
ld $2, 72($sp)
st $2, 24($sp)
st $2, 40($sp)      // save 8($sp of main) to 40($sp)
ld $2, 20($sp)      // time1.minute, save time1.minute and
st $2, 36($sp)      // time1.second to 36..32($sp)
ld $2, 16($sp)      // time1.second
st $2, 32($sp)
ld $2, 40($sp)      // $2 = 8($sp of main) = time1.hour
st $2, 56($sp)      // copy time1 to 56..48($sp)
ld $2, 36($sp)
st $2, 52($sp)
ld $2, 32($sp)
st $2, 48($sp)
ld $2, 48($sp)      // copy time1 to 8..0($sp)
ld $3, 52($sp)
ld $4, 56($sp)
st $4, 8($sp)
st $3, 4($sp)
st $2, 0($sp)
ld $2, 0($sp)        // put time1 to $2, $3 and $4 ($v0, $v1 and $a0)
ld $3, 4($sp)
ld $4, 8($sp)
addiu $sp, $sp, 64
ret $lr
.set macro
.set reorder
.end _Z8copyTime4Time
$tmp7:
.size _Z8copyTime4Time, ($tmp7)-_Z8copyTime4Time
.cfi_endproc

.globl _Z8copyTimeP4Time
.align 2
.type _Z8copyTimeP4Time, @function
.ent _Z8copyTimeP4Time      # @_Z8copyTimeP4Time
_Z8copyTimeP4Time:
.cfi_startproc
.frame $sp,40,$lr
.mask 0x00000000,0
.set noreorder
.set nomacro
# BB#0:
addiu $sp, $sp, -40
$tmp9:
.cfi_def_cfa_offset 40
ld $2, 40($sp)           // 216($sp of main)
st $2, 16($sp)
ld $3, 8($2)             // copy time1, 224..216($sp of main) to
st $3, 32($sp)           // 32..24($sp), 8..0($sp) and $2, $3, $4
ld $3, 4($2)
st $3, 28($sp)
ld $2, 0($2)
st $2, 24($sp)
ld $2, 24($sp)
ld $3, 28($sp)
ld $4, 32($sp)
st $4, 8($sp)
st $3, 4($sp)
st $2, 0($sp)
ld $2, 0($sp)
ld $3, 4($sp)
ld $4, 8($sp)
addiu $sp, $sp, 40
ret $lr
.set macro
.set reorder
.end _Z8copyTimeP4Time
$tmp10:
.size _Z8copyTimeP4Time, ($tmp10)-_Z8copyTimeP4Time
.cfi_endproc

.globl main
.align 2
.type main,@function
.ent main           # @main
main:
.cfi_startproc
.frame $sp,248,$lr
.mask 0x00004180,-4
.set noreorder
.cpload $t9
.set nomacro
# BB#0:
addiu $sp, $sp, -248
$tmp13:
.cfi_def_cfa_offset 248
st $lr, 244($sp)      # 4-byte Folded Spill
st $8, 240($sp)       # 4-byte Folded Spill
st $7, 236($sp)       # 4-byte Folded Spill
$tmp14:
.cfi_offset 14, -4
$tmp15:
.cfi_offset 8, -8
$tmp16:
.cfi_offset 7, -12
.cprestore 16
addiu $7, $zero, 0
st $7, 232($sp)
ld $2, %got($_ZZ4mainE5time1)($gp)
addiu $2, $2, %lo($_ZZ4mainE5time1)
ld $3, 8($2)      // save initial value to time1, 224..216($sp)
st $3, 224($sp)
ld $3, 4($2)
st $3, 220($sp)
ld $2, 0($2)
st $2, 216($sp)
addiu $8, $sp, 192
st $8, 0($sp)      // *0($sp) = 192($sp)
ld $t9, %call24(_Z7getDatev)($gp) // copy gDate contents to date1, 212..192($sp)
jalr $t9
ld $gp, 16($sp)
ld $2, 212($sp)    // copy 212..192($sp) to 164..144($sp)
st $2, 164($sp)
ld $2, 208($sp)
st $2, 160($sp)
ld $2, 204($sp)
st $2, 156($sp)
ld $2, 200($sp)
st $2, 152($sp)
ld $2, 196($sp)
st $2, 148($sp)
ld $2, 192($sp)
st $2, 144($sp)
ld $2, 164($sp)    // copy 164..144($sp) to 24..4($sp)
st $2, 24($sp)
ld $2, 160($sp)
st $2, 20($sp)
ld $2, 156($sp)
st $2, 16($sp)
ld $2, 152($sp)
st $2, 12($sp)
ld $2, 148($sp)
st $2, 8($sp)
ld $2, 144($sp)
st $2, 4($sp)
addiu $2, $sp, 168
st $2, 0($sp)      // *0($sp) = 168($sp)
ld $t9, %call24(_Z8copyDate4Date)($gp)
jalr $t9
ld $gp, 16($sp)
st $8, 4($sp)      // 4($sp) = 192($sp) date1
addiu $2, $sp, 120
st $2, 0($sp)      // *0($sp) = 120($sp) date2
ld $t9, %call24(_Z8copyDateP4Date)($gp)
jalr $t9
ld $gp, 16($sp)
ld $2, 224($sp)    // save time1 to arguments passing location,
st $2, 96($sp)      // 8..0($sp)
ld $2, 220($sp)
st $2, 92($sp)
ld $2, 216($sp)
st  $2, 88($sp)
ld  $2, 88($sp)
ld  $3, 92($sp)
ld  $4, 96($sp)
st  $4, 8($sp)
st  $3, 4($sp)
st  $2, 0($sp)
ld  $t9, %call24(_Z8copyTime4Time)($gp)
jalr $t9
ld  $gp, 16($sp)
st  $3, 76($sp)      // save return value time2 from $2, $3, $4 to
st  $2, 72($sp)      // 80..72($sp) and 112..104($sp)
st  $4, 80($sp)
ld  $2, 72($sp)
ld  $3, 76($sp)
ld  $4, 80($sp)
st  $4, 112($sp)
st  $3, 108($sp)
st  $2, 104($sp)
addiu $2, $sp, 216
st  $2, 0($sp)      // *(0($sp)) = 216($sp)
ld  $t9, %call24(_Z8copyTimeP4Time)($gp)
jalr $t9
ld  $gp, 16($sp)
st  $3, 44($sp)      // save return value time3 from $2, $3, $4 to
st  $2, 40($sp)      // 48..44($sp) 64..56($sp)
st  $4, 48($sp)
ld  $2, 40($sp)
ld  $3, 44($sp)
ld  $4, 48($sp)
st  $4, 64($sp)
st  $3, 60($sp)
st  $2, 56($sp)
add $2, $zero, $7    // return 0 by $2, ($7 is 0)

ld  $7, 236($sp)      # 4-byte Folded Reload // restore callee saved
ld  $8, 240($sp)      # 4-byte Folded Reload // registers $s0, $s1
ld  $lr, 244($sp)      # 4-byte Folded Reload // ($7, $8)
addiu $sp, $sp, 248
ret $lr
.set  macro
.set  reorder
.end  main
$tmp17:
.size main, ($tmp17)-main
.cfi_endproc

.type gDate,@object          # @gDate
.data
.globl gDate
.align 2
gDate:
        .4byte 2012          # 0x7dc
        .4byte 10             # 0xa
        .4byte 12             # 0xc
        .4byte 1              # 0x1
        .4byte 2              # 0x2
        .4byte 3              # 0x3
.size gDate, 24

.type gTime, @object          # @gTime
.globl gTime
.align 2
gTime:
    .4byte 2                  # 0x2
    .4byte 20                 # 0x14
    .4byte 30                 # 0x1e
.size gTime, 12

.type $_ZZ4mainE5time1, @object # @_ZZ4mainE5time1
.section .rodata, "a", @progbits
.align 2
$_ZZ4mainE5time1:
    .4byte 1                  # 0x1
    .4byte 10                 # 0xa
    .4byte 12                 # 0xc
.size $_ZZ4mainE5time1, 12

```

In LowerCall(), Flags.isByVal() is true if an outgoing argument is larger than 4 registers; LowerCall() then calls WriteByValArg(..., getPointerTy(), ...) to copy that argument to its stack offset. In ch8\_2\_1.cpp, Flags.isByVal() is true for the outgoing argument of copyDate(date1), since date1 has type Date, which contains 6 integers (year, month, day, hour, minute, second). Flags.isByVal() is false for copyTime(time1), since Time is a struct that contains only 3 integers (hour, minute, second). So if you comment out the call to WriteByValArg(..., getPointerTy(), ...), the code marked below will be missing from the caller, main(),

```

ld $2, 164($sp)      // copy 164..144($sp) to 24..4($sp)
st $2, 24($sp)
ld $2, 160($sp)
st $2, 20($sp)
ld $2, 156($sp)
st $2, 16($sp)
ld $2, 152($sp)
st $2, 12($sp)
ld $2, 148($sp)
st $2, 8($sp)
ld $2, 144($sp)
st $2, 4($sp)          // the code above will be missing

addiu $2, $sp, 168
st $2, 0($sp)          // *0($sp) = 168($sp)
ld $t9, %call24(_Z8copyDate4Date)($gp)

```
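
The copy that WriteByValArg() requests through DAG.getMemcpy() can be modelled on a toy stack frame. The following standalone C++ sketch is an illustration only: `Frame`, `storeWord()`, `loadWord()`, `writeByValArg()` and `demoCopyDate1ToArgumentArea()` are hypothetical names, while the offsets (a copy of date1 at 164..144(\$sp), the outgoing argument slot at 24..4(\$sp), a 248-byte frame) are taken from the main() listing above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy stack frame: frame[n] models the byte at n($sp). Not an LLVM type.
using Frame = std::vector<std::uint8_t>;

inline void storeWord(Frame &frame, std::size_t offset, std::uint32_t value) {
  assert(offset + sizeof(value) <= frame.size());
  std::memcpy(frame.data() + offset, &value, sizeof(value));
}

inline std::uint32_t loadWord(const Frame &frame, std::size_t offset) {
  assert(offset + sizeof(std::uint32_t) <= frame.size());
  std::uint32_t value = 0;
  std::memcpy(&value, frame.data() + offset, sizeof(value));
  return value;
}

// Copies a by-value argument of `size` bytes from srcOffset($sp) into the
// outgoing-argument slot at locMemOffset($sp).
inline void writeByValArg(Frame &frame, std::size_t srcOffset,
                          std::size_t locMemOffset, std::size_t size) {
  if (size == 0)  // mirrors the RemainingSize == 0 early return
    return;
  assert(srcOffset + size <= frame.size());
  assert(locMemOffset + size <= frame.size());
  std::memmove(frame.data() + locMemOffset, frame.data() + srcOffset, size);
}

// Replays the copyDate(date1) call site: 164..144($sp) -> 24..4($sp).
inline bool demoCopyDate1ToArgumentArea() {
  Frame frame(248, 0);
  const std::uint32_t date1[6] = {2012, 10, 12, 1, 2, 3};
  for (std::size_t i = 0; i < 6; ++i)
    storeWord(frame, 144 + 4 * i, date1[i]);
  writeByValArg(frame, 144, 4, 24);
  for (std::size_t i = 0; i < 6; ++i)
    if (loadWord(frame, 4 + 4 * i) != date1[i])
      return false;
  return true;
}
```

Without the `writeByValArg()` step, the slot at 24..4(\$sp) in this model stays zero, which corresponds to the missing `ld`/`st` pairs in the listing.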

In LowerFormalArguments(), the “if (Flags.isByVal())” block reads the incoming arguments that correspond to the outgoing arguments written by LowerCall().
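
The loop in ReadByValArg() decides which words of a by-value argument still live in argument registers and must be stored to the frame object. A minimal standalone C++ sketch of that bookkeeping follows; `RegStore`, `numWords()` and `readByValArg()` are hypothetical names, and register names are plain strings instead of LLVM register numbers.

```cpp
#include <cassert>
#include <string>
#include <vector>

constexpr unsigned kIntRegsSize = 2;  // A0 and A1, as in IntRegs[]

// One "store register to frame object" action; a stand-in for the
// DAG.getStore() node built in the real code.
struct RegStore {
  std::string reg;       // source argument register
  unsigned frameOffset;  // byte offset inside the frame object (i * 4)
};

// Same rounding as in LowerFormalArguments(): bytes -> 32-bit words.
constexpr unsigned numWords(unsigned byValSize) { return (byValSize + 3) / 4; }

// Mirrors the loop in ReadByValArg(): the word index is derived from the
// argument's stack offset, and only words below kIntRegsSize are in registers.
inline std::vector<RegStore> readByValArg(unsigned locMemOffset,
                                          unsigned byValSize) {
  static const char *const kIntRegs[kIntRegsSize] = {"A0", "A1"};
  std::vector<RegStore> stores;
  const unsigned firstWord = locMemOffset / 4;
  for (unsigned i = 0; i < numWords(byValSize); ++i) {
    const unsigned curWord = firstWord + i;
    if (curWord >= kIntRegsSize)
      break;
    stores.push_back({kIntRegs[curWord], i * 4});
  }
  return stores;
}
```

For the Date argument of copyDate(Date), with stack offset 4 and size 24, this model yields a single store from A1, which is consistent with the `st $5, 4($sp)` at the top of `_Z8copyDate4Date` if \$5 is A1.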

LowerFormalArguments() is called when a function is entered, while LowerReturn() is called when a function is left (see reference <sup>6</sup>). The former saves the sret argument (the address of the returned struct) in a virtual register, while the latter copies that virtual register into the return register. Since the return value is a struct type larger than 4 registers, a pointer (the struct address) is what ends up in the return register. The code and its effect are listed below.

---

<sup>6</sup> <http://developer.mips.com/clang-llvm/>

#### LLVMBackendTutorialExampleCode/Chapter8\_4/Cpu0ISelLowering.cpp

```

SDValue
Cpu0TargetLowering::LowerFormalArguments (SDValue Chain,
                                         CallingConv::ID CallConv,
                                         bool isVarArg,
                                         const SmallVectorImpl<ISD::InputArg> &Ins,
                                         DebugLoc dl, SelectionDAG &DAG,
                                         SmallVectorImpl<SDValue> &InVals)
                                         const {

    ...

    // The cpu0 ABIs for returning structs by value requires that we copy
    // the sret argument into $v0 for the return. Save the argument into
    // a virtual register so that we can access it from the return points.
    if (DAG.getMachineFunction().getFunction()->hasStructRetAttr()) {
        unsigned Reg = Cpu0FI->getSRetReturnReg();
        if (!Reg) {
            Reg = MF.getRegInfo().createVirtualRegister(getRegClassFor(MVT::i32));
            Cpu0FI->setSRetReturnReg(Reg);
        }
        SDValue Copy = DAG.getCopyToReg(DAG.getEntryNode(), dl, Reg, InVals[0]);
        Chain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Copy, Chain);
    }
    ...
}

addiu $2, $sp, 168
st $2, 0($sp)      // *0($sp) = 168($sp); LowerFormalArguments():
// return register is $2, virtual register is
// 0($sp)
ld $t9, %call24(_Z8copyDate4Date)($gp)

```

#### LLVMBackendTutorialExampleCode/Chapter8\_4/Cpu0ISelLowering.cpp

```

SDValue
Cpu0TargetLowering::LowerReturn (SDValue Chain,
                                 CallingConv::ID CallConv, bool isVarArg,
                                 const SmallVectorImpl<ISD::OutputArg> &Outs,
                                 const SmallVectorImpl<SDValue> &OutVals,
                                 DebugLoc dl, SelectionDAG &DAG) const {

    ...

    // The cpu0 ABIs for returning structs by value requires that we copy
    // the sret argument into $v0 for the return. We saved the argument into
    // a virtual register in the entry block, so now we copy the value out
    // and into $v0.
    if (DAG.getMachineFunction().getFunction()->hasStructRetAttr()) {
        MachineFunction &MF      = DAG.getMachineFunction();
        Cpu0FunctionInfo *Cpu0FI = MF.getInfo<Cpu0FunctionInfo>();
        unsigned Reg = Cpu0FI->getSRetReturnReg();

        if (!Reg)
            llvm_unreachable("sret virtual register not created in the entry block");
        SDValue Val = DAG.getCopyFromReg(Chain, dl, Reg, getPointerTy());

        Chain = DAG.getCopyToReg(Chain, dl, Cpu0::V0, Val, Flag);
        Flag = Chain.getValue(1);
        RetOps.push_back(DAG.getRegister(Cpu0::V0, getPointerTy()));
    }
    ...
}

```

```

.globl _Z8copyDate4Date
.align 2
.type _Z8copyDate4Date, @function
.ent _Z8copyDate4Date      # @_Z8copyDate4Date
_Z8copyDate4Date:
.cfi_startproc
.frame $sp, 0, $lr
.mask 0x00000000, 0
.set noreorder
.set nomacro
# BB#0:
st $5, 4($sp)
ld $2, 0($sp)           // $2 = 168($sp); LowerReturn(): virtual
                        // register is 0($sp), return register is $2
ld $3, 24($sp)
st $3, 20($2)           // copy date1, 24..4($sp), to date2,
ld $3, 20($sp)           // 188..168($sp)
st $3, 16($2)
ld $3, 16($sp)
st $3, 12($2)
ld $3, 12($sp)
st $3, 8($2)
ld $3, 8($sp)
st $3, 4($2)
ld $3, 4($sp)
st $3, 0($2)
ret $lr
.set macro
.set reorder
.end _Z8copyDate4Date

```
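
Seen from C++, the struct-return handling in LowerFormalArguments() and LowerReturn() amounts to rewriting `Date copyDate(Date date)` into a function that receives a hidden destination pointer and hands that same pointer back. The sketch below is a hand-written illustration of that idea, not compiler output; `copyDateLowered()` and `demoCopyDate()` are hypothetical names.

```cpp
#include <cassert>

struct Date {
  int year, month, day, hour, minute, second;
};

// Hand-lowered form of `Date copyDate(Date date)`.
// `sret` models the hidden pointer the caller stores at 0($sp);
// `byvalCopy` models the caller-made copy of the argument on the stack.
// The returned pointer models the value left in $v0 ($2).
inline Date *copyDateLowered(Date *sret, const Date *byvalCopy) {
  *sret = *byvalCopy;  // the ld/st pairs in the listing
  return sret;         // LowerReturn(): copy the saved sret pointer to $v0
}

// Models `Date date2 = copyDate(date1);` in main().
// Returns date2.year, or -1 if the returned pointer is not &date2.
inline int demoCopyDate() {
  const Date date1 = {2012, 10, 12, 1, 2, 3};
  Date date2 = {};
  const Date *v0 = copyDateLowered(&date2, &date1);
  return v0 == &date2 ? date2.year : -1;
}
```

The design point is that the callee never allocates the result: the caller owns the storage (188..168(\$sp) in the listing) and only its address travels in registers.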

ch8\_2\_2.cpp contains an implementation of a C++ class “Date”. It can be compiled for the cpu0 backend too, since the front end (clang in this example) translates it into a C-like form. If you comment out the “if (...hasStructRetAttr())” part in both of the functions above, the output cpu0 code uses \$3 instead of \$2 as the return register, as follows,

```

.globl _Z8copyDateP4Date
.align 2
.type _Z8copyDateP4Date, @function
.ent _Z8copyDateP4Date      # @_Z8copyDateP4Date
_Z8copyDateP4Date:
.cfi_startproc
.frame $sp, 8, $lr
.mask 0x00000000, 0
.set noreorder
.set nomacro
# BB#0:
addiu $sp, $sp, -8
$tmp3:
.cfi_def_cfa_offset 8
ld $2, 12($sp)
st $2, 0($sp)
ld $4, 20($2)
ld $3, 8($sp)
st  $4, 20($3)
ld  $4, 16($2)
st  $4, 16($3)
ld  $4, 12($2)
st  $4, 12($3)
ld  $4, 8($2)
st  $4, 8($3)
ld  $4, 4($2)
st  $4, 4($3)
ld  $2, 0($2)
st  $2, 0($3)
addiu $sp, $sp, 8
ret $lr
.set  macro
.set  reorder
.end  _Z8copyDateP4Date

```

### 8.5.2 Variable number of arguments

Until now, only a fixed number of arguments has been supported in function definitions (incoming arguments). This section adds support for a variable number of arguments, since the C language supports this feature.

Run Chapter8\_4/ with ch8\_3.cpp, compiled with the clang option ``-target `llvm-config --host-target` ``, to get the following result,

```

118-165-76-131:InputFiles Jonathan$ clang -target `llvm-config --host-target` -c
ch8_3.cpp -emit-llvm -o ch8_3.bc
118-165-76-131:InputFiles Jonathan$ /Users/Jonathan/llvm/test/
cmake_debug_build/bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm
ch8_3.bc -o ch8_3.cpu0.s
118-165-76-131:InputFiles Jonathan$ cat ch8_3.cpu0.s
.section .mdebug.abi32
.previous
.file "ch8_3.bc"
.text
.globl _Z5sum_iiz
.align 2
.type _Z5sum_iiz,@function
.ent _Z5sum_iiz          # @_Z5sum_iiz
_Z5sum_iiz:
.frame $sp,24,$lr
.mask 0x00000000,0
.set noreorder
.set nomacro
# BB#0:
addiu $sp, $sp, -24
ld $2, 24($sp)      // amount
st $2, 20($sp)      // amount
addiu $2, $zero, 0
st $2, 16($sp)      // i
st $2, 12($sp)      // val
st $2, 8($sp)       // sum
addiu $3, $sp, 28
st $3, 4($sp)       // arg_ptr = 2nd argument = &arg[1],
                     // since &arg[0] = 24($sp)
st $2, 16($sp)
$BB0_1:             # =>This Inner Loop Header: Depth=1
ld  $2, 20($sp)
ld  $3, 16($sp)
cmp $3, $2          // compare(i, amount)
jge $BB0_4
jmp $BB0_2
$BB0_2:                      #  in Loop: Header=BB0_1 Depth=1
    // i < amount
    ld  $2, 4($sp)
    addiu $3, $2, 4    // arg_ptr + 4
    st   $3, 4($sp)
    ld   $2, 0($2)     // *arg_ptr
    st   $2, 12($sp)
    ld   $3, 8($sp)    // sum
    add  $2, $3, $2    // sum += *arg_ptr
    st   $2, 8($sp)
# BB#3:                      #  in Loop: Header=BB0_1 Depth=1
    // i >= amount
    ld  $2, 16($sp)
    addiu $2, $2, 1    // i++
    st   $2, 16($sp)
    jmp $BB0_1
$BB0_4:
    addiu $sp, $sp, 24
    ret $lr
    .set  macro
    .set  reorder
    .end  _Z5sum_iiz
$tmp1:
    .size _Z5sum_iiz, ($tmp1)-_Z5sum_iiz

.globl main
.align 2
.type main,@function
.ent  main          # @main
main:
    .frame $sp,88,$lr
    .mask 0x00004000,-4
    .set noreorder
    .cpload $t9
    .set nomacro
# BB#0:
    addiu $sp, $sp, -88
    st  $lr, 84($sp)      # 4-byte Folded Spill
    .cprestore 32
    addiu $2, $zero, 0
    st  $2, 80($sp)
    addiu $3, $zero, 5
    st  $3, 24($sp)
    addiu $3, $zero, 4
    st  $3, 20($sp)
    addiu $3, $zero, 3
    st  $3, 16($sp)
    addiu $3, $zero, 2
    st  $3, 12($sp)
    addiu $3, $zero, 1
    st  $3, 8($sp)
    st  $2, 4($sp)
    addiu $2, $zero, 6
    st  $2, 0($sp)
ld  $t9, %call24(_Z5sum_iiz)($gp)
jalr $t9
ld  $gp, 32($sp)
st  $2, 76($sp)
ld  $lr, 84($sp)           # 4-byte Folded Reload
addiu $sp, $sp, 88
ret $lr
.set  macro
.set  reorder
.end  main
$tmp4:
.size main, ($tmp4)-main

```

The comments in the ch8\_3.cpu0.s output above give the analysis. In # BB#0, the first argument “**amount**” is loaded by “**ld \$2, 24(\$sp)**”, since the stack size of the callee function “**\_Z5sum\_iiz()**” is 24. The argument pointer, **arg\_ptr**, is then set to **28(\$sp)**, that is, **&arg[1]**. Next, block **\$BB0\_1** checks **i < amount**. If **i < amount**, control enters **\$BB0\_2**, which does **sum += \*arg\_ptr** as well as **arg\_ptr += 4**. Finally, # BB#3 does **i += 1**.
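The walk over the argument area described above can be modeled on the host. The following is only a sketch: `sum_from_arg_area` is a hypothetical helper, not part of the Cpu0 backend, and it assumes the caller laid out `amount` in word 0 followed by the variable arguments, as main() does in the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the callee loop in _Z5sum_iiz.
// arg_area[0] is `amount`; arg_area[1..amount] are the variable arguments.
int32_t sum_from_arg_area(const std::vector<int32_t> &arg_area) {
  assert(!arg_area.empty());
  const int32_t amount = arg_area[0];             // ld $2, 24($sp)
  assert(amount >= 0 &&
         static_cast<std::size_t>(amount) < arg_area.size());
  const int32_t *arg_ptr = arg_area.data() + 1;   // addiu $3, $sp, 28
  int32_t sum = 0;
  for (int32_t i = 0; i < amount; ++i) {          // $BB0_1: compare i, amount
    sum += *arg_ptr;                              // $BB0_2: sum += *arg_ptr
    ++arg_ptr;                                    //         arg_ptr += 4 bytes
  }
  return sum;
}
```

Called with the words that main() stores (6, 0, 1, 2, 3, 4, 5), the model adds the six values that follow `amount`.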

To support a variable number of arguments, the following code needs to be added in Chapter8\_4/. ch8\_3\_2.cpp is a C++ template example; it can be translated into cpu0 backend code too.

#### LLVMBackendTutorialExampleCode/Chapter8\_4/Cpu0TargetLowering.h

```

class Cpu0TargetLowering : public TargetLowering {
...
private:
...
SDValue LowerVASTART(SDValue Op, SelectionDAG &DAG) const;
...
};

```

#### LLVMBackendTutorialExampleCode/Chapter8\_4/Cpu0TargetLowering.cpp

```

Cpu0TargetLowering::
Cpu0TargetLowering(Cpu0TargetMachine &TM)
: TargetLowering(TM, new Cpu0TargetObjectFile()),
Subtarget(&TM.getSubtarget<Cpu0Subtarget>()) {
...
setOperationAction(ISD::VASTART,           MVT::Other, Custom);
...
// Support va_arg(): variable numbers (not fixed numbers) of arguments
// (parameters) for function calls
setOperationAction(ISD::VAARG,           MVT::Other, Expand);
setOperationAction(ISD::VACOPY,           MVT::Other, Expand);
setOperationAction(ISD::VAEND,           MVT::Other, Expand);
...
}
...

SDValue Cpu0TargetLowering::
LowerOperation(SDValue Op, SelectionDAG &DAG) const
{
    switch (Op.getOpcode())
    {
    ...
    case ISD::VASTART:           return LowerVASTART(Op, DAG);
    }
    return SDValue();
}

...
SDValue Cpu0TargetLowering::LowerVASTART(SDValue Op, SelectionDAG &DAG) const {
    MachineFunction &MF = DAG.getMachineFunction();
    Cpu0FunctionInfo *FuncInfo = MF.getInfo<Cpu0FunctionInfo>();

    DebugLoc dl = Op.getDebugLoc();
    SDValue FI = DAG.getFrameIndex(FuncInfo->getVarArgsFrameIndex(),
                                    getPointerTy());

    // vastart just stores the address of the VarArgsFrameIndex slot into the
    // memory location argument.
    const Value *SV = cast<SrcValueSDNode>(Op.getOperand(2))->getValue();
    return DAG.getStore(Op.getOperand(0), dl, FI, Op.getOperand(1),
                        MachinePointerInfo(SV), false, false, 0);
}

...
SDValue
Cpu0TargetLowering::LowerFormalArguments(SDValue Chain,
                                         CallingConv::ID CallConv,
                                         bool isVarArg,
                                         const SmallVectorImpl<ISD::InputArg> &Ins,
                                         DebugLoc dl, SelectionDAG &DAG,
                                         SmallVectorImpl<SDValue> &InVals)
                                         const {
    ...
    if (isVarArg) {
        unsigned RegSize = Cpu0::CPURegsRegClass.getSize();
        // Offset of the first variable argument from stack pointer.
        int FirstVaArgOffset = RegSize;

        // Record the frame index of the first variable argument
        // which is a value necessary to VASTART.
        LastFI = MFI->CreateFixedObject(RegSize, FirstVaArgOffset, true);
        Cpu0FI->setVarArgsFrameIndex(LastFI);
    }
    ...
}

```

#### LLVMBackendTutorialExampleCode/InputFiles/ch8\_3\_2.cpp

```

// #include <stdio.h>
#include <stdarg.h>

template<class T>
T sum(T amount, ...)
{
    T i = 0;
    T val = 0;
    T sum = 0;

    va_list vl;
    va_start(vl, amount);
    for (i = 0; i < amount; i++)
    {
        val = va_arg(vl, T);
        sum += val;
    }
    va_end(vl);

    return sum;
}

int main()
{
    int a = sum<int>(6, 0, 1, 2, 3, 4, 5);
//  printf("a = %d\n", a);

    return a;
}

```

See the Mips qemu reference <sup>7</sup>; you can download it and run the program compiled with gcc to verify the result with the printf() function. We will verify the correctness of the code in the chapter “Run backend” on the Cpu0 machine written in Verilog.
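Before comparing against qemu or the Verilog machine, the expected value can be checked with the host compiler. The sketch below has the same shape as `sum<T>()` in ch8\_3\_2.cpp; it assumes `T` is a type that is not promoted when passed through `...` (such as `int`), since `va_arg` must be given the promoted type.

```cpp
#include <cassert>
#include <cstdarg>

// Host-side check, same shape as sum<T>() in ch8_3_2.cpp.
// Assumption: T is not subject to default argument promotion (e.g. int).
template <class T>
T sum(T amount, ...) {
  T total = 0;
  va_list vl;
  va_start(vl, amount);          // vl now points at the first variable argument
  for (T i = 0; i < amount; i++)
    total += va_arg(vl, T);      // read one argument and advance vl
  va_end(vl);
  return total;
}
```

With the call used in main(), `sum<int>(6, 0, 1, 2, 3, 4, 5)`, the six variable arguments add up to 15, which is the value the Cpu0 run should return as well.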

### 8.5.3 Dynamic stack allocation support

Even though the C language rarely uses dynamic stack allocation, some languages use it frequently. The example ch8\_4.cpp used in this section does so through alloca().
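As a rough illustration of what dynamic stack allocation looks like at the source level, here is a minimal sketch. It is a hypothetical stand-in, not the tutorial's ch8\_4.cpp, and it assumes a platform that provides the non-standard `<alloca.h>` header (glibc and macOS do).

```cpp
#include <alloca.h>  // non-standard header; assumed available (glibc, macOS)
#include <cassert>

// Hypothetical example: the array size is only known at run time, so the
// compiler cannot fold it into the fixed-size frame of the function.
int sum_of_first(int n) {
  assert(n > 0 && n <= 1024);  // keep the stack allocation small
  int *b = static_cast<int *>(alloca(sizeof(int) * n));
  for (int i = 0; i < n; ++i)
    b[i] = i + 1;
  int total = 0;
  for (int i = 0; i < n; ++i)
    total += b[i];
  return total;  // the alloca() area is released when the function returns
}
```

The call to `alloca()` is what a front end lowers to an `alloca` instruction with a non-constant size, which the backend then has to support.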

Chapter8\_4 supports dynamic stack allocation with the following code added.

#### LLVMBackendTutorialExampleCode/Chapter8\_4/Cpu0FrameLowering.cpp

```

void Cpu0FrameLowering::emitPrologue(MachineFunction &MF) const {
    ...
    unsigned FP = Cpu0::FP;
    unsigned ZERO = Cpu0::ZERO;
    unsigned ADDu = Cpu0::ADDu;
    ...
    // if framepointer enabled, set it to point to the stack pointer.
    if (hasFP(MF)) {
        // Insert instruction "move $fp, $sp" at this location.
        BuildMI(MBB, MBBI, dl, TII.get(ADDu), FP).addReg(SP).addReg(ZERO);

        // emit ".cfi_def_cfa_register $fp"
        MCSymbol *SetFPLabel = MMI.getContext().CreateTempSymbol();
        BuildMI(MBB, MBBI, dl,
            TII.get(TargetOpcode::PROLOG_LABEL)).addSym(SetFPLabel);
        DstML = MachineLocation(FP);
        SrcML = MachineLocation(MachineLocation::VirtualFP);
        Moves.push_back(MachineMove(SetFPLabel, DstML, SrcML));
    }
    ...
}

```

<sup>7</sup> section “4.5.1 Calling Conventions” of tricore\_llvm.pdf

```
void Cpu0FrameLowering::emitEpilogue(MachineFunction &MF,
                                     MachineBasicBlock &MBB) const {
  ...
  unsigned FP = Cpu0::FP;
  unsigned ZERO = Cpu0::ZERO;
  unsigned ADDu = Cpu0::ADDu;
  ...

  // if framepointer enabled, restore the stack pointer.
  if (hasFP(MF)) {
    // Find the first instruction that restores a callee-saved register.
    MachineBasicBlock::iterator I = MBBI;

    for (unsigned i = 0; i < MFI->getCalleeSavedInfo().size(); ++i)
      --I;

    // Insert instruction "move $sp, $fp" at this location.
    BuildMI(MBB, I, dl, TII.get(ADDu), SP).addReg(FP).addReg(ZERO);
  }
  ...
}
```

#### LLVMBackendTutorialExampleCode/Chapter8\_4/Cpu0ISelLowering.cpp

```
Cpu0TargetLowering::
Cpu0TargetLowering(Cpu0TargetMachine &TM)
  : TargetLowering(TM, new Cpu0TargetObjectFile()),
    Subtarget(&TM.getSubtarget<Cpu0Subtarget>()) {
  ...
  setOperationAction(ISD::DYNAMIC_STACKALLOC, MVT::i32, Expand);
  ...
  setStackPointerRegisterToSaveRestore(Cpu0::SP);
  ...
}
```

#### LLVMBackendTutorialExampleCode/Chapter8\_4/Cpu0RegisterInfo.cpp

```
// pure virtual method
BitVector Cpu0RegisterInfo::
getReservedRegs(const MachineFunction &MF) const {
  ...
  // Reserve FP if this function should have a dedicated frame pointer register.
  if (MF.getTarget().getFrameLowering()->hasFP(MF)) {
    Reserved.set(Cpu0::FP);
  }
  ...
}
```

Running Chapter8\_4 with ch8\_4.cpp gives the following correct result.

```
118-165-72-242:InputFiles Jonathan$ clang -I/Applications/Xcode.app/Contents/
Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk/usr/include/
-c ch8_4.cpp -emit-llvm -o ch8_4.bc
118-165-72-242:InputFiles Jonathan$ llvm-dis ch8_4.bc -o ch8_4.ll
118-165-72-242:InputFiles Jonathan$ cat ch8_4.ll
; ModuleID = 'ch8_4.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i16:16:16-i32:32:32-i64:64:64-
f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:
32:64-S128"
target triple = "x86_64-apple-macosx10.8.0"

define i32 @_Z5sum_iiiiiii(i32 %x1, i32 %x2, i32 %x3, i32 %x4, i32 %x5, i32 %x6)
nounwind uwtable ssp {
    ...
    %10 = alloca i8, i64 %9      // int *b = (int*)alloca(sizeof(int) * x1);
    %11 = bitcast i8* %10 to i32*
    store i32* %11, i32** %b, align 8
    %12 = load i32** %b, align 8
    store i32 1111, i32* %12, align 4    // *b = 1111;
    ...
}
...
118-165-72-242:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm ch8_4.bc -o
ch8_4.cpu0.s
118-165-72-242:InputFiles Jonathan$ cat ch8_4.cpu0.s
...
_Z10weight_sumiiiiii:
.cfi_startproc
.frame $fp,80,$lr
.mask 0x00004080,-4
.set noreorder
.cupload $t9
.set nomacro
# BB#0:
addiu $sp, $sp, -80
$tmp6:
.cfi_def_cfa_offset 80
st $lr, 76($sp)          # 4-byte Folded Spill
st $7, 72($sp)          # 4-byte Folded Spill
$tmp7:
.cfi_offset 14, -4
$tmp8:
.cfi_offset 7, -8
add $fp, $sp, $zero
$tmp9:
.cfi_def_cfa_register 11
.cprestore 24
ld $7, %got(__stack_chk_guard) ($gp)
ld $2, 0($7)
st $2, 68($fp)
ld $2, 80($fp)
st $2, 64($fp)
ld $2, 84($fp)
st $2, 60($fp)
ld $2, 88($fp)
st $2, 56($fp)
ld $2, 92($fp)
st $2, 52($fp)
ld $2, 96($fp)
st $2, 48($fp)
ld $2, 100($fp)
st $2, 44($fp)
ld      $2, 64($fp)      // int *b = (int*)alloca(sizeof(int) * x1);
shl    $2, $2, 2
addiu $2, $2, 7
addiu $3, $zero, -8
and    $2, $2, $3
subu  $2, $sp, $2
add    $sp, $zero, $2 // set sp to the bottom of alloca area
st    $2, 40($fp)
addiu $3, $zero, 1111
st    $3, 0($2)
ld    $2, 64($fp)
ld    $3, 60($fp)
ld    $4, 56($fp)
ld    $5, 52($fp)
ld    $t9, 48($fp)
ld    $t0, 44($fp)
st    $t0, 20($sp)
shl    $t9, $t9, 1
st    $t9, 16($sp)
st    $5, 12($sp)
st    $4, 8($sp)
st    $3, 4($sp)
addiu $3, $zero, 6
mul   $2, $2, $3
st    $2, 0($sp)
ld    $t9, %call24(_Z3sumiiiiii)($gp)
jalr  $t9
ld    $gp, 24($fp)
st    $2, 36($fp)
ld    $3, 0($7)
ld    $4, 68($fp)
bne   $3, $4, $BB1_2
# BB#1:                                # %SP_return
add   $sp, $fp, $zero
ld    $7, 72($sp)                      # 4-byte Folded Reload
ld    $lr, 76($sp)                      # 4-byte Folded Reload
addiu $sp, $sp, 80
ret   $2
$BB1_2:                                # %CallStackCheckFailBlk
ld    $t9, %call24(__stack_chk_fail)($gp)
jalr  $t9
ld    $gp, 24($fp)
.set  macro
.set  reorder
.end   _Z10weight_sumiiiiii
$tmp10:
.size  _Z10weight_sumiiiiii, ($tmp10)-_Z10weight_sumiiiiii
.cfi_endproc
...

```

As you can see, dynamic stack allocation needs the support of the frame pointer register **fp**. As shown in Figure 8.3, on entry to the function sp is adjusted to sp - 56 as usual by the instruction **addiu \$sp, \$sp, -56** (the ch8\_4.cpu0.s listing above has a larger frame and uses -80). Next, the instruction **addu \$fp, \$sp, \$zero** sets fp to sp, which is the position just above the alloca() area. After that, sp is moved to just below the alloca() area. Note that the alloca() area that b points to, “`int *b = (int*)alloca(sizeof(int) * x1)`”, is allocated at run time, since its size depends on the variable x1 and cannot be calculated at link time.

Figure 8.4 depicts how the stack pointer changes back to the caller's stack bottom. As above, **fp** is set to just above the alloca() area. The first step is to set sp to fp with the instruction **addu \$sp, \$fp, \$zero**. Next, sp is changed back to the caller's stack bottom by the instruction **addiu \$sp, \$sp, 56**.
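The register movements described for function entry, alloca() and function exit can be traced with a toy model. This is a sketch under stated assumptions: `Regs`, `prologue`, `dynamic_alloca` and `epilogue` are hypothetical names, saving and restoring the caller's fp is left out, and the rounding to a multiple of 8 mirrors the `addiu $2, $2, 7` / `and` with -8 pair seen in ch8\_4.cpu0.s.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the two registers (hypothetical type, not backend code).
struct Regs {
  uint32_t sp;
  uint32_t fp;
};

// Entry: addiu $sp, $sp, -frame_size ; addu $fp, $sp, $zero
void prologue(Regs &r, uint32_t frame_size) {
  r.sp -= frame_size;
  r.fp = r.sp;
}

// alloca(bytes): round the size up to a multiple of 8, then move sp down.
// Returns the address of the allocated area (the new sp).
uint32_t dynamic_alloca(Regs &r, uint32_t bytes) {
  const uint32_t rounded = (bytes + 7u) & ~7u;
  r.sp -= rounded;
  return r.sp;
}

// Exit: addu $sp, $fp, $zero ; addiu $sp, $sp, frame_size
void epilogue(Regs &r, uint32_t frame_size) {
  r.sp = r.fp;
  r.sp += frame_size;
}
```

Whatever size was passed to `dynamic_alloca`, `epilogue` brings sp back to its value before `prologue`, because fp remembers where the fixed-size frame ends.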



Figure 8.3: Frame pointer changes when enter function

Using fp to keep the old stack pointer value is not strictly necessary; sp can be restored to its old value by adding the size of the alloca() space. Most ABIs, such as Mips and ARM, access the area above alloca() through fp and the area below alloca() through sp, as [Figure 8.5](#) depicts. The reason for this definition is the speed of local variable access. Since a RISC CPU uses an immediate offset for load and store, as below, using both fp and sp to access the two areas of local variables performs better than using sp only.

```
ld      $2, 64($fp)
st      $3, 4($sp)
```

Cpu0 uses fp and sp to access the areas above and below alloca() too. In ch8\_4.cpu0.s, local variables (above alloca()) are accessed through fp offsets and outgoing arguments (below alloca()) through sp offsets.
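The benefit of having both base registers can be made concrete with a little address arithmetic. The helpers below are hypothetical and only illustrate the point: a local has a constant offset from fp, an outgoing argument has a constant offset from sp, but the offset of a local from sp includes the run-time alloca() size.

```cpp
#include <cassert>
#include <cstdint>

// Local variable above the alloca() area:  ld $2, 64($fp)
constexpr uint32_t local_addr(uint32_t fp, uint32_t offset) {
  return fp + offset;
}

// Outgoing argument below the alloca() area:  st $3, 4($sp)
constexpr uint32_t outgoing_arg_addr(uint32_t sp, uint32_t offset) {
  return sp + offset;
}

// If only sp were available, the offset of the same local would include
// the alloca() size, which is not a compile-time constant.
constexpr uint32_t local_offset_from_sp(uint32_t alloca_bytes,
                                        uint32_t offset) {
  return alloca_bytes + offset;
}
```

Because `alloca_bytes` is only known at run time, an sp-only scheme could not encode the local's offset as an immediate in the load or store instruction.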

## 8.6 Summary of this chapter

We now have around 6,000 lines of source code at the end of this chapter. The cpu0 backend can now handle integer function calls and control statements, just like the example code of the llvm front end tutorial. Looking back at the chapter “Back end structure”, there were 3,100 lines of source code handling only three instructions. With about 95% more code, the backend can translate tens of instructions, global variables, control flow statements and function calls. The cpu0 backend is no longer just a toy. It can translate the C++ OOP language into cpu0 instructions without much effort, because the most complex parts of the language, such as the C++ syntax, are handled by the front end. LLVM is a real structure that follows compiler theory, and any LLVM backend benefits from this structure. A couple of thousand lines of code are enough to translate an OOP language into your backend, and your backend grows automatically as the front end supports more and more languages.



Figure 8.4: Stack pointer changes when exit function



Figure 8.5: fp and sp access areas



# ELF SUPPORT

The Cpu0 backend generates obj files in the ELF format. ELF (Executable and Linkable Format) is a common standard file format for executables, object code, shared libraries and core dumps. First published in the System V Application Binary Interface specification, and later in the Tool Interface Standard, it was quickly accepted among different vendors of Unix systems. In 1999 it was chosen as the standard binary file format for Unix and Unix-like systems on x86 by the x86open project. For details, see <sup>1</sup>.

The binary encoding of the cpu0 instruction set in the obj has been checked in the previous chapters, but we did not dig into the ELF file format itself, such as the ELF header and relocation records, at that time. This chapter uses binutils, which was installed in the sub-section “Install other tools on iMac” of Appendix A: “Installing LLVM” <sup>2</sup>, to analyze the cpu0 ELF file. By using objdump, readelf and related tools to analyze the obj generated for cpu0, you will learn these tools and understand the ELF file format itself. LLVM has the llvm-objdump tool, which is like objdump. We will make cpu0 support the llvm-objdump tool in this chapter. binutils can also dump the ELF files of other CPUs, as a cross compiler tool chain does. The Linux platform already has binutils, so there is no need to install it. We use the Linux binutils in this chapter only because the iMac version displays Chinese text. The corresponding iMac binutils work without problems, except that the command names add a g prefix (for example, gobjdump instead of objdump) and the output is displayed in your local language instead of pure English.

The binutils tools we use are not part of the llvm tools, but they are powerful for ELF analysis. This chapter introduces them to readers because we think knowledge of the popular ELF format and of the binutils ELF analysis tools is valuable. An LLVM compiler engineer is responsible for analyzing the ELF output, since the obj has to be handled by the linker or loader later. With these tools, you can verify the ELF format you generate.

The cpu0 author has published a “System Software” book which introduces the topics of assembler, linker, loader, compiler and OS in concept, and at the same time demonstrates how to use binutils and gcc to analyze ELF through the example code in the book. It is a Chinese book on “System Software” in concept and practice, and it does real analysis through binutils. The “System Software”<sup>3</sup> book written by Beck is a famous book that explains conceptually what the compiler output, the linker output and the loader output are, and how they work together, but it covers the concepts only. You can refer to it to understand how the “**Relocation Record**” works if you need to refresh or learn this knowledge for this chapter.

The Chinese documents <sup>4</sup>, <sup>5</sup> and <sup>6</sup> are available from the cpu0 author's web site.

## 9.1 ELF format

ELF is a format used for both obj and executable files, so it has two views, as shown in [Figure 9.1](#).

---

<sup>1</sup> <http://en.wikipedia.org/wiki/Executable_and_Linkable_Format>

<sup>2</sup> <http://jonathan2251.github.com/lbd/install.html#install-other-tools-on-imac>

<sup>3</sup> Leland Beck, System Software: An Introduction to Systems Programming.

<sup>4</sup> <http://ccckmit.wikidot.com/lk:aout>

<sup>5</sup> <http://ccckmit.wikidot.com/lk:objfile>

<sup>6</sup> <http://ccckmit.wikidot.com/lk:elf>



Figure 9.1: ELF file format overview

As shown in Figure 9.1, the “Section header table” describes sections such as .text, .rodata, ..., .data, which lay out code, read-only data, ..., and read/write data. The “Program header table” describes segments, which contain the run-time code and data. In short, segments are the run-time layout of code and data, while sections are the link-time layout of code and data.

## 9.2 ELF header and Section header table

Let's run Chapter7\_7/ with ch6\_1.cpp, and dump ELF header information by `readelf -h` to see what information the ELF header contains.

```
[Gamma@localhost InputFiles]$ /usr/local/llvm/test/cmake_debug_build/
bin/llc -march=cpu0 -relocation-model=pic -filetype=obj ch6_1.bc -o ch6_1.cpu0.o
```

```
[Gamma@localhost InputFiles]$ readelf -h ch6_1.cpu0.o
ELF Header:
  Magic: 7f 45 4c 46 01 02 01 08 00 00 00 00 00 00 00 00
  Class: ELF32
  Data: 2's complement, big endian
  Version: 1 (current)
  OS/ABI: UNIX - IRIX
  ABI Version: 0
  Type: REL (Relocatable file)
  Machine: <unknown>: 0xc9
  Version: 0x1
  Entry point address: 0x0
  Start of program headers: 0 (bytes into file)
  Start of section headers: 212 (bytes into file)
  Flags: 0x70000001
  Size of this header: 52 (bytes)
  Size of program headers: 0 (bytes)
  Number of program headers: 0
  Size of section headers: 40 (bytes)
  Number of section headers: 10
  Section header string table index: 7
[Gamma@localhost InputFiles]$
```

```
[Gamma@localhost InputFiles]$ /usr/local/llvm/test/cmake_debug_build/
bin/llc -march=mips -relocation-model=pic -filetype=obj ch6_1.bc -o ch6_1.mips.o
```

```
[Gamma@localhost InputFiles]$ readelf -h ch6_1.mips.o
ELF Header:
  Magic: 7f 45 4c 46 01 02 01 08 00 00 00 00 00 00 00 00
  Class: ELF32
  Data: 2's complement, big endian
  Version: 1 (current)
  OS/ABI: UNIX - IRIX
  ABI Version: 0
  Type: REL (Relocatable file)
  Machine: MIPS R3000
  Version: 0x1
  Entry point address: 0x0
  Start of program headers: 0 (bytes into file)
  Start of section headers: 212 (bytes into file)
  Flags: 0x70000001
  Size of this header: 52 (bytes)
  Size of program headers: 0 (bytes)
  Number of program headers: 0
  Size of section headers: 40 (bytes)
  Number of section headers: 11
  Section header string table index: 8
[Gamma@localhost InputFiles]$
```
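The Magic line of both dumps can also be decoded by hand, which shows where readelf gets its Class, Data, Version and OS/ABI fields. The sketch below reads the first eight identification bytes; `ElfIdent` and `decode_ident` are hypothetical names, and the byte positions follow the generic ELF specification.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Selected fields of the 16-byte ELF identification array (e_ident).
struct ElfIdent {
  bool magic_ok;      // bytes 0..3 are 0x7f 'E' 'L' 'F'
  uint8_t elf_class;  // byte 4: 1 = ELF32, 2 = ELF64
  uint8_t data;       // byte 5: 1 = little endian, 2 = big endian
  uint8_t version;    // byte 6: 1 = current
  uint8_t os_abi;     // byte 7: OS/ABI identification
};

// Hypothetical helper: picks the fields out of the raw bytes.
ElfIdent decode_ident(const std::array<uint8_t, 16> &id) {
  ElfIdent r{};
  r.magic_ok = id[0] == 0x7f && id[1] == 'E' && id[2] == 'L' && id[3] == 'F';
  r.elf_class = id[4];
  r.data = id[5];
  r.version = id[6];
  r.os_abi = id[7];
  return r;
}
```

For the bytes `7f 45 4c 46 01 02 01 08` this yields class 1 (ELF32), data 2 (big endian), version 1 and OS/ABI 8, matching the readelf lines above.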

The ELF header displayed above contains the magic number, version, ABI and so on. The Machine field of cpu0 is unknown while that of mips is MIPS R3000, because cpu0 is not a popular CPU recognized by the readelf utility. Let's check the ELF segment information as follows:

```
[Gamma@localhost InputFiles]$ readelf -l ch6_1.cpu0.o

There are no program headers in this file.
[Gamma@localhost InputFiles]$
```

This result is expected, because the cpu0 obj is for linking only, not for execution, so it has no segments. Check the ELF section information as follows; it contains the offset and size of every section.

```
[Gamma@localhost InputFiles]$ readelf -S ch6_1.cpu0.o
There are 10 section headers, starting at offset 0xd4:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00000000 000034 000034 00  AX  0   0  4
  [ 2] .rel.text         REL             00000000 000310 000018 08      8   1  4
  [ 3] .data             PROGBITS        00000000 000068 000004 00  WA  0   0  4
  [ 4] .bss              NOBITS          00000000 00006c 000000 00  WA  0   0  4
  [ 5] .eh_frame         PROGBITS        00000000 00006c 000028 00   A  0   0  4
  [ 6] .rel.eh_frame     REL             00000000 000328 000008 08      8   5  4
  [ 7] .shstrtab         STRTAB          00000000 000094 00003e 00      0   0  1
  [ 8] .symtab           SYMTAB          00000000 000264 000090 10      9   6  4
  [ 9] .strtab           STRTAB          00000000 0002f4 00001b 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
[Gamma@localhost InputFiles]$
```
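The numbers reported by `readelf -h` and `readelf -S` can be cross-checked with a few lines of arithmetic. The sketch below declares the ELF32 section header as ten 32-bit words, following the generic ELF specification, and computes where the i-th header lives in the file; `section_header_offset` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// ELF32 section header: ten 32-bit words, 40 bytes in total.
struct Elf32_Shdr {
  uint32_t sh_name, sh_type, sh_flags, sh_addr, sh_offset;
  uint32_t sh_size, sh_link, sh_info, sh_addralign, sh_entsize;
};
static_assert(sizeof(Elf32_Shdr) == 40,
              "must match 'Size of section headers: 40 (bytes)'");

// File offset of section header number `index`, given the
// "Start of section headers" value (e_shoff) from the ELF header.
constexpr uint32_t section_header_offset(uint32_t e_shoff, uint32_t index) {
  return e_shoff + index * static_cast<uint32_t>(sizeof(Elf32_Shdr));
}
```

With `e_shoff` = 212 (0xd4) and 10 headers, the table ends at offset 0x264, which appears to be exactly where the listing places `.symtab`.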

## 9.3 Relocation Record

The cpu0 backend translates a global variable as follows:

```
[Gamma@localhost InputFiles]$ clang -c ch6_1.cpp -emit-llvm -o ch6_1.bc
[Gamma@localhost InputFiles]$ /usr/local/llvm/test/cmake_debug_build/
bin/llc -march=cpu0 -relocation-model=pic -filetype=asm ch6_1.bc -o ch6_1.cpu0.s
[Gamma@localhost InputFiles]$ cat ch6_1.cpu0.s
.section .mdebug.abi32
.previous
.file "ch6_1.bc"
.text
.globl main
.align 2
.type main,@function
.ent main          # @main
main:
.cfi_startproc
.frame $sp,8,$lr
.mask 0x00000000,0
.set noreorder
.cupload $t9
...
    ld $2, %got(gI)($gp)
...
.type gI,@object          # @gI
.data
.globl gI
.align 2
gI:
    .4byte 100          # 0x64
    .size gI, 4

[Gamma@localhost InputFiles]$ /usr/local/llvm/test/cmake_debug_build/
bin/llc -march=cpu0 -relocation-model=pic -filetype=obj ch6_1.bc -o ch6_1.cpu0.o
[Gamma@localhost InputFiles]$ objdump -s ch6_1.cpu0.o

ch6_1.cpu0.o:      file format elf32-big

Contents of section .text:
// .cupload machine instruction
0000 09a00000 1eaa0010 09aa0000 13aa6000  .....
...
0020 002a0000 00220000 012d0000 09dd0008  .*....-.....
...
[Gamma@localhost InputFiles]$


[Gamma@localhost InputFiles]$ readelf -tr ch6_1.cpu0.o
There are 10 section headers, starting at offset 0xd4:

Section Headers:
[Nr] Name
  Type      Addr      Off      Size      ES      Lk Inf Al
  Flags
[ 0]
    NULL      00000000 000000 000000 00      0      0      0
    [00000000]:
[ 1] .text
    PROGBITS    00000000 000034 000034 00      0      0      4
    [00000006]: ALLOC, EXEC
[ 2] .rel.text
    REL        00000000 000310 000018 08      8      1      4
    [00000000]:
[ 3] .data
    PROGBITS    00000000 000068 000004 00      0      0      4
    [00000003]: WRITE, ALLOC
[ 4] .bss
    NOBITS    00000000 00006c 000000 00      0      0      4
    [00000003]: WRITE, ALLOC
[ 5] .eh_frame
    PROGBITS    00000000 00006c 000028 00      0      0      4
    [00000002]: ALLOC
[ 6] .rel.eh_frame
    REL        00000000 000328 000008 08      8      5      4
    [00000000]:
[ 7] .shstrtab
    STRTAB      00000000 000094 00003e 00    0    0    1
    [00000000]:
[ 8] .symtab
    SYMTAB      00000000 000264 000090 10    9    6    4
    [00000000]:
[ 9] .strtab
    STRTAB      00000000 0002f4 00001b 00    0    0    1
    [00000000]:


Relocation section '.rel.text' at offset 0x310 contains 3 entries:
  Offset      Info      Type          Sym.Value  Sym. Name
00000000  00000805 unrecognized: 5      00000000  _gp_disp
00000008  00000806 unrecognized: 6      00000000  _gp_disp
00000020  00000609 unrecognized: 9      00000000  gI

Relocation section '.rel.eh_frame' at offset 0x328 contains 1 entries:
  Offset      Info      Type          Sym.Value  Sym. Name
0000001c  00000202 unrecognized: 2      00000000  .text
[Gamma@localhost InputFiles]$ readelf -tr ch6_1.mips.o
There are 10 section headers, starting at offset 0xd0:


Section Headers:
[Nr] Name
  Type      Addr      Off      Size      ES      Lk Inf Al
  Flags
[ 0]
    NULL      00000000 000000 000000 00    0    0    0
    [00000000]:
[ 1] .text
    PROGBITS  00000000 000034 000030 00    0    0    4
    [00000006]: ALLOC, EXEC
[ 2] .rel.text
    REL      00000000 00030c 000018 08    8    1    4
    [00000000]:
[ 3] .data
    PROGBITS  00000000 000064 000004 00    0    0    4
    [00000003]: WRITE, ALLOC
[ 4] .bss
    NOBITS   00000000 000068 000000 00    0    0    4
    [00000003]: WRITE, ALLOC
[ 5] .eh_frame
    PROGBITS  00000000 000068 000028 00    0    0    4
    [00000002]: ALLOC
[ 6] .rel.eh_frame
    REL      00000000 000324 000008 08    8    5    4
    [00000000]:
[ 7] .shstrtab
    STRTAB      00000000 000090 00003e 00    0    0    1
    [00000000]:
[ 8] .symtab
    SYMTAB      00000000 000260 000090 10    9    6    4
    [00000000]:
[ 9] .strtab
    STRTAB      00000000 0002f0 00001b 00    0    0    1
    [00000000]:


Relocation section '.rel.text' at offset 0x30c contains 3 entries:
Offset      Info      Type          Sym.Value  Sym. Name
00000000  00000805 R_MIPS_HI16  00000000  _gp_disp
00000004  00000806 R_MIPS_LO16  00000000  _gp_disp
00000018  00000609 R_MIPS_GOT16 00000000  gI

Relocation section '.rel.eh_frame' at offset 0x324 contains 1 entries:
Offset      Info      Type          Sym.Value  Sym. Name
0000001c  00000202 R_MIPS_32    00000000  .text

```

As described in the section “Handle \$gp register in PIC addressing mode”, the backend translates “**.cupload \$reg**” into the following.

```

// Lower ".cupload $reg" to
// "addiu $gp, $zero, %hi(_gp_disp)"
// "shl $gp, $gp, 16"
// "addiu $gp, $gp, %lo(_gp_disp)"
// "addu $gp, $gp, $t9"

```
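To see why the four instructions above rebuild a full 32-bit value from two 16-bit halves, the sequence can be emulated on the host. This is a sketch under assumptions that the text does not state: that `addiu` sign-extends its 16-bit immediate, and that `%hi` is the upper half adjusted by 0x8000 to compensate for that sign extension (the usual MIPS convention). All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: addiu sign-extends its 16-bit immediate.
constexpr uint32_t sext16(uint32_t imm) {
  return (imm & 0x8000u) ? (imm | 0xffff0000u) : (imm & 0xffffu);
}

// Assumed %hi: upper half, adjusted so that adding the sign-extended
// lower half gives back the original value.
constexpr uint32_t hi16(uint32_t v) { return ((v + 0x8000u) >> 16) & 0xffffu; }
constexpr uint32_t lo16(uint32_t v) { return v & 0xffffu; }

// Emulates the lowered ".cupload" sequence for a given _gp_disp and $t9.
uint32_t emulate_cupload(uint32_t gp_disp, uint32_t t9) {
  uint32_t gp = sext16(hi16(gp_disp));  // addiu $gp, $zero, %hi(_gp_disp)
  gp <<= 16;                            // shl   $gp, $gp, 16
  gp += sext16(lo16(gp_disp));          // addiu $gp, $gp, %lo(_gp_disp)
  gp += t9;                             // addu  $gp, $gp, $t9
  return gp;
}
```

Under these assumptions the result is `_gp_disp + $t9` for any 32-bit displacement, including ones whose low half has bit 15 set.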

The `_gp_disp` value is determined by the loader, so it is undefined in the obj. You can find the relocation records at offsets 0 and 8 of the .text section, which refer to the `_gp_disp` value. Offsets 0 and 8 of the .text section hold the instructions “`addiu $gp, $zero, %hi(_gp_disp)`” and “`addiu $gp, $gp, %lo(_gp_disp)`”, and their obj encodings are 09a00000 and 09aa0000. The obj encodes `%hi(_gp_disp)` and `%lo(_gp_disp)` as 0 because the `_gp_disp` value becomes known only when the loader loads this obj into memory; at that time the loader patches the two locations named by these relocation records with the correct values. Even though cpu0 is not a CPU recognized by the readelf utility, you can check that the cpu0 relocation records for `%hi(_gp_disp)` and `%lo(_gp_disp)` are correct by comparing them with the mips relocation records `R_MIPS_HI16` and `R_MIPS_LO16` for `_gp_disp` above. The instruction “**ld \$2, %got(gI)(\$gp)**” is handled the same way: since we do not know the address at which the .data section variable will be loaded, the address is encoded as 0 and a relocation record is made for offset 0x00000020 of the .text section. The loader will patch this address too.
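The Info column of the relocation listings packs two values, which is how the cpu0 records can be matched against the mips ones even when readelf prints “unrecognized”. In ELF32 the upper 24 bits are the symbol table index and the lower 8 bits are the relocation type; the helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// ELF32 r_info: symbol table index in the upper 24 bits,
// relocation type in the lower 8 bits.
constexpr uint32_t rel_sym(uint32_t r_info) { return r_info >> 8; }
constexpr uint32_t rel_type(uint32_t r_info) { return r_info & 0xffu; }

// Each Elf32_Rel entry is two 32-bit words (r_offset, r_info), so a
// .rel section of `section_size` bytes holds section_size / 8 entries.
constexpr uint32_t rel_entry_count(uint32_t section_size) {
  return section_size / 8u;
}
```

For example, Info 00000805 decodes to symbol 8 with type 5 and Info 00000609 to symbol 6 with type 9; the mips dump names those types `R_MIPS_HI16` and `R_MIPS_GOT16`. The `.rel.text` size of 0x18 bytes corresponds to the 3 entries listed.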

Running with ch8\_3\_3.cpp leaves `_Z5sum_iiz` and the other symbol references unresolved, as shown below. The loader or linker will take care of them according to the relocation records the compiler generated.

```
[Gamma@localhost InputFiles]$ /usr/local/llvm/test/cmake_debug_build/
bin/llc -march=cpu0 -relocation-model=pic -filetype=obj ch8_3_3.bc -o ch8_3_3.
cpu0.o
[Gamma@localhost InputFiles]$ readelf -tr ch8_3_3.cpu0.o
There are 11 section headers, starting at offset 0x248:

Section Headers:
[Nr] Name
    Type            Addr     Off    Size   ES   Lk Inf Al
    Flags
[ 0]
    NULL            00000000 000000 000000 00    0   0  0
    [00000000]:
[ 1] .text
    PROGBITS        00000000 000034 000178 00    0   0  4
    [00000006]: ALLOC, EXEC
[ 2] .rel.text
    REL             00000000 000538 000058 08    9   1  4
    [00000000]:
[ 3] .data
    PROGBITS        00000000 0001ac 000000 00    0   0  4
    [00000003]: WRITE, ALLOC
[ 4] .bss
    NOBITS          00000000 0001ac 000000 00    0   0  4
    [00000003]: WRITE, ALLOC
[ 5] .rodata.str1.1
    PROGBITS        00000000 0001ac 000008 01    0   0  1
    [00000032]: ALLOC, MERGE, STRINGS
[ 6] .eh_frame
    PROGBITS        00000000 0001b4 000044 00    0   0  4
    [00000002]: ALLOC
[ 7] .rel.eh_frame
    REL             00000000 000590 000010 08    9   6  4
    [00000000]:
[ 8] .shstrtab
    STRTAB          00000000 0001f8 00004d 00    0   0  1
    [00000000]:
[ 9] .symtab
    SYMTAB          00000000 000400 0000e0 10   10   8  4
    [00000000]:
[10] .strtab
    STRTAB          00000000 0004e0 000055 00    0   0  1
    [00000000]:

Relocation section '.rel.text' at offset 0x538 contains 11 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000000  00000c05 unrecognized: 5   00000000  _gp_disp
00000008  00000c06 unrecognized: 6   00000000  _gp_disp
0000001c  00000b09 unrecognized: 9   00000000  __stack_chk_guard
000000b8  00000b09 unrecognized: 9   00000000  __stack_chk_guard
000000dc  00000a0b unrecognized: b   00000000  __stack_chk_fail
000000e8  00000c05 unrecognized: 5   00000000  _gp_disp
000000f0  00000c06 unrecognized: 6   00000000  _gp_disp
00000140  0000080b unrecognized: b   00000000  _Z5sum_iiz
00000154  00000209 unrecognized: 9   00000000  $.str
00000158  00000206 unrecognized: 6   00000000  $.str
00000160  00000d0b unrecognized: b   00000000  printf

Relocation section '.rel.eh_frame' at offset 0x590 contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0000001c  00000302 unrecognized: 2   00000000  .text
00000034  00000302 unrecognized: 2   00000000  .text
```

```
[Gamma@localhost InputFiles]$ /usr/local/llvm/test/cmake_debug_build/bin/llc -march=mips -relocation-model=pic -filetype=obj ch8_3_3.bc -o ch8_3_3.mips.o
[Gamma@localhost InputFiles]$ readelf -tr ch8_3_3.mips.o
There are 11 section headers, starting at offset 0x254:

Section Headers:
[Nr] Name
Type           Addr     Off    Size   ES   Lk   Inf  Al
Flags
[ 0]
NULL           00000000 000000 000000 00    0    0    0
[00000000]:
[ 1] .text
PROGBITS       00000000 000034 000184 00    0    0    4
[00000006]: ALLOC, EXEC
[ 2] .rel.text
REL            00000000 000544 000058 08    9    1    4
[00000000]:
[ 3] .data
PROGBITS       00000000 0001b8 000000 00    0    0    4
[00000003]: WRITE, ALLOC
[ 4] .bss
NOBITS         00000000 0001b8 000000 00    0    0    4
[00000003]: WRITE, ALLOC
[ 5] .rodata.str1.1
PROGBITS       00000000 0001b8 000008 01    0    0    1
[00000032]: ALLOC, MERGE, STRINGS
[ 6] .eh_frame
PROGBITS       00000000 0001c0 000044 00    0    0    4
[00000002]: ALLOC
[ 7] .rel.eh_frame
REL            00000000 00059c 000010 08    9    6    4
[00000000]:
[ 8] .shstrtab
STRTAB         00000000 000204 00004d 00    0    0    1
[00000000]:
[ 9] .symtab
SYMTAB         00000000 00040c 0000e0 10   10    8    4
[00000000]:
[10] .strtab
STRTAB         00000000 0004ec 000055 00    0    0    1
[00000000]:


Relocation section '.rel.text' at offset 0x544 contains 11 entries:
Offset  Info  Type      Sym.Value  Sym. Name
00000000 00000c05 R_MIPS_HI16    00000000  __gp_disp
00000004 00000c06 R_MIPS_LO16    00000000  __gp_disp
00000024 00000b09 R_MIPS_GOT16   00000000  __stack_chk_guard
000000c8 00000b09 R_MIPS_GOT16   00000000  __stack_chk_guard
000000f0 00000a0b R_MIPS_CALL16  00000000  __stack_chk_fail
00000100 00000c05 R_MIPS_HI16    00000000  __gp_disp
00000104 00000c06 R_MIPS_LO16    00000000  __gp_disp
00000134 0000080b R_MIPS_CALL16  00000000  __Z5sum_iiz
00000154 00000209 R_MIPS_GOT16   00000000  $.str
00000158 00000206 R_MIPS_LO16    00000000  $.str
0000015c 00000d0b R_MIPS_CALL16  00000000  printf


Relocation section '.rel.eh_frame' at offset 0x59c contains 2 entries:
Offset  Info  Type      Sym.Value  Sym. Name
0000001c 00000302 R_MIPS_32     00000000  .text
00000034 00000302 R_MIPS_32     00000000  .text
[Gamma@localhost InputFiles]$
```
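The Info column in the relocation listings above packs two fields into one 32-bit word. The following is a minimal sketch of how that word splits, assuming the standard 32-bit ELF layout (symbol index in the upper 24 bits, relocation type in the low 8 bits). The helper names `relocSymIndex` and `relocType` are hypothetical and are not part of LLVM or readelf.

```cpp
#include <cstdint>

// Split a 32-bit ELF r_info word into its two fields, following the
// ELF32_R_SYM / ELF32_R_TYPE convention of the ELF specification.
// Helper names are illustrative only.
constexpr uint32_t relocSymIndex(uint32_t info) { return info >> 8; }
constexpr uint32_t relocType(uint32_t info) { return info & 0xffu; }

// Info value 0x00000c05 from the first .rel.text entry of the Mips listing:
// symbol table index 0xc, type 5 (printed there as R_MIPS_HI16).
static_assert(relocSymIndex(0x00000c05u) == 0xc, "symbol index");
static_assert(relocType(0x00000c05u) == 5, "relocation type");
```

Read this way, a type that readelf cannot name is still visible in the low byte of Info, which is why the Cpu0 listing prints only a bare type number next to "unrecognized:".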

## 9.4 Cpu0 ELF related files

The files Cpu0ELFObjectWrite.cpp and Cpu0MC\*.cpp take care of the obj format. Most of the translation to obj code is defined by Cpu0InstrInfo.td and Cpu0RegisterInfo.td. From these td descriptions, LLVM translates the instructions into obj format automatically.

## 9.5 lld

lld is the LLVM linker project. It is still under development, and we could not complete the installation by following the directions on its web site. Even so, it makes sense to develop a new linker based on the information on the lld web site. Please visit the web site <sup>7</sup>.

<sup>7</sup> <http://lld.llvm.org/>

## 9.6 llvm-objdump

### 9.6.1 llvm-objdump -t -r

On an iMac, gobjdump -tr displays relocation records in the same way that readelf -tr does. The LLVM tool llvm-objdump plays the same role as objdump. Run the gobjdump and llvm-objdump commands as follows to see the differences.

```
118-165-83-12:InputFiles Jonathan$ clang -c ch8_3_3.cpp -emit-llvm -I/
Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/
SDKs/MacOSX10.8.sdk/usr/include/ -o ch8_3_3.bc
118-165-83-10:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=obj ch8_3_3.bc -o
ch8_3_3.cpu0.o

118-165-78-12:InputFiles Jonathan$ gobjdump -t -r ch8_3_3.cpu0.o

ch8_3_3.cpu0.o: file format elf32-big

SYMBOL TABLE:
00000000 l df *ABS*          00000000 ch8_3_3.bc
00000000 l o  .rodata.str1.1 00000008 $.str
00000000 l d  .text          00000000 .text
00000000 l d  .data          00000000 .data
00000000 l d  .bss           00000000 .bss
00000000 l d  .rodata.str1.1 00000000 .rodata.str1.1
00000000 l d  .eh_frame      00000000 .eh_frame
00000000 g F  .text          000000d4 _Z5sum_iiz
000000d4 g F  .text          00000074 main
00000000      *UND*          00000000 __stack_chk_fail
00000000      *UND*          00000000 __stack_chk_guard
00000000      *UND*          00000000 printf

RELOCATION RECORDS FOR [.text]:
OFFSET   TYPE    VALUE
00000008 UNKNOWN __stack_chk_guard
00000010 UNKNOWN __stack_chk_guard
000000d0 UNKNOWN __stack_chk_fail
00000118 UNKNOWN _Z5sum_iiz
00000124 UNKNOWN $.str
0000012c UNKNOWN $.str
00000134 UNKNOWN printf

RELOCATION RECORDS FOR [.eh_frame]:
OFFSET   TYPE    VALUE
0000001c UNKNOWN .text
00000034 UNKNOWN .text
```

```
118-165-83-10:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llvm-objdump -t -r ch8_3_3.cpu0.o

ch8_3_3.cpu0.o: file format ELF32-CPU0

RELOCATION RECORDS FOR [.text]:
0 R_CPU0_HI16 _gp_disp
8 R_CPU0_LO16 _gp_disp
28 R_CPU0_GOT16 __stack_chk_guard
188 R_CPU0_GOT16 __stack_chk_guard
224 R_CPU0_CALL24 __stack_chk_fail
236 R_CPU0_HI16 _gp_disp
244 R_CPU0_LO16 _gp_disp
324 R_CPU0_CALL24 _Z5sum_iiz
344 R_CPU0_GOT16 $.str
348 R_CPU0_LO16 $.str
356 R_CPU0_CALL24 printf

RELOCATION RECORDS FOR [.eh_frame]:
28 R_CPU0_32 .text
52 R_CPU0_32 .text

SYMBOL TABLE:
00000000 l df *ABS* 00000000 ch8_3_3.bc
00000000 l .rodata.str1.1 00000008 $.str
00000000 l d .text 00000000 .text
00000000 l d .data 00000000 .data
00000000 l d .bss 00000000 .bss
00000000 l d .rodata.str1.1 00000000 .rodata.str1.1
00000000 l d .eh_frame 00000000 .eh_frame
00000000 g F .text 000000ec _Z5sum_iiz
000000ec g F .text 00000094 main
00000000 *UND* 00000000 __stack_chk_fail
00000000 *UND* 00000000 __stack_chk_guard
00000000 *UND* 00000000 _gp_disp
00000000 *UND* 00000000 printf
```

The llvm-objdump output can display the file format and the relocation record names because we add the Cpu0 relocation information to ELF.h as follows,

### include/support/ELF.h

```
// Machine architectures
enum {
  ...
  EM_CPU0 = 201, // Document Write An LLVM Backend Tutorial For Cpu0
  ...
};

// include/object/ELF.h
...
template<support::endianness target_endianness, bool is64Bits>
error_code ELFObjectFile<target_endianness, is64Bits>
    ::getRelocationTypeName(DataRefImpl Rel,
                            SmallVectorImpl<char> &Result) const {
  ...
  switch (Header->e_machine) {
  case ELF::EM_CPU0: // llvm-objdump -t -r
    switch (type) {
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_NONE);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_16);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_32);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_REL32);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_24);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_HI16);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_LO16);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_GPREL16);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_LITERAL);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_GOT16);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_PC24);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_CALL24);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_GPREL32);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_SHIFT5);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_SHIFT6);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_64);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_GOT_DISP);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_GOT_PAGE);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_GOT_OFST);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_GOT_HI16);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_GOT_LO16);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_SUB);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_INSERT_A);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_INSERT_B);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_DELETE);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_HIGHER);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_HIGHEST);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_CALL_HI16);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_CALL_LO16);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_SCN_DISP);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_REL16);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_ADD_IMMEDIATE);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_PJUMP);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_RELGOT);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_JALR);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_TLS_DTPMOD32);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_TLS_DTPREL32);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_TLS_DTPMOD64);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_TLS_DTPREL64);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_TLS_GD);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_TLS_LDM);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_TLS_DTPREL_HI16);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_TLS_DTPREL_LO16);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_TLS_GOTTPREL);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_TLS_TPREL32);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_TLS_TPREL64);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_TLS_TPREL_HI16);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_TLS_TPREL_LO16);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_GLOB_DAT);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_COPY);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_JUMP_SLOT);
      LLVM_ELF_SWITCH_RELOC_TYPE_NAME(R_CPU0_NUM);
    default:
      res = "Unknown";
    }
    break;
  ...
  }
  ...
}

template<support::endianness target_endianness, bool is64Bits>
error_code ELFObjectFile<target_endianness, is64Bits>
    ::getRelocationValueString(DataRefImpl Rel,
                               SmallVectorImpl<char> &Result) const {
  ...
  case ELF::EM_CPU0: // llvm-objdump -t -r
    res = symname;
    break;
  ...
}

template<support::endianness target_endianness, bool is64Bits>
StringRef ELFObjectFile<target_endianness, is64Bits>
    ::getFileName() const {
  switch(Header->e_ident[ELF::EI_CLASS]) {
  case ELF::ELFCLASS32:
    switch(Header->e_machine) {
    ...
    case ELF::EM_CPU0: // llvm-objdump -t -r
      return "ELF32-CPU0";
    ...
    }
  ...
  }
}

template<support::endianness target_endianness, bool is64Bits>
unsigned ELFObjectFile<target_endianness, is64Bits>::getArch() const {
  switch(Header->e_machine) {
  ...
  case ELF::EM_CPU0: // llvm-objdump -t -r
    return (target_endianness == support::little) ?
           Triple::cpu0el : Triple::cpu0;
  ...
  }
}
```
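The long run of macro invocations in `getRelocationTypeName()` is a common preprocessor idiom: each line becomes one `case` label that records the enumerator's own spelling as a string. The following is a small stand-alone sketch of that idiom, not the LLVM definition. The macro `DEMO_SWITCH_RELOC_TYPE_NAME`, the enum `DemoReloc` and its numeric values are hypothetical placeholders chosen for illustration.

```cpp
#include <string>

// Hypothetical stand-ins for relocation enumerators. The numeric values are
// placeholders and are NOT the values used by the tutorial's ELF.h.
enum DemoReloc { R_DEMO_NONE = 0, R_DEMO_32 = 1, R_DEMO_HI16 = 2 };

// Each use expands to one `case` label that stores the enumerator's spelling
// (through the preprocessor # operator) and then leaves the switch.
#define DEMO_SWITCH_RELOC_TYPE_NAME(name) \
  case name:                              \
    res = #name;                          \
    break;

// Map a relocation type number to its printable name.
std::string demoRelocName(unsigned type) {
  std::string res;
  switch (type) {
    DEMO_SWITCH_RELOC_TYPE_NAME(R_DEMO_NONE)
    DEMO_SWITCH_RELOC_TYPE_NAME(R_DEMO_32)
    DEMO_SWITCH_RELOC_TYPE_NAME(R_DEMO_HI16)
  default:
    res = "Unknown";
  }
  return res;
}
```

With this pattern, supporting a new relocation name costs one line per enumerator, and a type number that has no entry falls through to "Unknown".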

### 9.6.2 llvm-objdump -d

Run Chapter8\_9/ with the command `llvm-objdump -d` to dump the ELF file as hex and disassembly, as follows,

```

JonathanTekiiMac:InputFiles Jonathan$ clang -c ch7_1_1.cpp -emit-llvm -o
ch7_1_1.bc
JonathanTekiiMac:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_
build/bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=obj ch7_1_1.bc
-o ch7_1_1.cpu0.o
JonathanTekiiMac:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_
build/bin/Debug/llvm-objdump -d ch7_1_1.cpu0.o

ch7_1_1.cpu0.o: file format ELF32-unknown

Disassembly of section .text:error: no disassembler for target cpu0-unknown-
unknown

```

To support `llvm-objdump -d`, the following code is added in Chapter9\_1/.

### LLVMBackendTutorialExampleCode/Chapter9\_1/CMakeLists.txt

```

tablegen(LLVM Cpu0GenDisassemblerTables.inc -gen-disassembler)
...

```

### LLVMBackendTutorialExampleCode/Chapter9\_1/LLVMBuild.txt

```
[common]
subdirectories = Disassembler ...
...
has_disassembler = 1
...
```

### LLVMBackendTutorialExampleCode/Chapter9\_1/Cpu0InstrInfo.td

```
class CmpInstr<bits<8> op, string instr_asm,
    InstrItinClass itin, RegisterClass RC, RegisterClass RD,
    bit isComm = 0>:
  FA<op, (outs RD:$rc), (ins RC:$ra, RC:$rb),
  !strconcat(instr_asm, "\t$ra, $rb"), [], itin> {
  ...
  let DecoderMethod = "DecodeCMPInstruction";
}

class CBranch<bits<8> op, string instr_asm, RegisterClass RC,
  list<Register> UseRegs>:
  FJ<op, (outs), (ins RC:$ra, brtarget:$addr),
  !strconcat(instr_asm, "\t$addr"),
  [(brcond RC:$ra, bb:$addr)], IIBranch> {
  ...
  let DecoderMethod = "DecodeBranchTarget";
}

let isBranch=1, isTerminator=1, isBarrier=1, imm16=0, hasDelaySlot = 1,
  isIndirectBranch = 1 in
class JumpFR<bits<8> op, string instr_asm, RegisterClass RC>:
  FL<op, (outs), (ins RC:$ra),
  !strconcat(instr_asm, "\t$ra"), [(brind RC:$ra)], IIBranch> {
  let rb = 0;
  let imm16 = 0;
}

let isCall=1, hasDelaySlot=0 in {
  class JumpLink<bits<8> op, string instr_asm>:
    FJ<op, (outs), (ins calltarget:$target, variable_ops),
    !strconcat(instr_asm, "\t$target"), [(Cpu0JmpLink imm:$target)],
    IIBranch> {
    let DecoderMethod = "DecodeJumpAbsoluteTarget";
  }
}

def JR      : JumpFR<0x2C, "ret", CPURegs>;
```

### LLVMBackendTutorialExampleCode/Chapter9\_1/Disassembler/CMakeLists.txt

```
include_directories( ${CMAKE_CURRENT_BINARY_DIR}/.. ${CMAKE_CURRENT_SOURCE_DIR}/.. )

add_llvm_library(LLVMCpu0Disassembler
  Cpu0Disassembler.cpp
)

# workaround for hanging compilation on MSVC9 and 10
if( MSVC_VERSION EQUAL 1400 OR MSVC_VERSION EQUAL 1500 OR MSVC_VERSION EQUAL 1600 )
set_property(
  SOURCE Cpu0Disassembler.cpp
  PROPERTY COMPILE_FLAGS "/Od"
)
endif()

add_dependencies(LLVMCpu0Disassembler Cpu0CommonTableGen)
```

### LLVMBackendTutorialExampleCode/Chapter9\_1/Disassembler/LLVMBuild.txt

```
;===- ./lib/Target/Cpu0/Disassembler/LLVMBuild.txt ------------*- Conf -*--===;
;
;                     The LLVM Compiler Infrastructure
;
; This file is distributed under the University of Illinois Open Source
; License. See LICENSE.TXT for details.
;
;===------------------------------------------------------------------------===;
;
; This is an LLVMBuild description file for the components in this subdirectory.
;
; For more information on the LLVMBuild system, please see:
;
;   http://llvm.org/docs/LLVMBuild.html
;
;===------------------------------------------------------------------------===;

[component_0]
type = Library
name = Cpu0Disassembler
parent = Cpu0
required_libraries = MC Support Cpu0Info
add_to_library_groups = Cpu0
```

### LLVMBackendTutorialExampleCode/Chapter9\_1/Disassembler/Cpu0Disassembler.cpp

```
//===- Cpu0Disassembler.cpp - Disassembler for Cpu0 -------------*- C++ -*-===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file is part of the Cpu0 Disassembler.
//
//===----------------------------------------------------------------------===//

#include "Cpu0.h"
#include "Cpu0Subtarget.h"
#include "Cpu0RegisterInfo.h"
#include "llvm/MC/MCDisassembler.h"
#include "llvm/MC/MCFixedLenDisassembler.h"
#include "llvm/Support/MemoryObject.h"
#include "llvm/Support/TargetRegistry.h"
#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/MC/MCInst.h"
#include "llvm/Support/MathExtras.h"

using namespace llvm;

typedef MCDisassembler::DecodeStatus DecodeStatus;

/// Cpu0Disassembler - a disassembler class for Cpu032.
class Cpu0Disassembler : public MCDisassembler {
public:
  /// Constructor - Initializes the disassembler.
  ///
  Cpu0Disassembler(const MCSubtargetInfo &STI, bool bigEndian) :
    MCDisassembler(STI), isBigEndian(bigEndian) {
  }

  ~Cpu0Disassembler() {
  }

  /// getInstruction - See MCDisassembler.
  DecodeStatus getInstruction(MCInst &instr,
                              uint64_t &size,
                              const MemoryObject &region,
                              uint64_t address,
                              raw_ostream &vStream,
                              raw_ostream &cStream) const;

private:
  bool isBigEndian;
};

// Decoder table for Cpu0 registers, indexed by the 4-bit register number.
static const unsigned CPURegsTable[] = {
  Cpu0::ZERO, Cpu0::AT, Cpu0::V0, Cpu0::V1,
  Cpu0::A0, Cpu0::A1, Cpu0::T9, Cpu0::S0,
  Cpu0::S1, Cpu0::S2, Cpu0::GP, Cpu0::FP,
  Cpu0::SW, Cpu0::SP, Cpu0::LR, Cpu0::PC
};

static DecodeStatus DecodeCPURegsRegisterClass(MCInst &Inst,
                                               unsigned RegNo,
                                               uint64_t Address,
                                               const void *Decoder);
static DecodeStatus DecodeCMPInstruction(MCInst &Inst,
                                         unsigned Insn,
                                         uint64_t Address,
                                         const void *Decoder);
static DecodeStatus DecodeBranchTarget(MCInst &Inst,
                                       unsigned Insn,
                                       uint64_t Address,
                                       const void *Decoder);
static DecodeStatus DecodeJumpRelativeTarget(MCInst &Inst,
                                             unsigned Insn,
                                             uint64_t Address,
                                             const void *Decoder);
static DecodeStatus DecodeJumpAbsoluteTarget(MCInst &Inst,
                                             unsigned Insn,
                                             uint64_t Address,
                                             const void *Decoder);

static DecodeStatus DecodeMem(MCInst &Inst,
                              unsigned Insn,
                              uint64_t Address,
                              const void *Decoder);
static DecodeStatus DecodeSimm16(MCInst &Inst,
                                 unsigned Insn,
                                 uint64_t Address,
                                 const void *Decoder);

namespace llvm {
extern Target TheCpu0elTarget, TheCpu0Target, TheCpu064Target,
              TheCpu064elTarget;
}

static MCDisassembler *createCpu0Disassembler(
                       const Target &T,
                       const MCSubtargetInfo &STI) {
  return new Cpu0Disassembler(STI, true);
}

static MCDisassembler *createCpu0elDisassembler(
                       const Target &T,
                       const MCSubtargetInfo &STI) {
  return new Cpu0Disassembler(STI, false);
}

extern "C" void LLVMInitializeCpu0Disassembler() {
  // Register the disassembler.
  TargetRegistry::RegisterMCDisassembler(TheCpu0Target,
                                         createCpu0Disassembler);
  TargetRegistry::RegisterMCDisassembler(TheCpu0elTarget,
                                         createCpu0elDisassembler);
}


#include "Cpu0GenDisassemblerTables.inc"

/// readInstruction32 - read four bytes from the MemoryObject
/// and return a 32-bit word ordered according to the given endianness
static DecodeStatus readInstruction32(const MemoryObject &region,
                                      uint64_t address,
                                      uint64_t &size,
                                      uint32_t &insn,
                                      bool isBigEndian) {
  uint8_t Bytes[4];

  // We want to read exactly 4 Bytes of data.
  if (region.readBytes(address, 4, (uint8_t*)Bytes, NULL) == -1) {
    size = 0;
    return MCDisassembler::Fail;
  }

  if (isBigEndian) {
    // Encoded as a big-endian 32-bit word in the stream.
    insn = (Bytes[3] << 0) |
           (Bytes[2] << 8) |
           (Bytes[1] << 16) |
           (Bytes[0] << 24);
  }
  else {
    // Encoded as a little-endian 32-bit word in the stream.
    insn = (Bytes[0] << 0) |
           (Bytes[1] << 8) |
           (Bytes[2] << 16) |
           (Bytes[3] << 24);
  }

  return MCDisassembler::Success;
}

DecodeStatus
Cpu0Disassembler::getInstruction(MCInst &instr,
                                 uint64_t &Size,
                                 const MemoryObject &Region,
                                 uint64_t Address,
                                 raw_ostream &vStream,
                                 raw_ostream &cStream) const {
  uint32_t Insn;

  DecodeStatus Result = readInstruction32(Region, Address, Size,
                                          Insn, isBigEndian);
  if (Result == MCDisassembler::Fail)
    return MCDisassembler::Fail;

  // Calling the auto-generated decoder function.
  Result = decodeInstruction(DecoderTableCpu032, instr, Insn, Address,
                             this, STI);
  if (Result != MCDisassembler::Fail) {
    Size = 4;
    return Result;
  }

  return MCDisassembler::Fail;
}

static DecodeStatus DecodeCPURegsRegisterClass(MCInst &Inst,
                                               unsigned RegNo,
                                               uint64_t Address,
                                               const void *Decoder) {
  // CPURegsTable has 16 entries, so the valid indices are 0..15.
  if (RegNo > 15)
    return MCDisassembler::Fail;

  Inst.addOperand(MCOperand::CreateReg(CPURegsTable[RegNo]));
  return MCDisassembler::Success;
}

static DecodeStatus DecodeMem(MCInst &Inst,
                              unsigned Insn,
                              uint64_t Address,
                              const void *Decoder) {
  int Offset = SignExtend32<16>(Insn & 0xffff);
  int Reg = (int)fieldFromInstruction(Insn, 20, 4);
  int Base = (int)fieldFromInstruction(Insn, 16, 4);

  Inst.addOperand(MCOperand::CreateReg(CPURegsTable[Reg]));
  Inst.addOperand(MCOperand::CreateReg(CPURegsTable[Base]));
  Inst.addOperand(MCOperand::CreateImm(Offset));

  return MCDisassembler::Success;
}

/* CMP instruction define $rc and then $ra, $rb; The printOperand() print
operand 1 and operand 2 (operand 0 is $rc and operand 1 is $ra), so we Create
register $rc first and create $ra next, as follows,

// Cpu0InstrInfo.td
class CmpInstr<bits<8> op, string instr_asm,
    InstrItinClass itin, RegisterClass RC, RegisterClass RD, bit isComm = 0>:
  FA<op, (outs RD:$rc), (ins RC:$ra, RC:$rb),
  !strconcat(instr_asm, "\t$ra, $rb"), [], itin> {

// Cpu0AsmWriter.inc
void Cpu0InstPrinter::printInstruction(const MCInst *MI, raw_ostream &O) {
...
  case 3:
    // CMP, JEQ, JGE, JGT, JLE, JLT, JNE
    printOperand(MI, 1, O);
    break;
...
  case 1:
    // CMP
    printOperand(MI, 2, O);
    return;
    break;
*/
static DecodeStatus DecodeCMPInstruction(MCInst &Inst,
                                         unsigned Insn,
                                         uint64_t Address,
                                         const void *Decoder) {
  int Reg_a = (int)fieldFromInstruction(Insn, 20, 4);
  int Reg_b = (int)fieldFromInstruction(Insn, 16, 4);
  int Reg_c = (int)fieldFromInstruction(Insn, 12, 4);

  Inst.addOperand(MCOperand::CreateReg(CPURegsTable[Reg_c]));
  Inst.addOperand(MCOperand::CreateReg(CPURegsTable[Reg_a]));
  Inst.addOperand(MCOperand::CreateReg(CPURegsTable[Reg_b]));
  return MCDisassembler::Success;
}

/* CBranch instruction define $ra and then imm24; The printOperand() print
operand 1 (operand 0 is $ra and operand 1 is imm24), so we Create register
operand first and create imm24 next, as follows,

// Cpu0InstrInfo.td
class CBranch<bits<8> op, string instr_asm, RegisterClass RC,
    list<Register> UseRegs>:
  FJ<op, (outs), (ins RC:$ra, brtarget:$addr),
  !strconcat(instr_asm, "\t$addr"),
  [(brcond RC:$ra, bb:$addr)], IIBranch> {

// Cpu0AsmWriter.inc
void Cpu0InstPrinter::printInstruction(const MCInst *MI, raw_ostream &O) {
...
  case 3:
    // CMP, JEQ, JGE, JGT, JLE, JLT, JNE
    printOperand(MI, 1, O);
    break;
*/
static DecodeStatus DecodeBranchTarget(MCInst &Inst,
                                       unsigned Insn,
                                       uint64_t Address,
                                       const void *Decoder) {
  int BranchOffset = fieldFromInstruction(Insn, 0, 24);
  // Bit 23 is the sign bit of the 24-bit field: values above 0x7fffff
  // are negative offsets.
  if (BranchOffset > 0x7fffff)
    BranchOffset = -1*(0x1000000 - BranchOffset);
  Inst.addOperand(MCOperand::CreateReg(CPURegsTable[0]));
  Inst.addOperand(MCOperand::CreateImm(BranchOffset));
  return MCDisassembler::Success;
}

static DecodeStatus DecodeJumpRelativeTarget(MCInst &Inst,
                                             unsigned Insn,
                                             uint64_t Address,
                                             const void *Decoder) {

  int JumpOffset = fieldFromInstruction(Insn, 0, 24);
  // Same 24-bit sign extension as in DecodeBranchTarget().
  if (JumpOffset > 0x7fffff)
    JumpOffset = -1*(0x1000000 - JumpOffset);
  Inst.addOperand(MCOperand::CreateImm(JumpOffset));
  return MCDisassembler::Success;
}

static DecodeStatus DecodeJumpAbsoluteTarget(MCInst &Inst,
                                             unsigned Insn,
                                             uint64_t Address,
                                             const void *Decoder) {

  unsigned JumpOffset = fieldFromInstruction(Insn, 0, 24);
  Inst.addOperand(MCOperand::CreateImm(JumpOffset));
  return MCDisassembler::Success;
}

static DecodeStatus DecodeSimm16(MCInst &Inst,
                                 unsigned Insn,
                                 uint64_t Address,
                                 const void *Decoder) {
  Inst.addOperand(MCOperand::CreateImm(SignExtend32<16>(Insn)));
  return MCDisassembler::Success;
}
```

As the code above shows, a Disassembler directory is added to handle the reverse translation from obj to assembly code. We add Disassembler/Cpu0Disassembler.cpp and modify CMakeLists.txt and LLVMBuild.txt to build the Disassembler directory and to enable the generated disassembler tables with “has\_disassembler = 1”. Most of the decoding is handled by the tables generated from the \*.td files. Not every instruction in the \*.td files can be disassembled without trouble, even if it can be translated into assembly and obj successfully. For those that cannot be disassembled, LLVM supplies the “**let DecoderMethod**” keyword so that programmers can implement their own decode function. In the Cpu0 example, we define the functions DecodeCMPInstruction(), DecodeBranchTarget() and DecodeJumpAbsoluteTarget() in Cpu0Disassembler.cpp and tell the LLVM table-driven system about them by writing “**let DecoderMethod = ...**” in the corresponding instruction definitions or ISD nodes of Cpu0InstrInfo.td. LLVM calls these decoder methods when a tool such as `llvm-objdump -d` performs disassembly. The comments above these functions explain how they work.

For the CMP instruction, `CmpInstr<...>` has three operands, \$rc, \$ra and \$rb, but the assembly pattern prints only \$ra and \$rb. The table-generated function `printInstruction()` therefore prints operands 1 and 2 (\$ra and \$rb) and does not print operand 0 (\$rc). In the CMP decode function, we do not decode the shamt field because we do not want it to be displayed and it is not in the assembly print pattern of Cpu0InstrInfo.td.
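The branch and jump decoders treat the low 24 bits of the instruction word as a signed offset. The sketch below shows that step in isolation. It is a stand-alone illustration, and the helper name `signExtend24` is hypothetical rather than taken from LLVM or the Cpu0 sources. In a 24-bit two's-complement field, bit 23 is the sign bit, so raw values from 0x800000 up to 0xffffff represent negative offsets.

```cpp
#include <cstdint>

// Sign-extend the low 24 bits of `field` to a signed 32-bit value.
// Raw values 0x000000..0x7fffff are non-negative; 0x800000..0xffffff
// map to -8388608..-1. (Hypothetical helper for illustration.)
constexpr int32_t signExtend24(uint32_t field) {
  return (field & 0x800000u)
             ? static_cast<int32_t>(field & 0xffffffu) - 0x1000000
             : static_cast<int32_t>(field & 0xffffffu);
}

static_assert(signExtend24(0x00000cu) == 12, "small forward offset");
static_assert(signExtend24(0xffffffu) == -1, "all ones is -1");
static_assert(signExtend24(0x800000u) == -8388608, "most negative offset");
```

Subtracting 0x1000000 from the raw value is the same arithmetic as the `-1*(0x1000000 - Offset)` form used in the decoder functions.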

RET (Cpu0ISD::Ret) and JR (ISD::BRIND) both correspond to the “ret” instruction. The former is used to encode the instruction in assembly and obj, while the latter is used to decode it in the disassembler. The IR node Cpu0ISD::Ret is created in `LowerReturn()`, which is called at the function exit point.

Now, running Chapter9\_1/ with the command `llvm-objdump -d ch7_1_1.cpu0.o` gives the following result.

```
JonathantekiiMac:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_
build/bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=obj
ch7_1_1.bc -o ch7_1_1.cpu0.o
JonathantekiiMac:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_
build/bin/Debug/llvm-objdump -d ch7_1_1.cpu0.o
```

```
ch7_1_1.cpu0.o:          file format ELF32-CPU0

Disassembly of section .text:
main:
   0: 09 dd ff d8   addiu $sp, $sp, -40
   4: 09 30 00 00   addiu $3, $zero, 0
   8: 02 3d 00 24   st $3, 36($sp)
   c: 02 3d 00 20   st $3, 32($sp)
  10: 09 20 00 01   addiu $2, $zero, 1
  14: 02 2d 00 1c   st $2, 28($sp)
  18: 09 40 00 02   addiu $4, $zero, 2
  1c: 02 4d 00 18   st $4, 24($sp)
  20: 09 40 00 03   addiu $4, $zero, 3
  24: 02 4d 00 14   st $4, 20($sp)
  28: 09 40 00 04   addiu $4, $zero, 4
  2c: 02 4d 00 10   st $4, 16($sp)
  30: 09 40 00 05   addiu $4, $zero, 5
  34: 02 4d 00 0c   st $4, 12($sp)
  38: 09 40 00 06   addiu $4, $zero, 6
  3c: 02 4d 00 08   st $4, 8($sp)
  40: 09 40 00 07   addiu $4, $zero, 7
  44: 02 4d 00 04   st $4, 4($sp)
  48: 09 40 00 08   addiu $4, $zero, 8
  4c: 02 4d 00 00   st $4, 0($sp)
  50: 01 4d 00 20   ld $4, 32($sp)
  54: 28 40 00 0c   bne $4, $zero, 12
  58: 01 4d 00 20   ld $4, 32($sp)
  5c: 09 44 00 01   addiu $4, $4, 1
  60: 02 4d 00 20   st $4, 32($sp)
  64: 01 4d 00 1c   ld $4, 28($sp)
  68: 27 40 00 0c   beq $4, $zero, 12
  6c: 01 4d 00 1c   ld $4, 28($sp)
  70: 09 44 00 01   addiu $4, $4, 1
  74: 02 4d 00 1c   st $4, 28($sp)
  78: 01 4d 00 18   ld $4, 24($sp)
  7c: 0a 44 00 01   slti $4, $4, 1
  80: 28 40 00 0c   bne $4, $zero, 12
  84: 01 4d 00 18   ld $4, 24($sp)
  88: 09 44 00 01   addiu $4, $4, 1
  8c: 02 4d 00 18   st $4, 24($sp)
  90: 01 4d 00 14   ld $4, 20($sp)
  94: 0a 44 00 00   slti $4, $4, 0
  98: 28 40 00 0c   bne $4, $zero, 12
  9c: 01 4d 00 14   ld $4, 20($sp)
  a0: 09 44 00 01   addiu $4, $4, 1
  a4: 02 4d 00 14   st $4, 20($sp)
  a8: 01 4d 00 10   ld $4, 16($sp)
  ac: 09 50 ff ff   addiu $5, $zero, -1
  b0: 20 45 40 00   slt $4, $5, $4
  b4: 28 40 00 0c   bne $4, $zero, 12
  b8: 01 4d 00 10   ld $4, 16($sp)
  bc: 09 44 00 01   addiu $4, $4, 1
  c0: 02 4d 00 10   st $4, 16($sp)
  c4: 01 4d 00 0c   ld $4, 12($sp)
  c8: 20 33 40 00   slt $3, $3, $4
  cc: 28 30 00 0c   bne $3, $zero, 12
  d0: 01 3d 00 0c   ld $3, 12($sp)
  d4: 09 33 00 01   addiu $3, $3, 1
  d8: 02 3d 00 0c   st $3, 12($sp)
  dc: 01 3d 00 08   ld $3, 8($sp)
  e0: 20 22 30 00   slt $2, $2, $3
  e4: 28 20 00 0c   bne $2, $zero, 12
  e8: 01 2d 00 08   ld $2, 8($sp)
  ec: 09 22 00 01   addiu $2, $2, 1
  f0: 02 2d 00 08   st $2, 8($sp)
  f4: 01 2d 00 04   ld $2, 4($sp)
  f8: 0a 22 00 01   slti $2, $2, 1
  fc: 28 20 00 0c   bne $2, $zero, 12
 100: 01 2d 00 04   ld $2, 4($sp)
 104: 09 22 00 01   addiu $2, $2, 1
 108: 02 2d 00 04   st $2, 4($sp)
 10c: 01 2d 00 04   ld $2, 4($sp)
 110: 01 3d 00 00   ld $3, 0($sp)
 114: 20 23 20 00   slt $2, $3, $2
 118: 27 20 00 0c   beq $2, $zero, 12
 11c: 01 2d 00 00   ld $2, 0($sp)
 120: 09 22 00 01   addiu $2, $2, 1
 124: 02 2d 00 00   st $2, 0($sp)
 128: 01 2d 00 1c   ld $2, 28($sp)
 12c: 01 3d 00 20   ld $3, 32($sp)
 130: 27 32 00 0c   beq $3, $2, 12
 134: 01 2d 00 20   ld $2, 32($sp)
 138: 09 22 00 01   addiu $2, $2, 1
 13c: 02 2d 00 20   st $2, 32($sp)
 140: 01 2d 00 20   ld $2, 32($sp)
 144: 09 dd 00 28   addiu $sp, $sp, 40
 148: 2c 00 00 00   ret $zero
```

## 9.7 Dynamic link

We explain how dynamic linking works for Cpu0, even though neither a Cpu0 linker nor a Cpu0 dynamic linker exists at this point. As with other parts of the backend, the Cpu0 dynamic link implementation is borrowed from the Mips ABI. We traced the dynamic link implementation with lldb on the X86 platform and found that both X86 and Mips use a PLT to implement dynamic linking.

### 9.7.1 Linker support

This section shows the code the compiler generates to support dynamic linking, and the code the linker generates for it.

Compile main.cpp to get the Cpu0 PIC assembly code as follows,

**LLVMBackendTutorialExampleCode/InputFiles/main.cpp**

```
extern int foo(int x1, int x2);
extern int bar();

int main()
{
    int a = foo(1, 2);
    a += foo(3, 4);
    a += bar();

    return a;
}
```

```
118-165-77-200:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=asm main.bc -o -
    .section .mdebug.abi32
    .previous
    .file "main.bc"
    .text
    .globl main
    .align 2
    .type main,@function
    .ent main # @main
main:
    .cfi_startproc
    .frame $sp,40,$lr
    .mask 0x00004080,-4
    .set noreorder
    .cpload $t9
    .set nomacro
# BB#0:
    addiu $sp, $sp, -40
$tmp2:
    .cfi_def_cfa_offset 40
    st $lr, 36($sp) # 4-byte Folded Spill
    st $7, 32($sp) # 4-byte Folded Spill
$tmp3:
    .cfi_offset 14, -4
$tmp4:
    .cfi_offset 7, -8
    .cprestore 8
    addiu $2, $zero, 0
    st $2, 28($sp)
    addiu $2, $zero, 2
    st $2, 4($sp)
    addiu $2, $zero, 1
    st $2, 0($sp)
    ld $7, %call24(_Z3fooii)($gp)
    add $6, $zero, $7
    jalr $6
    ld $gp, 8($sp)
    st $2, 24($sp)
    addiu $2, $zero, 4
    st $2, 4($sp)
    addiu $2, $zero, 3
    st $2, 0($sp)
    add $6, $zero, $7
    jalr $6
    ld $gp, 8($sp)
    ld $3, 24($sp)
    add $2, $3, $2
    st $2, 24($sp)
    ld $6, %call24(_Z3barv)($gp)
    jalr $6
    ld $gp, 8($sp)
    ld $3, 24($sp)
    add $2, $3, $2
    st $2, 24($sp)
    ld $7, 32($sp) # 4-byte Folded Reload
    ld $lr, 36($sp) # 4-byte Folded Reload
    addiu $sp, $sp, 40
    ret $2
    .set macro
    .set reorder
    .end main
$tmp5:
    .size main, ($tmp5)-main
    .cfi_endproc
```

```
118-165-77-200:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=obj main.bc -o main.cpu0.o
118-165-77-200:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/bin/Debug/llvm-objdump -r main.cpu0.o

main.cpu0.o: file format ELF32-CPU0

RELOCATION RECORDS FOR [.text]:
4 R_CPU0_LO16 _gp_disp
52 R_CPU0_CALL24 _Z3fooii
112 R_CPU0_CALL24 _Z3barv

RELOCATION RECORDS FOR [.eh_frame]:
28 R_CPU0_32 .text
```

Suppose we have a linker that supports dynamic linking for Cpu0. After linking, the PLT entries for the dynamically linked functions \_Z3fooii and \_Z3barv are resolved, and the ELF file looks like the following,

```
SYMBOL TABLE:
...
0040035c l d .dynsym 00000000 .dynsym
...
Disassembly of section .plt:
...
00400720 <_Z3barv@plt>:
400720:    lui     $8,0x41
400724:    ld      $6,0x0acc($8)
400728:    addiu   $9,$zero,0x18
40072c:    jr      $6

00400730 <_Z3fooii@plt>:
400730:    lui     $8,0x41
400734:    ld      $6,0x0ad0($8)
400738:    addiu   $9,$zero,0x08
40073c:    jr      $6
...
004009f0 <.CPU0.stubs>:
4009F0: 8f998010 ld      $6,-32752(gp)
4009F4: 03e07821 add     $8,$zero,lr
4009F8: 0320f809 jalr    $6
```

### 9.7.2 Principle

To support dynamic linking, Cpu0 follows the protocol in Table 9.1, which lists the registers and memory locations that change when the dynamically linked function `_Z3fooii()` is called. The entry at `.dynsym+0x08` holds the dynamic link information for the function. Usually this information includes which library the function lives in and its offset within that library. The linker can obtain this information and save it in the ELF file at link time.

After the ELF file is loaded into memory, it looks like Figure 9.2.

Table 9.1: Registers changed when calling the dynamic link function `_Z3fooii()`

| register/memory                                | call <code>_Z3fooii</code> first time                      | call <code>_Z3fooii</code> second time |
|------------------------------------------------|------------------------------------------------------------|----------------------------------------|
| 0x410ad0                                       | point to CPU0.stubs                                        | point to <code>_Z3fooii</code>         |
| <code>.dynsym+0x08</code>                      | (libfoobar.so, offset, length) about <code>_Z3fooii</code> | useless                                |
| -32752(gp)                                     | point to dynamic_linker                                    | useless                                |
| \$8                                            | the next instruction of <code>_Z3fooii()</code>            | useless                                |
| \$9                                            | <code>.dynsym+0x08</code>                                  | useless                                |
| \$6 (at the end of <code>_Z3fooii@plt</code> ) | point to CPU0.stubs                                        | point to <code>_Z3fooii</code>         |
| \$6 (at the end of CPU0.stubs)                 | point to dynamic_linker                                    | useless                                |

The steps are as follows:

1. The first function call, `a = foo(1,2)`, is implemented by the instructions “`ld $7, %call24(_Z3fooii@plt)($gp)`”, “`add $6, $zero, $7`” and “`jalr $6`”. Remember that `.dynsym+0x08` contains the information (libfoobar.so, offset, length), which the linker sets when linking against the dynamic shared library. After “`jalr $6`”, the program counter jumps to “`00400730 <_Z3fooii@plt>`”.
2. The contents of memory address `0x410ad0` are the address of CPU0.stubs when the program, `main()`, is loaded.
3. After the `_Z3fooii@plt` instructions execute, control jumps to CPU0.stubs, since `$6 = the address of CPU0.stubs`. Register `$9 = the contents of address .dynsym+0x08` since it is set in step 1.
4. After CPU0.stubs executes, register `$8 = 0x004008a4`, which points to the instruction following the call in step 1.
5. The dynamic linker looks at register `$9`, whose value is `0x08`. It asks the OS for the address of `.dynsym + offset 0x08` in the caller process. This address holds the information (libfoobar.so, offset, length). With this information, the dynamic linker knows where to get the `_Z3fooii` function body. It loads the `_Z3fooii()` function body at an available address obtained from the OS. After loading `_Z3fooii()`, it calls `_Z3fooii()`, saving and restoring the registers `$6, $8, $9` and the caller-saved registers just before and after the call.
6. After `_Z3fooii()` returns, the dynamic linker sets the contents of address `0x410ad0` to the entry address of `_Z3fooii` in memory.
7. The dynamic linker executes `jr $8`, which jumps to the instruction after “`a = _Z3fooii();`” in the caller.



Figure 9.2: Call dynamic function `_Z3fooii()` first time

When `_Z3fooii()` is called a second time, the situation looks like Figure 9.3. `<_Z3fooii@plt>` jumps to `_Z3fooii()` directly, since the contents of address 0x410ad0 were changed to the memory address of `_Z3fooii()` in step 6 of Figure 9.2. From now on, every call to `_Z3fooii()` jumps to `_Z3fooii()` directly from the `_Z3fooii@plt` instructions.
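The first-call and second-call behaviour described above can be modelled in a few lines of C++. This is only a sketch under simplifying assumptions: `gotSlotFoo` stands in for the memory word at `0x410ad0`, `fooStubAndResolve()` folds CPU0.stubs and the dynamic linker into one function, and `fooBody()` plays the part of `_Z3fooii` in libfoobar.so. All of these names are hypothetical and do not appear in the Cpu0 code.

```cpp
#include <cassert>

// Hypothetical stand-in for _Z3fooii inside libfoobar.so.
int fooBody(int x1, int x2) { return x1 + x2; }

using FooFn = int (*)(int, int);

int fooStubAndResolve(int x1, int x2);

// Models the memory word at 0x410ad0: it starts out pointing at the stub.
FooFn gotSlotFoo = fooStubAndResolve;

// Counts how often the simulated dynamic linker runs.
int resolveCount = 0;

// Models CPU0.stubs plus the dynamic linker (steps 3 to 7 above).
int fooStubAndResolve(int x1, int x2) {
  ++resolveCount;
  int result = fooBody(x1, x2);  // step 5: run the real function
  gotSlotFoo = fooBody;          // step 6: patch the slot
  return result;                 // step 7: return to the caller
}

// Models _Z3fooii@plt: load the slot, then jump through it.
int callFooViaPlt(int x1, int x2) {
  assert(gotSlotFoo != nullptr && "slot must always hold a valid target");
  return gotSlotFoo(x1, x2);
}
```

Calling `callFooViaPlt(1, 2)` and then `callFooViaPlt(3, 4)` runs the resolver only once; the second call goes straight to `fooBody()` through the patched slot.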

According to the Mips Application Binary Interface (ABI), `$t9` is the register alias for `$25` in Mips. `$t9` is the register used in `jalr $25` for long-distance function pointers (far subroutine calls). Cpu0 uses register `$6` in the role of the Mips `$t9` (`$25`) register. The `jal %subroutine` instruction has a 24-bit address offset relative to the Program Counter (PC), while `jalr` covers the full 32-bit address range of a 32-bit register. PIC mode is used, for example, in shared libraries like the one in this example. A shared library is re-entrant code that can be loaded at different memory addresses decided at run time. `jalr` makes the implementation of dynamically linked functions easier and faster, as shown above.
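To make the range difference concrete, the sketch below checks whether a PC-relative offset fits into a 24-bit field. It assumes the field is a signed two's-complement value, which the text above does not state, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the 24-bit jal offset is signed (two's complement).
bool fitsInJalRange(int64_t offset) {
  const int64_t limit = int64_t(1) << 23;
  return offset >= -limit && offset < limit;
}

// Packs an in-range offset into the low 24 bits of an instruction word.
uint32_t encodeJalOffset(int64_t offset) {
  assert(fitsInJalRange(offset) && "offset too large for jal, use jalr");
  return static_cast<uint32_t>(offset) & 0x00ffffffu;
}
```

A target whose offset fails `fitsInJalRange()` cannot be reached with `jal` under this assumption and would need the register-based `jalr` instead.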

### 9.7.3 Trace with lldb

We trace dynamic linking on X86 below. You can skip this part if you have no interest in it or already know how to trace it with lldb or gdb.

```
118-165-77-200:InputFiles Jonathan$ clang -fPIC -g -c foobar.cpp
118-165-77-200:InputFiles Jonathan$ clang -shared -g foobar.o -o libfoobar.so
118-165-77-200:InputFiles Jonathan$ clang -g -c main.cpp
118-165-77-200:InputFiles Jonathan$ clang -g main.o libfoobar.so
118-165-77-200:InputFiles Jonathan$ gobjdump -d main.o
main.o:     file format mach-o-x86-64

Disassembly of section .text:

0000000000000000 <_main>:
 0: 55          push   %rbp
 1: 48 89 e5    mov    %rsp,%rbp
 4: 48 83 ec 10 sub   $0x10,%rsp
 8: bf 01 00 00 00 mov   $0x1,%edi
 d: be 02 00 00 00 mov   $0x2,%esi
12: c7 45 fc 00 00 00 00 movl  $0x0,-0x4(%rbp)
19: e8 00 00 00 00 callq 1e <_main+0x1e>
1e: bf 03 00 00 00 mov   $0x3,%edi
23: be 04 00 00 00 mov   $0x4,%esi
28: 89 45 f8    mov    %eax,-0x8(%rbp)
2b: e8 00 00 00 00 callq 30 <_main+0x30>
30: 8b 75 f8    mov    -0x8(%rbp),%esi
33: 01 c6        add    %eax,%esi
35: 89 75 f8    mov    %esi,-0x8(%rbp)
38: e8 00 00 00 00 callq 3d <_main+0x3d>
3d: 8b 75 f8    mov    -0x8(%rbp),%esi
40: 01 c6        add    %eax,%esi
42: 89 75 f8    mov    %esi,-0x8(%rbp)
45: 8b 45 f8    mov    -0x8(%rbp),%eax
48: 48 83 c4 10 add    $0x10,%rsp
4c: 5d          pop    %rbp
4d: c3          retq
```

Figure 9.3: Call dynamic function `_Z3fooii()` second time

```
118-165-77-200:InputFiles Jonathan$ gobjdump -d a.out
...
main.o:     file format mach-o-x86-64

Disassembly of section .text:

0000000100000ef0 <_main>:
100000ef0: 55                      push   %rbp
100000ef1: 48 89 e5                mov    %rsp,%rbp
100000ef4: 48 83 ec 10             sub    $0x10,%rsp
100000ef8: bf 01 00 00 00          mov    $0x1,%edi
100000efd: be 02 00 00 00          mov    $0x2,%esi
100000f02: c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
100000f09: e8 36 00 00 00          callq  100000f44 <__Z3fooii$stub>
100000f0e: bf 03 00 00 00          mov    $0x3,%edi
100000f13: be 04 00 00 00          mov    $0x4,%esi
100000f18: 89 45 f8                mov    %eax,-0x8(%rbp)
100000f1b: e8 24 00 00 00          callq  100000f44 <__Z3fooii$stub>
100000f20: 8b 75 f8                mov    -0x8(%rbp),%esi
100000f23: 01 c6                   add    %eax,%esi
100000f25: 89 75 f8                mov    %esi,-0x8(%rbp)
100000f28: e8 11 00 00 00          callq  100000f3e <__Z3barv$stub>
100000f2d: 8b 75 f8                mov    -0x8(%rbp),%esi
100000f30: 01 c6                   add    %eax,%esi
100000f32: 89 75 f8                mov    %esi,-0x8(%rbp)
100000f35: 8b 45 f8                mov    -0x8(%rbp),%eax
100000f38: 48 83 c4 10             add    $0x10,%rsp
100000f3c: 5d                      pop    %rbp
100000f3d: c3                      retq

Disassembly of section __TEXT.__stubs:

0000000100000f3e <__Z3barv$stub>:
100000f3e: ff 25 cc 00 00 00       jmpq   *0xcc(%rip)      # 100001010 <__Z3barv$stub>

0000000100000f44 <__Z3fooii$stub>:
100000f44: ff 25 ce 00 00 00       jmpq   *0xce(%rip)      # 100001018 <__Z3fooii$stub>

Disassembly of section __TEXT.__stub_helper:

0000000100000f4c <__TEXT.__stub_helper>:
100000f4c: 4c 8d 1d b5 00 00 00    lea    0xb5(%rip),%r11  # 100001008 <>
100000f53: 41 53                   push   %r11
100000f55: ff 25 a5 00 00 00       jmpq   *0xa5(%rip)      # 100001000 <dyld_stub_binder$stub>
100000f5b: 90                      nop
100000f5c: 68 00 00 00 00          pushq  $0x0
100000f61: e9 e6 ff ff ff          jmpq   100000f4c <__Z3fooii$stub+0x8>
100000f66: 68 0f 00 00 00          pushq  $0xf
100000f6b: e9 dc ff ff ff          jmpq   100000f4c <__Z3fooii$stub+0x8>

Disassembly of section __TEXT.__unwind_info:
...
```

```

118-165-77-200:InputFiles Jonathan$ lldb a.out
Current executable set to 'a.out' (x86_64).
(lldb) run main
Process 702 launched: '/Users/Jonathan/test/lbd/docs/BackendTutorial/
LLVMBackendTutorialExampleCode/InputFiles/a.out' (x86_64)
Process 702 exited with status = 15 (0x0000000f)

```

```
(lldb) b main
Breakpoint 1: where = a.out`main + 25 at main.cpp:7, address = 0x0000000100000f09
(lldb) target stop-hook add
Enter your stop hook command(s). Type 'DONE' to end.
> disassemble --pc
> DONE
Stop hook #1 added.
(lldb) run
Process 705 launched: '/Users/Jonathan/test/lbd/docs/BackendTutorial/LLVMBackendTutorialExampleCode/InputFiles/a.out' (x86_64)
dyld`_dyld_start:
-> 0x7fff5fc01028: popq  %rdi
  0x7fff5fc01029: pushq  $0
  0x7fff5fc0102b: movq  %rsp, %rbp
  0x7fff5fc0102e: andq  $-16, %rsp
Process 753 stopped
* thread #1: tid = 0x1c03, 0x0000000100000f09 a.out`main + 25 at main.cpp:7,
stop reason = breakpoint 1.1
    frame #0: 0x0000000100000f09 a.out`main + 25 at main.cpp:7
4
5         int main()
6         {
-> 7             int a = foo(1, 2);
8             a += foo(3, 4);
9             a += bar();
10
a.out`main + 25 at main.cpp:7:
-> 0x100000f09: callq  0x100000f44 ; symbol stub for: foo(int, int)
  0x100000f0e: movl  $3, %edi
  0x100000f13: movl  $4, %esi
  0x100000f18: movl  %eax, -8(%rbp)
(lldb) stepi
Process 753 stopped
* thread #1: tid = 0x1c03, 0x0000000100000f44 a.out`foo(int, int), stop reason
= instruction step into
    frame #0: 0x0000000100000f44 a.out`foo(int, int)
a.out`symbol stub for: foo(int, int):
-> 0x100000f44: jmpq  *206(%rip) ; (void *)0x0000000100000f66
a.out`symbol stub for: foo(int, int):
-> 0x100000f44: jmpq  *206(%rip) ; (void *)0x0000000100000f66
  0x100000f4a: addb  %al, (%rax)
  0x100000f4c: addb  %al, (%rax)
  0x100000f4e: addb  %al, (%rax)
(lldb) p $rip
(unsigned long) $1 = 4294971204
(lldb) memory read/4xw 4294971410
0x100001012: 0x00010000 0x0f660000 0x00010000 0x00000000
(lldb) stepi
Process 859 stopped
* thread #1: tid = 0x1c03, 0x0000000100000f66 a.out, stop reason = instruction
    step into frame #0: 0x0000000100000f66 a.out
-> 0x100000f66: pushq  $15
  0x100000f6b: jmpq  0x100000f4c
-> 0x100000f66: pushq  $15
  0x100000f6b: jmpq  0x100000f4c
  0x100000f70: addb  %al, (%rax)
  0x100000f72: addb  %al, (%rax)
(lldb) stepi
Process 859 stopped
* thread #1: tid = 0x1c03, 0x0000000100000f6b a.out, stop reason = instruction
  step into frame #0: 0x0000000100000f6b a.out
-> 0x100000f6b: jmpq 0x100000f4c
-> 0x100000f6b: jmpq 0x100000f4c
  0x100000f70: addb %al, (%rax)
  0x100000f72: addb %al, (%rax)
  0x100000f74: addb %al, (%rax)
(lldb) stepi
Process 859 stopped
* thread #1: tid = 0x1c03, 0x0000000100000f4c a.out, stop reason = instruction
  step into frame #0: 0x0000000100000f4c a.out
-> 0x100000f4c: leaq 181(%rip), %r11 ; (void *)0x0000000000000000
  0x100000f53: pushq %r11
  0x100000f55: jmpq *165(%rip) ; (void *)0x00007fff978da878:
  dyld_stub_binder
  0x100000f5b: nop
-> 0x100000f4c: leaq 181(%rip), %r11 ; (void *)0x0000000000000000
  0x100000f53: pushq %r11
  0x100000f55: jmpq *165(%rip) ; (void *)0x00007fff978da878:
  dyld_stub_binder
  0x100000f5b: nop
(lldb)
Process 859 stopped
* thread #1: tid = 0x1c03, 0x0000000100000f53 a.out, stop reason = instruction
  step into frame #0: 0x0000000100000f53 a.out
-> 0x100000f53: pushq %r11
  0x100000f55: jmpq *165(%rip) ; (void *)0x00007fff978da878:
  dyld_stub_binder
  0x100000f5b: nop
  0x100000f5c: pushq $0
-> 0x100000f53: pushq %r11
  0x100000f55: jmpq *165(%rip) ; (void *)0x00007fff978da878:
  dyld_stub_binder
  0x100000f5b: nop
  0x100000f5c: pushq $0
(lldb)
Process 859 stopped
* thread #1: tid = 0x1c03, 0x0000000100000f55 a.out, stop reason = instruction
  step into frame #0: 0x0000000100000f55 a.out
-> 0x100000f55: jmpq *165(%rip) ; (void *)0x00007fff978da878:
  dyld_stub_binder
  0x100000f5b: nop
  0x100000f5c: pushq $0
  0x100000f61: jmpq 0x100000f4c
-> 0x100000f55: jmpq *165(%rip) ; (void *)0x00007fff978da878:
  dyld_stub_binder
  0x100000f5b: nop
  0x100000f5c: pushq $0
  0x100000f61: jmpq 0x100000f4c
(lldb)
Process 859 stopped
* thread #1: tid = 0x1c03, 0x00007fff978da878 libdyld.dylib`dyld_stub_binder,
  stop reason = instruction step into
  frame #0: 0x00007fff978da878 libdyld.dylib`dyld_stub_binder
libdyld.dylib`dyld_stub_binder:
-> 0x7fff978da878: pushq %rbp
  0x7fff978da879: movq %rsp, %rbp
0x7fff978da87c: subq    $192, %rsp
0x7fff978da883: movq    %rdi, (%rsp)
libdyld.dylib`dyld_stub_binder:
-> 0x7fff978da878: pushq    %rbp
0x7fff978da879: movq    %rsp, %rbp
0x7fff978da87c: subq    $192, %rsp
0x7fff978da883: movq    %rdi, (%rsp)
(lldb) cont
Process 753 resuming
Process 753 stopped
* thread #1: tid = 0x1c03, 0x0000000100000f1b a.out `main + 43 at main.cpp:8,
  stop reason = breakpoint 2.1
  frame #0: 0x0000000100000f1b a.out `main + 43 at main.cpp:8
5          int main()
6          {
7              int a = foo(1, 2);
-> 8              a += foo(3, 4);
9              a += bar();
10
11         return a;
a.out `main + 43 at main.cpp:8:
-> 0x100000f1b: callq  0x100000f44           ; symbol stub for: foo(int, int)
0x100000f20: movl    -8(%rbp), %esi
0x100000f23: addl    %eax, %esi
0x100000f25: movl    %esi, -8(%rbp)
(lldb) stepi
Process 753 stopped
* thread #1: tid = 0x1c03, 0x0000000100000f44 a.out `foo(int, int), stop reason =
  instruction step into frame #0: 0x0000000100000f44 a.out `foo(int, int)
a.out `symbol stub for: foo(int, int):
-> 0x100000f44: jmpq    *206(%rip)           ; (void *)0x0000000100003f20:
  foo(int, int) at /Users/Jonathan/test/lbd/docs/BackendTutorial/LLVMBackendT
  utorialExampleCode/InputFiles/foobar.cpp:3
a.out `symbol stub for: foo(int, int):
-> 0x100000f44: jmpq    *206(%rip)           ; (void *)0x0000000100003f20:
  foo(int, int) at /Users/Jonathan/test/lbd/docs/BackendTutorial/LLVMBackendT
  utorialExampleCode/InputFiles/foobar.cpp:3
0x100000f4a: addb    %al, (%rax)
0x100000f4c: addb    %al, (%rax)
0x100000f4e: addb    %al, (%rax)
(lldb) p $rip
(unsigned long) $2 = 4294971204
(lldb) memory read/4xw 4294971410
0x100001012: 0x00010000 0x3f200000 0x00010000 0x00000000
(lldb) stepi
Process 753 stopped
* thread #1: tid = 0x1c03, 0x0000000100003f20 libfoobar.so `foo(x1=0, x2=3) at
  foobar.cpp:3, stop reason = instruction step into
  frame #0: 0x0000000100003f20 libfoobar.so `foo(x1=0, x2=3) at foobar.cpp:3
1
2         int foo(int x1, int x2)
-> 3         {
4             int sum = x1 + x2;
5
6         return sum;
libfoobar.so `foo(int, int) at foobar.cpp:3:
-> 0x100003f20: pushq    %rbp
0x100003f21: movq    %rsp, %rbp
0x100003f24: movl    %edi, -4(%rbp)
0x100003f27: movl    %esi, -8(%rbp)
(lldb)
```
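The stub `jmpq *206(%rip)` in the trace jumps through a pointer stored relative to the address of the next instruction. The sketch below recomputes that slot address and decodes the pointer from the bytes lldb printed. The helper names are hypothetical, and the 6-byte instruction length is taken from the `ff 25 ce 00 00 00` encoding in the gobjdump output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Address of the pointer slot used by a RIP-relative indirect jump:
// address of the next instruction plus the signed displacement.
uint64_t ripRelativeSlot(uint64_t instrAddr, uint64_t instrLen, int32_t disp) {
  return instrAddr + instrLen +
         static_cast<uint64_t>(static_cast<int64_t>(disp));
}

// Reads a 64-bit little-endian value from a byte buffer.
uint64_t readLittleEndian64(const uint8_t *bytes, size_t size) {
  assert(size >= 8 && "need at least eight bytes");
  uint64_t value = 0;
  for (int i = 7; i >= 0; --i)
    value = (value << 8) | bytes[i];
  return value;
}
```

With the values from the trace, `ripRelativeSlot(0x100000f44, 6, 0xce)` gives `0x100001018`, which matches the gobjdump comment. The trace reads memory starting at `0x100001012` (the stub address plus 206), so the eight bytes of the slot begin six bytes into that dump. They decode to `0x100000f66` (the stub helper) on the first call and to `0x100003f20` (`foo` in libfoobar.so) on the second.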



# RUN BACKEND

This chapter first adds LLVM AsmParser support. With AsmParser support, we can hand-code assembly language in a C/C++ file and translate it into an obj file (ELF format). We can write a C++ main function as well as hand-coded assembly boot code, and translate this main() + bootcode() into an obj file. Combined with the llvm-objdump support from the last chapter, this main() + bootcode() ELF file can be translated into a hex file format that includes the disassembled code as comments. Furthermore, we can design Cpu0 in the Verilog language and run the Cpu0 backend on a PC by feeding it the hex file and observing the results of executing the Cpu0 instructions.

## 10.1 AsmParser support

Running Chapter9\_1/ with ch10\_1.cpp produces the following error message.

**LLVMBackendTutorialExampleCode/InputFiles/ch10\_1.cpp**

```
asm("ld      $2, 8($sp)");
asm("st      $0, 4($sp)");
asm("addiu $3,      $ZERO, 0");
asm("add $3, $1, $2");
asm("sub $3, $2, $3");
asm("mul $2, $1, $3");
asm("div $3, $2");
asm("divu $2, $3");
asm("and $2, $1, $3");
asm("or $3, $1, $2");
asm("xor $1, $2, $3");
asm("mult $4, $3");
asm("multu $3, $2");
asm("mfhi $3");
asm("mflo $2");
asm("mthi $2");
asm("mtlo $2");
asm("sra $2, $2, 2");
asm("rol $2, $1, 3");
asm("ror $3, $3, 4");
asm("shl $2, $2, 2");
asm("shr $2, $3, 5");
asm("cmp $sw, $2, $3");
asm("jeq $sw, 20");
asm("jne $sw, 16");
asm("jlt $sw, -20");
asm("jle $sw, -16");
asm("jgt $sw, -4");
asm("jge $sw, -12");
asm("swi 0x00000400");
asm("jsub 0x000010000");
asm("ret $lr");
asm("jalr $t9");
asm("li $3, 0x00700000");
asm("la $3, 0x00800000($6)");
asm("la $3, 0x00900000");
```

```
JonathantekiiMac:InputFiles Jonathan$ clang -c ch10_1.cpp -emit-llvm -o ch10_1.bc
JonathantekiiMac:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=obj ch10_1.bc -o ch10_1.cpu0.o
LLVM ERROR: Inline asm not supported by this streamer because we don't have an asm parser for this target
```

Since we have not implemented a cpu0 assembler, llc fails with the error message above. Cpu0 can translate LLVM IR into assembly and obj directly, but it cannot translate hand-coded assembly into obj. The AsmParser directory handles the assembly-to-obj translation. Chapter10\_1/ includes the AsmParser implementation as follows,
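One thing the parser has to handle is the `li` pseudo-instruction from ch10\_1.cpp, such as `li $3, 0x00700000`. `expandLoadImm()` in the listing that follows emits a single `addiu` when the immediate fits in 16 signed bits, and otherwise splits it into upper and lower 16-bit halves. The sketch below models only that arithmetic. The names are hypothetical, and it does not model how the Cpu0 instructions themselves extend their immediates.

```cpp
#include <cassert>
#include <cstdint>

struct ImmHalves {
  uint32_t hi16;  // bits 31..16 of the immediate
  uint32_t lo16;  // bits 15..0 of the immediate
};

// True when one addiu with a signed 16-bit immediate is enough.
bool fitsInSigned16(int32_t imm) { return -32768 <= imm && imm <= 32767; }

// Splits a 32-bit immediate into its two 16-bit halves.
ImmHalves splitImm32(int32_t imm) {
  const uint32_t bits = static_cast<uint32_t>(imm);
  return ImmHalves{(bits & 0xffff0000u) >> 16, bits & 0x0000ffffu};
}

// Rebuilds the 32-bit pattern: shift the high half left, OR in the low half.
uint32_t recombine(const ImmHalves &h) {
  assert(h.hi16 <= 0xffffu && h.lo16 <= 0xffffu && "halves must be 16-bit");
  return (h.hi16 << 16) | h.lo16;
}
```

For `0x00700000` the halves are `0x0070` and `0x0000`, and `recombine()` returns the original bit pattern.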

**LLVMBackendTutorialExampleCode/Chapter10\_1/AsmParser/Cpu0AsmParser.cpp**

```
//===-- Cpu0AsmParser.cpp - Parse Cpu0 assembly to MCInst instructions ----===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//

#include "MCTargetDesc/Cpu0MCTargetDesc.h"
#include "Cpu0RegisterInfo.h"
#include "llvm/ADT/StringSwitch.h"
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCStreamer.h"
#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/MC/MCSymbol.h"
#include "llvm/MC/MCParser/MCAsmLexer.h"
#include "llvm/MC/MCParser/MCParsedAsmOperand.h"
#include "llvm/MC/MCTargetAsmParser.h"
#include "llvm/Support/TargetRegistry.h"

using namespace llvm;

namespace {
class Cpu0AssemblerOptions {
public:
  Cpu0AssemblerOptions():
    aTReg(1), reorder(true), macro(true) {
  }

  bool isReorder() {return reorder; }
  void setReorder() {reorder = true; }
  void setNoreorder() {reorder = false; }

  bool isMacro() {return macro; }
  void setMacro() {macro = true; }
  void setNomacro() {macro = false; }

private:
  unsigned aTReg;
  bool reorder;
  bool macro;
};
}

namespace {
class Cpu0AsmParser : public MCTargetAsmParser {
  MCSubtargetInfo &STI;
  MCAsmParser &Parser;
  Cpu0AssemblerOptions Options;


#define GET_ASSEMBLER_HEADER
#include "Cpu0GenAsmMatcher.inc"

  bool MatchAndEmitInstruction(SMLoc IDLoc, unsigned &Opcode,
                               SmallVectorImpl<MCParsedAsmOperand*> &Operands,
                               MCStreamer &Out, unsigned &ErrorInfo,
                               bool MatchingInlineAsm);

  bool ParseRegister(unsigned &RegNo, SMLoc &StartLoc, SMLoc &EndLoc);

  bool ParseInstruction(ParseInstructionInfo &Info, StringRef Name,
                        SMLoc NameLoc,
                        SmallVectorImpl<MCParsedAsmOperand*> &Operands);

  bool parseMathOperation(StringRef Name, SMLoc NameLoc,
                          SmallVectorImpl<MCParsedAsmOperand*> &Operands);

  bool ParseDirective(AsmToken DirectiveID);

  Cpu0AsmParser::OperandMatchResultTy
  parseMemOperand(SmallVectorImpl<MCParsedAsmOperand*> &);

  bool ParseOperand(SmallVectorImpl<MCParsedAsmOperand*> &,
                    StringRef Mnemonic);

  int tryParseRegister(StringRef Mnemonic);

  bool tryParseRegisterOperand(SmallVectorImpl<MCParsedAsmOperand*> &Operands,
                               StringRef Mnemonic);

  bool needsExpansion(MCInst &Inst);

  void expandInstruction(MCInst &Inst, SMLoc IDLoc,
                         SmallVectorImpl<MCInst> &Instructions);
  void expandLoadImm(MCInst &Inst, SMLoc IDLoc,
                     SmallVectorImpl<MCInst> &Instructions);
  void expandLoadAddressImm(MCInst &Inst, SMLoc IDLoc,
                            SmallVectorImpl<MCInst> &Instructions);
  void expandLoadAddressReg(MCInst &Inst, SMLoc IDLoc,
                            SmallVectorImpl<MCInst> &Instructions);
  bool reportParseError(StringRef ErrorMsg);

  bool parseMemOffset(const MCExpr *&Res);
  bool parseRelocOperand(const MCExpr *&Res);

  bool parseDirectiveSet();

  bool parseSetAtDirective();
  bool parseSetNoAtDirective();
  bool parseSetMacroDirective();
  bool parseSetNoMacroDirective();
  bool parseSetReorderDirective();
  bool parseSetNoReorderDirective();

  MCSymbolRefExpr::VariantKind getVariantKind(StringRef Symbol);

  int matchRegisterName(StringRef Symbol);

  int matchRegisterByNumber(unsigned RegNum, StringRef Mnemonic);

  unsigned getReg(int RC, int RegNo);

public:
  Cpu0AsmParser(MCSubtargetInfo &sti, MCAsmParser &parser)
    : MCTargetAsmParser(), STI(sti), Parser(parser) {
    // Initialize the set of available features.
    setAvailableFeatures(ComputeAvailableFeatures(STI.getFeatureBits()));
  }

  MCAsmParser &getParser() const { return Parser; }
  MCAsmLexer &getLexer() const { return Parser.getLexer(); }

};
}

namespace {

/// Cpu0Operand - Instances of this class represent a parsed Cpu0 machine
/// instruction.
class Cpu0Operand : public MCParsedAsmOperand {

  enum KindTy {
    k_CondCode,
    k_CoprocNum,
    k_Immediate,
    k_Memory,
    k_PostIndexRegister,
    k_Register,
    k_Token
  } Kind;

  Cpu0Operand(KindTy K) : MCParsedAsmOperand(), Kind(K) {}

  union {
    struct {
      const char *Data;
      unsigned Length;
    } Tok;

    struct {
      unsigned RegNum;
    } Reg;

    struct {
      const MCExpr *Val;
    } Imm;

    struct {
      unsigned Base;
      const MCExpr *Off;
    } Mem;
  };

  SMLoc StartLoc, EndLoc;

public:
  void addRegOperands(MCInst &Inst, unsigned N) const {
    assert(N == 1 && "Invalid number of operands!");
    Inst.addOperand(MCOperand::CreateReg(getReg()));
  }

  void addExpr(MCInst &Inst, const MCExpr *Expr) const {
    // Add as immediate when possible. Null MCExpr = 0.
    if (Expr == 0)
      Inst.addOperand(MCOperand::CreateImm(0));
    else if (const MCConstantExpr *CE = dyn_cast<MCConstantExpr>(Expr))
      Inst.addOperand(MCOperand::CreateImm(CE->getValue()));
    else
      Inst.addOperand(MCOperand::CreateExpr(Expr));
  }

  void addImmOperands(MCInst &Inst, unsigned N) const {
    assert(N == 1 && "Invalid number of operands!");
    const MCExpr *Expr = getImm();
    addExpr(Inst, Expr);
  }

  void addMemOperands(MCInst &Inst, unsigned N) const {
    assert(N == 2 && "Invalid number of operands!");

    Inst.addOperand(MCOperand::CreateReg(getMemBase()));

    const MCExpr *Expr = getMemOff();
    addExpr(Inst, Expr);
  }

  bool isReg() const { return Kind == k_Register; }
  bool isImm() const { return Kind == k_Immediate; }
  bool isToken() const { return Kind == k_Token; }
  bool isMem() const { return Kind == k_Memory; }

  StringRef getToken() const {
    assert(Kind == k_Token && "Invalid access!");
    return StringRef(Tok.Data, Tok.Length);
  }

  unsigned getReg() const {
    assert((Kind == k_Register) && "Invalid access!");
    return Reg.RegNum;
  }

  const MCExpr *getImm() const {
    assert((Kind == k_Immediate) && "Invalid access!");
    return Imm.Val;
  }

  unsigned getMemBase() const {
    assert((Kind == k_Memory) && "Invalid access!");
    return Mem.Base;
  }

  const MCExpr *getMemOff() const {
    assert((Kind == k_Memory) && "Invalid access!");
    return Mem.Off;
  }

  static Cpu0Operand *CreateToken(StringRef Str, SMLoc S) {
    Cpu0Operand *Op = new Cpu0Operand(k_Token);
    Op->Tok.Data = Str.data();
    Op->Tok.Length = Str.size();
    Op->StartLoc = S;
    Op->EndLoc = S;
    return Op;
  }

  static Cpu0Operand *CreateReg(unsigned RegNum, SMLoc S, SMLoc E) {
    Cpu0Operand *Op = new Cpu0Operand(k_Register);
    Op->Reg.RegNum = RegNum;
    Op->StartLoc = S;
    Op->EndLoc = E;
    return Op;
  }

  static Cpu0Operand *CreateImm(const MCExpr *Val, SMLoc S, SMLoc E) {
    Cpu0Operand *Op = new Cpu0Operand(k_Immediate);
    Op->Imm.Val = Val;
    Op->StartLoc = S;
    Op->EndLoc = E;
    return Op;
  }

  static Cpu0Operand *CreateMem(unsigned Base, const MCExpr *Off,
                                SMLoc S, SMLoc E) {
    Cpu0Operand *Op = new Cpu0Operand(k_Memory);
    Op->Mem.Base = Base;
    Op->Mem.Off = Off;
    Op->StartLoc = S;
    Op->EndLoc = E;
    return Op;
  }

  /// getStartLoc - Get the location of the first token of this operand.
  SMLoc getStartLoc() const { return StartLoc; }
  /// getEndLoc - Get the location of the last token of this operand.
  SMLoc getEndLoc() const { return EndLoc; }

  virtual void print(raw_ostream &OS) const {
    llvm_unreachable("unimplemented!");
  }
};
}

bool Cpu0AsmParser::needsExpansion(MCInst &Inst) {

  switch(Inst.getOpcode()) {
    case Cpu0::LoadImm32Reg:
    case Cpu0::LoadAddr32Imm:
    case Cpu0::LoadAddr32Reg:
      return true;
    default:
      return false;
  }
}

void Cpu0AsmParser::expandInstruction(MCInst &Inst, SMLoc IDLoc,
                                      SmallVectorImpl<MCInst> &Instructions) {
  switch(Inst.getOpcode()) {
    case Cpu0::LoadImm32Reg:
      return expandLoadImm(Inst, IDLoc, Instructions);
    case Cpu0::LoadAddr32Imm:
      return expandLoadAddressImm(Inst, IDLoc, Instructions);
    case Cpu0::LoadAddr32Reg:
      return expandLoadAddressReg(Inst, IDLoc, Instructions);
  }
}

void Cpu0AsmParser::expandLoadImm(MCInst &Inst, SMLoc IDLoc,
                                  SmallVectorImpl<MCInst> &Instructions) {
  MCInst tmpInst;
  const MCOperand &ImmOp = Inst.getOperand(1);
  assert(ImmOp.isImm() && "expected immediate operand kind");
  const MCOperand &RegOp = Inst.getOperand(0);
  assert(RegOp.isReg() && "expected register operand kind");

  int ImmValue = ImmOp.getImm();
  tmpInst.setLoc(IDLoc);
  if ( -32768 <= ImmValue && ImmValue <= 32767) {
    // for -32768 <= j <= 32767.
    // li d,j => addiu d,$zero,j
    tmpInst.setOpcode(Cpu0::ADDiu); //TODO: no ADDiu64 in td files?
    tmpInst.addOperand(MCOperand::CreateReg(RegOp.getReg()));
    tmpInst.addOperand(
        MCOperand::CreateReg(Cpu0::ZERO));
    tmpInst.addOperand(MCOperand::CreateImm(ImmValue));
    Instructions.push_back(tmpInst);
  } else {
    // for any other value of j that is representable as a 32-bit integer.
    // li d,j => addiu d, $0, hi16(j)
323     //          shl d, d, 16
324     //          addiu at, $0, lo16(j)
325     //          or d, d, at
326     tmpInst.setOpcode(Cpu0::ADDiu);
327     tmpInst.addOperand(MCOperand::CreateReg(RegOp.getReg()));
328     tmpInst.addOperand(MCOperand::CreateReg(Cpu0::ZERO));
329     tmpInst.addOperand(MCOperand::CreateImm((ImmValue & 0xffff0000) >> 16));
330     Instructions.push_back(tmpInst);
331     tmpInst.clear();
332     tmpInst.setOpcode(Cpu0::SHL);
333     tmpInst.addOperand(MCOperand::CreateReg(RegOp.getReg()));
334     tmpInst.addOperand(MCOperand::CreateReg(RegOp.getReg()));
335     tmpInst.addOperand(MCOperand::CreateImm(16));
336     Instructions.push_back(tmpInst);
337     tmpInst.clear();
338     tmpInst.setOpcode(Cpu0::ADDiu);
339     tmpInst.addOperand(MCOperand::CreateReg(Cpu0::AT));
340     tmpInst.addOperand(MCOperand::CreateReg(Cpu0::ZERO));
341     tmpInst.addOperand(MCOperand::CreateImm(ImmValue & 0x0000ffff));
342     Instructions.push_back(tmpInst);
343     tmpInst.clear();
344     tmpInst.setOpcode(Cpu0::OR);
345     tmpInst.addOperand(MCOperand::CreateReg(RegOp.getReg()));
346     tmpInst.addOperand(MCOperand::CreateReg(RegOp.getReg()));
347     tmpInst.addOperand(MCOperand::CreateReg(Cpu0::AT));
348     tmpInst.setLoc(IDLoc);
349     Instructions.push_back(tmpInst);
350 }
351 }
352
353 void Cpu0AsmParser::expandLoadAddressReg(MCInst &Inst, SMLoc IDLoc,
354                                         SmallVectorImpl<MCInst> &Instructions) {
355     MCInst tmpInst;
356     const MCOperand &ImmOp = Inst.getOperand(2);
357     assert(ImmOp.isImm() && "expected immediate operand kind");
358     const MCOperand &SrcRegOp = Inst.getOperand(1);
359     assert(SrcRegOp.isReg() && "expected register operand kind");
360     const MCOperand &DstRegOp = Inst.getOperand(0);
361     assert(DstRegOp.isReg() && "expected register operand kind");
362     int ImmValue = ImmOp.getImm();
363     if ( -32768 <= ImmValue && ImmValue <= 32767) {
364         // for -32768 <= j <= 32767.
365         // la d,j(s) => addiu d,s,j
366         tmpInst.setOpcode(Cpu0::ADDiu); //TODO: no ADDiu64 in td files?
367         tmpInst.addOperand(MCOperand::CreateReg(DstRegOp.getReg()));
368         tmpInst.addOperand(MCOperand::CreateReg(SrcRegOp.getReg()));
369         tmpInst.addOperand(MCOperand::CreateImm(ImmValue));
370         Instructions.push_back(tmpInst);
371     } else {
372         // for any other value of j that is representable as a 32-bit integer.
373         // la d,j(s) => addiu d, $0, hi16(j)
374         //          shl d, d, 16
375         //          addiu at, $0, lo16(j)
376         //          or d, d, at
377         //          add d,d,s
378         tmpInst.setOpcode(Cpu0::ADDiu);
379         tmpInst.addOperand(MCOperand::CreateReg(DstRegOp.getReg()));
380         tmpInst.addOperand(MCOperand::CreateReg(Cpu0::ZERO));
381         tmpInst.addOperand(MCOperand::CreateImm((ImmValue & 0xffff0000) >> 16));
382         Instructions.push_back(tmpInst);
383         tmpInst.clear();
384         tmpInst.setOpcode(Cpu0::SHL);
385         tmpInst.addOperand(MCOperand::CreateReg(DstRegOp.getReg()));
386         tmpInst.addOperand(MCOperand::CreateReg(DstRegOp.getReg())); // shl d, d, 16
387         tmpInst.addOperand(MCOperand::CreateImm(16));
388         Instructions.push_back(tmpInst);
389         tmpInst.clear();
390         tmpInst.setOpcode(Cpu0::ADDiu);
391         tmpInst.addOperand(MCOperand::CreateReg(Cpu0::AT));
392         tmpInst.addOperand(MCOperand::CreateReg(Cpu0::ZERO));
393         tmpInst.addOperand(MCOperand::CreateImm(ImmValue & 0x0000ffff));
394         Instructions.push_back(tmpInst);
395         tmpInst.clear();
396         tmpInst.setOpcode(Cpu0::OR);
397         tmpInst.addOperand(MCOperand::CreateReg(DstRegOp.getReg()));
398         tmpInst.addOperand(MCOperand::CreateReg(DstRegOp.getReg())); // or d, d, at
399         tmpInst.addOperand(MCOperand::CreateReg(Cpu0::AT));
400         tmpInst.setLoc(IDLoc);
401         Instructions.push_back(tmpInst);
402         tmpInst.clear();
403         tmpInst.setOpcode(Cpu0::ADD);
404         tmpInst.addOperand(MCOperand::CreateReg(DstRegOp.getReg()));
405         tmpInst.addOperand(MCOperand::CreateReg(DstRegOp.getReg()));
406         tmpInst.addOperand(MCOperand::CreateReg(SrcRegOp.getReg()));
407         Instructions.push_back(tmpInst);
408     }
409 }
409 }
410
411 void Cpu0AsmParser::expandLoadAddressImm(MCInst &Inst, SMLoc IDLoc,
412                                         SmallVectorImpl<MCInst> &Instructions) {
413     MCInst tmpInst;
414     const MCOperand &ImmOp = Inst.getOperand(1);
415     assert(ImmOp.isImm() && "expected immediate operand kind");
416     const MCOperand &RegOp = Inst.getOperand(0);
417     assert(RegOp.isReg() && "expected register operand kind");
418     int ImmValue = ImmOp.getImm();
419     if ( -32768 <= ImmValue && ImmValue <= 32767) {
420         // for -32768 <= j <= 32767.
421         // la d, j => addiu d, $zero, j
422         tmpInst.setOpcode(Cpu0::ADDiu);
423         tmpInst.addOperand(MCOperand::CreateReg(RegOp.getReg()));
424         tmpInst.addOperand(
425             MCOperand::CreateReg(Cpu0::ZERO));
426         tmpInst.addOperand(MCOperand::CreateImm(ImmValue));
427         Instructions.push_back(tmpInst);
428     } else {
429         // for any other value of j that is representable as a 32-bit integer.
430         // la d, j => addiu d, $0, hi16(j)
431         //           shl d, d, 16
432         //           addiu at, $0, lo16(j)
433         //           or d, d, at
434         tmpInst.setOpcode(Cpu0::ADDiu);
435         tmpInst.addOperand(MCOperand::CreateReg(RegOp.getReg()));
436         tmpInst.addOperand(MCOperand::CreateReg(Cpu0::ZERO));
437         tmpInst.addOperand(MCOperand::CreateImm((ImmValue & 0xffff0000) >> 16));
438         Instructions.push_back(tmpInst);
439         tmpInst.clear();
440         tmpInst.setOpcode(Cpu0::SHL);
441         tmpInst.addOperand(MCOperand::CreateReg(RegOp.getReg()));
442         tmpInst.addOperand(MCOperand::CreateReg(RegOp.getReg()));
443         tmpInst.addOperand(MCOperand::CreateImm(16));
444         Instructions.push_back(tmpInst);
445         tmpInst.clear();
446         tmpInst.setOpcode(Cpu0::ADDiu);
447         tmpInst.addOperand(MCOperand::CreateReg(Cpu0::AT));
448         tmpInst.addOperand(MCOperand::CreateReg(Cpu0::ZERO));
449         tmpInst.addOperand(MCOperand::CreateImm(ImmValue & 0x0000ffff));
450         Instructions.push_back(tmpInst);
451         tmpInst.clear();
452         tmpInst.setOpcode(Cpu0::OR);
453         tmpInst.addOperand(MCOperand::CreateReg(RegOp.getReg()));
454         tmpInst.addOperand(MCOperand::CreateReg(RegOp.getReg()));
455         tmpInst.addOperand(MCOperand::CreateReg(Cpu0::AT));
456         tmpInst.setLoc(IDLoc);
457         Instructions.push_back(tmpInst);
458     }
459 }
460
461 bool Cpu0AsmParser::
462 MatchAndEmitInstruction(SMLoc IDLoc, unsigned &Opcode,
463                         SmallVectorImpl<MCParsedAsmOperand*> &Operands,
464                         MCStreamer &Out, unsigned &ErrorInfo,
465                         bool MatchingInlineAsm) {
466     MCInst Inst;
467     unsigned MatchResult = MatchInstructionImpl(Operands, Inst, ErrorInfo,
468                                              MatchingInlineAsm);
469
470     switch (MatchResult) {
471     default: break;
472     case Match_Success: {
473         if (needsExpansion(Inst)) {
474             SmallVector<MCInst, 4> Instructions;
475             expandInstruction(Inst, IDLoc, Instructions);
476             for (unsigned i = 0; i < Instructions.size(); i++) {
477                 Out.EmitInstruction(Instructions[i]);
478             }
479         } else {
480             Inst.setLoc(IDLoc);
481             Out.EmitInstruction(Inst);
482         }
483         return false;
484     }
485     case Match_MissingFeature:
486         Error(IDLoc, "instruction requires a CPU feature not currently enabled");
487         return true;
488     case Match_InvalidOperand: {
489         SMLoc ErrorLoc = IDLoc;
490         if (ErrorInfo != ~0U) {
491             if (ErrorInfo >= Operands.size())
492                 return Error(IDLoc, "too few operands for instruction");
493
494             ErrorLoc = ((Cpu0Operand*)Operands[ErrorInfo])->getStartLoc();
495             if (ErrorLoc == SMLoc()) ErrorLoc = IDLoc;
496         }
497         return Error(ErrorLoc, "invalid operand for instruction");
498     }
499
500     case Match_MnemonicFail:
501         return Error(IDLoc, "invalid instruction");
502     }
503     return true;
504 }
505
506 int Cpu0AsmParser::matchRegisterName(StringRef Name) {
507
508     int CC;
509     CC = StringSwitch<unsigned>(Name)
510         .Case("zero", Cpu0::ZERO)
511         .Case("at", Cpu0::AT)
512         .Case("v0", Cpu0::V0)
513         .Case("v1", Cpu0::V1)
514         .Case("a0", Cpu0::A0)
515         .Case("a1", Cpu0::A1)
516         .Case("t9", Cpu0::T9)
517         .Case("s0", Cpu0::S0)
518         .Case("s1", Cpu0::S1)
519         .Case("s2", Cpu0::S2)
520         .Case("gp", Cpu0::GP)
521         .Case("fp", Cpu0::FP)
522         .Case("sw", Cpu0::SW)
523         .Case("sp", Cpu0::SP)
524         .Case("lr", Cpu0::LR)
525         .Case("pc", Cpu0::PC)
526         .Default(-1);
527
528     if (CC != -1)
529         return CC;
530
531     return -1;
532 }
533
534 unsigned Cpu0AsmParser::getReg(int RC, int RegNo) {
535     return *(getContext().getRegisterInfo().getRegClass(RC).begin() + RegNo);
536 }
537
538 int Cpu0AsmParser::matchRegisterByNumber(unsigned RegNum, StringRef Mnemonic) {
539     if (RegNum > 15)
540         return -1;
541
542     return getReg(Cpu0::CPURegsRegClassID, RegNum);
543 }
544
545 int Cpu0AsmParser::tryParseRegister(StringRef Mnemonic) {
546     const AsmToken &Tok = Parser.getTok();
547     int RegNum = -1;
548
549     if (Tok.is(AsmToken::Identifier)) {
550         std::string lowerCase = Tok.getString().lower();
551         RegNum = matchRegisterName(lowerCase);
552     } else if (Tok.is(AsmToken::Integer))
553         RegNum = matchRegisterByNumber(static_cast<unsigned>(Tok.getIntVal()),
554                                     Mnemonic.lower());
555     else
556         return RegNum;    //error
557     return RegNum;
558 }
559
560 bool Cpu0AsmParser::
561     tryParseRegisterOperand(SmallVectorImpl<MCParsedAsmOperand*> &Operands,
562                            StringRef Mnemonic) {
563
564     SMLoc S = Parser.getTok().getLoc();
565     int RegNo = -1;
566
567     RegNo = tryParseRegister(Mnemonic);
568     if (RegNo == -1)
569         return true;
570
571     Operands.push_back(Cpu0Operand::CreateReg(RegNo, S,
572                                             Parser.getTok().getLoc()));
573     Parser.Lex(); // Eat register token.
574     return false;
575 }
576
577 bool Cpu0AsmParser::ParseOperand(SmallVectorImpl<MCParsedAsmOperand*>&Operands,
578                                     StringRef Mnemonic) {
579     // Check if the current operand has a custom associated parser, if so, try to
580     // custom parse the operand, or fallback to the general approach.
581     OperandMatchResultTy ResTy = MatchOperandParserImpl(Operands, Mnemonic);
582     if (ResTy == MatchOperand_Success)
583         return false;
584     // If there wasn't a custom match, try the generic matcher below. Otherwise,
585     // there was a match, but an error occurred, in which case, just return that
586     // the operand parsing failed.
587     if (ResTy == MatchOperand_ParseFail)
588         return true;
589
590     switch (getLexer().getKind()) {
591     default:
592         Error(Parser.getTok().getLoc(), "unexpected token in operand");
593         return true;
594     case AsmToken::Dollar: {
595         // parse register
596         SMLoc S = Parser.getTok().getLoc();
597         Parser.Lex(); // Eat dollar token.
598         // parse register operand
599         if (!tryParseRegisterOperand(Operands, Mnemonic)) {
600             if (getLexer().is(AsmToken::LParen)) {
601                 // check if it is indexed addressing operand
602                 Operands.push_back(Cpu0Operand::CreateToken("(", S));
603                 Parser.Lex(); // eat parenthesis
604                 if (getLexer().isNot(AsmToken::Dollar))
605                     return true;
606
607                 Parser.Lex(); // eat dollar
608                 if (tryParseRegisterOperand(Operands, Mnemonic))
609                     return true;
610
611                 if (!getLexer().is(AsmToken::RParen))
612                     return true;
613                 S = Parser.getTok().getLoc();
614                 Operands.push_back(Cpu0Operand::CreateToken(")", S));
615                 Parser.Lex();
616             }
617             return false;
618         }
619         // maybe it is a symbol reference
620         StringRef Identifier;
621         if (Parser.parseIdentifier(Identifier))
622             return true;
623
624         SMLoc E = SMLoc::getFromPointer(Parser.getTok().getLoc().getPointer() - 1);
625
626         MCSymbol *Sym = getContext().GetOrCreateSymbol("$" + Identifier);
627
628         // Otherwise create a symbol ref.
629         const MCExpr *Res = MCSymbolRefExpr::Create(Sym, MCSymbolRefExpr::VK_None,
630                                                     getContext());
631
632         Operands.push_back(Cpu0Operand::CreateImm(Res, S, E));
633         return false;
634     }
635     case AsmToken::Identifier:
636     case AsmToken::LParen:
637     case AsmToken::Minus:
638     case AsmToken::Plus:
639     case AsmToken::Integer:
640     case AsmToken::String: {
641         // quoted label names
642         const MCExpr *IdVal;
643         SMLoc S = Parser.getTok().getLoc();
644         if (getParser().parseExpression(IdVal))
645             return true;
646         SMLoc E = SMLoc::getFromPointer(Parser.getTok().getLoc().getPointer() - 1);
647         Operands.push_back(Cpu0Operand::CreateImm(IdVal, S, E));
648         return false;
649     }
650     case AsmToken::Percent: {
651         // it is a symbol reference or constant expression
652         const MCExpr *IdVal;
653         SMLoc S = Parser.getTok().getLoc(); // start location of the operand
654         if (parseRelocOperand(IdVal))
655             return true;
656
657         SMLoc E = SMLoc::getFromPointer(Parser.getTok().getLoc().getPointer() - 1);
658
659         Operands.push_back(Cpu0Operand::CreateImm(IdVal, S, E));
660         return false;
661     } // case AsmToken::Percent
662     } // switch(getLexer().getKind())
663     return true;
664 }
665
666
667 bool Cpu0AsmParser::parseRelocOperand(const MCExpr *&Res) {
668
669     Parser.Lex(); // eat % token
670     const AsmToken &Tok = Parser.getTok(); // get next token, operation
671     if (Tok.isNot(AsmToken::Identifier))
672         return true;
673
674     std::string Str = Tok.getIdentifier().str();
675
676     Parser.Lex(); // eat identifier
677     // now make expression from the rest of the operand
678     const MCExpr *IdVal;
679     SMLoc EndLoc;
680
681     if (getLexer().getKind() == AsmToken::LParen) {
682         while (1) {
683             Parser.Lex(); // eat '(' token
684             if (getLexer().getKind() == AsmToken::Percent) {
685                 Parser.Lex(); // eat % token
686                 const AsmToken &nextTok = Parser.getTok();
687                 if (nextTok.isNot(AsmToken::Identifier))
688                     return true;
689                 Str += "%";
690                 Str += nextTok.getIdentifier();
691                 Parser.Lex(); // eat identifier
692                 if (getLexer().getKind() != AsmToken::LParen)
693                     return true;
694             } else
695                 break;
696         }
697         if (getParser().parseParenExpression(IdVal, EndLoc))
698             return true;
699
700         while (getLexer().getKind() == AsmToken::RParen)
701             Parser.Lex(); // eat ')' token
702
703     } else
704         return true; // parenthesis must follow reloc operand
705
706     // Check the type of the expression
707     if (const MCConstantExpr *MCE = dyn_cast<MCConstantExpr>(IdVal)) {
708         // it's a constant, evaluate lo or hi value
709         int Val = MCE->getValue();
710         if (Str == "lo") {
711             Val = Val & 0xffff;
712         } else if (Str == "hi") {
713             Val = (Val & 0xffff0000) >> 16;
714         }
715         Res = MCConstantExpr::Create(Val, getContext());
716         return false;
717     }
718
719     if (const MCSymbolRefExpr *MSRE = dyn_cast<MCSymbolRefExpr>(IdVal)) {
720         // it's a symbol, create symbolic expression from symbol
721         StringRef Symbol = MSRE->getSymbol().getName();
722         MCSymbolRefExpr::VariantKind VK = getVariantKind(Str);
723         Res = MCSymbolRefExpr::Create(Symbol, VK, getContext());
724         return false;
725     }
726     return true;
727 }
728
729 bool Cpu0AsmParser::ParseRegister(unsigned &RegNo, SMLoc &StartLoc,
730                                     SMLoc &EndLoc) {
731
732     StartLoc = Parser.getTok().getLoc();
733     RegNo = tryParseRegister("");
734     EndLoc = Parser.getTok().getLoc();
735     return (RegNo == (unsigned)-1);
736 }
737
738 bool Cpu0AsmParser::parseMemOffset(const MCExpr *&Res) {
739
740     SMLoc S;
741
742     switch(getLexer().getKind()) {
743     default:
744         return true;
745     case AsmToken::Integer:
746     case AsmToken::Minus:
747     case AsmToken::Plus:
748         return (getParser().parseExpression(Res));
749     case AsmToken::Percent:
750         return parseRelocOperand(Res);
751     case AsmToken::LParen:
752         return false; // it's probably assuming 0
753     }
754     return true;
755 }
756
757 // eg, 12($sp) or 12(la)
758 Cpu0AsmParser::OperandMatchResultTy Cpu0AsmParser::parseMemOperand(
759                         SmallVectorImpl<MCParsedAsmOperand*>&Operands) {
760
761     const MCExpr *IdVal = 0;
762     SMLoc S;
763     // first operand is the offset
764     S = Parser.getTok().getLoc();
765
766     if (parseMemOffset(IdVal))
767         return MatchOperand_ParseFail;
768
769     const AsmToken &Tok = Parser.getTok(); // get next token
770     if (Tok.isNot(AsmToken::LParen)) {
771         Cpu0Operand *Mnemonic = static_cast<Cpu0Operand*>(Operands[0]);
772         if (Mnemonic->getToken() == "la") {
773             SMLoc E = SMLoc::getFromPointer(Parser.getTok().getLoc().getPointer()-1);
774             Operands.push_back(Cpu0Operand::CreateImm(IdVal, S, E));
775             return MatchOperand_Success;
776         }
777         Error(Parser.getTok().getLoc(), "'(' expected");
778         return MatchOperand_ParseFail;
779     }
780
781     Parser.Lex(); // Eat '(' token.
782
783     const AsmToken &Tok1 = Parser.getTok(); // get next token
784     if (Tok1.is(AsmToken::Dollar)) {
785         Parser.Lex(); // Eat '$' token.
786         if (tryParseRegisterOperand(Operands, "")) {
787             Error(Parser.getTok().getLoc(), "unexpected token in operand");
788             return MatchOperand_ParseFail;
789         }
790
791     } else {
792         Error(Parser.getTok().getLoc(), "unexpected token in operand");
793         return MatchOperand_ParseFail;
794     }
795
796     const AsmToken &Tok2 = Parser.getTok(); // get next token
797     if (Tok2.isNot(AsmToken::RParen)) {
798         Error(Parser.getTok().getLoc(), "')' expected");
799         return MatchOperand_ParseFail;
800     }
801
802     SMLoc E = SMLoc::getFromPointer(Parser.getTok().getLoc().getPointer() - 1);
803
804     Parser.Lex(); // Eat ')' token.
805
806     if (IdVal == 0)
807         IdVal = MCConstantExpr::Create(0, getContext());
808
809     // now replace register operand with the mem operand
810     Cpu0Operand* op = static_cast<Cpu0Operand*>(Operands.back());
811     int RegNo = op->getReg();
812     // remove register from operands
813     Operands.pop_back();
814     // and add memory operand
815     Operands.push_back(Cpu0Operand::CreateMem(RegNo, IdVal, S, E));
816     delete op;
817     return MatchOperand_Success;
818 }
819
820 MCSymbolRefExpr::VariantKind Cpu0AsmParser::getVariantKind(StringRef Symbol) {
821
822     MCSymbolRefExpr::VariantKind VK
823         = StringSwitch<MCSymbolRefExpr::VariantKind>(Symbol)
824             .Case("hi", MCSymbolRefExpr::VK_Cpu0_ABS_HI)
825             .Case("lo", MCSymbolRefExpr::VK_Cpu0_ABS_LO)
826             .Case("gp_rel", MCSymbolRefExpr::VK_Cpu0_GPREL)
827             .Case("call24", MCSymbolRefExpr::VK_Cpu0_GOT_CALL)
828             .Case("got", MCSymbolRefExpr::VK_Cpu0_GOT)
829             .Case("tlsgd", MCSymbolRefExpr::VK_Cpu0_TLSGD)
830             .Case("tlsldm", MCSymbolRefExpr::VK_Cpu0_TLSLDM)
831             .Case("dtprel_hi", MCSymbolRefExpr::VK_Cpu0_DTPREL_HI)
832             .Case("dtprel_lo", MCSymbolRefExpr::VK_Cpu0_DTPREL_LO)
833             .Case("gottprel", MCSymbolRefExpr::VK_Cpu0_GOTTPREL)
834             .Case("tprel_hi", MCSymbolRefExpr::VK_Cpu0_TPREL_HI)
835             .Case("tprel_lo", MCSymbolRefExpr::VK_Cpu0_TPREL_LO)
836             .Case("got_disp", MCSymbolRefExpr::VK_Cpu0_GOT_DISP)
837             .Case("got_page", MCSymbolRefExpr::VK_Cpu0_GOT_PAGE)
838             .Case("got_ofst", MCSymbolRefExpr::VK_Cpu0_GOT_OFST)
839             .Case("hi(%neg(%gp_rel)", MCSymbolRefExpr::VK_Cpu0_GPOFF_HI)
840             .Case("lo(%neg(%gp_rel)", MCSymbolRefExpr::VK_Cpu0_GPOFF_LO)
841             .Default(MCSymbolRefExpr::VK_None);
842
843     return VK;
844 }
845
846 bool Cpu0AsmParser::
847 parseMathOperation(StringRef Name, SMLoc NameLoc,
848                     SmallVectorImpl<MCParsedAsmOperand*> &Operands) {
849     // split the format
850     size_t Start = Name.find('.'), Next = Name.rfind('.');
851     StringRef Format1 = Name.slice(Start, Next);
852     // and add the first format to the operands
853     Operands.push_back(Cpu0Operand::CreateToken(Format1, NameLoc));
854     // now for the second format
855     StringRef Format2 = Name.slice(Next, StringRef::npos);
856     Operands.push_back(Cpu0Operand::CreateToken(Format2, NameLoc));
857
858     // set the format for the first register
859     // setFpFormat(Format1);
860
861     // Read the remaining operands.
862     if (getLexer().isNot(AsmToken::EndOfStatement)) {
863         // Read the first operand.
864         if (ParseOperand(Operands, Name)) {
865             SMLoc Loc = getLexer().getLoc();
866             Parser.eatToEndOfStatement();
867             return Error(Loc, "unexpected token in argument list");
868         }
869
870         if (getLexer().isNot(AsmToken::Comma)) {
871             SMLoc Loc = getLexer().getLoc();
872             Parser.eatToEndOfStatement();
873             return Error(Loc, "unexpected token in argument list");
874
875         }
876         Parser.Lex(); // Eat the comma.
877
878         // Parse and remember the operand.
879         if (ParseOperand(Operands, Name)) {
880             SMLoc Loc = getLexer().getLoc();
881             Parser.eatToEndOfStatement();
882             return Error(Loc, "unexpected token in argument list");
883         }
884     }
885
886     if (getLexer().isNot(AsmToken::EndOfStatement)) {
887         SMLoc Loc = getLexer().getLoc();
888         Parser.eatToEndOfStatement();
889         return Error(Loc, "unexpected token in argument list");
890     }
891
892     Parser.Lex(); // Consume the EndOfStatement
893     return false;
894 }
895
896 bool Cpu0AsmParser::
897 ParseInstruction(ParseInstructionInfo &Info, StringRef Name, SMLoc NameLoc,
898                     SmallVectorImpl<MCParsedAsmOperand*> &Operands) {
899
900     // Create the leading tokens for the mnemonic, split by '.' characters.
901     size_t Start = 0, Next = Name.find('.');
902     StringRef Mnemonic = Name.slice(Start, Next);
903     Operands.push_back(Cpu0Operand::CreateToken(Mnemonic, NameLoc));
904
905     // Read the remaining operands.
906     if (getLexer().isNot(AsmToken::EndOfStatement)) {
907         // Read the first operand.
908         if (ParseOperand(Operands, Name)) {
909             SMLoc Loc = getLexer().getLoc();
910             Parser.eatToEndOfStatement();
911             return Error(Loc, "unexpected token in argument list");
912         }
913     }
914
915     while (getLexer().is(AsmToken::Comma)) {
916         Parser.Lex(); // Eat the comma.
917
918         // Parse and remember the operand.
919         if (ParseOperand(Operands, Name)) {
920             SMLoc Loc = getLexer().getLoc();
921             Parser.eatToEndOfStatement();
922             return Error(Loc, "unexpected token in argument list");
923         }
924     }
925
926
927     if (getLexer().isNot(AsmToken::EndOfStatement)) {
928         SMLoc Loc = getLexer().getLoc();
929         Parser.eatToEndOfStatement();
930         return Error(Loc, "unexpected token in argument list");
931     }
932
933     Parser.Lex(); // Consume the EndOfStatement
934     return false;
935 }
936
937 bool Cpu0AsmParser::reportParseError(StringRef ErrorMsg) {
938     SMLoc Loc = getLexer().getLoc();
939     Parser.eatToEndOfStatement();
940     return Error(Loc, ErrorMsg);
941 }
942
943 bool Cpu0AsmParser::parseSetReorderDirective() {
944     Parser.Lex();
945     // if this is not the end of the statement, report error
946     if (getLexer().isNot(AsmToken::EndOfStatement)) {
947         reportParseError("unexpected token in statement");
948         return false;
949     }
950     Options.setReorder();
951     Parser.Lex(); // Consume the EndOfStatement
952     return false;
953 }
954
955 bool Cpu0AsmParser::parseSetNoReorderDirective() {
956     Parser.Lex();
957     // if this is not the end of the statement, report error
958     if (getLexer().isNot(AsmToken::EndOfStatement)) {
959         reportParseError("unexpected token in statement");
960         return false;
961     }
962     Options.setNoreorder();
963     Parser.Lex(); // Consume the EndOfStatement
964     return false;
965 }
966
967 bool Cpu0AsmParser::parseSetMacroDirective() {
968     Parser.Lex();
969     // if this is not the end of the statement, report error
970     if (getLexer().isNot(AsmToken::EndOfStatement)) {
971         reportParseError("unexpected token in statement");
972         return false;
973     }
974     Options.setMacro();
975     Parser.Lex(); // Consume the EndOfStatement
976     return false;
977 }
978
979 bool Cpu0AsmParser::parseSetNoMacroDirective() {
980     Parser.Lex();
981     // if this is not the end of the statement, report error
982     if (getLexer().isNot(AsmToken::EndOfStatement)) {
983         reportParseError("'noreorder' must be set before 'nomacro'");
984         return false;
985     }
986     if (Options.isReorder()) {
987         reportParseError("'noreorder' must be set before 'nomacro'");
988         return false;
989     }
990     Options.setNomacro();
991     Parser.Lex(); // Consume the EndOfStatement
992     return false;
993 }
994 bool Cpu0AsmParser::parseDirectiveSet() {
995
996     // get next token
997     const AsmToken &Tok = Parser.getTok();
998
999     if (Tok.getString() == "reorder") {
1000         return parseSetReorderDirective();
1001     } else if (Tok.getString() == "noreorder") {
1002         return parseSetNoReorderDirective();
1003     } else if (Tok.getString() == "macro") {
1004         return parseSetMacroDirective();
1005     } else if (Tok.getString() == "nomacro") {
1006         return parseSetNoMacroDirective();
1007     }
1008     return true;
1009 }
1010
1011 bool Cpu0AsmParser::ParseDirective(AsmToken DirectiveID) {
1012
1013     if (DirectiveID.getString() == ".ent") {
1014         // ignore this directive for now
1015         Parser.Lex();
1016         return false;
1017     }
1018
1019     if (DirectiveID.getString() == ".end") {
1020         // ignore this directive for now
1021         Parser.Lex();
1022         return false;
1023     }
1024
1025     if (DirectiveID.getString() == ".frame") {
1026         // ignore this directive for now
1027         Parser.eatToEndOfStatement();
1028         return false;
1029     }
1030
1031     if (DirectiveID.getString() == ".set") {
1032         return parseDirectiveSet();
1033     }
1034
1035     if (DirectiveID.getString() == ".fmask") {
1036         // ignore this directive for now
1037         Parser.eatToEndOfStatement();
1038         return false;
1039     }
1040
1041     if (DirectiveID.getString() == ".mask") {
1042         // ignore this directive for now
1043         Parser.eatToEndOfStatement();
1044         return false;
1045     }
1046
1047     if (DirectiveID.getString() == ".gpword") {
1048         // ignore this directive for now
1049         Parser.eatToEndOfStatement();
1050         return false;
1051     }
1052
1053     return true;
1054 }
1055
1056 extern "C" void LLVMInitializeCpu0AsmParser() {
1057     RegisterMCAsmParser<Cpu0AsmParser> X(TheCpu0Target);
1058     RegisterMCAsmParser<Cpu0AsmParser> Y(TheCpu0elTarget);
1059 }
1060
1061 #define GET_REGISTER_MATCHER
1062 #define GET_MATCHER_IMPLEMENTATION
1063 #include "Cpu0GenAsmMatcher.inc"
```
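The `expandLoadImm()` routine in the listing above splits a 32-bit immediate into two 16-bit halves when it does not fit in a signed 16-bit field. The following is a minimal standalone sketch of that arithmetic only. The helper names `hi16`, `lo16` and `expandLoadImmSketch` are hypothetical and not part of the tutorial code, the register spellings `$zero` and `$at` are chosen for illustration, and the sketch does not model how the hardware extends 16-bit immediates.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Upper and lower 16-bit halves of a 32-bit pattern.
static uint32_t hi16(uint32_t Bits) { return (Bits & 0xffff0000u) >> 16; }
static uint32_t lo16(uint32_t Bits) { return Bits & 0x0000ffffu; }

// Hypothetical helper, not part of the tutorial code: returns the assembly
// text that "li Dst, J" expands to, following the comments in expandLoadImm().
std::vector<std::string> expandLoadImmSketch(const std::string &Dst, int32_t J) {
    std::vector<std::string> Out;
    if (-32768 <= J && J <= 32767) {
        // Fits in a signed 16-bit immediate: a single addiu is enough.
        Out.push_back("addiu " + Dst + ", $zero, " + std::to_string(J));
        return Out;
    }
    const uint32_t Bits = static_cast<uint32_t>(J);
    const uint32_t Hi = hi16(Bits);
    const uint32_t Lo = lo16(Bits);
    // The two halves must recombine into the original bit pattern.
    assert(((Hi << 16) | Lo) == Bits);
    Out.push_back("addiu " + Dst + ", $zero, " + std::to_string(Hi));
    Out.push_back("shl " + Dst + ", " + Dst + ", 16");
    Out.push_back("addiu $at, $zero, " + std::to_string(Lo));
    Out.push_back("or " + Dst + ", " + Dst + ", $at");
    return Out;
}
```

Calling `expandLoadImmSketch("$2", 100)` returns one line, while `expandLoadImmSketch("$2", 0x12345678)` returns four lines that build the value from `0x1234` and `0x5678`.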

### LLVMBackendTutorialExampleCode/Chapter10\_1/AsmParser/CMakeLists.txt

```
include_directories( ${CMAKE_CURRENT_BINARY_DIR}/.. ${CMAKE_CURRENT_SOURCE_DIR}/.. )
add_llvm_library(LLVMCpu0AsmParser
    Cpu0AsmParser.cpp
)
add_dependencies(LLVMCpu0AsmParser Cpu0CommonTableGen)
```

### LLVMBackendTutorialExampleCode/Chapter10\_1/AsmParser/LLVMBuild.txt

```
;===- ./lib/Target/Cpu0/AsmParser/LLVMBuild.txt ----------------*- Conf -*--===;
;
;                     The LLVM Compiler Infrastructure
;
; This file is distributed under the University of Illinois Open Source
; License. See LICENSE.TXT for details.
;
;===------------------------------------------------------------------------===;
;
; This is an LLVMBuild description file for the components in this subdirectory.
;
; For more information on the LLVMBuild system, please see:
;
;   http://llvm.org/docs/LLVMBuild.html
;
;===------------------------------------------------------------------------===;

[component_0]
type = Library
name = Cpu0AsmParser
parent = Cpu0
required_libraries = MC MCParser Support Cpu0Desc Cpu0Info
add_to_library_groups = Cpu0
```

Cpu0AsmParser.cpp contains about one thousand lines of code that parse the assembly language. With a little patience you can follow it. To have the AsmParser directory built, modify CMakeLists.txt and LLVMBuild.txt as follows:

### LLVMBackendTutorialExampleCode/Chapter10\_1/CMakeLists.txt

```

tablegen(LLVM Cpu0GenAsmMatcher.inc -gen-asm-matcher)
...
add_subdirectory(AsmParser)

```

### LLVMBackendTutorialExampleCode/Chapter10\_1/LLVMBuild.txt

```

subdirectories = AsmParser ...
...
has_asmparser = 1

```

The other files change as follows:

### LLVMBackendTutorialExampleCode/Chapter10\_1/MCTargetDesc/Cpu0MCCodeEmitter.cpp

```

unsigned Cpu0MCCodeEmitter::
getBranchTargetOpValue(const MCInst &MI, unsigned OpNo,
    SmallVectorImpl<MCFixup> &Fixups) const {
    ...
    // If the destination is an immediate, we have nothing to do.
    if (MO.isImm()) return MO.getImm();
    ...
}

/// getJumpAbsoluteTargetOpValue - Return binary encoding of the jump
/// target operand. Such as SWI.
unsigned Cpu0MCCodeEmitter::
getJumpAbsoluteTargetOpValue(const MCInst &MI, unsigned OpNo,
                           SmallVectorImpl<MCFixup> &Fixups) const {
    ...
    // If the destination is an immediate, we have nothing to do.
    if (MO.isImm()) return MO.getImm();
    ...
}
```
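The two functions above gain an early return for immediate operands, since a target written as a plain number in assembly needs no further resolution. The sketch below illustrates that decision in isolation. The types `SketchOperand` and `SketchFixup` are hypothetical stand-ins, not LLVM classes. The source elides the non-immediate path, so the sketch assumes, as is common in LLVM code emitters, that a symbolic target is recorded as a fixup and encoded as 0 for the moment.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an instruction operand: either an immediate
// value or a reference to a named symbol.
struct SketchOperand {
    bool IsImm;
    long Imm;
    std::string Symbol;
};

// Hypothetical stand-in for a fixup: a note that a symbol must be
// resolved later.
struct SketchFixup {
    std::string Symbol;
};

unsigned branchTargetOpValueSketch(const SketchOperand &MO,
                                   std::vector<SketchFixup> &Fixups) {
    // If the destination is an immediate, there is nothing to resolve.
    if (MO.IsImm)
        return static_cast<unsigned>(MO.Imm);
    // Assumption: a symbolic target is deferred by recording a fixup.
    assert(!MO.Symbol.empty() && "expected a symbolic target");
    Fixups.push_back(SketchFixup{MO.Symbol});
    return 0;
}
```

With an immediate operand of 20 the function returns 20 and records nothing; with a symbol operand it returns 0 and leaves one entry in `Fixups`.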

### LLVMBackendTutorialExampleCode/Chapter10\_1/Cpu0.td

```
def Cpu0AsmParser : AsmParser {
    let ShouldEmitMatchRegisterName = 0;
}

def Cpu0AsmParserVariant : AsmParserVariant {
    int Variant = 0;

    // Recognize hard coded registers.
    string RegisterPrefix = "$";
}

def Cpu0 : Target {
    ...
    let AssemblyParsers = [Cpu0AsmParser];
    ...
    let AssemblyParserVariants = [Cpu0AsmParserVariant];
}
```
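Setting `ShouldEmitMatchRegisterName = 0` asks TableGen not to generate a register-name matcher, which is presumably why Cpu0AsmParser.cpp supplies `matchRegisterName()` and `matchRegisterByNumber()` by hand, and `RegisterPrefix = "$"` marks the character that introduces a register. The standalone sketch below mimics that lookup. The function `matchRegisterSketch` is hypothetical, and it assumes registers are numbered 0 to 15 in the order of the `.Case` list in `matchRegisterName()`. The real parser returns LLVM register enum values rather than plain indices.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: maps "$sp", "$SP" or "$13" to a register index,
// or returns -1 when the text does not name a register.
int matchRegisterSketch(const std::string &Text) {
    static const char *const Names[16] = {
        "zero", "at", "v0", "v1", "a0", "a1", "t9", "s0",
        "s1",   "s2", "gp", "fp", "sw", "sp", "lr", "pc"};
    // The register prefix is mandatory.
    if (Text.size() < 2 || Text[0] != '$')
        return -1;
    const std::string Body = Text.substr(1);

    bool AllDigits = true;
    for (char C : Body)
        if (!std::isdigit(static_cast<unsigned char>(C)))
            AllDigits = false;
    if (AllDigits) {
        // Numeric form: only 0..15 are valid, so at most two digits.
        if (Body.size() > 2)
            return -1;
        const int Num = std::stoi(Body);
        assert(Num >= 0);
        return Num <= 15 ? Num : -1;
    }

    // Symbolic form: compare case-insensitively, as the parser lowercases
    // the token before matching.
    std::string Lower;
    for (char C : Body)
        Lower += static_cast<char>(std::tolower(static_cast<unsigned char>(C)));
    for (int I = 0; I < 16; ++I)
        if (Lower == Names[I])
            return I;
    return -1;
}
```

Under the numbering assumption above, `matchRegisterSketch("$sp")` and `matchRegisterSketch("$13")` both give 13, while `matchRegisterSketch("$16")` is rejected.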

### LLVMBackendTutorialExampleCode/Chapter10\_1/Cpu0InstrFormats.td

```
// Pseudo-instructions for alternate assembly syntax (never used by codegen).
// These are aliases that require C++ handling to convert to the target
// instruction, while InstAliases can be handled directly by tblgen.
class Cpu0AsmPseudoInst<dag outs, dag ins, string asmstr>:
    Cpu0Inst<outs, ins, asmstr, [], IIPseudo, Pseudo> {
        let isPseudo = 1;
        let Pattern = [];
    }
```
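`Cpu0AsmPseudoInst` describes instructions that exist only in assembly syntax. In Cpu0AsmParser.cpp, `MatchAndEmitInstruction()` checks `needsExpansion()` after a successful match and streams out the expanded sequence in place of the pseudo-instruction. The sketch below reduces that control flow to a few lines. The `SketchOp` enumeration and both function names are hypothetical, and only the `li` case is modelled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical opcode set; the real enumerators are generated by TableGen.
enum class SketchOp { ADDiu, SHL, OR, LD, LoadImm32Reg };

// Only the pseudo-instruction needs C++ expansion in this sketch.
bool needsExpansionSketch(SketchOp Op) {
    return Op == SketchOp::LoadImm32Reg;
}

// Returns the opcodes that would be handed to the output streamer.
std::vector<SketchOp> matchAndEmitSketch(SketchOp Matched, bool ImmFitsIn16Bits) {
    std::vector<SketchOp> Emitted;
    if (!needsExpansionSketch(Matched)) {
        // A real instruction is emitted unchanged.
        Emitted.push_back(Matched);
        return Emitted;
    }
    if (ImmFitsIn16Bits) {
        Emitted.push_back(SketchOp::ADDiu);
    } else {
        Emitted = {SketchOp::ADDiu, SketchOp::SHL, SketchOp::ADDiu, SketchOp::OR};
    }
    // The pseudo-instruction itself never reaches the output.
    for (SketchOp Op : Emitted)
        assert(!needsExpansionSketch(Op));
    return Emitted;
}
```

A matched `LD` passes through as one opcode, whereas a matched `LoadImm32Reg` becomes either one or four real opcodes depending on the size of the immediate.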

### LLVMBackendTutorialExampleCode/Chapter10\_1/Cpu0InstrInfo.td

```
// Cpu0InstrInfo.td
def Cpu0MemAsmOperand : AsmOperandClass {
    let Name = "Mem";
    let ParserMethod = "parseMemOperand";
}

// Address operand
def mem : Operand<i32> {
    ...
    let ParserMatchClass = Cpu0MemAsmOperand;
}

...
class CmpInstr<...
    !strconcat(instr_asm, "\t$rc, $ra, $rb"), [], itin> {
...
}

...
class CBranch<...
    !strconcat(instr_asm, "\t$ra, $addr"), ...> {
...
}

...
//=====
// Pseudo Instruction definition
//=====

class LoadImm32< string instr_asm, Operand Od, RegisterClass RC> :
    Cpu0AsmPseudoInst<(outs RC:$ra), (ins Od:$imm32),
    !strconcat(instr_asm, "\t$ra, $imm32")> ;
def LoadImm32Reg : LoadImm32<"li", shamt, CPURegs>;

class LoadAddress<string instr_asm, Operand MemOpnd, RegisterClass RC> :
    Cpu0AsmPseudoInst<(outs RC:$ra), (ins MemOpnd:$addr),
    !strconcat(instr_asm, "\t$ra, $addr")> ;
def LoadAddr32Reg : LoadAddress<"la", mem, CPURegs>;

class LoadAddressImm<string instr_asm, Operand Od, RegisterClass RC> :
    Cpu0AsmPseudoInst<(outs RC:$ra), (ins Od:$imm32),
    !strconcat(instr_asm, "\t$ra, $imm32")> ;
def LoadAddr32Imm : LoadAddressImm<"la", shamt, CPURegs>;

```

The definitions above set **ParserMethod = “parseMemOperand”**, and `parseMemOperand()` is implemented in `Cpu0AsmParser.cpp` to handle the **mem** operand used by `ld` and `st`. For example, in `ld $2, 4($sp)` the **mem** operand is `4($sp)`. Together with **“let ParserMatchClass = Cpu0MemAsmOperand;”**, this makes LLVM call `parseMemOperand()` in `Cpu0AsmParser.cpp` whenever it meets an assembly **mem** operand such as `4($sp)`. From these **“let”** assignments, `TableGen` generates the following structure and functions in `Cpu0GenAsmMatcher.inc`.

### cmake\_debug\_build/lib/Target/Cpu0/Cpu0GenAsmMatcher.inc

```

enum OperandMatchResultTy {
    MatchOperand_Success,      // operand matched successfully
    MatchOperand_NoMatch,      // operand did not match
    MatchOperand_ParseFail    // operand matched but had errors
};

OperandMatchResultTy MatchOperandParserImpl(
    SmallVectorImpl<MCParsedAsmOperand*> &Operands,
    StringRef Mnemonic);
OperandMatchResultTy tryCustomParseOperand(
    SmallVectorImpl<MCParsedAsmOperand*> &Operands,
    unsigned MCK);

Cpu0AsmParser::OperandMatchResultTy Cpu0AsmParser::
tryCustomParseOperand(SmallVectorImpl<MCParsedAsmOperand*> &Operands,
    unsigned MCK) {

switch(MCK) {
case MCK_Mem:
    return parseMemOperand(Operands);
default:
    return MatchOperand_NoMatch;
}
return MatchOperand_NoMatch;
}

Cpu0AsmParser::OperandMatchResultTy Cpu0AsmParser::
MatchOperandParserImpl(SmallVectorImpl<MCParsedAsmOperand*> &Operands,
                      StringRef Mnemonic) {
    ...
}

/// MatchClassKind - The kinds of classes which participate in
/// instruction matching.
enum MatchClassKind {
    ...
    MCK_Mem, // user defined class 'Cpu0MemAsmOperand'
    ...
};
```
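To make the shape of a **mem** operand concrete, here is a minimal standalone sketch that splits text such as `4($sp)` into an offset and a base register. It is only an illustration: `MemOperand` and `parseMemOperandText` are hypothetical names, and the real `parseMemOperand()` works on lexer tokens and builds `MCParsedAsmOperand` objects instead of parsing a string.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical result type for this sketch only.
struct MemOperand {
    long offset;          // displacement before the parenthesis, e.g. 4
    std::string baseReg;  // register name without the '$' prefix, e.g. "sp"
};

// Parse text of the form "<offset>($<reg>)". Returns false on any mismatch.
bool parseMemOperandText(const std::string &text, MemOperand &out) {
    std::size_t open = text.find('(');
    if (open == std::string::npos || open == 0)
        return false;                       // no offset or no parenthesis
    if (text.size() < open + 4 || text[text.size() - 1] != ')')
        return false;                       // need at least "($x)"
    if (text[open + 1] != '$')
        return false;                       // registers use the "$" prefix

    // The offset must be a complete integer (decimal, hex or octal).
    std::string offsetText = text.substr(0, open);
    char *end = 0;
    long offset = std::strtol(offsetText.c_str(), &end, 0);
    if (end == offsetText.c_str() || *end != '\0')
        return false;

    out.offset = offset;
    out.baseReg = text.substr(open + 2, text.size() - 1 - (open + 2));
    return true;
}
```

For `4($sp)` this yields offset 4 and base register `sp`; input without the `$` prefix or without parentheses is rejected.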

The three pseudo-instruction definitions above in Cpu0InstrInfo.td, such as LoadImm32Reg, are handled by Cpu0AsmParser.cpp as follows.

### LLVMBackendTutorialExampleCode/Chapter10\_1/AsmParser/Cpu0AsmParser.cpp

```
bool Cpu0AsmParser::needsExpansion(MCInst &Inst) {

    switch(Inst.getOpcode()) {
    case Cpu0::LoadImm32Reg:
    case Cpu0::LoadAddr32Imm:
    case Cpu0::LoadAddr32Reg:
        return true;
    default:
        return false;
    }
}

void Cpu0AsmParser::expandInstruction(MCInst &Inst, SMLoc IDLoc,
                                      SmallVectorImpl<MCInst> &Instructions) {
    switch(Inst.getOpcode()) {
    case Cpu0::LoadImm32Reg:
        return expandLoadImm(Inst, IDLoc, Instructions);
    case Cpu0::LoadAddr32Imm:
        return expandLoadAddressImm(Inst, IDLoc, Instructions);
    case Cpu0::LoadAddr32Reg:
        return expandLoadAddressReg(Inst, IDLoc, Instructions);
    }
}

bool Cpu0AsmParser::
MatchAndEmitInstruction(SMLoc IDLoc, unsigned &Opcode,
                      SmallVectorImpl<MCParsedAsmOperand*> &Operands,
                      MCStreamer &Out, unsigned &ErrorInfo,
                      bool MatchingInlineAsm) {
    MCInst Inst;
    unsigned MatchResult = MatchInstructionImpl(Operands, Inst, ErrorInfo,
                                                MatchingInlineAsm);

    switch (MatchResult) {
    default: break;
    case Match_Success: {
        if (needsExpansion(Inst)) {
            SmallVector<MCInst, 4> Instructions;
            expandInstruction(Inst, IDLoc, Instructions);
            ...
        }
        ...
    }

```
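As a rough illustration of what such an expansion produces, the sketch below builds the four-instruction pattern (`addiu`, `shl`, `addiu`, `or`) that appears in the disassembly listing later in this section. The function name `expandLoadImmSketch` is hypothetical, and it is an assumption that this pattern comes from the `li` pseudo instruction. The sketch also does not model sign extension of 16-bit immediates, so it is only meaningful when the low half is below 0x8000.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: expand "li $<reg>, imm32" into textual instructions.
// The real expansion emits MCInst objects instead of strings.
std::vector<std::string> expandLoadImmSketch(const std::string &reg,
                                             uint32_t imm) {
    uint32_t hi = imm >> 16;      // upper 16 bits of the immediate
    uint32_t lo = imm & 0xFFFFu;  // lower 16 bits of the immediate

    std::vector<std::string> out;
    // Load the upper half, then move it into the top 16 bits.
    out.push_back("addiu $" + reg + ", $zero, " + std::to_string(hi));
    out.push_back("shl $" + reg + ", $" + reg + ", 16");
    // Load the lower half into the assembler temporary and merge it.
    out.push_back("addiu $at, $zero, " + std::to_string(lo));
    out.push_back("or $" + reg + ", $" + reg + ", $at");
    return out;
}
```

Calling `expandLoadImmSketch("3", 0x00700000)` gives a sequence that starts with `addiu $3, $zero, 112`, the same shape as the instructions at offsets 84 to 90 of the listing.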

Finally, note that the registers in CPURegs, shown below, must be listed in register-number order, because AsmParser relies on this order when it encodes register numbers.

### LLVMBackendTutorialExampleCode/Chapter10\_1/Cpu0RegisterInfo.td

```

//=====//
// The register string, such as "9" or "gp" will show on "llvm-objdump -d"
let Namespace = "Cpu0" in {
    // General Purpose Registers
    def ZERO : Cpu0GPRReg< 0, "zero">, DwarfRegNum<[0]>;
    def AT   : Cpu0GPRReg< 1, "1">,    DwarfRegNum<[1]>;
    def V0   : Cpu0GPRReg< 2, "2">,    DwarfRegNum<[2]>;
    def V1   : Cpu0GPRReg< 3, "3">,    DwarfRegNum<[3]>;
    def A0   : Cpu0GPRReg< 4, "4">,    DwarfRegNum<[4]>;
    def A1   : Cpu0GPRReg< 5, "5">,    DwarfRegNum<[5]>;
    def T9   : Cpu0GPRReg< 6, "t9">,   DwarfRegNum<[6]>;
    def S0   : Cpu0GPRReg< 7, "7">,    DwarfRegNum<[7]>;
    def S1   : Cpu0GPRReg< 8, "8">,    DwarfRegNum<[8]>;
    def S2   : Cpu0GPRReg< 9, "9">,    DwarfRegNum<[9]>;
    def GP   : Cpu0GPRReg< 10, "gp">,  DwarfRegNum<[10]>;
    def FP   : Cpu0GPRReg< 11, "fp">,  DwarfRegNum<[11]>;
    def SW   : Cpu0GPRReg< 12, "sw">,  DwarfRegNum<[12]>;
    def SP   : Cpu0GPRReg< 13, "sp">,  DwarfRegNum<[13]>;
    def LR   : Cpu0GPRReg< 14, "lr">,  DwarfRegNum<[14]>;
    def PC   : Cpu0GPRReg< 15, "pc">,  DwarfRegNum<[15]>;
//  def MAR : Register< 16, "mar">,  DwarfRegNum<[16]>;
//  def MDR : Register< 17, "mdr">,  DwarfRegNum<[17]>;

    // Hi/Lo registers
    def HI   : Register<"hi">, DwarfRegNum<[18]>;
    def LO   : Register<"lo">, DwarfRegNum<[19]>;
}
//=====
// Register Classes
//=====

def CPURegs : RegisterClass<"Cpu0", [i32], 32, (add
    // Reserved
    ZERO, AT,
    // Return Values and Arguments
    V0, V1, A0, A1,
    // Not preserved across procedure calls
    T9,
    // Callee save
    S0, S1, S2,
    // Reserved
    GP, FP,
    // Not preserved across procedure calls
    SW,
    // Reserved
    SP, LR, PC)>;

```
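The ordering requirement can be pictured with a small standalone sketch. It assumes, for illustration only, that the position of a register inside the class is used directly as its encoding; `kCpuRegsOrder` and `encodingFromClassIndex` are hypothetical names and not part of LLVM or of the generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// CPURegs members in the order listed in Cpu0RegisterInfo.td.
static const char *const kCpuRegsOrder[] = {
    "ZERO", "AT", "V0", "V1", "A0", "A1", "T9", "S0",
    "S1",   "S2", "GP", "FP", "SW", "SP", "LR", "PC"};

// Hypothetical lookup: the index in the class doubles as the encoding,
// which is only correct while the list stays in register-number order.
int encodingFromClassIndex(const std::string &defName) {
    const std::size_t n = sizeof(kCpuRegsOrder) / sizeof(kCpuRegsOrder[0]);
    for (std::size_t i = 0; i < n; ++i)
        if (defName == kCpuRegsOrder[i])
            return static_cast<int>(i);
    return -1;  // not a member of CPURegs (for example HI or LO)
}
```

With the order above, `SP` maps to 13 and `PC` to 15, matching the numbers given to `Cpu0GPRReg`. Swapping two entries in the list would silently change the encodings under this assumption.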

Running Chapter10\_1/ on ch10\_1.cpp gives the correct result, as follows.

```

JonathantekiiMac:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/bin/Debug/llc -march=cpu0 -relocation-model=pic -filetype=obj ch10_1.bc -o ch10_1.cpu0.o
JonathantekiiMac:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/bin/Debug/llvm-objdump -d ch10_1.cpu0.o

ch10_1.cpu0.o: file format ELF32-unknown

Disassembly of section .text:
.text:
 0: 00 2d 00 08          ld $2, 8($sp)
 4: 01 0d 00 04          st $zero, 4($sp)
 8: 09 30 00 00          addiu $3, $zero, 0
 c: 13 31 20 00          add $3, $at, $2
10: 14 32 30 00          sub $3, $2, $3
14: 15 21 30 00          mul $2, $at, $3
18: 16 32 00 00          div $3, $2
1c: 17 23 00 00          divu $2, $3
20: 18 21 30 00          and $2, $at, $3
24: 19 31 20 00          or $3, $at, $2
28: 1a 12 30 00          xor $at, $2, $3
2c: 50 43 00 00          mult $4, $3
30: 51 32 00 00          multu $3, $2
34: 40 30 00 00          mfhi $3
38: 41 20 00 00          mflo $2
3c: 42 20 00 00          mthi $2
40: 43 20 00 00          mtlo $2
44: 1b 22 00 02          sra $2, $2, 2
48: 1c 21 10 03          rol $2, $at, 3
4c: 1d 33 10 04          ror $3, $3, 4
50: 1e 22 00 02          shl $2, $2, 2
54: 1f 23 00 05          shr $2, $3, 5
58: 10 23 00 00          cmp $zero, $2, $3
5c: 20 00 00 14          jeq $zero, 20
60: 21 00 00 10          jne $zero, 16
64: 22 ff ff ec          jlt $zero, -20
68: 24 ff ff f0          jle $zero, -16
6c: 23 ff ff fc          jgt $zero, -4
70: 25 ff ff f4          jge $zero, -12
74: 2a 00 04 00          swi 1024
78: 2b 01 00 00          jsub 65536
7c: 2c e0 00 00          ret $lr
80: 2d e6 00 00          jalr $6
84: 09 30 00 70          addiu $3, $zero, 112
88: 1e 33 00 10          shl $3, $3, 16
8c: 09 10 00 00          addiu $at, $zero, 0
90: 19 33 10 00          or $3, $3, $at
94: 09 30 00 80          addiu $3, $zero, 128
98: 1e 36 00 10          shl $3, $6, 16
9c: 09 10 00 00          addiu $at, $zero, 0
a0: 19 36 10 00          or $3, $6, $at
a4: 13 33 60 00          add $3, $3, $6
a8: 09 30 00 90          addiu $3, $zero, 144
ac: 1e 33 00 10          shl $3, $3, 16
b0: 09 10 00 00          addiu $at, $zero, 0
b4: 19 33 10 00          or $3, $3, $at

```

For AsmParser support, we give cmp and jeq an explicit register operand, written \$sw in assembly and shown as \$zero in disassembly. Compared with the implicit form, this costs a little readability and convenience in assembly programming, but it is acceptable.

## 10.2 Verilog of CPU0

Verilog is an IEEE standard language for IC design. There are many books and documents about it; web site <sup>1</sup> provides a PDF tutorial <sup>2</sup>. The example code LLVMBackendTutorialExampleCode/cpu0\_verilog/raw/cpu0s.v is the cpu0 design in Verilog. In Appendix A, we downloaded and installed the Icarus Verilog tool on both iMac and Linux. cpu0s.v is a simple design with only 280 lines of code. Although it has no pipeline features, we can assume that the cpu0 backend code would also run on a pipelined machine, because a pipelined version uses the same machine instructions. Verilog has a C-like syntax and this book is a compiler book, so we list cpu0s.v and its build command directly below. We expect readers can understand the Verilog code with a little patience and without further explanation. There are two types of I/O: memory-mapped I/O and instruction I/O. CPU0 uses memory-mapped I/O, and we set memory address 0x7000 as the output port. When CPU0 meets the instruction “**st \$ra, cx(\$rb)**”, where cx(\$rb) is 0x7000 (28672), it displays the content as follows,

```

ST :
if (R[b]+c16 == 28672)
$display("%4dns %8x : %8x OUTPUT=%-d", $stime, pc0, ir, R[a]);

```
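The same idea can be modelled in a few lines of software. The sketch below is a hypothetical C++ model of the store path, not a translation of the Verilog: `StoreModel` and its members are illustrative names. A store whose effective address equals the output port is recorded as output, and any other store goes to ordinary memory.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

const uint32_t kIoAddr = 0x7000;  // output port address from the text (28672)

// Hypothetical software model of memory-mapped output.
struct StoreModel {
    std::map<uint32_t, int32_t> memory;  // ordinary word stores
    std::vector<int32_t> output;         // values written to the output port

    // Model "st $ra, cx($rb)": rb + cx selects memory or the output port.
    void store(int32_t ra, uint32_t rb, int32_t cx) {
        uint32_t addr = rb + static_cast<uint32_t>(cx);
        if (addr == kIoAddr)
            output.push_back(ra);  // corresponds to the OUTPUT= display line
        else
            memory[addr] = ra;
    }
};
```

For example, `store(146, 0, 0x7000)` appends 146 to `output`, while `store(5, 0x6ffc, 0)` writes an ordinary memory word.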

### LLVMBackendTutorialExampleCode/cpu0\_verilog/raw/cpu0s.v

```

`define MEMSIZE 'h7000
`define MEMEMPTY 8'hFF
`define IOADDR  'h7000

// Operand width
`define INT32 2'b11      // 32 bits
`define INT24 2'b10      // 24 bits
`define INT16 2'b01      // 16 bits
`define BYTE  2'b00      // 8  bits

// Reference web: http://ccckmit.wikidot.com/ocs:cpu0
module cpu0(input clock, reset, output reg [2:0] tick,
            output reg [31:0] ir, pc, mar, mdr, inout [31:0] dbus,
            output reg m_en, m_rw, output reg [1:0] m_size);
  reg signed [31:0] R [0:15], HI, LO;
  // High and Low part of 64 bit result
  reg [7:0] op;

```

<sup>1</sup> <http://www.ece.umd.edu/courses/enee359a/>

<sup>2</sup> <http://www.ece.umd.edu/courses/enee359a/verilog_tutorial.pdf>

```

reg [3:0] a, b, c;
reg [4:0] c5;
reg signed [31:0] c12, c16, uc16, c24, Ra, Rb, Rc, pc0; // pc0 : instruction pc

// register name
`define PC R[15] // Program Counter
`define LR R[14] // Link Register
`define SP R[13] // Stack Pointer
`define SW R[12] // Status Word
// SW Flags
`define N `SW[31] // Negative flag
`define Z `SW[30] // Zero
`define C `SW[29] // Carry
`define V `SW[28] // Overflow
`define I `SW[7] // Hardware Interrupt Enable
`define T `SW[6] // Software Interrupt Enable
`define M `SW[0] // Mode bit
// Instruction Opcode
parameter [7:0] LD=8'h01, ST=8'h02, LB=8'h03, LBu=8'h04, SB=8'h05, LH=8'h06,
LHu=8'h07, SH=8'h08, ADDiu=8'h09, ANDi=8'h0C, ORi=8'h0D,
XORi=8'h0E,
CMP=8'h10,
ADDu=8'h11, SUBu=8'h12, ADD=8'h13, SUB=8'h14, MUL=8'h15, SDIV=8'h16,
AND=8'h18, OR=8'h19, XOR=8'h1A,
SRA=8'h1B, ROL=8'h1C, ROR=8'h1D, SHL=8'h1E, SHR=8'h1F,
JEQ=8'h20, JNE=8'h21, JLT=8'h22, JGT=8'h23, JLE=8'h24, JGE=8'h25,
JMP=8'h26,
SWI=8'h2A, JSUB=8'h2B, RET=8'h2C, IRET=8'h2D, JALR=8'h2E,
MFHI=8'h40, MFLO=8'h41, MTHI=8'h42, MTLO=8'h43,
MULT=8'h50;

reg [2:0] state, next_state;
parameter Reset=3'h0, Fetch=3'h1, Decode=3'h2, Execute=3'h3, WriteBack=3'h4;

task memReadStart(input [31:0] addr, input [1:0] size); begin // Read Memory Word
    mar = addr; // read(m[addr])
    m_rw = 1; // Access Mode: read
    m_en = 1; // Enable read
    m_size = size;
end endtask

task memReadEnd(output [31:0] data); begin // Read Memory Finish, get data
    mdr = dbus; // get memory, dbus = m[addr]
    data = mdr; // return to data
    m_en = 0; // read complete
end endtask

// Write memory -- addr: address to write, data: data to write
task memWriteStart(input [31:0] addr, input [31:0] data, input [1:0] size); begin
    mar = addr; // write(m[addr], data)
    mdr = data;
    m_rw = 0; // access mode: write
    m_en = 1; // Enable write
    m_size = size;
end endtask

task memWriteEnd; begin // Write Memory Finish
    m_en = 0; // write complete

end endtask

task regSet(input [3:0] i, input [31:0] data); begin
    if (i!=0) R[i] = data;
end endtask

task regHILOSet(input [31:0] data1, input [31:0] data2); begin
    HI = data1;
    LO = data2;
end endtask

always @(posedge clock or posedge reset) begin
    if (reset) state <= Reset;
    else state <= next_state;
end

always @(state or reset) begin
    m_en = 0;
    case (state)
        Reset: begin
            `PC = 0; tick = 0; R[0] = 0; `SW = 0; `LR = -1;
            next_state = reset?Reset:Fetch;
        end
        Fetch: begin // Tick 1 : instruction fetch, throw PC to address bus,
            // memory.read(m[PC])
            memReadStart(`PC, `INT32);
            pc0 = `PC;
            `PC = `PC+4;
            next_state = Decode;
        end
        Decode: begin // Tick 2 : instruction decode, ir = m[PC]
            memReadEnd(ir); // IR = dbus = m[PC]
            {op,a,b,c} = ir[31:12];
            c24 = $signed(ir[23:0]);
            c16 = $signed(ir[15:0]);
            uc16 = ir[15:0];
            c12 = $signed(ir[11:0]);
            c5 = ir[4:0];
            Ra = R[a];
            Rb = R[b];
            Rc = R[c];
            next_state = Execute;
        end
        Execute: begin // Tick 3 : instruction execution
            case (op)
                // load and store instructions
                LD: memReadStart(Rb+c16, `INT32); // LD Ra, [Rb+Cx]; Ra<=[Rb+Cx]
                ST: memWriteStart(Rb+c16, Ra, `INT32); // ST Ra, [Rb+Cx]; Ra=>[Rb+Cx]
                LB: memReadStart(Rb+c16, `BYTE); // LB Ra, [Rb+Cx]; Ra<=(byte) [Rb+Cx]
                LBu: memReadStart(Rb+c16, `BYTE); // LBu Ra, [Rb+Cx]; Ra<=(byte) [Rb+Cx]
                SB: memWriteStart(Rb+c16, Ra, `BYTE); // SB Ra, [Rb+Cx]; Ra=>(byte) [Rb+Cx]
                LH: memReadStart(Rb+c16, `INT16); // LH Ra, [Rb+Cx]; Ra<=(2bytes) [Rb+Cx]
                LHu: memReadStart(Rb+c16, `INT16); // LHu Ra, [Rb+Cx]; Ra<=(2bytes) [Rb+Cx]
                SH: memWriteStart(Rb+c16, Ra, `INT16); // SH Ra, [Rb+Cx]; Ra=>(2bytes) [Rb+Cx]
                // Mathematic
                ADDiu: R[a] = Rb+c16; // ADDiu Ra, Rb+Cx; Ra<=Rb+Cx
                CMP: begin `N=(Ra-Rb<0); `Z=(Ra-Rb==0); end // CMP Ra, Rb; SW=(Ra >= Rb)
                ADDu: regSet(a, Rb+Rc); // ADDu Ra,Rb,Rc; Ra<=Rb+Rc

                ADD:   begin regSet(a, Rb+Rc); if (a < Rb) `V = 1; else `V =0; end
                       // ADD Ra,Rb,Rc; Ra<=Rb+Rc
                SUBu:  regSet(a, Rb-Rc);           // SUBu Ra,Rb,Rc; Ra<=Rb-Rc
                SUB:   begin regSet(a, Rb-Rc); if (Rb < 0 && Rc > 0 && a >= 0)
                       `V = 1; else `V =0; end     // SUB Ra,Rb,Rc; Ra<=Rb-Rc
                MUL:   regSet(a, Rb*Rc);           // MUL Ra,Rb,Rc; Ra<=Rb*Rc
                SDIV:  regHILOSet(Ra%Rb, Ra/Rb);   // SDIV Ra,Rb; HI<=Ra%Rb; LO<=Ra/Rb
                       // with exception overflow
                AND:   regSet(a, Rb&Rc);           // AND Ra,Rb,Rc; Ra<=(Rb and Rc)
                ANDi:  regSet(a, Rb&uc16);         // ANDi Ra,Rb,c16; Ra<=(Rb and c16)
                OR:    regSet(a, Rb|Rc);           // OR Ra,Rb,Rc; Ra<=(Rb or Rc)
                ORi:   regSet(a, Rb|uc16);         // ORi Ra,Rb,c16; Ra<=(Rb or c16)
                XOR:   regSet(a, Rb^Rc);           // XOR Ra,Rb,Rc; Ra<=(Rb xor Rc)
                XORi:  regSet(a, Rb^uc16);         // XORi Ra,Rb,c16; Ra<=(Rb xor c16)
                SHL:   regSet(a, Rb<<c5);          // Shift Left; SHL Ra,Rb,Cx; Ra<=(Rb << Cx)
                SRA:   regSet(a, (Rb&'h80000000) | (Rb>>c5));
                       // Shift Right with signed bit fill;
                       // SRA Ra,Rb,Cx; Ra<=(Rb&0x80000000) | (Rb>>Cx)
                SHR:   regSet(a, Rb>>c5);          // Shift Right with 0 fill;
                       // SHR Ra,Rb,Cx; Ra<=(Rb >> Cx)
                ROL:   regSet(a, (Rb<<c5) | (Rb>>(32-c5))); // Rotate Left;
                ROR:   regSet(a, (Rb>>c5) | (Rb<<(32-c5))); // Rotate Right;
                MFLO:  regSet(a, LO);              // MFLO Ra; Ra<=LO
                MFHI:  regSet(a, HI);              // MFHI Ra; Ra<=HI
                MTLO:  LO = Ra;                    // MTLO Ra; LO<=Ra
                MTHI:  HI = Ra;                    // MTHI Ra; HI<=Ra
                MULT:  {HI, LO}=Ra*Rb;             // MULT Ra,Rb; HI<=((Ra*Rb)>>32);
                       // LO<=((Ra*Rb) and 0x00000000ffffffff);
                       // with exception overflow
                // Jump Instructions
                JEQ:  if (`Z) `PC=`PC+c24;            // JEQ Cx; if SW(=) PC <= PC+Cx
                JNE:  if (!`Z) `PC=`PC+c24;           // JNE Cx; if SW(!=) PC <= PC+Cx
                JLT:  if (`N) `PC=`PC+c24;            // JLT Cx; if SW(<) PC <= PC+Cx
                JGT:  if (!`N && !`Z) `PC=`PC+c24;    // JGT Cx; if SW(>) PC <= PC+Cx
                JLE:  if (`N || `Z) `PC=`PC+c24;      // JLE Cx; if SW(<=) PC <= PC+Cx
                JGE:  if (!`N || `Z) `PC=`PC+c24;     // JGE Cx; if SW(>=) PC <= PC+Cx
                JMP:  `PC = `PC+c24;                  // JMP Cx; PC <= PC+Cx
                SWI:  begin
                          `LR=`PC; `PC= c24; `I = 1'b1;
                      end // Software Interrupt; SWI Cx; LR <= PC; PC <= Cx; INT<=1
                JSUB: begin `LR=`PC; `PC=`PC + c24; end // JSUB Cx; LR<=PC; PC<=PC+Cx
                JALR: begin `LR=`PC; `PC=Ra; end // JALR Ra,Rb; Ra<=PC; PC<=Rb
                RET:  begin `PC=`LR; end         // RET; PC <= LR
                IRET: begin
                          `PC=`LR; `I = 1'b0;
                      end // Interrupt Return; IRET; PC <= LR; INT<=0
            endcase
            next_state = WriteBack;
        end
        WriteBack: begin // Read/Write finish, close memory
            case (op)
                LD, LB, LBu, LH, LHu : memReadEnd(R[a]);
                      // read memory complete
                ST, SB, SH: memWriteEnd();
                      // write memory complete
            endcase
            case (op)
                MULT, SDIV, MTHI, MTLO :

                    $display("%4dns %8x : %8x HI=%8x LO=%8x SW=%8x", $stime, pc0, ir, HI,
                             LO, `SW);
                ST :
                    if (R[b]+c16 == `IOADDR)
                        $display("%4dns %8x : %8x OUTPUT=%-d", $stime, pc0, ir, R[a]);
                    else
                        $display("%4dns %8x : %8x m[%-04d+%-04d]=%-d SW=%8x", $stime, pc0, ir,
                                 R[b], c16, R[a], `SW);
                default :
                    $display("%4dns %8x : %8x R[%02d]=%-8x=%-d SW=%8x", $stime, pc0, ir, a,
                             R[a], R[a], `SW);
            endcase
            if (op==RET && `PC < 0) begin
                $display("RET to PC < 0, finished!");
                $finish;
            end
            next_state = Fetch;
        end
    endcase
    pc = `PC;
end

endmodule

module memory0(input clock, reset, en, rw, input [1:0] m_size,
               input [31:0] abus, dbus_in, output [31:0] dbus_out);
reg [7:0] m [0:`MEMSIZE-1];
reg [31:0] data;

integer i;
initial begin
// erase memory
    for (i=0; i < `MEMSIZE; i=i+1) begin
        m[i] = `MEMEMPTY;
    end
// load the program, then display memory contents
    $readmemh("cpu0s.hex", m);
    for (i=0; i < `MEMSIZE && m[i] != `MEMEMPTY; i=i+4) begin
        $display("%8x: %8x", i, {m[i], m[i+1], m[i+2], m[i+3]});
    end
end

always @(clock or abus or en or rw or dbus_in)
begin
    if (abus >=0 && abus <= `MEMSIZE-4) begin
        if (en == 1 && rw == 0) begin // r_w==0:write
            data = dbus_in;
            case (m_size)
                `BYTE: {m[abus]} = dbus_in[7:0];
                `INT16: {m[abus], m[abus+1]} = dbus_in[15:0];
                `INT24: {m[abus], m[abus+1], m[abus+2]} = dbus_in[24:0];
                `INT32: {m[abus], m[abus+1], m[abus+2], m[abus+3]} = dbus_in;
            endcase
        end else if (en == 1 && rw == 1) begin// r_w==1:read
            case (m_size)
                `BYTE: data = {8'h00, 8'h00, 8'h00, m[abus]};
                `INT16: data = {8'h00, 8'h00, m[abus], m[abus+1]};
                `INT24: data = {8'h00, m[abus], m[abus+1], m[abus+2]};
                `INT32: data = {m[abus], m[abus+1], m[abus+2], m[abus+3]};
            endcase
        end else
            data = 32'hZZZZZZZZ;
    end else
        data = 32'hZZZZZZZZ;
end
assign dbus_out = data;
endmodule

module main;
  reg clock, reset;
  wire [2:0] tick;
  wire [31:0] pc, ir, mar, mdr, dbus;
  wire m_en, m_rw;
  wire [1:0] m_size;

cpu0 cpu(.clock(clock), .reset(reset), .pc(pc), .tick(tick), .ir(ir),
.mar(mar), .mdr(mdr), .dbus(dbus), .m_en(m_en), .m_rw(m_rw), .m_size(m_size));

memory0 mem(.clock(clock), .reset(reset), .en(m_en), .rw(m_rw), .m_size(m_size),
.abus(mar), .dbus_in(mdr), .dbus_out(dbus));

initial
begin
  clock = 0;
  reset = 1;
  #20 reset = 0;
  #300000 $finish;
end

always #10 clock=clock+1;
endmodule

```

```

JonathantekiiMac:raw Jonathan$ pwd
/Users/Jonathan/test/2/lbd/LLVMBackendTutorialExampleCode/cpu0_verilog/raw
JonathantekiiMac:raw Jonathan$ iverilog -o cpu0s cpu0s.v

```
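For readers more at home in C++ than in Verilog, the field extraction done in the Decode state of cpu0s.v can be sketched as follows. The struct and function names are illustrative assumptions. The bit ranges follow the Verilog listing: `op` is ir[31:24], `a`, `b` and `c` are the next three 4-bit fields, and `c16` and `c24` are sign-extended.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative container for the fields extracted in the Decode state.
struct DecodedInstr {
    uint8_t op;   // ir[31:24], opcode
    uint8_t a;    // ir[23:20], register number a
    uint8_t b;    // ir[19:16], register number b
    uint8_t c;    // ir[15:12], register number c
    int32_t c16;  // ir[15:0], sign-extended
    int32_t c24;  // ir[23:0], sign-extended
};

// Sign-extend the low `bits` bits of `value` without relying on
// implementation-defined narrowing conversions.
static int32_t signExtend(uint32_t value, unsigned bits) {
    uint32_t mask = (1u << bits) - 1u;
    uint32_t field = value & mask;
    if (field & (1u << (bits - 1)))
        return static_cast<int32_t>(field) - static_cast<int32_t>(mask) - 1;
    return static_cast<int32_t>(field);
}

DecodedInstr decode(uint32_t ir) {
    DecodedInstr d;
    d.op = static_cast<uint8_t>(ir >> 24);
    d.a = static_cast<uint8_t>((ir >> 20) & 0xF);
    d.b = static_cast<uint8_t>((ir >> 16) & 0xF);
    d.c = static_cast<uint8_t>((ir >> 12) & 0xF);
    d.c16 = signExtend(ir, 16);
    d.c24 = signExtend(ir, 24);
    return d;
}
```

For the word `09 dd ff b0` this gives opcode 0x09 (ADDiu), a = b = 13 and c16 = -80, which reads as `addiu $sp, $sp, -80`.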

## 10.3 Run program on CPU0 machine

Now let's compile ch\_run\_backend.cpp as shown below. Code grows from low addresses toward high addresses, while the stack grows from high addresses toward low addresses, so we set \$sp to 0x6ffc, the last word of the 0x7000 bytes of memory that cpu0s.v provides.

### LLVMBackendTutorialExampleCode/InputFiles/InitRegs.h

```

asm("addiu $4,  $ZERO, 0");
asm("addiu $5,  $ZERO, 0");
asm("addiu $6,  $ZERO, 0");
asm("addiu $7,  $ZERO, 0");
asm("addiu $8,  $ZERO, 0");
asm("addiu $9,  $ZERO, 0");
asm("addiu $10, $ZERO, 0");
asm("addiu $11, $ZERO, 0");
asm("addiu $12, $ZERO, 0");
asm("addiu $14, $ZERO, -1");

```

### LLVMBackendTutorialExampleCode/InputFiles/ch\_run\_backend.cpp

```

// /Users/Jonathan/llvm/test/cmake_debug_build/bin/Debug/llvm-objdump -d ch_run_backend.cpu0.o | tai

#include <stdarg.h>

#include "InitRegs.h"

#define OUT_MEM 0x7000 // 28672

asm("addiu $sp, $zero, 0x6ffc");

void print_integer(int x);
int test_operators(int x);
int test_control();
int sum_i(int amount, ...);

int main()
{
    int a = 0;
    a = test_operators(12); // a = 128
    print_integer(a);
    a += test_control(); // a = (128+18) = 146
    print_integer(a);
    a = sum_i(6, 0, 1, 2, 3, 4, 5);
    print_integer(a); // a = 15

    return a;
}

// For memory IO
void print_integer(int x)
{
    int *p = (int*)OUT_MEM;
    *p = x;
    return;
}

void print1_integer(int x)
{
    asm("ld $at, 8($sp)");
    asm("st $at, 28672($0)");
    return;
}

#if 0
// For instruction IO
void print2_integer(int x)
{
    asm("ld $at, 8($sp)");
    asm("outw $stat");
    return;
}
#endif

int test_operators(int x)
{
    int a = 11;
    int b = 2;
    int c, d, e, f, g, h, i, j, k, l, m, n, o;
    unsigned int a1 = -11, k1 = 0;

    k = (a >> 2);
    print_integer(k); // 2
    k1 = (a1 >> 2);
    print_integer((int)k1); // 0x3ffffffd = 1073741821
    c = a + b;
    d = a - b;
    e = a * b;
    f = a / b;
    g = (a & b);
    h = (a | b);
    i = (a ^ b);
    j = (a << 2);
    l = a % x;
    m = (a+1)%12;

    n = !a;
    print_integer(n); // 0
    int* p = &b;
    o = *p;

    return (c+d+e+f+g+h+i+j+l+m+o); // (13+9+22+5+2+11+9+44+11+0+2)=128
}

int test_control()
{
    int b = 1;
    int c = 2;
    int d = 3;
    int e = 4;
    int f = 5;

    if (b != 0) {
        b++;
    }
    if (c > 0) {
        c++;
    }
    if (d >= 0) {
        d++;
    }
    if (e < 0) {
        e++;
    }
    if (f <= 0) {
        f++;
    }

    return (b+c+d+e+f); // (2+3+4+4+5)=18
}

int sum_i(int amount, ...)
{
    int i = 0;
    int val = 0;
    int sum = 0;

    va_list vl;
    va_start(vl, amount);
    for (i = 0; i < amount; i++)
    {
        val = va_arg(vl, int);
        sum += val;
    }
    va_end(vl);

    return sum;
}

```
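The two shifts at the start of `test_operators()` exercise different instructions: the signed `a >> 2` is compiled to `sra` and the unsigned `a1 >> 2` to `shr` in the hex listing later in this section. A small host-side model, with the hypothetical names `shrModel` and `sraModel`, reproduces the values noted in the comments of the program. Shift amounts are assumed to be in the range 0 to 31.

```cpp
#include <cassert>
#include <cstdint>

// Logical shift right (zero fill), as used for the unsigned variable a1.
uint32_t shrModel(uint32_t value, unsigned amount) {
    return value >> amount;
}

// Arithmetic shift right (sign fill), written on unsigned bits so the
// result does not depend on how the host compiler shifts negative values.
int32_t sraModel(int32_t value, unsigned amount) {
    uint32_t shifted = static_cast<uint32_t>(value) >> amount;
    if (value < 0 && amount > 0)
        shifted |= ~(0xFFFFFFFFu >> amount);  // replicate the sign bit
    if (shifted & 0x80000000u)
        return -1 - static_cast<int32_t>(~shifted);  // two's complement value
    return static_cast<int32_t>(shifted);
}
```

Here `sraModel(11, 2)` is 2 and `shrModel((uint32_t)-11, 2)` is 0x3ffffffd (1073741821), the two values the program prints first.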

```

JonathantekiiMac:InputFiles Jonathan$ pwd
/Users/Jonathan/test/2/lbd/LLVMBackendTutorialExampleCode/InputFiles
JonathantekiiMac:InputFiles Jonathan$ clang -c ch_run_backend.cpp -emit-llvm -o ch_run_backend.bc
JonathantekiiMac:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/bin/Debug/llc -march=cpu0 -relocation-model=static -filetype=obj ch_run_backend.bc -o ch_run_backend.cpu0.o
JonathantekiiMac:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/bin/Debug/llvm-objdump -d ch_run_backend.cpu0.o | tail -n +6 | awk '{print "/* " $1 " */\t" $2 " " $3 " " $4 " " $5 "\t/* " $6"\t" $7" " $8" " $9" " $10 "\t*/"}' > ../cpu0_verilog/raw/cpu0s.hex

```
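The awk step wraps the address and the disassembly text of each line in `/* ... */` comments, so that only the four instruction bytes remain as data when cpu0s.v loads the file with `$readmemh`, which accepts comments in its input. The sketch below shows the same idea on the host side: it strips the comments from one line of cpu0s.hex and parses the remaining bytes. The function names are hypothetical and the code is not part of the tutorial's tool chain.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Remove every /* ... */ comment from one line of the hex file.
std::string stripBlockComments(const std::string &line) {
    std::string out;
    std::size_t pos = 0;
    while (pos < line.size()) {
        std::size_t open = line.find("/*", pos);
        if (open == std::string::npos) {
            out += line.substr(pos);  // no more comments on this line
            break;
        }
        out += line.substr(pos, open - pos);
        std::size_t close = line.find("*/", open + 2);
        if (close == std::string::npos)
            break;  // unterminated comment: drop the rest of the line
        pos = close + 2;
    }
    return out;
}

// Parse the whitespace-separated hex byte tokens that remain.
std::vector<uint8_t> parseHexBytes(const std::string &line) {
    std::vector<uint8_t> bytes;
    std::istringstream in(stripBlockComments(line));
    std::string token;
    while (in >> token)
        bytes.push_back(
            static_cast<uint8_t>(std::strtoul(token.c_str(), 0, 16)));
    return bytes;
}
```

A line such as `/* 4c: */ 2b 00 00 20 /* jsub 0 */` yields the four bytes 0x2b, 0x00, 0x00 and 0x20, and a label line that contains only comments yields no bytes.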

```

118-165-81-39:raw Jonathan$ cat cpu0s.hex
...
/* 4c: */ 2b 00 00 20 /* jsub 0      */
/* 50: */ 01 2d 00 04 /* st $2, 4($sp)      */
/* 54: */ 2b 00 01 44 /* jsub 0      */

```

As shown above, the subroutine address in each “**jsub #offset**” is 0. This is correct, because C supports separate compilation, and a subroutine address is decided at link time in static addressing mode or at load time in PIC addressing mode. Since our backend implements neither a linker nor a loader, we change the encoding of “**jsub #offset**” in Chapter10\_2/ as follows,

### LLVMBackendTutorialExampleCode/Chapter10\_2/MCTargetDesc/Cpu0MCCodeEmitter.cpp

```

unsigned Cpu0MCCodeEmitter::
getJumpTargetOpValue(const MCInst &MI, unsigned OpNo,
                     SmallVectorImpl<MCFixup> &Fixups) const {

    unsigned Opcode = MI.getOpcode();
    ...
    if (Opcode == Cpu0::JSUB)
        Fixups.push_back(MCFixup::Create(0, Expr,
                                         MCFixupKind(Cpu0::fixup_Cpu0_PC24)));
    // As printed, this test repeats JSUB, so this branch can never run.
    else if (Opcode == Cpu0::JSUB)
        Fixups.push_back(MCFixup::Create(0, Expr,
                                         MCFixupKind(Cpu0::fixup_Cpu0_24)));
    else
        llvm_unreachable("unexpected opcode in getJumpAbsoluteTargetOpValue()");

    return 0;
}

```

We change JSUB from the Relocation Record fixup\_Cpu0\_24 to the Non-Relocation Record fixup\_Cpu0\_PC24, as in the definition below. This change is fine, because a call to a subroutine defined outside the file still adds a Relocation Record for its “jsub #offset”. For now, we use a Non-Relocation Record so that the code can run on the CPU0 Verilog machine. If a CPU0 linker appears one day and rearranges the sections, we should change JSUB back to a Relocation Record. A good linker reorders sections to optimize data and function access. In other words, it keeps global variable accesses as close together as possible to reduce the chance of cache misses.

### LLVMBackendTutorialExampleCode/Chapter10\_2/MCTargetDesc/Cpu0AsmBackend.cpp

```

const MCFixupKindInfo &getFixupKindInfo(MCFixupKind Kind) const {
    const static MCFixupKindInfo Infos[Cpu0::NumTargetFixupKinds] = {
        // This table *must* be in the same order as the fixup_* kinds in
        // Cpu0FixupKinds.h.
        //
        // name                  offset  bits  flags
        ...
        { "fixup_Cpu0_24",      0,      24,    0 },
        ...
        { "fixup_Cpu0_PC24",    0,      24,    MCFixupKindInfo::FKF_IsPCRel },
        ...
    };
    ...
}

```
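With a PC-relative fixup, the 24-bit field of `jsub` holds the distance from the instruction after the jump to the target. The sketch below shows that arithmetic. The helper names are hypothetical, and measuring from `instrAddr + 4` is an assumption based on cpu0s.v, where the Fetch state adds 4 to PC before the Execute state adds `c24`.

```cpp
#include <cassert>
#include <cstdint>

// 24-bit PC-relative field for a jump located at instrAddr that targets
// `target`. Assumption: the offset is relative to instrAddr + 4.
uint32_t pcRel24Field(uint32_t instrAddr, uint32_t target) {
    uint32_t offset = target - (instrAddr + 4);  // wraps for backward jumps
    return offset & 0xFFFFFFu;                   // keep the low 24 bits
}

// Assemble the full "jsub" word: opcode 0x2b in the top byte.
uint32_t encodeJsub(uint32_t instrAddr, uint32_t target) {
    return (0x2Bu << 24) | pcRel24Field(instrAddr, target);
}
```

In the hex file that follows, the `jsub` at address 0x58 targets `_Z14test_operatorsi` at 0xd8, and `encodeJsub(0x58, 0xd8)` gives 0x2b00007c, which matches the listed bytes `2b 00 00 7c` (`jsub 124`).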

Let's run Chapter10\_2/ with llvm-objdump -d again to get the hex file, as follows.

```

JonathantekiiMac:InputFiles Jonathan$ pwd
/Users/Jonathan/test/2/lbd/LLVMBackendTutorialExampleCode/InputFiles
JonathantekiiMac:InputFiles Jonathan$ clang -c ch_run_backend.cpp -emit-llvm -o ch_run_backend.bc
JonathantekiiMac:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/bin/Debug/llc -march=cpu0 -relocation-model=static -filetype=obj ch_run_backend.bc -o ch_run_backend.cpu0.o
JonathantekiiMac:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/bin/Debug/llvm-objdump -d ch_run_backend.cpu0.o | tail -n +6 | awk '{print "/* " $1 " */\t" $2 " " $3 " " $4 " " $5 "\t/* " $6"\t" $7" " $8" " $9" " $10 "\t*/"}' > ../cpu0_verilog/raw/cpu0s.hex

118-165-75-55:raw$ cat cpu0s.hex
/* 0: */ 09 10 00 00 /* addiu $1, $zero, 0 */
/* 4: */ 09 20 00 00 /* addiu $2, $zero, 0 */
/* 8: */ 09 30 00 00 /* addiu $3, $zero, 0 */
/* c: */ 09 40 00 00 /* addiu $4, $zero, 0 */
/* 10: */ 09 50 00 00 /* addiu $5, $zero, 0 */
/* 14: */ 09 60 00 00 /* addiu $t9, $zero, 0 */
/* 18: */ 09 70 00 00 /* addiu $7, $zero, 0 */
/* 1c: */ 09 80 00 00 /* addiu $8, $zero, 0 */
/* 20: */ 09 90 00 00 /* addiu $9, $zero, 0 */
/* 24: */ 09 a0 00 00 /* addiu $gp, $zero, 0 */
/* 28: */ 09 b0 00 00 /* addiu $fp, $zero, 0 */
/* 2c: */ 09 c0 00 00 /* addiu $sw, $zero, 0 */
/* 30: */ 09 e0 ff ff /* addiu $lr, $zero, -1 */

/* 34: */ 09 d0 6f fc /* addiu $sp, $zero, 28668 */
/* */ /* */
/* main: */ /* */
/* 38: */ 09 dd ff b0 /* addiu $sp, $sp, -80 */
/* 3c: */ 02 ed 00 4c /* st $lr, 76($sp) */
/* 40: */ 02 7d 00 48 /* st $7, 72($sp) */
/* 44: */ 09 70 00 00 /* addiu $7, $zero, 0 */
/* 48: */ 02 7d 00 44 /* st $7, 68($sp) */
/* 4c: */ 02 7d 00 40 /* st $7, 64($sp) */
/* 50: */ 09 20 00 0c /* addiu $2, $zero, 12 */
/* 54: */ 02 2d 00 00 /* st $2, 0($sp) */
/* 58: */ 2b 00 00 7c /* jsub 124 */
/* 5c: */ 02 2d 00 40 /* st $2, 64($sp) */
/* 60: */ 02 2d 00 00 /* st $2, 0($sp) */
/* 64: */ 2b 00 02 28 /* jsub 552 */
/* 68: */ 2b 00 02 48 /* jsub 584 */
/* 6c: */ 01 3d 00 40 /* ld $3, 64($sp) */
/* 70: */ 11 23 20 00 /* addu $2, $3, $2 */
/* 74: */ 02 2d 00 40 /* st $2, 64($sp) */
/* 78: */ 02 2d 00 00 /* st $2, 0($sp) */
/* 7c: */ 2b 00 02 10 /* jsub 528 */
/* 80: */ 09 20 00 05 /* addiu $2, $zero, 5 */
/* 84: */ 02 2d 00 18 /* st $2, 24($sp) */
/* 88: */ 09 20 00 04 /* addiu $2, $zero, 4 */
/* 8c: */ 02 2d 00 14 /* st $2, 20($sp) */
/* 90: */ 09 20 00 03 /* addiu $2, $zero, 3 */
/* 94: */ 02 2d 00 10 /* st $2, 16($sp) */
/* 98: */ 09 20 00 02 /* addiu $2, $zero, 2 */
/* 9c: */ 02 2d 00 0c /* st $2, 12($sp) */
/* a0: */ 09 20 00 01 /* addiu $2, $zero, 1 */
/* a4: */ 02 2d 00 08 /* st $2, 8($sp) */
/* a8: */ 02 7d 00 04 /* st $7, 4($sp) */
/* ac: */ 09 20 00 06 /* addiu $2, $zero, 6 */
/* b0: */ 02 2d 00 00 /* st $2, 0($sp) */
/* b4: */ 2b 00 02 d4 /* jsub 724 */
/* b8: */ 02 2d 00 40 /* st $2, 64($sp) */
/* bc: */ 02 2d 00 00 /* st $2, 0($sp) */
/* c0: */ 2b 00 01 cc /* jsub 460 */
/* c4: */ 01 2d 00 40 /* ld $2, 64($sp) */
/* c8: */ 01 7d 00 48 /* ld $7, 72($sp) */
/* cc: */ 01 ed 00 4c /* ld $lr, 76($sp) */
/* d0: */ 09 dd 00 50 /* addiu $sp, $sp, 80 */
/* d4: */ 2c e0 00 00 /* ret $lr */
/* */ /* */
/* _Z14test_operatorsi: */ /* */
/* d8: */ 09 dd ff 98 /* addiu $sp, $sp, -104 */
/* dc: */ 02 ed 00 64 /* st $lr, 100($sp) */
/* e0: */ 02 7d 00 60 /* st $7, 96($sp) */
/* e4: */ 01 2d 00 68 /* ld $2, 104($sp) */
/* e8: */ 02 2d 00 5c /* st $2, 92($sp) */
/* ec: */ 09 20 00 0b /* addiu $2, $zero, 11 */
/* f0: */ 02 2d 00 58 /* st $2, 88($sp) */
/* f4: */ 09 20 00 02 /* addiu $2, $zero, 2 */
/* f8: */ 02 2d 00 54 /* st $2, 84($sp) */
/* fc: */ 09 20 ff f5 /* addiu $2, $zero, -11 */
/* 100: */ 02 2d 00 1c /* st $2, 28($sp) */
/* 104: */ 09 70 00 00 /* addiu $7, $zero, 0 */
/* 108: */ 02 7d 00 18 /* st $7, 24($sp) */

/* 10c: */ 01 2d 00 58 /* ld $2, 88($sp) */
/* 110: */ 1b 22 00 02 /* sra $2, $2, 2 */
/* 114: */ 02 2d 00 30 /* st $2, 48($sp) */
/* 118: */ 02 2d 00 00 /* st $2, 0($sp) */
/* 11c: */ 2b 00 01 70 /* jsub 368 */
/* 120: */ 01 2d 00 1c /* ld $2, 28($sp) */
/* 124: */ 1f 22 00 02 /* shr $2, $2, 2 */
/* 128: */ 02 2d 00 18 /* st $2, 24($sp) */
/* 12c: */ 02 2d 00 00 /* st $2, 0($sp) */
/* 130: */ 2b 00 01 5c /* jsub 348 */
/* 134: */ 01 2d 00 54 /* ld $2, 84($sp) */
/* 138: */ 01 3d 00 58 /* ld $3, 88($sp) */
/* 13c: */ 11 23 20 00 /* addu $2, $3, $2 */
/* 140: */ 02 2d 00 50 /* st $2, 80($sp) */
/* 144: */ 01 2d 00 54 /* ld $2, 84($sp) */
/* 148: */ 01 3d 00 58 /* ld $3, 88($sp) */
/* 14c: */ 14 23 20 00 /* sub $2, $3, $2 */
/* 150: */ 02 2d 00 4c /* st $2, 76($sp) */
/* 154: */ 01 2d 00 54 /* ld $2, 84($sp) */
/* 158: */ 01 3d 00 58 /* ld $3, 88($sp) */
/* 15c: */ 15 23 20 00 /* mul $2, $3, $2 */
/* 160: */ 02 2d 00 48 /* st $2, 72($sp) */
/* 164: */ 01 2d 00 54 /* ld $2, 84($sp) */
/* 168: */ 01 3d 00 58 /* ld $3, 88($sp) */
/* 16c: */ 16 32 00 00 /* div $3, $2 */
/* 170: */ 41 20 00 00 /* mflo $2 */
/* 174: */ 02 2d 00 44 /* st $2, 68($sp) */
/* 178: */ 01 2d 00 54 /* ld $2, 84($sp) */
/* 17c: */ 01 3d 00 58 /* ld $3, 88($sp) */
/* 180: */ 18 23 20 00 /* and $2, $3, $2 */
/* 184: */ 02 2d 00 40 /* st $2, 64($sp) */
/* 188: */ 01 2d 00 54 /* ld $2, 84($sp) */
/* 18c: */ 01 3d 00 58 /* ld $3, 88($sp) */
/* 190: */ 19 23 20 00 /* or $2, $3, $2 */
/* 194: */ 02 2d 00 3c /* st $2, 60($sp) */
/* 198: */ 01 2d 00 54 /* ld $2, 84($sp) */
/* 19c: */ 01 3d 00 58 /* ld $3, 88($sp) */
/* 1a0: */ 1a 23 20 00 /* xor $2, $3, $2 */
/* 1a4: */ 02 2d 00 38 /* st $2, 56($sp) */
/* 1a8: */ 01 2d 00 58 /* ld $2, 88($sp) */
/* 1ac: */ 1e 22 00 02 /* shl $2, $2, 2 */
/* 1b0: */ 02 2d 00 34 /* st $2, 52($sp) */
/* 1b4: */ 01 2d 00 5c /* ld $2, 92($sp) */
/* 1b8: */ 01 3d 00 58 /* ld $3, 88($sp) */
/* 1bc: */ 16 32 00 00 /* div $3, $2 */
/* 1c0: */ 40 20 00 00 /* mfhi $2 */
/* 1c4: */ 02 2d 00 2c /* st $2, 44($sp) */
/* 1c8: */ 09 20 2a aa /* addiu $2, $zero, 10922 */
/* 1cc: */ 1e 22 00 10 /* shl $2, $2, 16 */
/* 1d0: */ 0d 32 aa ab /* ori $3, $2, 43691 */
/* 1d4: */ 01 2d 00 58 /* ld $2, 88($sp) */
/* 1d8: */ 09 22 00 01 /* addiu $2, $2, 1 */
/* 1dc: */ 50 23 00 00 /* mult $2, $3 */
/* 1e0: */ 40 30 00 00 /* mfhi $3 */
/* 1e4: */ 1f 43 00 1f /* shr $4, $3, 31 */
/* 1e8: */ 1b 33 00 01 /* sra $3, $3, 1 */
/* 1ec: */ 11 33 40 00 /* addu $3, $3, $4 */
/* 1f0: */ 09 40 00 0c /* addiu $4, $zero, 12 */
/* 1f4: */ 15 33 40 00 /* mul $3, $3, $4 */
/* 1f8: */ 14 22 30 00 /* sub $2, $2, $3 */
/* 1fc: */ 02 2d 00 28 /* st $2, 40($sp) */
/* 200: */ 01 2d 00 58 /* ld $2, 88($sp) */
/* 204: */ 1a 22 70 00 /* xor $2, $2, $7 */
/* 208: */ 09 30 00 01 /* addiu $3, $zero, 1 */
/* 20c: */ 1a 22 30 00 /* xor $2, $2, $3 */
/* 210: */ 18 22 30 00 /* and $2, $2, $3 */
/* 214: */ 02 2d 00 24 /* st $2, 36($sp) */
/* 218: */ 02 2d 00 00 /* st $2, 0($sp) */
/* 21c: */ 2b 00 00 70 /* jsub 112 */
/* 220: */ 09 2d 00 54 /* addiu $2, $sp, 84 */
/* 224: */ 02 2d 00 14 /* st $2, 20($sp) */
/* 228: */ 01 2d 00 54 /* ld $2, 84($sp) */
/* 22c: */ 02 2d 00 20 /* st $2, 32($sp) */
/* 230: */ 01 3d 00 4c /* ld $3, 76($sp) */
/* 234: */ 01 4d 00 50 /* ld $4, 80($sp) */
/* 238: */ 11 34 30 00 /* addu $3, $4, $3 */
/* 23c: */ 01 4d 00 48 /* ld $4, 72($sp) */
/* 240: */ 11 33 40 00 /* addu $3, $3, $4 */
/* 244: */ 01 4d 00 44 /* ld $4, 68($sp) */
/* 248: */ 11 33 40 00 /* addu $3, $3, $4 */
/* 24c: */ 01 4d 00 40 /* ld $4, 64($sp) */
/* 250: */ 11 33 40 00 /* addu $3, $3, $4 */
/* 254: */ 01 4d 00 3c /* ld $4, 60($sp) */
/* 258: */ 11 33 40 00 /* addu $3, $3, $4 */
/* 25c: */ 01 4d 00 38 /* ld $4, 56($sp) */
/* 260: */ 11 33 40 00 /* addu $3, $3, $4 */
/* 264: */ 01 4d 00 34 /* ld $4, 52($sp) */
/* 268: */ 11 33 40 00 /* addu $3, $3, $4 */
/* 26c: */ 01 4d 00 2c /* ld $4, 44($sp) */
/* 270: */ 11 33 40 00 /* addu $3, $3, $4 */
/* 274: */ 01 4d 00 28 /* ld $4, 40($sp) */
/* 278: */ 11 33 40 00 /* addu $3, $3, $4 */
/* 27c: */ 11 23 20 00 /* addu $2, $3, $2 */
/* 280: */ 01 7d 00 60 /* ld $7, 96($sp) */
/* 284: */ 01 ed 00 64 /* ld $lr, 100($sp) */
/* 288: */ 09 dd 00 68 /* addiu $sp, $sp, 104 */
/* 28c: */ 2c e0 00 00 /* ret $lr */
/* */ /* */
/* _Z13print_integeri: */ /* */
/* 290: */ 09 dd ff f8 /* addiu $sp, $sp, -8 */
/* 294: */ 01 2d 00 08 /* ld $2, 8($sp) */
/* 298: */ 02 2d 00 04 /* st $2, 4($sp) */
/* 29c: */ 09 20 70 00 /* addiu $2, $zero, 28672 */
/* 2a0: */ 02 2d 00 00 /* st $2, 0($sp) */
/* 2a4: */ 01 3d 00 04 /* ld $3, 4($sp) */
/* 2a8: */ 02 32 00 00 /* st $3, 0($2) */
/* 2ac: */ 09 dd 00 08 /* addiu $sp, $sp, 8 */
/* 2b0: */ 2c e0 00 00 /* ret $lr */
/* */ /* */
/* _Z12test_controlv: */ /* */
/* 2b4: */ 09 dd ff e8 /* addiu $sp, $sp, -24 */
/* 2b8: */ 09 30 00 01 /* addiu $3, $zero, 1 */
/* 2bc: */ 02 3d 00 14 /* st $3, 20($sp) */
/* 2c0: */ 09 20 00 02 /* addiu $2, $zero, 2 */
/* 2c4: */ 02 2d 00 10 /* st $2, 16($sp) */
/* 2c8: */ 09 20 00 03 /* addiu $2, $zero, 3 */
/* 2cc: */ 02 2d 00 0c /* st $2, 12($sp) */
/* 2d0: */ 09 20 00 04 /* addiu $2, $zero, 4 */
/* 2d4: */ 02 2d 00 08 /* st $2, 8($sp) */
/* 2d8: */ 09 20 00 05 /* addiu $2, $zero, 5 */
/* 2dc: */ 02 2d 00 04 /* st $2, 4($sp) */
/* 2e0: */ 09 20 00 00 /* addiu $2, $zero, 0 */
/* 2e4: */ 01 4d 00 14 /* ld $4, 20($sp) */
/* 2e8: */ 10 42 00 00 /* cmp $zero, $4, $2 */
/* 2ec: */ 20 00 00 0c /* jeq $zero, 12 */
/* 2f0: */ 01 4d 00 14 /* ld $4, 20($sp) */
/* 2f4: */ 09 44 00 01 /* addiu $4, $4, 1 */
/* 2f8: */ 02 4d 00 14 /* st $4, 20($sp) */
/* 2fc: */ 01 4d 00 10 /* ld $4, 16($sp) */
/* 300: */ 10 43 00 00 /* cmp $zero, $4, $3 */
/* 304: */ 22 00 00 0c /* jlt $zero, 12 */
/* 308: */ 01 3d 00 10 /* ld $3, 16($sp) */
/* 30c: */ 09 33 00 01 /* addiu $3, $3, 1 */
/* 310: */ 02 3d 00 10 /* st $3, 16($sp) */
/* 314: */ 01 3d 00 0c /* ld $3, 12($sp) */
/* 318: */ 10 32 00 00 /* cmp $zero, $3, $2 */
/* 31c: */ 22 00 00 0c /* jlt $zero, 12 */
/* 320: */ 01 3d 00 0c /* ld $3, 12($sp) */
/* 324: */ 09 33 00 01 /* addiu $3, $3, 1 */
/* 328: */ 02 3d 00 0c /* st $3, 12($sp) */
/* 32c: */ 09 30 ff ff /* addiu $3, $zero, -1 */
/* 330: */ 01 4d 00 08 /* ld $4, 8($sp) */
/* 334: */ 10 43 00 00 /* cmp $zero, $4, $3 */
/* 338: */ 23 00 00 0c /* jgt $zero, 12 */
/* 33c: */ 01 3d 00 08 /* ld $3, 8($sp) */
/* 340: */ 09 33 00 01 /* addiu $3, $3, 1 */
/* 344: */ 02 3d 00 08 /* st $3, 8($sp) */
/* 348: */ 01 3d 00 04 /* ld $3, 4($sp) */
/* 34c: */ 10 32 00 00 /* cmp $zero, $3, $2 */
/* 350: */ 23 00 00 0c /* jgt $zero, 12 */
/* 354: */ 01 2d 00 04 /* ld $2, 4($sp) */
/* 358: */ 09 22 00 01 /* addiu $2, $2, 1 */
/* 35c: */ 02 2d 00 04 /* st $2, 4($sp) */
/* 360: */ 01 2d 00 10 /* ld $2, 16($sp) */
/* 364: */ 01 3d 00 14 /* ld $3, 20($sp) */
/* 368: */ 11 23 20 00 /* addu $2, $3, $2 */
/* 36c: */ 01 3d 00 0c /* ld $3, 12($sp) */
/* 370: */ 11 22 30 00 /* addu $2, $2, $3 */
/* 374: */ 01 3d 00 08 /* ld $3, 8($sp) */
/* 378: */ 11 22 30 00 /* addu $2, $2, $3 */
/* 37c: */ 01 3d 00 04 /* ld $3, 4($sp) */
/* 380: */ 11 22 30 00 /* addu $2, $2, $3 */
/* 384: */ 09 dd 00 18 /* addiu $sp, $sp, 24 */
/* 388: */ 2c e0 00 00 /* ret $lr */
/* */ /* */
/* _Z5sum_iiz: */ /* */
/* 38c: */ 09 dd ff e8 /* addiu $sp, $sp, -24 */
/* 390: */ 01 2d 00 18 /* ld $2, 24($sp) */
/* 394: */ 02 2d 00 14 /* st $2, 20($sp) */
/* 398: */ 09 20 00 00 /* addiu $2, $zero, 0 */
/* 39c: */ 02 2d 00 10 /* st $2, 16($sp) */
/* 3a0: */ 02 2d 00 0c /* st $2, 12($sp) */
/* 3a4: */ 02 2d 00 08 /* st $2, 8($sp) */
/* 3a8: */ 09 3d 00 1c /* addiu $3, $sp, 28 */
/* 3ac: */ 02 3d 00 04 /* st $3, 4($sp) */
/* 3b0: */ 02 2d 00 10 /* st $2, 16($sp) */
/* 3b4: */ 01 2d 00 14 /* ld $2, 20($sp) */
/* 3b8: */ 01 3d 00 10 /* ld $3, 16($sp) */
/* 3bc: */ 10 32 00 00 /* cmp $zero, $3, $2 */
/* 3c0: */ 25 00 00 30 /* jge $zero, 48 */
/* 3c4: */ 01 2d 00 04 /* ld $2, 4($sp) */
/* 3c8: */ 09 32 00 04 /* addiu $3, $2, 4 */
/* 3cc: */ 02 3d 00 04 /* st $3, 4($sp) */
/* 3d0: */ 01 22 00 00 /* ld $2, 0($2) */
/* 3d4: */ 02 2d 00 0c /* st $2, 12($sp) */
/* 3d8: */ 01 3d 00 08 /* ld $3, 8($sp) */
/* 3dc: */ 11 23 20 00 /* addu $2, $3, $2 */
/* 3e0: */ 02 2d 00 08 /* st $2, 8($sp) */
/* 3e4: */ 01 2d 00 10 /* ld $2, 16($sp) */
/* 3e8: */ 09 22 00 01 /* addiu $2, $2, 1 */
/* 3ec: */ 02 2d 00 10 /* st $2, 16($sp) */
/* 3f0: */ 26 ff ff c0 /* jmp -64 */
/* 3f4: */ 01 2d 00 08 /* ld $2, 8($sp) */
/* 3f8: */ 09 dd 00 18 /* addiu $sp, $sp, 24 */
/* 3fc: */ 2c e0 00 00 /* ret $lr */
/* */ /* */
/* _Z14print1_integeri: */ /* */
/* 400: */ 09 dd ff f8 /* addiu $sp, $sp, -8 */
/* 404: */ 01 2d 00 08 /* ld $2, 8($sp) */
/* 408: */ 02 2d 00 04 /* st $2, 4($sp) */
/* 40c: */ 01 1d 00 08 /* ld $1, 8($sp) */
/* 410: */ 02 10 70 00 /* st $1, 28672($zero) */
/* 414: */ 09 dd 00 08 /* addiu $sp, $sp, 8 */
/* 418: */ 2c e0 00 00 /* ret $lr */

```

From the listing above, you can see that `print_integer()`, implemented in C, takes 8 instructions (not counting the final `ret`), while `print1_integer()`, implemented in assembly, takes 6. However, the C version is more portable: the assembly version is tied to the machine's assembly language and assumes that the stack size of `print1_integer()` is 8. Now run `cpu0s` to get the following result:

```

118-165-75-55:raw Jonathan$ ./cpu0s
WARNING: cpu0s.v:224: $readmemh(cpu0s.hex): Not enough words in the file for the requested range [0:2
00000000: 09100000
00000004: 09200000
...
00000418: 2ce00000
90ns 00000000 : 09100000 R[01]=00000000=0      SW=00000000
170ns 00000004 : 09200000 R[02]=00000000=0      SW=00000000
250ns 00000008 : 09300000 R[03]=00000000=0      SW=00000000
330ns 0000000c : 09400000 R[04]=00000000=0      SW=00000000
410ns 00000010 : 09500000 R[05]=00000000=0      SW=00000000
490ns 00000014 : 09600000 R[06]=00000000=0      SW=00000000
570ns 00000018 : 09700000 R[07]=00000000=0      SW=00000000
650ns 0000001c : 09800000 R[08]=00000000=0      SW=00000000
730ns 00000020 : 09900000 R[09]=00000000=0      SW=00000000
810ns 00000024 : 09a00000 R[10]=00000000=0      SW=00000000
890ns 00000028 : 09b00000 R[11]=00000000=0      SW=00000000
970ns 0000002c : 09c00000 R[12]=00000000=0      SW=00000000
1050ns 00000030 : 09e0ffff R[14]=ffffffff=-1    SW=00000000
1130ns 00000034 : 09d06ffc R[13]=00006ffc=28668 SW=00000000
1210ns 00000038 : 09ddffb0 R[13]=00006fac=28588 SW=00000000
1290ns 0000003c : 02ed004c m[28588+76 ]=-1      SW=00000000
1370ns 00000040 : 027d0048 m[28588+72 ]=0           SW=00000000
1450ns 00000044 : 09700000 R[07]=00000000=0          SW=00000000
1530ns 00000048 : 027d0044 m[28588+68 ]=0          SW=00000000
1610ns 0000004c : 027d0040 m[28588+64 ]=0          SW=00000000
1690ns 00000050 : 0920000c R[02]=0000000c=12         SW=00000000
1770ns 00000054 : 022d0000 m[28588+0   ]=12         SW=00000000
1850ns 00000058 : 2b00007c R[00]=00000000=0          SW=00000000
1930ns 000000d8 : 09ddff98 R[13]=00006f44=28484      SW=00000000
2010ns 000000dc : 02ed0064 m[28484+100 ]=92         SW=00000000
2090ns 000000e0 : 027d0060 m[28484+96 ]=0          SW=00000000
2170ns 000000e4 : 012d0068 R[02]=0000000c=12         SW=00000000
2250ns 000000e8 : 022d005c m[28484+92 ]=12         SW=00000000
2330ns 000000ec : 0920000b R[02]=0000000b=11         SW=00000000
2410ns 000000f0 : 022d0058 m[28484+88 ]=11         SW=00000000
2490ns 000000f4 : 09200002 R[02]=00000002=2          SW=00000000
2570ns 000000f8 : 022d0054 m[28484+84 ]=2          SW=00000000
2650ns 000000fc : 0920fff5 R[02]=fffffff5=-11        SW=00000000
2730ns 00000100 : 022d001c m[28484+28   ]=-11        SW=00000000
2810ns 00000104 : 09700000 R[07]=00000000=0          SW=00000000
2890ns 00000108 : 027d0018 m[28484+24   ]=0          SW=00000000
2970ns 0000010c : 012d0058 R[02]=0000000b=11         SW=00000000
3050ns 00000110 : 1b220002 R[02]=00000002=2          SW=00000000
3130ns 00000114 : 022d0030 m[28484+48   ]=2          SW=00000000
3210ns 00000118 : 022d0000 m[28484+0   ]=2          SW=00000000
3290ns 0000011c : 2b000170 R[00]=00000000=0          SW=00000000
3370ns 00000290 : 09ddfff8 R[13]=00006f3c=28476      SW=00000000
3450ns 00000294 : 012d0008 R[02]=00000002=2          SW=00000000
3530ns 00000298 : 022d0004 m[28476+4   ]=2          SW=00000000
3610ns 0000029c : 09207000 R[02]=00007000=28672      SW=00000000
3690ns 000002a0 : 022d0000 m[28476+0   ]=28672      SW=00000000
3770ns 000002a4 : 013d0004 R[03]=00000002=2          SW=00000000
3850ns 000002a8 : 02320000 OUTPUT=2                  SW=00000000
3930ns 000002ac : 09dd0008 R[13]=00006f44=28484      SW=00000000
4010ns 000002b0 : 2ce00000 R[14]=00000120=288        SW=00000000
4090ns 00000120 : 012d001c R[02]=fffffff5=-11        SW=00000000
4170ns 00000124 : 1f220002 R[02]=3ffffffd=1073741821 SW=00000000
4250ns 00000128 : 022d0018 m[28484+24   ]=1073741821 SW=00000000
4330ns 0000012c : 022d0000 m[28484+0   ]=1073741821 SW=00000000
4410ns 00000130 : 2b00015c R[00]=00000000=0          SW=00000000
4490ns 00000290 : 09ddfff8 R[13]=00006f3c=28476      SW=00000000
4570ns 00000294 : 012d0008 R[02]=3ffffffd=1073741821 SW=00000000
4650ns 00000298 : 022d0004 m[28476+4   ]=1073741821 SW=00000000
4730ns 0000029c : 09207000 R[02]=00007000=28672      SW=00000000
4810ns 000002a0 : 022d0000 m[28476+0   ]=28672      SW=00000000
4890ns 000002a4 : 013d0004 R[03]=3ffffffd=1073741821 SW=00000000
4970ns 000002a8 : 02320000 OUTPUT=1073741821        SW=00000000
5050ns 000002ac : 09dd0008 R[13]=00006f44=28484      SW=00000000
5130ns 000002b0 : 2ce00000 R[14]=00000134=308        SW=00000000
5210ns 00000134 : 012d0054 R[02]=00000002=2          SW=00000000
5290ns 00000138 : 013d0058 R[03]=0000000b=11         SW=00000000
5370ns 0000013c : 11232000 R[02]=0000000d=13         SW=00000000
5450ns 00000140 : 022d0050 m[28484+80   ]=13         SW=00000000
5530ns 00000144 : 012d0054 R[02]=00000002=2          SW=00000000
5610ns 00000148 : 013d0058 R[03]=0000000b=11         SW=00000000
5690ns 0000014c : 14232000 R[02]=00000009=9          SW=00000000
5770ns 00000150 : 022d004c m[28484+76   ]=9          SW=00000000
5850ns 00000154 : 012d0054 R[02]=00000002=2          SW=00000000
5930ns 00000158 : 013d0058 R[03]=0000000b=11         SW=00000000
6010ns 0000015c : 15232000 R[02]=00000016=22 SW=00000000
6090ns 00000160 : 022d0048 m[28484+72 ]=22 SW=00000000
6170ns 00000164 : 012d0054 R[02]=00000002=2 SW=00000000
6250ns 00000168 : 013d0058 R[03]=0000000b=11 SW=00000000
6330ns 0000016c : 16320000 HI=00000001 LO=00000005 SW=00000000
6410ns 00000170 : 41200000 R[02]=00000005=5 SW=00000000
6490ns 00000174 : 022d0044 m[28484+68 ]=5 SW=00000000
6570ns 00000178 : 012d0054 R[02]=00000002=2 SW=00000000
6650ns 0000017c : 013d0058 R[03]=0000000b=11 SW=00000000
6730ns 00000180 : 18232000 R[02]=00000002=2 SW=00000000
6810ns 00000184 : 022d0040 m[28484+64 ]=2 SW=00000000
6890ns 00000188 : 012d0054 R[02]=00000002=2 SW=00000000
6970ns 0000018c : 013d0058 R[03]=0000000b=11 SW=00000000
7050ns 00000190 : 19232000 R[02]=0000000b=11 SW=00000000
7130ns 00000194 : 022d003c m[28484+60 ]=11 SW=00000000
7210ns 00000198 : 012d0054 R[02]=00000002=2 SW=00000000
7290ns 0000019c : 013d0058 R[03]=0000000b=11 SW=00000000
7370ns 000001a0 : 1a232000 R[02]=00000009=9 SW=00000000
7450ns 000001a4 : 022d0038 m[28484+56 ]=9 SW=00000000
7530ns 000001a8 : 012d0058 R[02]=0000000b=11 SW=00000000
7610ns 000001ac : 1e220002 R[02]=0000002c=44 SW=00000000
7690ns 000001b0 : 022d0034 m[28484+52 ]=44 SW=00000000
7770ns 000001b4 : 012d005c R[02]=0000000c=12 SW=00000000
7850ns 000001b8 : 013d0058 R[03]=0000000b=11 SW=00000000
7930ns 000001bc : 16320000 HI=0000000b LO=00000000 SW=00000000
8010ns 000001c0 : 40200000 R[02]=0000000b=11 SW=00000000
8090ns 000001c4 : 022d002c m[28484+44 ]=11 SW=00000000
8170ns 000001c8 : 09202aaa R[02]=00002aaa=10922 SW=00000000
8250ns 000001cc : 1e220010 R[02]=2aaa0000=715784192 SW=00000000
8330ns 000001d0 : 0d32aaab R[03]=2aaaaaab=715827883 SW=00000000
8410ns 000001d4 : 012d0058 R[02]=0000000b=11 SW=00000000
8490ns 000001d8 : 09220001 R[02]=0000000c=12 SW=00000000
8570ns 000001dc : 50230000 HI=00000002 LO=00000004 SW=00000000
8650ns 000001e0 : 40300000 R[03]=00000002=2 SW=00000000
8730ns 000001e4 : 1f43001f R[04]=00000000=0 SW=00000000
8810ns 000001e8 : 1b330001 R[03]=00000001=1 SW=00000000
8890ns 000001ec : 11334000 R[03]=00000001=1 SW=00000000
8970ns 000001f0 : 0940000c R[04]=0000000c=12 SW=00000000
9050ns 000001f4 : 15334000 R[03]=0000000c=12 SW=00000000
9130ns 000001f8 : 14223000 R[02]=00000000=0 SW=00000000
9210ns 000001fc : 022d0028 m[28484+40 ]=0 SW=00000000
9290ns 00000200 : 012d0058 R[02]=0000000b=11 SW=00000000
9370ns 00000204 : 1a227000 R[02]=0000000b=11 SW=00000000
9450ns 00000208 : 09300001 R[03]=00000001=1 SW=00000000
9530ns 0000020c : 1a223000 R[02]=0000000a=10 SW=00000000
9610ns 00000210 : 18223000 R[02]=00000000=0 SW=00000000
9690ns 00000214 : 022d0024 m[28484+36 ]=0 SW=00000000
9770ns 00000218 : 022d0000 m[28484+0 ]=0 SW=00000000
9850ns 0000021c : 2b000070 R[00]=00000000=0 SW=00000000
9930ns 00000290 : 09ddfff8 R[13]=00006f3c=28476 SW=00000000
10010ns 00000294 : 012d0008 R[02]=00000000=0 SW=00000000
10090ns 00000298 : 022d0004 m[28476+4 ]=0 SW=00000000
10170ns 0000029c : 09207000 R[02]=00007000=28672 SW=00000000
10250ns 000002a0 : 022d0000 m[28476+0 ]=28672 SW=00000000
10330ns 000002a4 : 013d0004 R[03]=00000000=0 SW=00000000
10410ns 000002a8 : 02320000 OUTPUT=0
10490ns 000002ac : 09dd0008 R[13]=00006f44=28484 SW=00000000
10570ns 000002b0 : 2ce00000 R[14]=00000220=544 SW=00000000
10650ns 00000220 : 092d0054 R[02]=00006f98=28568 SW=00000000
10730ns 00000224 : 022d0014 m[28484+20]=28568 SW=00000000
10810ns 00000228 : 012d0054 R[02]=00000002=2 SW=00000000
10890ns 0000022c : 022d0020 m[28484+32]=2 SW=00000000
10970ns 00000230 : 013d004c R[03]=00000009=9 SW=00000000
11050ns 00000234 : 014d0050 R[04]=0000000d=13 SW=00000000
11130ns 00000238 : 11343000 R[03]=00000016=22 SW=00000000
11210ns 0000023c : 014d0048 R[04]=00000016=22 SW=00000000
11290ns 00000240 : 11334000 R[03]=0000002c=44 SW=00000000
11370ns 00000244 : 014d0044 R[04]=00000005=5 SW=00000000
11450ns 00000248 : 11334000 R[03]=00000031=49 SW=00000000
11530ns 0000024c : 014d0040 R[04]=00000002=2 SW=00000000
11610ns 00000250 : 11334000 R[03]=00000033=51 SW=00000000
11690ns 00000254 : 014d003c R[04]=0000000b=11 SW=00000000
11770ns 00000258 : 11334000 R[03]=0000003e=62 SW=00000000
11850ns 0000025c : 014d0038 R[04]=00000009=9 SW=00000000
11930ns 00000260 : 11334000 R[03]=00000047=71 SW=00000000
12010ns 00000264 : 014d0034 R[04]=0000002c=44 SW=00000000
12090ns 00000268 : 11334000 R[03]=00000073=115 SW=00000000
12170ns 0000026c : 014d002c R[04]=0000000b=11 SW=00000000
12250ns 00000270 : 11334000 R[03]=0000007e=126 SW=00000000
12330ns 00000274 : 014d0028 R[04]=00000000=0 SW=00000000
12410ns 00000278 : 11334000 R[03]=0000007e=126 SW=00000000
12490ns 0000027c : 11232000 R[02]=00000080=128 SW=00000000
12570ns 00000280 : 017d0060 R[07]=00000000=0 SW=00000000
12650ns 00000284 : 01ed0064 R[14]=0000005c=92 SW=00000000
12730ns 00000288 : 09dd0068 R[13]=00006fac=28588 SW=00000000
12810ns 0000028c : 2ce00000 R[14]=0000005c=92 SW=00000000
12890ns 0000005c : 022d0040 m[28588+64]=128 SW=00000000
12970ns 00000060 : 022d0000 m[28588+0]=128 SW=00000000
13050ns 00000064 : 2b000228 R[00]=00000000=0 SW=00000000
13130ns 00000290 : 09ddfff8 R[13]=00006fa4=28580 SW=00000000
13210ns 00000294 : 012d0008 R[02]=00000080=128 SW=00000000
13290ns 00000298 : 022d0004 m[28580+4]=128 SW=00000000
13370ns 0000029c : 09207000 R[02]=00007000=28672 SW=00000000
13450ns 000002a0 : 022d0000 m[28580+0]=28672 SW=00000000
13530ns 000002a4 : 013d0004 R[03]=00000080=128 SW=00000000
13610ns 000002a8 : 02320000 OUTPUT=128
13690ns 000002ac : 09dd0008 R[13]=00006fac=28588 SW=00000000
13770ns 000002b0 : 2ce00000 R[14]=00000068=104 SW=00000000
13850ns 00000068 : 2b000248 R[00]=00000000=0 SW=00000000
13930ns 000002b4 : 09ddffe8 R[13]=00006f94=28564 SW=00000000
14010ns 000002b8 : 09300001 R[03]=00000001=1 SW=00000000
14090ns 000002bc : 023d0014 m[28564+20]=1 SW=00000000
14170ns 000002c0 : 09200002 R[02]=00000002=2 SW=00000000
14250ns 000002c4 : 022d0010 m[28564+16]=2 SW=00000000
14330ns 000002c8 : 09200003 R[02]=00000003=3 SW=00000000
14410ns 000002cc : 022d000c m[28564+12]=3 SW=00000000
14490ns 000002d0 : 09200004 R[02]=00000004=4 SW=00000000
14570ns 000002d4 : 022d0008 m[28564+8]=4 SW=00000000
14650ns 000002d8 : 09200005 R[02]=00000005=5 SW=00000000
14730ns 000002dc : 022d0004 m[28564+4]=5 SW=00000000
14810ns 000002e0 : 09200000 R[02]=00000000=0 SW=00000000
14890ns 000002e4 : 014d0014 R[04]=00000001=1 SW=00000000
14970ns 000002e8 : 10420000 R[04]=00000001=1 SW=00000000
15050ns 000002ec : 2000000c R[00]=00000000=0 SW=00000000
15130ns 000002f0 : 014d0014 R[04]=00000001=1 SW=00000000
15210ns 000002f4 : 09440001 R[04]=00000002=2 SW=00000000
15290ns 000002f8 : 024d0014 m[28564+20 ]=2 SW=00000000
15370ns 000002fc : 014d0010 R[04]=00000002=2 SW=00000000
15450ns 00000300 : 10430000 R[04]=00000002=2 SW=00000000
15530ns 00000304 : 2200000c R[00]=00000000=0 SW=00000000
15610ns 00000308 : 013d0010 R[03]=00000002=2 SW=00000000
15690ns 0000030c : 09330001 R[03]=00000003=3 SW=00000000
15770ns 00000310 : 023d0010 m[28564+16 ]=3 SW=00000000
15850ns 00000314 : 013d000c R[03]=00000003=3 SW=00000000
15930ns 00000318 : 10320000 R[03]=00000003=3 SW=00000000
16010ns 0000031c : 2200000c R[00]=00000000=0 SW=00000000
16090ns 00000320 : 013d000c R[03]=00000003=3 SW=00000000
16170ns 00000324 : 09330001 R[03]=00000004=4 SW=00000000
16250ns 00000328 : 023d000c m[28564+12 ]=4 SW=00000000
16330ns 0000032c : 0930ffff R[03]=ffffffff=-1 SW=00000000
16410ns 00000330 : 014d0008 R[04]=00000004=4 SW=00000000
16490ns 00000334 : 10430000 R[04]=00000004=4 SW=00000000
16570ns 00000338 : 2300000c R[00]=00000000=0 SW=00000000
16650ns 00000348 : 013d0004 R[03]=00000005=5 SW=00000000
16730ns 0000034c : 10320000 R[03]=00000005=5 SW=00000000
16810ns 00000350 : 2300000c R[00]=00000000=0 SW=00000000
16890ns 00000360 : 012d0010 R[02]=00000003=3 SW=00000000
16970ns 00000364 : 013d0014 R[03]=00000002=2 SW=00000000
17050ns 00000368 : 11232000 R[02]=00000005=5 SW=00000000
17130ns 0000036c : 013d000c R[03]=00000004=4 SW=00000000
17210ns 00000370 : 11223000 R[02]=00000009=9 SW=00000000
17290ns 00000374 : 013d0008 R[03]=00000004=4 SW=00000000
17370ns 00000378 : 11223000 R[02]=0000000d=13 SW=00000000
17450ns 0000037c : 013d0004 R[03]=00000005=5 SW=00000000
17530ns 00000380 : 11223000 R[02]=00000012=18 SW=00000000
17610ns 00000384 : 09dd0018 R[13]=00006fac=28588 SW=00000000
17690ns 00000388 : 2ce00000 R[14]=0000006c=108 SW=00000000
17770ns 0000006c : 013d0040 R[03]=00000080=128 SW=00000000
17850ns 00000070 : 11232000 R[02]=00000092=146 SW=00000000
17930ns 00000074 : 022d0040 m[28588+64 ]=146 SW=00000000
18010ns 00000078 : 022d0000 m[28588+0 ]=146 SW=00000000
18090ns 0000007c : 2b000210 R[00]=00000000=0 SW=00000000
18170ns 00000290 : 09ddfff8 R[13]=00006fa4=28580 SW=00000000
18250ns 00000294 : 012d0008 R[02]=00000092=146 SW=00000000
18330ns 00000298 : 022d0004 m[28580+4 ]=146 SW=00000000
18410ns 0000029c : 09207000 R[02]=00007000=28672 SW=00000000
18490ns 000002a0 : 022d0000 m[28580+0 ]=28672 SW=00000000
18570ns 000002a4 : 013d0004 R[03]=00000092=146 SW=00000000
18650ns 000002a8 : 02320000 OUTPUT=146
18730ns 000002ac : 09dd0008 R[13]=00006fac=28588 SW=00000000
18810ns 000002b0 : 2ce00000 R[14]=00000080=128 SW=00000000
18890ns 00000080 : 09200005 R[02]=00000005=5 SW=00000000
18970ns 00000084 : 022d0018 m[28588+24 ]=5 SW=00000000
19050ns 00000088 : 09200004 R[02]=00000004=4 SW=00000000
19130ns 0000008c : 022d0014 m[28588+20 ]=4 SW=00000000
19210ns 00000090 : 09200003 R[02]=00000003=3 SW=00000000
19290ns 00000094 : 022d0010 m[28588+16 ]=3 SW=00000000
19370ns 00000098 : 09200002 R[02]=00000002=2 SW=00000000
19450ns 0000009c : 022d000c m[28588+12 ]=2 SW=00000000
19530ns 000000a0 : 09200001 R[02]=00000001=1 SW=00000000
19610ns 000000a4 : 022d0008 m[28588+8 ]=1 SW=00000000
19690ns 000000a8 : 027d0004 m[28588+4 ]=0 SW=00000000
19770ns 000000ac : 09200006 R[02]=00000006=6 SW=00000000
19850ns 000000b0 : 022d0000 m[28588+0 ]=6 SW=00000000
19930ns 000000b4 : 2b0002d4 R[00]=00000000=0           SW=00000000
20010ns 0000038c : 09ddffe8 R[13]=00006f94=28564      SW=00000000
20090ns 00000390 : 012d0018 R[02]=00000006=6           SW=00000000
20170ns 00000394 : 022d0014 m[28564+20]=6             SW=00000000
20250ns 00000398 : 09200000 R[02]=00000000=0           SW=00000000
20330ns 0000039c : 022d0010 m[28564+16]=0             SW=00000000
20410ns 000003a0 : 022d000c m[28564+12]=0             SW=00000000
20490ns 000003a4 : 022d0008 m[28564+8]=0             SW=00000000
20570ns 000003a8 : 093d001c R[03]=00006fb0=28592      SW=00000000
20650ns 000003ac : 023d0004 m[28564+4]=28592        SW=00000000
20730ns 000003b0 : 022d0010 m[28564+16]=0             SW=00000000
20810ns 000003b4 : 012d0014 R[02]=00000006=6           SW=00000000
20890ns 000003b8 : 013d0010 R[03]=00000000=0           SW=00000000
20970ns 000003bc : 10320000 R[03]=00000000=0           SW=80000000
21050ns 000003c0 : 25000030 R[00]=00000000=0           SW=80000000
21130ns 000003c4 : 012d0004 R[02]=00006fb0=28592      SW=80000000
21210ns 000003c8 : 09320004 R[03]=00006fb4=28596      SW=80000000
21290ns 000003cc : 023d0004 m[28564+4]=28596        SW=80000000
21370ns 000003d0 : 01220000 R[02]=00000000=0           SW=80000000
21450ns 000003d4 : 022d000c m[28564+12]=0             SW=80000000
21530ns 000003d8 : 013d0008 R[03]=00000000=0           SW=80000000
21610ns 000003dc : 11232000 R[02]=00000000=0           SW=80000000
21690ns 000003e0 : 022d0008 m[28564+8]=0             SW=80000000
21770ns 000003e4 : 012d0010 R[02]=00000000=0           SW=80000000
21850ns 000003e8 : 09220001 R[02]=00000001=1           SW=80000000
21930ns 000003ec : 022d0010 m[28564+16]=1             SW=80000000
22010ns 000003f0 : 26ffffc0 R[15]=000003b4=948        SW=80000000
22090ns 000003b4 : 012d0014 R[02]=00000006=6           SW=80000000
22170ns 000003b8 : 013d0010 R[03]=00000001=1           SW=80000000
22250ns 000003bc : 10320000 R[03]=00000001=1           SW=80000000
22330ns 000003c0 : 25000030 R[00]=00000000=0           SW=80000000
22410ns 000003c4 : 012d0004 R[02]=00006fb4=28596      SW=80000000
22490ns 000003c8 : 09320004 R[03]=00006fb8=28600      SW=80000000
22570ns 000003cc : 023d0004 m[28564+4]=28600        SW=80000000
22650ns 000003d0 : 01220000 R[02]=00000001=1           SW=80000000
22730ns 000003d4 : 022d000c m[28564+12]=1             SW=80000000
22810ns 000003d8 : 013d0008 R[03]=00000000=0           SW=80000000
22890ns 000003dc : 11232000 R[02]=00000001=1           SW=80000000
22970ns 000003e0 : 022d0008 m[28564+8]=1             SW=80000000
23050ns 000003e4 : 012d0010 R[02]=00000001=1           SW=80000000
23130ns 000003e8 : 09220001 R[02]=00000002=2           SW=80000000
23210ns 000003ec : 022d0010 m[28564+16]=2             SW=80000000
23290ns 000003f0 : 26ffffc0 R[15]=000003b4=948        SW=80000000
23370ns 000003b4 : 012d0014 R[02]=00000006=6           SW=80000000
23450ns 000003b8 : 013d0010 R[03]=00000002=2           SW=80000000
23530ns 000003bc : 10320000 R[03]=00000002=2           SW=80000000
23610ns 000003c0 : 25000030 R[00]=00000000=0           SW=80000000
23690ns 000003c4 : 012d0004 R[02]=00006fb8=28600      SW=80000000
23770ns 000003c8 : 09320004 R[03]=00006fbc=28604      SW=80000000
23850ns 000003cc : 023d0004 m[28564+4]=28604        SW=80000000
23930ns 000003d0 : 01220000 R[02]=00000002=2           SW=80000000
24010ns 000003d4 : 022d000c m[28564+12]=2             SW=80000000
24090ns 000003d8 : 013d0008 R[03]=00000001=1           SW=80000000
24170ns 000003dc : 11232000 R[02]=00000003=3           SW=80000000
24250ns 000003e0 : 022d0008 m[28564+8]=3             SW=80000000
24330ns 000003e4 : 012d0010 R[02]=00000002=2           SW=80000000
24410ns 000003e8 : 09220001 R[02]=00000003=3           SW=80000000
24490ns 000003ec : 022d0010 m[28564+16]=3             SW=80000000
24570ns 000003f0 : 26ffffc0 R[15]=000003b4=948 SW=80000000
24650ns 000003b4 : 012d0014 R[02]=00000006=6 SW=80000000
24730ns 000003b8 : 013d0010 R[03]=00000003=3 SW=80000000
24810ns 000003bc : 10320000 R[03]=00000003=3 SW=80000000
24890ns 000003c0 : 25000030 R[00]=00000000=0 SW=80000000
24970ns 000003c4 : 012d0004 R[02]=00006fbc=28604 SW=80000000
25050ns 000003c8 : 09320004 R[03]=00006fc0=28608 SW=80000000
25130ns 000003cc : 023d0004 m[28564+4]=28608 SW=80000000
25210ns 000003d0 : 01220000 R[02]=00000003=3 SW=80000000
25290ns 000003d4 : 022d000c m[28564+12]=3 SW=80000000
25370ns 000003d8 : 013d0008 R[03]=00000003=3 SW=80000000
25450ns 000003dc : 11232000 R[02]=00000006=6 SW=80000000
25530ns 000003e0 : 022d0008 m[28564+8]=6 SW=80000000
25610ns 000003e4 : 012d0010 R[02]=00000003=3 SW=80000000
25690ns 000003e8 : 09220001 R[02]=00000004=4 SW=80000000
25770ns 000003ec : 022d0010 m[28564+16]=4 SW=80000000
25850ns 000003f0 : 26ffffc0 R[15]=000003b4=948 SW=80000000
25930ns 000003b4 : 012d0014 R[02]=00000006=6 SW=80000000
26010ns 000003b8 : 013d0010 R[03]=00000004=4 SW=80000000
26090ns 000003bc : 10320000 R[03]=00000004=4 SW=80000000
26170ns 000003c0 : 25000030 R[00]=00000000=0 SW=80000000
26250ns 000003c4 : 012d0004 R[02]=00006fc0=28608 SW=80000000
26330ns 000003c8 : 09320004 R[03]=00006fc4=28612 SW=80000000
26410ns 000003cc : 023d0004 m[28564+4]=28612 SW=80000000
26490ns 000003d0 : 01220000 R[02]=00000004=4 SW=80000000
26570ns 000003d4 : 022d000c m[28564+12]=4 SW=80000000
26650ns 000003d8 : 013d0008 R[03]=00000006=6 SW=80000000
26730ns 000003dc : 11232000 R[02]=0000000a=10 SW=80000000
26810ns 000003e0 : 022d0008 m[28564+8]=10 SW=80000000
26890ns 000003e4 : 012d0010 R[02]=00000004=4 SW=80000000
26970ns 000003e8 : 09220001 R[02]=00000005=5 SW=80000000
27050ns 000003ec : 022d0010 m[28564+16]=5 SW=80000000
27130ns 000003f0 : 26ffffc0 R[15]=000003b4=948 SW=80000000
27210ns 000003b4 : 012d0014 R[02]=00000006=6 SW=80000000
27290ns 000003b8 : 013d0010 R[03]=00000005=5 SW=80000000
27370ns 000003bc : 10320000 R[03]=00000005=5 SW=80000000
27450ns 000003c0 : 25000030 R[00]=00000000=0 SW=80000000
27530ns 000003c4 : 012d0004 R[02]=00006fc4=28612 SW=80000000
27610ns 000003c8 : 09320004 R[03]=00006fc8=28616 SW=80000000
27690ns 000003cc : 023d0004 m[28564+4]=28616 SW=80000000
27770ns 000003d0 : 01220000 R[02]=00000005=5 SW=80000000
27850ns 000003d4 : 022d000c m[28564+12]=5 SW=80000000
27930ns 000003d8 : 013d0008 R[03]=0000000a=10 SW=80000000
28010ns 000003dc : 11232000 R[02]=0000000f=15 SW=80000000
28090ns 000003e0 : 022d0008 m[28564+8]=15 SW=80000000
28170ns 000003e4 : 012d0010 R[02]=00000005=5 SW=80000000
28250ns 000003e8 : 09220001 R[02]=00000006=6 SW=80000000
28330ns 000003ec : 022d0010 m[28564+16]=6 SW=80000000
28410ns 000003f0 : 26ffffc0 R[15]=000003b4=948 SW=80000000
28490ns 000003b4 : 012d0014 R[02]=00000006=6 SW=80000000
28570ns 000003b8 : 013d0010 R[03]=00000006=6 SW=80000000
28650ns 000003bc : 10320000 R[03]=00000006=6 SW=40000000
28730ns 000003c0 : 25000030 R[00]=00000000=0 SW=40000000
28810ns 000003f4 : 012d0008 R[02]=0000000f=15 SW=40000000
28890ns 000003f8 : 09dd0018 R[13]=00006fac=28588 SW=40000000
28970ns 000003fc : 2ce00000 R[14]=000000b8=184 SW=40000000
29050ns 000000b8 : 022d0040 m[28588+64]=15 SW=40000000
29130ns 000000bc : 022d0000 m[28588+0]=15 SW=40000000
29210ns 000000c0 : 2b0001cc R[00]=00000000=0 SW=40000000
29290ns 00000290 : 09ddfff8 R[13]=00006fa4=28580 SW=40000000
29370ns 00000294 : 012d0008 R[02]=0000000f=15 SW=40000000
29450ns 00000298 : 022d0004 m[28580+4]=15 SW=40000000
29530ns 0000029c : 09207000 R[02]=00007000=28672 SW=40000000
29610ns 000002a0 : 022d0000 m[28580+0]=28672 SW=40000000
29690ns 000002a4 : 013d0004 R[03]=0000000f=15 SW=40000000
29770ns 000002a8 : 02320000 OUTPUT=15
29850ns 000002ac : 09dd0008 R[13]=00006fac=28588 SW=40000000
29930ns 000002b0 : 2ce00000 R[14]=000000c4=196 SW=40000000
30010ns 000000c4 : 012d0040 R[02]=0000000f=15 SW=40000000
30090ns 000000c8 : 017d0048 R[07]=00000000=0 SW=40000000
30170ns 000000cc : 01ed004c R[14]=ffffffff=-1 SW=40000000
30250ns 000000d0 : 09dd0050 R[13]=00006ffc=28668 SW=40000000
30330ns 000000d4 : 2ce00000 R[14]=ffffffff=-1 SW=40000000
RET to PC < 0, finished!
```

As the result above shows, cpu0s.v first dumps the memory after reading the input file cpu0s.hex. It then runs instructions starting from address 0 and prints each destination register value in the fourth column. The first column is the simulation time in nanoseconds, the second is the instruction address, and the third is the instruction encoding. We have checked that “>>” is correct for both signed and unsigned int types, and we track the value of variable **a** with print\_integer(). You can verify it against the **OUTPUT=xxx** lines in the Verilog output.

We show the Verilog PC output by displaying writes to the memory-mapped I/O address, but we did not implement the output hardware interface or port. The output interface/port depends on the hardware output device, such as RS232, a speaker, or LEDs. You should implement the I/O interface/port when you program an FPGA and wire an I/O device to the I/O port.
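As a rough illustration of memory-mapped output, the sketch below routes a store to one special address to an output device instead of RAM. It is not the tutorial's Verilog: `Bus`, `storeWord` and `outputLog` are hypothetical names, and the address 0x7000 is an assumption read off the trace above, where 0x7000 is loaded into R[02] just before the store that prints OUTPUT=15.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Assumption: 0x7000 is the memory-mapped output address seen in the trace.
const uint32_t kOutputAddr = 0x7000;

// Hypothetical bus model: one special address is an output port, the rest is RAM.
struct Bus {
    std::map<uint32_t, uint32_t> ram;   // ordinary word storage
    std::vector<uint32_t> outputLog;    // stands in for a real output device

    // A store to kOutputAddr goes to the device; any other store goes to RAM.
    void storeWord(uint32_t addr, uint32_t value) {
        assert(addr % 4 == 0 && "word stores must be aligned");
        if (addr == kOutputAddr)
            outputLog.push_back(value);
        else
            ram[addr] = value;
    }
};
```

On real hardware the `outputLog` branch would drive a device register (a UART, an LED bank, and so on) instead of appending to a vector.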

# BACKEND OPTIMIZATION

This chapter first introduces how to do backend optimization in LLVM. Next, it optimizes at the hardware level by redesigning the instruction set, creating an efficient RISC CPU aimed at high-level languages such as C/C++.

## 11.1 Cpu0 backend Optimization: Remove useless JMP

LLVM uses function passes in code generation and optimization. Following the three-tier compiler architecture, LLVM does much of its optimization in the middle tier, on LLVM IR in SSA form. Beyond this middle-tier optimization, there are optimization opportunities that depend on backend features. Filling the Mips delay slot is an example of a backend optimization used in pipelined RISC machines. You can adapt this part from Mips if your backend is a pipelined RISC with delay slots. In this section we apply a “delete useless jmp” optimization to unconditional branch instructions in the Cpu0 backend. The algorithm is simple and effective, which makes it a good tutorial example. From it you can learn how to add an optimization pass and then design more complicated optimization algorithms for your backend in a real project.

Chapter11\_1/ supports this optimization algorithm with the following added code:

### LLVMBackendTutorialExampleCode/Chapter11\_1/CMakeLists.txt

```
add_llvm_target(Cpu0CodeGen
...
Cpu0DelUselessJMP.cpp
...
)
```

### LLVMBackendTutorialExampleCode/Chapter11\_1/Cpu0.h

```
...
FunctionPass *createCpu0DelJmpPass(Cpu0TargetMachine &TM);

// Cpu0TargetMachine.cpp
class Cpu0PassConfig : public TargetPassConfig {
...
    virtual bool addPreEmitPass();
};

// Implemented by targets that want to run passes immediately before
// machine code is emitted. return true if -print-machineinstrs should
// print out the code after the passes.
```

```
bool Cpu0PassConfig::addPreEmitPass() {
    Cpu0TargetMachine &TM = getCpu0TargetMachine();
    addPass(createCpu0DelJmpPass(TM));
    return true;
}
```

### LLVMBackendTutorialExampleCode/Chapter11\_1/Cpu0DelUselessJMP.cpp

```
//===== Cpu0DelUselessJMP.cpp - Cpu0 DelJmp =====//
//
//          The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//=====//
//
// Simple pass to delete useless jmp instructions (a jmp to the next block).
//
//=====//

#define DEBUG_TYPE "del-jmp"

#include "Cpu0.h"
#include "Cpu0TargetMachine.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetInstrInfo.h"
#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/Statistic.h"

using namespace llvm;

STATISTIC(NumDelJmp, "Number of useless jmp deleted");

static cl::opt<bool> EnableDelJmp(
    "enable-cpu0-del-useless-jmp",
    cl::init(true),
    cl::desc("Delete useless jmp instructions: jmp 0."),
    cl::Hidden);

namespace {
    struct DelJmp : public MachineFunctionPass {

        TargetMachine &TM;
        const TargetInstrInfo *TII;

        static char ID;
        DelJmp(TargetMachine &tm)
            : MachineFunctionPass(ID), TM(tm), TII(tm.getInstrInfo()) { }

        virtual const char *getPassName() const {
            return "Cpu0 Del Useless jmp";
        }

        bool runOnMachineBasicBlock(MachineBasicBlock &MBB, MachineBasicBlock &MBBN);
        bool runOnMachineFunction(MachineFunction &F) {
            bool Changed = false;
            if (EnableDelJmp) {
                MachineFunction::iterator FJ = F.begin();
                if (FJ != F.end())
                    FJ++;
                if (FJ == F.end())
                    return Changed;
                for (MachineFunction::iterator FI = F.begin(), FE = F.end();
                     FJ != FE; ++FI, ++FJ) {
                    // In STL style, F.end() is the dummy BasicBlock() like '\0' in
                    // C string.
                    // FJ is the next BasicBlock of FI; When FI range from F.begin() to
                    // the PreviousBasicBlock of F.end() call runOnMachineBasicBlock().
                    Changed |= runOnMachineBasicBlock(*FI, *FJ);
                }
            }
            return Changed;
        }

    };
    char DelJmp::ID = 0;
} // end of anonymous namespace

bool DelJmp::
runOnMachineBasicBlock(MachineBasicBlock &MBB, MachineBasicBlock &MBBN) {
    bool Changed = false;

    MachineBasicBlock::iterator I = MBB.end();
    if (I != MBB.begin())
        I--; // set I to the last instruction
    else
        return Changed;

    if (I->getOpcode() == Cpu0::JMP && I->getOperand(0).getMBB() == &MBBN) {
        // I is the instruction of "jmp #offset=0", as follows,
        // jmp      $BB0_3
        // $BB0_3:
        //    ld      $4, 28($sp)
        ++NumDelJmp;
        MBB.erase(I); // delete the "JMP 0" instruction
        Changed = true; // Notify LLVM kernel Changed
    }
    return Changed;
}


/// createCpu0DelJmpPass - Returns a pass that DelJmp in Cpu0 MachineFunctions
FunctionPass *llvm::createCpu0DelJmpPass(Cpu0TargetMachine &tm) {
    return new DelJmp(tm);
}
```

As the code above shows, apart from Cpu0DelUselessJMP.cpp, the other files are changed to register class DelJmp as a function pass. As the comments in the code explain, MBB is the current block and MBBN is the next block. For the last instruction of every MBB, we check whether it is a JMP instruction whose operand is the next basic block. getMBB() in MachineOperand returns the address of the MBB. For the other member functions of MachineOperand, see include/llvm/CodeGen/MachineOperand.h. Let's run Chapter11\_1/ with ch11\_1.cpp to make this easier to explain.
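The check described above can also be tried outside LLVM. The following is a minimal sketch, assuming a function is just an ordered list of labeled blocks; `Inst`, `Block` and `deleteUselessJumps` are hypothetical stand-ins for illustration and not part of the LLVM API or of the tutorial's code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for MachineInstr and MachineBasicBlock.
struct Inst {
    std::string opcode;  // e.g. "jmp", "bne", "ld"
    std::string target;  // label operand for branches, empty otherwise
};

struct Block {
    std::string label;
    std::vector<Inst> insts;
};

// Remove a trailing "jmp L" from every block whose layout successor is L.
// Returns the number of deleted jumps (the role NumDelJmp plays in the pass).
int deleteUselessJumps(std::vector<Block> &func) {
    int deleted = 0;
    for (std::size_t i = 0; i + 1 < func.size(); ++i) {
        Block &cur = func[i];
        const Block &next = func[i + 1];
        if (cur.insts.empty())
            continue;
        const Inst &last = cur.insts.back();
        // A jmp without a target would be malformed input for this sketch.
        assert(last.opcode != "jmp" || !last.target.empty());
        if (last.opcode == "jmp" && last.target == next.label) {
            cur.insts.pop_back();  // falling through reaches the same block
            ++deleted;
        }
    }
    return deleted;
}
```

Only the last instruction of each block is inspected, and the last block of the function is never visited because it has no successor in layout order. Both properties mirror the FI/FJ iteration in the pass.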

### LLVMBackendTutorialExampleCode/InputFiles/ch11\_1.cpp

```
int main()
{
    int a = 0;
    int b = 1;
    int c = 2;

    if (a == 0) {
        a++;
    }
    if (b == 0) {
        a = a + b;
    } else if (b < 0) {
        a = a--;
    }
    if (c > 0) {
        c++;
    }

    return a;
}
```

```
118-165-78-10:InputFiles Jonathan$ clang -c ch11_1.cpp -emit-llvm -o ch11_1.bc
118-165-78-10:InputFiles Jonathan$ clang -target `llvm-config --host-target`
-c ch11_1.cpp -emit-llvm -o ch11_1.bc
118-165-78-10:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=static -filetype=asm -stats
ch11_1.bc -o ch11_1.cpu0.s
=====
          ... Statistics Collected ...
=====
...
2 del-jmp      - Number of useless jmp deleted
...
.

.section .mdebug.abi32
.previous
.file "ch11_1.bc"
.text
.globl main
.align 2
.type main,@function
.ent main          # @main
main:
.frame $sp,16,$lr
.mask 0x00000000,0
.set noreorder
.set nomacro
# BB#0:
    addiu $sp, $sp, -16
    addiu $2, $zero, 0
    st $2, 12($sp)
    st $2, 8($sp)
    addiu $2, $zero, 1
    st $2, 4($sp)
    addiu $2, $zero, 2
    st $2, 0($sp)
ld $2, 8($sp)
bne $2, $zero, $BB0_2
# BB#1:
ld $2, 8($sp)
addiu $2, $2, 1
st $2, 8($sp)
$BB0_2:
ld $2, 4($sp)
bne $2, $zero, $BB0_4
jmp $BB0_3
$BB0_4:
ld $2, 4($sp)
addiu $3, $zero, -1
slt $2, $3, $2
bne $2, $zero, $BB0_6
jmp $BB0_5
$BB0_3:
ld $2, 4($sp)
ld $3, 8($sp)
addu $2, $3, $2
st $2, 8($sp)
jmp $BB0_6
$BB0_5:
ld $2, 8($sp)
addiu $3, $2, -1
st $3, 8($sp)
st $2, 8($sp)
$BB0_6:
ld $2, 0($sp)
slti $2, $2, 1
bne $2, $zero, $BB0_8
# BB#7:
ld $2, 0($sp)
addiu $2, $2, 1
st $2, 0($sp)
$BB0_8:
ld $2, 8($sp)
addiu $sp, $sp, 16
ret $lr
.set macro
.set reorder
.end main
$tmp1:
.size main, ($tmp1)-main

```

The terminal displays “Number of useless jmp deleted” under the `llc -stats` option because the code declares “STATISTIC(NumDelJmp, “Number of useless jmp deleted”)”. The pass deletes 2 jmp instructions, from block “# BB#0” and “\$BB0\_6”. You can run `llc -enable-cpu0-del-useless-jmp=false` to see the difference from the unoptimized version. If you run it with `ch7_1_1.cpp`, you will find that 10 jmp instructions are deleted from 100 lines of assembly code, which means roughly a 10% improvement in speed and code size.

## 11.2 Cpu0 Optimization: Redesign instruction sets

If you compare the Cpu0 and Mips instruction sets, you will find the following:

1. Mips has two different add instructions, **addu** and **add**, which do not trigger and do trigger an overflow exception, respectively.

2. Mips uses SLT and BEQ and sets the status in an explicit, general-purpose register, while Cpu0 uses CMP and JEQ and sets the status in an implicit, dedicated register.

In the spirit of RISC, this section replaces CMP and JEQ with Mips-style instructions and supports both the exception-triggering and non-triggering arithmetic operations. Mips-style BEQ instructions also reduce the number of branch instructions, which improves both speed and code size.
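To make the difference between the two styles concrete, the sketch below models “branch if a >= b” both ways. It is a simplified illustration under stated assumptions: `StatusWord`, `cmp` and `jgeTaken` are a hypothetical model of the old flag-based style (overflow is ignored, and a JGE-like conditional jump is assumed), while `slt` and `beqTaken` follow the Operation column of the new instruction table.

```cpp
#include <cassert>
#include <cstdint>

// Old style (hypothetical model): CMP writes flags into an implicit status word.
struct StatusWord {
    bool negative;  // set when a < b
    bool zero;      // set when a == b
};

StatusWord cmp(int32_t a, int32_t b) {
    return StatusWord{a < b, a == b};
}

// A JGE-like jump only reads the implicit flags.
bool jgeTaken(const StatusWord &sw) { return !sw.negative; }

// New style: SLT writes 0 or 1 into a general register, BEQ compares registers.
int32_t slt(int32_t rb, int32_t rc) { return rb < rc ? 1 : 0; }
bool beqTaken(int32_t ra, int32_t rb) { return ra == rb; }

bool branchIfGeOld(int32_t a, int32_t b) { return jgeTaken(cmp(a, b)); }

// a >= b  is the same as  !(a < b), i.e. "BEQ (SLT a, b), $zero".
bool branchIfGeNew(int32_t a, int32_t b) { return beqTaken(slt(a, b), 0); }

// Both lowerings must make the same decision.
void checkSameDecision(int32_t a, int32_t b) {
    assert(branchIfGeOld(a, b) == branchIfGeNew(a, b));
}
```

In the new style the comparison result is an ordinary register value, so no instruction depends on hidden state left behind by an earlier CMP.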

### 11.2.1 Cpu0 new instruction sets table

The redesigned Cpu0 instruction set, with remapped opcodes, is as follows (opcode 0x00 is reserved for the NOP operation in a pipelined architecture).

- The first column, F., gives the instruction format.

Table 11.1: Cpu0 Instruction Set

| F. | Mnemonic | Opcode | Meaning                       | Syntax            | Operation                  |
|----|----------|--------|-------------------------------|-------------------|----------------------------|
| L  | LD       | 01     | Load word                     | LD Ra, [Rb+Cx]    | Ra <= [Rb+Cx]              |
| L  | ST       | 02     | Store word                    | ST Ra, [Rb+Cx]    | [Rb+Cx] <= Ra              |
| L  | LB       | 03     | Load byte                     | LB Ra, [Rb+Cx]    | Ra <= (byte)[Rb+Cx]        |
| L  | LBu      | 04     | Load byte unsigned            | LBu Ra, [Rb+Cx]   | Ra <= (byte)[Rb+Cx]        |
| L  | SB       | 05     | Store byte                    | SB Ra, [Rb+Cx]    | [Rb+Cx] <= (byte)Ra        |
| A  | LH       | 06     | Load half word                | LH Ra, [Rb+Cx]    | Ra <= (2bytes)[Rb+Cx]      |
| A  | LHu      | 07     | Load half word unsigned       | LHu Ra, [Rb+Cx]   | Ra <= (2bytes)[Rb+Cx]      |
| A  | SH       | 08     | Store half word               | SH Ra, [Rb+Cx]    | [Rb+Cx] <= Ra              |
| L  | ADDiu    | 09     | Add immediate                 | ADDiu Ra, Rb, Cx  | Ra <= (Rb + Cx)            |
| L  | SLTi     | 0A     | Set less than                 | SLTi Ra, Rb, Cx   | Ra <= (Rb < Cx)            |
| L  | SLTiu    | 0B     | SLTi unsigned                 | SLTiu Ra, Rb, Cx  | Ra <= (Rb < Cx)            |
| L  | ANDi     | 0C     | AND imm                       | ANDi Ra, Rb, Cx   | Ra <= (Rb & Cx)            |
| L  | ORi      | 0D     | OR                            | ORi Ra, Rb, Cx    | Ra <= (Rb \| Cx)           |
| L  | XORi     | 0E     | XOR                           | XORi Ra, Rb, Cx   | Ra <= (Rb ^ Cx)            |
| L  | LUi      | 0F     | Load upper                    | LUi Ra, Cx        | Ra <= (Cx\|\|0x0000)       |
| A  | ADDu     | 11     | Add unsigned                  | ADDu Ra, Rb, Rc   | Ra <= Rb + Rc              |
| A  | SUBu     | 12     | Sub unsigned                  | SUBu Ra, Rb, Rc   | Ra <= Rb - Rc              |
| A  | ADD      | 13     | Add                           | ADD Ra, Rb, Rc    | Ra <= Rb + Rc              |
| A  | SUB      | 14     | Subtract                      | SUB Ra, Rb, Rc    | Ra <= Rb - Rc              |
| A  | MUL      | 15     | Multiply                      | MUL Ra, Rb, Rc    | Ra <= Rb * Rc              |
| A  | DIV      | 16     | Divide                        | DIV Ra, Rb        | HI<=Ra%Rb, LO<=Ra/Rb       |
| A  | DIVu     | 17     | Div unsigned                  | DIVu Ra, Rb       | HI<=Ra%Rb, LO<=Ra/Rb       |
| A  | AND      | 18     | Bitwise and                   | AND Ra, Rb, Rc    | Ra <= Rb & Rc              |
| A  | OR       | 19     | Bitwise or                    | OR Ra, Rb, Rc     | Ra <= Rb \| Rc             |
| A  | XOR      | 1A     | Bitwise exclusive or          | XOR Ra, Rb, Rc    | Ra <= Rb ^ Rc              |
| A  | ROL      | 1C     | Rotate left                   | ROL Ra, Rb, Cx    | Ra <= Rb rol Cx            |
| A  | ROR      | 1D     | Rotate right                  | ROR Ra, Rb, Cx    | Ra <= Rb ror Cx            |
| A  | SHL      | 1E     | Shift left                    | SHL Ra, Rb, Cx    | Ra <= Rb << Cx             |
| A  | SHR      | 1F     | Shift right                   | SHR Ra, Rb, Cx    | Ra <= Rb >> Cx             |
| L  | BEQ      | 20     | Jump if equal                 | BEQ Ra, Rb, Cx    | if (Ra==Rb), PC <= PC + Cx |
| L  | BNE      | 21     | Jump if not equal             | BNE Ra, Rb, Cx    | if (Ra!=Rb), PC <= PC + Cx |
| J  | JMP      | 26     | Jump (unconditional)          | JMP Cx            | PC <= PC + Cx              |
| J  | SWI      | 2A     | Software interrupt            | SWI Cx            | LR <= PC; PC <= Cx         |
| J  | JSUB     | 2B     | Jump to subroutine            | JSUB Cx           | LR <= PC; PC <= PC + Cx    |
| J  | RET      | 2C     | Return from subroutine        | RET Cx            | PC <= LR                   |
| J  | IRET     | 2D     | Return from interrupt handler | IRET              | PC <= LR; INT 0            |
| J  | JR       | 2E     | Jump to subroutine            | JR Rb             | LR <= PC; PC <= Rb         |
| A  | SLT      | 30     | Set less than                 | SLT Ra, Rb, Rc    | Ra <= (Rb < Rc)            |
| A  | SLTu     | 31     | SLT unsigned                  | SLTu Ra, Rb, Rc   | Ra <= (Rb < Rc)            |
| L  | MFHI     | 40     | Move HI to GPR                | MFHI Ra           | Ra <= HI                   |
| L  | MFLO     | 41     | Move LO to GPR                | MFLO Ra           | Ra <= LO                   |
| L  | MTHI     | 42     | Move GPR to HI                | MTHI Ra           | HI <= Ra                   |
| L  | MTLO     | 43     | Move GPR to LO                | MTLO Ra           | LO <= Ra                   |
| L  | MULT     | 50     | Multiply for 64 bits result   | MULT Ra, Rb       | (HI,LO) <= MULT(Ra,Rb)     |
| L  | MULTU    | 51     | MULT for unsigned 64 bits     | MULTU Ra, Rb      | (HI,LO) <= MULTU(Ra,Rb)    |

As shown above, the OPu instructions, such as ADDu, are for unsigned integers or for operations that do not trigger an exception. As an example of LUi, “LUi \$2, 0x7000” loads 0x7000 into the high 16 bits of \$2 and fills the low 16 bits of \$2 with 0x0000.
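The effect of LUi, and the way it pairs with ORi to build a full 32-bit constant, can be sketched in a few lines. The helpers below (`lui`, `ori`, `hi16`, `lo16`, `loadImm32`) are hypothetical names that model the register arithmetic only; they are not the backend's code.

```cpp
#include <cassert>
#include <cstdint>

// LUi: place a 16-bit immediate in the high half, zero the low half.
uint32_t lui(uint32_t imm16) {
    assert(imm16 <= 0xffff);
    return imm16 << 16;
}

// ORi: OR a zero-extended 16-bit immediate into a register value.
uint32_t ori(uint32_t rb, uint32_t imm16) {
    assert(imm16 <= 0xffff);
    return rb | imm16;
}

uint32_t hi16(uint32_t value) { return value >> 16; }
uint32_t lo16(uint32_t value) { return value & 0xffff; }

// Build any 32-bit constant with the two-instruction sequence
//   lui d, hi16(value)
//   ori d, d, lo16(value)
uint32_t loadImm32(uint32_t value) {
    uint32_t reg = lui(hi16(value));
    return ori(reg, lo16(value));
}
```

For instance, `lui(0x7000)` yields 0x70000000, which matches the “LUi \$2, 0x7000” example in the text.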

### 11.2.2 Cpu0 code changes

Chapter11\_2/ includes the following changes for the new instruction set:

**LLVMBackendTutorialExampleCode/Chapter11\_2/AsmParser/Cpu0AsmParser.cpp**

```
// Cpu0AsmParser.cpp
void Cpu0AsmParser::expandLoadImm(MCInst &Inst, SMLoc IDLoc,
                                    SmallVectorImpl<MCInst> &Instructions) {
    MCInst tmpInst;
    const MCOperand &ImmOp = Inst.getOperand(1);
    assert(ImmOp.isImm() && "expected immediate operand kind");
    const MCOperand &RegOp = Inst.getOperand(0);
    assert(RegOp.isReg() && "expected register operand kind");

    int ImmValue = ImmOp.getImm();
    tmpInst.setLoc(IDLoc);
    if (0 <= ImmValue && ImmValue <= 65535) {
        // for 0 <= j <= 65535.
        // li d,j => ori d,$zero,j
        tmpInst.setOpcode(Cpu0::ORi);
        tmpInst.addOperand(MCOperand::CreateReg(RegOp.getReg()));
        tmpInst.addOperand(
            MCOperand::CreateReg(Cpu0::ZERO));
        tmpInst.addOperand(MCOperand::CreateImm(ImmValue));
        Instructions.push_back(tmpInst);
    } else if (ImmValue < 0 && ImmValue >= -32768) {
        // for -32768 <= j < 0.
        // li d,j => addiu d,$zero,j
        tmpInst.setOpcode(Cpu0::ADDiu); //TODO: no ADDiu64 in td files?
        tmpInst.addOperand(MCOperand::CreateReg(RegOp.getReg()));
        tmpInst.addOperand(
            MCOperand::CreateReg(Cpu0::ZERO));
        tmpInst.addOperand(MCOperand::CreateImm(ImmValue));
        Instructions.push_back(tmpInst);
    } else {
        // for any other value of j that is representable as a 32-bit integer.
        // li d,j => lui d,hi16(j)
        //           ori d,d,lo16(j)
        tmpInst.setOpcode(Cpu0::LUI);
        tmpInst.addOperand(MCOperand::CreateReg(RegOp.getReg()));
        tmpInst.addOperand(MCOperand::CreateImm((ImmValue & 0xffff0000) >> 16));
        Instructions.push_back(tmpInst);
        tmpInst.clear();
        tmpInst.setOpcode(Cpu0::ORI);
        tmpInst.addOperand(MCOperand::CreateReg(RegOp.getReg()));
        tmpInst.addOperand(MCOperand::CreateReg(RegOp.getReg()));
        tmpInst.addOperand(MCOperand::CreateImm(ImmValue & 0xffff));
        tmpInst.setLoc(IDLoc);
        Instructions.push_back(tmpInst);
    }
}

void Cpu0AsmParser::expandLoadAddressReg(MCInst &Inst, SMLoc IDLoc,
                                         SmallVectorImpl<MCInst> &Instructions) {
MCInst tmpInst;
const MCOperand &ImmOp = Inst.getOperand(2);
assert(ImmOp.isImm() && "expected immediate operand kind");
const MCOperand &SrcRegOp = Inst.getOperand(1);
assert(SrcRegOp.isReg() && "expected register operand kind");
const MCOperand &DstRegOp = Inst.getOperand(0);
assert(DstRegOp.isReg() && "expected register operand kind");
int ImmValue = ImmOp.getImm();
if ( -32768 <= ImmValue && ImmValue <= 32767) {
    // for -32768 <= j <= 32767.
    // la d,j(s) => addiu d,s,j
    tmpInst.setOpcode(Cpu0::ADDiu); //TODO: no ADDiu64 in td files?
    tmpInst.addOperand(MCOperand::CreateReg(DstRegOp.getReg()));
    tmpInst.addOperand(MCOperand::CreateReg(SrcRegOp.getReg()));
    tmpInst.addOperand(MCOperand::CreateImm(ImmValue));
    Instructions.push_back(tmpInst);
} else {
    // for any other value of j that is representable as a 32-bit integer.
    // la d,j(s) => lui d,hi16(j)
    //          ori d,d,lo16(j)
    //          add d,d,s
    tmpInst.setOpcode(Cpu0::LUI);
    tmpInst.addOperand(MCOperand::CreateReg(DstRegOp.getReg()));
    tmpInst.addOperand(MCOperand::CreateImm((ImmValue & 0xffff0000) >> 16));
    Instructions.push_back(tmpInst);
    tmpInst.clear();
    tmpInst.setOpcode(Cpu0::ORi);
    tmpInst.addOperand(MCOperand::CreateReg(DstRegOp.getReg()));
    tmpInst.addOperand(MCOperand::CreateReg(DstRegOp.getReg()));
    tmpInst.addOperand(MCOperand::CreateImm(ImmValue & 0xffff));
    Instructions.push_back(tmpInst);
    tmpInst.clear();
    tmpInst.setOpcode(Cpu0::ADD);
    tmpInst.addOperand(MCOperand::CreateReg(DstRegOp.getReg()));
    tmpInst.addOperand(MCOperand::CreateReg(DstRegOp.getReg()));
    tmpInst.addOperand(MCOperand::CreateReg(SrcRegOp.getReg()));
    Instructions.push_back(tmpInst);
}
}

```

```

void Cpu0AsmParser::expandLoadAddressImm(MCInst &Inst, SMLoc IDLoc,
                                         SmallVectorImpl<MCInst> &Instructions) {
    MCInst tmpInst;
    const MCOperand &ImmOp = Inst.getOperand(1);
    assert(ImmOp.isImm() && "expected immediate operand kind");
    const MCOperand &RegOp = Inst.getOperand(0);
    assert(RegOp.isReg() && "expected register operand kind");
    int ImmValue = ImmOp.getImm();
    if (-32768 <= ImmValue && ImmValue <= 32767) {
        // for -32768 <= j <= 32767.
        // la d,j => addiu d,$zero,j
        tmpInst.setOpcode(Cpu0::ADDiu);
        tmpInst.addOperand(MCOperand::CreateReg(RegOp.getReg()));
        tmpInst.addOperand(
            MCOperand::CreateReg(Cpu0::ZERO));
        tmpInst.addOperand(MCOperand::CreateImm(ImmValue));
        Instructions.push_back(tmpInst);
    } else {
        // for any other value of j that is representable as a 32-bit integer.
        // la d,j => lui d,hi16(j)
        //          ori d,d,lo16(j)
        tmpInst.setOpcode(Cpu0::LUI);
        tmpInst.addOperand(MCOperand::CreateReg(RegOp.getReg()));
        tmpInst.addOperand(MCOperand::CreateImm((ImmValue & 0xffff0000) >> 16));
        Instructions.push_back(tmpInst);
        tmpInst.clear();
        tmpInst.setOpcode(Cpu0::ORi);
        tmpInst.addOperand(MCOperand::CreateReg(RegOp.getReg()));
        tmpInst.addOperand(MCOperand::CreateReg(RegOp.getReg()));
        tmpInst.addOperand(MCOperand::CreateImm(ImmValue & 0xffff));
        Instructions.push_back(tmpInst);
    }
}

int Cpu0AsmParser::matchRegisterName(StringRef Name) {
    ...
    .Case("t0", Cpu0::T0)
    ...
}

```

### LLVMBackendTutorialExampleCode/Chapter11\_2/Disassembler/Cpu0Disassembler.cpp

```

// Decoder tables for Cpu0 register
static const unsigned CPURegsTable[] = {
// Change SW to T0 which is a caller saved
    Cpu0::T0, ...
};

// DecodeCMPInstruction() function is removed since No CMP instruction.
...

// Change DecodeBranchTarget() to following for 16 bit offset
static DecodeStatus DecodeBranchTarget(MCInst &Inst,
                                         unsigned Insn,
                                         uint64_t Address,
                                         const void *Decoder) {
    int BranchOffset = fieldFromInstruction(Insn, 0, 16);
    // Sign-extend the 16-bit field: values above 0x7fff are negative offsets.
    if (BranchOffset > 0x7fff)
        BranchOffset = -1*(0x10000 - BranchOffset);
    Inst.addOperand(MCOperand::CreateImm(BranchOffset));
    return MCDisassembler::Success;
}
```
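The decoding step above is a two's-complement sign extension of a 16-bit field. A standalone sketch follows, assuming the field has already been extracted as an unsigned value; `signExtend16` is a hypothetical helper and not part of the disassembler. A 16-bit field encodes a negative offset exactly when its top bit is set, that is, when it is greater than 0x7fff.

```cpp
#include <cassert>
#include <cstdint>

// Interpret the low 16 bits of an instruction as a signed branch offset.
int32_t signExtend16(uint32_t field) {
    assert(field <= 0xffff);
    if (field > 0x7fff)                            // top bit set: negative
        return static_cast<int32_t>(field) - 0x10000;
    return static_cast<int32_t>(field);            // otherwise unchanged
}
```

With this helper, the field 0xffc0 decodes to -64 and 0x0030 decodes to 48.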

### LLVMBackendTutorialExampleCode/Chapter11\_2/MCTargetDesc/Cpu0AsmBackend.cpp

```
static unsigned adjustFixupValue(unsigned Kind, uint64_t Value) {
    ...
    // Add/subtract and shift
    switch (Kind) {
    ...
    case Cpu0::fixup_Cpu0_PC16:
    case Cpu0::fixup_Cpu0_PC24:
        // So far we are only using this type for branches.
        // For branches we start 1 instruction after the branch
        // so the displacement will be one instruction size less.
        Value -= 4;
        break;
    ...
}
...
const MCFixupKindInfo &getFixupKindInfo(MCFixupKind Kind) const {
    const static MCFixupKindInfo Infos[Cpu0::NumTargetFixupKinds] = {
        // This table *must* be in the same order as the fixup_* kinds in
        // Cpu0FixupKinds.h.
        //
        // name          offset  bits  flags
        ...
        { "fixup_Cpu0_PC16",        0,      16,  MCFixupKindInfo::FKF_IsPCRel },
    ...
}
```
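The `Value -= 4` adjustment follows from how the branch is executed: the offset is added to the address of the instruction after the branch. The sketch below models that arithmetic with two hypothetical helpers, `pcRelOffset` and `branchTarget`; it illustrates the displacement rule only and is not the backend's fixup code.

```cpp
#include <cassert>
#include <cstdint>

// Offset to encode in a branch at branchAddr so that it lands on targetAddr.
// The hardware adds the offset to the address of the next instruction,
// so the encoded displacement is one instruction size (4 bytes) smaller.
int32_t pcRelOffset(uint32_t branchAddr, uint32_t targetAddr) {
    assert(branchAddr % 4 == 0 && targetAddr % 4 == 0);
    return static_cast<int32_t>(targetAddr - branchAddr) - 4;
}

// Inverse direction: where a branch at branchAddr with this offset goes.
uint32_t branchTarget(uint32_t branchAddr, int32_t offset) {
    return branchAddr + 4 + static_cast<uint32_t>(offset);
}
```

The Verilog trace earlier in this document agrees with this rule: the instruction 26ffffc0 at address 0x3f0 carries the offset -64 and is followed by the instruction at 0x3b4, and 25000030 at 0x3c0 carries the offset 0x30 and is followed by the instruction at 0x3f4.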

### LLVMBackendTutorialExampleCode/Chapter11\_2/MCTargetDesc/Cpu0BaseInfo.h

```
inline static unsigned getCpu0RegisterNumbering(unsigned RegEnum)
{
    switch (RegEnum) {
    ...
    case Cpu0::T0:
    ...
    }
}
```

### LLVMBackendTutorialExampleCode/Chapter11\_2/MCTargetDesc/Cpu0FixupKinds.h

```
enum Fixups {
    ...
    // PC relative branch fixup resulting in - R_CPU0_PC16.
    // cpu0 PC16, e.g. beq
    fixup_Cpu0_PC16,
    ...
};
```

### LLVMBackendTutorialExampleCode/Chapter11\_2/MCTargetDesc/Cpu0MCCodeEmitter.cpp

```
unsigned Cpu0MCCodeEmitter::  
getBranchTargetOpValue(const MCInst &MI, unsigned OpNo,  
                      SmallVectorImpl<MCFixup> &Fixups) const {  
    ...  
    Fixups.push_back(MCFixup::Create(0, Expr,  
                                     MCFixupKind(Cpu0::fixup_Cpu0_PC16)));  
    return 0;  
}  
...  
unsigned Cpu0MCCodeEmitter::  
getJumpTargetOpValue(const MCInst &MI, unsigned OpNo,  
                     SmallVectorImpl<MCFixup> &Fixups) const {  
    ...  
    if (Opcode == Cpu0::JSUB || Opcode == Cpu0::JMP)  
    ...  
}
```

### LLVMBackendTutorialExampleCode/Chapter11\_2/Cpu0InstrInfo.td

```
def jmptarget : Operand<OtherVT> {  
    let EncoderMethod = "getJumpTargetOpValue";  
    let OperandType = "OPERAND_PCREL";  
    let DecoderMethod = "DecodeJumpRelativeTarget";  
}  
...  
// Immediate can be loaded with LUi (32-bit int with lower 16-bit cleared).  
def immLow16Zero : PatLeaf<(imm), [{  
    int64_t Val = N->getSExtValue();  
    return isInt<32>(Val) && !(Val & 0xffff);  
}]>;  
...  
class ArithOverflowR<bits<8> op, string instr_asm,  
                  InstrItinClass itin, RegisterClass RC, bit isComm = 0>:  
    FA<op, (outs RC:$ra), (ins RC:$rb, RC:$rc),  
    !strconcat(instr_asm, "\t$ra, $rb, $rc"), [], itin> {  
    let shamt = 0;  
    let isCommutable = isComm;  
}  
// Conditional Branch  
class CBranch<bits<8> op, string instr_asm, PatFrag cond_op, RegisterClass RC>:  
    FL<op, (outs), (ins RC:$ra, RC:$rb, brtarget:$imm16),  
    !strconcat(instr_asm, "\t$ra, $rb, $imm16"),  
    [(brcond (i32 (cond_op RC:$ra, RC:$rb)), bb:$imm16)], IIBranch> {  
    let isBranch = 1;  
    let isTerminator = 1;  
    let hasDelaySlot = 1;  
    let Defs = [AT];  
}  
...  
// SetCC
class SetCC_R<bits<8> op, string instr_asm, PatFrag cond_op,
    RegisterClass RC>:
    FA<op, (outs CPURegs:$ra), (ins RC:$rb, RC:$rc),
        !strconcat(instr_asm, "\t$ra, $rb, $rc"),
        [(set CPURegs:$ra, (cond_op RC:$rb, RC:$rc))],
    IIAlu> {
    let shamt = 0;
}

class SetCC_I<bits<8> op, string instr_asm, PatFrag cond_op, Operand Od,
    PatLeaf imm_type, RegisterClass RC>:
    FL<op, (outs CPURegs:$ra), (ins RC:$rb, Od:$imm16),
        !strconcat(instr_asm, "\t$ra, $rb, $imm16"),
        [(set CPURegs:$ra, (cond_op RC:$rb, imm_type:$imm16))],
    IIAlu>;
...
/// Load and Store Instructions
/// aligned
defm LD      : LoadM32<0x01, "ld", load_a>;
defm ST      : StoreM32<0x02, "st", store_a>;

/// Arithmetic Instructions (ALU Immediate)
// add defined in include/llvm/Target/TargetSelectionDAG.td, line 315 (def add).
def ADDiu   : ArithLogicI<0x09, "addiu", add, simm16, immSExt16, CPURegs>;
def SLTi    : SetCC_I<0x0a, "slti", setlt, simm16, immSExt16, CPURegs>;
def SLTiu   : SetCC_I<0x0b, "sltiu", setult, simm16, immSExt16, CPURegs>;
def ANDi    : ArithLogicI<0x0c, "andi", and, uimm16, immZExt16, CPURegs>;
def ORi     : ArithLogicI<0x0d, "ori", or, uimm16, immZExt16, CPURegs>;
def XORi   : ArithLogicI<0x0e, "xori", xor, uimm16, immZExt16, CPURegs>;
def LUI     : LoadUpper<0x0f, "lui", CPURegs, uimm16>;

/// Arithmetic Instructions (3-Operand, R-Type)
def ADDu    : ArithLogicR<0x11, "addu", add, IIAlu, CPURegs, 1>;
def SUBu   : ArithLogicR<0x12, "subu", sub, IIAlu, CPURegs>;
def ADD     : ArithOverflowR<0x13, "add", IIAlu, CPURegs, 1>;
def SUB    : ArithOverflowR<0x14, "sub", IIAlu, CPURegs>;
def MUL    : ArithLogicR<0x15, "mul", mul, IIImul, CPURegs, 1>;
def DIV    : Div32<Cpu0DivRem, 0x16, "div", IIIdiv>;
def DIVu   : Div32<Cpu0DivRemU, 0x17, "divu", IIIdiv>;
def AND    : ArithLogicR<0x18, "and", and, IIAlu, CPURegs, 1>;
def OR     : ArithLogicR<0x19, "or", or, IIAlu, CPURegs, 1>;
def XOR    : ArithLogicR<0x1A, "xor", xor, IIAlu, CPURegs, 1>;

def SLT    : SetCC_R<0x30, "slt", setlt, CPURegs>;
def SLTu   : SetCC_R<0x31, "sltu", setult, CPURegs>;
...

/// Jump and Branch Instructions
def BEQ    : CBranch<0x20, "beq", seteq, CPURegs>;
def BNE    : CBranch<0x21, "bne", setne, CPURegs>;

def JMP    : UncondBranch<0x26, "jmp">;

//=====
// Arbitrary patterns that map to one or more instructions
//=====

// Small immediates
...
def : Pat<(i32 immZExt16:$in),
          (ORi ZERO, imm:$in)>;
def : Pat<(i32 immLow16Zero:$in),
          (LUI (HI16 imm:$in))>;

// Arbitrary immediates
def : Pat<(i32 imm:$imm),
          (ORi (LUI (HI16 imm:$imm)), (LO16 imm:$imm))>;
...

// gp_rel relocs
...
def : Pat<(not CPURegs:$in),
          (XORi CPURegs:$in, 1)>;

// brcond patterns
multiclass BrcondPats<RegisterClass RC, Instruction BEQOp, Instruction BNEOp,
                      Instruction SLTOp, Instruction SLTuOp, Instruction SLTiOp,
                      Instruction SLTiuOp, Register ZEROReg> {
    def : Pat<(brcond (i32 (setne RC:$lhs, 0)), bb:$dst),
              (BNEOp RC:$lhs, ZEROReg, bb:$dst)>;
    def : Pat<(brcond (i32 (seteq RC:$lhs, 0)), bb:$dst),
              (BEQOp RC:$lhs, ZEROReg, bb:$dst)>;

    def : Pat<(brcond (i32 (setge RC:$lhs, RC:$rhs)), bb:$dst),
              (BEQ (SLTOp RC:$lhs, RC:$rhs), ZERO, bb:$dst)>;
    def : Pat<(brcond (i32 (setuge RC:$lhs, RC:$rhs)), bb:$dst),
              (BEQ (SLTuOp RC:$lhs, RC:$rhs), ZERO, bb:$dst)>;
    def : Pat<(brcond (i32 (setge RC:$lhs, immSExt16:$rhs)), bb:$dst),
              (BEQ (SLTiOp RC:$lhs, immSExt16:$rhs), ZERO, bb:$dst)>;
    def : Pat<(brcond (i32 (setuge RC:$lhs, immSExt16:$rhs)), bb:$dst),
              (BEQ (SLTiuOp RC:$lhs, immSExt16:$rhs), ZERO, bb:$dst)>;

    def : Pat<(brcond (i32 (setle RC:$lhs, RC:$rhs)), bb:$dst),
              (BEQ (SLTOp RC:$rhs, RC:$lhs), ZERO, bb:$dst)>;
    def : Pat<(brcond (i32 (setule RC:$lhs, RC:$rhs)), bb:$dst),
              (BEQ (SLTuOp RC:$rhs, RC:$lhs), ZERO, bb:$dst)>;

    def : Pat<(brcond RC:$cond, bb:$dst),
              (BNEOp RC:$cond, ZEROReg, bb:$dst)>;
}

defm : BrcondPats<CPURegs, BEQ, BNE, SLT, SLTu, SLTi, SLTiu, ZERO>;

// setcc patterns
multiclass SeteqPats<RegisterClass RC, Instruction SLTiuOp, Instruction XOROp,
                     Instruction SLTuOp, Register ZEROReg> {
    def : Pat<(seteq RC:$lhs, RC:$rhs),
              (SLTiuOp (XOROp RC:$lhs, RC:$rhs), 1)>;
    def : Pat<(setne RC:$lhs, RC:$rhs),
              (SLTuOp ZEROReg, (XOROp RC:$lhs, RC:$rhs))>;
}

multiclass SetlePats<RegisterClass RC, Instruction SLTOp, Instruction SLTuOp> {
    def : Pat<(setle RC:$lhs, RC:$rhs),
              (XORi (SLTOp RC:$rhs, RC:$lhs), 1)>;
    def : Pat<(setule RC:$lhs, RC:$rhs),
              (XORi (SLTuOp RC:$rhs, RC:$lhs), 1)>;
}

multiclass SetgtPats<RegisterClass RC, Instruction SLTOp, Instruction SLTuOp> {
    def : Pat<(setgt RC:$lhs, RC:$rhs),
              (SLTOp RC:$rhs, RC:$lhs)>;
    def : Pat<(setugt RC:$lhs, RC:$rhs),
              (SLTuOp RC:$rhs, RC:$lhs)>;
}

multiclass SetgePats<RegisterClass RC, Instruction SLTOp, Instruction SLTuOp> {
    def : Pat<(setge RC:$lhs, RC:$rhs),
              (XORi (SLTOp RC:$lhs, RC:$rhs), 1)>;
    def : Pat<(setuge RC:$lhs, RC:$rhs),
              (XORi (SLTuOp RC:$lhs, RC:$rhs), 1)>;
}

multiclass SetgeImmPats<RegisterClass RC, Instruction SLTiOp,
                        Instruction SLTiuOp> {
    def : Pat<(setge RC:$lhs, immSExt16:$rhs),
              (XORi (SLTiOp RC:$lhs, immSExt16:$rhs), 1)>;
    def : Pat<(setuge RC:$lhs, immSExt16:$rhs),
              (XORi (SLTiuOp RC:$lhs, immSExt16:$rhs), 1)>;
}

defm : SeteqPats<CPURegs, SLTiu, XOR, SLTu, ZERO>;
defm : SetlePats<CPURegs, SLT, SLTu>;
defm : SetgtPats<CPURegs, SLT, SLTu>;
defm : SetgePats<CPURegs, SLT, SLTu>;
defm : SetgeImmPats<CPURegs, SLTi, SLTiu>;
```

### LLVMBackendTutorialExampleCode/Chapter11\_2/Cpu0MCInstLower.cpp

```

// Lower ".cpload $reg" to
// "lui $gp, %hi(_gp_disp)"
// "addiu $gp, $gp, %lo(_gp_disp)"
// "addu $gp, $gp, $t9"
void Cpu0MCInstLower::LowerCPLOAD(SmallVector<MCInst, 4>& MCInsts) {
    ...
    MCInsts.resize(3);

    CreateMCInst(MCInsts[0], Cpu0::LUI, GPReg, ZEROReg, SymHi);
    CreateMCInst(MCInsts[1], Cpu0::ADDiu, GPReg, GPReg, SymLo);
    CreateMCInst(MCInsts[2], Cpu0::ADD, GPReg, GPReg, T9Reg);
    ...
}

// Lower ".cprestore offset" to "st $gp, offset($sp)".
void Cpu0MCInstLower::LowerCPRESTORE(int64_t Offset,
                                     SmallVector<MCInst, 4>& MCInsts) {
    ...
    // lui at,hi
    // add at,at,sp
    MCInsts.resize(2);

    CreateMCInst(MCInsts[0], Cpu0::LUI, ATReg, ZEROReg, MCOperand::CreateImm(Hi));
    CreateMCInst(MCInsts[1], Cpu0::ADD, ATReg, ATReg, SPReg);
}

```

#### LLVMBackendTutorialExampleCode/Chapter11\_2/Cpu0RegisterInfo.td

```

let Namespace = "Cpu0" in {
  ...
  def T0    : Cpu0GPRReg< 12, "t0">,   DwarfRegNum<[12]>;
  ...
}

def CPURegs : RegisterClass<"Cpu0", [i32], 32, (add
  T0,
  // Reserved
  SP, LR, PC)>;

// Remove SR RegisterClass since no SW in General register
// Status Registers
/* def SR  : RegisterClass<"Cpu0", [i32], 32, (add SW)>; */

```

Compared with Chapter11\_1/, the changes above remove the CMP instruction, the SW register and the code related to them, and replace JEQ (24-bit offset) with BEQ (16-bit offset). In addition, the “ADDiu, SHL 16” pair is replaced with the more efficient LUI instruction.
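With CMP and SW gone, every comparison has to be built from the SLT family plus XOR, which is what the setcc patterns in Cpu0InstrInfo.td express. The following is a minimal sketch of those identities in plain C++, assuming 32-bit two's-complement registers; the helper names (`slt`, `sltu`, `seteq`, and so on) are illustrative only and are not part of the Cpu0 backend.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative models of the Cpu0 set-on-less-than instructions.
// Each returns 1 when the comparison holds and 0 otherwise.
static uint32_t slt(int32_t a, int32_t b)    { return a < b ? 1u : 0u; }
static uint32_t sltu(uint32_t a, uint32_t b) { return a < b ? 1u : 0u; }

// seteq  -> (SLTiu (XOR lhs, rhs), 1): the XOR is 0 only when lhs == rhs.
uint32_t seteq(uint32_t l, uint32_t r) { return sltu(l ^ r, 1u); }
// setne  -> (SLTu ZERO, (XOR lhs, rhs)): 0 < x holds for any non-zero x.
uint32_t setne(uint32_t l, uint32_t r) { return sltu(0u, l ^ r); }
// setgt  -> (SLT rhs, lhs): swap the operands of "less than".
uint32_t setgt(int32_t l, int32_t r) { return slt(r, l); }
// setle  -> (XORi (SLT rhs, lhs), 1): the negation of setgt.
uint32_t setle(int32_t l, int32_t r) { return slt(r, l) ^ 1u; }
// setge  -> (XORi (SLT lhs, rhs), 1): the negation of "less than".
uint32_t setge(int32_t l, int32_t r) { return slt(l, r) ^ 1u; }
// setuge -> (XORi (SLTu lhs, rhs), 1): the unsigned variant of setge.
uint32_t setuge(uint32_t l, uint32_t r) { return sltu(l, r) ^ 1u; }
```

For example, `setle(-1, 0)` evaluates `slt(0, -1) ^ 1`, that is 0 ^ 1 = 1, so no status register is needed to materialize the result.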

#### LLVMBackendTutorialExampleCode/Chapter11\_2/Cpu0AnalyzeImmediate.h

```

void ReplaceADDiuSHLWithLUI(InstSeq &Seq);

/// GetShortestSeq - Find the shortest instruction sequence in SeqLs and
/// return it in Insts.
void GetShortestSeq(InstSeqLs &SeqLs, InstSeq &Insts);

```

#### LLVMBackendTutorialExampleCode/Chapter11\_2/Cpu0AnalyzeImmediate.cpp

```

// e.g. the following two instructions
// ADDiu 0x0111
// SHL 18
// are replaced with
// LUI 0x444
void Cpu0AnalyzeImmediate::ReplaceADDiuSHLWithLUI(InstSeq &Seq) {
  // Check if the first two instructions are ADDiu and SHL and the shift amount
  // is at least 16.
  if ((Seq.size() < 2) || (Seq[0].Opc != ADDiu) ||
    (Seq[1].Opc != SHL) || (Seq[1].ImmOpnd < 16))
    return;

  // Sign-extend and shift operand of ADDiu and see if it still fits in 16-bit.
  int64_t Imm = SignExtend64<16>(Seq[0].ImmOpnd);
  int64_t ShiftedImm = (uint64_t)Imm << (Seq[1].ImmOpnd - 16);

  if (!isInt<16>(ShiftedImm))
    return;

  // Replace the first instruction and erase the second.
  Seq[0].Opc = LUi;
  Seq[0].ImmOpnd = (unsigned)(ShiftedImm & 0xffff);
  Seq.erase(Seq.begin() + 1);
}

void Cpu0AnalyzeImmediate::GetShortestSeq(InstSeqLs &SeqLs, InstSeq &Insts) {
    InstSeqLs::iterator ShortestSeq = SeqLs.end();
    // The length of an instruction sequence is at most 7.
    unsigned ShortestLength = 8;

    for (InstSeqLs::iterator S = SeqLs.begin(); S != SeqLs.end(); ++S) {
        ReplaceADDiuSHLWithLUI(*S);
        assert(S->size() <= 7);

        if (S->size() < ShortestLength) {
            ShortestSeq = S;
            ShortestLength = S->size();
        }
    }

    Insts.clear();
    Insts.append(ShortestSeq->begin(), ShortestSeq->end());
}

const Cpu0AnalyzeImmediate::InstSeq
&Cpu0AnalyzeImmediate::Analyze(uint64_t Imm, unsigned Size,
                                bool LastInstrIsADDiu) {
    this->Size = Size;

    ADDiu = Cpu0::ADDiu;
    ORi = Cpu0::ORi;
    SHL = Cpu0::SHL;
    LUi = Cpu0::LUi;

    InstSeqLs SeqLs;

    // Get the list of instruction sequences.
    if (LastInstrIsADDiu | !Imm)
        GetInstSeqLsADDiu(Imm, Size, SeqLs);
    else
        GetInstSeqLs(Imm, Size, SeqLs);

    // Set Insts to the shortest instruction sequence.
    GetShortestSeq(SeqLs, Insts);

    return Insts;
}

```
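The folding step can also be tried outside of LLVM. The block below is a simplified, self-contained sketch of the same check, assuming a hypothetical `Inst`/`InstSeq` pair of types and a hypothetical function name `foldADDiuSHL`; it uses `std::vector` in place of the LLVM containers and plain range checks in place of `SignExtend64` and `isInt`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the Cpu0 opcodes and the InstSeq type.
enum Opcode { ADDiu, ORi, SHL, LUi };
struct Inst { Opcode Opc; unsigned ImmOpnd; };
using InstSeq = std::vector<Inst>;

// Folds a leading "ADDiu imm; SHL n" (n >= 16) into a single "LUi imm'".
// Returns true when the fold was applied.
bool foldADDiuSHL(InstSeq &Seq) {
    if (Seq.size() < 2 || Seq[0].Opc != ADDiu || Seq[1].Opc != SHL ||
        Seq[1].ImmOpnd < 16)
        return false;

    // Sign-extend the 16-bit ADDiu immediate.
    int64_t Low = Seq[0].ImmOpnd & 0xffff;
    int64_t Imm = (Low ^ 0x8000) - 0x8000;

    // LUi already shifts by 16, so only the remaining shift is applied here.
    int64_t Shifted = Imm * (int64_t(1) << (Seq[1].ImmOpnd - 16));

    // The result must still fit in a signed 16-bit immediate.
    if (Shifted < -32768 || Shifted > 32767)
        return false;

    Seq[0].Opc = LUi;
    Seq[0].ImmOpnd = static_cast<unsigned>(static_cast<uint64_t>(Shifted) & 0xffff);
    Seq.erase(Seq.begin() + 1);
    return true;
}
```

With the sequence from the comment in the listing, `{ADDiu 0x0111, SHL 18}` becomes `{LUi 0x444}`. A pair such as `{ADDiu -9, SHL 28}` is left alone because -9 shifted by the remaining 12 bits no longer fits in 16 bits.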

The code above replaces an addiu and shl pair with a single lui instruction. The effect is shown in the following table.

Table 11.2: Cpu0 stack adjustment new instructions

|     | Stack size range | Example stack size | Cpu0 Prologue instructions | Cpu0 Epilogue instructions |
|-----|------------------|--------------------|----------------------------|----------------------------|
| old | 0x10000 ~ 0xffffffff | 0x90008000 | addiu \$1, \$zero, -9;<br>shl \$1, \$1, 28;<br>addiu \$1, \$1, -32768;<br>addu \$sp, \$sp, \$1; | addiu \$1, \$zero, -28671;<br>shl \$1, \$1, 16;<br>addiu \$1, \$1, -32768;<br>addu \$sp, \$sp, \$1; |
| new | 0x10000 ~ 0xffffffff | 0x90008000 | lui \$1, 28671;<br>ori \$1, \$1, 32768;<br>addu \$sp, \$sp, \$1; | lui \$1, 36865;<br>addiu \$1, \$1, -32768;<br>addu \$sp, \$sp, \$1; |

Assume sp = 0xa0008000 and stack size = 0x90008000, so the prologue must leave sp = (0xa0008000 - 0x90008000) = 0x10000000. Verify with the new Cpu0 Prologue instructions as follows,

1. “lui \$1, 28671” => \$1 = 0x6fff0000.
2. “ori \$1, \$1, 32768” => \$1 = (0x6fff0000 | 0x00008000) => \$1 = 0x6fff8000.
3. “addu \$sp, \$sp, \$1” => \$sp = (0xa0008000 + 0x6fff8000) => \$sp = 0x10000000 (the carry out of bit 31 is discarded).

Verify with the new Cpu0 Epilogue instructions, with sp = 0x10000000 and stack size = 0x90008000, as follows,

1. “lui \$1, 36865” => \$1 = 0x90010000.
2. “addiu \$1, \$1, -32768” => \$1 = (0x90010000 + 0xffff8000) => \$1 = 0x90008000.
3. “addu \$sp, \$sp, \$1” => \$sp = (0x10000000 + 0x90008000) => \$sp = 0xa0008000.
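The two verifications above can be replayed mechanically. The following is a small sketch that models lui, ori and addiu with 32-bit wrap-around arithmetic; the function names `prologue` and `epilogue` are chosen for illustration and only cover the "new" rows of Table 11.2.

```cpp
#include <cassert>
#include <cstdint>

// lui: place the 16-bit immediate in the upper half of the register.
static uint32_t lui(uint16_t imm) { return static_cast<uint32_t>(imm) << 16; }
// ori: OR with the zero-extended 16-bit immediate.
static uint32_t ori(uint32_t r, uint16_t imm) { return r | imm; }
// addiu: add the sign-extended 16-bit immediate, wrapping at 32 bits.
static uint32_t addiu(uint32_t r, int16_t imm) {
    return r + static_cast<uint32_t>(static_cast<int32_t>(imm));
}

// New prologue for stack size 0x90008000: sp is decreased by the stack size.
uint32_t prologue(uint32_t sp) {
    uint32_t r1 = lui(28671);   // r1 = 0x6fff0000
    r1 = ori(r1, 32768);        // r1 = 0x6fff8000, i.e. -0x90008000 mod 2^32
    return sp + r1;             // addu $sp, $sp, $1
}

// New epilogue for stack size 0x90008000: sp is increased by the stack size.
uint32_t epilogue(uint32_t sp) {
    uint32_t r1 = lui(36865);   // r1 = 0x90010000
    r1 = addiu(r1, -32768);     // r1 = 0x90008000
    return sp + r1;             // addu $sp, $sp, $1
}
```

Calling `prologue(0xa0008000)` gives 0x10000000, and `epilogue(0x10000000)` restores 0xa0008000, matching the hand calculation.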

### 11.2.3 Cpu0 Verilog language changes

**LLVMBackendTutorialExampleCode/cpu0\_verilog/redesign/cpu0s.v**

```

`define MEMSIZE 'h7000
`define MEMEMPTY 8'hFF
`define IOADDR  'h7000

// Operand width
`define INT32 2'b11      // 32 bits
`define INT24 2'b10      // 24 bits
`define INT16 2'b01      // 16 bits
`define BYTE  2'b00      // 8  bits

// Reference web: http://ccckmit.wikidot.com/ocs:cpu0
module cpu0(input clock, reset, output reg [2:0] tick,
            output reg [31:0] ir, pc, mar, mdr, inout [31:0] dbus,
            output reg m_en, m_rw, output reg [1:0] m_size);
    reg signed [31:0] R [0:15], HI, LO, SW;
    // HI, LO: High and Low part of 64 bit result
    // SW: Status Word
reg [7:0] op;
reg [3:0] a, b, c;
reg [4:0] c5;
reg signed [31:0] c12, c16, uc16, c24, Ra, Rb, Rc, pc0; // pc0 : instruction pc

// register name
`define PC    R[15]    // Program Counter
`define LR    R[14]    // Link Register
`define SP    R[13]    // Stack Pointer
// SW Flags
`define N     SW[31]   // Negative flag
`define Z     SW[30]   // Zero
`define C     SW[29]   // Carry
`define V     SW[28]   // Overflow
`define I     SW[7]    // Hardware Interrupt Enable
`define T     SW[6]    // Software Interrupt Enable
`define M     SW[0]    // Mode bit
// Instruction Opcode
parameter [7:0] LD=8'h01,ST=8'h02,LB=8'h03,LBu=8'h04,SB=8'h05,LH=8'h06,
LHu=8'h07,SH=8'h08,ADDiu=8'h09,SLTi=8'h0A,SLTiu=8'h0B,ANDi=8'h0C,ORi=8'h0D,
XORi=8'h0E,LUi=8'h0F,
ADDu=8'h11,SUBu=8'h12,ADD=8'h13,SUB=8'h14,MUL=8'h15,DIV=8'h16,DIVu=8'h17,
AND=8'h18,OR=8'h19,XOR=8'h1A,
SRA=8'h1B,ROL=8'h1C,ROR=8'h1D,SHL=8'h1E,SHR=8'h1F,
BEQ=8'h20,BNE=8'h21,
JMP=8'h26,
SWI=8'h2A,JSUB=8'h2B,RET=8'h2C,IRET=8'h2D,JALR=8'h2E,
SLT=8'h30,SLTu=8'h31,
MFHI=8'h40,MFLO=8'h41,MTHI=8'h42,MTLO=8'h43,
MULT=8'h50,MULTu=8'h51;

reg [2:0] state, next_state;
parameter Reset=3'h0, Fetch=3'h1, Decode=3'h2, Execute=3'h3, WriteBack=3'h4;

task memReadStart(input [31:0] addr, input [1:0] size); begin // Read Memory Word
    mar = addr;        // read(m[addr])
    m_rw = 1;          // Access Mode: read
    m_en = 1;          // Enable read
    m_size = size;
end endtask

task memReadEnd(output [31:0] data); begin // Read Memory Finish, get data
    mdr = dbus; // get memory, dbus = m[addr]
    data = mdr; // return to data
    m_en = 0; // read complete
end endtask

// Write memory -- addr: address to write, data: data to write
task memWriteStart(input [31:0] addr, input [31:0] data, input [1:0] size); begin
    mar = addr;        // write(m[addr], data)
    mdr = data;
    m_rw = 0;          // access mode: write
    m_en = 1;          // Enable write
    m_size = size;
end endtask

task memWriteEnd; begin // Write Memory Finish
    m_en = 0; // write complete
end endtask

task regSet(input [3:0] i, input [31:0] data); begin
    if (i!=0) R[i] = data;
end endtask

task regHILOSet(input [31:0] data1, input [31:0] data2); begin
    HI = data1;
    LO = data2;
end endtask

always @(posedge clock or posedge reset) begin
    if (reset) state <= Reset;
    else state <= next_state;
end

always @(state or reset) begin
    m_en = 0;
    case (state)
        Reset: begin
            `PC = 0; tick = 0; R[0] = 0; SW = 0; `LR = -1;
            next_state = reset?Reset:Fetch;
        end
        Fetch: begin // Tick 1 : instruction fetch, throw PC to address bus,
            // memory.read(m[PC])
            memReadStart(`PC, `INT32);
            pc0 = `PC;
            `PC = `PC+4;
            next_state = Decode;
        end
        Decode: begin // Tick 2 : instruction decode, ir = m[PC]
            memReadEnd(ir); // IR = dbus = m[PC]
            {op,a,b,c} = ir[31:12];
            c24 = $signed(ir[23:0]);
            c16 = $signed(ir[15:0]);
            uc16 = ir[15:0];
            c12 = $signed(ir[11:0]);
            c5 = ir[4:0];
            Ra = R[a];
            Rb = R[b];
            Rc = R[c];
            next_state = Execute;
        end
        Execute: begin // Tick 3 : instruction execution
            case (op)
                // load and store instructions
                LD: memReadStart(Rb+c16, `INT32); // LD Ra, [Rb+Cx]; Ra<=[Rb+Cx]
                ST: memWriteStart(Rb+c16, Ra, `INT32); // ST Ra, [Rb+Cx]; Ra=>[Rb+Cx]
                LB: memReadStart(Rb+c16, `BYTE); // LB Ra, [Rb+Cx]; Ra<=(byte) [Rb+Cx]
                LBu: memReadStart(Rb+c16, `BYTE); // LBu Ra, [Rb+Cx]; Ra<=(byte) [Rb+Cx]
                SB: memWriteStart(Rb+c16, Ra, `BYTE); // SB Ra, [Rb+Cx]; Ra=>(byte) [Rb+Cx]
                LH: memReadStart(Rb+c16, `INT16); // LH Ra, [Rb+Cx]; Ra<=(2bytes) [Rb+Cx]
                LHu: memReadStart(Rb+c16, `INT16); // LHu Ra, [Rb+Cx]; Ra<=(2bytes) [Rb+Cx]
                SH: memWriteStart(Rb+c16, Ra, `INT16); // SH Ra, [Rb+Cx]; Ra=>(2bytes) [Rb+Cx]
                // Mathematic
                ADDiu: R[a] = Rb+c16; // ADDiu Ra, Rb+Cx; Ra<=Rb+Cx
                // CMP: begin `N=(Ra-Rb<0); `Z=(Ra-Rb==0); end // CMP Ra, Rb; SW=(Ra >= Rb)
                ADDu: regSet(a, Rb+Rc); // ADDu Ra,Rb,Rc; Ra<=Rb+Rc
ADD:   begin regSet(a, Rb+Rc); if (a < Rb) `V = 1; else `V =0; end
                           // ADD Ra,Rb,Rc; Ra<=Rb+Rc
SUBu:  regSet(a, Rb-Rc);           // SUBu Ra,Rb,Rc; Ra<=Rb-Rc
SUB:   begin regSet(a, Rb-Rc); if (Rb < 0 && Rc > 0 && a >= 0)
      `V = 1; else `V =0; end           // SUB Ra,Rb,Rc; Ra<=Rb-Rc
MUL:   regSet(a, Rb*Rc);           // MUL Ra,Rb,Rc; Ra<=Rb*Rc
DIVu:  regHILOSet(Ra%Rb, Ra/Rb); // DIVu Ra,Rb; HI<=Ra%Rb; LO<=Ra/Rb
DIV:   begin regHILOSet(Ra%Rb, Ra/Rb);
      if ((Ra < 0 && Rb < 0) || (Ra == 0)) `V = 1;
      else `V =0; end // DIV Ra,Rb; HI<=Ra%Rb; LO<=Ra/Rb;
                           // with exception overflow
AND:   regSet(a, Rb&Rc);           // AND Ra,Rb,Rc; Ra<=(Rb and Rc)
ANDi:  regSet(a, Rb&uc16);        // ANDi Ra,Rb,c16; Ra<=(Rb and c16)
OR:    regSet(a, Rb|Rc);           // OR Ra,Rb,Rc; Ra<=(Rb or Rc)
ORi:   regSet(a, Rb|uc16);        // ORi Ra,Rb,c16; Ra<=(Rb or c16)
XOR:   regSet(a, Rb^Rc);           // XOR Ra,Rb,Rc; Ra<=(Rb xor Rc)
XORi:  regSet(a, Rb^uc16);        // XORi Ra,Rb,c16; Ra<=(Rb xor c16)
LUi:   regSet(a, uc16<<16);      // LUi Ra,c16; Ra<=(c16 << 16)
SHL:   regSet(a, Rb<<c5);        // Shift Left; SHL Ra,Rb,Cx; Ra<=(Rb << Cx)
SRA:   regSet(a, (Rb&'h80000000) | (Rb>>c5));
                           // Shift Right with signed bit fill;
                           // SRA Ra,Rb,Cx; Ra<=(Rb&0x80000000) | (Rb>>Cx)
SHR:   regSet(a, Rb>>c5);        // Shift Right with 0 fill;
                           // SHR Ra,Rb,Cx; Ra<=(Rb >> Cx)
ROL:   regSet(a, (Rb<<c5) | (Rb>>(32-c5))); // Rotate Left;
ROR:   regSet(a, (Rb>>c5) | (Rb<<(32-c5))); // Rotate Right;
// set
SLT:   if (Rb < Rc) R[a]=1; else R[a]=0;
SLTu:  if (Rb < Rc) R[a]=1; else R[a]=0;
SLTi:  if (Rb < c16) R[a]=1; else R[a]=0;
SLTiu: if (Rb < c16) R[a]=1; else R[a]=0;
// Branch Instructions
BEQ:   if (Ra==Rb) `PC=`PC+c16;
BNE:   if (Ra!=Rb) `PC=`PC+c16;
MFLO:  regSet(a, LO);           // MFLO Ra; Ra<=LO
MFHI:  regSet(a, HI);           // MFHI Ra; Ra<=HI
MTLO:  LO = Ra;                // MTLO Ra; LO<=Ra
MTHI:  HI = Ra;                // MTHI Ra; HI<=Ra
MULT:  {HI, LO}=Ra*Rb; // MULT Ra,Rb; HI<=((Ra*Rb)>>32);
                           // LO<=((Ra*Rb) and 0x00000000ffffffff);
                           // with exception overflow
MULTu: {HI, LO}=Ra*Rb; // MULTu Ra,Rb; HI<=((Ra*Rb)>>32);
                           // LO<=((Ra*Rb) and 0x00000000ffffffff);
                           // without exception overflow
// Jump Instructions
JMP:   `PC = `PC+c24;           // JMP Cx; PC <= PC+Cx
SWI:   begin
      `LR=`PC; `PC= c24; `I = 1'b1;
end // Software Interrupt; SWI Cx; LR <= PC; PC <= Cx; INT<=1
JSUB:begin `LR=`PC; `PC=`PC + c24; end // JSUB Cx; LR<=PC; PC<=PC+Cx
JALR:begin `LR=`PC; `PC=Ra; end // JALR Ra,Rb; Ra<=PC; PC<=Rb
RET:  begin `PC=`LR; end           // RET; PC <= LR
IRET:begin
      `PC=`LR; `I = 1'b0;
end // Interrupt Return; IRET; PC <= LR; INT<=0
endcase
next_state = WriteBack;
end

WriteBack: begin // Read/Write finish, close memory
    case (op)
        LD, LB, LBu, LH, LHu : memReadEnd(R[a]);
                                //read memory complete
        ST, SB, SH : memWriteEnd();
                                // write memory complete
    endcase
    case (op)
        MULT, MULTu, DIV, DIVu, MTHI, MTLO :
            $display("%4dns %8x : %8x HI=%8x LO=%8x SW=%8x", $stime, pc0, ir, HI,
                      LO, SW);
        ST :
            if (R[b]+c16 == `IOADDR)
                $display("%4dns %8x : %8x OUTPUT=%-d", $stime, pc0, ir, R[a]);
            else
                $display("%4dns %8x : %8x m[%-04d+%-04d]=%-d SW=%8x", $stime, pc0, ir,
                      R[b], c16, R[a], SW);
        default :
            $display("%4dns %8x : %8x R[%02d]=%-8x=%-d SW=%8x", $stime, pc0, ir, a,
                      R[a], R[a], SW);
    endcase
    SW = 0; // clear SW
    if (op==RET && `PC < 0) begin
        $display("RET to PC < 0, finished!");
        $finish;
    end
    next_state = Fetch;
end
endcase
pc = `PC;
end

endmodule

module memory0(input clock, reset, en, rw, input [1:0] m_size,
               input [31:0] abus, dbus_in, output [31:0] dbus_out);
    reg [7:0] m [0:`MEMSIZE-1];
    reg [31:0] data;

    integer i;
    initial begin
        // erase memory
        for (i=0; i < `MEMSIZE; i=i+1) begin
            m[i] = `MEMEMPTY;
        end
        // load the program from cpu0s.hex, then display memory contents
        $readmemh("cpu0s.hex", m);
        for (i=0; i < `MEMSIZE && m[i] != `MEMEMPTY; i=i+4) begin
            $display("%8x: %8x", i, {m[i], m[i+1], m[i+2], m[i+3]});
        end
    end

    always @(clock or abus or en or rw or dbus_in)
    begin
        if (abus >= 0 && abus <= `MEMSIZE-4) begin
            if (en == 1 && rw == 0) begin // r_w==0:write
                data = dbus_in;
                case (m_size)
                    `BYTE: {m[abus]} = dbus_in[7:0];
                    `INT16: {m[abus], m[abus+1]} = dbus_in[15:0];
                    `INT24: {m[abus], m[abus+1], m[abus+2]} = dbus_in[23:0];
                    `INT32: {m[abus], m[abus+1], m[abus+2], m[abus+3]} = dbus_in;
                endcase
end else if (en == 1 && rw == 1) begin// r_w==1:read
    case (m_size)
        `BYTE: data = {8'h00, 8'h00, 8'h00, m[abus]};
        `INT16: data = {8'h00, 8'h00, m[abus], m[abus+1]};
        `INT24: data = {8'h00, m[abus], m[abus+1], m[abus+2]};
        `INT32: data = {m[abus], m[abus+1], m[abus+2], m[abus+3]};
    endcase
end else
    data = 32'hZZZZZZZZ;
end else
    data = 32'hZZZZZZZZ;
end
assign dbus_out = data;
endmodule

module main;
    reg clock, reset;
    wire [2:0] tick;
    wire [31:0] pc, ir, mar, mdr, dbus;
    wire m_en, m_rw;
    wire [1:0] m_size;

    cpu0 cpu(.clock(clock), .reset(reset), .pc(pc), .tick(tick), .ir(ir),
        .mar(mar), .mdr(mdr), .dbus(dbus), .m_en(m_en), .m_rw(m_rw), .m_size(m_size));

    memory0 mem(.clock(clock), .reset(reset), .en(m_en), .rw(m_rw), .m_size(m_size),
        .abus(mar), .dbus_in(mdr), .dbus_out(dbus));

    initial
    begin
        clock = 0;
        reset = 1;
        #20 reset = 0;
        #300000 $finish;
    end

    always #10 clock=clock+1;
endmodule

```
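The Decode state of cpu0s.v splits each 32-bit instruction word into an 8-bit opcode, three 4-bit register fields and several immediate views of the low bits. The block below is a sketch of the same split in C++, which can help when reading the hex dump and trace in the next section by hand; the `Decoded` struct and the `decode` function are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Fields extracted in the Decode state of cpu0s.v.
struct Decoded {
    uint8_t op;      // ir[31:24], opcode
    uint8_t a, b, c; // ir[23:20], ir[19:16], ir[15:12], register numbers
    int32_t c16;     // ir[15:0], sign-extended
    uint32_t uc16;   // ir[15:0], zero-extended
    int32_t c24;     // ir[23:0], sign-extended
    uint32_t c5;     // ir[4:0], shift amount
};

Decoded decode(uint32_t ir) {
    Decoded d;
    d.op = static_cast<uint8_t>(ir >> 24);
    d.a = static_cast<uint8_t>((ir >> 20) & 0xf);
    d.b = static_cast<uint8_t>((ir >> 16) & 0xf);
    d.c = static_cast<uint8_t>((ir >> 12) & 0xf);
    d.uc16 = ir & 0xffff;
    // Sign-extend by flipping the sign bit and subtracting its weight.
    d.c16 = (static_cast<int32_t>(ir & 0xffff) ^ 0x8000) - 0x8000;
    d.c24 = (static_cast<int32_t>(ir & 0xffffff) ^ 0x800000) - 0x800000;
    d.c5 = ir & 0x1f;
    return d;
}
```

For instance, the word `09ddffe0` decodes to op = 0x09 (ADDiu), a = 13, b = 13 and c16 = -32, that is "addiu \$sp, \$sp, -32", and `2b000038` decodes to op = 0x2b (JSUB) with c24 = 0x38.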

### 11.2.4 Run the redesigned Cpu0

Run Chapter11\_2/ with ch\_run\_backend.cpp to get the result below. It matches the expected values given in the comments of ch\_run\_backend.cpp.

##### LLVMBackendTutorialExampleCode/InputFiles/ch\_run\_backend.cpp

```

#include "InitRegs.h"
#include <stdarg.h> // va_list, va_start, va_arg, va_end

#define OUT_MEM 0x7000 // 28672

asm("addiu $sp, $zero, 0x6ffc");

void print_integer(int x);
int test_operators(int x);
int test_control();
int sum_i(int amount, ...);

int main()
{
    int a = 0;
    a = test_operators(12); // a = 13
    print_integer(a);
    a += test_control(); // a = (128+18) = 146
    print_integer(a);
    a = sum_i(6, 0, 1, 2, 3, 4, 5);
    print_integer(a); // a = 15

    return a;
}

// For memory IO
void print_integer(int x)
{
    int *p = (int*)OUT_MEM;
    *p = x;
    return;
}

void print1_integer(int x)
{
    asm("ld $at, 8($sp)");
    asm("st $at, 28672($0)");
    return;
}

#if 0
// For instruction IO
void print2_integer(int x)
{
    asm("ld $at, 8($sp)");
    asm("outw $at");
    return;
}
#endif

int test_operators(int x)
{
    int a = 11;
    int b = 2;
    int c, d, e, f, g, h, i, j, k, l, m, n, o;
    unsigned int a1 = -11, k1 = 0;

    k = (a >> 2);
    print_integer(k); // 2
    k1 = (a1 >> 2);
    print_integer((int)k1); // 0x3ffffffd = 1073741821
    c = a + b;
    d = a - b;
    e = a * b;
    f = a / b;
    g = (a & b);
    h = (a | b);
    i = (a ^ b);
    j = (a << 2);
    l = a % x;
    m = (a+1)%12;

    n = !a;
    print_integer(n); // 0
    int* p = &b;
    o = *p;

    return (c+d+e+f+g+h+i+j+l+m+o); // (13+9+22+5+2+11+9+44+11+0+2)=128
}

int test_control()
{
    int b = 1;
    int c = 2;
    int d = 3;
    int e = 4;
    int f = 5;

    if (b != 0) {
        b++;
    }
    if (c > 0) {
        c++;
    }
    if (d >= 0) {
        d++;
    }
    if (e < 0) {
        e++;
    }
    if (f <= 0) {
        f++;
    }

    return (b+c+d+e+f); // (2+3+4+4+5)=18
}

int sum_i(int amount, ...)
{
    int i = 0;
    int val = 0;
    int sum = 0;

    va_list vl;
    va_start(vl, amount);
    for (i = 0; i < amount; i++)
    {
        val = va_arg(vl, int);
        sum += val;
    }
    va_end(vl);

    return sum;
}

```

```

118-165-77-203:InputFiles Jonathan$ clang -target `llvm-config --host-target`
-c ch_run_backend.cpp -emit-llvm -o ch_run_backend.bc
118-165-77-203:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=static -filetype=obj -stats
ch_run_backend.bc -o ch_run_backend.cpu0.o
=====
          ... Statistics Collected ...
=====
...
  5 del-jmp      - Number of useless jmp deleted
...
118-165-77-203:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llvm-objdump -d ch_run_backend.cpu0.o | tail -n +6| awk '{print "/* " $1
" */\t" $2 " " $3 " " $4 " " $5 "\t/* " $6"\t" $7" " $8" " $9" " $10 "\t
118-165-77-203:redesign Jonathan$ ./cpu0s
WARNING: cpu0s.v:227: $readmemh(cpu0s.hex): Not enough words in the file for
the requested range [0:1536].
00000000: 09100000
00000004: 09200000
00000008: 09300000
0000000c: 09400000
00000010: 09500000
00000014: 09600000
00000018: 09700000
0000001c: 09800000
00000020: 09900000
00000024: 09a00000
00000028: 09b00000
0000002c: 09c00000
00000030: 09e0ffff
00000034: 09d005fc
00000038: 09ddffe0
0000003c: 02ed001c
00000040: 09200000
00000044: 022d0018
00000048: 022d0014
0000004c: 2b000038
00000050: 022d0014
00000054: 022d0000
00000058: 2b000190
0000005c: 2b0001b0
00000060: 013d0014
00000064: 11232000
00000068: 022d0014
0000006c: 022d0000
00000070: 2b000178
00000074: 2b00026c
00000078: 012d0014
0000007c: 01ed001c
00000080: 09dd0020
00000084: 2c000000
00000088: 09ddffa0
0000008c: 02ed005c
00000090: 027d0058
00000094: 0920000b
00000098: 022d0054
0000009c: 09200002
000000a0: 022d0050
000000a4: 09700000
000000a8: 027d004c
000000ac: 027d0048
000000b0: 027d0028
000000b4: 0920fffb
000000b8: 022d0024
000000bc: 027d0020
000000c0: 0f20f000
000000c4: 0d220001
000000c8: 022d001c
000000cc: 0f20000f
000000d0: 0d22ffff
000000d4: 022d0018
000000d8: 013d001c
000000dc: 11232000
000000e0: 022d0024
000000e4: 012d0050
000000e8: 013d0054
000000ec: 11232000
000000f0: 022d004c
000000f4: 012d0050
000000f8: 013d0054
000000fc: 12232000
00000100: 022d0048
00000104: 012d0050
00000108: 013d0054
0000010c: 15232000
00000110: 022d0044
00000114: 012d0050
00000118: 013d0054
0000011c: 16320000
00000120: 23200000
00000124: 022d0040
00000128: 0f202aaa
0000012c: 0d32aaab
00000130: 012d0054
00000134: 09220001
00000138: 26230000
0000013c: 22300000
00000140: 1f43001f
00000144: 1b330001
00000148: 11334000
0000014c: 0940000c
00000150: 15334000
00000154: 12223000
00000158: 022d0050
0000015c: 013d0054
00000160: 18232000
00000164: 022d003c
00000168: 012d0050
0000016c: 013d0054
00000170: 19232000
00000174: 022d0038
00000178: 012d0050
0000017c: 013d0054
00000180: 1a232000
00000184: 022d0034
00000188: 012d0054
0000018c: 1e220002
00000190: 022d0030
00000194: 012d0054
00000198: 1b220002
0000019c: 022d002c
000001a0: 022d0000
000001a4: 2b000044
000001a8: 012d0024
000001ac: 1f220002
000001b0: 022d0020
000001b4: 022d0000
000001b8: 2b000030
000001bc: 012d0054
000001c0: 1a227000
000001c4: 0b220001
000001c8: 0c220001
000001cc: 022d0050
000001d0: 092d0050
000001d4: 022d0014
000001d8: 012d004c
000001dc: 017d0058
000001e0: 01ed005c
000001e4: 09dd0060
000001e8: 2c000000
000001ec: 09ddfff8
000001f0: 012d0008
000001f4: 022d0004
000001f8: 09207000
000001fc: 022d0000
00000200: 013d0004
00000204: 02320000
00000208: 09dd0008
0000020c: 2c000000
00000210: 09ddffe8
00000214: 09200001
00000218: 022d0014
0000021c: 09200002
00000220: 022d0010
00000224: 09200003
00000228: 022d000c
0000022c: 09200004
00000230: 022d0008
00000234: 09200005
00000238: 022d0004
0000023c: 012d0014
00000240: 2720000c
00000244: 012d0014
00000248: 09220001
0000024c: 022d0014
00000250: 012d0010
00000254: 0a220001
00000258: 2820000c
0000025c: 012d0010
00000260: 09220001
00000264: 022d0010
00000268: 012d000c
0000026c: 0a220000
00000270: 2820000c
00000274: 012d000c
00000278: 09220001
0000027c: 022d000c
00000280: 012d0008
00000284: 0930ffff
00000288: 20232000
0000028c: 2820000c
00000290: 012d0008
00000294: 09220001
00000298: 022d0008
0000029c: 012d0004
000002a0: 09300000
000002a4: 20232000
000002a8: 2820000c
000002ac: 012d0004
000002b0: 09220001
000002b4: 022d0004
000002b8: 012d0010
000002bc: 013d0014
000002c0: 11232000
000002c4: 013d000c
000002c8: 11223000
000002cc: 013d0008
000002d0: 11223000
000002d4: 013d0004
000002d8: 11223000
000002dc: 09dd0018
000002e0: 2c000000
000002e4: 09ddffff
000002e8: 022d0008
000002ec: 023d0004
000002f0: 024d0000
000002f4: 0f207ffff
000002f8: 0f301000
000002fc: 11423000
00000300: 0f207ffff
00000304: 0f301000
00000308: 13423000
0000030c: 0f208ffff
00000310: 0f307000
00000314: 14423000
00000318: 0f200000
0000031c: 0930ffff
00000320: 14423000
00000324: 0f20ffff
00000328: 0d22ffff
0000032c: 0c22ffff
00000330: 1e220010
00000334: 0e22ffff
00000338: 0930ffff
0000033c: 17230000
00000340: 16230000
00000344: 0e220001
00000348: 1c421004
0000034c: 1d421008
00000350: 012d0008
00000354: 013d0004
00000358: 014d0000
0000035c: 09dd000c
00000360: 2c000000
    90ns 00000000 : 09100000 R[01]=00000000=0           SW=00000000
    170ns 00000004 : 09200000 R[02]=00000000=0           SW=00000000
    250ns 00000008 : 09300000 R[03]=00000000=0           SW=00000000
    330ns 0000000c : 09400000 R[04]=00000000=0           SW=00000000
    410ns 00000010 : 09500000 R[05]=00000000=0           SW=00000000
    490ns 00000014 : 09600000 R[06]=00000000=0           SW=00000000
    570ns 00000018 : 09700000 R[07]=00000000=0           SW=00000000
    650ns 0000001c : 09800000 R[08]=00000000=0           SW=00000000
    730ns 00000020 : 09900000 R[09]=00000000=0           SW=00000000
    810ns 00000024 : 09a00000 R[10]=00000000=0           SW=00000000
    890ns 00000028 : 09b00000 R[11]=00000000=0           SW=00000000
    970ns 0000002c : 09c00000 R[12]=00000000=0           SW=00000000
    1050ns 00000030 : 09e0ffff R[14]=ffffffff=-1         SW=00000000
    1130ns 00000034 : 09d005fc R[13]=000005fc=1532       SW=00000000
    1210ns 00000038 : 09ddffe0 R[13]=000005dc=1500       SW=00000000
    1290ns 0000003c : 02ed001c m[1500+28 ]=-1           SW=00000000
    1370ns 00000040 : 09200000 R[02]=00000000=0           SW=00000000
    1450ns 00000044 : 022d0018 m[1500+24 ]=0           SW=00000000
    1530ns 00000048 : 022d0014 m[1500+20 ]=0           SW=00000000
    1610ns 0000004c : 2b000038 R[00]=00000000=0           SW=00000000
    1690ns 00000088 : 09ddffa0 R[13]=0000057c=1404       SW=00000000
    1770ns 0000008c : 02ed005c m[1404+92 ]=80           SW=00000000
    1850ns 00000090 : 027d0058 m[1404+88 ]=0           SW=00000000
    1930ns 00000094 : 0920000b R[02]=0000000b=11         SW=00000000
    2010ns 00000098 : 022d0054 m[1404+84 ]=11           SW=00000000
    2090ns 0000009c : 09200002 R[02]=00000002=2           SW=00000000
    2170ns 000000a0 : 022d0050 m[1404+80 ]=2           SW=00000000
    2250ns 000000a4 : 09700000 R[07]=00000000=0           SW=00000000
    2330ns 000000a8 : 027d004c m[1404+76 ]=0           SW=00000000
    2410ns 000000ac : 027d0048 m[1404+72 ]=0           SW=00000000
    2490ns 000000b0 : 027d0028 m[1404+40 ]=0           SW=00000000
    2570ns 000000b4 : 0920fffb R[02]=fffffffb=-5         SW=00000000
    2650ns 000000b8 : 022d0024 m[1404+36 ]=-5           SW=00000000
    2730ns 000000bc : 027d0020 m[1404+32 ]=0           SW=00000000
    2810ns 000000c0 : 0f20f000 R[02]=f0000000=-268435456 SW=00000000
    2890ns 000000c4 : 0d220001 R[02]=f0000001=-268435455 SW=00000000
    2970ns 000000c8 : 022d001c m[1404+28 ]=-268435455 SW=00000000
    3050ns 000000cc : 0f20000f R[02]=000f0000=983040       SW=00000000
    3130ns 000000d0 : 0d22ffff R[02]=000fffff=1048575       SW=00000000
    3210ns 000000d4 : 022d0018 m[1404+24 ]=1048575       SW=00000000
    3290ns 000000d8 : 013d001c R[03]=f0000001=-268435455 SW=00000000
    3370ns 000000dc : 11232000 R[02]=f0100000=-267386880 SW=00000000
    3450ns 000000e0 : 022d0024 m[1404+36 ]=-267386880       SW=00000000
    3530ns 000000e4 : 012d0050 R[02]=00000002=2           SW=00000000
    3610ns 000000e8 : 013d0054 R[03]=0000000b=11          SW=00000000
    3690ns 000000ec : 11232000 R[02]=0000000d=13          SW=00000000
    3770ns 000000f0 : 022d004c m[1404+76 ]=13           SW=00000000
    3850ns 000000f4 : 012d0050 R[02]=00000002=2           SW=00000000
    3930ns 000000f8 : 013d0054 R[03]=0000000b=11          SW=00000000
    4010ns 000000fc : 12232000 R[02]=00000009=9           SW=00000000
4090ns 00000100 : 022d0048 m[1404+72 ]=9          SW=00000000
4170ns 00000104 : 012d0050 R[02]=00000002=2          SW=00000000
4250ns 00000108 : 013d0054 R[03]=0000000b=11         SW=00000000
4330ns 0000010c : 15232000 R[02]=00000016=22         SW=00000000
4410ns 00000110 : 022d0044 m[1404+68 ]=22          SW=00000000
4490ns 00000114 : 012d0050 R[02]=00000002=2          SW=00000000
4570ns 00000118 : 013d0054 R[03]=0000000b=11         SW=00000000
4650ns 0000011c : 16320000 HI=00000001 LO=00000005 SW=00000000
4730ns 00000120 : 23200000 R[02]=00000005=5          SW=00000000
4810ns 00000124 : 022d0040 m[1404+64 ]=5          SW=00000000
4890ns 00000128 : 0f202aaa R[02]=2aaa0000=715784192 SW=00000000
4970ns 0000012c : 0d32aaab R[03]=2aaaaaab=715827883 SW=00000000
5050ns 00000130 : 012d0054 R[02]=0000000b=11         SW=00000000
5130ns 00000134 : 09220001 R[02]=0000000c=12         SW=00000000
5210ns 00000138 : 26230000 HI=00000002 LO=00000004 SW=00000000
5290ns 0000013c : 22300000 R[03]=00000002=2          SW=00000000
5370ns 00000140 : 1f43001f R[04]=00000000=0          SW=00000000
5450ns 00000144 : 1b330001 R[03]=00000001=1          SW=00000000
5530ns 00000148 : 11334000 R[03]=00000001=1          SW=00000000
5610ns 0000014c : 0940000c R[04]=0000000c=12         SW=00000000
5690ns 00000150 : 15334000 R[03]=0000000c=12         SW=00000000
5770ns 00000154 : 12223000 R[02]=00000000=0          SW=00000000
5850ns 00000158 : 022d0050 m[1404+80 ]=0          SW=00000000
5930ns 0000015c : 013d0054 R[03]=0000000b=11         SW=00000000
6010ns 00000160 : 18232000 R[02]=00000000=0          SW=00000000
6090ns 00000164 : 022d003c m[1404+60 ]=0          SW=00000000
6170ns 00000168 : 012d0050 R[02]=00000000=0          SW=00000000
6250ns 0000016c : 013d0054 R[03]=0000000b=11         SW=00000000
6330ns 00000170 : 19232000 R[02]=0000000b=11         SW=00000000
6410ns 00000174 : 022d0038 m[1404+56 ]=11          SW=00000000
6490ns 00000178 : 012d0050 R[02]=00000000=0          SW=00000000
6570ns 0000017c : 013d0054 R[03]=0000000b=11         SW=00000000
6650ns 00000180 : 1a232000 R[02]=0000000b=11         SW=00000000
6730ns 00000184 : 022d0034 m[1404+52 ]=11          SW=00000000
6810ns 00000188 : 012d0054 R[02]=0000000b=11         SW=00000000
6890ns 0000018c : 1e220002 R[02]=0000002c=44         SW=00000000
6970ns 00000190 : 022d0030 m[1404+48 ]=44          SW=00000000
7050ns 00000194 : 012d0054 R[02]=0000000b=11         SW=00000000
7130ns 00000198 : 1b220002 R[02]=00000002=2          SW=00000000
7210ns 0000019c : 022d002c m[1404+44 ]=2          SW=00000000
7290ns 000001a0 : 022d0000 m[1404+0 ]=2          SW=00000000
7370ns 000001a4 : 2b000044 R[00]=00000000=0          SW=00000000
7450ns 000001ec : 09ddfff8 R[13]=00000574=1396        SW=00000000
7530ns 000001f0 : 012d0008 R[02]=00000002=2          SW=00000000
7610ns 000001f4 : 022d0004 m[1396+4 ]=2          SW=00000000
7690ns 000001f8 : 09207000 R[02]=00007000=28672        SW=00000000
7770ns 000001fc : 022d0000 m[1396+0 ]=28672        SW=00000000
7850ns 00000200 : 013d0004 R[03]=00000002=2          SW=00000000
7930ns 00000204 : 02320000 OUTPUT=2
8010ns 00000208 : 09dd0008 R[13]=0000057c=1404        SW=00000000
8090ns 0000020c : 2c000000 R[00]=00000000=0          SW=00000000
8170ns 000001a8 : 012d0024 R[02]=f0100000=-267386880 SW=00000000
8250ns 000001ac : 1f220002 R[02]=3c040000=1006895104 SW=00000000
8330ns 000001b0 : 022d0020 m[1404+32 ]=1006895104 SW=00000000
8410ns 000001b4 : 022d0000 m[1404+0 ]=1006895104 SW=00000000
8490ns 000001b8 : 2b000030 R[00]=00000000=0          SW=00000000
8570ns 000001ec : 09ddfff8 R[13]=00000574=1396        SW=00000000
8650ns 000001f0 : 012d0008 R[02]=3c040000=1006895104 SW=00000000
8730ns 000001f4 : 022d0004 m[1396+4] = 1006895104 SW=00000000
8810ns 000001f8 : 09207000 R[02]=00007000=28672 SW=00000000
8890ns 000001fc : 022d0000 m[1396+0] = 28672 SW=00000000
8970ns 00000200 : 013d0004 R[03]=3c040000=1006895104 SW=00000000
9050ns 00000204 : 02320000 OUTPUT=1006895104
9130ns 00000208 : 09dd0008 R[13]=0000057c=1404 SW=00000000
9210ns 0000020c : 2c000000 R[00]=00000000=0 SW=00000000
9290ns 000001bc : 012d0054 R[02]=0000000b=11 SW=00000000
9370ns 000001c0 : 1a227000 R[02]=0000000b=11 SW=00000000
9450ns 000001c4 : 0b220001 R[02]=00000000=0 SW=00000000
9530ns 000001c8 : 0c220001 R[02]=00000000=0 SW=00000000
9610ns 000001cc : 022d0050 m[1404+80] = 0 SW=00000000
9690ns 000001d0 : 092d0050 R[02]=000005cc=1484 SW=00000000
9770ns 000001d4 : 022d0014 m[1404+20] = 1484 SW=00000000
9850ns 000001d8 : 012d004c R[02]=0000000d=13 SW=00000000
9930ns 000001dc : 017d0058 R[07]=00000000=0 SW=00000000
10010ns 000001e0 : 01ed005c R[14]=00000050=80 SW=00000000
10090ns 000001e4 : 09dd0060 R[13]=000005dc=1500 SW=00000000
10170ns 000001e8 : 2c000000 R[00]=00000000=0 SW=00000000
10250ns 00000050 : 022d0014 m[1500+20] = 13 SW=00000000
10330ns 00000054 : 022d0000 m[1500+0] = 13 SW=00000000
10410ns 00000058 : 2b000190 R[00]=00000000=0 SW=00000000
10490ns 000001ec : 09ddfff8 R[13]=000005d4=1492 SW=00000000
10570ns 000001f0 : 012d0008 R[02]=0000000d=13 SW=00000000
10650ns 000001f4 : 022d0004 m[1492+4] = 13 SW=00000000
10730ns 000001f8 : 09207000 R[02]=00007000=28672 SW=00000000
10810ns 000001fc : 022d0000 m[1492+0] = 28672 SW=00000000
10890ns 00000200 : 013d0004 R[03]=0000000d=13 SW=00000000
10970ns 00000204 : 02320000 OUTPUT=13
11050ns 00000208 : 09dd0008 R[13]=000005dc=1500 SW=00000000
11130ns 0000020c : 2c000000 R[00]=00000000=0 SW=00000000
11210ns 0000005c : 2b0001b0 R[00]=00000000=0 SW=00000000
11290ns 00000210 : 09ddffe8 R[13]=000005c4=1476 SW=00000000
11370ns 00000214 : 09200001 R[02]=00000001=1 SW=00000000
11450ns 00000218 : 022d0014 m[1476+20] = 1 SW=00000000
11530ns 0000021c : 09200002 R[02]=00000002=2 SW=00000000
11610ns 00000220 : 022d0010 m[1476+16] = 2 SW=00000000
11690ns 00000224 : 09200003 R[02]=00000003=3 SW=00000000
11770ns 00000228 : 022d000c m[1476+12] = 3 SW=00000000
11850ns 0000022c : 09200004 R[02]=00000004=4 SW=00000000
11930ns 00000230 : 022d0008 m[1476+8] = 4 SW=00000000
12010ns 00000234 : 09200005 R[02]=00000005=5 SW=00000000
12090ns 00000238 : 022d0004 m[1476+4] = 5 SW=00000000
12170ns 0000023c : 012d0014 R[02]=00000001=1 SW=00000000
12250ns 00000240 : 2720000c HI=00000002 LO=00000004 SW=00000000
12330ns 00000244 : 012d0014 R[02]=00000001=1 SW=00000000
12410ns 00000248 : 09220001 R[02]=00000002=2 SW=00000000
12490ns 0000024c : 022d0014 m[1476+20] = 2 SW=00000000
12570ns 00000250 : 012d0010 R[02]=00000002=2 SW=00000000
12650ns 00000254 : 0a220001 R[02]=00000000=0 SW=00000000
12730ns 00000258 : 2820000c R[02]=00000000=0 SW=00000000
12810ns 0000025c : 012d0010 R[02]=00000002=2 SW=00000000
12890ns 00000260 : 09220001 R[02]=00000003=3 SW=00000000
12970ns 00000264 : 022d0010 m[1476+16] = 3 SW=00000000
13050ns 00000268 : 012d000c R[02]=00000003=3 SW=00000000
13130ns 0000026c : 0a220000 R[02]=00000000=0 SW=00000000
13210ns 00000270 : 2820000c R[02]=00000000=0 SW=00000000
13290ns 00000274 : 012d000c R[02]=00000003=3 SW=00000000
13370ns 00000278 : 09220001 R[02]=00000004=4 SW=00000000
13450ns 0000027c : 022d000c m[1476+12 ]=4 SW=00000000
13530ns 00000280 : 012d0008 R[02]=00000004=4 SW=00000000
13610ns 00000284 : 0930ffff R[03]=ffffffff=-1 SW=00000000
13690ns 00000288 : 20232000 R[02]=00000001=1 SW=00000000
13770ns 0000028c : 2820000c R[02]=00000001=1 SW=00000000
13850ns 0000029c : 012d0004 R[02]=00000005=5 SW=00000000
13930ns 000002a0 : 09300000 R[03]=00000000=0 SW=00000000
14010ns 000002a4 : 20232000 R[02]=00000001=1 SW=00000000
14090ns 000002a8 : 2820000c R[02]=00000001=1 SW=00000000
14170ns 000002b8 : 012d0010 R[02]=00000003=3 SW=00000000
14250ns 000002bc : 013d0014 R[03]=00000002=2 SW=00000000
14330ns 000002c0 : 11232000 R[02]=00000005=5 SW=00000000
14410ns 000002c4 : 013d000c R[03]=00000004=4 SW=00000000
14490ns 000002c8 : 11223000 R[02]=00000009=9 SW=00000000
14570ns 000002cc : 013d0008 R[03]=00000004=4 SW=00000000
14650ns 000002d0 : 11223000 R[02]=0000000d=13 SW=00000000
14730ns 000002d4 : 013d0004 R[03]=00000005=5 SW=00000000
14810ns 000002d8 : 11223000 R[02]=00000012=18 SW=00000000
14890ns 000002dc : 09dd0018 R[13]=000005dc=1500 SW=00000000
14970ns 000002e0 : 2c000000 R[00]=00000000=0 SW=00000000
15050ns 00000060 : 013d0014 R[03]=0000000d=13 SW=00000000
15130ns 00000064 : 11232000 R[02]=0000001f=31 SW=00000000
15210ns 00000068 : 022d0014 m[1500+20 ]=31 SW=00000000
15290ns 0000006c : 022d0000 m[1500+0 ]=31 SW=00000000
15370ns 00000070 : 2b000178 R[00]=00000000=0 SW=00000000
15450ns 000001ec : 09ddfff8 R[13]=000005d4=1492 SW=00000000
15530ns 000001f0 : 012d0008 R[02]=0000001f=31 SW=00000000
15610ns 000001f4 : 022d0004 m[1492+4 ]=31 SW=00000000
15690ns 000001f8 : 09207000 R[02]=00007000=28672 SW=00000000
15770ns 000001fc : 022d0000 m[1492+0 ]=28672 SW=00000000
15850ns 00000200 : 013d0004 R[03]=0000001f=31 SW=00000000
15930ns 00000204 : 02320000 OUTPUT=31
16010ns 00000208 : 09dd0008 R[13]=000005dc=1500 SW=00000000
16090ns 0000020c : 2c000000 R[00]=00000000=0 SW=00000000
16170ns 00000074 : 2b00026c R[00]=00000000=0 SW=00000000
16250ns 000002e4 : 09ddfff4 R[13]=000005d0=1488 SW=00000000
16330ns 000002e8 : 022d0008 m[1488+8 ]=28672 SW=00000000
16410ns 000002ec : 023d0004 m[1488+4 ]=31 SW=00000000
16490ns 000002f0 : 024d0000 m[1488+0 ]=12 SW=00000000
16570ns 000002f4 : 0f207fff R[02]=7fff0000=2147418112 SW=00000000
16650ns 000002f8 : 0f301000 R[03]=10000000=268435456 SW=00000000
16730ns 000002fc : 11423000 R[04]=8fff0000=-1879113728 SW=00000000
16810ns 00000300 : 0f207fff R[02]=7fff0000=2147418112 SW=00000000
16890ns 00000304 : 0f301000 R[03]=10000000=268435456 SW=00000000
16970ns 00000308 : 13423000 R[04]=8fff0000=-1879113728 SW=10000000
17050ns 0000030c : 0f208fff R[02]=8fff0000=-1879113728 SW=00000000
17130ns 00000310 : 0f307000 R[03]=70000000=1879048192 SW=00000000
17210ns 00000314 : 14423000 R[04]=1fff0000=536805376 SW=10000000
17290ns 00000318 : 0f200000 R[02]=00000000=0 SW=00000000
17370ns 0000031c : 0930ffff R[03]=ffffffff=-1 SW=00000000
17450ns 00000320 : 14423000 R[04]=00000001=1 SW=00000000
17530ns 00000324 : 0f20ffff R[02]=ffff0000=-65536 SW=00000000
17610ns 00000328 : 0d22ffff R[02]=ffffffff=-1 SW=00000000
17690ns 0000032c : 0c22ffff R[02]=0000ffff=65535 SW=00000000
17770ns 00000330 : 1e220010 R[02]=ffff0000=-65536 SW=00000000
17850ns 00000334 : 0e22ffff R[02]=ffffffff=-1 SW=00000000
17930ns 00000338 : 0930ffff R[03]=ffffffff=-1 SW=00000000
18010ns 0000033c : 17230000 HI=00000000 LO=00000001 SW=00000000
18090ns 00000340 : 16230000 HI=00000000 LO=00000001 SW=10000000
18170ns 00000344 : 0e220001 R[02]=fffffffe=-2 SW=00000000
18250ns 00000348 : 1c421004 R[04]=ffffffef=-17 SW=00000000
18330ns 0000034c : 1d421008 R[04]=feffffff=-16777217 SW=00000000
18410ns 00000350 : 012d0008 R[02]=00007000=28672 SW=00000000
18490ns 00000354 : 013d0004 R[03]=0000001f=31 SW=00000000
18570ns 00000358 : 014d0000 R[04]=0000000c=12 SW=00000000
18650ns 0000035c : 09dd000c R[13]=000005dc=1500 SW=00000000
18730ns 00000360 : 2c000000 R[00]=00000000=0 SW=00000000
18810ns 00000078 : 012d0014 R[02]=0000001f=31 SW=00000000
18890ns 0000007c : 01ed001c R[14]=ffffffff=-1 SW=00000000
18970ns 00000080 : 09dd0020 R[13]=000005fc=1532        SW=00000000
19050ns 00000084 : 2c000000 R[00]=00000000=0           SW=00000000
RET to PC < 0, finished!

```
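When reading a trace like the one above, it can help to split each instruction word into its fields. The following is a minimal sketch in Python, not part of the book's example code. It assumes the L-type layout (8-bit opcode, 4-bit ra, 4-bit rb, 16-bit signed immediate), which is consistent with trace lines such as `09ddfff8` moving R[13] from 1404 to 1396. The helper names `decode_l_type` and `parse_trace_line` are hypothetical.

```python
def decode_l_type(word_hex):
    """Split a 32-bit instruction word (hex string) into assumed L-type fields.

    Assumed layout: opcode[31:24], ra[23:20], rb[19:16], signed imm[15:0].
    """
    word = int(word_hex, 16)
    opcode = (word >> 24) & 0xFF
    ra = (word >> 20) & 0xF
    rb = (word >> 16) & 0xF
    imm = word & 0xFFFF
    if imm & 0x8000:          # sign-extend the 16-bit immediate
        imm -= 0x10000
    return opcode, ra, rb, imm


def parse_trace_line(line):
    """Return (time_ns, pc, word_hex) from one simulator trace line."""
    fields = line.split()
    time_ns = int(fields[0][:-2])   # drop the trailing "ns"
    pc = int(fields[1], 16)
    word_hex = fields[3]            # fields[2] is the ":" separator
    return time_ns, pc, word_hex


if __name__ == "__main__":
    t, pc, word = parse_trace_line(
        "7450ns 000001ec : 09ddfff8 R[13]=00000574=1396 SW=00000000")
    print(t, hex(pc), decode_l_type(word))
```

Instructions that use other formats (for example jumps with a 24-bit offset) would need their own decoder; this sketch only covers the immediate form.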

Running with ch7\_1\_1.cpp shows that some branches are reduced from the instruction pair “CMP, JXX” to a single instruction, either BEQ or BNE, as follows,

```

118-165-77-203:InputFiles Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
bin/Debug/llc -march=cpu0 -relocation-model=static -filetype=asm ch7_1_1.bc -o
ch7_1_1.cpu0.s
118-165-77-203:InputFiles Jonathan$ cat ch7_1_1.cpu0.s
.section .mdebug.abi32
.previous
.file "ch7_1_1.bc"
.text
.globl main
.align 2
.type main,@function
.ent main           # @main
main:
.cfi_startproc
.frame $sp,40,$lr
.mask 0x00000000,0
.set noreorder
.set nomacro
# BB#0:
    addiu $sp, $sp, -40
$tmp1:
.cfi_def_cfa_offset 40
    addiu $3, $zero, 0
    st $3, 36($sp)
    st $3, 32($sp)
    addiu $2, $zero, 1
    st $2, 28($sp)
    addiu $4, $zero, 2
    st $4, 24($sp)
    addiu $4, $zero, 3
    st $4, 20($sp)
    addiu $4, $zero, 4
    st $4, 16($sp)
    addiu $4, $zero, 5
    st $4, 12($sp)
    addiu $4, $zero, 6
    st $4, 8($sp)
    addiu $4, $zero, 7
    st $4, 4($sp)
    addiu $4, $zero, 8
    st $4, 0($sp)
    ld $4, 32($sp)
    bne $4, $zero, $BB0_2
# BB#1:
    ld $4, 32($sp)
    addiu $4, $4, 1
    st $4, 32($sp)
$BB0_2:
    ld $4, 28($sp)
    beq $4, $zero, $BB0_4
# BB#3:
    ld $4, 28($sp)
    addiu $4, $4, 1
    st $4, 28($sp)
$BB0_4:
    ld $4, 24($sp)
    slti $4, $4, 1
    bne $4, $zero, $BB0_6
# BB#5:
    ld $4, 24($sp)
    addiu $4, $4, 1
    st $4, 24($sp)
$BB0_6:
    ld $4, 20($sp)
    slti $4, $4, 0
    bne $4, $zero, $BB0_8
# BB#7:
    ld $4, 20($sp)
    addiu $4, $4, 1
    st $4, 20($sp)
$BB0_8:
    ld $4, 16($sp)
    addiu $5, $zero, -1
    slt $4, $5, $4
    bne $4, $zero, $BB0_10
# BB#9:
    ld $4, 16($sp)
    addiu $4, $4, 1
    st $4, 16($sp)
$BB0_10:
    ld $4, 12($sp)
    slt $3, $3, $4
    bne $3, $zero, $BB0_12
# BB#11:
    ld $3, 12($sp)
    addiu $3, $3, 1
    st $3, 12($sp)
$BB0_12:
    ld $3, 8($sp)
    slt $2, $2, $3
    bne $2, $zero, $BB0_14
# BB#13:
    ld $2, 8($sp)
    addiu $2, $2, 1
    st $2, 8($sp)
$BB0_14:
    ld $2, 4($sp)
    slti $2, $2, 1
    bne $2, $zero, $BB0_16
# BB#15:
    ld $2, 4($sp)
    addiu $2, $2, 1
    st $2, 4($sp)
$BB0_16:
    ld $2, 4($sp)
    ld $3, 0($sp)
    slt $2, $3, $2
    beq $2, $zero, $BB0_18
# BB#17:
    ld $2, 0($sp)
    addiu $2, $2, 1
    st $2, 0($sp)
$BB0_18:
    ld $2, 28($sp)
    ld $3, 32($sp)
    beq $3, $2, $BB0_20
# BB#19:
    ld $2, 32($sp)
    addiu $2, $2, 1
    st $2, 32($sp)
$BB0_20:
    ld $2, 32($sp)
    addiu $sp, $sp, 40
    ret $lr
    .set macro
    .set reorder
    .end main
$tmp2:
    .size main, ($tmp2)-main
    .cfi_endproc
```
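One way to confirm that a generated `.s` file no longer contains “CMP, JXX” pairs is to count its mnemonics. The sketch below is only an illustration in Python and is not part of the book's tools. The function name `count_mnemonics` is hypothetical, and the parsing assumes the listing style shown above: `#` starts a comment, directives start with `.`, and labels end with `:`.

```python
from collections import Counter


def count_mnemonics(asm_text):
    """Count instruction mnemonics, skipping comments, directives and labels."""
    counts = Counter()
    for raw in asm_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments such as "# BB#1:"
        if not line or line.startswith(".") or line.endswith(":"):
            continue                           # blank, directive, or label
        counts[line.split()[0]] += 1
    return counts


if __name__ == "__main__":
    sample = """
        ld $4, 32($sp)
        bne $4, $zero, $BB0_2
    # BB#1:
        addiu $4, $4, 1
    $BB0_2:
        beq $4, $zero, $BB0_4
    """
    counts = count_mnemonics(sample)
    print(counts["bne"], counts["beq"], counts["cmp"])
```

A `Counter` returns zero for missing keys, so a query for `cmp` on a listing that contains none simply yields 0.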

The ch11\_3.cpp is written in assembly to test the AsmParser. You can check whether it generates the obj file.



# APPENDIX A: GETTING STARTED: INSTALLING LLVM AND THE CPU0 EXAMPLE CODE

This book is in the process of being merged into the llvm trunk, but the merge is not finished yet. The llvm trunk version merged on my GitHub is the LLVM 3.3 release. So, you have to get the book example code and the llvm trunk it is based on with the following git command,

```
git clone https://github.com/Jonathan2251/lbd.git
```

In this chapter, we will run through how to set up LLVM if you are using Mac OS X or Linux. When discussing Mac OS X, we are using Apple's Xcode IDE (version 4.5.1) running on Mac OS X Mountain Lion (version 10.8) to modify and build LLVM from source, and we will be debugging using lldb. We cannot debug our LLVM builds within Xcode at the moment, but if you have experience with this, please contact us and help us build documentation that covers this. For Linux machines, we are building and debugging (using gdb) our LLVM installations on a Fedora 17 system. We will not be using an IDE for Linux, but once again, if you have experience building/debugging LLVM using Eclipse or other major IDEs, please contact the authors. For information on using cmake to build LLVM, please refer to the "Building LLVM with CMake"<sup>1</sup> documentation. We are using cmake version 2.8.9.

We will install two llvm directories in this chapter. One is the directory llvm/release/, which contains the clang and clang++ compilers we will use to translate C/C++ input files into llvm IR. The other is the directory llvm/test/, which contains our cpu0 backend program without clang and clang++.

---

## **Todo**

Find information on debugging LLVM within Xcode for Macs.

---

---

## **Todo**

Find information on building/debugging LLVM within Eclipse for Linux.

---

## 12.1 Setting Up Your Mac

### 12.1.1 Installing LLVM, Xcode and cmake

---

<sup>1</sup> <http://llvm.org/docs/CMake.html?highlight=cmake>


Please download the latest LLVM release, version 3.2 (llvm, clang, compiler-rt), from the “LLVM Download Page”<sup>2</sup>. Then extract the archives using `tar -zxvf {llvm-3.2.src.tar.gz, clang-3.2.src.tar.gz, compiler-rt-3.2.src.tar.gz}`, and rename the llvm source code root directory to src. After that, move the clang source code to src/tools/clang, and move the compiler-rt source to src/projects/compiler-rt, as shown below,

```
118-165-78-111:Downloads Jonathan$ tar -zxvf clang-3.2.src.tar.gz
118-165-78-111:Downloads Jonathan$ tar -zxvf compiler-rt-3.2.src.tar.gz
118-165-78-111:Downloads Jonathan$ tar -zxvf llvm-3.2.src.tar.gz
118-165-78-111:Downloads Jonathan$ mv llvm-3.2.src src
118-165-78-111:Downloads Jonathan$ mv clang-3.2.src src/tools/clang
118-165-78-111:Downloads Jonathan$ mv compiler-rt-3.2.src src/projects/compiler-rt
118-165-78-111:Downloads Jonathan$ pwd
/Users/Jonathan/Downloads
118-165-78-111:Downloads Jonathan$ ls
clang-3.2.src.tar.gz      llvm-3.2.src.tar.gz
compiler-rt-3.2.src.tar.gz  src
118-165-78-111:Downloads Jonathan$ ls src/tools/
CMakeLists.txt  clang      llvm-as      llvm-dis      llvm-mcmarkup
llvm-readobj    llvm-stub   LLVMBuild.txt  gold        llvm-bcanalyzer
llvm-dwarfdump   llvm-nm     llvm-rtdyld   lto         Makefile
llc             llvm-config  llvm-extract  llvm-objdump  llvm-shlib
macho-dump      bugpoint    lli          llvm-cov     llvm-link
llvm-prof       llvm-size    opt          bugpoint-passes  llvm-ar
llvm-diff       llvm-mc     llvm-ranlib   llvm-stress
118-165-78-111:Downloads Jonathan$ ls src/projects/
CMakeLists.txt  LLVMBuild.txt  Makefile  compiler-rt sample
```
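Before running cmake, it can be worth checking that clang and compiler-rt really ended up where the build expects them. The following Python sketch is not from the book; the function name `missing_components` is hypothetical, and the two relative paths are the ones named in the text above (`src/tools/clang` and `src/projects/compiler-rt`).

```python
from pathlib import Path

# Sub-directories that the text above places under the llvm "src" root.
EXPECTED_SUBDIRS = ("tools/clang", "projects/compiler-rt")


def missing_components(src_root):
    """Return the expected sub-directories that are absent under src_root."""
    root = Path(src_root)
    return [sub for sub in EXPECTED_SUBDIRS if not (root / sub).is_dir()]


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever the sources were extracted.
    missing = missing_components("src")
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("source tree layout looks complete")
```

An empty result only says the directories exist; it does not verify that their contents match release 3.2.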

Next, copy the LLVM source to `/Users/Jonathan/llvm/release/src` by executing the terminal command `cp -rf /Users/Jonathan/Downloads/src /Users/Jonathan/llvm/release/.`.

Install Xcode from the Mac App Store. Then install cmake, which can be found at its download page<sup>3</sup>. Before installing cmake, make sure you can install applications you download from the Internet. Open *System Preferences* → *Security & Privacy*. Click the **lock** to make changes, and under “Allow applications downloaded from:” select the radio button next to “Anywhere.” See [Figure 12.1](#) below for an illustration. You may want to revert this setting after installing cmake.

Alternatively, you can mount the cmake .dmg image file you downloaded, right-click (or control-click) the cmake .pkg package file and click “Open.” Mac OS X will ask you if you are sure you want to install this package, and you can click “Open” to start the installer.

### 12.1.2 Create LLVM.xcodeproj by cmake Graphic UI

In the last section, we installed the llvm source code with clang in the directory `/Users/Jonathan/llvm/release/`. Now we will generate LLVM.xcodeproj in this section.

Currently, we cannot debug with lldb when the project is created through the cmake graphic UI operations depicted in this section, but we can debug with lldb when following “section Create LLVM.xcodeproj of supporting cpu0 by terminal cmake command”<sup>4</sup>. Even so, let’s build the LLVM project with the cmake graphic UI, since this LLVM directory contains the release version of the clang and clang++ executables. First, create LLVM.xcodeproj as in [Figure 12.2](#), then click the **configure** button to enter [Figure 12.3](#), and then click the **Done** button to get [Figure 12.4](#).

---

<sup>2</sup> <http://llvm.org/releases/download.html#3.2>

<sup>3</sup> <http://www.cmake.org/cmake/resources/software.html>

<sup>4</sup> <http://jonathan2251.github.com/lbd/install.html#create-llvm-xcodeproj-of-supporting-cpu0-by-terminal-cmake-command>



Figure 12.1: Adjusting Mac OS X security settings to allow cmake installation.



Figure 12.2: Start to create LLVM.xcodeproj by cmake



Figure 12.3: Create LLVM.xcodeproj by cmake – Set option to generate Xcode project

Click OK in [Figure 12.4](#) and select Cmake 2.8-9.app for CMAKE\_INSTALL\_NAME\_TOOL by clicking the “...” button on the right side of that row to get [Figure 12.5](#).

Click the Configure button to get [Figure 12.6](#).

Check CLANG\_BUILD\_EXAMPLES and LLVM\_BUILD\_EXAMPLES, and uncheck LLVM\_ENABLE\_PIC, as in [Figure 12.7](#).

Click the Configure button again. If the output messages contain no red text, then click the Generate button to get [Figure 12.8](#).

### 12.1.3 Build llvm by Xcode

Now, LLVM.xcodeproj is created. Open the cmake\_debug\_build/LLVM.xcodeproj with Xcode and click menu “**Product – Build**” as in [Figure 12.9](#).

After a few minutes of building, clang, llc, llvm-as, ..., can be found in cmake\_release\_build/bin/Debug/ as follows.

```
118-165-78-111:cmake_release_build Jonathan$ cd bin/Debug/
118-165-78-111:Debug Jonathan$ pwd
/Users/Jonathan/llvm/release/cmake_release_build/bin/Debug
118-165-78-111:Debug Jonathan$ ls
BrainF            Kaleidoscope-Ch7   clang-tblgen     llvm-dis        llvm-rtdyld
ExceptionDemo     ModuleMaker        count            llvm-dwarfdump  llvm-size
Fibonacci         ParallelJIT        diagtool         llvm-extract    llvm-stress
FileCheck         arcmt-test         llc              llvm-link       llvm-tblgen
FileUpdate        bugpoint           lli              llvm-mc         macho-dump
HowToUseJIT       c-arcmt-test       llvm-ar          llvm-mcmarkup   not
Kaleidoscope-Ch2  c-index-test       llvm-as          llvm-nm         obj2yaml
Kaleidoscope-Ch3  clang              llvm-bcanalyzer  llvm-objdump    opt
Kaleidoscope-Ch4  clang++            llvm-config      llvm-prof       yaml-bench
Kaleidoscope-Ch5  clang-check        llvm-cov         llvm-ranlib     yaml2obj
Kaleidoscope-Ch6  clang-interpreter  llvm-diff        llvm-readobj
118-165-78-111:Debug Jonathan$
```

Figure 12.4: Create LLVM.xcodeproj by cmake – Before Adjust CMAKE\_INSTALL\_NAME\_TOOL

Figure 12.5: Select Cmake 2.8-9.app

Figure 12.6: Click cmake Configure button first time

Figure 12.7: Check CLANG\_BUILD\_EXAMPLES, LLVM\_BUILD\_EXAMPLES, and uncheck LLVM\_ENABLE\_PIC in cmake

Figure 12.8: Click cmake Generate button second time

Figure 12.9: Click Build button to build LLVM.xcodeproj by Xcode

To access those executables, edit `.profile` (if `.profile` does not exist, please create it), save it to `/Users/Jonathan/`, and update `$PATH` with the command `source .profile` as follows. Please add the path `/Applications/Xcode.app/Contents/Developer/usr/bin` to `.profile` if you didn't add it after downloading Xcode.

```

118-165-65-128:~ Jonathan$ pwd
/Users/Jonathan
118-165-65-128:~ Jonathan$ cat .profile
export PATH=$PATH:/Applications/Xcode.app/Contents/Developer/usr/bin:/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin:/Applications/Graphviz.app/Contents/MacOS:/Users/Jonathan/llvm/release/cmake_release_build/bin/Debug
export WORKON_HOME=$HOME/.virtualenvs
source /usr/local/bin/virtualenvwrapper.sh # where Homebrew places it
export VIRTUALENVWRAPPER_VIRTUALENV_ARGS='--no-site-packages' # optional
118-165-65-128:~ Jonathan$
```
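After sourcing `.profile`, a quick way to confirm that the tools are actually reachable is to resolve them on the search path. The sketch below uses Python's standard `shutil.which`; the function name `find_tools` is hypothetical, and the tool names are the ones this section mentions.

```python
import shutil


def find_tools(names, path=None):
    """Map each tool name to its resolved location, or None if not found.

    When path is None, shutil.which searches the current PATH.
    """
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    for name, location in find_tools(["clang", "clang++", "llc"]).items():
        print(f"{name}: {location or 'not found on PATH'}")
```

If a tool is reported as not found, the usual cause is that the shell has not re-read `.profile` yet or that the build directory in `PATH` is misspelled.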

### 12.1.4 Create LLVM.xcodeproj of supporting cpu0 by terminal cmake command

We have installed llvm with clang in the directory llvm/release/. In this section, we install llvm with our cpu0 backend code in the directory llvm/test/.

In “section Create LLVM.xcodeproj by cmake Graphic UI”<sup>5</sup>, we created LLVM.xcodeproj with the cmake graphic UI. We can also create LLVM.xcodeproj with the cmake command on the terminal. This book is in the process of being merged into the llvm trunk, but the merge is not finished yet. The llvm trunk version merged on my GitHub is LLVM 3.3, with merge date 2013/03/28. So, you have to get the book example code and the llvm trunk it is based on with the following git command,

```
git clone https://github.com/Jonathan2251/lbd.git
```

The details of installing the Cpu0 backend example code are as follows,

```
118-165-78-111:llvm Jonathan$ mkdir test
118-165-78-111:llvm Jonathan$ cd test
118-165-78-111:test Jonathan$ pwd
/Users/Jonathan/llvm/test
118-165-78-111:test Jonathan$ git clone https://github.com/Jonathan2251/lbd.git src
118-165-78-111:test Jonathan$ cp -rf src/lib/Target/Cpu0/
LLVMBackendTutorialExampleCode/src_files_modify/modify/src/* src/.
118-165-78-111:test Jonathan$ grep -R "Cpu0" src/include
...
src/include/llvm/MC/MCExpr.h:      VK_Cpu0_GPREL,
src/include/llvm/MC/MCExpr.h:      VK_Cpu0_GOT_CALL,
src/include/llvm/MC/MCExpr.h:      VK_Cpu0_GOT16,
src/include/llvm/MC/MCExpr.h:      VK_Cpu0_GOT,
src/include/llvm/MC/MCExpr.h:      VK_Cpu0_ABS_HI,
src/include/llvm/MC/MCExpr.h:      VK_Cpu0_ABS_LO,
...
src/lib/MC/MCExpr.cpp:  case VK_Cpu0_GOT_PAGE: return "GOT_PAGE";
src/lib/MC/MCExpr.cpp:  case VK_Cpu0_GOT_OFST: return "GOT_OFST";
src/lib/Target/LLVMBuild.txt:subdirectories = ARM CellSPU CppBackend Hexagon
MBlaze MSP430 NVPTX Mips Cpu0 PowerPC Sparc X86 XCore
118-165-78-111:test Jonathan$
```

Next, please copy the Cpu0 chapter 2 example code with the following commands,

```
118-165-80-55:test Jonathan$ cd src/lib/Target/Cpu0/LLVMBackendTutorialExampleCode/
118-165-80-55:LLVMBackendTutorialExampleCode Jonathan$ pwd
/Users/Jonathan/llvm/test/src/lib/Target/Cpu0/LLVMBackendTutorialExampleCode
118-165-80-55:LLVMBackendTutorialExampleCode Jonathan$ sh removecpu0.sh
118-165-80-55:LLVMBackendTutorialExampleCode Jonathan$ ls ..
LLVMBackendTutorialExampleCode
118-165-80-55:LLVMBackendTutorialExampleCode Jonathan$ cp -rf Chapter2/* ../.
118-165-80-55:LLVMBackendTutorialExampleCode Jonathan$ cd ..
118-165-80-55:Cpu0 Jonathan$ ls
CMakeLists.txt          Cpu0InstrInfo.td      Cpu0TargetMachine.cpp  TargetInfo
Cpu0.h                  Cpu0RegisterInfo.td  ExampleCode          readme
Cpu0.td                 Cpu0Schedule.td      LLVMBuild.txt
Cpu0InstrFormats.td    Cpu0Subtarget.h     MCTargetDesc
118-165-80-55:Cpu0 Jonathan$
```

Now, it's ready for building the llvm/test/src code with the command `cmake -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang -DCMAKE_BUILD_TYPE=Debug -G "Xcode" ../src/` as follows. Remember that, currently, the `cmake` terminal command works with lldb debugging, but the “section Create LLVM.xcodeproj by cmake Graphic UI”<sup>5</sup> does not.

<sup>5</sup> <http://jonathan2251.github.com/lbd/install.html#create-llvm-xcodeproj-by-cmake-graphic-ui>

```
118-165-78-111:Target Jonathan$ cd ../../..
118-165-78-111:test Jonathan$ ls
src
118-165-78-111:test Jonathan$ pwd
/Users/Jonathan/llvm/test
118-165-78-111:test Jonathan$ ls
src
118-165-78-111:test Jonathan$ mkdir cmake_debug_build
118-165-78-111:test Jonathan$ cd cmake_debug_build
118-165-78-111:cmake_debug_build Jonathan$ cmake -DCMAKE_CXX_COMPILER=clang++
-DCMAKE_C_COMPILER=clang -DCMAKE_BUILD_TYPE=Debug -G "Xcode" ../src/
CMake Error: The source directory "/Users/Jonathan/llvm/src" does not exist.
Specify --help for usage, or press the help button on the CMake GUI.
118-165-78-111:test Jonathan$ cd cmake_debug_build/
118-165-78-111:cmake_debug_build Jonathan$ cmake -DCMAKE_CXX_COMPILER=clang++
-DCMAKE_C_COMPILER=clang -DCMAKE_BUILD_TYPE=Debug -G "Xcode" ../src/
-- The C compiler identification is Clang 4.1.0
-- The CXX compiler identification is Clang 4.1.0
-- Check for working C compiler using: Xcode
...
-- Targeting ARM
-- Targeting CellSPU
-- Targeting CppBackend
-- Targeting Hexagon
-- Targeting Mips
-- Targeting Cpu0
-- Targeting MBlaze
-- Targeting MSP430
-- Targeting NVPTX
-- Targeting PowerPC
-- Targeting Sparc
-- Targeting X86
-- Targeting XCore
-- Performing Test SUPPORTS_GLINE_TABLES_ONLY_FLAG
-- Performing Test SUPPORTS_GLINE_TABLES_ONLY_FLAG - Success
-- Performing Test SUPPORTS_NO_C99_EXTENSIONS_FLAG
-- Performing Test SUPPORTS_NO_C99_EXTENSIONS_FLAG - Success
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/Jonathan/llvm/test/cmake_debug_build
118-165-78-111:cmake_debug_build Jonathan$
```
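The first cmake attempt in the transcript above fails because `../src/` does not resolve to the source tree from the directory where the command was run. A small wrapper can check the source directory before invoking cmake. This is a sketch in Python under stated assumptions: the function name `cmake_command` is hypothetical, and the flags are exactly those used in the transcript.

```python
from pathlib import Path


def cmake_command(src_dir, generator="Xcode", build_type="Debug"):
    """Build the cmake argument list, failing early if src_dir is wrong."""
    src = Path(src_dir)
    if not (src / "CMakeLists.txt").is_file():
        raise FileNotFoundError(f"no CMakeLists.txt under {src}")
    return [
        "cmake",
        "-DCMAKE_CXX_COMPILER=clang++",
        "-DCMAKE_C_COMPILER=clang",
        f"-DCMAKE_BUILD_TYPE={build_type}",
        "-G", generator,
        str(src),
    ]


if __name__ == "__main__":
    try:
        print(" ".join(cmake_command("../src")))
    except FileNotFoundError as err:
        print("refusing to run cmake:", err)
```

The returned list can be handed to `subprocess.run(cmd, cwd=build_dir, check=True)`, where `build_dir` would be the `cmake_debug_build` directory created above.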

Now, you can build this llvm with the Cpu0 example code in Xcode as the last section indicated.

Since Xcode uses the clang compiler and lldb instead of gcc and gdb, we can run an lldb debug session as follows,

```
118-165-65-128:InputFiles Jonathan$ pwd
/Users/Jonathan/LLVMBackendTutorialExampleCode/InputFiles
118-165-65-128:InputFiles Jonathan$ clang -c ch3.cpp -emit-llvm -o ch3.bc
118-165-65-128:InputFiles Jonathan$ /Users/Jonathan/llvm/test/
cmake_debug_build/bin/Debug/llc -march=mips -relocation-model=pic -filetype=asm
ch3.bc -o ch3.mips.s
118-165-65-128:InputFiles Jonathan$ lldb -- /Users/Jonathan/llvm/test/
cmake_debug_build/bin/Debug/llc -march=mips -relocation-model=pic -filetype=
asm ch3.bc -o ch3.mips.s
Current executable set to '/Users/Jonathan/llvm/test/cmake_debug_build/bin/
Debug/llc' (x86_64).
(lldb) b MipsTargetInfo.cpp:19
breakpoint set --file 'MipsTargetInfo.cpp' --line 19
Breakpoint created: 1: file ='MipsTargetInfo.cpp', line = 19, locations = 1
(lldb) run
Process 6058 launched: '/Users/Jonathan/llvm/test/cmake_debug_build/bin/Debug/
llc' (x86_64)
Process 6058 stopped
* thread #1: tid = 0x1c03, 0x000000010077f231 llc`LLVMInitializeMipsTargetInfo
+ 33 at MipsTargetInfo.cpp:20, stop reason = breakpoint 1.1
  frame #0: 0x000000010077f231 llc`LLVMInitializeMipsTargetInfo + 33 at
MipsTargetInfo.cpp:20
  17
  18     extern "C" void LLVMInitializeMipsTargetInfo() {
  19         RegisterTarget<Triple::mips,
-> 20             /*HasJIT=*/true> X(TheMipsTarget, "mips", "Mips");
  21
  22         RegisterTarget<Triple::mipsel,
  23             /*HasJIT=*/true> Y(TheMipselTarget, "mipsel", "Mipsel");
(lldb) n
Process 6058 stopped
* thread #1: tid = 0x1c03, 0x000000010077f24f llc`LLVMInitializeMipsTargetInfo
+ 63 at MipsTargetInfo.cpp:23, stop reason = step over
  frame #0: 0x000000010077f24f llc`LLVMInitializeMipsTargetInfo + 63 at
MipsTargetInfo.cpp:23
  20             /*HasJIT=*/true> X(TheMipsTarget, "mips", "Mips");
  21
  22         RegisterTarget<Triple::mipsel,
-> 23             /*HasJIT=*/true> Y(TheMipselTarget, "mipsel", "Mipsel");
  24
  25         RegisterTarget<Triple::mips64,
  26             /*HasJIT=*/false> A(TheMips64Target, "mips64", "Mips64
[experimental]");
(lldb) print X
(llvm::RegisterTarget<llvm::Triple::ArchType, true>) $0 = {}
(lldb) quit
118-165-65-128:InputFiles Jonathan$
```

For the lldb debug commands, please refer to <sup>6</sup> or the lldb portal <sup>7</sup>.

### 12.1.5 Setup llvm-lit on iMac

The llvm-lit <sup>8</sup> is the llvm regression test tool. You don't need to set it up if you don't want to run regression tests, even though this book does run them. To set it up correctly on iMac, you need to copy it from bin/llvm-lit to bin/Debug/llvm-lit, and modify llvm-lit as follows,

```
118-165-69-59:bin Jonathan$ pwd
/Users/Jonathan/llvm/test/cmake_debug_build/bin
118-165-69-59:bin Jonathan$ ls
Debug      llvm-lit
118-165-69-59:bin Jonathan$ cp llvm-lit Debug/.
// edit llvm-lit as follows,
  'build_config' : ":",
  'build_mode' : "Debug",
```

---

<sup>6</sup> <http://lldb.llvm.org/lldb-gdb.html>

<sup>7</sup> <http://lldb.llvm.org/>

<sup>8</sup> <http://llvm.org/docs/TestingGuide.html>

### 12.1.6 Install Icarus Verilog tool on iMac

Install the Icarus Verilog tool with the command `brew install icarus-verilog` as follows,

```
JonathantekiiMac:~ Jonathan$ brew install icarus-verilog
==> Downloading ftp://icarus.com/pub/eda/verilog/v0.9/verilog-0.9.5.tar.gz
##### 100.0%
##### 100.0%
==> ./configure --prefix=/usr/local/Cellar/icarus-verilog/0.9.5
==> make
==> make installdirs
==> make install
/usr/local/Cellar/icarus-verilog/0.9.5: 39 files, 12M, built in 55 seconds
```

### 12.1.7 Install other tools on iMac

The tools mentioned in this section are for coding and debugging. You can work even without them. The file comparison tool Kdiff3 comes from the web site <sup>9</sup>. FileMerge is a part of Xcode; you can type FileMerge in Finder – Applications as in Figure 12.10 and drag it into the Dock as in Figure 12.11.



Figure 12.10: Type FileMerge in Finder – Applications



Figure 12.11: Drag FileMerge into the Dock

<sup>9</sup> <http://kdiff3.sourceforge.net>

Download the Graphviz tool<sup>10</sup> to display llvm IR nodes while debugging. We choose mountainlion as in Figure 12.12 since our iMac runs Mountain Lion.



Figure 12.12: Download graphviz for llvm IR node display

After installing Graphviz, please add its path to .profile. For example, we installed Graphviz in the directory /Applications/Graphviz.app/Contents/MacOS/, so we add this path to /Users/Jonathan/.profile as follows,

```
118-165-12-177:InputFiles Jonathan$ cat /Users/Jonathan/.profile
export PATH=$PATH:/Applications/Xcode.app/Contents/Developer/usr/bin:
/Applications/Graphviz.app/Contents/MacOS:/Users/Jonathan/llvm/release/
cmake_release_build/bin/Debug
```

The Graphviz information for llvm is in the section “SelectionDAG Instruction Selection Process” of<sup>11</sup> and the section “Viewing graphs while debugging code” of<sup>12</sup>. TextWrangler is for editing files with line number display and for dumping binary files such as the obj file, \*.o, that will be generated in the chapter Other instructions. You can download it from the App Store. To dump a binary file, first open the binary file, then select menu “File – Hex Front Document” as in Figure 12.13. Then select “Front document’s file” as in Figure 12.14.

Install binutils with the command `brew install binutils` as follows,

```
118-165-77-214:~ Jonathan$ brew install binutils
==> Downloading http://ftpmirror.gnu.org/binutils/binutils-2.22.tar.gz
##### 100.0%
==> ./configure --program-prefix=g --prefix=/usr/local/Cellar/binutils/2.22
--infodir=/usr/local
==> make
==> make install
/usr/local/Cellar/binutils/2.22: 90 files, 19M, built in 4.7 minutes
118-165-77-214:~ Jonathan$ ls /usr/local/Cellar/binutils/2.22
COPYING               README   lib
ChangeLog             bin      share
INSTALL_RECEIPT.json  include  x86_64-apple-darwin12.2.0
118-165-77-214:binutils-2.23 Jonathan$ ls /usr/local/Cellar/binutils/2.22/bin
gaddr2line  gc++filt  gnm       gobjdump  greadelf  gstrings
gar         gelfedit  gobjcopy  granlib   gsize     gstrip
```

<sup>10</sup> <http://www.graphviz.org/Download_macos.php>

<sup>11</sup> <http://llvm.org/docs/CodeGenerator.html>

<sup>12</sup> <http://llvm.org/docs/ProgrammersManual.html>

Figure 12.13: Select Hex Dump menu

Figure 12.14: Select Front document's file in TextWrangler

## 12.2 Setting Up Your Linux Machine

### 12.2.1 Install LLVM 3.2 release build on Linux

First, install the llvm release build by,

1. Untar the llvm source and rename the llvm source directory to src.
2. Untar clang and move it to src/tools/clang.
3. Untar compiler-rt and move it to src/projects/compiler-rt.

Next, build with the cmake command, `cmake -DCMAKE_BUILD_TYPE=Release -DCLANG_BUILD_EXAMPLES=ON -DLLVM_BUILD_EXAMPLES=ON -G "Unix Makefiles" ../src/`, as follows.

```
[Gamma@localhost cmake_release_build]$ cmake -DCMAKE_BUILD_TYPE=Release
-DCLANG_BUILD_EXAMPLES=ON -DLLVM_BUILD_EXAMPLES=ON -G "Unix Makefiles" ../src/
-- The C compiler identification is GNU 4.7.0
...
-- Constructing LLVMBuild project information
-- Targeting ARM
-- Targeting CellSPU
-- Targeting CppBackend
-- Targeting Hexagon
-- Targeting Mips
-- Targeting MBlaze
-- Targeting MSP430
-- Targeting PowerPC
-- Targeting PTX
-- Targeting Sparc
-- Targeting X86
-- Targeting XCore
-- Clang version: 3.2
-- Found Subversion: /usr/bin/svn (found version "1.7.6")
-- Configuring done
-- Generating done
-- Build files have been written to: /usr/local/llvm/release/cmake_release_build
```

After cmake, run the command make; clang, llc, llvm-as, ..., will then be in cmake\_release\_build/bin/ after a few tens of minutes of building. Next, edit /home/Gamma/.bash\_profile to add /usr/local/llvm/release/cmake\_release\_build/bin to PATH so that the clang, llc, ..., commands are on the search path, as follows,

```
[Gamma@localhost ~]$ pwd
/home/Gamma
[Gamma@localhost ~]$ cat .bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
  . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:/usr/local/sphinx/bin:/usr/local/llvm/release/cmake_release_build/bin:
/opt/mips_linux_toolchain_clang/mips_linux_toolchain/bin:$HOME/.local/bin:
$HOME/bin

export PATH
[Gamma@localhost ~]$ source .bash_profile
[Gamma@localhost ~]$ $PATH
bash: /usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:
/usr/sbin:/usr/local/sphinx/bin:/opt/mips_linux_toolchain_clang/mips_linux_tool
chain/bin:/home/Gamma/.local/bin:/home/Gamma/bin:/usr/local/sphinx/bin:/usr/
local/llvm/release/cmake_release_build/bin
```

### 12.2.2 Install cpu0 debug build on Linux

This book is in the process of being merged into the llvm trunk, but the merge is not finished yet. The llvm trunk version merged on my GitHub is LLVM 3.3, merged on 2013/03/28. So, you have to get the book example code together with the llvm trunk it is based on by the git command as follows,

```
git clone https://github.com/Jonathan2251/lbd.git
```

Install the Cpu0 backend example code according to the following steps. The corresponding commands are shown below.

1. Enter /usr/local/llvm/test/ and get the Cpu0 example code as well as the llvm trunk that Cpu0 is based on.
2. Make dir Cpu0 in src/lib/Target and download the example code.
3. Update my modified files to support cpu0 by the command `cp -rf /usr/local/llvm/test/src/lib/Target/Cpu0/LLVMBackendTutorialExampleCode/src_files_modify/modify/src ..`.
4. Check that the update took effect by the command `grep -R "Cpu0" . | more`. The modified files add the Cpu0 backend support, so check with grep.
5. Enter src/lib/Target/Cpu0/, generate LLVMBackendTutorialExampleCode, and copy the example code LLVMBackendTutorialExampleCode/2/Cpu0 to the directory by the commands `cd src/lib/Target/Cpu0/` and `cp -rf LLVMBackendTutorialExample/Chapter2/* ../..`.
6. Remove clang from /usr/local/llvm/test/src/tools/clang, and mkdir test/cmake\_debug\_build. Without removing clang, you will waste extra time on the make command in the cpu0 example code build.

```
118-165-78-111:llvm Jonathan$ mkdir test
118-165-78-111:llvm Jonathan$ cd test
[Gamma@localhost test]$ pwd
/usr/local/llvm/test
[Gamma@localhost test]$ git clone https://github.com/Jonathan2251/lbd.git src
[Gamma@localhost test]$ cp -rf src/lib/Target/Cpu0/
LLVMBackendTutorialExampleCode/src_files_modify/modify/src/* src/..
[Gamma@localhost test]$ grep -R "cpu0" src/include
src/include//llvm/ADT/Triple.h:    cpu0,      // For Tutorial Backend Cpu0
src/include//llvm/MC/MCExpr.h:    VK_Cpu0_GPREL,
src/include//llvm/MC/MCExpr.h:    VK_Cpu0_GOT_CALL,
...
[Gamma@localhost test]$ cd src/lib/Target/Cpu0/LLVMBackendTutorialExampleCode/
[Gamma@localhost LLVMBackendTutorialExampleCode]$ sh removecpu0.sh
[Gamma@localhost LLVMBackendTutorialExampleCode]$ ls ../
LLVMBackendTutorialExampleCode
[Gamma@localhost LLVMBackendTutorialExampleCode]$ cp -rf Chapter2/* ../.
[Gamma@localhost LLVMBackendTutorialExampleCode]$ ls ..
CMakeLists.txt          Cpu0InstrInfo.td      Cpu0TargetMachine.cpp  TargetInfo
Cpu0.h                  Cpu0RegisterInfo.td  ExampleCode          readme
Cpu0.td                 Cpu0Schedule.td      LLVMBuild.txt
Cpu0InstrFormats.td    Cpu0Subtarget.h      MCTargetDesc
[Gamma@localhost Cpu0]$ cd ../../../../..
[Gamma@localhost test]$ pwd
/usr/local/llvm/test
```

Now, go into the directory `llvm/test/`, create the directory `cmake_debug_build`, and run `cmake` as in the `llvm/release` build, except that we do a Debug build and use clang as the compiler, as follows,

```
[Gamma@localhost test]$ pwd
/usr/local/llvm/test
[Gamma@localhost test]$ mkdir cmake_debug_build
[Gamma@localhost test]$ cd cmake_debug_build/
[Gamma@localhost cmake_debug_build]$ cmake
-DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang
-DCMAKE_BUILD_TYPE=Debug -G "Unix Makefiles" ../src/
-- The C compiler identification is Clang 3.2.0
-- The CXX compiler identification is Clang 3.2.0
-- Check for working C compiler: /usr/local/llvm/release/cmake_release_build/bin/
clang
-- Check for working C compiler: /usr/local/llvm/release/cmake_release_build/bin/
clang
-- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/local/llvm/release/cmake_release_build/
bin/clang++
-- Check for working CXX compiler: /usr/local/llvm/release/cmake_release_build/
bin/clang++
-- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done ...
-- Targeting Mips
-- Targeting Cpu0
-- Targeting MBlaze
-- Targeting MSP430
-- Targeting PowerPC
-- Targeting PTX
-- Targeting Sparc
-- Targeting X86
-- Targeting XCore
-- Configuring done
-- Generating done
-- Build files have been written to: /usr/local/llvm/test/cmake_debug
_build
[Gamma@localhost cmake_debug_build]$
```

Then do `make` as follows,

```
[Gamma@localhost cmake_debug_build]$ make
Scanning dependencies of target LLVMSupport
[ 0%] Building CXX object lib/Support/CMakeFiles/LLVMSupport.dir/APFloat.cpp.o
[ 0%] Building CXX object lib/Support/CMakeFiles/LLVMSupport.dir/APInt.cpp.o
[ 0%] Building CXX object lib/Support/CMakeFiles/LLVMSupport.dir/APSInt.cpp.o
[ 0%] Building CXX object lib/Support/CMakeFiles/LLVMSupport.dir/Allocator.cpp.o
[ 1%] Building CXX object lib/Support/CMakeFiles/LLVMSupport.dir/BlockFrequency.cpp.o
...
Linking CXX static library ../../lib/libgtest.a
[100%] Built target gtest
Scanning dependencies of target gtest_main
[100%] Building CXX object utils/unittest/CMakeFiles/gtest_main.dir/UnitTestMain/
TestMain.cpp.o
Linking CXX static library ../../lib/libgtest_main.a
[100%] Built target gtest_main
[Gamma@localhost cmake_debug_build]$
```

Now we are ready for cpu0 backend development and can debug with gdb. If gdb reports errors in your setup, follow the error messages (you may need to download gdb again). Try gdb as follows.

```
[Gamma@localhost InputFiles]$ pwd
/usr/local/llvm/test/src/lib/Target/Cpu0/ExampleCode/
LLVMBackendTutorialExampleCode/InputFiles
[Gamma@localhost InputFiles]$ clang -c ch3.cpp -emit-llvm -o ch3.bc
[Gamma@localhost InputFiles]$ gdb -args /usr/local/llvm/test/
cmake_debug_build/bin/llc -march=cpu0 -relocation-model=pic -filetype=obj
ch3.bc -o ch3.cpu0.o
GNU gdb (GDB) Fedora (7.4.50.20120120-50.fc17)
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/local/llvm/test/cmake_debug_build/bin/llc.
..done.
(gdb) break MipsTargetInfo.cpp:19
Breakpoint 1 at 0xd54441: file /usr/local/llvm/test/src/lib/Target/
Mips/TargetInfo/MipsTargetInfo.cpp, line 19.
(gdb) run
Starting program: /usr/local/llvm/test/cmake_debug_build/bin/llc
-march=cpu0 -relocation-model=pic -filetype=obj ch3.bc -o ch3.cpu0.o
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, LLVMInitializeMipsTargetInfo ()
  at /usr/local/llvm/test/src/lib/Target/Mips/TargetInfo/MipsTargetInfo.cpp:20
20          /*HasJIT=*/true> X(TheMipsTarget, "mips", "Mips");
(gdb) next
23          /*HasJIT=*/true> Y(TheMipselTarget, "mipsel", "Mipsel");
(gdb) print X
$1 = {<No data fields>}
(gdb) quit
A debugging session is active.

Inferior 1 [process 10165] will be killed.

Quit anyway? (y or n) y
[Gamma@localhost InputFiles]$
```

### **12.2.3 Install Icarus Verilog tool on Linux**

Download the snapshot version of the Icarus Verilog tool from the web site <ftp://icarus.com/pub/eda/verilog/snapshots>, or go to <http://iverilog.icarus.com/> and click the snapshot version link. Follow the INSTALL file to install it.

### **12.2.4 Install other tools on Linux**

Download Graphviz from <sup>13</sup> according to your Linux distribution. The file comparison tool Kdiff3 comes from web site <sup>8</sup>.

---

<sup>13</sup> <http://www.graphviz.org/Download.php>

# APPENDIX B: LLVM CHANGES

This chapter shows the old versions of the LLVM API and structure that affect the Cpu0 backend. Mips changes are also mentioned in this chapter. If you work on the latest LLVM version only, please skip this chapter. LLVM version 3.2 was released on 20 December 2012, and version 3.1 on 22 May 2012. This book was started in September 2012. This chapter discusses the old versions starting from 3.1.

## 13.1 Difference between 3.2 and 3.1

### 13.1.1 API difference

The API differences are as follows.

1. In LLVM 3.1, the parameters of the callback functions for Target Registration differ from 3.2. LLVM 3.2 adds the parameter “MCRegisterInfo” to the callback function for RegisterMCCodeEmitter() and “StringRef” to the callback function for RegisterMCAsmBackend(). In other words, after this registration your backend can get more information about the registers and the CPU (of type StringRef). This information comes from TableGen, whose source is the Target Description .td files you write.

```
extern "C" void LLVMInitializeCpu0TargetMC() {
    ...
    // Register the MC Code Emitter
    TargetRegistry::RegisterMCCodeEmitter(TheCpu0Target,
        createCpu0MCCodeEmitterEB);
    TargetRegistry::RegisterMCCodeEmitter(TheCpu0elTarget,
        createCpu0MCCodeEmitterEL);
    ...

    // Register the asm backend.
    TargetRegistry::RegisterMCAsmBackend(TheCpu0Target,
        createCpu0AsmBackendEB32);
    TargetRegistry::RegisterMCAsmBackend(TheCpu0elTarget,
        createCpu0AsmBackendEL32);
    ...
}
```

Version 3.1 as follows,

```
MCCodeEmitter *createCpu0MCCodeEmitterEB(const MCInstrInfo &MCII,
    const MCSubtargetInfo &STI,
    MCContext &Ctx);
MCCodeEmitter *createCpu0MCCodeEmitterEL(const MCInstrInfo &MCII,
    const MCSubtargetInfo &STI,
    MCContext &Ctx);

MCAsmBackend *createCpu0AsmBackendEB32(const Target &T, StringRef TT);
MCAsmBackend *createCpu0AsmBackendEL32(const Target &T, StringRef TT);

```

Version 3.2 as follows,

```

MCCodeEmitter *createCpu0MCCodeEmitterEB(const MCInstrInfo &MCII,
                                         const MCRegisterInfo &MRI,
                                         const MCSubtargetInfo &STI,
                                         MCContext &Ctx);
MCCodeEmitter *createCpu0MCCodeEmitterEL(const MCInstrInfo &MCII,
                                         const MCRegisterInfo &MRI,
                                         const MCSubtargetInfo &STI,
                                         MCContext &Ctx);

MCAsmBackend *createCpu0AsmBackendEB32(const Target &T, StringRef TT,
                                         StringRef CPU);
MCAsmBackend *createCpu0AsmBackendEL32(const Target &T, StringRef TT,
                                         StringRef CPU);

```
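The effect of the extra callback parameter can be sketched outside LLVM. The following is a minimal illustration only: `RegisterInfo`, `CodeEmitter`, `Registry` and the factory functions are hypothetical stand-ins, not the LLVM classes. It shows why every registered factory must be updated when the callback type gains a parameter.

```cpp
#include <string>

// Hypothetical stand-ins; these are not the LLVM classes.
struct RegisterInfo { unsigned numRegs; };
struct CodeEmitter {
    std::string endian;
    unsigned numRegs;
};

// 3.2-style callback type: the registry passes register information
// to the factory, so the factory signature must accept it.
typedef CodeEmitter (*EmitterCtorFn)(const RegisterInfo &MRI);

class Registry {
public:
    Registry() : ctor(nullptr) {}
    void registerEmitter(EmitterCtorFn fn) { ctor = fn; }
    CodeEmitter create(const RegisterInfo &MRI) const { return ctor(MRI); }

private:
    EmitterCtorFn ctor;
};

// Big-endian factory; it can now read the register information.
CodeEmitter createEmitterEB(const RegisterInfo &MRI) {
    CodeEmitter e = {"big", MRI.numRegs};
    return e;
}

// Little-endian factory with the same signature.
CodeEmitter createEmitterEL(const RegisterInfo &MRI) {
    CodeEmitter e = {"little", MRI.numRegs};
    return e;
}
```

A factory written against the old, shorter signature would no longer convert to `EmitterCtorFn`, which is the kind of compile error you see when building 3.1 backend code against 3.2.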

2. The parameters of LowerCall() changed as follows,

Version 3.1 as follows,

```

SDValue
LowerCall(SDValue Chain, SDValue Callee,
          CallingConv::ID CallConv, bool isVarArg,
          bool doesNotRet, bool &isTailCall,
          const SmallVectorImpl<ISD::OutputArg> &Outs,
          const SmallVectorImpl<SDValue> &OutVals,
          const SmallVectorImpl<ISD::InputArg> &Ins,
          DebugLoc dl, SelectionDAG &DAG,
          SmallVectorImpl<SDValue> &InVals) const;

```

Version 3.2 as follows,

```

SDValue
LowerCall(TargetLowering::CallLoweringInfo &CLI,
          SmallVectorImpl<SDValue> &InVals) const;

```

TargetLowering::CallLoweringInfo is a structure that contains the parameters of the old 3.1 version. You can get the same information as in 3.1 by,

```

SDValue
Cpu0TargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
                              SmallVectorImpl<SDValue> &InVals) const {
    SelectionDAG &DAG                     = CLI.DAG;
    DebugLoc &dl                          = CLI.DL;
    SmallVector<ISD::OutputArg, 32> &Outs = CLI.Outs;
    SmallVector<SDValue, 32> &OutVals     = CLI.OutVals;
    SmallVector<ISD::InputArg, 32> &Ins   = CLI.Ins;
    SDValue InChain                       = CLI.Chain;
    SDValue Callee                        = CLI.Callee;
    bool &isTailCall                      = CLI.IsTailCall;
    CallingConv::ID CallConv              = CLI.CallConv;
    bool isVarArg                         = CLI.IsVarArg;
    ...
}

```

As explained in the chapter “function call”, the role of LowerCall() is to handle the passing of outgoing arguments in a function call.
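The move from a long parameter list to a single CallLoweringInfo argument is an instance of the "parameter object" idiom. The sketch below illustrates the idiom only: `Value`, `CallInfo` and `lowerCall` are hypothetical names, not LLVM types, and the body does no real lowering.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an SDValue-like handle.
struct Value { std::string name; };

// Mirrors the idea of TargetLowering::CallLoweringInfo: one struct that
// carries what used to be separate LowerCall() parameters.
struct CallInfo {
    Value chain;
    Value callee;
    bool isVarArg = false;
    bool isTailCall = false;
    std::vector<Value> outVals;  // outgoing argument values
};

// 3.2-style signature: a single info object plus the result vector.
// Returns the chain, as LowerCall() returns an SDValue.
Value lowerCall(CallInfo &cli, std::vector<Value> &inVals) {
    // Unpack into locals, in the same way as the Cpu0 code above.
    Value &callee = cli.callee;
    bool &isTailCall = cli.isTailCall;

    isTailCall = false;  // this sketch never performs tail calls
    inVals.push_back(Value{"ret_of_" + callee.name});
    return cli.chain;
}
```

Because `isTailCall` is bound by reference, writing to the local also updates the field in the info object, which is how the caller learns whether a tail call was actually emitted.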

3. The TargetData structure of LLVMTargetMachine has been renamed to DataLayout, and the corresponding function name changes as follows,

```
// 3.1
class Cpu0TargetMachine : public LLVMTargetMachine {
    ...
    virtual const TargetData *getTargetData() const
    { return &DataLayout; }
    ...
}

// 3.2
class Cpu0TargetMachine : public LLVMTargetMachine {
    ...
    virtual const DataLayout *getDataLayout() const
    { return &DL; }
    ...
}
```

4. The “add a pass” API changes as follows,

```
// 3.1
TargetPassConfig *Cpu0TargetMachine::createPassConfig(PassManagerBase &PM) {
    return new Cpu0PassConfig(this, PM);
}

// Install an instruction selector pass using
// the ISelDag to gen Cpu0 code.
bool Cpu0PassConfig::addInstSelector() {
    PM->add(createCpu0ISelDag(getCpu0TargetMachine()));
    return false;
}

// 3.2
// Install an instruction selector pass using
// the ISelDag to gen Cpu0 code.
bool Cpu0PassConfig::addInstSelector() {
    addPass(createCpu0ISelDag(getCpu0TargetMachine()));
    return false;
}
```

5. The changes above are mandatory. Some other changes are advised, like the ones below. We comment the “Change Reason” in the following code. You can find the “Change Reason” by searching the internet.

```
MCObjectWriter *createObjectWriter(raw_ostream &OS) const {
    // Change Reason:
    // Reduce the exposure of Triple::OSType in the ELF object writer. This will
    // avoid including ADT/Triple.h in many places when the target specific bits
    // are moved.
    return createCpu0ELFObjectWriter(OS,
        MCELFObjectTargetWriter::getOSABI(OSType), IsLittle);
    // Even though, the old function still work on LLVM version 3.2
    // return createCpu0ELFObjectWriter(OS, OSType, IsLittle);
}

class Cpu0MCCodeEmitter : public MCCodeEmitter {
    // #define LLVM_DELETED_FUNCTION
    // LLVM_DELETED_FUNCTION - Expands to = delete if the compiler supports it.
    // Use to mark functions as uncallable. Member functions with this should be
    // declared private so that some behavior is kept in C++03 mode.
    // class DontCopy { private: DontCopy(const DontCopy&) LLVM_DELETED_FUNCTION;
    // DontCopy &operator =(const DontCopy&) LLVM_DELETED_FUNCTION; public: ... };
    // Definition at line 79 of file Compiler.h.

    Cpu0MCCodeEmitter(const Cpu0MCCodeEmitter &) LLVM_DELETED_FUNCTION;
    void operator=(const Cpu0MCCodeEmitter &) LLVM_DELETED_FUNCTION;
    // The old declarations below still work on LLVM version 3.2
    // Cpu0MCCodeEmitter(const Cpu0MCCodeEmitter &); // DO NOT IMPLEMENT
    // void operator=(const Cpu0MCCodeEmitter &); // DO NOT IMPLEMENT
    ...

```
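The behaviour described in the LLVM_DELETED_FUNCTION comment can be tried in isolation. The following is a minimal sketch, assuming a C++11 compiler: `MY_DELETED_FUNCTION` and `Emitter` are hypothetical names that imitate the LLVM macro and the Cpu0 code emitter class.

```cpp
#include <type_traits>

// Hypothetical imitation of LLVM_DELETED_FUNCTION: expands to "= delete"
// when the compiler is in C++11 mode or later, and to nothing otherwise.
#if __cplusplus >= 201103L
#define MY_DELETED_FUNCTION = delete
#else
#define MY_DELETED_FUNCTION
#endif

class Emitter {
public:
    Emitter() {}

private:
    // Declared private so that a C++03 build still rejects copies made
    // from outside the class.
    Emitter(const Emitter &) MY_DELETED_FUNCTION;
    void operator=(const Emitter &) MY_DELETED_FUNCTION;
};

// Copying is rejected at compile time.
static_assert(!std::is_copy_constructible<Emitter>::value,
              "Emitter must not be copyable");
```

Compared with the old "DO NOT IMPLEMENT" comment, a deleted function fails at compile time even inside the class and its friends, instead of at link time.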

### 13.1.2 Structure difference

1. The name changes from CPURegsRegisterClass (3.1) to CPURegsRegClass (3.2). The register class information comes from your backend <register>.td. The new name CPURegsRegClass is used like a “**reference**” in C++, while the old CPURegsRegisterClass is a “**pointer**”. The reference uses “.” while the pointer uses “->”, as follows,

```

// 3.2
unsigned CPURegSize = Cpu0::CPURegsRegClass.getSize();
// 3.1
unsigned CPURegSize = Cpu0::CPURegsRegisterClass->getSize();

```
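The “.” versus “->” difference in the register class names can be reproduced with a small stand-alone example. This is a sketch under assumptions: `RegClass` and the `Demo` namespace are hypothetical miniatures of what TableGen generates, and the size value 4 is arbitrary.

```cpp
// Hypothetical miniature of a TableGen-generated register class.
struct RegClass {
    unsigned size;
    unsigned getSize() const { return size; }
};

namespace Demo {
// 3.2 style: the generated name is the object itself.
RegClass CPURegsRegClass = {4};
// 3.1 style: the generated name is a pointer to the object.
RegClass *CPURegsRegisterClass = &CPURegsRegClass;
}

// Object access uses ".".
unsigned sizeViaObject() { return Demo::CPURegsRegClass.getSize(); }

// Pointer access uses "->".
unsigned sizeViaPointer() { return Demo::CPURegsRegisterClass->getSize(); }
```

Both functions read the same object, so porting code from 3.1 to 3.2 is a mechanical rename plus replacing `->` with `.` (or taking the address with `&` where a pointer is still required).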

2. The TargetData structure has been renamed to DataLayout and moved to VMCore to remove a dependency on Target<sup>1</sup>.

```

// 3.1
#include "llvm/Target/TargetData.h"
class Cpu0TargetMachine : public LLVMTargetMachine {
...
  const TargetData DataLayout; // Calculates type size & alignment
...
}

// 3.2
#include "llvm/DataLayout.h"
class Cpu0TargetMachine : public LLVMTargetMachine {
...
  const DataLayout DL; // Calculates type size & alignment
...
}

```

3. DebugInfo.h is moved.

```

// 3.1
#include "llvm/Analysis/DebugInfo.h"

// 3.2
#include "llvm/DebugInfo.h"

```

---

<sup>1</sup> <http://llvm.org/releases/3.2/docs/ReleaseNotes.html>

### 13.1.3 Verify the Cpu0 for difference

3.1\_src\_files\_modify includes the LLVM 3.1 files modified for Cpu0 backend support. Please copy 3.1\_src\_files\_modify/src\_files\_modify/src to your LLVM 3.1 source directory. llvm3.1/Cpu0 is the code for LLVM version 3.1. The file ch\_all.cpp includes tests of all C/C++ operators, global variables, structs, arrays, control statements and function calls. Running llvm3.1/Cpu0 with ch\_all.cpp produces the assembly code below. By comparing it with the output of 3.2, you can verify the correctness as below. The difference comes from 3.2 putting the label numbers in order.

#### LLVMBackendTutorialExampleCode/InputFiles/ch\_all.cpp

```
#if 1
//#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>
#endif
#if 1
int test_operators()
{
    int a = 5;
    int b = 2;
    int c = 0;
    int d = 0;
    int e, f, g, h, i, j, k, l = 0;
    unsigned int a1 = -5, k1 = 0, f1 = 0;

    c = a + b;
    d = a - b;
    e = a * b;
    f = a / b;
    f1 = a1 / b;
    g = (a & b);
    h = (a | b);
    i = (a ^ b);
    j = (a << 2);
    int j1 = (a1 << 2);
    k = (a >> 2);
    k1 = (a1 >> 2);

    b = !a;
    int* p = &b;
    b = (b+1)%a;
    c = rand();
//  c = 12;
    b = (b+1)%c;

    return c;
}
#endif
#if 1
int gI = 100;

int test_globalvar()
{
    int c = 0;

    c = gI;

    return c;
}
#endif
#if 1
struct Date
{
    int year;
    int month;
    int day;
};

Date date = {2012, 10, 12};
int a[3] = {2012, 10, 12};

int test_struct()
{
    int day = date.day;
    int i = a[1];

    return 0;
}
#endif
#if 1
int test_control()
{
    int a = 3;

    if (a != 0)
        a++;
    goto L1;
    a++;
L1:
    a--;

    return a;
}
#endif
#if 1
template<class T>
T sum(T amount, ...)
{
    T i = 0;
    T val = 0;
    T sum = 0;

    va_list vl;
    va_start(vl, amount);
    for (i = 0; i < amount; i++)
    {
        val = va_arg(vl, T);
        sum += val;
    }
    va_end(vl);

    return sum;
}
#endif
int main()
{
    int result = 0;
    result = test_operators();
    result = test_globalvar();
    result = test_struct();
    result = test_control();
    int a = sum<int>(6, 1, 2, 3, 4, 5, 6);
//  printf("a = %d\n", a);

    return result;
}
```

```
118-165-78-60:InputFiles Jonathan$ diff ch_all.3.1.cpu0.s ch_all.3.2.cpu0.s
262c262
<    jge $BB4_7
---
>    jge $BB4_6
285d284
< # BB#6:                                # in Loop: Header=BB4_1 Depth=1
290c289
< $BB4_7:
---
> $BB4_6:
295,297c294,296
<    jne $BB4_9
<    jmp $BB4_8
< $BB4_8:                                # %SP_return
---
>    jne $BB4_8
>    jmp $BB4_7
> $BB4_7:                                # %SP_return
301c300
< $BB4_9:                                # %CallStackCheckFailBlk
---
> $BB4_8:                                # %CallStackCheckFailBlk

// ch_all.3.2.cpu0.s
...
$BB4_5:                                # in Loop: Header=BB4_1 Depth=1
    ld $3, 0($3)
    st $3, 36($sp)
    ld $4, 32($sp)
    add $3, $4, $3
    st $3, 32($sp)
    ld $3, 40($sp)
    addiu $3, $3, 1
    st $3, 40($sp)
    jmp $BB4_1
$BB4_6:
    ld $2, %got(__stack_chk_guard) ($gp)
    ld $2, 0($2)
    ld $3, 48($sp)
    cmp $2, $3
    jne $BB4_8
    jmp $BB4_7
$BB4_7:                                # %SP_return
...

// ch_all.3.1.cpu0.s
...
$BB4_5:                                # in Loop: Header=BB4_1 Depth=1
    ld  $3, 0($3)
    st  $3, 36($sp)
    ld  $4, 32($sp)
    add $3, $4, $3
    st  $3, 32($sp)
# BB#6:                                # in Loop: Header=BB4_1 Depth=1
    ld  $3, 40($sp)
    addiu $3, $3, 1
    st  $3, 40($sp)
    jmp $BB4_1
$BB4_7:
    ld  $2, %got(__stack_chk_guard) ($gp)
    ld  $2, 0($2)
    ld  $3, 48($sp)
    cmp $2, $3
    jne $BB4_9
    jmp $BB4_8
$BB4_8:                                # %SP_return
...

```

## 13.2 Difference in Mips backend

In 3.1, Mips uses the **".cpload"** and **".cprestore"** pseudo assembly code. 3.2 removes these pseudo assembly code. This change is good for spim (a Mips assembly code simulator), which runs Mips assembly code. According to the theory of “System Software”, some pseudo assembly code (especially non-standard ones) cannot be translated by the assembler, and it will break in an assembly code simulator. Run ch\_mips\_llvm3.2\_globalvar\_changes.cpp with llvm 3.1 and 3.2 for mips, and you will find that **".cprestore"** is removed outright, since 3.2 uses another register instead of \$gp in the callee function (for example, it uses \$1 in f() and removes **.cprestore** in sum\_i()). **".cpload"** is replaced with instructions as follows,

```

// llvm 3.1 mips
.cpload $25

// llvm 3.2 mips
    lui $2, %hi(_gp_disp)
    addiu $2, $2, %lo(_gp_disp)
    ...
    addu $gp, $2, $25

```

See <sup>2</sup> for **".cpload"**, **".cprestore"** and **"\_gp\_disp"**.
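The arithmetic behind the `lui`/`addiu`/`addu` replacement sequence can be checked with a short sketch. It assumes the usual Mips convention that `%hi` is rounded to compensate for the sign extension `addiu` applies to its 16-bit immediate. The function names are hypothetical helpers, and the displacement values are arbitrary test inputs, not values taken from a real object file.

```cpp
#include <cstdint>

// %hi as used with lui: upper 16 bits, adjusted by 0x8000 so that the
// sign-extended %lo added afterwards gives back the original value.
uint32_t hi16(uint32_t value) { return ((value + 0x8000u) >> 16) & 0xFFFFu; }

// %lo: lower 16 bits.
uint32_t lo16(uint32_t value) { return value & 0xFFFFu; }

// Emulates "lui $2, hi" followed by "addiu $2, $2, lo" on a 32-bit register.
uint32_t luiAddiu(uint32_t hi, uint32_t lo) {
    uint32_t reg = hi << 16;                         // lui
    uint32_t signExtLo = (lo ^ 0x8000u) - 0x8000u;   // sign-extend 16 -> 32
    return reg + signExtLo;                          // addiu (wraps mod 2^32)
}

// Emulates the whole sequence, ending with "addu $gp, $2, $25".
uint32_t computeGp(uint32_t gpDisp, uint32_t reg25) {
    return luiAddiu(hi16(gpDisp), lo16(gpDisp)) + reg25;
}
```

For any 32-bit displacement, `luiAddiu(hi16(x), lo16(x))` reproduces `x`, so the three instructions leave `$gp` equal to `_gp_disp` plus the value in `$25`.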

---

<sup>2</sup> <http://jonathan2251.github.com/lbd/funccall.html#handle-gp-register-in-pic-addressing-mode>

# APPENDIX C: INSTRUCTIONS DISCUSS

This chapter discusses other backend instructions.

## 14.1 Implicit operand

LLVM IR is in 3-address form (the 4-tuple <opcode, %1, %2, %3>), which matches the current RISC cpu0 (like Mips). So, there seems to be no “move” IR DAG, because “move a, b” can be replaced by “lw a, b\_offset(\$sp)” for a local variable, or by “addu \$a, \$0, \$b”. The cpu0 is the same as Mips in this respect. For this reason, the move instruction is useless even though it is supplied by the cpu0 author.

Old CPUs and microcontrollers (MCUs), like the PIC, the 8051 and old Intel processors, need to save memory and do not target high level languages (such as C) only; they target assembly programming too. These CPUs/MCUs need implicit operands, perhaps using an ACC (accumulator register).

Such a machine translates,

c = a + b + d;

into,

```
mtacc  Addr(12) // Move b To Acc
add    Addr(16) // Add a To Acc
add    Addr(4)  // Add d To Acc
mfacc  Addr(8)  // Move Acc To c
```

The code above can also be written by a programmer who uses assembly language directly in an MCU or BIOS program, since the code size may be just 4KB or less.
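To make the implicit operand concrete, the four-instruction sequence for c = a + b + d can be run on a toy model. This is only a sketch: `AccMachine` and `runSum` are hypothetical names, and memory is modelled as a map keyed by the same Addr offsets as in the listing.

```cpp
#include <map>

// A toy accumulator machine. Every instruction names one memory operand;
// the accumulator is the implicit second operand.
struct AccMachine {
    int acc = 0;
    std::map<int, int> mem;

    void mtacc(int addr) { acc = mem[addr]; }   // Move memory to Acc
    void add(int addr)   { acc += mem[addr]; }  // Add memory to Acc
    void mfacc(int addr) { mem[addr] = acc; }   // Move Acc to memory
};

// Runs the sequence from the listing: b at Addr(12), a at Addr(16),
// d at Addr(4), and the result c at Addr(8).
int runSum(int a, int b, int d) {
    AccMachine m;
    m.mem[16] = a;
    m.mem[12] = b;
    m.mem[4] = d;
    m.mtacc(12);   // Acc = b
    m.add(16);     // Acc += a
    m.add(4);      // Acc += d
    m.mfacc(8);    // c = Acc
    return m.mem[8];
}
```

Each instruction needs only one explicit operand field, which is where the memory saving on small MCUs comes from.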

Since cpu0 is a 32-bit CPU (code size can be up to 4GB), it uses Store and Load instructions for memory access only. Other instructions (including add) use register-to-register operations. We add implicit operand support in this section. It is just a demonstration of this design, not full support. The purpose is to show the reader how to implement a backend for this style of CPU/MCU. Running Chapter8\_4\_2/ with ch\_move.cpp gives the following result,

#### LLVMBackendTutorialExampleCode/InputFiles/ch\_move.cpp

```
int main()
{
    int a = 1;
    int b = 2;
    int c = 0;
    int d = 4;
    int e = 5;

    c = a + b + d + e;

    return 0;
}
```

```
ld  $3, 12($sp) // $3 is a
ld  $4, 16($sp) // $4 is b
mtacc $4          // Move b To Acc
add $3            // Add a To Acc
ld  $4, 4($sp)   // $4 is d
add $4            // Add d To Acc
mfacc $3          // Move Acc to $3
addiu $3, $3, 5  // Add e(=5) to $3
st   $3, 8($sp)
```

To support this implicit operand, ACC, the following code is added to Chapter8\_4\_2/.

### LLVMBackendTutorialExampleCode/Chapter8\_4\_2/Cpu0RegisterInfo.td

```
let Namespace = "Cpu0" in {
    // General Purpose Registers
    def ZERO : Cpu0GPRReg< 0, "ZERO">, DwarfRegNum<[0]>;
    ...
    def ACC : Register<"acc">, DwarfRegNum<[20]>;
}
...
def RACC : RegisterClass<"Cpu0", [i32], 32, (add ACC)>;
```

### LLVMBackendTutorialExampleCode/Chapter8\_4\_2/Cpu0InstrInfo.td

```
class MoveFromACC<bits<8> op, string instr_asm, RegisterClass RC,
    list<Register> UseRegs>:
    FL<op, (outs RC:$ra), (ins),
        !strconcat(instr_asm, "\t$ra"), [], IIAlu> {
    let rb = 0;
    let imm16 = 0;
    let Uses = UseRegs;
    let neverHasSideEffects = 1;
}

class MoveToACC<bits<8> op, string instr_asm, RegisterClass RC,
    list<Register> DefRegs>:
    FL<op, (outs), (ins RC:$ra),
        !strconcat(instr_asm, "\t$ra"), [], IIAlu> {
    let rb = 0;
    let imm16 = 0;
    let Defs = DefRegs;
    let neverHasSideEffects = 1;
}

class ArithLogicUniR2<bits<8> op, string instr_asm, RegisterClass RC1,
    RegisterClass RC2, list<Register> DefRegs>:
    FL<op, (outs), (ins RC1:$accum, RC2:$ra),
        !strconcat(instr_asm, "\t$ra"), [], IIAlu> {
    let rb = 0;
    let imm16 = 0;
    let Defs = DefRegs;
    let neverHasSideEffects = 1;
}

...
//def ADD      : ArithLogicR<0x13, "add", add, IIAlu, CPURegs, 1>;
...
def MFACC : MoveFromACC<0x44, "mfacc", CPURegs, [ACC]>;
def MTACC : MoveToACC<0x45, "mtacc", CPURegs, [ACC]>;
def ADD   : ArithLogicUniR2<0x46, "add", RACC, CPURegs, [ACC]>;
...
def : Pat<(add RACC:$lhs, CPURegs:$rhs),
        (ADD RACC:$lhs, CPURegs:$rhs)>;
def : Pat<(add CPURegs:$lhs, CPURegs:$rhs),
        (ADD (MTACC CPURegs:$lhs), CPURegs:$rhs)>;

```

### LLVMBackendTutorialExampleCode/Chapter8\_4\_2/Cpu0InstrInfo.cpp

```

// Called when DestReg and SrcReg belong to different Register Class.
void Cpu0InstrInfo::
copyPhysReg(MachineBasicBlock &MBB,
            MachineBasicBlock::iterator I, DebugLoc DL,
            unsigned DestReg, unsigned SrcReg,
            bool KillSrc) const {
    unsigned Opc = 0, ZeroReg = 0;

    if (Cpu0::CPURegsRegClass.contains(DestReg)) { // Copy to CPU Reg.
    ...
    else if (SrcReg == Cpu0::ACC)
        Opc = Cpu0::MFACC, SrcReg = 0;
    }
    else if (Cpu0::CPURegsRegClass.contains(SrcReg)) { // Copy from CPU Reg.
    ...
    else if (DestReg == Cpu0::ACC)
        Opc = Cpu0::MTACC, DestReg = 0;
    }
    ...
}

```
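The decision structure of copyPhysReg() above can be sketched as a pure function that picks an opcode from the two register classes. This is an illustration under assumptions: the enums and `selectCopyOpcode` are hypothetical, and `OP_OTHER` is a placeholder for the cases elided with “...” in the listing, whose details are not reproduced here.

```cpp
// Hypothetical register and opcode identifiers for the sketch.
enum Reg { ZERO = 0, R3 = 3, R4 = 4, ACC = 20 };
enum Opcode { OP_NONE, OP_OTHER, OP_MFACC, OP_MTACC };

// In this sketch every register except ACC is a general purpose register.
bool isGpr(Reg r) { return r != ACC; }

// Chooses the copy opcode the way the listing does for the ACC cases.
Opcode selectCopyOpcode(Reg dest, Reg src) {
    if (isGpr(dest)) {              // Copy to CPU Reg.
        if (src == ACC)
            return OP_MFACC;        // Acc -> general purpose register
        return OP_OTHER;            // placeholder for the elided cases
    }
    if (isGpr(src)) {               // Copy from CPU Reg.
        if (dest == ACC)
            return OP_MTACC;        // general purpose register -> Acc
        return OP_OTHER;            // placeholder for the elided cases
    }
    return OP_NONE;                 // no copy instruction selected
}
```

The listing also sets `SrcReg = 0` for MFACC and `DestReg = 0` for MTACC, because ACC is an implicit operand of those instructions and must not be added as an explicit one.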

The generated code is explained as follows,

```

ld $3, 12($sp) // $3 is a
ld $4, 16($sp) // $4 is b

mtacc $4      // Move b To Acc
// After meeting the first a+b IR, it calls this pattern,
// def : Pat<(add CPURegs:$lhs, CPURegs:$rhs),
//           (ADD (MTACC CPURegs:$lhs), CPURegs:$rhs)>;
// After this pattern translation, the DestReg class changes from CPURegs to
// RACC according to the following code of copyPhysReg(). copyPhysReg() is
// called when DestReg and SrcReg belong to different Register Classes.
//
// if (DestReg)
//     MIB.addReg(DestReg, RegState::Define);
//
// if (ZeroReg)
//     MIB.addReg(ZeroReg);
//
// if (SrcReg)
//     MIB.addReg(SrcReg, getKillRegState(KillSrc));

add $3      // Add a To Acc
// Apply this pattern since the DestReg class is RACC
// def : Pat<(add RACC:$lhs, CPURegs:$rhs),
//           (ADD RACC:$lhs, CPURegs:$rhs)>

ld $4, 4($sp) // $4 is d
add $4      // Add d To Acc
// Apply the pattern as above since the DestReg class is RACC

mfacc $3      // Move Acc to $3
// Compiler/backend can use ADDiu since e is 5. But it adds MFACC before ADDiu
// since the DestReg class is RACC. Translate to the CPURegs class by MFACC and
// apply ADDiu since ADDiu uses CPURegs as operands.
addiu $3, $3, 5 // Add e(=5) to $3
st $3, 8($sp)
```

# TODO LIST

---

**Todo**

Add info about LLVM documentation licensing.

---

(The *original entry* is located in /Users/Jonathan/test/lbd/source/about.rst, line 151.)

---

**Todo**

Find information on debugging LLVM within Xcode for Macs.

---

(The *original entry* is located in /Users/Jonathan/test/lbd/source/install.rst, line 36.)

---

**Todo**

Find information on building/debugging LLVM within Eclipse for Linux.

---

(The *original entry* is located in /Users/Jonathan/test/lbd/source/install.rst, line 37.)

---

**Todo**

Fix centering for figure captions.

---

(The *original entry* is located in /Users/Jonathan/test/lbd/source/install.rst, line 46.)

---

**Todo**

I might want to re-edit the following paragraph

---

(The *original entry* is located in /Users/Jonathan/test/lbd/source/llvmstructure.rst, line 679.)

---



# BOOK EXAMPLE CODE

The example code is available in:

<http://jonathan2251.github.com/lbd/LLVMBackendTutorialExampleCode.tar.gz>




# ALTERNATE FORMATS

The book is also available in the following formats: