



## **BANGALORE INSTITUTE OF TECHNOLOGY**

K.R. Road, V.V.Puram, Bengaluru-560 004

### **DEPARTMENT OF COMPUTER SCIENCE & ENGG**

#### **SYSTEM SOFTWARE AND COMPILER DESIGN**

#### **NOTES**

**SUBJECT CODE: 15CS63**

**By**

**Mrs. Hemavathi. P  
Assistant Professor  
Department of CSE**



**SYSTEM SOFTWARE AND COMPILER DESIGN**  
**[As per Choice Based Credit System (CBCS) scheme]**  
**(Effective from the academic year 2016 -2017)**

**SEMESTER – VI**

|                               |        |            |    |
|-------------------------------|--------|------------|----|
| Subject Code                  | 15CS63 | IA Marks   | 20 |
| Number of Lecture Hours/Week  | 4      | Exam Marks | 80 |
| Total Number of Lecture Hours | 50     | Exam Hours | 03 |

**CREDITS – 04**

**Course objectives:** This course will enable students to

- Define System Software such as Assemblers, Loaders, Linkers and Macroprocessors
- Familiarize with source file, object file and executable file structures and libraries
- Describe the front-end and back-end phases of compiler and their importance to students

| <b>Module – 1</b>                                                                                                                                                                                                                                                                                                                                                                                                                                      | <b>Teaching Hours</b> |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------|
| Introduction to System Software, Machine Architecture of SIC and SIC/XE.<br><b>Assemblers:</b> Basic assembler functions, machine dependent assembler features, machine independent assembler features, assembler design options.<br><b>Macroprocessors:</b> Basic macro processor functions,<br><b>Text book 1: Chapter 1: 1.1,1.2,1.3.1,1.3.2, Chapter2 : 2.1-2.4,Chapter4: 4.1.1,4.1.2</b>                                                          | <b>10 Hours</b>       |
| <b>Module – 2</b>                                                                                                                                                                                                                                                                                                                                                                                                                                      |                       |
| <b>Loaders and Linkers:</b> Basic Loader Functions, Machine Dependent Loader Features, Machine Independent Loader Features, Loader Design Options, Implementation Examples.<br><b>Text book 1 : Chapter 3 ,3.1 -3.5</b>                                                                                                                                                                                                                                | <b>10 Hours</b>       |
| <b>Module – 3</b>                                                                                                                                                                                                                                                                                                                                                                                                                                      |                       |
| <b>Introduction:</b> Language Processors, The structure of a compiler, The evolution of programming languages, The science of building a compiler, Applications of compiler technology, Programming language basics<br><b>Lexical Analysis:</b> The role of lexical analyzer, Input buffering, Specifications of token, recognition of tokens, lexical analyzer generator, Finite automata.<br><b>Text book 2:Chapter 1 1.1-1.6 Chapter 3 3.1 – 3.6</b> | <b>10 Hours</b>       |
| <b>Module – 4</b>                                                                                                                                                                                                                                                                                                                                                                                                                                      |                       |
| Syntax Analysis: Introduction, Role Of Parsers, Context Free Grammars, Writing a grammar, Top Down Parsers, Bottom-Up Parsers, Operator-Precedence Parsing<br><b>Text book 2: Chapter 4 4.1 4.2 4.3 4.4 4.5 4.6 Text book 1 : 5.1.3</b>                                                                                                                                                                                                                | <b>10 Hours</b>       |
| <b>Module – 5</b>                                                                                                                                                                                                                                                                                                                                                                                                                                      |                       |
| Syntax Directed Translation, Intermediate code generation, Code generation<br><b>Text book 2: Chapter 5.1, 5.2, 5.3, 6.1, 6.2, 8.1, 8.2</b>                                                                                                                                                                                                                                                                                                            | <b>10 Hours</b>       |
| <b>Course outcomes:</b> The students should be able to:                                                                                                                                                                                                                                                                                                                                                                                                |                       |
| <ul style="list-style-type: none"> <li>• Explain system software such as assemblers, loaders, linkers and macroprocessors</li> <li>• Design and develop lexical analyzers, parsers and code generators</li> <li>• Utilize lex and yacc tools for implementing different concepts of system software</li> </ul>                                                                                                                                         |                       |

**Question paper pattern:**

The question paper will have TEN questions.

There will be TWO questions from each module.

Each question will have questions covering all the topics under a module.

The students will have to answer FIVE full questions, selecting ONE full question from each module.

**Text Books:**

1. System Software by Leland. L. Beck, D Manjula, 3<sup>rd</sup> edition, 2012
2. Compilers-Principles, Techniques and Tools by Alfred V Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman. Pearson, 2<sup>nd</sup> edition, 2007

**Reference Books:**

1. Systems programming – Srimanta Pal , Oxford university press, 2016
2. System programming and Compiler Design, K C Louden, Cengage Learning
3. System software and operating system by D. M. Dhamdhere TMG
4. Compiler Design, K Muneeswaran, Oxford University Press 2013.

**BANGALORE INSTITUTE OF TECHNOLOGY  
K R ROAD, V V PURAM, BENGALURU-04**

**DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING**

**COURSE OBJECTIVES AND OUTCOMES-2015-19**

**Course Title : System Software and Compiler Design                          Course Code : 15CS63**

**No. of Lecture Hrs./Week : 04                          Exam Hours : 03**

**Total No. of Lecture Hrs. : 52                          Exam Marks : 80**

**Prerequisites**

1. Microprocessors and Microcontrollers(15CS44)
2. Automata Theory and Computability (15CS54)

**Course Learning Objectives**

This course will help students to achieve the following objectives:

1. To understand the concepts of System software, Application Software and different hypothetical machine architectures.
2. Familiarize with source file, symbol table creation (pass-1), object file creation (pass-2), loaders and linkers.
3. To know the fundamental concepts of translators.
4. To identify the methods and strategies for parsing techniques.
5. Devise and perform syntax-directed translation schemes for compiler.
6. Devise intermediate code generation schemes and analyze the optimized code generated after the synthesis phase.

**Course Outcomes**

At the end of the course students should be able to:

1. Apply the knowledge of System Software such as Assemblers, Loaders, Linkers and Macro processors to build an application.
2. Understand the basic principles of compiler in high level programming language.
3. Analyze and design the analysis phase using different techniques.
4. Build the system software by associating synthesis phase with analysis phase for better optimization and performance.

## **MODULE-1**

**TEXTBOOK:** System Software by Leland.L. Beck, D.Manjula, 3<sup>rd</sup> Edition, 2012

### **CHAPTER 1: Introduction to System Software and Machine Architecture**

- 1.1 Introduction
- 1.2 System Software and Machine Architecture
- 1.3 The Simplified Instructional Computer (SIC)
  - 1.3.1 SIC Machine Architecture
  - 1.3.2 SIC/XE Machine Architecture
  - 1.3.3 SIC Programming Examples

### **CHAPTER 2: Assemblers**

- 2.1 Basic Assembler Functions
  - 2.1.1 A Simple SIC Assembler
  - 2.1.2 Assembler Algorithm and Data Structures
- 2.2 Machine-Dependent Assembler Features
  - 2.2.1 Instruction Formats and Addressing Modes
  - 2.2.2 Program Relocation
- 2.3 Machine-Independent Assembler Features
  - 2.3.1 Literals
  - 2.3.2 Symbol-Defining Statements
  - 2.3.3 Expressions
  - 2.3.4 Program Blocks
  - 2.3.5 Control Sections and Program Linking
- 2.4 Assembler Design Options
  - 2.4.1 One-Pass Assemblers
  - 2.4.2 Multi-Pass Assemblers

### **CHAPTER 4: Macro Processors**

- 4.1 Basic Macro Processor Functions
  - 4.1.1 Macro Definition and Expansion
  - 4.1.2 Macro Processor Algorithm and Data Structures

## CHAPTER 1

### **Introduction to System Software and Machine Architecture**


The term "software" refers to the set of electronic program instructions or data a computer processor reads in order to perform a task.

"Hardware" refers to the physical components that you can see and touch, such as the computer hard drive, mouse and keyboard.



Fig: Relationship between system software, application software and hardware

Defn: "System software" is a set of programs dedicated to managing the computer itself, such as the operating system and file management utilities. "Application software" is a set of productivity or end-user programs, each written to perform a specific task for the user.

# Difference between system software and Application software

## System software :

- 1. System software is a set of programs dedicated to managing the computer itself (memory management, process management, protection, security).
- 2. It is usually written in a low-level language, i.e. assembly language.
- 3. It starts running when the system is turned on and runs until the system is shut down.
- 4. A system is unable to run without system software.
- 5. System software is general purpose.
- 6. Ex: operating system, assembler, compiler, loader or linker, text editor, debugger, macro processor.
- 7. It is machine dependent: it is closely tied to the machine architecture.

## Application software

- 1. Application software is a set of computer programs designed to let the user perform a group of functions, tasks or activities.
- 2. It is written in a high-level language such as C, C++, Java, Fortran, VB etc.
- 3. It runs as and when the user requests it.
- 4. Application software is not required to run the system, because it is user specific.
- 5. Application software is specific-purpose software.
- 6. Ex: web browser, word processor, spreadsheet, database, Adobe Creative Suite, audio editor suite, games.
- 7. It is not machine dependent.

## 1.2 System software and machine architecture

### • Machine dependency of system software

→ System programs are intended to support the operation and use of the computer, so they depend on its architecture.

→ Machine architectures differ in:

- machine code
- instruction formats
- addressing modes
- registers

### • Machine independency of system software

→ The general design and logic are basically the same on every machine, for example:

- code optimization
- subprogram linking

### 1.3. Simplified Instructional Computer

Different computer systems have different features, and studying each real machine separately is impractical. To avoid this problem we study the Simplified Instructional Computer (SIC).

SIC is a hypothetical computer introduced for teaching system software. Most modern microprocessors include complex features added for efficiency, which makes it difficult to learn systems programming on a real-world system. SIC abstracts these complexities away in favour of an architecture that is clear and accessible for those learning systems programming.

- SIC comes in two versions:

  - the standard model (SIC)

  - the XE version, SIC/XE (XE stands for "Extra Equipment" or "Extra Expensive")

- The two versions are designed to be upward compatible: an object program for the standard SIC also runs correctly on SIC/XE.

## 1.3.1 SIC machine architecture

Every machine architecture includes

- a) memory
- b) Registers
- c) Data formats
- d) Instruction formats
- e) Addressing modes
- f) Instruction set
- g) Input and output

### a) Memory

- Memory consists of 8-bit bytes
- Any 3 consecutive bytes form a word (24 bits)
- All addresses on SIC are byte addresses
- Words are addressed by the location of their lowest-numbered byte
- There are a total of 32,768 ( $2^{15}$ ) bytes in the computer memory
- Hence addresses are 15 bits wide (a 15-line address bus)
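The addressing rules above can be sketched in a few lines of Python. This is only an illustrative model: the names `MEMORY_SIZE`, `read_word` and `write_word` are hypothetical, and the sketch assumes that the high-order byte of a word sits at the lowest-numbered address.

```python
MEMORY_SIZE = 2 ** 15  # 32,768 bytes on the standard SIC

def write_word(memory: bytearray, address: int, value: int) -> None:
    """Store a 24-bit word in the 3 bytes starting at `address`."""
    if not 0 <= address <= MEMORY_SIZE - 3:
        raise ValueError("word does not fit in SIC memory")
    # Assumption: the high-order byte goes to the lowest-numbered address.
    memory[address:address + 3] = (value & 0xFFFFFF).to_bytes(3, "big")

def read_word(memory: bytearray, address: int) -> int:
    """Return the 24-bit word addressed by its lowest-numbered byte."""
    if not 0 <= address <= MEMORY_SIZE - 3:
        raise ValueError("word does not fit in SIC memory")
    return int.from_bytes(memory[address:address + 3], "big")

memory = bytearray(MEMORY_SIZE)
write_word(memory, 0x1000, 5)
print(read_word(memory, 0x1000))
```

A word is always named by one byte address, so consecutive words are 3 addresses apart, which is why the array example later in these notes steps its index by 3.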

### b) Registers

- There are five registers, all of which have special uses
- Each register is 24 bits in length
- The table shows the mnemonic, number and use of each register.

| Mnemonic | Name             | Number | Use                                                                                              |
|----------|------------------|--------|--------------------------------------------------------------------------------------------------|
| A        | Accumulator      | 0      | Used for arithmetic operations                                                                   |
| X        | Index register   | 1      | Used for addressing                                                                              |
| L        | Linkage register | 2      | The Jump to Subroutine (JSUB) instruction stores the return address in this register             |
| PC       | Program counter  | 8      | Contains the address of the next instruction to be fetched for execution                         |
| SW       | Status word      | 9      | Contains a variety of information, including the condition code (CC) set by the COMP instruction |

### c) Data formats

→ Integers are stored as 24-bit binary numbers ( $2^{24} = 16777216$ bit patterns, i.e. 0 to 16777215 when read as unsigned values).

→ Negative values are represented in 2's complement.  
 Ex: $-18H$ in an 8-bit representation

$$\begin{array}{rl}
 18H & = 0001\ 1000 \\
 \text{1's complement} & = 1110\ 0111 \\
 \text{add 1} & = 1110\ 1000 \quad (\text{2's complement})
 \end{array}$$

→ Characters are stored using their 8-bit ASCII codes.  
 → There is no floating-point hardware on the standard version of SIC.

$$\begin{aligned}
 \text{Ex: } 5 &= 0000\ 0000\ 0000\ 0000\ 0000\ 0101 \\
 -5 &= 1111\ 1111\ 1111\ 1111\ 1111\ 1011 \\
 \text{'A'} &= 0100\ 0001 \quad (65)
 \end{aligned}$$
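As a quick check on the 24-bit 2's complement rule, the following Python sketch converts between ordinary integers and SIC words. The helper names `to_word`, `from_word` and `bits` are made up for this illustration and are not part of any SIC tool.

```python
WORD_BITS = 24
MASK = (1 << WORD_BITS) - 1          # 0xFFFFFF
SIGN_BIT = 1 << (WORD_BITS - 1)

def to_word(value: int) -> int:
    """Encode a signed integer as a 24-bit 2's complement word."""
    if not -SIGN_BIT <= value < SIGN_BIT:
        raise ValueError("value does not fit in 24 bits")
    return value & MASK

def from_word(word: int) -> int:
    """Decode a 24-bit word back to a signed integer."""
    return word - (1 << WORD_BITS) if word & SIGN_BIT else word

def bits(word: int) -> str:
    """Format a word as six groups of four bits."""
    text = format(word, "024b")
    return " ".join(text[i:i + 4] for i in range(0, WORD_BITS, 4))

print(bits(to_word(5)))
print(bits(to_word(-5)))
print(format(ord("A"), "08b"), ord("A"))
```

Masking with `0xFFFFFF` is equivalent to taking the 1's complement and adding 1 for negative values, which is why `to_word(-5)` ends in `1011`.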

#### d) Instruction formats

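→ All machine instructions on the standard version of SIC have the same 24-bit format:

| opcode | x     | address |
|--------|-------|---------|
| 8 bits | 1 bit | 15 bits |

→ The flag bit x indicates the indexed addressing mode.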

#### e) Addressing modes

→ There are two addressing modes, selected by the x bit:

- Direct addressing
- Indexed addressing

| Mode    | Indication | Target address (TA)                                                                               |
|---------|------------|---------------------------------------------------------------------------------------------------|
| Direct  | $x=0$      | $TA = \text{address}$                                                                             |
| Indexed | $x=1$      | $TA = \text{address} + (X)$<br>Parentheses indicate the contents of a register or memory location |

→ Direct addressing mode

Ex: LDA TEN ; opcode of LDA = 00

With TEN at address 1000, the target address (effective address) is 1000, i.e. the contents of the word at address 1000 are loaded into the accumulator.

→ Indexed addressing mode

Ex: STCH BUFFER,X ; opcode of STCH = 54

$TA = \text{address} + (X)$

= 1000 + contents of the index register X

The character in the rightmost byte of the accumulator is stored at the target address.
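The target-address rule can be sketched by decoding a 24-bit SIC instruction into its opcode, x bit and address fields. The function names below are hypothetical, and the sketch assumes that the address 1000 in the examples is hexadecimal.

```python
def decode_sic(instruction: int) -> tuple[int, int, int]:
    """Split a 24-bit SIC instruction into (opcode, x, address)."""
    opcode = (instruction >> 16) & 0xFF   # top 8 bits
    x = (instruction >> 15) & 0x1         # indexed-addressing flag
    address = instruction & 0x7FFF        # low 15 bits
    return opcode, x, address

def target_address(instruction: int, x_register: int = 0) -> int:
    """TA = address when x = 0, address + (X) when x = 1."""
    _, x, address = decode_sic(instruction)
    return address + x_register if x else address

lda_ten = 0x001000       # LDA TEN, assuming TEN is at hex 1000
stch_buffer = 0x549000   # STCH BUFFER,X, assuming BUFFER is at hex 1000
print(hex(target_address(lda_ten)))
print(hex(target_address(stch_buffer, x_register=3)))
```

In the second instruction the x bit is the high-order bit of the last two bytes, which is why the address field `1000` shows up as `9000` in the object code.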

### f) Instruction set

(i) Load and store: LDA, LDX, STA, STX

(ii) Integer arithmetic operations: ADD, SUB, MUL, DIV

- All arithmetic operations involve register A and a word in memory, with the result being left in the register.  
Ex: ADD WORD ; A  $\leftarrow$  (A) + (WORD)  
i.e. the contents of register A are added to the word at WORD and the result is stored in register A.

(iii) Comparison operation: COMP

- COMP compares the value in register A with a word in memory and sets the condition code (CC) accordingly.  
Ex: COMP WORD ; compares the contents of A with WORD and sets CC to <, = or >

(iv) Conditional jump instructions: JLT, JEQ, JGT

- These instructions test the setting of CC and jump accordingly.

(v) Subroutine linkage instructions: JSUB, RSUB

- JSUB jumps to the subroutine, placing the return address in register L (function call).
- RSUB returns by jumping to the address contained in register L (function return to the caller).

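Eg: a function call and return in C

```
int Add(int x, int y)
{
    return x + y;   /* Fn return: like RSUB, control goes back to the caller */
}

int main(void)
{
    Add(1, 2);      /* Fn call: like JSUB, the return address is saved first */
    return 0;
}
```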
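The comparison, conditional jump and subroutine linkage instructions can be modelled with a very small Python sketch. The `Cpu` class and the function names are hypothetical, and the model assumes that PC already holds the address of the next instruction when JSUB executes.

```python
from dataclasses import dataclass

def comp(a: int, word: int) -> str:
    """COMP: compare register A with a memory word and return the CC."""
    if a < word:
        return "<"
    if a == word:
        return "="
    return ">"

# Condition code that makes each conditional jump transfer control.
JUMP_CONDITION = {"JLT": "<", "JEQ": "=", "JGT": ">"}

def jump_taken(mnemonic: str, cc: str) -> bool:
    """Return True when the given jump would be taken for this CC."""
    return JUMP_CONDITION[mnemonic] == cc

@dataclass
class Cpu:
    pc: int = 0  # address of the next instruction
    l: int = 0   # linkage register

def jsub(cpu: Cpu, target: int) -> None:
    """JSUB: save the return address in L, then jump."""
    cpu.l = cpu.pc
    cpu.pc = target

def rsub(cpu: Cpu) -> None:
    """RSUB: return to the address held in L."""
    cpu.pc = cpu.l

cpu = Cpu(pc=0x1003)
jsub(cpu, 0x2000)
rsub(cpu)
print(hex(cpu.pc), jump_taken("JLT", comp(3, 300)))
```

Because there is only one linkage register, a nested JSUB overwrites L, so a subroutine that calls another one has to save L first.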



### (g) Input and output

- Input and output are performed by transferring 1 byte at a time to or from the rightmost 8 bits of register A.
- Each device is assigned a unique 8-bit code.
- There are three I/O instructions, each of which specifies the device code as an operand.

(i) TD (Test Device): checks whether the addressed device is ready to send or receive a byte of data. The condition code (CC) in the SW register is set accordingly:

- $<$  → device is ready to send/receive
- $=$  → device is not ready

- A program must wait until the device is ready and then execute a Read Data (RD) or Write Data (WD) instruction.
- This sequence must be repeated for each byte of data to be read or written.

(ii) RD (Read Data): transfers a byte of data from the input device into the rightmost byte of register A (e.g. RD INDEV followed by STCH DATA).

(iii) WD (Write Data): writes the byte held in the rightmost byte of register A to the output device (e.g. LDCH DATA followed by WD OUTDEV).
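The test-then-transfer sequence amounts to a busy-wait loop. The sketch below simulates it in Python with a made-up `InputDevice` class that reports "not ready" for a fixed number of polls; real device timing is of course not modelled.

```python
class InputDevice:
    """Toy device that is busy for `busy_polls` TD tests before each byte."""

    def __init__(self, data: bytes, busy_polls: int = 2) -> None:
        self._data = list(data)
        self._busy_polls = busy_polls
        self._countdown = busy_polls

    def td(self) -> str:
        """TD: return '<' when the device is ready, '=' when it is not."""
        if self._countdown > 0:
            self._countdown -= 1
            return "="
        return "<"

    def rd(self) -> int:
        """RD: hand over one byte and become busy again."""
        self._countdown = self._busy_polls
        return self._data.pop(0)

def read_byte(device: InputDevice) -> int:
    """Poll with TD until ready (the JEQ loop), then RD one byte."""
    while device.td() == "=":
        pass
    return device.rd()

device = InputDevice(b"HI")
print(chr(read_byte(device)), chr(read_byte(device)))
```

The `while` loop plays the role of the `TD` / `JEQ` pair: the program makes no progress until the condition code stops being `=`.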

1) SIC instructions for data movement operations  
(there are no memory-to-memory move instructions)

| Label | Opcode | Operand | Comment                           |
|-------|--------|---------|-----------------------------------|
|       | LDA    | FIVE    | ; load constant 5 into register A |
|       | STA    | ALPHA   | ; store in ALPHA                  |
|       | LDCH   | CHAR2   | ; load character '2' into reg A   |
|       | STCH   | C1      | ; store in character variable C1  |
|       | :      |         |                                   |
| ALPHA | RESW   | 1       | ; one-word variable               |
| FIVE  | WORD   | 5       | ; one-word constant               |
| CHAR2 | BYTE   | C'2'    | ; one-byte constant               |
| C1    | RESB   | 1       | ; one-byte variable               |

The same word move can also be written with the index register or the linkage register in place of the accumulator (A: accumulator, X: index register, L: linkage register):

LDX FIVE  
STX ALPHA

or

LDL FIVE  
STL ALPHA

2) SIC instructions for arithmetic operations

$$\text{BETA} = \text{ALPHA} + \text{INCR} - 1$$

$$\text{DELTA} = \text{GAMMA} + \text{INCR} - 1$$

| Label | Opcode | Operand | Comment                              |
|-------|--------|---------|--------------------------------------|
|       | LDA    | ALPHA   | ; $A \leftarrow (\text{ALPHA})$      |
|       | ADD    | INCR    | ; $A \leftarrow (A) + (\text{INCR})$ |
|       | SUB    | ONE     | ; $A \leftarrow (A) - 1$             |
|       | STA    | BETA    | ; $\text{BETA} \leftarrow (A)$       |
|       | LDA    | GAMMA   | ; $A \leftarrow (\text{GAMMA})$      |
|       | ADD    | INCR    | ; $A \leftarrow (A) + (\text{INCR})$ |
|       | SUB    | ONE     | ; $A \leftarrow (A) - 1$             |
|       | STA    | DELTA   | ; $\text{DELTA} \leftarrow (A)$      |
|       | :      |         |                                      |
| ONE   | WORD   | 1       | ; one-word constant                  |
| ALPHA | RESW   | 1       | ; one-word variables                 |
| BETA  | RESW   | 1       |                                      |
| GAMMA | RESW   | 1       |                                      |
| DELTA | RESW   | 1       |                                      |
| INCR  | RESW   | 1       |                                      |

3) SIC instructions for looping and indexing operations  
(program to copy an 11-byte character string to another string)

The equivalent C logic is:

j=0;

for (i=0; s1[i] != '\0'; i++)

s2[j++] = s1[i];

s2[j] = '\0';

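| Label  | Opcode | Operand        | Comment                                                                               |
|--------|--------|----------------|---------------------------------------------------------------------------------------|
|        | LDX    | ZERO           | ; initialize index register to 0                                                      |
| LOOP   | LDCH   | STR1,X         | ; load a character of STR1 into reg. A (TA = address of STR1 + (X); first pass: byte 0) |
|        | STCH   | STR2,X         | ; store the character into STR2 at the same offset                                    |
|        | TIX    | ELEVEN         | ; add 1 to X and compare with 11; first pass: X = 0 + 1 = 1, 1 < 11, so CC is set to < |
|        | JLT    | LOOP           | ; repeat if index is < 11                                                             |
|        | :      |                |                                                                                       |
| STR1   | BYTE   | C'HELLO WORLD' | ; 11-byte string constant                                                             |
| STR2   | RESB   | 11             | ; 11-byte destination                                                                 |
| ZERO   | WORD   | 0              |                                                                                       |
| ELEVEN | WORD   | 11             |                                                                                       |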
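To see how TIX drives this loop, here is a rough Python trace of the same program. The helper `tix` and the function `copy_string` are hypothetical names, and the sketch mirrors the assembly by testing the condition at the bottom of the loop, so it assumes the string has at least one byte.

```python
def tix(x: int, limit: int) -> tuple[int, str]:
    """TIX: add 1 to X, compare X with the operand, return (X, CC)."""
    x += 1
    if x < limit:
        return x, "<"
    if x == limit:
        return x, "="
    return x, ">"

def copy_string(str1: bytes, length: int) -> bytes:
    """Copy `length` bytes from str1, one character per loop pass."""
    str2 = bytearray(length)      # STR2 RESB 11
    x = 0                         # LDX ZERO
    while True:
        a = str1[x]               # LDCH STR1,X
        str2[x] = a               # STCH STR2,X
        x, cc = tix(x, length)    # TIX ELEVEN
        if cc != "<":             # JLT LOOP falls through
            break
    return bytes(str2)

print(copy_string(b"HELLO WORLD", 11))
```

The loop stops when X reaches 11, that is, after the byte at offset 10 has been copied.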

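4) Program to add two arrays of 100 words each and store the sums in a third array. Each word is 3 bytes, so  
 $100 \text{ words} = 3 \times 100 = 300 \text{ bytes}$   
 $\text{GAMMA}[i] = \text{ALPHA}[i] + \text{BETA}[i]$

| Label   | Opcode | Operand | Comment                                                     |
|---------|--------|---------|-------------------------------------------------------------|
|         | LDA    | ZERO    | ; $A \leftarrow 0$                                          |
|         | STA    | INDEX   | ; INDEX $\leftarrow 0$ (initialize the byte index)          |
| ADDLOOP | LDX    | INDEX   | ; $X \leftarrow (\text{INDEX})$, 0 on the first pass        |
|         | LDA    | ALPHA,X | ; $A \leftarrow$ word of ALPHA at byte offset (X)           |
|         | ADD    | BETA,X  | ; $A \leftarrow (A) +$ word of BETA at byte offset (X)      |
|         | STA    | GAMMA,X | ; word of GAMMA at byte offset (X) $\leftarrow (A)$         |
|         | LDA    | INDEX   | ; $A \leftarrow (\text{INDEX})$, 0 on the first pass        |
|         | ADD    | THREE   | ; $A \leftarrow (A) + 3$, the offset of the next word       |
|         | STA    | INDEX   | ; INDEX $\leftarrow (A)$, 3 after the first pass            |
|         | COMP   | K300    | ; compare (A) with 300; 3 < 300, so CC is set to <          |
|         | JLT    | ADDLOOP | ; repeat the loop; X is then loaded with 3, the next word   |
|         | :      |         |                                                             |
| INDEX   | RESW   | 1       |                                                             |
| ALPHA   | RESW   | 100     |                                                             |
| BETA    | RESW   | 100     |                                                             |
| GAMMA   | RESW   | 100     |                                                             |
| ZERO    | WORD   | 0       |                                                             |
| THREE   | WORD   | 3       |                                                             |
| K300    | WORD   | 300     |                                                             |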
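The point of this program is that the index counts bytes, not array elements. A rough Python equivalent, with the hypothetical names `WORD_SIZE` and `add_arrays`, makes the step of 3 explicit; like the assembly loop it tests at the bottom, so it assumes the arrays are non-empty.

```python
WORD_SIZE = 3  # bytes per SIC word

def add_arrays(alpha: list[int], beta: list[int]) -> list[int]:
    """Add two word arrays element by element using a byte index."""
    gamma = [0] * len(alpha)
    limit = WORD_SIZE * len(alpha)   # K300 when there are 100 words
    index = 0                        # byte index, starts at 0
    while True:
        word = index // WORD_SIZE    # element selected by the byte offset
        # Keep the sum within 24 bits, as a SIC register would.
        gamma[word] = (alpha[word] + beta[word]) & 0xFFFFFF
        index += WORD_SIZE           # ADD THREE / STA INDEX
        if not index < limit:        # COMP K300 / JLT ADDLOOP
            break
    return gamma

print(add_arrays([1, 2, 3], [10, 20, 30]))
```

With 100 words the byte index takes the values 0, 3, 6, ..., 297, and the loop ends when it reaches 300.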

5) Program to read one byte of data from an input device and copy it to an output device

| Label   | Opcode | Operand | Comment                                                 |
|---------|--------|---------|---------------------------------------------------------|
| INLOOP  | TD     | INDEV   | ; test input device                                     |
|         | JEQ    | INLOOP  | ; CC is = (not ready): loop until the device is ready   |
|         | RD     | INDEV   | ; once ready, read a byte into reg A                    |
|         | STCH   | DATA    | ; store it in DATA (memory)                             |
|         | :      |         |                                                         |
| OUTLOOP | TD     | OUTDEV  | ; test output device                                    |
|         | JEQ    | OUTLOOP | ; CC is = (not ready): loop until the device is ready   |
|         | LDCH   | DATA    | ; load data byte into reg A                             |
|         | WD     | OUTDEV  | ; write one byte to output device                       |
|         | :      |         |                                                         |
| INDEV   | BYTE   | X'F1'   | ; input device number                                   |
| OUTDEV  | BYTE   | X'05'   | ; output device number                                  |
| DATA    | RESB   | 1       | ; one-byte variable                                     |

6) Subroutine call to read a 100-byte record from an input device into memory.

| Label  | Opcode | Operand  | Comment                                                                        |
|--------|--------|----------|--------------------------------------------------------------------------------|
|        | JSUB   | READ     | ; call the READ subroutine; the return address is stored in linkage register L |
|        | :      |          |                                                                                |
| READ   | LDX    | ZERO     | ; X ← 0                                                                        |
| RLOOP  | TD     | INDEV    | ; test input device                                                            |
|        | JEQ    | RLOOP    | ; CC is = (not ready): loop until the device is ready                          |
|        | RD     | INDEV    | ; read one byte into reg A                                                     |
|        | STCH   | RECORD,X | ; store it into RECORD at offset (X), offset 0 on the first pass               |
|        | TIX    | K100     | ; X ← (X) + 1, then compare (X) with 100                                       |
|        | JLT    | RLOOP    | ; CC is <: loop back                                                           |
|        | RSUB   |          | ; exit from the subroutine: return to the address stored in linkage register L |
|        | :      |          |                                                                                |
| INDEV  | BYTE   | X'F1'    | ; input device number                                                          |
| RECORD | RESB   | 100      | ; 100-byte buffer for the input record                                         |
| ZERO   | WORD   | 0        |                                                                                |
| K100   | WORD   | 100      |                                                                                |

## 1.3.2 SIC/XE machine architecture

→ SIC/XE: Simplified Instructional Computer with Extra Equipment

- a) Memory
- b) Registers
- c) Data formats
- d) Instruction formats
- e) Addressing modes
- f) Instruction set
- g) Input and output

### a) Memory

→ Memory consists of 8-bit bytes

→ Any 3 consecutive bytes form a word (24 bits)

→ All addresses are byte addresses

→ Words are addressed by the location of their lowest-numbered byte

→ There is a total of 1 MB ( $2^{20}$  bytes) of memory, so addresses are 20 bits wide. This larger address leads to changes in the instruction formats and addressing modes.

### b) Registers

→ There are 9 registers

→ Each register is 24 bits in length, except the floating-point accumulator F, which is 48 bits

→ The registers are A, X, L, B, S, T, F, PC and SW

| Mnemonic | Name                       | Number | Use                                                                      |
|----------|----------------------------|--------|--------------------------------------------------------------------------|
| A        | Accumulator                | 0      | Used for arithmetic operations                                           |
| X        | Index register             | 1      | Used for addressing (indexed)                                            |
| L        | Linkage register           | 2      | Used to store the return address for the JSUB instruction                |
| B        | Base register              | 3      | Used for addressing                                                      |
| S        | General register           | 4      | General working register, no special use                                 |
| T        | General register           | 5      | General working register, no special use                                 |
| F        | Floating-point accumulator | 6      | Floating-point accumulator (48 bits)                                     |
| PC       | Program counter            | 8      | Contains the address of the next instruction to be fetched for execution |
| SW       | Status word                | 9      | Contains a variety of information, including a condition code (CC)       |

### c) Data formats

- Integers are stored as 24-bit binary numbers
- Negative values are represented in 2's complement (1's complement + 1)
- Characters are stored using their 8-bit ASCII codes
- There is a 48-bit floating-point data type:

| s     | exponent | fraction |
|-------|----------|----------|
| 1 bit | 11 bits  | 36 bits  |

- The fraction is interpreted as a value between 0 and 1
- The assumed binary point is immediately before the high-order bit
- For normalized floating-point numbers, the high-order bit of the fraction must be 1
- The exponent is interpreted as an unsigned binary number between 0 and 2047 ( $0$ to $2^{11} - 1$ )
- If the exponent has value e and the fraction has value f, the absolute value of the number represented is

$$f \times 2^{(e-1024)}$$
- The sign of the floating-point number is indicated by s ( $s=0$ is positive and $s=1$ is negative)

Ex:  $5 = 0000\ 0000\ 0000\ 0000\ 0000\ 0101$   
 $-5 = 1111\ 1111\ 1111\ 1111\ 1111\ 1011$

'A' = 0100 0001 (65)

### Ex: Representation of 4.89

From computer organization, a floating-point number is written as $f \times B^{e}$, where f is the fraction (mantissa), B is the base (2) and e is the exponent.

- 1) Represent 4 in binary form  $\rightarrow 100$
- 2) Convert 0.89 into binary form by repeated multiplication by 2, continuing until the pattern repeats or until enough bits are available to fill the 36-bit fraction:

$0.89 \times 2 = 1.78 \rightarrow 1$   
 $0.78 \times 2 = 1.56 \rightarrow 1$   
 $0.56 \times 2 = 1.12 \rightarrow 1$   
 $0.12 \times 2 = 0.24 \rightarrow 0$   
 $0.24 \times 2 = 0.48 \rightarrow 0$   
 $0.48 \times 2 = 0.96 \rightarrow 0$   
 $0.96 \times 2 = 1.92 \rightarrow 1$   
 $0.92 \times 2 = 1.84 \rightarrow 1$   
 $0.84 \times 2 = 1.68 \rightarrow 1$   
 $0.68 \times 2 = 1.36 \rightarrow 1$   
 $0.36 \times 2 = 0.72 \rightarrow 0$   
 $0.72 \times 2 = 1.44 \rightarrow 1$   
 $0.44 \times 2 = 0.88 \rightarrow 0$   
 $0.88 \times 2 = 1.76 \rightarrow 1$   
 $0.76 \times 2 = 1.52 \rightarrow 1$   
 $0.52 \times 2 = 1.04 \rightarrow 1$   
 $0.04 \times 2 = 0.08 \rightarrow 0$   
 $0.08 \times 2 = 0.16 \rightarrow 0$   
 $0.16 \times 2 = 0.32 \rightarrow 0$   
 $0.32 \times 2 = 0.64 \rightarrow 0$   
 $0.64 \times 2 = 1.28 \rightarrow 1$   
 $0.28 \times 2 = 0.56 \rightarrow 0$   
 $0.56 \times 2 = 1.12 \rightarrow 1$   
 $0.12 \times 2 \rightarrow \text{continues as above (the 20 bits from the first 0.12 onwards repeat)}$

$$4.89 = 100.111\ 00011110101110000101\ 00011110101110000101 \ldots$$

- 3) Normalize so that the binary point is immediately before the high-order 1 bit, as the format specifies:

$$100.111000\ldots = 0.\overbrace{100111000111101011100001010001111010}^{\text{fraction (36 bits)}} \times 2^{3}$$

Note: in the more familiar normalized form with one bit before the binary point, the same value is $1.00111000111\ldots \times 2^{2}$; SIC/XE instead places the binary point before the high-order bit, which gives the exponent 3.

- 4) The stored exponent is the true exponent plus 1024:

$$e = 3 + 1024 = 1027, \qquad (1027)_{10} = (10000000011)_2$$

| s (1 bit) | exponent (11 bits) | fraction (36 bits, truncated)        |
|-----------|--------------------|--------------------------------------|
| 0         | 10000000011        | 100111000111101011100001010001111010 |
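The conversion steps for the floating-point format can be checked mechanically. The following Python sketch is one possible encoder, not a reference implementation: the function name `encode_sicxe_float` is hypothetical, the fraction is truncated rather than rounded (an assumption), and exact `Fraction` arithmetic is used so that binary floating-point error on the host does not disturb the bits.

```python
from fractions import Fraction

def encode_sicxe_float(text: str) -> str:
    """Return the 48-bit SIC/XE pattern (s, exponent, fraction) as a bit string."""
    value = Fraction(text)
    if value == 0:
        return "0" * 48
    sign = "1" if value < 0 else "0"
    magnitude = abs(value)
    exponent = 0
    # Normalize so that 1/2 <= magnitude < 1, i.e. 0.1xxx... in binary.
    while magnitude >= 1:
        magnitude /= 2
        exponent += 1
    while magnitude < Fraction(1, 2):
        magnitude *= 2
        exponent -= 1
    stored = exponent + 1024            # biased exponent
    if not 0 <= stored <= 2047:
        raise OverflowError("exponent does not fit in 11 bits")
    fraction = int(magnitude * 2 ** 36)  # truncate to 36 bits
    return sign + format(stored, "011b") + format(fraction, "036b")

pattern = encode_sicxe_float("4.89")
print(pattern[0], pattern[1:12], pattern[12:])
```

Passing the number as a string keeps 4.89 exact; converting a Python `float` first would encode the nearest binary double instead.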

Given $-0.000489$

→ The number is negative, so the sign bit is 1

Represent $0.000489$ as binary by repeated multiplication by 2:

$0.000489 \times 2 \rightarrow 0$  
 $0.000978 \times 2 \rightarrow 0$  
 $0.001956 \times 2 \rightarrow 0$  
 $0.003912 \times 2 \rightarrow 0$  
 $0.007824 \times 2 \rightarrow 0$  
 $0.015648 \times 2 \rightarrow 0$  
 $0.031296 \times 2 \rightarrow 0$  
 $0.062592 \times 2 \rightarrow 0$  
 $0.125184 \times 2 \rightarrow 0$  
 $0.250368 \times 2 \rightarrow 0$  
 $0.500736 \times 2 \rightarrow 1$  
 $0.001472 \times 2 \rightarrow 0$  
 $0.002944 \times 2 \rightarrow \text{continue as above}$

$0.000489 = (0.00000000001000000000110000001111\ldots)_2$

Normalize so that the first bit of the fraction is 1:

$= \underbrace{(0.1000000000110000001111\ldots)_2}_{\text{fraction}} \times 2^{-10}$

Stored exponent $= e + 1024 = -10 + 1024 = 1014$

$(1014)_{10} = (01111110110)_2$

| Sign (1 bit) | Exponent (11 bits) | Fraction (36 bits)                   |
|--------------|--------------------|--------------------------------------|
| 1            | 01111110110        | 100000000011000000111100000001111110 |
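The conversion steps above (sign bit, normalization, exponent biased by 1024, 36 fraction bits obtained by repeated multiplication by 2) can be sketched in Python. This is a minimal sketch, not a definitive implementation: the function name `sicxe_float` is hypothetical, the fraction is truncated (not rounded) to 36 bits, and zero is assumed to be stored as all-zero bits.

```python
from fractions import Fraction


def sicxe_float(text):
    """Return (sign, exponent, fraction) bit strings for a decimal literal."""
    value = Fraction(text)               # exact arithmetic, no binary rounding
    sign = "1" if value < 0 else "0"
    mag = abs(value)
    if mag == 0:
        return sign, "0" * 11, "0" * 36  # assumption: zero is all-zero bits
    exp = 0
    while mag >= 1:                      # shift right until the value is below 1
        mag /= 2
        exp += 1
    while mag < Fraction(1, 2):          # shift left until the first bit is 1
        mag *= 2
        exp -= 1
    bits = []
    for _ in range(36):                  # repeated multiplication by 2
        mag *= 2
        if mag >= 1:
            bits.append("1")
            mag -= 1
        else:
            bits.append("0")
    return sign, format(exp + 1024, "011b"), "".join(bits)


if __name__ == "__main__":
    print(sicxe_float("-0.000489"))
```

Passing the literal as a string keeps the arithmetic exact, which mirrors the hand calculation digit for digit.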

d) Instruction formats

→ SIC/XE has a much larger memory than SIC (1 megabyte, $2^{20}$ bytes), so the 15-bit address field of the SIC instruction format is not enough.

→ There are two possible options:  
(i) use some form of relative addressing (format 3), or  
(ii) extend the address field to 20 bits (format 4).

$\rightarrow$  if  $e=0$, then format 3

$\rightarrow$  if  $e=1$, then format 4

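i) Format 1 (1 byte): 8-bit opcode only

Ex: RSUB, opcode 4C (return from subroutine)  
 $\Rightarrow$ it returns to the address stored in the linkage register (L)

| 0100 | 1100 |
|------|------|
| 4    | C    |

4C $\rightarrow$ opcode byte

(In the standard SIC/XE instruction set RSUB is assembled in format 3 as 4F0000; the true format 1 instructions are FIX, FLOAT, HIO, NORM, SIO and TIO.)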

ii) Format 2 (2 bytes)

Ex: COMPR A,S (compare the contents of registers A and S)

Opcode of COMPR = A0; register numbers: A = 0, S = 4

| opcode (8 bits) | r1 (4 bits) | r2 (4 bits) |
|-----------------|-------------|-------------|
| 1010 0000       | 0000        | 0100        |
| A0              | 0           | 4           |

A004 $\rightarrow$ object code

iii) Format 3 (3 bytes)

Note: $e=0$

Ex: LDA #3 (load the value 3 into A); opcode of LDA = 00

| opcode (6 bits) | n | i | x | b | p | e | disp (12 bits) |
|-----------------|---|---|---|---|---|---|----------------|
| 0000 00         | 0 | 1 | 0 | 0 | 0 | 0 | 0000 0000 0011 |

010003 $\rightarrow$ object code

iv) Format 4 (4 bytes)

Ex: +JSUB RDREC (jump to the subroutine RDREC at address 1036)

Opcode of JSUB = 48

| opcode (6 bits) | n | i | x | b | p | e | address (20 bits)        |
|-----------------|---|---|---|---|---|---|--------------------------|
| 0100 10         | 1 | 1 | 0 | 0 | 0 | 1 | 0000 0001 0000 0011 0110 |

Object code is 4B101036
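The bit layouts of formats 2, 3 and 4 can be checked by packing the fields in Python. This is a sketch under stated assumptions: the helper names `format2` and `format34` are hypothetical, opcodes are passed as the full 8-bit value (the low two bits are replaced by n and i), and no range checking is done on the operands.

```python
def format2(opcode, r1, r2=0):
    """Format 2: 8-bit opcode followed by two 4-bit register numbers."""
    return f"{opcode:02X}{r1:X}{r2:X}"


def format34(opcode, n, i, x, b, p, e, field):
    """Format 3 (e = 0, 12-bit disp) or format 4 (e = 1, 20-bit address)."""
    first_byte = (opcode & 0xFC) | (n << 1) | i   # 6 opcode bits + n, i
    xbpe = (x << 3) | (b << 2) | (p << 1) | e     # one hex digit of flags
    if e:
        address = field & 0xFFFFF                 # 20-bit address field
        return f"{first_byte:02X}{xbpe:X}{address:05X}"
    disp = field & 0xFFF                          # 12-bit displacement
    return f"{first_byte:02X}{xbpe:X}{disp:03X}"


if __name__ == "__main__":
    print(format2(0xA0, 0, 4))                        # COMPR A,S
    print(format34(0x00, 0, 1, 0, 0, 0, 0, 3))        # LDA #3
    print(format34(0x48, 1, 1, 0, 0, 0, 1, 0x1036))   # +JSUB RDREC
```

Masking the displacement with `0xFFF` also gives the 2's complement form when a negative PC-relative displacement is passed in.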

### e) Addressing modes

| MODE                                      | INDICATION                                              | TARGET ADDRESS CALCULATION                                                         |
|-------------------------------------------|---------------------------------------------------------|------------------------------------------------------------------------------------|
| 1. Base relative                          | $b=1, p=0$                                              | $TA = (B) + \text{displacement}$<br>$(0 \leq \text{disp} \leq 4095)$               |
| 2. Program counter relative               | $b=0, p=1$                                              | $TA = (PC) + \text{displacement}$<br>$(-2048 \leq \text{disp} \leq 2047)$          |
| 3. Direct addressing                      | $b=0, p=0$<br>(format 3)<br>$b=0, p=0$<br>(format 4)    | $TA = \text{displacement}$<br>$TA = \text{address field}$                          |
| 4. Base relative indexed addressing       | $b=1, p=0$<br>$x=1$                                     | $TA = (B) + (X) + \text{displacement}$                                             |
| 5. Program counter relative indexed addressing | $b=0, p=1$<br>$x=1$                                | $TA = (PC) + (X) + \text{displacement}$                                            |
| 6. Immediate addressing                   | $i=1, n=0$                                              | Target address itself is used<br>as the operand value<br>(no memory reference)     |
| 7. Indirect addressing                    | $i=0, n=1$                                              | The word at TA holds the address of the operand                                    |
| 8. Simple addressing                      | $i=0, n=0$<br>OR<br>$i=1, n=1$                          | $TA = \text{location of operand}$                                                  |

→ note:

format 3 :

↳ in base relative addressing, disp is interpreted as a 12-bit unsigned integer (1)

↳ in PC relative addressing, disp is interpreted as a 12-bit signed integer, with negative numbers in 2's complement (2)

## special symbols indication

- 1) # : Immediate addressing
- 2) @ : Indirect addressing
- 3) + : Format 4
- 4) \* : The current value of PC
- 5) C'…' : Character string
- 6) op m,X : X denotes the index register
- 7) BASE : Base register

## Instruction set :

- Note: Immediate addressing ( $i=1, n=0$ )  $\rightarrow$  Target address itself is used as the operand value (no memory reference is performed)
- $\rightarrow$  Indirect addressing ( $i=0, n=1$ )  $\rightarrow$  the word at the location given by the target address is fetched and the value contained in this word is then taken as the address of the operand value.
- $\rightarrow$  Indexing cannot be used with immediate or indirect addressing mode.
- \*  $\rightarrow$  $b=1$ and $p=1$ cannot both be set; such an instruction is invalid.

Examples of SIC/XE instructions and addressing modes

$$(B) = 006000$$

$$(PC) = 003000$$

$$(X) = 000090$$

### Machine instructions

| Hex      | op      | n | i | x | b | p | e | disp / address           | Target address | Value loaded into register A | Mode                                                                                                                                      |
|----------|---------|---|---|---|---|---|---|--------------------------|----------------|------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
| 032600   | 0000 00 | 1 | 1 | 0 | 0 | 1 | 0 | 0110 0000 0000           | 3600           | 103000                       | Program counter relative<br>TA = (PC) + disp<br>= 003000 + 600 = 3600                                                                     |
| 03C300   | 0000 00 | 1 | 1 | 1 | 1 | 0 | 0 | 0011 0000 0000           | 6390           | 00C303                       | Base relative indexed<br>TA = (B) + (X) + disp<br>= 006000 + 000090 + 300<br>= 6390                                                       |
| 022030   | 0000 00 | 1 | 0 | 0 | 0 | 1 | 0 | 0000 0011 0000           | 3030           | 103000                       | Indirect + program counter relative<br>TA = (PC) + disp = 003000 + 030 = 3030<br>word at 3030 = 003600 is the operand address; word at 3600 = 103000 |
| 010030   | 0000 00 | 0 | 1 | 0 | 0 | 0 | 0 | 0000 0011 0000           | 30             | 000030                       | Immediate addressing<br>TA is used as the operand value                                                                                   |
| 003600   | 0000 00 | 0 | 0 | 0 | 0 | 1 | 1 | 0110 0000 0000           | 3600           | 103000                       | Direct, SIC-compatible (n = i = 0)<br>b, p and e are part of the 15-bit address<br>TA = 3600                                              |
| 0310C303 | 0000 00 | 1 | 1 | 0 | 0 | 0 | 1 | 0000 1100 0011 0000 0011 | C303           | 003030                       | Format 4, simple (direct) addressing<br>TA = address field = C303                                                                         |

|      |         |
|------|---------|
|      | :       |
|      | :       |
|      | :       |
| 3030 | 003600  |
|      | :       |
|      | :       |
|      | :       |
| 3600 | 103000  |
|      | :       |
|      | :       |
|      | :       |
|      | :       |
| 6390 | 00C303  |
|      | :       |
|      | :       |
|      | :       |
|      | :       |
| C303 | 003030  |
|      | :       |
|      | :       |
|      | :       |

(B) = 006000  
 (PC) = 003000  
 (X) = 000090

Fig: Contents of registers B, PC and X, and of selected memory locations
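The target address rules can be traced mechanically. The sketch below decodes a hex instruction, computes TA from the flag bits and returns the value that would be loaded into register A. It assumes the register values (B) = 006000, (PC) = 003000, (X) = 000090 and the memory words 3030: 003600, 3600: 103000, 6390: 00C303, C303: 003030; the function name `target_and_value` is hypothetical.

```python
B, PC, X = 0x006000, 0x003000, 0x000090
MEMORY = {0x3030: 0x003600, 0x3600: 0x103000,
          0x6390: 0x00C303, 0xC303: 0x003030}


def target_and_value(hex_code):
    """Return (target address, value loaded into A) for a load instruction."""
    first = int(hex_code[:2], 16)            # 6 opcode bits + n, i
    flags = int(hex_code[2], 16)             # x, b, p, e
    rest = int(hex_code[3:], 16)             # 12-bit disp or 20-bit address
    n, i = (first >> 1) & 1, first & 1
    x, b, p, e = (flags >> 3) & 1, (flags >> 2) & 1, (flags >> 1) & 1, flags & 1

    if n == 0 and i == 0:
        # SIC-compatible instruction: b, p, e belong to the 15-bit address
        ta = ((flags & 0b111) << 12) | rest
    elif e:
        ta = rest                            # format 4: 20-bit address field
    elif p:
        disp = rest - 0x1000 if rest >= 0x800 else rest   # signed 12-bit
        ta = PC + disp
    elif b:
        ta = B + rest                        # unsigned 12-bit displacement
    else:
        ta = rest                            # direct addressing
    if x:
        ta += X                              # indexed addressing

    if i == 1 and n == 0:
        value = ta                           # immediate: TA is the operand
    elif n == 1 and i == 0:
        value = MEMORY[MEMORY[ta]]           # indirect: one extra fetch
    else:
        value = MEMORY[ta]                   # simple addressing
    return ta, value


if __name__ == "__main__":
    for code in ("032600", "03C300", "022030", "010030", "003600", "0310C303"):
        ta, value = target_and_value(code)
        print(code, format(ta, "X"), format(value, "06X"))
```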

## 1) Instruction set

- \* → Load and store instructions: LDA, LDX, STA, STX, LDB, STB
- \* → Integer and floating point arithmetic operations:  
ADD, SUB, MUL, DIV, ADDF, SUBF, MULF, DIVF
- \* → Register move instruction (RMO) and register-to-register operations such as ADDR, SUBR, MULR, DIVR
- \* → A special supervisor call (SVC) instruction is provided. Executing this instruction generates an interrupt that can be used for communication with the operating system.
- Comparison instructions: COMP, COMPR, COMPF
- Conditional jump instructions: JLT, JEQ, JGT
- Subroutine linkage instructions: JSUB, RSUB

## 2) Input and output

→ Input and output is performed by transferring 1 byte of data at a time to or from the rightmost 8 bits of register A.

→ Each device is assigned a unique 8-bit code

→ Three I/O instructions which specify the device code as an operand

- (i) TD (Test Device)  $\rightarrow$  Tests whether the addressed device is ready to send or receive a byte of data and sets the condition code (CC)
  - < : Device is ready to send/receive
  - = : Device is not ready

- The test is repeated until the device is ready.
- (ii) RD (Read Data): once the device is ready, transfers one byte from the input device (e.g. a keyboard) into the rightmost byte of register A; the byte is then stored in a buffer if required (RD INDEV followed by STCH DATA)
- (iii) WD (Write Data): a byte of data is loaded into the rightmost byte of register A and then written to the addressed device (LDCH DATA followed by WD OUTDEV)
- \* → On SIC/XE there are I/O channels that can be used to perform input and output while the CPU is executing other instructions. This allows overlap of computing and I/O, resulting in more efficient system operation.
  - ↳ SIO ↔ Start I/O
  - ↳ TIO ↔ Test I/O
  - ↳ HIO ↔ Halt I/O

# SIC and SIC/XE programs

## i) Data movement operations

```

LDA #5 ; load the value 5 into register A
STA ALPHA ; store in ALPHA : ALPHA ← (A)
LDA #90 ; load ASCII code for 'Z' (decimal 90) into A
STCH C1 ; store in character variable C1
;
;
ALPHA RESW 1 ; one-word variable
C1 RESB 1 ; one-byte variable

```

## ii) Arithmetic operations (BETA = ALPHA + INCR − 1)

```

LDS INCR ; load value of inc to S
LDA ALPHA ; load value of alpha to A
ADDR S,A ; A  $\leftarrow$  (A) + (S)
SUB #1 ; A  $\leftarrow$  (A) - 1
STA BETA ; BETA  $\leftarrow$  (A)
;
;
```

## iii) Looping and indexed operations

GAMMA[i] = ALPHA[i] + BETA[i], where ALPHA, BETA and GAMMA  
are arrays of 100 words each  
( $100 \times 3 = 300$  bytes per array)

```

        LDS #3 ; S = 3 (bytes per word)
        LDT #300 ; T = 300 (total bytes per array)
        LDX #0 ; X = 0, the initial index value
ADDLOOP LDA ALPHA,X ; A ← word of ALPHA at offset (X)
        ADD BETA,X ; A ← (A) + word of BETA at offset (X)
        STA GAMMA,X ; word of GAMMA at offset (X) ← (A)
        ADDR S,X ; X ← (X) + (S), i.e. move to the next word
        COMPR X,T ; compare (X) with (T) and set CC
        JLT ADDLOOP ; repeat while (X) < 300

```

|       |      |     |
|-------|------|-----|
| ALPHA | RESW | 100 |
| BETA  | RESW | 100 |
| GAMMA | RESW | 100 |
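The byte-offset indexing of this loop can be mirrored in Python to see why X advances by 3 and why the loop stops at 300. This is only a sketch: the function name `add_arrays` is hypothetical, Python lists stand in for the RESW storage, and the loop body runs at least once, just as the assembly loop does.

```python
WORD = 3  # bytes per SIC/XE word


def add_arrays(alpha, beta):
    """Mirror of the register loop: GAMMA[i] = ALPHA[i] + BETA[i]."""
    s, t, x = WORD, WORD * len(alpha), 0     # LDS #3 / LDT #300 / LDX #0
    gamma = [0] * len(alpha)
    while True:
        a = alpha[x // WORD]                 # LDA ALPHA,X
        a += beta[x // WORD]                 # ADD BETA,X
        gamma[x // WORD] = a                 # STA GAMMA,X
        x += s                               # ADDR S,X
        if not x < t:                        # COMPR X,T / JLT ADDLOOP
            break
    return gamma


if __name__ == "__main__":
    print(add_arrays([1, 2, 3], [10, 20, 30]))
```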

- 5) To read one byte of data from input device F1 and copy it to device 05

|         |      |         |
|---------|------|---------|
| INLOOP  | TD   | INDEV   |
|         | JEQ  | INLOOP  |
|         | RD   | INDEV   |
|         | STCH | DATA    |
|         | :    |         |
| OUTLOOP | TD   | OUTDEV  |
|         | JEQ  | OUTLOOP |
|         | LDCH | DATA    |
|         | WD   | OUTDEV  |
|         | :    |         |
| INDEV   | BYTE | X'F1'   |
| OUTDEV  | BYTE | X'05'   |
| DATA    | RESB | 1       |

- 6) Subroutine to read a 100-byte record from an input device into memory

```
        JSUB    READ
        :
READ    LDX     #0
        LDT     #100
RLOOP   TD      INDEV
        JEQ     RLOOP
        RD      INDEV
        STCH    RECORD,X
        TIXR    T
        JLT     RLOOP
        RSUB
;
;
INDEV   BYTE    X'F1'
RECORD  RESB    100
```

Write a SIC and SIC/XE program to copy 'SYSTEM SOFTWARE' to another string

a) SIC program

```

        LDX     ZERO
LOOP    LDCH    STR1,X
        STCH    STR2,X
        TIX     FIFTEEN
        JLT     LOOP
;
;
STR1    BYTE    C'SYSTEM SOFTWARE'
STR2    RESB    15
ZERO    WORD    0
FIFTEEN WORD    15

```

b) SIC/XE program

```

        LDX     #0
        LDT     #15
LOOP    LDCH    STR1,X
        STCH    STR2,X
        TIXR    T
        JLT     LOOP
;
;
STR1    BYTE    C'SYSTEM SOFTWARE'
STR2    RESB    15

```

### Exercises 1.3

1. Write a sequence of instructions for SIC to set ALPHA equal to the product of BETA and GAMMA. Assume ALPHA, BETA and GAMMA are 1 word each ( $\text{ALPHA} = \text{BETA} * \text{GAMMA}$ )

```

LDA    BETA
MUL    GAMMA
STA    ALPHA
:
ALPHA  RESW 1
BETA   RESW 1
GAMMA  RESW 1

```

2. Write a sequence of instructions for SIC/XE to set ALPHA equal to  $4 * \text{BETA} - 9$. ALPHA and BETA are 1 word each. Use immediate addressing for the constants ( $\text{ALPHA} = 4 * \text{BETA} - 9$ )

```

LDA    BETA
LDS    #4
MULR   S,A
SUB    #9
STA    ALPHA
:
ALPHA  RESW 1
BETA   RESW 1

```

3. Write SIC instructions to swap the values of ALPHA and BETA.

```
LDA ALPHA  
STA GAMMA  
LDA BETA  
STA ALPHA  
LDA GAMMA  
STA BETA  
;  
ALPHA RESW 1  
BETA RESW 1  
GAMMA RESW 1
```

4. Write a sequence of instructions for SIC to set ALPHA equal to the integer portion of BETA ÷ GAMMA. ALPHA, BETA and GAMMA are 1 word each.

```
LDA BETA  
DIV GAMMA  
STA ALPHA  
;  
ALPHA RESW 1  
BETA RESW 1  
GAMMA RESW 1
```

5. Write a sequence of instructions for SIC/XE to divide BETA by GAMMA, setting ALPHA to the integer portion of the quotient and DELTA to the remainder. Use register-to-register instructions to make the calculation as efficient as possible.

| Instruction   | Comment (Ex: BETA = 5, GAMMA = 2) |
|---------------|-----------------------------------|
| LDA    BETA   | ; A = 5                           |
| LDS    GAMMA  | ; S = 2                           |
| DIVR   S,A    | ; A = A/S = 5/2 = 2               |
| STA    ALPHA  | ; ALPHA = 2                       |
| MULR   S,A    | ; A = A*S = 2*2 = 4               |
| LDS    BETA   | ; S = 5                           |
| SUBR   A,S    | ; S = S - A = 5 - 4 = 1           |
| STS    DELTA  | ; DELTA = 1                       |
| :             |                                   |
| ALPHA  RESW 1 |                                   |
| BETA   RESW 1 |                                   |
| GAMMA  RESW 1 |                                   |
| DELTA  RESW 1 |                                   |

Note:

// To find the remainder

$$\text{Quotient} = \text{Dividend} / \text{Divisor}$$

$$\text{Remainder} = \text{Dividend} - (\text{Quotient} * \text{Divisor})$$

$$\text{Ex:- Dividend} = 10, \text{ Divisor} = 3, \quad Q = 10/3 = 3; \quad R = 10 - (3 * 3) = 10 - 9 = 1$$

$$\text{Dividend} = 15, \text{ Divisor} = 3, \quad Q = 15/3 = 5; \quad R = 15 - (5 * 3) = 15 - 15 = 0$$
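The remainder rule above maps directly onto the register sequence LDA, LDS, DIVR, STA, MULR, LDS, SUBR, STS. The Python sketch below follows the registers step by step; the function name `divide` is hypothetical and non-negative operands are assumed, so that Python's floor division matches integer division.

```python
def divide(beta, gamma):
    """Return (ALPHA, DELTA) = (quotient, remainder) of BETA / GAMMA."""
    a = beta            # LDA  BETA
    s = gamma           # LDS  GAMMA
    a = a // s          # DIVR S,A   A <- integer part of A / S
    alpha = a           # STA  ALPHA
    a = a * s           # MULR S,A   A <- quotient * divisor
    s = beta            # LDS  BETA
    s = s - a           # SUBR A,S   S <- dividend - quotient * divisor
    delta = s           # STS  DELTA
    return alpha, delta


if __name__ == "__main__":
    print(divide(10, 3))
    print(divide(15, 3))
```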

6. Write a sequence of instructions for SIC/XE to divide BETA by GAMMA, setting ALPHA to the value of the quotient, rounded to the nearest integer. Use register-to-register instructions to make the calculation as efficient as possible.

```
LDF    BETA
DIVF    GAMMA
FIX
STA    ALPHA
;
ALPHA  RESW 1
BETA   RESW 1
GAMMA  RESW 1
```

7. Write a sequence of instructions for SIC to clear a 20-byte string to all blanks.

```
       LDX     ZERO        ; X = 0
LOOP   LDCH    BLANK       ; A ← blank character
       STCH    STR1,X      ; store the blank at STR1 + (X)
       TIX     TWENTY      ; add 1 to index and compare with 20
       JLT     LOOP        ; loop if index < 20
;
STR1   RESB    20
BLANK  BYTE    C' '
ZERO   WORD    0
TWENTY WORD    20
```

8. Write a sequence of instructions for SIC/XE to clear a 20-byte string to all blanks. Use immediate addressing and register-to-register instructions to make the process as efficient as possible.

```

      LDT #20
      LDX #0
      LDA #32 ; ASCII code for blank (hex 20)
CLOOP STCH STR1,X
      TIXR T
      JLT CLOOP
      :
STR1  RESB 20

```

9. Suppose that ALPHA is an array of 100 words. Write a sequence of SIC instructions to set all 100 elements of the array to 0.

|       |      |         |
|-------|------|---------|
|       | LDA  | ZERO    |
|       | STA  | INDEX   |
| LOOP  | LDX  | INDEX   |
|       | LDA  | ZERO    |
|       | STA  | ALPHA,X |
|       | LDA  | INDEX   |
|       | ADD  | THREE   |
|       | STA  | INDEX   |
|       | COMP | K300    |
|       | JLT  | LOOP    |
|       | :    |         |
| INDEX | RESW | 1       |
| ALPHA | RESW | 100     |
| ZERO  | WORD | 0       |
| THREE | WORD | 3       |
| K300  | WORD | 300     |

10. ALPHA is an array of 100 words. Write a sequence of instructions for SIC/XE to set all 100 elements of the array to 0. Use immediate addressing and register-to-register instructions to make the program as efficient as possible.

```
        LDS #3
        LDT #300
        LDX #0
LOOP    LDA #0
        STA ALPHA,X
        ADDR S,X
        COMPR X,T
        JLT LOOP
        :
ALPHA   RESW 100
```

11. ALPHA is an array of 100 words. Write a sequence of SIC/XE instructions to arrange the 100 words in ascending order and store the result in an array BETA of 100 words.

```
LDS #3  
LDT #300  
LDX #0  
Loop LDA ALPHA, X  
      MUL #4
```

12. ALPHA and BETA are two arrays of 100 words each.  
 The elements of another array GAMMA are obtained by multiplying the corresponding ALPHA element by 4 and adding the corresponding BETA element. Write SIC/XE instructions for the same.

```

        LDS #3
        LDT #300
        LDX #0
LOOP    LDA ALPHA,X
        MUL #4
        ADD BETA,X
        STA GAMMA,X
        ADDR S,X ; X ← (X) + 3
        COMPR X,T ; compare (X) with 300
        JLT LOOP
        :
ALPHA   RESW 100
BETA    RESW 100
GAMMA   RESW 100
  
```

13. ALPHA is an array of 100 words. Write a sequence of SIC/XE instructions to find the maximum element in the array and store the result in MAX.

```

        LDS #3
        LDT #300
        LDX #0
LOOP    LDA ALPHA,X
        COMP MAX
        JLT NOMAX
        STA MAX
NOMAX   ADDR S,X
        COMPR X,T
        JLT LOOP
        :
ALPHA   RESW 100
MAX     WORD -32768

```

Note: COMP MAX  $\Rightarrow$  the accumulator value is compared with MAX and the condition code (CC) is set to <, = or > according to (A) : (MAX). The jump instructions JLT, JEQ and JGT then test the CC.

$\rightarrow$  COMPR X,T  $\Rightarrow$  the register values are compared, (X) : (T), the CC is set to <, = or >, and the jump instruction that follows tests it.

## Explanation

$\text{ALPHA} = \{10, 20, 30, 40, \dots, 32768, \dots\}$

| Element   | 10 | 20 | 30 | … |
|-----------|----|----|----|---|
| Index (X) | 0  | 3  | 6  | … |

Each value is 1 word = 3 bytes

100 words =  $3 \times 100 = 300$  bytes

The index has to be incremented by 3,

i.e.  $x = 0, 3, 6, 9, \dots, 297$ ; the loop ends when  $x$  reaches 300.

According to the code:  $S = 3, T = 300, X = 0$, and MAX is initialised to  $-32768$

1st iteration, LOOP : accumulator (A) = 10, the element at offset 0

COMP MAX ;  $10 > -32768$, sets CC: >

JLT NOMAX ; not taken

STA MAX ; MAX ← 10

NOMAX : ADDR S,X ;  $X \leftarrow (X) + (S) = 0 + 3 = 3 \rightarrow$ increment by 3 (next element)

The next element is 20

COMPR X,T ;  $(X) : (T)$, $3 < 300$, sets CC: <

JLT LOOP ; taken

This checks whether the array index has reached the end of the array.

## Iteration

a) LOOP :  $A \leftarrow 20$, the element at offset 3

LDA ALPHA,X  $\rightarrow$  value at offset 3

COMP MAX ;  $20 > 10 \rightarrow CC: >$

JLT NOMAX ; not taken, so STA MAX sets MAX ← 20

:

Contd...
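The whole trace can be reproduced with a small Python mirror of the loop. This is a sketch under assumptions: the function name `find_max` is hypothetical, a Python list stands in for ALPHA, and MAX starts at -32768 as in the program, so only elements at or above that value can replace it.

```python
def find_max(alpha):
    """Mirror of the LOOP / NOMAX structure with X stepping by 3 bytes."""
    s, t, x = 3, 3 * len(alpha), 0       # LDS #3 / LDT #300 / LDX #0
    max_value = -32768                   # MAX WORD -32768
    while True:
        a = alpha[x // 3]                # LDA ALPHA,X
        if not a < max_value:            # COMP MAX / JLT NOMAX (skip if A < MAX)
            max_value = a                # STA MAX
        x += s                           # NOMAX ADDR S,X
        if not x < t:                    # COMPR X,T / JLT LOOP
            break
    return max_value


if __name__ == "__main__":
    print(find_max([10, 20, 30, 40]))
```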

14. RECORD contains a 100-byte record. Write a subroutine for SIC that will write this record onto device 05.

```
        JSUB    WRREC
;
WRREC   LDX     ZERO            ; initialize index register to 0
WLOOP   TD      OUTPUT          ; test output device
        JEQ     WLOOP           ; loop if device is busy
        LDCH    RECORD,X        ; load one byte into the accumulator
        WD      OUTPUT          ; write one byte to the device
        TIX     LENGTH          ; add 1 to index and compare with 100
        JLT     WLOOP           ; loop if index < 100
        RSUB                    ; exit from subroutine
;
ZERO    WORD    0
LENGTH  WORD    100
OUTPUT  BYTE    X'05'
RECORD  RESB    100
```

Note: To read or write data, the device has to be ready. This is checked with the TD (Test Device) instruction: the status of the device is tested and the CC is set to < (ready) or = (not ready). If the device is ready, RD is executed  $\Rightarrow$  it reads 1 byte of data from the device into the rightmost byte of register A. If the input device is character-oriented (e.g. a keyboard), the value placed in register A is the ASCII code of the character that was read. For WD, the byte is first loaded into the rightmost byte of register A and then written to the device.

15. write a subroutine for SIC/XE to write a RECORD of 100 bytes onto output device 05.

```

        JSUB    WRREC           ; jump to subroutine
        :
WRREC   LDX     #0
        LDT     #100
LOOP    TD      OUTPUT
        JEQ     LOOP
        LDCH    RECORD,X
        WD      OUTPUT
        TIXR    T
        JLT     LOOP
        RSUB
;
OUTPUT  BYTE    X'05'
RECORD  RESB    100
;
```

16. write a subroutine for SIC that will read a record into a buffer. The record may be any length from 1 to 100 bytes. The end of the record is marked with a "null" character (ASCII code 00). The subroutine should place the length of the record read into a variable named LENGTH.

```
        JSUB    RDREC
;
RDREC   LDX     ZERO
        LDA     ZERO            ; clear A, since RD changes only its rightmost byte
RLOOP   TD      INDEV
        JEQ     RLOOP
        RD      INDEV
        COMP    NULL            ; end-of-record marker?
        JEQ     EXIT
        STCH    BUFFER,X
        TIX     K100
        JLT     RLOOP
EXIT    STX     LENGTH
        RSUB
        :
ZERO    WORD    0
NULL    WORD    0
K100    WORD    100
INDEV   BYTE    X'F1'
LENGTH  RESW    1
BUFFER  RESB    100
```

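17. SIC/XE version of the same subroutine:

| Label  | Opcode | Operand  | Comment                                         |
|--------|--------|----------|-------------------------------------------------|
|        | JSUB   | RDREC    |                                                 |
|        | :      |          |                                                 |
| RDREC  | LDX    | #0       |                                                 |
|        | LDT    | #100     |                                                 |
|        | LDS    | #0       | S holds the null character                      |
|        | LDA    | #0       | clear A, since RD changes only its rightmost byte |
| RLOOP  | TD     | INDEV    |                                                 |
|        | JEQ    | RLOOP    |                                                 |
|        | RD     | INDEV    |                                                 |
|        | COMPR  | A,S      | end-of-record marker?                           |
|        | JEQ    | EXIT     |                                                 |
|        | STCH   | BUFFER,X |                                                 |
|        | TIXR   | T        |                                                 |
|        | JLT    | RLOOP    |                                                 |
| EXIT   | STX    | LENGTH   |                                                 |
|        | RSUB   |          |                                                 |
|        | :      |          |                                                 |
| INDEV  | BYTE   | X'F1'    |                                                 |
| LENGTH | RESW   | 1        |                                                 |
| BUFFER | RESB   | 100      |                                                 |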

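11. To sort an array of 10 words in ascending order (SIC/XE):

| Label  | Opcode | Operand | Comment                                   |
|--------|--------|---------|-------------------------------------------|
| OUTER  | LDX    | INDEX   | X ← offset of the outer element           |
|        | LDS    | ARR1,X  | S ← outer element                         |
|        | LDX    | #0      | start the inner scan at offset 0          |
| INNER  | LDT    | ARR1,X  | T ← inner element                         |
|        | COMPR  | S,T     | compare (S) with (T)                      |
|        | JGT    | LOOP    | no swap if (S) > (T)                      |
|        | JEQ    | LOOP    | no swap if (S) = (T)                      |
|        | RMO    | S,A     | swap S and T through A                    |
|        | RMO    | T,S     |                                           |
|        | RMO    | A,T     |                                           |
|        | RMO    | X,A     | save the inner offset in A                |
|        | LDX    | INDEX   |                                           |
|        | STS    | ARR1,X  | store S in the outer position             |
|        | RMO    | A,X     | restore the inner offset                  |
|        | STT    | ARR1,X  | store T in the inner position             |
| LOOP   | RMO    | X,A     |                                           |
|        | ADD    | #3      | next inner offset                         |
|        | COMP   | LENGTH  |                                           |
|        | RMO    | A,X     |                                           |
|        | JLT    | INNER   |                                           |
|        | LDA    | INDEX   |                                           |
|        | ADD    | #3      | next outer offset                         |
|        | COMP   | LENGTH  |                                           |
|        | STA    | INDEX   |                                           |
|        | JLT    | OUTER   |                                           |
| INDEX  | WORD   | 0       |                                           |
| ARR1   | RESW   | 10      |                                           |
| LENGTH | WORD   | 30      |                                           |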
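A quick way to check the branch logic of a two-loop exchange sort like this one is to mirror it in Python. In the sketch below the outer element is held in `s` (register S), every element is scanned in turn as `t` (register T), and the two are swapped when `s < t`; with that condition the array ends up in ascending order. The function name `two_loop_sort` and the choice of swap condition are assumptions of this sketch.

```python
def two_loop_sort(values):
    """Exchange sort with both loops running over the whole array."""
    arr = list(values)
    n = len(arr)
    for i in range(n):               # OUTER: INDEX = 3 * i
        s = arr[i]                   # LDS ARR1,X
        for j in range(n):           # INNER: X = 3 * j
            t = arr[j]               # LDT ARR1,X
            if s < t:                # COMPR S,T: swap only when (S) < (T)
                s, t = t, s          # RMO S,A / RMO T,S / RMO A,T
                arr[i] = s           # STS ARR1,X with X = INDEX
                arr[j] = t           # STT ARR1,X with X = inner offset
    return arr


if __name__ == "__main__":
    print(two_loop_sort([3, 1, 2]))
```

Swapping on the opposite condition (`s > t`) with the same loops leaves the array in descending order, so the direction of the conditional jump after COMPR decides the sort order.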

QP

1. Write a SIC program to copy string 'SYSTEM SOFTWARE' to another string.

LDX ZERO ; initialize X to 0  
MOVCH LDCH STR1,X ; X specifies indexing  
STCH STR2,X  
TIX FIFTEEN ; increment X and compare with 15  
JLT MOVCH  
  
STR1 BYTE C'SYSTEM SOFTWARE'  
STR2 RESB 15  
ZERO WORD 0  
FIFTEEN WORD 15

# Comparison Chart of SIC and SIC/XE machine

| Specification              | SIC                                                                                                                                                                                                                                                                                                                                                                                                                              | SIC/XE                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
|----------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <b>Memory</b>              | <ul style="list-style-type: none"> <li>• Word size: 3 bytes (24 bits)</li> <li>• Total size: 32,768 bytes (2<sup>15</sup>). Thus any memory address needs at most 15 bits (four hex characters).</li> </ul> | <ul style="list-style-type: none"> <li>• Word size: 3 bytes (24 bits)</li> <li>• Total size: 1 megabyte (1,048,576 bytes, 2<sup>20</sup>). Thus any memory address needs at most 20 bits (five hex characters).</li> </ul> |
| <b>Register</b>            | <ul style="list-style-type: none"> <li>• Total Registers: 5</li> <li>• Accumulator (A): Used for most of the operations (number 0)</li> <li>• Index (X): Used for indexed addressing (number 1)</li> <li>• Linkage (L): Stores return addresses for JSUB (number 2)</li> <li>• Program Counter (PC): Address for next instruction (number 8)</li> <li>• Status Word (SW): Information and condition codes (number 9).</li> </ul> | <ul style="list-style-type: none"> <li>• Total Registers: 9, same 5 from SIC plus 4 additional ones.</li> <li>• Base (B): Used for base-relative addressing (number 3)</li> <li>• General (S and T): General use (numbers 4 and 5 resp.)</li> <li>• Floating Point Accumulator (F): Used for floating point arithmetic, 48 bits long (number 6)</li> </ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| <b>Instruction Formats</b> | <ul style="list-style-type: none"> <li>• Only one instruction format of 24 bits [3 bytes / 1 word]</li> <li>• Opcode: first 8 bits, direct translation from the Operation Code Table</li> <li>• Flag (X): next bit indicates address mode (0 direct - 1 indexed)</li> <li>• Address: next 15 bits, indicate address of operand according to address mode.</li> </ul>                                                             | <ul style="list-style-type: none"> <li>• Four instruction formats</li> <li>• Format 1 (1 byte): contains only operation code [straight from table]</li> <li>• Format 2 (2 bytes): first eight bits for operation code, next four for register 1 and following four for register 2.</li> <li>• The numbers for the registers go according to the numbers indicated at the registers section [ie, register T is replaced by hex 5].</li> <li>• If the operation uses only one register the last hex digit becomes '\0' [ie, TIXR T becomes B850]</li> <li>• Format 3 (3 bytes): First 6 bits contain operation code, next 6 bits contain flags, last 12 bits contain displacement for the address of the operand.</li> <li>• Operation code uses only 6 bits, thus the second hex digit will be affected by the values of the first two flags (n and i)</li> <li>• The flags, in order, are: n, i, x, b, p, and e. Its functionality is explained in the next section.</li> <li>• The last flag e indicates the instruction format (0 for 3 and 1 for 4)</li> <li>• Format 4 (4 bytes): same as format 3 with an extra 2 hex digits (8 bits) for addresses that require more than 12 bits to be represented</li> </ul> |

| Specification | SIC | SIC/XE |
|---------------|-----|--------|
| Addressing Modes | <ul style="list-style-type: none"> <li>• Only two possible addressing modes</li> <li>• <b>Direct</b> (<math>x = 0</math>): the operand address is used as it is</li> <li>• <b>Indexed</b> (<math>x = 1</math>): the operand address is added to the value stored in register <math>X</math> to obtain the real address of the operand</li> </ul> | <ul style="list-style-type: none"> <li>• Five possible addressing modes plus combinations (see page 11 for examples)</li> <li>• <b>Direct</b> (<math>x, b</math>, and <math>p</math> all set to 0): the operand address is used as it is; <math>n</math> and <math>i</math> are both set to the same value, either 0 or 1. In general that value is 1; if it is 0 in format 3, the flags <math>b, p</math>, and <math>e</math> are treated as part of the operand address, which makes the format compatible with the SIC format</li> <li>• <b>Relative</b> (either <math>b</math> or <math>p</math> equal to 1 and the other one to 0): the displacement in the instruction is added to the current value of the B register (<math>b = 1</math>) or of the PC register (<math>p = 1</math>)</li> <li>• <b>Immediate</b> (<math>i = 1, n = 0</math>): the operand value is contained in the instruction itself (i.e. it lies in the last 12/20 bits of the instruction)</li> <li>• <b>Indirect</b> (<math>i = 0, n = 1</math>): the operand field points to an address that holds the address of the operand value</li> <li>• <b>Indexed</b> (<math>x = 1</math>): the value stored in register <math>X</math> is added to obtain the real address of the operand. This can be combined with any of the previous modes except immediate and indirect.</li> </ul> |
| Assembler Considerations | <ul style="list-style-type: none"> <li>• Operation code gets translated directly from table (no need to check other bits)</li> <li>• The <math>x</math> bit depends on the addressing mode of the operand. If indexed, the source code indicates it with ',X' after the operand name (e.g. BUFFER,X)</li> <li>• The last 3 hex digits of the address remain the same; the first (leftmost) hex digit changes if the address is indexed (the first bit becomes one, so the hex digit increases by 8). E.g., if the address of the operand is 124A and the addressing is indexed, the object code will indicate 924A.</li> </ul> | <ul style="list-style-type: none"> <li>• Operation code gets translated directly from table. While the first hex digit remains the same, the second one can change according to the values of the <math>n</math> and <math>i</math> flags. Thus, 1, 2 or 3 is added to the operation code.</li> <li>• Direct addressing is mainly used in extended format (format 4), which is indicated with a '+' before the operation code (e.g. +JSUB) and sets the <math>e</math> flag to 1.</li> <li>• <b>Relative:</b> for Base relative, the BASE directive must precede the current instruction.</li> <li>• Any other instruction, except immediate, is first treated as Program Counter relative. If the displacement with respect to the PC does not fit into the 12 bits, the assembler should try to compute the displacement with respect to the Base register. If neither case works, the instruction should be extended to format 4, where the addressing mode becomes direct.</li> </ul> |
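
The SIC rule in the table (setting the x bit raises the leftmost address digit by 8) can be checked with a minimal Python sketch. The function name `sic_object_code` is hypothetical, and opcode 54 (STCH) is used only as a sample; the layout assumed is the standard SIC word of an 8-bit opcode, one x bit and a 15-bit address.

```python
def sic_object_code(opcode: int, address: int, indexed: bool = False) -> str:
    """Pack a standard SIC instruction: 8-bit opcode, x bit, 15-bit address."""
    if not 0 <= address < 2 ** 15:
        raise ValueError("a SIC address must fit in 15 bits")
    x_bit = 0x8000 if indexed else 0      # x is the top bit of the address half
    word = (opcode << 16) | x_bit | address
    return f"{word:06X}"


print(sic_object_code(0x54, 0x124A))         # direct:  54124A
print(sic_object_code(0x54, 0x124A, True))   # indexed: 54924A
```

Only the leftmost address digit differs between the two results (1 becomes 9), as the table describes.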

**SIC/XE**

- **Immediate addressing** is indicated by the use of '\#' before the operand name/value (e.g. #1)
- **Indirect addressing** is indicated by adding the prefix '\@' to the operand name (e.g. @RETADR)
- **Indexed addressing** is indicated the same way as for the SIC machine, with ',X' after the operand name (e.g. BUFFER,X)
- The hex digits of the address are not affected by the content of the flags: the first two flags (n, i) affect the second digit of the operation code, and the following four (x, b, p, e) make up a hex digit of their own.
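
As a rough illustration of the notation rules above, the following Python sketch maps an operand string to the n, i and x flags and shows how n and i change the second hex digit of the opcode byte. The helper names `parse_operand` and `opcode_byte` are hypothetical, not part of any real assembler.

```python
def parse_operand(operand: str) -> dict:
    """Derive the n, i and x flag bits from SIC/XE operand notation."""
    n, i, x = 1, 1, 0                  # simple addressing: n = i = 1
    if operand.startswith("#"):        # immediate
        n, i = 0, 1
        operand = operand[1:]
    elif operand.startswith("@"):      # indirect
        n, i = 1, 0
        operand = operand[1:]
    if operand.upper().endswith(",X"): # indexed
        x = 1
        operand = operand[:-2]
    return {"symbol": operand, "n": n, "i": i, "x": x}


def opcode_byte(opcode: int, n: int, i: int) -> int:
    """n and i occupy the two low bits of the first instruction byte."""
    return (opcode & 0xFC) | (n << 1) | i


print(parse_operand("@RETADR"))
print(f"{opcode_byte(0x3C, 1, 0):02X}")   # J with indirect addressing: 3E
```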

## CHAPTER 2

# Assemblers

### 2.1 Basic Assembler Functions

2.1.1 A Simple SIC Assembler

2.1.2 Assembler Algorithm and Data Structures

### 2.2 Machine-Dependent Assembler Features

2.2.1 Instruction Formats and Addressing Modes

2.2.2 Program Relocation

### 2.3 Machine-Independent Assembler Features

2.3.1 Literals

2.3.2 Symbol-Defining Statements

2.3.3 Expressions

2.3.4 Program Blocks

2.3.5 Control Sections and Program Linking

### 2.4 Assembler Design Options

2.4.1 One-Pass Assemblers

2.4.2 Multi-Pass Assemblers

## Chapter 2 : Assemblers



An assembler performs two basic functions:

- 1) It converts mnemonic operation codes into their machine language equivalents
- 2) It converts symbolic labels into their machine addresses

The design of an assembler involves the following steps:

1. Convert mnemonic operation codes to their machine language equivalent. ex: Translate `STL` to 14
2. Convert symbolic operands to their equivalent machine addresses. ex: Translate `RETADR` to 1033
3. Build the machine instructions in the proper format.
4. Convert the data constants specified in the source program into their internal machine representation. ex: Translate `EOF` to `454F46`
5. Write the object program and the assembly listing
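
The translations named in steps 1, 2 and 4 can be sketched in a few lines of Python. The two dictionaries and both function names are assumptions made for this illustration; the address 1033 for `RETADR` is taken from the example above and is assumed to have been assigned already.

```python
OPTAB = {"STL": 0x14}          # mnemonic -> machine opcode
SYMTAB = {"RETADR": 0x1033}    # symbol -> address (assumed already assigned)


def translate_instruction(mnemonic: str, operand: str) -> str:
    """Steps 1-3: opcode (2 hex digits) followed by the address (4 hex digits)."""
    return f"{OPTAB[mnemonic]:02X}{SYMTAB[operand]:04X}"


def translate_char_constant(text: str) -> str:
    """Step 4: a character constant becomes the hex of its ASCII codes."""
    return text.encode("ascii").hex().upper()


print(translate_instruction("STL", "RETADR"))   # 141033
print(translate_char_constant("EOF"))           # 454F46
```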

## Different data structures for assemblers

- 1 Operation code Table (OPTAB)
- 2 Symbol Table (SYMTAB)
- 3 Location Counter Variable (LOCCTR)

## I. OPERATION CODE TABLE (OPTAB)

### a) contents

- Mnemonic operation codes
- machine language equivalent
- Instruction format and length

### b) During pass-1

- validate op codes
- find instruction lengths to increase location counter value (LOCCTR)

### c) During pass-2

- determines the instruction format (3 or 4)
- translates the operation codes to their machine language equivalents

### d) Implementation

- static hash table, easy for searching

| Mnemonic name | op code | Format |
|---------------|---------|--------|
| ADD m         | 18      | 3/4    |
| ⋮             | ⋮       | ⋮      |
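
A minimal sketch of OPTAB as a static hash table, assuming a Python dictionary keyed by mnemonic. The three entries come from the SIC/XE instruction set; the function name `instruction_length` and the tuple layout are hypothetical. It shows the two pass-1 uses listed above: validating the operation code and finding the instruction length.

```python
# mnemonic -> (opcode, allowed formats)
OPTAB = {
    "ADD":   (0x18, (3, 4)),
    "COMPR": (0xA0, (2,)),
    "FIX":   (0xC4, (1,)),
}


def instruction_length(mnemonic: str) -> int:
    """Validate the mnemonic and return the number of bytes to add to LOCCTR."""
    extended = mnemonic.startswith("+")    # '+' requests format 4
    name = mnemonic.lstrip("+")
    if name not in OPTAB:
        raise KeyError(f"invalid operation code: {name}")
    _, formats = OPTAB[name]
    if extended:
        if 4 not in formats:
            raise ValueError(f"{name} has no format 4 form")
        return 4
    return formats[0]


print(instruction_length("ADD"), instruction_length("+ADD"))   # 3 4
```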

## II. SYMBOL TABLE (SYMTAB)

### a) contents

- label name
- label address
- Flags to indicate error conditions
- data type or length

### b) During pass-1

- store label name and assigned address (from LOCCTR) in SYMTAB

### c) During pass-2

- symbols used as operands are looked up in SYMTAB

### d) Implementations

- A dynamic hash table for efficient insertion and retrieval
- should perform well with non-random keys (LOOP1, LOOP2, ...)

| Label name | value | Flags | Length |
|------------|-------|-------|--------|
| CLOOP      | 0003  |       |        |
|            |       |       |        |
|            |       |       |        |
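
One way to picture SYMTAB is as a Python dictionary (itself a dynamic hash table) whose entries carry a value and an error flag. The function name `define_symbol` and the field names are assumptions for this sketch; it only shows how a duplicate label could be flagged during pass 1.

```python
def define_symbol(symtab: dict, label: str, address: int) -> None:
    """Enter a label in SYMTAB, flagging a second definition as an error."""
    if label in symtab:
        symtab[label]["error"] = "duplicate symbol"
    else:
        symtab[label] = {"value": address, "error": None}


symtab = {}
define_symbol(symtab, "CLOOP", 0x0003)
define_symbol(symtab, "CLOOP", 0x0010)   # second definition is flagged
print(symtab["CLOOP"])
```

The first address is kept and only the flag changes, so later passes can still report where the symbol was first defined.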

## III. LOCATION COUNTER VARIABLE (LOCCTR)

- A variable used to help in the assignment of addresses, i.e. LOCCTR gives the address to be associated with a label
- LOCCTR is initialized to the beginning address specified in the "START" statement
- After each source statement is processed during pass-1, the length of the assembled instruction or data area is added to LOCCTR
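
The role of LOCCTR can be traced with a small Python sketch. It assumes every statement after START is a 3-byte SIC instruction (data directives are ignored here), and the function name `pass_one` and the tuple layout are hypothetical.

```python
def pass_one(statements):
    """Assign an address to every label using a location counter."""
    locctr = 0
    symtab = {}
    for label, opcode, operand in statements:
        if opcode == "START":
            locctr = int(operand, 16)   # START operand is a hex address
            continue
        if label:
            symtab[label] = locctr      # label gets the current LOCCTR
        locctr += 3                     # assumed: each instruction is 3 bytes
    return symtab, locctr


program = [
    ("COPY", "START", "1000"),
    ("FIRST", "STL", "RETADR"),
    ("CLOOP", "JSUB", "RDREC"),
    (None, "LDA", "LENGTH"),
]
symtab, end = pass_one(program)
print({name: f"{addr:04X}" for name, addr in symtab.items()}, f"{end:04X}")
```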

→ The functionality of assembler looks like this



Note: During pass-1 the address of a label may not yet be known, because the label is defined later in the program. Such a use is called a forward reference. To resolve it, the assembler makes a second pass (pass-2).

Eg. JEQ RETADR
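
The following Python sketch shows why a second pass resolves a forward reference. The three-statement program, the function name `assemble` and the fixed 3-byte statement length are assumptions chosen to keep the example short.

```python
OPTAB = {"STL": 0x14, "RSUB": 0x4C}

source = [
    ("FIRST", "STL", "RETADR"),   # RETADR is used before it is defined
    (None, "RSUB", None),
    ("RETADR", "RESW", "1"),
]


def assemble(source, start):
    # Pass 1: record the address of every label.
    symtab, locctr = {}, start
    for label, opcode, operand in source:
        if label:
            symtab[label] = locctr
        locctr += 3               # each statement here occupies one 3-byte word
    # Pass 2: every symbol is now known, so operands can be translated.
    object_code = []
    for label, opcode, operand in source:
        if opcode in OPTAB:
            address = symtab[operand] if operand else 0
            object_code.append(f"{OPTAB[opcode]:02X}{address:04X}")
    return object_code


print(assemble(source, 0x1000))   # ['141006', '4C0000']
```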

## 8.1 Assembler Directives

- 1) START specifies the name and starting address for the program
- 2) END indicates the end of the source program and optionally specifies the first executable instruction in the program
- 3) BYTE generates a character or hexadecimal constant, occupying as many bytes as needed to represent the constant
- 4) WORD generates a one-word integer constant
- 5) RESB reserves the indicated number of bytes for a data area
- 6) RESW reserves the indicated number of words for a data area
- 7) LTORG creates a literal pool that contains all of the literal operands used since the previous LTORG or the beginning of the program
- 8) EQU establishes symbolic names that can be used instead of numeric values for improved readability; it is also used to define mnemonic names for registers
- 9) ORG is used to indirectly assign values to symbols
- 10) USE indicates which portions of the source program belong to the various blocks and also indicates a continuation of a previously begun block
- 11) BASE indicates that the base register will contain the address of the operand
- 12) NOBASE indicates that the contents of the base register can no longer be relied upon for addressing
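
For the four storage directives, the amount added to the location counter follows directly from the definitions above. This is a minimal Python sketch, assuming a 3-byte SIC word and operands written as `C'...'` or `X'...'`; the function name `directive_length` is hypothetical.

```python
def directive_length(opcode: str, operand: str) -> int:
    """Number of bytes a storage directive adds to the location counter."""
    if opcode == "WORD":
        return 3                         # one SIC word
    if opcode == "RESW":
        return 3 * int(operand)
    if opcode == "RESB":
        return int(operand)
    if opcode == "BYTE":
        kind, body = operand[0], operand[2:-1]   # strip C'...' or X'...'
        if kind == "C":
            return len(body)             # one byte per character
        if kind == "X":
            return len(body) // 2        # two hex digits per byte
    raise ValueError(f"unsupported directive: {opcode} {operand}")


print(directive_length("BYTE", "C'EOF'"), directive_length("RESB", "4096"))
```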

### 2.1.1 A Simple SIC Assembler

The usual (general) format of an assembly language program for the SIC machine, together with the generated object code:



where

→ LABEL : An optional identifier. Labels reduce the programmer's need to remember where data or code is located. The maximum length of a label differs between assemblers.

Ex:- FIRST STL RETADR

→ OPCODE : The mnemonic of a machine instruction. It may require additional information in the form of an operand (optional).

Ex:- COMP ZERO ; with operand

OR

RSUB ; without operand

→ OPERAND : Additional data or information that the opcode requires. Operands are used to specify constants, labels, immediate data, data contained in another register, an address, etc.
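
Splitting a statement into these three fields can be sketched as follows. This is a free-format simplification, not the fixed-column layout of a real SIC assembler: it assumes fields are separated by blanks, that `;` starts a comment as in the examples above, and that a label never has the same name as a mnemonic. The function name `parse_statement` is hypothetical.

```python
def parse_statement(line: str, mnemonics: set):
    """Split a source line into (label, opcode, operand)."""
    text = line.split(";", 1)[0]        # drop the comment, if any
    fields = text.split()
    if not fields:
        return None, None, None
    label = None
    if fields[0] not in mnemonics:      # first field is not an opcode: a label
        label = fields.pop(0)
    opcode = fields[0] if fields else None
    operand = fields[1] if len(fields) > 1 else None
    return label, opcode, operand


MNEMONICS = {"STL", "COMP", "RSUB"}
print(parse_statement("FIRST STL RETADR", MNEMONICS))
print(parse_statement("RSUB ; without operand", MNEMONICS))
```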

## Advantages and Disadvantages of assembly language

Advantages :

- Reduced errors
- Faster translation time
- Changes can be made more easily and quickly

Disadvantages :

- Many instructions are required to achieve small tasks
- Source programs tend to be large and difficult to follow
- Programs are machine dependent, so the complete program has to be rewritten if the hardware is changed
- The programmer must have complete knowledge of the processor architecture and instruction set

| Mnemonic    | Format | Opcode | Effect | Notes |
|-------------|--------|--------|--------|-------|
| ADD m       | 3/4    | 18     | A $\leftarrow (A) + (m..m+2)$ |       |
| ADDF m      | 3/4    | 58     | F $\leftarrow (F) + (m..m+5)$ | X F   |
| ADDR r1,r2  | 2      | 90     | r2 $\leftarrow (r2) + (r1)$ | X     |
| AND m       | 3/4    | 40     | A $\leftarrow (A) \& (m..m+2)$ |       |
| CLEAR r1    | 2      | B4     | r1 $\leftarrow 0$ | X     |
| COMP m      | 3/4    | 28     | (A) : (m..m+2) | C     |
| COMPF m     | 3/4    | 88     | (F) : (m..m+5) | X F C |
| COMPR r1,r2 | 2      | A0     | (r1) : (r2) | X C   |
| DIV m       | 3/4    | 24     | A $\leftarrow (A) / (m..m+2)$ |       |
| DIVF m      | 3/4    | 64     | F $\leftarrow (F) / (m..m+5)$ | X F   |
| DIVR r1,r2  | 2      | 9C     | r2 $\leftarrow (r2) / (r1)$ | X     |
| FIX         | 1      | C4     | A $\leftarrow (F)$ [convert to integer] | X F   |
| FLOAT       | 1      | C0     | F $\leftarrow (A)$ [convert to floating] | X F   |
| HIO         | 1      | F4     | Halt I/O channel number (A) | P X   |
| J m         | 3/4    | 3C     | PC $\leftarrow m$ |       |
| JEQ m       | 3/4    | 30     | PC $\leftarrow m$ if CC set to = |       |
| JGT m       | 3/4    | 34     | PC $\leftarrow m$ if CC set to > |       |
| JLT m       | 3/4    | 38     | PC $\leftarrow m$ if CC set to < |       |
| JSUB m      | 3/4    | 48     | L $\leftarrow (PC)$ ; PC $\leftarrow m$ |       |
| LDA m       | 3/4    | 00     | A $\leftarrow (m..m+2)$ |       |
| LDB m       | 3/4    | 68     | B $\leftarrow (m..m+2)$ | X     |
| LDCH m      | 3/4    | 50     | A [rightmost byte] $\leftarrow (m)$ |       |
| LDF m       | 3/4    | 70     | F $\leftarrow (m..m+5)$ | X F   |
| LDL m       | 3/4    | 08     | L $\leftarrow (m..m+2)$ |       |
| LDS m       | 3/4    | 6C     | S $\leftarrow (m..m+2)$ | X     |
| LDT m       | 3/4    | 74     | T $\leftarrow (m..m+2)$ | X     |
| LDX m       | 3/4    | 04     | X $\leftarrow (m..m+2)$ |       |
| LPS m       | 3/4    | D0     | Load processor status from information beginning at address m (see Section 6.2.1) | P X   |
| MUL m       | 3/4    | 20     | A $\leftarrow (A) * (m..m+2)$ |       |
| MULF m      | 3/4    | 60     | F $\leftarrow (F) * (m..m+5)$ | X F   |
| MULR r1,r2  | 2      | 98     | r2 $\leftarrow (r2) * (r1)$ | X     |
| NORM        | 1      | C8     | F $\leftarrow (F)$ [normalized] | X F   |
| OR m        | 3/4    | 44     | A $\leftarrow (A) \mid (m..m+2)$ |       |
| RD m        | 3/4    | D8     | A [rightmost byte] $\leftarrow$ data from device specified by (m) | P     |
| RMO r1,r2   | 2      | AC     | r2 $\leftarrow (r1)$ | X     |
| RSUB        | 3/4    | 4C     | PC $\leftarrow (L)$ |       |
| SHIFTL r1,n | 2      | A4     | r1 $\leftarrow (r1)$ ; left circular shift n bits. [In assembled instruction, r2 = n-1] | X     |
| SHIFTR r1,n | 2      | A8     | r1 $\leftarrow (r1)$ ; right shift n bits, with vacated bit positions set equal to leftmost bit of (r1). [In assembled instruction, r2 = n-1] | X     |
| SSK m       | 3/4    | EC     | Protection key for address m $\leftarrow (A)$ (see Section 6.2.4) | P X   |
| STA m       | 3/4    | 0C     | m..m+2 $\leftarrow (A)$ |       |
| STB m       | 3/4    | 78     | m..m+2 $\leftarrow (B)$ | X     |
| STCH m      | 3/4    | 54     | m $\leftarrow (A)$ [rightmost byte] |       |
| STF m       | 3/4    | 80     | m..m+5 $\leftarrow (F)$ | X F   |
| STI m       | 3/4    | D4     | Interval timer value $\leftarrow (m..m+2)$ (see Section 6.2.1) | P X   |
| STL m       | 3/4    | 14     | m..m+2 $\leftarrow (L)$ |       |
| STS m       | 3/4    | 7C     | m..m+2 $\leftarrow (S)$ | X     |
| STSW m      | 3/4    | E8     | m..m+2 $\leftarrow (SW)$ | P     |
| STT m       | 3/4    | 84     | m..m+2 $\leftarrow (T)$ | X     |
| STX m       | 3/4    | 10     | m..m+2 $\leftarrow (X)$ |       |
| SUB m       | 3/4    | 1C     | A $\leftarrow (A) - (m..m+2)$ |       |
| SUBF m      | 3/4    | 5C     | F $\leftarrow (F) - (m..m+5)$ | X F   |
| SUBR r1,r2  | 2      | 94     | r2 $\leftarrow (r2) - (r1)$ | X     |
| SVC n       | 2      | B0     | Generate SVC interrupt. [In assembled instruction, r1 = n] | X     |
| TD m        | 3/4    | E0     | Test device specified by (m) | P C   |
| TIO         | 1      | F8     | Test I/O channel number (A) | P X C |
| TIX m       | 3/4    | 2C     | X $\leftarrow (X) + 1$ ; (X) : (m..m+2) | C     |
| TIXR r1     | 2      | B8     | X $\leftarrow (X) + 1$ ; (X) : (r1) | X C   |
| WD m        | 3/4    | DC     | Device specified by (m) $\leftarrow (A)$ [rightmost byte] | P     |

### Instruction Formats

#### Format 1 (1 byte):

Fields: op (8 bits)

#### Format 2 (2 bytes):

Fields: op (8 bits), r1 (4 bits), r2 (4 bits)

#### Format 3 (3 bytes):

Fields: op (6 bits), flags n, i, x, b, p, e (1 bit each), disp (12 bits)

#### Format 4 (4 bytes):

Fields: op (6 bits), flags n, i, x, b, p, e (1 bit each), address (20 bits)

### Addressing Modes

The following addressing modes apply to Format 3 and 4 instructions. Combinations of addressing bits not included in this table are treated as errors by the machine. In the description of assembler language notation, c indicates a constant between 0 and 4095 (or a memory address known to be in this range); m indicates a memory address or a constant value larger than 4095. Further information can be found in Section 1.3.2.

The letters in the Notes column have the following meanings:

- 4 Format 4 instruction
- D Direct-addressing instruction
- A Assembler selects either program-counter relative or base-relative mode
- S Compatible with instruction format for standard SIC machine. Operand value can be between 0 and 32,767 (see Section 1.3.2 for details).

| Addressing type | Flag bits (n i x b p e) | Assembler language notation | Calculation of target address TA | Operand | Notes |
|-----------------|-------------------------|-----------------------------|----------------------------------|---------|-------|
| Simple          | 110000                  | op c                        | disp                             | (TA)    | D     |
|                 | 110001                  | +op m                       | addr                             | (TA)    | 4 D   |
|                 | 110010                  | op m                        | (PC) + disp                      | (TA)    | A     |
|                 | 110100                  | op m                        | (B) + disp                       | (TA)    | A     |
|                 | 111000                  | op c,X                      | disp + (X)                       | (TA)    | D     |
|                 | 111001                  | +op m,X                     | addr + (X)                       | (TA)    | 4 D   |
|                 | 111010                  | op m,X                      | (PC) + disp + (X)                | (TA)    | A     |
|                 | 111100                  | op m,X                      | (B) + disp + (X)                 | (TA)    | A     |
|                 | 000---                  | op m                        | b/p/e/disp                       | (TA)    | D S   |
|                 | 001---                  | op m,X                      | b/p/e/disp + (X)                 | (TA)    | D S   |
| Indirect        | 100000                  | op @c                       | disp                             | ((TA))  | D     |
|                 | 100001                  | +op @m                      | addr                             | ((TA))  | 4 D   |
|                 | 100010                  | op @m                       | (PC) + disp                      | ((TA))  | A     |
|                 | 100100                  | op @m                       | (B) + disp                       | ((TA))  | A     |
| Immediate       | 010000                  | op #c                       | disp                             | TA      | D     |
|                 | 010001                  | +op #m                      | addr                             | TA      | 4 D   |
|                 | 010010                  | op #m                       | (PC) + disp                      | TA      | A     |
|                 | 010100                  | op #m                       | (B) + disp                       | TA      | A     |
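
How an assembler might pack a Format 3 or Format 4 instruction can be sketched in Python. The sketch assumes the standard SIC/XE layout (6-bit opcode, flags n i x b p e, then a 12-bit displacement or 20-bit address) and tries PC-relative before base-relative addressing. The function name `encode_format34` and its parameters are hypothetical, and immediate constants are not handled.

```python
def encode_format34(opcode, n, i, x, target, pc, base=None, extended=False):
    """Return the hex object code for one Format 3 or Format 4 instruction."""
    first = (opcode & 0xFC) | (n << 1) | i     # opcode bits plus n and i
    if extended:                               # Format 4: e = 1, 20-bit address
        flags = (x << 3) | 0b0001
        return f"{first:02X}{flags:X}{target & 0xFFFFF:05X}"
    disp = target - pc                         # pc = address of next instruction
    if -2048 <= disp <= 2047:                  # PC relative: p = 1
        flags = (x << 3) | 0b0010
        return f"{first:02X}{flags:X}{disp & 0xFFF:03X}"
    if base is not None and 0 <= target - base <= 4095:
        flags = (x << 3) | 0b0100              # base relative: b = 1
        return f"{first:02X}{flags:X}{target - base:03X}"
    raise ValueError("displacement out of range; use Format 4")


# STL with target 0030, next instruction at 0003
print(encode_format34(0x14, 1, 1, 0, target=0x0030, pc=0x0003))        # 17202D
# +JSUB with target 1036 (extended format)
print(encode_format34(0x48, 1, 1, 0, target=0x1036, pc=0, extended=True))  # 4B101036
```

Trying the PC first and the base register second mirrors the order described for the "A" entries, where the assembler selects the relative mode itself.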



| Line | Source statement |        |                                        |          |                                |
|------|------------------|--------|----------------------------------------|----------|--------------------------------|
|      | 5                | COPY   | START                                  | 1000     | COPY FILE FROM INPUT TO OUTPUT |
|      | 10               | FIRST  | STL                                    | RETADR   | SAVE RETURN ADDRESS            |
|      | 15               | CLOOP  | JSUB                                   | RDREC    | READ INPUT RECORD              |
|      | 20               |        | LDA                                    | LENGTH   | TEST FOR EOF (LENGTH = 0)      |
|      | 25               |        | COMP                                   | ZERO     |                                |
|      | 30               |        | JEQ                                    | ENDFIL   | EXIT IF EOF FOUND              |
|      | 35               |        | JSUB                                   | WRREC    | WRITE OUTPUT RECORD            |
|      | 40               |        | J                                      | CLOOP    | LOOP                           |
|      | 45               | ENDFIL | LDA                                    | EOF      | INSERT END OF FILE MARKER      |
|      | 50               |        | STA                                    | BUFFER   |                                |
|      | 55               |        | LDA                                    | THREE    | SET LENGTH = 3                 |
|      | 60               |        | STA                                    | LENGTH   |                                |
|      | 65               |        | JSUB                                   | WRREC    | WRITE EOF                      |
|      | 70               |        | LDL                                    | RETADR   | GET RETURN ADDRESS             |
|      | 75               |        | RSUB                                   |          | RETURN TO CALLER               |
|      | 80               | EOF    | BYTE                                   | C'EOF'   |                                |
|      | 85               | THREE  | WORD                                   | 3        |                                |
|      | 90               | ZERO   | WORD                                   | 0        |                                |
|      | 95               | RETADR | RESW                                   | 1        |                                |
|      | 100              | LENGTH | RESW                                   | 1        | LENGTH OF RECORD               |
|      | 105              | BUFFER | RESB                                   | 4096     | 4096-BYTE BUFFER AREA          |
|      | 110              |        |                                        |          |                                |
|      | 115              |        | SUBROUTINE TO READ RECORD INTO BUFFER  |          |                                |
|      | 120              |        |                                        |          |                                |
|      | 125              | RDREC  | LDX                                    | ZERO     | CLEAR LOOP COUNTER             |
|      | 130              |        | LDA                                    | ZERO     | CLEAR A TO ZERO                |
|      | 135              | RLOOP  | TD                                     | INPUT    | TEST INPUT DEVICE              |
|      | 140              |        | JEQ                                    | RLOOP    | LOOP UNTIL READY               |
|      | 145              |        | RD                                     | INPUT    | READ CHARACTER INTO REGISTER A |
|      | 150              |        | COMP                                   | ZERO     | TEST FOR END OF RECORD (X'00') |
|      | 155              |        | JEQ                                    | EXIT     | EXIT LOOP IF EOR               |
|      | 160              |        | STCH                                   | BUFFER,X | STORE CHARACTER IN BUFFER      |
|      | 165              |        | TIX                                    | MAXLEN   | LOOP UNLESS MAX LENGTH         |
|      | 170              |        | JLT                                    | RLOOP    | HAS BEEN REACHED               |
|      | 175              | EXIT   | STX                                    | LENGTH   | SAVE RECORD LENGTH             |
|      | 180              |        | RSUB                                   |          | RETURN TO CALLER               |
|      | 185              | INPUT  | BYTE                                   | X'F1'    | CODE FOR INPUT DEVICE          |
|      | 190              | MAXLEN | WORD                                   | 4096     |                                |
|      | 195              |        |                                        |          |                                |
|      | 200              |        | SUBROUTINE TO WRITE RECORD FROM BUFFER |          |                                |
|      | 205              |        |                                        |          |                                |
|      | 210              | WRREC  | LDX                                    | ZERO     | CLEAR LOOP COUNTER             |
|      | 215              | WLOOP  | TD                                     | OUTPUT   | TEST OUTPUT DEVICE             |
|      | 220              |        | JEQ                                    | WLOOP    | LOOP UNTIL READY               |
|      | 225              |        | LDCH                                   | BUFFER,X | GET CHARACTER FROM BUFFER      |
|      | 230              |        | WD                                     | OUTPUT   | WRITE CHARACTER                |
|      | 235              |        | TIX                                    | LENGTH   | LOOP UNTIL ALL CHARACTERS      |
|      | 240              |        | JLT                                    | WLOOP    | HAVE BEEN WRITTEN              |
|      | 245              |        | RSUB                                   |          | RETURN TO CALLER               |
|      | 250              | OUTPUT | BYTE                                   | X'05'    | CODE FOR OUTPUT DEVICE         |
|      | 255              |        | END                                    | FIRST    |                                |

Figure 2.1 Example of a SIC assembler language program.

ASSEMBLER OUTPUT

| Line | Loc  | Length          | Label  | Opcode                                 | Operand  | Object code |
|------|------|-----------------|--------|----------------------------------------|----------|-------------|
| 5    | 1000 |                 | COPY   | START                                  | 1000     |             |
| 10   | 1000 | 3               | FIRST  | STL                                    | RETADR   | 141033      |
| 15   | 1003 | 3               | CLOOP  | JSUB                                   | RDREC    | 482039      |
| 20   | 1006 | 3               |        | LDA                                    | LENGTH   | 001036      |
| 25   | 1009 | 3               |        | COMP                                   | ZERO     | 281030      |
| 30   | 100C | 3               |        | JEQ                                    | ENDFIL   | 301015      |
| 35   | 100F | 3               |        | JSUB                                   | WRREC    | 482061      |
| 40   | 1012 | 3               |        | J                                      | CLOOP    | 3C1003      |
| 45   | 1015 | 3               | ENDFIL | LDA                                    | EOF      | 00102A      |
| 50   | 1018 | 3               |        | STA                                    | BUFFER   | 0C1039      |
| 55   | 101B | 3               |        | LDA                                    | THREE    | 00102D      |
| 60   | 101E | 3               |        | STA                                    | LENGTH   | 0C1036      |
| 65   | 1021 | 3               |        | JSUB                                   | WRREC    | 482061      |
| 70   | 1024 | 3               |        | LDL                                    | RETADR   | 081033      |
| 75   | 1027 | 3               |        | RSUB                                   |          | 4C0000      |
| 80   | 102A | 3               | EOF    | BYTE                                   | C'EOF'   | 454F46      |
| 85   | 102D | 3               | THREE  | WORD                                   | 3        | 000003      |
| 90   | 1030 | 3               | ZERO   | WORD                                   | 0        | 000000      |
| 95   | 1033 | 3               | RETADR | RESW                                   | 1        |             |
| 100  | 1036 | 3               | LENGTH | RESW                                   | 1        |             |
| 105  | 1039 | 4096 (hex 1000) | BUFFER | RESB                                   | 4096     |             |
| 110  |      |                 |        |                                        |          |             |
| 115  |      |                 |        | SUBROUTINE TO READ RECORD INTO BUFFER  |          |             |
| 120  |      |                 |        |                                        |          |             |
| 125  | 2039 | 3               | RDREC  | LDX                                    | ZERO     | 041030      |
| 130  | 203C | 3               |        | LDA                                    | ZERO     | 001030      |
| 135  | 203F | 3               | RLOOP  | TD                                     | INPUT    | E0205D      |
| 140  | 2042 | 3               |        | JEQ                                    | RLOOP    | 30203F      |
| 145  | 2045 | 3               |        | RD                                     | INPUT    | D8205D      |
| 150  | 2048 | 3               |        | COMP                                   | ZERO     | 281030      |
| 155  | 204B | 3               |        | JEQ                                    | EXIT     | 302057      |
| 160  | 204E | 3               |        | STCH                                   | BUFFER,X | 549039      |
| 165  | 2051 | 3               |        | TIX                                    | MAXLEN   | 2C205E      |
| 170  | 2054 | 3               |        | JLT                                    | RLOOP    | 38203F      |
| 175  | 2057 | 3               | EXIT   | STX                                    | LENGTH   | 101036      |
| 180  | 205A | 3               |        | RSUB                                   |          | 4C0000      |
| 185  | 205D | 1               | INPUT  | BYTE                                   | X'F1'    | F1          |
| 190  | 205E | 3               | MAXLEN | WORD                                   | 4096     | 001000      |
| 195  |      |                 |        |                                        |          |             |
| 200  |      |                 |        | SUBROUTINE TO WRITE RECORD FROM BUFFER |          |             |
| 205  |      |                 |        |                                        |          |             |
| 210  | 2061 | 3               | WRREC  | LDX                                    | ZERO     | 041030      |
| 215  | 2064 | 3               | WLOOP  | TD                                     | OUTPUT   | E02079      |
| 220  | 2067 | 3               |        | JEQ                                    | WLOOP    | 302064      |
| 225  | 206A | 3               |        | LDCH                                   | BUFFER,X | 509039      |
| 230  | 206D | 3               |        | WD                                     | OUTPUT   | DC2079      |
| 235  | 2070 | 3               |        | TIX                                    | LENGTH   | 2C1036      |
| 240  | 2073 | 3               |        | JLT                                    | WLOOP    | 382064      |
| 245  | 2076 | 3               |        | RSUB                                   |          | 4C0000      |
| 250  | 2079 | 1               | OUTPUT | BYTE                                   | X'05'    | 05          |
| 255  | 207A |                 |        | END                                    | FIRST    |             |

Figure 2.2 Program from Fig. 2.1 with object code.

The program in Fig. 2.1 contains a main routine that reads records from an input device (code F1) and copies them to an output device (code 05).

The main routine calls subroutine RDREC to read a record into a buffer and subroutine WRREC to write the record from the buffer to the output device.

Each subroutine transfers a record one character at a time, because the only I/O instructions available are RD and WD.

Since the I/O rates of the two devices (for example, a disk and a printing terminal) may differ, a buffer is used. The end of each record is marked with a null character, i.e. 00 in hexadecimal. If a record is longer than the buffer (4096 bytes), only the first 4096 bytes are copied. The end of the file to be copied is indicated by a zero-length record.

When the zero-length record (i.e. end of file) is detected, the program writes EOF on the output device. The program terminates by executing an RSUB instruction, since it is assumed to have been called with a JSUB instruction.

## Procedure to generate object code and object Program ( Intermediate File )

Note: We have assumed that the program starts at address 1000.

- 1) First, write the LOCCTR (location counter) address of every statement.
  - START 1000 : LOCCTR is initialized to 1000.
  - Add 3 bytes for each instruction (∵ the instruction format of the SIC machine is 24 bits, i.e. 3 bytes).
  - BYTE C'EOF' : count the length of the constant and add that many bytes (here 3).
  - RESW 2000 : add $2000 \times 3 \text{ bytes} = (6000)_{10} = (1770)_{16}$ to the previous address.
  - RESW 1 : add just 3 bytes.
  - RESB 2000 : convert 2000 to hexadecimal, $(2000)_{10} = (7D0)_{16}$, and add 7D0.
  - RESB 4096 : $(4096)_{10} = (1000)_{16}$ is added to the previous address.
  - WORD 3 or WORD 0 : 3 bytes are added.
- 2) Start creating the object code.
  - Convert mnemonic operation codes to their machine language equivalents, e.g. STL to 14.
  - Convert symbolic operands to their equivalent machine addresses, e.g. RETADR to 1033 (a forward reference).
  - Build the machine instruction in the proper format:
    - a) Direct addressing : $x = 0$ : $TA = \text{address}$
    - b) Indexed addressing : $x = 1$ : $TA = \text{address} + (X)$, indicated by ',X' in the operand, e.g. STCH BUFFER,X (line 160).
  - Convert the data constants into their machine representation, e.g. C'EOF' to 454F46 (line 80). Each character is replaced by its ASCII code, for example A = 65 = $(41)_{16}$ and a = 97 = $(61)_{16}$.
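The constant conversion in step 2 can be sketched in Python as follows. This is only an illustration: the function name `constant_to_object_code` and its string-based interface are assumptions made for this example and are not part of the notes.

```python
def constant_to_object_code(directive, operand):
    """Return the hexadecimal object code for a WORD or BYTE constant."""
    if directive == "WORD":
        # One SIC word is 3 bytes; masking to 24 bits keeps negatives in 2's complement.
        return format(int(operand) & 0xFFFFFF, "06X")
    if directive == "BYTE":
        kind, body = operand[0].upper(), operand[2:-1]  # strip C'...' or X'...'
        if kind == "C":
            # Each character becomes its two-digit ASCII code.
            return "".join(format(ord(ch), "02X") for ch in body)
        if kind == "X":
            # The digits are already hexadecimal.
            return body.upper()
    raise ValueError(f"unsupported constant: {directive} {operand}")


print(constant_to_object_code("BYTE", "C'EOF'"))  # 454F46
print(constant_to_object_code("WORD", "4096"))    # 001000
```

The two printed values match lines 80 and 190 of the example program.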

3) Write the object program (Intermediate File)

→ The object program contains three types of records:

a) Header Record      b) Text Record      c) End Record

a) Header Record : Contains the program name, starting address and length of the program.

| Column     | Contents                                         |
|------------|--------------------------------------------------|
| col. 1     | H                                                |
| col. 2-7   | Program name                                     |
| col. 8-13  | Starting address of object program (hexadecimal) |
| col. 14-19 | Length of object program in bytes (hexadecimal)  |

Eg:-

5   COPY   START   1000   (COPY → name of program, 1000 → starting address)

255   207A   END   FIRST

$$\begin{aligned}\text{length of program} &= \text{last address} - \text{starting address} \\ &= 207A - 1000 = 107A\end{aligned}$$

i.e. `H^COPY^001000^00107A` (Header Record), where ^ only separates the fields for readability.
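A minimal Python sketch of the Header record layout is given below. The function name `header_record` is hypothetical, and the `^` separators are inserted only for readability. The name is padded with blanks so that it fills columns 2-7.

```python
def header_record(name, start, end):
    """Build a Header record from the program name and its first and last addresses."""
    length = end - start              # program length = last address - starting address
    padded = name[:6].ljust(6)        # the name occupies exactly 6 columns
    return f"H^{padded}^{start:06X}^{length:06X}"


print(header_record("COPY", 0x1000, 0x207A))  # H^COPY  ^001000^00107A
```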

b) Text Record :

Text record contains the translated instructions (machine code) and data of the program together with an indication of addresses where these are to be loaded.

| Column     | Contents                                                                    |
|------------|-----------------------------------------------------------------------------|
| col. 1     | T                                                                           |
| col. 2-7   | Starting address for object code in this record (hexadecimal)               |
| col. 8-9   | Length of object code in this record in bytes (hexadecimal)                 |
| col. 10-69 | Object code, represented in hexadecimal (2 columns per byte of object code) |

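Note: columns 10-69 give 60 columns, so the maximum length of object code in one record is 60 / 2 = 30 bytes ⇒ 10 words ⇒ $(1E)_{16}$.

Ex:-

| Line | Loc  | Label | Opcode | Operand | Object code |
|------|------|-------|--------|---------|-------------|
| 10   | 1000 | FIRST | STL    | RETADR  | 141033      |
| :    | :    |       | :      | :       | :           |
| 55   | 101B |       | LDA    | THREE   | 00102D      |

These 10 words of 3 bytes each fill one Text record:

Text record → `T^001000^1E^141033^...^00102D`

→ ^ is only a marker for separation; it is not part of the record.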
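The 30-byte limit can be illustrated with a small Python sketch that packs consecutive object codes into Text records. The names `text_record` and `pack_text_records` are assumptions for this example. The sketch also assumes the object codes are contiguous in memory, so it does not handle the gaps created by RESW and RESB, where a real assembler would start a new record.

```python
def text_record(start, codes):
    """Format one Text record from a start address and a list of hex object codes."""
    length = sum(len(code) for code in codes) // 2  # two hex digits per byte
    if length > 30:
        raise ValueError("a Text record holds at most 30 bytes")
    return "^".join(["T", f"{start:06X}", f"{length:02X}", *codes])


def pack_text_records(start, codes, max_bytes=30):
    """Split contiguous object codes into Text records of at most max_bytes each."""
    records, chunk, chunk_start, chunk_len, addr = [], [], start, 0, start
    for code in codes:
        size = len(code) // 2
        if chunk and chunk_len + size > max_bytes:
            # The current record is full: write it and begin a new one here.
            records.append(text_record(chunk_start, chunk))
            chunk, chunk_start, chunk_len = [], addr, 0
        chunk.append(code)
        chunk_len += size
        addr += size
    if chunk:
        records.append(text_record(chunk_start, chunk))
    return records


# Object codes of lines 10 to 90 of the example program.
codes = ["141033", "482039", "001036", "281030", "301015", "482061", "3C1003",
         "00102A", "0C1039", "00102D", "0C1036", "482061", "081033", "4C0000",
         "454F46", "000003", "000000"]
for record in pack_text_records(0x1000, codes):
    print(record)
```

The first record printed has length 1E (10 words) and the second starts at 00101E with length 15.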

c) End Record :

End record marks the end of the object program and specifies the address in the program where execution is to begin. If no operand is specified then the address of the first executable instruction is used.

| Column   | Contents                                                                |
|----------|-------------------------------------------------------------------------|
| col. 1   | E                                                                       |
| col. 2-7 | Address of first executable instruction in object program (hexadecimal) |

Ex:-

10   1000   FIRST   STL   RETADR   141033

:

255   END   FIRST

End record → `E^001000`
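The rule for choosing the execution address can be sketched as below. The function `end_record` and its arguments are hypothetical; `symtab` stands for the symbol table built while assigning addresses.

```python
def end_record(symtab, end_operand, first_executable):
    """Build the End record; fall back to the first executable instruction."""
    if end_operand:
        address = symtab[end_operand]   # operand of END names the entry point
    else:
        address = first_executable      # no operand: use the first instruction
    return f"E^{address:06X}"


print(end_record({"FIRST": 0x1000}, "FIRST", 0x1000))  # E^001000
```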

Let us now carry this out for the program in Fig. 2.1.

Given opcodes (the entries E, O and F give the ASCII codes of those characters):

|           |           |           |           |        |
|-----------|-----------|-----------|-----------|--------|
| STL - 14  | J - 3C    | LDX - 04  | JLT - 38  | F - 46 |
| JSUB - 48 | STA - 0C  | TD - E0   | LDCH - 50 |        |
| LDA - 00  | STX - 10  | RD - D8   | WD - DC   |        |
| COMP - 28 | LDL - 08  | STCH - 54 | E - 45    |        |
| JEQ - 30  | RSUB - 4C | TIX - 2C  | O - 4F    |        |

① Start incrementing LOCCTR.

Initially it is 1000.

→ Add 3 bytes for each statement from line 10 up to line 105.

→ 105   1039   BUFFER   RESB   4096  
→ convert 4096 to hexadecimal, i.e. $(4096)_{10} = (1000)_{16}$

∴ add 1000 (hex) bytes to 1039 ⇒ 2039

∴ Line 125 starts at address 2039; continue adding 3 bytes per statement up to line 185.

→ 185   205D   INPUT   BYTE   X'F1'   F1   (1 byte)

∴ add only 1 byte to 205D ⇒ 205E at line 190.

→ 190   205E   MAXLEN   WORD   4096   001000  
→ a WORD occupies 3 bytes, not 1000 bytes

∴ 3 bytes added to 205E ⇒ 2061 at line 210.

→ 210   2061   WRREC   LDX   ZERO   041030  
Continue in the same way till the end.

→ 255   207A   END   FIRST

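② Write the object code for each line.

→ Every line uses direct addressing except lines 160 and 225.

| Line  | LOCCTR | Label  | Opcode | Operand | Object code |
|-------|--------|--------|--------|---------|-------------|
| (i) 5 | 1000   | COPY   | START  | 1000    |             |
| 10    | 1000   | FIRST  | STL    | RETADR  | 14 1033     |
| :     |        |        |        |         |             |
| 95    | 1033   | RETADR | RESW   | 1       |             |

Here 14 is the machine code of the mnemonic STL and 1033 is the address of RETADR.

(ii) 75   1027   RSUB   4C0000 (there is no operand, so the address field is 0000)

(iii) 160   204E   STCH   BUFFER,X → ',X' indicates indexed addressing  
 opcode of STCH = 54, address of BUFFER = 1039

| opcode    | x | address            |
|-----------|---|--------------------|
| 0101 0100 | 1 | 001 0000 0011 1001 |

Grouping the 24 bits into hexadecimal digits: 0101 0100 1001 0000 0011 1001 = 5 4 9 0 3 9

∴ 160   204E   STCH   BUFFER,X   549039

(iv) The same applies to line 225:

225   206A   LDCH   BUFFER,X  
 opcode of LDCH = 50, address of BUFFER = 1039

| opcode    | x | address            |
|-----------|---|--------------------|
| 0101 0000 | 1 | 001 0000 0011 1001 |

Grouping the 24 bits into hexadecimal digits: 0101 0000 1001 0000 0011 1001 = 5 0 9 0 3 9

∴ 225   206A   LDCH   BUFFER,X   509039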
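The bit packing used for lines 160 and 225 can be checked with a short Python sketch. The helper names `pack_sic_word` and `bit_fields` are made up for this illustration. The layout assumed is the standard SIC one: 8-bit opcode, 1-bit x flag, 15-bit address.

```python
def pack_sic_word(opcode, x, address):
    """Pack an 8-bit opcode, the x bit and a 15-bit address into 6 hex digits."""
    if not 0 <= address <= 0x7FFF:
        raise ValueError("a SIC address field is 15 bits wide")
    word = (opcode << 16) | (x << 15) | address
    return f"{word:06X}"


def bit_fields(object_code):
    """Split a 3-byte object code into its (opcode, x, address) bit strings."""
    bits = f"{int(object_code, 16):024b}"
    return bits[:8], bits[8], bits[9:]


print(pack_sic_word(0x54, 1, 0x1039))  # STCH BUFFER,X -> 549039
print(pack_sic_word(0x50, 1, 0x1039))  # LDCH BUFFER,X -> 509039
print(bit_fields("549039"))            # ('01010100', '1', '001000000111001')
```

With x = 0 the same function gives the direct-addressing case, for example `pack_sic_word(0x14, 0, 0x1033)` for STL RETADR.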

③ Object program for Fig. 2.1

`H^COPY^001000^00107A`

`T^001000^1E^141033^482039^001036^281030^301015^482061^3C1003^...^00102D`

`T^00101E^15^0C1036^482061^081033^4C0000^454F46^000003^000000`

`T^002039^1E^041030^001030^E0205D^30203F^D8205D^281030^302057^...^38203F`

`T^002057^1C^101036^4C0000^F1^001000^041030^E02079^302064^...^2C1036`

`T^002073^07^382064^4C0000^05`

`E^001000`

④ SYMBOL TABLE

| Symbol name | Value of the symbol |
|-------------|---------------------|
| FIRST       | 1000                |
| CLOOP       | 1003                |
| ENDFIL      | 1015                |
| EOF         | 102A                |
| THREE       | 102D                |
| ZERO        | 1030                |
| RETADR      | 1033                |
| LENGTH      | 1036                |
| BUFFER      | 1039                |
| RDREC       | 2039                |
| RLOOP       | 203F                |
| EXIT        | 2057                |
| INPUT       | 205D                |
| MAXLEN      | 205E                |
| WRREC       | 2061                |
| WLOOP       | 2064                |
| OUTPUT      | 2079                |

The loader loads the object program into main memory.

## Functions of Pass-I and Pass-II

### Pass I :

- Assign addresses to all statements in the program
- Save the values (addresses) assigned to all labels for use in Pass II
- Perform some processing of assembler directives (this includes processing that affects address assignment, such as determining the length of data areas defined by BYTE, WORD etc.)

### Pass II :

- Assemble instructions (translating operation codes and looking up addresses)
- Generate data values defined by BYTE, WORD etc
- Perform processing of assembler directives not done during Pass I
- Write the object program and the assembly listing.

Pass 1:

```

begin
    read first input line
    if OPCODE = 'START' then
        begin
            save #[OPERAND] as starting address
            initialize LOCCTR to starting address
            write line to intermediate file
            read next input line
        end {if START}
    else
        initialize LOCCTR to 0
    while OPCODE ≠ 'END' do
        begin
            if this is not a comment line then
                begin
                    if there is a symbol in the LABEL field then
                        begin
                            search SYMTAB for LABEL
                            if found then
                                set error flag (duplicate symbol)
                            else
                                insert (LABEL,LOCCTR) into SYMTAB
                        end {if symbol}
                    search OPTAB for OPCODE
                    if found then
                        add 3 {instruction length} to LOCCTR
                    else if OPCODE = 'WORD' then
                        add 3 to LOCCTR
                    else if OPCODE = 'RESW' then
                        add 3 * #[OPERAND] to LOCCTR
                    else if OPCODE = 'RESB' then
                        add #[OPERAND] to LOCCTR
                    else if OPCODE = 'BYTE' then
                        begin
                            find length of constant in bytes
                            add length to LOCCTR
                        end {if BYTE}
                    else
                        set error flag (invalid operation code)
                end {if not a comment}
            write line to intermediate file
            read next input line
        end {while not END}
    write last line to intermediate file
    save (LOCCTR - starting address) as program length
end {Pass 1}

```

Figure 2.4(a) Algorithm for Pass 1 of assembler.
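The Pass 1 algorithm can be tried out with the following Python sketch. It is a simplified illustration, not the algorithm of Figure 2.4(a) verbatim. It assumes the source has already been split into `(label, opcode, operand)` tuples with comment lines removed, it keeps only a small `OPTAB` of mnemonics, and it returns the symbol table instead of writing an intermediate file. All names are chosen for this example.

```python
OPTAB = {"STL", "JSUB", "LDA", "COMP", "JEQ", "J", "STA", "LDL", "RSUB",
         "LDX", "TD", "RD", "STCH", "TIX", "JLT", "STX", "LDCH", "WD"}


def statement_length(opcode, operand):
    """Number of bytes by which a statement advances LOCCTR."""
    if opcode in OPTAB or opcode == "WORD":
        return 3
    if opcode == "RESW":
        return 3 * int(operand)
    if opcode == "RESB":
        return int(operand)
    if opcode == "BYTE":
        body = operand[2:-1]  # text between the quotes of C'...' or X'...'
        return len(body) if operand[0] == "C" else (len(body) + 1) // 2
    raise ValueError(f"invalid operation code: {opcode}")


def pass1(lines):
    """Return (SYMTAB, program length, error list) for a list of statements."""
    symtab, errors = {}, []
    locctr = start = 0
    rest = lines
    if lines and lines[0][1] == "START":
        start = locctr = int(lines[0][2], 16)  # START operand is hexadecimal
        rest = lines[1:]
    for label, opcode, operand in rest:
        if opcode == "END":
            break
        if label:
            if label in symtab:
                errors.append(f"duplicate symbol: {label}")
            else:
                symtab[label] = locctr
        try:
            locctr += statement_length(opcode, operand)
        except ValueError as exc:
            errors.append(str(exc))
    return symtab, locctr - start, errors


program = [("COPY", "START", "1000"), ("FIRST", "STL", "RETADR"),
           ("EOF", "BYTE", "C'EOF'"), ("RETADR", "RESW", "1"),
           ("BUFFER", "RESB", "4096"), ("INPUT", "BYTE", "X'F1'"),
           (None, "END", "FIRST")]
symtab, length, errors = pass1(program)
print({name: f"{value:04X}" for name, value in symtab.items()}, f"{length:04X}", errors)
```

The small program used here is a shortened fragment, so its addresses differ from those of Fig. 2.2.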

Pass 2:

```

begin
    read first input line {from intermediate file}
    if OPCODE = 'START' then
        begin
            write listing line
            read next input line
        end {if START}
    write Header record to object program
    initialize first Text record
    while OPCODE ≠ 'END' do
        begin
            if this is not a comment line then
                begin
                    search OPTAB for OPCODE
                    if found then
                        begin
                            if there is a symbol in OPERAND field then
                                begin
                                    search SYMTAB for OPERAND
                                    if found then
                                        store symbol value as operand address
                                    else
                                        begin
                                            store 0 as operand address
                                            set error flag (undefined symbol)
                                        end
                                end {if symbol}
                            else
                                store 0 as operand address
                            assemble the object code instruction
                        end {if opcode found}
                    else if OPCODE = 'BYTE' or 'WORD' then
                        convert constant to object code
                    if object code will not fit into the current Text record then
                        begin
                            write Text record to object program
                            initialize new Text record
                        end
                    add object code to Text record
                end {if not comment}
            write listing line
            read next input line
        end {while not END}
    write last Text record to object program
    write End record to object program
    write last listing line
end {Pass 2}
    
```

Figure 2.4(b) Algorithm for Pass 2 of assembler.
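The central step of Pass 2, looking up the opcode and the operand symbol and recording an error for an undefined symbol, might look like this in Python. It is a sketch under stated assumptions: `assemble_instruction` is a hypothetical helper, `OPTAB` lists only a few mnemonics, and the handling of BYTE/WORD constants and of Text records is left out.

```python
OPTAB = {"STL": 0x14, "LDA": 0x00, "STCH": 0x54, "RSUB": 0x4C}


def assemble_instruction(opcode, operand, symtab, errors):
    """Translate one SIC instruction, using 0 as the address of an undefined symbol."""
    address, indexed = 0, False
    if operand:
        indexed = operand.endswith(",X")
        symbol = operand[:-2] if indexed else operand
        if symbol in symtab:
            address = symtab[symbol]
        else:
            errors.append(f"undefined symbol: {symbol}")  # address stays 0
    word = (OPTAB[opcode] << 16) | (int(indexed) << 15) | address
    return f"{word:06X}"


symtab = {"RETADR": 0x1033, "BUFFER": 0x1039}
errors = []
print(assemble_instruction("STL", "RETADR", symtab, errors))     # 141033
print(assemble_instruction("STCH", "BUFFER,X", symtab, errors))  # 549039
print(assemble_instruction("LDA", "MISSING", symtab, errors))    # 000000
print(errors)                                                    # ['undefined symbol: MISSING']
```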

## 2.3. Machine Dependent Assembler Features

- Here we consider the SIC/XE machine as an example.

→ As we already know, SIC/XE has

- a) Registers : A X L B S T F PC SW  
( 0 1 2 3 4 5 6 8 9 )

- b) Data formats :
  - Integers : 3 bytes
  - Characters : 1 byte
  - Floating point : 6 bytes

- c) Instruction formats :
  - Format 1 : 1 byte (8-bit opcode)   Ex: FLOAT, FIX
  - Format 2 : 2 bytes   Ex: ADDR A,X
  - Format 3 : 3 bytes   Ex: STL RETADR
  - Format 4 : 4 bytes   Ex: +JSUB RDREC

⇒ We have 20 address bits ∴ we can have $2^{20}$ addresses.

- d) Addressing modes are determined by 6 bits:

n i x b p e

(i)

| n | i | Addressing mode                                        |
|---|---|--------------------------------------------------------|
| 1 | 0 | Indirect addressing                                    |
| 0 | 1 | Immediate addressing                                   |
| 1 | 1 | Simple addressing (neither immediate nor indirect)     |
| 0 | 0 | Simple addressing (compatible with the standard SIC)   |

| x | Addressing mode                 |
|---|---------------------------------|
| 1 | Indexed addressing              |
| 0 | Direct addressing (no indexing) |

(ii)

| b | p | Addressing mode                                |
|---|---|------------------------------------------------|
| 0 | 1 | Program-counter relative                       |
| 1 | 0 | Base relative                                  |
| 1 | 1 | Invalid (both cannot be set)                   |
| 0 | 0 | Neither PC relative nor base relative (direct) |

| e | Instruction format   |
|---|----------------------|
| 1 | Format 4 instruction |
| 0 | Format 3 instruction |
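To see how the six bits sit inside an instruction, the following Python sketch extracts them from a Format 3 or Format 4 object code and names the resulting modes. The function names `decode_flags` and `describe` are assumptions made for this example. The sample values 17202D, 69202D and 4B101036 are the object codes of lines 10, 12 and 15 of the SIC/XE program.

```python
def decode_flags(object_code):
    """Split a Format 3/4 object code (hex string) into opcode and n,i,x,b,p,e bits."""
    raw = bytes.fromhex(object_code)
    if len(raw) not in (3, 4):
        raise ValueError("expected a Format 3 or Format 4 instruction")
    opcode = raw[0] & 0xFC  # the upper 6 bits of the first byte
    bits = {
        "n": (raw[0] >> 1) & 1, "i": raw[0] & 1,
        "x": (raw[1] >> 7) & 1, "b": (raw[1] >> 6) & 1,
        "p": (raw[1] >> 5) & 1, "e": (raw[1] >> 4) & 1,
    }
    return opcode, bits


def describe(bits):
    """Name the addressing modes selected by the flag bits."""
    if bits["b"] and bits["p"]:
        raise ValueError("b = 1 and p = 1 is not a valid combination")
    operand = {(1, 0): "indirect", (0, 1): "immediate"}.get((bits["n"], bits["i"]), "simple")
    relative = "base relative" if bits["b"] else "PC relative" if bits["p"] else "direct"
    index = "indexed" if bits["x"] else "not indexed"
    fmt = "Format 4" if bits["e"] else "Format 3"
    return operand, relative, index, fmt


for code in ("17202D", "69202D", "4B101036"):
    opcode, bits = decode_flags(code)
    print(code, f"{opcode:02X}", describe(bits))
```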

Different addressing mode notations

- 1) Indirect Addressing : @
- 2) Immediate Addressing : #
- 3) Extended Format : +
- 4) Indexed Addressing : operand, X
- 5) Character string : C' '
- 6) Base - Register : BASE
- 7) Current value of PC : \*

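→ The addressing modes are tried in the following order of priority:

a) PC relative addressing : $-2048 \leq \text{disp} \leq 2047$  
 (as a 12-bit 2's complement field, $-2048$ is 800 and $2047$ is 7FF in hexadecimal)

b) Base relative addressing : $0 \leq \text{disp} \leq 4095$  
 ($0 \leq \text{disp} \leq \text{FFF}$ in hexadecimal)

c) Extended instruction format : used when the displacement fits neither range; the 20-bit address field then holds the full address.

Note: Negative numbers are represented in 2's complement.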
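The priority order above can be expressed as a short Python sketch. `choose_displacement` is a hypothetical helper, and returning `"extended"` only signals that neither displacement fits; whether the assembler may then use Format 4 depends on the programmer having written the `+` prefix. The addresses in the usage lines are illustrative values consistent with the SIC/XE example (RETADR = 0030, LENGTH = 0033, BUFFER = 0036, CLOOP = 0006).

```python
def choose_displacement(target, pc, base=None):
    """Return (mode, field value) for a target address, trying PC relative first."""
    disp = target - pc
    if -2048 <= disp <= 2047:
        return "pc", disp & 0xFFF          # 12-bit 2's complement field
    if base is not None and 0 <= target - base <= 4095:
        return "base", target - base
    return "extended", target              # needs the 20-bit address field


print(choose_displacement(0x0030, 0x0003))               # ('pc', 45), i.e. 02D
print(choose_displacement(0x0006, 0x001A))               # ('pc', 4076), i.e. FEC
print(choose_displacement(0x0036, 0x1051, base=0x0033))  # ('base', 3)
```

The second call shows a backward reference: the displacement is $-20$, stored as FEC in 2's complement.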

Procedure to create object code for a SIC/XE program

1) Write the LOCCTR address for each instruction in the program.

→ If the statement is

- (i) an instruction with a memory address operand → Format 3 ⇒ add 3 bytes
- (ii) a register-to-register instruction → Format 2 ⇒ add 2 bytes
- (iii) an instruction whose opcode is prefixed with + → Format 4 ⇒ add 4 bytes

→ RESW 2000 ⇒ $2000 \times 3 \text{ bytes} = (6000)_{10} = (1770)_{16}$; add this many bytes to the previous address (multiplication by 3 ∵ each word is 3 bytes)

→ RESW 1 ⇒ add just 3 bytes

→ RESB 2000 ⇒ add 2000 bytes in hexadecimal, i.e. $(2000)_{10} = (7D0)_{16}$

→ RESB 4096 ⇒ $(4096)_{10} = (1000)_{16}$ ⇒ add 1000 (hex) bytes

→ BYTE C'EOF' ⇒ count the length of the constant and add that many bytes

→ Enter the labels into SYMTAB (Pass 1)

2) Once the LOCCTR calculation is done, find the program length = end address − start address.

3) Now start creating the object code (Pass 2): based on the addressing mode, set the corresponding bits and calculate the displacement.

- For extended format, displacement = address
- For register-to-register instructions, write the opcode followed by the register numbers.
  - Eg: CLEAR X → B410 (Format 2), where B4 is the opcode of CLEAR and 1 is the number of register X in the register list.
- For PC relative, disp = TA − (PC)
- For base relative, disp = TA − (B)
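The encoding rules of step 3 can be sketched in Python as shown below. The function names `format2` and `format34` are assumptions for this illustration. The opcode values used in the examples (B4 for CLEAR, A0 for COMPR, 14 for STL, 68 for LDB, 48 for JSUB) come from the standard SIC/XE instruction set, and the displacements are supplied by hand rather than computed.

```python
REGISTERS = {"A": 0, "X": 1, "L": 2, "B": 3, "S": 4, "T": 5, "F": 6}


def format2(opcode, r1, r2=None):
    """Format 2: 8-bit opcode followed by two 4-bit register numbers."""
    second = REGISTERS[r2] if r2 else 0
    return f"{opcode:02X}{REGISTERS[r1]:X}{second:X}"


def format34(opcode, n, i, x, b, p, e, value):
    """Format 3 (e = 0, 12-bit disp) or Format 4 (e = 1, 20-bit address)."""
    first = opcode | (n << 1) | i                 # opcode bits plus n and i
    flags = (x << 3) | (b << 2) | (p << 1) | e    # x, b, p, e as one hex digit
    if e:
        return f"{first:02X}{flags:X}{value & 0xFFFFF:05X}"
    return f"{first:02X}{flags:X}{value & 0xFFF:03X}"


print(format2(0xB4, "X"))                         # CLEAR X      -> B410
print(format34(0x14, 1, 1, 0, 0, 1, 0, 0x02D))    # STL RETADR   -> 17202D
print(format34(0x68, 0, 1, 0, 0, 1, 0, 0x02D))    # LDB #LENGTH  -> 69202D
print(format34(0x48, 1, 1, 0, 0, 0, 1, 0x1036))   # +JSUB RDREC  -> 4B101036
```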

| Line | Label            | Opcode  | Operand  | Comment                                |
|------|------------------|---------|----------|----------------------------------------|
| 5    | COPY             | START   | 0        | COPY FILE FROM INPUT TO OUTPUT         |
| 10   | FIRST            | STL     | RETADR   | SAVE RETURN ADDRESS                    |
| 12   |                  | LDB     | #LENGTH  | ESTABLISH BASE REGISTER                |
| 13   |                  | BASE    | LENGTH   |                                        |
| 15   | CLOOP            | +JSUB   | RDREC    | READ INPUT RECORD                      |
| 20   |                  | LDA     | LENGTH   | TEST FOR EOF (LENGTH = 0)              |
| 25   |                  | COMP    | #0       |                                        |
| 30   |                  | JEQ     | ENDFIL   | EXIT IF EOF FOUND                      |
| 35   |                  | +JSUB   | WRREC    | WRITE OUTPUT RECORD                    |
| 40   |                  | J       | CLOOP    | LOOP                                   |
| 45   | ENDFIL           | LDA     | EOF      | INSERT END OF FILE MARKER              |
| 50   |                  | STA     | BUFFER   |                                        |
| 55   |                  | LDA     | #3       | SET LENGTH = 3                         |
| 60   |                  | STA     | LENGTH   |                                        |
| 65   |                  | +JSUB   | WRREC    | WRITE EOF                              |
| 70   |                  | J       | @RETADR  | RETURN TO CALLER                       |
| 80   | EOF              | BYTE    | C'EOF'   |                                        |
| 95   | RETADR           | RESW    | 1        |                                        |
| 100  | LENGTH           | RESW    | 1        | LENGTH OF RECORD                       |
| 105  | BUFFER           | RESB    | 4096     | 4096-BYTE BUFFER AREA                  |
| 110  | .                |         |          |                                        |
| 115  | .                |         |          | SUBROUTINE TO READ RECORD INTO BUFFER  |
| 120  | .                |         |          |                                        |
| 125  | RDREC            | CLEAR   | X        | CLEAR LOOP COUNTER                     |
| 130  |                  | CLEAR   | A        | CLEAR A TO ZERO                        |
| 132  |                  | CLEAR   | S        | CLEAR S TO ZERO                        |
| 133  |                  | +LDT    | #4096    |                                        |
| 135  | RLOOP            | TD      | INPUT    | TEST INPUT DEVICE                      |
| 140  |                  | JEQ     | RLOOP    | LOOP UNTIL READY                       |
| 145  |                  | RD      | INPUT    | READ CHARACTER INTO REGISTER A         |
| 150  |                  | COMPR   | A,S      | TEST FOR END OF RECORD (X'00')         |
| 155  |                  | JEQ     | EXIT     | EXIT LOOP IF EOR                       |
| 160  |                  | STCH    | BUFFER,X | STORE CHARACTER IN BUFFER              |
| 165  |                  | TIXR    | T        | LOOP UNTIL MAX LENGTH                  |
| 170  |                  | JLT     | RLOOP    | HAS BEEN REACHED                       |
| 175  | EXIT             | STX     | LENGTH   | SAVE RECORD LENGTH                     |
| 180  |                  | RSUB    |          | RETURN TO CALLER                       |
| 185  | INPUT            | BYTE    | X'F1'    | CODE FOR INPUT DEVICE                  |
| 195  | .                |         |          |                                        |
| 200  | .                |         |          | SUBROUTINE TO WRITE RECORD FROM BUFFER |
| 205  | .                |         |          |                                        |
| 210  | WRREC            | CLEAR   | X        | CLEAR LOOP COUNTER                     |
| 212  |                  | LDT     | LENGTH   |                                        |
| 215  | WLOOP            | TD      | OUTPUT   | TEST OUTPUT DEVICE                     |
| 220  |                  | JEQ     | WLOOP    | LOOP UNTIL READY                       |
| 225  |                  | LDCH    | BUFFER,X | GET CHARACTER FROM BUFFER              |
| 230  |                  | WD      | OUTPUT   | WRITE CHARACTER                        |
| 235  |                  | TIXR    | T        | LOOP UNTIL ALL CHARACTERS              |
| 240  |                  | JLT     | WLOOP    | HAVE BEEN WRITTEN                      |
| 245  |                  | RSUB    |          | RETURN TO CALLER                       |
| 250  | OUTPUT           | BYTE    | X'05'    | CODE FOR OUTPUT DEVICE                 |
| 255  |                  | END     | FIRST    |                                        |

Figure 2.5 Example of a SIC/XE program.

| Line | Loc  | Label  | Source statement                       | Object code |
|------|------|--------|----------------------------------------|-------------|
| 5    | 0000 | COPY   | START 0                                |             |
| 10   | 0000 | FIRST  | STL RETADR                             | 17202D      |
| 12   | 0003 |        | LDB #LENGTH                            | 69202D      |
| 13   |      |        | BASE LENGTH                            |             |
| 15   | 0006 | CLOOP  | +JSUB RDREC                            | 4B101036    |
| 20   | 000A |        | LDA LENGTH                             | 032026      |
| 25   | 000D |        | COMP #0                                | 290000      |
| 30   | 0010 |        | JEQ ENDFIL                             | 332007      |
| 35   | 0013 |        | +JSUB WRREC                            | 4B10105D    |
| 40   | 0017 |        | J CLOOP                                | 3F2FEC      |
| 45   | 001A | ENDFIL | LDA EOF                                | 032010      |
| 50   | 001D |        | STA BUFFER                             | 0F2016      |
| 55   | 0020 |        | LDA #3                                 | 010003      |
| 60   | 0023 |        | STA LENGTH                             | 0F200D      |
| 65   | 0026 |        | +JSUB WRREC                            | 4B10105D    |
| 70   | 002A |        | J @RETADR                              | 3E2003      |
| 80   | 002D | EOF    | BYTE C'EOF'                            | 454F46      |
| 95   | 0030 | RETADR | RESW 1                                 |             |
| 100  | 0033 | LENGTH | RESW 1                                 |             |
| 105  | 0036 | BUFFER | RESB 4096                              |             |
| 110  |      | .      |                                        |             |
| 115  |      | .      | SUBROUTINE TO READ RECORD INTO BUFFER  |             |
| 120  |      | .      |                                        |             |
| 125  | 1036 | RDREC  | CLEAR X                                | B410        |
| 130  | 1038 |        | CLEAR A                                | B400        |
| 132  | 103A |        | CLEAR S                                | B440        |
| 133  | 103C |        | +LDT #4096                             | 75101000    |
| 135  | 1040 | RLOOP  | TD INPUT                               | E32019      |
| 140  | 1043 |        | JEQ RLOOP                              | 332FFA      |
| 145  | 1046 |        | RD INPUT                               | DB2013      |
| 150  | 1049 |        | COMPR A,S                              | A004        |
| 155  | 104B |        | JEQ EXIT                               | 332008      |
| 160  | 104E |        | STCH BUFFER,X                          | 57C003      |
| 165  | 1051 |        | TIXR T                                 | B850        |
| 170  | 1053 |        | JLT RLOOP                              | 3B2FEA      |
| 175  | 1056 | EXIT   | STX LENGTH                             | 134000      |
| 180  | 1059 |        | RSUB                                   | 4F0000      |
| 185  | 105C | INPUT  | BYTE X'F1'                             | F1          |
| 195  |      | .      |                                        |             |
| 200  |      | .      | SUBROUTINE TO WRITE RECORD FROM BUFFER |             |
| 205  |      | .      |                                        |             |
| 210  | 105D | WRREC  | CLEAR X                                | B410        |
| 212  | 105F |        | LDT LENGTH                             | 774000      |
| 215  | 1062 | WLOOP  | TD OUTPUT                              | E32011      |
| 220  | 1065 |        | JEQ WLOOP                              | 332FFA      |
| 225  | 1068 |        | LDCH BUFFER,X                          | 53C003      |
| 230  | 106B |        | WD OUTPUT                              | DF2008      |
| 235  | 106E |        | TIXR T                                 | B850        |
| 240  | 1070 |        | JLT WLOOP                              | 3B2FEF      |
| 245  | 1073 |        | RSUB                                   | 4F0000      |
| 250  | 1076 | OUTPUT | BYTE X'05'                             | 05          |
| 255  | 1077 |        | END FIRST                              |             |

Figure 2.6 Program from Fig. 2.5 with object code.

Consider the example of Figure 2.5.

- 1) Add the length of each instruction to LOCCTR to assign an address to every statement, then find the program length.

$$\text{Program length} = \text{end address} - \text{start address} = 1077 - 0000 = 1077$$

- 2) Create the symbol table


| Symbol name | Address (LOCCTR value) |
|-------------|------------------------|
| FIRST       | 0000                   |
| CLOOP       | 0006                   |
| ENDFIL      | 001A                   |
| EOF         | 002D                   |
| RETADR      | 0030                   |
| LENGTH      | 0033                   |
| BUFFER      | 0036                   |
| RDREC       | 1036                   |
| RLOOP       | 1040                   |
| EXIT        | 1056                   |
| INPUT       | 105C                   |
| WRREC       | 105D                   |
| WLOOP       | 1062                   |
| OUTPUT      | 1076                   |
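Steps 1 and 2 together form Pass 1 of the assembler. The following is a minimal Python sketch of that pass, not the assembler's actual implementation. The names `statement_length`, `pass1` and `FORMAT2` are hypothetical, and the length rules cover only the mnemonics and directives that appear in these notes.

```python
# Hypothetical helper names; only the mnemonics used in these notes are handled.
FORMAT2 = {"CLEAR", "COMPR", "TIXR"}          # register-to-register instructions


def statement_length(opcode, operand):
    """Return the number of bytes one source statement occupies."""
    if opcode.startswith("+"):
        return 4                               # extended format (format 4)
    if opcode in FORMAT2:
        return 2
    if opcode == "RESW":
        return 3 * int(operand)                # one word = 3 bytes
    if opcode == "RESB":
        return int(operand)
    if opcode == "WORD":
        return 3
    if opcode == "BYTE":
        body = operand[2:-1]                   # text between the quotes
        return len(body) if operand[0] == "C" else len(body) // 2
    return 3                                   # every other mnemonic here is format 3


def pass1(statements, start=0):
    """Assign addresses, build SYMTAB and return (symtab, program_length)."""
    locctr, symtab = start, {}
    for label, opcode, operand in statements:
        if opcode in ("BASE", "END"):
            continue                           # directives that occupy no memory
        if label:
            symtab[label] = locctr
        locctr += statement_length(opcode, operand)
    return symtab, locctr - start


if __name__ == "__main__":
    head = [("FIRST", "STL", "RETADR"), ("", "LDB", "#LENGTH"),
            ("", "BASE", "LENGTH"), ("CLOOP", "+JSUB", "RDREC")]
    table, length = pass1(head)
    print({name: f"{addr:04X}" for name, addr in table.items()}, f"{length:04X}")
```

Feeding all statements of Figure 2.5 to `pass1` should reproduce the symbol table above, since each label simply records the LOCCTR value at the point where it is defined.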

### 3) Start creating object code (Pass 2)

Line 10, Loc 0000: FIRST STL RETADR (Format 3: the operand is a memory address; opcode STL = 14)

By default the assembler first tries PC-relative addressing.

$$TA = (PC) + \text{disp}$$

$$\text{disp} = TA - (PC) = \text{RETADR} - (\text{address of the next instruction}) = 0030 - 0003 = 02D$$

→ 02D lies within the PC-relative range $-2048 \leq \text{disp} \leq 2047$, so it fits in the 12-bit displacement field.

→ The opcode occupies only 6 bits (1 nibble + 2 bits). The last 2 bits of every 8-bit opcode are always zero, so those two bit positions are used for the n and i flags.

Ex: the second hex digit of an opcode is always 0, 4, 8 or C, e.g. 4 ⇒ 0100 and C ⇒ 1100 (only the first 2 bits are used).

→ Not immediate and not indirect, so set n = 1, i = 1.

→ Not indexed (x = 0), not base relative (b = 0), but PC relative (p = 1), and not format 4 (e = 0).

→ Write the object code of the whole instruction:

| opcode (6 bits) | n | i | x | b | p | e | disp |
|-----------------|---|---|---|---|---|---|------|
| 000101          | 1 | 1 | 0 | 0 | 1 | 0 | 02D  |

Nibble representation: 0001 0111 0010 02D ⇒ 1 7 2 02D

STL RETADR ⇒ 17202D
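The bit packing for a format 3 instruction can be sketched in Python as follows. This is an illustrative sketch under the field layout described in these notes (6-bit opcode, flags n i x b p e, 12-bit displacement); the function names `encode_format3` and `pc_relative_disp` are hypothetical.

```python
def pc_relative_disp(target, next_loc):
    """Displacement TA - (PC), where PC holds the address of the next instruction."""
    disp = target - next_loc
    if not -2048 <= disp <= 2047:
        raise ValueError("displacement does not fit PC-relative addressing")
    return disp


def encode_format3(opcode, n, i, x, b, p, e, disp):
    """Pack a SIC/XE format 3 instruction into a 6-digit hex string."""
    first_byte = (opcode & 0xFC) | (n << 1) | i        # top 6 opcode bits, then n and i
    flags = (x << 3) | (b << 2) | (p << 1) | e         # the x b p e nibble
    word = (first_byte << 16) | (flags << 12) | (disp & 0xFFF)
    return f"{word:06X}"


if __name__ == "__main__":
    # STL RETADR at 0000: opcode 14, RETADR = 0030, next instruction at 0003
    disp = pc_relative_disp(0x0030, 0x0003)
    print(encode_format3(0x14, 1, 1, 0, 0, 1, 0, disp))
```

Masking the displacement with `0xFFF` also covers negative displacements, which are stored in 12-bit two's complement form.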

Line 12, Loc 0003: LDB #LENGTH (Format 3; opcode LDB = 68, LENGTH = 0033)

→ The operand is immediate, but it is a symbol (a memory address), so a displacement is still calculated.

$$TA = (PC) + \text{disp}$$

$$\text{disp} = TA - (PC) = 0033 - 0006 = 02D$$

→ PC relative (p = 1), because the operand is a memory address.

→ Immediate addressing, so n = 0, i = 1.

LDB #LENGTH ⇒ 69202D

Line 15, Loc 0006: CLOOP +JSUB RDREC → Format 4 (opcode JSUB = 48, RDREC = 1036)

→ In extended format (format 4) the address field holds the full operand address  
 = 01036 (20 bits)

→ Not immediate, not indirect: n = 1, i = 1.

→ Extended format: e = 1 (x = 0, b = 0, p = 0).

CLOOP +JSUB RDREC ⇒ 4B101036
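For comparison, a format 4 instruction carries a 20-bit address instead of a 12-bit displacement. The sketch below is a hedged illustration with a hypothetical function name, `encode_format4`; it assumes b = 0 and p = 0, which is how the extended-format instructions in these notes are assembled.

```python
def encode_format4(opcode, n, i, x, address):
    """Pack a SIC/XE format 4 instruction into an 8-digit hex string.

    Assumes b = 0 and p = 0: the 20-bit field holds the full address.
    """
    first_byte = (opcode & 0xFC) | (n << 1) | i
    flags = (x << 3) | 0b0001                  # x, b = 0, p = 0, e = 1
    word = (first_byte << 24) | (flags << 20) | (address & 0xFFFFF)
    return f"{word:08X}"


if __name__ == "__main__":
    print(encode_format4(0x48, 1, 1, 0, 0x1036))   # +JSUB RDREC
    print(encode_format4(0x74, 0, 1, 0, 4096))     # +LDT #4096 (immediate)
```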

Line 20, Loc 000A: LDA LENGTH → Format 3 (opcode LDA = 00)

→ PC relative, so p = 1

$$\text{disp} = TA - (PC) = 0033 - 000D = 026$$

→ Not immediate, not indirect, not indexed, so n = 1, i = 1, x = 0

LDA LENGTH ⇒ 032026

Line 25, Loc 000D: COMP #0 (opcode COMP = 28)

→ Immediate, and not PC relative, because the operand is a direct value and not a memory address.

∴ displacement = operand = 000

→ Immediate addressing: n = 0, i = 1, b = 0, p = 0

COMP #0 ⇒ 290000

Line 30, Loc 0010: JEQ ENDFIL ⇒ Format 3 (opcode JEQ = 30, ENDFIL = 001A)

→ PC relative ∴ disp = TA − (PC)

$$= 001A - 0013 = 007$$

(within range)

→ Not immediate, not indirect: n = 1, i = 1

3 3 2 007

∴ JEQ ENDFIL ⇒ 332007

Lines 35 and 65, Loc 0013 and 0026: +JSUB WRREC ⇒ Format 4 (opcode JSUB = 48, WRREC = 105D)

→ Address field = address of the operand = 0105D

+JSUB WRREC ⇒ 4B10105D

Line 40, Loc 0017: J CLOOP ⇒ Format 3 (opcode J = 3C, CLOOP = 0006)

$$\text{disp} = TA - (PC) = 0006 - 001A = -14 \text{ (hex)}$$

A negative displacement is stored in 12-bit 2's complement form:

$$-14_{16} = FEC$$

• It is PC relative, so p = 1

• Not immediate, not indirect: n = 1, i = 1

Object code: 3F2FEC
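The two's complement step for a backward jump can be checked with a short Python sketch. The helper names `to_12bit` and `from_12bit` are hypothetical and only illustrate the arithmetic.

```python
def to_12bit(disp):
    """Encode a signed displacement as a 12-bit two's complement field."""
    if not -2048 <= disp <= 2047:
        raise ValueError("displacement out of the 12-bit signed range")
    return disp & 0xFFF


def from_12bit(field):
    """Decode a 12-bit field back into a signed displacement."""
    return field - 0x1000 if field & 0x800 else field


if __name__ == "__main__":
    disp = 0x0006 - 0x001A                     # J CLOOP: TA - (PC)
    print(f"{to_12bit(disp):03X}")
```

Decoding with `from_12bit` is what the machine effectively does at run time when it adds the displacement to the program counter.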

Line 45, Loc 001A: ENDFIL LDA EOF ⇒ Format 3 (PC relative; opcode LDA = 00, EOF = 002D)

$$\text{disp} = TA - (PC) = 002D - 001D = 010$$

LDA EOF ⇒ 032010

Line 50, Loc 001D: STA BUFFER ⇒ Format 3 (PC relative; opcode STA = 0C, BUFFER = 0036)

$$\text{disp} = TA - (PC) = 0036 - 0020 = 016$$

∴ STA BUFFER ⇒ 0F2016

Line 55, Loc 0020: LDA #3 ⇒ immediate addressing

$$\text{disp} = 003$$

LDA #3 ⇒ 010003

Line 60, Loc 0023: STA LENGTH ⇒ Format 3 (PC relative; opcode STA = 0C, LENGTH = 0033)

$$\text{disp} = TA - (PC) = 0033 - 0026 = 00D$$

STA LENGTH ⇒ 0F200D

Line 70, Loc 002A: J @RETADR ⇒ Format 3 + indirect (opcode J = 3C, RETADR = 0030; n = 1, i = 0)

$$\text{disp} = TA - (PC) = 0030 - 002D = 003$$

J @RETADR ⇒ 3E2003

Line 80, Loc 002D: EOF BYTE C'EOF'

⇒ Convert each character of EOF to its hexadecimal ASCII value:

$$E \rightarrow 45$$

$$O \rightarrow 4F$$

$$F \rightarrow 46$$

BYTE C'EOF' ⇒ 454F46
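The conversion of a BYTE operand into object code can be sketched in Python as below. The function name `byte_constant` is hypothetical; the sketch assumes only the two operand forms used in these notes, `C'...'` for characters and `X'...'` for hexadecimal digits.

```python
def byte_constant(operand):
    """Return the object code (hex string) for a BYTE operand."""
    kind, body = operand[0], operand[2:-1]     # strip the leading C' or X' and the closing quote
    if kind == "C":
        return body.encode("ascii").hex().upper()   # one byte per character
    if kind == "X":
        return body.upper()                          # already hexadecimal: store as it is
    raise ValueError(f"unsupported BYTE operand: {operand}")


if __name__ == "__main__":
    print(byte_constant("C'EOF'"))
    print(byte_constant("X'F1'"))
```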

Register-to-register instructions (Format 2) use the register numbers:

| Register | A | X | L | B | S | T | F | PC | SW |
|----------|---|---|---|---|---|---|---|----|----|
| Number   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 8  | 9  |

Line 125, Loc 1036: RDREC CLEAR X (opcode CLEAR = B4)

⇒ B410 (X = 1; the trailing 0 is the unused second register field, not the accumulator)

⇒ only 2 bytes, since it is register-to-register mode

Line 130, Loc 1038: CLEAR A ⇒ B400

Line 132, Loc 103A: CLEAR S ⇒ B440

Line 150, Loc 1049: COMPR A,S (opcode COMPR = A0) ⇒ A004

Line 165, Loc 1051: TIXR T (opcode TIXR = B8) ⇒ B850

Line 235, Loc 106E: TIXR T ⇒ B850

Line 210, Loc 105D: WRREC CLEAR X ⇒ B410
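A format 2 instruction is just the opcode byte followed by two register nibbles. A minimal Python sketch, with the hypothetical name `encode_format2` and the register numbers listed in these notes:

```python
REGISTERS = {"A": 0, "X": 1, "L": 2, "B": 3, "S": 4, "T": 5, "F": 6, "PC": 8, "SW": 9}


def encode_format2(opcode, r1, r2=None):
    """Pack a format 2 instruction: opcode byte, then r1 and r2 nibbles.

    A missing second register is encoded as 0.
    """
    n1 = REGISTERS[r1]
    n2 = REGISTERS[r2] if r2 is not None else 0
    return f"{opcode:02X}{n1:X}{n2:X}"


if __name__ == "__main__":
    print(encode_format2(0xB4, "X"))           # CLEAR X
    print(encode_format2(0xA0, "A", "S"))      # COMPR A,S
    print(encode_format2(0xB8, "T"))           # TIXR T
```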

Line 133, Loc 103C: +LDT #4096 ⇒ Format 4 and immediate addressing (opcode LDT = 74)

address field = $(4096)_{10} = (01000)_{16}$

| opcode (6 bits) | n | i | x | b | p | e | address |
|-----------------|---|---|---|---|---|---|---------|
| 011101          | 0 | 1 | 0 | 0 | 0 | 1 | 01000   |

+LDT #4096 ⇒ 75101000

Line 135, Loc 1040: RLOOP TD INPUT ⇒ Format 3 + PC relative (opcode TD = E0, INPUT = 105C)

$$\text{disp} = TA - (PC) = 105C - 1043 = 019$$

TD INPUT ⇒ E32019

Line 140, Loc 1043: JEQ RLOOP ⇒ Format 3 + PC relative (opcode JEQ = 30, RLOOP = 1040)

$$\text{disp} = TA - (PC) = 1040 - 1046 = -6 = FFA$$

JEQ RLOOP ⇒ 332FFA

Line 145, Loc 1046: RD INPUT ⇒ Format 3 + PC relative (opcode RD = D8, INPUT = 105C)

$$\text{disp} = TA - (PC) = 105C - 1049 = 013$$

RD INPUT ⇒ DB2013

Line 155, Loc 104B: JEQ EXIT ⇒ Format 3 + PC relative (opcode JEQ = 30, EXIT = 1056)

$$\text{disp} = TA - (PC) = 1056 - 104E = 008$$

JEQ EXIT ⇒ 332008



Line 160, Loc 104E: STCH BUFFER,X ⇒ indexed; try PC relative first (opcode STCH = 54, BUFFER = 0036)

$$\text{disp} = TA - (PC) = 0036 - 1051 = -101B_{16} = (-4123)_{10}$$

This is outside the range $-2048 \leq \text{disp} \leq 2047$.

∴ PC relative is not possible; go for base relative.

$$\text{disp} = TA - (B) = \text{BUFFER} - (B) = 0036 - 0033 = 003$$

(the address of LENGTH, 0033, is held in the base register)

→ n = 1, i = 1, x = 1, b = 1, p = 0, e = 0

STCH BUFFER,X ⇒ 57C003
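The rule "try PC relative first, then fall back to base relative" can be sketched in Python as follows. The function name `choose_displacement` is hypothetical, and the sketch assumes the usual ranges: −2048 to 2047 for PC relative and 0 to 4095 for base relative.

```python
def choose_displacement(target, next_loc, base=None):
    """Return (b, p, disp) for a format 3 instruction.

    target   : address of the operand (TA)
    next_loc : address of the next instruction, i.e. the PC value
    base     : contents of register B as declared by BASE, or None
    """
    disp = target - next_loc
    if -2048 <= disp <= 2047:
        return 0, 1, disp & 0xFFF              # PC relative, 12-bit two's complement
    if base is not None and 0 <= target - base <= 4095:
        return 1, 0, target - base             # base relative, unsigned
    raise ValueError("operand not reachable in format 3; use format 4")


if __name__ == "__main__":
    # STCH BUFFER,X at 104E with BASE LENGTH (B = 0033)
    print(choose_displacement(0x0036, 0x1051, base=0x0033))
```

If neither mode reaches the operand, the programmer has to write the instruction in extended format with the `+` prefix.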



Line 170, Loc 1053: JLT RLOOP ⇒ Format 3 + PC relative (opcode JLT = 38, RLOOP = 1040)

$$\text{disp} = TA - (PC) = 1040 - 1056 = -16_{16} = FEA$$

| opcode (6 bits) | n | i | x | b | p | e | disp |
|-----------------|---|---|---|---|---|---|------|
| 001110          | 1 | 1 | 0 | 0 | 1 | 0 | FEA  |

JLT RLOOP ⇒ 3B2FEA

Line 175, Loc 1056: EXIT STX LENGTH ⇒ Format 3; try PC relative first (opcode STX = 10, LENGTH = 0033)

$$\text{disp} = TA - (PC) = 0033 - 1059 = -1026_{16} = (-4134)_{10}$$

This is outside the PC-relative range.

∴ go for base relative mode

$$\text{disp} = TA - (B) = 0033 - 0033 = 000$$

STX LENGTH ⇒ 134000

Line 180, Loc 1059: RSUB (opcode RSUB = 4C)  
no operand ∴ no displacement.  
⇒ Format 3, n = 1, i = 1

RSUB ⇒ 4F0000

Line 185, Loc 105C: INPUT BYTE X'F1'

⇒ hexadecimal constant ∴ store as it is ⇒ F1

Line 212, Loc 105F: LDT LENGTH ⇒ Format 3 (opcode LDT = 74, LENGTH = 0033)

$$\text{disp} = TA - (PC) = 0033 - 1062 = -102F_{16} = (-4143)_{10}$$

This is outside the PC-relative range.

∴ go for base relative

$$\text{disp} = TA - (B) = 0033 - 0033 = 000$$

LDT LENGTH ⇒ 774000



Line 215, Loc 1062: WLOOP TD OUTPUT ⇒ Format 3 + PC relative (opcode TD = E0, OUTPUT = 1076)

$$\text{disp} = TA - (PC) = 1076 - 1065 = 011$$

TD OUTPUT ⇒ E32011

Line 220, Loc 1065: JEQ WLOOP ⇒ Format 3 + PC relative (opcode JEQ = 30, WLOOP = 1062)

$$\text{disp} = TA - (PC) = 1062 - 1068 = -6 = FFA$$

JEQ WLOOP ⇒ 332FFA

Line 225, Loc 1068: LDCH BUFFER,X ⇒ indexed (opcode LDCH = 50, BUFFER = 0036)

$$\text{disp} = TA - (PC) = 0036 - 106B = -1035_{16} = (-4149)_{10}$$

This is outside the PC-relative range.

∴ go for base relative mode

$$\text{disp} = TA - (B) = 0036 - 0033 = 003$$

LDCH BUFFER,X ⇒ 53C003

Line 230, Loc 106B: WD OUTPUT ⇒ Format 3 + PC relative (opcode WD = DC, OUTPUT = 1076)

$$\text{disp} = TA - (PC) = 1076 - 106E = 008$$

WD OUTPUT ⇒ DF2008

Line 240, Loc 1070: JLT WLOOP ⇒ Format 3 + PC relative (opcode JLT = 38, WLOOP = 1062)

$$\text{disp} = TA - (PC) = 1062 - 1073 = -11_{16} = FEF$$

JLT WLOOP ⇒ 3B2FEF

Line 245, Loc 1073: RSUB ⇒ Format 3, no operand ⇒ 4F0000

Line 250, Loc 1076: OUTPUT BYTE X'05' ⇒ hexadecimal constant ∴ store as it is ⇒ 05

#### ii) object program

H^COPY^000000^001077

T^000000^1D^17202D^69202D^4B101036^032026^290000^332007^4B10105D^3F2FEC^032010

T^00001D^13^0F2016^010003^0F200D^4B10105D^3E2003^454F46

T^001036^1D^B410^B400^B440^75101000^E32019^332FFA^DB2013^A004^332008^57C003^B850

T^001053^1D^3B2FEA^134000^4F0000^F1^B410^774000^E32011^332FFA^53C003^DF2008^B850

T^001070^07^3B2FEF^4F0000^05

E^000000
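How a text record is put together from the object codes can be sketched in Python as below. The function name `text_record` is hypothetical, and `^` is used as the field separator only for readability, as in these notes. The sketch assumes the usual limit of 30 bytes (1E hex) of object code per text record.

```python
def text_record(start, object_codes):
    """Build one text record: T, start address, length in bytes, object code."""
    body = "".join(object_codes)
    length = len(body) // 2                    # two hex digits per byte
    if length > 0x1E:
        raise ValueError("a text record holds at most 30 bytes of object code")
    return "^".join(["T", f"{start:06X}", f"{length:02X}", *object_codes])


if __name__ == "__main__":
    print(text_record(0x1070, ["3B2FEF", "4F0000", "05"]))
```

A new text record is also started whenever the addresses are not contiguous, which is why the record for location 0036 onwards (the RESB area) is absent and the next record begins at 1036.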


## Loading into memory

1. Generate the complete object program for the following assembly level program.

Given opcodes: CLEAR = B4, LDA = 00, LDB = 68, ADD = 18, TIX = 2C, JLT = 38, STA = 0C

| LOCCTR (Pass 1) | LENGTH | LABEL | OPCODE | OPERAND         | OBJECT CODE (Pass 2) |
|-----------------|--------|-------|--------|-----------------|----------------------|
| 0000            |        | SUM   | START  | 0               |                      |
| 0000            | 2      |       | CLEAR  | X               | B410                 |
| 0002            | 3      |       | LDA    | #0              | 010000               |
| 0005            | 4      |       | +LDB   | #TOTAL          | 69101789             |
|                 |        |       | BASE   | TOTAL           |                      |
| 0009            | 3      | LOOP  | ADD    | TABLE,X         | 1BA00D               |
| 000C            | 3      |       | TIX    | COUNT           | 2F2007               |
| 000F            | 3      |       | JLT    | LOOP            | 3B2FF7               |
| 0012            | 4      |       | +STA   | TOTAL           | 0F101789             |
| 0016            | 3      | COUNT | RESW   | 1               |                      |
| 0019            | 1770   | TABLE | RESW   | 2000<br>(1770)H |                      |
| 1789            | 3      | TOTAL | RESW   | 1               |                      |
| 178C            |        |       | END    | FIRST           |                      |

$$\text{RESW } 2000 \Rightarrow 2000 \times 3 = (\text{6000})_{\text{bytes}} = (1770)_{\text{H}}$$

$$\therefore 0019 + 1770 = 1789$$

$$\text{Program Length} = 178C - 0000 = 178C$$

1) 0000 CLEAR X (register-to-register, Format 2)

⇒ the opcode is followed directly by the register numbers

⇒ B410

2) 0002 LDA #0 ⇒ immediate addressing (n = 0, i = 1)

$$\text{disp} = 000$$

⇒ 010000

3) 0005 +LDB #TOTAL ⇒ Format 4 (extended) with immediate addressing (n = 0, i = 1, e = 1)

$$\text{address} = \text{operand address} = 01789$$

⇒ 69101789

4) 0009 LOOP ADD TABLE,X ⇒ indexed with PC relative (n = 1, i = 1, x = 1, b = 0, p = 1, e = 0)

$$TA = (PC) + \text{disp}$$

$$\text{disp} = TA - (PC) = 0019 - 000C = 00D$$

⇒ 1BA00D

5) 000C TIX COUNT ⇒ Format 3, PC relative

$$\text{disp} = 0016 - 000F = 007$$

⇒ 2F2007

6) 000F JLT LOOP ⇒ Format 3, PC relative

$$\text{disp} = TA - (PC) = 0009 - 0012 = -9 = FF7$$

within range: $-2048 \leq -9 \leq 2047$

⇒ 3B2FF7

7) 0012 +STA TOTAL ⇒ Format 4

$$\text{address} = \text{operand address} = 01789$$

⇒ 0F101789

object program (Pass-2)

H^SUM^000000^00178C

T^000000^16^B410^010000^69101789^1BA00D^2F2007^3B2FF7^0F101789

E^000000

SYMTAB  $\rightarrow$

(Pass-1)

| SYMBOL NAME | VALUE |
|-------------|-------|
| LOOP        | 0009  |
| COUNT       | 0016  |
| TABLE       | 0019  |
| TOTAL       | 1789  |

2. Generate the complete object program for the following assembly level program. Also indicate the contents of the symbol table at the end. Assume the standard SIC model and the following opcodes in hex:

|          |          |           |          |
|----------|----------|-----------|----------|
| LDA = 00 | STA = 0C | TIX = 2C  | JLT = 38 |
| LDX = 04 | ADD = 18 | RSUB = 4C |          |

| LOCCTR (Pass 1) | LENGTH | LABEL | OPCODE | OPERAND         | OBJECT CODE (Pass 2) |
|-----------------|--------|-------|--------|-----------------|----------------------|
|                 |        | SUM   | START  | 4000            |                      |
| 4000            | 3      | FIRST | LDX    | ZERO            | 045788               |
| 4003            | 3      |       | LDA    | ZERO            | 005788               |
| 4006            | 3      | LOOP  | ADD    | TABLE,X         | 18C015               |
| 4009            | 3      |       | TIX    | COUNT           | 2C5785               |
| 400C            | 3      |       | JLT    | LOOP            | 384006               |
| 400F            | 3      |       | STA    | TOTAL           | 0C578B               |
| 4012            | 3      |       | RSUB   |                 | 4C0000               |
| 4015            | 1770   | TABLE | RESW   | 2000<br>(1770)H |                      |
| 5785            | 3      | COUNT | RESW   | 1               |                      |
| 5788            | 3      | ZERO  | WORD   | 0               | 000000               |
| 578B            | 3      | TOTAL | RESW   | 1               |                      |
| 578E            |        |       | END    | FIRST           |                      |

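Program length = End address - starting address  
 $= 578E - 4000 = 178E$

→ Since it is a standard SIC program, only two addressing modes are available:

- direct addressing ( $x=0$ )
- indexed addressing ( $x=1$ )

So the object code is simply the opcode followed by the operand address.

1) 4000 FIRST LDX ZERO (opcode LDX = 04, ZERO = 5788)

$\Rightarrow 045788$

2) 4006 LOOP ADD TABLE,X  $\Rightarrow$  indexed addressing (opcode ADD = 18, TABLE = 4015)

The x bit is the leftmost bit of the 16-bit address half, so the address field becomes $4015 + 8000 = C015$.

$\Rightarrow 18C015$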
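The standard SIC instruction layout (8-bit opcode, 1-bit x flag, 15-bit address) can be sketched in Python as follows. The function name `encode_sic` is hypothetical.

```python
def encode_sic(opcode, address, indexed=False):
    """Pack a standard SIC instruction: opcode (8 bits), x (1 bit), address (15 bits)."""
    x = 1 if indexed else 0
    word = (opcode << 16) | (x << 15) | (address & 0x7FFF)
    return f"{word:06X}"


if __name__ == "__main__":
    print(encode_sic(0x04, 0x5788))                  # LDX ZERO
    print(encode_sic(0x18, 0x4015, indexed=True))    # ADD TABLE,X
```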

SYMTAB

| NAME  | VALUE |
|-------|-------|
| FIRST | 4000  |
| LOOP  | 4006  |
| TABLE | 4015  |
| COUNT | 5785  |
| ZERO  | 5788  |
| TOTAL | 578B  |

object program

H^SUM^004000^00178E

T^004000^15^045788^005788^18C015^2C5785^384006^0C578B^4C0000

T^005788^03^000000

E^004000

## LOADING INTO MAIN MEMORY

|      | 0  | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | A  | B  | C  | D  | E  | F  |
|------|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 0000 |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
| 0010 |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
| ⋮    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
| 4000 | 04 | 57 | 88 | 00 | 57 | 88 | 18 | C0 | 15 | 2C | 57 | 85 | 38 | 40 | 06 | 0C |
| 4010 | 57 | 8B | 4C | 00 | 00 |    |    |    |    |    |    |    |    |    |    |    |
| 4020 |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
| ⋮    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
| 5780 |    |    |    |    |    |    |    |    | 00 | 00 | 00 |    |    |    |    |    |
| 5790 |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |

TABLE occupies locations 4015 to 5784, COUNT 5785 to 5787 and TOTAL 578B to 578D. These reserved areas receive no initial value from the object program. ZERO (5788 to 578A) is loaded with 000000.
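What the loader does with each text record can be sketched in Python as below. This is a simplified illustration, not a complete absolute loader: the function name `load_text_record` is hypothetical, memory is modelled as a dictionary from address to byte, and the `^` separator follows the notation of these notes.

```python
def load_text_record(memory, record):
    """Copy the object code of one text record into memory (a dict: address -> byte)."""
    fields = record.split("^")
    if fields[0] != "T":
        raise ValueError("not a text record")
    start = int(fields[1], 16)
    length = int(fields[2], 16)
    data = bytes.fromhex("".join(fields[3:]))
    if len(data) != length:
        raise ValueError("length field does not match the object code")
    for offset, byte in enumerate(data):
        memory[start + offset] = byte


if __name__ == "__main__":
    memory = {}
    load_text_record(memory, "T^004000^15^045788^005788^18C015^2C5785^384006^0C578B^4C0000")
    load_text_record(memory, "T^005788^03^000000")
    row = " ".join(f"{memory[a]:02X}" for a in range(0x4000, 0x4010))
    print("4000:", row)
```

Addresses that no text record touches, such as the RESW areas, simply never appear in `memory`, which mirrors the blank cells in the table above.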

3. Generate the object code for each statement in the following SIC/XE program and generate the object program for the same.

| LOCCTR | LENGTH | LABEL  | OPCODE | OPERAND         | OBJECT CODE |
|--------|--------|--------|--------|-----------------|-------------|
|        |        | SUM    | START  | 0               |             |
| 0000   | 3      | FIRST  | LDX    | #0              | 050000      |
| 0003   | 3      |        | LDA    | #0              | 010000      |
| 0006   | 4      |        | +LDB   | #TABLE2         | 69101790    |
|        |        |        | BASE   | TABLE2          |             |
| 000A   | 3      | LOOP   | ADD    | TABLE,X         | 1BA013      |
| 000D   | 3      |        | ADD    | TABLE2,X        | 1BC000      |
| 0010   | 3      |        | TIX    | COUNT           | 2F200A      |
| 0013   | 3      |        | JLT    | LOOP            | 3B2FF4      |
| 0016   | 4      |        | +STA   | TOTAL           | 0F102F00    |
| 001A   | 3      |        | RSUB   |                 | 4F0000      |
| 001D   | 3      | COUNT  | RESW   | 1               |             |
| 0020   | 1770   | TABLE  | RESW   | 2000<br>(1770)H |             |
| 1790   | 1770   | TABLE2 | RESW   | 2000<br>(1770)H |             |
| 2F00   | 3      | TOTAL  | RESW   | 1               |             |
| 2F03   |        |        | END    | FIRST           |             |

$$LDX = 04$$

$$LDB = 68$$

$$TIX = 2C$$

$$STA = 0C$$

$$LDA = 00$$

$$ADD = 18$$

$$JLT = 38$$

$$RSUB = 4C$$

→ Keep assigning the length of each instruction based on:

- (i) 1st operand is a memory address: 3 bytes
- (ii) 1st operand is a register: 2 bytes
- (iii) + (extended format): 4 bytes

→ Find the LOCCTR values; Program length = 2F03 - 0000  
= 2F03

→ Create symTAB

| Name   | Value |
|--------|-------|
| FIRST  | 0000  |
| LOOP   | 000A  |
| COUNT  | 001D  |
| TABLE  | 0020  |
| TABLE2 | 1790  |
| TOTAL  | 2F00  |

→ object code for each instruction

1) 0000 FIRST LDX #0 → immediate addressing (n = 0, i = 1)

$$\text{disp} = 000$$

⇒ 050000

2) 0003 LDA #0 → immediate addressing (n = 0, i = 1)

$$\text{disp} = 000$$

⇒ 010000

3) 0006 +LDB #TABLE2 → extended (Format 4) + immediate

In format 4 no displacement is computed; the 20-bit address field holds the target address itself.

$$\text{address} = \text{target address} = 01790$$

| opcode (6 bits) | n | i | x | b | p | e | address |
|-----------------|---|---|---|---|---|---|---------|
| 011010          | 0 | 1 | 0 | 0 | 0 | 1 | 01790   |

⇒ 69101790

4) 000A LOOP ADD TABLE,X → indexed + PC relative

$$TA = (PC) + \text{disp}$$

$$\text{disp} = TA - (PC) = 0020 - 000D = 013$$

| opcode (6 bits) | n | i | x | b | p | e | disp |
|-----------------|---|---|---|---|---|---|------|
| 000110          | 1 | 1 | 1 | 0 | 1 | 0 | 013  |

⇒ 1BA013

5) 000D ADD TABLE2,X → indexed + base relative

(∵ the address of TABLE2 is held in the base register)

Initially we try PC relative and check whether the displacement is within range.

$$\text{disp} = TA - (PC) = 1790 - 0010 = (1780)_{H} = (6016)_{10} > (2047)_{10}$$

∴ go for base relative

$$\text{disp} = TA - (B) \text{ (look up the address of TABLE2 in SYMTAB)}$$

$$= 1790 - 1790 = 000$$

| opcode (6 bits) | n | i | x | b | p | e | disp |
|-----------------|---|---|---|---|---|---|------|
| 000110          | 1 | 1 | 1 | 1 | 0 | 0 | 000  |

⇒ 1BC000

6) 0010 TIX COUNT → PC relative (opcode TIX = 2C)

$$\text{disp} = TA - (PC) = 001D - 0013 = 00A$$

⇒ 2F200A

7) 0013 JLT LOOP ⇒ PC relative (opcode JLT = 38)

$$\text{disp} = TA - (PC) = 000A - 0016 = -C = FF4$$

⇒ 3B2FF4

8) 0016 +STA TOTAL ⇒ extended (Format 4; opcode STA = 0C)

$$\text{address} = \text{address of TOTAL} = 02F00$$

⇒ 0F102F00

9) 001A RSUB (opcode RSUB = 4C)  
 ⇒ no operand ∴ no displacement

⇒ 4F0000

→ Object program

H^SUM^000000^002F03

T^000000^1D^050000^010000^69101790^1BA013^1BC000^2F200A^3B2FF4^0F102F00^4F0000

E^000000

→ The loader loads the object program into memory.

4. Generate the machine code for the following SIC/XE program

Given JSUB = A0, LDA = 80, LDX = 60, STA = E0, COMP = 90,  
 RSUB = AC, J = B8

| LOCCTR | LENGTH | LABEL        | OPCODE | OPERAND      | OBJECT CODE |
|--------|--------|--------------|--------|--------------|-------------|
|        |        | COPY         | START  | 1000         |             |
| 1000   | 4      | CLOOP        | +JSUB  | RDREC        |             |
|        | 3      |              | LDA    | LENGTH       |             |
|        | 3      |              | COMP   | ZERO         |             |
|        | 3      |              | JEQ    | EXIT         |             |
|        | 3      |              | J      | CLOOP        |             |
|        | 3      | EXIT         | STA    | BUFFER       |             |
|        | 3      |              | LDA    | THREE        |             |
|        | 3      |              | STA    | TOTAL LENGTH |             |
|        | 3      |              | RSUB   |              |             |
|        |        | BUFFER       | RESW   | 100          |             |
|        | 3      | EOF          | BYTE   | C'EOF'       |             |
|        | 3      | ZERO         | WORD   | 0            |             |
|        | 3      | THREE        | WORD   | 3            |             |
|        | 3      | LENGTH       | RESW   | 1            |             |
|        | 3      | TOTAL LENGTH | RESW   | 1            |             |
|        | 3      | RDREC        | LDX    | ZERO         |             |

2.2.2. Program Relocation  
↳ An absolute assembly program is one that executes properly only if it is loaded at the location specified at assembly time.

Ex: All SIC programs are absolute assembly programs.

Consider the SIC program

| Line | Loc  | Label | Opcode | Operand | Object code |
|------|------|-------|--------|---------|-------------|
| 5    | 1000 | COPY  | START  | 1000    |             |
| 10   | 1000 | FIRST | STL    | RETADR  | 141033      |
| 15   | 1003 | CLOOP | JSUB   | RDREC   | 482039      |
| ⋮    |      |       |        |         |             |
| 55   | 101B |       | LDA    | THREE   | 00102D      |
| ⋮    |      |       |        |         |             |
| 85   | 102D | THREE | WORD   | 3       | 000003      |

- Here the program is loaded at address 1000.
- Line no. 55 specifies that register A is to be loaded from memory address 102D (see its object code).
- Suppose we attempt to load and execute the program at address 2000 instead of address 1000. Address 102D will then not contain the value we expect, or it might be part of some other user's program.
- Obviously we need to change the address portion of this instruction so that we can load and execute the program at address 2000.
- At the same time, there are statements such as line no. 85, which generates the constant 3, that should remain the same regardless of where the program is loaded.
  - From the object code alone it is not possible to tell which values represent addresses and which represent constant data items.
  - This is because the assembler does not know the actual location where the program will be loaded until load time. ∴ it cannot make the necessary changes itself.
  - The only parts of the program that require modification at load time are those that specify direct addresses.
- This is achieved through relocatable programs (for SIC/XE machines).

### 3.3.3. Program Relocation

Program relocation is the process of modifying the addresses used in the address-sensitive instructions of a program so that the program can execute correctly from the memory area allocated to it. It is often necessary to have more than one program in memory at the same time, sharing the memory and other resources of the machine. Because of this, a program must be loadable into memory wherever there is room for it. Hence the addresses in the program have to be relocated, and this is done at load time. The assembler only identifies the instructions that need modification and passes this information to the loader.

The assembler solves the relocation problem as follows:

- keeping track of operand address relative to start of a program
- generating commands for loader which add the beginning address to operand relative address

An object program that contains the information necessary to perform this kind of modification is called a "relocatable program". We can accomplish this with a modification record, as follows.

## Modification Record

Col. 1: M

Col. 2-7: starting location of the address field to be modified, relative to the beginning of the program (hexadecimal)

Col. 8-9: length of the address field to be modified, in half-bytes (hexadecimal)

→ The length is stored in half-bytes (rather than bytes) because the address field to be modified may not occupy an integral number of bytes.  
Ex: 20 bits = 5 half-bytes

→ The starting location is the location of the byte containing the leftmost bits of the address field to be modified. If this field occupies an odd number of half-bytes, it is assumed to begin in the middle of the first byte at the starting location.
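As an illustration of this record layout, the Python sketch below builds the modification record for a format 4 instruction. The function name `modification_record` is hypothetical. It assumes the case used in these notes: the 20-bit address field starts in the second byte of the instruction (hence `+ 1`) and is 5 half-bytes long.

```python
def modification_record(instruction_loc, half_bytes=5):
    """Build the modification record for a format 4 instruction.

    instruction_loc : address of the instruction relative to the program start
    half_bytes      : length of the address field in half-bytes (5 = 20 bits)
    """
    field_start = instruction_loc + 1          # the address field begins in the second byte
    return f"M^{field_start:06X}^{half_bytes:02X}"


if __name__ == "__main__":
    print(modification_record(0x0006))         # +JSUB RDREC at 0006
```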

Ex: SIC/XE program

| Line | Loc  | Label  | Opcode | Operand | Object code |
|------|------|--------|--------|---------|-------------|
| 5    | 0000 | COPY   | START  | 0       |             |
| 10   | 0000 | FIRST  | STL    | RETADR  | 17202D      |
| 12   | 0003 |        | LDB    | #LENGTH | 69202D      |
| 13   |      |        | BASE   | LENGTH  |             |
| 15   | 0006 | CLOOP  | +JSUB  | RDREC   | 4B101036    |
| ⋮    |      |        |        |         |             |
| 35   | 0013 |        | +JSUB  | WRREC   | 4B10105D    |
| 40   | 0017 |        | J      | CLOOP   | 3F2FEC      |
| ⋮    |      |        |        |         |             |
| 65   | 0026 |        | +JSUB  | WRREC   | 4B10105D    |
| ⋮    |      |        |        |         |             |
| 105  | 0036 | BUFFER | RESB   | 4096    |             |
| ⋮    |      |        |        |         |             |
| 125  | 1036 | RDREC  | CLEAR  | X       | B410        |

Fig: Example for program relocation, with the program loaded at (a) address 0000, (b) address 5000 and (c) address 7420.

- The JSUB instruction on line 15 is loaded at address 0006 when the program starts at address 0000, as in fig (a).
- Its address field contains 01036 (the address of RDREC):  
  15 0006 CLOOP +JSUB RDREC 4B101036
- Suppose we load this program beginning at address 5000, as in fig (b). The address of the instruction labelled RDREC is then 6036, so the JSUB instruction must become 4B106036.
- Likewise, if we load at 7420, as in fig (c), the address of RDREC is 8456 and the JSUB instruction must become 4B108456.
- That is, irrespective of the starting address, RDREC is always 1036 bytes past the starting address of the program. This is the reason we initialized the location counter to 0 (i.e. all addresses are relative to the starting address).



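The Modification record for the JSUB instruction on line 15 is

`M^000007^05`

Note: the length is 05 because the address field is 20 bits long, i.e. 5 half-bytes:

$$05 \times 4 = 20 \text{ bits}$$

The address field begins in the byte at location 000007. Since the field occupies an odd number of half-bytes, the first half-byte at this location is not part of it; that half-byte holds the flag bits x, b, p, e. The length 05 tells the loader to modify only the last 5 half-bytes, hence the leading 4B1 of the instruction remains unchanged.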

Relocation for the instructions on lines 35 and 65

| Line | Loc  | Operation | Operand | Object code |
|------|------|-----------|---------|-------------|
| 35   | 0013 | +JSUB     | WRREC   | 4B10105D    |
| 65   | 0026 | +JSUB     | WRREC   | 4B10105D    |

`M^000014^05` and `M^000027^05`

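→ If the program is loaded at 5000, the loader adds 5000 to the address field:

$$\begin{array}{r} \text{4B1}\ \text{01036} \\ +\ \text{5000} \\ \hline \text{4B1}\ \text{06036} \end{array}$$

→ If the program is loaded at 7420 (the relocation address), the loader adds 7420:

$$\begin{array}{r} \text{4B1}\ \text{01036} \\ +\ \text{7420} \\ \hline \text{4B1}\ \text{08456} \end{array}$$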
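The loader's handling of one Modification record can be sketched in Python as below. This is only an illustration of the arithmetic above, not the loader algorithm from the notes; the function name `apply_modification` and the in-memory `bytearray` image of the program are assumptions made for the example.

```python
def apply_modification(code: bytearray, start: int, half_bytes: int, load_addr: int) -> None:
    """Add load_addr to the address field described by one Modification record.

    start      -- location of the byte holding the leftmost bits of the field
    half_bytes -- length of the field in half-bytes (05 for a format 4 address)
    """
    nbytes = (half_bytes + 1) // 2            # whole bytes that contain the field
    field = int.from_bytes(code[start:start + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1        # the low-order half-bytes are the address
    keep = field & ~mask                      # leading half-byte (flag bits) is untouched
    new_address = (field + load_addr) & mask
    code[start:start + nbytes] = (keep | new_address).to_bytes(nbytes, "big")


# Hypothetical memory image: bytes 0006-0009 hold +JSUB RDREC (4B101036)
program = bytearray(6) + bytearray.fromhex("4B101036")
apply_modification(program, start=0x7, half_bytes=5, load_addr=0x5000)
relocated = program[6:10].hex().upper()
print(relocated)  # 4B106036
```

Calling the same function with `load_addr=0x7420` on a fresh copy gives 4B108456, matching the second addition above.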

→ Some instructions, such as  
 CLEAR S  
 LDA #3  
don't need modification because the operand is not a memory address.

→ 10 STL RETADR doesn't need modification because the operand is specified using program-counter relative or base relative addressing. Here the displacement is always 02D: irrespective of where the program is loaded, RETADR is always 2D bytes past the instruction that follows STL.

→ Similarly, with base relative addressing the distance between LENGTH and BUFFER will always be 3 bytes.

The object program is rewritten as (Fig 3.6)

`H^COPY  ^000000^001077`  
`T^000000^1D^17202D^69202D^4B101036^032026^290000^332007^4B10105D^3F2FEC^032010`  
`T^00001D^13^0F2016^010003^0F200D^4B10105D^3E2003^454F46`  
`T^001036^1D^B410^B400^B440^75101000^E32019^332FFA^DB2013^A004^332008^57C003^B850`  
`T^001053^1D^3B2FEA^134000^4F0000^F1^B410^774000^E32011^332FFA^53C003^DF2008^B850`  
`T^001070^07^3B2FEF^4F0000^05`  
`M^000007^05`  
`M^000014^05`  
`M^000027^05`  
`E^000000`

## 2.3 Machine Independent Assembler features

→ Machine independent features are assembler features that are not closely related to the machine architecture.

This section includes

2.3.1 → The implementation of literals within an assembler

2.3.2 → Two assembler directives, EQU and ORG, used to define symbols

2.3.3 → Use of expressions in assembler language statements

2.3.4 → Implementation of program blocks

2.3.5 → Implementation of control sections

### 2.3.1 Literals

→ A constant operand can be specified as part of the instruction that uses it, instead of being defined elsewhere under a label. Such an operand is called a literal because its value is stated "literally" in the instruction. A literal is identified by the prefix =.

Ex:  
45 001A ENDFIL LDA EOF 032010  
:  
80 002D EOF BYTE C'EOF' 454F46

↓ can be written as

45 001A ENDFIL LDA =C'EOF' 032010  
:  
002D \* =C'EOF' 454F46

The object code generated for lines 45, 215 and 230 in fig 3.6 and fig 2.10 is identical.

(i) 45 001A ENDFIL LDA =C'EOF' 032010

LDA = 00

| opcode  | n | i | x | b | p | e | disp           |
|---------|---|---|---|---|---|---|----------------|
| 0000 00 | 1 | 1 | 0 | 0 | 1 | 0 | 0000 0001 0000 |

$$\begin{aligned} \text{disp} &= \text{operand address} - (\text{PC}) \qquad [\text{TA} = (\text{PC}) + \text{disp}] \\ &= \text{002D} - \text{001D} = \text{010} \end{aligned}$$

$\Rightarrow 032010$

(ii) 215 1062 WLOOP TD =X'05' E32011

TD = E0

$$\begin{aligned} \text{disp} &= \text{operand address} - (\text{PC}) \\ &= 1076 - 1065 = 011 \end{aligned}$$

| opcode  | n | i | x | b | p | e | disp           |
|---------|---|---|---|---|---|---|----------------|
| 1110 00 | 1 | 1 | 0 | 0 | 1 | 0 | 0000 0001 0001 |

$\Rightarrow$ E32011

(iii) 230 106B WD =X'05' DF2008

:

1076 \* =X'05' 05

WD = DC

$$\text{disp} = \text{operand address} - (\text{PC}) = 1076 - \text{106E} = 008$$

| opcode  | n | i | x | b | p | e | disp           |
|---------|---|---|---|---|---|---|----------------|
| 1101 11 | 1 | 1 | 0 | 0 | 1 | 0 | 0000 0000 1000 |

$\Rightarrow$ DF2008
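The three displacement calculations above follow the same steps, which can be sketched in Python as below. The function name `assemble_format3_pc` is an assumption made for this example; it covers only format 3 instructions with program-counter relative addressing (b = 0, p = 1, e = 0).

```python
def assemble_format3_pc(opcode: int, loc: int, target: int,
                        n: int = 1, i: int = 1, x: int = 0) -> str:
    """Assemble a format 3 instruction using PC-relative addressing."""
    pc = loc + 3                      # PC already points to the next instruction
    disp = target - pc                # disp = operand address - (PC)
    if not -2048 <= disp <= 2047:
        raise ValueError("displacement does not fit in 12 bits")
    b, p, e = 0, 1, 0
    word = ((opcode & 0xFC) | (n << 1) | i) << 16   # 6-bit opcode followed by n and i
    word |= (x << 15) | (b << 14) | (p << 13) | (e << 12)
    word |= disp & 0xFFF              # 12-bit two's complement displacement
    return f"{word:06X}"


print(assemble_format3_pc(0x00, 0x001A, 0x002D))  # LDA =C'EOF' -> 032010
print(assemble_format3_pc(0xE0, 0x1062, 0x1076))  # TD  =X'05'  -> E32011
print(assemble_format3_pc(0xDC, 0x106B, 0x1076))  # WD  =X'05'  -> DF2008
```

The same function reproduces a backward reference such as line 40 (J CLOOP), where the displacement is negative and is stored in two's complement form.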

| Line | Label            | Operation                              | Operand              | Comment                        |
|------|------------------|----------------------------------------|----------------------|--------------------------------|
| 5    | COPY             | START                                  | 0                    | COPY FILE FROM INPUT TO OUTPUT |
| 10   | FIRST            | STL                                    | RETADR               | SAVE RETURN ADDRESS            |
| 13   |                  | LDB                                    | #LENGTH              | ESTABLISH BASE REGISTER        |
| 14   |                  | BASE                                   | LENGTH               |                                |
| 15   | CLOOP            | +JSUB                                  | RDREC                | READ INPUT RECORD              |
| 20   |                  | LDA                                    | LENGTH               | TEST FOR EOF (LENGTH = 0)      |
| 25   |                  | COMP                                   | #0                   |                                |
| 30   |                  | JEQ                                    | ENDFIL               | EXIT IF EOF FOUND              |
| 35   |                  | +JSUB                                  | WRREC                | WRITE OUTPUT RECORD            |
| 40   |                  | J                                      | CLOOP                | LOOP                           |
| 45   | ENDFIL           | LDA                                    | =C'EOF'              | INSERT END OF FILE MARKER      |
| 50   |                  | STA                                    | BUFFER               |                                |
| 55   |                  | LDA                                    | #3                   | SET LENGTH = 3                 |
| 60   |                  | STA                                    | LENGTH               |                                |
| 65   |                  | +JSUB                                  | WRREC                | WRITE EOF                      |
| 70   |                  | J                                      | @RETADR              | RETURN TO CALLER               |
| 93   |                  | LTORG                                  |                      |                                |
| 95   | RETADR           | RESW                                   | 1                    |                                |
| 100  | LENGTH           | RESW                                   | 1                    | LENGTH OF RECORD               |
| 105  | BUFFER           | RESB                                   | 4096                 | 4096-BYTE BUFFER AREA          |
| 106  | BUFEND           | EQU                                    | *                    |                                |
| 107  | MAXLEN           | EQU                                    | BUFEND-BUFFER        | MAXIMUM RECORD LENGTH          |
| 110  |                  |                                        |                      |                                |
| 115  |                  | SUBROUTINE TO READ RECORD INTO BUFFER  |                      |                                |
| 120  |                  |                                        |                      |                                |
| 125  | RDREC            | CLEAR                                  | X                    | CLEAR LOOP COUNTER             |
| 130  |                  | CLEAR                                  | A                    | CLEAR A TO ZERO                |
| 132  |                  | CLEAR                                  | S                    | CLEAR S TO ZERO                |
| 133  |                  | +LDT                                   | #MAXLEN              |                                |
| 135  | RLOOP            | TD                                     | INPUT                | TEST INPUT DEVICE              |
| 140  |                  | JEQ                                    | RLOOP                | LOOP UNTIL READY               |
| 145  |                  | RD                                     | INPUT                | READ CHARACTER INTO REGISTER A |
| 150  |                  | COMPR                                  | A,S                  | TEST FOR END OF RECORD (X'00') |
| 155  |                  | JEQ                                    | EXIT                 | EXIT LOOP IF EOF               |
| 160  |                  | STCH                                   | BUFFER,X             | STORE CHARACTER IN BUFFER      |
| 165  |                  | TIXR                                   | T                    | LOOP UNTIL MAX LENGTH          |
| 170  |                  | JLT                                    | RLOOP                | HAS BEEN REACHED               |
| 175  | EXIT             | STX                                    | LENGTH               | SAVE RECORD LENGTH             |
| 180  |                  | RSUB                                   |                      | RETURN TO CALLER               |
| 185  | INPUT            | BYTE                                   | X'F1'                | CODE FOR INPUT DEVICE          |
| 195  |                  |                                        |                      |                                |
| 200  |                  | SUBROUTINE TO WRITE RECORD FROM BUFFER |                      |                                |
| 205  |                  |                                        |                      |                                |
| 210  | WRREC            | CLEAR                                  | X                    | CLEAR LOOP COUNTER             |
| 212  |                  | LDT                                    | LENGTH               |                                |
| 215  | WLOOP            | TD                                     | =X'05'               | TEST OUTPUT DEVICE             |
| 220  |                  | JEQ                                    | WLOOP                | LOOP UNTIL READY               |
| 225  |                  | LDCH                                   | BUFFER,X             | GET CHARACTER FROM BUFFER      |
| 230  |                  | WD                                     | =X'05'               | WRITE CHARACTER                |
| 235  |                  | TIXR                                   | T                    | LOOP UNTIL ALL CHARACTERS      |
| 240  |                  | JLT                                    | WLOOP                | HAVE BEEN WRITTEN              |
| 245  |                  | RSUB                                   |                      | RETURN TO CALLER               |
| 255  |                  | END                                    | FIRST                |                                |

Figure 2.9 Program demonstrating additional assembler features.

| Line | Loc  | Label            | Operation                              | Operand       | Object code |             |
|------|------|------------------|----------------------------------------|---------------|----------|-------------|
| 5    | 0000 | COPY             | START                                  | 0             |          |             |
| 10   | 0000 | FIRST            | STL                                    | RETADR        | 17202D   |             |
| 13   | 0003 |                  | LDB                                    | #LENGTH       | 69202D   |             |
| 14   |      |                  | BASE                                   | LENGTH        |          |             |
| 15   | 0006 | CLOOP            | +JSUB                                  | RDREC         | 4B101036 |             |
| 20   | 000A |                  | LDA                                    | LENGTH        | 032026   |             |
| 25   | 000D |                  | COMP                                   | #0            | 290000   |             |
| 30   | 0010 |                  | JEQ                                    | ENDFIL        | 332007   |             |
| 35   | 0013 |                  | +JSUB                                  | WRREC         | 4B10105D |             |
| 40   | 0017 |                  | J                                      | CLOOP         | 3F2FEC   |             |
| 45   | 001A | ENDFIL           | LDA                                    | =C'EOF'       | 032010   |             |
| 50   | 001D |                  | STA                                    | BUFFER        | 0F2016   |             |
| 55   | 0020 |                  | LDA                                    | #3            | 010003   |             |
| 60   | 0023 |                  | STA                                    | LENGTH        | 0F200D   |             |
| 65   | 0026 |                  | +JSUB                                  | WRREC         | 4B10105D |             |
| 70   | 002A |                  | J                                      | @RETADR       | 3E2003   |             |
| 93   |      |                  | LTORG                                  |               |          |             |
|      | 002D | *                | =C'EOF'                                |               | 454F46   |             |
| 95   | 0030 | RETADR           | RESW                                   | 1             |          |             |
| 100  | 0033 | LENGTH           | RESW                                   | 1             |          |             |
| 105  | 0036 | BUFFER           | RESB                                   | 4096          |          |             |
| 106  | 1036 | BUFEND           | EQU                                    | *             |          |             |
| 107  | 1000 | MAXLEN           | EQU                                    | BUFEND-BUFFER |          |             |
| 110  |      |                  |                                        |               |          |             |
| 115  |      |                  | SUBROUTINE TO READ RECORD INTO BUFFER  |               |          |             |
| 120  |      |                  |                                        |               |          |             |
| 125  | 1036 | RDREC            | CLEAR                                  | X             | B410     |             |
| 130  | 1038 |                  | CLEAR                                  | A             | B400     |             |
| 132  | 103A |                  | CLEAR                                  | S             | B440     |             |
| 133  | 103C |                  | +LDT                                   | #MAXLEN       | 75101000 |             |
| 135  | 1040 | RLOOP            | TD                                     | INPUT         | E32019   |             |
| 140  | 1043 |                  | JEQ                                    | RLOOP         | 332FFA   |             |
| 145  | 1046 |                  | RD                                     | INPUT         | DB2013   |             |
| 150  | 1049 |                  | COMPR                                  | A,S           | A004     |             |
| 155  | 104B |                  | JEQ                                    | EXIT          | 332008   |             |
| 160  | 104E |                  | STCH                                   | BUFFER,X      | 57C003   |             |
| 165  | 1051 |                  | TIXR                                   | T             | B850     |             |
| 170  | 1053 |                  | JLT                                    | RLOOP         | 3B2FEA   |             |
| 175  | 1056 | EXIT             | STX                                    | LENGTH        | 134000   |             |
| 180  | 1059 |                  | RSUB                                   |               | 4F0000   |             |
| 185  | 105C | INPUT            | BYTE                                   | X'F1'         | F1       |             |
| 195  |      |                  |                                        |               |          |             |
| 200  |      |                  | SUBROUTINE TO WRITE RECORD FROM BUFFER |               |          |             |
| 205  |      |                  |                                        |               |          |             |
| 210  | 105D | WRREC            | CLEAR                                  | X             | B410     |             |
| 212  | 105F |                  | LDT                                    | LENGTH        | 774000   |             |
| 215  | 1062 | WLOOP            | TD                                     | =X'05'        | E32011   |             |
| 220  | 1065 |                  | JEQ                                    | WLOOP         | 332FFA   |             |
| 225  | 1068 |                  | LDCH                                   | BUFFER,X      | 53C003   |             |
| 230  | 106B |                  | WD                                     | =X'05'        | DF2008   |             |
| 235  | 106E |                  | TIXR                                   | T             | B850     |             |
| 240  | 1070 |                  | JLT                                    | WLOOP         | 3B2FEF   |             |
| 245  | 1073 |                  | RSUB                                   |               | 4F0000   |             |
| 255  |      |                  | END                                    | FIRST         |          |             |
|      | 1076 | *                | =X'05'                                 |               | 05       |             |

Figure 2.10 Program from Fig. 2.9 with object code.

## Literal Pool:

- All the literal operands used in a program are gathered together into one or more literal pools.
  - Normally literals are placed into a pool at the end of the program, which shows the assigned address and the generated data value.
  - The drawback of keeping the literal pool at the end of the program is that the literal operand may be too far away from the instruction referring to it, because of the large amount of storage reserved for BUFFER, so program-counter relative addressing can no longer be used.
  - To avoid this we use the assembler directive LTORG (origin of literals), which instructs the assembler to assemble the current literal pool immediately.
  - When the assembler encounters an LTORG statement, it creates a literal pool that contains all of the literal operands used since the previous LTORG (or the beginning of the program), i.e. it keeps the literal operand close to the instruction.
  - Some literals may be used more than once in the program (duplicate literals), but the assembler stores only one copy of the specified data value.
- Ex: 215 1062 WLOOP TD =X'05'  
  230 106B WD =X'05'

→ Only one data area with this value is generated. Both instructions refer to the same address in the literal pool for their operand.

→ There are two ways of recognizing duplicate literals:

(a) Compare the character strings defining them. This is simple, but literals with the same name can have different values.

Ex: =X'05'

(b) Compare the generated data values. This is better, but it increases the complexity of the assembler.

Ex: =C'EOF' and =X'454F46'
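Method (b) can be illustrated with a short Python sketch. The helper `literal_value` is a hypothetical name; it assumes literals are written only in the `=C'...'` or `=X'...'` forms used in these notes.

```python
def literal_value(literal: str) -> bytes:
    """Return the data bytes generated for a literal such as =C'EOF' or =X'05'."""
    body = literal.lstrip("=")
    kind, text = body[0].upper(), body[2:-1]   # drop the C'...' or X'...' wrapper
    if kind == "C":
        return text.encode("ascii")            # one byte per character
    if kind == "X":
        return bytes.fromhex(text)             # two hex digits per byte
    raise ValueError(f"unsupported literal: {literal}")


same = literal_value("=C'EOF'") == literal_value("=X'454F46'")
print(same)  # True: the names differ but the generated data value is identical
```

Comparing the names alone would treat these two literals as different and store the value 454F46 twice.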

→ The problem with using character strings to recognize duplicate literals is that some literals have the same name but different values. This happens when the value of a literal depends on its location in the program: '\*' denotes a literal that refers to the current value of the location counter. Consider the statements

BASE \* → ①

LDB =\* → ②

When these are the first lines of the program, ② loads the beginning address of the program into register B. This value will be available later for base relative addressing.

\* → It causes a problem if the same literal is used on other lines. On line 13,

13 0003 LDB =\*

it specifies an operand with value 0003, whereas on line 55,

55 0020 LDA =\*

it specifies an operand with value 0020, i.e. the literal operands have identical names but different values, and both must appear in the literal pool.

\* → The same problem arises if a literal refers to any other item whose value changes between one point in the program and another.

→ The data structure used to store literal operands is the literal table LITTAB.

→ Literal Table (LITTAB): it is often organized as a hash table, using the literal name or value as the key. Each entry contains

- literal name
- operand value
- operand length
- address assigned

| NAME    | OPERAND VALUE | LENGTH | ADDRESS |
|---------|---------------|--------|---------|
| =C'EOF' | 454F46        | 03     | 002D    |
| =X'05'  | 05            | 01     | 1076    |

Pass 1

→ Build LITTAB with the literal name, operand value and length, leaving the address unassigned. If the literal is already present in LITTAB, no action is needed.

→ When an LTORG statement (or the end of the program) is encountered, assign an address to each literal not yet assigned an address. Along with this, the location counter is updated to reflect the number of bytes occupied by each literal.

Pass 2

- Search LITTAB for each literal operand encountered to generate the respective object code
- Generate the data values in the literal pool as if by BYTE or WORD statements
- Generate a Modification record for literals that represent an address in the program.
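The Pass 1 handling of LITTAB can be sketched in Python as follows. The class and method names (`LiteralTable`, `add`, `place_pool`) are assumptions made for this illustration, and a plain dictionary keyed by literal name stands in for the hash table.

```python
class LiteralTable:
    """Minimal LITTAB sketch: name -> value, length and assigned address."""

    def __init__(self):
        self.entries = {}

    def add(self, name: str, value: bytes) -> None:
        # A duplicate literal is entered only once
        if name not in self.entries:
            self.entries[name] = {"value": value, "length": len(value), "address": None}

    def place_pool(self, locctr: int) -> int:
        """Called at LTORG or END: assign addresses and return the updated LOCCTR."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += entry["length"]
        return locctr


littab = LiteralTable()
littab.add("=C'EOF'", b"EOF")              # line 45
after_ltorg = littab.place_pool(0x002D)    # LTORG on line 93
littab.add("=X'05'", b"\x05")              # line 215
littab.add("=X'05'", b"\x05")              # line 230, duplicate
after_end = littab.place_pool(0x1076)      # END on line 255
print(hex(after_ltorg), hex(after_end))    # 0x30 0x1077
```

The resulting addresses, 002D for =C'EOF' and 1076 for =X'05', agree with the LITTAB contents shown earlier, and the final location counter value 1077 is the program length.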

Difference between a literal and an immediate operand

|   | Literal (=) | Immediate operand (#) |
|---|-------------|-----------------------|
| 1 | A literal is an assembler feature. | An immediate operand is machine-recognizable data. |
| 2 | The assembler generates the specified value as a constant at some other memory location. The address of this generated constant is used as the target address for the machine instruction. | The operand value is assembled as part of the machine instruction. |
| 3 | No special architectural support is required; the instruction uses ordinary memory addressing. | Architectural support (an immediate addressing mode) is required. |
| 4 | Slower, since the value is fetched from data memory. | Faster than a literal, since the data is within the instruction. |
| 5 | Capable of storing large data. | Cannot store large data, since the instruction word also holds the opcode and flag bits. |

Ex (literal): 45 001A ENDFIL LDA =C'EOF' 032010  
Ex (immediate operand): 55 0020 LDA #3 010003

### 2.3.2 Symbol Defining Statements

Most assemblers provide an assembler directive that allows the programmer to define symbols and specify their values.

The assembler directives are a) EQU b) ORG

#### a) EQU (Equate)

→ allows the programmer to define symbols (i.e. enters it into SYMTAB) and assigns to it the specified value.

syntax: symbol EQU value

→ The value may be

- (i) constant
- (ii) An expression involving constant
- (iii) previously defined symbols.

→ Use of EQU

▷ To establish symbolic names that can be used for improved readability in place of numeric values.

e.g. `+LDT #4096 ; load the value 4096 into reg. T`

$\downarrow$  replace with

`MAXLEN EQU 4096`

`+LDT #MAXLEN`

→ When the assembler encounters the EQU statement, it enters MAXLEN into SYMTAB with the value 4096.

- During assembly of the LDT instruction, the assembler searches SYMTAB for the symbol MAXLEN and uses its value as the operand in the instruction.
- The advantage is that if we want to change the value to some other value, we need to change it in only one place instead of scanning through the program for every #4096 (similar to #define in C).

▷ To define mnemonic names for registers.

Ex: A EQU 0  
 X EQU 1  
 L EQU 2

- The symbols A, X, L are entered into SYMTAB with their corresponding values 0, 1, 2.
- An instruction such as RMO A,X searches SYMTAB for A and X and uses their values to assemble the instruction.
- EQU can also be used to reflect the logical function of the registers.

Ex: BASE EQU R1  
 COUNT EQU R2  
 INDEX EQU R3

- Register R1 is used as the base register, R2 as a counter, R3 as the index register, etc.
- A forward reference is not allowed in EQU: all terms in the value field must have been defined previously during Pass 1.

Ex: ALPHA RESW 1  
 BETA EQU ALPHA } allowed

 BETA EQU ALPHA  
 ALPHA RESW 1 } not allowed
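The restriction on forward references can be illustrated with a small Python sketch of how Pass 1 might process an EQU statement. The function name `define_equ` is hypothetical, and the operand is limited to a decimal constant or a single symbol to keep the example short.

```python
def define_equ(symtab: dict, symbol: str, operand: str) -> None:
    """Enter symbol into SYMTAB with the value given by an EQU operand."""
    if operand.isdigit():
        value = int(operand)                  # (i) constant
    elif operand in symtab:
        value = symtab[operand]               # (iii) previously defined symbol
    else:
        # The value is needed immediately in Pass 1, so it must already exist
        raise ValueError(f"{operand} is not yet defined: forward reference in EQU")
    symtab[symbol] = value


symtab = {"ALPHA": 0x0030}                    # ALPHA RESW 1 was already processed
define_equ(symtab, "MAXLEN", "4096")
define_equ(symtab, "BETA", "ALPHA")           # allowed: ALPHA is already in SYMTAB
print(symtab["MAXLEN"], hex(symtab["BETA"]))  # 4096 0x30
```

Calling `define_equ` with an operand that is not yet in `symtab` raises the error, which corresponds to the "not allowed" case above.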

#### b) ORG (Origin)

→ Assembler directive used to indirectly assign values to symbols.

→ syntax :

ORG value

→ value can be

(i) a constant

(ii) an expression involving constants

(iii) previously defined symbols

→ When ORG is encountered, the assembler resets its LOCCTR to the specified value.

→ The location counter is used to control the assignment of storage in the object program. Hence, in most cases, altering its value would result in an incorrect assembly.

∴ the directive should be used sparingly.

→ The ORG statement affects the values of all labels defined until the next ORG.

→ If the previous value of LOCCTR is automatically remembered, we can return to the normal use of LOCCTR just by writing

ORG

→ Example: to define a symbol table STAB (100 entries) with the following structure.

| SYMBOL  | VALUE            | FLAGS   |
|---------|------------------|---------|
| 6 bytes | 3 bytes (1 word) | 2 bytes |

- ↪ The SYMBOL field contains user-defined symbols.
- ↪ The VALUE field represents the value assigned to the symbol.
- ↪ The FLAGS field specifies the symbol type and other information.
- ↪ The space for this table is reserved as

STAB RESB 1100 ; 100 entries × 11 bytes per entry = 1100

- ↪ We can access the fields of the entries in two ways (using EQU or using ORG).

↪ Using EQU  $\Rightarrow$

| SYMBOL | EQU | STAB   |
|--------|-----|--------|
| VALUE  | EQU | STAB+6 |
| FLAGS  | EQU | STAB+9 |

(+6 and +9 are offsets from STAB)

- (i) To fetch the VALUE field,  
LDA VALUE,X ; where X = 0, 11, 22, ... for each entry (X is the index register)

(ii) This method of definition simply defines the labels; it does not make the structure of the table as clear as it might be.

(iii) Therefore we make use of ORG.

↪ Using ORG  $\Rightarrow$

| STAB   | RESB | 1100      |
|--------|------|-----------|
|        | ORG  | STAB      |
| SYMBOL | RESB | 6         |
| VALUE  | RESW | 1         |
| FLAGS  | RESB | 2         |
|        | ORG  | STAB+1100 |

- (i) The first ORG sets the location counter to the value of STAB.
- (ii) The RESB statement defines SYMBOL to have the current value of LOCCTR, i.e. the address STAB.
- (iii) LOCCTR is then advanced, so the label on the RESW statement assigns to VALUE the address (STAB+6). LOCCTR is advanced again, so FLAGS is assigned the address (STAB+9).
- (iv) The last ORG statement sets LOCCTR back to its previous value, which is the address of the next unassigned byte of memory after the table STAB.

→ A forward reference is not allowed in ORG,  
i.e. all symbols used to specify the new location counter value must have been previously defined.

→ Example :

|       |       |   |
|-------|-------|---|
|       | ORG   | ALPHA |
| BYTE1 | RESB  | 1 |
| BYTE2 | RESB  | 1 |
| BYTE3 | RESB  | 1 |
|       | ORG   |   |
| ALPHA | RESB  | 1 |

↳ This cannot be processed: the assembler does not know what value has to be assigned to the location counter in response to the first ORG statement. As a result, the symbols BYTE1, BYTE2, BYTE3 cannot be assigned addresses during Pass 1.

↳ It has to be written as

|       |       |   |
|-------|-------|---|
| ALPHA | RESB  | 1 |
|       | ORG   | ALPHA |
| BYTE1 | RESB  | 1 |
| BYTE2 | RESB  | 1 |
| BYTE3 | RESB  | 1 |
|       | ORG   |   |
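The effect of ORG on the location counter can be sketched with a much simplified Pass 1 in Python. The names `pass1` and `evaluate` are assumptions for this example; only RESB, RESW and ORG with an operand are handled, and operands are limited to sums of decimal constants and symbols.

```python
def evaluate(operand: str, symtab: dict) -> int:
    """Evaluate a sum of decimal constants and previously defined symbols."""
    total = 0
    for term in operand.split("+"):
        term = term.strip()
        if term.isdigit():
            total += int(term)
        elif term in symtab:
            total += symtab[term]
        else:
            raise ValueError(f"forward reference to {term} is not allowed in ORG")
    return total


def pass1(statements, start: int = 0):
    """Assign addresses to labels; statements are (label, opcode, operand) tuples."""
    symtab, locctr = {}, start
    for label, opcode, operand in statements:
        if opcode == "ORG":
            locctr = evaluate(operand, symtab)   # reset LOCCTR to the given value
            continue
        if label:
            symtab[label] = locctr
        if opcode == "RESB":
            locctr += int(operand)
        elif opcode == "RESW":
            locctr += 3 * int(operand)           # one SIC word is 3 bytes
    return symtab, locctr


table = [
    ("STAB", "RESB", "1100"),
    (None, "ORG", "STAB"),
    ("SYMBOL", "RESB", "6"),
    ("VALUE", "RESW", "1"),
    ("FLAGS", "RESB", "2"),
    (None, "ORG", "STAB+1100"),
]
symtab, locctr = pass1(table)
print(symtab, locctr)
# {'STAB': 0, 'SYMBOL': 0, 'VALUE': 6, 'FLAGS': 9} 1100
```

With STAB at relative address 0, VALUE and FLAGS end up at offsets 6 and 9, and the final ORG leaves the location counter just past the table. Passing a statement list that begins with `ORG ALPHA` before ALPHA is defined raises the error, which is the forward reference case shown above.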

### 2.3.3 Expressions

- The assembler allows the use of expressions as operand.
- It calculates the expression and produces a single operand address or value.
- The expression consists of
  - (i) operators : + - \* / (division is usually defined to produce an integer value)
  - (ii) Individual terms : constants, user-defined symbols, special terms like \* (current value of the location counter).

Ex:- MAXLEN EQU BUFEND-BUFFER  
       STAB   RESB (6+3+2)\*MAXENTRIES  
       BUFEND EQU \*

- The values of terms can be absolute (independent of program location) such as constant or relative (to the beginning of the program) such as Address labels, data areas, references to the location counter value.

- Expressions are classified as
  - (i) Absolute Expressions } based on type of value
  - (ii) Relative Expressions } produced.

### (i) Absolute Expressions :

- ↳ Absolute means independent of program location and contains absolute terms like constants
- ↳ It may also contain relative terms provided the relative terms occur in pairs and the terms in each such pair have opposite signs.
- ↳ It is not necessary that the paired terms be adjacent to each other in the expression; however, all relative terms must be capable of being paired in this way.
- ↳ None of the relative terms may enter into a multiplication or division operation.

### (ii) Relative Expressions :

- ↳ Relative means relative to the beginning of the program, such as labels on the instruction, data areas, references to the location counter value.
- ↳ Here, all of the relative terms except one can be paired as in absolute expressions and the remaining unpaired relative term must have a positive sign.
- ↳ No relative terms may enter into a multiplication or division operation.

Note: An expression that meets neither the conditions for an absolute expression nor those for a relative expression is flagged as an error.

→ A relative term or expression represents some value which can be written as  $(S+r)$, where

- $S$  = starting address of the program
- $r$  = value of the term or expression relative to the starting address.

Ex: 107 MAXLEN EQU BUFEND-BUFFER

↳ Both BUFEND and BUFFER are relative terms, each representing an address within the program. In the difference, $(S+r_1)-(S+r_2)=r_1-r_2$, the starting address cancels, so the expression represents an absolute value (the length of the buffer in bytes).

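## Illegal Expressions

|                   |                                                                        |
|-------------------|------------------------------------------------------------------------|
| BUFEND + BUFFER   | ; both relative terms are positive, so they cannot be paired           |
| 100 - BUFFER      | ; the only relative term is unpaired and has a negative sign           |
| 3 \* BUFFER       | ; a relative term cannot enter into a multiplication                   |

These expressions represent neither absolute values nor locations within the program.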

→ The type of an expression is determined by keeping track of the types of all symbols defined in the program. This is done by adding a flag (R for relative or A for absolute) in SYMTAB for each symbol defined.

Ex (a few symbols of fig 2.10):

| Symbol | Type | Value |
|--------|------|-------|
| RETADR | R    | 0030  |
| BUFFER | R    | 0036  |
| BUFEND | R    | 1036  |
| MAXLEN | A    | 1000  |
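The pairing rules for absolute and relative expressions can be sketched in Python using the type flags stored in SYMTAB. The function name `classify` is an assumption for this example; the sketch only handles the operators +, -, \* and /, and it does not support the special term \* (current location counter value), since that symbol would be confused with multiplication here.

```python
import re


def classify(expression: str, symtab: dict) -> str:
    """Return 'A' (absolute) or 'R' (relative); raise ValueError if illegal.

    symtab maps each symbol to a (type, value) pair, with type 'R' or 'A'.
    """
    net_relative = 0
    for sign, term in re.findall(r"([+-]?)\s*([^+-]+)", expression):
        factors = [f.strip() for f in re.split(r"[*/]", term)]
        relative = [f for f in factors if not f.isdigit() and symtab[f][0] == "R"]
        if relative and len(factors) > 1:
            raise ValueError("relative term in a multiplication or division")
        if relative:
            net_relative += -1 if sign == "-" else 1
    if net_relative == 0:
        return "A"        # every relative term is paired with one of opposite sign
    if net_relative == 1:
        return "R"        # exactly one unpaired relative term, with positive sign
    raise ValueError("relative terms cannot be paired")


symtab = {
    "RETADR": ("R", 0x0030),
    "BUFFER": ("R", 0x0036),
    "BUFEND": ("R", 0x1036),
    "MAXLEN": ("A", 0x1000),
}
print(classify("BUFEND-BUFFER", symtab))  # A
print(classify("BUFFER+3", symtab))       # R
```

The three illegal expressions listed above (BUFEND+BUFFER, 100-BUFFER and 3\*BUFFER) all raise `ValueError` in this sketch.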

### 2.3.4 Program Blocks

- Till now, the program being assembled was treated as a single unit, even though it had subroutines, data areas etc., resulting in a single block of object code.
- Within this object program the generated machine instructions and data appeared in the same order as they were written in the source program.
- Sometimes it is desirable to logically rearrange the statements of the source program, for example so that the large buffer area is moved to the end of the object program. Then the extended instruction format is no longer needed, the base register is no longer necessary, and the placement of literals in the program becomes more flexible.
- This is achieved through assembler features such as program blocks and control sections.
- \* → Program blocks: allow the generated machine instructions and data to appear in the object program in a different order from the corresponding source statements. In other words, program blocks are segments of code that are rearranged within a single object program unit.

Assembler Directive : USE

Syntax : USE Blockname

Fig. 2.11 shows the program with program blocks.

→ There are three blocks in the program

(i) The unnamed (default) program block contains all executable instructions of the program.

(ii) The CDATA program block contains all data areas that are a few words or less in length.

(iii) The CBLKS program block contains all data areas that consist of longer blocks of memory.

→ At the beginning, statements are assumed to be part of the unnamed (default) block. If no USE statements are included, the entire program belongs to this single block.

→ USE on line 92 indicates the beginning of the CDATA block

→ USE on line 103 indicates the beginning of the CBLKS block

→ USE on line 123 resumes the default block

\* → Each program block may contain several separate segments of the source program. The assembler logically rearranges these segments to gather together the pieces of each block and then assigns addresses.

→ Program readability is better if data areas are placed in the source program close to the statements that reference them.

The assembler accomplishes this logical rearrangement of code through the following steps in Pass 1 and Pass 2.

### (i) Pass-1:

Fig. 2.12(b) shows Pass 1 for program blocks.

- A separate location counter is maintained for each program block and is initialized to 0 when the block first begins.
- ↳ The current value of LOCCTR is saved when switching away from a block and restored when the block is resumed.
- ↳ Each label is assigned an address relative to the start of the block that contains it.
- ↳ The block name or number is stored in SYMTAB along with the assigned relative address of the label.
- ↳ At the end of Pass 1, the latest value of LOCCTR for each block gives the length of that block.
- ↳ The assembler constructs a table that contains the starting address and length of every block.
- ↳ The assembler assigns to each block a starting address in the object program (beginning with relative location 0).

| Block Name | Block Number | Address | Length |
|------------|--------------|---------|--------|
| default    | 0            | 0000    | 0066   |
| CDATA      | 1            | 0066    | 000B   |
| CBLKS      | 2            | 0071    | 1000   |

- ↳ A flag is also added in this table
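As a quick check of the block table, the starting addresses can be derived from the block lengths alone. The sketch below is illustrative; the function name `assign_block_addresses` is an assumption, and the lengths (0066, 000B, 1000 hex) are the ones used for this example program.

```python
def assign_block_addresses(lengths, start=0):
    """lengths: list of (block name, length) in block-number order."""
    table = []
    address = start
    for number, (name, length) in enumerate(lengths):
        table.append((name, number, address, length))
        address += length          # the next block starts where this one ends
    return table, address - start  # second value is the program length

table, program_length = assign_block_addresses(
    [("default", 0x0066), ("CDATA", 0x000B), ("CBLKS", 0x1000)]
)
for name, number, address, length in table:
    print(f"{name:8} {number} {address:04X} {length:04X}")
print(f"program length = {program_length:04X}")
```

Each block starts at the address of the previous block plus the length of the previous block, which is the rule used at the end of Pass 1.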



### (ii) Pass-2:

→ Calculates the address of each symbol relative to the start of the object program (not the start of its individual program block) by adding

(i) the location of the symbol relative to the start of its block (from SYMTAB), and

(ii) the starting address of that block (from the block table).

Example

| Line | Loc  | Block | Label  | Opcode | Operand |
|------|------|-------|--------|--------|---------|
| 20   | 0006 | 0     |        | LDA    | LENGTH  |
| :    |      |       |        |        |         |
| 92   | 0000 | 1     |        | USE    | CDATA   |
| 100  | 0003 | 1     | LENGTH | RESW   | 1       |

→ The value of the operand LENGTH is 0003 relative to block 1 (CDATA)

∴ address = 0003 + 0066 = 0069 relative to the start of the program; this is the target address (TA) when the instruction is executed
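The same calculation, written out as a minimal sketch. The dictionary and function names are illustrative assumptions; the numbers come from the example (LENGTH at 0003 in block 1, and the LDA on line 20 at location 0006 in the default block).

```python
# Block number -> starting address in the object program (from the block table)
BLOCK_START = {0: 0x0000, 1: 0x0066, 2: 0x0071}

# Symbol -> (block number, address relative to the start of that block)
SYMTAB = {"LENGTH": (1, 0x0003)}

def absolute_address(symbol):
    block, relative = SYMTAB[symbol]
    return BLOCK_START[block] + relative

def pc_relative_disp(symbol, loc, block=0, length=3):
    # PC holds the address of the NEXT instruction when this one executes
    pc = BLOCK_START[block] + loc + length
    return absolute_address(symbol) - pc

ta = absolute_address("LENGTH")                 # 0003 + 0066
disp = pc_relative_disp("LENGTH", loc=0x0006)   # TA - (PC) = 0069 - 0009
print(f"TA = {ta:04X}, disp = {disp:03X}")
```

The displacement 060 is what appears in the object code 032060 for line 20.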

| Line | Loc  | Block | Label  | Statement         | Object code | Address in object program          |
|------|------|-------|--------|-------------------|-------------|------------------------------------|
| 5    | 0000 | 0     | COPY   | START 0           |             |                                    |
| 10   | 0000 | 0     | FIRST  | STL RETADR        | 172063      |                                    |
| 15   | 0003 | 0     | CLOOP  | JSUB RDREC        | 4B2021      |                                    |
| 20   | 0006 | 0     |        | LDA LENGTH        | 032060      |                                    |
| 25   | 0009 | 0     |        | COMP #0           | 290000      |                                    |
| 30   | 000C | 0     |        | JEQ ENDFIL        | 332006      |                                    |
| 35   | 000F | 0     |        | JSUB WRREC        | 4B203B      |                                    |
| 40   | 0012 | 0     |        | J CLOOP           | 3F2FEE      |                                    |
| 45   | 0015 | 0     | ENDFIL | LDA =C'EOF'       | 032055      |                                    |
| 50   | 0018 | 0     |        | STA BUFFER        | 0F2056      |                                    |
| 55   | 001B | 0     |        | LDA #3            | 010003      |                                    |
| 60   | 001E | 0     |        | STA LENGTH        | 0F2048      |                                    |
| 65   | 0021 | 0     |        | JSUB WRREC        | 4B2029      |                                    |
| 70   | 0024 | 0     |        | J @RETADR         | 3E203F      |                                    |
| 92   | 0000 | 1     |        | USE CDATA         |             |                                    |
| 95   | 0000 | 1     | RETADR | RESW 1            |             | → 0066 + 0000 = 0066               |
| 100  | 0003 | 1     | LENGTH | RESW 1            |             | → 0066 + 0003 = 0069               |
| 103  | 0000 | 2     |        | USE CBLKS         |             |                                    |
| 105  | 0000 | 2     | BUFFER | RESB 4096         |             | → 0071 + 0000 = 0071               |
| 106  | 1000 | 2     | BUFEND | EQU *             |             |                                    |
| 107  | 1000 |       | MAXLEN | EQU BUFEND-BUFFER |             | 1000 - 0000 = 1000 (absolute)      |
| 110  |      |       | .      |                   |             |                                    |
| 115  |      |       | .      | SUBROUTINE TO READ RECORD INTO BUFFER |  |                           |
| 120  |      |       | .      |                   |             |                                    |
| 123  | 0027 | 0     |        | USE               |             |                                    |
| 125  | 0027 | 0     | RDREC  | CLEAR X           | B410        |                                    |
| 130  | 0029 | 0     |        | CLEAR A           | B400        |                                    |
| 132  | 002B | 0     |        | CLEAR S           | B440        |                                    |
| 133  | 002D | 0     |        | +LDT #MAXLEN      | 75101000    |                                    |
| 135  | 0031 | 0     | RLOOP  | TD INPUT          | E32038      |                                    |
| 140  | 0034 | 0     |        | JEQ RLOOP         | 332FFA      |                                    |
| 145  | 0037 | 0     |        | RD INPUT          | DB2032      |                                    |
| 150  | 003A | 0     |        | COMPR A,S         | A004        |                                    |
| 155  | 003C | 0     |        | JEQ EXIT          | 332008      |                                    |
| 160  | 003F | 0     |        | STCH BUFFER,X     | 57A02F      |                                    |
| 165  | 0042 | 0     |        | TIXR T            | B850        |                                    |
| 170  | 0044 | 0     |        | JLT RLOOP         | 3B2FEA      |                                    |
| 175  | 0047 | 0     | EXIT   | STX LENGTH        | 13201F      |                                    |
| 180  | 004A | 0     |        | RSUB              | 4F0000      |                                    |
| 183  | 0006 | 1     |        | USE CDATA         |             |                                    |
| 185  | 0006 | 1     | INPUT  | BYTE X'F1'        | F1          | → 0066 + 0006 = 006C               |
| 195  |      |       | .      |                   |             |                                    |
| 200  |      |       | .      | SUBROUTINE TO WRITE RECORD FROM BUFFER |  |                          |
| 205  |      |       | .      |                   |             |                                    |
| 208  | 004D | 0     |        | USE               |             |                                    |
| 210  | 004D | 0     | WRREC  | CLEAR X           | B410        |                                    |
| 212  | 004F | 0     |        | LDT LENGTH        | 772017      |                                    |
| 215  | 0052 | 0     | WLOOP  | TD =X'05'         | E3201B      |                                    |
| 220  | 0055 | 0     |        | JEQ WLOOP         | 332FFA      |                                    |
| 225  | 0058 | 0     |        | LDCH BUFFER,X     | 53A016      |                                    |
| 230  | 005B | 0     |        | WD =X'05'         | DF2012      |                                    |
| 235  | 005E | 0     |        | TIXR T            | B850        |                                    |
| 240  | 0060 | 0     |        | JLT WLOOP         | 3B2FEF      |                                    |
| 245  | 0063 | 0     |        | RSUB              | 4F0000      |                                    |
| 252  | 0007 | 1     |        | USE CDATA         |             |                                    |
| 253  | 0007 | 1     |        | LTORG             |             |                                    |
|      | 0007 | 1     | *      | =C'EOF'           | 454F46      | → 0066 + 0007 = 006D               |
|      | 000A | 1     | *      | =X'05'            | 05          | → 0066 + 000A = 0070               |
| 255  |      |       |        | END FIRST         |             |                                    |

Figure 2.12(a) Program from Fig. 2.11 with object code.

| Line | Source statement |       |               |                                        |
|------|------------------|-------|---------------|----------------------------------------|
| 5    | COPY             | START | 0             | COPY FILE FROM INPUT TO OUTPUT         |
| 10   | FIRST            | STL   | RETADR        | SAVE RETURN ADDRESS                    |
| 15   | CLOOP            | JSUB  | RDREC         | READ INPUT RECORD                      |
| 20   |                  | LDA   | LENGTH        | TEST FOR EOF (LENGTH = 0)              |
| 25   |                  | COMP  | #0            |                                        |
| 30   |                  | JEQ   | ENDFIL        | EXIT IF EOF FOUND                      |
| 35   |                  | JSUB  | WRREC         | WRITE OUTPUT RECORD                    |
| 40   |                  | J     | CLOOP         | LOOP                                   |
| 45   | ENDFIL           | LDA   | =C'EOF'       | INSERT END OF FILE MARKER              |
| 50   |                  | STA   | BUFFER        |                                        |
| 55   |                  | LDA   | #3            | SET LENGTH = 3                         |
| 60   |                  | STA   | LENGTH        |                                        |
| 65   |                  | JSUB  | WRREC         | WRITE EOF                              |
| 70   |                  | J     | @RETADR       | RETURN TO CALLER                       |
| 92   |                  | USE   | CDATA         |                                        |
| 95   | RETADR           | RESW  | 1             |                                        |
| 100  | LENGTH           | RESW  | 1             | LENGTH OF RECORD                       |
| 103  |                  | USE   | CBLKS         |                                        |
| 105  | BUFFER           | RESB  | 4096          | 4096-BYTE BUFFER AREA                  |
| 106  | BUFEND           | EQU   | *             | FIRST LOCATION AFTER BUFFER            |
| 107  | MAXLEN           | EQU   | BUFEND-BUFFER | MAXIMUM RECORD LENGTH                  |
| 110  | .                |       |               |                                        |
| 115  | .                |       |               | SUBROUTINE TO READ RECORD INTO BUFFER  |
| 120  | .                |       |               |                                        |
| 123  |                  | USE   |               |                                        |
| 125  | RDREC            | CLEAR | X             | CLEAR LOOP COUNTER                     |
| 130  |                  | CLEAR | A             | CLEAR A TO ZERO                        |
| 132  |                  | CLEAR | S             | CLEAR S TO ZERO                        |
| 133  |                  | +LDT  | #MAXLEN       |                                        |
| 135  | RLOOP            | TD    | INPUT         | TEST INPUT DEVICE                      |
| 140  |                  | JEQ   | RLOOP         | LOOP UNTIL READY                       |
| 145  |                  | RD    | INPUT         | READ CHARACTER INTO REGISTER A         |
| 150  |                  | COMPR | A,S           | TEST FOR END OF RECORD (X'00')         |
| 155  |                  | JEQ   | EXIT          | EXIT LOOP IF EOR                       |
| 160  |                  | STCH  | BUFFER,X      | STORE CHARACTER IN BUFFER              |
| 165  |                  | TIXR  | T             | LOOP UNLESS MAX LENGTH                 |
| 170  |                  | JLT   | RLOOP         | HAS BEEN REACHED                       |
| 175  | EXIT             | STX   | LENGTH        | SAVE RECORD LENGTH                     |
| 180  |                  | RSUB  |               | RETURN TO CALLER                       |
| 183  |                  | USE   | CDATA         |                                        |
| 185  | INPUT            | BYTE  | X'F1'         | CODE FOR INPUT DEVICE                  |
| 195  | .                |       |               |                                        |
| 200  | .                |       |               | SUBROUTINE TO WRITE RECORD FROM BUFFER |
| 205  | .                |       |               |                                        |
| 208  |                  | USE   |               |                                        |
| 210  | WRREC            | CLEAR | X             | CLEAR LOOP COUNTER                     |
| 212  |                  | LDT   | LENGTH        |                                        |
| 215  | WLOOP            | TD    | =X'05'        | TEST OUTPUT DEVICE                     |
| 220  |                  | JEQ   | WLOOP         | LOOP UNTIL READY                       |
| 225  |                  | LDCH  | BUFFER,X      | GET CHARACTER FROM BUFFER              |
| 230  |                  | WD    | =X'05'        | WRITE CHARACTER                        |
| 235  |                  | TIXR  | T             | LOOP UNTIL ALL CHARACTERS              |
| 240  |                  | JLT   | WLOOP         | HAVE BEEN WRITTEN                      |
| 245  |                  | RSUB  |               | RETURN TO CALLER                       |
| 252  |                  | USE   | CDATA         |                                        |
| 253  |                  | LTORG |               |                                        |
| 255  |                  | END   | FIRST         |                                        |

Figure 2.11 Example of a program with multiple program blocks.

$$\begin{aligned} PC &= (\text{starting address of default block}) + 0009 = 0000 + 0009 = 0009 \\ \text{Disp} &= TA - (PC) \\ &= 0069 - 0009 \\ &= 0060 \end{aligned}$$



SYMTAB

| Label name | Block number | Address | Flag |
|------------|--------------|---------|------|
| LENGTH     | 1            | 0003    |      |

Note: line 107

107 1000 MAXLEN EQU BUFEND-BUFFER  
 $\rightarrow$ The location is shown without a block number. This indicates that  
 MAXLEN is an absolute symbol, whose value is  
 not relative to the start of any program block.

- Object program:
- It is not necessary to physically rearrange the generated code in the object program. The assembler simply inserts the proper load address in each Text record, and the loader places the code at the correct locations.
  - Header record: as before
  - Text records: the first two Text records are generated from lines 5 through 70.

- When the USE statement on line 92 is encountered,  
the assembler starts a new Text record, even though  
there is still room in the previous Text record.  
→ The process continues till the end of the program.

```
H^COPY  ^000000^001071
T^000000^1E^172063^4B2021^ . . . . . ^010003
T^00001E^09^0F2048^4B2029^3E203F
T^000027^1D^B410^B400^ . . . . . ^B850
T^000044^09^3B2FEA^13201F^4F0000
T^00006C^01^F1
T^00004D^19^B410^772017^ . . . . . ^4F0000
T^00006D^04^454F46^05
E^000000
```



Fig: Program blocks loaded in memory (addresses shown are relative addresses)

```
begin
    block number = 0; LOCCTR[i] = 0 for all i
    read the first input line
    if OPCODE = 'START' then
        begin
            write line to intermediate file
            read next input line
        end {if START}
    while OPCODE ≠ 'END' do
        begin
            if OPCODE = 'USE' then
                begin
                    if there is no OPERAND name then
                        set block name as default
                    else
                        set block name as OPERAND name
                    if there is no entry for block name then
                        insert (block name, block number + 1) in block table
                    i = block number for block name
                end {if USE}
            else if this is not a comment line then
                begin
                    if there is a symbol in the LABEL field then
                        begin
                            search SYMTAB for LABEL
                            if found then
                                set error flag (duplicate symbol)
                            else
                                insert (LABEL, i, LOCCTR[i]) into SYMTAB
                        end {if symbol}
                    search OPTAB for OPCODE
                    if found then
                        add instruction length to LOCCTR[i]
                    else if OPCODE = 'WORD' then
                        add 3 to LOCCTR[i]
                    else if OPCODE = 'RESW' then
                        add 3 * #[OPERAND] to LOCCTR[i]
                    else if OPCODE = 'RESB' then
                        add #[OPERAND] to LOCCTR[i]
                    else if OPCODE = 'BYTE' then
                        begin
                            find length of constant in bytes
                            add length to LOCCTR[i]
                        end {if BYTE}
                    else
```

Figure 2.12(b) Pass 1 of program blocks.

```
                        set error flag (invalid operation code)
                end {if not a comment}
            write line to intermediate file
            read next input line
        end {while not END}
    write last line to intermediate file
    save Length[i] as LOCCTR[i] for all i
    Address[0] = starting address
    Address[i] = Address[i - 1] + Length[i - 1]
        [for i = 1 to max(block number)]
    insert (Address[i], Length[i]) in block table for all i
end {Pass 1}
```

Figure 2.12(b) (cont'd)

```
if OPCODE = 'USE' then
    set block number to that of the block named in the OPERAND field
else if there is a symbol in the OPERAND field then
    begin
        search SYMTAB for OPERAND
        store symbol value + Address[block number of symbol] as operand address
    end
end {Pass 2}
```

Figure 2.12(c) Pass 2 of program blocks.
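The Pass 1 logic of Fig. 2.12(b) can be tried out with a short Python sketch. It is a simplified illustration, not the figure's exact algorithm: the names `pass1`, `byte_length` and `instr_len` are assumptions, input lines are already split into (label, opcode, operand), comments and the intermediate file are left out, and the sample program is a shortened fragment rather than the full COPY program.

```python
def byte_length(operand):
    """Length in bytes of a BYTE constant such as X'F1' or C'EOF'."""
    body = operand[2:-1]
    return (len(body) + 1) // 2 if operand[0] == "X" else len(body)

def pass1(lines, instr_len):
    blocks = {"": 0}   # block name -> block number ("" is the default block)
    locctr = [0]       # locctr[i] is the location counter of block i
    symtab = {}        # label -> (block number, address within block)
    cur = 0            # number of the block currently in use
    for label, opcode, operand in lines:
        if opcode == "USE":
            name = operand or ""
            if name not in blocks:          # first time: new block, counter 0
                blocks[name] = len(locctr)
                locctr.append(0)
            cur = blocks[name]              # resume the saved counter
            continue
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = (cur, locctr[cur])
        if opcode in instr_len:
            locctr[cur] += instr_len[opcode]
        elif opcode == "WORD":
            locctr[cur] += 3
        elif opcode == "RESW":
            locctr[cur] += 3 * int(operand)
        elif opcode == "RESB":
            locctr[cur] += int(operand)
        elif opcode == "BYTE":
            locctr[cur] += byte_length(operand)
        else:
            raise ValueError(f"invalid operation code {opcode}")
    # End of Pass 1: block length = final LOCCTR, blocks are laid out in order
    block_table, address = [], 0
    for length in locctr:
        block_table.append((address, length))
        address += length
    return symtab, block_table

program = [
    ("FIRST", "STL", "RETADR"),
    (None, "USE", "CDATA"),
    ("RETADR", "RESW", "1"),
    ("LENGTH", "RESW", "1"),
    (None, "USE", "CBLKS"),
    ("BUFFER", "RESB", "4096"),
    (None, "USE", None),
    ("RDREC", "CLEAR", "X"),
    (None, "USE", "CDATA"),
    ("INPUT", "BYTE", "X'F1'"),
]
symtab, block_table = pass1(program, {"STL": 3, "CLEAR": 2})
print(symtab)
print(block_table)
```

Because each block keeps its own counter in `locctr`, switching blocks with USE needs no explicit save and restore step: the counter of the block being left simply stays in the list.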

## Loading



The loader contacts the OS, and the OS tells it at which address to start loading.

The assembler generates the object program (Header record, Text records, Modification records and End record) and communicates with the loader only through this object program. The loader asks the operating system for a load address. The OS checks whether the memory at that address is free; if it is not, it tells the loader either to wait or to free space that is no longer needed. The loader then copies the object program from the hard disk into main memory and starts its execution.

→ When the program cannot be loaded at the address assumed at assembly time, something has to tell the loader which addresses to change. This is done by the assembler, not by the OS: the assembler instructs the loader, through the object program, which address fields must be changed.

i.e. lines 15/35/65: +JSUB RDREC

line 15  $\rightarrow$  the object code is 4B101036, starting at 0006.

The address field 01036 starts at 0007, in the middle of a byte, i.e. memory can be accessed byte by byte but not nibble by nibble.

0006 $\rightarrow$ 4B

0007  $\rightarrow$  10  $\rightarrow$  go here and modify the address field. This is requested by the assembler  $\therefore$  the object program contains the modification record M^000007^05

- $\rightarrow$ The loader must take information from both the assembler and the OS.
- $\rightarrow$ The assembler says "go to 0007 and modify", but the OS says the program is loaded at 5000  $\therefore$  the loader modifies at 5007.
- $\star \star \rightarrow$ +LDT #4096  $\rightarrow$  does not need a modification record  
 $\because$  it uses immediate addressing. Irrespective of relocation the value remains the same.

- $\rightarrow$ Execution is done by the microprocessor.
- $\rightarrow$ Relocation: all instructions work unchanged after relocation except Format 4 (F4) instructions that contain an actual address.

Ex: +JSUB RDREC  4B101036

TA = 01036  $\rightarrow$  correct when the program is loaded at 0000

- $\star \star$ If the program is loaded at 5000, it will not work properly  
 $\because$  the instruction still contains TA = 01036, although RDREC is now at 06036.
- The instruction must become +JSUB 06036  $\therefore$  a modification record is needed.
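What the loader does with such a modification record can be sketched as follows. This is an illustration under stated assumptions: the function name `apply_modification` is made up, memory is modelled as a Python `bytearray`, and the load address 5000 is taken to be hexadecimal.

```python
def apply_modification(memory, load_addr, rel_addr, half_bytes):
    """Add load_addr to an address field of `half_bytes` nibbles.

    rel_addr is the field's starting byte relative to the program start.
    """
    start = load_addr + rel_addr
    nbytes = (half_bytes + 1) // 2
    field = int.from_bytes(memory[start:start + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1          # low-order nibbles to modify
    new = (field & ~mask) | ((field + load_addr) & mask)
    memory[start:start + nbytes] = new.to_bytes(nbytes, "big")

LOAD_ADDR = 0x5000
memory = bytearray(0x6000)

# +JSUB RDREC (4B101036) sits at relative address 0006
memory[LOAD_ADDR + 6:LOAD_ADDR + 10] = bytes.fromhex("4B101036")

# Modification record: field starts at 0007 and is 5 half-bytes long
apply_modification(memory, LOAD_ADDR, 0x0007, 5)

print(memory[LOAD_ADDR + 6:LOAD_ADDR + 10].hex().upper())
```

Only the 20-bit address field changes (01036 becomes 06036); the leading nibble that holds the xbpe flags is left alone, which is why the field length is given in half-bytes.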

- In Pass 1, add the instruction length to LOCCTR as follows:
  - Memory reference (Format 3) - 3 bytes
  - Register-to-register (Format 2) - 2 bytes
  - Extended (Format 4) - 4 bytes
- Note: All literals are placed where LTORG appears in the program. If LTORG is not present, all literals are inserted at the end of the program. (Line no. 253)
- At line 92, block 1 (CDATA) starts ∴ the assembler saves the current LOCCTR value, i.e. 0027, in LOCCTR-0. It then starts assigning addresses from 0000 in block 1.
- At line 103, block 2 (CBLKS) starts ∴ it saves the LOCCTR value 0006 in the LOCCTR-1 column.
- At line 123, block 0 starts again. It restores the LOCCTR value 0027 and continues until line 180 (LOCCTR = 004D).
- At line 183, block 1 (CDATA) resumes by restoring the saved LOCCTR value 0006; after line 185 it saves LOCCTR = 0007.
- At line 208, it restores 004D and continues until line 245, ending with LOCCTR = 0066, which is stored in the LOCCTR-0 column.

### THREE LOCATION COUNTERS

| LOCCTR - 0<br>(Default block) | LOCCTR - 1<br>(CDATA block) | LOCCTR - 2<br>(CBLKS block) |
|-------------------------------|-----------------------------|-----------------------------|
| 0000                          | 0000                        | 0000                    |
| 0027                          | 0006                        | 1000                    |
| 004D                          | 0007                        |                         |
| 0066                          | 000B                        |                         |

### LITERAL TABLE

| Literal name | value of literal | length of literal | address of literal |
|--------------|------------------|-------------------|--------------------|
| =C'EOF'      | 454F46 (hex)     | 03                | 0007               |
| =X'05'       | 05               | 01                | 000A               |

(the next free location in CDATA is 000B)

### BLOCK TABLE

| Block Name | Block number | Address | Length |
|------------|--------------|---------|--------|
| Default    | 0            | 0000    | 0066   |
| CDATA      | 1            | 0066    | 000B   |
| CBLKS      | 2            | 0071    | 1000   |

$$\text{Program length} = 0066 + 000B + 1000 = 1071 \text{ (hex)}$$

### SYMBOL TABLE (with block number)

| Symbol Name | value | block no   |
|-------------|-------|------------|
| FIRST       | 0000  | 0          |
| CLOOP       | 0003  | 0          |
| ENDFIL      | 0015  | 0          |
| RETADR      | 0000  | 1          |
| LENGTH      | 0003  | 1          |
| BUFFER      | 0000  | 2          |
| BUFEND      | 1000  | 2          |
| MAXLEN      | 1000  | (absolute) |
| RDREC       | 0027  | 0          |
| RLOOP       | 0031  | 0          |
| EXIT        | 0047  | 0          |
| INPUT       | 0006  | 1          |
| WRREC       | 004D  | 0          |
| WLOOP       | 0052  | 0          |

→ Line 252 is USE CDATA (block 1) and line 253 is LTORG ∴ LOCCTR = 0007 is recorded as the location of line 253

i.e. 253 0007 1 LTORG

All literals are placed where LTORG appears in the program. We have two literals here, i.e.  $= C \text{ 'EOF'}$  (3 bytes) and  $= X \text{ '05'}$  (1 byte)  $\Rightarrow$  4 bytes in total.

They occupy 0007, 0008, 0009 and 000A  
 (E) (O) (F) (05)

so 000B is stored in the LOCCTR-1 column.

→ At line 105, in block 2, BUFFER reserves 4096 bytes (1000 in hex) of memory ∴ 1000 is saved in the LOCCTR-2 column

<sup>note</sup> → BUFEND EQU 100  $\Rightarrow$  the value of BUFEND would be 100  
 BUFEND EQU \*  $\Rightarrow$  the value is the current location  
 value = 0000 + 1000 = 1000

Store these values in the corresponding tables (symbol table and literal table).

10 0000 0 STL RETADR

→ RETADR is in block 1 ∴ add the  
size of block 0 (default block)

$$\begin{aligned} \text{displacement} &= \text{size of previous block(s)} + \text{TA} - \text{PC} \\ &= \text{size of B0} + \text{RETADR} - \text{PC} \\ &= 0066 + 0000 - 0003 \\ &= 0063 \end{aligned}$$



15 0003 0 JSUB RDREC  
RDREC is in block 0

$$\begin{aligned} \text{disp} &= \text{size of previous block(s)} + \text{TA} - \text{PC} \\ &= 0 + 0027 - 0006 = 0021 \end{aligned}$$



20 0006 0 LDA LENGTH  
LENGTH is in block 1

$$\begin{aligned} \text{disp} &= \text{size of previous block(s)} + TA - PC \\ &= 0066 + 0003 - 0009 = 0060 \end{aligned}$$



35 000F JSUB WRREC (opcode 48); WRREC belongs to block 0

⇒ as before

\* 45 0015 LDA =C'EOF'  
The literal belongs to CDATA, not to the default block,  
since it is placed after  
LTORG.

$$\begin{aligned} \text{disp} &= \text{size of previous block(s)} + TA - PC \\ &= \text{size of B0} + \text{address of C'EOF'} - PC \\ &= 0066 + 0007 - 0018 = 0055 \end{aligned}$$



50 0018 STA BUFFER  
BUFFER belongs to block 2 (CBLKS)

$$\begin{aligned}
 \text{disp} &= \text{size of } (B_0 + B_1) + \text{BUFFER} - PC \\
 &= \text{size of } (\text{default block} + \text{CDATA}) + \text{BUFFER} - PC \\
 &= 0066 + 000B + 0000 - 001B \\
 &= 0071 - 001B = 0056
 \end{aligned}$$
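The displacement calculations above can be cross-checked by assembling the Format 3 object code for each instruction. This is a sketch under assumptions: the helper name `format3` is made up, all five instructions lie in the default block (start 0000), and simple addressing (n = 1, i = 1) with PC-relative mode (p = 1) is used throughout.

```python
BLOCK_START = {0: 0x0000, 1: 0x0066, 2: 0x0071}
OPCODE = {"STL": 0x14, "JSUB": 0x48, "LDA": 0x00, "STA": 0x0C}

def format3(mnemonic, loc, target_block, target_rel):
    """Object code of a PC-relative Format 3 instruction at `loc` in block 0."""
    ta = BLOCK_START[target_block] + target_rel   # size of previous blocks + TA
    pc = loc + 3                                  # address of next instruction
    disp = (ta - pc) & 0xFFF                      # 12-bit displacement
    first_byte = OPCODE[mnemonic] | 0b11          # n = 1, i = 1
    xbpe = 0b0010                                 # only p (PC-relative) is set
    return f"{first_byte:02X}{xbpe:X}{disp:03X}"

examples = [
    ("STL", 0x0000, 1, 0x0000),   # line 10: STL RETADR
    ("JSUB", 0x0003, 0, 0x0027),  # line 15: JSUB RDREC
    ("LDA", 0x0006, 1, 0x0003),   # line 20: LDA LENGTH
    ("LDA", 0x0015, 1, 0x0007),   # line 45: LDA =C'EOF'
    ("STA", 0x0018, 2, 0x0000),   # line 50: STA BUFFER
]
codes = [format3(*e) for e in examples]
print(codes)
```

The results agree with the object code column of Fig. 2.12(a) for lines 10, 15, 20, 45 and 50.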



### object program

H ^ COPY ^ 000000 ^ 001071

1  T ^ 000000 ^ 1E ^ 172063 ^ 4B2021 ^ 032060 ^ 290000 ^ 332006 ^ 4B203B ^ 3F2FEE  
 ^ 032055 ^ 0F2056 ^ 010003

2  T ^ 00001E ^ 09 ^ 0F2048 ^ 4B2029 ^ 3E203F

3  T ^ 000027 ^ 1D ^ B410 ^ B400 ^ B440 ^ 75101000 ^ E32038 ^ 332FFA  
 ^ DB2032 ^ A004 ^ 332008 ^ 57A02F ^ B850

4  T ^ 000044 ^ 09 ^ 3B2FEA ^ 13201F ^ 4F0000

5  T ^ 00006C ^ 01 ^ F1

6  T ^ 00004D ^ 19 ^ B410 ^ 772017 ^ E3201B ^ 332FFA ^ 53A016 ^ DF2012 ^ B850  
 ^ 3B2FEF ^ 4F0000

7  T ^ 00006D ^ 04 ^ 454F46 ^ 05

E ^ 000000

# Loading the object program into memory


| Address | 0  | 1  | 2  | 3  | 4  | 5  | 6        | 7  | 8  | 9        | A  | B  | C  | D  | E  | F  |
|---------|----|----|----|----|----|----|----------|----|----|----------|----|----|----|----|----|----|
| 0000    | 17 | 20 | 63 | 4B | 20 | 21 | 03       | 20 | 60 | 29       | 00 | 00 | 33 | 20 | 06 | 4B |
| 0010    | 20 | 3B | 3F | 2F | EE | 03 | 20       | 55 | 0F | 20       | 56 | 01 | 00 | 03 | 0F | 20 |
| 0020    | 48 | 4B | 20 | 29 | 3E | 20 | 3F       | B4 | 10 | B4       | 00 | B4 | 40 | 75 | 10 | 10 |
| 0030    | 00 | E3 | 20 | 38 | 33 | 2F | FA       | DB | 20 | 32       | A0 | 04 | 33 | 20 | 08 | 57 |
| 0040    | A0 | 2F | B8 | 50 | 3B | 2F | EA       | 13 | 20 | 1F       | 4F | 00 | 00 | B4 | 10 | 77 |
| 0050    | 20 | 17 | E3 | 20 | 1B | 33 | 2F       | FA | 53 | A0       | 16 | DF | 20 | 12 | B8 | 50 |
| 0060    | 3B | 2F | EF | 4F | 00 | 00 | (RETADR) |    |    | (LENGTH) |    |    | F1 | 45 | 4F | 46 |
| 0070    | 05 | (BUFFER, 4096 bytes, starts here) | | | | | | | | | | | | | | |
| 0080    |    |    |    |    |    |    |          |    |    |          |    |    |    |    |    |    |
| 0090    |    |    |    |    |    |    |          |    |    |          |    |    |    |    |    |    |
| ⋮       |    |    |    |    |    |    |          |    |    |          |    |    |    |    |    |    |
| 1050    |    |    |    |    |    |    |          |    |    |          |    |    |    |    |    |    |
| 1060    |    |    |    |    |    |    |          |    |    |          |    |    |    |    |    |    |
| 1070    |    |    |    |    |    |    |          |    |    |          |    |    |    |    |    |    |
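How Text records end up at these memory locations can be sketched with a tiny loader. The function name `load` is an assumption, the records are written without field separators (1 column record type, 6 hex digits start address, 2 hex digits length, then the object code), and only three of the Text records are included to keep the example short.

```python
def load(records, size, load_addr=0):
    """Copy the object code of each Text record to its stated address."""
    memory = bytearray(size)
    for rec in records:
        if not rec.startswith("T"):
            continue                       # Header and End records carry no code
        start = int(rec[1:7], 16)          # address relative to program start
        length = int(rec[7:9], 16)         # number of bytes in this record
        data = bytes.fromhex(rec[9:9 + 2 * length])
        memory[load_addr + start:load_addr + start + length] = data
    return memory

records = [
    "T00001E090F20484B20293E203F",   # end of the main routine (default block)
    "T00006C01F1",                   # INPUT, in CDATA
    "T00006D04454F4605",             # literals =C'EOF' and =X'05', in CDATA
]
mem = load(records, 0x1071)
print(mem[0x1E:0x27].hex().upper())
print(mem[0x6C:0x71].hex().upper())
```

The records need not arrive in address order: each one carries its own load address, so the pieces of each block fall into place without any physical rearrangement by the assembler.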

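How does the microprocessor execute an instruction?

Ex: 10 0000 STL RETADR 172063, with L = 666600

⇒ store the contents of the linkage register (L) into location RETADR

⇒ opcode for STL = 14 (known by the microprocessor)

Object code 172063: 17 = opcode 14 with n = 1, i = 1; 2 = flags xbpe = 0010 (PC-relative); 063 = displacement

$$TA = PC + disp = 0003 + 063 = \underline{\underline{0066}} \quad (\text{RETADR})$$

i.e. STL 0066 ⇒ copy the contents of the linkage register (24 bits) into address 0066

Note: If the starting address (location) is changed from 0000 to 5000, the instruction works as usual, because PC and TA both move by the same amount and the displacement stays the same. We don't have to change the address, i.e. no modification record is needed; the object code remains as before.  $\rightarrow$  advantage of program blocks, which keep references within reach of PC-relative addressing.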
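The decoding steps above can be written as a short sketch. The function name `decode_format3` is an assumption, and only PC-relative and direct Format 3 instructions are handled (base-relative and indexed addressing are left out).

```python
def decode_format3(obj_hex, loc):
    """Split a Format 3 instruction into opcode, flags and target address."""
    word = int(obj_hex, 16)
    opcode = (word >> 16) & 0xFC            # top 6 bits of the first byte
    n = (word >> 17) & 1
    i = (word >> 16) & 1
    x, b, p, e = [(word >> shift) & 1 for shift in (15, 14, 13, 12)]
    disp = word & 0xFFF
    if p and disp & 0x800:                  # PC-relative disp is signed (12 bits)
        disp -= 0x1000
    pc = loc + 3                            # PC already points to next instruction
    ta = pc + disp if p else disp
    return opcode, (n, i, x, b, p, e), ta

opcode, flags, ta = decode_format3("172063", loc=0x0000)
print(f"opcode = {opcode:02X}, nixbpe = {flags}, TA = {ta:04X}")

# Same instruction after relocation to 5000: TA moves with the program
_, _, ta_relocated = decode_format3("172063", loc=0x5000)
print(f"TA after relocation = {ta_relocated:04X}")
```

The second call illustrates the note above: the object code is unchanged, yet the target address follows the program to its new location.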

### 2.3.5 Control sections and Program linking

- A control section is a part of the program that maintains its identity after assembly.
- Each control section can be loaded and relocated independently of the others.
- Control sections are usually used for subroutines or other logical subdivisions of a program.
- The programmer can assemble, load and manipulate each of these control sections separately.
- \* → The assembler directive CSECT indicates the beginning of a control section; each control section has its own location counter, which starts separately.
- \* → When control sections form logically related parts of a program, it is necessary to provide some means for linking them together. This is because instructions in one control section may need to refer to instructions or data located in another control section, and the assembler has no idea where exactly the control sections will be located at execution time.

- Such references between control sections are called external references.
  - The assembler generates information for each external reference that will allow the loader to perform the required linking.
  - Two assembler directives identify external symbols:
- (1) External Definition (EXTDEF)  
 . symbols that are defined in this control section and may be used by other sections  
 syntax: EXTDEF name [,name]  
 Ex: EXTDEF BUFFER, BUFEND

- (2) External Reference (EXTREF)  
 . symbols that are used in this control section but are defined in some other control section  
 syntax: EXTREF name [,name]  
 Ex: EXTREF RDREC, WRREC

Note: To reference an external symbol, an extended format instruction (Format 4) is needed.

| Line | Loc  | Source statement |                                        |                        | Object code |
|------|------|------------------|----------------------------------------|------------------------|-------------|
| 5    | 0000 | COPY             | START                                  | 0                      |             |
| 6    |      |                  | EXTDEF                                 | BUFFER, BUFEND, LENGTH |             |
| 7    |      |                  | EXTREF                                 | RDREC, WRREC           |             |
| 10   | 0000 | FIRST            | STL                                    | RETADR                 | 172027      |
| 15   | 0003 | CLOOP            | +JSUB                                  | RDREC                  | 4B100000    |
| 20   | 0007 |                  | LDA                                    | LENGTH                 | 032023      |
| 25   | 000A |                  | COMP                                   | #0                     | 290000      |
| 30   | 000D |                  | JEQ                                    | ENDFIL                 | 332007      |
| 35   | 0010 |                  | +JSUB                                  | WRREC                  | 4B100000    |
| 40   | 0014 |                  | J                                      | CLOOP                  | 3F2FEC      |
| 45   | 0017 | ENDFIL           | LDA                                    | =C'EOF'                | 032016      |
| 50   | 001A |                  | STA                                    | BUFFER                 | 0F2016      |
| 55   | 001D |                  | LDA                                    | #3                     | 010003      |
| 60   | 0020 |                  | STA                                    | LENGTH                 | 0F200A      |
| 65   | 0023 |                  | +JSUB                                  | WRREC                  | 4B100000    |
| 70   | 0027 |                  | J                                      | @RETADR                | 3E2000      |
| 95   | 002A | RETADR           | RESW                                   | 1                      |             |
| 100  | 002D | LENGTH           | RESW                                   | 1                      |             |
| 103  |      |                  | LTORG                                  |                        |             |
|      | 0030 | *                | =C'EOF'                                |                        | 454F46      |
| 105  | 0033 | BUFFER           | RESB                                   | 4096                   |             |
| 106  | 1033 | BUFEND           | EQU                                    | *                      |             |
| 107  | 1000 | MAXLEN           | EQU                                    | BUFEND-BUFFER          |             |
| 109  | 0000 | RDREC            | CSECT                                  |                        |             |
| 110  |      | .                |                                        |                        |             |
| 115  |      | .                | SUBROUTINE TO READ RECORD INTO BUFFER  |                        |             |
| 120  |      | .                |                                        |                        |             |
| 122  |      |                  | EXTREF                                 | BUFFER, LENGTH, BUFEND |             |
| 125  | 0000 |                  | CLEAR                                  | X                      | B410        |
| 130  | 0002 |                  | CLEAR                                  | A                      | B400        |
| 132  | 0004 |                  | CLEAR                                  | S                      | B440        |
| 133  | 0006 |                  | LDT                                    | MAXLEN                 | 77201F      |
| 135  | 0009 | RLOOP            | TD                                     | INPUT                  | E3201B      |
| 140  | 000C |                  | JEQ                                    | RLOOP                  | 332FFA      |
| 145  | 000F |                  | RD                                     | INPUT                  | DB2015      |
| 150  | 0012 |                  | COMPR                                  | A, S                   | A004        |
| 155  | 0014 |                  | JEQ                                    | EXIT                   | 332009      |
| 160  | 0017 |                  | +STCH                                  | BUFFER, X              | 57900000    |
| 165  | 001B |                  | TIXR                                   | T                      | B850        |
| 170  | 001D |                  | JLT                                    | RLOOP                  | 3B2FE9      |
| 175  | 0020 | EXIT             | +STX                                   | LENGTH                 | 13100000    |
| 180  | 0024 |                  | RSUB                                   |                        | 4F0000      |
| 185  | 0027 | INPUT            | BYTE                                   | X'F1'                  | F1          |
| 190  | 0028 | MAXLEN           | WORD                                   | BUFEND-BUFFER          | 000000      |
| 193  | 0000 | WRREC            | CSECT                                  |                        |             |
| 195  |      | .                |                                        |                        |             |
| 200  |      | .                | SUBROUTINE TO WRITE RECORD FROM BUFFER |                        |             |
| 205  |      | .                |                                        |                        |             |
| 207  |      |                  | EXTREF                                 | LENGTH, BUFFER         |             |
| 210  | 0000 |                  | CLEAR                                  | X                      | B410        |
| 212  | 0002 |                  | +LDT                                   | LENGTH                 | 77100000    |
| 215  | 0006 | WLOOP            | TD                                     | =X'05'                 | E32012      |
| 220  | 0009 |                  | JEQ                                    | WLOOP                  | 332FFA      |
| 225  | 000C |                  | +LDCH                                  | BUFFER, X              | 53900000    |
| 230  | 0010 |                  | WD                                     | =X'05'                 | DF2008      |
| 235  | 0013 |                  | TIXR                                   | T                      | B850        |
| 240  | 0015 |                  | JLT                                    | WLOOP                  | 3B2FEE      |
| 245  | 0018 |                  | RSUB                                   |                        | 4F0000      |
| 255  |      |                  | END                                    | FIRST                  |             |
|      | 001B | *                | =X'05'                                 |                        | 05          |

Figure 2.16 Program from Fig. 2.15 with object code.

| Line | Source statement |        |                      |                                        |
|------|------------------|--------|----------------------|----------------------------------------|
| 5    | COPY             | START  | 0                    | COPY FILE FROM INPUT TO OUTPUT         |
| 6    |                  | EXTDEF | BUFFER,BUFEND,LENGTH |                                        |
| 7    |                  | EXTREF | RDREC,WRREC          |                                        |
| 10   | FIRST            | STL    | RETADR               | SAVE RETURN ADDRESS                    |
| 15   | CLOOP            | +JSUB  | RDREC                | READ INPUT RECORD                      |
| 20   |                  | LDA    | LENGTH               | TEST FOR EOF (LENGTH = 0)              |
| 25   |                  | COMP   | #0                   |                                        |
| 30   |                  | JEQ    | ENDFIL               | EXIT IF EOF FOUND                      |
| 35   |                  | +JSUB  | WRREC                | WRITE OUTPUT RECORD                    |
| 40   |                  | J      | CLOOP                | LOOP                                   |
| 45   | ENDFIL           | LDA    | =C'EOF'              | INSERT END OF FILE MARKER              |
| 50   |                  | STA    | BUFFER               |                                        |
| 55   |                  | LDA    | #3                   | SET LENGTH = 3                         |
| 60   |                  | STA    | LENGTH               |                                        |
| 65   |                  | +JSUB  | WRREC                | WRITE EOF                              |
| 70   |                  | J      | @RETADR              | RETURN TO CALLER                       |
| 95   | RETADR           | RESW   | 1                    |                                        |
| 100  | LENGTH           | RESW   | 1                    | LENGTH OF RECORD                       |
| 103  |                  | LTORG  |                      |                                        |
| 105  | BUFFER           | RESB   | 4096                 | 4096-BYTE BUFFER AREA                  |
| 106  | BUFEND           | EQU    | *                    |                                        |
| 107  | MAXLEN           | EQU    | BUFEND-BUFFER        |                                        |
| 109  | RDREC            | CSECT  |                      |                                        |
| 110  | .                |        |                      |                                        |
| 115  | .                |        |                      | SUBROUTINE TO READ RECORD INTO BUFFER  |
| 120  | .                |        |                      |                                        |
| 122  |                  | EXTREF | BUFFER,LENGTH,BUFEND |                                        |
| 125  |                  | CLEAR  | X                    | CLEAR LOOP COUNTER                     |
| 130  |                  | CLEAR  | A                    | CLEAR A TO ZERO                        |
| 132  |                  | CLEAR  | S                    | CLEAR S TO ZERO                        |
| 133  |                  | LDT    | MAXLEN               |                                        |
| 135  | RLOOP            | TD     | INPUT                | TEST INPUT DEVICE                      |
| 140  |                  | JEQ    | RLOOP                | LOOP UNTIL READY                       |
| 145  |                  | RD     | INPUT                | READ CHARACTER INTO REGISTER A         |
| 150  |                  | COMPR  | A,S                  | TEST FOR END OF RECORD (X'00')         |
| 155  |                  | JEQ    | EXIT                 | EXIT LOOP IF EOR                       |
| 160  |                  | +STCH  | BUFFER,X             | STORE CHARACTER IN BUFFER              |
| 165  |                  | TIXR   | T                    | LOOP UNLESS MAX LENGTH                 |
| 170  |                  | JLT    | RLOOP                | HAS BEEN REACHED                       |
| 175  | EXIT             | +STX   | LENGTH               | SAVE RECORD LENGTH                     |
| 180  |                  | RSUB   |                      | RETURN TO CALLER                       |
| 185  | INPUT            | BYTE   | X'F1'                | CODE FOR INPUT DEVICE                  |
| 190  | MAXLEN           | WORD   | BUFEND-BUFFER        |                                        |
| 193  | WRREC            | CSECT  |                      |                                        |
| 195  | .                |        |                      |                                        |
| 200  | .                |        |                      | SUBROUTINE TO WRITE RECORD FROM BUFFER |
| 205  | .                |        |                      |                                        |
| 207  |                  | EXTREF | LENGTH,BUFFER        |                                        |
| 210  |                  | CLEAR  | X                    | CLEAR LOOP COUNTER                     |
| 212  |                  | +LDT   | LENGTH               |                                        |
| 215  | WLOOP            | TD     | =X'05'               | TEST OUTPUT DEVICE                     |
| 220  |                  | JEQ    | WLOOP                | LOOP UNTIL READY                       |
| 225  |                  | +LDCH  | BUFFER,X             | GET CHARACTER FROM BUFFER              |
| 230  |                  | WD     | =X'05'               | WRITE CHARACTER                        |
| 235  |                  | TIXR   | T                    | LOOP UNTIL ALL CHARACTERS              |
| 240  |                  | JLT    | WLOOP                | HAVE BEEN WRITTEN                      |
| 245  |                  | RSUB   |                      | RETURN TO CALLER                       |
| 255  |                  | END    | FIRST                |                                        |

Figure 2.15 Illustration of control sections and program linking.

In Fig 2.16, there are three control sections.

- 1) main program $\rightarrow$ COPY, from line 5 to line 107
- 2) read subroutine $\rightarrow$ RDREC, from line 109 to line 190
- 3) write subroutine $\rightarrow$ WRREC, from line 193 to line 255

- $\rightarrow$  Assembler establishes a separate location counter (beginning at 0) for each control section
- $\rightarrow$  Control sections named COPY, RDREC, WRREC are not named in EXTDEF because they are automatically considered to be external symbols.
- $\rightarrow$  Assembler handles the external references as follows

a) 15 0003 CLOOP +JSUB RDREC 4B100000 (RDREC is named in EXTREF)

. The assembler does not know the address of RDREC, so it inserts an address of zero (the 20-bit address field of 4B100000 is 00000) and passes information to the loader, which inserts the proper address during loading. The address of RDREC will have no predictable relationship to anything in the control section named COPY, therefore relative addressing is not possible. Thus an extended format instruction must be used to provide room for the actual address to be inserted.

- b) 160 0017 +STCH BUFFER,X 57900000  
 → BUFFER is used in the RDREC control section  
 but defined in the COPY control section.
- c) 190 0028 MAXLEN WORD BUFEND-BUFFER 000000  
 → there are two external references in the expression,  
 BUFEND and BUFFER, in the WORD statement.  
 → The assembler inserts a value of zero and  
 passes information to the loader to add  
 to this data area the address of BUFEND  
 and subtract from this data area the  
 address of BUFFER, which results in the  
 desired value.
- d) 107 1000 MAXLEN EQU BUFEND-BUFFER  
 → Both expressions (lines 107 and 190) look the same, but the  
 difference is that here BUFEND and BUFFER  
 are defined and used in the same control  
 section, so the value can be calculated  
 immediately.

$$\text{MAXLEN EQU } 1033 - 0033 = \underline{1000}$$


A reference to MAXLEN in the COPY control section will use the definition on line 107, whereas a reference to MAXLEN in RDREC control section will use the definition on line 190.
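The difference between lines 107 and 190 can be sketched in Python. The function `evaluate_expression` and its tuple layout are assumptions made for illustration only; locally defined terms are assumed to be paired so that the expression is absolute, as in BUFEND-BUFFER.

```python
def evaluate_expression(terms, symtab, extref, loc):
    """Evaluate a sum of signed symbols for one control section.

    terms  : list of (sign, name), e.g. [(+1, "BUFEND"), (-1, "BUFFER")]
    symtab : symbols defined in THIS control section (name -> address)
    extref : names listed in this section's EXTREF
    loc    : address of the field that holds the value
    Returns (value, modifications); each modification is a tuple
    (address, length in half-bytes, flag, symbol) left for the loader.
    """
    value = 0
    modifications = []
    for sign, name in terms:
        if name in symtab:                      # known now: fold into the value
            value += sign * symtab[name]
        elif name in extref:                    # unknown: leave zero, tell the loader
            modifications.append((loc, 6, "+" if sign > 0 else "-", name))
        else:
            raise KeyError(f"undefined symbol: {name}")
    return value & 0xFFFFFF, modifications


expr = [(+1, "BUFEND"), (-1, "BUFFER")]

# Line 107 in COPY: both symbols are local, so the value is known at once.
copy_value, copy_mods = evaluate_expression(
    expr, {"BUFFER": 0x0033, "BUFEND": 0x1033}, set(), 0)

# Line 190 in RDREC: both symbols are external, so the word stays 000000.
rdrec_value, rdrec_mods = evaluate_expression(
    expr, {}, {"BUFFER", "LENGTH", "BUFEND"}, 0x0028)

print(f"{copy_value:04X}", copy_mods)
print(f"{rdrec_value:06X}", rdrec_mods)
```

In COPY the value 1000 is produced immediately with nothing left for the loader, while in RDREC the word is assembled as 000000 with two pending modifications at address 0028.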

→ Object program

Along with the Header, Text, End and Modification records, two more record types are added, and the Modification record is revised.

a) Define record → information about the external symbols defined in this control section (those named in EXTDEF)

|            |                                                            |
|------------|------------------------------------------------------------|
| col. 1     | D                                                          |
| col. 2-7   | Name of external symbol defined in this control section    |
| col. 8-13  | Relative address within this control section (hexadecimal) |
| col. 14-73 | Repeat information in col. 2-13 for other external symbols |

b) Refer record → symbols that are used as external references in this control section (those named in EXTREF)

|           |                                                          |
|-----------|----------------------------------------------------------|
| col. 1    | R                                                        |
| col. 2-7  | Name of external symbol referred to in this control section |
| col. 8-73 | Names of other external reference symbols                |

c) Modification record (revised)

|            |                                                                                      |
|------------|--------------------------------------------------------------------------------------|
| col. 1     | M                                                                                    |
| col. 2-7   | Starting address of the field to be modified, relative to the beginning of the control section (hexadecimal) |
| col. 8-9   | Length of the field to be modified, in half-bytes (hexadecimal)                      |
| col. 10    | Modification flag (+ or -)                                                           |
| col. 11-16 | External symbol whose value is to be added to or subtracted from the indicated field |
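The column layouts above can be illustrated with a small Python sketch. The helper names `define_record`, `refer_record` and `modification_record` are hypothetical; the records are built as fixed-width strings with no separator characters, following the column positions in the tables.

```python
def define_record(symbols):
    """D record: col 1 'D', then (6-char name, 6-hex-digit address) pairs."""
    return "D" + "".join(f"{name:<6}{addr:06X}" for name, addr in symbols)


def refer_record(names):
    """R record: col 1 'R', then 6-character external symbol names."""
    return "R" + "".join(f"{name:<6}" for name in names)


def modification_record(addr, half_bytes, flag, symbol):
    """M record: address (col 2-7), length (col 8-9), flag (col 10), symbol (col 11-16)."""
    return f"M{addr:06X}{half_bytes:02X}{flag}{symbol:<6}"


# Records for the COPY control section of Fig 2.16.
d_rec = define_record([("BUFFER", 0x0033), ("BUFEND", 0x1033), ("LENGTH", 0x002D)])
r_rec = refer_record(["RDREC", "WRREC"])
m_rec = modification_record(0x000004, 5, "+", "RDREC")

print(d_rec)
print(r_rec)
print(m_rec)
```

Names shorter than six characters are padded with blanks so that every field starts in the column the table specifies.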

## COPY

H^COPY ^000000^001033

D^BUFFER^000033^BUFEND^001033^LENGTH^00002D

R^RDREC ^WRREC

T^000000^1D^172027^4B100000^032023^290000^332007^4B100000^3F2FEC^032016^0F2016

T^00001D^0D^010003^0F200A^4B100000^3E2000

T^000030^03^454F46

M^000004^05^+RDREC

M^000011^05^+WRREC

M^000024^05^+WRREC

E^000000

## RDREC

H^RDREC ^000000^00002B

R^BUFFER^LENGTH^BUFEND

T^000000^1D^B410^B400^B440^77201F^E3201B^332FFA^DB2015^A004^332009^57900000^B850

T^00001D^0E^3B2FE9^13100000^4F0000^F1^000000

M^000018^05^+BUFFER

M^000021^05^+LENGTH

M^000028^06^+BUFEND

M^000028^06^-BUFFER

E

## WRREC

H^WRREC ^000000^00001C

R^LENGTH^BUFFER

T^000000^1C^B410^77100000^E32012^332FFA^53900000^DF2008^B850^3B2FEE^4F0000^05

M^000003^05^+LENGTH

M^00000D^05^+BUFFER

E

## 2.4 Assembler Program Options

We will learn two alternatives of the standard two-pass assembler.

a) One-pass Assemblers

b) Multi-pass Assemblers

### a) One-pass Assemblers

→ As we already know, assembling forward references is difficult because the addresses are not yet known. For data items this problem can be eliminated easily: data items are defined in the source program before they are referenced.

(Ex: declaration of data variables in C.)

→ It is not the same for labels on instructions.

Ex: If the program has a forward jump, i.e. skipping out of a loop after testing some condition, the label cannot be defined beforehand. Therefore the assembler has to provide a way to handle forward references.

- Two types of one-pass assemblers.
  - (i) Load-and-go assembler: the assembler produces object code directly in memory for immediate execution.
- Here the object program is not written out and no loader is needed.
- Application: program development and testing.
- Ex: A university computing system for students.
  - Here a large fraction of the total workload consists of program translation. Because programs are re-assembled nearly every time they are run, the efficiency of the assembly process is an important consideration.
- A load-and-go assembler avoids the overhead of writing the object program out (to secondary storage) and reading it back in. Because the object code stays in memory, forward references can be handled easily.
- Avoids usage of forward references.
- Used on systems where external working-storage devices are either slow or not available.

| Line | Loc  | Source statement |                                        |          | Object code |
|------|------|------------------|----------------------------------------|----------|-------------|
| 0    | 1000 | COPY             | START                                  | 1000     |             |
| 1    | 1000 | EOF              | BYTE                                   | C'EOF'   | 454F46      |
| 2    | 1003 | THREE            | WORD                                   | 3        | 000003      |
| 3    | 1006 | ZERO             | WORD                                   | 0        | 000000      |
| 4    | 1009 | RETADR           | RESW                                   | 1        |             |
| 5    | 100C | LENGTH           | RESW                                   | 1        |             |
| 6    | 100F | BUFFER           | RESB                                   | 4096     |             |
| 9    |      | .                |                                        |          |             |
| 10   | 200F | FIRST            | STL                                    | RETADR   | 141009      |
| 15   | 2012 | CLOOP            | JSUB                                   | RDREC    | 48203D      |
| 20   | 2015 |                  | LDA                                    | LENGTH   | 00100C      |
| 25   | 2018 |                  | COMP                                   | ZERO     | 281006      |
| 30   | 201B |                  | JEQ                                    | ENDFIL   | 302024      |
| 35   | 201E |                  | JSUB                                   | WRREC    | 482062      |
| 40   | 2021 |                  | J                                      | CLOOP    | 3C2012      |
| 45   | 2024 | ENDFIL           | LDA                                    | EOF      | 001000      |
| 50   | 2027 |                  | STA                                    | BUFFER   | 0C100F      |
| 55   | 202A |                  | LDA                                    | THREE    | 001003      |
| 60   | 202D |                  | STA                                    | LENGTH   | 0C100C      |
| 65   | 2030 |                  | JSUB                                   | WRREC    | 482062      |
| 70   | 2033 |                  | LDL                                    | RETADR   | 081009      |
| 75   | 2036 |                  | RSUB                                   |          | 4C0000      |
| 110  |      | .                |                                        |          |             |
| 115  |      | .                | SUBROUTINE TO READ RECORD INTO BUFFER  |          |             |
| 120  |      | .                |                                        |          |             |
| 121  | 2039 | INPUT            | BYTE                                   | X'F1'    | F1          |
| 122  | 203A | MAXLEN           | WORD                                   | 4096     | 001000      |
| 124  |      | .                |                                        |          |             |
| 125  | 203D | RDREC            | LDX                                    | ZERO     | 041006      |
| 130  | 2040 |                  | LDA                                    | ZERO     | 001006      |
| 135  | 2043 | RLOOP            | TD                                     | INPUT    | E02039      |
| 140  | 2046 |                  | JEQ                                    | RLOOP    | 302043      |
| 145  | 2049 |                  | RD                                     | INPUT    | D82039      |
| 150  | 204C |                  | COMP                                   | ZERO     | 281006      |
| 155  | 204F |                  | JEQ                                    | EXIT     | 30205B      |
| 160  | 2052 |                  | STCH                                   | BUFFER,X | 54900F      |
| 165  | 2055 |                  | TIX                                    | MAXLEN   | 2C203A      |
| 170  | 2058 |                  | JLT                                    | RLOOP    | 382043      |
| 175  | 205B | EXIT             | STX                                    | LENGTH   | 10100C      |
| 180  | 205E |                  | RSUB                                   |          | 4C0000      |
| 195  |      | .                | SUBROUTINE TO WRITE RECORD FROM BUFFER |          |             |
| 200  |      | .                |                                        |          |             |
| 205  | 2061 | OUTPUT           | BYTE                                   | X'05'    | 05          |
| 207  |      | .                |                                        |          |             |
| 210  | 2062 | WRREC            | LDX                                    | ZERO     | 041006      |
| 215  | 2065 | WLOOP            | TD                                     | OUTPUT   | E02061      |
| 220  | 2068 |                  | JEQ                                    | WLOOP    | 302065      |
| 225  | 206B |                  | LDCH                                   | BUFFER,X | 50900F      |
| 230  | 206E |                  | WD                                     | OUTPUT   | DC2061      |
| 235  | 2071 |                  | TIX                                    | LENGTH   | 2C100C      |
| 240  | 2074 |                  | JLT                                    | WLOOP    | 382065      |
| 245  | 2077 |                  | RSUB                                   |          | 4C0000      |
| 255  |      |                  | END                                    | FIRST    |             |

Figure 2.18 Sample program for a one-pass assembler.

(ii) Assembler generating object code.

Working process of the load-and-go assembler:

- ↳ Fig 2.18 shows an example program for a one-pass assembler.
- ↳ Here all data item definitions are placed ahead of the code that references them.

Ex:

| Line | Loc  | Label  | Opcode | Operand |
|------|------|--------|--------|---------|
| 1    | 1000 | EOF    | BYTE   | C'EOF'  |
| 2    | 1003 | THREE  | WORD   | 3       |
| 3    | 1006 | ZERO   | WORD   | 0       |
| ...  |      |        |        |         |
| 6    | 100F | BUFFER | RESB   | 4096    |

- ↳ The assembler generates object code as it scans the source program.
- ↳ If an instruction operand is a symbol that has not yet been defined (forward reference), the operand address is omitted during assembly.
- ↳ The symbol is entered into the symbol table (if it is not already there) along with a flag indicating that it is undefined.
- ↳ The address of the operand field of the instruction that refers to the undefined symbol is added to a list of forward references associated with the symbol table entry.

Ex:

| Line | Loc  | Label | Opcode | Operand | Object code |
|------|------|-------|--------|---------|-------------|
| 15   | 2012 | CLOOP | JSUB   | RDREC   | 480000      |

In memory: 2012 → 48, 2013 → operand address (not yet known).

| Symbol | Value | Forward reference list |
|--------|-------|------------------------|
| RDREC  | *     | 2013                   |
| LENGTH | 100C  |                        |

↳ 2013 is the address of the operand field that has to be filled in later.

↳ When the definition for a symbol is encountered, the forward reference list for that symbol (if one exists) is scanned and the proper address is inserted into any instructions previously generated.
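The forward-reference handling described above can be sketched in Python. The class `LoadAndGo` and its method names are hypothetical, memory is modelled as a dictionary of bytes, and indexed addressing is ignored for simplicity; this is an illustration of the idea, not a complete assembler.

```python
class LoadAndGo:
    """Toy model of forward-reference handling in a load-and-go assembler."""

    def __init__(self):
        self.memory = {}    # address -> byte value
        self.symtab = {}    # name -> address, or None while still undefined (*)
        self.fwd = {}       # name -> addresses of operand fields to patch later

    def _store_address(self, field, addr):
        # A SIC instruction keeps its operand address in the last two bytes.
        self.memory[field] = (addr >> 8) & 0xFF
        self.memory[field + 1] = addr & 0xFF

    def define(self, name, addr):
        """A label definition is encountered: record it and patch earlier uses."""
        self.symtab[name] = addr
        for field in self.fwd.pop(name, []):
            self._store_address(field, addr)

    def emit(self, loc, opcode, operand):
        """Generate a 3-byte instruction at loc that refers to operand."""
        self.memory[loc] = opcode
        target = self.symtab.get(operand)
        if target is None:                        # forward reference
            self.symtab.setdefault(operand, None)
            self.fwd.setdefault(operand, []).append(loc + 1)
            target = 0                            # address omitted for now
        self._store_address(loc + 1, target)

    def undefined(self):
        """Symbols still marked undefined at the end of the program."""
        return sorted(n for n, v in self.symtab.items() if v is None)


asm = LoadAndGo()
asm.define("CLOOP", 0x2012)
asm.emit(0x2012, 0x48, "RDREC")       # line 15: JSUB RDREC, RDREC not yet known
pending = list(asm.fwd["RDREC"])      # operand field waiting for the address
asm.define("RDREC", 0x203D)           # line 125: the definition is reached
patched = (asm.memory[0x2013], asm.memory[0x2014])
print([f"{a:04X}" for a in pending], [f"{b:02X}" for b in patched])
```

After the definition of RDREC is processed, the bytes at 2013 and 2014 hold 20 and 3D, so the instruction at 2012 reads 48203D.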

↳ Fig 2.19 shows the object code and symbol table entries as they stand after scanning up to line no. 40.

↳ 15 2012 CLOOP JSUB RDREC  
 ↳ RDREC is undefined,  
 ∴ its symbol table entry is marked \* and points to 2013.  
 2013 is the address of the operand field that has to be filled in once the definition is found further on.  
 ↳ Same for line nos. 30 and 35.

| Memory Address | Contents                            |
|----------------|-------------------------------------|
| 1000           | 454F4600 00030000 00xxxxxx xxxxxxxx |
| 1010           | xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx |
| :              |                                     |
| 2000           | xxxxxxxx xxxxxxxx xxxxxxxx xxxxxx14 |
| 2010           | 100948-- --00100C 28100630 ----48-- |
| 2020           | --3C2012                            |
| :              |                                     |

| SYMBOL TABLE |          |
|--------------|----------|
| LENGTH       | 100C     |
| RDREC        | * → 2013 |
| THREE        | 1003     |
| ZERO         | 1006     |
| WRREC        | * → 201F |
| EOF          | 1000     |
| ENDFIL       | * → 201C |
| RETADR       | 1009     |
| BUFFER       | 100F     |
| CLOOP        | 2012     |
| FIRST        | 200F     |

Fig 2.19: Object code in memory and symbol table entries after scanning line 40 of Fig 2.18

| Memory address | Contents |          |          |          |
|----------------|----------|----------|----------|----------|
| 1000           | 454F4600 | 00030000 | 00xxxxxx | xxxxxxxx |
| 1010           | xxxxxxxx | xxxxxxxx | xxxxxxxx | xxxxxxxx |
| :              |          |          |          |          |
| 2000           | xxxxxxxx | xxxxxxxx | xxxxxxxx | xxxxxx14 |
| 2010           | 10094820 | 3D00100C | 28100630 | 202448-- |
| 2020           | --3C2012 | 0010000C | 100F0010 | 030C100C |
| 2030           | 48----08 | 10094C00 | 00F10010 | 00041006 |
| 2040           | 001006E0 | 20393020 | 43D82039 | 28100630 |
| 2050           | ----5490 | 0F       |          |          |

| SYMBOL TABLE |                 |
|--------------|-----------------|
| LENGTH       | 100C            |
| RDREC        | 203D            |
| THREE        | 1003            |
| ZERO         | 1006            |
| WRREC        | * → 201F → 2031 |
| EOF          | 1000            |
| ENDFIL       | 2024            |
| RETADR       | 1009            |
| BUFFER       | 100F            |
| CLOOP        | 2012            |
| FIRST        | 200F            |
| MAXLEN       | 203A            |
| INPUT        | 2039            |
| EXIT         | * → 2050        |
| RLOOP        | 2043            |

Fig : Object code in memory and symbol table entries after scanning line 160 of Fig 2.18

| Memory address | Contents |          |          |          |
|----------------|----------|----------|----------|----------|
| 1000           | 454F4600 | 00030000 | 00xxxxxx | xxxxxxxx |
| 1010           | xxxxxxxx | xxxxxxxx | xxxxxxxx | xxxxxxxx |
| :              |          |          |          |          |
| 2000           | xxxxxxxx | xxxxxxxx | xxxxxxxx | xxxxxx14 |
| 2010           | 10094820 | 3D00100C | 28100630 | 20244820 |
| 2020           | 623C2012 | 0010000C | 100F0010 | 030C100C |
| 2030           | 48206208 | 10094C00 | 00F10010 | 00041006 |
| 2040           | 001006E0 | 20393020 | 43D82039 | 28100630 |
| 2050           | 205B5490 | 0F2C203A | 38204310 | 100C4C00 |
| 2060           | 00050410 | 06E02061 | 30206550 | 900FDC20 |
| 2070           | 612C100C | 3820654C | 0000     |          |

| SYMBOL TABLE |      |
|--------------|------|
| LENGTH       | 100C |
| RDREC        | 203D |
| THREE        | 1003 |
| ZERO         | 1006 |
| WRREC        | 2062 |
| EOF          | 1000 |
| ENDFIL       | 2024 |
| RETADR       | 1009 |
| BUFFER       | 100F |
| CLOOP        | 2012 |
| FIRST        | 200F |
| MAXLEN       | 203A |
| INPUT        | 2039 |
| EXIT         | 205B |
| RLOOP        | 2043 |
| OUTPUT       | 2061 |
| WLOOP        | 2065 |

Fig : Complete object program in memory and symbol table entries

Note: At the end of the program, any symbols in SYMTAB that are still marked with \* should be flagged by the assembler as errors (similar to the undefined symbol error reported in C once compilation is complete).

- ↳ The assembler searches SYMTAB for the value of the symbol named in the END statement and jumps to this location to begin execution of the assembled program.
- ↳ In a load-and-go assembler, the actual address must be known at assembly time.


```

begin
    read first input line
    if OPCODE = 'START' then
        begin
            save #[OPERAND] as starting address
            initialize LOCCTR as starting address
            read next input line
        end {if START}
    else
        initialize LOCCTR to 0
    while OPCODE ≠ 'END' do
        begin
            if this is not a comment line then
                begin
                    if there is a symbol in the LABEL field then
                        begin
                            search SYMTAB for LABEL
                            if found then
                                begin
                                    if symbol value is null then
                                        begin
                                            search the linked list of forward
                                                references for this symbol and
                                                store LOCCTR as the operand address
                                                at each address in the list
                                            set symbol value as LOCCTR in symbol
                                                table and delete the linked list
                                        end
                                end
                            else
                                insert (LABEL, LOCCTR) into SYMTAB
                        end
                    search OPTAB for OPCODE
                    if found then
                        begin
                            search SYMTAB for OPERAND address
                            if found then
                                if symbol value not equal to null then
                                    store symbol value as OPERAND address
                                else
                                    insert at the end of the linked list
                                        a node with address as LOCCTR
                            else
                                insert (symbol name, null) into SYMTAB

```

Figure 2.19(c) Algorithm for One pass assembler.

```

                            add 3 to LOCCTR
                        end
                    else if OPCODE = 'WORD' then
                        add 3 to LOCCTR and convert constant to
                            object code
                    else if OPCODE = 'RESW' then
                        add 3 * #[OPERAND] to LOCCTR
                    else if OPCODE = 'RESB' then
                        add #[OPERAND] to LOCCTR
                    else if OPCODE = 'BYTE' then
                        begin
                            find length of constant in bytes
                            add length to LOCCTR
                            convert constant to object code
                        end
                    if object code will not fit into current
                        Text record then
                        begin
                            write Text record to object program
                            initialize new Text record
                        end
                    add object code to Text record
                end {if not comment}
            write listing line
            read next input line
        end {while not END}
    write last Text record to object program
    write End record to object program
    write last listing line
end {Pass 1}

```

Figure 2.19(c) (cont'd)


v) Assembler generating object code.

↳ This type of one-pass assembler works in the same manner as the load-and-go assembler described before, except for what happens when the definition of a symbol is encountered.

↳ When a symbol definition is encountered, the instructions that made forward references to that symbol may no longer be available in memory for modification, because they may already have been written out as part of a Text record in the object program.

↳ In such situations, the assembler generates another Text record containing the correct operand address.

↳ When the program is loaded, this address is inserted into the instruction by the loader.

↳ The object program produced by such a one-pass assembler for the program in Fig. 2.18 is shown below.

`H^COPY^001000^00107A`

1 `T^001000^09^454F46^000003^000000`

2 `T^00200F^15^141009^480000^00100C^281006^300000^480000^3C2012`

3 `T^00201C^02^2024`

4 `T^002024^19^001000^0C100F^001003^0C100C^480000^081009^4C0000^F1^001000`

5 `T^002013^02^203D`

6 `T^00203D^1E^041006^001006^E02039^302043^D82039^281006^300000^54900F^2C203A^382043`

7 `T^002050^02^205B`

8 `T^00205B^07^10100C^4C0000^05`

9 `T^00201F^02^2062`

10 `T^002031^02^2062`

11 `T^002062^18^041006^E02061^302065^50900F^DC2061^2C100C^382065^4C0000`

`E^00200F`

- ↳ The second Text record contains the object code generated from lines 10 through 40 in Fig. 2.18.
- ↳ The operand addresses for the instructions on lines 15, 30 and 35 have been generated as 0000, since they are forward references.
- ↳ When the definition of ENDFIL on line 45 is encountered, the assembler generates the third Text record. It indicates that the value 2024 (the address of ENDFIL) is to be loaded at location 201C (the operand address field of the JEQ instruction on line 30).
- ↳ This continues for all the forward references encountered.
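The fix-up mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not the assembler from the notes: the function names, the dictionary standing in for the linked lists, and the caret-separated record format are all assumptions made for the example.

```python
# Illustrative sketch only: function and variable names are assumptions,
# not taken from the notes.

def reference_symbol(symtab, pending, name, operand_loc):
    """Return the operand address for `name`, or 0 if it is still undefined.

    For an undefined symbol, the location of the operand field is remembered
    in `pending` (the linked list of forward references in the algorithm).
    """
    if name in symtab:
        return symtab[name]
    pending.setdefault(name, []).append(operand_loc)
    return 0


def define_symbol(symtab, pending, name, locctr):
    """Define `name` and return one fix-up Text record per forward reference."""
    symtab[name] = locctr
    return [f"T^{loc:06X}^02^{locctr:04X}" for loc in pending.pop(name, [])]


symtab, pending = {}, {}
# JEQ ENDFIL is assembled at 201B, so its address field starts at 201C.
operand = reference_symbol(symtab, pending, "ENDFIL", 0x201C)
fixups = define_symbol(symtab, pending, "ENDFIL", 0x2024)
print(f"{operand:04X}", fixups)
```

The instruction is first assembled with operand 0000, and the later definition of ENDFIL produces a separate two-byte Text record that the loader applies at location 201C.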

## b) Multi-pass Assemblers

- ↳ We know that when the 'EQU' assembler directive is used, any symbol on the right-hand side must already have been defined earlier in the source program. This restriction cannot always be met, as in the following sequence of definitions.

```

    ALPHA    EQU    BETA
    BETA     EQU    DELTA
    DELTA    RESW   1
  
```

- ↳ As seen above, there are multiple forward references: ALPHA depends on the value of BETA, and BETA depends on the value of DELTA.
- ↳ An assembler that makes only two sequential passes over the source program cannot resolve such a sequence of definitions.
- ↳ To overcome this, we use a multi-pass assembler, which makes as many passes as are needed to process the definitions of symbols.
- ↳ It is not necessary for a multi-pass assembler to make more than two passes over the entire program.
- \* ↳ Instead, the portions of the program that involve forward references in symbol definitions are saved during pass 1. Additional passes through these stored definitions are made as the assembly progresses.

This process is followed by a normal pass 2.

↳ For each undefined symbol, SYMTAB stores the defining expression, the number of undefined symbols it depends on (written &n), and a list of the symbols whose values depend on it.

Ex:-

| Line | Loc  | Label  | Opcode | Operand                            |
|------|------|--------|--------|------------------------------------|
| 1    |      | HALFSZ | EQU    | MAXLEN/2                           |
| 2    |      | MAXLEN | EQU    | BUFEND - BUFFER                    |
| 3    |      | PREVBT | EQU    | BUFFER - 1                         |
| 4    | 1034 | BUFFER | RESB   | 4096 (1000 hex)                    |
| 5    | 2034 | BUFEND | EQU    | \* (current location counter value) |

→ Fig (a) shows the symbol table after line 1 is read: HALFSZ is entered as undefined, with a count (&1) indicating that its definition depends on one undefined symbol, MAXLEN.



| Symbol | Value | Defining expression | Dependents   |
|--------|-------|---------------------|--------------|
| BUFEND | \*    |                     | → MAXLEN   φ |
| HALFSZ | &1    | MAXLEN/2            | φ            |
| PREVBT | 1033  |                     | φ            |
| MAXLEN | &1    | BUFEND - BUFFER     | → HALFSZ   φ |
| BUFFER | 1034  |                     | φ            |

(d)

↳ Fig (d) shows the symbol table after scanning line 4, at which the location counter value is 1034. BUFFER is now defined, so PREVBT can be evaluated (1033) and the count of undefined symbols for MAXLEN is reduced from &2 to &1.

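| Symbol | Value | Dependents |
|--------|-------|------------|
| BUFEND | 2034  | φ          |
| HALFSZ | 800   | φ          |
| PREVBT | 1033  | φ          |
| MAXLEN | 1000  | φ          |
| BUFFER | 1034  | φ          |

(e)

↳ Fig (e) shows the completed symbol table after line 5 is scanned: BUFEND is defined as 2034, so MAXLEN (1000) and then HALFSZ (800) can be evaluated.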



(b)

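→ Fig (b) shows the symbol table after reading line 2. MAXLEN depends on two undefined symbols, BUFEND and BUFFER. ∴ MAXLEN is entered as &2.

(c)

→ Fig (c) shows the symbol table after reading line 3. PREVBT depends on BUFFER, so two symbols (MAXLEN and PREVBT) are now dependent on the value of BUFFER.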
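The dependency bookkeeping walked through in Figs (a) to (e) can be sketched as follows. This is only an illustration under stated assumptions: the name `resolve_symbols`, the use of Python dictionaries for SYMTAB, and the use of `eval` on a restricted namespace to evaluate expressions are choices made for the example, not part of the notes.

```python
import re


def resolve_symbols(defs):
    """defs: list of (name, expr) in source order; expr is an int or a string
    expression over other symbols. Returns {name: value}.

    waiting[name]   -> (expr, set of undefined symbols), i.e. the &n count
    dependents[sym] -> names whose definitions are waiting for sym
    """
    values, waiting, dependents = {}, {}, {}

    def try_define(name, expr):
        needed = {s for s in re.findall(r"[A-Za-z_]\w*", expr) if s not in values}
        if needed:
            waiting[name] = (expr, needed)
            for sym in needed:
                dependents.setdefault(sym, []).append(name)
            return
        # Assumption: integer arithmetic only, so '/' is treated as '//'.
        values[name] = eval(expr.replace("/", "//"), {"__builtins__": {}}, dict(values))
        for dep in dependents.pop(name, []):
            dep_expr, dep_needed = waiting[dep]
            dep_needed.discard(name)          # &2 becomes &1, &1 becomes defined
            if not dep_needed:
                del waiting[dep]
                try_define(dep, dep_expr)

    for name, expr in defs:
        try_define(name, str(expr))
    return values


table = resolve_symbols([
    ("HALFSZ", "MAXLEN/2"),
    ("MAXLEN", "BUFEND-BUFFER"),
    ("PREVBT", "BUFFER-1"),
    ("BUFFER", 0x1034),   # location counter value at line 4
    ("BUFEND", 0x2034),   # location counter value at line 5
])
print({k: hex(v) for k, v in table.items()})
```

Each symbol is evaluated as soon as its last undefined operand becomes known, which mirrors the way the counts &2 and &1 shrink in the symbol table figures.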

## CHAPTER 4

# Macro Processors

- 4.1 Basic Macro Processor Functions
  - 4.1.1 Macro Definition and Expansion
  - 4.1.2 Macro Processor Algorithm and Data Structures

In this chapter we study:

- the definition of a macro
- the need for macros
- the data structures used in macro invocation and expansion

Macro : A macro instruction (macro) is a single instruction that expands automatically into a set of instructions to perform a particular task. Macro instructions thus allow the programmer to write a shorthand version of a program and leave the mechanical details to be handled by the macro processor.

Ex: On SIC/XE, seven instructions (STA, STB, etc.) are required to save the contents of all registers before calling a subprogram. Using a macro, the programmer can instead write a single macro instruction such as SAVEREGS, which is expanded into the seven instructions required. Similarly, a LOADREGS macro instruction can be used to reload the register contents after returning from the subprogram.

- \* → The macro processor performs no analysis of the text it handles and is not concerned with the meaning of the statements involved during macro expansion. ∴ The design of a macro processor is generally machine independent.

(ii) Basic macro functions:

- A macro refers to a set of statements that replaces every invocation of it. The two concepts associated with macros are :

- (i) Macro definition
- (ii) Macro expansion

(i) Macro Definition:

- A macro definition consists of a macro prototype statement, one or more model statements, and macro preprocessor statements.

- A macro definition is the set of statements between a macro header statement (MACRO) and a macro end statement (MEND). MACRO and MEND are the two assembler directives used in a macro definition.

\* Syntax of macro prototype:

$\langle \text{macro name} \rangle [ \langle \text{formal parameter specification} \rangle [, \dots] ]$

where  $\langle \text{macro name} \rangle$ : The mnemonic field of a statement

$\langle \text{formal parameter} \rangle$ :  $\langle \text{parameter name} \rangle [ \langle \text{parameter kind} \rangle ]$

.. each parameter begins with '&'

Syntax: Macro call

<macro name> [<actual parameter specification> [ ... ]]

→ In general, macro definition is given as

NAME MACRO PARAMETERS

:

:

:

body: the statements which are generated as the expansion  
of the macro

:

:

MEND

→ Macro invocation is given as

:

:

:

NAME PARAMETERS

:

:

:

→ As in Fig 4.1, the macro definition begins at line 10

10 RDBUFF MACRO &INDEV, &BUFADR, &RECLTH

:

95 MEND

→ macro invocation in fig 4.1

190 RDBUFF F1, BUFFER, LENGTH

- (ii) Macro Expansion
- Macro invocation statements are replaced by the statements of the macro body, which are expanded each time the macro is invoked.
  - The program in Fig 4.1 is supplied as input to a macro processor.
  - Fig 4.2 shows the output that would be generated by the macro processor.
  - In expanding the macro invocation on line 190, the argument F1 is substituted for the parameter &INDEV, BUFFER is substituted for &BUFADR, and LENGTH is substituted for &RECLTH.
  - Lines 190a through 190m show the complete expansion of the macro invocation on line 190.
  - Similarly, lines 210a through 210h show the expansion of the WRBUFF macro.
  - As we see, the macro body does not contain any labels.
  - For example, line 140 contains JEQ \*-3 and line 155 contains JLT \*-14. If a label were used in the WRBUFF macro body instead, it would be generated twice, on lines 210d and 220d, resulting in an error (duplicate label definition) when the program is assembled.
  - \*-3, \*-14, ... indicate relative addressing (an offset from the current location counter value).


Figure 4.1 Use of macros in a SIC/XE program.

Figure 4.2 Program from Fig. 4.1 with macros expanded.

```

MACROS    MACRO                              {Defines SIC standard version macros}
RDBUFF    MACRO    &INDEV,&BUFADR,&RECLTH
            .
            .                                {SIC standard version}
            .
          MEND                               {End of RDBUFF}
WRBUFF    MACRO    &OUTDEV,&BUFADR,&RECLTH
            .
            .                                {SIC standard version}
            .
          MEND                               {End of WRBUFF}
            .
          MEND                               {End of MACROS}

                         (a)

MACROX    MACRO                              {Defines SIC/XE macros}
RDBUFF    MACRO    &INDEV,&BUFADR,&RECLTH
            .
            .                                {SIC/XE version}
            .
          MEND                               {End of RDBUFF}
WRBUFF    MACRO    &OUTDEV,&BUFADR,&RECLTH
            .
            .                                {SIC/XE version}
            .
          MEND                               {End of WRBUFF}
            .
          MEND                               {End of MACROX}

                         (b)

```



Figure 4.3 Example of the definition of macros within a macro body.

Figure 4.4 Contents of macro processor tables for the program in Fig. 4.1: (a) entries in NAMTAB and DEFTAB defining macro RDBUFF, (b) entries in ARGTAB for invocation of RDBUFF on line 190.

## Multi-pass Macro Processor and its advantages

- In a two-pass macro processor, all macro definitions are processed during pass 1 and all macro invocation statements are expanded during pass 2.
- Such a two-pass macro processor would not allow the body of one macro instruction to contain definitions of other macros, because all macros would have to be defined during the first pass, before any macro invocations were expanded.
- An example of macro definitions within a macro body is shown in Fig 4.3 (a) for the SIC machine and Fig 4.3 (b) for the SIC/XE machine.
- The same program can then be run on either a SIC machine or a SIC/XE machine; only the invocation of MACROS or MACROX needs to be changed.
- A one-pass macro processor that alternates between macro definition and macro expansion is able to handle such nested macro definitions, provided that the definition of a macro appears in the source program before any statement that invokes it.

But it is not so for the preprocessor. (Why not?)

→ There are three main data structures involved

(i) Definition Table (DEFTAB)

↳ It stores the macro definitions, each of which contains the macro prototype and the statements that make up the macro body.

↳ Comment lines are omitted as they are not part of the macro expansion.

↳ References to the macro instruction parameters are converted to a positional notation for efficiency in substituting arguments.

(ii) Name Table (NAMTAB)

↳ It stores the macro names, and serves as an index to DEFTAB.

↳ For each macro instruction defined, NAMTAB contains pointers to the beginning and end of the definition in DEFTAB.

(iii) Argument Table (ARGTAB)

↳ Used during the expansion of macro invocations.

↳ When a macro invocation statement is recognized, the arguments are stored in ARGTAB according to their position in the argument list.

↳ When the macro is expanded, arguments from ARGTAB are substituted for the corresponding parameters in the macro body.
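The interplay of the three tables can be sketched as below. This is a minimal illustration, assuming Python lists and dictionaries for DEFTAB, NAMTAB and ARGTAB and the notation `?1`, `?2`, ... for positional parameters; the function names and the shortened three-line macro body are hypothetical.

```python
def define_macro(deftab, namtab, name, params, body):
    """Enter a macro into DEFTAB with parameters in positional notation."""
    start = len(deftab)
    deftab.append((name, params))                        # macro prototype
    # Replace longer parameter names first so that one name that is a
    # prefix of another cannot be substituted by mistake.
    order = sorted(enumerate(params, 1), key=lambda t: -len(t[1]))
    for line in body:
        for index, param in order:
            line = line.replace(param, f"?{index}")
        deftab.append(line)
    deftab.append("MEND")
    namtab[name] = (start, len(deftab) - 1)              # begin and end pointers


def expand(deftab, namtab, name, args):
    """Return the expanded body of one macro invocation."""
    start, end = namtab[name]
    argtab = list(args)                                  # arguments by position
    expanded = []
    for line in deftab[start + 1:end]:
        for index in range(len(argtab), 0, -1):          # ?10 before ?1
            line = line.replace(f"?{index}", argtab[index - 1])
        expanded.append(line)
    return expanded


deftab, namtab = [], {}
define_macro(deftab, namtab, "RDBUFF",
             ["&INDEV", "&BUFADR", "&RECLTH"],
             ["TD =X'&INDEV'", "STCH &BUFADR,X", "STX &RECLTH"])
result = expand(deftab, namtab, "RDBUFF", ["F1", "BUFFER", "LENGTH"])
print(result)
```

Storing `?1` instead of `&INDEV` means that expansion only needs an index into ARGTAB, rather than a search for the parameter name on every line.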



```

          substitute positional notation for parameters
          enter line into DEFTAB
          if OPCODE = 'MACRO' then
            LEVEL := LEVEL + 1
          else if OPCODE = 'MEND' then
            LEVEL := LEVEL - 1
        end {if not comment}
    end {while}
  store in NAMTAB pointers to beginning and end of definition
end {DEFINE}

procedure EXPAND
begin
  EXPANDING := TRUE
  get first line of macro definition {prototype} from DEFTAB
  set up arguments from macro invocation in ARGTAB
  write macro invocation to expanded file as a comment
  while not end of macro definition do
    begin
      GETLINE
      PROCESSLINE
    end {while}
  EXPANDING := FALSE
end {EXPAND}

procedure GETLINE
begin
  if EXPANDING then
    begin
      get next line of macro definition from DEFTAB
      substitute arguments from ARGTAB for positional notation
    end {if}
  else
    read next line from input file
end {GETLINE}

```

Figure 4.5 Algorithm for a one-pass macro processor

Handling nested macro definitions within macros

→ In the DEFINE procedure, when a macro definition is being entered into DEFTAB, the normal approach would be to continue until an MEND directive is reached. This does not work for nested macro definitions, since the first MEND encountered (that of the inner macro) would terminate the whole macro definition process.

→ To solve this problem, the DEFINE procedure maintains a counter named LEVEL. LEVEL is incremented by 1 each time a MACRO directive is read and decremented by 1 each time an MEND directive is read. When LEVEL reaches 0, the MEND that corresponds to the original MACRO directive has been found. This process is very much like matching left and right parentheses when scanning an arithmetic expression.
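The LEVEL counter can be illustrated with a short sketch. It assumes that the opcode appears as a whitespace-separated token on each line; the function name `read_definition` and the sample lines are hypothetical.

```python
def read_definition(lines):
    """Collect the lines of one macro definition, given the lines that follow
    its MACRO header, up to and including the matching MEND."""
    level = 1                      # the outer MACRO directive has been read
    body = []
    for line in lines:
        fields = line.split()
        body.append(line)
        if "MACRO" in fields:      # a nested definition starts
            level += 1
        elif "MEND" in fields:
            level -= 1
            if level == 0:         # MEND matching the original MACRO
                return body
    raise ValueError("missing MEND")


source = [
    "RDBUFF MACRO &INDEV,&BUFADR,&RECLTH",
    "       MEND",
    "WRBUFF MACRO &OUTDEV,&BUFADR,&RECLTH",
    "       MEND",
    "       MEND",                 # closes the outer macro
    "FIRST  STL RETADR",
]
outer_body = read_definition(source)
print(len(outer_body))
```

Without the counter, the definition would end at the first MEND and the WRBUFF definition would be treated as ordinary program text.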

## LOADERS AND LINKERS

A system program performs three processes →

1. Loading - bringing the object program into memory for execution.

2. Relocation -

modifying the object program so that it can be loaded at an address different from the location originally specified.

3. Linking -

combining two or more separate object programs and supplying the information needed to allow references between them.

Loader - a system program that performs the loading function.

- It can also support linking & relocation.

Linker - a separate system program for the linking operation.

### LOADERS [3.1]

→ Basic (fundamental) Loader Functions

◦ The most basic loader function is →

bringing an object program into memory and starting its execution.

#### ◦ Absolute Loader

It is the most basic loader, which performs only the loading function.

It performs all its functions in a single pass.

- It checks the Header record to verify that the correct program is being loaded and that it will fit into the available memory space.

- It reads each Text record and moves the object code to the indicated memory location.

- The End record indicates the end of the object program and gives the address at which execution is to begin.

## ALGORITHM : Absolute Loader

```
begin
    read Header record
    verify program name and length
    read first Text record
    while record type ≠ 'E' do
        begin
            if object code is in character form, convert it
            to internal representation
            move object code to specified location in memory
            read next object program record
        end
    jump to address specified in End record
end.
```
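A minimal Python sketch of the algorithm above follows. It assumes caret-separated record fields (`H^name^start^length`, `T^start^length^code...`, `E^entry`) and a `bytearray` standing in for memory; the function name `absolute_load` is hypothetical.

```python
def absolute_load(records, memory):
    """Load an object program into `memory` and return the entry address."""
    rtype, _name, start, length = records[0].split("^")
    if rtype != "H":
        raise ValueError("expected Header record")
    if int(start, 16) + int(length, 16) > len(memory):
        raise ValueError("program does not fit in memory")
    for record in records[1:]:
        fields = record.split("^")
        if fields[0] == "E":
            return int(fields[1], 16)            # where execution begins
        if fields[0] == "T":
            address = int(fields[1], 16)
            # Convert character form to internal form: each pair of hex
            # characters is packed into one byte.
            code = bytes.fromhex("".join(fields[3:]))
            memory[address:address + len(code)] = code
    raise ValueError("missing End record")


memory = bytearray(0x3000)
entry = absolute_load(
    ["H^COPY^001000^00107A",
     "T^001000^09^454F46^000003^000000",
     "E^00200F"],
    memory,
)
print(hex(entry), memory[0x1000:0x1003])
```

In this sketch the jump to the End record address is represented simply by returning that address to the caller.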

### Note →

In the object program, each byte of assembled code is given using its hexadecimal representation in character form.

e.g - Opcode for STL → 14

It is represented using the pair of characters "1" & "4"

So, when the loader reads these characters, they occupy 2 bytes of memory. But in the instruction as loaded for execution, this opcode must be stored in a single byte with hexadecimal value 14.

⇒ Each pair of bytes from the object program record must be packed together into 1 byte during loading.

∴ This method of representation is inefficient in terms of both space and execution time.

∴ So, the object program can instead be stored in binary form → each byte of object code is stored as a single byte in the object program, but such a file isn't easy for humans to read.

• Simple Bootstrap Loader

- A special absolute loader that is executed when the computer is first started or restarted.
- It loads the 1<sup>st</sup> program to be run on the computer, i.e. the OS.

| Line | Label  | Opcode | Operand | Comment                                             |
|------|--------|--------|---------|-----------------------------------------------------|
| 0    | BOOT   | START  | 0       | BOOTSTRAP LOADER FOR SIC/XE                         |
| 1    | .      |        |         |                                                     |
| 2    | .      |        |         | THIS BOOTSTRAP READS OBJECT CODE FROM DEVICE F1 AND |
| 3    | .      |        |         | ENTERS IT INTO MEMORY LOCATIONS STARTING FROM       |
| 4    | .      |        |         | ADDRESS 80h. AFTER LOADING IS COMPLETE, A JUMP      |
| 5    | .      |        |         | TO 80h IS EXECUTED TO BEGIN EXECUTION OF PROGRAM.   |
| 6    | .      |        |         | REGISTER X CONTAINS NEXT ADDRESS TO BE LOADED       |
| 7    |        | CLEAR  | A       | CLEAR REG A TO 0                                    |
| 8    |        | LDX    | #128    | INITIALIZE REG X TO 80h                             |
| 9    | LOOP   | JSUB   | GETC    | READ HEX DIGIT FROM PROG                            |
| 10   |        | RMO    | A,S     | SAVE IN REG S                                       |
| 11   |        | SHIFTL | S,4     | MOVE TO HIGH-ORDER 4 BITS                           |
| 12   |        | JSUB   | GETC    | GET NEXT HEX DIGIT                                  |
| 13   |        | ADDR   | S,A     | COMBINE DIGITS TO 1 BYTE                            |
| 14   |        | STCH   | 0,X     | STORE AT ADDR IN X                                  |
| 15   |        | TIXR   | X,X     | ADD 1 TO MEMORY ADDRESS                             |
| 16   |        | J      | LOOP    | LOOP TILL EOF REACHED                               |
| 17   | .      |        |         |                                                     |
| 18   | .      |        |         |                                                     |
| 19   | .      |        |         | SUBROUTINE TO READ FROM DEVICE AND CONVERT IT       |
| 20   | .      |        |         | FROM ASCII TO HEX DIGIT VALUE AND RETURN IT         |
| 21   | .      |        |         | IN REG A. IF EOF ENCOUNTERED, CONTROL TRANSFERRED   |
| 22   | .      |        |         | TO 80h                                              |
| 23   | .      |        |         |                                                     |
| 24   | GETC   | TD     | INPUT   | TEST INPUT DEVICE                                   |
| 25   |        | JEQ    | GETC    | LOOP UNTIL READY                                    |
| 26   |        | RD     | INPUT   | READ CHARACTER                                      |
| 27   |        | COMP   | #4      | IF CHAR IS 04h (EOF)                                |
| 28   |        | JEQ    | 80      | JUMP TO START OF PROG LOADED                        |
| 29   |        | COMP   | #48     | COMPARE TO 30h ('0')                                |
| 30   |        | JLT    | GETC    | SKIP CHAR < '0'                                     |
| 31   |        | SUB    | #48     | SUBTRACT 30h FROM ASCII                             |
| 32   |        | COMP   | #10     | IF RESULT < 10 THEN                                 |
| 33   |        | JLT    | RETURN  | CONVERSION COMPLETE, ELSE                           |
| 34   |        | SUB    | #7      | SUBTRACT 7 MORE (FOR 'A' TO 'F')                    |
| 35   | RETURN | RSUB   |         | RETURN TO CALLER                                    |
| 36   | INPUT  | BYTE   | X'F1'   | INPUT DEVICE                                        |
| 37   |        | END    | LOOP    |                                                     |

- The bootstrap begins at address 0. [Line 0]
- It loads the OS starting at address 80h by initializing register X (the pointer to the next address to be loaded) to 80h [Line 8]

- As this is the 1<sup>st</sup> program to be loaded, its loading is simple.

The object program from device F1

- is represented as hexadecimal digits, two digits for each byte
- has no Header or End record or any other control information.

Hence, the object code is loaded into consecutive bytes of memory starting at 80h.

- Subroutine GETC -

It reads 1 character from device F1 and converts it from ASCII to the value of the hexadecimal digit it represents.

When it encounters EOF, control is transferred to 80h (i.e. the start of the loaded program).

So in the program,

the main loop keeps track of the next memory location to be loaded, reads 2 characters & stores them as 1 byte.

The subroutine reads a character and converts it from ASCII to the hex value it represents.
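The behaviour of the main loop and GETC can be mimicked in Python as a rough sketch. It assumes the input device is modelled as a sequence of byte values and memory as a `bytearray`; the function names are hypothetical, and the jump to address 80h is represented by returning the start address.

```python
def getc(stream):
    """Return the value of the next hex digit, or None at end of file (04h)."""
    for ch in stream:
        if ch == 0x04:          # EOF marker
            return None
        if ch < 0x30:           # skip characters below '0', e.g. line ends
            continue
        ch -= 0x30              # '0'..'9' become 0..9
        if ch >= 10:
            ch -= 7             # 'A'..'F' (41h..46h) become 10..15
        return ch
    return None


def bootstrap(device_data, memory, start=0x80):
    """Pack pairs of hex characters into bytes, loading from `start` upward."""
    stream = iter(device_data)
    x = start                                  # register X: next address
    while True:
        high = getc(stream)
        if high is None:
            return start                       # control passes to 80h
        low = getc(stream)
        if low is None:
            return start
        memory[x] = (high << 4) + low          # SHIFTL S,4 then ADDR S,A
        x += 1                                 # TIXR X,X


memory = bytearray(256)
entry = bootstrap(b"14\n1009\x04", memory)
print(hex(entry), memory[0x80:0x83].hex())
```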

3.2

→ Disadvantages of absolute loader

- o The programmer needs to specify the absolute memory location at which the program is to be loaded.  
But on larger & more advanced machines, multiple independent programs run together and share memory.  
Here, predicting the load address in advance is impossible.
- o Subroutine libraries can't be used efficiently.  
For efficient use, only the required subroutines should be loaded, but this isn't possible with absolute addresses.

### MACHINE DEPENDENT LOADER FEATURES

→ In most modern computers, loaders also perform relocation and linking, in addition to the basic loading function.

### • Relocation

- Loaders that allow relocation are called relocating loaders or relative loaders.

- Methods for specifying relocation as part of the object program

(i) A Modification record is used to describe each part of the object code that must be changed when the program is relocated.

On SIC/XE, the only instructions whose values are affected by relocation are those that use extended format.

Each Modification record specifies the starting address & length of the field to be altered, and then describes the modification to be performed.

But this method isn't suited to all machines.

eg- On a standard SIC machine there is no relative addressing, so nearly all instructions need to be modified during relocation. This leads to a large number of Modification records, which dramatically increases the size of the object program.

(ii) A relocation bit is associated with each word of object code.

This technique is used on machines that primarily use direct addressing & a fixed instruction format.

On a SIC machine, each instruction occupies 1 word, i.e. there is one relocation bit per instruction.

The relocation bits are gathered together into a bit mask, which is placed in the Text record following the length indicator.

eg- `T^001057^0A^800^100036^4C0000^F1^001000`

If the relocation bit corresponding to a word is  
- 1 → modification required: the program's starting address is to be added to this word during relocation  
- 0 → no modification required.

If a Text record has fewer than 12 words, the relocation bits corresponding to the unused words are set to 0.

eg - FFC ( 1111 1111 1100 )

The first 10 words need to be modified.

(iii) Some computers have a hardware relocation capability that eliminates the need for the loader to relocate programs.

The SIC/XE machine usually uses the Modification record scheme for relocation.

ALGORITHM: SIC/XE relocation loader

begin

get PROGADDR from operating system

while not end of input do

begin

read next record

while record type ≠ 'E' do

begin

if record type = 'T' then

move object code from record to location PROGADDR + specified address

else if record type = 'M' then

add PROGADDR at location PROGADDR + specified address

read next input record

end

end

end.
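How a Modification record is applied can be sketched as follows. The sketch assumes each Modification record has been parsed into a pair (address relative to the program start, field length in half-bytes) and that the object code has already been moved to PROGADDR; the function name is hypothetical.

```python
def apply_modification_records(memory, progaddr, mod_records):
    """Add PROGADDR to each address field named by a Modification record."""
    for rel_addr, half_bytes in mod_records:
        loc = progaddr + rel_addr
        nbytes = (half_bytes + 1) // 2
        field = int.from_bytes(memory[loc:loc + nbytes], "big")
        mask = (1 << (4 * half_bytes)) - 1     # low-order half-bytes only
        relocated = (field & ~mask) | ((field + progaddr) & mask)
        memory[loc:loc + nbytes] = relocated.to_bytes(nbytes, "big")


memory = bytearray(0x6000)
progaddr = 0x5000
# A format 4 instruction assembled at relative address 0006 with the
# 20-bit address field 01036.
memory[progaddr + 6:progaddr + 10] = bytes.fromhex("4B101036")
# M record: the field starts at relative address 000007 and is 05 half-bytes long.
apply_modification_records(memory, progaddr, [(0x7, 5)])
print(memory[progaddr + 6:progaddr + 10].hex().upper())
```

Only the 20-bit address field changes; the leading half-byte of the first affected byte, which holds flag bits, is preserved by the mask.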

The SIC machines usually use the relocation bit scheme.

ALGORITHM: SIC relocation loader algorithm.

begin

get PROGADDR from operating system

while not end of input do

begin

read next record

while record type ≠ 'E' do

begin

if record type = 'T' then

begin

get length (second field) and relocation mask bits M (third field)

for each word i of object code in the record ( i = 0, 1, 2, ... )

begin

move word i from the record to location PROGADDR + specified address + 3i

if M<sub>i</sub> = 1 then add PROGADDR to the word at that location

end

end

read next record

end

end

end.
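The relocation bit scheme can be sketched in the same style. The sketch assumes a Text record whose object code consists only of 3-byte words and a 12-bit mask written as three hex digits; the function name and the sample words are illustrative and not taken from any figure.

```python
def load_text_with_mask(memory, progaddr, start, mask_hex, words):
    """Load 3-byte words, adding PROGADDR where the relocation bit is 1."""
    mask = int(mask_hex, 16)                    # 12 relocation bits
    for i, word in enumerate(words):
        if mask & (1 << (11 - i)):              # leftmost bit is word 0
            word = (word + progaddr) & 0xFFFFFF
        loc = progaddr + start + 3 * i
        memory[loc:loc + 3] = word.to_bytes(3, "big")


memory = bytearray(0x2000)
# Mask C00 = 1100 0000 0000: relocate the first two words, leave the third
# (a constant) unchanged.
load_text_with_mask(memory, 0x1000, 0, "C00", [0x140033, 0x481039, 0x000003])
print(memory[0x1000:0x1009].hex().upper())
```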

## Program Linking

- Programs made up of multiple control sections  
can be assembled in 2 ways

- all control sections together

- i.e. in the same invocation of the assembler

- each independently.

In both cases they appear as separate segments  
of object code after assembly.

The loader sees only control sections  
that are to be loaded, relocated & linked. It  
doesn't need to know which control sections  
were assembled at the same time.

Consider 3 programs, each consisting of a single control  
section :

| Loc  |       | Source statement |                         |  | Object code |
|------|-------|------------------|-------------------------|--|-------------|
| 0000 | PROGA | START            | 0                       |  |             |
|      |       | EXTDEF           | LISTA,ENDA              |  |             |
|      |       | EXTREF           | LISTB,ENDB,LISTC,ENDC   |  |             |
|      |       |                  |                         |  |             |
| 0020 | REF1  | LDA              | LISTA                   |  | 03201D      |
| 0023 | REF2  | +LDT             | LISTB+4                 |  | 77100004    |
| 0027 | REF3  | LDX              | #ENDA-LISTA             |  | 050014      |
|      |       |                  |                         |  |             |
| 0040 | LISTA | EQU              | *                       |  |             |
|      |       |                  |                         |  |             |
| 0054 | ENDA  | EQU              | *                       |  |             |
| 0054 | REF4  | WORD             | ENDA-LISTA+LISTC        |  | 000014      |
| 0057 | REF5  | WORD             | ENDC-LISTC-10           |  | FFFFF6      |
| 005A | REF6  | WORD             | ENDC-LISTC+LISTA-1      |  | 00003F      |
| 005D | REF7  | WORD             | ENDA-LISTA-(ENDB-LISTB) |  | 000014      |
| 0060 | REF8  | WORD             | LISTB-LISTA             |  | FFFFC0      |
|      |       | END              | REF1                    |  |             |

| Loc  |       | Source statement |                         |  | Object code |
|------|-------|------------------|-------------------------|--|-------------|
| 0000 | PROGB | START            | 0                       |  |             |
|      |       | EXTDEF           | LISTB,ENDB              |  |             |
|      |       | EXTREF           | LISTA,ENDA,LISTC,ENDC   |  |             |
|      |       |                  |                         |  |             |
| 0036 | REF1  | +LDA             | LISTA                   |  | 03100000    |
| 003A | REF2  | LDT              | LISTB+4                 |  | 772027      |
| 003D | REF3  | +LDX             | #ENDA-LISTA             |  | 05100000    |
|      |       |                  |                         |  |             |
| 0060 | LISTB | EQU              | *                       |  |             |
|      |       |                  |                         |  |             |
| 0070 | ENDB  | EQU              | *                       |  |             |
| 0070 | REF4  | WORD             | ENDA-LISTA+LISTC        |  | 000000      |
| 0073 | REF5  | WORD             | ENDC-LISTC-10           |  | FFFFF6      |
| 0076 | REF6  | WORD             | ENDC-LISTC+LISTA-1      |  | FFFFFF      |
| 0079 | REF7  | WORD             | ENDA-LISTA-(ENDB-LISTB) |  | FFFFF0      |
| 007C | REF8  | WORD             | LISTB-LISTA             |  | 000060      |
|      |       | END              |                         |  |             |

| Loc  |       | Source statement |                         |  | Object code |
|------|-------|------------------|-------------------------|--|-------------|
| 0000 | PROGC | START            | 0                       |  |             |
|      |       | EXTDEF           | LISTC,ENDC              |  |             |
|      |       | EXTREF           | LISTA,ENDA,LISTB,ENDB   |  |             |
|      |       |                  |                         |  |             |
| 0018 | REF1  | +LDA             | LISTA                   |  | 03100000    |
| 001C | REF2  | +LDT             | LISTB+4                 |  | 77100004    |
| 0020 | REF3  | +LDX             | #ENDA-LISTA             |  | 05100000    |
|      |       |                  |                         |  |             |
| 0030 | LISTC | EQU              | *                       |  |             |
|      |       |                  |                         |  |             |
| 0042 | ENDC  | EQU              | *                       |  |             |
| 0042 | REF4  | WORD             | ENDA-LISTA+LISTC        |  | 000030      |
| 0045 | REF5  | WORD             | ENDC-LISTC-10           |  | 000008      |
| 0048 | REF6  | WORD             | ENDC-LISTC+LISTA-1      |  | 000011      |
| 004B | REF7  | WORD             | ENDA-LISTA-(ENDB-LISTB) |  | 000000      |
| 004E | REF8  | WORD             | LISTB-LISTA             |  | 000000      |
|      |       | END              |                         |  |             |

- LISTA, LISTB, LISTC → lists of items in each program  
ENDA, ENDB, ENDC → mark the ends of the lists  
References to external symbols:  
REF1 to REF3 → as instruction operands  
REF4 to REF8 → as values of data words.

### REF1

In PROGA, REF1 is a reference to a label within the program, so no modification for relocation or linking is needed.

In PROGB & PROGC, REF1 is a reference to an external symbol, so the assembler uses an extended format instruction with the address field set to 00000. A Modification record is required to tell the loader to add the value of LISTA to this address field when the programs are linked.

### REF2

Similar to REF1, but here PROGB has the local reference & PROGA & PROGC refer to an external symbol.

### REF3

It is an immediate operand whose value is ENDA-LISTA.  
In PROGA it can be computed directly by the assembler, but in  
the other 2 programs the value is unknown.  
There the expression is assembled as an external reference, &  
the final result is an absolute value independent  
of the location where the program is loaded.

### General approach →

The assembler evaluates as much of the expression as it can & the remaining terms are passed on to the loader via Modification records.

### eg - REF4

In PROGA the assembler can evaluate the whole expression except for the value of LISTC.

The result is an initial value of $000014$ (hexadecimal) and 1 Modification record.

In PROGB no terms can be evaluated by the assembler.  
The result is an initial value of $000000$ & 3 Modification records.

In PROGC the assembler can supply only the relative address of LISTC; the other terms are unknown.

The initial value is the relative address of LISTC (000030), and Modification records tell the loader to add the beginning address of PROGC, add the value of ENDA & subtract the value of LISTA.

- Suppose the 3 programs have been loaded into memory with PROGA starting at address 4000, and PROGB & PROGC immediately following.

REF4 to REF8 end up with the same value in each of the 3 programs after relocation and linking.

Eg - REF4 in PROGA is located at 4054 ($4000 + \text{relative address of REF4 } (0054)$).

Initial value of REF4 $\rightarrow 000014$ (from the Text record)

To this we add the address assigned to LISTC (4112) [beginning of PROGC + 30]

$\Rightarrow$ value in memory at 4054:

$$000014 + 004112 = 004126$$

### Object Program



### Load Address

PROGA 004000

PROGB 004063

PROGC 0040E2

In PROGB, REF4 is located at relative address 70,  
so its memory location is $4063 + 70 = 40D3$.  
Initial value $\rightarrow 000000$

To this the loader adds ENDA $= 4054$ ($4000 + 54$), adds LISTC $= 4112$ ($40E2 + 30$) and subtracts LISTA $= 4040$ ($4000 + 40$):

$$000000 + 004054 + 004112 - 004040 = 004126$$

→ same as in PROGA  
Similarly, REF4 in PROGC also results in 004126.
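The REF4 arithmetic can be checked with a few lines of Python. This is only a worked check using the load addresses given above. The helper name `relocate_word` is hypothetical, and the modification list used for PROGC (add the start of PROGC, add ENDA, subtract LISTA) is an assumption about what its Modification records contain.

```python
PROGA, PROGB, PROGC = 0x4000, 0x4063, 0x40E2

# Absolute addresses of the external symbols after loading.
estab = {
    "PROGA": PROGA, "LISTA": PROGA + 0x40, "ENDA": PROGA + 0x54,
    "PROGB": PROGB, "LISTB": PROGB + 0x60, "ENDB": PROGB + 0x70,
    "PROGC": PROGC, "LISTC": PROGC + 0x30, "ENDC": PROGC + 0x42,
}


def relocate_word(initial, modifications):
    """Apply (+/-, symbol) modifications to a 24-bit data word."""
    value = initial
    for sign, symbol in modifications:
        value += estab[symbol] if sign == "+" else -estab[symbol]
    return value & 0xFFFFFF


ref4_a = relocate_word(0x000014, [("+", "LISTC")])
ref4_b = relocate_word(0x000000, [("+", "ENDA"), ("-", "LISTA"), ("+", "LISTC")])
ref4_c = relocate_word(0x000030, [("+", "PROGC"), ("+", "ENDA"), ("-", "LISTA")])
print(f"{ref4_a:06X} {ref4_b:06X} {ref4_c:06X}")
```

All three programs arrive at 004126, even though each starts from a different initial value and a different set of modifications.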

\* REF1 to REF3 are references in instruction operands. Their calculated values after loading aren't always equal, because an additional address calculation step is involved for base-relative or PC-relative instructions.

e.g. - REF1:

In PROGA → REF1 is a PC-relative instruction: displacement 01D + PC (4023) gives target address 4040.

In PROGB → REF1 is an extended format instruction with a direct address, which is 4040  
(LISTA location $\rightarrow 4000 + 40 = 4040$)

→ Algorithm & Data Structures for a Linking Loader.

- The algorithm is for a linking & relocating loader that uses Modification records for relocation, so that the linking & relocation functions can be performed using the same mechanism.
- The I/P to the loader is a set of object programs that are to be linked together.
- Programs may contain references to symbols whose definitions come later in the I/P, so a linking operation can't be performed until the external symbol has been assigned an address.

- So the linking loader makes 2 passes over its I/P.
  - Pass 1 $\rightarrow$ assigns addresses to all external symbols
  - Pass 2 $\rightarrow$ performs the actual loading, relocation & linking

Data structure needed,

- ESTAB → external symbol table
  - it stores name & address of each external symbol in the set of control sections (programs) that are loaded
  - Hashed organization is used for this table.

Important variables needed,

- PROGADDR → program load address
- CSADDR → control section address

PROGADDR is the beginning address in memory where the linked program is to be loaded. Its value is supplied by the OS.

CSADDR is the starting address assigned to the control section currently being scanned by the loader.

### Pass1

- The loader is concerned only with Header & Define record types.
- The value of PROGADDR is obtained from the OS; it becomes the CSADDR for the 1st control section.
- The control section name is obtained from the Header record and is entered in ESTAB with the value given by CSADDR.
- All external symbols that appear in Define records are also entered in ESTAB. Their address is relative address + CSADDR.
- When the End record is reached,
  - the control section length (CSLTH) is added to CSADDR → this is the CSADDR for the next section.

Pass 1:

```

begin
  get PROGADDR from operating system
  set CSADDR to PROGADDR (for first control section)
  while not end of input do
    begin
      read next input record (Header record for control section)
      set CSLTH to control section length
      search ESTAB for control section name
      if found then
        set error flag (duplicate external symbol)
      else
        enter control section name into ESTAB with value CSADDR
      while record type ≠ 'E' do
        begin
          read next input record
          if record type = 'D' then
            for each symbol in the record do
              begin
                search ESTAB for symbol name
                if found then
                  set error flag (duplicate external symbol)
                else
                  enter symbol into ESTAB with value
                    (CSADDR + indicated address)
              end (for)
        end (while ≠ 'E')
      add CSLTH to CSADDR (starting address for next control section)
    end (while not EOF)
end (Pass 1)

```
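Pass 1 can be sketched in Python as follows. This is a simplified illustration rather than a full loader: the object programs are assumed to be already parsed into tuples (`("H", name, length)`, `("D", [(symbol, address), ...])`, `("E",)`), and the function name `pass1` and the control section lengths used in the example (63, 7F and 51 hexadecimal) are assumptions chosen to agree with the load addresses 4000, 4063 and 40E2.

```python
def pass1(control_sections, progaddr):
    """Build ESTAB: external symbol name -> absolute address."""
    estab = {}
    errors = []
    csaddr = progaddr  # address of the first control section
    for records in control_sections:
        cslth = 0
        for record in records:
            kind = record[0]
            if kind == "H":
                _, name, cslth = record
                if name in estab:
                    errors.append("duplicate external symbol " + name)
                else:
                    estab[name] = csaddr
            elif kind == "D":
                for symbol, rel_addr in record[1]:
                    if symbol in estab:
                        errors.append("duplicate external symbol " + symbol)
                    else:
                        estab[symbol] = csaddr + rel_addr
            elif kind == "E":
                break
        csaddr += cslth  # starting address for the next control section
    return estab, errors


sections = [
    [("H", "PROGA", 0x63), ("D", [("LISTA", 0x40), ("ENDA", 0x54)]), ("E",)],
    [("H", "PROGB", 0x7F), ("D", [("LISTB", 0x60), ("ENDB", 0x70)]), ("E",)],
    [("H", "PROGC", 0x51), ("D", [("LISTC", 0x30), ("ENDC", 0x42)]), ("E",)],
]
estab, errors = pass1(sections, 0x4000)
for name, addr in estab.items():
    print(f"{name:6} {addr:06X}")
```

A Python dictionary stands in for the hashed ESTAB described in the notes.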

Pass 2:

- Here the actual loading, relocation & linking are done.
- As each Text record is read, the object code is moved to its specified address, which is relative address + CSADDR.
- When a Modification record is encountered, the symbol required for the modification is looked up in ESTAB & its value is added to or subtracted from the indicated location in memory.

Pass 2:

```

begin
  set CSADDR to PROGADDR
  set EXECADDR to PROGADDR
  while not end of input do
    begin
      read next input record (Header record)
      set CSLTH to control section length
      while record type ≠ 'E' do
        begin
          read next input record
          if record type = 'T' then
            begin
              (if object code is in character form, convert
               into internal representation)
              move object code from record to location
                (CSADDR + specified address)
            end (if 'T')
          else if record type = 'M' then
            begin
              search ESTAB for modifying symbol name
              if found then
                add or subtract symbol value at location
                  (CSADDR + specified address)
              else
                set error flag (undefined external symbol)
            end (if 'M')
        end (while ≠ 'E')
      if an address is specified (in End record) then
        set EXECADDR to (CSADDR + specified address)
      add CSLTH to CSADDR
    end (while not EOF)
  jump to location given by EXECADDR (to start execution of loaded program)
end (Pass 2)

```
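Pass 2 can be sketched in the same spirit. The sketch assumes records already parsed into tuples (`("H", name, length)`, `("T", address, bytes)`, `("M", address, half_bytes, sign, symbol)`, `("E", address_or_None)`), an ESTAB dictionary produced by Pass 1, and memory modelled as a `bytearray`. The function name `pass2` is hypothetical, and the final jump is replaced by returning EXECADDR.

```python
def pass2(control_sections, progaddr, estab, memory):
    """Load, relocate and link; return (EXECADDR, errors)."""
    csaddr = progaddr
    execaddr = progaddr
    errors = []
    for records in control_sections:
        cslth = 0
        for record in records:
            kind = record[0]
            if kind == "H":
                cslth = record[2]
            elif kind == "T":
                _, addr, code = record
                memory[csaddr + addr:csaddr + addr + len(code)] = code
            elif kind == "M":
                _, addr, half_bytes, sign, symbol = record
                if symbol not in estab:
                    errors.append("undefined external symbol " + symbol)
                    continue
                loc = csaddr + addr
                nbytes = (half_bytes + 1) // 2
                mask = (1 << (4 * half_bytes)) - 1
                old = int.from_bytes(memory[loc:loc + nbytes], "big")
                delta = estab[symbol] if sign == "+" else -estab[symbol]
                new = (old & ~mask) | ((old + delta) & mask)
                memory[loc:loc + nbytes] = new.to_bytes(nbytes, "big")
            elif kind == "E":
                if len(record) > 1 and record[1] is not None:
                    execaddr = csaddr + record[1]
                break
        csaddr += cslth
    return execaddr, errors


memory = bytearray(0x5000)
proga = [("H", "PROGA", 0x63),
         ("T", 0x54, bytes.fromhex("000014")),
         ("M", 0x54, 6, "+", "LISTC"),
         ("E", 0x20)]
execaddr, errors = pass2([proga], 0x4000, {"LISTC": 0x4112}, memory)
print(memory[0x4054:0x4057].hex().upper(), f"{execaddr:04X}")
```

The example reproduces the REF4 calculation for PROGA: the word at 4054 ends up holding 004126.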

The last step performed by the loader is the  
transfer of control to the loaded program  
to begin execution. The End record for each  
control section may contain the address of the 1<sup>st</sup> instruction  
in that control section to be executed.  
If more than 1 control section specifies a transfer  
address, the loader uses the last one encountered.  
If no control section specifies a transfer address, the loader  
uses the beginning of the linked program (i.e. PROGADDR).

→ The algorithm can be made more efficient if we  
use a reference number for each external symbol in Modification  
records, instead of the symbol name.  
We then need a Refer record that  
specifies each symbol & its reference number,  
eg. R^02LISTB ^03ENDB ^04LISTC ^05ENDC  
→ Refer record in PROGA.  
A Modification record is then of the form  
M^000024^05^+02

Advantages of this method →  
• it avoids multiple searches of ESTAB for the  
same symbol during the loading of a control section.  
Now only 1 lookup in ESTAB is required for each  
external reference symbol.
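The reference-number idea can be sketched as a small per-section table that is filled once from ESTAB and then indexed directly. This is an illustration under assumptions: the function name `build_reference_table` is hypothetical, the ESTAB values are the addresses from the three-program example, and reference number 01 is assumed to stand for the control section itself.

```python
def build_reference_table(csaddr, refer_record, estab):
    """Map reference numbers to addresses with one ESTAB lookup per symbol."""
    table = {1: csaddr}  # assumption: 01 denotes the control section itself
    for number, symbol in refer_record:
        table[number] = estab[symbol]
    return table


estab = {"LISTB": 0x40C3, "ENDB": 0x40D3, "LISTC": 0x4112, "ENDC": 0x4124}
refer_record = [(2, "LISTB"), (3, "ENDB"), (4, "LISTC"), (5, "ENDC")]
refs = build_reference_table(0x4000, refer_record, estab)

# A record such as M^000024^05^+02 now needs only refs[2], not a search.
print(f"{refs[2]:06X}")
```

Every later Modification record for the section indexes `refs` by number, so ESTAB is searched once per symbol instead of once per record.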

### MACHINE-INDEPENDENT LOADER FUNCTIONS

#### Automatic Library Search

- Many linking loaders can automatically incorporate subroutines from program libraries into the program being loaded.
- Some standard libraries are searched in this way by default; other libraries may be specified by control statements or by parameters to the loader.

- Subroutines called by the program being loaded are automatically fetched from the library and linked to the program while loading. This is known as automatic library call (or) library search.

How is it done?

A linking loader that supports this must be able to keep track of external symbols that are referred to but not defined in the primary input.

To do this, the loader enters every external symbol it encounters in a reference into ESTAB, unless the symbol is already present. When it encounters the symbol's definition, it completes the entry by filling in its address.

If at the end of Pass 1 some symbols in ESTAB are still unresolved, the loader searches for them in the libraries.

Subroutines fetched from libraries may themselves contain external references, so the library search needs to be repeated till all external references have been resolved.

This process allows programmers to override the standard library's subroutines by supplying their own subroutines as input to the loader: by the time the loader would search the library, the reference is already defined & resolved, so the library version is not fetched.

How are libraries searched?

The libraries contain assembled or compiled versions of subroutines. It is possible to search them by scanning their Define records, but this is inefficient.

A special structure called a directory is used to search libraries. It contains the name of each routine & a pointer to its address within the file. If a subroutine is referred to by multiple names, there is an entry for each name and all point to the same location.

- The same technique applies to the resolution of external references to data items.
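The repeated library search can be sketched as a loop over the still-unresolved symbols. This is a simplified model, not a real loader: the library is a dictionary from module name to the sets of symbols it defines and references, the directory maps each symbol name to a module, and all names (`resolve_with_library`, `SQRT`, `ERRMSG`, `PLOT`) are hypothetical.

```python
def resolve_with_library(defined, referenced, directory, library):
    """Return (modules fetched in order, symbols left unresolved)."""
    defined = set(defined)
    referenced = set(referenced)
    loaded = []
    unresolved = set()
    pending = referenced - defined
    while pending:
        symbol = min(pending)            # deterministic order for the sketch
        module = directory.get(symbol)
        if module is None or module in loaded:
            unresolved.add(symbol)       # not available in any library
        else:
            module_defs, module_refs = library[module]
            loaded.append(module)
            defined |= module_defs
            referenced |= module_refs    # fetched code may need more routines
        pending = referenced - defined - unresolved
    return loaded, unresolved


library = {
    "SQRT": ({"SQRT"}, {"ERRMSG"}),
    "ERRMSG": ({"ERRMSG", "ERR"}, set()),
}
directory = {"SQRT": "SQRT", "ERRMSG": "ERRMSG", "ERR": "ERRMSG"}

loaded, unresolved = resolve_with_library({"MAIN"}, {"SQRT", "PLOT"},
                                          directory, library)
print(loaded, unresolved)
```

If the primary input already defines `SQRT`, the symbol never becomes pending and the library version is not fetched, which is the override behaviour described above.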

#### • Loader Options

- Loaders allow options that modify the standard processing of the loader. Many loaders have a special command language that is used to specify options. Sometimes there is a separate i/p file that contains such control statements; sometimes the statements are embedded in the primary input stream b/w object programs, or are included in the source program.

- On some systems, options are specified as part of the job control language that is processed by the OS. Here, the OS incorporates the specified options into a control block that is made available to the loader when it is invoked.

#### - Some options -

- to select alternative sources of i/p  
eg - INCLUDE program-name (lib-name)  
This directs the loader to read the designated object program from a library & treat it as part of the primary loader i/p.

- to allow users to delete external symbols or entire control sections
- to change external references within the program being loaded & linked

**DELETE** (sect-name)

deletes control section(s) from set of progs being loaded

**CHANGE** name1, name2

name1 is changed to name2 wherever it appears in the object prog.

eg - Consider a main program COPY that has 2 subprograms - RDREC : to read records, WRREC : to write records

Each has its own control section.

Suppose a utility subroutine library UTLIB is available that contains subroutines READ & WRITE, and it is preferable for COPY to use them.

As a temporary measure, we can first use loader commands to make these changes without reassembling the program, in order to test the new routines.

|         |               |                                                                                                       |
|---------|---------------|-------------------------------------------------------------------------------------------------------|
| INCLUDE | READ (UTLIB)  | } tells loader to include control section READ & WRITE from UTLIB library                             |
| INCLUDE | WRITE (UTLIB) |                                                                                                       |
| DELETE  | RDREC, WRREC  | → tells not to load RDREC & WRREC                                                                     |
| CHANGE  | RDREC, READ   | } → changes all external references to RDREC to refer to READ & reference to WRREC to refer to WRITE. |
| CHANGE  | WRREC, WRITE  |                                                                                                       |
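The effect of these commands can be sketched with a toy model in which each control section is represented only by the set of external symbols it refers to. The function name `apply_loader_commands` and the data layout are hypothetical; a real loader would work on the object records themselves.

```python
def apply_loader_commands(sections, libraries, commands):
    """sections: control section name -> set of external references."""
    sections = {name: set(refs) for name, refs in sections.items()}
    for command, *args in commands:
        if command == "INCLUDE":
            name, library = args
            sections[name] = set(libraries[library][name])
        elif command == "DELETE":
            for name in args:
                sections.pop(name, None)
        elif command == "CHANGE":
            old, new = args
            for refs in sections.values():
                if old in refs:
                    refs.discard(old)
                    refs.add(new)
    return sections


sections = {"COPY": {"RDREC", "WRREC"}, "RDREC": set(), "WRREC": set()}
libraries = {"UTLIB": {"READ": set(), "WRITE": set()}}
commands = [
    ("INCLUDE", "READ", "UTLIB"),
    ("INCLUDE", "WRITE", "UTLIB"),
    ("DELETE", "RDREC", "WRREC"),
    ("CHANGE", "RDREC", "READ"),
    ("CHANGE", "WRREC", "WRITE"),
]
result = apply_loader_commands(sections, libraries, commands)
print(sorted(result), sorted(result["COPY"]))
```

After the commands, COPY refers to READ and WRITE, and RDREC and WRREC are no longer part of the set of sections to be loaded.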

- LIBRARY MYLIB
  - specifies a library (here MYLIB) to be searched when library routines are automatically included to satisfy external references

- NOCALL SYMBOLS

→ tells the loader that the named external references are to remain unresolved.

- option to specify that no external references are to be resolved by library search.  
Useful when programs are to be linked but not immediately executed
- option to specify where execution should begin
- option to control whether or not the loader should execute the program if errors are detected during the load

## LOADER DESIGN OPTIONS

### Organisation of loader functions

- linking loaders - linking & relocation take place at load time

- linkage editors - linking is performed prior to load time

- dynamic linking - linking is performed at execution time

### → Linkage Editors



a) Linkage editor



With a linkage editor, the source program is first assembled or compiled, producing an object program.

## Linkage Editor vs Linking Loader.

- A linking loader performs all linking & relocation functions and loads the linked program directly into memory for execution.
- A linkage editor produces a linked version of the program, called a load module or executable image, which is written to a file or library for later execution.
- A linkage editor is useful for programs that are executed many times without being reassembled every time.  
For execution, a relocation loader loads the program into memory. The only object code modification required is the addition of the actual load address to relative values within the program; the rest is done during linking. So loading can now be done in 1 pass.
- A linking loader is better when the program is reassembled for nearly every execution.
- The linked program produced by the linkage editor is in a form that is suitable for processing by a relocation loader.
  - All external references are resolved
  - relocation is indicated by some mechanism such as Modification records or a bit mask.

Information about external references is often retained in the linked program, as it allows subsequent relinking of the program to replace certain control sections, modify external references, etc.
- If the actual load address is known in advance, the linkage editor can perform the relocation as well, i.e. the result is a linked program that is an exact image of the way the program will appear in memory.  
But  
the flexibility of being able to load the program at any location is usually preferred over the small reduction in overhead gained by not performing relocation at load time.
- Other useful functions →
  - modification of a linked program without having to process the entire program again.  
eg - Consider a program PLANNER that has multiple subroutines. One of its subroutines, PROJECT, has to be changed to correct an error or to improve efficiency. After the new version of PROJECT is assembled or compiled, the linkage editor can replace this subroutine in the linked version of PLANNER, using linkage editor commands such as:

```
INCLUDE PLANNER (PROGLIB)
DELETE PROJECT
INCLUDE PROJECT (NEWLIB)
REPLACE PLANNER (PROGLIB)
```

→ A linkage editor can be used to build packages of subroutines or other control sections that are generally used together.

This is useful when dealing with subroutine libraries that support high level programming languages.

e.g. In a typical implementation of FORTRAN, there are a large number of subroutines that handle formatted input & output. There are a large number of cross-references b/w these subprograms because they are closely related.

It is desirable to keep them as separate modules for program modularity & maintainability.

But then the same set of cross-references is processed for almost every FORTRAN program linked, which represents a substantial overhead.

We can use the linkage editor to combine the subroutines into a package using commands like,

```
INCLUDE READR (FTNLIB)
INCLUDE WRITER (FTNLIB)
INCLUDE ENCODE (FTNLIB)
:
SAVE FTN10 (SUBLIB)
```

The linked module FTN10 can be indexed in the directory of SUBLIB under the same names as the original subroutines. Thus, a search of SUBLIB before FTNLIB retrieves FTN10 instead of the separate routines.

As FTN10 already has all cross-references b/w its subroutines resolved, these linkages don't need to be reprocessed when a user's program is linked.

→ A linkage editor allows the user to specify that external references are not to be resolved by automatic library search.

eg - If 100 FORTRAN programs using the I/O routines are to be stored in a library, the library will hold 100 copies of FTN10 if all external references are resolved.  
This wastes a lot of library space.

We can use commands to specify that no library search is to be performed during linkage editing, so that these references are resolved only later, when the program is loaded for execution.

This requires slightly more overhead due to 2 linkage operations, but it results in a large saving of library space.

- Linkage editors are in general more flexible than linking loader & also offer more control.
- But they also are more complex and have greater overhead.

→ Dynamic Linking (or dynamic loading or load on call)

- Here the linking is performed at execution time,  
i.e. a subroutine is loaded & linked to the rest of the program when it is first called.

- It is often used to allow several executing programs to share 1 copy of a subroutine or library.

eg - run-time support routines for a high level language like C  
could be stored in a dynamic link library.

A single copy of the routines could be loaded into memory & all executing C programs could be linked to this copy instead of each having a separate copy.

- In object-oriented systems, dynamic linking is used for references to software objects.

This allows the implementation of an object & its methods to be determined at run-time.

The implementation can be changed at any time without affecting the programs that use the object.

- Advantages of dynamic linking -

\* It provides the ability to load routines only when they are needed.  
This results in a saving of time & memory space.

e.g. - consider a program that contains subroutines to correct or diagnose errors in the I/P data during execution. If no errors occur (which is common), these subroutines are not used and so are never loaded & linked.

- If a program has many subroutines but uses only a few of them depending on its input, then only the subroutines required are loaded & linked during execution.

- How is the loading & linking of a called subroutine accomplished?

- A routine that is to be dynamically loaded must be called via an OS service request, i.e. a request to the part of the loader that is kept in memory during execution.

- So instead of executing a JSUB instruction that refers to an external symbol, the program makes a load-and-call request to the OS with the symbolic name of the subroutine as the parameter.



Here, the user program sends a load-and-call request for the ERRHANDL subroutine.

- The OS examines its internal tables to determine whether or not the routine is already loaded.  
If not, the routine is loaded from the specified user or system library

[Load]

and then control is passed  
to the routine being called.

[Call]

When the subroutine completes its processing, it returns to its caller (i.e. the OS routine that handles the load-and-call request).  
The OS then returns control to the user program.

After the subroutine has completed, the memory that was allocated for it may be released & used for other purposes. But this isn't always done immediately: if the subroutine is kept in memory, a 2nd call to it doesn't require another load operation.  
So it is desirable to keep the subroutine in memory until the memory is needed for other processing.

If the subroutine called is still in memory, control is passed to it directly from the dynamic loader.



- In dynamic loading, binding of symbolic name to actual address is delayed from load time until execution time which results in greater flexibility

- But, this also requires more overhead as OS intervenes in the calling process
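The load-and-call mechanism can be sketched as a small class that keeps a table of routines already in memory. This is a toy model: the class name `DynamicLoader`, its methods and the stand-in `ERRHANDL` function are hypothetical, and "loading" is simulated by copying a callable from a library dictionary.

```python
class DynamicLoader:
    """Models the OS service that handles load-and-call requests."""

    def __init__(self, library):
        self.library = library   # name -> routine (stands in for a library on disk)
        self.loaded = {}         # internal table of routines now in memory
        self.load_count = 0

    def load_and_call(self, name, *args):
        routine = self.loaded.get(name)
        if routine is None:              # not in memory yet: load it first
            routine = self.library[name]
            self.loaded[name] = routine
            self.load_count += 1
        return routine(*args)            # control passes to the routine

    def release(self, name):
        """Free the memory held by a routine that is no longer needed."""
        self.loaded.pop(name, None)


loader = DynamicLoader({"ERRHANDL": lambda code: f"handled error {code}"})
first = loader.load_and_call("ERRHANDL", 7)
second = loader.load_and_call("ERRHANDL", 8)
print(first, second, loader.load_count)
```

The second call finds ERRHANDL already in the table, so no further load is needed. After `release`, the next request loads the routine again.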

## → Bootstrap Loaders

In an idle computer with no program in memory, how do things get started?

- When the machine is empty and idle there is no need for relocation; only the absolute address of the 1<sup>st</sup> program to be loaded is needed (this program is usually the OS). For this we need an absolute loader to be loaded.
- Early computers required the operator to enter the object code of an absolute loader into memory using switches on the computer console. But this is too inconvenient & error-prone.
- In some computers, the absolute loader program is permanently present in a ROM. When a hardware signal occurs indicating start-up of the system, the machine begins executing this ROM program.  
On some computers the program is executed directly in the ROM; on others, it is copied to main memory & executed there.  
But it is inconvenient to change the ROM program if modifications are necessary.
- Intermediate solution:  
have a built-in hardware function (or a very small ROM program) that reads a fixed length record from some device into memory at a fixed location.  
After the read operation is complete, control is transferred to the address in memory where the record was stored. This record contains machine instructions that load the absolute program that follows.  
If these instructions can't fit in 1 record, the 1st record causes the reading of other records & they in turn can cause the reading of more records  
→ hence the term bootstrap.

The 1<sup>st</sup> record(s) → bootstrap loader.

This loader is added to the beginning of all object programs that are to be loaded into an empty & idle system.

### IMPLEMENTATION EXAMPLE →

→ MS-DOS Linker for Pentium & other x86 systems.

- Most MS-DOS compilers & assemblers produce object modules, not executable machine language programs.
- These object modules have the extension .OBJ and contain a binary image of the translated instructions & data of the program. They also describe the structure of the program.
- MS-DOS LINK - a linkage editor that combines one or more object modules to produce a complete executable program.

The executable program has the extension .EXE.  
LINK can also combine the translated programs with other modules from object code libraries.
- A typical MS-DOS object module,

| Record Type                | Description                      |
|----------------------------|----------------------------------|
| THEADR                     | Translator Header                |
| TYPDEF<br>PUBDEF<br>EXTDEF | External symbol & references     |
| LNAMES<br>SEGDEF<br>GRPDEF | Segment definition and grouping  |
| LEDATA<br>LIDATA           | Translated instructions & data.  |
| FIXUPP                     | Relocation & linking information |
| MODEND                     | End of object module.            |

- THEADR record - specifies the name of the object module
  - MODEND record - marks the end of the module & can contain a reference to the entry point of the program
- THEADR & MODEND are similar to the Header & End records of SIC/XE.

- PUBDEF record - contains a list of the external symbols (called public names) that are defined in the object module.
  - EXTDEF record - contains a list of the external symbols that are referred to in the object module.
- These are similar to the Define & Refer records of SIC/XE.  
Both PUBDEF & EXTDEF can contain information about the data type designated by an external name.
- TYPDEF record - defines these types

- SEGDEF record - describes the segments in the object module, including their name, length & alignment

- GRPDEF record - specifies how these segments are combined into groups

- LNAMES record - contains a list of all segment & class names used in the program.

SEGDEF & GRPDEF records refer to a segment by giving the position of its name in the LNAMES record.

- LEDATA record - contains translated instructions & data from the source program.  
It is similar to the Text record of SIC/XE.

- LIDATA record - specifies translated instructions & data that occur in a repeating pattern.

- FIXUPP record - used to resolve external references & carry out address modifications that are associated with relocation & grouping of segments within the program.

It is similar to the Modification record of SIC/XE,  
but FIXUPP records are more complicated.

A FIXUPP record must immediately follow the LEDATA or LIDATA record to which it applies.

- LINK performs its functions in two passes.

Pass 1 - computes a starting address for each segment in the program.

It constructs a symbol table that associates an address with each segment (using LNAMES, SEGDEF & GRPDEF records) and each external symbol (using EXTDEF & PUBDEF records).

If unresolved external symbols remain after all object modules are processed, LINK searches the specified libraries.

Pass 2 - LINK extracts the translated instructions & data from the object modules & builds an image of the executable program in memory.

It builds a memory image because the executable program is organised by segment & not by the order of the object modules. Building a memory image is the most efficient way to handle the rearrangements caused by combining & concatenating segments.

If enough memory isn't available, LINK uses a temporary disk file in addition.

Here LINK processes each LEDATA & LIDATA record along with the corresponding FIXUPP records & places the binary data from the LEDATA & LIDATA records into the memory image at locations reflecting the segment addresses computed during Pass 1.

Relocation & the resolution of external references are done at this stage. A table of segment fixups is maintained; it is used to perform the relocation that reflects the actual segment addresses when the program is loaded for execution.

Once the memory image is complete, LINK writes it to the .EXE file. This file also includes a header that contains the table of segment fixups, information about memory requirements & entry points, and the initial contents of the CS & SP registers.

## → SunOS Linkers for SPARC systems.

- SunOS provides 2 different linkers
  - run-time linker
  - link-editor
- The link-editor is most commonly invoked in the process of compiling a program.  
It takes 1 or more object modules produced by assemblers & compilers & combines them to produce a single o/p module.
- Types of output module →
  1. Relocatable object module  
It is suitable for further link-editing
  2. Static executable  
It has all symbolic references bound & ready to run
  3. Dynamic executable  
It has some symbolic references that are to be bound at run-time
  4. Shared Object  
It provides services that can be bound at run-time to 1 or more dynamic executables.
- Object module contains multiple sections which represent instructions & data areas from source program.  
These sections have a set of attributes such as "executable", "writable".  
Object modules also include list of relocation & linking operations that need to be performed & a symbol table that describes the symbols used.
- The SunOS link-editor reads the object modules that are given to it to process. Sections from the i/p files that have the same attributes are concatenated to form new sections in the o/p file.

- Symbol tables from the i/p files are processed to match symbol definitions & references, and relocation & linking operations are performed within the o/p file.
- The linker generates a new symbol table & a new set of relocation instructions in the o/p file. They represent symbols that need to be bound at run-time & relocations that need to be performed when the program is loaded.
- Relocation & linking operations are specified using a set of processor-specific codes. The codes reflect the instruction formats & addressing modes of the machine: they describe the size of the field to be modified & the calculation that needs to be performed.
- Symbolic references from the i/p files that have no matching definition are resolved by referring to archives & shared objects.
  - An archive is a collection of relocatable object modules.
  - A directory within the archive associates each symbol name with the object module that contains its definition. The selected module from the archive is included to resolve the reference.

A shared object is an indivisible unit that was generated by a previous link-edit operation.

If a referenced symbol is defined in a shared object, the entire contents of the shared object become a logical part of the o/p file.

The link-editor only records the dependency on the shared object; actual inclusion of the shared object happens at run-time.
- The SunOS run-time linker is used to bind dynamic executables & shared objects at execution time.
- It determines what shared objects are required by the dynamic executable & ensures that they are included. It also resolves any additional dependencies on other shared objects.

After locating & including the necessary shared objects, the linker performs relocation & linking to prepare the program for execution.

These operations bind symbols to the actual memory addresses at which the segments are loaded. Data references are bound before control is passed to the executable program.

Binding of procedure calls is deferred until execution. During link-editing, calls to globally defined procedures are converted to references to a procedure linkage table. When a procedure is called for the 1<sup>st</sup> time, control is passed to the run-time linker via this table. The linker looks up the actual address of the procedure & inserts it into the linkage table.

So, subsequent calls go directly to the called procedure  
→ lazy binding.
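
Lazy binding can be sketched in C with a function pointer standing in for one entry of the procedure linkage table. This is only an illustration under stated assumptions: `square`, `resolve_stub`, `plt_square`, `call_square` and `lookups` are hypothetical names, and a real run-time linker patches the table inside the loaded image rather than a C variable.

```c
typedef int (*proc_t)(int);

/* The "real" procedure, imagined to live in a shared object. */
static int square(int x) { return x * x; }

static int lookups = 0;           /* how many times the linker stub ran */
static int resolve_stub(int x);   /* forward declaration */

/* One-entry procedure linkage table: initially routes to the stub. */
static proc_t plt_square = resolve_stub;

/* Stands in for the run-time linker: look up the address,
   patch the table entry, then complete the original call. */
static int resolve_stub(int x) {
    lookups++;
    plt_square = square;
    return plt_square(x);
}

/* Every call site goes through the table entry. */
static int call_square(int x) { return plt_square(x); }
```

The first `call_square` goes through `resolve_stub`; every later call jumps straight to `square`, so `lookups` stays at 1.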

- The run-time linker also provides flexibility.  
During execution, a prog can dynamically bind to new shared objects, which allows the prog to choose b/w a no. of shared objects.  
If a shared object isn't needed, it isn't linked.

→ Cray MPP Linker for Cray T3E system.

- A T3E system contains a large no. of processing elements (PEs).

Each PE has its own local memory & can access the memory of all other PEs.

- An application program on a T3E system is allocated a partition that consists of several PEs (to take advantage of the parallel architecture of the machine).

- The work to be done is divided b/w the PEs.
- e.g. a partition consists of 16 PEs & the 256 elements of a 1D array A are distributed among them, 16 elements per PE.

If the prog contains a loop that processes all 256 elements, PE0 can execute the loop from A[0] to A[15],  
 PE1 can execute the loop from A[16] to A[31] & so on.
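
The index arithmetic behind this example can be sketched as follows. It assumes the simple block distribution described above (16 PEs, 256 elements); the macro and function names are hypothetical and are not part of any Cray interface.

```c
#define NUM_PES    16
#define ARRAY_SIZE 256
#define BLOCK      (ARRAY_SIZE / NUM_PES)   /* 16 elements per PE */

/* PE that holds element A[index] in its local memory. */
static int owner_pe(int index)     { return index / BLOCK; }

/* Position of A[index] inside that PE's portion. */
static int local_offset(int index) { return index % BLOCK; }

/* First and last global index handled by a given PE. */
static int first_index(int pe)     { return pe * BLOCK; }
static int last_index(int pe)      { return pe * BLOCK + BLOCK - 1; }
```

With these definitions PE1 handles indices `first_index(1)` to `last_index(1)`, i.e. A[16] to A[31].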

• Shared data → data that is divided among a no. of PEs.

Private data → data that isn't shared: either each PE contains its own copy of the data, or a PE has private data that exists only in its own local memory.

• When the program is loaded,  
 each PE gets a copy of the executable code, its private data & its portion of the shared data.

• The MPP linker organizes blocks of code or data from the object programs into lists.

The blocks on a given list all share some common property.

The blocks on each list are collected, an address is assigned to each block & relocation and linking operations are performed.

The linker then writes an executable file that contains the relocated & linked blocks. It also specifies the no. of PEs required & other control information.

• Distribution of shared data depends on the no. of PEs.

If the no. of PEs is specified at compilation,  
it can't be overridden later.

If not, either

- the linker can create an executable file that is targeted  
for a fixed no. of PEs, or
- the partition size can be chosen at run time.  
This is called a plastic executable.

A plastic executable is often larger than one  
targeted for a fixed no. of PEs, as  
it must contain a copy of all relocatable  
object modules & all linker directives that  
are needed to produce the final executable.

# Compiler Design - 15CS63

## UNIT 1 : Introduction

Translator - Any program that converts a program written in one language into another language, e.g. a high-level language program into machine (low-level language) code.

Compiler - A program that reads code in one language (the source language) and translates it into an equivalent program in another language (the target language).

Interpreter - A kind of language processor that does not produce a target program as a translation, but directly executes the operations specified in the source program on inputs supplied by the user.



Language Pre-processing system:



- A hybrid compiler: *relatively slow*



Structure of a compiler:



→ 2 main parts:

① Analysis - breaks up the source prog into constituent pieces & imposes a grammatical structure on them. Based on this structure it creates an intermediate representation of the source prog. It also collects information about the prog & stores it in the "Symbol Table". (Front End of compiler)

② Synthesis - constructs the desired target prog from the intermediate representation and the information in the symbol table. (Back End of compiler)

→ 7 phases:

① Lexical Analysis - Scanning

- on reading the character stream of the prog, it groups the characters into meaningful sequences called "Lexemes".
- for each lexeme, the analyzer produces as output a token of the form:

< token-name, attribute-value >

- token-name: abstract symbol used in the parser
- attribute-value: points to the entry in the symbol table for this token

② Syntax Analysis - Parsing

- parser uses the tokens (i.e. the output of the scanner) and creates a tree-like intermediate representation that depicts the "grammatical structure" of the token stream

Ex: For grammar  $E \rightarrow E+E \mid E * E \mid \text{num}$

For Input  $2+3*5$

Interior nodes: operators  
Leaf (exterior) nodes: operands

### ③ Semantic Analysis

- uses the syntax tree and the information in the symbol table to check the source prog for semantic (meaning) consistency with the lang definition.
- It gathers type information and saves it in either the syntax tree or the symbol table for use in ICG.
- Type checking - compiler checks that each operator has matching operands
- Coercions - the lang specification may permit some type conversions, e.g. an integer operand converted to floating point

### ④ Intermediate Code Generation (ICG)

- The intermediate code may be in the form of a syntax tree or a low-level, machine-like form of the source prog (e.g. three-address code).
- properties :
  - should be easy to produce
  - should be easy to translate into target M/c code.

### ⑤ Code Optimization (M/c independent)

- improves the intermediate code so that better target code results
- Better in terms of : faster, shorter, less power consuming code
- Ex: instead of performing an int-to-float conversion at run time,  
replace the integer constant by its floating-point value directly (60 → 60.0)

### ⑥ Code Generation

- Takes the intermediate representation as input and maps it to the target lang.
- If the target lang is M/c code, each intermediate instruction is translated into a sequence of M/c instructions that perform the same task.
- Judicious assignment of registers to hold variables is a crucial part of this phase

## → Compiler construction tools :

- ① Parser Generators - automatically produce syntax analyzers from a grammatical description of a prog lang
- ② Scanner Generators - produce lexical analyzers from a regular expression description of the tokens of a lang
- ③ Syntax-directed translation engines - produce collections of routines for walking a parse tree and generating intermediate code
- ④ Code-generator generators - produce a code generator from a collection of rules for translating each operation of the intermediate lang into the M/c lang for a target M/c
- ⑤ Data-flow analysis engines - facilitate gathering of information about how values are transmitted from one part of a prog to every other part.
- ⑥ Compiler-construction toolkits - provide an integrated set of routines for constructing various phases of a compiler

## Applications of Compiler Technology :

- ① Implementation of high-level prog. langs, including modern OOP concepts like
  - Data Abstraction
  - Inheritance of properties
- ② Optimization for computer architectures
  - Parallelism
    - (i) at instruction level - multiple operations executed simultaneously
    - (ii) at processor level - different threads of the same application run on different processors
  - Memory Hierarchy
    - storage can be built very large or very fast, but not both

### ③ Design New computer Architectures

- RISC - replaces the complex instructions & memory-addressing modes of CISC (added to support data structure access, procedure invocation ...) with simple instructions that compilers can use effectively
- Specialized Architectures -  
Data flow M/c, vector M/c, VLIW & SIMD M/c.

### ④ Program Translations

- (i) Binary Translation - translates the binary code of one M/c into that of another; increases S/w availability
- (ii) Hardware Synthesis - designs described in Verilog, VHDL; reduces time & effort
- (iii) Database Query Interpreters - SQL queries; effective retrieval
- (iv) Compiled simulation - the model is compiled & run to validate a design.
- (v) Reduce redundancy in code

### ⑤ Software Productivity Tools

- (i) Type checking - to catch program inconsistencies
- (ii) Bounds checking - the lang provides range checking, e.g. against buffer overflows (security); sophisticated analysers optimize away range checks & are also used in error detection tools.
- (iii) Memory Management Tools - (garbage collection)
  - Automatic memory management eliminates memory-management errors such as leaks ...

# 1. Write the differences between compiler and Interpreter

## Compiler

1. Compiler translates the entire program in one go; the translated program is then executed.
2. It produces efficient object code, therefore programs run faster.
3. Error reporting is time consuming (errors are displayed after the entire pgm is checked).
4. Conditional control statements are executed faster.
5. Memory requirement is more, as a separate object code is generated.
6. Program need not be compiled every time it is run.
7. Difficult to use.
8. Translate once and then run the result (stand-alone code, faster execution).
9. Eg:- C, C++



## Interpreter

1. Interpreter translates and executes the high-level program line by line. (Some interpreters first convert it into an intermediate code, which is then executed by another program.)
2. No object code is generated.
3. Errors, if any, are displayed for every instruction interpreted (error reporting is immediate).
4. Conditional control statements are executed slower.
5. Memory requirement is less.
6. Every time the program is run, the high-level program is converted into lower-level pgm.
7. Easy to use for beginners.
8. read - check - execute loop  
→ slower, not stand-alone.
9. Eg:- Python, Prolog



→ Examples showing detail phases of compiler:

①  $\text{position} = \text{initial} + \text{rate} * 60$

↓  
Lexical Analysis

$\langle \text{id}, 1 \rangle \langle = \rangle \langle \text{id}, 2 \rangle \langle + \rangle \langle \text{id}, 3 \rangle \langle * \rangle \langle 60 \rangle$

↓  
Syntax Analysis



↓  
Semantic Analysis



↓  
Intermediate code generation

$t_1 = \text{inttofloat}(60)$

$t_2 = \text{id}_3 * t_1$

$t_3 = \text{id}_2 + t_2$

$\text{id}_1 = t_3$

↓  
M/c-independent code optimization

$t_1 = \text{id}_3 * 60.0$

$\text{id}_1 = \text{id}_2 + t_1$

↓  
Code Generation

LDF R1, id3

MULF R1, R1, #60.0

LDF R2, id2

ADDF R2, R2, R1

STF id1, R2

$$② \quad a[\text{index}] = 4 + 2 + \text{index}$$

### Lexical Analysis

$\langle \text{id}, 1 \rangle \langle [ \rangle \langle \text{id}, 2 \rangle \langle ] \rangle \langle = \rangle \langle 4 \rangle \langle + \rangle \langle 2 \rangle \langle + \rangle \langle \text{id}, 2 \rangle$

### Syntax Analysis



| Entry | Name  |
|-------|-------|
| 1     | a     |
| 2     | index |

### Semantic Analysis



### Intermediate code generation

$$t_1 = 4 + 2$$

$$a[\text{index}] = t_1 + \text{index}$$

### M/c-independent code optimization

$$a[\text{index}] = 6 + \text{index}$$

### Code Generation

```

mov index, R0          // R0 = index
mov &a, R1             // R1 = starting address of array a
add R0, R1             // R1 = R1 + R0 = address of a[index] (one word per element assumed)
mov #6, R2             // R2 = 6
add R0, R2             // R2 = R2 + R0 = 6 + index
mov R2, *R1            // store R2 at the address held in R1, i.e. a[index] = 6 + index
  
```

## → Environments and States



- Environment is a mapping from names to locations in the store
  - State is a mapping from locations in the store to their values

## Exceptions to Dynamic Mapping:

- (i) Static binding of names to locations - global variable declaration - the location is assigned once and for all.

Ex:    int i;  // global i: one location, bound once and for all  
void fun (...) {  
    int i;  // local i: a new location on each activation of fun  
}

- (ii) Static binding of locations to values - declared constants

Ex: #define ARRSIZE 1000 //static bind

## Static Scope and Block Structure :

①

```
main() {
    int a = 1;                  // block B1
    int b = 1;
    {
        int b = 2;              // block B2
        {
            int a = 3;          // block B3
            cout << a << b;     // prints 3 2
        }
        {
            int b = 4;          // block B4
            cout << a << b;     // prints 1 4
        }
        cout << a << b;         // prints 1 2
    }
    cout << a << b;             // prints 1 1
}
```

→

| Declaration | Scope   |
|-------------|---------|
| int a=1 ;   | B1 − B3 |
| int b=1 ;   | B1 − B2 |
| int b=2 ;   | B2 − B4 |
| int a=3 ;   | B3      |
| int b=4 ;   | B4      |

(B1 − B3 means all of B1 except B3, where the name is redeclared.)

② main ()

```
{
    int w, x, y, z;
    int i = 4; int j = 5;
    {
        int j = 7; i = 6;
        w = i + j;              // 6 + 7 = 13
        printf("%d\n", w);
    }
    x = i + j;                  // 6 + 5 = 11
    printf("%d\n", x);
    {
        int i = 8;
        y = i + j;              // 8 + 5 = 13
        printf("%d\n", y);
    }
    z = i + j;                  // 6 + 5 = 11
    printf("%d\n", z);
}
```

③

```
#define a (n+1)

int n=2;
void b() { n=a; printf("%d",n); }       // n = 2+1 = 3, prints 3
void c() { int n=1; printf("%d",a); }   // a expands to (n+1) with the local n, prints 2
void main()
{
    b();
    c();
}
```

O/P:

3

2

④

```

int w, x, y, z;
int i = 3;
int j = 4;
{
    int i = 5;
    w = i + j;           // 5 + 4 = 9
}
x = i + j;               // 3 + 4 = 7
{
    int j = 6;
    i = 7;
    y = i + j;           // 7 + 6 = 13
}
z = i + j;               // 7 + 4 = 11

```

10. What is printed by the following C code?

a)  `#define a (x+1)`

`int x = 2;`

`void b() { x = a; printf("%d\n", x); }`  $\rightarrow 3$

`void c() { int x = 1; printf("%d\n", a); }`  $\rightarrow 2$

`void main() { b(); c(); }`

b)  `#define a (x+1)`

`int x = 2;`

`void b() { x = a; printf("%d\n", x); }`  $\rightarrow 3$

`void c() { printf("%d\n", a); }`  $\rightarrow 4$   $\because$  b() has already reassigned the same (global) variable  $x$

`void main() { b(); c(); }`

$x=3$ ,  $a=3+1=4$

c)  `#define a (x+1)`

`int x = 2;`

`void b() { int x = 1; printf("%d\n", a); }`  $\rightarrow 2$

`void c() { printf("%d\n", a); }`  $\rightarrow 3$

`void main() { b(); c(); }`

d)  `#define a (x+1)`

`int x = 2;`

`void b() { int x = a; printf("%d\n", a); }`  $\rightarrow 4$   $\because (x=2+1=3)$  again  $a=3+1=4$  (this assumes the initializer reads the global  $x$ ; strictly, in C the  $x$  in the initializer is already the new local  $x$ , so its value is indeterminate)

`void c() { printf("%d\n", a); }`  $\rightarrow 3$   $\because$  the global  $x$  is still 2

`void main() { b(); c(); }`

## → Parameter Passing Mechanisms :

- (1) Actual parameters - parameters used in call of procedure
- (2) Formal parameters - parameters used in procedure definition

### ① Call by value

- The actual parameter is evaluated (if it is an expression) or copied (if it is a variable), and the value is placed in the location belonging to the corresponding formal parameter of the called procedure.
- All computation involving the formal parameters done by the called procedure is local to that procedure; the actual parameters themselves cannot be changed.

### ② Call by reference

- The address of the actual parameter is passed to the callee as the value of the corresponding formal parameter.
- Uses of the formal parameter in the code of the callee are implemented by following this pointer to the location indicated by the caller.
- Changes to the formal parameter → appear as changes in the actual parameter.
- If the actual parameter is an expression, it is evaluated before the call and its value is stored in a location of its own.
- Changes to the formal parameter then change the value in this location, but have no effect on the data of the caller.

### ③ Call by name

- used in early prog langs - Algol 60.
- it requires that the callee execute as if the actual parameter were substituted literally for the formal parameter in the code of the callee, as if the formal parameter were a macro standing for the actual parameter.
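
C has no call by name, but the "macro standing for the actual parameter" idea can be imitated with the preprocessor. The sketch below is only an illustration; `twice_by_value`, `TWICE_BY_NAME`, `next_value` and `counter` are hypothetical names introduced here.

```c
/* Call by value: the argument is evaluated once, before the call. */
static int twice_by_value(int x) { return x + x; }

/* Call-by-name imitation: the argument text is substituted for x,
   so it is re-evaluated at every use of x. */
#define TWICE_BY_NAME(x) ((x) + (x))

static int counter = 0;

/* Argument expression with a side effect, to make the difference visible. */
static int next_value(void) { return ++counter; }
```

Starting from `counter == 0`, `twice_by_value(next_value())` evaluates the argument once and yields 1 + 1 = 2, whereas a following `TWICE_BY_NAME(next_value())` evaluates it twice and yields 2 + 3 = 5.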

→ Examples:

① call by value

```
int add ( int a, int b )
{
    return (a+b);
}

main ()
{
    int c = add ( 10, 20 );   // the values 10 and 20 are copied into a and b
}
```

② call by reference

```
int add ( int *a, int *b )
{
    return (*a + *b);         // follow the pointers to the caller's p and q
}

main ()
{
    int p=10;
    int q=20;
    int c = add ( &p, &q );   // the addresses of p and q are passed
}
```

③ call by name / aliasing

```
int add ( int a, int b )
{
    return (a+b);
}

main ()
{
    int p=10;
    int q=20;
    int c = add ( p, q );   // under call by name, a and b would be replaced textually by p and q
}
```

- Aliasing :

- An interesting consequence of call-by-reference parameter passing, or of call by value where references to objects are passed by value.
- It is possible that two formal parameters refer to the same location; such variables are said to be ALIASES of one another.
- So although they appear to be distinct formal parameters, they may in fact be aliases of each other.

Ex: Let  $a$  be an array in procedure  $P$ , and let  $P$  call procedure  $q(x,y)$  as follows:

$P$   
  {

$q(a,a)$  ;   // call of  $q(x,y)$

}

Array names are references to the location of the array  $\Rightarrow$   $x$  and  $y$  are aliases: for every index  $i$ ,  $x[i]$  and  $y[i]$  name the same element.
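
A small C sketch of this situation, assuming nothing beyond standard C array passing. The body of `q` and the index 10 are chosen here purely for illustration.

```c
/* Array parameters are passed as pointers to the first element,
   so x and y may refer to the same storage. */
static int q(int x[], int y[]) {
    x[10] = 2;       /* write through x ...                     */
    return y[10];    /* ... and read the same position through y */
}
```

Calling `q(a, a)` returns 2 because `x` and `y` are aliases, while calling `q` with two different zero-initialised arrays returns 0.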

## Questions

## Chapter-1 Introduction

1. Define a compiler.
2. Differentiate b/w compiler & interpreter.
3. Explain the language processing system.
4. Describe the analysis-synthesis model of the compiler, or explain in detail the various phases of a compiler with an example.
5. Explain in detail the various phases of compilation for the following i/p strings
  - a.  $P = i + n * 60$
  - b.  $a = a * b + a * b$
  - c.  $a = (b + c) * (b + c) * 2$
  - d.  $a[\text{index}] = 4 + 2 + \text{index}$
6. Why is it necessary to group the phases of a compiler?
7. What is the purpose of compiler construction tools?  
Describe the different compiler construction tools used.
8. Analyse the s/w productivity tools and explain.
9. Explain the different parameter passing techniques with an example.

## Chapter-3 - Lexical Analysis

→ Lexical Analysis :

Interaction between Lexer and Parser :



→ Task of Lexer :

- ① Identification of Lexemes
- ② Stripping out comments
- ③ Removing whitespace (blank, \n, \t)
- ④ correlating error messages generated by the compiler with the source prog
- ⑤ keeping track of line numbers so that an error can be reported with its line
- ⑥ if the source program uses a macro-preprocessor, the expansion of macros may also be done by the scanner.

→ Lexer - cascade of 2 processes :

- ① Scanning consists of the simple processes that do not require tokenization of the input, such as deletion of comments & compaction of consecutive whitespace characters into one.
- ② Lexical analysis proper is the more complex portion, where the scanner produces the sequence of tokens as output.

Lexer versus Parser : Separate phases because :

- ① Simplicity of design - important consideration
- ② compiler efficiency improved - use specialized technique for lexical analysis (Input Buffering)
- ③ compiler portability is enhanced.

## → Tokens, Patterns, Lexemes :

- ① Token : A pair consisting of a token name and an optional attribute value.
- Token name - Abstract symbol representing a kind of lexical unit  
Ex: Keyword, Identifier, ...

The token names are the input symbols that parser processes.

- ② Pattern : Description of the form that lexemes of token may take (description in metalanguage).

Ex: token name : Identifier

pattern :  $[\_a-zA-Z]\,[\_a-zA-Z0-9]^*$

- ③ Lexeme : Sequence of characters in source program that Matches the pattern of a token and is identified by the lexer as an instance of that token.

Ex: token name : Keyword

pattern : [i][f]

lexeme : if

| TOKEN      | INFORMAL DESCRIPTION                 | SAMPLE LEXEMES |
|------------|--------------------------------------|----------------|
| if         | characters i,f                       | if             |
| else       | characters e,l,s,e                   | else           |
| comparison | <, >, <=, >=, ==, !=                 | <=, >          |
| id         | letter followed by letters and digits | pi, score     |
| number     | numeric constants                    | 3.14, 0, 6.9e8 |
| literal    | enclosed within " "                  | "core dumped"  |

## → Lexical Errors : Recovery options

- ① Panic mode recovery - delete successive characters from the remaining input until the lexer can find a well-formed token at the beginning of the input that is left.
- ② Delete one character from the remaining input
- ③ Insert one missing character into the remaining input
- ④ Replace a character by another character
- ⑤ Transpose two adjacent characters.

Examples :

fi ( a < b )  $\Rightarrow$  if ( a < b )   (transpose)

int a, ;  $\Rightarrow$  int a ; or int a, b ;   (delete or insert)

## → Input Buffering : To speed up reading of src prog.

### ① Single Buffer / 1-Buffer Technique

Only one buffer is used; the characters of the (large) source prog are read into it a block at a time for processing.

Main drawback: if

lexeme size > Buffer size  
we lose part of the lexeme.

Ex: the lexeme `World` needs 5 bytes; a 4-byte buffer holds only 4 of its characters, so the remaining character does not fit.

• Each reload brings in new data & removes the old data.

### ② 2-Buffer Technique (without sentinel / with sentinel)

Two buffers are used & are reloaded alternately.

Each buffer has the same size N, where N is usually the size of a disk block (e.g. 4096 bytes).

• Using one read system call, N characters are read into a buffer.

→ If fewer than N characters remain in the i/p (I/P < buffer size), a special char eof marks the end of the source file; this char is different from any other char of the src prog.

→ Two pointers are maintained :

- ① lexemeBegin - marks the beginning of the current lexeme, whose extent we are attempting to determine
- ② forward - scans ahead until a pattern match is found. When forward has moved past the end of the lexeme, we retract it one position and return the token.

- We need 2 checks per character in the 2-buffer scheme without sentinels :
  - 1) On advancing forward, check whether we have reached the end of one of the buffers; if yes, reload the other buffer and make forward point to the beginning of the newly loaded buffer.
  - 2) Check which character was read, i.e. whether it still belongs to a valid token, before returning the token.

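→ Sentinels (2-buffer technique with sentinels)

A sentinel character is placed at the end of each buffer; it is a special char that cannot be part of the src prog (usually eof).



Here one test per char is enough in the common case; only when the sentinel is read do we check whether the end of a buffer has been reached.

Lookahead is at most 1 char; after retracting, the chars before it are returned as the valid token.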

→ Lookahead code with sentinel :

switch (\*forward++)

{

case eof : if (forward is at end of Buffer 1) {  
    reload Buffer 2 ;

    forward = beginning of Buffer 2 ; }

else if (forward is at end of Buffer 2) {  
    reload Buffer 1 ;

    forward = beginning of Buffer 1 ; }

else /\* eof within a buffer marks end of input \*/  
    terminate lexical analysis ;

break;

cases for the other chars

}
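
The lookahead logic can be turned into a small runnable C sketch. It makes several assumptions that are not in the notes: the source is an in-memory string instead of a file, the half-buffer size `N` is only 4 so that reloads actually happen, and the NUL character plays the role of eof (so the source must not contain NUL bytes). `Scanner`, `reload`, `scanner_init` and `next_char` are hypothetical names.

```c
#include <string.h>

#define N 4              /* tiny half-buffer size, for demonstration only */
#define SENTINEL '\0'    /* stands in for eof */

typedef struct {
    const char *src;         /* source text not yet read into a buffer */
    char buf[2 * (N + 1)];   /* two halves, each followed by a sentinel slot */
    char *forward;           /* scanning pointer */
} Scanner;

/* Fill half 0 or 1 with up to N chars and put the sentinel right after them. */
static void reload(Scanner *s, int half) {
    char *start = s->buf + half * (N + 1);
    size_t n = strlen(s->src);
    if (n > N) n = N;
    memcpy(start, s->src, n);
    start[n] = SENTINEL;     /* slot N if the half is full, earlier at end of input */
    s->src += n;
}

static void scanner_init(Scanner *s, const char *text) {
    s->src = text;
    reload(s, 0);
    s->forward = s->buf;
}

/* Return the next source character, or -1 at end of input.
   Ordinary characters cost a single comparison. */
static int next_char(Scanner *s) {
    for (;;) {
        char c = *s->forward++;
        if (c != SENTINEL) return (unsigned char)c;
        if (s->forward == s->buf + N + 1) {              /* end of buffer 1 */
            reload(s, 1);
            s->forward = s->buf + N + 1;
        } else if (s->forward == s->buf + 2 * (N + 1)) { /* end of buffer 2 */
            reload(s, 0);
            s->forward = s->buf;
        } else {                                         /* eof within a buffer */
            s->forward--;     /* stay on the sentinel for repeated calls */
            return -1;
        }
    }
}
```

Reading "abcdefghij" with this scanner crosses two buffer boundaries and still returns the ten characters in order, followed by -1.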



- ① Alphabet - finite set of symbols Ex:  $\Sigma = \{0,1\}$   
String - finite sequence of symbols from  $\Sigma$  Ex: 0101  
Language - any countable set of strings over  $\Sigma$.
- ② Prefix of string - string obtained by removing zero or more symbols from the end of the string.  
Ex: ban, banana, $\epsilon$ are prefixes of banana.
- ③ Suffix of string - string obtained by removing zero or more symbols from the beginning of the string.  
Ex: nana, banana, $\epsilon$ are suffixes of banana
- ④ Substring - string obtained by deleting any prefix and any suffix from the string.  
Ex: banana, nan, $\epsilon$ are substrings of banana
- ⑤ Proper prefix - prefix which is neither  $\epsilon$  nor the string itself  
Ex: ban, banan
- ⑥ Proper suffix - suffix which is neither  $\epsilon$  nor the string itself  
Ex: anana, na
- ⑦ Proper substring - substring which is neither  $\epsilon$  nor the string itself  
Ex: anan, banan, anana
- ⑧ Subsequence - string formed by deleting zero or more, not necessarily consecutive, positions of the string.  
Ex: baan, anaa are subsequences of banana
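
These definitions can be checked mechanically. The helper functions below are a minimal sketch using only the standard C library; their names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* x is a prefix of s: s starts with x (the empty string always qualifies). */
static bool is_prefix(const char *x, const char *s) {
    return strncmp(x, s, strlen(x)) == 0;
}

/* x is a suffix of s: s ends with x. */
static bool is_suffix(const char *x, const char *s) {
    size_t lx = strlen(x), ls = strlen(s);
    return lx <= ls && strcmp(s + (ls - lx), x) == 0;
}

/* x is a substring of s: x occurs in s as consecutive characters. */
static bool is_substring(const char *x, const char *s) {
    return strstr(s, x) != NULL;
}

/* x is a subsequence of s: the characters of x occur in s in order,
   not necessarily next to each other. */
static bool is_subsequence(const char *x, const char *s) {
    while (*x && *s) {
        if (*x == *s) x++;
        s++;
    }
    return *x == '\0';
}
```

For example, `baan` is a subsequence of `banana` but not a substring of it.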

→ operations on languages:

| operation              | definition & notation                                                       |
|------------------------|-----------------------------------------------------------------------------|
| Union of L & M         | $L \cup M = \{ s \mid s \text{ is in } L \text{ or } s \text{ is in } M \}$ |
| Concatenation of L & M | $L M = \{ st \mid s \text{ is in } L \text{ and } t \text{ is in } M \}$    |
| Kleene closure of L    | $L^* = \bigcup_{i=0}^{\infty} L^i$                                          |
| Positive closure of L  | $L^+ = \bigcup_{i=1}^{\infty} L^i$                                          |

→ Regular Definition:

For an alphabet  $\Sigma$, a regular definition is a sequence of definitions of the form:

$$d_1 \rightarrow r_1$$

$$d_2 \rightarrow r_2$$

$$\vdots$$

$$d_n \rightarrow r_n \quad \text{where}$$

- 1) each  $d_i$  is a new symbol (not in  $\Sigma$  and different from every other  $d_j$)
- 2) each  $r_i$  is a regular expression over  $\Sigma \cup \{ d_1, d_2, \dots, d_{i-1} \}$

Ex ① C identifiers:

$$\text{letter} \rightarrow A \mid B \mid \dots \mid Z \mid a \mid b \mid \dots \mid z \mid \_$$

$$\text{digit} \rightarrow 0 \mid 1 \mid \dots \mid 9$$

$$\text{id} \rightarrow \text{letter} (\text{letter} \mid \text{digit})^*$$

Ex ② Unsigned numbers:

$$\text{digit} \rightarrow 0 \mid 1 \mid \dots \mid 9$$

$$\text{digits} \rightarrow \text{digit digit}^*$$

$$\text{optionalFraction} \rightarrow . \ \text{digits} \mid \epsilon$$

$$\text{optionalExponent} \rightarrow (E \ (+ \mid - \mid \epsilon) \ \text{digits}) \mid \epsilon$$

$$\text{number} \rightarrow \text{digits optionalFraction optionalExponent}$$

# Algebraic Laws for Regular Expressions

| LAW                                                            | DESCRIPTION                                  |
|----------------------------------------------------------------|----------------------------------------------|
| 1. $r \mid s = s \mid r$                                       | $\mid$ is commutative                        |
| 2. $r \mid (s \mid t) = (r \mid s) \mid t$                     | $\mid$ is associative                        |
| 3. $r(st) = (rs)t$                                             | concatenation is associative                 |
| 4. $r(s \mid t) = rs \mid rt$ ; $(s \mid t)r = sr \mid tr$     | concatenation distributes over $\mid$        |
| 5. $\epsilon r = r\epsilon = r$                                | $\epsilon$ is the identity for concatenation |
| 6. $r^* = (r \mid \epsilon)^*$                                 | $\epsilon$ is guaranteed in a closure        |
| 7. $r^{**} = r^*$                                              | $*$ is idempotent                            |

→ Recognition of Tokens:

$\text{stmt} \rightarrow \text{if expr then stmt} \mid \text{if expr then stmt else stmt} \mid \epsilon$

$\text{expr} \rightarrow \text{term relop term} \mid \text{term}$

$\text{term} \rightarrow \text{id} \mid \text{number}$

where,

$\text{number} \rightarrow [0-9]^+ (\cdot [0-9]^+)? (E [+ -] ? [0-9]^+)?$

$\text{id} \rightarrow [\_a-zA-Z] [\_a-zA-Z0-9]^*$

$\text{if} \rightarrow \text{if}$

$\text{then} \rightarrow \text{then}$

$\text{else} \rightarrow \text{else}$

$\text{rel op} \rightarrow < \mid > \mid <= \mid >= \mid = \mid <>$

white space :  $\text{ws} \rightarrow (\text{blank} \mid \text{tab} \mid \text{newline})^+$

→ Transition diagram:

- ① For relational operator, regular definition is  
 $\text{rel op} \rightarrow < \mid > \mid <= \mid >= \mid = \mid <>$



Code:

```
state = 0;
TOKEN getRelop()
{
    TOKEN retToken = new(RELOP);
    while (1)
    {   /* repeat until a return or a failure occurs */
        switch (state)
        {
            case 0: c = getch();
                if (c == '<') state = 1;
                else if (c == '>') state = 5;
                else if (c == '=') state = 8;
                else fail();            /* lexeme is not a relop */
                break;

            case 1: c = getch();
                if (c == '=') state = 2;
                else if (c == '>') state = 3;
                else state = 4;         /* other */
                break;

            case 2: retToken.attribute = LE;    /* <= */
                return (retToken);

            case 3: retToken.attribute = NE;    /* <> */
                return (retToken);

            case 4: retract();          /* give back the extra char read */
                retToken.attribute = LT;
                return (retToken);

            case 5: c = getch();
                if (c == '=') state = 6;
                else state = 7;         /* other */
                break;

            case 6: retToken.attribute = GE;    /* >= */
                return (retToken);

            case 7: retract();
                retToken.attribute = GT;
                return (retToken);

            case 8: retToken.attribute = EQ;    /* = */
                return (retToken);
        }
    }
}
```
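
A self-contained version of the same transition diagram is sketched below so that it can be compiled and tried. It assumes the input is a C string rather than the buffer pair, keeps the state numbers used in the code above, and replaces `getch()`/`retract()` by an index `i`; `Relop`, `get_relop` and `NOT_RELOP` are hypothetical names.

```c
#include <stddef.h>

typedef enum { LT, LE, EQ, NE, GT, GE, NOT_RELOP } Relop;

/* Recognise a relational operator at the start of s.
   *len receives the length of the lexeme (0 if there is none). */
static Relop get_relop(const char *s, size_t *len) {
    int state = 0;
    size_t i = 0;                 /* plays the role of the forward pointer */
    for (;;) {
        char c;
        switch (state) {
        case 0:
            c = s[i++];
            if (c == '<') state = 1;
            else if (c == '>') state = 5;
            else if (c == '=') state = 8;
            else { *len = 0; return NOT_RELOP; }   /* fail() */
            break;
        case 1:
            c = s[i++];
            if (c == '=') state = 2;
            else if (c == '>') state = 3;
            else state = 4;                        /* other */
            break;
        case 2: *len = i; return LE;
        case 3: *len = i; return NE;
        case 4: i--; *len = i; return LT;          /* retract() */
        case 5:
            c = s[i++];
            state = (c == '=') ? 6 : 7;
            break;
        case 6: *len = i; return GE;
        case 7: i--; *len = i; return GT;          /* retract() */
        case 8: *len = i; return EQ;
        }
    }
}
```

Only the "other" states 4 and 7 retract, because only they have read one character beyond the lexeme.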

② for identifier

letter → [a-zA-Z\_]

digit → [0-9]

$\text{id} \rightarrow \text{letter} (\text{letter} | \text{digit})^*$



```
state = 0;
for (;;)
{
    switch (state)
    {
        case 0: ch = getch();
            if (isalpha(ch) || ch == '_') state = 1;
            else fail();
            break;

        case 1: ch = getch();
            if (isalnum(ch) || ch == '_') state = 1;
            else state = 2;         /* other */
            break;

        case 2: retract();          /* give back the extra char read */
            installID();            /* enter the lexeme in the symbol table */
            return (retToken);
    }
}
```
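
The identifier diagram can likewise be sketched as a runnable C function. It assumes the input is a C string and peeks at the next character instead of reading and retracting; `match_id` is a hypothetical name, and symbol-table installation is left out.

```c
#include <ctype.h>
#include <stddef.h>

/* Length of the identifier at the start of s, or 0 if s does not
   start with a letter or underscore. */
static size_t match_id(const char *s) {
    size_t i = 0;
    int state = 0;
    for (;;) {
        unsigned char ch = (unsigned char)s[i];   /* peek, do not consume yet */
        switch (state) {
        case 0:
            if (isalpha(ch) || ch == '_') { state = 1; i++; }
            else return 0;                        /* fail() */
            break;
        case 1:
            if (isalnum(ch) || ch == '_') i++;    /* stay in state 1 */
            else state = 2;                       /* other */
            break;
        case 2:
            return i;      /* lexeme is s[0..i-1]; nothing to retract */
        }
    }
}
```

For the input `rate*60` the function returns 4, the length of the lexeme `rate`.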

### ③ Unsigned number

digit  $\rightarrow [0-9]$



### ④ Keywords

Ex: Keyword  $\rightarrow$  IF | THEN | ELSE



### ⑤ Delimiter | Whitespace

delim  $\rightarrow$  space | tab | newline



## chapter-2      Lexical Analysis

### Questions

1. Explain lexical analysis in detail with a block diagram.
2. Explain the reasons for separating the analysis phase of a compiler into lexical analysis and syntax analysis.
3. What do you mean by lexical errors? How do we recover from them?
4. Define the terms token, pattern and lexeme with an example.
5. Why is the 2-buffer technique used in LA? Write an algorithm for lookahead code with sentinel.
6. Give the formal definitions for operations on languages with notations.
7. List the algebraic laws for regular expressions.
8. Define the terms prefix, suffix, substring, proper prefix, proper suffix, proper substring and subsequence with an example.
9. Write regular definitions for identifiers, unsigned numbers, keywords, relational operators and whitespace.
10. Draw the transition diagrams for relational operators, identifiers, unsigned numbers, keywords and whitespace.

## UNIT-2 Syntax Analysis - I

### Topics

- 1) Introduction
- 2) Context-free grammars
- 3) Writing a grammar
- 4) Top down parsing
- 5) Bottom up parsing

### 1) Introduction :

### 2) The Role of the parser / Block diagram for Syntax Analysis.



## The General types of parsers for grammars



## Syntax - Error Handling

### Common programming Errors

#### i) Lexical Errors:

These include misspellings of identifiers, keywords or operators

e.g. use of the identifier elipseSize instead of ellipseSize

#### ii) Syntactic Errors:

These errors include misplaced semicolons and extra or missing braces.

#### iii) Semantic Errors

These include type mismatches b/w operators and operands.  
An example: a return statement that returns a value in a Java method with result type void

#### iv) Logical Errors:

Can be anything arising from incorrect reasoning on the part of the programmer

e.g. using '=' instead of '==' in C programming.

## Error Recovery Techniques:

- i) panic Mode Recovery
- ii) phrase level Recovery
- iii) Error productions
- iv) Global Corrections

### i) Panic Mode Recovery:

→ In panic mode recovery, on discovering an error the parser keeps discarding input symbols one at a time until one of the synchronizing tokens, such as ; or }, is found.

\* synchronizing tokens → semicolon ;  
→ closing brace }

e.g. int a, ; //Error

### ii) Phrase Level Recovery :

→ It includes insert, delete, update

→ On discovering an error, a parser may perform local correction on the remaining i/p; that is, it may replace a prefix of the remaining i/p by some string that allows the parser to continue.

- It includes
  - replacing a comma by a semicolon
  - deleting an extra semicolon
  - inserting a missing semicolon

eg: int a, ;

Replace , by ; and delete the extra ;

### iii) Error productions:

- By anticipating common errors that might be encountered, we can augment the grammar for the language at hand with productions that generate Incorrect constructs
- A parser constructed from a grammar augmented by these error productions detects the anticipated errors when an error production is used during parsing

### iv) Global Corrections:

- Ideally, we would like a compiler to make as few changes as possible in processing an incorrect input string. There are algorithms for choosing a minimal sequence of changes to obtain a globally least cost correction.
- Given an incorrect input-string  $x$  and Grammar  $G$ , these algorithms will find a parse tree for a related string  $y$ , such that the number of insertions, deletions, and changes of tokens required to transform  $x$  into  $y$  is as small as possible.

### Drawback of global corrections:

- These methods are generally too costly to implement in terms of time and space, so they are currently only of theoretical interest.

## CONTEXT FREE GRAMMARS

defn: Context free grammar is a 4-tuple defined as  
 $(V, T, P, S)$ , where

$V$ : Set of Variable

$T$ : Set of Terminals

$P$ : Set of production

$S$  is the Start Symbols

Differentiate between CFG and RE

### CFG

1. It is used in syntax analysis.
2. Useful for describing nested grammatical structures such as balanced parentheses, and so on.
3. CFGs are recognized by pushdown automata.
4. A CFG can keep track of the number of symbols seen so far.
5. Not every CFG can be written as an RE.
6. CFGs are more powerful.
7. Eg:

$$\text{letter} \rightarrow [A-Za-z]$$

$$\text{digit} \rightarrow [0-9]$$

$$\text{id} \rightarrow \text{letter} \, (\text{letter} \mid \text{digit})^*$$

### RE

1. It is used in lexical analysis.
2. Useful for describing the structure of lexical constructs such as identifiers, keywords etc.
3. Regular expressions are recognized by finite automata.
4. An RE cannot keep track of the number of symbols seen so far.
5. Every RE can be written as a CFG.
6. REs are less powerful as compared to CFGs.
7. Eg:  $[a-zA-Z0-9][0-9]^*$

Q) For the following CFG

- Give the LMD for the string
- Give the RMD for the string
- Give the parse tree for the string
- Is the grammar ambiguous / Unambiguous? Justify

1. $S \rightarrow SS+ \mid SS* \mid a$, string: $aa+a*$

2. $S \rightarrow 0S1 \mid 01$, string: $000111$

3. $S \rightarrow +SS \mid *SS \mid a$, string: $+*aaa$

4. $S \rightarrow S(S)S \mid \epsilon$, string: $(()())$

5. $S \rightarrow S+S \mid SS \mid (S) \mid S* \mid a$, string: $(a+a)*a$

6. $S \rightarrow (L) \mid a$, $L \rightarrow L, S \mid S$, string: $(a, a)$

7. $S \rightarrow aSbS \mid bSaS \mid \epsilon$, string: $aabbab$

8. bexpr $\rightarrow$ bexpr or bterm | bterm

   bterm $\rightarrow$ bterm and bfactor | bfactor

   bfactor $\rightarrow$ not bfactor | (bexpr) | true | false

   string: not (true or false)

9. $E \rightarrow E+E \mid E*E \mid -E \mid (E) \mid id$, string: $id + id * id$

10. $S \rightarrow iEtS \mid iEtSeS \mid a$, string: if $E_1$ then if $E_2$ then $S_1$ else $S_2$

11. $R \rightarrow R \, '|' \, R \mid RR \mid R* \mid (R) \mid a \mid b \mid c$, string: $a|b*c$

1) $S \rightarrow SS+ \mid SS* \mid a$, string: $aa+a*$

$$\begin{array}{ll} \text{LMD:} & \text{RMD:} \\ S \Rightarrow SS* & S \Rightarrow SS* \\ \Rightarrow SS+S* & \Rightarrow Sa* \\ \Rightarrow aS+S* & \Rightarrow SS+a* \\ \Rightarrow aa+S* & \Rightarrow Sa+a* \\ \Rightarrow aa+a* & \Rightarrow aa+a* \end{array}$$

parse tree:

The grammar is unambiguous: the string has only one LMD and one RMD, i.e. a single parse tree.

2) $S \rightarrow 0S1 \mid 01$, string: $000111$

$$\begin{array}{ll} \text{LMD:} & \text{RMD:} \\ S \Rightarrow 0S1 & S \Rightarrow 0S1 \\ \Rightarrow 00S11 & \Rightarrow 00S11 \\ \Rightarrow 000111 & \Rightarrow 000111 \end{array}$$

parse tree:

The grammar is unambiguous: the string has only one LMD and only one RMD.

3) $S \rightarrow +SS \mid *SS \mid a$, string: $+*aaa$

$$\begin{array}{ll} \text{LMD:} & \text{RMD:} \\ S \Rightarrow +SS & S \Rightarrow +SS \\ \Rightarrow +*SSS & \Rightarrow +Sa \\ \Rightarrow +*aSS & \Rightarrow +*SSa \\ \Rightarrow +*aaS & \Rightarrow +*Saa \\ \Rightarrow +*aaa & \Rightarrow +*aaa \end{array}$$

parse tree:

The grammar is unambiguous: the string has only one LMD and one RMD.

4) $S \rightarrow S(S)S \mid \epsilon$, string: $(()())$

LMD:

$$\begin{array}{ll} \text{(i)} & \text{(ii)} \\ S \Rightarrow S(S)S & S \Rightarrow S(S)S \\ \Rightarrow (S)S & \Rightarrow (S)S \\ \Rightarrow (S(S)S)S & \Rightarrow (S(S)S)S \\ \Rightarrow ((S)S)S & \Rightarrow (S(S)S(S)S)S \\ \Rightarrow (()S)S & \Rightarrow ((S)S(S)S)S \\ \Rightarrow (()S(S)S)S & \Rightarrow (()S(S)S)S \\ \Rightarrow (()(S)S)S & \Rightarrow (()(S)S)S \\ \Rightarrow (()()S)S & \Rightarrow (()()S)S \\ \Rightarrow (()())S & \Rightarrow (()())S \\ \Rightarrow (()()) & \Rightarrow (()()) \end{array}$$

RMD:

$$\begin{array}{ll} \text{(i)} & \text{(ii)} \\ S \Rightarrow S(S)S & S \Rightarrow S(S)S \\ \Rightarrow S(S) & \Rightarrow S(S) \\ \Rightarrow S(S(S)S) & \Rightarrow S(S(S)S) \\ \Rightarrow S(S(S)S(S)S) & \Rightarrow S(S(S)) \\ \Rightarrow S(S(S)S(S)) & \Rightarrow S(S()) \\ \Rightarrow S(S(S)S()) & \Rightarrow S(S(S)S()) \\ \Rightarrow S(S(S)()) & \Rightarrow S(S(S)()) \\ \Rightarrow S(S()()) & \Rightarrow S(S()()) \\ \Rightarrow S(()()) & \Rightarrow S(()()) \\ \Rightarrow (()()) & \Rightarrow (()()) \end{array}$$

parse tree for LMD (i) and (ii):

The grammar is ambiguous, since the same string has two LMDs and two RMDs.

5) $S \rightarrow S+S \mid SS \mid (S) \mid S* \mid a$, string: $(a+a)*a$

$$\begin{array}{ll} \text{LMD:} & \text{RMD:} \\ S \Rightarrow SS & S \Rightarrow SS \\ \Rightarrow S*S & \Rightarrow Sa \\ \Rightarrow (S)*S & \Rightarrow S*a \\ \Rightarrow (S+S)*S & \Rightarrow (S)*a \\ \Rightarrow (a+S)*S & \Rightarrow (S+S)*a \\ \Rightarrow (a+a)*S & \Rightarrow (S+a)*a \\ \Rightarrow (a+a)*a & \Rightarrow (a+a)*a \end{array}$$

parse tree:

This string has only one LMD and one RMD, so it has a single parse tree. Note that the grammar itself is still ambiguous, because other strings such as $a+a+a$ have two parse trees.

6) $S \rightarrow (L) \mid a$, $L \rightarrow L, S \mid S$, string: $(a, a)$

$$\begin{array}{ll} \text{LMD:} & \text{RMD:} \\ S \Rightarrow (L) & S \Rightarrow (L) \\ \Rightarrow (L, S) & \Rightarrow (L, S) \\ \Rightarrow (S, S) & \Rightarrow (L, a) \\ \Rightarrow (a, S) & \Rightarrow (S, a) \\ \Rightarrow (a, a) & \Rightarrow (a, a) \end{array}$$

parse tree:

The grammar is unambiguous: the string has only one LMD and one RMD.


7) $S \rightarrow aSbS \mid bSaS \mid \epsilon$, string: $aabbab$

LMD:

| (i)                       | (ii)                     |
|---------------------------|--------------------------|
| $S \Rightarrow aSbS$      | $S \Rightarrow aSbS$     |
| $\Rightarrow aaSbSbS$     | $\Rightarrow aaSbSbS$    |
| $\Rightarrow aabSbS$      | $\Rightarrow aabSbS$     |
| $\Rightarrow aabbSaSbS$   | $\Rightarrow aabbS$      |
| $\Rightarrow aabbaSbS$    | $\Rightarrow aabbaSbS$   |
| $\Rightarrow aabbabS$     | $\Rightarrow aabbabS$    |
| $\Rightarrow aabbab$      | $\Rightarrow aabbab$     |

RMD:

| (i)                       | (ii)                     |
|---------------------------|--------------------------|
| $S \Rightarrow aSbS$      | $S \Rightarrow aSbS$     |
| $\Rightarrow aSb$         | $\Rightarrow aSbaSbS$    |
| $\Rightarrow aaSbSb$      | $\Rightarrow aSbaSb$     |
| $\Rightarrow aaSbbSaSb$   | $\Rightarrow aSbab$      |
| $\Rightarrow aaSbbSab$    | $\Rightarrow aaSbSbab$   |
| $\Rightarrow aaSbbab$     | $\Rightarrow aaSbbab$    |
| $\Rightarrow aabbab$      | $\Rightarrow aabbab$     |

parse tree:

The grammar is ambiguous, since the string has 2 LMDs and 2 RMDs.

8) bexpr $\rightarrow$ bexpr or bterm | bterm

bterm $\rightarrow$ bterm and bfactor | bfactor

bfactor $\rightarrow$ not bfactor | (bexpr) | true | false

string: not (true or false)

LMD:

$$\begin{aligned} \text{bexpr} &\Rightarrow \text{bterm} \\ &\Rightarrow \text{bfactor} \\ &\Rightarrow \text{not bfactor} \\ &\Rightarrow \text{not (bexpr)} \\ &\Rightarrow \text{not (bexpr or bterm)} \\ &\Rightarrow \text{not (bterm or bterm)} \\ &\Rightarrow \text{not (bfactor or bterm)} \\ &\Rightarrow \text{not (true or bterm)} \\ &\Rightarrow \text{not (true or bfactor)} \\ &\Rightarrow \text{not (true or false)} \end{aligned}$$

RMD:

$$\begin{aligned} \text{bexpr} &\Rightarrow \text{bterm} \\ &\Rightarrow \text{bfactor} \\ &\Rightarrow \text{not bfactor} \\ &\Rightarrow \text{not (bexpr)} \\ &\Rightarrow \text{not (bexpr or bterm)} \\ &\Rightarrow \text{not (bexpr or bfactor)} \\ &\Rightarrow \text{not (bexpr or false)} \\ &\Rightarrow \text{not (bterm or false)} \\ &\Rightarrow \text{not (bfactor or false)} \\ &\Rightarrow \text{not (true or false)} \end{aligned}$$



9) $E \rightarrow E+E \mid E*E \mid -E \mid (E) \mid id$, string: $id + id * id$

LMD:

$$\begin{array}{ll} \text{(i)} & \text{(ii)} \\ E \Rightarrow E+E & E \Rightarrow E*E \\ \Rightarrow id+E & \Rightarrow E+E*E \\ \Rightarrow id+E*E & \Rightarrow id+E*E \\ \Rightarrow id+id*E & \Rightarrow id+id*E \\ \Rightarrow id+id*id & \Rightarrow id+id*id \end{array}$$

RMD:

$$\begin{array}{ll} \text{(i)} & \text{(ii)} \\ E \Rightarrow E+E & E \Rightarrow E*E \\ \Rightarrow E+E*E & \Rightarrow E*id \\ \Rightarrow E+E*id & \Rightarrow E+E*id \\ \Rightarrow E+id*id & \Rightarrow E+id*id \\ \Rightarrow id+id*id & \Rightarrow id+id*id \end{array}$$

parse tree:

The grammar is ambiguous, since the string has more than one LMD and more than one RMD.
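One way to check such a claim mechanically is to count the parse trees of a string. The sketch below is an illustration written for this particular grammar only ($E \rightarrow E+E \mid E*E \mid -E \mid (E) \mid id$); the function name `count_parse_trees` and the token-list input format are assumptions.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count the distinct parse trees for tokens under
    E -> E+E | E*E | -E | (E) | id."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # number of parse trees deriving toks[i:j] from E
        if i >= j:
            return 0
        total = 0
        if j - i == 1 and toks[i] == "id":                      # E -> id
            total += 1
        if toks[i] == "-":                                      # E -> -E
            total += count(i + 1, j)
        if j - i >= 3 and toks[i] == "(" and toks[j - 1] == ")":
            total += count(i + 1, j - 1)                        # E -> (E)
        for k in range(i + 1, j - 1):                           # E -> E+E | E*E
            if toks[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(toks))

print(count_parse_trees(["id", "+", "id", "*", "id"]))
```

A count greater than 1 for any string means the grammar is ambiguous; a count of 1 for one string says nothing about other strings.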

10) $S \rightarrow iEtS \mid iEtSeS \mid a$, $E \rightarrow b$

String: if $E_1$ then if $E_2$ then $S_1$ else $S_2$

1)



2)



The given grammar is ambiguous because the string has two different parse trees.

11) $R \rightarrow R \, '|' \, R \mid RR \mid R* \mid (R) \mid a \mid b \mid c$, string: $a|b*c$

$$\begin{array}{ll} \text{LMD 1} & \text{LMD 2} \\ R \Rightarrow R|R & R \Rightarrow RR \\ \Rightarrow a|R & \Rightarrow R|RR \\ \Rightarrow a|RR & \Rightarrow a|RR \\ \Rightarrow a|R*R & \Rightarrow a|R*R \\ \Rightarrow a|b*R & \Rightarrow a|b*R \\ \Rightarrow a|b*c & \Rightarrow a|b*c \end{array}$$

The given grammar is ambiguous because the string has two LMDs.

## Eliminating ambiguity

Can be done by 2 methods:

i) Disambiguating rule ii) Using precedence and associativity of operators

### Disambiguating rule

→ Some grammars for conditional statements are ambiguous; this is due to the dangling else.

The dangling else problem can be eliminated, and with it the ambiguity of the grammar.

Q) What is the dangling else problem?

→ Consider the following grammar

$$S \rightarrow iCtS \mid iCtSeS \mid a \quad \text{Input string: ibtibtaea}$$

$C \rightarrow b$

where  $i$  stands for the keyword 'if'

$C$  stands for the 'condition' to be satisfied, and  $C$  is a non-terminal

$t$  stands for the keyword 'then'

$S$  stands for 'statement' and is a non-terminal

$e$  stands for the keyword 'else'

$a$  stands for any other statement

$b$  stands for a condition expression (terminal)

Since the above grammar is ambiguous, we get two different parse trees for the string ibtibtaea



Since there are 2 parse trees for the same string, the given grammar is ambiguous.  
Observe the following points:

- The first parse tree associates the else with the second if statement
- The second parse tree associates the else with the first if statement

The ambiguity over whether to associate the else with the first if statement or the second if statement is called the dangling else problem.

Eg) Q) Eliminate ambiguity from the following ambiguous grammar:

$$S \rightarrow iCtS \mid iCtSeS \mid a \\ C \rightarrow b$$

→ In all programming languages, when if statements are nested, the first parse tree is preferred. So, the general rule is "Match each else with the closest unmatched then". This rule can be incorporated directly into the grammar, and the ambiguity can be eliminated as shown below:

Step 1) A matched statement M is an if-else statement in which the statements before and after the else keyword are both matched; any other statement a is also matched. This can be expressed as:

$$M \rightarrow iCtMeM \mid a$$

Step 2) An unmatched statement U is one of the following:

a) A simple if statement in which the statement S is either a matched or an unmatched statement. The equivalent production is  $U \rightarrow iCtS$

b) An if-else statement in which the statement before the else is matched and the statement after the else is unmatched.

The equivalent production is:  $U \rightarrow iCtMeU$

Step 3: The matched statement M and the unmatched statement U are obtained from the statement S as shown below:

$$S \rightarrow M \mid U$$

So, the final grammar, which is unambiguous, is shown below:

$$S \rightarrow M \mid U$$

$$M \rightarrow iCtMeM \mid a$$

$$U \rightarrow iCtS \mid iCtMeU$$

$$C \rightarrow b$$



Observe that the above grammar associates each else with the closest then and eliminates the ambiguity from the grammar.

### Eliminating ambiguity using precedence and Associativity

This method is explained using the following example:

Eg. Q) Convert the ambiguous grammar into Unambiguous grammar:

$$E \rightarrow E * E | E - E$$

$$E \rightarrow E \wedge E | E / E$$

$$E \rightarrow E + E$$

$$E \rightarrow (E) | id$$

The grammar can be converted into an unambiguous grammar using the precedence and associativity of the operators as shown below:

Step 1: Arrange the operators in increasing order of precedence along with their associativity as shown below:

| Operators | Associativity | Non-terminal used |
|-----------|---------------|-------------------|
| +, -      | LEFT          | E                 |
| *, /      | LEFT          | T                 |
| ^         | RIGHT         | P                 |

Since there are three levels of precedence, we associate three non-terminals: E, T and P. We also use an extra non-terminal F that generates the basic units of an arithmetic expression.

Step 2: The basic units in expression are id (identifier) and parenthesized expressions. The production corresponding to this can be written as:

$$F \rightarrow (E) | id$$

Step 3: The next highest priority operation is  $\wedge$  and it is right associative. So, the production must start from the non-terminal P and it should have right recursion as shown below:

$$P \rightarrow F^\wedge P | F$$

Step 4: The next highest priority operators are \* and / and they are left associative. So, the production must start from the non-terminal T and it should have left recursion as shown below:

$$T \rightarrow T * P | T / P | P$$

Step 5: The next highest priority operators are + and - and they are left associative. So, the production must start from the non-terminal E and it should have left recursion as shown below:

$$E \rightarrow E + T | E - T | T$$

Step 6: The final grammar which is unambiguous grammar can be written as shown below:

$$E \rightarrow E + T | E - T | T$$

$$T \rightarrow T * P | T / P | P$$

$$P \rightarrow F^\wedge P | F$$

$$F \rightarrow (E) | id$$
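To see that this grammar really encodes the intended precedence and associativity, here is a small evaluator sketch that follows the four non-terminals E, T, P and F one-to-one. It is an illustration only: numbers stand in for `id`, the left-recursive rules are realised as loops, and the function name `evaluate` is an assumption.

```python
def evaluate(tokens):
    """Evaluate a token list using E -> E+T | E-T | T, T -> T*P | T/P | P,
    P -> F^P | F, F -> (E) | number (numbers stand in for id)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_E():                      # + and -: lowest precedence, left associative
        value = parse_T()
        while peek() in ("+", "-"):
            op = peek()
            advance()
            rhs = parse_T()
            value = value + rhs if op == "+" else value - rhs
        return value

    def parse_T():                      # * and /: left associative
        value = parse_P()
        while peek() in ("*", "/"):
            op = peek()
            advance()
            rhs = parse_P()
            value = value * rhs if op == "*" else value / rhs
        return value

    def parse_P():                      # ^: highest precedence, right recursive
        base = parse_F()
        if peek() == "^":
            advance()
            return base ** parse_P()
        return base

    def parse_F():                      # basic units
        tok = peek()
        if tok == "(":
            advance()
            value = parse_E()
            if peek() != ")":
                raise SyntaxError("expected )")
            advance()
            return value
        if isinstance(tok, (int, float)):
            advance()
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    result = parse_E()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return result

print(evaluate([2, "^", 3, "^", 2]))    # right associative: 2^(3^2)
print(evaluate([8, "-", 3, "-", 2]))    # left associative: (8-3)-2
```

The loop in `parse_E` and `parse_T` groups operands to the left, while the recursive call in `parse_P` groups them to the right, which is exactly what the left-recursive and right-recursive productions express.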

Q) Convert the following ambiguous grammar into an unambiguous grammar, given that the \* and - operators have the lowest priority and are left associative, the / and + operators have the highest priority and are right associative, and the  $\wedge$  operator has precedence in between and is left associative.

$$E \rightarrow E + E$$

$$E \rightarrow E - E$$

$$E \rightarrow E \wedge E$$

$$E \rightarrow E * E$$

$$E \rightarrow E / E$$

$$E \rightarrow (E) \mid id$$

→ The grammar can be converted into an unambiguous grammar using the precedence and associativity of the operators as shown below:

Step 1: Arrange the operators in increasing order of precedence along with their associativity as shown below:

| Precedence | Operators | Associativity | Non-terminal used |
|------------|-----------|---------------|-------------------|
| (lowest)   | *, -      | LEFT          | E                 |
|            | $\wedge$  | LEFT          | P                 |
| (highest)  | /, +      | RIGHT         | T                 |

Since there are three levels of precedence, we use three non-terminals: E, P and T. We also use an extra non-terminal F that generates the basic units of an arithmetic expression.

Step 2: The basic units in an expression are id (identifier) and parenthesized expressions. The production corresponding to this can be written as:

$$F \rightarrow (E) \mid id$$

Step 3: The highest priority operators are + and /. They are right associative, so the production must start from the non-terminal T and it should be right recursive, as shown below:

$$T \rightarrow F + T \mid F / T \mid F$$

Step 4: The next highest priority operator is  $\wedge$  and it is left associative. So, the production must start from the non-terminal P and it should be left recursive, as shown below:

$$P \rightarrow P \wedge T \mid T$$

Step 5: The lowest priority operators are \* and - and they are left associative. So, the production must start from the non-terminal E and it should be left recursive, as shown below:

$$E \rightarrow E * P \mid E - P \mid P$$

Step 6: The final grammar, which is unambiguous, can be written as shown below:

$$E \rightarrow E * P \mid E - P \mid P$$

$$P \rightarrow P \wedge T \mid T$$

$$T \rightarrow F + T \mid F / T \mid F$$

$$F \rightarrow (E) \mid id$$

Q) Define ambiguity. Show that the following grammar is ambiguous:

$R \rightarrow R \, '|' \, R \mid RR \mid R* \mid (R) \mid a \mid b \mid c$  for the input string  $a|b*c$

Give an unambiguous grammar for the above grammar, such that the precedence order from lowest to highest is concatenation, \*, |, (), identifier, and all operators are left associative.

Ans: A grammar is said to be ambiguous if some input string has more than one LMD or more than one RMD, i.e. if there are two or more different parse trees for the same input string.

→ Input string:  $a|b*c$

$$\begin{array}{ll} \text{LMD 1} & \text{LMD 2} \\ R \Rightarrow R|R & R \Rightarrow RR \\ \Rightarrow a|R & \Rightarrow R|RR \\ \Rightarrow a|RR & \Rightarrow a|RR \\ \Rightarrow a|R*R & \Rightarrow a|R*R \\ \Rightarrow a|b*R & \Rightarrow a|b*R \\ \Rightarrow a|b*c & \Rightarrow a|b*c \end{array}$$

parse tree:



It has two LMDs ∴ the given grammar is ambiguous.



→ Unambiguous grammar

1. Arrange the operators in ascending order of precedence along with their associativity

| Operators     | Associativity | Non-terminal used |
|---------------|---------------|-------------------|
| concatenation | LEFT          | R                 |
| \*            | LEFT          | S                 |
| \|            | LEFT          | T                 |

2. The basic units in an expression are (R) and a, b, c. We use an additional non-terminal U for generating those:

$$U \rightarrow (R) \mid a \mid b \mid c$$

3. The highest priority operator is | and it is left associative. So the production must start from the non-terminal T and it must be left recursive:

$$T \rightarrow T \, '|' \, U \mid U$$

4. The next highest priority operator is \*. It is a unary operator, so the production must start from the non-terminal S:

$$S \rightarrow T* \mid T$$

5. The lowest priority operator is concatenation and it is left associative. So the production must start from the non-terminal R and it should have left recursion:

$$R \rightarrow RS \mid S$$

6. The final grammar, which is unambiguous, can be written as

$$R \rightarrow RS \mid S$$

$$S \rightarrow T* \mid T$$

$$T \rightarrow T \, '|' \, U \mid U$$

$$U \rightarrow (R) \mid a \mid b \mid c$$

The parse tree for the input  $a|b*c$  is


## Left Recursion :

Defn:

A production is left recursive if the non-terminal on its left-hand side is also the first symbol of its right-hand side.

General form:  $A \rightarrow A\alpha_1 \mid A\alpha_2 \mid A\alpha_3 \mid \beta$

$$\downarrow \\ A \rightarrow \beta A'$$

$$A' \rightarrow \alpha_1 A' \mid \alpha_2 A' \mid \alpha_3 A' \mid \epsilon$$

## Algorithm for Left Recursion Elimination

Algorithm: Eliminating left recursion

Input: Grammar  $G$  with no cycles or  $\epsilon$-productions

Output: An equivalent grammar with no left recursion

Method: Apply the algorithm below to  $G$. Note that the resulting non-left-recursive grammar may have  $\epsilon$-productions.

1. Arrange the non-terminals in some order  $A_1, A_2, \dots, A_n$
2. for (each  $i$  from 1 to  $n$) {
3.   for (each  $j$  from 1 to  $i-1$) {
4.     replace each production of the form  $A_i \rightarrow A_j \gamma$  by the productions  $A_i \rightarrow \delta_1 \gamma \mid \delta_2 \gamma \mid \dots \mid \delta_k \gamma$,  where  $A_j \rightarrow \delta_1 \mid \delta_2 \mid \dots \mid \delta_k$  are all current  $A_j$-productions
5.   }
6.   eliminate the immediate left recursion among the  $A_i$-productions
7. }
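The algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions: a grammar is represented as a dict from non-terminal to a list of right-hand sides, each right-hand side is a tuple of symbols (the empty tuple stands for $\epsilon$), the dict order is taken as the order $A_1, \dots, A_n$, and the names `eliminate_immediate` and `eliminate_left_recursion` are made up for this sketch.

```python
def eliminate_immediate(nt, productions):
    """A -> A a1 | ... | b1 | ...  becomes  A -> b1 A' | ...,  A' -> a1 A' | ... | epsilon."""
    alphas = [p[1:] for p in productions if p and p[0] == nt]
    betas = [p for p in productions if not p or p[0] != nt]
    if not alphas:                       # no immediate left recursion
        return {nt: list(productions)}
    new_nt = nt + "'"
    return {
        nt: [beta + (new_nt,) for beta in betas],
        new_nt: [alpha + (new_nt,) for alpha in alphas] + [()],
    }

def eliminate_left_recursion(grammar):
    order = list(grammar)                # step 1: A1, A2, ..., An
    result = {}
    for i, a_i in enumerate(order):
        prods = list(grammar[a_i])
        for a_j in order[:i]:            # steps 3-5: substitute earlier non-terminals
            expanded = []
            for p in prods:
                if p and p[0] == a_j:
                    expanded.extend(delta + p[1:] for delta in result[a_j])
                else:
                    expanded.append(p)
            prods = expanded
        result.update(eliminate_immediate(a_i, prods))   # step 6
    return result

expr = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}
for head, bodies in eliminate_left_recursion(expr).items():
    print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```

For the expression grammar this yields $E \rightarrow TE'$, $E' \rightarrow +TE' \mid \epsilon$, $T \rightarrow FT'$, $T' \rightarrow *FT' \mid \epsilon$, with $F$ unchanged.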

## Examples on removing/eliminating left recursion

Each example applies the rule  $A \rightarrow A\alpha \mid \beta$  becomes  $A \rightarrow \beta A'$,  $A' \rightarrow \alpha A' \mid \epsilon$.

1. $E \rightarrow E + T \mid T$  (here  $A = E$,  $\alpha = +T$,  $\beta = T$; the left-hand side and the first symbol of the right-hand side are the same, ∴ the grammar contains left recursion)

$$E \rightarrow TE'$$

$$E' \rightarrow +TE' \mid \epsilon$$

2. $T \rightarrow T * F \mid F$  (here  $A = T$,  $\alpha = *F$,  $\beta = F$)

$$T \rightarrow FT'$$

$$T' \rightarrow *FT' \mid \epsilon$$

3. $S \rightarrow S(S)S \mid \epsilon$  (here  $A = S$,  $\alpha = (S)S$,  $\beta = \epsilon$)

$$S \rightarrow S'$$

$$S' \rightarrow (S)SS' \mid \epsilon$$

4. $S \rightarrow SS+ \mid SS* \mid a$  (here  $\alpha_1 = S+$,  $\alpha_2 = S*$,  $\beta = a$)

$$S \rightarrow aS'$$

$$S' \rightarrow S+S' \mid S*S' \mid \epsilon$$

5. $E \rightarrow E + T \mid T$

$$T \rightarrow T * F \mid F$$

$$F \rightarrow (E) \mid id$$

After eliminating left recursion:

$$E \rightarrow TE'$$

$$E' \rightarrow +TE' \mid \epsilon$$

$$T \rightarrow FT'$$

$$T' \rightarrow *FT' \mid \epsilon$$

$$F \rightarrow (E) \mid id$$

## Left factoring

General form:  $A \rightarrow \alpha\beta_1 / \alpha\beta_2 / \alpha\beta_3 \dots / \alpha\beta_n / \gamma$

$$\downarrow \\ A \rightarrow \alpha A' / \gamma$$

$$A' \rightarrow \beta_1 / \beta_2 \dots / \beta_n$$

## Algorithm for left factoring

Algorithm left-factoring

Input: Grammar G

Output: An equivalent left-factored grammar

Method: For each non-terminal A, find the longest prefix  $\alpha$  common to two or more of its alternatives.

If  $\alpha \neq \epsilon$, i.e. there is a non-trivial common prefix, replace all of the A-productions

$$A \rightarrow \alpha\beta_1 / \alpha\beta_2 / \dots / \alpha\beta_n / \gamma$$

where  $\gamma$  represents all alternatives that do not begin with  $\alpha$ , by

$$A \rightarrow \alpha A' / \gamma$$

$$A' \rightarrow \beta_1 / \beta_2 / \dots / \beta_n$$

Here  $A'$  is new non-terminal. Repeatedly apply this transformation until no two alternatives for a non-terminal have a common prefix
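A possible Python sketch of this transformation is given below. The representation is an assumption for illustration: a grammar is a dict from non-terminal to a list of right-hand sides, each a tuple of symbols with the empty tuple standing for $\epsilon$; the names `common_prefix` and `left_factor` are hypothetical.

```python
def common_prefix(a, b):
    """Longest common prefix of two right-hand sides (tuples of symbols)."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]

def left_factor(grammar):
    grammar = {nt: list(alts) for nt, alts in grammar.items()}
    changed = True
    while changed:                       # repeat until no common prefix remains
        changed = False
        for nt in list(grammar):
            alts = grammar[nt]
            best = ()                    # longest prefix shared by two or more alternatives
            for i in range(len(alts)):
                for j in range(i + 1, len(alts)):
                    p = common_prefix(alts[i], alts[j])
                    if len(p) > len(best):
                        best = p
            if not best:
                continue
            new_nt = nt + "'"
            while new_nt in grammar:     # pick an unused name for A'
                new_nt += "'"
            with_prefix = [a for a in alts if a[:len(best)] == best]
            without = [a for a in alts if a[:len(best)] != best]
            grammar[nt] = [best + (new_nt,)] + without               # A  -> alpha A' | gamma
            grammar[new_nt] = [a[len(best):] for a in with_prefix]   # A' -> beta_1 | ... | beta_n
            changed = True
    return grammar

if_grammar = {
    "S": [("i", "E", "t", "S"), ("i", "E", "t", "S", "e", "S"), ("a",)],
    "E": [("b",)],
}
for head, bodies in left_factor(if_grammar).items():
    print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```

For the if-then-else grammar this produces $S \rightarrow iEtSS' \mid a$ and $S' \rightarrow \epsilon \mid eS$.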

## Examples on left factoring

In each production, the common prefix of the alternatives is taken as  $\alpha$  and the remaining parts are taken as  $\beta_1, \beta_2$  and so on.

1. $S \rightarrow SS+ \mid SS* \mid a$  (here  $\alpha = SS$,  $\beta_1 = +$,  $\beta_2 = *$,  $\gamma = a$)

$\Downarrow$

$$S \rightarrow SSS' \mid a$$

$$S' \rightarrow + \mid *$$

2. $S \rightarrow 0S1 \mid 01$  (here  $\alpha = 0$,  $\beta_1 = S1$,  $\beta_2 = 1$)

$\Downarrow$

$$S \rightarrow 0S'$$

$$S' \rightarrow S1 \mid 1$$

3. $S \rightarrow iEtS \mid iEtSeS \mid a$

$E \rightarrow b$

(here  $\alpha = iEtS$,  $\beta_1 = \epsilon$,  $\beta_2 = eS$,  $\gamma = a$)

$\Downarrow$

$$S \rightarrow iEtSS' \mid a$$

$$S' \rightarrow \epsilon \mid eS$$

$E \rightarrow b$

Since  $\beta_1$  is empty, we take  $\beta_1$  as  $\epsilon$

## Top-down parsers:

- Defn: A top-down parser parses an input string of tokens by tracing out the steps of a leftmost derivation; it derives the string from the start symbol.
- It is termed top-down because the parse tree is constructed in preorder, that is, from the root node to the leaf nodes.
- It has various types.



### i) Backtracking

- A backtracking parser tries different possibilities for parsing an input string, backing up an arbitrary amount in the input if a possibility fails.
- Such parsers are more powerful but much slower, as they may require exponential time to parse; hence they are not suitable for practical compilers.

eg:  $S \rightarrow cAd$

$A \rightarrow ab \mid a$

Input string  $\rightarrow cabd$

Input string  $\rightarrow cad$



## ii) Non-backtracking parsers:

i) Recursive descent      ii) Table driven

### i) Recursive descent parser:

→ Recursive descent parsers are versatile and suitable for handwritten parsers

→ They help in studying the method of parsing and serve as a basis for top-down parsers

$$S \rightarrow cAd$$

$$A \rightarrow ab \mid a$$

Here, the grammar rule for a non-terminal A is written as the definition of a procedure that recognizes A.

a) The right-hand side of a grammar rule specifies the structure of the code for the procedure.

b) Each terminal on the right-hand side corresponds to an if test that matches the input, while each non-terminal corresponds to a call of the matching procedure.

$$NT = \{S, A\}$$

$$T = \{a, b, c, d\}$$

    procedure S()
    {
        if (input == 'c')
        {
            Advance();
            if (A() && input == 'd')
            {
                Advance();
                return(true);
            }
        }
        return(false);
    }

    procedure A()
    {
        isave = in_ptr;              // remember where A started
        if (input == 'a')
        {
            Advance();
            if (input == 'b')        // first alternative: A -> ab
            {
                Advance();
                return(true);
            }
        }
        in_ptr = isave;              // backtrack and try A -> a
        if (input == 'a')
        {
            Advance();
            return(true);
        }
        return(false);
    }

### isave:

Saves the input pointer position before the alternatives of a production are tried, to facilitate backtracking. Whenever a terminal is matched, the input pointer advances to the next position; if an alternative fails, the input pointer is reset to the saved position so that the next alternative can be tried.

### Advance():

Advance is a procedure that moves the input pointer to the next position after a terminal has been matched. On successful completion of the parsing action, the parser returns true.

### Drawbacks of recursive descent parsing

1. Left recursion: if the grammar has a production of the form  $A \rightarrow A\alpha$, the parser goes into an infinite loop.

   eg:  $A \rightarrow Ab \mid a$

   Input  $\rightarrow abb$

   To derive the string abb, the parser cannot tell how many times the non-terminal A has to be expanded.

2. Backtracking: it occurs when there is more than one alternative in a production to be tried while parsing the input string.

$$S \rightarrow cAd \\ A \rightarrow ab \mid a$$

Input string: cad



∴ With  $A \rightarrow ab$  the parser matches a and then expects b, but the next input symbol is d, so it must back up and try  $A \rightarrow a$.

3. It is very difficult to identify the position of errors.

Example: Recursive descent

Q) Write a recursive descent parser for the following grammar

$$E \rightarrow TE'$$

$$E' \rightarrow +TE' \mid \epsilon$$

$$T \rightarrow FT'$$

$$T' \rightarrow *FT' \mid \epsilon$$

$$F \rightarrow (E) \mid id$$

Input: id + id \* id

```
procedure E()
{
    if (T())
        return(Eprime());
    return(false);
}

procedure Eprime()
{
    if (input == "+")
    {
        Advance();
        if (T())
            return(Eprime());
        return(false);
    }
    return(true);        // E' -> epsilon
}

procedure T()
{
    if (F())
        return(Tprime());
    return(false);
}

procedure Tprime()
{
    if (input == "*")
    {
        Advance();
        if (F())
            return(Tprime());
        return(false);
    }
    return(true);        // T' -> epsilon
}

procedure F()
{
    if (input == "(")
    {
        Advance();
        if (E() && input == ")")
        {
            Advance();
            return(true);
        }
        return(false);
    }
    else if (input == "id")
    {
        Advance();
        return(true);
    }
    return(false);
}
```
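The same parser can be written as runnable Python. This is a sketch rather than a reference implementation: the class name `Parser`, the helper `accepts`, the pre-tokenised input list and the `$` end marker are assumptions made for the example.

```python
class Parser:
    """Recursive descent recognizer for
    E -> T E',  E' -> + T E' | epsilon,  T -> F T',  T' -> * F T' | epsilon,  F -> ( E ) | id."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of the input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def advance(self):
        self.pos += 1

    def E(self):
        return self.T() and self.Eprime()

    def Eprime(self):
        if self.lookahead == "+":
            self.advance()
            return self.T() and self.Eprime()
        return True                          # E' -> epsilon

    def T(self):
        return self.F() and self.Tprime()

    def Tprime(self):
        if self.lookahead == "*":
            self.advance()
            return self.F() and self.Tprime()
        return True                          # T' -> epsilon

    def F(self):
        if self.lookahead == "(":
            self.advance()
            if self.E() and self.lookahead == ")":
                self.advance()
                return True
            return False
        if self.lookahead == "id":
            self.advance()
            return True
        return False

def accepts(tokens):
    """True if the whole token list is derivable from E."""
    parser = Parser(tokens)
    return parser.E() and parser.lookahead == "$"

print(accepts(["id", "+", "id", "*", "id"]))
print(accepts(["id", "+"]))
```

Because the grammar has no left recursion and each alternative starts with a different token, one token of lookahead is enough and no backtracking is needed.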

### ii) Table-driven parsers:

- A table-driven parser is also called a predictive parser.
- A predictive parser is a recursive descent parser that is able to predict which production is to be used to expand the current non-terminal, by looking at the next input symbol.
- The predictive parser does not suffer from backtracking.

# Predictive Parsers / LL(1) parsers / Table-driven parsers

Q) why do we need a FIRST and FOLLOW set

Consider the below given top down approach for the example:



## FIRST AND FOLLOW SETS:

### i) FIRST:

i) If  $X$  is a terminal, then  $\text{FIRST}(X) = \{X\}$

ii) If  $X$  is a non-terminal and  $X \rightarrow Y_1 Y_2 \dots Y_k$  is a production for some  $k \geq 1$, then place  $a$  in  $\text{FIRST}(X)$  if for some  $i$,  $a$  is in  $\text{FIRST}(Y_i)$  and  $\epsilon$  is in all of  $\text{FIRST}(Y_1), \dots, \text{FIRST}(Y_{i-1})$; that is,  $Y_1 \dots Y_{i-1} \xrightarrow{*} \epsilon$. If  $\epsilon$  is in  $\text{FIRST}(Y_j)$  for all  $j = 1, \dots, k$, then add  $\epsilon$  to  $\text{FIRST}(X)$. For example, everything in  $\text{FIRST}(Y_1)$  is surely in  $\text{FIRST}(X)$. If  $Y_1$  does not derive  $\epsilon$, then we add nothing more to  $\text{FIRST}(X)$, but if  $Y_1 \xrightarrow{*} \epsilon$, then we add  $\text{FIRST}(Y_2)$, and so on.

iii) If  $X \rightarrow \epsilon$  is a production, then add  $\epsilon$  to  $\text{FIRST}(X)$.

Some of the general forms on how to find  $\text{FIRST}(X)$  are given below:

1)  $X \rightarrow aB$: $\text{FIRST}(X) = \{a\}$

$X \rightarrow \epsilon$: $\text{FIRST}(X) = \{\epsilon\}$

2)  $X \rightarrow ABC$,  $A \rightarrow a \mid \epsilon$,  $B \rightarrow b$,  $C \rightarrow c$

$\text{FIRST}(A) = \{a, \epsilon\}$; since  $\epsilon$  is in  $\text{FIRST}(A)$, we also take  $\text{FIRST}(B) = \{b\}$

$$\text{FIRST}(X) = \{a, b\}$$

3)  $X \rightarrow ABC$,  $A \rightarrow a$,  $B \rightarrow b$,  $C \rightarrow c$

$$\text{FIRST}(X) = \text{FIRST}(A) = \{a\}$$

4)  $X \rightarrow ABC$,  $A \rightarrow a \mid \epsilon$,  $B \rightarrow b \mid \epsilon$,  $C \rightarrow c$

$\text{FIRST}(A) = \{a, \epsilon\}$, so we take  $\text{FIRST}(B) = \{b, \epsilon\}$, and then  $\text{FIRST}(C) = \{c\}$

$$\text{FIRST}(X) = \{a, b, c\}$$

5)  $X \rightarrow ABC$,  $A \rightarrow a \mid \epsilon$,  $B \rightarrow b \mid \epsilon$,  $C \rightarrow c \mid \epsilon$

$\text{FIRST}(A) = \{a, \epsilon\}$, so we take  $\text{FIRST}(B) = \{b, \epsilon\}$, and then  $\text{FIRST}(C) = \{c, \epsilon\}$; every symbol can derive  $\epsilon$, so  $\epsilon$  is added as well

$$\therefore \text{FIRST}(X) = \{a, b, c, \epsilon\}$$
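The three FIRST rules can be applied repeatedly until nothing changes. The Python sketch below illustrates this; the grammar representation (dict of non-terminal to list of tuples, empty tuple for $\epsilon$) and the names `first_of_string` and `compute_first` are assumptions made for the example.

```python
EPS = "ε"

def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols, given the FIRST sets found so far."""
    out = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})    # rule (i): a terminal's FIRST is itself
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out                       # this symbol cannot vanish, so stop here
    out.add(EPS)                             # every symbol can derive epsilon (rules ii, iii)
    return out

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                           # iterate to a fixed point
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

# Example 4 above: X -> ABC, A -> a | ε, B -> b | ε, C -> c
example4 = {
    "X": [("A", "B", "C")],
    "A": [("a",), ()],
    "B": [("b",), ()],
    "C": [("c",)],
}
print(sorted(compute_first(example4)["X"]))
```

Running it on example 4 gives the set containing a, b and c, matching the hand computation.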

### ii) FOLLOW:

i) Place \$ in FOLLOW(S), where S is the start symbol and \$ is the input right endmarker.

ii) If there is a production  $A \rightarrow \alpha B \beta$, then everything in FIRST($\beta$) except  $\epsilon$  is in FOLLOW(B).

iii) If there is a production  $A \rightarrow \alpha B$, or a production  $A \rightarrow \alpha B \beta$  where FIRST($\beta$) contains  $\epsilon$, then everything in FOLLOW(A) is in FOLLOW(B).

### How to find FOLLOW:

1)  $A \rightarrow XBCD$,  $C \rightarrow c$

$$\text{FOLLOW}(B) = \text{FIRST}(CD) = \{c\}$$

2)  $A \rightarrow XBCD$, where  $\beta = CD$

$$\therefore \text{FOLLOW}(B) = \text{FIRST}(\beta) = \text{FIRST}(CD) = \{c, d\}$$

(d is included when  $C$  can derive  $\epsilon$)

3)  $A \rightarrow XBCD$,  $C \rightarrow \epsilon \mid c$,  $D \rightarrow d \mid \epsilon$

Here  $\beta = CD$  can derive  $\epsilon$, so FOLLOW(A) is included as well:

$$\text{FOLLOW}(B) = (\text{FIRST}(CD) - \{\epsilon\}) \cup \text{FOLLOW}(A) = \{c, d\} \cup \text{FOLLOW}(A)$$

4)  $A \rightarrow XB$  (here  $\beta = \epsilon$)

$$\text{FOLLOW}(B) = \text{FOLLOW}(\text{left-hand side non-terminal}) = \text{FOLLOW}(A)$$

5)  $A \rightarrow XBCDBe$,  $C \rightarrow c \mid \epsilon$,  $D \rightarrow d$

B occurs twice on the right-hand side, so both occurrences contribute:

① First occurrence,  $\beta = CDBe$: $\text{FIRST}(CDBe) = \{c, d\}$

② Second occurrence,  $\beta = e$: $\text{FIRST}(e) = \{e\}$

$$\therefore \text{FOLLOW}(B) = ① \cup ② = \{c, d, e\}$$

Find the FIRST and FOLLOW set for the following grammars

$$1. E \rightarrow TE'$$

$$E' \rightarrow +TE' / \epsilon$$

$$T \rightarrow FT'$$

$$T' \rightarrow *FT' / \epsilon$$

$$F \rightarrow (E) / id$$

$$2. S \rightarrow iEtS | iEtSeS | a$$

$$E \rightarrow b$$

3)  $S \rightarrow ,GH;$  
 $G \rightarrow aF$  
 $F \rightarrow bF/\epsilon$  
 $H \rightarrow KL$  
 $K \rightarrow m/\epsilon$  
 $L \rightarrow n/\epsilon$

4)  $S \rightarrow aB|aC|Sd|Se$  
 $B \rightarrow bBc|f$  
 $C \rightarrow g$

5)  $S \rightarrow aBDh$   
 $B \rightarrow eC$  
 $C \rightarrow bC/\epsilon$  
 $D \rightarrow EF$   
 $E \rightarrow g/\epsilon$   
 $F \rightarrow f/\epsilon$

6.  $S \rightarrow (L)|a$   
 $L \rightarrow L, S|S$

7)  $S \rightarrow L = R|R$  
 $L \rightarrow *R|id$  
 $R \rightarrow L$

8.  $S \rightarrow AaAb|BbBa$  
 $A \rightarrow \epsilon$  
 $B \rightarrow \epsilon$

9)  $S \rightarrow aABb$  
 $A \rightarrow c/\epsilon$  
 $B \rightarrow d/\epsilon$

10. Stmt\_Sequence  $\rightarrow$  Stmt Stmt\_Sequence'  
 Stmt\_Sequence'  $\rightarrow ;$  Stmt\_Sequence |  $\epsilon$  
 Stmt  $\rightarrow s$

11)  $S \rightarrow aSbS|bSaS|\epsilon$

12)  $S \rightarrow a|\uparrow|(T)$  
 $T \rightarrow T, S|S$

13)  $S \rightarrow AS|b$  
 $A \rightarrow SA|a$

Answer:

$$\begin{aligned} 1)\ E &\rightarrow TE' \\ E' &\rightarrow +TE' \mid \epsilon \\ T &\rightarrow FT' \\ T' &\rightarrow *FT' \mid \epsilon \\ F &\rightarrow (E) \mid id \end{aligned}$$

|        | E       | E'              | T            | T'              | F                 |
|--------|---------|-----------------|--------------|-----------------|-------------------|
| FIRST  | (<br>id | +<br>$\epsilon$ | (<br>id      | *<br>$\epsilon$ | (<br>id           |
| FOLLOW | \$<br>) | \$<br>)         | +<br>\$<br>) | +<br>\$<br>)    | *<br>+<br>\$<br>) |

FOLLOW(E): E is the start symbol, so \$ is in FOLLOW(E). From $F \rightarrow (E)$, the symbol after E is the terminal ")":

$$\therefore \text{FOLLOW}(E) = \{\$, )\}$$

FOLLOW(E'): E' is the last symbol of $E \rightarrow TE'$ and of $E' \rightarrow +TE'$, so $\beta = \epsilon$ in both:

$$\therefore \text{FOLLOW}(E') = \text{FOLLOW}(E) = \{\$, )\}$$

FOLLOW(T): T appears in $E \rightarrow TE'$ and in $E' \rightarrow +TE'$, with $\beta = E'$ in both:

$$\text{FIRST}(\beta) = \text{FIRST}(E') = \{+, \epsilon\}$$

Since $E' \Rightarrow \epsilon$, FOLLOW(E) and FOLLOW(E') are added as well:

$$\therefore \text{FOLLOW}(T) = \{+\} \cup \{\$, )\} = \{+, \$, )\}$$

Note: $\epsilon$ is never written in a FOLLOW set.

$$2) S \rightarrow iEtS \mid iEtSeS \mid a$$

$$E \rightarrow b$$

|        | S       | E |
|--------|---------|---|
| FIRST  | i<br>a  | b |
| FOLLOW | \$<br>e | t |

From $S \rightarrow iEtS$ (the last S has $\beta = \epsilon$):

$$\text{FOLLOW}(S) \supseteq \text{FOLLOW}(S) \quad \text{(adds nothing new)}$$

From $S \rightarrow iEtSeS$ (the first S has $\beta = eS$):

$$\text{FOLLOW}(S) \supseteq \text{FIRST}(\beta) = \text{FIRST}(eS) = \{e\}$$

$$3) S \rightarrow ,GH;$$

$$G \rightarrow aF$$

$$F \rightarrow bF \mid \epsilon$$

$$H \rightarrow KL$$

$$K \rightarrow m \mid \epsilon$$

$$L \rightarrow n \mid \epsilon$$

|        | S  | G           | F               | H                    | K               | L               |
|--------|----|-------------|-----------------|----------------------|-----------------|-----------------|
| FIRST  | ,  | a           | b<br>$\epsilon$ | m<br>n<br>$\epsilon$ | m<br>$\epsilon$ | n<br>$\epsilon$ |
| FOLLOW | \$ | m<br>n<br>; | m<br>n<br>;     | ;                    | n<br>;          | ;               |

$$4) S \rightarrow aB \mid aC \mid Sd \mid Se$$

$$B \rightarrow bBc \mid f$$

$$C \rightarrow g$$

|        | S            | B                 | C            |
|--------|--------------|-------------------|--------------|
| FIRST  | a            | b<br>f            | g            |
| FOLLOW | \$<br>d<br>e | \$<br>d<br>e<br>c | \$<br>d<br>e |

$$5) S \rightarrow aBDh$$

$$B \rightarrow eC$$

$$C \rightarrow bC \mid \epsilon$$

$$D \rightarrow EF$$

$$E \rightarrow g \mid \epsilon$$

$$F \rightarrow f \mid \epsilon$$

|        | S  | B           | C               | D                    | E               | F               |
|--------|----|-------------|-----------------|----------------------|-----------------|-----------------|
| FIRST  | a  | e           | b<br>$\epsilon$ | g<br>f<br>$\epsilon$ | g<br>$\epsilon$ | f<br>$\epsilon$ |
| FOLLOW | \$ | g<br>f<br>h | g<br>f<br>h     | h                    | h<br>f          | h               |

$$6) S \rightarrow (L) \mid a$$

$$L \rightarrow L, S \mid S$$

|        | S            | L      |
|--------|--------------|--------|
| FIRST  | (<br>a       | (<br>a |
| FOLLOW | \$<br>,<br>) | )<br>, |

$$8) S \rightarrow AaAb \mid BbBa$$

$$A \rightarrow \epsilon$$

$$B \rightarrow \epsilon$$

|        | S      | A          | B          |
|--------|--------|------------|------------|
| FIRST  | a<br>b | $\epsilon$ | $\epsilon$ |
| FOLLOW | \$     | a<br>b     | b<br>a     |

$$9) S \rightarrow aABb$$

$$A \rightarrow c \mid \epsilon$$

$$B \rightarrow d \mid \epsilon$$

|        | S  | A               | B               |
|--------|----|-----------------|-----------------|
| FIRST  | a  | c<br>$\epsilon$ | d<br>$\epsilon$ |
| FOLLOW | \$ | d<br>b          | b               |

$$10) S \rightarrow a \mid \uparrow \mid (T)$$

$$T \rightarrow T, S \mid S$$

|        | S                    | T                    |
|--------|----------------------|----------------------|
| FIRST  | a<br>$\uparrow$<br>( | a<br>$\uparrow$<br>( |
| FOLLOW | \$<br>)<br>,         | )<br>,               |

13)  $S \rightarrow AS/b$

$A \rightarrow SA/a$

|        | <u>S</u>     | <u>A</u> |
|--------|--------------|----------|
| FIRST  | b<br>a       | a<br>b   |
| FOLLOW | \$<br>a<br>b | a<br>b   |

(FIRST(A) also contains b, because $A \rightarrow SA$ and b is in FIRST(S).)

14)  $S \rightarrow L = R | R$

$L \rightarrow *R | id$

$R \rightarrow L$

|        | <u>S</u> | <u>L</u> | <u>R</u> |
|--------|----------|----------|----------|
| FIRST  | *<br>id  | *<br>id  | *<br>id  |
| FOLLOW | \$       | =<br>\$  | =<br>\$  |

15)  $\text{stmt\_Sequence} \rightarrow \text{stmt } \text{stmt\_Sequence}'$

$\text{stmt\_Sequence}' \rightarrow ; \text{stmt\_Sequence} | \epsilon$

$\text{stmt} \rightarrow s$

|        | <u>stmt\_Sequence</u> | <u>stmt\_Sequence'</u> | <u>stmt</u> |
|--------|-----------------------|------------------------|-------------|
| FIRST  | s                     | ;<br>$\epsilon$        | s           |
| FOLLOW | \$                    | \$                     | ;<br>\$     |

$$11) S \rightarrow aSbS \mid bSaS \mid \epsilon$$

|        | S                    |
|--------|----------------------|
| FIRST  | a<br>b<br>$\epsilon$ |
| FOLLOW | \$<br>a<br>b         |

From $S \rightarrow aSbS$, the first S has $\beta = bS$:

$$\text{FOLLOW}(S) \supseteq \text{FIRST}(\beta) = \text{FIRST}(bS) = \{b\}$$

From $S \rightarrow bSaS$, the first S has $\beta = aS$:

$$\text{FOLLOW}(S) \supseteq \text{FIRST}(\beta) = \text{FIRST}(aS) = \{a\}$$

In both productions the last S has $\beta = \epsilon$, which gives only

$$\text{FOLLOW}(S) \supseteq \text{FOLLOW}(S)$$

and adds nothing new. Since S is the start symbol, \$ is also in FOLLOW(S).

$$\therefore \text{FOLLOW}(S) = \{\$, a, b\}$$

Top-down Parsing

Predictive parsing table / LL(1) grammar / table driven predictive parser

LL(1) grammar:

- first L: scan the i/p from left to right
- second L: produce a leftmost derivation
- 1: one lookahead symbol

Steps:

- 1) Eliminate left recursion from the grammar
- 2) perform left factoring
- 3) find the FIRST and FOLLOW set
- 4) Construct the predictive parsing table
- 5) check whether the given i/p string is accepted/not

Algorithm for Constructing predictive parsing table

INPUT : Grammar  $G$

OUTPUT : parsing table M

METHOD: For each production  $A \rightarrow \alpha$  of the grammar,  
do the following

1. For each terminal  $a$  in  $\text{FIRST}(\alpha)$, add  $A \rightarrow \alpha$  to  $M[A, a]$
2. If  $\epsilon$  is in  $\text{FIRST}(\alpha)$, then for each terminal  $b$  in  $\text{FOLLOW}(A)$, add  $A \rightarrow \alpha$  to  $M[A, b]$. If  $\epsilon$  is in  $\text{FIRST}(\alpha)$ and  \$  is in  $\text{FOLLOW}(A)$, add  $A \rightarrow \alpha$  to  $M[A, \$]$  as well
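The two rules of the method can be sketched in Python as follows. This is an illustrative sketch only. The FIRST and FOLLOW sets of the expression grammar are hard-coded (they are assumed to have been worked out already), and the names `first_of`, `build_table` and `conflicts` are hypothetical. Each cell holds a list of production bodies, so a cell with more than one body signals that the grammar is not LL(1).

```python
EPS, END = "ε", "$"

# Expression grammar after removing left recursion; () stands for an ε-production
GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
# FIRST and FOLLOW sets, assumed to be computed beforehand
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {")", END}, "E'": {")", END}, "T": {"+", ")", END},
          "T'": {"+", ")", END}, "F": {"+", "*", ")", END}}

def first_of(body):
    """FIRST of a production body (a tuple of grammar symbols)."""
    result = set()
    for sym in body:
        sym_first = FIRST.get(sym, {sym})   # FIRST of a terminal is the terminal itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def build_table(grammar):
    table = {}   # (non-terminal, terminal) -> list of production bodies
    for head, bodies in grammar.items():
        for body in bodies:
            body_first = first_of(body)
            targets = body_first - {EPS}        # rule 1
            if EPS in body_first:               # rule 2 (covers $ when $ is in FOLLOW)
                targets |= FOLLOW[head]
            for terminal in targets:
                table.setdefault((head, terminal), []).append(body)
    return table

TABLE = build_table(GRAMMAR)
# Cells with more than one production; empty for an LL(1) grammar
conflicts = {cell: bodies for cell, bodies in TABLE.items() if len(bodies) > 1}
```

Cells that are absent from `TABLE` correspond to the blank (error) entries of the parsing table.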

Predictive parsing Algorithm

INPUT: A string w and a parsing table M for a grammar  $G$.

OUTPUT: If w is in  $L(G)$, a leftmost derivation (LMD) of w;  
otherwise an error indication

Input buffer:

| a | + | b | \$ |
|---|---|---|----|

METHOD: Initially, the parser is in a configuration with  $w\$$  in the i/p buffer and the start symbol  $S$  on top of the stack, above  \$. The program below uses the predictive parsing table  $M$  to produce a predictive parse for the i/p. Here a is the current i/p symbol, i.e. the symbol pointed to by ip.

```

set ip to point to the first symbol of w;
set X to the top stack symbol;
while (X != $) { /* stack is not empty */
    if (X is a) pop the stack and advance ip;
    else if (X is a terminal) error();
    else if (M[X, a] is an error entry) error();
    else if (M[X, a] = X -> Y1 Y2 ... Yk) {
        output the production X -> Y1 Y2 ... Yk;
        pop the stack;
        push Yk, Yk-1, ..., Y1 onto the stack, with Y1 on top;
    }
    set X to the top stack symbol;
}

```



fig: model of a table driven predictive parser
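A runnable version of the table-driven parsing loop is sketched below. It is a minimal illustration under stated assumptions: the table is the one for the expression grammar $E \rightarrow TE'$, $E' \rightarrow +TE' \mid \epsilon$, $T \rightarrow FT'$, $T' \rightarrow *FT' \mid \epsilon$, $F \rightarrow (E) \mid id$, typed in by hand, and the names `TABLE`, `NONTERMINALS` and `parse` are hypothetical. An empty list stands for an $\epsilon$-production.

```python
END = "$"

# Parsing table M: (non-terminal, lookahead) -> production body
TABLE = {
    ("E", "("): ["T", "E'"], ("E", "id"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", END): [],
    ("T", "("): ["F", "T'"], ("T", "id"): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", END): [],
    ("F", "("): ["(", "E", ")"], ("F", "id"): ["id"],
}
NONTERMINALS = {head for head, _ in TABLE}

def parse(tokens, start="E"):
    """Return the productions of a leftmost derivation, or raise SyntaxError."""
    buffer = list(tokens) + [END]      # w$ in the input buffer
    ip = 0                             # index of the current input symbol
    stack = [END, start]               # top of the stack is the end of the list
    output = []
    while stack[-1] != END:            # stack is not empty
        x, a = stack[-1], buffer[ip]
        if x == a:                     # terminal on top matches the input
            stack.pop()
            ip += 1
        elif x not in NONTERMINALS:    # terminal on top that does not match
            raise SyntaxError(f"expected {x!r}, found {a!r}")
        elif (x, a) not in TABLE:      # blank (error) entry in M
            raise SyntaxError(f"no entry M[{x}, {a}]")
        else:
            body = TABLE[(x, a)]
            output.append((x, tuple(body)))
            stack.pop()
            stack.extend(reversed(body))   # push Yk ... Y1, leaving Y1 on top
    if buffer[ip] != END:
        raise SyntaxError("input left over after the stack emptied")
    return output

derivation = parse(["id", "+", "id", "*", "id"])
```

Each element of `derivation` is one "output the production" step of the loop, so the list read in order is the leftmost derivation of the input.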

Checking whether the given grammar is LL(1) or not without using parsing table

A grammar  $G$  is LL(1) iff whenever  $A \rightarrow \alpha \mid \beta$  are two distinct productions of  $G$, the following conditions hold

- i) For no terminal  $a$  do both  $\alpha$  and  $\beta$  derive strings beginning with  $a$   $\Rightarrow \text{FIRST}(\alpha)$  and  $\text{FIRST}(\beta)$  are disjoint.
- ii) At most one of  $\alpha$  and  $\beta$  can derive the empty string  $\Rightarrow$  $\epsilon$ may be in  $\text{FIRST}(\alpha)$  or in  $\text{FIRST}(\beta)$, but not in both.
- iii) If  $\beta \Rightarrow^{*} \epsilon$, then  $\alpha$  does not derive any string beginning with a terminal in  $\text{FOLLOW}(A)$. Likewise, if  $\alpha \Rightarrow^{*} \epsilon$, then  $\beta$  does not derive any string beginning with a terminal in  $\text{FOLLOW}(A)$  
 $\Rightarrow$ if  $\beta \Rightarrow^{*} \epsilon$, then  $\text{FIRST}(\alpha)$  and  $\text{FOLLOW}(A)$  are disjoint; if  $\alpha \Rightarrow^{*} \epsilon$, then  $\text{FIRST}(\beta)$  and  $\text{FOLLOW}(A)$  are disjoint
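The three conditions can be tested for one non-terminal at a time, once the FIRST set of each alternative and FOLLOW(A) are known. The sketch below is illustrative only. The function name `ll1_violations` and the way alternatives are passed in (a dict from a label to its FIRST set) are assumptions, and the sets in the two calls are worked out by hand.

```python
from itertools import combinations

EPS = "ε"

def ll1_violations(first_of_alternatives, follow_of_head):
    """Check conditions (i)-(iii) for every pair of alternatives of one non-terminal.

    first_of_alternatives maps a label for each alternative to its FIRST set;
    follow_of_head is FOLLOW(A). Returns a list of messages, empty if all hold.
    """
    problems = []
    for (alpha, fa), (beta, fb) in combinations(first_of_alternatives.items(), 2):
        if (fa - {EPS}) & (fb - {EPS}):
            problems.append(f"(i) FIRST({alpha}) and FIRST({beta}) overlap")
        if EPS in fa and EPS in fb:
            problems.append(f"(ii) both {alpha} and {beta} derive ε")
        if EPS in fb and (fa - {EPS}) & follow_of_head:
            problems.append(f"(iii) FIRST({alpha}) overlaps FOLLOW")
        if EPS in fa and (fb - {EPS}) & follow_of_head:
            problems.append(f"(iii) FIRST({beta}) overlaps FOLLOW")
    return problems

# S' -> eS | ε  with FOLLOW(S') = {$, e}  (dangling-else grammar)
dangling_else = ll1_violations({"eS": {"e"}, "ε": {EPS}}, {"$", "e"})

# E' -> +TE' | ε  with FOLLOW(E') = {$, )}  (expression grammar)
expression = ll1_violations({"+TE'": {"+"}, "ε": {EPS}}, {"$", ")"})
```

A grammar is LL(1) only if the returned list is empty for every non-terminal that has more than one alternative.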

General forms:

1)  $A \rightarrow aB \mid aC$, with  $\alpha = aB$  and  $\beta = aC$

$$\text{FIRST}(\alpha) = \{a\}, \quad \text{FIRST}(\beta) = \{a\}$$

are not disjoint, so condition (i) fails.

eg:  $A \rightarrow aB \mid bC$, with  $\alpha = aB$  and  $\beta = bC$

$$\text{FIRST}(\alpha) = \{a\}, \quad \text{FIRST}(\beta) = \{b\}$$

are disjoint.

2)  $A \rightarrow Bc \mid CD$  
 $B \rightarrow b \mid \epsilon$  
 $C \rightarrow c \mid \epsilon$  
 Either  $\text{FIRST}(\alpha)$  or  $\text{FIRST}(\beta)$  may contain $\epsilon$, but not both.

3) i)  $A \rightarrow aB$  
 $B \rightarrow cAa \mid \epsilon$  
 $\text{FIRST}(a) = \{a\}$  and  $\text{FOLLOW}(A) = \{\$, a\}$  are not disjoint.

ii)  $A \rightarrow Ba$  
 $B \rightarrow cAa \mid \epsilon$  
 $\text{FIRST}(Ba) = \{c, a\}$  and  $\text{FOLLOW}(A) = \{\$, a\}$  are not disjoint.

Examples:

1.  $S \rightarrow iEtSS' \mid a$  
 $S' \rightarrow eS \mid \epsilon$  
 $E \rightarrow b$

|                 | $S$     | $S'$            | $E$ |
|-----------------|---------|-----------------|-----|
| $\text{FIRST}$  | i<br>a  | e<br>$\epsilon$ | b   |
| $\text{FOLLOW}$ | \$<br>e | \$<br>e         | t   |

i)  $S \rightarrow iEtSS' \mid a$, with  $\alpha = iEtSS'$  and  $\beta = a$

a.  $\text{FIRST}(\alpha) \cap \text{FIRST}(\beta) = \{i\} \cap \{a\} = \emptyset$

b. Neither  $\alpha$  nor  $\beta$  derives  $\epsilon$

ii)  $S' \rightarrow eS \mid \epsilon$, with  $\alpha = eS$  and  $\beta = \epsilon$

a.  $\text{FIRST}(\alpha) \cap \text{FIRST}(\beta) = \{e\} \cap \{\epsilon\} = \emptyset$

b.  $\beta \Rightarrow \epsilon$  but  $\alpha \not\Rightarrow \epsilon$

c. Since  $\beta \Rightarrow \epsilon$, we need  $\text{FIRST}(\alpha) \cap \text{FOLLOW}(S') = \emptyset$, but

$$\text{FIRST}(eS) \cap \text{FOLLOW}(S') = \{e\} \cap \{\$, e\} = \{e\} \neq \emptyset$$

Condition fails

$\therefore$  The given grammar is not LL(1)

$$2) S \rightarrow S(S)S \mid \epsilon \quad \Rightarrow \quad \begin{array}{l} S \rightarrow S' \\ S' \rightarrow (S)SS' \mid \epsilon \end{array}$$

|        | S                 | S'                |
|--------|-------------------|-------------------|
| FIRST  | (<br>$\epsilon$   | (<br>$\epsilon$   |
| FOLLOW | \$<br>)<br>(      | \$<br>)<br>(      |

1.  $S \rightarrow S'$

not required because there is only one alternative (no  $\beta$  production)

2.  $S' \rightarrow (S)SS' \mid \epsilon$, with  $\alpha = (S)SS'$  and  $\beta = \epsilon$

a.  $\text{FIRST}(\alpha) \cap \text{FIRST}(\beta) = \{(\} \cap \{\epsilon\} = \emptyset$

b. only  $\beta \Rightarrow \epsilon$  and  $\alpha \not\Rightarrow \epsilon$

c.  $\beta \Rightarrow \epsilon$, then we need  $\text{FIRST}(\alpha) \cap \text{FOLLOW}(S') = \emptyset$, but  
 $\{(\} \cap \{\$, ), (\} = \{(\} \neq \emptyset$

$\therefore$  The given grammar is not LL(1)

$$3. S \rightarrow SS+ \mid SS* \mid a \quad \Rightarrow \quad \begin{array}{l} S \rightarrow aS' \\ S' \rightarrow S+S' \mid S*S' \mid \epsilon \end{array}$$

↓ left factoring

final productions

$$\left\{ \begin{array}{l} S \rightarrow aS' \\ S' \rightarrow SS'' \mid \epsilon \\ S'' \rightarrow +S' \mid *S' \end{array} \right.$$

|        | $S$  | $S'$       | $S''$ |
|--------|------|------------|-------|
| FIRST  | $a$  | $a$        | $+$   |
|        |      | $\epsilon$ | $*$   |
| Follow | $\$$ | $\$$       | $\$$  |
|        | $+$  | $+$        | $+$   |
|        | $*$  | $*$        | $*$   |

1.  $S \rightarrow aS'$  
not required (only one alternative)
2.  $S'' \rightarrow +S' \mid *S'$ 
  - a.  $\text{FIRST}(+S') \cap \text{FIRST}(*S') = \{+\} \cap \{*\} = \emptyset$
  - b. neither  $\alpha$  nor  $\beta \Rightarrow \epsilon$  
all conditions are satisfied
3.  $S' \rightarrow SS'' \mid \epsilon$ 
  - a.  $\text{FIRST}(SS'') \cap \text{FIRST}(\epsilon) = \{a\} \cap \{\epsilon\} = \emptyset$
  - b.  $\beta \Rightarrow \epsilon$  but not  $\alpha$
  - c.  $\beta \Rightarrow \epsilon$  then  
 $\text{FIRST}(SS'') \cap \text{FOLLOW}(S') = \{a\} \cap \{\$, +, *\} = \emptyset$

all conditions are satisfied  
 $\therefore$  The grammar is LL(1)

$$\begin{array}{l}
 \text{i)} E \rightarrow E + T \mid T \\
 T \rightarrow T * F \mid F \\
 F \rightarrow (E) \mid \text{id}
 \end{array} \Rightarrow
 \begin{array}{l}
 E \rightarrow T E' \\
 E' \rightarrow + T E' \mid \epsilon \\
 T \rightarrow F T' \\
 T' \rightarrow * F T' \mid \epsilon \\
 F \rightarrow (E) \mid \text{id}
 \end{array}$$

|        | E       | E'              | T            | T'              | F                 |
|--------|---------|-----------------|--------------|-----------------|-------------------|
| FIRST  | (<br>id | +<br>$\epsilon$ | (<br>id      | *<br>$\epsilon$ | (<br>id           |
| FOLLOW | \$<br>) | \$<br>)         | +<br>\$<br>) | +<br>\$<br>)    | *<br>+<br>\$<br>) |

ii)  $E \rightarrow TE'$  (only one alternative, nothing to check)

iii)  $E' \rightarrow +TE' \mid \epsilon$

a.  $\text{FIRST}(+TE') \cap \text{FIRST}(\epsilon) = \{+\} \cap \{\epsilon\} = \emptyset$

b.  $\beta \Rightarrow \epsilon$  but  $\alpha \not\Rightarrow \epsilon$

c.  $\beta \Rightarrow \epsilon$  then  $\text{FIRST}(+TE') \cap \text{FOLLOW}(E') = \{+\} \cap \{\$, )\} = \emptyset$

iv)  $T \rightarrow FT'$  (only one alternative, nothing to check)

v)  $T' \rightarrow *FT' \mid \epsilon$

a.  $\text{FIRST}(*FT') \cap \text{FIRST}(\epsilon) = \{*\} \cap \{\epsilon\} = \emptyset$

b.  $\beta \Rightarrow \epsilon$  but  $\alpha \not\Rightarrow \epsilon$

c.  $\beta \Rightarrow \epsilon$  then  $\text{FIRST}(*FT') \cap \text{FOLLOW}(T') = \{*\} \cap \{+, \$, )\} = \emptyset$

vi)  $F \rightarrow (E) \mid \text{id}$

a.  $\text{FIRST}((E)) \cap \text{FIRST}(\text{id}) = \{(\} \cap \{\text{id}\} = \emptyset$

b. neither of them derives  $\epsilon$

All conditions are satisfied  
 $\therefore$  The grammar is LL(1)

Checking whether the given grammar is LL(1) or not  
by constructing the predictive parsing table

1.  $E \rightarrow E + T \mid T$   
 $T \rightarrow T * F \mid F$   
 $F \rightarrow (E) \mid id$

i) Remove left Recursion

$E \rightarrow TE'$   
 $E' \rightarrow +TE' \mid \epsilon$   
 $T \rightarrow FT'$   
 $T' \rightarrow *FT' \mid \epsilon$   
 $F \rightarrow (E) \mid id$
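The left-recursion removal used in this step follows the standard rule: $A \rightarrow A\alpha \mid \beta$ becomes $A \rightarrow \beta A'$ and $A' \rightarrow \alpha A' \mid \epsilon$. A minimal sketch of that rule is given below. It handles immediate left recursion only, the function name `remove_immediate_left_recursion` is hypothetical, and it assumes the primed name (for example `E'`) is not already used in the grammar. Production bodies are tuples and `()` stands for $\epsilon$.

```python
def remove_immediate_left_recursion(head, bodies):
    """A -> A a1 | ... | b1 | ...  becomes  A -> b1 A' | ...,  A' -> a1 A' | ... | ε."""
    # Tails (the alpha parts) of the alternatives that start with the head itself
    recursive = [body[1:] for body in bodies if body and body[0] == head]
    # The remaining alternatives (the beta parts)
    others = [body for body in bodies if not body or body[0] != head]
    if not recursive:
        return {head: list(bodies)}       # nothing to do
    new_head = head + "'"                 # assumed to be an unused name
    return {
        head: [body + (new_head,) for body in others],
        new_head: [alpha + (new_head,) for alpha in recursive] + [()],
    }

e_rules = remove_immediate_left_recursion("E", [("E", "+", "T"), ("T",)])
t_rules = remove_immediate_left_recursion("T", [("T", "*", "F"), ("F",)])
f_rules = remove_immediate_left_recursion("F", [("(", "E", ")"), ("id",)])
```

Applied to the E and T productions this yields the $E'$ and $T'$ productions shown in step i), while the F productions are returned unchanged.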

ii) Perform left factoring

→ here, not required

iii) find FIRST and FOLLOW set

|        | E       | $E'$            | T            | $T'$            | F                 |
|--------|---------|-----------------|--------------|-----------------|-------------------|
| FIRST  | (<br>id | +<br>$\epsilon$ | (<br>id      | *<br>$\epsilon$ | (<br>id           |
| FOLLOW | \$<br>) | \$<br>)         | +<br>\$<br>) | +<br>\$<br>)    | *<br>+<br>\$<br>) |

iv) Construct the parsing table

|      | (                   | id                  | )                         | +                         | *                     | \$                        |
|------|---------------------|---------------------|---------------------------|---------------------------|-----------------------|---------------------------|
| E    | $E \rightarrow TE'$ | $E \rightarrow TE'$ |                           |                           |                       |                           |
| $E'$ |                     |                     | $E' \rightarrow \epsilon$ | $E' \rightarrow +TE'$     |                       | $E' \rightarrow \epsilon$ |
| T    | $T \rightarrow FT'$ | $T \rightarrow FT'$ |                           |                           |                       |                           |
| $T'$ |                     |                     | $T' \rightarrow \epsilon$ | $T' \rightarrow \epsilon$ | $T' \rightarrow *FT'$ | $T' \rightarrow \epsilon$ |
| F    | $F \rightarrow (E)$ | $F \rightarrow id$  |                           |                           |                       |                           |

v) Parse the i/p string: id + id * id

| Stack       | Input           | Action                      |
|-------------|-----------------|-----------------------------|
| E \$        | id + id * id \$ |                             |
| TE' \$      | id + id * id \$ | push E → TE'                |
| FT'E' \$    | id + id * id \$ | push T → FT'                |
| id T'E' \$  | id + id * id \$ | push F → id                 |
| T'E' \$     | + id * id \$    | matched 'id'                |
| E' \$       | + id * id \$    | T' → ε                      |
| +TE' \$     | + id * id \$    | push E' → +TE'              |
| TE' \$      | id * id \$      | matched '+'                 |
| FT'E' \$    | id * id \$      | push T → FT'                |
| id T'E' \$  | id * id \$      | push F → id                 |
| T'E' \$     | * id \$         | matched 'id'                |
| *FT'E' \$   | * id \$         | push T' → *FT'              |
| FT'E' \$    | id \$           | matched '*'                 |
| id T'E' \$  | id \$           | push F → id                 |
| T'E' \$     | \$              | matched 'id'                |
| E' \$       | \$              | T' → ε                      |
| \$          | \$              | E' → ε                      |

∴ The i/p string is accepted by the grammar,  
i.e. the i/p is parsed successfully

The grammar is LL(1) ∵ no cell of the parsing table has multiple entries

2)  $S \rightarrow iEtS | iEtSeS | a$

$E \rightarrow b$

i) i/p: if  $E_1$  then if  $E_2$  then  $S_1$  else  $S_2$,  
i.e. if  $b$  then if  $b$  then  $a$  else  $a$  = ibtibtaea

ii) No left recursion

iii) Perform left factoring

$S \rightarrow iEtSS' | a$

$S' \rightarrow eS | \epsilon$

$E \rightarrow b$

iv) FIRST and FOLLOW set

|        | $S$     | $S'$            | $E$ |
|--------|---------|-----------------|-----|
| FIRST  | i<br>a  | e<br>$\epsilon$ | b   |
| FOLLOW | \$<br>e | \$<br>e         | t   |

v) parsing table

|      | \$                        | $i$                    | $e$                                              | $b$               | $a$               | $t$ |
|------|---------------------------|------------------------|--------------------------------------------------|-------------------|-------------------|-----|
| $S$  |                           | $S \rightarrow iEtSS'$ |                                                  |                   | $S \rightarrow a$ |     |
| $S'$ | $S' \rightarrow \epsilon$ |                        | $S' \rightarrow eS$<br>$S' \rightarrow \epsilon$ |                   |                   |     |
| $E$  |                           |                        |                                                  | $E \rightarrow b$ |                   |     |

The grammar is not LL(1), because the table has multiple entries in the same cell, $M[S', e]$.

vi) Parse the i/p string: ibtibtaea

| Stack         | Input         | Action                 |
|---------------|---------------|------------------------|
| S \$          | ibtibtaea \$  |                        |
| iEtSS' \$     | ibtibtaea \$  | $S \rightarrow iEtSS'$ |
| EtSS' \$      | btibtaea \$   | matched 'i'            |
| btSS' \$      | btibtaea \$   | $E \rightarrow b$      |
| tSS' \$       | tibtaea \$    | matched 'b'            |
| SS' \$        | ibtaea \$     | matched 't'            |
| iEtSS'S' \$   | ibtaea \$     | $S \rightarrow iEtSS'$ |
| EtSS'S' \$    | btaea \$      | matched 'i'            |
| btSS'S' \$    | btaea \$      | $E \rightarrow b$      |
| tSS'S' \$     | taea \$       | matched 'b'            |
| SS'S' \$      | aea \$        | matched 't'            |
| aS'S' \$      | aea \$        | $S \rightarrow a$      |
| S'S' \$       | ea \$         | matched 'a'            |

↳ ambiguity whether to push  $S' \rightarrow eS$  or  $S' \rightarrow \epsilon$

$\Rightarrow$  i/p: if E then S else S,  
i.e. ibtaea

| Stack     | Input     | Action                 |
|-----------|-----------|------------------------|
| S \$      | ibtaea \$ |                        |
| iEtSS' \$ | ibtaea \$ | $S \rightarrow iEtSS'$ |
| EtSS' \$  | btaea \$  | match 'i'              |
| btSS' \$  | btaea \$  | push $E \rightarrow b$ |
| tSS' \$   | taea \$   | match 'b'              |
| SS' \$    | aea \$    | match 't'              |
| aS' \$    | aea \$    | $S \rightarrow a$      |
| S' \$     | ea \$     | match 'a'              |

↳ ambiguity, whether to push  $S' \rightarrow eS$  or  $S' \rightarrow \epsilon$

3)  $S \rightarrow SS+ \mid SS* \mid a \quad \text{i/p: } aa+a*$

i) Remove left recursion

$$S \rightarrow aS'$$

$$S' \rightarrow S+S' \mid S*S' \mid \epsilon$$

ii) Perform left factoring

$$S \rightarrow aS'$$

$$S' \rightarrow SS'' \mid \epsilon$$

$$S'' \rightarrow +S' \mid *S'$$

iii) find FIRST and FOLLOW set

|        | $S$ | $S'$       | $S''$ |
|--------|-----|------------|-------|
| FIRST  | a   | a          | +     |
|        |     | $\epsilon$ | *     |
| FOLLOW | \$  | \$         | \$    |
|        | +   | +          | +     |
|        | *   | *          | *     |

iv) find the predictive parsing table

|       | \$                        | a                     | +                         | *                         |
|-------|---------------------------|-----------------------|---------------------------|---------------------------|
| $S$   |                           | $S \rightarrow aS'$   |                           |                           |
| $S'$  | $S' \rightarrow \epsilon$ | $S' \rightarrow SS''$ | $S' \rightarrow \epsilon$ | $S' \rightarrow \epsilon$ |
| $S''$ |                           |                       | $S'' \rightarrow +S'$     | $S'' \rightarrow *S'$     |

The given grammar is LL(1)

because no cell of the table has multiple productions.

v) parse the i/p string: aa+a*

| Stack      | Input     | Action                    |
|------------|-----------|---------------------------|
| S \$       | aa+a* \$  |                           |
| aS' \$     | aa+a* \$  | $S \rightarrow aS'$       |
| S' \$      | a+a* \$   | match 'a'                 |
| SS'' \$    | a+a* \$   | $S' \rightarrow SS''$     |
| aS'S'' \$  | a+a* \$   | $S \rightarrow aS'$       |
| S'S'' \$   | +a* \$    | match 'a'                 |
| S'' \$     | +a* \$    | $S' \rightarrow \epsilon$ |
| +S' \$     | +a* \$    | $S'' \rightarrow +S'$     |
| S' \$      | a* \$     | match '+'                 |
| SS'' \$    | a* \$     | $S' \rightarrow SS''$     |
| aS'S'' \$  | a* \$     | $S \rightarrow aS'$       |
| S'S'' \$   | * \$      | match 'a'                 |
| S'' \$     | * \$      | $S' \rightarrow \epsilon$ |
| *S' \$     | * \$      | $S'' \rightarrow *S'$     |
| S' \$      | \$        | match '*'                 |
| \$         | \$        | $S' \rightarrow \epsilon$ |

The input string is successfully parsed

$$4) \quad S \rightarrow 0S1 \mid 01$$

input string: 000111

i) Remove left recursion

→ not needed

ii) Perform left factoring

$$S \rightarrow 0S'$$

$$S' \rightarrow S1 \mid 1$$

iii) FIRST and FOLLOW set

|        | S       | S'      |
|--------|---------|---------|
| FIRST  | 0       | 0<br>1  |
| FOLLOW | \$<br>1 | \$<br>1 |

iv) Construct the predictive parsing table

|    | \$ | 0                   | 1                  |
|----|----|---------------------|--------------------|
| S  |    | $S \rightarrow 0S'$ |                    |
| S' |    | $S' \rightarrow S1$ | $S' \rightarrow 1$ |

The grammar is LL(1), since no cell of the table has multiple productions

v) parse the input string

| Stack     | Input      | Action         |
|-----------|------------|----------------|
| S \$      | 000111 \$  |                |
| 0S' \$    | 000111 \$  | push S → 0S'   |
| S' \$     | 00111 \$   | match '0'      |
| S1 \$     | 00111 \$   | push S' → S1   |
| 0S'1 \$   | 00111 \$   | push S → 0S'   |
| S'1 \$    | 0111 \$    | match '0'      |
| S11 \$    | 0111 \$    | push S' → S1   |
| 0S'11 \$  | 0111 \$    | push S → 0S'   |
| S'11 \$   | 111 \$     | match '0'      |
| 111 \$    | 111 \$     | push S' → 1    |
| 11 \$     | 11 \$      | match '1'      |
| 1 \$      | 1 \$       | match '1'      |
| \$        | \$         | match '1'      |

The input string is successfully parsed

5)  $S \rightarrow +SS | *SS | a$  i/p:  $+*aaa$

i) Remove left recursion  $\Rightarrow$  not required

ii) Perform left factoring  $\Rightarrow$  not required

iii) construct FIRST and FOLLOW set

|        | $S$               |
|--------|-------------------|
| FIRST  | +<br>*<br>a       |
| FOLLOW | \$<br>+<br>*<br>a |

iv) construct the predictive parsing table

|     | \$ | $+$                 | $*$                 | $a$               |
|-----|----|---------------------|---------------------|-------------------|
| $S$ |    | $S \rightarrow +SS$ | $S \rightarrow *SS$ | $S \rightarrow a$ |

The grammar is LL(1), since no cell contains more than 1 production

v) i/p string:  $+*aaa$

| Stack    | Input     | Action                 |
|----------|-----------|------------------------|
| S \$     | +*aaa \$  |                        |
| +SS \$   | +*aaa \$  | $S \rightarrow +SS$    |
| SS \$    | *aaa \$   | match '+'              |
| *SSS \$  | *aaa \$   | $S \rightarrow *SS$    |
| SSS \$   | aaa \$    | match '*'              |
| aSS \$   | aaa \$    | push $S \rightarrow a$ |
| SS \$    | aa \$     | match 'a'              |
| aS \$    | aa \$     | push $S \rightarrow a$ |
| S \$     | a \$      | match 'a'              |
| a \$     | a \$      | push $S \rightarrow a$ |
| \$       | \$        | match 'a'              |

The i/p is successfully parsed

6)  $S \rightarrow S + S \mid SS \mid (S) \mid S* \mid a$  i/p: (a+a)\*a

i) Remove left recursion

$S \rightarrow aS' \mid (S)S'$  
 $S' \rightarrow +SS' \mid SS' \mid *S' \mid \epsilon$

ii) Perform left factoring  
 → not needed

iii) Find FIRST and FOLLOW

|        | $S$                         | $S'$                             |
|--------|-----------------------------|----------------------------------|
| FIRST  | a<br>(                      | +<br>a<br>(<br>*<br>$\epsilon$   |
| FOLLOW | \$<br>+<br>)<br>a<br>(<br>* | \$<br>+<br>)<br>a<br>(<br>*      |

iv) construct the predictive parsing table

|      | \$                        | (                                                 | a                                                 | )                         | +                                                  | *                                                 |
|------|---------------------------|---------------------------------------------------|---------------------------------------------------|---------------------------|----------------------------------------------------|---------------------------------------------------|
| S    |                           | $S \rightarrow (S)S'$                             | $S \rightarrow aS'$                               |                           |                                                    |                                                   |
| $S'$ | $S' \rightarrow \epsilon$ | $S' \rightarrow SS'$<br>$S' \rightarrow \epsilon$ | $S' \rightarrow SS'$<br>$S' \rightarrow \epsilon$ | $S' \rightarrow \epsilon$ | $S' \rightarrow +SS'$<br>$S' \rightarrow \epsilon$ | $S' \rightarrow *S'$<br>$S' \rightarrow \epsilon$ |

The grammar is not LL(1), since several cells have multiple entries

v) input : (a+a)\*a

| Stack       | Input        | Action                   |
|-------------|--------------|--------------------------|
| S \$        | (a+a)\*a \$  |                          |
| (S)S' \$    | (a+a)\*a \$  | $S \rightarrow (S)S'$    |
| S)S' \$     | a+a)\*a \$   | match '('                |
| aS')S' \$   | a+a)\*a \$   | push $S \rightarrow aS'$ |
| S')S' \$    | +a)\*a \$    | match 'a'                |

→ Ambiguous, whether to push  $S' \rightarrow +SS'$  or  $S' \rightarrow \epsilon$

6)  $S \rightarrow S(S)S \mid \epsilon$  i/p: (( ))()

i) Remove left recursion

$\Rightarrow S \rightarrow S'$

$S' \rightarrow (S)SS' \mid \epsilon$

ii) Perform left factoring  
→ not needed

iii) find FIRST and FOLLOW set

|        | $S$             | $S'$            |
|--------|-----------------|-----------------|
| FIRST  | (<br>$\epsilon$ | (<br>$\epsilon$ |
| FOLLOW | \$<br>)<br>(    | \$<br>)<br>(    |

iv) parsing table

|      | \$                        | (                                                    | )                         |
|------|---------------------------|------------------------------------------------------|---------------------------|
| $S$  | $S \rightarrow S'$        | $S \rightarrow S'$                                   | $S \rightarrow S'$        |
| $S'$ | $S' \rightarrow \epsilon$ | $S' \rightarrow (S)SS'$<br>$S' \rightarrow \epsilon$ | $S' \rightarrow \epsilon$ |

The grammar is not LL(1), since the cell $M[S', (]$ has multiple productions

⇒ parse the i/p string: (( ))()

| Stack       | Input       | Action                                                                     |
|-------------|-------------|----------------------------------------------------------------------------|
| S \$        | (())() \$   |                                                                            |
| S' \$       | (())() \$   | $S \rightarrow S'$                                                         |
| (S)SS' \$   | (())() \$   | $S' \rightarrow (S)SS'$ → ambiguity, $S' \rightarrow \epsilon$ also applies |
| S)SS' \$    | ())() \$    | match '('                                                                  |
| S')SS' \$   | ())() \$    | push $S \rightarrow S'$                                                    |

7)  $S \rightarrow (L) \mid a$ ,  $L \rightarrow L, S \mid S$   i/p: (a, a)

Step i) Remove left recursion

$$L \rightarrow L, S \mid S$$

becomes

$$L \rightarrow S L'$$

$$L' \rightarrow , S L' \mid \epsilon$$

$$S \rightarrow (L) \mid a$$

ii) Left factoring

not required

iii) write FIRST and FOLLOW set

|        | S             | L      | L'              |
|--------|---------------|--------|-----------------|
| FIRST  | (<br>a        | (<br>a | ,<br>$\epsilon$ |
| FOLLOW | \$<br>,<br>)  | )      | )               |

iv) write the predictive parsing table

|      | (                   | a                   | )                         | ,                      | \$ |
|------|---------------------|---------------------|---------------------------|------------------------|----|
| S    | $S \rightarrow (L)$ | $S \rightarrow a$   |                           |                        |    |
| L    | $L \rightarrow SL'$ | $L \rightarrow SL'$ |                           |                        |    |
| $L'$ |                     |                     | $L' \rightarrow \epsilon$ | $L' \rightarrow , SL'$ |    |

| Stack       | Input       | Action                    |
|-------------|-------------|---------------------------|
| $S\$$       | (a, a) \$   |                           |
| $(L) \$$    | (a, a) \$   | $S \rightarrow (L)$       |
| $L) \$$     | a, a) \$    | match (                   |
| $SL') \$$   | a, a) \$    | $L \rightarrow SL'$       |
| $aL') \$$   | a, a) \$    | $S \rightarrow a$         |
| $L') \$$    | , a) \$     | match a                   |
| $, SL') \$$ | , a) \$     | $L' \rightarrow , SL'$    |
| $SL') \$$   | a) \$       | match ,                   |
| $aL') \$$   | a) \$       | $S \rightarrow a$         |
| $L') \$$    | ) \$        | match a                   |
| $) \$$      | ) \$        | $L' \rightarrow \epsilon$ |
| $\$$        | \$          | match )                   |

The i/p string is successfully parsed
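The stack moves traced above can be sketched as a small table-driven parser. This is only an illustration under stated assumptions: the dictionary `TABLE` encodes the parsing table of step iv by hand, and the function name `parse` is hypothetical.

```python
# Predictive parsing table for S -> (L) | a, L -> S L', L' -> , S L' | epsilon.
# Keys are (nonterminal, lookahead); values are production bodies ([] = epsilon).
TABLE = {
    ("S", "("): ["(", "L", ")"],
    ("S", "a"): ["a"],
    ("L", "("): ["S", "L'"],
    ("L", "a"): ["S", "L'"],
    ("L'", ","): [",", "S", "L'"],
    ("L'", ")"): [],
}
NONTERMINALS = {"S", "L", "L'"}


def parse(tokens, start="S"):
    """Return the productions used, in order, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]  # the top of the stack is the end of the list
    pos = 0
    used = []
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:  # blank table entry
                raise SyntaxError(f"no entry M[{top}, {look}]")
            used.append((top, body))
            stack.extend(reversed(body))  # leftmost body symbol ends on top
        elif top == look:
            pos += 1  # match a terminal (or the end marker $)
        else:
            raise SyntaxError(f"expected {top!r}, found {look!r}")
    return used
```

For the i/p `(a,a)` the function applies $S \rightarrow (L)$, $L \rightarrow SL'$, $S \rightarrow a$, $L' \rightarrow , SL'$, $S \rightarrow a$, $L' \rightarrow \epsilon$, in the same order as the Action column above.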

8)  $S \rightarrow S + S \mid SS \mid (S) \mid S* \mid a$   i/p: (a+a)\*a

i) Remove left recursion

$$S \rightarrow S + S \mid SS \mid (S) \mid S* \mid a$$

$$S \rightarrow (S) S' \mid a S'$$

$$S' \rightarrow + SS' \mid SS' \mid * S' \mid \epsilon$$

ii) remove left factoring  
not required

iii) Find FIRST and FOLLOW set

|        | $S$                           | $S'$                          |
|--------|-------------------------------|-------------------------------|
| FIRST  | (<br>a                        | +<br>(<br>a<br>*<br>$\epsilon$ |
| FOLLOW | \$<br>)<br>+<br>(<br>a<br>*   | \$<br>)<br>+<br>(<br>a<br>*   |

iv) write predictive parse table

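|      | (                                                 | a                                                 | )                         | +                                                  | *                                                 | \$                        |
|------|---------------------------------------------------|---------------------------------------------------|---------------------------|----------------------------------------------------|---------------------------------------------------|---------------------------|
| $S$  | $S \rightarrow (S)S'$                             | $S \rightarrow aS'$                               |                           |                                                    |                                                   |                           |
| $S'$ | $S' \rightarrow SS'$<br>$S' \rightarrow \epsilon$ | $S' \rightarrow SS'$<br>$S' \rightarrow \epsilon$ | $S' \rightarrow \epsilon$ | $S' \rightarrow +SS'$<br>$S' \rightarrow \epsilon$ | $S' \rightarrow *S'$<br>$S' \rightarrow \epsilon$ | $S' \rightarrow \epsilon$ |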

∴ The given grammar is not LL(1)

9)  $S \rightarrow aSbS \mid bSaS \mid \epsilon$   i/p: aabbab

i) Remove left recursion - not required

ii) Left factoring - not required

iii) Write FIRST and FOLLOW set

|        | $S$                  |
|--------|----------------------|
| FIRST  | a<br>b<br>$\epsilon$ |
| FOLLOW | a<br>b<br>\$         |

iv) write a predictive parsing table

|     | a                                                  | b                                                  | \$                       |
|-----|----------------------------------------------------|----------------------------------------------------|--------------------------|
| $S$ | $S \rightarrow aSbS$<br>$S \rightarrow \epsilon$   | $S \rightarrow bSaS$<br>$S \rightarrow \epsilon$   | $S \rightarrow \epsilon$ |

v) parse the i/p string

| Stack         | Input    | Action               |
|---------------|----------|----------------------|
| S\$           | aabbab\$ |                      |
| aSbS\$        | aabbab\$ | $S \rightarrow aSbS$ |
| SbS\$         | abbab\$  | match a              |
| aSbSbS\$      | abbab\$  | $S \rightarrow aSbS$ |
| SbSbS\$       | bbab\$   | match a              |
| bSaSbSbS\$    | bbab\$   | $S \rightarrow bSaS$ |
| SaSbSbS\$     | bab\$    | match b              |
| bSaSaSbSbS\$  | bab\$    | $S \rightarrow bSaS$ |
| SaSaSbSbS\$   | ab\$     | match b              |

↳ Whenever $S$ is on top of the stack and the next i/p symbol is a or b, the table offers both an expansion and $S \rightarrow \epsilon$, so every choice above is a guess. The grammar is ambiguous and therefore not LL(1).

10)  $bexpr \rightarrow bexpr \text{ or } bterm \mid bterm$

$bterm \rightarrow bterm \text{ and } bfactor \mid bfactor$

$bfactor \rightarrow \text{not } bfactor \mid (bexpr) \mid \text{true} \mid \text{false}$

i/p: not (true or false)

→ i) Remove left recursion

$bexpr \rightarrow bterm \; bexpr'$

$bexpr' \rightarrow \text{or } bterm \; bexpr' \mid \epsilon$

$bterm \rightarrow bfactor \; bterm'$

$bterm' \rightarrow \text{and } bfactor \; bterm' \mid \epsilon$

$bfactor \rightarrow \text{not } bfactor \mid (bexpr) \mid \text{true} \mid \text{false}$

ii) Left factoring: not required

iii) FIND the FIRST and FOLLOW set

|        | bexpr                     | bexpr'  | bterm                     | bterm'        | bfactor                   |
|--------|---------------------------|---------|---------------------------|---------------|---------------------------|
| FIRST  | not<br>(<br>true<br>false | or<br>ε | not<br>(<br>true<br>false | and<br>ε      | not<br>(<br>true<br>false |
| FOLLOW | \$<br>)                   | \$<br>) | or<br>\$<br>)             | or<br>\$<br>) | and<br>or<br>\$<br>)      |
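The FIRST and FOLLOW sets above can be cross-checked with a short fixed-point computation. This is a sketch under assumptions: the grammar is the boolean-expression grammar after left-recursion removal, written with the nonterminal names bexpr, bterm and bfactor, and the helper names `first_of`, `compute_first` and `compute_follow` are illustrative.

```python
EPS = "ε"
GRAMMAR = {
    "bexpr":   [["bterm", "bexpr'"]],
    "bexpr'":  [["or", "bterm", "bexpr'"], []],
    "bterm":   [["bfactor", "bterm'"]],
    "bterm'":  [["and", "bfactor", "bterm'"], []],
    "bfactor": [["not", "bfactor"], ["(", "bexpr", ")"], ["true"], ["false"]],
}


def first_of(symbols, first):
    """FIRST of a symbol sequence, given the FIRST sets found so far."""
    out = set()
    for sym in symbols:
        f = first.get(sym, {sym})  # a terminal is its own FIRST set
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)  # all symbols nullable, or empty body
    return out


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until no set grows any more
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:  # only nonterminals have FOLLOW
                        continue
                    rest = first_of(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # everything after sym can vanish
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow
```

Usage: `first = compute_first(GRAMMAR)` followed by `follow = compute_follow(GRAMMAR, first, "bexpr")` gives one set per nonterminal, which can be compared with the table column by column.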

Predictive parsing table:

|         | (                                | not                              | )                             | or                                           | and                                              | true                             | false                            | \$                            |
|---------|----------------------------------|----------------------------------|-------------------------------|----------------------------------------------|--------------------------------------------------|----------------------------------|----------------------------------|-------------------------------|
| bexpr   | bexpr → bterm bexpr'             | bexpr → bterm bexpr'             |                               |                                              |                                                  | bexpr → bterm bexpr'             | bexpr → bterm bexpr'             |                               |
| bexpr'  |                                  |                                  | bexpr' → ε                    | bexpr' → or bterm bexpr'                     |                                                  |                                  |                                  | bexpr' → ε                    |
| bterm   | bterm → bfactor bterm'           | bterm → bfactor bterm'           |                               |                                              |                                                  | bterm → bfactor bterm'           | bterm → bfactor bterm'           |                               |
| bterm'  |                                  |                                  | bterm' → ε                    | bterm' → ε                                   | bterm' → and bfactor bterm'                      |                                  |                                  | bterm' → ε                    |
| bfactor | bfactor → (bexpr)                | bfactor → not bfactor            |                               |                                              |                                                  | bfactor → true                   | bfactor → false                  |                               |

| Stack                                        | Input                  | Action                      |
|----------------------------------------------|------------------------|-----------------------------|
| bexpr \$                                     | not (true or false) \$ | bexpr → bterm bexpr'        |
| bterm bexpr' \$                              | not (true or false) \$ | bterm → bfactor bterm'      |
| bfactor bterm' bexpr' \$                     | not (true or false) \$ | bfactor → not bfactor       |
| not bfactor bterm' bexpr' \$                 | not (true or false) \$ | match not                   |
| bfactor bterm' bexpr' \$                     | (true or false) \$     | bfactor → (bexpr)           |
| (bexpr) bterm' bexpr' \$                     | (true or false) \$     | match (                     |
| bexpr) bterm' bexpr' \$                      | true or false) \$      | bexpr → bterm bexpr'        |
| bterm bexpr') bterm' bexpr' \$               | true or false) \$      | bterm → bfactor bterm'      |
| bfactor bterm' bexpr') bterm' bexpr' \$      | true or false) \$      | bfactor → true              |
| true bterm' bexpr') bterm' bexpr' \$         | true or false) \$      | match true                  |
| bterm' bexpr') bterm' bexpr' \$              | or false) \$           | bterm' → ε                  |
| bexpr') bterm' bexpr' \$                     | or false) \$           | bexpr' → or bterm bexpr'    |
| or bterm bexpr') bterm' bexpr' \$            | or false) \$           | match or                    |
| bterm bexpr') bterm' bexpr' \$               | false) \$              | bterm → bfactor bterm'      |
| bfactor bterm' bexpr') bterm' bexpr' \$      | false) \$              | bfactor → false             |
| false bterm' bexpr') bterm' bexpr' \$        | false) \$              | match false                 |
| bterm' bexpr') bterm' bexpr' \$              | ) \$                   | bterm' → ε                  |
| bexpr') bterm' bexpr' \$                     | ) \$                   | bexpr' → ε                  |
| ) bterm' bexpr' \$                           | ) \$                   | match )                     |
| bterm' bexpr' \$                             | \$                     | bterm' → ε                  |
| bexpr' \$                                    | \$                     | bexpr' → ε                  |
| \$                                           | \$                     | accept                      |

## Error recovery in predictive parser:

1. Panic-mode recovery: for every nonterminal A, place the entry "synch" in the blank table entries M[A, a] for each terminal a in FOLLOW(A).

| Stack top | i/p | Table entry | Action                                                                                   |
|-----------|-----|-------------|------------------------------------------------------------------------------------------|
| NT        | T   | blank       | skip the terminal in the i/p                                                             |
| NT        | T   | synch       | pop the NT from the stack, unless it is the start symbol (then skip the i/p terminal)    |
| T         | T   | match       | pop the terminal from the stack and advance the i/p                                      |

### Example:

$$\begin{aligned}
 1. \quad E &\rightarrow E + T \mid T \\
 T &\rightarrow T * F \mid F \\
 F &\rightarrow (E) \mid id \\
 \text{i/p}: &\ ) \; id * + id
 \end{aligned}$$

$$\begin{aligned}
 E &\rightarrow TE' \\
 E' &\rightarrow +TE'| \epsilon \\
 T &\rightarrow FT' \\
 T' &\rightarrow *FT'| \epsilon \\
 F &\rightarrow (E) | id
 \end{aligned}$$

|        | E  | E' | T  | T' | F  |
|--------|----|----|----|----|----|
| FIRST  | (  | +  | (  | *  | (  |
|        | id | ε  | id | ε  | id |
| Follow | \$ | \$ | +  | +  | *  |
|        | )  | )  | \$ | \$ | \$ |
|        |    |    | )  | )  | )  |
|        |    |    |    |    | +  |

### Predictive parsing table

|    | (                   | id                  | )                         | +                         | *                      | \$                        |
|----|---------------------|---------------------|---------------------------|---------------------------|------------------------|---------------------------|
| E  | $E \rightarrow TE'$ | $E \rightarrow TE'$ | Synch                     |                           |                        | Synch                     |
| E' |                     |                     | $E' \rightarrow \epsilon$ | $E' \rightarrow +TE'$     |                        | $E' \rightarrow \epsilon$ |
| T  | $T \rightarrow FT'$ | $T \rightarrow FT'$ | Synch                     | Synch                     |                        | Synch                     |
| T' |                     |                     | $T' \rightarrow \epsilon$ | $T' \rightarrow \epsilon$ | $T' \rightarrow *FT'$  | $T' \rightarrow \epsilon$ |
| F  | $F \rightarrow (E)$ | $F \rightarrow id$  | Synch                     | Synch                     | Synch                  | Synch                     |

| Stack       | Input        | Action                                                       |
|-------------|--------------|--------------------------------------------------------------|
| E \$        | ) id * + id \$ | [E, )] - Synch; since E is the start symbol,<br>skip ) in the i/p |
| E \$        | id * + id \$   | $E \rightarrow TE'$                                          |
| TE' \$      | id * + id \$ | $T \rightarrow FT'$                                          |
| FT' E' \$   | id * + id \$ | $F \rightarrow id$                                           |
| id T' E' \$ | id * + id \$ | match id                                                     |
| T' E' \$    | * + id \$    | $T' \rightarrow * FT'$                                       |
| * FT' E' \$ | * + id \$    | match *                                                      |
| FT' E' \$   | + id \$      | [F, +] - Synch, remove NT from Stack                         |
| T' E' \$    | + id \$      | $T' \rightarrow \epsilon$                                    |
| E' \$       | + id \$      | $E' \rightarrow + TE'$                                       |
| + TE' \$    | + id \$      | match +                                                      |
| TE' \$      | id \$        | $T \rightarrow FT'$                                          |
| FT' E' \$   | id \$        | $F \rightarrow id$                                           |
| id T' E' \$ | id \$        | match id                                                     |
| T' E' \$    | \$           | $T' \rightarrow \epsilon$                                    |
| E' \$       | \$           | $E' \rightarrow \epsilon$                                    |
| \$          | \$           | match \$                                                     |
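The recovery rules of the table can be sketched in code as follows. This is a hedged illustration, not a definitive implementation: `TABLE`, `FOLLOW` and `parse` are hypothetical names, "start symbol" is taken to mean the start symbol sitting directly above \$ on the stack, and popping an unmatched terminal from the stack is an extra assumption beyond the three rules above.

```python
SYNCH = "synch"
# Predictive parsing table; [] is an epsilon body.
TABLE = {
    ("E", "("): ["T", "E'"], ("E", "id"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "("): ["F", "T'"], ("T", "id"): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "("): ["(", "E", ")"], ("F", "id"): ["id"],
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"*", "+", ")", "$"},
}
# Fill the blank entries M[A, a], a in FOLLOW(A), with synch.
for nt, follow in FOLLOW.items():
    for terminal in follow:
        TABLE.setdefault((nt, terminal), SYNCH)


def parse(tokens, start="E"):
    """Parse with panic-mode recovery; return the list of recovery steps."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]  # the top of the stack is the end of the list
    pos, errors = 0, []
    while stack:
        top, look = stack[-1], tokens[pos]
        if top not in FOLLOW:  # terminal or $ on top of the stack
            stack.pop()
            if top == look:
                pos += 1
            else:  # assumption: drop the unmatched stack terminal
                errors.append(f"missing {top!r}")
            continue
        entry = TABLE.get((top, look))
        if entry is None:  # blank entry: skip the i/p terminal
            errors.append(f"skip {look!r}")
            pos += 1
        elif entry == SYNCH:
            if len(stack) == 2 and look != "$":  # start symbol: skip i/p
                errors.append(f"skip {look!r}")
                pos += 1
            else:  # otherwise remove the nonterminal
                errors.append(f"pop {top}")
                stack.pop()
        else:
            stack.pop()
            stack.extend(reversed(entry))
    return errors
```

For the i/p `) id * + id` the sketch reports two recovery steps, skipping `)` and popping F, which correspond to the two Synch rows in the trace above.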

Q) show that the following grammar is ambiguous.

$E \rightarrow E+E \mid E-E \mid E*E \mid E/E \mid (E) \mid id$  and i/p: id + id \* id

Give an unambiguous grammar such that the precedence order from lowest to highest is +, -, \*, /, (), id and all operators are left-to-right associative.

## Bottom-up parsers: Introduction

→ A bottom-up parse corresponds to the construction of a parse tree for an input string, beginning at the leaves (the bottom) and working up towards the root (the top).

→ e.g.: a bottom-up parse for  $id * id$

Handle pruning:

Bottom up parsing during a left-to-right scan of the input constructs a right-most derivation in reverse. Informally a 'handle' is a substring that matches the body of the production, and whose reduction represents one step along the reverse of the right-most derivation.

Example:

Adding subscripts to the tokens id for clarity, the handles during the parse of  $id_1 * id_2$  according to the expression grammar

$$\left\{ \begin{array}{l} E \rightarrow E + T \mid T \\ T \rightarrow T * F \mid F \\ F \rightarrow (E) \mid id \end{array} \right\}$$

are shown in table (a).

Although  $T$  is the body of the production  $E \rightarrow T$ , the symbol  $T$  is not a handle in the sentential form  $T * id_2$ . If  $T$  were indeed replaced by  $E$ , we would get a string  $E * id_2$ , which cannot be derived from the start symbol  $E$ . Thus, the leftmost substring that matches the body of some production need not be a handle.

| RIGHT SENTENTIAL FORM | HANDLE  | REDUCING PRODUCTION   |
|-----------------------|---------|-----------------------|
| $id_1 * id_2$         | $id_1$  | $F \rightarrow id$    |
| $F * id_2$            | $F$     | $T \rightarrow F$     |
| $T * id_2$            | $id_2$  | $F \rightarrow id$    |
| $T * F$               | $T * F$ | $T \rightarrow T * F$ |
| $T$                   | $T$     | $E \rightarrow T$     |

(a) Handles during a parse of  $id_1 * id_2$

Formally, if  $S \overset{*}{\underset{rm}{\Rightarrow}} \alpha A w \underset{rm}{\Rightarrow} \alpha\beta w$ , as in figure (b), then production  $A \rightarrow \beta$  in the position following  $\alpha$  is a handle of  $\alpha\beta w$ . Alternatively, a handle of a right-sentential form  $\gamma$  is a production  $A \rightarrow \beta$  and a position of  $\gamma$  where the string  $\beta$  may be found, such that replacing  $\beta$  at that position by  $A$  produces the previous right-sentential form in a rightmost derivation of  $\gamma$ .

Notice that the string  $w$  to the right of the handle must contain only terminal symbols. For convenience, we refer to the body  $\beta$  rather than  $A \rightarrow \beta$  as a handle. Note that we say "a handle" rather than "the handle", because the grammar could be ambiguous, with more than one rightmost derivation of  $\alpha\beta w$ . If a grammar is unambiguous, then every right-sentential form of the grammar has exactly one handle.

A rightmost derivation in reverse can be obtained by "handle pruning". That is, we start with a string of terminals  $w$  to be parsed. If  $w$  is a sentence of the grammar at hand, then let  $w = \gamma_n$ , where  $\gamma_n$  is the  $n^{\text{th}}$  right-sentential form of some as yet unknown rightmost derivation

$$S = \gamma_0 \underset{rm}{\Rightarrow} \gamma_1 \underset{rm}{\Rightarrow} \gamma_2 \underset{rm}{\Rightarrow} \dots \underset{rm}{\Rightarrow} \gamma_{n-1} \underset{rm}{\Rightarrow} \gamma_n = w$$

Figure (b). A handle  $A \rightarrow \beta$  in the parse tree for  $\alpha\beta w$ .

To reconstruct this derivation in reverse order, we locate the handle  $\beta_n$  in  $\gamma_n$  and replace  $\beta_n$  by the head of the relevant production  $A_n \rightarrow \beta_n$  to obtain the previous right-sentential form  $\gamma_{n-1}$ . Note that we do not yet know how handles are to be found, but we shall see methods of doing so shortly.

We then repeat this process. That is, we locate the handle  $\beta_{n-1}$  in  $\gamma_{n-1}$  and reduce this handle to obtain the right-sentential form  $\gamma_{n-2}$ . If by continuing this process we produce a right-sentential form consisting only of the start symbol  $S$ , then we halt and announce successful completion of parsing.

## Shift-Reduce parsing:

Shift-reduce parsing is a form of bottom up parsing in which a stack holds grammar symbols and an input buffer holds the rest of the string to be parsed.

→ We use \$ to mark the bottom of the stack and also the right end of the input. Conventionally, when discussing bottom-up parsing, we show the top of the stack on the right, rather than on the left as we did for top-down parsing.

Initially, the stack is empty, and string  $w$  is on the input, as follows:

| STACK | INPUT |
|-------|-------|
| \$    | $w\$$ |

During a left-to-right scan of the input string, the parser shifts zero or more input symbols onto the stack until it is ready to reduce a string  $\beta$  of grammar symbols on top of the stack. It then reduces  $\beta$  to the head of the appropriate production. The parser repeats this cycle until it has detected an error or until the stack contains the start symbol and the input is empty.

## Actions in shift-reduce parsing:

1. Shift: shift the next input symbol onto the top of stack
2. Reduce: The right end of the string to be reduced must be at the top of the stack. Locate the left end of the string within the stack and decide with what non-terminal to replace the string

3. Accept: Announce successful completion of parsing

4. Error: Discover a syntax error and call an error recovery routine

Find the handles for the given RSF (right-sentential form) and construct the shift-reduce parser:

$$\begin{aligned}1. \quad E &\rightarrow E+T \mid T \\ T &\rightarrow T*F \mid F \\ F &\rightarrow (E) \mid id\end{aligned}$$

i/p: id + id  
id + id \* id

$$\begin{aligned}\text{RMD:} \quad E &\Rightarrow E+T \\ &\Rightarrow E+F \\ &\Rightarrow E+id \\ &\Rightarrow T+id \\ &\Rightarrow F+id \\ &\Rightarrow id+id\end{aligned}$$

| RSF         | Handle | Action              |
|-------------|--------|---------------------|
| $id_1+id_2$ | $id_1$ | $F \rightarrow id$  |
| $F+id_2$    | $F$    | $T \rightarrow F$   |
| $T+id_2$    | $T$    | $E \rightarrow T$   |
| $E+id_2$    | $id_2$ | $F \rightarrow id$  |
| $E+F$       | $F$    | $T \rightarrow F$   |
| $E+T$       | $E+T$  | $E \rightarrow E+T$ |

| Stack       | Input          | Action                       |
|-------------|----------------|------------------------------|
| \$          | $id_1+id_2 \$$ | shift $id_1$                 |
| \$ $id_1$   | $+id_2 \$$     | reduce $F \rightarrow id$    |
| \$ $F$      | $+id_2 \$$     | reduce $T \rightarrow F$     |
| \$ $T$      | $+id_2 \$$     | reduce $E \rightarrow T$     |
| \$ $E$      | $+id_2 \$$     | shift +                      |
| \$ $E+$     | $id_2 \$$      | shift $id_2$                 |
| \$ $E+id_2$ | \$             | reduce $F \rightarrow id$    |
| \$ $E+F$    | \$             | reduce $T \rightarrow F$     |
| \$ $E+T$    | \$             | reduce $E \rightarrow E+T$   |
| \$ $E$      | \$             | accept                       |

ip: id + id \* id

RMD:

$E \Rightarrow E + T$  
 $\Rightarrow E + T * F$   
 $\Rightarrow E + T * id$   
 $\Rightarrow E + F * id$   
 $\Rightarrow E + id * id$   
 $\Rightarrow T + id * id$   
 $\Rightarrow F + id * id$   
 $\Rightarrow id + id * id$

---

| RSF                  | Handle  | Action                |
|----------------------|---------|-----------------------|
| $id_1 + id_2 * id_3$ | $id_1$  | $F \rightarrow id$    |
| $F + id_2 * id_3$    | $F$     | $T \rightarrow F$     |
| $T + id_2 * id_3$    | $T$     | $E \rightarrow T$     |
| $E + id_2 * id_3$    | $id_2$  | $F \rightarrow id$    |
| $E + F * id_3$       | $F$     | $T \rightarrow F$     |
| $E + T * id_3$       | $id_3$  | $F \rightarrow id$    |
| $E + T * F$          | $T * F$ | $T \rightarrow T * F$ |
| $E + T$              | $E + T$ | $E \rightarrow E + T$ |
| $E$                  |         |                       |

| Stack               | Input                   | Action                       |
|---------------------|-------------------------|------------------------------|
| \$                  | $id_1 + id_2 * id_3 \$$ | shift $id_1$                 |
| $\$ id_1$           | $+ id_2 * id_3 \$$      | reduce $F \rightarrow id$    |
| $\$ F$              | $+ id_2 * id_3 \$$      | reduce $T \rightarrow F$     |
| $\$ T$              | $+ id_2 * id_3 \$$      | reduce $E \rightarrow T$     |
| $\$ E$              | $+ id_2 * id_3 \$$      | shift +                      |
| $\$ E +$            | $id_2 * id_3 \$$        | shift $id_2$                 |
| $\$ E + id_2$       | $* id_3 \$$             | reduce $F \rightarrow id$    |
| $\$ E + F$          | $* id_3 \$$             | reduce $T \rightarrow F$     |
| $\$ E + T$          | $* id_3 \$$             | shift *                      |
| $\$ E + T *$        | $id_3 \$$               | shift $id_3$                 |
| $\$ E + T * id_3$   | \$                      | reduce $F \rightarrow id$    |
| $\$ E + T * F$      | \$                      | reduce $T \rightarrow T * F$ |
| $\$ E + T$          | \$                      | reduce $E \rightarrow E + T$ |
| $\$ E$              | \$                      | accept                       |

2)  $S \rightarrow 0S1 \mid 01$   i/p: 000111

$S \underset{rm}{\Rightarrow} 0S1$

$\Rightarrow 00S11$

$\Rightarrow 000111$

| RSF    | Handle | Action              |
|--------|--------|---------------------|
| 000111 | 01     | $S \rightarrow 01$  |
| 00S11  | 0S1    | $S \rightarrow 0S1$ |
| 0S1    | 0S1    | $S \rightarrow 0S1$ |

| Stack    | Input    | Action                     |
|----------|----------|----------------------------|
| \$       | 000111\$ | shift 0                    |
| \$0      | 00111\$  | shift 0                    |
| \$00     | 0111\$   | shift 0                    |
| \$000    | 111\$    | shift 1                    |
| \$0001   | 11\$     | reduce $S \rightarrow 01$  |
| \$00S    | 11\$     | shift 1                    |
| \$00S1   | 1\$      | reduce $S \rightarrow 0S1$ |
| \$0S     | 1\$      | shift 1                    |
| \$0S1    | \$       | reduce $S \rightarrow 0S1$ |
| \$S      | \$       | accept                     |
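The shift and reduce moves of this example can be reproduced with a very small program. The sketch below assumes a naive strategy: reduce whenever some production body lies on top of the stack, otherwise shift. That strategy happens to find the handles for $S \rightarrow 0S1 \mid 01$, but it is not a general way of locating handles. The names `PRODUCTIONS` and `shift_reduce` are illustrative.

```python
# Grammar S -> 0 S 1 | 0 1, as (head, body) pairs.
PRODUCTIONS = [("S", ["0", "S", "1"]), ("S", ["0", "1"])]


def shift_reduce(tokens, start="S"):
    """Return (accepted, actions) for the token sequence."""
    stack, actions = [], []  # the top of the stack is the end of the list
    remaining = list(tokens)
    while True:
        for head, body in PRODUCTIONS:
            if stack[-len(body):] == body:  # body found on top of the stack
                del stack[-len(body):]
                stack.append(head)
                actions.append(f"reduce {head} -> {' '.join(body)}")
                break
        else:  # no reduction possible: shift, or stop at end of input
            if not remaining:
                break
            symbol = remaining.pop(0)
            stack.append(symbol)
            actions.append(f"shift {symbol}")
    return stack == [start], actions
```

For the i/p `000111` the action list has the same nine shift and reduce steps as the table above, and the first component of the result reports whether only the start symbol is left on the stack.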

3)  $S \rightarrow SS+ \mid SS* \mid a$   i/p: aaa\*a++

$S \Rightarrow SS+$

$\Rightarrow SSS++$

$\Rightarrow SSa++$

$\Rightarrow SSS*a++$

$\Rightarrow SSa*a++$

$\Rightarrow Saa*a++$

$\Rightarrow aaa*a++$

| RSF       | Handle | Action              |
|-----------|--------|---------------------|
| aaa\*a++  | a      | $S \rightarrow a$   |
| Saa\*a++  | a      | $S \rightarrow a$   |
| SSa\*a++  | a      | $S \rightarrow a$   |
| SSS\*a++  | SS\*   | $S \rightarrow SS*$ |
| SSa++     | a      | $S \rightarrow a$   |
| SSS++     | SS+    | $S \rightarrow SS+$ |
| SS+       | SS+    | $S \rightarrow SS+$ |
| S         |        |                     |

| Stack   | Input      | Action                     |
|---------|------------|----------------------------|
| \$      | aaa\*a++\$ | shift a                    |
| \$a     | aa\*a++\$  | reduce $S \rightarrow a$   |
| \$S     | aa\*a++\$  | shift a                    |
| \$Sa    | a\*a++\$   | reduce $S \rightarrow a$   |
| \$SS    | a\*a++\$   | shift a                    |
| \$SSa   | \*a++\$    | reduce $S \rightarrow a$   |
| \$SSS   | \*a++\$    | shift \*                   |
| \$SSS\* | a++\$      | reduce $S \rightarrow SS*$ |
| \$SS    | a++\$      | shift a                    |
| \$SSa   | ++\$       | reduce $S \rightarrow a$   |
| \$SSS   | ++\$       | shift +                    |
| \$SSS+  | +\$        | reduce $S \rightarrow SS+$ |
| \$SS    | +\$        | shift +                    |
| \$SS+   | \$         | reduce $S \rightarrow SS+$ |
| \$S     | \$         | accept                     |

## Types of conflicts in shift-reduce parsers

### Conflicts during shift-reduce parsing

There are context-free grammars for which shift-reduce parsing cannot be used. Every shift-reduce parser for such a grammar can reach a configuration in which the parser, knowing the entire stack contents and the next input symbol, cannot decide what to do.

Types:

#### i) shift/reduce conflict:

→ The parser cannot decide whether to shift or to reduce.

#### ii) reduce/reduce conflict:

→ The parser cannot decide which of several reductions to make.

e.g.: Consider the grammar

$$E \rightarrow E+E \mid E*E \mid \text{NUM}$$

Input: 2 + 3 \* ...

| No. | Stack | Operation                                  |
|-----|-------|--------------------------------------------|
| 1   | 2     | shift 2                                    |
| 2   | E     | reduce $E \rightarrow \text{NUM}$          |
| 3   | E+    | shift +                                    |
| 4   | E+3   | shift 3                                    |
| 5   | E+E   | reduce $E \rightarrow \text{NUM}$          |
| 6   | E+E   | reduce $E \rightarrow E+E$  or  shift \*   |

i.e. in step 6 there is a shift/reduce conflict.

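ii) Reduce/reduce conflict: consider

$$\begin{array}{l} E \rightarrow T \\ T \rightarrow id \\ F \rightarrow id \end{array}$$

with input id. Once id is on the stack it matches the body of both  $T \rightarrow id$  and  $F \rightarrow id$ , so the parser cannot decide which reduction to make.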

eg 2) An ambiguous grammar can never be LR. For e.g. consider the dangling-else grammar

$$\begin{array}{l} \text{stmt} \rightarrow \text{if expr then stmt} \\ | \quad \quad \quad \text{if expr then stmt else stmt} \\ | \quad \quad \quad \text{other} \end{array}$$

If we have a shift-reduce parser in the configuration

$$\begin{array}{ll} \text{STACK} & \text{INPUT} \\ \dots \text{if expr then stmt} & \text{else} \dots \$ \end{array}$$

→ we cannot tell whether "if expr then stmt" is the handle, no matter what appears below it on the stack. Here there is a shift/reduce conflict. Depending on what follows the else on the input, it might be correct to reduce "if expr then stmt" to stmt, or it might be correct to shift else and then look for another stmt to complete the alternative "if expr then stmt else stmt".

eg 3) Consider the productions

- (1)  $\text{stmt} \rightarrow \text{id} (\text{parameter\_list})$
- (2)  $\text{stmt} \rightarrow \text{expr} := \text{expr}$
- (3)  $\text{parameter\_list} \rightarrow \text{parameter\_list}, \text{parameter}$
- (4)  $\text{parameter\_list} \rightarrow \text{parameter}$
- (5)  $\text{parameter} \rightarrow \text{id}$
- (6)  $\text{expr} \rightarrow \text{id} (\text{expr\_list})$
- (7)  $\text{expr} \rightarrow \text{id}$
- (8)  $\text{expr\_list} \rightarrow \text{expr\_list}, \text{expr}$
- (9)  $\text{expr\_list} \rightarrow \text{expr}$

A statement beginning with p(i, j) appears to the parser as the token stream id ( id , id ). After shifting the first three tokens, the parser is in the configuration

$$\begin{array}{ll} \text{STACK} & \text{INPUT} \\ \dots \text{id ( id} & \text{, id )} \dots \end{array}$$

Here the id on top of the stack must be reduced, but by which production? The correct choice is production (5) if p is a procedure, but production (7) if p is an array. The stack does not tell which; information in the symbol table obtained from the declaration of p must be used.

So in this case the parser cannot decide whether to reduce id to parameter or to expr. This is called a reduce/reduce conflict.

One solution is to change the token id in production (1) to procid and to use a more sophisticated lexical analyzer that returns the token procid when it recognizes an identifier that is the name of a procedure. With this modification, on processing p(i, j) the parser would be either in the configuration

$$\begin{array}{ll} \text{STACK} & \text{INPUT} \\ \dots \text{procid ( id} & \text{, id )} \dots \end{array}$$

or in the configuration above. In the former case, we choose reduction by production (5); in the latter case by production (7). Notice how the symbol third from the top of the stack determines the reduction to be made, even though it is not involved in the reduction. Shift-reduce parsing can utilize information far down in the stack to guide the parse.  $\square$

#### 4.6 OPERATOR-PRECEDENCE PARSING

The largest class of grammars for which shift-reduce parsers can be built successfully – the LR grammars – will be discussed in Section 4.7. However, for a small but important class of grammars we can easily construct efficient shift-reduce parsers by hand. These grammars have the property (among other essential requirements) that no production right side is  $\epsilon$  or has two adjacent nonterminals. A grammar with the latter property is called an *operator grammar*.

**Example 4.27.** The following grammar for expressions

$$\begin{aligned} E &\rightarrow EAE \mid (E) \mid -E \mid \text{id} \\ A &\rightarrow + \mid - \mid * \mid / \mid \uparrow \end{aligned}$$

is not an operator grammar, because the right side  $EAE$  has two (in fact three) consecutive nonterminals. However, if we substitute for  $A$  each of its alternatives, we obtain the following operator grammar:

$$E \rightarrow E+E \mid E-E \mid E*E \mid E/E \mid E \uparrow E \mid (E) \mid -E \mid \text{id} \quad (4.17)$$
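The two conditions named above (no $\epsilon$ right side, no two adjacent nonterminals) are easy to test mechanically. A minimal sketch, with the dictionary encoding of the grammars and the function name `is_operator_grammar` chosen only for illustration:

```python
def is_operator_grammar(grammar):
    """True if no body is empty and no body has two adjacent nonterminals."""
    nonterminals = set(grammar)
    for bodies in grammar.values():
        for body in bodies:
            if not body:  # epsilon right side
                return False
            for left, right in zip(body, body[1:]):
                if left in nonterminals and right in nonterminals:
                    return False
    return True


# Grammar of Example 4.27 before substituting for A.
WITH_A = {
    "E": [["E", "A", "E"], ["(", "E", ")"], ["-", "E"], ["id"]],
    "A": [["+"], ["-"], ["*"], ["/"], ["↑"]],
}
# Grammar (4.17): each alternative of A substituted into E -> E A E.
SUBSTITUTED = {
    "E": [["E", op, "E"] for op in "+-*/↑"]
    + [["(", "E", ")"], ["-", "E"], ["id"]],
}
```

`is_operator_grammar(WITH_A)` is false because of the body E A E, while `is_operator_grammar(SUBSTITUTED)` is true. Passing this test is necessary for operator-precedence parsing but does not by itself guarantee that a usable set of precedence relations exists.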

We now describe an easy-to-implement parsing technique called operator-precedence parsing. Historically, the technique was first described as a manipulation on tokens without any reference to an underlying grammar. In fact, once we finish building an operator-precedence parser from a grammar, we may effectively ignore the grammar, using the nonterminals on the stack only as placeholders for attributes associated with the nonterminals.

As a general parsing technique, operator-precedence parsing has a number of disadvantages. For example, it is hard to handle tokens like the minus sign, which has two different precedences (depending on whether it is unary or binary). Worse, since the relationship between a grammar for the language being parsed and the operator-precedence parser itself is tenuous, one cannot always be sure the parser accepts exactly the desired language. Finally, only a small class of grammars can be parsed using operator-precedence techniques.

Nevertheless, because of its simplicity, numerous compilers using operator-precedence parsing techniques for expressions have been built successfully. Often these parsers use recursive descent, described in Section 4.4, for statements and higher-level constructs. Operator-precedence parsers have even been built for entire languages.

In operator-precedence parsing, we define three disjoint *precedence relations*,  $<\cdot, \doteq, \cdot>$ , between certain pairs of terminals. These precedence relations guide the selection of handles and have the following meanings:

| RELATION      | MEANING                              |
|---------------|--------------------------------------|
| $a < \cdot b$ | $a$ "yields precedence to" $b$       |
| $a \doteq b$  | $a$ "has the same precedence as" $b$ |
| $a \cdot > b$ | $a$ "takes precedence over" $b$      |

We should caution the reader that while these relations may appear similar to the arithmetic relations "less than," "equal to," and "greater than," the precedence relations have quite different properties. For example, we could have  $a < \cdot b$  and  $a \cdot > b$  for the same language, or we might have none of  $a < \cdot b$ ,  $a \doteq b$ , and  $a \cdot > b$  holding for some terminals  $a$  and  $b$ .

There are two common ways of determining what precedence relations should hold between a pair of terminals. The first method we discuss is intuitive and is based on the traditional notions of associativity and precedence of operators. For example, if  $*$  is to have higher precedence than  $+$ , we make  $+ < \cdot *$  and  $* \cdot > +$ . This approach will be seen to resolve the ambiguities of grammar (4.17), and it enables us to write an operator-precedence parser for it (although the unary minus sign causes problems).

The second method of selecting operator-precedence relations is first to construct an unambiguous grammar for the language, a grammar that reflects the correct associativity and precedence in its parse trees. This job is not difficult for expressions; the syntax of expressions in Section 2.2 provides the paradigm. For the other common source of ambiguity, the dangling else grammar (4.9) is a useful model. Having obtained an unambiguous grammar, there is a mechanical method for constructing operator-precedence relations from it. These relations may not be disjoint, and they may parse a language other than that generated by the grammar, but with the standard sorts of arithmetic expressions, few problems are encountered in practice. We shall not discuss this construction here; see Aho and Ullman [1972b].

### Using Operator-Precedence Relations

The intention of the precedence relations is to delimit the handle of a right-sentential form, with  $<\cdot$  marking the left end,  $\doteq$  appearing in the interior of the handle, and  $\cdot >$  marking the right end. To be more precise, suppose we have a right-sentential form of an operator grammar. The fact that no adjacent nonterminals appear on the right sides of productions implies that no right-sentential form will have two adjacent nonterminals either. Thus, we may write the right-sentential form as  $\beta_0 a_1 \beta_1 \cdots a_n \beta_n$ , where each  $\beta_i$  is either  $\epsilon$  (the empty string) or a single nonterminal, and each  $a_i$  is a single terminal.

Suppose that between  $a_i$  and  $a_{i+1}$  exactly one of the relations  $<\cdot$ ,  $\doteq$ , and  $\cdot >$  holds. Further, let us use  $\$$  to mark each end of the string, and define  $\$ < \cdot b$  and  $b \cdot > \$$  for all terminals  $b$ . Now suppose we remove the nonterminals from the string and place the correct relation  $<\cdot$ ,  $\doteq$ , or  $\cdot >$  between each pair of terminals and between the endmost terminals and the \$'s marking the ends of the string. For example, suppose we initially have the right-sentential form  $\text{id} + \text{id} * \text{id}$  and the precedence relations are those given in Fig. 4.23. These relations are some of those that we would choose to parse according to grammar (4.17).

|           | <b>id</b> | <b>+</b> | <b>*</b> | <b>\$</b> |
|-----------|-----------|----------|----------|-----------|
| <b>id</b> |           | $\cdot>$ | $\cdot>$ | $\cdot>$  |
| <b>+</b>  | $<\cdot$  | $\cdot>$ | $<\cdot$ | $\cdot>$  |
| <b>*</b>  | $<\cdot$  | $\cdot>$ | $\cdot>$ | $\cdot>$  |
| <b>\$</b> | $<\cdot$  | $<\cdot$ | $<\cdot$ |           |

Fig. 4.23. Operator-precedence relations.

Then the string with the precedence relations inserted is:

$$\$ < \cdot \text{id} \cdot > + < \cdot \text{id} \cdot > * < \cdot \text{id} \cdot > \$ \quad (4.18)$$

For example,  $<\cdot$  is inserted between the leftmost \$ and id since  $<\cdot$  is the entry in row \$ and column id. The handle can be found by the following process.

1. Scan the string from the left end until the first  $\cdot >$  is encountered. In (4.18) above, this occurs between the first id and +.
2. Then scan backwards (to the left) over any  $\doteq$ 's until a  $< \cdot$  is encountered. In (4.18), we scan backwards to \$.
3. The handle contains everything to the left of the first  $\cdot >$  and to the right of the  $< \cdot$  encountered in step (2), including any intervening or surrounding nonterminals. (The inclusion of surrounding nonterminals is necessary so that two adjacent nonterminals do not appear in a right-sentential form.) In (4.18), the handle is the first id.
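The three steps above can be sketched in Python. This is only an illustration under stated assumptions: the relations of Fig. 4.23 are encoded as the strings `"<"`, `"="`, and `">"` in a dictionary named `REL`, a missing key stands for a blank (error) entry, and `find_handle` is a hypothetical helper name, not part of the text.

```python
# Precedence relations of Fig. 4.23 ("<" is <., ">" is .>); a missing key
# means that no relation holds between the pair.
REL = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}


def find_handle(terminals):
    """Return (start, end), inclusive indices into `terminals`, of the handle.

    `terminals` is the right-sentential form with nonterminals removed and
    without the surrounding $ markers.
    """
    s = ["$"] + list(terminals) + ["$"]
    # rels[i] is the relation between s[i] and s[i + 1].
    rels = [REL[(s[i], s[i + 1])] for i in range(len(s) - 1)]
    right = rels.index(">")        # step 1: first .> from the left
    left = right
    while rels[left] != "<":       # step 2: scan back over any = relations
        left -= 1
    # Step 3: the handle is s[left + 1 .. right]; subtracting one from each
    # index drops the leading $ and gives positions in `terminals`.
    return left, right - 1


print(find_handle(["id", "+", "id", "*", "id"]))
print(find_handle(["+", "*"]))
```

The first call corresponds to string (4.18), where the handle is the first id; the second corresponds to the string obtained from $E + E * E$, where the handle's only terminal is `*`.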

If we are dealing with grammar (4.17), we then reduce id to E. At this point we have the right-sentential form  $E + \text{id} * \text{id}$ . After reducing the two remaining id's to E by the same steps, we obtain the right-sentential form  $E + E * E$ . Consider now the string  $\$ + * \$$  obtained by deleting the nonterminals. Inserting the precedence relations, we get

$$\$ < \cdot + < \cdot * \cdot > \$$$

indicating that the left end of the handle lies between + and \* and the right end between \* and \$. These precedence relations indicate that, in the right-sentential form  $E + E * E$ , the handle is  $E * E$ . Note how the E's surrounding the \* become part of the handle.

Since the nonterminals do not influence the parse, we need not worry about distinguishing among them. A single marker "nonterminal" can be kept on the stack of a shift-reduce parser to indicate placeholders for attribute values.

It may appear from the discussion above that the entire right-sentential form must be scanned at each step to find the handle. Such is not the case if we use a stack to store the input symbols already seen and if the precedence relations are used to guide the actions of a shift-reduce parser. If the precedence relation  $<\cdot$  or  $\doteq$  holds between the topmost terminal symbol on the stack and the next input symbol, the parser shifts; it has not yet found the right end of the handle. If the relation  $\cdot >$  holds, a reduction is called for. At this point the parser has found the right end of the handle, and the precedence relations can be used to find the left end of the handle in the stack.

If no precedence relation holds between a pair of terminals (indicated by a blank entry in Fig. 4.23), then a syntactic error has been detected and an error recovery routine must be invoked, as discussed later in this section. The above ideas can be formalized by the following algorithm.

**Algorithm 4.5.** Operator-precedence parsing algorithm.

*Input.* An input string  $w$  and a table of precedence relations.

*Output.* If  $w$  is well formed, a *skeletal* parse tree, with a placeholder nonterminal  $E$  labeling all interior nodes; otherwise, an error indication.

*Method.* Initially, the stack contains  $\$$  and the input buffer the string  $w\$$ . To parse, we execute the program of Fig. 4.24.  $\square$

```
(1)  set ip to point to the first symbol of w$;
(2)  repeat forever
(3)      if $ is on top of the stack and ip points to $ then
(4)          return
         else begin
(5)          let a be the topmost terminal symbol on the stack
                 and let b be the symbol pointed to by ip;
(6)          if a <· b or a ≐ b then begin
(7)              push b onto the stack;
(8)              advance ip to the next input symbol;
             end
(9)          else if a ·> b then      /* reduce */
(10)             repeat
(11)                 pop the stack
(12)             until the top stack terminal is related by <·
                     to the terminal most recently popped
(13)         else error()
         end
```

Fig. 4.24. Operator-precedence parsing algorithm.
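A minimal Python sketch of the loop in Fig. 4.24 may help when tracing examples by hand. It assumes the relations are stored in a dictionary keyed by terminal pairs (here the table of Fig. 4.23, with `"<"`, `"="`, `">"` as stand-ins for the three relations) and, like the figure, keeps only terminals on the stack. The names `REL` and `parse` are illustrative, and the function returns the list of reduced handles rather than building a skeletal parse tree.

```python
# Fig. 4.23 encoded as a dictionary; a missing key is a blank (error) entry.
REL = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}


def parse(tokens, rel):
    """Run the shift-reduce loop and return the handles in reduction order."""
    stack = ["$"]                       # terminals only; nonterminals implicit
    buf = list(tokens) + ["$"]
    ip = 0
    handles = []
    while True:
        a, b = stack[-1], buf[ip]
        if a == "$" and b == "$":
            return handles              # accept
        r = rel.get((a, b))
        if r in ("<", "="):             # shift
            stack.append(b)
            ip += 1
        elif r == ">":                  # reduce
            popped = []
            while True:
                popped.append(stack.pop())
                # Stop once the new top terminal yields precedence to the
                # terminal most recently popped.
                if rel.get((stack[-1], popped[-1])) == "<":
                    break
            handles.append(popped[::-1])
        else:
            raise SyntaxError(f"no relation between {a!r} and {b!r}")


print(parse(["id", "+", "id", "*", "id"], REL))
```

On `id + id * id` the three id's are reduced first, then `*`, then `+`, which reflects the higher precedence of `*`.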

### Operator-Precedence Relations from Associativity and Precedence

We are always free to create operator-precedence relations any way we see fit and hope that the operator-precedence parsing algorithm will work correctly when guided by them. For a language of arithmetic expressions such as that generated by grammar (4.17) we can use the following heuristic to produce a proper set of precedence relations. Note that grammar (4.17) is ambiguous, and right-sentential forms could have many handles. Our rules are designed to select the "proper" handles to reflect a given set of associativity and precedence rules for binary operators.

1. If operator  $\theta_1$  has higher precedence than operator  $\theta_2$ , make  $\theta_1 \cdot > \theta_2$  and  $\theta_2 < \cdot \theta_1$ . For example, if  $*$  has higher precedence than  $+$ , make  $* \cdot > +$  and  $+ < \cdot *$ . These relations ensure that, in an expression of the form  $E + E * E + E$ , the central  $E * E$  is the handle that will be reduced first.
2. If  $\theta_1$  and  $\theta_2$  are operators of equal precedence (they may in fact be the same operator), then make  $\theta_1 \cdot > \theta_2$  and  $\theta_2 \cdot > \theta_1$  if the operators are left-associative, or make  $\theta_1 < \cdot \theta_2$  and  $\theta_2 < \cdot \theta_1$  if they are right-associative. For example, if  $+$  and  $-$  are left-associative, then make  $+ \cdot > +$ ,  $+ \cdot > -$ ,  $- \cdot > -$ , and  $- \cdot > +$ . If  $\uparrow$  is right-associative, then make  $\uparrow < \cdot \uparrow$ . These relations ensure that  $E - E + E$  will have handle  $E - E$  selected and  $E \uparrow E \uparrow E$  will have the last  $E \uparrow E$  selected.
3. Make  $\theta < \cdot id$ ,  $id \cdot > \theta$ ,  $\theta < \cdot ($ ,  $( < \cdot \theta$ ,  $) \cdot > \theta$ ,  $\theta \cdot > )$ ,  $\theta \cdot > \$$ , and  $\$ < \cdot \theta$  for all operators  $\theta$ . Also, let

$$\begin{array}{lll} ( \;\doteq\; ) & \$ <\cdot ( & \$ <\cdot \text{id} \\ ( <\cdot ( & \text{id} \cdot> \$ & ) \cdot> \$ \\ ( <\cdot \text{id} & \text{id} \cdot> ) & ) \cdot> ) \end{array}$$

These rules ensure that both  $id$  and  $(E)$  will be reduced to  $E$ . Also,  $\$$  serves as both the left and right endmarker, causing handles to be found between  $\$$ 's wherever possible.

**Example 4.28.** Figure 4.25 contains the operator-precedence relations for grammar (4.17), assuming

1.  $\uparrow$  is of highest precedence and right-associative,
2.  $*$  and  $/$  are of next highest precedence and left-associative, and
3.  $+$  and  $-$  are of lowest precedence and left-associative.

(Blanks denote error entries.) The reader should try out the table to see that it works correctly, ignoring problems with unary minus for the moment. Try the table on the input  $\text{id} * (\text{id} \uparrow \text{id}) - \text{id} / \text{id}$, for example.  $\square$

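|              | +        | -        | *        | /        | $\uparrow$ | id       | (        | )        | \$       |
|--------------|----------|----------|----------|----------|------------|----------|----------|----------|----------|
| +            | $\cdot>$ | $\cdot>$ | $<\cdot$ | $<\cdot$ | $<\cdot$   | $<\cdot$ | $<\cdot$ | $\cdot>$ | $\cdot>$ |
| -            | $\cdot>$ | $\cdot>$ | $<\cdot$ | $<\cdot$ | $<\cdot$   | $<\cdot$ | $<\cdot$ | $\cdot>$ | $\cdot>$ |
| *            | $\cdot>$ | $\cdot>$ | $\cdot>$ | $\cdot>$ | $<\cdot$   | $<\cdot$ | $<\cdot$ | $\cdot>$ | $\cdot>$ |
| /            | $\cdot>$ | $\cdot>$ | $\cdot>$ | $\cdot>$ | $<\cdot$   | $<\cdot$ | $<\cdot$ | $\cdot>$ | $\cdot>$ |
| $\uparrow$   | $\cdot>$ | $\cdot>$ | $\cdot>$ | $\cdot>$ | $<\cdot$   | $<\cdot$ | $<\cdot$ | $\cdot>$ | $\cdot>$ |
| id           | $\cdot>$ | $\cdot>$ | $\cdot>$ | $\cdot>$ | $\cdot>$   |          |          | $\cdot>$ | $\cdot>$ |
| (            | $<\cdot$ | $<\cdot$ | $<\cdot$ | $<\cdot$ | $<\cdot$   | $<\cdot$ | $<\cdot$ | $\doteq$ |          |
| )            | $\cdot>$ | $\cdot>$ | $\cdot>$ | $\cdot>$ | $\cdot>$   |          |          | $\cdot>$ | $\cdot>$ |
| \$           | $<\cdot$ | $<\cdot$ | $<\cdot$ | $<\cdot$ | $<\cdot$   | $<\cdot$ | $<\cdot$ |          |          |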

Fig. 4.25. Operator-precedence relations.
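Rules 1 to 3 are mechanical enough to be written as a short program. The sketch below is one possible encoding, not a definitive one: operators are listed from lowest to highest precedence with their associativity, `"^"` is used as an ASCII stand-in for the exponentiation operator, the strings `"<"`, `"="`, `">"` stand for the three relations, and `build_relations` is a hypothetical name. Pairs left out of the returned dictionary correspond to blank (error) entries.

```python
def build_relations(levels):
    """Build operator-precedence relations from precedence and associativity.

    levels: list of (operators, assoc) pairs ordered from lowest to highest
    precedence, where assoc is "left" or "right".
    Returns a dict {(a, b): "<" | "=" | ">"}.
    """
    prec, assoc = {}, {}
    for rank, (ops, a) in enumerate(levels):
        for op in ops:
            prec[op], assoc[op] = rank, a

    rel = {}
    for t1 in prec:
        for t2 in prec:
            if prec[t1] > prec[t2]:          # rule 1
                rel[(t1, t2)] = ">"
            elif prec[t1] < prec[t2]:        # rule 1
                rel[(t1, t2)] = "<"
            else:                            # rule 2: equal precedence
                rel[(t1, t2)] = ">" if assoc[t1] == "left" else "<"

    for t in prec:                           # rule 3, for every operator
        rel[(t, "id")] = "<"
        rel[("id", t)] = ">"
        rel[(t, "(")] = "<"
        rel[("(", t)] = "<"
        rel[(")", t)] = ">"
        rel[(t, ")")] = ">"
        rel[(t, "$")] = ">"
        rel[("$", t)] = "<"

    rel.update({                             # rule 3, the fixed entries
        ("(", ")"): "=", ("$", "("): "<", ("$", "id"): "<",
        ("(", "("): "<", ("id", "$"): ">", (")", "$"): ">",
        ("(", "id"): "<", ("id", ")"): ">", (")", ")"): ">",
    })
    return rel


rel = build_relations([(["+", "-"], "left"),
                       (["*", "/"], "left"),
                       (["^"], "right")])
print(rel[("+", "*")], rel[("-", "+")], rel[("^", "^")])
```

With the three assumptions of Example 4.28 as input, the dictionary has 74 entries, leaving exactly seven pairs undefined, such as (id, id) and (\$, \$).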

### Handling Unary Operators

If we have a unary operator such as  $\sim$  (logical negation), which is not also a binary operator, we can incorporate it into the above scheme for creating operator-precedence relations. Supposing  $\sim$  to be a unary prefix operator, we make  $\theta <\cdot \sim$  for any operator  $\theta$ , whether unary or binary. We make  $\sim \cdot> \theta$  if  $\sim$  has higher precedence than  $\theta$  and  $\sim <\cdot \theta$  if not. For example, if  $\sim$  has higher precedence than  $\&$ , and  $\&$  is left-associative, we would group  $E\&\sim E\&E$  as  $(E\&(\sim E))\&E$ , by these rules. The rule for unary postfix operators is analogous.

The situation changes when we have an operator like the minus sign  $-$  that is both unary prefix and binary infix. Even if we give unary and binary minus the same precedence, the table of Fig. 4.25 will fail to parse strings like  $id * - id$  correctly. The best approach in this case is to use the lexical analyzer to distinguish between unary and binary minus, by having it return a different token when it sees unary minus. Unfortunately, the lexical analyzer cannot use lookahead to distinguish the two; it must remember the previous token. In Fortran, for example, a minus sign is unary if the previous token was an operator, a left parenthesis, a comma, or an assignment symbol.
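The idea of remembering the previous token can be sketched as follows. This is a toy tokenizer written for illustration only: the token name `neg` for unary minus, the regular expression, and the decision to treat a minus at the very start of the input as unary are all assumptions made here, not details given in the text.

```python
import re

# A minus sign is taken to be unary when the previous token is one of these
# (an operator, a left parenthesis, a comma, or an assignment symbol).
UNARY_CONTEXT = {"+", "-", "*", "/", "(", ",", "=", "neg"}


def tokenize(text):
    """Split an expression into tokens, renaming unary minus to 'neg'."""
    tokens = []
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|[-+*/(),=]", text):
        prev = tokens[-1] if tokens else None    # no lookahead, only memory
        if lexeme == "-" and (prev is None or prev in UNARY_CONTEXT):
            tokens.append("neg")
        else:
            tokens.append(lexeme)
    return tokens


print(tokenize("a * - b"))
print(tokenize("a - b"))
```

The parser can then give `neg` its own row and column in the precedence table, independent of binary minus.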

### Precedence Functions

Compilers using operator-precedence parsers need not store the table of precedence relations. In most cases, the table can be encoded by two *precedence functions*  $f$  and  $g$  that map terminal symbols to integers. We attempt to select  $f$  and  $g$  so that, for symbols  $a$  and  $b$ ,

1.  $f(a) < g(b)$  whenever  $a < \cdot b$ ,
2.  $f(a) = g(b)$  whenever  $a \doteq b$ , and
3.  $f(a) > g(b)$  whenever  $a \cdot > b$ .

Thus the precedence relation between  $a$  and  $b$  can be determined by a numerical comparison between  $f(a)$  and  $g(b)$ . Note, however, that error entries in the precedence matrix are obscured, since one of (1), (2), or (3) holds no matter what  $f(a)$  and  $g(b)$  are. The loss of error detection capability is generally not considered serious enough to prevent the use of precedence functions where possible; errors can still be caught when a reduction is called for and no handle can be found.

Not every table of precedence relations has precedence functions to encode it, but in practical cases the functions usually exist.

**Example 4.29.** The precedence table of Fig. 4.25 has the following pair of precedence functions,

|     | + | - | * | / | ↑ | ( | ) | id | \$ |
|-----|---|---|---|---|---|---|---|----|----|
| $f$ | 2 | 2 | 4 | 4 | 4 | 0 | 6 | 6  | 0  |
| $g$ | 1 | 1 | 3 | 3 | 5 | 5 | 0 | 5  | 0  |

For example,  $* <\cdot \text{id}$ , and  $f(*) < g(\text{id})$ . Note that  $f(\text{id}) > g(\text{id})$  suggests that  $\text{id} \cdot > \text{id}$ ; but, in fact, no precedence relation holds between  id  and  id . Other error entries in Fig. 4.25 are similarly replaced by one or another precedence relation.  $\square$
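To make the encoding concrete, the functions of Example 4.29 can be stored as two small dictionaries and compared numerically. The names `F`, `G`, and `relation` are illustrative, and `"^"` is an ASCII stand-in for the exponentiation operator.

```python
# Precedence functions of Example 4.29.
F = {"+": 2, "-": 2, "*": 4, "/": 4, "^": 4, "(": 0, ")": 6, "id": 6, "$": 0}
G = {"+": 1, "-": 1, "*": 3, "/": 3, "^": 5, "(": 5, ")": 0, "id": 5, "$": 0}


def relation(a, b):
    """Recover the precedence relation between a and b from f(a) and g(b)."""
    fa, gb = F[a], G[b]
    if fa < gb:
        return "<"
    if fa > gb:
        return ">"
    return "="


print(relation("*", "id"))    # the pair discussed in the example
print(relation("(", ")"))
print(relation("id", "id"))   # an error entry in the table, obscured here
```

The two dictionaries hold 2n integers for n terminals, against n² entries for the full table; the price is that every pair now appears to be related, as the last call shows.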

A simple method for finding precedence functions for a table, if such functions exist, is the following.

**Algorithm 4.6.** Constructing precedence functions.

*Input.* An operator precedence matrix.

*Output.* Precedence functions representing the input matrix, or an indication that none exist.

*Method.*

1. Create symbols  $f_a$  and  $g_a$  for each  $a$  that is a terminal or  $\$$ .
2. Partition the created symbols into as many groups as possible, in such a way that if  $a \doteq b$ , then  $f_a$  and  $g_b$  are in the same group. Note that we may have to put symbols in the same group even if they are not related by  $\doteq$ . For example, if  $a \doteq b$  and  $c \doteq b$ , then  $f_a$  and  $f_c$  must be in the same group, since they are both in the same group as  $g_b$ . If, in addition,  $c \doteq d$ , then  $f_a$  and  $g_d$  are in the same group even though  $a \doteq d$  may not hold.
3. Create a directed graph whose nodes are the groups found in (2). For any  $a$  and  $b$ , if  $a <\cdot b$ , place an edge from the group of  $g_b$  to the group of  $f_a$ . If  $a \cdot > b$ , place an edge from the group of  $f_a$  to that of  $g_b$ . Note that an edge or path from  $f_a$  to  $g_b$  means that  $f(a)$  must exceed  $g(b)$ ; a path from  $g_b$  to  $f_a$  means that  $g(b)$  must exceed  $f(a)$ .
4. If the graph constructed in (3) has a cycle, then no precedence functions exist. If there are no cycles, let  $f(a)$  be the length of the longest path beginning at the group of  $f_a$ ; let  $g(a)$  be the length of the longest path from the group of  $g_a$ .  $\square$

**Example 4.30.** Consider the matrix of Fig. 4.23. There are no  $\doteq$  relationships, so each symbol is in a group by itself. Figure 4.26 shows the graph constructed using Algorithm 4.6.



Fig. 4.26. Graph representing precedence functions.

There are no cycles, so precedence functions exist. As  $f_{\$}$  and  $g_{\$}$  have no out-edges,  $f(\$) = g(\$) = 0$ . The longest path from  $g_+$  has length 1, so  $g(+) = 1$ . There is a path from  $g_{\text{id}}$  to  $f_*$  to  $g_*$  to  $f_+$  to  $g_+$  to  $f_{\$}$ , so  $g(\text{id}) = 5$ . The resulting precedence functions are:

|     | + | * | id | \$ |
|-----|---|---|----|----|
| $f$ | 2 | 4 | 4  | 0  |
| $g$ | 1 | 3 | 5  | 0  |

$\square$

### Error Recovery in Operator-Precedence Parsing

There are two points in the parsing process at which an operator-precedence parser can discover syntactic errors:

1. If no precedence relation holds between the terminal on top of the stack and the current input.<sup>1</sup>
2. If a handle has been found, but there is no production with this handle as a right side.

Recall that the operator-precedence parsing algorithm (Algorithm 4.5) appears to reduce handles composed of terminals only. However, while nonterminals are treated anonymously, they still have places held for them on the parsing stack. Thus when we talk in (2) above about a handle matching a production's right side, we mean that the terminals are the same and the positions occupied by nonterminals are the same.

<sup>1</sup> In compilers using precedence functions to represent the precedence tables, this source of error detection may be unavailable.

We should observe that, besides (1) and (2) above, there are no other points at which errors could be detected. When scanning down the stack to find the left end of the handle in steps (10-12) of Fig. 4.24, the operator-precedence parsing algorithm, we are sure to find a  $<\cdot$  relation, since \$ marks the bottom of stack and is related by  $<\cdot$  to any symbol that could appear immediately above it on the stack. Note also that we never allow adjacent symbols on the stack in Fig. 4.24 unless they are related by  $<\cdot$  or  $\doteq$ . Thus steps (10-12) must succeed in making a reduction.

Just because we find a sequence of symbols  $a <\cdot b_1 \doteq b_2 \doteq \dots \doteq b_k$  on the stack, however, does not mean that  $b_1 b_2 \dots b_k$  is the string of terminal symbols on the right side of some production. We did not check for this condition in Fig. 4.24, but we clearly can do so, and in fact we must do so if we wish to associate semantic rules with reductions. Thus we have an opportunity to detect errors in Fig. 4.24, modified at steps (10-12) to determine what production is the handle in a reduction.

#### *Handling Errors During Reductions*

We may divide the error detection and recovery routine into several pieces. One piece handles errors of type (2). For example, this routine might pop symbols off the stack just as in steps (10-12) of Fig. 4.24. However, as there is no production to reduce by, no semantic actions are taken; a diagnostic message is printed instead. To determine what the diagnostic should say, the routine handling case (2) must decide what production the right side being popped "looks like." For example, suppose *abc* is popped, and there is no production right side consisting of *a*, *b* and *c* together with zero or more nonterminals. Then we might consider if deletion of one of *a*, *b*, and *c* yields a legal right side (nonterminals omitted). For example, if there were a right side *aEcE*, we might issue the diagnostic

*illegal b on line (line containing b)*

We might also consider changing or inserting a terminal. Thus if *abEdc* were a right side, we might issue a diagnostic

*missing d on line (line containing c)*

We may also find that there is a right side with the proper sequence of terminals, but the wrong pattern of nonterminals. For example, if *abc* is popped off the stack with no intervening or surrounding nonterminals, and *abc* is not a right side but *aEbc* is, we might issue a diagnostic

*missing E on line (line containing b)*

Here  $E$  stands for an appropriate syntactic category represented by nonterminal  $E$ . For example, if  $a$ ,  $b$ , or  $c$  is an operator, we might say "expression;" if  $a$  is a keyword like `if`, we might say "conditional."

In general, the difficulty of determining appropriate diagnostics when no legal right side is found depends upon whether there are a finite or infinite number of possible strings that could be popped in lines (10-12) of Fig. 4.24. Any such string  $b_1 b_2 \dots b_k$  must have  $\doteq$  relations holding between adjacent symbols, so  $b_1 \doteq b_2 \doteq \dots \doteq b_k$ . If an operator precedence table tells us that there are only a finite number of sequences of terminals related by  $\doteq$ , then we can handle these strings on a case-by-case basis. For each such string  $x$  we can determine in advance a minimum-distance legal right side  $y$  and issue a diagnostic implying that  $x$  was found when  $y$  was intended.

It is easy to determine all strings that could be popped from the stack in steps (10-12) of Fig. 4.24. These are evident in the directed graph whose nodes represent the terminals, with an edge from  $a$  to  $b$  if and only if  $a \doteq b$ . Then the possible strings are the labels of the nodes along paths in this graph. Paths consisting of a single node are possible. However, in order for a path  $b_1 b_2 \dots b_k$  to be "poppable" on some input, there must be a symbol  $a$  (possibly  $\$$ ) such that  $a <\cdot b_1$ . Call such a  $b_1$  *initial*. Also, there must be a symbol  $c$  (possibly  $\$$ ) such that  $b_k \cdot> c$ . Call  $b_k$  *final*. Only then could a reduction be called for and  $b_1 b_2 \dots b_k$  be the sequence of symbols popped. If the graph has a path from an initial to a final node containing a cycle, then there are an infinity of strings that might be popped; otherwise, there are only a finite number.
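The graph argument can be tried out with a small Python sketch. It uses a cut-down table (only `+`, id, parentheses, and `$`, with `"."` marking a blank entry) chosen here for brevity, and it is deliberately conservative: it reports "infinitely many" (by returning `None`) as soon as any cycle is reachable from an initial node, without checking that the cycle can also reach a final node. The names `REL`, `SYMS`, and `poppable_strings` are illustrative.

```python
SYMS = ["+", "id", "(", ")", "$"]
ROWS = {                 # row symbol -> relations against SYMS, "." = blank
    "+":  "> < < > >",
    "id": "> . . > >",
    "(":  "< < < = .",
    ")":  "> . . > >",
    "$":  "< < < . .",
}
REL = {(a, b): r for a, row in ROWS.items()
       for b, r in zip(SYMS, row.split()) if r != "."}


def poppable_strings(rel, symbols):
    """List the terminal strings one reduction may pop, or None if a cycle
    of = relations is reachable from an initial node."""
    def related(a, b, r):
        return rel.get((a, b)) == r

    succ = {a: [b for b in symbols if related(a, b, "=")] for a in symbols}
    initial = [b for b in symbols if any(related(a, b, "<") for a in symbols)]
    final = {b for b in symbols if any(related(b, c, ">") for c in symbols)}
    found = []

    def walk(path):
        if path[-1] in final:
            found.append(path)
        for nxt in succ[path[-1]]:
            if nxt in path:              # the path would loop forever
                return False
            if not walk(path + [nxt]):
                return False
        return True

    for start in initial:
        if not walk([start]):
            return None
    return found


print(poppable_strings(REL, SYMS))
```

For this table the result is the three strings `+`, `id`, and `( )`, each of which is the terminal part of some right side.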



Fig. 4.27. Graph for precedence matrix of Fig. 4.25.

**Example 4.31.** Let us reconsider grammar (4.17):

$$E \rightarrow E+E \mid E-E \mid E*E \mid E/E \mid E \uparrow E \mid (E) \mid -E \mid \text{id}$$

The precedence matrix for this grammar was shown in Fig. 4.25, and its graph is given in Fig. 4.27. There is only one edge, because the only pair related by  $\doteq$  is the left and right parenthesis. All but the right parenthesis are initial, and all but the left parenthesis are final. Thus the only paths from an initial to a final node are the paths  $+$ ,  $-$ ,  $*$ ,  $/$ ,  $\text{id}$, and  $\uparrow$  of length one, and the path from  $($  to  $)$  of length two. There are but a finite number, and each corresponds to the terminals of some production's right side in the grammar. Thus the error checker for reductions need only check that the proper set of nonterminal markers appears among the terminal strings being reduced. Specifically, the checker does the following:

1. If +, -, \*, /, or ↑ is reduced, it checks that nonterminals appear on both sides. If not, it issues the diagnostic

*missing operand*

2. If id is reduced, it checks that there is no nonterminal to the right or left. If there is, it can warn

*missing operator*

3. If ( ) is reduced, it checks that there is a nonterminal between the parentheses. If not, it can say

*no expression between parentheses*

Also it must check that no nonterminal appears on either side of the parentheses. If one does, it issues the same diagnostic as in (2).  $\square$

If there are an infinity of strings that may be popped, error messages cannot be tabulated on a case-by-case basis. We might use a general routine to determine whether some production right side is close (say distance 1 or 2, where distance is measured in terms of tokens, rather than characters, inserted, deleted, or changed) to the popped string and if so, issue a specific diagnostic on the assumption that that production was intended. If no production is close to the popped string, we can issue a general diagnostic to the effect that "something is wrong in the current line."
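One way to realize such a general routine is an edit distance computed over tokens. The sketch below uses the standard Levenshtein recurrence; the function names, the `limit` parameter, and the sample right sides are assumptions made for illustration, not part of the text.

```python
def token_distance(x, y):
    """Number of token insertions, deletions, or changes turning x into y."""
    prev = list(range(len(y) + 1))
    for i, tx in enumerate(x, 1):
        cur = [i]
        for j, ty in enumerate(y, 1):
            cur.append(min(prev[j] + 1,                 # delete tx
                           cur[j - 1] + 1,              # insert ty
                           prev[j - 1] + (tx != ty)))   # change tx into ty
        prev = cur
    return prev[-1]


def closest_right_side(popped, right_sides, limit=2):
    """Return the nearest right side within `limit` edits, or None."""
    best = min(right_sides, key=lambda rhs: token_distance(popped, rhs))
    return best if token_distance(popped, best) <= limit else None


right_sides = [["(", "E", ")"], ["E", "+", "E"], ["id"]]
print(closest_right_side(["(", ")"], right_sides))
```

When `closest_right_side` returns `None`, the parser would fall back to the general "something is wrong in the current line" diagnostic.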

#### *Handling Shift/Reduce Errors*

We must now discuss the other way in which the operator-precedence parser detects errors. When consulting the precedence matrix to decide whether to shift or reduce (lines (6) and (9) of Fig. 4.24), we may find that no relation holds between the top stack symbol and the first input symbol. For example, suppose *a* and *b* are the two top stack symbols (*b* is at the top), *c* and *d* are the next two input symbols, and there is no precedence relation between *b* and *c*. To recover, we must modify the stack, input or both. We may change symbols, insert symbols onto the input or stack, or delete symbols from the input or stack. If we insert or change, we must be careful that we do not get into an infinite loop, where, for example, we perpetually insert symbols at the beginning of the input without being able to reduce or to shift any of the inserted symbols.

One approach that will assure us no infinite loops is to guarantee that after recovery the current input symbol can be shifted (if the current input is \$, guarantee that no symbol is placed on the input, and the stack is eventually shortened). For example, given *ab* on the stack and *cd* on the input, if  $a \leq c$  (we use  $\leq$  to mean  $<\cdot$  or  $\doteq$ ), we might pop  $b$  from the stack. Another choice is to delete  $c$  from the input if  $b \leq d$ . A third choice is to find a symbol  $e$  such that  $b \leq e \leq c$  and insert  $e$  in front of  $c$  on the input. More generally, we might insert a string of symbols such that

$$b \leq e_1 \leq e_2 \leq \dots \leq e_n \leq c$$

if a single symbol for insertion could not be found. The exact action chosen should reflect the compiler designer's intuition regarding what error is likely in each case.

For each blank entry in the precedence matrix we must specify an error-recovery routine; the same routine could be used in several places. Then when the parser consults the entry for  $a$  and  $b$  in step (6) of Fig. 4.24, and no precedence relation holds between  $a$  and  $b$ , it finds a pointer to the error-recovery routine for this error.

**Example 4.32.** Consider the precedence matrix of Fig. 4.25 again. In Fig. 4.28, we show the rows and columns of this matrix that have one or more blank entries, and we have filled in these blanks with the names of error handling routines.

|    | id       | (        | )        | \$       |
|----|----------|----------|----------|----------|
| id | e3       | e3       | $\cdot>$ | $\cdot>$ |
| (  | $<\cdot$ | $<\cdot$ | $\doteq$ | e4       |
| )  | e3       | e3       | $\cdot>$ | $\cdot>$ |
| \$ | $<\cdot$ | $<\cdot$ | e2       | e1       |

Fig. 4.28. Operator-precedence matrix with error entries.

The substance of these error handling routines is as follows:

- e1: /\* called when whole expression is missing \*/
  - insert id onto the input
  - issue diagnostic: "missing operand"
- e2: /\* called when expression begins with a right parenthesis \*/
  - delete ) from the input
  - issue diagnostic: "unbalanced right parenthesis"
- e3: /\* called when id or ) is followed by id or ( \*/
  - insert + onto the input
  - issue diagnostic: "missing operator"
- e4: /\* called when expression ends with a left parenthesis \*/
  - pop ( from the stack
  - issue diagnostic: "missing right parenthesis"

Let us consider how this error-handling mechanism would treat the erroneous input  $\text{id} + )$ . The first actions taken by the parser are to shift  $\text{id}$ , reduce it to  $E$  (we again use  $E$  for anonymous nonterminals on the stack), and then to shift the  $+$ . We now have configuration

| STACK   | INPUT |
|---------|-------|
| $\$E +$ | $)\$$ |

Since  $+ \cdot > )$  a reduction is called for, and the handle is  $+$ . The error checker for reductions is required to inspect for  $E$ 's to left and right. Finding one missing, it issues the diagnostic

missing operand

and does the reduction anyway.

Our configuration is now

| STACK | INPUT |
|-------|-------|
| $\$E$ | $)\$$ |
There is no precedence relation between  $\$$  and  $)$ , and the entry in Fig. 4.28 for this pair of symbols is e2. Routine e2 causes diagnostic

unbalanced right parenthesis

to be printed and removes the right parenthesis from the input. We are now left with the final configuration for the parser.

| STACK | INPUT |
|-------|-------|
| $\$E$ | $\$$  |

$\square$
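The whole recovery scheme of this example can be simulated in a few dozen lines of Python. The sketch below is an illustration under several assumptions: only `+` and `*` are kept as operators, the table is written as strings with `"<"`, `"="`, `">"` for the three relations and `e1` to `e4` for the error routines, the marker `"E"` is pushed for every nonterminal, and the names `TABLE`, `diagnose`, `reduce`, and `parse` are hypothetical. The function returns the list of diagnostics issued.

```python
SYMS = ["+", "*", "id", "(", ")", "$"]
ROWS = {                        # Fig. 4.25 with the error entries of Fig. 4.28
    "+":  "> < < < > >",
    "*":  "> > < < > >",
    "id": "> > e3 e3 > >",
    "(":  "< < < < = e4",
    ")":  "> > e3 e3 > >",
    "$":  "< < < < e2 e1",
}
TABLE = {(a, b): r for a, row in ROWS.items()
         for b, r in zip(SYMS, row.split())}


def diagnose(handle):
    """Reduction check in the style of Example 4.31 ('E' = nonterminal)."""
    terms = [s for s in handle if s != "E"]
    if terms == ["id"]:
        return None if handle == ["id"] else "missing operator"
    if terms == ["(", ")"]:
        if handle[0] == "E" or handle[-1] == "E":
            return "missing operator"
        return None if "E" in handle else "no expression between parentheses"
    return None if handle == ["E", terms[0], "E"] else "missing operand"


def reduce(stack, msgs):
    handle = []
    if stack[-1] == "E":                 # nonterminal right of the handle
        handle.append(stack.pop())
    while True:
        t = stack.pop()                  # next terminal of the handle
        handle.append(t)
        if stack[-1] == "E":             # nonterminal to its left
            handle.append(stack.pop())
        if TABLE[(stack[-1], t)] == "<":
            break
    msg = diagnose(handle[::-1])
    if msg:
        msgs.append(msg)
    stack.append("E")                    # do the reduction anyway


def parse(tokens):
    stack, buf, msgs = ["$"], list(tokens) + ["$"], []
    while True:
        a = next(s for s in reversed(stack) if s != "E")   # top terminal
        b = buf[0]
        if a == "$" and b == "$" and stack[-1] == "E":
            return msgs
        act = TABLE[(a, b)]
        if act in ("<", "="):
            stack.append(buf.pop(0))
        elif act == ">":
            reduce(stack, msgs)
        elif act == "e1":
            buf.insert(0, "id")
            msgs.append("missing operand")
        elif act == "e2":
            buf.pop(0)
            msgs.append("unbalanced right parenthesis")
        elif act == "e3":
            buf.insert(0, "+")
            msgs.append("missing operator")
        else:                            # e4: drop the topmost "("
            del stack[len(stack) - 1 - stack[::-1].index("(")]
            msgs.append("missing right parenthesis")


print(parse(["id", "+", ")"]))
```

For the input `id + )` the sketch issues "missing operand" followed by "unbalanced right parenthesis", the same two diagnostics as in the walk-through above.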

#### 4.7 LR PARSERS

This section presents an efficient, bottom-up syntax analysis technique that can be used to parse a large class of context-free grammars. The technique is called  $LR(k)$  parsing; the "L" is for left-to-right scanning of the input, the "R" for constructing a rightmost derivation in reverse, and the  $k$  for the number of input symbols of lookahead that are used in making parsing decisions. When  $(k)$  is omitted,  $k$  is assumed to be 1. LR parsing is attractive for a variety of reasons.

- LR parsers can be constructed to recognize virtually all programming-language constructs for which context-free grammars can be written.
- The LR parsing method is the most general nonbacktracking shift-reduce parsing method known, yet it can be implemented as efficiently as other shift-reduce methods.
- The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers.
- An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.

Handle: a substring that matches the right side of a production.

### Shift Reduce Process:

| Start | RSF    | Action                    |
|-------|--------|---------------------------|
| \$    | 000111 | shift 01                  |
| for   | 0011   | reduce $S \rightarrow 01$ |
| for   | 0011   |                           |

27/04/18: Conflict in shift reduce parsing:

1. Shift Reduce conflict — example:

2. Reduce-Reduce conflict:

Two productions have the same right side:

$S \rightarrow id$

$P \rightarrow id$

| Start | RSF      | Action     |
|-------|----------|------------|
| \$    | id id \$ | shift else |

The parser cannot tell whether id should be reduced to S or to P.

### Operator precedence Parser:

A grammar G is an operator grammar iff:

i) it has no ε-productions, and

ii) no right side has two adjacent nonterminals.

operator grammar + input → operator-precedence parser → parse tree (postfix expression)

Ex:  $E \rightarrow EAE \mid \text{id}$  is not an OG (the nonterminals E, A, E are adjacent)

$A \rightarrow * \mid +$

$E \rightarrow E+E \mid E*E \mid \text{id}$  ✓ OG.
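The two conditions are easy to check mechanically. In the sketch below a grammar is a dictionary from each head to its list of right sides, an empty list stands for an ε-production, and the function name `is_operator_grammar` is chosen here for illustration.

```python
def is_operator_grammar(productions, nonterminals):
    """Check the two operator-grammar conditions.

    productions: {head: [right side, ...]}, each right side a list of symbols.
    nonterminals: the set of nonterminal symbols.
    """
    for bodies in productions.values():
        for body in bodies:
            if not body:
                return False                   # (i) an ε-production
            for x, y in zip(body, body[1:]):
                if x in nonterminals and y in nonterminals:
                    return False               # (ii) adjacent nonterminals
    return True


g1 = {"E": [["E", "A", "E"], ["id"]], "A": [["*"], ["+"]]}
g2 = {"E": [["E", "+", "E"], ["E", "*", "E"], ["id"]]}
print(is_operator_grammar(g1, {"E", "A"}))
print(is_operator_grammar(g2, {"E"}))
```

The first grammar fails because of the right side E A E; substituting the alternatives of A gives the second grammar, which passes.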

### Steps for operator precedence parsing:

#### Problems:

- Check whether the given grammar is an operator grammar; if it is not, try to convert it.
- Generate operator relation table
- Parse the input string
- Construct the parse tree.

PROBLEM:

- Construct the operator precedence parser for the given grammar and parse the given input string.

$$E \rightarrow EA E \mid id$$

$$A \rightarrow + \mid *$$

- (i) Converting to operator grammar

$$E \rightarrow E+E \mid E*E \mid id$$

- (ii) Operator Relation Table:

Assumptions:

- id has the highest precedence.
- \* and + are left-associative, so each gets  $\cdot>$  against an operator of equal precedence (a right-associative operator would get  $<\cdot$  instead).
- \$ has the lowest precedence.

|    | id       | +        | *        | \$       |
|----|----------|----------|----------|----------|
| id | -        | $\cdot>$ | $\cdot>$ | $\cdot>$ |
| +  | $<\cdot$ | $\cdot>$ | $<\cdot$ | $\cdot>$ |
| *  | $<\cdot$ | $\cdot>$ | $\cdot>$ | $\cdot>$ |
| \$ | $<\cdot$ | $<\cdot$ | $<\cdot$ | Accept   |



- (iii) Parse the input string:

| Stack       | Input             | Relation | Action  |
|-------------|-------------------|----------|---------|
| \$          | id + id \* id \$  | $<\cdot$ | push id |
| \$ id       | + id \* id \$     | $\cdot>$ | pop id  |
| \$          | + id \* id \$     | $<\cdot$ | push +  |
| \$ +        | id \* id \$       | $<\cdot$ | push id |
| \$ + id     | \* id \$          | $\cdot>$ | pop id  |
| \$ +        | \* id \$          | $<\cdot$ | push \* |
| \$ + \*     | id \$             | $<\cdot$ | push id |
| \$ + \* id  | \$                | $\cdot>$ | pop id  |
| \$ + \*     | \$                | $\cdot>$ | pop \*  |
| \$ +        | \$                | $\cdot>$ | pop +   |
| \$          | \$                | -        | Accept  |

28/04/18

$$E \rightarrow E+E \mid E-E \mid E*E \mid E/E \mid E \uparrow E \mid (E) \mid \text{id}$$

input: id \* (id + id) - id / id

⇒ It is an OG

(ii) Generate relation table

Assumptions:

- id: highest precedence
- ( ): right-associative
- ↑: right-associative
- \*, /: left-associative
- +, -: left-associative
- \$: lowest precedence

|    | id | + | - | * | / | ↑ | ( | ) | \$     |
|----|----|---|---|---|---|---|---|---|--------|
| id | -  | > | > | > | > | > | - | > | >      |
| +  | <  | > | > | < | < | < | < | > | >      |
| -  | <  | > | > | < | < | < | < | > | >      |
| *  | <  | > | > | > | > | < | < | > | >      |
| /  | <  | > | > | > | > | < | < | > | >      |
| ↑  | <  | > | > | > | > | < | < | > | >      |
| (  | <  | < | < | < | < | < | < | = | -      |
| )  | -  | > | > | > | > | > | - | > | >      |
| \$ | <  | < | < | < | < | < | < | - | Accept |

(iii) Parse input (the stack shows terminals only):

| Stack       | Input                         | Relation | Action  |
|-------------|-------------------------------|----------|---------|
| \$          | id * ( id + id ) - id / id \$ | <        | push id |
| \$ id       | * ( id + id ) - id / id \$    | >        | pop id  |
| \$          | * ( id + id ) - id / id \$    | <        | push *  |
| \$ *        | ( id + id ) - id / id \$      | <        | push (  |
| \$ * (      | id + id ) - id / id \$        | <        | push id |
| \$ * ( id   | + id ) - id / id \$           | >        | pop id  |
| \$ * (      | + id ) - id / id \$           | <        | push +  |
| \$ * ( +    | id ) - id / id \$             | <        | push id |
| \$ * ( + id | ) - id / id \$                | >        | pop id  |
| \$ * ( +    | ) - id / id \$                | >        | pop +   |
| \$ * (      | ) - id / id \$                | =        | push )  |
| \$ * ( )    | - id / id \$                  | >        | pop ( ) |
| \$ *        | - id / id \$                  | >        | pop *   |
| \$          | - id / id \$                  | <        | push -  |
| \$ -        | id / id \$                    | <        | push id |
| \$ - id     | / id \$                       | >        | pop id  |
| \$ -        | / id \$                       | <        | push /  |
| \$ - /      | id \$                         | <        | push id |
| \$ - / id   | \$                            | >        | pop id  |
| \$ - /      | \$                            | >        | pop /   |
| \$ -        | \$                            | >        | pop -   |
| \$          | \$                            | -        | Accept  |

Algorithm: Operator precedence parsing algorithm:

Input: An input string $w$ and a table of precedence relations.

Output: If $w$ is well formed, a skeletal parse tree with a placeholder non-terminal $E$ labelling all interior nodes; otherwise, an error indication.

Method: Initially the stack contains \$ and the input buffer contains the string $w\$$. To parse, we execute the following program.

1. Set ip to point to the first symbol of $w\$$.
2. repeat forever
3. if \$ is on top of the stack and ip points to \$ then
4. return (accept)
5. Let $a$ be the topmost terminal symbol on the stack and let $b$ be the symbol pointed to by ip
6. if $a < b$ or $a = b$ then begin
7.     push $b$ onto the stack
8.     advance ip to the next input symbol  
end;
9. else if $a > b$ then
10.   repeat
11.   pop the stack
12.   until the top stack terminal is related by $<$ to the terminal most recently popped
13. else error()
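The loop above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the dictionary `REL` and the function name `op_precedence_parse` are assumptions made for this example, and the relations are the ones for $E \rightarrow E+E \mid E*E \mid id$. Only terminals are kept on the stack, and a missing table entry is reported as an error.

```python
# Precedence relations for E -> E+E | E*E | id.
# '<' means "yields precedence to", '>' means "takes precedence over".
# A pair that is missing from the dictionary is a blank (error) entry.
REL = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}

def op_precedence_parse(tokens, rel=REL):
    """Return the list of parser actions, or raise SyntaxError."""
    stack = ["$"]                    # terminals only; non-terminals are implicit
    buf = list(tokens) + ["$"]
    ip = 0
    actions = []
    while True:
        a, b = stack[-1], buf[ip]
        if a == "$" and b == "$":
            actions.append("accept")
            return actions
        r = rel.get((a, b))
        if r in ("<", "="):          # shift
            stack.append(b)
            ip += 1
            actions.append("push " + b)
        elif r == ">":               # reduce: pop one handle
            while True:
                popped = stack.pop()
                actions.append("pop " + popped)
                if rel.get((stack[-1], popped)) == "<":
                    break
        else:
            raise SyntaxError(f"no relation between {a!r} and {b!r}")

if __name__ == "__main__":
    for step in op_precedence_parse(["id", "+", "id", "*", "id"]):
        print(step)
```

The sequence of actions it produces for id + id \* id can be compared row by row with a hand-written parse table.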

30/04/18

$$S \rightarrow (L) \mid a$$

$$L \rightarrow L, S \mid S$$

i/p: (a, (a, a))

(i) It is an OG.

(ii) Relation table:

|    | a | , | ( | ) | \$     |
|----|---|---|---|---|--------|
| a  | - | > | - | > | >      |
| ,  | < | > | < | > | >      |
| (  | < | < | < | = | -      |
| )  | - | > | - | > | >      |
| \$ | < | < | < | - | Accept |

Assumptions: a has the highest precedence and \$ the least; ( and ) are treated as left associative.

Note: Remember to fill in each relation according to precedence and associativity, i.e. according to which operator is evaluated first. After parsing we get the parse tree.

(iii) Input is $(a, (a,a))\$$

| Stack       | Input          | Relation | Action  |
|-------------|----------------|----------|---------|
| \$          | $(a,(a,a))\$$  | <        | push (  |
| \$ (        | $a,(a,a))\$$   | <        | push a  |
| \$ ( a      | $,(a,a))\$$    | >        | pop a   |
| \$ (        | $,(a,a))\$$    | <        | push ,  |
| \$ ( ,      | $(a,a))\$$     | <        | push (  |
| \$ ( , (    | $a,a))\$$      | <        | push a  |
| \$ ( , ( a  | $,a))\$$       | >        | pop a   |
| \$ ( , (    | $,a))\$$       | <        | push ,  |
| \$ ( , ( ,  | $a))\$$        | <        | push a  |
| \$ ( , ( , a | $))\$$        | >        | pop a   |
| \$ ( , ( ,  | $))\$$         | >        | pop ,   |
| \$ ( , (    | $))\$$         | =        | push )  |
| \$ ( , ( )  | $)\$$          | >        | pop ( ) |
| \$ ( ,      | $)\$$          | >        | pop ,   |
| \$ (        | $)\$$          | =        | push )  |
| \$ ( )      | $\$$           | >        | pop ( ) |
| \$          | $\$$           | -        | Accept  |



Drawbacks of operator relation table:

- It is difficult to handle tokens like '-' that need two different precedences depending on whether they are used as a unary or a binary operator.
- Only a small class of grammars can be parsed.
- If we have 4 operators, the table has $4 \times 4 = 16$ entries; in general, if the number of operators is $n$, we need $O(n^2)$ entries. To overcome this we use operator precedence functions.

## Operator precedence functions:

- The parser does not store the relation table; instead it uses precedence functions that map terminal symbols to integers.
- It uses two functions $f$ and $g$. Create symbols $f_a$ and $g_a$ for every terminal $a$ (including \$) and build a directed graph:
  - (i) if $a > b$, then there is an edge from $f_a$ to $g_b$
  - (ii) if $a < b$, then there is an edge from $g_b$ to $f_a$
  - (iii) if $a = b$, then $f_a$ and $g_b$ are in the same group. Note that symbols may end up in the same group even if they are not related by $=$ directly: for example, if $a = b$ and $c = b$, then $f_a$ and $f_c$ are in the same group since they are both in the same group as $g_b$.
  - (iv) If the graph constructed has no cycle, then the precedence functions exist.
  - (v) Let $f(a)$ be the length of the longest path starting at the group of $f_a$, and $g(a)$ the length of the longest path starting at the group of $g_a$. Using these values, construct the precedence function table, which replaces the operator relation table.


Example: $E \rightarrow E+E \mid E * E \mid id$

|      | $id$ | $+$ | $*$ | $\$$ |
|------|------|-----|-----|------|
| $id$ | -    | >   | >   | >    |
| $+$  | <    | >   | <   | >    |
| $*$  | <    | >   | >   | >    |
| $\$$ | <    | <   | <   | -    |



The longest paths are

$g_{id} \rightarrow f_* \rightarrow g_* \rightarrow f_+ \rightarrow g_+ \rightarrow f_\$$ (length 5)

$f_{id} \rightarrow g_* \rightarrow f_+ \rightarrow g_+ \rightarrow f_\$$ (length 4)

|     | $id$ | $+$ | $*$ | $\$$ |
|-----|------|-----|-----|------|
| $f$ | 4    | 2   | 4   | 0    |
| $g$ | 5    | 1   | 3   | 0    |

Advantages: fewer entries ($2n$ integers instead of an $n \times n$ table).

Disadvantages: blank (error) entries of the relation table get non-blank values in the function table, i.e. some errors can no longer be detected from the table during parsing.

H-W: Construct an operator precedence parser for the given grammar and parse the input string.

- (i) $E \rightarrow E+E \mid E * E \mid (E) \mid id$, input: $(id + id * id)$
- (ii) $E \rightarrow E+T \mid T$, $T \rightarrow T * V \mid V$, $V \rightarrow a \mid b \mid c \mid d$, input: $a + b * c * d$

Solution to (ii):

(i) It is an OG.

(ii) Relation table (id stands for any of a, b, c, d)

|      | $id$ | $+$ | $*$ | $\$$   |
|------|------|-----|-----|--------|
| $id$ | -    | >   | >   | >      |
| $+$  | <    | >   | <   | >      |
| $*$  | <    | >   | >   | >      |
| $\$$ | <    | <   | <   | Accept |

(iii) Parse the input $a + b * c * d$

| Stack    | Input     | Relation | Action |
|----------|-----------|----------|--------|
| \$       | a+b*c*d\$ | <        | push a |
| \$ a     | +b*c*d\$  | >        | pop a  |
| \$       | +b*c*d\$  | <        | push + |
| \$ +     | b*c*d\$   | <        | push b |
| \$ + b   | *c*d\$    | >        | pop b  |
| \$ +     | *c*d\$    | <        | push * |
| \$ + *   | c*d\$     | <        | push c |
| \$ + * c | *d\$      | >        | pop c  |
| \$ + *   | *d\$      | >        | pop *  |
| \$ +     | *d\$      | <        | push * |
| \$ + *   | d\$       | <        | push d |
| \$ + * d | \$        | >        | pop d  |
| \$ + *   | \$        | >        | pop *  |
| \$ +     | \$        | >        | pop +  |
| \$       | \$        | -        | Accept |

Solution to (i): relation table

|    | id | + | * | ( | ) | \$     |
|----|----|---|---|---|---|--------|
| id | -  | > | > | - | > | >      |
| +  | <  | > | < | < | > | >      |
| *  | <  | > | > | < | > | >      |
| (  | <  | < | < | < | = | -      |
| )  | -  | > | > | - | > | >      |
| \$ | <  | < | < | < | - | Accept |

Parse of the input (id + id \* id):

| Stack        | Input         | Relation | Action  |
|--------------|---------------|----------|---------|
| \$           | (id+id*id)\$  | <        | push (  |
| \$ (         | id+id*id)\$   | <        | push id |
| \$ ( id      | +id*id)\$     | >        | pop id  |
| \$ (         | +id*id)\$     | <        | push +  |
| \$ ( +       | id*id)\$      | <        | push id |
| \$ ( + id    | *id)\$        | >        | pop id  |
| \$ ( +       | *id)\$        | <        | push *  |
| \$ ( + *     | id)\$         | <        | push id |
| \$ ( + * id  | )\$           | >        | pop id  |
| \$ ( + *     | )\$           | >        | pop *   |
| \$ ( +       | )\$           | >        | pop +   |
| \$ (         | )\$           | =        | push )  |
| \$ ( )       | \$            | >        | pop ( ) |
| \$           | \$            | -        | Accept  |

# UNIT - 5

## SYNTAX - DIRECTED TRANSLATION

### CONTENTS

- Syntax directed definitions
- Evaluation orders for SDD's
- Applications of SDT
- SDT schemes

### Syntax Directed Definitions:

A syntax-directed definition (SDD) is a context-free grammar together with attributes and rules. Attributes are associated with grammar symbols and rules with productions. If $X$ is a symbol and $a$ is one of its attributes, then we write $X.a$ to denote the value of $a$ at a particular parse-tree node labelled $X$. Attributes may be of many kinds: numbers, types, tables, references, strings, etc.

- \* 2 types of attributes:
  - ▷ Synthesized attributes
  - ▷ Inherited attributes

① Synthesized attribute: A synthesized attribute for a nonterminal A at a parse tree node N is defined by a semantic rule associated with the production at N. Note that the production must have A as its head. A synthesized attribute at node N is defined only in terms of attribute values at the children of N and at N itself.

② Inherited attribute: An inherited attribute for a nonterminal B at a parse tree node N is defined by a semantic rule associated with the production at the parent of N. Note that the production must have B as a symbol in its body. An inherited attribute at node N is defined only in terms of attribute values at N's parent, N itself and N's siblings.

NOTE:

① Synthesized (N = node under consideration, C = child)

- Case (i): single child
- Case (ii): rightmost child
- Case (iii): no child ⇒ the node itself

② Inherited (P = parent, C = node under consideration, S = sibling, O = operator)

- Case (i): inherited from a sibling
- Case (ii): inherited from both parent and sibling

③ Terminals can have synthesized attributes but not inherited attributes.

\* Attributes of terminals have lexical values that are supplied by the lexical analyzer.

\* Types of SDD: ① S-attributed SDD  
② L-attributed SDD

① S-Attributed SDD

- \* An SDD that involves only synthesized attributes is called an S-attributed SDD.
- \* In an S-attributed SDD, each rule computes an attribute for the nonterminal at the head of a production from attributes taken from the body of the production.

→ An S-attributed SDD can be implemented naturally in conjunction with an LR parser / bottom-up parser.

### \* Annotated parse tree

A parse tree showing the value(s) of attribute(s) is called an annotated parse tree.

\* It is used in bottom-up parser.

\* Order of evaluation is postorder traversal.

\* Example of S-attributed SDD.

| Production                      | Semantic rules                |
|---------------------------------|-------------------------------|
| 1) $L \rightarrow E\ n$         | $L.val = E.val$               |
| 2) $E \rightarrow E_1 + T$      | $E.val = E_1.val + T.val$     |
| 3) $E \rightarrow T$            | $E.val = T.val$               |
| 4) $T \rightarrow T_1 * F$      | $T.val = T_1.val * F.val$     |
| 5) $T \rightarrow F$            | $T.val = F.val$               |
| 6) $F \rightarrow (E)$          | $F.val = E.val$               |
| 7) $F \rightarrow \text{digit}$ | $F.val = \text{digit}.lexval$ |

### ② L-Attributed SDD

\* Example of mixed attributes / L-attributed SDD

| Production                      | Semantic rules                                      |
|---------------------------------|-----------------------------------------------------|
| 1) $T \rightarrow F T'$         | $T'.inh = F.val$<br>$T.val = T'.syn$                |
| 2) $T' \rightarrow * F T_1'$    | $T_1'.inh = T'.inh * F.val$<br>$T'.syn = T_1'.syn$  |
| 3) $T' \rightarrow \epsilon$    | $T'.syn = T'.inh$                                   |
| 4) $F \rightarrow \text{digit}$ | $F.val = \text{digit}.lexval$                       |

- \* An SDD that uses both synthesized and inherited attributes, where each inherited attribute depends only on the parent and on siblings to its left, is called an L-attributed SDD.
- \* It is used in top-down parsing.
- \* Order of evaluation is a topological sort of the dependency graph.
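One common way to realise such a definition in a top-down parser is to pass each inherited attribute as a function argument and to return each synthesized attribute as a result. The following sketch does this for $T \rightarrow F\,T'$, $T' \rightarrow *\,F\,T_1' \mid \epsilon$, $F \rightarrow \text{digit}$. The function names and the token-list input format are assumptions made for the illustration.

```python
def parse_F(tokens, pos):
    """F -> digit : F.val = digit.lexval. Returns (next position, F.val)."""
    if pos >= len(tokens) or not tokens[pos].isdigit():
        raise SyntaxError(f"digit expected at position {pos}")
    return pos + 1, int(tokens[pos])

def parse_Tprime(tokens, pos, inh):
    """T' with inherited attribute `inh`. Returns (next position, T'.syn)."""
    if pos < len(tokens) and tokens[pos] == "*":
        # T' -> * F T1' : T1'.inh = T'.inh * F.val ; T'.syn = T1'.syn
        pos, f_val = parse_F(tokens, pos + 1)
        return parse_Tprime(tokens, pos, inh * f_val)
    # T' -> epsilon : T'.syn = T'.inh
    return pos, inh

def parse_T(tokens):
    """T -> F T' : T'.inh = F.val ; T.val = T'.syn. Returns T.val."""
    pos, f_val = parse_F(tokens, 0)
    pos, syn = parse_Tprime(tokens, pos, f_val)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return syn

if __name__ == "__main__":
    print(parse_T(["3", "*", "5"]))
```

Here the left operand travels down the tree as `inh`, and the finished product travels back up as the return value, which mirrors the inh/syn pair in the table.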

### Evaluation Orders for SDD's

- \* A dependency graph is used to determine the order of computation of attributes.
- \* While an annotated parse tree shows the values of attributes, a dependency graph helps us determine how those values can be computed.

### DEPENDENCY GRAPHS

A dependency graph depicts the flow of information among the attribute instances in a particular parse tree: an edge from one attribute instance to another means that the value of the first is needed to compute the second.

- 1) For each parse tree node, say node $X$, the dependency graph has a node for each attribute associated with $X$.
- 2) If a semantic rule associated with a production $p$ defines the value of synthesized attribute $A.b$ in terms of the value of $X.c$, then the dependency graph has an edge from $X.c$ to $A.b$.
- 3) If a semantic rule associated with a production $p$ defines the value of inherited attribute $B.c$ in terms of the value of $X.a$, then the dependency graph has an edge from $X.a$ to $B.c$.

Eg1: production

$$E \rightarrow E_1 + T$$

Semantic Rule

$$E.\text{Val} = E_1.\text{Val} + T.\text{Val}$$

fig: E.Val is synthesized from E<sub>1</sub>.Val & T.Val



Eg2:

| Production                   | Semantic rules                                     |
|------------------------------|----------------------------------------------------|
| $T \rightarrow F T'$         | $T'.inh = F.val$<br>$T.val = T'.syn$               |
| $T' \rightarrow * F T_1'$    | $T_1'.inh = T'.inh * F.val$<br>$T'.syn = T_1'.syn$ |
| $T' \rightarrow \epsilon$    | $T'.syn = T'.inh$                                  |
| $F \rightarrow \text{digit}$ | $F.val = \text{digit.lexval}$                      |

fig: Dependency graph for the above productions.



Ordering the evaluation of attributes

- \* If the dependency graph has an edge from node M to node N, then the attribute corresponding to M must be evaluated before the attribute of N.
- \* Thus the only allowable orders of evaluation are those sequences of nodes $N_1, N_2, \dots, N_k$ such that if there is an edge of the dependency graph from $N_i$ to $N_j$, then $i < j$.
- \* Such an ordering is called a topological sort of the graph.
- \* If there is any cycle, then there is no topological sort, i.e., evaluation of the SDD is not possible.
- \* Eg: For the dependency graph of Eg2 above, two topological sorts are  
 1, 2, 3, 4, 5, 6, 7, 8, 9  
 (or)  
 1, 3, 5, 2, 4, 6, 7, 8, 9
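The ordering condition can be tested in a few lines of Python. Since the dependency-graph figure is not reproduced here, the edge list below is an assumption: it follows the usual numbering for the input 3 \* 5 (1, 2 = the two digit.lexval nodes, 3, 4 = the two F.val nodes, 5 = T'.inh, 6 = T1'.inh, 7 = T1'.syn, 8 = T'.syn, 9 = T.val). The helper names are also made up for the sketch, which relies on the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter, CycleError

# Assumed edges (M, N): attribute M must be evaluated before attribute N.
EDGES = [(1, 3), (2, 4), (3, 5), (5, 6), (4, 6), (6, 7), (7, 8), (8, 9)]

def is_topological(order, edges):
    """True if every edge (M, N) has M placed before N in `order`."""
    position = {node: i for i, node in enumerate(order)}
    return all(position[m] < position[n] for m, n in edges)

def evaluation_order(edges):
    """Return one valid evaluation order; raises CycleError on a cycle."""
    predecessors = {}
    for m, n in edges:
        predecessors.setdefault(n, set()).add(m)
        predecessors.setdefault(m, set())
    return list(TopologicalSorter(predecessors).static_order())

if __name__ == "__main__":
    print(is_topological([1, 2, 3, 4, 5, 6, 7, 8, 9], EDGES))
    print(is_topological([1, 3, 5, 2, 4, 6, 7, 8, 9], EDGES))
    print(evaluation_order(EDGES))
```

Under these assumed edges both orders listed above pass the check, while any order that places an attribute before one it depends on is rejected.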

### S-Attributed Definitions

- An SDD is S-attributed if every attribute is synthesized.
- When an SDD is S-attributed, its attributes can be evaluated in any bottom-up order of the nodes of the parse tree.
- We can use a post-order traversal of the parse tree to evaluate the attributes in an S-attributed definition.

```
postorder(N) {
    for (each child C of N, from the left)
        postorder(C);
    evaluate the attributes associated with node N;
}
```

- S-Attributed definitions can be implemented during bottom up parsing without the need to explicitly create parse trees.
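As a concrete illustration of post-order evaluation, the sketch below annotates a hand-built parse tree for 3 \* 5 + 4 n using the desk-calculator rules. The `Node` class, the helper `digit` and the way productions are recognised from the labels of the children are assumptions made for this example; a real bottom-up parser would apply the same rules at each reduction instead of walking a tree.

```python
class Node:
    def __init__(self, label, children=(), lexval=None):
        self.label = label
        self.children = list(children)
        self.lexval = lexval      # only meaningful for digit leaves
        self.val = None           # the synthesized attribute

def evaluate(n):
    kids = n.children
    shape = [k.label for k in kids]
    if not kids:                  # terminal: a digit carries its lexval
        n.val = n.lexval
    elif len(kids) == 1:          # E -> T, T -> F, F -> digit
        n.val = kids[0].val
    elif shape[1] == "n":         # L -> E n
        n.val = kids[0].val
    elif shape[1] == "+":         # E -> E1 + T
        n.val = kids[0].val + kids[2].val
    elif shape[1] == "*":         # T -> T1 * F
        n.val = kids[0].val * kids[2].val
    else:                         # F -> ( E )
        n.val = kids[1].val

def postorder(n):
    for c in n.children:          # children first, from the left
        postorder(c)
    evaluate(n)                   # then the node itself

def digit(d):
    return Node("F", [Node("digit", lexval=d)])

# Parse tree for 3 * 5 + 4 n
tree = Node("L", [
    Node("E", [
        Node("E", [Node("T", [Node("T", [digit(3)]), Node("*"), digit(5)])]),
        Node("+"),
        Node("T", [digit(4)]),
    ]),
    Node("n"),
])
postorder(tree)

if __name__ == "__main__":
    print(tree.val)
```

After the traversal every node holds its `val`, so the tree is an annotated parse tree with 19 at the root.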

## L-Attributed Definitions

- \* An SDD is L-attributed if the edges in the dependency graph go from left to right but not from right to left.
- \* More precisely, each attribute must be either
  - synthesized, or
  - inherited, but with the following restriction: if there is a production $A \rightarrow X_1 X_2 \dots X_n$ and there is an inherited attribute $X_i.a$ computed by a rule associated with this production, then the rule may only use:
    - Inherited attributes associated with the head A.
    - Either inherited or synthesized attributes associated with the occurrences of symbols $X_1, X_2, \dots, X_{i-1}$ located to the left of $X_i$.
    - Inherited or synthesized attributes associated with this occurrence of $X_i$ itself, but only in such a way that there is no cycle in the dependency graph.

## PROBLEMS

- 5.1. Write an SDD for a simple desk calculator.

Sol: Step 1: SDD definition

$$i/p = 3 * 5 + 4\ n$$

| PRODUCTION                      | SEMANTIC RULES                |
|---------------------------------|-------------------------------|
| 1) $L \rightarrow E\ n$         | $L.val = E.val$               |
| 2) $E \rightarrow E_1 + T$      | $E.val = E_1.val + T.val$     |
| 3) $E \rightarrow T$            | $E.val = T.val$               |
| 4) $T \rightarrow T_1 * F$      | $T.val = T_1.val * F.val$     |
| 5) $T \rightarrow F$            | $T.val = F.val$               |
| 6) $F \rightarrow (E)$          | $F.val = E.val$               |
| 7) $F \rightarrow \text{digit}$ | $F.val = \text{digit.lexval}$ |

### Step 2: Annotated Parse tree



### Step 3: Dependency graph



Step 4: Topological Order: ① ② ③ ④ ⑤ ⑥ ⑦ ⑧ ⑨ ⑩ ⑪ ⑫ ⑬

Write the SDD and construct the annotated parse tree and dependency graph for:

- $(3+4) * (5+6)n$
- $1 * 2 * 3 * (4+5)n$
- $(9+8 * (7+6)+5) * 4n$

a)  $(3+4) * (5+6)n$

Step 1: SDD Definition

| PRODUCTION                   | SEMANTIC RULES                |
|------------------------------|-------------------------------|
| $L \rightarrow E\ n$         | $L.val = E.val$               |
| $E \rightarrow E_1 + T$      | $E.val = E_1.val + T.val$     |
| $E \rightarrow T$            | $E.val = T.val$               |
| $T \rightarrow T_1 * F$      | $T.val = T_1.val * F.val$     |
| $T \rightarrow F$            | $T.val = F.val$               |
| $F \rightarrow (E)$          | $F.val = E.val$               |
| $F \rightarrow \text{digit}$ | $F.val = \text{digit.lexval}$ |

Step 2: Annotated Parse Tree



### Step 3: Dependency Graph



Step 4: Topological sorting: ① ② ③ ④ ⑤ ⑥ ⑦ ⑧ ⑨ ⑩ ⑪ ⑫ ⑬ ⑭ ⑮ ⑯ ⑰ ⑱ ⑲ ⑳ ㉑ ㉒.

b)  $1 * 2 * 3 * (4 + 5) n$

| Production                   | Semantic rules                |
|------------------------------|-------------------------------|
| $L \rightarrow E\ n$         | $L.val = E.val$               |
| $E \rightarrow E_1 + T$      | $E.val = E_1.val + T.val$     |
| $E \rightarrow T$            | $E.val = T.val$               |
| $T \rightarrow T_1 * F$      | $T.val = T_1.val * F.val$     |
| $T \rightarrow F$            | $T.val = F.val$               |
| $F \rightarrow (E)$          | $F.val = E.val$               |
| $F \rightarrow \text{digit}$ | $F.val = \text{digit.lexval}$ |

Step 2: Annotated parse tree



Step 3: Dependency graph



Step 4: Topological sorting: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

c)  $(9 + 8 * (7 + 6) + 5) * 4n$

Step 1: SDD definition

| Production                   | Semantic rules                |
|------------------------------|-------------------------------|
| $L \rightarrow E\ n$         | $L.val = E.val$               |
| $E \rightarrow E_1 + T$      | $E.val = E_1.val + T.val$     |
| $E \rightarrow T$            | $E.val = T.val$               |
| $T \rightarrow T_1 * F$      | $T.val = T_1.val * F.val$     |
| $T \rightarrow F$            | $T.val = F.val$               |
| $F \rightarrow (E)$          | $F.val = E.val$               |
| $F \rightarrow \text{digit}$ | $F.val = \text{digit.lexval}$ |

Step 2: Annotated parse tree



### Step 3: Dependency graph



Step 4: Topological sorting: 1 2 3 ... 29

Write an L-attributed SDD (i.e. an SDD for a top-down parser) and construct the annotated parse tree and dependency graph for the given i/p.

① 3 \* 5

Step 1: SDD definition

| Production                   | Semantic rules                                      |
|------------------------------|-----------------------------------------------------|
| $E \rightarrow T E'$         | $E'.inh = T.val$<br>$E.val = E'.syn$                |
| $E' \rightarrow + T E_1'$    | $E_1'.inh = E'.inh + T.val$<br>$E'.syn = E_1'.syn$  |
| $E' \rightarrow \epsilon$    | $E'.syn = E'.inh$                                   |
| $T \rightarrow F T'$         | $T'.inh = F.val$<br>$T.val = T'.syn$                |
| $T' \rightarrow * F T_1'$    | $T_1'.inh = T'.inh * F.val$<br>$T'.syn = T_1'.syn$  |
| $T' \rightarrow \epsilon$    | $T'.syn = T'.inh$                                   |
| $F \rightarrow (E)$          | $F.val = E.val$                                     |
| $F \rightarrow \text{digit}$ | $F.val = \text{digit.lexval}$                       |

### NOTE

- ① The original grammar symbols [i.e., $E, T, F$] will only have synthesized attributes [i.e., $val$].
- ② The augmented grammar symbols [$E'$, $T'$] will have both inherited & synthesized attributes, i.e., $inh$ & $syn$.

### Step 2: Annotated parse tree



### Step 3: Dependency graph



Step 4: Topological order: 1 2 3 4 5 6 7 8 9 10 11 12.

② 3 + 5

Step 1: SDD definition

// Same as problem ①.

Step 2: Annotated parse tree



Step 3: Dependency graph



Step 4: Topological ordering = 1 2 3 4 ... 14 15

③  $3 * 5 + 4$

Step 1: SDD definition

// Same as problem ①.

Step 2: Annotated parse tree



Step 3: Dependency graph



Step 4: Topological order: 1 2 3 4 ... 19

④  $(3+4) * (5+6)$

Step 1: SDD definition  
// Same as problem ①.

Step 2: Annotated parse tree



### Step 3 : Dependency graph



Step 4: Topological ordering: 1 2 3 4 ... 37 38

⑤  $1 * 2 + 3 * (4 + 5)$

Step 1: SDD definition

// Same as problem ①.

Step 2: Annotated parse tree



Step 3: Dependency graph



Eg1: The following definition is L-attributed. Here the inherited attribute of $T'$ gets its value from its left sibling $F$. Similarly, $T_1'$ gets its value from its parent $T'$ and its left sibling $F$.

| Production                | Semantic rules              |
|---------------------------|-----------------------------|
| $T \rightarrow F T'$      | $T'.inh = F.val$            |
| $T' \rightarrow * F T_1'$ | $T_1'.inh = T'.inh * F.val$ |

Eg2: The definition below is not L-attributed, as $B.i$ depends on its right sibling $C$'s attribute (and on the synthesized attribute $A.s$ of the head).

| Production         | Semantic rules                     |
|--------------------|------------------------------------|
| $A \rightarrow BC$ | $A.s = B.b$<br>$B.i = f(C.c, A.s)$ |
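The L-attributed condition can be expressed as a small check on the rules of one production. In this sketch the encoding is an assumption made for the illustration: position 0 stands for the head $A$, positions 1..n for $X_1 \dots X_n$, and each rule is a triple (target position, target kind, list of (position, kind) dependencies). The check does not look for cycles among the attributes of a single symbol.

```python
def is_l_attributed(rules):
    """Check the rules of one production A -> X1 ... Xn.

    Each rule is (target_pos, target_kind, deps), where deps is a list of
    (pos, kind) pairs and kind is "inh" or "syn".
    """
    for target_pos, target_kind, deps in rules:
        if target_kind == "syn":
            continue                      # synthesized attributes are unrestricted
        for pos, kind in deps:
            from_head = pos == 0 and kind == "inh"   # inherited attr of head A
            from_left = 1 <= pos < target_pos        # any attr of a left sibling
            from_self = pos == target_pos            # attr of X_i itself
            if not (from_head or from_left or from_self):
                return False
    return True

# Eg1, T -> F T'      : T'.inh = F.val
eg1_a = [(2, "inh", [(1, "syn")])]
# Eg1, T' -> * F T1'  : T1'.inh = T'.inh * F.val
eg1_b = [(3, "inh", [(0, "inh"), (2, "syn")])]
# Eg2, A -> B C       : A.s = B.b ; B.i = f(C.c, A.s)
eg2 = [(0, "syn", [(1, "syn")]),
       (1, "inh", [(2, "syn"), (0, "syn")])]

if __name__ == "__main__":
    print(is_l_attributed(eg1_a), is_l_attributed(eg1_b), is_l_attributed(eg2))
```

The rules of Eg1 pass because they only look at the head's inherited attribute and at symbols to the left; Eg2 fails because $B.i$ uses the right sibling $C$.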

SIDE EFFECTS

Evaluation of semantic rules may generate intermediate code and may have other effects. Eg: A desk calculator might print a result; a code generator might enter the type of an identifier into a symbol table, perform type checking, or issue error messages. These are known as side effects.

SEMANTIC RULES WITH CONTROLLED SIDE EFFECTS

In practice, translation involves side effects.

Attribute grammars have no side effects and allow any evaluation order consistent with the dependency graph, whereas translation schemes impose left-to-right evaluation and allow semantic actions to contain any program fragment.

## Ways to Control Side Effects

1. Permit incidental side effects that do not constrain attribute evaluation. In other words, permit side effects when attribute evaluation based on any topological sort of the dependency graph produces a correct translation.

2. Impose constraints on the allowable evaluation orders, so that the same translation is produced for any allowable order.

Write an SDD for simple type declarations.

a) i/p: int

Step 1:

| Production                   | Semantic rules                                                  |
|------------------------------|-----------------------------------------------------------------|
| $D \rightarrow T L$          | $L.\text{inh} = T.\text{type}$                                  |
| $T \rightarrow \text{int}$   | $T.\text{type} = \text{int}$                                    |
| $T \rightarrow \text{float}$ | $T.\text{type} = \text{float}$                                  |
| $L \rightarrow L_1, \text{id}$ | $L_1.\text{inh} = L.\text{inh}$<br>$\text{addtype}(\text{id.entry}, L.\text{inh})$ |
| $L \rightarrow \text{id}$    | $\text{addtype}(\text{id.entry}, L.\text{inh})$                 |

Step 2: Annotated parse tree



Step 3: Dependency graph



### Explanation:

Nonterminal $D$ represents a declaration which, from production 1, consists of a type $T$ followed by a list $L$ of identifiers. $T$ has one attribute, $T.type$, which is the type in the declaration $D$. Nonterminal $L$ also has one attribute, which we call $\text{inh}$ to emphasize that it is an inherited attribute. The purpose of $L.inh$ is to pass the declared type down the list of identifiers, so that it can be added to the appropriate symbol-table entries. Productions ② & ③ each evaluate the synthesized attribute $T.type$, giving it the appropriate value, integer or float. This type is passed to the attribute $L.inh$ in the rule of production 1. Production 4 passes $L.inh$ down the parse tree, i.e., the value of $L_1.inh$ is computed at a parse tree node by copying the value of $L.inh$ from the parent of that node; the parent corresponds to the head of the production. Productions ④ & ⑤ also have a rule in which a function $\text{addtype}$ is called with 2 arguments:

- ↳ $\text{id.entry}$, a lexical value that points to a symbol table object.
- ↳ $L.inh$, the type being assigned to every identifier on the list.

The function $\text{addtype}$ properly installs the type $L.inh$ as the type of the represented identifier. Note that the side effect, adding the type information to the table, does not affect the evaluation order.
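The flow of the inherited attribute and the side effect of addtype can be imitated in a few lines. This is a simplified sketch, not a full parser: the symbol table is a plain dictionary, `id.entry` is represented by the identifier's name, and the function names `eval_D` and `eval_L` are assumptions made for the example.

```python
def addtype(table, entry, typ):
    # Side effect: record the type of the identifier in the symbol table.
    table[entry] = typ

def eval_L(ids, inh, table):
    """L -> L1 , id | id, with inherited attribute L.inh."""
    if len(ids) > 1:
        eval_L(ids[:-1], inh, table)   # L1.inh = L.inh
    addtype(table, ids[-1], inh)       # addtype(id.entry, L.inh)

def eval_D(tokens, table):
    """D -> T L : L.inh = T.type."""
    t_type = tokens[0]                 # T -> int | float gives T.type
    if t_type not in ("int", "float"):
        raise SyntaxError(f"type expected, got {t_type!r}")
    ids = [tok for tok in tokens[1:] if tok != ","]
    if not ids:
        raise SyntaxError("identifier list expected")
    eval_L(ids, t_type, table)

if __name__ == "__main__":
    symtab = {}
    eval_D("float a , b , c".split(), symtab)
    print(symtab)
```

Because each call to `addtype` touches a different entry, the order in which the identifiers are processed does not change the final table, which is why this side effect is harmless.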

b) int a, b

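Step 1

| Production                     | Semantic rules                                                    |
|--------------------------------|-------------------------------------------------------------------|
| $D \rightarrow T L$            | $L.inh = T.type$                                                  |
| $T \rightarrow \text{int}$     | $T.type = \text{int}$                                             |
| $T \rightarrow \text{float}$   | $T.type = \text{float}$                                           |
| $L \rightarrow L_1, \text{id}$ | $L_1.inh = L.inh$<br>$\text{addtype}(\text{id.entry}, L.inh)$     |
| $L \rightarrow \text{id}$      | $\text{addtype}(\text{id.entry}, L.inh)$                          |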

Step 2: Annotated parse tree



Step 3: Dependency graph



Step 4: Topological order ① ② ③ ④ ⑤ ⑥ ⑦ ⑧ ⑨


(c) float a, b, c or int a, b, c <Exercise 5.2.2>

Step 1: SDD definition

| Production                     | Semantic rules                                                |
|--------------------------------|---------------------------------------------------------------|
| $D \rightarrow T L$            | $L.inh = T.type$                                              |
| $T \rightarrow \text{int}$     | $T.type = \text{int}$                                         |
| $T \rightarrow \text{float}$   | $T.type = \text{float}$                                       |
| $L \rightarrow L_1, \text{id}$ | $L_1.inh = L.inh$<br>$\text{addtype}(\text{id.entry}, L.inh)$ |
| $L \rightarrow \text{id}$      | $\text{addtype}(\text{id.entry}, L.inh)$                      |

Step 2: Annotated parse tree



Step 3: Dependency graph



Step 4: Topological Ordering: 1 2 3 . . . 10

(d) float w, x, y, z

Step 1: SDD definition

// Same as previous problem.

Step 2: Annotated parse tree



Step 3: Dependency graph



Exercise 5.2.4 This grammar generates binary numbers with a "decimal" point:

$$S \rightarrow L . L \mid L$$

$$L \rightarrow L B \mid B$$

$$B \rightarrow 0 \mid 1$$

Design an L-attributed SDD to compute S.val, the decimal-number value of an input string. For eg., the translation of string 101.101 should be the decimal number 5.625.

Sol: $L.inh$ is the exponent (power of 2) of the rightmost bit of $L$, $L.syn$ is the value of $L$, and $L.len$ is the number of bits in $L$.

| Production                   | Semantic rules                                                                                     |
|------------------------------|----------------------------------------------------------------------------------------------------|
| 1) $S \rightarrow L_1 . L_2$ | $L_1.inh = 0$<br>$L_2.inh = -L_2.len$<br>$S.val = L_1.syn + L_2.syn$                               |
| 2) $S \rightarrow L$         | $L.inh = 0$<br>$S.val = L.syn$                                                                     |
| 3) $L \rightarrow L_1 B$     | $L_1.inh = L.inh + 1$<br>$L.syn = L_1.syn + B.syn * 2^{L.inh}$<br>$L.len = L_1.len + 1$            |
| 4) $L \rightarrow B$         | $L.syn = B.syn * 2^{L.inh}$<br>$L.len = 1$                                                         |
| 5) $B \rightarrow 0 \mid 1$  | $B.syn = \text{digit.lexval}$                                                                      |
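A quick way to sanity-check a scheme for this exercise is to run it on 101.101. The sketch below is one possible reading, with its assumptions spelled out: the inherited attribute is taken to be the exponent of the rightmost bit of a list $L$, it grows by one for each step to the left, and the fractional list starts at minus its own length. The function names are made up for the example, and the input is assumed to contain only 0, 1 and at most one point.

```python
def eval_L(bits, inh):
    """Value of the bit list `bits`, where `inh` is the exponent of its last bit."""
    b = int(bits[-1])                      # B.syn = digit.lexval
    if len(bits) == 1:                     # L -> B
        return b * 2.0 ** inh
    # L -> L1 B : L1.inh = L.inh + 1 ; L.syn = L1.syn + B.syn * 2^(L.inh)
    return eval_L(bits[:-1], inh + 1) + b * 2.0 ** inh

def eval_S(s):
    """S -> L . L | L : returns S.val."""
    whole, _, frac = s.partition(".")
    val = eval_L(whole, 0)                 # integer part: last bit has exponent 0
    if frac:
        val += eval_L(frac, -len(frac))    # fraction: last bit has exponent -len
    return val

if __name__ == "__main__":
    print(eval_S("101.101"))
```

Since all the terms are powers of two, the floating-point result is exact for short inputs such as this one.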

Exercise 5.2.5 Design an S-attributed SDD for the grammar and translation described in 5.2.4.

Here $L.lhs$ is the value of $L$ read as an integer part, $L.rhs$ is the value of $L$ read as a fractional part, and $L.rhs\_exponent$ is the exponent of the last bit of $L$ when read as a fraction.

| Production                   | Semantic rules                                                                                                                           |
|------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|
| 1) $S \rightarrow L_1 . L_2$ | $S.val = L_1.lhs + L_2.rhs$                                                                                                              |
| 2) $S \rightarrow L$         | $S.val = L.lhs$                                                                                                                          |
| 3) $L \rightarrow L_1 B$     | $L.lhs = 2 * L_1.lhs + B.val$<br>$L.rhs\_exponent = L_1.rhs\_exponent - 1$<br>$L.rhs = L_1.rhs + B.val * 2^{L.rhs\_exponent}$            |
| 4) $L \rightarrow B$         | $L.lhs = B.val$<br>$L.rhs\_exponent = -1$<br>$L.rhs = B.val * 2^{-1}$                                                                    |
| 5) $B \rightarrow 0 \mid 1$  | $B.val = \text{digit.lexval}$                                                                                                            |

## Application Of Syntax-Directed Translation

## 1. Construction Of Syntax Tree

SAR's are useful for construct<sup>n</sup> of Syntax tree  
A syntax tree is condensed form of parse tree



\* Syntax trees are useful for representing programming language constructs such as expressions and statements.

- \* They help compiler design by decoupling parsing from translation.
- \* Each node of a syntax tree represents a construct; the children of the node represent the meaningful components of the construct.

Eg: A syntax-tree node representing an expression $E_1 + E_2$ has label + and two children representing the subexpressions $E_1$ and $E_2$.
- \* Each node is implemented by an object with a suitable number of fields; each object has an op field that is the label of the node, with additional fields as follows:
  - If the node is a leaf, an additional field holds the lexical value for the leaf. Such a node is created by the function Leaf(op, val).
  - If the node is an interior node, there are as many additional fields as the node has children in the syntax tree. Such a node is created by the function Node(op, c<sub>1</sub>, c<sub>2</sub>, ..., c<sub>k</sub>).

Example: The S-attributed definition in the figure below constructs syntax trees for a simple expression grammar involving only the binary operators + and -. As usual, these operators are at the same precedence level and are jointly left associative. All nonterminals have one synthesized attribute, node, which represents a node of the syntax tree.

$a - 4 + c$

| PRODUCTION                    | SEMANTIC RULES                                                         |
|-------------------------------|------------------------------------------------------------------------|
| 1. $E \rightarrow E_1 + T$    | $E.\text{node} = \text{new Node}('+', E_1.\text{node}, T.\text{node})$ |
| 2. $E \rightarrow E_1 - T$    | $E.\text{node} = \text{new Node}('-', E_1.\text{node}, T.\text{node})$ |
| 3. $E \rightarrow T$          | $E.\text{node} = T.\text{node}$                                        |
| 4. $T \rightarrow (E)$        | $T.\text{node} = E.\text{node}$                                        |
| 5. $T \rightarrow \text{id}$  | $T.\text{node} = \text{new Leaf}(\text{id}, \text{id.entry})$          |
| 6. $T \rightarrow \text{num}$ | $T.\text{node} = \text{new Leaf}(\text{num}, \text{num.val})$          |

Step 2 : Parse tree

## Syntax tree



Step 3: Syntax tree for $a - 4 + c$ using the above SDD.



Steps in the construction of the syntax tree for $a - 4 + c$

If the rules are evaluated during a postorder traversal of the parse tree, or with reductions during a bottom-up parse, then the sequence of steps shown below ends with $p_5$ pointing to the root of the constructed syntax tree.

- 1)  $p_1 = \text{new Leaf}(\text{id}, \text{entry-}a)$
- 2)  $p_2 = \text{new Leaf}(\text{num}, 4)$
- 3)  $p_3 = \text{new Node}('-', p_1, p_2)$
- 4)  $p_4 = \text{new Leaf}(\text{id}, \text{entry-}c)$
- 5)  $p_5 = \text{new Node}('+', p_3, p_4)$
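The five steps can be replayed with a minimal Python sketch. The class names `Leaf` and `Node`, the helpers `new_leaf`, `new_node` and `to_text`, and the use of plain strings in place of symbol-table entries are assumptions made for illustration; they are not part of the notes.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Leaf:
    op: str    # token class, e.g. "id" or "num"
    val: Any   # lexical value (symbol-table entry or number)


@dataclass(frozen=True)
class Node:
    op: str                     # operator label, e.g. "+" or "-"
    children: Tuple[Any, ...]   # one field per child


def new_leaf(op, val):
    return Leaf(op, val)


def new_node(op, *children):
    return Node(op, tuple(children))


def to_text(n):
    """Render a tree in prefix form so its shape can be inspected."""
    if isinstance(n, Leaf):
        return str(n.val)
    return "(" + n.op + " " + " ".join(to_text(c) for c in n.children) + ")"


# The five steps for a - 4 + c, in postorder.
p1 = new_leaf("id", "a")      # "a" stands in for entry-a
p2 = new_leaf("num", 4)
p3 = new_node("-", p1, p2)
p4 = new_leaf("id", "c")      # "c" stands in for entry-c
p5 = new_node("+", p3, p4)

print(to_text(p5))  # (+ (- a 4) c)
```

The printed prefix form shows that the subtraction sits below the addition, which is what left associativity requires.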

### Constructing Syntax Trees during Top down parsing

With a grammar designed for top-down parsing, the same syntax trees are constructed, using the same sequence of steps, even though the structure of the parse trees differs significantly from that of the syntax trees. The L-attributed definition below performs the same translation as the S-attributed definition shown before.

| PRODUCTION                    | SEMANTIC RULES                                                                                                    |
|-------------------------------|-------------------------------------------------------------------------------------------------------------------|
| 1) $E \rightarrow T E'$       | $E.\text{node} = E'.\text{syn}$<br>$E'.\text{inh} = T.\text{node}$                                                |
| 2) $E' \rightarrow + T E'_1$  | $E'_1.\text{inh} = \text{new Node}('+', E'.\text{inh}, T.\text{node})$<br>$E'.\text{syn} = E'_1.\text{syn}$       |
| 3) $E' \rightarrow - T E'_1$  | $E'_1.\text{inh} = \text{new Node}('-', E'.\text{inh}, T.\text{node})$<br>$E'.\text{syn} = E'_1.\text{syn}$       |
| 4) $E' \rightarrow \epsilon$  | $E'.\text{syn} = E'.\text{inh}$                                                                                   |
| 5) $T \rightarrow (E)$        | $T.\text{node} = E.\text{node}$                                                                                   |
| 6) $T \rightarrow \text{id}$  | $T.\text{node} = \text{new Leaf}(\text{id}, \text{id.entry})$                                                     |
| 7) $T \rightarrow \text{num}$ | $T.\text{node} = \text{new Leaf}(\text{num}, \text{num.val})$                                                     |
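One way to see how the inherited attribute carries the partially built tree is a recursive-descent sketch in Python. Here the parameter `inh` plays the role of E'.inh and the returned value plays the role of E'.syn. The tokenizer, the function names and the tuple encoding of tree nodes are assumptions for illustration only.

```python
import re


def tokenize(text):
    # numbers, identifiers, and the symbols + - ( )
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+()]", text)


def parse_E(tokens, pos=0):
    """E -> T E'   with E'.inh = T.node and E.node = E'.syn."""
    t_node, pos = parse_T(tokens, pos)
    return parse_E_prime(tokens, pos, t_node)


def parse_E_prime(tokens, pos, inh):
    """E' -> + T E'1 | - T E'1 | epsilon; `inh` is the tree built so far."""
    if pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        t_node, pos = parse_T(tokens, pos + 1)
        # E'1.inh = new Node(op, E'.inh, T.node); E'.syn = E'1.syn
        return parse_E_prime(tokens, pos, (op, inh, t_node))
    return inh, pos  # E' -> epsilon: E'.syn = E'.inh


def parse_T(tokens, pos):
    """T -> ( E ) | id | num."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        node, pos = parse_E(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return node, pos + 1
    if tok.isdigit():
        return ("num", int(tok)), pos + 1
    if tok.isidentifier():
        return ("id", tok), pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")


tree, _ = parse_E(tokenize("a - 4 + c"))
print(tree)  # ('+', ('-', ('id', 'a'), ('num', 4)), ('id', 'c'))
```

Although the grammar is right recursive, passing the left operand down as `inh` still yields a left-associative tree, the same one the S-attributed definition builds.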

Dependency graph for $a - 4 + c$ with the L-attributed SDD (inherited attributes)



### STRUCTURE OF A TYPE

This is an example of how inherited attributes can be used to carry information from one part of the parse tree to another. In C, the type $\text{int}[2][3]$ can be read as "array of 2 arrays of 3 integers". The corresponding type expression $\text{array}(2, \text{array}(3, \text{integer}))$ is represented by the tree shown below.



| Production                           | Semantic rules                                                 |
|--------------------------------------|----------------------------------------------------------------|
| 1) $T \rightarrow B\,C$              | $T.t = C.t$<br>$C.b = B.t$                                     |
| 2) $B \rightarrow \text{int}$        | $B.t = \text{integer}$                                         |
| 3) $B \rightarrow \text{float}$      | $B.t = \text{float}$                                           |
| 4) $C \rightarrow [\text{num}]\,C_1$ | $C.t = \text{array}(\text{num}.val, C_1.t)$<br>$C_1.b = C.b$   |
| 5) $C \rightarrow \epsilon$          | $C.t = C.b$                                                    |
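The flow of attributes for this grammar can be traced with a short Python sketch. The argument `b` carries the inherited attribute C.b down the chain of C's, and the return value carries the synthesized attribute C.t back up. The token-list input and the tuple encoding of `array(...)` are assumptions for illustration.

```python
BASIC_TYPES = {"int": "integer", "float": "float"}


def parse_T(tokens):
    """T -> B C   with C.b = B.t and T.t = C.t."""
    b_t = BASIC_TYPES[tokens[0]]          # B.t
    return parse_C(tokens, 1, b_t)


def parse_C(tokens, pos, b):
    """`b` is the inherited attribute C.b; the return value is C.t."""
    if pos == len(tokens):                # C -> epsilon: C.t = C.b
        return b
    if tokens[pos] != "[" or tokens[pos + 2] != "]":
        raise SyntaxError("expected [num]")
    num_val = int(tokens[pos + 1])
    c1_t = parse_C(tokens, pos + 3, b)    # C1.b = C.b
    return ("array", num_val, c1_t)       # C.t = array(num.val, C1.t)


print(parse_T(["int", "[", "2", "]", "[", "3", "]"]))
# ('array', 2, ('array', 3, 'integer'))
```

The basic type reaches the innermost C unchanged, and the array constructors are applied on the way back, so the first dimension ends up outermost.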

- The nonterminals $B$ and $T$ have a synthesized attribute $t$ representing a type.
- The nonterminal $C$ has 2 attributes: an inherited attribute ($b$) and a synthesized attribute ($t$).
- The inherited attribute $b$ passes a basic type down the tree.
- The synthesized attribute $t$ accumulates the result.
- Annotated parse trees and dependency graphs for the inputs $\text{int}[2]$ and $\text{int}[2][3]$ follow.

(i)  $\text{int}[2]$



Dependency graph



(ii)  $\text{int}[2][3]$

Annotated parse tree:



## Dependency graph



## SYNTAX DIRECTED TRANSLATION SCHEME :

SDT is a complementary notation to SDD.

- \* All applications of SDDs can be implemented using SDTs.
- \* An SDT is a CFG with program fragments, called semantic actions, embedded within production bodies.
- \* Any SDT can be implemented by first building a parse tree and then performing the actions in a left-to-right, depth-first order, i.e., during a preorder traversal.
- \* Typically, SDTs are implemented during parsing, without building a parse tree. During parsing, an action in a production body is executed as soon as all the grammar symbols to the left of the action have been matched.
- \* SDTs that can be implemented during parsing can be characterized by introducing a distinct marker nonterminal in place of each embedded action.
- \* Each marker M has only one production, $M \rightarrow \epsilon$.
- \* If the grammar with marker nonterminals can be parsed by a given method, then the SDT can be implemented during parsing.

## UNIT-6

# INTERMEDIATE CODE

## GENERATION

### Intermediate Code generation

In the analysis-synthesis model of a compiler, the front end analyzes a source program and creates an intermediate representation, from which the back end generates target code.



Logical structure of a compiler front end

Parsing, static checking and intermediate code generation are done sequentially; sometimes they can be combined and folded into parsing.

Static checking includes type checking, which ensures that operators are applied to compatible operands. It also includes any syntactic checks that remain after parsing. Ex: It ensures that a break-statement in C is enclosed within a while-, for- or switch-statement; an error is reported if such an enclosing statement does not exist.

In the process of translating a program in a given source language into code for a given target machine, a compiler may construct a sequence of intermediate representations:



High level representations are close to the source language and are well suited to tasks like static type checking.

Ex: Syntax tree

Low level representations are close to the target machine & are suitable for machine dependent tasks like register allocation and instruction selection.

An intermediate representation may either be an actual language or it may consist of internal data structures that are shared by the phases of the compiler.

### Variants of Syntax Trees

Nodes in a syntax tree represent constructs in the source program; the children of a node represent the meaningful components of a construct.

A directed acyclic graph (DAG) for an expression identifies the common subexpressions of the expression.

# Directed Acyclic Graphs for Expressions.

As in a syntax tree, the leaves of a DAG represent atomic operands and the interior nodes represent operators. A node $N$ in a DAG has more than one parent if $N$ represents a common subexpression; in a syntax tree, the tree for the common subexpression would be duplicated as many times as the subexpression appears in the original expression.

DAG gives the compiler important clues regarding the generation of efficient code to evaluate the expressions .

Ex: DAG for the expression

$$a + a * (b - c) + (b - c) * d$$



↳ The leaf for 'a' has 2 parents, because 'a' appears twice in the expression.

↳ The 2 occurrences of the common subexpression $b - c$ are represented by one node, the node labeled '-'.

## SDD to produce DAG.

| PRODUCTION                       | SEMANTIC RULES                                                          |
|----------------------------------|-------------------------------------------------------------------------|
| (i) $E \rightarrow E_1 + T$      | $E.\text{node} = \text{new Node} ('+', E_1.\text{node}, T.\text{node})$ |
| (ii) $E \rightarrow E_1 - T$     | $E.\text{node} = \text{new Node} ('-', E_1.\text{node}, T.\text{node})$ |
| (iii) $E \rightarrow T$          | $E.\text{node} = T.\text{node}$                                         |
| (iv) $T \rightarrow (E)$         | $T.\text{node} = E.\text{node}$                                         |
| (v) $T \rightarrow \text{id}$    | $T.\text{node} = \text{new Leaf}(\text{id}, \text{id.entry})$           |
| (vi) $T \rightarrow \text{num}$  | $T.\text{node} = \text{new Leaf}(\text{num}, \text{num}.val)$           |

This SDD constructs a DAG if, before creating a new node, the functions Leaf and Node first check whether an identical node already exists.

If a previously created identical node exists, the existing node is returned.

Steps for constructing the DAG for above example .

(i) $P_1 = \text{Leaf}(\text{id}, \text{entry-}a)$

(ii) $P_2 = \text{Leaf}(\text{id}, \text{entry-}a) = P_1$

(iii) $P_3 = \text{Leaf}(\text{id}, \text{entry-}b)$

(iv) $P_4 = \text{Leaf}(\text{id}, \text{entry-}c)$

(v) $P_5 = \text{Node}('-', P_3, P_4)$

(vi) $P_6 = \text{Node}('*', P_1, P_5)$

(vii) $P_7 = \text{Node}('+', P_1, P_6)$

(viii) $P_8 = \text{Leaf}(\text{id}, \text{entry-}b) = P_3$

(ix) $P_9 = \text{Leaf}(\text{id}, \text{entry-}c) = P_4$

(x) $P_{10} = \text{Node}('-', P_3, P_4) = P_5$

(xi) $P_{11} = \text{Leaf}(\text{id}, \text{entry-}d)$

(xii) $P_{12} = \text{Node}('*', P_5, P_{11})$

(xiii) $P_{13} = \text{Node}('+', P_7, P_{12})$

When the call to Leaf(id, entry-a) is repeated at step (ii), the node created by the previous call is returned, so $P_2 = P_1$. Similarly, the nodes returned at steps (viii) and (ix) are the same as those returned at steps (iii) and (iv), and the node returned at step (x) is the one constructed at step (v).
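The "check before creating" behaviour of Leaf and Node can be sketched in Python as follows. The class `DagBuilder`, its dictionary of already-built nodes, and the tuple encoding of nodes are assumptions for illustration; the notes do not prescribe a data structure at this point.

```python
class DagBuilder:
    """Builds DAG nodes, returning an existing node when an identical one exists."""

    def __init__(self):
        self.nodes = {}  # maps a node's contents to the single shared node object

    def _intern(self, candidate):
        # setdefault returns the stored node if an equal one was created before
        return self.nodes.setdefault(candidate, candidate)

    def leaf(self, op, val):
        return self._intern(("leaf", op, val))

    def node(self, op, left, right):
        return self._intern(("node", op, left, right))


b = DagBuilder()
p1 = b.leaf("id", "a")
p2 = b.leaf("id", "a")        # same node as p1
p3 = b.leaf("id", "b")
p4 = b.leaf("id", "c")
p5 = b.node("-", p3, p4)
p6 = b.node("*", p1, p5)
p7 = b.node("+", p1, p6)
p8 = b.leaf("id", "b")        # same node as p3
p9 = b.leaf("id", "c")        # same node as p4
p10 = b.node("-", p8, p9)     # same node as p5
p11 = b.leaf("id", "d")
p12 = b.node("*", p5, p11)
p13 = b.node("+", p7, p12)

print(p2 is p1, p10 is p5, len(b.nodes))  # True True 9
```

Thirteen calls produce only nine distinct nodes: four leaves and five operator nodes.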

### The Value-Number Method for Constructing DAGs

- \* The nodes of a DAG are stored in an array of records
- \* Each row of array represents one record & therefore one node.
- \* In each record, the first field is an operation code, indicating the label of the node. Leaves have one additional field which holds the lexical value and interior nodes have 2 additional fields indicating the left and right children.

Ex: DAG for $i = i + 10$ allocated in an array



DAG

| Value number | op  | field 1                            | field 2 |
|--------------|-----|------------------------------------|---------|
| 1            | id  | (pointer to the entry for $i$)     |         |
| 2            | num | 10                                 |         |
| 3            | +   | 1                                  | 2       |
| 4            | =   | 1                                  | 3       |
| 5            |     |                                    |         |

Array

- \* In this array, we refer to nodes by giving the integer index of the record for that node within the array.
- \* This integer historically has been called the "value number" for the node or for the expression represented by the node.
- \* In the above example, the node labeled '+' has value number 3, and its left and right children have value numbers 1 and 2 respectively.

Suppose that nodes are stored in an array and each node is referred to by its value number. Let the signature of an interior node be the triple $\langle op, l, r \rangle$, where $op$ is the label, $l$ is its left child's value number and $r$ is its right child's value number. A unary operator may be assumed to have $r = 0$.

ALGORITHM: To construct the nodes of a DAG using value number method.

INPUT: Label  $op$ , node  $l$  and node  $r$

OUTPUT: The value number of a node in the array with signature  $\langle op, l, r \rangle$

METHOD: Search the array for a node  $M$  with label  $op$ , left child  $l$  and right child  $r$ . If there is such a node, return the value number of  $M$ . If not, create in the array a new node  $N$  with label  $op$ , left child  $l$  and right child  $r$  & return its value number.

The algorithm above yields the desired output, but searching the entire array every time we are asked to locate one node is expensive.

A more efficient approach is to use a hash table, in which the nodes are put into "buckets", each of which typically holds only a few nodes. A hash table supports dictionaries; a dictionary is an abstract data type that allows us to insert and delete elements of a set and to determine whether a given element is currently in the set.

To construct a hash table for the nodes of a DAG, we need a hash function $h$ that computes the index of the bucket for a signature $\langle op, l, r \rangle$.

The bucket index $h(\langle op, l, r \rangle)$ is computed deterministically from $op$, $l$ and $r$, so that we may repeat the calculation and always get the same bucket index for node $\langle op, l, r \rangle$.  
The buckets can be implemented as linked lists:



An array, indexed by hash value, holds the bucket headers, each of which points to the first cell of a list. Within the linked list for a bucket, each cell holds the value number of one of the nodes that hash to that bucket. That is, node $\langle op, l, r \rangle$ can be found on the list whose header is at index $h(\langle op, l, r \rangle)$ of the array.

Thus, given the input node $op$, $l$ and $r$, we compute the bucket index $h(\langle op, l, r \rangle)$ and search the list of cells in this bucket for the given input node.

For each value number $v$ found in a cell, we must check whether the signature $\langle op, l, r \rangle$ of the input node matches the node with value number $v$. If we find a match, we return $v$. If we find no match, we know that no such node can exist in any other bucket, so we create a new cell, add it to the list of cells for bucket index $h(\langle op, l, r \rangle)$, and return the value number in that new cell.
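The bucket search described above might be sketched in Python as follows. Several choices here are assumptions for illustration: Python's built-in `hash` stands in for the hash function $h$, each bucket is a Python list rather than a linked list, leaves are stored with the signature (op, lexical value, 0), and the class and method names are invented.

```python
class ValueNumberTable:
    def __init__(self, n_buckets=16):
        self.nodes = [None]   # index 0 unused, so value numbers start at 1
        self.buckets = [[] for _ in range(n_buckets)]

    def _h(self, signature):
        # deterministic within one run: same signature, same bucket index
        return hash(signature) % len(self.buckets)

    def lookup_or_add(self, op, l, r=0):
        """Return the value number of the node with signature <op, l, r>."""
        signature = (op, l, r)
        bucket = self.buckets[self._h(signature)]
        for v in bucket:                   # search only this bucket
            if self.nodes[v] == signature:
                return v                   # an identical node already exists
        self.nodes.append(signature)       # otherwise create a new node
        v = len(self.nodes) - 1
        bucket.append(v)                   # and record it in the bucket
        return v


table = ValueNumberTable()
v_i = table.lookup_or_add("id", "i")         # 1
v_10 = table.lookup_or_add("num", 10)        # 2
v_sum = table.lookup_or_add("+", v_i, v_10)  # 3
v_asg = table.lookup_or_add("=", v_i, v_sum) # 4
print(v_i, v_10, v_sum, v_asg)               # 1 2 3 4
print(table.lookup_or_add("+", v_i, v_10))   # 3 (found, not re-created)
```

The value numbers agree with the array shown for $i = i + 10$, and a repeated request for the node labeled '+' returns the existing value number.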

### Problems.

Construct the DAG for the expression.

$$((x+y) - ((x+y) * (x-y))) + ((x+y) * (x-y))$$



Construct the DAG and identify the value numbers for the subexpressions of the following expressions, assuming + associates from the left.

(a)  $a+b+(a+b)$



| Value number | op | field 1 | field 2 |
|--------------|----|---------|---------|
| 1            | id | a       |         |
| 2            | id | b       |         |
| 3            | +  | 1       | 2       |
| 4            | +  | 3       | 3       |

(b)  $a + b + a + b$



| Value number | op | field 1 | field 2 |
|--------------|----|---------|---------|
| 1            | id | a       |         |
| 2            | id | b       |         |
| 3            | +  | 1       | 2       |
| 4            | +  | 3       | 1       |
| 5            | +  | 4       | 2       |

(c)  $a + a + (a + a + a + (a + a + a + a))$



| Value number | op | field 1 | field 2 |
|--------------|----|---------|---------|
| 1            | id | a       |         |
| 2            | +  | 1       | 1       |
| 3            | +  | 2       | 1       |
| 4            | +  | 3       | 1       |
| 5            | +  | 3       | 4       |
| 6            | +  | 2       | 5       |

## Three - Address Code

In 3-address code, there is at most one operator on the right side of an instruction, i.e., no built-up arithmetic expressions are permitted.

Thus a source-language expression like  $x+y * z$  might be translated into the sequence of 3 address instructions

$$t_1 = y * z$$

$$t_2 = x + t_1$$

where  $t_1$  &  $t_2$  are compiler generated names.

3 address code is a linearized representation of a DAG in which explicit names correspond to the interior nodes of the graph.

Ex: Write DAG & its corresponding 3 address code for the expression  $a + a * (b - c) + (b - c) * d$ .



DAG

$$t_1 = b - c$$

$$t_2 = a * t_1$$

$$t_3 = a + t_2$$

$$t_4 = t_1 * d$$

$$t_5 = t_3 + t_4$$

3 -address code
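The correspondence between interior DAG nodes and temporaries can be sketched in Python. The function name `gen_three_address`, the tuple encoding of expressions, and the dictionary that remembers which temporary holds each subexpression are assumptions for illustration.

```python
def gen_three_address(expr):
    """Emit 3-address code, giving one temporary to each distinct subexpression."""
    code = []
    names = {}  # subexpression -> address (temporary) that holds its value

    def walk(e):
        if isinstance(e, str):        # a leaf: the name is its own address
            return e
        if e in names:                # common subexpression: reuse its temporary
            return names[e]
        op, left, right = e
        l = walk(left)
        r = walk(right)
        t = f"t{len(code) + 1}"
        code.append(f"{t} = {l} {op} {r}")
        names[e] = t
        return t

    walk(expr)
    return code


b_minus_c = ("-", "b", "c")
expr = ("+", ("+", "a", ("*", "a", b_minus_c)), ("*", b_minus_c, "d"))
for line in gen_three_address(expr):
    print(line)
# t1 = b - c
# t2 = a * t1
# t3 = a + t2
# t4 = t1 * d
# t5 = t3 + t4
```

Because $b - c$ is looked up before it is recomputed, it is evaluated once into t1 and reused in t4, mirroring the shared '-' node of the DAG.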

## Addresses and Instructions

3-address code is built from 2 concepts: addresses and instructions.

An address can be one of the following.

- ↳ A name : For convenience, we allow source program names to appear as addresses in 3-address code. In an implementation, a source name is replaced by a pointer to its symbol table entry, where all information about the name is kept.
- ↳ A constant : A compiler must deal with many different types of constants and variables.
- ↳ A compiler generated temporary : It is useful, especially in optimizing compilers, to create a distinct name each time a temporary is needed.

Symbolic labels will be used by instructions that alter the flow of control. A symbolic label represents the index of a 3-address instruction in the sequence of instructions. Actual indexes can be substituted for the labels, either by making a separate pass or by "backpatching".

Here is a list of the common 3-address instruction forms:

- (i) Assignment instructions of the form $x = y \text{ op } z$, where $\text{op}$ is a binary arithmetic or logical operation and $x$, $y$ and $z$ are addresses.
- (ii) Assignments of the form $x = \text{op } y$, where $\text{op}$ is a unary operation.
- (iii) Copy instructions of the form $x = y$, where $x$ is assigned the value of $y$.
- (iv) An unconditional jump goto L. The 3-address instruction with label L is the next to be executed.
- (v) Conditional jumps of the form if $x$ goto L and ifFalse $x$ goto L. These instructions execute the instruction with label L next if $x$ is true and false, respectively. Otherwise, the following 3-address instruction in sequence is executed next, as usual.
- (vi) Conditional jumps such as if $x$ relop $y$ goto L, which apply a relational operator ($<$, $==$, $\geq$, etc.) to $x$ and $y$ and execute the instruction with label L next if $x$ stands in relation relop to $y$. If not, the 3-address instruction following if $x$ relop $y$ goto L is executed next, in sequence.
- (vii) Procedure calls and returns are implemented using the following instructions: param x for parameters; call p, n and y = call p, n for procedure and function calls respectively; and return y, where y, representing a returned value, is optional. A call of the procedure p(x1, x2, ..., xn) generates the sequence

```
param x1
param x2
...
param xn
call p, n
```

The integer n, indicating the number of actual parameters in call p, n, is not redundant because calls can be nested.

- (viii) Indexed copy instructions of the form $x = y[i]$ and $x[i] = y$. The instruction $x = y[i]$ sets $x$ to the value in the location $i$ memory units beyond location $y$. The instruction $x[i] = y$ sets the contents of the location $i$ units beyond $x$ to the value of $y$.
- (ix) Address and pointer assignments of the form $x = \&y$, $x = *y$ and $*x = y$.

Ex: Consider the statement

do $i = i + 1$; while ($a[i] < v$);

2 possible translations of this statement are

L : $t_1 = i + 1$

$i = t_1$

$t_2 = i * 8$

$t_3 = a[t_2]$

if $t_3 < v$ goto L

(a) Symbolic labels

100 : $t_1 = i + 1$

101 : $i = t_1$

102 : $t_2 = i * 8$

103 : $t_3 = a[t_2]$

104 : if $t_3 < v$ goto 100

(b) Position numbers

The translation in (a) uses a symbolic label L, attached to the first instruction. The translation in (b) shows position numbers for the instructions, starting arbitrarily at position 100. In both translations, the last instruction is a conditional jump to the first instruction. The multiplication $i * 8$ is appropriate for an array of elements that each take 8 units of space.

### Quadruples

A quadruple has 4 fields, which we call op, arg<sub>1</sub>, arg<sub>2</sub>, & result. The op field contains an internal code for the operator.

Ex:  $x = y + z$  is represented by placing + in op, y in arg<sub>1</sub>, z in arg<sub>2</sub> & x in result.

The following are some exceptions to this rule.

(i) Instructions with unary operators like $x = \text{minus } y$ or $x = y$ do not use $\text{arg}_2$. Note that for a copy statement like $x = y$, op is $=$, while for most other operations the assignment operator is implied.

(ii) Operators like $\text{param}$ use neither $\text{arg}_2$ nor $\text{result}$.

(iii) Conditional and unconditional jumps put the target label in $\text{result}$.

Ex: Write quadruples for  $a = b * -c + b * -c$ .

$t_1 = \text{minus } c$

$t_2 = b * t_1$

$t_3 = \text{minus } c$

$t_4 = b * t_3$

$t_5 = t_2 + t_4$

$a = t_5$

|   | op    | arg1  | arg2  | result |
|---|-------|-------|-------|--------|
| 0 | minus | c     |       | $t_1$  |
| 1 | *     | b     | $t_1$ | $t_2$  |
| 2 | minus | c     |       | $t_3$  |
| 3 | *     | b     | $t_3$ | $t_4$  |
| 4 | +     | $t_2$ | $t_4$ | $t_5$  |
| 5 | =     | $t_5$ |       | a      |

(a) 3-address code

(b) Quadruples

The special operator $\text{minus}$ is used to distinguish the unary minus operator, as in $-c$, from the binary minus operator, as in $b - c$.

### Triples

A triple has only 3 fields: op, arg1 and arg2. Note that the result field in quadruples is used primarily for temporary names. Using triples, we refer to the result of an operation $x \text{ op } y$ by its position, rather than by an explicit temporary name.

Ex: write triples for  $a = b * -c + b * -c$



(a) Syntax tree

|   | op    | arg1 | arg2 |
|---|-------|------|------|
| 0 | minus | c    |      |
| 1 | *     | b    | (0)  |
| 2 | minus | c    |      |
| 3 | *     | b    | (2)  |
| 4 | +     | (1)  | (3)  |
| 5 | =     | a    | (4)  |

(b) Triples

A ternary operation like $x[i] = y$ requires 2 entries in the triple structure; for example, we can put $x$ and $i$ in one triple and $y$ in the next.

The benefits of quadruples over triples can be seen in an optimizing compiler, where instructions are often moved around. With quadruples, if we move an instruction that computes a temporary  $t$ , then the instructions that use  $t$  require no change. With triples, the result of an operation is referred to by its position, so moving an instruction may require us to change all references to that result.
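The difference between the two representations can be made concrete with a Python sketch that produces both from the same expression tree. The function names, the tuple encoding of the tree, and the use of an integer to denote a position reference such as (0) are assumptions for illustration.

```python
def to_quadruples(target, tree):
    """Return (op, arg1, arg2, result) tuples; results are named temporaries."""
    quads = []

    def walk(e):
        if isinstance(e, str):
            return e
        op, *args = e
        addrs = [walk(a) for a in args]
        temp = f"t{len(quads) + 1}"
        arg2 = addrs[1] if len(addrs) > 1 else None
        quads.append((op, addrs[0], arg2, temp))
        return temp

    result = walk(tree)
    quads.append(("=", result, None, target))
    return quads


def to_triples(target, tree):
    """Return (op, arg1, arg2) tuples; an int argument refers to a position."""
    triples = []

    def walk(e):
        if isinstance(e, str):
            return e
        op, *args = e
        refs = [walk(a) for a in args]
        arg2 = refs[1] if len(refs) > 1 else None
        triples.append((op, refs[0], arg2))
        return len(triples) - 1   # the result is named by its position

    result = walk(tree)
    triples.append(("=", target, result))
    return triples


# a = b * -c + b * -c
tree = ("+", ("*", "b", ("minus", "c")), ("*", "b", ("minus", "c")))
for q in to_quadruples("a", tree):
    print(q)
for t in to_triples("a", tree):
    print(t)
```

In the quadruple list every use of a result goes through a name such as `t1`, so instructions can be reordered freely; in the triple list the integer arguments are positions, which is why moving a triple forces those references to be updated.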

### Indirect triples.

Indirect triples consist of a listing of pointers to triples, rather than a listing of triples themselves.

With indirect triples, an optimizing compiler can move an instruction by reordering the instruction list without affecting the triples themselves.

Ex: Write indirect triples for $a = b * -c + b * -c$.

|     | Instruction |
|-----|-------------|
| 35  | (0)         |
| 36  | (1)         |
| 37  | (2)         |
| 38  | (3)         |
| 39  | (4)         |
| 40  | (5)         |
| ... |             |

|   | op    | arg1 | arg2 |
|---|-------|------|------|
| 0 | minus | c    |      |
| 1 | *     | b    | (0)  |
| 2 | minus | c    |      |
| 3 | *     | b    | (2)  |
| 4 | +     | (1)  | (3)  |
| 5 | =     | a    | (4)  |

### Static Single Assignment Form

Static single assignment form (SSA) is an intermediate representation that facilitates certain code optimizations. Two distinctive aspects distinguish SSA from 3-address code.

(i) All assignments in SSA are to variables with distinct names.

Ex:

| 3-address code | Static single assignment form |
|----------------|-------------------------------|
| $p = a + b$    | $p_1 = a + b$                 |
| $q = p - c$    | $q_1 = p_1 - c$               |
| $p = q * d$    | $p_2 = q_1 * d$               |
| $p = e - p$    | $p_3 = e - p_2$               |
| $q = p + q$    | $q_2 = p_3 + q_1$             |
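For straight-line code, the renaming can be sketched in a few lines of Python. The function name `to_ssa` and the encoding of each instruction as a (destination, left operand, operator, right operand) tuple are assumptions for illustration; the sketch does not handle branches, which need the $\phi$-function.

```python
def to_ssa(instructions):
    """Rename straight-line 3-address code so each variable is assigned once."""
    version = {}  # variable -> number of its most recent definition
    out = []

    def use(name):
        # variables never assigned in this code (a, b, c, ...) keep their names
        return f"{name}{version[name]}" if name in version else name

    for dest, left, op, right in instructions:
        l, r = use(left), use(right)          # operands see the old versions
        version[dest] = version.get(dest, 0) + 1
        out.append(f"{dest}{version[dest]} = {l} {op} {r}")
    return out


code = [
    ("p", "a", "+", "b"),
    ("q", "p", "-", "c"),
    ("p", "q", "*", "d"),
    ("p", "e", "-", "p"),
    ("q", "p", "+", "q"),
]
for line in to_ssa(code):
    print(line)
# p1 = a + b
# q1 = p1 - c
# p2 = q1 * d
# p3 = e - p2
# q2 = p3 + q1
```

Operands are renamed before the destination's version is incremented, which is why `p = e - p` becomes `p3 = e - p2` rather than referring to itself.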

(ii) The same variable may be defined in two different control-flow paths in a program. For example, the source program

```
if (flag) x = -1; else x = 1;
y = x * a;
```

has two control-flow paths in which x is defined. If we use different names for x in the true part and the false part of the conditional, then it is unclear which name should be used in the assignment $y = x * a$. SSA resolves this with a notational convention called the $\phi$-function, which combines the two definitions of x:

```
if (flag) x1 = -1; else x0 = 1;
x3 = φ(x1, x0);
```

Here $\phi(x_1, x_0)$ has the value $x_1$ if the control flow passes through the true part of the conditional, and the value $x_0$ if the control flow passes through the false part.

Translate the arithmetic expression $a + -(b+c)$ into

(a) A syntax tree

(b) Quadruples

(c) Triples

(d) Indirect triples.

(a) Syntax tree



$$t_1 = b + c$$

$$t_2 = \text{minus } t_1$$

$$t_3 = a + t_2$$

### (b) Quadruples

|   | op    | arg 1          | arg 2          | result         |
|---|-------|----------------|----------------|----------------|
| 0 | +     | b              | c              | t<sub>1</sub>  |
| 1 | minus | t<sub>1</sub>  |                | t<sub>2</sub>  |
| 2 | +     | a              | t<sub>2</sub>  | t<sub>3</sub>  |

### (c) Triples

|   | op    | arg 1 | arg 2 |
|---|-------|-------|-------|
| 0 | +     | b     | c     |
| 1 | minus | (0)   |       |
| 2 | +     | a     | (1)   |

### (d) Indirect triples

|    | Instructions |
|----|--------------|
| 35 | (0)          |
| 36 | (1)          |
| 37 | (2)          |

|   | op    | arg 1 | arg 2 |
|---|-------|-------|-------|
| 0 | +     | b     | c     |
| 1 | minus | (0)   |       |
| 2 | +     | a     | (1)   |

### Translation of Expressions.

An expression with more than one operator, like $a+b*c$, will translate into instructions with at most one operator per instruction. An array reference $A[i][j]$ will expand into a sequence of 3-address instructions that calculate an address for the reference.

### Operations within Expressions

The following syntax-directed definition builds up the 3-address code for an assignment statement S, using the attribute code for S and the attributes addr and code for an expression E. Attributes S.code and E.code denote the 3-address code for S and E respectively. Attribute E.addr denotes the address that will hold the value of E.

## QUESTIONS

1. Define quadruples, triples and static single assignment form.

A quadruple has 4 fields: op, arg<sub>1</sub>, arg<sub>2</sub> and result. The op field contains an internal code for the operator.

Ex: the quadruples for $a = (b * -c) + (b * -c)$

| 3-address code          |   | op    | arg<sub>1</sub> | arg<sub>2</sub> | result |
|-------------------------|---|-------|-----------------|-----------------|--------|
| $t_1 = \text{minus } c$ | 0 | minus | c               |                 | $t_1$  |
| $t_2 = b * t_1$         | 1 | *     | b               | $t_1$           | $t_2$  |
| $t_3 = \text{minus } c$ | 2 | minus | c               |                 | $t_3$  |
| $t_4 = b * t_3$         | 3 | *     | b               | $t_3$           | $t_4$  |
| $t_5 = t_2 + t_4$       | 4 | +     | $t_2$           | $t_4$           | $t_5$  |
| $a = t_5$               | 5 | =     | $t_5$           |                 | a      |

A triple has only 3 fields: op, arg<sub>1</sub> and arg<sub>2</sub>.

Ex: the triples for $a = b * -c + b * -c$

|   | op    | arg<sub>1</sub> | arg<sub>2</sub> |
|---|-------|-----------------|-----------------|
| 0 | minus | c               |                 |
| 1 | *     | b               | (0)             |
| 2 | minus | c               |                 |
| 3 | *     | b               | (2)             |
| 4 | +     | (1)             | (3)             |
| 5 | =     | a               | (4)             |

Static single assignment form is an intermediate representation that facilitates certain code optimizations

Ex:

$$p = a + b$$

$$q = p - c$$

$$p = q * d$$

$$p = e - p$$

$$q = p + q$$

(a) 3-address code

$$p_1 = a + b$$

$$q_1 = p_1 - c$$

$$p_2 = q_1 * d$$

$$p_3 = e - p_2$$

$$q_2 = p_3 + q_1$$

(b) Static single assignment form.

2. Develop an SDD to produce a directed acyclic graph for an expression. Show the steps for constructing the DAG for the expression $a + a * (b - c) + (b - c) * d$.

The syntax-directed definition is:

| PRODUCTION                       | SEMANTIC RULES                                                          |
|----------------------------------|-------------------------------------------------------------------------|
| (i) $E \rightarrow E_1 + T$      | $E.\text{node} = \text{new Node} ('+', E_1.\text{node}, T.\text{node})$ |
| (ii) $E \rightarrow E_1 - T$     | $E.\text{node} = \text{new Node} ('-', E_1.\text{node}, T.\text{node})$ |
| (iii) $E \rightarrow T$          | $E.\text{node} = T.\text{node}$                                         |
| (iv) $T \rightarrow (E)$         | $T.\text{node} = E.\text{node}$                                         |
| (v) $T \rightarrow \text{id}$    | $T.\text{node} = \text{new Leaf} (\text{id}, \text{id.entry})$          |
| (vi) $T \rightarrow \text{num}$  | $T.\text{node} = \text{new Leaf} (\text{num}, \text{num.val})$          |

## Steps for constructing the DAG

(i) $P_1 = \text{Leaf} (\text{id}, \text{entry-}a)$

(ii) $P_2 = \text{Leaf} (\text{id}, \text{entry-}a) = P_1$

(iii) $P_3 = \text{Leaf} (\text{id}, \text{entry-}b)$

(iv) $P_4 = \text{Leaf} (\text{id}, \text{entry-}c)$

(v) $P_5 = \text{Node} ('-', P_3, P_4)$

(vi) $P_6 = \text{Node} ('*', P_1, P_5)$

(vii) $P_7 = \text{Node} ('+', P_1, P_6)$

(viii) $P_8 = \text{Leaf} (\text{id}, \text{entry-}b) = P_3$

(ix) $P_9 = \text{Leaf} (\text{id}, \text{entry-}c) = P_4$

(x) $P_{10} = \text{Node} ('-', P_3, P_4) = P_5$

(xi) $P_{11} = \text{Leaf} (\text{id}, \text{entry-}d)$

(xii) $P_{12} = \text{Node} ('*', P_5, P_{11})$

(xiii) $P_{13} = \text{Node} ('+', P_7, P_{12})$

## DAG



# UNIT - 8

## CODE GENERATION

### INTRODUCTION :

- \* Code generation is the final phase of a compiler.
- \* The code optimizer accepts the intermediate code representation generated by the front end of the compiler and produces another, optimized, intermediate code representation.
- \* The code generator takes the intermediate representation produced by the code optimizer, along with supplementary information in the symbol table of the source program, and produces as output an equivalent target program.



- \* Code generator has 3 main tasks:
  - 1) Instruction selection
  - 2) Register allocation & assignment
  - 3) Instruction Ordering

### 1) INSTRUCTION SELECTION

Choose appropriate target-machine instructions to implement the IR [intermediate representation] statements.

### 2) REGISTER ALLOCATION & ASSIGNMENT

Decide what values to keep in which registers.

### 3) INSTRUCTION ORDERING

Decide in what order to schedule the execution of instructions.

### 8.1 ISSUES IN THE DESIGN OF CODE GENERATOR:

- 1) Input to the code generator
- 2) The target program
- 3) Instruction selection
- 4) Register Allocation
- 5) Evaluation Order

#### 1) Input to the code generator

\* The input to the code generator is the intermediate representation of the source program produced by the front end, along with the information in the symbol table that is used to determine the run-time addresses of the data objects denoted by the names in the IR.

< Input = IR + Symbol table >

\* There are several choices for the IR:

- (a) 3-address representations: quadruples, triples, indirect triples
- (b) Virtual machine representations: bytecodes and stack-machine code
- (c) Linear representations such as postfix notation
- (d) Graphical representations such as syntax trees and DAGs

\* The assumptions made are:

- (i) The front end produces a low-level IR, i.e., the values of the names in it can be directly manipulated by the machine instructions.
- (ii) Syntactic and semantic errors have already been detected.

#### 2) The Target Program:

- \* The output of the code generator is the target program.
- \* The instruction-set architecture of the target machine has a significant impact on the design of the code generator.
- \* Most common architectures are:
  - (a) CISC: It has few registers, instructions with a maximum of 2 operands, a variety of addressing modes, variable-length instructions & instructions with side effects.
  - (b) RISC: It has many registers, instructions with a maximum of 3 operands, simple addressing modes & a relatively simple instruction set architecture.
- \* Output may take a variety of forms:
  - a) Absolute machine language [executable code]: it has the advantage that it can be placed in a fixed location in memory & immediately executed.
  - b) Relocatable machine language [object files for the linker]: it allows subprograms to be compiled separately.
  - c) Assembly language [facilitates debugging]: producing it as output makes the process of code generation somewhat easier.

#### 3) Instruction Selection

The code generator must map the IR program into a code sequence that can be executed by the target machine.

\* The complexity of performing this mapping is determined by factors such as:

- (i) the level of the IR
- (ii) the nature of the instruction set architecture
- (iii) the desired quality of the generated code

(i) The level of the IR:

- > If the IR is high level, use code templates to translate each IR statement into a sequence of machine instructions. This produces poor code that needs further optimization.
- > If the IR is low level, use this low-level information to generate more efficient code sequences.

(ii) The nature of the instruction set architecture has a strong effect on the difficulty of instruction selection.

- > Uniformity & completeness of the instruction set are important factors.
- > If we do not care about the efficiency of the target program, instruction selection is straightforward.

> For eg: every statement of the form $x = y + z$ can be translated with the same template:

| Three-address statement | Target code                             |
|-------------------------|-----------------------------------------|
| $x = y + z$             | LD R<sub>0</sub>, y                     |
|                         | ADD R<sub>0</sub>, R<sub>0</sub>, z     |
|                         | ST x, R<sub>0</sub>                     |

∴ Translating statement by statement in this way often produces redundant loads & stores.

e.g.:

| Three-address statement | Target code                             |
|-------------------------|-----------------------------------------|
| $a = b + c$             | LD R<sub>0</sub>, b                     |
|                         | ADD R<sub>0</sub>, R<sub>0</sub>, c     |
|                         | ST a, R<sub>0</sub>                     |
| $d = a + b$             | LD R<sub>0</sub>, a → REDUNDANT         |
|                         | ADD R<sub>0</sub>, R<sub>0</sub>, b     |
|                         | ST d, R<sub>0</sub>                     |

The fourth instruction is redundant because it reloads the value of a that was just stored from R<sub>0</sub>.
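The effect of statement-by-statement translation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple encoding of statements and the helper names `translate` and `redundant_loads` are hypothetical, and the template always uses R0, as in the example above.

```python
def translate(statements):
    """Expand each statement (dst, src1, op, src2) with one fixed template.

    The template knows nothing about what the previous statement left in R0.
    """
    opcodes = {"+": "ADD", "-": "SUB", "*": "MUL"}  # assumed mnemonics
    code = []
    for dst, src1, op, src2 in statements:
        code.append(f"LD R0, {src1}")
        code.append(f"{opcodes[op]} R0, R0, {src2}")
        code.append(f"ST {dst}, R0")
    return code


def redundant_loads(code):
    """Return indices of loads that reload a value just stored from the same register."""
    hits = []
    for i in range(1, len(code)):
        prev, cur = code[i - 1], code[i]
        if prev.startswith("ST ") and cur.startswith("LD "):
            st_var, st_reg = [s.strip() for s in prev[3:].split(",")]
            ld_reg, ld_var = [s.strip() for s in cur[3:].split(",")]
            if st_var == ld_var and st_reg == ld_reg:
                hits.append(i)
    return hits


# a = b + c followed by d = a + b
code = translate([("a", "b", "+", "c"), ("d", "a", "+", "b")])
for line in code:
    print(line)
print("redundant load at index:", redundant_loads(code))
```

The check only looks at adjacent instructions; a real code generator would track register contents across the whole basic block.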

(iii) The quality of the generated code is determined by its speed & size.

> For eg: if the target machine has an increment instruction, the three-instruction sequence for $a = a + 1$ can be replaced by a single instruction:

$$a = a + 1 \;\Rightarrow\; \left. \begin{array}{l} \text{LD } R_0, a \\ \text{ADD } R_0, R_0, \#1 \\ \text{ST } a, R_0 \end{array} \right\} \text{ replaced by } \text{INC } a$$
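One way to picture this replacement is as a small pattern match over the generated instructions. The sketch below is an assumption-laden illustration, not a prescribed algorithm: the function name `use_inc` is hypothetical, the target is assumed to have an INC instruction, and the register is assumed to be dead after the store.

```python
def use_inc(code):
    """Replace 'LD R, a / ADD R, R, #1 / ST a, R' with 'INC a'.

    Assumes the register R is not needed after the store.
    """
    out, i = [], 0
    while i < len(code):
        # Split up to three consecutive instructions into opcode and operands.
        window = [w.replace(",", " ").split() for w in code[i:i + 3]]
        if len(window) == 3:
            ld, add, st = window
            if ([len(ld), len(add), len(st)] == [3, 4, 3]
                    and (ld[0], add[0], st[0]) == ("LD", "ADD", "ST")
                    and add[1] == add[2] == ld[1] == st[2]   # same register throughout
                    and add[3] == "#1"                        # increment by the constant 1
                    and ld[2] == st[1]):                      # same variable loaded and stored
                out.append(f"INC {ld[2]}")
                i += 3
                continue
        out.append(code[i])
        i += 1
    return out


print(use_inc(["LD R0, a", "ADD R0, R0, #1", "ST a, R0"]))
```

Sequences that do not match the pattern are copied through unchanged.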

#### 4) Register Allocation:

\* Instructions involving register operands are usually shorter & faster than those involving operands in memory.

\* 2 subproblems:

(i) Register allocation: Select the set of variables that will reside in registers at each point in the program.

(ii) Register assignment: Select the specific register that a variable will reside in.

\* Complications are imposed by the hardware architecture.  
Eg: Register pairs for multiplication & division.

\* The multiplication instruction is of the form

$$\text{M } x, y$$

where $x \rightarrow$ multiplicand, is the odd register of an even/odd register pair,  
$y \rightarrow$ multiplier, is a single operand.

$\Rightarrow$ Product $\rightarrow$ occupies the entire even/odd register pair.

\* The division instruction is of the form

$$\text{D } x, y$$

where $x \rightarrow$ dividend, occupies an even/odd register pair whose even register is $x$,  
$y \rightarrow$ divisor, may be in an odd or even register,  
$\Rightarrow$ Quotient $\rightarrow$ stored in the odd register,  
remainder $\rightarrow$ stored in the even register.

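Eg: two 3-address code sequences

| (a)         | (b)         |
|-------------|-------------|
| $t = a + b$ | $t = a + b$ |
| $t = t * c$ | $t = t + c$ |
| $t = t / d$ | $t = t / d$ |

Optimal machine-code sequences

| (a)      | (b)         |
|----------|-------------|
| L R1, a  | L R0, a     |
| A R1, b  | A R0, b     |
| M R0, c  | A R0, c     |
| D R0, d  | SRDA R0, 32 |
| ST R1, t | D R0, d     |
|          | ST R1, t    |

In (b), SRDA R0, 32 shifts the dividend into R1 & clears R0 so that all its bits equal the sign bit, which sets up the even/odd pair required by D.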

#### 5) Evaluation Order:

- \* The order in which computations are performed can affect the efficiency of the target code.
- \* When instructions are independent, their evaluation order can be changed.
- \* Some computation orders require fewer registers to hold intermediate results than others.
- \* However, picking the best order in the general case is a difficult, NP-complete problem.

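ADDITIONAL INFORMATION: Eg $a + b - (c + d) * e$

| Original order | Reordered      |
|----------------|----------------|
| $t1 = a + b$   | $t2 = c + d$   |
| $t2 = c + d$   | $t3 = e * t2$  |
| $t3 = e * t2$  | $t1 = a + b$   |
| $t4 = t1 - t3$ | $t4 = t1 - t3$ |

Code generated for each order using the two registers R0 & R1 (instructions have the form OP dst, src):

| Original order | Reordered  |
|----------------|------------|
| MOV R0, a      | MOV R0, c  |
| ADD R0, b      | ADD R0, d  |
| MOV t1, R0     | MOV R1, e  |
| MOV R1, c      | MUL R1, R0 |
| ADD R1, d      | MOV R0, a  |
| MOV R0, e      | ADD R0, b  |
| MUL R0, R1     | SUB R0, R1 |
| MOV R1, t1     | MOV t4, R0 |
| SUB R1, R0     |            |
| MOV t4, R1     |            |

The reordered sequence needs 8 instructions instead of 10, because t1 no longer has to be stored to memory & reloaded.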
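To check that both orders compute the same value, the two sequences can be run on a tiny simulator. This is a minimal sketch under assumptions: the function `run` and the sample values are hypothetical, instructions are read as `OP dst, src` with `dst = dst OP src`, and the store of t1 in the original order is written as `MOV t1, R0`, which is an assumption about the intended listing.

```python
def run(program, memory):
    """Execute two-address instructions 'OP dst, src' (dst = dst OP src)."""
    state = dict(memory)  # registers and variables share one namespace here
    for line in program:
        op, rest = line.split(None, 1)
        dst, src = [s.strip() for s in rest.split(",")]
        value = state[src]
        if op == "MOV":
            state[dst] = value
        elif op == "ADD":
            state[dst] += value
        elif op == "SUB":
            state[dst] -= value
        elif op == "MUL":
            state[dst] *= value
        else:
            raise ValueError(f"unknown opcode {op}")
    return state


original = ["MOV R0, a", "ADD R0, b", "MOV t1, R0", "MOV R1, c", "ADD R1, d",
            "MOV R0, e", "MUL R0, R1", "MOV R1, t1", "SUB R1, R0", "MOV t4, R1"]
reordered = ["MOV R0, c", "ADD R0, d", "MOV R1, e", "MUL R1, R0",
             "MOV R0, a", "ADD R0, b", "SUB R0, R1", "MOV t4, R0"]

values = {"a": 7, "b": 5, "c": 2, "d": 1, "e": 3}  # sample inputs
expected = values["a"] + values["b"] - (values["c"] + values["d"]) * values["e"]

for name, prog in (("original", original), ("reordered", reordered)):
    result = run(prog, values)["t4"]
    print(name, len(prog), "instructions, t4 =", result, "expected", expected)
```

Both sequences leave the same value in t4; only the instruction count and the need for the temporary t1 in memory differ.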


### 8.2 THE TARGET LANGUAGE:

For designing a good code generator, we need to be familiar with the target machine & its instruction set. Instead of generating code for a specific target machine, a general machine with many registers is considered.

#### A SIMPLE TARGET MACHINE MODEL:

The characteristics of the target machine model, with its instruction format & instruction set, are given below:

\* Our hypothetical machine:

(i) It is a 3-address machine with the following format

[OP destination, Source1, Source2]

#### NOTE:

A 3-address instruction can have a maximum of 3 operands; it can also have 1 or 2 operands.

(ii) The target machine is byte addressable, i.e., each address refers to 8 bits of information.

(iii) It has n registers denoted by

R<sub>0</sub>, R<sub>1</sub>, R<sub>2</sub>, ..., R<sub>n-1</sub>

\* Various types of instructions that are used by the target machine:

(i) Load instructions

(ii) Store instructions

(iii) Computational instructions

(iv) Unconditional jumps

(v) Conditional jumps

(i) Load instruction: Used to copy data into the destination operand, which must be a register.

SYNTAX: LD dst, addr

where dst $\rightarrow$ destination register

addr $\rightarrow$ register or memory location

(ii) Store instruction: Used to copy data into the memory location specified in the destination operand.

SYNTAX: ST dst, r

where dst $\rightarrow$ destination, which is a memory location

r $\rightarrow$ register

(iii) Computational instructions: Arithmetic operations are performed using these instructions.

SYNTAX: OP dst, src1, src2

where 1<sup>st</sup> operand, dst $\rightarrow$ destination

2<sup>nd</sup> & 3<sup>rd</sup> operands $\rightarrow$ source operands whose values are fetched for the operation to be performed

Eg1: ADD R0, R1, R2 // $R0 = R1 + R2$

Eg2: SUB R0, R0, R1 // $R0 = R0 - R1$

Eg3: MUL R2, R0, R1 // $R2 = R0 * R1$

(iv) Unconditional jumps: Branch instructions without any condition are called unconditional jumps.

SYNTAX: BR label

where BR $\rightarrow$ BRanch instruction

(v) Conditional jumps: Branch instructions that jump only when a test on the value stored in a register succeeds (e.g., the value is zero, +ve or -ve) are called conditional jumps.

SYNTAX: Bcond r, label

where B stands for Branch,

cond is the test applied to the register, e.g. LTZ (less than zero), LEZ (less than or equal to zero), GTZ (greater than zero), GEZ (greater than or equal to zero),

r $\rightarrow$ register, contains a value such as 0, +ve or -ve.

Eg: BGTZ R0, T1

// Branch to T1 if R0 contains a +ve value

Eg: BLEZ R1, T2

// Branch to T2 if R1 contains either 0 or a -ve value

- \* Different addressing modes supported by the generalized target machine:

(i) Direct addressing mode

(ii) Indexed addressing mode

(iii) Integer indexed addressing mode

(iv) Indirect addressing mode

(v) Immediate addressing mode

(i) Direct A/M: The address of the data to be accessed is directly present in the instruction, i.e., the location is identified by a variable name x.

Eg: LD R1, x

// Load the value stored in memory location x into R1

(ii) Indexed A/M: The data is accessed from a memory location using an index. This addressing mode is useful for accessing arrays, where a is the base address of the array & the register holds the index value.

Eg: LD R1, a(R2)

// R1 = contents(a + contents(R2))

(iii) Integer indexed A/M: It is the same as the previous one, except that the base memory location is given as an integer.

Eg: LD R1, 100(R2)

// R1 = contents(100 + contents(R2))

(iv) Indirect A/M: The data is accessed by dereferencing with the \* operator, as shown below:

LD R1, \*(R2)

// R2 contains a memory location; the data stored in that memory location is copied into register R1

LD R1, \*100(R2)

// R1 = contents(contents(100 + contents(R2)))

(v) Immediate A/M: The data to be manipulated is directly present in the instruction & is preceded by #.

LD R1, #100

// R1 $\leftarrow 100$
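The five addressing modes can be summarised as one lookup function. The following Python sketch is illustrative only: `operand_value`, the dictionaries `regs`, `mem` and `symbols`, and the sample addresses are all hypothetical, and the indirect forms follow the descriptions given in these notes.

```python
import re


def operand_value(operand, regs, mem, symbols):
    """Return the value an operand denotes when used as the source of LD.

    regs maps register names to values, mem maps addresses to values,
    and symbols maps variable names to addresses.
    """
    operand = operand.strip()
    if operand.startswith("#"):                        # immediate: #100
        return int(operand[1:])
    if operand.startswith("*"):                        # indirect
        inner = operand[1:]
        m = re.fullmatch(r"\((\w+)\)", inner)
        if m:                                          # *(R2): contents(contents(R2))
            return mem[regs[m.group(1)]]
        # *100(R2): contents(contents(100 + contents(R2)))
        return mem[operand_value(inner, regs, mem, symbols)]
    if operand in regs:                                # register
        return regs[operand]
    m = re.fullmatch(r"(\w+)\((\w+)\)", operand)       # indexed: a(R2) or 100(R2)
    if m:
        base, reg = m.groups()
        base_addr = int(base) if base.isdigit() else symbols[base]
        return mem[base_addr + regs[reg]]
    return mem[symbols[operand]]                       # direct: x


regs = {"R2": 8}
symbols = {"x": 200, "a": 300}
mem = {200: 42, 308: 7, 108: 500, 500: 99, 8: 11}

for src in ("x", "a(R2)", "100(R2)", "*(R2)", "*100(R2)", "#100"):
    print(f"LD R1, {src:10} -> R1 =", operand_value(src, regs, mem, symbols))
```

Each branch of the function corresponds to one of the modes (i) to (v) listed above.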

### EXERCISE :

1. Generate code for 3 address statement  $x = y - z$

LD R1, y //  $R1 = y$  
LD R2, z //  $R2 = z$  
SUB R1, R1, R2 //  $R1 = R1 - R2$  
ST x, R1 //  $x = R1$

2. Generate code for 3 address statement  $x = *p$

LD R1, p //  $R1 = p$  
LD R2, 0(R1) //  $R2 = \text{contents}(0 + \text{contents}(R1))$  
ST x, R2 //  $x = R2$

3. Generate code for 3 address statement  $*p = y$

LD R1, p //  $R1 = p$  
LD R2, y //  $R2 = y$  
ST 0(R1), R2 // contents(0 + contents(R1)) = R2

4. Generate m/c code for 3 address statement  $b = a[i]$

LD R1, i //  $R1 = i$  
MUL R1, R1, 8 //  $R1 = R1 * 8$  
LD R2, a(R1) //  $R2 = \text{contents}(a + \text{contents}(R1))$  
ST b, R2 //  $b = R2$

5. Generate m/c code for 3 address statement  $a[j] = c$

LD R1, j //  $R1 = j$  
LD R2, c //  $R2 = c$  
MUL R1, R1, 8 //  $R1 = R1 * 8$  
ST a(R1), R2 // contents(a + contents(R1)) = R2

6. Generate m/c code for 3 address statement  
*if  $x < y$  goto L*

LD R1, x //  $R1 = x$  
LD R2, y //  $R2 = y$  
SUB R1, R1, R2 //  $R1 = R1 - R2$  
BLTZ R1, M // if  $R1 < 0$  jump to M, the target-code label corresponding to L
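The six exercises follow fixed patterns, so they can be produced by a small template-based generator. The sketch below is one possible illustration, not a definitive implementation: the function `gen`, the tuple encoding of statements, and the constants `OPCODES` and `WIDTH` are hypothetical, array elements are assumed to be 8 bytes wide, and the less-than-zero branch is assumed to be spelled BLTZ.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}  # assumed operator-to-mnemonic map
WIDTH = 8                                       # assumed size of an array element in bytes


def gen(stmt):
    """Generate target code for a three-address statement given as a tagged tuple."""
    kind = stmt[0]
    if kind == "binop":                       # x = y op z
        _, x, y, op, z = stmt
        return [f"LD R1, {y}", f"LD R2, {z}",
                f"{OPCODES[op]} R1, R1, R2", f"ST {x}, R1"]
    if kind == "load_ptr":                    # x = *p
        _, x, p = stmt
        return [f"LD R1, {p}", "LD R2, 0(R1)", f"ST {x}, R2"]
    if kind == "store_ptr":                   # *p = y
        _, p, y = stmt
        return [f"LD R1, {p}", f"LD R2, {y}", "ST 0(R1), R2"]
    if kind == "index_load":                  # b = a[i]
        _, b, a, i = stmt
        return [f"LD R1, {i}", f"MUL R1, R1, {WIDTH}",
                f"LD R2, {a}(R1)", f"ST {b}, R2"]
    if kind == "index_store":                 # a[j] = c
        _, a, j, c = stmt
        return [f"LD R1, {j}", f"LD R2, {c}",
                f"MUL R1, R1, {WIDTH}", f"ST {a}(R1), R2"]
    if kind == "if_lt":                       # if x < y goto label
        _, x, y, label = stmt
        return [f"LD R1, {x}", f"LD R2, {y}",
                "SUB R1, R1, R2", f"BLTZ R1, {label}"]
    raise ValueError(f"unsupported statement: {stmt!r}")


for stmt in (("binop", "x", "y", "-", "z"), ("index_load", "b", "a", "i"),
             ("if_lt", "x", "y", "M")):
    print(stmt, "->", gen(stmt))
```

Because every statement is translated in isolation with R1 and R2, this generator has the same weakness as the instruction-selection example in 8.1: it never reuses a value that is already in a register.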

## Program and Instruction Costs

- \* For simplicity, we take the cost of an instruction to be one plus the costs associated with the addressing modes of its operands.
- \* Addressing modes involving only registers have zero additional cost.
- \* Addressing modes involving a memory location or constant have an additional cost of 1.
- \* For example:

- a) LD R0, R1  $\Rightarrow \text{cost} = 1$
- b) LD R0, M  $\Rightarrow \text{cost} = 2$
- c) LD R1, \*100(R2)  $\Rightarrow \text{cost} = 2$ (one for the instruction plus one for the constant 100)

### Cost of Addressing mode:

| Mode                     | Form   | Address                   | Added Cost |
|--------------------------|--------|---------------------------|------------|
| 1. Absolute direct A/M   | M      | M                         | 1          |
| 2. Register direct A/M   | R      | R                         | 0          |
| 3. Indexed A/M           | C(R)   | C + contents(R)           | 1          |
| 4. Indirect register A/M | \*R    | contents(R)               | 0          |
| 5. Indirect indexed A/M  | \*C(R) | contents(C + contents(R)) | 1          |
| 6. Immediate A/M         | #C     | N/A                       | 1          |

NOTE: Cost of each statement = 1 + cost (Addressing mode)
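The cost rule is mechanical enough to be written as a short function. This is a sketch under assumptions: the names `operand_cost`, `instruction_cost` and `program_cost` are hypothetical, registers are assumed to be written R0, R1, ..., and every operand that is not a register or an indirect register is charged the added cost of 1 from the table.

```python
import re


def operand_cost(operand):
    """Added cost of one operand according to the addressing-mode table."""
    operand = operand.strip()
    if re.fullmatch(r"\*?R\d+", operand):   # R or *R: registers only, no extra word
        return 0
    return 1                                # M, C(R), *C(R), #C carry an address or constant


def instruction_cost(instr):
    """Cost = 1 + sum of the added costs of the operands."""
    parts = instr.split(None, 1)
    operands = parts[1].split(",") if len(parts) > 1 else []
    return 1 + sum(operand_cost(o) for o in operands)


def program_cost(program):
    return sum(instruction_cost(i) for i in program)


print(instruction_cost("LD R0, R1"))   # register to register
print(instruction_cost("LD R0, M"))    # memory operand adds 1

# Code for b = a[i]
program = ["LD R0, i", "MUL R0, R0, 8", "LD R1, a(R0)", "ST b, R1"]
for instr in program:
    print(f"{instr:15} cost = {instruction_cost(instr)}")
print("total cost =", program_cost(program))
```

The same function can be applied to any of the instruction sequences in the exercises that follow.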

## EXERCISES (8.2)

1. Determine the costs of the following instruction sequences. In each case, Cost = 1 + cost(A.M.).

| Instruction    | Cost        |
|----------------|-------------|
| LD R0, y       | $1 + 1 = 2$ |
| LD R1, z       | $1 + 1 = 2$ |
| ADD R0, R0, R1 | $1 + 0 = 1$ |
| ST x, R0       | $1 + 1 = 2$ |
| TOTAL          | Cost = 7    |

2. LD R0, i  
MUL R0, R0, 8  
LD R1, a(R0)  
ST b, R1

$$\text{Cost} = \text{Cost(A.M)} + 1$$

|               |             |
|---------------|-------------|
| LD R0, i      | $1 + 1 = 2$ |
| MUL R0, R0, 8 | $1 + 1 = 2$ |
| LD R1, a(R0)  | $1 + 1 = 2$ |
| ST b, R1      | $1 + 1 = 2$ |
| TOTAL         | Cost = 8    |

3. LD R0, c  
LD R1, i  
MUL R1, R1, 8  
ST a(R1), R0

$$\text{Cost} = \text{Cost(A.M)} + 1$$

|               |             |
|---------------|-------------|
| LD R0, c      | $1 + 1 = 2$ |
| LD R1, i      | $1 + 1 = 2$ |
| MUL R1, R1, 8 | $1 + 1 = 2$ |
| ST a(R1), R0  | $1 + 1 = 2$ |
| TOTAL         | Cost = 8    |

4. LD R0, p  
LD R1, 0(R0)  
ST x, R1

$$\text{Cost} = \text{Cost(A.M)} + 1$$

|              |             |
|--------------|-------------|
| LD R0, p     | $1 + 1 = 2$ |
| LD R1, 0(R0) | $1 + 1 = 2$ |
| ST x, R1     | $1 + 1 = 2$ |
| TOTAL        | Cost = 6    |

5. LD R0, p  
LD R1, x  
ST 0(R0), R1

$$\text{Cost} = \text{Cost(A.M)} + 1$$

|              |             |
|--------------|-------------|
| LD R0, p     | $1 + 1 = 2$ |
| LD R1, x     | $1 + 1 = 2$ |
| ST 0(R0), R1 | $1 + 1 = 2$ |
| TOTAL        | Cost = 6    |

6. LD R0, x  
LD R1, y  
SUB R0, R0, R1  
BLTZ \*R3, R0

$$\text{Cost} = 1 + \text{cost(A.M)}$$

|                |                                                               |
|----------------|---------------------------------------------------------------|
| LD R0, x       | $1 + 1 = 2$                                                   |
| LD R1, y       | $1 + 1 = 2$                                                   |
| SUB R0, R0, R1 | $1 + 0 = 1$                                                   |
| BLTZ \*R3, R0  | $1 + 0 = 1$ $\because$ indirect register A/M has added cost 0 |
| TOTAL          | Cost = 6                                                      |