

## Branch Instructions

In textbook, 4 are given.

among which 1st 2 are for 32-bit ARM,

& next two are for 16-bit Thumb instructions.

For our syllabus only 1st two.

1. B : Branch :  $pc = \text{label}$  program counter.

2. BL : Branch with link :  $pc = \text{label}$

$lr = \text{address of the}$   
 $\uparrow$   
link register      next instruction  
after BL.

## ④ LDR

## Load and store Instructions

\* LDR is used to read complete 32-bit data into register.

Load into register from memory

Store a register value into memory.

Defn of word depends on the architecture of the computer.

In a 16-bit microprocessor, word = 16-bit

In a 32-bit architecture, word = 32-bit.

At a time how much it can read/write is word.

\* Note:

1. LDR : Load  $\frac{\text{word}}{(32\text{-bit})}$  into a register  
(word)

2. LDRH : Load half-word into a register  
(16-bit)

3. LDRB : Load a byte into a register  
(8-bit)

4. STR : Store 32-bit

5. STRH : Store 16-bit

6. STRB : Store 8-bit

## In case of Unsigned

If I store 32-bit data in a 32-bit wide register then all bits will be having data.

But less than 32-bit data if I store, then remaining bits of 32-bit wide register are padded with zero's.

## In case of Signed

If less than 32-bit data we store in 32-bit wide register then, remaining bits will be padded with:

- zero's if it's a positive number &
- one's if it's a negative number.

## Program:- (On load & store instructions.)

In the program, we are suppose to load the starting address of FBLOCK (1<sup>st</sup> memory block) & SBLOCK into R0 & R1.

(2<sup>nd</sup> memory block)

(Open memory windows)

\* In order to store 32-bit data, you require 4 blocks.



\* index it by 4, if you're reading complete 32 bit data

eg. LDR R3, [R0], #4 } were loading complete  
STR R3, [R1], #4 } 32-bit data & storing complete 32-bit data.

\* DCW 0 means initialize memory location with zero's bcz I want to store some data in it. There shouldn't be any garbage value.

⚫ LDRH =  $\#_1 \quad \#_2$  } for 16-bit  
 ⚫ STRH =  $\#_1 \quad \#_2$  } for 16-bit

⚫ LDRB =  $\#_1$  } for 8-bit  $\rightarrow$  eq  $\Rightarrow$  loop LDRB R8, [R0], #1  
 ⚫ STRB =  $\#_1$  } for 8-bit  $\rightarrow$  eq  $\Rightarrow$  loop STRE R3, [R1], #1

⚫ LDREH : load signed half word  $\rightarrow$  fetch it with sign

⚫ LDRSB : load signed byte  $\rightarrow$  fetch it with sign

LDRSSB:

| Byte 3 | Byte 2 | Byte 1 | Byte 0 |
|--------|--------|--------|--------|
| 0 3    | 0 2    | 0 1    | 0 0    |

→ 32 bit

PROGRAM

AREA LDRpgm, CODE, READONLY  
ENTRY  
 MOV R4, #4  
 LDR R0, =FBLOCK  
 LDR R1, =SBLOCK  
Loop  
 LDR R3, [R0] - #4 ; loading complete 32-bit  
 data into the register from  
 memory  
 STR R3, [R1], #4 ; we're storing register  
 value into memory.  
 SUBS R4, #1 ; subtract 1 from R4, updating flag  
 BNE Loop ; branch to "loop"; if the result is not  
 zero (i.e. Z flag is not set) then continue  
 loop, otherwise branch to memory update part of  
 program.  
FBLOCK DCD 0x123456F8, 0x9ABCDEF0, 0x00004000,  
 0x00000000

Define constant  $R4 = 0^{24}$   
Double word L

T is a used  
directive to allocate memory  
to use in memory  
specify it  
size & type  
values.

AREA MYDATA, DATA, READWRITE  
BLOCK D0 ; We written this down becz I want to be Read/Write memory. I wrote can't  
BLOCK above only becz, FBlock  
FBlock above only memory, as I don't  
just a Read Only memory, I'm just  
write anything in FBlock & I'm doing  
it in data.

when R4 is set  
flag is set  
for T will be compared

## Extra page for tracing. or key points

DCD →

Note that you are  
when it is      32-bit data → e.g. 0x12345678  
 DCD    "      16-bit data → e.g. 0x1034  
 DCW    "      8-bit data → e.g. 0x10  
 and DCB    "      and 0x10

Points:

- LDR & STR instructions can load and store data on a boundary alignment that is the same as the data type size being loaded or stored. For example, LDR can only load 32-bit words on a memory address that is a multiple of four bytes - 0, 4, 8, and so on. This example shows a load from a memory address contained in register r1, followed by a store back to the same address in memory.
- load register r0 with the contents of the memory address pointed to by register r1.

```

LDR    r0, [r1]
; = LDR r0, [r1, #0]

; ← It's just
; a comment
; store the contents of register r0 to the memory address
; pointed to by register r1.
STR    r0, [r1]
; = STR r0, [r1, #0]
```

Here, Register r1 is called the base address register.

## Swap instruction

Exchanging the content of memory with the register content

1st two operand  
↓  
is  
memory location  
we registers,  
8 digit represents 32-bit number.

Note :-

The swap instruction is a special case of load/store instruction. It swaps the content of memory with the contents of registers.

This instruction is an atomic operation (ie. it reads & writes a location in the same bus) operation preventing any other instruction from reading or writing to that location until it completes.

( $\hookrightarrow$  means in OS, I'll lock that memory location until one process completes writing data to it. multiple processes can read simultaneously while simultaneously

Swap can't be interrupted by any other instruction or any other bus access. we say that the system holds the bus until the transaction is complete.

eg:- PRE:

$\text{mem}32[0x9000] = 0x12345678$

$r_0 = 0x00000000$

$r_1 = 0x11112222$

POST:

$\text{mem}32[0x9000] = 0x11112222$

$r_0 = 0x12345678$

$r_1 = 0x11112222$



SWP  $R_d, R_m, [R_n]$   
 Content of  $R_d$  or  $R_m$  or  $[R_n]$  is copied to  $R_d$ ,  $R_m$ ,  $[R_n]$   
 $[r_2] = r_0$   
 $r_1 \rightarrow [r_2]$

Temporary register which I'm using is a memory

Content present in address  $R_d$  which is present in  $R_d$  is loaded to  $R_4$

is copied to  $R_4$  contained temp or trap

Block 1 (Read only) Block 2 (Read write)

0x12345678

$R_2$  is pointing to Read only memory

$R_3$  is pointing to Read write memory

I'm loading the content of memory location whose address is in  $R_2$ .

$R_0 = 0x12345678$

Through  $R_0$  I'm copying from 1 memory location to other.

Store the content of  $R_0$  into memory location whose address is in  $R_3$

Block 2

0x12348678

Now Clear  $R_0$  with [zeroes]

$R_0 = 0x0000 0000$

Next load any immediate value into  $R_1$ .

$R_1 = 0x8000000F$

SWP

$R_0, R_1, [R_3]$

$R_0$  becomes my destination register which will hold earlier content of  $R_3$  &  $R_3$  content is copied into  $R_3$ .

R2 is overwritten on to the memory location

Store cannot be used with ROM. So I'm using another memory location

Programs:-

(changing the content of memory with the register contents)

AREA swapMC, CODE, READONLY

ENTRY

LDR R2, =BLOCK1

LDR R3, =BLOCK2

LDR R4, [R2]

STR R4, [R3]

MOV R0, #0x00000000

MOV R1, #0x10000002 ; Immediate value I've taken

SWP R0,R1,[R3]

B L

BLOCK1 DCD 0x12345678 ; Declaring the data

AREA myDATA, DATA, READWRITE

BLOCK2 DCD 0 ; I've written this down bcz

ROM is only read only memory. But I want Block2 to be Readwrite memory.

R2 is temporary  
In R0 now R2 is present  
R3 = 0x12345678 at Last! Finish.  
R0 R1, [R3] at Present  
R2 content is placed in R0  
R3 content is placed in R0

This program is about swapping 32-bit data among memory & register.

## Program

```
AREA swapMC, CODE, READONLY
ENTRY
    LDR R2, =BLOCK1
    LDR R3, =BLOCK2
    LDR R4, [R2]
    STR R4, [R3]
    MOV R0, #00
    MOV R1, #0x8000000F
    SWP R0, R2, [R3]
    B L
BLOCK1 DCD 0x12345678
AREA myDATA, DATA, READWRITE
BLOCK2 DCD 0
```

END

END

SWPB (Swap Byte)  
 Complete ~~8-bit~~ data is swapped. 2 byte = 8 bit  
Program (about swapping 8-bit data b/w memory & register.)

| Register | Value |
|----------|-------|
|----------|-------|

|    |            |
|----|------------|
| R0 | 0x00000012 |
| R1 | 0x00000005 |
| R2 | 0x00000020 |
| R3 | 0x40000000 |
| R4 | 0x00000012 |

```

    AREA swapMC, CODE, READONLY
    ENTRY
        LDR R2, =BLOCK1
        LDR R3, =BLOCK2
        LDRB R4, [R2] ; loading 8-bit
                        data; 1 byte
                        = 8 bit
        STRB R4, [R3] ; storing 8-bit data
        MOV R0, #00
        MOV R1, #05 ; Immediate value
                      I've taken

```

SWPB R0, R1, [R3]

L B L

BLOCK1 DCB 0x12 ; Declaring the data

It is also 8 bit  
but it is hexadecimal

AREA myDATA, DATA, READWRITE

BLOCK2 DCB 0 ; I've written this down bcz ROM  
is only read only memory.

END

Block2) DCB 0  
 (→ This line will tell  
 the Assembler to reserve  
 memory locations of type  
 DCB (you can write whatever  
 you want all. to your program)  
 & initialize them with  
 zeros (you can initialize  
 with any number).

Zero bez  
 If I'll tell the  
 Assembler to reserve memory  
 locations of type DCB (or  
 anything) & initialize them  
 with zeroes

you  
 can write  
 zero or  
 one or  
 anything. It is written so that  
 garbage value present  
 earlier is removed & you  
 can store fresh value in  
 it.

Q. Develop an assembly language program to implement semaphores & mutual exclusion in operating system.

→ This example shows a simple dataguard that can be used to protect data from being written by another task. The SWP instruction "holds the bus" until the transaction is complete.

```
spin
    MOV    r1, =semaphore
    MOV    r2, #1
    SWP    r3, r2, [r1]      ; hold the bus until complete
    CMP    r3, #1
    BEQ    spin
```

Tracing



← This is a  
memory  
block

r2 = 1 ← register

AT&T format A3FA

r3, r2, [r1] ← Semaphore

80, C113 1  
r25

r3 = 1 maybe

CMP r3 #1

Compare r3 with 1

Yes.

Then BEQ spin

Branch if equal to.

If r3 = 1 then

spin here only !!

## Loading constants

LDR Rd = constant value

ADR Rd = memory address

LDR is used to hold the starting address of the memory location as well as to hold a constant value in the memory location.

ADR ~~is used to~~, hold only memory address.

### Our program:



Both are pointing to same memory location  
ie are holding starting address of BLOCK1

Now copy content of memory location pointed by R0 into R2 & that of R1 into R3.

So, R2 & R3 will have the same content.

### Program:

Question :- write content of R0, R1, R2 & R3 will have same memory address will have same content

```
AREA MS, CODE, READONLY
ENTRY
LDR R0, =BLOCK1
ADR R1, BLOCK1
LDR R2, [R0]
LDR R3, [R1]
BLOCK1 DCD 0x12345678
L B L
END
```

- \* We've 7 modes.
  - \* You know what there is user mode & supervisor mode.
  - \* Only if you are in supervisor mode, you can modify the content of CPSR
    - ↳ current program status register.
  - \* MRS Rd, CPSR → Syntax of MRS  
 $0x00000003$  → value of user mode with no interrupts
    - ↳ every program starts with CPSR at the value 0003.
    - In our program it is loaded into R1
- ```

AREA MB, CODE, READONLY
        ENTRY
        MRS R1, CPSR
        L    B    L
END

```

Program:-

$$R_0 = 0x00000003$$

$$1100 \quad 0000 \quad 0000 \quad 0000 \quad 0000 \quad 0000 \quad 0000 \quad 0011$$

$$R_1 = 0xE0000001$$

$$1110 \quad 0000 \quad 0000 \quad 0000 \quad 0000 \quad 0000 \quad 0000 \quad 0001$$

Rotate Right the content of R0 1 time

$$MOVS R2, R0, ROR #1$$

$$\begin{array}{ccccccccc}
 1100 & 0000 & 0000 & 0000 & 0000 & 0000 & 0000 & 0011 \\
 \downarrow & \downarrow \\
 110 & 0000 & 0000 & 0000 & 0000 & 0000 & 0000 & 0001
 \end{array}$$

C 1

MSB bit is one

$$\boxed{N=1}$$

indicates MSB bit

$$\begin{aligned}
 \text{So, } N &= 1 \\
 Z &= 0 \\
 C &= 1 \\
 V &= 0
 \end{aligned}$$

last bit  
shifted out  
of barrel shifter  
is one. here  
 $\boxed{C=1}$

Whatever we've located is available in our  
destination register (R<sub>2</sub>)

CMP R<sub>2</sub>, R<sub>1</sub>

(you are comparing both numbers)

Both are equal.

C = 1 bcz it is retaining the last one shifted out of the barrel shifter.

$\text{result} = 200$

| Register | Value      |
|----------|------------|
| R0       | 0x00000003 |
| R1       | 0xE0000001 |
| R2       | 0x00000004 |

loadingconstants.s

|       |               |
|-------|---------------|
| AREA  | loadingconsta |
| ENTRY |               |

```
    MOV R0, #0x00000003 ; R0 = 9  
    MOV R1, #0x00000001 ; R1 = 1
```

```

    D CPSR
    N 0
    Z 1
    C 1
    V 0

    MOVS R2, R0, ROR #1 ; R2=9 R0=9 STATUS of NZCV?
    CMP R2, R4 ; R2=9 R1=9 STATUS of NZCV?
    END

```

# Program of Termwork 4

```

 AREA MCTW4, CODE, READONLY
 MOV R6, #6 ; COUNT
 LDR R4, =VALUE ; ARRAY
 LDR R8, =RESULT ; RESULT/MEMORY

 LDR R2,[R4],#4 ; 1st number into R2
 LDR R4,[R4],#4 ; and number into R4
 CMP R2,R4
 BHI LOOP ; IF R2>R4
 MOV R2,R4 ; COPY R4 INTO R2
 SUBS R6,R6,#1 ; R6 = R6-1
 LOOP1
 BNE LOOP
 STR R2,[R8] ; COPY LARGEST NUMBER PRESENT IN R2 INTO
 R8
 VALUE DCD 0x11111111, 0x44444444, 0x22222222, 0x33333333,
 0x55555555, 0x77777777, 0x66666666
 AREA MYDATA, DATA, READWRITE
 RESULT DCD 0
 END

```

| Tw4       |                         |
|-----------|-------------------------|
| Registers | Value                   |
| R0        | 0x00000000              |
| R1        | 0x00000044              |
| R2        | 0x77777777              |
| R3        | 0x00000000              |
| R4        | 0x77777777              |
| R5        | 0x00000000              |
| R6        | 0x77777777 > 0xFFFFFFF8 |
| R7        | 0x00000000              |
| R8        | 0x40000000              |

| Memory 1   |          |
|------------|----------|
| Address    | Value    |
| 0x40000000 | 77777777 |

Practice Programs

In memory may or be smallest.

TW4

- Q. Develop an assembly language program to find the largest number from an array and store the result in RAM location.

Explanation:

- \* Given  $n$  elements, you compare 2 at a time. So no. of comparisons =  $n-1$ .
- \* Counter registers  $\rightarrow$  till  $(n-1)$
- \* BHI if first no. > second no.
- \* If you write code in such a way that if 1st no. is greater then proceed with subtraction. else interchange 1st & 2nd no.
- \* Assembly directive DEC  $\rightarrow$  becomes 32-bit later we are using.
- \* 
- \* SUBS R6, R6, #1 means I will decrement R6 by 1  
(if stored it again in R6  
indicating that 1 comparison got over)
- \* 
- \* R2 is holding largest number.  
Then first this content in R2  
the memory location whose address is stored in R8.
- \* BLS for finding smallest numbers & BHI for largest numbers. You just change this to BLS & run your program. You'll get 

# W6 (on Sorting)

tion  
gram

AREA MCTW6, CODE, READONLY

ENTRY

MOV R8, #4 ; counter register for array

LDR R1, =CVALUE ; R1 points to the array present in co

LDR R2, =DVALUE ; R2 points to the array present in co

DATA region

LDR R3, [R1], #4 ; Fetch a number from first array

STR R3, [R2], #4 ; Paste it in 2nd array

SUBS R8, R8, #1 ; Decrement counter register

CMP R8, #0 ; Compare R8 with zero

BNE LOOPD ; Once R8 becomes zero it goes to loop

If R8!=0, you continue data transfer

Once R8 =0, come out of loop.

START MOV R7, #0 ; swap flag is cleared.

; If there are n-elements, then number of comparisons = n-1. Let me take one register (say RS) to hold the no. of comparisons.

MOV RS, #3 ; Comparison COUNT.

LDR R8, =DVALUE

LDR R1, [R2], #4 ; LDR R1, content of R2 ; 1<sup>st</sup> number in R1

LDR R3, [R2] ; and number in R3

CMP R1, R3 ; R1-R3 will happen, then make use of branch instruction

BLT LOOP1 ; BLT means branch if lesser than. If you don't want to exchange (i.e., swap) then branch to LOOP1.

; else if you want to swap the two numbers then perform the below 4 steps. In these 4, the 1st two steps actually perform swapping.

STR R4, [R8], #-4

STR R3, [R8]

MOV R7, #1 ; swap = 1 ; flag to indicate that swapping has occurred.

ADD R8, #4 ; Restore Pointer

SUBS R5, R5, #1

CMPI R5, #0

BNE LOOPQ

CMP R7, #0

BNE START

B L

L CVALUE DCD 0x44444444, 0x11111111, 0x33333333, 0x22222222

AREA MYDATA, DATA, READONLY

\$VALUE DCD 0

END

## Output

| Register | Value      |
|----------|------------|
| R0       | 0x00000000 |
| R1       | 0x33333333 |
| R2       | 0x4000000C |
| R3       | 0x44444444 |
| R4       | 0x00000000 |
| R5       | 0x00000000 |
| R6       | 0x00000000 |
| R7       | 0x00000000 |

## Memory

Address: 0x00000064

0x00000064: 44 44 44 44 11 11 11 11 33 33 22 22 22 22 64 00 00 00 00 00 40

Address: 0x40000000  
0x40000000: 11 11 11 11 22 22 22 22 33 33 33 33 44 44 44 44

~~Registers~~  
R15(CP) 0x00000054

## Efficient C Programming

8051 microcontroller was of 8-bit. So we used char datatype as far as possible.

But in efficient C programming,

1. avoid char C except modular arithmetic), declaring local variables.
2. use int & long for local variables.

1. char 8-bit Unsigned to 255 while  
2. short 16-bit Signed means the variable can store only two values.  
3. int 32-bit Signed  
4. long 32-bit Signed  
5. long long 64-bit (signed)

\* keep in mind that by default char is unsigned. ie not signed

We'll look at many eg. to show how the compiler translates C source to ARM assembly. Once you have a feel for this translation process, you can distinguish fast code from slow C code.

\* To keep our examples concrete, we have tested them using the following specific C compilers:

- armcc for ARM Developer suite version 1.1 (ADS1.1). You can license this compiler, or a later version, directly from ARM.
- arm-eabi-gcc version 4.95.0. This is the ARM target for the ENUC compiler, gcc and is freely available.

### BASIC C DATA TYPES

Let's start by looking at how ARM compilers handle the basic C data types. We'll see that some of these types are more efficient to use for local variables than others. There are also differences b/w the addressing modes available when loading & storing data of each type.

ARM processors have 32-bit registers & 32-bit data processing operations. The ARM architecture is a RISC load/store architecture.

In other words you must load values from memory into registers before acting on them. There are no arithmetic or logical instructions that manipulate values in memory directly.

Early versions of the ARM architecture (ARMv1 to ARM v3) provided hardware support for loading & storing unsigned 8-bit & unsigned or signed 32-bit values.

Table 5.1 Load & store instructions by ARM architecture.

| Architecture | Instruction | Action                                  |
|--------------|-------------|-----------------------------------------|
| Pre-ARMv4    | LDRB        | load an unsigned 8-bit value            |
|              | STRB        | store a signed or unsigned 8-bit value  |
|              | LDR         | load a signed or unsigned 32-bit value  |
|              | STR         | store a signed or unsigned 32-bit value |
| ARMv4        | LDRSB       | load a signed 8-bit value               |
|              | LDRH        | load an unsigned 16-bit value           |
|              | LDRSH       | load a signed 16-bit value              |
|              | STRH        | store a signed or unsigned 16-bit value |
| ARMv5        | LDRTD       | load a signed or unsigned 64-bit value  |
|              | STRTD       | store a signed or unsigned 64-bit value |

These architectures we used on processor prior to the ARMv7M. Table 5.4 shows the load/store instruction classes available by ARM architecture.

In Table 5.4, loads that act on 8- or 16-bit values extend the value to 32 bits before writing to an ARM register. Unsigned values are zero-extended, and signed values sign-extended. (Similarly, a store of an 8-) This means that the cast of a loaded value to an int type does not cost extra instructions. Similarly, a store of an 8- or 16-bit value selects the lowest 8 or 16 bits of the register. The cast of an int to smaller type does not cost extra instructions on a store.

The ARMv4 architecture & above support signed 8-bit & 16-bit loads & stores directly, through new instructions. Since these instructions are later addition, they do not support as many addressing modes as ARMv4 instructions. We'll see the effect of this in the e.g., the pre-ARMv3 in section 5.2.1.

ARMv5 adds instruction support for 64-bit load & stores. Finally, available in ARM7 & later cores, this is available in ARMv4, ARM processors were not good at handling signed prior to ARMv4. Therefore ARM compilers define char 8-bit or any 16-bit values. Therefore ARM compilers define char as an unsigned 8-bit value, rather than a signed 8-bit value as is typical in many other compilers.

ARMCC & gcc use the datatype mappings in Table 5.8 for an ARM compiler's exceptional case for type char as nothing as it can target. The problem when you're pointing code from another processor architecture, cause eg. is using a char type variable i as a loop counter, with common condition  $i \geq 0$ . As i is unsigned for the ARM compiler loop continuation terminate. Fortunately, armcc produces a warning the loop will never terminate. Compilers also provide a override in this situation: unsigned comparison with 0. Compilers also provide a switch to make char signed. For e.g., the command line option -fsigned-char will make char signed on gcc. The command line option -fsigned-char will have the same effect with armcc. Using an ARM4 processor, we assume that you're using an ARM4 processor. For the rest of this book we assume that you're using an ARM4 processor. This includes ARM7TDMI & all later processors.

### 5.2 Compiler datatype mappings

| 5.2 Data Type | Implementation |
|---------------|----------------|
|---------------|----------------|

|           |                            |
|-----------|----------------------------|
| char      | unsigned 8-bit byte        |
| short     | signed 16-bit halfword     |
| int       | signed 32-bit word         |
| long      | signed 32-bit word         |
| long long | signed 64-bit double word. |

To see the effect of local variable types, let's consider a simple e.g., we'll look in detail at a checksum function that sums the values in a data packet. Most communication protocols (such as TCP/IP) have a checksum or cyclic redundancy check (CRC) routine to check for errors in a data packet.

for ( $i = 0$ ;  $i < 64$ ;  $i++$ )  $\rightarrow$  I'm adding 64 numbers

```
sum += data[i];
```

```
} return sum;
```

The following code checksums a data packet containing 64 words. It shows why you should avoid using char for local variables.

Let's see how compiler is going to produce assembled code for this :-  
we've added labels & comments to make the assembly clear.

checksum\_v1

MOV R2, R0 ; R2 = data loading starting address of address whose name is data in register

MOV R0, #0 ; sum = 0 initialize sum to zero.

MOV R1, #0 ; i=0 → for iteration

checksum-v1-loop  
~~reg LDR r1, [r2, r3]~~  
~~r1 is leftshifted & times and is stored in r2~~  
~~you're going to LDR r3, [r2, r3], LSL #2~~  
~~r3 = data[i]~~  
~~to get index value.~~

**ADD**  $r1, r1, \#1$  ;  $r1 = r1 + 1$  incrementing i  
i.e.  $i = i + 1$

AND r1, r4, #0xFF ; i = char  
COMP R1, #0x4D : compare in 64

$\sum t = \sum i \rightarrow \text{sum} = \text{sum} + \text{data}[i]$

```

    10, -33, 10      3 = min, -1, -2, -3
    8 cc
    checknum = 71 - loop; if i < 64 loop

```

Conveniently  
and easily

MOV PC, R14 ; RETURN SUM

↳ means restore the content of program counter.

For checkups, we've to discard the ordinary & compute Dr's compliment for that.

Important points while tracking the above program.

\* shifting any data logically left ~~is nothing but multiplication.~~ (say 2)

~~0000000000~~ ~~0000000000~~ ~~0000000000~~ ~~0000000000~~

AND  $r1, r1, \#0xFF$   $r2$  is holding value  
 $r1 = 0x24$

$\overbrace{0x33 \quad 0x33 \quad 0x33 \quad 0x33}^{\rightarrow 32\text{bit data}}$  is char 8-bit wide

↓-bit data  
I'm interested only in this last  
8-bit

it'll extract only last 8-bit data.

to accomplish damage: Drought cause semi-dry

Mass St No.

- \* If you declare the local variable by int then you can remove the additional instruction AND if you make use of char to declare the local variable, you have to make use of additional instruction (because we converted 8-bit data to 32-bit)
- \* So, in vt program, we convert 8-bit data to 32-bit then again to 8-bit.
- \* Our registers are of 32-bit left + the content by 1, it is multiplied by 2, when you logically shift left the content by 2, it is multiplied by 4.
- \* When we are accessing 16-bit data, we put it by 2 locations, it covers each of 4 bytes at a time bcz of memory ~~two~~ locations.
- \* Now, let's see revision-2 of the checksum program where we remove the additional instruction AND by declaring the local variable as 2 int instead of char.
- \* When we use char, by default it is unsigned. But int is signed when default. As we are using unsigned integers explicitly, we have to declare ~~int~~ as unsigned.

Now, let's see revision-2 of the checksum program where we remove the additional instruction AND by declaring the local variable as 2 int instead of char. index bcz it can have values from 0 to 64 only +ve values.

unsigned int sum = 0;

```
for (i = 0; i < 64; i++) {
    sum += data[i];
}
return sum;
```

$sum = 0$

$sum = 0 + 0 = 0$

**Assembly language program**

checksum\_v2

```
MOV R2, R0 ; R2 is holding base address of array.
MOV R0, #0 ; sum = 0
MOV R4, #0 ; i = 0
            ; R2 = data
            ; sum = 0
            ; i = 0
            ; R0 = sum = 0.
            ; P.T.O
            ; R1 = i = 0
```

**Initialization:**

### checksum\_vq-loop

```
LDR R3, [R2, R1, LSL #2] ; R3 = data[i]
ADD R1, R1, #1
CMP R1, #0x40
; compare is 64
ADD R0, R3, R0
; sum = sum + R3
; (R0) (sum)
BCC checksum_vq_loop
MOV PC, R14
; if (cc<4) goto loop
; if (cc>4) return sum
; program link register
; count
```

LDR R3, [R2, R1, LSL #2] ; R3 is loaded with  
R1 = 0 LSL #2 = 0  
when you leftshift zero twice, the result  
R2 = 2000 + 0 = 2000 = move into R1  
R3 = 1 ( : ldr R1, [2000])  
; [ ] Content present in 2000  
; is loaded into R3.  
; Data  
; 2000 ; 1  
; sum = 0 + 1 = 1 //



r14 only use for PC → program counter.  
 link register will hold return address (after ~~units~~ it  
 should return  
 where ARM processor puts the actual address  
 whenever it calls a subroutine (i.e function)  
 r14 → contains the address of the next instruction to be fetched by the  
 processor.  
 r15 → PC → 0000 0001 → LSL one time  
 0000 0010 → LSL another time  
 (0000 0000) → LSL

|  |   |
|--|---|
|  | 1 |
|  | 2 |

this value is 4

$$\begin{aligned}
 R3 &= [2000 + 4] \leftarrow \text{content present in} \\
 &= [2004] \leftarrow \text{current address} \\
 &\equiv 2 \leftarrow \text{content of } 2004 \text{ is } 2.
 \end{aligned}$$

So we write tracing neatly.

If I'm interested in 16-bit data, I can use short.

The containers (ie registers) are of 32-bit.  
If you are interested in extracting 32-bit or 16-bit  
or 8-bit data out of that, you should decide.

```
short checksum_v3 (short *data)
{
    unsigned int i;
    short sum = 0;
    for (i = 0; i < 64; i++)
    {
        sum = (short) (sum + data[i]);
    }
    return sum;
}
```

Assembly language program for the above C prog.

checksum\_v3

|                                           |                                                       |
|-------------------------------------------|-------------------------------------------------------|
| MOV R2, R0 ; R2 = data                    | initialise R2 with base address of array              |
| MOV R0, #0 ; sum = 0                      |                                                       |
| MOV R1, #0 ; i = 0                        |                                                       |
| checksum_v3_loop                          | base shifting                                         |
| ADD R3, R2, R1, LSL #1 ; R3 = & data[i]   | ADD R3, R2, R1, LSL #1 ; R3 = & data[i]               |
| LDRH R3, [R3, #0] ; R8 = data[i]          | LDRH R3, [R3, #0] ; R8 = data[i]                      |
| ADD R4, R1, #1 ; i++                      | ADD R4, R1, #1 ; i++                                  |
| CMP R4, #0x40 ; R0 is sum. sum = sum + R3 | CMP R4, #0x40 ; R0 is sum. sum = sum + R3             |
| ADD R0, R3, R0 ; R0 is sum                | ADD R0, R3, R0 ; R0 is sum                            |
| MOV R0, R0, LSL #16                       | MOV R0, R0, LSL #16                                   |
| MOV R0, R0, ASR #16                       | MOV R0, R0, ASR #16                                   |
| BCE checksum_v3-loop                      | As it is 16-bit, it requires only 2 memory locations. |
| MOV PC, R14                               |                                                       |

As it is 16-bit, it requires only 2 memory locations.

Space to write-tracing of previous assembly level register in ROP

Register to index combination of bits and add base address, blends with offset for sum makes ROP

Combine prod & v\_registers prod

i'm here now

:0 = muz prod

ctrl\_inss :0 = 1 of

:0[ctrl\_inss] muz) prod = muz

inse next

prod made up of adding v\_registers prod

extra jumps, etc.

0x0000000000000000

The LDRH instruction doesn't allow for a shifted address of set, as the LDRS instruction did in checksumming.

LDRH doesn't support shifting operation on the instruction itself



Whenever you are reading 16-bit data from a 32-bit register, you don't know whether it is signed or unsigned. PATA

|    |   |    |    |    |    |             |
|----|---|----|----|----|----|-------------|
| 3  | 8 | 12 | 23 | 45 | 56 | 0           |
| 12 |   |    |    |    |    | 16-bit data |

If I want to remove vanish upper 16-bit data from the 32-bit data, I do LSL #16.  
↳ LSL 16 times

|    |    |    |    |
|----|----|----|----|
| 12 | 23 | 45 | 56 |
| 3  | 2  | 1  | 0  |

|    |    |   |     |    |
|----|----|---|-----|----|
| 45 | 56 | 1 | 001 | 00 |
|----|----|---|-----|----|

but, 45 56 00 00 is different from 45 56.

If I do LSR, my signed bit will be lost.  
If I make use of ASR  
So, I got only 45 56 ie. 16-bit data.

upper 16-bit will hold sign extension  
lower 16-bit will hold actual value

FF FF      45 56

So, finally I got only 45 56 ie. 16-bit data.

LDR was used however in R2 but in R3 we make use of (3 additional instructions)

```

ADD    R3, R2, R1, LSL #1, R2
MOV    R0, R0, LSC #16
MOV    R0, R0, ASR #16

```

Note:-

The cast reducing total + array[i] to a short (ie short data type) requires two MOV instruction.  
The compiler shifts left by 16 & then ~~right~~ by 16 to implement, a 16-bit sign extend.

Instead of converting each & every number to 16-bit in each iteration, I can do like this:  
Obtain 32-bit sum from this sum, you obtain only 16-bit data. Let's see it in version-4.

version - 4

```
short checksum_v4(Cshort *data)
{
    unsigned int i;
    int sum = 0;
    for (i = 0; i < 64; i++)
    {
        sum += *(data + i);
    }
}
```

Assembly language program for the above C program is

checksum\_v4  
 MOV R2, #0 ; sum = 0  
 MOV R1, #0 ; i = 0  
 checksum\_v4Loop  
 LDRSH R3, [R0], #8 ; R3 = \* (data++)  
 ADD R1, R4, #1 ; i++  
 CMP R1, #0x40 ; compare i, 64  
 ADD R2, R3, R2 ; sum += R3

Scanned with ACE Scanner

Instead of incrementing the index value, we are incrementing the pointer itself in version-4 assembly code.

\* Can you notice, I'm not using  $\text{R}0 \leftarrow \text{MOV R}2, \text{R}0$   
\* Need not load base address into  $\text{R}0$  register. So I  
didn't use  $\text{MOV R}2, \text{R}0$  as in previous case.

- So, finally you understood that
- In Assembly level pgm, char is unsigned.
  - Avoid using char & short data types bcz they lead to additional instructions and it'll lead to burden on the compiler & your program becomes slow.

Slow.

### Function Argument Types.

short add\_v1(Short a, short b)  
{  
    return a+c(b>>1);  
}  
  
add\_v1 output:  
add\_v1  
    R0 + (R1 right shifted by 1)  
ADD R0, R0, R1, ASR #1 ; R0 = Cint(a + Cint(b >> 1))  
MOV R0, R0, LSL #16  
MOV R0, R0, ASR #16 ; R0 = Cshort )sum  
MOV PC, R14 ; Return R0

ASR will retain the signed bit.  
So, we are using ASR & not LSR.



## callee function

- narrow means all necessary data conversion is done by the callee function & then returned to the calling (or main()) function.

→ callee function will return necessary data to the calling function.

- wide means the callee function will return necessary data to the calling function.

ANSI C Prototype → `#include <ansi.h>`

`int sum(int a, int b);`

`ANSI C 1990`

Note: → compiler will not convert arguments to 16-bit type sum if it's not converted to 16-bit type. It'll compute the sum of the function.

The armcc compiler uses ANSI C prototype of the function where the function arguments are passed narrow.

The gcc compiler makes no assumptions about the range of argument value. It reduces the input arguments to the range of short in both, the caller and the callee function. It also casts the return value to a short type.

→ In gcc compiler a 16-bit will be converted to 16-bit to sum it, then to 32-bit & sum of it is also converted by gcc.

values

## Signed versus unsigned Types

```
int average_v1(int a, int b)
{
    return (a+b)/2;
}
```

→ All arguments are  
of integer type,  
& also the return type.

(a & b are signed numbers)

average\_v1

ADD R0, R0, R1 ; R0 = a+b

ADD R0, R0, R0, LSR #31 ; if R0<0; R0++

Mov R0, R0, ASR #1 ; R0 = R0 >> 1 ; R0/2

Mov PC, R14 ; Return R0

R0 gets divided  
by 2.

• Bcz of the constraint on hardware architecture, we don't have division operation in assembly language program.

- To perform the division operation, I'll use LSR
- I'm dealing with signed numbers.  
↳ a & b are signed.  
↳ means either +ve or -ve number.
- If you shift right anything 31 times, then msb will come to the position of lsb

e.g:-

1st I'll do LSR 31 times

→ only the  
MSB bit  
will be retained  
in LSB position  
& rest all bits  
are lost

\* If the sum is -ve, I'm adding 1 to it.  
else I'll keep the sum as it is.

- \* magnitude is represented by Q's complement & MSB represents sign in a negative numbers in a computer system.

$$\text{sum} = -3$$

$$+3 = 0011$$

$$\begin{array}{r} 0011 \\ 1100 \\ \hline + & & 1 \\ \hline 1101 \end{array}$$

-3 in 4bit is 1101

-3 in 32-bit is 2

sign extension.  An arrow points from the text "sign extension" to the first column of bits. Another arrow points from the text "represents magnitude" to the remaining columns of bits.

signature

is - 2

is 0

whenever you want to divide a -ve number by 2, check whether result (it) is  $< 0$ . If so add 1 to it. If it is +ve nothing is added

bit:

## Input devices

- Sensors
- switches

→ is basically a resistor.

Potentiometer is used to vary the intensity

DC motor

" " " " " speed.

## Output

- Motor → eg: DC motor, Stepper motor.
- LCD
- 7-segment display
- LED

## Convertors

• Analog to Digital convertors

• Digital to Analog

The parameter which will be varying over time is analog in nature. Eg: Temperature, Speech, etc.

8 ADC's → 14 analog channels

↓  
Analog to Digital convertors  
to read analog & convert it to digital onchip itself.

D2A convertor → Digital to Analog convertors.

IC → Integrated circuit of ARM7  
of ARM7

8051 → 40 pin IC

ARM7 is advanced than 8051.

ARM7 is 64 pin IC, available in quad core.  
quad core processor.

↳ on each side 16 pins.  $16 \times 4 = 64$  pins.  
each pin can have min. 1 functionality & max 4  
functionalities.

Single pin can work in a different mode means the  
Pin is multiplexed in nature.

How can I select the functionality of pin or it is  
multiplexed in nature?  
We're Pin Selectors

GPIO → General Purpose Input Output.

8051 has 4 ports

P0, P1, P2, P3.

DRM7 has

2 ports

P0 & P1 each 32-bit wide

How to configure port zero pins as input ports

Q u u " one u " output "

PORT 0 : 32 bit  
But only 8 pins available as 4PIO. In fact only 1 pin is available

PORT 1 : 32 bit  
But only 16 pins available as 8P0

8P0.

Total

So, 45 pins are available for 4PIO

4PIO

P1.0 - P1.45

(use of Port)

There are 4 functions. If you want to select one among them, then how many bits do you need?

$$2^0 = 1 \text{ bit}$$

Ask mam.

functions

| Control bit b1 | Control bit b2 | functions       |
|----------------|----------------|-----------------|
| 0              | 0              | 4PIO            |
| 0              | 1              | 1st alternative |
| 1              | 0              | 2nd alt         |
| 1              | 1              | 3rd alt.        |

$$2^4 = 16 \text{ combinations}$$

1/9

### 3 pin select Registers

↳ all are 32-bit wide.

↳ all are 32-bit wide. Whereas for Port 0, it'll use  
in number 0-15 pin number whereas for Port 1 pin select register's  
configuration register. It contains 32-bit wide to the user  
to port + 1 pin select register from port + 1  
↳ It configures functionality of pins. So, only 16 pins are available  
↳ port 1 pin select register. So, I'll use only one pin select  
↳ port 1 pin select register for port 1. So, I'll use only one pin select  
↳ port 1 pin select register for port 1.

PWM Pulse Bit modulated input.

Do you know what value? I know about set & reset  
reset value?

OR if

you want to make any of the output pin high ->

then we set.  
you'll use reset if you want to clear the output

you can't use set & reset for input pins.

The combination "00" is always for GPIO functionality.

1 → Output } in ARM  
0 → Input }

0 → 0 in 8051  
1 → 1 in microcontroller.

\* PINSEL 0 used to access pins from P0.0 to P0.15  
\* PINSEL 1 " " " " " " P0.16 to P0.31

STRUCTURE OF ARM 'C' program

↳ include headerfile.  
header file used here is LPC21xx.h  
#include <LPC21xx.h>

2) Function declaration  
see that its return value is integer rather than char or

short  
3) Global variables  
4) int main() ← Always main() should be of integer data type.

5) To hardware always blinking the LED will be first program.

Assume  $CED$  is connected to  $P_0$

PINSER 0 is used as P'm using only P0.0

We're interested only in 1887-1890 & 1907-1910

00000  
00000  
00000

you've to make **0x0000 0000** only!!) Port 0  
(0-15) pin are EPTO

~~32 bit~~ register it will receive direction to assign individual pins. It will have 8 value pins dedicated for input and 8 pins dedicated for output.

$\rightarrow$  ~~SO~~ ~~all~~ ~~padding~~  $\rightarrow$  ~~Na~~  
ZODIR = 0x0000 0000  $\rightarrow$  making all ~~emptying~~ available  
 $\downarrow$   $\rightarrow$  0x PFFF FFFF so all one's means  $\&$  F's  
to define  $\hookrightarrow$  memory of size 4 bytes  
thus it is

whether it means an input pin or output pin. Direction ↓

output pin, etc.  
will tell it.

200 DTR for  
is dedicated  
port zero pins  
Port one pins  
are 201 DTR  
dedicated for  
port 1.

1 indicates that,  
that particular pin  
is configured as output  
pin

$\rightarrow$  to set values  
 $\text{POSET} = \text{FFFFFFFFFF}$

→ to see the values  
DOCLR = PPPPPPPP

~~PODSET = 0x00FF0000~~ and ~~0x000000FF~~

→ info. etc.

pin no. 16 to 23 associated pin no 4-57

## Embedded C Terminology

### TW<sub>1</sub> (on LED blinking)

#### Program

```
#include <LPC21xx.h>
```

```
unsigned int delay;
```

```
int main()
```

```
{ PINSEL1 = 0x00000000; // configure P0.16 to P0.23 as INPUT
```

```
P0DIR = 0x00000000; // configure P0.16 to P0.23 as OUTPUT pins
```

```
while(1)
```

pin no.  
23  
22  
21  
20  
19  
18  
17  
16

```
{ P0SET = 0x00FF0000; // SET PINS 16-23 OF PORT 0
```

```
for (delay = 0; delay < 10000; delay++);
```

```
P0CLR = 0x00FF0000; // CLEAR PINS 16-23 OF PORT 0
```

```
for (delay = 0; delay < 10000; delay++);
```

```
To clear them  
we are passing 1's
```

```
1's
```

Output

Logic Analyzer



General Purpose Input/Output (GPIO) - Slow Interface

| DDR:   | 0xFFFFFFFF |
|--------|------------|
| TOSET: | 0x00000000 |
| TOCLR: | 0x00000000 |
| TOEN:  | 0x00000000 |
| Pin:   | 0x00000000 |



6. Next go to Output tab  Click

f. Next go to linker tab & check  
 Use memory layout from Target Dialog

This will allow compiler to use on chip RAM of ARM  
g. Click on OK.

Next steps :-

1. Right click on source group  
2. Add existing files...  
3. Select your file & then **Add** & **Close** you click.

Next steps:-

1. Translate, Build, Start-Stop-Debug Session.

2. Go to Peripherals

3. Select GPIO Slave Interface & then Port 0

A small window appears. keep it as it. It is to see blinking of LED.

4. Go to Debug & click on Run. (You can



click here

logic analyzer

OK

click on run & wait

select it & then close.

Now or at the end.

both are OK.

Out to see waves &

blinking ticks.

6. Go to **Setup...**

A window appears.

Setup Logic Analyzer  
Current logic Analyzer signals:  
TOSET

click on this & wait  
select it & then close.  
this window will go  
after clicking "close".

Display for Bit

Color:

**Close**

f. Now, right click. Some options appear.

Select Bit.

Select Setup min/max from Recording

Analog  
Bit   
Digital  
setup min/max from Recording

8. As you have clicked on Run, you can see movements of wavy form.

So, like this when you finish checking blinking of LED on logic analyzer window, now how to check blinking on board, let us see.

Scanned with ACE Scanner

# \* Counter program of Embedded C

## Program:-

```
#include <LPCQ1xx.h>
void delay(void);
unsigned int count;
int main()
{
    unsigned int comp=0; //compliment
    PINSEL1 = 0x00000000; //Configure Port0C(16-31) as BiPIO pins
    IODIR = 0xFFFFFFFF; //configure P0.16-P0.31 as OUTPUT
    while(1)
    {
        for(count = 0; count <= 0xFF; count++)
        {
            comp = (~count); //Just to ensure that after 255, 0 should
                               //come.
            comp = comp & 0x0000000F; //to fetch lower 8-bit data
            //Now I'm supposed to send it on I/O pin
            IO0PIN = (comp << 16); // You can see the status of COUNT
                               //from this pin. IO0PIN because
                               //I'm using PORT0. ^> IO zero
                               //pin ^> Port zero
            delay();
        }
    }
}

void delay(void)
{
    unsigned int i;
    for(i=0; i<6500000; i++);
}
```