

## Set B

Q.1. Explain the internal architecture of a general purpose processor and on basis of the instruction between memory and CPU, classify and draw the 3 prominent architectures.

- ① The GPPs are further classified as micro-controller and micro-processors. A microcontroller has memory and other peripherals on the chip itself, hence it is best choice for small embedded systems. On the other hand, a microprocessor is more powerful but requires a large number of external components.
- ② Internal architecture of a processor.



The CPU consists of:

- ① Arithmetic Logic Unit (ALU) which performs arithmetic and logic operations.
- ② General purpose registers. These registers constitute the processor's internal memory. The number of registers varies from processor to processor. Registers contain the current data and operands that are being manipulated by the processor.
- ③ Control unit that fetches the instructions from memory, decodes them and executes them. Control unit consists of
  - Instruction pointer that points to the next instruction to be executed. Instruction pointer also called program counter.
  - Stack pointer that points to the stack in memory. In external

contents of registers are transferred to this stack. The processor keeps track of the next free location in the stack through stack pointer.

- Instruction Decoder that decodes the instructions.

- Memory Address Register and memory Data Register

Processor Architectures :-

In addition to manipulating data, a processor's job is to read data and instructions from memory and read and write data to a memory, write data to output devices and read data from input devices. To do these functions, the processor communicates with other devices using three buses, a bus being a group of signals.

Buses are :-

- Data Bus which carries the data b/w the processor and other devices. This bus is bi-directional.

- Address Bus which carries the address information from the processor to memory and hence this bus is unidirectional.

- Control and status bus which carries control/status information such as whether the operation is read or write, indication of address error, as well as processor reset signal, clock input and interrupt signal.

#### A) Interaction between CPU and memory



Based on the number of memory and data buses used, there are three types of architectures for the processors.

- These are :-
- ① Von Neumann Architecture
  - ② Harvard Architecture
  - ③ Super Harvard Architecture

### ① Von Neumann Architecture :-



The Von-Neumann architecture is the most widely used architecture, this architecture has one memory chip which stores both instructions and data. The processor interacts with the memory through address and data buses to fetch instructions and data.

### ② Harvard Architecture :-



In this architecture there are two separate memory blocks. One is program memory and other is data memory. Program memory stores only instructions and data memory stores only data. Two pairs of data buses are between the CPU and the memory blocks. Program memory blocks address bus and program memory data bus are used to access the program memory. Data memory address bus and data memory data bus are used to access the data memory.

## (8) Super-Harvard Architecture.



The super Harvard Architecture is a slight but significant modification of the Harvard architecture. In Hardware architecture, the data memory is accessed more frequently than the program memory. Therefore, SHARC provision has been made to store some secondary data in the program memory to balance the load on both memory blocks. This architecture, proposed by analog devices, is used extensively in digital signal processors.

**Q.1.** Explain the services provided by an operating system and provide the main basis for classification of an OS. Provide in brief the services required specific to embedded OS.

- Services provided by an operating system are:-
- 1) Process / Task management
  - 2) memory management
  - 3) I/O management + File system management
  - 4) Providing services to applications
  - 5) Providing user interface to access OS via API

### Operating System

| single Task OS<br>VS<br>multitask OS | single user OS<br>VS<br>multi-user OS               | Commands<br>VS<br>GUI         |
|--------------------------------------|-----------------------------------------------------|-------------------------------|
| MS DOS<br>VS<br>Windows &<br>Unix    | MSDOS & Windos<br>FOR EDUCATIONAL USE<br>VS<br>Unix | MSDOS VS<br>Windows &<br>Unix |
| daram®                               |                                                     |                               |

- Services Required for embedded operating system :-
- 1) Reliability (99.999% downtime per year)
- 2) Multi-tasking with time constraints
- 3) small footprint - (No frills memory hence no GUI)
- 4) support Diskless systems - (No secondary storage)
- 5) Portability - (Varies w.r.t processor unlike Intel)
- 6) scalability - (8 bit to 64 bit)
- 7) support for API (function calls) via POSIX

1) Reliability :- The OS is an embedded system has to be very reliable. If the system has to have an availability of 99.999% of the time, the downtime per year is just about 5 minutes.

2) Multi-Tasking with time constraints : Embedded systems need to support multi-tasking. All operating systems used in embedded systems support this feature. But then, the task management has to be done efficiently to meet the real-time performance requirements.

3) small footprint :- As the memory devices have limited capacity in embedded systems, they have little memory for the OS. Many frills provided by operating systems such as Windows are not provided in embedded operating systems. The memory occupied by the operating system is known as the 'footprint'. The footprint should be very small for an embedded OS.

4) Support diskless systems:- Unlike the desktop computers, embedded systems may not have secondary storage such as hard disk or CDROM. The embedded OS alone

with application software will reside on a memory chip.  
File system management is not mandatory in embedded systems.

- 5) **Portability** :- A variety of processors are available for developing embedded systems unlike desktop computers whose processors are predominantly the Intel family processors. So, an important requirement of embedded operating systems is portability.
- 6) **Scalability** :- The embedded operating systems may be used on an 8-bit micro-controller or a powerful 64-bit microprocessor. So, scalability is very important for embedded operating systems.
- 7) **Support for standard API** :- Application software is developed using the Application Programming Interface (API) of the operating system. API is a set of function calls. An application developed for one OS may not be portable to another OS. To achieve portability, IEEE standardized the API called Portable Operating System Interface (POSIX).

Q.2.

Explain the design challenges, design trade-offs and design metrics in the Hardware-software co-design principles of an Embedded system using a suitable Venn diagram where needed.

\* Design metrics :- while designing an embedded system following points are to be considered

i) NRE (Non-Recurring Expenditure) cost :-

This includes the costs that are occurred only once and are not for each system. This includes the development cost of the system are other investments that are to be done.

ii) Unit Cost :- This is the cost of each unit. The cost includes the components costs and the manufacturing cost involved in it. It excludes the cost of development of the system for the 1<sup>st</sup> time.

iii) Size :- size of the system is measured in terms of

i) Physical space required

ii) memory capacity often measured in bytes for software and data.

iii) No. of logic gates in the hardware.

iv) Performance :- The performance of system is measured by the time required to execute a task. Smaller the execution time means higher performance.

v) Power :- The power consumption is also an imp attribute for the handled ES or ES used in field i.e. powered by the battery. If the power consumption is less the battery lasts longer and hence it is considered as a better system.

- vi) **Flexibility** :- A system that can have changes made in it without investing in research or the NRE cost are said to have flexibilities. Hence a system that has more flexibilities is better.
- vii) **Time-to-prototype** :- The time required to make a proto-type of the system is very imp. The time required should not be very high, else another competitor may bring the product in the market.
- viii) **Time-to-market** :- Time-to-market depends on design time, manufacturing time and testing time.
- ix) **Maintability** :- Modification in the system can be done by adding, or updating software, data and hardware.
- x) **System and user safety** :- This metric ensures that the system will not cause any harm. System safety means safety in terms of accidental fall from hand or table, theft e.g. a mobile phone locking ability and tracking ability. User safety means safety for user when using a product.



- Design Trade-offs :-

- ① NRE cost :- It is the non-recurring engineering cost. It is only one time cost including research, prototype preparing, PCB design cost.
- ② Size :- Size of the system is measured in the term of physical space required, minimum capacity measured for (S/W) bytes, No. of logic gates used in H/W.
- ③ Power :- ES are powered by batteries. If power consumption is less the battery lasts longer and hence it is considered as a better system.
- ④ Performance :- It is measured by time required to execute a task. Smaller the execution time, means higher performances.

These metrics typically compete with each other. Improving one often leads to degrading another. For eg. if we reduce the size of ES, it will affect the performance of system. If we reduce the power consumption of the system, the NRE cost can be affected. This phenomenon is compared to a wheel with these 4 pins called as design tradeoffs.

- Design challenges :- Any ES when being designed, 3 factors are considered to be important including quality of the system, speed of the system and its cost.

For any system, its quality of the materials used and its speed of execution should be as high as possible. If there were to be implemented, the overall cost of the system will increase which has to be avoided.

So the designer has to design a system very effectively where its quality and speed cannot be compromised for reasonable cost. All the 3 factors will be perfectly balanced.

Q.2. Product life cycle or sales and revenue model.

→ In this model, the life cycle of a product right from its expected release to its end is described. The graphs describe and composed between the sales and revenue if the product was launched on time (scheduled time) & if it was released on delayed time.

The loss of sales and revenue loss is also found out.

$$\text{product life cycle} = 2W$$

$$\text{max sale of } = W$$

$$\text{Area of triangle} = \text{total revenue}$$

$$\text{Area (on time)} = \frac{1}{2} \times 2W \times W = W^2$$

$$\text{Area (delayed)} = \frac{1}{2} \times (2W-D) \times (W-D)$$

$$\text{Loss} = W^2 - \frac{(2W-D)(W-D)}{2} = \frac{D(3W-D)}{2}$$

$$\% \text{ revenue loss} = \frac{\text{delayed}}{\text{original}} \times 100 = \frac{D(3W-D)}{2W^2} \times 100$$

choice A) :-  $x = 25,000, y = 5,000$

$$\text{Total cost} = x + Ny = 5,000, 25,000$$

$$\text{Per unit cost} = (x/N) + y = 5002.5$$

choice B) :-  $x = 45,000, y = 3500$

$$\text{Total cost} = x + Ny = 3500, 45,000$$

$$\text{Per unit cost} = (x/N) + y = 3504.5$$

choice C) :-  $x = 36,000, y = 4,200$

$$\text{Total cost} = x + Ny = 4,200, 36,000$$

$$\text{Per unit cost} = (x/N) + y = 4203.6$$



FOR EDUCATIONAL USE

EDUCATIONAL USE



Explain SPI and I<sub>2</sub>C interface to the processor.

→ SPI has developed by Motorola, peripheral devices such as memory chips, potentiometer, ADCs and DACs, Real-time-clock etc. are provided with SPI interface so that they can be interfaced to the processor. The processor generates the clock and the peripheral use the clock to synchronize its acquisition of the data.



SPI uses four type of signals for interfacing peripherals to the processor.

- Master out slave in (MOSI)
- Master in slave out (MISO)
- serial clock (SCLK)
- chip select (CS) for the peripherals

SPI is based on the master-slave protocol. The processor acts as the master and the peripheral acts as the slave. The processor and the peripherals are connected using these 4 signals. Both the master and slave contain shift registers. The master sends a byte to the slave on MOSI line, and slave sends its register contents on MISO line. Both read and write can be simultaneously. If the master has to read a byte from the slave, it has to write a dummy byte to initialize the slave for transmission.

This is a synchronous protocol for communication between the processor and peripheral.

#### I<sub>2</sub>C Bus:-



I<sub>2</sub>C Bus uses 2 wires for connecting devices. The bus is bidirectional and synchronous to a common clock. Microcontroller with built in I<sub>2</sub>C are available. Data rates of 100 kbps (standard mode) and 400 kbps (fast mode) are supported.

The Bus consists of two lines - serial clock (SCL) and serial data (SDA). Both lines remain high when not in use. A device using the bus drives the line low. Each device has a unique address of 7-bits or 10-bits. If 7-bits are used, 128 devices can be connected to the bus. A device can act as a master or slave. Transmitting device is the master and the receiving device is the slave. The same line is used for master transmission and slave response. I<sub>2</sub>C bus is a multi-master bus, more than one device can act as a master. We can interface a RTC such as philips PCF83B3 or a display of matrix.

I<sub>2</sub>C is a shared Bus system to which many no. of I<sub>2</sub>C device can be connected. Devices connected to the I<sub>2</sub>C bus can act as either master device or 'slave' device.

Master and slave devices can act as either transmitter or receiver. Regardless whether a master is acting as transmitter or receiver, the synchronization

clock signal is generated by the master's device only.

I<sub>2</sub>C supports multi masters on the same bus. The following diagram illustrate the connection:



The I<sub>2</sub>C bus interface is built around an input buffer and an open drain or collector transistor. When the bus is in the idle state, the open drain/collector transistor will be in the floating state and the output lines (SDA and SCL) switch to the high impedance state.

a. Compare ARM 7, 9, 10 & 11 or compare ARM m0, m3, m4  
on the basis of following:

- Pipeline depth, clock frequency, power, throughput, architecture, multiplier.

|                                  | parameters | ARM 7   | ARM 9   | ARM 10   | ARM 11 |
|----------------------------------|------------|---------|---------|----------|--------|
| 1) pipeline depth                | 3-stage    | 5-stage | 6-stage | 8-stages |        |
| 2) Typical clock frequency (MHz) | 80         | 150     | 260     | 335      |        |
| 3) power (mw/mHz)                | 0.06       | 0.19    | 0.50    | 0.040    |        |
| 4) Throughput (MIPS/MHz)         | 0.097      | 1.1     | 1.3     | 1.2      |        |
| 5) Architecture                  | Von Neuman | Harvard | Harvard | Harvard  |        |
| 6) Multiplier                    | 8 * 32     | 8 * 32  | 16 * 32 | 16 * 32  |        |

• compare ARM m0/-, m3, m4

|                         | Parameters | m0/-     | m3      | m4       | m0         |
|-------------------------|------------|----------|---------|----------|------------|
| ① pipeline depth        | 2-stage    | 3-stage  | 3-stage | 3-stage  | 3 stage    |
| ② Power (watt)          | 474        | 141      | 151     | 66       |            |
| ③ Throughput (MIPS/MHz) | 1.37       | 1.89     | 1.95    | 1.21     |            |
| ④ Architecture          | ARM V6-M   | ARM V7-M | ARM-7M  | ARM V6-M | Von-neuman |

Q.4. Write a Branch instruction and conditional execution instruction to execute the following condition & compare the code density.

- IF contents of register r5 are equal to 20 add the contents of r5 to register r2 after logically shifting r2 by 4 bits & store this results in r0.

Explain the difference in the below instructions.

- LDR r0, [r1, #4]
- STR r0, [r1, #12]
- LDR r0, [r1, #4]!
- STR r0, [r1, #12]!
- LDR r0, [r1], #4
- STR r0, [r1], #12
- LDRSH r0, [r1]
- STRB r0, r1

→ A) Using Branch Instruction :-

CMP r5, #20  
BEQ SKIP      (If zero flag is set)

...

...

SKIP: ADD r0, r5, r2, LSL#4

B) Using conditional execution instructions :-

CMP r5, #20  
ADDEQ r0, r5, r2, LSL#4

By comparing code density of case ① and ② we can see that by using conditional instructions it requires less code, instructions which makes code more dense.

Compare to Branch instruction extra machine cycle gets wasted.

- ①  $LDR \quad r0, [r1, \#4]$   
 $\rightarrow r0 \leftarrow [r1 + 4]$ ,  $r0 = \text{mem}[r1 + 4]$   
 r0 gets data from location pointed by  $[r1 + 4]$ .
- ②  $STR \quad r0, [r1, \#12]$   
 $\rightarrow [r1 + 12] \leftarrow r0$ ,  $\text{mem}[r1 + 12] = r0$   
 Store word to memory at location pointed by  $r1$  then update  $r1$  to  $r1$  offset
- ③  $LDR \quad r0, [r1, \#4]$   
 $\rightarrow r0 = \text{mem}[r1 + 4]$ ,  $r1 = r1 + 4$   
 Here  $r1$  is auto-incremented and r0 gets data from location pointed by  $[r1 + 4]$ .
- ④  $STR \quad r0, [r1, \#12]$   
 $\rightarrow [r1 + 12] \leftarrow r0$ ,  $r1 = r1 + 4$   
 Store word to memory with Autoincrementing  $r1$
- ⑤  $LDR \quad r0, [r1], \#4$   
 $\rightarrow r0 = \text{mem}[r1]$ ,  $r1 = r1 + 4$   
 Read word from memory  $r1$  to r0, then update  $r1$  to  $r1$  offset (post indexing)
- ⑥  $STR \quad r0, [r1], \#12$   
 $\rightarrow \text{mem}[r1] = r0$ ,  $r1 = r1 + 12$
- ⑦  $LD RSH \quad r0, [r1]$   
 $\rightarrow$  Read and signed extended half word from memory  $r1$  to r0  
 $r0 = \text{mem}16[r1]$
- ⑧  $STRB \quad r0, [r1]$   
 $\rightarrow$  store byte to memory  $r1$   
 $\text{mem}8[r1] = r0$

Q.4. Consider a 8 stage pipeline with stage delays 15, 22, 26, 14, 19, 25, 12, 18 ns. It is required to process 8000 data items in the pipeline. What is the minimum time required to process the data items? If the latch delay is to be considered as 6ns for each stage what is the maximum clock frequency with which the pipeline can operate.

$$\rightarrow \text{Max. delay of stage} = T_m = 26 \text{ ms}$$

$$\begin{aligned}\tau (\text{clock period of pipeline}) &= T_m + \tau_l \text{ (latch delay)} \\ &= 26 \text{ ms} + 6 \text{ ns} \\ &= 32 \text{ ns}\end{aligned}$$

$$N (\text{data items}) = 8000$$

$$K = \text{No. of stages of pipeline} = 8$$

$$\begin{aligned}\text{Time require to process data} &= [(K-1) + N] \tau \\ &= [7 + 8000] \times 32 \text{ ns} \\ &= 2.56 \times 10^{-4} \\ &= 256.22 \text{ ms}\end{aligned}$$

$$\text{Maximum clock frequency} = \frac{1}{\tau} = \frac{1}{32 \text{ ms}}$$

$$= 31.25 \text{ MHz}$$

B) If the memory location 1000, 1004, 1008 contains the data 15, 18, 24 and r1 is initialized to 1000. The contents of register r2 after execution of following code segment.

$\rightarrow$  A) LDR r5, [r1]

$r5 \leftarrow r15$ ; r5 is loaded with 18 which is present in memory location pointed by [r1].

$$r5 = 15$$

→ LDR r6, [r1, #4]

$$r6 \leftarrow [r1+4] ; r6 = \text{mem}[r1+4]$$

$$= \text{mem}[1004]$$

$$r6 = 18$$

→ LDR r7, [r1, #8]

$$r7 = \text{mem}[r1+8]$$

$$= \text{mem}[1008]$$

$$r7 = 24$$

• ADD r2, r5, r6, LSL #3

$$\Rightarrow r2 = r5 + (r6 \ll 3)$$

$$r6 = 0000\ 0000\ 0001\ 0010$$

$$\text{LSL } \# 3 \Rightarrow$$

$$r6 = 0000\ 0000\ 1001\ 0000$$

$$= 144$$

$$r2 = 15 + 144 = 159$$

• SUB r2, r2, r7, LSR #2

$$r7 \rightarrow 0000\ 0000\ 0001\ 1000$$

$$\text{LSR } \# 12$$

$$r7 \rightarrow 0000\ 0000\ 0000\ 0110$$

$$r7 = 6$$

$$r2 = r2 - (r7 \gg 2)$$

$$= r2 - 6$$

$$= 159 - 6$$

$$r2 = 153$$