

# Information Technology Essentials — Lecture 04

Dr. Karim Lounis

Fall 2023



# Computer Architecture

Computers have evolved considerably in structure (architecture), size, and performance. At a high level, however, they all share three main components:

- **CPU.** The central processing unit (a.k.a. processor or microprocessor) is responsible for performing arithmetic and logical operations.
- **Central memory.** The component responsible for storing the programs to be executed by the computer, along with their input and output.
- **Peripherals.** A.k.a. I/O devices, these are components used to input information to, or output information from, a computer.

These three components are interconnected via communication lines on a circuit board known as **the motherboard**.

# Computer Architectures

The main three components of a computer:




These three components are interconnected via communication lines (the **system bus**) on a circuit board known as **the motherboard**.

# Von Neumann Architectural Model

A computer can be represented by the Von Neumann architectural model:



# Computer Architecture (Recap)

Keep the following in mind:

- A computer uses a **CPU** to perform arithmetic and logical operations. Executing programs is all about performing a combination of these operations. These basic operations form the building blocks of more complex operations (i.e., computations).
- A computer uses the **central memory** to temporarily store the programs that are to be executed by the CPU as well as any input and output.
- A computer uses **input peripherals** to obtain input from the user of the computer and **output peripherals** to output results to the user.
- The **motherboard**, through the **system bus**, connects the three components so that they can communicate.
- The **Von Neumann** architectural model is used to abstractly represent modern general-purpose computers.

# Computer Architecture



**Let's dive into the details**

## I. Central Processing Unit

# CPU (Processors)

A **processor** (or **microprocessor**) is a complex electronic circuit designed to execute machine instructions at very high speed.

Processors are characterized by their **frequency** (in MHz or GHz), **internal memory** (cache and registers), **word size** (8 to 64 bits), and **number of cores**.

A processor is connected to the motherboard through the **CPU socket**. The CPU requires cooling, typically a fan, to keep its temperature down.

Another feature is **IPC**, or **Instructions Per Cycle**. It measures the average number of instructions that the CPU can execute per **clock cycle**.
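As a rough illustration of how frequency and IPC combine, the sketch below multiplies the two to estimate average instruction throughput. The function name and the sample figures are hypothetical, chosen only for illustration; real throughput depends on the workload.

```python
def instructions_per_second(frequency_hz: float, ipc: float) -> float:
    """Estimate average throughput as clock frequency times IPC."""
    return frequency_hz * ipc


# Hypothetical figures: a 2.8 GHz clock with an average IPC of 1 and of 4.
one_per_cycle = instructions_per_second(2.8e9, 1.0)
four_per_cycle = instructions_per_second(2.8e9, 4.0)

print(f"IPC = 1: {one_per_cycle:.2e} instructions/s")
print(f"IPC = 4: {four_per_cycle:.2e} instructions/s")
```

At the same frequency, a CPU with a higher IPC completes more instructions per second, which is why frequency alone is not a reliable performance measure.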

Structurally, a **CPU** consists of:

- Arithmetic Logic Unit (ALU): performs arithmetic and logical operations.
- Control Unit (CU): fetches and decodes instructions.
- Registers: small, fast storage locations (one word in length) used during instruction execution. Some are general-purpose (e.g., AX, BX, CX, DX), whereas others are special-purpose (e.g., PC, SP, BP, MAR, MBR).
- Cache memory: small, fast memory organized in levels L1, L2, and L3.
- Communication buses: the address, data, and control buses.

# Processor

Modern CPUs include multiple computing cores on a single chip and are called multicore processors, e.g., the Intel Pentium D (2005, dual-core), Intel Core i3, i5, and i9, AMD Ryzen 3, and Apple Silicon M1/M2.



Multicore CPUs have an ALU and a CU for each core. On some architectures each core has its own dedicated caches (L1, L2, and L3), whereas on others the cores share the third-level (L3) cache.

# Processors

On Windows PCs you can see the computer's CPU features by browsing the computer's properties:

In the Windows Settings application, the **System > About** page lists the following under "Device specifications":

| Field         | Value                                                   |
|---------------|---------------------------------------------------------|
| Device name   | DESKTOP-2O6JU7I                                         |
| Processor     | 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz 2.80 GHz |
| Installed RAM | 8.00 GB (7.71 GB usable)                                |
| Device ID     | EE19371D-FEAE-4ADD-962C-E31686947EB8                    |
| Product ID    | 00325-96770-04284-AAOEM                                 |
| System type   | 64-bit operating system, x64-based processor            |
| Pen and touch | Pen and touch support with 10 touch points              |

Under "Windows specifications", the same page lists:

| Field        | Value           |
|--------------|-----------------|
| Edition      | Windows 11 Home |
| Version      | 21H2            |
| Installed on | 9/9/2021        |
| OS build     | 22000.194       |

# Processors

On GNU/Linux you can see the computer's CPU features by executing the command *lscpu*:

```
root@Human-Device:~# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:  0-3
Thread(s) per core:   2
Core(s) per socket:   2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 42
Model name:            Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz
Stepping:               7
CPU MHz:               798.231
CPU max MHz:           2900.0000
CPU min MHz:           800.0000
BogoMIPS:              4589.81
Virtualization:        VT-x
L1d cache:              32K
L1i cache:              32K
L2 cache:               256K
L3 cache:               3072K
NUMA node0 CPU(s):     0-3
Flags:    fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
          pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc
          arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
          dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt
          tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi
          flexpriority ept vpid xsaveopt dtherm ida arat pln pts flush_l1d
```

# Processors

The output of *lscpu* on a different GNU/Linux machine, this time with an AMD processor:

```
Architecture:          x86_64
CPU op-mode(s):       32-bit, 64-bit
Byte Order:           Little Endian
CPU(s):               8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):            1
NUMA node(s):         1
Vendor ID:            AuthenticAMD
CPU family:           21
Model:                2
Model name:           AMD FX(tm)-8350 Eight-Core Processor
Stepping:              0
CPU MHz:              1400.000
CPU max MHz:          4000.0000
CPU min MHz:          1400.0000
BogoMIPS:              8000.05
Virtualization:        AMD-V
L1d cache:             16K
L1i cache:             64K
L2 cache:              2048K
L3 cache:              8192K
NUMA node0 CPU(s):    0-7
Flags:    fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
          cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
          rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni
          pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm
          cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop
          skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb
          hw_pstate ssbd vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean
          flushbyasid decodeassists pausefilter pfthreshold
```
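In both listings, the `CPU(s)` line is the product of `Socket(s)`, `Core(s) per socket`, and `Thread(s) per core`. The short sketch below checks this arithmetic with the values shown above; the function name `logical_cpus` is a hypothetical helper, not part of any tool.

```python
import os


def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Number of logical CPUs, as reported on the CPU(s) line of lscpu."""
    return sockets * cores_per_socket * threads_per_core


# Values copied from the two lscpu listings above.
intel_i5_2410m = logical_cpus(sockets=1, cores_per_socket=2, threads_per_core=2)
amd_fx_8350 = logical_cpus(sockets=1, cores_per_socket=4, threads_per_core=2)

print("Intel i5-2410M logical CPUs:", intel_i5_2410m)
print("AMD FX-8350 logical CPUs:", amd_fx_8350)

# os.cpu_count() reports the logical CPU count of the machine running this script.
print("This machine:", os.cpu_count())
```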

# Processor Instruction Set

**Processor instruction set:** the set of machine instructions that a given processor is able to decode and execute (its supported instructions).

Two main classes: CISC and RISC.

- **Complex Instruction Set Computer (CISC).** A computer architecture in which a single instruction can execute several low-level operations.

  Examples: the PDP-11 and VAX architectures, the Motorola 6800, 6809, and 68000 families, the Intel 8080, iAPX 432, and x86 family, the Zilog Z80, Z8, and Z8000 families, the Intel 8051 family, and others.

- **Reduced Instruction Set Computer (RISC).** A computer architecture in which each instruction executes a single low-level operation.

  Examples: the ARC processor, the DEC Alpha, the AMD Am29000, the ARM architecture, the Atmel AVR, Blackfin, Intel i860, Intel i960, LoongArch, Motorola 88000, the MIPS architecture, PA-RISC, the Power ISA, RISC-V, SuperH, and SPARC.



# Processor Instruction Set

- **Complex Instruction Set Computer (CISC).** A computer architecture in which a single instruction can execute several low-level operations.
  - Requires fewer instructions for a given program.
  - Microprogramming is easy to implement.
  - CISC processors are larger, as they contain more transistors.
  - A single instruction may take multiple clock cycles, decreasing efficiency.
- **Reduced Instruction Set Computer (RISC).** A computer architecture in which each instruction executes a single low-level operation.
  - Requires more instructions for a given program.
  - Often offers greater performance thanks to the simplified instruction set.
  - Used in supercomputers, such as Fugaku.
  - Less expensive, as they use smaller chips.

# Processor Instruction Set

Nowadays, the comparison is usually framed as x86 versus ARM processors rather than CISC versus RISC. This is because most processors in use today are either x86-based (Intel and AMD) or ARM-based.

# Processor Instruction Set

The following are assembly instructions for the CISC x86 architecture:

Mov Ax, 0x1B25 //Moves the value 0x1B25 into the register Ax.

Mov Al, 0x25 //Moves the value 0x25 into Al, the lower byte of Ax.

Inc Ax //Increments the value in the register Ax by 1.

Dec Ax //Decrements the value in the register Ax by 1.

Push Ax //Pushes the content of Ax onto the stack (SP is first decremented by 2).

Pop Ax //Pops the word pointed to by SP off the stack into Ax (SP is then incremented by 2).

Jmp T //Jumps to the instruction labeled T.

Cmp Ax, 1 //Compares the value stored in the register Ax with 1.

Je T //Jumps to T if the previously compared values were equal.
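To make the effect of these instructions on a 16-bit register and on the stack more concrete, here is a minimal Python sketch. It is a simplified model, not an emulator: the helper names are hypothetical, memory is a flat 64 KiB array, and segment registers are ignored.

```python
def set_low_byte(reg: int, byte: int) -> int:
    """Model of `Mov Al, byte`: replace the lower 8 bits of a 16-bit register."""
    return (reg & 0xFF00) | (byte & 0xFF)


def inc16(reg: int) -> int:
    """Model of `Inc Ax`: add 1 and wrap around at 16 bits."""
    return (reg + 1) & 0xFFFF


def dec16(reg: int) -> int:
    """Model of `Dec Ax`: subtract 1 and wrap around at 16 bits."""
    return (reg - 1) & 0xFFFF


def push16(memory: bytearray, sp: int, value: int) -> int:
    """Model of `Push Ax`: decrement SP by 2, then store the word (little endian)."""
    sp = (sp - 2) & 0xFFFF
    memory[sp] = value & 0xFF
    memory[(sp + 1) & 0xFFFF] = (value >> 8) & 0xFF
    return sp


def pop16(memory: bytearray, sp: int) -> tuple[int, int]:
    """Model of `Pop Ax`: read the word at SP, then increment SP by 2."""
    value = memory[sp] | (memory[(sp + 1) & 0xFFFF] << 8)
    return value, (sp + 2) & 0xFFFF


memory = bytearray(0x10000)   # flat 64 KiB memory (assumption for this sketch)
sp = 0x0100                   # arbitrary initial stack pointer

ax = 0x1B25                   # Mov Ax, 0x1B25
ax = set_low_byte(ax, 0x7F)   # Mov Al, 0x7F  -> Ax becomes 0x1B7F
sp = push16(memory, sp, ax)   # Push Ax
ax = inc16(ax)                # Inc Ax
ax, sp = pop16(memory, sp)    # Pop Ax restores the pushed value
print(hex(ax), hex(sp))
```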

# Processor Instruction Set

More instructions:

Jle T //Jumps to T if, in the preceding comparison, the 1st operand was less than or equal to the 2nd.

Loop T //Decrements Cx and jumps to T if Cx ≠ 0.

Xor Ax, Bx //Computes the bitwise XOR of Ax and Bx and stores the result in Ax.

Hlt //Halts the processor, ending the program.

RISC-based processors have a different instruction set. E.g., in ARM assembly:

- LDR R0, =0x200
- ADD R1, R2, #10
- CMP R0, R1
- MOV R0, #10
- STR R2, [R0]
- BEQ T
- LDR R1, [R0]
- T Loop
- SUBS R0, R0, #1

In what follows, most examples are given in CISC x86 assembly language (i.e., they use the x86 instruction set) on a 16-bit computer.

# Processor Instruction Set

## Definition

**Machine code:** a sequence of bytes (e.g., 01010101 01010111 ... 01010100) that can be decoded and executed directly by the CPU it targets.

Each assembly-language instruction is translated into machine code by an assembler program (e.g., the Netwide Assembler, a.k.a. nasm).

Each machine-code instruction is generally composed of an **operation code** (opcode) and one or more **operands**:

```
MOV Ax, [0xF565] | JMP Label  
INC Ax | ADD Ax, Bx | MOV Ax, Bx
```

Some instructions consist only of an **operation code**:

```
NOP | HLT | RET | CLI | STI
```

Instructions may have different sizes, e.g., from 1 byte to 15 bytes on Intel x86 processors.
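The opcode/operand structure can be sketched in a few lines of Python. The byte values below are the standard 8086 encodings for these particular instructions; the two function names are hypothetical, and this is nowhere near a full assembler.

```python
# One-byte instructions: the opcode is the whole instruction.
OPCODE_ONLY = {"NOP": 0x90, "HLT": 0xF4, "RET": 0xC3, "CLI": 0xFA, "STI": 0xFB}


def encode_opcode_only(mnemonic: str) -> bytes:
    """Encode an instruction that consists of an opcode and nothing else."""
    return bytes([OPCODE_ONLY[mnemonic.upper()]])


def encode_mov_ax_imm16(value: int) -> bytes:
    """Encode `MOV AX, imm16`: opcode 0xB8 followed by the 16-bit operand.

    The operand is stored little endian (low byte first), as on x86.
    """
    return bytes([0xB8, value & 0xFF, (value >> 8) & 0xFF])


print(encode_opcode_only("HLT").hex())        # 1-byte instruction
print(encode_mov_ax_imm16(0x1B25).hex())      # 3-byte instruction
```

The two results have different lengths (1 byte and 3 bytes), which illustrates why x86 instructions are said to be variable-length.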

# Processor Instruction Set (E.g., Intel 8086 microprocessor)

Consider the following x86-assembly program (Result stored in Ax):

```
Start:
    MOV Ax, 0x0001
    MOV Cx, 0x0001
T:
    MUL Cx          ; Ax = Ax * Cx (the high word of the product goes to Dx)
    INC Cx
    CMP Cx, 0x0005
    JBE T           ; Jump if Cx is below or equal to 5
    HLT
```

Try it at: <https://yjdoc2.github.io/8086-emulator-web/compile>



At the end of this program, the register Ax contains the value 0x0078 (i.e., 120 = 5!).
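The result can be checked by tracing the loop by hand or with a short Python model of the program. This is a sketch under simplifying assumptions: registers are plain integers masked to 16 bits, and only the low word of each product is kept, as Ax would hold it.

```python
def run_program() -> int:
    """Mirror the assembly loop: multiply Ax by Cx for Cx = 1..5."""
    ax = 0x0001                      # MOV Ax, 0x0001
    cx = 0x0001                      # MOV Cx, 0x0001
    while True:
        ax = (ax * cx) & 0xFFFF      # multiply step; low 16 bits stay in Ax
        cx = (cx + 1) & 0xFFFF       # INC Cx
        if not cx <= 0x0005:         # CMP Cx, 0x0005 / JBE T
            break
    return ax                        # HLT


result = run_program()
print(hex(result), result)
```

Ax takes the values 1, 2, 6, 24, and 120 across the five iterations, so the program computes 5!.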

# Processor Instruction Set (Instruction Per Clock Cycle)

**Instructions per cycle (IPC)** refers to the average number of instructions that a CPU can execute per clock cycle (CC).

- Early processors (sequential CPUs) performed one machine cycle per clock cycle, i.e., around three or more clock cycles per simple instruction.
- Later processors (pipelined CPUs) overlap the machine cycles of several instructions within the same clock cycle, e.g., executing I1 while decoding I2 and fetching I3.
- Modern processors (superscalar CPUs) can execute multiple instructions per clock cycle. They use ILP (Instruction-Level Parallelism).

RISC processors run most of their instructions in one clock cycle.

Certain (complex) instructions may take several clock cycles to execute (e.g., CALL takes 18 clock cycles on an MCS-51). Likewise, reading 16-bit data from RAM over an 8-bit data bus may take several cycles.
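The gap between sequential and pipelined execution can be estimated with a simple cycle count. The sketch below assumes an idealized pipeline in which every stage takes exactly one clock cycle and there are no stalls; the function names are hypothetical.

```python
def sequential_cycles(instructions: int, stages: int) -> int:
    """Each instruction passes through all stages before the next one starts."""
    return instructions * stages


def pipelined_cycles(instructions: int, stages: int) -> int:
    """Ideal pipeline: fill it once, then one instruction completes per cycle."""
    if instructions == 0:
        return 0
    return stages + instructions - 1


n, k = 100, 3  # 100 instructions, 3 stages (fetch, decode, execute)
seq = sequential_cycles(n, k)
pipe = pipelined_cycles(n, k)
print("sequential:", seq, "cycles, IPC =", n / seq)
print("pipelined: ", pipe, "cycles, IPC =", n / pipe)
```

Under these assumptions the sequential CPU achieves an IPC of about 1/3, while the pipelined CPU approaches an IPC of 1 as the number of instructions grows. A superscalar CPU goes beyond 1 by starting several instructions in the same cycle.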

# Single-Processor Systems

In a single-processor system, there is only one CPU (Central Processing Unit) capable of executing a **general-purpose** instruction set.



Figure: Motherboard of a single-processor computer

- Such systems usually also have other **special-purpose** processors, such as device-specific processors (or controllers) or, on mainframes, additional more **general-purpose** processors.
- A system with $n$ **special-purpose** processors and one **general-purpose** processor is still a single-processor system.

# Single-Processor Systems

## Graphics Processing Unit

The graphics processing unit (GPU) is a special-purpose processor used for graphics and other visual processing tasks.



- As GPUs are optimized for parallel processing, they are widely used for scientific simulations, cryptanalysis, and general-purpose AI applications.

# Single-Processor Systems

## Tensor Processing Unit

The tensor processing unit (TPU) is a special-purpose processor developed by Google to accelerate machine learning workloads, particularly those involving deep neural networks.



- TPUs are designed to deliver high-speed, low-latency computation for large-scale AI applications.

# Multiprocessor Systems

In a multiprocessor system, there are two or more CPUs capable of executing a **general-purpose** instruction set. They share the computer bus, memory, and peripheral devices, and sometimes the clock.



**Figure:** Motherboard of a multiprocessor computer

- Multiprocessor systems first appeared in servers, then migrated to desktops, laptops, and, more recently, smartphones and tablets.
- They are cheaper than multiple single-processor systems and provide better throughput, response time, and turnaround time, as well as higher reliability.

# Multiprocessor Systems

Modern CPUs include multiple computing cores on a single chip and are called multicore processors, e.g., the Intel Pentium D (2005, dual-core), Intel Core i3, i5, and i9, AMD Ryzen 3, and Apple Silicon M1/M2.



# Multiprocessor Systems

**Clustered systems** are composed of several computer systems (a.k.a. hosts) connected via a local area network. Each of these computer systems can be a single-processor system or a multicore system.



- Clusters emerged thanks to low-cost microprocessors, high-speed networks, and software for high-performance distributed computing.
- Clusters are used to share storage and provide high availability.
- They can also provide HPC (High-Performance Computing) environments.

- End.