

## 程序代写代做 CS 编程辅导 Computer Architecture and Low Level



gramming

Vasilios Kelefouras

WeChat: cstutorcs

Email: v.kelefouras@plymouth.ac.uk

Assignment Project Exam Help  
Website:

<https://www.tutorcs.com>

QQ: 749389476

<https://tutorcs.com>

# Introduction

## 程序代写代做 CS编程辅导

2

### Outline



- Superscalar processors
- Superpipelining processors
- In order and out of order processors
- RISC, CISC, VLIW and EPIC processors
- Moore's Law

WeChat: cstutorcs

Assignment Project Exam Help

Assignment Project Exam Help

Email: tutorcs@163.com

QQ: 749389476

<https://tutorcs.com>

# Superscalar Processors

程序代写代做 CS编程辅导

3

- You have been taught CPU pipelining but performance is never enough
- Any other options?



WeChat: cstutorcs

- Why not make the pipeline deeper and deeper?  
**Assignment Project Exam Help**
  - By adding more pipeline stages the clock cycle is reduced  
**Email: tutorcs@163.com**
  - But beyond some point, adding more pipe stages doesn't help, because control/decoupling becomes more complex, and become costlier  
**QQ: 1749389476**

<https://tutorcs.com>

# Superscalar Processors

程序代写代做 CS编程辅导



- A superscalar processor can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the chip.
- Duplicates the pipeline to accommodate **Instruction Level Parallelism (ILP)**  
WeChat: cstutorcs
  - Note that duplicating HW in just one pipe stage doesn't help, e.g., when having 2 ALUs, the bottleneck moves to other stages

Email: tutorcs@163.com

- It therefore achieves more throughput (the number of instructions that can be executed in a unit of time) than would otherwise be possible at a given clock rate
- Superscalar machines issue a variable number of instructions each clock cycle, up to some maximum
  - instructions must be independent – no data or control dependencies

# Pipeline vs 2 way Superscalar (pipelined)

程序代写代做 CS编程辅导

5

|              | 1  | 2  | 3  | 4   | 5  | 6   | 7   | 8  |
|--------------|----|----|----|-----|----|-----|-----|----|
| Instruction1 | IF | ID | EX | WB  |    |     |     |    |
| Instruction2 |    | IF | ID | MEM | WB |     |     |    |
| Instruction3 |    |    | IF | ID  | EX | MEM | WB  |    |
| Instruction4 |    |    |    | IF  | ID | EX  | MEM | WB |

WeChat: cstutorcs

Assignment Project Exam Help

|              |    |    |    |     |     |                        |  |  |
|--------------|----|----|----|-----|-----|------------------------|--|--|
| Instruction1 | IF | ID | EX | MEM | WB  |                        |  |  |
| Instruction2 | IF | ID | EX | MEM | WB  | Email: tutorcs@163.com |  |  |
| Instruction3 |    | IF | ID | EX  | MEM | WB                     |  |  |
| Instruction4 |    | IF | ID | EX  | MEM | WB                     |  |  |

<https://tutorcs.com>

Careful: If the instructions are interdependent, then superscalar does not improve performance at all

# Superscalar Hardware Support

程序代写代做 CS编程辅导

6



WeChat: cstutorcs

Assignment Project Exam Help

Email: tutorcs@163.com

QQ: 749389476

<https://tutorcs.com>

- Extra hardware to simultaneously fetch multiple instructions is needed
- Hardware logic to determine data dependencies is needed, involving the registers' values
- Extra hardware (functional units) to execute multiple instructions in parallel
- Extra hardware (functional units) to issue multiple instructions in parallel
- More extra hardware for more complicated notions (out of the scope of this module)
- Extra Performance does not come for free

# Think Pair Share

## 程序代写代做 CS编程辅导

7

Describe the pipeline stages of the code for a) a 5 stage pipelined processor and b) 5 stage superscalar pipelined processor. Compare the number of cycles



1    2    3    6    7    8    9

Instruction1    IF    ID    EX    MEM    WB

Instruction2    IF    stall    ID    EX    MEM    WB

Instruction3    stall    IF    ID    EX    MEM    WB

Instruction4    stall    IF    ID    EX    MEM    WB

WeChat: cstutorcs

Assignment Project Exam Help

Email: tutorcs@163.com

Instruction1    IF    ID    EX    MEM: 749389476    (1<sup>st</sup> pipe)

Instruction2    IF    stall    ID    EX    MEM    WB    (1<sup>st</sup> pipe)

Instruction3    IF    ID    EX    MEM    WB    (2<sup>nd</sup> pipe)

Instruction4    IF    ID    EX    MEM    WB    (2<sup>nd</sup> pipe)    IMUL R3 ← R1, R2

ADD R3 ← R3, R1  
IMUL R5 ← R6, R8

ADD R7 ← R6, R8

# Superscalar vs Superpipelining (1)

程序代写代做 CS 编程辅导

8

- **Superpipelining:** Vaguely deep pipelining, i.e., lots of stages
- **Superscalar issue complex:** super-pipelining
- How to compare them?
  - e.g., 2-way Superscalar vs. two stages



WeChat: cstutorcs

| IF | ID | EX | MEM | WB  | Assignment             | Project | Exam | Help |
|----|----|----|-----|-----|------------------------|---------|------|------|
| IF | ID | EX | MEM | WB  | Email: tutorcs@163.com |         |      |      |
|    | IF | ID | EX  | MEM |                        |         | WB   |      |
|    | IF | ID | EX  | MEM |                        |         | WB   |      |

QQ: 749389476

<https://tutorcs.com>

|     |     |     |     |     |     |     |    |     |     |     |
|-----|-----|-----|-----|-----|-----|-----|----|-----|-----|-----|
| IF1 | IF2 | ID1 | ID2 | EX1 | EX2 | M1  | M2 | WB1 | WB2 |     |
| IF1 | IF2 | ID1 | ID2 | EX1 | EX2 | M1  | M2 | WB1 | WB2 |     |
|     | IF1 | IF2 | ID1 | ID2 | EX1 | EX2 | M1 | M2  | WB1 | WB2 |
|     | IF1 | IF2 | ID1 | ID2 | EX1 | EX2 | M1 | M2  | WB1 | WB2 |

# Superscalar vs Superpipelining (2)

程序代写代做CS编程辅导



# Superscalar Problem

程序代写代做 CS编程辅导



- what if two successive instructions can't be executed in parallel?
  - Superscalar operation is double impacted by a stall
  
- Superscalar depends upon:
  - Instruction level parallelism (ILP)
  - Compiler based optimization

WeChat: cstutorcs  
Assignment Project Exam Help  
Email: tutorcs@163.com
  
- Limited by
  - Data dependencies
  - Control dependencies

QQ: 749389476  
<https://tutorcs.com>

# A Superscalar Processor

程序代写代做 CS编程辅导



# Is superscalar good enough?

程序代写代做 CS编程辅导

12

- A superscalar processor can **fetch, decode, execute more than one instructions in parallel**
- But...
  - Can execute only independent instructions in parallel
    - Whereas adjacent instructions are often dependent
  - **So the utilization of the second pipe is often low**
- **Solution: out-of-order execution**
  - Execute independent instructions in a different, more efficient order
  - A specific HW mechanism expresses a sliding window of consecutive instructions (**instruction window** – it is a small memory)
    - Ready instructions get picked up from window and executed out of order
    - Instructions enter (**dispatched**) and leave (**committed**) the instruction window in program order, and an instruction can only leave the window when it is the oldest instruction in the window and it has been completed



WeChat: cstutorcs

Email: tutorcs@163.com

QQ: 749389476

<https://tutorcs.com>

# Out of Order Processors (1)

程序代写代做 CS编程辅导

13

- The pipelines we have seen so far are statically scheduled and **inorder**, i.e., instructions are executed in the order they appear in the program's order



- If a hazard causes stall conditions, all instructions up to the offending instruction are stalled

- Forwarding, branch prediction, and other techniques can reduce the number of stall cycles we need, but sometimes a stall is unavoidable

WeChat: cstutorcs  
Assignment Project Exam Help

- Consider the following assembly code for a superscalar processor

QQ: 749389476

- The second instruction stalls the second pipe

<https://tutorcs.com>

- What about changing the order of the instructions?

- Careful: data dependencies must be preserved

IMUL R3 ← R1, R2

ADD R3 ← R3, R1

IMUL R5 ← R6, R8

ADD R7 ← R6, R8

IMUL R3 ← R1, R2

IMUL R5 ← R6, R8

ADD R7 ← R3, R5

ADD R3 ← R3, R1

# Data flow analysis

程序代写代做 CS编程辅导

14

S1:  $r1=r0/7$  //division  
S2:  $r8=r1+r2$   
S3:  $r5=r5+1$   
S4:  $r6=r6-r3$   
S5:  $r4=r5+r6$   
S6:  $r7=r8*r4$



cycles

In order execution:

|   |   |   |   |   |   |
|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|

WeChat: cstutorcs In order execution 2-way superscalar:

|                       |   |   |   |   |   |
|-----------------------|---|---|---|---|---|
| 1 <sup>st</sup> pipe: | 1 | 2 | 4 | 5 | 6 |
|-----------------------|---|---|---|---|---|

Assignment Project Exam Help

## Data Flow Graph



Email: tutorcs@163.com Out of order execution 2-way superscalar:

|                       |   |   |   |
|-----------------------|---|---|---|
| 1 <sup>st</sup> pipe: | 1 | 2 | 6 |
| 2 <sup>nd</sup> pipe: | 3 | 4 | 5 |

QQ: 749389476

<https://tutorcs.com>

# Out of Order Processors (2)

程序代写代做 CS编程辅导

15

- Idea: Move the dependent instructions out of the way of independent ones
  - Rest areas for dependent instructions: **Reservation station**
  - Monitor the source “values” of each instruction in the resting area
  - When all source “values” of an instruction are available, dispatch the instruction
- Allows independent instructions to execute and complete in the presence of a long latency operation



WeChat: cstutorcs

Assignment Project Exam Help

Email: tutorcs@163.com

QQ: 749389476

<https://tutorcs.com>

# Out of Order Processors (3)

程序代写代做 CS编程辅导

16

- **Instruction window:** It is a buffer that holds the instructions that have been fetched and decoded and waiting to be executed



- Note: Often, the instruction window doesn't actually exist as a single buffer, but is distributed among reservation stations

WeChat: cstutorcs

- **Enhanced issue logic.** The issue logic must be enhanced to issue (start) instructions out of order depending on their readiness to execute

Email: tutorcs@163.com

- **Reservation stations.** Each functional unit has a set of reservation stations associated with it. Each station contains information about instructions waiting or ready to issue.

QQ: 749389476

<https://tutorcs.com>

# Out of Order Execution (1)

程序代写代做 CS编程辅导

17

## Fetch, decode & dispatch in parallel

- Multiple instructions are fetched/decoded/dispatched in parallel
- Instructions are put in the instruction window - reservation stations (RS)



## Execute instructions (out of order) that are ready in the reservation stations – speculative execution

- Instruction operands must be ready
- Available execution resources

## After execution:

QQ: 749389476

- Broadcast result on bypass network
- Signal all dependent instructions that their data are ready

<https://tutorcs.com>

18

## Commit instructions in-order

- All execution within the instruction window is speculative (i.e., side-effects are not applied outside the CPU) until it is committed



# Out of Order Execution (2)

程序代写代做 CS编程辅导

18

## □ Advantages: Better performance

- ❑ Exploit Instruction Level Parallelism (ILP)
- ❑ Hide latencies (e.g., L1 cache miss, divide)



## □ Disadvantages: HW is much more complex than that of in-order processors

- ❑ More expensive, larger chip area, higher power consumption

Assignment Project Exam Help

Email: [tutorcs@163.com](mailto:tutorcs@163.com)

QQ: 749389476

<https://tutorcs.com>

- ❑ In a very limited way
- ❑ Compilers lack runtime information
  - Conditional branch direction
  - Data values, which may affect calculation time and control
  - Cache miss / hit

Store Buffer

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

Execute

Reorder Buffer

Reservation Stations

Fetch

Decode

Dispatch

Commit

# Out of Order Processors – the Pipeline (1)

程序代写代做 CS编程辅导

19

- Fetch
  - Branch prediction
- Decode
  - Register renaming (see next)
- Dispatch : new instructions are added to the instruction window - reservation stations (RS)
- Reservation stations (RS)
  - Instructions wait for their inputs
  - Instructions wait for their functional units
  - If instruction operands are ready they are sent to the FU
  - Otherwise, check on bypass network and wait for operands
- Functional units (FU)
  - ALUs, AGUs, FPUs



WeChat: cstutorcs  
Assignment Project Exam Help

Email: tutorcs@163.com

QQ: 749389476

<https://tutorcs.com>



# Out of Order Processors – the Pipeline (2)

程序代写代做 CS编程辅导

20

## □ Bypass network

- Broadcast computed values from reservation stations



## □ ReOrder buffer (ROB)

- It allows instructions to be committed in-order
- De-speculates execution, mostly by Committing Instructions in-order
- flushes the speculative instructions when a misprediction is discovered

WeChat: cstutorcs  
Assignment Project Exam Help

Email: tutorcs@163.com

QQ: 749389476

<https://tutorcs.com>



# Out of Order Processors – the Pipeline (3)

程序代写代做 CS 编程辅导

21

## Commit

- All execution within the instruction window is speculative (i.e., side-effects are applied outside the CPU) until it is committed
- Instructions can write to memory only once it is certain they should have been executed
- Instructions must not affect machine state while they are speculative
- Instructions enter and leave the instruction window in program order, and an instruction can only leave the window when it is the oldest instruction in the window and it has been completed



WeChat: cstutorcs

Assignment Project Exam Help

Email: tutorcs@163.com

QQ: 749389476

<https://tutorcs.com>



- The instruction window is instantiated as RS & ROB

# Out of Order Processors – the Pipeline (4)

程序代写代做 CS编程辅导

22

## Store Buffer:

- Stores are not executed Out-of-Order
  - Stores are never performed speculatively
  - There is no transparent way to undo them
- Stores are also never re-ordered among themselves
  - The Store Buffer dispatches a store only when
    - The store has both its address and its data ready and
    - There are no older stores waiting to dispatch



WeChat: cstutorcs

Assignment Project Exam Help

Email: tutorcs@163.com

QQ: 749389476

<https://tutorcs.com>

Store Buffer



# Data Dependencies and Register Renaming

程序代写代做 CS编程辅导

23

## □ Data dependencies resulting from different categories

- Read after Write (RAW) or true dependence
- Write after Read (WAR) or anti-dependence
- Write after Write (WAW) or output dependence



WeChat: cstutorcs

A:  $S1: PI=3.14;$

$S2: R=2;$

$S3: S=2 \times PI \times R$  //  $S3$  cannot be executed before  $S1, S2$  – true dependence

Assignment Project Exam Help

Email: [tutorcs@163.com](mailto:tutorcs@163.com)

B:  $S1: T1=R1;$  //  $S3$  cannot be executed before or in parallel with  $S1$  – anti-dependence

$S2: R2=PI-T1;$  // dependence. But it can be eliminated by applying register

$S3: R1=PI+S;$  // renaming

QQ: 749389476

<https://tutorcs.com>

$\downarrow$   
 $S1: T1=R1;$   
 $S2: R2=PI-T1;$   
 $S3: R3=PI+S;$

C:

$S1: T1=R1;$   
 $S2: T1=R2+5;$

$S1: T1=R1;$   
 $S2: T2=R2+5;$

WAW dependence is  
eliminated by applying  
register renaming

# Register Renaming

程序代写代做 CS编程辅导

24

- Register renaming is a technique that eliminates the false data dependencies
- Changes register names to avoid WAR/WAW hazards
- The elimination of these false data dependencies reveals more ILP
- **However, True dependencies cannot be eliminated**
- Register renaming is a new pipeline stage that allocates physical/hardware registers to instances of logical registers in the **decode stage**



WeChat: cstutorcs  
Assignment Project Exam Help  
Email: tutorcs@163.com  
QQ: 749389476  
<https://tutorcs.com>



# Register Renaming – example (1)

程序代写代做 CS编程辅导

25

D = (a+b) \* (a+b-c) //high

Before Register Renaming:

I1: load r1,a  
I2: load r2,b  
I3: r3=r1+r2  
I4: load r1,c  
I5: r2=r3-r1  
I6: r1=r3\*r2  
I7: st address,r1



Its assembly code

WeChat: cstutorcs

Draw its Data Flow Graph...

Assignment Project Exam Help

Email: tutorcs@163.com

QQ: 749389476

<https://tutorcs.com>

Draw its Data Flow Graph...

After Register Renaming:

I1: load r1,a  
I2: load r2,b  
I3: r3=r1+r2  
I4: load r4,c  
I5: r5=r3-r4  
I6: r6=r3\*r5  
I7: st address,r6

Now, speaking about performance, is it better?

# Register Renaming – example (2)

程序代写代做 CS编程辅导

26

After Register Renaming:

I1: load r1,a  
I2: load r2,b  
I3: r3=r1+r2  
I4: load r4,c  
I5: r5=r3-r4  
I6: r6=r3\*r5  
I7: st address,r6



4 pipeline stalls occur.

WeChat: cstutorcs

Assignment Project Exam Help

Solution: Reorder instructions as follows or equally, execute out of order

After OoO:

I1: load r1,a  
I2: load r2,b  
I4: load r4,c -> hiding the latency from I2 to I3..  
I3: r3=r1+r2  
I5: r5=r3-r4  
I6: r6=r3\*r5  
I7: st address,r6

QQ: 749389476

<https://tutorcs.com>

**Conclusion:** In this code, no stalls occur, since none of the load instructions is immediately followed by a dependent (arithmetic) instruction

# Out of Order Processors - Summary

程序代写代做 CS编程辅导

27

- Advantages

- Help exploit Instruction Level Parallelism (ILP)
- Help hide latencies (e.g., multiply, add, shift, divide)
- Superior/complementary to instruction Scheduler in the compiler
  - Dynamic instruction window



- Complex micro-architecture - hardware
  - Assignment Project Exam Help

- Complex instruction logic
- Requires reordering mechanism (retire/commit)
- Misprediction/speculation recovery

Email: tutorcs@163.com

QQ: 749389476

- Speculative Execution

- Advantage: larger scheduling window  $\rightarrow$  reveals more ILP

- Issues:

- Complex logic needed to recover from mis-prediction
- Runtime cost incurred when recovering from a mis-prediction

# Compare the computer architectures in terms of ISA

程序代写代做 CS编程辅导



WeChat: cstutorcs

Assignment Project Exam Help

Email: tutorcs@163.com

QQ: 749389476

<https://tutorcs.com>



# What is ISA (Instruction Set Architecture)

程序代写代做 CS 编程辅导

29

- It provides commands to the processor, to tell it what to do, e.g., add, load, store
- ISA is analogous to a human language
- Allows communication
  - Human language: person to person
  - ISA: software to hardware
- Need to speak the same language/ISA
  - Email: tutorcs@163.com
  - Many common aspects
    - Part of speech: verbs, nouns, adjectives, adverbs, etc.
    - Common operations: calculation, control/branch, memory
- Different computer processors may use almost the same instruction set



WeChat: cstutorcs

Assignment Project Exam Help

QQ: 749389476

<https://tutorcs.com>

# RISC vs CISC (1)

## 程序代写代做 CS编程辅导

30

- Complex instruction set computer (**CISC**): complex instructions can execute several low-level operations (such as a load from memory, an arithmetic operation, or a memory store)
  - CISC puts emphasis on HW
  - CISC was developed to make compiler development simpler
  - CISC is typically used for general purpose computers
- Reduced instruction set computer (**RISC**): simple and 1 cycle instructions
  - RISC puts emphasis on SW
  - RISC was developed to make HW simpler
  - RISC is typically used for smartphones, tablets and other embedded devices



WeChat: cstutorcs

Assignment Project Exam Help

Email: tutorcs@163.com

QQ: 749389476

Assignment Project Exam Help

<https://tutorph.com>, tablets and other  
embedded devices

# RISC vs CISC (An example)

程序代写代做 CS编程辅导

31

- Let's say we want to find the product of two numbers – (1,3) and (2,1). Then store the product back into Main Memory.



C code:  $A[1][3] = A[1][3] * A[2][1]$

1. CISC Approach: MULT \$(1,3), \$(2,1)

- Complex instruction
- HW will do most of the work

2. RISC Approach:

LOAD A, \$(1,3)

Registers

QQ: 749389476

LOAD B, \$(2,1)

<https://tutorcs.com>

MUL A, B

STORE \$(1,3), A

- 4 simple instructions

- The compiler has more work to do



# RISC vs CISC architecture comparison

程序代写代做 CS编程辅导

32

CISC

- 1. Instructions take varying amounts of cycle time (multiple cycles)
- 2. Instructions provide complex operations
- 3. Many different instructions
- 4. Less registers
- 5. Pipeline is difficult
- 6. Different length instructions
- 7. The Opcode has no fixed position and size



WeChat: cstutorcs

Assignment Project Exam Help

Email: tutorcs@163.com

6. All instructions have the same length(4 bytes)

QQ: 749389476

7. The Opcode has a fixed position and size within the instruction

<https://tutorcs.com>

# RISC vs CISC (2)

程序代写代做 CS编程辅导

33

- Opcode specifies the operation performed
  - RISC – fixed position
  - CISC – no fixed position and size



**RISC instruction format:**



Email: tutorcs@163.com

**CISC instruction format:**

QQ: 749389476



# RISC vs CISC - Pros & Cons

程序代写代做 CS 编程辅导

34

## CISC:

- Emphasis on HW
- Multi-clock complex instructions
- Less “instructions/program” with “complex” instructions
- Easy for assembly-level programmers, good code density
- Easy for compiler writers support more complex higher-level languages



## RISC:

- Emphasis on SW
- Single-clock simple instructions
- Less “cycles/instruction” with single-cycle instructions

WeChat: cstutorcs

Paster design and implementation - RISC takes about 1/5 the design time

The performance of the RISC processors highly depends mostly on the compiler or programmer

QQ: 749389476

<https://tutorcs.com>

Greater Complexity



Greater Complexity



# RISC vs CISC - Conclusions

程序代写代做 CS编程辅导

35

- Nowadays, the two architectures almost seem to have adopted the strategies of the other
  - Today's RISC chips
    - support as many instructions as yesterday's CISC chips
    - incorporate more complicated, CISC-like commands
    - make use of more complicated hardware, making use of extra function units for superscalar execution
  - Today's CISC chips
    - use many techniques formerly associated with RISC chips
    - are now able to execute more than one instructions within a single clock



WeChat: cstutorcs

Assignment Project Exam Help

Email: [tutors@163.com](mailto:tutors@163.com)

QQ: 749389476

<https://tutors.com>

<https://tutors.com>

# VLIW (Very Large Instruction Word) Processors (1)

程序代写代做 CS编程辅导

36

Bundle →  
256 bits



- Multiple function units execute operations in a large instruction concurrently
- A fixed number of operations is formatted as one big instruction (called a bundle)
- Fixed format so could decode operations in parallel
- VLIW CPUs use SW (the compiler) and not HW to decide which operations can run in parallel in advance
  - complexity of hardware is reduced substantially
- However, branch miss-prediction or cache misses may stall the processor
- Compiler does the work of the missing HW
- **Normally, they are used as digital signal processors (DSPs)**
- VLIW is a lot simpler than superscalar designs, but has not so far been commercially successful

# VLIW (Very Large Instruction Word) Processors (2)

程序代写代做 CS编程辅导

37

Bundle →  
256 bits



## A problem with traditional VLIW is code size

- Often it is simply not possible to completely utilize all processor execution units
- Thus many instructions contain NOPs in portions of the instruction word with a corresponding **increase in the size of the code**
- **Increased code size has obvious implications for the efficacy of caches and bus bandwidth**
- **Modern VLIWs deal with this problem in different ways**
  - One simple method is to offer several instruction templates, and allow the compiler to pick the most appropriate one – in this case, the one that utilizes the most bits of the instruction word
  - Another is to employ traditional data-compression techniques to the code

# VLIW vs (RISC & CISC)

程序代写代做 CS编程辅导

38

VLIW processors have the benefits over RISC & CISC



- Faster
- Lower power consumption
- Reduced HW complexity
- Cheaper
- Reduced design time

VLIW Drawbacks:

- No code compatibility between different generations

Large code size - Empty slots

are filled with NOPs

Assignment Project Exam Help

- Compilers are extremely

Email: tutorcs@163.com

□ Parallelism is underutilized for some algorithms – dependencies

introduce NOPs

QQ: 749389476

<https://tutorcs.com>

# EPIC (explicitly parallel instruction computer)

程序代写代做 CS编程辅导

39



- Very closely related to but not the same as VLIW
- Combines the best attributes of VLIW & superscalar RISC
- So far the only implementation of this architecture is as part of the IA-64 processor architecture in the Intel Itanium family of processors
- EPIC instruction word contains three 41-bit instructions and a 5-bit control field
- Applicable to general-purpose computers
- Compiler must find or create sufficient ILP
- Compiler gather instructions in groups
  - All the instructions in a group can be executed in parallel

# EPIC – Pros & Cons

程序代写代做 CS编程辅导

40



## ■ Benefits over VLIW

- Code size is reduced
- The same code can be executed on different processor implementations

WeChat: cstutorcs

Assignment Project Exam Help

Email: tutorcs@163.com

QQ: 749389476

Still large code size

<https://tutorcs.com>

## ■ Drawbacks over RISC & CISC

- It is not always possible to completely fill all slots in a bundle, and empty slots are filled with NOPs

- Poor compiler support can significantly impact the performance of EPIC code

# Think-Pair-Share 2<sup>nd</sup> Exercise

程序代写代做 CS编程辅导

41

**Question:** What is in your opinion the best Instruction Set Architecture (ISA) and why?



**Answer:** There is no good and bad ISA, but appropriate and not appropriate. The appropriate ISA depends on the target goals and on the target application.

**WeChat: cstutorcs**  
**Assignment Project Exam Help**

Email: tutorcs@163.com

QQ: 749389476

<https://tutorcs.com>

# Comparison of different processors

## (Brainiacs vs Speed-Demons) (1)

42



# Comparison of different processors (Brainiacs vs Speed-Demons) (2)

43

- “Brainiac designs”
  - Extra HW in order to support more Instruction Level Parallelism (ILP)
  - Millions of extra transistors
  - More design effort
  - Consume more power
- “Speed-Demons”
  - Run at higher clock speeds because they are simpler
  - Simple HW design
  - Less Chip area
  - Less power consumption



WeChat: cstutorcs

Assignment Project Exam Help

Email: tutorcs@163.com

QQ: 749389476

<https://tutorcs.com>

Which would you rather have: 4 powerful brainiac cores, or 8 simpler in-order cores? They use the same chip area

# The CPU frequency has ceased to grow

程序代写代做 CS编程辅导

44



# Moore's Law

程序代写代做 CS编程辅导

45

- How small can we make
- How densely can we pack
- No one can say for sure
- **Gordon Moore, co-founder of Intel, observed that the number of transistors per square inch on integrated circuits had doubled every year since the integrated circuit was invented**
- **Moore predicted that this trend would continue for the foreseeable future (Moore's law)**
- In subsequent years, the pace slowed down a bit, but data density has doubled approximately every 18 months, and this is the current definition of Moore's Law.
- Most experts, including Moore himself, expect Moore's Law to hold true until 2020-2025
- Using current technology, Moore's Law cannot hold forever
- There are **physical** and **financial limitations**
- Cost may be the ultimate constraint



WeChat: cstutors

Assignment Project Exam Help

Email: tutorcs@163.com

QQ: 749389476

<https://tutorcs.com>

# Moore's Law – The number of transistors on integrated circuit chips (1971-2016)

Moore's law describes the empirical regularity that the number of transistors on integrated circuits doubles approximately every two years. This advancement is important as other aspects of technological progress – such as processing speed or the price of electronic products – are strongly linked to Moore's law.



# Moore's Law Is STILL Going Strong

程序代写代做CS编程辅导

## Hardware performance potential *continues to grow*

"We think we can continue to follow Moore's Law for at least another 10 years."



Intel Senior Fellow Mark Bohr, 2015



WeChat: cstutorcs

Assignment Project Exam Help

Email: tutorcs@163.com

QQ: 749389476

<https://tutorcs.com>



# Conclusions

程序代写代做 CS编程辅导

48



- Computer architecture is complex and diverse
- There is a large number of different computer architecture classifications
- Classification helps us select the most appropriate architecture
- Computer architectures are evolving year by year
- The computer architecture requirements are continually increasing
- There is no good or bad computer architecture. It depends on the
  - ✓ target goals, e.g., performance, flexibility
  - ✓ target application

<https://tutorcs.com>

# Further Reading

程序代写代做 CS编程辅导

49



Structured Computer Organization, Sixth Edition, Andrew S. Tanenbaum, Todd Austin, PEARSON, 2012,  
<https://universalflowuniversity.com/Books/Computer%20Programming/Computers%2C%20Architecture%20and%20Design%20of%20Computer%20Organization%206th%20Edition.pdf>

WeChat: estutors

Assignment Project Exam Help

Computer Organization & Architecture. Designing for Performance. William Stallings, Seventh Edition, 2006, available at

Email: [tutorcs@163.com](mailto:tutorcs@163.com)

<https://inspirit.net.in/books/academic/Computer%20Organisation%20and%20Architecture%208e%20by%20William%20Stallings.pdf>

QQ: 749389476

Nicholas FitzRoy-Dale, The VLIW and EPIC processor architectures, available at  
<https://tutorcs.com>  
<https://www.cse.unsw.edu.au/~cs9244/06/seminars/02-nfd.pdf>

# 程序代写代做 CS编程辅导



Thank you  
WeChat: cstutorcs

Assignment Project Exam Help

Email: tutorcs@163.com

QQ: 749389476

<https://tutorcs.com>

Date

21/10/2019

School of Computing  
(University of Plymouth)