

Iomm  
use multimemory. SMPs.

CLASSMATE

Date 12.2.2020  
Page 1

## SHARED MEMORY MULTIPROCESSORS

~~SMPS~~ **Symmetric** (shared memory multiprocessor)

Are referred to as **SMPs**. centralized shared memory multiprocessors.

aka (multimemory system).

most existing multicore are symmetric processors

these SMPs are referred to as **uniform memory access**.

multiple multicore chips, we have got separate memory for each multi-core chip. memory is distributed. each multicore chip has its own memory.

(NUMA)

we have **distributed memory** rather than centralized memory, distributed shared memory multiprocessors. these are referred to as **non-uniform memory access**. power is referred to as **NUCA**.

challenges of 4 processors.

① in the shared system, and any distributed system. (SMP and DSM) the address space is shared. challenges of parallel processing. ② there could be limited parallelism available.

③ relatively high cost of parallelism, communication.

## CENTRALIZED & SHARED MEMORY ARCHITECTURE

④ use of multilevel caches can substantially reduce the memory bandwidth demands of a processor.

backside or memory bus to distinguish it from the bus used to do I/O operations. In this case, access to memory is **asymmetric**.

## MULTIPROCESSOR CACHE COHERENCE

Consider a cache of shared data. A view of memory by 2 different processors through their individual caches.

How will a processor come to know which processor is caching the same variable?

(AMH)

(which processor is caching the same variable).

Consistency of memory system is coherent, i.e. if any read operation of the data item, returns the most recently written value of that data item.

2 different aspects of memory system behaviour.

① coherence

② consistency

Coherence  $\Rightarrow$  what value can be written returned by a read.

Consistency  $\Rightarrow$  determines when a written value will be returned by a read.

initialization and loading location of memory is done through address bus.

access is done through data bus.

a memory system is coherent if

- ① a read of processor p to location  $x$  that follows a write by p to  $x$ , with no writes of  $x$  by another processor occurring between the write and the read by p always returns the value written by p.
- ② a read by a processor to location  $x$  that follows a write by another processor to  $x$  returns the written value if the read and write are sufficiently separated in time, and no other writes to  $x$  occur between the two accesses.
- ③ writes to the same location are serialized; that is 2 writes to the same location by any 2 processors are seen in the same order by all processors.

the issue of exactly when a written value must be seen by a reader is defined by a memory consistency model.

Coherence and consistency are complementary,

coherence defines the behaviour of reads & writes to the same memory location,

while consistency defines the behaviour of reads and writes to accesses to other memory locations.

## ASSUMPTIONS

- ① a write does not complete until all processors have seen the effect of that write.
- ② the processor does not change the order of any write wrt. any other memory access.

## THE BASIC SCHEMES OF ENFORCING COHERENCY

the coherence problem for multiprocessors and IO.

- 2) IO : multiple copies are a rare event.
- 3) a program running on multiple processors will normally have copies of the same data in several caches.

In a coherent multiprocessor, the cache provides both migration & replication of shared data items.

Multiprocessors adopt a hardware solution by introducing a protocol to maintain coherent caches.

We have 2 classes of protocols.

- ① directory-based (symmetric & distributed)
  - (multiple processes distributed to multiple directories)
- ② snooping.

(24) write invalidate protocol

(25) write update / write broadcast protocol.

snooping  $\rightarrow$  caches will monitor bus activity.

request

protector

State of the address

Cache block.

request

source, processor

type of

function expl.

Cachearc.

① read hit processor Shared or normal hit read data in local cache.  
modified.

② read miss processor invalid shared or normal miss place read miss on bus.

③ read miss processor shared, replacement address conflict miss. place read from memory words write on board miss on bus.

④ read miss processor modified, replacement address conflict miss. write back blk., then place read miss on bus.

⑤ write hit processor modified normal hit write data in local cache.

⑥ write hit processor shared coherence place invalidate on bus. These operations are often called, update on ownership misses, because they don't fetch the data, but only change.

⑦ antemiss processor invalid normal miss place write miss on bus.

⑧ write miss processor shared replacement address conflict miss place write miss on bus.

⑨ write miss processor modified replacement address conflict miss write back block, then place write miss on bus.

⑩ read miss bus shared no action allow shared cache on memory to service the readmiss.

⑪ readmiss bus modified (problem) coherence attempt to read shared data, so place cache block on bus, write back block, and change state to shared.

③ invalidate bus shared coherence attempt to write shared block invalidate the cache block.

④ write miss bus shared coherence attempt to write shared block, invalidate the cache block.

⑤ write miss bus modified coherence attempt to write block that is exclusive elsewhere, writeback the cache block and make its state invalid in the local cache.

a write invalidate cache coherence protocol for a private writeback cache (not write through), showing the states and the state transition for each block in the cache.  
( $2 - 4$  words).

Cache State  
transitions  
based on requests  
from CPU.

3 states

Requests by the processor.



write miss for this b/c.



Cache state transitions based on request from the bus.

The protocol assumes, operations are atomic. non-atomic operations actions introduce the possibility that the protocol can deadlock.

MSI - modified shared invalid.

- ① MEST - extension
- ② MESI - optimized certain behaviour and possibly improve performance.
- ③ intel i7 processor has got MESIF foreword.

④ MOESI      Owned.      AMD opteron.

## DIRECTORY BASED CACHE COHERENCE PROTOCOL.

two primary ops., handling a read miss, and write to a shared clean cache block. Handling a write miss to a block that is currently shared is a simple combination of these 2.)

a directory must track the state of each block - shared. one or more nodes have the block cached. and the value in memory is up to date., as well as in all caches.

- Uncached : no node has a copy of the cache block.

modified exactly one node has a copy of the cache block and it has written the block., so the memory copy is out of date. this processor is called the owner of the block.

bit vectors → indicates all processors which are shared.

Similarities FSM

local node is the node where a request originates.

home node → the mem location & the directory entry of an address node.

the physical address space is statically distributed. the local node may also be the home node.

a remote node is the node that has a copy of the cache block whether exclusive or shared.

P = requesting node number; A = requested address  
D = data content.

message source dest. message function of msg,  
type. contents.

read local home P, A  
miss cache dir.

node P has read miss  
at A request data and  
P a read shared.

write miss local cache home P, A  
dir.

node P has a write miss  
at address A, request  
data and make P the  
exclusive owner.

invalidate local home A

request to send invalidate  
to all remote caches that  
are caching the block at  
address A.

invalidate home dir. remote A

invalidate the shared copy  
of data at address A.

fetch home dir. remote A

fetch the blk at address A  
and end it to its home dir;  
change the state of A in the  
remote Cache shared.

TA -

fetch home  
invalidate dir.remote  
cache

A

fetch the blk at address A and  
send it to its own directory,  
invalidate  
and change the state of block  
in the cache

data, home  
value dir.  
replylocal  
cache

D

return a data value from the  
home directory memory

data, remote  
wb, cachehome  
dir.

A,D

update the data value for address A.