

# Shared Memory Multiprocessors

## CSEN 3104

### 16/11/2019

Dr. Debranjan Sarkar

# Shared memory multiprocessors

- There are three main types of shared memory models for multiprocessors:
  - Uniform Memory Access (UMA)
  - Non-Uniform Memory Access (NUMA)
  - Cache Only Memory Access (COMA)
- Related variants include:
  - Cache-Coherent Non-Uniform Memory Access (CC-NUMA)
  - No Remote Memory Access (NORMA), which is strictly a distributed-memory (message-passing) model and is listed for comparison

# Uniform Memory Access (UMA)

- Physical memory is uniformly shared by all the processors [see figure]
- All processors have equal access time to all memory words (Uniform access)
- Each processor may use a private cache
- Peripherals are also shared in some fashion
- This is called a tightly coupled system because of the high degree of resource sharing
- The system interconnect may be a bus, a crossbar switch or a multistage network
- Suitable for general-purpose and time-sharing applications for multiple users
- May be used to speed up the execution of a single large program in time-critical applications
- Synchronization and communication among processors are done by using shared variables in the common memory
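To make the shared-variable style concrete, here is a minimal sketch that assumes Python threads as stand-ins for UMA processors: every thread sees the same `counter` in common memory, and a `threading.Lock` serializes each read-modify-write. The names `worker` and `run` are hypothetical, and because of CPython's global interpreter lock this illustrates the programming model rather than real parallel speedup.

```python
import threading

# Shared variables in "common memory": all threads see the same objects.
counter = 0
counter_lock = threading.Lock()

def worker(increments: int) -> None:
    """Increment the shared counter; the lock makes each update atomic."""
    global counter
    for _ in range(increments):
        with counter_lock:
            counter += 1

def run(num_threads: int = 4, increments: int = 10_000) -> int:
    """Start num_threads 'processors' on the shared counter and wait for them."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(increments,)) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

if __name__ == "__main__":
    print(run())  # 40000
```

Without the lock, concurrent `counter += 1` updates could be lost, which is exactly the synchronization problem shared variables must handle.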

# UMA Multiprocessor Model



# Uniform Memory Access (UMA)

## Salient features

- Physical memory uniformly shared
- Equal access time to all memory words (Uniform access)
- Peripherals are also shared
- Tightly coupled system
- Bus, crossbar switch or multistage network
- General-purpose and time-sharing applications
- Speed up execution of a single large program
- Synchronization and communication among processors
  - Shared variables

# Non Uniform Memory Access (NUMA)

- A shared memory system in which the access time varies with the location of the memory word (non-uniform access)
- The shared memory is physically distributed among all processors; each portion is called a local memory
- The collection of all local memories forms a global address space accessible by all processors [see figure (a)]
- Access to local memory by a local processor is fast
- Access to remote memory attached to other processors is comparatively slow because of the added delay through the communication network
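The effect of non-uniform access can be estimated as a weighted average of local and remote latencies. The sketch below assumes only two latency classes and uses made-up figures (100 ns local, 400 ns remote); `effective_access_time` is a hypothetical helper, not part of any library.

```python
def effective_access_time(local_fraction: float, t_local: float, t_remote: float) -> float:
    """Mean access time when `local_fraction` of references go to local memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must lie in [0, 1]")
    # Weighted average: local references pay t_local, the rest pay t_remote.
    return local_fraction * t_local + (1.0 - local_fraction) * t_remote

# Hypothetical latencies: 100 ns local, 400 ns remote.
print(effective_access_time(0.75, 100.0, 400.0))  # 175.0
```

Raising the local fraction pulls the average toward the local latency, which is why data placement matters on NUMA machines.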

NUMA model  
(shared local memories)



Figure (a)

# Non Uniform Memory Access (NUMA)

## Salient features

- Shared memory physically distributed to all processors
- Collection of all local memories forms a global address space
- Access time varies with the location of the memory word
- Access to local memory by a local processor is fast
- Access to remote memory attached to other processors is comparatively slow because of the added delay through the communication network

# Non Uniform Memory Access (NUMA)

- In addition to distributed memories, globally shared memory can also be added [see figure (b)]
- Here, local memory access is the fastest, access to the global shared memory is next, and access to remote memory is the slowest
- In the hierarchical cluster model shown in figure (b), the processors are divided into several clusters
- Each cluster may be UMA or NUMA multiprocessor
- The clusters are connected to global shared memory modules
- The entire system is considered a NUMA multiprocessor
- All processors belonging to the same cluster have uniform access to the cluster shared memory (CSM)
- All clusters have equal access to the global memory (GSM)
- The access time to the cluster memory is less than that to the global memory
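The three access classes of the hierarchical cluster model can be sketched as a lookup from (requesting cluster, target memory module) to a latency level. The module names `CSM<k>` and `GSM`, the enum `Access`, and the relative latencies 1, 3 and 5 are assumptions for illustration; only their ordering follows the text.

```python
from enum import Enum

class Access(Enum):
    # Values are hypothetical relative latencies; only the ordering matters.
    CLUSTER_LOCAL = 1  # CSM of the requesting processor's own cluster (fastest)
    GLOBAL = 3         # global shared memory, GSM (intermediate)
    REMOTE = 5         # CSM belonging to another cluster (slowest)

def classify(proc_cluster: int, target: str) -> Access:
    """Classify an access from a processor in `proc_cluster` to module `target`.

    `target` is "GSM" or "CSM<k>", naming cluster k's shared memory (hypothetical naming).
    """
    if target == "GSM":
        return Access.GLOBAL
    if target.startswith("CSM"):
        k = int(target[3:])
        return Access.CLUSTER_LOCAL if k == proc_cluster else Access.REMOTE
    raise ValueError(f"unknown memory module: {target}")

for module in ("CSM0", "GSM", "CSM2"):
    print(module, classify(0, module).name)
```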

## Hierarchical Cluster Model (NUMA)



Figure (b)

P: Processor

CSM: Cluster Shared Memory

GSM: Global Shared Memory

CIN: Cluster Interconnection Network

# Non Uniform Memory Access (NUMA)

## Salient features

- Globally shared memory, in addition to distributed memories
- Memory access
  - Local memory -> fastest
  - Global shared memory -> intermediate
  - Remote memory -> slowest
- Processors are divided into several clusters
- Each cluster may be UMA or NUMA (here UMA)
- Clusters connected to global shared memory modules
- All processors in a cluster have uniform access to cluster shared memory (CSM)
- All clusters have equal access to the global memory (GSM)
- The access time to the cluster memory is less than that to the global memory

## COMA Model



P : Processor

C : Cache

D : Directory

# Cache Only Memory Access (COMA)

- COMA model is a special case of NUMA model
- Here, distributed main memories are converted to caches [See Figure]
- There is no memory hierarchy in each processor node
- All the caches form a global address space
- Remote cache access is assisted by distributed cache directories (D)
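A toy model of how a COMA directory locates data might look like the following. It assumes (this is not stated above) that a block migrates into the requesting node's cache on a remote access, as many COMA designs do, and it ignores replication and replacement; `ComaSystem`, `Node` and their methods are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: int
    cache: dict = field(default_factory=dict)  # block address -> value; no main memory behind it

class ComaSystem:
    """Toy COMA: every block lives in some node's cache; a directory records which node."""

    def __init__(self, num_nodes: int):
        self.nodes = [Node(i) for i in range(num_nodes)]
        self.directory = {}  # block address -> id of the node currently holding it

    def store_initial(self, addr: int, value, node_id: int) -> None:
        self.nodes[node_id].cache[addr] = value
        self.directory[addr] = node_id

    def read(self, node_id: int, addr: int):
        """Return (value, was_local_hit)."""
        node = self.nodes[node_id]
        if addr in node.cache:
            return node.cache[addr], True
        owner = self.directory[addr]               # directory locates the remote copy
        value = self.nodes[owner].cache.pop(addr)  # fetch it from the remote cache
        node.cache[addr] = value                   # assumed: block migrates to the requester
        self.directory[addr] = node_id
        return value, False

coma = ComaSystem(2)
coma.store_initial(0x40, "x", node_id=1)
print(coma.read(0, 0x40))  # ('x', False): remote, found via the directory
print(coma.read(0, 0x40))  # ('x', True): now local after migration
```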

# No Remote Memory Access (NoRMA) Architecture

- A distributed memory multicomputer system consists of multiple computers (nodes)
- Nodes are inter-connected by message passing network [See Figure]
- Each node acts as an autonomous computer with a processor, a local memory and sometimes I/O devices (e.g., hard disks)
- Message passing network provides point-to-point static connections among the nodes
- All local memories are private and are accessible only to the local processors
- As there is no globally unique address space, a processor cannot access the memory of other nodes
- This is why traditional multicomputers are called no-remote-memory-access (NoRMA) machines
- Inter-node communication is carried out by passing messages through the static connection network
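The message-passing style can be sketched with threads standing in for nodes and one `queue.Queue` per node standing in for the network. This is a deliberately simplified toy: Python threads actually share an address space, so each node's `_local_memory` is private only by convention, and the partial-sum task, `MulticomputerNode` and `distributed_sum` are hypothetical.

```python
import queue
import threading

class MulticomputerNode(threading.Thread):
    """One node: private local memory plus an inbox on the message-passing network."""

    def __init__(self, node_id: int, inboxes: list, data: list):
        super().__init__()
        self.node_id = node_id
        self.inboxes = inboxes               # inboxes[k] holds messages addressed to node k
        self._local_memory = {"data": data}  # private by convention only
        self.result = None

    def send(self, dest: int, msg) -> None:
        self.inboxes[dest].put((self.node_id, msg))

    def recv(self):
        return self.inboxes[self.node_id].get()  # blocks until a message arrives

    def run(self) -> None:
        partial = sum(self._local_memory["data"])  # compute on local data only
        if self.node_id == 0:
            total = partial
            for _ in range(len(self.inboxes) - 1):  # one message from every other node
                _src, value = self.recv()
                total += value
            self.result = total
        else:
            self.send(0, partial)

def distributed_sum(chunks: list) -> int:
    inboxes = [queue.Queue() for _ in chunks]
    nodes = [MulticomputerNode(i, inboxes, chunk) for i, chunk in enumerate(chunks)]
    for n in nodes:
        n.start()
    for n in nodes:
        n.join()
    return nodes[0].result

print(distributed_sum([[1, 2], [3, 4], [5]]))  # 15
```

No node ever reads another node's `_local_memory`; all data crosses node boundaries as explicit messages.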

## NORMA Architecture



Generic model of a message-passing multicomputer

# No Remote Memory Access (NoRMA) Architecture

## Salient features

- A distributed memory multicomputer system
- Each node is an autonomous computer
- Nodes inter-connected by message passing network (MPN)
- MPN provides point-to-point static connections among the nodes
- Inter-node communication is carried out by passing messages through MPN
- Memories accessible only by local processors (No remote access)
- No globally unique address space

## Cache-coherent Non Uniform Memory Access (CC-NUMA)

- In a NUMA system with private caches, it is difficult to keep the caches coherent across the shared memory
- CC-NUMA uses inter-processor communication between cache controllers to keep a consistent memory image when more than one cache stores the same memory location
- When multiple processors attempt to access the same memory area in rapid succession, the performance of CC-NUMA may be poor
- To reduce the frequency of such accesses, the operating system allocates processors and memory in NUMA-friendly ways and avoids scheduling and locking algorithms that make NUMA-unfriendly accesses necessary
- Some cache coherence protocols (e.g., the MESIF protocol) attempt to reduce the communication required to maintain cache coherence
- The **figure** shows the CC-NUMA architecture for scalable shared-memory multiprocessors
- This allows fast access to data in local memory and slower access to data in remote memory
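One common NUMA-friendly placement policy is first-touch: a page is placed in the local memory of the node that first accesses it. The source does not name a specific policy, so this sketch swaps in first-touch as an example; the 4 KiB page size and the `first_touch_placement` helper are assumptions.

```python
from collections import Counter

PAGE_SIZE = 4096  # assumed page size in bytes

def first_touch_placement(accesses, page_size: int = PAGE_SIZE):
    """Simulate first-touch placement.

    accesses: iterable of (node_id, address) pairs in program order.
    Returns (home, stats): page -> home node, and counts of local/remote accesses.
    """
    home = {}
    stats = Counter()
    for node, addr in accesses:
        page = addr // page_size
        home.setdefault(page, node)  # the first node to touch a page becomes its home
        stats["local" if home[page] == node else "remote"] += 1
    return home, stats

trace = [(0, 0), (0, 100), (1, 4096), (1, 5000), (0, 4100)]
home, stats = first_touch_placement(trace)
print(home)         # {0: 0, 1: 1}
print(dict(stats))  # {'local': 4, 'remote': 1}
```

If each node initializes the data it later uses, most accesses end up local; the single remote access here is node 0 touching a page that node 1 claimed first.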

## CC-NUMA Shared Memory Multiprocessors



# Cache-coherent Non Uniform Memory Access (CC-NUMA): Salient features

- CC-NUMA maintains cache coherence across shared memory
- Uses inter-processor communication between cache controllers
- Allows fast access to data in local memory and slower access in a remote memory
- Blue dashed line -> path taken for local memory access
- Red dashed line -> path taken for remote memory access
- The cache memory controllers in all the nodes co-operate using directory techniques
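Directory techniques can be illustrated with a minimal three-state directory entry (Uncached, Shared, Exclusive) kept at each block's home node. This is a generic textbook-style scheme assumed for illustration, not MESIF and not the protocol of any particular machine; `DirState`, `DirectoryEntry` and its methods are hypothetical, and a write to a shared block is handled the same way as a write miss.

```python
from enum import Enum, auto

class DirState(Enum):
    UNCACHED = auto()   # no cache holds the block
    SHARED = auto()     # one or more caches hold read-only copies
    EXCLUSIVE = auto()  # exactly one cache holds a writable (possibly dirty) copy

class DirectoryEntry:
    """Per-block directory entry kept at the block's home node."""

    def __init__(self):
        self.state = DirState.UNCACHED
        self.sharers = set()  # ids of nodes whose caches hold the block
        self.messages = []    # coherence messages sent, recorded for illustration

    def read_miss(self, node: int) -> None:
        if self.state is DirState.EXCLUSIVE:
            (owner,) = self.sharers
            # Owner supplies the data (writing back if dirty) and keeps a shared copy.
            self.messages.append(f"fetch from owner {owner}")
        self.sharers.add(node)
        self.state = DirState.SHARED

    def write_miss(self, node: int) -> None:
        # Every other copy must be invalidated before the writer gets exclusive access.
        for other in sorted(self.sharers - {node}):
            self.messages.append(f"invalidate {other}")
        self.sharers = {node}
        self.state = DirState.EXCLUSIVE

entry = DirectoryEntry()
entry.read_miss(0)
entry.read_miss(1)
entry.write_miss(2)
print(entry.messages)  # ['invalidate 0', 'invalidate 1']
```

Because the directory knows exactly which nodes hold copies, invalidations go point-to-point to those nodes instead of being broadcast, which is what lets this approach scale beyond a shared bus.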
