

# Demystifying the Characteristics of 3D-Stacked Memories: A Case Study for the Hybrid Memory Cube (HMC)

Ramyad Hadidi, Bahar Asgari, Burhan Ahmad Mudassar,  
Saibal Mukhopadhyay, Sudhakar Yalamanchili, and Hyesoon Kim



IISWC'17

## 3D-Stacking Technology

Provides opportunities & novel features

### 3D-DRAMs:

- Provide higher bandwidth and density
- Enable lower power consumption
- Motivate processing-in-memory

HMC is an example of such memories.

## Experimental Setup



## New Considerations

### New internal organization

### New thermal behavior

### New latency and bandwidth hierarchy

### New packet-switched interface



## Hybrid Memory Cube (HMC)

HMC 1.1 (Gen2): 4GB size



## Bandwidth



Accessing 4 banks saturates the bandwidth of 1 vault.  
Accessing 4 vaults saturates the external bandwidth.

## Temperature (read only)



Access patterns affect temperature.

## Temperature & Bandwidth



Greater slope for writes  
Writes are more sensitive to temperature

## Device Power & Bandwidth



## High-Load Latency



## Latency Deconstruction



## Low-Load Latency



125 ns is spent in the HMC

## Latency Deconstruction Summary

| TX Path:                            | 287 ns | 260 ns | 547 ns     |
|-------------------------------------|--------|--------|------------|
| Conversion to flits & buffering     |        |        | 10 cycles  |
| Round-robin arbitration among ports |        |        | 2-9 cycles |
| Add packet fields & flow control    |        |        | 10 cycles  |
| Serialization                       |        |        | 10 cycles  |
| Transmission (128B)                 |        |        | 15 cycles  |

Freq.: 187.5 MHz  
Cycle: 5.3 ns
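The cycle counts in the table can be converted to time using the stated 187.5 MHz clock. The sketch below sums the TX path stages and converts the total to nanoseconds. The stage names and cycle counts are copied from the table; the function and variable names are illustrative only. The upper bound it computes (54 cycles, about 288 ns) lies close to the 287 ns figure in the table header, although the slide does not state how that figure was derived.

```python
# Clock stated on the slide: 187.5 MHz, i.e. roughly 5.3 ns per cycle.
FREQ_HZ = 187.5e6
CYCLE_NS = 1e9 / FREQ_HZ

# (stage, min cycles, max cycles), copied from the TX path table.
TX_STAGES = [
    ("Conversion to flits & buffering", 10, 10),
    ("Round-robin arbitration among ports", 2, 9),
    ("Add packet fields & flow control", 10, 10),
    ("Serialization", 10, 10),
    ("Transmission (128B)", 15, 15),
]


def tx_path_cycles(stages):
    """Return the (minimum, maximum) total cycle count over all stages."""
    low = sum(stage[1] for stage in stages)
    high = sum(stage[2] for stage in stages)
    return low, high


def cycles_to_ns(cycles, cycle_ns=CYCLE_NS):
    """Convert a cycle count to nanoseconds at the given cycle time."""
    return cycles * cycle_ns


if __name__ == "__main__":
    low, high = tx_path_cycles(TX_STAGES)
    print(f"cycle time: {CYCLE_NS:.2f} ns")
    print(f"TX path: {low}-{high} cycles, "
          f"{cycles_to_ns(low):.0f}-{cycles_to_ns(high):.0f} ns")
```

Only the arbitration stage varies, so the spread between the best and worst case is 7 cycles, or about 37 ns.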

## Conclusions

- Mixing read and write requests and using large request sizes lead to effective use of bi-directional bandwidth.
- Distributing accesses prevents internal bottlenecks and exploits bank-level parallelism.
- Controlling the request rate avoids high latency.
- Employing fault-tolerant mechanisms and using proper cooling solutions enable temperature-sensitive operations to reach a higher bandwidth.
- Reducing the latency overhead of the infrastructure will greatly reduce overall latency.
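The first conclusion, on mixing reads and writes and using large requests, can be illustrated with a simple flit-counting model. This is a rough sketch under assumptions that are not stated on the slides: 16-byte flits, one flit of header and tail overhead per packet, read requests and write responses that carry no data, and no flow-control or retry traffic. All names in the code are hypothetical.

```python
# Assumed packet format (not taken from the slides):
FLIT_BYTES = 16      # size of one flit
OVERHEAD_FLITS = 1   # header + tail carried by every packet


def packet_flits(payload_bytes):
    """Flits in one packet: overhead plus payload rounded up to whole flits."""
    return OVERHEAD_FLITS + -(-payload_bytes // FLIT_BYTES)


def flits_per_request(read_fraction, size_bytes):
    """Average flits per request in the (request, response) directions."""
    read_req, read_resp = packet_flits(0), packet_flits(size_bytes)
    write_req, write_resp = packet_flits(size_bytes), packet_flits(0)
    request_dir = read_fraction * read_req + (1 - read_fraction) * write_req
    response_dir = read_fraction * read_resp + (1 - read_fraction) * write_resp
    return request_dir, response_dir


def payload_per_bottleneck_byte(read_fraction, size_bytes):
    """Data bytes moved per byte sent on the busier link direction."""
    request_dir, response_dir = flits_per_request(read_fraction, size_bytes)
    return size_bytes / (max(request_dir, response_dir) * FLIT_BYTES)


if __name__ == "__main__":
    for size in (16, 128):
        for read_fraction in (1.0, 0.5):
            ratio = payload_per_bottleneck_byte(read_fraction, size)
            print(f"size={size:3d}B reads={read_fraction:.0%} -> {ratio:.2f}")
```

In this model, read-only traffic loads the response direction while the request direction stays mostly idle; an even mix balances both directions, and larger requests amortize the per-packet overhead. The model ignores internal vault and bank limits, so it only captures the link-level part of the effect.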

