



IBM Almaden Research Center

## Solid-State Storage: Technology, Design and Applications

Dr. Richard Freitas and Lawrence Chiu

© 2010 IBM Corporation




### Abstract

Most system designers dream of replacing slow, mechanical storage (disk drives) with fast, non-volatile memory. The advent of inexpensive solid-state disks (SSDs) based on flash memory technology and, eventually, on storage class memory technology is bringing this dream closer to reality.

This tutorial will briefly examine the leading solid-state memory technologies and then focus on the impact the introduction of such technologies will have on storage systems. It will include a discussion of SSD design, storage system architecture, applications, and performance assessment.

## Author Biographies

- Rich Freitas is a Research Staff Member at the IBM Almaden Research Center. Dr. Freitas received his PhD in EECS from the University of California at Berkeley in 1976. He then joined IBM at the IBM T.J. Watson Research Lab. He has held various management and research positions in architecture and design for storage systems, servers, workstations, and speech recognition hardware at the IBM Almaden Research Center and the IBM T.J. Watson Research Center. His current interest lies in exploring the use of emerging nonvolatile solid state memory technology in storage systems for commercial and scientific computing.
- Larry Chiu is Storage Research Manager and a Senior Technical Staff Member at the IBM Almaden Research Center. He co-founded the SAN Volume Controller product, a leading storage virtualization engine which has held the fastest SPC-1 benchmark record for several years. In 2008, he led a research team in the US and in the UK that demonstrated a one-million-IOPS storage system using solid state disks. He is currently working on expanding solid state disk use cases in enterprise systems and software. He has an MS in computer engineering from the University of Southern California and another MS in technology commercialization from the University of Texas at Austin.

## Acknowledgements

- Winfried Wilcke
- Geoff Burr
- Bulent Kurdi
- Clem Dickey
- Paul Muench
- C. Mohan
- KK Rao



## Agenda

| Session          | Time   |
|------------------|--------|
| **Introduction** | 10 min |
| **Technology**   | 40 min |
| **System**       | 30 min |
| Questions        | 10 min |
| Break            | 30 min |
| **Applications** | 40 min |
| **Performance**  | 40 min |
| Questions        | 10 min |

5 | Solid State Storage: Technology, Design and Applications  
FAST February 2010 © 2010 IBM Corporation



# Introduction


## Definition of Storage Class Memory (SCM)

- **A new class of data storage/memory devices**
  - many technologies compete to be the ‘best’ SCM
- **SCM features:**
  - Non-volatile
  - Short access times (DRAM-like)
  - Low cost per bit (more disk-like, by 2020)
  - Solid state, no moving parts
- **SCM blurs the distinction between**
  - **MEMORY** (*fast, expensive, volatile*) and
  - **STORAGE** (*slow, cheap, non-volatile*)


## Speed/Volatility/Persistence Matrix

|                    | Volatile | Non-Volatile       | Persistent                                     |
|--------------------|----------|--------------------|------------------------------------------------|
| **FAST (Memory)**  | DRAM     | DRAM (cache) + SCM | DRAM + SCM + redundancy in system architecture |
| **SLOW (Storage)** |          | USB stick, PC disk | Enterprise storage server, e.g., RAID          |

- **NVRAM = Non-Volatile RAM**
  - Data survives loss of power
  - SCM is one example of NVRAM
  - Other NVRAM types: DRAM + battery or DRAM + disk combos
- **Persistent storage**
  - Data survives despite component failure or loss of power
  - A single disk drive is not persistent, but a RAID array is


## HDDs



- Invented in the 1950s
- Mechanical device consisting of a rotating magnetic media disk and an actuator arm with a magnetic head

**HUGE COST ADVANTAGES**

- \$ High growth in disk areal density has driven HDD success
- \$ Magnetic thin-film head wafers have very few critical elements per chip (vs. billions of transistors per semiconductor chip)
- \$ The thin-film head (GMR head) has only one critical feature size controlled by optical lithography (determining track width)
- \$ Areal density is controlled by track density (set by the track width) times linear density...


### HDD history is driven by areal density growth



*Figure: areal density (Megabits/square inch, log scale, ~0.01 to ~1,000,000) vs. production year (1960-2010), starting from the IBM RAMAC™, the first hard disk drive, and reaching industry lab demos around 2010. Technology milestones marked along the curve: thin-film head, MR head, GMR head, AFC media.*

AFC = antiferromagnetically coupled  
MR = magnetoresistive  
GMR = giant magnetoresistive




## Future of HDD

Higher densities through:

- perpendicular recording
  - Jul 2008: 610 Gb/in<sup>2</sup> → ~4 TB
- patterned media

*Figure: longitudinal recording (standard; "ring" writing element over the recording layer) vs. perpendicular recording ("monopole" writing element; recording layer plus an additional layer), and conventional multigrain media vs. patterned magnetic media, in which each bit cell along the recorded data track is a single-domain magnetic island.*

[www.hitachigst.com/hdd/research/images/pm\_images/conventional\_pattern\_media.pdf](http://www.hitachigst.com/hdd/research/images/pm_images/conventional_pattern_media.pdf)


# Technology




## Criteria to judge an SCM technology

- **Device Capacity** [GigaBytes]
  - Closely related to cost/bit [\$/GB]
- **Speed**
  - Latency (= access time) Read & Write [nanoseconds]
  - Bandwidth Read & Write [GB/sec]
- **Random Access or Block Access**
- **Write Endurance** = number of writes before death
- **Read Endurance** = number of reads before death
- **Data Retention Time** [Years]
- **Power Consumption** [Watt]


## Even more Criteria

- **Reliability (MTBF)** [Million hours]
- **Volumetric density** [TeraBytes/liter]
- **Power on/off transition time** [sec]
- **Shock & Vibration** [g-force]
- **Temperature resistance** [°C]
- **Radiation resistance** [Rad]

---

*~16 criteria! This is what makes the SCM problem so hard.*


## Emerging Memory Technologies

| FLASH Extension | FRAM        | MRAM      | PCRAM    | RRAM     | Solid Electrolyte | Polymer/ Organic |
|-----------------|-------------|-----------|----------|----------|-------------------|------------------|
| Trap Storage    | Ramtron     | IBM       | Ovonyx   | IBM      | Axon              | Spansion         |
| Sailfin NOR     | Fujitsu     | Infineon  | BAE      | Sharp    | Infineon          | Samsung          |
| Tower           | STMicro     | Freescale | Intel    | Unity    |                   | TFE              |
| Spansion        | TI          | Philips   | Samsung  | Spansion |                   | MEC              |
| Infineon        | Toshiba     | STMicro   | Lipica   | Samsung  |                   | Zettacore        |
| Macronix        | Immeron     | HP        | IBM      |          |                   | Rotronics        |
| Samsung         | Samsung     | NVE       | Macronix |          |                   | Nanolayer        |
| Toshiba         | NEC         | Honeywell | Intel    |          |                   |                  |
| Spansion        | Hitachi     | Toshiba   | Spansion |          |                   |                  |
| Macronix        | Rohm        | NEC       | Hitachi  |          |                   |                  |
| NEC             | HP          | Sony      | Philips  |          |                   |                  |
| Nano-Xtal       | Cypress     | Fujitsu   |          |          |                   |                  |
| Freescale       | Matsushita  | Renesas   |          |          |                   |                  |
| Matsushita      | Oki         | Samsung   |          |          |                   |                  |
|                 | Hynix       | Hynix     |          |          |                   |                  |
|                 | Celli       | TSMC      |          |          |                   |                  |
|                 | Fujitsu     |           |          |          |                   |                  |
|                 | Seiko Epson |           |          |          |                   |                  |

- 64 Mb FRAM (prototype), 0.18-0.25 µm
- 4 Mb MRAM (product), 0.18-0.25 µm
- 512 Mb PRAM (prototype), 0.1 µm, 1.8 V
- 4 Mb C-RAM (product), 0.25 µm, 3.2 V




## What is SCM? What could it offer?

A solid-state memory that **blurs the boundaries** between storage and memory by being **low-cost, fast, and non-volatile**.

- **SCM system requirements for Memory (Storage) apps**
  - No more than 3-5x the **cost** of enterprise HDD ($< \$1$ per GB in 2012)
  - **<200 ns (<1 µs)** read/write/erase time
  - More than 100,000 read I/O operations per second
  - **>1 GB/s (>100 MB/s)** bandwidth
  - **Lifetime** of $10^9$-$10^{12}$ write/erase cycles
  - 10x lower **power** than enterprise HDD
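The latency, IOPS and bandwidth targets above are linked: by Little's law, sustained throughput equals the number of requests in flight divided by the per-request latency. The back-of-the-envelope Python sketch below assumes, for illustration only, one outstanding request and 4 KiB transfers; neither assumption is part of the requirements list.

```python
# Back-of-the-envelope check of the SCM storage targets using Little's law.
# Assumptions (illustrative, not from the requirements): queue depth 1, 4 KiB transfers.

def iops_from_latency(latency_s: float, queue_depth: int = 1) -> float:
    """Little's law: throughput = requests in flight / time per request."""
    return queue_depth / latency_s


def bandwidth_mb_per_s(iops: float, transfer_bytes: int) -> float:
    """Sustained bandwidth in MB/s (10**6 bytes) for a given IOPS and transfer size."""
    return iops * transfer_bytes / 1e6


if __name__ == "__main__":
    # A device meeting the <1 us storage target, with one request outstanding
    print(f"IOPS at 1 us, queue depth 1: {iops_from_latency(1e-6):,.0f}")
    # Bandwidth implied by the >100,000 IOPS target with (assumed) 4 KiB transfers
    print(f"100,000 IOPS x 4 KiB: {bandwidth_mb_per_s(100_000, 4096):.1f} MB/s")
```

At 1 µs per access, a single outstanding request already gives 1,000,000 IOPS, ten times the read target; conversely, 100,000 IOPS of 4 KiB transfers needs about 410 MB/s, between the storage (>100 MB/s) and memory (>1 GB/s) bandwidth figures.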


## Density is key

**Cost competition between IC, magnetic and optical devices comes down to effective areal density.**

| Device       | Critical feature size **F** | Area per bit (**F<sup>2</sup>**) | Density (Gbit/in<sup>2</sup>) |
|--------------|-----------------------------|----------------------------------|-------------------------------|
| Hard Disk    | 50 nm (MR width)            | 1.0                              | 250                           |
| DRAM         | 45 nm (half pitch)          | 6.0                              | 50                            |
| NAND (2 bit) | 43 nm (half pitch)          | 2.0                              | 175                           |
| NAND (1 bit) | 43 nm (half pitch)          | 4.0                              | 87                            |
| Blu-ray      | 210 nm ($\lambda/2$)        | 1.5                              | 10                            |

[Fontana:2004, web searches]
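The Density column follows from the other two: a device that needs $A \cdot F^2$ of area per bit stores $1/(A F^2)$ bits per unit area. The Python sketch below redoes that arithmetic, treating the table's Area column as area per bit; the helper and constant names are illustrative. It yields about 258, 53, 174, 87 and 9.8 Gbit/in², close to the rounded values in the table.

```python
# Effective areal density from critical feature size F and area per bit in units of F^2.
# Input values are copied from the table above; names here are illustrative only.

NM2_PER_SQ_INCH = 25.4e6 ** 2  # 1 inch = 25.4 mm = 25.4e6 nm


def density_gbit_per_sq_inch(feature_nm: float, area_f2: float) -> float:
    """Gbit per square inch for a device occupying area_f2 * F^2 per bit."""
    area_per_bit_nm2 = area_f2 * feature_nm ** 2
    return NM2_PER_SQ_INCH / area_per_bit_nm2 / 1e9


TABLE = [
    ("Hard Disk", 50, 1.0),
    ("DRAM", 45, 6.0),
    ("NAND (2 bit)", 43, 2.0),
    ("NAND (1 bit)", 43, 4.0),
    ("Blu-ray", 210, 1.5),
]

for device, f_nm, area in TABLE:
    print(f"{device:13s} {density_gbit_per_sq_inch(f_nm, area):7.1f} Gbit/in^2")
```

Because density scales as $1/F^2$, halving the feature size quadruples density, which is why the cost race between these technologies reduces to effective areal density.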


## Many Competing Technologies for SCM

- **Phase Change RAM**
  - most promising now (scaling)
- **Magnetic RAM**
  - used today, but poor scaling and a space hog
- **Magnetic Racetrack**
  - basic research, but very promising long term
- **Ferroelectric RAM**
  - used today, but poor scalability
- **Solid Electrolyte and resistive RAM (Memristor)**
  - early development, maybe?
- **Organic, nano particle and polymeric RAM**
  - many different devices in this class, unlikely

- **Improved FLASH**
  - still slow and poor write endurance

*Figure: generic SCM array*


## What is Flash?

*Figure: flash cell cross-section, an MOS transistor (source, drain, gate oxide) with a floating gate beneath the control gate. Few electrons (e<sup>-</sup>) on the floating gate: Flash Memory "1". More electrons on the floating gate: Flash Memory "0".*

- Based on the MOS transistor
- The transistor gate is redesigned
  - Charge is placed on, or removed from, a floating gate near the control gate
  - The threshold voltage $V_{th}$ of the transistor is shifted by the presence of this charge
  - Detecting the threshold-voltage shift provides the non-volatile memory function
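Reading a cell therefore amounts to a threshold comparison: a read voltage is applied to the control gate, and the transistor conducts only if that voltage exceeds the cell's $V_{th}$. The minimal single-bit sketch below uses hypothetical voltage values chosen only to show the logic; real devices use technology-specific levels.

```python
# Minimal sketch of reading a single-level (1 bit) flash cell.
# Hypothetical voltages for illustration; not values from any real device.

ERASED_VTH = -1.0     # few electrons on the floating gate -> low V_th -> reads "1"
PROGRAMMED_VTH = 3.0  # extra electrons on the floating gate raise V_th -> reads "0"
V_READ = 1.0          # read voltage applied to the control gate


def transistor_conducts(v_th: float, v_gate: float) -> bool:
    """The MOS transistor turns on only when the gate voltage exceeds its threshold."""
    return v_gate > v_th


def read_cell(v_th: float, v_read: float = V_READ) -> int:
    """Sense whether the cell conducts at v_read and map that to the stored bit."""
    return 1 if transistor_conducts(v_th, v_read) else 0


print(read_cell(ERASED_VTH), read_cell(PROGRAMMED_VTH))  # -> 1 0
```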


## FLASH memory types and applications

|                    | NOR          | NAND                                             |
|--------------------|--------------|--------------------------------------------------|
| Cell Size          | $9-11 F^2$   | $2 F^2$<br>($4 F^2$ physical cell, 2-bit MLC)    |
| Read               | 100 MB/s     | 18-25 MB/s                                       |
| Write              | <0.5 MB/s    | 8 MB/s                                           |
| Erase              | 750 ms       | 2 ms                                             |
| Market Size (2007) | \$8B         | \$14.2B                                          |
| Applications       | Program code | Multimedia                                       |


## Magnetic Racetrack Memory

**An MRAM alternative: a 3-D shift register**

- Data are stored as a pattern of magnetic domains in a long nanowire, or “racetrack”, of magnetic material.
- Current pulses move the domains along the racetrack.
- Use a deep trench to get many (**10-100**) bits per $4F^2$.

*Figures: IBM trench DRAM; Magnetic Race Track Memory*

S. Parkin (IBM), US patents 6,834,005 (2004) & 6,898,132 (2005)
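The shift-register behaviour can be illustrated with a toy model in which each current pulse moves every domain one notch past a single fixed read/write head. This is a hypothetical sketch (the class and its names are invented for illustration) that ignores the device physics discussed next; it mainly shows that access time grows with how far the requested bit sits from the head.

```python
from collections import deque


class ToyRacetrack:
    """Toy model of one racetrack: a shift register read/written at one fixed head.

    Simplifications (assumptions, not device physics): the track is treated as a
    loop, so no domain is pushed off the end, and every current pulse moves every
    domain exactly one notch.
    """

    def __init__(self, bits):
        self.domains = deque(bits)  # domains[0] sits under the read/write head
        self.offset = 0             # index of the stored bit currently under the head
        self.pulses = 0             # current pulses applied so far (a proxy for latency)

    def _shift_to(self, index):
        if not 0 <= index < len(self.domains):
            raise IndexError("bit index out of range")
        steps = index - self.offset
        self.domains.rotate(-steps)  # bring stored bit `index` under the head
        self.offset = index
        self.pulses += abs(steps)

    def read(self, index):
        self._shift_to(index)
        return self.domains[0]

    def write(self, index, bit):
        self._shift_to(index)
        self.domains[0] = bit


track = ToyRacetrack([1, 0, 1, 1, 0, 0, 1, 0])
print(track.read(5), track.pulses)  # -> 0 5
track.write(2, 0)
print(track.read(2), track.pulses)  # -> 0 8
```

The pulse count is why a racetrack behaves like a sequential device within one track: random access is only fast if many tracks are read in parallel or data are laid out to minimize shifting.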


## Magnetic Racetrack Memory

- Need deep trench with notches to “pin” domains
- Need sensitive sensors to “read” presence of domains
- Must ensure that a moderate current pulse moves every domain one, and only one, notch
- Basic physics of current-induced domain motion being investigated

**Promise (10-100 bits/ $F^2$ ) is enormous...**

**but we’re still working on our basic understanding of the physical phenomena...**





## RRAM (Resistive RAM)

- Numerous examples of materials showing hysteretic behavior in their I-V curves
- Mechanisms not completely understood, but major materials classes include
  - metal nanoparticles(?) in **organics**
    - could they survive high processing temperatures?
  - oxygen vacancies(?) in **transition-metal oxides**
    - forming step sometimes required
    - scalability unknown
    - no ideal combination yet found of
      - low switching current
      - high reliability & endurance
      - high ON/OFF resistance ratio
  - metallic filaments in **solid electrolytes**

*Figure: panels (c) and (d), SrTiO<sub>3</sub>:Cr*

[Karg:2008]


## Memristor

*Figure: cover of IEEE Spectrum, December 2008: “Making the Memristor: the inside story of the greatest electronics invention of the last 25 years”*

**Memristive systems: “Bow Ties”** (plot of current in mA vs. voltage)

Leon Chua's original graph of the hypothetical memristor's behavior is shown at top right; the graph of R. Stanley Williams's experimental results in the *Nature* paper is shown below. The loops map the switching behavior of the device: it begins with a high resistance, and as the voltage increases, the current slowly increases. As charge flows through the device, the resistance drops, and the current increases more rapidly with increasing voltage until the maximum is reached. Then, as the voltage decreases, the current decreases but more slowly, because charge is flowing through the device and the resistance is still dropping. The result is an on-switching loop. When the voltage turns negative, the resistance of the device increases, resulting in an off-switching loop. (R.S.W.)
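The bow-tie loops can be reproduced qualitatively with the linear ion-drift model of Strukov et al. (*Nature*, 2008, listed in the references). The Python sketch below integrates that model under one period of a sinusoidal voltage; the parameter values and the forward-Euler integration are illustrative assumptions, not fitted device data.

```python
import math


def simulate_memristor(v_amp=1.0, freq_hz=1.0, r_on=100.0, r_off=16e3,
                       x0=0.1, k=1e4, steps=1200):
    """Linear ion-drift memristor model, forward-Euler integration over one period.

    x in [0, 1] is the normalized width of the low-resistance (doped) region.
    Resistance: M(x) = r_on * x + r_off * (1 - x); state update: dx/dt = k * i.
    All parameter values are illustrative assumptions.
    Returns the sampled voltages and currents.
    """
    dt = 1.0 / (freq_hz * steps)
    x = x0
    voltages, currents = [], []
    for n in range(steps):
        v = v_amp * math.sin(2 * math.pi * freq_hz * n * dt)
        i = v / (r_on * x + r_off * (1 - x))
        # Positive current widens the doped region (resistance drops); clamp to [0, 1].
        x = min(1.0, max(0.0, x + k * i * dt))
        voltages.append(v)
        currents.append(i)
    return voltages, currents


v, i = simulate_memristor()
# Same voltage (+0.5 V) early and late in the positive half-cycle:
print(f"{v[100]:.2f} V -> {i[100] * 1e3:.4f} mA (resistance still high)")
print(f"{v[500]:.2f} V -> {i[500] * 1e3:.4f} mA (resistance has dropped)")
```

Because the current is $v/M(x)$, it is zero whenever the voltage is zero, which pinches the loop at the origin; the larger current at the same +0.5 V later in the half-cycle is the on-switching behavior the caption describes, and the negative half-cycle drives the resistance back up.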


## Phase-Change Nano-Bridge

- Prototype memory device with ultra-thin (**3nm**) films, Dec 2006

*Image: The New York Times*

- $3\,\text{nm} \times 20\,\text{nm} = 60\,\text{nm}^2$, ≈ the Flash roadmap for **2013** → **phase-change scales**
- **Fast** (<100 ns SET), **low current** (<100 $\mu\text{A}$ RESET)

*Figure: phase-change “bridge” of GeSb between TiN electrodes; W is defined by lithography and H by thin-film deposition (bridges with W = 20 nm and W = 200 nm shown; 200 nm scale bar). Plot: RESET current [$\mu\text{A}$] vs. Area = $W \cdot H$ (in $\text{nm}^2$) for H = 3, 10 and 25 nm, L = 50 nm: current scales with area.*

[Chen:2006]

## Paths to ultra-high density memory



Starting from a standard **$4F^2$** cell...

...store **M** bits/cell with **$2^M$ levels**

...add **N** 1-D **sub-lithographic** “fins” (**demonstrated** at IEDM 2005), or $N^2$ with 2-D

...go to 3-D with **L** **layers**


## Sub-lithographic addressing

- Push beyond the lithography roadmap to pattern a dense memory
- But nano-pattern has more complexity than just lines & spaces
- Must find a scheme to connect the surrounding micro-circuitry to the dense nano-array

[Gopalakrishnan:2005 IEDM]


## MLC (Multi-Level Cells)

- Write and read multiple analog voltages → higher density at the same fabrication difficulty
- The logarithm is not your friend (bits = log<sub>2</sub> of the number of levels):
  - 4 levels for 2 bits
  - 8 levels for 3 bits
  - 16 levels for 4 bits
- Coding & signal processing can help
- An iterative write scheme trades performance for density, but it is useful for minimizing resistance variability
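Both points can be made concrete. With $2^M$ levels sharing one fixed analog window, the gap between adjacent levels is window$/(2^M - 1)$, so it shrinks exponentially in the number of bits; and a tighter acceptance band makes an iterative (write, re-read, correct) scheme need more pulses. The Python sketch below is hypothetical: the 0-3 window, the band of a quarter of the level spacing, and the 50-150 % pulse-to-pulse variation are assumptions chosen for illustration.

```python
import random
from typing import Optional, Tuple


def level_spacing(bits: int, window: float = 3.0) -> float:
    """Gap between adjacent levels when 2**bits levels share one fixed analog window."""
    return window / (2 ** bits - 1)


def iterative_write(target: float, tolerance: float, step: float,
                    value: float = 0.0, rng: Optional[random.Random] = None,
                    max_pulses: int = 10_000) -> Tuple[float, int]:
    """Pulse the cell toward `target`, re-reading after each pulse, until within `tolerance`.

    Assumed variability model: each pulse moves the cell by a random 50-150 % of `step`.
    With step <= tolerance, no single pulse can jump across the acceptance band.
    """
    if rng is None:
        rng = random.Random(0)
    pulses = 0
    while abs(value - target) > tolerance and pulses < max_pulses:
        direction = 1.0 if target > value else -1.0
        value += direction * step * rng.uniform(0.5, 1.5)
        pulses += 1
    return value, pulses


for bits in (1, 2, 3, 4):
    spacing = level_spacing(bits)
    tol = spacing / 4  # assumed acceptance band: a quarter of the level spacing
    _, pulses = iterative_write(target=3.0, tolerance=tol, step=tol, rng=random.Random(42))
    print(f"{bits} bit(s): {2 ** bits:2d} levels, spacing {spacing:.3f}, "
          f"{pulses} pulses to reach the top level")
```

The pulse count rises as the level spacing falls, which is the performance-for-density trade-off noted above; in exchange, each write lands inside a narrow band, limiting resistance variability.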






## Paths to ultra-high density memory

At the 32 nm node in 2013, MLC NAND Flash (already M=2 → 2F<sup>2</sup>!) is projected\* to be at **2x**: 43 Gb/cm<sup>2</sup> density → 32 Gb product.

...if we could shrink 4F<sup>2</sup> by...

| Shrink  | Density                   | Product | Example                                                        |
|---------|---------------------------|---------|----------------------------------------------------------------|
| **4x**  | 86 Gb/cm<sup>2</sup>      | 64 Gb   | 4 layers of 3-D (L=4)                                          |
| **16x** | 344 Gb/cm<sup>2</sup>     | 256 Gb  | 8 layers of 3-D, 2 bits/cell (L=8, M=2)                        |
| **64x** | 1376 Gb/cm<sup>2</sup>    | ~1 Tb   | 4 layers of 3-D, 4x4 sublithographic (L=4, N=4<sup>2</sup>)    |

\* 2006 ITRS Roadmap
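The projections above follow one rule: relative to a 4F² cell, the density gain is the product M × N × L of bits per cell, sub-lithographic features per cell, and stacked layers. The Python sketch below reproduces the projected numbers from that rule. Its 4F² baseline of 21.5 Gb/cm² and 16 Gb per product is inferred from the 2x MLC NAND projection (43 Gb/cm², 32 Gb) and is not stated on the slide; the function names are illustrative.

```python
# Density gain over a 4F^2 cell as the product of the multipliers on the slide:
#   M = bits per cell (2**M levels), N = sub-lithographic features per cell
#   (N**2 with 2-D patterning), L = stacked 3-D layers.
# Baseline inferred (assumption) from the 2x MLC NAND projection: 43 Gb/cm^2 -> 32 Gb.
BASE_DENSITY_GB_PER_CM2 = 43 / 2
BASE_PRODUCT_GB = 32 / 2


def projected(m: int = 1, n: int = 1, layers: int = 1):
    """Return (gain, density in Gb/cm^2, product capacity in Gb) for one path."""
    gain = m * n * layers
    return gain, BASE_DENSITY_GB_PER_CM2 * gain, BASE_PRODUCT_GB * gain


paths = {
    "MLC NAND (M=2)": dict(m=2),
    "4 layers (L=4)": dict(layers=4),
    "8 layers, 2 bits/cell (L=8, M=2)": dict(m=2, layers=8),
    "4 layers, 4x4 sub-litho (L=4, N=4**2)": dict(n=4 ** 2, layers=4),
}
for name, kwargs in paths.items():
    gain, density, product = projected(**kwargs)
    print(f"{gain:3d}x  {density:7.1f} Gb/cm^2  {product:6.0f} Gb  {name}")
```

The printed rows give 43, 86, 344 and 1376 Gb/cm² and 32, 64, 256 and 1024 Gb (≈1 Tb), matching the projections; because the multipliers compose, the same gain can be reached by trading layers against bits per cell or sub-lithographic features.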


## Industry SCM activities

- Intel/ST-Microelectronics spun out Numonyx (FLASH & PCM)
- Samsung and Numonyx sample PCM chips
  - 128Mb Numonyx chip (90nm) shipped in 12/08 to select customers
  - Samsung started production of 512Mb (60nm) PCM in 9/09
  - Working together on a common PCM spec
- Over 30 companies work on SCM
  - including all major IT players
  - SCM research in IBM

*Figures: IBM sub-litho PCM; Alverstone PCM; Samsung 512 Mbit PCM chip*


## For more information

**Flash**

- S. Lai, *IBM J. Res. Dev.*, 52(4/5), 529 (2008).
- R. Bez, E. Camerlenghi, et al., *Proceedings of the IEEE*, 91(4), 489-502 (2003).
- G. Campardo, M. Scotti, et al., *Proceedings of the IEEE*, 91(4), 523-536 (2003).
- P. Cappelletti, R. Bez, et al., *IEDM Technical Digest*, 489-492 (2004).
- A. Fazio, *MRS Bulletin*, 29(11), 814-817 (2004).
- K. Kim and J. Choi, *Proc. Non-Volatile Semiconductor Memory Workshop*, 9-11 (2006).
- M. Noguchi, T. Yaegashi, et al., *IEDM Technical Digest*, 17.1 (2007).


## For more information (on FeRAM, MRAM, RRAM & SE)

G. W. Burr, B. N. Kurdi, J. C. Scott, C. H. Lam, K. Gopalakrishnan, and R. S. Shenoy,  
 "An overview of candidate device technologies for Storage-Class Memory,"  
*IBM Journal of Research and Development*, 52(4/5), 449-464 (2008).

**FeRAM**

- A. Sheikholeslami and P. G. Gulak, *Proc. IEEE*, 88(5), 667-689 (2000).
- Y. K. Hong, D. J. Jung, et al., *Symp. VLSI Technology*, 230-231 (2007).
- K. Kim and S. Lee, *J. Appl. Phys.*, 100(5), 051604 (2006).
- N. Setter, D. Damjanovic, et al., *J. Appl. Phys.*, 100(5), 051606 (2006).
- D. Takashima and I. Kunishima, *IEEE J. Solid-State Circ.*, 33(5), 787-792 (1998).
- S. L. Miller and P. J. McWhorter, *J. Appl. Phys.*, 72(12), 5999-6010 (1992).
- T. P. Ma and J. P. Han, *IEEE Elect. Dev. Lett.*, 23(7), 386-388 (2002).

**MRAM**

- R. E. Fontana and S. R. Hetzler, *J. Appl. Phys.*, 99(8), 08N902 (2006).
- W. J. Gallagher and S. S. P. Parkin, *IBM J. Res. Dev.*, 50(1), 5-23 (2006).
- M. Durlam, Y. Chung, et al., *ICICDT Tech. Dig.*, 1-4 (2007).
- D. C. Worledge, *IBM J. Res. Dev.*, 50(1), 69-79 (2006).
- S. S. P. Parkin, *IEDM Tech. Dig.*, 903-906 (2004).
- L. Thomas, M. Hayashi, et al., *Science*, 315(5818), 1553-1556 (2007).

**RRAM**

- J. C. Scott and L. D. Bozano, *Adv. Mat.*, 19, 1452-1463 (2007).
- Y. Hosoi, Y. Tamai, et al., *IEDM Tech. Dig.*, 30.7.1-4 (2006).
- D. Lee, D.-J. Seong, et al., *IEDM Tech. Dig.*, 30.8.1-4 (2006).
- S. F. Karg, G. I. Meijer, et al., *IBM J. Res. Dev.*, 52(4/5), 481-492 (2008).
- D. B. Strukov, et al., *Nature*, 453(7191), 80-83 (2008).
- R. S. Williams, *IEEE Spectrum*, Dec 2008.

**SE**

- M. N. Kozicki, M. Park, and M. Mitkova, *IEEE Trans. Nanotech.*, 4(3), 331-338 (2005).
- M. N. Kozicki, M. Balakrishnan, et al., *Proc. IEEE NVSM Workshop*, 83-89 (2005).
- M. Kund, G. Beitel, et al., *IEDM Tech. Dig.*, 754-757 (2005).
- P. Schröglmeier, M. Angerbauer, et al., *Symp. VLSI Circ.*, 186-187 (2007).


## For more information (on PCRAM)

S. Raoux, G. W. Burr, M. J. Breitwisch, C. T. Rettner, Y. Chen, R. M. Shelby,  
 M. Salinga, D. Krebs, S. Chen, H. Lung, and C. H. Lam, "Phase-change random access memory:  
 a scalable technology," *IBM Journal of Research and Development*, 52(4/5), 465-480 (2008).

**PCRAM**

- S. R. Ovshinsky, *Phys. Rev. Lett.*, 21(20), 1450 (1968).
- D. Adler, M. S. Shur, et al., *J. Appl. Phys.*, 51(6), 3289-3309 (1980).
- R. Neale, *Electronic Engineering*, 73(891), 67- (2001).
- T. Ohta, K. Nagata, et al., *IEEE Trans. Magn.*, 34(2), 426-431 (1998).
- T. Ohta, *J. Optoelectr. Adv. Mat.*, 3(3), 609-626 (2001).
- S. Lai, *IEDM Technical Digest*, 10.1.1-10.1.4 (2003).
- A. Pirovano, A. L. Lacaita, et al., *IEDM Tech. Dig.*, 29.6.1-29.6.4 (2003).
- A. Pirovano, A. Redaelli, et al., *IEEE Trans. Dev. Mat. Reliability*, 4(3), 422-427 (2004).
- A. Pirovano, A. L. Lacaita, et al., *IEEE Trans. Electr. Dev.*, 51(3), 452-459 (2004).
- Y. C. Chen, C. T. Rettner, et al., *IEDM Tech. Dig.*, S3OP3 (2006).
- J. H. Oh, J. H. Park, et al., *IEDM Tech. Dig.*, 2.6 (2006).
- S. Raoux, C. T. Rettner, et al., *EPCOS 2006* (2006).
- M. Breitwisch, T. Nirschl, et al., *Symp. VLSI Tech.*, 100-101 (2007).
- T. Nirschl, J. B. Philipp, et al., *IEDM Technical Digest*, 17.5 (2007).
- J. I. Lee, H. Park, *Symp. VLSI Tech.*, 102-103 (2007).
- S.-H. Lee, Y. Jung, and R. Agarwal, *Nature Nanotech.*, 2(10), 626-630 (2007).
- D. H. Kim, F. Merget, et al., *J. Appl. Phys.*, 101(6), 064512 (2007).
- M. Wuttig and N. Yamada, *Nature Materials*, 6(11), 824-832 (2007).


## FeRAM/MRAM/RRAM/SE References

G. W. Burr, B. N. Kurdi, J. C. Scott, C. H. Lam, K. Gopalakrishnan, and R. S. Shenoy, "An overview of candidate device technologies for Storage-Class Memory," *IBM Journal of Research and Development*, 52(4/5), 449-464 (2008).

- ITRS roadmap, [www.itrs.net](http://www.itrs.net)
- T. Nirschl, J. B. Philipp, et al., *IEDM Technical Digest*, 17.5 (2007).
- K. Gopalakrishnan, R. S. Shenoy, et al., *IEDM Technical Digest*, 471-474 (2005).
- F. Li, X. Y. Yang, et al., *IEEE Trans. Dev. Materials Reliability*, 4(3), 416-421 (2004).
- H. Tanaka, M. Kido, et al., *Symp. VLSI Technology*, 14-15 (2007).


## In comparison...

|                                   | Flash                                              | SONOS Flash                                        | Nanocrystal Flash                      | FeRAM                                  | FeFET                                  |
|-----------------------------------|----------------------------------------------------|----------------------------------------------------|----------------------------------------|----------------------------------------|----------------------------------------|
| <b>Knowledge level</b>            | product                                            | advanced development                               | development                            | product                                | basic research                         |
| <b>Smallest demonstrated cell</b> | <b>4F<sup>2</sup></b><br>(2F <sup>2</sup> per bit) | <b>4F<sup>2</sup></b><br>(1F <sup>2</sup> per bit) | 16F <sup>2</sup><br>(@90nm)            | 15F <sup>2</sup><br>(@130nm)           | —                                      |
| <b>Prospects for...</b>           |                                                    |                                                    |                                        |                                        |                                        |
| ...scalability                    | <b>poor</b>                                        | <b>maybe</b> (enough stored charge?)               | <b>unclear</b> (enough stored charge?) | <b>poor</b> (integration, signal loss) | <b>unclear</b> (difficult integration) |
| ...fast readout                   | yes                                                | yes                                                | yes                                    | yes                                    | yes                                    |
| ...fast writing                   | <b>NO</b>                                          | <b>NO</b>                                          | <b>NO</b>                              | yes                                    | yes                                    |
| ...low switching power            | yes                                                | yes                                                | yes                                    | yes                                    | yes                                    |
| ...high endurance                 | <b>NO</b>                                          | <b>poor</b><br>(10<sup>7</sup> cycles)             | <b>NO</b>                              | yes                                    | yes                                    |
| ...non-volatility                 | yes                                                | yes                                                | yes                                    | yes                                    | <b>poor</b><br>(30 days)               |
| ...MLC operation                  | yes                                                | yes                                                | yes                                    | <b>difficult</b>                       | <b>difficult</b>                       |

59 | Solid State Storage: Technology, Design and Applications  
FAST February 2010 © 2010 IBM Corporation

## Comparison continued

|                                   | MRAM                             | Racetrack                                             | PCRAM                                                                    | RRAM              | solid electrolyte                                           | organic memory                         |
|-----------------------------------|----------------------------------|-------------------------------------------------------|--------------------------------------------------------------------------|-------------------|-------------------------------------------------------------|----------------------------------------|
| <b>Knowledge level</b>            | product                          | basic research                                        | advanced development                                                     | Early development | development                                                 | basic research                         |
| <b>Smallest demonstrated cell</b> | <b>25F<sup>2</sup></b><br>@180nm | —                                                     | <b>5.8F<sup>2</sup></b> (diode)<br><b>12F<sup>2</sup></b> (BJT)<br>@90nm | —                 | <b>8F<sup>2</sup></b><br>@90nm<br>(4F <sup>2</sup> per bit) | —                                      |
| <b>Prospects for...</b>           |                                  |                                                       |                                                                          |                   |                                                             |                                        |
| ...scalability                    | <b>poor</b><br>(high currents)   | <b>unknown</b><br>(too early to know, good potential) | <b>promising</b><br>(rapid progress to date)                             | <b>unknown</b>    | <b>promising</b><br>(filament-based, but new materials)     | <b>unknown</b><br>(high temperatures?) |
| ...fast readout                   | yes                              | yes                                                   | yes                                                                      | yes               | yes                                                         | <b>sometimes</b>                       |
| ...fast writing                   | yes                              | yes                                                   | yes                                                                      | <b>sometimes</b>  | yes                                                         | <b>sometimes</b>                       |
| ...low switching Power            | <b>NO</b>                        | <b>uncertain</b>                                      | <b>poor</b>                                                              | <b>sometimes</b>  | yes                                                         | <b>sometimes</b>                       |
| ...high endurance                 | yes                              | should                                                | yes                                                                      | <b>poor</b>       | <b>unknown</b>                                              | <b>poor</b>                            |
| ...non-volatility                 | yes                              | <b>unknown</b>                                        | yes                                                                      | <b>sometimes</b>  | <b>sometimes</b>                                            | <b>poor</b>                            |
| ...MLC operation                  | <b>NO</b>                        | yes (3-D)                                             | yes                                                                      | yes               | yes                                                         | <b>unknown</b>                         |

## SCM Memory Classes

- **Storage device**
  - NAND flash is the current technology → eventually PCM
  - Nonvolatile operation essential
  - Erase before write (flash) vs. write in place
  - Medium speed: flash 20-50 µs, PCM for storage est. 1-5 µs
  - Write endurance issues
- **Memory device**
  - DRAM for most performance applications
  - NOR flash for portable devices, etc. → PCM positioning here
    - Nonvolatile operation may not be needed everywhere
    - Fast: DRAM 30-60 ns, NOR 75 ns (write time very long), PCM est. ~75-1000 ns
    - Write endurance issues, but not as severe
  - Can PCM replace/augment DRAM in mainstream systems?

## Representative NAND Flash Device

- **Power ≈ 100 mW**
- **Interface: one or two bytes wide**
- **Data accessed in pages**
  - 2112, 4224, or 8448 bytes (2048, 4096, or 8192 data bytes plus a 64-, 128-, or 256-byte spare area)
- **Data erased in blocks**
  - Block = 64-128 pages
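
The odd page sizes are easier to read once each page is split into its data area and its spare (out-of-band) area. The sketch below assumes the common 2048 + 64, 4096 + 128 and 8192 + 256 byte layouts; `PAGE_LAYOUTS` and `block_geometry` are illustrative names, not part of any device interface. With 2112-byte pages and 64 pages per block it recovers the 128 KB of user data per erase block that the next slide uses.

```python
# Assumed split of each raw page size into (data bytes, spare/OOB bytes)
PAGE_LAYOUTS = {2112: (2048, 64), 4224: (4096, 128), 8448: (8192, 256)}

def block_geometry(page_bytes, pages_per_block):
    """Return user-data and spare bytes per erase block for a raw page size."""
    data, spare = PAGE_LAYOUTS[page_bytes]
    return {"data_bytes": data * pages_per_block, "spare_bytes": spare * pages_per_block}

for page in PAGE_LAYOUTS:
    for pages_per_block in (64, 128):
        geo = block_geometry(page, pages_per_block)
        print(f"{page} B pages x {pages_per_block}: "
              f"{geo['data_bytes'] // 1024} KB data, {geo['spare_bytes'] // 1024} KB spare")
```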

## Representative NAND Flash Behavior

- **Read copies a page into BUF and streams the data to the host**
  - Read access: 20-50 µs
  - 20 MB/s transfer rate, sustained
  - ONFI will take it to 200 MB/s
- **Write streams data from the host into BUF**
  - 6 MB/s transfer rate, sustained
  - 20 MB/s on the standard bus
  - ONFI increases this to
- **Program copies BUF into an erased page**
  - Program a 2 KB / 4 KB page: 0.2 ms
- **Erase clears all pages in a block to "1"s**
  - Erase a 128 KB block: 1.5 ms
  - A block must be erased before any of its pages may be programmed

The diagram illustrates the data flow in a NAND flash system. It shows a Host Interface connected to a Flash Chip, which is connected to a Channel. The Channel connects to multiple Flash blocks (labeled Flash 1, Flash 2, and Flash n). The Flash blocks contain pages of data. A Buffer (BUF) and a Cache are also shown, with bidirectional arrows indicating data exchange between them and the Host Interface, Flash Chip, and the Flash blocks.
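
The program and erase times above bound the sustained write rate of a single chip. A rough sketch, assuming 2 KB pages, a 128 KB block (64 pages), and that the host-to-BUF transfer and the program step are serialized rather than overlapped: ignoring the bus gives about 9.2 MB/s, and adding a 20 MB/s bus transfer per page brings it to roughly 6.3 MB/s, in the neighborhood of the 6 MB/s sustained figure quoted above. The function and parameter names are illustrative.

```python
def sustained_write_rate(page_bytes, block_bytes, t_prog_s, t_erase_s, bus_bytes_per_s=None):
    """Upper-bound bytes/s for erasing one block and then filling it page by page."""
    pages = block_bytes // page_bytes
    # Time to stream one page from the host into BUF (0 if bus time is ignored)
    t_xfer = page_bytes / bus_bytes_per_s if bus_bytes_per_s else 0.0
    # Assumption: transfer and program are serialized (no cache-program overlap)
    t_block = pages * (t_xfer + t_prog_s) + t_erase_s
    return block_bytes / t_block

KB = 1024
no_bus = sustained_write_rate(2 * KB, 128 * KB, 0.2e-3, 1.5e-3)
with_bus = sustained_write_rate(2 * KB, 128 * KB, 0.2e-3, 1.5e-3, bus_bytes_per_s=20e6)
print(f"{no_bus / 1e6:.2f} MB/s without bus time, {with_bus / 1e6:.2f} MB/s with a 20 MB/s bus")
# 9.17 MB/s without bus time, 6.29 MB/s with a 20 MB/s bus
```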









## SCM-based Memory System



Logical address → translation → wear leveling → SCM physical address

- **Treat wear leveling (WL) as part of the address-translation flow**
  - Option a: separate WL/SCM controller
  - Option b: integrated VM/WL/SCM controller
  - Option c: software WL/control
- **Also need a physical controller for SCM**
  - Different from the DRAM physical controller
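
The flow above composes two lookups. A minimal sketch, assuming page-granularity (4 KB) mappings and plain dictionaries for both the page table and the wear-leveling remap table; `translate`, `wear_level`, `to_physical` and all table contents are hypothetical. The composition is the same whether the stages live in separate controllers (option a), one integrated controller (option b), or software (option c); the options differ in where each lookup runs.

```python
PAGE = 4096  # assumed mapping granularity in bytes

def translate(vaddr, page_table):
    """Logical (virtual) address -> logical SCM address via a page table."""
    vpn, offset = divmod(vaddr, PAGE)
    return page_table[vpn] * PAGE + offset

def wear_level(scm_addr, remap):
    """Logical SCM address -> physical SCM address via a wear-leveling remap table."""
    lpn, offset = divmod(scm_addr, PAGE)
    return remap[lpn] * PAGE + offset

def to_physical(vaddr, page_table, remap):
    # Translation first, then wear leveling, as in the flow above
    return wear_level(translate(vaddr, page_table), remap)

page_table = {0: 7, 1: 3}  # hypothetical VM mappings
remap = {3: 12, 7: 5}      # hypothetical WL remap; updated whenever data is moved
print(to_physical(4096 + 16, page_table, remap))  # 49168
```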

## Challenges for SCM

- **Asymmetric performance**
  - Flash: writes much slower than reads
  - Not as pronounced in other technologies
- **Program/erase cycle**
  - An issue for flash
  - Most other technologies are write-in-place
- **Data retention and non-volatility**
  - It's all relative
  - Use-case dependent
- **Bad blocks**
  - Devices are shipped with bad blocks
  - Blocks wear out, etc.
- **The “fly in the ointment” for both memory and storage is write endurance**
  - In many SCM technologies, writes are cumulatively destructive
  - For flash, it is the program/erase cycle
  - Current commercial flash varieties:
    - Single-level cell (SLC) → $10^5$ writes/cell
    - Multi-level cell (MLC) → $10^4$ writes/cell
  - Coping strategy → wear leveling, etc.

## Static wear leveling

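- Targets infrequently written data: OS data, etc.
- Maintain a count of erasures per block
- Goal is to keep the counts "near" each other
- Simple example: move static data onto a heavily worn block, freeing a lightly worn block for hot data
  - Write LBA 4 (currently in block 2, 99 erasures)
  - Move D1 from block 1 (10 erasures) to free block 4 (98 erasures)
  - Block 1 is now FREE
  - Write D4 to block 1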

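Logical-to-physical address map (before the write)

| LBA | Physical block |
|-----|----------------|
| 1   | 1              |
| 2   | 6              |
| 3   | 3              |
| 4   | 2              |

Physical blocks: contents and erasure counts

| Physical block | Contents | Erasures |
|----------------|----------|----------|
| 1              | D1       | 10       |
| 2              | D4       | 99       |
| 3              | D3       | 28       |
| 4              | FREE     | 98       |
| 5              | FREE     | 97       |
| 6              | D2       | 98       |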
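
The example can be replayed in a few lines. This is a sketch under stated assumptions, not a specific controller's algorithm: the "coldest" block is taken to be the in-use block with the fewest erasures (excluding the block being rewritten), static data is parked on the most-worn free block, erasing a block increments its count, and the stale copy of the rewritten LBA is erased immediately (the slide does not say what happens to block 2). `static_wl_write` is a hypothetical helper name.

```python
def static_wl_write(lba, data, blocks, erasures, l2p):
    """Rewrite `lba`, first moving the coldest static data onto the most-worn free block."""
    current = l2p[lba]
    # Coldest in-use block (other than the one being rewritten): fewest erasures
    cold = min((b for b, d in blocks.items() if d is not None and b != current),
               key=erasures.get)
    # Most-worn free (erased) block: a good home for data that rarely changes
    worn = max((b for b, d in blocks.items() if d is None), key=erasures.get)
    cold_lba = next(l for l, p in l2p.items() if p == cold)

    blocks[worn], l2p[cold_lba] = blocks[cold], worn  # D1 -> 4
    erasures[cold] += 1                               # 1 now FREE (erased)
    blocks[cold], l2p[lba] = data, cold               # D4 -> 1
    blocks[current] = None                            # assumption: stale copy of D4
    erasures[current] += 1                            # is erased right away

# Initial state from the tables: physical block -> contents / erase count
blocks = {1: "D1", 2: "D4", 3: "D3", 4: None, 5: None, 6: "D2"}
erasures = {1: 10, 2: 99, 3: 28, 4: 98, 5: 97, 6: 98}
l2p = {1: 1, 2: 6, 3: 3, 4: 2}

static_wl_write(4, "D4'", blocks, erasures, l2p)
print(l2p)  # {1: 4, 2: 6, 3: 3, 4: 1}
```

The lightly worn block 1 now absorbs the frequently rewritten LBA 4, while the rarely changing D1 sits on block 4, which has little wear budget left.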

## Dynamic wear leveling

Logical to physical address map

- Frequently written data – logs, updates, etc.
- Maintain a set of free, erased blocks
- Logical to physical block address mapping
- Write new data to a free block
- Erase old location and add to free list.
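
A minimal sketch of the dynamic scheme, assuming one logical block hammered with writes (for example, a log) and a free list ordered by erase count so that the least-worn free block is used next. The slide only says to keep a set of free, erased blocks, so that ordering and the `DynamicWearLeveler` class are assumptions. Without remapping, all 99 erasures would land on one physical block; here they spread almost evenly across four.

```python
import heapq

class DynamicWearLeveler:
    """Minimal dynamic wear leveling: every write goes to a free, erased block."""

    def __init__(self, num_blocks):
        self.erasures = [0] * num_blocks
        self.l2p = {}  # logical block -> physical block
        # Free list as a min-heap on (erase count, block): least-worn block first
        self.free = [(0, b) for b in range(num_blocks)]
        heapq.heapify(self.free)

    def write(self, lba):
        _, target = heapq.heappop(self.free)  # raises IndexError if no free block
        old = self.l2p.get(lba)
        self.l2p[lba] = target                # new data lives in the free block
        if old is not None:
            self.erasures[old] += 1           # erase the old location...
            heapq.heappush(self.free, (self.erasures[old], old))  # ...and free it
        return target

wl = DynamicWearLeveler(4)
for _ in range(100):
    wl.write(0)       # rewrite a single logical block 100 times
print(wl.erasures)    # [25, 25, 25, 24]
```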

## Lifetime model (more details)

- $S$ denotes the system-level management ‘tools’ that provide an effective endurance of $E^* = S(E)$, where
  - $E$ is the raw device endurance and
  - $E^*$ is the *effective write endurance*
- S includes
  - Static and dynamic wear leveling of efficiency  $q < 1$
  - Error Correction and bad block management
  - Over-provisioning
  - Compress, de-duplicate & write elimination...
  - $E^* = E \cdot q \cdot f(\text{error correction}) \cdot g(\text{overprovisioning}) \cdot h(\text{compress}) \dots$
  - With S included,  $T_{\text{life}}(\text{System}) = T_{\text{fill}} \cdot E^*$
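
To see how the factors combine, here is a minimal sketch of the product form above. Only $E$ and $q < 1$ come from the slides ($E = 10^5$ for SLC flash); q = 0.8, g = 1.25 and h = 2 (2:1 data reduction) are hypothetical values, and the 4000 s fill time is borrowed from the endurance table. Treat the result as an illustration of the model, not a lifetime estimate for any device.

```python
def effective_endurance(E, q, f_ecc=1.0, g_overprov=1.0, h_reduction=1.0):
    """E* = E * q * f * g * h, the product form of the lifetime model.

    q: wear-leveling efficiency (must be < 1); f, g, h: hypothetical multipliers for
    error correction, over-provisioning, and data reduction (compression/dedup).
    """
    if not 0 < q < 1:
        raise ValueError("wear-leveling efficiency q must satisfy 0 < q < 1")
    return E * q * f_ecc * g_overprov * h_reduction

def system_lifetime_s(t_fill_s, e_star):
    """T_life(System) = T_fill * E*."""
    return t_fill_s * e_star

e_star = effective_endurance(1e5, q=0.8, g_overprov=1.25, h_reduction=2.0)
years = system_lifetime_s(4000, e_star) / (365.25 * 24 * 3600)
print(f"E* = {e_star:.3g}, lifetime = {years:.1f} years")
```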

### Write and/or read endurance and lifetime of SCM devices

- In DRAM and (magnetic) disks there is no known wear-out mechanism
- In flash and many SCM technologies there are known wear-out mechanisms
- Simple wear leveling → each write is done to a new (empty) location
  - Data unit is the smallest item that can be written/erased
  - Memory unit is the size of the largest item that can be wear-leveled

|             | DRAM       | Disk       | 256GB Flash             | 8 GB SCM |
|-------------|------------|------------|-------------------------|----------|
| Endurance   | $>10^{16}$ | $>10^{11}$ | $10^5 \rightarrow 10^4$ | $10^8$   |
| Wear-leveled | N         | N          | N                       | Y        |
| Memory unit | 1 B        | 512 B      | 128 KB                  | 256 GB   |
| Data unit   | 1 B        | 512 B      | 128 KB                  | 128 KB   |
| Fill time   | 100 ns     | 4 ms       | 2 ms                    | 4000 s   |
| Lifetime    | >31 yrs    | >12 yrs    | <4 min                  | >12 yrs  |
|             |            |            |                         | >190 yrs |
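
Applying $T_{\text{life}} = T_{\text{fill}} \cdot E$ to the three columns without wear leveling reproduces their lifetime entries: about 31.7 years for DRAM, 12.7 years for disk, and 200 s (about 3.3 minutes) for flash at $10^5$ cycles. The sketch below is just that arithmetic; the 365.25-day year is an assumption.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def lifetime_s(endurance, fill_time_s):
    """T_life = T_fill * E: time to use up the endurance by rewriting the unit back to back."""
    return endurance * fill_time_s

# (endurance E, fill time in seconds) for the table columns without wear leveling
columns = {"DRAM": (1e16, 100e-9), "Disk": (1e11, 4e-3), "Flash (SLC)": (1e5, 2e-3)}
lifetimes = {name: lifetime_s(e, t) for name, (e, t) in columns.items()}
for name, t in lifetimes.items():
    print(f"{name}: {t:.3g} s = {t / SECONDS_PER_YEAR:.3g} years = {t / 60:.3g} minutes")
```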

### Summary

- There are a number of solid-state memory technologies competing with DRAM and disk
  - Flash and PCM are the current leaders
- An inexpensive nonvolatile memory with medium speed (1-50 µs) will change the storage hierarchy
- An inexpensive memory with speed near DRAM will change the memory hierarchy
  - Such a memory that is also nonvolatile will enable new areas
- Write endurance is an issue for many of these technologies, but there are techniques to cope with it

## Questions?

## Break Time

# Performance

## IBM QuickSilver Project 2008 → SSD proof of concept

**Figure:** SAN-connected hosts → SAN → storage virtualization → multiple QuickSilver nodes; 4 x 4 Gbps FC ports per node

**SAN:** Storage Area Network  
**SVC:** SAN Volume Controller

## QuickSilver Headlines in the Press (August 2008)

- **Network World - IBM flash memory breaks 1 million IOPS barrier**
  - “Flash storage is starting to catch on with enterprise customers as such vendors as EMC promise faster speeds and more efficient use of storage with solid-state disks. Speeds are typically orders-of-magnitude lower than what IBM is claiming to have achieved.”
- **Information Week - IBM Plans Breakthrough Solid-State Storage System 'Quicksilver'**
  - “Compared to the fastest industry benchmarked disk system, the new technology had less than 1/20th the response time. In addition, the solid-state system took up 1/5th the floor space and required 55% of the power and cooling.”
- **Bloomberg - IBM Breaks Performance Records through Systems Innovation**
  - “IBM has demonstrated, for the first time, the game-changing impact solid-state technologies can have on how businesses and individuals manage and access information.”

## Understanding Flash-Based SSD Performance

- Flash media can do only three things: read, erase, and program
- I/O read → flash read; I/O write → flash erase and flash program
- Erase is very time-consuming (milliseconds)
- Major latency difference between I/O read (50 µs) and I/O write (100+ µs) operations
- Flash-based SSDs require storage virtualization to deal with undesirable flash properties: erase latency and wear-out (hence wear leveling)
- Typical storage virtualization techniques: relocate on write, batched write operations, and over-provisioning
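
Why relocate on write matters can be seen with rough arithmetic. The sketch borrows the representative timings from the NAND flash behavior slide (50 µs page read, the upper end of the 20-50 µs range; 0.2 ms program; 1.5 ms erase; 64 pages per block) and ignores bus transfer time. Rewriting one page in place means reading the whole block, erasing it, and reprogramming every page (about 17.5 ms); writing the page to an already-erased location costs one program (0.2 ms). The erase is deferred to background cleanup, not eliminated. The function names are illustrative.

```python
# Representative timings (seconds) from the NAND flash behavior slide
T_READ = 50e-6
T_PROG = 0.2e-3
T_ERASE = 1.5e-3
PAGES_PER_BLOCK = 64

def update_in_place():
    """Read-modify-write of one page: read the block, erase it, reprogram every page."""
    return PAGES_PER_BLOCK * T_READ + T_ERASE + PAGES_PER_BLOCK * T_PROG

def relocate_on_write():
    """Program the new page into an already-erased page; the old copy is erased later."""
    return T_PROG

print(f"in place: {update_in_place() * 1e3:.1f} ms, relocated: {relocate_on_write() * 1e3:.1f} ms")
```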

## Vendor A SSD – IOPS and Latency

**Optimal IOPS**

| R/W mix (%) | IOPS  | Latency (µs) | Queue depth |
|-------------|-------|--------------|-------------|
| 100/0       | 47810 | 165          | 8           |
| 0/100       | 11316 | 85.8         | 1           |
| 50/50       | 17089 | 113          | 2           |

**Sequential Precondition**

**Minimum Latency**

| R/W mix (%) | IOPS  | Latency (µs) | Queue depth |
|-------------|-------|--------------|-------------|
| 100/0       | 17221 | 56.9         | 1           |
| 0/100       | 11316 | 85.8         | 1           |
| 50/50       | 11776 | 83           | 1           |



## Vendor B SSD – IOPS and Latency

**Optimal IOPS**

| R/W mix (%) | IOPS  | Latency (µs) | Queue depth |
|-------------|-------|--------------|-------------|
| 100/0       | 27048 | 583          | 16          |
| 0/100       | 19095 | 209          | 4           |
| 50/50       | 12125 | 1300         | 16          |

**Sequential Precondition**

**Minimum Latency**

| R/W mix (%) | IOPS  | Latency (µs) | Queue depth |
|-------------|-------|--------------|-------------|
| 100/0       | 2583  | 386          | 1           |
| 0/100       | 10525 | 92.7         | 1           |
| 50/50       | 2567  | 388          | 1           |

**4K Latency vs Queue Depth**

## Understanding Flash-Based SSD Performance

- The latency model changes with the storage hardware and software architecture.
- Read ops: 2x+ compared with write ops

## This IOPS is not equal to that IOPS

- **Low latency -> High IOPS**
  - You work faster -> you do more work per unit time
- **Parallelism -> High IOPS**
  - More of you working -> more work is done per unit time
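
The two bullets are the two knobs in Little's law: with $N$ I/Os outstanding (the queue depth) and mean latency $W$, throughput is $N / W$. Lower latency raises IOPS at a fixed queue depth; more parallelism raises IOPS at a fixed latency. A quick check against the Vendor A and Vendor B optimal-IOPS rows suggests the measured IOPS fall within about 4% of this prediction. The helper name is illustrative, and the model assumes a steady state with a constant number of I/Os in flight.

```python
def littles_law_iops(queue_depth, latency_us):
    """Little's law, L = lambda * W, solved for throughput: IOPS = outstanding I/Os / latency."""
    return queue_depth / (latency_us * 1e-6)

# (vendor, R/W mix, measured IOPS, latency in us, queue depth) from the optimal-IOPS tables
rows = [
    ("A", "100/0", 47810, 165, 8),
    ("A", "0/100", 11316, 85.8, 1),
    ("A", "50/50", 17089, 113, 2),
    ("B", "100/0", 27048, 583, 16),
    ("B", "0/100", 19095, 209, 4),
    ("B", "50/50", 12125, 1300, 16),
]
errors = []
for vendor, mix, iops, lat_us, qd in rows:
    predicted = littles_law_iops(qd, lat_us)
    errors.append(predicted / iops - 1)
    print(f"Vendor {vendor} {mix}: measured {iops}, predicted {predicted:.0f} ({errors[-1]:+.1%})")
```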

## SSD Evaluation Service

- **Micro benchmarks**
  - Measure the performance of focused benchmarks, e.g.:
    - Average IOPS for block size = 4 KB, queue depth = 16, etc.
    - Metrics: latency → bandwidth and IOPS
    - Reports: hot spots, latency distribution, etc.
- **System benchmarks**
  - Measure the performance of storage system workloads, e.g., SPC-1
  - Metrics: sustained performance, etc.
- **Application benchmarks**
  - Measure the performance of application workloads: TPC-C, etc.
  - Metrics: \$/tpmC, etc.
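
A micro-benchmark report boils per-I/O latencies down to a few numbers. A minimal sketch, assuming the harness records one latency per completed 4 KB I/O plus the wall-clock duration of the run; the input format, `summarize`, and the sample numbers are hypothetical. It uses nearest-rank percentiles, one of several common definitions. Two slow outliers pull the mean (102.4 µs) well above the median (80 µs), which is why the latency distribution is reported rather than just the average.

```python
import math

def summarize(latencies_us, duration_s, block_bytes=4096):
    """Summarize one micro-benchmark run from per-I/O latencies."""
    ordered = sorted(latencies_us)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank: smallest sample with at least p% of samples at or below it
        rank = max(1, math.ceil(p * n / 100))
        return ordered[rank - 1]

    iops = n / duration_s
    return {
        "iops": iops,
        "bandwidth_MBps": iops * block_bytes / 1e6,  # bandwidth = IOPS x block size
        "mean_us": sum(ordered) / n,
        "p50_us": percentile(50),
        "p99_us": percentile(99),
    }

# Made-up run: 100 I/Os completed in 10 ms, two of them slow outliers
report = summarize([80] * 98 + [900, 1500], duration_s=0.01)
print(report)
```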

# Application

## System Topology Using SSD



- **Purpose-built storage system using SSDs**
  - Single purpose, well-understood workload, high performance
  - E.g., financial advanced trading desk (algorithmic trading desk)
  - Challenges: balancing performance, cost, reliability and availability

- **General-purpose storage system**
  - Quickest time-to-market approach
  - Focus on consumability
  - Mixes SSD and HDD types in a multi-tiered storage system
  - E.g., database workloads, batch processing
  - Challenges: finding the right balance between automated and policy-based data placement



## Demonstration of Data Placement Technology on IBM Enterprise Storage System

**60-70%+ Reduction in "SPC-1 Like" Average Response Time with Data Placement Technology**

| IOPS  | HDD only (ms) | HDD+SSD with Data Placement (ms) | % Avg. RT Reduction |
|-------|---------------|----------------------------------|---------------------|
| 0     | ~2.0          | ~1.5                             | -                   |
| 5000  | ~4.0          | ~1.8                             | -                   |
| 10000 | ~6.0          | ~2.0                             | -                   |
| 12000 | ~8.5          | ~2.2                             | ~75%                |

- **Setup:**
  - Single enterprise storage system with both HDD and SSD ranks; about 5-6% of capacity is in SSD ranks
- **Demonstration of data placement:**
  - Compare an "SPC-1 like" workload on HDD only versus with data placement across HDD and SSD
  - Data Placement Technology identifies "hot data" and non-disruptively migrates it from HDD to SSD; about 4% of data is migrated
- **Result:**
  - Response time reduction of 60-70%+ at peak load
    - Sustainability test: 76%
    - Ramp test: 77%
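
The slide does not describe how hot data is identified. One simple policy, shown purely as an illustrative sketch, ranks fixed-size extents by I/O count over a monitoring window and migrates the hottest ones that fit in the SSD ranks, skipping extents below a minimum I/O count. The extent granularity, threshold, `pick_hot_extents`, and the workload numbers are all hypothetical.

```python
def pick_hot_extents(io_counts, ssd_extents, min_count=1):
    """Choose extents to migrate: the hottest ones that fit in the SSD tier.

    io_counts: extent id -> I/O count in a monitoring window
    ssd_extents: how many extents the SSD ranks can hold
    min_count: skip extents too cold to be worth migrating
    """
    ranked = sorted(io_counts, key=io_counts.get, reverse=True)
    return {e for e in ranked[:ssd_extents] if io_counts[e] >= min_count}

# Made-up skewed workload: 100 extents, 4 of them absorb most of the I/O
counts = {e: (1000 if e < 4 else 10) for e in range(100)}
hot = pick_hot_extents(counts, ssd_extents=5, min_count=100)
share = sum(counts[e] for e in hot) / sum(counts.values())
print(sorted(hot), f"{share:.0%} of I/O now served from SSD")
```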

## Average Response Time Shows Significant Improvement with Data Placement and Migration Technology

| Elapsed time (minutes) | Avg RT (ms, approx.) |
|------------------------|----------------------|
| 1                      | ~14.5                |
| 68                     | ~10.5                |
| 135                    | ~8.5                 |
| 202                    | ~7.5                 |
| 269                    | ~6.5                 |
| 336                    | ~5.5                 |
| 403                    | ~4.5                 |
| 470                    | ~3.5                 |
| 537                    | ~3.0                 |
| 604                    | ~2.8                 |
| 671                    | ~2.5                 |
| 738                    | ~2.2                 |
| 805                    | ~2.0                 |
| 872                    | ~1.8                 |
| 939                    | ~1.6                 |
| 1006                   | ~1.5                 |
| 1073                   | ~1.4                 |
| 1140                   | ~1.3                 |
| 1207                   | ~1.2                 |
| 1274                   | ~1.1                 |
| 1341                   | ~1.0                 |
| 1408                   | ~0.9                 |

- Before migration: average RT 9.13 ms
- After 5 hours of migration: average RT 4 ms
- Maximum improvement: average RT down to 2 ms
- Migration begins after 1 hour
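
The improvement figures follow from $(RT_{\text{before}} - RT_{\text{after}}) / RT_{\text{before}}$. Applied to the averages above, 9.13 ms to 4 ms is about a 56% reduction and 9.13 ms to 2 ms about 78%. The approximate chart values at 12,000 IOPS in the data placement demonstration (8.5 ms to 2.2 ms) give about 74%, consistent with the ~75% shown there. The helper name is illustrative.

```python
def rt_reduction(before_ms, after_ms):
    """Fractional reduction in average response time."""
    return (before_ms - after_ms) / before_ms

cases = {
    "after 5 hours of migration": (9.13, 4.0),
    "maximum improvement": (9.13, 2.0),
    "12000 IOPS (approx. chart values)": (8.5, 2.2),
}
for label, (before, after) in cases.items():
    print(f"{label}: {rt_reduction(before, after):.0%}")
```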

## Paths Forward for SCM

- **Storage**
  - Direct disk replacement with NAND flash (SCM) packaged as an SSD
  - PCIe card that supports high-bandwidth local or direct attachment to a processor
  - Design the storage system or the computer system around flash or SCM from the start

- **Memory**
  - Possible positioning in the memory stack
  - Paging

## SCM impact on software (Present to Future)

- **Operating systems**
  - Extend the state information kept about memory pages
  - New mechanisms to manage the new resource
  - Enhanced to provide hints to other layers of software
  - Potential for direct involvement in managing caches and pools

- **Middleware and applications → evolutionary**
  - Performance improvement is immediate; full exploitation will occur gradually
  - Little near-term demand for non-volatility
  - Cost improvements will drive memory size
  - Memory size will drive larger and more complex data structures
  - Reload time after a crash will be exacerbated
  - Users' need for non-volatility, persistence, etc. will be driven by these effects: a blurring of memory and storage

## Issues with persistent memory

- **Shared state maintenance**
  - Storage is difficult to corrupt: a write operation must be explicitly set up
  - Directly mapped storage is easily corrupted
  - Corrupted state is persistent
- **Memory pool management**
  - Complex management task
  - Fixed or “virtually fixed” allocation
  - Addressability
- **SCM media failure**
  - Bad blocks and wear-out
  - Complex recovery scenario in the typical memory management model

## Implications on Traditional Commercial Databases

- **Initial SCM in DB uses:**
  - Logging (for Durability)
  - Buffer pool
- **Long term, deep Impact: Random access replaces paging**
  - DB performance depends heavily on good guesses about what to page in
  - Random access eliminates column/row access trade-offs
  - Reduces energy consumption (big effect)
- **Existing trend is to replace ‘update in place’ with ‘appends’**
  - That's good: it helps with the write-endurance issue
- **Reduce variability of data mining response times**
  - From hours and days (today) to seconds (SCM)

## PCM as Logging Store: Permits More Log Forces/sec?

- An obvious use, but options exist even for this one!
- Should log records be written directly to PCM, or
  - first to DRAM log buffers and then forced to PCM (rather than disk)?
- In the latter case, is it really that beneficial if you ultimately still want the log on disk? PCM capacity won't match disk capacity, and disk is more reliable and a better long-term storage medium
- In the former case, all writes will be slowed down considerably!

## PCM replaces DRAM? - Buffer pool in PCM?

- Buffer pool (BP) access in PCM will be slower than DRAM BP access!
- Writes will suffer even more than reads!
- Should we instead have DRAM BPs backed by PCM BPs?
  - This is similar to DB2 for z/OS in a Parallel Sysplex environment, with BPs in the coupling facility (CF)
  - But the DB2 situation has well-defined rules on when pages move from the DRAM BP to the CF BP
- A variation was used in the SafeRAM work at MCC in 1989
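
One way to picture "DRAM BPs backed by PCM BPs" is a two-level LRU cache in which pages evicted from DRAM are demoted to PCM and promoted back on reuse. This is a hypothetical sketch, not the DB2 coupling-facility protocol: the class, the capacities, and the exclusive-caching policy are assumptions, and pages are treated as clean, so dirty-page write-back (where PCM's slower writes would show up) is left out.

```python
from collections import OrderedDict

class TwoTierBufferPool:
    """Hypothetical DRAM buffer pool backed by a larger PCM buffer pool (LRU in both)."""

    def __init__(self, dram_pages, pcm_pages, read_page):
        self.dram, self.pcm = OrderedDict(), OrderedDict()  # page id -> page, LRU first
        self.dram_pages, self.pcm_pages = dram_pages, pcm_pages
        self.read_page = read_page                          # fetch from disk on a miss
        self.hits = {"dram": 0, "pcm": 0, "disk": 0}

    def get(self, pid):
        if pid in self.dram:
            self.dram.move_to_end(pid)                      # refresh LRU position
            self.hits["dram"] += 1
            return self.dram[pid]
        if pid in self.pcm:
            page = self.pcm.pop(pid)                        # promote PCM -> DRAM
            self.hits["pcm"] += 1
        else:
            page = self.read_page(pid)
            self.hits["disk"] += 1
        self.dram[pid] = page
        if len(self.dram) > self.dram_pages:
            victim, vpage = self.dram.popitem(last=False)   # demote DRAM LRU page
            self.pcm[victim] = vpage
            if len(self.pcm) > self.pcm_pages:
                self.pcm.popitem(last=False)                # drop PCM LRU page (clean)
        return page

bp = TwoTierBufferPool(dram_pages=2, pcm_pages=4, read_page=lambda pid: f"page-{pid}")
for pid in [1, 2, 3, 1, 4, 5, 2, 2]:
    bp.get(pid)
print(bp.hits)  # {'dram': 1, 'pcm': 2, 'disk': 5}
```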

## Assume whole DB fits in PCM?

- Apply old main-memory DB design concepts directly?
- Shouldn't we leverage persistence specifically?
- Every bit change persisting isn't always a good thing!
- Today's failure semantics allow a fair amount of flexibility in tracking changes to DB pages: only some changes are logged, and inconsistent page states are not made persistent!
- Memory overwrites will cause more damage!
- If every write is assumed to be persistent as soon as it completes, then L1 and L2 caching can't be leveraged: writes must be write-through, further degrading performance.

## Assume whole DB fits in PCM? ...

- Even if the whole DB fits in PCM, and even though PCM is persistent, the DB still needs to be externalized regularly since PCM won't have good endurance!
- If the DB spans both DRAM and PCM, then
  - logic is needed to decide what goes where: a hot/cold data distinction?
  - persistence isn't uniform, so bookkeeping must be careful

## Data Availability and PCM



- **What about the data availability model with PCM?**
  - Reliability, recoverability and availability

- **If PCM is used as a permanent and persistent medium for data, what is the right reliability model? Is memory failure detection and recovery sufficient?**

- **If PCM is used as memory and its persistence is exploited, then such a memory should be dual-ported (as disks are) so that a backup can access its contents even if the host fails**

- **Should locks also be maintained in PCM to speed up new transaction processing when the host recovers?**

## What about Logging?

- If PCM is persistent and the whole DB is in PCM, do we need logging?

- Of course: logging is still needed to provide at least partial rollback, even if data is versioned (we at least need to track which versions to invalidate or eliminate); it is also needed for auditing, disaster recovery, ...

## Start from Scratch?

- **Maybe it is time for a fundamental rethink**
- **Design a DBMS from scratch keeping in mind the characteristics of PCM**
- **Reexamine data model, access methods, query optimizer, locking, logging, recovery, ...**

## Summary

- **SCM in the form of flash and PCM is here today and real. Others will follow.**
- **SCM will have a significant impact on the design of current and future systems and applications**

