

# A survey of techniques for architecting SLC/MLC/TLC hybrid Flash memory-based SSDs

Ahmed Izzat Alsalibi<sup>1</sup>  | Sparsh Mittal<sup>2</sup>  | Mohammed Azmi Al-Betar<sup>3</sup> | Putra Bin Sumari<sup>4</sup>

<sup>1</sup>Israa University, Gaza, Palestine

<sup>2</sup>Indian Institute of Technology, Hyderabad, India

<sup>3</sup>Al-Husn University College, Al-Balqa Applied University, Irbid, Jordan

<sup>4</sup>Universiti Sains Malaysia, Penang, Malaysia

## Correspondence

Ahmed Izzat Alsalibi, Information Systems College, Israa University, Gaza, Palestine.  
Email: ahmed.salibi@gmail.com

Sparsh Mittal, Indian Institute of Technology, Hyderabad, India.  
Email: sparsh0mittal@gmail.com

## Funding information

Science and Engineering Research Board (SERB), India, Grant/Award Number: ECR/2017/000622

## Summary

Flash memory-based solid-state drives (SSDs) offer several attractive features and benefits compared to hard disk drives (HDDs), such as shock resistance and better performance, especially for random data access. Depending on the number of bits in each cell, Flash memory can be designed as single/multi/triple level cell (SLC/MLC/TLC), which have different performance, density, cost, and write endurance characteristics. To bring the best of these together, several researchers have proposed designing SSDs using hybrid SLC/MLC/TLC Flash memory. However, these SSDs also present several challenges, such as buffer management, placement of hot/cold data in a suitable portion, and intelligent garbage collection. Several recent techniques aim to address these challenges. In this paper, we present a survey of techniques for managing SSDs designed with SLC/MLC/TLC Flash memory. We classify the works on several axes to bring out their similarities and differences. We aim to synthesize the state-of-the-art progress in hybrid SSD management and also spark further research in this area.

## KEYWORDS

flash translation layer, garbage collection, hybrid solid state disk, NAND flash memory, wear leveling

## 1 | INTRODUCTION

As the amount of digital data continues to grow at an exponential rate and key applications become more data intensive, efficient storage architectures and management techniques have become more important than ever before. Conventionally, the hard disk drive (HDD)\* has been used as the storage device; however, its limitations, such as poor performance (especially for random accesses), larger form factor, and vulnerability to shocks and magnetic fields, have encouraged researchers to explore alternatives.

Flash memory is a promising technology for designing storage devices due to its several attractive properties, eg, high performance and density, low power consumption, noise-free operation, and immunity to shocks and magnetic fields.<sup>2–6</sup> Also, its cost has been decreasing in recent years and it is expected to provide even better cost efficiency than HDD in the near future. Based on the number of bits stored in each cell, Flash can be characterized as SLC (1 bit), MLC<sup>†</sup> (2 bits), and TLC (3 bits). As shown in Table 1, these cell types provide a spectrum of properties and trade-offs. Specifically, on going from SLC to MLC to TLC, the performance and write endurance decrease, whereas density and cost efficiency improve.

To achieve the best of these three cell types, hybrid SSD designs have been proposed, which use Flash memories of multiple cell types to improve performance, energy efficiency, and reliability.<sup>16</sup> However, these hybrid SSD designs also bring challenges, such as selection of relative proportion of SLC/MLC/TLC, efficient mapping/moving of hot/cold data to them, accounting for their disparate write endurance and density values, and

\* We use the following acronyms frequently in this paper: error-correcting code (ECC), flash translation layer (FTL), garbage collection (GC), global positioning system (GPS), least/most recently used (LRU/MRU), least/most significant bit (LSB/MSB), logical page address/number (LPA/LPN), logical sector number (LSN), logical/physical superblock address (LSBA/PSBA), nonvolatile RAM (NVRAM), phase-change memory (PCM), physical page address/number (PPA/PPN), resistive random access memory (ReRAM), service-level objective (SLO), spin transfer torque RAM (STT-RAM), storage-class memory (SCM), universal serial bus (USB). We use both SCM and NVRAM to refer to byte-addressable nonvolatile memories, viz, PCM, domain wall memory, ReRAM and STT-RAM.<sup>1</sup>

<sup>†</sup>Note that the term MLC is also sometimes used to refer to Flash cell with multiple (two or more) levels per cell, which includes TLC<sup>7–9</sup>; however, in this paper, we use MLC to refer to Flash with 2 bits per cell only.

A. I. Alsalibi and S. Mittal contributed equally to this work.

**TABLE 1** Comparison of SLC/MLC/TLC Flash memories.<sup>10-15</sup> (out-of-band size is not shown for clarity)

| Features              | SLC                                                    | MLC                                                           | TLC                                  |
|-----------------------|--------------------------------------------------------|---------------------------------------------------------------|--------------------------------------|
| Bits/cell             | 1                                                      | 2                                                             | 3                                    |
| Cost                  | High                                                   | Medium                                                        | Low                                  |
| Page size, bytes      | 2048                                                   | 16 384                                                        | 16 384                               |
| Pages per block       | 64                                                     | 1024                                                          | 1536                                 |
| Block size, KB        | 128                                                    | 16 384                                                        | 24 576                               |
| Page read, μs         | 25                                                     | 50                                                            | 75                                   |
| Program time, μs      | 200-300                                                | 600-900                                                       | 900-1350                             |
| Block erase, ms       | 1.5-2                                                  | 3                                                             | 4.5-10                               |
| P/E cycles            | 100 000                                                | 3000-10 000                                                   | 300-3000                             |
| Application           | USB, SSD, digital camera, mobile handset, & networking | USB, SSD, media player, digital camera, mobile handset, & GPS | USB, SSD, media player, & mobile GPS |

**FIGURE 1** Paper organization

ensuring efficient GC. Clearly, management of hybrid SSDs brings challenges of its own and hence traditional techniques for managing homogeneous (eg, MLC only) SSD may not work well for hybrid SSDs. Recently, several techniques have been proposed to address these challenges.

**Contributions:** In this paper, we present a survey of techniques for designing and managing hybrid SSD devices. Figure 1 shows the overall organization of this paper. We first present a brief background on Flash memory architecture, operations and SSD management approaches (Section 2). We then provide classification of research works from multiple perspectives to offer insights (Section 3). Then, we review hybrid SSD management techniques in terms of their partitioning techniques (Section 4), buffer design (Section 5) and optimization objective and solution schemes (Section 6). In these sections, we discuss each research work in one group only, although many of the works fall under multiple groups. We conclude this paper with a mention of future research directions in this field (Section 7).

**Scope:** For the sake of a concise presentation, we limit the scope of this paper as follows. We focus on software-level management techniques for hybrid SSDs and not their circuit-level design issues. We include techniques that use at least two types of Flash and exclude those that merely combine an SCM with a single Flash cell type. We focus on the key ideas of each work and include only selected quantitative results, since different works use disparate evaluation platforms and workloads. We hope that this paper will be useful for computer architects, SSD designers, and researchers in the area of storage architectures.

## 2 | BACKGROUND

### 2.1 | Flash memory architecture and operations

We now briefly review the organization of SSD and Flash memory and refer the reader to prior works<sup>17,18</sup> for more details.

**Flash SSD architecture:** An SSD has multiple packages, each consisting of one or more chip dies. Each die consists of multiple planes, each of which has several blocks. Further, each block has multiple pages, eg, the size of a page and a block may be 4 and 256 KB, respectively.<sup>19</sup> Each page is logically divided into a large “user area,” which stores the user data, and a small “out-of-band” area, which stores mapping information, metadata (eg, erase counter and page state), and ECC.<sup>20,21</sup>

**Flash memory operations:** Flash memory allows program (write), read, and erase operations, which are managed by the FTL.<sup>22</sup> Reads/writes happen at page granularity whereas erase happens at block granularity. A write operation can only change the stored bits from 1 to 0. Hence, the only way to change a bit in a page from 0 to 1 is to erase the block containing the page, which sets all bits in the block to 1. The pages in Flash are classified as free (available for storing new data), invalid (storing dead data), and valid.
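The asymmetry between program and erase can be illustrated with a minimal Python sketch. The `Block` class, its sizes, and the byte-wise AND model are illustrative assumptions and do not describe any particular Flash chip or interface.

```python
ERASED = 0xFF  # an erased byte has every bit set to 1


class Block:
    """Toy Flash block (hypothetical model): program per page, erase per block."""

    def __init__(self, pages: int = 4, page_bytes: int = 2):
        self.pages = [bytearray([ERASED] * page_bytes) for _ in range(pages)]

    def program(self, page: int, data: bytes) -> None:
        # Programming can only clear bits (1 -> 0), modeled here as bitwise AND.
        cells = self.pages[page]
        for i, byte in enumerate(data):
            cells[i] &= byte

    def erase(self) -> None:
        # Erase acts on the whole block and resets every bit to 1.
        for cells in self.pages:
            cells[:] = bytes([ERASED] * len(cells))


blk = Block()
blk.program(0, bytes([0b11110000, 0xFF]))
blk.program(0, bytes([0b00111111, 0xFF]))  # cannot set cleared bits back to 1
print(f"{blk.pages[0][0]:08b}")  # 00110000
blk.erase()
print(f"{blk.pages[0][0]:08b}")  # 11111111
```

The second `program` call shows why an in-place update is not possible in general: bits already cleared stay at 0 until the whole block is erased.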

**Garbage collection and wear leveling:** Since erase operations are much slower than write operations, the FTL seeks to hide erase latency by performing “out-of-place” writes, whereby a new write is performed to a free page and the previous location of the page is invalidated.<sup>2</sup> To release invalid pages, the FTL periodically performs garbage collection, whereby the valid pages of a block are copied elsewhere, the block is erased, and all its pages are marked as free. By performing erase operations in the background, the FTL hides the erase operations and exposes only read/write operations to the user. Since Flash write endurance is small, the wear-leveling module seeks to distribute the number of program/erase cycles evenly among all the blocks to increase overall device lifetime.<sup>23</sup>

**Address translation:** The address translation software records the mapping between logical addresses in the file system and physical addresses in Flash memory. Since writes to Flash are performed “out-of-place,” the address mapping between file system and Flash memory is continuously updated.
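The interplay of out-of-place writes, address translation, garbage collection, and wear leveling can be sketched in a few lines of Python. This is a toy model under stated assumptions, not an actual FTL: the class `ToyFTL`, the greedy victim choice (most invalid pages), and the "least-erased block first" allocation rule are all hypothetical simplifications chosen for illustration.

```python
FREE, VALID, INVALID = "free", "valid", "invalid"


class ToyFTL:
    """Toy page-mapped FTL (illustrative only)."""

    def __init__(self, num_blocks=4, pages_per_block=4):
        self.state = [[FREE] * pages_per_block for _ in range(num_blocks)]
        self.owner = [[None] * pages_per_block for _ in range(num_blocks)]
        self.erase_count = [0] * num_blocks
        self.l2p = {}  # logical page number -> (block, page)

    def _free_page(self, skip=None):
        # Assumed wear-leveling heuristic: try the least-erased blocks first.
        order = sorted(range(len(self.state)), key=lambda b: self.erase_count[b])
        for b in order:
            if b == skip:
                continue
            for p, s in enumerate(self.state[b]):
                if s == FREE:
                    return b, p
        raise RuntimeError("no free page; run garbage_collect() first")

    def write(self, lpn, skip=None):
        if lpn in self.l2p:  # out-of-place update: invalidate the old copy
            old_b, old_p = self.l2p[lpn]
            self.state[old_b][old_p] = INVALID
        b, p = self._free_page(skip)
        self.state[b][p], self.owner[b][p] = VALID, lpn
        self.l2p[lpn] = (b, p)  # the mapping is updated on every write

    def garbage_collect(self):
        # Assumed greedy policy: pick the block with the most invalid pages.
        victim = max(range(len(self.state)),
                     key=lambda b: self.state[b].count(INVALID))
        for p, s in enumerate(self.state[victim]):
            if s == VALID:  # copy live data elsewhere before erasing
                self.write(self.owner[victim][p], skip=victim)
        n = len(self.state[victim])
        self.state[victim] = [FREE] * n
        self.owner[victim] = [None] * n
        self.erase_count[victim] += 1
        return victim


ftl = ToyFTL(num_blocks=3, pages_per_block=2)
for lpn in (0, 1, 0):  # the second write to LPN 0 goes out of place
    ftl.write(lpn)
victim = ftl.garbage_collect()  # copies the live page of LPN 1, then erases
print(victim, ftl.l2p, ftl.erase_count)
```

A real FTL would also persist the mapping, program pages of a block in order, and bound garbage collection latency; those aspects are deliberately left out here.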

### 2.2 | Hybrid SSD design and management techniques

#### 2.2.1 | Hybrid SSD design

A hybrid SSD (with hard partitioning) is designed by connecting SLC/MLC/TLC chips over different channels. Figure 2 shows a generic hybrid SSD design.

#### 2.2.2 | Hybrid SSD partitioning techniques

Hybrid SSDs may use either hard or soft partitioning. We discuss them with example of an SLC/MLC hybrid SSD.

**Hard partitioning:** In hard partitioning, SLC and MLC chips are physically separated. A particular chip continues to work as SLC or MLC during the entire execution time, and thus, there is no mode switching or change in their capacity. For example, in Figure 3A, 3 SLC and 9 MLC chips are used. Assuming that the SLC and MLC portions are used as the buffer and data, respectively, the buffer write traffic (eg, random writes) is mapped entirely to the SLC chips and the data write traffic (eg, sequential writes) is mapped entirely to the MLC chips. However, due to this, the SLC chips may reach the end of their lifetime very soon (as shown by the dotted red line) even though the MLC chips can sustain more writes.

**Soft partitioning:** Soft partitioning works on the observation that for writing MLC, first the LSB and then the MSB needs to be programmed, and writing the MSB takes 4 to 5 times longer than writing the LSB. However, if only the LSB of MLC is written, the write performance becomes close to that of SLC at the cost of lower capacity.<sup>25</sup> Thus, MLC blocks can be selectively written as SLC blocks, which keeps performance close to that of SLC.<sup>26</sup> For example, in Figure 3B, 12 MLC chips are used. At any time, some of these chips may be programmed in SLC mode, and thus, they exhibit SLC properties. At different times, different chips may be used as SLC, but the total number of SLC chips at any time is fixed. The buffer write traffic is now more uniformly spread, and thus, the device lifetime is increased due to wear leveling. The SLC mode switching is also termed dynamic write acceleration.<sup>27</sup>
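The capacity and latency trade-off of soft partitioning can be made concrete with a short Python sketch. It assumes 2-bit MLC, so a block written in SLC mode (LSB pages only) holds half of its MLC capacity, and it assumes a fully written MLC block has equally many LSB and MSB pages. The function names are hypothetical, and the default block size is the MLC value from Table 1.

```python
def soft_partition_capacity_kb(total_blocks, slc_mode_blocks, mlc_block_kb=16384):
    """Return (slc_kb, mlc_kb, total_kb) for an all-MLC device in which
    slc_mode_blocks blocks are written in SLC mode (half capacity each)."""
    if not 0 <= slc_mode_blocks <= total_blocks:
        raise ValueError("slc_mode_blocks must be between 0 and total_blocks")
    slc_kb = slc_mode_blocks * mlc_block_kb // 2
    mlc_kb = (total_blocks - slc_mode_blocks) * mlc_block_kb
    return slc_kb, mlc_kb, slc_kb + mlc_kb


def mean_mlc_program_cost(msb_to_lsb_ratio):
    """Mean page-program time of a fully written MLC block, in units of one
    LSB write, assuming equal numbers of LSB and MSB pages."""
    return (1 + msb_to_lsb_ratio) / 2


print(soft_partition_capacity_kb(12, 0))  # all blocks in MLC mode
print(soft_partition_capacity_kb(12, 3))  # 3 of 12 blocks in SLC mode
print(mean_mlc_program_cost(4), mean_mlc_program_cost(5))
```

Under these assumptions, an SLC-mode page costs one LSB write, whereas the average MLC-mode page costs 2.5 to 3 LSB writes for the 4x to 5x ratio quoted above, at the price of the capacity lost in the SLC-mode blocks.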

Table 2 compares the properties of these techniques. It is clear that soft partitioning scheme provides higher flexibility and device lifetime. Due to this, soft partitioning based techniques are increasingly being used in real systems<sup>27,28</sup> compared to the hard partitioning techniques.

**Buffer design techniques:** Flash memory is generally divided into a small log-buffer portion and a large data portion. Buffer portion handles hot/random writes, which are written back to data portion at large granularity to reduce the number of write operations. Since hot writes constitute



**FIGURE 2** A generic hybrid SSD architecture. C1/C2/C3 refers to channels 1, 2, and 3, respectively.<sup>18,24</sup>



**FIGURE 3** A, Hard partitioning (SLC and MLC chips are physically separated; mapping of random writes to SLC can lead to its early wear-out). B, Soft partitioning (only MLC blocks are used, some of which can be selectively programmed as SLC to improve performance and achieve wear leveling).<sup>25</sup>

**TABLE 2** A comparative evaluation of hard and soft partitioning techniques

| Technique | Advantages (+) and disadvantages (-) |
|-----------|--------------------------------------|
| Hard | (+) Improves performance and lifetime. Allows separate addressing and management of SLC and MLC chips.<br>(-) Despite the higher endurance of SLC, using it as a buffer may lead to its early wear-out, degrading device lifetime. To avoid this, the SLC partition ratio needs to be kept high (eg, ~10%-30%), which increases SSD cost and requires maintaining a huge address translation table. |
| Soft | (+) Allows higher buffer bandwidth, since multiple channels and even multiple interleaved chips can be concurrently accessed to write to the buffer. Allows a flexible tradeoff between buffer capacity, write performance, and device lifetime (since SLC writes are less damaging to the Flash than MLC writes). Faster writes allow the device to spend more time in a low-power state. Allows handling evictions from the buffer on-chip without requiring off-chip migration.<br>(-) Due to the different capacity of SLC and MLC, a mode switch complicates the addressing mechanism. Converting a large fraction of blocks to SLC reduces device capacity drastically, and hence, after a point, the device has to switch back to MLC, which reduces speed and bandwidth. Also, an MLC-turned-SLC region may not be as optimized as an SLC chip. Further, program/erase operations of different modes complicate the design and operation of the program-erase controller. |

a large fraction of writes, most writes are directed to buffer partition and hence, intelligent choice of memory technology for designing the buffer is important in hybrid SSDs. Table 3 summarizes the advantages/disadvantages of different buffer design approaches.

**Mapping techniques:** The hybrid SSD management techniques can also be classified based on the granularity of mapping. *Page-level mapping* allows direct mapping of any logical page to any physical page in the Flash memory. *Block-level mapping* approach stores the logical to physical address translation information at the granularity of each block.<sup>18,32</sup> Further, multiple *hybrid-level mapping* schemes have been proposed, which use some combination of page and block level mapping schemes. For example, when Flash memory is divided into a large data portion and a small log-buffer portion,<sup>8,33,34</sup> hybrid-level mapping may use block-level mapping for data portion and page-level mapping for log-buffer portion. Random/hot writes to log buffer benefit from the fine-grained page-level mapping and sequential/cold writes benefit from the coarse-grain block-level mapping. Finally, some techniques use *page-level and block-level mapping* schemes for SLC and MLC portions, respectively. In case where SLC and MLC are used as log buffer and data partitions, respectively, this mapping becomes same as the hybrid-level mapping described above. Table 4 compares the properties

**TABLE 3** A comparative evaluation of different buffer design approaches

| Buffer type | Advantages (+) and disadvantages (-) |
|-------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| SLC         | (+) Improves performance and lifetime<br>(-) Increases SSD cost and area due to low density of SLC, can degrade SLC lifetime (refer to Figure 3). Once SLC wears, future writes are redirected to MLC, which harms performance                                                                                                                                                                                                                                                                                                                                                                               |
| MLC         | (+) improves density and lowers cost<br>(-) lowers lifetime due to small write endurance of MLC                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| DRAM or SCM | (+) improves performance and lifetime; bridges speed gap between memory and storage <sup>29</sup> ; allows performing reads/writes in near-constant time regardless of the location of data inside memory. SCMs do not require erase operations and have much higher endurance than Flash memory. <sup>30,31</sup> The latency of SCMs is in the range of DRAM latency, whereas their leakage power consumption is near-zero (since only peripheral circuit consumes leakage power).<br>(-) increases cost; no crash consistency on using DRAM; SCMs are not very mature and not available in large capacity |

**TABLE 4** A comparative evaluation of different mapping schemes

| Mapping scheme        | Advantages (+) and disadvantages (-) |
|-----------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Page level            | (+) This provides high performance and lifetime and also facilitates flexible separation of hot and cold data. <sup>35,36</sup><br>(-) requires large address translation table. Since this table is stored in SRAM, which has poor density, the overall area overhead is increased. GC needs to be performed in background, which leads to variable write latencies. |
| Block level           | (+) reduces the size of translation table<br>(-) does not allow fine-grain separation of hot/cold data. <sup>37</sup><br>Also, due to the requirement of maintaining equal offset of logical and physical block, on a write to a page in the block, all valid pages of this block need to be copied to a free block.                                                  |
| Hybrid level          | (+) mitigates “erase-before-write” issue with lower memory requirement than the page-mapping table <sup>8,38</sup><br>(-) requires maintaining both block and page-level mapping tables, leading to higher storage overhead. Block-merge operation leads to multiple erase and write operations.                                                                      |
| Page and block levels | Its advantages/disadvantages are similar to those of page and block-level schemes discussed above.                                                                                                                                                                                                                                                                    |

of these techniques. Note that the page-level mapping is the most widely used mapping strategy in the industry, since in practice, other strategies provide small improvement and/or incur large overheads.
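The difference in translation-table size between the mapping schemes can be estimated with a small Python sketch. It counts mapping entries only; the function name, the example capacity of 64 GiB, and the log-buffer fraction of 1/32 are illustrative assumptions, while the page size and pages per block are the MLC values from Table 1.

```python
def mapping_entries(capacity_bytes, page_bytes, pages_per_block, log_fraction=0.0):
    """Count mapping-table entries for page-, block-, and hybrid-level mapping.

    In the hybrid case, a fraction log_fraction of the blocks forms the log
    buffer (page-level mapped) and the rest is data (block-level mapped).
    """
    pages = capacity_bytes // page_bytes
    blocks = pages // pages_per_block
    log_blocks = int(blocks * log_fraction)
    return {
        "page_level": pages,
        "block_level": blocks,
        "hybrid_level": (blocks - log_blocks) + log_blocks * pages_per_block,
    }


entries = mapping_entries(
    capacity_bytes=64 * 2**30,  # assumed 64 GiB device
    page_bytes=16384,           # MLC page size from Table 1
    pages_per_block=1024,       # MLC pages per block from Table 1
    log_fraction=1 / 32,        # assumed log-buffer share
)
for scheme, count in entries.items():
    print(f"{scheme}: {count} entries")
```

For this configuration, page-level mapping needs about 4.2 million entries, block-level mapping needs 4096, and the hybrid scheme falls in between, which matches the qualitative trade-offs listed in Table 4.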

## 3 | CLASSIFICATION AND OVERVIEW

In this section, we classify the research works based on several parameters.

### 3.1 | Based on hybrid SSD and buffer design

Table 5 classifies the works based on the Flash cells used in hybrid SSD design and memory/cell used for designing the buffer. Note that the techniques proposed for SLC + MLC hybrid SSD may also be applicable to SLC + TLC SSD; however, in Table 5, we mention a technique in the category for which it was originally evaluated. While most techniques use SLC or MLC as buffer, some techniques use other memories such as SCM or DRAM as a buffer.

### 3.2 | Based on optimization goal and evaluation platform

Table 6 classifies the works based on optimization metric and it is clear that the techniques proposed are guided by multiple optimization goals, which need to be carefully balanced.

**TABLE 5** Classification based on hybrid SSD architecture and mapping technique

| Category                | References                                      |
|-------------------------|-------------------------------------------------|
| Hybrid SSD Architecture |                                                 |
| SLC + TLC               | 10,39                                           |
| MLC + TLC               | 24,40                                           |
| TLC + (SLC or MLC)      | 36                                              |
| SLC + MLC               | Nearly all other works discussed in this survey |
| Mapping Technique       |                                                 |
| Page level              | 35,36                                           |
| Block level             | 18,32                                           |
| Page and block levels   | 10,38,41-44                                     |
| Hybrid level            | 7,8,25,33,34                                    |

**TABLE 6** Classification based on optimization metric and evaluation platform

| Category                       | References                   |
|--------------------------------|------------------------------|
| Optimization Metric            |                              |
| Performance                    | 9,10,15,18,24,34-36,38-48    |
| Energy consumption             | 7,18,40,42,47                |
| Lifetime                       | 7-10,25,32-36,41-43,45,48-50 |
| Evaluation Platform            |                              |
| Both simulator and real system | 25,48                        |
| Real system/prototype          | 44                           |
| Simulator                      | nearly all others            |

The choice of the platform for evaluating hybrid SSD techniques is crucially important. While simulators provide high flexibility to test even those designs that may be currently infeasible to implement, they may be too slow to allow full exploration of the design space. By comparison, real systems allow more accurate evaluation, and due to their fast speed, they allow executing a larger number of instructions. Clearly, both these platforms are vital for evaluating the proposed techniques and provide complementary insights. For this reason, Table 6 also classifies the works based on their evaluation platform.

### 3.3 | Key ideas of different techniques

We now discuss some key ideas that are used in the techniques discussed in this survey.

1. Most works classify the data or write operations as hot or cold depending on the length of write requests or their random/sequential nature<sup>8,10,15,18,24,34,35,38,41,45,47,51</sup> or on access frequency,<sup>8,45</sup> and store them in different memory cells accordingly. Also, pages belonging to metadata can be directly marked as hot.<sup>15</sup> Other works classify the data as user data or ECC data<sup>32</sup> or as write or read intensive.<sup>39,46</sup> Also, data may be stored in SLC or MLC based on performance/deadline<sup>9</sup> and instantaneous wear<sup>18</sup> considerations.
2. Several works discuss strategies to migrate data from SLC to MLC,<sup>24,27,35,38,43,45,47,48</sup> SLC to TLC,<sup>28</sup> MLC to TLC<sup>24,40</sup> or PCM to Flash<sup>15,41,42</sup> to utilize the space efficiently and improve performance.
3. Performing data migration<sup>15,27,28,36,45</sup> or GC<sup>9</sup> at idle time helps in mitigating its latency. In case of bursty traffic or low free space, data migration<sup>35</sup> or GC<sup>9,44,48</sup> may have to be performed on-demand.
4. The data item not accessed for a period is a candidate for migration.<sup>35,36,40,43</sup>
5. Based on the number of valid pages in a block, the migration candidates may be selected<sup>38</sup> or the decision about performing migration within MLC or from MLC to TLC<sup>24</sup> may be made.
6. To improve lifetime, some techniques seek to regulate amount of migrated data from SLC to MLC<sup>43,45</sup> or postpone flushing of data by giving extra chances to cold data to stay in SLC.<sup>18,35,38</sup>
7. The techniques using SCMs or DRAM as a buffer may flush the data from the buffer to SLC or MLC based on the number of pages to be flushed.<sup>42,51</sup>
8. Some works discuss strategies to deal with block merge operations,<sup>8,34,38</sup> whereas other works avoid merge operations by using intelligent mapping techniques.<sup>35</sup> Also, some works merge the data in DRAM buffer before flushing it to Flash memory.<sup>41,42</sup>
9. Some works use different mapping schemes in SLC and MLC<sup>8,18</sup> or for hot and cold data.<sup>10</sup>
10. Some techniques schedule different operations on different channels<sup>32,34,51</sup> to boost performance.
11. Some works leverage partial programmability of SLC to reduce write overhead.<sup>32,50</sup>
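The hot/cold classification in the first idea above can be sketched as follows. This is a generic illustration, not the classifier of any cited work: the class name and both thresholds (`small_write_pages`, `hot_access_count`) are hypothetical, and real techniques tune or learn such parameters.

```python
from collections import Counter


class HotColdClassifier:
    """Label a write as hot or cold from its length and access frequency."""

    def __init__(self, small_write_pages=4, hot_access_count=3):
        self.small_write_pages = small_write_pages  # assumed size threshold
        self.hot_access_count = hot_access_count    # assumed frequency threshold
        self.writes = Counter()                     # per-LPN write counts

    def classify(self, lpn, num_pages, is_metadata=False):
        self.writes[lpn] += 1
        if is_metadata:  # metadata pages are marked hot directly
            return "hot"
        if num_pages <= self.small_write_pages:  # short (random-like) write
            return "hot"
        if self.writes[lpn] >= self.hot_access_count:  # frequently written
            return "hot"
        return "cold"


clf = HotColdClassifier()
print(clf.classify(lpn=5, num_pages=1))     # short write
print(clf.classify(lpn=100, num_pages=64))  # long, first-time write
```

In an SLC/MLC hybrid, writes labeled hot would typically be directed to the SLC portion and cold writes to the MLC portion.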

## 4 | HYBRID SSD PARTITIONING TECHNIQUES

We now discuss the techniques in terms of their partitioning approach, viz, hard (Section 4.1) and soft (Section 4.2) partitioning. Table 7 classifies the works based on their partitioning techniques.

### 4.1 | Use of hard partitioning

Nam et al<sup>8</sup> propose a technique that uses SLC as the log buffer to serve frequently updated data (refer to Figure 4). Sequential and random write operations are served by MLC and SLC, respectively. If the sequential write count is more than a quarter of an MLC block, the data in the old data blocks and the incoming data are merged into a new data block. During a block merge operation, valid pages in the victim log block are merged with the associated data blocks, and the up-to-date data pages are written to new data blocks. To perform an overwrite operation in the log area, the old data are marked as invalid and the new data are written to subsequent free pages of the log block. This improves performance by avoiding the “erase-before-write” requirement. When SLC has no free pages, the victim block for erase is selected in a round-robin manner to achieve wear leveling. Their technique is suitable for enterprise database applications and can efficiently support random write operations.
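The routing rule and the round-robin victim selection described above can be sketched as follows. This is a loose reading of the description, not the authors' implementation: the class `LogBufferRouter`, its parameters, and the returned labels are hypothetical, and the merge itself is not modeled.

```python
class LogBufferRouter:
    """Route writes between an SLC log buffer and MLC data blocks."""

    def __init__(self, mlc_pages_per_block=1024, slc_log_blocks=4):
        # Threshold taken from the text: a quarter of an MLC block.
        self.seq_threshold = mlc_pages_per_block // 4
        self.slc_log_blocks = slc_log_blocks
        self._next_victim = 0

    def route(self, num_sequential_pages):
        if num_sequential_pages > self.seq_threshold:
            return "MLC"  # merge old and incoming data into a new data block
        return "SLC"      # append to the log buffer

    def next_victim(self):
        # Round-robin choice of the SLC log block to erase (wear leveling).
        victim = self._next_victim
        self._next_victim = (victim + 1) % self.slc_log_blocks
        return victim


router = LogBufferRouter()
print(router.route(8), router.route(512))
print([router.next_victim() for _ in range(5)])
```

Round-robin selection spreads erases evenly over the log blocks regardless of their content, which is the wear-leveling property the description relies on.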

Sung et al<sup>45</sup> propose a technique, which seeks to improve performance and lifetime of hybrid SSD. They divide the logical volume into fixed-size portions and record the size of write requests in each portion for a fixed time period. Based on this distribution and the latency of each request on SLC/MLC, a data migration decision is periodically taken to boost read/write throughput in the next period. Frequent small writes are mapped to the SLC portion and less-frequent large writes are mapped to the MLC portion. Data migration is performed during idle time, eg, when the battery is being charged, which reduces the impact on performance. Also, the amount of migrated data is controlled to reduce degradation in Flash lifetime. Their technique is implemented in the device driver layer of the OS and not in the FTL, which is closely bound to the Flash organization. Thus, their technique can work with different file systems and Flash chips from different vendors.

Oh et al<sup>36</sup> note that due to poor write performance and small write endurance, TLC-only SSD is ineffective as a general storage device. They propose integrating SLC (or MLC) with TLC. Their technique identifies hot and cold data. Then, based on analytical modeling, they find the fraction of hot data that should be mapped to SLC to boost performance and also balance wear between SLC and TLC. Also, to ensure efficient utilization of SLC, a data item not accessed for a given time period is migrated to TLC in background. Their technique allows optimizing for performance or lifetime or balancing the two metrics. A limitation of their technique is that the analytical model cannot exactly account for time-varying application characteristics and needs to make unrealistic assumptions such as perfect wear leveling.

**TABLE 7** Classification based on the partition strategy

| Category          | References                               |
|-------------------|------------------------------------------|
| Hard partitioning | 8,15,18,24,32,34,36,38,40-43,45-47,49,51 |
| Soft partitioning | 7,9,10,25,33,35,39,44,48,50              |



**FIGURE 4** Working of the technique of Nam et al<sup>8</sup>



**FIGURE 5** Working of the technique of Im et al<sup>35</sup>

### 4.2 | Use of soft partitioning

Im et al<sup>35</sup> present a technique, which works by detecting the data hotness based on the size of the write. Small writes are mapped to SLC and sequential large writes are mapped to MLC (refer to Figure 5). In case SLC runs out of free pages, invalid pages are reclaimed or cold data are moved to MLC. They note that block-level and hybrid mapping involve costly copy operations for merging and hence, these mapping approaches are not suitable for MLC. Further, due to large MLC capacity, page-level mapping necessitates large mapping tables. Hence, they use unit-based page-level mapping,<sup>52</sup> which does not require merge operations. In this mapping, each unit has multiple sequential logical blocks and multiple physical blocks can be allocated to every unit. A page can be written at any physical page inside the physical blocks assigned to this unit. This reduces the mapping table size since the page-level mapping record is required only within the unit boundary. The unit size is chosen at design time based on the SLC size.

They divide the SLC into a hot and a warm partition. A write request first arrives in the hot partition and, after GC in this partition, moves to the warm partition. If warm data sees another write request, the data in the warm partition are invalidated and new data are stored in the hot partition. When the warm partition runs out of space, each page in it is given $N$ chances before it is evicted to MLC. By suitably choosing the size of hot and warm lists, the residency of data in SLC can be controlled. The value of $N$ is adjusted based on the update ratio in the warm partition. Their technique provides significant improvement in lifetime and write performance.

Jimenez et al<sup>25</sup> note that in hard partitioning, the SLC-based buffer can fail much sooner than the MLC. They propose selectively writing MLC Flash as an SLC Flash, which keeps performance close to that of SLC. Their technique chooses between writing in SLC or MLC mode based on the total wear. When a block assigned to the buffer sees more writes than a data block, the blocks are swapped. Thus, the physical location of the buffer can be migrated across the device, which balances the global wear. Their technique can work with different FTLs and mapping schemes. It provides flexibility to balance wear across the device, and provides near-SLC performance and better-than-MLC lifetime with no additional cost and only a small reduction in density.

Yang et al<sup>39</sup> note that TLC blocks can be programmed as SLC blocks, which allows exercising tradeoff between capacity and performance based on the access pattern and instantaneous utilization. Since SSD performance is determined by the latency of hot data, by transforming maximum possible free blocks into SLC, hot data can be served from SLC to boost performance (refer to Figure 6B) compared to the fixed-size SLC buffer approach (refer to Figure 6A). In their design, logical address range of a logical block equals the size of TLC block and that of a logical page equals the size of SLC/TLC page. Data of a logical block can be stored as either one TLC block or three SLC blocks. For each logical block, the mapping from logical to physical pages is stored in a page-level mapping table.



**FIGURE 6** Fixed-sized SLC buffer versus variable-sized SLC buffer approach<sup>39</sup> A, Fixed-sized SLC buffer techniques; B, Variable-sized SLC buffer techniques



**FIGURE 7** Illustration of A, block mapping, B, priority transition, and C, GC operations in the technique of Yang et al.<sup>39</sup> A, Utilization-aware block mapping scheme; B, Priority transition of logical blocks; C, Two types of GC operations for low and high utilization cases, respectively

Since TLC write and read latencies are 8 and 3 times (respectively) those of SLC, they propose giving higher priority to write-intensive blocks than to read-intensive blocks (refer to Figure 7B). On a write, a block's priority is changed to "high." On a read to a "low" priority block, its priority is changed to "medium." When a logical block is scanned by GC and no more blocks can be reclaimed from this logical block, the priority is reduced from high to medium or medium to low. Thus, in due time, cold blocks get low priority. This allows migrating cold data to TLC and freeing up the space to store higher priority blocks as SLC. Data with higher and lower priorities are stored in SLC and TLC, respectively (refer to Figure 7A).
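The priority transitions described above form a small state machine, which can be sketched as follows. The function names are hypothetical and chosen only for illustration; the three priority levels and the transition rules follow the description in the text.

```python
PRIORITIES = ("low", "medium", "high")


def on_write(priority):
    """Any write promotes the logical block to the highest priority."""
    return "high"


def on_read(priority):
    """A read promotes only a low-priority block, and only to medium."""
    return "medium" if priority == "low" else priority


def on_gc_scan_without_reclaim(priority):
    """GC scanned the block but could reclaim nothing: demote by one level."""
    idx = PRIORITIES.index(priority)
    return PRIORITIES[max(idx - 1, 0)]
```

Since only GC scans demote a block, a block that stops receiving accesses drifts to "low" after at most two unproductive scans.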

On a write request to a logical block, if the number of free SLC pages in this block is sufficient, the write is performed in this block. Otherwise, GC is performed based on the level of storage utilization. In case of low storage utilization, GC is performed within SLC to boost performance at the cost of storage capacity. However, in case of high storage utilization, SLC data are packed and migrated to TLC to increase the overall capacity. TLC is written only through this "TLC-pack" GC operation, which avoids any nonsequential writes violating TLC write constraints. These GC operations are illustrated in Figure 7C. Overall, their technique improves write performance significantly.

Chang et al<sup>9</sup> present a technique, which selectively performs SLC writes in MLC blocks to meet workload SLOs without requiring overprovisioning, while also reducing SLC write counts. It is noteworthy that meeting SLO is different from simply boosting performance and requires more careful management. Also, SLC/MLC mode selection for a request impacts both its own latency and the latency of upcoming requests. Their scheduling algorithm works on the observation that if a schedule is viable with $k$ SLC writes for some write accesses, it remains viable on serving exactly the first $k$ requests in the SLC mode. Incoming reads/writes are inserted in respective queues in order of their deadlines, where the deadline of a request is its arrival time plus the target latency. By default, writes are performed in MLC mode, but if any queued request will fail to meet its deadline, an incoming write is performed in the SLC mode. Further, if actual latencies fail to meet the SLO requirements, their technique decreases the target latency and/or increases SLC-mode writes for meeting SLO requirements. In their technique, read requests are prioritized unless the write queue saturates, since reads have higher impact on user experience, and they are served by the "earliest-deadline first" scheme.

To avoid the need of on-demand GC, their technique performs GC when the storage is idle and remaining free block count falls below a threshold. This threshold is updated based on SLC and MLC write counts to balance capacity and performance. In case of bursty writes or low remaining space due to aggressive use of SLC-mode writes, GC is performed in on-demand manner. GC scans both SLC and MLC blocks and chooses those with the lowest amount of live data as victims. Their technique can achieve or approach the SLO target without requiring overprovisioning, whereas MLC-only device is unable to meet this target. Also, the increase in erase operations due to their technique is small.
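The deadline-driven mode selection of Chang et al can be sketched under strong simplifying assumptions. The sketch below assumes a single serial server, fixed per-request service times, and that the incoming write is served before the queued requests; none of these details is stated in the text above, and the names `deadline` and `choose_write_mode` are hypothetical.

```python
def deadline(arrival_time, target_latency):
    """Deadline of a request: its arrival time plus the target latency."""
    return arrival_time + target_latency


def choose_write_mode(now, mlc_write_latency, queued):
    """Pick the programming mode for an incoming write.

    queued: list of (deadline, service_time) tuples for pending requests.
    Assumption: requests are served one at a time in earliest-deadline-first
    order, right after the incoming write completes in MLC mode.
    """
    t = now + mlc_write_latency
    for dl, service_time in sorted(queued):
        t += service_time
        if t > dl:
            return "SLC"   # an MLC-mode write would make a queued request late
    return "MLC"           # default mode: every deadline is still met
```

For example, with `now=0`, an MLC write latency of 900 time units, and one queued request with deadline 1000 and service time 200, the sketch selects the SLC mode because the queued request would otherwise finish at time 1100.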

Lee et al<sup>48</sup> propose a soft partitioning technique, which seeks to provide performance of SLC with capacity of MLC. When a write arrives, their technique chooses SLC/MLC mode for programming with a view to balance the performance and lifetime. To boost performance, their technique writes maximum possible data in SLC-mode; however, excessive use of SLC can degrade capacity and lifetime. To address this, their technique moves valid pages in SLC blocks to MLC blocks to create free space and allows controlling the fraction of writes performed in SLC-mode. For a write request, first the SLC/MLC mode is chosen and data are temporarily buffered in SLC or MLC write buffer. Then, the write is performed similar to that in log-structured file systems.<sup>53</sup> On eviction from write buffer, a data-item is written to log block of SLC or MLC region in that mode. As shown in Figure 8, if there are two SLC blocks and one free block, then the valid pages of both SLC blocks can be copied to the free block with MLC-mode programming. Then, both SLC blocks can be erased, which provides two free blocks. Free space reclamation is invoked only when the available free space falls below a threshold. This approach reduces data migration compared to early reclamation during idle time and reduces performance penalty compared to on-demand reclamation. A limitation of their technique is that it uses SLC for storing both hot and cold data. Also, it assumes the wear of SLC mode to be equal to that of MLC mode, which prevents fully leveraging Flash memory endurance.

**FIGURE 8** Free-space reclamation in the technique of Lee et al.<sup>48</sup> A, Initial state; B, Copy & erase; C, Final state
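The block accounting behind the free-space reclamation of Lee et al can be checked with a short calculation. The sketch below assumes that a block programmed in SLC mode holds half as many pages as the same block programmed in MLC mode; the function name `reclaim` and its parameters are hypothetical.

```python
import math


def reclaim(slc_valid_pages, pages_per_mlc_block, free_blocks):
    """Return the number of free blocks after reclaiming SLC-mode blocks.

    slc_valid_pages: valid-page count of each SLC-mode block to reclaim.
    The valid pages are packed into free blocks using MLC-mode programming,
    after which every reclaimed SLC-mode block is erased and becomes free.
    """
    total_valid = sum(slc_valid_pages)
    needed = math.ceil(total_valid / pages_per_mlc_block)
    if needed > free_blocks:
        raise ValueError("not enough free blocks to hold the valid pages")
    return free_blocks - needed + len(slc_valid_pages)
```

With two fully valid SLC-mode blocks of 64 pages each, 128 pages per MLC-mode block, and one free block, `reclaim([64, 64], 128, 1)` yields two free blocks, matching the scenario of Figure 8.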

Zhang et al<sup>50</sup> exploit the PPP feature of SLC to reduce write overhead by implementing in-place delta compression. They note that since a given location is frequently written in a short interval (eg, metadata updates and revision of file content), writes to the buffer have high temporal redundancy. This allows using delta compression to reduce the write overhead to the buffer. Previous techniques store the base (original data) and subsequent deltas in different physical pages, which requires maintaining a large amount of mapping information. Also, a read access then needs to fetch the base and deltas from different pages, which leads to read amplification and read latency penalty. Assuming a 4-KB sector and 16-KB page, their technique writes a sector to an SLC page in compressed form. Since per-sector compression leaves some cells in the page unutilized, they use the PPP feature of SLC to use these cells for later storing the deltas. Thus, the base and deltas are stored in the same SLC page. If a page is filled after a certain number of updates or the number of deltas reaches a threshold, a new physical page is allocated, the latest version of the data is written to the new page and delta compression is reset for future updates. This ensures that only one page needs to be accessed for reading a data item. They also propose a hybrid ECC for dealing with the different sizes of base and deltas. They show that their technique reduces write traffic to SLC pages with negligible latency and area overhead.
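The idea of keeping a compressed base and its deltas in the same page can be illustrated with a minimal sketch. It assumes XOR deltas compressed with `zlib` as a stand-in for the (unspecified here) delta-compression scheme, a 4-KB space budget per sector, and a hypothetical threshold `MAX_DELTAS`; these choices are illustrative and are not those of Zhang et al.

```python
import zlib

SLOT_SIZE = 4 * 1024   # assumed space budget for one sector inside a 16-KB page
MAX_DELTAS = 4         # hypothetical threshold on the number of deltas per base


def _xor(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


class DeltaSlot:
    """Compressed base plus appended deltas, all kept in one page slot."""

    def __init__(self, sector):
        self.base = zlib.compress(sector)   # compressed base version
        self.deltas = []                    # compressed XOR deltas, oldest first

    def used(self):
        """Bytes of the slot consumed by the base and all deltas."""
        return len(self.base) + sum(len(d) for d in self.deltas)

    def read(self):
        """Rebuild the latest version from the base and the delta chain."""
        data = zlib.decompress(self.base)
        for delta in self.deltas:
            data = _xor(data, zlib.decompress(delta))
        return data

    def update(self, new_sector):
        """Append a delta in place; return False if a new page is required."""
        delta = zlib.compress(_xor(self.read(), new_sector))
        if len(self.deltas) >= MAX_DELTAS or self.used() + len(delta) > SLOT_SIZE:
            return False
        self.deltas.append(delta)           # models a partial-page program
        return True
```

When `update` returns `False`, the caller would allocate a new page, write the latest version there as a fresh base, and restart the delta chain, as described in the text.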

## 5 | BUFFER DESIGN AND MANAGEMENT TECHNIQUES

The use of a particular memory technology or cell design as a buffer has a significant impact on the overall performance, energy efficiency and cost of the hybrid SSD. For this reason, we now review hybrid SSD management techniques in terms of their buffer architecture and management policies (refer to Table 8).

### 5.1 | Using SLC as a buffer

Several techniques use SLC as a buffer to serve write operations, which improves performance and also provides higher device lifetime. However, since SLC has lower density, the MLC/TLC pool is used for storing the remaining (eg, cold) working set, which improves capacity and lowers the cost.

Jung et al<sup>34</sup> propose a technique, in which many SLC chips are used as non-volatile write buffer. Their technique designates one SLC chip to store data from the host till it gets full, which is termed as “foreground operation.” The remaining SLC chips perform merge and garbage-collection operation with MLC chips, termed as “background operation.” To improve the efficiency of buffering, the blocks within SLC are divided into three partitions: sequential log blocks, random log blocks, and data blocks. When the log block partition in foreground SLC is fully consumed, another SLC chip is selected for foreground operation and this SLC chip is now used for background operation. Background and foreground operations happen on different channels, which avoids conflicts. Their mapping technique is built on the FAST (fully associative sector translation) technique<sup>54</sup> and seeks to allow SLC to handle both random and sequential write access operations.

**TABLE 8** Classification based on the buffer architecture

| Category          | References               |
|-------------------|--------------------------|
| SLC Flash         | 7,8,18,25,32-35,38,39,50 |
| MLC Flash         | 40                       |
| DRAM              | 41,49                    |
| SCM               | 15,24,42,47              |
| Both DRAM and SCM | 46                       |

Im et al<sup>38</sup> propose a technique, which uses SLC as log buffer with page-mapping scheme and MLC as data block with block-mapping scheme. Use of block-level mapping in MLC helps in reducing the size of mapping table since the MLC partition has larger number of pages than the SLC partition. When SLC has no empty space, GC migrates cold data to MLC or reclaims the space from invalid pages. Hot pages are kept in SLC area. To minimize migration cost, the pages selected for migration are chosen based on the number of valid pages in corresponding MLC data block with which the migrated pages are merged.

Based on the number of updates to a page after the last GC operation, each page is classified as hot, warm, or cold. Then, based on the number of pages in a data block whose corresponding page in the log buffer is hot/warm/cold, each data block is also classified as hot/warm/cold. Based on this, page-level merge is performed only with cold and warm blocks. Then, log pages whose linked data blocks are cold are moved to data blocks. For pages in the log buffer whose connected warm data blocks have more than a threshold number of valid pages, log blocks are not merged with data blocks to postpone MLC page copy operations. Further, if a log block has less than a threshold number of valid pages, its valid pages are moved to another log block. Finally, blocks with no valid pages are erased. Sequential large chunks of data are bypassed from SLC since they are usually written only once. They show that their technique manages SLC area efficiently and improves performance significantly. The limitation of their technique is the use of a special “fusion Flash memory,” which has lower performance than general SLC and MLC chips.<sup>8</sup>

### 5.2 | Using MLC as a buffer

Hachiya et al<sup>40</sup> design an MLC/TLC Flash-based hybrid SSD and use MLC Flash as a buffer. Their technique stores hot data in the MLC pool to achieve high performance. To ensure efficient utilization of the MLC pool, their technique selects extremely cold data and migrates it to the TLC pool. Since TLC has limited write endurance and high write latency, storing rarely accessed data in TLC leads to significant reduction in accesses to TLC, which improves its lifetime. Compared to MLC-only SSD, their technique provides improvement in performance, energy and cost efficiency.

### 5.3 | Using SCMs or DRAM as a buffer

Given the low capacity of SLC Flash and low endurance and high latency of MLC Flash, use of them as a buffer presents challenges. To address these limitations, some works propose using non-volatile SCMs (eg, PCM, ReRAM) or volatile memories (eg, DRAM) as a buffer. We now discuss several such works.

Yim et al<sup>15</sup> propose using NVRAM (eg, PCM) as a write buffer in an SLC/MLC hybrid SSD. Their technique demarcates write buffer into three portions. The first portion is used for serving small writes (eg, 4-KB writes for file system metadata), which reduces the host delay. These data-items are copied to Flash memory when it is lying idle. The second portion is used for reducing the number of writes performed by Flash (refer to Figure 9).

Their technique uses hash tables stored in volatile buffer to test whether the requested page is present in the buffer. If so, the page is directly updated in the buffer. Otherwise, if the page had a write in the near past (eg, last K writes), the page is considered to be hot, otherwise, it is marked as cold. Also, based on the type of file system (eg, fast file system or file allocation table), pages belonging to metadata can be explicitly marked as hot. Then, only hot pages are stored in the write buffer. For a write directed to a page stored in buffer, the page is directly updated, which reduces the number of writes to Flash. A page evicted from buffer is stored in SLC, whereas all cold pages are stored in MLC. The third portion of buffer facilitates byte-granularity reads to enable “execute-in-place” capability. On a read access, if the page is found in the buffer, the data are provided to host with low latency, otherwise, the page is loaded from Flash to the buffer and also delivered to the host. Overall, their technique achieves write performance of SLC with density and cost of MLC memory.

**FIGURE 9** Working of the technique of Yim et al<sup>15</sup>

**FIGURE 10** Working of the technique of Park et al<sup>41</sup> (Z is a threshold)
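The "written within the last K writes" test for hotness can be sketched with a bounded history. The class name `HotDetector` is hypothetical, and the sketch ignores the file-system hints for metadata pages mentioned in the text.

```python
from collections import deque


class HotDetector:
    """Mark a page hot if it was written within the last k writes."""

    def __init__(self, k):
        self.recent = deque(maxlen=k)   # page numbers of the last k writes

    def classify(self, page):
        hot = page in self.recent
        self.recent.append(page)        # oldest entry drops out automatically
        return "hot" if hot else "cold"
```

A page classified as hot would be kept in the write buffer (and stored in SLC when evicted), whereas a cold page would be written to MLC.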

Park et al<sup>41</sup> present a technique, which uses chained-block (CB) to design hybrid SLC-MLC SSD. Their technique stores the CB mapping table, page mapping table, file system metadata, and user data with frequent short writes in SLC. Long writes are stored in MLC. Both SLC and MLC store data as separate CBs but have similar organization (refer to Figure 10).

Number of physical blocks in SLC CB is same as the number of SLC chips and each of them resides in a different SLC chip. MLC CB length is higher than that of SLC CB, which compensates for its higher latency. After one write, if an associated write comes within a fixed time window, these writes are bundled into one bulk-write in the write buffer (which is stored in a RAM, eg, DRAM). When the time window is over, or a read comes or the buffer saturates, data in the buffer is stored in either an SLC or an MLC CB depending on length of the bundled write.

They use two kinds of mapping tables: (1) a CB mapping table, which provides physical block address inside a logical CB and (2) a page mapping table, which provides address of the requested page inside the logical CB. Finally, Flash is accessed using chip-ID, physical block address and physical page address. When a logical CB is updated, an SLC or MLC CB is allocated to the logical CB, depending on whether the write request is smaller or larger than a threshold. SLC CB waits for more writes for a certain time, after which the data are moved to MLC CB. Figure 11 summarizes their address translation approach. While using 80% MLC chips, they achieve write performance close to that of the SLC chips. The limitation of their technique is the use of complex mapping scheme and use of two types of Flash memory as the update area, which degrades performance. Also, since the lifetime is limited by the portion with smallest lifetime, they use 10% to 30% SLC portion to increase its lifetime above MLC portion, which increases SSD cost and the size of address translation table.
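The two-level address translation can be sketched as two table lookups. The table layouts below (a list of `(chip_id, physical_block)` pairs per logical CB, and a per-CB dictionary from logical page to `(block_index, physical_page)`) are assumptions made for illustration; the text does not specify the actual table formats.

```python
def translate(cb_table, page_table, logical_cb, logical_page):
    """Resolve a logical page to (chip_id, physical_block, physical_page).

    cb_table:   {logical_cb: [(chip_id, physical_block), ...]}, one entry
                per physical block of the chained block (assumed layout).
    page_table: {logical_cb: {logical_page: (block_index, physical_page)}},
                locating the page inside the logical CB (assumed layout).
    """
    block_index, physical_page = page_table[logical_cb][logical_page]
    chip_id, physical_block = cb_table[logical_cb][block_index]
    return chip_id, physical_block, physical_page
```

For instance, if logical CB 7 consists of block 120 on chip 0 and block 45 on chip 1, and logical page 3 lives at page 9 of the second block, the lookup returns `(1, 45, 9)`.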

Lu et al<sup>42</sup> propose a hybrid SSD design, which uses PCM and SLC as primary and secondary update areas, respectively and MLC as the main data storage (refer to Figure 12). They use PCM for buffering and updating the write data. SLC and MLC are organized as superblock designs,<sup>41</sup> and their page size needs to be equal. The technique of Park et al<sup>41</sup> uses one MLC superblock for data, one MLC superblock for sequential updating, and multiple SLC superblocks for random updating. In the technique of Lu et al,<sup>42</sup> an LSBA can have only one MLC superblock as data area and any number of SLC superblocks for random updating and thus, their technique does not use MLC superblock(s) as the update area.

If PCM has free space to store incoming write data, the requests are packed with the existing buffered data into multiple page-clusters. Otherwise, single or multiple page clusters are flushed to hybrid Flash to create space in PCM. Depending on whether the number of pages to flush is more or less than the threshold, the data are flushed to MLC or SLC, respectively. For flushing to MLC, a DRAM buffer of the size of one MLC superblock is used. The flushed data and valid pages from the corresponding LSBA of Flash are all stored in the DRAM buffer (refer to Figure 13A). In the DRAM buffer, the pages are sorted based on LPN and written to a free MLC superblock (refer to Figure 13B). Now, this becomes the LSBA's data area and the old superblock is erased. Thus, MLC superblock is not used as the update area, which reduces the amount of mapping information. The DRAM buffer is turned off when idle.

**FIGURE 11** Address translation process of the technique of Park et al<sup>41</sup>

**FIGURE 12** Working of the technique of Lu et al.<sup>42</sup>

**FIGURE 13** Illustration of flush operation in technique of Lu et al.<sup>42</sup> A, Valid pages are gathered from PCM, data area, and update area for flushing to DRAM; B, The pages stored in DRAM are sorted and transferred into data area and the original updated data area is erased
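The flush decision and the sort-by-LPN step can be sketched as follows. The function name `flush`, the dictionary-based page representation, and the rule that buffered data override older Flash copies of the same LPN are assumptions for illustration.

```python
def flush(buffered, lsba_valid, threshold):
    """Decide the flush target and the pages to write.

    buffered:   {lpn: data} pages being flushed from the PCM buffer.
    lsba_valid: {lpn: data} valid pages already in Flash for the same LSBA.
    Returns (target, pages), where pages is a list of (lpn, data) tuples.
    """
    if len(buffered) > threshold:
        # Large flush: gather everything in the DRAM buffer, let the newer
        # buffered copies win, and write the pages to MLC in LPN order.
        merged = {**lsba_valid, **buffered}
        return "MLC", sorted(merged.items())
    # Small flush: append the pages to an SLC superblock (update area).
    return "SLC", list(buffered.items())
```

Because the MLC path always rewrites a complete, sorted data area, no MLC superblock needs to serve as an update area.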

In case no SLC superblock is free, GC is performed and a victim is selected in the following order. (1) If an LSBA is found such that the number of used pages in all SLC superblocks in the LSBA update area exceeds the pages of single MLC superblock in LSBA data area, then at least one SLC superblock can be reclaimed after merging SLC superblocks. (2) Otherwise, an LSBA is searched, which has SLC superblocks with more invalid pages than the total pages in one SLC superblock. (3) If no such LSBA is found, it implies that SLC superblocks are uniformly allocated in LSBA. Hence, an SLC superblock (update area) is selected and merged with an MLC superblock (data area). Then, valid pages are written to the newly allocated empty MLC superblock and reclaimed SLC superblock(s) are erased. Overall, their hierarchical update approach improves lifetime and write performance of SSD significantly.
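The three-step victim selection order can be sketched as below. The input format (per-LSBA lists of `(used_pages, invalid_pages)` tuples, one per SLC superblock) and the choice of the first LSBA that owns an SLC superblock in the fallback step are assumptions, since the text does not specify them.

```python
def pick_victim(update_areas, mlc_sb_pages, slc_sb_pages):
    """Return (lsba, rule) for the GC victim, or (None, None) if none exists.

    update_areas: {lsba: [(used_pages, invalid_pages), ...]} with one tuple
    per SLC superblock in the update area of that LSBA (assumed format).
    """
    # Rule 1: used pages in the update area exceed one MLC superblock.
    for lsba, sbs in update_areas.items():
        if sum(used for used, _ in sbs) > mlc_sb_pages:
            return lsba, 1
    # Rule 2: invalid pages exceed the capacity of one SLC superblock.
    for lsba, sbs in update_areas.items():
        if sum(invalid for _, invalid in sbs) > slc_sb_pages:
            return lsba, 2
    # Rule 3: fallback, merge an SLC superblock with the MLC data area.
    for lsba, sbs in update_areas.items():
        if sbs:
            return lsba, 3
    return None, None
```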

Park et al<sup>46</sup> note that since the data stored in volatile RAM (VRAM) buffer may be lost on a power failure or system crash, most systems use the `pdflush` mechanism to periodically (eg, every 30 seconds in Linux) write updated data to storage, even when the buffer has free space. This, however, contributes ~80% of the writes to Flash memory and thus, increases the Flash writes significantly. They propose designing the buffer cache with both VRAM (eg, DRAM) and NVRAM (PCM or STT-RAM), as shown in Figure 14. On a read to a block, the data are cached in the VRAM buffer, which provides low latency for read-intensive data. On a write due to I/O or `pdflush` operation, the block is moved to the NVRAM buffer. This ensures data consistency without requiring writes to the Flash memory. Writes to Flash happen only when a victim is evicted from the NVRAM buffer.

**FIGURE 14** Working of the technique of Park et al<sup>46</sup>

Their technique divides both VRAM and NVRAM buffers into SLC and MLC portions. For each of the four portions (VRAM<sub>SLC</sub>, VRAM<sub>MLC</sub>, NVRAM<sub>SLC</sub>, and NVRAM<sub>MLC</sub>), the metadata of recently evicted blocks are tracked. By seeing hits to them and based on the read/write latency to SLC and MLC, the size of the four portions is adjusted. The sizes are enforced at the time of buffer replacement. On a miss in the buffer, their technique checks if a free block is present in the VRAM (or NVRAM) on a read (or write) reference. If not, a block is evicted from VRAM (or NVRAM) and the requested block is stored in the freed space. Within VRAM, the victim is chosen from VRAM<sub>SLC</sub> or VRAM<sub>MLC</sub> depending on which portion currently exceeds its quota. A similar procedure is applicable to NVRAM. Blocks evicted from NVRAM are flushed to Flash. By assigning the cache space based on the I/O behavior, access pattern of buffered blocks and memory characteristics, their technique provides large improvement in I/O performance. Also, by virtue of using NVRAM in the buffer cache, it promises high reliability of file data.

Cho et al<sup>51</sup> propose a technique, which seeks to maximize parallel execution of sequential write requests since random requests do not benefit from parallelism. Their technique uses an adapted version of hybrid superblock mapping where a hybrid mapping approach is used along with superblock design. In this approach, mapping of logical to physical address is done using superblocks. Then, the VPN table for each physical superblock address is accessed for tagging the logical page number. Thus, there is a mapping from LSAs to PSBAs. VPN maps LPN in PSBA and provides the channel and bank IDs. This allows parallel writing of pages in an LSBA. Their technique buffers the data in LSBA and flushes them to Flash chips for maximizing interleaving.

As shown in Figure 15, if the requested data are already present in the buffer, the requested data are updated. Otherwise, if the buffer has sufficient free space, the incoming data are pushed to the top slot. However, if the buffer does not have sufficient space, a victim block is found. The victim entry is selected using LRU policy and, to account for the spatial locality, the number of updated pages in a superblock is also considered. If the MLC channel is busy due to merge operations, a random entry is selected as a victim; otherwise, a sequential entry is selected. Random and sequential data are flushed to SLC and MLC, respectively, and the threshold for determining the random or sequential write is selected based on the number of MLC and SLC banks. Their technique reduces the number of erase operations and also improves write performance.

Sun et al<sup>47</sup> note that although NVRAMs provide higher performance and endurance than Flash, they also have higher cost, and hence, the amount of NVRAM used in a hybrid SSD needs to be carefully chosen based on several factors, eg, area, cost, latency, and application behavior. By analyzing these parameters, they observe that to achieve high SSD performance, read/write latencies of NVRAMs need to be below 1 $\mu$s. Also, write latency has higher impact on SSD energy than read latency. They study the impact of different NVRAM area cost models (viz, optimistic and pessimistic), NVRAM:Flash capacity ratio, NVRAM read:write latency ratio, etc, and also, study strategies for mapping hot/cold and sequential/random data to NVRAM or Flash.

Kwon et al<sup>49</sup> propose a scheme, which seeks to improve SSD performance and balance the wear of SLC and MLC Flash. Their scheme identifies hot data based on two factors: frequent updates to a data item and irregular allocation. Irregular allocation refers to the fact that the LPA of the small hot data are not sequential to the previous LPA. Hot and cold data are stored in SLC and MLC registers, respectively, from where they are transferred to SLC and MLC chips, respectively. Writing and updating of data in the registers is performed similar to previous techniques.<sup>55,56</sup> DRAM is used for data that cannot be identified as cold or hot. If the write request does not update existing data and is not sequential, LSN of write request is stored in DRAM buffer if DRAM buffer contains less than a threshold number of LSNs. However, when DRAM buffer contains more than a threshold number of LSNs, their scheme checks if the LSN of the write request is in sequence with the LSNs stored in the DRAM buffer. If so, they are transferred to the MLC registers to be written to MLC chips. Otherwise, the data are stored in DRAM since its sequential/random nature is still not known. Overall, their technique reduces total write and erase operations and also achieves wear leveling.

**FIGURE 15** Working of the technique of Cho et al<sup>51</sup>

**FIGURE 16** Working of the technique of Matsui et al.<sup>24</sup> (RAM refers to an SCM, eg, ReRAM)

Matsui et al<sup>24</sup> propose a technique, which uses an SCM-based buffer for an MLC/TLC-based hybrid SSD (refer to Figure 16). They classify the data as hot, cold, and frozen, which refer to frequently, infrequently, and rarely accessed (respectively) data. Then, hot, cold, and frozen data are stored in SCM, MLC, and TLC, respectively. Further, based on whether the length of a write request is lower than or greater than a threshold (8 KB), the data are classified as random or sequential, respectively. Then, random data are stored in SCM whereas sequential data are stored in MLC. Furthermore, during GC in MLC, the oldest block is selected for reclamation. If the block has many valid pages, these data are considered frozen and they are migrated to TLC. However, if the block does not have many valid pages, its valid pages are moved to other blocks in MLC itself and the block is erased. Thus, data are not directly written to TLC. Their design improves performance for a wide variety of applications with minimal increase in SSD cost.

## 6 | OPTIMIZATION OBJECTIVES AND ALGORITHMS

In this section, we discuss SSD management techniques in terms of their optimization metric and solution algorithms. Several hybrid SSD management techniques use optimization/solution heuristics such as k-means clustering,<sup>18</sup> genetic algorithm,<sup>10</sup> integral controller,<sup>43</sup> and analytical models.<sup>10,36,44</sup>

### 6.1 | Techniques for managing failed blocks

Jimenez et al<sup>33</sup> note that in traditional MLC Flash, a block is discarded as soon as its error rate exceeds the correction capability of ECC. They propose “revitalizing” this block by using it as an SLC block, which boosts lifetime at the cost of capacity. Figure 17 shows the working of the baseline and the technique proposed by Jimenez et al<sup>33</sup> over their lifetime. In both cases, initially four portions are allocated: data portion, buffer portion (organized as SLC mode), free blocks, and bad blocks. Initially, bad block portion is empty. In the baseline design, over the lifetime, the blocks, which become faulty, are moved to the bad block portion and the free block portion gradually becomes empty. This is shown by the “intermediate state” on the left side in Figure 17. When the SSD has no free blocks, it is assumed to reach its end, as shown by the “final state” on the left side in Figure 17.

In their proposed technique shown on the right side of Figure 17, the buffer and the free block portions can have both revitalized and robust blocks; however, the data portion can have only robust blocks. Revitalized blocks are kept in free block portion and completely failed blocks are directly migrated to the bad block portion. As shown in the right side of Figure 17, over time, robust blocks get replaced by revitalized blocks in the buffer. While allocating a block from free set, the buffer preferentially allocates revitalized blocks, which reduces the pressure on robust blocks. Thus, as long as enough robust blocks are available for the data portion, the SSD stays alive, which increases the overall lifetime, as shown in bottom right side of Figure 17.

Since the buffer partition sees higher write intensity than the data partition, revitalized blocks help in reducing write pressure on free blocks and the remaining robust blocks. Revitalized blocks are managed similarly to bad blocks, and they are distinguished from robust blocks by use of a flag. The buffer portion preferentially allocates revitalized blocks instead of robust blocks to reduce pressure on the latter. Their technique increases device lifetime without harming performance or requiring additional storage.

**FIGURE 17** Jimenez et al.<sup>33</sup> technique versus the baseline architecture. A, Lifetime of the baseline device; B, Lifetime with Jimenez et al.<sup>33</sup> [2013] scheme

It is noteworthy that the technique of Jimenez et al.<sup>33</sup> converts MLC blocks into SLC when they exhaust their lifetime to benefit from the *high endurance* of SLC blocks. By comparison, other soft partitioning techniques perform mode transition even when MLC blocks are healthy to benefit from the *low latency/energy* of SLC blocks.
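The allocation policy of Jimenez et al.<sup>33</sup> can be sketched as a free pool that tags each block with a flag. This is only an illustrative model under stated assumptions: the names `FreePool`, `retire`, and `allocate` are hypothetical, and the test that decides whether a worn-out MLC block is still usable in SLC mode is abstracted into the boolean `usable_as_slc`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    block_id: int
    revitalized: bool = False  # flag distinguishing revitalized from robust blocks


@dataclass
class FreePool:
    blocks: List[Block] = field(default_factory=list)
    bad: List[Block] = field(default_factory=list)

    def retire(self, block: Block, usable_as_slc: bool) -> None:
        """Handle a block whose MLC error rate exceeds the ECC capability."""
        if usable_as_slc:
            block.revitalized = True
            self.blocks.append(block)   # returns to the free pool as SLC
        else:
            self.bad.append(block)      # completely failed block

    def allocate(self, for_buffer: bool) -> Optional[Block]:
        """Buffer prefers revitalized blocks; data portion takes only robust ones."""
        wanted = [True, False] if for_buffer else [False]
        for flag in wanted:
            for i, blk in enumerate(self.blocks):
                if blk.revitalized == flag:
                    return self.blocks.pop(i)
        return None
```

In this model the device reaches end of life when `allocate(for_buffer=False)` returns `None`, that is, when no robust block remains for the data portion.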

### 6.2 | Techniques for managing ECC

Hsieh et al.<sup>32</sup> propose storing ECC in data area of SLC (2 KB) instead of spare area of MLC (128 B), which allows using stronger ECC, unconstrained by spare area size limitations. Since MLC does not allow partial-page programming (PPP), storing ECC in MLC data area is not effective, since ECC size is much lower than that of a page. With PPP, every SLC page can be written four times as 1/4 page size chunks each time, and thus, ECC can be stored in SLC without requiring erase operations. Also, higher write endurance and lower error rate of SLC ensures higher reliability of ECC.

Multiple contiguous SLC pages make an ECC area (refer to Figure 18). Also, multiple MLC pages (eg, one MLC block) form one mapping unit and the ECCs for all pages in a unit are stored in corresponding ECC area for saving space since the ECC size is much lower than the data area size. Since multiple writes are required for filling an MLC block and completing the ECC area, they use SLC itself as a buffer area for temporarily storing ECCs of previously written MLC pages before committing entire ECC area to SLC. As MLC blocks only support sequential programming, when the last page of an MLC block is programmed, all ECCs in buffer area can be committed to an ECC area. After updating ECC mapping table, buffer area is reclaimed.



**FIGURE 18** Management of block mapping table, ECC mapping table, and buffer area in the technique of Hsieh et al.<sup>32</sup> A, Both the MLC chip and the ECC mapping table are indexed by PBA. Both the block mapping table and the ECC mapping table are maintained in RAM; B, Buffer areas and ECC areas are physically interchangeable in SLC. However, buffer/ECC pages within the buffer/ECC areas must be physically contiguous



**FIGURE 19** Use of SLC flash as a circular log space in the technique of Chang et al<sup>18</sup>

Hsieh et al<sup>32</sup> use an adaptive ECC scheme, which provides BCH (Bose-Chaudhuri-Hocquenghem) codes with 9-, 14-, 19-, and 24-bit correction capability for every 512 bytes. Initially, 9-bit ECC is allocated to each page in an MLC block. Since different cells in a page see different write patterns, they show different error rates. Thus, depending on the error rate or amount of wear of an MLC page, stronger ECC is allocated to it. They assume a multichannel design where ECC and user data are simultaneously written to SLC and MLC through different channels. They show that with a 32-GB MLC-based SSD and 1-GB SLC, their technique improves SSD lifetime significantly. It is noteworthy that “low-density parity-check” (LDPC) codes are used much more widely in state-of-the-art SSD controllers than BCH codes.
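To see why stronger ECC quickly outgrows a small spare area, it helps to estimate the parity size of each level. The sketch below relies on the standard bound that a binary BCH code correcting $t$ errors over GF($2^m$) needs at most $m \cdot t$ parity bits. The level-selection policy in `pick_level` (observed errors plus a safety margin) is an assumption made for illustration and is not taken from Hsieh et al.<sup>32</sup>

```python
import math

SECTOR_BYTES = 512
ECC_LEVELS = (9, 14, 19, 24)  # correctable bits per 512 B, from the text


def bch_parity_bytes(t: int, data_bytes: int = SECTOR_BYTES) -> int:
    """Upper bound on BCH parity size: at most m*t bits over GF(2^m)."""
    data_bits = data_bytes * 8
    m = 1
    # Find the smallest m whose codeword length 2^m - 1 fits data plus parity.
    while data_bits + m * t > (1 << m) - 1:
        m += 1
    return math.ceil(m * t / 8)


def pick_level(observed_bit_errors: int, margin: int = 2) -> int:
    """Weakest level covering observed errors plus a margin (assumed policy)."""
    for t in ECC_LEVELS:
        if t >= observed_bit_errors + margin:
            return t
    return ECC_LEVELS[-1]
```

Under this bound, a 512-byte sector needs about 15 bytes of parity at the 9-bit level and about 39 bytes at the 24-bit level. A page holding four such sectors would thus need roughly 156 bytes at the strongest level, which already exceeds a 128-byte spare area.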

### 6.3 | Use of k-means clustering

Chang et al<sup>18</sup> note that small writes occur frequently and are directed to a small amount of data. Non-hot data are referenced infrequently by large writes. Further, requests are either very small or very large and rarely medium sized. Based on this, they use $k$-means ($k = 2$) clustering to find two peak frequencies in the request size distribution. This provides the threshold of small writes, and all such small and hot writes are directed to SLC. They manage SLC as a circular list where new data arrive at the head pointer location and free space is reclaimed from the tail pointer location since its update frequency is lower (refer to Figure 19). In case of no space, the head pointer moves to the next block. Due to the out-of-place writes, recent data appear close to the head pointer. When the head pointer moves ahead of the tail pointer by more than a certain number of blocks, the tail block is reclaimed, and the tail pointer advances. This organization reduces copy operations during GC and also ensures uniform wear across SLC blocks.
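The threshold-finding step can be illustrated with a one-dimensional two-means procedure. This is a minimal sketch, not the implementation of Chang et al<sup>18</sup>: the function name, the min/max initialization of the centroids, and the use of the centroid midpoint as the small-write threshold are assumptions.

```python
from typing import List, Tuple


def two_means_threshold(sizes: List[float], iters: int = 50) -> Tuple[float, float, float]:
    """1-D k-means with k = 2; returns (small_centroid, large_centroid, threshold)."""
    lo, hi = min(sizes), max(sizes)  # assumed initialization at the extremes
    for _ in range(iters):
        mid = (lo + hi) / 2
        # In one dimension, nearest-centroid assignment is a split at the midpoint.
        small = [s for s in sizes if s <= mid]
        large = [s for s in sizes if s > mid]
        if not small or not large:
            break
        new_lo = sum(small) / len(small)
        new_hi = sum(large) / len(large)
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return lo, hi, (lo + hi) / 2
```

For example, with request sizes in KB of `[4, 4, 8, 4, 8, 120, 128, 128]`, the two centroids settle near 5.6 and 125.3, so any request below roughly 65 KB would be treated as small. Like any k-means variant, the result depends on initialization when the two clusters are not well separated.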

SLC is managed using page-level mapping, and the mapping information is stored in a hashed table. MLC is managed with a two-level mapping scheme. In level-1 mapping, logical blocks are mapped one-to-one to physical blocks. Such physical blocks are termed “data block.” Since in-place writes are forbidden, a spare physical block with all free space is allocated on first sector write to a logical block. New data are written to the first page of the spare block, and the next sector write data are appended to its free space. This block is termed a log block and in case of no remaining free space, another log block is allocated. The ordered list of one data block and multiple log blocks make a block chain. In level-2 mapping, disk sectors of a logical block are mapped to MLC pages in associated block chain.

When SLC wear exceeds MLC wear, SLC accepts only very hot data or updates to existing data; remaining data are directed to MLC (refer to Figure 20). To account for the changing working set, only the metadata of non-update write request is recorded in SLC. If the write request appears again, then the data are stored in SLC, otherwise, the metadata is removed when the corresponding block is erased.

During GC, before moving a valid sector from SLC to MLC, the scheme checks whether its corresponding logical block has been allocated to any of the log blocks. If yes, the sector is written to the MLC flash; otherwise, the sector is copied to the head block of SLC. By virtue of postponing writing of random data to MLC, this approach reduces GC overhead in MLC, although it also reduces the SLC capacity available for storing the hot data. To compensate for this, during GC in MLC, valid data are collected from both log blocks and SLC. They show that using 256-MB SLC with 20-GB MLC provides nearly double the performance compared to an MLC-only SSD while also saving energy.

### 6.4 | Use of genetic algorithm

Liu et al<sup>10</sup> propose a technique for improving lifetime and performance of SLC/TLC-based SSD. In their technique, TLC stores user data whereas mapping tables are stored in SLC since they have high access frequency (refer to Figure 21). A write request with lower or higher than a threshold size is considered random or sequential, respectively. Sequential writes are further divided into hot and cold depending on the access frequency. Random and hot sequential writes are stored with page-level mapping, and the cold sequential writes are stored with block-level mapping.

**FIGURE 20** Working of the technique of Chang et al<sup>18</sup>

**FIGURE 21** Working of the technique of Liu et al<sup>10</sup>

They allocate the number of SLC blocks based on the relative instantaneous erase counts of SLC and TLC. Since this simple mathematical model may not accurately capture behavior of different applications, they also use a genetic algorithm to obtain a heuristic solution. The population for the algorithm is formed by choosing different combinations of three parameters, viz, hot data threshold, random data threshold, and SLC-mode block ratio. The fitness functions are (1) ratio of erase count between SLC and TLC and (2) ratio of effective capacity of hybrid Flash to that of TLC-only Flash. Multiple generations are simulated until the best-fit solution is found. Their technique brings large improvement in lifetime and performance. Also, the genetic algorithm provides higher lifetime than the mathematical formula-based approach although the genetic algorithm also incurs higher storage and latency overhead due to simulating multiple generations.
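The search over the three parameters can be sketched with a small elitist genetic algorithm. Everything below is illustrative: the operators (uniform crossover, random-reset mutation), the parameter bounds, and in particular `toy_fitness` are assumptions and do not reproduce the fitness evaluation of Liu et al<sup>10</sup>, which relies on erase counts obtained for real workloads. The only modeled fact is that a block in SLC mode stores one third of the bits of a TLC block, so an SLC-mode ratio $r$ leaves an effective capacity of $1 - 2r/3$.

```python
import random
from typing import Callable, List, Tuple

Genome = Tuple[float, ...]  # (hot threshold, random threshold, SLC-mode ratio)


def evolve(fitness: Callable[[Genome], float],
           bounds: List[Tuple[float, float]],
           pop_size: int = 20, generations: int = 40,
           mutation_rate: float = 0.2, seed: int = 0) -> Genome:
    """Tiny elitist GA: keep the best half, refill by crossover plus mutation."""
    rng = random.Random(seed)
    pop = [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[i] = rng.uniform(lo, hi)  # random-reset mutation
            children.append(tuple(child))
        pop = parents + children
    return max(pop, key=fitness)


def toy_fitness(g: Genome) -> float:
    """Hypothetical stand-in: pretend wear balances when 20% of blocks are SLC."""
    _hot, _rnd, slc_ratio = g
    wear_balance = -abs(slc_ratio - 0.2)
    capacity = 1.0 - (2.0 / 3.0) * slc_ratio  # SLC block holds 1/3 of TLC bits
    return wear_balance + 0.1 * capacity
```

Calling `evolve(toy_fitness, [(1.0, 100.0), (1.0, 64.0), (0.0, 0.5)])` returns the best parameter triple found. Each generation requires re-evaluating the fitness of the population, which is the source of the storage and latency overhead mentioned above.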

### 6.5 | Use of integral controller

Murugan et al<sup>43</sup> propose a hybrid SSD management technique that works by identifying hot data. Pages with a write count higher than a threshold are termed hot, and the remaining pages with nonzero writes are termed warm since they have the potential to become hot. The threshold is periodically updated to include pages responsible for the top K% (eg, 10%) of writes in the list of hot pages. Then, based on latency and write endurance of SLC and MLC, an integral controller decides the fraction of hot writes that are directed to MLC and the remaining hot writes go to SLC. All cold writes go to MLC. They use wear-leveling techniques within SLC and MLC. Due to the change in working set, a previously hot data item may become cold, and hence, it is migrated from SLC to MLC. As time passes, the amount of data migrated from SLC to MLC is gradually reduced to lower the lifetime degradation. Their technique improves both write performance and SSD lifetime.
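The two mechanisms in this technique, the periodically refreshed hotness threshold and the integral controller, can be sketched as follows. This is a simplified illustration under assumptions: the error signal fed to the controller (for instance, how far SLC wear is ahead of its target), the gain, and all names are hypothetical and are not taken from Murugan et al.<sup>43</sup>

```python
from typing import Dict


def hot_threshold(write_counts: Dict[int, int], top_fraction: float = 0.10) -> int:
    """Lowest write count among the most-written pages covering top_fraction of writes."""
    total = sum(write_counts.values())
    covered = 0
    threshold = 0
    for count in sorted(write_counts.values(), reverse=True):
        if covered >= top_fraction * total:
            break
        covered += count
        threshold = count
    return threshold


class IntegralController:
    """Integral-only controller: the output accumulates gain * error over time."""

    def __init__(self, gain: float, initial: float = 0.0) -> None:
        self.gain = gain
        self.mlc_fraction = initial  # fraction of hot writes redirected to MLC

    def update(self, error: float) -> float:
        """Positive error (assumed: SLC wearing too fast) shifts hot writes to MLC."""
        self.mlc_fraction += self.gain * error
        self.mlc_fraction = min(1.0, max(0.0, self.mlc_fraction))  # clamp to [0, 1]
        return self.mlc_fraction
```

Because the controller integrates the error, a persistent imbalance keeps moving the split until the error vanishes, whereas short-lived fluctuations change the fraction only slightly.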

### 6.6 | Use of analytical models

Wang et al<sup>44</sup> note that use of a fixed-size SLC buffer can harm performance and naively increasing the capacity of SLC buffer can increase GC overhead due to the requirement of reclaiming/moving large number of blocks. Also, this fixed-size buffer approach fails to account for the variation in application characteristics. They develop an analytical model of write costs in hot and cold regions based on the application behavior, GC overhead and existing region utilizations. GC cost increases with rising utilization since more valid pages need to be migrated from a victim block. Based on these factors, the optimum capacities of two regions are decided.

Due to the change in working set or incorrect identification of hot/cold data, a data item may be placed in incorrect region. They note that when a hot data page is wrongly placed in cold region, a future update to the page can be performed in hot region and the previous copy in cold region can be invalidated, thus, no data migration is required to correct the placement. However, when a cold data page is incorrectly placed in the hot region, it needs to be migrated to the cold region. They use page-mapping scheme. A hot block uses only fast pages whereas all pages are used in a cold block.

When free blocks in a region fall below a threshold, GC is performed. There are three possible GC operations: (1) taking an erased block from other partition, which changes the capacity and utilization of both partitions, (2) moving valid data in victim blocks to other partition, which changes utilization but not capacity of each partition, and (3) moving valid data within a partition, which does not change partition utilization. For selecting one of three GC operations, their after-GC performance is evaluated and the one achieving the smallest write cost is selected and performed. Their technique improves performance over a fixed-sized partitioning technique.
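The selection among the three GC operations can be sketched as an evaluation of the predicted write cost after each candidate. The cost model used here, a per-region cost proportional to $1/(1-u)$ for utilization $u$, is a common simplification adopted as an assumption; it is not the analytical model of Wang et al,<sup>44</sup> and it ignores the one-time cost of the migration itself. The names `Region`, `write_cost`, and `choose_gc` are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Region:
    capacity: float     # number of blocks in the region
    valid: float        # valid data, in units of blocks
    write_share: float  # fraction of incoming writes landing in this region

    @property
    def utilization(self) -> float:
        return self.valid / self.capacity


def write_cost(r: Region) -> float:
    """Assumed model: cost grows as 1 / (1 - u) and is unbounded when full."""
    u = r.utilization
    if u >= 1.0:
        return float("inf")
    return r.write_share / (1.0 - u)


def choose_gc(this: Region, other: Region, victim_valid: float) -> str:
    """Evaluate the state after each GC option and pick the cheapest one."""
    candidates = {
        # (1) take an erased block from the other partition
        "borrow_block": (replace(this, capacity=this.capacity + 1),
                         replace(other, capacity=other.capacity - 1)),
        # (2) move the victim's valid data to the other partition
        "migrate_out": (replace(this, valid=this.valid - victim_valid),
                        replace(other, valid=other.valid + victim_valid)),
        # (3) copy valid data within the partition: utilization unchanged
        "local_copy": (this, other),
    }
    costs = {name: write_cost(a) + write_cost(b) for name, (a, b) in candidates.items()}
    return min(costs, key=costs.get)
```

With a nearly full hot region and a half-empty cold region, the sketch favors borrowing a block or migrating data out; when the other region is itself almost full, only the local copy remains viable.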

## 7 | CONCLUSION AND FUTURE OUTLOOK

The design and management of storage devices have a significant impact on the performance, energy, and cost efficiency of all computing systems ranging from embedded systems to data centers and supercomputers.<sup>57</sup> In this paper, we presented a survey of the management techniques for hybrid SLC/MLC/TLC Flash-based SSD devices. We conclude this paper with a brief mention of future research directions.

Flash memory suffers from the read disturb issue, whereby a read to one row of cells affects the threshold voltages of unread cells in other rows of the same block. Addressing this will be highly important, especially as Flash scales to smaller feature sizes. Further, in the face of increasing memory capacity demands, 3-D Flash appears as a promising solution to continue to scale capacity within area budgets. While 48-layer 3-D Flash products are already on the market, 256-layer Flash is expected to become available in the near future.<sup>6</sup> Compared to 2-D Flash, 3-D Flash memory provides higher data write performance and reliability, and lower cost per byte. The challenge in using 3-D Flash is that it incurs higher initial investment and manufacturing cost and may provide lower yield compared to 2-D Flash. Addressing these challenges will be vital for ensuring adoption of 3-D Flash in commercial SSDs.

File system maintains metadata for files, which can provide valuable hints to SSD-management techniques; however, most existing techniques fail to benefit from this information. For example, file-system level information can be useful for selecting appropriate blocks for garbage collection, or as hot/cold blocks for migration to suitable SSD partition. Clearly, a coordinated file system and SSD management approach can provide much higher rewards than isolated management techniques for them.

Most proposed techniques use a fixed value of algorithm parameters such as threshold for finding hotness of data. While this approach works well when the programmer/designer has insights into nature of applications, for the general case of workloads with random combination of applications or different ratio of read/write operations, use of fixed parameters may not work well. Clearly, runtime adaptation of control parameters is vital for the proposed techniques to be effective for real-world scenarios.

Most existing simulators have been developed for evaluating homogeneous (eg, MLC-only) SSD devices. Since each simulator mimics a particular device/machine configuration and makes certain assumptions, using these simulators for evaluating hybrid SSD devices may not be fully accurate. Creating hybrid SSD-specific simulators and hardware prototypes will significantly boost research in this field and will provide even more meaningful insights.

Based on Moore's law, the number of cores in a system is steadily increasing. Conventional hybrid SSD management techniques focus only on improving overall throughput, which was sufficient for single-core systems. In multi-core systems, several other system-level objectives and constraints become prominent, such as fairness, QoS (quality of service), implementing priorities between applications running on different cores, and nonuniform latencies due to data-migration operations in hybrid SSDs. However, current hybrid SSD management policies are oblivious of these goals and constraints. Accounting for these metrics in the management policy of hybrid SSDs is important for ensuring their adoption in future multi- and many-core systems.

Most existing SSD management techniques have been proposed and evaluated in context of CPUs. In recent years, GPUs have been intensively used for a range of big-data analytics, database, and scientific applications.<sup>58</sup> Since GPUs are not stand-alone computing units that can directly access the external storage, they need the help of host-side storage software stack for accessing the data on SSDs.<sup>59</sup> This, however, exacerbates the overheads of file-resident data movement and prevents fully exploiting the potential of both GPU and SSD.<sup>60</sup> As GPUs become primary citizens of the computing world, novel strategies to directly connect GPUs to hybrid SSDs and exploit specific features of SLC/MLC/TLC in boosting efficiency of GPU applications will be highly rewarding.

The traditional error-intolerant computing approach necessitates precise computation and storage. However, the intrinsic error tolerance of applications and limitations in cognitive capabilities of users provide scope for approximate computing and storage,<sup>61</sup> where accuracy can be sacrificed to boost performance and energy efficiency. This approach can provide large benefits in Flash memories; for example, permitting imprecise writes in MLC/TLC Flash allows significantly reducing the number of program and verify cycles, which lowers write latency/energy and improves write endurance. Going forward, an effective approach for imprecisely storing and processing data from hybrid SSDs will be highly effective in meeting the challenges of “big data.”

## ORCID

Ahmed Izzat Alsalibi  <http://orcid.org/0000-0002-8112-3365>

Sparsh Mittal  <http://orcid.org/0000-0002-2908-993X>

## REFERENCES

1. Mittal S, Vetter JS, Li D. A survey of architectural approaches for managing embedded dram and non-volatile on-chip caches. *IEEE Trans Parallel Distrib Syst (TPDS)*. 2015;26(6):1524-1537.
2. Chang L-P, Kuo T-W. Efficient management for large-scale flash-memory storage systems with resource conservation. *ACM Trans Storage (TOS)*. 2005;1(4):381-418.
3. Liu D, Wang Y, Qin Z, Shao Z, Guan Y. A space reuse strategy for flash translation layers in SLC NAND flash memory storage systems. *IEEE Trans Very Large Scale Integr VLSI Syst*. 2012;20(6):1094-1107.
4. Hsieh J-W, Kuo T-W, Chang L-P. Efficient identification of hot data for flash memory storage systems. *ACM Trans Storage*. 2006;2(1):22-40.
5. Vetter J, Mittal S. Opportunities for nonvolatile memory systems in extreme-scale high performance computing. *Comput Sci Eng*. 2015;17(2):73-82.
6. Coughlin T. How Many Layers Are Possible In 3D Flash? 2017. <https://goo.gl/HtDd5B>
7. Jimenez X, Novo D, Ienne P. Software controlled cell bit-density to improve NAND flash lifetime. Paper presented at: Design Automation Conference, San Francisco, CA: ACM; 2012.
8. Nam B-W, Na G-J, Lee S-W. A hybrid flash memory SSD scheme for enterprise database applications. Paper presented at: International Asia-Pacific Web Conference (APWEB), Busan, South Korea: IEEE; 2010.
9. Chang CW, Chen GY, Chen YJ, Yeh CW, Eng PY, Cheung A, Yang CL. Exploiting write heterogeneity of morphable MLC/SLC SSDs in datacenters with service-level objectives. *IEEE Trans Comput*. 2017;PP(99):1-1.
10. Liu D, Yao L, Long L, Shao Z, Guan Y. A workload-aware flash translation layer enhancing performance and lifespan of TLC/SLC dual-mode flash memory in embedded systems. *Microprocess Microsyst*. 2017;55:343-354.
11. HP. SSD endurance. 2015. <https://goo.gl/xjWzDD>
12. MICRON. 3D NAND Flash Memory. 2016. <https://goo.gl/dml86v>
13. ADVANTECH. Flash type comparison for SLC/MLC/TLC and Advantechs Ultra MLC technology. 2016. <https://goo.gl/ewWzea>
14. MICRON. TLC MLC and SLC Devices. 2017. <https://goo.gl/FJo45c>
15. Yim KS. A novel memory hierarchy for flash memory based storage systems. *J Semicond Technol Sci*. 2005;5(4):262-269.
16. Silicon Motion. Silicon Motion Announces Launch of Three New SSD Controllers Optimized for Managing MLC NAND Flash. 2008. <https://goo.gl/J2ps0F>
17. Mittal S, Vetter J. A survey of software techniques for using non-volatile memories for storage and main memory systems. *IEEE Trans Parallel Distrib Syst (TPDS)*. 2016;27(5):1537-1550.
18. Chang L-P. A hybrid approach to NAND-flash-based solid-state disks. *IEEE Trans Comput*. 2010;59(10):1337-1349.
19. Park D, Du DH. Hot data identification for flash-based storage systems using multiple bloom filters. Paper presented at: MSST, Denver, Colorado; 2011.
20. Chang Y-M, Chang Y-H, Kuo T-W, Li Y-C, Li H-P. Disturbance relaxation for 3d flash memory. *IEEE Trans Comput*. 2016;65(5):1467-1483.
21. Liu D, Wang T, Wang Y, Qin Z, Shao Z. Pcm-ftl: a write-activity-aware NAND flash memory management scheme for PCM-based embedded systems. Paper presented at: IEEE 32nd Real-Time Systems Symposium (RTSS), Vienna, Austria: IEEE; 2011.
22. Kim Y, Tauras B, Gupta A, Urgaonkar B. Flashsim: a simulator for NAND flash-based solid-state drives. Paper presented at: Simul'09 First International Conference on Advances in System Simulation, Porto, Portugal: IEEE; 2009.
23. Alsalibi AI, Sumari P, Alomari SA, Al-Betar MA. Performance and reliability concern scheme for efficient garbage collection and wear leveling on flash memory-based solid state disk. *Microsyst Technol*. 2016;23:2521-2535.
24. Matsui C, Yamada T, Sugiyama Y, Yamaga Y, Takeuchi K. Optimal memory configuration analysis in tri-hybrid solid-state drives with storage class memory and multi-level cell/triple-level cell NAND flash memory. *Jpn J Appl Phys*. 2017;56(4S):04CE02.
25. Jimenez X, Novo D, Ienne P. Libra: software-controlled cell bit-density to balance wear in NAND flash. *ACM Trans Embed Comput Syst*. 2015;14(2):28:1-28:22.
26. Samsung. Samsung unveils its third fusion semiconductor—Flex-OneNAND. 2007. <http://www.samsung.com/semiconductor/about-us/news/4194>
27. Geoff G. Micron's M600 SSD accelerates writes with dynamic SLC cache. 2014. <https://goo.gl/Q3hic>
28. Geoff G. Samsung's 840 EVO solid-state drive reviewed TLC NAND with a shot of SLC cache. 2013. <https://goo.gl/oSDk49>
29. Mittal S, Wang R, Vetter J. DESTINY: a comprehensive tool with 3D and multi-level cell memory modeling capability. *J Low Power Electron Appl*. 2017;7(3):23.
30. Mittal S. A survey of power management techniques for phase change memory. *Int J Comput Aided Eng Technol*. 2016;8(4):424-444.
31. Mittal S. A survey of techniques for architecting processor components using domain wall memory. *ACM J Emerg Technol Comput Syst*. 2016;13(2):29.
32. Hsieh JW, Chen CW, Lin HY. Adaptive ECC scheme for hybrid SSD. *IEEE Trans Comput*. 2015;64(12):3348-3361.

33. Jimenez X, Novo D, Ienne P. Phoenix: reviving MLC blocks as SLC to extend NAND flash devices lifetime. Paper presented at: DATE, Grenoble, France; 2013.
34. Jung S, Song YH. Hierarchical use of heterogeneous flash memories for high performance and durability. *IEEE Trans Consum Electron*. 2009;55(3):1383-1391.
35. Im S, Shin D. ComboFTL: improving performance and lifespan of MLC flash memory using SLC flash buffer. *J Syst Archit*. 2010;56(12):641-653.
36. Oh Y, Lee E, Choi J, Lee D, Noh SH. Hybrid solid state drives for improved performance and enhanced lifetime. Paper presented at: Symposium on Mass Storage Systems and Technologies (MSST), Long Beach, CA; 2013.
37. Chen R, Qin Z, Wang Y, Liu D, Shao Z, Guan Y. On-demand block-level address mapping in large-scale NAND flash storage systems. *IEEE Trans Comput*. 2015;64(6):1729-1741.
38. Im S, Shin D. Storage architecture and software support for SLC/MLC combined flash memory. Paper presented at: Proceedings of the 2009 ACM Symposium on Applied Computing, Honolulu, Hawaii: ACM; 2009.
39. Yang MC, Chang YH, Tsao CW, Liu CY. Utilization-aware self-tuning design for TLC flash storage devices. *IEEE Trans Very Large Scale Integr Syst*. 2016;24(10):3132-3144.
40. Hachiya S, Johguchi K, Miyaji K, Takeuchi K. Hybrid triple-level-cell/multi-level-cell NAND flash storage array with chip exchangeable method. *Jpn J Appl Phys*. 2014;53(4S):04EE04.
41. Park J-W, Park S-H, Weems CC, Kim S-D. A hybrid flash translation layer design for SLC-MLC flash memory based multibank solid state disk. *Microprocess Microsyst*. 2011;35(1):48-59.
42. Lu N, Choi I-S, Ko S-H, Kim S-D. An effective hierarchical PRAM-SLC-MLC hybrid solid state disk. Paper presented at: International Conference on Computer and Information Science (ICIS), Shanghai, China: IEEE; 2012.
43. Murugan M, Du DH. Hybrot: towards improved performance in hybrid SLC-MLC devices. Paper presented at: International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS), Arlington, Virginia: IEEE; 2012.
44. Wang W, Pan W, Xie T, Zhou D. How many MLCs should impersonate slcs to optimize ssd performance? Paper presented at: Proceedings of the Second International Symposium on Memory Systems, MEMSYS '16, ACM; 2016; New York, NY.
45. Sung M, Kim K. Asymmetric flash volume management. *IEEE Trans Consum Electron*. 2012;58(2):455-461.
46. Park J, Lee E, Bahn H. DABC-NV: a buffer cache architecture for mobile systems with heterogeneous flash memories. *IEEE Trans Consum Electron*. 2012;58(4):1237-1245.
47. Sun C, Iwasaki TO, Onagi T, Johguchi K, Takeuchi K. Cost, capacity, and performance analyses for hybrid SCM/NAND flash SSD. *IEEE Trans Circuits Syst I: Regul Pap*. 2014;61(8):2360-2369.
48. Lee S, Kim J. Improving performance and capacity of flash storage devices by exploiting heterogeneity of MLC flash memory. *IEEE Trans Comput*. 2014;63(10):2445-2458.
49. Kwon SJ, Chung T-S. Data pattern aware FTL for SLC+MLC hybrid SSD. *Des Autom Embedded Syst*. 2015;19(1):101-127.
50. Zhang X, Li J, Wang H, Zhao K, Zhang T. Reducing solid-state storage device write stress through opportunistic in-place delta compression. Paper presented at: FAST, Santa Clara, CA; 2016.
51. Cho I-P, Ko S-H, Yang H-M, Kim C-G, Kim S-D. A dynamic buffer management of hybrid solid state disk for media applications. Paper presented at: Proceedings of the International conference on it Convergence and Security 2011, ODISHA, India: Springer; 2012.
52. In J, Kim H, Lee K, Chung T. Method of remapping flash memory. US Patent 7,516,295. 2009.
53. Rosenblum M, Ousterhout JK. The design and implementation of a log-structured file system. *ACM Trans Comput Syst (TOCS)*. 1992;10(1):26-52.
54. Lee S-W, Choi W-K, Park D-J. Fast: an efficient flash translation layer for flash memory. *Emerging Directions in Embedded and Ubiquitous Computing*. Seoul, Korea: Springer; 2006:879-887.
55. Kim J, Seol J, Maeng S. A buffer management issue in designing SSDs for LFSs. *IEICE Trans Inf Syst*. 2010;E93.D(6):1644-1647.
56. Jin S, Kim J, Kim J, Huh J, Maeng S. Sector log: fine-grained storage management for solid state drives. Paper presented at: Proceedings of the 2011 ACM Symposium on Applied Computing, SAC '11, Taichung, Taiwan: ACM; 2011.
57. Mittal S. Power management techniques for data centers: a survey. 2014. arXiv preprint arXiv:1404.6681.
58. Mittal S, Vetter JS. A survey of methods for analyzing and improving GPU energy efficiency. *ACM Comput Surv*. 2015;47(2):19:1-19:23.
59. Mittal S, Vetter J. A survey of CPU-GPU heterogeneous computing techniques. *ACM Comput Surv*. 2015;47(4):69:1-69:35.
60. Zhang J, Donofrio D, Shalf J, Kandemir MT, Jung M. Nvmmu: a non-volatile memory management unit for heterogeneous gpu-ssd architectures. Paper presented at: 2015 International Conference on Parallel Architecture and Compilation (PACT), San Francisco, CA; 2015.
61. Mittal S. A survey of techniques for approximate computing. *ACM Comput Surv*. 2016;48(4):62:1-62:33.

**How to cite this article:** Alsalibi AI, Mittal S, Al-Betar MA, Sumari PB. A survey of techniques for architecting SLC/MLC/TLC hybrid Flash memory-based SSDs. *Concurrency Computat Pract Exper*. 2018;e4420. <https://doi.org/10.1002/cpe.4420>