
Reducing latency overhead caused by using LDPC codes in NAND flash memory

Abstract

Semiconductor technology scaling makes NAND flash memory subject to continuous raw storage reliability degradation, leading to the demand for more and more powerful error correction codes. This inevitable trend makes conventional BCH code increasingly inadequate, and iterative coding solutions such as low-density parity-check (LDPC) codes become very natural alternative options. However, fine-grained soft-decision memory sensing must be used in order to fully leverage the strong error correction capability of LDPC codes, which results in significant data access latency overhead. This article presents a simple design technique that can reduce such latency overhead. The key is to cohesively exploit the NAND flash memory wear-out dynamics and impact of LDPC code structure on decoding performance. Based upon detailed memory device modeling and ASIC design, we carried out simulations to demonstrate the potential effectiveness of this design method and evaluate the involved trade-offs.

Introduction

Solid-state storage systems based upon NAND flash memory technology must use error correction code (ECC) to ensure system-level data storage integrity. In current design practice, BCH codes with classical hard-decision decoding algorithms [1] are widely used. As the semiconductor industry continues to push the technology scaling envelope and pursue aggressive use of multi-level per cell storage, the raw storage reliability of NAND flash memory continues to degrade, which quickly makes current design practice inadequate and naturally demands more powerful ECCs. Because of their well-proven error correction capability with reasonably low decoding complexity, and their recent success in hard disk drives, low-density parity-check (LDPC) codes [2, 3] have attracted significant attention for application in NAND flash memory. To maximize their error correction capability, LDPC codes demand that NAND flash memory carry out finer-grained (i.e., soft-decision) sensing. As a result, straightforward use of LDPC codes tends to significantly degrade the overall read response latency of the data storage system. In particular, since NAND flash memory on-chip sensing latency is linearly proportional to the sensing quantization granularity, soft-decision memory sensing largely increases on-chip memory sensing latency compared with current practice. Meanwhile, in the presence of soft-decision memory sensing, the flash-to-controller data transfer latency increases accordingly. Since read response latency is a critical metric in many data storage systems, it is highly desirable to reduce the read latency overhead caused by the use of LDPC codes in NAND flash memory.

In this article, we present a simple design technique to reduce the latency overhead caused by the use of LDPC codes. First, we note that NAND flash memory cells gradually wear out with program/erase (P/E) cycling [4], which is reflected as a gradually diminishing memory cell storage noise margin (or an increasing raw storage bit error rate). To meet a specified P/E cycling endurance limit, LDPC code decoding with the maximum allowable soft-decision memory sensing precision should be able to tolerate the worst-case raw storage reliability at the end of memory lifetime. Clearly, the memory cell wear-out dynamics make the maximum achievable error correction capability of LDPC codes far more than necessary over most of the memory lifetime, especially early in life when the P/E cycling number is relatively small. Meanwhile, the error correction capability of LDPC codes strongly depends on the soft-decision input precision. Therefore, it is very straightforward to apply a progressive-precision LDPC decoding strategy to reduce the average latency: we always start with hard-decision memory sensing (and hence hard-decision LDPC code decoding), and only if LDPC decoding fails do we progressively increase the sensing precision and retry the decoding until it succeeds.

Under such a straightforward progressive-precision decoding design framework, it is critical to minimize the hard-decision LDPC decoding failure rate in order to minimize the overall latency overhead. Hard-decision LDPC decoding employs the bit-flipping decoding algorithm [5], which in each iteration flips the hard decisions of those bits that participate in the largest number of unsatisfied parity checks. In contrast, soft-decision LDPC decoding employs the sum–product decoding algorithm [3], the min–sum decoding algorithm [6], or one of their many variants. These soft-decision decoding algorithms iteratively update the likelihood probability estimate of each bit. In the context of soft-decision decoding, it is well known that decoding performance heavily depends on the column weight of the LDPC code parity check matrix, and low-weight LDPC codes tend to have stronger error correction capability than high-weight LDPC codes. However, in the context of hard-decision decoding based upon the bit-flipping algorithm, high-weight LDPC codes tend to outperform their low-weight counterparts, which is completely opposite to the soft-decision scenario. Under the straightforward progressive-precision decoding framework, this conflicting impact of parity check matrix column weight on error correction capability directly leads to a design dilemma: if we use low-weight LDPC codes to maximize the achievable error correction capability and hence maximize the tolerable worst-case raw storage reliability, the corresponding hard-decision decoding failure rate will be relatively high, leading to a larger latency overhead. To address this dilemma, instead of using the same LDPC code throughout the entire NAND flash memory lifetime, we propose to adaptively adjust the LDPC code parity check matrix column weight based upon the wear-out progress of the NAND flash memory.
In particular, given the raw storage reliability of NAND flash memory, we always use the LDPC code that can meet the target soft-decision decoding failure rate and meanwhile has the highest column weight. As a result, we can always ensure that the hard-decision decoding failure rate is minimized throughout the entire lifetime of NAND flash memory, leading to the minimal overall latency overhead.
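To make the hard-decision side of this trade-off concrete, the following sketch implements a minimal bit-flipping decoder of the kind described above, which flips the bits involved in the largest number of unsatisfied parity checks each iteration. The parity-check matrix and error pattern are toy values for illustration only, not one of the codes used later in this article.

```python
import numpy as np

def bit_flip_decode(H, y, max_iter=50):
    """Minimal bit-flipping hard-decision decoding (illustrative sketch).

    H: (m, n) binary parity-check matrix; y: length-n hard-decision word.
    Each iteration flips the bits participating in the largest number of
    unsatisfied parity checks, as described in the text.
    """
    x = y.copy()
    for _ in range(max_iter):
        syndrome = H.dot(x) % 2                # which parity checks fail
        if not syndrome.any():
            return x, True                     # valid codeword found
        counts = H.T.dot(syndrome)             # unsatisfied checks per bit
        x[counts == counts.max()] ^= 1         # flip the worst offenders
    return x, False

# Tiny (7,4) Hamming-style toy example: correct a single hard error.
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])
codeword = np.zeros(7, dtype=int)              # the all-zero word is valid
received = codeword.copy()
received[2] ^= 1                               # inject one bit error
decoded, ok = bit_flip_decode(H, received)
```

On this toy example, the decoder removes the injected error within a few iterations; real soft-decision decoders (sum–product, min–sum) instead propagate probabilistic messages.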

Based upon NAND flash memory erase and programming characteristics, we derive mathematical formulations to approximately model threshold voltage distribution of memory cells in the presence of various major NAND flash memory device noise and distortion sources. Using a hypothetical 2 bits/cell NAND flash memory and rate-8/9 LDPC codes with different column weights, we carry out extensive computer simulations to evaluate the effectiveness of the proposed design technique. In addition, we carried out LDPC decoder ASIC design at 65-nm technology node to evaluate and compare the hard-decision and soft-decision LDPC decoder silicon cost.

Basics and background

Basics of NAND flash memory physics

Each NAND flash memory cell is a floating gate transistor whose threshold voltage can be programmed by injecting a certain amount of charge into the floating gate. Before a flash memory cell can be programmed, it must be erased (i.e., all the charge is removed from the floating gate, which sets its threshold voltage to the lowest voltage window). For n bits/cell flash memory, the goal of memory programming is to move the memory cell threshold voltage into one of 2^n non-overlapping storage levels that are separated from each other by a certain noise margin. However, the memory cell storage noise margin can be seriously degraded in practice, mainly due to P/E cycling effects and cell-to-cell interference, which are discussed below.

Effects of P/E cycling

Flash memory P/E cycling causes damage to the tunnel oxide of floating gate transistors in the form of charge traps in the oxide and interface states [7–9], which directly results in memory cell threshold voltage shift and fluctuation and hence degrades the memory device noise margin. Let N denote the number of P/E cycles that memory cells have gone through and ΔN_trap denote the density growth of either interface or oxide traps. We can approximately quantify the relation between interface/oxide trap generation and the number of P/E cycles as

\Delta N_{\text{trap}} = A \cdot N^{a},
(1)

where A is a constant factor fitted from measurements. Such a power–law relationship can be explained by the widely accepted reaction–diffusion model in negative bias temperature instability [10, 11] and the scattering-induced diffusion model [12]. Those gradually accumulated traps result in two major types of noises:

  1. Electron capture and emission events at charge trap sites near the interface, developed over P/E cycling, directly cause random fluctuation of the memory cell threshold voltage, which is referred to as random telegraph noise (RTN) [13, 14];

  2. Interface state trap recovery and electron detrapping [12, 15] gradually reduce the memory cell threshold voltage, which limits data retention. This is referred to as data retention noise.

Since the significance of these noises grows with the trap density and trap density grows with P/E cycling, NAND flash memory cell noise margin monotonically degrades with P/E cycling. This leads to the NAND flash memory P/E cycling endurance limit, beyond which memory cell noise margin degradation can no longer be accommodated by the memory system fault tolerance capability.
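The sub-linear power-law growth of Eq. (1) can be illustrated with a short calculation; the fit constants A and a below are hypothetical placeholders, not measured values.

```python
# Trap density growth per Eq. (1): delta_N_trap = A * N^a.
# A and a are hypothetical fit constants, for illustration only.
A, a = 1.0e-3, 0.62

def trap_density_growth(n_cycles):
    """Incremental interface/oxide trap density after n_cycles P/E cycles."""
    return A * n_cycles ** a

# A 10x increase in cycling raises the trap density only by
# 10^0.62 ~ 4.2x, i.e., wear-out accumulates sub-linearly.
ratio = trap_density_growth(10_000) / trap_density_growth(1_000)
```

This is why the noise margin degrades steadily but more and more slowly per cycle as the memory ages.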

Cell-to-cell interference

In NAND flash memory, the threshold voltage shift of one floating gate transistor can influence the threshold voltage of its neighboring floating gate transistors through a parasitic capacitance-coupling effect [16]. This is referred to as cell-to-cell interference, which has been well recognized as one of the major noise sources in NAND flash memory [17–19]. The threshold voltage shift of a victim cell caused by cell-to-cell interference can be estimated as [16]

F = \sum_{k} \left( \Delta V_{t}^{(k)} \cdot \gamma^{(k)} \right),
(2)

where ΔV_t^(k) represents the threshold voltage shift of one interfering cell that is programmed after the victim cell, and the coupling ratio γ^(k) is defined as

\gamma^{(k)} = \frac{C^{(k)}}{C_{\text{total}}},
(3)

where C^(k) is the parasitic capacitance between the interfering cell and the victim cell, and C_total is the total capacitance of the victim cell.
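A direct evaluation of Eqs. (2) and (3) can be sketched as follows; the aggressor shifts and capacitance values are made-up numbers for illustration only.

```python
# Victim-cell threshold-voltage shift per Eqs. (2)-(3):
# F = sum_k dVt_k * gamma_k, with gamma_k = C_k / C_total.
def interference_shift(neighbor_shifts, capacitances, c_total):
    gammas = [c_k / c_total for c_k in capacitances]               # Eq. (3)
    return sum(dv * g for dv, g in zip(neighbor_shifts, gammas))   # Eq. (2)

# One vertical neighbor and two diagonal neighbors, with hypothetical
# threshold-voltage shifts and parasitic coupling capacitances:
delta_vt = [2.0, 1.5, 1.5]       # aggressor threshold-voltage shifts
caps = [0.08, 0.006, 0.006]      # parasitic capacitances, arbitrary units
F = interference_shift(delta_vt, caps, c_total=1.0)
```

With these numbers the victim cell shifts by 2.0·0.08 + 2·(1.5·0.006) = 0.178, dominated by the vertical neighbor.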

Use of soft-decision ECC in NAND flash memory

As technology continues to scale down, NAND flash memory cell storage distortion and noise sources become increasingly significant, leading to continuous degradation of memory raw storage reliability. As a result, the industry has very actively been pursuing the transition of ECC from conventional BCH codes to more powerful soft-decision iterative coding solutions, in particular LDPC codes. Nevertheless, since NAND flash memory sensing latency is linearly proportional to the number of sensing quantization levels and the sensing results must be transferred to the memory controller through standard chip-to-chip links, a straightforward use of soft-decision ECC in NAND flash memory can result in significant memory read latency overhead.

The linear dependency of memory sensing latency on the number of sensing quantization levels is caused by the underlying NAND flash memory structure. NAND flash memory cells are organized in an array→block→page hierarchy: one NAND flash memory array is partitioned into blocks, and each block contains a number of pages. Within one block, each memory cell string typically contains 32 to 128 memory cells, and all the memory cells driven by the same word-line are programmed and sensed at the same time. All the memory cells within the same block must be erased at the same time. Data are programmed and fetched in units of a page, where the page size ranges from 512 bytes to 8 kBytes of user data. As illustrated in Figure 1, all the memory cell blocks share the same set of bit-lines and an on-chip page buffer that contains the sensing circuitry and holds the data being programmed or fetched. When we read one memory page with m-level sensing quantization, the word-line voltage consecutively sweeps through the m different sensing quantization levels, and all the bit-lines are charged and discharged once for every sensing quantization level. This clearly shows the linear dependency of memory sensing latency on the number of sensing quantization levels.

Figure 1

Illustration of NAND flash memory sensing. (a) Illustration of NAND flash memory structure; (b) illustration of word-line voltage sweeping for memory sensing, where all the bit-lines are charged and discharged once for each word-line voltage point.
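The linear latency dependence described above can be captured by a toy timing model; the per-level and fixed latency constants below are illustrative placeholders, not datasheet values.

```python
# Page-read sensing latency: each quantization level costs one word-line
# voltage step plus one bit-line charge/discharge pass, so latency grows
# linearly with the level count. Timing constants are illustrative only.
def sense_latency_us(num_levels, t_level_us=8.0, t_fixed_us=10.0):
    return t_fixed_us + num_levels * t_level_us

hard = sense_latency_us(3)    # 2 bits/cell hard decision: l - 1 = 3 levels
soft = sense_latency_us(15)   # finer soft-decision sensing: 15 levels
```

Under these assumed constants, soft-decision sensing takes roughly 3.8 times longer than hard-decision sensing, before even counting the larger flash-to-controller transfer.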

Reducing LDPC-induced latency overhead

Progressive-precision LDPC decoding

From the discussions in “Basics of NAND flash memory physics” section, it is clear that NAND flash memory cell raw storage reliability gradually degrades with the P/E cycling: During the early lifetime of memory cells (i.e., the P/E cycling number N is relatively small), the aggregated P/E cycling effects are relatively less significant, which leads to a relatively large memory cell storage noise margin and hence good raw storage reliability (i.e., low raw storage bit error rate); since the aggregated P/E cycling effects scale with N in approximate power–law fashions, the memory cell storage noise margin and hence raw storage reliability gradually degrade as the P/E cycling number N increases. As a result, it is sufficient for ECC to provide gradually stronger error correction capability throughout the entire NAND flash memory lifetime.

Meanwhile, the error correction capability of LDPC code decoding gradually improves as we increase the memory sensing precision. If NAND flash memory uses conventional hard-decision memory sensing (i.e., there are only l − 1 sensing quantization levels for l-level per cell NAND flash memory), the LDPC decoder can only carry out hard-decision decoding (e.g., using the hard-decision bit-flipping decoding algorithm) and achieves relatively poor error correction capability. As NAND flash memory uses soft-decision memory sensing with finer and finer quantization granularity, the LDPC decoder can carry out soft-decision decoding (e.g., using the sum–product or min–sum decoding algorithm) and achieves stronger and stronger error correction capability.

Intuitively, the above discussion suggests that a simple progressive-precision LDPC decoding strategy can reduce the memory sensing latency and flash-to-controller data transfer latency caused by the use of LDPC codes. As illustrated in Figure 2, this straightforward design strategy aims to use just-enough sensing precision for LDPC code decoding in a trial-and-error manner. As discussed above, NAND flash memory raw storage reliability gradually degrades with P/E cycling; hence, fine-grained sensing may only be necessary as flash memory approaches its end of lifetime, and low-overhead coarse-grained sensing (or even hard-decision sensing) could be sufficient during early memory lifetime. In addition, since ECC must ensure an extremely low page error rate (e.g., 10^−12 and below) for data storage in NAND flash memory, low-overhead coarse-grained sensing (or even hard-decision sensing) may already achieve a reasonably low error rate (e.g., 10^−4), which clearly justifies the progressive-precision sensing and decoding strategy, especially for applications that are not very sensitive to read latency variability.

Figure 2

Illustration of operational flow of progressive-precision LDPC code decoding.
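The operational flow of Figure 2 can be sketched as below; `sense` and `decode` are placeholder callables standing in for the real memory sensing and LDPC decoding steps, and the precision ladder is a hypothetical example.

```python
# Progressive-precision read: try hard-decision sensing first, and only
# re-sense with finer quantization when LDPC decoding fails.
def progressive_read(sense, decode, precisions=(1, 2, 3, 4)):
    """precisions: sensing precisions (in bits) tried in increasing order;
    precision 1 corresponds to hard-decision sensing and decoding."""
    for bits in precisions:
        readout = sense(bits)             # on-chip sensing + data transfer
        ok, data = decode(readout, bits)  # bit-flipping or min-sum decoding
        if ok:
            return data, bits             # decoded with just-enough precision
    raise RuntimeError("decoding failed at the maximum sensing precision")

# Toy stand-ins: decoding only "succeeds" once sensing reaches 3 bits.
data, used_bits = progressive_read(
    sense=lambda bits: "readout",
    decode=lambda readout, bits: (bits >= 3, "page" if bits >= 3 else None),
)
```

Early in the memory lifetime the first (hard-decision) attempt almost always succeeds, so the average read latency stays close to that of conventional BCH-based designs.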

Reducing hard-decision LDPC decoding failure probability

Under the above design framework of the progressive-precision LDPC code decoding, it is highly desirable to maximize the utilization efficiency of hard-decision LDPC code decoding (i.e., to minimize the hard-decision LDPC code decoding failure probability) in order to reduce the overall latency overhead. In this study, we propose a design method that can reduce hard-decision LDPC code decoding failure probability by adaptively configuring LDPC code parity check matrix construction throughout the entire flash memory P/E cycling lifetime. Because their inherent structural regularity can greatly facilitate efficient decoder silicon implementation, quasi-cyclic LDPC (QC-LDPC) codes have widely been studied and adopted in real-life applications. Therefore, we are only interested in the use of QC-LDPC codes in NAND flash memory. The parity check matrix H of a QC-LDPC code can be written as

H = \begin{bmatrix} H_{1,1} & H_{1,2} & \cdots & H_{1,n} \\ H_{2,1} & H_{2,2} & \cdots & H_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ H_{m,1} & H_{m,2} & \cdots & H_{m,n} \end{bmatrix},

where each sub-matrix Hi,j is a circulant matrix in which each row is a cyclic shift of the row above. The column weight (or row weight) of each circulant Hi,j can be 0, 1, or 2.
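A parity-check matrix of this form can be assembled from cyclically shifted identity circulants; the grid of shift values and the small circulant size below are arbitrary choices for illustration, not one of the codes used in this article.

```python
import numpy as np

def circulant(size, shift):
    """size x size identity cyclically shifted by `shift` (column weight 1)."""
    return np.roll(np.eye(size, dtype=int), shift, axis=1)

def qc_ldpc_parity_matrix(shifts, size):
    """Build H from an m x n grid of cyclic-shift values (None = zero block)."""
    block_rows = []
    for row in shifts:
        blocks = [np.zeros((size, size), dtype=int) if s is None
                  else circulant(size, s) for s in row]
        block_rows.append(np.hstack(blocks))
    return np.vstack(block_rows)

# A 2 x 3 grid of size-4 circulants gives an 8 x 12 parity-check matrix
# with column weight 2 and row weight 3.
H = qc_ldpc_parity_matrix([[0, 1, 2],
                           [1, 2, 0]], size=4)
```

Because every block is fully determined by its size and shift value, a decoder only needs those two run-time parameters per circulant to support many different codes, which is what makes the adaptive scheme below practical.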

It is well known that the LDPC code decoding performance spectrum contains two regions, the water-fall region and the error-floor region [20]: starting from the worst raw storage reliability with a very high raw bit error rate, as the raw storage reliability improves, the LDPC code decoding failure rate first drops rapidly like a water-fall; however, once the raw storage reliability improves beyond a certain threshold, the reduction slope of the decoding failure rate noticeably degrades, entering the so-called error-floor region. This performance spectrum is fundamentally subject to a trade-off between water-fall region performance and error-floor threshold: as illustrated in Figure 3, an LDPC code with relatively low column weight (e.g., 3 or 4) can achieve good soft-decision decoding performance within the water-fall region but tends to have a relatively worse error-floor threshold (i.e., it enters the error-floor region at a relatively high soft-decision decoding failure probability); on the other hand, an LDPC code with relatively high column weight (e.g., 5 or 6) has a relatively better error-floor threshold but achieves worse soft-decision decoding performance within the water-fall region. Hence, designers tend to choose the LDPC code that can satisfy the target page error rate with the least parity check matrix column weight. This accordingly maximizes the soft-decision decoding performance and hence tolerates worse raw storage reliability. In addition, since decoding computational complexity is proportional to the number of 1's in the code parity check matrix, the use of low-weight LDPC codes can also reduce the decoder implementation silicon cost.

Figure 3

Illustration of comparison between soft-decision and hard-decision decoding performance of LDPC codes with different column weight.

Contrary to the soft-decision scenario, a low-weight LDPC code tends to have worse hard-decision decoding performance than a high-column-weight code, as illustrated in Figure 3 and further demonstrated in the “Case studies” section. This observation directly motivates us to adaptively change the LDPC code parity check matrix column weight throughout the entire NAND flash memory lifetime. The basic idea is to use the LDPC code that has the largest possible column weight while still meeting the target page error rate through soft-decision decoding at the present flash memory P/E cycling. It can be further described as follows: assume the NAND flash memory controller can support s different LDPC codes, C_1, C_2, …, C_s. Let w_i represent the column weight of the code C_i, ordered so that w_1 > w_2 > ⋯ > w_s. Let N_i denote the threshold P/E cycling number beyond which the soft-decision decoding of the code C_i can no longer satisfy the target page error rate. Since a lower column weight yields stronger soft-decision error correction capability, we have N_1 < N_2 < ⋯ < N_s. In order to reduce the hard-decision decoding failure rate, we should use the weight-w_i LDPC code C_i when the NAND flash memory cycling number N ∈ [N_{i−1}, N_i), where we set N_0 = 0. Figure 3 illustrates the scenario when we have three different LDPC codes.
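The adaptive selection rule can be sketched as follows. The column weights and P/E thresholds are hypothetical example values, and indexing the codes so that C_1 has the highest column weight (and hence the smallest cycling threshold) is an assumption of this sketch.

```python
import bisect

# Adaptive code selection: at P/E cycle count N, use the highest-weight
# code whose soft-decision decoding still meets the target page error
# rate. Weights and thresholds are hypothetical example values.
column_weights = [5, 4, 3]               # codes C1..C3, highest weight first
pe_thresholds = [3_000, 7_000, 12_000]   # N1 < N2 < N3

def select_code(n_cycles):
    """Return the index of the code to use at the current cycle count."""
    i = bisect.bisect_right(pe_thresholds, n_cycles)
    if i == len(pe_thresholds):
        raise RuntimeError("beyond the endurance limit of all codes")
    return i

w_early = column_weights[select_code(1_000)]   # early life
w_late = column_weights[select_code(10_000)]   # near end of life
```

Early in life the highest-weight (best hard-decision) code is chosen, and the controller falls back to progressively lower-weight codes only as wear-out forces it to.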

Compared with using a single fixed low-weight LDPC code throughout the entire lifetime of NAND flash memory, the proposed adaptive design method achieves better hard-decision decoding performance (and hence a lower hard-decision decoding failure rate) throughout the entire NAND flash memory lifetime. This directly reduces the average latency of on-chip memory sensing and flash-to-controller data transfer caused by the use of LDPC codes in NAND flash memory. Meanwhile, we note that such an adaptive design method leads to a higher silicon implementation cost of the flash memory controller, since the soft-decision and hard-decision decoders must be able to support different codes with different column weights. Since we only consider the use of QC-LDPC codes, it is sufficient for the decoder to support the maximum allowable column weight and run-time configuration of the circulant size and the cyclic shift value of each circulant. Most QC-LDPC decoder architectures reported in the open literature (e.g., see [21–24]) can readily support such configurability.

Case studies

For the purpose of quantitative evaluation, we develop a NAND flash memory device model that captures the major threshold voltage distortion sources. Based upon this model, we carry out simulations to demonstrate the proposed design method.

NAND flash device model

Erase and programming operation modeling

The threshold voltage of erased memory cells tends to have a wide Gaussian-like distribution [25], and we approximately model the threshold voltage distribution of the erased state as

p_e(x) = \frac{1}{\sigma_e \sqrt{2\pi}} \, e^{-\frac{(x-\mu_e)^2}{2\sigma_e^2}},
(4)

where μ_e and σ_e are the mean and standard deviation of the erased state. Regarding memory programming, tight threshold voltage control is typically realized by using incremental step pulse programming [4, 26], i.e., memory cells on the same word-line are recursively programmed. At older technology nodes (e.g., the 90-nm node), the threshold voltage of programmed states tends to have a uniform distribution [14]. Nevertheless, at highly scaled technology nodes (e.g., 65-nm and below), the statistical spread of electron injection [27] has become significant and tends to make the threshold voltage of programmed states more Gaussian-like. Hence, in this study we approximately model the distribution of a programmed state as

p_p(x) = \frac{1}{\sigma_p \sqrt{2\pi}} \, e^{-\frac{(x-\mu_p)^2}{2\sigma_p^2}},
(5)

where μ p and σ p are the mean and standard deviation of the programmed state right after programming.
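The two Gaussian models (4) and (5) can be sampled directly for Monte Carlo use; the normalized means and standard deviations below are placeholders in the spirit of the values quoted later in the “Experiments” section.

```python
import random

# Monte Carlo sampling of the erased/programmed threshold-voltage
# distributions of Eqs. (4)-(5). Parameter values are illustrative only.
def sample_state(mu, sigma, n, rng):
    return [rng.gauss(mu, sigma) for _ in range(n)]

rng = random.Random(0)
erased = sample_state(mu=1.4, sigma=0.35, n=100_000, rng=rng)      # Eq. (4)
programmed = sample_state(mu=2.8, sigma=0.05, n=100_000, rng=rng)  # Eq. (5)

mean_erased = sum(erased) / len(erased)
mean_programmed = sum(programmed) / len(programmed)
```

Note the much tighter spread of the programmed state, reflecting the recursive program-and-verify control.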

RTN

The probability of an RTN fluctuation decays exponentially with its magnitude. The probability density function p_r(x) of RTN-induced threshold voltage fluctuation is accordingly modeled as a symmetric exponential function [14]:

p_r(x) = \frac{1}{2\lambda_r} \, e^{-\frac{|x|}{\lambda_r}}.
(6)

Since the significance of RTN is proportional to the interface trap density, we model the mean RTN fluctuation magnitude, μ_RTN = λ_r, as approximately following

\mu_{\text{RTN}} = A_{\text{RTN}} \cdot N^{a_{\text{IT}}}.
(7)
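Sampling the RTN model of (6) and (7) can be sketched as below; A_RTN and the exponent follow the values quoted later in the “Experiments” section, while the cycle count and sample size are arbitrary.

```python
import random

# RTN fluctuation per Eqs. (6)-(7): symmetric exponential (Laplace)
# density whose mean magnitude lambda_r grows with P/E cycling.
A_RTN, a_IT = 2.72e-4, 0.62

def sample_rtn(n_cycles, rng):
    lam = A_RTN * n_cycles ** a_IT          # Eq. (7): scale grows with N
    magnitude = rng.expovariate(1.0 / lam)  # |x| is exponential, mean lam
    return magnitude if rng.random() < 0.5 else -magnitude

rng = random.Random(1)
samples = [sample_rtn(10_000, rng) for _ in range(200_000)]
mean_abs = sum(abs(s) for s in samples) / len(samples)
```

The empirical mean magnitude tracks A_RTN · N^{a_IT}, so RTN widens the threshold-voltage distribution more and more as the memory wears out.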

Retention process

Since interface trap recovery and electron detrapping processes tend to follow Poisson statistics [9], we approximately model the induced threshold voltage reduction as a Gaussian distribution, i.e., p_t(x) = N(μ_d, σ_d^2). As demonstrated in relevant device studies (e.g., see [9, 28]), the mean value of the threshold voltage shift scales approximately with ln(1 + t) over time. The mean retention shift is set to follow the sum of the interface trap and oxide trap contributions:

\mu_d = \left( A_t \cdot N^{a_{\text{IT}}} + B_t \cdot N^{a_{\text{OT}}} \right) \cdot \ln(1+t).
(8)

Moreover, the threshold voltage reduction induced by interface trap recovery and electron detrapping is also proportional to the initial threshold voltage magnitude [29], i.e., the higher the initial threshold voltage is, the faster interface trap recovery and electron detrapping occur and hence the larger the threshold voltage reduction will be. Hence, we scale the generated retention noise approximately by K_s(x − x_0), where x is the initial threshold voltage before retention, and x_0 and K_s are constants.
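Putting (8) and the initial-voltage scaling together gives the sketch below. The constants follow the values quoted in the “Experiments” section, but applying the K_s(x − x_0) factor as a direct multiplier on the mean is an assumption of this sketch.

```python
import math

# Mean retention-induced threshold-voltage shift: Eq. (8) scaled by
# K_s * (x - x0) for initial threshold voltage x (assumed form).
A_t, B_t = 3.5e-5, 2.35e-4
a_IT, a_OT = 0.62, 0.3
K_s, x0 = 0.333, 1.4

def mean_retention_shift(n_cycles, t, x_init):
    mu_d = (A_t * n_cycles ** a_IT + B_t * n_cycles ** a_OT) * math.log(1 + t)
    return mu_d * K_s * (x_init - x0)   # higher initial Vt -> larger loss

low = mean_retention_shift(5_000, 1.0e4, x_init=2.0)
high = mean_retention_shift(5_000, 1.0e4, x_init=3.0)
```

As expected from [29], the higher programmed level loses more charge over the same retention interval.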

Cell-to-cell interference

To capture the inevitable process variability in the cell-to-cell interference model, we set both the vertical coupling ratio γ_y and the diagonal coupling ratio γ_xy as random variables with a truncated Gaussian distribution:

p_c(x) = \begin{cases} \frac{c_c}{\sigma_c \sqrt{2\pi}} \, e^{-\frac{(x-\mu_c)^2}{2\sigma_c^2}}, & \text{if } |x-\mu_c| \leq w_c, \\ 0, & \text{otherwise}, \end{cases}
(9)

where μ_c and σ_c are the mean and standard deviation, and c_c is chosen to ensure that the integral of this bounded Gaussian distribution equals 1. According to [18], we set the ratio between the means of γ_x, γ_y, and γ_xy as 0.1:0.08:0.006.
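Coupling-ratio samples from the truncated Gaussian of (9) can be drawn by simple rejection sampling; the mean of γ_y and the w_c, σ_c relations follow the values used later in the “Experiments” section, and the sample count is arbitrary.

```python
import random

# Rejection sampling from the bounded Gaussian of Eq. (9): draw from the
# unbounded Gaussian and keep only samples within mu_c +/- w_c (the c_c
# normalization constant is implicit in the rejection step).
def truncated_gauss(mu, sigma, w, rng):
    while True:
        x = rng.gauss(mu, sigma)
        if abs(x - mu) <= w:
            return x

rng = random.Random(2)
mu_y = 0.08                            # mean vertical coupling ratio [18]
gamma_y = [truncated_gauss(mu_y, sigma=0.4 * mu_y, w=0.1 * mu_y, rng=rng)
           for _ in range(10_000)]
```

Every accepted sample lies within ±10% of the mean coupling ratio, matching the w_c = 0.1μ_c bound.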

Overall device model

Based upon (4) and (5), we can obtain the threshold voltage distribution function p_p(x) right after the programming operation. The threshold voltage distribution after incorporating RTN, p_ar(x), is then

p_{ar}(x) = p_p(x) \otimes p_r(x),
(10)

where ⊗ denotes convolution.

After we further incorporate the cell-to-cell interference, we obtain the threshold voltage distribution p_ac(x). Let p_t(x) denote the distribution of the retention noise caused by interface state trap recovery and electron detrapping. The final threshold voltage distribution p_f(x) is obtained as

p_f(x) = p_{ac}(x) \otimes p_t(x).
(11)
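The distribution chain of (10) and (11) can be reproduced numerically with a discrete convolution. The grid and the component densities below (one programmed state, RTN, and a mean retention shift; the cell-to-cell interference term is folded in the same way and omitted here for brevity) are illustrative placeholders.

```python
import math

def convolve(p, q, dx):
    """Centred discrete convolution of two densities on the same grid."""
    n = len(p)
    out = [0.0] * n
    for i in range(n):
        acc = 0.0
        for j in range(n):
            k = i - j + n // 2
            if 0 <= k < n:
                acc += p[j] * q[k]
        out[i] = acc * dx
    return out

dx = 0.01
xs = [(i - 200) * dx for i in range(401)]          # grid over [-2, 2]
gauss = lambda x, mu, s: math.exp(-(x - mu) ** 2 / (2 * s * s)) / (s * math.sqrt(2 * math.pi))
laplace = lambda x, lam: math.exp(-abs(x) / lam) / (2 * lam)

p_p = [gauss(x, 0.0, 0.10) for x in xs]            # programmed state, Eq. (5)
p_r = [laplace(x, 0.05) for x in xs]               # RTN, Eq. (6)
p_t = [gauss(x, -0.10, 0.05) for x in xs]          # retention shift

p_ar = convolve(p_p, p_r, dx)                      # Eq. (10)
p_f = convolve(p_ar, p_t, dx)                      # Eq. (11)

mass = sum(p_f) * dx                               # total probability ~ 1
mean = sum(x * p for x, p in zip(xs, p_f)) * dx    # shifted by retention
```

The final density keeps (approximately) unit mass while its mean moves down by the retention shift and its spread widens with each convolved noise source, which is exactly how the storage noise margin erodes.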

Experiments

Based upon the above NAND flash memory device model, we carried out simulations to compare the error-correction performance of soft-decision and hard-decision LDPC code decoding and to demonstrate the effectiveness of the proposed adaptive design method. In this study, each LDPC codeword protects 2 kB of user data, and we construct three rate-8/9 QC-LDPC codes with column weights of 3, 4, and 5, respectively. The parity check matrices of these three codes contain 3×27, 4×36, and 5×45 circulants, respectively, where all the circulants have a column weight of 1 and are constructed randomly subject to the 4-cycle-free constraint. LDPC code soft-decision decoding employs the min–sum decoding algorithm [6], and hard-decision decoding employs the bit-flipping decoding algorithm [5].

Based upon the NAND flash memory device model described in the “NAND flash device model” section, we use a 2 bits/cell NAND flash memory with the following device parameters as a test vehicle. We set the normalized σ_e and μ_e of the erased state as 0.35 and 1.4, respectively. For the programmed states, we set the normalized program step voltage ΔV_pp as 0.2 and its deviation as 0.05. According to [12], the exponents for interface and oxide trap generation are estimated as a_IT = 0.62 and a_OT = 0.3, respectively. For RTN, we set A_RTN = 2.72×10^−4. The coupling strength factor is set as 1. As for the retention shift, we set σ_d = 0.3|μ_d|, A_t = 3.5×10^−5, and B_t = 2.35×10^−4, which are chosen to match the 70%:30% ratio of interface trap recovery and electron detrapping presented in [12]. Regarding the influence of the initial threshold voltage, we set x_0 = 1.4 and K_s = 0.333. We set w_c = 0.1μ_c and σ_c = 0.4μ_c. Accordingly, we carried out Monte Carlo simulations to evaluate the decoding failure rate statistics when using the different LDPC codes with both hard-decision decoding and soft-decision decoding. The simulation results, shown in Figure 4, clearly show that high-weight LDPC codes perform better than their low-weight counterparts under hard-decision decoding, which is completely opposite to the soft-decision scenario. Therefore, it is highly desirable to employ high-weight LDPC codes in the early lifetime of NAND flash memory in order to reduce the latency overhead.

Figure 4

Simulated decoding failure rate versus P/E cycling when using the three different QC-LDPC codes with different column weights.

We further carried out an ASIC design to evaluate the silicon overhead of implementing both soft-decision and hard-decision LDPC decoders. With RTL-level design entry in Verilog, we used the Synopsys tool set and a 65-nm CMOS standard cell library. The target decoding throughput is 2 Gbps. Both decoders carry out the decoding in a partially parallel manner and can be configured on the fly in terms of the cyclic shift value of each circulant, the circulant size, and the column weight. The soft-decision decoder architecture directly follows the one presented in [22], and the hard-decision decoder employs a similar architecture. All decoding messages in the soft-decision decoder have 4-bit precision. Table 1 summarizes the ASIC design results, which clearly show that the addition of a hard-decision decoder induces only a relatively small silicon overhead.

Table 1 LDPC decoder design results at 65-nm technology node

Conclusion

This article concerns the potentially significant latency overhead caused by the use of powerful soft-decision ECC, in particular LDPC codes, in future NAND flash memory. Although LDPC codes can achieve excellent error-correction capability, their soft-decision decoding nature directly results in significant latency overhead in terms of on-chip memory sensing and flash-to-controller data transfer. We propose a simple yet effective design technique that can reduce such latency overhead. Based upon an approximate NAND flash memory device model, we carried out simulations and the results clearly demonstrate the potential effectiveness of the proposed design solution.

References

  1. Blahut RE: Algebraic Codes for Data Transmission. Cambridge University Press, Cambridge; 2003.

  2. Gallager RG: Low-density parity-check codes. IRE Trans. Inf. Theory 1962, IT-8: 21-28.

  3. MacKay DJC: Good error-correcting codes based on very sparse matrices. IEEE Trans. Inf. Theory 1999, 45: 399-431. 10.1109/18.748992

  4. Bez R, Camerlenghi E, Modelli A, Visconti A: Introduction to flash memory. Proc. IEEE 2003, 91: 489-502. 10.1109/JPROC.2003.811702

  5. Kou Y, Lin S, Fossorier MPC: Low-density parity-check codes based on finite geometries: a rediscovery and new results. IEEE Trans. Inf. Theory 2001, 47: 2711-2736. 10.1109/18.959255

  6. Chen J, Dholakia A, Eleftheriou E, Fossorier M, Hu X-Y: Reduced-complexity decoding of LDPC codes. IEEE Trans. Commun. 2005, 53: 1288-1299. 10.1109/TCOMM.2005.852852

  7. Olivo P, Ricco B, Sangiorgi E: High-field-induced voltage-dependent oxide charge. Appl. Phys. Lett. 1986, 48: 1135. 10.1063/1.96448

  8. Cappelletti P, Bez R, Cantarelli D, Fratin L: Failure mechanisms of flash cell in program/erase cycling. In International Electron Devices Meeting, San Francisco; 1994: 291-294.

  9. Mielke N, Belgal H, Kalastirsky I, Kalavade P, Kurtz A, Meng Q, Righos N, Wu J: Flash EEPROM threshold instabilities due to charge trapping during program/erase cycling. IEEE Trans. Dev. Mater. Reliab. 2004, 4(3): 335-344. 10.1109/TDMR.2004.836721

  10. Yang JB, Chen TP, Tan SS, Chan L: Analytical reaction-diffusion model for the modeling of nitrogen-enhanced negative bias temperature instability. Appl. Phys. Lett. 2006, 88: 172109.

  11. Ogawa S, Shiono N: Generalized diffusion-reaction model for the low-field charge-buildup instability at the Si-SiO2 interface. Phys. Rev. B 1995, 51: 4218-4230. 10.1103/PhysRevB.51.4218

  12. Yang H, Kim H, Park J, Kim S, Lee S, Choi J, Hwang D, Kim C, Park M, Lee K, Park Y, Shin J, Kong J: Reliability issues and models of sub-90nm NAND flash memory cells. In International Conference on Solid-State and Integrated Circuit Technology, Shanghai; 2006: 760-762.

  13. Fukuda K, Shimizu Y, Amemiya K, Kamoshida M, Hu C: Random telegraph noise in flash memories: model and technology scaling. In IEEE International Electron Devices Meeting, Washington; 2007: 169-172.

  14. Compagnoni C, Ghidotti M, Lacaita A, Spinelli A, Visconti A: Random telegraph noise effect on the programmed threshold-voltage distribution of flash memories. IEEE Electron Dev. Lett. 2009, 30(9): 984-986.

  15. Mielke N, Belgal H, Fazio A, Meng Q, Righos N: Recovery effects in the distributed cycling of flash memories. In Proc. of the IEEE International Reliability Physics Symposium, San Jose; 2006: 29-35.

  16. Lee J-D, Hur S-H, Choi J-D: Effects of floating-gate interference on NAND flash memory cell operation. IEEE Electron Dev. Lett. 2002, 23(5): 264-266.

  17. Kim K: Future memory technology: challenges and opportunities. In Proc. of the International Symposium on VLSI Technology, Systems and Applications, Hsinchu; 2008: 5-9.

  18. Prall K: Scaling non-volatile memory below 30nm. In IEEE 22nd Non-Volatile Semiconductor Memory Workshop, Monterey; 2007: 5-10.

  19. Liu H, Groothuis S, Mouli C, Li J, Parat K, Krishnamohan T: 3D simulation study of cell-cell interference in advanced NAND flash memory. In Proc. of the IEEE Workshop on Microelectronics and Electron Devices, Boise; 2009: 1-3.

  20. Richardson T: Error floors of LDPC codes. In Proc. of the 41st Allerton Conference on Communications, Control and Computing 2003, 41(3): 1426-1435.

  21. Mansour M, Shanbhag NR: A 640-Mb/s 2048-bit programmable LDPC decoder chip. IEEE J. Solid-State Circuits 2006, 41(3): 684-698. 10.1109/JSSC.2005.864133

  22. Zhong H, Xu W, Xie N, Zhang T: Area-efficient min-sum decoder design for high-rate quasi-cyclic low-density parity-check codes in magnetic recording. IEEE Trans. Magn. 2007, 43: 4117-4122.

    Article  Google Scholar 

  23. 23.

    Dai Y, Yan Z, Chen N: Memory-efficient and high-throughput decoding of quasi-cyclic ldpc codes. IEEE Trans. Commun 2009, 57(4):879-883.

    Article  Google Scholar 

  24. 24.

    Zhang K, Huang X, Wang Z: A high-throughput ldpc decoder architecture with rate compatibility. IEEE Trans. Circuits Syst. I: Regular Papers 2011, 58(4):839-847.

    MathSciNet  Article  Google Scholar 

  25. 25.

    Takeuchi K, Tanaka T, Nakamura H: A double-level-Vth select gate array architecture for multilevel NAND flash memories. IEEE J. Solid-State Circuits 1996, 31(4):602-609. 10.1109/4.499738

    Article  Google Scholar 

  26. 26.

    Suh K-D, Su B, Lim Y, Kim J, Choi Y, Koh Y, Lee S, Kwon S, Choi B, Yum J, Choi J, Kim J, Lim H: A 3.3 V 32 Mb NAND flash memory with incremental step pulse programming scheme. IEEE J. Solid State Circuits 1995, 30(11):1149-1156. 10.1109/4.475701

    Article  Google Scholar 

  27. 27.

    Compagnoni C, Spinelli A, Gusmeroli R, Lacaita A, Beltrami S, Ghetti A, Visconti A: First evidence for injection statistics accuracy limitations in NAND Flash constant-current Fowler-Nordheim programming. IEEE International Electron Devices Meeting 2007, 165-168.

    Google Scholar 

  28. 28.

    Compagnoni CM, Miccoli C, Mottadelli R, Beltrami S, Ghidotti M, Lacaita AL, Spinelli AS, Visconti A: Investigation of the threshold voltage instability after distributed cycling in nanoscale NAND flash memory arrays. In IEEE International Reliability Physics Symposium (IRPS). Anaheim; 2010:604-610.

    Google Scholar 

  29. 29.

    Lee J, Choi J, Park D, Kim K, Center R, Co S, Gyunggi-Do S: Effects of interface trap generation and annihilation on the data retention characteristics of flash memory cells. IEEE Trans. Dev. Mater. Reliab 2004, 4(1):110-117. 10.1109/TDMR.2004.824360

    Article  Google Scholar 

Acknowledgement

This research was funded in part by grants from the National Natural Science Foundation of China (No. 61274028), the National High-tech R&D Program of China (No. 2011AA010405), and NSF grants CNS-1162152 and CCF-0937794. The authors are also grateful to the anonymous reviewers for their valuable and constructive comments.

Author information

Corresponding author

Correspondence to Wenzhe Zhao.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Zhao, W., Dong, G., Sun, H. et al. Reducing latency overhead caused by using LDPC codes in NAND flash memory. EURASIP J. Adv. Signal Process. 2012, 203 (2012). https://doi.org/10.1186/1687-6180-2012-203

Keywords

  • NAND flash memory
  • LDPC code
  • Hard-decision decoding