 Research
 Open Access
Reducing latency overhead caused by using LDPC codes in NAND flash memory
EURASIP Journal on Advances in Signal Processing volume 2012, Article number: 203 (2012)
Abstract
Semiconductor technology scaling makes NAND flash memory subject to continuous raw storage reliability degradation, creating demand for increasingly powerful error correction codes. This inevitable trend makes conventional BCH codes increasingly inadequate, and iterative coding solutions such as low-density parity-check (LDPC) codes become very natural alternatives. However, fine-grained soft-decision memory sensing must be used in order to fully leverage the strong error correction capability of LDPC codes, which results in significant data access latency overhead. This article presents a simple design technique that can reduce such latency overhead. The key is to cohesively exploit the NAND flash memory wear-out dynamics and the impact of LDPC code structure on decoding performance. Based upon detailed memory device modeling and ASIC design, we carried out simulations to demonstrate the potential effectiveness of this design method and evaluate the involved trade-offs.
Introduction
Solid-state storage systems based upon NAND flash memory technology must use error correction codes (ECCs) to ensure system-level data storage integrity. In current design practice, BCH codes with classical hard-decision decoding algorithms [1] are widely used. As the semiconductor industry continues to push the technology scaling envelope and pursue aggressive use of multi-level per cell storage, the raw storage reliability of NAND flash memory continues to degrade, which quickly makes current design practice inadequate and hence naturally demands more powerful ECCs. Because of their well-proven error correction capability with reasonably low decoding complexity and their recent success in hard disk drives, low-density parity-check (LDPC) codes [2, 3] have attracted much attention for application in NAND flash memory. To maximize their error correction capability, LDPC codes demand that NAND flash memory carry out finer-grained (i.e., soft-decision) sensing. As a result, straightforward use of LDPC codes tends to significantly degrade the overall data storage system read response latency. In particular, since NAND flash memory on-chip sensing latency is linearly proportional to the sensing quantization granularity, soft-decision memory sensing will largely increase on-chip memory sensing latency compared with current practice. Meanwhile, in the presence of soft-decision memory sensing, the flash-to-controller data transfer latency will accordingly increase. Since read response latency is a very critical metric in many data storage systems, it is highly desirable to reduce the read latency overhead caused by the use of LDPC codes in NAND flash memory.
In this article, we present a simple design technique to reduce the latency overhead caused by the use of LDPC codes. First, we note that NAND flash memory cells gradually wear out with program/erase (P/E) cycling [4], which is reflected as a gradually diminishing memory cell storage noise margin (or an increasing raw storage bit error rate). To meet a specified P/E cycling endurance limit, LDPC code decoding with the maximum allowable soft-decision memory sensing precision should be able to tolerate the worst-case raw storage reliability at the end of the memory lifetime. Clearly, the memory cell wear-out dynamics makes the maximally achievable error correction capability of LDPC codes largely more-than-enough over the entire lifetime of the memory, especially in its early lifetime when the P/E cycling number is relatively small. Meanwhile, the error correction capability of LDPC codes strongly depends on the soft-decision input precision. Therefore, it is very straightforward to apply a progressive-precision LDPC decoding strategy to reduce the average latency, i.e., we always start with hard-decision memory sensing (and hence hard-decision LDPC code decoding), and only if LDPC decoding fails, we progressively increase the sensing precision and retry the decoding until LDPC decoding succeeds.
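As a concrete sketch, this strategy amounts to a retry loop over increasing sensing precisions. The Python sketch below is illustrative only; `sense`, `hard_decode`, and `soft_decode` are hypothetical placeholders standing in for on-chip memory sensing and the controller's LDPC decoders, not interfaces defined in this article.

```python
# Sketch of the progressive-precision read strategy, assuming hypothetical
# sense()/hard_decode()/soft_decode() callables supplied by the caller.

def progressive_read(page, precisions, sense, hard_decode, soft_decode):
    """Retry decoding at progressively finer sensing precisions.

    `precisions` lists the sensing quantization levels in increasing order;
    the first entry corresponds to conventional hard-decision sensing.
    """
    # Start with hard-decision sensing and hard-decision decoding.
    data, ok = hard_decode(sense(page, levels=precisions[0]))
    if ok:
        return data
    # Only on failure, re-sense with finer quantization and retry
    # with soft-decision decoding.
    for levels in precisions[1:]:
        data, ok = soft_decode(sense(page, levels=levels))
        if ok:
            return data
    raise IOError("uncorrectable page: all sensing precisions exhausted")
```

In the common case (early lifetime, low raw bit error rate) the loop exits after the first, cheapest attempt, which is exactly where the average latency savings come from.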
Under such a straightforward progressive-precision decoding design framework, it is critical to minimize the hard-decision LDPC decoding failure rate in order to minimize the overall latency overhead. Hard-decision LDPC decoding employs the bit-flipping decoding algorithm [5], which in each iteration flips the hard decisions of those bits that participate in the largest number of unsatisfied parity checks. In contrast, soft-decision LDPC decoding employs the sum–product decoding algorithm [3], the min–sum decoding algorithm [6], or one of their many variants. These soft-decision decoding algorithms iteratively update the likelihood probability estimate of each bit. In the context of soft-decision decoding, it is well known that the decoding performance heavily depends on the column weight of the LDPC code parity check matrix, and low-weight LDPC codes tend to have stronger error correction capability than high-weight LDPC codes. However, in the context of hard-decision decoding based upon the bit-flipping algorithm, high-weight LDPC codes tend to outperform their low-weight counterparts, which is completely opposite to the soft-decision scenario. Under the straightforward progressive-precision decoding framework, this conflicting impact of parity check matrix column weight on error correction capability directly leads to a design dilemma: if we use low-weight LDPC codes to maximize the achievable error correction capability and hence maximize the tolerable worst-case raw storage reliability, the corresponding hard-decision decoding failure rate will be relatively higher, leading to a larger latency overhead. To address this design dilemma, instead of using the same LDPC code throughout the entire NAND flash memory lifetime, we propose to adaptively adjust the LDPC code parity check matrix column weight based upon the wear-out progress of the NAND flash memory.
In particular, given the raw storage reliability of the NAND flash memory, we always use the LDPC code that can meet the target soft-decision decoding failure rate and meanwhile has the highest column weight. As a result, we can always ensure that the hard-decision decoding failure rate is minimized throughout the entire lifetime of the NAND flash memory, leading to the minimal overall latency overhead.
Based upon NAND flash memory erase and programming characteristics, we derive mathematical formulations to approximately model the threshold voltage distribution of memory cells in the presence of the major NAND flash memory device noise and distortion sources. Using a hypothetical 2 bits/cell NAND flash memory and rate-8/9 LDPC codes with different column weights, we carry out extensive computer simulations to evaluate the effectiveness of the proposed design technique. In addition, we carried out an LDPC decoder ASIC design at the 65-nm technology node to evaluate and compare the hard-decision and soft-decision LDPC decoder silicon cost.
Basics and background
Basics of NAND flash memory physics
Each NAND flash memory cell is a floating gate transistor whose threshold voltage can be programmed by injecting a certain amount of charge into the floating gate. Before a flash memory cell can be programmed, it must be erased (i.e., all charge is removed from the floating gate, which sets its threshold voltage to the lowest voltage window). For n bits/cell flash memory, the goal of memory programming is to move the memory cell threshold voltage into one of 2^{n} non-overlapping storage levels that are separated from each other by a certain noise margin. However, the memory cell storage noise margin can be seriously degraded in practice, mainly due to P/E cycling effects and cell-to-cell interference, which are discussed below.
Effects of P/E cycling
Flash memory P/E cycling causes damage to the tunnel oxide of floating gate transistors in the form of charge traps in the oxide and interface states [7–9], which directly results in memory cell threshold voltage shift and fluctuation and hence degrades the memory device noise margin. Let N denote the number of P/E cycles that memory cells have gone through and ΔN_{trap} denote the density growth of either interface or oxide traps. We can approximately quantify the relation between interface/oxide trap generation and the number of P/E cycles as

$$\Delta N_{\text{trap}} = A \cdot N^{a},$$

where A is a constant factor fitted from measurements, and the exponent a likewise takes fitted values (one for interface traps and one for oxide traps). Such a power-law relationship can be explained by the widely accepted reaction–diffusion model in negative bias temperature instability [10, 11] and the scattering-induced diffusion model [12]. Those gradually accumulated traps result in two major types of noise:
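The power-law trap growth above can be illustrated numerically; in the sketch below, A = 1 is an arbitrary placeholder for the fitted constant, while the exponents 0.62 and 0.3 are the interface- and oxide-trap values quoted later in the “Experiments” section.

```python
# Illustrative power-law trap-growth model: delta_N_trap = A * N**a.
# A = 1.0 is a placeholder; exponents 0.62 (interface traps) and 0.3
# (oxide traps) are the values used in the "Experiments" section.

def trap_density_growth(n_cycles, A, exponent):
    """Approximate trap-density increase after n_cycles P/E cycles."""
    return A * n_cycles ** exponent

# Interface traps accumulate faster than oxide traps with cycling,
# and both grow monotonically, so the noise margin only degrades.
interface = trap_density_growth(10_000, A=1.0, exponent=0.62)
oxide = trap_density_growth(10_000, A=1.0, exponent=0.3)
```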

1.
Electron capture and emission events at charge trap sites near the interface, developed over P/E cycling, directly result in random fluctuation of the memory cell threshold voltage, which is referred to as random telegraph noise (RTN) [13, 14];

2.
Interface state trap recovery and electron detrapping [12, 15] gradually reduce memory cell threshold voltage, leading to the data retention limitation. This is referred to as data retention noise.
Since the significance of these noises grows with the trap density and trap density grows with P/E cycling, NAND flash memory cell noise margin monotonically degrades with P/E cycling. This leads to the NAND flash memory P/E cycling endurance limit, beyond which memory cell noise margin degradation can no longer be accommodated by the memory system fault tolerance capability.
Celltocell interference
In NAND flash memory, the threshold voltage shift of one floating gate transistor can influence the threshold voltage of its neighboring floating gate transistors through the parasitic capacitance-coupling effect [16]. This is referred to as cell-to-cell interference, which has been well recognized as one of the major noise sources in NAND flash memory [17–19]. The threshold voltage shift of a victim cell caused by cell-to-cell interference can be estimated as [16]

$$\Delta V_{\text{victim}} = \sum_{k} \gamma^{(k)} \, \Delta V_{t}^{(k)},$$

where $\Delta V_{t}^{(k)}$ represents the threshold voltage shift of one interfering cell that is programmed after the victim cell, and the coupling ratio γ^{(k)} is defined as

$$\gamma^{(k)} = \frac{C^{(k)}}{C_{\text{total}}},$$

where C^{(k)} is the parasitic capacitance between the interfering cell and the victim cell, and C_{total} is the total capacitance of the victim cell.
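For illustration, this coupling estimate amounts to a capacitance-ratio-weighted sum over the neighbours programmed after the victim cell; the numeric values in the usage note below are arbitrary examples.

```python
# Sketch of the cell-to-cell interference estimate:
# delta_V_victim = sum_k gamma_k * delta_Vt_k, with gamma_k = C_k / C_total.

def interference_shift(neighbor_shifts, coupling_caps, c_total):
    """Victim-cell threshold-voltage shift from interfering neighbours."""
    return sum((c / c_total) * dv
               for dv, c in zip(neighbor_shifts, coupling_caps))
```

For example, two neighbours shifted by 1.0 and 0.5 (normalized) with coupling ratios 0.1 and 0.08 produce a victim shift of 0.1·1.0 + 0.08·0.5 = 0.14.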
Use of soft-decision ECC in NAND flash memory
As technology continues to scale down, NAND flash memory cell storage distortion and noise sources become increasingly significant, leading to continuous degradation of memory raw storage reliability. As a result, the industry has been very actively pursuing the transition of ECC from conventional BCH codes to more powerful soft-decision iterative coding solutions, in particular LDPC codes. Nevertheless, since NAND flash memory sensing latency is linearly proportional to the number of sensing quantization levels and the sensing results must be transferred to the memory controller through standard chip-to-chip links, a straightforward use of soft-decision ECC in NAND flash memory can result in significant memory read latency overhead.
The linear dependency of memory sensing latency on the number of sensing quantization levels is caused by the underlying NAND flash memory structure. NAND flash memory cells are organized in an array→block→page hierarchy, where one NAND flash memory array is partitioned into blocks, and each block contains a number of pages. Within one block, each memory cell string typically contains 32 to 128 memory cells, and all the memory cells driven by the same wordline are programmed and sensed at the same time. All the memory cells within the same block must be erased at the same time. Data are programmed and fetched in units of a page, where the page size ranges from 512 bytes to 8 kBytes of user data. As illustrated in Figure 1, all the memory cell blocks share the same set of bitlines and an on-chip page buffer that contains sensing circuitry and holds the data being programmed or fetched. When we read one memory page with m-level sensing quantization, the wordline voltage consecutively sweeps through the m different sensing quantization levels, and all the bitlines are charged and discharged once for every sensing quantization level. This clearly shows the linear dependency of memory sensing latency on the number of sensing quantization levels.
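A back-of-envelope model makes this latency scaling concrete. The per-level sensing time and per-byte transfer time below are hypothetical placeholders, not measured values from this article.

```python
# Rough read-latency model: sensing cost is linear in the number of
# sensing levels m (one bitline charge/discharge cycle per level), and
# transfer cost is linear in the amount of soft-decision data moved over
# the flash-to-controller link. t_level_us and t_byte_us are hypothetical.

def read_latency_us(m_levels, page_bytes, bits_per_cell_sensed,
                    t_level_us=10.0, t_byte_us=0.025):
    sensing = m_levels * t_level_us
    transfer = page_bytes * bits_per_cell_sensed * t_byte_us
    return sensing + transfer
```

Under this toy model, a fine-grained soft-decision read (many levels, several sensed bits per cell) is strictly slower than a hard-decision read of the same page, which is the overhead the progressive-precision strategy targets.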
Reducing LDPC-induced latency overhead
Progressive-precision LDPC decoding
From the discussions in the “Basics of NAND flash memory physics” section, it is clear that NAND flash memory cell raw storage reliability gradually degrades with P/E cycling: during the early lifetime of memory cells (i.e., when the P/E cycling number N is relatively small), the aggregated P/E cycling effects are relatively less significant, which leads to a relatively large memory cell storage noise margin and hence good raw storage reliability (i.e., a low raw storage bit error rate); since the aggregated P/E cycling effects scale with N in an approximate power-law fashion, the memory cell storage noise margin and hence raw storage reliability gradually degrade as the P/E cycling number N increases. As a result, it is sufficient for the ECC to provide gradually stronger error correction capability throughout the entire NAND flash memory lifetime.
Meanwhile, the error correction capability of LDPC code decoding gradually improves as we increase the memory sensing precision. If NAND flash memory uses conventional hard-decision memory sensing (i.e., there are only l − 1 sensing quantization levels for l-level per cell NAND flash memory), the LDPC code decoder can only carry out hard-decision decoding (e.g., using the hard-decision bit-flipping algorithm) and achieves relatively poor error correction capability. As NAND flash memory uses soft-decision memory sensing with finer and finer quantization granularity, the LDPC code decoder can carry out soft-decision decoding (e.g., using the sum–product or min–sum decoding algorithm) and achieves stronger and stronger error correction capability.
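For illustration, the hard-decision bit-flipping rule can be sketched in a few lines. This didactic version (dense list-of-lists H, flipping every bit tied at the maximum unsatisfied-check count) is not the decoder implementation evaluated later in the article.

```python
# Didactic bit-flipping decoder: each iteration flips the bits that
# participate in the largest number of unsatisfied parity checks.
# H is a small parity-check matrix given as a list of 0/1 rows.

def bit_flip_decode(H, word, max_iters=20):
    w = list(word)
    for _ in range(max_iters):
        # Syndrome: one bit per parity check.
        syndrome = [sum(h * b for h, b in zip(row, w)) % 2 for row in H]
        if not any(syndrome):
            return w, True  # all checks satisfied: decoding success
        # Count unsatisfied checks touching each bit.
        counts = [sum(row[j] for row, s in zip(H, syndrome) if s)
                  for j in range(len(w))]
        worst = max(counts)
        if worst == 0:
            break
        # Flip every bit tied at the maximum count.
        w = [b ^ (c == worst) for b, c in zip(w, counts)]
    return w, False
```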
The above discussion naturally suggests that we can use a simple progressive-precision LDPC decoding strategy to reduce the memory sensing latency and flash-to-controller data transfer latency caused by the use of LDPC codes. As illustrated in Figure 2, this straightforward design strategy aims to use just-enough sensing precision for LDPC code decoding in a trial-and-error manner. As discussed above, NAND flash memory raw storage reliability gradually degrades with P/E cycling; hence, fine-grained sensing may only be necessary as the flash memory approaches the end of its lifetime, and low-overhead coarse-grained sensing (and even hard-decision sensing) could be sufficient during the memory's early lifetime. In addition, since ECC must ensure an extremely low page error rate (e.g., 10^{−12} and below) for data storage in NAND flash memory, low-overhead coarse-grained sensing (and even hard-decision sensing) may already achieve a reasonably low decoding failure rate (e.g., 10^{−4}), which clearly makes the progressive-precision sensing and decoding strategy well justified, especially for applications that are not very sensitive to read latency variability.
Reducing hard-decision LDPC decoding failure probability
Under the above design framework of progressive-precision LDPC code decoding, it is highly desirable to maximize the utilization efficiency of hard-decision LDPC code decoding (i.e., to minimize the hard-decision LDPC code decoding failure probability) in order to reduce the overall latency overhead. In this study, we propose a design method that can reduce the hard-decision LDPC code decoding failure probability by adaptively configuring the LDPC code parity check matrix construction throughout the entire flash memory P/E cycling lifetime. Because their inherent structural regularity can greatly facilitate efficient decoder silicon implementation, quasi-cyclic LDPC (QC-LDPC) codes have been widely studied and adopted in real-life applications. Therefore, we are only interested in the use of QC-LDPC codes in NAND flash memory. The parity check matrix H of a QC-LDPC code can be written as

$$\mathbf{H} = \left[\begin{array}{cccc} \mathbf{H}_{1,1} & \mathbf{H}_{1,2} & \cdots & \mathbf{H}_{1,n} \\ \mathbf{H}_{2,1} & \mathbf{H}_{2,2} & \cdots & \mathbf{H}_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{H}_{m,1} & \mathbf{H}_{m,2} & \cdots & \mathbf{H}_{m,n} \end{array}\right],$$
where each submatrix H_{i,j} is a circulant matrix in which each row is a cyclic shift of the row above. The column weight (or row weight) of each circulant H_{i,j} can be 0, 1, or 2.
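As an illustration of this block-circulant structure, a small parity check matrix can be assembled from weight-1 circulants as follows; the block dimensions and shift values are arbitrary examples, not a code used in this article.

```python
# Assemble a QC-LDPC parity-check matrix from p x p circulants.
# shifts[i][j] >= 0 gives the cyclic shift of block H_{i,j};
# a shift of -1 denotes an all-zero block.

def circulant(p, shift):
    """p x p circulant of column weight 1: the identity cyclically shifted."""
    return [[1 if (r - shift) % p == c else 0 for c in range(p)]
            for r in range(p)]

def qc_ldpc_matrix(shifts, p):
    rows = []
    for block_row in shifts:
        blocks = [circulant(p, s) if s >= 0 else [[0] * p for _ in range(p)]
                  for s in block_row]
        for r in range(p):
            # Concatenate row r of every block in this block row.
            rows.append([v for blk in blocks for v in blk[r]])
    return rows
```

For example, `qc_ldpc_matrix([[0, 1], [2, -1]], p=3)` yields a 6×6 matrix whose first block row has row weight 2 (two circulants) and whose second has row weight 1 (one circulant plus a zero block).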
It is well known that the LDPC code decoding performance spectrum contains two regions (i.e., the waterfall region and the error-floor region [20]): starting from the worst raw storage reliability with a very high raw bit error rate, as we improve the raw storage reliability, the LDPC code decoding failure rate rapidly drops like a waterfall; however, once the raw storage reliability improves beyond a certain threshold, the reduction slope of the LDPC code decoding failure rate noticeably degrades, entering the so-called error-floor region. It is also well known that the LDPC code decoding performance spectrum is fundamentally subject to a trade-off between waterfall region performance and error-floor threshold: as illustrated in Figure 3, an LDPC code with relatively low column weight (e.g., 3 or 4) can achieve good soft-decision decoding performance within the waterfall region but tends to have a relatively worse error-floor threshold (i.e., it enters the error-floor region at a relatively high soft-decision decoding failure probability); on the other hand, an LDPC code with relatively high column weight (e.g., 5 or 6) has a relatively better error-floor threshold but achieves worse soft-decision decoding performance within the waterfall region. Straightforwardly, designers tend to choose the LDPC code that can satisfy the target page error rate with the smallest parity check matrix column weight. This maximizes the soft-decision decoding performance and hence tolerates worse raw storage reliability. In addition, since decoding computational complexity is proportional to the number of 1's in the code parity check matrix, the use of low-weight LDPC codes can also reduce the decoder implementation silicon cost.
In contrast to the scenario of soft-decision decoding, a low-weight LDPC code tends to have worse hard-decision decoding performance than a high-column-weight code. This is illustrated in Figure 3 and will be further demonstrated in the “Case studies” section. This observation directly motivates us to adaptively change the LDPC code parity check matrix column weight throughout the entire NAND flash memory lifetime. The basic idea is to use the LDPC code that has the largest possible column weight while still meeting the target page error rate through soft-decision decoding under the present flash memory P/E cycling. It can be further described as follows: assume the NAND flash memory controller can support s different LDPC codes, $\mathcal{C}_1, \mathcal{C}_2, \dots, \mathcal{C}_s$. Let w_{i} represent the column weight of the code $\mathcal{C}_i$, where w_{1} > w_{2} > ⋯ > w_{s}. Let N_{i} denote the threshold P/E cycling number beyond which the soft-decision decoding of the code $\mathcal{C}_i$ cannot satisfy the target page error rate. Based on the above discussions, we have N_{1} < N_{2} < ⋯ < N_{s}. In order to reduce the hard-decision decoding failure rate, we should use the weight-w_{i} LDPC code $\mathcal{C}_i$ when the NAND flash memory cycling number N ∈ [N_{i−1}, N_{i}), where we set N_{0} = 0; N_{s} corresponds to the memory's endurance limit. Figure 3 illustrates the scenario when we have three different LDPC codes.
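The selection rule above reduces to a threshold lookup on the current P/E cycle count. The threshold values in the usage example below are hypothetical.

```python
import bisect

# Adaptive code selection: each supported code C_i has an endurance
# threshold N_i (the P/E cycle count beyond which its soft-decision
# decoding misses the target page error rate), with N_1 < ... < N_s.
# Code C_i is used while the cycle count N lies in [N_{i-1}, N_i), N_0 = 0.

def select_code(n_cycles, thresholds):
    """Return the 1-based index i of the code with n_cycles in [N_{i-1}, N_i)."""
    i = bisect.bisect_right(thresholds, n_cycles)  # number of exhausted codes
    if i >= len(thresholds):
        raise RuntimeError("P/E endurance limit exceeded")
    return i + 1
```

With hypothetical thresholds `[3000, 7000, 10000]` for three codes, the first (highest-weight) code is used until 3000 cycles, the second until 7000, and the third until the endurance limit.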
Compared with using a single fixed low-weight LDPC code throughout the entire lifetime of the NAND flash memory, the proposed adaptive design method achieves better hard-decision decoding performance (hence a lower hard-decision decoding failure rate) throughout the entire NAND flash memory lifetime. This can directly reduce the average latency of on-chip memory sensing and flash-to-controller data transfer caused by the use of LDPC codes in NAND flash memory. Meanwhile, we note that such an adaptive design method will lead to a higher silicon implementation cost in the flash memory controller, since the soft-decision and hard-decision decoders must be able to support different codes with different column weights. Since we only consider the use of QC-LDPC codes, it is sufficient for the decoder to support the maximum allowable column weight and run-time configuration of the circulant size and the cyclic shift value of each circulant. Most QC-LDPC decoder architectures reported in the open literature (see, e.g., [21–24]) can readily support such configurability.
Case studies
For the purpose of quantitative evaluation, we develop a NAND flash memory device model that captures the major threshold voltage distortion sources. Based upon this model, we carry out simulations to demonstrate the proposed design method.
NAND flash device model
Erase and programming operation modeling
The threshold voltage of erased memory cells tends to have a wide Gaussian-like distribution [25], and we approximately model the threshold voltage distribution of the erased state as

$$p_e(x) = \frac{1}{\sigma_e \sqrt{2\pi}} \exp\left(-\frac{(x - \mu_e)^2}{2\sigma_e^2}\right),$$

where μ_{e} and σ_{e} are the mean and standard deviation of the erased state. Regarding memory programming, tight threshold voltage control is typically realized by using incremental step pulse programming [4, 26], i.e., memory cells on the same wordline are recursively programmed with a step-by-step increasing program voltage. At older technology nodes (e.g., the 90-nm node), the threshold voltage of programmed states tends to have a uniform distribution [14]. Nevertheless, at highly scaled technology nodes (e.g., 65-nm and below), the electron injection statistical spread [27] has become significant and tends to make the threshold voltage of programmed states more like a Gaussian distribution. Hence, in this study we approximately model the distribution of a programmed state as

$$p_p(x) = \frac{1}{\sigma_p \sqrt{2\pi}} \exp\left(-\frac{(x - \mu_p)^2}{2\sigma_p^2}\right),$$

where μ_{p} and σ_{p} are the mean and standard deviation of the programmed state right after programming.
RTN
The fluctuation magnitude of RTN is subject to exponential decay. The probability density function p_{r}(x) of the RTN-induced threshold voltage fluctuation is modeled as a symmetric exponential function [14]:

$$p_r(x) = \frac{\lambda_r}{2} \exp\left(-\lambda_r |x|\right).$$

Since the significance of RTN is proportional to the interface trap density, we model the mean RTN magnitude, i.e., $\mu_{\mathrm{RTN}} = 1/\lambda_r$, as approximately following

$$\mu_{\mathrm{RTN}} = A_{\mathrm{RTN}} \cdot N^{a_{\mathrm{IT}}},$$

where A_{RTN} is a constant and a_{IT} is the interface trap generation exponent.
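For illustration, RTN samples under this model can be drawn as a symmetric exponential whose mean magnitude grows with cycling. The constant 2.72×10^{−4} and exponent 0.62 are the values quoted in the “Experiments” section; the sampling scheme itself is an assumption of this sketch.

```python
import random

# Sample an RTN-induced threshold-voltage fluctuation: a symmetric
# exponential (Laplace-like) variable whose mean magnitude
# mu_RTN = 1/lambda_r grows with the P/E cycle count N.

def rtn_sample(n_cycles, rng, a_rtn=2.72e-4, exp_it=0.62):
    mean_mag = a_rtn * n_cycles ** exp_it      # mu_RTN = 1 / lambda_r
    magnitude = rng.expovariate(1.0 / mean_mag)  # |x| ~ Exp(lambda_r)
    return magnitude if rng.random() < 0.5 else -magnitude  # symmetric sign
```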
Retention process
Since the interface trap recovery and electron detrapping processes tend to follow Poisson statistics [9], we approximately model the induced threshold voltage reduction as a Gaussian distribution, i.e., $p_t(x) = \mathcal{N}(\mu_d, \sigma_d^2)$. As demonstrated in relevant device studies (see, e.g., [9, 28]), the mean value of the threshold voltage shift scales approximately with $\ln(1+t)$ over time. The mean value of the retention shift is set to follow the sum of the interface trap and oxide trap means:

$$\mu_d = \left(A_t \cdot N^{a_{\mathrm{IT}}} + B_t \cdot N^{a_{\mathrm{OT}}}\right) \cdot \ln(1+t).$$

Moreover, the significance of the threshold voltage reduction induced by interface trap recovery and electron detrapping is also proportional to the initial threshold voltage magnitude [29], i.e., the higher the initial threshold voltage, the faster interface trap recovery and electron detrapping occur, and hence the larger the threshold voltage reduction. Hence, we set the generated retention noise to scale approximately with K_{s}(x − x_{0}), where x is the initial threshold voltage before retention, and x_{0} and K_{s} are constants.
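Putting the pieces of this retention model together, a sketch of the mean shift is shown below. The constants are those quoted in the “Experiments” section, while the exact functional composition (multiplicative combination of the trap term, log-time term, and initial-voltage term) is an assumption of this illustration.

```python
import math

# Illustrative mean retention shift: power-law trap terms in N, a
# logarithmic dependence on retention time t, and scaling with how far
# the initial threshold voltage sits above x0.

def retention_shift_mean(vt_initial, n_cycles, t,
                         a_t=3.5e-5, b_t=2.35e-4, x0=1.4, k_s=0.333):
    traps = a_t * n_cycles ** 0.62 + b_t * n_cycles ** 0.3
    return k_s * max(vt_initial - x0, 0.0) * traps * math.log(1 + t)
```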
Celltocell interference
To capture the inevitable process variability in the cell-to-cell interference model, we set the coupling ratios (e.g., the vertical coupling ratio γ_{y} and the diagonal coupling ratio γ_{xy}) as random variables with a truncated Gaussian distribution:

$$p_c(x) = \begin{cases} \dfrac{c_c}{\sigma_c \sqrt{2\pi}} \exp\left(-\dfrac{(x - \mu_c)^2}{2\sigma_c^2}\right), & |x - \mu_c| \le w_c, \\ 0, & \text{otherwise}, \end{cases}$$

where μ_{c} and σ_{c} are the mean and standard deviation, w_{c} bounds the truncation window, and c_{c} is chosen to ensure that this bounded Gaussian distribution integrates to 1. According to [18], we set the ratio between the means of γ_{x}, γ_{y}, and γ_{xy} as 0.1:0.08:0.006.
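For illustration, such a truncated Gaussian can be sampled by simple rejection; the defaults σ_c = 0.4μ_c and w_c = 0.1μ_c follow the “Experiments” section, and the normalization constant c_c is implicit in rejection sampling.

```python
import random

# Rejection-sample a coupling ratio from a Gaussian N(mu_c, sigma_c^2)
# truncated to the window mu_c +/- w_c.

def sample_coupling_ratio(mu_c, rng, sigma_frac=0.4, width_frac=0.1):
    sigma_c, w_c = sigma_frac * mu_c, width_frac * mu_c
    while True:
        x = rng.gauss(mu_c, sigma_c)
        if abs(x - mu_c) <= w_c:  # keep only samples inside the window
            return x
```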
Overall device model
Based upon (4) and (5), we can obtain the threshold voltage distribution function p_{p}(x) right after the programming operation. We then obtain the threshold voltage distribution p_{ar}(x) after incorporating RTN as the convolution

$$p_{ar}(x) = \left(p_p \otimes p_r\right)(x) = \int_{-\infty}^{\infty} p_p(u)\, p_r(x-u)\, du.$$

After we incorporate the cell-to-cell interference, we have the threshold voltage distribution p_{ac}. Let p_{t}(x) denote the distribution of the retention noise caused by interface state trap recovery and electron detrapping. The final threshold voltage distribution p_{f} is obtained as

$$p_f(x) = \left(p_{ac} \otimes p_t\right)(x) = \int_{-\infty}^{\infty} p_{ac}(u)\, p_t(x-u)\, du.$$
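Numerically, incorporating an independent noise term corresponds to convolving the current threshold voltage density with the noise density, e.g., on a uniform grid:

```python
# Discrete approximation of the convolution of two probability densities
# sampled on a uniform grid with spacing dx.

def convolve_pdfs(p, q, dx):
    """Return samples of (p (*) q) on the combined grid."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj * dx  # Riemann-sum weight for the integral
    return out
```

Convolving two properly normalized densities yields another density, so the result still integrates to 1 (up to discretization error).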
Experiments
Based upon the above NAND flash memory device model, we carried out simulations to compare the error-correction performance of soft-decision and hard-decision LDPC code decoding and to demonstrate the effectiveness of the proposed adaptive design method. In this study, each LDPC codeword protects 2 kB of user data, and we construct three rate-8/9 QC-LDPC codes with column weights of 3, 4, and 5, respectively. The code parity check matrices of these three codes contain 3×27, 4×36, and 5×45 circulants, respectively, where all the circulants have a column weight of 1 and are constructed randomly subject to the 4-cycle-free constraint. LDPC code soft-decision decoding employs the min–sum decoding algorithm [6], and hard-decision decoding employs the bit-flipping decoding algorithm [5].
Based upon the NAND flash memory device model described in the “NAND flash device model” section, we use a 2 bits/cell NAND flash memory with the following device parameters as a test vehicle. We set the normalized σ_{e} and μ_{e} of the erased state to 0.35 and 1.4, respectively. For the programmed states, we set the normalized program step voltage ΔV_{pp} to 0.2, and its deviation to 0.05. According to [12], the exponents for interface and oxide trap generation are estimated as a_{IT} = 0.62 and a_{OT} = 0.3, respectively. For RTN, we set A_{RTN} = 2.72×10^{−4}. The coupling strength factor is set to 1. As for the retention shift, we set σ_{d} = 0.3μ_{d}, A_{t} = 3.5×10^{−5}, and B_{t} = 2.35×10^{−4}, which are chosen to match the 70%:30% ratio of interface trap recovery and electron detrapping presented in [12]. Regarding the influence of the initial threshold voltage, we set x_{0} = 1.4 and K_{s} = 0.333. We set w_{c} = 0.1μ_{c} and σ_{c} = 0.4μ_{c}. Accordingly, we carried out Monte Carlo simulations to evaluate the decoding failure rate statistics when using the different LDPC codes with both hard-decision and soft-decision decoding. The simulation results, shown in Figure 4, clearly show that high-weight LDPC codes perform better than their low-weight counterparts under hard-decision decoding, which is completely opposite to the soft-decision scenario. Therefore, it is highly desirable to employ high-weight LDPC codes in the early lifetime of NAND flash memory in order to reduce the latency overhead.
We further carried out an ASIC design to evaluate the silicon overhead of implementing both soft-decision and hard-decision LDPC decoders. With RTL-level design entry in Verilog, we used the Synopsys tool set and a 65-nm CMOS standard cell library. The target decoding throughput is 2 Gbps. Both decoders carry out the decoding in a partially parallel manner, and can be configured on-the-fly in terms of the cyclic shift value of each circulant, the circulant size, and the column weight. The soft-decision decoder architecture directly follows the one presented in [22], and the hard-decision decoder employs a similar architecture. All the decoding messages in the soft-decision decoder have 4-bit precision. Table 1 summarizes the ASIC design results, which clearly show that the addition of a hard-decision decoder induces only a relatively small silicon overhead.
Conclusion
This article concerns the potentially significant latency overhead caused by the use of powerful soft-decision ECC, in particular LDPC codes, in future NAND flash memory. Although LDPC codes can achieve excellent error-correction capability, their soft-decision decoding nature directly results in significant latency overhead in terms of on-chip memory sensing and flash-to-controller data transfer. We propose a simple yet effective design technique that can reduce such latency overhead. Based upon an approximate NAND flash memory device model, we carried out simulations whose results clearly demonstrate the potential effectiveness of the proposed design solution.
References
 1.
Blahut RE: Algebraic Codes for Data Transmission. Cambridge University Press, Cambridge, MA; 2003.
 2.
Gallager RG: Low-density parity-check codes. IRE Trans. Inf. Theory 1962, IT-8: 21-28.
 3.
MacKay DJC: Good error-correcting codes based on very sparse matrices. IEEE Trans. Inf. Theory 1999, 45: 399-431. 10.1109/18.748992
 4.
Bez R, Camerlenghi E, Modelli A, Visconti A: Introduction to Flash memory. Proc. IEEE 2003, 91: 489-502. 10.1109/JPROC.2003.811702
 5.
Kou Y, Lin S, Fossorier MPC: Low-density parity-check codes based on finite geometries: a rediscovery and new results. IEEE Trans. Inf. Theory 2001, 47: 2711-2736. 10.1109/18.959255
 6.
Chen J, Dholakia A, Eleftheriou E, Fossorier M, Hu XY: Reduced-complexity decoding of LDPC codes. IEEE Trans. Commun. 2005, 53: 1288-1299. 10.1109/TCOMM.2005.852852
 7.
Olivo P, Ricco B, Sangiorgi E: High-field-induced voltage-dependent oxide charge. Appl. Phys. Lett. 1986, 48: 1135. 10.1063/1.96448
 8.
Cappelletti P, Bez R, Cantarelli D, Fratin L: Failure mechanisms of Flash cell in program/erase cycling. In International Electron Devices Meeting. San Francisco; 1994:291-294.
 9.
Mielke N, Belgal H, Kalastirsky I, Kalavade P, Kurtz A, Meng Q, Righos N, Wu J: Flash EEPROM threshold instabilities due to charge trapping during program/erase cycling. IEEE Trans. Dev. Mater. Reliab. 2004, 4(3):335-344. 10.1109/TDMR.2004.836721
 10.
Yang JB, Chen TP, Tan SS, Chan L: Analytical reaction-diffusion model for the modeling of nitrogen-enhanced negative bias temperature instability. Appl. Phys. Lett. 2006, 88: 172109.
 11.
Ogawa S, Shiono N: Generalized diffusion-reaction model for the low-field charge-buildup instability at the Si-SiO2 interface. Phys. Rev. B 1995, 51: 4218-4230. 10.1103/PhysRevB.51.4218
 12.
Yang H, Kim H, Park J, Kim S, Lee S, Choi J, Hwang D, Kim C, Park M, Lee K, Park Y, Shin J, Kong J: Reliability issues and models of sub-90nm NAND Flash memory cells. In International Conference on Solid-State and Integrated Circuit Technology. Shanghai; 2006:760-762.
 13.
Fukuda K, Shimizu Y, Amemiya K, Kamoshida M, Hu C: Random telegraph noise in flash memories: model and technology scaling. In IEEE International Electron Devices Meeting. Washington; 2007:169-172.
 14.
Compagnoni C, Ghidotti M, Lacaita A, Spinelli A, Visconti A: Random telegraph noise effect on the programmed threshold-voltage distribution of flash memories. IEEE Electron Dev. Lett. 2009, 30(9):984-986.
 15.
Mielke N, Belgal H, Fazio A, Meng Q, Righos N: Recovery effects in the distributed cycling of flash memories. In Proc. of IEEE International Reliability Physics Symposium. San Jose; 2006:29-35.
 16.
Lee JD, Hur SH, Choi JD: Effects of floating-gate interference on NAND flash memory cell operation. IEEE Electron Dev. Lett. 2002, 23(5):264-266.
 17.
Kim K: Future memory technology: challenges and opportunities. In Proc. of International Symposium on VLSI Technology, Systems and Applications. Hsinchu; 2008:5-9.
 18.
Prall K: Scaling non-volatile memory below 30 nm. In IEEE 22nd Non-Volatile Semiconductor Memory Workshop. Monterey; 2007:5-10.
 19.
Liu H, Groothuis S, Mouli C, Li J, Parat K, Krishnamohan T: 3D simulation study of cell-cell interference in advanced NAND flash memory. In Proc. IEEE Workshop on Microelectronics and Electron Devices. Boise; 2009:1-3.
 20.
Richardson T: Error floors of LDPC codes. In Proc. 41st Allerton Conference on Communications, Control and Computing 2003, 41(3):1426-1435.
 21.
Mansour M, Shanbhag NR: A 640-Mb/s 2048-bit programmable LDPC decoder chip. IEEE J. Solid-State Circuits 2006, 41(3):684-698. 10.1109/JSSC.2005.864133
 22.
Zhong H, Xu W, Xie N, Zhang T: Area-efficient min-sum decoder design for high-rate quasi-cyclic low-density parity-check codes in magnetic recording. IEEE Trans. Magn. 2007, 43: 4117-4122.
 23.
Dai Y, Yan Z, Chen N: Memory-efficient and high-throughput decoding of quasi-cyclic LDPC codes. IEEE Trans. Commun. 2009, 57(4):879-883.
 24.
Zhang K, Huang X, Wang Z: A high-throughput LDPC decoder architecture with rate compatibility. IEEE Trans. Circuits Syst. I: Regular Papers 2011, 58(4):839-847.
 25.
Takeuchi K, Tanaka T, Nakamura H: A double-level-Vth select gate array architecture for multilevel NAND flash memories. IEEE J. Solid-State Circuits 1996, 31(4):602-609. 10.1109/4.499738
 26.
Suh KD, Su B, Lim Y, Kim J, Choi Y, Koh Y, Lee S, Kwon S, Choi B, Yum J, Choi J, Kim J, Lim H: A 3.3 V 32 Mb NAND flash memory with incremental step pulse programming scheme. IEEE J. Solid-State Circuits 1995, 30(11):1149-1156. 10.1109/4.475701
 27.
Compagnoni C, Spinelli A, Gusmeroli R, Lacaita A, Beltrami S, Ghetti A, Visconti A: First evidence for injection statistics accuracy limitations in NAND Flash constant-current Fowler-Nordheim programming. In IEEE International Electron Devices Meeting 2007, 165-168.
 28.
Compagnoni CM, Miccoli C, Mottadelli R, Beltrami S, Ghidotti M, Lacaita AL, Spinelli AS, Visconti A: Investigation of the threshold voltage instability after distributed cycling in nanoscale NAND flash memory arrays. In IEEE International Reliability Physics Symposium (IRPS). Anaheim; 2010:604-610.
 29.
Lee J, Choi J, Park D, Kim K: Effects of interface trap generation and annihilation on the data retention characteristics of flash memory cells. IEEE Trans. Dev. Mater. Reliab. 2004, 4(1):110-117. 10.1109/TDMR.2004.824360
Acknowledgement
This research was funded in part by grants from the National Natural Science Foundation of China (No. 61274028), the National High-tech R&D Program of China (No. 2011AA010405), and NSF grants CNS-1162152 and CCF-0937794. The authors are also grateful to the anonymous reviewers for their valuable and constructive comments.
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Zhao, W., Dong, G., Sun, H. et al. Reducing latency overhead caused by using LDPC codes in NAND flash memory. EURASIP J. Adv. Signal Process. 2012, 203 (2012). https://doi.org/10.1186/1687-6180-2012-203
Keywords
 NAND flash memory
 LDPC code
 Hard-decision decoding