# Very Low Rate Scalable Speech Coding through Classified Embedded Matrix Quantization

- Ehsan Jahangiri^{1, 2}
- Shahrokh Ghaemmaghami^{2}

**2010**:480345

https://doi.org/10.1155/2010/480345

© E. Jahangiri and S. Ghaemmaghami. 2010

**Received:** 21 June 2009 **Accepted:** 19 February 2010 **Published:** 12 April 2010

## Abstract

This paper proposes a scalable speech coding scheme using embedded matrix quantization of the LSFs in the LPC model. For efficient quantization of the spectral parameters, two types of codebooks of different sizes are designed and used to encode unvoiced and mixed voicing segments separately. The tree-structured codebooks of our embedded quantizer, constructed through a cell-merging process, help to make a fine-grain scalable speech coder. Using an efficient adaptive dual-band approximation of the LPC excitation, where the voicing transition frequency is determined based on the concept of instantaneous frequency in the frequency domain, near-natural-sounding synthesized speech is achieved. Assessment results, including both overall quality and intelligibility scores, show that the proposed coding scheme can be a reasonable choice for speech coding in low-bandwidth communication applications.

## Keywords

- Instantaneous Frequency
- Vector Quantization
- Training Sequence
- Speech Quality
- Distortion Measure

## 1. Introduction

Scalable speech coding refers to the coding schemes that reconstruct speech at different levels of accuracy or quality at various bit rates. The bit-stream of a scalable coder is composed of two parts: an essential part called the *core* unit and an optional part that includes *enhancement* units. The core unit provides minimal quality for the synthesized speech, while a higher quality is achieved by adding the enhancement units.

Embedded quantization, which allows successive refinement of the reconstructed symbols, can be employed in speech coders to attain scalability. This quantization method has found useful applications in variable-rate and progressive transmission of digital signals. In an embedded quantizer, the output symbol of an $n$-bit quantizer is embedded in all output symbols of the $(n+k)$-bit quantizers, where $k \geq 1$ [1]. In other words, higher rate codes contain lower rate codes plus bits of refinement.
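The embedding property can be illustrated with a minimal sketch. It assumes a plain uniform scalar quantizer on [0, 1), which is not the quantizer used in this paper; the function names are hypothetical. Dropping the refinement bits of a high-rate index yields exactly the index that the low-rate quantizer would have produced.

```python
def quantize(x: float, bits: int) -> int:
    """Index of the uniform cell of width 2**-bits that contains x in [0, 1)."""
    if not 0.0 <= x < 1.0:
        raise ValueError("x must lie in [0, 1)")
    return int(x * (1 << bits))


def reconstruct(index: int, bits: int) -> float:
    """Midpoint of the cell addressed by an index of the given length."""
    return (index + 0.5) / (1 << bits)


def truncate(index: int, bits: int, keep: int) -> int:
    """Drop the (bits - keep) refinement bits to obtain the lower-rate index."""
    return index >> (bits - keep)


x = 0.7131                      # toy input value
fine = quantize(x, 6)           # 6-bit code
coarse = truncate(fine, 6, 3)   # its 3-bit prefix
print(fine, coarse, reconstruct(fine, 6), reconstruct(coarse, 3))
```

A decoder that receives only the first three bits can still reconstruct a coarser value, which is the behavior a scalable coder relies on.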

Embedded quantization was first introduced by Tzou [1] for scalar quantization. Tzou proposed a method to achieve embedded quantization by organizing the threshold levels in the form of binary trees, using the numerical optimization of Max [2]. Subsequently, embedded quantization was generalized to vector quantization (VQ). Some examples of such vector quantizers, which are based on the natural embedded property of tree-structured VQ (TSVQ), can be found in [3–5]. Ravelli and Daudet [6] proposed a method for embedded quantization of complex values in the polar form which is applicable to some parametric representations that produce complex coefficients. In the scalable image coding method introduced in [7] by Said and Pearlman, wavelet coefficients are quantized using scalar embedded quantizers.

Even though broadband technologies have significantly increased transmission bandwidth, heavy degradation of voice quality may occur due to the traffic-dependent variability of transmission delay in the network. A nonscalable coder operates well only when all bits representing each frame of the signal are recovered. Conversely, a scalable coder adjusts the need for optional bits based on the data transmission quality, which can significantly affect the quality of the reconstructed voice. Accordingly, only the core information is used for recovering the signal in the case of network congestion [8].

Scalable coders may also be used to optimize a multi-destination voice service in case of unequal or varying bandwidth allocations. Typically, voice servers have to produce the same data at different rates for users demanding the same voice signal [6]. This imposes an additional computational load on the server that may even result in congesting the network. A scalable coder can resolve this problem by adjusting the rate-quality balance and managing the number of optional bits allocated to each user.

A desirable feature of a coder is the ability to dynamically adjust coder properties to the instantaneous conditions of transmission channels. This feature is very useful in some applications, such as DCME (Digital Circuit Multiplication Equipment) and PCME (Packet Circuit Multiplication Equipment), in overload situations (too many concurrent active channels), "in-band" signaling, or "in-band" data transmission [9]. When channel conditions vary and lead to different channel error rates, a scalable coder can use a longer channel code to improve transmission reliability, which in turn forces a lower source rate when the bandwidth is fixed. This is basically a tradeoff between voice quality and error correction capability.

Scalability has become an important issue in multimedia streaming over packet networks such as the Internet [9]. Several scalable coding algorithms have been proposed in the literature. The embedded version of G.726 (ITU-T G.727 ADPCM) [10], the MPEG-4 Code-Excited Linear Prediction (CELP) algorithm, and the MPEG-4 Harmonic Vector Excitation Coding (HVXC) are some of the standardized scalable coders [5]. The recently standardized ITU-T G.729.1 [11], an 8–32 kbit/s scalable speech coder for wideband telephony and voice over IP (VoIP) applications, is scalable in bit rate, bandwidth, and computational complexity. Its bitstream comprises 12 embedded layers with a core layer interoperable with ITU-T G.729 [12]. The G.729.1 output bandwidth is 50–4000 Hz at 8 and 12 kbit/s and 50–7000 Hz from 14 to 32 kbit/s (in 2 kbit/s steps). A Scalable Phonetic Vocoder (SPV), capable of operating at rates of 300–1100 bps, is introduced in [13]. The SPV uses a Hidden Markov Model (HMM) based phonetic speech recognizer to estimate the parameters for a Mixed Excitation Linear Prediction (MELP) speech synthesizer [14]. Subsequently, it employs a scalable system to quantize the error signal between the original and phonetically estimated MELP parameters.

In this paper, we introduce a very low bit-rate scalable speech coder by generalizing embedded quantization to matrix quantization (MQ), which is our main contribution. The MQ scheme, to which we add the embedded property, is based on the split matrix quantization (SMQ) of the line spectral frequencies (LSFs) [15]. By exploiting the SMQ, both the computational complexity and the memory requirement of the quantization are significantly reduced. Our embedded MQ coder of the LSFs leads to a *fine-grain* scalable scheme, as shown in the next sections.

The rest of the paper is organized as follows. Section 2 describes the method used to produce the initial codebooks for an SMQ. In Section 3, the embedded MQ of the LSFs is presented. Section 4 is devoted to the model of the linear predictive coding (LPC) excitation and determination of the excitation parameters, including band-splitting frequency, pitch period, and voicing. Performance evaluation and some experimental results using the proposed scalable coder are given in Section 5 with conclusions presented in Section 6.

## 2. Initial Codebook Production for SMQ

In our implementation, the LSFs are used as the spectral features in an MQ system. Each matrix is composed of four 40 ms frames, each extracted using a Hamming window with 50% overlap between adjacent frames, that is, a frame shift of 20 ms, from speech sampled at 8 kHz. The LSF parameters are obtained from an LPC model of order 10, based on the autocorrelation method.
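The framing described above can be sketched as follows. The sampling rate, frame length, frame shift, and four-frame grouping come from the text; the symmetric Hamming formula, the function names, and the handling of the signal tail are assumptions made for illustration.

```python
import math

FS = 8000                # sampling rate in Hz (from the text)
FRAME_MS = 40            # frame length in ms (from the text)
SHIFT_MS = 20            # frame shift in ms, i.e. 50% overlap (from the text)
FRAMES_PER_MATRIX = 4    # frames grouped into one LSF matrix (from the text)

frame_len = FS * FRAME_MS // 1000      # 320 samples
frame_shift = FS * SHIFT_MS // 1000    # 160 samples


def hamming(n: int) -> list[float]:
    """Symmetric Hamming window (assumed form; the paper does not give the formula)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def windowed_frames(signal: list[float]) -> list[list[float]]:
    """Cut the signal into overlapping, Hamming-weighted frames; an incomplete tail is dropped."""
    w = hamming(frame_len)
    starts = range(0, len(signal) - frame_len + 1, frame_shift)
    return [[signal[s + i] * w[i] for i in range(frame_len)] for s in starts]


def group_into_segments(frames: list[list[float]]) -> list[list[list[float]]]:
    """Non-overlapping groups of four consecutive frames, one group per LSF matrix."""
    n = len(frames) // FRAMES_PER_MATRIX
    return [frames[k * FRAMES_PER_MATRIX:(k + 1) * FRAMES_PER_MATRIX] for k in range(n)]


one_second = [math.sin(2 * math.pi * 200 * t / FS) for t in range(FS)]  # toy test tone
frames = windowed_frames(one_second)
segments = group_into_segments(frames)
print(frame_len, frame_shift, len(frames), len(segments))
```

Each new segment advances by four frame shifts, that is, 80 ms of speech, which is the segment duration that the bit-rate figures in the following sections refer to.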

One of the problems encountered in codebook production for the MQ is the high computational complexity, which usually forces the use of a short training sequence or small codebooks. Although training is a one-time process for each codebook, tuning the codebooks by changing parameters is time consuming. In this case, writing fast code (e.g., see [16]), exploiting a computationally modest distortion measure, and using suboptimal quantization methods make the MQ scheme feasible even for processors with moderate processing power. Multistage MQ (MSMQ) [17, 18] and SMQ [15] are two possible suboptimal MQ structures. The suboptimality of these quantizers mostly arises from the fact that not all potential correlations are used. By using SMQ, we achieve both a lower computational complexity for the codebook production and a lower memory requirement, as compared to a nonsplit MQ.

The LSFs are ideal for split quantization. This is because the spectral sensitivity of these parameters is localized; that is, a change in a given LSF affects only the neighboring frequency regions of the LPC power spectrum. Hence, split quantization of the LSFs causes negligible leakage of the quantization distortion from one spectral region to another [19].

where indicates the th LSF in the th analysis frame.

where *s*(*n*) represents the speech signal,
and
stand for the frame shift and the frame length, respectively. According to [15],
= 0.15 is a reasonable choice.

*classified* quantizer ([3, pages 423-424]). This quantizer encodes the spectral parameters at different bit rates, depending on the voicing information, and thus leads to a variable rate coding system. In this two-codebook design, an extra bit is employed for the codebook selection to indicate which codebook is to be used to extract the proper codeword. Table 1 illustrates codebook sizes in our SMQ system. As shown, a lower resolution codebook is used for quantization of upper LSFs due to the lower sensitivity of the human auditory system (HAS) to higher frequencies. The bit allocation given in Table 1 results in an average bit rate of 550 bps for representing the spectral parameters.

Table 1: Number of bits allocated to the SMQ codebooks.

Codebook type | 1st split | 2nd split | 3rd split | 4th split | 5th split | Total |
---|---|---|---|---|---|---|
Mixed voicing | 10 | 10 | 10 | 9 | 8 | 47 |
Unvoiced | 8 | 8 | 8 | 7 | 6 | 37 |
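The 550 bps figure can be related to the totals in the table with a short calculation. The 80 ms segment duration (four frames at a 20 ms shift) and the one-bit codebook selection come from the text; the share of mixed voicing segments is not stated in the text, so the sketch below only computes the share that would be consistent with 550 bps.

```python
SEGMENT_S = 4 * 0.020            # four frames at a 20 ms shift: 80 ms per segment
SELECT_BITS = 1                  # codebook-selection bit mentioned in the text
BITS_MIXED = 47 + SELECT_BITS    # total for the mixed voicing codebooks
BITS_UNVOICED = 37 + SELECT_BITS # total for the unvoiced codebooks


def average_rate_bps(share_mixed: float) -> float:
    """Average spectral bit rate for a given share of mixed voicing segments."""
    avg_bits = share_mixed * BITS_MIXED + (1.0 - share_mixed) * BITS_UNVOICED
    return avg_bits / SEGMENT_S


# Share of mixed voicing segments implied by the quoted 550 bps (an inference,
# not a number reported in the paper).
target_bits = 550 * SEGMENT_S
implied_share = (target_bits - BITS_UNVOICED) / (BITS_MIXED - BITS_UNVOICED)
print(round(implied_share, 3), round(average_rate_bps(implied_share), 1))
```

Under these assumptions, 550 bps corresponds to an average of 44 bits per segment, which is reached when roughly 60% of the segments use the mixed voicing codebooks.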

We designed codebooks of this split matrix quantizer, based on the LBG algorithm [21], using 1200 TIMIT files [22] as our training database. A sliding block technique is used to capture all interframe transitions in the training set. This is accomplished by using a four-frame window sliding over the training data in one-frame steps.
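The sliding block technique can be sketched as below. The four-frame window and one-frame step come from the text; the layout of each training matrix (rows indexed by LSF, columns by frame) and the grouping of two consecutive LSFs per split are assumptions, and the frame values are toy numbers rather than real LSFs.

```python
def sliding_blocks(frames, block_len=4):
    """All runs of block_len consecutive frames, advancing one frame at a time."""
    return [frames[i:i + block_len] for i in range(len(frames) - block_len + 1)]


def to_matrix(block):
    """Transpose a block of frames into a matrix with one row per LSF and one column per frame."""
    return [list(row) for row in zip(*block)]


def split_rows(matrix, rows_per_split=2):
    """Cut the matrix into submatrices of consecutive LSF rows (assumed split layout)."""
    return [matrix[r:r + rows_per_split] for r in range(0, len(matrix), rows_per_split)]


# Seven toy frames of ten "LSF" values each; frame j holds 10*j + 0 ... 10*j + 9.
frames = [[10 * j + i for i in range(10)] for j in range(7)]
blocks = sliding_blocks(frames)
training_matrices = [to_matrix(b) for b in blocks]
splits = split_rows(training_matrices[0])
print(len(blocks), len(training_matrices[0]), len(training_matrices[0][0]), len(splits))
```

With a one-frame step, a training set of F frames yields F - 3 training matrices instead of roughly F / 4, and every interframe transition appears at every column position.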

and the operator denotes an element-by-element matrix division.

To guarantee stability of the LPC synthesis filters, the LSFs must appear in ascending order. However, with the spectrally weighted LSF distance measure used for designing the split quantizer, the ascending order of the LSFs is not guaranteed. As a solution, Paliwal and Atal [19] used the mean of the LSF vectors within a given Voronoi region to define the centroid. Our solution to preserve stability of the LPC synthesis filters is to put all five generated codewords into a $10 \times 4$ matrix of reproduced spectral parameters and then sort, in ascending order, every column of this matrix that is not already in ascending order. However, the resulting synthesis filters might become marginally stable because of poles located too close to the unit circle. The problem is aggravated in fixed-point implementations, where a marginally stable filter can actually become unstable after quantization and loss of precision during processing. Thus, in order to avoid sharp spectral peaks that may lead to unnatural synthesized speech, bandwidth expansion through modification of the LPC vectors is employed. In this case, each LPC filter coefficient $a_i$ is replaced by $a_i \gamma^i$, for $1 \leq i \leq 10$, where $\gamma = 0.99$. This operation flattens the spectrum, especially around formant frequencies. Another advantage of bandwidth expansion is that it shortens the duration of the impulse response of the LPC filter, which limits the propagation of channel errors ([8, page 133]).
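The two stability safeguards described above can be sketched as follows. The sketch assumes the reproduced spectral parameters are held as a matrix with one row per LSF and one column per frame, and it assumes the usual form of bandwidth expansion in which the i-th predictor coefficient is scaled by the i-th power of the expansion factor 0.99; the function names are hypothetical.

```python
def enforce_ascending(matrix):
    """Sort every frame (column) of the LSF matrix that is not already ascending."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    fixed = [row[:] for row in matrix]
    for c in range(n_cols):
        column = [matrix[r][c] for r in range(n_rows)]
        if any(a > b for a, b in zip(column, column[1:])):
            for r, value in enumerate(sorted(column)):
                fixed[r][c] = value
    return fixed


def bandwidth_expand(a, gamma=0.99):
    """Scale predictor coefficient a_i by gamma**i, for a = [a_1, ..., a_p]."""
    return [coef * gamma ** (i + 1) for i, coef in enumerate(a)]


# Toy 4 x 2 example: the second frame (column) has two LSFs out of order.
lsf = [[0.10, 0.12],
       [0.25, 0.31],
       [0.40, 0.29],
       [0.55, 0.50]]
print(enforce_ascending(lsf))
print(bandwidth_expand([1.0, 1.0, 1.0]))
```

Scaling the coefficients this way moves every pole of the synthesis filter toward the origin by the factor 0.99, which is why sharp spectral peaks are widened.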

The next section introduces the method to construct the tree structured codebooks for the embedded quantizer, using the initial codebooks designed in this section.

## 3. Codebook Production for Embedded Matrix Quantizer

*cell-merging* or *region-merging* method. A cell-merging tree is formed by merging the Voronoi regions in pairs and allocating new centroids to these larger encoding areas. Merging two regions can be interpreted as erasing the boundary between the regions on the Voronoi diagram [23].

where represents the Voronoi region of and the metric is the distance between and . It is worth mentioning that we have codewords at depth . In (10), the summation is over all valid , that is, , for which belongs to the voronoi region .

This number becomes quite large even for moderate values of . Hence, this simple solution cannot be used in practice due to its prohibitively high complexity.

The problem of finding proper regions to merge is similar to a complete bipartite matching problem ([24, page 182]). In fact, we must select a subset of the graph illustrated in Figure 2 that minimizes the accumulated distortion in depth *d,* while no two arcs are incident to the same node and all of the nodes are matched. Some methods to solve this problem are presented in [24] that offer a computational complexity of
, where
is the number of nodes in the graph. However, we used the suboptimal method proposed by Chu in [4] to reduce the merging processing time, which worked well in our implementation. In this method, we sort arc values in ascending order, select arcs with lower values, and remove arcs ending at nodes belonging to the arcs already selected. Therefore, no sharing occurs between Voronoi regions at depth
, which is a necessary characteristic for the constructed tree. The select-remove procedure is continued until a complete matched graph is achieved.
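The select-remove procedure can be sketched as a greedy pairing. This is an illustration under stated assumptions rather than the exact routine of [4]: the candidate arcs are taken to be all pairs of regions, the arc value is a plain squared distance between toy scalar centroids, and the function name is hypothetical.

```python
from itertools import combinations


def greedy_pairing(n_nodes, cost):
    """Pair up an even number of nodes by repeatedly taking the cheapest free arc."""
    arcs = sorted(combinations(range(n_nodes), 2), key=lambda arc: cost(*arc))
    used, pairs = set(), []
    for i, j in arcs:
        if i in used or j in used:
            continue  # arc ends at an already merged region, so it is removed
        pairs.append((i, j))
        used.update((i, j))
    return pairs


centroids = [0.0, 1.0, 1.2, 5.0]  # toy scalar codewords


def squared_distance(i, j):
    return (centroids[i] - centroids[j]) ** 2


pairs = greedy_pairing(len(centroids), squared_distance)
total = sum(squared_distance(i, j) for i, j in pairs)
print(pairs, total)
```

In this toy case the greedy choice of the cheapest arc (regions 1 and 2) forces the expensive pairing of regions 0 and 3, whereas pairing (0, 1) and (2, 3) would give a lower accumulated value. This shows the kind of suboptimality that is accepted in exchange for a much shorter merging time.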

In the following part of this section, we propose four types of distortion criteria to attribute to arc values in the merging process and give details of a comparative assessment.

*,*that is, for all , according to the accumulated squared distortions of the codewords and . For the Voronoi region of , , we have

In (24), and are the number of training matrices that fall into the Voronoi region of and , respectively. Equation (23) in the case of no-weighting and vector codewords reduces to the Equitz's formula in [23].
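For the unweighted vector case mentioned above, the merge cost has a well-known closed form: merging two cells with centroids c_i and c_j and populations n_i and n_j increases the total squared error by n_i n_j / (n_i + n_j) times the squared distance between the centroids. The sketch below illustrates only this special case with toy data; it does not reproduce the weighted matrix expressions of this section, and the function names are hypothetical.

```python
def merged_centroid(c_i, n_i, c_j, n_j):
    """Population-weighted mean of the two centroids."""
    total = n_i + n_j
    return [(n_i * a + n_j * b) / total for a, b in zip(c_i, c_j)]


def merge_cost(c_i, n_i, c_j, n_j):
    """Increase in total squared error caused by merging the two cells."""
    dist2 = sum((a - b) ** 2 for a, b in zip(c_i, c_j))
    return n_i * n_j / (n_i + n_j) * dist2


def sse(points, centroid):
    """Sum of squared errors of the points around a centroid."""
    return sum(sum((p - c) ** 2 for p, c in zip(pt, centroid)) for pt in points)


cell_i = [[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]]   # toy training vectors, centroid (0, 0)
cell_j = [[4.0, 0.0]]                            # toy training vector, centroid (4, 0)
c_new = merged_centroid([0.0, 0.0], 3, [4.0, 0.0], 1)
direct = sse(cell_i + cell_j, c_new) - sse(cell_i, [0.0, 0.0]) - sse(cell_j, [4.0, 0.0])
print(c_new, merge_cost([0.0, 0.0], 3, [4.0, 0.0], 1), direct)
```

The closed form needs only the centroids and the cell populations, so arc values can be computed without revisiting the training set.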

where it turns into in the case of no-weighting.

Table 2: The bit allocation used for embedded quantization at different rates. Each column gives the number of bits for representing the indicated pair of LSFs. UV and VUV correspond to unvoiced and mixed voicing codebooks, respectively.

Average bits per segment | LSF1 & LSF2 (VUV) | LSF1 & LSF2 (UV) | LSF3 & LSF4 (VUV) | LSF3 & LSF4 (UV) | LSF5 & LSF6 (VUV) | LSF5 & LSF6 (UV) | LSF7 & LSF8 (VUV) | LSF7 & LSF8 (UV) | LSF9 & LSF10 (VUV) | LSF9 & LSF10 (UV) |
---|---|---|---|---|---|---|---|---|---|---|
43 | 10 | 8 | 10 | 8 | 10 | 8 | 9 | 7 | 8 | 6 |
42 | 10 | 8 | 10 | 8 | 10 | 8 | 9 | 7 | 7 | 5 |
41 | 10 | 8 | 10 | 8 | 10 | 8 | 8 | 6 | 7 | 5 |
40 | 10 | 8 | 10 | 8 | 9 | 7 | 8 | 6 | 7 | 5 |
39 | 10 | 8 | 9 | 7 | 9 | 7 | 8 | 6 | 7 | 5 |
38 | 9 | 7 | 9 | 7 | 9 | 7 | 8 | 6 | 7 | 5 |
37 | 9 | 7 | 9 | 7 | 9 | 7 | 8 | 6 | 6 | 4 |
36 | 9 | 7 | 9 | 7 | 9 | 7 | 7 | 5 | 6 | 4 |
35 | 9 | 7 | 9 | 7 | 8 | 6 | 7 | 5 | 6 | 4 |
34 | 9 | 7 | 8 | 6 | 8 | 6 | 7 | 5 | 6 | 4 |
33 | 8 | 6 | 8 | 6 | 8 | 6 | 7 | 5 | 6 | 4 |
32 | 8 | 6 | 8 | 6 | 8 | 6 | 7 | 5 | 5 | 3 |
31 | 8 | 6 | 8 | 6 | 8 | 6 | 6 | 4 | 5 | 3 |
30 | 8 | 6 | 8 | 6 | 7 | 5 | 6 | 4 | 5 | 3 |
29 | 8 | 6 | 7 | 5 | 7 | 5 | 6 | 4 | 5 | 3 |
28 | 7 | 5 | 7 | 5 | 7 | 5 | 6 | 4 | 5 | 3 |
27 | 7 | 5 | 7 | 5 | 7 | 5 | 6 | 4 | 4 | 2 |
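The rows of the table follow a regular pattern: each step down removes one bit from one split of both codebooks, starting with the highest LSF pair and cycling toward the lowest. The sketch below reproduces that pattern as read off the table; the round-robin rule is an observation about the listed values, not a procedure stated in the text, and the function name is hypothetical.

```python
START = {"VUV": [10, 10, 10, 9, 8], "UV": [8, 8, 8, 7, 6]}  # top row of the table


def allocation(bits_dropped):
    """Per-split bit counts after removing bits one at a time, last split first."""
    rows = {name: bits[:] for name, bits in START.items()}
    for step in range(bits_dropped):
        split = 4 - step % 5          # 5th split, 4th, ..., 1st, then repeat
        for bits in rows.values():
            bits[split] -= 1
    return rows


STEP_BPS = 1 / 0.080   # one bit per 80 ms segment corresponds to 12.5 bps

for dropped in (0, 6, 16):
    print(43 - dropped, allocation(dropped))
```

Because both codebooks lose one bit at every step, the average number of bits per segment drops by exactly one per row regardless of how often each codebook is selected, and each row therefore lowers the bit rate by 12.5 bps.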

The spectral distortion (SD) is measured on 4 minutes of speech utterances outside the training set. As depicted in Figure 3, in the case of full search, the type 1 and type 3 distortion measures perform similarly and slightly better than their unweighted versions (types 2 and 4). Indeed, full codebook search results in the same performance for these four types of measures at full resolution, because all four types of trees have the same terminal nodes. Although the type 3 measure performs better than the type 2 measure in full search, it is outperformed by the type 1 and type 2 distortion measures in the fast tree search. This behavior comes from the fact that equality (13) is satisfied for the fast tree search.
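As an illustration of how a spectral distortion figure is obtained, the sketch below uses the common definition of SD as the root-mean-square difference, in dB, between two LPC log power spectra. This is an assumption: the exact formula, frequency range, and averaging used in the paper are not shown here. The sign convention A(z) = 1 + a_1 z^{-1} + ... and the toy first-order filters are also assumptions.

```python
import cmath
import math


def lpc_log_spectrum_db(a, n_points=256):
    """10*log10(1/|A(e^{jw})|^2) at n_points frequencies in [0, pi), with a = [1, a_1, ..., a_p]."""
    spectrum = []
    for k in range(n_points):
        w = math.pi * k / n_points
        A = sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(a))
        spectrum.append(-20.0 * math.log10(abs(A)))
    return spectrum


def spectral_distortion_db(a_ref, a_test, n_points=256):
    """RMS difference in dB between the two LPC log power spectra."""
    ref = lpc_log_spectrum_db(a_ref, n_points)
    test = lpc_log_spectrum_db(a_test, n_points)
    return math.sqrt(sum((r - t) ** 2 for r, t in zip(ref, test)) / n_points)


original = [1.0, -0.9]     # toy first-order filter
quantized = [1.0, -0.85]   # toy "quantized" version of the same filter
print(round(spectral_distortion_db(original, quantized), 2))
```

An average over all test frames of such per-frame values gives a single figure of the kind plotted against the bit rate.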

It is clear from Figure 3 that the fast tree search does not necessarily find the best matched codeword. Generally speaking, it may be thought that there should be a slight difference between the spectral distortions in full search and fast tree search; nevertheless, we believe this relatively considerable difference, which we see in Figure 3, is due to the codebook structures having matrix codewords.
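The gap between the two search strategies can be reproduced on a toy tree. The sketch assumes a depth-2 binary tree of scalar codewords whose depth-1 codewords are unweighted means of their two children; all values and function names are hypothetical.

```python
LEAVES = [0.0, 4.9, 5.1, 20.0]                 # depth-2 codewords (toy values)
PARENTS = [(LEAVES[0] + LEAVES[1]) / 2,        # depth-1 codeword for leaves 0 and 1
           (LEAVES[2] + LEAVES[3]) / 2]        # depth-1 codeword for leaves 2 and 3


def full_search(x):
    """Index of the nearest leaf, found by testing every leaf."""
    return min(range(len(LEAVES)), key=lambda i: (x - LEAVES[i]) ** 2)


def tree_search(x):
    """Index of the leaf reached by choosing the nearer codeword at each depth."""
    parent = min(range(len(PARENTS)), key=lambda p: (x - PARENTS[p]) ** 2)
    children = (2 * parent, 2 * parent + 1)
    return min(children, key=lambda i: (x - LEAVES[i]) ** 2)


x = 7.0
print(LEAVES[full_search(x)], LEAVES[tree_search(x)])
```

For this input the tree search commits to the first branch at depth 1 and ends at 4.9, while the nearest leaf overall is 5.1 in the other branch. The tree search tests two codewords per level instead of all leaves, which is the source of both its speed and its extra distortion.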

## 4. Adaptive Dual-Band Excitation

Multiband excitation (MBE) was originally proposed by Griffin and Lim and was shown to be an efficient paradigm for low rate speech coding that produces natural sounding speech [25]. The original MBE model, however, is inapplicable to speech coding at very low rates, that is, below 4 kbps, due to the large number of frequency bands it employs. On the other hand, dual-band excitation, as the simplest possible MBE model, has attracted considerable attention from the research community [26]. It has been shown that most (more than 70%) of the speech frames can be represented by only two bands [26]. Further analysis of the speech spectra revealed that the low-frequency band is usually voiced, whereas the high-frequency band usually contains a noise-like signal (i.e., unvoiced) [26]. In our coding system, we use the dual-band MBE model proposed in [27], in which the two bands join at a variable frequency determined based on the voicing characteristics of speech signals on a frame-by-frame basis in the LPC model. For convenience, we summarize the main idea of this two-band excitation model from [27] below.

*spectrogram* technique that employs a segment-based analysis using an appropriate window in the frequency domain [29]. Note that this windowing process is different from the one used in the time domain, which is the same as the one described in Section 2. Here, the windowing is performed in the frequency domain using a *Hanning* window

*N*is the total number of samples in each frame of the speech signal which is 320 here, = min , , in the th spectrogram coefficient, in the number of DFT points which is 64 here, is the predefined window length which is 32 here, and , = , is a Hanning window in the frequency domain. As is evident, as long as , equals . The peak of the spectrogram, , = , gives the IF of the spectrum

where represents IF of the spectrum over frequencies from 0 to , where is the sampling frequency which is 8 kHz in our designated coder.

*flatness*of in a number of subbands, . This is formulated as

where is the subband index, , and the vector is the th part of , = , located in the th band, whose flatness is represented by . The bar over the vector stands for the mean of this vector.

where = min , that means the minimum value of for which .

The threshold is calculated based on the mean of the spectrum flatness within a certain band, averaged over a number of previous frames composed of voiced and unvoiced frames [27]. In this way, the spectrum is assumed to be periodic at frequencies below , and it is considered random at frequencies over , with a resolution specified by .

Table 3: Bit allocation for pitch, transition frequency, and gain codebooks.

Codebook type | Pitch | Transition Frequency | Gain | Total |
---|---|---|---|---|
No. of bits allocated | 11 | 9 | 7 | 27 |

## 5. Performance Evaluation and Experiments

### 5.1. Spectral Dynamics of MQ Versus VQ

Table 4: Spectral dynamics and spectral distortion of matrix quantization versus vector quantization at the same rate.

Average number of bits per segment of four frames | 43 | 38 | 33 |
---|---|---|---|
ASE for original speech | 6.57 | 6.57 | 6.57 |
ASE for MQ | 6.21 | 6.15 | 6.11 |
ASE for MQ with segments junction smoothing | 6.08 | 6.02 | 5.97 |
ASE for VQ at the same rate as MQ | 6.56 | 6.54 | 6.43 |
ASD for MQ | 1.65 | 1.75 | 2.05 |
ASD for MQ with segments junction smoothing | 1.63 | 1.72 | 2.01 |
ASD for VQ at the same rate as MQ | 2.50 | 2.68 | 3.02 |

As mentioned earlier, codewords of the designed matrix quantizer are obtained through averaging over real input matrices of the spectral parameters. These matrices have smooth spectral trajectories; thus, the averaging process over the matrices results in codewords having relatively smooth spectral dynamics. In contrast, codewords of the VQ are obtained by averaging over a set of single-frame input vectors rather than over trajectories of spectral parameters as in the MQ. This results in better performance of the MQ over the VQ in terms of spectral dynamics, as confirmed by the experimental results given in Table 4. According to this table, the MQ yields both smoother spectral trajectories and lower average spectral distortions, as compared to the VQ at the same rate.

To improve the performance of the MQ, we use simple spectral parameter smoothing at the junction of codewords selected in consecutive segments. In this smoothing method, we replace the first column of the selected minimum distortion codeword by a weighted mean of the first column of the currently selected codeword and the last column of the previously selected codeword. Weighting used for the first column of the recent codeword is 0.75 and for the last column of the previously selected codeword is 0.25. In this smoothing method, the ascending order of the LSFs is guaranteed.
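The junction smoothing can be sketched as below. The weights 0.75 and 0.25 come from the text; the matrix layout (one row per LSF, one column per frame), the toy values, and the function name are assumptions.

```python
def smooth_junction(current, previous, w_current=0.75):
    """Replace the first column of `current` by a weighted mean with the last column of `previous`."""
    w_previous = 1.0 - w_current
    smoothed = [row[:] for row in current]
    for r, row in enumerate(smoothed):
        row[0] = w_current * current[r][0] + w_previous * previous[r][-1]
    return smoothed


# Toy 2 x 4 codewords (two LSFs, four frames), values in Hz.
previous = [[90.0, 95.0, 98.0, 100.0],
            [190.0, 195.0, 198.0, 200.0]]
current = [[140.0, 142.0, 145.0, 150.0],
           [260.0, 262.0, 265.0, 270.0]]
print(smooth_junction(current, previous))
```

Both columns entering the weighted mean are in ascending order and both weights are positive, so the smoothed column is ascending as well, which is why the ordering of the LSFs is preserved.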

### 5.2. Intelligibility and Quality Assessment

Table 5: PESQ scores at different rates.

Bit rate (bps) | PESQ score (full search) | PESQ score (tree search) |
---|---|---|
900 | 2.512 | 2.331 |
850 | 2.468 | 2.298 |
800 | 2.447 | 2.293 |
750 | 2.437 | 2.28 |
700 | 2.38 | 2.24 |

PESQ score in the no-quantization case: 2.651.

As is clear from Figures 7 and 8, the quality difference between these three rates is relatively small, consistent with the fine-granularity property. In some speech samples, the quality difference at different rates was almost imperceptible. The results shown in these figures were obtained by running the test over a variety of samples and averaging the scores.

Table 6: DRT assessment results.

Bit rate (bps) | 900 | 800 | 700 |
---|---|---|---|
Voicing | 100 | 100 | 100 |
Nasality | 67 | 62 | 56 |
Sustention | 78 | 73 | 70 |
Sibilation | 87.5 | 87.5 | 85 |
Graveness | 100 | 100 | 100 |
Compactness | 100 | 87.5 | 87.5 |
Total | 89 | 85 | 83 |

### 5.3. Memory Requirement of the Embedded Quantizer

Thus, the total amount of memory required for the embedded quantizer is slightly less than twice the memory used for the initial codebooks. In applications based on fast tree-structured search, there is no need to store the internal codewords at the decoder. In contrast, the internal codewords must be available in both the coder and the decoder in an embedded quantization scheme ([3, page 413]).

Hence, the embedded SMQ has a memory requirement that is much lower than that of a nonsplit MQ of the same resolution. This confirms that the SMQ is a proper choice for our embedded matrix quantizer in terms of both computational complexity and memory size.
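The memory figures can be checked with a short count. The per-split resolutions come from Table 1; the remaining choices are assumptions made for illustration: each split codeword is taken to hold 2 x 4 values, the tree is assumed to store every node from depth 1 down to the leaves, and the nonsplit quantizer is assumed to store 10 x 4 values per codeword.

```python
VALUES_PER_SPLIT_CODEWORD = 2 * 4     # two LSFs by four frames (assumed layout)
VALUES_PER_FULL_CODEWORD = 10 * 4     # ten LSFs by four frames (assumed layout)
SPLIT_BITS = {"mixed": [10, 10, 10, 9, 8], "unvoiced": [8, 8, 8, 7, 6]}


def initial_codewords(bits):
    """Leaf codewords of the initial split codebooks."""
    return sum(2 ** b for b in bits)


def tree_codewords(bits):
    """All tree nodes at depths 1..b for each split: 2 + 4 + ... + 2**b = 2**(b + 1) - 2."""
    return sum(2 ** (b + 1) - 2 for b in bits)


initial = sum(initial_codewords(b) for b in SPLIT_BITS.values())
embedded = sum(tree_codewords(b) for b in SPLIT_BITS.values())
nonsplit = sum(2 ** sum(b) for b in SPLIT_BITS.values())

print(initial * VALUES_PER_SPLIT_CODEWORD,
      embedded * VALUES_PER_SPLIT_CODEWORD,
      embedded / initial,
      nonsplit * VALUES_PER_FULL_CODEWORD)
```

Under these assumptions the embedded trees need a little under twice the storage of the initial codebooks, while a nonsplit codebook with 47-bit and 37-bit resolution would be larger by many orders of magnitude.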

## 6. Conclusion

In this paper, which is an extended version of [36], we have introduced a very low rate scalable speech coder with 80 ms coding delay, using classified embedded matrix quantization and adaptive dual-band excitation. Although the delay is relatively high with respect to many standardized coders, it is still suitable for some applications, since a delay as high as 250 ms has been found to be tolerable for some practical applications according to [37–39]. The transition frequency of the dual-band excitation model is determined based on the evaluation of the flatness of the instantaneous frequency contour in the frequency domain. A cell-merging process is applied to the initial codebooks of the SMQ scheme to organize codewords into a tree structure. The natural embedded property of the constructed tree codebooks helped to build a fine-grain scalable coder operating in the range of 700–900 bps in 12.5 bps steps. The same cell-merging process can be applied to larger initial codebooks in order to obtain a wider range of operating bit rates. We tested the range of 700–900 bps only to evaluate the granularity of the designed embedded quantizer. Four types of distortion measures for assigning the arc values of the initial graph in the merging process have been introduced and assessed comparatively, in both full and fast tree searches. Interframe correlation between adjacent frames is exploited to efficiently encode gain, pitch, and the transition frequency using the VQ method. Better performance of the proposed embedded matrix quantizer in comparison with the VQ, at the same bit rate, has been confirmed in terms of both spectral dynamics and spectral distortion. Speech quality assessment and DRT comparison of the synthesized speech at different rates show that the proposed scalable coding system has the property of fine granularity.

## Declarations

### Acknowledgment

The authors wish to express their gratitude to Dr. Wai C. Chu and to their friend Tim Han for reviewing this paper several times and making valuable comments and suggestions.

## References

1. Tzou K-H: Embedded Max quantization. *Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '86), 1986, Tokyo, Japan* 505-508.
2. Max J: Quantization for minimum distortion. *IEEE Transactions on Information Theory* 1960, 6: 7-12. 10.1109/TIT.1960.1057548
3. Gersho A, Gray RM: *Vector Quantization and Signal Compression*. Kluwer Academic Publishers, Dordrecht, The Netherlands; 1992.
4. Chu WC: Embedded quantization of line spectral frequencies using a multistage tree-structured vector quantizer. *IEEE Transactions on Audio, Speech and Language Processing* 2006, 14(4):1205-1217.
5. Chu WC: A scalable MELP coder based on embedded quantization of line spectral frequencies. *Proceedings of International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS '05), December 2005, Hong Kong* 29-32.
6. Ravelli E, Daudet L: Embedded polar quantization. *IEEE Signal Processing Letters* 2007, 14(10):657-660.
7. Said A, Pearlman WA: A new, fast, and efficient image codec based on set partitioning in hierarchical trees. *IEEE Transactions on Circuits and Systems for Video Technology* 1996, 6(3):243-250. 10.1109/76.499834
8. Chu W: *Speech Coding Algorithms: Foundation and Evolution of Standardized Coders*. John Wiley & Sons, New York, NY, USA; 2003.
9. Hersent O, Petit JP, Gurle D: *Beyond VoIP Protocols: Understanding Voice Technology and Networking Techniques for IP Telephony*. John Wiley & Sons, New York, NY, USA; 2005.
10. ITU: 5-, 4-, 3-, and 2-Bits Sample Embedded Adaptive Differential Pulse Code Modulation (ADPCM), Recommendation G.727. Geneva, Switzerland, 1990.
11. ITU-T Rec. G.729.1: G.729-based embedded variable bit-rate coder: an 8-32 kbit/s scalable wideband coder bitstream interoperable with G.729. May 2006.
12. ITU-T Rec. G.729: Coding of Speech at 8 kbit/s Using Conjugate Structure Algebraic Code Excited Linear Prediction (CS-ACELP). March 1996.
13. McCree A: A scalable phonetic vocoder framework using joint predictive vector quantization of MELP parameters. *Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), May 2006, Toulouse, France* 1: 705-709.
14. Supplee LM, Cohn RP, Collura JS, McCree AV: MELP: the new federal standard at 2400 bps. *Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '97), April 1997, Munich, Germany* 2: 1591-1594.
15. Xydeas CS, Papanastasiou C: Split matrix quantization of LPC parameters. *IEEE Transactions on Speech and Audio Processing* 1999, 7(2):113-125. 10.1109/89.748117
16. Getreuer P: Writing Fast MATLAB Code. 2006, http://www.math.ucla.edu/~getreuer/matopt.pdf
17. Ozaydin S, Baykal B: Multi stage matrix quantization for very low bit rate speech coding. *Proceedings of the 3rd Workshop on Signal Processing Advances in Wireless Communications, 2001* 372-375.
18. Özaydın S, Baykal B: Matrix quantization and mixed excitation based linear predictive speech coding at very low bit rates. *Speech Communication* 2003, 41(2-3):381-392. 10.1016/S0167-6393(03)00009-8
19. Paliwal KK, Atal BS: Efficient vector quantisation of LPC parameters at 24 bits/frame. *IEEE Transactions on Speech and Audio Processing* 1993, 1(1):3-14. 10.1109/89.221363
20. Van Trees HL: *Optimum Array Processing: Part IV of Detection, Estimation, and Modulation Theory*. John Wiley & Sons, New York, NY, USA; 2002.
21. Linde Y, Buzo A, Gray RM: An algorithm for vector quantizer design. *IEEE Transactions on Communications* 1980, 28(1):84-95. 10.1109/TCOM.1980.1094577
22. DARPA TIMIT: *Acoustic-Phonetic Continuous Speech Corpus*. National Institute of Standards and Technology, Gaithersburg, Md, USA; 1993.
23. Riskin EA, Ladner R, Wang R-Y, Atlas LE: Index assignment for progressive transmission of full-search vector quantization. *IEEE Transactions on Image Processing* 1994, 3(3):307-312. 10.1109/83.287025
24. Lawler EL: *Combinatorial Optimization: Networks and Matroids*. Dover, New York, NY, USA; 2001.
25. Griffin DW, Lim JS: Multiband excitation vocoder. *IEEE Transactions on Acoustics, Speech, and Signal Processing* 1988, 36(8):1223-1235. 10.1109/29.1651
26. Chiu KM, Ching PC: A dual-band excitation LSP codec for very low bit rate transmission. *Proceedings of the International Symposium on Speech, Image Processing, and Neural Networks (ISSIPNN '94), April 1994, Hong Kong* 479-482.
27. Ghaemmaghami S, Deriche M: A new approach to modeling excitation in very low-rate speech coding. *Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP '98), May 1998, Seattle, WA, USA* 597-600.
28. Tremain TE: The government standard linear predictive coding algorithm: LPC-10. *Speech Technology Magazine* 1982, 40-49.
29. Boashash B: Estimating and interpreting the instantaneous frequency of a signal, part 1: fundamentals. *Proceedings of the IEEE* 1992, 80(4):520-538. 10.1109/5.135376
30. Knagenhjelm HP, Kleijn WB: Spectral dynamics is more important than spectral distortion. *Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '95), May 1995, Detroit, Mich, USA* 1: 732-735.
31. ITU: Perceptual Evaluation of Speech Quality (PESQ), an Objective Method for End-to-End Speech Quality Assessment of Narrow-Band Telephone Networks and Speech Codecs, ITU-T Recommendation P.862. 2001.
32. ITU: Mean Opinion Score (MOS), Methods For Subjective Determination of Transmission Quality, ITU-T Recommendation P.800.1. 1996.
33. ITU: MUlti Stimulus test with Hidden Reference and Anchor (MUSHRA), Method For The Subjective Assessment of Intermediate Quality Levels of Coding Systems, ITU-R BS.1534-1. January 2003.
34. Vincent E: MUSHRAM: a MATLAB interface for MUSHRA listening tests. 2005, http://www.elec.qmul.ac.uk/people/emmanuelv/mushram/
35. Deller JR, Hansen JHL, Proakis JG: *Discrete-Time Processing of Speech Signals*. John Wiley & Sons, New York, NY, USA; 2000.
36. Jahangiri E, Ghaemmaghami S: Scalable speech coding at rates below 900 BPS. *Proceedings of IEEE International Conference on Multimedia and Expo (ICME '08), June 2008, Hannover, Germany* 85-88.
37. Dusan S, Flanagan JL, Karve A, Balaraman M: Speech compression by polynomial approximation. *IEEE Transactions on Audio, Speech and Language Processing* 2007, 15(2):387-395.
38. Klemmer ET: Subjective evaluation of transmission delay in telephone conversations. *Bell Labs Technical Journal* 1967, 1141-1147.
39. Brady PT: Effects of transmission delay on conversational behaviour on echo-free telephone circuits. *Bell Labs Technical Journal* 1971, 115-134.

## Copyright

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.