Greedy sparse decompositions: a comparative study

Dymarski, Przemyslaw; Moreau, Nicolas; Richard, Gaël

doi:10.1186/1687-6180-2011-34

Review
Open access
Published: 03 August 2011

Greedy sparse decompositions: a comparative study

Przemyslaw Dymarski¹,
Nicolas Moreau² &
Gaël Richard²

EURASIP Journal on Advances in Signal Processing volume 2011, Article number: 34 (2011) Cite this article

3581 Accesses
8 Citations
Metrics details

Abstract

The purpose of this article is to present a comparative study of sparse greedy algorithms that were separately introduced in speech and audio research communities. It is particularly shown that the Matching Pursuit (MP) family of algorithms (MP, OMP, and OOMP) are equivalent to multi-stage gain-shape vector quantization algorithms previously designed for speech signals coding. These algorithms are comparatively evaluated and their merits in terms of trade-off between complexity and performances are discussed. This article is completed by the introduction of the novel methods that take their inspiration from this unified view and recent study in audio sparse decomposition.

1 Introduction

Sparse signal decomposition and models are used in a large number of signal processing applications, such as, speech and audio compression, denoising, source separation, or automatic indexing. Many approaches aim at decomposing the signal on a set of constituent elements (that are termed atoms, basis or simply dictionary elements), to obtain an exact representation of the signal, or in most cases an approximative but parsimonious representation. For a given observation vector x of dimension N and a dictionary F of dimension N × L, the objective of such decompositions is to find a vector g of dimension L which satisfies F g = x. In most cases, we have L ≫ N which a priori leads to an infinite number of solutions. In many applications, we are however interested in finding an approximate solution which would lead to a vector g with the smallest number K of non-zero components. The representation is either exact (when g is solution of F g = x) or approximate (when g is solution of F g ≈ x). It is furthermore termed as sparse representation when K ≪ N.

The sparsest representation is then obtained by finding g∈ ℝ ^L that minimizes under the constraint ||g||₀ ≤ K or, using the dual formulation, by finding g∈ ℝ ^L that minimizes ||g||₀ under the constraint .

An extensive literature exists on these iterative decompositions since this problem has received a strong interest from several research communities. In the domain of audio (music) and image compression, a number of greedy algorithms are based on the founding paper of Mallat and Zhang [1], where the Matching Pursuit (MP) algorithm is presented. Indeed, this article has inspired several authors who proposed various extensions of the basic MP algorithm including: the Orthogonal Matching Pursuit (OMP) algorithm [2], the Optimized Orthogonal Matching Pursuit (OOMP) algorithm [3], or more recently the Gradient Pursuit (GP) [4], the Complementary Matching Pursuit (CMP), and the Orthogonal Complementary Matching Pursuit (OCMP) algorithms [5, 6]. Concurrently, this decomposition problem is also heavily studied by statisticians, even though the problem is often formulated in a slightly different manner by replacing the L₀ norm used in the constraint by a L₁ norm (see for example, the Basis Pursuit (BP) algorithm of Chen et al. [7]). Similarly, an abundant literature exists in this domain in particular linked to the two classical algorithms Least Angle Regression (LARS) [8] and the Least Absolute Selection and Shrinkage Operator [9].

However, sparse decompositions also received a strong interest from the speech coding community in the eighties although a different terminology was used.

The primary aim of this article is to provide a comparative study of the greedy "MP" algorithms. The introduced formalism allows to highlight the main differences between some of the most popular algorithms. It is particularly shown in this article that the MP-based algorithms (MP, OMP, and OOMP) are equivalent to previously known multi-stage gain-shape vector quantization approaches [10]. We also provide a detailed comparison between these algorithms in terms of complexity and performance. In the light of this study, we then introduce a new family of algorithms based on the cyclic minimization concept [11] and the recent Cyclic Matching Pursuit (CyMP) [12]. It is shown that these new proposals outperform previous algorithms such as OOMP and OCMP.

This article is organized as follows. In Section 2, we introduce the main notations used in this article. In Section 3, a brief historical view of speech coding is proposed as an introduction to the presentation of classical algorithms. It is shown that the basic iterative algorithm used in speech coding is equivalent to the MP algorithm. The advantage of using an orthogonalization technique for the dictionary F is further discussed and it is shown that it is equivalent to a QR factorization of the dictionary. In Section 4, we extend the previous analysis to recent algorithms (conjugate gradient, CMP) and highlight their strong analogy with the previous algorithms. The comparative evaluation is provided in Section 5 on synthetic signals of small dimension (N = 40), typical for code excited linear predictive (CELP) coders. Section 6 is then dedicated to the presentation of the two novel algorithms called herein CyRMGS and CyOOCMP. Finally, we suggest some conclusions and perspectives in Section 7.

2 Notations

In this article, we adopt the following notations. All vectors x are column vectors where x_i is the i th component. A matrix F ∈ ℝ^{N × L}is composed of L column vectors such as F = [f¹ ··· f ^L ] or alternatively of NL elements denoted , where k (resp. j) specifies the row (resp. column) index. An intermediate vector x obtained at the k th iteration of an algorithm is denoted as x ^k . The scalar product of the two real valued vectors is expressed by < x, y>= x ^t y . The L_p norm is written as ||·|| _p and by convention ||·|| corresponds to the Euclidean norm (L₂). Finally, the orthogonal projection of x on y is the vector α y that satisfies < x - α y , y >= 0, which brings α =< x, y>/||y||².

3 Overview of classical algorithms

3.1 CELP speech coding

Most modern speech codecs are based on the principle of CELP coding [13]. They exploit a simple source/filter model of speech production, where the source corresponds to the vibration of the vocal cords or/and to a noise produced at a constriction of the vocal tract, and the filter corresponds to the vocal/nasal tracts. Based on the quasi-stationary property of speech, the filter coefficients are estimated by linear prediction and regularly updated (20 ms corresponds to a typical value). Since the beginning of the seventies and the "LPC-10" codec [14], numerous approaches were proposed to effectively represent the source.

In the multi-pulse excitation model proposed in [15], the source was represented as , where δ(n) is the Kronecker symbol. The position n_k and gain g_k of each pulse were obtained by minimizing , where x is the observation vector and is obtained by predictive filtering (filter H(z)) of the excitation signal e(n). Note that this minimization was performed iteratively, that is for one pulse at a time. This idea was further developed by other authors [16, 17] and generalized by [18] using vector quantization (a field of intensive research in the late seventies [19]). The basic idea consisted in proposing a potential candidate for the excitation, i.e. one (or several) vector(s) was(were) chosen in a pre-defined dictionary with appropriate gain(s) (see Figure 1).

The dictionary of excitation signals may have a form of an identity matrix (in which nonzero elements correspond to pulse positions), it may also contain Gaussian sequences or ternary signals (in order to reduce computational cost of filtering operation). Ternary signals are also used in ACELP coders [20], but it must be stressed that the ACELP model uses only one common gain for all the pulses. Thus, it is not relevant to the sparse approximation methods, which demand a separate gain for each vector selected from the dictionary. However, in any CELP coder, there is an excitation signal dictionary and a filtered dictionary, obtained by passing the excitation vectors (columns of a matrix representing the excitation signal dictionary) through the linear predictive filter H(z). The filtered dictionary F = {f¹,..., f ^L } is updated every 10-30 ms. The dictionary vectors and gains are chosen to minimize the norm of the error vector. The CELP coding scheme can then be seen as an operation of the multi-stage shape-gain vector quantization on a regularly updated (filtered) dictionary.

Let F be this filtered dictionary (not shown in Figure 1). It is then possible to summarize the CELP main principle as follows: given a dictionary F composed of L vectors f ^j , j = 1, ···, L of dimension N and a vector x of dimension N, we aim at extracting from the dictionary a matrix A composed of K vectors amongst L and at finding a vector g of dimension K which minimizes

This is exactly the same problem as the one presented in introduction.^a This problem, which is identical to multi-stage gain-shape vector quantization [10], is illustrated in Figure 2.

Typical values for the different parameters greatly vary depending on the application. For example, in speech coding [20] (and especially for low bit rate) a highly redundant dictionary (L ≫ N) is used and coupled with high sparsity (K very small).^b In music signals coding, it is common to consider much larger dictionaries and to select a much larger number of dictionary elements (or atoms). For example, in the scheme proposed in [21], based on an union of MDCTs, the observed vector x represents several seconds of the music signal sampled at 44.1 kHz and typical values could be N > 10⁵, L > 10⁶, and K ≈ 10³.

3.2 Standard iterative algorithm

If the indices j(1) ··· j(K) are known (e.g., the matrix A), then the solution is easily obtained following a least square minimization strategy [22]. Let be the best approximate of x, e.g. the orthogonal projection of x on the subspace spanned by the column vectors of A verifying:

The solution is then given by

(1)

when A is composed of K linearly independent vectors which guarantees the invertibility of the Gram matrix A^tA.

The main problem is then to obtain the best set of indices j(1) ··· j(K), or in other words to find the set of indices that minimizes or that maximizes

(2)

since we have if g is chosen according to Equation 1.

This best set of indices can be obtained by an exhaustive search in the dictionary F (e.g., the optimal solution exists) but in practice the complexity burdens impose to follow a greedy strategy.

The main principle is then to select one vector (dictionary element or atom) at a time, iteratively. This leads to the so-called Standard Iterative algorithm [16, 23]. At the k th iteration, the contribution of the k - 1 vectors (atoms) previously selected is subtracted from x

and a new index j(k) and a new gain g_k verifying

are determined.

Let

α ^j =< f ^j , f ^j >= || f ^j ||² be the vector (atom) energy,

be the crosscorrelation between f ^j and x then the crosscorrelation between f ^j and the error (or residual) e ^k at step k,

the updated crosscorrelation.

By noticing that

one obtains the Standard Iterative algorithm, but called herein as the MP (cf. Appendix). Indeed, although it is not mentioned in [1], this standard iterative scheme is strictly equivalent to the MP algorithm.

To reduce the sub-optimality of this algorithm, two common methodologies can be followed. The first approach is to recompute all gains at the end of the minimization procedure (this method will constitute the reference MP method chosen for the comparative evaluation section). A second approach consists in recomputing the gains at each step by applying Equation 1 knowing j(1) ··· j(k), i.e., matrix A. Initially proposed in [16] for multi-pulse excitation, it is equivalent to an orthogonal projection of x on the subspace spanned by f^j(1)··· f^j(k), and therefore, equivalent to the OMP later proposed in [2].

3.3 Locally optimal algorithms

3.3.1 Principle

A third direction to reduce the sub-optimality of the standard algorithm aims at directly finding the subspace which minimizes the error norm. At step k, the subspace of dimension k - 1 previously determined and spanned by f^{j (1)}··· f^{j (k- 1)}is extended by the vector f^{j (k)}, which maximizes the projection norm of x on all possible subspaces of dimension k spanned by f^j(1)··· f^{j (k-1)}f^j. As illustrated in Figure 3, the solution obtained by this algorithm may be better than the other solution obtained by the previous OMP algorithm.

This algorithm produces a set of locally optimal indices, since at each step, the best vector is added to the existing subspace (but obviously, it is not globally optimal due to its greedy process). An efficient mean to implement this algorithm consists in orthogonalizing the dictionary F at each step k relatively to the k - 1 chosen vectors.

This idea was already suggested in [17], and then later developed in [24, 25] for multi-pulse excitation, and formalized in a more general framework in [26, 23]. This framework is recalled below and it is shown as to how it encompasses the later proposed OOMP algorithm [3].

3.3.2 Gram-Schmidt decomposition and QR factorization

Orthogonalizing a vector f ^j with respect to vector q (supposed herein of unit norm) consists in subtracting from f ^j its contribution in the direction of q. This can be written:

More precisely, if k - 1 successive orthogonalizations are performed relatively to the k - 1 vectors q¹ · · · q^k-1which form an orthonormal basis, one obtains for step k:

Then, maximizing the projection norm of x on the subspace spanned by is done by choosing the vector maximizing with

and

In fact, this algorithm, presented as a Gram-Schmidt decomposition with a partial QR factorization of the matrix f, is equivalent to the OOMP algorithm [3]. This is referred herein as the OOMP algorithm (see Appendix).

The QR factorization can be shown as follows. If is the component of f ^j on the unit norm vector q ^k , one obtains:

For the sake of clarity and without loss of generality, let us suppose that the k th selected vector corresponds to the k th column of matrix F (note that this can always be obtained by column wise permutation), then, the following relation exists between the original (F) and the orthogonalized (F_orth(k+1)) dictionaries

where the orthogonalized dictionary F_orth(k+1) is given by

due to the orthogonalization step of vector by q ^k .

This readily corresponds to the Gram-Schmidt decomposition of the first k columns of the matrix F extended by the remaining L - k vectors (referred as the modified Gram-Schmidt (MGS) algorithm by [22]).

3.3.3 Recursive MGS algorithm

A significant reduction of complexity is possible by noticing that it is not necessary to explicitly compute the orthogonalized dictionary. Indeed, thanks to orthogonality properties, it is sufficient to update the energies and cross-correlations as follows:

A recursive update of the energies and crosscorrelations is possible as soon as the crosscorrelation is known at each step. The crosscorrelations can also be obtained recursively with

The gains can be directly obtained. Indeed, it can be seen that the scalar corresponds to the component of x (or gain) on the (k - 1) ^th vector of the current orthonormal basis, that is, the gain . The gains which correspond to the non-orthogonalized vectors can simply be obtained as:

with

which is an already computed matrix since it corresponds to a subset of the matrix R of size K × L obtained by QR factorization of matrix F. This algorithm will be further referenced herein as RMGS and was originally published in [23].

4 Other recent algorithms

4.1 GP algorithm

This algorithm is presented in detail in [4]. Therefore, the aim of this section is to provide an alternate view and to show that the GP algorithm is similar to the standard iterative algorithm for the search of index j(k) at step k, and then corresponds to a direct application of the conjugate gradient method [22] to obtain the gain g_k and error e ^k . To that aim, we will first recall some basic properties of the conjugate gradient algorithm. We will highlight how the GP algorithm is based on the conjugate gradient method and finally show that this algorithm is exactly equivalent to the OMP algorithm.^c

4.1.1 Conjugate gradient

The conjugate gradient is a classical method for solving problems that are expressed by A g = x, where A is a N × N symmetric, positive-definite square matrix. It is an iterative method that provides the solution g* = A^-1x in N iterations by searching the vector g which minimizes

(3)

Let e^k-1= x- A g^k-1be the error at step k and note that e^k-1is in the opposite direction of the gradient Φ(g) in g^k-1. The basic gradient method consists in finding at each step the positive constant c_k which minimizes Φ(g^k-1+ c_ke^k-1). In order to obtain the optimal solution in N iterations, the Conjugate Gradient algorithm consists of minimizing Φ(g), using all successive directions q¹ · · · q ^N . The search for the directions q ^k is based on the A-conjugate principle.^d

It is shown in [22] that the best direction q ^k at step k is the closest one to the gradient e^k-1that verifies the conjugate constraint (that is, e^k-1from which its contribution on q^k-1using the scalar product < u, Av > is subtracted):

(4)

The results can be extended to any N × L matrix A, noting that the two systems A g = x and A^tA g = A^t x have the same solution in g. However, for the sake of clarity, we will distinguish in the following the error e ^k = x- A g ^k and the error .

4.1.2 Conjugate gradient for parsimonious representations

Let us recall that the main problem tackled in this article consists in finding a vector g with K non-zero components that minimizes ||x- F g ||² knowing x and F. The vector g that minimizes the following cost function

verifies F^t x = F^tF g . The solution can then be obtained, thanks to the conjugate gradient algorithm (see Equation 3). Below, we further describe the essential steps of the algorithm presented in [4].

Let A^k = [f^j(1)· · · f^j(k)] be the dictionary at step k. For k = 1, once the index j(1) is selected (e.g. A¹ is fixed), we look for the scalar

where

The gradient writes

The first direction is then chosen as .

For k = 2, knowing A², we look for the bi-dimensional vector g

The gradient now writes

As described in the previous section, we now choose the direction q² which is the closest one to the gradient , which satisfies the conjugation constraint (e.g., from which its contribution on q¹ using the scalar product < u, (A²) ^tA²v > is subtracted):

At step k, Equation 4 does not hold directly since in this case the vector g is of increasing dimension which does not directly guarantee the orthogonality of the vectors q¹ · · · q ^k . We then must write:

(5)

This is referenced as GP in this article. At first, it is the standard iterative algorithm (described in Section 3.2), and then it is a conjugate gradient algorithm presented in the previous section, where the matrix A was replaced by the A^k and where the vector q ^k was modified according to Equation 5. Therefore, this algorithm is equivalent to the OMP algorithm.

4.2 CMP algorithms

The CMP algorithm and its orthogonalized version (OCMP) [5, 6] are rather straightforward variants of the standard algorithms. They exploit the following property: if the vector g (again of dimension L in this section) is the minimal norm solution of the underdetermined system F g = x, then it is also a solution of the equation system

if in F there are N linearly independent vectors. Then, a new family of algorithms can be obtained by simply applying one of the previous algorithms to this new system of equations Φ g = y with Φ = F^t (FF^t )^-1F and y= F^t (FF^t )^-1x. All these algorithms necessitate the computation of α^j = < ϕ^j , ϕ^j >, β ^j = < ϕ^j , y> and . It is easily shown that if

then, one obtains α ^j =< c ^j , f ^j >, β ^j =< c ^j , x^j> and .

The CMP algorithm shares the same update equations (and therefore same complexity) as the standard iterative algorithm except for the initial calculation of the matrix C which requires the inversion of a symmetric matrix of size N × N. Thus, in this article the simulation results for the OOCMP will be obtained with the RMGS algorithm with the modified formulas for α ^j , β ^j , and as shown above. The OCMP algorithm, requiring the computation of the L × L matrix Φ = F^t (FF^t )^-1F is not retained for the comparative evaluation since it is of greater computational load and lower signal-to-noise (SNR) than OOCMP.

4.3 Methods based on the minimization of the L₁norm

It must be underlined that an exhaustive comparison of L₁ norm minimization methods is beyond the scope of this article and the BP algorithm is selected here as a representative example.

Because of the NP complexity of the problem,

it is often preferred to minimize the L₁ norm instead of the L₀ norm. Generally, the algorithms used to solve the modified problem are not greedy and special measures should be taken to obtain a gain vector having exactly K nonzero components (i.e., ||g||₀ = K). Some algorithms, however, allow to control the degree of sparsity of the final solution--namely the LARS algorithms [8]. In these methods, the codebook vectors f^j(k)are consecutively appended to the base. In the k th iteration, the vector f^j(k)having the minimum angle with the current error e^k-1is selected. The algorithm may be stopped if K different vectors are in the base. This greedy formulation does not lead to the optimal solution and better results may be obtained using, e.g., linear programming techniques. However, it is not straightforward in such approaches to control the degree of sparsity ||g||₀. For example, the solution of the problem [9, 27]

(6)

will exhibit a different degree of sparsity depending on the value of the parameter λ. In practice, it is then necessary to run several simulations with different parameter values to find a solution with exactly K non-zero components. This further increases the computational cost of the already complex L₁ norm approaches. The L₁ norm minimization may be iteratively re-weighted to obtain better results. Despite the increase of complexity, this approach is very promising [28].

5 Comparative evaluation

5.1 Simulations

We propose in this section a comparative evaluation of all greedy algorithms listed in Table 1.

Table 1 Tested algorithms and corresponding acronyms

Full size table

For the sake of coherence, other algorithms based on L1 minimization (such as the solution of the problem (6)) are not included in this comparative evaluation, since they are not strictly greedy (in terms of constantly growing L₀). They will be compared with the other non-greedy algorithms (see Section 6).

We recall that the three algorithms, MGS, RMGS, and OOMP are equivalent except on computation load. We therefore only use for the performance evaluation the least complex algorithm RMGS. Similarly, for the OMP and GP, we will only use the least complex OMP algorithm. For MP, the three previously described variants (standard, with orthogonal projection and optimized with iterative dictionary orthogonalization) are evaluated. For CMP, only two variants are tested, i.e., the standard one and the OOCMP (RMGS-based implementation). The LARS algorithm is implemented in its simplest, stepwise form [8]. Gains are recalculated after the computation of the indices of the codebook vectors.

To highlight specific trends and to obtain reproducible results, the evaluation is conducted on synthetic data. Synthetic signals are widely used for comparison and testing of sparse approximation algorithms. Dictionaries usually consist of Gaussian vectors [6, 29, 30], and in some cases with a constraint of uniform distribution on the unit sphere [4]. This more or less uniform distribution of the vectors on the unit sphere is not necessarily adequate in particular for speech and audio signals where strong correlations exist. Therefore, we have also tested the sparse approximation algorithms on correlated data to simulate conditions which are characteristic to speech and audio applications.

The dictionary F is then composed of L = 128 vectors of dimension N = 40. The experiments will consider two types of dictionaries: a dictionary with uncorrelated elements (realization of a white noise process) and a dictionary with correlated elements [realizations of a second order AutoRegressive (AR) random process]. These correlated elements are obtained; thanks to the filter H(z):

with ρ = 0.9 and φ = π/4.

The observation vector x is also a realization of one of the two processes mentioned above. For all algorithms, the gains are systematically recomputed at the end of the iterative process (e.g., when all indices are obtained). The results are provided as SNR ratio for different values of K. For each value of K and for each algorithm, M = 1000 random draws of F and x are performed. The SNR is computed by

As in [4], the different algorithms are also evaluated on their capability to retrieve the exact elements that were used to generate the signal ("exact recovery performance").

Finally, overall complexity figures are given for all algorithms.

5.2 Results

5.2.1 Signal-to-noise ratio

The results in terms of SNR (in dB) are given in Figure 4 both for the case of a dictionary of uncorrelated (left) and correlated elements (right). Note that in both cases, the observation vector x is also a realization of the corresponding random process, but it is not a linear combination of the dictionary vectors.

Figure 5 illustrates the performances of the different algorithms in the case where the observation vector x is also a realization of the selected random process but this time it is a linear combination of P = 10 dictionary vectors. Note that at each try, the indices of these P vectors and the coefficients of the linear combination are randomly chosen.

5.2.2 Exact recovery performance

Finally, Figure 6 gives the success rate as a function of K, that is, the relative number of times that all the correct vectors involved in the linear combination are retrieved (which will be called exact recovery).

It can be noticed that the success rate never reaches 1. This is not surprising since in some cases the coefficients of the linear combination may be very small (due to the random draw of these coefficients in these experiments) which makes the detection very challenging.

5.2.3 Complexity

The aim of the section is to provide overall complexity figures for the raw algorithms studied in this article, that is, without including the complexity reduction techniques based on structured dictionaries.

These figures, given in Table 2 are obtained by only counting the multiplication/additions operations linked to the scalar product computation and by only retaining the dominant terms^e (more detailed complexity figures are provided for some algorithms in Appendix).

Table 2 Overall complexity in number of multiplications/additions per algorithm (approximated)

Full size table

The results are also displayed in Figure 7 for all algorithms and different values of K. In this figure, the complexity figures of OOMP (or MGS) and GP are also provided and it can be seen, as expected, that their complexity is much higher than RMGS and OMP, while they share exactly the same SNR performances.

5.3 Discussion

As exemplified in the results provided above, the tested algorithms exhibit significant differences in terms of complexity and performances. However, they are sometimes based on different trade-off between these two characteristics. The MP algorithm is clearly the less complex algorithm but it does not always lead to the poorest performances. At the cost of slight increasing complexity due to the gain update at each step, the OMP algorithm shows a clear gain in terms of performance. The three algorithms (OOMP, MGS, and RMGS) allow to reach higher performances (compared to OMP) in nearly all cases, but these algorithms are not at all equivalent in terms of complexity. Indeed, due to the fact that the updated dictionary does not need to be explicitly computed in RMGS, this method has nearly the same complexity as the standard iterative (or MP) algorithm including for high values of K.

The complementary algorithms are clearly more complex. It can be noticed that the CMP algorithm has a complexity curve (see Figure 7) that is shifted upwards compared with the MP's curve, leading to a dramatic (relative) increase for small values of K. This is due to the fact that in this algorithm an initial processing is needed (it is necessary to determine the matrix C - see Section 4.2). However, for all applications where numerous observations are processed from a single dictionary, this initial processing is only needed once which makes this approach quite attractive. Indeed, these algorithms obtain significantly improved results in terms of SNR and in particular OOCMP outperforms RMGS in all but one case. In fact, as depicted in Figure 4, RMGS still obtained better results when the signals were correlated and also in the case where K << N which are desired properties in many applications.

The algorithms CMP and OOCMP are particularly effective when the observation vector x is a linear combination of dictionary elements, and especially, when the dictionary elements are correlated. These algorithms can, almost surely, find the exact combination of vectors (contrary to the other algorithms). This can be explained by the fact that the crosscorrelation properties of the normalized dictionary vectors (angles between vectors) are not the same for F and Φ. This is illustrated in Figure 8, where the histograms of the cosines of the angles between the dictionary elements are provided for different values of the parameter ρ of the AR(2) random process. Indeed, the angle between the elements of the dictionary Φ are all close to π/2, or in other words they are, for a vast majority, nearly orthogonal whatever the value of ρ be. This property is even stronger when the F matrix is obtained with realizations of white noise (ρ = 0).

This is a particularly interesting property. In fact, when the vector x is a linear combination of P vectors of the dictionary F, then the vector y is a linear combination of P vectors of the dictionary Φ, and the quasi-orthogonality of the vectors of Φ allows to favor the choice of good vectors (the others being orthogonal to y). In CMP, OCMP, and OOCMP, the first selected vectors are not necessarily minimizing the norm ||F g - x||, which explains why these methods are poorly performing for a low number K of vectors. Note that the operation Φ = C^tF can be interpreted as a preconditioning of matrix F[31], as also observed in [6].

Finally, it can be observed that the GP algorithm exhibits a higher complexity than OMP in its standard version but can reach lower complexity by some approximations (see [4]).

It should also be noted, that the simple, stepwise implementation of the LARS algorithm yields comparable SNR values to the MP algorithm, at a rather high computational load. It then seems particularly important to use more elaborated approaches based on the L₁ minimization. In the next section, we will evaluate in particular a method based on the study of [32].

6 Toward improved performances

6.1 Improving the decomposition

Most of the algorithms described in the previous sections are based upon K steps iterative or greedy process, in which, at step k, a new vector is appended to a subspace defined at step k - 1. In this way, a K-dimensional subspace is progressively created.

Such greedy algorithms may be far from optimality and this explains the interest for better algorithms (i.e., algorithms that would lead to a better subspace), even if they are at the cost of increased computational complexity. For example, in the ITU G.729 speech coder, four vectors are selected in the four nested loops [20]. It is not a full-search algorithm (there are 2¹⁷ combinations of four vectors in this coder), because the innermost loop is skipped in most cases. It is, however, much more complex than the algorithms described in the previous sections. The Backward OOMP algorithm introduced by Andrle et al. [33] is a less complex solution than the nested loop approach. The main idea of this algorithm is to find a K' > K dimensional subspace (by using the OOMP algorithm) and to iteratively reduce the dimension of the subspace until the targeted dimension K is reached. The criterion used for the dimension reduction is the norm of the orthogonal projection of the vector x on the subspace of reduced dimension.

In some applications, the temporary increase of the subspace dimension is not convenient or even not possible (e.g., ACELP [20]). In such cases, optimization of the subspace of dimension K may be performed using the cyclic minimization concept [11]. Cyclic minimizers are frequently employed to solve the following problem:

where V is a function to be minimized and θ₁,..., θ_K are scalars or vectors. As presented in [11], cyclic minimization consists in performing, for i = 1,..., K, the minimization with respect to one variable:

and substituting the new value for the previous one: . The process can be iterated as many times as desired.

In [12], the cyclic minimization is employed to find the signal model consisting of complex sinusoids. In the augmentation step, a new sinusoid is added (according to the MP approach in frequency domain), and then in the optimization step the parameters of the previously found sinusoids are consecutively revised. This approach, termed as CyMP by the authors, has been extended to the time-frequency dictionaries (consisting of Gabor atoms and Dirac spikes) and to OMP algorithms [34].

Our idea is to combine the cyclic minimization approach with the locally optimal greedy algorithms like RMGS and OOCMP to improve the subspace generated by these algorithms.

Recently some other non-greedy algorithms have been proposed, which also tend to improve the subspace, namely the COSAMP [35] and the Subspace Pursuit (SP) [29]. These algorithms enable, in the same iteration, to reject some of the basis vectors and to introduce new candidate vectors. Greedy algorithms also exist, namely the Stagewise Orthogonal Matching Pursuit (StOMP) [36] and the Regularized Orthogonal Matching Pursuit (ROMP) [30], in which, a series of vectors is selected in the same iteration. It has been shown that the non-greedy SP outperforms the greedy ROMP [29]. This motivates our choice only to include the non-greedy COSAMP and SP algorithms in our study.

The COSAMP algorithm starts with the iteration index k = 0, the codebook F, the error vector e= x, and the L-dimensional gain vector g ^k = 0. Number of nonzero gains in the output gain vector should be equal to K. Each iteration consists of the following steps:

k = k + 1,
Crosscorrelation computation: β = F^t e .
Search for the 2K indices of the largest crosscorrelations:

Ω = supp_2K(β).

Merging of the new and previous indices: T = Ω ∪ supp_K( g ^k-1).
Selection of the codebook vectors corresponding to the indices T : A = F_T .
Calculation of the corresponding gains (least squares): g _T = (A^tA)^-1 A^t x (the remaining gains are set to zero).
Pruning g _T to obtain K nonzero gains of maximum absolute values: g ^k .
Update of the error vector: e = x - F g ^k .
Stop if || e ||₂ < ε ₁ or || g ^k- g ^k-1|| < ε ₂ or k = k_max .

Note that, in COSAMP, 2K new indices are merged with K old ones, while the SP algorithm merges K old and K new indices. This constitutes the main difference between the two algorithms. For the sake of fair comparison, the stopping condition has been modified and unified for both algorithms.

6.2 Combining algorithms

We propose in this section a new family of algorithms which, like the CyMP, consist of an augmentation phase and an optimization phase. In our approach, the augmentation phase is performed using one of the greedy algorithms described in previous sections, yielding the initial K-dimensional subspace. The cyclic optimization phase consists in substituting new vectors for the previously chosen ones, without modification of the subspace dimension K. The K vectors spanning the subspace are consecutively tested by removing them from the subspace. Each time a K - 1 -dimensional subspace is created. A substitution takes place, if one of the L - K codebook vectors, appended to this K - 1 -dimensional subspace, forms a better K-dimensional subspace than the previous one. The criterion is, naturally, the approximation error, i.e., . In this way a "wandering subspace" is created: a K-dimensional subspace evolves in the N-dimensional space, trying to approach the vector x being modeled. Generic scheme of the proposed algorithms may be described as follows:

1.
The augmentation phase: Creation of a K-dimensional initial subspace, using one of the locally optimal greedy algorithms.
2.
The cyclic optimization phase:
(a)
Outer loop: testing of codebook vectors f ^j(i), i = 1,..., K, spanning the K-dimensional subspace. In the i-th iteration vector f ^j(i)is temporarily removed from the subspace.
(b)
Inner loop: testing the codebook vectors f ^l , l = 1,..., L - except for vectors belonging to the subspace. Substitute f ^l for f ^j(i)if the obtained new K-dimensional subspace yields better approximation of the modeled vector x . If there are no substitutions in the inner loop, put the vector f ^j(i)back to the set of spanning vectors.
3.
Stop if there are no substitutions in the outer loop (i.e., in the whole cycle).

In the augmentation phase, any greedy algorithm may be used, but, due to the local convergence of the cyclic optimization algorithm, a good initial subspace yields a better final result and reduces the computational cost. Therefore, the OOMP (RMGS algorithm) and OOCMP were considered and the proposed algorithms will be referred below as CyOOMP or CyRMGS and CyOOCMP. In the cyclic optimization phase, the implementation of the operations in both loops is always based on the RMGS (OOMP) algorithm (no matter which algorithm has been used in the augmentation phase). In the outer loop the K - 1 steps of the RMGS algorithm are performed, using already known vector indices. In the inner loop, the K th step of the RMGS algorithm is made, yielding the index of the best vector belonging to the orthogonalized codebook. Thus, in the inner loop, it may be either one substitution (if the vector f ^l calculated using the RMGS algorithm is better than the vector f^j(i)temporarily removed from the subspace) or no substitution.

If the initial subspace is good (e.g., created by the OOCMP algorithm), then, in most cases, there are no substitutions at all (the outer loop operations are performed only once). If the initial subspace is poor (e.g., randomly chosen), the outer loop operations are performed many times and the algorithm becomes computationally complex. Moreover, this algorithm stops in some suboptimal subspace (it is not equivalent to the full search algorithm), and it is therefore, important to start from a good initial subspace. The final subspace is, in any case, not worse than the initial one and the algorithm may be stopped at any time.

In [34], the cyclic optimization is performed at each stage of the greedy algorithm (i.e., the augmentation steps and cyclic optimization steps are interlaced). This yields a more complex algorithm, but which possesses a higher probability of finding a better subspace.

The proposed algorithms are compared with the other non-greedy procedures: COSAMP, SP, and L₁ minimization. The last algorithm is based on minimization of (6), using the BP procedure available in [32]. Ten trials are performed with different values of the parameter λ. These values are logarithmically distributed within a range depending on the demanded degree of sparsity K. At the end of each trial, pruning is performed, to select K codebook vectors having the maximum gains. The gains are recomputed according to the least squares criterion.

6.3 Results

The performance results are shown in Figure 9 in terms of SNR (in dB) for different values of K, when the dictionary elements are realizations of the white noise process (left) or AR(2) random process (right).

It can be observed that since RMGS and OOCMP are already quite efficient for uncorrelated signal, the gain in performance for CyRMGS and CyOOCMP are only significant for correlated signals. We then discuss below only the results obtained for the correlated case. Figure 10 (left) provides the SNRs in the case where the vector x is a linear combination of P = 10 dictionary vectors and the success rate to retrieve the exact vectors (right).

The SNR are clearly improved for both the algorithms compared with their initial core algorithm in all tested cases. A typical gain of 5 dB is obtained for CyRMGS (compared to RMGS). This cyclic substitution technique also significantly improves the initially poor results of OOCMP for small values of K. One can also notice that a typical gain of 10 dB is observed for the simulations, where x is a linear combination of P = 10 dictionary vectors for correlated signals (see Figure 10 (left)). Finally, the exact recovery performances are also improved as compared with for both the core algorithms (RMGS and OOCMP).

L 1 minimization (BP algorithm) performs nearly as good as the Cyclic OOMP, but is more complex in practice due to the necessity of running several trials with different values for the parameter λ.

SP outperforms COSAMP, but both methods yield lower SNR as compared with the cyclic implementations. Moreover, COSAMP and SP do not guarantee monotonic decrease of the error. Indeed, in practice, they often reach a local minimum and yield the same result in consecutive iterations, which stops the procedure. In some other situations they may exhibit oscillatory behaviors, repeating the same sequence of solutions. In that case, the iterative procedure is only stopped after k_max iterations which, for typical value of k_max = 100, considerably increases the average computational load. Detection of the oscillations should diminish the computational complexity of these two algorithms.

Nevertheless, the main drawback of these new algorithms is undoubtedly the significant increase in complexity. One may indeed observe that the complexity figures displayed in Figure 11 are of order one in magnitude and higher than those displayed in Figure 7.

7 Conclusion

The common ground of all the methods discussed in this article is the iterative procedure to greedily compute a basis of vectors q¹ · · · q ^K which are

simply f ^j(1)· · · f ^j(K)in MP, OMP, CMP, and LARS algorithms,
orthogonal in OOMP, MGS, and RMGS (explicit computation for the first two algorithms and only implicit for RMGS),
A-conjugate in GP algorithm.

It was shown in particular in this article that some methods often referred as different techniques in the literature are equivalent. The merit of the different methods was studied in terms of complexity and performances and it is clear that some approaches realize a better trade-off between these two facets. As an example, the RMGS provides substantial gain in performance to the standard MP algorithm with only a very minor complexity increase. Its main interest is indeed the use of a dictionary that is iteratively orthogonalized, but without explicitly building that dictionary. On the other end, for application where complexity is not a major issue, CMP-based algorithms represent an excellent choice, and especially the newly introduced CyOOCMP.

The cyclic algorithms are compared with the other non-greedy procedures, i.e., COSAMP, SP, and L₁ minimization. The proposed cyclic complementary OOMP successfully competes with these algorithms in solving the sparse and non-sparse problems of small dimension (encountered, e.g., in CELP speech coders).

Although it is not discussed in this article, it is interesting to note that the efficiency of an algorithm may be dependent on how the dictionary F is built. As noted, in the introduction, the dictionary may have an analytic expression (e.g., when F is an union of several transforms at different scales). But F can also be built by machine learning approaches (such as K-means [10], K-SVD [37], or other clustering strategy [38]).

Finally, a recent and different paradigm was introduced, the compressive sampling [39]. Based on solid grounds, it clearly opens the path for different approaches that should permit better performances with possibly smaller dictionary sizes.

Appendix

The algorithmic description of the main algorithms discussed in the article along with the more precise complexity figures is presented in this section. Note that all implementations are directly available on line at http://www. telecom-paristech.fr/^~grichard/EURASIP_Moreau2011/.

Algorithm 1 Standard Iterative algorithm (MP)

for j = 1 to L do

α^j =< f ^j , f ^j >

end for

for k = 1 to K do

for j = 1 to L (if k < K) do

end for

Option : recompute all gains

A = [f^j(1)· · · f^j(K)]

g= (A^tA)^-1A^tx

Complexity: (K + 1)NL + α(K), where α(K) ≈ K³/3 is the cost of final gains calculation

Algorithm 2 Optimized Orthogonalized MP (OOMP)

for j = 1 to L do

end for

for k = 1 to K do

for j = 1 to L (if k < K) do

end for

A = [f^j(1)· · · f^j(K)]

g= (A^tA)^-1A^tx

Complexity: (K + 1)NL + 3(K - 1)NL + α(K)

Algorithm 3 Recursive modified Gram-Schmidt

for j = 1 to L do

end for

for k = 1 to K do

for j = 1 to L (if k < K) do

end for

for k = K - 1 to 1 do

end for

Complexity: (K + 1)NL + (K - 1)L(1 + K/2)

Algorithm 4 Gradient pursuit

for j = 1 to L do

α^j =< f ^j , f ^j >

end for

e⁰ = x

g⁰ = 0

for k = 1 to K do

A_k₌ [f^j(1)· · · f^j(k)]

B^k = (A^k ) ^tA^k

if k = 1 then

else

end if

g ^k = g^k-1+ c_kq^k

e ^k = x- A^k g ^k

for j = 1 to L (if k < K) do

end for

Complexity:

Endnotes

^aNote though that the vector g is now of dimension K instead of L, the indices j(1) · · · j(K) point to dictionary vectors (columns of F ) corresponding to non-zero gains. ^bK = 2 or 3, L = 512 or 1024, N = 40 for a sampling rate of 8kHz are typical values found in speech coding schemes. ^cSeveral alternatives of this algorithm are also proposed in [4], and in particular the "approximate conjugate gradient pursuit" (ACGP) which exhibits a significantly lower complexity. However, in this article all figures and discussion will only consider the primary GP algorithm. ^dTwo vectors u and v are A-conjugate, if they are orthogonal with respect to the scalar product u^tAv. ^eThe overall complexity figures were obtained by considering the following approximation for small i values: and by only keeping dominant terms considering that K ≪ N. Practical simulations showed that the approximation error with these approximative figures was less than 10% compared to the exact figures

Abbreviations

BP:: basis pursuit
CELP:: code excited linear predictive
CMP:: complementary matching pursuit
CyMP:: cyclic matching pursuit
GP:: gradient pursuit
LARS:: least angle regression
MP:: matching pursuit
OCMP:: orthogonal complementary matching pursuit
OMP:: orthogonal matching pursuit
OOMP:: optimized orthogonal matching pursuit.

References

Mallat S, Zhang Z: Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process 1993,41(12):3397-3415. 10.1109/78.258082
Article Google Scholar
Pati Y, Rezaifar R, Krishnaprasad P: Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition. Proceedings of the 27th Asilomar Conference on Signals, Systems and Computers 1993.
Google Scholar
Rebollo-Neira L, Lowe D: Optimized orthogonal matching pursuit approach. IEEE Signal Process. Lett 2002, 9: 137-140. 10.1109/LSP.2002.1001652
Article Google Scholar
Blumensath T, Davies M: Gradient pursuits. IEEE Trans. Signal Process 2008,56(6):2370-2382.
Article MathSciNet Google Scholar
Rath G, Guillemot C: Sparse approximation with an orthogonal complementary matching pursuit algorithm. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 2009, 3325-3328.
Google Scholar
Rath G, Sahoo A: A comparative study of some greedy pursuit algorithms for sparse approximation. Eusipco 2009.
Google Scholar
Chen S, Donoho D, Saunders M: Atomic decomposition by basis pursuit. SIAM Rev 2001,43(1):129-159. 10.1137/S003614450037906X
Article MathSciNet Google Scholar
Efron B, Hastie T, Johnstone I, Tibshirani R: Least angle regression. Ann. Stat 2004,32(2):407-499. 10.1214/009053604000000067
Article MathSciNet Google Scholar
Tibshirani R: Regression shrinkage and selection via the lasso. J. R. Stat. Soc 1996,58(1):267-288.
MathSciNet Google Scholar
Gersho A, Gray R: Vector Quantization and Signal Compression. Kluwer Academic Publishers; 1992.
Chapter Google Scholar
Stoica P, Selen Y: Cyclic minimizers, majorization techniques, and the expectation-maximization algorithm: a refresher. IEEE Signal Process. Mag 2004,21(1):112-114. 10.1109/MSP.2004.1267055
Article Google Scholar
Christensen M, Jensen S: The cyclic matching pursuit and its application to audio modeling and coding. Rec. Asilomar Conf. Signals Systems and Computers 2007.
Google Scholar
Moreau N: Tools for Signal Compression. ISTE Wiley; 2011.
Google Scholar
Tremain T: The government standard linear predictive coding algorithm: LPC-10. Speech Technology Magazine 1982, 1: 40-49.
Google Scholar
Atal B, Remde J: A new model of LPC excitation for producing natural-sounding speech at low bit rates. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 1982, 614-617.
Google Scholar
Berouti M, Garten H, Kabal P, Mermelstein P: Efficient computation and encoding of the multi-pulse excitation for LPC. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 1984, 10.1.1-10.1.4.
Google Scholar
Kroon P, Deprettere E: Experimental evaluation of different approaches to the multi-pulse coder. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 1984, 10.4.1-10.4.4.
Google Scholar
Schroeder M, Atal B: Code-excited linear prediction (CELP): high-quality speech at very low bit rates. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 1985, 937-940.
Google Scholar
Linde Y, Buzo A, Gray R: An algorithm for vector quantizer design. IEEE Trans. Commun 1980, COM-28: 84-95.
Article Google Scholar
Salami R, Laflamme C, Adoul J, Kataoka A, Hayashi S, Lamblin C, Massaloux D, Proust S, Kroon P, Shoham Y: Design and description of CS-ACELP: a toll quality 8 kb/s speech coder. IEEE Trans. Speech Audio Process 1998,6(2):116-130. 10.1109/89.661471
Article Google Scholar
Ravelli E, Richard G, Daudet L: Union of MDCT bases for audio coding. IEEE Trans. Speech, Audio Lang. Process 2008,16(8):1361-1372.
Article Google Scholar
Golub G, Van Loan C: Matrix Computations. Johns Hopkins University Press; 1983.
Google Scholar
Dymarski P, Moreau N, Vigier A: Optimal and sub-optimal algorithms for selecting the excitation in linear predictive coders. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 1990, 485-488.
Chapter Google Scholar
Singhal S: Reducing computation in optimal amplitude multipulse coders. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 1986, 2363-2366.
Google Scholar
Singhal S, Atal B: Amplitude optimization and pitch prediction in multipulse coders. IEEE Trans. Acoust. Speech Signal Process 1989,37(3):317-327. 10.1109/29.21700
Article Google Scholar
Moreau N, Dymarski P: Mixed excitation CELP coder. Proceedings of the European Conference on Speech Communication and Technology 1989, 322-325.
Google Scholar
Giacobello D, Christensen MG, Murthi MN, Jensen SH, Moonen M: Retrieving sparse patterns using a compressed sensing framework: applications to speech coding based on sparse linear prediction. IEEE Signal Process. Lett 2010,17(1):103-106.
Article Google Scholar
Giacobello D, Christensen MG, Murthi MN, Jensen SH, Moonen M: Enhancing sparsity in linear prediction of speech by iteratively reweighted 1-norm minimization. Proceedings of the Internatinoal Conference on Acoustics, Speech, and Signal Processing 2010, 4650-4653.
Google Scholar
Dai W, Milenkovic O: Subspace pursuit for compressive sensing signal reconstruction. IEEE Trans. Inf. Theory 2009,55(5):2230-2249.
Article MathSciNet Google Scholar
Needell D, Vershynin R: Signal recovery from incomplete and inaccurate measurements via regularized orthogonal matching pursuit. IEEE J. Sel. Topics Signal Process 2010,4(2):310-316.
Article Google Scholar
Schnass K, Vandergheynst P: Dictionary preconditioning for greedy algorithms. IEEE Trans. Signal Process 2008,56(5):1994-2002.
Article MathSciNet Google Scholar
Peyre G: Toolbox Sparsity--Sparsity-Based Signal Processing Related Functions. Matlab Central 2007.
Google Scholar
Andrle M, Rebollo-Neira L, Sagianos E: Backward-optimized orthogonal matching pursuit approach. IEEE Signal Process. Lett 2004,11(9):705-708. 10.1109/LSP.2004.833503
Article Google Scholar
Sturm B, Christensen M: Cyclic matching pursuit with multiscale time-frequency dictionaries. Rec. Asilomar Conference on Signals Systems and Computers 2010.
Google Scholar
Needell D, Tropp JA: Cosamp: iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmonic Anal 2008,26(3):301-321.
Article MathSciNet Google Scholar
Donoho D, Tsaig Y, Drori I, Starck J: Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit, vol. 2. Technical Report, Department of Statistics, Stanford University 2006.
Google Scholar
Aharon M, Elad M, Bruckstein A: K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process 2006,54(11):4311-4322.
Article Google Scholar
Leveau P, Vincent E, Richard G, Daudet L: Instrument-specific harmonic atoms for mid-level music representation. IEEE Trans. Speech Audio Lang. Process 2008,16(1):116-128.
Article Google Scholar
Candès E, Wakin M: An introduction to compressive sampling. IEEE Signal Process. Mag 2008,56(6):21-30.
Article Google Scholar

Download references

Acknowledgements

The authors would like to warmly thank Prof. Laurent Daudet for his detailed and constructive comments on an earlier version of this manuscript.

Author information

Authors and Affiliations

Institute of Telecommunications, Warsaw University of Technology, Warsaw, Poland
Przemyslaw Dymarski
Institut Telecom, Telecom ParisTech, CNRS-LTCI, Paris, France
Nicolas Moreau & Gaël Richard

Authors

Przemyslaw Dymarski
View author publications
You can also search for this author in PubMed Google Scholar
Nicolas Moreau
View author publications
You can also search for this author in PubMed Google Scholar
Gaël Richard
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Przemyslaw Dymarski.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Authors’ original file for figure 4

Authors’ original file for figure 5

Authors’ original file for figure 6

Authors’ original file for figure 7

Authors’ original file for figure 8

Authors’ original file for figure 9

Authors’ original file for figure 10

Authors’ original file for figure 11

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Dymarski, P., Moreau, N. & Richard, G. Greedy sparse decompositions: a comparative study. EURASIP J. Adv. Signal Process. 2011, 34 (2011). https://doi.org/10.1186/1687-6180-2011-34

Download citation

Received: 16 February 2011
Accepted: 03 August 2011
Published: 03 August 2011
DOI: https://doi.org/10.1186/1687-6180-2011-34

Greedy sparse decompositions: a comparative study

Abstract

1 Introduction

2 Notations

3 Overview of classical algorithms

3.1 CELP speech coding

3.2 Standard iterative algorithm

3.3 Locally optimal algorithms

3.3.1 Principle

3.3.2 Gram-Schmidt decomposition and QR factorization

3.3.3 Recursive MGS algorithm

4 Other recent algorithms

4.1 GP algorithm

4.1.1 Conjugate gradient

4.1.2 Conjugate gradient for parsimonious representations

4.2 CMP algorithms

4.3 Methods based on the minimization of the L1norm

5 Comparative evaluation

5.1 Simulations

5.2 Results

5.2.1 Signal-to-noise ratio

5.2.2 Exact recovery performance

5.2.3 Complexity

5.3 Discussion

6 Toward improved performances

6.1 Improving the decomposition

6.2 Combining algorithms

6.3 Results

7 Conclusion

Appendix

Algorithm 1 Standard Iterative algorithm (MP)

Algorithm 2 Optimized Orthogonalized MP (OOMP)

Algorithm 3 Recursive modified Gram-Schmidt

Algorithm 4 Gradient pursuit

Endnotes

Abbreviations

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Competing interests

Authors’ original submitted files for images

Rights and permissions

About this article

Cite this article

Share this article

Keywords

4.3 Methods based on the minimization of the L₁norm