
Clustering K-SVD for sparse representation of images

Abstract

K-singular value decomposition (K-SVD) is a frequently used dictionary learning (DL) algorithm that iterates between sparse coding and dictionary updating. The sparse coding process generates sparse coefficients for each training sample, and the sparse coefficients induce clustering of features. In applications like image processing, the features of different clusters vary dramatically. However, all the atoms of the dictionary jointly represent the features, regardless of the clusters. This reduces the accuracy of sparse representation. To address this problem, in this study, we develop the clustering K-SVD (CK-SVD) algorithm for DL and the corresponding greedy algorithm for sparse representation. The atoms are divided into a set of groups, and each group of atoms is employed to represent the image features of a specific cluster. Hence, the features of all clusters can be utilized and the number of redundant atoms is reduced. Additionally, two practical extensions of the CK-SVD are provided. Experimental results demonstrate that the proposed methods provide more accurate sparse representation of images compared to the conventional K-SVD and its existing extended methods. The proposed clustering DL model also has the potential to be applied to online DL cases.

1 Introduction

Sparse representation aims to model signals as sparse linear combinations of the atoms in a dictionary, and this technique is widely used in various fields of image processing [1–4]. Let \(\boldsymbol {z}\in \mathbb {R}^{n}\) and \(\boldsymbol {D}\in \mathbb {R}^{n\times q}, q\geq n\) denote a signal and an over-complete dictionary, respectively. The sparse representation of z with respect to the dictionary D is expressed as \(\boldsymbol{z}\approx\boldsymbol{D}\boldsymbol{s}\). The sparse coefficient vector \(\boldsymbol {s}\in \mathbb {R}^{q}\) satisfies \(\|\boldsymbol{s}\|_{0}\leq k\) and \(\|\boldsymbol{z}-\boldsymbol{D}\boldsymbol{s}\|_{2}\leq \varepsilon\), where \(\|\cdot\|_{0}\) denotes the number of non-zero entries of a vector, and k and ε represent the maximum number of sparse coefficients and the sparse representation error, respectively. In general, the dictionaries used for sparse representation can be divided into two categories: analytical dictionaries and learned dictionaries. Analytical dictionaries like wavelet dictionaries can be universally applied, and they are easy to obtain. However, their moderate sparse representation accuracy limits their applications. For better performance, the over-complete dictionary D is commonly obtained from the DL process using a set of training samples \(\boldsymbol {Z}\in \mathbb {R}^{n\times \xi }\), expressed as:

$$ \begin{aligned} \mathop{\arg\min}_{\boldsymbol{D},\boldsymbol{S}}\, \|\boldsymbol{Z}-\boldsymbol{D}\boldsymbol{S}\|_{F}^{2}\; s.t. \, \|\boldsymbol{s}_{t}\|_{0}\leq k, \|\boldsymbol{d}_{i}\|_{2}=1,\end{aligned} $$
(1)

where \(t\in \{1,2,\dots,\xi\}\), \(i\in \{1,2,\dots,q\}\), and \(\|\cdot\|_{F}\) denotes the Frobenius norm. The notation di denotes the ith column of the dictionary D, which is also referred to as the ith atom. S is the sparse coefficient matrix with respect to Z and D, and it is obtained together with D in the DL process; st is the tth column of S.

To date, researchers have proposed various DL algorithms. In [5], Engan et al. propose the well-known DL method named the method of optimal directions (MOD). The MOD contains two iterative processes, sparse coefficient computation and dictionary updating. The dictionary updating is realized globally by a least squares (LS) computation in terms of the training samples and sparse coefficients. In [6], Aharon et al. propose another LS-based algorithm for DL, referred to as the K-SVD. Different from the global update strategy, for the K-SVD, the atoms of the dictionary are updated separately. The MOD, the K-SVD, and their extended methods are used for batch DL, i.e., all training samples are input simultaneously. However, when the training samples cannot be obtained all at once, online learning is required. In [7], Mairal et al. propose the online DL (ODL) algorithm, aiming to update the atoms by using only the newly input samples. This algorithm allows training samples to be input successively and realizes online learning. Additionally, a set of DL methods extended from the MOD, the K-SVD, and the ODL have also been proposed, in order to improve the sparse representation accuracy or reduce the computational complexity [8–13].

Among these algorithms, the K-SVD is frequently used in the field of image processing due to its generality and low complexity. The K-SVD algorithm consists of two processes, sparse coding and dictionary updating, which are executed alternately. In the sparse coding process, at most k sparse coefficients for each training sample are computed via greedy algorithms, inducing clustering of features [14–16]. For the K-SVD algorithm, all the atoms of the dictionary jointly represent the training images, regardless of the clusters. While representing different training samples, an atom may be employed by different clusters of features. In applications of image processing, the features of different clusters vary dramatically, and therefore, the above phenomenon may reduce the accuracy of sparse representation. In [9], Nazzal et al. utilize the residual of the training samples to train a set of sub-dictionaries. However, the sub-dictionaries are not distinguished by different clusters. In [11], Smith and Elad improve the K-SVD by considering only the used atoms in the dictionary updating process. In [31], Tariyal et al. propose deep DL by combining the concepts of DL and deep learning; a multiple DL framework is developed for multiple levels of dictionaries. In [30], Yi et al. build a hierarchical sparse representation framework that consists of a local histogram-based model, a weighted alignment pooling model, and a sparsity-based discriminative model. In [28], Rubinstein et al. propose the approximate K-SVD method to reduce the computational complexity, which can be regarded as another implementation of the K-SVD. In [29], Mairal et al. develop a multiscale DL framework based on an efficient quadtree decomposition of the learned dictionary.

In this study, we aim to utilize the clustering of the features of the training samples. We divide the atoms of the learned dictionary into a set of groups, and each group serves a specific cluster of features. This strategy improves the DL process in two aspects. First, besides the image features of the original training samples, we also consider the features of the residuals of different clusters, which reduces the number of redundant atoms. Second, we develop a strategy that ensures an arbitrary atom of the dictionary is utilized for only a specific cluster of features. Hence, the atom is not influenced by the features of other clusters. Based on this strategy, we propose the CK-SVD algorithm, as well as the corresponding greedy recovery algorithm for computing sparse representations. Compared to the conventional K-SVD, the CK-SVD improves the sparse reconstruction accuracy without increasing the requirements or computational complexity of the DL process. Based on the clustering DL model, we also provide two practical extensions of the CK-SVD, which achieve adaptive sparsity and dynamic refinement of atoms, respectively.

The remainder of this paper is organized as follows. Section 2 describes the aim of this study and introduces the proposed method. Section 3 provides the extended methods of the CK-SVD. Section 4 presents the experimental results. Section 5 discusses the proposed clustering model and its potential to be applied to online learning. Section 6 draws a conclusion.

2 Proposed method

In this section, we first review the conventional K-SVD algorithm and describe the problem that needs to be addressed. Next, we introduce the proposed CK-SVD algorithm.

2.1 Problem formulation

Given the training samples \(\boldsymbol {Z}\in \mathbb {R}^{n\times \xi }\) and the initialized dictionary \(\boldsymbol {D}\in \mathbb {R}^{n\times q}, q\geq n\), the sparse coding process is given by:

$$ \begin{aligned} \mathop{\arg\min}_{\boldsymbol{S}}\, \|\boldsymbol{Z}-\boldsymbol{D}\boldsymbol{S}\|_{F}^{2}\; s.t. \, \|\boldsymbol{s}_{t}\|_{0}\leq k,\end{aligned} $$
(2)

where \(t=1,2,\dots,\xi\). The process is executed by computing the sparse representation for each training sample, expressed as:

$$ \begin{aligned} \mathop{\arg\min}_{\boldsymbol{s}_{t}}\, \|\boldsymbol{z}_{t}-\boldsymbol{D}\boldsymbol{s}_{t}\|_{2}^{2}\; s.t. \, \|\boldsymbol{s}_{t}\|_{0}\leq k,\end{aligned} $$
(3)

where the vector \(\boldsymbol {z}_{t}\in \mathbb {R}^{n}\) denotes an arbitrary training sample. The above problem is commonly solved by greedy algorithms like orthogonal matching pursuit (OMP) [17]. Specifically, in each iteration of the process for solving (3), the atom that yields the largest inner product with the residual of zt is selected. Thus, k atoms are selected successively, inducing k clusters of features. For image samples, the objective features of different clusters vary greatly, but the atoms jointly represent the features without considering the clusters. Hence, an arbitrary atom of the dictionary may be interfered with by different clusters of features.
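
For concreteness, a minimal OMP sketch in Python/NumPy is given below. The residual tolerance `tol` and the per-iteration least-squares re-fit follow the standard OMP formulation [17]; all names are illustrative and not taken from the paper's algorithms.

```python
import numpy as np

def omp(z, D, k, tol=1e-6):
    """Greedy OMP sketch: select at most k atoms of D to approximate z."""
    q = D.shape[1]
    r = z.astype(float).copy()          # current residual
    support = []                        # indices of selected atoms
    coeffs = np.zeros(0)
    for _ in range(k):
        # pick the atom with the largest inner product with the residual
        i = int(np.argmax(np.abs(D.T @ r)))
        if i not in support:
            support.append(i)
        # re-fit all selected coefficients by least squares
        coeffs, *_ = np.linalg.lstsq(D[:, support], z, rcond=None)
        r = z - D[:, support] @ coeffs
        if np.linalg.norm(r) <= tol:    # stop early if the residual is small enough
            break
    s = np.zeros(q)
    s[support] = coeffs
    return s
```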

Here, we provide an example to describe this issue by employing the test image “Koala” with the size of 480×320 from the standard Berkeley image dataset [18]. We divided the image into patches with the size of 4×4, i.e., n=16. We vectorized the patches and used them as the training samples for the K-SVD algorithm. The dictionary \(\boldsymbol {D}\in \mathbb {R}^{n\times q}\) was initialized as a Gaussian random matrix. We set the total number of atoms and the maximum number of sparse coefficients for each sample to q=3n and k=3, respectively. After ten iterations, the output dictionary was obtained and utilized for sparse coding of the image “Koala” via the OMP algorithm. We divided the image into a set of patches and coded these patches respectively by using the learned dictionary. The original image, the clusters of image features, and the residual images are presented in Fig. 1. We obtain the residual image R1 by removing the first cluster of features H1 from the original image X, and H1 can be regarded as the sparse representation by using the first selected atom in the greedy recovery process. Similarly, we obtain R2 and R3 by removing H2 and H3 from R1 and R2, respectively.
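
To make the patch-based setup concrete, the following minimal sketch shows how such a training matrix Z could be formed; the non-overlapping tiling and the vectorization order are assumptions on our part, since the paper does not specify them.

```python
import numpy as np

def image_to_patches(img, p=4):
    """Tile a grayscale image into non-overlapping p x p patches and vectorize
    each patch into a column of the training matrix Z (n x xi, n = p*p)."""
    h, w = img.shape
    h, w = h - h % p, w - w % p          # crop so the image tiles evenly
    patches = (img[:h, :w]
               .reshape(h // p, p, w // p, p)
               .swapaxes(1, 2)
               .reshape(-1, p * p))
    return patches.T.astype(float)       # one column per patch
```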

Fig. 1

The original “Koala” image, the image features of different clusters, and the residual images

An atom of the learned dictionary can be used to represent several patches. When an atom is used to represent a patch, it may be invoked in the first (H1), the second (H2), or the third (H3) cluster (as indicated in Fig. 1). Thus, in Table 1 and Fig. 2, we summarize how many times a specific atom is invoked by each cluster of features. Most atoms are invoked by more than one cluster of features.

Fig. 2

Histogram of the results in Table 1

Table 1 Summary of how many times a specific atom is invoked by different clusters of features

To further illustrate this issue, we also selected one atom, the 12th atom of the learned dictionary, and collected the features that invoked it. The result is displayed in Fig. 3. The features that invoked the 12th atom belong to two different clusters. It can be noted that the features of the two clusters vary greatly. In other words, the 12th atom of the dictionary is employed to represent two different types of features. Obviously, one atom cannot provide accurate representations for both types of features. Hence, the atom has to compromise among these different features to achieve the global minimum representation error. As a result, the performance of the learned dictionary is influenced. The graphical representation of the atom is presented in Fig. 3, indicating that it contains part of the characteristics of the first cluster of features and part of the characteristics of the second cluster. This implies that the learned atom is a compromise between the two types of features.

Fig. 3

Part of the features that invoked the 12th atom, and the learned atom

To address this problem, we propose the CK-SVD algorithm for DL and the corresponding greedy algorithm for sparse recovery, which will be introduced in the following section.

2.2 CK-SVD for sparse representation of images

The proposed DL algorithm is also composed of two iterative processes. As shown in Fig. 4, for the sparse coding process, we divide the atoms into k groups, each of which serves a specific cluster of features.

Fig. 4

Sparse recovery using the proposed method

In other words, we divide the dictionary into k sub-dictionaries, expressed as \(\boldsymbol{D}=[\boldsymbol{D}_{1},\boldsymbol{D}_{2},\dots,\boldsymbol{D}_{k}]\). We propose a greedy algorithm to solve the sparse recovery problem, which is described in Algorithm 1. In the lth iterative cycle, the features of the lth cluster are considered, and therefore, we only search the atoms in the lth sub-dictionary. In other words, only the atoms of Dl have the opportunity to be selected in the lth iterative cycle. Among these atoms, the one that is most correlated with the residual obtained in the (l−1)th iteration is selected to represent the feature of the lth cluster. We compute the sparse coefficient for the current cluster based on the LS method:

$$ \begin{aligned} c=(\boldsymbol{b}^{*}\boldsymbol{b})^{-1}\boldsymbol{b}^{*}\boldsymbol{z},\end{aligned} $$
(4)

where b is the atom of D indexed by ω, and ω is obtained by steps 3 and 4 in Algorithm 1. Then, the residual of the objective sample is updated by:

$$ \begin{aligned} \boldsymbol{r}\leftarrow\boldsymbol{r}-c\boldsymbol{b}.\end{aligned} $$
(5)

The above process is executed until the maximum number of coefficients is reached or the residual is small enough.
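
A minimal sketch of this clustered selection follows (our reading of Algorithm 1, which is not reproduced in this text). Each coefficient is fitted to the current residual, a matching-pursuit-style reading of (4), and the residual threshold `eps` is an assumed parameter.

```python
import numpy as np

def ck_greedy(z, sub_dicts, eps=1e-6):
    """Clustered greedy recovery sketch: sub_dicts = [D_1, ..., D_k]; in the
    l-th cycle only the atoms of D_l compete for selection."""
    r = z.astype(float).copy()
    s = np.zeros(sum(Dl.shape[1] for Dl in sub_dicts))   # full coefficient vector
    offset = 0
    for Dl in sub_dicts:
        # atom of the l-th sub-dictionary most correlated with the residual
        i = int(np.argmax(np.abs(Dl.T @ r)))
        b = Dl[:, i]
        c = (b @ r) / (b @ b)        # LS coefficient, fitted to the residual (cf. (4))
        s[offset + i] = c
        r = r - c * b                # residual update as in (5)
        offset += Dl.shape[1]
        if np.linalg.norm(r) <= eps: # stop if the residual is small enough
            break
    return s, r
```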

Algorithm 1 is employed for computing the sparse coefficients of each training sample zt. Next, the dictionary updating process is executed. Different from the conventional K-SVD, for the proposed method, the sub-dictionaries \(\{\boldsymbol {D}_{l}\in \mathbb {R}^{n\times q_{l}}\}\) are initialized. A larger q leads to better performance of the dictionary but increases the complexity of the DL process and subsequent applications. For simplicity and without loss of generality, in this study, we assume the number of atoms in each sub-dictionary is the same. For an arbitrary atom di, we first find the training samples that have used di, and denote their indexes as γi. Then, we focus on the training samples indexed by γi, i.e., \(\boldsymbol {Z}_{\gamma _{i}}\), and compute the residual of these samples by excluding the atom di, which is expressed as:

$$ \begin{aligned} \boldsymbol{R}_{\gamma_{i}}=\boldsymbol{Z}_{\gamma_{i}}-\sum_{j\neq i} \boldsymbol{d}_{j} \tilde{\boldsymbol{s}}_{\gamma_{i}}^{j},\end{aligned} $$
(6)

where \(\boldsymbol {R}_{\gamma _{i}}\) denotes the mentioned residual and \(\tilde {\boldsymbol {s}}_{\gamma _{i}}^{j}\) represents the jth row of \(\boldsymbol {S}_{\gamma _{i}}\). In fact, when di is excluded, the other atoms used for \(\boldsymbol {Z}_{\gamma _{i}}\) do not belong to the group that contains di, and this property could be exploited to reduce the computational complexity. Next, we apply the singular value decomposition (SVD) to the residual \(\boldsymbol {R}_{\gamma _{i}}\), expressed as:

$$ \begin{aligned} \boldsymbol{R}_{\gamma_{i}}=\boldsymbol{U}\boldsymbol{\Delta}\boldsymbol{V}^{*}.\end{aligned} $$
(7)

We update the atom di to be the first column of U, denoted as u1, and update the corresponding sparse coefficient row \(\tilde {\boldsymbol {s}}_{\gamma _{i}}^{i}\) to be Δ1,1 multiplied by the first column of V, denoted as v1.
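
A minimal sketch of the update in (6) and (7) for a single atom di follows; the variable names are ours, and the complexity-reducing property mentioned above is not exploited here.

```python
import numpy as np

def update_atom(Z, D, S, i):
    """SVD-based update of atom i and its coefficient row, following (6)-(7)."""
    gamma = np.nonzero(S[i, :])[0]            # training samples that use atom i
    if gamma.size == 0:
        return D, S                           # unused atom: left unchanged here
    # residual of those samples with the contribution of atom i excluded
    R = Z[:, gamma] - D @ S[:, gamma] + np.outer(D[:, i], S[i, gamma])
    U, delta, Vt = np.linalg.svd(R, full_matrices=False)
    D[:, i] = U[:, 0]                         # new atom: first left singular vector u_1
    S[i, gamma] = delta[0] * Vt[0, :]         # new coefficient row: Delta_{1,1} * v_1
    return D, S
```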

3 Extensions of the CK-SVD

The proposed idea not only leads to the CK-SVD method but also builds a framework for DL. In other words, the CK-SVD can be further extended for better performance. Next, we introduce two practical extensions.

3.1 Sparsity-wise CK-SVD

For the standard CK-SVD, we fix the sparsity level, i.e., the number of sparse coefficients, for each training sample. However, this may lead to underfitting or overfitting of the sparse representation. To address this problem, we develop the sparsity-wise CK-SVD (SwCK-SVD), which employs multiple atoms to represent a cluster of features instead of a single atom. To obtain the SwCK-SVD from the CK-SVD, we set termination conditions that determine the number of atoms used for a training sample. The sparse recovery strategy is summarized in Algorithm 3. For each cluster, the sparse coding is realized via an iterative process. The parameter amax controls the maximum expected number of used atoms. In step 9, we determine whether more atoms are required by examining whether the residual is sufficiently correlated with the remaining atoms. This operation allows the sparsity to adapt to each training sample, aiming to achieve satisfactory representation accuracy using as few atoms as possible. The parameter ρ controls the threshold of the termination condition, and we empirically suggest ρ = 0.4–0.6.
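
Algorithm 3 is not reproduced here; purely as an illustration, one plausible form of the step-9 test (an assumption on our part, not the paper's exact criterion) is sketched below, relying on the atoms being unit-norm as required by (1).

```python
import numpy as np

def need_more_atoms(r, Dl_remaining, rho=0.5):
    """Plausible termination test: continue only if some remaining atom of the
    current sub-dictionary is still strongly correlated with the residual r."""
    if Dl_remaining.shape[1] == 0 or np.linalg.norm(r) == 0:
        return False
    corr = np.abs(Dl_remaining.T @ r) / (np.linalg.norm(r) + 1e-12)
    return np.max(corr) > rho            # rho in [0.4, 0.6] as suggested in the text
```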

3.2 Dynamic CK-SVD

Although the sparse coding strategies of the K-SVD, the CK-SVD, and the SwCK-SVD are different, their dictionary updating strategies are the same. They all use the first principal component of the SVD result to update the dictionary and sparse coefficients (see steps 9 and 10 in Algorithm 2), while ignoring the other components. Under the framework of CK-SVD, the first cluster contributes most to the representation, and later clusters contribute less. For instance, the second principal component of the SVD of a residual \(\boldsymbol {R}_{\gamma _{i}}\) in the first cluster, expressed as \(\boldsymbol {u}_{2}\boldsymbol {\Delta }\boldsymbol {v}^{*}_{2}\), may be more significant than the first principal component of the SVD of a residual in the second cluster. Based on this consideration, we extend the CK-SVD to the dynamic CK-SVD (DCK-SVD), for which the atoms of different clusters are refined after each iterative cycle. The dictionary updating strategy is provided in Algorithm 4. For each iterative cycle, we use the second component of the SVD result with respect to the most used atom in Dl to replace the least used atom in the next cluster's sub-dictionary. This operation makes the dictionaries dynamic: the atoms are refined after each iterative cycle, and those that contribute little to the representation are abandoned.
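
Algorithm 4 is likewise not reproduced here; the rough sketch below only illustrates our reading of the refinement step for one pair of adjacent clusters, and the usage-counting and replacement rule are assumptions.

```python
import numpy as np

def refine_next_cluster(D_next, S_next, R_residual):
    """Rough DCK-SVD-style refinement sketch: replace the least-used atom of the
    next cluster's sub-dictionary with the second left singular vector (u_2) of
    the residual associated with the most-used atom of the current cluster."""
    usage = np.count_nonzero(S_next, axis=1)     # how often each atom of D_next is used
    j_least = int(np.argmin(usage))
    U, delta, Vt = np.linalg.svd(R_residual, full_matrices=False)
    if delta.size > 1:                           # a second component must exist
        D_next[:, j_least] = U[:, 1]
    return D_next
```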

4 Results

In this section, we provide the experimental results and analysis. The Berkeley dataset is employed for the experiments [18]. The experiments are organized as follows. First, we conducted the DL process based on the standard CK-SVD and the conventional K-SVD, respectively, using the training dataset, in order to verify the improvement of the CK-SVD over the conventional K-SVD. Second, we applied different dictionaries to compressive sensing, which is a typical application in the field of image processing. Besides the K-SVD and the standard CK-SVD, we also employed the SwCK-SVD, the DCK-SVD, and two existing methods extended from the K-SVD, proposed in [28] and [29], respectively.

4.1 Experiments on sparse representation

The Berkeley dataset contains various images, which are divided into a training dataset and a test dataset. For this part of the experiments, we first used the training dataset to execute the DL process based on the proposed method and the conventional K-SVD method, respectively. We divided the training dataset into 19,194 patches with a size of 4×4, i.e., n=16, and set the initialized dictionaries to Gaussian random matrices. We considered different numbers of atoms, different maximum numbers of sparse coefficients, and different numbers of iterative cycles in this part of the experiments. The parameters are the same for both methods in each set of experiments. After the DL process, we used the obtained dictionaries for the sparse representation of the test dataset. The test dataset contains 100 test images with the size of 480×312. The images are divided into patches and represented based on the trained dictionaries. The accuracy is evaluated by the peak signal-to-noise ratio (PSNR), and the results are presented in Figs. 5, 6, and 7.
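
For reference, the PSNR used throughout could be computed as below; this is the standard definition assuming 8-bit images with peak value 255, as the paper does not state its exact convention.

```python
import numpy as np

def psnr(x, x_hat, peak=255.0):
    """Peak signal-to-noise ratio (dB) between an image and its reconstruction."""
    mse = np.mean((x.astype(float) - x_hat.astype(float)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```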

Fig. 5

PSNR of the sparse representations of test images versus the number of iterations, k=2

Fig. 6

PSNR of the sparse representations of test images versus the number of iterations, k=3

Fig. 7

PSNR of the sparse representations of test images versus the number of iterations, k=4

It can be noted that, with the same parameters, the dictionaries trained by the CK-SVD provide more accurate sparse representations of the test images. As the number of iterations increases, the performance of the dictionaries is improved. When the number of iterations exceeds 10, the increase becomes slow. Similarly, the accuracy of sparse representations benefits from a larger q0. However, the growth trend slows down continually, and a too large q0 would increase the computational complexity of sparse coding and dictionary updating. A larger number of sparse coefficients could also improve the performance of the dictionaries. However, it is not suggested to set a too large k, as it would reduce the sparsity of the images.

4.2 Applied to compressive sensing

Compressive sensing (CS) is a technique that compressively samples signals and reconstructs them from fewer measurements, in order to reduce the cost of signal transmission and storage [19–23]. The signal x can be compressively sampled by a sensing matrix Φ, expressed as:

$$ \begin{aligned} \boldsymbol{y}=\boldsymbol{\Phi}\boldsymbol{x}, \end{aligned} $$
(8)

where \(\boldsymbol {y} \in \mathbb {R}^{m}\) denotes the measurement vector and \(\boldsymbol {\Phi }\in \mathbb {R}^{m\times n}\) with m<n. When x denotes the vectorization of an image patch, its sparse representation in terms of the dictionary \(\boldsymbol {D}\in \mathbb {R}^{n\times q}\) can be written as:

$$ \begin{aligned} \boldsymbol{x}=\boldsymbol{D}\boldsymbol{s}, \end{aligned} $$
(9)

where \(\boldsymbol {s}\in \mathbb {R}^{q}\) is the sparse coefficient vector. The operation that recovers the sparse coefficients s from the measurements y, the sensing matrix Φ, and the dictionary D is referred to as reconstruction, which is expressed by:

$$ \begin{aligned} \boldsymbol{\hat{s}}=\mathop{\arg\min}_{\boldsymbol{s}}\, \|\boldsymbol{s}\|_{0}\; s.t. \, \boldsymbol{y}=\boldsymbol{\Phi}\boldsymbol{D}\boldsymbol{s}.\end{aligned} $$
(10)

This problem can be directly solved by greedy algorithms [17, 24–26], and then the original signal x can be obtained by (9).
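
A minimal sketch of the block-CS pipeline in (8)–(10) for a single vectorized patch follows; the Gaussian sensing matrix scaling, the seed, and the inline OMP loop are illustrative choices rather than specifications from the paper.

```python
import numpy as np

def cs_reconstruct_patch(x, D, m, k, seed=0):
    """Compressively sample one vectorized patch x (Eq. (8)) and reconstruct it by
    OMP on the effective dictionary Phi @ D (Eq. (10)), then map back via Eq. (9)."""
    rng = np.random.default_rng(seed)
    n = x.size
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian random sensing matrix
    y = Phi @ x                                      # measurements
    A = Phi @ D                                      # effective dictionary
    # OMP over A (same greedy loop as the sketch in Section 2.1)
    r, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(A.T @ r))))
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        r = y - A[:, support] @ coeffs
    s_hat = np.zeros(D.shape[1])
    s_hat[support] = coeffs
    return D @ s_hat                                 # reconstructed patch
```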

In this part of the experiments, we considered the application of the proposed methods to compressive sensing. We trained the dictionaries from the samples introduced in Section 4.1 using the compared methods. For the training process, we divided the training images into 36,000 patches with the sizes of 6×6 and 8×8. For the size of 6×6, i.e., n=36, the maximum number of coefficients for all methods was set to 6 and the total number of atoms was set to 324; for the size of 8×8, i.e., n=64, the maximum number of coefficients for all methods was set to 9 and the total number of atoms was set to 576. The number of atoms is the same for all compared methods. Hence, for the standard CK-SVD, we set the numbers of clusters to 6 and 9, respectively, and selected only one atom for each cluster. For the SwCK-SVD, we set the number of clusters to 3, and therefore, the maximum numbers of atoms used for each cluster were 2 and 3 for the two patch sizes. The maximum number of iterative cycles was set to 20 for all compared methods. We selected three typical images, “Elephant,” “Horse,” and “Penguin,” from the test dataset as original images and divided them into a set of patches, the size of which was the same as that of the training samples. Each patch was compressed by using Gaussian random matrices with different numbers of measurements, denoted as m. This technique is also referred to as block-CS [27]. The measurements were then reconstructed by using the OMP algorithm, and their accuracy was measured by the PSNR. To reduce the influence of stochastic factors brought by the Gaussian random sensing matrices, we repeated each trial 50 times. The average results are presented in Tables 2, 3, and 4.

Table 2 Comparison of PSNR (dB) of the reconstructed image “Elephant” with different numbers of measurements and different patch sizes
Table 3 Comparison of PSNR (dB) of the reconstructed image “Horse” with different numbers of measurements
Table 4 Comparison of PSNR (dB) of the reconstructed image “Penguin” with different numbers of measurements

The results demonstrate that the PSNR of the reconstructed images based on the dictionaries trained by the CK-SVD is much higher than that based on the dictionary trained by the conventional K-SVD, regardless of the original image, patch size, and number of measurements. The reason is that greedy CS reconstruction requires the number of measurements to be at least twice the number of sparse coefficients; otherwise, the primary sparse coefficients may not be completely reconstructed, and the reconstruction accuracy would be significantly degraded. On the other hand, the number of sparse coefficients required for accurate representation using the proposed method is much smaller than that using the conventional K-SVD, as the main features can be represented by only the coefficients and atoms of the first cluster. Taking the experiments with the patch size of 8×8 as an example, for the conventional K-SVD, the maximum number of sparse coefficients is 9. For the proposed methods, the number of clusters is 3, i.e., each cluster contains at most three coefficients. Then, when employed for CS reconstruction, the dictionaries trained by the conventional K-SVD may reconstruct the image accurately only when all nine sparse coefficients are found, whereas for the dictionaries trained by the proposed method, finding only the coefficients of the first cluster, i.e., three coefficients, can lead to a satisfactory reconstruction. Therefore, the proposed method has an advantage in CS reconstruction, especially at low sampling ratios. As extensions of the CK-SVD, both the SwCK-SVD and the DCK-SVD show improvements over the standard CK-SVD, benefiting from the adaptive sparsity and the dynamic refinement of atoms, respectively. The CK-SVD, the SwCK-SVD, and the DCK-SVD also outperform the methods proposed in [28] and [29]. Besides the PSNR comparison, we also provide a visual comparison of the results. The reconstructed images based on the patch size of 8×8 with m=14 and m=32 are presented in Figs. 8 and 9. It can be noted that the quality of the reconstructed images based on the proposed methods is obviously higher than that of those based on the conventional K-SVD and the methods proposed in [28] and [29].

Fig. 8

Reconstructed images by using different dictionaries with the patch size of 8×8, m=14

Fig. 9

Reconstructed images by using different dictionaries with the patch size of 8×8, m=32

Besides the Gaussian-initialized dictionary, we also used the discrete cosine transform (DCT) basis and a pre-learned dictionary as the initialized dictionary for the experiments. For the DCT case, we first generated an over-complete DCT basis containing the same number of atoms as the Gaussian-initialized dictionary. Then, we employed all the compared methods to conduct the DL processes. The learned dictionaries were utilized for CS reconstruction of the “Horse” image. Other experimental settings remained unchanged. The results are provided in Table 5.
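
For completeness, a common way of constructing such an over-complete DCT dictionary is sketched below (a 1-D over-complete DCT followed by a Kronecker product); the paper does not state which construction it used, so this is an assumed recipe.

```python
import numpy as np

def overcomplete_dct_dictionary(n, q):
    """Over-complete 2-D DCT dictionary sketch: build a 1-D over-complete DCT of
    size sqrt(n) x sqrt(q), then take its Kronecker product with itself.
    Assumes n and q are perfect squares (e.g., n=64, q=256 gives a 64x256 matrix)."""
    pn, pq = int(np.sqrt(n)), int(np.sqrt(q))
    D1 = np.zeros((pn, pq))
    for j in range(pq):
        v = np.cos(np.arange(pn) * j * np.pi / pq)
        if j > 0:
            v -= v.mean()                 # remove the DC part of non-constant atoms
        D1[:, j] = v / np.linalg.norm(v)  # unit-norm columns
    return np.kron(D1, D1)
```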

Table 5 Comparison of PSNR (dB) of reconstructed image “Horse” using the DCT basis for initialization

For the experiment with the pre-learned dictionary, we first chose 10,000 patches for pre-training using the conventional K-SVD. Then, the pre-learned dictionary was trained again with another 10,000 patches using all compared methods. The initialized dictionaries were the over-complete DCT basis. Other experimental settings were the same as those used in the previous CS reconstruction experiments. The PSNR of the reconstructed “Horse” image using different dictionaries (including the pre-learned dictionaries) is summarized in Table 6.

Table 6 Comparison of PSNR (dB) of reconstructed image “Horse” using the pre-learned dictionaries for initialization

The results in Table 5 demonstrate that the over-complete DCT basis can also be utilized as the initialized dictionary for the proposed DL methods. Compared to the Gaussian random-initialized dictionary, the DCT-initialized dictionary provides better accuracy, regardless of the DL method. In Table 6, it can be noted that the pre-learned dictionaries can still be trained with new samples, with or without using the online DL methods, and the performance is obviously improved after the new training process. Similarly, the proposed methods outperform the conventional methods.

4.3 Applied to image denoising

Besides the CS experiments, we also performed experiments applying the proposed method to image denoising. Two images were selected for this part of the experiments. The first is the “Koala” image from the Berkeley image dataset with the size of 320×480, which has been introduced in Section 2.1. The other is the standard test image “Pepper” with the size of 512×512. The experimental setup is described as follows.

  • We first added Gaussian white noise with zero mean and a variance of σ to the original images.

  • We selected 20,000 patches from the noisy “Koala” image and 40,000 patches from the noisy “Pepper” image, respectively. The size of all patches was 8×8.

  • We used the vectorized patches to conduct the DL process for the two test images.

  • We employed the learned dictionaries for denoising by using the strategy given in [32].

We initialized all dictionaries as the over-complete DCT basis with the size of 64×256 and set the maximum number of sparse coefficients for each training sample to 6. The maximum number of iterative cycles was set to 10. We set the noise level to σ=15 and σ=20. For each noise level, we repeated the above process. A more detailed description of the denoising experiments can be found in [32]. The conventional K-SVD, the methods introduced in [28] and [29], and the proposed CK-SVD were employed for comparison. The results are presented in Figs. 10 and 11. It can be noted from the results that the dictionaries trained by the proposed CK-SVD provide the most accurate denoised images, regardless of the original images and the noise levels.
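
A highly simplified sketch of such patch-based denoising is given below: it only sparse-codes overlapping patches of the noisy image over the learned dictionary and averages the overlapping reconstructions, omitting the error-threshold stopping rule and the weighting with the noisy image used in the full strategy of [32]; the patch step and parameter names are assumptions.

```python
import numpy as np

def denoise_image(noisy, D, k, p=8, step=4):
    """Simplified patch-based denoising sketch: OMP-code overlapping p x p patches
    over dictionary D and average the overlapping reconstructions."""
    h, w = noisy.shape
    acc = np.zeros((h, w), dtype=float)
    cnt = np.zeros((h, w), dtype=float)
    for r0 in range(0, h - p + 1, step):
        for c0 in range(0, w - p + 1, step):
            z = noisy[r0:r0 + p, c0:c0 + p].reshape(-1).astype(float)
            # OMP with at most k atoms (same greedy loop as the earlier sketches)
            res, support = z.copy(), []
            for _ in range(k):
                support.append(int(np.argmax(np.abs(D.T @ res))))
                coeffs, *_ = np.linalg.lstsq(D[:, support], z, rcond=None)
                res = z - D[:, support] @ coeffs
            patch_hat = (D[:, support] @ coeffs).reshape(p, p)
            acc[r0:r0 + p, c0:c0 + p] += patch_hat
            cnt[r0:r0 + p, c0:c0 + p] += 1.0
    cnt[cnt == 0] = 1.0
    return acc / cnt
```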

Fig. 10

Denoising results with the noise level of σ=15

Fig. 11

Denoising results with the noise level of σ=20

5 Discussion

As mentioned in Section 3, this study not only develops a K-SVD-based method but also provides a clustering DL model. The potential and advantage of the clustering model mainly come from two aspects. First, the sub-dictionaries of different clusters are isolated from each other. Thus, an atom of the learned dictionary can concentrate on a specific type of feature, leading to greater utilization of the atoms. In other words, a common phenomenon in the conventional DL model can be avoided, namely that some atoms are widely employed by the training samples whereas others are seldom used. Second, the clustering DL model makes it possible to adjust the sparsity based on different training samples and therefore to reduce the underfitting or overfitting of the sparse representation. We provide the SwCK-SVD by adaptively selecting the number of atoms used for each cluster. We believe the adaptive strategy can also be implemented by adjusting the number of clusters. This potential is supported by the fact that the SwCK-SVD performs noticeably better than the standard CK-SVD.

Future work could consider extending the clustering DL model to online learning. In this study, we focus on batch DL, and the dictionary updating strategy is based on the SVD. We believe the proposed clustering DL model is not limited to batch DL and can be extended to the online DL problem. In [7], the standard ODL method is proposed, for which information-storing variables are updated when a new training sample is input. These variables are then used to update the learned dictionary through an optimization approach. For clustering DL, we may maintain a set of information-storing variables for the sub-dictionaries of different clusters. When a new sample is input, we could employ Algorithm 1 or Algorithm 3 to compute the sparse coefficients with respect to the sub-dictionaries of different clusters. Then, the sparse coefficients are used to update the information-storing variables of the corresponding clusters. Finally, we could update the sub-dictionaries based on the information-storing variables, such that clustering ODL is achieved. We believe clustering ODL has the potential to be applied in cases where the training samples cannot be obtained simultaneously.

6 Conclusions

We proposed a DL method named CK-SVD for sparse representation of images. For the CK-SVD, the atoms of the dictionary are divided into a set of groups, and each group of atoms serves the image features of a specific cluster. Hence, the features of all clusters can be utilized and redundant atoms are avoided. Based on this strategy, we introduced the CK-SVD and two practical extensions. Experimental results demonstrated that the proposed methods could provide more accurate sparse representation of images, compared to the conventional K-SVD algorithm and its extended methods.

Availability of data and materials

Please contact the authors for data requests.

Abbreviations

CK-SVD:

Clustering K-singular value decomposition

CS:

Compressive sensing

DCK-SVD:

Dynamic CK-SVD

DL:

Dictionary learning

K-SVD:

K-singular value decomposition

LS:

Least squares

MOD:

Method of optimal directions

OMP:

Orthogonal matching pursuit

PSNR:

Peak signal-to-noise ratio

SwCK-SVD:

Sparsity-wise CK-SVD

References

  1. R. Rubinstein, A. M. Bruckstein, M. Elad, Dictionaries for sparse representation modeling. Proc. IEEE 98(6), 1045–1057 (2010).

  2. X. Lu, D. Wang, W. Shi, D. Deng, Group-based single image super-resolution with online dictionary learning. EURASIP J. Adv. Signal Process. 2016(84), 1–12 (2016).

  3. V. Naumova, K. Schnass, Fast dictionary learning from incomplete data. EURASIP J. Adv. Signal Process. 2018(12), 1–21 (2018).

  4. L. Zhang, W. Zuo, D. Zhang, LSDT: latent sparse domain transfer learning for visual adaptation. IEEE Trans. Image Process. 25(3), 1177–1191 (2016).

  5. K. Engan, S. O. Aase, J. H. Husøy, Multi-frame compression: theory and design. EURASIP Signal Process. 90(2), 2121–2140 (2000).

  6. M. Aharon, M. Elad, A. Bruckstein, The K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. 54(11), 4311–4322 (2006).

  7. J. Mairal, F. Bach, J. Ponce, G. Sapiro, Online learning for matrix factorization and sparse coding. J. Mach. Learn. Res. 11, 19–60 (2010).

  8. B. Dumitrescu, P. Irofti, Regularized K-SVD. IEEE Signal Process. Lett. 24(3), 309–313 (2017).

  9. M. Nazzal, F. Yeganli, H. Ozkaramanli, A strategy for residual component-based multiple structured dictionary learning. IEEE Signal Process. Lett. 22(11), 2059–2063 (2015).

  10. J. K. Pant, S. Krishnan, Compressive sensing of electrocardiogram signals by promoting sparsity on the second-order difference and by using dictionary learning. IEEE Trans. Biomed. Circuits Syst. 8(2), 293–302 (2014).

  11. L. N. Smith, M. Elad, Improving dictionary learning: multiple dictionary updates and coefficient reuse. IEEE Signal Process. Lett. 20(1), 79–82 (2013).

  12. R. Zhao, Q. Wang, Y. Shen, J. Li, Multidimensional dictionary learning algorithm for compressive sensing-based hyperspectral imaging. J. Electron. Imaging 25(6), 063013 (2016).

  13. K. Skretting, K. Engan, Recursive least squares dictionary learning algorithm. IEEE Trans. Signal Process. 58(4), 2121–2130 (2010).

  14. J. A. Tropp, Greed is good: algorithmic results for sparse approximation. IEEE Trans. Inf. Theory 50(10), 2231–2242 (2004).

  15. E. J. Candès, J. Romberg, T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006).

  16. E. J. Candès, T. Tao, Decoding by linear programming. IEEE Trans. Inf. Theory 51(12), 4203–4215 (2005).

  17. J. A. Tropp, A. C. Gilbert, Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans. Inf. Theory 53(12), 4655–4666 (2007).

  18. D. Martin, C. Fowlkes, D. Tal, J. Malik, in Proc. IEEE Int. Conf. Comput. Vis. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics (IEEE, Vancouver, 2001), pp. 416–423.

  19. D. L. Donoho, Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006).

  20. E. J. Candès, Compressive sampling. Int. Congress of Mathematicians, Madrid, Spain 3, 1433–1452 (2006).

  21. A. Massa, P. Rocca, G. Oliveri, Compressive sensing in electromagnetics - a review. IEEE Antennas Propag. Mag. 57(1), 224–238 (2015).

  22. D. Craven, B. McGinley, L. Kilmartin, M. Glavin, E. Jones, Compressed sensing for bioelectric signals: a review. IEEE J. Biomed. Health Inf. 19(2), 539–540 (2015).

  23. Y. Zhang, L. Y. Zhang, et al., A review of compressive sensing in information security field. IEEE Access 4, 2507–2519 (2016).

  24. D. Nion, N. D. Sidiropoulos, Tensor algebra and multidimensional harmonic retrieval in signal processing for MIMO radar. IEEE Trans. Signal Process. 58(11), 5693–5705 (2010).

  25. W. Dai, O. Milenkovic, Subspace pursuit for compressive sensing signal reconstruction. IEEE Trans. Inf. Theory 55(5), 2230–2249 (2009).

  26. D. L. Donoho, Y. Tsaig, I. Drori, J. L. Starck, Sparse solution of underdetermined systems of linear equations by stagewise orthogonal matching pursuit. IEEE Trans. Inf. Theory 58(2), 1094–1121 (2012).

  27. L. Gan, in Proc. IEEE Int. Conf. Digit. Signal Process. Block compressed sensing of natural images (IEEE, Wales, 2007), pp. 403–406.

  28. R. Rubinstein, M. Zibulevsky, M. Elad, Efficient implementation of the K-SVD algorithm using batch orthogonal matching pursuit. Technical Report CS-2008-08 (Technion University, Haifa, 2008).

  29. J. Mairal, G. Sapiro, M. Elad, Learning multi-scale sparse representations for image restoration. Multiscale Model. Simul. 7(1), 214–241 (2008).

  30. Y. Yi, Y. Cheng, C. Xu, Visual tracking based on hierarchical framework and sparse representation. Multimed. Tools Appl. 77(13), 16267–16289 (2018).

  31. S. Tariyal, A. Majumdar, R. Singh, M. Vatsa, Deep dictionary learning. IEEE Access 4, 10096–10109 (2016).

  32. M. Elad, M. Aharon, Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process. 15(12), 3736–3745 (2006).


Acknowledgements

The authors would like to thank the National Key R&D Program of China and the National Natural Science Foundation of China for the financial support.

Funding

This work was supported by the National Key R&D Program of China under Grant 2017YFD0700302 and by the National Natural Science Foundation of China under Grant 51705193.

Author information


Contributions

JF provided the methodology. RZ wrote the original manuscript. HY reviewed and edited the manuscript. JF and LR funded this study. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Rongqiang Zhao.

Ethics declarations

Consent for publication

This manuscript does not contain any individual person’s data in any form.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Cite this article

Fu, J., Yuan, H., Zhao, R. et al. Clustering K-SVD for sparse representation of images. EURASIP J. Adv. Signal Process. 2019, 47 (2019). https://doi.org/10.1186/s13634-019-0650-4

