A wavelet denoising approach based on unsupervised learning model

Image denoising plays an important role in image processing, which aims to separate clean images from noisy images. A number of methods have been presented to deal with this practical problem over the past several years. The best currently available wavelet-based denoising methods take advantage of the merits of the wavelet transform. Most of these methods, however, still have difficulties in defining the threshold parameter which can limit their capability. In this paper, we propose a novel wavelet denoising approach based on unsupervised learning model. The approach taken aims at exploiting the merits of the wavelet transform: sparsity, multi-resolution structure, and similarity with the human visual system, to adapt an unsupervised dictionary learning algorithm for creating a dictionary devoted to noise reduction. Using the K-Singular Value Decomposition (K-SVD) algorithm, we obtain an adaptive dictionary by learning over the wavelet decomposition of the noisy image. Experimental results on benchmark test images show that our proposed method achieves very competitive denoising performance and outperforms state-of-the-art denoising methods, especially in the peak signal to noise ratio (PSNR), the structural similarity (SSIM) index, and visual effects with different noise levels.


Introduction
Image denoising as a low-level image processing operator is an important front-end procedure for high-level visual tasks such as object recognition, digital entertainment, and remote sensing imaging. Noise is a random variation of image intensity and appears as grains in the image. It may arise in the image as effects of basic physics-like photon nature of light or thermal energy of heat inside the image sensors. Noise means that the pixels in the image show different intensity values instead of true pixel values. Digital images may be contaminated during acquisition, transmission, and compression [1], by diverse types of noise [2], generated by different causes, such as signal instabilities, defective sensors, physical deterioration of the materiel due to aging, poor lighting conditions, errors in the transmission due to channel noise, or interference caused by electromagnetic fields. Noise suppression is of great interest in digital image processing, considering that the quality improvement of corrupted images is of essential importance for the majority (2020) 2020: 36 Page 2 of 26 of image processing areas, including analysis of images, detection of edges, and pattern recognition. Mathematically, the problem of image denoising can be modeled as follows: where y is the observed noisy image, x is the original image, and b represents additive zero-mean white and homogeneous Gaussian noise with standard deviation σ . The goal of image denoising is to recover the original image from its noisy observation where the main challenge is to remove noise while retaining as much as possible the important signal features and improving the PSNR as a common metric used to assess the performance of the denoising methods, the higher the value of PSNR, the more accurate is the denoising.
Owing to solve the clean image x from the (1) is an ill-posed problem, we cannot get the unique solution from the image model with noise. To obtain a good estimation imagẽ x, image denoising has been well-studied in the field of image processing over the past several years. Generally, image denoising methods can be classified as [3] spatial domain methods, transform domain methods, which are introduced in more detail in [4].
Wavelet transform (WT) [5] has proved to be effective in noise removal. It decomposes the input signal into multiple scales, which represent different space-frequency components of the original signal. At each scale, some operations, such as thresholding [6][7][8][9][10][11] and statistical modeling [12][13][14], can be performed to suppress noise. Denoising is accomplished by transforming back the processed wavelet coefficients into spatial domain. These methods known as wavelet-based denoising techniques can be viewed also as fixed basis dictionaries [15][16][17][18][19][20][21][22] to whole images. These wavelet-based methods have demonstrated its efficiency in denoising and have achieved state-of-the-art PSNR performances. However, in the denoising process, these methods use a thresholding technique, by using one of the most popular thresholding functions: the soft-thresholding function and the hard-thresholding function. Consequently, a small threshold retains the noisy wavelet coefficients, and hence, the resultant images may still be noisy whereas a large threshold makes a greater number of wavelet coefficients to zero, which leads to smooth image and image processing may cause blur and artifacts. Therefore, in some of the previous wavelet-based denoising methods based on the thresholding technique, even though the quantitative results are promising, the artifacts in the denoised images are quite noticeable.
In this paper, we address these issues by proposing a new wavelet denoising approach based on unsupervised learning model. In the proposed method, the approximation and wavelet coefficients are denoised by using an adaptive dictionary learned over the set of extracted patches from the wavelet representation of the corrupted image, instead of using the thresholding operator. On the other hand, sparse and redundant representation model is applied to image denoising and has drawn a lot of researchers' attention. In [23], the K-SVD algorithm was proposed for learning an adaptive dictionary for sparse representation of gray-scale image patches. Inspired by the idea of prior-learning on the corrupted image, the K-SVD algorithm was used to remove white Gaussian noise [24,25]. This patch-based sparse representation algorithm [26] has shown to outperform in both providing the sparse representation and capability of denoising. Motivated by these ideas and the merits of the wavelet transform, sparsity, multi-resolution structure, and similarity with the human visual system, in this paper, we propose a new image denoising method that takes advantages of the wavelet transform and sparse coding framework. In our proposed scheme, the prior-learning on the corrupted image is transferred to the wavelet coefficients of it. And the adaptive dictionary has been learned in wavelet domain. It means image denoising in the framework of sparse representations in wavelet domain. Moreover, the proposed model only uses the wavelet decomposition of the noisy image as a training set in the dictionary learning phase, and it does not involve the target output. The model has to learn by its own through determining and adapting according to the structural characteristics in the input patterns. As a result, the proposed work is able to produce a wavelet denoising approach based on unsupervised learning model, by using the K-SVD as a dictionary learning algorithm, and without making any a priori assumption about the data except a parsimony principle.
Experimental results show that the proposed method provides impressive denoising results compared with the wavelet thresholding approach for image denoising, the K-SVD denoising method, and the total variation (TV) denoising method based on the split Bregman technique [27], one of the best denoising models. It also found in the experiments that the proposed method has better performance than the block-matching and 3D filtering (BM3D) method [28] that is often regarded as a state-of-the-art denoising algorithm.
The rest of the paper is organized as follows. Section 2 briefly reviews the wavelet transform. It presents the advantages of wavelet theory and introduces related work. It exposes then and explains the proposed approach. Section 3 addresses the experimental protocol and discusses the obtained results. Finally, we will conclude the paper in Section 4.

Wavelet
A wavelet is a function that oscillates like a wave but is quickly attenuated. A wavelet is a function ψ of L 2 (R) who verifies the following admissibility condition:

Discrete wavelet transform
The discrete wavelet transform is based on the concept of multi-resolution analysis (MRA) introduced by Mallat [29]. The discrete wavelet transform (DWT) of image signals produces a non-redundant image representation, which provides better spatial and spectral localization of image formation, compared with other multi-scale representations such as Gaussian and Laplacian pyramid. Recently, the discrete wavelet transform has attracted more and more interest in image denoising. The DWT can be interpreted as signal decomposition in a set of independent, spatially oriented frequency channels. The signal S is passed through two complementary filters and produces two signals: approximation and details. This is called decomposition or analysis. The components can be assembled back into the original signal without loss of information. This process is called reconstruction or synthesis. The mathematical manipulation, which implies analysis and synthesis, is called a discrete wavelet transform and inverse discrete wavelet transform [30]. 1D discrete wavelet transform Every analog signal x(t) with finite energy can be decomposed into a sum of shifted and dilated wavelet functions ψ(t) and shifted scale functions φ(t): where c(k) are scale coefficients and d(j, k) wavelet coefficients. This is a dyadic variant of the DWT. Scale and wavelet coefficients are calculated using scalar products: Hence, filter banks with perfect reconstruction property can be used as a simple realization of the DWT using low-pass and high-pass filters associated, respectively, to the scale function, and the wavelet function [5].
2D discrete wavelet transform Separable 2D discrete wavelet transform is the simplest form of the two-dimensional wavelet generalization. It consists of a standard 1D DWT applied to each row and then to each column as shown in Fig. 1.
In Fig. 1, if an image has N 1 rows and N 2 columns, decomposition results in four quarter-size images (N 1 /2 × N 2 /2): details (LH, HL, HH) and approximation LL. Approximation LL is product of two low-pass filters and provides for an input to the next decomposition level. The reconstruction is performed in the opposite way: first on columns, then on rows. Separable 2D DWT has three wavelet functions (m and n are coordinates of the input image): and one scale function: associated to the approximation LL [31]. An N level decomposition can be performed resulting in 3N + 1 different frequency bands: LL is low frequency or approximation coefficients, and the wavelet image coefficients LH, HL, and HH which correspond, respectively, to vertical high frequencies (horizontal edges), horizontal high frequencies (vertical edges), and high frequencies in both directions (corners), as shown in Fig. 2. In Fig. 2, the number written next to the sub-band name shows the level. The next level of wavelet transform is applied to the low-frequency sub-band image LL only. For more details about the 2D discrete wavelet transform and 2D inverse discrete wavelet transform, the reader can refer to [5] and [32][33][34].

Advantages of wavelet theory
One of the main advantages of wavelets is that they allow complex information such as images to be decomposed into elementary forms at different positions and scales and subsequently reconstructed with high precision [5].
The second main advantage of wavelets is that using fast wavelet transform based on filter banks [5], it is computationally efficient.
Wavelet transform provides sparse representation for a large class of signals [5], and it is capable of revealing aspects of data that other signal analysis techniques miss the aspects like trends, breakdown points, and discontinuities in higher derivatives and self-similarity [35].
Wavelets have the great advantage of being able to capture the energy of a signal in few energy transform values, it does not change the number of pixels required to represent the image and separate the information in a way that resembles the human visual system [36].

The K-SVD denoising method
Sparse and redundant representations model of signal Given a signal y ∈ R n and a over- The sparse and redundant representations model of signal is equivalent to the following problem: here α 0 is called l 0 norm. It defines the sparsity measurement of the vector α. The problem (P 0 ) can be solved by the orthogonal matching pursuit (OMP) algorithm [37] because of its simplicity and efficiency.

Sparse and redundant representations model for image denoising
We address the classical image denoising problem: a clear image x is corrupted by an additive zero-mean white and homogeneous Gaussian noise z, with standard deviation σ and z 2 ≤ ε, and the observed noisy image y is generated. Hence, Assume x ∈ R n has sparse representation over redundant dictionary, modifying (P 0 ), we get the denoising model as follows: Here δ = δ(ε). And we get the denoised imagex = Dα [38,39].

The K-SVD image denoising method
The key point for solving problem (P 0,δ ) is to find a suitable redundant dictionary, for this reason, the K-SVD algorithm [23] is proposed. The basic idea is when the training signal and the initial dictionary are given, then the prior-learning idea is used. It learns a dictionary that yields sparse representations for the training signal. The algorithm alternates a sparse coding step based on the OMP method and a dictionary updating step based on a simple singular value decomposition (SVD). The reader can refer to [23,24] for some details.

Wavelet-based image denoising method
Wavelet-based methods have proved to be effective in noise removal. These methods are mainly based on thresholding the discrete wavelet transform coefficients, which have been affected by additive white Gaussian noise [6]. As shown in Fig. 3, the basic denoising algorithms that use DWT consist of three steps: • The discrete wavelet transform is adopted to decompose the noisy image and get the wavelet coefficients. • The wavelet coefficients are denoised by using the wavelet thresholding technique.
• The inverse discrete wavelet transform is applied to the modified coefficients to get the denoised image.
The goal is to estimate the denoised image f ij from noisy observation g ij . It is convenient to label the sub-bands of the transform as in Fig. 2. The sub-bands HH k , HL k , and LH k are called the details, where k is the level ranging from 1 to J, where J is the largest level. The sub-band LL J is the low resolution residual. The wavelet transform is applied to (13) to obtain the wavelet coefficients c ij . Then, the wavelet thresholding technique is used to filter each wavelet coefficient from the detail sub-bands with a threshold function to obtain the modified coefficients. Finally, the inverse wavelet transform is applied to the modified coefficients to get the denoised image. There are two thresholding methods frequently used. The soft-thresholding function (also called the shrinkage function): takes the argument and shrinks it towards zero by the threshold T. The other popular alternative is the hard-thresholding function: which keeps the input if it is larger than the threshold; otherwise, it is set to zero. The wavelet thresholding procedure removes noise by thresholding only the wavelet coefficients of the detail sub-bands, while keeping the low-resolution coefficients unaltered [7,30]. Donoho et al. [8] have discussed a simple, but influential wavelet-based denoising pattern known as VisuShrink. The outcomes of VisuShrink are stable along with an alluring visual feature. VisuShrink uses the universal threshold, T, which is proportional to the standard deviation of the noise, and it is defined as follows [9]: where σ 2 is the noise variance, defined as follows: where c ij ∈ HH 1 sub-band thresholding. S: Number of pixel for the test image.

Total variation denoising method
TV-based regularization is the most influential research in the field of image denoising. TV regularization is based on the statistical fact that natural images are locally smooth, and the pixel intensity gradually varies in most regions. It has achieved great success in image denoising because it cannot only effectively calculate the optimal solution but also retain sharp edges. The l 1 total variation functional was first introduced by Rudin, Osher, and Fatemi (ROF) [40] to address image-denoising problems. It is formulated as: wheref is the reconstructed denoised image, f is the noisy image, . 1 is the l 1 -norm, . 2 is the l 2 -norm, and σ is an error tolerance included to account for noisy data. The choice of technique for solving l 1 regularization-based problems is crucial, as l 1 is nonlinear; therefore, the computational burden can increase significantly using classic gradient-based methods. The recently published Split Bregman (SB) method [27,41] is a simple and efficient algorithm for solving l 1 regularization-based problems that makes it possible to split the minimization of l 1 and l 2 functionals. By applying the SB method to image denoising proposed in [27], the authors showed that the TV-denoising problem solved using SB formulation was computationally efficient, given that the SB formulation leads to a problem that can be solved using Gauss-Seidel and Fourier transform methods.

BM3D denoising method
The BM3D method [28] is a famous denoising method. It has become a baseline algorithm to test the performance of denoising algorithms. BM3D includes the following steps: the first step is block matching: for each image block located at j, the similar image blocks with size √ n × √ n are collected in groups with member number I j . Image blocks in each group are stacked together to form a √ n × √ n × I j 3D data array. Second, the sparsity (2020) 2020:36 Page 9 of 26 regularization step, the 3D arrays are decorrelated by using an invertible sparse 3D transform such as discrete cosine transform (DCT) [42] and then are filtered by thresholding or Wiener filtering. Finally, the restoration is obtained by aggregating all the estimated image patches.

Proposed denoising method
Suppose the input noisy image y is from a clean image x contaminated by additive zeromean white and homogenous Gaussian noise z, with standard deviation σ . The measured image y is, thus We desire to design an algorithm that can remove the noise from y, getting a denoised imagex as close as possible to the original image x. As shown in Fig. 4, the proposed denoising method consists of three steps: • In the first step, discrete wavelet transform is applied to the noisy image y, to get the approximation and wavelet coefficients. • In the second step, approximation and wavelet coefficients are denoised by using an adaptive dictionary learned on the set of extracted patches from wavelet representation of the noisy image, by the K-SVD as a dictionary learning algorithm.
• In the third step, inverse discrete wavelet transform is applied for reconstructing the image, which results in the denoised imagex.
The detailed steps of the proposed method can be given bellow: In (19), we assume clear image x of size √ N × √ N pixels and then use 2D discrete wavelet transform on the noisy image y. According to (19) and the property of wavelet transform, we have: where Wy, Wx, and Wz are the wavelet transform of y, x, and z, respectively. Just like the K-SVD algorithm, learning dictionary on small image patches, note each small patch Wx ij = R ij Wx of size √ n × √ n in every location (i, j) of Wx. The matrix R ij is an n × N matrix that extracts the (ij) block size of n×1 pixels from the image Wx. is given, according to the sparse prior of wavelet coefficients, every patch Wx ij has a sparse representation with bounded error, we get: The Lagrange form of it is: By maximum a posteriori estimation, denoising of Wy is equivalent to the energy minimization problem: In this expression, the first term is the log-likelihood between Wx and Wy, and λ depends on Wx − Wy 2 2 ≤ Cst × σ 2 . The second and the third terms are the sparse prior of wavelet coefficients.
When D is known, we can solve (23) by two steps. First, let Wx = Wy, thus (23) is equivalent to solve ( √ N − √ n + 1) 2 problems of (22) which can be solved by OMP. This step is called sparse coding. Second, when gettingα ij for all (ij) locations, fix them and turn to update Wx. Returning to (23), we need to solve: This is a simple quadratic term that has a closed-form solution of the form: This expression says the averaging of the denoised patches and is called patches averaging step. Given the updated, we can repeat the sparse coding stage, working on the already denoised patches. Once this is done, a new averaging should be calculated, and so on, and so forth. Finally, the satisfyingα ij and Wx are obtained. In practice, the dictionary D is unknown and we can get it by learning which is same to [23]. After fusion dictionary learning in (23), we get the denoising model as follows: In this work, the K-SVD algorithm is used to learn and update the initial redundant DCT dictionary, and Wy ij is the training set. As so far, we can solve (26) as follows: (a) Given the initial dictionary D and let Wx = Wy, then computeα ij by (21); (b) update initial dictionary D toD using K-SVD; and (c) compute Wx by (25). Finally, letD and Wx are the initial D and Wx, then repeat the above process until getting the denoised image Wx.
As so far, we have finished the denoising process in the wavelet domain, the 2D inverse discrete wavelet transform is used to get the restoration imagex. The following sequence defines our proposed algorithm: Algorithm 1 Wavelet denoising approach based on unsupervised learning model Step 1 Choose a suitable wavelet basis, use a 2D discrete wavelet transform on the noisy image y and get Wy.
Step 2 Do the following operations for Wy.

2.1
Choose the parameters: λ-Lagrange multiplier, C-noise gain, knumber of atoms, n-size of patches and J-number of training iterations. Initialization: Set Wx = Wy, D =redundant DCT dictionary and p = 0 as the counter of loop.

2.2
Use OMP to compute coefficientsα ij ∈ R k×1 for each patch by solving (21).

Set D =D in
Step 3 Use 2D inverse discrete wavelet transform on Wx, to get the denoised imagex.

Results and discussion
In this section, we aim to demonstrate the advantages and the performance that our proposed wavelet denoising approach based on unsupervised learning model has.
Our proposed algorithm is evaluated and compared with four representative and stateof-the-art denoising algorithms: the wavelet thresholding approach for image denoising, the sparse representation-based K-SVD denoising method in the image domain, the TV denoising method, and the BM3D denoising method.
In our experiments, we choose five well-known images as test images, including "Barbara, " "House, " "Flinstones, " "Bridge, " and "Fingerprint. " Each image is contaminated by adding zero-mean white Gaussian noise with various deviations. These following parameters are used in our proposed method: for the choice of the best wavelet basis, we test several wavelets under each noise level for all images and take the results with the highest PSNR for comparison, as a result, for the wavelet transform and inverse wavelet transform, we choose the Coiflet wavelet transform (MATLAB 'coif ') for the two images: "House, " "Fingerprint, " and Symlet wavelet transform (MATLAB 'sym') for the rest of images, the size of patches n = 64, the number of dictionary elements k = 256, the Lagrange multiplier λ = 30/σ , and the noise gain C = 1.15. Since we evaluate denoised images with the measures of the PSNR and SSIM index, the formulas of the PSNR and SSIM are also given. The PSNR is estimated based on the mean square error (MSE) as: where I 1 and I 2 denotes two images of size N 1 × N 2 , I 1 (i, j) represents the pixel value on the ith row and jth column of I 1 . The SSIM index is defined as: where μ I 1 and σ I 1 denote the average gray values and the variance for I 1 , σ I 1 I 2 represents the covariance between the two images I 1 and I 2 , and the symbols C 1 and C 2 are two small constants [43]. The higher the value of PSNR or SSIM, the closer is the denoised image I 2 to the original image I 1 , hence the more accurate is the denoising. The comparison of quantitative results is shown in Tables 1 and 2. A visual comparison is shown in Figs. 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, and 16. Let's first see the PSNR and SSIM results by different methods on the five test images. The statistical results of the five methods are given in the PSNR measure in Table 1.
The results show that the proposed method outperforms the three denoising methods: wavelet thresholding, TV, and K-SVD, in all cases. Compared to the BM3D, for the two images: "Barbara" and "House", the proposed method has the lowest PSNR value under some noise level variations, but for the rest of images, the proposed method has the higher PSNR measures than BM3D. The statistical results of the five methods are also recorded in the SSIM measure as shown in Table 2. Based on these results, we can deduce that our proposed method is always better than the other denoising methods, which assures the efficiency of our algorithm. Let us then focus on the visual quality evaluation of these denoising algorithms. We take two example images: "Barbara" and "House", to show that although BM3D has higher PSNR measures than the proposed method under some noise level variations for these two images, the proposed method has better visual quality than the BM3D method. To show the performance under different levels of noise, for the two images: "Barbara" and "House", we show the results under low-level noise with standard deviation σ = 10 and noise levels, the results show that the edges are well preserved, the textures and more details are better restored, and least artifacts exist in the result of our proposed method. The performance of the wavelet thresholding approach, the K-SVD, and the TV denoising method is the worst in various noise levels; it is easy to see that the restored images contain too many artifacts, the image edges and flat areas are blurred, and a number of details and textures are lost. It also found in the experiments that the proposed method has much better visual quality than the BM3D method that produces too many artificial ringing effects which are caused by stacking the image blocks and erroneous grouping. Depending on the size of the image, the execution of the main proposed algorithm requires an average of 17 s to 2 min on Intel(R) Core(TM) i3-2330M CPU 2.20 GHz computer. The average time required by the other algorithms to complete their executions is given as follows: the wavelet thresholding approach for image denoising: 2 to 5 s, the sparse representation-based K-SVD denoising method in the image domain: 15 s to 2 min, the TV denoising method: 12 to 30 s, and the BM3D denoising method: 5 to 15 s. The proposed algorithm is slower than the other denoising methods, due to the use of the K-SVD as a dictionary learning algorithm that needs many iteration steps. Our proposed method performs better than the state-of-the-art denoising methods, due to the merits of the wavelet transform and to the use of an adaptive dictionary devoted to noise reduction instead of using the thresholding operator.

Conclusion
We propose in this work a new method for image denoising. The approach taken aims at exploiting the merits of the wavelet transform: sparsity, multi-resolution structure, similarity with the human visual system, and good localization properties both in space and frequency, to adapt an unsupervised dictionary learning algorithm for creating a dictionary devoted to eliminate the useless information while keeping most significant ones. Experiments illustrate that the proposed method achieves a better performance than some other well-developed denoising methods, especially in PSNR, SSIM index, and visual effects. In future work, we will be focused more on reducing the computational cost of the proposed model.