Optimal graph edge weights driven NLMs with multi-layer residual compensation

The Non-local Means (NLMs) filter plays an essential role in image denoising, restoration, inpainting, etc., due to its simple theory and effective performance. However, as the noise level increases, the denoising accuracy of NLMs decreases significantly. This paper further develops the NLMs-based denoising method to remove noise with less loss of image details. This is realized by embedding an optimal graph edge weights driven NLMs kernel into a multi-layer residual compensation framework. Unlike the patch similarity-based weights in the traditional NLMs filters, the edge weights derived from the optimal graph Laplacian regularization consider (1) the distance between the target pixel and the candidate pixel, (2) the local gradient and (3) the patch similarity. After defining the weights, the graph-based NLMs kernel is put into a multi-layer framework. The corresponding primal and residual terms at each layer are finally fused with learned weights to recover the image. Experimental results show that our method is effective and robust, especially for piecewise smooth images.

The basic idea of NLMs is to search for similar pixels in a non-local region (even the whole image), then average them with different weights. Since the similarity between two pixels is computed from the non-local patches around them, NLMs is robust to noise and yields effective performance. However, the NLMs filter removes some image details such as edges and rare textures during denoising, especially when the SNR is low. Although total variation (TV) regularization models [7,17,26] have been combined with NLMs approaches to deal with rare patches (for which no similar pixels can be found), they still cannot fully address this issue.
The basic idea of denoising in the transform domain is to separate noise from the observed signal in the transform domain. In general, the noise is randomly distributed, changes rapidly in images, and has almost uniform power across the whole frequency domain. Clean images usually change slowly in local areas, and their power is generally concentrated at low frequencies. Based on this observation, various transforms have been used for denoising, e.g., the Curvelet transform [25], the Wavelet transform [6] and the graph Fourier transform [15,18,29]. Among all transform-based methods, the block-matching in 3D transform-domain filter (BM3D) is the most popular and widely used [8]. The BM3D filter is effective because it combines the NLMs theory with wavelet transform-based denoising. Besides transforms with fixed bases, data-driven transforms have also been widely used in image denoising tasks, including PCA [2], sparse coding [14], dictionary learning [9] and compressed sensing [10].
More recently, machine learning-based denoising methods, especially deep learning-based approaches, have attracted wide attention. A deep network was first applied to image denoising in [16], where the auto-encoder network requires no manually set parameters for removing the noise. Then Zhang et al. proposed the DnCNN to deal with image denoising, super-resolution, and JPEG image deblocking [34]. The generative adversarial network (GAN) is used to remove blind noise in [4]. In addition, attention mechanisms [30] and batch re-normalization [31] have been introduced into denoising tasks, achieving excellent performance. In short, recent deep learning approaches can yield better results than the traditional filters; however, a considerable amount of high-quality training data is required for network training, which is not always available in practice.
This paper aims at developing the NLMs by introducing graph signal processing theory and a multi-layer framework. Graph signal processing (GSP) is a powerful and well-developed tool for analyzing signals on graphs [5,20,24,33]. Traditional image processing methods regard the image domain as a regular 2D grid. But if one treats each pixel of an image as a node of a graph and constructs proper links between nodes, one can interpret the image as a signal on an irregular graph. The computation of similarity between pixels and patches is then no longer restricted by the regular grid, and the image information can be exploited more comprehensively. The optimal graph Laplacian regularization (OGLR) method derives the optimal metric space in the sense of minimum mean square error (MMSE), thus defining the optimal edge weights. Unlike the patch similarity-based weights in the classic NLMs filters, the OGLR defines the weights by considering (1) the distance between the target pixel and the candidate pixel, (2) the local gradient and (3) the patch similarity. The OGLR method yields good results for image denoising. However, it needs a large number of iterations to achieve comparable performance. Moreover, it involves many matrix inversions, making the method time-consuming. Hence, in this paper, we replace the weights in the classical NLMs with the graph edge weights from the OGLR algorithm and embed the newly obtained NLMs into a multi-layer framework. The main contributions of this paper are threefold. Firstly, our method uses the edge weights defined on a graph structure to compute the similarity between nodes and patches, which performs better than the traditional NLMs. Experiments show that our method is comparable with the state-of-the-art methods, both visually and quantitatively. Furthermore, our method is better at denoising piecewise smooth images, especially when the noise level is high.
Secondly, a multi-layer representation is used in our method to remove the noise while preserving the details. The multi-layer strategy helps to smooth the image better. In addition, to recover image details, the residual terms at each layer (the difference between the input and output of the NLMs filter) are combined with the smooth filtered image.
Last but not least, the coefficients of each component derived from each layer, including the smooth filtered image and the residual terms, are learned by the least-squares method. These coefficients can then be used as default parameters for any image.
Note that a similar idea was proposed in [28]. However, the key difference is that our method adapts the NLMs filter (via the graph Laplacian regularization) to the input image, instead of using a fixed filter as in [28].
This paper is organized as follows. Section 2 introduces the related work on graph construction, the OGLR algorithm and the multi-layer representation of filtered images. The proposed denoising method is detailed in Sect. 3. Experiments and results are presented in Sects. 4 and 5, respectively. Finally, conclusions are given in Sect. 6.

Graph construction
Let G(V, E) denote a graph structure, where V = {v_i}_{i=1}^N is a set of N nodes and E = {e_ij} a set of edges. If two different nodes v_i and v_j are connected, there exists an edge weight w_{i,j} describing the affinity between these two nodes. Generally, the larger w_{i,j} is, the more similar or correlated the nodes v_i and v_j are. A widely used form of the weight w_{i,j} is

w_{i,j} = exp(−φ_ij^2 / (2ε^2)) if φ_ij ≤ r, and w_{i,j} = 0 otherwise,  (1)

where φ_ij measures the distance between the two nodes v_i and v_j, ε is a scaling parameter, and r is a threshold. Note that φ_ij does not necessarily correspond to the Euclidean distance between the nodes. Typically, w_{i,j} is non-negative, and the larger the distance between two nodes, the smaller w_{i,j} is. The weighted affinity matrix W ∈ R^{N×N} is then formed by the weights w_{i,j} and measures the similarity between nodes. The degree matrix D ∈ R^{N×N} is a diagonal matrix whose i-th entry is the degree (the sum of the i-th row of W) of node v_i.
A graph signal u is often defined as a discrete signal on the nodes of the graph, u : V → R. The discrete signal u can be regarded as a vector u ∈ R^N, where the i-th entry represents the signal value at the i-th node in V. In terms of 2D discrete images, each pixel represents a node, and the pixel intensity stands for the signal value.
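As a toy illustration of this construction, the sketch below builds a thresholded Gaussian affinity matrix W and its degree matrix D for a small 1-D graph signal; the kernel width `sigma` and threshold `r` are illustrative choices of ours, not values from the paper.

```python
import numpy as np

def affinity_matrix(signal, sigma=1.0, r=2.0):
    """Thresholded Gaussian edge weights: w_ij decays with the distance phi_ij."""
    phi = np.abs(signal[:, None] - signal[None, :])  # pairwise node distances
    W = np.exp(-phi**2 / (2 * sigma**2))
    W[phi > r] = 0.0          # disconnect nodes farther apart than the threshold r
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def degree_matrix(W):
    """Diagonal degree matrix D: each entry is the corresponding row sum of W."""
    return np.diag(W.sum(axis=1))

u = np.array([0.0, 0.1, 0.2, 5.0])  # toy graph signal: three similar nodes, one outlier
W = affinity_matrix(u)
D = degree_matrix(W)
```

The outlier node ends up disconnected from the first three, matching the intuition that larger distances yield smaller (here zero) weights.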

The optimal graph Laplacian regularization algorithm
The OGLR algorithm seeks a metric space in which to measure the similarity of image patches [21]. For each pixel location in a 2D image, a vector v_i of length M is constructed using a set of exemplar functions {f_m}_{m=1}^M. The set of vectors {v_i}_{i=1}^N is used to build the weighted graph G(V, E, W) with N vertices, where N is the total number of pixels. The determination of the exemplar functions is induced from a continuous graph Laplacian regularizer, described by an anisotropic Dirichlet energy functional E(u), where s ∈ Ω is the pixel location. The metric tensor G : Ω → R^{2×2} can be viewed as the structure tensor at s, constructed from the gradients {∇f_m}_{m=1}^M. An optimal metric tensor G* can be estimated, in the MMSE sense, by considering the noise model from patch gradients, where ḡ is the average gradient of a patch and the constant α_g is determined by the covariance of the patch. With the estimated G*, the exemplar functions can be expressed in closed form, where (x_i, y_i) are the coordinates of pixel i, and {z_l}_{l=0}^L is a set of L non-local patches similar to z_0. These similar patches are obtained with the K-Nearest-Neighbour (KNN) algorithm, which seeks the L patches with the smallest Euclidean distance to the current patch z_0. Here σ_p is a given constant over the whole noisy image, and σ_g is the estimated variance of the patch gradient. f*_1 and f*_2 encode the spatial relationship between pixels, and f*_3 represents the average pixel intensity of the target patch. The coefficients in f*_1, f*_2 and f*_3 balance the contributions of the spatial and intensity factors. Hence, the three exemplar functions defined in Eq. (6) can be used to construct the optimal graph edge weights.
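The KNN patch selection described above can be sketched as follows; `knn_patches` is a hypothetical helper of ours that returns the indices of the L patches with the smallest Euclidean distance to the target patch z_0, operating on pre-extracted, flattened patches.

```python
import numpy as np

def knn_patches(patches, z0, L):
    """Indices of the L rows of `patches` closest to z0 in Euclidean distance."""
    d = np.linalg.norm(patches - z0[None, :], axis=1)
    return np.argsort(d)[:L]

rng = np.random.default_rng(0)
patches = rng.normal(size=(100, 25))          # 100 flattened 5x5 patches
z0 = patches[7] + 0.01 * rng.normal(size=25)  # target: slightly perturbed copy of patch 7
idx = knn_patches(patches, z0, L=5)           # idx[0] recovers the near-duplicate patch
```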

The multi-layer framework
A K-layer tree structure can represent the image hierarchy from fine to coarse, and all the leaf nodes sum up to the input image. In the multi-layer scheme, the output filtered image u_out is described by one smooth term and several detail terms, weighted by a set of coefficients {β_0, β_1, ..., β_K} that controls the smoothness and the detail preservation of the output image. More details on the multi-layer scheme can be found in [27,28].

Graph edge weights driven NLMs in a multi-layer framework
Although the OGLR algorithm has excellent filtering performance, it needs numerous iterations to achieve a comparable result. Moreover, it involves a large number of matrix inversions during the denoising process, leading to a very high computational cost. Hence, in this paper, we take advantage of the edge weights defined in OGLR and apply them to the NLMs algorithm. We then embed the newly obtained NLMs method into a multi-layer scheme, which decomposes the input image details from fine to coarse scale: the fine scales help preserve image details, while the coarse scales help smooth the image. The proposed pipeline is shown in Fig. 1.

NLMs kernel
The NLMs kernel is computed from the graph edge weights. With the exemplar functions {f_m}_{m=1}^M, the feature vector v_i at node v_i is built as in Eq. (2). The distance φ_ij in Eq. (1) between nodes v_i and v_j is then obtained by

φ_ij = ‖v_i − v_j‖,

where ‖·‖ is the L2 norm; φ_ij determines the weighted affinity matrix W according to Eq. (1). The diagonal elements of the degree matrix D are defined as

d_ii = Σ_j w_{i,j},

and the NLMs kernel is the normalized affinity matrix F = D^{-1} W. The kernel F is similar to the graph-based bilateral filter [12] and the classical NLMs kernel [3]. The difference lies in that F considers the spatial relationship between pixels and the average intensity of patches. In addition, the spatial relationship and the average intensity are weighted by the gradient estimates, which helps to improve the denoising performance. On the one hand, when the image is polluted by high-level noise, the spatial relationship between pixels dominates the denoising process (like a Gaussian filter). On the other hand, when the signal-to-noise ratio (SNR) is high, the average intensity plays a more critical role.
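A minimal sketch of such a row-normalized kernel, assuming the feature vectors v_i are already available as rows of a matrix; the unthresholded Gaussian weight and the width `sigma` are simplifications of Eq. (1) chosen for brevity.

```python
import numpy as np

def nlm_kernel(V, sigma=1.0):
    """Row-normalized kernel F = D^{-1} W from feature vectors V (one row per node)."""
    phi = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=2)  # phi_ij = ||v_i - v_j||
    W = np.exp(-phi**2 / (2 * sigma**2))                         # Gaussian edge weights
    D_inv = np.diag(1.0 / W.sum(axis=1))                         # inverse degree matrix
    return D_inv @ W

V = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]])  # toy 2-D feature vectors
F = nlm_kernel(V)
```

Each row of F sums to one, so F preserves constant signals, which is the low-pass behavior exploited later in the multi-layer scheme.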
It is worth noting that the OGLR algorithm denoises the target patch z_0 by calculating the inverse of the Laplacian operator, i.e., z* = (I + τL)^{-1} z_0. Although L is of small size, the inversion still costs a lot of time. In contrast, our NLMs kernel works in a feed-forward manner, avoiding the matrix inversion required by the OGLR algorithm (except for the inverse of the diagonal matrix D, which is trivial). Hence, our method works much faster than the OGLR method.

Determining the coefficients with least squares
The set of coefficients {β_k} in the multi-layer scheme plays a significant role in achieving good denoising performance. In this paper, instead of using parameters set by the s-curve functions proposed in [28], we regard the determination of {β_k} as a regression problem and apply the least-squares algorithm to solve it. Our cost function is

min over {β_k} of Σ_{p=1}^{P} ‖ z_{0p} − ( β_0 F^K z_p + Σ_{k=1}^{K} β_k (I − F) F^{k−1} z_p ) ‖^2,  (12)

where K is the number of layers, P is the total number of training patches, z_p represents the p-th noisy image patch, and z_{0p} stands for the p-th noise-free image patch. The aim of (12) is to find a series {β_k} acting on the different filter outputs that minimizes the difference between the filtered noisy patches and the clean patches.
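A sketch of this regression under a simplifying assumption of ours: the per-layer filter outputs for all training pixels have already been stacked as columns of a matrix A, so the coefficients follow from one ordinary least-squares solve.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 4
A = rng.normal(size=(200, K + 1))   # column k: output of the k-th layer term for 200 pixels
beta_true = np.array([0.9, 0.05, 0.03, 0.01, 0.01])
z0 = A @ beta_true                  # synthetic noise-free target built from known coefficients

# Ordinary least squares: minimize ||z0 - A beta||^2 over beta
beta, *_ = np.linalg.lstsq(A, z0, rcond=None)
```

On this noiseless synthetic system the solve recovers the generating coefficients exactly; on real patches it returns the best fit in the least-squares sense.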
Note that during the training process, we distinguish images with different noise levels; in other words, each noise level is assigned its own set of optimal coefficients. For each noise level, when the training process is finished, we estimate the noise variance with the newly obtained {β_k}. If the estimated noise is higher than a given threshold σ_th, {β_k} is trained again, initialized with the newly obtained values.
Additionally, the number of layers K is also an important parameter. Details will be discussed in Sects. 3.3 and 4.2.

NLMs with K residual compensation
The NLMs filter can be embedded into the multi-layer scheme, and the output filtered image consists of one smooth term and K residual terms:

u_out = β_0 F^K u + Σ_{k=1}^{K} β_k (I − F) F^{k−1} u.  (13)

Since F is the normalized affinity matrix, it acts as a low-pass filter according to graph Fourier transform theory, and (I − F) is the normalized Laplacian, which acts as a high-pass filter [5]. F^K u stands for the smooth term, obtained by cascading K low-pass filters F; the K residual terms are the corresponding detail terms. When K = 1, Eq. (13) degenerates to

u_out = β_0 F u + β_1 (I − F) u,  (14)

where the filtered image is composed of one smooth term Fu and one residual detail term (I − F)u. Thus, the larger K is, the smoother u_out becomes. However, K can be neither too large nor too small. Too few layers may lead to an incomplete representation of the image, which cannot remove the noise effectively, i.e., some details are not restored, or the homogeneous parts of the filtered image are not smooth enough. Too many layers result in a heavy computational load, consuming a lot of time for only a slight performance improvement. The choice of K will be discussed in Sect. 4.2.
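The recomposition can be sketched directly, assuming (our reading of the text) that the residual terms take the form (I − F)F^{k−1}u; F here is any row-stochastic smoothing kernel. With all β_k = 1 the terms telescope back to the input image, mirroring the leaf-nodes-sum-to-input property of the multi-layer tree.

```python
import numpy as np

def multilayer(F, u, betas):
    """Fuse one smooth term F^K u and K residual terms (I - F) F^{k-1} u."""
    K = len(betas) - 1
    I = np.eye(F.shape[0])
    out = betas[0] * np.linalg.matrix_power(F, K) @ u  # smooth term
    for k in range(1, K + 1):
        out += betas[k] * (I - F) @ np.linalg.matrix_power(F, k - 1) @ u  # detail terms
    return out

F = np.array([[0.8, 0.1, 0.1],    # toy row-stochastic smoothing kernel
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
u = np.array([1.0, 0.0, 0.0])
u_out = multilayer(F, u, betas=[1.0, 1.0, 1.0, 1.0, 1.0])  # K = 4, all coefficients 1
```

Learned coefficients other than 1 then trade off smoothing (β_0) against detail restoration (β_1, ..., β_K).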
With the learned coefficients {β_k} and the number of layers K, the proposed method is summarized in Algorithm 1. In addition, the flowchart of the proposed graph-based NLMs with multi-layer residual compensation is shown in Fig. 1, where a noisy image with noise variance σ = 50 is used as an example.

Experimental setup
We verify the effectiveness of the proposed method on both natural images and depth images. Additive white Gaussian noise (AWGN) is added to these images, with standard deviations σ ranging from 10 to 50. According to the noise variance σ, the patch size in our experiments ranges from 10 to 22, and the step size N_S from 2 to 6. In the implementation, the normalization parameter γ in Eq. (3) was empirically set to 0.6 for the natural images and γ = 1 for the depth images. The constant σ_p in Eq. (6) is set to 10^6, and the patch cluster size L is from 5 to 50. The noise variance threshold mentioned in Sect. 3.2 is σ_th = 5.
The test images are from public datasets such as the BSDS500 dataset [1] and the Middlebury Stereo Datasets [23].
We compare our method with the original NLMs [3], OGLR [21] and two other state-of-the-art methods, i.e., Block-Matching 3D (BM3D) [8] and the ADNet method [30]. The peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) are used to evaluate the performance of these methods.
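As a small self-check of the evaluation protocol, the sketch below adds AWGN to a flat test image and computes PSNR (SSIM omitted for brevity); the helper names are ours, and a peak intensity of 255 is assumed.

```python
import numpy as np

def add_awgn(img, sigma, rng):
    """Add white Gaussian noise with standard deviation sigma."""
    return img + rng.normal(scale=sigma, size=img.shape)

def psnr(clean, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((clean - distorted) ** 2)
    return 10 * np.log10(peak**2 / mse)

rng = np.random.default_rng(0)
clean = np.full((64, 64), 128.0)              # flat gray test image
noisy = add_awgn(clean, sigma=50.0, rng=rng)
p = psnr(clean, noisy)                        # roughly 20*log10(255/50), about 14 dB
```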

Determination of the number of layers K
To find the most appropriate number of layers K, we test six different images. Five noise levels are tested separately, i.e., σ = 10, 20, 30, 40, 50. The average PSNR and SSIM of the test images under different noise levels are computed for different K. In our experiments, K ranges from 1 to 6. The maximum K is set to 6 because when K > 6, computing the matrix powers costs a lot of time, which is contrary to our motivation.
Figure 2 shows the PSNR (a) and SSIM (b) results with respect to K. We can see from (a) that when K ≥ 4, the PSNR converges to a fixed value for all five noise levels. As for the SSIM (b), the more layers there are, the higher the SSIM, but when K ≥ 4 it no longer improves significantly. Hence, to balance PSNR, SSIM and time cost, we set K = 4 in our algorithm according to Fig. 2.

Determination of {β_k}
In the process of training the coefficient set {β_k} in Eq. (13), three depth images (cloth, Aloe and flowerpot) and three natural images (barbara, chips and man) are used as the training data, as shown in Fig. 3. Each image is divided into 3000 to 6000 image patches under different noise levels with varying patch sizes.
The learned sets {β_k} are shown in Tables 1 and 2. Both tables indicate that the smooth term F^K u of Eq. (13) plays the major role in the denoising process. Additionally, although the coefficients of the residual terms are small, and sometimes even negative, they still play important roles in retaining detail and removing noise.

Figures 4, 5, 6 and 7 depict the denoising performance of the five methods on four depth images (wood, bowling, lampshade and teddy) with a noise variance σ = 50, the differences between the original depth images and the filtered images, and the corresponding zoomed parts. Figures 8, 9 and 10 show the denoising results of the five methods on three natural images (bird, house and jar) with noise variance σ = 50, respectively. From left to right are: the original image, the noisy image, and the results obtained by NLMs, BM3D, OGLR, ADNet and the proposed method. For the image wood, the horizontal line in the center of the image is seriously blurred by OGLR and BM3D. Moreover, the homogeneous parts are still corrupted and not well restored. ADNet preserves edges very well visually. However, it generates some undesirable artifacts, such as the black point in the lower-left corner and the black segment in the center. In our case, although the edges are not preserved as well as with ADNet, the homogeneous parts are well smoothed. The middle row shows the differences between the original image and the filtered images; dark contours and areas indicate significant differences. From the difference figures, we can see that our result is more similar to the original clean image. In terms of both visual quality and the quantitative indexes, our method provides the best denoising result.
For the image bowling, the PSNR and SSIM of the proposed method are superior to those of the other methods. The edge between the bowling ball and the pin is blurred, even distorted, by BM3D. ADNet generates a deformation on the edge of the ball, possibly because its training data do not include images with this kind of shape. Our method and OGLR provide better results, while our result is smoother in the homogeneous regions. Additionally, the difference figures demonstrate that our method has excellent denoising performance and preserves the brightness very well.
The images lampshade and teddy are more complex than the above two images, with more details and weak edges, as shown in Figs. 6 and 7. The results show that our method can restore the image very well in smooth areas, but cannot preserve sharp corners, e.g., the corner in the zoomed parts. In addition, our method generates some artifacts in the homogeneous areas. In fact, this phenomenon exists in all NLMs theory-based methods, including NLMs, BM3D and OGLR. When the SNR is low, i.e., the noise is strong, some neighboring pixels in a patch may be mistaken for line structures. These structures are then enhanced or preserved during denoising, producing the artifacts.
Figures 8 and 9 show the denoising results of two natural images, bird and house, under the noisy case σ = 50. It is clear that the proposed method achieves satisfactory performance in the homogeneous regions of the image, like the sky in bird and the shadow of the roof in house. From the enlarged sub-images in Fig. 9, one can see that BM3D, OGLR and ADNet generate some undesirable artifacts and destroy the edge of the shadow, while our method provides smoother results with fewer artifacts. Figure 10 shows the results on jar; the carved patterns on the jar can barely be seen after denoising. This hints that our method is not that effective at maintaining the details of texture-rich images.
Figure 11 displays examples on two color images, tape and pepper, under the noisy case σ = 50. The color images are denoised channel by channel. Visually speaking, the denoising results of our method are competitive with those of the other methods.

Tables 3 and 4 list the PSNR and SSIM of the proposed method and four other state-of-the-art methods on several depth images and real natural images. The highest indexes are in bold, the second-best are underlined. From the results, we can see that our proposed method is comparable with the state-of-the-art methods. In addition, it outperforms the OGLR method in nearly all cases, especially with a large noise variance σ. Furthermore, when σ is large, the performance of our method becomes more competitive. Our method also performs better on the piecewise smooth depth images than on real natural images. However, when dealing with texture-rich images such as jar and flower, our result is not as good as that of ADNet. Regarding the running time, at a noise level of σ = 10 the OGLR takes about 90 seconds per image, whereas our method is considerably faster. However, compared to other approaches such as BM3D and ADNet, the graph-based methods take longer, which is a common downside of graph-based methods.

Conclusion
In this paper, we propose a graph-based NLMs algorithm for image denoising. The edge weights defined in the OGLR algorithm are used as the NLMs kernel. A multi-layer residual compensation strategy is then used to recover the details. The coefficients of the smooth term and the residual terms of the multi-layer representation are learned by the least-squares method. We verify the effectiveness of our method on both natural images and depth images. Our proposed method outperforms the original OGLR method in PSNR, SSIM and time cost. Compared with the other state-of-the-art methods, including the classical NLMs, BM3D and ADNet, our proposed method provides comparable or better results. In particular, our method has excellent denoising performance on piecewise smooth images when the noise level is high.

Fig. 1
Fig. 1 Flowchart of the proposed algorithm

Fig. 2
Fig. 2 From top to bottom: a the average PSNR (dB) for different numbers of layers, b the average SSIM for different numbers of layers

Fig. 3
Fig. 3 The images used in the process of training the coefficient set {β_k}. Depth images: a cloth, b Aloe, c flowerpot; natural images: d barbara, e chips, f man

Fig. 4 Denoising results of the depth image (wood) with noise variance σ = 50 [top], the difference between the original depth image and the filtered images [middle], and zoomed-in images [bottom]. From left (a) to right (g): the original image, the noisy image, and the results obtained by NLMs, BM3D, OGLR, ADNet and the proposed method, respectively

Table 1
The learned {β_k} for depth images with different levels of noise

Table 2
The learned {β_k} for real natural images with different levels of noise

Table 5
The results of image denoising on the depth dataset. Bold and underline mark the best and the second-best results for each quality index, respectively