Although the OGLR algorithm has excellent filtering performance, it needs many iterations to achieve such results. Moreover, it involves a large number of matrix inversions during the denoising process, which leads to a very high computational cost. Hence, in this paper, we take the edge weights defined in OGLR and apply them to the NLMs algorithm. We then embed the resulting NLMs method into a multi-layer scheme. The multi-layer scheme decomposes the input image into scales from fine to coarse, where the fine scales preserve image details and the coarse scale helps smooth the image. The proposed pipeline is shown in Fig. 1.

### 3.1 NLMs kernel

The NLMs kernel is computed based on the graph edge weights. With the exemplar functions \(\left\{\text{f}_{{\rm m}}\right\} _{m=1}^{M}\), the vector \({{\mathbf{v }}}_i\) on node \(v_i\) is as follows:

$$\begin{aligned} {{{\mathbf{v }}}_i} = \left[ \sqrt{{\alpha _g}} {x_i}, \sqrt{{\alpha _g}} {y_i}, 1/(L + {\sigma _g^2} / {\sigma _p^2})\sum \limits {_{l = 0}^{L - 1}}{z_l}\right] . \end{aligned}$$

(8)

Then the distance \(\phi _{ij}\) in Eq. (1) between nodes \(v_i\) and \(v_j\) is obtained as:

$$\begin{aligned} \phi _{ij} =\Vert {\mathbf{v }}_i-{\mathbf{v }}_j\Vert , \end{aligned}$$

(9)

where \(\Vert \cdot \Vert\) is the \({{\mathfrak {L}}_2}\) norm and \(\phi _{ij}\) determines the weighted affinity matrix \({\mathbf {W}}\) according to Eq. (1). The diagonal elements of the degree matrix \({\mathbf {D}}\) are defined as:

$$\begin{aligned} {\mathbf {D}}_{ii} = \sum _j w_{ij}. \end{aligned}$$

(10)

The NLMs kernel \({\mathbf {F}}\) is a normalized version of the weight matrix, obtained as the product of \({\mathbf {D}}^{-1}\) and \({\mathbf {W}}\):

$$\begin{aligned} {\mathbf {F}}={\mathbf {D}}^{-1}{\mathbf {W}} \end{aligned}$$

(11)
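Eqs. (8)-(11) can be illustrated with a minimal NumPy sketch. Because Eq. (1) is not restated in this section, the Gaussian weight \(w_{ij}=\exp (-\phi _{ij}^2/(2\epsilon ^2))\) used below is an assumption, as are the function names, the bandwidth `eps`, the toy parameter values, and the reading of \(z_l\) as the \(l\)-th of \(L\) similar patches, each flattened to a vector of pixels.

```python
import numpy as np

def feature_vectors(x, y, patches, alpha_g, sigma_g2, sigma_p2):
    """Per-pixel vectors v_i in the spirit of Eq. (8).

    x, y: (N,) pixel coordinates; patches: (L, N) similar patches z_l
    (assumed layout). Returns an (N, 3) array.
    """
    L = patches.shape[0]
    # Third component: 1 / (L + sigma_g^2 / sigma_p^2) * sum_l z_l
    avg = patches.sum(axis=0) / (L + sigma_g2 / sigma_p2)
    return np.stack([np.sqrt(alpha_g) * x, np.sqrt(alpha_g) * y, avg], axis=1)

def nlm_kernel(features, eps=1.0):
    """Return F = D^{-1} W together with W and the degrees D_ii."""
    # phi_ij = ||v_i - v_j||_2, Eq. (9)
    diff = features[:, None, :] - features[None, :, :]
    phi = np.linalg.norm(diff, axis=-1)
    # ASSUMPTION: Gaussian weight; the paper's Eq. (1) may differ.
    W = np.exp(-phi**2 / (2.0 * eps**2))
    # D_ii = sum_j w_ij, Eq. (10)
    degree = W.sum(axis=1)
    # D is diagonal, so D^{-1} W is a row-wise division, Eq. (11)
    F = W / degree[:, None]
    return F, W, degree

# Toy example: a 4x4 patch (N = 16 pixels) with L = 5 similar patches.
rng = np.random.default_rng(0)
side = 4
yy, xx = np.mgrid[0:side, 0:side]
x = xx.ravel().astype(float)
y = yy.ravel().astype(float)
patches = rng.normal(loc=0.5, scale=0.1, size=(5, side * side))

V = feature_vectors(x, y, patches, alpha_g=0.1, sigma_g2=1.0, sigma_p2=1.0)
F, W, degree = nlm_kernel(V, eps=1.0)
```

Each row of `F` sums to one, so applying `F` to a patch replaces every pixel by a weighted average of the pixels in that patch.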

The NLMs kernel \({\mathbf {F}}\) is similar to the graph-based bilateral filter [12] and the classical NLMs kernel [3]. The difference is that \({\mathbf {F}}\) considers both the spatial relationship between pixels and the average intensity of patches. In addition, the spatial relationship and the average intensity are weighted by the gradient estimates, which helps to improve the denoising performance. On the one hand, when the image is corrupted by high-level noise, the spatial relationship between pixels dominates the denoising process (as in a Gaussian filter). On the other hand, when the signal-to-noise ratio (SNR) is high, the average intensity plays a more critical role.

It is worth noting that the OGLR algorithm denoises the target patch \(z_0\) by inverting a matrix built from the graph Laplacian, i.e., \(z^*=({\mathbf {I}} + \tau {\mathbf {L}})^{-1} z_0\). Although \({\mathbf {L}}\) is small, the inversion is still time-consuming. In contrast, our NLMs kernel works in the forward direction and avoids the matrix inversion required by the OGLR algorithm; the only inverse needed is that of the diagonal matrix \({\mathbf {D}}\), which reduces to element-wise reciprocals and therefore takes linear time. Hence, our method is much faster than the OGLR method.
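The difference between the two strategies can be sketched on a toy graph. The random weight matrix, the value of `tau`, and the use of the combinatorial Laplacian \({\mathbf {L}}={\mathbf {D}}-{\mathbf {W}}\) are assumptions made only for illustration; they are not taken from the OGLR implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64

# Hypothetical symmetric, non-negative weights standing in for W.
A = rng.random((n, n))
W = (A + A.T) / 2.0
degree = W.sum(axis=1)
z0 = rng.normal(size=n)

# OGLR-style update: solve (I + tau * L) z = z0.
# ASSUMPTION: L = D - W (combinatorial graph Laplacian), tau chosen arbitrarily.
tau = 0.5
L = np.diag(degree) - W
z_oglr = np.linalg.solve(np.eye(n) + tau * L, z0)

# Forward filtering: F z0 = D^{-1} W z0.
# D is diagonal, so its inverse is an element-wise division.
z_fwd = (W @ z0) / degree
```

For a dense \(n \times n\) system, the solve costs on the order of \(n^3\) operations, whereas the forward product costs on the order of \(n^2\); the two outputs are different filters and are not expected to coincide.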

### 3.2 Determining the coefficients with least squares

The set of coefficients \(\{\beta _k\}\) in the multi-layer scheme plays a significant role in achieving good denoising performance. In this paper, instead of setting these parameters according to the s-curve functions proposed in [28], we treat the determination of \(\{\beta _k\}\) as a regression problem and solve it with the least squares method. Our cost function is as follows:

$$\begin{aligned} C = \min _{\{\beta _0,\ldots ,\beta _K\}} \sum _{p=1}^P \left\| \left( \beta _0{\mathbf {F}}^K + \sum _{k=1}^K\beta _k({\mathbf {I}}-{\mathbf {F}}){\mathbf {F}}^{K-k} \right) {z_{p}} - {z_{0p}}\right\| _2^2, \end{aligned}$$

(12)

where *K* is the number of layers, *P* is the total number of training images, \(z_p\) is the *p*-th noisy image patch, and \(z_{0p}\) is the *p*-th noise-free image patch. The aim of (12) is to find coefficients \(\{\beta _k\}\), one per filter term, that minimize the difference between the filtered noisy patches and the clean patches.
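Since the filtered patch is linear in \(\{\beta _k\}\), (12) is an ordinary linear least squares problem. The following minimal sketch stacks the \(K+1\) filter terms of every training patch into one design matrix and solves for the coefficients. The helper names, the 1-D toy kernel shared by all patches, and the synthetic sinusoidal data are assumptions for illustration only; in the proposed method the kernel is computed from the graph edge weights.

```python
import numpy as np

def layer_terms(F, z, K):
    """Columns [F^K z, (I-F)F^{K-1} z, ..., (I-F) z] as an (N, K+1) array."""
    powers = [z]                       # powers[j] = F^j z
    for _ in range(K):
        powers.append(F @ powers[-1])
    cols = [powers[K]]                 # smooth term F^K z
    for k in range(1, K + 1):
        # (I - F) F^{K-k} z = F^{K-k} z - F^{K-k+1} z
        cols.append(powers[K - k] - powers[K - k + 1])
    return np.stack(cols, axis=1)

def fit_beta(kernels, noisy, clean, K):
    """Least squares estimate of (beta_0, ..., beta_K) for the cost in (12)."""
    A = np.concatenate(
        [layer_terms(F, z, K) for F, z in zip(kernels, noisy)], axis=0
    )
    b = np.concatenate(clean)
    beta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return beta

# Toy data: P one-dimensional "patches" of N samples each.
rng = np.random.default_rng(0)
N, P, K = 16, 20, 3
idx = np.arange(N)
# Hypothetical kernel: Gaussian weights on sample positions, row-normalized.
W = np.exp(-(idx[:, None] - idx[None, :]) ** 2 / 2.0)
F = W / W.sum(axis=1, keepdims=True)

clean = [np.sin(2 * np.pi * idx / N + rng.uniform(0, 2 * np.pi)) for _ in range(P)]
noisy = [c + 0.3 * rng.normal(size=N) for c in clean]

beta = fit_beta([F] * P, noisy, clean, K)
```

Setting every coefficient to one makes the terms telescope to the identity, so the least squares solution can never fit the training data worse than the unfiltered noisy patches do.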

Note that during training we treat images with different noise levels separately; in other words, each noise level is assigned its own set of optimal coefficients. For each noise level, once training has finished, we estimate the noise variance according to the newly obtained \(\{\beta _k\}\). If the estimated noise is higher than a given threshold \(\sigma _{th}\), it is advisable to train \(\{\beta _k\}\) again with the newly obtained \(\{\beta _k\}\).

Additionally, the number of layers *K* is also an important parameter. Details will be discussed in Sects. 3.3 and 4.2.

### 3.3 NLMs with K residual compensation

The NLMs filter can be embedded into the multi-layer scheme, and the filtered output image then consists of one smooth term and *K* residual terms:

$$\begin{aligned} \begin{aligned} {{{u}_{out}} = {\beta _0}{{\mathbf {F}}^K}{u}+ {\beta _1}({\mathbf {I}}-{\mathbf {F}}){\mathbf {F}}^{K - 1}{u} +...+ {\beta _{K-1}}({\mathbf {I}}-{\mathbf {F}}){\mathbf {F}}{u} + {\beta _{K}}({\mathbf {I}} - {\mathbf {F}}){u}.} \end{aligned} \end{aligned}$$

(13)

Since \({\mathbf {F}}\) is the normalized affinity matrix, it acts as a low-pass filter according to graph Fourier transform theory. \(({{\mathbf {I}}}-{\mathbf {F}})\) is the normalized Laplacian and functions as a high-pass filter [5]. \({\mathbf {F}}^K u\) is the smooth term, obtained by cascading *K* low-pass filters \({\mathbf {F}}\). The remaining *K* terms are the corresponding detail terms. When \(K = 1\), Eq. (13) reduces to:

$$\begin{aligned} u_{out}=\beta _0 {\mathbf {F}}u+\beta _1 ({{\mathbf {I}}}-{\mathbf {F}})u, \end{aligned}$$

where the filtered image is composed of one smooth term \({\mathbf {F}}u\) and one residual detail term \(({{\mathbf {I}}}-{\mathbf {F}})u\). As *K* increases, \(u_{out}\) becomes smoother. The value of *K* should be neither too large nor too small. Too few layers may lead to an incomplete representation of the image and therefore to ineffective noise removal, e.g., some details are not restored, or the homogeneous regions of the filtered image are not smooth enough. Too many layers, however, greatly increase the computation time while yielding only a slight performance improvement. The choice of *K* is discussed in Sect. 4.2.
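Eq. (13) can be evaluated with only *K* matrix-vector products by reusing the powers \({\mathbf {F}}^j u\), since \(({\mathbf {I}}-{\mathbf {F}}){\mathbf {F}}^{j}u = {\mathbf {F}}^{j}u - {\mathbf {F}}^{j+1}u\). The sketch below is one possible way to do this; the function name, the 1-D toy kernel, and the coefficient values are hypothetical and are not the learned \(\{\beta _k\}\).

```python
import numpy as np

def multilayer_filter(F, u, beta):
    """Evaluate Eq. (13): beta_0 F^K u + sum_k beta_k (I - F) F^{K-k} u."""
    K = len(beta) - 1
    powers = [u]                       # powers[j] = F^j u
    for _ in range(K):
        powers.append(F @ powers[-1])
    out = beta[0] * powers[K]          # smooth term
    for k in range(1, K + 1):
        # detail term k, written as a difference of consecutive powers
        out = out + beta[k] * (powers[K - k] - powers[K - k + 1])
    return out

# Toy 1-D signal: a noisy step edge.
rng = np.random.default_rng(0)
N = 32
idx = np.arange(N)
u = (idx >= N // 2).astype(float) + 0.1 * rng.normal(size=N)

# Hypothetical kernel: Gaussian weights on sample positions, row-normalized.
W = np.exp(-(idx[:, None] - idx[None, :]) ** 2 / 2.0)
F = W / W.sum(axis=1, keepdims=True)

# Hypothetical coefficients for K = 3 (not values from the paper).
beta = np.array([1.0, 0.9, 0.6, 0.2])
u_out = multilayer_filter(F, u, beta)
```

Two limiting cases help to interpret the coefficients: with all \(\beta _k = 1\) the terms telescope and the input is returned unchanged, while \(\beta _0 = 1\) and \(\beta _k = 0\) for \(k \ge 1\) keeps only the smooth term \({\mathbf {F}}^K u\).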

With the learned coefficients \(\{\beta _k\}\) and the number of layers *K*, the proposed method is summarized in Algorithm 1. In addition, the flowchart of the proposed graph-based NLMs with multi-layer residual compensation is shown in Fig. 1, where a noisy image with noise level \(\sigma = 50\) is used as an example.