
# A collaborative adaptive Wiener filter for image restoration using a spatial-domain multi-patch correlation model

*EURASIP Journal on Advances in Signal Processing*
**volume 2015**, Article number: 6 (2015)

## Abstract

We present a new patch-based image restoration algorithm using an adaptive Wiener filter (AWF) with a novel spatial-domain multi-patch correlation model. The new filter structure is referred to as a collaborative adaptive Wiener filter (CAWF). The CAWF employs a finite size moving window. At each position, the current observation window represents the reference patch. We identify the most similar patches in the image within a given search window about the reference patch. A single-stage weighted sum of all of the pixels in the similar patches is used to estimate the center pixel in the reference patch. The weights are based on a new multi-patch correlation model that takes into account each pixel’s spatial distance to the center of its corresponding patch, as well as the intensity vector distances among the similar patches. One key advantage of the CAWF approach, compared with many other patch-based algorithms, is that it can jointly handle blur and noise. Furthermore, it can also readily treat spatially varying signal and noise statistics. To the best of our knowledge, this is the first multi-patch algorithm to use a single spatial-domain weighted sum of all pixels within multiple similar patches to form its estimate and the first to use a spatial-domain multi-patch correlation model to determine the weights. The experimental results presented show that the proposed method delivers high performance in image restoration in a variety of scenarios.

## 1 Introduction

### 1.1 Image restoration

During image acquisition, images are subject to a variety of degradations. These invariably include blurring from diffraction and noise from a variety of sources. Restoring such degraded images is a fundamental problem in image processing that has been researched since the earliest days of digital images [1,2]. A wide variety of linear and non-linear methods have been proposed. Many methods have focused exclusively on noise reduction, and others seek to address multiple degradations jointly, such as blur and noise.

A widely used method for image restoration, relevant to the current paper, is the classic Wiener filter [3]. The standard Wiener filter is a linear space-invariant filter designed to minimize mean squared error (MSE) between the desired signal and estimate, assuming stationary random signals and noise. It is important to note that there are many disparate variations of Wiener filters. These include finite impulse response, infinite impulse response, transform-domain, and spatially adaptive methods. Within each of these categories, a wide variety of statistical models may be employed. Some statistical models are very simple, such as the popular constant noise-to-signal power spectral density model, and others are far more complex. In the case of the empirical Wiener filter [4], no explicit statistical model is used at all. Rather, a pilot or prototype estimate is used in lieu of a parametric statistical model. While all of these methods may go by the name of ‘Wiener filter’, they can be quite different in their character.

Recently, a form of adaptive Wiener filter (AWF) has been developed and successfully applied to super-resolution (SR) and other restoration applications by one of the current authors [5]. This AWF approach employs a spatially varying weighted sum to form an estimate of each pixel. The Wiener weights are determined based on a spatially varying spatial-domain parametric correlation model. This particular brand of AWF SR emerged from earlier work, including that in [6-8]. This kind of AWF is capable of jointly addressing blur, noise, and undersampling and is well suited to dealing with a non-stationary signal and noise. The approach is also very well suited to dealing with non-uniformly sampled imagery and missing or bad pixels. This AWF SR method has been shown to provide best-in-class performance for nonuniform interpolation-based SR [5,9-11] and has also been used successfully for demosaicing [12,13] and Nyquist sampled video restoration [14]. Under certain conditions, the method can also be very computationally efficient [5]. The key to this method lies in the particular correlation model used and how it is employed for spatially adaptive filtering.

A different approach to image restoration, also relevant to the current paper, is based on fusing multiple similar patches within the observed image. This patch-based approach is used primarily for noise reduction applications and exploits spatial redundancy that may express itself within an image, locally and/or non-locally. The method of non-local means (NLM), introduced in [15], may be the first method to directly fuse non-local patches from within the observed image based on vector distances for the purpose of image denoising. A notable early precursor to the NLM is the vector detection method [16,17]. In the vector detection approach, a codebook of representative patches from training data is used, rather than patches from the observed image itself [16,17]. A number of NLM variations have been proposed, including [18-26]. The basic NLM method forms an estimate of a reference pixel as a weighted sum of non-local pixels. The weights are based on the vector distances of the patch intensities between various non-local patches and the reference patch. In particular, the center samples of the non-local patches are weighted in proportion to the negative exponential of the corresponding patch distance. The NLM algorithm can be viewed as an extension of the bilateral filter [27-31], which forms an estimate by weighting neighboring pixels based on both spatial proximity and intensity similarity of individual pixels (rather than patches).

Improved performance for noise reduction is obtained with the block matching and 3D filtering (BM3D) approach proposed in [32-34]. The BM3D method also uses vector distances between patches, but the filtering is performed using a transform-domain shrinkage operation. By utilizing all of the samples within selected patches and aggregating the results, excellent noise reduction performance can be achieved with BM3D. Another related patch-based image denoising algorithm is the total least squares method presented in [35]. In this method, each ideal patch is modeled as a linear combination of similar patches from the observed image. Another type of patch-based Wiener denoising filter is proposed in [36], and a globalized approach to patch-based denoising is proposed in [37]. While such patch-based methods perform well in noise reduction, most are not capable of addressing blur and noise jointly. However, there are a few recent methods that do treat both blur and noise and incorporate multi-patch fusion. These include BM3D deblurring (BM3DDEB) [38] and iterative decoupled deblurring-BM3D (IDD-BM3D) [39]. The deblurring in these algorithms is not achieved by patch fusion alone. Rather, the patch fusion component of these algorithms serves as a type of signal model used for regularization. Note that multi-patch methods have also been developed and applied to SR [40-45]. However, the focus of this paper is on image restoration without undersampling/aliasing.

### 1.2 Proposed method and novel contribution

In this paper, we propose a novel multi-patch AWF algorithm for image restoration. In the spirit of [32], we refer to this new filter structure as a collaborative adaptive Wiener filter (CAWF). It can be viewed as an extension of the AWF in [5], with the powerful new feature of incorporating multiple patches for each pixel estimate. As with other patch-based algorithms, we employ a moving window. At each position, the current observation window represents the reference patch. Within a given search window about the reference, we identify the most similar patches to the reference patch. However, instead of simply weighting just the center pixels of these similar patches, as with NLM, or using transform-domain shrinkage like BM3D, we use a spatial-domain weighted sum of all of the pixels within all of the selected patches to estimate the one center pixel in the reference patch. The weights used are based on a novel spatially varying spatial-domain multi-patch correlation model. The correlation model takes into account each pixel’s spatial distance to the center of its corresponding patch, as well as the intensity vector distances among the similar patches. The ‘collaborative’ nature of the CAWF springs from the fusion of multiple, potentially non-local, patches. One key advantage of the CAWF approach is that it can jointly handle blur and noise. Furthermore, the CAWF method is able to accommodate spatially varying signal and noise statistics.

Our approach is novel in that we use a single-pass spatial-domain weighted sum of all pixels within all of the similar patches to form the estimate of each desired pixel. In the case of NLM, only the center pixel of each similar patch is given a weight [15]. This is simple and effective for denoising, but deconvolution is not possible within the basic NLM framework, and all of the available information in the patches may not be exploited. While BM3D does fuse all of the pixels in the similar patches, the fusion in BM3D is based on a wavelet shrinkage operation and not a spatial-domain correlation model [32]. Because of the nature of wavelet shrinkage, the standard BM3D is also unable to perform deblurring [32]. On the other hand, the CAWF structure can jointly address blur and noise and does not employ separate transform-domain inverse filtering as in [38] or iterative processing like that in [39]. To the best of our knowledge, this is the first multi-patch algorithm to use a single-pass weighted sum of all pixels within multiple similar patches to jointly address blur and noise. It is also the first to use a spatial-domain multi-patch correlation model.

The remainder of this paper is organized as follows. The observation model is described in Section 2. The CAWF algorithm is presented in Section 3. This includes the basic algorithm description as well as the new spatial-domain multi-patch correlation model. Computational complexity and implementation are also discussed in Section 3. Experimental results for simulated and real data are presented and discussed in Section 4. Finally, conclusions are offered in Section 5.

## 2 Observation model

The proposed CAWF algorithm is based on the observation model shown in Figure 1. The model input is a desired 2D discrete grayscale image, denoted \(d(n_{1},n_{2})\), where \(n_{1},n_{2}\) are the spatial pixel indices. Using lexicographical notation, we shall represent all of the pixels in this desired image with a single column vector \(\mathbf{d}=[d_{1},d_{2},\ldots,d_{N}]^{T}\), where *N* is the total number of pixels in the image. Next in the observation model, we assume the desired image is convolved with a specified point spread function (PSF), yielding

\[ f(n_{1},n_{2}) = d(n_{1},n_{2}) * h(n_{1},n_{2}), \tag{1} \]

where \(h(n_{1},n_{2})\) is the PSF, and ∗ is 2D spatial convolution. In matrix form, this can be expressed as

\[ \mathbf{f} = \mathbf{H}\mathbf{d}, \tag{2} \]

where **H** is an *N*×*N* matrix containing values of the PSF, and the vector **f** is the image \(f(n_{1},n_{2})\) in lexicographical form. The PSF can be designed to model different types of blurring, such as diffraction from optics, spatial detection integration, atmospheric effects, and motion blurring. In the experimental results presented in this paper, we use a simple Gaussian PSF.

With regard to noise, our model assumes zero-mean additive Gaussian noise with a standard deviation of \(\sigma_{\eta}\). Thus, the observed image is given by

\[ g(n_{1},n_{2}) = f(n_{1},n_{2}) + \eta(n_{1},n_{2}), \tag{3} \]

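where \(\eta(n_{1},n_{2})\) is a Gaussian noise array. In lexicographic form, this is given by

\[ \mathbf{g} = \mathbf{f} + \boldsymbol{\eta} = \mathbf{H}\mathbf{d} + \boldsymbol{\eta}, \tag{4} \]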

where \(\mathbf{g}=[g_{1},g_{2},\ldots,g_{N}]^{T}\) and \(\boldsymbol{\eta}=[\eta_{1},\eta_{2},\ldots,\eta_{N}]^{T}\) are the observed image and noise vectors, respectively. The random noise vector is Gaussian such that \(\boldsymbol {\eta } \sim \mathcal {N} \left ({\mathbf {0},{\sigma }^{2}_{\eta } \mathbf {I}} \right)\).
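
The observation model can be simulated with a few lines of NumPy. The following is a minimal sketch, not the implementation used in this paper: the helper names `gaussian_psf` and `degrade` are hypothetical, the random test image stands in for a real one, and circular (FFT-based) boundary handling is an assumption made for brevity.

```python
import numpy as np

def gaussian_psf(sigma, radius):
    """Normalized 2D Gaussian PSF on a (2*radius+1) x (2*radius+1) support."""
    ax = np.arange(-radius, radius + 1)
    g1 = np.exp(-ax**2 / (2.0 * sigma**2))
    h = np.outer(g1, g1)
    return h / h.sum()

def degrade(d, h, sigma_eta, rng):
    """Return (g, f) with f = d * h (circular convolution) and g = f + eta."""
    # Zero-pad the PSF to the image size and move its center to index (0, 0)
    pad = np.zeros(d.shape, dtype=float)
    pad[:h.shape[0], :h.shape[1]] = h
    pad = np.roll(pad, (-(h.shape[0] // 2), -(h.shape[1] // 2)), axis=(0, 1))
    f = np.real(np.fft.ifft2(np.fft.fft2(d) * np.fft.fft2(pad)))
    eta = rng.normal(0.0, sigma_eta, size=d.shape)  # i.i.d. zero-mean Gaussian noise
    return f + eta, f

rng = np.random.default_rng(0)
d = rng.uniform(0, 255, size=(64, 64))   # stand-in for a desired image
h = gaussian_psf(sigma=1.5, radius=4)
g, f = degrade(d, h, sigma_eta=20.0, rng=rng)
```

Because the PSF is normalized to sum to one, the blurred image `f` keeps the mean of `d`, and `g - f` is the noise realization alone.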

## 3 Collaborative adaptive Wiener filter

### 3.1 CAWF overview

The CAWF employs a moving window approach with a moving reference patch and corresponding moving search window, each centered about pixel *i*, where *i*=1,2,…,*N*. The reference patch spans \(K_{1}\times K_{2}=K\) pixels symmetrically about pixel *i*. All of the pixels that lie within the span of this reference patch are placed into the reference patch vector defined as \(\mathbf{g}_{i}=[g_{i,1},g_{i,2},\ldots,g_{i,K}]^{T}\). The search window is of size \(L_{1}\times L_{2}=L\) pixels. Let the set \(S_{i}=[S_{i}(1),S_{i}(2),\ldots,S_{i}(L)]^{T}\) contain the indices of the pixels within the search window.

The next step in the CAWF algorithm is identifying the *M* patches from the search window that are most similar to the reference. This is done using a simple squared Euclidean distance. That is, we compute \({\left \| {{\mathbf {g}_{i}} - {\mathbf {g}_{j}}} \right \|}^{2}_{2}\), for \(j\in S_{i}\). We select the *M* patches corresponding to the *M* smallest distances and designate these as ‘similar patches’. All of the pixels from the similar patches shall be concatenated into a single \(KM\times 1\) column vector, \({\tilde {\mathbf {g}}_{i}} = {\left [{\mathbf {g}_{{s_{i,1}}}^{T},\mathbf {g}_{{s_{i,2}}}^{T},\ldots,\mathbf {g}_{{s_{i,M}}}^{T}} \right ]^{T}}\), where \(\mathbf{s}_{i}=[s_{i,1},s_{i,2},\ldots,s_{i,M}]^{T}\) contains the indices of the similar patches in order from smallest corresponding distance to largest. The minimum distance will always be zero and will correspond to the reference patch itself, such that \(s_{i,1}=i\). This selection of similar patches is common to many patch-based algorithms, such as those in [32-34]. Examples of similar patch selection are illustrated in Figure 2. The red square represents the reference patch, and the green squares represent selected similar patches. Note that there will generally be variability with regard to how similar the selected patches are to the reference and to each other. Some reference patches will have numerous low-distance similar patches, and others will not. It is this variability that we wish to capture and account for with our multi-patch correlation model in Section 3.2.
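
The patch selection step can be sketched as a brute-force search. This is an illustrative sketch under stated assumptions rather than the authors' code: the function name `find_similar_patches` is hypothetical, patches and search windows are taken to be square with odd side lengths, and candidates whose patch would cross the image border are simply skipped.

```python
import numpy as np

def find_similar_patches(g, center, patch_size, search_size, M):
    """Return the centers and squared distances of the M patches closest to the reference."""
    kr, sr = patch_size // 2, search_size // 2
    r0, c0 = center
    ref = g[r0 - kr:r0 + kr + 1, c0 - kr:c0 + kr + 1].ravel()
    candidates = []
    for r in range(r0 - sr, r0 + sr + 1):
        for c in range(c0 - sr, c0 + sr + 1):
            # Skip candidates whose patch would extend past the image border
            if r - kr < 0 or c - kr < 0 or r + kr >= g.shape[0] or c + kr >= g.shape[1]:
                continue
            patch = g[r - kr:r + kr + 1, c - kr:c + kr + 1].ravel()
            candidates.append((float(np.sum((ref - patch) ** 2)), (r, c)))
    # Sort by distance; on ties, the reference patch itself is placed first
    candidates.sort(key=lambda t: (t[0], t[1] != (r0, c0)))
    best = candidates[:M]
    centers = [rc for _, rc in best]
    dists = np.array([dist for dist, _ in best])
    return centers, dists

rng = np.random.default_rng(1)
g = rng.normal(size=(40, 40))
centers, dists = find_similar_patches(g, center=(20, 20), patch_size=5, search_size=11, M=4)
```

The first returned center is the reference itself with zero distance, matching \(s_{i,1}=i\) in the text.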

With the collection of similar patches defined, we can now express the CAWF output as a weighted sum of the values in \(\tilde {\mathbf {g}}_{i}\). In particular, the CAWF estimate for desired pixel *i* is given by

\[ \hat{d}_{i} = \mathbf{w}_{i}^{T}\tilde{\mathbf{g}}_{i}, \tag{5} \]

where \(\mathbf{w}_{i}=[w_{1},w_{2},\ldots,w_{KM}]^{T}\) is a vector of weights. Note that this approach is a one-pass weighted-sum operation that incorporates all of the pixels in \(\tilde {\mathbf {g}}_{i}\) for the estimate of \(d_{i}\). To minimize the MSE, the Wiener filter weights [5] may be used such that

\[ \mathbf{w}_{i} = \tilde{\mathbf{R}}_{i}^{-1}\tilde{\mathbf{p}}_{i}, \tag{6} \]

where \(\tilde {\mathbf {R}}_{i} = E \left \{ \tilde {\mathbf {g}}_{i} \tilde {\mathbf {g}}_{i}^{T} \right \}\) is a \(KM\times KM\) autocorrelation matrix for the multi-patch observation vector \(\tilde {\mathbf {g}}_{i}\), and \(\tilde {\mathbf {p}}_{i} = E\left \{\tilde {\mathbf {g}}_{i} d_{i} \right \}\) is a \(KM\times 1\) cross-correlation vector between the desired pixel \(d_{i}\) and \(\tilde {\mathbf {g}}_{i}\). The statistics used to fill \(\tilde {\mathbf {R}}_{i}\) and \(\tilde {\mathbf {p}}_{i}\) are found using the new multi-patch correlation model described in Section 3.2.
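
As a small numerical illustration of the Wiener solution, the weights can be obtained by solving a linear system rather than forming an explicit inverse. The toy statistics below (several noisy copies of one zero-mean, unit-variance sample) and the function name `wiener_weights` are assumptions chosen so the answer can be checked by hand; they are not the correlation model of this paper.

```python
import numpy as np

def wiener_weights(R_tilde, p_tilde):
    """Solve R_tilde w = p_tilde for the MSE-optimal weights."""
    return np.linalg.solve(R_tilde, p_tilde)

# Toy case: n observations g_k = d + eta_k with E{d^2} = 1 and noise variance sigma_eta^2
n, sigma_eta = 4, 0.5
R_tilde = np.ones((n, n)) + sigma_eta**2 * np.eye(n)   # E{g g^T}
p_tilde = np.ones(n)                                   # E{g d}
w = wiener_weights(R_tilde, p_tilde)
```

For this toy case every weight equals \(1/(n+\sigma_{\eta}^{2})\), so the weights sum to slightly less than one, which is the expected shrinkage toward the zero mean.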

### 3.2 Spatial-domain multi-patch correlation model

The multi-patch correlation model provides the values for \(\tilde {\mathbf {R}}_{i}\) and \(\tilde {\mathbf {p}}_{i}\), so that we may generate the weights in Equation 6. The model attempts to capture the spatial relationship among the pixels within a given patch, which is essential for deconvolution. Furthermore, it also seeks to incorporate knowledge of redundancy among the similar patches. Finally, the model captures the local desired signal variance, as well as the noise variance of each observed pixel.

To begin, first let \(\tilde {\mathbf {f}}_{i}\) be the noise-free version of \(\tilde {\mathbf {g}}_{i}\). In other words, we have \(\tilde {\mathbf {g}}_{i}=\tilde {\mathbf {f}}_{i}+\tilde {\boldsymbol {\eta }}_{i}\), where \(\tilde {\boldsymbol {\eta }}_{i}\) is the random noise vector associated with the samples within multi-patch observation vector *i*. We shall assume the noise is zero mean and uncorrelated with the signal. Furthermore, we will assume the noise samples in \(\tilde {\boldsymbol {\eta }}_{i}\) are independent and identically distributed (i.i.d.) with a noise variance of \(\sigma _{\eta }^{2}\). In this case, it is straightforward to show that

\[ \tilde{\mathbf{R}}_{i} = E\left\{\tilde{\mathbf{g}}_{i}\tilde{\mathbf{g}}_{i}^{T}\right\} = E\left\{\tilde{\mathbf{f}}_{i}\tilde{\mathbf{f}}_{i}^{T}\right\} + \sigma_{\eta}^{2}\mathbf{I} \tag{7} \]

and

\[ \tilde{\mathbf{p}}_{i} = E\left\{\tilde{\mathbf{g}}_{i}\,d_{i}\right\} = E\left\{\tilde{\mathbf{f}}_{i}\,d_{i}\right\}. \tag{8} \]

Now, the problem reduces to modeling \(E\left \{\tilde {\mathbf {{f}}}_{i}\,d_{i}\right \} \) and \(E\left \{\tilde {\mathbf {{f}}}_{i}\,\tilde {\mathbf {{f}}}_{i}^{T}\right \}\).

In our new correlation model, the multi-patch statistics will be expressed in terms of statistics for a single patch and a distance matrix for all of the similar patches. In particular, we use the model

where ⊗ is a Kronecker product, and **R** is a *K*×*K* autocorrelation matrix of a single noise-free patch obtained from a variance-normalized desired image. We will say more about this shortly. The matrix

is an *M*×*M* distance matrix among the *M* similar patches. In our notation, the exponential term in Equation 9 is a matrix whose elements are made of the exponential values of the corresponding distance matrix elements. The variable *α* is a tuning parameter that controls the correlation decay as a function of the distances between similar patches, and \(\sigma_{\eta}\) is the noise standard deviation. The parameter \(\hat \sigma _{{d_{i}}}^{2}\) is the estimated desired signal variance associated with the reference patch. By substituting Equation 9 into Equation 7, we get the autocorrelation matrix for \(\tilde {\mathbf {g}}_{i}\) as

In a similar manner, we model the cross-correlation vector as

where **p** is the *K*×1 cross-correlation vector for a single normalized patch, and \([\mathbf{D}_{i}]_{1}\) is the first column of the distance matrix \(\mathbf{D}_{i}\). The correlation models in Equations 11 and 12 capture the spatial correlations among pixels within each patch using **R** and **p**. The patch similarities, captured in the distance matrix, are used to ‘modulate’ these correlations with the Kronecker product to provide the full multi-patch correlation model. In this manner, pixels belonging to patches with smaller inter-patch distances will be modeled with higher correlations among them. Potential changes in the underlying desired image variance are captured in the model with the term \( \hat \sigma _{{d_{i}}}^{2} \). In addition, a spatially varying noise variance can easily be incorporated if appropriate.
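
The Kronecker structure described above can be sketched as follows. This is a hedged illustration only: it assumes the element-wise exponential takes the form \(\exp(-\mathbf{D}_{i}/\alpha)\), which is one plausible reading of the description and should be checked against Equations 9 to 12; the function name `multipatch_statistics` and the toy values of **R**, **p**, and \(\mathbf{D}_{i}\) are hypothetical.

```python
import numpy as np

def multipatch_statistics(R, p, D, alpha, var_d, var_eta):
    """Build the KM x KM autocorrelation matrix and KM x 1 cross-correlation vector."""
    E = np.exp(-D / alpha)                      # ASSUMED decay form, element-wise on the M x M distances
    n = R.shape[0] * D.shape[0]
    R_tilde = var_d * np.kron(E, R) + var_eta * np.eye(n)   # block (m, n) is E[m, n] * R, plus noise
    p_tilde = var_d * np.kron(E[:, 0], p)       # first column ties each patch to the reference
    return R_tilde, p_tilde

K, M = 3, 2
idx = np.arange(K)
R = 0.7 ** np.abs(idx[:, None] - idx[None, :])  # toy single-patch autocorrelation
p = R[:, K // 2]                                # toy cross-correlation (no blur)
D = np.array([[0.0, 1.0], [1.0, 0.0]])          # toy inter-patch distances
R_tilde, p_tilde = multipatch_statistics(R, p, D, alpha=2.0, var_d=100.0, var_eta=25.0)
```

With this layout, the diagonal blocks of `R_tilde` describe correlations within one patch, and the off-diagonal blocks shrink toward zero as the inter-patch distance grows.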

The specific distance metric used to populate the distance matrix \(\mathbf{D}_{i}\) here is a scaled and shifted \(l^{2}\)-norm. This type of metric has been used successfully to quantify similarity between image patches corrupted with additive Gaussian noise [15]. In particular, the distance between patches centered about pixels *i* and *j* is given by

where \(\|\mathbf{g}_{i}-\mathbf{g}_{j}\|_{2}\) is the \(l^{2}\)-norm distance, *K* is the total number of pixels in each patch, and \(\sigma_{\eta}\) is the noise standard deviation. The scaling by \(\sigma _{\eta }\sqrt {2K}\) normalizes the distance with respect to *K* and \(\sigma_{\eta}\), and \(D_{0}\) can be used as a tuning parameter in the correlation model to adjust for distance due to noise. To see how the scaling works, and understand the potential role of \(D_{0}\), consider the distance with \(D_{0}=0\) defined as

\[ \bar{D}_{i,j} = \frac{\left\|\mathbf{g}_{i}-\mathbf{g}_{j}\right\|_{2}}{\sigma_{\eta}\sqrt{2K}}. \tag{14} \]

It can be shown that for identical patches with distance due only to i.i.d. Gaussian noise, the probability density function (pdf) for \({{\bar D}_{i,j}} \) is that of a scaled Chi random variable and is given by

This pdf is plotted in Figure 3 for four values of *K*. Note that with our scaling, the pdf is not a function of \(\sigma_{\eta}\) and the mean is close to 1 for all *K*. Also, note that \(D_{0}\) can be used to shift the pdf.
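
The effect of the scaling can be checked with a short Monte Carlo experiment. The sketch below is an illustration with hypothetical names (`simulate_mean_distance`) and an arbitrary noise-free patch; it draws pairs of noisy copies of the same patch and averages the normalized distance \(\|\mathbf{g}_{i}-\mathbf{g}_{j}\|_{2}/(\sigma_{\eta}\sqrt{2K})\).

```python
import numpy as np

def simulate_mean_distance(K, sigma_eta, trials, rng):
    """Average normalized distance between two noisy copies of one K-pixel patch."""
    patch = rng.uniform(0, 255, size=K)                      # shared noise-free patch
    gi = patch + rng.normal(0.0, sigma_eta, size=(trials, K))
    gj = patch + rng.normal(0.0, sigma_eta, size=(trials, K))
    dist = np.linalg.norm(gi - gj, axis=1) / (sigma_eta * np.sqrt(2.0 * K))
    return float(dist.mean())

rng = np.random.default_rng(0)
means = {(K, s): simulate_mean_distance(K, s, 20000, rng)
         for K in (9, 25, 81) for s in (5.0, 20.0)}
```

The averages stay close to (and slightly below) one for every patch size and do not depend on the noise level, which is what makes a single shift \(D_{0}\) meaningful across scenarios.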

Let us now turn our attention to **R**, **p**, and \(\hat \sigma _{{d_{i}}}^{2}\). Note that **R** and **p** correspond to a single patch derived from a desired image with zero mean and variance of one, denoted \(\bar d(n_{1},n_{2})\). After PSF blurring, the resulting image is denoted \(\bar f(n_{1},n_{2})\), following the model shown in Figure 1. Since these statistics are for only a single patch, they can be modeled in a fashion similar to that in [5]. To begin, consider the 2D wide sense stationary (WSS) autocorrelation function model for \(\bar d(n_{1},n_{2})\) given by

where *ρ* is the one pixel step correlation value. The cross correlation function between \(\bar d(n_{1},n_{2})\) and \(\bar f(n_{1},n_{2})\) can then be expressed as

The auto-correlation function for \(\bar f(n_{1},n_{2})\) can also be expressed in terms of the desired autocorrelation function as

Figure 4 shows an example of \(r_{\bar d \bar f }(n_{1},n_{2})\) and \(r_{\bar f \bar f }(n_{1},n_{2})\) for Gaussian blur PSF with standard deviation of 1.5 pixels, and *ρ*=0.7. As expected, the correlation drops with distance, as controlled by *ρ*.
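
The correlation functions for a given PSF can be tabulated numerically. The sketch below rests on labeled assumptions: the desired autocorrelation is taken to be isotropic, \(\rho^{\sqrt{n_{1}^{2}+n_{2}^{2}}}\), and the cross- and autocorrelation of the blurred image are obtained with the standard linear-systems relations (convolution with \(h(n_{1},n_{2})\), then with \(h(-n_{1},-n_{2})\)). The helper names are hypothetical and the PSF is assumed square with odd size.

```python
import numpy as np

def gaussian_psf(sigma, radius):
    ax = np.arange(-radius, radius + 1)
    g1 = np.exp(-ax**2 / (2.0 * sigma**2))
    h = np.outer(g1, g1)
    return h / h.sum()

def conv2_same(a, b):
    """Linear 2D convolution of a with b (odd-sized), cropped to the size of a."""
    s = (a.shape[0] + b.shape[0] - 1, a.shape[1] + b.shape[1] - 1)
    full = np.real(np.fft.ifft2(np.fft.fft2(a, s=s) * np.fft.fft2(b, s=s)))
    r0, r1 = b.shape[0] // 2, b.shape[1] // 2
    return full[r0:r0 + a.shape[0], r1:r1 + a.shape[1]]

def correlation_functions(rho, h, radius):
    """Return r_dd, r_df, r_ff for lags -radius..radius in each direction."""
    hr = h.shape[0] // 2
    big = radius + 2 * hr                      # margin so the cropped lags are free of edge effects
    ax = np.arange(-big, big + 1)
    n1, n2 = np.meshgrid(ax, ax, indexing="ij")
    r_dd = rho ** np.sqrt(n1**2 + n2**2)       # ASSUMED isotropic model, unit variance
    r_df = conv2_same(r_dd, h)
    r_ff = conv2_same(r_df, h[::-1, ::-1])
    crop = slice(2 * hr, 2 * hr + 2 * radius + 1)
    return r_dd[crop, crop], r_df[crop, crop], r_ff[crop, crop]

h = gaussian_psf(sigma=1.5, radius=4)
r_dd, r_df, r_ff = correlation_functions(rho=0.7, h=h, radius=6)
```

Both blurred correlation functions peak at zero lag, and the zero-lag value of `r_ff` is below one because blurring reduces the signal variance.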

Now consider a \(K_{1}\times K_{2}=K\) patch from \(\bar f(n_{1},n_{2})\), denoted \(\bar {\mathbf {f}}=[{\bar f}_{1}, {\bar f}_{2},\ldots,{\bar f}_{K}]^{T}\), and the corresponding desired pixel value, \({\bar d}\). The *K*×*K* autocorrelation matrix for \(\bar {\mathbf {f}}\) is \(\mathbf {R} = E\left \{ \bar {\mathbf {f}}\bar {\mathbf {f}}^{T}\right \}\), and the *K*×1 cross-correlation vector is \(\mathbf {p} = E\left \{\bar {\mathbf {f}}\,\bar {d} \right \}\). Expressing all of the terms in the autocorrelation matrix, we obtain

Assuming a 2D WSS model, as in [5], this matrix can be populated from Equation 18 as follows

where \(\boldsymbol{\Delta}(m,n)=[\Delta_{x}(m,n),\Delta_{y}(m,n)]\), and \(\Delta_{x}(m,n)\) and \(\Delta_{y}(m,n)\) are the *x* and *y* distances between \({\bar f}_{m}\) and \({\bar f}_{n}\) in \({\bar f}(n_{1},n_{2})\) in units of pixels. In a similar fashion, we can populate **p** using Equation 17 as follows

where \({\bar f}_{\frac {K+1}{2}}\) corresponds to the spatial position of \({\bar d}\).
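
Populating **R** and **p** amounts to looking up a correlation function at the pixel-to-pixel offsets within a patch. The following sketch assumes the correlation functions are supplied as centered 2D arrays (zero lag in the middle) and that pixels are ordered column by column; the function name `populate_R_p` and the toy lookup table are hypothetical. The sign convention of the offsets is immaterial for a symmetric PSF.

```python
import numpy as np

def populate_R_p(r_ff, r_df, K1, K2):
    """Fill R (K x K) and p (K,) from centered correlation arrays."""
    c0, c1 = r_ff.shape[0] // 2, r_ff.shape[1] // 2
    # Column-ordered coordinates: pixel index m = col * K1 + row
    cols, rows = np.meshgrid(np.arange(K2), np.arange(K1), indexing="ij")
    y, x = rows.ravel(), cols.ravel()
    dy = y[:, None] - y[None, :]                # vertical offsets between all pixel pairs
    dx = x[:, None] - x[None, :]                # horizontal offsets between all pixel pairs
    R = r_ff[c0 + dy, c1 + dx]
    center = (K1 * K2 - 1) // 2                 # index (K+1)/2 in 1-based notation
    p = r_df[c0 + (y - y[center]), c1 + (x - x[center])]
    return R, p

ax = np.arange(-4, 5)
n1, n2 = np.meshgrid(ax, ax, indexing="ij")
r_toy = 0.7 ** np.sqrt(n1**2 + n2**2)           # toy table used for both lookups
R, p = populate_R_p(r_toy, r_toy, K1=3, K2=3)
```

The lookup tables must cover offsets up to \(K_{1}-1\) and \(K_{2}-1\) pixels, which is why the toy table spans lags from −4 to 4 for a 3×3 patch.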

The final term needed for the correlation model is \(\hat {\sigma }_{d_{i}}^{2}\), which corresponds to the underlying desired signal variance associated with the reference patch. Since **R** and **p** are based on a desired signal with unit variance, we scale these by an estimate of the desired signal variance for each reference patch in the observed image to obtain the appropriate values. To obtain this estimate, we first compute the sample variance of the pixels in \(\tilde {\mathbf {g}}_{i}\), which we denote as \({\hat {\sigma }}_{g_{i}}^{2}\). We then subtract the noise variance to give an estimate of the noise-free observed signal variance as

\[ \hat{\sigma}_{f_{i}}^{2} = \hat{\sigma}_{g_{i}}^{2} - \sigma_{\eta}^{2}. \tag{22} \]

Note that in practice, we do not allow the value in Equation 22 to go below a specified minimum value. Using Equation 17, it can be shown that the relationship between \(\sigma _{d_{i}}^{2}\) and \(\sigma _{f_{i}}^{2}\) is given by [5]

where

and \(\tilde h({n_{1}},{n_{2}}) = h({n_{1}},{n_{2}}) * h(- {n_{1}}, - {n_{2}})\). Thus, our desired signal variance estimate is \( {\hat \sigma }_{d_{i}}^{2} = {\hat \sigma }_{f_{i}}^{2} / C(\rho)\).
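
The variance estimate can be sketched in a few lines. The code below is illustrative only: it assumes the isotropic model \(\rho^{\sqrt{n_{1}^{2}+n_{2}^{2}}}\) for the desired autocorrelation when forming \(C(\rho)\) as the sum of that model weighted by \(\tilde h(n_{1},n_{2})\), uses the population variance (`np.var`) as the sample variance, and introduces a hypothetical floor parameter `min_var` for the minimum allowed value.

```python
import numpy as np

def psf_autocorrelation(h):
    """Return h(n1, n2) * h(-n1, -n2) with zero lag at the array center."""
    s = (2 * h.shape[0] - 1, 2 * h.shape[1] - 1)
    H = np.fft.fft2(h, s=s)
    return np.fft.fftshift(np.real(np.fft.ifft2(H * np.conj(H))))

def variance_ratio(rho, h):
    """C(rho): blurred-signal variance divided by desired-signal variance (ASSUMED model)."""
    h_tilde = psf_autocorrelation(h)
    r0, r1 = h_tilde.shape[0] // 2, h_tilde.shape[1] // 2
    n1, n2 = np.meshgrid(np.arange(-r0, r0 + 1), np.arange(-r1, r1 + 1), indexing="ij")
    return float(np.sum(rho ** np.sqrt(n1**2 + n2**2) * h_tilde))

def desired_variance(g_tilde, var_eta, rho, h, min_var=1.0):
    """Estimate the desired signal variance from the multi-patch observation vector."""
    var_f = max(np.var(g_tilde) - var_eta, min_var)   # subtract noise, then apply the floor
    return var_f / variance_ratio(rho, h)

h3 = np.array([[1.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 2.0, 1.0]]) / 16.0
var_d = desired_variance(np.array([0.0, 10.0]), var_eta=5.0, rho=0.7, h=h3)
```

Because blur lowers the observed variance, \(C(\rho)\) is below one for any normalized non-trivial PSF, so dividing by it raises the estimate back toward the variance of the unblurred signal.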

By substituting Equations 11 and 12 into Equation 6, and dividing through by \( {\hat \sigma }_{d_{i}}^{2}\), the CAWF weight vector can be computed as

Note that the DC response of the CAWF filter is not guaranteed to be one (i.e., the weights may not sum to 1). To prevent artifacts when processing an image that is not zero mean, we normalize the weights to sum to one by dividing the weight vector by the sum of the weights for each *i* before computing the weighted sum in Equation 5. From Equation 25, it is clear that CAWF weights adapt spatially based on the local signal variance, the variance of the noise, and the distance matrix among the similar patches. There are two tuning parameters in the correlation model, *ρ*, which controls the correlation between samples within a given patch expressed in **R** and **p**, and *α* which controls the correlation between patches. We have found that the algorithm is not highly sensitive to these tuning parameters, and good performance can be obtained for a wide range of images using a specified fixed value for these. Note that in addition to providing an estimate of the desired image, an estimate of the MSE itself can be readily generated based on the correlation model. This estimated MSE is given by [9]

A block diagram showing all of the key steps in the CAWF filter is shown in Figure 5.
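
The weight normalization step mentioned above is easy to illustrate in isolation. In this sketch the function name `normalized_estimate` and the toy weights are hypothetical; the point is only that weights summing to less than one darken a flat region, while normalized weights preserve it.

```python
import numpy as np

def normalized_estimate(w, g_tilde):
    """Scale the weights to sum to one, then form the weighted-sum estimate."""
    w_n = w / np.sum(w)
    return float(w_n @ g_tilde), w_n

w = np.full(5, 0.16)                 # toy weights with a DC response of 0.8
g_tilde = np.full(5, 100.0)          # flat (constant) observation vector
raw = float(w @ g_tilde)             # un-normalized output is biased toward zero
est, w_n = normalized_estimate(w, g_tilde)
```

Here the un-normalized output is 80 for a constant input of 100, whereas the normalized filter returns the input level unchanged.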

To better understand the workings of the collaborative correlation model in determining the filter weights, consider Examples 1 and 2 shown in Figures 6 and 7, respectively. These examples are for the case of blur with a Gaussian PSF with a 1 pixel standard deviation and Gaussian noise of standard deviation 20. The patch size is *K*=9×9=81, and there are *M*=4 patches. Example 1 in Figure 6 shows patches for an edge region where the multiple patches have a very similar underlying structure, and hence small inter-patch distances. In contrast, Example 2 in Figure 7 shows the case of dissimilar patches with large inter-patch distances. In Figures 6 and 7, (a) shows the spatial domain patches, and (b) shows the CAWF weights corresponding to these patches. Note that in Example 1, pixels in all of the patches get significant weight as shown in Figure 6b. In contrast, for the dissimilar patches in Example 2, only the reference patch pixels get significant weight, as shown in Figure 7b. Figure 8 provides additional insight by comparing the filter weights, summed over patches, for Examples 1 and 2 side by side. When many similar patches are available, more noise reduction is possible. In turn, this allows the deconvolution to be more aggressive. This can be seen in the more aggressive deconvolution weights in Figure 8a, compared with those in Figure 8b. On the other hand, if no low-distance patches can be found for the reference, the CAWF algorithm essentially reverts to a single-patch AWF filter, giving non-zero weights only to the reference patch. Because less noise reduction can be achieved this way, the filter automatically becomes less aggressive in its deconvolution, so as to not exaggerate noise.

These examples show how the distance matrix plays an interesting role in the joint deblurring/denoising aspect of the CAWF. This type of spatially adaptive deconvolution is unlike anything we have seen before in other multi-patch restoration algorithms. A similar process occurs with noise only. In that case, the CAWF balances the amount of spatial low-pass filtering employed. When good patch matches are found, the CAWF relies more on patch fusion for noise reduction. When no good matches are found, it resorts to more spatial smoothing. All of this is also balanced by the estimate of the local desired signal variance, \(\hat {\sigma }^{2}_{d_{i}}\). Generally, more smoothing is done in low signal variance areas. One last point of interest with regard to these examples relates to the structure of the multi-patch autocorrelation matrix \(\tilde {\mathbf {R}}_{i}\). These matrices are shown in Figure 9 for Examples 1 and 2. Note that based on Equation 11, these are both block matrices made up of a 4×4 grid of scaled 81×81 **R** submatrices. The **R** matrix itself has an apparent 9×9 substructure, due to the column-ordered lexicographical representation of each patch. Note that the off-diagonal blocks of \(\tilde {\mathbf {R}}_{i}\) for Example 2 in Figure 9b are essentially zero. In most cases, the inter-block correlations will lie between Examples 1 and 2.

### 3.3 Multi-pixel estimation and aggregation

In the CAWF algorithm described in Section 3.1, one pixel is estimated for each reference patch. However, in a manner similar to that in [5], it is possible to estimate multiple desired pixels from each multi-patch observation vector \(\tilde {\mathbf {g}}_{i}\). In fact, all of the desired pixels corresponding to \(\tilde {\mathbf {g}}_{i}\) can be estimated. Let this full \(KM\times 1\) vector of desired pixels be denoted \(\tilde {\mathbf {d}}_{i}\). If all multi-patch observation vectors are used in this fashion, many estimates of each desired pixel are obtained. These can be aggregated by a simple average. In the case of noise only, we have observed that aggregation yields improved results. For joint deblurring and denoising with any significant amount of blur, the aggregation does not appear to provide any advantage. However, this multi-pixel estimation approach can be used to reduce the computational complexity, since not every multi-patch observation vector must be processed in order to form a complete image estimate.
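
Aggregation by simple averaging can be sketched with two accumulators, one for the sum of estimates and one for the number of estimates at each pixel. The function name `aggregate` and the input format (a list of patch centers paired with square patch estimates) are assumptions made for illustration.

```python
import numpy as np

def aggregate(shape, patch_estimates, patch_size):
    """Average overlapping patch estimates; pixels with no estimate are returned as NaN."""
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    k = patch_size // 2
    for (r, c), est in patch_estimates:
        acc[r - k:r + k + 1, c - k:c + k + 1] += est   # sum of estimates per pixel
        cnt[r - k:r + k + 1, c - k:c + k + 1] += 1     # number of estimates per pixel
    out = np.full(shape, np.nan)
    np.divide(acc, cnt, out=out, where=cnt > 0)
    return out

est1 = np.ones((3, 3))
est3 = 3.0 * np.ones((3, 3))
out = aggregate((4, 5), [((1, 1), est1), ((1, 2), est3)], patch_size=3)
```

In the overlap of the two toy patches the result is the average of the two estimates, while pixels covered by a single patch keep that patch's value.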

To perform the multi-pixel estimation, the CAWF filter output is expressed as

where \(\hat {\tilde {\mathbf {d}}}_{i} \) is the estimate of \(\tilde {\mathbf {d}}_{i}\), and \(\mathbf{W}_{i}\) is a \(KM\times KM\) matrix of weights. The weight matrix is given by

where

\( \mathbf {P} = E\left \{\bar {\mathbf {f}} \bar {\mathbf {d}}^{T} \right \}\) is a *K*×*K* normalized cross-correlation matrix, and \(\bar {\mathbf {d}}\) is the *K*×1 desired vector corresponding to \(\bar {\mathbf {f}}\).

### 3.4 Computational complexity and implementation

Here, we briefly address the computational complexity of the CAWF filter by tracking the number of floating point operations (flops), where a flop is defined as one multiply plus add operation. The first action of the CAWF filter is finding similar patches. This requires computing *L* distances of *K* dimensional vectors (note that *L* is the search window size, and *K* is the patch size in pixels). The next step is computing the distance matrix based on Equation 13 for the *M* selected patches. This requires computing \(M^{2}/2\) scaled and shifted distances for *K* dimensional vectors. The Kronecker products for \(\tilde {\mathbf {R}}_{i}\) and \(\tilde {\mathbf {p}}_{i}\) require \((KM)^{2}\) and *KM* multiplies, respectively. However, the main computational burden of the CAWF filter comes next with the computation of the weights in Equation 6. This can be done using Cholesky factorization, which requires \((KM)^{3}/3\) flops to perform LU decomposition for the \(KM\times KM\) autocorrelation matrix \(\tilde {\mathbf {R}}_{i}\). Computing the weights from the LU decomposition requires \(2(KM)^{2}\) flops using forward and backward substitution. The final weighted sum operation is accomplished with *KM* flops. Since the dominant term in the computational complexity is the Cholesky factorization, we might conclude that the complexity of the CAWF filter is \(O((KM)^{3})\). Thus, the complexity of the CAWF algorithm goes up significantly with larger window sizes, *K*, and more similar patches, *M*. However, an important thing to note is that the CAWF algorithm is completely parallel at the output pixel level. Unlike most variational image restoration methods, each output pixel can be computed independently and in parallel. Also, the CAWF approach requires only one pass over the data.
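
The per-pixel operation counts listed above can be tallied directly. The sketch below simply encodes the terms as stated in the text, assuming each *K*-dimensional distance costs *K* flops; the function name `cawf_flops_per_pixel` and the search window size *L*=121 used in the example are hypothetical.

```python
def cawf_flops_per_pixel(K, M, L):
    """Approximate flop counts for one output pixel, term by term."""
    n = K * M
    return {
        "patch_search": L * K,                 # L distances between K-dimensional vectors
        "distance_matrix": (M**2 / 2) * K,     # M^2/2 distances among the selected patches
        "kronecker": n**2 + n,                 # products for the matrix and the vector
        "factorization": n**3 / 3,             # dominant term
        "substitution": 2 * n**2,              # forward and backward substitution
        "weighted_sum": n,
    }

flops = cawf_flops_per_pixel(K=9, M=10, L=121)
dominant = max(flops, key=flops.get)
share = flops["factorization"] / sum(flops.values())
```

Even for a small 3×3 patch with ten similar patches, the factorization accounts for most of the work, which is consistent with the cubic complexity noted in the text.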

To put the CAWF computational complexity into context, consider that the AWF method employed here, with a spatially varying signal-to-noise ratio (SNR) estimate, may be viewed as a special case of the CAWF with *M*=1. Thus, increasing *M* for CAWF causes a corresponding increase in complexity according to \(O((KM)^{3})\). The NLM method shares the same distance computations and comparisons as CAWF. However, in contrast to CAWF, NLM only requires *L* flops per output in the weighted sum, since it only weights the center sample of each patch in the search window. Although significantly simpler computationally, NLM does not fully exploit all of the information in the patches and it cannot perform deconvolution. Also, AWF is not able to exploit multi-patch information.

For pure denoising applications, we have found that good results can be obtained with CAWF for *M*=10, and *K*=3×3=9 for light noise and *K*=5×5=25 for moderate to heavy noise. In the case of joint deblurring and denoising, a larger window size is needed for adequate deconvolution. We have found that *K*=9×9=81 is a reasonable choice for light to moderate blurring. Our implementation uses MATLAB with no parallel acceleration or mex files, and processing is done on a PC with a 3.7 GHz Intel® Xeon® processor. CAWF processing time for a pure denoising application with a 512×512 image using *K*=9 and *M*=10 is 155 s. For context, the AWF processing takes 33 s, and NLM takes 3.2 s.

## 4 Experimental results

In this section, we demonstrate the efficacy of the proposed CAWF algorithm using images with a variety of simulated degradations and using real video frames. We also present a parameter sensitivity analysis. The filter parameters used for all of the experimental results are listed in Table 1. Note that for a given scenario, the same parameters are used for processing all of the test images.

### 4.1 Simulated data

In this section, we present quantitative results using simulated data. We consider two cases: noise only and blur with noise. For each case, we consider four specific scenarios and use six test images. Also, for each case, we compare against state-of-the-art methods for which MATLAB implementations are publicly available.

The test images are shown in Figure 10. These are 8-bit uncompressed images with a high level of detail. We use two quantitative performance metrics to evaluate the restorations. The first is the commonly used peak signal-to-noise ratio (PSNR), defined as

We also use the structural similarity (SSIM) index [46], which many argue is more consistent with subjective perception than PSNR. When reporting PSNR, we also include the improvement in PSNR (ISNR) for the reader’s convenience. This is given by
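A minimal sketch of these two metrics follows. It assumes the standard 8-bit definition, PSNR = 10 log10(255^2 / MSE), and takes ISNR as the PSNR of the estimate minus the PSNR of the degraded observation. The function and variable names are illustrative, and the test data are synthetic.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB for images with the given peak value."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def isnr(reference, observed, estimate, peak=255.0):
    """Improvement in PSNR of the estimate over the degraded observation."""
    return psnr(reference, estimate, peak) - psnr(reference, observed, peak)

# Synthetic check: halving the error standard deviation quarters the MSE,
# which corresponds to an improvement of roughly 6 dB.
rng = np.random.default_rng(0)
truth = rng.uniform(0, 255, size=(64, 64))
observed = truth + rng.normal(0, 20, size=truth.shape)
estimate = truth + rng.normal(0, 10, size=truth.shape)
gain_db = isnr(truth, observed, estimate)
```

Because the peak value cancels in the difference, ISNR depends only on the ratio of the two mean squared errors.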

#### 4.1.1 Additive Gaussian noise

In our first case, we consider additive Gaussian noise with no PSF blur (i.e., *h*(*n*_{1},*n*_{2})=*δ*(*n*_{1},*n*_{2})). We consider four different noise standard deviations. The denoising benchmark methods are NLM [15], Globalized NLM (GLIDE-NLM) [37], PLOW [36], BM3D [32], and the single patch AWF [5]. Note that the NLM implementation is from [37], and the AWF used is the same as CAWF with no aggregation and *M*=1.

The PSNR comparison is provided in Table 2, and the SSIM comparison is in Table 3. Note that CAWF provides the highest PSNR results in Table 2 in all but one instance, with BM3D generally providing the next highest PSNR values. Looking at Table 3, we see that CAWF still performs well, but BM3D does provide a higher SSIM in 5 of 24 scenarios. These results also show that CAWF consistently outperforms AWF. This demonstrates the advantage of using multiple patches within this framework. It is also interesting to note that AWF itself does quite well compared with some of the benchmark methods on these data, especially in the SSIM metric.

Selected regions of interest (ROIs) from images bridge and river for the noise-only case with *σ*_{η}=30 are shown in Figures 11 and 12, respectively. We find that BM3D tends to do a better job in smooth areas, and CAWF generally appears better in high-detail texture areas. Note that more branches on the small trees are visible in the CAWF estimate in Figure 11f, compared with that for BM3D in Figure 11e. Also, the texture in the tree foliage appears to be better preserved with CAWF processing in Figure 12f, compared with that for BM3D in Figure 12e.

The results in Figure 13 show how the CAWF method can produce an estimate of the MSE on a pixel-by-pixel basis. Figure 13a shows an ROI from the image aerial with noise of standard deviation 10. The CAWF estimate image is shown in Figure 13b. The estimated MSE, computed according to Equation 26, is shown in Figure 13c. The average squared error over 100 noise realizations is shown in Figure 13d. Aside from some of the small high-frequency structures, the estimated MSE appears similar to the average squared error. The ability to provide an estimate of the MSE is another distinctive feature of the CAWF method among other multi-patch methods.
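The empirical side of this comparison, the average squared error over repeated noise realizations, can be sketched as a simple Monte Carlo loop. The estimated MSE of Equation 26 is not reproduced here. The argument `restore` is a hypothetical stand-in for any restoration function, and the identity mapping used in the example is only a sanity check, for which the average squared error should approach the noise variance.

```python
import numpy as np

def average_squared_error(truth, restore, noise_std, trials=100, seed=0):
    """Per-pixel squared error of `restore`, averaged over noise realizations."""
    truth = np.asarray(truth, dtype=np.float64)
    rng = np.random.default_rng(seed)
    total = np.zeros_like(truth)
    for _ in range(trials):
        noisy = truth + rng.normal(0.0, noise_std, size=truth.shape)
        total += (restore(noisy) - truth) ** 2
    return total / trials

# Sanity check with an identity "restoration" on a synthetic image.
truth = np.full((32, 32), 128.0)
error_map = average_squared_error(truth, restore=lambda x: x, noise_std=10.0)
mean_error = error_map.mean()   # close to noise_std**2 = 100 for the identity
```

Replacing the identity with an actual restoration function yields a per-pixel error map of the kind that can be compared against a model-based MSE estimate.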

#### 4.1.2 Gaussian blur plus Gaussian noise

We consider four scenarios of Gaussian blur plus Gaussian noise, and these are listed in Table 4. The benchmark methods in this case must be able to address both blur and noise. We use L0-Abs [47], TVMM [48], BM3DDEB [38], IDD-BM3D [39], and AWF [5]. Note that for IDD-BM3D, the tuning parameters are selected from those used in [39]. In particular, we use the tuning parameters from Scenario 4 in [39], as these produce the highest PSNR values in the current experiments.
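The simulated degradation, Gaussian blur followed by additive Gaussian noise, can be sketched as below. The blur and noise parameters of Table 4 are not reproduced, so `blur_std`, `noise_std`, and `radius` are hypothetical arguments. Edge replication at the image border is likewise an assumption of this sketch.

```python
import numpy as np

def gaussian_kernel_1d(std, radius):
    """Normalized 1-D Gaussian kernel with 2*radius + 1 taps."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / std) ** 2)
    return k / k.sum()

def degrade(image, blur_std, noise_std, radius=4, seed=0):
    """Separable Gaussian blur (edge-replicated) plus additive Gaussian noise."""
    img = np.asarray(image, dtype=np.float64)
    k = gaussian_kernel_1d(blur_std, radius)
    padded = np.pad(img, radius, mode="edge")
    # Filter along rows, then along columns; 'valid' trims the padding off again.
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    blurred = np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")
    rng = np.random.default_rng(seed)
    return blurred + rng.normal(0.0, noise_std, size=blurred.shape)

# Example with made-up parameter values.
rng = np.random.default_rng(1)
clean = rng.uniform(0, 255, size=(48, 64))
degraded = degrade(clean, blur_std=1.5, noise_std=5.0)
```

The noise-only case corresponds to skipping the blur, that is, using a delta function as the PSF.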

The PSNR comparison is provided in Table 5, and the SSIM comparison is in Table 6. Here, IDD-BM3D provides the highest PSNR in 15 of 24 instances, and CAWF provides the highest in 9 of 24. In terms of SSIM, IDD-BM3D provides the highest values in 5 of 24 instances and CAWF in 19 of 24. We see a similar situation with AWF as we did in the noise-only experiments. The AWF method does well compared to many of the benchmark methods, but CAWF consistently outperforms it. Selected ROIs from the images aerial and bones for Scenario III in Table 4 are shown in Figures 14 and 15, respectively. Again, CAWF appears to do a good job restoring image detail, based on subjective evaluation and SSIM. Note that a larger blur kernel generally demands a larger restoration filter window. The variational benchmark methods, like IDD-BM3D, are not restricted to local processing like CAWF and AWF. Thus, they may have an advantage in high levels of blur. However, the iterative nature of these methods also means that a full parallel implementation may not be possible.

### 4.2 Real data

Real video frames have been acquired of an outdoor natural scene on the campus of the University of Dayton using an Imaging Source 8-bit grayscale camera (DMK 23U618) with a Sony ICX618ALA sensor. A short exposure time is used, providing a low SNR. A sequence of 500 frames is acquired for the static scene. This allows us to form a temporal average as a type of reference with which to compare the noise reduction estimates. Since this real noise will have both a signal-dependent and signal-independent component, we apply an Anscombe transform to stabilize the local noise variance prior to applying all denoising methods [49]. After the transform and scaling, an effective constant noise standard deviation of *σ*_{η}=9.11 is estimated and used for all methods.
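To illustrate variance stabilization, the sketch below uses the classical Anscombe transform for pure Poisson data, 2·sqrt(x + 3/8). This is an assumption for illustration only: the exact transform and scaling applied to the camera data are not specified beyond the citation, and the generalized form for Poisson-Gaussian noise is treated in [49].

```python
import numpy as np

def anscombe(x):
    """Classical Anscombe transform: Poisson counts to roughly unit-variance data."""
    return 2.0 * np.sqrt(np.asarray(x, dtype=np.float64) + 3.0 / 8.0)

def inverse_anscombe(y):
    """Simple algebraic inverse (known to be biased at low counts)."""
    return (np.asarray(y, dtype=np.float64) / 2.0) ** 2 - 3.0 / 8.0

rng = np.random.default_rng(0)
raw_var, stabilized_var = {}, {}
for mean in (20.0, 80.0, 200.0):
    counts = rng.poisson(mean, size=100_000)
    raw_var[mean] = counts.var()                    # grows with the mean
    stabilized_var[mean] = anscombe(counts).var()   # stays near 1
```

After stabilization, a denoiser designed for constant-variance Gaussian noise can be applied, followed by an inverse transform.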

Figure 16a shows a 500-frame temporal average image. This image represents a near noise-free image of the scene that can be used as a reference. A single noisy frame is shown in Figure 16b. The processed single-frame outputs for GLIDE-NLM, AWF, BM3D, and CAWF are shown in Figure 16c-f, respectively. The PSNR values, relative to the temporal average, for the observed frame and the GLIDE-NLM, AWF, BM3D, and CAWF outputs are 31.78, 36.04, 36.33, 36.47, and 36.65, respectively. The corresponding SSIM values are 0.7525, 0.8929, 0.8975, 0.8973, and 0.9059. These results appear to be consistent with the results obtained with the simulated data.
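The use of a temporal average as a near noise-free reference can be illustrated with synthetic data. The sketch assumes a static scene with zero-mean noise that is independent from frame to frame, in which case averaging *N* frames reduces the noise standard deviation by a factor of sqrt(*N*). The scene and the reuse of the value 9.11 are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, noise_std = 500, 9.11                   # illustrative values
scene = rng.uniform(0, 255, size=(32, 32))        # synthetic static scene
frames = scene + rng.normal(0.0, noise_std, size=(n_frames, 32, 32))

reference = frames.mean(axis=0)                   # temporal average image
residual_std = (reference - scene).std()          # residual noise in the reference
expected_std = noise_std / np.sqrt(n_frames)      # about 0.41 under these assumptions
```

With 500 frames, the residual noise in the reference is small compared with that of a single frame, which is what makes it usable for PSNR and SSIM comparisons.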

### 4.3 Parameter and distance metric sensitivity

In this section, we investigate the sensitivity of the CAWF algorithm to some of the key tuning parameters listed in Table 1 and the distance metric used in the correlation model. We begin with the number of patches *M*. A plot of PSNR versus the number of similar patches is shown in Figure 17 for CAWF using the image bones with noise only and *σ*_{η}=40. This plot is representative of many of the restoration scenarios. One can see a ‘knee’ in the curve near *M*=10. Since increasing *M* has a significant impact on computational complexity, we have elected to use *M*=10 for our denoising applications. Note that deconvolution requires a larger window size than denoising. Thus, to manage computational complexity, we compensate with a somewhat lower *M*=8, as shown in Table 1.

Next, we examine the autocorrelation decay constant, *ρ*, and the patch similarity decay, *α*, for the image aerial with additive Gaussian noise. We have evaluated CAWF PSNR for *ρ* ranging from 0.6 to 0.75, with all other parameter values as listed in Table 1. The maximum change in PSNR as a function of *ρ*, for noise levels ranging from *σ*_{η}=10 to *σ*_{η}=40, is only 0.13%. Similarly, we have evaluated PSNR values for *α* ranging from 1.0 to 2.0. The maximum change in PSNR as a function of *α* is observed to be 0.23%. Thus, we conclude that the CAWF method is not highly sensitive to these tuning parameters within these operating ranges.

Finally, we explore CAWF performance using different distance metrics in the correlation model. Our standard metric uses the *l*^{2}-norm as defined in Equation 13. To test distance metric sensitivity, we compare distances with the *l*^{1}-, *l*^{2}-, and *l*^{10}-norms. For aerial with *σ*_{η}=10, the PSNRs are 30.67, 30.67, and 30.60, respectively (with tuned scaling parameters). For aerial with *σ*_{η}=40, the PSNRs are 23.01, 23.13, and 23.05, respectively (also with tuned scaling parameters). As with the other parameters, we do not see a strong sensitivity to the choice of distance metric. However, the *l*^{2}-norm generally provides the best results.
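The effect of the norm on patch selection can be sketched as follows. This uses plain *l*^{p} distances between vectorized patches. The scaled and shifted distance of Equation 13 is not reproduced, and the function name and toy data are illustrative.

```python
import numpy as np

def most_similar_patches(reference, candidates, M, p=2):
    """Indices and distances of the M candidates closest to the reference patch.

    reference:  shape (K,), vectorized reference patch
    candidates: shape (L, K), vectorized patches from the search window
    p:          order of the l^p norm used as the distance
    """
    diffs = (np.asarray(candidates, dtype=np.float64)
             - np.asarray(reference, dtype=np.float64))
    dists = np.linalg.norm(diffs, ord=p, axis=1)
    order = np.argsort(dists)[:M]
    return order, dists[order]

# Toy example: the ranking of the second-closest patch depends on the norm.
reference = np.zeros(4)
candidates = np.array([[1.0, 1.0, 1.0, 1.0],
                       [0.0, 0.0, 0.0, 0.0],
                       [3.0, 0.0, 0.0, 0.0]])
idx_l1, _ = most_similar_patches(reference, candidates, M=2, p=1)
idx_l2, _ = most_similar_patches(reference, candidates, M=2, p=2)
idx_l10, _ = most_similar_patches(reference, candidates, M=2, p=10)
```

A low-order norm penalizes many small differences more heavily, while a high-order norm is dominated by the single largest difference, so the selected set of patches can change with `p`.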

## 5 Conclusions

We have proposed a novel CAWF method for image restoration, which can be thought of as an extension of the AWF [5] using multiple patches. For each reference window, *M* similar patches are identified. The output is formed as a single-pass weighted sum of all of the pixels from the multiple selected patches. Wiener weights are used to provide a minimum MSE estimate for this filter structure. A key aspect of the method is the new spatial-domain multi-patch correlation model, presented in Section 1. This model attempts to capture the spatial correlation among the samples within a given patch and also the correlations among the patches.

The CAWF is able to jointly perform denoising and deblurring. We believe this type of joint restoration is advantageous, compared with decoupling these operations. The CAWF algorithm is also capable of adapting to local signal and noise variance. Bad or missing pixels can easily be accommodated by leaving them out of the multi-patch observation vector and corresponding correlation statistics. The weights will adapt in a non-trivial way to the missing pixels [5,9,10].

In simulated and real data for Gaussian noise, the CAWF outperforms the benchmark methods in our experiments in Sections 4.1 and 4.2, both in PSNR and in SSIM. With blur and noise, CAWF produces the highest SSIM in more cases than the benchmark methods. However, IDD-BM3D does provide a higher PSNR in more instances. Our results show that the CAWF method consistently outperforms the AWF. This clearly demonstrates that incorporating multiple patches within this filter structure is advantageous. From the results in Section 4.3, we also conclude that CAWF performance is not highly sensitive to the tuning parameter values within a given operating range.

We believe the single-pass weighted-sum structure of the CAWF method is conceptually simple and versatile. It is also highly parallel. In principle, each output pixel can be computed in parallel. We have demonstrated that the method provides excellent performance in image restoration with noise alone and with blur plus noise. This method may be beneficial in numerous other applications as well, including those where its predecessor, the AWF, is successful [5,9-14]. We think there may also be an opportunity for further improvements in the parametric correlation model that could boost filter performance. Thus, we hope this approach will be of interest to the signal and image processing community.

## References

G Demoment, Image reconstruction and restoration: overview of common estimation structures and problems. IEEE Trans Acoustics, Speech Signal Process. 37(12), 2024–36 (1989).

MI Sezan, AM Tekalp, Survey of recent developments in digital image restoration. Opt Eng. 29(5), 393–404 (1990).

AK Jain, *Fundamentals of Digital Image Processing* (Prentice Hall, New Jersey, 1989).

SP Ghael, AM Sayeed, RG Baraniuk, in *Proc of SPIE*, 3169. Improved wavelet denoising via empirical Wiener filtering (SPIE, San Diego, 1997), pp. 389–99.

RC Hardie, A fast image super-resolution algorithm using an adaptive Wiener filter. IEEE Trans Image Process. 16, 2953–64 (Dec 2007).

KE Barner, AM Sarhan, RC Hardie, Partition-based weighted sum filters for image restoration. IEEE Trans Image Process. 8, 740–745 (May 1999).

M Shao, KE Barner, RC Hardie, Partition-based interpolation for image demosaicing and super-resolution reconstruction. Opt Eng. 44, 107003–1–107003–14 (Oct 2005).

B Narayanan, RC Hardie, KE Barner, M Shao, A computationally efficient super-resolution algorithm for video processing using partition filters. IEEE Trans Circuits Syst Video Technol. 17, 621–34 (May 2007).

RC Hardie, KJ Barnard, R Ordonez, Fast super-resolution with affine motion using an adaptive Wiener filter and its application to airborne imaging. Opt Express. 19, 26208–31 (Dec 2011).

RC Hardie, KJ Barnard, Fast super-resolution using an adaptive Wiener filter with robustness to local motion. Opt Express. 20, 21053–73 (Sep 2012).

B Narayanan, RC Hardie, E Balster, Multiframe adaptive Wiener filter super-resolution with JPEG2000-compressed images. EURASIP J Adv Signal Process. 2014(1), 55 (2014).

RC Hardie, DA LeMaster, BM Ratliff, Super-resolution for imagery from integrated microgrid polarimeters. Opt Express. 19, 12937–60 (Jul 2011).

BK Karch, RC Hardie, Adaptive Wiener filter super-resolution of color filter array images. Opt Express. 21, 18820–41 (Aug 2013).

M Rucci, RC Hardie, KJ Barnard, Appl Opt. 53, C1–13 (May 2014).

A Buades, B Coll, JM Morel, A review of image denoising algorithms, with a new one. Multiscale Model Simul. 4, 490–530 (2005).

KE Barner, GR Arce, J-H Lin, On the performance of stack filters and vector detection in image restoration. Circuits Syst Signal Process. 11, No. 1 (Jan 1992).

KE Barner, RC Hardie, GR Arce, in *Proceedings of the 1994 CISS*. On the permutation and quantization partitioning of **R**^{N} and the filtering problem (Princeton, New Jersey, Mar 1994).

A Buades, B Coll, JM Morel, in *Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Comput Soc Conference on, vol. 2*. A non-local algorithm for image denoising (IEEE, June 2005), pp. 60–65.

C Kervrann, J Boulanger, Optimal spatial adaptation for patch-based image denoising. IEEE Trans Image Process. 15(10), 2866–78 (2006).

A Buades, B Coll, J-M Morel, Nonlocal image and movie denoising. Int J Comput Vision. 76, 123–39 (Feb 2008).

Y Han, R Chen, Efficient video denoising based on dynamic nonlocal means. Image Vision Comput. 30, 78–85 (Feb 2012).

T Tasdizen, Principal neighborhood dictionaries for nonlocal means image denoising. IEEE Trans Image Process. 18, 2649–60 (July 2009).

H Bhujle, S Chaudhuri, Novel speed-up strategies for non-local means denoising with patch and edge patch based dictionaries. IEEE Trans Image Process. 23, 356–365 (Jan 2014).

Y Wu, B Tracey, P Natarajan, JP Noonan, SUSAN controlled decay parameter adaption for non-local means image denoising. Electron Lett. 49, 807–8 (June 2013).

WL Zeng, XB Lu, Region-based non-local means algorithm for noise removal. Electron Lett. 47, 1125–7 (September 2011).

WF Sun, YH Peng, WL Hwang, Modified similarity metric for non-local means algorithm. Electron Lett. 45, 1307–9 (Dec 2009).

C Tomasi, R Manduchi, in *Proceedings of the 1998 IEEE International Conference on Computer Vision, Bombay*. Bilateral filtering for gray and color images (IEEE, India, 1998).

H Kishan, CS Seelamantula, *Sure-fast bilateral filters. Acoustics, Speech and Signal Processing (ICASSP) 2012 IEEE International Conference on* (IEEE, Kyoto, 2012).

W Kesjindatanawaj, S Srisuk, in *Communications and Information Technologies (ISCIT), 2013 13th International Symposium on*. Deciles-based bilateral filtering (IEEE, Surat Thani, 2013), pp. 429–33.

X Changzhen, C Licong, P Yigui, *An adaptive bilateral filtering algorithm and its application in edge detection. Measuring Technology and Mechatronics Automation (ICMTMA), 2010 International Conference on. vol. 1* (IEEE, Changsha City, 2010).

H Peng, R Rao, SA Dianat, Multispectral image denoising with optimized vector bilateral filter. IEEE Trans Image Process. 23, 264–73 (Jan 2014).

K Dabov, A Foi, V Katkovnik, K Egiazarian, Image denoising by sparse 3-d transform-domain collaborative filtering. IEEE Trans Image Process. 16, 2080–95 (Aug 2007).

K Dabov, A Foi, K Egiazarian, Video denoising by sparse 3d transform-domain collaborative filtering. Proc 15th Eur Signal Process Conference. 1, 7 (2007).

M Maggioni, G Boracchi, A Foi, K Egiazarian, Video denoising, deblocking, and enhancement through separable 4-d nonlocal spatiotemporal transforms. IEEE Trans Image Process. 21(9), 3952–66 (2012).

K Hirakawa, T Parks, Image denoising using total least squares. IEEE Trans Image Process. 15(9), 2730–42 (2006).

P Chatterjee, P Milanfar, Patch-based near-optimal image denoising. IEEE Trans Image Process. 21, 1635–49 (April 2012).

H Talebi, P Milanfar, Global image denoising. IEEE Trans Image Process. 23, 755–768 (Feb 2014).

K Dabov, A Foi, V Katkovnik, K Egiazarian, in *SPIE Electronic Imaging*, 6812. Image restoration by sparse 3d transform-domain collaborative filtering (San Jose, Jan 2008).

A Danielyan, V Katkovnik, K Egiazarian, BM3D frames and variational image deblurring. IEEE Trans Image Process. 21, 1715–28 (April 2012).

K Nasrollahi, TB Moeslund, Super-resolution: a comprehensive survey. Mach Vision Appl. 25(6), 1423–68 (Aug 2014).

M Protter, M Elad, Super-resolution with probabilistic motion estimation. IEEE Trans Image Process. 18(8), 1899–904 (2009).

M Protter, M Elad, H Takeda, P Milanfar, Generalizing the nonlocal-means to super-resolution reconstruction. IEEE Trans Image Process. 18(1), 36–51 (2009).

MH Cheng, HY Chen, JJ Leou, Video super-resolution reconstruction using a mobile search strategy and adaptive patch size. Signal Process. 91, 1284–97 (2011).

B Huhle, T Schairer, P Jenke, W Straber, Fusion of range and color images for denoising and resolution enhancement with a non-local filter. Comput Vision Image Understanding. 114, 1336–45 (2012).

K-W Hung, W-C Siu, Single image super-resolution using iterative Wiener filter. Proc IEEE Int Conference Acoustics, Speech and Signal Process, 1269–72 (2012).

Z Wang, A Bovik, H Sheikh, E Simoncelli, Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process. 13, 600–12 (April 2004).

J Portilla, Image restoration through l0 analysis-based sparse optimization in tight frames. Image Process (ICIP), 2009 16th IEEE Int Conference on, 3909–912 (Nov 2009).

J Oliveira, JM Bioucas-Dias, MA Figueiredo, Adaptive total variation image deblurring: A majorization-minimization approach. Signal Process. 89, 1683–93 (Sep 2009).

M Makitalo, A Foi, Optimal inversion of the generalized Anscombe transformation for Poisson-Gaussian noise. IEEE Trans Image Process. 22(1), 91–103 (2013).

## Additional information

### Competing interests

The authors declare that they have no competing interests.

## Rights and permissions

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## About this article

### Cite this article

Mohamed, K.M., Hardie, R.C. A collaborative adaptive Wiener filter for image restoration using a spatial-domain multi-patch correlation model. *EURASIP J. Adv. Signal Process.* **2015**, 6 (2015). https://doi.org/10.1186/s13634-014-0189-3

DOI: https://doi.org/10.1186/s13634-014-0189-3