SSIM-inspired image restoration using sparse representation

Rehman, Abdul; Rostami, Mohammad; Wang, Zhou; Brunet, Dominique; Vrscay, Edward R

doi:10.1186/1687-6180-2012-16

Research
Open access
Published: 20 January 2012

EURASIP Journal on Advances in Signal Processing volume 2012, Article number: 16 (2012)

Abstract

Recently, sparse representation based methods have proven to be successful in solving image restoration problems. The objective of these methods is to use a sparsity prior of the underlying signal in terms of some dictionary and achieve optimal performance in terms of mean-squared error, a metric that has been widely criticized in the literature due to its poor performance as a visual quality predictor. In this work, we make one of the first attempts to employ the structural similarity (SSIM) index, a more accurate perceptual image measure, by incorporating it into the framework of sparse signal representation and approximation. Specifically, the proposed optimization problem solves for coefficients with minimum $ℒ_{0}$ norm and maximum SSIM index value. Furthermore, a gradient descent algorithm is developed to achieve an SSIM-optimal compromise in combining the input and sparse dictionary reconstructed images. We demonstrate the performance of the proposed method by using image denoising and super-resolution methods as examples. Our experimental results show that the proposed SSIM-based sparse representation algorithm achieves better SSIM performance and better visual quality than the corresponding least square-based method.

1 Introduction

In many signal processing problems, mean squared error (MSE) has been the preferred choice as the optimization criterion due to its ease of use and popularity, irrespective of the nature of signals involved in the problem. The story is not different for image restoration tasks. Algorithms are developed and optimized to generate the output image that has minimum MSE with respect to the target image [1–6]. However, MSE is not the best choice when it comes to image quality assessment (IQA) and signal approximation tasks [7]. In order to achieve better visual performance, it is desirable to replace the optimization criterion with one that predicts visual quality more accurately. SSIM has been quite successful in achieving superior IQA performance [8]. Figure 1 demonstrates the difference between the performance of SSIM and absolute error (the basis for $ℒ_{p}$ , MSE, PSNR, etc.). Figure 1c shows the quality map of the image 1b with reference to 1a, obtained by calculating the absolute pixel-by-pixel error, which forms the basis of MSE calculation for quality evaluation. Figure 1d shows the corresponding SSIM quality map, which is used to calculate the SSIM index of the whole image. It is quite evident from the maps that SSIM does a better job of predicting perceived image quality. Specifically, the absolute error map is uniform over space, but the texture regions in the noisy image appear to be much less noisy than the smooth regions. Clearly, the SSIM map is more consistent with such observations.

The SSIM index and its extensions have found a wide variety of applications, ranging from image/video coding, e.g., H.264 video coding standard implementation [9], image classification [10], restoration and fusion [11], to watermarking, denoising and biometrics (see [7] for a complete list of references). In most existing works, however, SSIM has been used for quality evaluation and algorithm comparison purposes only. SSIM possesses a number of desirable mathematical properties, making it easier to employ in optimization tasks than other state-of-the-art perceptual IQA measures [12]. However, much less has been done on using SSIM as an optimization criterion in the design and optimization of image processing algorithms and systems [13–19].

Image restoration problems are of particular interest to image processing researchers, not only for their practical value, but also because they provide an excellent test bed for image modeling, representation and estimation theories. When addressing general image restoration problems with the help of a Bayesian approach, an image prior model is required. Traditionally, the problem of determining suitable image priors has been based on a close observation of natural images. This leads to simplifying assumptions such as spatial smoothness, low/max-entropy or sparsity in some basis set. Recently, a new approach has been developed for learning the prior based on sparse representations. A dictionary is learned either from the corrupted image or from a high-quality set of images, with the assumption that it can sparsely represent any natural image. Thus, this learned dictionary encapsulates the prior information about the set of natural images. Such methods have proven to be quite successful in performing image restoration tasks such as image denoising [3] and image super-resolution [5, 20]. More specifically, an image is divided into overlapping blocks with the help of a sliding window, and each block is subsequently sparsely coded with the help of a dictionary. The dictionary, ideally, models the prior of natural images and is therefore free from all kinds of distortions. As a result, the reconstructed blocks, obtained by linear combination of the atoms of the dictionary, are distortion free. Finally, the blocks are put back into their places and combined together in light of a global constraint for which a minimum MSE solution is reached. The accumulation of many blocks at each pixel location might affect the sharpness of the image. Therefore, the distorted image must be considered as well in order to reach the best compromise between sharpness and admissible distortions.

Since MSE is employed as the optimization criterion, the resulting output image might not have the best perceptual quality. This motivated us to replace the role of MSE with SSIM in the framework. The solution of this novel optimization problem is not trivial because SSIM is non-convex in nature. There are two key problems that have to be resolved before effective SSIM-based optimization can be performed. First, how to optimally decompose an image as a linear combination of basis functions in maximal SSIM, as opposed to minimal MSE sense. Second, how to estimate the best compromise between the distorted and sparse dictionary reconstructed images for maximal SSIM. In this article, we provide solutions to these problems and use image denoising and image super-resolution as applications to demonstrate the proposed framework for image restoration problems.

We formulate the problem in Section 2.1 and provide our solutions to issues discussed above in Sections 2.2 and 2.3. Section 3.1 describes our approach to denoise the images. The proposed method for image super-resolution is described in Section 3.2 and finally we conclude in Section 4.

2 The proposed method

In this section we will incorporate SSIM as our quality measure, particularly for sparse representation. In contrast to what we may expect, it is shown that sparse representation in the minimal $ℒ_{2}$ norm sense can be easily converted to the maximal SSIM sense. We will also use a gradient descent approach to solve a global optimization problem in the maximal SSIM sense. Our framework can be applied to a wide class of problems dealing with sparse representation to improve visual quality.

2.1 Image restoration from sparsity

The classic formulation of the image restoration problem is as follows:

y = Φ x + n

(1)

where x ∈ ℝⁿ, y ∈ ℝ^m, n ∈ ℝ^m, and Φ ∈ ℝ^{m × n}. Here we assume x and y are vectorized versions, obtained by column stacking, of the 2-D original and distorted images, respectively. n is the noise term, which is usually assumed to be zero-mean, additive, independent Gaussian noise. Generally m < n and thus the problem is ill-posed. To solve the problem, a prior on the original image must be asserted. The early approaches used least square (LS) [21] and Tikhonov regularization [22] as priors. Later, the minimal total variation (TV) solution [23] and sparse priors [3] were used successfully on this problem. Our focus in the current work is to improve, in terms of visual quality, algorithms that assert a sparsity prior on the solution in terms of a dictionary domain.

Sparsity prior has been used successfully to solve different inverse problems in image processing [3, 5, 24, 25]. If our desired signal, x, is sparse enough then it has been shown that the solution to (1) is the one with maximum sparsity which is unique (within some ϵ-ball around x) [26, 27]. It can be easily found by solving a linear programming problem or by orthogonal matching pursuit (OMP). Not all natural signals are sparse but a wide range of natural signals can be represented sparsely in terms of a dictionary and this makes it possible to use sparsity prior on a wide range of inverse problems. One major problem is that the image signals are considered to be high dimensional data and thus, solving (1) directly is computationally expensive. To tackle this problem we assume local sparsity on image patches. Here, it is assumed that all the image patches have sparse representation in terms of a dictionary. This dictionary can be trained over some patches [28].

Central to the process of image restoration, using local sparse and redundant representations, is the solution to the following optimization problems [3, 5],

{\hat{α}}_{i j} = \underset{α}{arg min} μ_{i j} {∥α∥}_{0} + {∥Ψ α - R_{i j} X∥}_{2}^{2},

(2)

\hat{X} = \underset{X}{arg min} {∥X - W∥}_{2}^{2} + λ {∥D H X - Y∥}_{2}^{2} .

(3)

where Y is the observed distorted image, X is the unknown output restored image, R_ij is a matrix that extracts the (ij) block from the image, Ψ ∈ ℝ^{n × k} is the dictionary with k > n, α_ij is the sparse vector of coefficients corresponding to the (ij) block of the image, $\hat{X}$ is the estimated image, λ is the regularization parameter, and W is the image obtained by averaging the blocks reconstructed from the sparse coefficient vectors ${\hat{α}}_{i j}$ that solve the optimization problem in (2). This is a local sparsity-based method that divides the whole image into blocks and represents each block sparsely using a trained dictionary. Among other advantages, one major advantage of such a method is that a small dictionary is easier to train than one large global dictionary. This is achieved with the help of (2), which is equivalent to (4). The coefficients μ_ij must be location dependent, so as to comply with a set of constraints of the form ${∥Ψ α - R_{i j} X∥}_{2}^{2} \leq T$ . Solving this using orthogonal matching pursuit [29] is easy: one atom is gathered at a time, and the procedure stops when the error ${∥Ψ α - R_{i j} X∥}_{2}^{2}$ goes below T. This way, the choice of μ_ij is handled implicitly. Equation (3) applies a global constraint on the reconstructed image and uses the local patches and the noisy image as input in order to construct an output that complies with local sparsity and also lies within the proximity of the distorted image, which is defined by the amount and type of distortion.

{\hat{α}}_{i j} = \underset{α}{arg min} {∥α∥}_{0} subject to {∥Ψ α - R_{i j} X∥}_{2}^{2} \leq T

(4)

In (3), we have assumed that the distortion operator Φ in (1) may be represented by the product DH, where H is a blurring filter and D the downsampling operator. Here we have assumed each non-overlapping patch of the images can be represented sparsely in the domain of Ψ. Assuming this prior on each patch, (2) refers to the sparse coding of local image patches with bounded prior, hence building a local model from sparse representations. This enables us to restore individual patches by solving (2) for each patch. By doing so, we face the problem of blockiness at the patch boundaries when denoised non-overlapping patches are placed back in the image. To remove these artifacts from the denoised images, overlapping patches are extracted from the noisy image and combined together with the help of (3). The solution of (3) demands proximity between the noisy image, Y, and the output image X, thus enforcing the global reconstruction constraint. The $ℒ_{2}$ optimal solution suggests taking the average of the overlapping patches [3], thus eliminating the problem of blockiness in the denoised image.

As stated earlier, we propose a modified restoration method which incorporates SSIM into the procedure defined by (2) and (3). It is defined as follows,

{\hat{α}}_{i j} = \underset{α}{arg min} μ_{i j} {∥α∥}_{0} + (1 - S (Ψ α, R_{i j} X)),

(5)

\hat{X} = \underset{X}{arg max} S (W, X) + λ S (D H X, Y),

(6)

where S(·,·) defines the SSIM measure. The expression for SSIM index is

S (a, y) = \frac{2 μ_{a} μ_{y} + C_{1}}{μ_{a}^{2} + μ_{y}^{2} + C_{1}} \frac{2 σ_{a, y} + C_{2}}{σ_{a}^{2} + σ_{y}^{2} + C_{2}},

(7)

with μ_a and μ_y the means of a and y respectively, $σ_{a}^{2}$ and $σ_{y}^{2}$ the sample variances of a and y respectively, and σ_ay the covariance between a and y. The constants C₁ and C₂ are stabilizing constants and account for the saturation effect of the human visual system (HVS).
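As an illustration of (7), the following minimal Python sketch evaluates the SSIM index of two patches stored as flat lists. The function name `ssim_patch` is hypothetical, and the default constants C₁ = (0.01 · 255)² and C₂ = (0.03 · 255)² are an assumption (the usual choice for 8-bit images); this article does not state the values it uses.

```python
def ssim_patch(a, y, c1=6.5025, c2=58.5225):
    """SSIM index of Eq. (7) for two equal-length patches given as flat lists.

    c1 and c2 are assumed defaults for 8-bit images, not values from the article.
    """
    n = len(a)
    mu_a = sum(a) / n
    mu_y = sum(y) / n
    # Sample (n - 1 normalized) variances and covariance, as described in the text.
    var_a = sum((p - mu_a) ** 2 for p in a) / (n - 1)
    var_y = sum((q - mu_y) ** 2 for q in y) / (n - 1)
    cov_ay = sum((p - mu_a) * (q - mu_y) for p, q in zip(a, y)) / (n - 1)
    # First factor of Eq. (7): mean (luminance) comparison.
    luminance = (2 * mu_a * mu_y + c1) / (mu_a ** 2 + mu_y ** 2 + c1)
    # Second factor of Eq. (7): variance/covariance (structure) comparison.
    structure = (2 * cov_ay + c2) / (var_a + var_y + c2)
    return luminance * structure
```

A patch compared with itself gives an index of 1, and any distortion of a patch with non-negative pixel values lowers the index below 1.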

Equation (5) aims to provide the best approximation of a local patch in the SSIM sense using the minimum possible number of atoms. The process is performed locally for each block in the image, and the blocks are then combined by simple averaging to construct W. Equation (6) applies a global constraint and outputs the image that is the best compromise between the noisy image, Y, and W in the SSIM sense. This step is vital because it has been observed that the image W lacks sharpness in the structures present in the image. Due to the masking effect of the HVS, the same level of noise does not distort different visual content equally. Therefore, the noisy image is used to borrow content from its regions that are not severely corrupted by noise. SSIM is better suited to this task than MSE because it accounts for the masking effect of the HVS and allows us to capture improved structural details with the help of the noisy image. Note the use of 1 - S(·,·) in (5). This is motivated by the fact that 1 - S(·,·) is a squared variance-normalized $ℒ_{2}$ distance [30]. Solutions to the optimization problems in (5) and (6) are given in Sections 2.2 and 2.3, respectively.

2.2 SSIM-optimal local model from sparse representation

This section discusses the solution to the optimization problem in (5). Equation (2) can be solved approximately using OMP [29] by including one atom at a time and stopping when the error ${∥Ψ α_{i j} - R_{i j} X∥}_{2}^{2}$ goes below T_mse = (Cσ)², where C is the noise gain and σ is the standard deviation of the noise. We solve the optimization problem in (5) based on the same philosophy. We gather one atom at a time and stop when S(Ψα, x_ij) goes above T_ssim, a threshold defined in terms of SSIM. In order to obtain T_ssim, we need to consider the relationship between MSE and SSIM. For mean-reduced a and y, the expression for SSIM reduces to the following equation

S (a, y) = \frac{2 σ_{a, y} + C_{2}}{σ_{a}^{2} + σ_{y}^{2} + C_{2}},

(8)

Subtracting both sides of (8) from 1 yields

1 - S (a, y) = 1 - \frac{2 σ_{a, y} + C_{2}}{σ_{a}^{2} + σ_{y}^{2} + C_{2}}

(9)

= \frac{σ_{a}^{2} + σ_{y}^{2} - 2 σ_{a, y}}{σ_{a}^{2} + σ_{y}^{2} + C_{2}}

(10)

= \frac{{∥a - y∥}_{2}^{2}}{σ_{a}^{2} + σ_{y}^{2} + C_{2}},

(11)

(12)

Equation (12) can be re-arranged to arrive at the following result

S (a, y) = 1 - \frac{{∥a - y∥}_{2}^{2}}{σ_{a}^{2} + σ_{y}^{2} + C_{2}}

(13)

With the help of the equation above, we can calculate the value of T_ssim as follows

T_{ssim} = 1 - \frac{T_{mse}}{σ_{a}^{2} + σ_{y}^{2} + C_{2}},

(14)

where C₂ is the constant originally used in the SSIM index expression [8] and $σ_{a}^{2}$ is calculated based on the current approximation of the block, given by a := Ψα.
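The conversion from the MSE threshold to the SSIM threshold in (14) can be sketched as follows. The function names are hypothetical and the default C₂ = (0.03 · 255)² is an assumption. The text does not spell out how T_mse is normalized relative to the variances (summed versus per-pixel error), so the sketch simply takes (14) at face value.

```python
def mse_threshold(noise_gain, sigma):
    """T_mse = (C * sigma)^2, the stopping error used by the MSE-based OMP."""
    return (noise_gain * sigma) ** 2


def ssim_threshold(t_mse, var_a, var_y, c2=58.5225):
    """T_ssim of Eq. (14) for one patch.

    var_a is the variance of the current approximation a, var_y that of the
    patch y; c2 is an assumed default, not a value taken from the article.
    """
    return 1.0 - t_mse / (var_a + var_y + c2)
```

Because the variances appear in the denominator, a high-variance (textured) patch receives a threshold closer to 1 than a low-variance (smooth) patch for the same T_mse, which is the patch-adaptive behaviour described in the text.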

It has already been shown that the main difference between SSIM and MSE is the divisive normalization [30, 31]. This normalization is conceptually consistent with the light adaptation (also called luminance masking) and contrast masking effect of HVS. It has been recognized as an efficient perceptually and statistically non-linear image representation model [32, 33]. It is shown to be a useful framework that accounts for the masking effect in human visual system, which refers to the reduction of the visibility of an image component in the presence of large neighboring components [34, 35]. It has also been found to be powerful in modeling the neuronal responses in the visual cortex [36, 37]. Divisive normalization has been successfully applied in IQA [38, 39], image coding [40], video coding [31] and image denoising [41].

Equation (14) suggests that the threshold is chosen adaptively for each patch. The set of coefficients α = (α₁, α₂, α₃,..., α_k) should be calculated such that we get the best approximation a in terms of SSIM. We search for the stationary points of the partial derivatives of S with respect to α. The solution to this problem for an orthogonal set of basis functions is discussed in [30]. Here we aim to solve the more general case of linearly independent atoms. The $ℒ_{2}$ -based optimal coefficients, $\{c_{i}\}_{i = 1}^{k}$ , can be calculated by solving the following system of equations

\sum_{j = 1}^{k} c_{j} ⟨ψ_{i}, ψ_{j}⟩ = ⟨y, ψ_{i}⟩, 1 \leq i \leq k,

(15)

We denote the inner product of a signal with the constant signal (1/n, 1/n,..., 1/n) of length n by ⟨ψ⟩ := ⟨ψ, 1/n⟩, where ⟨·, ·⟩ represents the inner product.

First, we write the mean, the variance and the covariance of a in terms of α with n the size of the current block:

μ_{a} = ⟨\sum_{i = 1}^{k} α_{i} ψ_{i}⟩ = \sum_{i = 1}^{k} α_{i} ⟨ψ_{i}⟩

(16)

\begin{align} (n - 1) σ_{a}^{2} & = ⟨a, a⟩ - n {⟨a⟩}^{2} \\ & = \sum_{i = 1}^{k} \sum_{j = 1}^{k} α_{i} α_{j} ⟨ψ_{i}, ψ_{j}⟩ - n μ_{a}^{2}, \end{align}

(17)

\begin{align} (n - 1) σ_{a y} & = ⟨a, y⟩ - n ⟨a⟩ ⟨y⟩ \\ & = \sum_{i = 1}^{k} α_{i} ⟨y, ψ_{i}⟩ - n μ_{a} μ_{y}, \end{align}

(18)

where ⟨·⟩ represents the sample mean. The partial derivatives are given as follows

\frac{\partial μ_{a}}{\partial α_{i}} = ⟨ψ_{i}⟩,

(19)

(n - 1) \frac{\partial σ_{a}^{2}}{\partial α_{i}} = 2 \sum_{j = 1}^{k} α_{j} ⟨ψ_{i}, ψ_{j}⟩ - 2 n μ_{a} ⟨ψ_{i}⟩,

(20)

(n - 1) \frac{\partial σ_{a y}}{\partial α_{i}} = ⟨y, ψ_{i}⟩ - n μ_{y} ⟨ψ_{i}⟩,

(21)

Taking the logarithm of (7), the structural similarity can be written as

\begin{align} log S & = log (2 μ_{a} μ_{y} + C_{1}) + log (2 σ_{a, y} + C_{2}) \\ & \quad - log (σ_{a}^{2} + σ_{y}^{2} + C_{2}) - log (μ_{a}^{2} + μ_{y}^{2} + C_{1}) \end{align}

(22)

From logarithmic differentiation of (7) combined with (19)-(21), we have

\frac{1}{S} \frac{\partial S}{\partial α_{i}} = \frac{2 μ_{y} ⟨ψ_{i}⟩}{2 μ_{a} μ_{y} + C_{1}} - \frac{2 μ_{a} ⟨ψ_{i}⟩}{μ_{a}^{2} + μ_{y}^{2} + C_{1}} + \frac{2 [⟨y, ψ_{i}⟩ - n μ_{y} ⟨ψ_{i}⟩]}{(n - 1) [2 σ_{a, y} + C_{2}]} - \frac{2 [\sum_{j = 1}^{k} α_{j} ⟨ψ_{i}, ψ_{j}⟩ - n μ_{a} ⟨ψ_{i}⟩]}{(n - 1) [σ_{a}^{2} + σ_{y}^{2} + C_{2}]}

(23)

After subtracting the corresponding DC values from all the blocks in the image, we are interested only in the particular case where the atoms are made of oscillatory functions, i.e., when 〈ψ_i〉 = 0 for 1 ≤ i ≤ k, thus reducing (23) to

\frac{1}{S} \frac{\partial S}{\partial α_{i}} = \frac{2 ⟨y, ψ_{i}⟩}{(n - 1) (2 σ_{a, y} + C_{2})} - \frac{2 (\sum_{j = 1}^{k} α_{j} ⟨ψ_{i}, ψ_{j}⟩)}{(n - 1) (σ_{a}^{2} + σ_{y}^{2} + C_{2})} .

(24)

We equate (24) to zero in order to find the stationary points. The result is the following linear system of equations

\sum_{j = 1}^{k} α_{j} ⟨ψ_{i}, ψ_{j}⟩ = β ⟨y, ψ_{i}⟩, 1 \leq i \leq k,

(25)

where

β = \frac{σ_{a}^{2} + σ_{y}^{2} + C_{2}}{2 σ_{a y} + C_{2}} .

(26)

Here β is an unknown constant that depends on the statistics of the unknown image block a. Comparing α with the optimal coefficients in the $ℒ_{2}$ sense, denoted by c and given by (15), results in the following solution:

α_{i} = β c_{i}, 1 \leq i \leq k,

(27)

which implies that the optimal SSIM-based solution is just a scaling of the optimal $ℒ_{2}$ -based solution. The last step is to find β. It is important to note that the value of β varies over the image and is therefore content dependent. Also, the scaling factor, β, may lead to the selection of a different set of atoms from the dictionary, as compared to $ℒ_{2}$ where β = 1, which are better suited to providing a closer and sparser approximation of the patch in the SSIM sense. Substituting (27) into the expression (26) for β via (16), (17) and (18), and then isolating β, gives the following quadratic equation

β^{2} (B - A) + β C_{2} - σ_{y}^{2} - C_{2} = 0 .

(28)

where

A = \frac{1}{n - 1} \sum_{i = 1}^{k} \sum_{j = 1}^{k} c_{i} c_{j} ⟨ψ_{i}, ψ_{j}⟩,

(29)

B = \frac{2}{n - 1} \sum_{j = 1}^{k} c_{j} ⟨y, ψ_{j}⟩ .

(30)

Solving for β and picking a positive value for maximal SSIM gives us

β = \frac{- C_{2} + \sqrt{C_{2}^{2} + 4 (B - A) (σ_{y}^{2} + C_{2})}}{2 (B - A)} .

(31)
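The closed-form scale factor of (29)-(31) can be sketched in a few lines of Python. The function name `ssim_scale` and its argument layout are hypothetical, the default C₂ is an assumed value, and the fallback to β = 1 when B - A is not positive is a choice made for this sketch rather than something stated in the text.

```python
from math import sqrt


def ssim_scale(c, gram, y_dot_psi, var_y, n, c2=58.5225):
    """Scale factor beta of Eq. (31) for zero-mean atoms and a mean-removed patch.

    c          L2-optimal coefficients obtained from Eq. (15)
    gram       gram[i][j] = <psi_i, psi_j> for the selected atoms
    y_dot_psi  y_dot_psi[i] = <y, psi_i>
    var_y      sample variance of the patch y
    n          number of pixels in the patch
    """
    k = len(c)
    a_term = sum(c[i] * c[j] * gram[i][j]
                 for i in range(k) for j in range(k)) / (n - 1)       # Eq. (29)
    b_term = 2.0 * sum(c[j] * y_dot_psi[j] for j in range(k)) / (n - 1)  # Eq. (30)
    d = b_term - a_term
    if d <= 0.0:
        # Degenerate case (e.g. all coefficients zero): keep the L2 solution.
        # This fallback is an assumption of the sketch.
        return 1.0
    # Positive root of the quadratic in Eq. (28).
    return (-c2 + sqrt(c2 * c2 + 4.0 * d * (var_y + c2))) / (2.0 * d)
```

When the selected atoms capture only part of the patch energy, the returned β exceeds 1, so the SSIM-optimal approximation is a slightly amplified version of the $ℒ_{2}$ projection; when the patch is represented exactly, β equals 1.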

Now we have all the tools required for an OMP algorithm that performs the sparse coding stage in the optimal SSIM sense. The modified OMP algorithm is given in Algorithm 1. There are two main differences between the OMP algorithm [29] and the one proposed in this work. First, the stopping criterion is based on SSIM. Unlike MSE, SSIM is adaptive according to the reference image. In particular, if the distortion is consistent with the underlying reference, e.g., contrast enhancement, the distortion is non-structural and is much less objectionable than structural distortions. Defining the stopping criterion according to SSIM essentially means that we are modifying the set of accepted points (image patches) around the noisy image patch which can be represented as a linear combination of dictionary atoms. This way, in the space of image patches, we omit image patches in the direction of structural distortion and include in the set of acceptable image patches those that lie in the same direction as the original image patch. Therefore, we can expect to see more structures in the image constructed using sparsity as a prior. Second, we calculate the SSIM-optimal coefficients from the optimal coefficients in the $ℒ_{2}$ sense using the derivation in Section 2.2; they are a scalar multiple of the optimal $ℒ_{2}$ -based coefficients.

2.3 SSIM-based global reconstruction

The solution to the optimization problem defined in Equation (6) is the image that is the best compromise between the distorted image and the one obtained using sparse representation in the maximal SSIM sense. Assuming a known dictionary, the only other thing the optimization problem in (6) requires is the coefficients α_ij, which can be obtained by solving the optimization problem in (5). SSIM is a local quality measure. When it is applied using a sliding window, it provides a quality map that reflects the variation of local quality over the whole image. The global SSIM is computed by pooling (averaging) the local SSIM map. The global SSIM for an image, Y, with respect to the reference image, X, is given by the following equation

S (X, Y) = \frac{1}{N_{l}} \sum_{i j} S (x_{i j}, y_{i j}),

(32)

where x_ij = R_ij X and y_ij = R_ij Y, and R_ij is an N_w × N matrix that extracts the (ij) block from the image. The expression for local SSIM, S(x_ij, y_ij), is given by (7). N_l is the total number of local windows and can be calculated as

N_{l} = \frac{1}{N_{w}} tr (\sum_{i j} R_{i j}^{T} R_{i j}) .

(33)

where tr(·) denotes the trace of a matrix.
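The pooling in (32) and the window count in (33) can be sketched as follows for images stored as lists of rows. The names `global_ssim` and `count_windows`, the stride of one pixel, and the default constants are assumptions of this sketch. With a stride of one, each R_ijᵀR_ij is a diagonal matrix with N_w ones, so the trace expression in (33) reduces to the number of window positions.

```python
def _ssim(a, y, c1=6.5025, c2=58.5225):
    """Local SSIM of Eq. (7) for two flat patches (assumed default constants)."""
    n = len(a)
    mu_a, mu_y = sum(a) / n, sum(y) / n
    var_a = sum((p - mu_a) ** 2 for p in a) / (n - 1)
    var_y = sum((q - mu_y) ** 2 for q in y) / (n - 1)
    cov = sum((p - mu_a) * (q - mu_y) for p, q in zip(a, y)) / (n - 1)
    return ((2 * mu_a * mu_y + c1) / (mu_a ** 2 + mu_y ** 2 + c1)
            * (2 * cov + c2) / (var_a + var_y + c2))


def count_windows(rows, cols, w):
    """N_l of Eq. (33) for a w-by-w window moved one pixel at a time."""
    return (rows - w + 1) * (cols - w + 1)


def global_ssim(x_img, y_img, w=8):
    """Global SSIM of Eq. (32): the mean of the local SSIM map."""
    rows, cols = len(x_img), len(x_img[0])
    total = 0.0
    for i in range(rows - w + 1):
        for j in range(cols - w + 1):
            # R_ij extracts the w-by-w block whose top-left corner is (i, j).
            xb = [x_img[r][c] for r in range(i, i + w) for c in range(j, j + w)]
            yb = [y_img[r][c] for r in range(i, i + w) for c in range(j, j + w)]
            total += _ssim(xb, yb)
    return total / count_windows(rows, cols, w)
```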

We use a gradient-descent approach to solve the optimization problem given by (6). The update equation is given by

\begin{align} {\hat{X}}_{k + 1} & = {\hat{X}}_{k} + λ {\vec{\nabla}}_{Y} S (X, Y) \\ & = {\hat{X}}_{k} + λ \frac{1}{N_{l}} {\vec{\nabla}}_{Y} \sum_{i j} S (x_{i j}, y_{i j}) \\ & = {\hat{X}}_{k} + λ \frac{1}{N_{l}} \sum_{i j} R_{i j}^{T} {\vec{\nabla}}_{y} S (x_{i j}, y_{i j}) \end{align}

(34)

where

\begin{gathered} {\vec{\nabla}}_{y} S (x, y) = \frac{2}{N_{w} B_{1}^{2} B_{2}^{2}} [A_{1} B_{1} (B_{2} x - A_{2} y) + B_{1} B_{2} (A_{2} - A_{1}) μ_{x} 1 + A_{1} A_{2} (B_{1} - B_{2}) μ_{y} 1], \\ \begin{matrix} A_{1} = 2 μ_{x} μ_{y} + C_{1}, & A_{2} = 2 σ_{x y} + C_{2}, \\ B_{1} = μ_{x}^{2} + μ_{y}^{2} + C_{1}, & B_{2} = σ_{x}^{2} + σ_{y}^{2} + C_{2}, \end{matrix} \end{gathered}

(35)

where N_w is the number of pixels in the local image patch, and μ_x, $σ_{x}^{2}$ and σ_xy represent the sample mean of x, the sample variance of x, and the sample covariance of x and y, respectively. Equation (34) suggests that the gradients of the local patches are averaged to obtain the global SSIM gradient, and thus the direction and distance of the k-th update in $\hat{X}$ . More details regarding the computation of the SSIM gradient can be found in [42]. In our experiments, we found that this gradient-based approach is well behaved and that it takes only a few iterations for $\hat{X}$ to converge to a stationary point. We initialize $\hat{X}$ as the minimal-MSE solution and then follow this iterative procedure to solve (6).
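A patch-level sketch of the gradient expression in (35) and of a plain ascent step is given below. The function names are hypothetical and the default constants are assumed values. One further assumption is labeled in the code: the variances and the covariance are normalized by N_w rather than N_w - 1, which is the normalization under which the stated prefactor 2/(N_w B₁² B₂²) is exact.

```python
def ssim_and_gradient(x, y, c1=6.5025, c2=58.5225):
    """Return S(x, y) and its gradient with respect to y, following Eq. (35).

    Assumption: variances and covariance are normalized by N_w (not N_w - 1),
    so that the 2 / (N_w * B1^2 * B2^2) prefactor is exact.
    """
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((p - mu_x) ** 2 for p in x) / n
    var_y = sum((q - mu_y) ** 2 for q in y) / n
    cov = sum((p - mu_x) * (q - mu_y) for p, q in zip(x, y)) / n
    a1 = 2 * mu_x * mu_y + c1
    a2 = 2 * cov + c2
    b1 = mu_x ** 2 + mu_y ** 2 + c1
    b2 = var_x + var_y + c2
    s = (a1 * a2) / (b1 * b2)
    scale = 2.0 / (n * b1 ** 2 * b2 ** 2)
    grad = [scale * (a1 * b1 * (b2 * xi - a2 * yi)
                     + b1 * b2 * (a2 - a1) * mu_x
                     + a1 * a2 * (b1 - b2) * mu_y)
            for xi, yi in zip(x, y)]
    return s, grad


def ascend(x_ref, y_start, step, iters):
    """Move y along the SSIM gradient for a fixed number of iterations."""
    y = list(y_start)
    for _ in range(iters):
        _, grad = ssim_and_gradient(x_ref, y)
        y = [yi + step * gi for yi, gi in zip(y, grad)]
    return y
```

For a whole image, the same local gradients would be placed back at their window positions and averaged, as in the last line of (34); the step size plays the role of λ there and has to be chosen by the user.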

3 Applications

The framework we proposed provides a general approach that can be used for different applications. To show the effectiveness of our method we will provide two applications: image denoising and super-resolution.

3.1 Image denoising

We use the SSIM-based sparse representations framework developed in Sections 2.2 and 2.3 to perform the task of image denoising. The noise-contaminated image is obtained using the following equation

Y = X + N,

(36)

where Y is the observed distorted image, X is the noise-free image and N is additive Gaussian noise. Our goal is to remove the noise from the distorted image. Here we train a dictionary, Ψ, in whose domain the original image can be represented sparsely. We use the KSVD method [28] to train the dictionary. In this method, the dictionary is trained directly over the noisy image, and denoising is done in parallel. We initialize the dictionary with the discrete cosine transform (DCT) dictionary and run a fixed number of iterations, J. In each step we update the image and then the dictionary. First, based on the current dictionary, sparse coding is done for each patch, and then KSVD is used to update the dictionary (the interested reader can refer to [28] for details of dictionary updating). Finally, after repeating this procedure J times, we execute a global reconstruction stage, following the gradient descent procedure. The proposed image denoising algorithm is summarized in Algorithm 2.

The proposed image denoising scheme is tested on various images with different amounts of noise. In all the experiments, the dictionary used was of size 64 × 256, designed to handle patches of 8 × 8 pixels. The value of the noise gain, C, is selected to be 1.15 and λ = 30/σ [3]. Table 1 shows the results for the images Barbara, Lena, Peppers, and House. It also compares the K-SVD method [3] with the proposed denoising method. It can be observed that the proposed denoising method achieves better performance in terms of SSIM, which is expected to imply better perceptual quality of the denoised image. Figures 2 and 3 show the denoised images using K-SVD [3] and the proposed method along with the corresponding SSIM maps. It can be observed that the SSIM-based method performs better especially in the texture regions, which confirms that the proposed denoising scheme preserves the structures better and therefore has better perceptual image quality.

Table 1 SSIM and PSNR comparisons of image denoising results

3.2 Image super-resolution

In this section we demonstrate the performance of the SSIM-based sparse representations when used for image super-resolution. In this problem, a low resolution image, Y, is given and a high resolution version of the image, X, is required as output. We assume that the low resolution image is produced from high resolution image based on the following equation:

Y = D H X,

(37)

where H represents a blurring matrix, and D is a downsampling matrix. We use a local sparsity model as a prior to regularize this problem, which has infinitely many solutions that satisfy (37). Our approach is motivated by recent results in sparse signal representation, which suggest that the linear relationships among high-resolution signals can be accurately recovered from their low-dimensional projections. Here, we work with two coupled dictionaries, Ψ_h for high-resolution patches, and Ψ_l for low-resolution ones. The sparse representation of a low-resolution patch in terms of Ψ_l will be directly used to recover the corresponding high-resolution patch from Ψ_h [20]. Given these two dictionaries, each corresponding patch of the low-resolution image, y, and the high-resolution image, x, can be represented sparsely with the same coefficient vector, α in Algorithm 2.

y = Ψ_{l} α

(38)

x = Ψ_{h} α

(39)
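The role of the shared coefficient vector in (38) and (39) can be illustrated with a toy Python sketch. The dictionaries below are hypothetical and tiny (three atoms, low-resolution patches of length 2 and high-resolution patches of length 4); each low-resolution atom is the pairwise average of its high-resolution counterpart, which is only one assumed way of coupling the two dictionaries.

```python
def synthesize(dictionary, alpha):
    """Return the patch built from a sparse code.

    dictionary  list of atoms, each atom a list of pixel values
    alpha       sparse code as a dict {atom_index: coefficient}
    """
    length = len(dictionary[0])
    patch = [0.0] * length
    for idx, coef in alpha.items():
        for p in range(length):
            patch[p] += coef * dictionary[idx][p]
    return patch


# Hypothetical coupled dictionaries: psi_l[i] is psi_h[i] averaged over pixel pairs.
psi_h = [[1, 1, -1, -1], [1, 3, -3, -1], [0, 2, 2, 0]]
psi_l = [[1, -1], [2, -2], [1, 1]]

alpha = {0: 2.0, 2: 0.5}             # sparse code, e.g. found on the low-resolution patch
y_low = synthesize(psi_l, alpha)      # Eq. (38)
x_high = synthesize(psi_h, alpha)     # Eq. (39)
```

In this toy setting, averaging neighbouring pairs of `x_high` reproduces `y_low`, so the code found on the low-resolution patch yields a high-resolution patch that is consistent with it.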

The patch at each location of the low-resolution image that needs to be scaled up is extracted and sparsely coded with the help of the SSIM-optimal Algorithm 1. Once the sparse coefficients, α, are obtained, the high-resolution patches, x, are computed using (39) and finally merged by averaging in the overlap area to create the resulting image. The proposed image super-resolution algorithm is summarized in Algorithm 3:

The proposed image super-resolution scheme is tested on various images. To be consistent with [20], patches of 5 × 5 pixels were used on the low-resolution image. Each patch is converted to a vector of length 25. The dictionaries are trained using KSVD [3] with sizes of 25 × 1024 and 100 × 1024 for the low- and high-resolution dictionaries, respectively. 66 natural images are used for dictionary training, which are also used in [43] for a similar purpose. To remove artifacts at the patch edges, we set an overlap of one pixel during patch extraction from the image. A fixed number of atoms (3) was used in [20] in the sparse coding stage. SSIM-OMP, however, determines the number of atoms adaptively from patch to patch, based on the importance of the patch according to the SSIM measure. In order to calculate the threshold T_ssim defined in (14), T_mse is calculated using the MSE-based sparse coding stage in [20]. After calculating the sparse representation for all the low-resolution patches, we use them to reconstruct the patches, and then the difference from the original patch is calculated. We set T_mse to the average of these differences. The performance comparison with the state-of-the-art method is given in Table 2. It can be observed that the proposed algorithm consistently outperforms the other methods in terms of SSIM evaluations. It is also interesting to observe PSNR improvements in some cases, though PSNR is not the optimization goal of the proposed approach. The improvements are not always consistent (for example, PSNR drops in some cases in Table 1, while SSIM always improves). There are complicated reasons behind these results. It should be noted that the so-called "MSE-optimal" algorithms include many suboptimal and heuristic steps and thus have the potential to be improved even in the MSE sense. Our methods differ from the "MSE-optimal" methods in multiple stages. Although the differences are made to improve SSIM, they may have a positive impact on MSE as well.

For example, when using the learned dictionary to reconstruct an image patch, if SSIM is used to replace MSE in selecting the atoms in the dictionary, then essentially the set of accepted atoms in the dictionary has been changed. In particular, since SSIM is variance normalized, the acceptable reconstructed patches near the noisy patch may be structurally similar but significantly different in variance. This may lead to different selections of the atoms in the dictionary, which, when appropriately scaled to approximate the noisy patch, may result in a better reconstruction. Although the visual and SSIM improvements are only moderate, these are promising results as an initial attempt at incorporating a perceptually more meaningful measure into the optimization problem of the KSVD-based super-resolution method. Figures 4 and 5 compare the reconstructed images obtained using [5] and the proposed method for the Raccoon and the Girl images, respectively. It can be seen that the proposed scheme preserves many local structures better and therefore has better perceptual image quality. The visual quality improvement is also reflected in the corresponding SSIM maps, which provide useful guidance on how local image quality is improved over space. It can be observed from the SSIM maps that the areas which are relatively more structured benefit more from the proposed algorithm, as the quality measure used is better than MSE at calculating the similarity of structures.

Table 2 SSIM and PSNR comparisons of image super-resolution results

4 Conclusions

In this article, we attempt to combine perceptual image fidelity measurement with optimal sparse signal representation in the context of image denoising and image super-resolution, with the aim of improving two state-of-the-art algorithms in these areas. We proposed an algorithm to solve for the optimal coefficients of a sparse and redundant dictionary in the maximal-SSIM sense. We also developed a gradient descent approach to achieve the best compromise between the distorted image and the image reconstructed using sparse representation. Our simulations demonstrate promising results and indicate the potential of SSIM to replace the ubiquitous PSNR/MSE as the optimization criterion in image processing applications. It must be kept in mind that this is only an early attempt along a new but promising direction; the main contribution of the current work lies mostly in the general framework and theoretical development. Significant improvement in visual quality can be expected from an SSIM-based dictionary learning process, since the dictionary encapsulates the prior knowledge about the image to be restored. An SSIM-optimal dictionary should capture the structures contained in the image better, so that the restoration task results in a sharper output image. Further improvement is also expected in the future when some of the advanced mathematical properties of SSIM and normalized metrics [12] are incorporated into the optimization framework.

Algorithm 1: SSIM-inspired OMP

Initialize: D = {} (set of selected atoms), S_opt = 0, r = y

while S_opt < T_ssim

Add the next best atom in the $ℒ_{2}$ sense to D
Find the optimal $ℒ_{2}$-based coefficient(s) using (15)
Find the optimal SSIM-based coefficient(s) using (27) and (31)
Update the residual r
Find the SSIM-based approximation a
Calculate S_opt = S(a, y)

end
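The control flow of Algorithm 1 can be illustrated with a short NumPy sketch. It is a simplified stand-in rather than the authors' implementation: the SSIM-optimal coefficient update of (27) and (31) is not reproduced here, so the least-squares coefficients are used directly, and SSIM is computed over the whole patch as a single window with the standard constants. The function names and the `max_atoms` safeguard are assumptions added for illustration.

```python
import numpy as np

def ssim_patch(a, y, dynamic_range=255.0):
    """SSIM index between two patch vectors (single window, uniform weights)."""
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mu_a, mu_y = a.mean(), y.mean()
    var_a, var_y = a.var(), y.var()
    cov = np.mean((a - mu_a) * (y - mu_y))
    num = (2 * mu_a * mu_y + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_y ** 2 + c1) * (var_a + var_y + c2)
    return num / den

def ssim_stopped_omp(D, y, t_ssim, max_atoms=None):
    """Greedy atom selection that stops once SSIM(approximation, y) >= t_ssim."""
    y = np.asarray(y, dtype=float)
    n_atoms = D.shape[1]
    max_atoms = n_atoms if max_atoms is None else max_atoms
    selected, coeffs = [], np.zeros(0)
    residual = y.copy()
    s_opt = 0.0
    while s_opt < t_ssim and len(selected) < max_atoms:
        # Next best atom in the L2 sense: largest correlation with the residual.
        scores = np.abs(D.T @ residual)
        if selected:
            scores[selected] = -1.0  # never pick the same atom twice
        selected.append(int(np.argmax(scores)))
        # L2-optimal coefficients on the selected atoms (least squares).
        coeffs, *_ = np.linalg.lstsq(D[:, selected], y, rcond=None)
        # Assumption: the SSIM-optimal rescaling of the coefficients is omitted.
        approx = D[:, selected] @ coeffs
        residual = y - approx
        s_opt = ssim_patch(approx, y)
    alpha = np.zeros(n_atoms)
    for idx, c in zip(selected, coeffs):
        alpha[idx] = c
    return alpha, s_opt
```

Because the loop stops on an SSIM threshold rather than on a fixed atom count, the number of nonzero entries in `alpha` varies from patch to patch, which is the adaptive behaviour the text attributes to SSIM-OMP.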

Algorithm 2: SSIM-inspired image denoising

1. Initialize: X = Y, Ψ = overcomplete DCT dictionary
2. Repeat J times
- Sparse coding stage: use SSIM-optimal OMP to compute the representation vectors α_ij for each patch
- Dictionary update stage: use K-SVD [28] to calculate the updated dictionary and coefficients, then calculate the SSIM-optimal coefficients using (27) and (31)
3. Global reconstruction: use a gradient descent algorithm to optimize (6), where the SSIM gradient is given by (35).
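The initialization step of Algorithm 2 relies on an overcomplete DCT dictionary. A common separable construction, as used in many K-SVD denoising implementations, is sketched below. The patch size of 8 and the 16 one-dimensional atoms (256 two-dimensional atoms) are assumed values chosen for illustration; this section does not state the sizes used for denoising.

```python
import numpy as np

def overcomplete_dct(patch_size=8, atoms_1d=16):
    """Build a (patch_size**2, atoms_1d**2) separable overcomplete DCT dictionary."""
    n = np.arange(patch_size)
    d1 = np.zeros((patch_size, atoms_1d))
    for k in range(atoms_1d):
        v = np.cos(n * k * np.pi / atoms_1d)
        if k > 0:
            v = v - v.mean()  # remove the DC component from non-constant atoms
        d1[:, k] = v / np.linalg.norm(v)
    # The Kronecker product of the 1-D dictionary with itself gives 2-D atoms,
    # one flattened patch per column, each with unit norm.
    return np.kron(d1, d1)
```

Each column can be reshaped to `(patch_size, patch_size)` to view the corresponding atom as an image patch.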

Algorithm 3: SSIM-inspired image super resolution

1. Dictionary training phase: train the high- and low-resolution dictionaries Ψ_h and Ψ_l as in [20]
2. Reconstruction phase
- Sparse coding stage: use SSIM-optimal OMP to compute the representation vectors α_ij for all the patches of the low-resolution image
- High-resolution patch reconstruction: reconstruct the high-resolution patches as Ψ_h α_ij
3. Global reconstruction: merge the high-resolution patches by averaging over the overlapped regions to create the high-resolution image.
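The final merging step of Algorithm 3, averaging the reconstructed patches over their overlapped regions, can be sketched as follows. The function name and the representation of patch locations as top-left `(row, column)` pairs are assumptions made for this example.

```python
import numpy as np

def merge_patches(patches, positions, out_shape):
    """Average overlapping 2-D patches into a single image.

    patches:   iterable of 2-D arrays (reconstructed high-resolution patches)
    positions: iterable of (row, col) top-left corners, one per patch
    out_shape: (height, width) of the output image
    """
    acc = np.zeros(out_shape, dtype=float)     # running sum of patch values
    weight = np.zeros(out_shape, dtype=float)  # number of patches covering each pixel
    for patch, (r, c) in zip(patches, positions):
        h, w = patch.shape
        acc[r:r + h, c:c + w] += patch
        weight[r:r + h, c:c + w] += 1.0
    weight[weight == 0] = 1.0  # pixels covered by no patch are left at zero
    return acc / weight
```

When the patches are exact crops of one image, the merged result reproduces that image wherever at least one patch covers a pixel; with reconstructed patches, the averaging smooths disagreements between neighbouring patches in the overlap.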

References

1. Dabov K, Foi A, Katkovnik V, Egiazarian K: Image denoising by sparse 3D transform-domain collaborative filtering. IEEE Trans Image Process 2007, 16: 2080-2095.
2. Buades A, Coll B, Morel JM: A review of image denoising algorithms, with a new one. Multiscale Model Simul 2005, 4(2):490-530. 10.1137/040616024
3. Elad M, Aharon M: Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans Image Process 2006, 15(12):3736-3745.
4. Hou H, Andrews H: Cubic splines for image interpolation and digital filtering. IEEE Trans Signal Process 1978, 26: 508-517. 10.1109/TASSP.1978.1163154
5. Yang J, Wright J, Huang T, Ma Y: Image super-resolution via sparse representation. IEEE Trans Image Process 2010, 19(11):2861-2873.
6. Yang J, Wright J, Huang TS, Ma Y: Image super-resolution as sparse representation of raw image patches. Proc IEEE Comput Vis Pattern Recognit 2008, 1-8.
7. Wang Z, Bovik AC: Mean squared error: love it or leave it? A new look at signal fidelity measures. IEEE Signal Process Mag 2009, 26: 98-117.
8. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP: Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 2004, 13(4):600-612. 10.1109/TIP.2003.819861
9. Joint Video Team (JVT) Reference Software [Online] [http://iphome.hhi.de/suehring/tml/download/old_jm]
10. Gao Y, Rehman A, Wang Z: CW-SSIM based image classification. In IEEE International Conference on Image Processing (ICIP). Brussels, Belgium; 2011:1249-1252.
11. Piella G, Heijmans H: A new quality metric for image fusion. In IEEE International Conference on Image Processing (ICIP). Volume 3. Barcelona, Spain; 2003:173-176.
12. Brunet D, Vrscay ER, Wang Z: On the Mathematical Properties of the Structural Similarity Index (Preprint). University of Waterloo, Waterloo; 2011. [http://www.math.uwaterloo.ca/~dbrunet/]
13. Channappayya SS, Bovik AC, Caramanis C, Heath R: Design of linear equalizers optimized for the structural similarity index. IEEE Trans Image Process 2008, 17(6):857-872.
14. Wang Z, Li Q, Shang X: Perceptual image coding based on a maximum of minimal structural similarity criterion. IEEE Int Conf Image Process 2007, 2: II-121-II-124.
15. Rehman A, Wang Z: SSIM-based non-local means image denoising. In IEEE International Conference on Image Processing (ICIP). Brussels, Belgium; 2011:1-4.
16. Wang S, Rehman A, Wang Z, Ma S, Gao W: Rate-SSIM optimization for video coding. In IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP 11). Prague, Czech Republic; 2011:833-836.
17. Ou T, Huang Y, Chen H: A perceptual-based approach to bit allocation for H.264 encoder. SPIE Visual Communications and Image Processing 2010, 77441B.
18. Mai Z, Yang C, Kuang K, Po L: A novel motion estimation method based on structural similarity for H.264 inter prediction. In IEEE Int Conf Acoust Speech Signal Process. Volume 2. Toulouse; 2006:913-916.
19. Yang C, Wang H, Po L: Improved inter prediction based on structural similarity in H.264. In IEEE Int Conf Signal Process Commun. Volume 2. Dubai; 2007:340-343.
20. Zeyde R, Elad M, Protter M: On single image scale-up using sparse-representations. In Curves & Surfaces. Avignon, France; 2010:711-730.
21. Savitzky A, Golay MJE: Smoothing and differentiation of data by simplified least squares procedures. Anal Chem 1964, 36: 1627-1639. 10.1021/ac60214a047
22. Tikhonov AN, Arsenin VY: Solutions of Ill-Posed Problems. V. H. Winston, Washington DC; 1977.
23. Rudin LI, Osher S, Fatemi E: Nonlinear total variation based noise removal algorithms. Physica D 1992, 60: 259-268. 10.1016/0167-2789(92)90242-F
24. Protter M, Elad M: Image sequence denoising via sparse and redundant representations. IEEE Trans Image Process 2009, 18: 27-35.
25. Mairal J, Sapiro G, Elad M: Learning multiscale sparse representations for image and video restoration. Multiscale Model Simul 2008, 7: 214-241. 10.1137/070697653
26. Candès EJ, Romberg J, Tao T: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans Inf Theory 2006, 52(2):489-509.
27. Donoho DL: Compressed sensing. IEEE Trans Inf Theory 2006, 52(4):1289-1306.
28. Aharon M, Elad M, Bruckstein A: K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans Signal Process 2006, 54(11):4311-4322.
29. Pati Y, Rezaiifar R, Krishnaprasad P: Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition. In Twenty Seventh Asilomar Conference on Signals, Systems and Computers. Volume 1. Pacific Grove, CA; 1993:40-44.
30. Brunet D, Vrscay ER, Wang Z: Structural similarity-based approximation of signals and images using orthogonal bases. In Proc Int Conf on Image Analysis and Recognition. Edited by: M Kamel, A Campilho. Springer, Heidelberg; 2010:11-22. vol. 6111 of LNCS
31. Wang S, Rehman A, Wang Z, Ma S, Gao W: SSIM-inspired divisive normalization for perceptual video coding. In IEEE International Conference on Image Processing (ICIP). Brussels, Belgium; 2011:1657-1660.
32. Wainwright MJ, Simoncelli EP: Scale mixtures of Gaussians and the statistics of natural images. Adv Neural Inf Process Syst 2000, 12: 855-861.
33. Lyu S, Simoncelli EP: Statistically and perceptually motivated nonlinear image representation. In Proc SPIE Conf Human Vision Electron Imaging XII. Volume 6492. San Jose, CA; 2007:649207-1-649207-15.
34. Foley J: Human luminance pattern mechanisms: masking experiments require a new model. J Opt Soc Am 1994, 11: 1710-1719. 10.1364/JOSAA.11.001710
35. Watson AB, Solomon JA: Model of visual contrast gain control and pattern masking. J Opt Soc Am 1997, 14: 2379-2391. 10.1364/JOSAA.14.002379
36. Heeger DJ: Normalization of cell responses in cat striate cortex. Vis Neurosci 1992, 9: 181-198.
37. Simoncelli EP, Heeger DJ: A model of neuronal responses in visual area MT. Vis Res 1998, 38: 743-761. 10.1016/S0042-6989(97)00183-1
38. Li Q, Wang Z: Reduced-reference image quality assessment using divisive normalization-based image representation. IEEE J Sel Top Signal Process 2009, 3: 202-211.
39. Rehman A, Wang Z: Reduced-reference SSIM estimation. In International Conference on Image Processing. Hong Kong, China; 2010:289-292.
40. Malo J, Epifanio I, Navarro R, Simoncelli EP: Non-linear image representation for efficient perceptual coding. IEEE Trans Image Process 2006, 15: 68-80.
41. Portilla J, Strela V, Wainwright MJ, Simoncelli EP: Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Trans Image Process 2003, 12: 1338-1351. 10.1109/TIP.2003.818640
42. Wang Z, Simoncelli EP: Maximum differentiation (MAD) competition: a methodology for comparing computational models of perceptual quantities. J Vis 2008, 8(12):1-13. 10.1167/8.12.1
43. Yang J, Wang Z, Lin Z, Huang T: Coupled dictionary training for image super-resolution. 2011. [http://www.ifp.illinois.edu/~jyang29/]

Acknowledgements

This work was supported in part by the Natural Sciences and Engineering Research Council of Canada and in part by the Ontario Early Researcher Award program, which are gratefully acknowledged.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, N2L 3G1, Canada
Abdul Rehman, Mohammad Rostami & Zhou Wang
Department of Applied Mathematics, University of Waterloo, Waterloo, ON, N2L 3G1, Canada
Dominique Brunet & Edward R Vrscay

Corresponding author

Correspondence to Abdul Rehman.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Rehman, A., Rostami, M., Wang, Z. et al. SSIM-inspired image restoration using sparse representation. EURASIP J. Adv. Signal Process. 2012, 16 (2012). https://doi.org/10.1186/1687-6180-2012-16

Received: 06 June 2011
Accepted: 20 January 2012
Published: 20 January 2012
DOI: https://doi.org/10.1186/1687-6180-2012-16
