Image inpainting based on sparse representations with a perceptual metric
EURASIP Journal on Advances in Signal Processing volume 2013, Article number: 179 (2013)
Abstract
This paper presents an image inpainting method based on sparse representations optimized with respect to a perceptual metric. In the proposed method, the structural similarity (SSIM) index is used as the criterion for optimizing the representation performance of image data. Specifically, the proposed method formulates two important procedures of the sparse representation problem, 'estimation of sparse representation coefficients' and 'update of the dictionary', based on the SSIM index. Then, using the generated dictionary, target patches including missing areas are approximated via the SSIM-based sparse representation. Consequently, an image inpainting method whose procedures are entirely derived from the SSIM index is realized. Experimental results show that the proposed method successfully inpaints missing areas.
1 Introduction
In the field of image processing, there are many studies on image restoration/enhancement, such as image denoising [1–3], image deblurring [4, 5], and image inpainting [6]. Furthermore, it is well known that the performance of these methods has improved rapidly in recent years [1, 2, 4]. Missing area reconstruction is one of the most attractive topics in image restoration since it has a number of applications. Removal of unnecessary objects, reconstruction of missing blocks in error-prone wireless communication environments, and restoration of corrupted old films are representative applications. Because missing area reconstruction serves so many applications, it goes by various names, including inpainting, image completion, error concealment, and blotch and scratch removal. In this paper, we use 'inpainting' since it is one of the most common names in this research field.
Many inpainting methods for the above applications have been proposed [7–45]. Most of them can be broadly classified into two categories: missing structure reconstruction [7–18] and missing texture reconstruction [21–45]. In addition, several inpainting methods that combine the structure and texture reconstruction approaches have been proposed [20, 42].
Variational image inpainting methods, which aim at successful reconstruction of structure components, have traditionally been studied. Variational inpainting is performed based on the continuity of the geometrical structure of images, and most variational methods solve partial differential equations (PDEs). One of the pioneering works was proposed by Masnou et al. [7]. Furthermore, Bertalmio et al. proposed a representative PDE-based image inpainting technique. In addition to these methods, several improved methods have recently been proposed [12–15]. Although these variational methods successfully reconstruct structure components, images also contain another important component, namely texture, for which alternative methods tend to provide better results. The remainder of this paper focuses on the reconstruction of textures.
Pioneering work based on texture synthesis was reported by Efros et al. [21]. Their method is based on the Markov random field model, and inpainting is realized by copying known pixels within the target image. It is well known that their method can successfully inpaint pure texture images. In recent years, their ideas have been improved by many researchers [22–30].
Drori et al. [23] and Criminisi et al. [24] developed more accurate inpainting techniques. Drori et al. proposed a fragment-based image completion algorithm that can preserve not only textures but also structures within target images. Criminisi et al. proposed an exemplar-based inpainting method, which has become a benchmark method in this field. Their method adopts a patch-based greedy sampling algorithm, which makes inpainting faster and simpler. Recently, many improved versions of this exemplar-based inpainting method have been proposed [25–29]. Specifically, Meur et al. proposed multiresolution analysis-based inpainting approaches using the exemplar-based method [28, 29]. Kwok et al. proposed a much faster inpainting method that introduces efficient schemes for calculating patch similarities in exemplar-based inpainting [30]. They also reported that their method provided better results than previously reported methods in some cases.
The above existing methods based on texture synthesis and exemplar-based inpainting generally copy pixel values directly into missing areas. Thus, if target images contain uniform and simple textures, these methods can perform accurate inpainting. Otherwise, it becomes difficult to approximate missing textures using only the best-matched examples. Therefore, many inpainting methods have been proposed that approximate patches including missing areas using subspaces generated from known areas within target images. In these methods, target patches are generally represented by linear combinations of bases that span the obtained subspaces, so the inpainting performance depends on both the generated subspaces and the coefficients of the linear combination. Amano et al. proposed a principal component analysis (PCA)-based missing area inpainting method using back projection for lost pixels [31]. They utilized an eigenspace that enabled derivation of an inverse projection for the inpainting. Several inpainting methods that introduce kernel methods into PCA-based subspace construction have also been proposed [32–35]. Nonlinear eigenspaces enable successful representation of image data, i.e., these methods are suitable for approximating nonlinear structures in images.
Recently, sparse representation for image inpainting has been intensively studied. Sparse representation enables adaptive selection of the bases best suited for approximating target images [36, 37], which means that the subspaces used for inpainting can be provided adaptively. Accordingly, several inpainting methods using sparse representation have been proposed [38–42]. In particular, Xu et al. have shown the effective use of sparse representation for image inpainting [41]: their method introduces sparsity-based modeling of patch priority and patch representation, which are two crucial steps for patch propagation in exemplar-based inpainting. Along similar lines, several inpainting methods based on neighbor embedding have been proposed [43, 44]; these methods are derived from manifold learning and provide good results. Furthermore, inpainting methods based on rank minimization have also been proposed [45].
The above-described existing methods are based on least-squares approximation, which means that they perform inpainting that minimizes the mean square error (MSE) of intensities, the most popular metric. However, several works [46, 47] show that MSE-optimal algorithms do not necessarily provide high visual quality. Thus, MSE may not be an appropriate quality measure for inpainting. It should be noted that methods using kernel PCA (KPCA) [32, 33], such as those in [34] and [35], try to approximate nonlinear image features. These methods perform least-squares approximation in high-dimensional nonlinear feature spaces, and improvements in performance have been reported in some cases.
Recently, image quality assessment has attracted attention as a way to overcome the problems of MSE and its variants. Criteria such as the noise quality measure [48], the information fidelity criterion [49], and visual information fidelity [50] are well known as perceptual distortion measures, and their performance has been evaluated in detail [51]. The structural similarity (SSIM) index [52] is one of the most representative quality measures and is used in many fields of image processing. Since its formulation is simple and easy to analyze, the SSIM index can be applied not only to image quality assessment but also to the design of linear equalizers [53]. Therefore, successful inpainting based on this quality measure can be expected.
In this paper, we present an inpainting method based on sparse representations optimized with respect to a perceptual metric. To perform inpainting using sparse representation, the SSIM index is used as the criterion for optimizing the representation performance.
Specifically, the proposed method introduces the SSIM-based criterion into two important procedures of the sparse representation problem, i.e., 'estimation of the sparse representation coefficients' and 'update of the dictionary'. This is the biggest difference between the proposed method and existing methods. Then, by deriving the sparse representation of target patches including missing areas based on the generated dictionary, inpainting based on the SSIM index is realized. Note that since the optimization problems maximizing the SSIM index in this approach are nonconvex, the computation scheme in [53] is adopted to reformulate them as quasiconvex problems. In the proposed method, the optimal subspace can be adaptively provided for each target patch using sparse representation. Furthermore, since the SSIM index, which is a better perceptual criterion than the traditional MSE and its variants, is used, successful inpainting can be expected.
A similar approach has also been proposed by Rehman et al. for noise removal and super-resolution [54]. In contrast, we present a new scheme for inpainting in this paper, which is a different target application from those in [54]. Moreover, the algorithms in our method for estimating the sparse representation coefficients and generating the dictionary differ from those of Rehman et al. The biggest difference between our method and the method in [54] is the generation of the dictionary. Specifically, the existing method [54] obtains the dictionary by directly using the K-SVD algorithm [36], which is based on an MSE criterion (SVD stands for singular value decomposition). On the other hand, the proposed method obtains the dictionary based on the SSIM criterion, so that all of its procedures are based on the SSIM index.
This paper is organized as follows. In Section 2, we briefly explain sparse representation and the SSIM index, which are used in the proposed method, as preliminaries. Next, Section 3 gives an overview of the proposed method. An inpainting method via sparse representation based on the SSIM index is proposed in Section 4. Experimental results that verify the performance of the proposed method are shown in Section 5. Finally, conclusions are given in Section 6.
2 Preliminaries
In this section, we briefly explain sparse representation and the SSIM index used in the proposed method as preliminaries. They are presented in Sections 2.1 and 2.2, respectively.
2.1 Sparse representation
This subsection briefly explains sparse representation of signals, covering the basic sparse representation problem and the K-SVD algorithm [36], which is closely related to the proposed method.
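Given an overcomplete dictionary $\mathbf{D} \in \mathbb{R}^{n \times K}$ whose columns are prototype signal atoms $\mathbf{d}_j \in \mathbb{R}^{n}$ $(j = 1, 2, \dots, K)$, a target signal $\mathbf{y} \in \mathbb{R}^{n}$ can be represented as a sparse linear combination of these atoms$^{\mathrm{a}}$. Specifically, $\mathbf{y}$ is approximated as $\mathbf{y} \cong \mathbf{D}\mathbf{x}$ $(\mathbf{x} \in \mathbb{R}^{K})$, where $\mathbf{x}$ is a vector containing the representation coefficients of signal $\mathbf{y}$ and satisfies $\|\mathbf{y} - \mathbf{D}\mathbf{x}\|_p \le \varepsilon$. In this subsection, we assume $p = 2$.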
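If $n < K$ and $\mathbf{D}$ is a full-rank matrix, infinitely many solutions exist for the above representation problem. Thus, an additional sparsity constraint is introduced, and the solution is obtained by solving

$$\min_{\mathbf{x}} \|\mathbf{y} - \mathbf{D}\mathbf{x}\|_2^2 \quad \text{subject to} \quad \|\mathbf{x}\|_0 \le T, \quad (1)$$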
where $\|\cdot\|_0$ represents the $l^0$-norm, and $T$ determines the sparsity of the signals. Equation 1 gives the representation coefficient vector $\mathbf{x}$ minimizing the distance $\|\mathbf{y} - \mathbf{D}\mathbf{x}\|_2^2$ under the constraint that the number of nonzero elements in $\mathbf{x}$ is $T$ or less. For example, Figure 1a shows a sparse representation of a target vector $\mathbf{y}$ with $\|\mathbf{x}\|_0 = 6$, i.e., $\mathbf{x}$ has six nonzero elements. By limiting the number of nonzero elements in this way, we can determine the coefficients of the above linear combination. It is well known that calculating the optimal solution is a non-deterministic polynomial-time hard (NP-hard) problem [55]. Thus, several methods that approximately solve the above problem have been proposed, the simplest being matching pursuit (MP) [56] and orthogonal MP (OMP) [57–59]. The basis pursuit algorithm is another representative algorithm, which replaces the $l^0$-norm with the $l^1$-norm [60]. The focal underdetermined system solver is a similar algorithm that uses the $l^p$-norm $(p \le 1)$ [61].
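The greedy idea behind MP and OMP can be made concrete with a short sketch. The following NumPy function approximately solves Equation 1 by OMP; the name `omp`, the early-stopping tolerance, and the assumption of unit-norm atoms are our own choices rather than details taken from [57–59].

```python
import numpy as np


def omp(D, y, T, tol=1e-12):
    """Approximately solve Equation 1 by orthogonal matching pursuit.

    D: (n, K) dictionary with unit-norm columns, y: (n,) target signal,
    T: maximum number of nonzero coefficients. Returns x of shape (K,).
    """
    K = D.shape[1]
    x = np.zeros(K)
    support = []
    coef = np.zeros(0)
    residual = np.asarray(y, dtype=float).copy()
    for _ in range(T):
        # Choose the atom most correlated with the current residual.
        scores = np.abs(D.T @ residual)
        if support:
            scores[support] = -np.inf  # never pick the same atom twice
        support.append(int(np.argmax(scores)))
        # "Orthogonal" step: refit all selected coefficients by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break
    x[support] = coef
    return x
```

For example, with `D = np.eye(6)` and a signal that has exactly two nonzero entries, `omp(D, y, 2)` returns those two entries unchanged; with a genuinely overcomplete dictionary, OMP only approximates the NP-hard optimum.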
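Next, given a set of signal vectors $\mathbf{y}_i$ $(i = 1, 2, \dots, N)$, we seek a dictionary matrix that provides sparse representations $\mathbf{x}_i$ for all of them. The K-SVD algorithm [36] provides the dictionary matrix $\mathbf{D}$ and coefficient vectors $\mathbf{x}_i$ $(i = 1, 2, \dots, N)$ by approximately solving

$$\min_{\mathbf{D},\mathbf{X}} \|\mathbf{Y} - \mathbf{D}\mathbf{X}\|_F^2 \quad \text{subject to} \quad \forall i,\ \|\mathbf{x}_i\|_0 \le T, \quad (2)$$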
where $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N]$, $\mathbf{Y} = [\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_N]$, and $\|\cdot\|_F$ represents the Frobenius norm. Equation 2 seeks the dictionary matrix $\mathbf{D}$ and representation coefficient vectors $\mathbf{x}_i$ minimizing the sum of $\|\mathbf{y}_i - \mathbf{D}\mathbf{x}_i\|_2^2$ $(i = 1, 2, \dots, N)$ under the constraint that the number of nonzero elements in each $\mathbf{x}_i$ is $T$ or less. Figure 1b shows the relationship between $\mathbf{Y}$ and $\mathbf{D}\mathbf{X}$, where each $\mathbf{x}_i$ in $\mathbf{X}$ has six nonzero values in this example. The K-SVD algorithm approximately solves Equation 2 by alternating between calculating $\mathbf{x}_i$ $(i = 1, 2, \dots, N)$ with the OMP algorithm and updating the atoms $\mathbf{d}_j$ $(j = 1, 2, \dots, K)$ of $\mathbf{D}$ with singular value decomposition (SVD). Specifically, the representation coefficient vectors $\mathbf{x}_i$ are estimated one by one, and the atoms $\mathbf{d}_j$ of $\mathbf{D}$ are also updated one by one, with SVD used to provide an approximately optimal solution for each atom update efficiently.
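The SVD-based atom update of K-SVD can be sketched as follows. This is a minimal NumPy version of the update for a single atom, written by us under the assumption that the sparse codes `X` were computed beforehand (for instance by OMP); the function name `ksvd_update_atom` is hypothetical. Only the signals that currently use atom $j$ are touched, so the sparsity pattern of $\mathbf{X}$ is preserved.

```python
import numpy as np


def ksvd_update_atom(Y, D, X, j):
    """One K-SVD dictionary-update step for atom j.

    Y: (n, N) training signals, D: (n, K) dictionary, X: (K, N) sparse codes.
    Returns updated copies of D and X; only column j of D and row j of X change.
    """
    users = np.nonzero(X[j, :])[0]          # signals whose code uses atom j
    if users.size == 0:
        return D.copy(), X.copy()           # unused atom: left unchanged here
    # Residual of those signals with atom j's contribution added back.
    E = Y[:, users] - D @ X[:, users] + np.outer(D[:, j], X[j, users])
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D_new, X_new = D.copy(), X.copy()
    D_new[:, j] = U[:, 0]                   # best rank-1 direction (unit norm)
    X_new[j, users] = s[0] * Vt[0, :]       # matching coefficients
    return D_new, X_new
```

Because the leading singular pair gives the best rank-1 approximation of the restricted residual, this step never increases $\|\mathbf{Y} - \mathbf{D}\mathbf{X}\|_F$, which is what makes the least-squares update in K-SVD so efficient.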
2.2 Structural similarity index
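The SSIM index represents the similarity between two signal vectors $\mathbf{y}_1$ and $\mathbf{y}_2$ $(\in \mathbb{R}^{n})$ and is defined as follows:

$$\text{SSIM}(\mathbf{y}_1, \mathbf{y}_2) = \left[l(\mathbf{y}_1, \mathbf{y}_2)\right]^{\alpha} \cdot \left[c(\mathbf{y}_1, \mathbf{y}_2)\right]^{\beta} \cdot \left[s(\mathbf{y}_1, \mathbf{y}_2)\right]^{\gamma}, \quad (3)$$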
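where the terms $l(\mathbf{y}_1, \mathbf{y}_2)$ and $c(\mathbf{y}_1, \mathbf{y}_2)$ compare the means and variances of the two signal vectors, respectively, and $s(\mathbf{y}_1, \mathbf{y}_2)$ measures their structural correlation. Thus, from Equation 3, the similarity between two signal vectors is obtained from three similarities of their luminance, contrast, and structure components, i.e., $l(\mathbf{y}_1, \mathbf{y}_2)$, $c(\mathbf{y}_1, \mathbf{y}_2)$, and $s(\mathbf{y}_1, \mathbf{y}_2)$, which are closely related to the human visual system (HVS). The parameters $\alpha > 0$, $\beta > 0$, and $\gamma > 0$ determine the relative importance of the three components in Equation 3. The three terms are obtained as

$$l(\mathbf{y}_1, \mathbf{y}_2) = \frac{2\mu_{\mathbf{y}_1}\mu_{\mathbf{y}_2} + C_1}{\mu_{\mathbf{y}_1}^2 + \mu_{\mathbf{y}_2}^2 + C_1}, \quad (4)$$

$$c(\mathbf{y}_1, \mathbf{y}_2) = \frac{2\sigma_{\mathbf{y}_1}\sigma_{\mathbf{y}_2} + C_2}{\sigma_{\mathbf{y}_1}^2 + \sigma_{\mathbf{y}_2}^2 + C_2}, \quad (5)$$

$$s(\mathbf{y}_1, \mathbf{y}_2) = \frac{\sigma_{\mathbf{y}_1,\mathbf{y}_2} + C_3}{\sigma_{\mathbf{y}_1}\sigma_{\mathbf{y}_2} + C_3}. \quad (6)$$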
In the above equations, $\mu_{\mathbf{y}_1}$ and $\mu_{\mathbf{y}_2}$ are the means of $\mathbf{y}_1$ and $\mathbf{y}_2$, $\sigma_{\mathbf{y}_1}^2$ and $\sigma_{\mathbf{y}_2}^2$ are the variances of $\mathbf{y}_1$ and $\mathbf{y}_2$, and $\sigma_{\mathbf{y}_1,\mathbf{y}_2}$ is the cross covariance between $\mathbf{y}_1$ and $\mathbf{y}_2$. The constants $C_1$, $C_2$, and $C_3$ are necessary to avoid instability when the denominators are very close to zero.
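As shown in [52], setting $\alpha = \beta = \gamma = 1$ and $C_3 = C_2/2$ simplifies the SSIM index to

$$\text{SSIM}(\mathbf{y}_1, \mathbf{y}_2) = \frac{\left(2\mu_{\mathbf{y}_1}\mu_{\mathbf{y}_2} + C_1\right)\left(2\sigma_{\mathbf{y}_1,\mathbf{y}_2} + C_2\right)}{\left(\mu_{\mathbf{y}_1}^2 + \mu_{\mathbf{y}_2}^2 + C_1\right)\left(\sigma_{\mathbf{y}_1}^2 + \sigma_{\mathbf{y}_2}^2 + C_2\right)}. \quad (7)$$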
Note that in the proposed method shown in Section 4, $C_1 = (K_1 I_{\max})^2$ and $C_2 = (K_2 I_{\max})^2$, where $I_{\max} = 255$, $K_1 = 0.01$, and $K_2 = 0.03$. That is, $\alpha$, $\beta$, $\gamma$, $C_1$, $C_2$, and $C_3$ are all set to the values used in [52].
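To make Equations 3 to 7 concrete, the following NumPy sketch computes the three SSIM terms and the simplified index with the constants above ($C_1 = 6.5025$, $C_2 = 58.5225$, $C_3 = C_2/2$). The helper names are ours, and population ($1/n$) statistics are assumed, matching the patch statistics that Section 4.1 defines with a factor $1/wh$. The test checks numerically that $l \cdot c \cdot s$ coincides with Equation 7 when $\alpha = \beta = \gamma = 1$ and $C_3 = C_2/2$.

```python
import numpy as np

K1, K2, I_MAX = 0.01, 0.03, 255.0
C1 = (K1 * I_MAX) ** 2   # 6.5025
C2 = (K2 * I_MAX) ** 2   # 58.5225
C3 = C2 / 2


def ssim_components(y1, y2):
    """Luminance, contrast, and structure terms of Equations 4-6."""
    mu1, mu2 = y1.mean(), y2.mean()
    s1, s2 = y1.std(), y2.std()                  # population standard deviations
    s12 = np.mean((y1 - mu1) * (y2 - mu2))       # cross covariance
    l = (2 * mu1 * mu2 + C1) / (mu1**2 + mu2**2 + C1)
    c = (2 * s1 * s2 + C2) / (s1**2 + s2**2 + C2)
    s = (s12 + C3) / (s1 * s2 + C3)
    return l, c, s


def ssim(y1, y2):
    """Simplified SSIM index of Equation 7."""
    mu1, mu2 = y1.mean(), y2.mean()
    s12 = np.mean((y1 - mu1) * (y2 - mu2))
    return ((2 * mu1 * mu2 + C1) * (2 * s12 + C2)) / (
        (mu1**2 + mu2**2 + C1) * (y1.var() + y2.var() + C2))
```

The equivalence follows because, with $C_3 = C_2/2$, the factor $2\sigma_{\mathbf{y}_1}\sigma_{\mathbf{y}_2} + C_2$ in the numerator of $c$ cancels the denominator of $s$ after multiplying $s$ by $2/2$.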
The effectiveness of the SSIM index as a quality measure and its superiority to MSE and its variants are presented in detail in [47] and [52]. In general, MSE cannot reflect perceptual distortions: its value becomes high for images altered by distortions such as mean luminance shift, contrast stretch, spatial shift, spatial scaling, and rotation, even though these alterations cause negligible loss of subjective image quality. Conversely, blurring severely deteriorates image quality, yet its MSE can be lower than those of the above alterations. On the other hand, the SSIM index is defined by separately calculating three similarities in terms of luminance, contrast, and structure, which are derived on the basis of properties of the HVS not accounted for by MSE. Therefore, it is a better quality measure that addresses the above problem, as also confirmed in [47]. We can therefore expect that using this similarity for inpainting will provide successful results.
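A small numerical illustration of this point, constructed by us for this discussion rather than taken from [47] or [52]: two distorted versions of a 16-pixel patch have exactly the same MSE of 100, one produced by a mean luminance shift and one by reversing the texture around the mean, yet their SSIM values (Equation 7, population statistics) differ greatly.

```python
import numpy as np

C1, C2 = (0.01 * 255) ** 2, (0.03 * 255) ** 2


def ssim(a, b):
    """Simplified SSIM index (Equation 7) with population statistics."""
    ma, mb = a.mean(), b.mean()
    cov = np.mean((a - ma) * (b - mb))
    return ((2 * ma * mb + C1) * (2 * cov + C2)) / (
        (ma**2 + mb**2 + C1) * (a.var() + b.var() + C2))


def mse(a, b):
    return float(np.mean((a - b) ** 2))


pattern = np.tile([5.0, -5.0], 8)   # zero-mean texture with amplitude 5
y = 128.0 + pattern                 # reference patch (16 pixels)
shifted = y + 10.0                  # mean luminance shift by 10 gray levels
inverted = 128.0 - pattern          # same mean, texture reversed

print(mse(y, shifted), ssim(y, shifted))    # 100.0 and about 0.997
print(mse(y, inverted), ssim(y, inverted))  # 100.0 and about 0.079
```

MSE cannot distinguish the two cases, whereas SSIM treats the luminance shift as nearly harmless and the structural reversal as severe.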
Note that moment invariants take into account not only image features, such as mean and variance, but also image degradations, such as translation, scaling, and rotation, to generate invariants that allow images to be matched properly without setting any constants. Therefore, in the rest of this subsection, we discuss the advantages and disadvantages of using the SSIM index in comparison with moment invariants.
2.2.1 Advantage
In the proposed method, we use the SSIM index to represent the visual quality of inpainting results. The SSIM index is defined based on several characteristics of the HVS. As shown in Equations 3 to 7, the SSIM index is related to luminance masking, contrast masking, and correlation; that is, it is obtained from the three elements in Equations 4 to 6. Specifically, the first term, defined in Equation 4, is consistent with Weber's law, which states that the HVS is sensitive to relative rather than absolute luminance change. The second term, defined in Equation 5, is derived from the contrast masking characteristic: the HVS is less sensitive to a contrast change when the base contrast is high than when it is low. In the third term, defined in Equation 6, the structure comparison is conducted after luminance subtraction and contrast normalization; if we ignore $C_3$, it is equivalent to the correlation coefficient. In this way, the SSIM index is derived in a bottom-up manner from properties of the HVS, which means that the proposed method using the SSIM index can perform inpainting that takes the sensitivity of the HVS into account.
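The Weber's-law behavior of the luminance term can be checked directly. Writing $\mu_{\mathbf{y}_2} = (1+R)\mu_{\mathbf{y}_1}$ and neglecting $C_1$, Equation 4 becomes $2(1+R)/\left(1+(1+R)^2\right)$, which depends only on the relative change $R$. The short Python sketch below, with values chosen by us for illustration, evaluates Equation 4 for the same 10% change at two brightness levels and for the same absolute change of 5 gray levels.

```python
C1 = (0.01 * 255) ** 2


def luminance_term(mu1, mu2):
    """l(y1, y2) of Equation 4 as a function of the two patch means."""
    return (2 * mu1 * mu2 + C1) / (mu1**2 + mu2**2 + C1)


dark = luminance_term(50.0, 55.0)          # +10% on a dark patch
bright = luminance_term(200.0, 220.0)      # +10% on a bright patch
bright_abs = luminance_term(200.0, 205.0)  # +5 gray levels, as in the dark pair

print(dark, bright, bright_abs)  # about 0.9955, 0.9955, and 0.9997
```

Equal relative changes give almost the same value, while the same absolute change is penalized much less on the bright patch, as Weber's law predicts.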
2.2.2 Disadvantage
It is known that the SSIM index tends to be robust to translation, scaling, and rotation. However, as these deviations become larger, its definition makes it difficult to evaluate visual quality accurately. On the other hand, moment invariants can provide several useful criteria that are invariant under translation, scaling, and rotation. Therefore, if a new visual quality measure can be derived from moment invariants, successful inpainting based on the derived measure can also be expected. Furthermore, unlike moment invariants, the SSIM index has several parameters that must be set.
Note also that, unlike MSE and its variants, the SSIM index can only be calculated over a region, since it relies on local statistics such as means and variances; that is, it is calculated in a block-wise rather than pixel-wise manner. Therefore, to use the SSIM index for inpainting, we have to adopt block-wise procedures.
3 Overview of our proposed framework
This section presents an overview of the proposed framework, whose outline is shown in Figure 2. As shown in this figure, the proposed method consists of two algorithms, 'generation of the dictionary' and 'inpainting of missing areas', which correspond to the training and test phases, respectively.
3.1 Generation of dictionary
In the generation of the dictionary, we first clip known patches, which do not include any missing areas, from the target image and calculate the dictionary matrix $\mathbf{D}$ of Section 2.1 from these patches. As in traditional sparse representation problems, we iteratively perform two procedures, 'calculation of the representation coefficients' and 'update of the atoms included in the dictionary matrix $\mathbf{D}$'. These procedures are similar to those of the traditional method (the K-SVD algorithm [36]). The contribution of the proposed method, i.e., its difference from the traditional method, is the introduction of the SSIM index: the representation coefficients and the atoms of the dictionary matrix are calculated so that the SSIM-based approximation performance becomes highest. This means that the cost function $\|\mathbf{Y} - \mathbf{D}\mathbf{X}\|_F^2$ in Equation 2 is replaced with one based on the SSIM index. Note that in the calculation of the representation coefficients, the maximization of the SSIM index is a nonconvex problem, so it is reformulated as a quasiconvex problem using the computation scheme in [53]. On the other hand, in the update of the atoms of the dictionary matrix, we use a simple steepest ascent algorithm, since introducing the computation scheme in [53] there would incur high computational costs. In the K-SVD algorithm [36], the atoms can be updated efficiently using SVD, but this scheme is based on least-squares approximation; therefore, we use the simple steepest ascent algorithm instead.
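The steepest ascent atom update is not specified in detail in this section, so the following is only a sketch under our own assumptions: the objective is the sum of SSIM values (Equation 7) over the training patches whose codes use the atom, the step size is fixed, and the atom is not renormalized after the step. The gradient of Equation 7 with respect to the approximation $\mathbf{z}$ follows from the quotient rule, and the chain rule gives $\partial \mathbf{z}_i / \partial \mathbf{d}_j = x_{ij}\mathbf{I}$ for $\mathbf{z}_i = \mathbf{D}\mathbf{x}_i$. The names `ssim_grad` and `ascent_step_atom` are hypothetical.

```python
import numpy as np

C1, C2 = (0.01 * 255) ** 2, (0.03 * 255) ** 2


def ssim(y, z):
    """Simplified SSIM index (Equation 7) with population statistics."""
    my, mz = y.mean(), z.mean()
    cov = np.mean((y - my) * (z - mz))
    return ((2 * my * mz + C1) * (2 * cov + C2)) / (
        (my**2 + mz**2 + C1) * (y.var() + z.var() + C2))


def ssim_grad(y, z):
    """Gradient of SSIM(y, z) with respect to the approximation z."""
    n = y.size
    my, mz = y.mean(), z.mean()
    cov = np.mean((y - my) * (z - mz))
    A1, B1 = 2 * my * mz + C1, my**2 + mz**2 + C1
    A2, B2 = 2 * cov + C2, y.var() + z.var() + C2
    # d(mean z)/dz = 1/n, d(var z)/dz = 2(z - mz)/n, d(cov)/dz = (y - my)/n.
    return (2.0 / n) * (
        (my * A2 / (B1 * B2) - mz * A1 * A2 / (B1**2 * B2)) * np.ones(n)
        + A1 / (B1 * B2) * (y - my)
        - A1 * A2 / (B1 * B2**2) * (z - mz))


def ascent_step_atom(Y, D, X, j, step):
    """One steepest-ascent step on sum_i SSIM(y_i, D x_i) with respect to atom d_j."""
    grad = np.zeros(D.shape[0])
    for i in np.nonzero(X[j, :])[0]:
        z = D @ X[:, i]
        grad += X[j, i] * ssim_grad(Y[:, i], z)  # chain rule: dz/dd_j = x_ij * I
    D_new = D.copy()
    D_new[:, j] += step * grad
    return D_new
```

Unlike the SVD update of K-SVD, a gradient step only guarantees improvement for a sufficiently small step, so a practical implementation would need some step-size control.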
3.2 Inpainting of missing areas
In the inpainting of missing areas, we first clip a patch including missing areas from the target image. Here, we have to determine which patch should be inpainted first. In the proposed method, we calculate the patch priority that determines the inpainting order based on the method in [24]. The patch with the highest priority is selected, and its missing areas are reconstructed.
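For readers unfamiliar with [24], its patch priority is the product $P(p) = C(p)D(p)$ of a confidence term and a data term, evaluated at pixels on the boundary (fill front) of the missing region. The NumPy sketch below is a simplified, hypothetical helper that computes only the confidence term $C(p)$, i.e., the sum of the confidences of the known pixels in the patch divided by the patch area, and returns the front pixel that maximizes it; the data term, which depends on isophote directions, is omitted for brevity.

```python
import numpy as np


def fill_front(mask):
    """Missing pixels (mask == True) that have at least one known 4-neighbour."""
    padded = np.pad(~mask, 1, constant_values=False)  # outside the image counts as unknown
    has_known_nb = (padded[:-2, 1:-1] | padded[2:, 1:-1]
                    | padded[1:-1, :-2] | padded[1:-1, 2:])
    return mask & has_known_nb


def confidence_priority(conf, mask, half):
    """Front pixel with the highest confidence term for a (2*half+1)^2 patch."""
    H, W = mask.shape
    best_c, best_p = -1.0, None
    for r, c in zip(*np.nonzero(fill_front(mask))):
        r0, r1 = max(r - half, 0), min(r + half + 1, H)
        c0, c1 = max(c - half, 0), min(c + half + 1, W)
        window_conf = conf[r0:r1, c0:c1]
        window_known = ~mask[r0:r1, c0:c1]
        C = window_conf[window_known].sum() / window_conf.size  # divide by patch area
        if C > best_c:
            best_c, best_p = C, (int(r), int(c))
    return best_p, best_c
```

With a square hole, the corners of the hole have the most known neighbours and therefore the highest confidence, so they are filled first; after each fill, [24] propagates the confidence into the newly filled pixels.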
The inpainting procedures are then performed for the selected patch including missing areas (hereafter called the target patch). Specifically, the proposed method performs sparse representation of the target patch to estimate the missing intensities. Note that the cost function in Equation 1 is replaced with an SSIM-based one; this is the difference from the traditional sparse representation approach and the biggest contribution of our method. The sparse representation of the target patch maximizing the SSIM index is then performed, where this nonconvex maximization problem is also reformulated as a quasiconvex problem using the computation scheme in [53]. The specific procedures for calculating the sparse representation are shown in Figure 2, and their details are given in the following section. From the approximation result obtained by this sparse representation, the proposed method outputs the estimated intensities within the missing areas of the target patch.
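As a simplified illustration of this step, the following sketch fits a sparse code to the known pixels of a target patch vector and fills the missing pixels from the reconstruction $\mathbf{D}\mathbf{x}$. To keep it short, it uses an MSE-based OMP on the rows of $\mathbf{D}$ at known positions as a stand-in; the proposed method instead uses SSIM-based coding, so `fill_patch` is a hypothetical helper and not the authors' procedure.

```python
import numpy as np


def fill_patch(D, y, known, T):
    """Fill missing entries of patch vector y from a sparse code fitted on known pixels.

    D: (n, K) dictionary, y: (n,) patch (missing entries may be NaN),
    known: (n,) boolean mask of known pixels, T: number of atoms to use.
    """
    Dk, yk = D[known], y[known]                 # dictionary rows at known pixels
    norms = np.linalg.norm(Dk, axis=0) + 1e-12  # restricted atoms are not unit-norm
    support, residual, coef = [], yk.copy(), np.zeros(0)
    for _ in range(T):
        scores = np.abs(Dk.T @ residual) / norms
        if support:
            scores[support] = -np.inf
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(Dk[:, support], yk, rcond=None)
        residual = yk - Dk[:, support] @ coef
    estimate = D[:, support] @ coef             # reconstruction on all pixels
    out = y.astype(float).copy()
    out[~known] = estimate[~known]              # only missing pixels are replaced
    return out
```

Known pixels are kept as they are, and only the missing ones take values from the reconstruction, which mirrors how the estimated intensities are output only within the missing areas.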
By iterating the patch selection based on the patch priority and the SSIM-based reconstruction of the selected patch's missing areas, we can inpaint all missing areas within the target image.
4 Image inpainting via SSIM-based sparse representation
The inpainting method via SSIM-based sparse representation is presented in this section. As described in the previous section, the proposed method is divided into two algorithms: generation of the dictionary and inpainting of missing areas. In the first algorithm, the dictionary is generated from known patches $f_i$ $(i = 1, 2, \dots, N)$ of size $w \times h$ pixels within the target image, where $N$ is the number of known patches. Note that the proposed method calculates the dictionary based on a new perceptually optimized criterion, i.e., the SSIM index. The details of this calculation are given in Section 4.1. In the second algorithm, the proposed method clips a patch $f$ including missing areas from the target image and estimates the unknown intensities. In this algorithm, sparse representation based on the SSIM index is introduced into the inpainting; its details are given in Section 4.2. In the following explanation, we denote the unknown and known areas within $f$ as $\Omega$ and $\bar{\Omega}$, respectively.
4.1 Generation of the dictionary
In this subsection, the algorithm for generating the dictionary is presented. In the proposed method, we calculate the dictionary matrix $\mathbf{D}$ in Equation 2 for reconstructing the missing areas within the target image, with the difference that the SSIM index is used: whereas Equation 2 minimizes the MSE of the approximation results, the proposed method maximizes the SSIM index of the approximation results obtained by the sparse representation. As in the K-SVD algorithm [36], since it is difficult to obtain the dictionary matrix and the representation coefficients simultaneously, we update the two alternately. Specifically, to calculate representation coefficients that are optimal in terms of the SSIM index, we use a simple greedy estimation scheme similar to matching pursuit algorithms, and its nonconvex optimization problem is reformulated as a quasiconvex problem using the calculation scheme in [53]. On the other hand, each atom of the dictionary matrix is updated one by one by a simple steepest ascent algorithm. The details are shown below.
As described above, known patches $f_i$ $(i = 1, 2, \dots, N)$ of size $w \times h$ pixels are clipped from the target image at a regular interval; that is, the patches $f_i$ used for generating the dictionary are selected from undamaged, known parts of the target image. Next, for each patch $f_i$, we define a vector $\mathbf{y}_i \in \mathbb{R}^{wh}$ whose elements are its raster-scanned intensities. Using an overcomplete dictionary matrix $\mathbf{D} \in \mathbb{R}^{wh \times K}$ containing $K$ prototype atoms $\mathbf{d}_j \in \mathbb{R}^{wh}$ $(j = 1, 2, \dots, K)$, each vector $\mathbf{y}_i$ is represented as a sparse linear combination of these atoms, $\mathbf{y}_i \cong \mathbf{D}\mathbf{x}_i$, satisfying $\text{SSIM}(\mathbf{y}_i, \mathbf{D}\mathbf{x}_i) \ge \eta$ for a fixed value $\eta$ that corresponds to $\varepsilon$ in the previous section. The vector $\mathbf{x}_i \in \mathbb{R}^{K}$ contains the representation coefficients of $\mathbf{y}_i$.
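The patch extraction step can be sketched as follows. The function name, the boolean mask marking missing pixels, and the `step` parameter standing in for the regular clipping interval are our assumptions; the paper's actual interval is not given in this section.

```python
import numpy as np


def known_patch_matrix(image, mask, w, h, step):
    """Columns are raster-scanned w x h patches that contain no missing pixels.

    image: 2-D intensity array, mask: boolean array (True = missing pixel),
    step: sampling interval in pixels (our stand-in for the regular interval).
    """
    rows, cols = image.shape
    patches = []
    for r in range(0, rows - h + 1, step):
        for c in range(0, cols - w + 1, step):
            if mask[r:r + h, c:c + w].any():
                continue                         # patch touches a missing area
            patches.append(image[r:r + h, c:c + w].reshape(-1))  # row-major scan
    if not patches:
        return np.zeros((w * h, 0))
    return np.stack(patches, axis=1).astype(float)  # shape (w*h, N)
```

Each column of the returned matrix is one $\mathbf{y}_i$, so the matrix plays the role of $\mathbf{Y}$ in the dictionary learning problem.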
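If $wh < K$ and $\mathbf{D}$ is a full-rank matrix, infinitely many solutions exist for the representation problems. Therefore, in the same manner as Equation 1, the proposed method adopts the solution of

$$\hat{\mathbf{x}}_i = \arg\max_{\mathbf{x}_i} \text{SSIM}(\mathbf{y}_i, \mathbf{D}\mathbf{x}_i) \quad \text{subject to} \quad \|\mathbf{x}_i\|_0 \le T. \quad (8)$$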
That is, the optimal vector $\mathbf{x}_i$ is obtained by maximizing the SSIM index between $\mathbf{y}_i$ and $\mathbf{D}\mathbf{x}_i$ under the constraint that the number of nonzero elements in $\mathbf{x}_i$ is $T$ or less.
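In addition, following Equation 2 of the K-SVD algorithm [36], the optimal dictionary matrix $\mathbf{D}$ can be obtained by solving the following maximization problem:

$$\max_{\mathbf{D},\mathbf{X}} \sum_{i=1}^{N} \text{SSIM}(\mathbf{y}_i, \mathbf{D}\mathbf{x}_i) \quad \text{subject to} \quad \forall i,\ \|\mathbf{x}_i\|_0 \le T. \quad (9)$$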
That is, we calculate the dictionary matrix $\mathbf{D}$ that maximizes the approximation performance for all $\mathbf{y}_i$ $(i = 1, 2, \dots, N)$ in terms of the SSIM index under the constraint that the number of nonzero elements in each $\mathbf{x}_i$ is $T$ or less. In the proposed method, $\mathbf{D}$ is estimated using a scheme similar to the K-SVD algorithm [36], but with SSIM-based procedures. Specifically, this scheme is divided into two procedures, calculation of the optimal vectors $\mathbf{x}_i$ $(i = 1, 2, \dots, N)$ and update of the dictionary matrix $\mathbf{D}$, which are performed iteratively. Each procedure is described below.
4.1.1 Calculation of the optimal vector $\mathbf{x}_i$
With the dictionary matrix $\mathbf{D}$ fixed, the optimal vector $\mathbf{x}_i$ is calculated for each $\mathbf{y}_i$ on the basis of Equation 8. In this optimization problem, we select $T$ atoms that provide the optimal linear combination based on the SSIM index. We adopt the simplest algorithm, which selects the optimal atoms one by one, similar to several matching pursuit algorithms [56–59]. Specifically, for each $\mathbf{y}_i$ $(i = 1, 2, \dots, N)$, we first search for the single atom that approximates it best, i.e., that maximizes the SSIM index. Next, for each candidate atom added to the previously selected atoms, we calculate the SSIM-based linear combination approximating $\mathbf{y}_i$, and the candidate that, together with the previously selected atoms, maximizes the SSIM index is selected. By iterating this procedure $T$ times, $T$ optimal atoms are selected for each $\mathbf{y}_i$. The procedure is therefore quite simple: in each iteration, we select the one atom whose linear combination with the previously selected atoms maximizes the SSIM index for approximating $\mathbf{y}_i$.
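The greedy structure of this selection can be sketched in NumPy as follows. For brevity, this sketch fits the coefficients of each candidate atom set by least squares and uses SSIM only to choose between candidates; in the proposed method, the coefficients themselves are also chosen to maximize SSIM using the quasiconvex scheme of [53], so `greedy_ssim_pursuit` is a simplified, hypothetical stand-in.

```python
import numpy as np

C1, C2 = (0.01 * 255) ** 2, (0.03 * 255) ** 2


def ssim(a, b):
    """Simplified SSIM index (Equation 7) with population statistics."""
    ma, mb = a.mean(), b.mean()
    cov = np.mean((a - ma) * (b - mb))
    return ((2 * ma * mb + C1) * (2 * cov + C2)) / (
        (ma**2 + mb**2 + C1) * (a.var() + b.var() + C2))


def greedy_ssim_pursuit(D, y, T):
    """Select T atoms one at a time, each maximizing SSIM(y, D_sel @ coef)."""
    K = D.shape[1]
    selected, coef = [], np.zeros(0)
    for _ in range(T):
        best_score, best_j, best_coef = -np.inf, None, None
        for j in range(K):
            if j in selected:
                continue
            cols = selected + [j]               # previously selected atoms + candidate
            c, *_ = np.linalg.lstsq(D[:, cols], y, rcond=None)  # stand-in fit
            score = ssim(y, D[:, cols] @ c)
            if score > best_score:
                best_score, best_j, best_coef = score, j, c
        selected.append(best_j)
        coef = best_coef
    x = np.zeros(K)
    x[selected] = coef
    return x, selected
```

Each iteration evaluates every remaining candidate atom, so one selection costs $K$ small fits; this is the price of choosing atoms by SSIM rather than by correlation with a residual as in OMP.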
The details of the $t$-th $(t = 1, 2, \dots, T)$ optimal atom selection are shown below.
In the $t$-th optimal atom selection for $\mathbf{y}_i$, the following vector is first defined:

$$\mathbf{y}_{i,j}^{(t)} = \mathbf{D}_{i}^{(t-1)}\mathbf{x}_{i}^{(t)} + x_j\mathbf{d}_j = \mathbf{D}_{i,j}^{(t)}\mathbf{x}_{i,j}^{(t)}, \quad (10)$$

where $\mathbf{D}_i^{(t-1)}$ is a $wh \times (t-1)$ matrix containing the $t-1$ atoms previously selected from $\mathbf{d}_j$ $(j = 1, 2, \dots, K)$ in $t-1$ iterations. In addition,

$$\mathbf{D}_{i,j}^{(t)} = \left[\mathbf{D}_i^{(t-1)}, \mathbf{d}_j\right] \quad (11)$$

and

$$\mathbf{x}_{i,j}^{(t)} = \left[\mathbf{x}_i^{(t)\prime}, x_j\right]^{\prime} \quad (12)$$

is a coefficient vector for calculating $\mathbf{y}_{i,j}^{(t)}$. The vector $\mathbf{x}_i^{(t)}$ contains the representation coefficients corresponding to the atoms in $\mathbf{D}_i^{(t-1)}$, and $x_j$ is the coefficient corresponding to $\mathbf{d}_j$. In other words, $\mathbf{x}_{i,j}^{(t)}$ is the sparse representation coefficient vector for representing $\mathbf{y}_i$ when the atom $\mathbf{d}_j$ is appended at iteration $t$, $\mathbf{y}_{i,j}^{(t)}$ is the corresponding approximation of $\mathbf{y}_i$, and $\mathbf{D}_{i,j}^{(t)}$ is the matrix of the $t-1$ atoms selected in the previous iterations together with the candidate atom $\mathbf{d}_j$. For each $j = 1, 2, \dots, K$, the proposed method estimates the vector $\hat{\mathbf{y}}_{i,j}^{(t)}$ that gives the best representation performance. The atom $\mathbf{d}_j$ is then selected so that, together with the atoms selected in the previous $t-1$ iterations, it maximizes the SSIM index for the representation of $\mathbf{y}_i$.
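To calculate $\hat{\mathbf{y}}_{i,j}^{(t)}$, the optimal coefficient vector $\hat{\mathbf{x}}_{i,j}^{(t)}$ in the following equation must be estimated:

$$\hat{\mathbf{y}}_{i,j}^{(t)} = \mathbf{D}_{i,j}^{(t)}\hat{\mathbf{x}}_{i,j}^{(t)}. \quad (13)$$

Thus, we have to solve

$$\hat{\mathbf{x}}_{i,j}^{(t)} = \arg\max_{\mathbf{x}_{i,j}^{(t)}} \text{SSIM}\left(\mathbf{y}_i, \mathbf{y}_{i,j}^{(t)}\right), \quad (14)$$

where $\text{SSIM}\left(\mathbf{y}_i, \mathbf{y}_{i,j}^{(t)}\right)$ is defined as

$$\text{SSIM}\left(\mathbf{y}_i, \mathbf{y}_{i,j}^{(t)}\right) = \frac{\left(2\mu_{\mathbf{y}_i}\mu_{\mathbf{y}_{i,j}^{(t)}} + C_1\right)\left(2\sigma_{\mathbf{y}_i,\mathbf{y}_{i,j}^{(t)}} + C_2\right)}{\left(\mu_{\mathbf{y}_i}^2 + \mu_{\mathbf{y}_{i,j}^{(t)}}^2 + C_1\right)\left(\sigma_{\mathbf{y}_i}^2 + \sigma_{\mathbf{y}_{i,j}^{(t)}}^2 + C_2\right)}. \quad (15)$$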
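In this equation, $\mu_{\mathbf{y}_i} \left(= \frac{1}{wh}\mathbf{1}^{\prime}\mathbf{y}_i\right)$ and $\sigma_{\mathbf{y}_i}^2 \left(= \frac{1}{wh}\|\mathbf{y}_i - \mu_{\mathbf{y}_i}\mathbf{1}\|^2\right)$ are the mean and variance of $\mathbf{y}_i$, respectively, where $\mathbf{1} = [1, 1, \dots, 1]^{\prime}$ is a $wh \times 1$ vector, and the vector/matrix transpose is denoted by the superscript $^{\prime}$ in this paper. Similarly, $\mu_{\mathbf{y}_{i,j}^{(t)}}$ and $\sigma_{\mathbf{y}_{i,j}^{(t)}}^2$ are the mean and variance of $\mathbf{y}_{i,j}^{(t)}$, respectively, and are obtained as follows:

$$\mu_{\mathbf{y}_{i,j}^{(t)}} = \boldsymbol{\mu}_{\mathbf{D}_{i,j}^{(t)}}^{\prime}\mathbf{x}_{i,j}^{(t)}, \quad (16)$$

$$\sigma_{\mathbf{y}_{i,j}^{(t)}}^2 = \mathbf{x}_{i,j}^{(t)\prime}\boldsymbol{\Sigma}_{\mathbf{D}_{i,j}^{(t)}}\mathbf{x}_{i,j}^{(t)}, \quad (17)$$

where

$$\boldsymbol{\mu}_{\mathbf{D}_{i,j}^{(t)}} = \frac{1}{wh}\mathbf{D}_{i,j}^{(t)\prime}\mathbf{1}. \quad (18)$$

Furthermore,

$$\mathbf{H} = \mathbf{I} - \frac{1}{wh}\mathbf{1}\mathbf{1}^{\prime} \quad (19)$$

is a centering matrix, where $\mathbf{H} = \mathbf{H}^{\prime}$ and $\mathbf{H}^2 = \mathbf{H}$ are satisfied, and $\mathbf{I}$ is the identity matrix. In addition,

$$\boldsymbol{\Sigma}_{\mathbf{D}_{i,j}^{(t)}} = \frac{1}{wh}\mathbf{D}_{i,j}^{(t)\prime}\mathbf{H}\mathbf{D}_{i,j}^{(t)}. \quad (20)$$

In Equation 15, $\sigma_{\mathbf{y}_i,\mathbf{y}_{i,j}^{(t)}}$ is the cross covariance between $\mathbf{y}_i$ and $\mathbf{y}_{i,j}^{(t)}$ and is defined as

$$\sigma_{\mathbf{y}_i,\mathbf{y}_{i,j}^{(t)}} = \mathbf{b}_{i,j}^{(t)\prime}\mathbf{x}_{i,j}^{(t)}, \quad (21)$$

where

$$\mathbf{b}_{i,j}^{(t)} = \frac{1}{wh}\mathbf{D}_{i,j}^{(t)\prime}\mathbf{H}\mathbf{y}_i. \quad (22)$$

Then, Equation 15 is rewritten as

$$\text{SSIM}\left(\mathbf{y}_i, \mathbf{y}_{i,j}^{(t)}\right) = \frac{2\mu_{\mathbf{y}_i}\boldsymbol{\mu}_{\mathbf{D}_{i,j}^{(t)}}^{\prime}\mathbf{x}_{i,j}^{(t)} + C_1}{\mu_{\mathbf{y}_i}^2 + \left(\boldsymbol{\mu}_{\mathbf{D}_{i,j}^{(t)}}^{\prime}\mathbf{x}_{i,j}^{(t)}\right)^2 + C_1} \cdot \frac{2\mathbf{b}_{i,j}^{(t)\prime}\mathbf{x}_{i,j}^{(t)} + C_2}{\sigma_{\mathbf{y}_i}^2 + \mathbf{x}_{i,j}^{(t)\prime}\boldsymbol{\Sigma}_{\mathbf{D}_{i,j}^{(t)}}\mathbf{x}_{i,j}^{(t)} + C_2}. \quad (23)$$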
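The matrix forms used to rewrite Equation 15 can be verified numerically. In the sketch below (our own check, with names chosen by us), `mu_D`, `Sigma`, and `b` denote $\frac{1}{wh}\mathbf{D}^{\prime}\mathbf{1}$, $\frac{1}{wh}\mathbf{D}^{\prime}\mathbf{H}\mathbf{D}$, and $\frac{1}{wh}\mathbf{D}^{\prime}\mathbf{H}\mathbf{y}_i$ for the centering matrix $\mathbf{H} = \mathbf{I} - \frac{1}{wh}\mathbf{1}\mathbf{1}^{\prime}$, so that the mean, variance, and cross covariance of $\mathbf{z} = \mathbf{D}\mathbf{x}$ become a linear form, a quadratic form, and a linear form in $\mathbf{x}$, respectively.

```python
import numpy as np


def matrix_forms(D, y):
    """Centering matrix and the linear/quadratic forms of the statistics of z = D @ x."""
    n = y.size                                # n = w * h pixels
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix: H = H', H @ H = H
    mu_D = D.T @ np.ones(n) / n               # mean of D @ x      = mu_D @ x
    Sigma = D.T @ H @ D / n                   # variance of D @ x  = x @ Sigma @ x
    b = D.T @ H @ y / n                       # cov(y, D @ x)      = b @ x
    return H, mu_D, Sigma, b
```

Because $\mathbf{H}$ is symmetric and idempotent, $\frac{1}{wh}\|\mathbf{H}\mathbf{D}\mathbf{x}\|^2 = \frac{1}{wh}\mathbf{x}^{\prime}\mathbf{D}^{\prime}\mathbf{H}\mathbf{D}\mathbf{x}$, which is why a single matrix captures the variance for every choice of coefficients.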
It should be noted that the criterion in Equation 23 is a nonconvex function of $\mathbf{x}_{i,j}^{(t)}$, so it is difficult to obtain the globally optimal solution. Thus, we introduce the calculation scheme used in [53] into the estimation of the optimal vector $\hat{\mathbf{x}}_{i,j}^{(t)}$; specifically, the nonconvex problem is transformed into a quasiconvex formulation. The main idea of this scheme is as follows. By fixing the mean of $\mathbf{y}_{i,j}^{(t)}$ $\left(= \boldsymbol{\mu}_{\mathbf{D}_{i,j}^{(t)}}^{\prime}\mathbf{x}_{i,j}^{(t)}\right)$, we can focus only on the second term in Equation 23, which simplifies the maximization problem.
First, we note that the first term in Equation 23 is a function only of {{\mathit{\mu}}_{{\mathbf{D}}_{i,j}^{(t)}}}^{\prime}{\mathbf{x}}_{i,j}^{(t)} (= {\rho}_{i,j}^{(t)}). Thus, Equation 23 can be rewritten as
Since {\mu}_{{\mathbf{y}}_{i}} is a constant, the first term of the above equation can be fixed by fixing {\rho}_{i,j}^{(t)}.
Then, by imposing the constraint {{\mathit{\mu}}_{{\mathbf{D}}_{i,j}^{(t)}}}^{\prime}{\mathbf{x}}_{i,j}^{(t)} = {\rho}_{i,j}^{(t)}, the optimization problem can be simplified to finding
Thus, the cost function becomes simpler. Under the constraint {{\mathit{\mu}}_{{\mathbf{D}}_{i,j}^{(t)}}}^{\prime}{\mathbf{x}}_{i,j}^{(t)} = {\rho}_{i,j}^{(t)}, we only need to consider the second term of the SSIM index.
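The claim that the first term depends on the coefficients only through ρ can be checked numerically. The pure-Python sketch below computes the SSIM index of \mathbf{y}_{i} and \mathbf{D}\mathbf{x} in two ways. First it computes the index directly from \mathbf{D}\mathbf{x}. Then it computes it through ρ = μ_D′x, x′Kx, and k′x, with K = D′HD/(wh) and k = D′Hy_i/(wh) taken as assumptions.

The sketch makes three further assumptions:
- a single SSIM window covers the whole patch;
- variances and covariances use the 1/(wh) normalization;
- atoms are stored as Python lists.

The function names are illustrative and do not come from the paper.

```python
from typing import List

Vector = List[float]


def mean(v: Vector) -> float:
    return sum(v) / len(v)


def combine(atoms: List[Vector], x: Vector) -> Vector:
    """Form D x, where each atom is one column of D."""
    return [sum(xj * d[p] for xj, d in zip(x, atoms)) for p in range(len(atoms[0]))]


def ssim(a: Vector, b: Vector, c1: float, c2: float) -> float:
    """Single-window SSIM of two patch vectors with 1/n-normalized (co)variances."""
    n = len(a)
    ma, mb = mean(a), mean(b)
    va = sum((p - ma) ** 2 for p in a) / n
    vb = sum((q - mb) ** 2 for q in b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b)) / n
    return ((2 * ma * mb + c1) * (2 * cov + c2)) / ((ma**2 + mb**2 + c1) * (va + vb + c2))


def ssim_via_statistics(y: Vector, atoms: List[Vector], x: Vector, c1: float, c2: float) -> float:
    """The same index written as (mean term in rho) * (structure term in k'x and x'Kx)."""
    n = len(y)
    centred = [[v - mean(d) for v in d] for d in atoms]            # columns of H D
    K = [[sum(p * q for p, q in zip(ci, cj)) / n for cj in centred] for ci in centred]
    k = [sum(p * q for p, q in zip(ci, y)) / n for ci in centred]  # D'Hy / n
    rho = sum(xj * mean(d) for xj, d in zip(x, atoms))             # mu_D' x
    xKx = sum(x[i] * K[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))
    kx = sum(ki * xi for ki, xi in zip(k, x))
    my = mean(y)
    var_y = sum((v - my) ** 2 for v in y) / n
    mean_term = (2 * my * rho + c1) / (my**2 + rho**2 + c1)        # depends on x only via rho
    structure_term = (2 * kx + c2) / (var_y + xKx + c2)
    return mean_term * structure_term


C1, C2 = (0.01 * 255) ** 2, (0.03 * 255) ** 2   # commonly used SSIM constants (assumption)
y = [10.0, 20.0, 40.0, 30.0]
atoms = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
x = [5.0, 12.0, 3.0]
direct = ssim(y, combine(atoms, x), C1, C2)
via_statistics = ssim_via_statistics(y, atoms, x, C1, C2)
```

`direct` and `via_statistics` agree up to floating-point rounding. The constant third atom changes only ρ, because its mean-removed column is zero.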
The overall problem is therefore to find the highest SSIM index by searching over a range of {\rho}_{i,j}^{(t)}. Furthermore, the above problem can be rewritten as
and, in the proposed method, the following simple Lagrange multiplier approach is used to estimate the optimal {\mathbf{x}}_{i,j}^{(t)}:
where the first and second terms correspond to the cost function and the third term corresponds to the constraint. The specific derivations of the above equations are given in the Appendix. The optimal value of τ is then estimated by a standard bisection procedure. The optimal vectors {\widehat{\mathbf{x}}}_{i,j}^{(t)}\left({\rho}_{i,j}^{(t)}\right) are calculated for the values {\rho}_{i,j}^{(t)} = {\mu}_{{\mathbf{y}}_{i}}-R\delta, \dots, {\mu}_{{\mathbf{y}}_{i}}-2\delta, {\mu}_{{\mathbf{y}}_{i}}-\delta, {\mu}_{{\mathbf{y}}_{i}}, {\mu}_{{\mathbf{y}}_{i}}+\delta, {\mu}_{{\mathbf{y}}_{i}}+2\delta, \dots, {\mu}_{{\mathbf{y}}_{i}}+R\delta, and the {\widehat{\mathbf{x}}}_{i,j}^{(t)} maximizing Equation 15 is selected. Here, δ is the search interval and R determines the search range; their specific values are given in Section 5.1. The detailed procedure for estimating τ in the proposed method is as follows:

(i)
An initial value of τ (say τ_0) is chosen between zero and one. The limits are initialized as U_τ = 1.0 and L_τ = τ_0, where U_τ and L_τ represent the upper and lower limits of τ, respectively. In this paper, we set τ_0 = 0.2.

(ii)
The optimization problem in Equation 27 is solved using the current τ.

(iii)
Two criteria S_τ and D_τ are calculated as
\begin{array}{l}{S}_{\tau}=\tau \left({\sigma}_{{\mathbf{y}}_{i}}^{2}+{{\mathbf{x}}_{i,j}^{(t)}}^{\prime}{\mathbf{K}}_{i,j}^{(t)}{\mathbf{x}}_{i,j}^{(t)}+{C}_{2}\right)-\left(2{{\mathbf{k}}_{i,j}^{(t)}}^{\prime}{\mathbf{x}}_{i,j}^{(t)}+{C}_{2}\right),\\ {D}_{\tau}={U}_{\tau}-{L}_{\tau},\end{array}
where {\mathbf{x}}_{i,j}^{(t)} is the solution obtained in (ii).
(iv)
According to the obtained criteria S_τ and D_τ, the following steps are performed:

(a)
If S_τ ≥ 0 and D_τ < ε, τ is output as the final optimal solution, where ε = 0.05.

(b)
If S_τ ≥ 0 but D_τ ≥ ε, τ is an upper bound: set U_τ = τ and then \tau = \frac{U_{\tau}+L_{\tau}}{2}.

(c)
Otherwise, set L_τ = τ and then \tau = \frac{U_{\tau}+L_{\tau}}{2}.

(v)
Procedures (ii) to (iv) are iterated.
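The procedure above is an instance of bisection for a quasiconvex (fractional) problem. The pure-Python sketch below rests on these assumptions:
- \mathbf{K} is positive definite, i.e., the selected atoms stay linearly independent after mean removal;
- the best achievable structure term is positive;
- K = D′HD/(wh) and k = D′Hy_i/(wh);
- for simplicity, bisection runs over the whole interval [0, 1] down to a small tolerance, instead of using the initialization τ_0 = 0.2 and ε = 0.05 given above.

For each τ, the Lagrangian stationarity conditions are solved in closed form, and the sign of S_τ decides which half of the interval to keep. The outer function then scans ρ over μ_{y_i} + rδ (r = −R, …, R) and keeps the coefficients with the highest full SSIM index. All names are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean(v: Vector) -> float:
    return sum(v) / len(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A z = b by Gaussian elimination with partial pivoting (A small, nonsingular)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][j] * z[j] for j in range(r + 1, n))) / M[r][r]
    return z


def statistics(y: Vector, atoms: List[Vector]):
    """K = D'HD/n, k = D'Hy/n, mu_D = D'1/n, and the variance of y."""
    n = len(y)
    centred = [[v - mean(d) for v in d] for d in atoms]
    K = [[dot(ci, cj) / n for cj in centred] for ci in centred]
    k = [dot(ci, y) / n for ci in centred]
    mu_d = [mean(d) for d in atoms]
    var_y = sum((v - mean(y)) ** 2 for v in y) / n
    return K, k, mu_d, var_y


def structure_term(y: Vector, atoms: List[Vector], x: Vector, c2: float) -> float:
    K, k, _, var_y = statistics(y, atoms)
    return (2 * dot(k, x) + c2) / (var_y + dot(x, [dot(row, x) for row in K]) + c2)


def structure_optimal_coeffs(y: Vector, atoms: List[Vector], rho: float, c2: float,
                             tol: float = 1e-10) -> Vector:
    """Maximize the structure term subject to mu_D'x = rho by bisection on tau."""
    K, k, mu_d, var_y = statistics(y, atoms)
    a, b = solve(K, k), solve(K, mu_d)               # K^{-1} k and K^{-1} mu_D

    def minimizer(tau: float) -> Vector:
        # Stationary point of tau*(var_y + x'Kx + c2) - (2k'x + c2) + lam*(mu_D'x - rho):
        # x = (a - nu*b) / tau, with nu = lam/2 chosen so that mu_D'x = rho.
        nu = (dot(mu_d, a) - tau * rho) / dot(mu_d, b)
        return [(ai - nu * bi) / tau for ai, bi in zip(a, b)]

    def s_tau(tau: float, x: Vector) -> float:
        return tau * (var_y + dot(x, [dot(row, x) for row in K]) + c2) - (2 * dot(k, x) + c2)

    lo, hi = 0.0, 1.0                                 # the structure term never exceeds 1
    while hi - lo > tol:
        tau = (lo + hi) / 2
        if s_tau(tau, minimizer(tau)) >= 0:
            hi = tau                                  # tau is an upper bound on the ratio
        else:
            lo = tau                                  # some feasible x exceeds tau
    return minimizer(hi)


def ssim_optimal_coeffs(y, atoms, delta, R, c1, c2) -> Tuple[float, float, Vector]:
    """Scan rho = mu_y + r*delta (r = -R..R); return (SSIM, rho, x) with the highest SSIM."""
    my = mean(y)
    best = None
    for r in range(-R, R + 1):
        rho = my + r * delta
        x = structure_optimal_coeffs(y, atoms, rho, c2)
        mean_term = (2 * my * rho + c1) / (my**2 + rho**2 + c1)
        score = mean_term * structure_term(y, atoms, x, c2)
        if best is None or score > best[0]:
            best = (score, rho, x)
    return best


y = [1.0, 2.0, 4.0, 3.0]
atoms = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]]
x_fixed_mean = structure_optimal_coeffs(y, atoms, rho=2.5, c2=1e-4)
score, rho_best, x_best = ssim_optimal_coeffs(y, atoms, delta=0.5, R=2, c1=1e-4, c2=1e-4)
```

The closed-form minimizer replaces a generic solver because, for fixed τ and ρ, the subproblem is a convex quadratic with one linear equality constraint.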
4.1.2 Update of dictionary matrix D
From the calculated optimal vectors \mathbf{x}_{i} (i = 1,2,…,N), the proposed method updates the dictionary matrix \mathbf{D}. Each dictionary element, i.e., each atom, is updated one by one in a greedy fashion. We choose one atom and update it so that the representation performance, i.e., the sum of the SSIM indices, becomes the highest. Each atom \mathbf{d}_{j} (j = 1,2,…,K) is updated by solving the following problem:
where x_{i}(j) is the jth element of \mathbf{x}_{i}.
In the above equation, we try to maximize the approximation performance of \mathbf{y}_{i} (for all i ∈ {1,2,…,N} such that x_{i}(j) ≠ 0) by x_{i}(j)\mathbf{d}_{j}, i.e., by the target atom \mathbf{d}_{j} and its corresponding representation coefficient x_{i}(j). This optimization problem is too complex for Equation 28 to be maximized in the same way as the optimal vectors \mathbf{x}_{i} (i = 1,2,…,N) are calculated. Thus, the proposed method updates each atom \mathbf{d}_{j} (j = 1,2,…,K) using the well-known steepest ascent algorithm, by the following procedure:
Step 1. Select one atom \mathbf{d}_{j} (j = 1,2,…,K).
Step 2. Update the selected atom \mathbf{d}_{j} by iterating the following equation:
where ζ is a fixed small step-size parameter.
Step 3. Replace the selected atom \mathbf{d}_{j} with the vector obtained in step 2. This yields a new dictionary matrix in which only the jth column, i.e., \mathbf{d}_{j}, has been updated.
Step 4. Repeat steps 1 to 3 for all atoms \mathbf{d}_{1},\mathbf{d}_{2},…,\mathbf{d}_{K} in the dictionary matrix \mathbf{D}.
Using the above procedures, the proposed method can update the dictionary matrix D.
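The atom update in steps 1 to 4 can be sketched as follows. This is only an illustration under several assumptions:
- the objective is taken literally as the sum of SSIM(\mathbf{y}_{i}, x_{i}(j)\mathbf{d}_{j}) over the patches with x_{i}(j) ≠ 0;
- the gradient is approximated by central finite differences rather than an analytical expression;
- the SSIM constants are the commonly used C1 = (0.01·255)² and C2 = (0.03·255)²;
- a halving safeguard on the step ζ, which is not part of the procedure above, rejects steps that would lower the objective.

The names are hypothetical.

```python
from typing import List

Vector = List[float]


def mean(v: Vector) -> float:
    return sum(v) / len(v)


def ssim(a: Vector, b: Vector, c1: float = (0.01 * 255) ** 2, c2: float = (0.03 * 255) ** 2) -> float:
    """Single-window SSIM of two patch vectors with 1/n-normalized (co)variances."""
    n = len(a)
    ma, mb = mean(a), mean(b)
    va = sum((p - ma) ** 2 for p in a) / n
    vb = sum((q - mb) ** 2 for q in b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b)) / n
    return ((2 * ma * mb + c1) * (2 * cov + c2)) / ((ma**2 + mb**2 + c1) * (va + vb + c2))


def objective(d: Vector, patches: List[Vector], coeffs: Vector) -> float:
    """Sum of SSIM(y_i, x_i(j) d_j) over the patches whose coefficient x_i(j) is nonzero."""
    return sum(ssim(y, [c * v for v in d]) for y, c in zip(patches, coeffs) if c != 0)


def gradient(d: Vector, patches: List[Vector], coeffs: Vector, h: float = 1e-5) -> Vector:
    """Central finite-difference gradient of the objective with respect to the atom."""
    g = []
    for p in range(len(d)):
        up, down = d[:], d[:]
        up[p] += h
        down[p] -= h
        g.append((objective(up, patches, coeffs) - objective(down, patches, coeffs)) / (2 * h))
    return g


def update_atom(d: Vector, patches: List[Vector], coeffs: Vector,
                zeta: float = 1.0, iterations: int = 20):
    """Steepest ascent on one atom d_j; returns the updated atom and its objective value."""
    d = d[:]
    current = objective(d, patches, coeffs)
    for _ in range(iterations):
        g = gradient(d, patches, coeffs)
        step = zeta
        while step > 1e-12:
            trial = [v + step * gv for v, gv in zip(d, g)]
            value = objective(trial, patches, coeffs)
            if value >= current:          # keep the step only if the SSIM sum does not drop
                d, current = trial, value
                break
            step /= 2                     # safeguard (assumption): shrink the step
        else:
            break                         # no acceptable step found; stop early
    return d, current


patches = [[52.0, 60.0, 71.0, 80.0], [30.0, 42.0, 38.0, 55.0]]
coeffs = [1.0, 0.8]                       # x_i(j) for the two patches that use atom j
d0 = [40.0, 50.0, 60.0, 70.0]
d_new, value = update_atom(d0, patches, coeffs, iterations=10)
```

Calling `update_atom` once per atom, and writing the returned vector back into the corresponding column of D, corresponds to steps 3 and 4.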
Finally, we clarify the relationship between the K-SVD algorithm [36] and our SSIM-based algorithm. The biggest difference is the quality metric. The K-SVD algorithm minimizes the MSE when performing sparse representation and dictionary generation, whereas the proposed method maximizes the SSIM index for both. For calculating the sparse representation coefficients, we adopt an algorithm similar to OMP, but the quality measure is the SSIM index instead of the MSE. The representation coefficients are therefore obtained so as to maximize the SSIM index. The optimal solution is then obtained on the basis of the algorithm used in [53], which is quite different from MSE-based algorithms. For dictionary generation, the proposed method updates the atoms one at a time, similarly to the K-SVD algorithm. However, each atom is updated so that the sum of the SSIM indices becomes the highest, and thus SVD is not used in the calculation. Since this update problem is too complicated to solve directly, we simply adopt the steepest ascent algorithm.
4.2 Inpainting algorithm
In this subsection, we present the SSIM-based algorithm for inpainting the missing area Ω in the target patch f. In the proposed method, the target patch f is approximated by a sparse linear combination of the atoms of the dictionary matrix D obtained in the previous subsection. The SSIM index is used as the measure of approximation performance, so that reconstruction results maximizing the SSIM index can be obtained. To obtain the sparse linear combination maximizing the SSIM index, we again use the calculation scheme in [53]. Unlike in the previous subsection, however, the representation coefficients and the missing intensities are estimated simultaneously, so the scheme in [53] is extended. In this way, the missing area Ω within the target patch f can be inpainted on the basis of the SSIM index. The details are given below.
The proposed method tries to estimate the optimal linear combination
of the unknown vector y of f, where
Note that \mathbf{E} (\in {\mathbf{R}}^{{N}_{\bar{\Omega}}\times wh}) is a binary (0/1) matrix that extracts only the known intensities within \mathbf{y} to obtain \mathbf{y}^{\ast} (\in {\mathbf{R}}^{{N}_{\bar{\Omega}}}), where {N}_{\bar{\Omega}} is the number of known pixels in f. As Equation 31 shows, the proposed method estimates the unknown vector \mathbf{y}, approximated by a linear combination of the atoms in the dictionary matrix \mathbf{D}, under two constraints: the known intensities in \bar{\Omega} are fixed, and the number of nonzero elements in \mathbf{x} is T or less.
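The role of \mathbf{E} can be illustrated with a small pure-Python sketch. It assumes a Boolean list that marks the known pixels of a vectorized patch. Each row of \mathbf{E} is then a unit vector that picks one known pixel. The constraint \mathbf{E}\mathbf{y} = \mathbf{y}^{\ast} therefore pins the known entries and leaves the missing entries of \mathbf{y} free. The helper names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def selection_matrix(known: List[bool]) -> Matrix:
    """Build E: one row per known pixel, each row a unit vector selecting that pixel."""
    n = len(known)
    return [[1.0 if col == p else 0.0 for col in range(n)]
            for p, is_known in enumerate(known) if is_known]


def matvec(E: Matrix, y: Vector) -> Vector:
    """Compute E y."""
    return [sum(e * v for e, v in zip(row, y)) for row in E]


known = [True, False, True, True]          # pixel 1 lies in the missing area
E = selection_matrix(known)
y_star = matvec(E, [5.0, 7.0, 9.0, 11.0])  # known intensities of the patch
```

Changing the missing entry of \mathbf{y} leaves \mathbf{E}\mathbf{y} unchanged, which is exactly the freedom the inpainting step exploits.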
Instead of directly calculating the optimal solution of Equation 31, we first select the T atoms used for approximating \mathbf{y}. Specifically, the proposed method selects T optimal atoms from \mathbf{D} by solving the following problem:
This problem can be solved with the same algorithm as that used to calculate the optimal vectors \mathbf{x}_{i} in the previous subsection. The matrix \widehat{\mathbf{D}}, containing the atoms whose corresponding coefficients in \widehat{\mathit{\alpha}} are nonzero, is then obtained.
Next, using the obtained matrix \widehat{\mathbf{D}}, Equation 31 is rewritten as
In the above equation,
and
In the proposed method, we estimate \widehat{\mathbf{y}} and \widehat{\mathbf{a}} maximizing Equation 34 under the constraint \mathbf{E}\mathbf{y} = \mathbf{y}^{\ast}, using the computation scheme in [53] in a similar way to the previous subsection. Since two vectors have to be estimated, the scheme is extended as follows.
Equation 34 is a nonconvex function of \mathbf{y} and \mathbf{a}, but its first term is a function only of \frac{1}{wh}{\mathbf{1}}^{\prime}\mathbf{y} (= \rho) and {{\mathit{\mu}}_{\widehat{\mathbf{D}}}}^{\prime}\mathbf{a} (= \omega). We therefore rewrite Equation 33 in the same way as in the previous subsection.
By fixing \frac{1}{wh}{\mathbf{1}}^{\prime}\mathbf{y} = \rho and {{\mathit{\mu}}_{\widehat{\mathbf{D}}}}^{\prime}\mathbf{a} = \omega, the first term of the SSIM index in Equation 34 is fixed, and the cost function of Equation 33 can be simplified.
The overall problem is therefore to find the highest SSIM index by searching over ranges of ρ and ω, as shown in Figure 2. Both search ranges are set to {\mu}_{{\mathbf{y}}^{\ast}}-R\delta, \dots, {\mu}_{{\mathbf{y}}^{\ast}}-2\delta, {\mu}_{{\mathbf{y}}^{\ast}}-\delta, {\mu}_{{\mathbf{y}}^{\ast}}, {\mu}_{{\mathbf{y}}^{\ast}}+\delta, {\mu}_{{\mathbf{y}}^{\ast}}+2\delta, \dots, {\mu}_{{\mathbf{y}}^{\ast}}+R\delta, where {\mu}_{{\mathbf{y}}^{\ast}} is the mean of \mathbf{y}^{\ast}. The solution can then be obtained in the same manner as in the previous subsection.
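Section 5.1 uses δ = 5 and R = 6. With these values, each of ρ and ω takes 2R + 1 = 13 candidate values, so the joint search visits 13 × 13 = 169 pairs per target patch. The hypothetical helper below only enumerates these candidates around the mean of \mathbf{y}^{\ast}. Solving the constrained problem for each pair is not shown.

```python
from typing import List, Tuple


def search_grid(center: float, delta: float, R: int) -> List[float]:
    """Candidates center + r*delta for r = -R, ..., R."""
    return [center + r * delta for r in range(-R, R + 1)]


def rho_omega_candidates(known_pixels: List[float], delta: float, R: int) -> List[Tuple[float, float]]:
    """All (rho, omega) pairs centred on the mean of y*, the known intensities."""
    mu = sum(known_pixels) / len(known_pixels)
    grid = search_grid(mu, delta, R)
    return [(rho, omega) for rho in grid for omega in grid]


pairs = rho_omega_candidates([100.0, 110.0, 120.0], delta=5.0, R=6)
```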
Then, the following problem can be obtained:
where
Note that the optimal value of τ can be obtained as shown in the previous subsection.
Furthermore, the proposed method adopts the Lagrange multiplier approach to obtain the optimal vectors of y and a as follows:
where {\mathbf{v}}_{k} (k = 1,2,\dots,{N}_{\bar{\Omega}}) is a vector satisfying
and {y}_{k}^{\ast} (k = 1,2,\dots,{N}_{\bar{\Omega}}) satisfies
In Equation 40, the first and second terms are from the cost function, and the third, fourth, and fifth terms are from the constraints.
Then, by solving the above problem, the proposed method calculates the optimal vectors \widehat{\mathbf{a}} and \widehat{\mathbf{y}}. Finally, from the obtained result \widehat{\mathbf{y}}, the proposed method outputs the estimated intensities in the missing area Ω.
With the above procedure, we can estimate the missing intensities in Ω within the target patch f. The proposed method therefore extracts patches that include missing areas and inpaints them to estimate all missing intensities. In other words, it gradually reconstructs the missing areas patch by patch, starting from the boundary of the missing areas. To realize this scheme, we have to determine the order in which patches along the fill front ∂Ω of the missing areas are filled. We call this order the 'patch priority'. In the proposed method, patch priorities are determined by the method proposed by Criminisi et al. [24]. Specifically, given a patch f_{\mathbf{p}} centered at a pixel \mathbf{p} on the fill front of the missing areas within the target image, its priority P(\mathbf{p}) is defined as follows:
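P(\mathbf{p}) = C(\mathbf{p})\,D(\mathbf{p}),
where C(\mathbf{p}) and D(\mathbf{p}) are called the confidence term and the data term, respectively, and are defined as follows:
C(\mathbf{p}) = \frac{\sum_{\mathbf{q} \in f_{\mathbf{p}} \cap (I \setminus \Theta)} C(\mathbf{q})}{\text{area}(f_{\mathbf{p}})}, \qquad D(\mathbf{p}) = \frac{\left|\nabla I_{\mathbf{p}}^{\perp} \cdot \mathbf{n}_{\mathbf{p}}\right|}{I_{\max}}.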
In the above equations, I and Θ denote the whole area of the target image and the whole missing area, respectively. Furthermore:
- area(f_{\mathbf{p}}) (= w × h) is the number of pixels included in the target patch f_{\mathbf{p}};
- I_{\max} is a normalization factor (e.g., I_{\max} = 255 for a typical grayscale image);
- \nabla {I}_{\mathbf{p}}^{\perp} is the isophote at pixel \mathbf{p};
- \mathbf{n}_{\mathbf{p}} is a unit vector orthogonal to the fill front at pixel \mathbf{p}.

C(\mathbf{p}) is initialized as C(\mathbf{p}) = 0 for all \mathbf{p} ∈ Θ and C(\mathbf{p}) = 1 for all \mathbf{p} ∈ I∖Θ. After each patch is inpainted, the newly inpainted pixels are assigned the value C(\mathbf{p}) of that patch for the subsequent inpainting process.
The confidence term represents the mean reliability of the pixels within the target patch f_{\mathbf{p}}, so its value is higher when f_{\mathbf{p}} contains many known intensities. After inpainting, the reconstructed pixels have confidence values less than one. In other words, they are regarded as more reliable than the missing pixels but less reliable than the original pixels. In addition, as shown in Figure 3, the data term is a function of the strength of the isophotes at the fill front ∂Ω [24]. Because it uses the inner product of the isophote \nabla {I}_{\mathbf{p}}^{\perp} at pixel \mathbf{p} and the unit vector \mathbf{n}_{\mathbf{p}} orthogonal to the fill front at \mathbf{p}, linear structures are reconstructed first. In this way, we can restore all of the missing areas within the target image according to the patch priorities in Equation 43.
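The priority computation of Criminisi et al. can be sketched in pure Python as follows. The sketch rests on these assumptions:
- patches are square, (2·half + 1) × (2·half + 1) pixels;
- the confidence map is stored as nested lists in which missing pixels hold 0;
- pixels outside the image contribute nothing;
- the image gradient (g_x, g_y) and the fill-front normal at \mathbf{p} have already been estimated;
- the isophote is the gradient rotated by 90°.

The function names are illustrative.

```python
from typing import List, Tuple

Grid = List[List[float]]


def confidence_term(conf: Grid, p: Tuple[int, int], half: int) -> float:
    """C(p): summed confidence over the (2*half+1)^2 patch centred at p, divided by its area."""
    r0, c0 = p
    total = 0.0
    for r in range(r0 - half, r0 + half + 1):
        for c in range(c0 - half, c0 + half + 1):
            if 0 <= r < len(conf) and 0 <= c < len(conf[0]):  # pixels outside the image add nothing
                total += conf[r][c]                          # missing pixels hold confidence 0
    return total / (2 * half + 1) ** 2


def data_term(grad: Tuple[float, float], normal: Tuple[float, float], i_max: float = 255.0) -> float:
    """D(p) = |isophote . n_p| / I_max, with the isophote as the gradient rotated by 90 degrees."""
    isophote = (-grad[1], grad[0])
    return abs(isophote[0] * normal[0] + isophote[1] * normal[1]) / i_max


def priority(conf: Grid, p: Tuple[int, int], half: int, grad: Tuple[float, float],
             normal: Tuple[float, float], i_max: float = 255.0) -> float:
    """P(p) = C(p) * D(p)."""
    return confidence_term(conf, p, half) * data_term(grad, normal, i_max)


conf = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]   # columns 0 and 1 are still missing
example_priority = priority(conf, (2, 2), half=1, grad=(51.0, 0.0), normal=(0.0, 1.0))
```

The patch with the largest priority along the fill front would be inpainted next. Its C value would then be copied to the newly filled pixels.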
5 Experimental results
In this section, we verify the performance of the proposed method to confirm its effectiveness. First, we show the results of subjective evaluation using several test images. Then, results of quantitative evaluation using the peak signal-to-noise ratio (PSNR), obtained from the MSE, and the SSIM index are shown, and the effectiveness of using the SSIM index is discussed.
Section 5.1 describes the experimental conditions, mainly the details of the experiments and the comparative methods. In Section 5.2, subjective and quantitative results are compared with those of the existing methods, and the effectiveness of the proposed method is discussed. In Section 5.3, we show some examples in which the proposed method is applied to test images with larger missing areas.
5.1 Conditions of experiments
In this subsection, we explain the conditions of the experiments. We first prepared three test images, which are shown in Figures 4, 5, and 6, and then added text regions to these test images to obtain corrupted images.
We performed inpainting of the three corrupted test images using the proposed method and the following existing methods:

1.
Methods based on PCA or KPCA [31, 34, 35]
These existing methods generate eigenspaces or nonlinear eigenspaces of patches for inpainting on the basis of PCA or KPCA. Eigenspaces are well known to provide the least-squares approximation of target data, i.e., they are the optimal subspaces in terms of the MSE. The method in [31] is therefore suitable for comparison with the proposed method. Furthermore, the methods in [34] and [35] use nonlinear eigenspaces to approximate nonlinear texture features in images, and we therefore also used these methods in the experiments.

2.
Exemplar-based inpainting methods [24, 30]
Several exemplar-based inpainting methods have been proposed. The method in [24] is a representative one, and an improved version of it was proposed in [30]; both are based on least-squares error approaches. In the proposed method, we determine the patch priority using the scheme in [24], so the difference between our method and [24] lies in the algorithm for estimating the missing intensities. The method in [24] is therefore suitable for confirming the effectiveness of the proposed inpainting algorithm, i.e., the missing intensity estimation algorithm. Although the method in [30] mainly aims at improving speed rather than inpainting performance, its authors report that it also improves on the performance of [24] in some cases. We therefore used these methods as comparative methods in the experiments.

3.
Sparse representation-based inpainting methods [41]
As described above, the method in [41] adopts new sparsity-based modeling of the patch priority and the patch representation, which are two crucial steps for patch propagation in the exemplar-based inpainting approach. Since this method is based on sparse representation but uses MSE-based criteria, it is suitable for comparison.
In this paper, we regard the methods in [35] and [41] as state-of-the-art methods. In particular, the method in [41] improves both patch approximation, based on sparse representation, and patch priority estimation.
In the experiments, we used the above methods as comparative methods for evaluating our method. For inpainting by the proposed method and the existing methods [24, 30, 41], the patch size was fixed to 15 (w = h = 15). The existing methods in [31, 34] and [35], on the other hand, simply perform inpainting in raster-scan order. For some test images, some target patches then lie entirely within missing areas, and those methods cannot inpaint such areas. Thus, in the experiments, the patch size for these methods was set to 30. Note that some existing methods used much smaller patches in previous studies and achieved accurate performance with them. In these experiments, we used such difficult conditions in order to make the difference in performance between the proposed method and the existing methods clearer. In our method, we simply set T = 10, δ = 5, and R = 6.
5.2 Subjective and quantitative evaluations
Based on the experimental conditions described in the previous subsection, inpainting was performed using the proposed method and the existing methods. Figures 4, 5, and 6 show the results obtained by those methods. For better subjective evaluation, their zoomed portions are shown in Figures 7, 8, and 9, respectively. From these results, we can confirm that the proposed method successfully performs inpainting without suffering from oversmoothing. Some MSE criterion-based methods also perform inpainting accurately, but in some cases they have difficulty maintaining sharpness at the same time. For some existing methods, such as [24], [30], and [41], the performance is worse than that reported in their papers. As described above, we selected conditions different from those used in those papers, i.e., larger patches. Since this comparison scheme has been adopted in several papers, we also used such difficult conditions in order to make the difference in performance between the proposed method and the existing methods clearer. Under these conditions, the representation abilities of the methods degrade, and the obtained results tend to be blurred. Since the exemplar-based methods in [24] and [30] directly select known patches from the target image for inpainting, blurring tends to be reduced. Nevertheless, even these methods cannot completely remove degradation. Furthermore, the methods in [34] and [35] adopt nonlinear eigenspaces to represent nonlinear texture features. However, their representation abilities degrade as the dimension of the subspace becomes smaller; here, the dimension was set to the same value as in the proposed method.
Generally, natural images contain much more power in low-frequency components than in high-frequency components. Low-dimensional subspaces obtained from the MSE-based criteria in the existing methods therefore tend to represent only such low-frequency components. Since high-frequency components are then difficult to represent, the results suffer from oversmoothing. In contrast, the SSIM index contains a term that compares components excluding the mean, i.e., variances and covariance, as shown in Equation 5. The subspaces used for inpainting thus tend to represent high-frequency components successfully, and the proposed method can perform inpainting successfully. Furthermore, the proposed method combines sparse representation with the SSIM index. This approach enables adaptive selection of the optimal atoms for each target patch including missing areas, which means that our method can provide an optimal subspace for each target patch.
Next, we show the results of quantitative evaluation for the proposed method and the existing methods. Eight test images, shown in Figures 10 and 11, are added to those in Figures 4, 5, and 6; these figures also show the inpainting results obtained by our method. Tables 1 and 2 show the PSNR (dB), calculated from the MSE, and the SSIM index of the inpainting results, respectively. Since inpainting is performed for each patch, the SSIM index is calculated for each patch, and the average values are shown in the tables. In addition, the evaluation values are computed only on the reconstructed pixels. The results show that several existing methods achieve higher PSNR values, i.e., lower MSE values, than the proposed method. Specifically, the existing method in [35] gives the best results in terms of PSNR (MSE), and the proposed method gives the best results in terms of the SSIM index.
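Restricting the PSNR to the reconstructed pixels, as done for the tables, can be sketched as below. The sketch assumes flattened 8-bit grayscale images (peak value 255) and a Boolean mask marking the reconstructed pixels, with at least one pixel selected. It also includes a plain average for per-patch SSIM values. Both function names are hypothetical.

```python
import math
from typing import List


def masked_psnr(original: List[float], restored: List[float], mask: List[bool],
                peak: float = 255.0) -> float:
    """PSNR in dB computed from the MSE over the pixels where mask is True."""
    errors = [(o - r) ** 2 for o, r, m in zip(original, restored, mask) if m]
    mse = sum(errors) / len(errors)
    if mse == 0:
        return math.inf              # the masked pixels were reconstructed exactly
    return 10 * math.log10(peak ** 2 / mse)


def mean_ssim(patch_scores: List[float]) -> float:
    """Average of per-patch SSIM values."""
    return sum(patch_scores) / len(patch_scores)


psnr_example = masked_psnr([10.0, 20.0, 30.0, 40.0], [10.0, 21.0, 29.0, 40.0],
                           [False, True, True, False])
```

In `psnr_example`, only the two masked pixels count, each with an error of one gray level, so the MSE is 1.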
In recent years, several researchers in image quality assessment have also pointed out that the MSE and its variants cannot reflect some degradations [46, 51]. To tackle this problem, several criteria for assessing image quality have been proposed, the SSIM index being a representative one. The proposed method focuses on this criterion and realizes inpainting that maximizes the SSIM index, so it is natural that it achieves the highest SSIM values. Although using the SSIM index for inpainting is effective, the ranking of inpainting performance given by the SSIM index does not always perfectly agree with the subjective evaluation, and further improvement is necessary in future work.
As quantitative evaluation, we have shown the PSNR and SSIM index of the results obtained by our method and the other existing methods. Next, we focus on the computation cost of the proposed method. We first compare the computation times of the proposed method and the other multivariate analysis-based methods in [31] and [35]^{b}. The average computation time of our method for obtaining the results of images 1 to 11 was about 342.1 s.
- Compared with [31], the proposed method is about 0.78 to 3.1 times (1.6 times on average) slower; a ratio smaller than one means that the proposed method is faster. The method in [31] is fast because its inpainting procedure is simple: it only needs to compute the eigenvector matrix and the back projection for the lost pixels.
- Compared with [35], the proposed method is about 1.2 to 4.7 times (2.9 times on average) faster. That method adopts kernel PCA, so the projection onto the nonlinear subspace has to be calculated using the kernel trick, i.e., a direct projection is not possible. It also classifies the target patch including missing areas and performs the inpainting procedures for all clusters, which results in a high computation cost.
- Compared with the exemplar-based method in [24], the computation time of our method is about 4.3 to 12.5 times (7.4 times on average) longer.

Note that the method in [30], used as a comparative method in the experiments, drastically reduces the computation cost of [24] and also introduces a GPU implementation. A CPU version reducing the computation cost of [24] has also been proposed by the same authors [62]. In [62], Kwok et al. reported inpainting about 15 to 50 times faster than that of the method in [24].
In addition, compared with the MSE-based inpainting approach, which calculates the optimal sparse representation coefficients on the basis of the MSE, the proposed method requires the complex optimization procedures shown in the previous section. In the MSE-based approach, the coefficients can be obtained by simply solving the normal equations, which is much simpler than our method. It is therefore necessary to speed up our inpainting method by introducing alternative approaches, which will be investigated in subsequent studies.
Note that in the above experiment, we used a difficult condition, i.e., larger patches, in order to make the difference in inpainting performance between our method and the existing methods clearer. Next, we show experimental results obtained under the conditions adopted in each paper; that is, the conditions of the existing methods were set according to their papers. In these new experiments, we used the eight test images shown in Figures 10 and 11 and randomly added missing blocks of 8 × 8 pixels while varying the ratio of missing pixels. Figure 12 shows the relationship between the ratio of missing pixels and the SSIM index calculated from the reconstructed image. These results show that the proposed method tends to output better results than the existing methods.
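One way to generate such test masks is sketched below. It assumes that the 8 × 8 missing blocks are placed on a non-overlapping grid aligned to multiples of 8 pixels. It also assumes that the requested ratio is rounded to a whole number of blocks. The text above does not specify how the blocks were placed, so this is only an illustration with hypothetical names.

```python
import random
from typing import List


def random_block_mask(height: int, width: int, ratio: float,
                      block: int = 8, seed: int = 0) -> List[List[bool]]:
    """True marks a missing pixel; blocks are drawn without replacement from a block grid."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(0, height - block + 1, block)
             for c in range(0, width - block + 1, block)]
    count = min(len(cells), round(ratio * height * width / (block * block)))
    mask = [[False] * width for _ in range(height)]
    for r0, c0 in rng.sample(cells, count):
        for r in range(r0, r0 + block):
            for c in range(c0, c0 + block):
                mask[r][c] = True
    return mask


mask = random_block_mask(64, 64, ratio=0.25)
missing_ratio = sum(map(sum, mask)) / (64 * 64)
```

Fixing the seed makes each missing-pixel ratio reproducible across the compared methods.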
5.3 Inpainting of larger missing areas
Finally, Figures 13, 14, 15, and 16 show some examples obtained by applying the proposed method to larger missing areas. In these experiments, the proposed method was used with one simple additional procedure, as follows. First, for a target patch selected on the basis of the patch priority in Equation 43, the proposed method performs the inpainting described in Section 4.2 and obtains the result \widehat{\mathbf{y}}. Next, as an additional procedure, we search the target image for the known patch that best matches the obtained result \widehat{\mathbf{y}} in terms of the SSIM index, and the selected known patch is used as the final output. By performing these procedures for all patches selected according to the patch priority, the whole missing area can be reconstructed.
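The additional matching step can be sketched as an exhaustive search. The sketch assumes:
- grayscale images stored as nested lists;
- square patches and a Boolean map of known pixels;
- candidate patches restricted to fully known ones;
- a single SSIM window per patch with the commonly used constants.

The names are hypothetical, and a practical implementation would need a faster search.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def mean(v: List[float]) -> float:
    return sum(v) / len(v)


def ssim(a: List[float], b: List[float], c1: float = (0.01 * 255) ** 2,
         c2: float = (0.03 * 255) ** 2) -> float:
    """Single-window SSIM of two patch vectors with 1/n-normalized (co)variances."""
    n = len(a)
    ma, mb = mean(a), mean(b)
    va = sum((p - ma) ** 2 for p in a) / n
    vb = sum((q - mb) ** 2 for q in b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b)) / n
    return ((2 * ma * mb + c1) * (2 * cov + c2)) / ((ma**2 + mb**2 + c1) * (va + vb + c2))


def patch(image: Grid, r0: int, c0: int, size: int) -> List[float]:
    """Row-major vector of the size x size patch with top-left corner (r0, c0)."""
    return [image[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size)]


def best_known_patch(image: Grid, known: List[List[bool]], estimate: List[float],
                     size: int) -> Tuple[Tuple[int, int], float]:
    """Return the corner and SSIM of the fully known patch that best matches the estimate."""
    best_pos, best_score = (-1, -1), -math.inf
    for r0 in range(len(image) - size + 1):
        for c0 in range(len(image[0]) - size + 1):
            if not all(known[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size)):
                continue                          # candidate overlaps the missing area
            score = ssim(patch(image, r0, c0, size), estimate)
            if score > best_score:
                best_pos, best_score = (r0, c0), score
    return best_pos, best_score


image = [[12.0, 200.0, 37.0, 90.0],
         [5.0, 140.0, 66.0, 230.0],
         [180.0, 25.0, 99.0, 61.0],
         [74.0, 210.0, 8.0, 150.0]]
known = [[True] * 4 for _ in range(4)]
known[0][0] = False                               # one missing pixel
estimate = patch(image, 2, 1, 2)                  # stands in for the reconstructed patch
position, score = best_known_patch(image, known, estimate, 2)
```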
The above scheme is similar to existing methods that simply select the best matched examples, but it differs from them as follows. In existing methods using only the best matched examples, the best matched patch is selected by evaluating errors in the known neighboring areas around the missing areas. In contrast, the proposed method first reconstructs the patches by SSIM-based sparse representation. It then uses the SSIM index to select the examples that best match the reconstructed patches. In other words, the best matched examples are selected on the basis of well-approximated reconstructed patches. This is the biggest difference between the existing methods and the proposed method.
Although the proposed method can reconstruct patches accurately, the obtained results tend to include colors that do not appear in the target image. This is because the proposed method does not adopt any specific procedure to avoid spurious colors. Therefore, in the experiments on reconstructing larger missing areas, we adopted the above scheme to avoid the propagation of spurious colors.
From the obtained results, we can confirm that the proposed method successfully inpaints such large missing areas. The images shown in Figures 13, 14, 15, and 16 are used as test images in several papers, such as [23, 24, 30] and [41]. However, the flag images corresponding to Figures 13b, 14b, 15b, and 16b were generated independently in each paper, i.e., the positions of the missing areas differ among those papers. We therefore discuss the results by comparing those obtained by our method in Figures 13, 14, 15, and 16 with the results shown in those papers. These figures show that the proposed method achieves comparable performance or some improvement. Since ground truth images are not available for these test images, this is a subjective evaluation. Specifically, as shown in Figure 15, the proposed method and the methods in [23] and [41] achieve visually pleasant results. In this test image, the structural and textural components are simple and the percentage of missing area is relatively small, so successful inpainting is easier to achieve. Similarly, Figure 13 shows that successful inpainting is achieved by our method and the methods in [24], [30], and [41], and improvement by our method can be confirmed in some areas. However, structural components, i.e., edges, are reconstructed more accurately by [41]. The biggest difference between [41] and the other works, including our method, is the priority estimation, so introducing an improved priority estimation scheme should improve the performance of the proposed method. Furthermore, Figure 16 shows that the results obtained by our method are comparable to those in [24] and [30]. Note that the flag images of this test image differ among these methods, and we found that performance was affected by how the flag images were generated. This was also observed for the image shown in Figure 14.
As the above discussion shows, successful structure reconstruction is difficult for the proposed method. To illustrate this problem, we show some examples in Figures 17 and 18. We artificially added missing blocks (16 × 16 pixels) to the two images (512 × 512 pixels) shown in Figures 17a and 18a to obtain the corrupted images in Figures 17b and 18b. The results reconstructed by the proposed method in Figures 17c and 18c show that texture regions and simple structure regions are reconstructed successfully. On the other hand, in some complex structure regions containing edges in several directions, our method cannot perfectly recover the structure components. This is because the proposed method considers structure components only in the patch priority determination; that is, its inpainting algorithm is optimized only for texture reconstruction.
Several methods have been proposed to reconstruct structure and texture regions simultaneously [23, 25, 42].

The method in [23] is a fragment-based algorithm that preserves both structures and textures. A confidence map is used to determine which pixels have more surrounding information available. The reconstruction then proceeds from more confident pixels in a multiscale fashion, from coarse to fine. A similar image fragment is found and copied to the current unknown location, where a fragment is a circular neighborhood whose radius is adapted to its underlying structure. Despite this advantage, [24, 41] report that this algorithm is extremely slow and may introduce blurring artifacts. Because the fragment is selected on the basis of the absolute distance, it tends to cause blurring artifacts similar to those caused by MSE-based distances.

The method in [42] introduced a sparse representation model representing both structure and texture components to reconstruct them simultaneously. However, this method is based on least-squares approximation and may suffer from the problems of using the MSE. Successful reconstruction can therefore be expected by introducing this simultaneous representation model into the proposed SSIM-based approach.

The method in [25] introduced interactive image editing tools to realize highly accurate structure reconstruction. Since users can provide guidance for the reconstruction, the inpainting performance is improved. Although this approach is not fully automatic, adopting such interactive image editing tools could also improve the performance of the proposed method.
6 Conclusions
In this paper, we have presented an inpainting method based on sparse representations optimized with respect to a perceptual metric. Using sparse representations, the proposed method adaptively provides subspaces optimal for reconstructing target patches that include missing areas. The SSIM-based criterion is introduced into both the dictionary computation and the inpainting algorithm. This enables perceptually optimized inpainting, and the proposed method achieves successful results.
Although the proposed method can reconstruct large missing regions without blurring artifacts, it has higher computational complexity than existing approaches and also produces some artifacts in the output image, as shown in Figure 14. Reducing the computational cost and suppressing these artifacts are left for future work.
Furthermore, extending the algorithm to the reconstruction of other types of missing image data is desirable for various applications. These topics will be addressed in future work, and the results will be presented in subsequent reports.
Endnotes
^{a}Following [38], signal atoms are simply referred to as 'atoms' in this paper.
^{b}The experiments were performed on a personal computer with an Intel(R) Core(TM) i7 950 CPU (3.06 GHz) and 8.0 GB RAM. The method was implemented in MATLAB.
Appendix
This appendix details the derivations of Equations 26 and 27. Since the optimization problem in Equation 25 is still nonconvex, it is converted into a quasiconvex optimization problem as follows:
Since minimizing τ is equivalent to finding the least upper bound of the objective in Equation 25, the first equivalence holds. The second equivalence holds because the denominator in Equation 25 is strictly positive, which allows us to multiply through by it and rearrange terms. In this way, Equation 26 is derived. Then, τ becomes a true upper bound if
optimized in Equation 46 has a nonnegative optimal value, and the optimal vector $\widehat{\mathbf{x}}_{i,j}^{(t)}\left(\rho_{i,j}^{(t)}\right)$ in Equation 25 can be obtained. Thus, applying the Lagrange multiplier method to the above equation under the constraint ${\boldsymbol{\mu}_{\mathbf{D}_{i,j}^{(t)}}}^{\prime}\mathbf{x}_{i,j}^{(t)} = \rho_{i,j}^{(t)}$ yields Equation 27.
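Equations 25 to 27 and 46 are not reproduced in this section, so the sketch below works on a hypothetical stand-in objective of the same fractional type, $f(\mathbf{x}) = \|\mathbf{D}\mathbf{x}-\mathbf{y}\|^2 / (\|\mathbf{D}\mathbf{x}\|^2 + \|\mathbf{y}\|^2 + c)$ with $c > 0$, which equals one minus the SSIM-style ratio $(2\mathbf{y}^{T}\mathbf{D}\mathbf{x} + c)/(\|\mathbf{D}\mathbf{x}\|^2 + \|\mathbf{y}\|^2 + c)$. It illustrates the two steps described above. Because the denominator is strictly positive, $f(\mathbf{x}) \le \tau$ holds exactly when $\|\mathbf{D}\mathbf{x}-\mathbf{y}\|^2 - \tau(\|\mathbf{D}\mathbf{x}\|^2 + \|\mathbf{y}\|^2 + c) \le 0$, so the least upper bound τ can be located by bisection; each test of τ is an equality-constrained quadratic problem solved with a Lagrange multiplier, with a linear constraint $\mathbf{a}^{T}\mathbf{x} = \rho$ standing in for the mean constraint. The function names, D, y, a, ρ, and c are illustrative assumptions, and the sign convention of the feasibility test may differ from that of Equation 46.

```python
import numpy as np


def ratio(x, D, y, c):
    """Hypothetical stand-in objective f(x) = ||Dx - y||^2 / (||Dx||^2 + ||y||^2 + c)."""
    z = D @ x
    return float((z - y) @ (z - y) / (z @ z + y @ y + c))


def subproblem(tau, D, y, c, a, rho):
    """min_x ||Dx - y||^2 - tau * (||Dx||^2 + ||y||^2 + c)  s.t.  a^T x = rho.

    Lagrange multipliers give the KKT system
    [2(1 - tau) D^T D, a; a^T, 0] [x; lam] = [2 D^T y; rho].
    For tau < 1 and D of full column rank the objective is strictly convex,
    so this solution is the global minimiser.
    """
    k = D.shape[1]
    kkt = np.zeros((k + 1, k + 1))
    kkt[:k, :k] = 2.0 * (1.0 - tau) * (D.T @ D)
    kkt[:k, k] = a
    kkt[k, :k] = a
    rhs = np.append(2.0 * (D.T @ y), rho)
    x = np.linalg.solve(kkt, rhs)[:k]
    z = D @ x
    value = (z - y) @ (z - y) - tau * (z @ z + y @ y + c)
    return x, float(value)


def bisection(D, y, c, a, rho, tol=1e-10, max_iter=200):
    """Minimise ratio() subject to a^T x = rho by bisection on the level tau."""
    x_best, _ = subproblem(0.0, D, y, c, a, rho)  # constrained least-squares start
    lo, hi = 0.0, ratio(x_best, D, y, c)          # f >= 0, and hi is attained by x_best
    if hi >= 1.0:
        raise ValueError("this sketch needs a starting point with f < 1")
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        tau = 0.5 * (lo + hi)
        x, value = subproblem(tau, D, y, c, a, rho)
        if value <= 0.0:   # some feasible x has f(x) <= tau: tau is an upper bound
            hi, x_best = tau, x
        else:              # f(x) > tau for every feasible x: tau is a lower bound
            lo = tau
    return x_best, hi
```

The check `hi >= 1.0` keeps every tested τ below 1, so $2(1-\tau)\mathbf{D}^{T}\mathbf{D}$ stays positive definite (for D of full column rank) and each subproblem remains convex.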
References
Yan R, Shao L, Liu Y: Nonlocal hierarchical dictionary learning using wavelets for image denoising. IEEE Trans. Image Process 2013, 22(12):4689–4698.
Shao L, Yan R, Li X, Liu Y: From heuristic optimization to dictionary learning: a review and comprehensive comparison of image denoising algorithms. IEEE Trans. Cybern. 2013. doi:10.1109/TCYB.2013.2278548
Buades A, Coll B, Morel J: A review of image denoising algorithms, with a new one. Multiscale Model. Simul 2005, 4(2):490–530. 10.1137/040616024
Shao L, Zhang H, de Haan G: An overview and performance evaluation of classification-based least squares trained filters. IEEE Trans. Image Process 2008, 17(10):1772–1782.
Hansen PC, Nagy JG, O’Leary DP: Deblurring Images: Matrices, Spectra, and Filtering (Fundamentals of Algorithms). Philadelphia: Society for Industrial and Applied Mathematics; 2006.
Tauber Z, Li ZN, Drew M: Review and preview: disocclusion by inpainting for image-based rendering. IEEE Trans. Syst., Man, Cybern., Part C: Appl. Rev 2007, 37(4):527–540.
Masnou S, Morel J: Level lines based disocclusion. Proc. IEEE Int. Conf. Image Process. (ICIP) 1998, 3:259–263.
Bertalmio M, Sapiro G, Caselles V, Ballester C: Image inpainting. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH’00). New York: ACM; 2000:417–424.
Bertalmio M, Bertozzi A, Sapiro G: Navier-Stokes, fluid dynamics, and image and video inpainting. Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit 2001, 1:355–362.
Chan TF, Shen J: Non-texture inpainting by curvature-driven diffusions. J. Vis. Commun. Image Representation 2001, 12(4):436–449. 10.1006/jvci.2001.0487
Ogawa T, Haseyama M, Kitajima H: Restoration method of missing areas in still images using GMRF model. IEEE Int. Symp. Circuits Syst. (ISCAS) 2005, 5:4931–4934.
Rares A, Reinders MJT, Biemond J: Edge-based image restoration. IEEE Trans. Image Process 2005, 14(10):1454–1468.
Auclair-Fortier MF, Ziou D: A global approach for solving evolutive heat transfer for image denoising and inpainting. IEEE Trans. Image Process 2006, 15(9):2558–2574.
Bertalmio M: Strong-continuation, contrast-invariant inpainting with a third-order optimal PDE. IEEE Trans. Image Process 2006, 15(7):1934–1938.
Liu J, Li M, He F: A novel inpainting model for partial differential equation based on curvature function. J. Multimedia 2012, 7(3):239–246.
Wang M, Yan B, Ngan KN: An efficient framework for image/video inpainting. Signal Process.: Image Commun 2013, 28(7):753–762. 10.1016/j.image.2013.03.002
Qi F, Han J, Wang P, Shi G, Li F: Structure guided fusion for depth map inpainting. Pattern Recognit. Lett 2013, 34:70–76. 10.1016/j.patrec.2012.06.003
Ballester C, Bertalmio M, Caselles V, Sapiro G: Filling-in by joint interpolation of vector fields and gray levels. IEEE Trans. Image Process 2001, 10(8):1200–1211. 10.1109/83.935036
Kokaram A: A statistical framework for picture reconstruction using 2D AR models. Image Vis. Comput 2004, 22(2, 1):165–171.
Bertalmio M, Vese L, Sapiro G, Osher S: Simultaneous structure and texture image inpainting. IEEE Trans. Image Process 2003, 12(8):882–889. 10.1109/TIP.2003.815261
Efros AA, Leung TK: Texture synthesis by non-parametric sampling. Proceedings of the Seventh IEEE International Conference on Computer Vision 1999, 2:1033–1038.
Wei LY, Levoy M: Fast texture synthesis using tree-structured vector quantization. In Proceedings of SIGGRAPH 2000. New York: ACM; 2000:479–488.
Drori I, Cohen-Or D, Yeshurun H: Fragment-based image completion. In Proceedings of SIGGRAPH 2003. New York: ACM; 2003:303–312.
Criminisi A, Pérez P, Toyama K: Region filling and object removal by exemplar-based image inpainting. IEEE Trans. Image Process 2004, 13(9):1200–1212. 10.1109/TIP.2004.833105
Barnes C, Shechtman E, Finkelstein A, Goldman DB: PatchMatch: a randomized correspondence algorithm for structural image editing. ACM Trans. Graph. (Proc. SIGGRAPH) 2009. doi:10.1145/1531326.1531330
Zhang Q, Lin J: Exemplar-based image inpainting using color distribution analysis. J. Inf. Sci. Eng 2012, 28(4):641–654.
Shibata T, Iketani A, Senda S: Fast and structure-preserving image inpainting based on probabilistic structure estimation. IEICE Trans. Inf. Syst 2012, 95(7):1731–1739.
Le Meur O, Guillemot C: Super-resolution-based inpainting. In Computer Vision – ECCV 2012. Edited by: Fitzgibbon A, Lazebnik S, Perona P, Sato Y, Schmid C. 12th European Conference on Computer Vision, Florence, 7–13 October 2012, Proceedings, Part VI (Springer, Heidelberg, 2012), pp. 554–567
Le Meur O, Ebdelli M, Guillemot C: Hierarchical super-resolution-based inpainting. IEEE Trans. Image Process 2013, 22(10):3779–3790.
Kwok TH, Sheung H, Wang CCL: Fast query for exemplar-based image completion. IEEE Trans. Image Process 2010, 19(12):3106–3115.
Amano T, Sato Y: Image interpolation using BPLP method on the eigenspace. Syst. Comput. Japan 2007, 38:87–96. 10.1002/scj.10319
Schölkopf B, Mika S, Burges CJC, Knirsch P, Müller KR, Rätsch G, Smola AJ: Input space versus feature space in kernel-based methods. IEEE Trans. Neural Netw 1999, 10(5):1000–1017. 10.1109/72.788641
Mika S, Schölkopf B, Smola A, Müller KR, Scholz M, Rätsch G: Kernel PCA and denoising in feature spaces. Adv. Neural Inf. Process. Syst 1999, 11:536–542.
Kim KI, Franz MO, Schölkopf B: Iterative kernel principal component analysis for image modeling. IEEE Trans. Pattern Anal. Mach. Intell 2005, 27(9):1351–1366.
Ogawa T, Haseyama M: Missing intensity interpolation using a kernel PCA-based POCS algorithm and its applications. IEEE Trans. Image Process 2011, 20(2):417–432.
Aharon M, Elad M, Bruckstein A: K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process 2006, 54(11):4311–4322.
Elad M, Aharon M: Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process 2006, 15(12):3736–3745.
Mairal J, Elad M, Sapiro G: Sparse representation for color image restoration. IEEE Trans. Image Process 2008, 17:53–69.
Wohlberg B: Inpainting with sparse linear combinations of exemplars. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2009). Piscataway: IEEE; 2009:689–692.
Shen B, Hu W, Zhang Y, Zhang YJ: Image inpainting via sparse representation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2009). Piscataway: IEEE; 2009:697–700.
Xu Z, Sun J: Image inpainting by patch propagation using patch sparsity. IEEE Trans. Image Process 2010, 19(5):1153–1165.
Elad M, Starck JL, Querre P, Donoho D: Simultaneous cartoon and texture image inpainting using morphological component analysis (MCA). Appl. Comput. Harmonic Anal 2005, 19(3):340–358. 10.1016/j.acha.2005.03.005
Turkan M, Guillemot C: Locally linear embedding based texture synthesis for image prediction and error concealment. In 19th IEEE International Conference on Image Processing (ICIP). Piscataway: IEEE; 2012:3009–3012.
Guillemot C, Turkan M, Le Meur O, Ebdelli M: Object removal and loss concealment using neighbor embedding methods. Signal Process.: Image Commun 2013, 28(10):1405–1419.
Takahashi T, Konishi K, Furukawa T: Structured matrix rank minimization approach to image inpainting. In IEEE 55th International Midwest Symposium on Circuits and Systems (MWSCAS). Piscataway: IEEE; 2012:860–863.
Girod B: What’s wrong with mean-squared error? In Digital Images and Human Vision. Edited by: Watson AB. Cambridge: MIT Press; 1993:207–220.
Wang Z, Bovik AC: Modern Image Quality Assessment. San Rafael: Morgan & Claypool Publishers; 2006.
Damera-Venkata N, Kite TD, Geisler WS, Evans BL, Bovik AC: Image quality assessment based on a degradation model. IEEE Trans. Image Process 2000, 9(4):636–650.
Sheikh HR, Bovik AC, de Veciana G: An information fidelity criterion for image quality assessment using natural scene statistics. IEEE Trans. Image Process 2005, 14(12):2117–2128.
Sheikh HR, Bovik AC: Image information and visual quality. IEEE Trans. Image Process 2006, 15(2):430–444.
Sheikh HR, Sabir MF, Bovik AC: A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Trans. Image Process 2006, 15(11):3440–3451.
Wang Z, Bovik AC, Sheikh HR, Simoncelli EP: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process 2004, 13(4):600–612. 10.1109/TIP.2003.819861
Channappayya SS, Bovik AC, Caramanis C, Heath RW: Design of linear equalizers optimized for the structural similarity index. IEEE Trans. Image Process 2008, 17(6):857–872.
Rehman A, Rostami M, Wang Z, Brunet D, Vrscay ER: SSIM-inspired image restoration using sparse representation. EURASIP J. Adv. Signal Process 2012, 2012:16. 10.1186/1687-6180-2012-16
Davis G, Mallat S, Avellaneda M: Adaptive greedy approximations. J. Construct. Approx 1997, 13:57–98.
Mallat S, Zhang Z: Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process 1993, 41(12):3397–3415. 10.1109/78.258082
Chen S, Billings SA, Luo W: Orthogonal least squares methods and their applications to nonlinear system identification. Int. J. Contr 1989, 50(5):1873–1896. 10.1080/00207178908953472
Pati YC, Rezaiifar R, Krishnaprasad PS: Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition. Conf. Rec. 27th Asilomar Conf. Signals, Syst. Comput 1993, 1:40–44.
Tropp JA: Greed is good: algorithmic results for sparse approximation. IEEE Trans. Inf. Theory 2004, 50(10):2231–2242. 10.1109/TIT.2004.834793
Chen SS, Donoho DL, Saunders MA: Atomic decomposition by basis pursuit. SIAM Rev 2001, 43:129–159. 10.1137/S003614450037906X
Gorodnitsky IF, Rao BD: Sparse signal reconstruction from limited data using FOCUSS: a re-weighted minimum norm algorithm. IEEE Trans. Signal Process 1997, 45(3):600–616. 10.1109/78.558475
Kwok TH, Wang CCL: Interactive image inpainting using DCT-based exemplar matching. Adv. Vis. Comput., Lecture Notes Comput. Sci 2009, 5876/2009:709–718.
Acknowledgements
This work was partly supported by Grant-in-Aid for Scientific Research (B) 25280036 and Grant-in-Aid for Young Scientists (B) 22700088, Japan Society for the Promotion of Science (JSPS).
Additional information
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Ogawa, T., Haseyama, M. Image inpainting based on sparse representations with a perceptual metric. EURASIP J. Adv. Signal Process. 2013, 179 (2013). https://doi.org/10.1186/1687-6180-2013-179
DOI: https://doi.org/10.1186/1687-6180-2013-179