 Review
 Open access
Survey of hyperspectral image denoising methods based on tensor decompositions
EURASIP Journal on Advances in Signal Processing volume 2013, Article number: 186 (2013)
Abstract
A hyperspectral image (HSI) is usually modeled as a three-dimensional tensor, with the first two dimensions indexing the spatial domain and the third dimension indexing the spectral domain. Classical matrix-based denoising methods require rearranging the tensor into a matrix, filtering the noise in the column space, and finally rebuilding the tensor. To avoid the rearranging and rebuilding steps, tensor-based denoising methods process the HSI directly by employing multilinear algebra. This paper presents a survey of three recently proposed HSI denoising methods and shows their performance in reducing noise. The first method is the multiway Wiener filter (MWF), an extension of the Wiener filter to data tensors based on the TUCKER3 decomposition. The second is the PARAFAC filter, which removes noise by truncating the PARAFAC decomposition at a lower rank K. The third is the combination of the multidimensional wavelet packet transform (MWPT) and MWF (MWPT-MWF), which models each coefficient set as a tensor and filters each tensor by applying MWF. MWPT-MWF has been proposed to preserve rare signals in the denoising process, which the MWF and PARAFAC filters cannot preserve well. A real-world HYDICE HSI is used in the experiments to assess these three tensor-based denoising methods, and the performance of each method is analyzed in two respects: signal-to-noise ratio and improvement of the subsequent target detection results.
1 Review
1.1 Introduction
Hyperspectral images (HSIs) have attracted increasing interest in recent years in different domains, such as geography, agriculture, and the military [1–3]. In these domains, HSIs are used for target detection [4] or classification [5] to find objects or materials of interest on the ground. Unfortunately, during acquisition, the HSI is usually impaired by several types of noise, such as thermal noise [6], photonic noise [7], and stripe noise [8]. Therefore, denoising methods [9–13] have become a critical step for improving the subsequent target detection and classification in remote sensing imaging applications [14].
In HSI processing, images are modeled as a three-dimensional tensor, i.e., two spatial dimensions and one spectral dimension. The classical denoising methods [15–18] rearrange the HSI into a matrix whose columns contain the spectral signatures of all the pixels, then estimate the signal subspace by methods based on the analysis of second-order statistics, and finally rebuild the original HSI structure after processing.
Matrix-based techniques cannot take full advantage of the joint spatial-spectral structure of hyperspectral images; therefore, some new techniques were developed to treat the HSI as a whole entity. For example, an HSI was treated as a hypercube in order to take into account the correlation among different bands [19, 20], and tensor algebra was brought in to jointly analyze the 3D HSI. In this paper, we mainly focus on the problem of applying tensor algebra to reduce noise in HSIs. Unlike the matrix-based denoising methods, which rely on matrix algebra, the recently proposed tensor-based denoising methods utilize multilinear algebra to analyze the HSI tensor directly. It is well known that the singular value decomposition (SVD) is important for matrix analysis. Similarly, there are two important tensor decompositions, TUCKER3 and PARAFAC, which play significant roles in analyzing tensors. Therefore, for the sake of coherence with the recently developed method that combines the multidimensional wavelet packet transform and the TUCKER3 decomposition, we focus on comparative methods based on multilinear algebra: the three surveyed methods all involve a tensor decomposition, either TUCKER3 or PARAFAC.
The TUCKER3 decomposition, also known as the lower-rank (K_1,…,K_N) tensor approximation (LRTA(K_1,…,K_N)), was first used as multimode PCA, which uses the first K_n PCA components in mode n, n=1,…,N, to restore the multidimensional signal. The LRTA(K_1,…,K_N) has been employed for seismic wave separation [21], face recognition [22], and color image denoising [23]. Although the LRTA(K_1,…,K_N) can obtain good denoising results, it is not an optimal noise filter in terms of mean squared error (MSE). The multiway Wiener filter (MWF) has been proposed to overcome this drawback of LRTA(K_1,…,K_N) [24]. MWF calculates the filter in each mode under the criterion of minimizing the MSE between the desired signal and the estimated signal; therefore, it can be understood as an optimal LRTA(K_1,…,K_N). Moreover, MWF can also be understood as an extension of the classical matrix-based Wiener filter to the tensor model using multilinear algebra tools. MWF has been used in seismic wave denoising [24] and HSI denoising [12, 25] with good results. Recently, a statistical criterion has been adopted to estimate the rank of the signal subspace in each mode [13], which makes MWF a fully automatic method for reducing noise in the data.
Apart from TUCKER3, the PARAFAC decomposition [26], also known as CANDECOMP [27], is another way to decompose a tensor into lower-rank factors. Unlike TUCKER3, PARAFAC decomposes a tensor into a sum of rank-one tensors, and only one rank K needs to be estimated for the tensor. Moreover, the PARAFAC decomposition is unique when the rank K is greater than one, whereas the TUCKER3 decomposition is not. The PARAFAC decomposition has recently been applied to the chemical sciences [28], array processing [29], telecommunications [30], and HSI denoising [14]. In comparison with MWF, reference [31] shows the potential of PARAFAC for HSI denoising. However, there is no efficient way to estimate the PARAFAC rank, which prevents fully automatic denoising.
In an HSI, a rare signal is one that is represented by only a small number of pixels, while an abundant signal is one that covers a large number of pixels compared to a rare signal [17]. MWF and PARAFAC treat an HSI as a whole entity in the denoising operation; therefore, the abundant signals and the rare signals are processed together, which introduces a drawback: the rare signals may be unintentionally removed. In fact, the energy of a rare signal is so weak compared to that of an abundant signal that the estimated signal subspace may not include the rare signal, and as a result, the rare signal is removed. MWPT-MWF (the multidimensional wavelet packet transform (MWPT) combined with the multiway Wiener filter) has been proposed to overcome this drawback of MWF and PARAFAC [32]. Instead of treating the HSI as a whole entity, MWPT-MWF first decomposes the HSI into several coefficient sets, also called components, by employing the MWPT, so that the abundant signals and the rare signals can be separated. After this step, each component is filtered automatically by MWF. Because the rare signals and the abundant signals are separated into different components, the signal subspace in each component can be estimated more accurately.
The goal of this paper is to present a survey of the tensor-based denoising methods applied to filtering HSIs. Some recent simulations and comparative results on a real-world HYDICE HSI are also presented. The remainder of this paper is organized as follows: Section 1.2 briefly introduces some basic notions of multilinear algebra. Section 1.3 introduces the signal model used in this paper. Sections 1.4, 1.5, and 1.6 present the recently proposed denoising methods MWF, PARAFAC, and MWPT-MWF, respectively. Section 1.7 presents some comparative denoising and detection results. Finally, Section 2 concludes this paper.
1.2 Basics on tensor tools and multilinear algebra
1.2.1 Tensor model
A multiway signal is also called a tensor. A tensor is a multidimensional array, \mathcal{X}\in {\mathbb{R}}^{{I}_{1}\times {I}_{2}\times \dots \times {I}_{N}}, in which \mathbb{R} denotes the set of real numbers and N is the number of dimensions. The elements of this tensor can be expressed as {x}_{{i}_{1}{i}_{2}\dots {i}_{N}}, with i_1=1,…,I_1; i_2=1,…,I_2; ⋯; i_N=1,…,I_N. The nth dimension of this tensor is called the n-mode. In particular, a tensor is called a rank-one tensor when it can be written as the outer product of N vectors [33]:
where ∘ indicates the outer product [34].
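As a concrete illustration (a small numerical sketch, not part of the original survey), a rank-one tensor can be built as the outer product of three vectors; every n-mode unfolding of such a tensor then has matrix rank 1:

```python
import numpy as np

# A rank-one tensor is the outer product of N vectors:
# x_{i1 i2 i3} = a_{i1} * b_{i2} * c_{i3}.
a = np.array([1.0, 2.0])        # mode-1 factor (I1 = 2)
b = np.array([3.0, 4.0, 5.0])   # mode-2 factor (I2 = 3)
c = np.array([1.0, -1.0])       # mode-3 factor (I3 = 2)

# np.einsum expresses the outer product of the three vectors directly.
X = np.einsum('i,j,k->ijk', a, b, c)   # shape (2, 3, 2)

# Every mode-n unfolding of a rank-one tensor has matrix rank 1.
for n in range(3):
    unfolded = np.moveaxis(X, n, 0).reshape(X.shape[n], -1)
    assert np.linalg.matrix_rank(unfolded) == 1
```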
1.2.2 Multilinear algebra tools
n-mode unfolding
The n-mode unfolding matrix of a tensor \mathcal{X}\in {\mathbb{R}}^{{I}_{1}\times {I}_{2}\times \dots \times {I}_{N}} is denoted by {\mathbf{X}}_{n}\in {\mathbb{R}}^{{I}_{n}\times {M}_{n}}, where M_n = I_{n+1} ⋯ I_N I_1 ⋯ I_{n−1}. The columns of X_n are the I_n-dimensional vectors obtained from \mathcal{X} by varying the index i_n while keeping the other indices fixed. Here, we define the n-mode rank K_n as the rank of the n-mode unfolding matrix, i.e., K_n = rank(X_n).
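The unfolding can be sketched in a few lines of numpy (the column ordering below follows numpy's C order; unfolding conventions differ between papers, but the n-mode rank is the same for any ordering):

```python
import numpy as np

def unfold(X, n):
    """Mode-n unfolding: rows are indexed by i_n, columns by the
    remaining indices (one common ordering; conventions differ)."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

X = np.arange(24, dtype=float).reshape(2, 3, 4)   # I1=2, I2=3, I3=4
X1 = unfold(X, 0)   # shape (2, 12)
X2 = unfold(X, 1)   # shape (3, 8)
X3 = unfold(X, 2)   # shape (4, 6)

# The n-mode rank K_n is the matrix rank of the unfolding.
K2 = np.linalg.matrix_rank(X2)
```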
n-mode product
The n-mode product is defined as the product between a data tensor \mathcal{X}\in {\mathbb{R}}^{{I}_{1}\times {I}_{2}\times \dots \times {I}_{N}} and a matrix \mathbf{B}\in {\mathbb{R}}^{J\times {I}_{n}} in mode n. This n-mode product is denoted by
whose entries are given by
where \mathcal{C}\in {\mathbb{R}}^{{I}_{1}\times {I}_{2}\times \dots \times {I}_{n-1}\times J\times {I}_{n+1}\times \dots \times {I}_{N}}.
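A minimal sketch of the n-mode product, implemented by unfolding, multiplying, and refolding (assuming the same C-order unfolding convention as above):

```python
import numpy as np

def mode_n_product(X, B, n):
    """n-mode product X x_n B: multiply B (J x I_n) along mode n,
    so the mode-n dimension I_n becomes J."""
    Xn = np.moveaxis(X, n, 0).reshape(X.shape[n], -1)   # unfold mode n
    Cn = B @ Xn                                          # filter the mode
    new_shape = (B.shape[0],) + X.shape[:n] + X.shape[n + 1:]
    return np.moveaxis(Cn.reshape(new_shape), 0, n)      # refold

X = np.random.rand(2, 3, 4)
B = np.random.rand(5, 3)                 # acts on mode 2 (I2 = 3)
C = mode_n_product(X, B, 1)              # shape (2, 5, 4)
```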
1.3 Problem formulation and signal modeling
A noisy HSI is modeled as a tensor \mathcal{R}\in {\mathbb{R}}^{{I}_{1}\times {I}_{2}\times {I}_{3}} resulting from a pure HSI \mathcal{X}\in {\mathbb{R}}^{{I}_{1}\times {I}_{2}\times {I}_{3}} impaired by an additive noise \mathcal{N}\in {\mathbb{R}}^{{I}_{1}\times {I}_{2}\times {I}_{3}}. The tensor \mathcal{R} can be expressed as:
In this paper, we assume that the noise \mathcal{N} is zero-mean white Gaussian noise, independent of the signal \mathcal{X}. The aim is to estimate the desired signal \mathcal{X} from the noisy HSI \mathcal{R}.
1.4 Multiway Wiener filtering
1.4.1 Denoising model
MWF provides an estimate \widehat{\mathcal{X}} of the desired signal \mathcal{X} from the data tensor \mathcal{R} by a three-dimensional filtering, which can be expressed as follows [35]:
From the signal processing point of view, the n-mode product is an n-mode filtering of \mathcal{R}; therefore, H_n is called the n-mode filter.
To obtain the optimal n-mode filters {H_n, n=1,2,3}, the usual criterion is the mean squared error (MSE) between the estimated signal \widehat{\mathcal{X}} and the desired signal \mathcal{X}:
Then, the optimal nmode filters are the ones which can minimize the MSE given in (6).
1.4.2 Calculation of H _{ n }
To minimize the MSE given in (6) with respect to the n-mode filters {H_n, n=1,2,3}, differentiation is employed; the calculation details are presented in [24]. Setting the derivative of the MSE to zero yields the expression of the optimal n-mode filter H_n [24]:
where {\mathbf{V}}_{s}^{\left(n\right)} is a matrix containing the K _{ n } orthonormal basis vectors of the signal subspace in the column space of the nmode unfolding matrix R _{ n }, and
in which \{{\lambda}_{i}^{\gamma},\phantom{\rule{1em}{0ex}}i=1,\dots ,{K}_{n}\} and \{{\lambda}_{i}^{\Gamma},\phantom{\rule{1em}{0ex}}i=1,\dots ,{K}_{n}\} are the K_n largest eigenvalues of matrices {\gamma}_{\mathit{\text{RR}}}^{\left(n\right)} and {\Gamma}_{\mathit{\text{RR}}}^{\left(n\right)}, respectively, where
with
where p_1≠n, p_2≠n, p_1,p_2=1,2,3, and ⊗ denotes the Kronecker product. Moreover, {\sigma}_{\gamma}^{{\left(n\right)}^{2}} equals the common value of the I_n−K_n smallest eigenvalues \{{\lambda}_{i}^{\gamma},\phantom{\rule{1em}{0ex}}i={K}_{n}+1,\dots ,{I}_{n}\} of {\gamma}_{\mathit{\text{RR}}}^{\left(n\right)}. However, in practice, the I_n−K_n smallest eigenvalues are generally different. Hence, {\sigma}_{\gamma}^{{\left(n\right)}^{2}} can be estimated by their average:
1.4.3 Estimation of K _{ n }
Expression (7), used in the computation of the n-mode filter H_n, requires the unknown value K_n, i.e., the number of largest eigenvalues of the covariance matrix {\gamma}_{\mathit{\text{RR}}}^{\left(n\right)}, for n=1, 2, 3. If K_n is chosen too small, part of the signal is lost, whereas if it is chosen too large, residual noise remains after restoration. Therefore, the optimal K_n should be estimated to yield an optimal restoration. The Akaike information criterion (AIC) measures the information lost by a model; it is therefore employed in MWF to determine the optimal rank K_n [13]. For mode n, the AIC can be expressed as:
where \{{\lambda}_{i}^{\gamma},\phantom{\rule{1em}{0ex}}i=1,\dots ,{I}_{n}\} are the eigenvalues of {\gamma}_{\mathit{\text{RR}}}^{\left(n\right)}, M_n is the number of columns of the n-mode unfolding matrix R_n, and k_n ranges over {1,…,I_n−1}. The estimated n-mode rank K_n is the value of k_n that minimizes the AIC.
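The AIC rank selection can be sketched as follows. The exact expression used in [13] is not reproduced above, so the Wax-Kailath form of the criterion is assumed here, with the rank chosen as the minimizer over k_n:

```python
import numpy as np

def aic_rank(eigvals, M):
    """Wax-Kailath AIC rank estimate from the eigenvalues of the
    mode-n covariance matrix; M is the number of columns of the
    n-mode unfolding. A sketch of the criterion assumed from [13]."""
    lam = np.sort(eigvals)[::-1]            # descending order
    I = lam.size
    aic = np.empty(I - 1)
    for k in range(1, I):
        tail = lam[k:]                      # the I - k smallest eigenvalues
        geo = np.exp(np.mean(np.log(tail))) # geometric mean
        ari = np.mean(tail)                 # arithmetic mean
        aic[k - 1] = -2 * M * (I - k) * np.log(geo / ari) + 2 * k * (2 * I - k)
    return int(np.argmin(aic)) + 1          # k_n minimizing the AIC

# Example: 3 strong components above a flat noise floor.
lam = np.concatenate([[100.0, 50.0, 20.0], np.full(17, 1.0)])
K_n = aic_rank(lam, M=500)   # should recover K_n = 3
```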
1.4.4 ALS algorithm
To jointly find the n-mode filters {H_n, n=1,2,3} that minimize (6), an alternating least squares (ALS) algorithm [13] is necessary, because the filter along a given mode depends on the filters along all the other modes. The steps of this algorithm can be summarized as follows.

1.
Input: Data tensor \mathcal{R}.

2.
Initialization k=0:
{\mathcal{X}}^{0}=\mathcal{R}\iff {\mathbf{H}}_{n}={\mathbf{I}}_{{I}_{n}},\forall n=1,2,3, where {\mathbf{I}}_{{I}_{n}} is the I_n×I_n identity matrix.

3.
ALS loop: Repeat until convergence, that is, for example, while \parallel {\mathcal{X}}^{k+1}-{\mathcal{X}}^{k}\parallel >\epsilon

(a)
Estimation of K _{ n }, n=1,2,3,
{K}_{n}={argmin}_{{k}_{n}}\left[\text{AIC}\left({k}_{n}\right)\right],\phantom{\rule{1em}{0ex}}{k}_{n}=1,\dots ,{I}_{n}-1. 
(b)
Estimation of {\mathbf{H}}_{n}^{k+1} for n=1,2,3.

(i)
{\mathcal{X}}_{n}^{k}=\mathcal{R}{\times}_{p}{\mathbf{H}}_{p}^{k+1}{\times}_{q}{\mathbf{H}}_{q}^{k}
.
p,q=1,2,3, p,q≠n and p<q

(ii)
{\mathbf{H}}_{n}^{k+1}=\underset{{\mathbf{Z}}_{n}}{argmin}{\parallel \mathcal{X}-{\mathcal{X}}_{n}^{k}{\times}_{n}{\mathbf{Z}}_{n}\parallel}^{2}
subject to
{\mathbf{\text{Z}}}_{n}\in {\mathbb{R}}^{{I}_{n}\times {I}_{n}}.


(c)
Multidimensional Wiener filtering: {\mathcal{X}}^{k+1}=\mathcal{R}{\times}_{1}{\mathbf{H}}_{1}^{k+1}{\times}_{2}{\mathbf{H}}_{2}^{k+1}{\times}_{3}{\mathbf{H}}_{3}^{k+1}.

(d)
k\leftarrow k+1.


4.
Output: Estimated signal tensor \widehat{\mathcal{X}}=\mathcal{R}{\times}_{1}{\mathbf{H}}_{1}^{{k}_{c}}{\times}_{2}{\mathbf{H}}_{2}^{{k}_{c}}{\times}_{3}{\mathbf{H}}_{3}^{{k}_{c}}, where k _{ c } is the convergence iteration index.
As the calculation of the n-mode filter H_n in step 3(b) utilizes the filters in the other modes {H_i, 1≤i≤3, i≠n}, the MWF takes into account the relationships between elements in all modes of the data set.
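The ALS loop above can be sketched in numpy as follows. This is a sketch of the algorithm's structure, not the authors' implementation: the per-mode filter uses a simplified eigenvalue-shrinkage stand-in for the exact expression (7) (which involves the cross-covariance matrices of [24]), and the ranks K_n are given instead of AIC-estimated.

```python
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_n_product(X, B, n):
    Cn = B @ unfold(X, n)
    shape = (B.shape[0],) + X.shape[:n] + X.shape[n + 1:]
    return np.moveaxis(Cn.reshape(shape), 0, n)

def wiener_mode_filter(R, n, K):
    """Simplified n-mode filter: eigen-shrinkage of the mode-n
    covariance; sigma^2 is the mean of the I_n - K_n smallest
    eigenvalues, in the spirit of (12). A stand-in for (7)."""
    Rn = unfold(R, n)
    lam, V = np.linalg.eigh(Rn @ Rn.T / Rn.shape[1])
    lam, V = lam[::-1], V[:, ::-1]               # descending order
    sigma2 = lam[K:].mean() if K < lam.size else 0.0
    w = np.maximum(lam[:K] - sigma2, 0.0) / lam[:K]
    return V[:, :K] @ np.diag(w) @ V[:, :K].T    # H_n, I_n x I_n

def mwf(R, ranks, n_iter=10):
    """ALS structure of the multiway Wiener filter (ranks given)."""
    H = [np.eye(s) for s in R.shape]             # step 2: H_n = identity
    for _ in range(n_iter):                      # step 3: ALS loop
        for n in range(3):
            Xn = R                               # filter all modes but n
            for p in range(3):
                if p != n:
                    Xn = mode_n_product(Xn, H[p], p)
            H[n] = wiener_mode_filter(Xn, n, ranks[n])
    Xhat = R                                     # step 4: output
    for n in range(3):
        Xhat = mode_n_product(Xhat, H[n], n)
    return Xhat
```

On a synthetic low-rank tensor plus weak noise, this sketch reduces the estimation error relative to the noisy input.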
1.5 PARAFAC filtering
1.5.1 Denoising model
Since the decomposition under the TUCKER3 model is not unique and requires estimating the rank K_n in each mode, another tensor decomposition model, PARAFAC, was recently introduced for noise reduction in [14]. Unlike the TUCKER3 model, the PARAFAC model decomposes a tensor uniquely into a sum of rank-one tensors:
where \widehat{\mathcal{N}} is the decomposition error. Under the assumption that the signal \mathcal{X} can be expressed by a finite-rank PARAFAC factorization, the estimate \widehat{\mathcal{X}} of the desired signal can be expressed by the PARAFAC model:
where \mathcal{I} is the identity tensor (ones on the superdiagonal), and {\mathbf{A}}_{n}=[\phantom{\rule{0.3em}{0ex}}{\mathbf{a}}_{n}^{1},\dots ,{\mathbf{a}}_{n}^{K}], n=1,2,3. In order to obtain the optimal A_n, the error between \widehat{\mathcal{X}} and \mathcal{R} should be minimized:
Nonetheless, it is worth noting that the criterion of PARAFAC is the squared error between the estimate \widehat{\mathcal{X}} and the noisy HSI \mathcal{R}, while that of MWF is the mean squared error between the estimate \widehat{\mathcal{X}} and the desired signal \mathcal{X} (see (6)). For a given rank K, minimizing (17) means removing as little signal as possible in the denoising process.
1.5.2 Calculation of A _{ n }
To obtain A _{ n } in each mode, the error given in (17) should be minimized:
where {\widehat{\mathbf{X}}}_{n} is the n-mode unfolding matrix of \widehat{\mathcal{X}} in (16):
where p,q=1,2,3 and p≠q≠n. By substituting (19) into (18), we can obtain
Obviously, the estimation of A_n requires A_p and A_q, which are unknown. Therefore, an ALS algorithm is employed to calculate the optimal A_n.
1.5.3 PARAFAC ALS algorithm
To jointly estimate the A_n, a ‘PARAFAC ALS’ algorithm is introduced; its steps are as follows:

1.
Input:
Data tensor \mathcal{R}.

2.
Initialization:
Set k=0 and e _{ k }=0. Randomly initialize {\mathbf{A}}_{n}^{0}\in {\mathbb{R}}^{{I}_{n}\times K}, n=1,2,3.

3.
Loop:

(a)
Estimation of {\mathbf{A}}_{n}^{k+1} for n=1,2,3:

(i)
{\mathbf{U}}_{1}^{k+1}={\mathbf{A}}_{3}^{k}\odot {\mathbf{A}}_{2}^{k},\phantom{\rule{1em}{0ex}}{\mathbf{A}}_{1}^{k+1}={\mathbf{R}}_{1}{\mathbf{U}}_{1}^{k+1}{\left({{\mathbf{U}}_{1}^{k+1}}^{T}{\mathbf{U}}_{1}^{k+1}\right)}^{\dagger}

(ii)
{\mathbf{U}}_{2}^{k+1}={\mathbf{A}}_{3}^{k}\odot {\mathbf{A}}_{1}^{k},\phantom{\rule{1em}{0ex}}{\mathbf{A}}_{2}^{k+1}={\mathbf{R}}_{2}{\mathbf{U}}_{2}^{k+1}{\left({{\mathbf{U}}_{2}^{k+1}}^{T}{\mathbf{U}}_{2}^{k+1}\right)}^{\dagger}

(iii)
{\mathbf{U}}_{3}^{k+1}={\mathbf{A}}_{2}^{k}\odot {\mathbf{A}}_{1}^{k},\phantom{\rule{1em}{0ex}}{\mathbf{A}}_{3}^{k+1}={\mathbf{R}}_{3}{\mathbf{U}}_{3}^{k+1}{\left({{\mathbf{U}}_{3}^{k+1}}^{T}{\mathbf{U}}_{3}^{k+1}\right)}^{\dagger}

where \odot denotes the Khatri-Rao (column-wise Kronecker) product, † the pseudo-inverse, and {\mathbf{R}}_{n} the n-mode unfolding matrix of \mathcal{R}.

(b)
Compute {\widehat{\mathbf{X}}}_{3}^{k+1}={\mathbf{A}}_{3}^{k+1}{{\mathbf{U}}_{3}^{k+1}}^{T}.

(c)
{e}_{k+1}={\parallel {\mathbf{R}}_{3}-{\widehat{\mathbf{X}}}_{3}^{k+1}\parallel}^{2}. If |{e}_{k+1}-{e}_{k}|>\epsilon and k is less than the maximum number of iterations, set k\leftarrow k+1 and go back to step 3(a). Otherwise, break the loop.
4.
Output:
Return {\mathbf{A}}_{n}^{}={\mathbf{A}}_{n}^{k+1}, n=1,2,3.
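The loop can be sketched as follows. Note this is a sketch, not the authors' code: the Khatri-Rao ordering here matches numpy's C-order unfoldings (so it differs superficially from the (A_3 ⊙ A_2) ordering written above), and the least-squares update uses a pseudo-inverse:

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker (Khatri-Rao) product."""
    K = A.shape[1]
    return np.einsum('ik,jk->ijk', A, B).reshape(-1, K)

def parafac_als(R, K, n_iter=50, tol=1e-8):
    """PARAFAC-ALS sketch: random init, given rank K, convergence
    test on the change of the squared fitting error."""
    rng = np.random.default_rng(0)
    A = [rng.standard_normal((s, K)) for s in R.shape]
    e_prev = np.inf
    for _ in range(n_iter):
        for n in range(3):
            p, q = [m for m in range(3) if m != n]
            U = khatri_rao(A[p], A[q])               # (I_p I_q) x K
            Rn = np.moveaxis(R, n, 0).reshape(R.shape[n], -1)
            A[n] = Rn @ U @ np.linalg.pinv(U.T @ U)  # LS update of A_n
        Xhat = np.einsum('ir,jr,kr->ijk', A[0], A[1], A[2])
        e = np.linalg.norm(R - Xhat) ** 2            # squared error (17)
        if abs(e_prev - e) < tol:
            break
        e_prev = e
    return A, Xhat
```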
1.5.4 Rank estimation
As described in Section 1.5.1, PARAFAC filtering is an algorithm which minimizes e(A _{1},A _{2},A _{3}) in (17) under a given rank K. In other words, it is assumed that the rank K is known in PARAFAC filtering. Unfortunately, the rank K is generally unknown in practice; therefore, an algorithm used to estimate K is presented in this section. The details are as follows:

1.
Input: Data tensor \mathcal{R}.

2.
Initialization:
Set i=1. Set the rank-searching set KSCOPE.

3.
Loop:

(a)
Set K=KSCOPE[i].

(b)
Do PARAFAC decomposition: \mathcal{R}=\sum _{k=1}^{K}{\mathbf{a}}_{1}^{k}\circ {\mathbf{a}}_{2}^{k}\circ {\mathbf{a}}_{3}^{k}+\widehat{\mathcal{N}}.

(c)
For n=1,2,3, calculate the covariance matrix C_n of {\widehat{\mathbf{N}}}_{n}, the n-mode unfolding matrix of \widehat{\mathcal{N}}.

(d)
If the two conditions

(i)
{s}_{\text{diag}}^{2}=\frac{1}{{I}_{n}}\sum _{i=1}^{{I}_{n}}{\left({c}_{i,i}-\frac{1}{{I}_{n}}\sum _{i=1}^{{I}_{n}}{c}_{i,i}\right)}^{2}<{d}_{1}, where the c_{i,i} are the diagonal elements of C_n;

(ii)
{\parallel {\mathbf{C}}_{n}\parallel}^{2}-\sum _{i=1}^{{I}_{n}}{c}_{i,i}^{2}<{d}_{2}

are satisfied for all n=1,2,3 at the same time, break the loop. Otherwise, i\leftarrow i+1.

4.
Output:
Return the rank K.
1.6 MWPT-MWF
1.6.1 Denoising model
MWF and PARAFAC treat the HSI \mathcal{R} as a whole entity in the denoising process. This works well when there are only abundant signals or when the rare signals can be neglected. However, in situations where the rare signals cannot be neglected, such as target detection, MWF and PARAFAC may remove rare signals in the denoising process.
MWPT-MWF has been proposed to preserve rare signals in the denoising process and hence improve the denoising performance. In MWPT-MWF, the desired signal \mathcal{X} is estimated by minimizing the MSE between \mathcal{X} and its estimate \widehat{\mathcal{X}}:
Nevertheless, unlike MWF or PARAFAC, MWPT-MWF reduces noise by filtering each wavelet packet coefficient set. The details of MWPT-MWF are described in the following subsections.
1.6.2 Multidimensional wavelet packet transform
The multidimensional wavelet packet transform (MWPT) can be written in tensor form as:
and the reconstruction can be written as:
where {\mathbf{W}}_{n}\in {\mathbb{R}}^{{I}_{n}\times {I}_{n}}, n=1,2,3, denote the wavelet packet transform matrices. When the transform level vector is l=[ l_1,l_2,l_3]^T, where l_n≥0 denotes the wavelet packet transform level in mode n, the coefficient tensor {\mathcal{C}}_{\text{l,m}}^{\mathcal{R}}, also called a component in this paper, of scale m=[m_1,m_2,m_3], where 0\le {m}_{n}\le {2}^{{l}_{n}}-1, can be extracted by
and the corresponding inverse process is
where the extraction operator {\mathbf{E}}_{{m}_{n}} is defined as
where 0_1 is a zero matrix of size \frac{{I}_{n}}{{2}^{{l}_{n}}}\times \frac{{m}_{n}{I}_{n}}{{2}^{{l}_{n}}} and 0_2 is a zero matrix of size \frac{{I}_{n}}{{2}^{{l}_{n}}}\times \frac{({2}^{{l}_{n}}-1-{m}_{n}){I}_{n}}{{2}^{{l}_{n}}}.
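The extraction operator is simply a shifted identity block; a minimal sketch:

```python
import numpy as np

def extraction_matrix(I_n, l_n, m_n):
    """E_{m_n} = [0_1, Identity, 0_2]: selects the m_n-th block of
    length I_n / 2^{l_n} from a mode of length I_n (0 <= m_n < 2^{l_n})."""
    block = I_n // 2 ** l_n
    E = np.zeros((block, I_n))
    E[:, m_n * block:(m_n + 1) * block] = np.eye(block)
    return E

# Selecting subband m_n = 2 out of 4 (l_n = 2) from a mode of length 8:
E = extraction_matrix(8, 2, 2)        # shape (2, 8)
v = np.arange(8.0)
assert np.allclose(E @ v, [4.0, 5.0])
```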
1.6.3 Multiway Wiener filter in multidimensional wavelet packet domain
After the MWPT, abundant and rare signals are separated into different components; therefore, the signal subspace of each component can be estimated more accurately than that of the entire data set, and a better estimation of the signal subspace improves the performance of MWF in each component. However, the denoising criterion of MWPT-MWF is the minimization of the MSE between \mathcal{X} and \widehat{\mathcal{X}}, not between each component and its estimate; therefore, this subsection shows that MWPT-MWF does minimize the MSE between \mathcal{X} and \widehat{\mathcal{X}} defined in (21). By applying the MWPT to the tensors \mathcal{R}, \mathcal{X}, and \mathcal{N} in expression (4), we obtain
The coefficient tensor of each part:
and the coefficient tensor of the estimate \widehat{\mathcal{X}}:
Extracting the components of each frequency {\mathcal{C}}_{\mathbf{l},\mathbf{m}}^{\mathcal{R}}, {\mathcal{C}}_{\mathbf{l},\mathbf{m}}^{\mathcal{X}} and {\mathcal{C}}_{\mathbf{l},\mathbf{m}}^{\mathcal{N}} from {\mathcal{C}}_{\mathbf{l}}^{\mathcal{R}}, {\mathcal{C}}_{\mathbf{l}}^{\mathcal{X}} and {\mathcal{C}}_{\mathbf{l}}^{\mathcal{N}} respectively by using (24), we obtain
From Parseval’s theorem, the following expression can be obtained:
which means that minimizing the MSE between \mathcal{X} and its estimate \widehat{\mathcal{X}} is equivalent to minimizing the MSE between {\mathcal{C}}_{\mathbf{l},\mathbf{m}}^{\mathcal{X}} and {\widehat{\mathcal{C}}}_{\mathbf{l},\mathbf{m}}^{\mathcal{X}} for each m. If {\widehat{\mathcal{C}}}_{\mathbf{l},\mathbf{m}}^{\mathcal{X}} is estimated by the TUCKER3 decomposition of {\mathcal{C}}_{\mathbf{l},\mathbf{m}}^{\mathcal{R}}:
then H_{1,m}, H_{2,m}, H_{3,m} are the n-mode filters of the multiway Wiener filter described in Section 1.4. After estimating {\widehat{\mathcal{C}}}_{\mathbf{l},\mathbf{m}}^{\mathcal{X}} for each m, we obtain {\widehat{\mathcal{C}}}_{\mathbf{l}}^{\mathcal{X}} by concatenating the {\widehat{\mathcal{C}}}_{\mathbf{l},\mathbf{m}}^{\mathcal{X}}. The estimate \widehat{\mathcal{X}} is then obtained by the inverse MWPT:
1.6.4 Best transform level and basis selection
In MWPT-MWF, several parameters must be determined, as follows.

1.
Level of transform: the performance of the algorithm is affected by the transform level, which depends on the size of the tensor \mathcal{R}. The maximum level can be calculated by
{N}_{{L}_{k}}=\lceil {\text{log}}_{2}{I}_{k}\rceil -5,\phantom{\rule{1em}{0ex}}k=1,2,3,\phantom{\rule{1em}{0ex}}(36)
where ⌈·⌉ rounds a number up to the nearest integer, and the constant 5 is subtracted from \lceil {\text{log}}_{2}{I}_{k}\rceil to make sure there are enough elements in each mode so that the transform is meaningful.
Then, the set of possible transform levels can be expressed as:
{L}_{k}=\{0,1,\cdots ,{N}_{{L}_{k}}\},\phantom{\rule{1em}{0ex}}k=1,2,3,\phantom{\rule{1em}{0ex}}(37)
where {·} denotes a set.

2.
Basis of transform: there are many wavelet bases designed for different cases. For the simplicity of expression, we define
W=\{{\mathrm{w}}_{1},\phantom{\rule{1em}{0ex}}{\mathrm{w}}_{2},\cdots \phantom{\rule{0.3em}{0ex}},{\mathrm{w}}_{{N}_{W}}\}\phantom{\rule{1em}{0ex}}(38)
to denote the set of possible wavelet bases, where N_W is the number of wavelets in this set.
The best transform level and basis should minimize the MSE, or risk, {R}_{c}(\mathcal{X},\widehat{\mathcal{X}})=\mathrm{E}\left[\parallel \mathcal{X}-\widehat{\mathcal{X}}{\parallel}^{2}\right] [36], whose equivalent form for each component can be expressed as:
Then, the best transform level and basis can be selected by
However, selecting the optimal l and w in this way depends on \mathcal{X}, which is generally unknown, so an alternative criterion is needed. Denote by {\widehat{\mathcal{C}}}_{\mathbf{l},\mathbf{m}}^{\mathcal{X}}\left[\phantom{\rule{0.3em}{0ex}}d\right] the estimate of {\mathcal{C}}_{\mathbf{l},\mathbf{m}}^{\mathcal{X}} at the dth ALS iteration of Section 1.4.4. When \parallel {\widehat{\mathcal{C}}}_{\mathbf{l},\mathbf{m}}^{\mathcal{X}}\left[\phantom{\rule{0.3em}{0ex}}d\right]-{\widehat{\mathcal{C}}}_{\mathbf{l},\mathbf{m}}^{\mathcal{X}}[\phantom{\rule{0.3em}{0ex}}d-1]{\parallel}^{2} is minimized, {\widehat{\mathcal{C}}}_{\mathbf{l},\mathbf{m}}^{\mathcal{X}}\triangleq {\widehat{\mathcal{C}}}_{\mathbf{l},\mathbf{m}}^{\mathcal{X}}\left[\phantom{\rule{0.3em}{0ex}}d\right] is the optimal estimate of {\mathcal{C}}_{\mathbf{l},\mathbf{m}}^{\mathcal{X}} obtained by MWF, and at the same time \mathrm{E}\left[\parallel {\mathcal{C}}_{\mathbf{l},\mathbf{m}}^{\mathcal{X}}-{\widehat{\mathcal{C}}}_{\mathbf{l},\mathbf{m}}^{\mathcal{X}}{\parallel}^{2}\right] is minimized according to Section 1.4.2. Therefore, (40) can be replaced by
where
1.6.5 Summary of the MWPT-MWF
The proposed algorithm, denoted MWPT-MWF, can be summarized as follows.

1.
Input:
Data tensor .

2.
Initialization:
Set L=\{1,\dots ,{N}_{{L}_{k}}\}, W=\{{\mathrm{w}}_{1},\dots ,{\mathrm{w}}_{{N}_{W}}\}, and the risk threshold ε.

3.
Loop:
Loop over l_1,l_2,l_3∈L and w∈W:

(a)
Decompose the whitened data by MWPT: {\mathcal{C}}_{\mathbf{l}}^{\mathcal{R}}=\mathcal{R}{\times}_{1}{\mathbf{W}}_{1}{\times}_{2}{\mathbf{W}}_{2}{\times}_{3}{\mathbf{W}}_{3}.

(b)
Extract component {\mathcal{C}}_{\mathbf{l},\mathbf{m}}^{\mathcal{R}} from {\mathcal{C}}_{\mathbf{l}}^{\mathcal{R}} by (24), for m=[m _{1},m _{2},m _{3}]^{T}, where 0\le {m}_{k}\le {2}^{{l}_{k}}1, k=1,2,3.

(c)
Filter component {\mathcal{C}}_{\mathbf{l},\mathbf{m}}^{\mathcal{R}} by MWF: {\widehat{\mathcal{C}}}_{\mathbf{l},\mathbf{m}}^{\mathcal{X}}={\mathcal{C}}_{\mathbf{l},\mathbf{m}}^{\mathcal{R}}{\times}_{1}{\mathbf{H}}_{1,\mathbf{m}}{\times}_{2}{\mathbf{H}}_{2,\mathbf{m}}{\times}_{3}{\mathbf{H}}_{3,\mathbf{m}}.

(d)
Calculate the risk \widehat{{R}_{c}}=\sum _{\mathbf{m}}\parallel {\widehat{\mathcal{C}}}_{\mathbf{l},\mathbf{m}}^{\mathcal{X}}\left[d\right]-{\widehat{\mathcal{C}}}_{\mathbf{l},\mathbf{m}}^{\mathcal{X}}[\phantom{\rule{0.3em}{0ex}}d-1]{\parallel}^{2}. If \widehat{{R}_{c}} falls below the fixed threshold ε, return the optimal l_1,l_2,l_3,w and the {\widehat{\mathcal{C}}}_{\mathbf{l},\mathbf{m}}^{\mathcal{X}}.

(a)

4.
Output: Concatenate the {\widehat{\mathcal{C}}}_{\mathbf{l},\mathbf{m}}^{\mathcal{X}} to obtain {\widehat{\mathcal{C}}}_{\mathbf{l}}^{\mathcal{X}} and perform the inverse MWPT: \widehat{\mathcal{X}}={\widehat{\mathcal{C}}}_{\mathbf{l}}^{\mathcal{X}}{\times}_{1}{\mathbf{W}}_{1}^{T}{\times}_{2}{\mathbf{W}}_{2}^{T}{\times}_{3}{\mathbf{W}}_{3}^{T}.
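The overall transform-filter-reconstruct structure can be sketched as follows. A single-level orthonormal Haar matrix stands in for the db3 packet transform used in the experiments, and the per-component MWF step is replaced by identity filters, so the sketch only demonstrates the energy-preservation (Parseval) and perfect-reconstruction properties the method relies on:

```python
import numpy as np

def haar_matrix(I):
    """Orthonormal single-level Haar analysis matrix (I x I, I even):
    first I/2 rows average adjacent pairs, last I/2 rows difference them.
    A stand-in for the db3 packet transform of the experiments."""
    h = I // 2
    W = np.zeros((I, I))
    for i in range(h):
        W[i, 2 * i:2 * i + 2] = [1, 1]
        W[h + i, 2 * i:2 * i + 2] = [1, -1]
    return W / np.sqrt(2)

def mode_n_product(X, B, n):
    Cn = B @ np.moveaxis(X, n, 0).reshape(X.shape[n], -1)
    shape = (B.shape[0],) + X.shape[:n] + X.shape[n + 1:]
    return np.moveaxis(Cn.reshape(shape), 0, n)

R = np.random.default_rng(0).standard_normal((8, 8, 4))
W = [haar_matrix(s) for s in R.shape]

# Forward MWPT (levels l = [1,1,1]): C = R x1 W1 x2 W2 x3 W3.
C = R
for n in range(3):
    C = mode_n_product(C, W[n], n)

# Parseval: orthonormal W_n preserve energy, so minimizing the MSE
# per component is equivalent to minimizing it on the full tensor.
assert np.isclose(np.linalg.norm(C), np.linalg.norm(R))

# Each 2x2x2 block of components C_{l,m} would be filtered by MWF here
# (identity filters in this sketch), then the inverse transform:
Xhat = C
for n in range(3):
    Xhat = mode_n_product(Xhat, W[n].T, n)
assert np.allclose(Xhat, R)
```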
1.7 Experimental results
In this section, we use a real-world high-spatial-resolution image acquired by the HYperspectral Digital Imagery Collection Experiment (HYDICE). The HYDICE image contains 65 rows, 100 columns, and 160 spectral bands, and is modeled as a 65×100×160 tensor in this paper. Six targets of interest are selected in the image, as shown in the ground-truth map in Figure 1; the corresponding mask is shown in Figure 2. The spectral signatures of these six targets are presented in Figure 3. These six targets are chosen because they have different spectral signatures and sizes, so that the denoising and target detection performance can be evaluated for different target sizes.
White Gaussian noise is added to the HSI with a signal-to-noise ratio (SNR) ranging from 15 to 30 dB (with a step of 5 dB) to reproduce different simulation scenarios. MWF, PARAFAC, and MWPT-MWF are used to reduce noise in the HSI. The rank-searching set of PARAFAC is set to [51,101,151,201], and the db3 wavelet is selected for the MWPT with transform levels [ l_1,l_2,l_3]=[ 1,1,0].
1.7.1 Denoising performance evaluation and comparison
To present the denoising results intuitively, Figure 4 shows the target spectral signatures of the noisy HSI and of the HSI denoised by MWF, PARAFAC, and MWPT-MWF, respectively. Comparing the four subfigures in Figure 4, it is evident that denoising is a necessary procedure to restore the target spectral signatures. Moreover, more residual noise remains in Figure 4b than in Figure 4c,d. In particular, the spectral signatures of targets 1, 3, and 5 are almost mixed together in Figure 4b. Figure 4c,d are much better, with little residual noise after denoising. However, the spectral signatures are changed more by PARAFAC than by MWPT-MWF, as is obvious from the signatures of targets 5 and 6. In Figure 4c, the signatures of targets 5 and 1 almost overlap, while in Figure 4d these two signatures can be distinguished easily.
To compare the performances of MWF, PARAFAC, and MWPT-MWF, the SNR of the image after denoising, called the output SNR (SNR_{OUTPUT}), is defined as below [12]:
If SNR_{OUTPUT} is greater than SNR_{INPUT}, we can conclude that the algorithm improves the SNR of the image.
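The exact SNR_{OUTPUT} expression from [12] is not reproduced above; a common form, assumed here, is 10 log10 of the signal energy over the residual energy:

```python
import numpy as np

def snr_db(X, Xhat):
    """Assumed output-SNR definition: 10 log10(||X||^2 / ||X - Xhat||^2)."""
    return 10.0 * np.log10(np.sum(X ** 2) / np.sum((X - Xhat) ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((65, 100, 160))        # same size as the HYDICE cube
noisy = X + 0.1 * rng.standard_normal(X.shape)
out = snr_db(X, noisy)                         # close to 20 dB for sigma = 0.1
```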
The SNR_{OUTPUT} values obtained by the different denoising methods as SNR_{INPUT} varies from 15 to 30 dB are shown in Table 1. It is obvious that MWPT-MWF significantly outperforms the other two denoising methods. When SNR_{INPUT} is low, from 15 to 25 dB in Table 1, the denoising result of PARAFAC is better than that of MWF; when SNR_{INPUT} is high, from 25 to 30 dB, the performance of MWF is slightly better than that of PARAFAC. Moreover, it is worth noting that all three methods improve the SNR significantly. When SNR_{INPUT} is 15 dB, the SNR_{OUTPUT} after denoising is at most 30 dB (MWPT-MWF) and at least 24 dB (MWF). The denoising results shown in Table 1 give experimental evidence of the benefits derived from the denoising procedure.
1.7.2 Target detection performance evaluation and comparison
In the last subsection, we compared the denoising performances of the different methods in terms of SNR_{OUTPUT}. However, SNR_{OUTPUT} sometimes fails to reflect the denoising behavior we care about, especially the preservation of small targets in the HSI while removing noise. Hence, in this subsection, we compare the target detection performance after denoising by MWF, PARAFAC, and MWPT-MWF.
The Spectral Angle Mapper (SAM) detector [37] is used in the experiment to detect targets in the HSI. As SAM does not require a characterization of the background, it avoids the inaccuracy in the comparison that would be caused by noise covariance matrix estimation errors. The SAM detector can be expressed as
where s is the reference spectrum and x is the pixel spectrum.
To assess the performances of detection, the probability of detection (Pd) is defined as
and the probability of false alarm (Pfa) is defined as
where n _{ s } is the number of spectral signatures, N _{ i } the number of pixels with spectral signature i, {N}_{i}^{\mathit{\text{rd}}} the number of correctly detected pixels, and {N}_{i}^{\mathit{\text{fd}}} the number of falsealarm pixels.
Figures 5, 6, and 7 show the target detection results after denoising by MWF, PARAFAC, and MWPT-MWF, respectively, in the noise environment SNR_{INPUT}=15 dB. In the images, black pixels indicate no target, green correct detection, red false alarm, and blue a missed target. From Figure 5, which shows the detection result after denoising by MWF, we can see that all three small targets (targets 1, 3, and 5) are missed in the detection; moreover, most of the pixels of target 5 are missed. The detection result after denoising by PARAFAC, in Figure 6, is slightly better than that by MWF, but all of the small targets are still lost in the detection. MWPT-MWF shows its capability of preserving small targets in Figure 7, in which two of the three small targets are detected correctly. Apart from preserving small targets, MWPT-MWF also improves the detection performance for the large-size, small-energy target 6, which is obvious when comparing Figures 5, 6, and 7.
To evaluate the detection performance in different noise environments, Table 2 lists the Pd values of the different denoising methods for SNR_{INPUT} ranging from 15 to 30 dB.
It is obvious that the detection result after denoising by MWPT-MWF outperforms those of the other two methods. Comparing Table 2 with Table 1 also shows that the denoising process improves the target detection performance.
2 Conclusion
In this paper, a survey has been presented on three recently proposed tensor filtering methods: MWF, PARAFAC, and MWPT-MWF. They use multilinear algebra to analyze a multidimensional data cube and jointly filter it in each mode.
The MWF extends the classical Wiener filter to the multidimensional case by using the TUCKER3 decomposition while minimizing the mean squared error (MSE) between the desired signal tensor and the estimated signal tensor. As the filter in one mode depends on the filters in the other modes, the alternating least squares (ALS) algorithm is used to compute the MWF filters jointly. In the filtering process, the signal subspace rank in mode n must be known in order to remove the noise lying in the orthogonal complement of the signal subspace. For this purpose, the AIC criterion is used to estimate the rank in mode n, which means that the MWF can reduce noise automatically.
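The full MWF requires the noise statistics and the ALS loop over the modes described above. As a simplified stand-in that conveys the same mode-n subspace idea, the sketch below projects each mode-n unfolding onto its leading left singular vectors, i.e., a truncated-HOSVD filter; this is our illustration, not the authors' MWF implementation.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding: the mode-n fibers of X become matrix columns."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def truncated_hosvd_filter(X, ranks):
    """Project each mode of X onto the span of its leading singular vectors.

    ranks[n] is the assumed signal-subspace dimension in mode n (the
    quantity MWF estimates with AIC); everything in the orthogonal
    complement, which is mostly noise, is discarded."""
    Y = X.copy()
    for mode, k in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        P = U[:, :k] @ U[:, :k].T  # projector onto the mode-n signal subspace
        Y = np.moveaxis(np.tensordot(P, np.moveaxis(Y, mode, 0), axes=1), 0, mode)
    return Y
```

A tensor that already lies in the chosen subspaces passes through unchanged, while noise energy outside them is removed.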
The PARAFAC filtering method was proposed to reduce the number of rank values to be estimated: the rank in every mode must be estimated for MWF, whereas only a single rank must be estimated for PARAFAC filtering. Moreover, the low-rank PARAFAC decomposition is essentially unique (up to scaling and permutation) under mild conditions, whereas the TUCKER3 decomposition is not. However, there is no efficient way to estimate the PARAFAC rank automatically; although a rank estimation method has been shown in this paper, it amounts to a time-consuming brute-force search.
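A minimal ALS implementation of rank-K PARAFAC filtering in NumPy might look as follows; this is a textbook CP-ALS sketch under our own naming, not the code evaluated in the survey. The rank K would, as noted above, have to be chosen by a brute-force search over candidate values.

```python
import numpy as np

def kr(U, V):
    """Column-wise Khatri-Rao product of two factor matrices."""
    K = U.shape[1]
    return (U[:, None, :] * V[None, :, :]).reshape(-1, K)

def unfold(X, mode):
    """Mode-n unfolding consistent with C-order reshaping."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def parafac_filter(X, K, n_iter=100, seed=0):
    """Rank-K CP (PARAFAC) approximation of a 3-way tensor by ALS.

    The returned tensor is the low-rank reconstruction, i.e. the denoised
    estimate: noise outside the rank-K model is discarded."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((n, K)) for n in X.shape)
    for _ in range(n_iter):
        # Each factor is the least-squares solution with the others fixed.
        A = unfold(X, 0) @ kr(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = unfold(X, 1) @ kr(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = unfold(X, 2) @ kr(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return (A @ kr(B, C).T).reshape(X.shape)
```

In practice one would monitor the reconstruction error across sweeps and stop on convergence rather than run a fixed number of iterations.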
MWF and PARAFAC were proposed to process the HSI as a whole entity, but this may remove the small targets of an HSI during denoising. Unlike MWF and PARAFAC, MWPT-MWF first transforms the HSI into different wavelet packet coefficient sets, also called components in this paper, and then filters each component as a whole entity. As the small targets are separated from the large ones, the former can be well preserved in the denoising process.
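To illustrate the split-filter-merge structure of MWPT-MWF without reproducing the full multidimensional wavelet packet transform, the sketch below uses a one-level 2D spatial Haar split into four component tensors; each component can then be filtered independently (by MWF in the actual method) before the inverse transform. This is a simplified stand-in of our own, not the authors' MWPT.

```python
import numpy as np

def haar_split(X):
    """One-level 2D spatial Haar split of a (rows, cols, bands) cube into
    four component tensors (approximation LL and details LH, HL, HH).
    Spatial dimensions are assumed even."""
    a = X[0::2, 0::2]; b = X[0::2, 1::2]; c = X[1::2, 0::2]; d = X[1::2, 1::2]
    return ((a + b + c + d) / 2, (a - b + c - d) / 2,
            (a + b - c - d) / 2, (a - b - c + d) / 2)

def haar_merge(LL, LH, HL, HH):
    """Inverse of haar_split (perfect reconstruction)."""
    a = (LL + LH + HL + HH) / 2; b = (LL - LH + HL - HH) / 2
    c = (LL + LH - HL - HH) / 2; d = (LL - LH - HL + HH) / 2
    X = np.empty((2 * LL.shape[0], 2 * LL.shape[1]) + LL.shape[2:])
    X[0::2, 0::2] = a; X[0::2, 1::2] = b; X[1::2, 0::2] = c; X[1::2, 1::2] = d
    return X

def filter_components(X, filt):
    """Split, filter each component tensor independently (filt would be
    MWF in MWPT-MWF), and merge back."""
    return haar_merge(*(filt(C) for C in haar_split(X)))
```

Because small targets concentrate in the detail components while large structures dominate the approximation, filtering each component separately avoids averaging the small targets away.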
A real-world HYDICE HSI is used in the comparative study, with both quantitative and visual evaluation of the three methods. From the experimental results, we conclude that MWPT-MWF is a suitable denoising tool, especially when small targets are present in the HSI.
References
Kotwal K, Chaudhuri S: Visualization of hyperspectral images using bilateral filtering. IEEE Trans. Geosci. Remote Sens 2010, 48(5):2308-2316.
Lewis S, Hudak A, Ottmar R, Robichaud P, Lentile L, Hood S, Cronan J, Morgan P: Using hyperspectral imagery to estimate forest floor consumption from wildfire in boreal forests of Alaska, USA. Int. J. Wildland Fire 2011, 20(2):255-271. 10.1071/WF09081
Tiwari K, Arora M, Singh D: An assessment of independent component analysis for detection of military targets from hyperspectral images. Int. J. Appl. Earth Obs. Geoinf 2011, 13(5):730-740. 10.1016/j.jag.2011.03.007
Veracini T, Matteoli S, Diani M, Corsini G: Nonparametric framework for detecting spectral anomalies in hyperspectral images. IEEE Geosci. Remote Sens. Lett 2011, 8(4):666-670.
Prasad S, Li W, Fowler JE, Bruce LM: Information fusion in the redundant-wavelet-transform domain for noise-robust hyperspectral classification. IEEE Trans. Geosci. Remote Sens 2012, 50(9):3474-3486.
Kerekes J, Baum J: Full-spectrum spectral imaging system analytical model. IEEE Trans. Geosci. Remote Sens 2005, 43(3):571-580.
Uss ML, Vozel B, Lukin VV, Chehdi K: Local signal-dependent noise variance estimation from hyperspectral textural images. IEEE J. Sel. Topics Signal Process 2011, 5(3):469-486.
Acito N, Diani M, Corsini G: Subspace-based striping noise reduction in hyperspectral images. IEEE Trans. Geosci. Remote Sens 2011, 49(4):1325-1342.
Shao L, Yan R, Li X, Liu Y: From heuristic optimization to dictionary learning: a review and comprehensive comparison of image denoising algorithms. IEEE Trans. Cybernet. 2013. in press.
Yan R, Shao L, Liu Y: Nonlocal hierarchical dictionary learning using wavelets for image denoising. IEEE Trans. Image Process 2013, 22(12):4689-4698.
Yan R, Shao L, Cvetković S, Klijn J: Improved nonlocal means based on pre-classification and invariant block matching. J. Display Technol 2012, 8(4):212-218.
Letexier D, Bourennane S: Noise removal from hyperspectral images by multidimensional filtering. IEEE Trans. Geosci. Remote Sens 2008, 46(7):2061-2069.
Renard N, Bourennane S: Improvement of target detection methods by multiway filtering. IEEE Trans. Geosci. Remote Sens 2008, 46(8):2407-2417.
Liu X, Bourennane S, Fossati C: Denoising of hyperspectral images using the PARAFAC model and statistical performance analysis. IEEE Trans. Geosci. Remote Sens 2012, 50(10):3717-3724.
Richards JA: Remote sensing digital image analysis: an introduction. Berlin Heidelberg: Springer; 2012.
Chein IC, Qian D: Estimation of number of spectrally distinct signal sources in hyperspectral imagery. IEEE Trans. Geosci. Remote Sens 2004, 42(3):608-619. 10.1109/TGRS.2003.819189
Kuybeda O, Malah D, Barzohar M: Rank estimation and redundancy reduction of high-dimensional noisy signals with preservation of rare vectors. IEEE Trans. Signal Process 2007, 55(12):5579-5592.
Acito N, Diani M, Corsini G: A new algorithm for robust estimation of the signal subspace in hyperspectral images in the presence of rare signal components. IEEE Trans. Geosci. Remote Sens 2009, 47(11):3844-3856.
Martin-Herrero J: Anisotropic diffusion in the hypercube. IEEE Trans. Geosci. Remote Sens 2007, 45(5):1386-1398.
Mendez-Rial R, Calvino-Cancela M, Martin-Herrero J: Accurate implementation of anisotropic diffusion in the hypercube. IEEE Geosci. Remote Sens. Lett 2010, 7(4):870-874.
Le Bihan N, Ginolhac G: Three-mode data set analysis using higher order subspace method: application to sonar and seismo-acoustic signal processing. Signal Process 2004, 84(5):919-942. 10.1016/j.sigpro.2004.02.003
Vasilescu MAO, Terzopoulos D: Multilinear image analysis for facial recognition. In International Association of Pattern Recognition (IAPR). Quebec City; August 2002:511-514.
Muti D, Bourennane S: Multidimensional signal processing using lower-rank tensor approximation. In IEEE ICASSP. Hong Kong; 6–10 April 2003:457-460.
Muti D, Bourennane S: Multidimensional filtering based on a tensor approach. Signal Process 2005, 85(12):2338-2353. 10.1016/j.sigpro.2004.11.029
Letexier D, Bourennane S, Talon J: Nonorthogonal tensor matricization for hyperspectral image filtering. IEEE Geosci. Remote Sens. Lett 2008, 5(1):3-7.
Harshman RA, Lundy ME: The PARAFAC model for three-way factor analysis and multidimensional scaling. In Research methods for multimode data analysis. New York: Praeger; 1984:122-215.
Carroll JD, Chang JJ: Analysis of individual differences in multidimensional scaling via an N-way generalization of Eckart-Young decomposition. Psychometrika 1970, 35(3):283-319. 10.1007/BF02310791
Smilde A, Bro R, Geladi P: Multiway analysis: applications in the chemical sciences. Hoboken: Wiley; 2005.
Guo X, Miron S, Brie D, Zhu S, Liao X: A CANDECOMP/PARAFAC perspective on uniqueness of DOA estimation using a vector sensor array. IEEE Trans. Signal Process 2011, 59(7):3475-3481.
De Almeida AL, Favier G, Mota JCM: PARAFAC-based unified tensor modeling for wireless communication systems with application to blind multiuser equalization. Signal Process 2007, 87(2):337-351. 10.1016/j.sigpro.2005.12.014
Liu X, Bourennane S, Fossati C: Nonwhite noise reduction in hyperspectral images. IEEE Geosci. Remote Sens. Lett 2012, 9(3):368-372.
Lin T, Bourennane S: Hyperspectral image processing by jointly filtering wavelet component tensor. IEEE Trans. Geosci. Remote Sens 2013, 51(6):3529-3541.
Kolda TG, Bader BW: Tensor decompositions and applications. SIAM Rev 2009, 51(3):455-500. 10.1137/07070111X
Cichocki A, Zdunek R, Phan A, Amari S: Nonnegative matrix and tensor factorizations: applications to exploratory multiway data analysis and blind source separation. Hoboken: Wiley; 2009.
Muti D, Bourennane S, Marot J: Lower-rank tensor approximation and multiway filtering. SIAM J. Matrix Anal. Appl 2008, 30(3):1172-1204. 10.1137/060653263
Donoho D, Johnstone I: Ideal denoising in an orthonormal basis chosen from a library of bases. Comptes Rendus de l’Academie des Sciences - Serie I - Mathematique 1994, 319(12):1317-1322.
Jin X, Paswaters S, Cline H: A comparative study of target detection algorithms for hyperspectral imagery. In SPIE Defense, Security, and Sensing. Orlando, FL; 13–17 April 2009.
Acknowledgements
The authors would like to thank the reviewers for their careful reading and helpful comments, which improved the quality of this paper.
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Lin, T., Bourennane, S. Survey of hyperspectral image denoising methods based on tensor decompositions. EURASIP J. Adv. Signal Process. 2013, 186 (2013). https://doi.org/10.1186/1687-6180-2013-186
DOI: https://doi.org/10.1186/1687-6180-2013-186